Scientific Reports. 2025 Sep 26;15:33291. doi: 10.1038/s41598-025-15052-2

Secure federated learning with metaheuristic optimized dimensionality reduction and multi-head attention for DDoS attack mitigation

Adwan A. Alanazi 1, Ashrf Althbiti 2, Sara Abdelwahab Ghorashi 3, Fathea M. O. Birkea 4, Roosvel Soto-Diaz 5, José Escorcia-Gutierrez 6
PMCID: PMC12475425  PMID: 41006431

Abstract

With the rapid proliferation of Internet of Things (IoT) devices, there is an urgent need to understand the real-time cybersecurity risks they face. In ever-growing IoT environments, Distributed Denial of Service (DDoS) attacks pose an essential challenge, compromising the reliability of these systems. Such attacks are routinely used to take down e-commerce platforms, government websites, and banking systems in real time. To deal with DDoS attacks, there is increasing interest in decentralized learning methods, especially federated learning (FL), in which deep learning (DL) models are cooperatively trained on distributed summaries of cyber threats. The adoption of FL successfully resolves the data privacy problem: FL builds a global model by allowing multiple participants with local data to train a shared model in a distributed way, without exchanging raw samples. This paper presents a Metaheuristic-Driven Dimensionality Reduction for Robust Attack Defense Using Deep Learning Models (MDRRAD-DLM) in real-world IoT applications. The aim is to propose effective detection and mitigation strategies for DDoS attacks. The data preprocessing phase initially applies Z-score normalization to transform the input data into a standardized format. Furthermore, the parrot optimization (PO) technique is employed for the feature selection process to select the significant and relevant features from the input data. Moreover, the temporal convolutional network and bi-directional gated recurrent unit with multi-head attention (TCN-MHA-Bi-GRU) technique is implemented for the attack classification process. Finally, the elk herd optimizer (EHO) technique fine-tunes the parameter selection of the TCN-MHA-Bi-GRU technique. The efficiency of the MDRRAD-DLM approach is examined under the NSLKDD and CIC-IDS2017 datasets. The experimental validation of the MDRRAD-DLM approach portrayed superior accuracy values of 99.14% and 99.41% over the two datasets.

Keywords: Dimensionality reduction, DDoS defense, Deep learning, IoT, Elk herd optimizer, Data pre-processing, Real-world application

Subject terms: Computer science, Information technology

Introduction

The IoT represents the evolution of the digital landscape, extending beyond conventional devices such as smartphones and computers to produce a connected web of everyday objects1. These objects, embedded with software, sensors, and other technologies, seamlessly interact and exchange data with other networks and gadgets through the Internet. IoT has developed into a cornerstone of 21st-century digital modernization2. From smart thermostats and wearable health monitors to intelligent traffic systems and advanced manufacturing devices, the integration of IoT is rapidly expanding across diverse sectors. However, this growth also presents various vulnerabilities3. For example, many IoT devices suffer from weak authentication mechanisms, unencrypted communications, and outdated firmware, which increase their susceptibility to attacks. Among these threats, DDoS attacks are especially hazardous due to IoT devices’ widespread connectivity and limited security features4. These attacks involve overwhelming a targeted system, such as an IoT device or website, with a flood of Internet traffic, rendering it unavailable. Figure 1 depicts the general structure of a DDoS attack.

Fig. 1. Structure of DDoS attack.

With the fast evolution of 5G, network security concerns have become more severe, and DDoS attacks have become increasingly harmful5. DDoS attacks in IoT environments include application-layer (e.g., HTTP floods), network-layer (e.g., SYN, UDP floods), and botnet-based attacks (e.g., Mirai), each targeting different system components to disrupt services. Because of the broad range of DDoS attacks, the threat data gathered by a single party is rarely sufficient, and a detection model trained on such a dataset inherits its particular limitations. At the same time, many organizations are unwilling to make every data flow in their network domain public6. With the protection of users’ privacy limiting dataset sharing, how to employ data gathered from various domains without compromising confidentiality, so that DDoS attack flows can be identified broadly, is a crucial concern to be resolved7. Understanding and mitigating DDoS risks is essential to guarantee service availability and cybersecurity stability in real-time settings.

Thus, a precise intrusion detection system (IDS) is required to diminish various kinds of threats effectively. An IDS is a significant element of network security, and the use of FL in cybersecurity for IDS has been investigated in earlier studies. Motivated by these concerns, Google proposed the concept of FL for data confidentiality preservation and on-device learning. FL permits devices to learn a collaborative model without sharing raw data with a centralized server8. In other words, DL and machine learning (ML) models are trained across various servers and devices on decentralized data over many iterations. FL is an iterative process in which the whole ML/DL model is improved in every round9. Additionally, it helps reduce the computational cost on central processing servers, preserves data confidentiality, improves bandwidth usage, and copes better with an overflow of numerous data communications10.
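To make the FL round structure described above concrete, the following minimal Python sketch shows a FedAvg-style weighted aggregation of client model parameters. The function name, the toy clients, and the sample sizes are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: weight each client's parameters by its local sample count."""
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    global_weights = []
    for layer in range(n_layers):
        agg = sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        global_weights.append(agg)
    return global_weights

# Toy round: three clients, each holding a 2x2 weight matrix and a bias vector.
clients = [[np.ones((2, 2)) * k, np.ones(2) * k] for k in (1.0, 2.0, 3.0)]
sizes = [100, 200, 700]
print(federated_average(clients, sizes))  # aggregate is dominated by the largest client (k = 3.0)
```

In a full FL round, each client would first update its local copy of the model on its own traffic data; only the resulting parameters (never the raw samples) are sent for aggregation.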

This paper presents a Metaheuristic-Driven Dimensionality Reduction for Robust Attack Defense Using Deep Learning Models (MDRRAD-DLM) in real-world IoT applications. The aim is to propose effective detection and mitigation strategies for DDoS attacks. The data preprocessing phase initially applies Z-score normalization to transform the input data into a standardized format. Furthermore, the parrot optimization (PO) technique is employed for the feature selection process to select the significant and relevant features from input data. Moreover, the temporal convolutional network and bi-directional gated recurrent unit with multi-head attention (TCN-MHA-Bi-GRU) technique is implemented for the attack classification process. Finally, the elk herd optimizer (EHO) technique fine-tunes the parameter selection of the TCN-MHA-Bi-GRU technique. The efficiency of the MDRRAD-DLM approach is examined under NSLKDD and CIC-IDS2017 datasets. The significant contribution of the MDRRAD-DLM approach is listed below.

  • The MDRRAD-DLM model applies Z-score normalization to standardize and scale input data, improving the quality and consistency of features for enhanced learning. This preprocessing step assists in reducing variability and ensuring that the data is well-prepared for subsequent feature selection and classification stages, improving the overall model performance.

  • The MDRRAD-DLM approach utilizes PO to choose the most relevant features, reducing dimensionality and improving computational efficiency. This optimization technique improves the accuracy of the classification process by concentrating on crucial data attributes, which contributes to better overall model performance and robustness.

  • The MDRRAD-DLM technique integrates a hybrid classifier combining TCN and MHA-Bi-GRU, effectively capturing temporal dependencies and contextual data. This fusion improves the model’s capability of analyzing intrinsic sequential data, resulting in improved classification accuracy and robustness in handling varied input patterns.

  • The MDRRAD-DLM methodology implements the EHO model for optimized tuning of hyperparameters, which significantly improves classification performance. This optimization technique improves the method’s convergence speed and accuracy, ensuring more reliable and efficient predictions.

  • The MDRRAD-DLM method offers a novel and effective solution by integrating PO-based feature selection with a hybrid TCN-MHA-Bi-GRU classifier optimized through the EHO. It significantly enhances accuracy and computational efficiency in handling complex sequential data classification tasks. The incorporated use of advanced optimization and hybrid DL models uniquely improves predictive performance.

Literature review

Rahmati11 presents an FL-driven cybersecurity framework for IoT settings. The framework allows decentralized data processing by training models locally on edge devices, guaranteeing data confidentiality. The presented framework employs a recurrent neural network (RNN) for anomaly detection, optimized for resource-constrained IoT systems. Vemulapalli and Sekhar12 developed the Customized Temporal FL through Adversarial Networks (CusTFL-AN) framework, which integrates generative adversarial networks (GANs) and temporal convolutional networks (TCNs) for personalized and robust threat recognition. CusTFL-AN allows clients to train local models while preserving data confidentiality by creating synthetic datasets employing GANs and combining these at a central server, thus reducing risks connected with direct data sharing. Ragab et al.13 introduce an Advanced AI with an FL infrastructure for the privacy-preserving cyberattack detection (AAIFLF-PPCD) method in IoT. The proposed method intends to guarantee scalable and robust cyber-attack recognition while maintaining the confidentiality of IoT users. Primarily, the method employs a Harris hawk optimizer (HHO)-based feature selection (FS) to recognize the most relevant characteristics of the IoT data. Subsequently, the SSAE is utilized to identify cyber-attacks. Jianping et al.14 developed an attention-based graph neural network (GNN) to identify cross-department and cross-level system threats. It allows collective training of the model while safeguarding data confidentiality on distributed gadgets. Structuring system traffic data in chronological order and building a graph framework based on log density improves the precision of network threat recognition. The attention mechanism (AM) within the FedGAT method is employed to assess the connectivity among nodes. Subramanian and Chinnadurai15 developed an innovative FL-based solution to tackle these restrictions. The centralized method is advanced by combining attention networks with a quantum-inspired federated averaging optimizer for cyber threat identification. The proposed method employs a hierarchical model aggregation process.

Bukhari et al.16 developed a new Stacked CNN and BiLSTM (SCNN-BiLSTM) technique for IDS in wireless sensor networks (WSNs). The FL-based SCNN-BiLSTM methodology is distinctive in its design, enabling various sensor nodes to collectively train a central global model without exposing confidential data, thus alleviating confidentiality concerns. Javeed et al.17 present a horizontal FL model that combines BiLSTM and CNN for efficient IDS. In particular, the CNN is employed for spatial feature extraction, permitting the model to recognize local patterns indicative of potential intrusions, while the Bi-LSTM component captures temporal dependencies and learns sequential patterns in the data. In18, an innovative false data injection attack (FDIA) detection model based on FL is proposed, creating a global detection technique. In the proposed model, the state owners train an FL model employing their own data, which avoids massive data transmission and safeguards data confidentiality. Nandanwar and Katarya19 proposed Cyber-Sentinet, a DL-based IDS with Shapley Additive Explanations (SHAP) for interpretable and accurate detection of cyber-attacks in cyber-physical systems (CPS) within Industrial IoT (IIoT) environments. Nandanwar and Katarya20 proposed a robust DL method, AttackNet, based on an adaptive convolutional neural network–GRU (CNN-GRU) model, for the efficient and accurate detection and classification of botnet attacks in IIoT environments. Nandanwar and Katarya21 proposed a blockchain (BC)-based decentralized application (DApp) integrated with IoT and non-interactive zero-knowledge proof (NIZKP) to securely manage healthcare data, using Ethereum smart contracts, BC data storage, and the interplanetary file system (IPFS), while addressing security threats through an IDS. Nandanwar and Katarya22 developed a hybrid DL model using CNN–bidirectional long short-term memory (CNN-BiLSTM) with transfer learning (TL) for accurate detection and classification of Mirai and BASHLITE botnet attacks in IoT environments. Saheed and Chukwuere23 proposed a CPS-IIoT attack detection model using the Pearson correlation coefficient, agglomerative clustering, BiLSTM with scaled dot-product attention, and SHAP to enhance accuracy, privacy, and interpretability across diverse IIoT environments.

Kaushik, Nandanwar, and Katarya24 identified gaps in existing IoT security solutions and explored ML and DL techniques. Saheed, Omole, and Sabit25 proposed a genetic algorithm (GA) with attention mechanism (AM) and modified Adam-optimized LSTM (GA-mADAM-IIoT), a GmADAM-LSTM-based IDS with an AM and SHAP to detect threats in IIoT using real-world datasets. Nandanwar and Katarya26 provided an overview of BC architecture, components, security mechanisms, and applications across healthcare, IoT, smart grid, governance, defence, and military while analyzing security risks and countermeasures. Saheed and Misra27 proposed an explainable, privacy-preserving deep neural network (DNN) technique using SHAP for accurate and interpretable anomaly detection in CPS-IoT networks. Saheed and Chukwuere28 presented an explainable AI (XAI) ensemble TL model using SHAP and optimized DL methods to detect zero-day botnet attacks in the Internet of Vehicles (IoV), improving transparency, accuracy, and efficiency with limited labelled data. Alhashmi et al.29 proposed a DDoS attack detection method for smart grids using a DNN based on VGG19 integrated with the Harris Hawks optimization (HHO) method to improve real-time detection accuracy and efficiency. Saheed, Misra, and Chockalingam30 proposed an IDS using autoencoder (AE)-based feature reduction, a deep CNN (DCNN), and LSTM to detect cyber-attacks in industrial control systems (ICS) without prior network knowledge, validated on ICS and gas pipeline datasets. Berríos et al.31 proposed an ML technique using random forest (RF), extreme gradient boosting (XGBoost), and LSTM models to detect and mitigate DDoS attacks in IoT and cloud environments. Saheed, Abdulganiyu, and Ait Tchakoucht32 proposed IoT-defender, a lightweight IDS integrating a modified GA (MGA) for feature selection and an LSTM network optimized via GA to detect cyberattacks in IoT networks within an edge computing (EC) framework. Pandey et al.33 proposed an enhanced IDS for WSNs by integrating tabu search (TS) optimization with an RF classifier to automatically tune hyperparameters and improve attack detection performance.

Despite crucial advances in FL, DL, and optimization-based IDS models for IoT, IIoT, CPS, and smart grids, several limitations still exist. Many models show reduced detection accuracy because labelled data is often scarce or imbalanced. The computational complexity and resource constraints of IoT and edge devices challenge the deployment of deep models. Furthermore, several techniques lack comprehensive interpretability, limiting trust and hindering effective threat analysis. Privacy concerns remain critical due to data sharing among distributed devices. The research gap lies in developing lightweight, privacy-preserving, and explainable IDS frameworks that effectively handle data scarcity and heterogeneity while optimizing model parameters automatically and ensuring scalability across diverse IoT ecosystems.

Proposed methods

This paper proposes the MDRRAD-DLM approach in real-world IoT applications. The primary purpose of the MDRRAD-DLM approach is to propose effective detection and mitigation strategies for DDoS attacks. It includes data preprocessing, feature selection of subsets, hybrid attack classification, and hyperparameter tuning. Figure 2 represents the entire procedure of the MDRRAD-DLM model.

Fig. 2. Overall flow of MDRRAD-DLM approach.

Data preprocessing

Initially, Z-score normalization is used to transform the input data into a standardized format. Z-score normalization, also known as standardization, is a statistical approach that normalizes data by transforming it to a standard normal distribution with a mean of 0 and a standard deviation of one34. This method is chosen for its ability to standardize features by centring them around a mean of zero with a standard deviation of one. This transformation ensures that all features contribute equally to the learning process, preventing dominance by larger-scale features. The method handles data variability well and enhances the convergence speed and stability of gradient-based learning algorithms, specifically in DL and optimization contexts. It is particularly appropriate for models like neural networks, where feature distribution uniformity improves training performance. Additionally, it preserves the shape of the original data distribution, making it an ideal choice in medical and diagnostic applications where accuracy is critical.

During DDoS attack detection, it assists in preprocessing network traffic data by scaling dissimilar features to a common range, enhancing the performance of ML methods. It guarantees that features with larger numeric ranges do not dominate those with smaller ranges and permits a fair distribution of weights in anomaly detection. This step is valuable in processing heterogeneous or skewed data, which are common in IoT-based systems. By using it, detection methods can attain improved precision and faster convergence. It further assists in decreasing the influence of outliers, making the technique more robust to noisy data.
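For illustration, a minimal NumPy sketch of the column-wise Z-score transform described above is given below; the toy traffic features (packet count and flow duration) are assumed for demonstration only.

```python
import numpy as np

def zscore_normalize(X, eps=1e-8):
    """Column-wise Z-score normalization: (x - mean) / std for each feature."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / (std + eps)  # eps guards against constant (zero-variance) features

# Toy traffic records with very different feature scales.
X = np.array([[1200.0, 0.02],
              [ 300.0, 0.50],
              [9800.0, 0.07]])
Xz = zscore_normalize(X)
print(Xz.mean(axis=0).round(6), Xz.std(axis=0).round(6))  # each column now has mean ~0, std ~1
```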

PO-based feature selection process

Furthermore, the PO technique is employed for the FS process35. This technique is chosen for its efficiency in balancing exploration and exploitation in the search space. Inspired by parrots’ intelligent foraging and learning behaviour, PO efficiently detects the most relevant features while discarding redundant or noisy data. Compared to conventional methods such as recursive feature elimination or mutual information, PO presents higher adaptability and global search capability, avoiding local optima. Its population-based strategy allows diverse candidate solutions, enhancing the robustness of the chosen feature subset. This results in reduced dimensionality, faster model training, and improved classification performance. PO is also appropriate for complex biomedical data where feature relevance can be non-linear and interdependent. The PO model mainly involves four behaviours, which are described below.

Foraging behavior

By observing the position of food or of its owner, the parrot estimates the approximate food location and then flies towards it. The parrot’s movement is therefore represented using the following equations:

$$X_i^{t+1} = \left(X_i^{t} - X_{best}\right)\cdot Levy(dim) + rand(0,1)\cdot\left(1-\frac{t}{Max_{iter}}\right)^{\frac{2t}{Max_{iter}}}\cdot X_{mean}^{t} \quad (1)$$

$$X_{mean}^{t} = \frac{1}{N}\sum_{k=1}^{N} X_{k}^{t} \quad (2)$$

$$Levy(dim) = \frac{\mu \cdot \sigma}{\left|\nu\right|^{1/\gamma}},\qquad \mu,\nu \sim N(0,dim),\ \gamma = 1.5 \quad (3)$$

Here, $X_i^{t}$ specifies the existing location and $X_i^{t+1}$ the updated location; $Max_{iter}$ represents the maximal iteration count; $X_{mean}^{t}$ is the average location of the current population, described in Eq. (2); $Levy(dim)$ signifies the Levy distribution stated in Eq. (3), which helps represent the parrot flight; $rand(0,1)$ is a uniformly distributed random value employed to define the parrot’s flight; $X_{best}$ is the current optimum location; and $t$ is the current iteration. The term $\left(X_i^{t}-X_{best}\right)\cdot Levy(dim)$ specifies the movement relative to the owner’s location, and $rand(0,1)\left(1-t/Max_{iter}\right)^{2t/Max_{iter}} X_{mean}^{t}$ locates the food position more accurately by monitoring the location of the entire population.
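A small Python sketch of one common way to realize the Levy-flight step in Eq. (3) (Mantegna's algorithm) is shown below; the parameter value γ = 1.5 and the generator seed are assumptions made for illustration.

```python
import numpy as np
from math import gamma, pi, sin

def levy_step(dim, beta=1.5, rng=None):
    """Mantegna-style Levy-flight step vector, one common realization of Eq. (3)."""
    rng = rng or np.random.default_rng(0)
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)   # numerator samples
    v = rng.normal(0.0, 1.0, dim)     # denominator samples
    return u / np.abs(v) ** (1 / beta)

print(levy_step(5).round(4))          # heavy-tailed step used to perturb a parrot's position
```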

Staying behavior

Modelling the parrot’s behaviour of resting arbitrarily on different parts of its owner’s body introduces randomness into the search process:

$$X_i^{t+1} = X_i^{t} + X_{best}\cdot Levy(dim) + rand(0,1)\cdot ones(1,dim) \quad (4)$$

Here, $X_{best}\cdot Levy(dim)$ denotes the flying process toward the owner, and $rand(0,1)\cdot ones(1,dim)$ represents an arbitrary stop on a specific portion of the owner’s body.

Communicating performance

Parrots are naturally sociable and frequently interact within their flock, either flying towards the group to communicate or staying away from it. In the PO model, these two behaviours are presumed to occur with equal probability, and the average location of the current population is used to represent the flock’s centre.

$$X_i^{t+1} = \begin{cases} 0.2\cdot rand(0,1)\cdot\left(1-\dfrac{t}{Max_{iter}}\right)\left(X_i^{t}-X_{mean}^{t}\right), & P \le 0.5 \\[2mm] 0.2\cdot rand(0,1)\cdot \exp\!\left(-\dfrac{t}{rand(0,1)\cdot Max_{iter}}\right), & P > 0.5 \end{cases} \quad (5)$$

Here, the first case ($P \le 0.5$) specifies the situation in which an individual joins the parrot group for interaction, whereas the second case ($P > 0.5$) specifies the situation in which an individual leaves right after communicating. Either behaviour is triggered by generating a random number $P$ in the range $(0,1)$.

Fear of stranger’s behavior

Parrots naturally fear unfamiliar individuals; they keep their distance from strangers and seek safety near their owners.

$$X_i^{t+1} = X_i^{t} + rand(0,1)\cdot\cos\!\left(0.5\pi\,\frac{t}{Max_{iter}}\right)\left(X_{best}-X_i^{t}\right) - \cos\!\left(rand(0,1)\cdot\pi\right)\left(\frac{t}{Max_{iter}}\right)^{\frac{2}{Max_{iter}}}\left(X_i^{t}-X_{best}\right) \quad (6)$$

Here, $rand(0,1)\cos\!\left(0.5\pi\,t/Max_{iter}\right)\left(X_{best}-X_i^{t}\right)$ specifies the reorienting process of flying towards the owner, and $\cos\!\left(rand(0,1)\pi\right)\left(t/Max_{iter}\right)^{2/Max_{iter}}\left(X_i^{t}-X_{best}\right)$ represents the process of distancing itself from strangers.

The fitness function (FF) reflects both the classification accuracy and the number of chosen features. It maximizes the classification accuracy while minimizing the size of the selected feature subset. The following FF is applied to evaluate individual solutions, as provided in Eq. (7).

$$Fitness = \alpha\,\gamma_{R}(D) + \beta\,\frac{|R|}{|C|} \quad (7)$$

Here, $\gamma_{R}(D)$ represents the classification error rate obtained using the chosen features, determined as the percentage of incorrect classifications relative to the number of classifications performed, expressed as a rate in $(0,1)$ (i.e., the complement of the classification accuracy); $|R|$ denotes the number of selected features, and $|C|$ signifies the total number of features in the original data. $\alpha \in [0,1]$ and $\beta = 1-\alpha$ control the relative importance of classification quality and subset length.
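The sketch below illustrates how an Eq. (7)-style fitness can be evaluated for a candidate feature mask produced by PO; the k-NN wrapper classifier, the value α = 0.99, and the random toy data are assumptions for illustration, since the paper does not specify these implementation details.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fs_fitness(mask, X, y, alpha=0.99):
    """Eq. (7)-style fitness: alpha * error_rate + (1 - alpha) * |R| / |C|."""
    selected = np.flatnonzero(np.asarray(mask) > 0.5)    # binarize a continuous PO position
    if selected.size == 0:
        return 1.0                                       # worst possible score if nothing is selected
    clf = KNeighborsClassifier(n_neighbors=5)
    accuracy = cross_val_score(clf, X[:, selected], y, cv=3).mean()
    error = 1.0 - accuracy
    return alpha * error + (1.0 - alpha) * selected.size / X.shape[1]

# Toy usage: random data and a random candidate feature mask (both purely illustrative).
rng = np.random.default_rng(1)
X, y = rng.normal(size=(120, 10)), rng.integers(0, 2, 120)
print(fs_fitness(rng.random(10), X, y))
```

Lower fitness values correspond to better candidates, so the PO update rules in Eqs. (1)-(6) drive the population toward small, discriminative feature subsets.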

TCN-MHA-Bi-GRU-based classification process

In this section, the proposed MDRRAD-DLM model implements the TCN-MHA-Bi-GRU technique for the attack classification process. This model is chosen for its superior capability in capturing both short- and long-term dependencies in sequential data. TCNs provide parallel processing and stable gradients, making them more efficient than conventional RNNs for long sequences. Bi-GRUs improve contextual understanding by analyzing input in both forward and backward directions. The integration of MHA additionally strengthens the model by allowing it to concentrate on multiple relevant parts of the sequence simultaneously. This hybrid technique integrates temporal precision, contextual depth, and dynamic attention, outperforming standalone LSTMs, GRUs, or CNNs in complex classification tasks involving time-dependent biomedical data. Figure 3 represents the structure of TCN-MHA-Bi-GRU.

Fig. 3. Structure of TCN-MHA-Bi-GRU model.

TCN block: In DL, the exploration of TCNs as a powerful replacement for traditional recurrent structures has triggered a paradigm shift36. The underpinning idea of TCNs rests in using temporal convolutions to capture complex patterns and dependencies within sequential data. The convolution operation is described by the following mathematical structure:

$$y_t = f\!\left(\sum_{i=0}^{k-1} w_i\, x_{t-i} + b\right) \quad (8)$$

where $y_t$ signifies the output at time-step $t$, $f$ denotes the activation function, $w_i$ characterizes the filter weights, $x_{t-i}$ symbolizes the inputs at previous time-steps, $b$ denotes the bias term, and $k$ signifies the kernel size; this operation forms the backbone of TCNs. It allows TCNs to effectively capture temporal nuances and complex dependencies within sequences, thus providing a new perspective on information propagation and memory retention in sequence-modelling tasks. TCNs are increasingly favoured due to their superior capability in capturing long-term dependencies compared to conventional recurrent models. This enhanced memory retention makes TCNs highly effective for tasks that demand comprehension of extended historical context, enabling efficient and precise processing of complex sequential data.
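A minimal NumPy sketch of the causal convolution in Eq. (8) follows; the kernel weights, bias, and toy sine-wave input are illustrative assumptions.

```python
import numpy as np

def causal_conv(x, w, b, activation=np.tanh):
    """Eq. (8): y_t = f( sum_{i=0}^{k-1} w_i * x_{t-i} + b ), causal (no future leakage)."""
    k = len(w)
    x_pad = np.concatenate([np.zeros(k - 1), x])       # left-pad so output aligns with input
    y = np.array([x_pad[t:t + k][::-1] @ w + b          # window ordered as x_t, x_{t-1}, ..., x_{t-k+1}
                  for t in range(len(x))])
    return activation(y)

x = np.sin(np.linspace(0, 3 * np.pi, 12))               # toy univariate traffic sequence
print(causal_conv(x, w=np.array([0.5, 0.3, 0.2]), b=0.1).round(3))
```

Stacking several such layers with increasing dilation is what gives a practical TCN its long effective memory; the sketch shows only the single undilated convolution of Eq. (8).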

The separator module, inspired by the ConvTasNet architecture, estimates a multiplicative function, generally identified as a mask, for each target source in the input signal. The separator utilizes a TCN to estimate these masks efficiently. The TCN separation module processes the encoded features from the encoder output and produces masks that separate and enhance the target sources. Using these masks, the method can extract and reconstruct individual components from composite auditory signals, guaranteeing efficient and accurate separation.

MHA layer: The attention model permits the method to excel at capturing long- and short-term dependencies, relationships, and context in challenging environments. Attention-based Transformers can meaningfully improve processing by learning what to amplify and attend to, mainly in noisy environments or with distorted inputs. The attention mechanism (AM) is mathematically characterized by Eq. (9):

$$Attention(Q,K,V) = softmax\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \quad (9)$$

$V$, $Q$, and $K$ correspondingly represent the values, queries, and keys. The queries and keys are applied to calculate the attention weights, which represent the significance of different positions. The values characterize the processed data that, after applying the attention weights, are aggregated into the final output. Furthermore, $d_k$ denotes the key dimensionality.

MHA, a basic concept in modern neural network (NN) frameworks, transforms traditional AMs by simultaneously enhancing the model’s capability to handle information from numerous views. By introducing multiple attention heads, each furnished with different learned linear projections that convert the values, queries, and keys into various subspaces, the method advances its capability for exploring complex relationships inside the data in parallel. The main formula governing MHA is:

$$MultiHead(Q,K,V) = Concat\!\left(head_1,\ldots,head_h\right)W^{O} \quad (10)$$

where every $head_i$ is calculated as:

$$head_i = Attention\!\left(QW_i^{Q},\, KW_i^{K},\, VW_i^{V}\right) \quad (11)$$

Here, the projections are described by the parameter matrices $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$, together with the output projection $W^{O}$.

In this methodology, instead of depending on a single attention function with fixed-dimensional inputs, the values, queries, and keys undergo individual linear transformations $h$ times. These transformed representations are then processed by the attention mechanism in parallel, producing output values that summarize a wider range of information. These outputs are then concatenated and passed through an additional projection, resulting in a complete and enriched representation of the input data. This strategic design choice guarantees that the method can effectively harness the benefits of MHA without an exponential increase in computational cost, thus paving the way for more advanced and efficient NN frameworks.
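The following NumPy sketch of Eqs. (9)-(11) shows scaled dot-product attention and a simple multi-head wrapper; the random projection matrices, the choice of two heads, and the toy input shape are assumptions made purely for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Eq. (9): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(Q, K, V, n_heads, rng):
    """Eqs. (10)-(11): per-head projections, attention, concatenation, output projection."""
    d_model = Q.shape[-1]
    d_head = d_model // n_heads
    heads = []
    for _ in range(n_heads):
        Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d_model, d_head)) for _ in range(3))
        heads.append(attention(Q @ Wq, K @ Wk, V @ Wv))
    Wo = rng.normal(scale=0.1, size=(n_heads * d_head, d_model))
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 8))                             # 6 time steps, model dimension 8
print(multi_head(X, X, X, n_heads=2, rng=rng).shape)    # -> (6, 8)
```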

GRU layer: The GRU is a form of RNN structure tailored to address the vanishing-gradient problem of classic RNNs. GRUs are related to LSTM units but have a simpler structure with fewer parameters, making them computationally less expensive. GRUs comprise two gates: an update gate and a reset gate. The reset gate determines how much of the preceding information should be forgotten, whereas the update gate adjusts how much of the new information is combined into the cell state. This gating mechanism allows GRUs to capture longer-term dependencies in sequential data effectively.

In a GRU, the reset gate $r_t$, update gate $z_t$, new hidden layer (HL) state $h_t$, and candidate HL state $\tilde{h}_t$ are calculated as follows. The update gate is computed as:

$$z_t = \sigma\!\left(W_z\left[h_{t-1}, x_t\right]\right) \quad (12)$$

The reset gate is calculated as $r_t = \sigma\!\left(W_r\left[h_{t-1}, x_t\right]\right)$, the candidate HL is established by $\tilde{h}_t = \tanh\!\left(W\left[r_t \odot h_{t-1}, x_t\right]\right)$, and the new HL is updated according to $h_t = \left(1-z_t\right)\odot h_{t-1} + z_t\odot \tilde{h}_t$.

The Bi-GRU extends the GRU by processing input sequences in both forward and backward directions concurrently. By incorporating dual GRU networks (one handling the input sequence from start to end and another processing it backwards), the Bi-GRU captures dependencies from both past and future contexts. The forward HL $\overrightarrow{h}_t$, backward HL $\overleftarrow{h}_t$, and final Bi-GRU HL $h_t$ are established as $\overrightarrow{h}_t = GRU\!\left(x_t, \overrightarrow{h}_{t-1}\right)$, $\overleftarrow{h}_t = GRU\!\left(x_t, \overleftarrow{h}_{t+1}\right)$, and $h_t = \left[\overrightarrow{h}_t;\, \overleftarrow{h}_t\right]$, correspondingly.

These mathematical descriptions summarize the core processing of GRU and Bi-GRU methods, permitting them to proficiently capture dependencies in sequential data by updating HLs using contextual and input information from past and future time steps. Bi-GRU methods can use information from both directions to understand and capture complex patterns in sequential data, enhancing performance on various sequential learning tasks.
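A minimal NumPy sketch of one GRU step implementing Eq. (12) and the reset/candidate/update rules above is given below (a Bi-GRU would run a second pass over the reversed sequence and concatenate the two hidden states); the weight shapes, random initialization, and toy five-step input are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU update following Eq. (12) and the reset/candidate/new-state rules."""
    xh = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ xh)                                        # update gate
    r = sigmoid(Wr @ xh)                                        # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))   # candidate hidden state
    return (1.0 - z) * h_prev + z * h_tilde                     # new hidden state

rng = np.random.default_rng(0)
d_in, d_h = 4, 3
Wz, Wr, Wh = (rng.normal(scale=0.1, size=(d_h, d_h + d_in)) for _ in range(3))
h = np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):          # forward pass over a toy 5-step sequence
    h = gru_step(x_t, h, Wz, Wr, Wh)
print(h.round(3))                               # final forward hidden state
```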

EHO-hyperparameter tuning model

Finally, the EHO model optimally alters the hyperparameter range of the TCN-MHA-Bi-GRU technique, resulting in higher classification performance37. This technique is employed due to its strong balance between exploration and exploitation in complex search spaces. This model effectually navigates diverse solution landscapes, avoiding premature convergence common in conventional methods like grid or random search. Its adaptive mechanism identifies the best-performing parameter combinations, improving model generalization. When applied to the TCN-MHA-Bi-GRU framework, EHO significantly enhances classification accuracy by fine-tuning critical parameters. EHO shows improved convergence stability and faster optimization for high-dimensional problems in biomedical classification tasks compared to other metaheuristic algorithms like PSO or GA.

The EHO is a recent metaheuristic technique inspired by the breeding behaviour of elk herds. The model balances exploitation and exploration in optimization tasks, making EHO an efficient solution for complex problems. The EHO is designed to simulate the natural dynamics of elk herds over a sequence of essential stages. It starts with population initialization and the definition of the problem parameters. The method then enters the rutting season, separating the population into families directed by the fittest bulls. During the calving season, these families yield novel solutions according to the features of the bull and its harems. Finally, in the selection season, each solution is assessed, and the fittest are chosen to form the next generation, with this process repeated until the model converges or the iteration limit is attained. The stages of the EHO are as follows:

Initialization

In the initialization stage of the EHO, the model starts by setting up the population and defining the problem-specific parameters. The fundamental initialization elements are the elk herd size $EHS$, the bull rate $Br$, and the search-region boundaries. The elk herd $EH$ is initialized as a matrix of size $EHS \times d$, where $d$ denotes the problem’s dimensionality and every element in the matrix belongs to a possible solution (elk). Mathematically, every solution $x_i$ in the population is generated inside the described search-region limits using Eq. (13):

$$x_{i,j} = lb_j + rand(0,1)\times\left(ub_j - lb_j\right) \quad (13)$$

whereas $x_{i,j}$ signifies the $j$-th attribute of the $i$-th solution, $lb_j$ and $ub_j$ denote the lower and upper limits of the $j$-th attribute, and $rand(0,1)$ represents a randomly generated value distributed uniformly in $[0, 1)$. The fitness of each solution is calculated using the objective function $f\!\left(x_i\right)$, and the solutions are sorted in ascending order of their fitness values. This initial setup prepares the elk herd for the following stages of the model.
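The initialization and fitness-sorting step can be sketched in a few lines of Python, as shown below; the sphere objective, herd size, and bounds are toy assumptions standing in for the actual hyperparameter search space.

```python
import numpy as np

def init_herd(n_elks, dim, lb, ub, objective, rng=None):
    """Eq. (13): uniform initialization inside the bounds, then sort ascending by fitness."""
    rng = rng or np.random.default_rng(0)
    EH = lb + rng.random((n_elks, dim)) * (ub - lb)
    fitness = np.array([objective(x) for x in EH])
    order = np.argsort(fitness)                 # best (lowest objective value) first
    return EH[order], fitness[order]

sphere = lambda x: float(np.sum(x ** 2))        # toy objective standing in for the classifier error
herd, fit = init_herd(n_elks=10, dim=4, lb=-5.0, ub=5.0, objective=sphere)
print(fit.round(3))                             # sorted ascending
```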

Generating the initial elk herd solutions

In this second stage, the model concentrates on generating the first population of solutions, representing the elk herd. After defining the problem-specific parameters and initializing the population matrix $EH$ in the initial phase, this stage includes assigning fitness values to every solution and establishing the herd structure. The elk herd $EH$ is produced as a matrix of dimensions $EHS \times d$, where every row is associated with a possible solution in the search area, as shown in Eq. (14).

$$EH = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{EHS,1} & x_{EHS,2} & \cdots & x_{EHS,d} \end{bmatrix} \quad (14)$$

Once the initial population is produced, its fitness is assessed using the objective function $f(\cdot)$. The herd is then sorted in ascending order of fitness, guaranteeing that the best solutions (the strongest elks) are placed at the top. This ordered structure sets the basis for the following rutting-season stage, in which the population is separated into families.

Rutting season

In this third stage, the EHO model splits the initial population into families, with each family directed by a bull (the fittest individual). The division depends on the bull’s fitness, reflecting natural behaviour, where strong bulls guide the largest groups. Initially, the model establishes the number of bulls $B$ inside the population using the bull rate $Br$ and the herd size $EHS$, as demonstrated in Eq. (15):

$$B = round\!\left(Br \times EHS\right) \quad (15)$$

Here, $B$ denotes the number of bulls, $Br$ represents the bull rate, and $EHS$ is the population size. The best $B$ individuals are chosen as bulls depending on their fitness values. The bulls then compete to form families, each containing a bull and its allocated harems (followers). The assignment of harems to each bull is performed using a roulette-wheel selection method, where the probability $p_i$ of bull $i$ attracting a harem is proportional to its fitness, as given in Eq. (16):

$$p_i = \frac{f\!\left(B_i\right)}{\sum_{j=1}^{B} f\!\left(B_j\right)} \quad (16)$$

Here, $f\!\left(B_i\right)$ denotes the fitness of the $i$-th bull, and the sum in the denominator is the total fitness of all bulls. The roulette-wheel selection guarantees that high-fitness bulls are more likely to lead more harems. After the harems are assigned, every bull manages its family, with the size of each family reflecting the bull’s strength. This structured division sets the stage for the calving season, in which novel solutions (calves) are generated according to the bulls’ features and harems.
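A short Python sketch of the Eq. (16)-style roulette-wheel harem assignment follows; because the IDS fitness used here is an error rate to be minimized, the sketch inverts the fitness before normalizing, which is an implementation assumption rather than a detail stated in the paper.

```python
import numpy as np

def assign_harems(bull_fitness, n_harems, rng=None, minimize=True):
    """Eq. (16)-style roulette wheel: fitter bulls attract proportionally more harems."""
    rng = rng or np.random.default_rng(0)
    f = np.asarray(bull_fitness, dtype=float)
    score = 1.0 / (f + 1e-12) if minimize else f   # invert errors so lower error -> larger share
    p = score / score.sum()
    return rng.choice(len(f), size=n_harems, p=p)  # family (bull) index assigned to each harem

print(assign_harems(bull_fitness=[0.02, 0.10, 0.25], n_harems=7))
```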

Calving season

In this fourth stage, the EHO model concentrates on generating novel solutions (calves) inside every family according to the genetic properties of the bull (leader) and its harems (followers). This process imitates the natural reproduction procedure in elk herds, where the offspring inherit features from both parents, promoting diversity inside the population. For every family, novel solutions are generated by incorporating features from the bull and its harems. When the calf’s index matches that of its bull father, the novel solution is generated using Eq. (17):

$$x_{i,j}^{new} = x_{i,j} + \gamma\left(x_{r,j} - x_{i,j}\right) \quad (17)$$

Here, $\gamma$ denotes a number generated randomly in $(0, 1)$, and $x_{r,j}$ represents randomly chosen features from the present population. This equation guarantees that the novel solution is affected mainly by the bull, with some variation introduced by the arbitrary selection from the group. When the calf’s index matches that of its mother harem, the novel solution is generated by combining the characteristics of both the mother and the bull, using Eq. (18):

$$x_{i,j}^{new} = x_{i,j} + \alpha\left(B_{h,j} - x_{i,j}\right) + \beta\left(x_{rnd,j} - x_{i,j}\right) \quad (18)$$

Here, $\alpha$ and $\beta$ are randomly generated coefficients, $B_{h,j}$ embodies the bull’s features, and $x_{rnd,j}$ stands for random features from other bulls. This calving procedure continues for each family, creating a new generation of solutions that inherit the strengths of their predecessors while introducing new variations important for the model’s exploitation and exploration abilities in the search region.

Selection season

In this fifth stage, the method combines the newly formed solutions (calves) with the current population of bulls and harems to create a mixed herd. The bulls, harems, and newly formed calves are merged into a single matrix. The fitness of every individual in this combined matrix is assessed using the objective function, and the complete population is ordered in ascending order depending on the fitness values. From this ordered population, the best $EHS$ individuals (where $EHS$ is the original herd size) are chosen to form the new population for the following iteration. This selection process parallels the $(\mu + \lambda)$ selection approach generally applied in evolutionary methods, where both parents (bulls and harems) and offspring (calves) compete equally for survival. By constantly choosing the fittest individuals, the model iteratively improves the population, refining the overall fitness of the herd with every cycle. This selection procedure repeats until the termination conditions, such as a maximal iteration count or convergence to the best solution, are met. Algorithm 1 describes the EHO technique.
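The merge-and-keep-the-best step can be sketched as below; the array shapes and the random toy fitness values are assumptions for illustration.

```python
import numpy as np

def select_next_generation(parents, parent_fit, calves, calf_fit, herd_size):
    """(mu + lambda)-style step: merge bulls/harems with calves, keep the best herd_size."""
    population = np.vstack([parents, calves])
    fitness = np.concatenate([parent_fit, calf_fit])
    keep = np.argsort(fitness)[:herd_size]          # ascending: lowest error survives
    return population[keep], fitness[keep]

rng = np.random.default_rng(0)
parents, calves = rng.random((6, 3)), rng.random((8, 3))
parent_fit, calf_fit = rng.random(6), rng.random(8)
new_pop, new_fit = select_next_generation(parents, parent_fit, calves, calf_fit, herd_size=6)
print(new_fit.round(3))
```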

Algorithm 1. EHO model.

The EHO model is driven by an FF that aims to achieve the best classifier outcome. It defines a positive value to characterize the quality of the candidate solution. The minimization of the classification error rate is taken as the FF, as shown in Eq. (19).

$$fitness\!\left(x_i\right) = ClassifierErrorRate = \frac{\text{number of misclassified samples}}{\text{total number of samples}} \times 100 \quad (19)$$

Performance analysis

The simulation analysis of the MDRRAD-DLM approach is examined under the NSLKDD dataset38. The technique is simulated using Python 3.6.5 on a PC with an i5-8600K CPU, a 250 GB SSD, a GeForce 1050Ti 4 GB GPU, 16 GB RAM, and a 1 TB HDD. Parameters include a learning rate of 0.01, ReLU activation, 50 epochs, a dropout of 0.5, and a batch size of 5. This dataset holds 148,517 samples under five classes, as depicted in Table 1. The total number of features is 42, of which only 31 are selected.

Table 1.

Details of the NSLKDD dataset.

NSLKDD dataset
Class No. of samples
“Normal” 77,054
“DoS” 53,385
“Probe” 14,410
“R2L” 3416
“U2R” 252
Total samples 148,517

Figure 4 illustrates the classifier performance of the MDRRAD-DLM model on the NSLKDD dataset. Figure 4a, b displays the confusion matrices with accurate classification and detection of each class under the 70% training phase (TRAPHA) and 30% testing phase (TESPHA). Figure 4c represents the PR curve, demonstrating higher performance across each class label. Finally, Fig. 4d shows the ROC curves, representing proficient outcomes with superior ROC values for the different class labels.

Fig. 4. NSLKDD dataset: (a, b) 70% TRAPHA and 30% TESPHA; (c, d) PR and ROC curves.

Table 2 and Fig. 5 depict the attack recognition of the MDRRAD-DLM methodology on the NSLKDD dataset. The performance specifies that the MDRRAD-DLM method appropriately classified each class label. Based on 70% TRAPHA, the proposed MDRRAD-DLM model attains an average accuracy of 99.11%, precision of 85.89%, recall of 80.43%, F-score of 82.46%, and MCC of 82.09%. Furthermore, based on 30% TESPHA, the presented MDRRAD-DLM method obtains an average accuracy of 99.14%, precision of 87.41%, recall of 82.42%, F-score of 84.51%, and MCC of 84.02%.

Table 2.

Attack detection of MDRRAD-DLM technique under the NSLKDD dataset.

Classes Accuracy Precision Recall F-score MCC
TRAPHA (70%)
 Normal 98.32 98.46 98.30 98.38 96.64
 DoS 98.59 97.81 98.28 98.04 96.94
 Probe 99.45 96.29 98.09 97.18 96.88
 R2L 99.37 89.58 82.20 85.73 85.49
 U2R 99.83 47.31 25.29 32.96 34.51
 Average 99.11 85.89 80.43 82.46 82.09
TESPHA (30%)
 Normal 98.37 98.45 98.41 98.43 96.73
 DoS 98.62 97.89 98.27 98.08 97.01
 Probe 99.51 96.65 98.31 97.47 97.20
 R2L 99.37 90.22 81.21 85.48 85.28
 U2R 99.83 53.85 35.90 43.08 43.89
 Average 99.14 87.41 82.42 84.51 84.02

Fig. 5. Average of MDRRAD-DLM model under the NSLKDD dataset.

Figure 6 displays the training (TRAN) accuracy and validation (VALN) accuracy outcomes of the MDRRAD-DLM methodology under the NSLKDD dataset. The accuracy values are calculated over 0–25 epochs. The figure underlines that both accuracy values present increasing tendencies, which indicates the ability of the MDRRAD-DLM methodology to reach a heightened solution across multiple iterations. Moreover, the two curves remain close over the epochs, which specifies minimal over-fitting and displays the superior outcome of the MDRRAD-DLM technique.

Fig. 6. Accuracy curve of MDRRAD-DLM approach under the NSLKDD dataset.

In Fig. 7, the TRAN loss (TRANLOS) and VALN loss (VALNLOS) graphs of the MDRRAD-DLM approach on the NSLKDD dataset are demonstrated. The loss values are measured over an interval of 0–25 epochs. Both values depict decreasing tendencies, indicating the ability of the MDRRAD-DLM model to balance the trade-off between fitting and generalization. The continual decrease ensures the improved performance of the MDRRAD-DLM method and refines the prediction results.

Fig. 7. Loss curve of MDRRAD-DLM approach under the NSLKDD dataset.

Table 3 and Fig. 8 examine the comparative results of the MDRRAD-DLM approach on the NSLKDD dataset with existing methodologies39–41. The results highlight that the Naïve Bayes (NB), k-nearest neighbours (KNN), Gradient Boosting (GB), IForest, MLP, CNN, and RNN models report poorer performance, while the proposed MDRRAD-DLM method reports superior outcomes with a maximal accuracy, precision, recall, and F-score of 99.14%, 87.41%, 82.42%, and 84.51%, respectively.

Table 3.

Comparative study of the MDRRAD-DLM technique with existing methods under the NSLKDD dataset.

NSLKDD dataset
Technique Accuracy Precision Recall F-score
NB 98.43 83.49 80.07 81.61
KNN algorithm 97.41 81.26 79.13 76.22
GB 93.35 81.93 75.63 77.12
IForest 89.51 79.43 79.17 83.55
MLP method 90.79 79.14 75.91 82.44
CNN classifier 98.19 80.98 78.69 75.64
RNN method 91.66 80.62 78.05 80.81
MDRRAD-DLM 99.14 87.41 82.42 84.51

Fig. 8. Comparative study of MDRRAD-DLM technique with existing methods under the NSLKDD dataset.

Table 4 and Fig. 9 demonstrate the computational time (CT) analysis of the MDRRAD-DLM technique with existing methods. Among the various methods evaluated, the NB technique required 14.57 s, while the KNN model took 11.25 s. The gradient boosting (GB) method and the multi-layer perceptron (MLP) approach attained similar times of 14.18 s and 14.44 s, respectively. The isolation forest (IForest) showed a faster CT of 7.05 s. CNN and RNN classifiers illustrated moderate CTs of 12.87 and 12.79 s, respectively. The MDRRAD-DLM method recorded the shortest CT of 5.66 s, indicating a more efficient processing capability than other methods. These CT values reflect the efficiency of each technique in handling the dataset and their suitability for real-time IDS.

Table 4.

CT analysis of the MDRRAD-DLM approach with existing models under the NSLKDD dataset.

NSLKDD dataset
Technique CT (sec)
NB 14.57
KNN algorithm 11.25
GB 14.18
IForest 7.05
MLP method 14.44
CNN classifier 12.87
RNN method 12.79
MDRRAD-DLM 5.66

Fig. 9. CT analysis of the MDRRAD-DLM approach with existing models under the NSLKDD dataset.

The ablation study of the MDRRAD-DLM methodology is specified in Table 5 and Fig. 10. The ablation study on the NSLKDD dataset compares the performance of the MDRRAD-DLM methodology with its constituent techniques. The MDRRAD-DLM method achieved an accuracy of 99.14%, precision of 87.41%, recall of 82.42%, and an F-score of 84.51%, outperforming POA, which recorded an accuracy of 96.94%, precision of 85.18%, recall of 80.50%, and an F-score of 82.65%. Similarly, the MDRRAD-DLM method outperformed EHOA, with an accuracy of 97.74%, precision of 85.98%, recall of 81.30%, and an F-score of 83.31%, as well as TCN-MHA-Bi-GRU, which achieved an accuracy of 98.43%, precision of 86.63%, recall of 81.91%, and an F-score of 83.84%. This demonstrates the superior efficiency of the MDRRAD-DLM method across all key performance metrics.

Table 5.

Ablation study-based comparative analysis of the MDRRAD-DLM methodology against recent techniques.

NSLKDD dataset
Technique Accuracy Precision Recall F-score
POA 96.94 85.18 80.50 82.65
EHOA 97.74 85.98 81.30 83.31
TCN-MHA-Bi-GRU 98.43 86.63 81.91 83.84
MDRRAD-DLM 99.14 87.41 82.42 84.51

Fig. 10. Ablation study-based comparative analysis of the MDRRAD-DLM methodology against recent techniques.

The proposed MDRRAD-DLM technique is also validated on the CIC-IDS2017 dataset42. This dataset holds 30,800 samples across 12 traffic types, as shown in Table 6. Of the 78 features present, only 45 are selected.

Table 6.

Details of the CIC-IDS2017 dataset.

CIC-IDS2017 dataset
Traffic type Samples
“Benign” 3000
“Bot” 1800
“DDoS” 3000
“DoS goldeneye” 3000
“DoS hulk” 3000
“DoS slowhttptest” 3000
“DoS slowloris” 3000
“FTP-PATATOR” 3000
“Portscan” 3000
“SSH-PATATOR” 3000
“Webattack bruteforce” 1500
“Webattack XSS” 500
Total 30,800

Figure 11 represents the classifier outcomes of the MDRRAD-DLM approach under the CIC-IDS2017 dataset. Figure 11a, b shows the confusion matrices with accurate classification and recognition of all class labels under 70% TRAPHA and 30% TESPHA. Figure 11c exhibits the PR analysis, signifying superior outcomes over each class. Besides, Fig. 11d illustrates the ROC values, indicative of efficient findings with greater ROC values for the dissimilar classes.

Fig. 11. CIC-IDS2017 dataset: (a, b) 70% TRAPHA and 30% TESPHA; (c, d) PR and ROC curves.

Table 7 and Fig. 12 present the attack detection of the MDRRAD-DLM approach on the CIC-IDS2017 dataset. The outcomes state that the MDRRAD-DLM approach correctly classified each dissimilar class label. Based on 70% TRAPHA, the MDRRAD-DLM approach attains an average accuracy, precision, recall, F-score, and MCC of 99.41%, 95.88%, 94.51%, 95.11%, and 94.83%, respectively. Furthermore, based on 30% TESPHA, the MDRRAD-DLM technique achieves an average accuracy, precision, recall, F-score, and MCC of 99.37%, 96.13%, 94.02%, 94.85%, and 94.62%, respectively.

Table 7.

Attack detection of MDRRAD-DLM technique under the CIC-IDS2017 dataset.

Classes Accuracy Precision Recall F-score MCC
TRAPHA (70%)
 Benign 99.26 96.19 96.19 96.19 95.77
 Bot 99.42 95.06 94.91 94.99 94.68
 DDoS 99.42 96.76 97.35 97.06 96.73
 DoS goldeneye 99.49 96.72 98.06 97.39 97.10
 DoS hulk 99.36 96.81 96.58 96.70 96.34
 DoS slowhttptest 99.35 96.20 97.15 96.67 96.31
 DoS slowloris 99.42 97.22 96.85 97.03 96.71
 FTP-PATATOR 99.36 96.58 96.91 96.75 96.39
 Portscan 99.36 96.12 97.35 96.73 96.37
 SSH-PATATOR 99.55 97.44 97.90 97.67 97.42
 Webattack bruteforce 99.47 95.50 93.40 94.44 94.17
 Webattack XSS 99.43 89.93 71.51 79.67 79.92
 Average 99.41 95.88 94.51 95.11 94.83
TESPHA (30%)
 Benign 99.33 95.86 97.34 96.59 96.22
 Bot 99.30 93.52 95.02 94.26 93.89
 DDoS 99.46 96.54 97.85 97.19 96.89
 DoS goldeneye 99.36 95.92 97.53 96.72 96.37
 DoS hulk 99.48 96.90 97.76 97.33 97.04
 DoS slowhttptest 99.43 96.98 97.09 97.03 96.71
 DoS slowloris 99.49 97.89 96.92 97.40 97.12
 FTP-PATATOR 99.22 95.79 96.22 96.00 95.57
 Portscan 99.26 95.66 96.73 96.19 95.79
 SSH-PATATOR 99.43 96.90 97.23 97.06 96.75
 Webattack bruteforce 99.36 95.19 91.63 93.38 93.06
 Webattack XSS 99.37 96.46 66.87 78.99 80.04
 Average 99.37 96.13 94.02 94.85 94.62

Fig. 12. Average of MDRRAD-DLM technique on CIC-IDS2017 dataset.

Figure 13 illustrates the TRAN accuracy and VALN accuracy values of the MDRRAD-DLM approach under the CIC-IDS2017 dataset. The accuracy values are calculated over a period of 0–25 epochs. The figure highlights that both accuracy values demonstrate growing tendencies, which indicates the ability of the MDRRAD-DLM approach to reach a maximal outcome over numerous iterations. Simultaneously, both curves remain close across the epochs, which specifies less overfitting and shows the better result of the MDRRAD-DLM method.

Fig. 13. Accuracy analysis of MDRRAD-DLM technique on the CIC-IDS2017 dataset.

Figure 14 shows the TRANLOS and VALNLOS graphs of the MDRRAD-DLM technique on the CIC-IDS2017 dataset. The loss values are computed throughout 0–25 epochs. Both values establish decreasing tendencies, which indicates the MDRRAD-DLM technique’s capacity to balance the trade-off between fitting and generalization. The continual fall in loss values further guarantees the optimal performance of the MDRRAD-DLM model.

Fig. 14. Loss graph of MDRRAD-DLM approach on the CIC-IDS2017 dataset.

Table 8 and Fig. 15 compare the results of the MDRRAD-DLM methodology on the CIC-IDS2017 dataset with existing techniques. The results highlight that the Deep Q-Learning, Deep RL, 2DQN, RF, PCA, AE, and HBOS models report worse performance. In contrast, the proposed MDRRAD-DLM method reports maximal outcomes with a high accuracy of 99.41%, precision of 95.88%, recall of 94.51%, and F-score of 95.11%, respectively.

Table 8.

Comparative study of the MDRRAD-DLM methodology under the CIC-IDS2017 dataset.

CIC-IDS2017 dataset
Approach Accuracy Precision Recall F-score
Deep Q-learning 92.91 91.37 89.95 89.48
Deep RL 96.84 92.50 90.80 92.45
2DQN model 96.77 94.79 93.15 95.00
RF 98.67 92.49 92.46 90.73
PCA method 97.30 91.07 92.80 90.00
AE 92.17 92.75 90.61 92.63
HBOS model 97.17 94.16 91.48 90.62
MDRRAD-DLM 99.41 95.88 94.51 95.11

Fig. 15. Comparative study of the MDRRAD-DLM methodology under the CIC-IDS2017 dataset.

Table 9 and Fig. 16 indicate the CT evaluation of the MDRRAD-DLM approach with existing techniques. The comparison of CT on the CIC-IDS2017 dataset shows that the MDRRAD-DLM approach is the fastest among all evaluated methods. The MDRRAD-DLM technique completes its process in just 4.16 s, significantly outperforming Deep Q-Learning, which requires 22.36 s, and Deep RL, which takes 9.72 s. The 2DQN model finishes in 8.34 s, while RF and PCA methods require 18.26 s and 16.54 s, respectively. The AE model performs relatively faster at 6.75 s, and the HBOS model completes in 12.30 s. These outcomes highlight the superior efficiency of the MDRRAD-DLM technique in terms of CT, enabling quicker processing without compromising detection capabilities.

Table 9.

CT evaluation of the MDRRAD-DLM approach with existing techniques under the CIC-IDS2017 dataset.

CIC-IDS2017 dataset
Approach CT (sec)
Deep Q-learning 22.36
Deep RL 9.72
2DQN model 8.34
RF 18.26
PCA method 16.54
AE 6.75
HBOS model 12.30
MDRRAD-DLM 4.16

Fig. 16. CT evaluation of the MDRRAD-DLM approach with existing techniques under the CIC-IDS2017 dataset.

The ablation study of the MDRRAD-DLM model is illustrated in Table 10 and Fig. 17. The MDRRAD-DLM model achieved an accuracy of 99.41%, surpassing the other configurations, which reached 97.00%, 97.89%, and 98.62%. The precision of the MDRRAD-DLM approach was 95.88%, higher than the comparative values of 93.69%, 94.45%, and 95.21%. For recall, the model recorded 94.51%, outperforming the others, which achieved 92.15%, 92.87%, and 93.69%. Additionally, the F-score reached 95.11%, while the alternative approaches recorded 92.89%, 93.68%, and 94.34%. These results highlight the superior capability of the MDRRAD-DLM method in detecting and classifying threats within the CIC-IDS2017 dataset.

Table 10.

Ablation study results comparing the MDRRAD-DLM method with existing techniques.

CIC-IDS2017 dataset
Approach Accuracy Precision Recall F-score
POA 97.00 93.69 92.15 92.89
EHOA 97.89 94.45 92.87 93.68
TCN-MHA-Bi-GRU 98.62 95.21 93.69 94.34
MDRRAD-DLM 99.41 95.88 94.51 95.11

Fig. 17. Ablation study results comparing the MDRRAD-DLM method with existing techniques.

Conclusion

In this study, the MDRRAD-DLM approach for real-world IoT applications is proposed. The aim is to propose effective detection and mitigation strategies for DDoS attacks. Initially, the data preprocessing phase applies Z-score normalization to transform the input data into a standardized format. Furthermore, the PO technique is employed for feature selection to select the significant and relevant features from the input data. Moreover, the TCN-MHA-Bi-GRU model is implemented for the attack classification process. Finally, the EHO model optimally tunes the hyperparameters of the TCN-MHA-Bi-GRU technique, resulting in higher classification performance. The efficiency of the MDRRAD-DLM approach is examined under the NSLKDD and CIC-IDS2017 datasets. The experimental validation of the MDRRAD-DLM approach portrayed superior accuracy values of 99.14% and 99.41% over the two datasets. The limitations of the MDRRAD-DLM approach comprise reliance on benchmark datasets, which may not fully represent real-world attack complexities or evolving threat landscapes. Due to unseen traffic patterns, the performance could vary when deployed in diverse and dynamic environments. While high accuracy was achieved, detecting low-frequency or novel attacks remains challenging. The computational requirements, although optimized, may still restrict deployment on ultra-constrained devices. Additionally, interpretability in complex scenarios may not be sufficient for non-expert users. Future research should improve adaptability to new threats, reduce false positives in highly imbalanced data, enhance lightweight deployment for edge devices, and expand evaluation using real-time traffic from multiple domains.

Acknowledgments

The authors extend their appreciation to Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R755), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Author contributions

Adwan A. Alanazi: Conceptualization, methodology development, experiment, formal analysis, investigation, writing. Ashrf Althbiti: Formal analysis, investigation, validation, visualization, writing. Sara Abdelwahab Ghorashi: Formal analysis, review and editing. Fathea M.O. Birkea: Methodology, investigation. Roosvel Soto-Diaz: Involved since the beginning in methodology development, formal analysis, and review and editing. José Escorcia-Gutierrez: Conceptualization, methodology development, investigation, supervision, review and editing. All authors have read and agreed to the published version of the manuscript.

Data availability

The data supporting this study’s findings are openly available in the Kaggle repository at https://www.kaggle.com/datasets/hassan06/nslkdd and https://www.kaggle.com/datasets/chethuhn/network-intrusion-dataset, reference numbers38,39.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Roosvel Soto-Diaz, Email: roosvel.soto@unisimon.edu.co.

José Escorcia-Gutierrez, Email: jescorci56@cuc.edu.co.

References

  • 1. Zhang, J., Yu, P., Qi, L., Liu, S., Zhang, H. & Zhang, J. FLDDoS: DDoS attack detection model based on federated learning. In 2021 IEEE 20th International Conference on Trust, Security, and Privacy in Computing and Communications (TrustCom) (pp. 635–642). IEEE (2021).
  • 2. Tian, Q., Guang, C., Wenchao, C. & Si, W. A lightweight residual networks framework for DDoS attack classification based on federated learning. In IEEE INFOCOM 2021 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) (pp. 1–6). IEEE (2021).
  • 3. Neto, E. C. P., Dadkhah, S. & Ghorbani, A. A. Collaborative DDoS detection in distributed multi-tenant IoT using federated learning. In 2022 19th Annual International Conference on Privacy, Security & Trust (PST) (pp. 1–10). IEEE (2022).
  • 4. Liu, Z., Guo, C., Liu, D. & Yin, X. An asynchronous federated learning arbitration model for low-rate DDoS attack detection. IEEE Access 11, 18448–18460 (2023).
  • 5. Lv, D., Cheng, X., Zhang, J., Zhang, W., Zhao, W. & Xu, H. DDoS attack detection based on CNN and federated learning. In 2021 Ninth International Conference on Advanced Cloud and Big Data (CBD) (pp. 236–241). IEEE (2022).
  • 6. Pourahmadi, V., Alameddine, H. A., Salahuddin, M. A. & Boutaba, R. Spotting anomalies at the edge: Outlier exposure-based cross-silo federated learning for DDoS detection. IEEE Trans. Dependable Sec. Comput. 20(5), 4002–4015 (2022).
  • 7. Nguyen, T. D., Rieger, P., Miettinen, M. & Sadeghi, A. R. Poisoning attacks on federated learning-based IoT intrusion detection system. In Proc. Workshop Decentralized IoT Syst. Secur. (DISS) (Vol. 79) (2020).
  • 8. Mothukuri, V. et al. Federated-learning-based anomaly detection for IoT security attacks. IEEE Internet Things J. 9(4), 2545–2554 (2021).
  • 9. Agrawal, S. et al. Federated learning for intrusion detection system: Concepts, challenges and future directions. Comput. Commun. 195, 346–361 (2022).
  • 10. Isma’ila, U. A., Danyaro, K. U., Hassan, M. F., Muazu, A. A. & Liew, M. S. An IoT device-level vulnerability control model through federated detection. J. Intell. Syst. Internet Things 12(2) (2024).
  • 11. Rahmati, M. Federated learning-driven cybersecurity framework for IoT networks with privacy-preserving and real-time threat detection capabilities. arXiv preprint arXiv:2502.10599 (2025).
  • 12. Vemulapalli, L. & Sekhar, P. C. A customized temporal federated learning through adversarial networks for cyber attack detection in IoT. J. Robot. Control (JRC) 6(1), 366–384 (2025).
  • 13. Ragab, M. et al. Advanced artificial intelligence with federated learning framework for privacy-preserving cyberthreat detection in IoT-assisted sustainable smart cities. Sci. Rep. 15(1), 4470 (2025).
  • 14. Jianping, W., Guangqiu, Q., Chunming, W., Weiwei, J. & Jiahe, J. Federated learning for network attack detection using attention-based graph neural networks. Sci. Rep. 14(1), 19088 (2024).
  • 15. Subramanian, G. & Chinnadurai, M. Hybrid quantum enhanced federated learning for cyber attack detection. Sci. Rep. 14(1), 32038 (2024).
  • 16. Bukhari, S. M. S. et al. Secure and privacy-preserving intrusion detection in wireless sensor networks: Federated learning with SCNN-Bi-LSTM for enhanced reliability. Ad Hoc Netw. 155, 103407 (2024).
  • 17. Javeed, D., Saeed, M. S., Adil, M., Kumar, P. & Jolfaei, A. A federated learning-based zero trust intrusion detection system for Internet of Things. Ad Hoc Netw. 162, 103540 (2024).
  • 18. Lin, W. T., Chen, G. & Zhou, X. Privacy-preserving federated learning for detecting false data injection attacks on power systems. Electr. Power Syst. Res. 229, 110150 (2024).
  • 19. Nandanwar, H. & Katarya, R. Securing Industry 5.0: An explainable deep learning model for intrusion detection in cyber-physical systems. Comput. Electr. Eng. 123, 110161 (2025).
  • 20. Nandanwar, H. & Katarya, R. Deep learning enabled intrusion detection system for Industrial IoT environment. Expert Syst. Appl. 249, 123808 (2024).
  • 21. Nandanwar, H. & Katarya, R. Privacy-preserving data sharing in blockchain-enabled IoT healthcare management system. Comput. J. bxaf065 (2025).
  • 22. Nandanwar, H. & Katarya, R. TL-BILSTM IoT: Transfer learning model for prediction of intrusion detection system in IoT environment. Int. J. Inf. Secur. 23(2), 1251–1277 (2024).
  • 23. Saheed, Y. K. & Chukwuere, J. E. CPS-IIoT-P2attention: Explainable privacy-preserving with scaled dot-product attention in cyber physical system-industrial IoT network. IEEE Access (2025).
  • 24. Kauhsik, B., Nandanwar, H. & Katarya, R. IoT security: A deep learning-based approach for intrusion detection and prevention. In 2023 International Conference on Evolutionary Algorithms and Soft Computing Techniques (EASCT) (pp. 1–7). IEEE (2023).
  • 25. Saheed, Y. K., Omole, A. I. & Sabit, M. O. GA-mADAM-IIoT: A new lightweight threats detection in the industrial IoT via genetic algorithm with attention mechanism and LSTM on multivariate time series sensor data. Sens. Int. 6, 100297 (2025).
  • 26. Nandanwar, H. & Katarya, R. A systematic literature review: Approach toward blockchain future research trends. In 2023 International Conference on Device Intelligence, Computing and Communication Technologies (DICCT) (pp. 259–264). IEEE (2023).
  • 27. Saheed, Y. K. & Misra, S. CPS-IoT-PPDNN: A new explainable privacy preserving DNN for resilient anomaly detection in cyber-physical systems-enabled IoT networks. Chaos Solitons Fractals 191, 115939 (2025).
  • 28. Saheed, Y. K. & Chukwuere, J. E. XAIEnsembleTL-IoV: A new explainable artificial intelligence ensemble transfer learning for zero-day botnet attack detection in the internet of vehicles. Results Eng. 24, 103171 (2024).
  • 29. Alhashmi, A., Idwaib, H., Avci, S. A., Rahebi, J. & Ghadami, R. Distributed denial-of-service (DDoS) on the smart grids based on VGG19 deep neural network and Harris Hawks optimization algorithm. Sci. Rep. 15(1), 1–18 (2025).
  • 30. Saheed, Y. K., Misra, S. & Chockalingam, S. Autoencoder via DCNN and LSTM models for intrusion detection in industrial control systems of critical infrastructures. In 2023 IEEE/ACM 4th International Workshop on Engineering and Cybersecurity of Critical Systems (EnCyCriS) (pp. 9–16). IEEE (2023).
  • 31. Berríos, S., Garcia, S., Hermosilla, P. & Allende-Cid, H. A machine-learning-based approach for the detection and mitigation of distributed denial-of-service attacks in internet of things environments. Appl. Sci. 15(11), 6012 (2025).
  • 32. Saheed, Y. K., Abdulganiyu, O. H. & Ait Tchakoucht, T. Modified genetic algorithm and fine-tuned long short-term memory network for intrusion detection in the internet of things networks with edge capabilities. Appl. Soft Comput. 155, 111434 (2024).
  • 33. Pandey, V. K. et al. Enhancing intrusion detection in wireless sensor networks using a Tabu search based optimized random forest. Sci. Rep. 15(1), 1–21 (2025).
  • 34. Wang, X., Yang, X., Zhou, J. & Ren, H. Z-score-based improved TOPSIS method and its implementation for elderly people health examination results evaluation: A statistic case study in Harbin, China. Health Soc. Care Commun. 2025(1), 5974609 (2025).
  • 35. Yang, Y., Fu, M., Zhou, X., Jia, C. & Wei, P. A multi-strategy parrot optimization algorithm and its application. Biomimetics 10(3), 153 (2025).
  • 36. Essaid, B., Kheddar, H., Batel, N. & Chowdhury, M. E. Deep learning-based coding strategy for improved cochlear implant speech perception in noisy environments. IEEE Access (2025).
  • 37. Ouertani, M. W., Oueslati, R. & Manita, G. Improved binary elk herd optimizer with fitness balance distance for feature selection using gene expression data.
  • 38. https://www.kaggle.com/datasets/hassan06/nslkdd
  • 39. Finistrella, S., Mariani, S. & Zambonelli, F. Multi-agent reinforcement learning for cybersecurity: Classification and survey. Intell. Syst. Appl. 200495 (2025).
  • 40. Jemili, F., Jouini, K. & Korbaa, O. Detecting unknown intrusions from large heterogeneous data through ensemble learning. Intell. Syst. Appl. 25, 200465 (2025).
  • 41. Srivastav, S., Shukla, A. K., Kumar, S. & Muhuri, P. K. HYRIDE: HYbrid and robust intrusion detection approach for enhancing cybersecurity in Industry 4.0. Internet Things 30, 101492 (2025).
  • 42. https://www.kaggle.com/datasets/chethuhn/network-intrusion-dataset
