Abstract
With the fast development of Internet of Things (IoT) devices, there is an urgent need to actively understand the real-time cybersecurity risks posed to them. In the ever-growing field of IoT environments, Distributed Denial of Service (DDoS) threats pose a critical challenge, compromising the reliability of these systems. These attacks are frequently launched in real-time to take down e-commerce platforms, government websites, and banking systems. To deal with DDoS attacks, there is increased interest in decentralized learning methods, especially federated learning (FL), in which deep learning (DL) models are cooperatively trained on dispersed summaries of cyber threats. FL resolves the data privacy problem successfully: it forms a global model by allowing multiple participants with local data to train a shared model in a distributed way, exchanging training outcomes without exchanging sample data. This paper presents a Metaheuristic-Driven Dimensionality Reduction for Robust Attack Defense Using Deep Learning Models (MDRRAD-DLM) in real-world IoT applications. The aim is to propose effective detection and mitigation strategies for DDoS attacks. The data preprocessing phase initially applies Z-score normalization to transform the input data into a standardized format. Furthermore, the parrot optimization (PO) technique is employed for the feature selection process to select the significant and relevant features from input data. Moreover, the temporal convolutional network and bi-directional gated recurrent unit with multi-head attention (TCN-MHA-Bi-GRU) technique is implemented for the attack classification process. Finally, the elk herd optimizer (EHO) technique fine-tunes the parameter selection of the TCN-MHA-Bi-GRU technique. The efficiency of the MDRRAD-DLM approach is examined under the NSLKDD and CIC-IDS2017 datasets.
The experimental validation of the MDRRAD-DLM approach demonstrated superior accuracy values of 99.14% and 99.41% on the two datasets.
Keywords: Dimensionality reduction, DDoS defense, Deep learning, IoT, Elk herd optimizer, Data pre-processing, Real-world application
Subject terms: Computer science, Information technology
Introduction
The IoT represents the evolution of the digital landscape, extending beyond conventional devices such as smartphones and computers to produce a connected web of daily objects1. These objects, embedded with software, sensors, and other technology, seamlessly interact and interchange data with other networks and gadgets through the Internet. IoT has developed into a cornerstone of 21st-century digital modernization2. From smart thermostats and wearable health monitors to intelligent traffic systems and advanced manufacturing devices, the integration of IoT is rapidly expanding across diverse sectors. However, this growth also presents various vulnerabilities3. For example, many IoT devices suffer from weak authentication mechanisms, unencrypted communications, and outdated firmware, which increase their susceptibility to attacks. Among these threats, DDoS attacks are especially hazardous due to IoT devices’ widespread connectivity and limited security features4. These attacks involve overwhelming a targeted system, such as an IoT device or website, with a flood of Internet traffic, rendering it inactive. Figure 1 depicts the general structure of a DDoS attack.
Fig. 1.
Structure of DDoS attack.
With the fast evolution of 5G, system security concerns have become more severe, and DDoS attacks are especially harmful5. DDoS attacks in IoT environments include application-layer (e.g., HTTP floods), network-layer (e.g., SYN, UDP floods), and botnet-based attacks (e.g., Mirai), each targeting different system components to disrupt services. Because of the broad range of DDoS attacks, the threat data gathered by a single consumer is insufficient, and a detection method trained on such a dataset inherits its particular limitations. At the same time, many organizations are unwilling to make every data flow in their network domain public6. With the protection of users’ privacy making dataset sharing a limiting factor, how to employ datasets gathered from various domains without compromising data confidentiality, and thereby detect DDoS attack flows broadly, is a crucial concern to be resolved7. Understanding and mitigating DDoS risks is essential to guarantee service and cybersecurity stability in real-time settings.
Thus, a precise intrusion detection system (IDS) is required to diminish various kinds of threats effectively. An IDS is a significant element in the security of a network. The use of FL in cybersecurity for IDS has been investigated in earlier studies. Motivated by the above concerns, Google proposed the concept of FL for data confidentiality preservation and on-device learning. FL permits gadgets to learn a collaborative model without sharing data with a centralized server8. In other words, DL and machine learning (ML) models are trained through various servers and devices with decentralized data over many iterations. FL is an iterative process in which, in every round, the entire ML/DL technique is enhanced9. Additionally, it aids in reducing the computational cost of the central processing servers, keeps data confidential, enhances bandwidth usage, and deals better with an overflow of numerous data communications10.
This paper presents a Metaheuristic-Driven Dimensionality Reduction for Robust Attack Defense Using Deep Learning Models (MDRRAD-DLM) in real-world IoT applications. The aim is to propose effective detection and mitigation strategies for DDoS attacks. The data preprocessing phase initially applies Z-score normalization to transform the input data into a standardized format. Furthermore, the parrot optimization (PO) technique is employed for the feature selection process to select the significant and relevant features from input data. Moreover, the temporal convolutional network and bi-directional gated recurrent unit with multi-head attention (TCN-MHA-Bi-GRU) technique is implemented for the attack classification process. Finally, the elk herd optimizer (EHO) technique fine-tunes the parameter selection of the TCN-MHA-Bi-GRU technique. The efficiency of the MDRRAD-DLM approach is examined under NSLKDD and CIC-IDS2017 datasets. The significant contribution of the MDRRAD-DLM approach is listed below.
The MDRRAD-DLM model applies Z-score normalization to standardize and scale input data, improving the quality and consistency of features for enhanced learning. This preprocessing step assists in reducing variability and ensuring that the data is well-prepared for subsequent feature selection and classification stages, improving the overall model performance.
The MDRRAD-DLM approach utilizes PO to choose the most relevant features, reducing dimensionality and improving computational efficiency. This optimization technique improves the accuracy of the classification process by concentrating on crucial data attributes, which contributes to better overall model performance and robustness.
The MDRRAD-DLM technique integrates a hybrid classifier integrating TCN and MHA-Bi-GRU, effectively capturing temporal dependencies and contextual data. This fusion improves the model’s capability of analyzing intrinsic sequential data, resulting in improved classification accuracy and robustness in handling varied input patterns.
The MDRRAD-DLM methodology implements the EHO model for optimized tuning of hyperparameters, which significantly improves classification performance. This optimization technique improves the method’s convergence speed and accuracy, ensuring more reliable and efficient predictions.
The MDRRAD-DLM method offers a novel and effective solution by integrating PO-based feature selection with a hybrid TCN-MHA-Bi-GRU classifier optimized through the EHO. It significantly enhances accuracy and computational efficiency in handling complex sequential data classification tasks. The incorporated use of advanced optimization and hybrid DL models uniquely improves predictive performance.
Literature review
Rahmati11 proposes an FL-driven cybersecurity framework for IoT settings. The framework allows decentralized data processing by training models locally on edge gadgets, guaranteeing data confidentiality. The presented framework employs a recurrent neural network (RNN) for anomaly detection, optimized for resource-constrained IoT systems. Vemulapalli and Sekhar12 developed the Customized Temporal FL through Adversarial Networks (CusTFL-AN) framework, which integrates generative adversarial networks (GANs) and temporal convolutional networks (TCNs) for personalized and robust threat recognition. CusTFL-AN allows clients to train local models while preserving data confidentiality by creating synthetic datasets employing GANs and combining these at a central server, thus reducing the risks connected with direct data sharing. Ragab et al.13 introduce an Advanced AI with an FL infrastructure for the privacy-preserving cyberattack detection (AAIFLF-PPCD) method in IoT. The projected method intends to guarantee scalable and robust cyber-attack recognition while maintaining the confidentiality of IoT users. Primarily, the proposed method employs a Harris hawk optimizer (HHO)-based FS to recognize the most relevant characteristics of the IoT. Subsequently, the SSAE is utilized to identify cyber-attacks. Jianping et al.14 developed an attention-based GNN to identify cross-department and cross-level system threats. It allows collective training of the model while safeguarding data confidentiality on distributed gadgets. Structuring system traffic data in chronological order and constructing a graph framework based on log density improves the precision of network threat recognition. An attention mechanism (AM) and the structure of the FedGAT method are employed to assess the connectivity among nodes. Subramanian and Chinnadurai15 intend to develop an innovative solution to tackle these restrictions through FL.
Their approach advances the centralized method by combining attention networks and presents a quantum-inspired federated averaging optimizer for cyber-threat identification. The projected method employs a hierarchical model aggregation process.
Bukhari et al.16 developed a new Stacked CNN and BiLSTM (SCNN-BiLSTM) technique for IDS in WSN. The FL-based SCNN-BiLSTM methodology is unique in its design, enabling various sensor nodes to collectively train a central global model without exposing confidential data, thus addressing confidentiality concerns. Javeed et al.17 present a horizontal FL model that combines BiLSTM and CNN for efficient IDS. Particularly, the CNN is employed as a spatial feature extractor, permitting the model to recognize local patterns and significant potential intrusions, while the Bi-LSTM component captures temporal dependencies and learns sequential patterns in the data. In18, an innovative FDIA detection model relying on FL is proposed, creating a global detection technique. In the projected model, the state owners perform an FL model employing their own data, which avoids huge data transmission and safeguards data confidentiality. Nandanwar and Katarya19 proposed Cyber-Sentinet, a DL-based IDS with Shapley Additive Explanations (SHAP) for interpretable and accurate detection of cyber-attacks in cyber-physical systems (CPS) within Industrial IoT (IIoT) environments. Nandanwar and Katarya20 proposed a robust DL method, AttackNet, based on an adaptive convolutional neural network–GRU (CNN-GRU) model, for the efficient and accurate detection and classification of botnet attacks in Industrial IoT (IIoT) environments. Nandanwar and Katarya21 proposed a blockchain (BC)-based decentralized application (DApp) integrated with IoT and non-interactive zero-knowledge proof (NIZKP) to securely manage healthcare data, using Ethereum smart contracts, BC data storage, and the interplanetary file system (IPFS), while addressing security threats through an IDS. Nandanwar and Katarya22 developed a hybrid DL model using CNN–bidirectional long short-term memory (CNN-BiLSTM) with transfer learning (TL) for accurate detection and classification of Mirai and BASHLITE botnet attacks in IoT environments.
Saheed and Chukwuere23 proposed a CPS-IIoT attack detection model using the Pearson correlation coefficient, agglomerative clustering, BiLSTM with scaled dot-product attention, and SHAP to enhance accuracy, privacy, and interpretability across diverse IIoT environments.
Kauhsik, Nandanwar, and Katarya24 identified existing IoT security solutions gaps and explored ML and DL techniques. Saheed, Omole, and Sabit25 proposed a genetic algorithm (GA) with attention mechanism (AM) and modified Adam-optimized LSTM (GA-mADAM-IIoT), a GmADAM-LSTM-based IDS with an AM and SHAP to detect threats in IIoT using real-world datasets. Nandanwar and Katarya26 provided an overview of BC architecture, components, security mechanisms, and applications across healthcare, IoT, smart grid, governance, defence, and military while analyzing security risks and countermeasures. Saheed and Misra27 proposed an explainable, privacy-preserving deep neural network (DNN) technique using SHAP for accurate and interpretable anomaly detection in CPS-IoT networks. Saheed and Chukwuere28 presented an explainable AI (XAI) ensemble TL model using SHAP and optimized DL methods to detect zero-day botnet attacks in the Internet of Vehicles (IoV), improving transparency, accuracy, and efficiency with limited labelled data. Alhashmi et al.29 proposed a DDoS attack detection method for smart grids using a DNN based on VGG19 integrated with the Harris Hawks optimization (HHO) method to improve real-time detection accuracy and efficiency. Saheed, Misra, and Chockalingam30 proposed an IDS using autoencoder (AE)-based feature reduction, deep CNN (DCNN), and LSTM to detect cyber-attacks in industrial control systems (ICS) without prior network knowledge, validated on ICS and gas pipeline datasets. Berríos et al.31 proposed an ML technique using random forest (RF), extreme gradient boosting (XGBoost), and LSTM models to detect and mitigate DDoS attacks in IoT and cloud environments. Saheed, Abdulganiyu, and Ait Tchakoucht32 proposed IoT-defender, a lightweight IDS integrating a modified GA (MGA) for feature selection and an LSTM network optimized via GA to detect cyberattacks in IoT networks within an edge computing (EC) framework. 
Pandey et al.33 proposed an enhanced IDS for wireless sensor networks (WSNs) by integrating tabu search (TS) optimization with an RF classifier to tune hyperparameters and improve attack detection performance automatically.
Despite crucial advances in FL, DL, and optimization-based IDS models for IoT, IIoT, CPS, and smart grids, several limitations still exist. Models show reduced detection accuracy because labelled data is often scarce or imbalanced. The computational complexity of deep models and the resource constraints of IoT and edge devices challenge their deployment. Furthermore, several techniques lack comprehensive interpretability, limiting trust and hindering effective threat analysis. Privacy concerns remain critical due to data sharing among distributed devices. The research gap lies in developing lightweight, privacy-preserving, and explainable IDS frameworks that effectively handle data scarcity and heterogeneity while optimizing model parameters automatically and ensuring scalability across diverse IoT ecosystems.
Proposed methods
This paper proposes the MDRRAD-DLM approach in real-world IoT applications. The primary purpose of the MDRRAD-DLM approach is to propose effective detection and mitigation strategies for DDoS attacks. It includes data preprocessing, feature selection of subsets, hybrid attack classification, and hyperparameter tuning. Figure 2 represents the entire procedure of the MDRRAD-DLM model.
Fig. 2.
Overall flow of MDRRAD-DLM approach.
Data preprocessing
Initially, Z-score normalization is used to transform input data into a standardized format. Z-score normalization, otherwise named standardization, is a statistical approach to normalizing data by transforming it to the standard normal distribution with a mean of 0 and a standard deviation of 1 [34]. This technique is chosen for its ability to standardize features by centring them around a mean of zero with a standard deviation of one. This transformation ensures that all features contribute equally to the learning process, preventing dominance by larger-scale features. It handles data variability well and enhances the convergence speed and stability of gradient-based learning algorithms, specifically in DL and optimization contexts. The method is well suited to models like neural networks, where uniformity of feature distributions improves training performance. Additionally, it preserves the shape of the original data distribution, making it an ideal choice in applications where accuracy is critical.
During DDoS attack detection, it assists in preprocessing network traffic data by scaling dissimilar features to a common range, enhancing the performance of ML methods. It guarantees that features with larger numeric ranges do not dominate those with smaller ranges and permits a fair weighting distribution in anomaly detection. The technique is valuable in processing heterogeneous or skewed data, which is common in IoT-based systems. By using it, detection methods can attain improved precision and faster convergence. It further assists in decreasing the influence of outliers, making the technique more robust to noisy data.
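As a concrete illustration, the standardization step can be sketched as follows; this is a minimal NumPy sketch, and the function name and toy feature values are illustrative rather than taken from the paper:

```python
import numpy as np

def z_score_normalize(X):
    """Standardize each feature (column) to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard constant features against division by zero
    return (X - mu) / sigma

# Example: two traffic features on very different scales
X = np.array([[100.0, 0.1],
              [200.0, 0.2],
              [300.0, 0.3]])
Z = z_score_normalize(X)
```

After this transformation, both columns have mean 0 and standard deviation 1, so neither feature dominates distance- or gradient-based learning.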
PO-based feature selection process
Furthermore, the PO technique is employed for the FS process35. This technique is chosen due to its efficiency in balancing exploration and exploitation in the search space. Inspired by parrots’ intelligent foraging and learning behaviour, PO efficiently detects the most relevant features while discarding redundant or noisy data. Compared to conventional methods like recursive feature elimination or mutual information, PO presents higher adaptability and global search capability, avoiding local optima. Its population-based strategy maintains diverse candidate solutions, enhancing the robustness of the chosen feature subset. This results in reduced dimensionality, faster model training, and improved classification performance. PO is well suited to complex network traffic data, where feature relevance can be non-linear and interdependent. The PO model mainly involves four behaviours, which are given below.
Foraging behavior
Observing the position of food or of its owner, the parrot estimates the food position and then flies towards it. The parrot’s movement is represented by the following equations:

X_i^(t+1) = (X_i^t − X_best) · Levy(dim) + rand(0, 1) · (1 − t/Max_iter)^(2t/Max_iter) · X_mean^t   (1)

X_mean^t = (1/N) · Σ_{k=1}^{N} X_k^t   (2)

Levy(dim) = μ · σ / |ν|^(1/γ),  μ, ν ~ N(0, dim),  γ = 1.5   (3)

Here, X_i^t specifies the existing location and X_i^(t+1) the upgraded location; Max_iter represents the maximal iteration count; X_mean^t is the average location of the existing population, described in Eq. (2); Levy(dim) signifies the Levy distribution stated in Eq. (3), which aids in representing the parrot’s flight; (1 − t/Max_iter)^(2t/Max_iter) is employed to define the parrot’s flight; X_best is the existing optimum location; and t is the existing iteration. The term (X_i^t − X_best) specifies movement relative to the owner’s location, while rand(0, 1) · (1 − t/Max_iter)^(2t/Max_iter) · X_mean^t approximates the food position by monitoring the location of the entire population.
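The foraging move can be sketched as below, assuming the common Mantegna formulation of the Levy flight with γ = 1.5 and the decay term (1 − t/Max_iter)^(2t/Max_iter); function and variable names are illustrative:

```python
import numpy as np
from math import gamma, pi, sin

def levy(dim, beta=1.5):
    """Levy flight step (Mantegna's algorithm), modelling the parrot's flight."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = np.random.normal(0.0, sigma, dim)
    v = np.random.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

def foraging_step(X_i, X_best, X_mean, t, max_iter):
    """One foraging move: drift relative to the best location (owner/food)
    plus a population-informed term that decays over iterations."""
    dim = X_i.shape[0]
    decay = (1 - t / max_iter) ** (2 * t / max_iter)
    return (X_i - X_best) * levy(dim) + np.random.rand() * decay * X_mean
```

In a full optimizer loop, one of the four behaviours would be applied to each parrot per iteration, followed by boundary clipping and fitness evaluation.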
Staying behavior
Modelling the parrot’s behaviour of remaining arbitrarily on diverse parts of the owner’s body introduces randomness into the search process:

X_i^(t+1) = X_i^t + X_best · Levy(dim) + rand(0, 1) · ones(1, dim)   (4)

Here, X_best · Levy(dim) models the flight toward the owner, and rand(0, 1) · ones(1, dim) represents an arbitrary stop on a specific part of the owner’s body.
Communicating behavior
Parrots are naturally sociable and frequently interact within their flock, either flying toward the centre of the group or communicating while staying away from it. The PO model presumes these two behaviours occur with equal probability and uses the average location of the existing population:

X_i^(t+1) = 0.2 · rand(0, 1) · (1 − t/Max_iter) · (X_i^t − X_mean^t),   P ≤ 0.5
X_i^(t+1) = 0.2 · rand(0, 1) · exp(−t / (rand(0, 1) · Max_iter)),       P > 0.5   (5)

The first case specifies an individual joining the parrot group for interaction, whereas the second specifies an individual leaving right after communicating. Either behaviour is selected by generating an arbitrary number P in the range (0, 1).
Fear of strangers’ behavior
Parrots are naturally wary of unfamiliar individuals: they keep away from strangers and fly toward their owners for safety:

X_i^(t+1) = X_i^t + rand(0, 1) · cos(0.5π · t/Max_iter) · (X_best − X_i^t) − cos(rand(0, 1) · π) · (t/Max_iter)^(2/Max_iter) · (X_i^t − X_best)   (6)

Here, rand(0, 1) · cos(0.5π · t/Max_iter) · (X_best − X_i^t) specifies the reorienting process of flying near the owner, and cos(rand(0, 1) · π) · (t/Max_iter)^(2/Max_iter) · (X_i^t − X_best) represents the process of distancing itself from strangers.
The fitness function (FF) reflects both the classification accuracy and the number of chosen features: it maximizes the classification precision while minimizing the size of the selected feature subset. The following FF is applied to evaluate individual solutions, as provided in Eq. (7):

Fitness = α · γ_R + β · (|R| / |C|)   (7)

Here, γ_R epitomizes the classification error rate using the chosen features, determined as the percentage of incorrect classifications relative to the number of classifications made, a value in (0, 1) (1 − γ_R is the classification precision); |R| denotes the selected feature count, and |C| signifies the complete number of features in the original data. The parameters α ∈ [0, 1] and β = 1 − α control the relative importance of classification quality and subset length.
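The fitness evaluation above reduces to a one-line weighted sum; a minimal sketch, assuming the common convention β = 1 − α (the function name and default α are illustrative):

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Feature-selection fitness: weighted sum of the classification error
    and the fraction of features retained (both minimized)."""
    return alpha * error_rate + (1 - alpha) * (n_selected / n_total)

# Example: 10% error with 5 of 10 features kept, alpha = 0.9
score = fs_fitness(0.1, 5, 10, alpha=0.9)
```

A large α prioritizes accuracy; a smaller α presses harder on subset size.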
TCN-MHA-Bi-GRU-based classification process
In this section, the proposed MDRRAD-DLM model implements the TCN-MHA-Bi-GRU technique for the attack classification process. This model is chosen for its superior capability in capturing both short- and long-term dependencies in sequential data. TCNs provide parallel processing and stable gradients, making them more efficient than conventional RNNs for long sequences. Bi-GRUs improve contextual understanding by analyzing input in both forward and backward directions. The integration of MHA additionally strengthens the model by allowing it to concentrate on multiple relevant parts of the sequence simultaneously. This hybrid technique integrates temporal precision, contextual depth, and dynamic attention, outperforming standalone LSTMs, GRUs, or CNNs in complex classification tasks involving time-dependent network traffic data. Figure 3 represents the structure of TCN-MHA-Bi-GRU.
Fig. 3.
Structure of TCN-MHA-Bi-GRU model.
TCN block: In DL, the exploration of TCNs as a powerful replacement for traditional recurrent structures has triggered a paradigm shift36. The underpinning idea of TCNs rests in using temporal convolutions to capture complex patterns and dependencies inside sequential data. The convolutional process is illustrated by:

y_t = f(Σ_{i=0}^{k−1} w_i · x_{t−i} + b)   (8)

where y_t signifies the output at time-step t, f denotes the activation function, w_i characterizes the filter weights, x_{t−i} symbolizes the inputs at previous time-steps, b denotes the bias term, and k signifies the kernel size; together these form the backbone of TCNs. This method allows TCNs to effectively capture temporal nuances and complex dependencies inside sequences, thus providing a new view of information propagation and memory retention in sequence modelling tasks. TCNs are increasingly favoured due to their superior capability in capturing long-term dependencies compared to conventional recurrent models. This enhanced memory retention makes TCNs highly effective for tasks that demand comprehension of extended historical context, enabling efficient and precise processing of complex sequential data.
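The causal convolution underlying a TCN layer can be illustrated with a minimal single-channel sketch without dilation; names are illustrative:

```python
import numpy as np

def causal_conv1d(x, w, b=0.0):
    """Causal 1-D convolution: y_t depends only on x_t, x_{t-1}, ..., x_{t-k+1}.
    Positions before the start of the sequence are treated as zero."""
    k = len(w)
    x_pad = np.concatenate([np.zeros(k - 1), x])  # left-pad to keep the output causal
    return np.array([np.dot(w, x_pad[t:t + k][::-1]) + b
                     for t in range(len(x))])
```

With w = [0, 1], each output equals the previous input (a pure one-step delay), which shows that no future time-step leaks into y_t.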
The separator is inspired by the Conv-TasNet module and estimates a multiplicative function, generally identified as a mask, for each target source in the input signal. The separator utilizes a TCN to estimate these masks effectively. The TCN separation module processes the encoded features from the encoder output and produces masks that separate and enhance the target sources. Using these masks, the method can extract and rebuild individual components from composite signals, guaranteeing efficient and accurate separation.
MHA layer: The attention model permits the method to excel at capturing long- or short-term dependencies, relationships, and context in challenging environments. Attention-based Transformers can meaningfully improve processing by learning what to amplify and what to attend to, mainly in noisy environments or with distorted inputs. The AM is mathematically characterized by Eq. (9):

Attention(Q, K, V) = softmax(Q · K^T / √d_k) · V   (9)

Q, K, and V correspondingly represent the queries, keys, and values. The queries and keys are applied to calculate attention weights, representing the significance of dissimilar sections; the values characterize the processed data that, after applying the attention weights, are aggregated into the final output. Furthermore, d_k denotes the keys’ dimensionality.
MHA, a basic concept in modern neural network (NN) frameworks, extends traditional AMs by simultaneously expanding the model’s capability to handle information through numerous views. By introducing multiple heads, each furnished with different learned linear projections converting queries, keys, and values into various subspaces, the method advances the capability for exploring complex relationships inside the data in parallel. The main formulas governing MHA are:

MultiHead(Q, K, V) = Concat(head_1, …, head_h) · W^O   (10)

where every head_i is calculated as:

head_i = Attention(Q · W_i^Q, K · W_i^K, V · W_i^V)   (11)

Here, the projections are described by the parameter matrices W_i^Q, W_i^K, W_i^V, and W^O. In this methodology, instead of depending on a single attention function with fixed-dimensional inputs, the queries, keys, and values undergo individual linear transformations h times. These transformed inputs are then handled by attention mechanisms, producing output values that summarize a wider range of information. These outputs are then joined together via concatenation and an additional projection, yielding a complete and enriched representation of the input data. This design choice guarantees that the method can effectively harness the benefits of MHA without an exponential increase in computational cost, thus paving the way for more advanced and efficient NN frameworks.
GRU layer: The GRU is a form of RNN structure tailored to address the vanishing-gradient problem of classic RNNs. GRUs are related to LSTM units but have a simpler structure with fewer parameters, making them computationally less expensive. GRUs comprise dual gates: an update gate and a reset gate. The reset gate expresses how much of the preceding information should be forgotten, whereas the update gate adjusts how much of the new information is combined into the cell state. This gating mechanism allows GRUs to successfully capture longer-term dependencies in sequential data.

In a GRU, the update gate z_t, reset gate r_t, candidate hidden layer (HL) h̃_t, and new HL h_t are calculated as shown. The update gate is measured as:

z_t = σ(W_z · x_t + U_z · h_{t−1} + b_z)   (12)

The reset gate is calculated as r_t = σ(W_r · x_t + U_r · h_{t−1} + b_r), the candidate HL is established by h̃_t = tanh(W_h · x_t + U_h · (r_t ⊙ h_{t−1}) + b_h), and the new HL is upgraded based on h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t.

The Bi-GRU expands the concept of the GRU by handling input sequences in the forward and backward directions concurrently. By incorporating dual GRU networks, one handling the input sequence from start to end and another processing it backwards, the Bi-GRU captures dependencies from both past and future contexts. The forward HL h_t^f, reverse HL h_t^b, and final Bi-GRU HL h_t are established as h_t^f = GRU(x_t, h_{t−1}^f), h_t^b = GRU(x_t, h_{t+1}^b), and h_t = [h_t^f ; h_t^b], correspondingly.

These mathematical descriptions summarize the core processing of GRU and Bi-GRU methods, permitting them to proficiently capture dependencies in sequential data by upgrading HLs using contextual and input information from past or future time steps. Bi-GRU methods can use information from either direction to understand and capture composite patterns in sequential data, enhancing performance on various sequential learning tasks.
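A single GRU step and its bidirectional extension can be sketched as follows; the parameter dictionary layout and names are illustrative, and the gate equations follow the standard GRU formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, p):
    """One GRU step: update gate z, reset gate r, candidate state, new state."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])
    h_tilde = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])
    return (1 - z) * h_prev + z * h_tilde

def bi_gru(X, p):
    """Bi-GRU: run the sequence forward and backward with the same cell,
    then concatenate the two hidden states at each time-step."""
    h_f = np.zeros_like(p["bz"])
    h_b = np.zeros_like(p["bz"])
    fwd, bwd = [], []
    for x in X:                 # forward pass
        h_f = gru_cell(x, h_f, p)
        fwd.append(h_f)
    for x in X[::-1]:           # backward pass
        h_b = gru_cell(x, h_b, p)
        bwd.append(h_b)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```

In practice the forward and backward directions carry separate parameter sets; a single shared set is used here only to keep the sketch short.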
EHO-hyperparameter tuning model
Finally, the EHO model optimally tunes the hyperparameters of the TCN-MHA-Bi-GRU technique, resulting in higher classification performance37. This technique is employed due to its strong balance between exploration and exploitation in complex search spaces. The model effectively navigates diverse solution landscapes, avoiding the premature convergence common in conventional methods like grid or random search. Its adaptive mechanism identifies the best-performing parameter combinations, improving model generalization. When applied to the TCN-MHA-Bi-GRU framework, EHO significantly enhances classification accuracy by fine-tuning critical parameters. Compared to other metaheuristic algorithms like PSO or GA, EHO shows improved convergence stability and faster optimization for high-dimensional classification tasks.
The EHO is a new meta-heuristic technique inspired by the breeding behaviour of elk herds. It balances exploitation and exploration in optimization tasks, making EHO an efficient solution for complex problems. The EHO is tailored to simulate the natural dynamics of elk herds over a sequence of essential stages. It starts with population initialization and the problem parameters. The method then enters the rutting season, separating the population into families directed by the fittest bulls. During the calving season, these families yield novel solutions according to the features of the bull and its harems. Finally, in the selection season, each solution is assessed, and the fittest are chosen to form the next generation, with this process repeated until the model converges or the iteration limit is reached. The stages of the EHO are as follows:
Initialization
In the initialization stage of the EHO, the model starts by setting up the population and defining the problem-specific parameters. The fundamental initialization elements are the elk herd size EHS, the bull rate Br, and the search-region boundaries. The elk herd EH is initialized as a matrix of size EHS × d, where d denotes the problem’s dimensionality and each row of the matrix signifies a possible solution (elk). Mathematically, every solution is created inside the described search-region limits using Eq. (13):

x_{i,j} = lb_j + rand(0, 1) · (ub_j − lb_j)   (13)

where x_{i,j} signifies the j-th feature of the i-th solution, lb_j and ub_j denote the lower and upper limits for the j-th attribute, and rand(0, 1) represents a randomly generated value distributed uniformly in (0, 1). The fitness of each solution is calculated using the objective function f(x_i), and the solutions are sorted according to their fitness values in ascending order. This initial setup prepares the elk herd for the following stages of the model.
Generating the primary elk herd solutions
During this second stage, the model concentrates on generating the first solution population, representing the elk herd. After defining the problem-specific parameters and initializing the population matrix EH in the initial phase, this stage includes assigning fitness values to every solution and establishing the herd structure. The elk herd EH is produced as a matrix of dimensions EHS × d, where every row is associated with a possible solution in the search area, as shown in Eq. (14):

EH = [ x_{1,1}    x_{1,2}    …  x_{1,d}
       x_{2,1}    x_{2,2}    …  x_{2,d}
       ⋮          ⋮              ⋮
       x_{EHS,1}  x_{EHS,2}  …  x_{EHS,d} ]   (14)

Once the initial population is produced, fitness is assessed using the objective function f(x_i). The herd is then sorted in ascending order of fitness, guaranteeing that the optimal solutions (strong elks) are placed at the top. This ordered structure sets the basis for the following rutting-season stage, where the population is separated into families.
Rutting season
During this third stage, the EHO model splits the primary population into families, with all families directed by the bull (the appropriate individual). The division depends on the bull’s fitness, reflecting natural behaviour, whereas strong bulls guide the largest groups. Initially, the model establishes the bulls count
inside the population utilizing the
and the
as demonstrated in Eq. (15):
| 15 |
Here, $B$ denotes the bull count, $Br$ represents the bull rate, and $N$ means the population size. The best $B$ individuals are chosen as bulls depending on their fitness values. The bulls then compete to form families, each containing a bull and its allocated harems (followers). Harems are assigned to bulls by a roulette-wheel selection method, in which the probability $p_i$ of a bull $i$ attracting a harem is proportional to its fitness, as given in Eq. (16):
$$p_i = \frac{f(B_i)}{\sum_{j=1}^{B} f(B_j)} \quad (16)$$
Here, $f(B_i)$ denotes the fitness of the $i$th bull, and the sum in the denominator is the total fitness of all bulls. Roulette-wheel selection guarantees that high-fitness bulls are more likely to lead more harems. After the harems are assigned, every bull manages its family, with the size of each family reflecting the bull's strength. This structured division sets the stage for the calving season, in which novel solutions (calves) are generated from the bulls' features and their harems.
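A compact sketch of the rutting-season bookkeeping follows. The bull scores are hypothetical, and the fitness-proportional probabilities assume a maximization-style score (for the error-rate fitness of Eq. (19), the scores would first be inverted):

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def assign_harems(bull_scores, n_harems):
    """Roulette-wheel harem assignment (Eq. 16): each harem is given to
    bull i with probability score_i / sum(scores), so stronger bulls
    attract more followers on average."""
    probs = bull_scores / bull_scores.sum()
    return rng.choice(len(bull_scores), size=n_harems, p=probs)

n_pop, bull_rate = 20, 0.2
n_bulls = int(round(n_pop * bull_rate))     # Eq. (15): B = round(N * Br)
scores = np.array([5.0, 3.0, 1.5, 0.5])     # hypothetical scores for B = 4 bulls
owners = assign_harems(scores, n_harems=n_pop - n_bulls)
```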
Calving season
During this fourth stage, the EHO model concentrates on generating novel solutions (calves) within every family from the genetic properties of the bull (leader) and its harems (followers). This step imitates the natural reproduction process in elk herds, in which offspring inherit features from either parent, promoting diversity within the population. For every family, a novel solution $x_{new}$ is produced by incorporating features from the bull $x_{bull}$ and its harems $x_{harem}$. When the calf's index matches that of its bull father, the novel solution is generated using Eq. (17):
$$x_{new} = x_{bull} + h \times (x_k - x_{bull}) \quad (17)$$
Here, $h$ denotes a number generated randomly in $(0, 1)$, and $x_k$ represents a randomly chosen solution from the present population. This equation guarantees that the novel solution is influenced mainly by the bull, with some variation introduced by the random selection from the group. When the calf's index matches that of its mother harem, the novel solution joins the characteristics of the mother and the bull, using Eq. (18):
$$x_{new} = x_{harem} + \alpha \times (x_{bull} - x_{harem}) + \beta \times (x_r - x_{harem}) \quad (18)$$
Here, $\alpha$ and $\beta$ are randomly generated numbers in the interval $(0, 2)$, $x_{bull}$ embodies the bull's features, and $x_r$ stands for a random solution drawn from the other bulls. This calving process is repeated for every family, creating a new generation of solutions that inherit the predecessors' strengths while introducing variations that are significant for the model's exploration and exploitation abilities in the search region.
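The two calving rules can be sketched as below. This is a hedged reading of Eqs. (17) and (18); in particular, drawing the mixing coefficients from (0, 2) is an assumption, and all vector values are toy inputs:

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def calve(bull, mother, random_member, random_bull):
    """Generate one calf per rule: Eq. (17) perturbs the bull toward a
    random herd member; Eq. (18) mixes the mother harem with its bull
    and a random other bull."""
    h = rng.random()                                   # h ~ U(0, 1), Eq. (17)
    calf_from_bull = bull + h * (random_member - bull)
    alpha, beta = 2 * rng.random(), 2 * rng.random()   # assumed U(0, 2), Eq. (18)
    calf_from_harem = (mother
                       + alpha * (bull - mother)
                       + beta * (random_bull - mother))
    return calf_from_bull, calf_from_harem

bull = np.array([1.0, 2.0])
c1, c2 = calve(bull, np.zeros(2), np.array([2.0, 2.0]), np.array([-1.0, 1.0]))
```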
Selection season
During this fifth stage, the method combines the newly generated solutions (calves) with the current population of bulls and harems to create a mixed herd. The bulls, harems, and calves are merged into a single matrix $EH_{mixed}$. The fitness of every individual in $EH_{mixed}$ is assessed using the objective function, and the whole population is sorted in ascending order of fitness. From this ordered population, the best $N$ individuals are chosen to form the population for the following iteration. This selection process parallels the $(\mu + \lambda)$ selection approach generally applied in evolutionary methods, in which parents (bulls and harems) and offspring (calves) compete equally for survival. By continually keeping the fittest individuals, the model iteratively improves the population, refining the overall fitness of the herd with every cycle. This procedure repeats until the termination conditions, such as a maximal iteration count or convergence to the best solution, are met. Algorithm 1 describes the EHO technique.
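The selection season amounts to (mu + lambda) survivor selection and can be sketched in a few lines; the objective and candidate values below are toy assumptions:

```python
import numpy as np

def select_next_herd(parents, calves, objective, n_pop):
    """(mu + lambda)-style selection: merge bulls/harems with calves,
    rank the mixed herd by fitness (ascending, minimisation), and keep
    the best n_pop individuals for the next iteration."""
    mixed = np.vstack([parents, calves])
    fitness = np.array([objective(x) for x in mixed])
    best = np.argsort(fitness)[:n_pop]
    return mixed[best], fitness[best]

parents = np.array([[3.0], [1.0]])
calves = np.array([[0.5], [2.0]])
herd, fit = select_next_herd(parents, calves,
                             lambda x: float(np.sum(x ** 2)), n_pop=2)
# herd keeps the two fittest individuals, 0.5 and 1.0
```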
Algorithm 1.
EHO model
The EHO model employs a fitness function (FF) that aims to reach the best outcome of the classifier. It assigns a positive value characterizing the quality of each candidate solution. The reduction of the classification error rate is taken as the FF, as shown in Eq. (19).

$$FF = \text{ClassifierErrorRate} = \frac{\text{No. of misclassified samples}}{\text{Total no. of samples}} \times 100 \quad (19)$$
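A minimal sketch of the error-rate fitness of Eq. (19), here as a fraction rather than a percentage:

```python
def fitness(y_true, y_pred):
    """Classification error rate used as the EHO fitness (Eq. 19):
    the fraction of misclassified samples, to be minimised."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)

err = fitness([0, 1, 1, 0], [0, 1, 0, 0])   # one mistake out of four -> 0.25
```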
Performance analysis
The simulation analysis of the MDRRAD-DLM approach is examined on the NSLKDD dataset38. The technique is simulated using Python 3.6.5 on a PC with an i5-8600K CPU, 250 GB SSD, GeForce GTX 1050 Ti 4 GB GPU, 16 GB RAM, and a 1 TB HDD. Parameters include a learning rate of 0.01, ReLU activation, 50 epochs, a dropout of 0.5, and a batch size of 5. The dataset holds 148,517 samples across five classes, as depicted in Table 1. The total number of features is 42, of which only 31 are selected.
Table 1.
Details of the NSLKDD dataset.
| NSLKDD dataset | |
|---|---|
| Class | No. of samples |
| “Normal” | 77,054 |
| “DoS” | 53,385 |
| “Probe” | 14,410 |
| “R2L” | 3416 |
| “U2R” | 252 |
| Total samples | 148,517 |
Figure 4 illustrates the classifier performance of the MDRRAD-DLM model on the NSLKDD dataset. Figure 4a, b displays the confusion matrices, showing accurate classification and detection of each class under 70%TRAPHA and 30%TESPHA. Figure 4c represents the PR curve, demonstrating higher performance across all class labels. Finally, Fig. 4d presents the ROC analysis, indicating proficient outcomes with superior ROC values for the different class labels.
Fig. 4.
NSLKDD dataset (a–b) 70%TRAPHA and 30%TESPHA and (c–d) PR and ROC curves.
Table 2 and Fig. 5 depict the attack recognition of the MDRRAD-DLM methodology on the NSLKDD dataset. The results specify that the MDRRAD-DLM method appropriately classified each class label. On the 70% TRAPHA, the MDRRAD-DLM model attains an average accuracy of 99.11%, precision of 85.89%, recall of 80.43%, F-score of 82.46%, and MCC of 82.09%. Furthermore, on the 30% TESPHA, the presented MDRRAD-DLM method obtains an average accuracy of 99.14%, precision of 87.41%, recall of 82.42%, F-score of 84.51%, and MCC of 84.02%.
Table 2.
Attack detection of MDRRAD-DLM technique under the NSLKDD dataset.
| Classes | Accuracy | Precision | Recall | F-score | MCC |
|---|---|---|---|---|---|
| TRAPHA (70%) | |||||
| Normal | 98.32 | 98.46 | 98.30 | 98.38 | 96.64 |
| DoS | 98.59 | 97.81 | 98.28 | 98.04 | 96.94 |
| Probe | 99.45 | 96.29 | 98.09 | 97.18 | 96.88 |
| R2L | 99.37 | 89.58 | 82.20 | 85.73 | 85.49 |
| U2R | 99.83 | 47.31 | 25.29 | 32.96 | 34.51 |
| Average | 99.11 | 85.89 | 80.43 | 82.46 | 82.09 |
| TESPHA (30%) | |||||
| Normal | 98.37 | 98.45 | 98.41 | 98.43 | 96.73 |
| DoS | 98.62 | 97.89 | 98.27 | 98.08 | 97.01 |
| Probe | 99.51 | 96.65 | 98.31 | 97.47 | 97.20 |
| R2L | 99.37 | 90.22 | 81.21 | 85.48 | 85.28 |
| U2R | 99.83 | 53.85 | 35.90 | 43.08 | 43.89 |
| Average | 99.14 | 87.41 | 82.42 | 84.51 | 84.02 |
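The per-class and macro-averaged figures reported in Table 2 can be reproduced from a confusion matrix; the two-class matrix below is a toy assumption for illustration:

```python
import numpy as np

def macro_metrics(cm):
    """Per-class precision, recall, and F-score from a confusion matrix
    (rows = true class, columns = predicted class), macro-averaged as
    in the 'Average' rows of Table 2."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    f_score = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision.mean(), recall.mean(), f_score.mean()

cm = np.array([[90, 5],      # toy 2-class confusion matrix
               [10, 95]])
p, r, f = macro_metrics(cm)
```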
Fig. 5.
Average of MDRRAD-DLM model under the NSLKDD dataset.
Figure 6 displays the training (TRAN) accuracy and validation (VALN) accuracy outcomes of the MDRRAD-DLM methodology under the NSLKDD dataset. The accuracy values are calculated over 0–25 epochs. The figure underlines that both accuracy values present increasing tendencies, describing the ability of the MDRRAD-DLM methodology to reach a heightened solution over multiple iterations. Moreover, the two curves remain close across the epochs, specifying minimal over-fitting and displaying the MDRRAD-DLM technique's greater outcome.
Fig. 6.
Accuracy curve of MDRRAD-DLM approach under the NSLKDD dataset.
In Fig. 7, the TRAN loss (TRANLOS) and VALN loss (VALNLOS) curves of the MDRRAD-DLM approach on the NSLKDD dataset are demonstrated. The loss values are measured over 0–25 epochs. Both values depict decreasing tendencies, indicating the ability of the MDRRAD-DLM model to balance the fit between training and validation data. The continual decrease confirms the improved solutions of the MDRRAD-DLM method and its refined prediction results.
Fig. 7.
Loss curve of MDRRAD-DLM approach under the NSLKDD dataset.
Table 3 and Fig. 8 examine the comparative results of the MDRRAD-DLM approach on the NSLKDD dataset against existing methodologies39–41. The results highlight that the Naïve Bayes (NB), k-nearest neighbours (KNN), gradient boosting (GB), IForest, MLP, CNN, and RNN models show poorer performance, while the proposed MDRRAD-DLM method attains superior outcomes with a maximal accuracy, precision, recall, and F-score of 99.14%, 87.41%, 82.42%, and 84.51%, respectively.
Table 3.
Comparative study of the MDRRAD-DLM technique with existing methods under the NSLKDD dataset.
| NSLKDD dataset | ||||
|---|---|---|---|---|
| Technique | Accuracy | Precision | Recall | F-score |
| NB | 98.43 | 83.49 | 80.07 | 81.61 |
| KNN algorithm | 97.41 | 81.26 | 79.13 | 76.22 |
| GB | 93.35 | 81.93 | 75.63 | 77.12 |
| IForest | 89.51 | 79.43 | 79.17 | 83.55 |
| MLP method | 90.79 | 79.14 | 75.91 | 82.44 |
| CNN classifier | 98.19 | 80.98 | 78.69 | 75.64 |
| RNN method | 91.66 | 80.62 | 78.05 | 80.81 |
| MDRRAD-DLM | 99.14 | 87.41 | 82.42 | 84.51 |
Fig. 8.
Comparative study of MDRRAD-DLM technique with existing methods under the NSLKDD dataset.
Table 4 and Fig. 9 demonstrate the computational time (CT) analysis of the MDRRAD-DLM technique with existing methods. Among the various methods evaluated, the NB technique required 14.57 s, while the KNN model took 11.25 s. The gradient boosting (GB) method and the multi-layer perceptron (MLP) approach attained similar times of 14.18 s and 14.44 s, respectively. The isolation forest (IForest) showed a faster CT of 7.05 s. CNN and RNN classifiers illustrated moderate CTs of 12.87 and 12.79 s, respectively. The MDRRAD-DLM method recorded the shortest CT of 5.66 s, indicating a more efficient processing capability than other methods. These CT values reflect the efficiency of each technique in handling the dataset and their suitability for real-time IDS.
Table 4.
CT analysis of the MDRRAD-DLM approach with existing models under the NSLKDD dataset.
| NSLKDD dataset | |
|---|---|
| Technique | CT (sec) |
| NB | 14.57 |
| KNN algorithm | 11.25 |
| GB | 14.18 |
| IForest | 7.05 |
| MLP method | 14.44 |
| CNN classifier | 12.87 |
| RNN method | 12.79 |
| MDRRAD-DLM | 5.66 |
Fig. 9.
CT analysis of the MDRRAD-DLM approach with existing models under the NSLKDD dataset.
The ablation study of the MDRRAD-DLM methodology is specified in Table 5 and Fig. 10, comparing its performance on the NSLKDD dataset with that of its component techniques. The MDRRAD-DLM method achieved an accuracy of 99.14%, precision of 87.41%, recall of 82.42%, and F-score of 84.51%, outperforming POA, which recorded an accuracy of 96.94%, precision of 85.18%, recall of 80.50%, and F-score of 82.65%. Similarly, the MDRRAD-DLM method outperformed EHOA, with an accuracy of 97.74%, precision of 85.98%, recall of 81.30%, and F-score of 83.31%, as well as TCN-MHA-Bi-GRU, which achieved an accuracy of 98.43%, precision of 86.63%, recall of 81.91%, and F-score of 83.84%. This demonstrates the superior efficiency of the MDRRAD-DLM method across all key performance metrics.
Table 5.
Ablation study-based comparative analysis of the MDRRAD-DLM methodology against recent techniques.
| NSLKDD dataset | ||||
|---|---|---|---|---|
| Technique | Accuracy | Precision | Recall | F-score |
| POA | 96.94 | 85.18 | 80.50 | 82.65 |
| EHOA | 97.74 | 85.98 | 81.30 | 83.31 |
| TCN-MHA-Bi-GRU | 98.43 | 86.63 | 81.91 | 83.84 |
| MDRRAD-DLM | 99.14 | 87.41 | 82.42 | 84.51 |
Fig. 10.
Ablation study-based comparative analysis of the MDRRAD-DLM methodology against recent techniques.
Also, the proposed MDRRAD-DLM technique is validated on the CIC-IDS2017 dataset42. This dataset holds 30,800 samples across 12 traffic types, as shown in Table 6. Of the 78 features present, only 45 are selected.
Table 6.
Details of the CIC-IDS2017 dataset.
| CIC-IDS2017 dataset | |
|---|---|
| Traffic type | Samples |
| “Benign” | 3000 |
| “Bot” | 1800 |
| “DDoS” | 3000 |
| “DoS goldeneye” | 3000 |
| “DoS hulk” | 3000 |
| “DoS slowhttptest” | 3000 |
| “DoS slowloris” | 3000 |
| “FTP-PATATOR” | 3000 |
| “Portscan” | 3000 |
| “SSH-PATATOR” | 3000 |
| “Webattack bruteforce” | 1500 |
| “Webattack XSS” | 500 |
| Total | 30,800 |
Figure 11 represents the classifier outcomes of the MDRRAD-DLM approach under the CIC-IDS2017 dataset. Figure 11a, b shows the confusion matrices, with accurate classification and recognition of all class labels under 70%TRAPHA and 30%TESPHA. Figure 11c exhibits the PR analysis, signifying superior outcomes over each class. Besides, Fig. 11d illustrates the ROC analysis, indicative of efficient findings with greater ROC values for the dissimilar classes.
Fig. 11.
CIC-IDS2017 dataset (a–b) 70%TRAPHA and 30%TESPHA and (c–d) PR and ROC curves.
Table 7 and Fig. 12 present the attack detection of the MDRRAD-DLM approach on the CIC-IDS2017 dataset. The outcomes show that the MDRRAD-DLM approach correctly classified each class label. On the 70% TRAPHA, the MDRRAD-DLM approach attains an average accuracy, precision, recall, F-score, and MCC of 99.41%, 95.88%, 94.51%, 95.11%, and 94.83%. Furthermore, on the 30% TESPHA, the MDRRAD-DLM technique achieves an average accuracy, precision, recall, F-score, and MCC of 99.37%, 96.13%, 94.02%, 94.85%, and 94.62%.
Table 7.
Attack detection of MDRRAD-DLM technique under the CIC-IDS2017 dataset.
| Classes | Accuracy | Precision | Recall | F-score | MCC |
|---|---|---|---|---|---|
| TRAPHA (70%) | |||||
| Benign | 99.26 | 96.19 | 96.19 | 96.19 | 95.77 |
| Bot | 99.42 | 95.06 | 94.91 | 94.99 | 94.68 |
| DDoS | 99.42 | 96.76 | 97.35 | 97.06 | 96.73 |
| DoS goldeneye | 99.49 | 96.72 | 98.06 | 97.39 | 97.10 |
| DoS hulk | 99.36 | 96.81 | 96.58 | 96.70 | 96.34 |
| DoS slowhttptest | 99.35 | 96.20 | 97.15 | 96.67 | 96.31 |
| DoS slowloris | 99.42 | 97.22 | 96.85 | 97.03 | 96.71 |
| FTP-PATATOR | 99.36 | 96.58 | 96.91 | 96.75 | 96.39 |
| Portscan | 99.36 | 96.12 | 97.35 | 96.73 | 96.37 |
| SSH-PATATOR | 99.55 | 97.44 | 97.90 | 97.67 | 97.42 |
| Webattack bruteforce | 99.47 | 95.50 | 93.40 | 94.44 | 94.17 |
| Webattack XSS | 99.43 | 89.93 | 71.51 | 79.67 | 79.92 |
| Average | 99.41 | 95.88 | 94.51 | 95.11 | 94.83 |
| TESPHA (30%) | |||||
| Benign | 99.33 | 95.86 | 97.34 | 96.59 | 96.22 |
| Bot | 99.30 | 93.52 | 95.02 | 94.26 | 93.89 |
| DDoS | 99.46 | 96.54 | 97.85 | 97.19 | 96.89 |
| DoS goldeneye | 99.36 | 95.92 | 97.53 | 96.72 | 96.37 |
| DoS hulk | 99.48 | 96.90 | 97.76 | 97.33 | 97.04 |
| DoS slowhttptest | 99.43 | 96.98 | 97.09 | 97.03 | 96.71 |
| DoS slowloris | 99.49 | 97.89 | 96.92 | 97.40 | 97.12 |
| FTP-PATATOR | 99.22 | 95.79 | 96.22 | 96.00 | 95.57 |
| Portscan | 99.26 | 95.66 | 96.73 | 96.19 | 95.79 |
| SSH-PATATOR | 99.43 | 96.90 | 97.23 | 97.06 | 96.75 |
| Webattack bruteforce | 99.36 | 95.19 | 91.63 | 93.38 | 93.06 |
| Webattack XSS | 99.37 | 96.46 | 66.87 | 78.99 | 80.04 |
| Average | 99.37 | 96.13 | 94.02 | 94.85 | 94.62 |
Fig. 12.
Average of MDRRAD-DLM technique on CIC-IDS2017 dataset.
Figure 13 illustrates the TRAN accuracy and VALN accuracy values of the MDRRAD-DLM approach under the CIC-IDS2017 dataset. The accuracy values are calculated over 0–25 epochs. The figure highlights that both accuracy values demonstrate growing tendencies, confirming the ability of the MDRRAD-DLM approach to attain maximal outcomes over numerous iterations. Simultaneously, the two curves remain close across the epochs, which indicates lesser overfitting and shows the better result of the MDRRAD-DLM method.
Fig. 13.
Accuracy analysis of MDRRAD-DLM technique on the CIC-IDS2017 dataset.
Figure 14 shows the MDRRAD-DLM technique’s TRANLOS and VALNLOS graphs on the CIC-IDS2017 dataset. The loss values are computed throughout 0–25 epochs. Both values establish reducing tendencies, which notifies the MDRRAD-DLM technique’s capacity to balance a trade-off. The continual fall in loss values further guarantees the optimal performance of the MDRRAD-DLM model.
Fig. 14.
Loss graph of MDRRAD-DLM approach on the CIC-IDS2017 dataset.
Table 8 and Fig. 15 compare the results of the MDRRAD-DLM methodology on the CIC-IDS2017 dataset with existing techniques. The results highlight that Deep Q-Learning, Deep RL, 2DQN, RF, PCA, AE, and HBOS report worse performance. In contrast, the proposed MDRRAD-DLM method reports maximal outcomes with an accuracy of 99.41%, precision of 95.88%, recall of 94.51%, and F-score of 95.11%.
Table 8.
Comparative study of the MDRRAD-DLM methodology under the CIC-IDS2017 dataset.
| CIC-IDS2017 dataset | ||||
|---|---|---|---|---|
| Approach | Accuracy | Precision | Recall | F-score |
| Deep Q-learning | 92.91 | 91.37 | 89.95 | 89.48 |
| Deep RL | 96.84 | 92.50 | 90.80 | 92.45 |
| 2DQN model | 96.77 | 94.79 | 93.15 | 95.00 |
| RF | 98.67 | 92.49 | 92.46 | 90.73 |
| PCA method | 97.30 | 91.07 | 92.80 | 90.00 |
| AE | 92.17 | 92.75 | 90.61 | 92.63 |
| HBOS model | 97.17 | 94.16 | 91.48 | 90.62 |
| MDRRAD-DLM | 99.41 | 95.88 | 94.51 | 95.11 |
Fig. 15.
Comparative study of the MDRRAD-DLM methodology under the CIC-IDS2017 dataset.
Table 9 and Fig. 16 indicate the CT evaluation of the MDRRAD-DLM approach with existing techniques. The comparison of CT on the CIC-IDS2017 dataset shows that the MDRRAD-DLM approach is the fastest among all evaluated methods. The MDRRAD-DLM technique completes its process in just 4.16 s, significantly outperforming Deep Q-Learning, which requires 22.36 s, and Deep RL, which takes 9.72 s. The 2DQN model finishes in 8.34 s, while RF and PCA methods require 18.26 s and 16.54 s, respectively. The AE model performs relatively faster at 6.75 s, and the HBOS model completes in 12.30 s. These outcomes highlight the superior efficiency of the MDRRAD-DLM technique in terms of CT, enabling quicker processing without compromising detection capabilities.
Table 9.
CT evaluation of the MDRRAD-DLM approach with existing techniques under the CIC-IDS2017 dataset.
| CIC-IDS2017 dataset | |
|---|---|
| Approach | CT (sec) |
| Deep Q-learning | 22.36 |
| Deep RL | 9.72 |
| 2DQN model | 8.34 |
| RF | 18.26 |
| PCA method | 16.54 |
| AE | 6.75 |
| HBOS model | 12.30 |
| MDRRAD-DLM | 4.16 |
Fig. 16.
CT evaluation of the MDRRAD-DLM approach with existing techniques under the CIC-IDS2017 dataset.
The ablation study of the MDRRAD-DLM model is illustrated in Table 10 and Fig. 17. The MDRRAD-DLM model achieved an accuracy of 99.41%, surpassing the other methods, which reached 97.00%, 97.89%, and 98.62%. The precision of the MDRRAD-DLM approach was 95.88%, higher than the comparative values of 93.69%, 94.45%, and 95.21%. For recall, the model recorded 94.51%, outperforming the others, which achieved 92.15%, 92.87%, and 93.69%. Additionally, the F-score reached 95.11%, while the alternative approaches recorded 92.89%, 93.68%, and 94.34%. These results highlight the superior capability of the MDRRAD-DLM method in detecting and classifying threats within the CIC-IDS2017 dataset.
Table 10.
Ablation study results comparing the MDRRAD-DLM method with existing techniques.
| CIC-IDS2017 dataset | ||||
|---|---|---|---|---|
| Approach | Accuracy | Precision | Recall | F-score |
| POA | 97.00 | 93.69 | 92.15 | 92.89 |
| EHOA | 97.89 | 94.45 | 92.87 | 93.68 |
| TCN-MHA-Bi-GRU | 98.62 | 95.21 | 93.69 | 94.34 |
| MDRRAD-DLM | 99.41 | 95.88 | 94.51 | 95.11 |
Fig. 17.
Ablation study results comparing the MDRRAD-DLM method with existing techniques.
Conclusion
In this study, the MDRRAD-DLM approach for real-world IoT applications is proposed. The aim is to propose effective detection and mitigation strategies for DDoS attacks. Initially, the data preprocessing phase applies Z-score normalization to transform the input data into a standardized format. Furthermore, the PO technique is employed for feature selection to choose the significant and relevant features from the input data. Moreover, the TCN-MHA-Bi-GRU model is implemented for the attack classification process. Finally, the EHO model optimally tunes the hyperparameters of the TCN-MHA-Bi-GRU technique, resulting in higher classification performance. The efficiency of the MDRRAD-DLM approach is examined on the NSLKDD and CIC-IDS2017 datasets. The experimental validation of the MDRRAD-DLM approach portrayed superior accuracy values of 99.14% and 99.41% on the two datasets. The limitations of the MDRRAD-DLM approach comprise reliance on benchmark datasets, which may not fully represent real-world attack complexities or evolving threat landscapes. Due to unseen traffic patterns, the performance could vary when deployed in diverse and dynamic environments. While high accuracy was achieved, detecting low-frequency or novel attacks remains challenging. The computational requirements, although optimized, may still restrict deployment on ultra-constrained devices. Additionally, interpretability in complex scenarios may not be sufficient for non-expert users. Future research should improve adaptability to new threats, reduce false positives in highly imbalanced data, enhance lightweight deployment for edge devices, and expand evaluation using real-time traffic from multiple domains.
Acknowledgments
The authors extend their appreciation to Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R755), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Author contributions
Adwan A. Alanazi: Conceptualization, methodology development, experiment, formal analysis, investigation, writing. Ashrf Althbiti: Formal analysis, investigation, validation, visualization, writing. Sara Abdelwahab Ghorashi: Formal analysis, review and editing. Fathea M.O. Birkea: Methodology, investigation. Roosvel Soto-Diaz: Involved since the beginning in methodology development, formal analysis, and review and editing. José Escorcia-Gutierrez: Conceptualization, methodology development, investigation, supervision, review and editing. All authors have read and agreed to the published version of the manuscript.
Data availability
The data supporting this study’s findings are openly available in the Kaggle repository at https://www.kaggle.com/datasets/hassan06/nslkdd and https://www.kaggle.com/datasets/chethuhn/network-intrusion-dataset, references 38 and 42.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Roosvel Soto-Diaz, Email: roosvel.soto@unisimon.edu.co.
José Escorcia-Gutierrez, Email: jescorci56@cuc.edu.co.
References
- 1.Zhang, J., Yu, P., Qi, L., Liu, S., Zhang, H. and Zhang, J. FLDDoS: DDoS attack detection model based on federated learning. In 2021 IEEE 20th International Conference on Trust, Security, and Privacy in Computing and Communications (TrustCom) (pp. 635–642). IEEE (2021).
- 2.Tian, Q., Guang, C., Wenchao, C. and Si, W.. A lightweight residual networks framework for DDoS attack classification based on federated learning. In IEEE INFOCOM 2021-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) (pp. 1–6). IEEE (2021).
- 3.Neto, E.C.P., Dadkhah, S. and Ghorbani, A.A. Collaborative DDoS detection in distributed multi-tenant IoT using federated learning. In 2022 19th Annual International Conference on Privacy, Security & Trust (PST) (pp. 1–10). IEEE (2022).
- 4.Liu, Z., Guo, C., Liu, D. & Yin, X. An asynchronous federated learning arbitration model for low-rate DDOS attack detection. IEEE Access11, 18448–18460 (2023). [Google Scholar]
- 5.Lv, D., Cheng, X., Zhang, J., Zhang, W., Zhao, W. and Xu, H. Ddos attack detection based on CNN and federated learning. In 2021 Ninth International Conference on Advanced Cloud and Big Data (CBD) (pp. 236–241). IEEE (2022).
- 6.Pourahmadi, V., Alameddine, H. A., Salahuddin, M. A. & Boutaba, R. Spotting anomalies at the edge: Outlier exposure-based cross-silo federated learning for DDoS detection. IEEE Trans. Dependable Sec. Comput.20(5), 4002–4015 (2022). [Google Scholar]
- 7.Nguyen, T.D., Rieger, P., Miettinen, M. and Sadeghi, A.R. Poisoning attacks on federated learning-based IoT intrusion detection system. In Proc. Workshop Decentralized IoT Syst. Secur. (DISS) (Vol. 79) (2020).
- 8.Mothukuri, V. et al. Federated-learning-based anomaly detection for IoT security attacks. IEEE Internet Things J.9(4), 2545–2554 (2021). [Google Scholar]
- 9.Agrawal, S. et al. Federated learning for intrusion detection system: Concepts, challenges and future directions. Comput. Commun.195, 346–361 (2022). [Google Scholar]
- 10.Isma’ila, U.A., Danyaro, K.U., Hassan, M.F., Muazu, A.A. and Liew, M.S. An IoT Device-level vulnerability control model through federated detection. J. Intell. Syst. Internet Things12(2) (2024).
- 11.Rahmati, M. Federated Learning-Driven Cybersecurity Framework for IoT Networks with Privacy-Preserving and Real-Time Threat Detection Capabilities. arXiv preprint arXiv:2502.10599 (2025).
- 12.Vemulapalli, L. & Sekhar, P. C. A customized temporal federated learning through adversarial networks for cyber attack detection in IoT. J. Robot. Control JRC6(1), 366–384 (2025). [Google Scholar]
- 13.Ragab, M. et al. Advanced artificial intelligence with federated learning framework for privacy-preserving cyberthreat detection in IoT-assisted sustainable smart cities. Sci. Rep.15(1), 4470 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Jianping, W., Guangqiu, Q., Chunming, W., Weiwei, J. & Jiahe, J. Federated learning for network attack detection using attention-based graph neural networks. Sci. Rep.14(1), 19088 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Subramanian, G. & Chinnadurai, M. Hybrid quantum enhanced federated learning for cyber attack detection. Sci. Rep.14(1), 32038 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Bukhari, S. M. S. et al. Secure and privacy-preserving intrusion detection in wireless sensor networks: Federated learning with SCNN-Bi-LSTM for enhanced reliability. Ad. Hoc. Netw.155, 103407 (2024). [Google Scholar]
- 17.Javeed, D., Saeed, M. S., Adil, M., Kumar, P. & Jolfaei, A. A federated learning-based zero trust intrusion detection system for Internet of Things. Ad. Hoc. Netw.162, 103540 (2024). [Google Scholar]
- 18.Lin, W. T., Chen, G. & Zhou, X. Privacy-preserving federated learning for detecting false data injection attacks on power systems. Electr. Power Syst. Res.229, 110150 (2024). [Google Scholar]
- 19.Nandanwar, H. & Katarya, R. Securing Industry 5.0: An explainable deep learning model for intrusion detection in cyber-physical systems. Comp. Electr. Eng.123, 110161 (2025). [Google Scholar]
- 20.Nandanwar, H. & Katarya, R. Deep learning enabled intrusion detection system for Industrial IOT environment. Expert Syst. Appl.249, 123808 (2024). [Google Scholar]
- 21.Nandanwar, H. & Katarya, R. Privacy-preserving data sharing in blockchain-enabled IoT healthcare management system. Comput. J. bxaf065 (2025).
- 22.Nandanwar, H. & Katarya, R. TL-BILSTM IoT: Transfer learning model for prediction of intrusion detection system in IoT environment. Int. J. Inf. Secur.23(2), 1251–1277 (2024). [Google Scholar]
- 23.Saheed, Y.K. & Chukwuere, J.E. CPS-IIoT-P2attention: Explainable privacy-preserving with scaled dot-product attention in cyber physical system-industrial IoT Network. IEEE Access (2025).
- 24.Kauhsik, B., Nandanwar, H. & Katarya, R. IoT security: A deep learning-based approach for intrusion detection and prevention. In 2023 International Conference on Evolutionary Algorithms and Soft Computing Techniques (EASCT) (pp. 1–7). IEEE (2023).
- 25.Saheed, Y. K., Omole, A. I. & Sabit, M. O. GA-mADAM-IIoT: A new lightweight threats detection in the industrial IoT via genetic algorithm with attention mechanism and LSTM on multivariate time series sensor data. Sens. Int.6, 100297 (2025). [Google Scholar]
- 26.Nandanwar, H. and Katarya, R. A systematic literature review: Approach toward blockchain future research trends. In 2023 International Conference on Device Intelligence, Computing and Communication Technologies, (DICCT) (pp. 259–264). IEEE (2023).
- 27.Saheed, Y. K. & Misra, S. CPS-IoT-PPDNN: A new explainable privacy preserving DNN for resilient anomaly detection in cyber-physical systems-enabled IoT networks. Chaos Solitons Fractals191, 115939 (2025). [Google Scholar]
- 28.Saheed, Y. K. & Chukwuere, J. E. Xaiensembletl-iov: A new explainable artificial intelligence ensemble transfer learning for zero-day botnet attack detection in the internet of vehicles. Results Eng.24, 103171 (2024). [Google Scholar]
- 29.Alhashmi, A., Idwaib, H., Avci, S. A., Rahebi, J. & Ghadami, R. Distributed denial-of-service (DDoS) on the smart grids based on VGG19 deep neural network and Harris Hawks optimization algorithm. Sci. Rep.15(1), 1–18 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Saheed, Y.K., Misra, S. and Chockalingam, S. Autoencoder via DCNN and LSTM models for intrusion detection in industrial control systems of critical infrastructures. In 2023 IEEE/ACM 4th International Workshop on Engineering and Cybersecurity of Critical Systems (EnCyCriS) (pp. 9–16). IEEE (2023).
- 31.Berríos, S., Garcia, S., Hermosilla, P. & Allende-Cid, H. A machine-learning-based approach for the detection and mitigation of distributed denial-of-service attacks in internet of things environments. Appl. Sci.15(11), 6012 (2025). [Google Scholar]
- 32.Saheed, Y. K., Abdulganiyu, O. H. & Ait Tchakoucht, T. Modified genetic algorithm and fine-tuned long short-term memory network for intrusion detection in the internet of things networks with edge capabilities. Appl. Soft Comput.155, 111434 (2024). [Google Scholar]
- 33.Pandey, V. K. et al. Enhancing intrusion detection in wireless sensor networks using a Tabu search based optimized random forest. Sci. Rep.15(1), 1–21 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Wang, X., Yang, X., Zhou, J. & Ren, H. Z-score-based improved topsis method and its implementation for elderly people health examination results evaluation: A statistic case study in Harbin, China. Health Soc. Care Commun.2025(1), 5974609 (2025). [Google Scholar]
- 35.Yang, Y., Fu, M., Zhou, X., Jia, C. & Wei, P. A multi-strategy parrot optimization algorithm and its application. Biomimetics10(3), 153 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Essaid, B., Kheddar, H., Batel, N. and Chowdhury, M.E. Deep learning-based coding strategy for improved cochlear implant speech perception in noisy environments. IEEE Access (2025).
- 37.Ouertani, M.W., Oueslati, R. and Manita, G., Improved Binary Elk Herd Optimizer with Fitness Balance Distance for Feature Selection Using Gene Expression Data.
- 38.https://www.kaggle.com/datasets/hassan06/nslkdd
- 39.Finistrella, S., Mariani, S. and Zambonelli, F. Multi-agent reinforcement learning for cybersecurity: Classification and survey. Intell. Syst. Appl. 200495 (2025).
- 40.Jemili, F., Jouini, K. & Korbaa, O. Detecting unknown intrusions from large heterogeneous data through ensemble learning. Intell. Syst. Appl.25, 200465 (2025). [Google Scholar]
- 41.Srivastav, S., Shukla, A. K., Kumar, S. & Muhuri, P. K. HYRIDE: HYbrid and robust intrusion detection approach for enhancing cybersecurity in industry 4.0. Internet Things30, 101492 (2025). [Google Scholar]
- 42.https://www.kaggle.com/datasets/chethuhn/network-intrusion-dataset