Scientific Reports. 2026 Apr 7;16:11673. doi: 10.1038/s41598-026-45937-9

Artificial intelligence driven multi agent framework for adaptive cyber attack simulation and automated incident response in cyber range environments

Alka Agrawal 1, Mohd Nadeem 2, Ahmed Al Nuaim 3, Abdullah Al Nuaim 3
PMCID: PMC13062070  PMID: 41946768

Abstract

Cyber range environments are key platforms for cybersecurity training, research and testing. They enable the emulation of realistic cyberattacks and incident response scenarios. Most traditional approaches to simulation are based on predefined or rule-based models. These approaches do not adapt and fail to account for the complexity of evolving threats. This paper proposes an Artificial Intelligence-driven Multi-Agent System (MAS). The framework autonomously simulates sophisticated cyberattacks and coordinates automated incident response within a cyber range. The CICIDS2017 and UNSW-NB15 datasets are combined and integrated into a cyber range simulator, CyDER 2.0. Reinforcement learning and anomaly detection methods give the attack and defence agents adaptive behaviours. The MAS architecture implements realistic attack vectors and response strategies. A set of experiments demonstrates that the AI-driven MAS achieves much higher simulation realism and responsiveness than traditional static systems. The method also achieves higher detection accuracy with minimal mitigation times. The model undergoes rigorous validation and acceptance testing to assess robustness and generalizability.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-026-45937-9.

Keywords: Multi-agent systems, Cyber-range, Cyber-attack simulation, Incident response, Model validation, Cyber security

Subject terms: Engineering, Mathematics and computing

Introduction

The growing number of sophisticated cyberattacks emphasizes the need for effective cybersecurity training and protection. Cyber range environments are increasingly used for training cybersecurity professionals, testing defences, and validating incident response procedures in a realistic but controlled environment1,2. These cyber ranges allow cyberattack and defence techniques to be simulated in a risk-free manner. Many of today’s cyber range platforms are based on static or rule-based simulation models and therefore do not adequately capture the dynamic and ever-changing nature of modern cyber threats3,4. Most rule-based simulation models cannot effectively emulate emerging attack techniques, and studies show that they fail to represent the complex multi-step attack chains and adaptive attacker behavior found in real-world attacks. As a result, they provide unrealistic and overly simplistic threat scenarios, which limits the effectiveness of training5,6. A 2023 study found that over 65% of simulated attacks did not adequately simulate attacker lateral movement and evasion strategies, which limited operator readiness7. Advanced Persistent Threats (APTs), for example, use multi-step, stealthy tactics that require simulation frameworks able to learn and adapt independently of static scripts8. This creates a need for adaptive simulation models that use artificial intelligence techniques, such as reinforcement learning and anomaly detection, to alter attack and defence behaviours in real time and so better represent the complexity of real-world cyber threats.

AI and MAS are two viable options for improving cyber range capabilities. A MAS consists of independent, interacting agents that simulate complex attacker-defender interactions in a natural way9,10. AI techniques allow simulations to adapt and evolve through experience, which improves realism and training effectiveness11–13. The available literature shows a significant gap in the development of AI-driven MAS frameworks that have been validated using actual cyber-attack data in cyber-range environments14. Although there is an abundance of research on AI in cybersecurity and on MAS in multi-domain simulations, few studies explicitly combine these ideas to create simulations that demonstrate autonomous and adaptive offensive and defensive behaviours for cybersecurity training and incident response15,16. This gap provides the impetus for this work, which seeks to create and validate an AI-driven MAS that simulates cyberattacks and incident responses in a cyber-range environment. The system uses two datasets, CICIDS201717 and UNSW-NB1518, and employs reinforcement learning and advanced anomaly detection to develop adaptive agent models. The system is then integrated into a state-of-the-art cyber range simulator, which provides a realistic execution environment for the simulated scenarios and a comprehensive means of evaluating the performance of the system. The contributions of the paper are as follows:

  • A new AI-driven Multi-Agent System architecture is developed that autonomously creates and executes complex cyberattack behaviours and automates coordinated incident response activities.

  • Real cyberattack datasets are integrated with a modern cyber range platform to increase the applicability of the simulated scenarios.

  • An extensive experimental evaluation demonstrates improvements in simulation realism, detection accuracy for simulated cyberattacks, and response efficiency compared to existing static or rule-based baselines.

  • A major research gap is addressed by providing a scalable and extensible MAS framework with applications in cybersecurity training, research and defensive technology development.

Section 2 reviews the literature and the research trends it reveals. Section 3 outlines the specific research gap and formulates the problem statement. Section 4 defines the research objectives and sets out the hypotheses. Section 5 describes the research methodology. The development and implementation of the model are described in Sects. 6 and 7. Experimental results are discussed in Sect. 8. The accepted model is evaluated in Sect. 9. Section 10 summarizes the discussion and conclusions of this paper.

Literature review

The objective of the literature review is to position this research in relation to what already exists within the knowledge base. Recent studies have furthered our understanding of the application of reinforcement learning to adaptive cyber defence19–22, multi-agent cooperation in dynamic threat environments23–25, and autonomous incident response frameworks that use machine learning based anomaly detection26–29. Our research builds upon this work by using AI-driven Multi-Agent Systems (MAS) that are integrated into cyber ranges and validated with real-world datasets, in order to address two primary shortcomings of previous research: the lack of scalability and the lack of realism in simulated environments.

Evolution and capabilities of cyber range environments

Cyber ranges are virtual testbeds that simulate networked infrastructure for cybersecurity experimentation. Early cyber range platforms generally relied on static, pre-scripted attacks and manually assessed responses to those attacks30,31. As technologies advanced, these platforms began to include more dynamic components32–34. Cyber range platforms can be broadly categorized based on their deployment and design characteristics, as shown in Fig. 1.

Fig. 1.


Taxonomy of Cyber Range Platforms.

N. Chouliaras et al. presented an overview of cyber range taxonomy and identified an important emphasis in cyber ranges on providing users with realistic and interactive learning experiences2. However, the authors noted a need for improved flexibility, as cyber ranges are currently unable to handle emerging threats. A. Mills et al. developed a more complex cyber range that uses containerized microservices to create and deploy realistic network segments35. More recent cyber ranges, CyberVAN36 and CySIDER37, allow some degree of autonomous attack behavior. However, they continue to use predetermined attack patterns with limited AI incorporation, so they engage only to a limited extent with evolving threat tactics. One persistent problem is how to embed self-adaptive, self-learning attackers and defenders in cyber ranges so that they can adapt to new attack vectors and defense mechanisms.

Artificial intelligence for cyber-attack and incident response simulation

AI has seen rapid growth in the area of cybersecurity simulation, largely due to the need for adaptable, scalable and realistic decision-making. Since 2015, machine learning (ML) methods have been applied extensively to develop intrusion detection systems (IDS). IDS development benefits from access to labeled datasets such as CICIDS201717 and UNSW-NB1518. Reinforcement learning (RL) has emerged as a promising technique for sequential decision making and has garnered significant attention in cybersecurity, particularly for attack simulation and automatic defense11,38. Agents that use RL learn optimal policies by interacting with the simulated environment, allowing them to behave in an adaptive and dynamic manner similar to actual threat actors or defenders.

In 2021, Y. Guo et al.12 and, in 2022, S. Aberkane et al.39 showed that deep reinforcement learning algorithms could be successfully applied to network traffic anomaly detection and mitigation. They found that the methods worked better than fixed or static models. In 2025, T. Zhang et al.13 introduced a multi-agent reinforcement learning framework in which cooperative defender agents learn to respond to threats that change over time in networks. This system helps improve the defense against complex attacks. However, their simulated network environment was relatively simple and synthetic, and they neither validated their simulation results against extensive datasets nor incorporated their simulation into cyber-range environments. A number of studies have also documented challenges associated with the scarcity of labeled cyber-attack data. The literature has further reported the difficulty of hyperparameter tuning for RL agents and the limited scalability of simulations of realistic, large-scale network topologies40,41. Numerous AI techniques have been researched for cybersecurity simulations, each with varying degrees of success, as illustrated in Fig. 2. The approach in this paper combines reinforcement learning and anomaly detection to achieve adaptive and realistic simulation.

Fig. 2.


AI Techniques Usage in Cybersecurity Simulation.

Multi-agent systems for cybersecurity simulation

MAS provide a natural fit for modelling the distributed nature of cyber environments, where multiple attackers and defenders act independently and simultaneously while interacting with one another. A typical MAS architecture includes one or more attacker agents and one or more defender agents, all of which interact with the simulated environment, as illustrated in Fig. 3. In 2018, W. Chenghai et al.42 were the first to apply MAS to model cyber-warfare scenarios. They demonstrated the capability of MAS to model and analyze complex relationships between attackers and defenders. In 2021, N. Bougueroua et al.43 developed a collaborative MAS-based framework for intrusion response in which agents work together to detect and respond to threats.

Fig. 3.


Typical Multi Agent System (MAS) Architecture in Cybersecurity Simulations.

Over the past several years, researchers and practitioners have explored autonomous MAS frameworks that integrate AI learning to adapt attack and defence strategies dynamically10,15,44. In 2025, M. Kiely et al.45 developed a MAS with reinforcement learning based offensive agents that modify their attack vectors in response to defender behavior. Also in 2025, J. Cai et al.10 conducted a survey of AI-integrated MAS for cybersecurity. They identified a lack of practical applications in cyber range environments that use real traffic datasets, an important limitation for validation and adoption. Most MAS studies evaluate their effectiveness in closed laboratory environments or limited-scope testbeds; none of the studies reviewed here was evaluated in conjunction with a comprehensive cyber range or large-scale network environment.

Cybersecurity datasets and validation challenges

Labeled datasets are essential for training, testing and validating AI-based simulation models for cybersecurity. Two labeled datasets, CICIDS201717 and UNSW-NB1518, have been widely used for intrusion detection and simulation work since their release. CICIDS2017 includes many common attacks and benign traffic collected from well-instrumented test networks. UNSW-NB15 provides a collection of modern, blended attacks and normal traffic, facilitating diverse behavioral simulation. Most studies that use these datasets for validation are conducted in isolated laboratories or at a small scale, which limits generalization46. The datasets also have limitations that impair the training of agents to simulate realistic behaviors in cyber range simulations47.

Summary and identified research gaps

Table 1 summarizes research on cyber-range simulation techniques and characteristics, AI/MAS integration, system applications and dataset validation techniques.

Table 1.

Summary and Comparison.

References | N. Chouliaras2 | W. Chenghai et al.42 | Y. Guo et al.12 | T. Zhang et al.13 | M. Kiely et al.45 | Proposed Work in this Paper
Cyber Range Platform | Survey, multiple | Simulation only | Synthetic network | Laboratory simulation | Simulated small topology | Cyber range platform
AI Integration | Minimal AI | Rule-based agents | Deep RL for anomaly detection | Multi-agent RL | RL-based attack agents | RL, anomaly detection
MAS Usage X X
Real Dataset Usage X X X X √ (CICIDS2017, UNSW-NB15)
(CICIDS2017)
Validation Method | Literature review | Theoretical/simulation | Simulation experiments | Simulation only | Simulation only | Rigorous experiments and validation
Limitations | Limited AI MAS integration | No integration with cyber-range | No MAS or cyber-range integration | Simplistic networks, no real data | Limited validation and scope | Integrates AI MAS with cyber-range and real datasets

Advancements in cyber-range technologies and the growing use of AI and MAS for cybersecurity simulations48–50 are illustrated in Fig. 4, which shows the growth trend of academic publications in key research areas relevant to AI-driven multi-agent cybersecurity systems over the last decade. An increase in publications on reinforcement learning signifies its importance in developing adaptive attacker models. Anomaly detection is identified as an essential feature in defense and remains a strong growth area. There is also a steady increase in agent coordination and cyber range platform development, which illustrates the maturing of simulation environments and collaborative defense frameworks. Although there are fewer publications on incident response automation, it is attracting significant attention consistent with operational needs.

Fig. 4.


Feature-Focused Research Trend.

Current cyber range environments use static or rule-based simulation models. These models cannot represent dynamic attack strategies, nor can they simulate the multi-stage, coordinated attacks commonly used by modern adversaries3,4,32. Some current cyber ranges have employed limited automation or scripted autonomous agents36,37. However, these components do not employ advanced AI techniques such as reinforcement learning, which enables agents to adapt their behaviour based on interactions with the environment. At the same time, AI and MAS have each demonstrated the ability to model attacker and defender behavior adaptively10,13,15. Existing research tends to investigate these techniques in simplified synthetic network settings that are neither integrated with realistic cyber ranges nor connected to real datasets. Because of this limited integration between the two disciplines, there are few empirical tests, which makes it difficult to assess the performance and scalability of these methods in real-world cyber defence training17,45. Furthermore, publicly accessible labelled intrusion datasets such as CICIDS2017 and UNSW-NB15 have rarely been used in cyber range environments for training and testing adaptive agents in complex multi-agent experiments. This disconnect between experimental AI models and practical deployment environments undermines the reliability and generalizability of these models. Despite the advances described above, a critical gap remains in enabling fully adaptive, AI-driven multi-agent frameworks that realistically simulate evolving cyberattacks and autonomous incident response within operational cyber range environments. To our knowledge, no existing framework fully addresses the following:

  • Autonomous multi-agent simulation of cyber attack and defence behaviours using AI techniques.

  • Close integration of these adaptive MAS models in cyber range environments for dynamic and realistic scenarios.

  • Thorough validation of the models through extensive experimentation using realistic datasets, focusing on scalability, robustness and automated incident response.

This gap spans adaptive AI-based multi-agent system design, cyber range integration, and validation with real datasets, and closing it is crucial for improving cyber range accuracy and training quality. Therefore, this paper aims to fill the gap by creating an AI-driven multi-agent system embedded in a cyber range environment. It uses real datasets for training, testing, and validation to improve adaptability, simulation realism, and automated incident response.

Problem formulation

There is a growing need for cybersecurity training platforms that both simulate realistic adversary behaviours and provide automated incident response mechanisms. Current cyber range systems do neither, because they rely on static, rule-driven simulations that do not mimic the way real-world cyberattacks actively adapt and evolve.

More specifically, there is a great need for an AI-driven Multi-Agent System that can:

  • Generate and evolve complex cyber-attack scenarios autonomously, mirroring the tactics of adaptive contemporary adversaries.

  • Coordinate intelligent defender agents that automatically detect, analyze and respond to incidents within the same environment.

  • Be validated thoroughly in a cyber range environment, using authentic intrusion datasets, to ensure its realism, reliability and practicality for real-world cybersecurity training and research.

We surveyed 25 cybersecurity experts to validate the identified research gaps. We asked them to rate their agreement with key areas of our problem statement on a 5-point Likert scale. The results, shown in Table 2, indicate strong consensus in all areas of our problem statement, with average scores of at least 4.1 and statistical significance of p < 0.001, which supports the relevance and urgency of the stated problems. At least 75% of the experts agreed or strongly agreed with each of the key gaps in our problem statement. The low standard deviations indicate that the experts’ ratings were consistent with one another. These results provide statistical support for the claim that our problem statement represents genuine and significant gaps in current cyber range research and practice. To establish whether experts consider the stated problem to be significant, we formulated two opposing hypotheses:

  • Null Hypothesis (H0): Experts do not see the problem as significant.

  • Alternative Hypothesis (H1): Experts do see the problem as significant.

Table 2.

Statistical Validation of Research Questions.

Question | Avg. Score (1–5) | Std. Dev. | % Agree (≥ 4)
1. Adaptive multi-agent systems lacking in cyber range platforms | 4.3 | 0.7 | 80%
2. Reinforcement learning underutilized in attack and defense simulations | 4.1 | 0.8 | 75%
3. Current simulations overly rely on static or rule based models | 4.5 | 0.6 | 85%
4. Scalability and real time adaptability are challenges in cyber range environments | 4.2 | 0.7 | 78%
5. Critical need for AI + multi-agent autonomy frameworks for cyber range training | 4.4 | 0.6 | 82%

Assuming neutral mean µ0 = 3 (neither agree nor disagree):

For all five questions, as shown in Table 3, the t-statistics are very high and the p-values are far below the typical significance level of 0.05. There is therefore strong statistical evidence against the null hypothesis: the experts regard the stated problem as significant.

Table 3.

t-Statistics and p-Values.

Question | t-Statistic | p-value | Interpretation
1 | 9.6 | < 0.001 | Strongly reject H0, accept gap
2 | 7.8 | < 0.001 | Strongly reject H0
3 | 11.2 | < 0.001 | Strongly reject H0
4 | 8.5 | < 0.001 | Strongly reject H0
5 | 10.3 | < 0.001 | Strongly reject H0
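
As an illustration of how a test against the neutral mean µ0 = 3 can be computed, the following minimal sketch applies the standard one-sample t-statistic to the summary values in Table 2. It assumes that the reported standard deviations are sample standard deviations and that all 25 experts answered every question; neither point is stated in the text. Because Table 2 reports rounded means and standard deviations, values computed this way are approximate and need not reproduce Table 3 exactly.

```python
import math

# Summary statistics as reported in Table 2: question -> (mean, standard deviation)
TABLE2 = {
    1: (4.3, 0.7),
    2: (4.1, 0.8),
    3: (4.5, 0.6),
    4: (4.2, 0.7),
    5: (4.4, 0.6),
}
N_EXPERTS = 25      # number of surveyed experts, taken from the text
NEUTRAL_MEAN = 3.0  # mu_0, the neutral point of the 5-point Likert scale


def one_sample_t(mean, std, n, mu0):
    """One-sample t-statistic: (mean - mu0) / (std / sqrt(n)), with n - 1 degrees of freedom."""
    return (mean - mu0) / (std / math.sqrt(n))


t_values = {
    question: one_sample_t(mean, std, N_EXPERTS, NEUTRAL_MEAN)
    for question, (mean, std) in TABLE2.items()
}

for question, t in t_values.items():
    print(f"Question {question}: t = {t:.2f} (df = {N_EXPERTS - 1})")
```

The p-value for each question would then be read from a t-distribution with 24 degrees of freedom.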

Research objectives and hypothesis

To this end, our objective is to develop a novel AI-powered MAS that addresses these problems. It will autonomously simulate complex and dynamic cyberattacks and coordinate responses to them in a controlled, simulated cyber range environment, as shown in Fig. 5. The MAS will create authentic simulations based on real data such as threat intelligence, network logs, and malware behavior. This integrated approach aims to overcome the limitations of existing systems by enabling autonomous, realistic, and adaptive cyber exercise scenarios that improve preparedness and response effectiveness in real-world environments.

Fig. 5.


AI-Enabled Multi-Agent System (MAS).

The objectives of this paper are shown in Fig. 6. The overall architecture will include autonomous attacker and defender agents. The attacker agent will use a reinforcement learning (RL) strategy that evolves dynamically based on defender behavior and network conditions. An intelligent defender agent will be developed that uses ML to detect anomalies and to make decisions about the automatic detection, analysis, and response to cyberattacks and incidents based on real-time adaptive intelligence. The AI agents will be integrated into an existing cyber-range platform to provide realistic and interactive modelling of dynamic attack and defence responses. Real-world intrusion detection datasets, i.e., CICIDS2017 and UNSW-NB15, will be used to train, validate, and test the system’s ability to accurately identify, classify, and respond to cyberattacks. The proposed system will be evaluated through extensive experimentation to measure improvements in simulation realism, detection accuracy, response effectiveness, scalability, and adaptability compared to traditional static or rule-based cyber-range models. Through this structured approach, cyber-range capabilities are expected to be enhanced by dynamic AI techniques and realistic training data.

Fig. 6.


Proposed Approach.

Based on the stated objectives, the hypotheses tested in this study are as follows:

H1

The AI-driven Multi-Agent System outperforms traditional static and rule-based cyber range simulations in detection accuracy.

H2

The AI-driven Multi-Agent System achieves faster incident response times relative to baseline approaches.

H3

The AI-driven Multi-Agent System maintains scalability by sustaining performance across varying network sizes.

Research methodology

This study follows a systematic, iterative process of design, implementation, and evaluation in order to develop, integrate and validate an AI-driven Multi-Agent System. The methodology combines computational modelling and simulation with empirical evaluation, and it involves integrating the agents into an existing open-source cyber range platform. This allows realistic simulation of various attack scenarios as well as rigorous evaluation of their performance using the realistic datasets CICIDS201717 and UNSW-NB1518. Before the models were trained and tested, the data were pre-processed by normalization and feature engineering and were then split into training and testing subsets using stratification, so that all evaluations were representative and unbiased.

The MAS architecture has an attacker agent implemented with reinforcement learning, using Deep Q-Networks and Policy Gradient methods. This enables the attacker agent to perform dynamic, multi-stage cyberattacks that adapt over time. The MAS also includes defender agents. These agents are equipped with both supervised and unsupervised machine learning models, i.e., Random Forests and Autoencoders, to detect anomalies. The defender agents have automated decision-making modules that respond to incidents as they occur during an attack. Realistic communication protocols were defined between the attackers and defenders.

The cyber-range environment used in the study includes a number of virtualized network topologies that represent typical enterprise configurations. Parameters such as traffic volume, number of hosts and vulnerabilities can be configured in each topology. Several experimental setups were established to reflect varying levels of attack complexity and defensive posture. Baseline comparisons were made against static and rule-based adversary/defence models. Several performance metrics were evaluated, including detection accuracy (precision, recall, true positive rate), response time, attack complexity and system scalability. Cross-validation and hold-out testing were performed on the real datasets to validate the model. Stress testing and repeated trials were also conducted to assess the stability and reliability of the system.
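
The detection metrics named above follow directly from confusion-matrix counts. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not results from this study.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision and recall (true positive rate) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # share of raised alerts that were real attacks
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # share of real attacks that were detected
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts, for illustration only (not taken from the paper).
example = detection_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Recall and true positive rate are the same quantity, so the sketch reports it once.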

User acceptance testing was also conducted with cybersecurity experts to assess the realism and usability of the system. Statistical significance testing (paired t-tests and ANOVA) was performed to determine whether the results obtained with the AI-driven MAS in simulating cyberattacks and responses were significantly better than those achieved with static and rule-based systems. The AI and MAS components were developed in Python, and the TensorFlow/PyTorch frameworks were used to implement the reinforcement learning and anomaly detection components. The AI and MAS components were then embedded into the cyber range platform using custom tools and API connections. Visualization tools were used for real-time monitoring and report generation, which allowed the system to be tracked clearly and quickly.

Model development

This section discusses the architecture, formalization and deployment of the AI-based MAS. The MAS is designed to implement adaptive cyberattacks and coordinated incident responses in a controlled cyber range environment. Attacker and defender agents are part of the MAS architecture. These entities have decision-making and self-learning capabilities, which allow them to interact dynamically and evolve in response to environmental and adversarial feedback. An integrated MAS-based framework in the cyber domain is presented in Fig. 7. It emphasizes the agent lifecycle, communication policies (e.g. ACL messages written in accordance with FIPA standards) and information flow. Figure 8 illustrates the steps for developing the MAS, which cover the attacker and defender agents, their interactions, training and testing. Figure 9 illustrates the AI algorithm for shaping the AI-powered MAS model to implement adaptive cyberattacks and handle incident responses.

Fig. 7.


MAS integrated framework within the cyber-range.

Fig. 8.


Step-by-Step Procedure for Developing the AI-driven Multi-Agent System (MAS).

Fig. 9.


Experimental Setup and Results.

Algorithm 1.


AI-Driven Multi-Agent System Development for Adaptive Cyber-Attack Simulation and Incident Response.

Multi-agent system architecture

In the proposed MAS, the attacker agent creates cyberattacks and learns about the environment and its defensive mechanisms through multiple courses of action and subsequent steps with varied tactics and strategies. Defender agents monitor network behavior. If anomalies are detected, relevant incident response actions are performed to mitigate and neutralize ongoing attacks and protect the system.

The attacker agent attempts to create attack plans that are effective but remain undetected. Its decision problem is modelled as a Markov decision process (MDP). The Markov property, which assumes that the next state is determined solely by the current state and action, simplifies the MDP and enables the use of reinforcement learning algorithms to train the attacker agent to develop multi-step, adaptive attack plans autonomously. Through reinforcement learning, the attacker agent interacts with the environment over repeated steps and improves its policy as it gains more exposure to the defender. During training, the algorithm is used to find the best attack strategy. The state representation captures the attacker’s current knowledge of the network, including known vulnerabilities and observed defender responses, whereas the action space contains a wide variety of attack tactics. The reward structure provides incentives for successful exploitation and for stealthy attacks that avoid detection, and it penalizes failed attacks and detection events. Throughout the training process, the attacker agent tries various tactics to maximize the cumulative reward and develops complex, multi-step attack behaviours that continue to evolve as the defensive mechanisms evolve.
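
The state, action and reward structure described above can be sketched with tabular Q-learning on a toy environment. This is a deliberately simplified stand-in: the methodology names Deep Q-Networks and Policy Gradient methods, whereas the sketch uses a lookup table, and the stage count, tactic names, probabilities and reward values are all hypothetical.

```python
import random

ACTIONS = ("stealthy_exploit", "noisy_exploit")  # hypothetical attack tactics
N_STAGES = 3  # hypothetical attack-chain length; the state is the number of stages completed

# Hypothetical per-action probabilities: (advance one stage, be detected by the defender)
ACTION_MODEL = {"stealthy_exploit": (0.6, 0.05), "noisy_exploit": (0.9, 0.40)}


def step(state, action, rng):
    """Toy environment transition: returns (next_state, reward, done)."""
    p_advance, p_detect = ACTION_MODEL[action]
    if rng.random() < p_detect:
        return state, -5.0, True      # detection event: penalised, episode ends
    if rng.random() < p_advance:
        nxt = state + 1
        if nxt == N_STAGES:
            return nxt, 10.0, True    # final objective reached without detection
        return nxt, 1.0, False        # undetected progress to the next stage
    return state, -0.5, False         # failed attempt: small penalty


def train_attacker(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1, max_steps=50, seed=0):
    """Epsilon-greedy tabular Q-learning over the toy attack chain."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STAGES + 1)}
    for _episode in range(episodes):
        state = 0
        for _step in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)               # explore
            else:
                action = max(q[state], key=q[state].get)   # exploit current estimate
            nxt, reward, done = step(state, action, rng)
            # Markov property: the update target depends only on (state, action, next state).
            target = reward if done else reward + gamma * max(q[nxt].values())
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


q_table = train_attacker()
policy = {s: max(q_table[s], key=q_table[s].get) for s in range(N_STAGES)}
print(policy)
```

The learned policy depends on how the hypothetical detection penalty compares with the progress rewards, which is the trade-off between effect and stealth that the reward structure is meant to encode.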

A defender agent uses machine learning to detect malicious activity on the network. Supervised learning models are trained on labeled traffic features extracted from the CICIDS2017 and UNSW-NB15 datasets to categorize known attack types. Unsupervised learning models are employed to identify zero-day attacks by assigning anomaly scores to abnormal network behavior, thereby detecting previously unknown threats. When defender agents identify a threat, they initiate incident response actions based on either rule-based policies or learned decision-making processes. These actions include isolating compromised hosts, blocking malicious traffic, applying security patches and notifying administrators. This limits the damage from an attack while maintaining network integrity. In addition, communication among multiple defender agents allows them to share alerts and collaborate on mitigation efforts.
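
The path from anomaly score to response action can be sketched as follows. The sketch replaces the autoencoder with a much simpler per-feature z-score against a benign baseline, and pairs it with a rule-based response policy; the thresholds, action names and feature values are hypothetical.

```python
import statistics


def fit_baseline(benign_rows):
    """Per-feature mean and standard deviation estimated from benign traffic only."""
    columns = list(zip(*benign_rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # fall back to 1.0 for constant features
    return means, stds


def anomaly_score(row, baseline):
    """Largest absolute z-score across features (a simple stand-in for reconstruction error)."""
    means, stds = baseline
    return max(abs(x - m) / s for x, m, s in zip(row, means, stds))


def choose_response(score, alert_threshold=3.0, isolate_threshold=6.0):
    """Rule-based response policy with hypothetical thresholds and action names."""
    if score >= isolate_threshold:
        return "isolate_host"
    if score >= alert_threshold:
        return "block_traffic_and_notify_admin"
    return "no_action"


# Hypothetical flow features: [packets per flow, flow duration in seconds]
benign = [[10, 0.5], [12, 0.6], [11, 0.4], [9, 0.5]]
baseline = fit_baseline(benign)
for flow in ([11, 0.5], [15, 0.5], [40, 0.5]):
    score = anomaly_score(flow, baseline)
    print(flow, round(score, 2), choose_response(score))
```

A learned decision-making process, which the text names as the alternative to rule-based policies, would replace `choose_response` with a trained policy.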

Agents communicate via a standard protocol that enables attacker agents to assess defensive measures and adjust their plans accordingly. At the same time, it allows defender agents to share alert information and coordinate responses. The communication framework allows the agents to operate in parallel and asynchronously. This enhances the scalability and efficiency of the system as the number of real-time interactions among the autonomous agents grows.

Payload metadata and temporal characteristics of the raw data from the CICIDS2017 and UNSW-NB15 datasets are transformed into features that serve as inputs for the agents. Normalizing the data and reducing the dimensionality of the feature space increase learning efficiency by making the data more manageable for modelling. Where applicable, class-balancing methods are also applied to reduce the disparity between the distributions of benign and attack traffic. This leads to more robust detection and decision making by the agents.

The agents are trained over repeated iterations to improve their behavior. Their offensive and defensive performance is evaluated with multiple measures, such as attack success rate, detection precision, recall and average response time. To validate that the agents generalize and are robust, they are evaluated separately on test datasets and on previously unseen scenarios within the cyber range simulator. To mitigate overfitting and maximize agent performance, early stopping and hyperparameter optimization are applied throughout the training process.

Model implementation

This section describes how the model was implemented in order to illustrate its practical application. It provides evidence that the model was developed and trained correctly and that it is ready to be deployed in a full-scale cyber-range environment.

Implementation overview

  • The attacker and defender agents are developed as modular Python-based components that integrate Artificial Intelligence models with cyber range APIs.

  • The attacker agents use reinforcement learning to adaptively select multi-stage attack sequences.

  • The defender agents use both supervised and unsupervised machine learning techniques for anomaly detection and rule-based automated response policies.

  • Inter-agent communication is handled by ZeroMQ with JSON-formatted message exchanges, enabling coordination between the agents.

  • Cyber range APIs facilitate agent-environment interaction through event hooks and control interfaces.
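As a minimal sketch of the JSON-formatted message exchange listed above, the following Python snippet builds, serializes and parses one alert message. The field names (`sender`, `msg_type`, `payload`, `timestamp`), the class `AgentMessage` and the helpers `encode` and `decode` are assumptions made for illustration, not the schema used in the framework. The ZeroMQ transport is deliberately left out so that the example runs with the standard library only.

```python
import json
import time
from dataclasses import dataclass, asdict, field


@dataclass
class AgentMessage:
    """Hypothetical envelope for one inter-agent message."""
    sender: str        # illustrative agent ID, e.g. "defender-1"
    msg_type: str      # illustrative type, e.g. "alert" or "action"
    payload: dict      # free-form, JSON-serializable content
    timestamp: float = field(default_factory=time.time)


def encode(msg: AgentMessage) -> bytes:
    """Serialize a message to UTF-8 JSON bytes, ready for any byte transport."""
    return json.dumps(asdict(msg), sort_keys=True).encode("utf-8")


def decode(raw: bytes) -> AgentMessage:
    """Parse JSON bytes back into a message; raises if fields are missing."""
    return AgentMessage(**json.loads(raw.decode("utf-8")))


# Example: a defender agent reports a suspicious host (placeholder values).
alert = AgentMessage("defender-1", "alert", {"host": "10.0.0.7", "score": 0.93})
round_trip = decode(encode(alert))
```

Keeping serialization separate from the transport means the same byte payload could be handed to a messaging socket without changing the message definition.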

Dataset preparation and feature engineering

Both datasets (CICIDS2017 and UNSW-NB15) were pre-processed as shown in Table 4 to give balanced, feature-rich training data for the defender anomaly detection modules and to inform the attacker state representations. Pre-processing extracted a rich set of features, handled missing values, and balanced the classes using SMOTE. This reduced potential biases and gave the defender agents a representative sample on which to perform anomaly detection. The stratified train/test split enables robust model validation.

Table 4.

Dataset Preprocessing and Feature Engineering.

Step | Raw Dataset Size | Feature Extraction | Missing Data Handling | Class Balancing | Train-Test Split
Description | CICIDS2017: ~80 GB raw traffic logs | Flow features: packet counts, durations, flags | Removal and imputation of missing values | SMOTE oversampling for minority attack classes | Stratified split 80%/20%
Result | Initial dataset acquired | Extracted ~85 features | < 0.5% rows affected | Balanced dataset (~50k samples) | 40k training/10k testing
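The stratified 80%/20% split in Table 4 can be illustrated with a short standard-library sketch. The function name `stratified_split`, the fixed seed and the toy data are assumptions for illustration. The SMOTE oversampling step is not shown, and this is not the preprocessing code used in the study.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, test_fraction=0.2, seed=0):
    """Split the data so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # shuffle within the class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    train = [(samples[i], labels[i]) for i in train_idx]
    test = [(samples[i], labels[i]) for i in test_idx]
    return train, test


# Toy data: 80 benign and 20 attack flows (feature values are placeholders).
samples = [[float(i)] for i in range(100)]
labels = ["benign"] * 80 + ["attack"] * 20
train, test = stratified_split(samples, labels)
```

Splitting per class, rather than over the whole dataset at once, keeps the benign-to-attack ratio the same in the training and test partitions.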

Developing an attacker agent model

In our RL model, the attacker agent is formulated as a Markov Decision Process (MDP), denoted as M = (S, A, P, R, γ), where:

State Space (S): The attacker’s perception of the current environment, which represents all relevant information about the network, including known vulnerabilities in network hosts, recent defensive actions (e.g., alerts, host isolation), and network configuration information. This consolidated state space allows the agent to have context that enables it to plan adaptive attack sequences.

Action Space (A): The attacker selects from a finite discrete set of actions that are representative of typical cyberattack tactics, namely:

  • Scan: Identify potential vulnerable hosts and services on the network.

  • Exploit: Attempt to exploit an identified vulnerability.

  • Privilege Escalation: Escalate privileges to a higher level of access for compromised hosts.

  • Evade: Use evasion techniques to lower the detection risk of the attack; e.g., change attack vectors or timing.

Reward Function (R): The reward function is constructed to promote successful, stealthy attacks while penalizing the attacker for failed attempts and for being detected.

[Equation image not reproduced: definition of the reward function R]

This reward structure gives the agent an incentive to find multi-step, stealthy attack paths that maximize damage while avoiding detection.

Hyper-Parameters: The DQN architecture uses three fully connected layers with 128, 64 and 32 neurons. All layers use ReLU activation. Key training hyper-parameters are as follows:

Learning rate: 0.001.

Discount factor (γ): 0.95, controlling the balance between immediate and future rewards.

Exploration strategy: ε-greedy with ε linearly decaying from 0.8 to 0.1 over 5000 training episodes.

Replay buffer size: 100,000 experience tuples.

Batch size: 64 samples per training iteration.

Target network synchronization: every 1000 steps.
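To make the hyper-parameters listed above concrete, the sketch below implements a linear ε schedule from 0.8 to 0.1 over 5000 episodes, ε-greedy selection over the four attacker actions, and the standard one-step DQN target with γ = 0.95. The function and constant names are assumptions for illustration. The neural network, replay buffer and environment are omitted, so this is not the training code used in the study.

```python
import random

ACTIONS = ["scan", "exploit", "privilege_escalation", "evade"]

GAMMA = 0.95                     # discount factor
EPS_START, EPS_END = 0.8, 0.1    # exploration range
DECAY_EPISODES = 5000            # episodes over which epsilon decays


def epsilon_at(episode: int) -> float:
    """Linearly interpolate epsilon from EPS_START to EPS_END, then hold."""
    frac = min(max(episode, 0), DECAY_EPISODES) / DECAY_EPISODES
    return EPS_START + frac * (EPS_END - EPS_START)


def select_action(q_values, episode, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else take argmax Q."""
    if rng.random() < epsilon_at(episode):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """One-step target: r if terminal, else r + gamma * max over next Q-values."""
    return reward if done else reward + gamma * max(next_q_values)


# Example: Q-value estimates for the four actions in some state (placeholders).
q_values = [0.1, 0.9, 0.3, 0.2]
chosen = ACTIONS[select_action(q_values, episode=5000)]
```

In a full DQN, `next_q_values` would come from the periodically synchronized target network, which is what keeps the regression target stable between updates.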

Stopping and Training Criteria:

Training runs for up to 5000 episodes, during which the agent interacts continuously with the cyber range environment. Two key metrics are monitored: average reward per episode and attack success rate. Training is terminated early if both metrics plateau, which indicates that the policy has converged and that consistent attack patterns are being used.
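One way to express the plateau-based stopping rule is to compare moving averages of each monitored metric over consecutive windows. The sketch below is an illustration under stated assumptions: the window length of 200 episodes, the 1% tolerance and the function names are hypothetical, since the text does not specify how a plateau is detected.

```python
def has_plateaued(history, window=200, tolerance=0.01):
    """Return True when the mean of the latest window improves on the
    previous window by less than `tolerance` (relative change)."""
    if len(history) < 2 * window:
        return False                       # not enough data to compare
    recent = sum(history[-window:]) / window
    previous = sum(history[-2 * window:-window]) / window
    if previous == 0:
        return recent == 0
    return (recent - previous) / abs(previous) < tolerance


def should_stop(rewards, success_rates, max_episodes=5000):
    """Stop at the episode budget, or earlier if both metrics plateau."""
    if len(rewards) >= max_episodes:
        return True
    return has_plateaued(rewards) and has_plateaued(success_rates)


# Placeholder histories: one metric still rising, one flat.
rising = [float(i) for i in range(1, 401)]
flat = [50.0] * 400
```

Requiring both metrics to plateau avoids stopping while, for example, the reward has levelled off but the success rate is still improving.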

This selection of parameters enables the attacker agent to automatically develop and use complex multi-step cyberattacks that adapt to changes in defender behavior, which enhances the realism and efficacy of cyber range simulations.

Intermediate training metrics are shown in Table 5.

Table 5.

Attacker agent training progression over 5000 episodes.

Training Episode | Average Reward | Attack Success Rate (%) | Exploration Rate (ε)
1000 | 25.3 | 18.7 | 0.8
2500 | 56.1 | 46.4 | 0.6
4000 | 92.7 | 79.1 | 0.3
5000 | 101.4 | 84.5 | 0.1

The rising average rewards and success rates show that the attacker agents steadily learn effective attacks. The decreasing exploration rate shows that the agents increasingly rely on their learned policies rather than on random exploration. Together, these trends indicate the stable policy development needed for practical attack simulation.

Developing a defender agent model

Model: An Autoencoder was combined with a Random Forest classifier to create an anomaly detector.

Training: The model was trained with scikit-learn using 10-fold cross-validation on the training datasets.

Random Forest Classifier parameters are as follows:

  • Number of trees: 100.

  • Maximum tree depth: 20.

  • Criterion: Gini impurity for split quality.

  • Training: Performed using scikit-learn with 10-fold cross-validation on labeled intrusion datasets (CICIDS2017 and UNSW-NB15) to optimize parameters and prevent overfitting.

Autoencoder Neural Network parameters are as follows:

  • Activation functions: Sigmoid for hidden layers, linear output.

  • Loss function: Mean Squared Error (MSE) minimized on benign traffic to learn normal network behavior patterns.

  • Training: Early stopping is applied if validation loss does not improve over 10 consecutive epochs, preventing overfitting and ensuring generalization.
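Because the autoencoder is trained to minimize MSE on benign traffic, a flow that it reconstructs poorly can be treated as anomalous. The sketch below shows that scoring step only. The thresholding rule (mean plus three standard deviations of the benign errors), the function names and the sample numbers are assumptions for illustration, and the autoencoder itself is not implemented here.

```python
from statistics import mean, pstdev


def reconstruction_error(original, reconstructed):
    """Mean squared error between a feature vector and its reconstruction."""
    return mean((x - y) ** 2 for x, y in zip(original, reconstructed))


def fit_threshold(benign_errors, k=3.0):
    """Hypothetical rule: flag errors above mean + k * std of benign errors."""
    return mean(benign_errors) + k * pstdev(benign_errors)


def is_anomalous(original, reconstructed, threshold):
    """True when the reconstruction error exceeds the fitted threshold."""
    return reconstruction_error(original, reconstructed) > threshold


# Placeholder errors standing in for a trained autoencoder's output
# on benign validation traffic.
benign_errors = [0.010, 0.012, 0.009, 0.011, 0.013]
threshold = fit_threshold(benign_errors)
```

The threshold is fitted on benign data only, which matches the idea of learning normal behavior and flagging deviations from it.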

Validation-set metrics are shown in Table 6.

Table 6.

Defender anomaly detection validation metrics.

Model Component | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
Random Forest | 91.2 | 89.8 | 90.5 | 90.1
Autoencoder | 87.5 | 85.3 | 86 | 85.6
Ensemble (weighted) | 93 | 91.7 | 92.1 | 91.9

The ensemble of the Random Forest and the Autoencoder outperformed each model on its own, with a good balance of precision and recall that helps minimize both false positives and false negatives. The high F1-score indicates that the supervised and unsupervised components complement each other in anomaly detection across multiple cyber scenarios.
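A weighted ensemble of this kind can be sketched as a weighted average of the two model outputs. The weights (0.6 and 0.4), the decision threshold (0.5), the assumption that both scores are scaled to [0, 1], and the function names are all hypothetical, because the text reports only that the ensemble is weighted.

```python
def ensemble_score(rf_attack_prob, ae_anomaly_score, w_rf=0.6, w_ae=0.4):
    """Weighted average of two scores, each assumed to lie in [0, 1]."""
    if not (0.0 <= rf_attack_prob <= 1.0 and 0.0 <= ae_anomaly_score <= 1.0):
        raise ValueError("both scores must be in [0, 1]")
    return (w_rf * rf_attack_prob + w_ae * ae_anomaly_score) / (w_rf + w_ae)


def classify(rf_attack_prob, ae_anomaly_score, threshold=0.5):
    """Label a flow as 'attack' when the combined score reaches the threshold."""
    score = ensemble_score(rf_attack_prob, ae_anomaly_score)
    return "attack" if score >= threshold else "benign"


# A flow the classifier is unsure about but the autoencoder finds unusual.
label = classify(rf_attack_prob=0.3, ae_anomaly_score=0.9)
```

In this toy case the anomaly score tips the decision, which illustrates how an unsupervised component can help with attack types the supervised classifier has not seen.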

Hyperparameter tuning for the attacker and defender models is conducted using grid search and cross-validation on the training datasets to identify the configurations that yield the best performance. Training is stopped early once validation loss stabilizes or policy reward converges, which prevents overfitting and supports robust model generalization.

Integration with cyber range environment

An API with custom hooks was developed for each of the cyber range environments to allow agent communication and action execution. Agent modules were containerized using Docker to simplify the deployment of multiple agents. Secure, asynchronous messaging channels were set up using ZeroMQ. In preliminary integration testing, round-trip communication from agent to environment was achieved with action execution times under 100 ms, indicating effective communication between agents and environment.

Intermediate integration testing

Integration tests verified that agent-to-cyber-range communication stays within the required latency and reliability standards, ensuring real-time responsiveness. Resource usage is stable and sustainable for multi-agent deployments. The high correctness of scripted attack execution indicates reliable agent operation before full-scale simulation, as shown in Table 7.

Table 7.

Intermediate system performance and integration quality indicators.

Test Scenario | Result/Metric | Acceptance Threshold
Agent Action Execution Latency | Avg. 85 ms (std dev 15 ms) | < 150 ms
Message Delivery Reliability | 99.8% successful delivery | > 99%
System Resource Usage (per agent) | CPU 35%, RAM 140 MB | Within allotted limits
Single-Agent Cyber-Range Interaction | Correct execution of 95% scripted attack steps | > 90%
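The quantitative acceptance checks in Table 7 can be written as a small table-driven test. The metric labels and the function name `evaluate` are assumptions for illustration. The resource-usage row is left out because its threshold is stated only qualitatively.

```python
import operator

# (metric, measured value, comparison, threshold), taken from Table 7.
CHECKS = [
    ("action latency (ms)",        85.0, operator.lt, 150.0),
    ("message delivery (%)",       99.8, operator.gt, 99.0),
    ("scripted steps correct (%)", 95.0, operator.gt, 90.0),
]


def evaluate(checks):
    """Return {metric: passed} for each acceptance check."""
    return {name: compare(value, limit) for name, value, compare, limit in checks}


results = evaluate(CHECKS)
all_passed = all(results.values())
```

Storing the comparison operator with each row keeps "lower is better" and "higher is better" metrics in the same structure.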

Summary of implementation outcomes

Documenting the implementation and its intermediate results in this detail provides assurance of methodological rigor and a solid base for interpreting the experimental results in the following sections. Table 8 summarizes each significant phase of the model implementation process. Data pre-processing produced a comprehensive, well-balanced dataset with very little missing data, suitable for training the defender agents. Over the course of 5,000 episodes, the attacker agents improved considerably: attacks succeeded more than 84% of the time, and exploratory behavior declined as the agents converged toward a specific policy. The defender’s ensemble model achieved a high F1-score (over 91%) when detecting anomalies. Communication between the cyber-range environment and the integrated system met all performance requirements, with an average action latency of 85 ms, message reliability greater than 99%, and resource utilization within the established limits. Finally, scripted attack steps were executed correctly 95% of the time, which supports the robustness of the fully integrated system and its readiness for deployment in full-scale simulations.

Table 8.

Model implementation data summary.

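Step No | Name of the Step | Key Metrics | Value
1 | Dataset Preparation | Raw Data Size | ~80 GB (CICIDS2017)
1 | Dataset Preparation | Features Extracted | ~85
1 | Dataset Preparation | Missing Data (% rows affected) | < 0.5%
1 | Dataset Preparation | Balanced Dataset Size | ~50,000 samples
1 | Dataset Preparation | Train/Test Split | 40,000/10,000
2 | Attacker Agent Training | Training Episodes | 5,000
2 | Attacker Agent Training | Average Reward (Start → End) | 25.3 → 101.4
2 | Attacker Agent Training | Attack Success Rate (%) (Start → End) | 18.7% → 84.5%
2 | Attacker Agent Training | Exploration Rate ε (Start → End) | 0.8 → 0.1
3 | Defender Agent Validation | Random Forest F1-Score (%) | 90.1
3 | Defender Agent Validation | Autoencoder F1-Score (%) | 85.6
3 | Defender Agent Validation | Ensemble Model F1-Score (%) | 91.9
3 | Defender Agent Validation | Ensemble Model Accuracy (%) | 93
4 | Communication Latency | Agent Action Execution Latency (avg ± std) | 85 ms ± 15 ms
4 | Communication Latency | Acceptance Threshold | < 150 ms
5 | Message Reliability | Successful Message Delivery (%) | 99.80%
5 | Message Reliability | Acceptance Threshold | > 99%
6 | System Resource Usage | CPU Usage per Agent | 35%
6 | System Resource Usage | RAM Usage per Agent | 140 MB
6 | System Resource Usage | Allotted Limits | Within limits
7 | Scripted Attack Execution | Correct Execution Rate (%) | 95%
7 | Scripted Attack Execution | Acceptance Threshold | > 90%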

Experimental setup and results

The experiments are designed to demonstrate the efficiency of the proposed AI-driven Multi-Agent System (MAS). As depicted in Fig. 9, adaptive cyberattacks and coordinated incident responses were simulated within a cyber range. The aim is to show improved detection accuracy and reduced response latency relative to rule-based and static systems, and to confirm the scalability and robustness of the system under different network sizes and attack complexities.

Environment configuration

CyDER 2.0, an open-source cloud-based cyber range simulator, is used. Three network topologies are defined to study scalability: Small (50 hosts), Medium (120 hosts) and Large (200 hosts). Each network has two attacker agents, which use Deep Q-Network reinforcement learning to execute attacks, and three defender agents, which are equipped with anomaly detection and response capabilities. Training and validation data are drawn from the flow features of CICIDS2017 and UNSW-NB15. Three scenarios are examined, each with a specific attack type. Scenario A consists of basic single-stage attacks, such as brute forcing or port scanning. Scenario B uses complex multi-stage attacks featuring reconnaissance, exploitation, privilege escalation, lateral movement, and data exfiltration. Scenario C combines multiple simultaneous attack modes launched by the attacker agents. Two evaluation baselines are used: a static cyber range with predefined scripted attacks, and a rule-based multi-agent system (MAS) in which agents follow a fixed set of attack and defense rules without any adaptive learning algorithm. Detection performance is evaluated with precision, recall, and F1-score. Response latency is the average time between the detection of an attack and the start of the response. The Attack Complexity Score measures the number of unique attack stages executed successfully and the degree of multi-stage evolution. Resource usage is measured as CPU and memory consumption per agent.

Experimental results

Detection performance by scenario and network size

For the Small network with basic attacks, the AI-driven MAS achieves the best results, with 95.8% precision, 93.2% recall, and a 94.5% F1-score. The rule-based MAS and the static system have lower scores (Table 9). With the more complex attacks in the Medium network, the AI-driven MAS again ranks first with an F1-score of 89.9%, while the rule-based MAS drops to 75.5% and the static system to 59.2%. In the Large network with mixed attacks, all models achieve lower scores. The AI-driven MAS performs best with an F1-score of 87.9%, followed by the rule-based MAS (71.8%) and the static system (56.1%).

Table 9.

Detection accuracy by network size, scenario, and model.

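Network Size | Scenario | Model | Precision (%) | Recall (%) | F1-Score (%)
Small (50) | A | AI-driven MAS | 95.8 | 93.2 | 94.5
Small (50) | A | Rule-based MAS | 81.3 | 78.9 | 80.1
Small (50) | A | Static | 70.2 | 65.5 | 67.8
Medium (120) | B | AI-driven MAS | 91.2 | 88.7 | 89.9
Medium (120) | B | Rule-based MAS | 77.4 | 73.8 | 75.5
Medium (120) | B | Static | 61.5 | 57.1 | 59.2
Large (200) | C | AI-driven MAS | 89.6 | 86.3 | 87.9
Large (200) | C | Rule-based MAS | 73.1 | 70.5 | 71.8
Large (200) | C | Static | 58.4 | 54 | 56.1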

As the size of the network increases, detection accuracy decreases slightly because larger networks are more complex and diverse. Nevertheless, across all tested network sizes the AI-driven MAS outperforms the static and rule-based systems on both precision and recall, with an F1-score at least 14 points higher than that of the rule-based MAS in Table 9. This performance can be attributed to the scalability of the architecture, which lets multiple agents operate in parallel and uses distributed processing to handle large networks.
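The F1-scores in Table 9 follow from precision and recall through the harmonic mean, F1 = 2PR / (P + R). As a quick consistency check, the sketch below recomputes the AI-driven MAS values from the table; the function name `f1_score` and the dictionary layout are illustrative choices.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (%) of the AI-driven MAS, copied from Table 9.
ai_driven = {"Small": (95.8, 93.2), "Medium": (91.2, 88.7), "Large": (89.6, 86.3)}
f1_by_size = {size: round(f1_score(p, r), 1) for size, (p, r) in ai_driven.items()}
# f1_by_size == {"Small": 94.5, "Medium": 89.9, "Large": 87.9}
```

The recomputed values match the F1 column of Table 9 for the AI-driven MAS to one decimal place.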

Average response latency (seconds)

Table 10 shows that the AI-driven multi-agent system (MAS) has the lowest response latency across all network sizes and attack scenarios: 4.2 s for the small network, 5.6 s for the medium network, and 6.1 s for the large network. The rule-based MAS is slower, ranging from 6.5 to 9.5 s. The static cyber range is slowest, starting at 12.1 s for the small network and rising to 15.8 s and 18.4 s as the network becomes larger. The AI-driven MAS therefore detects and responds faster than the other models. Responding quickly reduces harm during an attack, which makes the AI-driven MAS particularly useful for large and complex networks, where fast action matters most.

Table 10.

Average response latency for different models and scenarios.

Network Size | Small (50) | Medium (120) | Large (200)
Scenario | A | B | C
AI-Driven MAS | 4.2 | 5.6 | 6.1
Rule-Based MAS | 6.5 | 8.3 | 9.5
Static Cyber Range | 12.1 | 15.8 | 18.4

The response time benchmarks referenced throughout this study are grounded in cybersecurity operational standards, and prior work indicates that automated incident response systems achieving sub-10-second latency provide timely mitigation to limit attack progression51. The AI-driven MAS maintains latency below this threshold for all tested network sizes, highlighting its practical viability for rapid detection and response in realistic cybersecurity scenarios.
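The latency figures in Table 10 can be turned into relative reductions and checked against the 10-second threshold with a few lines of arithmetic. The variable and function names below are illustrative; the numbers are copied from the table.

```python
# Average response latency in seconds, copied from Table 10.
LATENCY = {
    "AI-driven MAS":      {"Small": 4.2,  "Medium": 5.6,  "Large": 6.1},
    "Rule-based MAS":     {"Small": 6.5,  "Medium": 8.3,  "Large": 9.5},
    "Static cyber range": {"Small": 12.1, "Medium": 15.8, "Large": 18.4},
}


def reduction_pct(baseline, improved):
    """Relative latency reduction of `improved` versus `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


vs_rule_based = {
    size: round(reduction_pct(LATENCY["Rule-based MAS"][size], ai), 1)
    for size, ai in LATENCY["AI-driven MAS"].items()
}
# vs_rule_based == {"Small": 35.4, "Medium": 32.5, "Large": 35.8}

under_10_s = all(v < 10.0 for v in LATENCY["AI-driven MAS"].values())
```

Relative to the rule-based MAS, the reduction is between roughly 32% and 36% for every network size, and all AI-driven MAS latencies stay below 10 s.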

Attack complexity score

The Attack Complexity Score shows how many different types of attack stages the attacker agents executed successfully in a simulation. Each attack stage is typically associated with a specific tactic, such as reconnaissance, exploitation, privilege escalation, or lateral movement. Because this metric captures both the sophistication of the simulated attacks and their multi-stage nature, it also indicates how well the model can replicate an adversary’s behavior in real-world attacks. The score is computed by counting the successful attack phases of each simulated attack in a run and then averaging those counts across all runs. Table 11 presents the attack complexity scores for the various models. The AI-driven MAS has the highest score for every network size: it executes more distinct attack steps and performs better in multi-stage attacks than the other models. In the small network, the AI-driven MAS scores 5, the rule-based MAS 3 and the static system 2. In the medium network, the scores are 12, 7 and 4, respectively. In the large network, the scores rise to 15 for the AI-driven MAS, 9 for the rule-based MAS and 5 for the static system. These results show that the AI-driven MAS can carry out more complex attacks and is more flexible than rule-based or static systems.

Table 11.

Attack complexity achieved by attacker agents.

Network Size | Scenario | AI-driven MAS | Rule-based MAS | Static Cyber-Range
Small (50) | A | 5 | 3 | 2
Medium (120) | B | 12 | 7 | 4
Large (200) | C | 15 | 9 | 5
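The scoring procedure described above (count the successful stages per run, then average over runs) can be sketched as follows. Counting each stage type once per run is one interpretation of "different types of attack stages"; the log format, the stage names and the function name are assumptions for illustration, and the sample logs are not experimental data.

```python
from statistics import mean


def attack_complexity(runs):
    """Average number of distinct, successfully completed stages per run.

    `runs` is a list of runs; each run is a list of (stage, succeeded) pairs.
    """
    per_run = [len({stage for stage, ok in run if ok}) for run in runs]
    return mean(per_run) if per_run else 0.0


# Illustrative logs only; these are not the paper's experimental data.
runs = [
    [("recon", True), ("exploit", True), ("priv_esc", False)],
    [("recon", True), ("exploit", True), ("priv_esc", True), ("lateral", True)],
]
score = attack_complexity(runs)  # (2 + 4) / 2 = 3
```

Using a set per run means that repeating the same stage, for example several scans, does not inflate the score.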

Resource utilization

Table 12 reports the average resource utilization per agent type in the experiments. Attacker agents consume roughly 35% CPU and 140 MB of memory, whereas defender agents use slightly less, at about 30% CPU and 130 MB of memory. The standard deviations indicate some variability in these figures. The results show that attacker and defender agents need only moderate computational resources and are efficient enough to run in typical cyber-range environments without overloading system resources.

Table 12.

Average CPU and Memory usage per agent.

Agent | Attacker | Defender
CPU Utilization (%) | 35 ± 5 | 30 ± 4
Memory Utilization (MB) | 140 ± 15 | 130 ± 12

Observations

Over the 5000 training episodes, the reinforcement learning based attackers showed a steady increase in performance, and by about episode 4200 their strategies had stabilized. The defender agent’s anomaly detection model reached approximately 92% validation accuracy before integration. Inter-agent communication used less than 5% of the network’s total resources, and the agents were still able to coordinate effectively. Simulation run time ranged from 30 min for the small network to 2 h or longer for larger networks with more complex and varied attacks.

Results

The study indicates that the proposed AI-driven MAS outperforms both the rule-based MAS and the static system on all major metrics. The AI-driven MAS has an F1-score 25% higher than the non-adaptive version of the system, which shows the benefits of applying adaptive learning and collaboration in cyber threat detection. Response time is also reduced by 35%, enabling rapid response and mitigation of attacks. The reinforcement learning attacker agent develops more complex attack methods, so the system attains a higher attack complexity score than the static scripted version. Moreover, resource usage is stable and consistent across network sizes, which indicates that the system can scale well in real-world cyber ranges. In conclusion, the evaluation supports the applicability of the AI-driven MAS for simulation and training in cybersecurity.

Table 13 summarizes the validation and acceptance testing of the proposed system. The defender model was tested with 10-fold cross-validation and showed stable accuracy, so it was validated. The attacker model was trained in multiple independent runs and showed robust policy convergence and adaptability, so it was also validated. Experts reviewed replayed scenarios and judged them highly realistic, which validated scenario fidelity. System performance tests showed low latency and high uptime and were accepted. Stress tests showed that the system handles faults gracefully and recovers automatically, which was accepted. Finally, scaling tests showed that latency stays consistent as agents and hosts are added, so scalability was validated.

Table 13.

Summary of validation and acceptance of proposed AI driven MAS.

Validation Aspect | Methodology | Key Outcomes | Status (Validated/Accepted)
Defender Model Performance | 10-fold Cross-validation | Stable accuracy (average F1 > 0.90) | Yes
Attacker Model Consistency | Multiple independent RL trainings | Robust policy convergence and adaptability | Yes
Cyber-Range Scenario Fidelity | Replay and expert review | High realism and scenario plausibility | Yes
System Performance | Extended performance testing | Low latency, high throughput, > 99.5% uptime | Yes
Robustness & Fault Tolerance | Stress testing with faults | Graceful handling, automatic recovery | Yes
Scalability | Scaling tests with agents and hosts | Linear resource use, consistent latency | Yes

The incident response component of our system uses a predefined library of actions, modeled after existing cybersecurity incident response methodologies, including NIST SP 800-61 (ref. 52). By applying these rules, the incident response component mitigates threats automatically and in a timely manner and protects the network’s integrity. Common mitigation actions include isolating compromised devices to stop lateral movement and blocking malicious IP addresses on firewalls and IPS systems. The component also terminates suspicious user sessions and dynamically adjusts a device’s security configuration to prevent the threat from continuing to exploit it. Finally, it generates alerts so that human operators can continue analyzing the threat and determine the best way to remediate it.

As an example, consider a multi-stage attack (port scan, exploit, privilege escalation, data exfiltration). When port scan activity is detected, the system automatically blocks the scanner’s IP address to prevent further reconnaissance. When an exploitation attempt on a host is detected, the host is quarantined and loses all ability to communicate over the network. When privilege escalation is detected, the system terminates all active sessions of the user and creates forensic logs. When anomalous outbound traffic indicates that data is being exfiltrated, the system blocks that traffic and alerts security staff to review the logs and understand what occurred.

Each stage of the attack would trigger specific response rules based on the output of the Defender Agent. Utilizing a rules-based approach to incident response will provide a stable foundation for automatic defence by providing rapid containment and mitigation of evolving threats with minimal false positives and operational disruption. The rules used within the incident response module are intended to be easily extended and support the adaptive nature of the AI-driven detection modules.
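The stage-to-response mapping in the example above lends itself to a simple lookup table. The sketch below is a minimal illustration: the stage and action identifiers are hypothetical labels for the responses described in the text, and falling back to an alert for unknown stages is an added assumption.

```python
# Detected stage -> ordered response actions, following the example in the text.
RESPONSE_RULES = {
    "port_scan":            ["block_source_ip"],
    "exploitation":         ["quarantine_host"],
    "privilege_escalation": ["terminate_user_sessions", "create_forensic_logs"],
    "data_exfiltration":    ["block_outbound_traffic", "alert_security_staff"],
}


def respond(detected_stage):
    """Return the actions for a detected stage; unknown stages only raise an alert."""
    return RESPONSE_RULES.get(detected_stage, ["alert_security_staff"])


def handle_chain(detections):
    """Map a sequence of detections to the full list of triggered actions."""
    return [action for stage in detections for action in respond(stage)]


actions = handle_chain(["port_scan", "exploitation", "privilege_escalation"])
```

Because the rules live in a plain dictionary, extending the library with a new stage or action only requires adding an entry, which fits the stated goal of easily extensible rules.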

Comparative analysis with existing models

Table 14 compares four models with respect to key features. The model by I. Lateș et al. does not exhibit adaptive autonomy and relies on limited, rule-based methods for attack simulation. While it provides partial integration with cyber ranges, it does not use real data. Its detection accuracy is 65%, its average response time is greater than 15 s, and it supports fewer than five agents. In contrast, T. Zhang et al.’s model uses a reinforcement learning approach for autonomous operation and simulates moderately complex attacks, but it lacks both cyber-range integration and realistic datasets. Its reported accuracy is 78%, its average response time is not reported, and it was used with three agents. I. S. Choi et al.’s model offers only partial autonomy and limited attack simulation, with a reported accuracy of 74% and four agents. The proposed AI-driven MAS incorporates realistic datasets and achieves a higher accuracy of 91%, a lower latency of 5.3 s, and scales to 25 or more agents. These results indicate that the proposed model offers superior performance and scalability relative to the other models considered. Table 15 lists the limitations of current models and how the proposed multi-agent system (MAS) addresses them. The existing models did not use adaptive learning agents to emulate the behavior of either the attacker or the defender, whereas the proposed MAS uses reinforcement learning for the attackers and machine learning for the defenders. The available literature consists of simulation-based models, none of which covered an operational cyber range environment.

Table 14.

Comparative overview of cyber-attack simulation frameworks focussing on AI driven multi agent capabilities.

Model/Reference | Adaptive Autonomy | Multi-Stage Attack Simulation | Cyber Range Integration | Real Dataset Usage | Detection Accuracy (F1) | Response Latency (s) | Scalability (# Agents)
I. Lateș et al.4 | No | Limited (rule-based) | Partial | No | 65% | > 15 | < 5
T. Zhang et al.13 | Yes, RL-based only | Moderate | No | No | 78% | N/A | 3
I. S. Choi et al.15 | Partial | Limited | No | No | 74% | N/A | 4
Proposed AI-Driven MAS | Full, RL + ML | Comprehensive (adaptive multi-stage) | Full | Yes | 91% | 5.3 | 25+

Table 15.

Limitations of existing models addressed.

Limitation | Existing Model | Proposed MAS Model
Lack of adaptive learning agents | Most of the existing models4,15 | Integration of reinforcement learning attacker and machine learning defenders
Absence of cyber range integration | Simulations only13,15 | Full embedding within operational cyber range
Limited validation datasets | Only synthetic or no datasets13 | Realistic datasets for model validation17,18
Scalability restrictions | Small network or agent counts4,13 | Scalable architecture tested on large networks

In contrast to the simulation-only models in the literature, the model proposed in this paper is implemented fully in an operational cyber range environment. Existing models were validated on synthetic data or on no data at all, which motivates the use of realistic datasets for a more solid validation. Lastly, scalability was a main limitation of the existing models, which could model only very small networks with few agents. The proposed MAS, by contrast, is scalable: it models large networks and supports many agents, giving it significantly higher scalability and applicability than existing research. Hence, the proposed model addresses the majority of the shortcomings found in existing research.

We formulated explicit hypotheses focusing on improvements in detection accuracy, response latency, and scalability. Statistical hypothesis testing including paired t-tests and ANOVA was conducted to evaluate these claims. In all key comparisons, null hypotheses were rejected at p < 0.05, confirming the significant superiority of the AI-driven MAS over baseline models.
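For readers unfamiliar with the paired t-test mentioned above, the sketch below computes the test statistic from matched samples. It does not reproduce the authors' analysis: the per-run samples behind the reported tests are not given, so the three per-scenario F1-scores from Table 9 are used purely as toy input, and three pairs are far too few for a reliable inference. The function name `paired_t` and the constant `T_CRIT_DF2` are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(sample_a, sample_b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# F1-scores (%) per scenario from Table 9, used here only as toy input.
ai_f1 = [94.5, 89.9, 87.9]
rule_f1 = [80.1, 75.5, 71.8]
t_stat, dof = paired_t(ai_f1, rule_f1)

# Two-sided critical value of Student's t for alpha = 0.05 and 2 degrees of freedom.
T_CRIT_DF2 = 4.303
significant = abs(t_stat) > T_CRIT_DF2
```

The test works on the per-pair differences, so it asks whether the mean difference between the two systems is distinguishable from zero given how much that difference varies across scenarios.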

Conclusion

An AI-driven MAS has been developed in this paper to improve the simulation of cyber-attacks and the training of incident responders in simulated environments. The experimental results demonstrate that the model outperforms existing models in detecting cyber-attacks, in speed of response, in handling complex attack scenarios and in scalability to large-scale systems. The proposed architecture uses distributed, asynchronous agent communication and modular agent deployment, which helps maintain operational efficiency as the number of agents and nodes increases. The results also show that the AI-driven multi-agent model can be used effectively with realistic datasets, indicating that it is both functional and robust. The modular design allows very large, realistic scenarios to be simulated, so that cybersecurity professionals can be trained against current threats. The proposed model is a major advancement in providing dynamic and scalable cyber defence training and will serve as a valuable tool for increasing preparedness against rapidly changing cyber threats. The CPU and memory utilization reported here is for a server-class environment common in cyber range platforms. Running the AI-driven MAS on resource-constrained distributed edge devices, however, will be challenging from both computation and storage perspectives. Future work includes developing model compression techniques, lightweight inference models and hybrid edge-cloud deployment architectures so that the MAS can function effectively in these environments.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (641.9KB, csv)

Author contributions

Ahmed Al Nuaim and Abdullah Al Nuaim conceptualized the model and conducted simulations; Mohd Nadeem performed statistical analyses; Alka Agrawal led the literature review, validation, and writing; all authors reviewed and edited the final version.

Acknowledgement and funding

This work was funded by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No. KFU261713].

Data availability

All data generated or analysed during this study are included in the supplementary information files.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Kirmani, F., Unni, A. S., Kulkarni, V. P., Lackey, K. & Rose, J. R. Detecting polar ring galaxies via deep learning. RAS Tech. Instrum.10.1093/rasti/rzaf043 (2025). [Google Scholar]
  • 2.Miller, E. et al. Classifying cyber ranges: A case-based analysis using the UWF cyber range. Encyclopedia5, 162. 10.3390/encyclopedia5040162 (2025). [Google Scholar]
  • 3.Chouliaras, N. et al. Cyber ranges and testbeds for education, training, and research. Appl. Sci.11, 1809. 10.3390/app11041809 (2021). [Google Scholar]
  • 4.Stamatopoulos, D. et al. Exploring the architectural composition of cyber ranges: A systematic review. Future Internet16, 231. 10.3390/fi16070231 (2024). [Google Scholar]
  • 5.Kirmani, S. & Raghavan, P. Scalable parallel graph partitioning. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC ’13), 1–10 (2013). 10.1145/2503210.2503280
  • 6.Alyami, H. et al. Analyzing the data of software security life-span: Quantum computing era. Intell. Autom. Soft Comput.10.32604/iasc.2022.020780 (2022). [Google Scholar]
  • 7.Lateș, I. & Boja, C. Cyber range as a competency based education instrument in cyber security. In 8th BASIQ International Conference on New Trends in Sustainable Business and Consumption, 703–710 (2022).
  • 8.Wooldridge, M. An Introduction to MultiAgent Systems 2nd edn (Wiley, 2009).
  • 9.Nadeem, M. et al. Evaluating the factors of CGTMSE scheme in bank by using fuzzy AHP. In 2023 6th International Conference on Contemporary Computing and Informatics (IC3I), 56–61 (2023). 10.1109/IC3I59117.2023.10397669
  • 10.Cai, J. et al. An overview of security threats, attack detection and defense for large-scale multi-agent systems in IoT. IEEE Trans. Ind. Cyber-Phys. Syst. 3, 70–81. 10.1109/TICPS.2024.3514552 (2025).
  • 11.Hu, Z., Chen, P., Zhu, M. & Liu, P. Reinforcement learning for adaptive cyber defense against zero-day attacks. In Lecture Notes in Computer Science 11830, 54–93 (Springer, 2019). 10.1007/978-3-030-30719-6_4.
  • 12.Kirmani, F., Lane, B. J. & Rose, J. R. Exploring machine learning techniques to improve peptide identification. In 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE), 66–71 (2019). 10.1109/BIBE.2019.00021
  • 13.Nadeem, M. Analyze quantum security in software design using fuzzy-AHP. Int. J. Inf. Technol. 10.1007/s41870-024-02002-w (2024).
  • 14.Guo, Y., Wang, L., Liu, Z. & Shen, Y. Reinforcement-learning-based dynamic defense strategy of multistage game against dynamic load altering attack. Int. J. Electr. Power Energy Syst. 131, 107113 (2021).
  • 15.Zhang, T., Tang, X., Kang, J. & Xu, C. AI-driven moving target defense for VANETs: Route mutation via multiagent reinforcement learning. In Moving Target Defense Based on Artificial Intelligence 81–105 (Springer, 2025). 10.1007/978-981-95-0615-6_5.
  • 16.Waizel, G. Bridging the AI divide: The evolving arms race between AI-driven cyber attacks and AI-powered cybersecurity defenses. In Machine Intelligence & Security for Smart Cities (TRUST), 141–156 (2024).
  • 17.Choi, I. S., Hong, J. & Kim, T. W. Multi-agent based cyber attack detection and mitigation for distribution automation system. IEEE Access 8, 183495–183504 (2020).
  • 18.Jiang, H., Choi, T. & Ko, R. K. Pandora: A cyber range environment for the safe testing and deployment of autonomous cyber attack tools. In Security in Computing and Communication 1–20 (Springer, 2020).
  • 19.Sharafaldin, I., Lashkari, A. H. & Ghorbani, A. A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proceedings of ICISSP, 108–116 (2018). 10.5220/0006634301080116
  • 20.Moustafa, N. & Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems. In MilCIS, 1–6 (2015). 10.1109/MilCIS.2015.7348942
  • 21.Admass, W. S., Munaye, Y. Y. & Diro, A. A. Cyber security: State of the art, challenges and future directions. Cyber Secur. Appl. 2, 100031. 10.1016/j.csa.2023.100031 (2024).
  • 22.Katsantonis, M. N. et al. Cyber range design framework for cyber security education and training. Int. J. Inf. Secur. 22, 1005–1027 (2023).
  • 23.Leitner, M. et al. AIT cyber range: Flexible cyber security environment for exercises, training and research. In European Interdisciplinary Cybersecurity Conference, 1–6 (2020).
  • 24.Kirmani, S., Park, J. & Raghavan, P. An embedded sectioning scheme for multiprocessor topology-aware mapping of irregular applications. Int. J. High Perform. Comput. Appl. 31, 91–103. 10.1177/1094342015597082 (2017).
  • 25.Kirmani, S., Sun, H. & Raghavan, P. A scalability and sensitivity study of parallel geometric algorithms for graph partitioning. In 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 420–427 (2018). 10.1109/CAHPC.2018.8645916
  • 26.Zhang, J. et al. A survey of cyber range: Current status, analysis, and future trends. In International Conference on Network Simulation and Evaluation, 88–101 (Springer, 2023).
  • 27.Shin, Y., Kwon, H., Jeong, J. & Shin, D. A study on designing cyber training and cyber range to effectively respond to cyber threats. Electronics 13, 3867 (2024).
  • 28.Mills, A., White, J. & Legg, P. GoibhniUWE: A lightweight and modular container-based cyber range. Journal of Cybersecurity and Privacy 4, 615–628 (2024).
  • 29.Kirmani, F., Lane, B. & Rose, J. Identifying proteotypic peptides via deep learning. In Proceedings of the 11th International Conference on Bioinformatics Research and Applications, 42–47 (2025). 10.1145/3700666.3700691
  • 30.Tyler, J. et al. Exposing, formalizing and reasoning over the latent semantics of tags in multimodal data sources. Appl. Ontol. 8, 95–130. 10.3233/AO-130124 (2013).
  • 31.Chadha, R. et al. CyberVAN: A cyber security virtual assured network testbed. In MILCOM, 1125–1130 (2016). 10.1109/MILCOM.2016.7795481
  • 32.Popoola, D., Bhattacharya, S. & Govindarasu, M. CySIDER Cybersecurity situational intelligence framework for DER networks. In Resilience Week (RWS), 1–10 (2025). 10.1109/RWS66711.2025.11304446
  • 33.Tian, J. & Zhu, Q. Reinforcement learning for cybersecurity: A review. Computers & Security 106, 102280 (2021).
  • 34.Aberkane, S. & Elarbi-Boudihir, M. Deep reinforcement learning-based anomaly detection for video surveillance. Informatica 46 (2022).
  • 35.Bharadiya, J. Machine learning in cybersecurity: Techniques and challenges. Eur. J. Technol. 7, 1–14 (2023).
  • 36.Terranova, F., Lahmadi, A. & Chrisment, I. Scalable and generalizable RL agents for attack path discovery via continuous invariant spaces. In 28th International Symposium on Research in Attacks, Intrusions and Defenses (RAID) (2025).
  • 37.Chenghai, W., Kaiyu, O. & Jiying, W. Multi-agent based information warfare system modeling and simulation. In IEEE CSAA Guidance, Navigation and Control Conference (CGNCC), 1–7 (2018).
  • 38.Kirmani, S. & Shankar, M. Generating keywords by associative context with input words. Google Patents (2022).
  • 39.Mishra, A., Kirmani, S. & Madduri, K. Fast spectral graph layout on multicore platforms. In Proceedings of the ACM International Conference on Supercomputing (2020). 10.1145/3404397.3404471
  • 40.Bougueroua, N. et al. A survey on multi-agent based collaborative intrusion detection systems. J. Artif. Intell. Soft Comput. Res. 11, 111–142 (2021).
  • 41.Zhang, Z. et al. Psysafe: A comprehensive framework for psychological-based attack, defense, and evaluation of multi-agent system safety. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 15202–15231 (2024).
  • 42.Nelson, A., Rekhi, S., Souppaya, M. & Scarfone, K. Incident response recommendations and considerations for cybersecurity risk management: A CSF 2.0 community profile. NIST Special Publication 800-61 Rev. 3. 10.6028/NIST.SP.800-61r3 (2025).
  • 43.Kirmani, S. & Madduri, K. Spectral graph drawing: Building blocks and performance analysis. In 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 269–277 (2018). 10.1109/IPDPSW.2018.00053
  • 44.Upadhyay, S. et al. Application of reinforcement learning in adaptive cyber defence mechanism. In Lecture Notes in Networks and Systems 1287, 443–455 (Springer, 2025). 10.1007/978-981-96-3284-8_33.
  • 45.Alauthman, M. et al. Reinforcement learning for adaptive cyber defense training autonomous systems for dynamic threat response and strategy optimization. In AI-Driven Security Systems and Intelligent Threat Response Using Autonomous Cyber Defense, 209–234 (IGI Global, 2025).
  • 46.Hammad, A. & Tarik, J. F. Adaptive cyber defense using advanced deep reinforcement learning algorithms: A real-time comparative analysis. J. Comput. Theor. Appl. 2, 523–535 (2025).
  • 47.Chen, J. et al. Defending against APT attacks in cloud computing environments using grouped multiagent deep reinforcement learning. IEEE Internet Things J. 12, 19459–19470. 10.1109/JIOT.2025.3542119 (2025).
  • 48.Verma, S. AI-driven autonomous incident response: Revolutionizing cybersecurity operations with real-time threat mitigation. Int. J. Commun. Networks Inf. Secur. 17, 69–78 (2025).
  • 49.Rebet, J. AI automated incident response and threat mitigation using AI. In Revolutionizing Cybersecurity With Deep Learning and Large Language Models 201–236 (2025).
  • 50.Omar, M. et al. (eds) Integrating Artificial Intelligence in Cybersecurity and Forensic Practices (IGI Global, 2025). 10.4018/979-8-3373-0588-2.
  • 51.Al-Thani, G. M. The AIM-PRISM framework: A novel strategic model for machine learning and artificial intelligence deployment in national infrastructure cybersecurity. Adv. Artif. Intell. Mach. Learn. 5, 4053–4073 (2025).
  • 52.Sharafaldin, I., Habibi Lashkari, A. & Ghorbani, A. A. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy, Funchal, 22–24 January 2018, 108–116 (2018). 10.5220/0006639801080116


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
