Scientific Reports. 2025 Aug 11;15:29358. doi: 10.1038/s41598-025-15225-z

Soft actor-critic algorithm and improved GNN model in secure access control of disaggregated optical networks

Zhenqian Zhao 1, Yuhe Wang 2
PMCID: PMC12339999  PMID: 40790073

Abstract

To address the challenges of coordinated defense amid dynamic topology evolution and multidimensional security threats in disaggregated optical networks, this study introduces the Graph-Entangled Security Actor-Critic (GESAC) model. GESAC is built on spatiotemporal modeling of evolving topologies and uses a cross-layer spatiotemporal Graph Neural Network (GNN) to capture causal dependencies between optical path switching and access requests. It also enables adaptive delineation of security boundaries across multiple domains through federated representation learning. Within this framework, the Soft Actor-Critic (SAC) algorithm is used to construct the policy optimization mechanism. Through entropy-guided multi-objective reinforcement learning, GESAC maps encoded network states to access control strategies, jointly optimizing security, service quality, and system resilience. Experimental validation is conducted on a heterogeneous dataset combining Cooperative Association for Internet Data Analysis (CAIDA) topology data, Canadian Institute for Cybersecurity Intrusion Detection Systems (CIC-IDS) access logs, and threat characteristics from the International Telecommunication Union Telecommunication Standardization Sector (ITU-T). The dataset encompasses 12 attack scenarios, 57,000 dynamic topology sequences, and 2.8 million cross-domain authentication events. Key findings include: (1) Threat detection: GESAC achieves F1-scores of 0.915–0.931 in identifying physical-layer attacks such as wavelength eavesdropping and cross-domain privilege escalation, with a false positive rate as low as 0.7%. (2) Resource optimization: compared with greedy strategies, GESAC reduces wavelength utilization variance by up to 58.9% and the standard deviation of end-to-end latency by up to 57.7% under high-load conditions. (3) Policy robustness: in scenarios involving topological mutations, the model increases Pareto frontier coverage by over 100% and reduces the policy entropy decay rate by more than 65%, indicating strong robustness. (4) Scalability: at a scale of 100,000 network nodes, GESAC achieves a single-step decision latency of 25.6 µs and significantly reduces communication overhead, demonstrating excellent scalability. GESAC is designed to overcome the limitations of static security policies in the face of dynamic disaggregation and large-scale attacks in optical networks. By integrating causal inference with game-theoretic equilibrium, it shifts the security control paradigm from passive defense to proactive resilience and provides an interpretable, highly adaptive foundation for next-generation architectures such as multi-domain collaboration and computing-network convergence.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-15225-z.

Keywords: Disaggregated optical networks, Optical network secure access control, Cross-layer spatiotemporal graph attention mechanism, GESAC, Dynamic adaptation

Subject terms: Materials science, Mathematics and computing

Introduction

In the digital era, optical networks, as the backbone of modern communication infrastructure, play a vital role in global data exchange owing to their high bandwidth and low latency1. Cloudflare’s latest report shows that global internet traffic grew by 17.2% in 2024 compared with the previous year, and mobile devices now account for over 40% of total traffic, significantly raising the performance and security demands on the underlying optical transmission networks. Cybersecurity threats also continue to escalate worldwide. According to the World Economic Forum’s Global Risks Report 2024, 39% of respondents believe cyberattacks are likely to trigger global crises in the near future, and cybercrime now ranks among the top ten global risks for the coming years. In its 2025 Security Report, Check Point projected a 44% year-over-year increase in cyberattacks globally. Verizon’s Data Breach Investigations Report further highlighted a shift in threat patterns: in 2025, vulnerability exploitation overtook phishing and surpassed credential abuse to become the leading initial attack vector. Notably, 22% of attacks directly targeted Virtual Private Networks (VPNs) and gateway devices, an eightfold increase from the previous year.

In this context, as network architectures evolve toward greater decomposition and multi-domain integration, static rule-based access control mechanisms are increasingly insufficient for addressing complex, dynamic security threats2,3. Current optical network defenses largely depend on static perimeter definitions and role-based access controls. These systems rely on predefined attack signatures or single-policy templates, limiting their ability to respond to rapidly shifting access patterns and evolving topologies. Cross-domain access within optical networks is also becoming more common, with subdomain boundaries growing increasingly blurred. As a result, attack paths now demonstrate greater lateral movement and chain-propagation behavior. This trend undermines the effectiveness of single-point defenses and conventional blacklist–whitelist strategies. Moreover, optical networks must now meet rising demands driven by large-scale computing resource scheduling, cloud service integration, and smart device connectivity. These networks must balance strong security isolation with high performance across multiple dimensions—including bandwidth utilization, wavelength allocation fairness, and service latency. This growing complexity intensifies the conflict between secure access control and efficient resource orchestration. Currently, optical networks face three fundamental challenges: (1) Escalating Attack Techniques: Threats are evolving from conventional denial-of-service (DoS) attacks to advanced physical-layer exploits, such as wavelength eavesdropping. (2) Dynamic Topologies: Constant changes in network topology obscure security boundaries, complicating the tracing of cross-domain attack paths. (3) Resource-Security Tradeoffs: There is an increasing need to balance resource allocation with adaptive enforcement of security policies4.

Fundamentally, advancing secure access control in optical networks hinges on dynamically modeling network states, accurately identifying potential threats, and optimizing policies in real time. The Soft Actor-Critic (SAC) algorithm, a reinforcement learning (RL) technique, is well suited to these tasks: through continuous interaction with its environment, SAC learns optimal policies and adapts effectively to complex, dynamic conditions, which makes it a strong fit for optical networks5. Meanwhile, Graph Neural Networks (GNNs) are highly effective at processing graph-structured data, using message passing over nodes and edges to capture high-order relationships. Given the intricate, interwoven topologies of optical networks, GNNs are well positioned to model their structural evolution and the associated security dependencies6. However, within a single-method framework, static models often struggle to capture the cascading security effects triggered by real-time topology changes. Moreover, access control strategies and physical resource management are typically implemented in isolation, without holistic coordination. Integrating SAC and GNNs addresses these limitations: SAC provides dynamic decision-making, while the GNN provides feature extraction and structural modeling. Adapting the GNN to the specific security demands of optical networks further strengthens its applicability to access control. Accordingly, this study proposes the Graph-Entangled Security Actor-Critic (GESAC) model, which combines reinforcement learning with graph-based modeling. GESAC aims to establish a new paradigm for secure access control in disaggregated optical networks by building a collaborative defense system that simultaneously ensures global security, optimizes resource efficiency, and adapts policies to evolving threats.

Implementation principles of SAC and GNN models

SAC is an off-policy algorithm based on maximum-entropy RL that jointly maximizes cumulative reward and policy entropy during optimization. It employs two critic networks to estimate state-action values and uses the smaller of the two estimates to reduce value overestimation bias, while an actor network generates the policy and maximizes its entropy to encourage exploration. During training, SAC uses experience replay and target networks: the critics are updated by minimizing the soft Bellman error, and the actor by maximizing the entropy-regularized objective, which yields stable and robust policy outputs. The implementation principles of SAC are illustrated in Fig. 1.
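The experience replay and target-network machinery mentioned above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the transition tuple layout, the hypothetical `ReplayBuffer` class, and Polyak averaging with coefficient `tau` for the target network are all assumptions (soft target updates are common in SAC, but the paper does not specify its update rule).

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# Assumed transition layout: (state, action, reward, next_state, done)
Transition = Tuple[List[float], List[float], float, List[float], bool]


class ReplayBuffer:
    """Fixed-capacity FIFO experience replay buffer (hypothetical helper)."""

    def __init__(self, capacity: int, seed: int = 42) -> None:
        # deque(maxlen=...) silently evicts the oldest transition when full
        self.buffer: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement over the stored transitions
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Polyak averaging of flattened parameters: target <- tau*online + (1 - tau)*target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.push(([float(i)], [0.0], float(i), [float(i + 1)], False))
print(len(buf))  # → 3
print(soft_update([0.0, 1.0], [1.0, 3.0], tau=0.5))  # → [0.5, 2.0]
```

Sampling uniformly from past transitions decorrelates consecutive updates, and a slowly moving target copy keeps the Bellman target from chasing the online critic.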

Fig. 1. Implementation principles of the SAC algorithm.

GNN employs an iterative message-passing mechanism to perform feature aggregation between nodes and their adjacent structures, enabling representation learning for graph-structured data. At each layer, GNN aggregates and updates the target node’s representation based on its neighbors’ weighted features, thereby capturing complex topological relationships and inter-node dependencies7. In optical network modeling, GNN can effectively represent physical links, node roles, and their interactions, mapping network states into high-dimensional embeddings as inputs to the policy model. This structured information supports decision optimization in subsequent processes. The implementation principles of the GNN model are illustrated in Fig. 2.
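A single message-passing layer of the kind just described can be illustrated with scalar node features. This is a simplified sketch, not the paper's model: mean aggregation, the scalar weights `w_self` and `w_neigh`, and the ReLU activation are assumptions, whereas the improved GNN described below uses learned matrices and attention.

```python
from typing import Dict, List


def message_passing_step(
    features: Dict[int, float],
    adjacency: Dict[int, List[int]],
    w_self: float = 0.5,
    w_neigh: float = 0.5,
) -> Dict[int, float]:
    """One layer: h_v' = ReLU(w_self * h_v + w_neigh * mean of neighbor features)."""
    updated = {}
    for v, h_v in features.items():
        neighbors = adjacency.get(v, [])
        # Aggregate: mean of neighbor messages (isolated nodes receive 0)
        msg = sum(features[u] for u in neighbors) / len(neighbors) if neighbors else 0.0
        # Update: combine own state with the aggregated message, then apply ReLU
        updated[v] = max(0.0, w_self * h_v + w_neigh * msg)
    return updated


feats = {0: 1.0, 1: 0.0, 2: 2.0}
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # toy 3-node ring
print(message_passing_step(feats, adj))  # → {0: 1.0, 1: 0.75, 2: 1.25}
```

Stacking k such layers lets information propagate k hops, which is how a GNN captures the higher-order dependencies mentioned above.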

Fig. 2. Implementation principles of the GNN model.

Building on traditional GNN architectures, this study introduces a multi-layer spatiotemporal attention mechanism and a cross-layer gated fusion structure to enhance the modeling capabilities of dynamic optical network characteristics. A temporal sliding window mechanism captures topological evolution patterns over different time periods. This is combined with a self-attention mechanism to extract latent causal dependencies between nodes, improving the model’s awareness of topology mutations and attack propagation paths8. A cross-layer gating module is designed to control the information flow across different GNN layers, ensuring effective fusion of physical-layer features (e.g., wavelength allocation, path switching) with logical-layer states. This enhances the model’s ability to express heterogeneous information9. A community discovery approach partitions the graph into security subdomains, allowing the model to perform local training and global aggregation under a federated learning framework. This mitigates policy fragmentation issues caused by resource fragmentation and provides fine-grained structural constraints for policy optimization10,11. The implementation principles of the improved GNN model are illustrated in Fig. 3.

Fig. 3. Implementation principles of the improved GNN model.

In the improved GNN, node features evolve over time and are encoded as follows at time $t$ and layer $l$:

$$h_v^{(l+1)}(t)=\sigma\Big(\sum_{u\in\mathcal{N}(v)}\alpha_{vu}^{(l)}(t)\,W^{(l)}h_u^{(l)}(t)\Big)\tag{1}$$

$h_v^{(l)}(t)$ represents the graph embedding of node $v$, $\mathcal{N}(v)$ denotes its neighbor set, $\alpha_{vu}^{(l)}(t)$ is the attention weight, $W^{(l)}$ is the layer-specific parameter matrix, and $\sigma(\cdot)$ is the activation function.

The attention weight is determined by the temporal perception mechanism:

$$\alpha_{vu}^{(l)}(t)=\tau(t)\cdot\frac{\exp\big(\mathbf{a}^{\top}[W_1h_v^{(l)}(t)\,\|\,W_2h_u^{(l)}(t)]\big)}{\sum_{w\in\mathcal{N}(v)}\exp\big(\mathbf{a}^{\top}[W_1h_v^{(l)}(t)\,\|\,W_2h_w^{(l)}(t)]\big)}\tag{2}$$

$\mathbf{a}$ represents the attention vector, and $\|$ denotes vector concatenation. $W_1$ and $W_2$ are mapping matrices, while $\tau(t)$ is the temporal decay factor.

The temporal decay factor models the influence strength of past events on the current state, as shown in Eq. (3):

$$\tau(t)=\exp\big(-\lambda\,(T-t)\big),\qquad t\le T\tag{3}$$

$\lambda$ is the temporal decay rate, and $T$ represents the end of the current time window.
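Eqs. (1)–(3) can be illustrated for a single node with scalar embeddings. This sketch assumes that the decay factor exp(−λ(T − t)) scales the neighbor-normalized attention weights of the snapshot at time t, and it replaces the attention vector and mapping matrices with two hypothetical scalars `a_self` and `a_neigh`; the exact form of the paper's Eq. (2) may differ.

```python
import math
from typing import Dict, List


def temporal_decay(t: float, window_end: float, lam: float) -> float:
    """tau(t) = exp(-lam * (T - t)): 1 at the window end, smaller for older snapshots."""
    return math.exp(-lam * (window_end - t))


def decayed_attention(
    h: Dict[int, float],
    v: int,
    neighbors: List[int],
    t: float,
    window_end: float,
    lam: float,
    a_self: float = 1.0,
    a_neigh: float = 1.0,
) -> Dict[int, float]:
    """Softmax over v's neighbors of a_self*h_v + a_neigh*h_u, scaled by tau(t)."""
    # Scalar stand-in for the score a^T [W1 h_v || W2 h_u]
    scores = {u: a_self * h[v] + a_neigh * h[u] for u in neighbors}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {u: math.exp(s - m) for u, s in scores.items()}
    z = sum(exps.values())
    tau = temporal_decay(t, window_end, lam)
    return {u: tau * e / z for u, e in exps.items()}


h = {0: 0.0, 1: 1.0, 2: 0.0}
current = decayed_attention(h, v=0, neighbors=[1, 2], t=10.0, window_end=10.0, lam=0.5)
older = decayed_attention(h, v=0, neighbors=[1, 2], t=8.0, window_end=10.0, lam=0.5)
print({u: round(w, 3) for u, w in current.items()})  # → {1: 0.731, 2: 0.269}
print(round(sum(older.values()), 3))  # → 0.368
```

With λ = 0.5, the same snapshot observed two steps before the window end contributes a total weight of e⁻¹ ≈ 0.37 instead of 1, so older topology states influence the aggregated embedding less.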

The cross-layer gating mechanism dynamically fuses features from adjacent layers (the time index $t$ is omitted for brevity):

$$\tilde{h}_v^{(l)}=g_v^{(l)}\odot h_v^{(l)}+\big(1-g_v^{(l)}\big)\odot h_v^{(l-1)}\tag{4}$$

$\tilde{h}_v^{(l)}$ is the gated output, $g_v^{(l)}$ is the gating weight, and $\odot$ denotes the Hadamard product.

The gating weight is controlled by the node’s own state and the graph structure:

$$g_v^{(l)}=\operatorname{sigmoid}\big(W_g\,[\,h_v^{(l)}\,\|\,h_v^{(l-1)}\,\|\,d_v\,]\big)\tag{5}$$

$W_g$ is the gating weight matrix, and $d_v$ denotes the degree of node $v$.
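Eqs. (4) and (5) amount to a learned interpolation between consecutive layers. The sketch below assumes a sigmoid gate and replaces the gating matrix with three hypothetical scalar weights applied element-wise, one of them on the node degree; it illustrates the mechanism rather than the paper's parameterization.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(
    h_curr: List[float],
    h_prev: List[float],
    degree: int,
    w_curr: float,
    w_prev: float,
    w_deg: float,
    bias: float = 0.0,
) -> List[float]:
    """Element-wise gate g = sigmoid(w_curr*h_l + w_prev*h_{l-1} + w_deg*degree + bias);
    returns g * h_l + (1 - g) * h_{l-1}, i.e. a Hadamard-product interpolation."""
    fused = []
    for hc, hp in zip(h_curr, h_prev):
        g = sigmoid(w_curr * hc + w_prev * hp + w_deg * degree + bias)
        fused.append(g * hc + (1.0 - g) * hp)
    return fused


# With zero weights the gate is 0.5, so the output is the average of the two layers
print(gated_fusion([2.0, 4.0], [0.0, 0.0], degree=3, w_curr=0.0, w_prev=0.0, w_deg=0.0))  # → [1.0, 2.0]
```

Because the gate stays in (0, 1), each output is a convex mix of the new and previous representations, so the model can fall back on earlier-layer features when deeper aggregation is unhelpful.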

The graph security subdomains are defined based on embedding similarity:

$$\mathcal{C}_k=\Big\{\,v:\ k=\arg\min_{j}\big\|h_v-\mu_j\big\|_2\Big\}\tag{6}$$

$\mathcal{C}_k$ represents the $k$-th security subdomain, $\mu_k$ is the $k$-th cluster center, and $h_v$ is the output embedding of node $v$.

After each iteration, the subdomain centers are updated as follows:

$$\mu_k=\frac{1}{|\mathcal{C}_k|}\sum_{v\in\mathcal{C}_k}h_v\tag{7}$$

$|\mathcal{C}_k|$ represents the number of nodes in the $k$-th subdomain.
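Eqs. (6) and (7) together describe a k-means-style alternation between assigning nodes and recomputing centers. The sketch assumes Euclidean distance to the nearest center as the similarity measure and uses small tuples as embeddings; the helper names are hypothetical, and other community-discovery methods would partition differently.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def assign_subdomains(emb: Dict[int, Vector], centers: List[Vector]) -> Dict[int, int]:
    """Assignment step: each node joins the subdomain whose center is nearest."""
    return {
        v: min(range(len(centers)), key=lambda k: math.dist(h, centers[k]))
        for v, h in emb.items()
    }


def update_centers(emb: Dict[int, Vector], labels: Dict[int, int], centers: List[Vector]) -> List[Vector]:
    """Update step: each center becomes the mean embedding of its member nodes."""
    new_centers: List[Vector] = []
    for k, old in enumerate(centers):
        members = [emb[v] for v, lab in labels.items() if lab == k]
        if not members:  # keep an empty subdomain's center unchanged
            new_centers.append(old)
            continue
        new_centers.append(tuple(sum(m[d] for m in members) / len(members) for d in range(len(old))))
    return new_centers


def partition(
    emb: Dict[int, Vector], centers: List[Vector], iterations: int = 10
) -> Tuple[Dict[int, int], List[Vector]]:
    """Alternate assignment and center updates for a fixed number of iterations."""
    labels = assign_subdomains(emb, centers)
    for _ in range(iterations):
        centers = update_centers(emb, labels, centers)
        labels = assign_subdomains(emb, centers)
    return labels, centers


emb = {0: (0.0, 0.0), 1: (0.0, 1.0), 2: (10.0, 10.0), 3: (10.0, 11.0)}
labels, centers = partition(emb, [(0.0, 0.0), (10.0, 10.0)])
print(labels)   # → {0: 0, 1: 0, 2: 1, 3: 1}
print(centers)  # → [(0.0, 0.5), (10.0, 10.5)]
```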

Building on the above mathematical formulations, Fig. 4 presents the implementation workflow of the improved GNN model, detailing the data flow from initial input to the final output.

Fig. 4. Data flow implementation of the improved GNN model.

Implementation of the GESAC model

The GESAC model constructs a graph-based RL framework tailored to the spatiotemporal characteristics of dynamic optical networks. The core idea is to model network topology evolution using a GNN and integrate it with the SAC algorithm for policy optimization. At the lower level, the improved GNN extracts structural features and security context information from each physical subdomain. A temporal attention mechanism identifies dynamic interactions between optical path switching and access requests. A gating mechanism enhances semantic consistency in multi-layer graph representations. Community partitioning enables automatic identification of security boundaries.

On this basis, GESAC incorporates a federated architecture to aggregate GNN representations from different subdomains into a unified state vector, which is then processed by the SAC module for policy evaluation and action decision-making4. SAC employs a dual-critic mechanism to estimate the long-term rewards of access policies12. By maximizing expected rewards and policy entropy, the actor network generates diverse and robust access control actions. Throughout the training process, GNN and SAC share parameters and undergo joint optimization, allowing the model to balance resource utilization and access security while maintaining strong adaptability13,14.

In GESAC, the global state is constructed by aggregating subdomain graph embeddings:

$$s_t=\operatorname{AGG}\big(\{f_{\theta_k}(G_k^t)\}_{k}\big)\tag{8}$$

$G_k^t$ represents the graph structure of the $k$-th subdomain at time $t$, while $f_{\theta_k}$ denotes the corresponding GNN encoder. The operation $\operatorname{AGG}(\cdot)$ refers to aggregation.
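Eq. (8) leaves the aggregation operator open. The sketch below assumes mean pooling of node embeddings as a stand-in for each subdomain's GNN encoder and offers two illustrative choices for the aggregation, element-wise mean and concatenation; the function names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def encode_subdomain(node_emb: Dict[int, Vector]) -> Vector:
    """Stand-in for a subdomain GNN encoder: mean-pool the node embeddings."""
    n = len(node_emb)
    dim = len(next(iter(node_emb.values())))
    return [sum(h[d] for h in node_emb.values()) / n for d in range(dim)]


def global_state(subdomains: List[Dict[int, Vector]], mode: str = "mean") -> Vector:
    """Aggregate subdomain embeddings by element-wise mean or by concatenation."""
    embs = [encode_subdomain(g) for g in subdomains]
    if mode == "mean":
        # Fixed state size regardless of how many subdomains exist
        return [sum(col) / len(embs) for col in zip(*embs)]
    if mode == "concat":
        # Keeps per-subdomain identity, but the state grows with the subdomain count
        return [x for e in embs for x in e]
    raise ValueError(f"unknown aggregation mode: {mode}")


sub_a = {0: [1.0, 0.0], 1: [3.0, 2.0]}
sub_b = {5: [0.0, 4.0]}
print(global_state([sub_a, sub_b]))            # → [1.0, 2.5]
print(global_state([sub_a, sub_b], "concat"))  # → [2.0, 1.0, 0.0, 4.0]
```

Mean aggregation suits a federated setting in which the number of subdomains varies (Table 6 lists 10–500), since the policy network's input size stays fixed.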

The probability distribution of control actions output by the policy network is computed as shown in Eq. (9):

$$\pi_\phi(a_t\mid s_t)=\mathcal{N}\big(a_t;\,\mu_\phi(s_t),\,\sigma_\phi^2(s_t)\big)\tag{9}$$

$\pi_\phi$ is the actor policy parameterized by $\phi$, $a_t$ represents the action, and $s_t$ is the state. The mean $\mu_\phi(s_t)$ and variance $\sigma_\phi^2(s_t)$ are computed by a neural network.
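Sampling from the Gaussian policy of Eq. (9) and evaluating its log-probability, which feeds the entropy terms of the soft Q-value and the policy objective, can be sketched with the standard library. The mean and log-standard-deviation would normally come from the actor network; here they are passed in directly, and the tanh squashing that many SAC implementations apply to bound actions is deliberately omitted. The function name is hypothetical.

```python
import math
import random
from typing import List, Tuple


def sample_action(mean: List[float], log_std: List[float], rng: random.Random) -> Tuple[List[float], float]:
    """Draw a ~ N(mean, diag(std^2)) with the reparameterization a = mean + std * eps
    and return the action together with log pi(a | s)."""
    action: List[float] = []
    log_prob = 0.0
    for m, ls in zip(mean, log_std):
        std = math.exp(ls)         # predicting log-std keeps std positive
        eps = rng.gauss(0.0, 1.0)  # standard normal noise
        action.append(m + std * eps)
        # log N(a; m, std^2) = -0.5 * eps^2 - log(std) - 0.5 * log(2*pi)
        log_prob += -0.5 * eps * eps - ls - 0.5 * math.log(2.0 * math.pi)
    return action, log_prob


rng = random.Random(42)
action, log_prob = sample_action(mean=[0.5, -0.2], log_std=[0.0, -1.0], rng=rng)
print(action, -log_prob)  # -log_prob is a one-sample estimate of the policy entropy
```

Writing the sample as a deterministic function of the noise is what lets gradients of the policy objective flow back into the mean and standard deviation.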

The soft Q-value of the state-action pair, estimated by the value network, is calculated as shown in Eq. (10):

$$Q_\psi(s_t,a_t)=r_t+\gamma\,\mathbb{E}_{s_{t+1},\,a_{t+1}\sim\pi_\phi}\big[Q_{\bar\psi}(s_{t+1},a_{t+1})-\alpha\log\pi_\phi(a_{t+1}\mid s_{t+1})\big]\tag{10}$$

$Q_\psi$ denotes the critic network, $r_t$ is the immediate reward, and $\bar\psi$ represents the target network parameters. The term $\gamma$ is the discount factor, while $\alpha$ denotes the entropy coefficient.

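The policy objective maximizes the expected entropy-adjusted return, computed as shown in Eq. (11):

$$J_\pi(\phi)=\mathbb{E}_{s_t\sim\mathcal{D},\,a_t\sim\pi_\phi}\big[Q_\psi(s_t,a_t)-\alpha\log\pi_\phi(a_t\mid s_t)\big]\tag{11}$$

where $\mathcal{D}$ is the experience replay buffer.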

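The critic loss function is computed as shown in Eq. (12):

$$J_Q(\psi_i)=\mathbb{E}_{(s_t,a_t,r_t,s_{t+1})\sim\mathcal{D}}\Big[\tfrac{1}{2}\big(Q_{\psi_i}(s_t,a_t)-y_t\big)^2\Big],\qquad i\in\{1,2\}\tag{12}$$

where $y_t$ is the target value.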

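The target value is derived from the target network and entropy regularization, as shown in Eq. (13):

$$y_t=r_t+\gamma\Big(\min_{i\in\{1,2\}}Q_{\bar\psi_i}(s_{t+1},a_{t+1})-\alpha\log\pi_\phi(a_{t+1}\mid s_{t+1})\Big),\qquad a_{t+1}\sim\pi_\phi(\cdot\mid s_{t+1})\tag{13}$$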
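The twin-critic target of Eq. (13) and the squared-error critic loss of Eq. (12) reduce to a few lines for scalar values. This is a minimal sketch: the `done` flag that stops bootstrapping at episode ends is a standard convention assumed here rather than stated in the paper, and the default `gamma` and `alpha` values are illustrative (α = 0.2 lies in the range listed in Table 6).

```python
from typing import List


def soft_target(
    reward: float,
    q1_next: float,
    q2_next: float,
    log_prob_next: float,
    gamma: float = 0.99,
    alpha: float = 0.2,
    done: bool = False,
) -> float:
    """y = r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s')), with no bootstrap if done."""
    # Taking the smaller target-critic estimate counters overestimation bias
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return reward if done else reward + gamma * soft_value


def critic_loss(q_pred: List[float], targets: List[float]) -> float:
    """Mini-batch mean of 0.5 * (Q(s, a) - y)^2."""
    return sum(0.5 * (q - y) ** 2 for q, y in zip(q_pred, targets)) / len(q_pred)


y = soft_target(1.0, q1_next=5.0, q2_next=4.0, log_prob_next=-1.0, gamma=0.5, alpha=0.5)
print(y)                                   # → 3.25
print(critic_loss([3.25, 1.0], [y, 3.0]))  # → 1.0
```

Because the log-probability is negative for a spread-out policy, the term −α log π raises the target, which is how the entropy bonus rewards keeping several access-control actions plausible.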

The gradient of the graph embedding is obtained through backpropagation from the policy loss $\mathcal{L}_\pi=-J_\pi(\phi)$:

$$\nabla_{\theta_G}\mathcal{L}_\pi=\frac{\partial\mathcal{L}_\pi}{\partial z_t}\,\frac{\partial z_t}{\partial\theta_G}\tag{14}$$

$\theta_G$ represents the GNN parameters, and $z_t$ is the state graph embedding.

After federated aggregation, the global policy parameters are obtained as follows:

$$\theta_g^{(r+1)}=\sum_{k}\frac{n_k}{\sum_{j}n_j}\,\theta_k^{(r)}\tag{15}$$

$\theta_k^{(r)}$ denotes the local parameters of the $k$-th subdomain at round $r$, and $n_k$ represents the number of its samples.
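Eq. (15) is a sample-weighted average in the style of FedAvg. A minimal sketch over flattened parameter vectors, using a hypothetical `fedavg` helper:

```python
from typing import List


def fedavg(local_params: List[List[float]], sample_counts: List[int]) -> List[float]:
    """Weight each subdomain's flattened parameters by n_k / sum_j n_j and sum them."""
    total = sum(sample_counts)
    if total == 0:
        raise ValueError("at least one subdomain must contribute samples")
    dim = len(local_params[0])
    return [
        sum(n * p[d] for p, n in zip(local_params, sample_counts)) / total
        for d in range(dim)
    ]


# The second subdomain holds three times as many samples, so it dominates the average
print(fedavg([[1.0, 0.0], [3.0, 4.0]], [1, 3]))  # → [2.5, 3.0]
```

Weighting by sample count keeps data-rich subdomains from being diluted by small ones, at the cost of letting them dominate when subdomain data distributions differ strongly.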

The final action must satisfy both physical and security constraints:

$$R(M_t)\le\epsilon_R,\qquad S(M_t)\le\epsilon_S\tag{16}$$

$M_t$ is the access control matrix produced by action $a_t$, while $R(\cdot)$ and $S(\cdot)$ represent the resource and security constraints, respectively. The thresholds are denoted by $\epsilon_R$ and $\epsilon_S$.
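One way to enforce Eq. (16) is to filter candidate decisions before execution. The sketch below is hypothetical throughout: it represents the access control matrix as a dictionary of granted (request, optical path) pairs, takes the resource constraint as the number of wavelengths consumed and the security constraint as the summed risk score of granted requests, and keeps the highest-scoring feasible candidate. The paper does not specify these functions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (request ID, optical path ID) -> 1 if access is granted
AccessMatrix = Dict[Tuple[str, str], int]


def resource_usage(m: AccessMatrix, wavelengths: Dict[str, int]) -> int:
    """Hypothetical R(M): wavelengths consumed by all granted (request, path) pairs."""
    return sum(wavelengths[path] for (_, path), granted in m.items() if granted)


def security_risk(m: AccessMatrix, risk: Dict[str, float]) -> float:
    """Hypothetical S(M): summed risk score of all granted requests."""
    return sum(risk[req] for (req, _), granted in m.items() if granted)


def select_feasible(
    candidates: List[Tuple[float, AccessMatrix]],
    wavelengths: Dict[str, int],
    risk: Dict[str, float],
    eps_r: int,
    eps_s: float,
) -> Optional[AccessMatrix]:
    """Return the highest-scoring candidate with R(M) <= eps_r and S(M) <= eps_s, else None."""
    feasible = [
        (score, m)
        for score, m in candidates
        if resource_usage(m, wavelengths) <= eps_r and security_risk(m, risk) <= eps_s
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda pair: pair[0])[1]


wavelengths = {"p1": 2, "p2": 4}
risk = {"r1": 0.1, "r2": 0.8}
grant_both = {("r1", "p1"): 1, ("r2", "p2"): 1}  # uses 6 wavelengths, risk 0.9
grant_r1 = {("r1", "p1"): 1, ("r2", "p2"): 0}    # uses 2 wavelengths, risk 0.1
choice = select_feasible([(0.9, grant_both), (0.6, grant_r1)], wavelengths, risk, eps_r=4, eps_s=0.5)
print(choice == grant_r1)  # → True
```

Filtering after sampling is the simplest option; an alternative is to mask infeasible actions before sampling so that the policy never proposes them.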

The pseudocode for implementing the GESAC model is shown in Fig. 5.

Fig. 5. Pseudocode for GESAC model implementation.

The symbols and variables used in the pseudocode are defined in Table 1:

Table 1.

Symbol and variable definitions for pseudocode description.

| Symbol/variable | Description |
|---|---|
| k | Subdomain index, representing the k-th distributed subnetwork in federated learning |
| θk | Parameter set of the GNN encoder in the k-th subdomain |
| πk | Actor policy network in the k-th subdomain |
| Qk1, Qk2 | Twin critic networks in the k-th subdomain |
| πg | Global policy network obtained through federated aggregation |
| Dk | Experience replay buffer in the k-th subdomain |
| Gk | Current dynamic graph sequence in the k-th subdomain |
| sk, s′k | GNN-encoded state vectors for the current and next states |
| ak | Action (i.e., access control decision) output by the policy in state sk |
| rk | Immediate reward received after the action is executed |
| γ | Discount factor controlling the impact of future rewards |
| α | Entropy temperature coefficient, balancing exploration and exploitation |
| N | Federated communication interval (models are aggregated and synchronized every N steps) |
| Q_target | Target value used to update the critic networks via the Bellman equation |
| MaxEpisode | Maximum number of training episodes, defining the upper limit of iterations |
| EncodeGraph() | Graph encoding function mapping node and edge attributes to state vectors via the GNN |
| SampleAction() | Entropy-regularized action sampling from policy π |
| StepEnvironment() | Simulates environment interaction: executes the action and returns the reward and next state |
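The control flow of the pseudocode in Fig. 5, using the names in Table 1, can be outlined as a runnable toy. Everything here is a hypothetical stand-in: the "graph" is a list of link loads, the policy is a single scalar, and the local update is a placeholder gradient step rather than the SAC updates of Eqs. (11)–(13). Only the loop structure (encode, sample, step, store, update locally, and aggregate every N steps) is meant to mirror the description.

```python
import random
from typing import Dict, List, Tuple


# Hypothetical toy stand-ins for the routines named in Table 1.
def encode_graph(graph: List[float]) -> float:
    """EncodeGraph() stand-in: mean link load of a toy subdomain graph as a scalar state."""
    return sum(graph) / len(graph)


def sample_action(param: float, state: float, rng: random.Random) -> float:
    """SampleAction() stand-in: linear policy mean plus Gaussian exploration noise."""
    return param * state + rng.gauss(0.0, 0.1)


def step_environment(graph: List[float], action: float) -> Tuple[float, List[float]]:
    """StepEnvironment() stand-in: reward is higher when the action tracks the mean load."""
    reward = -abs(action - encode_graph(graph))
    return reward, graph  # the toy topology does not change


def train(graphs: Dict[int, List[float]], max_episode: int, steps_per_episode: int,
          sync_every: int, lr: float = 0.05, seed: int = 42) -> Tuple[float, int]:
    """Local interaction in every subdomain, with federated averaging every `sync_every` steps."""
    rng = random.Random(seed)
    params = {k: 0.0 for k in graphs}                   # per-subdomain policy parameter
    buffers: Dict[int, list] = {k: [] for k in graphs}  # D_k
    global_param, syncs, step = 0.0, 0, 0
    for _ in range(max_episode):
        for _ in range(steps_per_episode):
            step += 1
            for k, graph in graphs.items():
                s = encode_graph(graph)
                a = sample_action(params[k], s, rng)
                r, next_graph = step_environment(graph, a)
                buffers[k].append((s, a, r, encode_graph(next_graph)))
                # Placeholder local update (NOT SAC): gradient step on 0.5 * (s - a)^2
                params[k] += lr * (s - a) * s
            if step % sync_every == 0:
                # Federated aggregation weighted by buffer size, then broadcast to subdomains
                total = sum(len(b) for b in buffers.values())
                global_param = sum(len(buffers[k]) * params[k] for k in graphs) / total
                params = {k: global_param for k in graphs}
                syncs += 1
    return global_param, syncs


g, n_syncs = train({0: [0.5, 0.5], 1: [1.0, 1.0]}, max_episode=2, steps_per_episode=10, sync_every=5)
print(n_syncs)  # → 4
```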

Data collection process for GESAC model validation

This study employs a heterogeneous data federation framework, integrating Cooperative Association for Internet Data Analysis (CAIDA) topology data, Canadian Institute for Cybersecurity Intrusion Detection Systems (CIC-IDS) access logs, and International Telecommunication Union Telecommunication Standardization Sector (ITU-T) threat characteristics to construct a dynamic optical network validation dataset. First, subgraph skeletons matching optical network characteristics are extracted from CAIDA’s Multiprotocol Label Switching (MPLS) dataset. Topologies with abnormal node degree distributions or excessive link wavelength capacities are filtered out, preserving only core nodes and critical optical paths to form the physical-layer infrastructure. Next, based on the CIC-IDS-2017/2018 dataset, attack events such as brute force and Distributed Denial of Service (DDoS) are mapped to corresponding nodes in the CAIDA subgraph using IP address hashing and timestamp alignment. This process generates dynamic access request sequences with labeled normal and abnormal behaviors. Additionally, in accordance with ITU-T X.1818 standards, threats such as wavelength eavesdropping and cross-domain privilege escalation are transformed into executable edge attributes. These threats are injected into the topology links following predefined attack chain patterns, simulating multi-stage coordinated attack scenarios. The relationships among data applications are illustrated in Fig. 6.
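The IP-hashing and timestamp-alignment step can be sketched as follows. The choice of SHA-256 modulo the node count, the fixed snapshot window length, and the (timestamp, source IP, label) record format are assumptions for illustration; the paper does not specify its hash function or alignment rule.

```python
import hashlib
from typing import Dict, List, Tuple


def ip_to_node(ip: str, node_ids: List[int]) -> int:
    """Deterministically map an IP address onto one of the subgraph's node IDs.
    SHA-256 is used instead of Python's salted hash() so the mapping is stable across runs."""
    digest = hashlib.sha256(ip.encode("utf-8")).digest()
    return node_ids[int.from_bytes(digest[:8], "big") % len(node_ids)]


def align_to_snapshot(ts: float, start: float, window: float) -> int:
    """Index of the topology snapshot whose time window contains timestamp ts."""
    return int((ts - start) // window)


def map_events(events: List[Tuple[float, str, str]], node_ids: List[int],
               start: float, window: float) -> Dict[int, List[Tuple[int, str]]]:
    """Group (timestamp, src_ip, label) log records by snapshot as (node, label) pairs."""
    per_snapshot: Dict[int, List[Tuple[int, str]]] = {}
    for ts, ip, label in events:
        idx = align_to_snapshot(ts, start, window)
        per_snapshot.setdefault(idx, []).append((ip_to_node(ip, node_ids), label))
    return per_snapshot


nodes = [10, 20, 30]
events = [(10.2, "10.0.0.1", "BENIGN"), (11.7, "10.0.0.2", "DDoS")]
print(sorted(map_events(events, nodes, start=10.0, window=1.0)))  # → [0, 1]
```

Hashing keeps each source address pinned to the same topology node across all snapshots, so a multi-step attack from one host appears as a coherent sequence on one node.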

Fig. 6. Application relationships in the heterogeneous data federated framework.

In the data preprocessing stage, a spatiotemporal encoder unifies the multi-source data formats: (1) node attributes integrate device status and logical authentication records; (2) edge attributes incorporate physical-layer metrics and security labels. A sliding window mechanism segments the dynamic topology sequences, preserving temporal and spatial dependencies by keeping graph snapshots from consecutive time steps together. The final dataset includes 57,000 topology sequences, 2.8 million cross-domain authentication events, and 17 types of adversarial attack chains, covering composite threats such as wavelength eavesdropping and logical-layer infiltration. The pseudocode of the data preprocessing stage is shown in Fig. 7.
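The sliding-window segmentation can be sketched generically. Window size and stride are illustrative parameters not given in the paper, and the strings stand in for graph snapshots.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(snapshots: Sequence[T], size: int, stride: int = 1) -> List[List[T]]:
    """Split a time-ordered snapshot sequence into windows of `size` consecutive
    snapshots, advancing by `stride`; a trailing partial window is dropped."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    return [list(snapshots[i:i + size]) for i in range(0, len(snapshots) - size + 1, stride)]


print(sliding_windows(["G0", "G1", "G2", "G3", "G4"], size=3, stride=2))  # → [['G0', 'G1', 'G2'], ['G2', 'G3', 'G4']]
```

With stride 1, consecutive windows overlap by size − 1 snapshots, so every transition between adjacent time steps appears in some window.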

Fig. 7. Pseudocode of data preprocessing stage.

The key parameters of the dynamic topology and attack scenarios of the optical network are summarized in Table 2:

Table 2.

Key parameters of optical network dynamic topology and attack scenarios.

| Parameter category | Specific parameter | Data source | Description |
|---|---|---|---|
| Topology structure parameters | Number of nodes | CAIDA MPLS | Total number of physical nodes in each subgraph |
| | Number of links | | Number of optical path connections (unidirectional/bidirectional) |
| | Node degree distribution | | Average degree, maximum degree, minimum degree |
| | Wavelength capacity (wavelengths/link) | Simulation supplement | Number of wavelengths supported by each optical link |
| Attack injection parameters | Attack types (17 categories) | ITU-T X.1818 | Includes wavelength eavesdropping, DDoS, cross-domain privilege escalation, etc. |
| | Attack stages (single-point/multi-stage) | | Indicates whether the attack is part of a complex attack chain |
| | Threat impact level (1–5) | | Ranges from 1 (low risk) to 5 (catastrophic impact) |
| | Attack duration (ms) | CIC-IDS-2017/2018 | Extracted from logs to determine attack time span |
| Dynamic evolution parameters | Topology change frequency (times/s) | CAIDA + simulation | Simulated rate of optical path switching or node failures |
| | Attack occurrence frequency (times/topology change) | ITU-T + CIC-IDS | Number of injected attacks per topology change cycle |
| | Normal traffic vs. attack traffic ratio | CIC-IDS-2017/2018 | Ratio of legitimate to malicious requests, based on log statistics |

The 17 types of adversarial attacks are summarized in Table 3:

Table 3.

Summary of 17 types of adversarial attacks.

| Attack scope | Category count | Attack type | Attack stage | Attack code |
|---|---|---|---|---|
| Physical layer attacks (A) | 3 | Wavelength eavesdropping15 | Single-point | A1 |
| | | Optical path hijacking | Multi-stage | A2 |
| | | Cross-domain privilege escalation | Multi-stage | A3 |
| Data layer attacks (B) | 3 | Data tampering16 | Single-point | B1 |
| | | Man-in-the-middle attack | Multi-stage | B2 |
| | | Data leakage | Single-point | B3 |
| Logical layer attacks (C) | 3 | Web penetration attack | Multi-stage | C1 |
| | | APT (advanced persistent threat) attack | Multi-stage | C2 |
| | | SQL (structured query language) injection | Single-point | C3 |
| Service layer attacks (D) | 3 | DoS (denial of service)18 | Single-point | D1 |
| | | Botnet | Multi-stage | D2 |
| | | Service spoofing | Multi-stage | D3 |
| Application layer attacks (E) | 2 | DDoS | Single-point | E1 |
| | | Brute force attack | Single-point | E2 |
| Advanced composite attacks (F) | 3 | Route hijacking | Multi-stage | F1 |
| | | Resource exhaustion attack | Single-point | F2 |
| | | Protocol vulnerability exploitation18 | Multi-stage | F3 |

To highlight the advantages of the proposed model before the formal validation, GESAC is compared with baseline models selected for each evaluation objective. The selection of baseline models is summarized in Table 4.

Table 4.

Selection of baseline models for comparative analysis.

| Analysis dimension | Baseline model | Comparative justification | Key parameters of baseline model |
|---|---|---|---|
| Multi-dimensional attack detection performance comparison | Traditional static role-based access control (RBAC) model | Evaluates performance differences between static and dynamic strategies in detecting emerging attacks19 | Role-permission matrix: 100 × 50; rule update cycle: fixed at 24 h |
| | Q-learning algorithm | Validates SAC’s superiority over traditional RL in policy exploration and multi-objective trade-offs20 | Learning rate: 0.01; discount factor: 0.99; state space dimension: node degree + link load (50 dimensions) |
| | ML-based detector (SVM, support vector machine) | Compares traditional feature engineering with GNN in the sensitivity of attack detection on dynamic topologies21 | Feature dimensions: 20 (packet rate, connection count, etc.); kernel function: RBF (radial basis function); penalty factor C = 10 |
| Dynamic resource allocation balance validation | Greedy algorithm | Highlights the contrast between global optimization and locally optimal strategies in resource balancing22 | Link selection policy: lowest real-time load first; history state memory window: none |
| | Shortest path first | Validates the necessity of multi-objective optimization compared with a single objective (minimum hops)23 | Routing update interval: 5 s; link cost calculation: hop count; maximum path count: 3 |
| Robustness evaluation under topology variations | Dynamic programming graph generation (DPGG) model | Compares centralized vs. federated architectures in terms of policy stability in dynamic topologies24 | Network structure: actor (256–128–64), critic (256–128); experience replay buffer size: 1e5; batch size: 64 |
| Distributed architecture efficiency & scalability testing | Not applicable | N/A | N/A |

To assess the security, stability, robustness, and deployability of GESAC in complex optical network environments, the evaluation is conducted from four perspectives: multi-dimensional attack detection performance, dynamic resource allocation balance, policy robustness under topological mutations, and scalability of the distributed architecture25,26. The key performance metrics for each dimension are summarized in Table 5.

Table 5.

Key metrics used in multi-dimensional evaluation of GESAC.

| Evaluation aspect | Metric | Definition | Calculation method |
|---|---|---|---|
| Multi-dimensional attack detection | F1-score | Harmonic mean of detection precision and recall | 2 × (Precision × Recall)/(Precision + Recall) |
| | False positive rate (FPR) | Proportion of benign traffic misclassified as malicious | FP/(FP + TN) |
| Dynamic resource allocation balance | Wavelength utilization variance | Measures load balancing across resources | Variance of wavelength utilization across nodes |
| | Standard deviation of E2E latency | Assesses transmission stability under varying load | Standard deviation of end-to-end path latency |
| Robustness under topological mutation | Policy entropy decay rate (%/k steps) | Rate at which policy diversity decreases | Decline in policy entropy per thousand steps |
| | Pareto frontier coverage (%) | Proportion of optimal solutions in the objective space | Ratio of Pareto-optimal solutions generated by the model |
| | Coverage improvement (%) | Improvement in Pareto coverage over the baseline | ((GESAC − DPGG)/DPGG) × 100% |
| | Entropy decay reduction (%) | Degree to which the entropy decline is suppressed | ((DPGG − GESAC)/DPGG) × 100% |
| Distributed architecture scalability | Single-step decision latency (µs) | Time taken to generate a single policy action | Total policy computation time/number of decisions |
| | Federated communication overhead (MB/cycle) | Communication cost per cycle across subdomains | Total communication volume across subdomains/number of communication cycles |
| | Centralized communication overhead (MB/cycle) | Communication cost under a centralized system | Total communication volume to the central node/number of cycles |
| | Communication reduction ratio (%) | Reduction in overhead achieved by federated learning | ((Centralized − Federated)/Centralized) × 100% |
| | Scalability slope (MB/subdomain) | Growth rate of communication load with the number of subdomains | Federated communication overhead/number of subdomains |
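Several of the Table 5 metrics are simple enough to state as code. This sketch assumes population variance for the wavelength utilization metric (the paper does not say whether sample or population variance is used); `relative_reduction` implements the percentage-reduction formula that also underlies the communication reduction ratio and the comparisons in Table 7. All function names are hypothetical.

```python
import statistics
from typing import List


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def false_positive_rate(fp: int, tn: int) -> float:
    """FP / (FP + TN): share of benign traffic flagged as malicious."""
    return fp / (fp + tn) if fp + tn else 0.0


def utilization_variance(utilization: List[float]) -> float:
    """Population variance of per-node wavelength utilization (assumed definition)."""
    return statistics.pvariance(utilization)


def relative_reduction(baseline: float, model: float) -> float:
    """((baseline - model) / baseline) * 100, e.g. GESAC vs. a greedy baseline."""
    return (baseline - model) / baseline * 100.0


print(round(f1_score(90, 10, 10), 3))           # → 0.9
print(round(relative_reduction(0.95, 0.39), 1))  # → 58.9
```

Applied to the high-load eavesdropping row of Table 7 (greedy 0.95 vs. GESAC 0.39), `relative_reduction` reproduces the reported 58.9% reduction.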

To enhance reproducibility and ensure transparency of experimental parameters, the study provides a comprehensive summary of key configuration settings prior to formal analysis. This includes network topology scale, attack injection strategies, model training hyperparameters, and runtime environment specifications. Given the stochastic nature of both topology variations and attack injections, the settings also explicitly define parameter ranges, fixed random seeds, and synchronization cycles. These details offer a consistent technical foundation for interpreting results and facilitating cross-method comparison. The complete configuration information is shown in Table 6.

Table 6.

Key configuration parameters for experimental analysis.

| Parameter category | Parameter name | Value/range | Notes |
|---|---|---|---|
| Topology configuration | Number of graph nodes | 100–100,000 | Sampled from CAIDA subgraphs |
| | Number of graph edges | 150–320,000 | Filtered based on degree distribution and wavelength availability |
| | Number of subdomains | 10–500 | Automatically generated via subgraph partitioning |
| | Wavelength capacity range | 4–16 | Specifies physical link capacity constraints |
| Attack injection settings | Attack injection frequency | 1–2 attacks per 100 topology updates | Simulates multi-stage collaborative attack chains |
| | Attack type coverage | 17 types (physical/logical/composite) | Based on CIC-IDS and ITU-T threat mapping |
| | Topology mutation interval | Every 10–50 steps | Simulates dynamic topological evolution |
| Model training parameters | Number of GNN layers | 2–4 | Uses improved GNN architecture |
| | GNN hidden dimension | 64/128/256 | Kept consistent across experiments |
| | SAC learning rate | 1e-4 to 5e-4 | Used for joint training of actor and critic networks |
| | Entropy coefficient (α) | 0.1–0.2 | Balances exploration and exploitation |
| | Batch size | 256/512 | Tuned according to model size |
| | Temporal decay factor (λ) | 0.95–0.99 | Enhances model responsiveness by decaying outdated state information |
| | Number of communities (K) | 20–100 | Maximum number of graph partitions per subdomain for federated aggregation granularity |
| Operating environment | Computing platform | CPU cluster | Supports large-scale simulations |
| | Random seed | Fixed at 42 | Ensures experimental reproducibility |
| | Hardware setup | Intel Xeon Gold 6338 × 2, NVIDIA A100 (80GB) × 4, 512GB DDR4 RAM | High-performance cluster for concurrent training and simulation |
| | Software stack | Ubuntu 20.04, CUDA 11.6, PyTorch 1.13, DGL 0.9.1, Python 3.8 | Standard deep learning and graph processing framework supporting federated parallel scheduling and GPU acceleration |

Application effectiveness of GESAC in optical network security access control

Multi-dimensional attack detection performance analysis

To evaluate GESAC’s improvements in threat detection, the study extracted attack samples from the integrated datasets, including (1) emerging threats, such as wavelength eavesdropping and cross-domain privilege escalation, and (2) traditional attacks, such as DDoS and brute force attacks. Using the F1-score and false positive rate (FPR) as the core evaluation metrics, the experiment kept the test environment consistent by controlling all other variables. GESAC was compared against the baseline models, with the results visualized in Fig. 8.

Fig. 8. Comparison of multidimensional attack detection performance.

Fig. 8 shows two key findings. First, in terms of F1-score, GESAC clearly outperforms the baselines across most attack types. For emerging physical-layer threats, namely wavelength eavesdropping (A1), optical path hijacking (A2), and cross-domain privilege escalation (A3), GESAC achieves F1-scores of 0.931, 0.915, and 0.921, respectively. These are absolute gains of more than 0.60 over static RBAC and nearly 0.45 over SVM, indicating that spatiotemporal graph modeling captures the dependencies between optical path switching and anomalous behavior more precisely. Second, GESAC keeps false positives low. For application-layer attacks such as DDoS (E1) and brute-force login (E2), its FPRs are 0.9% and 0.7%, respectively, well below those of SVM (2.7% and 2.1%) and Q-Learning (3.5% and 2.8%). For more complex, multi-stage attacks such as routing hijacking (F1) and protocol exploitation (F3), GESAC keeps the FPR within 2.2–2.4%, whereas static RBAC exceeds 10%. For data- and service-layer threats such as man-in-the-middle (MITM) attacks (B2) and botnet activity (D2), GESAC maintains F1-scores above 0.85 and FPRs below 3%. The traditional models, by contrast, often show unstable F1-scores and elevated false positives on multi-stage attacks. These results indicate that GESAC is accurate not only on isolated attacks but also on complex, sequential attack chains, where it must infer causal and temporal patterns. Its core strength is detecting anomalous edge attributes as the topology evolves, while entropy-maximized policy learning balances detection accuracy against false positive suppression. This lets it avoid the limitations of static rule-based approaches, whose inflexible policy definitions often lead to under-detection or overfitting. GESAC therefore addresses two fundamental challenges in decomposed optical networks: fragmented attack features and detection delays.
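The F1-score and FPR reported in Fig. 8 follow their standard definitions. The sketch below computes both from binary confusion counts for a single attack type. It assumes a one-vs-rest setup in which the attack is the positive class; the paper does not state how per-class results were aggregated, and the counts used here are hypothetical rather than taken from Fig. 8.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion counts for one attack type (attack = positive class)."""
    tp: int  # attacks correctly flagged
    fp: int  # benign events wrongly flagged as attacks
    fn: int  # attacks that were missed
    tn: int  # benign events correctly passed


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are undefined or zero."""
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def false_positive_rate(c: ConfusionCounts) -> float:
    """FPR = FP / (FP + TN): share of benign events flagged as attacks."""
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0


# Hypothetical counts for illustration only.
counts = ConfusionCounts(tp=930, fp=7, fn=70, tn=993)
print(f"F1  = {f1_score(counts):.3f}")
print(f"FPR = {false_positive_rate(counts):.2%}")
```

Note that F1 depends on false negatives and false positives but not on true negatives, which is why the study reports FPR alongside it.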

Dynamic resource allocation balance validation

To evaluate resource allocation balance, the study simulates different traffic load conditions on the CAIDA topology data, records the distribution of wavelength utilization, and uses its variance to measure how uniformly resources are allocated. It also monitors the standard deviation of end-to-end latency (in milliseconds) to assess the model’s stability under large load fluctuations. The wavelength utilization variance (Indicator 1) and the end-to-end latency standard deviation (Indicator 2) are then compared against the baseline models. The results are shown in Table 7:

Table 7.

Dynamic resource allocation balance validation results.

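| Scenario | Attack status | Metric type | GESAC result | Greedy algorithm result | Shortest path first result | Optimization (vs. greedy) | Optimization (vs. shortest path first) |
|---|---|---|---|---|---|---|---|
| Low load | No attack | Indicator 1 | 0.12 | 0.18 | 0.25 | 33.3% ↓ | 52.0% ↓ |
| Low load | No attack | Indicator 2 (ms) | 2.1 | 3.5 | 4.8 | 40.0% ↓ | 56.25% ↓ |
| Normal load | No attack | Indicator 1 | 0.21 | 0.37 | 0.52 | 43.2% ↓ | 59.6% ↓ |
| Normal load | No attack | Indicator 2 (ms) | 3.8 | 6.2 | 8.9 | 38.7% ↓ | 57.3% ↓ |
| High load | No attack | Indicator 1 | 0.34 | 0.72 | 1.05 | 52.8% ↓ | 67.6% ↓ |
| High load | Wavelength eavesdropping attack | Indicator 1 | 0.39 | 0.95 | 1.32 | 58.9% ↓ | 70.5% ↓ |
| High load | No attack | Indicator 2 (ms) | 5.2 | 9.1 | 12.4 | 42.9% ↓ | 58.1% ↓ |
| High load | Wavelength eavesdropping attack | Indicator 2 (ms) | 6.7 | 14.3 | 18.9 | 53.1% ↓ | 64.6% ↓ |
| Peak load | DDoS attack | Indicator 1 | 0.51 | 1.24 | 1.68 | 58.9% ↓ | 69.6% ↓ |
| Peak load | DDoS attack | Indicator 2 (ms) | 8.3 | 19.6 | 24.7 | 57.7% ↓ | 66.4% ↓ |

Indicator 1: wavelength utilization variance. Indicator 2: standard deviation of end-to-end latency (ms).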
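The percentages in the last two columns of Table 7 are relative reductions, (baseline - GESAC) / baseline × 100. The short check below recomputes two rows from the values listed in the table; the helper name `relative_reduction` is illustrative.

```python
def relative_reduction(baseline: float, gesac: float) -> float:
    """Percentage by which GESAC's value is lower than the baseline's."""
    return (baseline - gesac) / baseline * 100


# (GESAC, greedy, shortest path first) values copied from Table 7.
rows = {
    "High load + eavesdropping, Indicator 1": (0.39, 0.95, 1.32),
    "Peak load + DDoS, Indicator 2": (8.3, 19.6, 24.7),
}

for name, (gesac, greedy, spf) in rows.items():
    print(f"{name}: {relative_reduction(greedy, gesac):.1f}% vs. greedy, "
          f"{relative_reduction(spf, gesac):.1f}% vs. SPF")
# High load + eavesdropping, Indicator 1: 58.9% vs. greedy, 70.5% vs. SPF
# Peak load + DDoS, Indicator 2: 57.7% vs. greedy, 66.4% vs. SPF
```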

As Table 7 shows, under dynamic load conditions GESAC substantially mitigates the risk of localized resource overload through cross-domain topological awareness and a federated coordination mechanism. When high traffic load is combined with a wavelength eavesdropping attack, GESAC reduces the wavelength utilization variance from 0.95 (greedy) to 0.39, a 58.9% reduction, and lowers the standard deviation of end-to-end delay from 14.3 ms to 6.7 ms, a 53.1% reduction. These results show that GESAC alleviates the combined impact of physical-layer resource contention and attack-induced service degradation. Under more extreme conditions, with peak load combined with a DDoS attack, GESAC reduces Indicator 1 and Indicator 2 by 58.9% and 57.7%, respectively, relative to the greedy algorithm, demonstrating strong generalization and adaptability in dynamic scheduling. Under normal load without attacks, GESAC reduces the wavelength utilization variance from 0.37 to 0.21, a 43.2% decrease, and reduces the latency standard deviation by 57.3% relative to shortest path first. Moreover, as load rises from low to high in the absence of attacks, GESAC’s delay standard deviation grows from 2.1 ms to 5.2 ms (an increase of 3.1 ms), whereas the greedy algorithm’s grows from 3.5 ms to 9.1 ms (an increase of 5.6 ms). This smaller growth is consistent with GESAC anticipating traffic trends through spatiotemporal causal inference. In summary, by jointly optimizing physical resource constraints and logical access policies, GESAC enables dynamic, balanced scheduling across multiple domains. Even under the combined pressure of topological disruptions and adversarial interference, it keeps wavelength utilization uniform and service stable, overcoming the limitations of traditional scheduling mechanisms that rely on static configurations and centralized control.
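The two indicators in Table 7 can be computed from per-run samples roughly as sketched below. This is an illustration under stated assumptions: it uses population (not sample) variance and standard deviation, treats each link’s wavelength utilization as one sample, and uses hypothetical input values. The study does not specify these details or the unit in which utilization is measured.

```python
import statistics


def indicator_1(wavelength_utilization: list[float]) -> float:
    """Indicator 1 (assumed form): population variance of per-link
    wavelength utilization. Lower values mean load is spread more evenly."""
    return statistics.pvariance(wavelength_utilization)


def indicator_2(latencies_ms: list[float]) -> float:
    """Indicator 2 (assumed form): population standard deviation of
    end-to-end latency, in milliseconds."""
    return statistics.pstdev(latencies_ms)


# Hypothetical samples from one simulation run (not data from the study).
link_utilization = [0.2, 0.4, 0.6, 0.8]
latencies_ms = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(f"Indicator 1 = {indicator_1(link_utilization):.3f}")    # 0.050
print(f"Indicator 2 = {indicator_2(latencies_ms):.2f} ms")     # 2.00 ms
```

Whether the study used population or sample statistics would change the values slightly for small sample sizes, but not the ranking of the compared methods.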

Strategy robustness evaluation under topological mutation

The study constructs dynamic topology sequences from CAIDA data, injects random structural changes into them, and records how the model’s strategy adjusts, in order to evaluate its robustness under topological mutation. The Pareto front coverage rate measures the model’s optimization capability under multiple constraints, reflecting the trade-off between security and service quality. The decay rate of strategy entropy is also monitored: comparing how strategy entropy changes across attack scenarios indicates how robust GESAC is against multi-stage coordinated attacks. The analysis results are shown in Fig. 9:

Fig. 9. Strategy robustness evaluation under topological mutation.

As Fig. 9 shows, GESAC makes its strategies more robust and adaptable under topological mutation than the baseline models. When faced with dynamic changes such as node failures and link breakages, the model uses cross-layer federated perception and spatiotemporal causal reasoning to turn topology disruptions into controlled adjustments within the multi-objective optimization space, rather than letting them degrade the strategy globally. For example, in a scenario combining subdomain isolation with high-frequency optical path switching attacks, the model uses a dynamic game mechanism to balance security isolation against resource scheduling needs. This yields a 100% improvement in the balance of the multi-objective Pareto front over traditional strategies, and the strategy adjustment process never stagnates locally because of conflicting objectives. For strategy diversity, the entropy regularization mechanism effectively slows premature strategy convergence. With 20% of nodes failed and a multi-stage attack chain interfering, the long-term decay of strategy entropy is more than 65% lower than that of the baseline, which prevents the systemic imbalance caused by over-optimizing a single objective. These results show that, by jointly encoding physical topology states and logical access behaviors, GESAC maps dynamic disruptions to resilient responses in the strategy space. The approach shifts from passive response to active optimization, keeping security and service quality jointly optimized despite structural mutations and attack interference.
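The study does not define its Pareto front coverage rate. One common choice is Zitzler’s set-coverage metric C(A, B), the fraction of solutions in B that are weakly dominated by at least one solution in A. The sketch below assumes that metric, two maximized objectives (a hypothetical security score and QoS score), and made-up data; it also extracts the non-dominated front from a set of outcomes.

```python
from typing import Sequence

Point = tuple[float, ...]  # one objective vector, e.g. (security score, QoS score)


def dominates(a: Point, b: Point) -> bool:
    """a Pareto-dominates b: no worse on every objective, strictly better on one.
    All objectives are treated as maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Point]) -> list[Point]:
    """Non-dominated subset of the given points, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def coverage(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Set coverage C(A, B): fraction of B weakly dominated by some point of A."""
    if not b:
        return 0.0
    covered = sum(1 for p in b if any(q == p or dominates(q, p) for q in a))
    return covered / len(b)


# Hypothetical (security, QoS) outcomes of two policies, for illustration only.
gesac = [(0.9, 0.6), (0.8, 0.8), (0.6, 0.9)]
baseline = [(0.85, 0.5), (0.7, 0.7), (0.5, 0.95)]
print(coverage(gesac, baseline))  # 2 of 3 baseline points are covered
print(coverage(baseline, gesac))  # no GESAC point is covered
print(pareto_front(gesac + baseline))
```

C(A, B) is not symmetric, so both directions are usually reported together.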

Distributed architecture efficiency and scalability testing

Beyond the model’s technical advantages, its engineering feasibility must also be validated. The study therefore tests GESAC’s decision efficiency and communication overhead across network topologies of different scales to assess its scalability limits. The results are shown in Fig. 10:

Fig. 10. Distributed architecture efficiency and scalability test results.

As Fig. 10 shows, GESAC’s distributed architecture offers clear engineering value in both efficiency and scalability. As the network grew from 100 to 100,000 nodes, per-step decision latency rose from 4.2 µs to 25.6 µs, a sub-linear growth trend (a 1,000-fold increase in nodes produced roughly a 6-fold increase in latency). This keeps control responsiveness well within millisecond-level budgets even in ultra-large-scale optical networks. For communication overhead, GESAC’s federated architecture reduces load by aggregating features locally within subdomains and sharing gradients between domains. In the 100,000-node, 500-subdomain scenario, GESAC needs only 1389.5 MB per cycle compared with 3288.7 MB for the centralized architecture, a 57.7% reduction; across the tested scales the reduction stays between 57.2% and 57.9%. The scalability slope stays between 1.2 and 2.0 MB per subdomain, so communication grows linearly and predictably with the number of subdomains and avoids the bandwidth bottlenecks typical of centralized scheduling. At a deeper level, GESAC uses a federated game-theoretic equilibrium mechanism to decouple collaboration from resource isolation across subdomains. Each subdomain can therefore operate independently while retaining global awareness of topological dynamics, which makes cross-domain resource scheduling more adaptable. Overall, the architecture offers three key advantages, namely low latency, low bandwidth dependency, and high scalability, making it well suited to online access control in large-scale decomposed optical networks.
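The sub-linear latency trend and the communication saving can be checked from the reported endpoints. The sketch fits a power law y ∝ x^k through the two endpoint measurements only; the intermediate points of Fig. 10 are not listed in the text, so the exponent is a rough two-point estimate, and the helper name `loglog_slope` is illustrative.

```python
import math


def loglog_slope(x0: float, y0: float, x1: float, y1: float) -> float:
    """Exponent k in y ~ x**k estimated from two points; k < 1 means sub-linear."""
    return math.log(y1 / y0) / math.log(x1 / x0)


# Endpoints reported in the text: 100 nodes -> 4.2 us, 100,000 nodes -> 25.6 us.
k = loglog_slope(100, 4.2, 100_000, 25.6)

# Communication per cycle at 100,000 nodes / 500 subdomains (MB), from the text.
centralized_mb, federated_mb = 3288.7, 1389.5
saving = (centralized_mb - federated_mb) / centralized_mb

print(f"latency exponent k = {k:.2f}")        # about 0.26
print(f"communication saving = {saving:.1%}")  # about 57.7%
```

An exponent of about 0.26 means latency grows far more slowly than network size, which is what the text describes as sub-linear growth.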

Mininet optical dataset validation

To further evaluate GESAC’s adaptability under realistic network transmission delays, the Mininet-Optical emulator was used to build an emulated optical network environment. A subset of the dynamically preprocessed topology data used by GESAC was imported to simulate light-path switching and access behaviors in networks of 50 to 500 nodes. Attack events were extracted from the original CIC-IDS dataset and injected into link traffic streams through mapping-based techniques. In the emulation, GESAC was deployed as an edge control strategy that communicated with Mininet’s control plane via REST APIs, which allowed it to synchronize policies within the federated learning framework while executing local access control actions. Traffic was generated and link conditions were shaped with iperf and tc, and transmission latency, jitter, and GESAC’s FPR were measured under various attack and traffic load conditions. These measurements were then analyzed together with the wavelength allocation results. The outcomes are summarized in Table 8:

Table 8.

Validation results of GESAC on the Mininet-Optical dataset.

| Test scenario | Avg. end-to-end delay (ms) | Max. delay (ms) | Min. delay (ms) | Std. dev. of delay (ms) | Wavelength utilization (%) | GESAC FPR (%) |
|---|---|---|---|---|---|---|
| Small topology (50 nodes) | 1.823 | 2.431 | 1.213 | 0.412 | 78.345 | 1.1 |
| Medium topology (200 nodes) | 3.417 | 5.823 | 2.014 | 0.879 | 84.212 | 1.3 |
| Large topology (500 nodes) | 8.121 | 12.301 | 5.612 | 2.101 | 86.013 | 1.5 |
| Medium topology + light load | 2.154 | 3.276 | 1.325 | 0.536 | 82.136 | 1.0 |
| Medium topology + heavy load | 5.843 | 9.104 | 3.804 | 1.762 | 71.884 | 1.8 |
| Medium topology + DDoS injection | 6.276 | 10.443 | 4.173 | 1.944 | 68.905 | 2.1 |
| Large topology + topology shift | 9.472 | 14.682 | 6.438 | 2.739 | 81.324 | 1.7 |

As Table 8 shows, GESAC performs well under realistic network transmission conditions. First, its average end-to-end delay stays low for small and medium topologies, at 1.8 ms and 3.4 ms, respectively. At 500 nodes, the average delay remains well controlled at 8.1 ms, with a maximum of 12.3 ms, indicating good scalability and delay management. Second, across all tested scenarios, including heavy traffic, DDoS injection, and topology shifts, GESAC sustains wavelength utilization between 68.9% and 86.0% and keeps the FPR at or below 2.1%, even as delay variability increases. In the “Medium topology + light load” scenario, GESAC achieves the lowest FPR of all scenarios (1.0%) and the lowest delay standard deviation among the medium-topology scenarios (0.536 ms), showing high stability and precision when resources are abundant. In the “Large topology + topology shift” scenario, although the maximum delay rises to 14.68 ms, GESAC keeps the FPR at 1.7%, reflecting its robustness to structural changes. Overall, these emulation results support GESAC’s practical deployability in multi-domain decomposed optical networks and its ability to meet complex, multi-objective secure access control requirements under dynamic latency and disruption constraints.

Discussion

Based on its analysis of secure access control in decomposed optical networks with GESAC, the study proposes the following three optimization directions:

  1. Enhance attack detection capabilities to build a comprehensive defense system: The attack detection model should be further improved by incorporating feature learning from more complex attack scenarios. This would enhance sensitivity to new attacks through cross-domain knowledge fusion. Additionally, the detection algorithm should be optimized to reduce FPR. Strengthening attack chain tracing and blocking mechanisms will improve the overall resilience of the optical network against attacks.

  2. Optimize dynamic resource allocation strategies to improve efficiency and stability: A real-time dynamic resource allocation system should be built based on traffic and security situational awareness. In various load and attack scenarios, intelligent algorithms should be used to allocate resources precisely. This will improve wavelength utilization, reduce end-to-end delay fluctuations, balance resource competition and security investment, and ensure service quality.

  3. Enhance flexibility and robustness in responding to topological mutations to ensure continuous and stable network operation: Cross-layer federated perception and spatiotemporal causal reasoning mechanisms should be improved. When topology structure mutations occur, the system should quickly translate these changes into optimization criteria for strategies, maintaining a balance between security and service quality. Strengthening strategy entropy regularization will ensure strategy diversity, prevent stagnation caused by topology changes, and ensure stable operation of the optical network under complex conditions.
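The strategy entropy that the third direction proposes to regularize, and whose decay is tracked in the robustness evaluation, is, for a discrete action space, the Shannon entropy of the policy’s action distribution. The sketch below assumes discrete access-control actions and defines the decay rate as the relative drop in entropy over an observation window; the action set, the probabilities, and that decay definition are illustrative assumptions, not taken from the paper.

```python
import math


def policy_entropy(probs: list[float]) -> float:
    """Shannon entropy H = -sum p * ln p (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_decay(h_start: float, h_end: float) -> float:
    """Assumed decay rate: relative drop (H_start - H_end) / H_start over a window."""
    return (h_start - h_end) / h_start


# Hypothetical policy over 4 access actions (e.g. permit, deny, reroute, isolate).
early = [0.25, 0.25, 0.25, 0.25]  # maximally exploratory
late = [0.7, 0.1, 0.1, 0.1]       # more deterministic after training
h0, h1 = policy_entropy(early), policy_entropy(late)
print(f"H0 = {h0:.3f} nats, H1 = {h1:.3f} nats, decay = {entropy_decay(h0, h1):.1%}")
```

In SAC, the temperature α multiplies this entropy term in the objective, so a larger α penalizes fast entropy decay more strongly.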

Because GESAC combines a GNN with reinforcement learning, it inevitably incurs some computational overhead. To contain it, the implementation adopts a modular distributed architecture in which each subdomain model performs topology graph encoding and policy updates independently on local processors. The experiments show that even at 100,000 nodes, single-step decision latency is only 25.6 µs and communication overhead grows by at most 2.0 MB per subdomain, which suggests that parallel deployment on devices with moderate edge computing capabilities is feasible. Following federated principles, GESAC does not require centralized training: policies are coordinated through local subdomain updates and periodic parameter aggregation. It can therefore be deployed on network devices that support control functions, such as SDN controllers, edge servers, or virtualized router instances. A central coordination node handles policy synchronization and global aggregation, which fits current domain-based management models and mainstream multi-controller SDN frameworks. Overall, both the architectural design and the measured performance indicate strong engineering feasibility and adaptability for real-world optical network environments.
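The paper states that policies are coordinated through local subdomain updates and periodic parameter aggregation, but does not give the aggregation rule. A minimal sketch, assuming the widely used FedAvg rule (a sample-weighted average of subdomain parameter vectors) and hypothetical values, is:

```python
def federated_average(subdomain_params: list[list[float]],
                      sample_counts: list[int]) -> list[float]:
    """FedAvg-style aggregation (assumed rule): average the subdomains' parameter
    vectors, weighting each by the number of local samples it was trained on."""
    if not subdomain_params or len(subdomain_params) != len(sample_counts):
        raise ValueError("need one sample count per subdomain parameter vector")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("sample counts must sum to a positive number")
    dim = len(subdomain_params[0])
    if any(len(p) != dim for p in subdomain_params):
        raise ValueError("all parameter vectors must have the same length")
    return [
        sum(n * params[i] for params, n in zip(subdomain_params, sample_counts)) / total
        for i in range(dim)
    ]


# Hypothetical aggregation round with three subdomains of different data volumes.
local_params = [[0.10, -0.20], [0.30, 0.00], [0.20, 0.40]]
local_samples = [100, 300, 600]
print(federated_average(local_params, local_samples))
```

Only parameter vectors cross subdomain boundaries in this scheme, which is consistent with the per-subdomain communication cost growing linearly with the number of subdomains.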

Conclusion

The key innovation of this study is to recast resource fragmentation during optical network decomposition as a dynamic graph partitioning problem. An improved GNN community discovery algorithm automatically identifies security boundaries, while SAC’s entropy-maximizing exploration balances network security and service continuity when optimizing access policies. The two components are trained end to end through joint gradient updates: the gradients of the policy and value losses are backpropagated through the actor and critic networks into the GNN parameters, and SAC’s policy updates in turn rely on the topology-sensitive state features extracted by the GNN, forming a closed optimization loop.
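For reference, the entropy-regularized target that a standard SAC critic regresses toward can be sketched as follows. This assumes the discrete-action variant of SAC with twin critics, y = r + γ(1 - d) Σ π(a'|s')[min(Q1, Q2)(s', a') - α log π(a'|s')], and treats the Q-values as given numbers. In GESAC they would be computed from GNN-encoded states; the GNN, the networks, and the gradient steps are omitted, and the action set and numbers are hypothetical.

```python
import math


def soft_q_target(reward: float, done: bool, gamma: float, alpha: float,
                  next_probs: list[float], next_q1: list[float],
                  next_q2: list[float]) -> float:
    """Discrete-action SAC target:
    y = r + gamma * (1 - done) * sum_a pi(a|s') * (min(Q1, Q2)(s', a) - alpha * log pi(a|s'))."""
    soft_value = sum(
        p * (min(q1, q2) - alpha * math.log(p))  # entropy bonus via -alpha * log p
        for p, q1, q2 in zip(next_probs, next_q1, next_q2)
        if p > 0  # zero-probability actions contribute nothing
    )
    return reward + gamma * (0.0 if done else 1.0) * soft_value


# Hypothetical transition: 3 access actions (permit, deny, reroute) at the next state.
y = soft_q_target(reward=1.0, done=False, gamma=0.99, alpha=0.2,
                  next_probs=[0.6, 0.3, 0.1],
                  next_q1=[2.0, 1.0, 0.5], next_q2=[1.8, 1.2, 0.4])
print(f"soft target y = {y:.3f}")
```

Taking the minimum of two critics curbs Q-value overestimation, and the α term is what rewards the policy for keeping its action distribution diverse.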

However, this study has certain limitations. GESAC’s detection and decision-making efficiency may degrade on ultra-large-scale, highly complex networks or on novel, previously unseen attack combinations. The current GNN architecture also needs further optimization to reduce its computational cost when handling massive numbers of nodes and dense connections. Future work will integrate neural signal processing techniques with the existing model. By mimicking signal transmission between neurons, such processing can perceive and process information quickly, which would let the model capture subtle changes in network state in real time and respond rapidly to attacks and to fluctuations in resource demand. For example, the highly parallel processing of neuromorphic chips could accelerate the model’s analysis of network data. Combining the model with online learning algorithms would further allow it to be optimized continuously during operation, improving its adaptability to dynamic environments.

Supplementary Information


Author contributions

Zhenqian Zhao contributed to conception and design of the study. Yuhe Wang organized the database. Zhenqian Zhao performed the statistical analysis. Yuhe Wang wrote the first draft of the manuscript. Zhenqian Zhao wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

Data availability

Data is provided within the manuscript or supplementary information files.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Hu, L. et al. Convex Optimization-Based High-Speed and Security Joint Optimization Scheme in Optical Access Networks. OPT EXPRESS 32 (4), 6748–6764 (Apr. 2024).
  • 2. Bin, S. & Sun, G. Research on the Influence Maximization Problem in Social Networks Based on the Multi-Functional Complex Networks Model. JOEUC 34 (3), 1–17 (May 2022).
  • 3. Opanin Gyamfi, E. et al. Hierarchical graph neural network: A lightweight image matching model with enhanced message passing of local and global information in hierarchical graph neural networks. Information 15 (10), 602 (Oct. 2024).
  • 4. Zhang, Y. et al. Access Point Selection Based on Beacon Signals in Optical Satellite Networks. J OPT COMMUN NETW 17 (3), 163–177 (Mar. 2025).
  • 5. Zhang, M. et al. Quantum noise stream cipher scheme based on 3-Channel-QAM with 3D mapping and DFTs-OFDM. IEEE PHOTONIC TECH. L. 25 (5), 677 (May 2024).
  • 6. Chen, X. et al. Meta-Learning-Aided QoT estimator provisioning for a dynamic VNT configuration in optical networks. J OPT COMMUN NETW 17 (1), A103–A (Jan. 2025).
  • 7. Boeding, M., Hempel, M. & Sharif, H. Novel approach towards a fully deep Learning-Based IoT receiver architecture: from Estimation to decoding. Future Internet 16 (5), 155 (May 2024).
  • 8. Sehgal, S. K. & Gupta, R. Dual Secret Key Virtualization Through TCC Based Wavelength Allocation and BB84 Protocol in QKD over Optical Networks. WIRELESS PERS COMMUN 134 (3), 1443–1468 (Mar. 2024).
  • 9. Bekri, M., Reyes, R. R. & Bauschert, T. Multilayer Restoration in IP-Optical Networks by Adjustable Robust Optimization and Deep Reinforcement Learning. J OPT COMMUN NETW 16 (7), 721–734 (Jul. 2024).
  • 10. Krishna-Sowjanya, K. & Mouleeswaran, S. K. Enhancing cloud security with intelligent load balancing and malicious request classification. Cluster Computing 28 (1), 24 (May 2025).
  • 11. Lin, L. et al. E-GRACL: an IoT intrusion detection system based on graph neural networks. J SUPERCOMPUT 81 (1), 24 (Mar. 2025).
  • 12. Guo, P. et al. Adaptive Access Strategy for Satellite-Terrestrial Optical Networks Based on Particle Swarm Optimization Algorithm. PHYS COMMUN-AMST 69 (6), 102600 (Jun. 2025).
  • 13. Yin, X., Wu, X. & Zhang, X. A Trusted Federated Learning Method Based on Consortium Blockchain. Information 16 (1), 14 (Jun. 2024).
  • 14. Alalyan, F. et al. Secure distributed federated learning for cyberattacks detection in B5G open radio access networks. IEEE Open Journal of the Communications Society 6 (7), 3067–3081 (Mar. 2025).
  • 15. Song, H. et al. Cluster-Based Unsupervised Method for Eavesdropping Detection and Localization in WDM Systems. J OPT COMMUN NETW 16 (10), F52–F61 (Oct. 2024).
  • 16. Kish, S. P. et al. Mitigation of channel tampering attacks in Continuous-Variable quantum key distribution. Phys. Rev. Res. 6 (2), 023301 (Feb. 2024).
  • 17. Gong, X. et al. Machine-Learning-Based Optical Spectrum Feature Analysis for DoS Attack Detection in IP over Optical Networks. OPT EXPRESS 32 (3), 3793–3803 (Mar. 2024).
  • 18. Shirichian, M. et al. A QTCP/IP Reference Model for Partially Trusted-Node-Based Quantum-Key-Distribution-Secured Optical Networks. QUANTUM INF PROCESS 23 (3), 87 (Mar. 2024).
  • 19. Alshammari, S. T. et al. Building a Comprehensive Trust Evaluation Model to Secure Cloud Services From Reputation Attacks. IEEE ACCESS 12 (10), 150754–150775 (Oct. 2024).
  • 20. Said, D. et al. Quantum entropy and reinforcement learning for distributed denial of service attack detection in smart grid. IEEE ACCESS 7 (2), 102452 (Feb. 2024).
  • 21. Nourmohammadi, F. et al. Using Convolutional Neural Networks for Blocking Prediction in Elastic Optical Networks. Applied Sciences 14 (5), 2003 (May 2024).
  • 22. Ning, Y. et al. Biphase Routing Scheme for Optimal Throughput in Large-Scale Optical Satellite Networks. J OPT COMMUN NETW 16 (5), 553–564 (May 2024).
  • 23. Matsuura, H. et al. Extension of the k-SPF algorithm for finding SRLG-Disjoint primary and backup route pairs in optical networks. J OPT COMMUN NETW 16 (9), E23–E35 (Sep. 2024).
  • 24. Zou, Y. E. et al. Aging-Aware Co-Optimization of Topology, Parameter and Control for Multi-Mode Input- and Output-Split Hybrid Electric Powertrains. J POWER SOURCES 10 (6), 24 (Jun. 2024).
  • 25. Peng, H. et al. Evading control flow graph based GNN malware detectors via active opcode insertion method with maliciousness preserving. SCI REP-UK 15 (1), 9174 (Jun. 2024).
  • 26. Mughal, F. R. et al. A Meta-Reinforcement Learning Framework Using Deep Q-Networks and GCNs for Graph Cluster Representation. SOFTWARE PRACT EXPER 3 (1), 17 (Mar. 2025).


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
