Soft actor-critic algorithm and improved GNN model in secure access control of disaggregated optical networks

Zhenqian Zhao; Yuhe Wang

doi:10.1038/s41598-025-15225-z

2025 Aug 11;15:29358. doi: 10.1038/s41598-025-15225-z

PMCID: PMC12339999 PMID: 40790073

Abstract

To address the challenges of coordinated defense amid dynamic topology evolution and multidimensional security threats in decomposed optical networks, this study introduces the Graph-Entangled Security Actor-Critic (GESAC) model. GESAC is built on spatiotemporal modeling of evolving topologies and leverages a cross-layer spatiotemporal Graph Neural Network (GNN) to capture causal dependencies between optical path switching and access requests. Additionally, it enables adaptive delineation of security boundaries across multiple domains through federated representation learning. Within this framework, the Soft Actor-Critic (SAC) algorithm is employed to construct a policy optimization mechanism. By integrating entropy-guided multi-objective reinforcement learning, GESAC maps encoded network states to access control strategies, jointly optimizing for security, service quality, and system resilience. Experimental validation is conducted on a heterogeneous dataset comprising Cooperative Association for Internet Data Analysis (CAIDA) topology data, Canadian Institute for Cybersecurity Intrusion Detection Systems (CIC-IDS) access logs, and International Telecommunication Union Telecommunication Standardization Sector (ITU-T) threat characteristics. The dataset encompasses 12 attack scenarios, 57,000 dynamic topology sequences, and 2.8 million cross-domain authentication events. Key findings include: (1) Threat Detection: GESAC achieves an F1-score of 0.915–0.931 in identifying physical-layer attacks such as wavelength eavesdropping and cross-domain privilege escalation, with a false positive rate as low as 0.7%. (2) Resource Optimization: Compared to greedy strategies, GESAC reduces wavelength utilization variance by up to 58.9% and end-to-end latency standard deviation by up to 57.7% under high-load conditions. (3) Policy Robustness: In scenarios involving topological mutations, the model increases Pareto frontier coverage by over 100% and reduces the policy entropy decay rate by more than 65%, indicating strong robustness. (4) Scalability: At a scale of 100,000 network nodes, GESAC achieves a single-step decision latency of just 25.6 µs and significantly reduces communication overhead, demonstrating excellent scalability. GESAC is designed to overcome the limitations of static security policies in the face of dynamic decomposition and large-scale attacks in optical networks. Integrating causal inference with game-theoretic equilibrium redefines the security control paradigm, shifting it from passive defense to proactive resilience, and provides an interpretable, highly adaptive foundation for next-generation architectures such as multi-domain collaboration and computing-network convergence.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-15225-z.

Keywords: Disaggregated optical networks, Optical network secure access control, Cross-layer spatiotemporal graph attention mechanism, GESAC, Dynamic adaptation

Subject terms: Materials science, Mathematics and computing

Introduction

In the digital era, optical networks—functioning as the backbone of modern communication infrastructure—play a vital role in global data exchange, owing to their high bandwidth and low latency capabilities¹. Cloudflare’s latest report shows that global internet traffic grew by 17.2% in 2024 compared to the previous year. Mobile devices now account for over 40% of total traffic, significantly increasing the performance and security demands on underlying optical transmission networks. Additionally, cybersecurity threats continue to escalate worldwide. According to the World Economic Forum’s Global Risks Report 2024, 39% of respondents believe cyberattacks are likely to spark global crises in the near future. Cybercrime is now ranked among the top ten global risks for the coming years. In its 2025 Security Report, Check Point projected a 44% year-over-year increase in cyberattacks globally. Verizon’s Data Breach Investigations Report further highlighted a shift in threat patterns: in 2025, vulnerability exploitation overtook phishing as the leading initial attack vector, surpassing credential abuse. Notably, 22% of attacks directly targeted Virtual Private Networks (VPNs) and gateway devices—an eightfold increase from the previous year.

In this context, as network architectures evolve toward greater decomposition and multi-domain integration, static rule-based access control mechanisms are increasingly insufficient for addressing complex, dynamic security threats^2,3. Current optical network defenses largely depend on static perimeter definitions and role-based access controls. These systems rely on predefined attack signatures or single-policy templates, limiting their ability to respond to rapidly shifting access patterns and evolving topologies. Cross-domain access within optical networks is also becoming more common, with subdomain boundaries growing increasingly blurred. As a result, attack paths now demonstrate greater lateral movement and chain-propagation behavior. This trend undermines the effectiveness of single-point defenses and conventional blacklist–whitelist strategies. Moreover, optical networks must now meet rising demands driven by large-scale computing resource scheduling, cloud service integration, and smart device connectivity. These networks must balance strong security isolation with high performance across multiple dimensions—including bandwidth utilization, wavelength allocation fairness, and service latency. This growing complexity intensifies the conflict between secure access control and efficient resource orchestration. Currently, optical networks face three fundamental challenges: (1) Escalating Attack Techniques: Threats are evolving from conventional denial-of-service (DoS) attacks to advanced physical-layer exploits, such as wavelength eavesdropping. (2) Dynamic Topologies: Constant changes in network topology obscure security boundaries, complicating the tracing of cross-domain attack paths. (3) Resource-Security Tradeoffs: There is an increasing need to balance resource allocation with adaptive enforcement of security policies⁴.

Fundamentally, advancing secure access control in optical networks hinges on dynamically modeling network states, accurately identifying potential threats, and optimizing policies in real time. The Soft Actor-Critic (SAC) algorithm, a reinforcement learning technique, is well-suited for these tasks. Through continuous interaction with its environment, SAC learns optimal policies and adapts effectively to complex, dynamic conditions—making it a strong fit for the nature of optical networks⁵. Meanwhile, Graph Neural Network (GNN) is highly effective in processing graph-structured data by leveraging node and edge message-passing mechanisms to capture high-order relationships. Given the intricate and interwoven topologies of optical networks, GNN is well positioned to model their structural evolution and the associated security dependencies⁶. However, under a single-method framework, static models often struggle to capture the cascading security effects triggered by real-time topology changes. Moreover, access control strategies and physical resource management are typically implemented in isolation, lacking holistic coordination. To address these limitations, integrating SAC and GNN offers a compelling solution: SAC provides dynamic decision-making capabilities, while GNN enables sophisticated feature extraction and structural modeling. Enhancing GNN to reflect the unique security demands of optical networks further strengthens their applicability in access control contexts. In response, this study proposes the Graph-Entangled Security Actor-Critic (GESAC) model, which synergizes reinforcement learning with graph-based modeling. GESAC seeks to establish a new paradigm for secure access control in decomposed optical networks by constructing a collaborative defense system that simultaneously ensures global security, optimizes resource efficiency, and adapts policy strategies to evolving threats.

Implementation principles of SAC and GNN models

SAC is an algorithm based on maximum entropy RL, designed to simultaneously enhance cumulative rewards and policy diversity during optimization. It employs dual critic networks to estimate state-action values, reducing value estimation bias, while an actor network generates policies and maximizes policy entropy to enhance exploration. During training, SAC integrates experience replay and target networks and optimizes policies by minimizing Bellman error and maximizing entropy regularization, ensuring stable and robust policy outputs. The implementation principles of SAC are illustrated in Fig. 1.

Fig. 1 — Implementation principles of the SAC algorithm.

GNN employs an iterative message-passing mechanism to perform feature aggregation between nodes and their adjacent structures, enabling representation learning for graph-structured data. At each layer, GNN aggregates and updates the target node’s representation based on its neighbors’ weighted features, thereby capturing complex topological relationships and inter-node dependencies⁷. In optical network modeling, GNN can effectively represent physical links, node roles, and their interactions, mapping network states into high-dimensional embeddings as inputs to the policy model. This structured information supports decision optimization in subsequent processes. The implementation principles of the GNN model are illustrated in Fig. 2.

Fig. 2 — Implementation principles of the GNN model.
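The message-passing step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes unweighted mean aggregation with a self-loop, a single shared weight matrix, and a ReLU activation, and the function name `message_passing_layer` and the toy graph are hypothetical.

```python
def message_passing_layer(features, neighbors, weight):
    """One round of mean-aggregation message passing.

    features:  dict node -> feature vector (list of floats)
    neighbors: dict node -> list of adjacent nodes
    weight:    matrix (list of rows) applied to the aggregated vector
    """
    updated = {}
    for node, own in features.items():
        # Aggregate the node's own features with those of its neighbors.
        group = [own] + [features[n] for n in neighbors.get(node, [])]
        mean = [sum(col) / len(group) for col in zip(*group)]
        # Linear transform followed by ReLU.
        updated[node] = [max(0.0, sum(w * x for w, x in zip(row, mean)))
                         for row in weight]
    return updated


# Toy three-node optical path A - B - C with 2-dimensional features.
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
identity = [[1.0, 0.0], [0.0, 1.0]]
embeddings = message_passing_layer(features, neighbors, identity)
```

Stacking several such layers lets information from nodes several hops away reach each embedding, which is what allows link and role information to propagate through the topology.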

Building on traditional GNN architectures, this study introduces a multi-layer spatiotemporal attention mechanism and a cross-layer gated fusion structure to enhance the modeling capabilities of dynamic optical network characteristics. A temporal sliding window mechanism captures topological evolution patterns over different time periods. This is combined with a self-attention mechanism to extract latent causal dependencies between nodes, improving the model’s awareness of topology mutations and attack propagation paths⁸. A cross-layer gating module is designed to control the information flow across different GNN layers, ensuring effective fusion of physical-layer features (e.g., wavelength allocation, path switching) with logical-layer states. This enhances the model’s ability to express heterogeneous information⁹. A community discovery approach partitions the graph into security subdomains, allowing the model to perform local training and global aggregation under a federated learning framework. This mitigates policy fragmentation issues caused by resource fragmentation and provides fine-grained structural constraints for policy optimization^10,11. The implementation principles of the improved GNN model are illustrated in Fig. 3.

Fig. 3 — Implementation principles of the improved GNN model.

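In the improved GNN, node features evolve over time and are encoded as follows at time $t$ and layer $l$:

$$h_v^{(l+1)}(t) = \sigma\Big(\sum_{u \in \mathcal{N}(v)} \alpha_{vu}^{(l)}(t)\, W^{(l)} h_u^{(l)}(t)\Big) \tag{1}$$

$h_v^{(l)}(t)$ represents the graph embedding of node $v$, $\mathcal{N}(v)$ denotes its neighbor set, $\alpha_{vu}^{(l)}(t)$ is the attention weight, $W^{(l)}$ is the layer-specific parameter matrix, and $\sigma(\cdot)$ is the activation function.

The attention weight is determined by the temporal perception mechanism:

$$\alpha_{vu}^{(l)}(t) = \operatorname{softmax}_{u \in \mathcal{N}(v)}\Big(\mathbf{a}^{\top}\big[W_q h_v^{(l)}(t) \,\Vert\, W_k h_u^{(l)}(t)\big] \cdot \delta(t)\Big) \tag{2}$$

$\mathbf{a}$ represents the attention vector, and $\Vert$ denotes vector concatenation. $W_q$ and $W_k$ are mapping matrices, while $\delta(t)$ is the temporal decay factor.

The temporal decay factor models the influence strength of past events on the current state, as shown in Eq. (3):

$$\delta(t) = \exp\big(-\lambda\,(T - t)\big) \tag{3}$$

$\lambda$ is the temporal decay rate, and $T$ represents the end of the current time window.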
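The effect of the temporal decay factor on attention can be illustrated with a small sketch. It assumes an exponential decay in the distance to the end of the time window and a softmax over decay-scaled neighbor scores; the function names, the raw scores, and the decay rate of 0.5 are hypothetical values chosen for illustration only.

```python
import math


def temporal_decay(t, window_end, decay_rate):
    """Exponential decay: older snapshots (smaller t) get smaller weights."""
    return math.exp(-decay_rate * (window_end - t))


def attention_weights(scores, timestamps, window_end, decay_rate):
    """Softmax over neighbor scores scaled by the temporal decay factor."""
    scaled = [s * temporal_decay(t, window_end, decay_rate)
              for s, t in zip(scores, timestamps)]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Two neighbors with equal raw scores; the second was observed more recently.
weights = attention_weights(scores=[1.0, 1.0], timestamps=[2, 9],
                            window_end=10, decay_rate=0.5)
```

With equal raw scores, the more recently observed neighbor receives the larger weight, which is the behavior the decay factor is meant to produce.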

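The cross-layer gating mechanism dynamically fuses features from adjacent layers:

$$\tilde{h}_v^{(l)} = g_v^{(l)} \odot h_v^{(l)} + \big(1 - g_v^{(l)}\big) \odot h_v^{(l-1)} \tag{4}$$

$\tilde{h}_v^{(l)}$ is the gated output, $g_v^{(l)}$ is the gating weight, and $\odot$ denotes the Hadamard product.

The gating weight is controlled by the node’s own state and graph structure:

$$g_v^{(l)} = \operatorname{sigmoid}\big(W_g \big[h_v^{(l)} \,\Vert\, d_v\big]\big) \tag{5}$$

$W_g$ is the gating weight matrix, and $d_v$ denotes the node degree.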
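A gated fusion of two adjacent layers can be sketched as follows. The sketch assumes a sigmoid gate computed per embedding dimension from the node's current embedding and its degree, followed by a convex element-wise blend of the two layers; `gated_fusion`, `gate_weights`, and `degree_weight` are hypothetical names, and the exact parameterization in the paper may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_curr, h_prev, degree, gate_weights, degree_weight):
    """Blend current-layer and previous-layer embeddings element-wise.

    The gate for each dimension depends on the node's current embedding
    and its degree; a gate near 1 keeps the current layer, a gate near 0
    keeps the previous layer.
    """
    fused = []
    for row, h_c, h_p in zip(gate_weights, h_curr, h_prev):
        logit = sum(w * x for w, x in zip(row, h_curr)) + degree_weight * degree
        g = sigmoid(logit)
        fused.append(g * h_c + (1.0 - g) * h_p)
    return fused


# With zero gate parameters every gate is 0.5, so the output is the average.
fused = gated_fusion(h_curr=[2.0, 4.0], h_prev=[0.0, 0.0], degree=3,
                     gate_weights=[[0.0, 0.0], [0.0, 0.0]], degree_weight=0.0)
```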

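The graph security subdomains are defined based on embedding similarity:

$$\mathcal{C}_k = \big\{\, v \;\big|\; k = \arg\min_{j} \lVert h_v - \mu_j \rVert_2 \,\big\} \tag{6}$$

$\mathcal{C}_k$ represents the $k$-th security subdomain, and $\mu_k$ is the $k$-th cluster center.

After each iteration, the subdomain centers are updated as follows:

$$\mu_k = \frac{1}{|\mathcal{C}_k|} \sum_{v \in \mathcal{C}_k} h_v \tag{7}$$

$|\mathcal{C}_k|$ represents the number of nodes in the $k$-th subdomain.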
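The alternation between assigning nodes to subdomains and updating subdomain centers resembles one iteration of k-means clustering. The sketch below assumes squared Euclidean distance as the (dis)similarity measure and keeps the center of an empty subdomain unchanged; both choices, along with the function names and toy embeddings, are assumptions rather than details taken from the paper.

```python
def assign_subdomains(embeddings, centers):
    """Assign each node to the nearest cluster center (squared Euclidean)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return {node: min(range(len(centers)), key=lambda k: dist2(vec, centers[k]))
            for node, vec in embeddings.items()}


def update_centers(embeddings, assignment, centers):
    """Recompute each center as the mean embedding of its member nodes."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [embeddings[n] for n, c in assignment.items() if c == k]
        if not members:  # keep an empty subdomain's center unchanged
            new_centers.append(old)
            continue
        new_centers.append([sum(col) / len(members) for col in zip(*members)])
    return new_centers


embeddings = {"n1": [0.0, 0.0], "n2": [0.0, 2.0], "n3": [10.0, 10.0]}
centers = [[0.0, 0.0], [9.0, 9.0]]
assignment = assign_subdomains(embeddings, centers)
centers = update_centers(embeddings, assignment, centers)
```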

Building on the above mathematical formulations, Fig. 4 presents the implementation workflow of the improved GNN model, detailing the data flow from initial input to the final output.

Implementation of the GESAC model

The GESAC model constructs a graph-based RL framework tailored to the spatiotemporal characteristics of dynamic optical networks. The core idea is to model network topology evolution using a GNN and integrate it with the SAC algorithm for policy optimization. At the lower level, the improved GNN extracts structural features and security context information from each physical subdomain. A temporal attention mechanism identifies dynamic interactions between optical path switching and access requests. A gating mechanism enhances semantic consistency in multi-layer graph representations. Community partitioning enables automatic identification of security boundaries.

On this basis, GESAC incorporates a federated architecture to aggregate GNN representations from different subdomains into a unified state vector, which is then processed by the SAC module for policy evaluation and action decision-making⁴. SAC employs a dual-critic mechanism to estimate the long-term rewards of access policies¹². By maximizing expected rewards and policy entropy, the actor network generates diverse and robust access control actions. Throughout the training process, GNN and SAC share parameters and undergo joint optimization, allowing the model to balance resource utilization and access security while maintaining strong adaptability^13,14.

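In GESAC, the global state is constructed by aggregating subdomain graph embeddings:

$$s_t = \operatorname{AGG}_{k=1}^{K}\Big( f_{\theta_k}\big(G_k^{t}\big) \Big) \tag{8}$$

$G_k^{t}$ represents the graph structure of the $k$-th subdomain at time $t$, while $f_{\theta_k}$ denotes the corresponding GNN encoder. The operation $\operatorname{AGG}(\cdot)$ refers to aggregation.

The probability distribution of control actions output by the policy network is computed as shown in Eq. (9):

$$\pi_\phi(a \mid s) = \mathcal{N}\big(\mu_\phi(s),\, \sigma_\phi^{2}(s)\big) \tag{9}$$

$\pi_\phi$ is the actor policy parameterized by $\phi$, $a$ represents the action, and $s$ is the state. The mean $\mu_\phi(s)$ and variance $\sigma_\phi^{2}(s)$ are computed by a neural network.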
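The path from subdomain embeddings to a sampled action can be sketched as below. The sketch assumes mean pooling as the aggregation operator and a diagonal Gaussian policy; in a real actor the mean and log standard deviation would come from a neural network, whereas here the aggregated state is reused as the mean purely as a placeholder. All function names and numbers are hypothetical.

```python
import math
import random


def aggregate_state(subdomain_embeddings):
    """Mean-pool subdomain embeddings into one global state vector."""
    count = len(subdomain_embeddings)
    return [sum(col) / count for col in zip(*subdomain_embeddings)]


def sample_action(mean, log_std, rng):
    """Draw an action from a diagonal Gaussian and return its log-probability."""
    action, log_prob = [], 0.0
    for mu, ls in zip(mean, log_std):
        std = math.exp(ls)
        a = rng.gauss(mu, std)
        action.append(a)
        # Log-density of a univariate Gaussian, summed over dimensions.
        log_prob += -0.5 * ((a - mu) / std) ** 2 - ls - 0.5 * math.log(2 * math.pi)
    return action, log_prob


state = aggregate_state([[1.0, 3.0], [3.0, 5.0]])  # two subdomains
action, log_prob = sample_action(mean=state, log_std=[-1.0, -1.0],
                                 rng=random.Random(42))
```

The returned log-probability is the quantity that enters the entropy term of the SAC objective.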

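The soft Q-value of the state-action pair, estimated by the value network, is calculated as shown in Eq. (10):

$$Q_\psi(s_t, a_t) = r_t + \gamma\, \mathbb{E}_{s_{t+1},\, a_{t+1}}\big[ Q_{\bar{\psi}}(s_{t+1}, a_{t+1}) - \alpha \log \pi_\phi(a_{t+1} \mid s_{t+1}) \big] \tag{10}$$

$Q_\psi$ denotes the critic network, and $\bar{\psi}$ represents the target network parameters. The term $\gamma$ is the discount factor, while $\alpha$ denotes the entropy coefficient.

The policy objective maximizes the expected entropy-adjusted return, computed as shown in Eq. (11):

$$J(\phi) = \mathbb{E}_{s_t \sim \mathcal{D},\, a_t \sim \pi_\phi}\big[ Q_\psi(s_t, a_t) - \alpha \log \pi_\phi(a_t \mid s_t) \big] \tag{11}$$

The critic loss function is computed as shown in Eq. (12):

$$L(\psi_i) = \mathbb{E}_{(s_t, a_t, r_t, s_{t+1}) \sim \mathcal{D}}\Big[ \big( Q_{\psi_i}(s_t, a_t) - y_t \big)^2 \Big], \quad i \in \{1, 2\} \tag{12}$$

The target value is derived from the target network and entropy regularization, as shown in Eq. (13):

$$y_t = r_t + \gamma \Big( \min_{i = 1, 2} Q_{\bar{\psi}_i}(s_{t+1}, a_{t+1}) - \alpha \log \pi_\phi(a_{t+1} \mid s_{t+1}) \Big), \quad a_{t+1} \sim \pi_\phi(\cdot \mid s_{t+1}) \tag{13}$$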
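As a worked illustration of the target value and critic loss, the sketch below follows the standard SAC formulation: the target takes the smaller of the two target-critic estimates, subtracts the entropy term, and discounts the result. Whether GESAC uses exactly this form is an assumption, and the numeric inputs are hypothetical.

```python
def soft_target(reward, next_q1, next_q2, next_log_prob, gamma, alpha):
    """Entropy-regularized Bellman target using the smaller of two target critics."""
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * soft_value


def critic_loss(q_values, targets):
    """Mean squared Bellman error over a batch."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


y = soft_target(reward=1.0, next_q1=5.0, next_q2=4.0,
                next_log_prob=-2.0, gamma=0.99, alpha=0.2)
loss = critic_loss(q_values=[5.0], targets=[y])
```

Here the target is 1.0 + 0.99 × (4.0 + 0.4) = 5.356. Taking the minimum of the twin critics is what counteracts overestimation of the value, and a low-probability next action (negative log-probability) raises the target, which rewards exploration.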

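The gradient of the graph embedding is obtained through backpropagation from the policy loss:

$$\nabla_\theta L_\pi = \frac{\partial L_\pi}{\partial z} \cdot \frac{\partial z}{\partial \theta} \tag{14}$$

$L_\pi$ denotes the policy loss, $\theta$ represents the GNN parameters, and $z$ is the state graph embedding.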

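After federated aggregation, the global policy parameters are obtained as follows:

$$\phi_g^{(m+1)} = \sum_{k=1}^{K} \frac{n_k}{\sum_{j=1}^{K} n_j}\, \phi_k^{(m)} \tag{15}$$

$\phi_k^{(m)}$ denotes the local parameters of the $k$-th subdomain at round $m$, and $n_k$ represents the number of its samples.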
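Weighting local parameters by sample count corresponds to the familiar federated averaging scheme. The following sketch assumes that parameters are flat vectors of equal length and that every subdomain participates in every round; `federated_average` and the example values are hypothetical.

```python
def federated_average(local_params, sample_counts):
    """Sample-weighted average of per-subdomain parameter vectors."""
    total = sum(sample_counts)
    dim = len(local_params[0])
    return [sum(n * p[i] for p, n in zip(local_params, sample_counts)) / total
            for i in range(dim)]


global_params = federated_average(
    local_params=[[1.0, 2.0], [3.0, 6.0]],  # two subdomains
    sample_counts=[100, 300],
)
```

The subdomain with three times as many samples pulls the global parameters three times as strongly, giving [2.5, 5.0] in this example.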

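The final action must satisfy both physical and security constraints:

$$A^{*} = \arg\max_{A}\; \pi_\phi(A \mid s) \quad \text{s.t.} \quad R(A) \le \tau_r, \;\; S(A) \ge \tau_s \tag{16}$$

$A$ is the access control matrix, while $R(\cdot)$ and $S(\cdot)$ represent resource and security constraints, respectively. The thresholds are denoted by $\tau_r$ and $\tau_s$.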

The pseudocode for implementing the GESAC model is shown in Fig. 5.

The symbols and variables used in the pseudocode are defined in Table 1:

Table 1.

Symbol and variable definitions for pseudocode description.

Symbol/variable	Description
k	Subdomain index, representing the k-th distributed subnetwork in federated learning
θk	Parameter set of the GNN encoder in the k-th subdomain
πk	Actor policy network in the k-th subdomain
Qk1, Qk2	Twin Critic networks in the k-th subdomain
πg	Global policy network obtained through federated aggregation
Dk	Experience replay buffer in the k-th subdomain
Gk	Current dynamic graph sequence in the k-th subdomain
sk, s’k	State vectors encoded by the GNN, representing the current and next states
ak	Action (i.e., access control decision) output by the policy under state sk
rk	Immediate reward received after action execution
γ	Discount factor controlling the impact of future rewards
α	Entropy temperature coefficient, balancing exploration and exploitation
N	Federated communication interval (model aggregation and synchronization every N steps)
Q_target	Target value function used for Critic network update via Bellman equation
MaxEpisode	Maximum number of training episodes, defining the upper limit of iterations
EncodeGraph()	Graph encoding function mapping node and edge attributes to state vectors via GNN
SampleAction()	Entropy-regularized action sampling from policy π
StepEnvironment()	Simulates environment interaction: executes action and returns reward and next state

Data collection process for GESAC model validation

This study employs a heterogeneous data federation framework, integrating Cooperative Association for Internet Data Analysis (CAIDA) topology data, Canadian Institute for Cybersecurity Intrusion Detection Systems (CIC-IDS) access logs, and International Telecommunication Union Telecommunication Standardization Sector (ITU-T) threat characteristics to construct a dynamic optical network validation dataset. First, subgraph skeletons matching optical network characteristics are extracted from CAIDA’s Multiprotocol Label Switching (MPLS) dataset. Topologies with abnormal node degree distributions or excessive link wavelength capacities are filtered out, preserving only core nodes and critical optical paths to form the physical-layer infrastructure. Next, based on the CIC-IDS-2017/2018 dataset, attack events such as brute force and Distributed Denial of Service (DDoS) are mapped to corresponding nodes in the CAIDA subgraph using IP address hashing and timestamp alignment. This process generates dynamic access request sequences with labeled normal and abnormal behaviors. Additionally, in accordance with ITU-T X.1818 standards, threats such as wavelength eavesdropping and cross-domain privilege escalation are transformed into executable edge attributes. These threats are injected into the topology links following predefined attack chain patterns, simulating multi-stage coordinated attack scenarios. The relationships among data applications are illustrated in Fig. 6.

Fig. 6 — Application relationships in the heterogeneous data federated framework.
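The mapping of logged attack events onto topology nodes by IP address hashing and timestamp alignment could look roughly like the sketch below. The paper does not specify the hash function or the alignment rule, so SHA-256 with a modulo over the node count and floor division by a fixed snapshot interval are assumptions, as are the function names and example values.

```python
import hashlib


def map_ip_to_node(ip_address, num_nodes):
    """Deterministically map an IP address string to a node index."""
    digest = hashlib.sha256(ip_address.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def align_timestamp(event_time, snapshot_interval):
    """Index of the topology snapshot whose interval contains the event."""
    return int(event_time // snapshot_interval)


node = map_ip_to_node("192.0.2.17", num_nodes=500)
step = align_timestamp(event_time=12.7, snapshot_interval=5.0)
```

Because the hash is deterministic, every event from the same source address lands on the same node, which keeps multi-stage attack chains attached to a consistent location in the graph.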

In the data preprocessing stage, a spatiotemporal encoder is designed to unify multi-source data formats: (1) Node attributes integrate device status and logical authentication records; (2) Edge attributes incorporate physical-layer metrics and security labels. A sliding window mechanism segments the dynamic topology sequences and keeps graph snapshots across consecutive time steps, so that both temporal and spatial dependencies can be modeled. The final dataset includes 57,000 topology sequences, 2.8 million cross-domain authentication events, and 17 types of adversarial attack chains, and it covers composite threats such as wavelength eavesdropping and logical-layer infiltration. The pseudocode of the data preprocessing stage is shown in Fig. 7.

Fig. 7 — Pseudocode of data preprocessing stage.
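The sliding window segmentation of a topology sequence can be sketched as follows. Window length and stride are not stated in the text, so the values used here are hypothetical, and snapshots are represented by string labels instead of actual graph objects.

```python
def sliding_windows(snapshots, window_size, stride=1):
    """Split a time-ordered snapshot list into overlapping fixed-length windows."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    return [snapshots[i:i + window_size]
            for i in range(0, len(snapshots) - window_size + 1, stride)]


# Snapshots are represented by labels here; in practice each would be a graph.
windows = sliding_windows(["G0", "G1", "G2", "G3", "G4"], window_size=3, stride=1)
```

Consecutive windows share snapshots, so a topology change that occurs at one time step is seen in several windows with different amounts of preceding context.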

The key parameters of the dynamic topology and attack scenarios of the optical network are summarized in Table 2:

Table 2.

Key parameters of optical network dynamic topology and attack scenarios.

Parameter category	Specific parameter	Data source	Description
Topology structure parameters	Number of nodes	CAIDA MPLS	Total number of physical nodes in each subgraph
	Number of links		Number of optical path connections (unidirectional/bidirectional)
	Node degree distribution		Average degree, maximum degree, minimum degree
	Wavelength capacity (wavelengths/link)	Simulation supplement	Number of wavelengths supported by each optical link
Attack injection parameters	Attack types (17 categories)	ITU-T X.1818	Includes wavelength eavesdropping, DDoS, cross-domain privilege escalation, etc.
	Attack stages (single-point/multi-stage)		Indicates whether the attack is part of a complex attack chain
	Threat impact level (1–5)		Ranges from 1 (low risk) to 5 (catastrophic impact)
	Attack duration (ms)	CIC-IDS-2017/2018	Extracted from logs to determine attack time span
Dynamic evolution parameters	Topology change frequency (times/s)	CAIDA + simulation	Simulated rate of optical path switching or node failures
	Attack occurrence frequency (times/topology change)	ITU-T + CIC-IDS	Number of injected attacks per topology change cycle
	Normal traffic vs. attack traffic ratio	CIC-IDS-2017/2018	Ratio of legitimate to malicious requests, based on log statistics

The 17 types of adversarial attacks are summarized in Table 3:

Table 3.

Summary of 17 types of adversarial attacks.

Attack Scope	Category count	Attack type	Attack stage	Attack code
Physical layer attacks (A)	3	Wavelength eavesdropping¹⁵	Single-point	A1
		Optical path hijacking	Multi-stage	A2
		Cross-domain privilege escalation	Multi-stage	A3
Data layer attacks (B)	3	Data tampering¹⁶	Single-point	B1
		Man-in-the-middle attack	Multi-stage	B2
		Data leakage	Single-point	B3
Logical layer attacks (C)	3	Web penetration attack	Multi-stage	C1
		APT (advanced persistent threat) attack	Multi-stage	C2
		SQL (structured query language) injection	Single-point	C3
Service layer attacks (D)	3	DoS (denial of service)¹⁸	Single-point	D1
		Botnet	Multi-stage	D2
		Service spoofing	Multi-stage	D3
Application layer attacks (E)	2	DDoS	Single-point	E1
		Brute force attack	Single-point	E2
Advanced composite attacks (F)	3	Route hijacking	Multi-stage	F1
		Resource exhaustion attack	Single-point	F2
		Protocol vulnerability exploitation¹⁸	Multi-stage	F3

Before the formal validation, the proposed model is compared with baseline models chosen separately for each evaluation objective, in order to highlight its advantages. The selection of baseline models is summarized in Table 4.

Table 4.

Selection of baseline models for comparative analysis.

Analysis dimension	Baseline model	Comparative justification	Key parameters of baseline model
Multi-dimensional attack detection performance comparison	Traditional static role-based access control (RBAC) model	Evaluates performance differences between static and dynamic strategies in detecting emerging attacks¹⁹	Role-permission matrix: 100 × 50; rule update cycle: fixed at 24 h
	Q-learning algorithm	Validates SAC’s superiority over traditional RL in policy exploration and multi-objective trade-offs²⁰	Learning rate: 0.01; discount factor: 0.99; state space dimension: node degree + link load (50 dimensions)
	ML-based detector (SVM - support vector machine)	Compares traditional feature engineering with GNN for dynamic topology-based attack detection sensitivity²¹	Feature dimensions: 20 (packet rate, connection count, etc.); kernel function: RBF (radial basis function); penalty factor c = 10
Dynamic resource allocation balance validation	Greedy algorithm	Highlights the contrast between global optimization and local optimal strategies in resource balancing²²	Link selection policy: lowest real-time load first; history state memory window: none
	Shortest path first	Validates the necessity of multi-objective optimization compared to single-objective (minimum hops)²³	Routing update interval: 5 s; link cost calculation: hop count; maximum path count: 3
Robustness evaluation under topology variations	Dynamic programming graph generation (DPGG) model	Compares centralized vs. federated architectures in terms of policy stability in dynamic topologies²⁴	Network structure: actor (256–128–64)—critic (256–128); experience replay buffer size: 1e5; batch size: 64
Distributed architecture efficiency & scalability testing	Not applicable	N/A	N/A

To evaluate the performance of GESAC in complex optical network environments, this study aims to assess its security, stability, robustness, and deployability. Accordingly, the evaluation is conducted from four perspectives: multi-dimensional attack detection performance, dynamic resource allocation balance, policy robustness under topological mutations, and scalability of the distributed architecture^25,26. A set of key performance metrics has been identified for each evaluation dimension, as summarized in Table 5.

Table 5.

Key metrics used in multi-dimensional evaluation of GESAC.

Evaluation aspect	Metric	Definition	Calculation method
Multi-dimensional attack detection	F1-score	Harmonic mean of detection precision and recall	2 × (Precision × Recall)/(Precision + Recall)
	False positive rate (FPR)	Proportion of benign traffic misclassified as malicious	FP/(FP + TN)
Dynamic resource allocation balance	Wavelength utilization variance	Measures load balancing across resources	Variance of wavelength utilization across nodes
	Standard deviation of E2E latency	Assesses transmission stability under varying load	Standard deviation of end-to-end path latency
Robustness under topological mutation	Policy entropy decay rate (%/k steps)	Measures policy diversity reduction rate	Decline rate of policy entropy per thousand steps
	Pareto frontier coverage (%)	Proportion of optimal solutions in the objective space	Ratio of Pareto-optimal solutions generated by the model
	Coverage improvement (%)	Improvement in Pareto coverage over baseline	((GESAC − DPGG)/DPGG) × 100%
	Entropy decay reduction (%)	Degree of suppression in entropy decline	((DPGG − GESAC)/DPGG) × 100%
Distributed architecture scalability	Single-step decision latency (µs)	Time taken to generate a single policy action	Total policy computation time/number of decisions
	Federated communication overhead (MB/cycle)	Communication cost per cycle across subdomains	Total communication volume across subdomains/number of communication cycles
	Centralized communication overhead (MB/cycle)	Communication cost under centralized system	Total communication volume to central node/number of cycles
	Communication reduction ratio (%)	Reduction in overhead via federated learning	((Centralized − Federated)/Centralized) × 100%
	Scalability slope (MB/subdomain)	Communication load growth rate with number of subdomains	Federated communication overhead/number of subdomains
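Several of the calculation methods in Table 5 reduce to short formulas, sketched here for concreteness. The confusion-matrix counts and overhead figures in the example are hypothetical and are not results from the paper; `relative_reduction` follows the ((baseline − value)/baseline) × 100% pattern used for the communication reduction ratio and the entropy decay reduction.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def false_positive_rate(fp, tn):
    """Share of benign samples that were flagged as malicious."""
    return fp / (fp + tn)


def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline * 100.0


f1 = f1_score(tp=90, fp=10, fn=10)            # precision = recall = 0.9
fpr = false_positive_rate(fp=7, tn=993)       # 7 of 1000 benign requests flagged
saving = relative_reduction(baseline=200.0, value=50.0)
```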

To enhance reproducibility and ensure transparency of experimental parameters, the study provides a comprehensive summary of key configuration settings prior to formal analysis. This includes network topology scale, attack injection strategies, model training hyperparameters, and runtime environment specifications. Given the stochastic nature of both topology variations and attack injections, the settings also explicitly define parameter ranges, fixed random seeds, and synchronization cycles. These details offer a consistent technical foundation for interpreting results and facilitating cross-method comparison. The complete configuration information is shown in Table 6.

Table 6.

Key configuration parameters for experimental analysis.

Parameter category	Parameter name	Value/range	Notes
Topology configuration	Number of graph nodes	100–100,000	Sampled from CAIDA subgraphs
	Number of graph edges	150–320,000	Filtered based on degree distribution and wavelength availability
	Number of subdomains	10–500	Automatically generated via subgraph partitioning
	Wavelength capacity range	4–16	Specifies physical link capacity constraints
Attack injection settings	Attack injection frequency	1–2 attacks per 100 topology updates	Simulates multi-stage collaborative attack chains
	Attack type coverage	17 types (physical/logical/composite)	Based on CIC-IDS and ITU-T threat mapping
	Topology mutation interval	Every 10–50 steps	Simulates dynamic topological evolution
Model training parameters	Number of GNN layers	2–4	Uses improved GNN architecture
	GNN hidden dimension	64/128/256	Kept consistent across experiments
	SAC learning rate	1e-4–5e-4	Used for joint training of actor and critic networks
	Entropy coefficient (α)	0.1–0.2	Balances exploration and exploitation
	Batch size	256/512	Tuned according to model size
	Temporal decay factor (λ)	0.95–0.99	Enhances model responsiveness by decaying outdated state information
	Number of communities (K)	20–100	Maximum number of graph partitions per subdomain for federated aggregation granularity
Operating environment	Computing platform	CPU cluster	Supports large-scale simulations
	Random seed	Fixed at 42	Ensures experimental reproducibility
	Hardware setup	Intel Xeon Gold 6338 × 2, NVIDIA A100 (80 GB) × 4, 512 GB DDR4 RAM	High-performance cluster for concurrent training and simulation
	Software stack	Ubuntu 20.04, CUDA 11.6, PyTorch 1.13, DGL 0.9.1, Python 3.8	Standard deep learning and graph processing framework supporting federated parallel scheduling and GPU acceleration

Application effectiveness of GESAC in optical network security access control

Multi-dimensional attack detection performance analysis

To evaluate GESAC’s improvements in threat detection, the study extracted various attack samples from integrated datasets, including: (1) Emerging threats, such as wavelength eavesdropping and cross-domain privilege escalation; (2) Traditional attacks, such as DDoS and brute force attacks. Using F1-score and false positive rate (FPR) as the core evaluation metrics, the experiment kept the test environment consistent by controlling all other variables. GESAC was compared against baseline models, with results visualized in Fig. 8.

As illustrated in Fig. 8, two key findings emerge. First, in terms of F1-score, GESAC exhibits markedly superior performance across most attack types. For emerging physical-layer threats, such as wavelength eavesdropping (A1), optical path hijacking (A2), and cross-domain privilege escalation (A3), GESAC achieves F1-scores of 0.931, 0.915, and 0.921, respectively. These represent improvements of over 0.60 compared to static RBAC and nearly 0.45 over SVM, underscoring the model’s enhanced precision in capturing dependencies between optical path switching and anomalous behaviors through spatiotemporal graph modeling. Second, with respect to FPR, GESAC demonstrates strong robustness. In application-layer attacks such as DDoS (E1) and brute-force login (E2), it maintains low FPRs of 0.9% and 0.7%, respectively—significantly lower than SVM (2.7%, 2.1%) and Q-Learning (3.5%, 2.8%). For more complex, multi-stage attacks like routing hijacking (F1) and protocol exploitation (F3), GESAC contains the FPR within 2.2–2.4%, compared to over 10% in static RBAC. Furthermore, for data- and service-layer threats such as man-in-the-middle (MITM) attacks (B2) and botnet activity (D2), GESAC consistently maintains F1-scores above 0.85 and FPRs below 3%. This performance substantially outpaces traditional models, which often exhibit unstable F1-scores and elevated false positives when facing multi-stage attacks. These results confirm that GESAC not only delivers high accuracy for isolated attacks but also effectively infers causality and temporal patterns in complex, sequential attack chains. GESAC’s core strength lies in its ability to detect anomalous edge attributes during topological evolution, while employing entropy-maximized policy learning to strike a balance between detection accuracy and false positive suppression. 
This allows it to overcome the limitations of static rule-based approaches, which often suffer from under-detection or overfitting due to inflexible policy definitions. Consequently, GESAC addresses fundamental challenges in decomposed optical networks, including fragmented attack features and detection delays.
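For reference, the two metrics used in this comparison can be computed from confusion-matrix counts as in the minimal sketch below. The counts are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
def f1_and_fpr(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (F1-score, false positive rate) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # FPR is the share of benign samples that were wrongly flagged as attacks.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, fpr


# Hypothetical counts for one attack class (not from the paper).
f1, fpr = f1_and_fpr(tp=920, fp=70, fn=80, tn=9930)
print(f"F1 = {f1:.3f}, FPR = {fpr:.2%}")
```

Because F1 depends on both precision and recall while FPR depends only on the benign samples, a model can score well on one and poorly on the other, which is why the two are reported together.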

Dynamic resource allocation balance validation

To assess resource allocation balance, the study simulates different traffic load conditions using CAIDA topology data, records the distribution of wavelength utilization, and calculates its variance as a measure of allocation uniformity. The standard deviation of end-to-end latency, in milliseconds, is also monitored to assess the model’s stability under high load fluctuations. GESAC is then compared with baseline models on wavelength utilization variance (Indicator 1) and end-to-end latency standard deviation (Indicator 2). The results are shown in Table 7:

Table 7.

Dynamic resource allocation balance validation results.

Scenario	Attack status	Metric type	GESAC result	Greedy algorithm result	Shortest path first result	Optimization (vs. greedy)	Optimization (vs. shortest path first)
Low load	No attack	Indicator 1	0.12	0.18	0.25	33.3% ↓	52.0% ↓
Low load	No attack	Indicator 2	2.1	3.5	4.8	40.0% ↓	56.25% ↓
Normal load	No attack	Indicator 1	0.21	0.37	0.52	43.2% ↓	59.6% ↓
Normal load	No attack	Indicator 2	3.8	6.2	8.9	38.7% ↓	57.3% ↓
High load	No attack	Indicator 1	0.34	0.72	1.05	52.8% ↓	67.6% ↓
High load	Wavelength eavesdropping attack	Indicator 1	0.39	0.95	1.32	58.9% ↓	70.5% ↓
High load	No attack	Indicator 2	5.2	9.1	12.4	42.9% ↓	58.1% ↓
High load	Wavelength eavesdropping attack	Indicator 2	6.7	14.3	18.9	53.1% ↓	64.6% ↓
Peak load	DDoS attack	Indicator 1	0.51	1.24	1.68	58.9% ↓	69.6% ↓
Peak load	DDoS attack	Indicator 2	8.3	19.6	24.7	57.7% ↓	66.4% ↓


As shown in Table 7, under dynamic load conditions, GESAC significantly mitigates the risk of localized resource overload through cross-domain topological awareness and a federated coordination mechanism. Under high load combined with a wavelength eavesdropping attack, GESAC lowers the variance in wavelength utilization from 0.95 (greedy algorithm) to 0.39, a 58.9% reduction, and the standard deviation of end-to-end delay from 14.3 ms to 6.7 ms, a 53.1% reduction. These results validate GESAC’s capability to alleviate the dual impact of physical-layer resource contention and attack-induced service degradation. Under the more extreme combination of peak load and a DDoS attack, the reductions relative to the greedy algorithm reach 58.9% for Indicator 1 and 57.7% for Indicator 2, demonstrating strong generalization and adaptability in dynamic scheduling. Even under stable (normal) load conditions, GESAC reduces the variance in wavelength utilization from 0.37 to 0.21, a 43.2% decrease, and its latency standard deviation is 57.3% lower than that of the shortest-path-first strategy. Moreover, as load rises from low to high in the absence of attacks, GESAC’s delay standard deviation grows from 2.1 ms to 5.2 ms, whereas that of the greedy algorithm grows from 3.5 ms to 9.1 ms, highlighting GESAC’s anticipatory responsiveness to traffic trends through spatiotemporal causal inference. In summary, GESAC enables dynamic and balanced scheduling across multiple domains by jointly optimizing physical resource constraints and logical access policies. Even under the combined pressures of topological disruptions and adversarial interference, it maintains a tightly coupled relationship between wavelength utilization uniformity and service stability, surpassing traditional scheduling mechanisms that rely on static configurations and centralized control.
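The percentage columns of Table 7 follow from a simple relative-reduction formula, and Indicator 1 is a plain variance over utilization samples. The sketch below recomputes a few of the reported reductions from the table values. The per-link utilization list is hypothetical, and the use of population variance is an assumption, since the text does not say which estimator was used.

```python
from statistics import pvariance


def relative_reduction(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` is lower than `baseline`."""
    return (baseline - candidate) / baseline * 100.0


# Hypothetical per-link wavelength utilization (fraction of wavelengths in use).
utilization = [0.60, 0.70, 0.80]
indicator_1 = pvariance(utilization)  # population variance (assumed estimator)

# (GESAC, greedy) pairs copied from Table 7.
table_7_pairs = {
    "High load + eavesdropping, Indicator 1": (0.39, 0.95),
    "High load + eavesdropping, Indicator 2": (6.7, 14.3),
    "Peak load + DDoS, Indicator 1": (0.51, 1.24),
    "Peak load + DDoS, Indicator 2": (8.3, 19.6),
}
for label, (gesac, greedy) in table_7_pairs.items():
    print(f"{label}: {relative_reduction(greedy, gesac):.1f}% lower than greedy")
```

The recomputed values match the "Optimization (vs. greedy)" column to one decimal place.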

Strategy robustness evaluation under topological mutation

The study constructs dynamic topology sequences using CAIDA data, injecting random structural changes, and records the adjustment process of the model’s strategy to evaluate its robustness under topological mutation. The Pareto front coverage rate is used to measure the model’s optimization capability under multiple constraints, reflecting the trade-off between security and service quality. Additionally, the decay rate of strategy entropy is monitored. By comparing the changes in strategy entropy under different attack scenarios, the robustness of GESAC in dealing with multi-stage coordinated attacks is analyzed. The analysis results are shown in Fig. 9:

As shown in Fig. 9, GESAC is more robust and adaptable than the baseline models under topological mutation. When faced with dynamic changes such as node failures and link breakages, the model employs cross-layer federated perception and spatiotemporal causal reasoning to convert topology disruptions into controllable adjustments in the multi-objective optimization space, rather than letting them degrade the strategy globally. For example, in a scenario involving subdomain isolation and high-frequency optical path switching attacks, the model uses a dynamic game mechanism to balance security isolation against resource scheduling needs. This results in a 100% improvement in the balance of the multi-objective Pareto front compared to traditional strategies. Additionally, the strategy adjustment process did not stall locally because of conflicting objectives. To maintain strategy diversity, the entropy regularization mechanism slows premature strategy convergence. Under 20% node failure and multi-stage attack chain interference, the long-term decay of strategy entropy is reduced by more than 65% compared to the baseline, preventing the systemic imbalance caused by over-optimization of a single objective. These results demonstrate that GESAC, through the joint encoding of physical topology states and logical access behaviors, transforms dynamic disruptions into resilient mappings in the strategy space. This approach shifts from passive responses to active optimization, achieving tightly coupled optimization of security and service quality despite structural mutations and attack interference.
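To make the notion of strategy entropy concrete, the sketch below computes the Shannon entropy of a discrete action distribution and the fraction of entropy lost between two points in training. The distributions are hypothetical, and defining decay as relative entropy loss is an assumption; the text does not state the exact formula behind Fig. 9.

```python
import math


def policy_entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_decay(initial: float, final: float) -> float:
    """Fraction of the initial entropy that has been lost (assumed definition)."""
    return (initial - final) / initial


# Hypothetical action distributions over four access-control actions.
early_policy = [0.25, 0.25, 0.25, 0.25]      # fully exploratory
collapsed_policy = [0.97, 0.01, 0.01, 0.01]  # nearly deterministic
regularized_policy = [0.55, 0.20, 0.15, 0.10]

h0 = policy_entropy(early_policy)
print(f"decay without regularization: {entropy_decay(h0, policy_entropy(collapsed_policy)):.2f}")
print(f"decay with regularization:    {entropy_decay(h0, policy_entropy(regularized_policy)):.2f}")
```

A policy that keeps more entropy retains alternative actions it can fall back on when the topology changes, which is the behavior the paragraph above attributes to entropy regularization.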

Distributed architecture efficiency and scalability testing

While ensuring the model’s technical advantages, its engineering feasibility must also be validated. The study primarily tests the decision efficiency and communication overhead of GESAC under different network topology scales, assessing its scalability limit. The results are shown in Fig. 10:

As shown in Fig. 10, the distributed architecture of GESAC demonstrates notable engineering value in terms of both efficiency and scalability. In tests where the network size was scaled from 100 to 100,000 nodes, the per-step decision latency increased from 4.2 µs to 25.6 µs, exhibiting a sub-linear growth trend. This ensures millisecond-level control responsiveness even in ultra-large-scale optical network environments. In terms of communication overhead, GESAC’s federated architecture effectively reduces load through local feature aggregation within subdomains and inter-domain gradient sharing. Compared to the centralized architecture’s 3288.7 MB per cycle, GESAC achieves a communication cost of only 1389.5 MB in a 100,000-node, 500-subdomain scenario—a reduction consistently maintained between 57.2% and 57.9%. Additionally, the scalability slope is controlled within a range of 1.2 to 2.0 MB per subdomain, indicating that communication growth remains linearly manageable with increasing subdomain numbers and avoids the bandwidth bottlenecks typical of centralized scheduling. At a deeper level, GESAC leverages a federated game-theoretic equilibrium mechanism to decouple collaboration and resource isolation across subdomains. This design enables each subdomain to operate independently while maintaining global awareness of topological dynamics, thereby enhancing the adaptability of cross-domain resource scheduling. Overall, the architecture delivers three key advantages—low latency, low bandwidth dependency, and high scalability—making it well-suited for online access control deployment in large-scale decomposed optical networks.
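The two headline figures in this paragraph can be sanity-checked with a few lines of arithmetic. The sketch below assumes a power-law relationship between node count and latency and fits it through the two reported endpoints only; the intermediate points of Fig. 10 are not available here, so the exponent is a rough indication rather than a fitted result.

```python
import math


def scaling_exponent(n_small: float, t_small: float,
                     n_large: float, t_large: float) -> float:
    """Exponent k in t ~ n**k implied by two (size, latency) endpoints."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


# Endpoints reported in the text: 100 nodes -> 4.2 us, 100,000 nodes -> 25.6 us.
k = scaling_exponent(100, 4.2, 100_000, 25.6)

# Communication cost per cycle at 100,000 nodes and 500 subdomains (MB).
centralized_mb, federated_mb = 3288.7, 1389.5
saving = 1 - federated_mb / centralized_mb

print(f"latency scaling exponent: {k:.2f} (sub-linear if below 1)")
print(f"communication saving:     {saving:.1%}")
```

An exponent well below 1 is consistent with the sub-linear growth described above, and the computed saving falls inside the 57.2% to 57.9% range quoted in the text.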

Mininet optical dataset validation

To further evaluate GESAC’s adaptability under realistic network transmission delays, the Mininet Optical simulator was introduced to construct an emulated optical network environment. A portion of dynamically preprocessed topological data from GESAC was imported to simulate light-path switching and access behaviors across network sizes ranging from 50 to 500 nodes. Attack events were extracted from the original CIC-IDS dataset and injected into link traffic streams through mapping-based techniques. In the simulation, GESAC was deployed as an edge control strategy, communicating with Mininet’s control plane via REST APIs. This setup enabled policy synchronization within the federated learning framework while executing local access control actions. Using iperf and tc commands, key performance metrics—including transmission latency, jitter, and FPR—were measured under various attack and traffic load conditions. These were further analyzed in conjunction with wavelength allocation results. The outcomes are summarized in Table 8:

Table 8.

Validation results of GESAC on the mininet optical dataset.

Test scenario	Avg. end-to-end delay (ms)	Max delay (ms)	Min delay (ms)	Std. Dev. of delay (ms)	Wavelength utilization (%)	GESAC_FPR (%)
Small topology (50 nodes)	1.823	2.431	1.213	0.412	78.345	1.1
Medium topology (200 nodes)	3.417	5.823	2.014	0.879	84.212	1.3
Large topology (500 nodes)	8.121	12.301	5.612	2.101	86.013	1.5
Medium topology + light load	2.154	3.276	1.325	0.536	82.136	1.0
Medium topology + heavy load	5.843	9.104	3.804	1.762	71.884	1.8
Medium topology + DDoS injection	6.276	10.443	4.173	1.944	68.905	2.1
Large topology + topology shift	9.472	14.682	6.438	2.739	81.324	1.7


As shown in Table 8, GESAC demonstrates strong performance even under realistic network transmission conditions. First, in terms of average end-to-end delay, GESAC maintains low latency across small to medium topologies, ranging from 1.8 ms (50 nodes) to 3.4 ms (200 nodes). At the larger scale of 500 nodes, the average delay remains well controlled at 8.1 ms, with a maximum of 12.3 ms, highlighting its scalability and effective delay management. Second, across all tested scenarios, including heavy traffic, DDoS injection, and topological shifts, wavelength utilization stays between 68.9% and 86.0% and the FPR never exceeds 2.1%, even as delay variability increases. Notably, in the “Medium topology + light load” scenario, GESAC achieves the lowest FPR in the table (1.0%) and the lowest delay standard deviation among the medium-topology scenarios (0.536 ms; only the 50-node topology is lower, at 0.412 ms), demonstrating high stability and precision in resource-abundant environments. Furthermore, under the “Large topology + topology shift” condition, although the maximum delay rises to 14.68 ms, the FPR is only 1.7%, reflecting robustness and adaptability to structural changes. Overall, these results support GESAC’s practical deployability in real-world, multi-domain decomposed optical networks and its capability to handle complex, multi-objective secure access control requirements under dynamic latency and disruption constraints.
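The row-level statements about Table 8 can be cross-checked mechanically. The sketch below loads the table values as they appear above and looks up the rows with the lowest FPR and lowest delay standard deviation; the `Row` type and its field names are illustrative and not part of the paper.

```python
from typing import NamedTuple


class Row(NamedTuple):
    scenario: str
    avg_ms: float
    max_ms: float
    min_ms: float
    std_ms: float
    utilization_pct: float
    fpr_pct: float


# Values copied from Table 8.
TABLE_8 = [
    Row("Small topology (50 nodes)", 1.823, 2.431, 1.213, 0.412, 78.345, 1.1),
    Row("Medium topology (200 nodes)", 3.417, 5.823, 2.014, 0.879, 84.212, 1.3),
    Row("Large topology (500 nodes)", 8.121, 12.301, 5.612, 2.101, 86.013, 1.5),
    Row("Medium topology + light load", 2.154, 3.276, 1.325, 0.536, 82.136, 1.0),
    Row("Medium topology + heavy load", 5.843, 9.104, 3.804, 1.762, 71.884, 1.8),
    Row("Medium topology + DDoS injection", 6.276, 10.443, 4.173, 1.944, 68.905, 2.1),
    Row("Large topology + topology shift", 9.472, 14.682, 6.438, 2.739, 81.324, 1.7),
]

lowest_fpr = min(TABLE_8, key=lambda r: r.fpr_pct)
lowest_std = min(TABLE_8, key=lambda r: r.std_ms)
medium_rows = [r for r in TABLE_8 if r.scenario.startswith("Medium")]
lowest_std_medium = min(medium_rows, key=lambda r: r.std_ms)
utilization_range = (min(r.utilization_pct for r in TABLE_8),
                     max(r.utilization_pct for r in TABLE_8))

print("lowest FPR:", lowest_fpr.scenario)
print("lowest delay std (all rows):", lowest_std.scenario)
print("lowest delay std (medium topology):", lowest_std_medium.scenario)
print("utilization range (%):", utilization_range)
```

Running it shows that the light-load scenario has the lowest FPR overall, while the smallest delay standard deviation in the whole table belongs to the 50-node topology.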

Discussion

Based on its analysis of GESAC for secure access control in decomposed optical networks, the study proposes the following three optimization directions:

(1) Enhance attack detection capabilities to build a comprehensive defense system: The attack detection model should be further improved by incorporating feature learning from more complex attack scenarios. This would enhance sensitivity to new attacks through cross-domain knowledge fusion. Additionally, the detection algorithm should be optimized to reduce FPR. Strengthening attack chain tracing and blocking mechanisms will improve the overall resilience of the optical network against attacks.
(2) Optimize dynamic resource allocation strategies to improve efficiency and stability: A real-time dynamic resource allocation system should be built based on traffic and security situational awareness. In various load and attack scenarios, intelligent algorithms should be used to allocate resources precisely. This will improve wavelength utilization, reduce end-to-end delay fluctuations, balance resource competition and security investment, and ensure service quality.
(3) Enhance flexibility and robustness in responding to topological mutations to ensure continuous and stable network operation: Cross-layer federated perception and spatiotemporal causal reasoning mechanisms should be improved. When the topology mutates, the system should quickly translate these changes into optimization criteria for strategies, maintaining a balance between security and service quality. Strengthening strategy entropy regularization will preserve strategy diversity, prevent stagnation caused by topology changes, and keep the optical network operating stably under complex conditions.

Given that GESAC integrates both GNN and reinforcement learning strategies, its operation naturally involves a certain level of computational overhead. To address this, the implementation adopts a modular distributed architecture, where each subdomain model independently performs topology graph encoding and policy updates on local processors. Experimental results confirm that even at a scale of 100,000 nodes, single-step decision latency remains as low as 25.6 µs, with communication overhead growth stabilized at under 2 MB per subdomain—indicating the feasibility of parallel deployment on devices with moderate edge computing capabilities. Leveraging the principles of federated architecture, GESAC eliminates the need for centralized training. Instead, policy coordination is achieved through localized subdomain updates and periodic parameter aggregation. This allows deployment on network devices that support control functions, such as SDN controllers, edge servers, or virtualized router instances. A central coordination node handles policy synchronization and global aggregation, aligning well with current domain-based management models and mainstream multi-controller SDN frameworks. Overall, this deployment strategy—supported by both architectural design and empirical performance validation—demonstrates strong engineering feasibility and adaptability for real-world optical network environments.
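The periodic parameter aggregation mentioned above can be pictured with a weighted-average rule in the style of federated averaging. This is a minimal sketch under that assumption: the text does not specify GESAC's aggregation rule, and the function name, weights, and parameter vectors are hypothetical.

```python
def aggregate(params_by_subdomain: list[list[float]],
              weights: list[float]) -> list[float]:
    """Weighted average of per-subdomain parameter vectors (FedAvg-style, assumed)."""
    if len(params_by_subdomain) != len(weights) or not weights:
        raise ValueError("need one positive-count weight per subdomain")
    total = sum(weights)
    dim = len(params_by_subdomain[0])
    return [
        sum(w * p[i] for p, w in zip(params_by_subdomain, weights)) / total
        for i in range(dim)
    ]


# Two hypothetical subdomains; the weights could be local sample counts.
local_params = [[1.0, 2.0], [3.0, 4.0]]
sample_counts = [1, 3]
global_params = aggregate(local_params, sample_counts)
print(global_params)
```

Only the parameter vectors cross subdomain boundaries in this scheme, which is one way to keep inter-domain traffic proportional to model size rather than to the volume of local topology and access data.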

Conclusion

The key innovation of this study lies in transforming the problem of resource fragmentation during optical network decomposition into a dynamic graph partitioning issue. The improved GNN community discovery algorithm is employed to automatically identify security boundaries, while the entropy maximization exploration mechanism of SAC balances network security and service continuity in access policy optimization. Ultimately, end-to-end training is achieved through joint gradient updates. In this process, the gradient of the GNN parameters is backpropagated through the policy value network. SAC’s policy update then relies on topology-sensitive state features extracted by the GNN, creating a closed-loop optimization system.
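For orientation, the entropy maximization mechanism referred to here corresponds to the standard SAC maximum-entropy objective, written below in generic notation. The symbols are assumptions and may differ from the paper's own formulas: $G_t$ denotes the topology graph at step $t$, $f_\theta$ the GNN encoder, $\pi_\phi$ the policy, and $\alpha$ the entropy temperature.

```latex
% Standard SAC objective, with the state taken to be the GNN encoding (assumed notation).
s_t = f_\theta(G_t), \qquad
J(\phi, \theta) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi_\phi}}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\!\left( \pi_\phi(\cdot \mid s_t) \right) \right]
```

Because $s_t$ depends on $\theta$, gradients of $J$ flow through the policy and value networks into the GNN encoder, which is the closed loop the paragraph describes.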

However, this study has certain limitations. GESAC’s detection and decision-making efficiency may be affected when handling ultra-large-scale complex networks or novel, unknown attack combinations. The current GNN architecture requires further optimization to reduce computational resource consumption, particularly when dealing with very large numbers of nodes and complex connections. Future work will integrate neural signal processing technology with the existing model. Neural signal processing can quickly perceive and process information by simulating the signal transmission mechanism between neurons. This approach would enable the model to capture subtle changes in network status in real time, allowing for a rapid response to network attacks and fluctuations in resource demand. For example, the high-speed parallel processing capabilities of neuromorphic chips could accelerate the model’s analysis of network data. Additionally, combining the model with online learning algorithms would allow it to be optimized continuously during operation, enhancing its adaptability to dynamic environments.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (1.4 MB, zip)

Author contributions

Zhenqian Zhao contributed to conception and design of the study. Yuhe Wang organized the database. Zhenqian Zhao performed the statistical analysis. Yuhe Wang wrote the first draft of the manuscript. Zhenqian Zhao wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

Data availability

Data is provided within the manuscript or supplementary information files.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Hu, L. et al. Convex Optimization-Based High-Speed and Security Joint Optimization Scheme in Optical Access Networks, OPT EXPRESS, vol. 32, no. 4, pp. 6748–6764, Apr. (2024).
2. Bin, S. & Sun, G. Research on the Influence Maximization Problem in Social Networks Based on the Multi-Functional Complex Networks Model, JOEUC, vol. 34, no. 3, pp. 1–17, May (2022).
3. Opanin Gyamfi, E. et al. Hierarchical graph neural network: A lightweight image matching model with enhanced message passing of local and global information in hierarchical graph neural networks, Information, vol. 15, no. 10, p. 602, Oct. (2024).
4. Zhang, Y. et al. Access Point Selection Based on Beacon Signals in Optical Satellite Networks, J OPT COMMUN NETW, vol. 17, no. 3, pp. 163–177, Mar. (2025).
5. Zhang, M. et al. Quantum noise stream cipher scheme based on 3-Channel-QAM with 3D mapping and DFTs-OFDM, IEEE PHOTONIC TECH. L., vol. 25, no. 5, p. 677, May (2024).
6. Chen, X. et al. Meta-Learning-Aided QoT estimator provisioning for a dynamic VNT configuration in optical networks, J OPT COMMUN NETW, vol. 17, no. 1, pp. A103–A, Jan. (2025).
7. Boeding, M., Hempel, M. & Sharif, H. Novel approach towards a fully deep Learning-Based IoT receiver architecture: from Estimation to decoding, Future Internet, vol. 16, no. 5, p. 155, May (2024).
8. Sehgal, S. K. & Gupta, R. Dual Secret Key Virtualization Through TCC Based Wavelength Allocation and BB84 Protocol in QKD over Optical Networks, WIRELESS PERS COMMUN, vol. 134, no. 3, pp. 1443–1468, Mar. (2024).
9. Bekri, M., Reyes, R. R. & Bauschert, T. Multilayer Restoration in IP-Optical Networks by Adjustable Robust Optimization and Deep Reinforcement Learning, J OPT COMMUN NETW, vol. 16, no. 7, pp. 721–734, Jul. (2024).
10. Krishna-Sowjanya, K. & Mouleeswaran, S. K. Enhancing cloud security with intelligent load balancing and malicious request classification, Cluster Computing, vol. 28, no. 1, p. 24, May (2025).
11. Lin, L. et al. E-GRACL: an IoT intrusion detection system based on graph neural networks, J SUPERCOMPUT, vol. 81, no. 1, p. 24, Mar. (2025).
12. Guo, P. et al. Adaptive Access Strategy for Satellite-Terrestrial Optical Networks Based on Particle Swarm Optimization Algorithm, PHYS COMMUN-AMST, vol. 69, no. 6, p. 102600, Jun. (2025).
13. Yin, X., Wu, X. & Zhang, X. A Trusted Federated Learning Method Based on Consortium Blockchain, Information, vol. 16, no. 1, p. 14, Jun. (2024).
14. Alalyan, F. et al. Secure distributed federated learning for cyberattacks detection in B5G open radio access networks, IEEE Open Journal of the Communications Society, vol. 6, no. 7, pp. 3067–3081, Mar. (2025).
15. Song, H. et al. Cluster-Based Unsupervised Method for Eavesdropping Detection and Localization in WDM Systems, J OPT COMMUN NETW, vol. 16, no. 10, pp. F52–F61, Oct. (2024).
16. Kish, S. P. et al. Mitigation of channel tampering attacks in Continuous-Variable quantum key distribution, Phys. Rev. Res., vol. 6, no. 2, p. 023301, Feb. (2024).
17. Gong, X. et al. Machine-Learning-Based Optical Spectrum Feature Analysis for DoS Attack Detection in IP over Optical Networks, OPT EXPRESS, vol. 32, no. 3, pp. 3793–3803, Mar. (2024).
18. Shirichian, M. et al. A QTCP/IP Reference Model for Partially Trusted-Node-Based Quantum-Key-Distribution-Secured Optical Networks, QUANTUM INF PROCESS, vol. 23, no. 3, p. 87, Mar. (2024).
19. Alshammari, S. T. et al. Building a Comprehensive Trust Evaluation Model to Secure Cloud Services From Reputation Attacks, IEEE ACCESS, vol. 12, no. 10, pp. 150754–150775, Oct. (2024).
20. Said, D. et al. Quantum entropy and reinforcement learning for distributed denial of service attack detection in smart grid, IEEE ACCESS, vol. 7, no. 2, p. 102452, Feb. (2024).
21. Nourmohammadi, F. et al. Using Convolutional Neural Networks for Blocking Prediction in Elastic Optical Networks, Applied Sciences, vol. 14, no. 5, p. May. 2024. (2003).
22. Ning, Y. et al. Biphase Routing Scheme for Optimal Throughput in Large-Scale Optical Satellite Networks, J OPT COMMUN NETW, vol. 16, no. 5, pp. 553–564, May (2024).
23. Matsuura, H. et al. Extension of the k-SPF algorithm for finding SRLG-Disjoint primary and backup route pairs in optical networks, J OPT COMMUN NETW, vol. 16, no. 9, pp. E23–E35, Sep. (2024).
24. Zou, Y. E. et al. Aging-Aware Co-Optimization of Topology, Parameter and Control for Multi-Mode Input- and Output-Split Hybrid Electric Powertrains, J POWER SOURCES, vol. 10, no. 6, p. 24, Jun. (2024).
25. Peng, H. et al. Evading control flow graph based GNN malware detectors via active opcode insertion method with maliciousness preserving, SCI REP-UK, vol. 15, no. 1, p. 9174, Jun. (2024).
26. Mughal, F. R. et al. A Meta-Reinforcement Learning Framework Using Deep Q-Networks and GCNs for Graph Cluster Representation, SOFTWARE PRACT EXPER, vol. 3, no. 1, p. 17, Mar. (2025).
