
Algorithm 1. Soft adversarial asynchronous actor-critic intrusion detection (SA3C-ID).

Input: Train Dataset DTrain, Test Dataset DTest, Number of workers Nworkers
Output: Detection results of policy πθ on test dataset DTest
01: Preprocess train and test datasets
02: Initialize datasets with the SABPIO algorithm: DTrain ← SABPIO(DTrain), DTest ← SABPIO(DTest)
03: Initialize global experience replay pools: Bglobala ← ∅, Bglobald ← ∅
04: Initialize πθ (defender policy) and πφ (attacker policy) randomly
05: Initialize shared parameters θ and φ for πθ and πφ respectively
06: Initialize worker threads based on Nworkers
07: foreach worker in worker threads do
08:            for epoch = 1 to epochs do
09:                Initialize the state s0 by randomly sampling from DTrain
10:                Choose a set of actions Aa,t based on πφ(s0)
11:                Initialize the local mini-batch buffer M by randomly sampling from DTrain samples whose labels belong to Aa,t
12:                              while not done do
13:                                     Choose action ad,ti based on πθ(sti) for each state sti in M
14:                                     Obtain rewards for the attacker and the defender: ra,t+1, rd,t+1
15:                                     Choose a new set of actions Aa,t+1 based on πφ(st)
16:                                     Update the local mini-batch buffer M by randomly sampling from DTrain samples whose labels belong to Aa,t+1
17:                                     Calculate the priority weight wt of each attacker experience (st, aa,t, ra,t+1, sa,t+1) and each defender experience (st, ad,t, rd,t+1, sd,t+1)
18:                                     Store attacker experience (st, aa,t, ra,t+1, sa,t+1, dt, wt) in Bglobala
19:                                     Store defender experience (st, ad,t, rd,t+1, sd,t+1, dt, wt) in Bglobald
20:                                     Update the worker attacker agent by sampling M experiences from Bglobala according to their priority weights wt
21:                                     Update the worker defender agent by sampling M experiences from Bglobald according to their priority weights wt
22:                             end while
23:                             Update the global attacker agent from the worker attacker agent
24:                             Update the global defender agent from the worker defender agent
25:            end for
26: end foreach
27: Test policy πθ on test dataset DTest
28: Return detection results
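
The listing above can be read as a prioritized, adversarial actor-critic loop. The sketch below is a minimal single-process Python/PyTorch illustration of that loop, not the authors' implementation: it assumes small MLP actor-critic networks, synthetic data in place of the SABPIO-preprocessed DTrain/DTest, a toy ±1 reward, and the absolute advantage as a stand-in for the priority weight wt; the asynchronous workers, the global/worker parameter synchronization (lines 23-24), and the attacker's choice of attack-class sets Aa,t are simplified to a single worker whose attacker picks one action per state and receives the opposite reward.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

N_FEATURES, N_CLASSES, BATCH = 20, 5, 32   # toy dimensions (assumptions)

class ActorCritic(nn.Module):
    """Shared body with a policy head (pi) and a value head (v)."""
    def __init__(self, n_in, n_actions):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_in, 64), nn.ReLU())
        self.pi = nn.Linear(64, n_actions)
        self.v = nn.Linear(64, 1)
    def forward(self, x):
        h = self.body(x)
        return F.softmax(self.pi(h), dim=-1), self.v(h).squeeze(-1)

class PrioritizedPool:
    """Global replay pool; experiences are drawn in proportion to wt (lines 17-21)."""
    def __init__(self):
        self.data, self.w = [], []
    def store(self, exp, wt):
        self.data.append(exp); self.w.append(wt)
    def sample(self, k):
        p = np.asarray(self.w) / np.sum(self.w)
        idx = np.random.choice(len(self.data), size=min(k, len(self.data)), p=p)
        return [self.data[i] for i in idx]

# Synthetic stand-in for the SABPIO-preprocessed DTrain (line 02).
X = torch.randn(1000, N_FEATURES)
y = torch.randint(0, N_CLASSES, (1000,))

defender = ActorCritic(N_FEATURES, N_CLASSES)   # pi_theta
attacker = ActorCritic(N_FEATURES, N_CLASSES)   # pi_phi
opt_d = torch.optim.Adam(defender.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(attacker.parameters(), lr=1e-3)
pool_d, pool_a = PrioritizedPool(), PrioritizedPool()   # Bglobald, Bglobala

for epoch in range(3):                  # one worker's outer loop (lines 08-25)
    for step in range(50):              # inner while-loop (lines 12-22)
        idx = torch.randint(0, len(X), (BATCH,))   # local mini-batch M
        s, labels = X[idx], y[idx]
        with torch.no_grad():
            probs_d, v_d = defender(s)             # defender scores each state
            probs_a, v_a = attacker(s)             # attacker scores each state
        a_d = torch.multinomial(probs_d, 1).squeeze(-1)   # predicted class per state
        a_a = torch.multinomial(probs_a, 1).squeeze(-1)
        r_d = (a_d == labels).float() * 2 - 1      # +1 correct, -1 wrong (toy reward)
        r_a = -r_d                                 # zero-sum attacker reward
        # Priority weight wt: |advantage| as a stand-in (line 17).
        w_d, w_a = (r_d - v_d).abs() + 1e-3, (r_a - v_a).abs() + 1e-3
        for i in range(BATCH):                     # store experiences (lines 18-19)
            pool_d.store((s[i], a_d[i], r_d[i]), float(w_d[i]))
            pool_a.store((s[i], a_a[i], r_a[i]), float(w_a[i]))
        # Prioritized actor-critic update for each agent (lines 20-21).
        for agent, opt, pool in ((defender, opt_d, pool_d), (attacker, opt_a, pool_a)):
            batch = pool.sample(BATCH)
            bs = torch.stack([e[0] for e in batch])
            ba = torch.stack([e[1] for e in batch])
            br = torch.stack([e[2] for e in batch])
            probs, v = agent(bs)
            adv = br - v.detach()
            logp = torch.log(probs.gather(1, ba[:, None]).squeeze(-1) + 1e-8)
            loss = (-logp * adv).mean() + F.mse_loss(v, br)
            opt.zero_grad(); loss.backward(); opt.step()

# Evaluate the learned defender policy pi_theta on a stand-in DTest (lines 27-28).
with torch.no_grad():
    probs, _ = defender(torch.randn(200, N_FEATURES))
    print("predicted classes:", probs.argmax(dim=-1)[:10].tolist())

In the full SA3C-ID setting, this loop would run in Nworkers threads (lines 06-07), each worker periodically pushing its attacker and defender updates to the shared global agents (lines 23-24); the sketch keeps a single worker only to make the data flow of the listing concrete.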