Algorithm 2 Adversarial attack trick and adversarial learning MADDPG (A2-MADDPG) algorithm.
1: for N agents, randomly initialize their critic network $Q_i(s_i, a_i; \theta_i^Q)$ and actor network $\mu_i(s_i|\theta_i^\mu)$
2: synchronize the target networks $Q_i'(s_i, a_i; \theta_i^{Q'})$ and $\mu_i'(s_i|\theta_i^{\mu'})$ with $\theta_i^{Q'} \leftarrow \theta_i^Q$, $\theta_i^{\mu'} \leftarrow \theta_i^\mu$
3: initialize hyperparameters: experience buffer $D$, mini-batch size $m$, maximum episode $M$, maximum step $T$, actor learning rate $l_a$, critic learning rate $l_c$, discount factor $\gamma$, soft update rate $\tau$
4: for episode = 1, M do
5:  reset the environment and receive the initial state $x$
6:  initialize the action exploration noise $N_{action}$
7:  for t = 1, T do
8:   $s_i^{(noise)} \leftarrow \mathrm{StateAttack}\big(Q_i^\mu(x, a_1, a_2, \ldots, a_n), \mu_i(s_i|\theta_i^\mu), s_i\big)$
9:   for each agent $i$, select action $a_i = \mu_i\big(s_i^{(noise)}; \theta_i^\mu\big) + N_{action}$
10:   execute actions $(a_1, a_2, \ldots, a_n)$, then receive rewards $(r_1, \ldots, r_n)$ and the next state $x'$
11:   store the sample $(x, a_1, a_2, \ldots, a_n, r_1, \ldots, r_n, x')$ in $D$
12:   for agent $i$ = 1, n do
13:    randomly extract $m$ samples $(x^k, a^k, r^k, x'^k)$ from $D$
14:    update critic network by Equation (27)
15:    update actor network by:
$\nabla_{\theta_i^\mu} J = \frac{1}{m}\sum_j \nabla_{\theta_i^\mu}\mu_i(a_i|s_i)\, \nabla_{a_i} Q_i^\mu\big(x, a_1, \ldots, a_i, \ldots, a_N; \theta_i^Q\big)\Big|_{a_i=\mu_i(s_i),\ a_{j\neq i}^{*}=a_j^k+\tilde{\eta}_j}$
16:   end for
17:   update the target networks by $\theta_i^{Q'} \leftarrow \tau\theta_i^Q + (1-\tau)\theta_i^{Q'}$, $\theta_i^{\mu'} \leftarrow \tau\theta_i^\mu + (1-\tau)\theta_i^{\mu'}$
18:  end for
19: end for
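To make the data flow of Algorithm 2 concrete, the Python/PyTorch sketch below illustrates its three A2-MADDPG-specific steps for a single agent: the observation attack of step 8, the actor update of step 15 with the other agents' sampled actions perturbed ($a_{j\neq i}^{*}=a_j^k+\tilde{\eta}_j$), and the soft target update of step 17. It is a minimal sketch under stated assumptions: the class and function names (Actor, Critic, put_action, state_attack, actor_update, soft_update), the network sizes, and the FGSM-style / random forms of the perturbations are illustrative choices, not the authors' implementation; the environment loop and the critic update of Equation (27) are omitted.

import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, N_AGENTS = 8, 2, 3
TAU, EPS_STATE, EPS_ACT = 0.01, 0.05, 0.05   # soft-update rate and assumed perturbation budgets

class Actor(nn.Module):
    """Per-agent policy mu_i(s_i | theta_i^mu)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Centralized critic Q_i(x, a_1..a_N) over all observations and actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_AGENTS * (OBS_DIM + ACT_DIM), 64),
                                 nn.ReLU(), nn.Linear(64, 1))
    def forward(self, x, acts):                       # x: (B, N*OBS), acts: (B, N*ACT)
        return self.net(torch.cat([x, acts], dim=-1))

def put_action(acts, i, a_i):
    """Return a copy of the joint action with agent i's slot replaced by a_i."""
    out = acts.clone()
    out[:, i * ACT_DIM:(i + 1) * ACT_DIM] = a_i
    return out

def state_attack(actor, critic, x, acts, s_i, i, eps=EPS_STATE):
    """Step 8 (assumed FGSM-style form): perturb agent i's observation in the
    direction that most decreases the critic's value of the actor's action."""
    s = s_i.clone().detach().requires_grad_(True)
    q = critic(x, put_action(acts, i, actor(s)))
    grad = torch.autograd.grad(q.sum(), s)[0]
    return (s - eps * grad.sign()).detach()

def actor_update(actor, critic, opt, x, acts, s_i, i, eps=EPS_ACT):
    """Step 15: ascend Q_i w.r.t. agent i's policy while the other agents'
    sampled actions are perturbed (a_j* = a_j^k + eta_j); the random form of
    eta_j used here is an assumption."""
    eta = eps * torch.randn_like(acts)
    eta[:, i * ACT_DIM:(i + 1) * ACT_DIM] = 0.0      # only the other agents are perturbed
    q = critic(x, put_action(acts + eta, i, actor(s_i)))
    loss = -q.mean()                                  # gradient ascent on Q_i
    opt.zero_grad(); loss.backward(); opt.step()      # a full version would also clear critic grads

def soft_update(target, source, tau=TAU):
    """Step 17: theta' <- tau * theta + (1 - tau) * theta'."""
    for tp, p in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1 - tau).add_(tau * p.data)

if __name__ == "__main__":
    B, i = 32, 0
    actor, critic = Actor(), Critic()
    target_actor = Actor(); target_actor.load_state_dict(actor.state_dict())
    opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
    x = torch.randn(B, N_AGENTS * OBS_DIM)           # global state = concatenated observations (assumed)
    acts = torch.randn(B, N_AGENTS * ACT_DIM)        # sampled joint actions a^k
    s_i = x[:, i * OBS_DIM:(i + 1) * OBS_DIM]
    s_noise = state_attack(actor, critic, x, acts, s_i, i)       # step 8
    a_i = actor(s_noise) + 0.1 * torch.randn(B, ACT_DIM)         # step 9: action + exploration noise
    actor_update(actor, critic, opt, x, acts, s_i, i)            # step 15
    soft_update(target_actor, actor)                             # step 17
    print(a_i.shape)

In the full algorithm, state_attack is called during experience collection (steps 8 and 9), while actor_update and soft_update run inside the per-agent training loop of steps 12 to 17.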