Sensors. 2022 Jul 19;22(14):5389. doi: 10.3390/s22145389
Algorithm 1 The process noise adaptation based on DDPG
1 Initialize the parameters of the actor network and critic network
2 Initialize the target networks as copies of the actor and critic networks
3 Initialize the replay memory buffer
For each episode, perform the following steps
4 Initialize the estimation state and its covariance matrix
  For each timestep, perform the following steps
5   Generate an action from the actor network and the current state, $a_k = A_\mu(s_k) + \mathcal{N}_k$, where the exploration noise $\mathcal{N}_k$ is generated by an Ornstein-Uhlenbeck process
6   Execute the action, i.e., apply the compensation factor in the filter, to obtain the next state $s_{k+1}$ and the reward $r_k$
7   Store the sample $(s_k, a_k, s_{k+1}, r_k)$ in the buffer
8   Randomly select $N_{Sample}$ samples from the buffer
9   Calculate the temporal-difference error $\delta_p$ of each sample
   $\delta_p = r_p + \xi Q_{\omega'}(s_{p+1}, A_{\mu'}(s_{p+1})) - Q_\omega(s_p, A_\mu(s_p))$
10   Calculate the policy gradient
   $\nabla_\mu J(A_\mu) = \frac{1}{N_{Sample}} \sum_{p=1}^{N_{Sample}} \nabla_\mu A_\mu(s_p)\, \nabla_a Q_\omega(s_p, a)\big|_{a = A_\mu(s_p)}$
11   Update the actor network by the Adam optimizer:
   $\mu_{k+1} = f_{Adam}(\mu_k, \nabla_\mu J(A_\mu))$
12   Update the critic network by the Adam optimizer:
   $\nabla_\omega \mathcal{L}(\omega) = -\frac{1}{N_{Sample}} \sum_{p=1}^{N_{Sample}} 2\delta_p \nabla_\omega Q_\omega(s_p, a_p)$
   $\omega_{k+1} = f_{Adam}(\omega_k, \nabla_\omega \mathcal{L}(\omega))$
13   Update the two target networks by soft update
   $\mu'_{k+1} = \tau \mu_{k+1} + (1-\tau)\mu'_k$
   $\omega'_{k+1} = \tau \omega_{k+1} + (1-\tau)\omega'_k$
End timestep
End episode
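The per-timestep updates of Algorithm 1 can be sketched as follows. This is a minimal illustration, assuming tiny linear actor and critic networks so the gradients are explicit; the dimensions, hyperparameters, and random batch are placeholders (not the paper's filter setup), and plain gradient descent/ascent stands in for the Adam optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, TAU, XI, LR = 3, 1, 0.005, 0.99, 1e-3  # illustrative values

def actor(mu, s):
    # A_mu(s): linear policy, shape (ACTION_DIM, STATE_DIM) @ (STATE_DIM,)
    return mu @ s

def critic(omega, s, a):
    # Q_omega(s, a): linear value function over the concatenated [s; a]
    return omega @ np.concatenate([s, a])

def ou_step(noise, theta=0.15, sigma=0.2, dt=1e-2):
    # Step 5: Ornstein-Uhlenbeck exploration noise, mean-reverting toward 0
    return noise + theta * (-noise) * dt + sigma * np.sqrt(dt) * rng.standard_normal(noise.shape)

def td_error(r, s_next, s, a, mu_t, omega_t, omega):
    # Step 9: delta = r + xi * Q'(s', A'(s')) - Q(s, a), primes = target networks
    a_next = actor(mu_t, s_next)
    return r + XI * critic(omega_t, s_next, a_next) - critic(omega, s, a)

def critic_grad(delta, s, a):
    # Step 12: gradient of the squared TD error w.r.t. omega; for a linear
    # critic, grad_omega Q = [s; a], so each sample contributes -2*delta*[s; a]
    return -2.0 * delta * np.concatenate([s, a])

def actor_grad(s, omega):
    # Step 10: grad_mu A(s) * grad_a Q; for linear networks grad_a Q is the
    # action block of omega, giving the outer product with the state
    return np.outer(omega[STATE_DIM:], s)

def soft_update(target, online, tau=TAU):
    # Step 13: theta' <- tau * theta + (1 - tau) * theta'
    return tau * online + (1.0 - tau) * target

# One illustrative update over a random mini-batch of stored samples
mu = rng.standard_normal((ACTION_DIM, STATE_DIM))
omega = rng.standard_normal(STATE_DIM + ACTION_DIM)
mu_t, omega_t = mu.copy(), omega.copy()                      # target networks (step 2)
batch = [(rng.standard_normal(STATE_DIM), rng.standard_normal(ACTION_DIM),
          rng.standard_normal(STATE_DIM), rng.standard_normal()) for _ in range(8)]

g_mu, g_omega = np.zeros_like(mu), np.zeros_like(omega)
for s, a, s_next, r in batch:
    delta = td_error(r, s_next, s, a, mu_t, omega_t, omega)  # step 9
    g_omega += critic_grad(delta, s, a)                      # step 12
    g_mu += actor_grad(s, omega)                             # step 10
omega = omega - LR * g_omega / len(batch)                    # descent on the critic loss
mu = mu + LR * g_mu / len(batch)                             # ascent on J (step 11)
omega_t = soft_update(omega_t, omega)                        # step 13
mu_t = soft_update(mu_t, mu)
```

In the paper's setting the action produced by the actor is the compensation factor applied to the filter's process noise covariance; here the environment interaction is replaced by random placeholder samples so only the network updates are shown.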