Algorithm 1 Process noise adaptation based on DDPG

1 Initialize the parameters θ^μ of the actor network μ(s | θ^μ) and θ^Q of the critic network Q(s, a | θ^Q)
2 Initialize the target networks by copying the actor and critic networks: θ^{μ′} ← θ^μ, θ^{Q′} ← θ^Q
3 Initialize the replay memory buffer R

For each episode, perform the following steps:
4 Initialize the estimated state x̂_0 and its covariance matrix P_0

For each timestep t, perform the following steps:
5 Generate an action a_t = μ(s_t | θ^μ) + N_t from the actor network and the current state s_t, where the exploration noise N_t is generated by an Ornstein–Uhlenbeck process
6 Execute the action a_t, i.e., apply the compensation factor to the process noise in the filter, to obtain a new state s_{t+1} and a new reward r_t
7 Store the sample (s_t, a_t, r_t, s_{t+1}) in the buffer R
8 Randomly select a minibatch of N samples (s_i, a_i, r_i, s_{i+1}) from the buffer
9 Calculate the temporal-difference target and error of each sample:
   y_i = r_i + γ Q′(s_{i+1}, μ′(s_{i+1} | θ^{μ′}) | θ^{Q′}),   δ_i = y_i − Q(s_i, a_i | θ^Q)
10 Calculate the policy gradient:
   ∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a | θ^Q) |_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s | θ^μ) |_{s=s_i}
11 Update the actor network by the Adam optimizer along ∇_{θ^μ} J
12 Update the critic network by the Adam optimizer, minimizing the loss
   L = (1/N) Σ_i (y_i − Q(s_i, a_i | θ^Q))^2
13 Update the two target networks by soft update:
   θ^{Q′} ← τ θ^Q + (1 − τ) θ^{Q′},   θ^{μ′} ← τ θ^μ + (1 − τ) θ^{μ′}

End timestep

End episode
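Three of the per-timestep mechanics above (the Ornstein–Uhlenbeck exploration noise of step 5, the temporal-difference target of step 9, and the soft target update of step 13) can be sketched in Python as follows. This is a minimal illustrative sketch, not the authors' implementation: the class and function names and the hyperparameter values (theta=0.15, sigma=0.2, gamma=0.99, tau=0.001) are assumptions chosen for the example.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise (step 5): temporally
    correlated noise added to the actor's deterministic action."""
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=0):
        # theta pulls the process back toward mu; sigma scales the diffusion.
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.x = np.full(dim, mu, dtype=float)

    def reset(self):
        # Re-center the process at the start of each episode.
        self.x[:] = self.mu

    def sample(self):
        # dx = theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt)
              * self.rng.standard_normal(self.x.shape))
        self.x += dx
        return self.x.copy()

def td_targets(rewards, next_q, gamma=0.99, dones=None):
    """TD targets (step 9): y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})),
    with the bootstrap term masked out at terminal transitions."""
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    dones = np.zeros_like(rewards) if dones is None else np.asarray(dones, dtype=float)
    return rewards + gamma * (1.0 - dones) * next_q

def soft_update(target, source, tau=0.001):
    """Polyak averaging (step 13): theta' <- tau*theta + (1 - tau)*theta',
    applied parameter-by-parameter to a dict of arrays."""
    for name in target:
        target[name] = tau * source[name] + (1.0 - tau) * target[name]
    return target
```

With a small tau, the target networks track the learned networks slowly, which is what stabilizes the bootstrapped TD target in step 9.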