Algorithm 1 Process noise adaptation based on DDPG

1 Initialize the parameters θ^μ of the actor network μ(s | θ^μ) and θ^Q of the critic network Q(s, a | θ^Q)
2 Initialize the target networks by copying the actor and critic networks: θ^{μ′} ← θ^μ, θ^{Q′} ← θ^Q
3 Initialize the replay memory buffer R

For each episode, perform the following steps:
4 Initialize the estimated state x̂_0 and its covariance matrix P_0

For each timestep t, perform the following steps:
5 Generate an action a_t = μ(s_t | θ^μ) + N_t from the actor network and the current state s_t, where the exploration noise N_t is generated by an Ornstein–Uhlenbeck process
6 Execute the action a_t, i.e., apply the compensation factor to the process noise in the filter, to obtain a new state s_{t+1} and a new reward r_t
7 Store the sample (s_t, a_t, r_t, s_{t+1}) in the buffer R
8 Randomly select a minibatch of N samples (s_i, a_i, r_i, s_{i+1}) from the buffer
9 Calculate the temporal-difference target and error of each sample:
   y_i = r_i + γ Q′(s_{i+1}, μ′(s_{i+1} | θ^{μ′}) | θ^{Q′}),   δ_i = y_i − Q(s_i, a_i | θ^Q)
10 Calculate the policy gradient:
   ∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a | θ^Q) |_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s | θ^μ) |_{s=s_i}
11 Update the actor network by the Adam optimizer along ∇_{θ^μ} J
12 Update the critic network by the Adam optimizer, minimizing the loss
   L = (1/N) Σ_i (y_i − Q(s_i, a_i | θ^Q))^2
13 Update the two target networks by soft update:
   θ^{Q′} ← τ θ^Q + (1 − τ) θ^{Q′},   θ^{μ′} ← τ θ^μ + (1 − τ) θ^{μ′}

End timestep

End episode
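Three of the per-timestep mechanics above (the Ornstein–Uhlenbeck exploration noise of step 5, the temporal-difference target of step 9, and the soft target update of step 13) can be sketched in Python as follows. This is a minimal illustrative sketch, not the authors' implementation: the class and function names and the hyperparameter values (theta=0.15, sigma=0.2, gamma=0.99, tau=0.001) are assumptions chosen for the example.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise (step 5): temporally
    correlated noise added to the actor's deterministic action."""
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=0):
        # theta pulls the process back toward mu; sigma scales the diffusion.
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.x = np.full(dim, mu, dtype=float)

    def reset(self):
        # Re-center the process at the start of each episode.
        self.x[:] = self.mu

    def sample(self):
        # dx = theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt)
              * self.rng.standard_normal(self.x.shape))
        self.x += dx
        return self.x.copy()

def td_targets(rewards, next_q, gamma=0.99, dones=None):
    """TD targets (step 9): y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})),
    with the bootstrap term masked out at terminal transitions."""
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    dones = np.zeros_like(rewards) if dones is None else np.asarray(dones, dtype=float)
    return rewards + gamma * (1.0 - dones) * next_q

def soft_update(target, source, tau=0.001):
    """Polyak averaging (step 13): theta' <- tau*theta + (1 - tau)*theta',
    applied parameter-by-parameter to a dict of arrays."""
    for name in target:
        target[name] = tau * source[name] + (1.0 - tau) * target[name]
    return target
```

With a small tau, the target networks track the learned networks slowly, which is what stabilizes the bootstrapped TD target in step 9.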