**Algorithm 1** Actor-dueling-critic algorithm

1: Initialize actor $\mu(s|\theta^{\mu})$ and dueling-critic $Q(s,a|\theta^{Q})$ with random weights $\theta^{\mu}$ and $\theta^{Q}$. Initialize target actor $\mu'$ with $\theta^{\mu'} \leftarrow \theta^{\mu}$ and target dueling-critic $Q'$ with $\theta^{Q'} \leftarrow \theta^{Q}$. Initialize replay memory $R$ and random process $\mathcal{N}$ for action exploration. Uniformly separate the action space into $n$ intervals.
2: **for** episode $= 1$ to $M$ **do**
3:&nbsp;&nbsp;&nbsp;&nbsp;Receive initial state $s_1$
4:&nbsp;&nbsp;&nbsp;&nbsp;**for** $t = 1$ to $N$ **do**
5:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;With probability $\epsilon$ select a random action $a_t$; otherwise select $a_t = \mu(s_t|\theta^{\mu}) + \mathcal{N}_t$
6:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Execute $a_t$ and observe reward $r_t$ and new state $s_{t+1}$
7:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Store transition $(s_t, a_t, r_t, s_{t+1})$ in $R$
8:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Sample a random minibatch of $N$ transitions $(s_i, a_i, r_i, s_{i+1})$ from $R$
9:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Implement target actor: $a'_i = \mu'(s_{i+1}|\theta^{\mu'})$
10:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Implement dueling-critic (Equation (14)) with the sampled transitions
11:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Set $y_i = r_i + \gamma Q'(s_{i+1}, a'_i|\theta^{Q'})$ (set $y_i = r_i$ if $s_{i+1}$ is terminal)
12:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Update dueling-critic by minimizing the loss: $L = \frac{1}{N}\sum_i \big(y_i - Q(s_i, a_i|\theta^{Q})\big)^2$
13:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Update actor using the sampled policy gradient: $\nabla_{\theta^{\mu}} J \approx \frac{1}{N}\sum_i \nabla_a Q(s,a|\theta^{Q})\big|_{s=s_i,\,a=\mu(s_i)} \, \nabla_{\theta^{\mu}} \mu(s|\theta^{\mu})\big|_{s_i}$
14:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Soft update target networks of dueling-critic and actor ($\tau \ll 1$): $\theta^{Q'} \leftarrow \tau\theta^{Q} + (1-\tau)\theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau\theta^{\mu} + (1-\tau)\theta^{\mu'}$
15:&nbsp;&nbsp;&nbsp;&nbsp;**end for**
16: **end for**
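A few of the bookkeeping steps above can be sketched numerically. The following is a minimal sketch, not the paper's implementation: the helper names (`discretize_action_space`, `interval_index`, `soft_update`, `td_targets`) are our own, and the critic/actor networks themselves are omitted. It shows the uniform separation of a 1-D action range into $n$ intervals (step 1), the TD target $y_i$ with terminal masking (step 11), and the soft target-network update (step 14).

```python
import numpy as np

def discretize_action_space(low, high, n):
    """Uniformly separate [low, high] into n intervals; returns the n+1 edges."""
    return np.linspace(low, high, n + 1)

def interval_index(a, edges):
    """Index of the interval containing action a (used by the advantage stream)."""
    return int(np.clip(np.searchsorted(edges, a, side="right") - 1,
                       0, len(edges) - 2))

def td_targets(rewards, next_q, dones, gamma=0.99):
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})); y_i = r_i when terminal."""
    return rewards + gamma * next_q * (1.0 - dones)

def soft_update(target_params, source_params, tau=0.005):
    """theta' <- tau * theta + (1 - tau) * theta'  (applied per parameter array)."""
    return tau * source_params + (1.0 - tau) * target_params

# Example: 4 uniform intervals over the action range [-1, 1]
edges = discretize_action_space(-1.0, 1.0, 4)     # [-1, -0.5, 0, 0.5, 1]
idx = interval_index(0.3, edges)                  # falls in interval [0, 0.5)
y = td_targets(np.array([1.0]), np.array([2.0]), np.array([0.0]), gamma=0.9)
new_target = soft_update(np.zeros(3), np.ones(3), tau=0.1)
```

The terminal mask `(1.0 - dones)` implements the "set $y_i = r_i$ if terminal" branch without a conditional, which is the usual vectorized form for minibatch updates.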