Algorithm 1: Actor-Critic in the proposed model
Input: number of iterations T, time step, discount factor, hyperparameters of the policy network
Process:
  Initialize the observations of the states
  for each time step do
    Compute the Actor output
    Compute the Critic output
    Update the TD error
    Update the Critic
    Update the Actor
    Update the state
  end for
Output:
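The loop of Algorithm 1 can be illustrated with the textbook one-step actor-critic updates. The sketch below is an assumption-laden illustration, not the update rules of the proposed model, which may differ. The toy chain environment `chain_step`, the tabular softmax actor, the tabular critic, and the learning rates `actor_lr` and `critic_lr` are all hypothetical choices made only to keep the example self-contained. The TD error is taken to be the standard form r + γ·V(s') − V(s).

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def td_error(reward, gamma, v_next, v_curr, done):
    """Standard one-step TD error; V(s') is treated as 0 at a terminal state."""
    target = reward + (0.0 if done else gamma * v_next)
    return target - v_curr


def chain_step(state, action, n_states):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.

    Reaching the last state yields reward 1 and ends the episode.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(T=2000, gamma=0.9, actor_lr=0.1, critic_lr=0.1, n_states=5, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(n_states)]  # actor: preferences per state
    values = [0.0] * n_states                      # critic: state-value estimates
    state = 0                                      # initialize the observed state

    for _ in range(T):
        # Actor: sample an action from the current policy.
        probs = softmax(theta[state])
        action = 0 if rng.random() < probs[0] else 1
        next_state, reward, done = chain_step(state, action, n_states)

        # Critic: evaluate the transition through the TD error.
        delta = td_error(reward, gamma, values[next_state], values[state], done)

        # Update the critic toward the one-step TD target.
        values[state] += critic_lr * delta

        # Update the actor along delta * grad log pi(action | state).
        for a in range(2):
            grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
            theta[state][a] += actor_lr * delta * grad_log_pi

        # Update the state, restarting the episode at a terminal state.
        state = 0 if done else next_state

    return theta, values
```

Calling `train()` returns the learned action preferences and value estimates. In a neural implementation the two tables would be replaced by the policy and value networks, with the same TD error driving both updates.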