2023 Jan 15;23(2):990. doi: 10.3390/s23020990
Algorithm 1: PPO, Actor–Critic Style [15]
for iteration 1,2, ... do
  for actor=1,2, ... N do
   Interact with the environment using the old policy πθold
   Feed the visited states to the critic network to compute the state-value (baseline) estimates
   Compute the advantage estimates
  end for
  Optimize the surrogate objective L with respect to θ for K epochs
  Update the old policy: θold ← θ
end for
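
The two numerical steps of Algorithm 1, computing advantage estimates and evaluating the surrogate L, can be sketched as follows. This is a minimal illustration, not the implementation used in this work. It assumes generalized advantage estimation and the clipped form of the surrogate, neither of which the algorithm box specifies. The function names and the default values of `gamma`, `lam`, and `eps` are hypothetical, and the rollout is assumed to contain no episode termination.

```python
import numpy as np


def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Advantage estimates for one rollout of length T.

    values[t] is the critic's baseline for the state at step t and
    last_value is its estimate for the state reached after the final step.
    Assumes no episode ends inside the rollout.
    """
    T = len(rewards)
    adv = np.zeros(T)
    next_value = last_value
    running = 0.0
    for t in reversed(range(T)):
        # One-step temporal-difference error against the critic baseline.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of the TD errors from step t onward.
        running = delta + gamma * lam * running
        adv[t] = running
        next_value = values[t]
    return adv


def clipped_surrogate(logp_new, logp_old, adv, eps=0.2):
    """Clipped surrogate objective (to be maximized with respect to θ)."""
    # Probability ratio between the current policy and πθold.
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    adv = np.asarray(adv)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    # The element-wise minimum gives a pessimistic bound on the improvement.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

In a full training loop, each of the N actors would supply `rewards`, `values`, and `logp_old` from its rollout, and the K optimization epochs would perform gradient ascent on `clipped_surrogate` with an automatic-differentiation library before copying θ into θold.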