2022 Nov 24;22(23):9114. doi: 10.3390/s22239114
Algorithm 1 PPO Learning Algorithm
  1: Initialize θ and ϕ with random numbers
  2: for each iteration do
  3:     Collect B trajectories by following the policy π_θ in the actor
  4:     Update θ as θ + α∇_θ L(θ) using Equation (10).
  5:     Update ϕ as ϕ + α∇_ϕ L(ϕ) using Equation (11).
  6: end for
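The loop in Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Equations (10) and (11) are not reproduced in this section, so the sketch substitutes the standard PPO clipped surrogate objective for L(θ) and a negative squared-error objective for L(ϕ), which keeps the ascent form ϕ + α∇L(ϕ). The one-state environment, the function name `ppo_bandit`, and the parameters `mean_rewards`, `clip_eps`, `epochs`, and the separate step sizes `alpha_theta` and `alpha_phi` are all hypothetical.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def ppo_bandit(mean_rewards, iterations=200, B=64, alpha_theta=0.1,
               alpha_phi=0.1, clip_eps=0.2, epochs=4, seed=0):
    """Toy PPO loop on a one-state problem (hypothetical setup, not the paper's)."""
    rng = np.random.default_rng(seed)
    mean_rewards = np.asarray(mean_rewards, dtype=float)
    n_actions = len(mean_rewards)
    # Step 1: initialize theta (policy logits) and phi (scalar value) randomly.
    theta = rng.normal(scale=0.01, size=n_actions)
    phi = rng.normal(scale=0.01)
    for _ in range(iterations):  # Step 2
        # Step 3: collect B one-step trajectories with the current policy.
        pi_old = softmax(theta)
        actions = rng.choice(n_actions, size=B, p=pi_old)
        rewards = rng.normal(mean_rewards[actions], 0.1)
        advantages = rewards - phi  # critic value used as a baseline
        # Step 4: ascend the clipped surrogate (assumed stand-in for Eq. (10)).
        for _ in range(epochs):
            pi = softmax(theta)
            ratio = pi[actions] / pi_old[actions]
            clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps)
            # The gradient is zero wherever the clipped branch is the minimum.
            active = ratio * advantages <= clipped * advantages
            # For softmax logits, grad of log pi(a) is onehot(a) - pi.
            grad_logp = np.eye(n_actions)[actions] - pi
            weights = active * ratio * advantages
            grad_theta = (weights[:, None] * grad_logp).mean(axis=0)
            theta = theta + alpha_theta * grad_theta
        # Step 5: ascend L(phi) = -mean((R - phi)^2) (assumed stand-in for Eq. (11)).
        grad_phi = 2.0 * np.mean(rewards - phi)
        phi = phi + alpha_phi * grad_phi
    return theta, phi

if __name__ == "__main__":
    theta, phi = ppo_bandit([0.0, 1.0, 0.2])
    print("policy:", softmax(theta), "value:", phi)
```

The inner `epochs` loop is a design choice of this sketch: with a single gradient step per batch the probability ratio is exactly 1 and the clipping never activates, so a few passes over the same batch are needed to show the clipped objective at work.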