2023 Oct 6;23(19):8278. doi: 10.3390/s23198278
Algorithm 2 Student Policy OpTimization (SPOT)
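 1: Input: $D_B$ and $D_E$    (Stage 01)
 2: Output: $\tau_\theta$
 3: $f_\phi(\tau \mid X) \leftarrow D_B$    (Stage 02, as per Equation (1))
 4: Initialize $\pi_\theta$, $D_\pi$, $L_E$
 5: while not done do:
 6:     Episode = sample($D_E$)
 7:     for $i = 0 : L_E$ do:
 8:         (state: $x_t^c$, action: $f_\phi(x^d)$) $\leftarrow f_\phi(\tau_\phi \mid x^d)$
 9:         Append: $D_i \leftarrow \{x_t^c, \tau_\phi\}$
10:     Policy Buffer: $D_\pi \leftarrow D_\pi \cup D_i$
11:     Every $n_b$ and $|D_\pi| \geq 2L_E$ do:
12:         Compute Loss: $L_\theta$    (from Equation (2))
13:         Policy Training: $\pi_\theta \leftarrow L_\theta$    (Stage 03)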
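The control flow of Algorithm 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the teacher $f_\phi$ is replaced by a fixed hypothetical function (Stage 02 is taken as already done), the student $\pi_\theta$ is a scalar linear model, the loss of Equation (2) is assumed to be a mean squared error between student and teacher actions, and the names `teacher`, `make_episodes`, and `spot` are invented here. Each episode step is a hypothetical pair `(x_c, x_d)`, where `x_c` is what the student observes and `x_d` is what the teacher observes.

```python
import random


def teacher(x_d):
    """Hypothetical stand-in for the pretrained teacher f_phi (assumption)."""
    return 2.0 * x_d + 1.0


def make_episodes(num_episodes, L_E, seed=0):
    """Build a toy D_E: each episode is a list of L_E (x_c, x_d) pairs."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(num_episodes):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(L_E)]
        # Assumption: student and teacher observe the same scalar here.
        episodes.append([(x, x) for x in xs])
    return episodes


def spot(episodes, L_E, n_b, iterations=400, lr=0.1, seed=0):
    """Distill teacher actions into a linear student pi_theta(x) = w * x + b."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0          # student parameters theta
    buffer = []              # policy buffer D_pi
    losses = []
    for it in range(1, iterations + 1):
        episode = rng.choice(episodes)           # Episode = sample(D_E)
        d_i = []
        for x_c, x_d in episode[:L_E]:           # for i = 0 : L_E
            tau_phi = teacher(x_d)               # teacher labels the step
            d_i.append((x_c, tau_phi))           # D_i <- {x_c, tau_phi}
        buffer.extend(d_i)                       # D_pi <- D_pi U D_i
        # Every n_b iterations, once the buffer holds at least 2 * L_E samples
        if it % n_b == 0 and len(buffer) >= 2 * L_E:
            n = len(buffer)
            loss = grad_w = grad_b = 0.0
            for x_c, action in buffer:
                err = (w * x_c + b) - action
                loss += err * err / n            # assumed MSE imitation loss
                grad_w += 2.0 * err * x_c / n
                grad_b += 2.0 * err / n
            w -= lr * grad_w                     # policy training step
            b -= lr * grad_b
            losses.append(loss)
    return (w, b), losses


if __name__ == "__main__":
    demo_episodes = make_episodes(num_episodes=10, L_E=5)
    (w, b), losses = spot(demo_episodes, L_E=5, n_b=2)
    print(f"student: w={w:.3f}, b={b:.3f}, final loss={losses[-1]:.6f}")
```

In this toy setting the student only has to recover the teacher's linear map, so the loss should shrink as the buffer fills. In a realistic setting the student would be a neural policy and the buffer would likely be subsampled rather than swept in full at every update.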