Sensors. 2023 Oct 6;23(19):8278. doi: 10.3390/s23198278
Algorithm 1 Policy and Model Optimization (PMO)

 1:  Input: D_B and D_E                                          ▷ Stage 01
 2:  Output: τ_θ
 3:  f_{φ_0}(τ | X) ← D_B                                        ▷ Stage 02 as per Equation (1)
 4:  Initialize π_θ, D_π, L_E
 5:  while not done do
 6:      Episode = sample(D_E)
 7:      for i = 0 : L_E do
 8:          (state: x_t^c, action: f_{φ_i}(x^d)) ← f_{φ_i}(τ_{φ_i} | x^d)
 9:          Append: D_i ← (x_t^c, τ_{φ_i})
10:          D_B ← D_B ∪ D_i
11:          D_π ← D_π ∪ D_i
12:          every n_b steps and ‖D_π‖ ≥ 2 L_E do
13:              Compute loss for DBM: MSE (L_φ)
14:              DBM optimization: f_{φ_i} ← ∇L_φ                ▷ Stage 02 on repeat
15:              Compute behavioral loss: (L_B)                  ▷ From Equations (3) and (4)
16:              Compute policy loss: (L_θ)                      ▷ From Equations (3) and (4)
17:              Policy training & optimization: π_θ ← ∇L_θ      ▷ Stage 03
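The control flow of Algorithm 1 can be sketched in PyTorch as follows. This is a minimal illustration only: the network architectures, toy dimensions, placeholder data for D_B and D_E, the simulated state x_t^c, and the plain MSE surrogates for L_φ, L_B, and L_θ are all assumptions standing in for the paper's deep behavioral model (DBM), datasets, and Equations (1), (3), and (4), not the authors' implementation.

    # Sketch of the PMO loop: alternate DBM updates (Stage 02 on repeat)
    # with policy updates (Stage 03), triggered every n_b steps once the
    # policy buffer D_pi holds at least 2*L_E samples.
    import torch
    import torch.nn as nn

    STATE_DIM, TRAJ_DIM = 8, 8            # toy dimensions (assumption)
    N_B, L_E = 32, 10                     # update period n_b and episode length L_E

    dbm = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, TRAJ_DIM))
    policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, TRAJ_DIM))
    opt_phi = torch.optim.Adam(dbm.parameters(), lr=1e-3)
    opt_theta = torch.optim.Adam(policy.parameters(), lr=1e-3)
    mse = nn.MSELoss()

    D_E = [torch.randn(STATE_DIM) for _ in range(64)]   # placeholder episode inputs
    D_B, D_pi = [], []                                  # base and policy buffers
    step = 0
    for _ in range(20):                                 # "while not done" (fixed here)
        x_d = D_E[torch.randint(len(D_E), (1,)).item()] # Episode = sample(D_E)
        for i in range(L_E):                            # for i = 0 : L_E
            x_tc = x_d + 0.1 * torch.randn(STATE_DIM)   # current state x_t^c (placeholder dynamics)
            with torch.no_grad():
                tau_phi = dbm(x_tc)                     # DBM output; stands in for tau_phi_i
            D_B.append((x_tc, tau_phi))                 # D_B <- D_B U D_i
            D_pi.append((x_tc, tau_phi))                # D_pi <- D_pi U D_i
            step += 1
            if step % N_B == 0 and len(D_pi) >= 2 * L_E:    # every n_b and ||D_pi|| >= 2 L_E
                xs, taus = map(torch.stack, zip(*D_pi[-2 * L_E:]))
                loss_phi = mse(dbm(xs), taus)               # DBM MSE loss L_phi
                opt_phi.zero_grad(); loss_phi.backward(); opt_phi.step()
                # Behavioral loss L_B: policy matches the (frozen) DBM output;
                # the paper's L_theta combines terms per Equations (3) and (4),
                # reduced here to L_B alone for brevity.
                loss_theta = mse(policy(xs), dbm(xs).detach())
                opt_theta.zero_grad(); loss_theta.backward(); opt_theta.step()

Note that the sketch trains the DBM on pairs drawn from its own rollouts only because the buffers start empty here; in the algorithm above, D_B is seeded with the base dataset from Stage 01, so the repeated Stage 02 updates refine the DBM on a mix of base and newly collected data.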