2021 Nov 27;21(23):7925. doi: 10.3390/s21237925
Algorithm 2 PPO-Based Deep Optimistic Linear Support
  • 1:

    Initialization:

  • 2:

    S = partial CCS (convex coverage set), composed of the value vectors Vt obtained after PPO learning.

  • 3:

    W = corner weights, which are obtained from S.

  • 4:

    Q = priority queue of weights for the multi-objective problem, where each weight vector forms a tuple with its importance (i.e., ([ω1t, ω2t], I)).

  • Instruction: 

     

  • 5:

    ωt=Q.pop()

  • 6:

    for iteration = 1, 2, … do

  • 7:

        for t = 1, 2, …, T do

  • 8:

            at=πθold(st)

  • 9:

            [ret,rdt],st+1=Env(at)

  • 10:

            Scale down the rewards [ret, rdt]

  • 11:

            M = M ∪ {st, at, [ret, rdt], st+1}

  • 12:

            [Aet, Adt] = advantage estimates computed from Equation (26)

  • 13:

            Ât = Ât ∪ {[Aet, Adt] × [ω1t, ω2t]}

  • 14:

        end for

  • 15:

        Optimize surrogate L with respect to θ from Ât, with K epochs

  • 16:

        Optimize Vϕ with respect to ϕ from the GAE(γ,λ) targets V̂t, with K epochs

  • 17:

        θold = θ, ϕold = ϕ

  • 18:

    end for upon convergence

  • 19:

    Vt=Vϕ(s)

  • 20:

    W = W ∪ ωt

  • 21:

    if ωt·Vt > max_{U∈S} ωt·U then

  • 22:

        Remove from S the value vectors Vdel made obsolete by the new Vt

  • 23:

        ωc = new corner weights obtained from S

  • 24:

        S = S ∪ Vt

  • 25:

        Remove from Q the weights ωdel made obsolete by the new ωc

  • 26:

        for each ω ∈ ωc do

  • 27:

            if the estimated improvement of (ω, W, S) > τ then

  • 28:

                Q = Q ∪ ω

  • 29:

            end if

  • 30:

        end for

  • 31:

    end if

  • 32:

    if Q is not empty then

  • 33:

        go back to line 5

  • 34:

    end if
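
The outer loop of Algorithm 2 (lines 5 and 19 to 34) can be sketched in Python for the two-objective case. This is a minimal illustration under stated assumptions, not the authors' implementation: the PPO training of lines 6 to 18 is replaced by a hypothetical callable `solve(w)` that returns a value vector for a weight vector; `estimate_improvement` is a simple stand-in priority (distance to the nearest visited weight) rather than the improvement estimate used in the paper; and obsolete vectors are removed from S only when the new vector Pareto-dominates them. All function names are illustrative.

```python
import heapq

TOL = 1e-9
INF = float("inf")

def scalarize(V, w):
    """Scalarized value w·V for two objectives."""
    return w[0] * V[0] + w[1] * V[1]

def best_value(S, w):
    """max over U in S of w·U (-inf when S is empty)."""
    return max((scalarize(U, w) for U in S), default=-INF)

def new_corner_weights(S, V):
    """Weights strictly inside (0, 1) where V ties with the upper envelope of S."""
    corners = []
    for U in S:
        denom = (V[0] - V[1]) - (U[0] - U[1])
        if abs(denom) < TOL:
            continue  # parallel scalarized-value lines never cross
        w1 = (U[1] - V[1]) / denom
        w = (w1, 1.0 - w1)
        if TOL < w1 < 1.0 - TOL and scalarize(U, w) >= best_value(S, w) - TOL:
            corners.append(w)
    return corners

def estimate_improvement(w, W):
    """Stand-in priority (assumption): distance to the nearest visited weight."""
    return min(abs(w[0] - v[0]) for v in W)

def ols(solve, tau=1e-6):
    S, W = [], []
    # The two extreme weights start with maximal priority (heapq is a min-heap).
    Q = [(-INF, (0.0, 1.0)), (-INF, (1.0, 0.0))]
    heapq.heapify(Q)
    while Q:                                          # lines 32-34
        _, w = heapq.heappop(Q)                       # line 5
        V = solve(w)                                  # lines 6-19 (PPO in the paper)
        W.append(w)                                   # line 20
        if scalarize(V, w) > best_value(S, w) + TOL:  # line 21
            # line 22, simplified: drop vectors that V Pareto-dominates
            S = [U for U in S if not (V[0] >= U[0] and V[1] >= U[1])]
            corners = new_corner_weights(S, V)        # line 23
            # line 25, done before line 24 so best_value still sees the old S;
            # queued weights where V beats the old envelope are no longer corners
            Q = [(p, q) for p, q in Q
                 if p == -INF or scalarize(V, q) <= best_value(S, q) + TOL]
            heapq.heapify(Q)
            S.append(V)                               # line 24
            for wc in corners:                        # lines 26-30
                gain = estimate_improvement(wc, W)
                if gain > tau:
                    heapq.heappush(Q, (-gain, wc))
    return S

# Toy usage: an exact scalarized solver over a fixed set of value vectors.
candidates = [(0.0, 10.0), (4.0, 9.0), (8.0, 6.0), (10.0, 0.0), (3.0, 3.0)]

def solve(w):
    return max(candidates, key=lambda V: scalarize(V, w))

ccs = ols(solve)
```

With the exact toy solver, the loop stops once every queued corner weight has been visited without finding a better value vector. The dominated candidate (3, 3) is never returned by `solve` and so never enters S. With an approximate solver such as PPO, the threshold τ and the priority estimate determine how many weight vectors are trained.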