Entropy. 2023 Jan 22;25(2):215. doi: 10.3390/e25020215
Algorithm 1: Actor-Critic in the proposed model

Input: Number of iterations T, actor learning rate α, critic learning rate β, discount factor γ, initial policy network parameters θ and value network parameters ω

Process:

Initialize observations of states φ(s1),φ(s2),...,φ(sn)

for i=1,2,...,T:

Ri, φ(si+1), Ai+1 = Actor(φ(si), Ai)

V(si), V(si+1) = Critic(φ(si), φ(si+1))

Update TD Error by

δ ← Ri + γV(si+1) − V(si)

Update Critic by

ω ← ω + βδ∇ωV(si)

Update Actor by

θ ← θ + αδ∇θ log πθ(si, Ai)

Update State by

φ(si) ← φ(si+1)

end for

Output: ω,θ
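The loop in Algorithm 1 can be sketched as a minimal, self-contained example. This is a hypothetical toy setup, not the paper's model: a single-state, two-action task (action 0 pays reward 1.0, action 1 pays 0.0) with a scalar critic and a softmax actor standing in for the value and policy networks; the TD error, critic update, and actor update mirror the three update rules above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP: one state, two actions; action 0 -> reward 1.0, action 1 -> 0.0.
n_actions = 2
gamma = 0.9   # discount factor
alpha = 0.05  # actor learning rate
beta = 0.1    # critic learning rate

theta = np.zeros(n_actions)  # actor (policy) parameters, softmax over actions
v = 0.0                      # critic's value estimate for the single state

def policy(theta):
    """Softmax policy pi_theta(a | s) for the single state."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

for _ in range(5000):
    probs = policy(theta)
    a = rng.choice(n_actions, p=probs)
    r = 1.0 if a == 0 else 0.0

    # TD error: delta <- R + gamma * V(s_{i+1}) - V(s_i); here s_{i+1} == s_i
    delta = r + gamma * v - v

    # Critic update: omega <- omega + beta * delta * grad_omega V(s_i)
    # (the gradient of a scalar value estimate w.r.t. itself is 1)
    v += beta * delta

    # Actor update: theta <- theta + alpha * delta * grad_theta log pi_theta(s_i, A_i)
    grad_log = -probs          # softmax log-likelihood gradient, part 1
    grad_log[a] += 1.0         # part 2: indicator for the chosen action
    theta += alpha * delta * grad_log

print(policy(theta), v)
```

Under these assumptions the actor should come to prefer action 0, and the critic's estimate should approach the discounted return 1/(1 − γ) = 10 of always taking it; the same three-step structure (TD error, critic step, actor step) carries over when the tabular parameters are replaced by networks.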