Author manuscript; available in PMC: 2022 Dec 22.
Published in final edited form as: IEEE Trans Cybern. 2021 Dec 22;51(12):5717–5727. doi: 10.1109/TCYB.2019.2958912

Algorithm 3.

Experience Replay for Stochastic Environment (ExperienceReplay)

(1) g ← 0
(2) Choose a simulated continuous state, x_t, from the state-space
(3) while g < K do
(4) s_t ← ϕ(x_t)
(5) a_t ← argmax_{a ∈ A} p((1 − λ) e^{ΔQ(s_t, a)} + (λ)κ)
(6) (Δx̄_{Ncl_a}, r̄_{Ncl_a}) ← M(s_t, a_t)
(7) x_{t+1} ← x_t + Δx̄_{Ncl_a}
(8) s_{t+1} ← ϕ(x_{t+1})
(9) Use (1) and (2) to update the Q-value and ΔQ-value
(10) x_t ← x_{t+1}
(11) g ← g + 1
(12) end while
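
The loop structure of Algorithm 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `experience_replay`, the toy helpers, and all parameter names are hypothetical. Equations (1) and (2) are not reproduced in this section, so the update step substitutes a standard tabular Q-learning rule and takes ΔQ to be the absolute change in Q. The action score in step (5) is read as (1 − λ)·exp(ΔQ) + λ·κ, with κ treated as a per-action placeholder term and p as the identity; the continuous state is a scalar for simplicity.

```python
import math

def experience_replay(K, x0, actions, phi, model, Q, dQ,
                      lam=0.5, kappa=None, alpha=0.1, gamma=0.95):
    """Run K simulated transitions, mirroring steps (1)-(12) of Algorithm 3."""
    # ASSUMPTION: kappa is a per-action term; defaults to zero for every action.
    kappa = kappa or {a: 0.0 for a in actions}
    g, x = 0, x0                                   # steps (1)-(2)
    while g < K:                                   # step (3)
        s = phi(x)                                 # step (4): discretize the state
        # step (5), ASSUMED reading: score = (1 - lam) * exp(dQ) + lam * kappa
        a = max(actions,
                key=lambda b: (1 - lam) * math.exp(dQ.get((s, b), 0.0))
                + lam * kappa[b])
        dx_mean, r_mean = model(s, a)              # step (6): query the learned model
        x_next = x + dx_mean                       # step (7)
        s_next = phi(x_next)                       # step (8)
        # step (9), PLACEHOLDER for Eqs. (1) and (2): tabular Q-learning,
        # with dQ recorded as the magnitude of the Q change.
        best_next = max(Q.get((s_next, b), 0.0) for b in actions)
        old = Q.get((s, a), 0.0)
        new = old + alpha * (r_mean + gamma * best_next - old)
        Q[(s, a)] = new
        dQ[(s, a)] = abs(new - old)
        x = x_next                                 # step (10)
        g += 1                                     # step (11)
    return Q, dQ

# Toy usage: a 1-D state, two actions, and a hand-written stand-in for M.
def toy_phi(x):
    return int(round(x))

def toy_model(s, a):
    return 0.5 * a, (1.0 if s >= 2 else 0.0)

Q, dQ = experience_replay(K=20, x0=0.0, actions=[-1, 1],
                          phi=toy_phi, model=toy_model, Q={}, dQ={})
```

Keeping `phi` and `model` as plain callables means the same loop can be driven by any discretizer and any learned transition model that returns a mean state change and a mean reward.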