2021 May 13;21(10):3409. doi: 10.3390/s21103409
Algorithm 4. Pretraining the dynamics network.
Function Pretrain_Dynamics_Network ($f_\phi$, $D$)
/* the dynamics network $f_\phi$, the demo dataset $D$ */
for e = 0,…,E epochs do
  for i = 0,…,|D| do
    Initialize the episodic buffer $B$ to be empty
    Sample $d = s_1^E, a_1^E, s_2^E, a_2^E, \ldots, s_{il}^E, a_{il}^E$ from $D$
    Initialize the environment to the initial state $s_1^E$
    for $t = 1, \ldots, il - 1$ timesteps do
      Execute the action $a_t^E$ and perceive the next state $s_{t+1}$
      $B \leftarrow B \cup \{s_t, a_t^E, s_{t+1}\}$ // append each state transition to B
    end for
    Update the dynamics network parameters $\phi$ by gradient descent
      $\phi \leftarrow \phi - \nabla_\phi \sum_{\{s_t, a_t^E, s_{t+1}\} \in B} \lVert f_\phi(s_t, a_t^E) - s_{t+1} \rVert_2^2$
  end for
end for
return $f_\phi$
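The loop structure of Algorithm 4 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the network is replaced by a hypothetical linear model (`LinearDynamics`), the environment by a hypothetical one-dimensional system (`ToyEnv`), and the demonstrations are generated synthetically by `make_demos`. The step size `lr` is also an assumption, since the update rule in the algorithm does not show one. Each demonstration is assumed to be a list of (state, action) pairs.

```python
import random


class ToyEnv:
    """Hypothetical 1-D environment with true dynamics s' = 0.9*s + 0.5*a."""

    def reset_to(self, state):
        self.state = list(state)
        return list(self.state)

    def step(self, action):
        self.state = [0.9 * s + 0.5 * a for s, a in zip(self.state, action)]
        return list(self.state)


class LinearDynamics:
    """Hypothetical stand-in for f_phi: s_next ~ W @ [s, a] + b."""

    def __init__(self, state_dim, action_dim):
        self.W = [[0.0] * (state_dim + action_dim) for _ in range(state_dim)]
        self.b = [0.0] * state_dim

    def predict(self, s, a):
        x = list(s) + list(a)
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.W, self.b)]

    def loss(self, buffer):
        # Sum over B of ||f_phi(s_t, a_t) - s_{t+1}||_2^2.
        return sum((p - y) ** 2
                   for s, a, s_next in buffer
                   for p, y in zip(self.predict(s, a), s_next))

    def update(self, buffer, lr):
        # One gradient-descent step on the summed squared error over B.
        grad_W = [[0.0] * len(row) for row in self.W]
        grad_b = [0.0] * len(self.b)
        for s, a, s_next in buffer:
            x = list(s) + list(a)
            for i, (p, y) in enumerate(zip(self.predict(s, a), s_next)):
                err = 2.0 * (p - y)
                grad_b[i] += err
                for j, v in enumerate(x):
                    grad_W[i][j] += err * v
        for i, row in enumerate(self.W):
            self.b[i] -= lr * grad_b[i]
            for j in range(len(row)):
                row[j] -= lr * grad_W[i][j]


def rollout(env, demo):
    """Replay the demo's actions from its first state; return the buffer B."""
    buffer = []
    s = env.reset_to(demo[0][0])
    for _, a in demo[:-1]:  # t = 1, ..., il - 1
        s_next = env.step(a)
        buffer.append((s, a, s_next))
        s = s_next
    return buffer


def pretrain_dynamics_network(model, demos, env, epochs, lr, seed=0):
    rng = random.Random(seed)
    for _ in range(epochs):
        for _ in range(len(demos)):
            demo = rng.choice(demos)  # sample d from D
            model.update(rollout(env, demo), lr)
    return model


def make_demos(env, n, length, seed=1):
    """Generate hypothetical demonstrations as lists of (state, action)."""
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        s = env.reset_to([rng.uniform(-1, 1)])
        demo = []
        for _ in range(length):
            a = [rng.uniform(-1, 1)]
            demo.append((s, a))
            s = env.step(a)
        demos.append(demo)
    return demos


env = ToyEnv()
demos = make_demos(env, n=10, length=5)
data = [t for d in demos for t in rollout(env, d)]
model = LinearDynamics(state_dim=1, action_dim=1)
before = model.loss(data)
pretrain_dynamics_network(model, demos, env, epochs=100, lr=0.01)
print(before, model.loss(data))
```

As in the algorithm, the buffer is rebuilt for every sampled demonstration and holds the states actually returned by the environment, not the states recorded in the demonstration. The sketch runs `epochs` outer and `len(demos)` inner iterations, which may differ by one from the inclusive loop bounds written in the pseudocode.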