Algorithm 4. Pretraining the dynamics network.
Function Pretrain_Dynamics_Network()
    /* the dynamics network; D: the demo dataset */
    for e = 1, …, E epochs do
        for i = 1, …, |D| do
            Initialize the episodic buffer B to be empty
            Sample a demonstration from D
            Initialize the environment to its initial state
            for t = 1, 2, … timesteps do
                Execute an action and perceive the next state
                B ← B ∪ {(state, action, next state)}   // append each state transition to B
            end for
            Update the dynamics network parameters by gradient descent
        end for
    end for
    return the dynamics network
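The loop structure of Algorithm 4 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It rests on several assumptions that the source does not state:

- States and actions are scalars.
- A hypothetical linear model (`LinearDynamics`) stands in for the dynamics network.
- A hypothetical toy environment (`ToyEnv`) has dynamics s' = 0.9 s + 0.5 a.
- Each demonstration is stored as an (initial state, recorded actions) pair, and its actions are replayed.
- The gradient-descent update minimizes mean squared error over the episodic buffer.

```python
import random


class LinearDynamics:
    """Hypothetical stand-in for the dynamics network: predicts s' = w_s*s + w_a*a + b."""

    def __init__(self) -> None:
        self.w_s = 0.0
        self.w_a = 0.0
        self.b = 0.0

    def predict(self, s: float, a: float) -> float:
        return self.w_s * s + self.w_a * a + self.b

    def gradient_step(self, batch: list[tuple[float, float, float]], lr: float) -> float:
        """One gradient-descent step on the mean squared error over batch (assumed loss).

        Returns the loss measured before the step.
        """
        n = len(batch)
        loss = g_ws = g_wa = g_b = 0.0
        for s, a, s_next in batch:
            err = self.predict(s, a) - s_next
            loss += err * err / n
            g_ws += 2.0 * err * s / n   # d(loss)/d(w_s)
            g_wa += 2.0 * err * a / n   # d(loss)/d(w_a)
            g_b += 2.0 * err / n        # d(loss)/d(b)
        self.w_s -= lr * g_ws
        self.w_a -= lr * g_wa
        self.b -= lr * g_b
        return loss


class ToyEnv:
    """Hypothetical deterministic environment with true dynamics s' = 0.9*s + 0.5*a."""

    def reset(self, s0: float) -> float:
        self.s = s0
        return self.s

    def step(self, a: float) -> float:
        self.s = 0.9 * self.s + 0.5 * a
        return self.s


def pretrain_dynamics_network(model, demos, env, epochs, lr=0.1, seed=0):
    """Sketch of Algorithm 4; demos is a list of (initial_state, actions) pairs (assumed format)."""
    rng = random.Random(seed)
    for _ in range(epochs):                      # for e = 1, ..., E epochs
        for _ in range(len(demos)):              # for i = 1, ..., |D|
            buffer = []                          # initialize the episodic buffer B to be empty
            s0, actions = rng.choice(demos)      # sample a demonstration from D
            s = env.reset(s0)                    # initialize the environment to the initial state
            for a in actions:                    # one iteration per timestep
                s_next = env.step(a)             # execute an action and perceive the next state
                buffer.append((s, a, s_next))    # B <- B U {(s, a, s')}
                s = s_next
            model.gradient_step(buffer, lr)      # gradient-descent update using B
    return model
```

The update sits after the timestep loop, as it does in the algorithm, so each gradient step uses one episode's transitions. Because the toy data are linear and noise-free, the true coefficients (0.9, 0.5, 0) give zero loss on every buffer. Repeated updates with a small enough learning rate should therefore drive the parameters toward them.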