Hybrid Imitation Learning Framework for Robotic Manipulation Tasks

2021 May 13;21(10):3409. doi: 10.3390/s21103409

Algorithm 4. Pretraining the dynamics network.

Function Pretrain_Dynamics_Network(f_{ϕ}, D)
/* the dynamics network f_{ϕ}, the demo dataset D */
for e = 0, …, E epochs do
    for i = 0, …, |D| do
        Initialize the episodic buffer B to be empty
        Sample d = 〈(s_{1}^{E}, a_{1}^{E}), (s_{2}^{E}, a_{2}^{E}), \dots, (s_{i l}^{E}, a_{i l}^{E})〉 from D
        Initialize the environment to initial state s_{1}^{E}
        for t = 1, …, i l - 1 timesteps do
            Execute an action a_{t}^{E} and perceive the next state s_{t + 1}
            B \leftarrow B \cup {(s_{t}, a_{t}^{E}, s_{t + 1})} // append each state transition to B
        end for
        Update the dynamics network parameters ϕ by gradient descent:
        ϕ \leftarrow ϕ - \nabla_{ϕ} (\sum_{(s_{t}, a_{t}^{E}, s_{t + 1}) \in B} ‖ f_{ϕ} (s_{t}, a_{t}^{E}) - s_{t + 1} ‖_{2}^{2})
    end for
end for
return f_{ϕ}
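The loop structure of Algorithm 4 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `ToyEnv`, `LinearDynamics`, and `rollout` are hypothetical names, the dynamics model is assumed to be linear (the paper uses a network), and a step size `lr` is added because the update rule above shows no explicit learning rate. Each inner iteration samples one demonstration, replays its actions in the environment to fill the episodic buffer, and then takes one gradient step on the summed squared prediction error.

```python
import numpy as np


class ToyEnv:
    """Hypothetical stand-in environment with linear dynamics s' = A s + B a."""

    def __init__(self, A, B):
        self.A, self.B = np.asarray(A, float), np.asarray(B, float)
        self.state = None

    def reset_to(self, s):
        # Initialize the environment to a given state (s_1^E in Algorithm 4).
        self.state = np.array(s, dtype=float)
        return self.state

    def step(self, a):
        # Execute an action and return the perceived next state.
        self.state = self.A @ self.state + self.B @ np.asarray(a, float)
        return self.state


class LinearDynamics:
    """Assumed model class: f_phi(s, a) = W [s; a], with phi = W."""

    def __init__(self, state_dim, action_dim):
        self.W = np.zeros((state_dim, state_dim + action_dim))

    def loss_and_grad(self, transitions):
        # Summed squared error over the buffer and its gradient w.r.t. W.
        X = np.stack([np.concatenate([s, a]) for s, a, _ in transitions])
        Y = np.stack([s_next for _, _, s_next in transitions])
        err = X @ self.W.T - Y  # f_phi(s_t, a_t) - s_{t+1}
        return float(np.sum(err ** 2)), 2.0 * err.T @ X


def rollout(env, demo):
    """Replay the demo's actions from its first state; collect (s_t, a_t, s_{t+1})."""
    buffer = []  # episodic buffer B
    s = env.reset_to(demo[0][0])
    for t in range(len(demo) - 1):
        a = demo[t][1]  # expert action a_t^E
        s_next = env.step(a)
        buffer.append((s, a, s_next))
        s = s_next
    return buffer


def pretrain_dynamics_network(model, demos, env, epochs, lr=0.01, seed=0):
    """Sketch of Algorithm 4; lr and seed are assumptions, not from the paper."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for _ in range(len(demos)):  # one sampled demo per inner iteration
            demo = demos[rng.integers(len(demos))]
            buffer = rollout(env, demo)
            _, grad = model.loss_and_grad(buffer)
            model.W -= lr * grad  # gradient descent step on phi
    return model
```

Here each demonstration is a list of `(state, action)` pairs of 1-D arrays. Calling `pretrain_dynamics_network(LinearDynamics(2, 1), demos, env, epochs=50)` returns the fitted model. With a real network, the `loss_and_grad` call would be replaced by a backward pass in the chosen deep learning framework.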