Algorithm 1. Hybrid imitation learning framework.
Function HIL(D, )
  /* D: the demo dataset; second argument: the state recovery probability */
  Initialize the policy network and the dynamics network
  Pretrain the dynamics network with Pretrain_Dynamics_Network
  for e = 0,…,E epochs do
    for i = 0,…,|D| do
      Sample from D
      Evaluate Behavior_Cloning on the sample
      Evaluate State_Cloning on the sample
      Combine the two results with Loss_Mixing
      Update the policy network parameters by gradient descent
    end for
  end for
  return the policy network
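The control flow of Algorithm 1 can be sketched in Python as follows. This is only an illustration under several assumptions that the algorithm box does not confirm: the 1-D toy task, the linear policy and dynamics models, the squared-error losses, and a mixing rule that picks the state-cloning loss with probability `p` and the behavior-cloning loss otherwise. The helper `make_demos` and all hyperparameter values are hypothetical.

```python
import random

def make_demos(n, rng):
    """Hypothetical toy demos (s, a, s_next) with true dynamics s_next = s + a."""
    demos = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        a = -0.5 * s + rng.uniform(-0.1, 0.1)  # noisy expert action
        demos.append((s, a, s + a))
    return demos

def pretrain_dynamics_network(demos, rng, epochs=50, lr=0.1):
    """Assumed dynamics model: s_next ~ w_s*s + w_a*a, fit by SGD on squared error."""
    w_s, w_a = 0.0, 0.0
    for _ in range(epochs):
        for _ in range(len(demos)):
            s, a, s_next = rng.choice(demos)
            err = w_s * s + w_a * a - s_next
            w_s -= lr * 2.0 * err * s
            w_a -= lr * 2.0 * err * a
    return w_s, w_a

def behavior_cloning(theta, sample):
    """Assumed BC loss (theta*s - a)^2; returns (loss, d loss / d theta)."""
    s, a, _ = sample
    err = theta * s - a
    return err * err, 2.0 * err * s

def state_cloning(theta, dynamics, sample):
    """Assumed SC loss: predicted next state under the policy action vs. demo next state."""
    w_s, w_a = dynamics
    s, _, s_next = sample
    err = w_s * s + w_a * (theta * s) - s_next
    return err * err, 2.0 * err * w_a * s

def loss_mixing(bc, sc, p, rng):
    """Assumed mixing rule: use the state-cloning term with probability p."""
    return sc if rng.random() < p else bc

def hil(demos, p, epochs=20, lr=0.05, seed=0):
    rng = random.Random(seed)
    theta = 0.0  # single parameter of the linear policy a = theta * s
    dynamics = pretrain_dynamics_network(demos, rng)
    for _ in range(epochs):
        for _ in range(len(demos)):
            sample = rng.choice(demos)
            bc = behavior_cloning(theta, sample)
            sc = state_cloning(theta, dynamics, sample)
            _, grad = loss_mixing(bc, sc, p, rng)
            theta -= lr * grad  # gradient descent step on the mixed loss
    return theta

demos = make_demos(200, random.Random(1))
theta = hil(demos, p=0.5)
print(round(theta, 2))
```

In this toy setup both loss terms are minimized near the expert gain of -0.5, so the learned `theta` should land close to that value for any `p` in [0, 1]. Setting `p=0.0` reduces the sketch to plain behavior cloning.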