Algorithm 2. Behavior cloning.
Function Behavior_Cloning(policy network, demo data)
    /* inputs: the policy network, the demo data */
    Initialize the episodic buffer to be empty
    for t = 1, …, timesteps do
        Sample an action
    end for
    Estimate the BC loss
    Calculate the degree of cloning
    return
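
The steps of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the algorithm box does not define how actions are sampled, which loss is used, or what "degree of cloning" means. Here states and actions are scalars, the BC loss is assumed to be mean squared error against the demonstrated actions, and the degree of cloning is assumed to be the fraction of steps whose sampled action lies within a tolerance of the demonstrated one. The names `behavior_cloning`, `noise_std`, `tolerance`, and `seed` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Demo = Tuple[float, float]  # (state, expert_action); scalars for brevity


def behavior_cloning(
    policy: Callable[[float], float],
    demo_data: Sequence[Demo],
    timesteps: int,
    noise_std: float = 0.0,
    tolerance: float = 0.1,
    seed: int = 0,
) -> Tuple[float, float, List[Tuple[float, float, float]]]:
    """Hypothetical sketch returning (bc_loss, degree_of_cloning, buffer)."""
    if timesteps <= 0 or not demo_data:
        raise ValueError("need at least one timestep and one demonstration")

    rng = random.Random(seed)
    buffer: List[Tuple[float, float, float]] = []  # episodic buffer, initially empty

    for t in range(timesteps):
        # Cycle through the demonstrations (assumption: the source does not
        # say how states are chosen at each timestep).
        state, expert_action = demo_data[t % len(demo_data)]
        # Sample an action: policy output plus optional Gaussian noise (assumption).
        noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        action = policy(state) + noise
        buffer.append((state, action, expert_action))

    # BC loss: mean squared error between sampled and expert actions (assumption).
    bc_loss = sum((a - e) ** 2 for _, a, e in buffer) / len(buffer)
    # Degree of cloning: share of steps within `tolerance` of the expert (assumption).
    degree = sum(abs(a - e) <= tolerance for _, a, e in buffer) / len(buffer)
    return bc_loss, degree, buffer
```

Calling `behavior_cloning(lambda s: 2.0 * s, demos, timesteps=8)` with demonstrations generated by the same rule, for example `demos = [(x, 2.0 * x) for x in (0.0, 0.5, 1.0, 1.5)]`, yields zero loss and a degree of cloning of 1 under these definitions. In a real training loop the loss would be used to update the policy network, a step the algorithm box does not show.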