Hybrid Imitation Learning Framework for Robotic Manipulation Tasks

2021 May 13;21(10):3409. doi: 10.3390/s21103409

Algorithm 2. Behavior cloning.

Function Behavior_Cloning(π_{θ}, d)
/* π_{θ}: the policy network
   d = 〈(s_{1}^{E}, a_{1}^{E}), (s_{2}^{E}, a_{2}^{E}), \dots, (s_{l}^{E}, a_{l}^{E})〉: the demo data */
Initialize the episodic buffer B to be empty
for t = 1, …, l timesteps do
    Sample an action a_{t} ~ π_{θ}(s_{t}^{E})
    p_{t} = P(a_{t} | s_{t}^{E}; π_{θ})
    B \leftarrow B \cup \{(s_{t}^{E}, a_{t}^{E}, a_{t}, p_{t})\}
end for
Estimate the BC loss L_{BC}:
    L_{BC} = \sum_{(s_{t}^{E}, a_{t}^{E}, a_{t}, p_{t}) \in B} ‖ a_{t} - a_{t}^{E} ‖_{2}^{2}
Calculate the degree of cloning d_{conv}:
    d_{conv} = \frac{1}{l} \sum_{t = 1}^{l} p_{t}
return L_{BC}, d_{conv}
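The steps of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the policy π_{θ} is assumed to be a diagonal Gaussian with a fixed standard deviation, and the names `policy_mean`, `sigma`, and `rng` are hypothetical. The algorithm itself does not specify the policy's distribution family. Under this assumption p_{t} is a probability density, so for continuous actions it may exceed 1.

```python
import numpy as np


def behavior_cloning(policy_mean, sigma, demo, rng):
    """Sketch of Algorithm 2, assuming a Gaussian policy N(policy_mean(s), sigma^2 I).

    policy_mean: callable mapping a state vector to a mean action (hypothetical).
    sigma:       fixed standard deviation of the policy (an assumption).
    demo:        list of (s_E, a_E) pairs, the demonstration data d.
    rng:         a numpy random Generator used for action sampling.
    """
    buffer = []  # episodic buffer B
    for s_e, a_e in demo:
        mu = policy_mean(s_e)
        # Sample a_t ~ pi_theta(s_t^E)
        a = mu + sigma * rng.standard_normal(mu.shape)
        # p_t: density of the sampled action under the assumed Gaussian policy
        k = mu.size
        norm = (2.0 * np.pi) ** (k / 2.0) * sigma ** k
        p = float(np.exp(-0.5 * np.sum(((a - mu) / sigma) ** 2)) / norm)
        buffer.append((s_e, a_e, a, p))

    # L_BC: sum of squared L2 distances between sampled and expert actions
    l_bc = sum(float(np.sum((a - a_e) ** 2)) for _, a_e, a, _ in buffer)
    # d_conv: mean of p_t over the l timesteps
    d_conv = sum(p for _, _, _, p in buffer) / len(buffer)
    return l_bc, d_conv


# Toy usage: the policy mean reproduces the expert action exactly.
demo_data = [(np.array([0.1, 0.2]), np.array([0.1, 0.2])),
             (np.array([0.3, -0.1]), np.array([0.3, -0.1])),
             (np.array([-0.5, 0.4]), np.array([-0.5, 0.4]))]
loss, degree = behavior_cloning(lambda s: s.copy(), 0.1, demo_data,
                                np.random.default_rng(0))
print(loss, degree)
```

In this toy setting the only source of BC loss is the sampling noise, so L_{BC} stays small. In a training loop, a differentiable version of L_{BC} would be needed to update θ, which this sketch does not attempt.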