2021 May 13;21(10):3409. doi: 10.3390/s21103409
Algorithm 2. Behavior cloning.
Function Behavior_Cloning($\pi_\theta$, $d$)
/* the policy network $\pi_\theta$
the demo data $d = \{(s_1^E, a_1^E), (s_2^E, a_2^E), \ldots, (s_l^E, a_l^E)\}$ */
Initialize the episodic buffer B to be empty
for $t = 1, \ldots, l$ timesteps do
Sample an action $a_t \sim \pi_\theta(s_t^E)$
            $p_t = P(a_t \mid s_t^E; \pi_\theta)$
          $B \leftarrow B \cup \{(s_t^E, a_t^E, a_t, p_t)\}$
end for
Estimate the BC loss $L_{BC}$
          $L_{BC} = \sum_{(s_t^E, a_t, a_t^E) \in B} \lVert a_t - a_t^E \rVert_2^2$
Calculate the degree of cloning $d_{conv}$
          $d_{conv} = \frac{1}{l} \sum_{t=1}^{l} p_t$
return $L_{BC}$, $d_{conv}$
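
The steps of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `GaussianPolicy` class, its linear mean, its fixed standard deviation, and the use of the Gaussian density as $p_t$ are all assumptions made here so that the example is runnable. The algorithm itself only requires a policy that can sample an action for a state and report the probability of that action.

```python
import numpy as np


class GaussianPolicy:
    """Hypothetical stand-in for pi_theta: linear mean, fixed isotropic std."""

    def __init__(self, weights, std=0.1):
        self.weights = np.asarray(weights, dtype=float)  # (action_dim, state_dim)
        self.std = float(std)

    def sample(self, state, rng):
        """Return a sampled action a_t and its density p_t under the policy."""
        mean = self.weights @ state
        action = mean + self.std * rng.standard_normal(mean.shape)
        # Assumption: p_t is taken to be the Gaussian density of the sampled
        # action, which can exceed 1 for small std.
        z = (action - mean) / self.std
        norm = (np.sqrt(2.0 * np.pi) * self.std) ** mean.size
        density = np.exp(-0.5 * np.sum(z ** 2)) / norm
        return action, float(density)


def behavior_cloning(policy, demo, rng):
    """Compute the BC loss L_BC and the degree of cloning d_conv.

    demo is a sequence of (expert_state, expert_action) pairs.
    """
    buffer = []  # episodic buffer B
    for s_e, a_e in demo:
        s_e = np.asarray(s_e, dtype=float)
        a_e = np.asarray(a_e, dtype=float)
        a_t, p_t = policy.sample(s_e, rng)
        buffer.append((s_e, a_e, a_t, p_t))

    # L_BC: sum of squared L2 distances between sampled and expert actions.
    l_bc = sum(float(np.sum((a_t - a_e) ** 2)) for _, a_e, a_t, _ in buffer)
    # d_conv: mean of p_t over the l demonstration timesteps.
    d_conv = sum(p_t for *_, p_t in buffer) / len(buffer)
    return l_bc, d_conv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy demonstration in which the expert action equals the state.
    demo = [(np.array([1.0, 0.0]), np.array([1.0, 0.0])),
            (np.array([0.0, 1.0]), np.array([0.0, 1.0]))]
    l_bc, d_conv = behavior_cloning(GaussianPolicy(np.eye(2)), demo, rng)
    print(f"L_BC = {l_bc:.4f}, d_conv = {d_conv:.4f}")
```

In a training loop the returned loss would be differentiated with respect to the policy parameters, which needs an automatic-differentiation framework rather than plain NumPy. The sketch covers only the forward computation of the two returned quantities.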