Sensors. 2021 May 13;21(10):3409. doi: 10.3390/s21103409
Algorithm 1. Hybrid imitation learning framework.
Function HIL(D, ρ)
/* the demo dataset D = {(s_{k,1}^E, a_{k,1}^E), …, (s_{k,l}^E, a_{k,l}^E) | k = 1, …, n},
the state recovery probability ρ */
Initialize the policy network π_θ and the dynamics network f_ϕ
f_ϕ = Pretrain_Dynamics_Network(f_ϕ, D, E)
for e = 0, …, E epochs do
  for i = 0, …, |D| do
    Sample d = (s_1^E, a_1^E), (s_2^E, a_2^E), …, (s_l^E, a_l^E) from D
    L_BC, d_conv = Behavior_Cloning(π_θ, d)
    L_SC = State_Cloning(π_θ, f_ϕ, d, ρ)
    L_mix = Loss_Mixing(L_BC, L_SC, d_conv)
    Update the policy network parameters θ by gradient descent:
      θ ← θ − ∇_θ L_mix
  end for
end for
return πθ
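The control flow of Algorithm 1 can be sketched in Python. This is a toy illustration only: the linear policy, the mean-squared behavior-cloning loss, the learning rate, and all helper names are assumptions, not the paper's models. The State_Cloning and Loss_Mixing steps are indicated by comments, since their definitions are not given in this excerpt, so L_mix here reduces to L_BC.

```python
import numpy as np

# Hypothetical toy setting: a linear policy pi(s) = s @ theta stands in
# for the paper's policy network pi_theta.
STATE_DIM, ACTION_DIM = 4, 2

def behavior_cloning_loss(theta, states, actions):
    """L_BC: mean squared error between policy actions and expert actions
    (an assumed loss form; the paper's exact L_BC is not shown here)."""
    pred = states @ theta
    return np.mean((pred - actions) ** 2)

def bc_gradient(theta, states, actions):
    """Gradient of the MSE behavior-cloning loss w.r.t. theta."""
    pred = states @ theta
    return 2.0 * states.T @ (pred - actions) / len(states)

def hil(demos, rho, epochs=50, lr=0.1):
    """Sketch of Algorithm 1's double loop. `rho` mirrors the paper's
    state recovery probability but is unused here because the
    state-cloning branch is only stubbed."""
    theta = np.zeros((STATE_DIM, ACTION_DIM))
    for _ in range(epochs):                      # outer loop over E epochs
        for states, actions in demos:            # sample trajectory d from D
            grad = bc_gradient(theta, states, actions)   # from L_BC
            # State_Cloning (L_SC) would roll states through the pretrained
            # dynamics model f_phi with recovery probability rho, and
            # Loss_Mixing would then combine L_BC and L_SC via d_conv;
            # both are omitted in this self-contained toy sketch.
            theta = theta - lr * grad            # theta <- theta - grad L_mix
    return theta
```

Training on a few trajectories generated by a linear expert drives the behavior-cloning loss toward zero, which is enough to see the loop's shape without reproducing the paper's hybrid losses.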