Sensors. 2022 Sep 14;22(18):6959. doi: 10.3390/s22186959
Algorithm 2 The proposed adaptation algorithm.
 1: Input:
 2:   {τ_T^1, τ_T^2, …}: a set of expert demonstrations on the target task
 3:   {τ_S^1, τ_S^2, …}: a set of expert demonstrations on the source task
 4: Randomly initialize the task embedding network E, the generator G, and the discriminator D
 5: for k = 0, 1, 2, … do
 6:   Sample an expert demonstration τ_T^i on the target task
 7:   Sample an expert demonstration τ_S^i on the source task
 8:   Sample state–action pairs (ŝ_S^t, â_S^t) ∈ τ_S^i and (ŝ_T^t, â_T^t) ∈ τ_T^i
 9:   n ← uniform random number in [0, 1]
10:   if n < λ then            ▹ Review the source task's learned knowledge
11:     Compute z_S^t = E(ŝ_S^t)
12:     Compute z_T^t = stopgrad(E(ŝ_T^t))
13:     Generate the action a_S^t = G(z_S^t)
14:     Compute the loss L = L_E(z_S^t, z_T^t) + L_GD(â_S^t, a_S^t)
15:   else                     ▹ Learn the target task
16:     Compute z_T^t = E(ŝ_T^t)
17:     Compute z_S^t = stopgrad(E(ŝ_S^t))
18:     Generate the action a_T^t = G(z_T^t)
19:     Compute the loss L = L_E(z_T^t, z_S^t) + L_GD(â_T^t, a_T^t)
20:   end if
21:   Update the parameters of E, G, and D
22:   Update the policy π_S with the reward signal r = log D(â_S^t)
23: end for
24: Output:
25:   π_ST: the learned policy for both the source and target tasks
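The branching structure of Algorithm 2 can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: the tiny linear maps standing in for E, G, and D, the squared-error placeholders for L_E and L_GD (the paper uses embedding-alignment and GAN-style losses), and the fixed λ = 0.5 are all assumptions made for brevity; `stopgrad` merely marks where an autodiff framework would detach the tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
LAMBDA = 0.5  # review probability λ (hyperparameter; value assumed here)

# Tiny linear stand-ins for the embedding network E, generator G, discriminator D.
W_E = rng.normal(size=(4, 3))   # state (3-dim) -> task embedding (4-dim)
W_G = rng.normal(size=(2, 4))   # embedding -> action (2-dim)
W_D = rng.normal(size=(2,))     # action -> discriminator score

def E(s):  return W_E @ s
def G(z):  return W_G @ z
def D(a):  return 1.0 / (1.0 + np.exp(-(W_D @ a)))  # sigmoid score in (0, 1)

def stopgrad(z):
    # In an autodiff framework this would detach z from the graph;
    # here it only marks that no gradient flows through this branch.
    return z.copy()

def adaptation_step(s_S, a_S, s_T, a_T):
    """One iteration of the sampling branch in Algorithm 2 (illustrative)."""
    n = rng.uniform(0.0, 1.0)
    if n < LAMBDA:                      # review the source task's knowledge
        z_S = E(s_S)
        z_T = stopgrad(E(s_T))
        a_gen = G(z_S)
        # L = L_E(z_S, z_T) + L_GD(â_S, a_S); squared errors used as placeholders
        loss = np.sum((z_S - z_T) ** 2) + np.sum((a_S - a_gen) ** 2)
        branch = "source"
    else:                               # learn the target task
        z_T = E(s_T)
        z_S = stopgrad(E(s_S))
        a_gen = G(z_T)
        loss = np.sum((z_T - z_S) ** 2) + np.sum((a_T - a_gen) ** 2)
        branch = "target"
    reward = np.log(D(a_gen) + 1e-8)    # discriminator-based reward r = log D(·)
    return branch, loss, reward

# One sampled state-action pair per task, as in steps 6-8.
s_S, a_S = rng.normal(size=3), rng.normal(size=2)
s_T, a_T = rng.normal(size=3), rng.normal(size=2)
branch, loss, reward = adaptation_step(s_S, a_S, s_T, a_T)
print(branch, float(loss))
```

The key design point the sketch preserves is the asymmetric stop-gradient: in each branch, only the embedding of the task currently being trained receives gradients, while the other task's embedding acts as a fixed alignment target.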