2021 Mar 11;21(6):1960. doi: 10.3390/s21061960
Algorithm 1 DQN Algorithm
 1: Create a replay buffer B
 2: Randomly initialize the local network weights θ
 3: Initialize the target network weights θ_t ← θ
 4: for each episode e do
 5:     Initialize a random state s
 6:     for each transition t do
 7:         Select action a according to the ϵ-greedy policy
 8:         Observe the next state s′ and reward r
 9:         Store the experience in buffer B
10:         Sample batch-size random experiences from buffer B
11:         Calculate Q(s, a) using the local network and the batch of experiences
12:         Calculate Q(s′, a′) using the target network and the batch of next states
13:         Calculate the loss function using Q(s, a) and Q(s′, a′)
14:         if t % updating-interval == 0 then
15:             θ_t ← θ
16:         end if
17:     end for
18: end for
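The loop in Algorithm 1 can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation, and several pieces are assumptions added for illustration: the toy chain environment (`step`), the linear one-hot Q-function standing in for the deep network (`q_values`), the hyperparameter values, the squared temporal-difference loss with discount factor `GAMMA`, the gradient step on the local weights, the terminal-state handling, and the `state = next_state` update that the listing leaves implicit. All function and constant names are hypothetical.

```python
import random
from collections import deque

# Hypothetical toy environment (not from the source): a chain of N_STATES cells.
N_STATES, N_ACTIONS = 5, 2            # actions: 0 = left, 1 = right
GAMMA, LR, EPSILON = 0.9, 0.1, 0.2    # assumed hyperparameters
BATCH_SIZE, UPDATE_INTERVAL = 8, 20   # assumed batch size and sync interval


def step(state, action):
    """Move along the chain; reaching the last cell pays 1 and ends the episode."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_values(weights, state):
    """Linear stand-in for the network: one weight per (action, one-hot state)."""
    return [weights[a][state] for a in range(N_ACTIONS)]


def train(episodes=500, max_transitions=50, seed=0):
    rng = random.Random(seed)
    buffer = deque(maxlen=1000)                         # replay buffer B
    theta = [[rng.uniform(-0.01, 0.01) for _ in range(N_STATES)]
             for _ in range(N_ACTIONS)]                 # local weights θ
    theta_t = [row[:] for row in theta]                 # θ_t ← θ
    for _ in range(episodes):
        state = rng.randrange(N_STATES - 1)             # random non-terminal start
        for t in range(max_transitions):
            # ϵ-greedy action selection using the local weights
            if rng.random() < EPSILON:
                action = rng.randrange(N_ACTIONS)
            else:
                q = q_values(theta, state)
                action = q.index(max(q))
            next_state, reward, done = step(state, action)
            buffer.append((state, action, reward, next_state, done))
            # Sample a batch (smaller while the buffer is still filling up)
            batch = rng.sample(list(buffer), min(BATCH_SIZE, len(buffer)))
            for s, a, r, s2, d in batch:
                # Target from the target weights; no bootstrap on terminal steps
                target = r if d else r + GAMMA * max(q_values(theta_t, s2))
                td_error = target - theta[a][s]
                # Per-sample gradient step on 0.5 * td_error**2 (assumed loss)
                theta[a][s] += LR * td_error
            # t restarts at 0 in every episode, as in the listing
            if t % UPDATE_INTERVAL == 0:
                theta_t = [row[:] for row in theta]     # θ_t ← θ
            state = next_state
            if done:
                break
    return theta
```

Calling `train()` returns the learned weights, indexed as `theta[action][state]`. On this toy chain the greedy action should come out as "right" in every non-terminal state. Keeping a separate copy `theta_t` that is refreshed only every `UPDATE_INTERVAL` transitions holds the bootstrap targets fixed between syncs, which is the role the target network plays in the listing.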