Sensors. 2021 Mar 25;21(7):2302. doi: 10.3390/s21072302
Algorithm 1 Traffic Signal Control Using Parameterized Deep RL.
1:  Initialize: learning rates {lr_Q, lr_x}, exploration parameter ϵ, minibatch size B, a probability distribution ζ, flow configurations, and network weights ω_0 and θ_0.
2:  for episode e = 1, E do
3:      Start the simulation, observe the initial state s_0, and take the initial joint action a_0.
4:      for t = 1, T do
5:          Compute the action parameters dP_t ← x_dP(s_t; θ_t).
6:          Select the action a_t = (P_t, dP_t) according to the ϵ-greedy policy:
                a_t = { a sample from ζ,                                   with probability ϵ;
                        (P_t, dP_t) with P_t = argmax_P Q(s_t, P, dP_t; ω_t), with probability 1 − ϵ.
7:          Perform a_t, observe the next state s_{t+1}, and receive the reward R_t.
8:          Store <s_t, a_t, R_t, s_{t+1}> in the replay memory M.
9:          Sample a random minibatch of B experiences from M.
10:         Compute the target
                y_t = { R_t,                                                        if t = T;
                        R_t + γ max_P Q(s_{t+1}, P, x_dP(s_{t+1}; θ_t); ω_t),       otherwise.
11:         Compute the gradients ∇_{ω_t} Q(ω_t) and ∇_{θ_t} Q(θ_t) using {y_t, s_t, a_t}.
12:         Update the weights: ω_{t+1} ← ω_t − lr_Q ∇_{ω_t} Q(ω_t) and θ_{t+1} ← θ_t − lr_x ∇_{θ_t} Q(θ_t).
13:     end for
14: end for
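The action-selection and target-computation steps above can be sketched in code. This is a minimal illustration, not the paper's implementation: the two "networks" are stand-in linear maps, ζ is assumed uniform over phases, and the names (`N_PHASES`, `STATE_DIM`, `select_action`, `td_target`) and dimensions are assumptions chosen only to make the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PHASES = 4   # |P|: number of discrete signal phases (assumed)
STATE_DIM = 8  # state feature size (assumed)
GAMMA = 0.99   # discount factor γ

# θ parameterizes x_dP: maps a state to one duration parameter per phase.
theta = rng.normal(scale=0.1, size=(STATE_DIM, N_PHASES))
# ω parameterizes Q: scores every discrete phase from [state, dP].
omega = rng.normal(scale=0.1, size=(STATE_DIM + N_PHASES, N_PHASES))

def x_dP(s, theta):
    """Continuous action parameters dP_t = x_dP(s_t; θ_t) (step 5)."""
    return s @ theta

def Q(s, dP, omega):
    """Q(s, P, dP; ω) evaluated for all phases P at once."""
    return np.concatenate([s, dP]) @ omega

def select_action(s, theta, omega, eps):
    """ϵ-greedy selection over the parameterized action space (step 6)."""
    dP = x_dP(s, theta)
    if rng.random() < eps:
        P = int(rng.integers(N_PHASES))  # sample from ζ (uniform here)
    else:
        P = int(np.argmax(Q(s, dP, omega)))
    return P, dP

def td_target(R, s_next, theta, omega, terminal):
    """One-step target y_t (step 10)."""
    if terminal:
        return R
    dP_next = x_dP(s_next, theta)
    return R + GAMMA * np.max(Q(s_next, dP_next, omega))
```

With ϵ = 0 the selection is purely greedy: the discrete phase maximizes Q given the continuous parameters produced by x_dP, which is the defining coupling of the parameterized (hybrid discrete–continuous) action space; the gradient updates of steps 11–12 would then be applied to ω and θ with separate learning rates lr_Q and lr_x.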