2022 Nov 6;22(21):8543. doi: 10.3390/s22218543
Algorithm 1 Q-learning algorithm.
  1: repeat
  2:     for each data item in each mini-batch sample
  3:         using a greedy strategy, choose action $u_t$, get reward $r_{t+1}$, and reach a new state $x_{t+1}$
  4:         $Q(x_t, u_t) \leftarrow Q(x_t, u_t) + \alpha \left[ r_{t+1} + \gamma \max_{u_{t+1}} Q(x_{t+1}, u_{t+1}) - Q(x_t, u_t) \right]$
  5:         $x_t \leftarrow x_{t+1}$
  6: until all $Q(x, u)$ reach a state of convergence
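
The steps of Algorithm 1 can be sketched in tabular form as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the five-state chain environment, the `step` and `q_learning` functions, and all hyperparameter values are hypothetical. The sketch also iterates over episodes rather than mini-batch samples, and it swaps the purely greedy choice for an epsilon-greedy one so that every state-action pair keeps being visited.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(x, u):
    """Toy deterministic environment (an assumption): return (reward, next state)."""
    x_next = max(0, x - 1) if u == 0 else x + 1
    reward = 1.0 if x_next == N_STATES - 1 else 0.0
    return reward, x_next


def q_learning(alpha=0.5, gamma=0.9, epsilon=0.2, tol=1e-6,
               max_episodes=5000, seed=0):
    """Tabular Q-learning; comments refer to the steps of Algorithm 1."""
    rng = random.Random(seed)
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

    for _ in range(max_episodes):                 # step 1: repeat
        x = 0
        max_change = 0.0
        while x != N_STATES - 1:                  # step 2: loop over samples
            # Step 3: epsilon-greedy action choice (assumption: the source
            # says "greedy"; exploration is added here), ties broken at random.
            if rng.random() < epsilon:
                u = rng.choice(ACTIONS)
            else:
                best = max(Q[x])
                u = rng.choice([a for a in ACTIONS if Q[x][a] == best])
            r, x_next = step(x, u)

            # Step 4: Q(x,u) <- Q(x,u) + alpha * [r + gamma * max Q(x',.) - Q(x,u)]
            delta = alpha * (r + gamma * max(Q[x_next]) - Q[x][u])
            Q[x][u] += delta
            max_change = max(max_change, abs(delta))

            x = x_next                            # step 5: x_t <- x_{t+1}

        # Step 6: stop once no update in a full episode exceeds the tolerance.
        if max_change < tol:
            break
    return Q


if __name__ == "__main__":
    Q = q_learning()
    policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
    print("Greedy action per non-terminal state:", policy)
```

The convergence test here is a practical proxy: it only checks the state-action pairs visited in the latest episode, so a stricter criterion would compare the full table between sweeps.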