2024 Nov 6;13:103037. doi: 10.1016/j.mex.2024.103037
Algorithm 2: Proposed algorithm for Q-learning model.

  • Input: Learning rate α ∈ [0, 1]
  • Output: Exploration parameter ε > 0
  • Initialization:
   ο Initialize Q(s,a) arbitrarily for all s, a, except Q(terminal, ·) = 0
   ο For each episode:
   ο Get state S
   ο For each step:
    • Choose action A from S using the behavior policy (e.g., ε-greedy)
    • Take action A, observe reward R, next state S'
    • Q(S,A) ← Q(S,A) + α[R + γ max_{a'} Q(S',a') − Q(S,A)]
    • S ← S'
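
The loop above can be sketched in Python as a minimal tabular example. This is an illustration under stated assumptions, not the authors' implementation: the five-state chain environment, its reward of 1 on reaching the terminal state, the discount factor γ = 0.9, and the names `q_learning`, `step`, and `choose_action` are all hypothetical and chosen only to make the update rule runnable.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (illustration only).

    States are 0..n_states-1 and the last state is terminal.
    Action 0 moves left, action 1 moves right.
    Entering the terminal state yields reward 1, every other move 0.
    """
    rng = random.Random(seed)
    n_actions = 2
    terminal = n_states - 1
    # Initialize Q(s, a); all zeros, so Q(terminal, .) = 0 holds.
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def step(s, a):
        # Hypothetical environment dynamics: deterministic left/right move.
        s_next = max(0, s - 1) if a == 0 else min(terminal, s + 1)
        reward = 1.0 if s_next == terminal else 0.0
        return reward, s_next

    def choose_action(s):
        # Epsilon-greedy behavior policy; ties are broken at random.
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        best = max(Q[s])
        return rng.choice([a for a in range(n_actions) if Q[s][a] == best])

    for _ in range(episodes):
        s = 0  # get the initial state S
        while s != terminal:
            a = choose_action(s)
            r, s_next = step(s, a)
            # Q(S,A) <- Q(S,A) + alpha * [R + gamma * max_a' Q(S',a') - Q(S,A)]
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next  # S <- S'
    return Q


if __name__ == "__main__":
    for state, row in enumerate(q_learning()):
        print(state, [round(v, 3) for v in row])
```

Because the terminal row is never updated, it stays at zero, which matches the initialization condition in the algorithm. Replacing `step` with a real environment's transition and reward function leaves the update line unchanged.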