| Algorithm 2: Proposed algorithm for Q-learning model. |
| • Input: Learning rate α ∈ [0, 1] |
| • Input: Exploration parameter ε > 0 |
| • Initialization: |
| ο Initialize Q(s, a) arbitrarily for all s, a, except Q(terminal, ·) = 0 |
| ο For each episode: |
| ο Get initial state S |
| ο For each step: Choose action A from S using behavioral policy |
| • Take action A, observe reward R, next state S' |
| • Q(S, A) ← Q(S, A) + α[R + γ max_{a′} Q(S′, a′) − Q(S, A)] |
| • S ← S′ |
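
The update loop in Algorithm 2 can be illustrated with a minimal tabular sketch. It is not the authors' implementation: the ε-greedy rule standing in for the behavioral policy, the default values of α, γ, and ε, the `q_learning` and `chain_step` names, and the 5-state chain environment are all assumptions made only for illustration.

```python
import random


def q_learning(step, n_states, n_actions, terminal, start_state,
               alpha=0.1, gamma=0.9, epsilon=0.1, episodes=500, seed=0):
    """Tabular Q-learning; `step(s, a)` must return (reward, next_state)."""
    rng = random.Random(seed)
    # Initialize Q(s, a) to zero everywhere; the terminal row is never updated.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = start_state
        while s != terminal:
            # Assumed behavior policy: epsilon-greedy with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[s])
                a = rng.choice([i for i, q in enumerate(Q[s]) if q == best])
            r, s_next = step(s, a)
            # Q(S,A) <- Q(S,A) + alpha * [R + gamma * max_a' Q(S',a') - Q(S,A)]
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q


def chain_step(s, a):
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left.

    Entering state 4 (terminal) yields reward 1; every other move yields 0.
    """
    s_next = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return (1.0 if s_next == 4 else 0.0), s_next


Q = q_learning(chain_step, n_states=5, n_actions=2, terminal=4, start_state=0)
# Greedy action in each non-terminal state after training.
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
```

In this toy chain the greedy policy should settle on moving right in every non-terminal state, since that is the shortest path to the only reward. A real application would replace `chain_step` with the problem's own state transitions and reward signal.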