Deep Reinforcement Learning for the Detection of Abnormal Data in Smart Meters

2022 Nov 6;22(21):8543. doi: 10.3390/s22218543

Algorithm 1 Q-learning algorithm.

1: repeat
2:   for each data item in each mini-batch sample do
3:     using a greedy strategy, choose action $u_{t}$, receive reward $r_{t+1}$, and reach the new state $x_{t+1}$
4:     $Q(x_{t}, u_{t}) \leftarrow Q(x_{t}, u_{t}) + \alpha \left[ r_{t+1} + \gamma \max_{u_{t+1}} Q(x_{t+1}, u_{t+1}) - Q(x_{t}, u_{t}) \right]$
5:     $x_{t} \leftarrow x_{t+1}$
6: until all $Q(x, u)$ values have converged
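
The update rule in Algorithm 1 can be illustrated with a minimal tabular sketch in Python. Everything specific here is an assumption made for illustration and is not taken from the paper: the five-state chain environment, the `step` and `choose_action` helpers, the hyperparameter values, and the convergence test (largest Q-value change within an episode below `tol`). The sketch also uses an epsilon-greedy rule rather than a purely greedy one, so that the agent keeps exploring; setting `epsilon=0` gives the purely greedy choice.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (an assumption): reward 1 on reaching the last state."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice; ties between equal Q-values are broken at random."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(alpha=0.5, gamma=0.9, epsilon=0.1, tol=1e-6,
               max_episodes=5000, seed=0):
    rng = random.Random(seed)
    # Q-table indexed as q[state][action], initialised to zero.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(max_episodes):
        state, done, max_change = 0, False, 0.0
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Target r + gamma * max_u Q(x', u); no bootstrap from a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            delta = alpha * (target - q[state][action])
            q[state][action] += delta
            max_change = max(max_change, abs(delta))
            state = next_state  # x_t <- x_{t+1}
        if max_change < tol:  # crude convergence check (an assumption)
            break
    return q


if __name__ == "__main__":
    for s, row in enumerate(q_learning()):
        print(s, [round(v, 3) for v in row])
```

In this toy chain the learned values for moving right approach $\gamma^{3-x}$ for states $x = 0, \dots, 3$, so the greedy policy derived from the table moves right in every non-terminal state.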