2022 Sep 15;22(18):6992. doi: 10.3390/s22186992
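Algorithm 1: Q-learning extended Kalman filter
1: $\hat{x}_{P,0}^{B} = \hat{x}_{P,0}^{E} = \hat{x}_{P,0}$, $P_{P,0}^{B} = P_{P,0}^{E} = P_{P,0}$    ▷ Initialization
2: $k \leftarrow 0$
3: $\Theta \leftarrow 0$
4: for each period do
5:     for all $a \in A$ do
6:         $C^{B} \leftarrow 0$, $C^{E} \leftarrow 0$
7:         for $t = 1, 2, \ldots, T$ do
8:             $k \leftarrow k + 1$
9:             $[\hat{x}_{P,k}^{B}, P_{P,k}^{B}, \tilde{y}_{P,k}^{B}] \leftarrow \mathrm{EKF}(\hat{x}_{P,k-1}^{B}, P_{P,k-1}^{B}, y_{P,k}, Q_{P,k}, R_{P,k})$    ▷ Benchmark filter
10:            $[\hat{x}_{P,k}^{E}, P_{P,k}^{E}, \tilde{y}_{P,k}^{E}] \leftarrow \mathrm{EKF}(\hat{x}_{P,k-1}^{E}, P_{P,k-1}^{E}, y_{P,k}, \hat{Q}_{P,k}(s,a), R_{P,k})$    ▷ Exploring filter
11:            $[\hat{x}_{P,k}, P_{P,k}, \tilde{y}_{P,k}] \leftarrow \mathrm{EKF}(\hat{x}_{P,k-1}, P_{P,k-1}, y_{P,k}, \hat{Q}_{P,k}(s,a_{\max}), R_{P,k})$    ▷ Main filter
12:            $C^{B} \leftarrow C^{B} + \frac{1}{T-1}\left[(\tilde{y}_{k}^{B})^{\mathrm{T}} R_{k}^{-1} \tilde{y}_{k}^{B} - C^{B}\right]$
13:            $C^{E} \leftarrow C^{E} + \frac{1}{T-1}\left[(\tilde{y}_{k}^{E})^{\mathrm{T}} R_{k}^{-1} \tilde{y}_{k}^{E} - C^{E}\right]$
14:        end for
15:        $R(s,a) \leftarrow C^{B} - C^{E}$    ▷ Calculation of reward
16:        $\Theta \leftarrow \Theta + \alpha\left[R(s,a) + \gamma \max_{a}\left(\Phi^{\mathrm{T}}(s,a)\Theta\right) - \Phi^{\mathrm{T}}(s,a)\Theta\right]\Phi(s,a)$    ▷ Update of weight
17:        $\hat{x}_{P,k}^{E} \leftarrow \hat{x}_{P,k}^{B}$, $P_{P,k}^{E} \leftarrow P_{P,k}^{B}$    ▷ Resetting of exploring filter
18:    end for
19:    $a_{\max} \leftarrow \arg\max_{a}\left(\Phi^{\mathrm{T}}(s,a)\Theta\right)$    ▷ Selection of the best action
20: end for
21: return $\{\hat{x}_{P,k}\}$ and $\{P_{P,k}\}$    ▷ Result of state estimation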
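The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: a scalar toy model (random-walk state, measurement h(x) = x + 0.1 sin x), a single Q-learning state s, one-hot features Φ(s,a) so that Φᵀ(s,a)Θ reduces to `theta[a]`, a discrete list `q_candidates` playing the role of Q̂(s,a), an initial `a_max = 0`, and one "period" taken as one pass over all actions. The names `ekf_step` and `q_learning_ekf` are hypothetical. The cost gain 1/(T−1) is taken as it reads in the listing.

```python
import math


def ekf_step(x, P, y, Q, R):
    """One scalar EKF step for the assumed toy model:
    x_k = x_{k-1} + w_k,  y_k = x_k + 0.1*sin(x_k) + v_k."""
    x_pred = x                            # f(x) = x, Jacobian F = 1
    P_pred = P + Q
    H = 1.0 + 0.1 * math.cos(x_pred)      # Jacobian of h at the prediction
    innov = y - (x_pred + 0.1 * math.sin(x_pred))
    S = H * P_pred * H + R                # innovation variance
    K = P_pred * H / S                    # Kalman gain
    return x_pred + K * innov, (1.0 - K * H) * P_pred, innov


def q_learning_ekf(ys, q_nominal, q_candidates, R, T,
                   alpha=0.1, gamma=0.9, x0=0.0, P0=1.0):
    """Benchmark, exploring and main filters run in parallel (steps 9-11)."""
    if T < 2:
        raise ValueError("T must be at least 2 for the 1/(T-1) gain")
    n_actions = len(q_candidates)
    xB = xE = xM = x0                     # step 1: common initialization
    PB = PE = PM = P0
    theta = [0.0] * n_actions             # step 3: weights (one per action)
    a_max = 0                             # assumption: start from first action
    estimates = []
    k = 0
    # Assumption: a period is one sweep over all actions, T samples each.
    while k + n_actions * T <= len(ys):
        for a in range(n_actions):
            CB = CE = 0.0                 # step 6
            for _ in range(T):
                y = ys[k]
                k += 1
                xB, PB, innB = ekf_step(xB, PB, y, q_nominal, R)
                xE, PE, innE = ekf_step(xE, PE, y, q_candidates[a], R)
                xM, PM, _ = ekf_step(xM, PM, y, q_candidates[a_max], R)
                estimates.append((xM, PM))
                # steps 12-13: running cost of the normalized innovation
                CB += (innB * innB / R - CB) / (T - 1)
                CE += (innE * innE / R - CE) / (T - 1)
            reward = CB - CE              # step 15
            # step 16 with one-hot features: only theta[a] changes
            theta[a] += alpha * (reward + gamma * max(theta) - theta[a])
            xE, PE = xB, PB               # step 17: reset exploring filter
        a_max = max(range(n_actions), key=lambda i: theta[i])  # step 19
    return estimates, theta, a_max
```

As a usage example, calling `q_learning_ekf(ys, 1e-4, [1e-4, 1.0], R=0.1, T=20)` on a noise-free ramp `ys` makes the benchmark filter lag behind the measurements, so the larger candidate earns a positive reward and is selected as `a_max`. With one-hot features the weight vector is simply a table of action values; a richer Φ(s,a) would need the full vector update from step 16.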