| Algorithm 1. Learning process of sensor vi | |
| 1 | Initialise Q value for each available action arbitrarily; |
| 2 | for k = 0 to a predefined integer do; |
| 3 | calculate π; |
| 4 | for each available action a ∈ Ai do; |
| 5 | Qk+1(a) = Qk(a) + π(a)α1(∑a (a)π(a) − Qk(a)); |
| 6 | end for |
| 7 | end for |
| 8 | aopti ← argMax(Q); |
| 9 | vi takes the action aopti; |
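The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The listing does not say how π is calculated, so a Boltzmann (softmax) distribution over the current Q values is assumed here. The symbol inside the sum in step 5 is missing from the listing, so a per-action reward table `reward` is used as a hypothetical stand-in. The names `softmax_policy`, `update_q`, `learn`, `temperature` and `iterations` are likewise illustrative.

```python
import math
import random


def softmax_policy(q, temperature=1.0):
    """ASSUMPTION: pi is a Boltzmann distribution over the current Q values."""
    m = max(q.values())  # subtract the maximum for numerical stability
    weights = {a: math.exp((v - m) / temperature) for a, v in q.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def update_q(q, pi, reward, alpha1):
    """One sweep of steps 4-6: Q(a) += pi(a) * alpha1 * (sum - Q(a)).

    ASSUMPTION: the sum is read as sum_a reward(a) * pi(a); `reward` is a
    hypothetical stand-in for the symbol missing from the listing.
    """
    expected = sum(reward[a] * pi[a] for a in q)
    return {a: q[a] + pi[a] * alpha1 * (expected - q[a]) for a in q}


def learn(actions, reward, iterations=100, alpha1=0.1, seed=0):
    """Steps 1-8: initialise Q, iterate the update, return the argmax action."""
    rng = random.Random(seed)
    q = {a: rng.random() for a in actions}   # step 1: arbitrary initial values
    for _ in range(iterations):              # step 2
        pi = softmax_policy(q)               # step 3
        q = update_q(q, pi, reward, alpha1)  # steps 4-6
    best = max(q, key=q.get)                 # step 8: argMax(Q)
    return best, q


if __name__ == "__main__":
    rewards = {"sleep": 0.2, "listen": 0.5, "transmit": 0.9}  # made-up values
    best_action, q_values = learn(list(rewards), rewards)
    print(best_action, q_values)
```

Under this reading the target inside the parentheses is the same for every action, so the actions differ only in how fast their Q values move, which is governed by π(a). If the missing symbol in the original depends on the individual action in some other way, `update_q` would need to change accordingly.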