Algorithm 2.
Incremental Model Learning for Stochastic Environment (ModelLearning)
(1) Calculate the variation of the continuous state and the reward using equation (3).
(2) Apply the normalization procedure to obtain the normalized variation vector, v.
(3) s ← ϕ(xt)
(4) Retrieve all clusters, C = {c1, c2, …, cl}, from the cell of the model, M(s, a), indexed by the state-action pair (s, a); |C| is the total number of clusters in this cell.
(5) if |C| == 0 then
(6)  Set the first variation vector as the center of the first cluster, i.e., c1 = v, c1 ∈ C
(7) else if |C| > 0 then
(8)  Compute the distance between v and the center of every cluster in C
(9)  if the minimum distance exceeds the similarity threshold then
(10)   Create a new cluster, cl+1 = v, cl+1 ∈ C
(11)  else
(12)   Activate the closest cluster and retrieve its stored statistics from this cluster
(13)   Calculate the new values of these statistics using equations (5)–(8)
(14)   Store the updated statistics back in the cluster
(15)  end if
(16) end if
(17) Update the variation transition function and the reward function, Rp(s′, s, a)
(18) Store these two functions in the model, M(s, a)
(19) return M(s, a)
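The cluster-assignment core of the algorithm (steps 5–16) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Euclidean distance, the fixed `threshold`, and the incremental-mean update are assumptions standing in for the paper's similarity measure and equations (5)–(8), and the `Cluster` and `model_learning_step` names are hypothetical.

```python
import math


class Cluster:
    """Hypothetical cluster record; the paper's clusters carry more statistics."""

    def __init__(self, v):
        self.center = list(v)  # center initialized to the first variation vector
        self.count = 1         # number of variation vectors assigned so far

    def update(self, v):
        # Incremental mean of member vectors: a stand-in for equations (5)-(8).
        self.count += 1
        for i, x in enumerate(v):
            self.center[i] += (x - self.center[i]) / self.count


def euclidean(a, b):
    # Assumed distance measure between a variation vector and a cluster center.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def model_learning_step(clusters, v, threshold=0.5):
    """Assign normalized variation vector v to a cluster in one model cell.

    Mirrors steps 5-16: create the first cluster if the cell is empty,
    open a new cluster if v is far from every center, otherwise update
    the closest cluster's statistics.
    """
    if not clusters:                                  # steps 5-6
        clusters.append(Cluster(v))
        return clusters[0]
    nearest = min(clusters, key=lambda c: euclidean(c.center, v))  # step 8
    if euclidean(nearest.center, v) > threshold:      # steps 9-10
        clusters.append(Cluster(v))
        return clusters[-1]
    nearest.update(v)                                 # steps 12-14
    return nearest
```

For example, feeding two nearby vectors into an empty cell updates a single cluster's center, while a distant third vector opens a second cluster:

```python
clusters = []
model_learning_step(clusters, [0.0, 0.0])
model_learning_step(clusters, [0.1, 0.0])  # within threshold: updates cluster 1
model_learning_step(clusters, [1.0, 1.0])  # beyond threshold: new cluster
```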