
Algorithm 2.

Incremental Model Learning for Stochastic Environment (ModelLearning)

(1) Calculate the variation of the continuous state and the reward using equation (3)
(2) A normalization procedure is employed to calculate the normalized variation vector, $v = (\Delta x_N^a, r_N^a)$
(3) $s \leftarrow \phi(x_t)$
(4) Retrieve all clusters, $C = \{c_1, c_2, \ldots, c_l\}$, from the cell of the model, $M(s, a)$, using the state-action pair, $(s, a)$; $|C|$ is the total number of clusters in this cell.
(5) if |C| == 0 then
(6) The first variation vector is set as the center of the first cluster, i.e., $c_1 = v$, $c_1 \in C$
(7) else if |C| > 0 then
(8) Find the nearest cluster, $c_{\underline{l}} \leftarrow \arg\min_{c_l \in C} D(c_l, v)$, and its distance to $v$, $d_{c_{\underline{l}}} = D(c_{\underline{l}}, v)$
(9) if $d_{c_{\underline{l}}} > D_{th}^{a}$ then
(10) Create a new cluster, $c_{l+1} = v$, $c_{l+1} \in C$
(11) else
(12) Activate the cluster, $c_{\underline{l}}$, and retrieve the information stored in this cluster: the mean variation $\overline{\Delta x}^{\,a}_{N_{c_{\underline{l}}}}$ and its variance, and the mean reward $\bar{r}^{\,a}_{N_{c_{\underline{l}}}}$ and its variance
(13) Calculate the new mean variation $\overline{\Delta x}^{\,a}_{N_{c_{\underline{l}}}+1}$ and its variance, and the new mean reward $\bar{r}^{\,a}_{N_{c_{\underline{l}}}+1}$ and its variance, using equations (5)-(8).
(14) $N_{c_{\underline{l}}} \leftarrow N_{c_{\underline{l}}} + 1$
(15) end if
(16) end if
(17) Update the variation transition function, $T_p(\overline{\Delta x}^{\,a}_{N_{c_{\underline{l}}}+1} \mid s, a)$, and the reward function, $R_p(s', s, a)$:
$T_p(\overline{\Delta x}^{\,a}_{N_{c_{\underline{l}}}+1} \mid s, a) \leftarrow \overline{\Delta x}^{\,a}_{N_{c_{\underline{l}}}+1}, \qquad R_p(s', s, a) \leftarrow \bar{r}^{\,a}_{N_{c_{\underline{l}}}+1}$
(18) Store these two functions in the cell of the model, $M(s, a)$:
$M(s, a) \leftarrow \big(T_p(\overline{\Delta x}^{\,a}_{N_{c_{\underline{l}}}+1} \mid s, a),\; R_p(s', s, a)\big)$
(19) return M(s, a)
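The Python sketch below illustrates one way the flow of Algorithm 2 could be realized; it is not the authors' implementation. Equations (3) and (5)-(8), the normalization procedure, the state mapping $\phi$, the distance function $D$, and the threshold $D_{th}^{a}$ are not reproduced in this excerpt, so the sketch substitutes a unit-norm normalization, a Euclidean distance, a fixed scalar threshold, and standard Welford-style incremental mean/variance updates. The names IncrementalModel, ClusterStats, phi, and d_th are illustrative, not from the paper.

```python
import numpy as np


class ClusterStats:
    """Running statistics of one cluster of variation vectors v = (Δx, r).

    The incremental mean/variance recursions below are standard Welford-style
    updates used as stand-ins for equations (5)-(8), which this excerpt omits.
    """

    def __init__(self, v):
        self.center = np.asarray(v, dtype=float)  # cluster center (first vector);
                                                  # the excerpt does not say whether
                                                  # the center tracks the running mean,
                                                  # so here it stays fixed
        self.n = 1                                # N_{c_l}: vectors absorbed so far
        self.mean = self.center.copy()            # running mean of (Δx, r)
        self.m2 = np.zeros_like(self.center)      # running sum of squared deviations

    def update(self, v):
        """Absorb a new variation vector and refresh mean/variance (steps 12-14)."""
        v = np.asarray(v, dtype=float)
        self.n += 1
        delta = v - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (v - self.mean)

    @property
    def variance(self):
        return self.m2 / self.n


class IncrementalModel:
    """Sketch of Algorithm 2: cluster-based incremental model learning.

    Assumed here (not specified in the excerpt): Euclidean distance D, a single
    scalar threshold d_th in place of D_th^a, unit-norm normalization, and a
    user-supplied discretizer phi mapping a continuous state to a hashable key.
    """

    def __init__(self, phi, d_th=0.5):
        self.phi = phi      # maps continuous state x to discrete state s
        self.d_th = d_th    # distance threshold for spawning a new cluster
        self.model = {}     # M(s, a) -> list of ClusterStats

    def learn(self, x_t, a, x_next, r):
        # Step 1: variation of the continuous state and the reward (stand-in for eq. (3)).
        v = np.append(np.asarray(x_next, dtype=float) - np.asarray(x_t, dtype=float), r)
        # Step 2: normalization (placeholder; the paper's procedure is not given here).
        v = v / (np.linalg.norm(v) + 1e-8)
        # Step 3: discretize the continuous state.
        s = self.phi(x_t)
        # Step 4: retrieve the clusters stored in the cell M(s, a).
        clusters = self.model.setdefault((s, a), [])

        if not clusters:
            # Steps 5-6: the first vector becomes the center of the first cluster.
            clusters.append(ClusterStats(v))
        else:
            # Steps 8-14: activate the nearest cluster or create a new one.
            dists = [np.linalg.norm(c.center - v) for c in clusters]
            nearest = int(np.argmin(dists))
            if dists[nearest] > self.d_th:
                clusters.append(ClusterStats(v))   # step 10: new cluster
            else:
                clusters[nearest].update(v)        # steps 12-14: incremental update

        # Steps 17-18: the cell now holds the per-cluster transition and reward
        # statistics (mean variation and mean reward), playing the role of T_p and R_p.
        return self.model[(s, a)]                  # step 19: return M(s, a)
```

As an illustrative usage, phi could be a coarse grid discretizer such as `lambda x: tuple(np.floor(10 * np.asarray(x)).astype(int))`, after which repeated calls to `learn(x_t, a, x_next, r)` populate each $M(s, a)$ cell one transition at a time. Keeping several clusters per cell, rather than a single running average, presumably lets the model represent multi-modal outcomes of a stochastic environment without storing every observed sample.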