Entropy. 2022 Mar 25;24(4):455. doi: 10.3390/e24040455
Algorithm 1 Training procedure in the reward learning process
Input: number of full episodes K, timesteps T, fixed parameters θ_old, target firing rate f_0, regularization hyper-parameters µ_v, µ_e, µ_firing, bandwidth σ, predicted value function V_θ(t,k), and sum of future rewards R(t,k)
Output: total loss L_θ.
1.  Parameters setting: f_0, µ_v, µ_e, µ_firing and σ.
2.  for n in batch size N:
3.      Set e_n = R(t,k) − V_θ(t,k)
4.      if number of iterations is 0:
5.          (φ_0, φ_−1, φ_1) = (N, 0, 0)
6.      else:
            (φ_0, φ_−1, φ_1) = (#{e_n ∈ (−0.5, 0.5)}, #{e_n ∈ (−1, −0.5)}, #{e_n ∈ (0.5, 1)}),
            where #{·} indicates counting the samples that satisfy the condition
7.      (u_n, v_n, s_n) = (exp(−e_n²/(2σ²)), exp(−(e_n+1)²/(2σ²)), exp(−(e_n−1)²/(2σ²)))
8.      L_n^RMEE = φ_0 · u_n · e_n² + φ_−1 · v_n · (e_n+1)² + φ_1 · s_n · (e_n−1)²
9.  end for
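Steps 2–9 above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: the function name, the final mean reduction over the batch, and the strict-inequality handling of the counting intervals are assumptions not fixed by the listing.

```python
import numpy as np

def rmee_loss(returns, values, sigma=0.5, first_iter=False):
    """Sketch of the per-batch RMEE loss (steps 2-9 of Algorithm 1).

    returns, values: arrays holding R(t,k) and V_theta(t,k) for N samples.
    The weights phi count how many errors fall near each kernel centre
    (0, -1, +1); on the first iteration all weight goes to the centre 0.
    """
    e = returns - values                      # step 3: error e_n
    N = e.shape[0]
    if first_iter:                            # steps 4-5: no counts yet
        phi0, phi_m1, phi_p1 = N, 0, 0
    else:                                     # step 6: count errors per interval
        phi0 = np.sum((e > -0.5) & (e < 0.5))
        phi_m1 = np.sum((e > -1.0) & (e < -0.5))
        phi_p1 = np.sum((e > 0.5) & (e < 1.0))
    # step 7: Gaussian kernels centred at 0, -1 and +1
    u = np.exp(-e**2 / (2 * sigma**2))
    v = np.exp(-(e + 1)**2 / (2 * sigma**2))
    s = np.exp(-(e - 1)**2 / (2 * sigma**2))
    # step 8: count-weighted, kernel-scaled squared errors
    loss_n = phi0 * u * e**2 + phi_m1 * v * (e + 1)**2 + phi_p1 * s * (e - 1)**2
    return loss_n.mean()
```

The kernels u_n, v_n and s_n down-weight samples far from their centre, which is what gives the minimum-error-entropy criterion its robustness to outlier errors compared with a plain squared loss.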

10.  for k in K:
11.      for t in T:
             L(t,k)^PPO = O^PPO(θ_old, θ, t, k)
12.      end for
13.  end for
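The listing does not spell out the objective O^PPO(θ_old, θ, t, k); assuming it is the standard clipped PPO surrogate (a guess, since PPO admits other variants), one per-step evaluation could look like:

```python
import numpy as np

def ppo_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Standard clipped PPO surrogate, assumed here for O^PPO(theta_old, theta, t, k).

    logp_new, logp_old: log-probabilities of the taken actions under the
    current and the fixed (old) policy parameters.
    """
    ratio = np.exp(logp_new - logp_old)            # pi_theta / pi_theta_old
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # PPO maximises the surrogate; return its negative as a loss term
    return -np.minimum(ratio * advantage, clipped * advantage).mean()
```

With θ = θ_old the ratio is 1, so the clipping is inactive and the loss reduces to the negative mean advantage.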

14.  Calculate the total loss:
         L(e) = L_p(e) + J_k(e)
15.  return L(e)
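Steps 14–15 combine the two training signals into one scalar. The listing does not define L_p and J_k, so the sketch below simply treats them as the accumulated value-side (RMEE) term and policy-side (PPO) term; that mapping, and the plain unweighted sum, are assumptions.

```python
def total_loss(value_term, policy_terms):
    """Sketch of steps 14-15: L(e) = L_p(e) + J_k(e), read here as a
    value-fitting part plus the sum of per-step policy losses.
    The identification of L_p and J_k with these terms is an assumption."""
    return value_term + sum(policy_terms)
```

In practice this scalar would be backpropagated through θ each update while θ_old stays fixed until the next rollout.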