A Trusted Routing Scheme Using Blockchain and Reinforcement Learning for Wireless Sensor Networks

2019 Feb 25;19(4):970. doi: 10.3390/s19040970

Algorithm 2 Reinforcement learning and blockchain based (RLBC) routing algorithm.

Input: Environment $E$; Action Space $A$; Initial State $x_{0}$; Reward Discount $\gamma$; Learning Rate $\alpha$;
Output: Policy $\pi$;
1: $Q_{t}(x,a) = 0$, $P(x,a) = \frac{1}{|A(x)|}$;
2: $x = x_{0}$;
3: for $T = 1, 2, \ldots$ do
4:     $a = \pi^{p}(x)$;
5:     $r$ = reward by routing action $a$;
6:     $x'$ = next state by routing action $a$;
7:     $a' = \pi(x')$;
8:     $Q_{t}(x,a) = Q_{t}(x,a) + \alpha \left( r + \gamma Q_{t}(x',a') - Q_{t}(x,a) \right)$;
9:     $\pi(x) = \arg\max_{a_{k}} q_{t}(x,a_{k}) \cdot Q_{t}(x,a_{k})$;
10:     $x = x'$;
11: end for
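
The loop in Algorithm 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-node topology, the per-link rewards, and the restart at the source after each delivery are hypothetical choices made only to keep the example runnable. Two further assumptions are labeled in the comments: the exploratory policy $\pi^{p}$ in step 4 is implemented here as an epsilon-greedy variant of $\pi$, and $q_{t}(x,a_{k})$, which this listing does not define, is treated as a given per-link weight in $[0, 1]$.

```python
import random

# Hypothetical toy topology: each action is the choice of a next-hop node.
ACTIONS = {"s": ["a", "b"], "a": ["sink"], "b": ["sink"], "sink": ["sink"]}
# Hypothetical per-link rewards; the paper's reward function is not reproduced here.
REWARD = {("s", "a"): 0.0, ("s", "b"): 0.5, ("a", "sink"): 1.0, ("b", "sink"): 1.0}


def td_update(q, x, a, r, x_next, a_next, alpha, gamma):
    """Step 8: move Q(x, a) toward r + gamma * Q(x', a')."""
    target = r + gamma * q[(x_next, a_next)]
    q[(x, a)] += alpha * (target - q[(x, a)])
    return q[(x, a)]


def greedy_action(q, weight, x):
    """Step 9: pick the action a_k that maximises weight(x, a_k) * Q(x, a_k)."""
    return max(ACTIONS[x], key=lambda a: weight[(x, a)] * q[(x, a)])


def run(weight, x0="s", sink="sink", alpha=0.5, gamma=0.9,
        epsilon=0.3, iterations=4000, seed=0):
    rng = random.Random(seed)
    # Step 1: zero Q-values; the initial policy is drawn uniformly over A(x).
    q = {(x, a): 0.0 for x in ACTIONS for a in ACTIONS[x]}
    policy = {x: rng.choice(ACTIONS[x]) for x in ACTIONS}
    x = x0                                            # step 2
    for _ in range(iterations):                       # step 3
        # Step 4: exploratory action (assumption: epsilon-greedy around pi).
        a = rng.choice(ACTIONS[x]) if rng.random() < epsilon else policy[x]
        # Steps 5-6: reward and next state (assumption: next state = chosen hop).
        r, x_next = REWARD[(x, a)], a
        a_next = policy[x_next]                       # step 7
        td_update(q, x, a, r, x_next, a_next, alpha, gamma)  # step 8
        policy[x] = greedy_action(q, weight, x)       # step 9
        # Step 10, plus a restart at the source once the sink is reached
        # (assumption; the listing itself has no episode boundary).
        x = x0 if x_next == sink else x_next
    return q, policy


if __name__ == "__main__":
    uniform = {link: 1.0 for link in REWARD}
    q_values, learned = run(uniform)
    print(learned["s"], round(q_values[("s", "b")], 3))
```

Because step 9 multiplies each Q-value by the weight before taking the maximum, lowering the weight of a link can steer the policy away from it even when its learned Q-value is the highest. In this toy setting, setting the weight of the hypothetical link `("s", "b")` to 0.2 makes the learned policy at `s` prefer `a`.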