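| Algorithm 2 Reinforcement learning and blockchain based (RLBC) routing algorithm. |
| --- |
| Input: Environment $E$; Action Space $A$; Initial State $x_0$; Reward Discount $\gamma$; Learning Rate $\alpha$ |
| Output: Policy $\pi$ |
| 1: $Q(x,a) = 0$, $P(x,a) = \frac{1}{\lvert A(x) \rvert}$; |
| 2: $x = x_0$; |
| 3: for … do |
| 4: $a = \pi^{\epsilon}(x)$; |
| 5: $r$ = reward by routing action $a$; |
| 6: $x'$ = next state by routing action $a$; |
| 7: $a' = \pi(x')$; |
| 8: $Q(x,a) = Q(x,a) + \alpha \left( r + \gamma Q(x',a') - Q(x,a) \right)$; |
| 9: $\pi(x) = \arg\max_{a''} Q(x,a'')$; |
| 10: $x = x'$; |
| 11: end for |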
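
The update loop of Algorithm 2 can be illustrated with a minimal tabular sketch in Python. Everything specific here is an assumption made for illustration and is not taken from the source: the four-node topology `GRAPH`, the destination `DEST`, the reward defined as the negative link cost in `step`, the exploration rate `epsilon`, and the restart of the walk once the destination is reached. The blockchain component is not modelled, and the policy is read greedily from the value table rather than stored separately.

```python
import random

# Hypothetical toy topology (not from the source): node -> {neighbor: link cost}.
GRAPH = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.0, "D": 5.0},
    "C": {"D": 1.0},
    "D": {},
}
DEST = "D"  # hypothetical destination node


def step(x, a):
    """Toy environment: forwarding from x to neighbor a costs the link weight."""
    return -GRAPH[x][a], a  # (reward r, next state x')


def greedy(Q, x):
    """Greedy policy pi(x) = argmax over a of Q(x, a), as in step 9."""
    return max(GRAPH[x], key=lambda a: Q[(x, a)])


def q_learning(x0="A", gamma=0.9, alpha=0.5, epsilon=0.2, steps=2000, seed=0):
    rng = random.Random(seed)
    Q = {(x, a): 0.0 for x in GRAPH for a in GRAPH[x]}  # step 1: Q(x, a) = 0
    x = x0                                              # step 2
    for _ in range(steps):                              # step 3
        if x == DEST:
            x = x0  # assumption: restart the walk after delivery
            continue
        # step 4: epsilon-greedy choice of the routing action
        if rng.random() < epsilon:
            a = rng.choice(list(GRAPH[x]))
        else:
            a = greedy(Q, x)
        r, x_next = step(x, a)                          # steps 5 and 6
        # steps 7 and 8: bootstrap from the greedy action in the next state;
        # the destination has no outgoing actions, so its value is taken as 0
        if GRAPH[x_next]:
            target = r + gamma * Q[(x_next, greedy(Q, x_next))]
        else:
            target = r
        Q[(x, a)] += alpha * (target - Q[(x, a)])
        x = x_next                                      # step 10
    # step 9, evaluated once at the end: greedy policy for every routing node
    policy = {x: greedy(Q, x) for x in GRAPH if GRAPH[x]}
    return Q, policy


if __name__ == "__main__":
    Q, policy = q_learning()
    print(policy)
```

On this toy graph the greedy policy should settle on the lowest-cost path A, B, C, D. Initializing the table to zero while all rewards are negative makes untried actions look attractive, which is why even a small `epsilon` is enough for every link to be tried here; a real routing environment would need its own exploration schedule.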