2019 Feb 25;19(4):970. doi: 10.3390/s19040970
Algorithm 2 Reinforcement learning and blockchain based (RLBC) routing algorithm.
Input: Environment E; Action Space A; Initial State x_0; Reward Discount γ; Learning Rate α;
Output: Policy π;
 1: Q_t(x,a) = 0, π(x,a) = 1/|A(x)|;
 2: x = x_0;
 3: for t = 1, 2, … do
 4:   a = π^ε(x);
 5:   r = reward by routing action a;
 6:   x′ = next state by routing action a;
 7:   a′ = π(x′);
 8:   Q_t(x,a) = Q_t(x,a) + α(r + γ Q_t(x′,a′) − Q_t(x,a));
 9:   π(x) = arg max_{a_k} q_t(x,a_k)·Q_t(x,a_k);
 10:   x = x′;
 11: end for
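
The loop in Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The network is a hypothetical dictionary `neighbors` mapping each node to its candidate next hops. The blockchain-derived weight q_t(x, a_k) is stood in for by a fixed hypothetical table `trust` with values in [0, 1]. The reward is assumed to be 1 when a packet reaches the destination and 0 otherwise. The behaviour policy is assumed to be ε-greedy, and an episode is assumed to restart at the source once the destination is reached. The function name `rlbc_routing` and all parameter defaults are illustrative choices.

```python
import random

def rlbc_routing(neighbors, trust, source, dest, gamma=0.9, alpha=0.5,
                 epsilon=0.2, steps=5000, seed=0):
    """Sketch of the trust-weighted update loop (all names are hypothetical)."""
    rng = random.Random(seed)
    # Step 1: initialise Q(x, a) = 0 for every (node, next hop) pair.
    Q = {(x, a): 0.0 for x, hops in neighbors.items() for a in hops}

    def greedy(x):
        # Step 9: arg max over a_k of trust(x, a_k) * Q(x, a_k).
        # Ties are broken uniformly at random, which matches the uniform
        # initial policy while all Q values are still zero.
        scores = {a: trust[(x, a)] * Q[(x, a)] for a in neighbors[x]}
        best = max(scores.values())
        return rng.choice([a for a, s in scores.items() if s == best])

    x = source  # Step 2
    for _ in range(steps):  # Step 3
        # Step 4: epsilon-greedy behaviour policy (an assumption of this sketch).
        a = rng.choice(neighbors[x]) if rng.random() < epsilon else greedy(x)
        # Steps 5-6: forwarding to hop a moves the packet to node a.
        x_next = a
        r = 1.0 if x_next == dest else 0.0  # assumed reward model
        # Steps 7-8: temporal-difference update toward r + gamma * Q(x', a').
        if x_next == dest:
            target = r  # the destination is terminal, so no bootstrap term
        else:
            target = r + gamma * Q[(x_next, greedy(x_next))]
        Q[(x, a)] += alpha * (target - Q[(x, a)])
        # Step 10: advance, restarting at the source after a delivery.
        x = source if x_next == dest else x_next

    # The greedy policy is evaluated on demand, so step 9 is applied here.
    policy = {x: greedy(x) for x in neighbors if x != dest and neighbors[x]}
    return policy, Q

# Hypothetical four-node network: S can forward through A or B to reach D.
neighbors = {"S": ["A", "B"], "A": ["D"], "B": ["D"], "D": []}
trust = {("S", "A"): 0.9, ("S", "B"): 0.2, ("A", "D"): 1.0, ("B", "D"): 1.0}
policy, Q = rlbc_routing(neighbors, trust, "S", "D")
print(policy["S"])
```

Both paths in this toy network have the same hop count, so the learned Q values for the two first hops are similar and the trust weight decides the route. Rewards are kept non-negative on purpose: with negative Q values, multiplying by a small trust weight would raise a hop's score instead of lowering it.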