2022 Jan 13;22(2):611. doi: 10.3390/s22020611
Algorithm 1 The hierarchical feedback learning control of network component i
Inputs:
      The measured state $s_{i,t}$,
      learning rate α,
      balance factor β, and discount factor γ.
Initialize:
      deep reinforcement learning models $W_{i,t}$
# Obtain next action and execute it.
$a_{i,t} \leftarrow \pi_{i,t}$
# Suppose $a_{i,t}$ is establishing the TCP flow for component j.
# Calculate reward function with measured RTT.
$R^{a_{i,t}}_{s_{i,t},s_{i,t+1}} = \beta \, ST_{ji} \, RT_{ji} \, Loss$
# Re-calculate reward function with the feedback reward.
$R^{a_{i,t}}_{s_{i,t},s_{i,t+1}} \mathrel{+}= (1-\beta)\, r_j$
# Calculate TD-error
$L(W_{i,t}) = R^{a_{i,t}}_{s_{i,t},s_{i,t+1}} + \gamma \max_{a} Q(s_{i,t+1}, A_i, W_{i,t}) - Q(s_{i,t}, a_{i,t}, W_{i,t})$
# Update $W_{i,t}$ to $W_{i,t+1}$
$W_{i,t+1} \leftarrow W_{i,t} - \alpha \frac{dL}{dW_{i,t}} \{L(W_{i,t})\}^{2}$
$t \leftarrow t + 1$
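
One iteration of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The deep model is replaced by a linear stand-in, Q(s, a; W) = W[a] · s. The policy is assumed to be epsilon-greedy. The update is assumed to be a semi-gradient step on the squared TD error. The locally measured part of the reward is passed in as a plain number (`local_term`), because its exact form is not reproduced here. All function and parameter names (`q_values`, `blended_reward`, `td_update`, `select_action`, `local_term`, `epsilon`) are hypothetical.

```python
import random


def q_values(weights, state):
    """Linear stand-in for the deep model: one weight row per action."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def select_action(weights, state, epsilon, rng):
    """Assumed epsilon-greedy policy over the current Q estimates."""
    q = q_values(weights, state)
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=q.__getitem__)


def blended_reward(local_term, feedback_reward, beta):
    """R = beta * local_term, then R += (1 - beta) * r_j."""
    reward = beta * local_term
    reward += (1.0 - beta) * feedback_reward
    return reward


def td_update(weights, state, action, reward, next_state, alpha, gamma):
    """One semi-gradient step on the squared TD error.

    Returns (new_weights, td_error). The bootstrap target is held fixed,
    so only the row of the chosen action changes.
    """
    target = reward + gamma * max(q_values(weights, next_state))
    td_error = target - q_values(weights, state)[action]
    new_weights = [row[:] for row in weights]
    # d/dW {L^2} = -2 * L * s for the chosen action (target held constant),
    # so W - alpha * gradient becomes W + 2 * alpha * L * s.
    new_weights[action] = [
        w + alpha * 2.0 * td_error * x
        for w, x in zip(weights[action], state)
    ]
    return new_weights, td_error


# Usage: two actions, two state features, all values illustrative.
rng = random.Random(0)
weights = [[0.0, 0.0], [0.0, 0.0]]
state, next_state = [1.0, 0.5], [0.5, 1.0]

action = select_action(weights, state, epsilon=0.0, rng=rng)
reward = blended_reward(local_term=2.0, feedback_reward=1.0, beta=0.5)
weights, td_error = td_update(
    weights, state, action, reward, next_state, alpha=0.1, gamma=0.9
)
```

Keeping the feedback reward r_j as a separate argument mirrors the two-step reward computation in the algorithm: component i first scores its own measurement, then mixes in what component j reports, with β controlling the balance between the two.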