Effective TCP Flow Management Based on Hierarchical Feedback Learning in Complex Data Center Network

2022 Jan 13;22(2):611. doi: 10.3390/s22020611

Algorithm 1 The hierarchical feedback learning control of network component i

Inputs: the measured state $s_{i,t}$, learning rate $\alpha$, balanced factor $\beta$, and discount factor $\lambda$

Initialize: deep reinforcement learning models $W_{i,t}$

# Obtain next action and execute it.

$a_{i,t} \leftarrow \pi_{i,t}$

# Suppose $a_{i,t}$ is establishing the TCP flow for component $j$.

# Calculate reward function with measured RTT.

$R_{s_{i,t}, s_{i,t+1}}^{a_{i,t}} \leftarrow \beta \frac{\mathit{ST}_{j}^{i}}{\mathit{RT}_{j}^{i} \sqrt{\mathit{Loss}}}$

# Re-calculate reward function with the feedback reward.

$R_{s_{i,t}, s_{i,t+1}}^{a_{i,t}} \leftarrow R_{s_{i,t}, s_{i,t+1}}^{a_{i,t}} + (1 - \beta) r_{j}$

# Calculate TD-error.

$L(W_{i,t}) = R_{s_{i,t}, s_{i,t+1}}^{a_{i,t}} + \lambda \max_{a \in A_{i}} Q(s_{i,t+1}, a, W_{i,t}) - Q(s_{i,t}, a_{i,t}, W_{i,t})$

# Update $W_{i,t}$ to $W_{i,t+1}$.

$W_{i,t+1} \leftarrow W_{i,t} - \alpha \frac{d}{d W_{i,t}} {L(W_{i,t})}^{2}$

$t \leftarrow t + 1$
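One iteration of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the deep model is replaced by a hypothetical linear Q-function (one weight row per action), the feedback reward is assumed to be added to the RTT-based reward with weight (1 - β), and the bootstrapped target is held fixed when differentiating the squared TD-error. All function and parameter names (`combined_reward`, `q_value`, `td_error`, `update`, `st`, `rt`) are illustrative and do not come from the paper.

```python
import math


def combined_reward(beta, st, rt, loss, feedback):
    """Local term beta * ST / (RT * sqrt(Loss)) plus the weighted feedback r_j."""
    local = beta * st / (rt * math.sqrt(loss))  # requires loss > 0
    # Assumption: the feedback reward is added with weight (1 - beta).
    return local + (1.0 - beta) * feedback


def q_value(weights, state, action):
    """Hypothetical linear stand-in for the deep model: Q(s, a, W) = W[a] . s"""
    return sum(w * x for w, x in zip(weights[action], state))


def td_error(weights, state, action, reward, next_state, discount):
    """L(W) = R + discount * max_a Q(s', a, W) - Q(s, a, W)."""
    best_next = max(q_value(weights, next_state, a) for a in range(len(weights)))
    return reward + discount * best_next - q_value(weights, state, action)


def update(weights, state, action, reward, next_state, alpha, discount):
    """One step of W <- W - alpha * d(L^2)/dW; returns (new_weights, td_error)."""
    err = td_error(weights, state, action, reward, next_state, discount)
    new_weights = [row[:] for row in weights]  # leave the input untouched
    # Assumption: the bootstrapped target is held fixed (semi-gradient), so
    # d(L^2)/dW[a] = -2 * L * s and only the taken action's row changes.
    new_weights[action] = [w + 2.0 * alpha * err * x
                           for w, x in zip(weights[action], state)]
    return new_weights, err
```

A caller would compute `combined_reward(...)` once the RTT, loss, and feedback reward for the chosen flow are available, then pass the result to `update(...)` together with the current and next state vectors. Swapping the linear `q_value` for a neural network changes only how the gradient is obtained; the reward and TD-error steps stay the same.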