|
Algorithm 1 The hierarchical feedback learning control of network component i
|
| Inputs: |
| The measured state, |
| learning rate, |
| balanced factor, and discount factor. |
| Initialize: |
| deep reinforcement learning models
|
| # Obtain next action and execute it. |
|
| # Suppose the action is establishing the TCP flow for component j. |
| # Calculate the reward function with the measured RTT. |
|
reward = function of the measured RTT
|
| # Re-calculate the reward function with the feedback reward. |
|
reward += feedback reward
|
| # Calculate TD-error |
|
| # Update to
|
|
|
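
The loop in Algorithm 1 can be illustrated with a minimal sketch. This is not the authors' implementation: the deep reinforcement learning model is swapped for a tabular Q-table to keep the example short, the RTT reward is assumed to be the negative RTT, and the balanced factor is assumed to weight the feedback reward. The names `TabularAgent`, `rtt_ms`, `feedback_reward`, and `epsilon` are hypothetical.

```python
import random
from collections import defaultdict


class TabularAgent:
    """Tabular stand-in for the deep RL model of one network component."""

    def __init__(self, actions, alpha=0.1, beta=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.beta = beta        # balanced factor (assumed to weight the feedback reward)
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration rate (hypothetical, not in the source)
        self.q = defaultdict(float)  # Q[(state, action)], defaults to 0.0
        self.rng = random.Random(seed)

    def act(self, state):
        """Obtain the next action with an epsilon-greedy rule."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, rtt_ms, feedback_reward, next_state):
        """Run one learning step and return the TD error."""
        reward = -rtt_ms                       # assumed RTT reward: lower RTT is better
        reward += self.beta * feedback_reward  # assumed form of the feedback term
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
        return td_error


if __name__ == "__main__":
    agent = TabularAgent(actions=["flow_to_j", "flow_to_k"])
    action = agent.act("s0")
    td = agent.update("s0", action, rtt_ms=20.0, feedback_reward=4.0, next_state="s1")
    print(action, td)
```

A table is used here only because it makes the TD-error update visible in one line; with a neural network, the same TD error would instead drive a gradient step on the model parameters.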