Algorithm 2: Dueling DQN Algorithm for Lower-Tier Bandwidth Allocation
1: Initialize the action-value function $Q$, the target action-value function $\hat{Q}$, and the replay memory $D$ with capacity $N$;
2: Each MVNO receives its bandwidth from the BS;
3: Each MVNO creates its action space;
4: For $m = 1$ to $M$ do
5:     The MVNO randomly chooses an initial action and performs it;
6:     The MVNO allocates its bandwidth to the users connected to it;
7:     Calculate the initial state $s_1$;
8:     For $t = 1$ to $T$ do
9:         The agent observes the current state $s_t$;
10:        Choose an action $a_t$ according to the Dueling DQN policy;
11:        Calculate the total system utility according to (15);
12:        Calculate the total reward $r_t$;
13:        The agent allocates the bandwidth to users according to $a_t$ and computes the resulting next state $s_{t+1}$;
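14:        # Train the Dueling DQN
15:        The agent (i.e., each MVNO) inputs $s_t$ into the Dueling DQN;
16:        The agent stores the transition $(s_t, a_t, r_t, s_{t+1})$ in $D$;
17:        The agent samples a random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $D$;
18:        Set $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a')$;
19:        The agent performs a gradient descent step on $\left(y_j - Q(s_j, a_j; \theta, \alpha, \beta)\right)^2$ with respect to the network parameters $\theta$, $\alpha$, and $\beta$;
20:        Every $C$ steps, reset $\hat{Q} = Q$;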
21:    End for
22: End for
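The training part of Algorithm 2 (steps 10 and 15 to 20) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: a linear value stream and a linear advantage stream stand in for the unspecified network, combined with the mean-subtracted aggregation $Q(s,a) = V(s) + A(s,a) - \frac{1}{|\mathcal{A}|}\sum_{a'} A(s,a')$ commonly used in dueling networks; step 10 is assumed to be ε-greedy; step 18 is assumed to use the standard DQN target; and the reward in `demo` is a toy placeholder, not the utility in (15). All names (`LinearDuelingQ`, `choose_action`, `train_step`, `demo`) and parameter values (`sync_every` for the reset period $C$, `gamma`, `lr`, `epsilon`, the buffer capacity) are hypothetical.

```python
import random
from collections import deque


class LinearDuelingQ:
    """Hypothetical linear dueling Q-network (stand-in for the paper's network).

    Q(s, a) = V(s) + A(s, a) - mean_k A(s, k), with V and A linear in s.
    """

    def __init__(self, n_features, n_actions, seed=0):
        rng = random.Random(seed)
        self.n_actions = n_actions
        # Value-stream weights and one advantage-stream weight row per action.
        self.w_v = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
        self.w_a = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                    for _ in range(n_actions)]

    def copy_from(self, other):
        """Hard target update: overwrite this network's weights with other's."""
        self.w_v = list(other.w_v)
        self.w_a = [list(row) for row in other.w_a]

    def q_values(self, s):
        v = sum(w * x for w, x in zip(self.w_v, s))
        adv = [sum(w * x for w, x in zip(row, s)) for row in self.w_a]
        mean_adv = sum(adv) / self.n_actions
        return [v + a - mean_adv for a in adv]

    def sgd_step(self, s, a, y, lr):
        """One gradient-descent step on (y - Q(s, a))^2; returns the squared error."""
        err = y - self.q_values(s)[a]
        step = 2.0 * lr * err  # equals -lr * dL/dQ, since dL/dQ = -2 * err
        for i, x in enumerate(s):
            self.w_v[i] += step * x  # dQ/dw_v[i] = x
            for k in range(self.n_actions):
                # dQ/dw_a[k][i] = (1[k == a] - 1/K) * x
                coef = (1.0 if k == a else 0.0) - 1.0 / self.n_actions
                self.w_a[k][i] += step * coef * x
        return err * err


def choose_action(net, s, epsilon, rng):
    """Assumed epsilon-greedy policy over the dueling Q-values (step 10)."""
    if rng.random() < epsilon:
        return rng.randrange(net.n_actions)
    q = net.q_values(s)
    return max(range(len(q)), key=q.__getitem__)


def train_step(online, target, memory, batch_size, gamma, lr, rng=random):
    """Steps 17-19: sample a minibatch from D, build targets, take SGD steps."""
    if len(memory) < batch_size:
        return None
    batch = rng.sample(list(memory), batch_size)
    loss = 0.0
    for s, a, r, s_next in batch:
        y = r + gamma * max(target.q_values(s_next))  # standard DQN target (assumed)
        loss += online.sgd_step(s, a, y, lr)
    return loss / batch_size


def demo(steps=200, sync_every=10, seed=1):
    """Toy run of the inner loop; the reward is illustrative, not utility (15)."""
    rng = random.Random(seed)
    online = LinearDuelingQ(n_features=2, n_actions=3)
    target = LinearDuelingQ(n_features=2, n_actions=3)
    target.copy_from(online)
    memory = deque(maxlen=1000)  # replay memory D with hypothetical capacity N = 1000
    s = [rng.random(), rng.random()]
    for t in range(1, steps + 1):
        a = choose_action(online, s, epsilon=0.1, rng=rng)
        r = s[a % 2]  # hypothetical toy reward: one state feature, picked by the action
        s_next = [rng.random(), rng.random()]
        memory.append((s, a, r, s_next))  # step 16: store the transition in D
        train_step(online, target, memory, batch_size=8, gamma=0.9, lr=0.05, rng=rng)
        if t % sync_every == 0:
            target.copy_from(online)  # step 20: every C steps reset the target
        s = s_next
    return online


if __name__ == "__main__":
    print(demo().q_values([1.0, 0.0]))
```

Keeping a separate target network that is only overwritten every few steps (step 20) holds the regression target $y_j$ fixed between resets, which is the usual motivation for that step in DQN-style training.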