| Algorithm 1: DQN and Joint Bidding Algorithm for Upper-Tier Bandwidth Allocation | | |
| --- | --- | --- |
| 1: | Initialize the bidding pool B of each MVNO and the corresponding lower-tier action selection table A; | |
| 2: | Initialize the action-value function $Q$, the target action-value function $\hat{Q}$, and the replay memory D with capacity N; | |
| 3: | Each MVNO estimates the maximum and minimum total rates required by its associated users, then creates the bidding pool B; | |
| 4: | For each bidding value in B do | |
| 5: | Find the optimal lower-tier allocation action and store it in table A; | |
| 6: | End for | |
| 7: | Randomly choose an action, i.e., a bidding value; the BS distributes bandwidth to each MVNO according to (2); | |
| 8: | Repeat | |
| 9: | For t = 1 to T do | |
| 10: | Calculate each MVNO's ratio of allocated bandwidth to its required minimum rate, and take these ratios as the current state S = s from the last iteration; | |
| 11: | For m = 1 to M do | |
| 12: | Each MVNO m allocates bandwidth to its users according to the optimal action stored in table A; | |
| 13: | Each MVNO m calculates the by (9) and (10); | |
| 14: | Each MVNO m calculates the penalty by (4); | |
| 15: | Each MVNO m calculates its profit by (3) and obtains its reward; | |
| 16: | End for | |
| 17: | Calculate the total system utility F according to (5); | |
| 18: | Calculate the total reward r; | |
| 19: | Choose an action, i.e., a bidding value, according to the DQN policy; | |
| 20: | The InP distributes bandwidth to each MVNO according to (2); | |
| 21: | Observe the next state S = s′ after the selected action is taken in this iteration; | |
| 22: | # Train the DQN | |
| 23: | The agent, i.e., each MVNO, inputs into the DQN; | |
| 24: | The agent stores the transition (current state, action, reward, next state) in D; | |
| 25: | The agent samples a random minibatch of transitions from D; | |
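| 26: | For each sampled transition $(s_j, a_j, r_j, s'_j)$, set $y_j = r_j + \gamma \max_{a'} \hat{Q}(s'_j, a'; \theta^-)$, where $\hat{Q}$ is the target action-value function with parameters $\theta^-$ and $\gamma$ is the discount factor; | |
| 27: | The agent performs a gradient descent step on $\left(y_j - Q(s_j, a_j; \theta)\right)^2$ with respect to the network parameters $\theta$; | |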
| 28: | After every fixed number of steps, reset the target action-value function to the current action-value function; | |
| 29: | End for | |
| 30: | Until the predefined maximum number of iterations has been reached. | |
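Steps 1–6 precompute the lower-tier response to every candidate bid, so step 12 reduces to a table lookup, and step 10 turns the resulting allocation into the DQN state. The Python sketch below illustrates that structure under assumptions the algorithm does not state: the bidding pool is an evenly spaced grid between each MVNO's estimated minimum and maximum total rate, table A is keyed by (MVNO index, bid index), and `lower_tier_allocation` is a hypothetical proportional-share placeholder for the paper's lower-tier optimizer, not its actual solution. All function names and the numbers in the usage example are illustrative.

```python
def make_bidding_pool(min_rate: float, max_rate: float, levels: int) -> list[float]:
    """Candidate bids for one MVNO (step 3).

    Assumption: an evenly spaced grid between the MVNO's estimated
    minimum and maximum total rate.
    """
    if levels < 2:
        return [min_rate]
    step = (max_rate - min_rate) / (levels - 1)
    return [min_rate + k * step for k in range(levels)]


def lower_tier_allocation(bid: float, user_min_rates: list[float]) -> tuple[float, ...]:
    """Hypothetical placeholder for the lower-tier optimizer (step 5).

    Splits the bid among users in proportion to their minimum rates;
    the paper's actual optimal allocation is not reproduced here.
    """
    total = sum(user_min_rates)
    return tuple(bid * r / total for r in user_min_rates)


def build_action_table(pools: list[list[float]],
                       users: list[list[float]]) -> dict[tuple[int, int], tuple[float, ...]]:
    """Steps 4-6: precompute table A, keyed by (MVNO index, bid index)."""
    table = {}
    for m, pool in enumerate(pools):
        for k, bid in enumerate(pool):
            table[(m, k)] = lower_tier_allocation(bid, users[m])
    return table


def state_from_allocation(allocated: tuple[float, ...],
                          min_rates: tuple[float, ...]) -> tuple[float, ...]:
    """Step 10: per-MVNO ratio of allocated bandwidth to required minimum rate."""
    return tuple(a / r for a, r in zip(allocated, min_rates))


if __name__ == "__main__":
    users = [[1.0, 3.0], [2.0, 2.0]]          # illustrative per-user minimum rates
    pools = [make_bidding_pool(4.0, 8.0, 3),  # [4.0, 6.0, 8.0]
             make_bidding_pool(4.0, 6.0, 3)]  # [4.0, 5.0, 6.0]
    table_a = build_action_table(pools, users)
    print(table_a[(0, 2)])                                # (2.0, 6.0)
    print(state_from_allocation((6.0, 5.0), (4.0, 4.0)))  # (1.5, 1.25)
```

Keying table A by bid index rather than by the floating-point bid avoids exact-equality lookups on floats; swapping in the real lower-tier optimizer would leave the rest of the sketch unchanged.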
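Steps 22–28 follow the standard DQN recipe: transitions go into a bounded replay memory D, a random minibatch is drawn, each sample's target is $y_j = r_j + \gamma \max_{a'} \hat{Q}(s'_j, a'; \theta^-)$, $Q$ takes a gradient step on the squared error, and $\hat{Q}$ is periodically reset to $Q$. The dependency-free Python sketch below shows those mechanics under stated assumptions: a linear Q-function (one weight vector per bidding-pool index) stands in for the deep network, ε-greedy exploration stands in for the unspecified "policy of DQN" in step 19, and `gamma`, `lr`, `batch_size`, and `epsilon` are hypothetical hyperparameters. The state and reward would come from steps 10–18, which are not modeled here.

```python
import copy
import random
from collections import deque


class ReplayMemory:
    """Replay memory D with capacity N; the oldest transitions are dropped first."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size: int):
        # Random minibatch without replacement (step 25).
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))


class LinearQ:
    """Assumption: Q(s, a; theta) = theta[a] . s stands in for the deep network."""

    def __init__(self, state_dim: int, num_actions: int):
        self.theta = [[0.0] * state_dim for _ in range(num_actions)]

    def value(self, s, a):
        return sum(w * x for w, x in zip(self.theta[a], s))

    def best_value(self, s):
        return max(self.value(s, a) for a in range(len(self.theta)))


def select_action(q, s, epsilon):
    """Assumption: epsilon-greedy choice of a bidding-pool index (step 19)."""
    n = len(q.theta)
    if random.random() < epsilon:
        return random.randrange(n)
    return max(range(n), key=lambda a: q.value(s, a))


def train_step(q, q_target, memory, batch_size, gamma, lr):
    """Steps 25-27: sample, build y_j from the target network, descend on the squared error."""
    for s, a, r, s_next in memory.sample(batch_size):
        y = r + gamma * q_target.best_value(s_next)
        td_error = y - q.value(s, a)
        # Gradient of 0.5 * td_error**2 w.r.t. theta[a] is -td_error * s (y held fixed).
        q.theta[a] = [w + lr * td_error * x for w, x in zip(q.theta[a], s)]


def sync_target(q):
    """Step 28: reset the target network by copying Q's parameters."""
    return copy.deepcopy(q)
```

Holding `q_target` fixed between calls to `sync_target` keeps the regression target from moving with every update; in a full run, `sync_target` would be called after every fixed number of training steps, matching step 28.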