. 2022 May 4;22(9):3495. doi: 10.3390/s22093495
Algorithm 1 DQN and Joint Bidding Algorithm for Upper Tier Bandwidth Allocation
1: Initialize the Bidding pool B of MVNO and corresponding lower tier action selection table A;
2: Initialize the action-value function Q, the target action-value function Q̂, and the replay memory D with capacity N;
3: Each MVNO m ∈ M estimates the maximum and minimum total rates needed by its linked users, then creates the bidding pool B;
4: For b_m in B do
5:   Find the lower tier optimal allocation action and store it in table A;
6: End for
7: Randomly choose an action a_t, i.e., a bidding value b_m ∈ B; the BS then distributes c_m to each MVNO according to (2);
8: Repeat
9: For t = 1 to T do
10:   Calculate the ratio of each MVNO's allocated bandwidth to its required minimum rate, and take it as the current state S = s resulting from the last iteration;
11:   For m = 1 to M, do
12:     Each MVNO m allocates optimal bandwidth cjm to its users according to table A;
13:     Each MVNO m calculates v_m by (9) and (10);
14:     Each MVNO m calculates the penalty q_m by (4);
15:     Each MVNO m calculates the profit y_m by (3) and gets the reward r_m;
16:   End for
17:   Calculate the total system utility F according to (5);
18:   Calculate the total reward r;
19:   Choose an action a_t, i.e., a bidding value b_m ∈ B, according to the policy of the DQN;
20:   The InP distributes c_m to each MVNO according to (2);
21:   Get the state S = s' that results from the action selected in this iteration;
22:   #Train DQN
23:   The agent, i.e., each MVNO, inputs (s, a, s', r) into the DQN;
24:   The agent stores the transition (s, a, s', r) in D;
25:   The agent samples a random minibatch of transitions (s_j, a_j, s'_j, r_j) from D;
26:   Set
$$y_j = \begin{cases} r_j & \text{if the episode terminates at step } j+1 \\ r_j + \gamma \max_{a^*} \hat{Q}(s'_j, a^*; \theta) & \text{otherwise} \end{cases}$$
27:   The agent performs a gradient descent step on (y_j − Q(s_j, a_j; θ))^2 with respect to the network parameters θ;
28:   Every fixed number of steps, reset Q̂ = Q;
29: End for
30: Until the predefined maximum number of iterations has been completed.
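
The DQN training part of Algorithm 1 (steps 23 to 28) can be illustrated with a minimal Python sketch. It is not the authors' implementation. A lookup table stands in for the neural Q-network so that the example stays self-contained, states are treated as hashable labels rather than bandwidth-to-rate ratios, and a `done` flag is added to each stored transition to mark episode termination. The names `td_target` and `train_step`, and all constant values, are hypothetical choices made for illustration.

```python
import random
from collections import deque

GAMMA = 0.9          # discount factor gamma (assumed value)
CAPACITY = 1000      # replay memory capacity N (assumed value)
BATCH_SIZE = 4       # minibatch size (assumed value)
SYNC_EVERY = 10      # steps between target resets (assumed value)
LEARNING_RATE = 0.1  # step size of the update (assumed value)


def td_target(reward, next_state, done, q_target, actions, gamma=GAMMA):
    """Step 26: y = r if terminal, else r + gamma * max_a Q_hat(s', a)."""
    if done:
        return reward
    best_next = max(q_target.get((next_state, a), 0.0) for a in actions)
    return reward + gamma * best_next


def train_step(q, q_target, memory, actions, rng, step):
    """Steps 25 to 28, with dict tables standing in for Q and Q_hat."""
    # Step 25: sample a random minibatch of stored transitions.
    batch = rng.sample(list(memory), min(BATCH_SIZE, len(memory)))
    for state, action, next_state, reward, done in batch:
        y = td_target(reward, next_state, done, q_target, actions)
        old = q.get((state, action), 0.0)
        # Step 27: descent step on (y - Q(s, a))^2 for a tabular Q;
        # the constant factor 2 is folded into LEARNING_RATE.
        q[(state, action)] = old + LEARNING_RATE * (y - old)
    # Step 28: periodically reset Q_hat = Q.
    if step % SYNC_EVERY == 0:
        q_target.clear()
        q_target.update(q)


# Step 24: the replay memory D drops the oldest transition once full.
memory = deque(maxlen=CAPACITY)
```

A caller would append one `(s, a, s_next, r, done)` tuple to `memory` after each bidding round and then call `train_step(q, q_target, memory, actions, random.Random(), t)`. With a real network, the tabular update inside the loop would be replaced by one optimizer step on the squared error.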