2022 May 4;22(9):3495. doi: 10.3390/s22093495
Algorithm 2 Dueling DQN Algorithm for Lower Tier Bandwidth Allocation
1: Initialize the action-value function $Q$, the target action-value function $\hat{Q}$, and the replay memory $D$ with capacity $N$;
2: Each MVNO receives a bandwidth $c_m$ from the BS;
3: Each MVNO creates an action space $A_l$;
4: For m = 1 to M, do
5:  The MVNO randomly chooses an action $a \in A_l$ and performs $a$;
6:  The MVNO allocates the bandwidth cjm to the users connected to it;
7:  Calculate qjm and take it as the state $s$;
8:  For t = 1 to T, do
9:   The agent gets the current state s;
10:   Choose an action $a \in A_l$ according to the policy of the Dueling DQN;
11:   Calculate the total system utility $f_m$ according to (15);
12:   Calculate the total reward;
13:   The agent allocates the bandwidth to the users and records the state reached after the action selected in this iteration as the next state $s'$;
14:   #Train Dueling DQN
15:   The agent, i.e., each MVNO, inputs $(s, a, s', r)$ into the Dueling DQN;
16:   The agent stores the transition $(s, a, s', r)$ in $D$;
17:   The agent samples a random minibatch of transitions $(s_{\_}, a_{\_}, s'_{\_}, r_{\_})$ from $D$;
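18:   Set
$$
y_{\_} =
\begin{cases}
r_{\_} & \text{if the episode terminates at step } \_+1 \\
r_{\_} + \gamma \max_{a^*} \hat{Q}(s'_{\_}, a^*; \theta, \alpha, \beta) & \text{otherwise}
\end{cases}
$$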
19:   The agent performs a gradient descent step on $(y_{\_} - Q(s_{\_}, a_{\_}; \theta, \alpha, \beta))^2$ with respect to the network parameters $\theta$, $\alpha$ and $\beta$;
20:   Every steps reset $\hat{Q} = Q$;
21: End for
22: End for
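
Steps 16 to 18 of the listing (storing transitions, sampling a minibatch, and forming the target) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `dueling_q`, `td_target` and `ReplayMemory` are hypothetical, the values of `GAMMA` and `CAPACITY` are placeholders, and the `terminal` flag is an assumption, since the transition tuple in the listing does not carry one. The mean-subtracted combination of the value and advantage streams is the form commonly used for dueling networks; this excerpt does not confirm which aggregation the paper uses.

```python
import random
from collections import deque

GAMMA = 0.9      # discount factor gamma (assumed value, not from the source)
CAPACITY = 1000  # replay memory capacity N (assumed value, not from the source)


def dueling_q(value, advantages):
    """Combine a state value V(s) and advantages A(s, a) into Q(s, a).

    Assumes the mean-subtracted aggregation:
    Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def td_target(reward, next_q_values, terminal, gamma=GAMMA):
    """Step 18: y = r if terminal, else r + gamma * max over a* of Q_hat(s', a*)."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


class ReplayMemory:
    """Fixed-capacity replay memory D (steps 16 and 17)."""

    def __init__(self, capacity=CAPACITY):
        # A bounded deque silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, next_state, reward, terminal=False):
        self.buffer.append((state, action, next_state, reward, terminal))

    def sample(self, batch_size):
        # Sample without replacement; never request more than is stored.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

In a training loop, each sampled transition would have its next state passed through the target network to obtain `next_q_values`, and the resulting `td_target` would serve as the regression target for the squared error in step 19.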