Sensors. 2023 Mar 12;23(6):3054. doi: 10.3390/s23063054
Algorithm 2: DQN-based dynamic SFC readjustment algorithm
Input: The network state s_t, the set of SFCs {G_1, G_2, ..., G_m}, and the dynamic SFC deployment policy Π_1.
Output: The dynamic SFC readjustment policy Π_2.
1: Initialize the action-value function Q(s_t, a; θ), where θ are randomly initialized neural network weights.
2: Initialize the target action-value function Q̂(s_t, a; θ⁻), with θ⁻ = θ.
3: Initialize the experience pool D with capacity N.
4: for episode in range (EPISODES):
5:  Generate a new collection of SFCs.
6:  Initialize the state s_t.
7:  for step in range (STEPS):
8:     Generate the set of nodes that need to be readjusted based on the state of the underlying network.
9:     With probability ε, select an action a_t at random.
10:    Otherwise, select the action a_t = argmax_a Q(s_t, a; θ).
11:    Execute the readjustment action a_t, updating the state s_t.
12:    Perform deployment with Π_1.
13:    Observe the reward r_t, s_t → s_{t+1}.
14:    Store the transition e_t = (s_t, a_t, r_t, s_{t+1}) in D.
15:    Sample a random minibatch of transitions (s_j, a_j, r_j, s_{j+1}) from D.
16:    Set y_j = r_j if the episode terminates at step j+1; otherwise set y_j = r_j + γ max_{a'} Q̂(s_{j+1}, a'; θ⁻).
17:    Perform a gradient descent step on (y_j − Q(s_j, a_j; θ))² with respect to the network parameters θ.
18:    Every C steps, reset Q̂ = Q.
19:  End.
20: End.
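The following is a minimal Python sketch of the training loop described by Algorithm 2, written with PyTorch. It is an illustration under stated assumptions, not the paper's implementation: the SFCReadjustEnv class, the state/action encoding sizes (STATE_DIM, N_ACTIONS), and all hyperparameters are placeholders, the deployment policy Π_1 and the substrate network are abstracted behind env.step(), and the readjustable-node selection of step 8 is folded into a fixed discrete action space.

import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes and hyperparameters (assumptions, not the paper's values).
STATE_DIM, N_ACTIONS = 32, 10
EPISODES, STEPS = 500, 100
MEMORY_N, BATCH, GAMMA, EPSILON, C = 10_000, 64, 0.95, 0.1, 50


class QNet(nn.Module):
    """Action-value function Q(s, a; θ) over a flat encoding of the network state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, N_ACTIONS))

    def forward(self, s):
        return self.net(s)


class SFCReadjustEnv:
    """Placeholder environment. A real one would wrap the substrate network,
    the SFC set {G_1, ..., G_m}, and the deployment policy Π_1 (step 12)."""
    def reset(self):
        return torch.randn(STATE_DIM).tolist()

    def step(self, action):
        next_state = torch.randn(STATE_DIM).tolist()
        reward, done = random.random(), False
        return next_state, reward, done


q_net = QNet()                                    # Q(s, a; θ)
target_net = QNet()                               # Q̂(s, a; θ⁻)
target_net.load_state_dict(q_net.state_dict())    # θ⁻ ← θ
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=MEMORY_N)                   # experience pool D
env = SFCReadjustEnv()
step_count = 0

for episode in range(EPISODES):
    state = env.reset()                           # new SFC collection, initial state
    for step in range(STEPS):
        # ε-greedy selection of a readjustment action (steps 9-10)
        if random.random() < EPSILON:
            action = random.randrange(N_ACTIONS)
        else:
            with torch.no_grad():
                action = int(q_net(torch.tensor(state)).argmax())

        # Readjust, redeploy with Π_1, and observe the outcome (steps 11-13)
        next_state, reward, done = env.step(action)
        replay.append((state, action, reward, next_state, done))  # store e_t in D
        state = next_state

        if len(replay) >= BATCH:
            # Sample a random minibatch of transitions (step 15)
            s, a, r, s2, d = map(torch.tensor, zip(*random.sample(list(replay), BATCH)))
            s, s2, r, d = s.float(), s2.float(), r.float(), d.float()

            # y_j = r_j for terminal transitions, else r_j + γ max_a' Q̂(s_{j+1}, a'; θ⁻)
            with torch.no_grad():
                y = r + GAMMA * target_net(s2).max(dim=1).values * (1.0 - d)

            q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
            loss = F.mse_loss(q, y)               # (y_j − Q(s_j, a_j; θ))²
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Every C steps, copy the online weights into the target network (step 18)
        step_count += 1
        if step_count % C == 0:
            target_net.load_state_dict(q_net.state_dict())

        if done:
            break

A real implementation would encode the readjustable node set generated in step 8 into the state or use it to mask invalid actions, rather than relying on the fixed action space assumed above.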