Algorithm 2: DQN-based dynamic SFC readjustment algorithm

Input: The network state, the set of SFCs, and the dynamic SFC deployment policy.
Output: The dynamic SFC readjustment policy.

1: Initialize the action-value function $Q(s, a; \theta)$, where $\theta$ denotes the randomly generated neural network weights.
2: Initialize the target action-value function $\hat{Q}(s, a; \theta^-)$, where $\theta^- = \theta$.
3: Initialize the experience pool $D$ with memory capacity $N$.
4: for episode in range(EPISODES):
5:     Generate a new collection of SFCs.
6:     Initialize the state.
7:     for step in range(STEPS):
8:         Generate the set of nodes that need to be readjusted based on the state of the underlying network.
9:         With probability $\varepsilon$, select an action $a_t$ at random.
10:         Otherwise, select the action $a_t = \arg\max_a Q(s_t, a; \theta)$.
11:         Execute the readjustment action $a_t$.
12:         Perform deployment with the dynamic SFC deployment policy.
13:         Observe the reward $r_t$.
14:         Store the transition $(s_t, a_t, r_t, s_{t+1})$ in $D$.
15:         Sample a random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $D$.
16:         Set $y_j = r_j$ if the episode terminates at step $j+1$; otherwise set $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^-)$.
17:         Perform a gradient descent step on $(y_j - Q(s_j, a_j; \theta))^2$ with respect to the network parameters $\theta$.
18:         Every $C$ steps, reset $\hat{Q} = Q$.
19:     End for.
20: End for.
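
The control flow of Algorithm 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the SFC environment is replaced by a hypothetical toy task (`toy_step`, in which action 2 always earns reward 1), the neural network is replaced by a small linear stand-in (`LinearQ`), and every hyperparameter value is an assumption chosen only so that the example runs quickly. What the sketch does follow from the algorithm is the ε-greedy selection (steps 9 and 10), the experience pool (steps 3, 14 and 15), the target computation (step 16), the gradient step (step 17), and the periodic target reset (step 18).

```python
import random
from collections import deque

def epsilon_greedy(q_values, epsilon, rng):
    """Steps 9-10: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def td_target(reward, next_q_target, done, gamma):
    """Step 16: y = r if terminal, else r + gamma * max_a' Q_hat(s', a')."""
    return reward if done else reward + gamma * max(next_q_target)

class LinearQ:
    """Stand-in for the Q-network (assumption): Q(s, a) = w[a] . s."""
    def __init__(self, n_features, n_actions, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                  for _ in range(n_actions)]

    def values(self, state):
        return [sum(wi * si for wi, si in zip(row, state)) for row in self.w]

    def sgd_step(self, state, action, target, lr):
        """Step 17: one gradient step on (y - Q(s, a))^2 w.r.t. w[action]."""
        error = target - self.values(state)[action]
        self.w[action] = [wi + lr * 2.0 * error * si
                          for wi, si in zip(self.w[action], state)]

    def copy_from(self, other):
        """Step 18: overwrite these weights with the online weights."""
        self.w = [row[:] for row in other.w]

def one_hot(i, n):
    return [1.0 if k == i else 0.0 for k in range(n)]

def toy_step(t, action, horizon):
    """Hypothetical environment: action 2 pays 1, others 0; ends at horizon."""
    return (1.0 if action == 2 else 0.0), (t + 1 == horizon)

def train(episodes=300, steps=3, n_actions=3, epsilon=0.2, gamma=0.9,
          lr=0.05, batch=8, capacity=500, sync_every=20, seed=0):
    rng = random.Random(seed)
    q = LinearQ(steps, n_actions, rng)           # step 1
    q_target = LinearQ(steps, n_actions, rng)    # step 2
    q_target.copy_from(q)
    replay = deque(maxlen=capacity)              # step 3
    total_steps = 0
    for _ in range(episodes):                    # step 4
        for t in range(steps):                   # step 7
            state = one_hot(t, steps)
            action = epsilon_greedy(q.values(state), epsilon, rng)
            reward, done = toy_step(t, action, steps)        # steps 11-13
            next_state = one_hot(min(t + 1, steps - 1), steps)
            replay.append((state, action, reward, next_state, done))  # step 14
            minibatch = rng.sample(list(replay), min(batch, len(replay)))
            for s, a, r, s2, d in minibatch:                 # steps 15-17
                y = td_target(r, q_target.values(s2), d, gamma)
                q.sgd_step(s, a, y, lr)
            total_steps += 1
            if total_steps % sync_every == 0:                # step 18
                q_target.copy_from(q)
    return q
```

Calling `train()` returns the learned stand-in. On this toy task the greedy action in the first state is expected to be action 2, which can be checked with `epsilon_greedy(q.values(one_hot(0, 3)), 0.0, random.Random(0))`. Keeping a separate `q_target` that is refreshed only every `sync_every` steps mirrors the role of $\hat{Q}$ in the algorithm: the targets in step 16 stay fixed between resets instead of shifting with every gradient step.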