. 2024 Mar 1;24(5):1632. doi: 10.3390/s24051632
Algorithm 1: Training process of the distributed DDQN algorithm
Input:
Environment simulator, DDQN neural network structure.
1: Start:
Initialize the model to generate sensors, AUs, and PUs;
Randomly initialize the deep neural network that approximates the Q-function.
Cycle:
2: for epoch e = 1, 2, 3, ..., Epo
3:    Generate the initial state s_1.
4:        for step t = 1, 2, 3, ..., T
5:            Select a P2P pair in the system. Its agent selects the action a_t
            (transmission power and spectrum resources) according to the policy.
6:            The environment generates the reward r_t and the next state s_{t+1}.
7:            Collect the experience (s_t, a_t, r_t, s_{t+1}) and store it in the experience pool.
8:            if t mod K = 0
9:                Generate random indices to sample a mini-batch from the experience pool.
10:               Use the experiences at the sampled indices to train the
                    neural network θ.
11:            end if
12:        end for
13: end for
Output: Well-trained neural network model
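
The loop structure of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes DDQN denotes Double DQN, replaces the deep neural network θ with small Q-tables (one "online", one "target"), and uses a hypothetical toy environment (`toy_env_step`) in place of the environment simulator. All names, sizes, and hyperparameters below are illustrative and do not come from the source.

```python
import random
from collections import deque

N_STATES, N_ACTIONS = 4, 3      # hypothetical toy sizes
GAMMA, LR, EPS = 0.9, 0.1, 0.1  # hypothetical hyperparameters


def toy_env_step(state, action, rng):
    """Hypothetical stand-in for the environment simulator.

    Returns (reward r_t, next state s_{t+1}).
    """
    reward = 1.0 if action == state % N_ACTIONS else 0.0
    next_state = rng.randrange(N_STATES)
    return reward, next_state


def select_action(q_online, state, rng):
    """Epsilon-greedy policy over the online Q-values (an assumed policy)."""
    if rng.random() < EPS:
        return rng.randrange(N_ACTIONS)
    row = q_online[state]
    return row.index(max(row))


def train(epochs=30, steps=50, k=5, batch_size=8, seed=0):
    rng = random.Random(seed)
    # Q-tables stand in for the online and target networks (assumption).
    q_online = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    q_target = [row[:] for row in q_online]
    pool = deque(maxlen=1000)  # experience pool

    for _ in range(epochs):                      # step 2: epoch loop
        state = rng.randrange(N_STATES)          # step 3: generate s_1
        for t in range(1, steps + 1):            # step 4: step loop
            action = select_action(q_online, state, rng)            # step 5
            reward, next_state = toy_env_step(state, action, rng)   # step 6
            pool.append((state, action, reward, next_state))        # step 7
            state = next_state
            if t % k == 0 and len(pool) >= batch_size:              # step 8
                # Step 9: draw random indices into the experience pool.
                indices = rng.sample(range(len(pool)), batch_size)
                for i in indices:                                   # step 10
                    s, a, r, s2 = pool[i]
                    # Double DQN target: the online table picks the action,
                    # the target table evaluates it.
                    best = q_online[s2].index(max(q_online[s2]))
                    target = r + GAMMA * q_target[s2][best]
                    q_online[s][a] += LR * (target - q_online[s][a])
        # Sync the target table once per epoch (an assumed schedule).
        q_target = [row[:] for row in q_online]
    return q_online, len(pool)
```

Calling `train()` returns the learned Q-table and the final size of the experience pool. In a real system the table update would be a gradient step on the network θ, and the state would encode the channel and interference observations of the selected P2P pair.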