| Algorithm 1: Training process based on the distributed DDQN algorithm |
| Input: |
| Environment simulator, DDQN neural network structure. |
| 1: Start: |
| Initialize the model to generate sensors, AUs, and PUs; |
| Randomly initialize the deep neural network that approximates the Q-function. |
| Cycle: |
| 2: for each epoch do |
| 3: Generate the initial state. |
| 4: for each step do |
| 5: Select the P2P pair in the system. For the agent, the action (transmission power and spectrum resources) is selected based on the policy. |
| 6: The environment generates the reward and the next state. |
| 7: Collect the experience (state, action, reward, next state) and store it in the experience pool. |
| 8: if step mod K = 0 then |
| 9: Generate random numbers and then sample. |
| 10: Select the experiences corresponding to the sampled serial numbers to train the neural network. |
| 11: end if |
| 12: end for |
| 13: end for |
| Output: Well-trained neural network model |
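
The loop in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation. It assumes that DDQN refers to Double DQN, follows a single agent instead of the full distributed setting, and leaves the environment and the network behind hypothetical `env` and `agent` objects. The names `ExperiencePool`, `double_dqn_target`, and `run_training`, and the parameters `gamma` and `batch_size`, are assumptions introduced for illustration only.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity experience pool; the oldest experiences are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        # Step 7: store one (state, action, reward, next state) tuple.
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        # Steps 9-10: draw random serial numbers, then pick those experiences.
        size = min(batch_size, len(self.buffer))
        indices = rng.sample(range(len(self.buffer)), size)
        return [self.buffer[i] for i in indices]

    def __len__(self):
        return len(self.buffer)


def double_dqn_target(reward, next_q_online, next_q_target, gamma):
    """Double DQN target (assumed): the online network picks the next action,
    and the target network supplies the value of that action."""
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def run_training(env, agent, pool, epochs, steps_per_epoch, K, batch_size, rng):
    """Run the loop of Algorithm 1 and return the number of training updates.

    `env` is assumed to provide reset() -> state and
    step(action) -> (reward, next_state); `agent` is assumed to provide
    select_action(state) and train(batch). Both interfaces are hypothetical.
    """
    updates = 0
    for _ in range(epochs):                                # step 2
        state = env.reset()                                # step 3
        for step in range(1, steps_per_epoch + 1):         # step 4
            action = agent.select_action(state)            # step 5
            reward, next_state = env.step(action)          # step 6
            pool.store(state, action, reward, next_state)  # step 7
            if step % K == 0:                              # step 8
                batch = pool.sample(batch_size, rng)       # steps 9-10
                agent.train(batch)
                updates += 1
            state = next_state
    return updates
```

In a full implementation, `agent.train` would call something like `double_dqn_target` for each sampled experience and regress the online network's Q-value toward that target. Training only every K steps lets the pool accumulate new experiences between updates.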