Deep Reinforcement Learning-Based Energy Consumption Optimization for Peer-to-Peer (P2P) Communication in Wireless Sensor Networks

2024 Mar 1;24(5):1632. doi: 10.3390/s24051632

Algorithm 1: Training process based on DDQN distributed algorithm

Input: Environment simulator, DDQN neural network structure.

1: Start:
     Initialize the model to generate sensors, AUs, and PUs;
     Randomly initialize the deep neural network that approximates the Q-function.
   Cycle:
2: for epoch $e = 1, 2, 3, \dots, Epo$
3:   Generate state $S_{1}$
4:   for step $t = 1, 2, 3, \dots, T$
5:     Select the P2P pair in the system. For the agent, the action $a_{t}$ (transmission power and spectrum resources) is selected based on the policy.
6:     The environment generates reward $r_{t}$ and next state $s_{t+1}$
7:     Collect experience ($s_{t}$, $a_{t}$, $r_{t}$, $s_{t+1}$) and store it in the experience pool.
8:     if $t \bmod K = 0$
9:       Generate random numbers and then sample.
10:      Select the experience corresponding to the serial number to train the neural network $\theta$
11:    end if
12:  end for
13: end for

Output: Well-trained neural network model
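The loop in Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The environment `ToyP2PEnv`, its placeholder reward, the linear Q-function, the epsilon-greedy policy, the target-network copy with its `sync` interval, and all hyperparameter values are assumptions made for illustration. Algorithm 1 itself only specifies the epoch/step loops, the experience pool, and training every K steps. The update uses the standard double-DQN target, in which the online parameters select the next action and a periodically synchronized copy evaluates it.

```python
import random
from collections import deque

import numpy as np


class ToyP2PEnv:
    """Hypothetical stand-in for the environment simulator (not from the paper)."""

    def __init__(self, n_power=3, n_channels=2, state_dim=4, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_channels = n_channels
        self.n_actions = n_power * n_channels  # joint (power, channel) choice
        self.state_dim = state_dim

    def reset(self):
        return self.rng.random(self.state_dim)

    def step(self, action):
        power_level = action // self.n_channels
        # Placeholder reward (assumption): higher transmit power costs more.
        reward = -float(power_level) + 0.1 * self.rng.standard_normal()
        return float(reward), self.rng.random(self.state_dim)


def q_values(theta, s):
    """Linear Q-function, used here in place of a deep network for brevity."""
    W, b = theta
    return W @ s + b


def train(env, epochs=5, T=50, K=5, batch=16, gamma=0.9, lr=0.01,
          eps=0.1, sync=25, seed=0):
    random.seed(seed)
    rng = np.random.default_rng(seed)
    # Step 1: random initialization of the Q-function parameters theta.
    theta = [0.01 * rng.standard_normal((env.n_actions, env.state_dim)),
             np.zeros(env.n_actions)]
    target = [p.copy() for p in theta]  # assumed target copy for double DQN
    pool = deque(maxlen=1000)           # experience pool
    updates = 0
    for e in range(1, epochs + 1):                 # step 2
        s = env.reset()                            # step 3
        for t in range(1, T + 1):                  # step 4
            # Step 5: epsilon-greedy policy (an assumption; the paper says "policy").
            if rng.random() < eps:
                a = int(rng.integers(env.n_actions))
            else:
                a = int(np.argmax(q_values(theta, s)))
            r, s_next = env.step(a)                # step 6
            pool.append((s, a, r, s_next))         # step 7
            if t % K == 0 and len(pool) >= batch:  # step 8
                # Step 9: draw random serial numbers into the pool.
                idx = random.sample(range(len(pool)), batch)
                for i in idx:                      # step 10
                    si, ai, ri, sn = pool[i]
                    # Double-DQN target: online net selects, target net evaluates.
                    a_star = int(np.argmax(q_values(theta, sn)))
                    y = ri + gamma * q_values(target, sn)[a_star]
                    td = y - q_values(theta, si)[ai]
                    theta[0][ai] += lr * td * si   # gradient step on squared TD error
                    theta[1][ai] += lr * td
                updates += 1
                if updates % sync == 0:
                    target = [p.copy() for p in theta]
            s = s_next
    return theta, pool


if __name__ == "__main__":
    trained_theta, experience = train(ToyP2PEnv())
    print("stored experiences:", len(experience))
```

Swapping `q_values` and the per-sample update for a neural network and an optimizer step from a deep learning library would bring the sketch closer to the DDQN structure named in the algorithm's input. The loop skeleton would stay the same.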