2025 Jul 29;25(15):4685. doi: 10.3390/s25154685
Algorithm 1: B-PER DQN algorithm
1. Initialize the prioritized experience replay buffer D with capacity C
2. Initialize environment parameters and training parameters
3. For episode =1 to M do
4. Reset the environment, obtain the initial state s, and set the episode reward to 0
5.  While not done:
6.   Select action a according to Equation (8)
7.   Execute action a and store the transition (s, a, r, s′, d) in D
8.   If the number of transitions stored in D exceeds 256:
9.    Sample a mini-batch of batch-size transitions from D according to Equation (5)
10.    Compute and apply the network parameter update according to Equation (11)
11.   End if
12.  Dynamically adjust τ according to Equation (10)
13.  Decay ε according to Equation (9)
14. End for
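
The control flow of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. Equations (5), (8), (9), (10) and (11) are not reproduced in this listing, so each one is represented by a caller-supplied function. The names Hooks, train, toy_reset, toy_step and uniform_sample are hypothetical, the transition layout (s, a, r, s', d) is assumed, and uniform_sample is a plain uniform stand-in rather than the priority-based sampling of Equation (5). Both ε and τ are passed to the action-selection and update hooks because the listing does not say which step uses τ.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Deque, List, Tuple

# Assumed transition layout: (s, a, r, s_next, done)
Transition = Tuple[Any, int, float, Any, bool]


@dataclass
class Hooks:
    """Caller-supplied stand-ins for the paper's equations (names are hypothetical)."""
    select_action: Callable[[Any, float, float], int]                   # Eq. (8)
    sample_batch: Callable[[Deque[Transition], int], List[Transition]]  # Eq. (5)
    update_network: Callable[[List[Transition], float], None]           # Eq. (11)
    adjust_tau: Callable[[float, int], float]                           # Eq. (10)
    decay_epsilon: Callable[[float, int], float]                        # Eq. (9)


def train(reset, step, hooks, episodes, capacity,
          batch_size=32, warmup=256, tau=1.0, epsilon=1.0):
    """Run the loop of Algorithm 1 and return (episode rewards, tau, epsilon)."""
    buffer: Deque[Transition] = deque(maxlen=capacity)  # step 1: buffer D, capacity C
    rewards = []
    for episode in range(1, episodes + 1):               # step 3
        state, done, episode_reward = reset(), False, 0.0  # step 4
        while not done:                                   # step 5
            action = hooks.select_action(state, epsilon, tau)          # step 6
            next_state, reward, done = step(state, action)             # step 7
            buffer.append((state, action, reward, next_state, done))
            episode_reward += reward
            if len(buffer) > warmup:                      # step 8: enough samples stored
                batch = hooks.sample_batch(buffer, batch_size)         # step 9
                hooks.update_network(batch, tau)                       # step 10
            state = next_state
        tau = hooks.adjust_tau(tau, episode)              # step 12
        epsilon = hooks.decay_epsilon(epsilon, episode)   # step 13
        rewards.append(episode_reward)
    return rewards, tau, epsilon


def toy_reset():
    """Toy environment: the state is a step counter starting at 0."""
    return 0


def toy_step(state, action):
    """Toy environment: reward 1.0 per step, episode ends after 10 steps."""
    next_state = state + 1
    return next_state, 1.0, next_state >= 10


def uniform_sample(buffer, batch_size):
    """Uniform sampling stand-in; Eq. (5) would sample by priority instead."""
    return random.sample(list(buffer), min(batch_size, len(buffer)))


if __name__ == "__main__":
    demo_hooks = Hooks(
        select_action=lambda s, eps, tau: 0,
        sample_batch=uniform_sample,
        update_network=lambda batch, tau: None,
        adjust_tau=lambda tau, ep: tau,
        decay_epsilon=lambda eps, ep: eps * 0.99,
    )
    print(train(toy_reset, toy_step, demo_hooks, episodes=3, capacity=1000,
                batch_size=4, warmup=5))
```

Keeping the equations behind function arguments leaves the loop structure visible while making no claim about the form of the sampling rule, the action policy, or the update. The warmup default of 256 mirrors the threshold in step 8.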