Algorithm 1 Deep deterministic policy gradient algorithm.
|
Input: status and reward
Output: action

1. For each bus, from to :
2. Initialize the test-system parameters, including the network parameters and the reward and penalty functions.
3. Combining the adjustment constraints with a random exploration factor, the model determines the action and passes it to the simulation part.
4. The simulation part executes the action and returns the reward value and the new status.
5. If the sample pool overflows, delete the earliest sample records in chronological order.
6. The actor network places the sample into the experience replay buffer, which supplies the training data for the online networks.
7. Sample N sets of data from the experience pool as the training set for the online actor network and the online Q network.
8. Use the standard backpropagation method to calculate the gradient of the online Q network.
9. Update the parameters of the online Q network.
10. Calculate the policy gradient (PG) of the actor network.
11. Update the parameters of the online actor network.
12. Update the parameters of the target networks.
13. End for.
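The loop above can be sketched as a minimal, self-contained training script. This is not the paper's implementation: the actor and Q networks are replaced with linear stand-ins, the power-system simulation with a toy transition function, and all dimensions, learning rates, and the TD-error clipping are illustrative assumptions chosen so the sketch runs without a deep-learning framework.

```python
import random
from collections import deque

import numpy as np

# All dimensions, rates, and the environment below are illustrative
# assumptions; they are not taken from the paper.
STATE_DIM, ACTION_DIM = 4, 2
BUFFER_CAP, BATCH_N = 500, 32
GAMMA, TAU, LR = 0.99, 0.01, 1e-3

rng = np.random.default_rng(0)

# Linear stand-ins for the online actor and Q (critic) networks.
actor_w = rng.normal(scale=0.1, size=(ACTION_DIM, STATE_DIM))
critic_w = rng.normal(scale=0.1, size=STATE_DIM + ACTION_DIM)
target_actor_w = actor_w.copy()
target_critic_w = critic_w.copy()

# Step 5: a bounded deque drops the oldest samples automatically.
replay = deque(maxlen=BUFFER_CAP)


def act(state):
    # Step 3: deterministic policy output plus a random exploration factor.
    return actor_w @ state + 0.1 * rng.normal(size=ACTION_DIM)


def q_value(w, state, action):
    return float(w @ np.concatenate([state, action]))


def env_step(state, action):
    # Toy stand-in for the power-system simulation (step 4).
    nxt = 0.9 * state + 0.1 * np.tanh(action).mean()
    return nxt, -float(np.sum(nxt ** 2))


state = rng.normal(size=STATE_DIM)
for t in range(150):
    action = act(state)                          # step 3
    nxt, reward = env_step(state, action)        # step 4
    replay.append((state, action, reward, nxt))  # step 6
    state = nxt
    if len(replay) < BATCH_N:
        continue
    for s, a, r, s2 in random.sample(list(replay), BATCH_N):  # step 7
        a2 = target_actor_w @ s2
        y = r + GAMMA * q_value(target_critic_w, s2, a2)  # TD target
        td_err = np.clip(q_value(critic_w, s, a) - y, -1.0, 1.0)
        # Steps 8-9: gradient step on the online Q network.
        critic_w -= LR * td_err * np.concatenate([s, a])
        # Steps 10-11: deterministic policy gradient; for a linear critic,
        # dQ/da is simply the action part of the critic weights.
        actor_w += LR * np.outer(critic_w[STATE_DIM:], s)
    # Step 12: soft update of the target networks.
    target_critic_w = (1 - TAU) * target_critic_w + TAU * critic_w
    target_actor_w = (1 - TAU) * target_actor_w + TAU * actor_w
```

The soft update in step 12 (blending a small fraction TAU of the online weights into the target weights each iteration) is what stabilizes the TD target; without it the Q target would chase its own updates.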