Algorithm 2 DDPG for Hyperparameter Optimization
Require:
1: env: Environment for RUL prediction
2: state_dim: Dimension of state space
3: action_dim: Dimension of action space
4: action_range: Range of actions
5: memory_capacity: Capacity of replay memory
6: batch_size: Batch size for training DDPG
7: gamma: Discount factor
8: tau: Soft update coefficient
9: actor_lr: Learning rate for actor network
10: critic_lr: Learning rate for critic network
Ensure:
11: ddpg_agent: Trained DDPG agent
12: procedure DDPG(env, state_dim, action_dim, action_range, memory_capacity, batch_size, gamma, tau, actor_lr, critic_lr)
13: Initialize actor network μ and critic network Q
14: Initialize target networks μ′ and Q′
15: Initialize replay memory
16: Initialize actor and critic optimizers
17: for each training step do
18: Obtain current state from environment
19: Select action using actor network
20: Execute action in environment, obtain reward and next state
21: Store transition in replay memory
22: Sample random batch from replay memory
23: Update critic network using sampled batch
24: Update actor network using sampled batch
25: Soft update target networks
26: end for
27: return ddpg_agent
28: end procedure
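Two mechanics in Algorithm 2 are easy to get wrong in practice: the fixed-capacity replay memory (steps 15, 21, 22) and the soft target-network update with coefficient tau (step 25). The sketch below illustrates just these two pieces in plain Python/NumPy; the class and function names are illustrative, not taken from the paper, and the networks themselves are abstracted as parameter arrays.

```python
import random
from collections import deque

import numpy as np

class ReplayMemory:
    """Fixed-capacity replay memory (step 15).

    Illustrative sketch: a deque with maxlen evicts the oldest
    transitions automatically once capacity is reached.
    """
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        # transition = (state, action, reward, next_state), as stored in step 21
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random minibatch, as in step 22
        return random.sample(self.buffer, batch_size)

def soft_update(target_params, source_params, tau):
    """Polyak soft update of the target networks (step 25):
    theta_target <- tau * theta + (1 - tau) * theta_target.
    Parameters are represented here as lists of NumPy arrays."""
    return [tau * s + (1.0 - tau) * t
            for t, s in zip(target_params, source_params)]
```

With a small tau (e.g. 0.005), the target networks track the learned networks slowly, which is what stabilizes the critic's bootstrapped targets during training.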