Sensors. 2025 Oct 14;25(20):6354. doi: 10.3390/s25206354
Algorithm 2 DDPG for Hyperparameter Optimization
Require:
  1: env: environment for RUL prediction
  2: state_dim: dimension of the state space
  3: action_dim: dimension of the action space
  4: action_range: range of actions
  5: memory_capacity: capacity of the replay memory
  6: batch_size: batch size for training DDPG
  7: gamma: discount factor
  8: tau: soft-update coefficient
  9: actor_lr: learning rate of the actor network
 10: critic_lr: learning rate of the critic network
Ensure:
 11: ddpg_agent: trained DDPG agent
 12: procedure DDPG(env, state_dim, action_dim, action_range, memory_capacity, batch_size, gamma, tau, actor_lr, critic_lr)
 13:     Initialize actor network μ and critic network Q
 14:     Initialize target networks μ′ and Q′
 15:     Initialize replay memory
 16:     Initialize actor and critic optimizers
 17:     for each training step do
 18:         Obtain current state from environment
 19:         Select action using actor network
 20:         Execute action in environment, obtain reward and next state
 21:         Store transition in replay memory
 22:         Sample random batch from replay memory
 23:         Update critic network using sampled batch
 24:         Update actor network using sampled batch
 25:         Soft update target networks
 26:     end for
         return ddpg_agent
 27: end procedure
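The loop in Algorithm 2 can be sketched in PyTorch. This is a minimal illustration, not the paper's implementation: the network sizes, learning rates, and the dictionary-style `env_step` interface are assumptions chosen for brevity, and exploration noise is omitted. The critic is trained toward the TD target r + γ·Q′(s′, μ′(s′)), the actor maximizes Q(s, μ(s)), and both target networks are soft-updated with coefficient τ, matching lines 17-26 of the pseudocode.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Hypothetical dimensions and coefficients for illustration (not from the paper).
STATE_DIM, ACTION_DIM, ACTION_RANGE = 4, 2, 1.0
GAMMA, TAU = 0.99, 0.005

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = mlp(STATE_DIM, ACTION_DIM)
    def forward(self, s):
        # tanh bounds the action, scaled to action_range (line 4 of the inputs).
        return ACTION_RANGE * torch.tanh(self.net(s))

# Lines 13-16: actor/critic, their targets, replay memory, and optimizers.
actor, critic = Actor(), mlp(STATE_DIM + ACTION_DIM, 1)
target_actor, target_critic = Actor(), mlp(STATE_DIM + ACTION_DIM, 1)
target_actor.load_state_dict(actor.state_dict())
target_critic.load_state_dict(critic.state_dict())
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
memory = deque(maxlen=10_000)

def soft_update(target, source):
    # Line 25: θ′ ← τ·θ + (1 − τ)·θ′
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1 - TAU).add_(TAU * s.data)

def train_step(env_step, batch_size=32):
    # Lines 18-21: observe state, act, and store the transition.
    state = env_step["state"]
    with torch.no_grad():
        action = actor(state)
    memory.append((state, action, env_step["reward"], env_step["next_state"]))
    if len(memory) < batch_size:
        return None
    # Line 22: sample a random minibatch from replay memory.
    s, a, r, s2 = (torch.stack(x) for x in zip(*random.sample(memory, batch_size)))
    r = r.unsqueeze(1)
    # Line 23: critic update toward the TD target r + γ·Q′(s′, μ′(s′)).
    with torch.no_grad():
        y = r + GAMMA * target_critic(torch.cat([s2, target_actor(s2)], dim=1))
    critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Line 24: actor update, ascending the critic's estimate of Q(s, μ(s)).
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Line 25: soft-update both target networks.
    soft_update(target_actor, actor)
    soft_update(target_critic, critic)
    return critic_loss.item(), actor_loss.item()
```

In the hyperparameter-optimization setting of the paper, each `env_step` would come from the RUL-prediction environment, with the reward reflecting prediction quality under the hyperparameters encoded in the action.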