Author manuscript; available in PMC: 2023 Jun 3.
Published in final edited form as: Biomed Phys Eng Express. 2022 Jun 3;8(4):10.1088/2057-1976/ac6d82. doi: 10.1088/2057-1976/ac6d82

Table 2.

The hyperparameters and their values used to train the VTP.

| Hyperparameter | Value | Description |
| --- | --- | --- |
| learning rate | 1 × 10⁻⁵ | The learning rate used by the VTP |
| minibatch size | 16 | The number of training samples used to update θ_i in Equation (4) |
| target update frequency | 500 | The frequency with which the target parameters θ⁺ are updated |
| discount factor | 0.7 | Discount factor γ used in Q-learning |
| initial exploration | 0.999 | Initial value of ε in ε-greedy exploration |
| final exploration | 0.333 | Final value of ε in ε-greedy exploration |
| replay memory | 125000 | The number of state-action pairs that are stored |
| number of episodes | 200 | Total number of training episodes |
| number of steps | 30 | Maximum number of time steps in each episode |
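
As a minimal sketch, the values in Table 2 could be collected in a single configuration object. The class name `VTPHyperparameters`, its field names, and the helper `epsilon_for_episode` are hypothetical and do not come from the paper. The table gives only the initial and final values of ε, so the linear per-episode annealing shown here is an assumption, not the authors' stated schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VTPHyperparameters:
    """Values copied from Table 2; field names are illustrative only."""
    learning_rate: float = 1e-5
    minibatch_size: int = 16
    target_update_frequency: int = 500
    discount_factor: float = 0.7        # gamma in Q-learning
    initial_exploration: float = 0.999  # starting epsilon
    final_exploration: float = 0.333    # ending epsilon
    replay_memory: int = 125_000        # stored state-action pairs
    number_of_episodes: int = 200
    number_of_steps: int = 30           # maximum time steps per episode


def epsilon_for_episode(hp: VTPHyperparameters, episode: int) -> float:
    """Return epsilon for a zero-based episode index.

    ASSUMPTION: epsilon is annealed linearly from the initial to the final
    value over the training episodes. The table does not specify the schedule.
    """
    last = hp.number_of_episodes - 1
    if last <= 0:
        return hp.final_exploration
    # Clamp the index so out-of-range episodes return the boundary values.
    fraction = min(max(episode, 0), last) / last
    return hp.initial_exploration + fraction * (
        hp.final_exploration - hp.initial_exploration
    )


hp = VTPHyperparameters()
eps_first = epsilon_for_episode(hp, 0)
eps_last = epsilon_for_episode(hp, hp.number_of_episodes - 1)
```

Under this assumed schedule, `eps_first` equals the initial exploration value and `eps_last` equals the final exploration value; any other decay rule (for example exponential or per-step) would need a different helper.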