Table 2.
The hyperparameters and their values used to train the VTP.
| Hyperparameter | Value | Description |
|---|---|---|
| learning rate | 1×10⁻⁵ | The learning rate used by the VTP |
| minibatch size | 16 | The number of training samples used to update θᵢ in Equation (4) |
| target update frequency | 500 | The frequency with which the target parameters θ⁺ are updated |
| discount factor | 0.7 | Discount factor γ used by Q-learning |
| initial exploration | 0.999 | Initial value of ε in ε-greedy exploration |
| final exploration | 0.333 | Final value of ε in ε-greedy exploration |
| replay memory | 125,000 | The number of state–action pairs stored in the replay memory |
| number of episodes | 200 | Total number of training episodes |
| number of steps | 30 | Maximum number of time steps in each episode |
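The table above can be sketched as a training configuration. This is a minimal illustration only: the dictionary keys, the helper name `epsilon`, and the linear annealing schedule are assumptions, since the source gives only the two endpoint values of ε and does not show the VTP's actual training code.

```python
# Hyperparameters from Table 2, gathered into a config dict.
# Key names are illustrative, not taken from the VTP implementation.
CONFIG = {
    "learning_rate": 1e-5,
    "minibatch_size": 16,
    "target_update_frequency": 500,   # steps between updates of θ⁺
    "discount_factor": 0.7,           # γ in the Q-learning update
    "initial_exploration": 0.999,     # ε at the start of training
    "final_exploration": 0.333,       # ε after annealing
    "replay_memory": 125_000,         # stored state–action pairs
    "num_episodes": 200,
    "max_steps_per_episode": 30,
}

# Upper bound on total environment steps: 200 episodes × 30 steps.
TOTAL_STEPS = CONFIG["num_episodes"] * CONFIG["max_steps_per_episode"]

def epsilon(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Anneal ε linearly from its initial to its final value.

    The linear schedule is an assumption; the source specifies only
    the initial (0.999) and final (0.333) values of ε.
    """
    frac = min(step / total_steps, 1.0)
    e0 = CONFIG["initial_exploration"]
    e1 = CONFIG["final_exploration"]
    return e0 + frac * (e1 - e0)
```

Under this sketch, `epsilon(0)` returns the initial value 0.999, and the value decays to 0.333 once `step` reaches the total step budget, after which it stays fixed.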