Author manuscript; available in PMC: 2020 May 29.

Published in final edited form as: Phys Med Biol. 2019 May 29;64(11):115013. doi: 10.1088/1361-6560/ab18bf

Table 1.

Hyperparameters to train the WTPN.

Hyperparameter	Value	Description
σ	5×10⁻⁴	Stopping criterion in Algorithm 1
β	5	Penalty parameter in Algorithm 1
n	4	Number of weights (OARs) to be tuned
γ	0.5	Discount factor
ε	0.99 ~ 0.1	Probability of ε-greedy approach
N_patient	5	Number of training patient cases
N_epoch	100	Number of training epochs
N_train	25	Number of training steps in each epoch
N_update	10	Number of steps between updates Ŵ = W
δ	1×10⁻⁴	Learning rate (step size of gradient descent for W)
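
As a rough illustration of how the values in Table 1 might be collected and used, the following Python sketch stores them in a configuration object and shows an ε-greedy action choice with a periodic Ŵ = W synchronization check. The class and function names (`WTPNHyperparameters`, `epsilon_for_epoch`, `select_action`, `should_sync_target`) are hypothetical and do not come from the paper. Two further points are assumptions: the table gives only the range 0.99 ~ 0.1 for ε, so the linear per-epoch decay used here is a guess, and ε is interpreted as the probability of choosing a random action.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WTPNHyperparameters:
    """Values copied from Table 1; field names are hypothetical."""
    sigma: float = 5e-4          # stopping criterion in Algorithm 1
    beta: float = 5.0            # penalty parameter in Algorithm 1
    n_weights: int = 4           # number of weights (OARs) to be tuned
    gamma: float = 0.5           # discount factor
    eps_start: float = 0.99      # upper end of the epsilon range
    eps_end: float = 0.1         # lower end of the epsilon range
    n_patient: int = 5           # number of training patient cases
    n_epoch: int = 100           # number of training epochs
    n_train: int = 25            # training steps in each epoch
    n_update: int = 10           # steps between updates W_hat = W
    learning_rate: float = 1e-4  # delta, gradient descent step size for W


def epsilon_for_epoch(hp: WTPNHyperparameters, epoch: int) -> float:
    """ASSUMPTION: epsilon decays linearly from eps_start to eps_end
    over the epochs; the table does not state the schedule."""
    if hp.n_epoch <= 1:
        return hp.eps_end
    frac = epoch / (hp.n_epoch - 1)
    return hp.eps_start + frac * (hp.eps_end - hp.eps_start)


def select_action(q_values, epsilon: float, rng: random.Random) -> int:
    """ASSUMPTION: with probability epsilon pick a random action,
    otherwise pick the action with the largest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def should_sync_target(step: int, hp: WTPNHyperparameters) -> bool:
    """True on every n_update-th step, when W_hat would be set to W."""
    return step > 0 and step % hp.n_update == 0


if __name__ == "__main__":
    hp = WTPNHyperparameters()
    rng = random.Random(0)
    eps = epsilon_for_epoch(hp, epoch=0)
    action = select_action([0.1, 0.9, 0.3], eps, rng)
    print(f"epsilon={eps:.2f}, action={action}")
```

Keeping the values in a single frozen dataclass makes it harder for a training script to drift from the table; the decay schedule and the synchronization rule can be swapped out without touching the stored values.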