Table 1.
Hyperparameter | Value | Description |
---|---|---|
σ | 5×10⁻⁴ | Stopping criterion in Algorithm 1 |
β | 5 | Penalty parameter in Algorithm 1 |
n | 4 | Number of OAR weights to be tuned |
γ | 0.5 | Discount factor |
ε | 0.99 to 0.1 | Exploration probability in the ε-greedy approach |
N_patient | 5 | Number of training patient cases |
N_epoch | 100 | Number of training epochs |
N_train | 25 | Number of training steps per epoch |
N_update | 10 | Number of steps to update |
δ | 1×10⁻⁴ | Learning rate (step size of gradient descent for W) |
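The values in Table 1 can be gathered into a single configuration object. The sketch below does this and adds one possible ε schedule. It is illustrative only: the names `Hyperparameters` and `epsilon_at` are hypothetical, the table gives only the endpoints of ε (0.99 and 0.1), and both the linear annealing and the assumption that it spans N_epoch × N_train steps (ignoring N_patient) are choices made here for illustration, not the schedule used in the original work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    """Values copied from Table 1; field names are hypothetical."""
    sigma: float = 5e-4          # stopping criterion in Algorithm 1
    beta: float = 5.0            # penalty parameter in Algorithm 1
    n_weights: int = 4           # number of OAR weights to be tuned
    gamma: float = 0.5           # discount factor
    eps_start: float = 0.99      # initial exploration probability
    eps_end: float = 0.1         # final exploration probability
    n_patient: int = 5           # number of training patient cases
    n_epoch: int = 100           # number of training epochs
    n_train: int = 25            # training steps per epoch
    n_update: int = 10           # "number of steps to update" (meaning not specified here)
    learning_rate: float = 1e-4  # delta: step size of gradient descent for W


def epsilon_at(step: int, hp: Hyperparameters) -> float:
    """Return epsilon for a global training step.

    ASSUMPTION: epsilon is annealed linearly from eps_start to eps_end over
    n_epoch * n_train steps and held at eps_end afterwards. Table 1 only
    states the endpoints, so this schedule is hypothetical.
    """
    total_steps = hp.n_epoch * hp.n_train
    # Fraction of the schedule completed, clamped to [0, 1].
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return hp.eps_start + frac * (hp.eps_end - hp.eps_start)


if __name__ == "__main__":
    hp = Hyperparameters()
    total = hp.n_epoch * hp.n_train
    # Print epsilon at the start, middle, and end of the assumed schedule.
    for step in (0, total // 2, total - 1):
        print(step, round(epsilon_at(step, hp), 4))
```

A frozen dataclass keeps the settings immutable during training; swapping in a different decay (e.g. exponential) only requires changing `epsilon_at`.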