Algorithm 1.
1: Input: training set, number of training steps T, batch size B. |
2: Initialize the neural net params θ. |
3: Initialize baseline value. |
4: for t = 1 to T do |
5: Select a batch of samples Ci for i ∈ {1, ⋯ , B}. |
6: Sample solution πi based on pθ(·|Ci) for i ∈ {1, ⋯ , B}. |
7: Let gθ = (1/B) Σ_{i=1}^{B} (AC(πi|Ci) − b(Ci)) ∇θ log pθ(πi|Ci). |
8: Update θ = ADAM(θ, gθ). |
9: Update baseline b(Ci) = b(Ci)+α(AC(πi|Ci) − b(Ci)) for i ∈ {1, ⋯ , B}. |
10: end for |
11: return neural net parameters θ. |
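
The loop in Algorithm 1 can be sketched in NumPy on a toy problem. This is an illustrative sketch, not the authors' implementation: it assumes the standard REINFORCE-with-baseline gradient for step 7, a linear softmax policy in place of the neural net, a hand-written Adam update, baselines initialized to zero, and a made-up cost function. The names `train` and `toy_cost` and the default hyperparameters are hypothetical.

```python
import numpy as np

def toy_cost(action, instance):
    # Hypothetical stand-in for AC(pi|C): cost 0 for the "right" action, else 1.
    return float(action != np.argmax(instance))

def train(instances, cost_fn, num_actions, T=500, B=4, alpha=0.1, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n, d = instances.shape
    theta = np.zeros((num_actions, d))   # step 2: policy parameters (linear softmax, assumed)
    baseline = np.zeros(n)               # step 3: one baseline per instance (zero init, assumed)
    m = np.zeros_like(theta)             # Adam first-moment estimate
    v = np.zeros_like(theta)             # Adam second-moment estimate
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    for t in range(1, T + 1):            # step 4
        idx = rng.choice(n, size=B, replace=False)   # step 5: batch of distinct instances
        grad = np.zeros_like(theta)
        costs = np.empty(B)
        for k, i in enumerate(idx):
            c = instances[i]
            logits = theta @ c
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            a = rng.choice(num_actions, p=probs)     # step 6: sample a solution
            costs[k] = cost_fn(a, c)
            # Gradient of log softmax w.r.t. theta: (onehot(a) - probs) outer c.
            score = -np.outer(probs, c)
            score[a] += c
            grad += (costs[k] - baseline[i]) * score
        grad /= B                                    # step 7: batch-averaged gradient
        # Step 8: Adam update (descent, since the cost is minimized).
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
        # Step 9: exponential moving average of the observed cost per instance.
        baseline[idx] += alpha * (costs - baseline[idx])
    return theta                                     # step 11
```

Calling `train(np.eye(4), toy_cost, num_actions=4)` trains on four one-hot instances. The baseline is updated only after the gradient step, matching the order of steps 7 to 9, and the batch is drawn without replacement so that each per-instance baseline is updated at most once per step.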