Algorithm 1. Pseudocode of the alternating training.

1: Inputs: training data data
2: Outputs: the classification model M and the generator G
3: Initialize the parameters of both models
4: for each epoch in epochs do
5:     for each batch data_i in data do
6:         Freeze the parameters of the generator
7:         Calculate the loss value using the cross-entropy loss (Equation (2))
8:         Update the parameters of the classification model
9:     end for
10:     for each batch data_i in data do
11:         Freeze the parameters of the classification model
12:         Calculate the reward
13:         Update the parameters of the generator using the Proximal Policy Optimization-Clip approach (Equation (3))
14:     end for
15: end for
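
The alternating schedule of Algorithm 1 can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the one-parameter Bernoulli generator, the logistic-regression classifier, the reward (negative cross-entropy of the frozen classifier), the mean-reward baseline, and all names and hyperparameters (`train_alternating`, `ppo_steps`, `eps`, the learning rates) are hypothetical stand-ins chosen so that the control flow runs with the Python standard library alone. Whether the classifier sees generator outputs in the first phase is also an assumption.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def generate(theta, x, rng):
    """Toy generator G (assumption): shift x by +1 with probability sigmoid(theta)."""
    a = 1 if rng.random() < sigmoid(theta) else 0
    return a, x + a


def cross_entropy(p, y):
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def train_alternating(data, epochs=5, batch_size=4, lr_m=0.1, lr_g=0.1,
                      eps=0.2, ppo_steps=3, seed=0):
    rng = random.Random(seed)
    w, b, theta = 0.0, 0.0, 0.0  # step 3: initialize both models
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    for _ in range(epochs):  # step 4
        # Steps 5-9: generator frozen (theta is only read), classifier updated.
        for batch in batches:
            gw = gb = 0.0
            for x, y in batch:
                _, x_gen = generate(theta, x, rng)
                p = sigmoid(w * x_gen + b)
                gw += (p - y) * x_gen  # gradient of cross-entropy w.r.t. w
                gb += (p - y)          # gradient of cross-entropy w.r.t. b
            w -= lr_m * gw / len(batch)
            b -= lr_m * gb / len(batch)
        # Steps 10-14: classifier frozen (w, b only read), generator updated.
        for batch in batches:
            theta_old = theta
            samples = []
            for x, y in batch:
                a, x_gen = generate(theta_old, x, rng)
                # Hypothetical reward: negative loss of the frozen classifier.
                reward = -cross_entropy(sigmoid(w * x_gen + b), y)
                samples.append((a, reward))
            baseline = sum(r for _, r in samples) / len(samples)
            p_old = sigmoid(theta_old)
            for _ in range(ppo_steps):  # a few ascent steps on the clipped objective
                p_new = sigmoid(theta)
                grad = 0.0
                for a, reward in samples:
                    adv = reward - baseline
                    pi_new = p_new if a == 1 else 1.0 - p_new
                    pi_old = p_old if a == 1 else 1.0 - p_old
                    ratio = pi_new / pi_old
                    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
                    # min(ratio*adv, clipped*adv): the gradient is nonzero only
                    # when the unclipped term is the active one.
                    if ratio * adv <= clipped * adv:
                        grad += adv * ratio * (a - p_new)  # d(ratio)/d(theta) * adv
                theta += lr_g * grad / len(samples)
    return (w, b), theta


toy_data = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (-3.0, 0),
            (2.0, 1), (1.5, 1), (3.0, 1), (2.5, 1)]
(w, b), theta = train_alternating(toy_data)
```

In this sketch, "freezing" simply means that a phase reads the other model's parameters without writing to them; in an autograd framework the same effect is usually obtained by disabling gradients for the frozen model. Running several clipped updates per batch is what makes the clipping active, since after the first update the ratio between the new and old policy probabilities moves away from 1.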