Deep Reinforcement Learning for CT-Based Non-Invasive Prediction of SOX9 Expression in Hepatocellular Carcinoma

2025 May 15;15(10):1255. doi: 10.3390/diagnostics15101255

Algorithm 1 Pseudocode of the alternating training.

1: Inputs: Training data, data $= \{(x_{i}, y_{i})\}$
2: Outputs: the classification model M and the generator G
3: Initialize the parameters of both models
4: for each epoch in epochs do
5:     for each batch data_i in data do
6:         Freeze the parameters of the generator
7:         Calculate the loss value using the cross-entropy loss (Equation (2))
8:         Update the parameters of the classification model
9:     end for
10:     for each batch data_i in data do
11:         Freeze the parameters of the classification model
12:         Calculate the reward
13:         Update the parameters of the generator using the Proximal Policy Optimization-Clip approach (Equation (3))
14:     end for
15: end for
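
The alternating schedule of Algorithm 1 can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: Equations (2) and (3) are not reproduced in this section, so the sketch assumes the usual binary cross-entropy loss and the standard PPO-Clip surrogate, min(r·A, clip(r, 1−ε, 1+ε)·A). Everything else is a hypothetical stand-in chosen only to keep the example runnable: the classifier is a one-parameter logistic model (`w`), the generator is a Bernoulli policy (`theta`) that picks one of two candidate "views" of each sample, the reward is the probability the frozen classifier assigns to the true label, and the advantage uses a batch-mean baseline. All function and parameter names (`train`, `make_toy_data`, `ppo_steps`, and so on) are illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ppo_clip_term(ratio, advantage, eps=0.2):
    """Per-sample PPO-Clip surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


def ppo_clip_grad(ratio, advantage, dlogp, eps=0.2):
    """Derivative of ppo_clip_term w.r.t. the policy parameter (zero where the clip is active)."""
    clip_active = (advantage > 0 and ratio > 1.0 + eps) or (advantage < 0 and ratio < 1.0 - eps)
    return 0.0 if clip_active else advantage * ratio * dlogp


def make_toy_data(n=64):
    # Hypothetical data: each sample offers two views; index 0 is uninformative,
    # index 1 encodes the label (+1 for y = 1, -1 for y = 0).
    return [((0.0, 1.0 if i % 2 else -1.0), i % 2) for i in range(n)]


def batches(data, size):
    return [data[i:i + size] for i in range(0, len(data), size)]


def train(data, epochs=30, batch_size=8, lr_m=0.5, lr_g=1.0, ppo_steps=4, eps=0.2, seed=0):
    rng = random.Random(seed)
    w = 0.0      # classifier M (assumed form): p(y=1 | feature) = sigmoid(w * feature)
    theta = 0.0  # generator G (assumed form): P(select view 1) = sigmoid(theta)
    for _ in range(epochs):
        # Steps 5-9: generator frozen (theta is only read), classifier updated.
        for batch in batches(data, batch_size):
            grad_w = 0.0
            for x, y in batch:
                a = 1 if rng.random() < sigmoid(theta) else 0
                p = sigmoid(w * x[a])
                grad_w += (p - y) * x[a]  # gradient of binary cross-entropy w.r.t. w
            w -= lr_m * grad_w / len(batch)
        # Steps 10-14: classifier frozen (w is only read), generator updated.
        for batch in batches(data, batch_size):
            p_old = sigmoid(theta)  # behaviour policy, fixed during the inner PPO steps
            actions = [1 if rng.random() < p_old else 0 for _ in batch]
            rewards = []
            for (x, y), a in zip(batch, actions):
                p = sigmoid(w * x[a])
                rewards.append(p if y == 1 else 1.0 - p)  # probability of the true label
            baseline = sum(rewards) / len(rewards)
            for _ in range(ppo_steps):
                p_new = sigmoid(theta)
                grad_theta = 0.0
                for a, r in zip(actions, rewards):
                    pi_new = p_new if a == 1 else 1.0 - p_new
                    pi_old = p_old if a == 1 else 1.0 - p_old
                    dlogp = a - p_new  # d log pi(a) / d theta for a Bernoulli policy
                    grad_theta += ppo_clip_grad(pi_new / pi_old, r - baseline, dlogp, eps)
                theta += lr_g * grad_theta / len(batch)  # gradient ascent on the surrogate
    return w, sigmoid(theta)


if __name__ == "__main__":
    w, p_select = train(make_toy_data())
    print(f"classifier weight: {w:.3f}, P(select informative view): {p_select:.3f}")
```

In this toy setting, "freezing" a model simply means its parameter is read but never written during the other model's phase. In a deep learning framework the same effect would typically be obtained by disabling gradients for that model or by stepping only the other model's optimizer.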