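Algorithm 1 Knowledge Distillation
1: Input: teacher model $f_t$, student model $f_s$, temperature $T$, balancing factor $\alpha$
2: for each batch of input data $x$ with labels $y$ do
3:     Forward pass $f_t$ and $f_s$ to obtain logits $z_t$ and $z_s$
4:     Compute softened probabilities: $p_t = \mathrm{softmax}(z_t / T)$, $p_s = \mathrm{softmax}(z_s / T)$
5:     Calculate the distillation loss $\mathcal{L}_{KD} = T^2 \, \mathrm{KL}(p_t \,\|\, p_s)$
6:     Calculate the student loss $\mathcal{L}_{CE} = \mathrm{CE}(y, \mathrm{softmax}(z_s))$
7:     Compute total loss: $\mathcal{L} = \alpha \mathcal{L}_{KD} + (1 - \alpha) \mathcal{L}_{CE}$
8:     Update $f_s$ by minimizing $\mathcal{L}$
9: end for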
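The loss computation in steps 3 to 7 can be illustrated with a minimal pure-Python sketch for a single example. The function names (`log_softmax`, `distillation_loss`), the $T^2$ scaling of the distillation term, and the choice of weighting the distillation term by the balancing factor are assumptions made for illustration, not details confirmed by the algorithm listing. The parameter update in step 8 is omitted, since in practice it would be handled by an automatic-differentiation framework.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def distillation_loss(teacher_logits, student_logits, label, T=2.0, alpha=0.5):
    """Return (total, kd, ce) for one example.

    Assumptions (hypothetical, for illustration only):
      * kd = T**2 * KL(p_t || p_s) on temperature-softened distributions
      * ce = cross-entropy of the student at temperature 1 against `label`
      * total = alpha * kd + (1 - alpha) * ce
    """
    log_p_t = log_softmax(teacher_logits, T)  # softened teacher (step 4)
    log_p_s = log_softmax(student_logits, T)  # softened student (step 4)

    # KL(p_t || p_s) = sum_i p_t[i] * (log p_t[i] - log p_s[i])
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    kd = (T ** 2) * kl

    # Cross-entropy against the ground-truth class index.
    ce = -log_softmax(student_logits)[label]

    total = alpha * kd + (1.0 - alpha) * ce
    return total, kd, ce


if __name__ == "__main__":
    total, kd, ce = distillation_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0], label=0)
    print(f"total={total:.4f} kd={kd:.4f} ce={ce:.4f}")
```

When the student logits equal the teacher logits, the distillation term vanishes and only the weighted cross-entropy term remains, which is a quick sanity check for any implementation of this loss.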