Tiny Language Models for Automation and Control: Overview, Potential Applications, and Future Research Directions

2025 Feb 21;25(5):1318. doi: 10.3390/s25051318

Algorithm 1 Knowledge Distillation

1: Input: Teacher model $M_{t}$, student model $M_{s}$, temperature $T$, balancing factor $\alpha$
2: for each batch of input data $(x, y)$ do
3:     Forward pass $M_{t}$ and $M_{s}$ to obtain logits $z_{t}$ and $z_{s}$
4:     Compute softened probabilities: $P_{t} = \mathrm{softmax}(z_{t} / T)$, $P_{s} = \mathrm{softmax}(z_{s} / T)$
5:     Calculate $L_{\mathrm{task}} = \mathrm{CE}(y, P_{s})$
6:     Calculate $L_{\mathrm{distill}} = T^{2} \cdot \mathrm{KL}(P_{t} \,\|\, P_{s})$
7:     Compute total loss: $L = (1 - \alpha) \cdot L_{\mathrm{task}} + \alpha \cdot L_{\mathrm{distill}}$
8:     Update $M_{s}$ by minimizing $L$
9: end for
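
The loss computation in steps 4 to 7 of Algorithm 1 can be illustrated with a minimal Python sketch for a single training example. It uses only the standard library and is not the authors' implementation. The function names (`softmax`, `cross_entropy`, `kl_divergence`, `distillation_loss`) and the example logits are hypothetical, and the label $y$ is assumed to be an integer class index. The sketch follows the algorithm as written and computes the task loss on the temperature-softened student probabilities $P_{s}$. The parameter update in step 8 is omitted, since it depends on the optimizer and the model architecture.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max avoids overflow."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(label, probs):
    """CE(y, P) for an integer class label y (assumed encoding)."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes.

    Terms with p_i == 0 contribute nothing; q_i is assumed positive,
    which holds for softmax outputs unless they underflow.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(teacher_logits, student_logits, label, temperature, alpha):
    """Total loss L for one example, following steps 4 to 7 of Algorithm 1."""
    # Step 4: softened probabilities for teacher and student.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # Step 5: task loss against the ground-truth label.
    l_task = cross_entropy(label, p_s)
    # Step 6: distillation loss, scaled by T^2.
    l_distill = temperature ** 2 * kl_divergence(p_t, p_s)
    # Step 7: weighted combination of the two terms.
    return (1.0 - alpha) * l_task + alpha * l_distill


# Hypothetical logits for a three-class problem; the true class is index 2.
loss = distillation_loss(
    teacher_logits=[1.0, 2.0, 4.0],
    student_logits=[0.5, 1.5, 2.0],
    label=2,
    temperature=2.0,
    alpha=0.5,
)
print(f"total loss: {loss:.4f}")
```

Two limiting cases serve as sanity checks. With $\alpha = 0$ the loss reduces to the task loss alone, and with $\alpha = 1$ it vanishes whenever the student logits equal the teacher logits, because the KL term is then zero.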