2025 Feb 21;25(5):1318. doi: 10.3390/s25051318
Algorithm 1 Knowledge Distillation
1: Input: teacher model $M_t$, student model $M_s$, temperature $T$, balancing factor $\alpha$
2: for each batch of input data $(x, y)$ do
3:     Forward pass $M_t$ and $M_s$ to obtain logits $z_t$ and $z_s$
4:     Compute softened probabilities: $P_t = \mathrm{softmax}(z_t / T)$, $P_s = \mathrm{softmax}(z_s / T)$
5:     Calculate $L_{\text{task}} = \mathrm{CE}(y, P_s)$
6:     Calculate $L_{\text{distill}} = T^2 \cdot \mathrm{KL}(P_t \parallel P_s)$
7:     Compute total loss: $L = (1 - \alpha) \cdot L_{\text{task}} + \alpha \cdot L_{\text{distill}}$
8:     Update $M_s$ by minimizing $L$
9: end for
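Why step 6 multiplies the KL term by $T^2$ can be seen by differentiating each loss term with respect to the student logits. This is a standard derivation added for illustration, not taken from the source; the large-$T$ approximation additionally assumes zero-mean logits over $N$ classes.

```latex
\frac{\partial L_{\text{task}}}{\partial z_{s,i}} = \frac{P_{s,i} - y_i}{T},
\qquad
\frac{\partial\, \mathrm{KL}(P_t \parallel P_s)}{\partial z_{s,i}} = \frac{P_{s,i} - P_{t,i}}{T}

% For large T with \sum_j z_j = 0:  P_{s,i} - P_{t,i} \approx \frac{z_{s,i} - z_{t,i}}{N T}, hence
\frac{\partial\, \mathrm{KL}(P_t \parallel P_s)}{\partial z_{s,i}} \approx \frac{z_{s,i} - z_{t,i}}{N T^2}
\quad\Longrightarrow\quad
\frac{\partial L_{\text{distill}}}{\partial z_{s,i}} \approx \frac{z_{s,i} - z_{t,i}}{N}

% Combined gradient used to update M_s in step 8:
\frac{\partial L}{\partial z_{s,i}} = (1 - \alpha)\,\frac{P_{s,i} - y_i}{T} + \alpha\, T\,(P_{s,i} - P_{t,i})
```

Without the $T^2$ factor, the soft-target gradient would shrink roughly as $1/T^2$, so changing the temperature would silently change the effective weight $\alpha$.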
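Algorithm 1 can be sketched in plain Python (standard library only) under simplifying assumptions. The teacher and student are hypothetical linear models (`linear_logits`, weight matrices `teacher_W` and `student_W`), so step 8 can use a hand-derived gradient instead of an autodiff framework. All function names and the defaults `T=4.0`, `alpha=0.7`, `lr=0.1` are illustrative choices, not values from the paper. As in step 5, the task loss here uses the softened $P_s$; Hinton et al.'s original formulation evaluates the hard-label term at $T = 1$ instead.

```python
import math
import random


def softmax(logits, T=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_onehot, p):
    """CE(y, p) = -sum_i y_i log p_i."""
    return -sum(yi * math.log(pi) for yi, pi in zip(y_onehot, p) if yi > 0)


def kl_div(p, q):
    """KL(p || q); entries with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def linear_logits(W, x):
    """Hypothetical stand-in model: logits z = W x (one row of W per class)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def kd_loss_and_grad(z_t, z_s, y_onehot, T, alpha):
    """Steps 4-7 for one example, plus dL/dz_s for the update in step 8."""
    p_t = softmax(z_t, T)                              # step 4
    p_s = softmax(z_s, T)
    l_task = cross_entropy(y_onehot, p_s)              # step 5
    l_distill = T * T * kl_div(p_t, p_s)               # step 6
    loss = (1 - alpha) * l_task + alpha * l_distill    # step 7
    # Analytic gradients w.r.t. the student logits:
    #   d CE(y, softmax(z_s/T)) / dz_s            = (p_s - y) / T
    #   d [T^2 KL(p_t || softmax(z_s/T))] / dz_s  = T * (p_s - p_t)
    grad = [(1 - alpha) * (ps - yi) / T + alpha * T * (ps - pt)
            for ps, pt, yi in zip(p_s, p_t, y_onehot)]
    return loss, grad


def distill(teacher_W, student_W, batches, T=4.0, alpha=0.7, lr=0.1):
    """Steps 2-9: one plain gradient-descent update of the student per batch."""
    for batch in batches:
        grad_W = [[0.0] * len(row) for row in student_W]
        for x, y in batch:
            z_t = linear_logits(teacher_W, x)  # teacher stays frozen
            z_s = linear_logits(student_W, x)
            _, g = kd_loss_and_grad(z_t, z_s, y, T, alpha)
            for i, gi in enumerate(g):         # chain rule: dL/dW[i][j] = g_i * x_j
                for j, xj in enumerate(x):
                    grad_W[i][j] += gi * xj / len(batch)
        for i, row in enumerate(student_W):
            for j in range(len(row)):
                row[j] -= lr * grad_W[i][j]
    return student_W


def mean_loss(teacher_W, student_W, data, T=4.0, alpha=0.7):
    """Average total loss L over a list of (x, y) pairs."""
    losses = [kd_loss_and_grad(linear_logits(teacher_W, x), linear_logits(student_W, x),
                               y, T, alpha)[0] for x, y in data]
    return sum(losses) / len(losses)


# Toy usage: a fixed 3-class linear "teacher" labels random 2-D points.
random.seed(0)
teacher_W = [[2.0, -1.0], [-1.0, 2.0], [0.5, 0.5]]
data = []
for _ in range(160):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    z = linear_logits(teacher_W, x)
    y = [1.0 if k == z.index(max(z)) else 0.0 for k in range(3)]
    data.append((x, y))
batches = [data[i:i + 16] for i in range(0, len(data), 16)]

student_W = [[0.0, 0.0] for _ in range(3)]
before = mean_loss(teacher_W, student_W, data)
for _ in range(20):  # 20 passes over the batches
    distill(teacher_W, student_W, batches)
after = mean_loss(teacher_W, student_W, data)
print(f"mean loss before: {before:.4f}, after: {after:.4f}")
```

Only `student_W` is modified, mirroring step 8, which updates $M_s$ alone. In a deep-learning framework the hand-written gradient would be replaced by automatic differentiation, and the per-example loops by batched tensor operations.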