Figure 8.

Convergence of teacher and student models. The student model trained with KD converges faster and reaches lower loss values than the teacher model.
