Fig. 6.
Learned weights after label-first training mirror the conditional probabilities of features given a label (in this case, “dog”). Here, features that are less frequent in dogs (barking and big) receive lower weights than features that are more frequent in dogs (small and tail-wagging). This differs from weight development in object-first training (Fig. 5), where weights reflect the relevance of features for discrimination (in that case, the size features are less relevant than the other features).
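The label-first pattern in this caption can be illustrated with a small simulation. This is a minimal sketch that assumes a Rescorla-Wagner-style error-driven update in which the label is the only cue and each feature is an outcome to be predicted. The exemplars (`DOGS`), learning rate, and epoch count are hypothetical and are not the stimuli or parameters behind the figure. Because a single cue carries all of the prediction, each feature weight settles near that feature's relative frequency among dogs, which is P(feature | dog).

```python
from collections import Counter

# Hypothetical dog exemplars (NOT the stimuli used for the figure): each set
# lists the features present on one label-first trial.
DOGS = [
    {"small", "tail-wagging", "barking"},
    {"small", "tail-wagging"},
    {"small", "tail-wagging"},
    {"big", "tail-wagging", "barking"},
]
FEATURES = ["small", "big", "tail-wagging", "barking"]


def train_label_first(exemplars, features, rate=0.01, epochs=2000):
    """Error-driven (Rescorla-Wagner-style) learning with the label as sole cue.

    In label-first order the label "dog" predicts the features, so each
    feature is an outcome: target = 1 if present on the trial, else 0.
    With one cue, the prediction for a feature is just its weight.
    """
    weights = {f: 0.0 for f in features}
    for _ in range(epochs):
        for present in exemplars:
            for f in features:
                target = 1.0 if f in present else 0.0
                # Move the weight a fraction `rate` toward the observed outcome.
                weights[f] += rate * (target - weights[f])
    return weights


def conditional_probs(exemplars, features):
    """Empirical P(feature | dog) over the exemplar set."""
    counts = Counter(f for ex in exemplars for f in ex)
    return {f: counts[f] / len(exemplars) for f in features}


if __name__ == "__main__":
    w = train_label_first(DOGS, FEATURES)
    p = conditional_probs(DOGS, FEATURES)
    for f in FEATURES:
        print(f"{f:13s} weight={w[f]:.3f}  P(f|dog)={p[f]:.2f}")
```

With a small learning rate the final weights sit close to the conditional probabilities (about 0.75 for small, 0.25 for big, 1.0 for tail-wagging, 0.5 for barking in this toy set), so the frequent features end up with higher weights than the infrequent ones, as the caption describes.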