eLife. 2022 Jun 16;11:e76218. doi: 10.7554/eLife.76218

Figure 3. The validation of Selfee (Self-supervised Features Extraction) with human annotations.

(A) Visualization of fly courtship live-frames with t-SNE dimension reduction. Each dot is colored based on human annotations: points representing chasing, wing extension, copulation attempt, copulation, and non-interactive behaviors (‘others’) are colored yellow, green, blue, violet, and red, respectively. (B) The confusion matrix of the k-NN classifier for fly courtship behaviors, normalized by the number of frames of each behavior in the ground truth. The average F1 score of the sevenfold cross-validation was 72.4%, and the mAP was 75.8%. The recall of each class of behaviors is indicated on the diagonal of the confusion matrix. (C) A visualized comparison of labels produced by the k-NN classifier and human annotations of fly courtship behaviors. The k-NN classifier was constructed with the data and labels of all seven videos used in the cross-validation; its F1 score was 76.1% and its mAP was 76.1%. (D) Visualization of live-frames of mice mating behaviors with t-SNE dimension reduction. Each dot is colored based on human annotations: points representing non-interactive behaviors (‘others’), social interest, mounting, intromission, and ejaculation are colored red, yellow, green, blue, and violet, respectively. (E) The confusion matrix of the LightGBM (Light Gradient Boosting Machine) classifier for mice mating behaviors, normalized by the number of frames of each behavior in the ground truth. The average F1 score of the eightfold cross-validation was 67.4%, and the mAP was 69.1%. The recall of each class of behaviors is indicated on the diagonal of the confusion matrix. (F) A visualized comparison of labels produced by the LightGBM classifier and human annotations of mice mating behaviors. An ensemble of eight trained LightGBM classifiers was used; its F1 score was 68.1%, and mAP was not available for this ensemble classifier because of its voting mechanism.
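For readers unfamiliar with the evaluation protocol reported above, the sketch below illustrates how a k-fold cross-validated k-NN classifier can be scored with a macro F1 score and mAP. It is a minimal illustration with scikit-learn, assuming frame embeddings `X` and integer behavior labels `y`; the number of neighbors, the stratified splitting, and the macro averaging are assumptions for illustration, not the authors' exact settings.

```python
# Minimal sketch: k-fold cross-validated k-NN evaluation with macro F1 and mAP.
# `X` (n_frames, n_dims) and `y` (n_frames,) are assumed inputs; k=5 is an
# assumed neighbor count, not a setting taken from the paper.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score, average_precision_score
from sklearn.preprocessing import label_binarize

def evaluate_knn(X, y, n_splits=7, k=5):
    classes = np.unique(y)
    f1s, maps = [], []
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, test_idx in cv.split(X, y):
        clf = KNeighborsClassifier(n_neighbors=k).fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        proba = clf.predict_proba(X[test_idx])
        # Macro F1: the unweighted mean of per-class F1 scores.
        f1s.append(f1_score(y[test_idx], pred, average="macro"))
        # mAP: the mean over classes of the one-vs-rest average precision.
        y_bin = label_binarize(y[test_idx], classes=classes)
        maps.append(average_precision_score(y_bin, proba, average="macro"))
    return float(np.mean(f1s)), float(np.mean(maps))
```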


Figure 3—figure supplement 1. Selfee (Self-supervised Features Extraction) captured fine-grained features related to animal postures and positions.


(A) Visualization of fly courtship live-frames with t-SNE dimension reduction. Each dot is colored based on human annotations: points representing non-interactive behaviors (‘others’), chasing, wing extension, copulation attempt, and copulation are colored red, yellow, green, blue, and violet, respectively. Same as Figure 3A. (B) Fly skeletons were labeled semi-automatically, and three features were used in the following panels. Five body parts (head, tail, thorax, and the two wings) were marked with points, and the head-to-tail and thorax-to-wing connections were drawn as lines. Females are indicated in blue and males in orange. The three features, indicated in violet, were the male-head-to-female-tail distance, the male-thorax-to-chamber-center distance, and the angle between the male wings. (C) Male wing angles were visualized on the same t-SNE map as panel A. Frames of wing extension behaviors in the red box had relatively smaller wing angles than those in the orange box. Examples from these two groups are shown on the right, with the angle value below each image. (D) Male-head-to-female-tail distances were visualized on the same t-SNE map as panel A. Frames of wing extension behaviors in the red box had relatively shorter distances than those in the orange and violet boxes. Examples from these three groups are shown on the right, with the distance value below each image. (E) Male-thorax-to-chamber-center distances were visualized on the same t-SNE map as panel A. Frames of chasing behaviors in the violet box had relatively shorter distances than those in the orange box. Examples from these two groups are shown on the right, with the distance value below each image.
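The three skeleton features in panel B are simple geometric quantities, and the sketch below shows one way to compute them from tracked 2D keypoints. All coordinate names are hypothetical placeholders for (x, y) positions, not identifiers from the authors' code.

```python
# Sketch of the three skeleton features from panel B, computed from assumed
# 2D keypoint coordinates (hypothetical names, not the authors' variables).
import numpy as np

def head_to_tail_distance(male_head, female_tail):
    # Euclidean distance between the male head and the female tail.
    return np.linalg.norm(np.asarray(male_head) - np.asarray(female_tail))

def thorax_to_center_distance(male_thorax, chamber_center):
    # Euclidean distance between the male thorax and the chamber center.
    return np.linalg.norm(np.asarray(male_thorax) - np.asarray(chamber_center))

def wing_angle(thorax, left_wing, right_wing):
    # Angle (degrees) between the two thorax-to-wing-tip vectors.
    v1 = np.asarray(left_wing) - np.asarray(thorax)
    v2 = np.asarray(right_wing) - np.asarray(thorax)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```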
Figure 3—figure supplement 2. Difficulties in fly courtship behavior classification.


(A) Some wing extension frames are hard to distinguish from chasing behaviors. Images in the first row were labeled as no wing extension; images in the second to fourth rows were labeled as wing extension. Images in the second row show relatively weak wing extension (blue indicators point at slightly extended wings), and the fourth row shows a progression from no wing extension to strong wing extension. (B) The confusion matrix of the k-NN classifier, normalized by the number of frames of each behavior in the inferred labels. The average F1 score of the sevenfold cross-validation was 72.4%, and the mAP was 75.8%. The precision of each class of behaviors is indicated on the diagonal of the confusion matrix.
Figure 3—figure supplement 3. Classification of mice mating behaviors with Selfee (Self-supervised Features Extraction) extracted features.


(A) The confusion matrix of the k-NN classifier, normalized by the number of frames of each behavior in the ground truth. The average F1 score of the eightfold cross-validation was 59.0%, and the mAP was 53.0%. The recall of each class of behaviors is indicated on the diagonal of the confusion matrix. (B) The confusion matrix of the k-NN classifier, normalized by the number of frames of each behavior in the inferred labels. The precision of each class of behaviors is indicated on the diagonal of the confusion matrix. (C) The confusion matrix of the LightGBM (Light Gradient Boosting Machine) classifier, normalized by the number of frames of each behavior in the inferred labels. The precision of each class of behaviors is indicated on the diagonal of the confusion matrix. The LightGBM classifier performed much better than the k-NN classifier.
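Because these panels rely on two different confusion-matrix normalizations, the following sketch makes the distinction explicit: normalizing each row by the ground-truth counts puts per-class recall on the diagonal, whereas normalizing each column by the inferred-label counts puts per-class precision there. `y_true` and `y_pred` are assumed integer label arrays.

```python
# Illustration of the two confusion-matrix normalizations used in these panels.
# Rows of sklearn's confusion matrix are ground-truth classes; columns are
# predicted (inferred) classes.
import numpy as np
from sklearn.metrics import confusion_matrix

def normalized_confusion(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred).astype(float)
    recall_cm = cm / cm.sum(axis=1, keepdims=True)     # diagonal = per-class recall
    precision_cm = cm / cm.sum(axis=0, keepdims=True)  # diagonal = per-class precision
    return recall_cm, precision_cm
```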
Figure 3—figure supplement 4. k-NN classification of rat behaviors with Selfee (Self-supervised Features Extraction) trained on mice datasets.


(A) The confusion matrix of the k-NN classifier, normalized by the number of frames of each behavior in the ground truth. The average F1 score of the ninefold cross-validation was 49.6%, and the mAP was 46.6%. The recall of each class of behaviors is indicated on the diagonal of the confusion matrix. (B) The confusion matrix of the k-NN classifier, normalized by the number of frames of each behavior in the inferred labels. The precision of each class of behaviors is indicated on the diagonal of the confusion matrix.
Figure 3—figure supplement 5. Ablation test of Selfee (Self-supervised Features Extraction) training process on fly datasets.


(A) The distribution of different behaviors in wild-type fly courtship videos. (B) Visualization of the same live-frames as in Figure 3A with t-SNE dimension reduction. The representations used were extracted by models trained without the cross-level instance-group discrimination (CLD) loss. Each dot is colored based on human annotations. The legend is shared with panel A. (C) The confusion matrix of the k-NN classifier, normalized by the number of frames of each behavior in the ground truth. The recall of each class of behaviors is indicated on the diagonal of the confusion matrix. The representations used were extracted by models trained without the CLD loss. (D) Collapse levels during the training process. The collapse level was calculated as one minus the average standard deviation of each channel of the representation multiplied by the square root of the channel number: one means maximum collapse, while zero means no collapse. Without the CLD loss, Selfee suffered from catastrophic mode collapse. Details of the collapse-level calculation can be found in Materials and methods. (E) Visualization of the same live-frames as in Figure 3A with t-SNE dimension reduction. The representations used were extracted by models trained without the Turbo transformation. Each dot is colored based on human annotations. The legend is shared with panel A. (F) The confusion matrix of the k-NN classifier, normalized by the number of frames of each behavior in the ground truth. The recall of each class of behaviors is indicated on the diagonal of the confusion matrix. The representations used were extracted by models trained without the Turbo transformation.
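A minimal sketch of the collapse-level metric in panel D, following the stated formula (one minus the average per-channel standard deviation multiplied by the square root of the channel number). The L2 normalization of each representation before computing the statistic is an assumption based on common practice for this diagnostic, not a detail stated in this legend; see Materials and methods for the authors' exact calculation.

```python
# Sketch of the collapse-level diagnostic described in panel D. `reps` is an
# assumed (n_samples, n_channels) array of Selfee representations.
import numpy as np

def collapse_level(reps):
    # Assumption: representations are L2-normalized per sample first.
    z = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    mean_std = z.std(axis=0).mean()  # average standard deviation across channels
    d = z.shape[1]
    # For non-collapsed, roughly isotropic features, mean_std * sqrt(d) is
    # close to 1, giving a collapse level near 0; full collapse (all samples
    # mapped to one point) gives mean_std = 0 and a collapse level of 1.
    return 1.0 - mean_std * np.sqrt(d)
```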
Figure 3—video 1. Pose estimation of fly courtship behaviors.
Flies’ wings, heads, tails, and thoraxes were tracked automatically throughout the clip using the animal tracking software SLEAP, and each frame was carefully proofread manually by human researchers.