Nat Genet. Author manuscript; available in PMC: 2022 Jul 11.
Published in final edited form as: Nat Genet. 2022 Feb 10;54(3):349–357. doi: 10.1038/s41588-021-01010-x

Table 1: Performance comparison between classification and clustering with different encoders on sets of known disorders.

Test set      Model              Gallery images  Test images  Supported syndromes  Null top-1 accuracy  Top-1   Top-5   Top-10  Top-30
F2G-frequent  Enc-F2G (softmax)  -               2,669        299                  0.33%                35.94%  52.45%  63.91%  78.13%
F2G-frequent  Enc-F2G            19,950          2,669        299                  0.33%                21.06%  39.62%  49.12%  67.98%
F2G-frequent  Enc-healthy        19,950          2,669        299                  0.33%                10.69%  23.69%  31.46%  50.80%

F2G-rare      Enc-F2G            2,348.8         1,183.3      816                  0.12%                13.66%  23.62%  29.56%  40.94%
F2G-rare      Enc-healthy        2,348.8         1,183.3      816                  0.12%                9.46%   16.87%  21.77%  31.77%

F2G-frequent  Enc-F2G            22,298 [a]      2,669        1,115 [c]            0.09%                20.15%  37.81%  46.85%  64.21%
F2G-frequent  Enc-healthy        22,298 [a]      2,669        1,115 [c]            0.09%                9.70%   22.51%  29.80%  48.24%

F2G-rare      Enc-F2G            22,298.8 [b]    1,183.3      1,115 [c]            0.09%                7.07%   14.19%  17.67%  24.41%
F2G-rare      Enc-healthy        22,298.8 [b]    1,183.3      1,115 [c]            0.09%                4.02%   8.84%   11.73%  16.61%

The deep convolutional neural networks of Enc-F2G (softmax), Enc-F2G, and Enc-healthy share the same architecture. Enc-F2G (softmax) and Enc-F2G were first trained on CASIA-WebFace and then fine-tuned on photos of patients in the Face2Gene frequent set. Enc-F2G (softmax) is the same model as Enc-F2G, but it ranks syndromes by the softmax outputs of its final layer instead of by the cosine distances between the facial phenotype descriptors (FPDs) in the clinical face phenotype space (CFPS). In the top-1 to top-30 columns, the best performance in each set is achieved by Enc-F2G (softmax) on the F2G-frequent set and by Enc-F2G in every other set. The numbers of images and syndromes in the rare set are averaged over ten splits. Enc-F2G outperformed Enc-healthy on both frequent and rare syndromes, showing the importance of fine-tuning on patient photos for learning facial dysmorphic features. When the gallery is enlarged with additional cases and the number of supported syndromes almost quadruples (from 299 to 1,115), the top-10 accuracy of Enc-F2G drops by only 2.27 percentage points (from 49.12% to 46.85%).
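To make the distance-based ("clustering") evaluation more concrete, here is a minimal sketch of how a ranked syndrome list and a top-k accuracy could be derived from cosine distances between descriptors. It rests on several assumptions that are not taken from the table: descriptors are plain Python lists, each syndrome is scored by the distance from the test descriptor to that syndrome's nearest gallery image, and the function names (`cosine_distance`, `rank_syndromes`, `top_k_accuracy`) and the toy 2-D data are hypothetical. The actual pipeline may aggregate gallery distances differently.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def rank_syndromes(query, gallery):
    """Rank syndrome labels by distance from `query` to each syndrome's nearest gallery image.

    Hypothetical format: `gallery` is a list of (syndrome_label, descriptor) pairs.
    """
    nearest = {}
    for label, descriptor in gallery:
        d = cosine_distance(query, descriptor)
        # Keep only the closest gallery image per syndrome (an assumption).
        if label not in nearest or d < nearest[label]:
            nearest[label] = d
    # Smallest distance first.
    return sorted(nearest, key=nearest.get)


def top_k_accuracy(test_set, gallery, k):
    """Fraction of test images whose true syndrome appears among the k top-ranked syndromes."""
    hits = sum(
        true_label in rank_syndromes(query, gallery)[:k]
        for true_label, query in test_set
    )
    return hits / len(test_set)


# Toy 2-D "descriptors" for three hypothetical syndromes A, B and C.
gallery = [("A", [1.0, 0.0]), ("A", [0.9, 0.1]), ("B", [0.0, 1.0]), ("C", [-1.0, 0.0])]
test_set = [("A", [0.8, 0.2]), ("B", [0.6, 0.4]), ("C", [-0.8, 0.6])]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(test_set, gallery, k):.2%}")
```

On this toy data the B test image ranks second (behind A), so top-1 accuracy is 66.67% while top-2 and top-3 accuracy are 100.00%. A softmax classifier, by contrast, has a fixed output layer and can in general only rank the syndromes it was trained on, whereas a distance-based ranking can cover a new syndrome once its images are added to the gallery.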

[a] Number of images in the frequent gallery + rare gallery.
[b] Number of images in the frequent gallery + rare gallery, averaged over ten splits.
[c] Number of syndromes in the frequent gallery + rare gallery.
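Several numbers in Table 1 can be cross-checked with simple arithmetic. The sketch below recomputes the combined gallery size and syndrome count described in the footnotes, the caption's top-10 drop and syndrome growth, and the "Null top-1 accuracy" column. That last check assumes, as a hypothesis consistent with the reported values, that the null accuracy is the chance of picking the correct syndrome uniformly at random, i.e. 1 / (number of supported syndromes); the helper `null_top1_accuracy` is hypothetical.

```python
# Values copied from Table 1. Rare-set counts are averages over ten splits.
frequent_gallery_images = 19_950
rare_gallery_images = 2_348.8
frequent_syndromes = 299
rare_syndromes = 816

# Footnotes: the combined gallery is the frequent gallery plus the rare gallery.
combined_images = frequent_gallery_images + rare_gallery_images
combined_syndromes = frequent_syndromes + rare_syndromes


def null_top1_accuracy(n_syndromes: int) -> float:
    """Chance-level top-1 accuracy (percent) when guessing uniformly among n syndromes.

    Assumption: this is how the "Null top-1 accuracy" column is defined.
    """
    return 100.0 / n_syndromes


# Caption claims: top-10 drop for Enc-F2G and growth in supported syndromes.
top10_drop = round(49.12 - 46.85, 2)
syndrome_growth = combined_syndromes / frequent_syndromes

print(f"Combined gallery images: {combined_images:,.1f}")
print(f"Combined syndromes: {combined_syndromes:,}")
for n in (frequent_syndromes, rare_syndromes, combined_syndromes):
    print(f"Null top-1 accuracy for {n:,} syndromes: {null_top1_accuracy(n):.2f}%")
print(f"Top-10 drop: {top10_drop} percentage points")
print(f"Syndrome growth: {syndrome_growth:.2f}x")
```

The script reproduces 22,298.8 combined gallery images, 1,115 combined syndromes, null top-1 accuracies of 0.33%, 0.12%, and 0.09%, a top-10 drop of 2.27 percentage points, and a growth factor of about 3.73 in supported syndromes.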