Abstract
Background
Collectively, an estimated 5% of the population have a genetic disease. Many of them feature characteristics that can be detected by facial phenotyping. Face2Gene CLINIC is an online app for facial phenotyping of patients with genetic syndromes. DeepGestalt, the neural network driving Face2Gene, automatically prioritizes syndrome suggestions based on ordinary patient photographs, potentially improving the diagnostic process. Hitherto, studies on DeepGestalt’s quality highlighted its sensitivity in syndromic patients. However, determining the accuracy of a diagnostic methodology also requires testing of negative controls.
Objective
The aim of this study was to evaluate DeepGestalt's accuracy with photos of individuals with and without a genetic syndrome. Moreover, we aimed to propose a machine learning–based framework for the automated differentiation of DeepGestalt’s output on such images.
Methods
Frontal facial images of individuals with a diagnosis of a genetic syndrome (established clinically or molecularly) from a convenience sample were reanalyzed. Each photo was matched by age, sex, and ethnicity to a picture featuring an individual without a genetic syndrome. Absence of a facial gestalt suggestive of a genetic syndrome was determined by physicians working in medical genetics. Photos were selected from online reports or were taken by us for the purpose of this study. Facial phenotype was analyzed by DeepGestalt version 19.1.7, accessed via Face2Gene CLINIC. Furthermore, we designed linear support vector machines (SVMs) using Python 3.7 to automatically differentiate between the 2 classes of photographs based on DeepGestalt's result lists.
Results
We included photos of 323 patients diagnosed with 17 different genetic syndromes and matched those with an equal number of facial images without a genetic syndrome, analyzing a total of 646 pictures. We confirm DeepGestalt’s high sensitivity (top 10 sensitivity: 295/323, 91%). DeepGestalt’s syndrome suggestions in individuals without a craniofacially dysmorphic syndrome followed a nonrandom distribution. A total of 17 syndromes appeared in the top 30 suggestions of more than 50% of nondysmorphic images. DeepGestalt’s top scores differed between the syndromic and control images (area under the receiver operating characteristic [AUROC] curve 0.72, 95% CI 0.68-0.76; P<.001). A linear SVM running on DeepGestalt’s result vectors showed stronger differences (AUROC 0.89, 95% CI 0.87-0.92; P<.001).
Conclusions
DeepGestalt fairly separates images of individuals with and without a genetic syndrome. This separation can be significantly improved by SVMs running on top of DeepGestalt, thus supporting the diagnostic process of patients with a genetic syndrome. Our findings facilitate the critical interpretation of DeepGestalt’s results and may help enhance it and similar computer-aided facial phenotyping tools.
Keywords: facial phenotyping, DeepGestalt, facial recognition, Face2Gene, medical genetics, diagnostic accuracy, genetic syndrome, machine learning
Introduction
Background
Although individual genetic diseases are rare, they collectively affect an estimated 5% of a population [1]. Thus, these diseases represent a major challenge for health care systems, as it usually requires highly specialized knowledge to propose a specific genetic diagnosis. Assessing the facial phenotypes of patients with genetic syndromes is key to this diagnostic process [2]. Although this assessment is traditionally performed by a physician, the advent of computer vision and machine learning in medicine enables rapid and automated assessment of a patient's facial traits [3,4]. Numerous facial phenotyping systems have been developed with the potential to aid the diagnostic processes in medical genetics [5-12]. DeepGestalt, the neural network behind Face2Gene CLINIC, which was trained on more than 17,106 images, is thus far the best-investigated and most convenient-to-use application [11]. Several studies assessed the algorithm's sensitivity, suggesting a good general sensitivity in syndromic patients [11,13-38]. These tests predominantly analyzed images of patients diagnosed with a genetic disorder known to show characteristic facial features. This appears reasonable as DeepGestalt is designed to identify such syndromes. However, it might introduce a bias in conclusions about the system's everyday clinical use since not all individuals seen in a real-life setting belong to the group of patients included in previous studies of DeepGestalt. This may be because (1) the featured syndrome is yet to be analyzed by the system; (2) an individual features a syndrome not associated with a characteristic facies; or (3) an individual has no syndrome at all.
In addition to such evaluations of DeepGestalt's sensitivity, there is a need for studies on its specificity when tested on individuals without craniofacial dysmorphism. As DeepGestalt is not designed to suggest the class label “inconspicuous face” [11], evaluating its clinical specificity is not a trivial task. Some studies tested the ability of DeepGestalt's methodology to distinguish between facial images with and without a genetic syndrome by constructing user-specific neural networks trained on healthy control images and on images of limited numbers of well-selected genetic disorders using Face2Gene RESEARCH [20,26-28,30,32,34,39-41]. Their results suggested that neural networks such as DeepGestalt may have the potential to differentiate between the 2 classes and may thus be used in diagnosing patients in medical genetics. Such a test could be applied at different stages of the diagnostic process. Patients who want to know if genetic counseling is necessary could use it as a triage test to check whether a suspicion of a genetic disease is justified. Physicians and other medical professionals could similarly use such a test on patients suspected of having a genetic syndrome to narrow down the range of possible diagnoses. Geneticists could use it as an add-on test to further confirm a diagnosis, for example, in the presence of a variant of unknown significance.
Objectives
We aimed to systematically benchmark DeepGestalt’s power to discern images of individuals with a dysmorphic genetic syndrome from images of healthy control individuals. For this purpose, we tested the basic prerequisite for the diagnostic usefulness of DeepGestalt, that is, to yield different scores in persons with a conventionally established diagnosis of a genetic syndrome than in persons without a genetic syndrome (H1: µsyndromic ≠ µhealthy). We also determined DeepGestalt’s capacity to distinguish those images by measuring its area under the receiver operating characteristic (AUROC) curve. Furthermore, we aimed to develop and test a machine learning–based approach to improve DeepGestalt's accuracy.
Methods
Selection and Analysis of Portrait Photos
Study Design
To be included in this study, portrait photos had to depict the entire frontal face (from hairline to chin showing both eyes) and no artifact other than glasses. To achieve a vertical positioning of the face, the images were cropped and rotated if necessary. A convenience sample of online accessible images was collected between September 2019 and December 2019, using a methodology adjusted from Ferry et al [8]. Pictures photographed by us were taken at the 2018 meeting of the Elterninitiative Apertsyndrom und Verwandte Fehlbildungen eV, a parents’ initiative on Apert syndrome and related disorders in Germany, after obtaining written informed consent as approved by the ethics committee of the Charité – Universitätsmedizin Berlin (EA2/190/16). Image inclusion was planned before conducting analysis by DeepGestalt. A required sample size of 105 images each for the positive and negative class (N=210) was calculated using G*Power, version 3.1.9.7 (effect size 0.5; α=.05; power 0.95; allocation ratio 1).
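As a rough cross-check of the sample size calculation, the following sketch uses a normal approximation with a small-sample correction instead of the noncentral t distribution that G*Power uses. It assumes a two-sided, two-sample t test with 1:1 allocation (the test family is an assumption here; the text gives only effect size, α, power, and allocation ratio), and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_n_per_group(effect_size, alpha, power):
    """Approximate per-group n for a two-sided, two-sample t test (1:1 allocation).

    Normal approximation plus the z_alpha^2 / 4 small-sample correction;
    this is not the exact noncentral-t computation performed by G*Power.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)


n_per_group = approx_n_per_group(effect_size=0.5, alpha=0.05, power=0.95)
print(n_per_group, 2 * n_per_group)
```

With the parameters stated above, this approximation yields 105 images per class (210 in total), in line with the reported calculation.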
Defining Reference Phenotypes
Only images of individuals reported to be clinically or molecularly diagnosed with a genetic syndrome were labeled as syndromic. When no syndrome was reported and no facial gestalt suggestive of a syndrome was observed, as judged by physicians working in medical genetics, images were labeled as “healthy.”
Computer-Aided Facial Phenotyping
Computer-aided facial phenotyping was performed using DeepGestalt version 19.1.7, accessed via Face2Gene CLINIC (FDNA Inc). Neither the class labels nor diagnoses were passed to DeepGestalt. No other phenotypic information but 1 portrait photo per case was entered into the system. DeepGestalt's training set was tested not to contain duplicates of images used in this study, as described previously [42].
Danyel Cohort
The Danyel cohort, originally described by Danyel et al [30], comprises 116 healthy control images.
Syndromic Cohort
This cohort comprises frontal facial images of 17 syndromes. We planned to collect the same number of images for each of these syndromes. A total of 16 of these syndromes were chosen from the 201 distinct suggestions in DeepGestalt’s top 30 results lists of the Danyel cohort. Syndromes of different frequencies ranging from 76% (frequently suggested) to 1% (rarely suggested) were selected. In descending order of frequency, these syndromes are as follows: Fragile X syndrome (OMIM: #300624), Angelman syndrome (OMIM: #105830), Rett syndrome (OMIM: #312750), Phelan-McDermid syndrome (OMIM: #606232), Klinefelter syndrome, Beckwith-Wiedemann syndrome (OMIM: #130650), 22q11.2 deletion syndrome (OMIM: #611867), Sotos syndrome (OMIM: #117550), Noonan syndrome (OMIM: PS163950), Loeys-Dietz syndrome (OMIM: PS609192), Williams-Beuren syndrome (OMIM: #194050), Rubinstein-Taybi syndrome (OMIM: PS180849), achondroplasia (OMIM: #100800), Wolf-Hirschhorn syndrome (OMIM: #194190), Pallister-Killian syndrome (OMIM: #601803), and Treacher Collins syndrome (OMIM: PS154500). In addition, we chose Apert syndrome (OMIM: #101200), which was not implied in the Danyel cohort.
Matched Control Cohort
Each photo of the syndromic cohort was matched to an image of an individual without a genetic syndrome by age, sex, and ethnicity to build a cohort of an equal number of control images.
Statistical Evaluation and Classification Experiments
Face2Gene CLINIC returns DeepGestalt’s top 30 syndrome suggestions. DeepGestalt associates each suggestion with a Gestalt score [11]. The syndrome suggestions’ frequencies, scores, and ranks were statistically evaluated.
Feature Extraction and Vector Construction
All images were labeled by class (syndromic vs healthy). Vectors were built to hold an attribute for any of the syndromes suggested at least once in DeepGestalt’s top 30 suggestions. To construct a vector for a given photo, the 30 highest Gestalt scores were assigned to their respective attributes, and the remaining attributes were set to 0 (see matrix.txt in Multimedia Appendix 1).
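The vector construction described above can be sketched as follows. This is a minimal illustration, not the code from Multimedia Appendix 1: the function name, the input layout (one list of syndrome/score pairs per photo), and the toy scores are assumptions made for the example.

```python
def build_vectors(result_lists):
    """Turn per-photo result lists into fixed-length score vectors.

    result_lists: one list per photo of (syndrome_name, gestalt_score)
    pairs, i.e. the (up to) 30 suggestions returned for that photo.
    """
    # One attribute per syndrome suggested at least once across all photos.
    attributes = sorted({name for results in result_lists for name, _ in results})
    index = {name: i for i, name in enumerate(attributes)}

    vectors = []
    for results in result_lists:
        vector = [0.0] * len(attributes)  # syndromes not suggested stay at 0
        for name, score in results:
            vector[index[name]] = score
        vectors.append(vector)
    return attributes, vectors


# Toy input with 2 suggestions per photo instead of 30 (made-up scores).
toy_results = [
    [("Fragile X syndrome", 0.31), ("Rett syndrome", 0.12)],
    [("Apert syndrome", 0.97), ("Fragile X syndrome", 0.05)],
]
attributes, vectors = build_vectors(toy_results)
print(attributes)
print(vectors)
```

Every photo ends up with a vector of the same length, which is what a linear classifier needs as input.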
Classification
To differentiate between syndromic and healthy portrait photos, we trained linear support vector machines (SVMs) using the LinearSVC class of scikit-learn, version 0.21.3, with default parameters in Python 3.7. To avoid overfitting, training and testing were performed using a leave-one-out classification scheme. Since ethnic background is a possible confounder of DeepGestalt [15,22,26,29,33], we designed classification experiments based on all images, images of White persons, and those of persons with other ethnicities, to benchmark the influence of ethnicity on SVM performance.
To test a possible influence of the number of top ranks considered, classification of all images was run 30 times with the number of considered top Gestalt ranks, ranging from 1 to 30.
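A leave-one-out evaluation of a linear SVM can be sketched as below. This is an illustrative example on made-up score vectors, not the study code from Multimedia Appendix 1; it assumes a current scikit-learn installation, and the fixed random_state is an addition for reproducibility rather than one of the default parameters mentioned above.

```python
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import LinearSVC

# Toy score vectors (made up); 1 = syndromic, 0 = healthy control.
X = [
    [0.90, 0.10, 0.00], [0.80, 0.00, 0.10], [0.85, 0.05, 0.00], [0.95, 0.00, 0.00],
    [0.10, 0.30, 0.20], [0.05, 0.25, 0.30], [0.15, 0.20, 0.25], [0.10, 0.35, 0.20],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

predictions = []
for train_idx, test_idx in LeaveOneOut().split(X):
    # Train on all photos but one, then predict the held-out photo.
    # random_state is fixed only for reproducibility of this sketch.
    clf = LinearSVC(random_state=0)
    clf.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
    predictions.append(int(clf.predict([X[test_idx[0]]])[0]))

mcc = matthews_corrcoef(y, predictions)
print(predictions, mcc)
```

To study the influence of the number of ranks, the same loop could be repeated on vectors built from only the top k Gestalt scores, with k running from 1 to 30.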
Statistical Analysis
Scores of the syndromic and healthy control cohorts were tested for a difference using a 2-sided, independent Welch t test. The difference between receiver operating characteristics (ROCs) was tested using a DeLong test. Classification performance was assessed using the Matthews correlation coefficient (MCC). All statistical tests were performed in Python 3.7; the code can be found in Multimedia Appendix 1.
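For orientation, the Welch t statistic can be recomputed from summary statistics alone. The sketch below plugs in the rounded means and SDs of the first-rank Gestalt scores reported in the Results section (syndromic: mean 0.47, SD 0.28; controls: mean 0.27, SD 0.15; 323 images each), so the result is only approximate and is not the output of the study code; the function name is hypothetical.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2   # squared standard errors
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Rounded first-rank score summaries as reported in the Results section.
t_stat, df = welch_t_from_summary(0.47, 0.28, 323, 0.27, 0.15, 323)
print(round(t_stat, 2), round(df, 1))
```

A t statistic of roughly 11.3 with about 493 degrees of freedom is consistent with the reported P<.001.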
Data and Code Availability
The data and code can be found in Multimedia Appendix 1. For reasons of data protection, all data were cumulated (where possible), deidentified, and minimized. Facial images depicted in Figure 1 show computer-generated composite masks and not real individuals. In Multimedia Appendix 1, file data.txt describes the diagnosis, age, sex, and ethnicity of persons in the analyzed set of images; and file matrix.txt contains DeepGestalt’s output vectors as used for this study. Files differentiator.py and reproduce.py may be used for reproducing the statistical results of this study. Further information may be found in file readme.txt (Multimedia Appendix 1).
Results
Included Images
We included 19 images for each of the 17 syndromes in the syndromic cohort. A total of 83% (272/323) of these images were of White persons (file data.txt of Multimedia Appendix 1). Images from the syndromic cohort were matched to 323 images forming the matched control cohort, resulting in a total number of 646 analyzed photos (Figure 1).
Frequencies and Scores of Suggested Syndromes in Control Individuals
DeepGestalt suggested 238 different syndromes among the top 30 suggestions of the matched control cohort. One syndrome was suggested in more than 80% of the cases (Fragile X syndrome, 82%), 6 syndromes in 70%-80% of the cases; 4 syndromes in 60%-70% of the cases; 6 syndromes in 50%-60% of the cases; 6 syndromes in 40%-50% of the cases; 11 syndromes in 30%-40% of the cases; 15 syndromes in 20%-30% of the cases; 29 syndromes in 10%-20% of the cases; and 160 syndromes at least once in less than 10% of the cases (Figure 2A).
The highest first-rank Gestalt score of the matched control cohort was 0.85 and the lowest was 0.06, with a mean of 0.27 (SD 0.15). First-rank Gestalt scores of the syndromic cohort (highest 1.0; lowest 0.08; mean 0.47, SD 0.28) and the matched control cohort appeared to be separable with an AUROC of 0.72 (95% CI 0.68-0.76) (Figure 3A). Notably, this was found for both tested ethnic groups (Figure 3A, Multimedia Appendix 2): White persons only (AUROC 0.71, 95% CI 0.67-0.76; P<.001) and persons of other ethnicities only (AUROC 0.71, 95% CI 0.62-0.83; P<.001). Separability of the 2 cohorts is evident and significant (P<.001), as shown in Figure 3B.
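The AUROC of a single score can be read as the probability that a randomly chosen syndromic image receives a higher first-rank Gestalt score than a randomly chosen control image. The following sketch computes it by direct pairwise comparison on made-up scores; it illustrates the measure only and is not the study code.

```python
def auroc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up first-rank Gestalt scores, not study data.
syndromic = [0.90, 0.80, 0.40]
controls = [0.50, 0.30, 0.20]
toy_auroc = auroc(syndromic, controls)
print(toy_auroc)
```

In this toy example, 8 of the 9 syndromic-control pairs are ordered correctly, giving an AUROC of about 0.89; a value of 0.5 would correspond to chance.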
Sensitivity of DeepGestalt
DeepGestalt’s average top 10 sensitivity in the syndromic cohort amounted to 91%, varying between the 17 tested syndromes (Figure 3C, Multimedia Appendix 3). Interestingly, DeepGestalt’s sensitivity was high irrespective of ethnicity (White persons only, 90%; persons of other ethnicities only, 97%). A total of 7 syndromes reached a top 10 sensitivity of 100% (Fragile X, Noonan, Phelan-McDermid, Rett, Sotos, Treacher Collins, and Williams-Beuren syndromes). DeepGestalt performed worst for Loeys-Dietz syndrome, with a top 10 sensitivity of 74% (Figure 3C).
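Top 10 sensitivity is the share of syndromic images for which the confirmed diagnosis appears among the first 10 suggestions. A minimal sketch with made-up result lists follows; the function name and data layout are assumptions for illustration.

```python
def top_k_sensitivity(cases, k=10):
    """Fraction of cases whose confirmed diagnosis is among the first k suggestions.

    cases: list of (true_syndrome, ranked_suggestions) pairs.
    """
    hits = sum(1 for truth, ranked in cases if truth in ranked[:k])
    return hits / len(cases)


# Made-up example with short result lists, not study data.
toy_cases = [
    ("Apert syndrome", ["Apert syndrome", "Sotos syndrome"]),
    ("Noonan syndrome", ["Rett syndrome", "Noonan syndrome"]),
    ("Loeys-Dietz syndrome", ["Fragile X syndrome", "Angelman syndrome"]),
]
top1 = top_k_sensitivity(toy_cases, k=1)    # only the first case is a top 1 hit
top10 = top_k_sensitivity(toy_cases, k=10)  # two of the three cases are hits
print(top1, top10)
print(round(295 / 323, 3))  # overall proportion reported in the abstract
```

The last line reproduces the overall figure from the abstract: 295 of 323 images correspond to a top 10 sensitivity of about 91%.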
Performance of the SVM
Sensitivities of binary SVM classification differed between syndromes (Figure 2B). All images of individuals with Apert syndrome, Wolf-Hirschhorn syndrome, and Williams-Beuren syndrome were correctly classified as being syndromic. The SVM performed worst on the 19 images of individuals with Klinefelter syndrome, correctly classifying only 7 of them as syndromic.
Binary SVM classification of DeepGestalt’s output achieved an increased separability of syndromic images and healthy controls as compared to top Gestalt scores, with an AUROC of 0.89 (95% CI 0.87-0.92) (Figure 3A). Again, this was true in both tested ethnic groups (Figure 3A), for photos of White persons (AUROC 0.88, 95% CI 0.86-0.91; P<.001) and those of persons of other ethnicities (AUROC 0.79, 95% CI 0.62-0.83). However, the difference in ROCs was not significant in the latter group (P=.13). SVM classification performance improved with an increasing number of considered ranks. Using the top 30 Gestalt scores showed the best MCC (0.63), as shown in Multimedia Appendix 4, with a sensitivity of 75.54% and a specificity of 86.38% (Figure 3D). Separability was significant (P<.001) (Figure 3E).
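Sensitivity, specificity, and MCC all derive from the same confusion matrix. The sketch below shows the relationship on hypothetical counts; these counts are not taken from the study, and the function name is an assumption.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when a row or column of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts for illustration only.
sens, spec, mcc_value = binary_metrics(tp=8, fn=2, tn=9, fp=1)
print(sens, spec, round(mcc_value, 3))
```

Unlike sensitivity or specificity alone, the MCC uses all 4 cells of the confusion matrix, which makes it a convenient single summary when comparing classifiers built on different numbers of ranks.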
Discussion
Classification of Images of Individuals Without a Genetic Syndrome
To our knowledge, this is the first study to systematically analyze DeepGestalt’s behavior on portrait photos of individuals without a genetic syndrome. For these images, we show that DeepGestalt’s syndrome suggestions follow a nonrandom distribution. Certain syndromes are implied as differential diagnoses with a considerably high likelihood. Among these were Fragile X, Klinefelter, Rett, and Angelman syndromes, which were suggested in more than three-quarters of the matched control cohort. In contrast, syndromes such as Treacher Collins syndrome and Wolf-Hirschhorn syndrome were implied very rarely.
DeepGestalt cannot assign the class label “inconspicuous.” Yet, DeepGestalt’s scores are used to help judge the presence of a given syndrome. Based on a high maximum Gestalt score, a user could assume that the individual depicted in an entered image is likely to have a syndrome. Likewise, one is tempted to assume that a low maximum Gestalt score makes an underlying syndrome unlikely. Indeed, the mean of first-rank Gestalt scores is higher in images depicting syndromic facies than in images of individuals without a genetic syndrome. Similarly, scores higher than 0.85 appear to be specific indicators of a syndromic facies, and those lower than 0.08 are not suggestive of a genetic syndrome. However, these specific values are very rare. Gestalt scores alone are only fairly sufficient for judging the presence or absence of a genetic syndrome with facial dysmorphism since the distributions of the highest Gestalt scores of the syndromic and matched control cohort greatly overlap. We show that this problem can be reduced by considering both top Gestalt scores and the actual list of suggested syndrome matches. The boost in discriminatory power is illustrated by the increase of the respective AUROCs. Although DeepGestalt cannot directly assess the presence/absence of a syndromic facies, machine learning–based tools (eg, SVMs) built on top of DeepGestalt may be used for this purpose.
It is noteworthy that we achieved promising results with a comparably low number of samples and a low-complexity classification model with default hyperparameters. We assume that the quality and complexity of future classifiers will improve as more data become available. Increasing the number of top ranks considered for vector construction increased the performance of the SVM. However, the number of DeepGestalt’s suggestions accessible via Face2Gene CLINIC is limited to 30 suggestions. We hypothesize that using more than just the 30 top ranks for vector construction might further boost classification performance. We classified DeepGestalt’s output to predict the presence of a syndromic facies. We also suggest evaluating classification performance based on DeepGestalt’s input vectors.
Potential Confounders
Until now, differences in the diagnostic performance of DeepGestalt, which arise due to the ethnicity of the person depicted, have been evaluated using DeepGestalt's sensitivity. Studies of earlier versions of DeepGestalt showed that its sensitivity is dependent on the ethnic background in certain syndromes [15,22]. Studies of more recent versions of DeepGestalt suggested that ethnicity had no major influence on its sensitivity [26,29]. In our set of syndromic images, DeepGestalt’s sensitivity is remarkably high, which is in line with the previous studies highlighting DeepGestalt’s good general sensitivity [11,36,42]. This high sensitivity of DeepGestalt was confirmed for both groups of images, those of White persons and those of persons of other ethnicities. Improvement of distinguishability of images of individuals with and without a genetic syndrome appeared to be stronger in the group of photos of White persons than in the group of photos of persons of other ethnicities. However, we assume that this is caused by the limited sample size of images of non-White persons in our data set. We believe that our approach is also applicable to populations comprising predominantly other ethnicities.
The SVM had difficulties classifying images of patients with syndromes that were frequently suggested in healthy controls. Possible explanations for DeepGestalt’s output to be similar in controls and individuals with these syndromes could be as follows: (1) such syndromes have only mild characteristic facial features; (2) they have a typical facial gestalt, which is present only in some but not all affected individuals; or (3) they have no typical facies at all. For example, not all patients with Loeys-Dietz syndrome exhibit distinctive facial features [43], and the facial appearances of males with Klinefelter syndrome show no commonly observed characteristics [44].
Further Research
Further research is necessary to determine DeepGestalt’s capacity to distinguish individuals with and without a genetic syndrome when combined with other sources of information, such as genetic test results and nonfacial phenotypic information. We suggest including additional scores that are based on both phenotype and genotype (eg, prioritization of exome data by image analysis [PEDIA] scores [42]) in future classifiers of the presence/absence of a syndromic facies.
The increasing use and quality of facial phenotyping software in clinical genetics should also be accompanied by an ethical evaluation of these systems [45]. This affects issues such as the automation of medical diagnostic action, the sharing of (potentially identifiable) data, and a potentially altered doctor-patient relationship. In particular, a systematic analysis of the patient perspective on the use of computer-aided facial analysis methodologies in clinical genetics is lacking so far.
We believe that our findings will help improve future versions of DeepGestalt and similar systems and are crucial when interpreting Face2Gene’s results in the clinical routine. In particular, we recommend providing users with the false-positive rates of each suggested syndrome.
Conclusion
DeepGestalt is a computer-aided facial phenotyping tool that showed promising results for detecting a potentially syndromic facies. It yields higher first-rank scores in individuals with a genetic syndrome than in those without a diagnosis of a genetic syndrome. Its output may be classified to improve this detection. The exact stage at which to use DeepGestalt during the diagnostic workup of individuals with a suspected genetic syndrome remains to be determined. Primarily, it should be used by expert geneticists.
Acknowledgments
We thank the members of the Elterninitiative Apertsyndrom und Verwandte Fehlbildungen eV, a parents’ initiative on Apert syndrome and related disorders in Germany, for the contribution of their images, and Yaron Gurovich and Nicole Fleischer of FDNA Inc for technical assistance in checking DeepGestalt’s training set for duplicate images used in this study. MAM is a participant in the BIH Charité Digital Clinician Scientist Program funded by the Charité – Universitätsmedizin Berlin and the Berlin Institute of Health. We acknowledge support from the German Research Foundation (DFG) and the Open Access Publication Funds of Charité – Universitätsmedizin Berlin.
Abbreviations
- AUROC
area under the receiver operating characteristic
- MCC
Matthews correlation coefficient
- PEDIA
prioritization of exome data by image analysis
- ROC
receiver operating characteristic
- SVM
support vector machine
Appendix
Multimedia Appendix 1: Code and data.
Multimedia Appendix 2: (A) Distribution of first-rank Gestalt scores for the images of White persons in the syndromic cohort and the matched control cohort (healthy). (B) Distribution of first-rank Gestalt scores for the images of persons with other ethnicities in the syndromic cohort and the matched control cohort (healthy).
Multimedia Appendix 3: DeepGestalt’s sensitivities. Purple circles indicate the average of the entire syndromic cohort; for other symbols/coloring, see the respective subfigure title.
Multimedia Appendix 4: Performance of the SVM on the entire syndromic cohort and matched control cohort. X-axis: number of top-rank Gestalt scores used for vector construction per case. MCC: Matthews correlation coefficient. Note: rising tendency.
Footnotes
Authors' Contributions: JTP, NH, and MAM designed the study. JTP, NH, MD, JE, ATAP, and MAM collected the data. SM, MS, DH, and CEO provided insights that were critical for the interpretation of data. MAM implemented the Python code with support from PH. PH and MAM performed the statistical analysis. JTP, NH, CEO, and MAM wrote the manuscript with approval of all the authors.
Conflicts of Interest: None declared.
References
- 1.Jackson M, Marks L, May GHW, Wilson JB. The genetic basis of disease. Essays Biochem. 2018 Dec 03;62(5):643–723. doi: 10.1042/EBC20170053. http://europepmc.org/abstract/MED/30509934. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Hart TC, Hart PS. Genetic studies of craniofacial anomalies: clinical implications and applications. Orthod Craniofac Res. 2009 Aug;12(3):212–20. doi: 10.1111/j.1601-6343.2009.01455.x. http://europepmc.org/abstract/MED/19627523. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Xie Q, Faust K, Van Ommeren R, Sheikh A, Djuric U, Diamandis P. Deep learning for image analysis: Personalizing medicine closer to the point of care. Crit Rev Clin Lab Sci. 2019 Jan;56(1):61–73. doi: 10.1080/10408363.2018.1536111. [DOI] [PubMed] [Google Scholar]
- 4.Dias R, Torkamani A. Artificial intelligence in clinical and genomic diagnostics. Genome Med. 2019 Nov 19;11(1):70. doi: 10.1186/s13073-019-0689-8. https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-019-0689-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Boehringer S, Vollmar T, Tasse C, Wurtz RP, Gillessen-Kaesbach G, Horsthemke B, Wieczorek D. Syndrome identification based on 2D analysis software. Eur J Hum Genet. 2006 Oct;14(10):1082–9. doi: 10.1038/sj.ejhg.5201673. doi: 10.1038/sj.ejhg.5201673. [DOI] [PubMed] [Google Scholar]
- 6.Vollmar T, Maus B, Wurtz RP, Gillessen-Kaesbach G, Horsthemke B, Wieczorek D, Boehringer S. Impact of geometry and viewing angle on classification accuracy of 2D based analysis of dysmorphic faces. Eur J Med Genet. 2008;51(1):44–53. doi: 10.1016/j.ejmg.2007.10.002. [DOI] [PubMed] [Google Scholar]
- 7.Boehringer S, Guenther M, Sinigerova S, Wurtz RP, Horsthemke B, Wieczorek D. Automated syndrome detection in a set of clinical facial photographs. Am J Med Genet A. 2011 Sep;155A(9):2161–9. doi: 10.1002/ajmg.a.34157. [DOI] [PubMed] [Google Scholar]
- 8.Ferry Q, Steinberg J, Webber C, FitzPatrick DR, Ponting CP, Zisserman A, Nellåker C. Diagnostically relevant facial gestalt information from ordinary photos. Elife. 2014 Jun 24;3:e02020. doi: 10.7554/eLife.02020. doi: 10.7554/eLife.02020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Cerrolaza JJ, Porras AR, Mansoor A, Zhao Q, Summar M, Linguraru MG. Identification of dysmorphic syndromes using landmark-specific local texture descriptors Internet. IEEE 13th International Symposium on Biomedical Imaging (ISBI); 13-16 April 2016; Prague, Czech Republic. 2016. [DOI] [Google Scholar]
- 10.Tu L, Porras A, Boyle A, Linguraru M. Analysis of 3D Facial Dysmorphology in Genetic Syndromes from Unconstrained 2D Photographs Internet. In: Frangi A, Schnabel J, Davatzikos C, Alberola-López C, Fichtinger G, editors. Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. MICCAI 2018. Lecture Notes in Computer Science, vol 11070. Cham: Springer; 2018. pp. 347–355. [Google Scholar]
- 11.Gurovich Y, Hanani Y, Bar O, Nadav G, Fleischer N, Gelbman D, Basel-Salmon L, Krawitz PM, Kamphausen SB, Zenker M, Bird LM, Gripp KW. Identifying facial phenotypes of genetic disorders using deep learning. Nat Med. 2019 Jan;25(1):60–64. doi: 10.1038/s41591-018-0279-0. [DOI] [PubMed] [Google Scholar]
- 12.Dudding-Byth T, Baxter A, Holliday EG, Hackett A, O'Donnell S, White SM, Attia J, Brunner H, de Vries B, Koolen D, Kleefstra T, Ratwatte S, Riveros C, Brain S, Lovell BC. Computer face-matching technology using two-dimensional photographs accurately matches the facial gestalt of unrelated individuals with the same syndromic form of intellectual disability. BMC Biotechnol. 2017 Dec 19;17(1):90. doi: 10.1186/s12896-017-0410-1. https://bmcbiotechnol.biomedcentral.com/articles/10.1186/s12896-017-0410-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Basel-Vanagaite L, Wolf L, Orin M, Larizza L, Gervasini C, Krantz ID, Deardoff MA. Recognition of the Cornelia de Lange syndrome phenotype with facial dysmorphology novel analysis. Clin Genet. 2016 May;89(5):557–63. doi: 10.1111/cge.12716. [DOI] [PubMed] [Google Scholar]
- 14.Gripp KW, Baker L, Telegrafi A, Monaghan KG. The role of objective facial analysis using FDNA in making diagnoses following whole exome analysis. Report of two patients with mutations in the BAF complex genes. Am J Med Genet A. 2016 Jul;170(7):1754–62. doi: 10.1002/ajmg.a.37672. [DOI] [PubMed] [Google Scholar]
- 15.Lumaka A, Cosemans N, Lulebo Mampasi A, Mubungu G, Mvuama N, Lubala T, Mbuyi-Musanzayi S, Breckpot J, Holvoet M, de Ravel T, Van Buggenhout G, Peeters H, Donnai D, Mutesa L, Verloes A, Lukusa Tshilobo P, Devriendt K. Facial dysmorphism is influenced by ethnic background of the patient and of the evaluator. Clin Genet. 2017 Aug;92(2):166–171. doi: 10.1111/cge.12948. [DOI] [PubMed] [Google Scholar]
- 16.Hadj-Rabia S, Schneider H, Navarro E, Klein O, Kirby N, Huttner K, Wolf L, Orin M, Wohlfart S, Bodemer C, Grange DK. Automatic recognition of the XLHED phenotype from facial images. Am J Med Genet A. 2017 Sep;173(9):2408–2414. doi: 10.1002/ajmg.a.38343. [DOI] [PubMed] [Google Scholar]
- 17.Gardner OK, Haynes K, Schweitzer D, Johns A, Magee WP, Urata MM, Sanchez-Lara PA. Familial Recurrence of 3MC Syndrome in Consanguineous Families: A Clinical and Molecular Diagnostic Approach With Review of the Literature. Cleft Palate Craniofac J. 2017 Nov;54(6):739–748. doi: 10.1597/15-151. [DOI] [PubMed] [Google Scholar]
- 18.Valentine M, Bihm DCJ, Wolf L, Hoyme HE, May PA, Buckley D, Kalberg W, Abdul-Rahman OA. Computer-Aided Recognition of Facial Attributes for Fetal Alcohol Spectrum Disorders. Pediatrics. 2017 Dec;140(6):e20162028. doi: 10.1542/peds.2016-2028. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Knaus A, Pantel JT, Pendziwiat M, Hajjir N, Zhao M, Hsieh T, Schubach M, Gurovich Y, Fleischer N, Jäger Marten, Köhler Sebastian, Muhle H, Korff C, Møller Rikke S, Bayat A, Calvas P, Chassaing N, Warren H, Skinner S, Louie R, Evers C, Bohn M, Christen H, van den Born M, Obersztyn E, Charzewska A, Endziniene M, Kortüm Fanny, Brown N, Robinson PN, Schelhaas HJ, Weber Y, Helbig I, Mundlos S, Horn D, Krawitz PM. Characterization of glycosylphosphatidylinositol biosynthesis defects by clinical features, flow cytometry, and automated image analysis. Genome Med. 2018 Jan 09;10(1):3. doi: 10.1186/s13073-017-0510-5. https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-017-0510-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Liehr T, Acquarola N, Pyle K, St-Pierre S, Rinholm M, Bar O, Wilhelm K, Schreyer I. Next generation phenotyping in Emanuel and Pallister-Killian syndrome using computer-aided facial dysmorphology analysis of 2D photos. Clin Genet. 2018 Feb;93(2):378–381. doi: 10.1111/cge.13087. [DOI] [PubMed] [Google Scholar]
- 21.Zarate YA, Smith-Hicks CL, Greene C, Abbott M, Siu VM, Calhoun ARUL, Pandya A, Li C, Sellars EA, Kaylor J, Bosanko K, Kalsner L, Basinger A, Slavotinek AM, Perry H, Saenz M, Szybowska M, Wilson LC, Kumar A, Brain C, Balasubramanian M, Dubbs H, Ortiz-Gonzalez XR, Zackai E, Stein Q, Powell CM, Schrier Vergano S, Britt A, Sun A, Smith W, Bebin EM, Picker J, Kirby A, Pinz H, Bombei H, Mahida S, Cohen JS, Fatemi A, Vernon HJ, McClellan R, Fleming LR, Knyszek B, Steinraths M, Velasco Gonzalez C, Beck AE, Golden-Grant KL, Egense A, Parikh A, Raimondi C, Angle B, Allen W, Schott S, Algrabli A, Robin NH, Ray JW, Everman DB, Gambello MJ, Chung WK. Natural history and genotype-phenotype correlations in 72 individuals with SATB2-associated syndrome. Am J Med Genet A. 2018 Apr;176(4):925–935. doi: 10.1002/ajmg.a.38630. [DOI] [PubMed] [Google Scholar]
- 22. Pantel JT, Zhao M, Mensah MA, Hajjir N, Hsieh T, Hanani Y, Fleischer N, Kamphans T, Mundlos S, Gurovich Y, Krawitz PM. Advances in computer-assisted syndrome recognition by the example of inborn errors of metabolism. J Inherit Metab Dis. 2018 May;41(3):533–539. doi: 10.1007/s10545-018-0174-3. http://europepmc.org/abstract/MED/29623569.
- 23. Ferreira CR, Altassan R, Marques-Da-Silva D, Francisco R, Jaeken J, Morava E. Recognizable phenotypes in CDG. J Inherit Metab Dis. 2018 May;41(3):541–553. doi: 10.1007/s10545-018-0156-5. http://europepmc.org/abstract/MED/29654385.
- 24. Jiang Y, Wangler MF, McGuire AL, Lupski JR, Posey JE, Khayat MM, Murdock DR, Sanchez-Pulido L, Ponting CP, Xia F, Hunter JV, Meng Q, Murugan M, Gibbs RA. The phenotypic spectrum of Xia-Gibbs syndrome. Am J Med Genet A. 2018 Jun;176(6):1315–1326. doi: 10.1002/ajmg.a.38699. http://europepmc.org/abstract/MED/29696776.
- 25. Graul-Neumann LM, Mensah MA, Klopocki E, Uebe S, Ekici AB, Thiel CT, Reis A, Zweier C. Biallelic intragenic deletion in MASP1 in an adult female with 3MC syndrome. Eur J Med Genet. 2018 Jul;61(7):363–368. doi: 10.1016/j.ejmg.2018.01.016.
- 26. Vorravanpreecha N, Lertboonnum T, Rodjanadit R, Sriplienchan P, Rojnueangnit K. Studying Down syndrome recognition probabilities in Thai children with de-identified computer-aided facial analysis. Am J Med Genet A. 2018 Sep;176(9):1935–1940. doi: 10.1002/ajmg.a.40483.
- 27. Martinez-Monseny A, Cuadras D, Bolasell M, Muchart J, Arjona C, Borregan M, Algrabli A, Montero R, Artuch R, Velázquez-Fragua R, Macaya A, Pérez-Cerdá C, Pérez-Dueñas B, Pérez B, Serrano M, Spanish Consortium CDG. From gestalt to gene: early predictive dysmorphic features of PMM2-CDG. J Med Genet. 2019 Apr;56(4):236–245. doi: 10.1136/jmedgenet-2018-105588.
- 28. Pascolini G, Fleischer N, Ferraris A, Majore S, Grammatico P. The facial dysmorphology analysis technology in intellectual disability syndromes related to defects in the histones modifiers. J Hum Genet. 2019 Aug;64(8):721–728. doi: 10.1038/s10038-019-0598-0.
- 29. Mishima H, Suzuki H, Doi M, Miyazaki M, Watanabe A, Matsumoto T, Morifuji K, Moriuchi H, Yoshiura KI, Kondoh T, Kosaki K. Evaluation of Face2Gene using facial images of patients with congenital dysmorphic syndromes recruited in Japan. J Hum Genet. 2019 Aug;64(8):789–794. doi: 10.1038/s10038-019-0619-z.
- 30. Danyel M, Cheng Z, Jung C, Boschann F, Pantel JT, Hajjir N, Flöttmann R, Schulz S, Demuth I, Sheridan E, Mundlos S, Horn D, Mensah MA. Differentiation of MISSLA and Fanconi anaemia by computer-aided image analysis and presentation of two novel MISSLA siblings. Eur J Hum Genet. 2019 Dec;27(12):1827–1835. doi: 10.1038/s41431-019-0469-3.
- 31. Pascolini G, Valiante M, Bottillo I, Laino L, Fleischer N, Ferraris A, Grammatico P. Striking phenotypic overlap between Nicolaides-Baraitser and Coffin-Siris syndromes in monozygotic twins with ARID1B intragenic deletion. Eur J Med Genet. 2020 Mar;63(3):103739. doi: 10.1016/j.ejmg.2019.103739.
- 32. Kruszka P, Hu T, Hong S, Signer R, Cogné B, Isidor B, Mazzola SE, Giltay JC, van Gassen KLI, England EM, Pais L, Ockeloen CW, Sanchez-Lara PA, Kinning E, Adams DJ, Treat K, Torres-Martinez W, Bedeschi MF, Iascone M, Blaney S, Bell O, Tan TY, Delrue M, Jurgens J, Barry BJ, Engle EC, Savage SK, Fleischer N, Martinez-Agosto JA, Boycott K, Zackai EH, Muenke M. Phenotype delineation of ZNF462 related syndrome. Am J Med Genet A. 2019 Oct;179(10):2075–2082. doi: 10.1002/ajmg.a.61306. http://europepmc.org/abstract/MED/31361404.
- 33. Fung JLF, Rethanavelu K, Luk H, Ho MSP, Lo IFM, Chung BHY. Coffin-Lowry syndrome in Chinese. Am J Med Genet A. 2019 Oct;179(10):2043–2048. doi: 10.1002/ajmg.a.61323.
- 34. Weiss K, Lazar HP, Kurolap A, Martinez AF, Paperna T, Cohen L, Smeland MF, Whalen S, Heide S, Keren B, Terhal P, Irving M, Takaku M, Roberts JD, Petrovich RM, Schrier Vergano SA, Kenney A, Hove H, DeChene E, Quinonez SC, Colin E, Ziegler A, Rumple M, Jain M, Monteil D, Roeder ER, Nugent K, van Haeringen A, Gambello M, Santani A, Medne L, Krock B, Skraban CM, Zackai EH, Dubbs HA, Smol T, Ghoumid J, Parker MJ, Wright M, Turnpenny P, Clayton-Smith J, Metcalfe K, Kurumizaka H, Gelb BD, Baris Feldman H, Campeau PM, Muenke M, Wade PA, Lachlan K. The CHD4-related syndrome: a comprehensive investigation of the clinical spectrum, genotype-phenotype correlations, and molecular basis. Genet Med. 2020 Feb;22(2):389–397. doi: 10.1038/s41436-019-0612-0.
- 35. Zarate YA, Bosanko KA, Gripp KW. Using facial analysis technology in a typical genetic clinic: experience from 30 individuals from a single institution. J Hum Genet. 2019 Dec;64(12):1243–1245. doi: 10.1038/s10038-019-0673-6.
- 36. Narayanan DL, Ranganath P, Aggarwal S, Dalal A, Phadke SR, Mandal K. Computer-aided Facial Analysis in Diagnosing Dysmorphic Syndromes in Indian Children. Indian Pediatr. 2019 Dec 15;56(12):1017–1019. https://www.indianpediatrics.net/dec2019/1017.pdf.
- 37. Latorre-Pellicer A, Ascaso Á, Trujillano L, Gil-Salvador M, Arnedo M, Lucia-Campos C, Antoñanzas-Pérez R, Marcos-Alcalde I, Parenti I, Bueno-Lozano G, Musio A, Puisac B, Kaiser FJ, Ramos FJ, Gómez-Puertas P, Pié J. Evaluating Face2Gene as a Tool to Identify Cornelia de Lange Syndrome by Facial Phenotypes. Int J Mol Sci. 2020 Feb 04;21(3):1042. doi: 10.3390/ijms21031042. https://www.mdpi.com/resolver?pii=ijms21031042.
- 38. Arora V, Puri RD, Bijarnia-Mahay S, Verma IC. Expanding the phenotypic and genotypic spectrum of Wiedemann-Steiner syndrome: First patient from India. Am J Med Genet A. 2020 May;182(5):953–956. doi: 10.1002/ajmg.a.61534.
- 39. Carli D, Giorgio E, Pantaleoni F, Bruselles A, Barresi S, Riberi E, Licciardi F, Gazzin A, Baldassarre G, Pizzi S, Niceta M, Radio FC, Molinatto C, Montin D, Calvo PL, Ciolfi A, Fleischer N, Ferrero GB, Brusco A, Tartaglia M. NBAS pathogenic variants: Defining the associated clinical and facial phenotype and genotype-phenotype correlations. Hum Mutat. 2019 Jun;40(6):721–728. doi: 10.1002/humu.23734.
- 40. Staufner C, Peters B, Wagner M, Alameer S, Barić I, Broué P, Bulut D, Church JA, Crushell E, Dalgıç B, Das AM, Dick A, Dikow N, Dionisi-Vici C, Distelmaier F, Bozbulut NE, Feillet F, Gonzales E, Hadzic N, Hauck F, Hegarty R, Hempel M, Herget T, Klein C, Konstantopoulou V, Kopajtich R, Kuster A, Laass MW, Lainka E, Larson-Nath C, Leibner A, Lurz E, Mayr JA, McKiernan P, Mention K, Moog U, Mungan NO, Riedhammer KM, Santer R, Palafoll IV, Vockley J, Westphal DS, Wiedemann A, Wortmann SB, Diwan GD, Russell RB, Prokisch H, Garbade SF, Kölker S, Hoffmann GF, Lenz D. Defining clinical subgroups and genotype-phenotype correlations in NBAS-associated disease across 110 patients. Genet Med. 2020 Mar;22(3):610–621. doi: 10.1038/s41436-019-0698-4.
- 41. Myers L, Anderlid B, Nordgren A, Lundin K, Kuja-Halkola R, Tammimies K, Bölte S. Clinical versus automated assessments of morphological variants in twins with and without neurodevelopmental disorders. Am J Med Genet A. 2020 May 12;182(5):1177–1189. doi: 10.1002/ajmg.a.61545.
- 42. Hsieh T, Mensah MA, Pantel JT, Aguilar D, Bar O, Bayat A, Becerra-Solano L, Bentzen HB, Biskup S, Borisov O, Braaten O, Ciaccio C, Coutelier M, Cremer K, Danyel M, Daschkey S, Eden HD, Devriendt K, Wilson S, Douzgou S, Đukić D, Ehmke N, Fauth C, Fischer-Zirnsak B, Fleischer N, Gabriel H, Graul-Neumann L, Gripp KW, Gurovich Y, Gusina A, Haddad N, Hajjir N, Hanani Y, Hertzberg J, Hoertnagel K, Howell J, Ivanovski I, Kaindl A, Kamphans T, Kamphausen S, Karimov C, Kathom H, Keryan A, Knaus A, Köhler S, Kornak U, Lavrov A, Leitheiser M, Lyon GJ, Mangold E, Reina PM, Carrascal AM, Mitter D, Herrador LM, Nadav G, Nöthen M, Orrico A, Ott C, Park K, Peterlin B, Pölsler L, Raas-Rothschild A, Randolph L, Revencu N, Fagerberg CR, Robinson PN, Rosnev S, Rudnik S, Rudolf G, Schatz U, Schossig A, Schubach M, Shanoon O, Sheridan E, Smirin-Yosef P, Spielmann M, Suk E, Sznajer Y, Thiel CT, Thiel G, Verloes A, Vrecar I, Wahl D, Weber I, Winter K, Wiśniewska M, Wollnik B, Yeung MW, Zhao M, Zhu N, Zschocke J, Mundlos S, Horn D, Krawitz PM. PEDIA: prioritization of exome data by image analysis. Genet Med. 2019 Dec;21(12):2807–2814. doi: 10.1038/s41436-019-0566-2. http://europepmc.org/abstract/MED/31164752.
- 43. MacCarrick G, Black JH, Bowdin S, El-Hamamsy I, Frischmeyer-Guerrerio PA, Guerrerio AL, Sponseller PD, Loeys B, Dietz HC. Loeys-Dietz syndrome: a primer for diagnosis and management. Genet Med. 2014 Aug;16(8):576–87. doi: 10.1038/gim.2014.11. http://europepmc.org/abstract/MED/24577266.
- 44. Bird RJ, Hurren BJ. Anatomical and clinical aspects of Klinefelter's syndrome. Clin Anat. 2016 Jul;29(5):606–19. doi: 10.1002/ca.22695.
- 45. Martinez-Martin N. What Are Important Ethical Implications of Using Facial Recognition Technology in Health Care? AMA J Ethics. 2019 Mar 01;21(2):E180–187. doi: 10.1001/amajethics.2019.180. https://journalofethics.ama-assn.org/article/what-are-important-ethical-implications-using-facial-recognition-technology-health-care/2019-02.
Associated Data
Supplementary Materials
Code and data.
(A) Distribution of first-rank Gestalt scores for the images of White persons in the syndromic cohort and the matched control cohort (healthy). (B) Distribution of first-rank Gestalt scores for the images of persons with other ethnicities in the syndromic cohort and the matched control cohort (healthy).
DeepGestalt’s sensitivities: purple circles indicate the average of the entire syndromic cohort; for other symbols/coloring, see respective subfigure title.
Performance of the SVM on the entire syndromic cohort and the matched control cohort. The x-axis shows the number of top-ranked Gestalt scores used for vector construction per case. MCC: Matthews correlation coefficient. Note the rising tendency.
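For readers unfamiliar with the MCC named in the caption above, the following is a minimal Python sketch of how the coefficient is computed from a binary confusion matrix. It is an illustration only and is not taken from the study's code in Multimedia Appendix 1; the function name and the example counts are hypothetical.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the Matthews correlation coefficient for a 2x2 confusion matrix.

    tp/tn/fp/fn are the counts of true positives, true negatives,
    false positives, and false negatives.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, the MCC is set to 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for illustration only; these are NOT results of the study.
example = matthews_corrcoef(tp=80, tn=70, fp=30, fn=20)
print(round(example, 3))
```

The MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect classification), and it remains informative when the two classes differ in size.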
Data Availability Statement
The data and code can be found in Multimedia Appendix 1. For reasons of data protection, all data were aggregated (where possible), deidentified, and minimized. Facial images depicted in Figure 1 show computer-generated composite masks and not real individuals. In Multimedia Appendix 1, the file data.txt describes the diagnosis, age, sex, and ethnicity of the persons in the analyzed set of images, and the file matrix.txt contains DeepGestalt’s output vectors as used for this study. The files differentiator.py and reproduce.py may be used to reproduce the statistical results of this study. Further information may be found in the file readme.txt (Multimedia Appendix 1).