[Preprint]. 2024 May 21:2023.06.06.23290887. Originally published 2023 Jun 10. [Version 3] doi: 10.1101/2023.06.06.23290887

Figure 4: Overview of the GestaltMatcher Database (GMDB)-FAIR dataset.


a) Sex distribution; numbers of images are shown in brackets. b) Distribution of patient age in years. c) Left: Two-dimensional representation of phenotypic similarity between patients, computed from their Human Phenotype Ontology (HPO) terms and projected with Uniform Manifold Approximation and Projection (UMAP). HPO terms were annotated for 4,474 individuals in the GMDB, and expert clinicians defined twelve distinct HPO-defined symptom groups. Based on its annotated HPO terms, each case was assigned to one or more of these groups. All OMIM diseases, represented by their HPO annotations, are included as a reference (gray background dots). GMDB cases are color-coded by their most pronounced HPO-defined symptom group, i.e., the group that contains the majority of their HPO terms. The dataset is dominated by two major clusters (facial dysmorphism in yellow and neurodevelopmental in blue) but includes cases from across the entire disease landscape. Right: Heatmap showing, for the GMDB individuals in each HPO-defined symptom group on the x-axis, the proportion that is also assigned to the HPO-defined symptom group on the y-axis. Notably, facial dysmorphism is present in at least 70% of the cases in every HPO-defined symptom group. d) Proportion of unpublished and published images in each ancestry group. e) Proportion of unpublished and published images in each sub-ancestry group.
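Panel c depends on two steps: assigning each case to symptom groups and computing the co-assignment proportions shown in the heatmap. The Python sketch below illustrates both under stated assumptions. It is not the authors' pipeline. All function names and the toy term IDs are hypothetical. Group membership is modeled as a flat lookup from each group to a set of term IDs, whereas a real implementation would more likely use the HPO hierarchy, for example all descendants of a group's root term. The "most pronounced" group is taken to be the one that contains the most of a case's terms, and ties are broken by group order. The paper does not state its tie-breaking rule. The UMAP projection is omitted.

```python
from collections import Counter
from typing import Dict, List, Optional, Set

# Hypothetical flat mapping: symptom group -> set of term IDs it covers.
# (Assumption: a real pipeline would likely derive this from the HPO hierarchy.)
GroupMap = Dict[str, Set[str]]


def assign_groups(terms: Set[str], groups: GroupMap) -> Counter:
    """Count how many of a case's terms fall into each symptom group.

    The case counts as assigned to every group with a nonzero count;
    groups with zero overlap are left out of the Counter entirely.
    """
    counts: Counter = Counter()
    for group, group_terms in groups.items():
        overlap = len(terms & group_terms)
        if overlap > 0:
            counts[group] = overlap
    return counts


def dominant_group(counts: Counter) -> Optional[str]:
    """Return the group holding the most of the case's terms (used for coloring).

    Counter.most_common orders equal counts by first insertion, so ties
    fall to the group listed first in the mapping (an assumed tie rule).
    """
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def co_assignment(cases: List[Counter], names: List[str]) -> List[List[float]]:
    """Return matrix[y][x]: fraction of cases in group x also assigned to group y."""
    matrix: List[List[float]] = []
    for y in names:
        row: List[float] = []
        for x in names:
            in_x = [c for c in cases if x in c]
            in_both = [c for c in in_x if y in c]
            row.append(len(in_both) / len(in_x) if in_x else float("nan"))
        matrix.append(row)
    return matrix
```

As a toy example, suppose a hypothetical mapping `{"facial": {"t1", "t2", "t3"}, "neuro": {"t4", "t5"}, "skeletal": {"t6"}}` and a case annotated with `{"t1", "t2", "t4"}`. That case is assigned to both `facial` and `neuro` and is colored as `facial`. The heatmap is read column-wise: each column x gives the fraction of group-x cases that also fall in each row group y. This is why the facial dysmorphism row can reach at least 70% in every column.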