American Journal of Human Genetics. 2025 Dec 23;113(1):57–70. doi: 10.1016/j.ajhg.2025.12.001

GA4GH phenopacket-driven characterization of genotype-phenotype correlations in Mendelian disorders

Lauren Rekerle 1,27, Daniel Danis 2,27, Filip Rehburg 2, Adam SL Graefe 2, Viktor Bily 3, Andrés Caballero-Oteyza 4,5, Pilar Cacheiro 6, Leonardo Chimirri 2, Jessica X Chong 7, Evan Connelly 8, Bert BA de Vries 9, Alexander JM Dingemans 9, Michael H Duyzend 10,11,12, Tomas Freiberger 3, Petra Gehle 13, Tudor Groza 14,15,16, Peter Hansen 2, Julius OB Jacobsen 6, Adam Klocperk 17, Markus S Ladewig 18, Michael I Love 8,19, Allison J Marcello 7, Alexander Mordhorst 20, Monica C Munoz-Torres 21, Justin Reese 22, Catharina Schuetz 23,26, Damian Smedley 6, Timmy Strauss 23, Ondrej Vladyka 17, David Zocche 24, Sylvia Thun 2, Christopher J Mungall 22, Melissa A Haendel 8, Peter N Robinson 1,2,25
PMCID: PMC12824607  PMID: 41443197

Summary

Comprehensively characterizing genotype-phenotype correlations (GPCs) in Mendelian disease would create new opportunities for improving clinical management and understanding disease biology. However, heterogeneous approaches to data sharing, reuse, and analysis have hindered progress in the field. We developed Genotype-Phenotype Statistical Evaluation of Associations (GPSEA), a software package that leverages the Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema to represent case-level clinical and genetic data about individuals. GPSEA applies an independent filtering strategy to boost statistical power to detect categorical GPCs represented by Human Phenotype Ontology terms. GPSEA additionally enables visualization and analysis of continuous phenotypes, clinical severity scores, and survival data such as age of onset of disease or clinical manifestations. We applied GPSEA to 85 cohorts with 6,179 previously published individuals with variants in one of 81 genes associated with 122 Mendelian diseases and identified 253 significant GPCs, with 48 cohorts having at least one statistically significant GPC. These results highlight the power of standardized representations of clinical data for scalable discovery of GPCs in Mendelian disease.

Keywords: genotype-phenotype correlation, Global Alliance for Genomics and Health, Human Phenotype Ontology, Mendelian disease


GPSEA is a software tool that uses the GA4GH Phenopacket Schema to streamline discovery of genotype-phenotype correlations (GPCs) in Mendelian diseases. Analyzing data from 85 cohorts of previously published individuals, it identified 253 significant GPCs, demonstrating the power of standardized clinical data for improving clinical management and disease understanding.

Introduction

Human disease has a huge number of clinical manifestations, and even individuals with the same clinical diagnosis may present with different combinations of phenotypic abnormalities, ages of onset of these abnormalities, and degrees of clinical severity. A key question for genomic precision medicine is how specific genetic variants influence clinical phenotype. The correlation between genotype (the variant or variants present at a given locus) and phenotype (the presence or absence of medically relevant observable traits) is defined as an above-chance probability of an association between the two and is termed genotype-phenotype correlation (GPC).1 Even individuals with an identical pathogenic variant commonly display variable findings, so GPCs are rarely absolute. Instead, GPCs usually signify a higher frequency of a feature in the presence of a certain genotype, an earlier age of onset of the disease or of a disease feature, or, in some cases, earlier mortality. For instance, a specific in-frame deletion of codon 992 of NF1 (MIM: 613113) is associated with a milder phenotype characterized by café-au-lait spots and skinfold freckling and by the absence of cutaneous and visible plexiform neurofibromas, whereas individuals with missense mutations affecting any of the five codons 844–848 have a more severe phenotype characterized by a high prevalence of plexiform neurofibromas, optic pathway gliomas, malignant neoplasms, and skeletal abnormalities.2 A core paradigm in precision genomic medicine is to match therapeutic interventions and other forms of clinical care to the pathomechanism of disease and, where appropriate, to specific genetic variants. Although this paradigm has been extremely successful in oncology, where targeted therapies are applied to specific cancer types when a certain genetic variant is identified, it has been less successful in Mendelian disease.3 Historically, GPCs have been difficult to identify for rare Mendelian diseases because their rarity makes it difficult or impossible to recruit cohorts large enough to achieve adequate statistical power.

The Human Phenotype Ontology (HPO) is a comprehensive bioinformatics resource for the analysis of human diseases and phenotypes that offers a computational bridge between genome biology and clinical medicine; it is used internationally for the analysis and exchange of phenotype data in rare disease medicine.4,5,6 A growing number of published works leverage HPO-encoded data for GPC analysis.7,8,9,10,11,12,13,14,15,16,17,18,19,20,21 However, using the HPO for GPC analysis poses challenges: annotations must be propagated up the HPO hierarchy, and only explicitly observed or excluded HPO terms should be included in categorical testing. Neither operation is natively supported by standard spreadsheet tools or bioinformatics packages. It is also desirable to integrate the analysis with other kinds of clinical data, including numerical measurements, age of onset or mortality, and severity scores. Moreover, the community has lacked a common schema for representing individual (i.e., case-level) clinical trajectories with these and other attributes. Heterogeneous approaches to data sharing, reuse, and analysis have hindered the development of software packages and data repositories that support GPC analysis. The Global Alliance for Genomics and Health (GA4GH) is an organization developing a suite of coordinated standards for genomics. The GA4GH Phenopacket Schema is a standard for sharing disease and phenotype information characterizing an individual person or biosample, and it addresses the challenge of documenting case-level clinical information.22,23,24,25 Here, we present Genotype-Phenotype Statistical Evaluation of Associations (GPSEA). GPSEA leverages phenopackets, which link an individual to detailed phenotypic descriptions, genetic information, diagnoses, and treatments,22 and automates the visualization and statistical analysis of GPCs.
We applied the software to 85 cohorts, 48 (56%) of which had at least one statistically significant GPC (there were a total of 253 statistically significant results). We show the power of utilizing individual-level characterization and discuss future utility in differential diagnostics and precision medicine.

Material and methods

Input data

Genotypic and phenotypic data about individuals with rare Mendelian disease were derived from the Phenopacket Store repository.25 Version 0.1.25 of Phenopacket Store includes 8,207 phenopackets representing 521 Mendelian and chromosomal diseases associated with 463 genes and 4,507 unique pathogenic alleles curated from 1,238 different publications. The phenopackets are structured representations of data comprising age of onset and age at last examination, vital status, genotype of the variant(s) deemed to be causal, disease diagnosis, and HPO terms representing the clinical manifestations of the disease. Where available in the original publication, the age of onset of the clinical manifestations is indicated. Missing data were not imputed.

For each GPC analysis, a cohort was defined based on the gene or disease. The analysis code for each cohort is available in the project GitHub repository (see data and code availability). The phenopackets used for analysis are automatically imported from Phenopacket Store by each cohort notebook. If desired, the phenopackets can also be obtained directly from Phenopacket Store.

GPSEA

GPSEA is a Python package for streamlining GPC analysis. For the analysis described here, version 0.9.11 was used. GPSEA enables a stepwise workflow to characterize GPCs. (1) A collection of phenopackets is loaded into a cohort, and a report with basic descriptive statistics is displayed. The report comprises the number of individuals and the distribution of sex and age, as well as tables with the most commonly annotated HPO terms, diseases, and associated genes. The variants are summarized according to their frequency and their predicted effect on the clinically relevant transcript, including a graphic with the location and frequencies of all non-structural variants. (2) The user can generate hypotheses about the GPC analyses that are most likely to be fruitful. For instance, if roughly half of the variants are missense, then it might make sense to test whether missense variants are associated with different clinical manifestations than other variants. (3) The analysis is configured with respect to multiple testing correction and other parameters. (4) A hypothesis is expressed with GPSEA’s variant predicates and genotype classifiers, which partition the cohort by genotype for testing with one of four statistical approaches. (5) The analysis results are presented as figures and tables. The data used to perform the statistical tests can be exported as data frames for additional analysis. A detailed tutorial is available online (see data and code availability).

Statistical tests

GPSEA provides four main statistical tests, each of which can be combined with predicates to test a wide variety of GPCs. For each test performed, GPSEA checks whether data are available for each individual (observed or excluded in the case of HPO-term-based tests, numerical measurement results for the t test, and duration until event for survival analysis) and omits a data point if either phenotypic or genotypic information is not available or applicable. By default, associations with adjusted p values of less than 0.05 are considered to be statistically significant.

Dichotomous qualitative phenotypes

The Fisher exact test (FET) computes the exact probability, under the null hypothesis that two dichotomous variables are independent, of observing a contingency table at least as extreme as the one observed. In our implementation, the two dichotomous variables are the genotype and the phenotype. For instance, the individuals of the cohort may be divided according to whether they have a missense or a stop-gained (nonsense) variant and according to whether or not they have Strabismus (HP:0000486) (Figure 1).

Figure 1.


Computation of association between a pair of dichotomous variables

In this example, a Fisher exact test (FET) is used to assess the association of Strabismus with missense variants compared to stop-gained (nonsense) variants. The p value of 5.43 × 10⁻⁶ indicates a significant difference in the frequency of Strabismus between the two genotype groups.
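For readers who want to see the computation behind such a p value, the following is a minimal sketch of a two-sided FET for a 2 × 2 genotype-by-phenotype table, written with only the Python standard library. It is not GPSEA's implementation: `table_probability`, `fisher_exact_two_sided`, and the counts are hypothetical, and the two-sided p value uses the common convention of summing the probabilities of all tables with the same margins that are no more likely than the observed table.

```python
from fractions import Fraction
from math import comb


def table_probability(a: int, row1: int, col1: int, n: int) -> Fraction:
    """Hypergeometric probability that the top-left cell equals `a`,
    given row-1 total `row1`, column-1 total `col1`, and grand total `n`."""
    return Fraction(comb(col1, a) * comb(n - col1, row1 - a), comb(n, row1))


def fisher_exact_two_sided(table: tuple[tuple[int, int], tuple[int, int]]) -> Fraction:
    """Two-sided FET for a 2x2 table ((a, b), (c, d)).

    Rows are genotype groups (e.g., missense vs. stop-gained); columns are
    phenotype status (HPO term observed vs. excluded)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, col1 = a + b, a + c
    # Values of the top-left cell that are compatible with the fixed margins.
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    p_observed = table_probability(a, row1, col1, n)
    # Sum the probabilities of all tables that are no more likely than the observed one.
    return sum(
        p for p in (table_probability(x, row1, col1, n) for x in support)
        if p <= p_observed
    )


# Hypothetical counts: missense carriers 8 observed / 2 excluded,
# stop-gained carriers 1 observed / 9 excluded.
p_value = fisher_exact_two_sided(((8, 2), (1, 9)))
print(f"two-sided p = {float(p_value):.4g}")
```

Exact `Fraction` arithmetic sidesteps floating-point ties when deciding which tables count as at least as extreme. In practice, an established routine such as `scipy.stats.fisher_exact` would normally be used.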

Multiple testing correction

For the FET procedure, GPSEA can perform one FET for each HPO term identified in the cohort. Because some cohorts contain hundreds of terms, testing each of them creates a substantial multiple testing burden. GPSEA offers two approaches to controlling the type I error rate, i.e., the probability of rejecting the null hypothesis when it is true. First, 11 classical multiple testing correction (MTC) procedures are offered, including Bonferroni and Benjamini-Hochberg (BH). The chosen procedure (by default BH) is applied to the full set of test results, and an adjusted p value is reported for each test.
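The BH adjustment itself is short. The sketch below assumes a plain list of raw p values, one per tested HPO term, and contrasts it with Bonferroni; `benjamini_hochberg` and `bonferroni` are hypothetical helper names illustrating the standard procedures, not GPSEA functions.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    # Indices of the raw p values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni adjusted p values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw FET p values for five HPO terms.
raw = [0.001, 0.008, 0.02, 0.045, 0.3]
print("BH:        ", [round(q, 4) for q in benjamini_hochberg(raw)])
print("Bonferroni:", [round(q, 4) for q in bonferroni(raw)])
```

With these made-up values, three terms stay below 0.05 after BH adjustment but only two after Bonferroni, which illustrates why BH is the less conservative choice.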

The multiple testing burden can also be reduced by selecting only a subset of terms to test. If the user has a hypothesis about which HPO terms are involved in a GPC, then GPSEA can be instructed to test only this term (or subset of terms).

Independent filtering for Human Phenotype Ontology: IF-HPO

Additionally, we developed the IF-HPO procedure, which applies a series of heuristics to select terms to test. The procedure was inspired by analogous strategies used in functional genomics that consist of filtering by a variable that is independent of the test statistic under the null hypothesis. By reducing the number of tests in this way, we maximize power for differential testing while preserving type I error control.26 (1) Skip “general”-level terms. All the direct children of the root phenotype term Phenotypic abnormality are skipped on the assumption that if there is a valid signal, it will derive from one of the more specific descendants. For instance, Abnormality of the nervous system (HP:0000707) is a child of Phenotypic abnormality, and this assumption implies that if there is a signal from the nervous system, it will lead to at least one of the descendants of Abnormality of the nervous system being significant. The top-level terms as well as their child and grandchild terms are skipped in this way, because they all represent general group terms. Details are available in the online tutorial. (2) Skip terms if all counts are identical to counts for a child term. Suppose a term such as Posterior polar cataract (HP:0001115) was observed in seven of 11 individuals with MISSENSE variants and in three of eight individuals with NONSENSE variants. If we find the same counts (7 of 11 and 3 of 8) for the parent term Polar cataract (HP:0010696), we do not test the parent term. The more specific an HPO term, the more information it carries (and the more interesting the correlation would be if it exists), and the FET result for Polar cataract would be exactly the same as for Posterior polar cataract. (3) Skip terms that are reported in less than a certain proportion of cohort members (default 0.4), because even if a correlation is identified, it is unlikely to be of great interest when the phenotype in question occurs rarely. (4) Skip terms whose counts can never be significant: if the individuals are binned into two genotype groups and two phenotype groups (2 × 2) and the total count of individuals is less than 7, or into three genotype groups and two phenotype groups (3 × 2) and the total count is less than 6, the test lacks even nominal statistical power. (5) Skip terms if there are no HPO observations in a genotype class. If one of the genotype classes has neither observed nor excluded observations for an HPO term, the term is skipped, because the data are not sufficiently rich to confidently perform a test.
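The heuristics can be sketched as a single filtering pass over precomputed counts. This is a simplified illustration under stated assumptions, not GPSEA's code: `TermCounts`, `if_hpo_filter`, and the `children` and `general_terms` inputs are hypothetical; “reported” in step (3) is taken to mean explicitly observed or excluded; and only the 2 × 2 threshold of step (4) is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermCounts:
    """Observed/excluded counts for one HPO term in two genotype groups (A, B)."""
    obs_a: int
    exc_a: int
    obs_b: int
    exc_b: int

    @property
    def total(self) -> int:
        # Individuals for whom the term is explicitly observed or excluded.
        return self.obs_a + self.exc_a + self.obs_b + self.exc_b


def if_hpo_filter(counts, children, general_terms, cohort_size,
                  min_coverage=0.4, min_total_2x2=7):
    """Return the HPO term IDs that survive the (simplified) filtering steps."""
    kept = []
    for term, c in counts.items():
        # (1) Skip general-level terms near the top of the hierarchy.
        if term in general_terms:
            continue
        # (2) Skip a term whose counts equal those of one of its child terms.
        if any(counts.get(child) == c for child in children.get(term, ())):
            continue
        # (3) Skip terms annotated in too small a fraction of the cohort.
        if c.total < min_coverage * cohort_size:
            continue
        # (4) Skip 2x2 tables too small to ever reach significance.
        if c.total < min_total_2x2:
            continue
        # (5) Skip terms with no observations at all in one genotype group.
        if c.obs_a + c.exc_a == 0 or c.obs_b + c.exc_b == 0:
            continue
        kept.append(term)
    return kept


# Toy cohort of 20 individuals; counts are hypothetical except for the
# cataract example from the text.
counts = {
    "HP:0000707": TermCounts(9, 2, 5, 3),  # Abnormality of the nervous system
    "HP:0010696": TermCounts(7, 4, 3, 5),  # Polar cataract (parent)
    "HP:0001115": TermCounts(7, 4, 3, 5),  # Posterior polar cataract (child)
    "HP:0000486": TermCounts(2, 1, 0, 0),  # Strabismus
}
children = {"HP:0010696": ["HP:0001115"]}
print(if_hpo_filter(counts, children, general_terms={"HP:0000707"}, cohort_size=20))
```

Only Posterior polar cataract survives here: the general term is skipped by step (1), its parent by step (2), and Strabismus by step (3).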

Phenotype scores

It is difficult to define an objective measure for the clinical severity of disease. Some published studies use the total count of features from a defined set as a proxy for severity. For instance, Jordan et al.27 found that the total number of defects among five categories (structural defects of the brain, eye, heart, and kidney, and sensorineural hearing loss) was significantly higher in individuals with point mutations in the Atrophin domain of RERE (MIM: 605226) than in individuals with putative loss-of-function (LoF) variants. Since there are five potential defects, each individual has a count ranging between 0 and 5. The authors regarded higher counts as representative of a severe clinical presentation.27

GPSEA performs a Mann-Whitney U test (also known as a Wilcoxon rank-sum test) to compare the distribution of such counts between genotype classes. This non-parametric test compares the ranks of the values in the two classes to assess whether values in one class tend to be larger than those in the other, without assuming that the counts are normally distributed.
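To make the rank-based comparison concrete, here is a sketch of the U statistic with average ranks for ties and a two-sided p value from the normal approximation (tie-corrected variance, no continuity correction). The helpers and the defect counts are hypothetical; for real analyses, `scipy.stats.mannwhitneyu` provides exact and asymptotic variants.

```python
from collections import Counter
from math import erfc, sqrt


def midranks(values):
    """Map each distinct value to its average (1-based) rank; ties share a rank."""
    ranks, position = {}, 1
    for value, count in sorted(Counter(values).items()):
        ranks[value] = position + (count - 1) / 2
        position += count
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p value from a normal approximation).

    The variance includes a correction for ties; no continuity correction is used."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    return u1, erfc(abs(z) / sqrt(2))


# Hypothetical defect counts (0-5) for two genotype classes.
point_mutation = [3, 4, 2, 5, 3]
loss_of_function = [1, 0, 2, 1, 1]
print(mann_whitney_u(point_mutation, loss_of_function))
```

U counts, over all cross-class pairs, how often the first class has the larger value (ties count one half), which is why the test speaks to a shift in the distribution rather than to a single summary statistic.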

A set of HPO terms that defines the severity score is provided as input. GPSEA increments the total count by one for each of these terms to which an individual is annotated in the phenopacket, either directly or through a more specific descendant term. If an individual is annotated with several terms related to the same specified term, the count is incremented only once (e.g., if both Ventricular septal defect [HP:0001629] and Atrial septal defect [HP:0001631] are present, a score of 1 and not 2 is entered for Abnormal heart morphology [HP:0001627]).
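The counting rule can be sketched as set operations over ancestor sets, assuming each annotated term has been expanded to itself plus all of its HPO ancestors. The `ANCESTORS` map below is a hypothetical, heavily abbreviated stand-in for the real ontology graph, and `count_score` is an illustrative helper.

```python
# Hypothetical, heavily abbreviated map from each HPO term to itself plus its
# ancestors; a real implementation would derive this from the HPO graph.
ANCESTORS = {
    "HP:0001629": {"HP:0001629", "HP:0001627"},  # Ventricular septal defect
    "HP:0001631": {"HP:0001631", "HP:0001627"},  # Atrial septal defect
    "HP:0001627": {"HP:0001627"},                # Abnormal heart morphology
    "HP:0003498": {"HP:0003498", "HP:0004322"},  # Disproportionate short stature
    "HP:0004322": {"HP:0004322"},                # Short stature
}


def count_score(observed_terms, score_terms):
    """Number of score terms matched by an individual's observed HPO terms,
    directly or through a more specific descendant; each counts at most once."""
    wanted = set(score_terms)
    matched = set()
    for term in observed_terms:
        # Terms missing from the toy map are treated as having no ancestors.
        matched |= ANCESTORS.get(term, {term}) & wanted
    return len(matched)


score_terms = ["HP:0001627", "HP:0004322"]
# VSD and ASD both map to Abnormal heart morphology, so together they add 1, not 2.
print(count_score(["HP:0001629", "HP:0001631", "HP:0003498"], score_terms))  # → 2
```

Using a set of matched score terms, rather than a running counter, is what guarantees that several descendants of one score term contribute a single point.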

The de Vries score is a simple phenotypic severity score for individuals with intellectual disability in which points are given for (severity of) intellectual disability, growth abnormalities (prenatal and postnatal), facial dysmorphisms, non-facial dysmorphisms, and other congenital anomalies.13,28 Our implementation of the de Vries score leverages the hierarchical structure of the HPO to include more specific descendants of phenotypic abnormalities included in the original score. For instance, Disproportionate short stature (HP:0003498) would be counted for Short stature (HP:0004322).

GPSEA can also make use of user-defined functions to support a plethora of scoring schemes used in different clinical domains. A scoring function is required to “condense” the phenotype of the individual into a numeric score or return a “not a number” (NaN) value if the individual should be excluded from the analysis. We provide examples for using user-defined functions as well as defining custom phenotype scorers in GPSEA documentation.
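A scoring function of this kind might look as follows. The dictionary representation of an individual (`observed` and `excluded` lists of HPO IDs), `SCORE_TERMS`, and `example_scorer` are all hypothetical and do not reflect GPSEA's actual scorer interface; the point is only the contract described above: return a numeric score, or NaN to exclude the individual.

```python
import math

# Hypothetical score terms: one point for each term observed.
SCORE_TERMS = {"HP:0001629", "HP:0001631", "HP:0000486"}


def example_scorer(individual: dict) -> float:
    """Toy user-defined scorer: the number of SCORE_TERMS observed, or NaN when
    none of them was assessed (neither observed nor excluded), so that the
    individual is dropped from the analysis."""
    observed = set(individual.get("observed", ()))
    excluded = set(individual.get("excluded", ()))
    if not ((observed | excluded) & SCORE_TERMS):
        return math.nan
    return float(len(observed & SCORE_TERMS))


cohort = [
    {"observed": ["HP:0001629", "HP:0000486"], "excluded": ["HP:0001631"]},
    {"observed": [], "excluded": ["HP:0001629", "HP:0001631", "HP:0000486"]},
    {"observed": ["HP:0000707"]},  # no score term assessed, so NaN
]
scores = [example_scorer(ind) for ind in cohort]
# Individuals scored as NaN are removed before the statistical test.
usable = [s for s in scores if not math.isnan(s)]
print(scores, usable)
```

Distinguishing "assessed and absent" (score 0) from "not assessed" (NaN) mirrors the requirement that only explicitly observed or excluded terms enter the analysis.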

Genotype-specific survival analysis

We may wish to compare the genotype classes with respect to the time point of a specific event, such as disease onset, death, or the onset of a specified phenotypic feature such as kidney failure. To do this, GPSEA tabulates the age at the event if the event was observed; otherwise, it records the age at last evaluation, at which the individual was alive without the event having occurred, so that the survival time is right censored. The log-rank test is used to test the null hypothesis that there is no difference between the populations in the probability of an event at any time point.29
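The following sketch shows how right-censored ages enter a two-group log-rank test. Each individual is a hypothetical `(age, event_observed)` pair, and `log_rank_test` is an illustrative helper that sums observed-minus-expected events in one group over the distinct event times and compares the result with a chi-square distribution with one degree of freedom.

```python
from math import erfc, sqrt


def log_rank_test(group_a, group_b):
    """Two-group log-rank test.

    Each group is a list of (age, event_observed) pairs; event_observed is False
    when the individual was right censored at their last evaluation.
    Returns (chi-square statistic with 1 df, p value)."""
    data = [(t, e, "A") for t, e in group_a] + [(t, e, "B") for t, e in group_b]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e, variance = 0.0, 0.0
    for t in event_times:
        # Individuals still at risk (no event and not yet censored) at time t.
        at_risk = [g for time, _, g in data if time >= t]
        events = [g for time, e, g in data if time == t and e]
        n, n_a = len(at_risk), at_risk.count("A")
        d, d_a = len(events), events.count("A")
        # Observed minus expected events in group A at this time point.
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical ages (years): early onset in group A, censored without onset in B.
early = [(1, True), (2, True), (3, True), (4, True), (5, True)]
late = [(10, False), (11, False), (12, False), (13, False), (14, False)]
print(log_rank_test(early, late))
```

Censored individuals never add events, but they remain in the risk set up to their last evaluation, which is how they still inform the expected counts.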

Student’s t test for numerical values

GPSEA performs an unpaired and two-sided t test to compare the means of the two groups defined by the genotype classifier. GPSEA expects numerical data for this test to be made available as measurement elements in the GA4GH Phenopacket Schema. GPSEA does not stipulate any specific ontology to represent the measurements, but in our examples we use LOINC codes to denote the assay and UCUM codes to represent units. GPSEA does not apply multiple-testing correction to these results, and users need to perform one analysis for each measurement to be tested.
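For reference, the pooled-variance (Student's) t statistic for two genotype classes can be computed with the standard library as below. `student_t` and the values are hypothetical; a two-sided p value additionally needs the t distribution, which is available, for example, through `scipy.stats.ttest_ind` (its default assumes equal variances).

```python
from math import sqrt
from statistics import mean, variance


def student_t(x, y):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    n1, n2 = len(x), len(y)
    # Pooled sample variance of the two groups.
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical measurement values (same assay and unit) for two genotype classes.
t, df = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)
```

Each measurement (for example, each LOINC-coded assay) would be analyzed with a separate call, matching the one-analysis-per-measurement workflow described above.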

Variant predicates

GPSEA has flexible predicates (functions that return either true or false based on the input) that can be used to partition the cohort into (usually) two groups of individuals. The predicates can be combined using “AND,” “OR,” and “NOT” operators of Boolean algebra logic to test complex conditions. GPSEA offers predicates for specific variants, for variant effect categories such as missense and stop-gained variants, specific exons, protein regions, types of structural variant, and others (Table S1). Besides the off-the-shelf predicates, custom predicates for testing arbitrary variant properties can be designed.
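Composable predicates of this kind can be sketched in a few lines by overloading Python's `&`, `|`, and `~` operators. The `Variant` record, the `Predicate` class, and the factory functions are hypothetical illustrations of the idea, not GPSEA's classes; the example encodes missense variants affecting codons 844–848, as in the NF1 example in the Introduction.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (not GPSEA's data model)."""
    effect: str                      # e.g., "missense_variant", "stop_gained"
    exon: Optional[int] = None
    protein_pos: Optional[int] = None


class Predicate:
    """Wraps a True/False test on a Variant and supports &, | and ~."""

    def __init__(self, test: Callable[[Variant], bool], name: str):
        self.test, self.name = test, name

    def __call__(self, variant: Variant) -> bool:
        return self.test(variant)

    def __and__(self, other: "Predicate") -> "Predicate":
        return Predicate(lambda v: self(v) and other(v), f"({self.name} AND {other.name})")

    def __or__(self, other: "Predicate") -> "Predicate":
        return Predicate(lambda v: self(v) or other(v), f"({self.name} OR {other.name})")

    def __invert__(self) -> "Predicate":
        return Predicate(lambda v: not self(v), f"NOT {self.name}")


def effect_is(effect: str) -> Predicate:
    return Predicate(lambda v: v.effect == effect, effect)


def in_exon(exon: int) -> Predicate:
    return Predicate(lambda v: v.exon == exon, f"exon {exon}")


def in_protein_region(start: int, end: int) -> Predicate:
    return Predicate(
        lambda v: v.protein_pos is not None and start <= v.protein_pos <= end,
        f"aa {start}-{end}",
    )


# Missense variants affecting codons 844-848.
severe_nf1 = effect_is("missense_variant") & in_protein_region(844, 848)
print(severe_nf1.name)
print(severe_nf1(Variant("missense_variant", protein_pos=846)))
print(severe_nf1(Variant("stop_gained", protein_pos=846)))
```

Carrying a readable `name` alongside each composed test makes it easy to label genotype groups in output tables.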

Genotype classifiers

GPSEA has five classifiers. The mono-allelic classifiers can be used to investigate autosomal-dominant diseases (heterozygous variants) and X-linked diseases (hemizygous variants; if desired, cohorts of males and females can be analyzed together, in which case the mono-allelic classifier would identify hemizygous variants in males and heterozygous variants in females). For instance, a mono-allelic classifier might partition individuals according to whether they have a heterozygous missense variant (group A) or not (group B); another classifier might partition individuals according to whether they have a heterozygous missense variant (group A) or a heterozygous structural variant (group B). In some cases, a genotype classifier might omit certain individuals; for instance, in the previous example, the classifier would omit any individual who has neither a missense nor a structural variant, as well as any individual with a homozygous or compound heterozygous genotype for missense or structural variants. The bi-allelic classifier is designed for autosomal-recessive conditions; it can test three genotypes (e.g., AA, AB, and BB, where A refers to a genotype such as “stop-gained variant” and B refers to other variants), which yields a 3 × 2 contingency table that can be analyzed by FET. Alternatively, it is possible to form two groups (e.g., AA and AB vs. BB, i.e., one or two stop-gained alleles vs. no stop-gained allele). The online documentation shows how to define partitions to perform these tests. Any of the variant predicates can be used together with the mono-allelic and bi-allelic genotype classifiers.
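The mono-allelic and bi-allelic partitions described above can be sketched as functions that map an individual's alleles to a class label, or to `None` when the individual is omitted. Alleles are represented here, hypothetically, as predicted-effect strings, with a homozygous genotype written as two identical alleles; none of these names come from GPSEA.

```python
from typing import Callable, Optional, Sequence

AlleleTest = Callable[[str], bool]


def monoallelic_class(alleles: Sequence[str], is_a: AlleleTest, is_b: AlleleTest) -> Optional[str]:
    """'A' or 'B' for an individual with a single (heterozygous or hemizygous)
    qualifying allele; None (omit) otherwise."""
    if len(alleles) != 1:
        return None
    (allele,) = alleles
    if is_a(allele):
        return "A"
    if is_b(allele):
        return "B"
    return None


def biallelic_class(alleles: Sequence[str], is_a: AlleleTest) -> Optional[str]:
    """'AA', 'AB', or 'BB' for an individual with two alleles, where A alleles
    satisfy `is_a` and B alleles do not; None (omit) otherwise."""
    if len(alleles) != 2:
        return None
    n_a = sum(1 for allele in alleles if is_a(allele))
    return {2: "AA", 1: "AB", 0: "BB"}[n_a]


def is_stop(allele: str) -> bool:
    return allele == "stop_gained"


# Hypothetical recessive cohort: each individual's alleles, by predicted effect.
cohort = {
    "P1": ["stop_gained", "stop_gained"],
    "P2": ["stop_gained", "missense_variant"],
    "P3": ["missense_variant", "frameshift_variant"],
    "P4": ["missense_variant"],  # only one allele reported: omitted
}
classes = {pid: biallelic_class(alleles, is_stop) for pid, alleles in cohort.items()}
# Collapse to two groups: one or two stop-gained alleles vs. none.
two_groups = {pid: ("AA or AB" if c in ("AA", "AB") else c) for pid, c in classes.items()}
print(classes)
print(two_groups)
```

The three-class output corresponds to the 3 × 2 FET layout, and the collapsed mapping to the two-group alternative.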

The disease classifier is designed to assay differences between the phenotypic features of two different diseases. For instance, in our cohort, we test for differences between Loeys-Dietz syndrome 1 (MIM: 609192) and Loeys-Dietz syndrome 3 (MIM: 613795) as well as between autosomal-recessive and -dominant forms of Robinow syndrome (MIM: 268310 and 616331). The sex classifier is designed to test differences between the phenotypic features observed in the males and females of a cohort. Individuals with unknown or unspecified sex are ignored by this classifier. The allele count classifier is designed to assay differences between individuals with one and two variant alleles in the same gene. For instance, in our cohort, we test whether individuals with mono-allelic and bi-allelic variants in EZH1 (MIM: 601674) have distinct phenotypic profiles. The disease, sex, and allele count classifiers do not take genotype into account.

Visualization

GPSEA visualizes variants against the background of the protein domain structure. To do so, it leverages the UniProt30 application programming interface (API) to retrieve information about protein domains. It is also possible to manually construct a dataframe with information about protein domains in cases where the UniProt API fails or does not contain information about a domain of interest. GPSEA then extracts information about all variants found in the cohort and plots each variant as a “lollipop” whose height and size reflect the number of times the variant was found in the cohort and whose color represents the functional effect predicted for the transcript of interest. Protein domains are depicted as colored boxes. Currently, GPSEA does not display non-coding or structural variants.

Cohorts

A total of 85 cohorts were chosen for GPSEA analysis from version 0.1.25 of Phenopacket Store.25 The cohorts had a mean of 77.8 individuals (median 49, minimum 16, and maximum 462). The cohorts comprised information from 6,179 individuals. Information on the sex of participants was available for 82.6% of these individuals, with 53% being male and 47% female.

To use GPSEA with new cohorts, it will be necessary to convert data to GA4GH Phenopacket Schema format. Several software tools are available to streamline this process.23,25

Search for previously published genotype-phenotype correlations

For each of the gene-specific cohorts, we searched PubMed for publications describing genotype-phenotype correlations. Each search was designed as {Disease name synonyms} AND {gene/variant synonyms} AND {genotype-phenotype correlation}. The following is an example for Loeys-Dietz syndrome 3.

(“Loeys-Dietz syndrome type 3” OR LDS3 OR “Loeys-Dietz syndrome 3”) AND
(SMAD3 OR variant OR mutation) AND
(“genotype phenotype correlation” OR “phenotype genotype correlation”)
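Such queries can be assembled programmatically from synonym lists, which keeps the search pattern identical across cohorts. The helpers below are a hypothetical sketch that reproduces the Loeys-Dietz syndrome 3 example (with straight rather than typographic quotes); the resulting string could be pasted into PubMed or submitted through the NCBI E-utilities.

```python
def or_group(terms: list[str]) -> str:
    """Join synonyms with OR inside parentheses, quoting multi-word phrases."""
    return "(" + " OR ".join(f'"{t}"' if " " in t else t for t in terms) + ")"


def gpc_query(disease_synonyms: list[str], gene_synonyms: list[str]) -> str:
    """{Disease name synonyms} AND {gene/variant synonyms} AND {GPC phrases}."""
    gpc_phrases = ["genotype phenotype correlation", "phenotype genotype correlation"]
    return " AND ".join(or_group(g) for g in (disease_synonyms, gene_synonyms, gpc_phrases))


print(gpc_query(
    ["Loeys-Dietz syndrome type 3", "LDS3", "Loeys-Dietz syndrome 3"],
    ["SMAD3", "variant", "mutation"],
))
```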

Additionally, the relevant entries from Online Mendelian Inheritance in Man (OMIM)31 were consulted as were the publications used for curation.

Results

Characterizing genotype-phenotype correlations with GPSEA

We developed GPSEA as an end-to-end software framework for exploring and visualizing cohorts and for characterizing GPCs. GPSEA is a Python package designed to be used in a Jupyter notebook, but the analysis functions can also be used as a programming library. The GPSEA framework enables testing of existing hypotheses about the disease or gene in question or generation of new hypotheses based on salient aspects of the investigated cohort depicted by the tables and visualizations. Genotypes can be tested for association with four main classes of clinical phenotype data: categorical phenotypic traits (i.e., observed vs. excluded HPO terms or disease diagnoses), numerical values (e.g., laboratory test results), phenotype scores, and survival data (i.e., mortality, disease onset, or onset of a specific HPO term). GPSEA enables definition of reproducible analyses that combine flexible partitioning of the individuals into genotype/phenotype groups with standard statistical tests (e.g., FET for categorical phenotypes). Importantly, GPSEA exploits the HPO hierarchy and propagates HPO annotations when computing contingency tables, phenotype score distributions, and survival data, and when filtering HPO terms to reduce the multiple testing burden. All analysis results are formatted as tables and figures suitable for processing in bioinformatics pipelines or interactive exploration within the Python data science environment (Figure 2). We tested GPSEA on 85 cohorts, covering 81 genes and 122 diseases. We first explain the algorithmic approaches to setting up GPC testing and then present an overview of 253 significant correlations identified in the cohorts.

Figure 2.


Schematic overview of GPSEA workflow

(A) Overview. GPSEA is a Python package designed to work well in Jupyter notebooks. GPSEA takes a collection of GA4GH phenopackets as input, performs quality assessment, and visualizes the salient characteristics of the cohort; genotype classes are defined (Figure 3); and one of four classes of statistical test is performed for each hypothesis the user decides to test.

(B) Visualizing data and formulating hypotheses. GPSEA displays tables with the distribution of phenotypic abnormalities, disease diagnoses, variants, and other information, and presents a cartoon with the distribution of variants across the protein. This information is intended to help users formulate hypotheses about genotype-phenotype correlations.

(C) Statistical testing. GPSEA offers four main ways of testing phenotypes (see text for details and Figure 5 for examples).

Partitioning the cohort according to genotypes

The GPC analysis starts with defining one or more hypotheses. Ideally, decisions regarding the analysis structure will be based on prior hypotheses about the disease or gene in question. Alternatively or additionally, GPSEA helps users generate hypotheses based on the tables and visualizations. For instance, if GPSEA shows that roughly 50% of the variants observed in a cohort are missense and the other 50% are truncating or presumed LoF variants, users may choose to analyze whether missense variants are associated with significantly different phenotypes than LoF variants. Other GPSEA visualizations may help to formulate hypotheses about commonly occurring variants, protein domains, exons, or other classes of variation. Once a hypothesis has been conceived, it must be encoded into a genotype classifier, i.e., a GPSEA component that assigns each cohort member to one of the (typically two) classes.

Variant predicates are key building blocks for classification based on genomic variants. GPSEA offers predicates for specific variants, variant effect categories such as missense and stop-gained variants, specific exons, protein regions, types of structural variant, and others. Predicates can be combined using Boolean algebra to create more expressive predicates. The framework provides five genotype classifiers that are used to divide the cohort into groups for statistical testing. The mono- and bi-allelic classifiers use variant predicates to select the variants of interest and assign individuals to genotype classes. The sex classifier investigates differences between males and females. The diagnosis classifier tests for differences in the phenotypic spectrum of different diseases (e.g., Loeys-Dietz syndrome 1 [MIM: 609192] vs. Loeys-Dietz syndrome 2 [MIM: 610168]). The allele count classifier uses a variant predicate to select the variants of interest and classifies the individuals according to the number of variant alleles. For instance, mono-allelic and bi-allelic variants in EZH1 cause dominant and recessive neurodevelopmental disorders32; with the allele count classifier, the distribution of phenotypic features can be compared between individuals with mono-allelic and bi-allelic variants (Figure 3 and Table S1).

Figure 3.


Variant predicates and genotype classifiers

(A) Variant predicate tests. GPSEA provides predicate functions that test whether a variant, such as c.373G>C (GenBank: NM_181486.4) (p.Gly125Arg) in TBX5 (MIM: 601620), meets a criterion from one of three evidence groups: allele, functional annotation, or protein. For instance, the predicate checks if the variant is a deletion and whether it overlaps with a specific exon or with a protein region of interest.

(B) Boolean algebra. Variant predicates can be combined using AND, OR, and NOT operators of Boolean algebra to test complex criteria. For instance, a predicate for a point mutation can be formulated as a “missense mutation affecting one reference base and change length of zero” (no sequence loss or gain). A predicate for a loss-of-function mutation can be defined as a mutation leading to a transcript ablation, frameshift, introduction of a premature stop codon, or loss of the start codon. A predicate for a structural deletion can test whether the variant is either an imprecise chromosomal deletion or a deletion involving 50 or more base pairs (or other thresholds).33

(C) Genotype classifiers. Each classifier splits a cohort into two or more classes to enable genotype-phenotype comparisons. GPSEA ships with five built-in classifiers to classify the cohort members using their sex, diagnosis, a fixed count of alleles of different types (mono-allelic and bi-allelic), or by a different allele count of the same type (allele count).

Analyzing the cohort according to phenotypes

GPSEA offers four major tests for different kinds of clinical data. The categorical test is designed to be used for observations of HPO terms (observed/excluded) or disease diagnoses. Numerical values such as laboratory measurements can be analyzed with a t test. Phenotype scores can be derived as a proxy of clinical severity and are analyzed by a Mann-Whitney U test. Finally, survival analysis can be performed for age of disease onset, onset of a phenotypic feature (HPO term), or death. The following sections explain the approach. The examples are taken from the 85 cohorts (for an overview, see Tables S2 and S3–S10; for a summary of results for each cohort, see Figures S1–S88; for source code for analyses, see data and code availability).

Categorical association

The FET calculates the exact probability of observing a contingency table at least as extreme as the observed one if the two categorical variables are in fact independent. In our implementation, the two categorical variables are the genotype and the phenotype. For instance, the individuals of the cohort may be divided according to whether or not they have a stop-gained (nonsense) variant and according to whether or not they have Strabismus.

IF-HPO

Larger cohorts may include several hundred HPO terms. Even though many published articles on GPC analysis do not apply an MTC to the tests, we feel it is appropriate to do so unless users have a well-defined hypothesis prior to performing the analysis. However, when hundreds of HPO terms are tested, as in some of the cohorts analyzed here, MTC can result in low statistical power. In high-dimensional analyses such as differential mRNA expression in cohorts, a two-stage approach termed independent filtering first removes hypotheses (e.g., expression differences per gene) using a criterion that is independent of the test statistic under the null hypothesis; this reduces the number of hypotheses tested in the second stage, leading to a milder MTC effect and, thereby, increased power.26 We developed an analogous approach, IF-HPO, to reduce the testing burden before MTC is applied (Figure 4). This rule-based approach leverages the hierarchical structure of the HPO to avoid redundant tests and tests that are unlikely to reveal an interesting result. IF-HPO reduces the total number of tested terms by over 10-fold in the cohorts analyzed here (before filtering: mean 304, median 277, minimum 45, maximum 967; following independent filtering: mean 40, median 28, minimum 1, maximum 225). While any such heuristic has trade-offs, and some significant and interesting results may be removed, the IF-HPO procedure provides a substantial boost in statistical power for the remaining terms.

Figure 4.


Independent filtering for human phenotype ontology

Independent filtering for HPO (IF-HPO) removes hypotheses (here, HPO terms) by criteria independent of the test statistic to reduce the multiple testing burden and boost power. The HPO has a hierarchical structure going from general to specific terms. (1) IF-HPO does not test the top two levels of the HPO under the Phenotypic abnormality root or terms that are not descendants of Phenotypic abnormality, on the assumption that more specific terms are of greater medical and scientific interest and that the signal is likely to be driven by a more specific clinical manifestation. (2) Terms are not tested if they have exactly the same counts as one of their child terms, because in this case the annotations of the parent term are derived entirely from those of the child term by the true path rule. (3) Terms are not tested if their coverage is less than 40% of the entire cohort (the figure assumes a cohort of 100 individuals), on the assumption that the result would not be representative of the cohort. (4) Terms are not tested if the total count is below a threshold for reaching the nominal statistical power. (5) Finally, terms are not tested if one of the genotype classes has neither observed nor excluded annotations.
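The five rules can be sketched as a simple filter over per-term counts. This is an illustration under stated assumptions, not GPSEA's implementation: the function `if_hpo_filter`, the counts format, and the toy hierarchy are hypothetical; rule (1) is approximated by a caller-supplied set of general terms; and rule (4) is replaced by a fixed minimum number of observed annotations (`min_observed`), whereas GPSEA derives its threshold from the attainable statistical power.

```python
from typing import Dict, List, Set, Tuple

# counts[term][genotype_class] = (n_observed, n_excluded)
Counts = Dict[str, Dict[str, Tuple[int, int]]]


def if_hpo_filter(
    counts: Counts,
    children: Dict[str, List[str]],
    general_terms: Set[str],
    cohort_size: int,
    min_coverage: float = 0.4,
    min_observed: int = 5,  # hypothetical stand-in for rule (4)
) -> Tuple[List[str], Dict[str, str]]:
    """Return (terms to test, {skipped term: reason})."""
    to_test: List[str] = []
    skipped: Dict[str, str] = {}
    for term, by_class in counts.items():
        annotated = sum(obs + exc for obs, exc in by_class.values())
        observed = sum(obs for obs, _ in by_class.values())
        if term in general_terms:
            # (1) Very general terms near the root are not tested.
            skipped[term] = "general term"
        elif any(counts.get(ch) == by_class for ch in children.get(term, [])):
            # (2) Same counts as a child: the parent's annotations come entirely
            # from the child via the true path rule, so the test is redundant.
            skipped[term] = "same counts as a child term"
        elif annotated < min_coverage * cohort_size:
            # (3) Too few individuals annotated (observed or excluded).
            skipped[term] = "low coverage"
        elif observed < min_observed:
            # (4) Too few observations to plausibly reach significance.
            skipped[term] = "too few observations"
        elif any(obs + exc == 0 for obs, exc in by_class.values()):
            # (5) A genotype class has neither observed nor excluded annotations.
            skipped[term] = "empty genotype class"
        else:
            to_test.append(term)
    return to_test, skipped


# Toy hierarchy (simplified; the real HPO has intermediate terms) and made-up counts.
children = {
    "Abnormality of the eye": ["Cataract"],
    "Cataract": ["Zonular cataract"],
    "Zonular cataract": ["Nuclear cataract"],
}
counts: Counts = {
    "Abnormality of the eye": {"missense": (30, 10), "other": (25, 15)},
    "Cataract": {"missense": (12, 20), "other": (4, 30)},
    "Zonular cataract": {"missense": (12, 20), "other": (4, 30)},
    "Nuclear cataract": {"missense": (3, 20), "other": (1, 30)},
    "Strabismus": {"missense": (10, 15), "other": (5, 5)},
    "Ptosis": {"missense": (20, 30), "other": (0, 0)},
}
to_test, skipped = if_hpo_filter(counts, children, {"Abnormality of the eye"}, cohort_size=100)
print(to_test)   # ['Zonular cataract']
print(skipped)
```

Only the terms in `to_test` would then be passed on to the FET and to multiple testing correction, so the correction is applied over a much smaller number of hypotheses.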

Alternatively, users can choose to test specific HPO terms if there is a prior hypothesis or to test all terms. Standard MTC is applied to the tests performed following IF-HPO. By default, GPSEA applies the BH method34 (ten other standard MTC approaches are available).
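For readers unfamiliar with the Benjamini-Hochberg (BH) procedure, the sketch below computes BH-adjusted p-values in plain Python; the function name is ours, and GPSEA itself delegates MTC to existing statistical libraries (for example, `statsmodels.stats.multitest.multipletests` with `method="fdr_bh"` performs the same adjustment). Only the p-values of terms that survive IF-HPO would be passed in, so the number of hypotheses `m` is correspondingly smaller.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, scaling by m / rank and
    # taking a running minimum so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # ≈ [0.02, 0.04, 0.04, 0.02]
```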

Figures 5A and 5B show an example cartoon generated for a cohort of individuals with NF1 variants and the results of a categorical analysis testing for associations of variants at residue Arg1830 (GenBank: NP_001035957.1) in individuals diagnosed with Neurofibromatosis, type 1 (MIM: 162200). Twelve HPO terms had a significantly lower or higher frequency in individuals with variants at this position than in individuals with other NF1 variants (for additional results for NF1, see Figure S42; for a summary of all significant categorical test results, see Table S2).

Figure 5.


Excerpted results from five example analyses

(A) Visualization. GPSEA generates a cartoon showing the location and frequency of variants in protein sequences. The following panels show examples of statistically significant GPCs identified by GPSEA.

(B) Categorical analysis. Several phenotypic abnormalities (HPO terms) such as neurofibromas, optic nerve glioma, and Lisch nodules are significantly less frequent in individuals with neurofibromatosis type 1 due to variants located at the arginine residue at position 1,830 of neurofibromin isoform 1 than in those with different mutations (FET, IF-HPO, Benjamini-Hochberg correction).

(C) Severity score. A boxplot with counts of abnormalities in five organ systems in the individuals with mutations in RERE showing the association of the mutations in the Atrophin domain with abnormalities in multiple organ systems27 (Mann-Whitney U test, p = 1.44 × 10−3). The boxes represent the Q1–Q3 range, and the whiskers extend to the farthest score lying within 1.5× the interquartile range. The blue line denotes the median score.

(D) de Vries score. Boxplots representing the association of the de Vries phenotype score13 and missense variants in CHD8 (Mann-Whitney U test, p = 8.99 × 10−4).

(E) Continuous phenotypes. Association of CYP21A2 genotype (homozygous missense vs. other) with concentration of 17-OH progesterone (t test, p = 7.91 × 10−6).

(F) Survival analysis. Comparison of the onset of Stage 5 chronic kidney disease (HP:0003774) in individuals with UMOD mutations showing a significantly earlier onset of the disease in the individuals with NM_003361.4:c.744C>G; p.(Cys248Trp) than in those with NM_003361.4:c.947A>C; p.(Gln316Pro) (log-rank test, p = 4.1 × 10−4). Missense, set complement of “missense,” i.e., any mutation that is not missense; LoF, loss of function.

Phenotype scores

Phenotype scores have been developed for some diseases to provide a semi-objective assessment of disease severity. Many such scores count the total number of observed phenotypic features from a list of stipulated terms. Other scores involve more complicated systems that use Boolean logic or thresholding. An example of the first type of score is provided by the analysis of Atrophin domain variants in RERE, which were previously found to be significantly associated with higher scores defined by the counts of structural defects of the brain, eye, heart, and kidney and sensorineural hearing loss.27 GPSEA provides a counting scorer, which allows users to specify the relevant HPO terms. The hierarchical structure of the HPO is used to count annotations to each term itself or to any of its descendants; for each of the terms, a count of 1 is given if one or more such annotations are found, and otherwise a count of zero is assigned. The phenotype score thus ranges from 0, if no relevant abnormalities were recorded, to the total number of specified HPO terms, if an abnormality was found for every item. For instance, one of the terms was Abnormal brain morphology (HP:0012443); therefore, one point would be given if the individual was annotated to any of its descendant terms, for instance, Agenesis of corpus callosum (HP:0001274). Variants located in the Atrophin domain of RERE were associated with a significantly higher severity score (Figures 5C and S50).
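The counting rule described above can be sketched in a few lines. This is a hypothetical illustration, not GPSEA's counting scorer: the child-to-parent map is a toy excerpt in which intermediate HPO terms are omitted, and the five scored labels are placeholders chosen to mirror the brain, eye, heart, kidney, and hearing categories of the RERE score. Only observed (not excluded) annotations contribute.

```python
from typing import Dict, Iterable, List, Set

# Toy child -> parents map; the real HPO has intermediate terms between these.
PARENTS: Dict[str, List[str]] = {
    "Agenesis of corpus callosum": ["Abnormal brain morphology"],
    "Ventricular septal defect": ["Abnormal heart morphology"],
}

# Hypothetical scored terms, one per category mentioned in the text.
SCORED_TERMS = [
    "Abnormal brain morphology",
    "Abnormal eye morphology",
    "Abnormal heart morphology",
    "Abnormal renal morphology",
    "Sensorineural hearing impairment",
]


def ancestors_or_self(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms implied by `term` under the true path rule (including itself)."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def counting_score(observed: Iterable[str], scored: List[str],
                   parents: Dict[str, List[str]]) -> int:
    """One point per scored term annotated directly or via any descendant."""
    implied: Set[str] = set()
    for term in observed:
        implied |= ancestors_or_self(term, parents)
    return sum(1 for term in scored if term in implied)


print(counting_score(["Agenesis of corpus callosum", "Ventricular septal defect"],
                     SCORED_TERMS, PARENTS))  # 2
```

Because each scored term contributes at most one point, an individual annotated to both Agenesis of corpus callosum and Abnormal brain morphology still receives a single point for the brain category.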

Other scores have been developed with more involved rules. For instance, the de Vries score was developed as a relatively simple phenotypic severity score for individuals with intellectual disability in which points are given for (severity of) intellectual disability, growth abnormalities (prenatal and postnatal), facial dysmorphisms, non-facial dysmorphisms, and other congenital anomalies.13 We developed a modified version of this score that uses the structure of the HPO to “roll up” specific terms. Using this, we identified a significantly lower score (corresponding to milder clinical manifestations) in individuals with CHD8 (MIM: 610528) missense variants than those with other CHD8 variants, similar to the original application of the score to CHD8 (Figure 5D).13 We also applied the score to other cohorts; a similar significant association of missense variants with lower scores was identified for CTCF (MIM: 604167) (Figure S15; for a summary of all phenotype score results, see Table S3).
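Phenotype scores such as the de Vries score are compared between genotype groups with a Mann-Whitney U test. The sketch below computes U from its pairwise definition and a two-sided p-value from the normal approximation; it applies no tie or continuity correction, so it is only a rough guide for small cohorts with many tied scores, where an exact method (for example, `scipy.stats.mannwhitneyu` with `method="exact"`) would be preferable. The function name and the scores are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """U statistic for x and a two-sided p-value from the normal approximation.

    No tie correction and no continuity correction are applied.
    """
    # U = number of (x_i, y_j) pairs with x_i > y_j, counting ties as 1/2.
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided standard normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Made-up de Vries scores for two genotype groups.
missense_scores = [1, 2, 3]
other_scores = [4, 5, 6]
u, p = mann_whitney_u(missense_scores, other_scores)
print(u, p)
```

A U of 0 here means every score in the first group is lower than every score in the second, the pattern expected if missense variants are associated with milder manifestations.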

Numerical values

HPO terms are categorical and are not designed to capture continuous (numerical) values. Instead, the GA4GH Phenopacket Schema has a measurement element that can be used to represent the results of laboratory tests for analytes such as enzyme activity or metabolite concentrations. GPSEA can test the association of such numerical data with genotype classes. For example, in Adrenal hyperplasia, congenital, due to 21-hydroxylase deficiency (MIM: 201910), it is assumed that the mildest mutation determines the phenotype in compound heterozygotes and that missense variants in CYP21A2 (MIM: 613815) have a less severe effect on enzyme activity than do other variants such as truncation or ablation variants.35 Using GPSEA, we applied a t test and observed significantly lower 17-OH-progesterone levels (which are known to increase with reduced 21-hydroxylase activity) in the individuals with two missense alleles (Figures 5E and S16; for a summary of all t test results, see Table S4).
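The comparison described above has two steps: assign each individual to a genotype class, then compare the measurement between classes. The sketch below is hypothetical throughout (the classification helper, the allele labels, and the measurement values are made up) and computes only Student's pooled-variance t statistic and its degrees of freedom; a p-value would then come from the t distribution, for example via `scipy.stats.ttest_ind(x, y)`, which assumes equal variances by default.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple


def genotype_class(allele_effects: Sequence[str]) -> str:
    """Hypothetical rule: 'biallelic missense' if both alleles are missense."""
    if len(allele_effects) == 2 and all(e == "missense" for e in allele_effects):
        return "biallelic missense"
    return "other"


def student_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t statistic (pooled variance) and degrees of freedom."""
    n1, n2 = len(x), len(y)
    # statistics.variance is the sample variance (denominator n - 1).
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Made-up individuals: (allele effects, 17-OH progesterone in arbitrary units).
cohort = [
    (["missense", "missense"], 1.0),
    (["missense", "missense"], 2.0),
    (["missense", "missense"], 3.0),
    (["missense", "frameshift"], 4.0),
    (["stop_gained", "deletion"], 5.0),
    (["frameshift", "frameshift"], 6.0),
]
groups: Dict[str, List[float]] = {"biallelic missense": [], "other": []}
for alleles, level in cohort:
    groups[genotype_class(alleles)].append(level)

t, df = student_t(groups["biallelic missense"], groups["other"])
print(t, df)  # negative t: lower levels in the biallelic missense group
```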

Survival analysis

GPSEA can perform survival analysis to assess associations between genotype classes and mortality, disease onset, or onset of a specific phenotypic abnormality such as Stage 5 chronic kidney disease (HP:0003774). The data are plotted as a Kaplan-Meier curve, and a log-rank test is applied to assess statistical significance (Figures 5F and S83). This requires that the phenopackets contain information about ages of onset or death; because this information was not available in most of the publications curated for this project, survival analysis was performed only for a subset of cohorts. The analysis leverages the ontological structure of the HPO to roll up annotations from descendant terms, similar to the procedure for categorical analysis. For example, if we are testing for onset of Seizure (HP:0001250) and an individual was noted to have both Tonic seizure (HP:0032792) and Generalized myoclonic seizure (HP:0002123), the earlier of the two ages of onset is used (for a summary of all survival analysis results, see Tables S5–S7).
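The three ingredients described above (rolling up onset ages, the Kaplan-Meier estimate, and the two-group log-rank test) can be sketched in plain Python as follows. This is an illustrative sketch, not GPSEA's code: all function names are hypothetical, ages are assumed to be in years, each record is `(age, event)` where `event` is `False` for censored individuals, and the seizure hierarchy is simplified (the real HPO has intermediate terms between Seizure and its subtypes).

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

Record = Tuple[float, bool]  # (age in years, True if onset observed, False if censored)


def ancestors_or_self(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def earliest_onset(term: str, annotations: Sequence[Tuple[str, float]],
                   parents: Dict[str, List[str]]) -> Optional[float]:
    """Earliest onset among annotations to `term` itself or any descendant."""
    ages = [age for t, age in annotations if term in ancestors_or_self(t, parents)]
    return min(ages) if ages else None


def kaplan_meier(records: Sequence[Record]) -> List[Tuple[float, float]]:
    """(event time, estimated fraction still without onset) at each event time."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in records if e}):
        at_risk = sum(1 for ti, _ in records if ti >= t)
        events = sum(1 for ti, e in records if ti == t and e)
        surv *= 1 - events / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(g1: Sequence[Record], g2: Sequence[Record]) -> Tuple[float, float]:
    """Two-group log-rank chi-square statistic (1 df) and its p-value."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in list(g1) + list(g2) if e}):
        n1 = sum(1 for ti, _ in g1 if ti >= t)
        n2 = sum(1 for ti, _ in g2 if ti >= t)
        d1 = sum(1 for ti, e in g1 if ti == t and e)
        d = d1 + sum(1 for ti, e in g2 if ti == t and e)
        n = n1 + n2
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


seizure_parents = {"Tonic seizure": ["Seizure"], "Generalized myoclonic seizure": ["Seizure"]}
print(earliest_onset("Seizure", [("Tonic seizure", 2.5),
                                 ("Generalized myoclonic seizure", 1.0)], seizure_parents))  # 1.0
```

Censored individuals (no onset observed by their last recorded age) remain in the at-risk sets up to that age, which is what distinguishes this analysis from simply comparing mean onset ages.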

Analysis by disease diagnosis

GPSEA also allows users to search for HPO terms whose frequencies differ between two diseases. For instance, 8/25 (32%) individuals with Kabuki syndrome 1 (MIM: 147920) displayed Feeding difficulties (HP:0011968) compared to 55/63 (87%) individuals with Kabuki syndrome 2 (MIM: 300867) (p = 2.1 × 10−5, FET, IF-HPO, BH correction). A total of 16 significant findings were observed (Figure S28; for a summary of all disease analysis results, see Table S8).

Analysis of sex differences

A categorical analysis can also be performed to test the association between phenotypic features and sex (male or female). Tests were performed in 44 cohorts, and one significant difference was identified, in the cohort for Kabuki syndrome 2, in which 14/18 (78%) males were annotated to the HPO term Intellectual disability, severe (HP:0010864), compared to only 7/25 (28%) females (p = 7.77 × 10−3; FET, BH) (Table S9).

GPCs are common in Mendelian disease

We analyzed 85 cohorts with 6,179 individuals (median 49 per cohort, range: 16–462) with 122 Mendelian diseases. Each individual was encoded as a phenopacket with information about the disease diagnosis, phenotypic abnormalities (HPO terms), and, where available, age of onset of the disease and of individual features, age of death, and in some cases numerical laboratory test results. GPSEA analysis was applied to each of the cohorts. Existing knowledge about GPCs related to the gene or disease of interest was sought in PubMed (see material and methods) and, if possible, an analysis was performed in GPSEA to reproduce a similar result using the available cohorts. Alternatively or additionally, GPSEA visualizations were consulted to generate hypotheses about testable GPCs for common variant categories (e.g., missense and nonsense), common variants, exons, protein domains, or regions. If relevant information was available about onset or mortality, survival analysis was performed. In some cases, phenotype severity scores were applied or numerical analyses were performed. A total of 253 significant correlations were identified. We did not identify a single publication for which the data and analysis script were made available in a way that would allow the original analysis to be replicated. Additionally, because we attempted to curate data from all available publications for each gene or disease being analyzed, our cohorts and methodology differed from those of the original studies. Nevertheless, we assessed whether our results were similar to previously published ones.
Some of our cohorts involved comparison of diseases with well-known phenotypic differences; for instance, we compared Spastic paraplegia 78, autosomal recessive (MIM: 617225) and Kufor-Rakeb syndrome (MIM: 606693), both of which are caused by variants in ATP13A2 (MIM: 610513), and showed a significantly higher frequency of Parkinsonism (HP:0001300) and Bradykinesia (HP:0002067) in the individuals with Kufor-Rakeb syndrome (Figure S8). Although statistical tests are rarely conducted to characterize allelic diseases in this way, we regard such differences as well known and record them as previously published in the literature for the purposes of Table 1. Other differences, such as a higher prevalence of Osteoarthritis (HP:0002758) in Loeys-Dietz syndrome 3 in individuals with the missense variant c.859C>T (GenBank: NM_005902.4) (p.Arg287Trp) (19/19; 100%) than in individuals with other variants (7/19; 37%; p = 3.7 × 10−5), could not be found in the previous literature. Significant GPCs were identified for 48 cohorts. We identified previously published GPCs for 29 of these cohorts, many of which overlapped with our findings (references and detailed analyses are presented in Figures S1–S88). Seventy-one significant findings in the remaining 19 cohorts represent candidate GPCs that should be validated in independent cohorts (Table 1; references for previously published findings are available in Figures S1–S88).

Table 1.

Summary of tests performed according to type of test

Statistical procedure | Cohorts tested | Tests performed | Significant tests
Categorical analysis | 78 | 6,736 | 217
t test | 2 | 3 | 3
HPO onset | 4 | 6 | 3
Disease onset | 10 | 11 | 6
Mortality | 3 | 3 | 1
Phenotype scores | 6 | 9 | 7
Disease diagnosis | 8 | 266 | 15
Sex differences | 44 | 1,979 | 1
Total | 85ᵃ | 9,013 | 253

The table summarizes the results from the 85 cohorts tested, arranged according to the types of statistical tests offered by GPSEA. Categorical analysis: association of genotypes with phenotypes by a Fisher exact test. t test: comparison of means of continuous values by Student’s t test. HPO onset: log-rank test for association of genotypes with age of onset of a phenotypic abnormality represented by an HPO term. Disease onset: log-rank test for association of genotypes with age of onset of a disease. Mortality: log-rank test for association of genotypes with age of death. Phenotype scores: Mann-Whitney U test for association of genotypes with magnitude of a phenotype severity score. Disease diagnosis: comparison of two or more diseases associated with variants in the same gene. Sex differences: comparison of frequencies of phenotypic features in a disease between males and females. Multiple testing correction (Benjamini-Hochberg method) was applied to the categorical tests following the independent filtering procedure (IF-HPO), as well as to the disease diagnosis and sex difference comparisons. No multiple testing correction was applied to the remaining tests, which were considered to represent distinct hypotheses (detailed results are shown in Table S10).

ᵃ Multiple statistical procedure types were performed for some cohorts.

Distribution of phenotypic features with significant GPCs

We analyzed the distribution of HPO terms for which significant GPCs were identified by mapping each term to its top-level term (a direct child of Phenotypic abnormality). The distribution of terms was significantly different from what would be expected based on the counts of all terms in the Phenotypic abnormality subhierarchy of the HPO (exact multinomial test, p = 3.17 × 10−36). The largest differences were observed for Abnormality of the nervous system (HP:0000707; expected 10.7%, observed 23.0%), Abnormality of metabolism/homeostasis (HP:0001939; expected 9.5%, observed 1.1%), and Neoplasm (HP:0002664; expected 2.7%, observed 9.2%). This raises the possibility that phenotypic features in different organ systems may have a differential tendency to display GPCs, although our observation may also be the result of ascertainment or other biases (Table S11).
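The first step of this analysis, mapping each significant term to its top-level category and tallying the categories, can be sketched as below. The hierarchy is a toy excerpt with intermediate terms omitted, and the list of significant terms is made up. Because the HPO is a directed acyclic graph, a term may have several top-level ancestors; how such terms were handled is not described here, so this sketch simply counts each top-level category it reaches (an assumption). The resulting observed fractions would then be compared with the expected fractions by a goodness-of-fit test such as the exact multinomial test named above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

ROOT = "Phenotypic abnormality"

# Toy, heavily simplified child -> parents excerpt of the HPO.
PARENTS: Dict[str, List[str]] = {
    "Abnormality of the nervous system": [ROOT],
    "Neoplasm": [ROOT],
    "Seizure": ["Abnormality of the nervous system"],
    "Intellectual disability, severe": ["Abnormality of the nervous system"],
    "Neurofibroma": ["Neoplasm"],
}


def top_level_categories(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Direct children of ROOT that are ancestors of `term` (or `term` itself)."""
    found: Set[str] = set()
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        ps = parents.get(t, [])
        if ROOT in ps:
            found.add(t)
        stack.extend(p for p in ps if p != ROOT)
    return found


def category_distribution(terms: Iterable[str], parents: Dict[str, List[str]]) -> Counter:
    counts: Counter = Counter()
    for term in terms:
        counts.update(top_level_categories(term, parents))
    return counts


significant = ["Seizure", "Intellectual disability, severe", "Neurofibroma"]  # made up
observed = category_distribution(significant, PARENTS)
total = sum(observed.values())
for category, n in observed.items():
    print(f"{category}: {n / total:.1%}")
```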

Discussion

Precision genomic medicine is an emerging medical discipline that aims to apply genomic information to prediction, prevention, early diagnosis, or tailored treatment in order to improve clinical care. Although precision approaches have been applied successfully to some Mendelian diseases such as Cystic fibrosis (MIM: 219700),36 our understanding of disease subtypes and GPCs is limited for the vast majority of the roughly 7,000 characterized rare Mendelian diseases. Understanding GPCs can contribute to understanding disease pathophysiology and to stratified clinical management. A barrier has been the lack of standardized data exchange and analysis schemas, which makes it difficult to combine data from multiple sources and means that scripts or program code need to be written anew for each project. The GA4GH Phenopacket Schema was released in 2022 and approved by the International Organization for Standardization (ISO 4454:2022) as a standard for sharing clinical and genomic information about an individual. Each phenopacket is a computational representation of the clinical trajectory of one individual and can contain data about phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used for data exchange and as a computational model for clinical decision support systems focused on individuals and their families, such as Exomiser,37 LIRICAL,38 and Emedgene, as well as for algorithms that facilitate classification, stratification, or GPC discovery in cohorts of individuals, such as GPSEA.22 Therefore, when phenopackets serve as a unifying standard across projects, locations, and registries, they enable machine readability and reuse for multiple analyses while offering precise, ontology-based semantics and adherence to the findable, accessible, interoperable, and reusable (FAIR) data principles.
The GA4GH Phenopacket Schema thus enables software such as GPSEA to be used for any relevant dataset that is available as or can be transformed into phenopackets.
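As a rough illustration of what such a record looks like, the sketch below builds a minimal phenopacket as a Python dictionary and serializes it to JSON. The field names follow our understanding of the Phenopacket Schema v2 JSON serialization (lowerCamelCase) and should be checked against the official schema documentation; the identifiers, timestamp, creator, and onset age are placeholders; the `resources` list is left empty for brevity although a real phenopacket lists the ontologies used; and the genomic interpretation section is omitted. The helper `feature_status` is hypothetical and does not roll up annotations.

```python
import json
from typing import Any, Dict, Optional


def make_phenopacket() -> Dict[str, Any]:
    """A minimal, hypothetical phenopacket (v2-style JSON field names)."""
    return {
        "id": "example-phenopacket-1",  # placeholder identifier
        "subject": {"id": "individual-1", "sex": "MALE"},
        "phenotypicFeatures": [
            {
                "type": {"id": "HP:0010864", "label": "Intellectual disability, severe"},
                "onset": {"age": {"iso8601duration": "P2Y"}},  # placeholder age
            },
            {
                "type": {"id": "HP:0011968", "label": "Feeding difficulties"},
                "excluded": True,  # explicitly ruled out
            },
        ],
        "diseases": [{"term": {"id": "OMIM:300867", "label": "Kabuki syndrome 2"}}],
        "metaData": {
            "created": "2025-01-01T00:00:00Z",  # placeholder timestamp
            "createdBy": "example-curator",  # placeholder
            "resources": [],  # a real phenopacket lists HPO, OMIM, etc. with versions
            "phenopacketSchemaVersion": "2.0",
        },
    }


def feature_status(pp: Dict[str, Any], hpo_id: str) -> Optional[bool]:
    """True if observed, False if excluded, None if not annotated (no roll-up)."""
    for feature in pp.get("phenotypicFeatures", []):
        if feature["type"]["id"] == hpo_id:
            return not feature.get("excluded", False)
    return None


text = json.dumps(make_phenopacket(), indent=2)
parsed = json.loads(text)
print(feature_status(parsed, "HP:0011968"))  # False
```

The three-valued status (observed, excluded, or not recorded) is what allows categorical tests to distinguish an explicitly excluded feature from one that was simply not reported.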

GPCs can be useful for clinical management decisions or translational research. GPSEA is not designed for use in clinical care but rather provides a framework for characterizing GPCs in cohorts of individuals, and it is our hope that it will contribute to the discovery of novel GPCs. In Mendelian disease, GPCs display a spectrum of association strength. For some genes associated with Mendelian disease, different mutations lead deterministically to distinct diseases. For instance, different germline FGFR3 (MIM: 134934) variants cause Achondroplasia (MIM: 100800), Hypochondroplasia (MIM: 146000), and other disorders; early diagnosis of FGFR3-related skeletal dysplasia is essential for timely management of complications and for genetic counseling.39 Specific NF1 variants are associated with mild clinical manifestations of neurofibromatosis type 1; for instance, individuals with variants affecting Arg1809 tend to show café-au-lait macules and Noonan-like dysmorphic features but do not have neurofibromas and some other typical neurofibromatosis features.40 In other cases, the association is weaker, so that individuals with a certain variant or category of variant tend to have a lower or higher frequency of a feature. For instance, premature termination codon variants in FBN1 (MIM: 134797) are associated with a higher risk of aortic dissection in Marfan syndrome (MIM: 154700), but individuals with other categories of variants may also experience aortic dissection.41 Understanding GPCs in these and other Mendelian diseases may help guide clinical management in some cases.
In a few cases, correlations of specific variants with clinical data are included in clinical guidelines, as is the case for CFTR (MIM: 602421) variants.42 GPCs have also been used as the starting point for experimental work to understand disease biology and gene function; for instance, a de novo single-base substitution in exon 11 of LMNA (MIM: 150330) (c.1824C>T [GenBank: NM_170707.4] [p.(=)]) activates a cryptic splice site, leading to an in-frame deletion of 50 amino acids near the C terminus of prelamin A; this mutant form of lamin A acts in a dominant fashion to induce a wide variety of abnormalities in nuclear processes, which eventually lead to cellular and organismal decline and cause Hutchinson-Gilford progeria (MIM: 176670). This pathomechanism is distinct from that observed in other laminopathies such as Cardiomyopathy, dilated, 1A (MIM: 115200), Emery-Dreifuss muscular dystrophy 2, autosomal dominant (MIM: 181350), and Mandibuloacral dysplasia (MIM: 248370).43 The identification of GPCs can thus help to formulate well-targeted hypotheses for molecular research.

One area in which genotype-phenotype stratification could provide immediate value is prenatal genomics. Increasingly, diagnostic genomic sequencing is performed during pregnancy following the detection of an abnormal fetal phenotype, often with results available between 16 and 18 weeks of gestation. At this stage, decisions about clinical management can be extremely difficult and time sensitive. A clearer, evidence-based understanding of likely phenotypic outcomes based on genotype could support more informed and personalized counseling for families and help them to weigh risks and make decisions aligned with their values and tolerance for uncertainty.

As more and more data in human genetics become available with HPO annotations, performing GPC analysis with these annotations raises new challenges because of the ontological structure of the HPO. Analysis software needs to roll up annotations; for instance, if an individual is annotated to Nuclear cataract (HP:0100018), it is always true that the individual also has the manifestation described by the parent term Zonular cataract (HP:0010920), the grandparent term Cataract (HP:0000518), and so forth (this is termed the “true path rule”). However, HPO annotations are made to the most specific applicable term, and typical statistical software that works with data frames cannot perform the rolling up needed for correct analysis. Another challenge when using ontologies for analysis relates to the redundancies inherent in the hierarchical structure.44 GPSEA addresses both challenges by preparing data for analysis using the true path rule and by minimizing redundancy and the multiple-testing burden with the IF-HPO procedure.
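The roll-up step can be illustrated with the cataract example from the text. The sketch below is hypothetical (a three-term excerpt of the HPO written as a child-to-parents map, with made-up individuals) and shows why per-term counts differ once the true path rule is applied: without roll-up, each of the three terms would be counted once, whereas after propagation Cataract is counted for all three individuals.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# Excerpt of the HPO from the text (child -> parents); higher ancestors omitted.
PARENTS: Dict[str, List[str]] = {
    "HP:0100018": ["HP:0010920"],  # Nuclear cataract -> Zonular cataract
    "HP:0010920": ["HP:0000518"],  # Zonular cataract -> Cataract
    "HP:0000518": [],              # Cataract (its own ancestors not shown)
}


def propagate(terms: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Apply the true path rule: add every ancestor of each annotated term."""
    implied: Set[str] = set()
    stack = list(terms)
    while stack:
        t = stack.pop()
        if t not in implied:
            implied.add(t)
            stack.extend(parents.get(t, []))
    return implied


def term_counts(cohort: Dict[str, List[str]], parents: Dict[str, List[str]]) -> Counter:
    """Number of individuals annotated to each term after roll-up."""
    counts: Counter = Counter()
    for terms in cohort.values():
        counts.update(propagate(terms, parents))  # a set, so each term counts once
    return counts


# Made-up individuals, each annotated at a different level of specificity.
cohort = {"ind1": ["HP:0100018"], "ind2": ["HP:0010920"], "ind3": ["HP:0000518"]}
print(term_counts(cohort, PARENTS))
```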

Many publications on GPC analysis either do not provide raw data or provide a summary of data in the supplement as an Excel file or related format. Analysis code to reproduce the results is rarely provided, and because the formats used in supplemental files are diverse, substantial work is required to prepare them for statistical analysis. The approach we have presented here makes the investigation of GPCs FAIR. The entirety of the data used for the analysis is freely available in Phenopacket Store,25 and all code used for analysis is available in the GitHub repository (one notebook is provided for each cohort). An additional advantage of this approach is that results for new cohorts can be assessed in comparison to a body of previous results obtained with the same software.

For this project, we analyzed cohorts derived entirely from published case and cohort reports. In general, published reports do not contain comprehensive clinical information but instead present the features deemed most relevant or important by the authors. Furthermore, phenotype is not solely determined by genetic factors; environmental and lifestyle influences can interact with allelic variability and modifier genes to further shape phenotype. For this reason, and also because of potential publication biases that may lead to an overestimation of clinical severity,45 the results about specific GPCs presented here should be regarded as hypotheses that will require confirmation in independent studies.

Allelic variability is only one of many factors that determine phenotype; phenotypic severity or penetrance can also be influenced by the genotype at another locus, which is referred to as a modifier gene.46 For instance, in hereditary breast cancer, polygenic risk scores (PRSs) for ovarian cancer are associated with penetrance of ovarian cancer in individuals harboring mutations in BRCA1 or BRCA2 (MIM: 113705 and 600185).47 GPSEA could readily be extended to evaluate the effects of PRSs or variants in modifier genes, but the main challenge will be the collection of comprehensive clinical and genomic data. As a general rule, the ability to identify GPCs will depend on the presence of true biomedical differences, on the quality and comprehensiveness of reporting of clinical data, and on the statistical power afforded by the size of the cohort. Prospectively capturing clinical data using a common data model could be beneficial.48

GPSEA serves as a foundational tool, enabling correlation studies to be conducted at any scale, within any setup, and for any hypothesis. For instance, we envision that GPSEA can be used for comprehensive research databases with more balanced datasets, whether or not the raw data can be shared publicly. Wide community adoption of the GA4GH Phenopacket Schema and application of consistent practices for recording and reporting phenotypic features in publications and databases would accelerate characterization of GPCs across the Mendeliome and thereby contribute to our knowledge of the natural history of rare diseases.

Data and code availability

  • No original data were generated for this analysis. GPSEA is available at https://github.com/P2GX/gpsea under an MIT license. Documentation and a tutorial are provided at https://p2gx.github.io/gpsea/stable/.

  • The GPSEA case studies (gpsea-cs) repository provides one Jupyter notebook for each of the cohorts analyzed in this work and is available at https://github.com/P2GX/gpsea-cs. This repository additionally contains code we used to generate the supplemental figures and tables that is not needed by new users of GPSEA.

Acknowledgments

This work was supported by grants from the National Human Genome Research Institute (A Phenomics-First Resource for Interpretation of Variants, 5RM1HG010860 and The Human Phenotype Ontology: Accelerating Computational Integration of Clinical Data for Genomics, 5U24HG011449; J.X.C. and A.J.M.D. were supported by 1R35HG011297). P.N.R. was supported by a Professorship of the Alexander von Humboldt Foundation. A.K. and O.V. were supported by grant NU23-05-00097 issued by the Czech Health Research Council, Ministry of Health of the Czech Republic.

Author contributions

P.N.R. and D.D. conceived and designed the project and methodology; L.R. and D.D. wrote the Python code with contributions from P.N.R., J.R., and F.R.; P.N.R. developed the statistical modeling with input from M.I.L.; the cohorts were analyzed by L.R., A.S.L.G., V.B., A.C.-O., P.C., L.C., J.X.C., E.C., A.J.M.D., B.B.A.d.V., M.H.D., T.F., P.G., P.H., A.K., M.S.L., A.M., A.J.M., J.R., C.S., T.S., O.V., D.Z., and P.N.R. using the GPSEA software; M.A.H., T.G., J.O.B.J., C.J.M., M.C.M.-T., S.T., and D.S. advised about the use and creation of phenopackets; D.D. and P.N.R. wrote the manuscript; and all authors reviewed and approved the final version.

Declaration of interests

The authors declare no competing interests.

Published: December 23, 2025

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2025.12.001.

Web resources

Supplemental information

Document S1. Figures S1–S88; Tables S1 and S3–S11
Table S2. Significant GPCs between categorical genotypes and phenotypes
Document S2. Article plus supplemental information

References

  • 1. Ries M., Gal A. In: Fabry Disease: Perspectives from 5 Years of FOS. Mehta A., Beck M., Sunder-Plassmann G., editors. Oxford PharmaGenesis; 2006. Genotype–phenotype correlation in Fabry disease.
  • 2. Bettegowda C., Upadhayaya M., Evans D.G., Kim A., Mathios D., Hanemann C.O., REiNS International Collaboration. Genotype-phenotype correlations in neurofibromatosis and their potential clinical use. Neurology. 2021;97:S91–S98. doi: 10.1212/WNL.0000000000012436.
  • 3. MacRae C.A., Seidman C.E. Closing the Genotype-Phenotype Loop for Precision Medicine. Circulation. 2017;136:1492–1494. doi: 10.1161/CIRCULATIONAHA.117.030831.
  • 4. Robinson P.N., Köhler S., Bauer S., Seelow D., Horn D., Mundlos S. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am. J. Hum. Genet. 2008;83:610–615. doi: 10.1016/j.ajhg.2008.09.017.
  • 5. Köhler S., Vasilevsky N.A., Engelstad M., Foster E., McMurry J., Aymé S., Baynam G., Bello S.M., Boerkoel C.F., Boycott K.M., et al. The Human Phenotype Ontology in 2017. Nucleic Acids Res. 2017;45:D865–D876. doi: 10.1093/nar/gkw1039.
  • 6. Köhler S., Doelken S.C., Mungall C.J., Bauer S., Firth H.V., Bailleul-Forestier I., Black G.C.M., Brown D.L., Brudno M., Campbell J., et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014;42:D966–D974. doi: 10.1093/nar/gkt1026.
  • 7. Pehlivan D., Bengtsson J.D., Bajikar S.S., Grochowski C.M., Lun M.Y., Gandhi M., Jolly A., Trostle A.J., Harris H.K., Suter B., et al. Structural variant allelic heterogeneity in MECP2 duplication syndrome provides insight into clinical severity and variability of disease expression. Genome Med. 2024;16:146. doi: 10.1186/s13073-024-01411-7.
  • 8. Alecu J.E., Tam A., Richter S., Quiroz V., Schierbaum L., Saffari A., Ebrahimi-Fakhari D. Quantitative natural history modeling of HPDL-related disease based on cross-sectional data reveals genotype-phenotype correlations. Genet. Med. 2025;27. doi: 10.1016/j.gim.2024.101349.
  • 9. Dardas Z., Fatih J.M., Jolly A., Dawood M., Du H., Grochowski C.M., Jones E.G., Jhangiani S.N., Wehrens X.H.T., Liu P., et al. NODAL variants are associated with a continuum of laterality defects from simple D-transposition of the great arteries to heterotaxy. Genome Med. 2024;16:53. doi: 10.1186/s13073-024-01312-9.
  • 10. Bosch E., Popp B., Güse E., Skinner C., van der Sluijs P.J., Maystadt I., Pinto A.M., Renieri A., Bruno L.P., Granata S., et al. Elucidating the clinical and molecular spectrum of SMARCC2-associated NDD in a cohort of 65 affected individuals. Genet. Med. 2023;25. doi: 10.1016/j.gim.2023.100950.
  • 11. Calame D.G., Guo T., Wang C., Garrett L., Jolly A., Dawood M., Kurolap A., Henig N.Z., Fatih J.M., Herman I., et al. Monoallelic variation in DHX9, the gene encoding the DExH-box helicase DHX9, underlies neurodevelopment disorders and Charcot-Marie-Tooth disease. Am. J. Hum. Genet. 2023;110:1394–1413. doi: 10.1016/j.ajhg.2023.06.013.
  • 12. Guatibonza Moreno P., Pardo L.M., Pereira C., Schroeder S., Vagiri D., Almeida L.S., Juaristi C., Hosny H., Loh C.C.Y., Leubauer A., et al. At a glance: the largest Niemann-Pick type C1 cohort with 602 patients diagnosed over 15 years. Eur. J. Hum. Genet. 2023;31:1108–1116. doi: 10.1038/s41431-023-01408-7.
  • 13. Dingemans A.J.M., Truijen K.M.G., van de Ven S., Bernier R., Bongers E.M.H.F., Bouman A., de Graaff-Herder L., Eichler E.E., Gerkes E.H., De Geus C.M., et al. The phenotypic spectrum and genotype-phenotype correlations in 106 patients with variants in major autism gene CHD8. Transl. Psychiatry. 2022;12:421. doi: 10.1038/s41398-022-02189-1.
  • 14. Crawford K., Xian J., Helbig K.L., Galer P.D., Parthasarathy S., Lewis-Smith D., Kaufman M.C., Fitch E., Ganesan S., O’Brien M., et al. Computational analysis of 10,860 phenotypic annotations in individuals with SCN2A-related disorders. Genet. Med. 2021;23:1263–1272. doi: 10.1038/s41436-021-01120-1.
  • 15. van der Spek J., den Hoed J., Snijders Blok L., Dingemans A.J.M., Schijven D., Nellaker C., Venselaar H., Astuti G.D.N., Barakat T.S., Bebin E.M., et al. Inherited variants in CHD3 show variable expressivity in Snijders Blok-Campeau syndrome. Genet. Med. 2022;24:1283–1296. doi: 10.1016/j.gim.2022.02.014.
  • 16. Zhang C., Jolly A., Shayota B.J., Mazzeu J.F., Du H., Dawood M., Soper P.C., Ramalho de Lima A., Ferreira B.M., Coban-Akdemir Z., et al. Novel pathogenic variants and quantitative phenotypic analyses of Robinow syndrome: WNT signaling perturbation and phenotypic variability. HGG Adv. 2022;3. doi: 10.1016/j.xhgg.2021.100074.
  • 17. Hebebrand M., Hüffmeier U., Trollmann R., Hehr U., Uebe S., Ekici A.B., Kraus C., Krumbiegel M., Reis A., Thiel C.T., Popp B. The mutational and phenotypic spectrum of TUBA1A-associated tubulinopathy. Orphanet J. Rare Dis. 2019;14:38. doi: 10.1186/s13023-019-1020-x.
  • 18. Casanova E.L., Gerstner Z., Sharp J.L., Casanova M.F., Feltus F.A. Widespread genotype-phenotype correlations in intellectual disability. Front. Psychiatry. 2018;9:535. doi: 10.3389/fpsyt.2018.00535.
  • 19. van der Sluijs P.J., Jansen S., Vergano S.A., Adachi-Fukuda M., Alanay Y., AlKindy A., Baban A., Bayat A., Beck-Wödl S., Berry K., et al. The ARID1B spectrum in 143 patients: from nonsyndromic intellectual disability to Coffin-Siris syndrome. Genet. Med. 2019;21:1295–1307. doi: 10.1038/s41436-018-0330-z.
  • 20. Chiorean A., Farncombe K.M., Delong S., Andric V., Ansar S., Chan C., Clark K., Danos A.M., Gao Y., Giles R.H., et al. Large scale genotype- and phenotype-driven machine learning in Von Hippel-Lindau disease. Hum. Mutat. 2022;43:1268–1285. doi: 10.1002/humu.24392.
  • 21. Chiu T.L.-H., Leung D., Chan K.-W., Yeung H.M., Wong C.-Y., Mao H., He J., Vignesh P., Liang W., Liew W.K., et al. Phenomic analysis of chronic granulomatous disease reveals more severe integumentary infections in X-linked compared with autosomal recessive chronic granulomatous disease. Front. Immunol. 2021;12. doi: 10.3389/fimmu.2021.803763.
  • 22.Jacobsen J.O.B., Baudis M., Baynam G.S., Beckmann J.S., Beltran S., Buske O.J., Callahan T.J., Chute C.G., Courtot M., Danis D., et al. The GA4GH Phenopacket schema defines a computable representation of clinical data. Nat. Biotechnol. 2022;40:817–820. doi: 10.1038/s41587-022-01357-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Danis D., Jacobsen J.O.B., Wagner A.H., Groza T., Beckwith M.A., Rekerle L., Carmody L.C., Reese J., Hegde H., Ladewig M.S., et al. Phenopacket-tools: Building and validating GA4GH phenopackets. PLoS One. 2023;18 doi: 10.1371/journal.pone.0285433. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Ladewig M.S., Jacobsen J.O.B., Wagner A.H., Danis D., El Kassaby B., Gargano M., Groza T., Baudis M., Steinhaus R., Seelow D., et al. GA4GH phenopackets: A practical introduction. Adv. Genet. 2023;4 doi: 10.1002/ggn2.202200016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Danis D., Bamshad M.J., Bridges Y., Caballero-Oteyza A., Cacheiro P., Carmody L.C., Chimirri L., Chong J.X., Coleman B., Dalgleish R., et al. A corpus of GA4GH phenopackets: Case-level phenotyping for genomic diagnostics and discovery. HGG Adv. 2025;6 doi: 10.1016/j.xhgg.2024.100371. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Bourgon R., Gentleman R., Huber W. Independent filtering increases detection power for high-throughput experiments. Proc. Natl. Acad. Sci. USA. 2010;107:9546–9551. doi: 10.1073/pnas.0914005107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Jordan V.K., Fregeau B., Ge X., Giordano J., Wapner R.J., Balci T.B., Carter M.T., Bernat J.A., Moccia A.N., Srivastava A., et al. Genotype-phenotype correlations in individuals with pathogenic RERE variants. Hum. Mutat. 2018;39:666–675. doi: 10.1002/humu.23400. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.de Vries B.B., White S.M., Knight S.J., Regan R., Homfray T., Young I.D., Super M., McKeown C., Splitt M., Quarrell O.W., et al. Clinical studies on submicroscopic subtelomeric rearrangements: a checklist. J. Med. Genet. 2001;38:145–150. doi: 10.1136/jmg.38.3.145. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Bland J.M., Altman D.G. The logrank test. BMJ. 2004;328:1073. doi: 10.1136/bmj.328.7447.1073. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.UniProt Consortium UniProt: The universal protein knowledgebase in 2025. Nucleic Acids Res. 2025;53:D609–D617. doi: 10.1093/nar/gkae1010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Amberger J.S., Bocchini C.A., Scott A.F., Hamosh A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res. 2019;47:D1038–D1043. doi: 10.1093/nar/gky1151. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Gracia-Diaz C., Zhou Y., Yang Q., Maroofian R., Espana-Bonilla P., Lee C.-H., Zhang S., Padilla N., Fueyo R., Waxman E.A., et al. Gain and loss of function variants in EZH1 disrupt neurogenesis and cause dominant and recessive neurodevelopmental disorders. Nat. Commun. 2023;14:4109. doi: 10.1038/s41467-023-39645-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Alkan C., Coe B.P., Eichler E.E. Genome structural variation discovery and genotyping. Nat. Rev. Genet. 2011;12:363–376. doi: 10.1038/nrg2958. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Benjamini Y. Discovering the false discovery rate. J. R. Stat. Soc. Series B Stat. Methodol. 2010;72:405–416. doi: 10.1111/j.1467-9868.2010.00746.x.
  • 35. Xu C., Jia W., Cheng X., Ying H., Chen J., Xu J., Guan Q., Zhou X., Zheng D., Li G., Zhao J. Genotype-phenotype correlation study and mutational and hormonal analysis in a Chinese cohort with 21-hydroxylase deficiency. Mol. Genet. Genomic Med. 2019;7. doi: 10.1002/mgg3.671.
  • 36. Chang E.H., Zabner J. Precision genomic medicine in cystic fibrosis. Clin. Transl. Sci. 2015;8:606–610. doi: 10.1111/cts.12292.
  • 37. Vestito L., Jacobsen J.O.B., Walker S., Cipriani V., Harris N.L., Haendel M.A., Mungall C.J., Robinson P., Smedley D. Efficient reinterpretation of rare disease cases using Exomiser. NPJ Genom. Med. 2024;9:65. doi: 10.1038/s41525-024-00456-2.
  • 38. Robinson P.N., Ravanmehr V., Jacobsen J.O.B., Danis D., Zhang X.A., Carmody L.C., Gargano M.A., Thaxton C.L., UNC Biocuration Core, Karlebach G., et al. Interpretable Clinical Genomics with a Likelihood Ratio Paradigm. Am. J. Hum. Genet. 2020;107:403–417. doi: 10.1016/j.ajhg.2020.06.021.
  • 39. Kim H.Y., Ko J.M. Clinical management and emerging therapies of FGFR3-related skeletal dysplasia in childhood. Ann. Pediatr. Endocrinol. Metab. 2022;27:90–97. doi: 10.6065/apem.2244114.057.
  • 40. Rojnueangnit K., Xie J., Gomes A., Sharp A., Callens T., Chen Y., Liu Y., Cochran M., Abbott M.-A., Atkin J., et al. High incidence of Noonan syndrome features including short stature and pulmonic stenosis in patients carrying NF1 missense mutations affecting p.Arg1809: Genotype-phenotype correlation. Hum. Mutat. 2015;36:1052–1063. doi: 10.1002/humu.22832.
  • 41. Arnaud P., Milleron O., Hanna N., Ropers J., Ould Ouali N., Affoune A., Langeois M., Eliahou L., Arnoult F., Renard P., et al. Clinical relevance of genotype-phenotype correlations beyond vascular events in a cohort study of 1500 Marfan syndrome patients with FBN1 pathogenic variants. Genet. Med. 2021;23:1296–1304. doi: 10.1038/s41436-021-01132-x.
  • 42. Castellani C., De Boeck K., De Wachter E., Sermet-Gaudelus I., Simmonds N.J., Southern K.W., ECFS Diagnostic Network Working Group. ECFS standards of care on CFTR-related disorders: Updated diagnostic criteria. J. Cyst. Fibros. 2022;21:908–921. doi: 10.1016/j.jcf.2022.09.011.
  • 43. Gonzalo S., Kreienkamp R., Askjaer P. Hutchinson-Gilford Progeria Syndrome: A premature aging disease caused by LMNA gene mutations. Ageing Res. Rev. 2017;33:18–29. doi: 10.1016/j.arr.2016.06.007.
  • 44. Grossmann S., Bauer S., Robinson P.N., Vingron M. Improved detection of overrepresentation of Gene-Ontology annotations with parent child analysis. Bioinformatics. 2007;23:3024–3031. doi: 10.1093/bioinformatics/btm440.
  • 45. Nannenberg E.A., van Rijsingen I.A.W., van der Zwaag P.A., van den Berg M.P., van Tintelen J.P., Tanck M.W.T., Ackerman M.J., Wilde A.A.M., Christiaans I. Effect of ascertainment bias on estimates of patient mortality in inherited cardiac diseases. Circ. Genom. Precis. Med. 2018;11. doi: 10.1161/CIRCGEN.117.001797.
  • 46. Corvol H., Blackman S.M., Boëlle P.-Y., Gallins P.J., Pace R.G., Stonebraker J.R., Accurso F.J., Clement A., Collaco J.M., Dang H., et al. Genome-wide association meta-analysis identifies five modifier loci of lung disease severity in cystic fibrosis. Nat. Commun. 2015;6:8382. doi: 10.1038/ncomms9382.
  • 47. Dareng E.O., Tyrer J.P., Barnes D.R., Jones M.R., Yang X., Aben K.K.H., Adank M.A., Agata S., Andrulis I.L., Anton-Culver H., et al. Polygenic risk modeling for prediction of epithelial ovarian cancer risk. Eur. J. Hum. Genet. 2022;30:349–362. doi: 10.1038/s41431-021-00987-7.
  • 48. Graefe A.S.L., Hübner M.R., Rehburg F., Sander S., Klopfenstein S.A.I., Alkarkoukly S., Grönke A., Weyersberg A., Danis D., Zschüntzsch J., et al. An ontology-based rare disease common data model harmonising international registries, FHIR, and Phenopackets. Sci. Data. 2025;12:234. doi: 10.1038/s41597-025-04558-z.

Associated Data

Supplementary Materials

Document S1. Figures S1–S88; Tables S1 and S3–S11
mmc1.pdf (1.2MB, pdf)
Table S2. Significant GPCs between categorical genotypes and phenotypes
mmc2.xlsx (18.6KB, xlsx)
Document S2. Article plus supplemental information
mmc3.pdf (4.5MB, pdf)

Data Availability Statement

  • No original data were generated for this analysis. GPSEA is available at https://github.com/P2GX/gpsea under an MIT license. Documentation and a tutorial are provided at https://p2gx.github.io/gpsea/stable/.

  • The GPSEA case studies (gpsea-cs) repository, available at https://github.com/P2GX/gpsea-cs, provides one Jupyter notebook for each cohort analyzed in this work. It also contains the code we used to generate the supplemental figures and tables; new users of GPSEA do not need this code.


Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics