Genome Medicine. 2025 Oct 21;17:127. doi: 10.1186/s13073-025-01546-1

An optimized variant prioritization process for rare disease diagnostics: recommendations for Exomiser and Genomiser

Isabelle B Cooperstein 1, Shruti Marwaha 2,3, Alistair Ward 1,4, Shilpa N Kobren 5, Jennefer N Carter 2; Undiagnosed Diseases Network, Matthew T Wheeler 2,3, Gabor T Marth 1
PMCID: PMC12539062  PMID: 41121346

Abstract

Background

Exome sequencing (ES) and genome sequencing (GS) are increasingly used as standard genetic tests to identify diagnostic variants in rare disease cases. However, prioritizing these variants to reduce the time and burden of manual interpretation by clinical teams remains a significant challenge. The Exomiser/Genomiser software suite is the most widely adopted open-source software for prioritizing coding and noncoding variants. Despite its ubiquitous use, limited data-driven guidelines currently exist to optimize its performance for diagnostic variant prioritization. Based on detailed analyses of Undiagnosed Diseases Network (UDN) probands, this study presents optimized parameters and practical recommendations for deploying the Exomiser and Genomiser tools. We also highlight scenarios where diagnostic variants may be missed and propose alternative workflows to improve diagnostic success in such complex cases.

Methods

We analyzed 386 diagnosed probands from the UDN, including cases with coding and noncoding diagnostic variants. We systematically evaluated how tool performance was affected by key parameters, including gene:phenotype association data, variant pathogenicity predictors, phenotype term quality and quantity, and the inclusion and accuracy of family variant data.

Results

Parameter optimization significantly improved Exomiser’s performance over default parameters. For GS data, the percentage of coding diagnostic variants ranked within the top 10 candidates increased from 49.7% to 85.5%, and for ES, from 67.3% to 88.2%. For noncoding variants prioritized with Genomiser, the top 10 rankings improved from 15.0% to 40.0%. We also explored refinement strategies for Exomiser outputs, including using p-value thresholds and flagging genes that are frequently ranked in the top 30 candidates but rarely associated with diagnoses.

Conclusion

This study provides an evidence-based framework for variant prioritization in ES and GS data using Exomiser and Genomiser. These recommendations have been implemented in the Mosaic platform to support the ongoing analysis of undiagnosed UDN participants and provide efficient, scalable reanalysis to improve diagnostic yield. Our work also highlights the importance of tracking solved cases and diagnostic variants that can be used to benchmark bioinformatics tools. Exomiser and Genomiser are available at https://github.com/exomiser/Exomiser/.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13073-025-01546-1.

Keywords: Rare disease, Variant prioritization, Diagnosis, Exomiser, Genomiser, Parameter optimization, Phenotype, HPO, Genome sequencing, Exome sequencing

Background

Fewer than half of the approximately 10,000 documented rare diseases have an identified genetic etiology; however, most are presumed to be genetic in origin [1, 2]. Despite advances in next-generation sequencing, 59–75% of rare disease patients remain undiagnosed after undergoing sequencing-based tests, often due to the difficulty of accurately prioritizing and interpreting the clinical relevance of candidate variants [3–5]. Clinical diagnostic teams have limited time per patient, making it critical to minimize the number of variants requiring their review while also minimizing the chance of filtering out, and thus falsely removing from consideration, true diagnostic variants.

Variant prioritization tools integrate evidence such as population allele frequency, in silico predictions of variant deleteriousness, and familial segregation patterns [6]. Phenotype-based approaches refine the search further by incorporating patients’ clinical presentations encoded by Human Phenotype Ontology (HPO) [7] terms and leveraging gene-phenotype-disease associations [6]. These tools combine genotypic and phenotypic data in a structured framework to focus diagnostic efforts on variants most likely linked to the patient’s condition. Among freely available open-source tools for phenotype-based prioritization of single-nucleotide variants (SNVs) and indels in academic and nonprofit research, only Exomiser [8] (and its noncoding extension, Genomiser [9]) and AI-MARRVEL [10] met our essential selection criteria. These included accessibility via both web-based and programmatic interfaces, support for commonly used reference genomes like GRCh38, support for family-based analysis, and the ability to prioritize noncoding variants (Additional file 1: Table S1). Due to Exomiser’s more widespread use in both clinical and research settings, we selected it as the most impactful target for developing practical recommendations [5, 11, 12].

Exomiser is a scalable, secure, and efficient tool for variant prioritization, incorporating state-of-the-art annotations and prediction algorithms in regular updates. Exomiser calculates gene-level variant and phenotype scores, combining them to generate a ranked list of protein-coding candidate variants or genes across different modes of inheritance [13]. Protein-coding-focused tools such as Exomiser are the standard initial diagnostic approach due to the difficulty of interpreting variants lying outside of coding regions [14]. In contrast, the Genomiser tool was developed by the same group to focus specifically on regulatory variants, employing the same algorithms as Exomiser but expanding the search space beyond coding regions [9]. Genomiser also incorporates ReMM [9] scores, designed to predict the pathogenicity of noncoding regulatory variants. Genomiser has been shown to be effective in identifying compound heterozygous diagnoses in cases where one diagnostic variant is regulatory, and the other is coding or splice-altering [9]. However, because noncoding regions can introduce substantial noise, it has been recommended that Genomiser be used as a complementary tool alongside Exomiser rather than as a replacement.

Standard user inputs to Exomiser and Genomiser include a proband or multi-sample family variant call format (VCF) [15] file, a corresponding pedigree file in PED format, and proband phenotype terms represented by HPO terms [7]. Users select tool parameters related to proband gene-phenotype similarity algorithms, variant pathogenicity scores, and frequency filters. Despite Exomiser’s widespread use, practical, data-driven guidelines for optimizing its parameters with respect to recently updated or newly developed annotations are lacking, especially for GS data [16]. We systematically investigated the impact of these parameters on the ranking of known diagnostic variants in Exomiser and Genomiser outputs. We evaluated how differences in the quality of user-provided data, including VCF filtering criteria, HPO term quality and quantity, and familial information, impact this rank. Here, we present practical, data-supported guidelines for optimizing Exomiser and Genomiser performance using a cohort of solved cases from the Undiagnosed Diseases Network (UDN) [17]. Our study highlights scenarios where diagnostic variants may be omitted or ranked beyond the top 30 candidates and proposes alternative workflows to improve diagnostic success in complex undiagnosed cases. By applying this optimized process to an expanding number of UDN cases, we establish a scalable framework for efficient variant prioritization that can be broadly implemented for periodic reanalysis across a wide range of research and clinical scenarios.

Methods

Undiagnosed Diseases Network participant process and consent

The Undiagnosed Diseases Network (UDN) is an NIH-funded national research study designed to improve diagnosis and understanding of rare conditions by bringing together clinical and research experts across the USA. A detailed description of UDN processes is provided in the manual of operations [18]. Briefly, individuals may self-refer or be referred by a healthcare provider, and applications are reviewed by a central clinical intake committee. Because only a limited number of individuals can be accepted each year, preference is given to applicants with the greatest potential to achieve a diagnosis or generate new knowledge about disease mechanisms.

Inclusion criteria require that the applicant has a condition that remains undiagnosed despite thorough evaluation by a healthcare provider, the presence of at least one objective finding, and the agreement to the storage and sharing of information and biomaterials (identified within UDN sites, de-identified for research sites beyond the network). Applications are denied if the applicant already has a diagnosis explaining their objective findings, if review of records suggests a diagnosis and further evaluation by the UDN is deemed unnecessary, or if the applicant is too ill to safely travel to a UDN clinical site and telemedicine is not feasible or appropriate.

Following a comprehensive review by the clinical team, accepted UDN participants undergo in-depth clinical evaluations, including ES or GS of available affected and unaffected family members. Medical records and in-person evaluations are reviewed both manually and with computational assistance [19] to encode phenotypic features as both positive (present) and negative (absent) standardized phenotype terms using the HPO. The extensive information collected during the application and iterative evaluation process is uploaded to the UDN Data Management and Coordinating Center to support ongoing research and facilitate data sharing across the network.

UDN participants provide consent for their genomic, phenotypic, and clinical data to be shared broadly with researchers within the network. This process supports the evaluation of gene-phenotype relationships for individual probands and candidate genes and enables the use of participant data in research studies approved by the UDN Research and Publications Committee in accordance with the UDN Institutional Review Board (IRB) protocol and Manual of Operations. All UDN participant identifiers were replaced with masked IDs that do not correspond to their original consortium-assigned identifiers.

Harmonization of exome and genome sequencing data

GS and ES cohort-level variant datasets from UDN participants were produced as follows. First, unaligned, paired-end GS FASTQ files from 5412 unique samples corresponding to 5353 individuals from 1772 families enrolled in the UDN as of November 2023 were aligned to human reference GRCh38 (with decoys and all alt contigs) and processed to produce per-sample GVCF files in the Amazon Web Services cloud using the Clinical Genome Analysis Pipeline (CGAP, https://cgap.hms.harvard.edu/) [20]. Per-sample GVCFs were then downloaded to the local Harvard institutional cluster, where SNVs and short insertions/deletions (indels) were jointly called across all samples using Sentieon [21]. Similarly, unaligned ES FASTQ pairs from 1454 unique samples corresponding to 1439 individuals from 314 families enrolled in the UDN as of June 2024 were processed using the same workflow. All processing steps were performed in parallel on the Harvard institutional cluster [22]. Multi-sample VCF [15] files for each UDN case, including the affected proband and relevant affected and unaffected family members, were extracted from these cohort-level, jointly called GS or ES variant datasets.

Curation of comprehensive phenotype lists and variant-level genetic diagnoses

HPO terms for UDN participants were stored in the UDN database by the clinical teams managing each case using PhenoTips [23]. These comprehensive HPO term lists served as input for Exomiser and Genomiser in this study. Diagnostic variants were primarily stored in HGVS format; when unavailable, they were manually extracted from sequencing reports in the UDN database. Analyses involving randomly sampled HPO terms were conducted by selecting random terms from the list of all 18,697 terms in the HPO database, with each term having an equal chance of being sampled, matching the proband’s comprehensive list in size but ensuring no overlap with it.
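The random-term sampling described above can be sketched as a short routine; the function and argument names here are ours for illustration and not part of any UDN or Exomiser tooling:

```python
import random

def sample_random_hpo_terms(all_hpo_terms, proband_terms, seed=None):
    """Draw a random HPO term set matching the proband's list in size,
    with each term equally likely and no overlap with the proband's terms."""
    rng = random.Random(seed)
    proband_set = set(proband_terms)
    # Restrict the pool to terms the proband does NOT have
    candidates = [t for t in all_hpo_terms if t not in proband_set]
    return rng.sample(candidates, k=len(proband_terms))
```

In the study, `all_hpo_terms` would be the full list of 18,697 HPO terms; here any list of term IDs works.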

The earliest, now obsolete, collection interface for UDN cases included a separate prenatal and perinatal section with prelisted terms and checkboxes, whereas phenotype terms in the clinical section were added via free-text matching. This illustrates the importance of HPO term curation: biases from automated extraction or from term collection interfaces may be reflected in the collected HPO term sets. Analyses involving “pruned” term lists were conducted by removing all perinatal and prenatal HPO terms from probands’ comprehensive HPO term lists. A complete list of removed terms is provided in Additional file 2.

This study considered only cases classified as having a “certain” or “highly likely” diagnosis by the UDN. Briefly, these classifications follow established diagnostic guidelines. For diseases lacking clear diagnostic criteria, diagnoses are based on a synthesis of the available objective data and the judgment of the treating clinician. A “certain” diagnosis is defined as one that is associated with minimal uncertainty, while a “highly likely” diagnosis is associated with an element of uncertainty but not enough to dismiss it for the purpose of clinical decision making [17].

Curation of benchmarking cohorts

To establish truth sets for benchmarking Exomiser and Genomiser, we applied the following inclusion criteria to all diagnostic genes in the UDN jointly called cohorts: (i) probands with a comprehensive list of HPO terms describing their presentation/symptoms; (ii) diagnoses labeled as “certain” or “highly likely” by the clinical team; (iii) chief method of diagnosis “Genome-scale sequencing (e.g., exome sequencing, genome sequencing)” to ensure diagnosis was made primarily using sequencing data (e.g., ES or GS) rather than other clinical tests; and (iv) SNV or indel diagnostic variants (compound heterozygous diagnoses consisting of one SNV/indel and one structural variant were excluded). Although Exomiser can handle structural variants as input, these variants were not considered as part of this study, as the cohort-level harmonized variant data currently includes only SNV and indel variants. Inclusion was determined based on information retrieved from the UDN database on April 8, 2024.

Theoretically, Genomiser can be run in place of Exomiser, as it incorporates the same methods [9]. However, Genomiser’s additional consideration of noncoding variants often results in downgrading coding variants in rank (Additional file 3: Fig. S1). This negatively impacts its ability to prioritize coding variants, presumably due to the inherent challenges in establishing the pathogenicity of noncoding variants [14]. To address this, we recommend a two-step approach when analyzing GS variant data: using Exomiser for coding variants followed by Genomiser for noncoding variants, where necessary. Further, we curated a dedicated Genomiser benchmarking cohort focusing on GS cases involving noncoding variants in the diagnosis.

The included data were divided into three cohorts:

  1. ES Exomiser Cohort: ES cases with diagnostic variants in coding regions

  2. GS Exomiser Cohort: GS cases with diagnostic variants in coding regions

  3. GS Genomiser Cohort: GS cases with diagnostic variants in noncoding regions or compound heterozygous diagnoses involving noncoding variant(s)

Cohorts 1 and 2 (i.e., diagnostic coding variants) were used for Exomiser benchmarking, while cohort 3 (with diagnostic noncoding variants) was used for Genomiser benchmarking. These cohorts included probands with multiple diagnostic variants due to multiple diagnostic genes or compound heterozygous diagnoses.

Among the 181 diagnosed UDN probands with ES data, the inclusion criteria identified 153 variants in 130 genes from 125 probands (Table 1; Additional file 3: Fig. S2A, B, C). Five probands (4.0%) had two diagnostic genes. A total of 69 variants in 55 diagnostic genes from the ES jointly called data were excluded for not meeting the outlined inclusion criteria or additional factors (Additional file 3: Fig. S3A). Incomplete penetrance or misphenotyping of relatives was noted in 13 families. In this context, misphenotyping refers to incorrectly annotating a family member as unaffected due to their mild phenotypic presentation.

Table 1.

Summary of genomic data from diagnosed UDN cases used to benchmark Exomiser and Genomiser

Sequencing type | Variant type | Probands | Diagnostic genes | Diagnostic variants | MOI | Test type*
ES | Coding | 125 | 130 | 153 | AD = 90; AR = 54; XR = 2; XD = 7 | Singleton = 9; Duo = 16; Trio = 64; Quad = 26; > Quad = 10
GS | Coding | 231 | 239 | 296 | AD = 144; AR = 138; XR = 6; XD = 8 | Singleton = 24; Duo = 29; Trio = 114; Quad = 48; > Quad = 16
GS | Noncoding/intronic | 39 | 39 | 60 | AD = 9; AR = 46; XR = 3; XD = 2 | Singleton = 1; Duo = 3; Trio = 23; Quad = 8; > Quad = 4

Summary of the three UDN cohorts used to benchmark Exomiser and Genomiser

MOI mode of inheritance, AD autosomal dominant, AR autosomal recessive, XR X-linked recessive, XD X-linked dominant

*Test type refers to the number of closely related individuals included in the analysis, not a specific family structure. For example, “trio” indicates any family structure consisting of three related family members, not necessarily a parent–parent–child trio

Among the 404 diagnosed UDN probands with GS data, the inclusion criteria identified 296 variants in 239 genes from 231 probands for the GS Exomiser Cohort (Table 1; Additional file 3: Fig. S2D, E, F). Four probands (1.7%) had two diagnostic genes, and two probands (0.9%) had three diagnostic genes. The GS Genomiser cohort included 39 probands and genes with 60 diagnostic variants (Table 1; Additional file 3: Fig. S2G, H, I), including 15 genes with compound heterozygous diagnoses involving one coding and one noncoding variant. A total of 154 variants in 127 genes from the GS jointly called data were excluded for not meeting the outlined inclusion criteria or additional factors (Additional file 3: Fig. S3B). We identified inaccuracies in pedigree information, including incomplete penetrance or misphenotyping of relatives, in 22 families affecting 24 diagnostic variants, leading to their exclusion from the GS Exomiser cohort.

Nine probands with coding diagnostic variants had both ES and GS data and were evaluated in both truth sets.

Variant filtering

Exomiser and Genomiser consider all variants in the input VCF file as diagnostic candidates. Consequently, low-quality variants, or those likely to be artifacts of the mapping or variant calling algorithms (false positives), can be inadvertently ranked highly by these tools, increasing the burden of manual variant review. Although input VCF files can be preprocessed to remove likely false positive variants, common variant filtering strategies also risk removing true, potentially diagnostic variants from an input VCF. Standard variant callers, such as GATK [24], generate quality metrics like genotype quality (GQ), a Phred-scaled value that reflects the confidence that the genotype assigned to a sample is correct [25]. Variant allele frequency (VAF) is calculated as the proportion of sequencing reads that support an alternate allele relative to the total number of reads aligned at a specific genomic locus [25, 26]. Ideally, heterozygous loci have a VAF of 50%, homozygous alternate loci 100%, and homozygous reference loci 0%. In reality, the VAF deviates from these ideal values as a result of the sequencing process, data processing, and variant calling methods [27]. Additionally, an alternate allele value of “*” in a multi-sample VCF represents a deletion in the current sample that spans or overlaps another variant present in a different sample.
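The VAF definition above maps directly onto the per-sample allelic depths (the GATK-style AD field); a minimal sketch, with names of our choosing:

```python
def variant_allele_frequency(ref_depth, alt_depth):
    """VAF = reads supporting the alternate allele / total reads at the locus.
    ref_depth and alt_depth are the two entries of a biallelic AD field."""
    total = ref_depth + alt_depth
    return alt_depth / total if total else 0.0
```

For an ideal heterozygous call with 48 reference and 52 alternate reads, this returns 0.52, close to the expected 50%; real calls deviate further for the reasons noted above.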

We first sought to determine optimal variant filtering criteria to minimize the inadvertent exclusion of true diagnostic variants from the input VCF. Across the three cohorts, 35 (6.9%) diagnostic variants were not prioritized using Exomiser or Genomiser default parameters on raw, unfiltered input VCFs. These variants were excluded from the filtering analysis, leaving 474 diagnostic variants to evaluate filtering criteria. Variants with an alternate allele of “*” were excluded, and the effects of filtering based on GQ and VAF of heterozygous variants were assessed. We found that increasing GQ filter stringency raised the percentage of diagnostic variants removed due to filtering (Fig. 1A). Narrowing the VAF window from the least (10%–90%) to the most (25%–75%) stringent additionally removed ~0.5% (n = 2) of diagnostic variants. As expected, the rank of diagnostic variants improved as filter stringency increased (Fig. 1B), highlighting the trade-off between retaining diagnostic variants and improving their prioritization rank.

Fig. 1.

Fig. 1

Evaluating VCF filtering criteria on 474 variants in combined ES and GS cohorts. A Minimum genotype quality (GQ) versus percent of diagnostic variants removed due to filtering criteria under varying required variant allele frequency (VAF) ranges for heterozygous variants represented by colored lines. Light blue line (15%–85%) overlaps dark blue (10%–90%). B Minimum GQ versus mean rank of diagnostic variants in Exomiser or Genomiser outputs under default parameters with varying required VAF ranges for heterozygous variants represented by colored lines

To balance these trade-offs, we recommend filtering input VCFs prior to variant prioritization to ensure that only high-quality variants are considered. We adopted filters of 15% ≤ VAF ≤ 85% for heterozygous variants and GQ ≥ 20 as an optimal compromise. All results in this manuscript are based on VCF files preprocessed with these filters, and all “*” alternate allele variants were removed. Per case, approximately 1.34 million variants were removed to create filtered VCFs, retaining 6.45 million variants on average in the GS Exomiser cohort (Additional file 1: Table S2).
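The adopted thresholds can be expressed as a small per-genotype predicate. This is an illustrative sketch (function, argument, and genotype-label names are ours, not from BCFtools or Exomiser), assuming GQ and VAF have already been extracted for each sample:

```python
def passes_filters(genotype, gq, vaf, min_gq=20, vaf_window=(0.15, 0.85)):
    """Adopted filters: GQ >= 20 for all genotypes, and
    15% <= VAF <= 85% applied to heterozygous calls only."""
    if gq < min_gq:
        return False
    if genotype == "het":
        lo, hi = vaf_window
        return lo <= vaf <= hi
    return True
```

Variants with a “*” alternate allele would be dropped in a separate pass before this check, as described above.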

We also note that filtering on VAF may lead to the exclusion of potentially diagnostic mosaic variants, which are present only in a subset of cells and, therefore, would have a VAF that deviates from 50%. This circumstance is not observed in our cohorts, as mosaic cases in the UDN were often diagnosed by methods other than standard exome or genome sequencing and were therefore excluded from this benchmarking cohort.

Benchmarking strategy

We defined three success criteria to evaluate Exomiser and Genomiser’s ability to accurately prioritize diagnostic variants:

  1. Gene-level success: The diagnostic gene is present in the prioritized Exomiser/Genomiser output, regardless of whether the variant position(s) or nucleotide change(s) contributing to the gene’s score are correct. This is the most lenient criterion.

  2. Variant-level success: The diagnostic variant, including the correct position and nucleotide change, is prioritized, even if the mode of inheritance (MOI), such as autosomal dominant (AD) or autosomal recessive (AR), is incorrect. This is our primary success measure and is most frequently reported in this manuscript.

  3. Variant-level success with correct MOI: The diagnostic variant meets criterion 2 and has the correct MOI, making this the most stringent criterion.

For example, in compound heterozygous cases where one variant is not prioritized and the second is prioritized as an AD variant, this qualifies as a gene-level success (criterion 1), one success and one failure at the variant level (criterion 2), and two failures under variant-level success with correct MOI (criterion 3). In cases where a proband was diagnosed with pathogenic variants in more than one gene, we treated each proband-gene pair as an independent benchmarking unit. For example, if a proband had diagnostic variants in two genes, each gene was assessed separately for ranking success based on its corresponding variant(s). Thus, the best possible outcome would be for the two genes or variants to be ranked first and second by Exomiser.

Exomiser terminology defines the variant(s) with the highest variant score within each candidate gene as a contributing variant. By default, Exomiser outputs only contributing variants, even if other variants in the gene are scored. These non-contributing variants can be optionally included in the output. In this manuscript, a variant must be designated as a contributing variant to count as a variant-level success, but non-contributing variants are made available to users upon request via Mosaic, as described in the “Results”.

Finally, we imposed a top 30 ranking cutoff for success. Diagnostic variants ranked beyond 30 are unlikely to be manually reviewed and are categorized as “poor performance.”
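Taken together, the three criteria and the top 30 cutoff define a small decision procedure per prioritized gene/variant; the sketch below uses our own names and tier labels purely to make the nesting of the criteria explicit:

```python
def classify_success(rank, variant_match, moi_match, max_rank=30):
    """Return the set of success tiers met by one benchmarking unit.
    rank: candidate's rank in the Exomiser/Genomiser output (None if absent).
    variant_match: position and nucleotide change match the diagnosis.
    moi_match: the reported mode of inheritance is correct."""
    if rank is None or rank > max_rank:
        return set()                       # "poor performance": not reviewed
    tiers = {"gene"}                       # criterion 1: gene prioritized
    if variant_match:
        tiers.add("variant")               # criterion 2: exact variant
        if moi_match:
            tiers.add("variant+moi")       # criterion 3: variant and MOI
    return tiers
```

For the compound-heterozygous example above, the variant prioritized as AD at rank 5 with the wrong MOI yields `{"gene", "variant"}`, while the unprioritized partner yields the empty set.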

Bioinformatics tools

Exomiser/Genomiser version 14.0.0 with data version 2406 was used for all analyses reported here. Documentation and installation information can be found at https://github.com/Exomiser/Exomiser/ [28].

All VCF subsetting, VAF calculations, and VCF filtering were done using BCFtools version 1.16 [29]. HGVS to GRCh38 genomic coordinate liftover was done using the BCFtools liftover plug-in from the Broad Institute [30]. Exomiser/Genomiser was run using CADD v1.7 [31] and ReMM v0.4 [9] scores when applicable. The ClinVar whitelist feature was disabled for all benchmarking analyses, as diagnosed UDN cases are periodically submitted to ClinVar.
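Runs of this kind are configured through an Exomiser analysis YAML file. The fragment below is a hedged sketch based on our reading of the Exomiser documentation and the example files shipped with the tool: all paths, sample IDs, and HPO terms are placeholders, and available keys can differ between Exomiser versions.

```yaml
# Sketch of an Exomiser analysis file (placeholders throughout)
analysis:
  genomeAssembly: hg38
  vcf: family.filtered.vcf.gz          # pre-filtered multi-sample VCF
  ped: family.ped                      # pedigree in PED format
  proband: PROBAND_ID
  hpoIds: ['HP:0001263', 'HP:0001250'] # proband's comprehensive HPO terms
  analysisMode: PASS_ONLY
  frequencySources: [GNOMAD_G_AFR, GNOMAD_G_NFE]   # example subset
  pathogenicitySources: [REVEL, MVP]   # default sources
  steps:
    - variantEffectFilter: {remove: [SYNONYMOUS_VARIANT]}
    - frequencyFilter: {maxFrequency: 2.0}
    - pathogenicityFilter: {keepNonPathogenic: true}
    - inheritanceFilter: {}
    - hiPhivePrioritiser: {runParams: 'human'}     # human-only associations
outputOptions:
  outputContributingVariantsOnly: true
  outputFormats: [TSV_VARIANT, JSON]
```

Complete, tested YAML files for the runs in this study are provided in the repository listed under “Code availability”.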

Code availability

Example YAML files and scripts used in the analyses of this paper are available in the GitHub repository at https://github.com/icooperstein/exomiser_optimization [32].

Results

Assembly of UDN benchmarking cohorts

The benchmarking cohorts comprised 386 diagnosed UDN probands and their families (“Methods”). Approximately half of the probands were diagnosed with disorders of the nervous system, including conditions of the brain and spinal cord. Probands had a median of 20 HPO terms describing their condition, with the most frequently assigned terms representing general neurological features such as global developmental delay and seizure (Additional file 3: Fig. S3C, D, E). We used these cohorts to systematically evaluate the impact of various parameters on diagnostic variant ranking by Exomiser and Genomiser, including phenotypic prioritization algorithms, variant pathogenicity prediction tools, HPO term quality and quantity, and the inclusion and accuracy of family data.

Impact of gene-phenotype association database selection on Exomiser and Genomiser performance

Exomiser and Genomiser calculate phenotype scores using user-specified phenotypic similarity algorithms that compare proband HPO terms to known gene-phenotype associations. One option, PhenIX [33], restricts these comparisons to known Mendelian disease genes and uses gene-phenotype associations derived exclusively from human clinical data. PHIVE [34] considers all genes and uses gene-phenotype associations derived from mouse ortholog experiments. Finally, under default parameters, hiPHIVE [35] considers all genes and uses gene-phenotype associations from human, mouse, and zebrafish models, as well as phenotype associations of gene neighbors in a protein–protein interaction (PPI) network.

We compared Exomiser’s performance using PhenIX, PHIVE, default hiPHIVE, and no phenotype prioritization algorithm while keeping variant pathogenicity sources constant. As expected, omitting phenotype prioritization produced the poorest performance, with 40.9% of diagnostic variants ranked within the top 10 candidates in the GS Exomiser cohort (Fig. 2). PHIVE phenotypic prioritization improved performance to 58.1%, PhenIX to 62.8%, and hiPHIVE with default parameters to 66.6%.

Fig. 2.

Fig. 2

Evaluation of phenotype prioritization algorithms in the GS Exomiser cohort. Cumulative percentage of diagnostic variants (n = 296) ranked at or above each rank threshold (x-axis). Each curve corresponds to a specific phenotype prioritization algorithm parameter setting represented by color. All runs use filtered VCF input and REVEL + MVP as the variant pathogenicity sources (default). Default phenotype prioritization algorithms (PHIVE, PhenIX, hiPHIVE) are compared, as well as variations of hiPHIVE using specific combinations of model organism gene-phenotype databases. hiPHIVE default (dark blue) and hiPHIVE human, PPI, and mouse (pink) curves overlap. The x-axis is limited to rank thresholds 1–30, consistent with our benchmarking strategy that defines successful prioritization as rank ≤ 30 (“Methods”)

Next, we evaluated different model organism gene:phenotype association databases using the top-performing phenotype prioritization algorithm, hiPHIVE. Notably, the highest performance (82.8%) was achieved when running hiPHIVE with human-only associations (Fig. 2), a 16.2-percentage-point increase over Exomiser’s recommended default hiPHIVE setting, which uses all available models of gene-phenotype associations (Fig. 2; Fig. 3A, blue to orange). Similarly, human-only hiPHIVE improved performance in the ES cohort, with a 5.8-percentage-point increase in diagnostic variants ranked within the top 10 candidates compared to default hiPHIVE, and even greater gains at higher rank thresholds (Fig. 3B, blue to orange).

Fig. 3.

Fig. 3

Stepwise optimization process for Exomiser and Genomiser across three UDN cohorts. A GS Exomiser cohort (n = 296 variants). B ES Exomiser cohort (n = 153 variants). C GS Genomiser cohort (n = 60 variants). Red lines: Exomiser or Genomiser performance under default settings (hiPHIVE all models; REVEL + MVP (+ ReMM for Genomiser)) using raw, unfiltered VCFs. All other runs use the filtered VCFs to remove potential false positive variants (“Methods”). Blue lines: Exomiser or Genomiser performance under default settings (hiPHIVE all models; REVEL + MVP (+ ReMM for Genomiser)) using filtered VCFs. Orange lines: Exomiser or Genomiser performance using hiPHIVE human-only associations and REVEL + MVP (+ ReMM for Genomiser) pathogenicity prediction sources. Green lines: Exomiser or Genomiser performance using hiPHIVE human-only associations and REVEL + MVP + AlphaMissense + SpliceAI (+ ReMM for Genomiser) pathogenicity prediction sources. These are our optimized parameters

In the GS Genomiser cohort, 26.7% of diagnostic variants were ranked within the top 10 candidates, regardless of whether hiPHIVE used all available models (default) or human-only gene-phenotype associations (Fig. 3C, blue to orange). However, for diagnostic variants prioritized under both settings, human-only hiPHIVE always yielded higher ranks (Additional file 3: Fig. S4A).

Our results indicate that restricting hiPHIVE to human-only associations improves overall ranking performance over default settings. Nevertheless, there were rare instances where noncoding diagnostic variants missed by the human-only hiPHIVE were recovered by default multispecies settings (Additional file 3: Fig. S4B). Based on these findings, we recommend using the human-only hiPHIVE model for primary analyses due to its superior average performance. All subsequent analyses are conducted using human-only hiPHIVE.

Impact of variant pathogenicity prediction source selection on Exomiser performance

Each variant in Exomiser or Genomiser outputs receives a frequency score (0–1) based on its frequency in selected variant databases (frequencySources; Additional file 4: Text S1) and a pathogenicity score (0–1) drawn from user-specified pathogenicity predictors (pathogenicitySources) [16, 36]. Most of the built-in pathogenicity predictors (e.g., REVEL [37]) provide scores on a 0 (benign) to 1 (pathogenic) scale, while Exomiser normalizes others (e.g., CADD [31]) to this range. Combining many prediction sources increases the likelihood that at least one will assign a high (pathogenic) score to a given variant, inflating scores because Exomiser uses the maximum score across sources as the variant’s pathogenicity score. Variants not scored by any of the selected pathogenicity prediction sources are assigned a class-based pathogenicity score, as defined in Additional file 1: Table S3 [36]. The final variant score is the product of the frequency and pathogenicity scores. Default Exomiser parameter settings use REVEL and MVP [38] as pathogenicity sources.
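The max-then-multiply arithmetic described above can be sketched as follows; this illustrates the scoring logic only, not Exomiser’s actual implementation, and all names are ours:

```python
def variant_score(frequency_score, pathogenicity_scores, class_based_score):
    """Sketch of Exomiser-style variant scoring:
    - pathogenicity = max score across the selected sources,
      or a class-based fallback when no source covers the variant;
    - final variant score = frequency score * pathogenicity score."""
    covered = [s for s in pathogenicity_scores.values() if s is not None]
    pathogenicity = max(covered) if covered else class_based_score
    return frequency_score * pathogenicity
```

For a rare variant (frequency score 1.0) scored 0.8 by one source and 0.6 by another, the variant score is 0.8, showing how adding sources can only raise, never lower, the pathogenicity component.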

We evaluated Exomiser’s average performance using individual pathogenicity sources and observed similar results across many sources (Additional file 3: Fig. S5), though differences likely exist for discrete variant classes. Adding AlphaMissense (AM) [39] and SpliceAI [40] to the default combination of REVEL and MVP modestly improved performance, with AM and MVP primarily contributing to missense variant predictions and SpliceAI to splice variants (Fig. 4A mint green to salmon; Additional file 3: Fig. S6B). These findings suggest that, as expected, pathogenicity prediction sources that are tailored to specific variant classes may have limited impact on cohort-wide performance but can meaningfully improve prioritization within relevant subsets of variants.

Fig. 4

Evaluation of variant pathogenicity prediction score sources in the GS Exomiser cohort. A Cumulative percentage of diagnostic variants (n = 296) ranked at or above each rank threshold (x-axis) under different combinations of variant pathogenicity prediction sources. Each colored curve corresponds to a specific combination of sources, as indicated in the boxed legend. B, C Breakdown of maximum pathogenicity score sources for prioritized diagnostic variants (B) or nondiagnostic variants (C) under each source combination. X-axis indicates the combination of sources used, and the y-axis represents the number of prioritized variants. Bar color reflects which source provided the maximum pathogenicity score, as indicated in the top legend. Dashed lines mark the total number of prioritized variants under each source combination. Variants not covered by any pathogenicity sources in that combination are assigned a class-based score (Additional file 1: Table S3) and, therefore, have no maximum score source, which is represented by the uncolored whitespace in each bar. D, E Distribution of maximum pathogenicity scores for all prioritized diagnostic variants (D) or nondiagnostic variants (E). X-axis represents the pathogenicity score (ranging from 0 to 1), and color indicates the combination of scoring sources used in each run, as indicated in the boxed legend. These distributions highlight differences in score separation between diagnostic and nondiagnostic variants. All data represent Exomiser runs on filtered VCFs (“Methods”) using hiPHIVE human-only gene:phenotype associations

Older tools (PolyPhen [41], MutationTaster [42], and SIFT [43]) disproportionately contributed maximum scores when included (Additional file 3: Fig. S6C), consistent with prior reports that they overestimate pathogenicity. Adding Exomiser-normalized CADD scores to the combination of REVEL, MVP, AM, and SpliceAI introduced a strong bias toward CADD as the maximum pathogenicity score source, regardless of variant type or diagnostic status (Fig. 4B, C salmon vs purple; Additional file 3: Fig. S6D). CADD Phred-like scores reflect the log-scaled rank of a variant’s raw CADD score relative to all 8.6 billion possible SNVs in the human reference genome [31]. Exomiser normalizes these scores to a 0–1 range, which results in most exonic variants—which tend to be in the top 2% of SNVs—receiving a normalized score of 0.98 or higher. Because diagnostic variants are more likely to be truly pathogenic, we expect them to have higher pathogenicity scores than nondiagnostic variants. This differentiation was observed using the REVEL, MVP, AM, and SpliceAI combination but was substantially diminished when CADD or older predictors were included (Fig. 4D, E, salmon vs purple/pink/green). Additionally, approximately 10,000 variants received scores exclusively from AM and SpliceAI, reflecting the broader coverage provided by these newer tools (Fig. 4B, C, mint green vs salmon).

These findings suggest that normalized CADD scores bias pathogenicity rankings, likely due to difficulties in maintaining equivalency with other pathogenicity sources’ score scales, and older tools tend to overestimate pathogenicity. In contrast, adding AM and SpliceAI to the default REVEL and MVP combination improves Exomiser performance, increasing the proportion of diagnostic variants ranked within the top 10 by 2.7% over default settings (Fig. 3A, orange to green). We therefore recommend the combination of REVEL, MVP, AM, and SpliceAI as pathogenicity sources for Exomiser analyses on GS or ES (Additional file 3: Fig. S7) variant data.

Impact of variant pathogenicity prediction source selection on Genomiser performance

Genomiser prioritizes noncoding variants and incorporates ReMM scores, which predict the pathogenicity of noncoding regulatory variants. Genomiser’s default settings use ReMM, REVEL, and MVP as pathogenicity prediction sources. We evaluated Genomiser’s average performance across all variants using different combinations of the available sources in the GS Genomiser cohort. Patterns were broadly consistent with those observed in the GS Exomiser cohort, including biases introduced by the older tools (PolyPhen, MutationTaster, and SIFT) and by normalized CADD scores (Additional file 3: Fig. S8).

Unexpectedly, excluding ReMM scores improved performance in this cohort (Additional file 3: Fig. S8A). This may reflect the cohort’s small size (n = 60) and lack of the diagnostic variant types that ReMM is designed to predict, such as promoter or enhancer variants. Several cases involved compound heterozygous diagnoses comprising both coding (e.g., missense) and noncoding (e.g., splice-altering) variants (Additional file 3: Fig. S2H), for which ReMM is not a better predictor than missense variant prediction tools or SpliceAI. Notably, three variants (5.0%) were successfully prioritized using SpliceAI scores when ReMM was excluded but failed when ReMM was included, as higher ReMM scores were assigned to alternative variants in the diagnostic gene, resulting in the true diagnostic variants being labeled as non-contributing. This suggests that for certain noncoding variant types—such as intronic splice-altering variants—tools like SpliceAI provide more informative pathogenicity predictions, but their signal can be obscured when ReMM is included due to competing high ReMM scores for other variants.

Importantly, removing ReMM is not equivalent to running Exomiser, which only considers coding variants and, in this GS Genomiser cohort, ranked just one diagnostic variant within the top 30 candidates (Additional file 3: Fig. S9). However, this frameshift variant was incorrectly prioritized under an autosomal dominant (AD) mode of inheritance. Identifying its compound heterozygous partner—a noncoding intronic SNV—required running Genomiser. In all similar cases, Exomiser failed to prioritize the coding variant under any mode of inheritance, and Genomiser was required to recognize the pathogenicity of the coding and noncoding variant combination.

These findings highlight Genomiser’s essential role in prioritizing noncoding and regulatory variants, particularly in compound heterozygous diagnoses, which remain challenging in the field. While excluding ReMM improved performance in this specific cohort, we caution that this may reflect cohort limitations—such as the small sample size and underrepresentation of promoter or enhancer variants that ReMM is designed to predict. Given this, we recommend running Genomiser with ReMM, REVEL, MVP, and SpliceAI, which improved performance in the top 10 candidates by 13.3% compared to default variant prioritization source settings (Fig. 3C, orange to green).

Impact of proband phenotype term quality on Exomiser performance

HPO terms can be manually curated through clinical review or algorithmically extracted, such as via natural language processing (NLP) of medical records [19]. This variability results in some probands receiving concise terms most relevant to a genetic diagnosis, while others are assigned exhaustive lists that may include terms that are less specific or are associated with comorbidities unrelated to their genetic disease. In this study, UDN probands had a median of 20 HPO terms (range: 1–117; Additional file 3: Fig. S3C).

We evaluated how probands’ HPO term quality and quantity affected Exomiser’s ability to prioritize diagnostic variants. Removing all HPO terms significantly reduced prioritization accuracy (Additional file 3: Fig. S10A). Interestingly, assigning randomly selected HPO terms from the pool of all terms in the ontology (“Methods”) improved performance over using no terms. This likely reflects the fact that including any terms increases scores for variants in genes with known phenotypic associations, which tend to be disease-associated genes. Since most diagnosed UDN cases involve previously discovered disease-associated genes, it is unsurprising that including even randomly selected HPO terms improves performance. However, applying random terms to undiagnosed cases may bias against as-yet undiscovered disease genes and is not recommended. Additionally, randomly sampled HPO terms from the complete ontology may still be hierarchically related to probands’ phenotypes. These imprecise but nearby terms can partially capture some patients’ true phenotypic presentation, potentially inflating performance despite incorrect or overly general terms.

To assess the impact of broad term sets containing terms uninformative about the underlying genetic condition, we added randomly sampled HPO terms to probands’ comprehensive lists. Exomiser performance declined consistently as terms were added, though the effect was modest (Additional file 3: Fig. S10A). Conversely, we pruned the phenotype lists of 108 probands in the GS Exomiser cohort, removing prenatal and perinatal terms that likely reflected bias introduced by an early UDN term collection interface and were unlikely to be related to the proband’s underlying genetic condition (“Methods”; Additional file 3: Fig. S10B). Removing these terms changed the rank of diagnostic variants by only one position—improving the rank of 17 variants (12.1%) and decreasing the rank of four variants (2.9%) (Additional file 3: Fig. S10D, E, F, G).

These findings suggest that Exomiser performs best with accurate HPO terms characteristic of or indicative of genetic disorders but can tolerate imprecision that may be present in more exhaustive phenotype lists.

Role of pedigree accuracy and incomplete penetrance on Exomiser results

Diseases may exhibit incomplete penetrance, relatives may be misphenotyped (i.e., marked as unaffected despite mild phenotypic presentation), or pedigree files may contain human errors. We identified inaccuracies in pedigree information for 22 families with 24 diagnostic variants that prevented Exomiser from ranking the diagnostic variant(s) in its output. These cases were excluded from the GS Exomiser cohort (“Methods”; Additional file 3: Fig. S11A). When reanalyzed using proband-only variant data, 21 of the 24 diagnostic variants were recovered, with 8 ranked as the top candidate and 19 within the top 30 (Additional file 3: Fig. S11B, C). In many cases, manually correcting relatives’ affected status to align with the known inheritance pattern also recovered the diagnostic variants.

Notably, two diagnostic variants (8.3%) were recovered using proband-only variant data but were not prioritized when family data with corrected pedigrees were used (Additional file 3: Fig. S11C). The true diagnoses involved single heterozygous variants consistent with AD inheritance. In both cases, Exomiser incorrectly ranked the variants under an AR model in the proband-only analysis, pairing them with additional variants in the diagnostic genes, as parental origin could not be inferred. When run with family data, the variants failed Exomiser’s inheritance filtering step as the variants did not fit a compound heterozygous model, and the variants were not recognized as damaging under an AD model.

Relatives’ phenotypic status may be uncertain or inaccurate in diagnostic analyses, hindering variant prioritization. This can be further confounded by factors like incomplete penetrance and multigenic disorders with overlapping phenotypes. These findings highlight the importance of accurate pedigree information and suggest that re-evaluating the affected status of family members may help recover diagnostic variants within the top 30 ranked candidates. We also assessed how family variant data and the inheritance filtering step influence Exomiser’s ability to prioritize diagnostic variants when pedigree information is accurate. This analysis confirmed that leveraging family data improves performance but highlighted the inheritance filtering step as essential for enhancing prioritization even when only proband information is available (Additional file 4: Text S2; Additional file 3: Fig. S12).

Strategies to refine candidate variant lists from Exomiser and Genomiser outputs

A major goal of variant prioritization is to reduce the number of candidate variants requiring manual review while retaining the diagnostic variant. In this study, we optimized Exomiser and Genomiser parameters to improve the ranking of diagnostic variants, ideally within the top 30 candidates. However, some diagnostic variants may still fall below this threshold and require additional support from other “omic” data, such as RNA or long-read sequencing, which is ideally analyzed jointly. In such cases, a broader list of candidate variants may be helpful. For example, this list may be cross-referenced against genes that show expression or splicing outliers, or against structural variants in the same gene that may suggest compound heterozygosity.

Although Exomiser provides p-values for each ranked variant and gene, there are no established guidelines for using these values to filter candidate lists. We explored whether p-value thresholds could serve as an alternative to a fixed top 30 cutoff to generate a less stringent prioritized list of SNV/indel variants for downstream integration with other data types.

We evaluated multiple thresholds (Additional file 3: Fig. S13) and found that p ≤ 0.3 offered a favorable balance—reducing noise and increasing precision by lowering the median number of prioritized variants per proband, while excluding only 5 (1.7%) diagnostic variants in the GS Exomiser cohort, thereby maintaining high recall (Additional file 1: Table S4). However, the number of retained variants was strongly influenced by family structure, with singleton cases yielding more candidates under the same threshold than trios. While the p-value cutoff may reduce noise in integrated analyses, we caution that the ideal threshold for multi-omics applications likely depends on the specific context and analysis framework. Additional benchmarking is needed to define best practices for integrating Exomiser results with other “omic” data.
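Applying such a threshold downstream could be sketched as follows; the field names are hypothetical and would need to be mapped onto the actual columns of the Exomiser output being parsed.

```python
# Illustrative filter over Exomiser-style ranked candidates (field names
# here are hypothetical; adapt them to your Exomiser version's output).
P_VALUE_CUTOFF = 0.3  # threshold found to balance recall and list size above

def filter_candidates(candidates, p_cutoff=P_VALUE_CUTOFF):
    """Keep variants with Exomiser p-value <= cutoff, preserving rank order."""
    kept = [v for v in candidates if v["p_value"] <= p_cutoff]
    return sorted(kept, key=lambda v: v["rank"])

candidates = [
    {"gene": "GENE_A", "rank": 1, "p_value": 0.001},
    {"gene": "GENE_B", "rank": 2, "p_value": 0.25},
    {"gene": "GENE_C", "rank": 3, "p_value": 0.72},  # dropped by the cutoff
]
shortlist = filter_candidates(candidates)  # GENE_A and GENE_B remain
```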

Additionally, we observed that certain genes regularly contributed high-scoring but nondiagnostic variants across this cohort of patients with very diverse phenotypes. To reduce noise and ensure a focus on high-quality candidates, we first applied a p-value threshold of p ≤ 0.3. Across 231 GS Exomiser cases, 86 genes appeared in the top 30 with p ≤ 0.3 in at least 5% of probands (Fig. 5; Additional file 5). Removing variants in these genes from the Exomiser output and reranking the remainder resulted in the loss of 14 (4.7%) diagnostic variants but moderately improved the rank of 78 (26.4%) (Additional file 3: Fig. S14A, B).

Although Exomiser v14.0.0 includes an optional GeneBlacklistFilter step, which allows users to exclude specific genes from the prioritization process, we chose not to apply this filter. Instead, we aimed to identify frequently prioritized genes directly from our diagnostic cohort and recommend that users flag, rather than filter, such genes for manual review in a clinical context. Notably, only 7 of the 86 frequently ranked genes we identified in the GS Exomiser cohort appear in the default blacklist. We also identified frequently ranked genes in the ES Exomiser cohort (Additional file 3: Fig. S15; Additional file 5), but the small sample size of the GS Genomiser cohort (n = 39 probands) precluded a similar analysis, as 5% of this benchmarking cohort represents only two cases. Removing variants from genes frequently prioritized among the top candidates by Exomiser in a given cohort may result in the loss of true diagnostic variants. Therefore, we suggest a tiered approach that flags genes with frequently “over-prioritized” variants, which can then be manually interpreted and eliminated cautiously.
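The flagging strategy can be sketched as follows. This is a hypothetical illustration: the input structure is assumed, and in practice the counts would come from parsed Exomiser outputs across a benchmarking cohort.

```python
from collections import Counter

# Sketch of the flagging strategy described above: count how often each gene
# appears among a proband's top 30 candidates with p <= 0.3, and flag genes
# seen in at least 5% of probands for cautious manual review (not removal).
def frequently_prioritized(per_proband_candidates, top_n=30, p_cutoff=0.3,
                           min_fraction=0.05):
    """per_proband_candidates: one list per proband of
    {'gene', 'rank', 'p_value'} dicts. Returns the set of genes to flag."""
    counts = Counter()
    for candidates in per_proband_candidates:
        genes = {v["gene"] for v in candidates
                 if v["rank"] <= top_n and v["p_value"] <= p_cutoff}
        counts.update(genes)  # each gene counted at most once per proband
    n_probands = len(per_proband_candidates)
    return {g for g, c in counts.items() if c / n_probands >= min_fraction}
```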

Fig. 5

Frequently prioritized genes in the GS Exomiser cohort. Eighty-six genes that ranked in the top 30 candidates with p ≤ 0.3 for at least 5% of probands in the GS Exomiser cohort. List of genes can be found in Additional file 5. Color represents the binned average rank at which the gene was prioritized across all probands in the cohort. Bold font indicates OMIM genes with a confirmed causal relationship to a disease (OMIM 3). Shape reflects if the gene is diagnostic in the GS Exomiser cohort (square), not diagnostic in the GS Exomiser cohort but diagnostic for at least one proband in the UDN consortium as a whole (diamond), or not diagnostic in the UDN consortium (circle)

Applying optimized Exomiser parameters on newly diagnosed UDN probands

To evaluate the generalizability of our recommended Exomiser parameters beyond the specific solved cases used for benchmarking, we identified 17 UDN probands in the GS jointly called variant dataset who received diagnoses after our cohort assembly was finalized. We treated these 17 probands and their 23 diagnostic variants as an independent test set and applied our optimized parameters. This reproduced improvements in diagnostic variant ranking within the top 10 candidates, with 22 of 23 diagnostic variants prioritized within the top 30 (Fig. 6A), demonstrating that our recommendations are stable and generalizable to newly diagnosed cases. While most variants in this test set improved in rank with the optimized parameters, four cases displayed a decrease in the rank of the diagnostic variant(s) relative to default parameters (Fig. 6B). We observed an average Exomiser phenotype score of 0.51 in the default run compared to 0 with optimized parameters for these four cases. These cases highlight the importance of model organism and PPI network gene-phenotype data for cases with new disease associations made within the last year, which may not yet be reflected in human disease association databases such as OMIM and Orphanet.

Fig. 6

Generalizability of optimized Exomiser parameters in newly diagnosed UDN probands. A Stepwise optimization process for Exomiser on a small cohort of newly diagnosed UDN probands encompassing 23 diagnostic variants. B Change in rank for each diagnostic variant (n = 23) in the newly diagnosed UDN proband cohort using optimized parameters (green) compared to default parameters (blue). *Denotes compound heterozygous diagnosis (two variants in labeled genes). Red, Exomiser under default settings (hiPHIVE all models; REVEL + MVP) using raw, unfiltered VCFs; all other runs use filtered VCFs to remove potential false positive variants (“Methods”). Blue, Exomiser performance under default settings (hiPHIVE all models; REVEL + MVP) using filtered VCFs. Orange, Exomiser performance using hiPHIVE human-only associations and REVEL + MVP pathogenicity prediction sources. Green, Exomiser or Genomiser performance using hiPHIVE human-only associations and REVEL + MVP + AlphaMissense + SpliceAI (+ ReMM for Genomiser) pathogenicity prediction sources. These are our optimized parameters

Additionally, we explored whether performance varied between primary symptom categories and found that improvements achieved by parameter optimization in the benchmarking UDN cohorts were consistent across logically defined subsets of these cohorts, i.e., within broad disease area groupings (e.g., neurology), further supporting the robustness of our recommendations (Additional file 3: Fig. S16; Additional file 3: Fig. S3E). Similarly, we performed an inheritance mode-stratified evaluation of performance and observed no strong differences in performance between inheritance modes in the GS Exomiser cohort (Additional file 3: Fig. S17A). Small or unbalanced sample sizes in the ES Exomiser and GS Genomiser precluded any firm conclusions (Additional file 3: Fig. S17B, C).

Applying optimized Exomiser parameters for variant reanalysis

Exomiser’s fast runtime and strong performance in ranking diagnostic variants using optimized parameters make it well-suited for variant reanalysis in undiagnosed patients, particularly in response to evolving clinical phenotypes. We therefore integrated our Exomiser guidelines into the ongoing analysis of undiagnosed UDN cases via the Mosaic platform.

Mosaic is a genomic data management and analysis platform used by UDN researchers that supports collaborative diagnostics via user-friendly interfaces. These interfaces allow users to search the approximately 200,000 to 400,000 variants in each UDN family. All Exomiser rankings and scores generated via the best practices outlined in this paper are made available. When a participant’s phenotype changes, updated Exomiser scores are generated from the revised HPO term list.

Discussion

Exomiser and Genomiser are complementary open-source tools designed to prioritize coding and noncoding variants in genomic data. In this study, we used diagnosed cases from the Undiagnosed Diseases Network (UDN) to evaluate the performance of these tools, propose best practices, and identify potential pitfalls in rare disease diagnostics. Optimizing Exomiser parameters significantly improved its performance over default settings, with 85.5% of diagnostic variants in the GS Exomiser cohort ranked in the top 10 candidates compared to 66.6% under default settings after applying our recommended filtering criteria (“Methods”; Fig. 3A). This optimization shifted 70 (23.6%) variants into the top 10 candidates (Fig. 7), representing a substantial improvement for manual review by elevating variants that may have otherwise been missed at lower ranks. Across all diagnostic variants in this cohort, optimization improved the rank for 152 variants and did not change the rank for 125. Although the rank of 19 diagnostic variants decreased, none were displaced from the top 30 candidates. This reflects an inherent trade-off, where improvements in overall performance may occasionally deprioritize certain variants. Similar improvements were observed for ES data and noncoding variants prioritized using Genomiser (Fig. 3B, C; Additional file 3: Fig. S18A, B). A small subset of 11 probands in our cohort harbored diagnostic variants in more than one gene. To evaluate each gene-level diagnosis independently, we treated each proband-gene pair as a separate benchmarking unit (“Methods”), aligning with how candidate genes are typically assessed in real-world diagnostic workflows. Exomiser or Genomiser successfully ranked all but one diagnostic variant from these multigenic cases within the top 30 candidates (Additional file 3: Fig. S19).

Fig. 7

Parameter optimization shifts diagnostic variants into the top 10 candidates in the GS Exomiser cohort. Seventy (23.6%) variants in the GS Exomiser cohort are shifted into the top 10 candidates using optimized parameters (green) in comparison to default parameters (blue). Optimized parameters refer to running Exomiser on the filtered family VCF, hiPHIVE human-only gene-phenotype associations, and REVEL, MVP, AlphaMissense, and SpliceAI variant pathogenicity score sources. Default parameters refer to running Exomiser on the filtered family VCF using hiPHIVE human, mouse, zebrafish, PPI gene-phenotype associations, and REVEL and MVP variant pathogenicity score sources. *Denotes compound heterozygous diagnosis (two variants in labeled genes)

Based on the analyses performed in this study, we recommend the following best practices for using Exomiser/Genomiser in rare disease diagnostics (Fig. 8):

  1. Begin analysis with a family (jointly called) VCF (when available) and filter it to remove potential false positive variants (“Methods”).

  2. Use the REVEL, MVP, AlphaMissense, and SpliceAI variant pathogenicity prediction sources and human-only hiPHIVE gene-phenotype associations.

  3. Utilize pedigree information and Exomiser inheritance filters, and enable the ClinVar whitelist*.

  4. Manually review the top 30 contributing variants.
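For reference, these recommendations might translate into an Exomiser analysis configuration along the following lines. This is a sketch only: key names and enum values such as ALPHA_MISSENSE and SPLICE_AI are assumptions that vary by Exomiser version and should be verified against the documentation for the release in use.

```yaml
# Sketch of an Exomiser analysis YAML reflecting the recommendations above.
# Key names and enum values are illustrative and version-dependent.
analysis:
  genomeAssembly: hg38
  vcf: family-filtered.vcf.gz          # jointly called VCF, pre-filtered ("Methods")
  ped: family.ped                      # pedigree enabling inheritance filtering
  proband: PROBAND_ID                  # placeholder sample identifier
  hpoIds: ['HP:0001250', 'HP:0001263'] # proband's curated phenotype terms
  pathogenicitySources: [REVEL, MVP, ALPHA_MISSENSE, SPLICE_AI]
  steps:
    - hiPhivePrioritiser: {runParams: 'human'}  # human-only gene-phenotype associations
```

To our understanding, the ClinVar whitelist is enabled through Exomiser's application properties rather than the analysis file, so it does not appear above.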

Fig. 8

Recommended workflow for using Exomiser and Genomiser in rare disease diagnostics. Numbers indicate the count (percentage) of diagnostic variants ranked (green circles) or not ranked (red circles) within the top 30 candidates by Exomiser or Genomiser after applying each preceding step in the flowchart. Percentages are calculated from the total in the preceding step, beginning at n = 380 diagnostic variants (296 from the GS Exomiser cohort, 24 with inconsistent pedigrees, and 60 from the GS Genomiser cohort). Begin analysis with a family VCF (when available), filtered to remove potentially false-positive variants (“Methods”). Use the REVEL, MVP, AlphaMissense, SpliceAI variant pathogenicity sources, and human-only hiPHIVE gene-phenotype associations. Run Exomiser using all available family variant data, pedigree information, inheritance filters, and the ClinVar whitelist enabled. Manually review the top 30 contributing variants, with frequently prioritized genes flagged. If no compelling candidates are identified, verify pedigree accuracy (considering that some family members may be misphenotyped) and consider running Exomiser on the proband-only variant data with inheritance filters enabled. If no strong candidates are found in GS data, run Genomiser to assess noncoding variants and compound heterozygous candidates with one noncoding variant and one coding variant

*Any variant on the ClinVar whitelist bypasses all Exomiser filters (e.g., allele frequency) and is included in the prioritized results with a maximal score. Although we disabled this feature to avoid biasing benchmarking results, we observed improved performance when it was enabled, so we recommend enabling it when analyzing unsolved cases. We also found that Exomiser performs best with accurate and relevant HPO terms but tolerates some imprecision or noise in phenotype lists.

There are scenarios in which the outlined steps do not prioritize the diagnostic variant(s), prompting us to introduce additional analyses that can be explored to further improve outcomes when time permits (Fig. 8). First, we observed that Exomiser missed some diagnostic variants due to inheritance constraints resulting from pedigree inaccuracies (Additional file 3: Fig. S11). We also demonstrated that Genomiser, while requiring more manual curation, is necessary for identifying noncoding candidates. In cases with compound heterozygous diagnoses made up of noncoding and coding variants, Exomiser alone was unable to detect both diagnostic variants in the correct AR configuration (Additional file 3: Fig. S9). However, while Genomiser considers coding variants, it is not a replacement for Exomiser, as it identified ~30% fewer diagnostic variants within the top 30 candidates and required nearly three times the runtime (Additional file 3: Fig. S1).

While restricting gene-phenotype associations to human-only data improved overall performance, in rare instances, it also led to missed or lower-ranked variants in genes with recently established disease associations not yet represented in human-specific databases. Expanding to include multispecies and PPI data enabled recovery of such variants, as seen for six variants in the GS Genomiser cohort (Additional file 3: Fig. S4B) and increased rank for five variants in the newly diagnosed test cohort (Fig. 6). In the GS Exomiser benchmarking cohort, Exomiser failed to prioritize 7 (2.4%) diagnostic variants within the top 30 candidates (Additional file 1: Table S5). These instances of poor performance, based on our criteria for success, were typically driven by low phenotype or variant scores. Together, these findings underscore the importance of regular updates to and the expansion of gene-phenotype association databases and variant pathogenicity predictors to improve prioritization outcomes, particularly in challenging or novel cases.

We encountered a specific challenge with using CADD scores in Exomiser and Genomiser. To be incorporated, CADD Phred scores are normalized to a 0–1 range, where a score of 10 maps to 0.9. However, in our experience, a REVEL score of 0.9 is a stronger indicator of pathogenicity than a CADD Phred score of 10, particularly for missense variants. Because Exomiser assigns the highest score across variant pathogenicity sources, these normalized CADD scores disproportionately dominate maximum pathogenicity scores due to their incompatibility with other sources’ scoring scales. As a result, we excluded CADD scores from our analyses. This decision does not reflect a flaw in CADD or Exomiser but rather highlights a limitation in the compatibility of these scales. We recommend that users review CADD scores alongside Exomiser results, as they provide valuable context for variant interpretation. Future updates to Exomiser that improve variant pathogenicity scale equivalency may mitigate this issue.
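The mapping described here (a Phred score of 10 normalizing to 0.9, and exonic variants in the top 2% normalizing to 0.98 or higher) is consistent with converting the Phred scale back to a percentile rank. A small sketch follows, assuming, without confirming, that Exomiser's normalization uses exactly this formula.

```python
# A CADD Phred-scaled score encodes a variant's rank among all possible SNVs:
# phred = -10 * log10(p), where p is the proportion of SNVs scoring at least
# as high. Mapping back gives a 0-1 value of 1 - p, which reproduces the
# behavior described in the text.
def normalize_cadd_phred(phred):
    return 1.0 - 10.0 ** (-phred / 10.0)

normalize_cadd_phred(10)  # 0.9 (top 10% of all possible SNVs)
normalize_cadd_phred(17)  # ~0.98 (top 2%), typical of exonic variants
```

Because most exonic variants sit high in the genome-wide CADD ranking, nearly all of them land above 0.98 on this scale, explaining why CADD so often supplies the maximum pathogenicity score when combined with sources calibrated differently.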

Finally, we explored refinement strategies for Exomiser outputs, including filtering by p-value and identifying genes frequently ranked in the top 30 candidates but rarely associated with diagnoses. These approaches can support hypothesis generation in complex or unsolved cases by enabling integration of prioritized SNV/indels with other variant types (e.g., structural variants) or “omes” (e.g., expression or splicing outliers) or by flagging variants that analysts should interpret with caution, informed by trends observed across a larger benchmarking cohort.

We acknowledge some limitations in this study. The cohort size, particularly for Genomiser benchmarking, was relatively small, which may limit the generalizability of our findings. Additionally, we recognize the biases inherent in solved UDN cases, where well-documented gene-phenotype associations may be more available and could have led to more favorable outcomes compared to unsolved cases. Over half of the probands in the combined benchmarking cohort had previously undergone nondiagnostic clinical exome sequencing, reflecting the particularly challenging nature of rare disease presentations among UDN participants. While this complexity may underestimate Exomiser or Genomiser performance in less complex populations, the rich phenotypic data collected through the UDN may enhance prioritization by these phenotype-aware algorithms, highlighting the importance of detailed phenotypic information in driving diagnostic success. Furthermore, some variant pathogenicity prediction tools, such as REVEL, are trained on ClinVar data, which may include diagnostic variants in this cohort, potentially inflating their ability to predict these variants as damaging.

Conclusions

This study provides valuable insights into the capabilities and limitations of Exomiser and Genomiser for variant prioritization in rare disease diagnostics. Through systematic parameter optimization and refinement of analysis strategies, we significantly improved tool performance in prioritizing diagnostic variants compared to default settings. Our findings underscore the importance of integrating informative patient phenotypes and accurate family pedigrees with high-quality variant data to enhance the performance of these phenotype-aware tools.

We propose evidence-based best practices for deploying these tools for rare disease diagnostics and demonstrate their generalizability across diverse clinical presentations, including newly diagnosed cases. In the context of undiagnosed rare disease diagnostics—where sensitivity is often prioritized over specificity to avoid missing diagnostic variants—even small reductions in recall can have meaningful consequences. Our study offers a benchmarked framework quantifying the trade-off between analytical burden and diagnostic sensitivity. While the optimized parameters enhance diagnostic outcomes, there are cases where they may fall short, highlighting the continued necessity of manual review in clinical applications. In practice, a tiered approach beginning with stringent filters and relaxing them in iterative rounds is common in challenging cases. Finally, this work highlights the significance of tracking and documenting diagnosed variants that can be used for benchmarking bioinformatic pipelines.

Supplementary Information

13073_2025_1546_MOESM1_ESM.docx (27KB, docx)

Additional file 1: Supporting tables S1-S7. Table S1: Comparison of phenotype-aware variant prioritization methods. Table S2: VCF filtering summary statistics. Table S3: Default pathogenicity scores assigned to classes of variants. Table S4: Diagnostic variants with p-values > 0.3 in GS Exomiser cohort. Table S5: Diagnostic variants that ranked > 30 in GS Exomiser cohort. Table S6: Summary of diagnostic variants not prioritized by Exomiser in GS Exomiser cohort. Table S7: ACMG classification of diagnostic variants in benchmarking cohorts.

13073_2025_1546_MOESM2_ESM.docx (7.3KB, docx)

Additional file 2: List of perinatal/prenatal HPO terms removed to create “pruned” term lists.

13073_2025_1546_MOESM3_ESM.docx (4MB, docx)

Additional file 3: Supporting figures S1-S20. Fig S1: Comparison of Exomiser and Genomiser performance on coding variants. Fig S2: Breakdown of three UDN cohorts used in benchmarking. Fig S3: Summary of combined ES/GS cohorts. Fig S4: Genomiser performance using hiPHIVE with default versus human-only gene:phenotype associations. Fig S5: Evaluation of individual variant pathogenicity prediction score sources in GS Exomiser cohort. Fig S6: Maximum pathogenicity score sources for diagnostic variants in the GS Exomiser cohort. Fig S7: Evaluation of variant pathogenicity prediction score sources in ES Exomiser cohort. Fig S8: Evaluation of variant pathogenicity prediction score sources in GS Genomiser cohort. Fig S9: Exomiser performance on GS Genomiser cohort. Fig S10: Impact of proband phenotype quality on Exomiser performance. Fig S11: Recovering diagnostic variants through proband-only reanalysis or manual pedigree correction. Fig S12: Impact of family variant data and inheritance filters on Exomiser performance. Fig S13: Impact of maximum p-value thresholds on number of candidate variants and loss of diagnostic variants. Fig S14: Removal of frequently ranked genes in GS Exomiser cohort. Fig S15: Frequently ranked genes in ES Exomiser cohort. Fig S16: Generalizability of Exomiser/Genomiser performance across primary disease categories. Fig S17: Exomiser and Genomiser performance stratified by inheritance pattern. Fig S18: Parameter optimization shifts diagnostic variants into the top ten candidates in GS Genomiser and ES Exomiser cohorts. Fig S19: Probands with multiple diagnostic genes. Fig S20: Impact of variant frequency filters and population sources in GS Exomiser cohort.

13073_2025_1546_MOESM4_ESM.docx (16.9KB, docx)

Additional file 4: Supporting texts S1-S4. Text S1: Impact of family variant frequency filters and population sources. Text S2: Impact of the inclusion of family variant data and inheritance filtering step on Exomiser performance. Text S3: Diagnostic variants not prioritized in the GS Exomiser cohort. Text S4: ACMG classification of diagnostic variants.

13073_2025_1546_MOESM5_ESM.docx (14.5KB, docx)

Additional file 5: Lists of frequently ranked genes in GS Exomiser cohort and ES Exomiser cohorts.

Acknowledgements

We thank the Exomiser development team for making Exomiser and Genomiser freely available and for their ongoing work to support and improve rare disease diagnostics through open-source tools. Members of the Undiagnosed Diseases Network (UDN) include Aaron Quinlan, Abdul Elkadri, Adeline Vanderver, Adriana Rebelo, Alan H. Beggs, Albert R. La Spada, Alden Huang, Alex Paul, Alexander Miller, Alistair Ward, Allen Bale, Allyn McConkie-Rosell, Alyson Krokosky, Alyssa A. Tran, Andrea Gropman, Andres Vargas, Andrew B. Crouse, Andrew Stergachis, Anna Hurst, Anna Raper, Anne Slavotinek, Arian Nouraee, Arjun Tarakad, Ashley Andrews, Ashley McMinn, Ashok Balasubramanyam, Ayuko Iverson, Barbara N. Pusey Swerdzewski, Beatriz Anguiano, Ben Afzali, Ben Solomon, Beth A. Martin, Bianca E. Russell, Brandon M. Wilk, Breanna Mitchell, Brendan C. Lanpher, Brendan H. Lee, Brent L. Fogel, Brett Bordini, Brett H. Graham, Brianna Tucker, Bruce Gelb, Bruce Korf, Calum A. MacRae, Camilo Toro, Cara Skraban, Carlos A. Bacino, Carlos A. Pardo-Villamizar, Carlos Prada, Carol Oladele, Caroline Hendry, Carson A. Smith, Cathy Shyr, Cecilia Esteves, Changrui Xiao, Charlotte Cunningham-Rundles, Chloe M. Reuter, Christine M. Eng, Christopher Mayhew, Chun-Hung Chan, Colleen E. Wahl, Corrine K. Welt, Cynthia J. Tifft, Dana Kiley, Dana Sayer, Daniel J. Rader, Daniel Wegner, Danny E. Miller, Daryl A. Scott, Dave Viskochil, David A. Sweetser, David R. Adams, Deborah Barbouth, Deepak A. Rao, Devin Oglesbee, Devon Bonner, Donald Basel, Donna Novacic, Dr. Francisco Bustos Velasq, Dustin Baldridge, Edward Behrens, Edwin K. Silverman, Elaine Seto, Elijah Kravets, Elisabeth Rosenthal, Elizabeth A. Worthey, Elizabeth A. Burke, Elizabeth Blue, Elizabeth C. Chao, Elizabeth L. Fieg, Elizabeth Wohler, Ellen F. Macnamara, Elsa Balton, Emily Glanton, Emily Shelkowitz, Emily Wang, Eneida Mendonca, Eric Allenspach, Eric Gamazon, Eric Gayle, Eric Klee, Eric Vilain, Erica Davis, Erin Conboy, Erin E. 
Baldwin, Erin McRoy, Esteban C. Dell'Angelica, Euan A. Ashley, F. Sessions Cole, Filippo Pinto e Vairo, Frances High, Francesco Vetrini, Francis Rossignol, Fuki M. Hisama, Gabor Marth, Gail P. Jarvik, Gary D. Clark, George Carvalho, Gerard T. Berry, Ghayda Mirzaa, Giorgio Sirugo, Gonench Kilich, Guney Bademci, Hector Rodrigo Mendez, Heidi Wood, Herman Taylor, Holly K. Tabor, Hongzheng Dai, Hsiao-Tuan Chao, Hua Xu, Hugo J. Bellen, Hui Zhang, Ian Glass, Ian R. Lanza, Ingrid A. Holm, Isaac S. Kohane, Isum Ward, Ivan Chinn, J. Carl Pallais, Jacinda B. Sampson, James P. Orengo, James Verbsky, Jared Sninsky, Jason Hom, Jason Schend, Jennefer N. Kohler, Jennifer Morgan, Jennifer Schymick, Jennifer Tousseau, Jennifer Wambach, Jessica Douglas, Jiayu Fu, Jill A. Rosenfeld, Jimann Shin, Joan M. Stoler, Joanna Jen, Joanna M. Gonzalez, John A. Phillips III, John Carey, John E. Gorzynski, John J. Mulvihill, Joie Davis, Jonathan A. Bernstein, Jordan Whitlock, Jose Abdenur, Joseph Loscalzo, Joy D. Cogan, Julian A. Martínez-Agosto, Julie Hoover-Fong, Julie McCarrier, Justin Alvey, Kahlen Darr, Kai Lee Yap, Kaitlin Callaway, Kathleen A. Leppig, Kathleen Page, Kathleen Sullivan, Kathy Sisco, Katrina Dipple, Kayla M. Treat, Kelly Hassey, Kelly Regan-Fendt, Kelly Schoch, Kevin S. Smith, Khurram Liaqat, Kim Worley, Kimberly Ezell, Kimberly LeBlanc, Kirsten Blanco, Kumarie Latchman, Lakshitha Perera, Lance H. Rodan, Laura Keehan, Laurel A. Cobban, Lauren Blieden, Lauren C. Briere, Lauren Jeffries, Laurens Wiel, Layal F. Abi Farraj, Leoyklang Petcharet, LéShon Peart, Lili Mantcheva, Lilianna Solnica-Krezel, Lindsay C. Burrage, Lindsay Mulvihill, Lisa Bastarache, Lisa Schimmenti, Lisa T. Emrick, Lorenzo Botto, Lorraine Potocki, Louise Bier, Lynette Rives, Lynne A. Wolfe, Mafalda Barbosa, Maija-Rikka Steenari, Manish J. Butte, Manisha Balwani, Margaret Delgado, María José Ortuño Romero, María Paula Silva, Maria T. 
Acosta, Marie Morimoto, Mariko Nakano-Okuno, Mariya Shadrina, Mark Gerstein, Mark Wener, Marla Sabaii, Martha Horike-Pyne, Martin G. Martin, Martin Rodriguez, Mary Koziura, Matt Velinder, Matthew Coggins, Matthew Might, Matthew Robinson, Matthew T. Wheeler, May Christine V. Malicdan, Megan Bell, Meghan C. Halley, Melissa Walker, Mia Levanto, Michael Bamshad, Michael F. Wangler, Michael Muriello, Michael T. Zimmermann, Michele Spencer-Manzon, Miranda Leitheiser, Mohamad Mikati, Mohamad Saifeddine, Monika Weisz Hubshman, Monkol Lek, Monte Westerfield, Mustafa Tekin, Nada Derar, Naghmeh Dorrani, Nara Sobreira, Neil H. Parker, Neil Hanchard, Nicholas Borja, Nicola Longo, Nicole M. Walley, Nitsuh K. Dargie, Odelya Kaufman, Oguz Kanca, Orpa Jean-Marie, Page C. Goddard, Paolo Moretti, Patricia A. Ward, Patricia Dickson, Patrick McMullen, Paul Auwaerter, Paul Berger, Paul G. Fisher, Pengfei Liu, Peter Byers, Philip Dane Witmer, Pinar Bayrak-Toydemir, Pongtawat Lertwilaiwittaya, Precilla D'Souza, Queenie Tan, Rachel A. Ungar, Rachel Evard, Rachel Li, Rakale C. Quarells, Ramakrishnan Rajagopalan, Raquel L. Alvarez, Reaford Blackburn, Rebecca C. Spillmann, Rebecca Ganetzky, Rebecca Overbury, Rebekah Barrick, Richard A. Lewis, Richard Chang, Richard L. Maas, Rizwan Hamid, Rong Mao, Ronit Marom, Rosario I. Corona, Runjun Kumar, Russell Butterfield, Sanaz Attaripour, Sandesh Nagamani, Sara Emami, Saskia Shuman, Seema R. Lalani, Seth Perlman, Shamika Ketkar, Shamil R. Sunyaev, Shilpa N. Kobren, Shinya Yamamoto, Shrikant Mane, Shruti Marwaha, Sirisak Chanprasert, Stanley F. Nelson, Stephan Zuchner, Stephanie Bivona, Stephanie M. Ware, Stephen B. Montgomery, Stephen C. Pak, Steven Boyden, Suha Bachir, Surendra Dasari, Susan Korrick, Susan Shin, Suzanne Sandmeyer, Tahseen Mozaffar, Tammi Skelton, Tanner D. Jensen, Tarun K. K. Mamidi, Taylor Beagle, Taylor Maurer, Teneasha Washington, Teodoro Jerves Serrano, Terra R. Coakley, Thomas Cassini, Thomas J. 
Nicholas, Timothy Schedl, Tiphanie P. Vogel, Vaidehi Jobanputra, Valerie V. Maduro, Vandana Shashi, Vasilis Vasiliou, Virginia Sybert, Vishnu Cuddapah, Wendy Introne, Wendy Raskind, Willa Thorson, William A. Gahl, William E. Byrd, William J. Craigen, Winston Halstead, Winston Timp, Yan Huang, Yigit Karasozen, Yong-Hui Jiang, Yuka Manabe, Zackary Dov Berger, Ziyuan Guo.

Abbreviations

ES

Exome sequencing

GS

Genome sequencing

UDN

Undiagnosed Diseases Network

HPO

Human Phenotype Ontology

SNV

Single-nucleotide variant

VCF

Variant call format

IRB

Institutional Review Board

GQ

Genotype quality

VAF

Variant allele frequency

MOI

Mode of inheritance

AD

Autosomal dominant

AR

Autosomal recessive

PPI

Protein-protein interaction

AM

AlphaMissense

NLP

Natural language processing

Authors’ contributions

S.M., I.C., and A.W. contributed to the conception and design of the study, including methods and interpretation of the results. S.M. piloted benchmarking of Exomiser and Genomiser using a small subset of solved cases from the UDN. I.C. scaled analysis to all diagnosed cases for results in this manuscript. I.C. and S.M. fetched metadata associated with participants. I.C. curated benchmarking cohorts. S.K. performed realignment and joint calling of UDN GS and ES data. I.C. performed parameter optimizations and computational analyses and completed all figure generation. A.W. ran Exomiser and Genomiser on unsolved UDN cases and facilitated their availability in Mosaic. I.C., S.M., and A.W. drafted the original manuscript. S.K., J.C., M.W., and G.T.M. contributed to manuscript review and editing. All authors reviewed the final manuscript.

Funding

This work was supported by a grant from the National Human Genome Research Institute (NHGRI) Advancing Genomic Medicine Research (AGMR) program (R01HG012286 to GTM). Research reported in this manuscript was supported by the NIH Common Fund, through the Office of Strategic Coordination/Office of the NIH Director under Award Number U01HG010218, and by the NIH NINDS under Award Numbers U01NS134358 and U2CNS132415. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The computational resources used were partially funded by the NIH Shared Instrumentation Grant 1S10OD021644-01A1.

Data availability

All deidentified exome and genome sequencing data, including SNV and indel variant calls, as well as corresponding phenotype data in the form of pedigree files and HPO terms, are regularly deposited in dbGaP (accession phs001232.v7.p3; https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001232.v7.p3) [44]. Rare SNV and indel variants and HPO terms for UDN participants with genomic sequencing are also queryable in a public-facing browser (https://dbmi-bgm.github.io/udn-browser/) [45]. Variant-level data, clinical significance and supporting evidence, demographic information, and phenotype information for all diagnostic variants, including those included in the ES Exomiser, GS Exomiser, and GS Genomiser datasets used in this study, are submitted to ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/submitters/505999/) [46]. Other relevant, patient-specific clinical information may be shared on a case-by-case basis at the discretion of the clinical team managing the case.

Declarations

Ethics approval and consent to participate

All work included in this study was performed in accordance with the ethical guidelines outlined in NIH IRB no. 15HG0130 and the UDN Manual of Operations. All participants provided informed consent for their de-identified data to be used freely for research purposes across the network. The study proposal and this manuscript were approved by the UDN Publications and Research Committee. All research was conducted in accordance with the Declaration of Helsinki.

Consent for publication

Not applicable.

Competing interests

A.W. and G.T.M. are co-founders and CEO and CSO, respectively, of Frameshift Labs, the developer of the Mosaic platform. The remaining authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Gabor T. Marth, Email: gmarth@genetics.utah.edu


References

1. Haendel M, Vasilevsky N, Unni D, et al. How many rare diseases are there? Nat Rev Drug Discov. 2020;19:77–8.
2. Ferreira CR. The burden of rare diseases. Am J Med Genet A. 2019;179:885–92.
3. Chung CCY, Hue SPY, Ng NYT, Doong PHL, Hong Kong Genome Project, Chu ATW, et al. Meta-analysis of the diagnostic and clinical utility of exome and genome sequencing in pediatric and adult patients with rare diseases across diverse populations. Genet Med. 2023;25:100896.
4. Wright CF, Campbell P, Eberhardt RY, et al. Genomic diagnosis of rare pediatric disease in the United Kingdom and Ireland. N Engl J Med. 2023;388:1559–71.
5. 100,000 Genomes Project Pilot Investigators, Smedley D, Smith KR, et al. 100,000 Genomes pilot on rare-disease diagnosis in health care - preliminary report. N Engl J Med. 2021;385:1868–80.
6. Jacobsen JOB, Kelly C, Cipriani V, Genomics England Research Consortium, Mungall CJ, Reese J, et al. Phenotype-driven approaches to enhance variant prioritization and diagnosis of rare disease. Hum Mutat. 2022;43:1071–81.
7. Robinson PN, Mundlos S. The Human Phenotype Ontology. Clin Genet. 2010;77:525–34.
8. Smedley D, Jacobsen JOB, Jäger M, et al. Next-generation diagnostics and disease-gene discovery with the Exomiser. Nat Protoc. 2015;10:2004–15.
9. Smedley D, Schubach M, Jacobsen JOB, et al. A whole-genome analysis framework for effective identification of pathogenic regulatory variants in Mendelian disease. Am J Hum Genet. 2016;99:595–606.
10. Mao D, Liu C, Wang L, et al. AI-MARRVEL - a knowledge-driven AI system for diagnosing Mendelian disorders. NEJM AI. 2024. 10.1056/aioa2300009.
11. Laurie S, Piscia D, Matalonga L, et al. The RD-Connect Genome-Phenome Analysis Platform: accelerating diagnosis, research, and gene discovery for rare diseases. Hum Mutat. 2022;43:717–33.
12. Buske OJ, Girdea M, Dumitriu S, et al. PhenomeCentral: a portal for phenotypic and genotypic matchmaking of patients with rare genetic diseases. Hum Mutat. 2015;36:931–40.
13. Smedley D, Robinson PN. Phenotype-driven strategies for exome prioritization of human Mendelian disease genes. Genome Med. 2015;7:81.
14. Belkadi A, Bolze A, Itan Y, Cobat A, Vincent QB, Antipenko A, et al. Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants. Proc Natl Acad Sci U S A. 2015;112:5473–8.
15. Danecek P, Auton A, Abecasis G, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–8.
16. Cipriani V, Pontikos N, Arno G, et al. An improved phenotype-driven tool for rare Mendelian variant prioritization: benchmarking Exomiser on real patient whole-exome data. Genes (Basel). 2020. 10.3390/genes11040460.
17. Splinter K, Adams DR, Bacino CA, et al. Effect of genetic diagnosis on patients with previously undiagnosed disease. N Engl J Med. 2018;379:2131–9.
18. Undiagnosed Diseases Network. UDN Manual of Operations. 2017. https://undiagnosed.hms.harvard.edu/research/udn-manual-of-operations/. Accessed 1 Sep 2025.
19. Deisseroth CA, Birgmeier J, Bodle EE, et al. ClinPhen extracts and prioritizes patient phenotypes directly from medical records to expedite genetic disease diagnosis. Genet Med. 2019;21:1585–93.
20. CGAP - Computational Genome Analysis Platform. https://cgap.hms.harvard.edu/. Accessed 1 Sep 2025.
21. Freed D, Aldana R, Weber JA, Edwards JS. The Sentieon Genomics Tools - a fast and accurate solution to variant calling from next-generation sequence data. bioRxiv. 2017. 10.1101/115717.
22. Kobren SN, Moldovan MA, Reimers R, et al. Joint, multifaceted genomic analysis enables diagnosis of diverse, ultra-rare monogenic presentations. Nat Commun. 2025;16:7267.
23. Girdea M, Dumitriu S, Fiume M, et al. PhenoTips: patient phenotyping software for clinical and research use. Hum Mutat. 2013;34:1057–65.
24. Van der Auwera GA, O’Connor BD. Genomics in the cloud: using Docker, GATK, and WDL in Terra. O’Reilly Media; 2020.
25. Carson AR, Smith EN, Matsui H, Brækkan SK, Jepsen K, Hansen J-B, et al. Effective filtering strategies to improve data quality from population-based whole exome sequencing studies. BMC Bioinformatics. 2014;15:125.
26. Boscolo Bielo L, Trapani D, Repetto M, Crimini E, Valenza C, Belli C, et al. Variant allele frequency: a decision-making tool in precision oncology? Trends Cancer Res. 2023;9:1058–68.
27. Kraft IL, Godley LA. Identifying potential germline variants from sequencing hematopoietic malignancies. Hematology Am Soc Hematol Educ Program. 2020;2020:219–27.
28. Smedley D, Jacobsen JO, Jäger M, Köhler S, Holtgrewe M, et al. Exomiser: a tool to annotate and prioritize exome variants. GitHub. https://github.com/exomiser/Exomiser. Accessed 1 Sep 2025.
29. Danecek P, Bonfield JK, Liddle J, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021. 10.1093/gigascience/giab008.
30. Broad Institute LiftOver. http://liftover.broadinstitute.org. Accessed 2 Sep 2025.
31. Schubach M, Maass T, Nazaretyan L, Röner S, Kircher M. CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions. Nucleic Acids Res. 2024;52:D1143–54.
32. Cooperstein I. Exomiser optimization analysis code and example files. 2025. GitHub. https://github.com/icooperstein/exomiser_optimization.
33. Zemojtel T, Köhler S, Mackenroth L, et al. Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome. Sci Transl Med. 2014;6:252ra123.
34. Smedley D, Oellrich A, Köhler S, Ruef B, Sanger Mouse Genetics Project, Westerfield M, et al. PhenoDigm: analyzing curated annotations to associate animal models with human diseases. Database. 2013;2013:bat025.
35. Bone WP, Washington NL, Buske OJ, et al. Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency. Genet Med. 2016;18:608–17.
36. Robinson PN, Köhler S, Oellrich A, et al. Improved exome prioritization of disease genes through cross-species phenotype comparison. Genome Res. 2014;24:340–8.
37. Ioannidis NM, Rothstein JH, Pejaver V, et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am J Hum Genet. 2016;99:877–85.
38. Qi H, Zhang H, Zhao Y, Chen C, Long JJ, Chung WK, et al. MVP predicts the pathogenicity of missense variants by deep learning. Nat Commun. 2021;12:510.
39. Cheng J, Novati G, Pan J, et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science. 2023;381:eadg7492.
40. Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, et al. Predicting splicing from primary sequence with deep learning. Cell. 2019;176:535–48.e24.
41. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7:248–9.
42. Schwarz JM, Cooper DN, Schuelke M, Seelow D. MutationTaster2: mutation prediction for the deep-sequencing age. Nat Methods. 2014;11:361–2.
43. Ng PC, Henikoff S. Predicting deleterious amino acid substitutions. Genome Res. 2001;11:863–74.
44. Undiagnosed Diseases Network. Undiagnosed Diseases Network (UDN) genomic data [dbGaP Study phs001232.v7.p3]. Database of Genotypes and Phenotypes (dbGaP), National Center for Biotechnology Information. 2025. https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001232.v7.p3.
45. Kobren SN. Undiagnosed Diseases Network Variant Browser. https://dbmi-bgm.github.io/udn-browser/. Accessed 1 Sep 2025.
46. UDN ClinVar Submissions. https://www.ncbi.nlm.nih.gov/clinvar/submitters/505999/. Accessed 1 Sep 2025.



Articles from Genome Medicine are provided here courtesy of BMC
