This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2025 Mar 26:2024.08.06.606224. Originally published 2024 Aug 7. [Version 2] doi: 10.1101/2024.08.06.606224

VUStruct: a compute pipeline for high throughput and personalized structural biology

Christopher W Moth 1, Jonathan H Sheehan 2, Abdullah Al Mamun 1, R Michael Sivley 3, Alican Gulsevin 4, David Rinker 5; Undiagnosed Diseases Network6, John A Capra 7, Jens Meiler 1,8,*
PMCID: PMC11326201  PMID: 39149406

Abstract

Effective diagnosis and treatment of rare genetic disorders requires the interpretation of a patient’s genetic variants of unknown significance (VUSs). Today, clinical decision-making is primarily guided by gene-phenotype association databases and DNA-based scoring methods. Our web-accessible variant analysis pipeline, VUStruct, supplements these established approaches by deeply analyzing the downstream molecular impact of variation in context of 3D protein structure. VUStruct’s growing impact is fueled by the co-proliferation of protein 3D structural models, gene sequencing, compute power, and artificial intelligence.

Contextualizing VUSs in protein 3D structural models also illuminates longitudinal genomics studies and biochemical bench research focused on VUS, and we created VUStruct for clinicians and researchers alike. We now introduce VUStruct to the broad scientific community as a mature, web-facing, extensible, High-Performance Computing (HPC) software pipeline.

VUStruct maps missense variants onto automatically selected protein structures and launches a broad range of analyses. These include energy-based assessments of protein folding and stability, pathogenicity prediction through spatial clustering analysis, and machine learning (ML) predictors of binding surface disruptions and nearby post-translational modification sites. The pipeline also considers the entire input set of VUS and identifies genes potentially involved in digenic disease.

VUStruct’s utility in clinical rare disease genome interpretation has been demonstrated through its analysis of over 175 Undiagnosed Diseases Network (UDN) patient cases. VUStruct-leveraged hypotheses have often informed clinicians in their consideration of additional patient testing, and we report here details from two cases where VUStruct was key to their solution. We also note successes with academic research collaborators, for whom VUStruct has informed research directions in both computational genomics and wet lab studies.

Introduction

Clinical diagnosis of the genetic causes of rare diseases is primarily guided by databases of known gene-phenotype associations(1) and computational methods for quantifying the effects of genetic variants. Examples of these methods include GERP, which analyzes evolutionary constraint(2,3); SIFT(4), which performs protein sequence homology analysis; and PolyPhen(5), which is additionally trained on observed and predicted protein 3D structural features. While variant effect prediction algorithms have demonstrated utility in distinguishing known pathogenic variants from benign variants across large variant sets, these algorithms suffer from low specificity. Thus, computational methods are often of limited utility for the small sets of pre-filtered variants(6) that are typically analyzed in clinical cases and other applications involving small sets of variants (bench studies of proteins and metabolic pathways, deep mutational scans, etc.)(7). The recent development of more sophisticated ML techniques and larger training data sets has increased the predictive accuracy of scoring algorithms(8,9). Nonetheless, even AlphaMissense’s scores lack reliability in cases of specific variants(8) and can exhibit high false positive rates(9). Put differently, with such high false positive rates, the statistical significance these algorithms achieve across large variant sets cannot diagnose an individual patient’s disease, nor reliably identify disruption points in a single protein or metabolic pathway.

Compounding the above caveats, computational variant effect prediction approaches reveal neither molecular nor biological mechanistic hypotheses. Instead, these tools are focused on the broad classification of mutations into pathogenic or benign, a vague partitioning with limited clinical utility. The critical biology of life unfolds in 3D space and time. Yet, scores compress this complex biology into a single number which obscures functional consequences of VUSs and their mechanisms of disease progression.

Mechanistically, variants in protein coding regions can disrupt protein function and cause diseases in various ways. As examples, amino acid substitutions can compromise the subtle energetics of protein folding and thermodynamic stability. Protein-protein interactions can be disrupted, post-translational modifications can be impeded, and metabolic networks can be broken(10).

Recently, computational protein structural analyses have demonstrated the power of mechanistic modeling of variants’ effects to reveal causes of rare disease. For example, structural modeling suggested that a de novo VUS in KCNC2 (V469L) could block the ion channel pore, impacting the stability of the protein(11). This provided a rational and foundational hypothesis for the mechanism by which V469L causes developmental and epileptic encephalopathy (DEE) symptoms. Structure-based calculations also revealed that a missense variant in MSH2 could destabilize the protein, leading to cellular protein degradation and Lynch syndrome(12). In these cases, structure-based calculations outperformed the traditionally used genetic disease predictors. The success of the MSH2 study, as well as numerous other single-gene focused analyses, informed the creation of a generalized structure-based workflow for variant classification in the clinic(13). While this workflow provides guidance for 3D structural model selection and curation, the prescribed processes require significant human input. Once selected, structures must be manually forwarded to various external webservers which perform specific calculations, the results of which must still be integrated into final reports and explored with external visualization tools.

We created VUStruct considering these successes and analytical challenges. We hypothesized that an automated pipeline of structure-based calculations could reveal clues of variant structural and functional impacts which, in turn, could lead to the plausible identification of the root causes of rare genetic disorders in many patient cases.

The goal of VUStruct is to provide robust context on the effects of a VUS on protein structure and function - context that enables the development of mechanistic clinical hypotheses about the causes of disease. VUStruct’s automated contextualization of VUSs in protein 3D structural models can also illuminate longitudinal genomics studies and biochemical bench research focused on VUS. The pipeline automatically selects structures, integrates a broad spectrum of established computational approaches, and caps the calculation with holistic case-wide reporting. In contrast to other webservers which display variants on protein structures alongside precomputed and pre-aggregated scores(14,15), VUStruct performs fresh calculations based on queries to current genomic and protein model databases. Many servers require upload of a (single) protein structure file(16) or default to Alphafold-2 models(17) covering only Uniprot(18) canonical transcripts. VUStruct expands the scope of previous methods by integrating analyses of multiple 3D structures per protein and non-canonical transcripts when available. The final computed product is a website that enables drilling-down from a top-level case report, to each transcript, to 3D structure visualization. For each 3D structure, NGLviewer(19) sessions afford not only 3D manipulation of a variant’s spatial environment, but also visualization of the proximity of known pathogenic and benign variants and evolutionary constraint within and between species (PathProx (20,21), ConSurf(22,23) and COSMIS(24)). VUStruct also investigates the potential for combinations of the VUSs to cause disease with DiGePred(25) and DIEP(26) ML algorithms trained to detect digenic disease.

Taken together, these automated and parallel calculations inform the clinic or laboratory at a 3D structural and molecular mechanistic level. VUStruct provides a compelling supplement to the insights gained from conventional genome-based scoring analysis alone, and we report the pipeline’s contribution to two UDN(27,28) patient cases.

Design and Implementation

The VUStruct computational pipeline is primarily implemented as Python code that queries and filters a wide range of pre-downloaded databases. Additional code launches and monitors the calculations, which run inside Singularity(29) containers.

Conceptually, the pipeline runs in five discrete phases following upload of a variant set via the initial web form:

  1. Optional pre-processing of human genomic coordinates

  2. 3D structure selection and compute job planning

  3. Launch of job arrays (for each variant) on the HPC

  4. Progress monitoring

  5. Report generation

Except for progress monitoring, these phases are depicted in Figure 1 and detailed below.

Figure 1.

Starting from user-provided variant genomic coordinates (top left), VUStruct identifies missense variants and maps them onto protein structures which VUStruct automatically curates from experimental depositions and model databases. Various parallel calculations are then launched on the HPC, as enumerated in the text.

1. Variant Upload and (optional) Genomic Preprocessing

VUStruct supports several input formats, which are first converted into a “vustruct.csv” pipeline-ready, comma-delimited flat file. This “VUStruct CSV” file contains gene names, transcript identifiers, amino acid variants, and parental inheritance when known. When users already know precise transcript identifiers and amino acid changes of interest, then “VUStruct CSV” format may be selected from the outset, and the pipeline will proceed directly to phase 2.
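As a minimal sketch of what reading such a flat file might look like, the Python below parses comma-delimited rows into variant records. The column names (gene, transcript, variant, inheritance) and the transcript identifiers are assumptions made for illustration; the actual “vustruct.csv” header is not specified here. The gene names and amino acid changes are taken from cases mentioned in this paper.

```python
import csv
import io
import re
from dataclasses import dataclass

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Matches simple substitutions such as "V469L": reference, position, alternate.
VARIANT_RE = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


@dataclass(frozen=True)
class MissenseVariant:
    gene: str
    transcript: str
    ref: str
    position: int
    alt: str
    inheritance: str


def parse_rows(text):
    """Parse CSV text into MissenseVariant records.

    The column names (gene, transcript, variant, inheritance) are
    hypothetical; the real vustruct.csv header may differ.
    """
    variants = []
    for row in csv.DictReader(io.StringIO(text)):
        match = VARIANT_RE.match(row["variant"].strip())
        if match is None or match.group(1) == match.group(3):
            continue  # not a simple single-residue substitution
        variants.append(
            MissenseVariant(
                gene=row["gene"].strip(),
                transcript=row["transcript"].strip(),
                ref=match.group(1),
                position=int(match.group(2)),
                alt=match.group(3),
                inheritance=row["inheritance"].strip() or "unknown",
            )
        )
    return variants


# Genes and substitutions come from the text; transcript IDs are placeholders.
EXAMPLE = """gene,transcript,variant,inheritance
KCNC2,TRANSCRIPT_PLACEHOLDER_1,V469L,de novo
RPS19,TRANSCRIPT_PLACEHOLDER_2,T55M,
"""
```

Calling `parse_rows(EXAMPLE)` yields two records, with a blank inheritance column recorded as "unknown".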

When starting from patient genetic variants (the underlying changes to alleles at chromosome positions), VUStruct converts these data to proteomic impacts. For variants loaded in VCF(30) format, parsed genomic coordinates are fed through the ENSEMBL(31) Variant Effect Predictor (VEP)(32), and missense variants are retained from the VEP output for subsequent structure selection and calculation scheduling. Since the VEP often reports impacts to many predicted transcripts which lack experimental validation, VUStruct restricts its calculations to the subset of returned genomic transcript IDs which cross-reference to Swiss-Prot curated Uniprot(18) protein IDs. Non-canonical transcripts are increasingly found in vivo as proteomics methods evolve(33), and VUStruct includes all of a gene’s curated splice variants. That is, we incorporate the curated but non-canonical sequences identified in Uniprot with additional “-N” suffixed identifiers.
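The filtering described above can be sketched as follows. This is an illustration only, not the pipeline’s code: the record layout (dictionaries with `transcript` and `consequence` keys), the comma-joined consequence string, and all identifiers are assumptions. The only term taken from VEP itself is the Sequence Ontology consequence name `missense_variant`.

```python
def select_missense(records, curated_xref):
    """Keep missense records whose transcript maps to a curated protein ID.

    records: list of dicts with hypothetical keys "transcript" and
        "consequence" (comma-separated consequence terms).
    curated_xref: dict mapping transcript ID -> curated protein ID,
        standing in for the Swiss-Prot cross-reference tables.
    """
    kept = []
    for rec in records:
        consequences = rec["consequence"].split(",")
        if "missense_variant" not in consequences:
            continue  # e.g. synonymous or intronic impacts are dropped
        protein_id = curated_xref.get(rec["transcript"])
        if protein_id is None:
            continue  # predicted transcript without a curated cross-reference
        kept.append({**rec, "protein_id": protein_id})
    return kept


# Placeholder identifiers; "-2" mimics a non-canonical "-N" suffixed isoform.
EXAMPLE_XREF = {"TX_A": "PROT_EXAMPLE", "TX_B": "PROT_EXAMPLE-2"}
EXAMPLE_RECORDS = [
    {"transcript": "TX_A", "consequence": "missense_variant"},
    {"transcript": "TX_B", "consequence": "missense_variant,splice_region_variant"},
    {"transcript": "TX_C", "consequence": "missense_variant"},    # no curated xref
    {"transcript": "TX_A", "consequence": "synonymous_variant"},  # not missense
]
```

With these inputs, `select_missense(EXAMPLE_RECORDS, EXAMPLE_XREF)` retains only the first two records, one on the canonical isoform and one on the suffixed isoform.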

A challenge in our field is that annotations to the human reference genome(34) are not static. There are relatively frequent amino acid sequence discrepancies between ENSEMBL transcript records and Swiss-Prot curated Uniprot sequences (as cross-referenced by UniParc identifiers in Uniprot’s “Id Mapping” resources). As a practical example, in early 2023, while 48,308 ENSEMBL transcripts cross-referenced to Uniprot sequences perfectly, we also found that 8,500 curated Uniprot IDs had no cross-references to any GRCh38 ENSEMBL transcript identifiers. A further 1,166 transcripts had cross-references and identical lengths in both databases, but different amino acid sequences, and 370 transcripts differed in length between ENSEMBL and Uniprot. Uniprot is constantly working to improve cross-references, and the MANE(35) collaboration is also informing the field. Today, for variants that cannot be immediately processed due to these disconnects, the pipeline reports these problems to the user and stops. This provides users with the opportunity to either rework the genomic coordinates input data, or manually download and patch the preprocessor-generated vustruct.csv file for input to phase 2.
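The four outcomes counted above (perfect match, no cross-reference, same length but different sequence, different length) can be expressed as a small classifier. The sketch below is an assumption about how such a tally could be made; the function name, category labels, and toy sequences are hypothetical and are not drawn from the pipeline.

```python
from collections import Counter


def classify_xref(ensembl_seq, uniprot_seq):
    """Classify one transcript/protein pair by sequence agreement.

    Either argument may be None when no cross-reference exists.
    """
    if ensembl_seq is None or uniprot_seq is None:
        return "no_cross_reference"
    if ensembl_seq == uniprot_seq:
        return "identical"
    if len(ensembl_seq) == len(uniprot_seq):
        return "same_length_different_sequence"
    return "different_length"


def tally_xrefs(pairs):
    """Count categories over an iterable of (ensembl_seq, uniprot_seq) pairs."""
    return Counter(classify_xref(e, u) for e, u in pairs)


# Toy sequences, invented purely to exercise each category once.
EXAMPLE_PAIRS = [
    ("MKTAY", "MKTAY"),   # identical
    ("MKTAY", "MKSAY"),   # same length, one residue differs
    ("MKTAY", "MKTAYLL"), # different length
    (None, "MKTAY"),      # curated protein without a transcript cross-reference
]
```

Only pairs in the "identical" category can be mapped without intervention; the others correspond to the cases that the pipeline reports back to the user.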

2. Structure Selection and Compute Job Planning

From the Uniprot IDs in the “vustruct.csv” file, the pipeline “plans” the set of calculations by gathering structural information for target proteins. Available experimental structures are mined from the PDB and aligned to current transcripts via the SIFTS database(36,37). SwissModel and ModBase models are integrated(38,39). For mutations on canonical transcripts, AlphaFold(40) models are added to the set of representative structures. The final structure selections minimize redundancy and maximize diversity of experimental techniques, variant coverage, model confidence, and experimental quality metrics. Multimeric complexes are also prioritized in this process.
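One simple way to trade off redundancy against diversity is to keep, for each experimental or modeling method, the best-scoring structure that covers the variant position. The sketch below illustrates that idea only; it is not VUStruct’s selection algorithm, and the `Candidate` fields, the single `quality` score, and the example values are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    structure_id: str
    method: str     # e.g. "X-ray", "cryo-EM", "predicted model"
    start: int      # first transcript residue covered
    end: int        # last transcript residue covered
    quality: float  # hypothetical unified score; higher is better


def select_structures(candidates, variant_pos):
    """Keep the best covering candidate per method, best first."""
    best = {}
    for cand in candidates:
        if not cand.start <= variant_pos <= cand.end:
            continue  # structure does not cover the variant residue
        current = best.get(cand.method)
        if current is None or cand.quality > current.quality:
            best[cand.method] = cand
    return sorted(best.values(), key=lambda c: c.quality, reverse=True)


# Invented candidates for a variant at residue 55.
EXAMPLE_CANDIDATES = [
    Candidate("xray_a", "X-ray", 1, 120, 0.80),
    Candidate("xray_b", "X-ray", 1, 120, 0.90),          # redundant, better
    Candidate("em_a", "cryo-EM", 40, 300, 0.70),
    Candidate("model_a", "predicted model", 60, 300, 0.95),  # misses residue 55
]
```

Here `select_structures(EXAMPLE_CANDIDATES, 55)` returns `xray_b` and `em_a`: one structure per method, with the non-covering model excluded. A production selector would also need to weigh multimeric state and per-residue confidence, which this sketch omits.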

Below the single directory for the user-provided Case ID, a subdirectory is created for each variant. For each retained structure, calculations are planned, and command line parameters are set for each job. These details are recorded in the workplan.csv of the variant subdirectory. Importantly, planning is entirely independent of the HPC architecture. To ensure that no job conflicts with any other, a Globally Unique Identifier (GUID)(41) is appended to each user-input Case ID, and each job is assigned a work directory in the hierarchy of VUStruct/CaseID_and GUID/Transcript ID/3D Structure Type and Id/Calculation Type/Work Directory/. A sibling /Status Directory/ is used by each running job in VUStruct to uniformly communicate progress, completion, or failure to the VUStruct monitor application (described under phase 4).
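The collision-free directory layout can be sketched with the Python standard library. The directory names `work` and `status`, the underscore separator, and the function names are assumptions chosen for illustration; only the overall hierarchy follows the description above.

```python
import uuid
from pathlib import Path


def make_case_dirname(case_id):
    """Append a freshly generated GUID to the user-supplied case ID."""
    return f"{case_id}_{uuid.uuid4()}"


def job_directories(root, case_dirname, transcript_id, structure_id, calculation):
    """Create and return the (work, status) directories for one job.

    The two directories are siblings, so a monitor can find a job's
    status next to its work area without any scheduler-specific lookup.
    """
    base = Path(root) / case_dirname / transcript_id / structure_id / calculation
    work_dir = base / "work"      # hypothetical name for the work directory
    status_dir = base / "status"  # hypothetical name for the status directory
    for directory in (work_dir, status_dir):
        directory.mkdir(parents=True, exist_ok=True)
    return work_dir, status_dir
```

Because `uuid.uuid4()` is random, two users who submit the same case ID still receive distinct top-level directories, which is the property the GUID is meant to guarantee.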

3. Job Launch

To launch the hundreds of jobs typically planned for a set of VUS (e.g., from a UDN case or from a list of variants from genome sequencing), the pipeline writes submission scripts for the supported cluster environments on the back end (either SLURM(42) or IBM LSF). Each launched job runs out of a Singularity(29) container. From the container, the bound filesystem of the HPC environment is accessible, but the application is otherwise blind to the surrounding HPC API. A single short script, external to the container architecture, launches all the HPC jobs and records assigned job numbers for downstream monitoring.
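As a hedged illustration of the SLURM case, the function below renders a job-array submission script in which each array task runs one line of a commands file inside a Singularity container. The function name, parameters, file names, and the one-command-per-line convention are assumptions; the `#SBATCH` directives, the `SLURM_ARRAY_TASK_ID` variable, and `singularity exec --bind` are standard SLURM and Singularity features.

```python
def slurm_array_script(job_name, container, commands_file, n_jobs,
                       bind_dir, time_limit="01:00:00"):
    """Return the text of a SLURM job-array script (nothing is submitted).

    Each array task reads its own line from commands_file and runs it
    inside the container, with bind_dir from the host made visible.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --array=0-{n_jobs - 1}",
        f"#SBATCH --time={time_limit}",
        # %A is the array's master job ID, %a the array task index.
        f"#SBATCH --output={job_name}_%A_%a.out",
        "",
        "# Select this task's command line (sed line numbers are 1-based).",
        f'CMD=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" {commands_file})',
        f"singularity exec --bind {bind_dir} {container} $CMD",
        "",
    ]
    return "\n".join(lines)
```

For example, `slurm_array_script("vustruct_demo", "demo.sif", "commands.txt", 3, "/data")` produces a script with `#SBATCH --array=0-2`. Keeping script generation separate from submission is one way to keep planning independent of the scheduler, so an LSF variant would only need a different renderer.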

The currently launched calculations include:

  1. Rosetta ΔΔGfolding(43–45) estimates the energetic impact of each amino acid substitution on the free energy of protein folding. These two-part calculations are stored in a repository to avoid redundant “relax” steps and save compute time.

  2. PathProx(20,21) predicts that a variant is pathogenic when its position fits better with clusters of “known pathogenic” sites (mined from ClinVar(46)) than with randomly placed sites or benign variant sites found in gnomAD(47).

  3. ScanNet(48) uses an ML algorithm to estimate the likelihood that a variant disrupts a protein-protein interaction.

  4. MusiteDeep(49) predicts protein post-translational modification (PTM) sites through a deep-learning framework.

  5. Digenic disease interactions are predicted with DiGePred(25) and DIEP(26).

Additional suggestions for the interpretation of these outputs are provided in the Supplemental Information.
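The repository idea mentioned for the ΔΔG calculations, in which the expensive “relax” step is performed once per structure and reused for every variant on that structure, can be sketched as a simple keyed cache. This is an in-memory illustration under assumed names; the `relax_fn` callable stands in for the real calculation, and nothing here calls Rosetta.

```python
class RelaxRepository:
    """Cache of relaxed structures keyed by structure ID.

    relax_fn is a placeholder for the expensive first stage of a
    two-part calculation; it is invoked at most once per structure.
    """

    def __init__(self, relax_fn):
        self._relax_fn = relax_fn
        self._store = {}
        self.relax_calls = 0  # counts how often the expensive step ran

    def get(self, structure_id):
        if structure_id not in self._store:
            self.relax_calls += 1
            self._store[structure_id] = self._relax_fn(structure_id)
        return self._store[structure_id]


def plan_variant_jobs(repository, structure_id, variants):
    """Pair one shared relaxed structure with each variant on it."""
    relaxed = repository.get(structure_id)
    return [(relaxed, variant) for variant in variants]
```

With `repo = RelaxRepository(lambda sid: f"relaxed:{sid}")`, planning jobs for F767L and F767S on the same structure triggers a single relax call. A persistent version would store results on the shared filesystem rather than in memory.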

4. Job Monitoring

Over the course of a VUStruct run, the case report is refreshed at 30-minute intervals to reflect the latest calculated data. The stdout, stderr, and .log files for each individual job are also updated.

The pipeline also informs the user of both overall and individual job progress on the cluster. In a large shared HPC environment, launched jobs are assigned unique job numbers, but do not immediately run. Traditionally, HPC users monitor job progress with a suite of HPC-provided command line tools. Through its web interface, VUStruct interfaces to these tools on the back end, and dynamically reports on job prioritization, submission delays, remaining run time, and resource allocation. These technical status updates are presented to the user via a JavaScript monitor running in the case landing page. This page receives updates from an HPC node via middleware on the web server host.
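A file-based status convention of the kind described in phase 2 lets a monitor summarize progress without querying the scheduler for every job. The sketch below assumes each job writes a one-word state into a `state.txt` file inside a directory named `status`; those names and the state vocabulary are hypothetical.

```python
from collections import Counter
from pathlib import Path


def write_state(status_dir, state):
    """Record a job's current state (e.g. 'running', 'complete', 'failed')."""
    status_dir = Path(status_dir)
    status_dir.mkdir(parents=True, exist_ok=True)
    (status_dir / "state.txt").write_text(state + "\n")


def tally_states(case_root):
    """Count job states found anywhere below a case directory."""
    counts = Counter()
    for state_file in Path(case_root).rglob("state.txt"):
        if state_file.parent.name == "status":
            counts[state_file.read_text().strip()] += 1
    return counts
```

A web-facing monitor could call `tally_states` on each refresh and combine the counts with scheduler queue information for jobs that have not yet started.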

5. Reporting

The pipeline generates a case-wide report as a landing page that combines calculated results for each transcript. As shown in Figure 2, the report also integrates queried scores from AlphaMissense(50), ConSurf(22,23) and COSMIS(24) for all the individual variants. This is followed by digenic analysis outputs.

Figure 2.

The case report landing page contains a table, in which each row summarizes the range of calculated values for each variant. Ranges arise in some calculations because multiple structures are considered for each transcript. In the case of input genomic coordinates, multiple transcript isoforms are often impacted for each variant. The “Refresh Cluster Jobs Information” box allows detailed monitoring and troubleshooting. The drawn red box shows how summary row 14 (a change to gene HFE on Chrom 6) has been expanded to display five rows for different impacted transcripts. The first row corresponds to the canonical UniProt isoform. Clicking the “Report” link for that line will display detailed calculations for this variant in the context of that transcript (see next figure).
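The per-variant ranges in the summary table arise from collapsing several per-structure values into a minimum and maximum. A minimal sketch of that aggregation follows; the tuple layout, structure labels, and numeric values are invented for illustration and are not results from any case.

```python
from collections import defaultdict


def summarize_ranges(results):
    """Collapse per-structure values into a (min, max) range per variant.

    results: iterable of (variant, structure_id, value) tuples.
    """
    values = defaultdict(list)
    for variant, _structure_id, value in results:
        values[variant].append(value)
    return {variant: (min(vals), max(vals)) for variant, vals in values.items()}


# Invented values: one variant scored on three structures, another on one.
EXAMPLE_RESULTS = [
    ("VARIANT_A", "structure_1", 1.2),
    ("VARIANT_A", "structure_2", 3.4),
    ("VARIANT_A", "structure_3", 2.0),
    ("VARIANT_B", "structure_4", -0.5),
]
```

A variant scored on a single structure yields a degenerate range whose minimum equals its maximum, which a report can display as a single value.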

From the case-wide report, the user may click into specific transcript reports (Figure 3). Clicking into a transcript report presents the user with a PFAM domain graphic(51) followed by a tabular summary of calculation results for the associated structures that were selected in step 2. The “navbar” at top left allows the user to hop to individual 3D structures, where NGL WebViewer(19) sessions are available to inspect the atomic environment of variants (Figure 4). The customized viewer also allows backbone coloring of the various calculated constraint scores, and model confidence.

Figure 3.

The top of each transcript variant report shows the variant location as a pink diamond in the context of the protein’s PFAM domain(51) annotation. This is followed by results for Rate4Site, COSMIS, MusiteDeep, and ScanNet calculations. A key table is the Structure Summary, which lists all of the structures (from the PDB, MODBASE, SWISS-MODEL database, and AlphaFold database) on which calculations were performed. For example, the highlighted row summarizes all the calculations performed on X-Ray crystal structure 1a6b.pdb, and the highlighted shortcut in the left column leads to the section of the page detailing those results (see next figure).

Figure 4.

The section of the results page devoted to each structural model shows the results of the ΔΔG and PathProx calculations, highlighted in red, along with associated statistics to judge reliability. Each structure is displayed in a customized and interactive NGL Viewer(19) session. This view can be used to understand the structural context of the variant (highlighted here with a gray text pop-up). For example, one can display all pathogenic and likely pathogenic ClinVar variants (shown as red spheres), or color the model according to AlphaFold confidence or by PathProx score (from low, in blue, to high, in red, as shown above). Figures to illustrate a structural mechanistic hypothesis can be generated quickly from these images.

The downstream audience for VUStruct case reports is broader than the structural biologists trained to interpret the pipeline’s detailed outputs. Typically, that final audience includes clinicians and geneticists who are primarily interested in whether VUStruct identifies a candidate gene for ongoing consideration, and how pipeline outputs, at high level, inform that recommendation. To communicate the high-level findings of VUStruct succinctly, VUStruct drafts a case summary spreadsheet (Supplemental Information Figure 1). The Supplemental Information also suggests approaches to communicating with clinical partners and includes advice on calculation interpretation.

Dependencies

VUStruct integrates several externally sourced databases. So that the pipeline can run responsively, and avoid vulnerability to external outages, the supporting databases are locally downloaded, installed, and maintained. The two support pillars of VUStruct are the ENSEMBL GRCh38 PERL API and the UniProt id mapping file. We locally import ENSEMBL’s SQL database, and additionally load UniProt(18) cross-references into SQL tables to speed sequence cross-references between genome and proteome. BASH scripts are additionally provided to aid download of Clinvar(46), COSMIC(52), and gnomAD(47) databases which are mined for PathProx’s mathematical spatial analysis and for web-based visualizations. Several of our predictive calculations integrate sequence constraint, gleaned from both multi-species sequence alignments(22,23) and human population sequences(24). These calculations, along with AlphaMissense(50) predictions, are downloaded as transcriptome-wide precomputations, and are integrated into final reports without the need for cluster launches.

Cited calculations are deployed inside Singularity containers. Deployment of Rosetta ΔΔGfolding(43–45) Cartesian and Monomer calculations requires a free academic or paid commercial license from rosettacommons.org.

Results/Application of VUStruct in the interpretation of clinical data:

We have demonstrated the VUStruct pipeline’s utility in the interpretation of genetic VUS in collaboration with colleagues from the Vanderbilt UDN. The containerized VUStruct software pipeline has been applied to over 150 Vanderbilt UDN patients and 25 Washington University patients. The pipeline provides researchers and clinician geneticists with insights into candidate missense variants in the context of 3D protein atomic structure. In contrast to the many algorithms and websites that perform a single calculation on a single protein variant on a single protein structure, VUStruct is holistic and automated. Our pipeline analyzes a set of patient genetic VUSs and unifies the results under a case-wide report page. VUStruct is also noteworthy for its principled selection of appropriate structures among the growing wealth of available experimental and computational structural models, automated calculation setup and launch, and progress monitoring.

As one illustration of VUStruct’s potential to aid hypothesis generation, we highlight a patient with PASNA syndrome caused by a heterozygous variant in the CACNA1D gene, which encodes a human L-type voltage-gated calcium channel (Cav). Several candidate variants were selected from the patient’s genome sequencing (GS) data based on phenotype analysis. These variants were submitted to the pipeline, and the 3D structure of the corresponding protein was analyzed by different computational methods, including Rosetta ΔΔG(43–45), protein-protein interaction (PPI), post-translational modification (PTM), and digenic (DiGePred) prediction analyses. VUStruct reported that the F767L variant in CACNA1D results in structural destabilization, as evidenced by its Rosetta ΔΔG score. Starting from the VUStruct report, we hypothesized that the variant may contribute to PASNA syndrome and conducted additional Rosetta simulations on the Cav structural model. In follow-up, ΔΔG was calculated in Rosetta for two Cav variants, F767L and F767S, using the closed-state conformation (PDB ID: 7UHG(53)). F767S is a known pathogenic variant that acts through a gain-of-function mechanism, and it was used as a positive control for this study. The higher calculated ΔΔG of F767L (~5.3 Rosetta Energy Units) vs. F767S (~3.9 R.E.U.) suggested that F767L could cause at least as much structural disruption of the closed-state conformation as the known pathogenic variant F767S. Thus, we hypothesized that these variants destabilize the closed state and push the conformational equilibrium towards the open state of the channel. The search for this crucial finding began with VUStruct analysis and led to further confirmatory analysis to diagnose the possible cause of the PASNA syndrome(54).

A second demonstration of VUStruct’s utility was aiding a diagnosis of Diamond-Blackfan anemia (DBA) in a case which could not be explained by simple Mendelian inheritance. The VUStruct report suggested that a missense variant in the RPS19 gene results in a slight stabilization, based on Rosetta ΔΔG. In addition, the proband carried another variant in the RPL27 gene, which DiGePred(25) and DIEP(26) analysis predicted to have a strong digenic interaction with RPS19. These clues helped to focus further structural analysis. We investigated different 80S ribosome structures available in the Protein Data Bank. Although RPS19 and RPL27 are on opposite sides of the complex, it is plausible that T55M in RPS19 changes allosteric interactions between the two proteins, disrupting 80S ribosome function. These structural analyses inspired further co-segregation and RNA sequencing analysis of the proband. These follow-up analyses suggested that the proband’s DBA is caused by the digenic interaction between RPS19 and RPL27(55).

Availability and Future Directions

The website is made available to all, without condition. For those wishing to set up their own pipeline environment, all our code and containers (with one exception) are licensed under the MIT License and can be downloaded from https://github.com/meilerlab/VUStruct. The one exception is the Rosetta ΔΔG module containers, which require Rosetta Commons licensing, available at no charge to academic users.

VUStruct development is continuously fueled by ongoing explosions in available protein 3D structures, genome sequencing, computer power, and artificial intelligence. We are committed to the pipeline’s flexibility and continuous improvement.

One current pipeline limitation is that all calculations are based on sets of single structural models, and the implications of dynamics are not presently considered. Multi-conformer generation is an active area of research(56). We plan to integrate that work into the pipeline, so that more conformational states are sampled. Additionally, we are integrating AlphaFold models for non-canonical transcript sequences(57). Pending its public opening, we hope to mine the AlphaFold 3 repository for its updated structural coverage that includes multimeric complexes(58). Predictions of digenic interactions should benefit from model retraining, given the emergence of new ground truth data sets(59).

Supplementary Material

Supplement 1
media-1.pdf (476.3KB, pdf)

UDN Consortium

Full Name Affiliation Email
Alyssa A. Tran BCM Clinical alyssat@bcm.edu
Arjun Tarakad BCM Clinical tarakad@bcm.edu
Ashok Balasubramanyam BCM Clinical ashokb@bcm.edu
Brendan H. Lee BCM Clinical blee@bcm.edu
Carlos A. Bacino BCM Clinical cbacino@bcm.edu
Daryl A. Scott BCM Clinical dscott@bcm.edu
Elaine Seto BCM Clinical esseto@bcm.edu
Gary D. Clark BCM Clinical gdclark@texaschildrens.or
Hongzheng Dai BCM Clinical Hongzheng.Dai@bcm.edu
Hsiao-Tuan Chao BCM Clinical hc140077@bcm.edu
Ivan Chinn BCM Clinical Ivan.Chinn@bcm.edu
James P. Orengo BCM Clinical james.orengo@bcm.edu
Jennifer E. Posey BCM Clinical Jennifer.Posey@bcm.edu
Jill A. Rosenfeld BCM Clinical mokry@bcm.edu
Kim Worley BCM Clinical kworley@bcm.edu
Lindsay C. Burrage BCM Clinical burrage@bcm.edu
Lisa T. Emrick BCM Clinical emrick@bcm.edu
Lorraine Potocki BCM Clinical lpotocki@bcm.edu
Monika Weisz Hubshman BCM Clinical hubshman@bcm.edu
Richard A. Lewis BCM Clinical rlewis@bcm.edu
Ronit Marom BCM Clinical ronit.marom@bcm.edu
Seema R. Lalani BCM Clinical seemal@bcm.edu
Shamika Ketkar BCM Clinical ketkar@bcm.edu
Tiphanie P. Vogel BCM Clinical tiphanie.vogel@bcm.edu
William J. Craigen BCM Clinical wcraigen@bcm.edu
Lauren Blieden BCM Clinical Lauren.Blieden@bcm.edu
Jared Sninsky BCM Clinical Jared.Sninsky@bcm.edu
Hugo J. Bellen BCM MOSC hbellen@bcm.edu
Michael F. Wangler BCM MOSC mw147467@bcm.edu
Oguz Kanca BCM MOSC Oguz.Kanca@bcm.edu
Shinya Yamamoto BCM MOSC yamamoto@bcm.edu
Christine M. Eng BCM Sequencing ceng@bcm.edu
Patricia A. Ward BCM Sequencing pward@bcm.edu
Pengfei Liu BCM Sequencing pliu@baylorgenetics.com
Adeline Vanderver CHOP vandervera@chop.edu
Cara Skraban CHOP skrabanc@chop.edu
Edward Behrens CHOP behrens@chop.edu
Gonench Kilich CHOP kilichg@chop.edu
Kathleen Sullivan CHOP sullivank@chop.edu
Kelly Hassey CHOP hasseyk@chop.edu
Ramakrishnan Rajagopalan CHOP rajagopalanr@chop.edu
Rebecca Ganetzky CHOP ganetzkyr@chop.edu
Vishnu Cuddapah CHOP cuddapahv@chop.edu
Anna Raper CHOP/UPenn rapera@pennmedicine.up
Daniel J. Rader CHOP/UPenn rader@pennmedicine.upe
Giorgio Sirugo CHOP/UPenn Giorgio.Sirugo@pennmedi
Vaidehi Jobanputra Columbia vj2004@cumc.columbia.edu
Allyn McConkie-Rosell Duke allyn.mcconkie@duke.edu
Kelly Schoch Duke kelly.schoch@duke.edu
Mohamad Mikati Duke mohamad.mikati@duke.edu
Nicole M. Walley Duke nicole.walley@duke.edu
Rebecca C. Spillmann Duke rebecca.crimian@duke.ed
Vandana Shashi Duke vandana.shashi@duke.edu
Alan H. Beggs Harvard beggs@enders.tch.harvar
Calum A. MacRae Harvard camacrae@bics.bwh.harva
David A. Sweetser Harvard dsweetser@partners.org
Deepak A. Rao Harvard darao@bwh.harvard.edu
Edwin K. Silverman Harvard ed.silverman@channing.h
Elizabeth L. Fieg Harvard efieg@bwh.harvard.edu
Frances High Harvard fhigh@partners.org
Gerard T. Berry Harvard gerard.berry@childrens.ha
Ingrid A. Holm Harvard ingrid.holm@childrens.har
J. Carl Pallais Harvard Juan.Pallais@mgh.harvard
Joan M. Stoler Harvard joan.stoler@childrens.harv
Joseph Loscalzo Harvard jloscalzo@partners.org
Lance H. Rodan Harvard lance.rodan@childrens.ha
Laurel A. Cobban Harvard lcobban@bwh.harvard.ed
Lauren C. Briere Harvard lbriere@partners.org
Matthew Coggins Harvard mcoggins@bwh.harvard.e
Melissa Walker Harvard walker.melissa@mgh.harv
Richard L. Maas Harvard maas@genetics.med.harv
Susan Korrick Harvard skorrick@bwh.harvard.edu
Jessica Douglas Harvard Jessica.Douglas@childrens
AudreyStephannie C. Maghiro Harvard DMCC audreystephannie_maghir
Cecilia Esteves Harvard DMCC cecilia_esteves@hms.harv
Emily Glanton Harvard DMCC Emily_Glanton@hms.harv
Isaac S. Kohane Harvard DMCC isaac_kohane@hms.harva
Kimberly LeBlanc Harvard DMCC kimberly_leblanc@hms.ha
Rachel Mahoney Harvard DMCC rachel_mahoney@hms.ha
Shamil R. Sunyaev Harvard DMCC ssunyaev@hms.harvard.e
Shilpa N. Kobren Harvard DMCC Shilpa_Kobren@hms.harv
Brett H. Graham IU bregraha@iu.edu
Erin Conboy IU econboy@iu.edu
Francesco Vetrini IU fvetrini@iu.edu
Kayla M. Treat IU ktreat@iuhealth.org
Khurram Liaqat IU kliaqat@iu.edu
Lili Mantcheva IU lmantche@iu.edu
Stephanie M. Ware IU stware@iu.edu
Breanna Mitchell Mayo Clinic Mitchell.Breanna@mayo.e
Brendan C. Lanpher Mayo Clinic lanpher.brendan@mayo.e
Devin Oglesbee Mayo Clinic oglesbee.devin@mayo.ed
Eric Klee Mayo Clinic klee.eric@mayo.edu
Filippo Pinto e Vairo Mayo Clinic vairo.filippo@mayo.edu
Ian R. Lanza Mayo Clinic lanza.ian@mayo.edu
Kahlen Darr Mayo Clinic Darr.Kahlen@mayo.edu
Lindsay Mulvihill Mayo Clinic mulvihill.lindsay@mayo.e
Lisa Schimmenti Mayo Clinic Schimmenti.Lisa@mayo.ed
Queenie Tan Mayo Clinic Tan.KhoonGheeQueenie@
Surendra Dasari Mayo Clinic dasari.surendra@mayo.e
Adriana Rebelo Miami arebelo@med.miami.edu
Carson A. Smith Miami carsonsmith@med.miami.
Deborah Barbouth Miami dbarbouth@miami.edu
Guney Bademci Miami g.bademci@miami.edu
Joanna M. Gonzalez Miami jmg442@miami.edu
Kumarie Latchman Miami kxl604@med.miami.edu
LéShon Peart Miami L.peart@med.miami.edu
Mustafa Tekin Miami mtekin@miami.edu
Nicholas Borja Miami nborja@med.miami.ed
Stephan Zuchner Miami szuchner@miami.edu
Stephanie Bivona Miami sab355@miami.edu
Willa Thorson Miami wthorson@miami.edu
Herman Taylor Morehouse DMCC htaylor@msm.edu
Andrea Gropman NIH UDP agropman@childrensnatic
Barbara N. Pusey Swerdzewski NIH UDP barbara.pusey@nih.gov
Camilo Toro NIH UDP toroc@mail.nih.gov
Colleen E. Wahl NIH UDP colleen.wahl@nih.gov
Donna Novacic NIH UDP donna.novacic@nih.gov
Ellen F. Macnamara NIH UDP ellen.macnamara@nih.gov
John J. Mulvihill NIH UDP johmulvihill@gmail.com
Maria T. Acosta NIH UDP acostam@nhgri.nih.gov
Precilla D’Souza NIH UDP precilla.d’souza@nih.gov
Valerie V. Maduro NIH UDP vbraden@mail.nih.gov
Ben Afzali NIH UDP, NHGRI ben.afzali@nih.gov
Ben Solomon NIH UDP, NHGRI solomonb@mail.nih.gov
Cynthia J. Tifft NIH UDP, NHGRI ctifft@nih.gov
David R. Adams NIH UDP, NHGRI david.adams@nih.gov
Elizabeth A. Burke NIH UDP, NHGRI elizabeth.burke2@nih.gov
Francis Rossignol NIH UDP, NHGRI francis.rossignol@nih.gov
Heidi Wood NIH UDP, NHGRI heidi.wood@nih.gov
Jiayu Fu NIH UDP, NHGRI fuj6@mail.nih.gov
Joie Davis NIH UDP, NHGRI jdavis@niaid.nih.gov
Leoyklang Petcharet NIH UDP, NHGRI petcharat.leoyklang@nih.g
Lynne A. Wolfe NIH UDP, NHGRI lynne.wolfe@nih.gov
Margaret Delgado NIH UDP, NHGRI margaret.delgado@nih.go
Marie Morimoto NIH UDP, NHGRI marie.morimoto@nih.gov
Marla Sabaii NIH UDP, NHGRI marla.sabaii@nih.gov
MayChristine V. Malicdan NIH UDP, NHGRI maychristine.malicdan@ni
Neil Hanchard NIH UDP, NHGRI neil.hanchard@nih.gov
Orpa Jean-Marie NIH UDP, NHGRI orpa.jean-marie@nih.gov
Wendy Introne NIH UDP, NHGRI wintrone@nhgri.nih.gov
William A. Gahl NIH UDP, NHGRI gahlw@mail.nih.gov
Yan Huang NIH UDP, NHGRI yan.huang@nih.gov
Aimee Allworth PNW allwoa@uw.edu
Andrew Stergachis PNW absterga@uw.edu
Danny Miller PNW Danny.Miller@seattlechild
Elizabeth Blue PNW em27@uw.edu
Elizabeth Rosenthal PNW erosen@uw.edu
Elsa Balton PNW ebalton@medicine.washin
Emily Shelkowitz PNW
Eric Allenspach PNW eric.allenspach@seattlech
Fuki M. Hisama PNW fmh2@uw.edu
Gail P. Jarvik PNW pair@uw.edu
Ghayda Mirzaa PNW gmirzaa@uw.edu
Ian Glass PNW ianglass@uw.edu
Kathleen A. Leppig PNW leppig@uw.edu
Katrina Dipple PNW katrina.dipple@seattlechil
Mark Wener PNW wener@uw.edu
Martha Horike-Pyne PNW mpyne@medicine.washing
Michael Bamshad PNW mbamshad@uw.edu
Peter Byers PNW pbyers@uw.edu
Sam Sheppeard PNW samshep@uw.edu
Sirisak Chanprasert PNW sirisc@uw.edu
Virginia Sybert PNW flk01@uw.edu
Wendy Raskind PNW wendyrun@uw.edu
Nitsuh K. Dargie PNW nitsuhk@medicine.washin
Beth A. Martin Stanford martinb@stanford.edu
Chloe M. Reuter Stanford creuter@stanfordhealthca
Devon Bonner Stanford devonbonner@stanfordhe
Elijah Kravets Stanford ekravets@stanford.edu
Holly K. Tabor Stanford hktabor@stanford.edu
Jacinda B. Sampson Stanford jacindas@stanford.edu
Jason Hom Stanford jasonhom@stanford.edu
Jennefer N. Kohler Stanford jkohler@stanfordhealthca
Jonathan A. Bernstein Stanford Jon.Bernstein@stanford.e
Kevin S. Smith Stanford kssmith@stanford.edu
Matthew T. Wheeler Stanford wheelerm@stanford.edu
Meghan C. Halley Stanford mhalley@stanford.edu
Page C. Goddard Stanford pgoddard@stanford.edu
Paul G. Fisher Stanford pfisher@stanford.edu
Rachel A. Ungar Stanford raungar@stanford.edu
Raquel L. Alvarez Stanford raquela1@stanford.edu
Shruti Marwaha Stanford mshruti@stanford.edu
Terra R. Coakley Stanford tcoakley@stanford.edu
Euan A. Ashley Stanford DMCC Euan@stanford.edu
Ali Al-Beshri UAB asabeshri@uabmc.edu
Anna Hurst UAB acehurst@uab.edu
Bruce Korf UAB bkorf@uab.uabmc.edu
Kaitlin Callaway UAB kcallaway@uabmc.edu
Martin Rodriguez UAB rodriguez@uabmc.edu
Tammi Skelton UAB tlskelton@uabmc.edu
Andrew B. Crouse UAB DMCC acrouse@uab.edu
Jordan Whitlock UAB DMCC jbarham3@uab.edu
Mariko Nakano-Okuno UAB DMCC marikonk@uab.edu
Matthew Might UAB DMCC might@uab.edu
William E. Byrd UAB DMCC webyrd@gmail.com
Changrui Xiao UCI/CHOC changrx@hs.uci.edu
Eric Vilain UCI/CHOC evilain@hs.uci.edu
Jose Abdenur UCI/CHOC JAbdenur@choc.org
Kathyrn Singh UCI/CHOC kesingh@hs.uci.edu
Rebekah Barrick UCI/CHOC rebekah.barrick@choc.org
Sanaz Attaripour UCI/CHOC sattarip@hs.uci.edu
Suzanne Sandmeyer UCI/CHOC sbsandme@hs.uci.edu
Tahseen Mozaffar UCI/CHOC mozaffar@hs.uci.edu
Albert R. La Spada UCI/CHOC alaspada@uci.edu
Elizabeth C. Chao UCI/CHOC ecchao@uci.edu
Maija-Rikka Steenari UCI/CHOC msteenari@choc.org
Alden Huang UCLA AYHuang@mednet.ucla.ed
Brent L. Fogel UCLA bfogel@ucla.edu
Esteban C. Dell'Angelica UCLA edellangelica@mednet.uc
George Carvalho UCLA GCarvalhoNeto@mednet.
Julian A. Martínez-Agosto UCLA julianmartinez@mednet.u
Manish J. Butte UCLA mbutte@mednet.ucla.edu
Martin G. Martin UCLA mmartin@mednet.ucla.ed
Naghmeh Dorrani UCLA ndorrani@mednet.ucla.ed
Neil H. Parker UCLA nhparker@mednet.ucla.ed
Rosario I. Corona UCLA rcoronadelafuente@medn
Stanley F. Nelson UCLA snelson@ucla.edu
Yigit Karasozen UCLA Ykarasozen@mednet.ucla
Aaron Quinlan University of Utah aquinlan@genetics.utah.e
Alistair Ward University of Utah alistairnward@gmail.com
Ashley Andrews University of Utah ashley.andrews@hsc.utah
Corrine K. Welt University of Utah cwelt@u2m2.utah.edu
Dave Viskochil University of Utah dave.viskochil@hsc.utah.e
Erin E. Baldwin University of Utah erin.baldwin@hsc.utah.ed
John Carey University of Utah john.carey@hsc.utah.edu
Justin Alvey University of Utah justin.alvey@hsc.utah.edu
Laura Pace University of Utah laura.pace@hsc.utah.edu
Lorenzo Botto University of Utah lorenzo.botto@hsc.utah.e
Nicola Longo University of Utah nicola.longo@hsc.utah.ed
Paolo Moretti University of Utah paolo.moretti@hsc.utah.e
Rebecca Overbury University of Utah rebecca.overbury@hsc.uta
Russell Butterfield University of Utah russell.butterfield@hsc.ut
Steven Boyden University of Utah steven.boyden@genetics.u
Thomas J. Nicholas University of Utah thomas.nicholas@utah.ed
Matt Velinder University of Utah mvelinder@frameshift.io
Gabor Marth University of Utah DMCC gmarth@genetics.utah.ed
Pinar Bayrak-Toydemir University of Utah/ARUP pinar.bayrak-toydemir@ar
Rong Mao University of Utah/ARUP rong.mao@aruplab.com
Monte Westerfield UO MOSC monte@uoneuro.uoregon
Brian Corner Vanderbilt brian.corner@vumc.org
John A. Phillips III Vanderbilt John.a.phillips@vumc.org
Kimberly Ezell Vanderbilt kimberly.ezell@vumc.org
Lynette Rives Vanderbilt lynette.c.rives@vumc.org
Rizwan Hamid Vanderbilt rizwan.hamid@vumc.org
Serena Neumann Vanderbilt serena.neumann@vumc.o
Ashley McMinn Vanderbilt ashley.mcminn@vumc.org
Joy D. Cogan Vanderbilt joy.cogan@vumc.org
Thomas Cassini Vanderbilt thomas.a.cassini@vumc.or
Alex Paul WUSTL Clinical alex.paul@wustl.edu
Dana Kiley WUSTL Clinical dana.kiley@wustl.edu
Daniel Wegner WUSTL Clinical danieljwegner@wustl.edu
Erin McRoy WUSTL Clinical e.hediger@wustl.edu
Jennifer Wambach WUSTL Clinical wambachj@wustl.edu
Kathy Sisco WUSTL Clinical siscok@wustl.edu
Patricia Dickson WUSTL Clinical pdickson@wustl.edu
F. Sessions Cole WUSTL DMCC fcole@wustl.edu
Dustin Baldridge WUSTL MOSC dbaldri@wustl.edu
Jimann Shin WUSTL MOSC shinji@wustl.edu
Lilianna Solnica-Krezel WUSTL MOSC solnical@wustl.edu
Stephen Pak WUSTL MOSC stephen.pak@email.wustl.
Timothy Schedl WUSTL MOSC ts@wustl.edu
Hector Rodrigo Mendez Stanford mendezh@stanford.edu
Brianna Tucker Stanford bmtucker@stanford.edu
Beatriz Anguiano Stanford banguian@stanford.edu
Mia Levanto Stanford mlevanto@stanford.edu
Suha Bachir Stanford sbachir@stanford.edu
Laurens Wiel Stanford lvdwiel@stanford.edu
Stephen B Montgomery Stanford smontgom@stanford.edu
Tanner D Jensen Stanford tannerj@stanford.edu
John E. Gorzynski Stanford jgorz@stanford.edu
Sara Emami Stanford slemami@stanford.edu
Laura Keehan Stanford keehan@stanford.edu
Jennifer Schymick Stanford jennifer.schymick@hhs.scc
Taylor Maurer Stanford maurertm@stanford.edu
Alexander Miller Stanford atex91@stanford.edu
Andres Vargas UCLA AndresVargas@mednet.uc
Amanda M. Shrewsbury UCLA ashrewsbury@mednet.ucl
Bianca E. Russell UCLA berussell@mednet.ucla.ed
Layal F. Abi Farraj UCLA LAbiFarraj@mednet.ucla.e
Elizabeth A Worthey UAB eaworthey@uabmc.edu
Tarun KK Mamidi UAB tmamidi@uab.edu
Brandon M Wilk UAB brandonwilk@uabmc.edu
Rachel Li Sanford Rachel.Li@SanfordHealth.
Jennifer Morgan Sanford Jennifer.Morgan@Sanford
Chun-Hung Chan Sanford Chun-Hung.Chan@Sanfor
Paul Berger Sanford Paul.berger@sanfordhealt
Mohamad Saifeddine Sanford Mohamad.Saifeddine@Sa
Isum Ward Sanford Isum.Ward@SanfordHealt
Jason Schend Sanford Jason.Schend@SanfordHe
Megan Bell Sanford Megan.bell@sanfordhealt
Dr. Francisco Bustos velasq Sanford Francisco.Bustos@sanford
Taylor Beagle Sanford Taylor.Beagle@SanfordHe
Miranda Leitheiser Sanford Miranda.Leitheiser@Sanf
Runjun Kumar WUSTL Clinical rdkumar@uw.edu
Donald Basel MCW-CW dbasel@mcw.edu
Michael Muriello MCW-CW mmuriello@mcw.edu
Brett Bordini MCW-CW bbordini@mcw.edu
Michael Zimmermann MCW-CW mtzimmermann@mcw.ed
Abdul Elkadri MCW-CW AElKadri@mcw.edu
James Verbsky MCW-CW jverbsky@mcw.edu
Julie McCarrier MCW-CW jmccarrier@mcw.edu

Support and Thanks

This work used resources provided by the Vanderbilt Advanced Computing Center for Research and Education (ACCRE), a collaboratory operated by and for Vanderbilt faculty. ACCRE comprises over 3,000 researchers from more than 40 campus departments.

The pipeline would not have been possible without the energetic and helpful support of staff at UniProt, Ensembl, SWISS-MODEL, and ModBase.

Grants

This work has been supported by NIH grant R01 LM013434-04.

J.M. acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG) through SFB1423 (421152132), SFB 1052 (209933838), and SPP 2363 (460865652). J.M. is supported by a Humboldt Professorship of the Alexander von Humboldt Foundation. J.M. is supported by BMBF (Federal Ministry of Education and Research) through the Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI). This work is partly supported by the Federal Ministry of Education and Research (BMBF) through DAAD project 57616814 (SECAI, School of Embedded Composite AI). Work in the Meiler laboratory is further supported through the NIH (R01 HL122010, R01 DA046138, R01 AG068623, U01 AI150739, R01 CA227833, R01 LM013434, S10 OD016216, S10 OD020154, S10 OD032234). This work was supported by the BMBF-funded German Network for Bioinformatics Infrastructure (de.NBI).

Research reported in this publication was supported by the National Institute of Neurological Disorders and Stroke of the National Institutes of Health under Award Numbers U01HG007674, U01NS134349, U01HG010215, and U01NS134354. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Supplementary Materials

Supplement 1
media-1.pdf (476.3KB, pdf)

Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
