VUStruct: a compute pipeline for high throughput and personalized structural biology

Christopher W Moth; Jonathan H Sheehan; Abdullah Al Mamun; R Michael Sivley; Alican Gulsevin; David Rinker; Undiagnosed Diseases Network; John A Capra; Jens Meiler

doi:10.1101/2024.08.06.606224

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2025 Mar 26:2024.08.06.606224. Originally published 2024 Aug 7. [Version 2] doi: 10.1101/2024.08.06.606224

VUStruct: a compute pipeline for high throughput and personalized structural biology

Christopher W Moth ¹, Jonathan H Sheehan ², Abdullah Al Mamun ¹, R Michael Sivley ³, Alican Gulsevin ⁴, David Rinker ⁵; Undiagnosed Diseases Network⁶, John A Capra ⁷, Jens Meiler ^1,^8,^*

PMCID: PMC11326201 PMID: 39149406

Abstract

Effective diagnosis and treatment of rare genetic disorders requires the interpretation of a patient’s genetic variants of unknown significance (VUSs). Today, clinical decision-making is primarily guided by gene-phenotype association databases and DNA-based scoring methods. Our web-accessible variant analysis pipeline, VUStruct, supplements these established approaches by deeply analyzing the downstream molecular impact of variation in context of 3D protein structure. VUStruct’s growing impact is fueled by the co-proliferation of protein 3D structural models, gene sequencing, compute power, and artificial intelligence.

Contextualizing VUSs in protein 3D structural models also illuminates longitudinal genomics studies and biochemical bench research focused on VUS, and we created VUStruct for clinicians and researchers alike. We now introduce VUStruct to the broad scientific community as a mature, web-facing, extensible, High-Performance Computing (HPC) software pipeline.

VUStruct maps missense variants onto automatically selected protein structures and launches a broad range of analyses. These include energy-based assessments of protein folding and stability, pathogenicity prediction through spatial clustering analysis, and machine learning (ML) predictors of binding surface disruptions and nearby post-translational modification sites. The pipeline also considers the entire input set of VUS and identifies genes potentially involved in digenic disease.

VUStruct’s utility in clinical rare disease genome interpretation has been demonstrated through its analysis of over 175 Undiagnosed Disease Network (UDN) Patient cases. VUStruct-leveraged hypotheses have often informed clinicians in their consideration of additional patient testing, and we report here details from two cases where VUStruct was key to their solution. We also note successes with academic research collaborators, for whom VUStruct has informed research directions in both computational genomics and wet lab studies.

Introduction

Clinical diagnosis of the genetic causes of rare diseases is primarily guided by databases of known gene-phenotype associations(1) and computational methods for quantifying the effects of genetic variants. Examples of these methods include GERP, which analyzes evolutionary constraint(2,3); SIFT(4) which performs protein sequence homology analysis; and Polyphen(5), which is additionally trained on observed and predicted protein 3D structural features. While variant effect prediction algorithms have demonstrated utility in distinguishing known pathogenic variants from benign variants across large variant sets, these algorithms suffer from low specificity. Thus, computational methods are often of limited utility for the small sets of pre-filtered variants(6) that are typically analyzed in clinical cases and other applications involving small sets of variants (bench studies of proteins and metabolic pathways, deep mutational scans, etc.)(7). The recent development of more sophisticated ML techniques and larger training data sets has increased the predictive accuracy of scoring algorithms(8,9). Nonetheless, even AlphaMissense’s scores lack reliability in cases of specific variants(8) and can exhibit high false positive rates(9). I.e., with such a high false positive rate, the touted longitudinal statistical significance of these algorithms cannot diagnose an individual patient’s disease, nor reliably identify disruption points in a single protein or metabolic pathway.

Compounding the above caveats, computational variant effect prediction approaches reveal neither molecular nor biological mechanistic hypotheses. Instead, these tools are focused on the broad classification of mutations into pathogenic or benign, a vague partitioning with limited clinical utility. The critical biology of life unfolds in 3D space and time. Yet, scores compress this complex biology into a single number which obscures functional consequences of VUSs and their mechanisms of disease progression.

Mechanistically, variants in protein coding regions can disrupt protein function and cause diseases in various ways. As examples, amino acid substitutions can compromise the subtle energetics of protein folding and thermodynamic stability. Protein-protein interactions can be disrupted, post-translational modifications can be impeded, and metabolic networks can be broken(10).

Recently, computational protein structural analyses have demonstrated the power of mechanistic modeling of variants’ effects to reveal causes of rare disease. For example, structural modeling suggested that a de-novo VUS in KCNC2 (V469L) could block the ion channel pore, impacting the stability of the protein(11). This provided a rational and foundational hypothesis for the mechanism by which V469L causes developmental and epileptic encephalopathies (DEE) symptoms. Structure-based calculations also revealed that a missense variant in MSH2 could destabilize the protein, leading to cellular protein degradation and Lynch-syndrome disorder(12). In these cases, structure-based calculations outperformed the traditionally used genetic disease predictors. The success of the MSH2 study, as well as numerous other single-gene focused analyses, informed the creation of a generalized structure-based workflow for variant classification in the clinic(13). While this workflow provides guidance for 3D structural model selection and curation, the prescribed processes require significant human input. Once selected, structures must be manually forwarded to various external webservers which perform specific calculations, the results of which must still be integrated into final reports and explored with external visualization tools.

We created VUStruct considering these successes and analytical challenges. We hypothesized that an automated pipeline of structure-based calculations could reveal clues of variant structural and functional impacts which, in turn, could lead to the plausible identification of the root causes of rare genetic disorders in many patient cases.

The goal of VUStruct is to provide robust context on the effects of a VUS on protein structure and function - context that enables the development of mechanistic clinical hypotheses about the causes of disease. VUStruct’s automated contextualization of VUSs in protein 3D structural models can also illuminate longitudinal genomics studies and biochemical bench research focused on VUS. The pipeline automatically selects structures, integrates a broad spectrum of established computational approaches, and caps the calculation with holistic case-wide reporting. In contrast to other webservers which display variants on protein structures alongside precomputed and pre-aggregated scores(14,15), VUStruct performs fresh calculations based on queries to current genomic and protein model databases. Many servers require upload of a (single) protein structure file(16) or default to Alphafold-2 models(17) covering only Uniprot(18) canonical transcripts. VUStruct expands the scope of previous methods by integrating analyses of multiple 3D structures per protein and non-canonical transcripts when available. The final computed product is a website that enables drilling-down from a top-level case report, to each transcript, to 3D structure visualization. For each 3D structure, NGLviewer(19) sessions afford not only 3D manipulation of a variant’s spatial environment, but also visualization of the proximity of known pathogenic and benign variants and evolutionary constraint within and between species (PathProx (20,21), ConSurf(22,23) and COSMIS(24)). VUStruct also investigates the potential for combinations of the VUSs to cause disease with DiGePred(25) and DIEP(26) ML algorithms trained to detect digenic disease.

Taken together, these automated and parallel calculations inform the clinic or laboratory at a 3D structural and molecular mechanistic level. VUStruct provides a compelling supplement to the insights gained from conventional genome-based scoring analysis alone, and we report the pipeline’s contribution to two UDN(27,28) patient cases.

Design and Implementation

The VUStruct computational pipeline is primarily implemented as Python codes which query and filter a wide range of pre-downloaded databases. Additional code launches and monitors the calculations which run inside Singularity(29) containers.

Conceptually, the pipeline runs in five discrete phases following upload of a variant set via the initial web form:

Optional pre-processing of human genomic coordinates
3D structure selection and compute job planning
Launch of job arrays (for each variant) on the HPC
Progress monitoring
Report generation

Except for progress monitoring, these phases are depicted in Figure 1 and detailed below.

1. Variant Upload and (optional) Genomic Preprocessing

VUStruct supports several input formats, which are first converted into a “vustruct.csv” pipeline-ready, comma-delimited flat file. This “VUStruct CSV” file contains gene names, transcript identifiers, amino acid variants, and parental inheritance when known. When users already know precise transcript identifiers and amino acid changes of interest, then “VUStruct CSV” format may be selected from the outset, and the pipeline will proceed directly to phase 2.

When starting from patient genetic variants (the underlying changes to alleles at chromosome positions) VUStruct converts these data to proteomic impacts. For variants loaded in VCF(30) format, parsed genomic coordinates are fed through the ENSEMBL(31) Variant Effect Predictor (VEP)(32) and missense variants are retained from the VEP output for subsequent structure selection and calculation scheduling. Since the VEP often reports impacts to many predicted transcripts which lack experimental validation, VUstruct restricts its calculations to the subset of returned genomic transcript IDs which cross-reference to Swiss-Prot curated Uniprot(18) protein IDs. Non-canonical transcripts are increasingly found in-vivo as proteomics methods evolve(33), and VUstruct includes all of a gene’s curated splice variants. I.e., we incorporate the curated but non-canonical sequences identified in Uniprot with additional “-N” suffixed identifiers.

A challenge in our field is that annotations to the human reference genome(34) are not static. There are relatively frequent amino acid sequence discrepancies between ENSEMBL transcript records and Swiss-Prot curated Uniprot sequences (as cross-referenced by UniParc identifiers in Uniprot’s “Id Mapping” resources). As a practical example, in early 2023, while 48,308 ENSEMBL transcripts cross-referenced to Uniprot sequences perfectly, we also found that 8,500 curated Uniprot IDs had no cross references to any GRCh38 Ensembl Transcript identifiers. 1,166 transcripts had cross-references and the transcript lengths were the same in both databases. However, the amino acid sequences were different. 370 transcripts had varying transcript lengths between between ENSEMBL and Uniprot. Uniprot is constantly working to improve cross-references, and the MANE(35) collaboration is also informing the field. Today, for variants that cannot be immediately processed due to these disconnects, the pipeline reports these problems to the user and the pipeline stops. This provides users with the opportunity to either rework the genomic coordinates input data, or manually download and patch the preprocessor-generated vustruct.csv file for input to phase 2.

2. Structure Selection and Compute Job planning.

From the Uniprot IDs in the “vustruct.csv” file, the pipeline “plans” the set of calculations by gathering structural information for target proteins. Available experimental structures are mined from the PDB and aligned to current transcripts via the SIFTS database(36,37). SwissModel and ModBase models are integrated (38,39). For mutations on canonical transcripts, AlphaFold(40) models are added to the set of representative structure. The final structure selections minimize redundancy and maximize diversity of experimental techniques, variant-coverage, model confidence and experimental quality metrics. Multimeric complexes are also prioritized in this process.

Below the single directory for the user-provided Case ID, a subdirectory is created for each variant. For each retained structure, calculations are planned, and command line parameters are set for each job. These details are recorded in the workplan.csv of the variant subdirectory. Importantly, planning is entirely independent of the HPC architecture. To ensure that no job conflicts with any other, each user-input Case ID is appended to a Globally Unique Identifier (GUID)(41) and assigned a work directory in the hierarchy of VUStruct/CaseID_and GUID/Transcript ID/3D Structure Type and Id/Calculation Type/Work Directory/. A sibling /Status Directory/is used by each running job in VUStruct to uniformly communicate progress, competition, or failure to the VUStruct monitor application (described under phase 4).

3. Job Launch

To launch the hundred(s) of jobs typically planned for a set of VUS (e.g., from a UDN case or from list of variants from genome sequencing), the pipeline writes submission scripts for the supported cluster environments on the back end (either SLURM(42) or IBM LSF^™). Each launched job runs out of a Singularity(29) container. From the container, the bound filesystem of the HPC environment is accessible but the application is otherwise blind to the surrounding HPC API. A single short script, external to the container architecture, launches all the HPC jobs, and records assigned job numbers for downstream monitoring.

The currently launched calculations include:

Rosetta ΔΔG_folding(43–45) estimates the energetic impact of each amino acid substitution on the free energy of protein folding. These two-part calculations are stored in a repository to avoid redundant “relax” steps and save compute time.
PathProx (20,21) predicts pathogenic variants when they better fit with clusters of “known pathogenic” sites (mined from Clinvar) (46) vs. randomly placed vs. benign variant sites found in Gnomad (47).
ScanNet (48)estimates the likelihood that a variant to disrupt a protein-protein interaction, via an ML algorithm.
MusiteDeep (49) predicts protein post-translational (PTM) site modification through a deep-learning framework.
Digenic disease interactions are predicted with DigePred(25) and DIEP(26).

Additional suggestions for the interpretation of these outputs are provided in the Supplemental Information.

4. Job Monitoring

Over the course of a VUStruct run, the case report is refreshed at 30 minute intervals, to reflect the latest calculated data. The stdout, stderr, and .log files for each individual job are also updated.

The pipeline also informs the user of both overall and individual job progress on the cluster. In a large shared HPC environment, launched jobs are assigned unique job numbers, but do not immediately run. Traditionally, HPC users monitor job progress with a suite of HPC-provided command line tools. Through its web interface, VUStruct interfaces to these tools on the back end, and dynamically reports on job prioritization, submission delays, remaining run time, and resource allocation. These technical status updates are presented to the user via a JavaScript monitor running in the case landing page. This page receives updates from an HPC node via middleware on the web server host.

5. Reporting

The pipeline generates a case-wide report as a landing page that combines calculated results for each transcript. As shown in Figure 2, the report also integrates queried scores from AlphaMissense(50), ConSurf(22,23) and COSMIS(24) for all the individual variants. This is followed by digenic analysis outputs.

The case report landing page contains a table, in which each row summarizes the range of calculated values for each variant. Ranges arise in some calculations because multiple structures are considered for each transcript. In the case of input genomic coordinates, multiple transcript isoforms are often impacted for each variant. The “Refresh Cluster Jobs Information” box allows detailed monitoring and troubleshooting. The drawn red box shows how summary row 14 (a change to gene HFE on Chrom 6) has been expanded to display five rows for different impacted transcripts. The first of row corresponds to the canonical UniProt isoform. Clicking the “Report” link for that line will display detailed calculations for this variant in the context of that transcript (see next figure).

From the case-wide report, the user may click into specific transcript reports (Figure 3). Clicking into a transcript report presents the user with a PFAM domain graphic(51) followed by a tabular summary of calculation results for the associated structures that were selected in step 2. The “navbar” at top left allows the user to hop to individual 3D structures, where NGL WebViewer(19) sessions are available to inspect the atomic environment of variants (Figure 4). The customized viewer also allows backbone coloring of the various calculated constraint scores, and model confidence.

The top of each transcript variant report shows the variant location as a pink diamond in the context of the protein’s PFAM domain(51) annotation. This is followed by results for Rate4Site, COSMIS, MusiteDeep, and ScanNet calculations. A key table is the Structure Summary, which lists all of the structures (from the PDB, MODBASE, SWISS-MODEL database, and AlphaFold database) on which calculations were performed. For example, the highlighted row summarizes all the calculations performed on X-Ray crystal structure 1a6b.pdb, and the highlighted shortcut in the left column leads to the section of the page detailing those results (see next figure).

Figure 4. — The section of the results page devoted to each structural model shows the results of the ΔΔG and PathProx calculations, highlighted in red, along with associated statistics to judge reliability. Each structure is displayed in a customized and interactive NGL Viewer(19) session. This view can be used to understand the structural context of the variant (highlighted here with a gray text pop-up. For example, one can display all pathogenic and likely pathogenic ClinVar variants (shown as red spheres), or color the model according to AlphaFold confidence or by PathProx score (from low- blue, to high - red - as shown above). Figures to illustrate a structural mechanistic hypothesis can be generated quickly from these images.

The downstream audience for VUStruct case reports is broader than the structural biologists trained to interpret the pipeline’s detailed outputs. Typically, that final audience includes clinicians and geneticists who are primarily interested in whether VUStruct identifies a candidate gene for ongoing consideration, and how pipeline outputs, at high level, inform that recommendation. To communicate the high-level findings of VUStruct succinctly, VUStruct drafts a case summary spreadsheet (Supplemental Information Figure 1). The Supplemental Information also suggests approaches to communicating with clinical partners and includes advice on calculation interpretation.

Dependencies

VUStruct integrates several externally sourced databases. So that the pipeline can run responsively, and avoid vulnerability to external outages, the supporting databases are locally downloaded, installed, and maintained. The two support pillars of VUStruct are the ENSEMBL GRCh38 PERL API and the UniProt id mapping file. We locally import ENSEMBL’s SQL database, and additionally load UniProt(18) cross-references into SQL tables to speed sequence cross-references between genome and proteome. BASH scripts are additionally provided to aid download of Clinvar(46), COSMIC(52), and gnomAD(47) databases which are mined for PathProx’s mathematical spatial analysis and for web-based visualizations. Several of our predictive calculations integrate sequence constraint, gleaned from both multi-species sequence alignments(22,23) and human population sequences(24). These calculations, along with AlphaMissense(50) predictions, are downloaded as transcriptome-wide precomputations, and are integrated into final reports without the need for cluster launches.

Cited calculations are deployed inside Singularity Containers. Deployment of Rosetta ΔΔG_folding (43–45) Cartesian and Monomer calculations requires a free academic or paid commercial license from rosettacommons.org.

Results/Application of VUStruct in the interpretation of clinical data:

We have demonstrated the VUStruct pipeline’s utility in the interpretation of genetic VUS in collaboration with colleagues from the Vanderbilt UDN. The containerized VUStruct software pipeline has been applied to over 150 UDN Vanderbilt UDN patients and 25 Washington University patients. The pipeline provides researchers and clinician geneticists with insights into candidate missense variants in the context of 3D protein atomic structure. In contrast to the many algorithms and websites that perform a single calculation on a single protein variant on a single protein structure, VUStruct is holistic and automated. Our pipeline analyzes a set of patient genetic VUSs and unifies the results under a case-wide report page. VUStruct is also noteworthy for its principled selection of appropriate structures among the growing wealth of available experimental and computational structural models, automated calculation setup and launch, and progress monitoring.

As one illustration of VUStruct’s potential to aid hypothesis generation, we highlight a patient with PASNA syndrome caused by a heterozygous variant in the CACNA1D gene that encodes a Human L-type voltage-gated calcium channel (Cav). Several candidate variants were selected from the patient genome sequencing (GS) data based on phenotype analysis. These variants were submitted to the pipeline and the 3D structure of the corresponding protein was analyzed by different computational methods including Rosetta ΔΔG (43–45), protein-protein interaction (PPI), post-translational modifications (PTM) and digenic predictions (DiGePred) analysis. VUStruct reported that the F767L variant in CACNA1D results in structural destabilization as evidenced by ΔΔG score in Rosetta. Starting from the VUStruct report, we hypothesized that the variant may contribute to the PASNA syndrome and conducted additional Rosetta simulations on the Cav structural model. In follow-up, two different variants F767L and F767S for Cav were used to calculate the ΔΔG in Rosetta using closed state conformation (PDB id: 7UHG (53)). F767S is a known pathogenic variant that causes a gain of function mechanism, and it was used as a positive control for this study. The higher calculated ΔΔG of F767L (~5.3 Rosetta Energy Units) vs F767S (~3.9 R.E.U.) suggested that F767L could contribute to at least as much structural disruption as known pathogenic variant F767S for the closed state conformation. Thus we hypothesized that these variants destabilize the closed state, and push conformational equilibrium towards the channel opening state. The search for this crucial finding began with VUStruct analysis and led to the further confirmative analysis to diagnose the possible cause of the PASNA syndrome (54).

A second demonstration of VUStruct’s utility was aiding a diagnosis of Diamond Blackfin anemia (DBA) in a case which could not be explained by simple Mendelian inheritance. The VUStruct report suggested that a missense variant in the RPS19 gene results in a slight stabilization, based on Rosetta ΔΔG. In addition, the proband carried another variant in the RPL27 gene, which DiGePred(25) and DIEP(26) analysis predicted to have a strong digenic interaction with RPS19. These clues helped to focus further structural analysis. We investigated different 80S ribosome structures available in the protein data bank. Although RPS19 and RPL27 are on opposite sides of the complex. It is plausible that T55M in RPS19 changes allosteric interactions between the two proteins, disrupting the 80S ribosome function. These structural analyses inspired further co-segregation and RNA sequencing analysis of the proband. Further analysis of these suggested the proband’s DBA is caused by the digenic interactions between RPS19 and RPL27(55).

Availability and Future Directions

The website is made available to all, without condition. For those wishing to setup their own pipeline environment, all our code and containers (with one exception), are licensed under the MIT License and can be downloaded from https://github.com/meilerlab/VUStruct. The one exception is the Rosetta ΔΔG module containers, which require Rosetta Commons licensing, available at no charge to academic users.

VUStruct development is continuously fueled by ongoing explosions in available protein 3D structures, genome sequencing, computer power, and artificial intelligence. We are committed to the pipeline’s flexibility and continuous improvement.

One current pipeline limitation is that all calculations are based on sets of single structural models, and the implications of dynamics are not presently considered. Multi-conformer generation is an active area of research(56). We plan to integrate that work into the pipeline, so that more conformational states are sampled. Additionally, we are integrating AlphaFold models for non-canonical transcript sequences(57). Pending its public opening, we hope to mine the AlphaFold 3 repository for its updated structural coverage that includes multimeric complexes(58). Predictions of digenic interactions should benefit from model retraining, given the emergence of new ground truth data sets(59).

Supplementary Material

Supplement 1

media-1.pdf^{(476.3KB, pdf)}

UDN Consortium

Full Name	Affiliation	Email
Alyssa A. Tran	BCM Clinical	alyssat@bcm.edu
Arjun Tarakad	BCM Clinical	tarakad@bcm.edu
Ashok Balasubramanyam	BCM Clinical	ashokb@bcm.edu
Brendan H. Lee	BCM Clinical	blee@bcm.edu
Carlos A. Bacino	BCM Clinical	cbacino@bcm.edu
Daryl A. Scott	BCM Clinical	dscott@bcm.edu
Elaine Seto	BCM Clinical	esseto@bcm.edu
Gary D. Clark	BCM Clinical	gdclark@texaschildrens.or
Hongzheng Dai	BCM Clinical	Hongzheng.Dai@bcm.edu
Hsiao-Tuan Chao	BCM Clinical	hc140077@bcm.edu
Ivan Chinn	BCM Clinical	Ivan.Chinn@bcm.edu
James P. Orengo	BCM Clinical	james.orengo@bcm.edu
Jennifer E. Posey	BCM Clinical	Jennifer.Posey@bcm.edu
Jill A. Rosenfeld	BCM Clinical	mokry@bcm.edu
Kim Worley	BCM Clinical	kworley@bcm.edu
Lindsay C. Burrage	BCM Clinical	burrage@bcm.edu
Lisa T. Emrick	BCM Clinical	emrick@bcm.edu
Lorraine Potocki	BCM Clinical	lpotocki@bcm.edu
Monika Weisz Hubshman	BCM Clinical	hubshman@bcm.edu
Richard A. Lewis	BCM Clinical	rlewis@bcm.edu
Ronit Marom	BCM Clinical	ronit.marom@bcm.edu
Seema R. Lalani	BCM Clinical	seemal@bcm.edu
Shamika Ketkar	BCM Clinical	ketkar@bcm.edu
Tiphanie P. Vogel	BCM Clinical	tiphanie.vogel@bcm.edu
William J. Craigen	BCM Clinical	wcraigen@bcm.edu
Lauren Blieden	BCM Clinical	Lauren.Blieden@bcm.edu
Jared Sninsky	BCM Clinical	Jared.Sninsky@bcm.edu
Hugo J. Bellen	BCM MOSC	hbellen@bcm.edu
Michael F. Wangler	BCM MOSC	mw147467@bcm.edu
Oguz Kanca	BCM MOSC	Oguz.Kanca@bcm.edu
Shinya Yamamoto	BCM MOSC	yamamoto@bcm.edu
Christine M. Eng	BCM Sequencing	ceng@bcm.edu
Patricia A. Ward	BCM Sequencing	pward@bcm.edu
Pengfei Liu	BCM Sequencing	pliu@baylorgenetics.com
Adeline Vanderver	CHOP	vandervera@chop.edu
Cara Skraban	CHOP	skrabanc@chop.edu
Edward Behrens	CHOP	behrens@chop.edu
Gonench Kilich	CHOP	kilichg@chop.edu
Kathleen Sullivan	CHOP	sullivank@chop.edu
Kelly Hassey	CHOP	hasseyk@chop.edu
Ramakrishnan Rajagopalan	CHOP	rajagopalanr@chop.edu
Rebecca Ganetzky	CHOP	ganetzkyr@chop.edu
Vishnu Cuddapah	CHOP	cuddapahv@chop.edu
Anna Raper	CHOP/UPenn	rapera@pennmedicine.up
Daniel J. Rader	CHOP/UPenn	rader@pennmedicine.upe
Giorgio Sirugo	CHOP/UPenn	Giorgio.Sirugo@pennmedi
Vaidehi Jobanputra	Columbia	vj2004@cumc.columbia.edu
Allyn McConkie-Rosell	Duke	allyn.mcconkie@duke.edu
Kelly Schoch	Duke	kelly.schoch@duke.edu
Mohamad Mikati	Duke	mohamad.mikati@duke.edu
Nicole M. Walley	Duke	nicole.walley@duke.edu
Rebecca C. Spillmann	Duke	rebecca.crimian@duke.ed
Vandana Shashi	Duke	vandana.shashi@duke.edu
Alan H. Beggs	Harvard	beggs@enders.tch.harvar
Calum A. MacRae	Harvard	camacrae@bics.bwh.harva
David A. Sweetser	Harvard	dsweetser@partners.org
Deepak A. Rao	Harvard	darao@bwh.harvard.edu
Edwin K. Silverman	Harvard	ed.silverman@channing.h
Elizabeth L. Fieg	Harvard	efieg@bwh.harvard.edu
Frances High	Harvard	fhigh@partners.org
Gerard T. Berry	Harvard	gerard.berry@childrens.ha
Ingrid A. Holm	Harvard	ingrid.holm@childrens.har
J. Carl Pallais	Harvard	Juan.Pallais@mgh.harvard
Joan M. Stoler	Harvard	joan.stoler@childrens.harv
Joseph Loscalzo	Harvard	jloscalzo@partners.org
Lance H. Rodan	Harvard	lance.rodan@childrens.ha
Laurel A. Cobban	Harvard	lcobban@bwh.harvard.ed
Lauren C. Briere	Harvard	lbriere@partners.org
Matthew Coggins	Harvard	mcoggins@bwh.harvard.e
Melissa Walker	Harvard	walker.melissa@mgh.harv
Richard L. Maas	Harvard	maas@genetics.med.harv
Susan Korrick	Harvard	skorrick@bwh.harvard.edu
Jessica Douglas	Harvard	Jessica.Douglas@childrens
AudreyStephannie C. Maghiro	Harvard DMCC	audreystephannie_maghir
Cecilia Esteves	Harvard DMCC	cecilia_esteves@hms.harv
Emily Glanton	Harvard DMCC	Emily_Glanton@hms.harv
Isaac S. Kohane	Harvard DMCC	isaac_kohane@hms.harva
Kimberly LeBlanc	Harvard DMCC	kimberly_leblanc@hms.ha
Rachel Mahoney	Harvard DMCC	rachel_mahoney@hms.ha
Shamil R. Sunyaev	Harvard DMCC	ssunyaev@hms.harvard.e
Shilpa N. Kobren	Harvard DMCC	Shilpa_Kobren@hms.harv
Brett H. Graham	IU	bregraha@iu.edu
Erin Conboy	IU	econboy@iu.edu
Francesco Vetrini	IU	fvetrini@iu.edu
Kayla M. Treat	IU	ktreat@iuhealth.org
Khurram Liaqat	IU	kliaqat@iu.edu
Lili Mantcheva	IU	lmantche@iu.edu
Stephanie M. Ware	IU	stware@iu.edu
Breanna Mitchell	Mayo Clinic	Mitchell.Breanna@mayo.e
Brendan C. Lanpher	Mayo Clinic	lanpher.brendan@mayo.e
Devin Oglesbee	Mayo Clinic	oglesbee.devin@mayo.ed
Eric Klee	Mayo Clinic	klee.eric@mayo.edu
Filippo Pinto e Vairo	Mayo Clinic	vairo.filippo@mayo.edu
Ian R. Lanza	Mayo Clinic	lanza.ian@mayo.edu
Kahlen Darr	Mayo Clinic	Darr.Kahlen@mayo.edu
Lindsay Mulvihill	Mayo Clinic	mulvihill.lindsay@mayo.e
Lisa Schimmenti	Mayo Clinic	Schimmenti.Lisa@mayo.ed
Queenie Tan	Mayo Clinic	Tan.KhoonGheeQueenie@
Surendra Dasari	Mayo Clinic	dasari.surendra@mayo.e
Adriana Rebelo	Miami	arebelo@med.miami.edu
Carson A. Smith	Miami	carsonsmith@med.miami.
Deborah Barbouth	Miami	dbarbouth@miami.edu
Guney Bademci	Miami	g.bademci@miami.edu
Joanna M. Gonzalez	Miami	jmg442@miami.edu
Kumarie Latchman	Miami	kxl604@med.miami.edu
LéShon Peart	Miami	L.peart@med.miami.edu
Mustafa Tekin	Miami	mtekin@miami.edu
Nicholas Borja	Miami	nborja@med.miami.ed
Stephan Zuchner	Miami	szuchner@miami.edu
Stephanie Bivona	Miami	sab355@miami.edu
Willa Thorson	Miami	wthorson@miami.edu
Herman Taylor	Morehouse DMCC	htaylor@msm.edu
Andrea Gropman	NIH UDP	agropman@childrensnatic
Barbara N. Pusey Swerdzewski	NIH UDP	barbara.pusey@nih.gov
Camilo Toro	NIH UDP	toroc@mail.nih.gov
Colleen E. Wahl	NIH UDP	colleen.wahl@nih.gov
Donna Novacic	NIH UDP	donna.novacic@nih.gov
Ellen F. Macnamara	NIH UDP	ellen.macnamara@nih.gov
John J. Mulvihill	NIH UDP	johmulvihill@gmail.com
Maria T. Acosta	NIH UDP	acostam@nhgri.nih.gov
Precilla D’Souza	NIH UDP	precilla.d’souza@nih.gov
Valerie V. Maduro	NIH UDP	vbraden@mail.nih.gov
Ben Afzali	NIH UDP, NHGRI	ben.afzali@nih.gov
Ben Solomon	NIH UDP, NHGRI	solomonb@mail.nih.gov
Cynthia J. Tifft	NIH UDP, NHGRI	ctifft@nih.gov
David R. Adams	NIH UDP, NHGRI	david.adams@nih.gov
Elizabeth A. Burke	NIH UDP, NHGRI	elizabeth.burke2@nih.gov
Francis Rossignol	NIH UDP, NHGRI	francis.rossignol@nih.gov
Heidi Wood	NIH UDP, NHGRI	heidi.wood@nih.gov
Jiayu Fu	NIH UDP, NHGRI	fuj6@mail.nih.gov
Joie Davis	NIH UDP, NHGRI	jdavis@niaid.nih.gov
Leoyklang Petcharet	NIH UDP, NHGRI	petcharat.leoyklang@nih.g
Lynne A. Wolfe	NIH UDP, NHGRI	lynne.wolfe@nih.gov
Margaret Delgado	NIH UDP, NHGRI	margaret.delgado@nih.go
Marie Morimoto	NIH UDP, NHGRI	marie.morimoto@nih.gov
Marla Sabaii	NIH UDP, NHGRI	marla.sabaii@nih.gov
MayChristine V. Malicdan	NIH UDP, NHGRI	maychristine.malicdan@ni
Neil Hanchard	NIH UDP, NHGRI	neil.hanchard@nih.gov
Orpa Jean-Marie	NIH UDP, NHGRI	orpa.jean-marie@nih.gov
Wendy Introne	NIH UDP, NHGRI	wintrone@nhgri.nih.gov
William A. Gahl	NIH UDP, NHGRI	gahlw@mail.nih.gov
Yan Huang	NIH UDP, NHGRI	yan.huang@nih.gov
Aimee Allworth	PNW	allwoa@uw.edu
Andrew Stergachis	PNW	absterga@uw.edu
Danny Miller	PNW	Danny.Miller@seattlechild
Elizabeth Blue	PNW	em27@uw.edu
Elizabeth Rosenthal	PNW	erosen@uw.edu
Elsa Balton	PNW	ebalton@medicine.washin
Emily Shelkowitz	PNW
Eric Allenspach	PNW	eric.allenspach@seattlech
Fuki M. Hisama	PNW	fmh2@uw.edu
Gail P. Jarvik	PNW	pair@uw.edu
Ghayda Mirzaa	PNW	gmirzaa@uw.edu
Ian Glass	PNW	ianglass@uw.edu
Kathleen A. Leppig	PNW	leppig@uw.edu
Katrina Dipple	PNW	katrina.dipple@seattlechil
Mark Wener	PNW	wener@uw.edu
Martha Horike-Pyne	PNW	mpyne@medicine.washing
Michael Bamshad	PNW	mbamshad@uw.edu
Peter Byers	PNW	pbyers@uw.edu
Sam Sheppeard	PNW	samshep@uw.edu
Sirisak Chanprasert	PNW	sirisc@uw.edu
Virginia Sybert	PNW	flk01@uw.edu
Wendy Raskind	PNW	wendyrun@uw.edu
Nitsuh K. Dargie	PNW	nitsuhk@medicine.washin
Beth A. Martin	Stanford	martinb@stanford.edu
Chloe M. Reuter	Stanford	creuter@stanfordhealthca
Devon Bonner	Stanford	devonbonner@stanfordhe
Elijah Kravets	Stanford	ekravets@stanford.edu
Holly K. Tabor	Stanford	hktabor@stanford.edu
Jacinda B. Sampson	Stanford	jacindas@stanford.edu
Jason Hom	Stanford	jasonhom@stanford.edu
Jennefer N. Kohler	Stanford	jkohler@stanfordhealthca
Jonathan A. Bernstein	Stanford	Jon.Bernstein@stanford.e
Kevin S. Smith	Stanford	kssmith@stanford.edu
Matthew T. Wheeler	Stanford	wheelerm@stanford.edu
Meghan C. Halley	Stanford	mhalley@stanford.edu
Page C. Goddard	Stanford	pgoddard@stanford.edu
Paul G. Fisher	Stanford	pfisher@stanford.edu
Rachel A. Ungar	Stanford	raungar@stanford.edu
Raquel L. Alvarez	Stanford	raquela1@stanford.edu
Shruti Marwaha	Stanford	mshruti@stanford.edu
Terra R. Coakley	Stanford	tcoakley@stanford.edu
Euan A. Ashley	Stanford DMCC	Euan@stanford.edu
Ali Al-Beshri	UAB	asabeshri@uabmc.edu
Anna Hurst	UAB	acehurst@uab.edu
Bruce Korf	UAB	bkorf@uab.uabmc.edu
Kaitlin Callaway	UAB	kcallaway@uabmc.edu
Martin Rodriguez	UAB	rodriguez@uabmc.edu
Tammi Skelton	UAB	tlskelton@uabmc.edu
Andrew B. Crouse	UAB DMCC	acrouse@uab.edu
Jordan Whitlock	UAB DMCC	jbarham3@uab.edu
Mariko Nakano-Okuno	UAB DMCC	marikonk@uab.edu
Matthew Might	UAB DMCC	might@uab.edu
William E. Byrd	UAB DMCC	webyrd@gmail.com
Changrui Xiao	UCI/CHOC	changrx@hs.uci.edu
Eric Vilain	UCI/CHOC	evilain@hs.uci.edu
Jose Abdenur	UCI/CHOC	JAbdenur@choc.org
Kathyrn Singh	UCI/CHOC	kesingh@hs.uci.edu
Rebekah Barrick	UCI/CHOC	rebekah.barrick@choc.org
Sanaz Attaripour	UCI/CHOC	sattarip@hs.uci.edu
Suzanne Sandmeyer	UCI/CHOC	sbsandme@hs.uci.edu
Sirisak Chanprasert	PNW	sirisc@uw.edu
Virginia Sybert	PNW	flk01@uw.edu
Wendy Raskind	PNW	wendyrun@uw.edu
Nitsuh K. Dargie	PNW	nitsuhk@medicine.washin
Beth A. Martin	Stanford	martinb@stanford.edu
Chloe M. Reuter	Stanford	creuter@stanfordhealthca
Devon Bonner	Stanford	devonbonner@stanfordhe
Elijah Kravets	Stanford	ekravets@stanford.edu
Holly K. Tabor	Stanford	hktabor@stanford.edu
Jacinda B. Sampson	Stanford	jacindas@stanford.edu
Jason Hom	Stanford	jasonhom@stanford.edu
Jennefer N. Kohler	Stanford	jkohler@stanfordhealthca
Jonathan A. Bernstein	Stanford	Jon.Bernstein@stanford.ei
Kevin S. Smith	Stanford	kssmith@stanford.edu
Matthew T. Wheeler	Stanford	wheelerm@stanford.edu
Meghan C. Halley	Stanford	mhalley@stanford.edu
Page C. Goddard	Stanford	pgoddard@stanford.edu
Paul G. Fisher	Stanford	pfisher@stanford.edu
Rachel A. Ungar	Stanford	raungar@stanford.edu
Raquel L. Alvarez	Stanford	raquela1@stanford.edu
Shruti Marwaha	Stanford	mshruti@stanford.edu
Terra R. Coakley	Stanford	tcoakley@stanford.edu
Euan A. Ashley	Stanford DMCC	Euan@stanford.edu
Ali Al-Beshri	UAB	asabeshri@uabmc.edu
Anna Hurst	UAB	acehurst@uab.edu
Bruce Korf	UAB	bkorf@uab.uabmc.edu
Kaitlin Callaway	UAB	kcallaway@uabmc.edu
Martin Rodriguez	UAB	rodriguez@uabmc.edu
Tammi Skelton	UAB	tlskelton@uabmc.edu
Andrew B. Crouse	UAB DMCC	acrouse@uab.edu
Jordan Whitlock	UAB DMCC	jbarham3@uab.edu
Mariko Nakano-Okuno	UAB DMCC	marikonk@uab.edu
Matthew Might	UAB DMCC	might@uab.edu
William E. Byrd	UAB DMCC	webyrd@gmail.com
Changrui Xiao	UCI/CHOC	changrx@hs.uci.edu
Eric Vilain	UCI/CHOC	evilain@hs.uci.edu
Jose Abdenur	UCI/CHOC	JAbdenur@choc.org
Kathyrn Singh	UCI/CHOC	kesingh@hs.uci.edu
Rebekah Barrick	UCI/CHOC	rebekah.barrick@choc.org
Sanaz Attaripour	UCI/CHOC	sattarip@hs.uci.edu
Suzanne Sandmeyer	UCI/CHOC	sbsandme@hs.uci.edu
Tahseen Mozaffar	UCI/CHOC	mozaffar@hs.uci.edu
Albert R. La Spada	UCI/CHOC	alaspada@uci.edu
Elizabeth C. Chao	UCI/CHOC	ecchao@uci.edu
Maija-Rikka Steenari	UCI/CHOC	msteenari@choc.org
Alden Huang	UCLA	AYHuang@mednet.ucla.ed
Brent L. Fogel	UCLA	bfogel@ucla.edu
Esteban C. Dell'Angelica	UCLA	edellangelica@mednet.uc
George Carvalho	UCLA	GCarvalhoNeto@mednet.
Julian A. Martfnez-Agosto	UCLA	julianmartinez@mednet.u
Manish J. Butte	UCLA	mbutte@mednet.ucla.edu
Martin G. Martin	UCLA	mmartin@mednet.ucla.ed
Naghmeh Dorrani	UCLA	ndorrani@mednet.ucla.ed
Neil H. Parker	UCLA	nhparker@mednet.ucla.ed
Rosario I. Corona	UCLA	rcoronadelafuente@medn
Stanley F. Nelson	UCLA	snelson@ucla.edu
Yigit Karasozen	UCLA	Ykarasozen@mednet.ucla
Aaron Quinlan	University of Utah	aquinlan@genetics.utah.e
Alistair Ward	University of Utah	alistairnward@gmail.com
Ashley Andrews	University of Utah	ashley.andrews@hsc.utah
Corrine K. Welt	University of Utah	cwelt@u2m2.utah.edu
Dave Viskochil	University of Utah	dave.viskochil@hsc.utah.e
Erin E. Baldwin	University of Utah	erin.baldwin@hsc.utah.ed
John Carey	University of Utah	john.carey@hsc.utah.edu
Justin Alvey	University of Utah	justin.alvey@hsc.utah.edu
Laura Pace	University of Utah	laura.pace@hsc.utah.edu
Lorenzo Botto	University of Utah	lorenzo.botto@hsc.utah.e
Nicola Longo	University of Utah	nicola.longo@hsc.utah.ed
Paolo Moretti	University of Utah	paolo.moretti@hsc.utah.e
Rebecca Overbury	University of Utah	rebecca.overbury@hsc.uta
Russell Butterfield	University of Utah	russell.butterfield@hsc.ut
Steven Boyden	University of Utah	steven.boyden@genetics.u
Thomas J. Nicholas	University of Utah	thomas.nicholas@utah.ed
Matt Velinder	University of Utah	mvelinder@frameshift.io
Gabor Marth	University of Utah DMCC	gmarth@genetics.utah.ed
Pinar Bayrak-Toydemir	University of Utah/ARUP	pinar.bayrak-toydemir@ar
Rong Mao	University of Utah/ARUP	rong.mao@aruplab.com
Monte Westerfield	UO MOSC	monte@uoneuro.uoregon
Brian Corner	Vanderbilt	brian.corner@vumc.org
John A. Phillips III	Vanderbilt	John.a.phillips@vumc.org
Kimberly Ezell	Vanderbilt	kimberly.ezell@vumc.org
Lynette Rives	Vanderbilt	lynette.c.rives@vumc.org
Rizwan Hamid	Vanderbilt	rizwan.hamid@vumc.org
Serena Neumann	Vanderbilt	serena.neumann@vumc.o
Ashley McMinn	Vanderbilt	ashley.mcminn@vumc.org
Joy D. Cogan	Vanderbilt	joy.cogan@vumc.org
Thomas Cassini	Vanderbilt	thomas.a.cassini@vumc.or
Alex Paul	WUSTL Clinical	alex.paul@wustl.edu
Dana Kiley	WUSTL Clinical	dana.kiley@wustl.edu
Daniel Wegner	WUSTL Clinical	danieljwegner@wustl.edu
Erin McRoy	WUSTL Clinical	e.hediger@wustl.edu
Jennifer Wambach	WUSTL Clinical	wambachj@wustl.edu
Kathy Sisco	WUSTL Clinical	siscok@wustl.edu
Patricia Dickson	WUSTL Clinical	pdickson@wustl.edu
F. Sessions Cole	WUSTL DMCC	fcole@wustl.edu
Dustin Baldridge	WUSTL MOSC	dbaldri@wustl.edu
Jimann Shin	WUSTL MOSC	shinji@wustl.edu
Lilianna Solnica-Krezel	WUSTL MOSC	solnical@wustl.edu
Stephen Pak	WUSTL MOSC	stephen.pak@email.wustl.
Timothy Schedl	WUSTL MOSC	ts@wustl.edu
Hector Rodrigo Mendez	Stanford	mendezh@stanford.edu
Brianna Tucker	Stanford	bmtucker@stanford.edu
Beatriz Anguiano	Stanford	banguian@stanford.edu
Mia Levanto	Stanford	mlevanto@stanford.edu
Suha Bachir	Stanford	sbachir@stanford.edu
Laurens Wiel	Stanford	lvdwiel@stanford.edu
Stephen B Montgomery	Stanford	smontgom@stanford.edu
Tanner D Jensen	Stanford	tannerj@stanford.edu
John E. Gorzynski	Stanford	jgorz@stanford.edu
Sara Emami	Stanford	slemami@stanford.edu
Laura Keehan	Stanford	keehan@stanford.edu
Jennifer Schymick	Stanford	jennifer.schymick@hhs.scc
Taylor Maurer	Stanford	maurertm@stanford.edu
Alexander Miller	Stanford	atex91@stanford.edu
Andres Vargas	UCLA	AndresVargas@mednet.uc
Amanda M. Shrewsbury	UCLA	ashrewsbury@mednet.ucl
Bianca E. Russell	UCLA	berussell@mednet.ucla.ed
Layal F. Abi Farraj	UCLA	LAbiFarraj@mednet.ucla.e
Elizabeth A Worthey	UAB	eaworthey@uabmc.edu
Tarun KK Mamidi	UAB	tmamidi@uab.edu
Brandon M Wilk	UAB	brandonwilk@uabmc.edu
Rachel Li	Sanford	Rachel.Li@SanfordHealth.
Jennifer Morgan	Sanford	Jennifer.Morgan@Sanford
Chun-Hung Chan	Sanford	Chun-Hung.Chan@Sanfor
Paul Berger	Sanford	Paul.berger@sanfordhealt
Mohamad Saifeddine	Sanford	Mohamad.Saifeddine@Sa
Isum Ward	Sanford	Isum.Ward@SanfordHealt
Jason Schend	Sanford	Jason.Schend@SanfordHe
Megan Bell	Sanford	Megan.bell@sanfordhealt
Dr. Francisco Bustos velasq	Sanford	Francisco.Bustos@sanford
Taylor Beagle	Sanford	Taylor.Beagle@SanfordHe
Miranda Leitheiser	Sanford	Miranda.Leitheiser@Sanf
Runjun Kumar	WUSTL Clinical	rdkumar@uw.edu
Donald Basel	MCW-CW	dbasel@mcw.edu
Michael Muriello	MCW-CW	mmuriello@mcw.edu
Brett Bordini	MCW-CW	bbordini@mcw.edu
Michael Zimmermann	MCW-CW	mtzimmermann@mcw.ed
Abdul Elkadri	MCW-CW	AElKadri@mcw.edu
James Verbsky	MCW-CW	jverbsky@mcw.edu
Julie McCarrier	MCW-CW	jmccarrier@mcw.edu

Open in a new tab

Support and Thanks

This work leveraged the resources provided by the Vanderbilt Advanced Computing Center for Research and Education (ACCRE), a collaboratory operated by and for Vanderbilt faculty. ACCRE is comprised of over 3,000 researchers from more than 40 campus departments.

The pipeline would not have been possible without the energetic and helpful support from staff at Uniprot, ENSEMBL, SwissModel, and Modbase.

Grants

This work has been supported by NIH grant R01 LM013434-04.

J.M. acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG) through SFB1423 (421152132), SFB 1052 (209933838), and SPP 2363 (460865652). J.M. is supported by a Humboldt Professorship of the Alexander von Humboldt Foundation. J.M. is supported by BMBF (Federal Ministry of Education and Research) through the Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI). This work is partly supported by the Federal Ministry of Education and Research (BMBF) through DAAD project 57616814 (SECAI, School of Embedded Composite AI). Work in the Meiler laboratory is further supported through the NIH (R01 HL122010, R01 DA046138, R01 AG068623, U01 AI150739, R01 CA227833, R01 LM013434, S10 OD016216, S10 OD020154, S10 OD032234). This work was supported by the BMBF-funded German Network for Bioinformatics Infrastructure (de.NBI).

Research reported in this publication was supported by the National Institute Of Neurological Disorders And Stroke of the National Institutes of Health under Award Numbers [U01HG007674, U01NS134349, U01HG010215, U01NS134354]. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References

1.Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype–gene relationships. Nucleic Acids Res. 2019 Jan 8;47(D1):D1038–43. [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Davydov E V., Goode DL, Sirota M, Cooper GM, Sidow A, Batzoglou S. Identifying a High Fraction of the Human Genome to be under Selective Constraint Using GERP++. PLoS Comput Biol. 2010 Dec 2;6(12):e1001025. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Cooper GM, Stone EA, Asimenos G, Green ED, Batzoglou S, Sidow A. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 2005. Jul;15(7):901–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Ng PC. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003 Jul 1;31(13):3812–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Ramensky V. Human non-synonymous SNPs: server and survey. Nucleic Acids Res. 2002 Sep 1;30(17):3894–900. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Kobren SN, Baldridge D, Velinder M, Krier JB, LeBlanc K, Esteves C, et al. Commonalities across computational workflows for uncovering explanatory variants in undiagnosed cases. Genetics in Medicine. 2021. Jun;23(6):1075–85. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Flanagan SE, Patch AM, Ellard S. Using SIFT and PolyPhen to Predict Loss-of-Function and Gain-of-Function Mutations. Genet Test Mol Biomarkers. 2010. Aug;14(4):533–7. [DOI] [PubMed] [Google Scholar]
8.McDonald EF, Oliver KE, Schlebach JP, Meiler J, Plate L. Benchmarking AlphaMissense pathogenicity predictions against cystic fibrosis variants. PLoS One. 2024;19(1):e0297560. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Murali H, Wang P, Liao EC, Wang K. Genetic variant classification by predicted protein structure: A case study on IRF6. Comput Struct Biotechnol J. 2024. Dec;23:892–904. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Taipale M. Disruption of protein function by pathogenic mutations: common and uncommon mechanisms. Biochemistry and Cell Biology. 2019. Feb;97(1):46–57. [DOI] [PubMed] [Google Scholar]
11.Mukherjee S, Cassini TA, Hu N, Yang T, Li B, Shen W, et al. Personalized structural biology reveals the molecular mechanisms underlying heterogeneous epileptic phenotypes caused by de novo KCNC2 variants. Human Genetics and Genomics Advances. 2022. Oct;3(4):100131. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Nielsen S V., Stein A, Dinitzen AB, Papaleo E, Tatham MH, Poulsen EG, et al. Predicting the impact of Lynch syndrome-causing missense mutations from structural calculations. PLoS Genet. 2017 Apr 19;13(4):e1006739. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Caswell RC, Gunning AC, Owens MM, Ellard S, Wright CF. Assessing the clinical utility of protein structural analysis in genomic variant classification: experiences from a diagnostic laboratory. Genome Med. 2022 Dec 22;14(1):77. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Laskowski RA, Stephenson JD, Sillitoe I, Orengo CA, Thornton JM. VarSite: Disease variants and protein structure. Protein Science. 2020 Jan 27;29(1):111–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Stephenson JD, Totoo P, Burke DF, Jänes J, Beltrao P, Martin MJ. ProtVar: mapping and contextualizing human missense variation. Nucleic Acids Res. 2024 May 20; [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Ittisoponpisan S, Islam SA, Khanna T, Alhuzimi E, David A, Sternberg MJE. Can Predicted Protein 3D Structures Provide Reliable Insights into whether Missense Variants Are Disease Associated? J Mol Biol. 2019. May;431(11):2197–212. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Philipp M, Moth CW, Ristic N, Tiemann JKS, Seufert F, Panfilova A, et al. MutationExplorer : a webserver for mutation of proteins and 3D visualization of energetic impacts. Nucleic Acids Res. 2024 Apr 22; [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Bateman A, Martin MJ, Orchard S, Magrane M, Ahmad S, Alpi E, et al. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 2023 Jan 6;51(D1):D523–31. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Rose AS, Bradley AR, Valasatava Y, Duarte JM, Prlić A, Rose PW. NGL viewer: web-based molecular graphics for large complexes. Bioinformatics. 2018 Nov 1;34(21):3755–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Sivley RM, Dou X, Meiler J, Bush WS, Capra JA. Comprehensive Analysis of Constraint on the Spatial Distribution of Missense Variants in Human Protein Structures. The American Journal of Human Genetics. 2018. Mar;102(3):415–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Sivley RM, Sheehan JH, Kropski JA, Cogan J, Blackwell TS, Phillips JA, et al. Three-dimensional spatial analysis of missense variants in RTEL1 identifies pathogenic variants in patients with Familial Interstitial Pneumonia. BMC Bioinformatics. 2018 Dec 23;19(1):18. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Ashkenazy H, Abadi S, Martz E, Chay O, Mayrose I, Pupko T, et al. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res. 2016 Jul 8;44(W1):W344–50. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N. Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics. 2002 Jul 1;18(suppl_1):S71–7. [DOI] [PubMed] [Google Scholar]
24.Li B, Roden DM, Capra JA. The 3D mutational constraint on amino acid sites in the human proteome. Nat Commun. 2022 Jun 7;13(1):3273. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Mukherjee S, Cogan JD, Newman JH, Phillips JA, Hamid R, Meiler J, et al. Identifying digenic disease genes via machine learning in the Undiagnosed Diseases Network. The American Journal of Human Genetics. 2021. Oct;108(10):1946–63. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Yuan Y, Zhang L, Long Q, Jiang H, Li M. An accurate prediction model of digenic interaction for estimating pathogenic gene pairs of human diseases. Comput Struct Biotechnol J. 2022;20:3639–52. [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Gahl WA, Wise AL, Ashley EA. The Undiagnosed Diseases Network of the National Institutes of Health. JAMA. 2015 Nov 3;314(17):1797. [DOI] [PubMed] [Google Scholar]
28.Ramoni RB, Mulvihill JJ, Adams DR, Allard P, Ashley EA, Bernstein JA, et al. The Undiagnosed Diseases Network: Accelerating Discovery about Health and Disease. The American Journal of Human Genetics. 2017. Feb;100(2):185–92. [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011 Aug 1;27(15):2156–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Cunningham F, Allen JE, Allen J, Alvarez-Jarreta J, Amode MR, Armean IM, et al. Ensembl 2022. Nucleic Acids Res. 2022 Jan 7;50(D1):D988–95. [DOI] [PMC free article] [PubMed] [Google Scholar]
32.McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016 Dec 6;17(1):122. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Sinitcyn P, Richards AL, Weatheritt RJ, Brademan DR, Marx H, Shishkova E, et al. Global detection of human variants and isoforms by deep proteome sequencing. Nat Biotechnol. 2023 Dec 23;41(12):1776–86. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Schneider VA, Graves-Lindsay T, Howe K, Bouk N, Chen HC, Kitts PA, et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 2017. May;27(5):849–64. [DOI] [PMC free article] [PubMed] [Google Scholar]
35.Morales J, Pujar S, Loveland JE, Astashyn A, Bennett R, Berry A, et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature. 2022 Apr 14;604(7905):310–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
36.Dana JM, Gutmanas A, Tyagi N, Qi G, O’Donovan C, Martin M, et al. SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Nucleic Acids Res. 2019 Jan 8;47(D1):D482–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Berman HM. The Protein Data Bank. Nucleic Acids Res. 2000 Jan 1;28(1):235–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Pieper U, Webb BM, Dong GQ, Schneidman-Duhovny D, Fan H, Kim SJ, et al. ModBase, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 2014. Jan;42(D1):D336–46. [DOI] [PMC free article] [PubMed] [Google Scholar]
39.Bienert S, Waterhouse A, de Beer TAP, Tauriello G, Studer G, Bordoli L, et al. The SWISS-MODEL Repository—new features and functionality. Nucleic Acids Res. 2017 Jan 4;45(D1):D313–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021 Aug 26;596(7873):583–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
41.Leach P, Mealling M, Salz R. A Universally Unique IDentifier (UUID) URN Namespace. 2005. Jul.
42.Yoo AB, Jette MA, Grondona M. SLURM: Simple Linux Utility for Resource Management. In 2003. p. 44–60.
43.Park H, Bradley P, Greisen P, Liu Y, Mulligan VK, Kim DE, et al. Simultaneous Optimization of Biomolecular Energy Functions on Features from Small Molecules and Macromolecules. J Chem Theory Comput. 2016 Dec 13;12(12):6201–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Kellogg EH, Leaver-Fay A, Baker D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins: Structure, Function, and Bioinformatics. 2011 Mar 3;79(3):830–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
45.Frenz B, Lewis SM, King I, DiMaio F, Park H, Song Y. Prediction of Protein Mutational Free Energy: Benchmark and Sampling Improvements Increase Classification Accuracy. Front Bioeng Biotechnol. 2020 Oct 8;8. [DOI] [PMC free article] [PubMed] [Google Scholar]
46.Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018 Jan 4;46(D1):D1062–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
47.Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020 May 28;581(7809):434–43. [DOI] [PMC free article] [PubMed] [Google Scholar]
48.Tubiana J, Schneidman-Duhovny D, Wolfson HJ. ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction. Nat Methods. 2022 Jun 30;19(6):730–9. [DOI] [PubMed] [Google Scholar]
49.Wang D, Zeng S, Xu C, Qiu W, Liang Y, Joshi T, et al. MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction. Bioinformatics. 2017 Dec 15;33(24):3909–16. [DOI] [PMC free article] [PubMed] [Google Scholar]
50.Cheng J, Novati G, Pan J, Bycroft C, Žemgulytė A, Applebaum T, et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science (1979). 2023 Sep 22;381(6664). [DOI] [PubMed] [Google Scholar]
51.Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, et al. Pfam: The protein families database in 2021. Nucleic Acids Res. 2021 Jan 8;49(D1):D412–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
52.Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res. 2019 Jan 8;47(D1):D941–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
53.Yao X, Gao S, Yan N. Structural basis for pore blockade of human voltage-gated calcium channel Cav1.3 by motion sickness drug cinnarizine. Cell Res. 2022 Apr 27;32(10):946–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
54.Ezell KM, Tinker RJ, Furuta Y, Gulsevin A, Bastarache L, Hamid R, et al. Undiagnosed Disease Network collaborative approach in diagnosing rare disease in a patient with a mosaic CACNA1D variant. Am J Med Genet A. 2024 Jul 21;194(7). [DOI] [PMC free article] [PubMed] [Google Scholar]
55.Furuta Y, Tinker RJ, Gulsevin A, Neumann SM, Hamid R, Cogan JD, et al. Probable digenic inheritance of Diamond–Blackfan anemia. Am J Med Genet A. 2024 Mar 27;194(3). [DOI] [PMC free article] [PubMed] [Google Scholar]
56.Brown BP, Stein RA, Meiler J, Mchaourab HS. Approximating Projections of Conformational Boltzmann Distributions with AlphaFold2 Predictions: Opportunities and Limitations. J Chem Theory Comput. 2024 Jan 12; [DOI] [PMC free article] [PubMed] [Google Scholar]
57.Sommer MJ, Cha S, Varabyou A, Rincon N, Park S, Minkin I, et al. Structure-guided isoform identification for the human transcriptome. Elife. 2022 Dec 15;11. [DOI] [PMC free article] [PubMed] [Google Scholar]
58.Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024 May 8; [DOI] [PMC free article] [PubMed] [Google Scholar]
59.Nachtegael C, Gravel B, Dillen A, Smits G, Nowé A, Papadimitriou S, et al. Scaling up oligogenic diseases research with OLIDA: the Oligogenic Diseases Database. Database (Oxford). 2022 Apr 12;2022. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplement 1

media-1.pdf^{(476.3KB, pdf)}

[R1] 1.Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype–gene relationships. Nucleic Acids Res. 2019 Jan 8;47(D1):D1038–43. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] 2.Davydov E V., Goode DL, Sirota M, Cooper GM, Sidow A, Batzoglou S. Identifying a High Fraction of the Human Genome to be under Selective Constraint Using GERP++. PLoS Comput Biol. 2010 Dec 2;6(12):e1001025. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] 3.Cooper GM, Stone EA, Asimenos G, Green ED, Batzoglou S, Sidow A. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 2005. Jul;15(7):901–13. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] 4.Ng PC. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003 Jul 1;31(13):3812–4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] 5.Ramensky V. Human non-synonymous SNPs: server and survey. Nucleic Acids Res. 2002 Sep 1;30(17):3894–900. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R6] 6.Kobren SN, Baldridge D, Velinder M, Krier JB, LeBlanc K, Esteves C, et al. Commonalities across computational workflows for uncovering explanatory variants in undiagnosed cases. Genetics in Medicine. 2021. Jun;23(6):1075–85. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] 7.Flanagan SE, Patch AM, Ellard S. Using SIFT and PolyPhen to Predict Loss-of-Function and Gain-of-Function Mutations. Genet Test Mol Biomarkers. 2010. Aug;14(4):533–7. [DOI] [PubMed] [Google Scholar]

[R8] 8.McDonald EF, Oliver KE, Schlebach JP, Meiler J, Plate L. Benchmarking AlphaMissense pathogenicity predictions against cystic fibrosis variants. PLoS One. 2024;19(1):e0297560. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] 9.Murali H, Wang P, Liao EC, Wang K. Genetic variant classification by predicted protein structure: A case study on IRF6. Comput Struct Biotechnol J. 2024. Dec;23:892–904. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] 10.Taipale M. Disruption of protein function by pathogenic mutations: common and uncommon mechanisms. Biochemistry and Cell Biology. 2019. Feb;97(1):46–57. [DOI] [PubMed] [Google Scholar]

[R11] 11.Mukherjee S, Cassini TA, Hu N, Yang T, Li B, Shen W, et al. Personalized structural biology reveals the molecular mechanisms underlying heterogeneous epileptic phenotypes caused by de novo KCNC2 variants. Human Genetics and Genomics Advances. 2022. Oct;3(4):100131. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] 12.Nielsen S V., Stein A, Dinitzen AB, Papaleo E, Tatham MH, Poulsen EG, et al. Predicting the impact of Lynch syndrome-causing missense mutations from structural calculations. PLoS Genet. 2017 Apr 19;13(4):e1006739. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] 13.Caswell RC, Gunning AC, Owens MM, Ellard S, Wright CF. Assessing the clinical utility of protein structural analysis in genomic variant classification: experiences from a diagnostic laboratory. Genome Med. 2022 Dec 22;14(1):77. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] 14.Laskowski RA, Stephenson JD, Sillitoe I, Orengo CA, Thornton JM. VarSite: Disease variants and protein structure. Protein Science. 2020 Jan 27;29(1):111–9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R15] 15.Stephenson JD, Totoo P, Burke DF, Jänes J, Beltrao P, Martin MJ. ProtVar: mapping and contextualizing human missense variation. Nucleic Acids Res. 2024 May 20; [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] 16.Ittisoponpisan S, Islam SA, Khanna T, Alhuzimi E, David A, Sternberg MJE. Can Predicted Protein 3D Structures Provide Reliable Insights into whether Missense Variants Are Disease Associated? J Mol Biol. 2019. May;431(11):2197–212. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] 17.Philipp M, Moth CW, Ristic N, Tiemann JKS, Seufert F, Panfilova A, et al. MutationExplorer : a webserver for mutation of proteins and 3D visualization of energetic impacts. Nucleic Acids Res. 2024 Apr 22; [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] 18.Bateman A, Martin MJ, Orchard S, Magrane M, Ahmad S, Alpi E, et al. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 2023 Jan 6;51(D1):D523–31. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] 19.Rose AS, Bradley AR, Valasatava Y, Duarte JM, Prlić A, Rose PW. NGL viewer: web-based molecular graphics for large complexes. Bioinformatics. 2018 Nov 1;34(21):3755–8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R20] 20.Sivley RM, Dou X, Meiler J, Bush WS, Capra JA. Comprehensive Analysis of Constraint on the Spatial Distribution of Missense Variants in Human Protein Structures. The American Journal of Human Genetics. 2018. Mar;102(3):415–26. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] 21.Sivley RM, Sheehan JH, Kropski JA, Cogan J, Blackwell TS, Phillips JA, et al. Three-dimensional spatial analysis of missense variants in RTEL1 identifies pathogenic variants in patients with Familial Interstitial Pneumonia. BMC Bioinformatics. 2018 Dec 23;19(1):18. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] 22.Ashkenazy H, Abadi S, Martz E, Chay O, Mayrose I, Pupko T, et al. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res. 2016 Jul 8;44(W1):W344–50. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R23] 23.Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N. Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics. 2002 Jul 1;18(suppl_1):S71–7. [DOI] [PubMed] [Google Scholar]

[R24] 24.Li B, Roden DM, Capra JA. The 3D mutational constraint on amino acid sites in the human proteome. Nat Commun. 2022 Jun 7;13(1):3273. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R25] 25.Mukherjee S, Cogan JD, Newman JH, Phillips JA, Hamid R, Meiler J, et al. Identifying digenic disease genes via machine learning in the Undiagnosed Diseases Network. The American Journal of Human Genetics. 2021. Oct;108(10):1946–63. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R26] 26.Yuan Y, Zhang L, Long Q, Jiang H, Li M. An accurate prediction model of digenic interaction for estimating pathogenic gene pairs of human diseases. Comput Struct Biotechnol J. 2022;20:3639–52. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R27] 27.Gahl WA, Wise AL, Ashley EA. The Undiagnosed Diseases Network of the National Institutes of Health. JAMA. 2015 Nov 3;314(17):1797. [DOI] [PubMed] [Google Scholar]

[R28] 28.Ramoni RB, Mulvihill JJ, Adams DR, Allard P, Ashley EA, Bernstein JA, et al. The Undiagnosed Diseases Network: Accelerating Discovery about Health and Disease. The American Journal of Human Genetics. 2017. Feb;100(2):185–92. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R29] 29.Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R30] 30.Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011 Aug 1;27(15):2156–8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R31] 31.Cunningham F, Allen JE, Allen J, Alvarez-Jarreta J, Amode MR, Armean IM, et al. Ensembl 2022. Nucleic Acids Res. 2022 Jan 7;50(D1):D988–95. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R32] 32.McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016 Dec 6;17(1):122. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R33] 33.Sinitcyn P, Richards AL, Weatheritt RJ, Brademan DR, Marx H, Shishkova E, et al. Global detection of human variants and isoforms by deep proteome sequencing. Nat Biotechnol. 2023 Dec 23;41(12):1776–86. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R34] 34.Schneider VA, Graves-Lindsay T, Howe K, Bouk N, Chen HC, Kitts PA, et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 2017. May;27(5):849–64. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R35] 35.Morales J, Pujar S, Loveland JE, Astashyn A, Bennett R, Berry A, et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature. 2022 Apr 14;604(7905):310–5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R36] 36.Dana JM, Gutmanas A, Tyagi N, Qi G, O’Donovan C, Martin M, et al. SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Nucleic Acids Res. 2019 Jan 8;47(D1):D482–9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R37] 37.Berman HM. The Protein Data Bank. Nucleic Acids Res. 2000 Jan 1;28(1):235–42. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R38] 38.Pieper U, Webb BM, Dong GQ, Schneidman-Duhovny D, Fan H, Kim SJ, et al. ModBase, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 2014. Jan;42(D1):D336–46. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R39] 39.Bienert S, Waterhouse A, de Beer TAP, Tauriello G, Studer G, Bordoli L, et al. The SWISS-MODEL Repository—new features and functionality. Nucleic Acids Res. 2017 Jan 4;45(D1):D313–9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R40] 40.Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021 Aug 26;596(7873):583–9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R41] 41.Leach P, Mealling M, Salz R. A Universally Unique IDentifier (UUID) URN Namespace. 2005. Jul.

[R42] 42.Yoo AB, Jette MA, Grondona M. SLURM: Simple Linux Utility for Resource Management. In 2003. p. 44–60.

[R43] 43.Park H, Bradley P, Greisen P, Liu Y, Mulligan VK, Kim DE, et al. Simultaneous Optimization of Biomolecular Energy Functions on Features from Small Molecules and Macromolecules. J Chem Theory Comput. 2016 Dec 13;12(12):6201–12. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R44] 44.Kellogg EH, Leaver-Fay A, Baker D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins: Structure, Function, and Bioinformatics. 2011 Mar 3;79(3):830–8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R45] 45.Frenz B, Lewis SM, King I, DiMaio F, Park H, Song Y. Prediction of Protein Mutational Free Energy: Benchmark and Sampling Improvements Increase Classification Accuracy. Front Bioeng Biotechnol. 2020 Oct 8;8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R46] 46.Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018 Jan 4;46(D1):D1062–7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R47] 47.Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020 May 28;581(7809):434–43. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R48] 48.Tubiana J, Schneidman-Duhovny D, Wolfson HJ. ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction. Nat Methods. 2022 Jun 30;19(6):730–9. [DOI] [PubMed] [Google Scholar]

[R49] 49.Wang D, Zeng S, Xu C, Qiu W, Liang Y, Joshi T, et al. MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction. Bioinformatics. 2017 Dec 15;33(24):3909–16. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R50] 50.Cheng J, Novati G, Pan J, Bycroft C, Žemgulytė A, Applebaum T, et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science (1979). 2023 Sep 22;381(6664). [DOI] [PubMed] [Google Scholar]

[R51] 51.Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, et al. Pfam: The protein families database in 2021. Nucleic Acids Res. 2021 Jan 8;49(D1):D412–9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R52] 52.Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res. 2019 Jan 8;47(D1):D941–7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R53] 53.Yao X, Gao S, Yan N. Structural basis for pore blockade of human voltage-gated calcium channel Cav1.3 by motion sickness drug cinnarizine. Cell Res. 2022 Apr 27;32(10):946–8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R54] 54.Ezell KM, Tinker RJ, Furuta Y, Gulsevin A, Bastarache L, Hamid R, et al. Undiagnosed Disease Network collaborative approach in diagnosing rare disease in a patient with a mosaic CACNA1D variant. Am J Med Genet A. 2024 Jul 21;194(7). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R55] 55.Furuta Y, Tinker RJ, Gulsevin A, Neumann SM, Hamid R, Cogan JD, et al. Probable digenic inheritance of Diamond–Blackfan anemia. Am J Med Genet A. 2024 Mar 27;194(3). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R56] 56.Brown BP, Stein RA, Meiler J, Mchaourab HS. Approximating Projections of Conformational Boltzmann Distributions with AlphaFold2 Predictions: Opportunities and Limitations. J Chem Theory Comput. 2024 Jan 12; [DOI] [PMC free article] [PubMed] [Google Scholar]

[R57] 57.Sommer MJ, Cha S, Varabyou A, Rincon N, Park S, Minkin I, et al. Structure-guided isoform identification for the human transcriptome. Elife. 2022 Dec 15;11. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R58] 58.Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024 May 8; [DOI] [PMC free article] [PubMed] [Google Scholar]

[R59] 59.Nachtegael C, Gravel B, Dillen A, Smits G, Nowé A, Papadimitriou S, et al. Scaling up oligogenic diseases research with OLIDA: the Oligogenic Diseases Database. Database (Oxford). 2022 Apr 12;2022. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

This is a preprint.

VUStruct: a compute pipeline for high throughput and personalized structural biology

Christopher W Moth

Jonathan H Sheehan

Abdullah Al Mamun

R Michael Sivley

Alican Gulsevin

David Rinker

John A Capra

Jens Meiler

Abstract

Introduction

Design and Implementation

Figure 1.

1. Variant Upload and (optional) Genomic Preprocessing

2. Structure Selection and Compute Job planning.

3. Job Launch

4. Job Monitoring

5. Reporting

Figure 2.

Figure 3.

Figure 4.

Dependencies

Results/Application of VUStruct in the interpretation of clinical data:

Availability and Future Directions

Supplementary Material

Support and Thanks

Grants

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

This is a preprint.

VUStruct: a compute pipeline for high throughput and personalized structural biology

Christopher W Moth

Jonathan H Sheehan

Abdullah Al Mamun

R Michael Sivley

Alican Gulsevin

David Rinker

John A Capra

Jens Meiler

Abstract

Introduction

Design and Implementation

Figure 1.

1. Variant Upload and (optional) Genomic Preprocessing

2. Structure Selection and Compute Job planning.

3. Job Launch

4. Job Monitoring

5. Reporting

Figure 2.

Figure 3.

Figure 4.

Dependencies

Results/Application of VUStruct in the interpretation of clinical data:

Availability and Future Directions

Supplementary Material

Support and Thanks

Grants

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases