Author manuscript; available in PMC: 2019 Oct 1.
Published in final edited form as: Curr Opin Syst Biol. 2018 Sep 17;11:107–116. doi: 10.1016/j.coisb.2018.09.006

Extracting Complementary Insights from Molecular Phenotypes for Prioritization of Disease-Associated Mutations

Shayne D Wierbowski 1,2, Robert Fragoza 2,3, Siqi Liang 1,2, Haiyuan Yu 1,2
PMCID: PMC6510504  NIHMSID: NIHMS1510156  PMID: 31086831

Abstract

Rapid advances in next-generation sequencing technology have resulted in an explosion of whole-exome/genome sequencing data, providing an unprecedented opportunity to identify disease- and trait-associated variants in humans on a large scale. To date, the long-standing paradigm has leveraged fitness-based approximations to translate this ever-expanding sequencing data into causal insights in disease. However, while this approach robustly identifies variants under evolutionary constraint, it fails to provide molecular insights. Moreover, complex disease phenomena often violate the standard assumption that an organismal phenotype relates directly to overall fitness. Here we discuss the potential of a molecular phenotype-oriented paradigm to uniquely identify candidate disease-causing mutations from the human genetic background. Because molecular phenotypes provide a direct connection between single nucleotide mutations and the observable organismal and cellular phenotypes associated with disease, we suggest that they can be readily incorporated alongside established fitness-based methodologies to provide complementary insights into the functional impact of human mutations. Lastly, we discuss how approaches that integrate molecular phenotypes with fitness-based perspectives facilitate new insights into the molecular mechanisms underlying disease-associated mutations while also providing a platform for improved interpretation of epistasis in human disease.

Introduction

Ever-improving next-generation sequencing technologies have led to the ongoing discovery of tens of millions of DNA variants across diverse human populations [1] and have enabled the identification of tens of thousands of disease-associated mutations [2, 3]. Nonetheless, a vast majority of these variants remain uncharacterized and a corresponding understanding of how these unannotated variants may contribute to human disease and traits has yet to materialize [4]. Although numerous mutations occur in noncoding regions of genomes, missense variants are of particular interest to researchers since known disease- and trait-associated mutations have been shown to be enriched in coding regions [5]. Proper interpretation of the functional impact of missense mutations, which dominate exome sequencing datasets, remains a pivotal challenge. Overcoming this challenge will require new tools and approaches that better leverage large-scale sequencing data and that take advantage of newly emerging sources of experimentally assessed functional variant data.

Functional prediction algorithms have been a boon to the identification and prioritization of disease-associated mutations. Although early approaches to disease association specifically prioritized rare variants, tools such as SIFT [6–8], PolyPhen-2 [8, 9], CADD [10], and PROVEAN [11–13] have provided systematic methods for predicting the impact of missense variants. Other tools, such as GWAVA [14] and LinSIGHT [15], tailor their methodology specifically to non-coding variants. These tools share a central strategy: they use principles of population genetics and conservation, both within humans and across species, to approximate the fitness cost of specific variants. Cumulatively, these methods have been widely used to identify disease-associated mutations [16–21]. However, while these methods remain invaluable tools for prioritizing coding and non-coding mutations in disease, annotations from these tools alone do not provide insight into the underlying molecular mechanisms of causal variants. Indeed, no method to date can effectively identify true risk missense variants for human disease [22, 23].

A guiding principle of precision medicine is to accurately measure clinical and molecular attributes of individual patients so as to tailor personalized therapies based on the outcomes of these measurements [24]. Considering the millions of DNA variants segregating in human genomes, and the extraordinary level of allelic heterogeneity found in disease, success of the precision medicine effort hinges not only on the ability to detect disease-causing mutations, but also on the ability to understand and properly assess the functional consequences of these mutations. A major challenge, therefore, is to radically accelerate the pace of experimental and computational assessments of the functional impacts of the millions of single nucleotide variants (SNVs) uncovered by sequencing efforts. Direct assessments of molecular phenotypes (such as the impact of missense mutations on protein stability, enzymatic kinetics, or binding efficiencies, or the gene regulatory impact of non-coding mutations) provide a unique and complementary perspective to current methods for detecting causal disease mutations. Integrating molecular phenotype data into fitness-based approaches for identifying deleterious mutations may also provide new insights into how causal mutations function mechanistically, as well as a framework for dissecting epistatic relationships that modulate the impact of low penetrance mutations.

Caveats to Fitness-Based Methods

Long-standing computational methods rooted in approximating fitness effects have made considerable headway towards the identification of disease-causing mutations on genome-wide scales. However, carving out a path for future innovation in variant prioritization, and moreover in mechanistic interpretation, necessitates an awareness of the limitations and caveats surrounding current methods. Indeed, despite their widespread use, current algorithms often perform poorly in clinical settings, and the variants they predict to be deleterious seldom result in measurable phenotypes. For example, Miosge and colleagues examined 33 de novo missense mutations occurring in essential immune system genes in mice and found that only 20% of mutations predicted to be deleterious by PolyPhen-2 resulted in discernible phenotypes in mice homozygous for the de novo mutations tested [25]. A more recent study expanded the scope of this genotype-phenotype analysis by inducing 116,330 random ENU mutations in mice. Its results showed that only 17% of missense mutations scored as “probably damaging” by PolyPhen-2 resulted in discernible phenotypes in mice homozygous for the tested mutation [26]. Similar limitations for variant annotation algorithms were reported for a set of 236 clinically-relevant BRCA1/2 mutations [27]. Implicit biases in the training sets used to develop variant annotation algorithms [28, 29], including limited sensitivity to disease-associated common variation [30], as well as high false positive rates across classifiers [25–27, 31], may contribute to the limited accuracy of these methods in predicting organismal phenotypes. Moreover, variant annotation algorithms provide little to no mechanistic insight as to how a predicted deleterious variant may function. This information is critical for developing targeted hypotheses and clinical strategies to address causal mutations.

Variant annotation algorithms have limited sensitivity to disease-associated common variants

Variant annotation algorithms vary greatly in their applications as do the methodologies that drive their predictions. Briefly, algorithms specific to coding variation, including PolyPhen-2 and Mutation Taster, use various protein structure- and nucleotide-based databases to generate multiple sequence alignments for evaluating conservation of examined coding sites. Ultimately though, the breadth of disease-associated mutations represented in their training sets largely determines whether a variant annotation algorithm classifies a mutation as deleterious or not [32]. Biases and errors in these training sets can therefore limit the sensitivity of these tools to accurately detect deleterious variants [28], as can limited sensitivity for variants involved in complex, non-Mendelian disease [33]. In general, the lower the allele frequency of a variant, the more likely a variant annotation algorithm is to score it as deleterious [29]. As a result, variant annotation algorithms also underperform in detecting disease- and risk-associated mutations that occur at common allele frequencies [30, 33].

Given the conceptual framework of identifying causal variants through fitness effects, and the historic emphasis of previous studies on highly penetrant, Mendelian diseases, underperformance in detecting these deleterious common variants is unsurprising. Though purifying selection should limit the capacity of truly deleterious variants to achieve common allele frequencies (MAF > 1.0%), the probability of such variants reaching high allele frequencies is never zero, particularly if the variant affects a trait minimally associated with reproductive fitness. Indeed, several examples of clinically-relevant, disease-associated variants at common allele frequencies follow this pattern. For example, gene dosage effects from the apolipoprotein E type 4 allele (MAF = 18.4%) increase Alzheimer’s disease risk by 20 to 90% [34–36]. Likewise, carriers of the P12A polymorphism of PPARG (MAF = 11.0%) are significantly more likely to develop type 2 diabetes [37, 38]. Similar examples of common variants (MAF > 1.0%) that result in or modulate disease risk are detailed in current literature [24, 39–51] and briefly summarized in Table 1. Notably, only one of these listed disease-associated mutations scores as “probably damaging” by PolyPhen-2 while only a handful of cases are scored as “deleterious” by SIFT (Table 1). Moreover, functional mutations at common allele frequencies, including R543Q and C282Y mutations in F5 [52, 53] and HFE [54–57], respectively, represent disease mutations with incomplete penetrance (Table 1). Despite strong evidence linking these mutations to disease risk [52–57], a majority of carriers of these variants do not develop their associated diseases [58]. While there is evidence suggesting that many of these mutations may be annotation errors or artifacts of association studies [59, 60], partially penetrant disease-associated mutations nonetheless still modulate disease risk.
The current framework for variant annotation is evidently ill-suited to discern variants associated with subtle effects. Yet characterizing precisely these mutations will be crucial toward understanding how an individual’s genetic background determines their risk for particular diseases and influences complex traits.

Table 1.

A curation of the literature highlights several disease-associated variants occurring at unexpectedly common minor allele frequencies (MAF > 1.0%). These variants exhibit lower selection pressure than may be anticipated given their well-studied connections to disease phenotype, exemplifying the confounding that occurs when using fitness-driven perspectives to explain and detect disease mutations. Indeed, two common variant annotation algorithms, PolyPhen-2 and SIFT, have infrequently assigned these known functional mutations their highest functional annotations.

| Gene | Mutation | ExAC MAF | rsID | PolyPhen-2 Score | SIFT Score | Disease | Citation |
|---|---|---|---|---|---|---|---|
| APOE | C130R | 18.40% | rs429358 | benign | tolerated | Alzheimer’s disease | [34–36] |
| ARMS2 | A69S | 25.50% | rs10490924 | possibly damaging | deleterious (low confidence) | Age-related macular degeneration | [39, 40] |
| BTD | D444H | 3.20% | rs13078881 | benign | deleterious | Partial biotinidase deficiency | [41, 42] |
| CFH | Y402H | 32.80% | rs1061170 | benign | tolerated | Age-related macular degeneration | [43–45] |
| COL4A2 | E1123G | 1.70% | rs117412802 | possibly damaging | unscored | Haemorrhagic stroke | [46] |
| F5 | R543Q | 2.20% | rs6025 | benign | tolerated | Factor V Leiden | [52, 53] |
| HFE | C282Y | 3.20% | rs1800562 | probably damaging | deleterious | Haemochromatosis | [54–57] |
| INHA | A257T | 2.40% | rs12720062 | benign | tolerated | Premature ovarian failure | [47–49] |
| PPARG | P12A | 11.00% | rs1801282 | benign | deleterious (low confidence) | Type 2 diabetes | [37, 38] |
| PRSS1 | A16V | 1.60% | rs202003805 | benign | tolerated | Chronic pancreatitis | [50, 51] |
| TRIM22 | R321K | 3.00% | rs12364019 | possibly damaging | deleterious | Inflammatory bowel disease | [24] |
| TRIM22 | S244L | 1.40% | rs61735273 | possibly damaging | deleterious | Inflammatory bowel disease | [24] |
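
The claim that these known disease-associated variants rarely receive the strongest annotations can be checked by tallying the labels in Table 1. The short Python sketch below is an illustration only, not part of the original analysis: the rows are transcribed from Table 1, while the variable and function names are our own.

```python
from collections import Counter

# Rows transcribed from Table 1: (gene, mutation, PolyPhen-2 label, SIFT label).
TABLE_1 = [
    ("APOE", "C130R", "benign", "tolerated"),
    ("ARMS2", "A69S", "possibly damaging", "deleterious (low confidence)"),
    ("BTD", "D444H", "benign", "deleterious"),
    ("CFH", "Y402H", "benign", "tolerated"),
    ("COL4A2", "E1123G", "possibly damaging", "unscored"),
    ("F5", "R543Q", "benign", "tolerated"),
    ("HFE", "C282Y", "probably damaging", "deleterious"),
    ("INHA", "A257T", "benign", "tolerated"),
    ("PPARG", "P12A", "benign", "deleterious (low confidence)"),
    ("PRSS1", "A16V", "benign", "tolerated"),
    ("TRIM22", "R321K", "possibly damaging", "deleterious"),
    ("TRIM22", "S244L", "possibly damaging", "deleterious"),
]


def tally_labels(rows):
    """Count how often each PolyPhen-2 and SIFT label appears."""
    polyphen = Counter(row[2] for row in rows)
    sift = Counter(row[3] for row in rows)
    return polyphen, sift


polyphen_counts, sift_counts = tally_labels(TABLE_1)
print("PolyPhen-2:", dict(polyphen_counts))
print("SIFT:", dict(sift_counts))
```

The tally keeps low-confidence SIFT calls as a separate label rather than merging them with "deleterious", so a reader can decide how to treat them.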

High discordance between variant annotation algorithms

In practice, researchers incorporate multiple variant annotation algorithms to identify putatively functional mutations from whole-exome/genome sequencing data; however, discordance between the results of these algorithms is high. Indeed, a study that applied seven different variant annotation algorithms to data from the Exome Sequencing Project found that 47% of nonsynonymous variants were predicted to be functional by at least one algorithm while only 1% of nonsynonymous variants were scored as functional by all seven annotation tools [31]. Large discrepancies were also observed between variant annotation algorithms when applied to phenotype-associated mutations, and each algorithm was suggested to greatly overestimate the damaging effect of its predicted functional mutations [26]. A “majority rule” criterion, in which at least four of seven variant annotation algorithms must score the variant as functional for the variant to be considered deleterious, can be applied instead [31, 61], but false negative rates are presumably very high when combining the results from distinct variant annotation algorithms in this manner. Alternatively, the distinct datasets and annotation sources used to develop each of these variant annotation algorithms can be used to train a single support vector machine for predicting putatively functional alleles, as developed for CADD. Nonetheless, despite impressive classification accuracy, CADD achieved only a 15% success rate when applied to the aforementioned set of 33 de novo missense mutations in essential immune system genes studied by Miosge and colleagues [25].
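
To make the three consensus criteria mentioned above concrete (functional by at least one algorithm, by a majority of at least four of seven, or by all seven), the following Python sketch evaluates them for a single variant. It is a minimal illustration under stated assumptions: the algorithm names and the example calls are hypothetical placeholders, and the cited studies may have implemented their voting schemes differently.

```python
from typing import Dict, Mapping


def functional_votes(calls: Mapping[str, bool]) -> int:
    """Count how many algorithms scored the variant as functional."""
    return sum(1 for is_functional in calls.values() if is_functional)


def summarize(calls: Mapping[str, bool], min_votes: int = 4) -> Dict[str, object]:
    """Apply the 'at least one', 'majority rule', and 'unanimous' criteria."""
    votes = functional_votes(calls)
    return {
        "votes": votes,
        "any": votes >= 1,
        "majority": votes >= min_votes,
        "unanimous": len(calls) > 0 and votes == len(calls),
    }


# Hypothetical calls from seven unnamed predictors for one variant.
example_calls = {
    f"algorithm_{i}": flag
    for i, flag in enumerate(
        [True, True, False, True, False, False, False], start=1
    )
}

result = summarize(example_calls)
print(result)
```

In this hypothetical case three of seven predictors call the variant functional, so it passes the permissive criterion but fails majority rule, which illustrates how a stricter vote threshold trades false positives for false negatives.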

Variant annotation algorithms alone provide limited mechanistic insights

Mutations can perturb cellular activity in multiple ways. In particular, disease-associated missense mutations often function by disrupting protein-protein interactions [62–64], destabilizing protein folding [62, 63], or altering transcription factor activity [65, 66]. Understanding the molecular mechanisms through which disease-associated mutations function is imperative for developing clinical strategies to treat their corresponding phenotypes and for drug target assessment [67, 68]. In spite of this importance, only a single widely used variant annotation algorithm for coding variants, MutPred2 [69], currently evaluates the possible mechanisms by which mutations scored as deleterious may function. More precise predictions for deleterious variants and better insights into their corresponding molecular mechanisms may be achieved through improved structural databases that detail where missense mutations physically occur with respect to protein interface residues [70, 71]. Similar database improvements may also benefit variant annotation algorithms that score noncoding mutations, for example fitCons [72], which evaluates patterns of polymorphism and genetic divergence to estimate the “fitness consequence” of point mutations genome-wide. However, fitCons depends heavily on the accuracy of functional elements identified by ENCODE [73]. Recently developed sequence co-variation approaches to predicting the effects of DNA variants bypass dependence on structural features or functional noncoding annotations [74]; however, they do not provide mechanistic insight into how these epistatic dependencies emerge. As such, integrating structural and functional information from these datasets can provide improved and complementary insights into the molecular function of predicted deleterious mutations.

Molecular Phenotypes: an Orthogonal Framework

In assessing the impact of human variants, we highlight the importance of distinguishing three related yet distinct biological concepts: overall fitness, organismal/cellular phenotype, and molecular phenotype (Figure 1). Overall fitness refers to the ability of an individual to survive and reproduce. Organismal phenotypes refer to observable features, including disease phenotypes such as diabetes, autism spectrum disorder and cancer, or traits such as height, hair color and blood type. Molecular phenotypes refer to the direct effect of a variant at the molecular level, for example changes in gene expression, loss of protein stability, changes in enzymatic activity, or modifications to protein-protein, protein-DNA or protein-ligand interaction affinities.

Figure 1.

Graphical depiction of the relationship between three related biological concepts associated with human variations: 1) molecular phenotype, 2) organismal/cellular phenotype, and 3) overall fitness. All genetic variation is either molecularly inert or molecularly active. The accumulation of all molecularly active variants, each causing one or more molecular phenotypes, constitutes the unique genetic background of an individual. Molecular phenotypes provide the ultimate link explaining the mechanistic basis for how SNVs manifest in organismal/cellular phenotypes or come to be selected for or against through fitness effects. Although organismal phenotypes, in general, directly relate to overall fitness, weak effect diseases, late onset/post-reproductive diseases, and partially penetrant mutations often confound this relationship. Researchers have various tools to perform direct inquiries into how these three concepts relate to specific molecularly active variants. Human disease research aims to understand organismal/cellular phenotypes while population genetics provides insights into fitness, conservation, and selection. Researchers investigate molecular phenotypes either through direct experimental assays to observe underlying molecular phenotypes or through computational predictions of putative molecular phenotypes. The ultimate aim is to infer information about one vertex of the triangle through the other two; namely, scientists seek to infer which SNVs are causal disease variants through information about the overall fitness or molecular phenotype effects of the SNV.

All human genetic variation separates into molecularly inert or molecularly active variants depending on whether or not each variant causes a molecular phenotype. While not all molecular phenotypes contribute directly to observable organismal phenotypes, organismal or cellular phenotypes are largely derived from molecularly active variants and hence must be directly mediated through one or more molecular phenotypes. Likewise, overall fitness is always rooted in molecular phenotypes since molecular changes modulate the ability of the organism to perform various functions necessary for survival and reproduction. In principle, all organismal phenotypes associate with a fitness value ranging from deleterious, to neutral, to advantageous. While there is a direct relationship between organismal phenotypes and fitness, this relationship is not always clearly defined, particularly in specialized fields of disease research dealing with cancer biology, age or post-reproductive related diseases, and complex diseases with reduced penetrance [75]. In such disease studies, the one-to-one correspondence between fitness score and the severity of the organismal phenotype breaks down since clinically deleterious phenotypes can have limited impact on reproduction. Molecular phenotypes can be indispensable for characterizing these cases of ambiguous fitness-to-phenotype relationships.

Molecular phenotypes provide complementary information for identifying causal variants

Whereas most approaches leverage the link between fitness effects and organismal/cellular phenotypes, an alternative framework rooted in molecular phenotypes provides an orthogonal line of support. At least two degrees of separation lie between disease phenotypes caused by particular variants, the fitness effects of these variants, and our ability to discern these effects. By contrast, methods aimed at molecular phenotypes directly address the central link. The combination of these two rationally justified, yet conceptually distinct paths connecting SNVs to disease phenotype is expected to culminate in an overall higher degree of accuracy in predicting disease associations. The data and tools available for assessing molecular phenotypes currently lag far behind their equivalents for fitness-based approaches. Therefore, it is likely that established conservation and fitness-based methods will remain a valuable step in prioritizing variants, while more direct support from the orthogonal molecular phenotype data should lend strong confidence to the accuracy of these results.

For instance, a recently developed interaction perturbation framework leveraged annotations of protein-protein interaction (PPI) interface residues [71] alongside PolyPhen-2 scores [76]. Chen and colleagues demonstrated increased accuracy in distinguishing de novo risk variants in autism spectrum disorder from benign mutations in unaffected siblings. Figure 2A provides a reconstructed example in which a proband mutation scored as “probably damaging” by PolyPhen-2, P375L on the protein RARA, occurred on a predicted interface residue. In contrast, a second mutation scored as “probably damaging” by PolyPhen-2, R83H on the same RARA protein, was reported in an unaffected individual; however, R83H did not occur on a predicted interaction interface residue. Consequently, despite matching PolyPhen-2 predictions, only the proband P375L mutation was predicted to disrupt the heterodimeric interaction between RARA and RXRB, a prediction which the authors also validated experimentally. This exemplifies the potential for molecular phenotypes to aid in pinpointing candidate causal variants that are otherwise indistinguishable from molecularly inert variants using fitness-based methods alone.
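
The logic of the RARA example can be sketched as a two-step filter: keep variants with a damaging fitness-based score, then keep only those falling on an annotated interface residue. The Python code below is a simplified illustration and not the published framework of Chen and colleagues; the `Variant` class, the function names, and the interface set are hypothetical, and the only facts taken from the text are that P375L lies on a predicted RARA interface residue and R83H does not.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Variant:
    protein: str
    position: int
    ref: str
    alt: str
    polyphen2: str  # categorical PolyPhen-2 label


def on_interface(variant: Variant,
                 interfaces: Dict[str, FrozenSet[int]]) -> bool:
    """True if the variant position is an annotated interface residue."""
    return variant.position in interfaces.get(variant.protein, frozenset())


def prioritize(variants: List[Variant],
               interfaces: Dict[str, FrozenSet[int]]) -> List[Variant]:
    """Keep variants that are both 'probably damaging' and on an interface."""
    return [
        v for v in variants
        if v.polyphen2 == "probably damaging" and on_interface(v, interfaces)
    ]


# Placeholder annotation: a real interface would list many residues.
# Only membership of 375 (and absence of 83) reflects the text above.
interface_residues = {"RARA": frozenset({375})}

candidates = [
    Variant("RARA", 375, "P", "L", "probably damaging"),  # proband
    Variant("RARA", 83, "R", "H", "probably damaging"),   # unaffected individual
]

prioritized = prioritize(candidates, interface_residues)
print([f"{v.protein} {v.ref}{v.position}{v.alt}" for v in prioritized])
```

Both variants pass the fitness-based filter, and only the interface annotation separates them, which is the complementary signal the molecular phenotype contributes.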

Figure 2.

Molecular phenotypes, including the annotation of protein-protein interaction interface residues, can inform the mechanism of disease-associated mutations. A. Homology model between RARA (template 1DKF:B) and RXRB (template 1DKF:A) used to distinguish a potentially causal mutation from a benign mutation. A de novo mutation, P375L, on RARA identified in an autism spectrum disorder-affected individual occurs on an interface residue with RXRB. RARA interface residue mutations were not found in an unaffected sibling. B. Homology model between VHL (PDB 4WQO:A) and ELOC (PDB 4WQO:C) demonstrates the potential of molecular phenotypes to identify convergent mechanisms in divergent disease mutations. Variants on both of these proteins associate with the same disease and localize to the same interface. C. Homology model between BMP4 (template 1REW:B), BMPR1A (template 1REW:A), and BMPR1B (template 3VES:C) shows hypothesis-driven differentiation of mechanisms of different diseases based on molecular phenotype. Two variants on BMP4, A346V and W325C, which are associated with divergent diseases, localize to distinct interaction interfaces.

Leveraging molecular phenotype approaches towards disentangling molecular mechanisms of causal variants

The molecular phenotype framework provides clear potential to investigate the underlying mechanisms behind how variants manifest in disease phenotypes. Since the specific molecular defect associated with a variant often directly relates to the disease phenotype, identification of candidate variants based on molecular phenotype annotations should enable translational studies of disease etiology. The further development of methods to approximate and predict molecular phenotypes will facilitate the development of actionable hypotheses to direct future research.

For instance, Chen et al. used experimentally derived and computationally predicted annotations of protein interaction interface residues [71] as a predictor for the molecular phenotype, loss of PPI. In addition to distinguishing a true autism risk variant, P375L, from other “probably damaging” variants, the additional knowledge that this variant intersected with the RARA-RXRB interaction interface (Figure 2A) led to the testable hypothesis that this variant would disrupt this interaction, and helped to propose a pathway for RARA’s involvement in autism spectrum disorder through this interaction [76].

Extending the interface residue approximation for the loss of PPI molecular phenotype facilitates mechanistic inferences in other cases as well. This approach may be generalized to cases involving variants across both faces of an interface (Figure 2B). Corroborating cross-interface evidence may strengthen the hypothesis that disease-associated mutations function through disruption of a specific interaction and helps categorize distinct variants associated with the same disease by similarities in their molecular mechanisms. Figure 2B shows a known tumor suppressor gene-encoded protein, VHL [77, 78], with a mutation, L158Q, associated with renal cell carcinoma, in complex with an elongation factor, ELOC. The localization of L158Q at the ELOC interface suggests that the disease may function through disruption of the VHL-ELOC interaction. Moreover, ELOC contains several mutations on the same protein interaction interface, Y79F, Y79N, and Y79S, which are also associated with renal cell carcinoma, solidifying the hypothesis that these cross-interface variants drive a distinct form of renal cell carcinoma through a single shared molecular phenotype.

Understanding the molecular phenotypes caused by certain disease-associated mutations may further elucidate how several mutations on the same gene can associate with different diseases. For instance, two missense mutations found on the protein BMP4, A346V and W325C, are associated with a developmental defect, orofacial cleft 11, and with colorectal cancer, respectively: two clinically distinct diseases. The homology models provided in Figure 2C demonstrate that these variants localize to opposite ends of the BMP4 structure and occur at distinct protein-protein interaction interfaces. These insights suggest these distinct disease phenotypes may manifest through divergent pathways related to the biological functions of their distinctly targeted interaction partners. Indeed, although BMPR1A and BMPR1B are paralogous, previous studies have linked them to unique functions and disease states [79, 80].

Cumulatively, these interaction perturbation examples demonstrate how molecular phenotypes contribute to elucidation of disease etiology. We emphasize the potential to explore similar mechanistic hypotheses utilizing molecular phenotypes outside of PPI disruption. Recent studies have highlighted the value of examining other molecular phenotypes, including changes in protein stability [81, 82] as well as changes in gene expression level [83, 84], to unravel the pathogenic mechanisms of both coding and non-coding mutations.

Molecular phenotypes help dissect genetic epistasis and clear the path towards precision medicine

The combination of all molecularly active variants and their corresponding molecular phenotypes constitutes the genetic background that defines an individual (Figure 1). Frustratingly, some molecular phenotypes may never produce discernible organismal phenotypes, while others may do so only in the presence of specific, often unknown combinations of complementary molecular phenotypes. Indeed, recent studies in multiple organisms and human cell lines have identified complex pairwise and even multi-way interactions through which deficits in individual genes affect organismal/cellular phenotypes and fitness [85–87]. The complex behavior of genetic epistasis has been a major roadblock to establishing causal relationships between genetic variants and human disease. However, there is no epistasis at the molecular level when examining molecular phenotypes of variants. Therefore, particularly compared to fitness effects (one type of organismal phenotype), which may be completely masked by epistasis, the ability to record or predict concrete molecular phenotypes associated with otherwise silent variants will prove crucial for dissecting epistasis.

Molecular phenotype-based studies aimed at bridging this disconnect will carry immediate implications for precision medicine. On one front, leveraging molecular phenotype information to interpret the individual’s genetic background is vital for deciphering variation in disease risk and drug response/toxicity across the human population. For example, Young et al. have elucidated how multiple SNVs on SORL1 affect BDNF-induced SORL1 expression in neuronal cells, contributing to risk for Alzheimer’s disease [88]. More recently, Cheng-Hathaway et al. have uncovered the expression-reducing molecular phenotype of another variant, R47H on TREM2, that also increases risk of Alzheimer’s disease [89]. Additionally, a study by Hauser et al. demonstrated that multiple variants on GPCRs impact drug response via a variety of molecular alterations, including reduced or increased onset kinetics and altered G-protein-binding specificity [90]. By providing a means to identify and evaluate functional effects at a molecular resolution, these studies help disentangle the links between human genetic variation and personalized disease risk assessment.

On another front, knowledge of molecular phenotypes of diseased tissue, especially in cancer, provides direct guidance on population-wide treatment for specialized types of disease. Tumor subtyping based on mRNA expression, protein expression, and epigenetic profiles [91–95] has already been widely used for making therapeutic decisions. A complementary effort in a recent study identified master regulators for metastatic progression of gastroenteropancreatic neuroendocrine tumors across four distinct subtypes, allowing prioritization of compounds based on patient-specific master regulator activity [96]. Harnessing molecular phenotypes that modulate both the genetic background and the disease state of an individual will significantly improve the efficacy of disease prevention, diagnosis, and treatment in a personalized manner.

Conclusion

Incorporating direct assays for molecular phenotypes, together with novel computational methods that approximate molecular phenotypes, into the continued efforts to identify, prioritize, and understand causal variants in human disease is positioned to provide a truly orthogonal view to the longstanding fitness-based approach. Whereas current variant annotation algorithms rooted in sequencing and fitness approximations have yielded suboptimal specificity, novel methods directed at molecular phenotypes aim to extract complementary molecular insights otherwise unavailable. Towards these ends, researchers have conducted high-throughput assays to directly measure the functional impact of thousands of disease-associated missense mutations on protein-protein interactions [62, 63], protein stability [62], and DNA binding [65, 66]. Literature curation efforts by the IMEx Consortium have provided protein interaction perturbation data corresponding to nearly 8,000 coding mutations in humans [97]. Continued development of high-throughput approaches, including deep-mutational scanning pipelines capable of probing nearly the entire mutational landscape of targeted proteins [98–101], will provide an ever-larger resource of functional mutation data. These data will help elucidate the biochemical and evolutionary properties that differentiate truly damaging mutations from those that are benign.

Despite the impressive scale that high-throughput experimental pipelines have achieved [62, 63, 99], no experimental pipeline alone can keep pace with the rate of sequence variant discovery, highlighting the need for continued development of computational approaches and variant annotation algorithms. A comprehensive effort to integrate these sources of experimentally verified molecular phenotypes to further train widely used fitness-based models will be key to improving their accuracy and clinical application, but has yet to be implemented. Orthogonally, we also emphasize the continued need to develop novel algorithms distinct from the fitness paradigm that make direct predictions about putative molecular phenotypes. For instance, interaction interface residue annotations provide useful mechanistic insights, but low coverage in experimentally validated structures or homology models has limited their applicability. The recently published Interactome INSIDER resource provides a method to predict interface residues, and consequently loss-of-PPI phenotypes, in the absence of structural information [71]. MutPred2 enables a combination of approaches, making predictions both for overall functional effect and for prioritized potential mechanisms of action [69]. Recently, Wagih et al. released MutFunc, which contains precomputed predictions for every possible variant in H. sapiens, S. cerevisiae, and E. coli. These predictions include estimates for changes to protein stability, protein interaction interfaces, post-translational modifications, and transcription factor binding, among other approximations for molecular phenotypes [102]. Advances in this realm of widespread predictors for specific molecular phenotypes, which can prioritize targeted assays to validate those phenotypes, will prove crucial to ensuring that researchers can maintain up-to-date annotations of molecularly active variants.

Acknowledgments

This work was supported by National Institute of General Medical Sciences grants (R01 GM104424, R01 GM124559, R01 GM125639); Eunice Kennedy Shriver National Institute of Child Health and Human Development grant (R01 HD082568); National Human Genome Research Institute grants (UM1 HG009393, R01 HG008126); National Science Foundation grant (DBI-1661380); and Simons Foundation Autism Research Initiative grant (SF367561) to H.Y.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • (**)1. Snyder M, Du J, and Gerstein M, Personal genome sequencing: current approaches and challenges. Genes Dev, 2010. 24(5): p. 423–31.
  • 2. The 1000 Genomes Project Consortium, An integrated map of genetic variation from 1,092 human genomes. Nature, 2012. 491(7422): p. 56–65.
  • (*)3. Fu W, et al., Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature, 2013. 493(7431): p. 216–20.
  • (*)4. Stenson PD, et al., The Human Gene Mutation Database: 2008 update. Genome Med, 2009. 1(1): p. 13.
  • 5. Hindorff LA, et al., Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences, 2009. 106(23): p. 9362–9367.
  • (*)6. Ng PC, SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Research, 2003. 31(13): p. 3812–3814.
  • (*)7. Choi Y, A Fast Computation of Pairwise Sequence Alignment Scores Between a Protein and a Set of Single-Locus Variants of Another Protein. ACM BCB, 2012.
  • (*)8. Adzhubei I, Jordan DM, and Sunyaev SR, Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet, 2013. Chapter 7: p. Unit 7.20.
  • 9. Adzhubei IA, et al., A method and server for predicting damaging missense mutations. Nat Methods, 2010. 7(4): p. 248–9.
  • (*)10. Kircher M, et al., A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet, 2014. 46(3): p. 310–5.
  • 11. Seifi M and Walter MA, Accurate prediction of functional, structural, and stability changes in PITX2 mutations using in silico bioinformatics algorithms. PLoS One, 2018. 13(4): p. e0195971.
  • (*)12. Choi Y, et al., Predicting the functional effect of amino acid substitutions and indels. PLoS One, 2012. 7(10): p. e46688.
  • 13. Choi Y and Chan AP, PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics, 2015. 31(16): p. 2745–7.
  • 14. Ritchie GR, et al., Functional annotation of noncoding sequence variants. Nat Methods, 2014. 11(3): p. 294–6.
  • 15. Huang YF, Gulko B, and Siepel A, Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat Genet, 2017. 49(4): p. 618–624.
  • 16. Rosenberg S, et al., A recurrent point mutation in PRKCA is a hallmark of chordoid gliomas. Nat Commun, 2018. 9(1): p. 2371.
  • 17. Graf S, et al., Identification of rare sequence variation underlying heritable pulmonary arterial hypertension. Nat Commun, 2018. 9(1): p. 1416.
  • 18. Bhattacharya S, et al., Whole-genome sequencing of Atacama skeleton shows novel mutations linked with dysplasia. Genome Res, 2018. 28(4): p. 423–431.
  • 19. Tubeleviciute-Aydin A, et al., Rare human Caspase-6-R65W and Caspase-6-G66R variants identify a novel regulatory region of Caspase-6 activity. Sci Rep, 2018. 8(1): p. 4428.
  • 20. Bhatnager R and Dang AS, Comprehensive in-silico prediction of damage associated SNPs in Human Prolidase gene. Sci Rep, 2018. 8(1): p. 9430.
  • 21. Cunningham AD, et al., Coupling between Protein Stability and Catalytic Activity Determines Pathogenicity of G6PD Variants. Cell Rep, 2017. 18(11): p. 2592–2599.
  • 22. Iossifov I, et al., The contribution of de novo coding mutations to autism spectrum disorder. Nature, 2014. 515(7526): p. 216–21.
  • 23. Geisheker MR, et al., Hotspots of missense mutation identify neurodevelopmental disorder genes and functional domains. Nat Neurosci, 2017. 20(8): p. 1043–1051.
  • (*)24. Li Q, et al., Variants in TRIM22 That Affect NOD2 Signaling Are Associated With Very-Early-Onset Inflammatory Bowel Disease. Gastroenterology, 2016. 150(5): p. 1196–1207.
  • (*)25. Miosge LA, et al., Comparison of predicted and actual consequences of missense mutations. Proceedings of the National Academy of Sciences of the United States of America, 2015. 112(37): p. E5189–E5198.
  • (*)26. Wang T, et al., Probability of phenotypically detectable protein damage by ENU-induced mutations in the Mutagenetix database. Nature Communications, 2018. 9(1): p. 441.
  • (*)27. Ernst C, et al., Performance of in silico prediction tools for the classification of rare BRCA1/2 missense variants in clinical diagnostics. BMC Medical Genomics, 2018. 11(1): p. 35.
  • 28. Cooper GM and Shendure J, Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nature Reviews Genetics, 2011. 12: p. 628.
  • 29. Henn BM, et al., Estimating Mutation Load in Human Genomes. Nature Reviews Genetics, 2015. 16(6): p. 333–343.
  • 30. Ng PC and Henikoff S, Predicting the Effects of Amino Acid Substitutions on Protein Function. Annual Review of Genomics and Human Genetics, 2006. 7(1): p. 61–80.
  • 31. Tennessen JA, et al., Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science, 2012. 337(6090): p. 64–69.
  • 32. Care MA, et al., Deleterious SNP prediction: be mindful of your training data! Bioinformatics, 2007. 23(6): p. 664–672.
  • 33. Thomas PD and Kejariwal A, Coding single-nucleotide polymorphisms associated with complex vs. Mendelian disease: Evolutionary evidence for differences in molecular effects. Proceedings of the National Academy of Sciences of the United States of America, 2004. 101(43): p. 15398–15403.
  • (*)34. Corder E, et al., Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer’s disease in late onset families. Science, 1993. 261(5123): p. 921–923.
  • (*)35. Strittmatter WJ, et al., Apolipoprotein E: high-avidity binding to beta-amyloid and increased frequency of type 4 allele in late-onset familial Alzheimer disease. Proceedings of the National Academy of Sciences of the United States of America, 1993. 90(5): p. 1977–1981.
  • (*)36. Deary IJ, et al., Cognitive change and the APOE ɛ4 allele. Nature, 2002. 418: p. 932.
  • 37. Robitaille J, et al., The PPAR-gamma P12A polymorphism modulates the relationship between dietary fat intake and components of the metabolic syndrome: results from the Québec Family Study. Clinical Genetics, 2003. 63(2): p. 109–116.
  • 38. Florez JC, et al., Effects of the type 2 diabetes-associated PPARG P12A polymorphism on progression to diabetes and response to troglitazone. The Journal of Clinical Endocrinology and Metabolism, 2007. 92(4): p. 1502–1509.
  • (*)39. Kanda A, et al., A variant of mitochondrial protein LOC387715/ARMS2, not HTRA1, is strongly associated with age-related macular degeneration. Proceedings of the National Academy of Sciences, 2007. 104(41): p. 16227–16232.
  • (*)40. Rivera A, et al., Hypothetical LOC387715 is a second major susceptibility gene for age-related macular degeneration, contributing independently of complement factor H to disease risk. Human Molecular Genetics, 2005. 14(21): p. 3227–3236.
  • (*)41. Norrgard KJ, et al., Double mutation (A171T) and (D444H) is a common cause of profound biotinidase deficiency in children ascertained by newborn screening in the United States. Human Mutation, 1998. 11(5): p. 410.
  • (*)42. Borsatto T, et al., Biotinidase deficiency: clinical and genetic studies of 38 Brazilian patients. BMC Medical Genetics, 2014. 15(1): p. 96.
  • (*)43. Klein RJ, et al., Complement Factor H Polymorphism in Age-Related Macular Degeneration. Science, 2005. 308(5720): p. 385–389.
  • (*)44. Edwards AO, et al., Complement Factor H Polymorphism and Age-Related Macular Degeneration. Science, 2005. 308(5720): p. 421–424.
  • (*)45. Haines JL, et al., Complement Factor H Variant Increases the Risk of Age-Related Macular Degeneration. Science, 2005. 308(5720): p. 419–421.
  • (*)46. Jeanne M, et al., COL4A2 Mutations Impair COL4A1 and COL4A2 Secretion and Cause Hemorrhagic Stroke. The American Journal of Human Genetics, 2012. 90(1): p. 91–101.
  • (*)47. Chand AL, et al., Functional analysis of the human inhibin α subunit variant A257T and its potential role in premature ovarian failure. Human Reproduction, 2007. 22(12): p. 3241–3248.
  • (*)48. Chand AL, Harrison CA, and Shelling AN, Inhibin and premature ovarian failure. Human Reproduction Update, 2010. 16(1): p. 39–50.
  • (*)49. Shelling AN, et al., Inhibin: a candidate gene for premature ovarian failure. Human Reproduction, 2000. 15(12): p. 2644–2649.
  • (*)50. Witt H, Luck W, and Becker M, A signal peptide cleavage site mutation in the cationic trypsinogen gene is strongly associated with chronic pancreatitis. Gastroenterology, 1999. 117(1): p. 7–10.
  • (*)51. Chen J-M, et al., The A16V signal peptide cleavage site mutation in the cationic trypsinogen gene and chronic pancreatitis. Gastroenterology, 1999. 117(6): p. 1508–1509.
  • (*)52. Kujovich JL, Factor V Leiden thrombophilia. Genetics In Medicine, 2010. 13: p. 1.
  • (*)53. van Mens TE, Levi M, and Middeldorp S, Evolution of Factor V Leiden. Thromb Haemost, 2013. 110(07): p. 23–30.
  • 54. Beutler E, The HFE Cys282Tyr mutation as a necessary but not sufficient cause of clinical hereditary hemochromatosis. Blood, 2003. 101(9): p. 3347–3350.
  • 55. McCune CA, et al., Iron loading and morbidity among relatives of HFE C282Y homozygotes identified either by population genetic testing or presenting as patients. Gut, 2006. 55(4): p. 554–562.
  • 56. Whitlock EP, et al., Screening for hereditary hemochromatosis: A systematic review for the U.S. Preventive Services Task Force. Annals of Internal Medicine, 2006. 145(3): p. 209–223.
  • 57. Rossi E, Olynyk JK, and Jeffrey GP, Clinical penetrance of C282Y homozygous HFE hemochromatosis. Expert Review of Hematology, 2008. 1(2): p. 205–216.
  • (**)58. Cooper DN, et al., Where genotype is not predictive of phenotype: towards an understanding of the molecular basis of reduced penetrance in human inherited disease. Human Genetics, 2013. 132(10): p. 1077–1130.
  • 59. Lek M, et al., Analysis of protein-coding genetic variation in 60,706 humans. Nature, 2016. 536(7616): p. 285–291.
  • 60. Walsh R, et al., Reassessment of Mendelian gene pathogenicity using 7,855 cardiomyopathy cases and 60,706 reference samples. Genetics In Medicine, 2016. 19: p. 192.
  • 61. Fu W, et al., Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature, 2013. 493(7431): p. 216–220.
  • (*)62. Sahni N, et al., Widespread macromolecular interaction perturbations in human genetic disorders. Cell, 2015. 161(3): p. 647–60.
  • (*)63. Wei X, et al., A Massively Parallel Pipeline to Clone DNA Variants and Examine Molecular Phenotypes of Human Disease Mutations. PLOS Genetics, 2014. 10(12): p. e1004819.
  • 64. Zhong Q, et al., Edgetic perturbation models of human inherited disorders. Molecular Systems Biology, 2009. 5(1).
  • (*)65. Barrera LA, et al., Survey of variation in human transcription factors reveals prevalent DNA binding changes. Science, 2016. 351(6280): p. 1450–1454.
  • (*)66. Fuxman Bass JI, et al., Human gene-centered transcription factor networks for enhancers and disease variants. Cell, 2015. 161(3): p. 661–73.
  • 67. Stefl S, et al., Molecular Mechanisms of Disease-Causing Missense Mutations. Journal of Molecular Biology, 2013. 425(21): p. 3919–3936.
  • 68. Schenone M, et al., Target identification and mechanism of action in chemical biology and drug discovery. Nature Chemical Biology, 2013. 9: p. 232.
  • (**)69. Pejaver V, et al., MutPred2: inferring the molecular and phenotypic impact of amino acid variants. bioRxiv, 2017.
  • 70. Wang X, et al., Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nature Biotechnology, 2012. 30(2): p. 159–164.
  • (**)71. Meyer MJ, et al., Interactome INSIDER: a structural interactome browser for genomic studies. Nature Methods, 2018. 15: p. 107.
  • (*)72. Gulko B, et al., A method for calculating probabilities of fitness consequences for point mutations across the human genome. Nature Genetics, 2015. 47: p. 276.
  • (*)73. The ENCODE Project Consortium, An integrated encyclopedia of DNA elements in the human genome. Nature, 2012. 489(7414): p. 57–74.
  • (*)74. Hopf TA, et al., Mutation effects predicted from sequence co-variation. Nature Biotechnology, 2017. 35: p. 128.
  • 75. Wright A, et al., A polygenic basis for late-onset disease. Trends Genet, 2003. 19(2): p. 10.
  • (**)76. Chen S, et al., An interactome perturbation framework prioritizes damaging missense mutations for developmental disorders. Nat Genet, 2018. 50(7): p. 1032–1040.
  • 77. Sufan RI, Jewett MAS, and Ohh M, The role of von Hippel-Lindau tumor suppressor protein and hypoxia in renal clear cell carcinoma. American Journal of Physiology-Renal Physiology, 2004. 287: p. F1–F6.
  • 78. Kaelin WG, The von Hippel-Lindau Tumor Suppressor Protein: An Update. 2007. 435: p. 371–383.
  • 79. Sahni V, et al., BMPR1a and BMPR1b signaling exert opposing effects on gliosis after spinal cord injury. J Neurosci, 2010. 30(5): p. 1839–55.
  • 80. Racacho L, et al., Two novel disease-causing variants in BMPR1B are associated with brachydactyly type A1. Eur J Hum Genet, 2015. 23(12): p. 1640–5.
  • 81. Takano K, et al., An X-linked channelopathy with cardiomegaly due to a CLIC2 mutation enhancing ryanodine receptor channel activity. Hum Mol Genet, 2012. 21(20): p. 4497–507.
  • 82. Koczok K, et al., A novel point mutation affecting Asn76 of dystrophin protein leads to dystrophinopathy. Neuromuscul Disord, 2018. 28(2): p. 129–136.
  • 83. Aneichyk T, et al., Dissecting the Causal Mechanism of X-Linked Dystonia-Parkinsonism by Integrating Genome and Transcriptome Assembly. Cell, 2018. 172(5): p. 897–909.e21.
  • 84. Hua JT, et al., Risk SNP-Mediated Promoter-Enhancer Switching Drives Prostate Cancer through lncRNA PCAT19. Cell, 2018. 174(3): p. 564–575.e18.
  • 85. Costanzo M, et al., A global genetic interaction network maps a wiring diagram of cellular function. Science, 2016. 353(6306): p. aaf1420.
  • 86. Kuzmin E, et al., Systematic analysis of complex genetic interactions. Science, 2018. 360(6386).
  • 87. Horlbeck MA, et al., Mapping the Genetic Landscape of Human Cells. Cell, 2018. 174(4): p. 953–967.e22.
  • 88. Young JE, et al., Elucidating molecular phenotypes caused by the SORL1 Alzheimer’s disease genetic risk factor using human induced pluripotent stem cells. Cell Stem Cell, 2015. 16(4): p. 373–85.
  • 89. Cheng-Hathaway PJ, et al., The Trem2 R47H variant confers loss-of-function-like phenotypes in Alzheimer’s disease. Mol Neurodegener, 2018. 13(1): p. 29.
  • 90. Hauser AS, et al., Pharmacogenomics of GPCR Drug Targets. Cell, 2018. 172(1–2): p. 41–54.e19.
  • 91. Yersal O and Barutca S, Biological subtypes of breast cancer: Prognostic and therapeutic implications. World J Clin Oncol, 2014. 5(3): p. 412–24.
  • 92. Huang KL, et al., Proteogenomic integration reveals therapeutic targets in breast cancer xenografts. Nat Commun, 2017. 8: p. 14864.
  • 93. Zhang H, et al., Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer. Cell, 2016. 166(3): p. 755–765.
  • 94. Chen TW, et al., APOBEC3A is an oral cancer prognostic biomarker in Taiwanese carriers of an APOBEC deletion polymorphism. Nat Commun, 2017. 8(1): p. 465.
  • 95. Lomberk G, et al., Distinct epigenetic landscapes underlie the pathobiology of pancreatic cancer subtypes. Nat Commun, 2018. 9(1): p. 1978.
  • 96. Alvarez MJ, et al., A precision oncology approach to the pharmacological targeting of mechanistic dependencies in neuroendocrine tumors. Nat Genet, 2018. 50(7): p. 979–989.
  • (**)97. del-Toro N, et al., Capturing variation impact on molecular interactions: the IMEx Consortium mutations data set. bioRxiv, 2018.
  • (*)98. Fowler DM, et al., High-resolution mapping of protein sequence-function relationships. Nature Methods, 2010. 7: p. 741.
  • (*)99. Fowler DM and Fields S, Deep mutational scanning: a new style of protein science. Nature Methods, 2014. 11: p. 801.
  • (*)100. Starita LM, et al., Activity-enhancing mutations in an E3 ubiquitin ligase identified by high-throughput mutagenesis. Proceedings of the National Academy of Sciences, 2013. 110(14): p. E1263–E1272.
  • (*)101. Starita LM, et al., Massively Parallel Functional Analysis of BRCA1 RING Domain Variants. Genetics, 2015. 200(2): p. 413–422.
  • (**)102. Wagih O, et al., Comprehensive variant effect predictions of single nucleotide variants in model organisms. 2018.
