Abstract
Advances in next-generation sequencing have identified thousands of genomic variants that perturb the normal functions of proteins, further contributing to diverse phenotypic consequences in cancer. Elucidating the functional pathways altered by loss-of-function (LOF) or gain-of-function (GOF) mutations will be crucial for prioritizing cancer-causing variants and their resultant therapeutic liabilities. In this review, we highlight the fundamental function of GOF mutations and discuss the potential mechanistic effects in the context of signaling networks. We also summarize advances in experimental and computational resources, which will dramatically help with studies on the functional and phenotypic consequences of mutations. Together, systematic investigations of the function of GOF mutations will provide an important missing piece for cancer biology and precision therapy.
LOF versus GOF Mutations in Cancer
Rapidly evolving high-throughput sequencing technologies are increasing the amount of human genotypic information available and are identifying genomic variants (see Glossary) associated with numerous types of cancer [1,2]. However, how these genomic variants contribute to the heterogeneous phenotype of a tumor, including clinical outcome and response to therapy, remains largely unknown [3,4]. Therefore, understanding functional consequences of these genomic variants is crucial.
Genomic variants that affect an organism are often categorized into two basic types [5]: loss-of-function (LOF) mutations and gain-of-function (GOF) mutations (Figure 1A). Wild-type alleles typically encode a protein product that is necessary for specific functions. However, when a mutation occurs in the gene, the function of the protein may be lost. These mutations are generally termed LOF mutations. Moreover, according to the degree of functional loss, these mutations are further classified as null or leaky mutations. In contrast, a new protein isoform can be generated as a result of a mutation. This new isoform may perform a new and important function. Mutations in this class are generally called GOF mutations (Figure 1A). Notably, GOF mutations produce proteins with new functions and add variety to the cellular system [6]. However, much attention in recent studies has focused on investigating the phenotypic consequence of LOF mutations (Figure 1B), and we still lack knowledge about the functions of GOF mutations. This may be due to a combination of several factors. First, it is more difficult to explore novel functions of mutant proteins than to test whether a known function is lost. Second, the computational and experimental methods for investigation of GOF mutations are limited. Third, the majority of LOF mutations are located in structured protein domains while GOF mutations are more likely to be enriched in unstructured regions [7], which have received less attention in current studies. Indeed, although the majority of disease-associated genomic variants have been documented to occur in structured protein domains [8-10], an increasing number of studies have revealed critical mutations in unstructured regions [7,11,12], such as intrinsically disordered regions (IDRs). Furthermore, these mutations have been demonstrated to function through a GOF mechanism [13].
Figure 1. LOF and GOF Mutations in Cancer.
(A) Illustration of loss-of-function (LOF) mutations (left panel) and gain-of-function (GOF) mutations (right panel). Genomic variants can result in complete loss of protein products (null), a decrease in protein level (leaky), or the generation of novel protein isoforms (gain). The colored boxes represent the gene and the circles are the protein products of the gene. Green, wild type; blue, genes with LOF mutations; red, genes with GOF mutations. (B) The bar plots show the number of publications that investigated GOF mutations (top panel) and LOF mutations (bottom panel) in recent years from PubMed. The dotted line plots (right y axis) show the cumulative number of publications.
Together, these efforts highlight the importance of the functional characterization of GOF mutations in cancer, which will open a promising direction for cancer biology. Moreover, computational and experimental methods are emerging to enable genome-wide functional characterization of mutations. Thus, it is time to turn our attention to the function of GOF mutations. In this review, we first provide a brief overview of the functional characterization of GOF variants in human cancer. We then discuss the potential functional and phenotypic consequences of GOF mutations and describe a toolkit of recent experimental and computational resources used to functionally characterize GOF/LOF mutations. Finally, we aim to provide systematic strategies for connecting genotypes to phenotypes by integrating computational and experimental platforms.
Phenotypic Consequence of GOF Mutations in Cancer
The large amount of cancer-related genomic variants now available has created a clear need for the functional characterization of these variants. Understanding the function of these variants should help with distinguishing driver variants from passenger mutations. Previous genotype–phenotype relationships were modeled under the assumption that disease-associated variants lead to complete loss of protein function through changes in protein folding and stability [14]. However, explaining the consequence of several types of variants, such as missense mutations, is difficult. For example, whether the mutations will result in the gain of novel functions by increasing the misfolding and/or instability of the resultant protein is difficult to predict. Moreover, genes and gene products do not function in isolation but function in the context of biological networks [15]. Systematic analysis of the functional interaction profile perturbation induced by the genomic variants has helped distinguish driver mutations from common variants [16,17].
Gain of Structural Domains
Protein domains are evolutionarily conserved regions with independent functional properties. The structure–function relationship encoded in protein domains has been used for understanding the functional effects of disease-related mutations. For example, the most frequent mutations D835 in FLT3, D816 in KIT, and V600 in BRAF, all located in the kinase domains, can cause constitutive activation of these oncogenes [18,19]. However, how these mutations specifically affect the domain structures is still unknown. Genome-wide screens have also identified a number of domain hotspots in various types of cancer [9,20]. In addition, proteins may acquire new structural domains by various mechanisms (Figure 2A). Several types of genomic variants have been demonstrated to play critical roles in domain acquisition, such as gene fusion, extension of exons, and recombination [21]. Gene fusions are becoming increasingly recognized as important players in solid tumors and fusion proteins disproportionately connect with proteins that did not previously interact [22]. However, we still lack knowledge about the effects of these variants, especially the gain of domains, on downstream signaling pathways.
Figure 2. Potential Functional Consequences of Gain-of-Function (GOF) Mutations in the Context of Signaling Pathways.
Genomic variants may generate novel domains (A), intrinsically disordered regions (IDRs) (B), and linear motifs (C), with further gain in the protein/domain interactions. Genomic variants can also generate novel transcription factor (TF) binding sites (D), miRNA-binding sites (E), and RNA-binding protein (RBP)-binding sites (F) to perturb the regulatory network in cancer. Abbreviations: SLiM, short linear motif; UTR, untranslated region.
Biological processes and biochemical reactions are mainly regulated by protein interactions, including binary protein–protein interactions (PPIs) and co-complexes [16]. Acquisition of protein domains may allow the protein to gain novel functions by interacting with other domains (Figure 2A). For example, PIK3CA is a frequently mutated gene in cancer [23,24]. The E545K mutation located in the helical domain of PIK3CA is reported to result in the protein gaining the ability to associate with insulin receptor substrate 1 (IRS1), thereby rewiring the oncogenic signaling pathway in cancer (Figure 3A) [25]. Discovery of gained protein interactions will provide detail for interpretation of the cellular mechanism of GOF mutations. Recently, much protein interaction data [26], as well as domain–domain interaction data [27], have become available due to the advancement of large-scale functional screening techniques. These valuable resources will provide key evidence for revealing the consequences of GOF mutations.
Figure 3. Representative Examples of Gain-of-Function (GOF) Mutations That Perturb Signaling Pathways.
(A) PIK3CA plays a key role by recruiting other proteins to the membrane, activating signaling cascades involved in cell growth, survival, proliferation, motility, and morphology. When normal cells are stimulated by growth factors (for example, through receptor tyrosine kinases), PIK3CA is brought to the cell membrane through binding of p85 to phospho-IRS1, thereby converting PIP2 to PIP3. An E545K mutation located in the helical domain of PIK3CA improves the ability of the protein to associate with IRS1 and aids in converting PIP2 to PIP3, thereby rewiring the oncogenic signaling pathway in cancer. (B) Gain of dileucine motifs (red spheres) induced by mutations in cytosolic intrinsically disordered regions (IDRs) of GLUT1 and CACNA1H in the cellular membrane and ITPR1 in the endoplasmic reticulum (ER) may result in mislocalization from the plasma membrane to endocytic compartments. (C) C-to-T mutations and CC-to-TT substitutions in the TERT promoter gain binding sites for transcription factors (TFs) from the ETS family, thus promoting overexpression of TERT and resulting in tumor development. (D) C-to-A mutations in the 3′ untranslated region (UTR) of GFPT1 gain a binding site for miR-206*, and ultimately reduce expression of GFPT1 by inhibiting translation. Abbreviations: IRS1, insulin receptor substrate 1; SLiM, short linear motif; TERT, telomerase reverse transcriptase; TFBSs, TF binding sites.
Gain of Novel IDRs
Although proteins are well known to perform their function via a folded 3D structure, recent studies have shown that some regions are unable to fold into tertiary structures. These regions are usually called IDRs, and have been implicated in a number of cancer types [28,29]. With the advent of the IDR concept and its role in cancer, such as acquiring abilities to recognize other molecules and to bind other proteins, DNA, and RNA [30], studies have emphasized the consequences of genomic variants in IDRs [7,12,30]. These findings illustrate the importance of variants in IDRs for cancer manifestation through a gain-of-IDR mechanism (Figure 2B).
A plausible hypothesis for the consequence of variants in IDRs is that they primarily perturb disorder-mediated processes, such as molecular interactions, and consequently the signaling and regulatory networks (Figure 2B). For example, c-Myc has been shown to use its IDRs to perform diverse interactions in cancer [31]. Moreover, Meyer et al. revealed that neurological disease mutations in the IDRs of three proteins (GLUT1, ITPR1, and CACNA1H) could increase clathrin binding through gain of a dileucine motif (Figure 3B) [12]. Further experiments demonstrated that gain of the motif can cause mislocalization of GLUT1 from the plasma membrane to endocytic compartments by recruiting several adaptor proteins [12]. In addition, understanding the functional impact of variants in IDRs has another interesting implication. Recent studies suggest that IDRs could serve as drug targets for small molecules [32] due to their importance and over-representation in signaling and major disease pathways. However, the sequences of IDRs change more rapidly during evolution [7] and the transient protein interactions induced by GOF mutations are difficult to identify. Moreover, computational methods have estimated that 20–25% of disease mutations map to IDRs [30], and only a small fraction of the IDRs have been functionally characterized so far [33,34]. Recently, an IDR screen was proposed that allows mechanism-independent discovery of IDRs in a cellular context [35]. A library of either random or designed sequences was transformed into cells and screened to discover the functional or non-functional sequences. These experimentally validated data were used to learn the rules of functionality based on machine-learning methods. As it is difficult to discern the link between GOF mutations and IDRs, integration of these data can help uncover the functional consequences of GOF mutations, as well as allow the design of new therapy targets in cancer.
Gain of SLiMs
Short linear motifs (SLiMs) are protein-binding modules that play major roles in cell signaling, cancer development, and progression [36]. SLiMs generally occur in IDRs and have no stable 3D structure. However, despite their functional importance, few have been characterized. An increasing number of genomic variants have been found in SLiMs, which mediate important interactions among molecules (Figure 2C). For example, β-catenin is a multifunctional protein that is an important component of the Wnt signaling pathway. Mutations in the β-catenin gene (CTNNB1) are implicated in various cancers through the perturbation of the DEG_SCF_TRCP1_1 SLiM [37,38]. In addition, the mutated proto-oncogene ETV1 is demonstrated to promote prostatic epithelial cell proliferation and tumorigenesis due to the lack of COP1 binding motifs [39]. However, even in the well-characterized cancer pathways, large numbers of functional elements remain undiscovered. In particular, most recent studies have focused on the consequence of loss of SLiMs because it is time consuming and expensive to investigate the gain of SLiMs. Another issue is that although several studies have documented the interactions among SLiMs and other molecules [33], the number is still limited. By screening the cancer mutations in The Cancer Genome Atlas (TCGA) [40], we found that the number of mutations that can generate novel SLiMs is equal to the number that cause a loss of motifs. Taken together, these findings illustrate that the interpretation of the functional effects of GOF mutations using SLiM interaction models is an important area for further research in systems biology.
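The gain-versus-loss screen described above can be illustrated with a minimal sketch. It assumes that a motif is represented as a regular expression and that a missense mutation is applied to the wild-type protein sequence before rescanning. The dileucine pattern used here, [DE]xxxL[LI], is a simplified consensus chosen for illustration rather than a curated ELM entry, and the peptide and function names are hypothetical.

```python
import re

# Simplified acidic dileucine consensus, [DE]xxxL[LI].
# Illustrative assumption: not a curated pattern from the ELM resource.
DILEUCINE = re.compile(r"[DE]...L[LI]")


def apply_substitution(seq, position, ref, alt):
    """Return seq with the residue at 1-based `position` changed from ref to alt."""
    found = seq[position - 1]
    if found != ref:
        raise ValueError(f"expected {ref} at position {position}, found {found}")
    return seq[:position - 1] + alt + seq[position:]


def motif_starts(seq, pattern):
    """Return the 0-based start positions of all matches, overlaps included."""
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    return {match.start() for match in lookahead.finditer(seq)}


def classify_motif_change(wild_type, mutant, pattern):
    """Label a sequence change as 'gain', 'loss', 'gain+loss', or 'none'."""
    before = motif_starts(wild_type, pattern)
    after = motif_starts(mutant, pattern)
    gained, lost = after - before, before - after
    if gained and lost:
        return "gain+loss"
    if gained:
        return "gain"
    if lost:
        return "loss"
    return "none"


# Hypothetical toy peptide, not a real protein fragment.
# The P7L substitution creates the stretch "EAGSLL", which matches the pattern.
wild_type = "MSEAGSPLKR"
mutant = apply_substitution(wild_type, position=7, ref="P", alt="L")
print(classify_motif_change(wild_type, mutant, DILEUCINE))
```

A regular-expression match only indicates that a motif is possible at the sequence level. Whether the site is accessible and functional in the cell would still need the experimental platforms discussed later in this review.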
Gain of TF Binding Sites
Only a small fraction of genomic variants are located in the coding region, with more than 90% of the disease-associated variants falling within the noncoding regions [41]. These noncoding variants can play a functional role by perturbing interactions between transcription factors (TFs) and their binding sites [42]. In addition to the loss of TF-binding sites (TFBSs), a gain of TFBSs has also been observed (Figure 2D). Telomerase reverse transcriptase (TERT), which encodes the catalytic subunit of the telomerase enzyme, has been reported to be recurrently mutated in its promoter in many different cancer types [43,44]. These mutations can create binding motifs for TFs in the ETS family, including ternary complex factors, and lead to their binding to the promoter regions of TERT, subsequently upregulating TERT expression (Figure 3C). In addition, the R84C mutation in LHX4, which is located in the LIM domain that modulates DNA binding, can also gain interactions [45]. This missense mutation in the LHX4 gene was associated with variable pituitary hormone deficiencies [46].
As super-enhancer regions are likely to recruit many TFs and drive the expression of downstream genes [47], the gain of TFBSs can also occur in enhancer regions. TAL1 (T cell acute lymphocytic leukemia 1) is an oncogene and plays an important role in cell differentiation [48]. The mutations in a super-enhancer upstream of TAL1 can create MYB-binding sites, which results in the overexpression of TAL1 in cancer [49]. A systematic analysis has also found evidence for the acquisition of a novel gene function through enhancer reprogramming, which took place through the acquisition of new TFBSs in 72% of the reprogramming events [50]. These findings suggest that mutations gain novel gene function through the acquisition of novel TFBSs (Figure 2D), which expands the gene regulatory landscapes into new regulatory domains.
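As a rough illustration of how a promoter or enhancer substitution can create a TF-binding site, the sketch below counts occurrences of the ETS-family core sequence GGAA on both strands before and after a single-base change. It is a simplification under stated assumptions: real analyses would score position weight matrices (such as those catalogued in TRANSFAC) rather than exact core matches, and the sequence shown is a toy example, not the actual TERT promoter.

```python
ETS_CORE = "GGAA"  # core of the ETS-family recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_sites(seq, core=ETS_CORE):
    """Count exact occurrences of `core` on both strands (overlaps allowed).

    A palindromic core would be counted once per strand; GGAA is not palindromic.
    """
    total = 0
    for strand in (seq, reverse_complement(seq)):
        start = strand.find(core)
        while start != -1:
            total += 1
            start = strand.find(core, start + 1)
    return total


def substitute(seq, index, ref, alt):
    """Return seq with the base at 0-based `index` changed from ref to alt."""
    if seq[index] != ref:
        raise ValueError(f"expected {ref} at index {index}, found {seq[index]}")
    return seq[:index] + alt + seq[index + 1:]


# Toy sequence (hypothetical): a C>T change creates TTCC, i.e. GGAA on the other strand.
wild_type = "CCCCTCCGGG"
mutant = substitute(wild_type, index=3, ref="C", alt="T")

print(count_sites(wild_type), count_sites(mutant))
```

A variant is flagged as a candidate gain of a TFBS when the mutant count exceeds the wild-type count. The same comparison run in the other direction flags candidate losses.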
Gain of miRNA Binding Sites
Noncoding genomic variants not only affect transcriptional regulation but can also affect other biological processes, such as translation efficiency or splicing, by gain of interactions with miRNAs (Figure 2E) and RNA-binding proteins (RBPs) (Figure 2F). It is estimated that more than 60% of human coding genes are targeted by miRNAs, mostly by binding to the 3′ untranslated region (UTR) [51]. Thus, the potential for mutations in the 3′ UTR to disrupt or create new miRNA target sites is not surprising. Such perturbations of miRNA regulation, either loss or gain, may result in oncogene activation or tumor suppressor inactivation, respectively [52]. For example, a genomic variant within the E2F1:MIR136-5p target site has been demonstrated to disrupt miRNA-mediated regulation and lead to increased oncogene activity of E2F1 in colorectal cancer [53]. Similarly, other studies have revealed genomic variants that result in the gain of miRNA-binding sites [53-55]. For example, the myostatin (GDF8) mutation (a G to A transition) in the 3′ UTR creates a target site for miR-1 and miR-206, which are highly expressed in skeletal muscle. The gain of miRNA-binding sites causes translational inhibition of this muscle-specific chalone, and hence contributes to muscular hypertrophy [56]. Another mutation (c.*22C>A) in the GFPT1 gene leads to illegitimate binding of miR-206*, resulting in reduced protein expression levels (Figure 3D) [57]. The examples above do not constitute an exhaustive list of all mutations known to gain miRNA binding sites but illustrate the potential ways in which GOF mutations exhibit their functional effects in cancer (Figure 2E).
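The creation of a miRNA target site by a 3′ UTR variant can be sketched as a seed-match comparison. The example below assumes the common simplification that a target site is the reverse complement of miRNA nucleotides 2 to 8, and it only tests for exact 7-mer matches. The miRNA sequence is an assumption included for illustration and should be checked against miRBase, the UTR fragments are toy sequences, and dedicated predictors such as TargetScan or MiRanda apply much richer models.

```python
# RNA base on the miRNA -> DNA base on the UTR (cDNA alphabet) that pairs with it.
_PAIRING = {"A": "T", "C": "G", "G": "C", "U": "A"}


def seed_site(mirna):
    """Return the UTR 7-mer (DNA alphabet) complementary to miRNA nucleotides 2-8."""
    seed = mirna[1:8]
    return "".join(_PAIRING[base] for base in reversed(seed))


def gained_seed_sites(wild_type_utr, mutant_utr, mirnas):
    """Return names of miRNAs whose seed site is present only in the mutant UTR."""
    gained = []
    for name, sequence in mirnas.items():
        site = seed_site(sequence)
        if site in mutant_utr and site not in wild_type_utr:
            gained.append(name)
    return gained


# Assumed mature miRNA sequence for illustration only; verify against miRBase.
MIRNAS = {"miR-1": "UGGAAUGUAAAGAAGUAUGUAU"}

# Toy UTR fragments (hypothetical): a single G>A change completes the seed site.
wild_type_utr = "TTGGCATTCCAGA"
mutant_utr = "TTGACATTCCAGA"

print(seed_site(MIRNAS["miR-1"]))
print(gained_seed_sites(wild_type_utr, mutant_utr, MIRNAS))
```

Swapping the two UTR arguments lists the seed sites that a variant destroys, which gives the loss-of-site counterpart of the same test.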
Gain of RBP Binding Sites
RBPs bind RNA through globular RNA-binding domains (RBDs) and regulate the function of the bound RNAs. With the advancement of high-throughput sequencing technology, hundreds of RBPs have been discovered and investigated [58]. Given the importance of RBPs in regulating gene expression, perturbations in RBP-gene regulation can lead to various types of cancer [59,60]. Evidence has shown that cancer-related mutations are enriched in the binding motifs of RBPs, and subsequently perturb the expression and alternative splicing of the target genes [61]. Recently, we systematically analyzed the mutation-perturbed RBP interactome across cancer types. We found that in addition to mutations that suppress RBP regulation of targets, a number of other mutations could activate the regulation as well [62]. Moreover, several studies have constructed databases for exploration of mutations within RBP binding sites [63,64], including GOF mutations. These observations suggest that we can investigate the function of GOF mutations by comprehensive analysis of gained RBP-gene regulation in cancer (Figure 2F).
Experimental Methods for Characterizing GOF Mutations
Functional analysis of cancer mutations is key to understanding the potential mechanisms and developing therapeutic targets. Recent large-scale experimental platforms have revolutionized our ability to characterize cancer mutations. In this section, we summarize the recent experimental platforms that can be used for the characterization of genomic variants from the view that we discussed above (Figure 4).
Figure 4. Experimental Platforms That Are Useful for Investigating the Function of GOF Mutations.
(A) The platforms for identification of protein/domain interactions of gain-of-function (GOF) mutations. Customized domains can be spotted to assay the gained interactions. The yeast two-hybrid (Y2H) system can be used to identify the interacting partners of wild-type and mutated proteins. Affinity purification coupled with mass spectrometry (APMS) identifies changes in protein interaction partners between wild-type and mutant proteins based on antibody-based AP followed by MS. (B) The platforms for investigating protein–DNA interactions. ChIP-seq combines chromatin immunoprecipitation (ChIP) with sequencing to identify DNA sites to which transcription factors (TFs) bind. In enhanced yeast one-hybrid (eY1H) screens, a DNA fragment is cloned upstream of a reporter. Upon binding of the TF of interest, the reporter is turned on and the activity is measured. (C) Experimental platforms for revealing the miRNA–RNA regulation. In Argonaute (AGO)-CLIP-Seq, the binding sequences of AGO protein were aligned to the genome for identification of miRNA binding sites. GeneCopoeia relies on the wild-type and mutated untranslated regions (UTRs) being cloned. Upon binding of the miRNAs of interest, the activity can be measured by luciferase. (D) RNA-centered platforms for identification of RNA-binding protein (RBP)–RNA interactions induced by GOF mutations. The RBPs interacting with wild-type and mutated RNAs were pulled down for MS. In PRIMA, the wild-type and mutant were cloned into bait and RBPs into prey, respectively. The fluorescence activity was measured to identify the interactions. rec-YnH combines batch cloning and transformation with intracellular homologous recombination to generate bait–prey fusion libraries. (E) CRISPR systems are used to introduce genomic variants into cells, and further characterize the function of mutations by integrating other platforms. 
Cas9 creates DNA double-strand breaks at a specific site, which are repaired either by nonhomologous end joining (NHEJ), generating gene knockouts, or by homology directed repair (HDR), for precise editing. Abbreviations: GPCA, Gaussia princeps luciferase protein-fragment complementation assay; LC-MS/MS, liquid chromatography tandem mass spectrometry; LR, left and right.
Protein Interactome Changes by Genomic Variants
Proteins play critical roles in cells by interacting with other proteins. These protein–protein interactions form an interactome of the cells. Genomic variants can impair protein or domain interaction profiles. To understand the function of GOF mutations, we can predict the function of mutated proteins based on their interactome [65]. However, the gained interacting partners must be identified first. Several large-scale proteomics platforms can be applied to identify the GOF mutation-induced protein/domain interactions (Figure 4A). Peptide SPOT arrays provide a powerful tool for identifying the domain/protein binding partners. Wild-type or mutated peptides can be synthesized directly in discrete spots and assayed for an interaction with a domain/protein of interest [66]. In addition, the high-throughput yeast two-hybrid system (Y2H) [67] and the mammalian-cell-based Gaussia princeps luciferase protein-fragment complementation assay (GPCA) have been implemented to detect PPI alterations [68]. For example, Sahni et al. have investigated several thousand missense mutations based on Y2H and found that two-thirds of disease-associated alleles perturb PPIs [16]. Moreover, Chen et al. also investigated the mutations mediated by PPI perturbations based on a Y2H assay for developmental disorders [69]. Another platform is affinity purification-mass spectrometry (APMS) [70], which is based on the purification of wild-type or mutated protein from cell extracts and the proteins that are bound to the purified protein are determined by mass spectrometry (MS). By comparing the interactomes of wild types and mutants, we can identify the gained interactions induced by the genomic variants. However, several issues need to be addressed in these platforms. Investigating the interaction capability of mutant and wild-type proteins under similar or identical experimental conditions is critical. 
In addition, antibodies against some mutated proteins might not be available for affinity purification (AP), and cloning of the mutated proteins is also time consuming and costly. Moreover, qualitative methods are commonly used for detecting the loss of interactions, and more quantitative methods are needed to identify the enhanced interactions.
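The comparison of wild-type and mutant interactomes, including the distinction between qualitative and quantitative changes, can be sketched as follows. This is a minimal illustration that assumes each interactome is summarized as a mapping from interaction partner to a positive quantitative signal (for example, a normalized MS intensity). The partner names, values, and the twofold threshold are all invented for the example.

```python
def compare_interactomes(wild_type, mutant, min_fold=2.0):
    """Classify partners given two {partner: positive signal} mappings.

    gained / lost       : partner detected in only one condition (qualitative)
    enhanced / weakened : partner detected in both, signal changed by at
                          least `min_fold` (quantitative)
    """
    wt_partners, mut_partners = set(wild_type), set(mutant)
    result = {
        "gained": sorted(mut_partners - wt_partners),
        "lost": sorted(wt_partners - mut_partners),
        "enhanced": [],
        "weakened": [],
    }
    for partner in sorted(wt_partners & mut_partners):
        fold = mutant[partner] / wild_type[partner]
        if fold >= min_fold:
            result["enhanced"].append(partner)
        elif fold <= 1.0 / min_fold:
            result["weakened"].append(partner)
    return result


# Invented signals for hypothetical partners A-D, for illustration only.
wild_type = {"A": 100.0, "B": 50.0, "C": 40.0}
mutant = {"A": 95.0, "B": 150.0, "D": 80.0}

print(compare_interactomes(wild_type, mutant))
```

A purely qualitative readout would report only the gained and lost partners. The enhanced category, which matters for GOF mutations that strengthen an existing interaction, is visible only when a quantitative signal is available for both conditions.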
Protein–DNA Interactions Induced by Genomic Variants
ChIP-Seq functional assays have been extensively used for mapping genome-wide TFBSs. To assess the mutation-induced interactome perturbations, wild-type and mutated proteins can be generated and used to infect cells. ChIP-Seq assay comparisons of these cells will reveal the specific TF–gene interactions in the mutation-specific cells (Figure 4B). For example, Zhang et al. generated wild-type KLF5 and three of the recurrent mutants (D418N, E419K, and E419Q) and then collected ChIP-seq data for the wild-type and mutant proteins. ChIP-seq comparative analysis revealed that the E419Q mutant KLF5 gains a number of novel binding sites, further activating the downstream genes that are implicated in tumorigenesis [71].
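The comparative ChIP-seq analysis can be reduced to a simple interval question: which mutant peaks do not overlap any wild-type peak? The sketch below assumes peaks are already called and represented as (chromosome, start, end) tuples with half-open coordinates. The peak coordinates are invented, and the quadratic scan is only meant to convey the logic, since genome-scale data would call for dedicated interval tools.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def gained_peaks(wild_type_peaks, mutant_peaks):
    """Return mutant peaks that overlap no wild-type peak (candidate gained sites)."""
    return [
        peak
        for peak in mutant_peaks
        if not any(overlaps(peak, wt_peak) for wt_peak in wild_type_peaks)
    ]


# Invented peak coordinates for illustration only.
wild_type_peaks = [("chr1", 100, 200), ("chr2", 500, 650)]
mutant_peaks = [("chr1", 150, 260), ("chr1", 900, 1000), ("chr3", 10, 90)]

print(gained_peaks(wild_type_peaks, mutant_peaks))
```

Calling the same function with the arguments swapped returns peaks present only in the wild type, that is, candidate lost binding sites.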
Moreover, several platforms centered on the gene have emerged, such as the enhanced yeast one-hybrid (eY1H, Figure 4B), and will be more suitable for investigating mutational effects on target genes [72]. The wild-type or mutated DNA sequences can be used as bait to search for all possible TFs that can bind to the sequences in the cells. The reporter gene placed downstream of the wild-type or mutated DNA sequences is used to assess whether the TFs can bind. Using this assay, a number of mutations have been found to perturb protein–DNA interactions in diseases [45].
miRNA-Target Regulation Induced by Genomic Variants
The advent of crosslinking immunoprecipitation followed by next-generation sequencing (CLIP-Seq) techniques, including high-throughput sequencing (HITS)-CLIP [73], photoactivatable ribonucleoside-enhanced (PAR)-CLIP [74], and individual-nucleotide resolution UV (i)CLIP [75], has helped in identifying miRNA targets with higher confidence (Figure 4C). Although the miRNA-binding profiles in wild-type and mutated genomes can be compared, mismatch issues in the process of alignment may cause confusion. For example, it is difficult to distinguish whether unaligned reads are caused by mismatches or by mutations. In addition, because only reads that align exactly to the genome can be analyzed, and these identify the genomic regions bound by miRNAs rather than the miRNAs themselves, determining the exact miRNAs is still challenging.
The GeneCopoeia platform may be more suitable for identification of the gained interaction between miRNA and 3′ UTRs (https://www.genecopoeia.com/). This platform offers genome-wide 3′ UTR target clones, and the miRNA–mRNA target interaction can be elucidated with a live cell assay for the Gaussia luciferase (GLuc) reporter gene (Figure 4C). Similarly, we can also clone the mutated 3′ UTR and perform the same assays [76]. In addition to the use of GLuc as a reporter, a secreted alkaline phosphatase (SEAP) reporter can also be cloned into the same vector, which serves as the internal control [77]. This dual-reporter vector system enables normalization for an accurate comparison between wild type and mutant.
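The normalization that the dual-reporter design enables can be written out as a short calculation. The sketch below assumes paired GLuc and SEAP readings from replicate wells, measured for each 3′ UTR construct with the miRNA of interest and with a control. The function names, the experimental layout, and all numbers are assumptions made for illustration, not part of any vendor protocol.

```python
from statistics import mean


def normalized_activity(gluc, seap):
    """Mean GLuc/SEAP ratio across paired replicate wells."""
    if len(gluc) != len(seap) or not gluc:
        raise ValueError("gluc and seap must be non-empty and of equal length")
    return mean(g / s for g, s in zip(gluc, seap))


def relative_activity(with_mirna, control):
    """Normalized reporter activity with the miRNA relative to the control.

    Each argument is a (gluc_readings, seap_readings) pair.
    Values well below 1 suggest miRNA-mediated repression of the construct.
    """
    return normalized_activity(*with_mirna) / normalized_activity(*control)


# Invented luminescence readings for illustration only.
wt_utr = relative_activity(
    with_mirna=([1000, 1100], [100, 110]),
    control=([1000, 900], [100, 90]),
)
mutant_utr = relative_activity(
    with_mirna=([400, 440], [100, 110]),
    control=([1000, 900], [100, 90]),
)

print(round(wt_utr, 2), round(mutant_utr, 2))
```

In this toy layout, a relative activity near 1 for the wild-type UTR together with a clearly lower value for the mutant UTR would be consistent with a gained miRNA-binding site. Dividing by SEAP first removes well-to-well differences in transfection efficiency and cell number.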
Protein–RNA Regulation Induced by Genomic Variants
Identification of RNA targets of RBPs has been a technically challenging task. As different cell types generally express different RBPs, the same RBPs can bind to different targets within a different cellular context [62,78]. The methods for identifying RBP–gene interactions can be classified into protein-centric approaches, which reveal all RNAs bound to a specific protein, and RNA-centric approaches, which characterize all proteins interacting with an RNA of interest [79]. Thus, the first class of methods will be useful for characterizing mutations in RBPs, whereas the second class is better suited to mutations in target RNAs. In RNA-centric pull-down assays, protein–RNA complexes are pulled down using biotinylated oligos that are complementary to the sequence of wild-type or mutated RNAs. After RNA digestion, the RBPs can be identified by western blotting or MS (Figure 4D) [80].
Furthermore, the Protein–RNA Interaction Mapping Assay (PRIMA) is another useful platform for identifying gain of interactions [81]. An RNA ‘bait’ (wild-type or mutated RNA) can be tested versus multiple RBP ‘preys’ in a single experiment. New yeast two- and three-hybrid-based screen libraries (rec-YnH) have been developed recently (Figure 4D), which allow interactions between multiple RBPs and RNAs to be mapped simultaneously [82]. This assay has been used to map interactions of protein domains and to reveal novel putative interacting partners of PAR proteins. All of these methods provide complementary tools to expand the depth and scale of the investigation in order to explore the wild-type and the mutation-induced RNA–RBP interactome in cancer.
CRISPR-Based Functional Screening of GOF Mutations
Recent advances in CRISPR-associated genome engineering technologies are enabling the systematic interrogation of GOF or LOF mutations (Figure 4E) [83]. Increasing numbers of studies have revealed the functions of a number of mutations. For example, a point mutation in the α6 subunit of the nicotinic acetylcholine receptor was introduced by CRISPR editing. RT-PCR and sequencing confirmed the presence of the point mutation [84]. Moreover, a marked decrease in sensitivity to spinosad was observed in the mutated condition compared with the wild type, which indicates that the point mutation is related to resistance to spinosad [84]. Recently, approximately 4000 genomic variants of the BRCA1 gene have been engineered into human cells based on CRISPR editing, and the functions of these mutations have been characterized [85]. Moreover, a CRISPR screen was also used to characterize the functional consequence of multiple variants in driving liver tumorigenesis, and provided a powerful tool for mapping a functional atlas of tumor suppressors in vivo [86]. Taken together, these observations demonstrate the ability of CRISPR to induce specific genomic variants. We can integrate other functional platforms to further investigate the functional consequences (such as signaling pathway perturbations or gene expression alterations) of the candidate LOF or GOF mutations.
Computational Resources and an Integrative Framework for Investigating GOF Mutations
With high volumes of genomic data being generated by high-throughput sequencing, the use of functional assays is a major bottleneck in testing large numbers of GOF and LOF mutations. Increasing numbers of computational resources have been developed to aid in predicting and characterizing the structures and functions of genomic variants from genomic data (Table 1). The incorporation of these resources with the experimental approaches described above may allow for the structure and function of GOF mutations to be predicted (Figure 5, Key Figure).
Table 1. Computational Resources for Functional Characterization of GOF Mutations
| Tools | Description | Web | Refs |
|---|---|---|---|
| Domain, IDR, and SLiMs identification | |||
| HMMER | Based on profiles of hidden Markov models, gathers four algorithms to find evolutionarily related proteins and/or domains. | https://www.ebi.ac.uk/Tools/hmmer/ | [87] |
| EMBOSS | Integrates a range of currently available packages and tools for sequence analysis into a seamless whole. | http://emboss.sourceforge.net/ | [88] |
| IUPred2A | Collects predictions for disordered regions and disordered binding segments, and can highlight redox-sensitive regions in proteins based on the energy estimation method. | https://iupred2a.elte.hu/ | [89] |
| FoldUnfold | A web server that calculates the expected packing density profiles along an amino acid sequence, and uses this density for predicting the state of protein with an unknown 3D structure, either folded or unfolded. | http://bioinfo.protres.ru/ogu/ | [90] |
| Eukaryotic Linear Motif (ELM) resource | A manually curated database of SLiMs. | http://elm.eu.org | [107] |
| SLiMFinder | A de novo motif discovery tool that identifies statistically over-represented motifs in a set of protein sequences, accounting for the evolutionary relationships between them. | http://bioware.ucd.ie/~compass/biowareweb/ | [108,109] |
| SLiMSearch | A web-based tool for the discovery of novel SLiM instances in a proteome. | http://slim.ucd.ie/slimsearch/ | [110] |
| TF, miRNA, and RBP binding sites prediction | | | |
| TRANSFAC | Provides data on eukaryotic transcription factors, their experimentally proven binding sites, consensus binding sequences (positional weight matrices), and regulated genes. | http://gene-regulation.com/pub/databases.html | [91] |
| ChIPBase | A database that integrates ChIP-Seq peak datasets of TFs, transcription cofactors, chromatin-remodeling factors, other DNA-binding proteins, and histone modifications. | http://rna.sysu.edu.cn/chipbase/ | [95,111] |
| TargetScan | Predicts biological targets of miRNAs. | http://www.targetscan.org/vert_72/ | [92] |
| miRanda | Identifies potential miRNA target sites in genomic sequences. | http://www.microrna.org/microrna/microrna/getDownloads.do | [93] |
| CISBP-RNA | An online library of RBPs and their motifs. | http://cisbp-rna.ccbr.utoronto.ca/ | [112] |
| CLIPdb | Annotated CLIP-seq data sets and RBPs, and provides a user-friendly interface for quick navigation of the CLIP-seq data | http://lulab.life.tsinghua.edu.cn/clipdb/ | [113] |
| starBase | Deciphers protein–RNA and miRNA–target interactions | http://starbase.sysu.edu.cn/ | [96] |
| RAID | Provides the scientific community with an all-in-one resource for efficient browsing and extraction of RNA-associated interactions. | http://www.rna-society.org/raid/ | [114] |
| ATtRACT | A database of RBPs and associated motifs | https://attract.cnic.es/ | [97] |
| Mutation functional annotation resource | | | |
| Variant Effect Predictor | Determines the effect of variants on genes, transcripts, and protein sequence, as well as regulatory regions. | http://www.ensembl.org/info/docs/tools/vep/index.html | [98] |
| mutfunc | A database of mutations occurring in functionally important regions or that are predicted to disrupt protein structure stability, protein interaction interfaces, post-translational modifications (PTMs), protein translation, conserved regions, and regulatory regions. | http://www.mutfunc.com/ | [99] |
| Structure-PPi | Reports features that overlap in mutations, or that are in close physical proximity. The features reported include protein domains, variants, helices, and ligand binding residues. | http://structureppi.bioinfo.cnio.es/Structure | [115] |
| MutaBind | Screening impact of single-site mutations on binding affinity within proteins | https://www.ncbi.nlm.nih.gov/research/mutabind/ | [101] |
| dSysMap | Mapping of mutations on protein structures and on interaction interfaces. | https://dsysmap.irbbarcelona.org/ | [100] |
| Mutation functional annotation resource | | | |
| iRegNet3D | Integrates TF–TF interactions, TF–DNA interactions, and chromatin–chromatin interactions as well as topologically associated domains to study mutation/gene–phenotype relationships. | http://iregnet3d.yulab.org/index/ | [11] |
| COPE-TFBS | Predicts variant effects on TFBSs using a context-sensitive approach. | http://cope.cbi.pku.edu.cn/run_TFBS.html | NA |
| LincSNP | A database that aims specifically to store and annotate disease-associated variants in long noncoding RNAs and their TFBSs. | http://210.46.80.146/lincsnp/ | [102] |
| DisoRDPbind | Predicts the RNA-, DNA-, and protein-binding residues located in intrinsically disordered regions. | http://biomine.cs.vcu.edu/servers/DisoRDPbind/ | [116] |
| SomamiR | A database of cancer somatic mutations in miRNA and their target sites that potentially alter the interactions. | http://compbio.uthsc.edu/SomamiR/ | [55] |
| PolymiRTS | An integrated platform for analyzing the functional impact of genetic polymorphisms in miRNA seed regions and miRNA target sites. | http://compbio.uthsc.edu/miRSNP/ | [103] |
| MSDD | A manually curated database of experimentally supported associations between miRNAs, SNPs, and human diseases | http://www.bio-bigdata.com/msdd/ | [117] |
| RBP-Var | Provides annotation of functional variants involved in post-transcriptional interaction and regulation. | http://www.rbp-var.biols.ac.cn/ | [104] |
| ASPRIN | A computational method to identify genetic variants that alter RBP–RNA interactions. | https://github.com/Xinglab/ASPRIN | [105] |
| NMDEscPredictor | A method to provide a rank for genetic susceptibility to disease using GOF versus LOF. | https://nmdprediction.shinyapps.io/nmdescpredictor/ | [106] |
Figure 5. Key Figure.
(A) For each mutation of interest, the mutated genome and protein sequences can be generated in silico. (B) Next, for the wild-type and mutated sequences, the domains, intrinsically disordered regions (IDRs), and short linear motifs (SLiMs) can be predicted by computational methods. (C) In addition, the interacting partners, such as protein–protein interactions (PPIs) and transcription factor (TF)–gene, miRNA–gene, and RNA binding protein (RBP)–gene interactions, can also be predicted by available tools. By comparing the wild-type and mutated interaction profiles, we can identify the gained interactions, which can then be validated by experimental methods. (D) Finally, the predicted functions related to cancer hallmarks can be further validated by cell proliferation or colony formation assays. Abbreviations: APMS, affinity purification MS; CLIP, crosslinked immunoprecipitation; eY1H, enhanced yeast one-hybrid; GPCA, Gaussia princeps luciferase protein-fragment complementation assay; HITS-CLIP, high-throughput sequencing CLIP; iCLIP, individual-nucleotide resolution UV CLIP; MS, mass spectrometry; PAR-CLIP, photoactivatable ribonucleoside-enhanced CLIP; PRIMA, Protein-RNA Interaction Mapping Assay; rec-YnH, new yeast two and three-hybrid-based screen libraries; Y2H, yeast two-hybrid.
The first step in identifying the functionally important structural regions is to compare the wild-type and mutated sequences to identify the lost or gained functional units, such as domains, IDRs, or SLiMs. Several computational packages are available for this task, including HMMER [87] and EMBOSS [88] for protein domain identification and IUPred2A [89] and FoldUnfold [90] for identification of IDRs. The Eukaryotic Linear Motif (ELM) resource mainly focuses on the annotation and detection of SLiMs [33] and also includes the interactions between SLiMs and protein domains. Beyond these structural units, great progress has been made in identifying TFBSs and miRNA binding sites. TRANSFAC provides the most comprehensive collection of experimentally determined TFBSs and positional weight matrices for TFs [91]. In addition, TargetScan [92] and miRanda [93] are generally used for predicting biological targets of miRNAs by searching for binding sites that match the seed region of each miRNA (Table 1). These computational methods can also be applied to identify TF and miRNA binding sites in mutated genome sequences. However, their major limitation is a high false-positive rate, and many predictions are biologically irrelevant [94]. In addition, sequence-based screening may not capture the active targets because TF–gene and miRNA–gene regulation is context specific. To address this issue, ChIP-Seq and CLIP-Seq data can be integrated to identify context-specific regulation. ChIPBase [95] and starBase [96] are valuable resources for deciphering TF–target and miRNA–target interactions based on sequencing data. Despite advances in recent years, tools for identifying RBP binding sites are still lacking. ATtRACT is a database of RBPs and associated motifs, which compiles information on 370 RBPs and ~1600 binding motifs [97].
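As a minimal sketch of this first step, the following Python snippet applies a missense substitution to a protein sequence in silico and compares motif matches between the wild-type and mutant sequences. The motif patterns, the example sequence, and all function names are hypothetical placeholders chosen for illustration; in practice, curated SLiM definitions (for example, from ELM) and dedicated domain and disorder predictors would replace these simple regular expressions.

```python
import re

# Hypothetical, simplified motif patterns for illustration only.
# Real SLiM definitions should be taken from a curated resource.
TOY_MOTIFS = {
    "dileucine_like": r"[DE]...L[LI]",
    "PxxP_like": r"P..P",
}


def apply_missense(seq, pos, ref, alt):
    """Return seq with the residue at 1-based position pos changed from ref to alt."""
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


def motif_hits(seq, motifs):
    """Return the set of (motif name, 1-based start) for all matches, including overlaps."""
    hits = set()
    for name, pattern in motifs.items():
        # A lookahead makes finditer report overlapping matches as well.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.add((name, match.start() + 1))
    return hits


def compare_motifs(wt_seq, mut_seq, motifs):
    """Report motif instances gained or lost in the mutant relative to the wild type.

    Positions are only comparable for substitutions, which preserve sequence length.
    """
    wt_hits = motif_hits(wt_seq, motifs)
    mut_hits = motif_hits(mut_seq, motifs)
    return {"gained": mut_hits - wt_hits, "lost": wt_hits - mut_hits}


if __name__ == "__main__":
    wild_type = "MSTEAPKLPSGT"  # made-up sequence
    mutant = apply_missense(wild_type, 9, "P", "L")
    print(compare_motifs(wild_type, mutant, TOY_MOTIFS))
```

In this toy case the P9L substitution creates a dileucine-like match and removes a PxxP-like match, the kind of gained or lost unit that would then be taken forward for experimental validation.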
Beyond the identification of functional regions, several computational resources have been developed to analyze the functional effects of LOF or GOF mutations. Variant Effect Predictor [98] and mutfunc [99] functionally annotate genomic variants, including whether the variants occur in functionally important regions or are predicted to disrupt protein structure stability, protein interaction interfaces, or regulatory regions. Moreover, several methods have been proposed to predict mutation-induced interactome changes, which can be used to identify driver mutations. For example, dSysMap predicts drivers by mapping mutations to the human interactome and evaluating whether these mutations lie on interaction interfaces [100]. MutaBind evaluates the effects of genomic variants on protein interactions and calculates the changes in binding affinity, which may identify gained interactions [101]. Additional tools annotate genomic variants by calculating the changes in TF, miRNA, and RBP binding sites in coding and noncoding regions; these variants are compiled in SomamiR [55], LincSNP [102], PolymiRTS [103], and RBP-Var [104]. Finally, computational methods have been proposed to identify the genetic variants that alter RBP–RNA interactions [105], as well as to prioritize GOF mutations in complex diseases [106]. Taken together, these computational resources are invaluable for the functional characterization of GOF mutations, as well as for predicting cancer-related mutations in the context of signaling networks.
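To illustrate how a change in miRNA binding sites might be calculated for a variant, the sketch below compares seed-match sites between a wild-type and a mutated 3′UTR. It assumes one common site definition (a 7mer-m8 site, complementary to miRNA nucleotides 2–8); the miRNA and UTR sequences are invented, and the function names are hypothetical. It is not the method used by any of the resources cited above, which additionally consider context features and experimental data.

```python
# Watson-Crick complements for RNA bases
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_match_site(mirna):
    """Return the 7mer-m8 target site: the reverse complement of miRNA nucleotides 2-8."""
    return reverse_complement(mirna[1:8])


def site_positions(utr, site):
    """Return 1-based start positions of all occurrences of site in utr, including overlaps."""
    positions = set()
    start = utr.find(site)
    while start != -1:
        positions.add(start + 1)
        start = utr.find(site, start + 1)
    return positions


def compare_seed_sites(wt_utr, mut_utr, mirna):
    """Report seed-match sites gained or lost in the mutated UTR.

    Positions are only comparable for substitutions, which preserve sequence length.
    """
    site = seed_match_site(mirna)
    wt_sites = site_positions(wt_utr, site)
    mut_sites = site_positions(mut_utr, site)
    return {"gained": mut_sites - wt_sites, "lost": wt_sites - mut_sites}


if __name__ == "__main__":
    toy_mirna = "UACGGAUCCAUGGCAUAGCUAA"  # invented sequence, not a real miRNA
    wt_utr = "AAGCUGAUCCAUUUGCA"          # invented 3'UTR fragment
    mut_utr = "AAGCUGAUCCGUUUGCA"         # A>G substitution at position 11
    print(compare_seed_sites(wt_utr, mut_utr, toy_mirna))
```

A gained site reported by such a screen is only a candidate; as noted above, sequence matching alone yields many false positives, so CLIP-Seq evidence or reporter assays would be needed to confirm it.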
Concluding Remarks
Proteins that possess GOF mutations have been shown to play critical roles in the development and progression of cancer. There is now an urgent need to develop methods that can efficiently integrate different data sets to prioritize candidate driver GOF mutations and improve our understanding of them (see Outstanding Questions). The computational predictions of the structure and function of proteins with GOF mutations described herein enable downstream experimental validation of perturbed interactomes and signaling pathways. This combination of computational and experimental techniques is key to the functional validation of GOF mutations (Figure 5). Furthermore, a comprehensive understanding of GOF mutations will be aided by emerging techniques such as single cell sequencing, which can reveal how cell type specificity and the tumor microenvironment further perturb signaling pathways. Such understanding of the cellular pathways perturbed by GOF mutations will ultimately offer novel and effective strategies for developing the next generation of cancer therapeutics. Taken together, the integration of computational and experimental techniques provides a framework for understanding the mechanisms of GOF mutations and reveals their contribution to cancer development and progression. A better understanding of these perturbed genetic and epigenetic pathways will facilitate the characterization of GOF mutations and the development of personalized therapeutic targets.
Outstanding Questions.
How can we better classify and distinguish LOF and GOF mutations?
What genomic features, such as allele frequency and subcellular location, determine whether the mutation will result in LOF or GOF?
How can multiomics data, as well as single cell sequencing data, be integrated to determine the function of GOF mutations?
How can the functional effects of mutations be deconvoluted in a manner specific to cell type or cancer type?
How can the tumor microenvironment be integrated for the functional characterization of the mutations?
How do LOF and GOF mutations influence chromatin interactions and enhancer–promoter interactions?
How can LOF and GOF mutations in noncoding regions be systematically studied to understand their biological roles?
Can a comprehensive understanding of network perturbations by GOF mutations enable effective strategies for developing novel drugs?
To what extent, and how, do GOF mutations contribute to phenotypic heterogeneity and response to cancer therapy?
Highlights.
GOF mutations are an important type of mutation that occurs in various types of cancer but has garnered little attention.
GOF mutations play crucial roles in the development and progression of various types of cancer.
Systematic analysis of the interaction and regulatory networks perturbed by GOF mutations can effectively reveal their functional consequences.
The combination of computational and experimental approaches is instrumental in identifying driver GOF mutations as well as identifying potential therapeutic targets.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (grants 31571331, 31871338, 61873705, 61502126, 61603116) and the NIH/NCI Award Grant K22CA214765 (S.Y.). The funders played no roles in the study design, decision to publish, or preparation of the manuscript. We would like to apologize to all our colleagues whose important work could not be cited here owing to space restrictions. We also thank anonymous reviewers and the editors for their constructive feedback to improve the paper.
Glossary
- ChIP-Seq functional assays
method used to identify protein–DNA interactions, which combines chromatin immunoprecipitation with DNA sequencing to identify the binding sites of DNA-associated proteins.
- Co-complexes
proteins observed in the same protein complex, which is a form of quaternary structure.
- Common variants
variants found in the human population; not necessarily disease causing.
- Driver mutations
mutations that are causally implicated in oncogenesis.
- Gain-of-function (GOF) mutations
type of genetic variant in which altered genes or noncoding RNAs possess a new molecular function or a new pattern of expression.
- Genomic variants
differences between the DNA sequence of an individual when compared with the DNA sequence of a reference genome.
- Interactome
entire set of molecular interactions in a particular cell.
- Intrinsically disordered regions (IDRs)
protein regions that lack a stable tertiary structure.
- Loss-of-function (LOF) mutations
genetic variants that are predicted to disrupt the function of coding genes and noncoding RNAs.
- miRNA
small noncoding RNA of approximately 22 nucleotides that functions in post-transcriptional regulation of gene expression via base pairing.
- Missense mutation
point mutation in which a single nucleotide change results in a codon that codes for a different amino acid.
- Passenger mutations
mutations that do not confer a clonal growth advantage and do not contribute to cancer development.
- Positional weight matrices
a commonly used representation of motifs in biological sequences.
- Protein domains
conserved part of a given protein sequence and tertiary structure that can evolve, function, and exist independently of the rest of the protein chain.
- RNA-binding proteins (RBPs)
proteins that contain various structural motifs to bind double- or single-stranded RNAs.
- Short linear motifs (SLiMs)
self-sufficient functional sequences that specify interaction sites for other molecules and thus mediate a multitude of functions.
- Super-enhancer regions
regions of the genome composed of multiple enhancers that are collectively bound by other proteins and drive transcription of genes.
- Transcription factors (TFs)
proteins that control the rate of transcription of genetic information from DNA to mRNA by binding to a specific DNA sequence, and that can interact with a number of other proteins.
References
- 1. MacArthur J et al. (2017) The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res 45, D896–D901
- 2. Brookes AJ and Robinson PN (2015) Human genotype–phenotype databases: aims, challenges and opportunities. Nat. Rev. Genet 16, 702–715
- 3. Dagogo-Jack I and Shaw AT (2018) Tumour heterogeneity and resistance to cancer therapies. Nat. Rev. Clin. Oncol 15, 81–94
- 4. Hunter KW et al. (2018) Genetic insights into the morass of metastatic heterogeneity. Nat. Rev. Cancer 18, 211–223
- 5. Griffiths AJF et al. (2000) An Introduction to Genetic Analysis (7th edn), W. H. Freeman
- 6. Lugo-Martinez J et al. (2016) The loss and gain of functional amino acid residues is a common mechanism causing human inherited disease. PLoS Comput. Biol 12, e1005091.
- 7. Li XH and Babu MM (2018) Human diseases from gain-of-function mutations in disordered protein regions. Cell 175, 40–42
- 8. Meyer MJ et al. (2018) Interactome INSIDER: a structural interactome browser for genomic studies. Nat. Methods 15, 107–114
- 9. Miller ML et al. (2015) Pan-cancer analysis of mutation hotspots in protein domains. Cell Syst 1, 197–209
- 10. Ng PK et al. (2018) Systematic functional annotation of somatic mutations in cancer. Cancer Cell 33, 450–462.e10
- 11. Liang S et al. (2017) iRegNet3D: three-dimensional integrated regulatory network for the genomic analysis of coding and non-coding disease mutations. Genome Biol 18, 10.
- 12. Meyer K et al. (2018) Mutations in disordered regions can cause disease by creating dileucine motifs. Cell 175, 239–253.e17
- 13. Vaughan CA et al. (2017) Gain-of-function p53 activates multiple signaling pathways to induce oncogenicity in lung cancer cells. Mol. Oncol 11, 696–711
- 14. Subramanian S and Kumar S (2006) Evolutionary anatomies of positions and types of disease-associated and neutral amino acid mutations in the human genome. BMC Genomics 7, 306.
- 15. Barabasi AL et al. (2011) Network medicine: a network-based approach to human disease. Nat. Rev. Genet 12, 56–68
- 16. Sahni N et al. (2015) Widespread macromolecular interaction perturbations in human genetic disorders. Cell 161, 647–660
- 17. Yi S et al. (2017) Functional variomics and network perturbation: connecting genotype to phenotype in cancer. Nat. Rev. Genet 18, 395–410
- 18. Dibb NJ et al. (2004) Switching on kinases: oncogenic activation of BRAF and the PDGFR family. Nat. Rev. Cancer 4, 718–727
- 19. Greenman C et al. (2007) Patterns of somatic mutation in human cancer genomes. Nature 446, 153–158
- 20. Buljan M et al. (2010) Quantifying the mechanisms of domain gain in animal proteins. Genome Biol 11, R74.
- 21. Marsh JA and Teichmann SA (2010) How do proteins gain new domains? Genome Biol 11, 126.
- 22. Latysheva NS et al. (2016) Molecular principles of gene fusion mediated rewiring of protein interaction networks in cancer. Mol. Cell 63, 579–592
- 23. Kim JY et al. (2017) Clinical implications of genomic profiles in metastatic breast cancer with a focus on TP53 and PIK3CA, the most frequently mutated genes. Oncotarget 8, 27997–28007
- 24. Chen L et al. (2018) Characterization of PIK3CA and PIK3R1 somatic mutations in Chinese breast cancer patients. Nat. Commun 9, 1357.
- 25. Hao Y et al. (2013) Gain of interaction with IRS1 by p110alpha-helical domain mutants is crucial for their oncogenic functions. Cancer Cell 23, 583–593
- 26. Luck K et al. (2017) Proteome-scale human interactomics. Trends Biochem. Sci 42, 342–354
- 27. Kim Y et al. (2012) IDDI: integrated domain-domain interaction and protein interaction analysis system. Proteome Sci 10, S9.
- 28. Desai MA et al. (2015) An intrinsically disordered region of methyl-CpG binding domain protein 2 (MBD2) recruits the histone deacetylase core of the NuRD complex. Nucleic Acids Res 43, 3100–3113
- 29. Ganguly D and Chen J (2015) Modulation of the disordered conformational ensembles of the p53 transactivation domain by cancer-associated mutations. PLoS Comput. Biol 11, e1004247.
- 30. Vacic V and Iakoucheva LM (2012) Disease mutations in disordered regions – exception to the rule? Mol. BioSyst 8, 27–32
- 31. Kumar D et al. (2017) Therapeutic interventions of cancers using intrinsically disordered proteins as drug targets: c-Myc as model system. Cancer Informat Published online March 16, 2017. 10.1177/1176935117699408
- 32. Metallo SJ (2010) Intrinsically disordered proteins are potential drug targets. Curr. Opin. Chem. Biol 14, 481–488
- 33. Gouw M et al. (2018) The eukaryotic linear motif resource – 2018 update. Nucleic Acids Res 46, D428–D434
- 34. Staller MV et al. (2018) A high-throughput mutational scan of an intrinsically disordered acidic transcriptional activation domain. Cell Syst 6, 444–455.e6
- 35. Ravarani CN et al. (2018) High-throughput discovery of functional disordered regions: investigation of transactivation domains. Mol. Syst. Biol 14, e8190.
- 36. Meszaros B et al. (2017) Degrons in cancer. Sci. Signal 10, eaak9982.
- 37. Provost E et al. (2005) Functional correlates of mutation of the Asp32 and Gly34 residues of beta-catenin. Oncogene 24, 2667–2676
- 38. Wang X et al. (2008) Association of genetic variation in genes implicated in the beta-catenin destruction complex with risk of breast cancer. Cancer Epidemiol. Biomark. Prev 17, 2101–2108
- 39. Vitari AC et al. (2011) COP1 is a tumour suppressor that causes degradation of ETS transcription factors. Nature 474, 403–406
- 40. Ellrott K et al. (2018) Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Syst 6, 271–281.e7
- 41. Maurano MT et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195
- 42. Khurana E et al. (2016) Role of non-coding sequence variants in cancer. Nat. Rev. Genet 17, 93–108
- 43. Huang FW et al. (2013) Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959
- 44. Heidenreich B et al. (2014) TERT promoter mutations in cancer development. Curr. Opin. Genet. Dev 24, 30–37
- 45. Fuxman Bass JI et al. (2015) Human gene-centered transcription factor networks for enhancers and disease variants. Cell 161, 661–673
- 46. Pfaeffle RW et al. (2008) Three novel missense mutations within the LHX4 gene are associated with variable pituitary hormone deficiencies. J. Clin. Endocrinol. Metab 93, 1062–1071
- 47. Hnisz D et al. (2013) Super-enhancers in the control of cell identity and disease. Cell 155, 934–947
- 48. Zhou RQ et al. (2014) Transcription factor SCL/TAL1 mediates the phosphorylation of MEK/ERK pathway in umbilical cord blood CD34(+) stem cells during hematopoietic differentiation. Blood Cells Mol. Dis 53, 39–46
- 49. Mansour MR et al. (2014) Oncogene regulation. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science 346, 1373–1377
- 50. Flores MA and Ovcharenko I (2018) Enhancer reprogramming in mammalian genomes. BMC Bioinf 19, 316.
- 51. Friedman RC et al. (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19, 92–105
- 52. Nicoloso MS et al. (2010) Single-nucleotide polymorphisms inside microRNA target sites influence tumor susceptibility. Cancer Res 70, 2789–2798
- 53. Lopes-Ramos CM et al. (2017) E2F1 somatic mutation within miRNA target site impairs gene regulation in colorectal cancer. PLoS One 12, e0181153.
- 54. Zhao H et al. (2017) Fixed differences in the 3′UTR of buffalo PRNP gene provide binding sites for miRNAs post-transcriptional regulation. Oncotarget 8, 46006–46019
- 55. Bhattacharya A and Cui Y (2016) SomamiR 2.0: a database of cancer somatic mutations altering microRNA-ceRNA interactions. Nucleic Acids Res 44, D1005–D1010
- 56. Clop A et al. (2006) A mutation creating a potential illegitimate microRNA target site in the myostatin gene affects muscularity in sheep. Nat. Genet 38, 813–818
- 57. Dusl M et al. (2015) A 3′-UTR mutation creates a microRNA target site in the GFPT1 gene of patients with congenital myasthenic syndrome. Hum. Mol. Genet 24, 3418–3426
- 58. Dominguez D et al. (2018) Sequence, structure, and context preferences of human RNA binding proteins. Mol. Cell 70, 854–867.e9
- 59. Lukong KE et al. (2008) RNA-binding proteins in human genetic disease. Trends Genet 24, 416–425
- 60. Neelamraju Y et al. (2018) Mutational landscape of RNA-binding proteins in human cancers. RNA Biol 15, 115–129
- 61. Singh B et al. (2018) Genome sequencing and RNA-motif analysis reveal novel damaging noncoding mutations in human tumors. Mol. Cancer Res 16, 1112–1124
- 62. Li Y et al. (2018) MERIT: systematic analysis and characterization of mutational effect on RNA interactome topology. Hepatology Published online August 28, 2018. 10.1002/hep.30242
- 63. Hu B et al. (2017) POSTAR: a platform for exploring post-transcriptional regulation coordinated by RNA-binding proteins. Nucleic Acids Res 45, D104–D114
- 64. Xuan JJ et al. (2018) RMBase v2.0: deciphering the map of RNA modifications from epitranscriptome sequencing data. Nucleic Acids Res 46, D327–D334
- 65. Teng Z et al. (2017) Revealing protein functions based on relationships of interacting proteins and GO terms. J. Biomed. Semant 8, 27.
- 66. Briant DJ et al. (2009) Rapid identification of linear protein domain binding motifs using peptide SPOT arrays. Methods Mol. Biol 570, 175–185
- 67. Fields S and Song O (1989) A novel genetic system to detect protein-protein interactions. Nature 340, 245–246
- 68. Yi S et al. (2017) Base-resolution stratification of cancer mutations using functional variomics. Nat. Protoc 12, 2323–2341
- 69. Chen S et al. (2018) An interactome perturbation framework prioritizes damaging missense mutations for developmental disorders. Nat. Genet 50, 1032–1040
- 70. Ho Y et al. (2002) Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415, 180–183
- 71. Zhang X et al. (2018) Somatic superenhancer duplications and hotspot mutations lead to oncogenic activation of the KLF5 transcription factor. Cancer Discov 8, 108–125
- 72. Reece-Hoyes JS et al. (2011) Yeast one-hybrid assays for gene-centered human gene regulatory network mapping. Nat. Methods 8, 1050–1052
- 73. Chi SW et al. (2009) Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460, 479–486
- 74. Hafner M et al. (2010) Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129–141
- 75. Konig J et al. (2011) iCLIP – transcriptome-wide mapping of protein-RNA interactions with individual nucleotide resolution. J. Vis. Exp Published online April 30, 2011. 10.3791/2638
- 76. Hayashi M et al. (2019) Autoregulation of osteocyte Sema3A orchestrates estrogen action and counteracts bone aging. Cell Metab 29, 627–637.e5
- 77. Yang TT et al. (1997) Quantification of gene expression with a secreted alkaline phosphatase reporter system. Biotechniques 23, 1110–1114
- 78. Van Nostrand EL et al. (2016) Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat. Methods 13, 508–514
- 79. Marchese D et al. (2016) Advances in the characterization of RNA-binding proteins. Wiley Interdiscip. Rev. RNA 7, 793–810
- 80. Chu C et al. (2012) Chromatin isolation by RNA purification (ChIRP). J. Vis. Exp Published online March 25, 2012. 10.3791/3912
- 81.Tamburino AM et al. (2017) PRIMA: a gene-centered, RNA-to-protein method for mapping RNA-protein interactions. Translation (Austin) 5, e1295130. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Yang JS et al. (2018) rec-YnH enables simultaneous many-by-many detection of direct protein–protein and protein–RNA interactions. Nat. Commun 9, 3747. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Hsu PD et al. (2014) Development and applications of CRISPR-Cas9 for genome engineering. Cell 157, 1262–1278 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.Zimmer CT et al. (2016) A CRISPR/Cas9 mediated point mutation in the alpha 6 subunit of the nicotinic acetylcholine receptor confers resistance to spinosad in Drosophila melanogaster. Insect Biochem. Mol. Biol 73, 62–69 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Findlay GM et al. (2018) Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Wang G et al. (2018) Mapping a functional cancer genome atlas of tumor suppressors in mouse liver using AAV-CRISPR-mediated direct in vivo screening. Sci. Adv 4, eaao5508. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Potter SC et al. (2018) HMMER web server: 2018 update. Nucleic Acids Res 46, W200–W204 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 88.Rice P et al. (2000) EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 16, 276–277 [DOI] [PubMed] [Google Scholar]
- 89.Meszaros B et al. (2018) IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res 46, W329–W337 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90.Galzitskaya OV et al. (2006) FoldUnfold: web server for the prediction of disordered regions in protein chain. Bioinformatics 22, 2948–2949 [DOI] [PubMed] [Google Scholar]
- 91.Wingender E (2008) The TRANSFAC project as an example of framework technology that supports the analysis of genomic regulation. Brief. Bioinform 9, 326–332 [DOI] [PubMed] [Google Scholar]
- 92.Agarwal V et al. (2015) Predicting effective microRNA target sites in mammalian mRNAs. eLife 4, e05005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93.Betel D et al. (2008) The microRNA.org resource: targets and expression. Nucleic Acids Res 36, D149–D153 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94.Pinzon N et al. (2017) microRNA target prediction programs predict many false positives. Genome Res 27, 234–245 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 95.Zhou KR et al. (2017) ChIPBase v2.0: decoding transcriptional regulatory networks of non-coding RNAs and protein-coding genes from ChIP-seq data. Nucleic Acids Res 45, D43–D50
- 96.Li JH et al. (2014) starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res 42, D92–D97
- 97.Giudice G et al. (2016) ATtRACT – a database of RNA-binding proteins and associated motifs. Database (Oxford). Published online April 7, 2016. 10.1093/database/baw035
- 98.McLaren W et al. (2016) The Ensembl Variant Effect Predictor. Genome Biol 17, 122.
- 99.Wagih O et al. (2018) Comprehensive variant effect predictions of single nucleotide variants in model organisms. bioRxiv Published online May 2, 2018. 10.1101/313031
- 100.Mosca R et al. (2015) dSysMap: exploring the edgetic role of disease mutations. Nat. Methods 12, 167–168
- 101.Li M et al. (2016) MutaBind estimates and interprets the effects of sequence variants on protein-protein interactions. Nucleic Acids Res 44, W494–W501
- 102.Ning S et al. (2017) LincSNP 2.0: an updated database for linking disease-associated SNPs to human long non-coding RNAs and their TFBSs. Nucleic Acids Res 45, D74–D78
- 103.Bhattacharya A et al. (2014) PolymiRTS Database 3.0: linking polymorphisms in microRNAs and their target sites with human diseases and biological pathways. Nucleic Acids Res 42, D86–D91
- 104.Mao F et al. (2016) RBP-Var: a database of functional variants involved in regulation mediated by RNA-binding proteins. Nucleic Acids Res 44, D154–D163
- 105.Bahrami-Samani E and Xing Y (2019) Discovery of allele-specific protein-RNA interactions in human transcriptomes. Am. J. Hum. Genet 104, 492–502
- 106.Coban-Akdemir Z et al. (2018) Identifying genes whose mutant transcripts cause dominant disease traits by potential gain-of-function alleles. Am. J. Hum. Genet 103, 171–187
- 107.Dinkel H et al. (2016) ELM 2016 – data update and new functionality of the eukaryotic linear motif resource. Nucleic Acids Res 44, D294–D300
- 108.Davey NE et al. (2010) SLiMFinder: a web server to find novel, significantly over-represented, short protein motifs. Nucleic Acids Res 38, W534–W539
- 109.Edwards RJ et al. (2007) SLiMFinder: a probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins. PLoS One 2, e967.
- 110.Krystkowiak I and Davey NE (2017) SLiMSearch: a framework for proteome-wide discovery and annotation of functional modules in intrinsically disordered regions. Nucleic Acids Res 45, W464–W469
- 111.Yang JH et al. (2013) ChIPBase: a database for decoding the transcriptional regulation of long non-coding RNA and microRNA genes from ChIP-Seq data. Nucleic Acids Res 41, D177–D187
- 112.Ray D et al. (2013) A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172–177
- 113.Yang YC et al. (2015) CLIPdb: a CLIP-seq database for protein-RNA interactions. BMC Genomics 16, 51.
- 114.Yi Y et al. (2017) RAID v2.0: an updated resource of RNA-associated interactions across organisms. Nucleic Acids Res 45, D115–D118
- 115.Vazquez M et al. (2015) Structure-PPi: a module for the annotation of cancer-related single-nucleotide variants at protein–protein interfaces. Bioinformatics 31, 2397–2399
- 116.Peng Z and Kurgan L (2015) High-throughput prediction of RNA, DNA and protein binding regions mediated by intrinsic disorder. Nucleic Acids Res 43, e121.
- 117.Yue M et al. (2018) MSDD: a manually curated database of experimentally supported associations among miRNAs, SNPs and human diseases. Nucleic Acids Res 46, D181–D185





