Bioinformatics. 2014 Jul 26;30(21):3109–3114. doi: 10.1093/bioinformatics/btu499

e-Driver: a novel method to identify protein regions driving cancer

Eduard Porta-Pardo 1, Adam Godzik 1,*
PMCID: PMC4609017  PMID: 25064568

Abstract

Motivation: Most approaches used to identify cancer driver genes focus, true to their name, on entire genes and assume that a gene, treated as one entity, has a specific role in cancer. This approach may be correct to describe effects of gene loss or changes in gene expression; however, mutations may have different effects, including their relevance to cancer, depending on which region of the gene they affect. Except for rare and well-known exceptions, there are not enough data for reliable statistics for individual positions, but an intermediate level of analysis, between an individual position and the entire gene, may give us better statistics than the former and better resolution than the latter approach.

Results: We have developed e-Driver, a method that exploits the internal distribution of somatic missense mutations between the protein’s functional regions (domains or intrinsically disordered regions) to find those that show a bias in their mutation rate as compared with other regions of the same protein, providing evidence of positive selection and suggesting that these proteins may be actual cancer drivers. We have applied e-Driver to a large cancer genome dataset from The Cancer Genome Atlas and compared its performance with that of four other methods, showing that e-Driver identifies novel candidate cancer drivers and, because of its increased resolution, provides deeper insights into the potential mechanism of cancer driver genes identified by other methods.

Availability and implementation: A Perl script with e-Driver and the files to reproduce the results described here can be downloaded from https://github.com/eduardporta/e-Driver.git

Contact: adam@godziklab.org or eppardo@sanfordburnham.org

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

The landscape of cancer somatic mutations revealed by projects such as The Cancer Genome Atlas (TCGA) (Chang et al., 2013) or the International Cancer Genome Consortium (Hudson et al., 2010) is overwhelmingly complex, as hundreds of thousands of different mutations, ranging from large genomic rearrangements to point missense mutations, have been identified in different cancer samples (Ciriello et al., 2013, Kandoth et al., 2013). Several approaches have been developed to identify which genes are likely driving the carcinogenic process (driver genes). Such methods rely on the hypothesis that driver genes should be under positive selection in the cancer environment. Methods in this category include those that try to identify genes with higher-than-expected-by-chance mutation rates, such as MuSiC (Dees et al., 2012), or those that tend to accumulate highly damaging mutations, such as OncodriveFM (Gonzalez-Perez and Lopez-Bigas, 2012). More recently, methods that focus on the internal distribution of mutations along a protein have also been developed. For example, OncodriveCLUST (Tamborero et al., 2013b) looks for regions of proteins with higher-than-expected mutation rates, which makes it optimal for the identification of gain-of-function sites that, while being key for the carcinogenic process, would otherwise be missed. Another similar idea is ActiveDriver (Reimand and Bader, 2013), which tries to identify phosphorylation sites that are recurrently mutated in cancer. One of the differences between the two methods is that ActiveDriver tests the mutation frequencies of predefined regions (a phosphorylation site and its neighboring amino acids), whereas OncodriveCLUST first looks for potential seeds of highly mutated clusters and then tries to extend them.

Here we present e-Driver, a novel method that identifies protein functional regions (PFRs) that show a bias in their mutation rates. In this context, PFRs can be either domains or intrinsically disordered regions (IDRs). Our method is based on the assumption that different PFRs within the same protein mediate different functions and, thus, might have distinct roles in carcinogenesis. This becomes evident when describing proteins in terms of functional networks. In such networks, nodes represent different proteins, and edges between nodes represent functional relationships between them, such as physical interactions or post-translational modifications. Different edges leading to the same node/protein are usually mediated by different PFRs within that same protein, and mutations in the PFR mediating one edge will have different consequences than mutations in another PFR mediating a different edge. For example, if an enzyme contains a catalytic domain and an IDR that is phosphorylated, it is likely that the consequences of a missense mutation disrupting the catalytic domain will be different from those of a missense mutation affecting the phosphorylation site or a truncating mutation that disrupts both PFRs at the same time. Our method exploits this idea, which has been previously used to analyze mutations associated with Mendelian disorders (Zhong et al., 2009, Wang et al., 2012), by looking for PFRs that show a bias in their mutation rate.
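The network view described above can be sketched as a small data structure in which every edge carries the PFR that mediates it. The snippet below is only an illustration and is not part of e-Driver; it uses the PIK3R1 interactions listed in the caption of Figure 1, and the region label "SH2a" (for the first SH2 domain) and the helper names are assumptions introduced here.

```python
from collections import defaultdict

# Toy functional network around PIK3R1. Each edge is labelled with the PFR
# that mediates it (interactions taken from the Figure 1 caption; "SH2a" is
# a hypothetical shorthand for the first SH2 domain).
EDGES = [
    ("PIK3R1", "GRB2", "SH2a"),
    ("PIK3R1", "PTPN6", "SH2a"),
    ("PIK3R1", "PDGFRB", "SH2b"),
    ("PIK3R1", "PIK3CA", "IDRb"),
]


def edges_by_region(edges):
    """Group interaction partners by the (protein, PFR) pair mediating the edge."""
    grouped = defaultdict(list)
    for protein, partner, region in edges:
        grouped[(protein, region)].append(partner)
    return dict(grouped)


def affected_partners(edges, protein, region):
    """Partners whose edge could be perturbed by a missense mutation in `region`."""
    return sorted(edges_by_region(edges).get((protein, region), []))


print(affected_partners(EDGES, "PIK3R1", "IDRb"))
print(affected_partners(EDGES, "PIK3R1", "SH2a"))
```

Keeping the mediating region on the edge, rather than on the node, is what allows a mutation in one PFR to be linked to a subset of the protein's interactions.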

We have applied e-Driver to the cancer genomic dataset from the pan-cancer project of TCGA. This dataset has also been analyzed with four other methods (MuSiC, OncodriveFM, OncodriveCLUST and ActiveDriver), allowing us to compare the results obtained with e-Driver with those obtained by methods relying on other approaches used to identify the signals of positive selection (Tamborero et al., 2013a).

2 METHODS

2.1 Identification of the driver PFRs

e-Driver is based on the hypothesis that not all functional regions of a given protein may be equally relevant for carcinogenesis. If this is the case, it should be reflected in the distribution of missense mutations along the protein, with regions under selection showing enrichment or depletion of such mutations compared with regions with random (assumed to be passenger) mutations.

To identify PFRs under selection pressure, e-Driver first retrieves all missense mutations in a cancer cohort located in any given protein, together with their coordinates, and maps them to the protein’s functional regions. Then, for every PFR, we use a binomial test to check whether the observed number of mutations in this protein region (MR in Figure 1b) is biased. We assume that each mutation is an independent event, and that all residues of the protein have the same probability of being mutated. Then, given the total number of mutations in the protein (MT in Figure 1b), and the lengths of the region and the protein, we can calculate the probability of observing at least MR mutations in the region under the null hypothesis that the mutations are distributed randomly across the protein. Once the P-values of all the regions of all the mutated proteins in the cohort are obtained, the Benjamini–Hochberg false discovery rate algorithm is applied to correct for multiple testing. Regions with a Q-value < 0.05 are considered candidate driver regions. The whole process is illustrated in Figure 1, using PIK3R1 and its functional regions as an example.
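The test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Perl implementation: the function names are hypothetical, the tail is one-sided (enrichment only), and the extra P-values passed to the correction step are invented for the example. Because details such as the sidedness of the published test are not specified here, the printed P-value need not coincide exactly with the one reported in Figure 1b.

```python
from math import comb


def binomial_upper_tail(m_region, m_total, region_len, protein_len):
    """P(X >= m_region) for X ~ Binomial(m_total, region_len / protein_len).

    Null hypothesis: every residue is equally likely to be mutated, so a
    mutation falls in the region with probability region_len / protein_len.
    """
    p = region_len / protein_len
    return sum(
        comb(m_total, k) * p**k * (1 - p) ** (m_total - k)
        for k in range(m_region, m_total + 1)
    )


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values (Q-values) in input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so that Q-values stay monotonic.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q_values[i] = running_min
    return q_values


# PIK3R1 IDRb numbers from Figure 1b: MR = 29, MT = 43, lengths 227 and 732.
expected = 43 * 227 / 732
p_idrb = binomial_upper_tail(29, 43, 227, 732)
print(f"expected mutations: {expected:.1f}, one-sided P-value: {p_idrb:.2e}")

# The three other P-values are hypothetical and only exercise the correction.
q_values = benjamini_hochberg([p_idrb, 0.04, 0.5, 0.01])
print([q < 0.05 for q in q_values])
```

Swapping the upper tail for a lower tail (or a two-sided test) would be needed to detect depleted regions; that choice is left open in this sketch.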

Fig. 1.

e-Driver’s workflow illustrated with the analysis of PIK3R1 mutation data from TCGA. (a) e-Driver first retrieves all missense mutations in a protein. It then identifies its PFRs, such as Pfam domains or IDRs. For example, in the case of PIK3R1, the protein contains four different Pfam domains (one SH3 domain, one RhoGAP domain and two SH2 domains) and two distinct IDRs. The predictions are independent and, thus, can overlap, as in the case of the second IDR and the second SH2 domain. e-Driver iterates through every functional region, calculating the P-values of the mutation distribution using a binomial test that takes into account the mutation rates and lengths of both the region of interest and the protein. (b) Example of the calculations for the IDRb in PIK3R1. MR is the number of mutations in the region being studied; MT is the total number of mutations in the protein. Of the 43 mutations in PIK3R1, 29 are located in its second IDR; as the region is 227 amino acids long and the protein is 732 amino acids long, we would expect only 13 mutations in this region. Thus, using the binomial test, the P-value for the observed number of mutations is 2.1e-6. (c) Each of the different PFRs in PIK3R1 performs different functions. For example, the first SH2 domain is responsible for the interactions with GRB2 and PTPN6 (blue edges), whereas the second SH2 domain mediates the interaction with PDGFRB (green edge) and the second IDR mediates the interaction with PIK3CA (red edge). (d) It is likely that missense mutations in the SH2b domain of PIK3R1 will disrupt, among others, its interaction with PDGFRB without altering the rest of the network. Given that this region is not enriched in cancer somatic mutations, the functions/interactions mediated by this domain are unlikely to be oncogenic. (e) On the other hand, IDRb is strongly enriched in somatic mutations; thus, edges mediated by this region, such as the physical interaction with PIK3CA, are likely to be relevant to carcinogenesis. (f) The mutations in PIK3R1 (the white helical protein) IDRb region (shown in red) cluster around the region that interacts with PIK3CA (shown in brown). Representation based on PDB structure 2RD0

2.2 PFR annotations

We defined PFRs as sections of the protein coding for individual protein domains and IDRs. We decided to include IDRs because they can also contain important functional regions such as phosphorylation sites or regions that regulate or mediate protein interactions (Dunker et al., 2005).

To identify protein domains, we retrieved, for each protein isoform from ENSEMBL, annotated Pfam domains (Punta et al., 2012) as well as putative novel protein domains located in regions with no previous domain annotations, as predicted using the AIDA server (Xu et al., 2014). We used Foldindex (Prilusky and Felder, 2005) to predict IDRs for each protein, including in our analysis those regions with a predicted unfolded score below −0.1.

Finally, we mapped the different missense somatic mutations of each tumor to these PFRs, giving us a total of 66 492 altered regions in 14 421 genes based on data from 3205 tumor samples (see below). Among the 66 492 regions, we have 36 626 Pfam domain instances, 4626 putative domains predicted by the AIDA server and 25 240 IDRs. The features can overlap, as the predictions were performed independently and there is no reason why, for example, an IDR cannot overlap with (or even be located within) a Pfam domain. For the sake of simplicity, we discuss results obtained for only the longest isoform of each gene (e-Driver results for all the ENSEMBL isoforms can be found in the Supplementary Table S2).
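Mapping mutations onto independently predicted, possibly overlapping regions can be sketched as follows. The region coordinates are those given for EGFR in Table 1 and in the caption of Figure 3; the mutation positions and the function name are hypothetical and serve only to show that one mutation may be counted in more than one PFR, or in none.

```python
from collections import Counter


def count_mutations_per_region(mutation_positions, regions):
    """Count missense mutations falling in each region.

    `regions` is a list of (name, start, end) with 1-based inclusive
    coordinates. Regions may overlap, so a position can count for several
    of them; positions outside every region are simply not counted.
    """
    counts = Counter({name: 0 for name, _, _ in regions})
    for pos in mutation_positions:
        for name, start, end in regions:
            if start <= pos <= end:
                counts[name] += 1
    return dict(counts)


# EGFR regions as listed in Table 1 and the Figure 3 caption.
EGFR_REGIONS = [
    ("Domain II", 185, 338),
    ("Pf07714", 714, 965),
    ("Pf00069", 712, 964),
]

# Hypothetical mutation positions, for illustration only.
positions = [289, 598, 713, 858, 858, 965]
print(count_mutations_per_region(positions, EGFR_REGIONS))
```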

2.3 TCGA mutation dataset

We have downloaded the dataset that was used in the TCGA pan-cancer driver analysis (syn1729383). To compare our results with the ones obtained in the TCGA pan-cancer analysis, we applied the same filters to the dataset, excluding 71 samples that were considered to be hypermutators (Tamborero et al., 2013a). After filtering, the final dataset consists of 3205 tumor samples with 287 822 coding missense mutations.

2.4 Predicted driver genes by the other four methods

To assess the value of our method, we compared our results with those obtained by four different methods used previously to predict high-confidence gene drivers in the TCGA pan-cancer project: MuSiC, OncodriveFM, OncodriveCLUST and ActiveDriver (Tamborero et al., 2013a). We downloaded the results obtained in this analysis for three of the four methods: OncodriveFM (syn1701498), OncodriveCLUST (syn1701498) and MuSiC (syn1713813). As no ActiveDriver results for the whole genome were available on the repository describing the pan-cancer analysis, we used ActiveDriver results described in another paper (Reimand et al., 2013) that, according to their authors, have been obtained with similar TCGA mutation data (3185 cancer genomes, syn2237931). Therefore, the results shown here for ActiveDriver are slightly different from those described in the pan-cancer analysis.

2.5 Tissue-specific drivers

We classified the 3205 tumor genomes into their corresponding 11 tissues of origin, obtaining 11 tissue-specific datasets that were then analyzed individually with e-Driver. As before, we corrected for multiple testing and considered as positive only those PFRs with a Q-value < 0.05.

3 RESULTS

3.1 e-Driver identifies known cancer drivers

To assess the validity of our method, we reanalyzed the pan-cancer dataset of TCGA. This dataset contains mutation data for 3205 tumor samples that come from 11 different types of tumors and contains 287 822 missense mutations. The dataset has been previously analyzed using four different state-of-the-art methods to predict cancer drivers from mutation data (MuSiC, OncodriveFM, OncodriveCLUST and ActiveDriver).

When applying our method to this dataset, we identified 74 protein regions in 51 genes, showing a bias in their mutation rate when compared with the rest of the protein (Figure 2a, Supplementary Table S1). Among these 51 genes, 23 are included in the Cancer Gene Census (CGC), a curated list of 512 cancer drivers (Futreal et al., 2004). This represents a strong enrichment in CGC genes in our list of candidate drivers when compared with random expectation (Figure 2b, odds ratio > 25, P-value < 1e-16). As shown in Figure 2a, 31 of the 51 genes predicted by e-Driver (61%) are also identified by other methods. The highest overlap of e-Driver predictions is with predictions from OncodriveFM and MuSiC, with 21 of 51 genes (41%) being common. Regarding genes included in the CGC, 22 of the 23 genes identified by e-Driver (96%) that belong to this list also have some other signal of positive selection, as they are also predicted by other methods.
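An enrichment of this kind can be quantified with a 2×2 contingency table. The sketch below computes an odds ratio and a one-sided hypergeometric tail (equivalent to a one-sided Fisher's exact test). It rests on two assumptions that are not stated in the text: the choice of test, and the size of the gene background, which is set here to a purely hypothetical 20 000 genes. The output is therefore indicative only and is not meant to reproduce the values in Figure 2b.

```python
from math import comb


def odds_ratio(hits_in_set, hits_total, set_size, background):
    """Odds ratio of the 2x2 table (predicted or not) x (in gene set or not)."""
    a = hits_in_set                      # predicted and in the set
    b = hits_total - hits_in_set         # predicted, not in the set
    c = set_size - hits_in_set           # in the set, not predicted
    d = background - hits_total - c      # neither
    return (a * d) / (b * c)


def hypergeom_upper_tail(hits_in_set, hits_total, set_size, background):
    """P(X >= hits_in_set) when hits_total genes are drawn without replacement."""
    denom = comb(background, hits_total)
    numer = sum(
        comb(set_size, k) * comb(background - set_size, hits_total - k)
        for k in range(hits_in_set, min(hits_total, set_size) + 1)
    )
    return numer / denom


ASSUMED_BACKGROUND = 20_000  # hypothetical number of genes, not from the paper

# 23 of the 51 predicted genes belong to the 512-gene CGC (values from the text).
print(odds_ratio(23, 51, 512, ASSUMED_BACKGROUND))
print(hypergeom_upper_tail(23, 51, 512, ASSUMED_BACKGROUND))
```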

Fig. 2.

e-Driver identifies known cancer driver genes. (a) Venn diagram showing the overlap between the five different methods in their predictions. (b) Venn diagram showing the overlap between the five different methods of predicting genes included in the CGC

Interestingly, there is one gene in the CGC, CREBBP, that has not been identified by any of the other four methods but was picked up by e-Driver. The CREBBP protein does not show any specific cluster of mutations nor is it recurrently mutated in cancer, which could explain why it is not recognized as a potential cancer driver by the other methods. Nevertheless, its mutation pattern shows a strong bias, as the acetyltransferase domain, located between amino acids 1345 and 1639 (12% of the protein’s length), contains 20 of the 60 mutations found in this gene, or 33% (Q-value < 0.02).

There is one other acetyltransferase domain in the EP300 gene that is also enriched in somatic mutations and identified by e-Driver. This gene is also included in the CGC and is also identified by MuSiC and OncodriveFM but not by OncodriveCLUST or ActiveDriver. This observation suggests that, while EP300 is frequently mutated in cancer, its mutations show no particular clustering. However, by using e-Driver, we can identify the specific region of the protein that is enriched in mutations.

3.2 e-Driver finds potential novel drivers

We then reviewed the remaining 28 genes that are identified as potential drivers by our method but are not included in the CGC. Eight of them had also been identified by, at least, one other method, supporting their potential roles as cancer drivers. For example, our method, as well as OncodriveFM, identified the MGA gene as a potential driver. This gene encodes a dual-specificity transcription factor that regulates expression of Myc/MAX target genes. It suppresses the transcriptional activation by Myc and inhibits Myc-dependent cell transformation. The domain identified by e-Driver is the helix-loop-helix domain between positions 2425 and 2474 that contains 8 of the 46 mutations identified in this protein (odds ratio 13, Q-value < 0.001) and that mediates the binding of the protein to E-boxes in the DNA. Additional evidence in favor of the carcinogenic role of MGA comes from a recent study (Lawrence et al., 2014) using a larger genomic dataset with 4742 cancer samples in which, thanks to an increase in sample size and statistical power, MGA could be identified by MuSiC. As for the other seven genes that were also predicted by other methods, five of them were included in the list of 258 high-confidence drivers described in the pan-cancer driver analysis: FRG1B, NBPF10, DHX9, POTEF and RPSAP58. This result agrees with previous observations that genes predicted by more than a single method are likely to be true cancer drivers (Tamborero et al., 2013a) and confirms the power of our method to identify genes relevant to the disease.

Among the remaining 20 genes that are not part of the CGC and that are not identified by any other method, we have found several potential drivers. For example, we identified two members of the neuroblastoma breakpoint family, NBPF12 and NBPF20, as having regions with strong enrichment in mutations. These two genes belong to the same family as NBPF10, one of the genes included in the list of high-confidence drivers of the pan-cancer analysis. Interestingly, the disordered regions identified by e-Driver from NBPF12 and NBPF20 have a 94% identity, suggesting that their potential driver role might be achieved through similar mechanisms. Other interesting genes identified uniquely by our method include POTEM, a protein that belongs to the same family as the high-confidence driver POTEF. As in NBPF proteins, the regions identified in POTEM and POTEF are IDRs; however, in this case, they do not show any homology. Another interesting fact about POTEF is that the region identified by e-Driver does not show an enrichment in cancer somatic mutations but instead a depletion, suggesting that the conservation of this PFR is important for the survival of cancer cells and for POTEF’s role as a driver.

3.3 Tissue-specific candidate PFR drivers

Cancer is a heterogeneous disease, and it is known that mutations driving one type of cancer might be completely irrelevant for another. Thus, although the pan-cancer dataset has more statistical power because of its larger size, it is possible that there are tissue-specific drivers that cannot be detected in the pan-cancer dataset. To explore that possibility, we divided the pan-cancer genomes into 11 tissue-specific smaller datasets and analyzed each of them using e-Driver.

Although most PFRs have a stronger signal in the pan-cancer dataset than in any tissue dataset (Figure 3a, black dots), others have stronger tissue-specific signals (Figure 3a, gray dots). This is the case, for example, in FLT3’s kinase domain, which is mostly mutated in acute myeloid leukemia (17/23 mutations in this domain happen in this type of cancer). Another example is EGFR, which has two clearly different mutation patterns in glioblastoma and lung adenocarcinoma (Figure 3b). In glioblastoma, it is Domain II of EGFR’s extracellular region that is mostly affected by missense mutations (Domain IV seems to be also strongly mutated, although, as it is not annotated in Pfam, it has not been analyzed by e-Driver), and there are almost no mutations in the kinase domain. On the other hand, in lung adenocarcinoma, there are almost no mutations in the extracellular region and most mutations are located in the kinase domain of this protein.

Fig. 3.

Tissue-specific candidate drivers identified by e-Driver. (a) Correlation plot showing the Q-values obtained for each region in the pan-cancer dataset compared with the lowest Q-value obtained for that region in the 11 different tissues. Dots in gray represent regions with lower tissue-specific than pan-cancer Q-values, whereas black dots have lower pan-cancer than tissue-specific Q-values. Dashed lines are located in the Q = 0.05 threshold that we established to consider a region as a potential driver. (b) Histograms showing the mutation distribution of EGFR in three different datasets: pan-cancer (lower histogram), lung adenocarcinoma (middle) and glioblastoma (top). In the pan-cancer and glioblastoma datasets, only EGFR’s extracellular Domain II (positions 185–338, between dashed lines) is enriched in mutations, whereas the kinase domain (positions 714–965, between dashed lines) shows no bias in its mutation rate. However, in lung adenocarcinoma, it seems that only the kinase domain is relevant, as most mutations (19/21, 90%) are located in this type of domain

There are 11 PFRs in 10 different proteins that can be identified in only the tissue-specific datasets (Pancan qval > 0.05, Tissue qval < 0.05, Table 1). These tissue-specific candidate drivers are strongly enriched in genes with known cancer roles, as 8 of 10 proteins are part of the CGC. Besides the identification of EGFR’s kinase domain (Pf07714) in lung adenocarcinoma that has been explained above, there are other interesting examples in the list. For example, although most PIK3CA mutations are located in the Pf00613 domain (including the well-studied E545K) and happen in a variety of cancer types, the Pf02192 domain, also known as the ABD domain, is mostly mutated in endometrial cancer.

Table 1.

Tissue-specific drivers identified by e-Driver

Gene symbol   PFR       Start   End    Pancan qval   Tissue qval   Tissue
CTCF          Pf00096   266     288    0.66          0.02          Brca
SPOP          Pf00917   39      162    0.09          0.03          Ucec
PIK3CA        Pf02192   32      108    1             1.8e-5        Ucec
EGFR          Pf07714   714     965    1             5.5e-8        Luad
EGFR          Pf00069   712     964    1             5.5e-8        Luad
BAP1          Pf01088   4       214    0.6           0.004         Kirc
CTNNB1        Pf05804   334     484    0.12          8.0e-4        Ucec
ANKRD36C      IDR       543     632    0.37          0.003         Hnsc
ZNF479        Pf00096   437     459    0.1           6.5e-5        Blca
FLNA          Pf00630   1158    1244   1             0.009         Gbm
MTOR          IDR       1442    1492   0.07          1.2e-4        Kirc
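The selection rule behind Table 1 (pan-cancer Q-value > 0.05 and tissue Q-value < 0.05) can be written as a simple filter. The rows below are transcribed from Table 1; the tuple layout and the function name are illustrative assumptions.

```python
# Rows of Table 1: (gene, PFR, start, end, pancan_q, tissue_q, tissue).
TABLE_1 = [
    ("CTCF", "Pf00096", 266, 288, 0.66, 0.02, "Brca"),
    ("SPOP", "Pf00917", 39, 162, 0.09, 0.03, "Ucec"),
    ("PIK3CA", "Pf02192", 32, 108, 1.0, 1.8e-5, "Ucec"),
    ("EGFR", "Pf07714", 714, 965, 1.0, 5.5e-8, "Luad"),
    ("EGFR", "Pf00069", 712, 964, 1.0, 5.5e-8, "Luad"),
    ("BAP1", "Pf01088", 4, 214, 0.6, 0.004, "Kirc"),
    ("CTNNB1", "Pf05804", 334, 484, 0.12, 8.0e-4, "Ucec"),
    ("ANKRD36C", "IDR", 543, 632, 0.37, 0.003, "Hnsc"),
    ("ZNF479", "Pf00096", 437, 459, 0.1, 6.5e-5, "Blca"),
    ("FLNA", "Pf00630", 1158, 1244, 1.0, 0.009, "Gbm"),
    ("MTOR", "IDR", 1442, 1492, 0.07, 1.2e-4, "Kirc"),
]


def tissue_specific(rows, threshold=0.05):
    """Keep PFRs significant in a tissue dataset but not in the pan-cancer one."""
    return [row for row in rows if row[4] > threshold and row[5] < threshold]


selected = tissue_specific(TABLE_1)
genes = {row[0] for row in selected}
print(len(selected), "PFRs in", len(genes), "proteins")
```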

4 DISCUSSION AND CONCLUSIONS

Here we evaluated the hypothesis that some cancer driver genes might accumulate mutations in only those functional regions (domains or disordered regions) that are relevant to the disease. To test this idea, we have developed a novel approach, e-Driver, and applied it to one of the largest available datasets of cancer genomic data, the TCGA’s pan-cancer project. Our method checks for each PFR whether it shows a bias in its mutation rate when compared with the rest of the protein. As it uses mutation data only for individual proteins, e-Driver, unlike other methods that compare mutation rates of whole genes, does not need to compensate for variations in mutation rates across the entire genome (De and Michor, 2011). Another distinguishing feature of our method is that the regions it tests, protein domains and IDRs, are usually larger than the clusters identified by other methods. This feature is important, as small clusters of mutations are usually located in oncogenes rather than in tumor-suppressor genes. By using larger functional regions, we can identify tumor suppressors whose contribution to carcinogenesis depends solely on the mutation status of specific regions.

The advantages of our method are exemplified in the identification of MGA using the TCGA dataset. This gene was not mutated in enough samples to be identified by methods that rely on the mutation frequency of the whole gene (in a recent study with more cancer samples, these methods were able to identify MGA as a potential cancer driver). Because this gene acts as a tumor suppressor, the range of positions that can be mutated for it to drive the tumor’s growth is too large to be identified by OncodriveCLUST. However, its mutations tend to accumulate in its helix-loop-helix domain rather than in the rest of the protein, allowing e-Driver to find it.

One drawback that comes from the use of predefined regions is that if the gene has no such regions, or if the regions cover the whole gene, the gene cannot be identified using our method. This is the case, for example, in IDH1 and IDH2 (Yan et al., 2009). These two known cancer driver genes encode single-domain proteins. In this scenario, even though their only PFRs are frequently mutated in cancer and show clusters of mutations, e-Driver cannot identify them. However, such cases represent <10% of all the human proteins and <3% of the proteins with at least one mutation in TCGA (see Supplementary Table S3). It is important to note that, just like most other methods that rely on mutation frequencies to identify potential drivers, e-Driver will also benefit from the increase in the number of sequenced cancer genomes, as the statistical power will be larger, allowing it to identify novel regions (Supplementary Figure S2).
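The limitation described above can be expressed as a simple pre-check. The sketch below encodes one possible reading of it (a protein is testable only if at least one annotated region is shorter than the whole protein); the function name, the protein lengths and the coordinates are hypothetical.

```python
def is_testable(protein_len, regions):
    """Return True if at least one region covers only part of the protein.

    `regions` is a list of (start, end) pairs, 1-based and inclusive.
    A protein without regions, or whose regions each span its full length,
    offers no "rest of the protein" to compare against.
    """
    return any((end - start + 1) < protein_len for start, end in regions)


# Hypothetical single-domain protein: the only PFR spans all 400 residues.
print(is_testable(400, [(1, 400)]))
# Hypothetical multi-region protein: each PFR covers only part of the sequence.
print(is_testable(700, [(10, 120), (300, 520)]))
# Protein with no annotated PFRs at all.
print(is_testable(250, []))
```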

Proteins enriched in mutations in an unannotated region (such as EGFR’s extracellular Domain IV) present another scenario in which e-Driver will not be able to identify that specific region. In this case, however, as long as the protein contains an annotated PFR, e-Driver should be able to find the protein, as it will pick up the annotated PFR because of its lack of missense mutations. Another interesting feature of e-Driver is that, as it detects which PFRs are relevant for each type of cancer, it might also help in defining strategies to design and administer drugs. For example, it has been recently shown that the two different patterns of mutations that we observed in EGFR for glioblastoma and lung adenocarcinoma have therapeutic implications as to which type of EGFR inhibitors work in each case, as they deregulate EGFR’s activity through different mechanisms (Vivanco et al., 2012). Another example is PIK3CA’s Pf02192 and Pf00613 domains, which also drive different subsets of cancer and determine the response to the IGF1R inhibitor AEW541 (Porta-Pardo and Godzik, unpublished data).

Overall, we have shown that our approach can identify both well-known oncogenes and novel candidate cancer drivers. Moreover, because of direct connections between protein regions and specific elements of the protein function, it can also provide further hypotheses about the mechanisms of driver genes. Given the complexity of the problem of identifying cancer drivers, it is likely that a combination of multiple approaches looking for distinct signals of positive selection will be needed to reach the final answer. For example, neither e-Driver nor any of the other methods discussed here works with data regarding somatic copy number variations, a type of mutation that can drive several subsets of cancer (Ciriello et al., 2013). Here we have demonstrated that e-Driver can provide a novel, insightful and complementary view of the problem, contributing to its solution.

Supplementary Material

Supplementary Data

ACKNOWLEDGEMENTS

The authors want to thank their colleagues from the SBMRI bioinformatics group: specifically, Lukasz Jaroszewski for providing information and prediction for novel human protein domains and Thomas Hrabe for his help in preparing some of the figures.

Funding: This work has been supported by the Human Frontiers Science Program grant RGP0027/2011.

Conflict of Interest: none declared.

REFERENCES

  1. Chang K, et al. The cancer genome atlas pan-cancer analysis project. Nat. Genet. 2013;45:1113–1120. doi: 10.1038/ng.2764.
  2. Ciriello G, et al. Emerging landscape of oncogenic signatures across human cancers. Nat. Genet. 2013;45:1127–1133. doi: 10.1038/ng.2762.
  3. De S, Michor F. DNA replication timing and long-range DNA interactions predict mutational landscapes of cancer genomes. Nat. Biotechnol. 2011;29:1103–1108. doi: 10.1038/nbt.2030.
  4. Dees ND, et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res. 2012;22:1589–1598. doi: 10.1101/gr.134635.111.
  5. Dunker AK, et al. Flexible nets. The roles of intrinsic disorder in protein interaction networks. FEBS J. 2005;272:5129–5148. doi: 10.1111/j.1742-4658.2005.04948.x.
  6. Futreal P, et al. A census of human cancer genes. Nat. Rev. Cancer. 2004;4:177–183. doi: 10.1038/nrc1299.
  7. Gonzalez-Perez A, Lopez-Bigas N. Functional impact bias reveals cancer drivers. Nucleic Acids Res. 2012;40:e169. doi: 10.1093/nar/gks743.
  8. Hudson T, et al. International network of cancer genome projects. Nature. 2010;464:993–998. doi: 10.1038/nature08987.
  9. Kandoth C, et al. Mutational landscape and significance across 12 major cancer types. Nature. 2013;502:333–339. doi: 10.1038/nature12634.
  10. Lawrence MS, et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature. 2014;505:495–501. doi: 10.1038/nature12912.
  11. Prilusky J, Felder C. FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics. 2005;21:3435–3438. doi: 10.1093/bioinformatics/bti537.
  12. Punta M, et al. The Pfam protein families database. Nucleic Acids Res. 2012;40:D290–D301. doi: 10.1093/nar/gkr1065.
  13. Reimand J, et al. The mutational landscape of phosphorylation signaling in cancer. Sci. Rep. 2013;3:2651. doi: 10.1038/srep02651.
  14. Reimand J, Bader GD. Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Mol. Syst. Biol. 2013;9:637. doi: 10.1038/msb.2012.68.
  15. Tamborero D, et al. Comprehensive identification of mutational cancer driver genes across 12 tumor types. Sci. Rep. 2013a;3:2650. doi: 10.1038/srep02650.
  16. Tamborero D, et al. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics. 2013b;29:2238–2244. doi: 10.1093/bioinformatics/btt395.
  17. Vivanco I, et al. Differential sensitivity of glioma- versus lung cancer-specific EGFR mutations to EGFR kinase inhibitors. Cancer Discov. 2012;2:458–471. doi: 10.1158/2159-8290.CD-11-0284.
  18. Wang X, et al. Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nat. Biotechnol. 2012;30:159–164. doi: 10.1038/nbt.2106.
  19. Xu D, et al. AIDA: ab initio domain assembly server. Nucleic Acids Res. 2014;42:W308–W313. doi: 10.1093/nar/gku369.
  20. Yan H, et al. IDH1 and IDH2 mutations in Gliomas. N. Engl. J. Med. 2009;360:765–773. doi: 10.1056/NEJMoa0808710.
  21. Zhong Q, et al. Edgetic perturbation models of human inherited disorders. Mol. Syst. Biol. 2009;5:321. doi: 10.1038/msb.2009.80.

Articles from Bioinformatics are provided here courtesy of Oxford University Press