Nucleic Acids Research. 2015 Oct 29;44(Database issue):D992–D999. doi: 10.1093/nar/gkv1123

NCG 5.0: updates of a manually curated repository of cancer genes and associated properties from cancer mutational screenings

Omer An 1, Giovanni M Dall'Olio 1, Thanos P Mourikis 1, Francesca D Ciccarelli 1,*
PMCID: PMC4702816  PMID: 26516186

Abstract

The Network of Cancer Genes (NCG, http://ncg.kcl.ac.uk/) is a manually curated repository of cancer genes derived from the scientific literature. Because of the increasing amount of cancer genomic data, we have introduced a more robust procedure to extract cancer genes from published cancer mutational screenings, and two curators independently reviewed each publication. NCG release 5.0 (August 2015) collects 1571 cancer genes from 175 published studies that describe 188 mutational screenings of 13 315 cancer samples from 49 cancer types and 24 primary sites. In addition to collecting cancer genes, NCG also provides information on the experimental validation that supports the role of these genes in cancer and annotates their properties (duplicability, evolutionary origin, expression profile, function and interactions with proteins and miRNAs).

INTRODUCTION

Cancer genome projects, including The Cancer Genome Atlas (TCGA, https://tcga-data.nci.nih.gov/) and the International Cancer Genome Consortium (ICGC, https://dcc.icgc.org/), have so far mapped DNA alterations in more than 13 000 cancer samples. These massive sequencing efforts show that somatic alterations vary greatly between and within cancer types (1–3). Only some of the acquired alterations, however, confer a selective advantage that promotes cancer development (driver alterations). The large majority of alterations play little or no role in cancer and are fixed in the cancer genome as a by-product of the selection acting on drivers (passenger alterations). One of the challenges of cancer genomics is to distinguish effectively between driver and passenger alterations in order to identify the molecular determinants of cancer. Most known driver alterations modify protein-coding genes (cancer genes). The ability to identify cancer genes among the wealth of mutated genes is crucial to better understand cancer biology and to support the development of innovative anti-cancer therapies.

Network of Cancer Genes (NCG) is a database launched in 2010 with the aim of collecting cancer genes from the literature. Curators continually review cancer mutational screenings and annotate altered genes that either have well-established cancer functions (known cancer genes) or are putative cancer drivers (candidate cancer genes). Originally (4), NCG collected data from only five mutational screenings and annotated most known cancer genes from the Cancer Gene Census (CGC) (5). The last five years have seen a rapid accumulation of cancer genomic data from thousands of samples, with almost all human genes mutated in at least one sample (6,7). Because of this overwhelming amount of data, and to avoid including mutated genes with no role in cancer, in this release we have substantially revised the procedure used to identify cancer genes. NCG now collects 1571 cancer genes, 518 of which are known cancer genes. The remaining 1053 are candidate cancer genes whose driver role was predicted in the original publication using a variety of methods (Supplementary Table S1). Given the importance of robust experimental support for the cancer activity of candidate cancer genes, NCG now also collects additional literature describing available orthogonal validations. NCG also annotates various properties of cancer genes, such as the presence of extra copies in the genome (gene duplicability), their evolutionary origin, their connectivity in the protein–protein and miRNA interaction networks, and their gene expression profiles across 38 human tissues and 1543 cancer cell lines.

The manual curation of the literature to extract cancer driver genes, together with the annotation of a large number of additional properties, makes NCG a comprehensive and up-to-date resource for navigating the overwhelming amount of cancer data, with a particular focus on the genetic determinants of cancer.

MANUAL ANNOTATION OF CANCER GENES

In this release of NCG, the procedure for including cancer genes has been revised and standardized (Figure 1A). The first difference from previous versions is that inclusion is restricted to studies that describe mutational screenings of cancer samples and that distinguish cancer genes from genes with passenger mutations. This led to the identification of 119 new publications. For consistency with these inclusion criteria, all 68 studies present in the previous release were re-analysed. Twelve of them were excluded because they screened cancer cell lines rather than cancer samples or used no method to identify cancer genes among all mutated genes. As a result of this extensive literature search, NCG 5.0 currently collects 175 studies (Supplementary Table S1). Two curators independently reviewed each publication to extract cancer genes and complementary information, such as the type of screening and cancer, the primary sites, the number of sequenced samples and the methods applied to identify cancer genes (Figure 1A). This manual curation resulted in 1260 cancer genes, 207 of which were annotated as known cancer genes in CGC. The remaining 1053 genes were candidate cancer genes identified in the original study using one or more methods (Supplementary Table S1). Further known cancer genes were added from CGC (February 2014), bringing the total to 1571 cancer genes. Where information was available, cancer genes were further annotated as dominant (mostly oncogenes) or recessive (mostly tumour suppressors).
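The study and gene counts in this paragraph can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: the variable names are ours, and it assumes, as the text implies, that every gene added from CGC is a known cancer gene not already among the 1260 curated genes.

```python
# Cross-check of the NCG 5.0 curation counts quoted in the text.
# Variable names are illustrative; all numbers are taken from the paragraph above.

previous_studies = 68    # studies in NCG 4.0, all re-analysed
excluded_previous = 12   # screened cell lines or used no driver-detection method
new_studies = 119        # publications newly identified under the new criteria
studies_ncg5 = previous_studies - excluded_previous + new_studies

curated_genes = 1260     # cancer genes extracted by the two curators
curated_known = 207      # of which already annotated as known cancer genes in CGC
curated_candidates = curated_genes - curated_known

total_genes = 1571       # after adding further CGC (February 2014) genes
# Assumption: CGC additions are known cancer genes not already among the curated ones.
added_from_cgc = total_genes - curated_genes
known_total = curated_known + added_from_cgc

print(studies_ncg5, curated_candidates, added_from_cgc, known_total)
```

Running it prints `175 1053 311 518`, consistent with the 175 studies, 1053 candidate genes and 518 known cancer genes reported in the text; 311 is the derived number of genes added from CGC.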

Figure 1.

Curation procedure and comparison between NCG 5.0 and NCG 4.0: (A) Flowchart of the curation procedure used in NCG. After the identification of relevant publications describing cancer mutational screenings, two independent curators extract cancer genes and related information on types of screening and cancer, primary sites, screened samples and supporting methods. (B) Number of publications, screenings, cancer types and screened samples in NCG 5.0 compared with NCG 4.0. (C) Venn diagram of cancer genes in NCG 4.0 and NCG 5.0. The reasons for the removal of 778 genes from the database are detailed in Supplementary Table S2. (D–E) Growth of NCG data over time, showing the number of publications, screenings and cancer genes from 2010, the year of the first NCG release. All screenings published before 2010 are grouped together.

Compared with NCG 4.0 (8), NCG 5.0 collects information from more than twice as many publications, screenings and cancer types, and from four times as many cancer samples (Figure 1B). Despite this substantial increase in data, the number of cancer genes decreased from 2000 to 1571 (Figure 1C) because of the more restrictive criteria. In particular, 612 genes were removed because the original publication was excluded, and 166 because they had no support as cancer drivers (Supplementary Table S2). Overall, the studies in NCG 5.0 describe 188 mutational screenings, including 125 whole-exome sequencings, 33 whole-genome sequencings, 17 screenings of selected gene panels and 13 screenings based on multiple approaches (Figure 1D). Interestingly, the number of cancer genes with a well-documented role in cancer grows much more slowly than the number of candidate cancer genes (Figure 1E). This highlights the currently unmet need for efficient experimental assays that support the predicted role of candidate genes in cancer.
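Figure 1C is a two-set Venn diagram. The Python sketch below shows one way to compute its three regions from two gene lists, followed by the arithmetic implied by the counts above. The gene identifiers are hypothetical placeholders, and the retained and added totals are derived values that assume the 2000, 1571 and 778 figures in the text are exact.

```python
def compare_releases(old_genes: set[str], new_genes: set[str]) -> dict[str, set[str]]:
    """Split two releases' gene sets into the three regions of a two-set Venn diagram."""
    return {
        "removed": old_genes - new_genes,   # present only in the older release
        "retained": old_genes & new_genes,  # present in both releases
        "added": new_genes - old_genes,     # present only in the newer release
    }

# Toy example with hypothetical gene identifiers (not NCG data).
regions = compare_releases({"GENE_A", "GENE_B", "GENE_C"}, {"GENE_B", "GENE_C", "GENE_D"})
print({name: sorted(genes) for name, genes in regions.items()})

# Arithmetic implied by the counts quoted in the text.
ncg4_total, ncg5_total = 2000, 1571
removed = 612 + 166                # excluded publication + no support as driver
retained = ncg4_total - removed    # genes kept from NCG 4.0 (derived)
added = ncg5_total - retained      # genes new in NCG 5.0 (derived)
screenings = 125 + 33 + 17 + 13    # exome + genome + gene panels + multiple approaches
print(removed, retained, added, screenings)
```

The second line of output is `778 1222 349 188`; the last value matches the 188 screenings reported in the text.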

Almost all mutational screenings collected in NCG 5.0 applied only one method to identify cancer genes (Supplementary Table S1). The most common was the recurrence of mutations in a given gene across samples, which was taken as a sign of functional selection (Figure 2A and Supplementary Table S1). Other commonly used methods included MutSig (6) and MuSiC (9) (Figure 2A and Supplementary Table S1). Interestingly, the majority of known cancer genes (67%) were supported by at least two methods (Figure 2B), whereas most candidate cancer genes (78%) were predicted by only one method (Figure 2C). In agreement with this, known cancer genes were overall identified as drivers in a higher number of mutational screenings and primary cancer sites than candidate cancer genes (Figure 2D). The tendency of candidate cancer genes to be cancer specific was also reflected by the lower overlap between the methods that support them, compared with those that support known cancer genes (Figure 2E). Cases of higher overlap (e.g. between MutSig and Invex, Figure 2E) corresponded to screenings where both methods were used (Supplementary Table S1).
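Figure 2B–C and 2E summarize how many methods support each gene and how often two methods agree. The Python sketch below illustrates both calculations on hypothetical toy data; the input layout (a mapping from each gene to the set of methods that support it) is our assumption, not the NCG schema. Following the Figure 2E legend, the overlap is the percentage of genes supported by one method that are also supported by another, so the matrix is not symmetric.

```python
from collections import defaultdict
from itertools import permutations

def support_fractions(gene_methods: dict[str, set[str]]) -> tuple[float, float]:
    """Return (fraction supported by one method, fraction supported by two or more)."""
    n = len(gene_methods)
    single = sum(1 for methods in gene_methods.values() if len(methods) == 1)
    return single / n, (n - single) / n

def overlap_percentages(gene_methods: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """overlap[(a, b)]: % of genes supported by method a that are also supported by method b.

    Each value is normalized by the genes of method a, hence the asymmetry.
    """
    genes_by_method = defaultdict(set)
    for gene, methods in gene_methods.items():
        for method in methods:
            genes_by_method[method].add(gene)
    return {
        (a, b): 100 * len(genes_by_method[a] & genes_by_method[b]) / len(genes_by_method[a])
        for a, b in permutations(genes_by_method, 2)
    }

# Hypothetical toy data: gene -> methods that supported it (not NCG content).
toy = {
    "GENE_A": {"Recurrence", "MutSig"},
    "GENE_B": {"Recurrence"},
    "GENE_C": {"MutSig", "MuSiC"},
    "GENE_D": {"Recurrence"},
}
print(support_fractions(toy))                              # (0.5, 0.5)
print(overlap_percentages(toy)[("MutSig", "Recurrence")])  # 50.0
```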

Figure 2.

Overview of data in NCG 5.0: (A) Cancer mutational screenings divided according to the method that was applied to identify cancer genes in the original publication. Methods and corresponding screenings are described in Supplementary Table S1. (B–C) Fractions of known and candidate cancer genes supported by one or more methods. Gene counts are reported in brackets. (D) Number of mutational screenings and primary sites where each cancer gene has been reported as a driver. TP53 is an outlier and has been excluded from the analysis because it has been identified in 113 screenings across 22 primary sites. (E) Heatmaps of the overlap between methods identifying known and candidate cancer genes. Each box represents the percentage of cancer genes identified with one method that are also supported by another. For each method, the total number of associated cancer genes is reported in brackets.

EXPERIMENTAL VALIDATION OF CANDIDATE CANCER GENES

Candidate cancer genes identified with computational methods often lack additional experimental validation of their driver role. The main reason is that functional follow-ups are often cumbersome and require an ad hoc design for each gene. Experimental proof of a predicted driver role is, however, crucial for translating potentially relevant discoveries into increased knowledge and novel treatments.

In this release of NCG, we have extensively reviewed the literature to search for experimental validations of candidate cancer genes. NCG now annotates the orthogonal experiments, performed in the original study or in follow-up studies, that are available for 120 of the 1053 candidate cancer genes (11% of the total; Table 1 and Supplementary Table S3). The most commonly used approaches measure the effect of gene silencing or gene overexpression in cell lines (Figure 3A and Supplementary Table S3), and the majority of validated candidate genes (83 of 120) have been validated through multiple assays (Figure 3B).

Table 1. Experimental validation of candidate cancer genes.

Experimental validation | Candidate cancer genes (n) | Publications (n)
Gene overexpression | 60 | 74
Transient RNA interference | 58 | 52
Mutagenesis | 31 | 41
Immunostaining | 25 | 26
Stable gene knockout | 23 | 22
Survival analysis | 20 | 21
Protein activity assay | 19 | 20
Drug response assay | 15 | 17
In silico protein modelling | 12 | 14
Xenograft | 10 | 11
Rhotekin pull-down | 2 | 5
Total | 275 (120 unique genes) | 303 (166 unique publications)

For each type of experimental validation, the numbers of validated candidate genes and corresponding publications are shown. The complete gene list with references to the original papers is given in Supplementary Table S3.
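The Total row of Table 1 sums per-assay counts, so a gene validated by several assays, or a publication reporting several assays, is counted more than once: 275 gene counts correspond to 120 unique genes, and 303 publication counts to 166 unique publications. The Python sketch below shows this distinction on hypothetical records and checks that the per-assay counts in Table 1 add up to its Total row. The (gene, assay, publication) record layout is an assumption made for illustration.

```python
from collections import defaultdict

def summarize_validations(records):
    """records: list of (gene, assay, publication) tuples (hypothetical layout)."""
    genes_by_assay, pubs_by_assay = defaultdict(set), defaultdict(set)
    assays_by_gene = defaultdict(set)
    for gene, assay, pub in records:
        genes_by_assay[assay].add(gene)
        pubs_by_assay[assay].add(pub)
        assays_by_gene[gene].add(assay)
    rows = {a: (len(genes_by_assay[a]), len(pubs_by_assay[a])) for a in genes_by_assay}
    return {
        "rows": rows,
        "gene_total": sum(g for g, _ in rows.values()),  # a gene counts once per assay
        "pub_total": sum(p for _, p in rows.values()),   # a paper counts once per assay
        "unique_genes": len(assays_by_gene),
        "unique_pubs": len({pub for _, _, pub in records}),
        "multi_assay_genes": sum(1 for a in assays_by_gene.values() if len(a) >= 2),
    }

# Hypothetical records (not NCG content).
toy = [
    ("GENE_A", "Gene overexpression", "pub1"),
    ("GENE_A", "Xenograft", "pub1"),
    ("GENE_B", "Gene overexpression", "pub2"),
]
summary = summarize_validations(toy)

# Per-assay (genes, publications) counts copied from Table 1.
table1 = {
    "Gene overexpression": (60, 74), "Transient RNA interference": (58, 52),
    "Mutagenesis": (31, 41), "Immunostaining": (25, 26),
    "Stable gene knockout": (23, 22), "Survival analysis": (20, 21),
    "Protein activity assay": (19, 20), "Drug response assay": (15, 17),
    "In silico protein modelling": (12, 14), "Xenograft": (10, 11),
    "Rhotekin pull-down": (2, 5),
}
print(sum(g for g, _ in table1.values()), sum(p for _, p in table1.values()))  # 275 303
```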

Figure 3.

Validation of candidate cancer genes and alteration spectrum of CSMD3: (A) Fractions of validated candidate cancer genes according to the experimental assay used. Gene silencing refers to stable knockout or transient knockdown via RNA interference. Other assays include in silico protein modelling, survival analysis, drug response, protein activity, rhotekin pull-down and xenograft cancer models. (B) Percentage of candidate cancer genes that have been validated using one or more experimental approaches. The corresponding number of genes is shown above each bar. The full list of experiments and genes is reported in Supplementary Table S3. (C) Protein domain architecture of CSMD3 according to the SMART database (32). (D) Percentage of mutational screenings, cancer types, primary sites and methods that support the cancer driver role of CSMD3, with the corresponding numbers. (E) Expression profile of CSMD3 in normal human tissues. Tissues where the gene is expressed in GTEx and Protein Atlas are highlighted in red.

An interesting case is CSMD3, a candidate gene for benign adult familial myoclonic epilepsy (10) that encodes a long protein with multiple CUB and sushi repeats (Figure 3C). CSMD3 has been found recurrently mutated across several cancer types and has therefore been predicted as a cancer driver by several methods (Figure 3D). Because of its length, sequence composition and location near fragile sites of the genome, CSMD3 was regarded as a possible false positive in NCG 4.0. The fact that CSMD3 is not expressed in many of the normal tissues where it is found mutated (Figure 3E) also supports a passenger role for the acquired mutations. Nevertheless, stable knockout of CSMD3 in immortalized epithelial cells has been reported to increase cell proliferation (11), suggesting a tumour-suppressor role for this gene. This example highlights the difficulty of correctly predicting the driver role of mutated genes and the need for multiple independent pieces of evidence to assess the role of mutations in cancer.

ANNOTATION OF CANCER GENE PROPERTIES

To annotate the properties of cancer genes, the source data on human genes, orthology, protein–protein and miRNA interactions and gene expression have been updated (Table 2).

Table 2. Data and properties of cancer genes in NCG 5.0.

Data set | Property | All cancer genes (1571) | Known cancer genes (518): dominant (395) | Known cancer genes (518): recessive (112) | Candidate cancer genes (1053) | Other human genes
Human genes | All genes | 1525 | 382 | 112 | 1020 | 17 489
Human genes | Duplicated genes (%) | 280 (18%) | 76 (20%) | 12 (11%) | 187 (18%) | 3520 (20%)
Orthology | All genes | 1501 | 379 | 110 | 1001 | 16 618
Orthology | Pre-metazoan genes (%) | 992 (66%) | 233 (61%) | 80 (72%) | 672 (67%) | 10 516 (63%)
Protein–protein interactions | All nodes | 1332 | 371 | 110 | 840 | 13 262
Protein–protein interactions | Hubs (%) | 558 (42%) | 213 (57%) | 78 (71%) | 257 (31%) | 2970 (22%)
Protein–protein interactions | All nodes in HT network | 1177 | 339 | 108 | 720 | 11 481
Protein–protein interactions | Hubs in HT network (%) | 386 (33%) | 148 (44%) | 52 (48%) | 177 (25%) | 2681 (23%)
Protein complexes | Proteins (%) | 752 (49%) | 238 (62%) | 87 (78%) | 418 (41%) | 4917 (28%)
miRNA interactions | miRNA target genes (%) | 1101 (72%) | 332 (87%) | 99 (88%) | 662 (65%) | 10 643 (61%)
miRNA interactions | miRNAs | 324 | 247 | 163 | 250 | 438
Expression in normal tissues | All genes in GTEx | 1513 | 379 | 111 | 1012 | 16 818
Expression in normal tissues | Ubiquitous genes, GTEx (%) | 965 (64%) | 301 (79%) | 98 (88%) | 555 (55%) | 11 077 (66%)
Expression in normal tissues | Tissue-specific genes, GTEx (%) | 62 (4%) | 5 (1%) | 0 (0%) | 57 (6%) | 726 (4%)
Expression in normal tissues | All genes in Protein Atlas | 1517 | 378 | 112 | 1016 | 16 889
Expression in normal tissues | Ubiquitous genes, Protein Atlas (%) | 831 (55%) | 278 (74%) | 95 (85%) | 447 (44%) | 9492 (56%)
Expression in normal tissues | Tissue-specific genes, Protein Atlas (%) | 90 (6%) | 11 (3%) | 1 (1%) | 78 (8%) | 1042 (6%)
Expression in cancer cell lines | Cancer Cell Line Encyclopedia | 1426 | 367 | 106 | 942 | 15 158
Expression in cancer cell lines | COSMIC Cancer Lines | 1398 | 358 | 105 | 924 | 14 788
Expression in cancer cell lines | Genentech data set | 1524 | 381 | 112 | 1020 | 17 164

Of the 518 known cancer genes derived from CGC, 391 are annotated as dominant (mostly oncogenes), 108 as recessive (mostly tumour suppressors), four as both dominant and recessive, and 15 have no specified mode of action. Duplicated genes have one or more duplicated loci in the genome covering ≥60% of their length (12). Pre-metazoan genes originated in the Last Universal Common Ancestor, Eukaryotes or Opisthokonts. Ubiquitously expressed genes are expressed in ≥95% of tissues (29 tissues in GTEx and 30 tissues in Protein Atlas). HT = high throughput (publications reporting ≥100 interactions).
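Several of the footnote's definitions can be expressed as simple classification rules. The Python sketch below is hypothetical: how the coverage of each extra locus is measured, the exact spelling of the origin labels and the (protein, protein, publication) record layout are our assumptions, not NCG's implementation.

```python
from collections import Counter

# Evolutionary origins counted as pre-metazoan (labels as named in the footnote).
PRE_METAZOAN_ORIGINS = {"LUCA", "Eukaryotes", "Opisthokonts"}

def is_duplicated(gene_length: int, other_locus_coverage: list[int],
                  min_fraction: float = 0.60) -> bool:
    """True if at least one additional genomic locus covers >= 60% of the gene's length.

    other_locus_coverage holds, for each extra locus, how much of the gene it covers,
    in the same (assumed) unit as gene_length.
    """
    return any(cov / gene_length >= min_fraction for cov in other_locus_coverage)

def is_pre_metazoan(origin: str) -> bool:
    """True if the gene's inferred origin predates the Metazoa."""
    return origin in PRE_METAZOAN_ORIGINS

def high_throughput_publications(interactions, min_interactions: int = 100) -> set[str]:
    """Publications reporting >= 100 interactions; interactions are (a, b, publication) tuples."""
    per_pub = Counter(pub for _, _, pub in interactions)
    return {pub for pub, n in per_pub.items() if n >= min_interactions}

print(is_duplicated(1000, [450, 620]), is_duplicated(1000, [599]))   # True False
print(is_pre_metazoan("Opisthokonts"), is_pre_metazoan("Vertebrates"))  # True False
```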

Applying the previously described method (12), protein sequences from RefSeq v.63 (13) were aligned to the human genome assembly hg19 to identify unique gene loci. These included 1525 of the 1571 cancer genes (13 cancer genes had no RefSeq entry, and 33 had no match in hg19 or were gene isoforms). Consistent with previous findings (12), cancer genes are less often duplicated than other human genes, and this signal derives from recessive cancer genes (P-value = 0.02, chi-square test, Table 2).
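The comparisons in this section use chi-square tests on 2×2 contingency tables (gene group × presence or absence of a property). Below is a minimal, dependency-free Python sketch of Pearson's chi-square test with one degree of freedom and an optional Yates continuity correction. The text does not state the exact contingency tables or whether a correction was applied, so this sketch is not guaranteed to reproduce the reported P-values.

```python
import math

def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = False) -> tuple[float, float]:
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]] (1 degree of freedom).

    Rows: two gene groups; columns: genes with / without a property.
    Returns (chi-square statistic, P-value). All row and column totals must be non-zero.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # continuity correction
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Example layout using Table 2 counts: duplicated vs non-duplicated genes among
# recessive cancer genes (12 of 112) and other human genes (3520 of 17 489).
stat, p = chi_square_2x2(12, 112 - 12, 3520, 17489 - 3520)
print(f"chi2 = {stat:.2f}, P = {p:.3g}")
```

In practice a statistics library would normally be used; the closed form is shown here only to make the test explicit.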

Orthology information from eggNOG v.4 (14) was used to trace the evolutionary origin of 1501 cancer genes, as described earlier (15). In line with previous reports (15–17), a higher fraction of cancer genes than of other human genes have orthologs in pre-metazoan species (P-value = 0.03, chi-square test, Table 2).

Four sources of primary interaction data (BioGRID v.3.4.125 (18); MIntAct v.190 (19); DIP (April 2015) (20); HPRD v.9 (21)) were integrated to rebuild the human protein–protein interaction network. This network included the products of 1332 cancer genes, which contain a higher fraction of hubs (defined as the 25% most connected nodes of the network) than other human proteins (P-value = 2.7 × 10⁻⁵⁶, chi-square test, Table 2). We verified that cancer genes also encode a higher fraction of protein hubs in the network derived from high-throughput screenings (P-value = 7.7 × 10⁻¹³, chi-square test, Table 2). This indicates that the enrichment is not an artefact of the higher number of single-gene experiments involving cancer proteins.
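The hub definition used here (the 25% most connected nodes) can be sketched in Python from an undirected edge list. Collapsing duplicate edges, dropping self-interactions, rounding the 25% cut-off and keeping nodes tied at the cut-off degree are our assumptions, since the text does not specify them. Restricting the input to interactions from high-throughput publications (those reporting ≥100 interactions) would give the HT-network variant.

```python
from collections import Counter

def node_degrees(edges) -> Counter:
    """Degree of each node in an undirected network; duplicate edges and self-loops are ignored."""
    unique_edges = {frozenset((a, b)) for a, b in edges if a != b}
    degree = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    return degree

def hubs(edges, top_fraction: float = 0.25) -> set[str]:
    """Nodes in the top `top_fraction` of the degree ranking (ties at the cut-off are kept)."""
    degree = node_degrees(edges)
    if not degree:
        return set()
    ranked = sorted(degree, key=degree.get, reverse=True)
    n_hubs = max(1, round(len(ranked) * top_fraction))
    cutoff = degree[ranked[n_hubs - 1]]
    return {node for node, k in degree.items() if k >= cutoff}

# Hypothetical toy network: one highly connected protein plus a few peripheral ones.
toy_edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B"), ("B", "A")]
print(hubs(toy_edges))  # {'H'}
```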

To complete the annotation of protein–protein interactions, NCG now also collects information on 752 cancer proteins involved in complexes, gathered from three resources (CORUM (February 2012) (22), HPRD v.9 (21), Reactome v.53 (23)). Consistent with the signal from the overall protein–protein interaction network, a higher percentage of cancer proteins than of non-cancer proteins take part in complexes (P-value = 3.0 × 10⁻⁶⁷, chi-square test, Table 2).

Interactions between 324 miRNAs and 1101 cancer genes were derived from miRTarBase v.4.5 (24) and miRecords (April 2013) (25). As in the protein–protein interaction network, a significantly larger fraction of cancer genes than of other human genes are miRNA targets (P-value = 3.0 × 10⁻¹⁸, chi-square test, Table 2).

This release of NCG provides information on the expression of cancer genes in normal tissues and in cancer cell lines. For normal tissues, NCG relies on GTEx v.1.1.8 (26) and Protein Atlas (April 2015) (27), both of which derive gene expression from RNA-seq data, covering a total of 38 tissues. Expression values (FPKM for GTEx and RPKM for Protein Atlas) were used to derive expression categories (low, medium and high) for each gene and to calculate the distribution of gene expression across samples in each tissue. In both data sets, larger fractions of known cancer genes, but not of candidate cancer genes, are ubiquitously expressed (expression in >95% of all tissues) compared with other genes (P-value = 1.3 × 10⁻¹³ and P-value = 1.3 × 10⁻¹⁹ for GTEx and Protein Atlas, respectively, chi-square test, Table 2). Conversely, significantly lower fractions of known cancer genes, but not of candidate cancer genes, are tissue specific (P-value = 4.2 × 10⁻⁴ and P-value = 6.9 × 10⁻⁴ for GTEx and Protein Atlas, respectively, chi-square test, Table 2).
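The ubiquity criterion (expression in at least 95% of tissues, as defined in the Table 2 footnote) can be sketched as follows. This Python example is hypothetical: summarizing each tissue by the median across samples and calling a gene expressed at ≥1 FPKM/RPKM are our assumptions, since the text gives neither the threshold for calling a gene expressed nor the cut-offs for the low, medium and high categories.

```python
from statistics import median

def tissue_medians(samples_by_tissue: dict[str, list[float]]) -> dict[str, float]:
    """Summarize the distribution of a gene's expression across samples in each tissue."""
    return {tissue: median(values) for tissue, values in samples_by_tissue.items()}

def is_ubiquitous(medians: dict[str, float], expressed_at: float = 1.0,
                  min_fraction: float = 0.95) -> bool:
    """True if the gene reaches the (hypothetical) expression threshold in >= 95% of tissues."""
    expressed = sum(1 for value in medians.values() if value >= expressed_at)
    return expressed / len(medians) >= min_fraction

# Hypothetical gene measured in 20 tissues and expressed in 19 of them.
samples = {f"tissue_{i}": [2.0, 3.5, 1.2] for i in range(19)}
samples["tissue_19"] = [0.0, 0.1, 0.3]
print(is_ubiquitous(tissue_medians(samples)))  # True (19/20 = 95%)
```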

Three data sets (the Cancer Cell Line Encyclopedia (28), the COSMIC Cancer Lines Project (29) and the recently released Genentech data set (30)) were used to derive gene expression in a total of 1543 cancer cell lines (Table 2). For each cancer gene, NCG provides the original expression value in each cell line as well as a normalized expression score, calculated as previously reported (31).

DATA ACCESS

The NCG web interface has been reorganized, with a particular focus on the summary of gene information and on the visualization of gene expression profiles. The gene summary now includes additional cross-references to external resources on protein domain architecture (32), drug and compound interactions (33,34) and protein druggability (35). For each cancer gene, the type of mutational screening, the supporting methods and any experimental validation are detailed. Gene expression profiles are now shown as interactive graphs reporting the distribution of expression levels in each normal tissue and as summary tables for cancer cell lines.

The NCG website provides overview statistics of the data contained in the database, including the list of 49 cancer types and the corresponding 24 primary sites, the distribution of known and candidate cancer genes per primary site, and information on 48 possible false positives. These include 14 genes reported in the literature (6), 4 additional genes that are likely to accumulate many alterations because of their length, and 30 olfactory receptor genes. All data in the database can be exported in batch using the advanced search option.

NCG USAGE

NCG offers a multi-level annotation of cancer genes that can be queried to gain insights into the mutation status, properties, function and expression profiles of cancer genes (Figure 4A). This information facilitates the characterization of cancer genes and their associated features. For example, gene duplicability has been exploited to extract duplicated tumour suppressor genes and to verify the occurrence of negative epistasis between them and their paralogs (36). Another useful feature of NCG is its comprehensive overview of gene expression profiles across a vast range of normal tissues and cancer cell lines, which can guide the selection of the most suitable cell systems when planning in vitro experiments (Figure 4B).

Figure 4.

Examples of NCG usage: (A) Example of information available in NCG for a given cancer gene, in this case the oncogene AKT2. NCG summarizes the gene mutation profile across cancer types, information on duplicability, orthology, protein–protein and miRNA interactions and gene expression. (B) NCG can facilitate the selection of the best cell systems for experimental assays by providing the expression profile of the gene of interest in several tissues and cell lines. (C) NCG can be used to annotate altered genes from mutational screenings. (D) The advanced search interface of NCG allows the identification of drivers in a variety of cancer types. (E) NCG can be integrated into gene enrichment analysis pipelines as a source of cancer genes.

NCG is widely used as a repository of cancer genes (17,37–50). Examples include the use of NCG to test the proximity of cancer genes to retrovirus insertion sites (48) and to evaluate the features of cancer classification methods (41). NCG also facilitates the interpretation of cancer mutational screenings by annotating the properties of mutated genes (Figure 4C), both overall and in selected cancer types (Figure 4D). For example, NCG has been used to verify whether genes undergoing copy number variations in familial breast cancer were already known cancer genes (49). Finally, NCG can easily be integrated into more complex analytical pipelines (Figure 4E). In the method developed by Zeller et al., NCG provides a source of true cancer genes to prioritize drivers (50). In the DOSE Bioconductor package, NCG is implemented as a source of cancer genes for enrichment analysis (51).
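As a rough illustration of the uses in Figure 4C and 4E, the Python sketch below labels the genes from a mutational screening as known or candidate cancer genes and applies a one-sided hypergeometric test for over-representation of cancer genes in that list. It is a generic sketch with hypothetical inputs, not the method of Zeller et al. or the DOSE implementation; in practice the gene sets would come from an NCG export, and the universe would be all genes that could have been detected in the screening.

```python
from math import comb
from typing import Optional

def annotate(genes, known: set[str], candidate: set[str]) -> dict[str, Optional[str]]:
    """Label each gene as 'known', 'candidate' or None (not in the given cancer gene lists)."""
    return {g: "known" if g in known else "candidate" if g in candidate else None for g in genes}

def hypergeometric_enrichment(query, cancer_genes, universe) -> tuple[int, float]:
    """Return (overlap k, one-sided P(X >= k)) for cancer genes in a query gene list."""
    universe = set(universe)
    query = set(query) & universe
    cancer = set(cancer_genes) & universe
    big_n, big_k, n = len(universe), len(cancer), len(query)
    k = len(query & cancer)
    # Sum the hypergeometric probabilities of observing k or more cancer genes.
    tail = sum(comb(big_k, i) * comb(big_n - big_k, n - i) for i in range(k, min(big_k, n) + 1))
    return k, tail / comb(big_n, n)

# Hypothetical toy data (not NCG content).
universe = {f"GENE_{i}" for i in range(10)}
known, candidate = {"GENE_0", "GENE_1"}, {"GENE_2"}
mutated = ["GENE_0", "GENE_2", "GENE_5"]
print(annotate(mutated, known, candidate))
print(hypergeometric_enrichment(mutated, known | candidate, universe))
```

Here two of the three mutated genes are cancer genes, and the test returns the probability of seeing at least two by chance when drawing three genes from a universe of ten that contains three cancer genes.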

FUTURE WORK

Mutational screenings of cancer samples are expected to continue producing large amounts of data in the coming years. The launch of personal genome initiatives ((52) and www.genomicsengland.co.uk) and the delivery of pan-cancer projects will substantially enlarge the spectrum of cancer types and samples with available mutational profiles. This will allow the discovery of novel cancer genes, particularly those that recur in few samples and are currently difficult to identify. In parallel, the development of novel approaches for high-throughput functional screenings (e.g. based on CRISPR-Cas technology (53–56)) promises to improve the efficiency of experimental validation assays.

In this exciting scenario, NCG will continue its commitment to manually curating the literature to extract cancer genes and annotate available orthogonal support. NCG will also expand to include other types of cancer driver alterations, such as copy number variations, gene rearrangements and non-coding modifications (57,58). In addition to enlarging the repertoire of cancer drivers, NCG will integrate new properties, e.g. the epigenetic regulation of cancer genes and their germline mutations.

As data become available, NCG will include the clinical relevance of cancer genes, such as their actionability as pharmacological targets (59) and their applicability as biomarkers of cancer progression. All these efforts will contribute towards a more complete characterization of the molecular determinants of cancer.

Acknowledgments

The authors thank Alex Mastrogiannopoulos for his help in the manual curation of the experimental validation of candidate cancer genes and all members of the Ciccarelli lab for providing suggestions to improve NCG.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

European Union's Seventh Framework Programme [(FP7/2007-2013) under grant agreement No. 259743] (MODHEP consortium). The authors acknowledge support from the National Institute for Health Research (NIHR) Biomedical Research Centre based at Guy's and St Thomas’ NHS Foundation Trust and King's College London.

Conflict of interest statement. None declared.

REFERENCES

  • 1.Stratton M.R., Campbell P.J., Futreal P.A. The cancer genome. Nature. 2009;458:719–724. doi: 10.1038/nature07943. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Garraway L.A., Lander E.S. Lessons from the cancer genome. Cell. 2013;153:17–37. doi: 10.1016/j.cell.2013.03.002. [DOI] [PubMed] [Google Scholar]
  • 3.Vogelstein B., Papadopoulos N., Velculescu V.E., Zhou S., Diaz L.A., Jr, Kinzler K.W. Cancer genome landscapes. Science. 2013;339:1546–1558. doi: 10.1126/science.1235122. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Syed A.S., D'Antonio M., Ciccarelli F.D. Network of Cancer Genes: a web resource to analyze duplicability, orthology and network properties of cancer genes. Nucleic Acids Res. 2010;38:D670–D675. doi: 10.1093/nar/gkp957. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Futreal P.A., Coin L., Marshall M., Down T., Hubbard T., Wooster R., Rahman N., Stratton M.R. A census of human cancer genes. Nat. Rev. Cancer. 2004;4:177–183. doi: 10.1038/nrc1299. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Lawrence M.S., Stojanov P., Polak P., Kryukov G.V., Cibulskis K., Sivachenko A., Carter S.L., Stewart C., Mermel C.H., Roberts S.A., et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499:214–218. doi: 10.1038/nature12213. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Kandoth C., McLellan M.D., Vandin F., Ye K., Niu B., Lu C., Xie M., Zhang Q., McMichael J.F., Wyczalkowski M.A., et al. Mutational landscape and significance across 12 major cancer types. Nature. 2013;502:333–339. doi: 10.1038/nature12634. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.An O., Pendino V., D'Antonio M., Ratti E., Gentilini M., Ciccarelli F.D. NCG 4.0: the network of cancer genes in the era of massive mutational screenings of cancer genomes. Database. 2014;2014:bau015. doi: 10.1093/database/bau015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Dees N.D., Zhang Q., Kandoth C., Wendl M.C., Schierding W., Koboldt D.C., Mooney T.B., Callaway M.B., Dooling D., Mardis E.R., et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res. 2012;22:1589–1598. doi: 10.1101/gr.134635.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Shimizu A., Asakawa S., Sasaki T., Yamazaki S., Yamagata H., Kudoh J., Minoshima S., Kondo I., Shimizu N. A novel giant gene CSMD3 encoding a protein with CUB and sushi multiple domains: a candidate gene for benign adult familial myoclonic epilepsy on human chromosome 8q23.3-q24.1. Biochem. Biophys. Res. Commun. 2003;309:143–154. doi: 10.1016/s0006-291x(03)01555-9. [DOI] [PubMed] [Google Scholar]
  • 11.Liu P., Morrison C., Wang L., Xiong D., Vedell P., Cui P., Hua X., Ding F., Lu Y., James M., et al. Identification of somatic mutations in non-small cell lung carcinomas using whole-exome sequencing. Carcinogenesis. 2012;33:1270–1276. doi: 10.1093/carcin/bgs148. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Rambaldi D., Giorgi F.M., Capuani F., Ciliberto A., Ciccarelli F.D. Low duplicability and network fragility of cancer genes. Trends Genet. 2008;24:427–430. doi: 10.1016/j.tig.2008.06.003. [DOI] [PubMed] [Google Scholar]
  • 13.Pruitt K.D., Brown G.R., Hiatt S.M., Thibaud-Nissen F., Astashyn A., Ermolaeva O., Farrell C.M., Hart J., Landrum M.J., McGarvey K.M., et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014;42:D756–D763. doi: 10.1093/nar/gkt1114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Powell S., Forslund K., Szklarczyk D., Trachana K., Roth A., Huerta-Cepas J., Gabaldon T., Rattei T., Creevey C., Kuhn M., et al. eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Res. 2014;42:D231–D239. doi: 10.1093/nar/gkt1253. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.D'Antonio M., Ciccarelli F.D. Modification of gene duplicability during the evolution of protein interaction network. PLoS Comput. Biol. 2011;7:e1002029. doi: 10.1371/journal.pcbi.1002029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Domazet-Loso T., Tautz D. An ancient evolutionary origin of genes associated with human genetic diseases. Mol. Biol. Evol. 2008;25:2699–2707. doi: 10.1093/molbev/msn214. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Domazet-Loso T., Tautz D. Phylostratigraphic tracking of cancer genes suggests a link to the emergence of multicellularity in metazoa. BMC Biol. 2010;8:66. doi: 10.1186/1741-7007-8-66.
  • 18. Chatr-Aryamontri A., Breitkreutz B.J., Oughtred R., Boucher L., Heinicke S., Chen D., Stark C., Breitkreutz A., Kolas N., O'Donnell L., et al. The BioGRID interaction database: 2015 update. Nucleic Acids Res. 2015;43:D470–D478. doi: 10.1093/nar/gku1204.
  • 19. Orchard S., Ammari M., Aranda B., Breuza L., Briganti L., Broackes-Carter F., Campbell N.H., Chavali G., Chen C., del-Toro N., et al. The MIntAct project–IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2014;42:D358–D363. doi: 10.1093/nar/gkt1115.
  • 20. Salwinski L., Miller C.S., Smith A.J., Pettit F.K., Bowie J.U., Eisenberg D. The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004;32:D449–D451. doi: 10.1093/nar/gkh086.
  • 21. Keshava Prasad T.S., Goel R., Kandasamy K., Keerthikumar S., Kumar S., Mathivanan S., Telikicherla D., Raju R., Shafreen B., Venugopal A., et al. Human Protein Reference database–2009 update. Nucleic Acids Res. 2009;37:D767–D772. doi: 10.1093/nar/gkn892.
  • 22. Ruepp A., Waegele B., Lechner M., Brauner B., Dunger-Kaltenbach I., Fobo G., Frishman G., Montrone C., Mewes H.W. CORUM: the comprehensive resource of mammalian protein complexes–2009. Nucleic Acids Res. 2010;38:D497–D501. doi: 10.1093/nar/gkp914.
  • 23. Milacic M., Haw R., Rothfels K., Wu G., Croft D., Hermjakob H., D'Eustachio P., Stein L. Annotating cancer variants and anti-cancer therapeutics in reactome. Cancers. 2012;4:1180–1211. doi: 10.3390/cancers4041180.
  • 24. Hsu S.D., Tseng Y.T., Shrestha S., Lin Y.L., Khaleel A., Chou C.H., Chu C.F., Huang H.Y., Lin C.M., Ho S.Y., et al. miRTarBase update 2014: an information resource for experimentally validated miRNA-target interactions. Nucleic Acids Res. 2014;42:D78–D85. doi: 10.1093/nar/gkt1266.
  • 25. Xiao F., Zuo Z., Cai G., Kang S., Gao X., Li T. miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Res. 2009;37:D105–D110. doi: 10.1093/nar/gkn851.
  • 26. Melé M., Ferreira P.G., Reverter F., DeLuca D.S., Monlong J., Sammeth M., Young T.R., Goldmann J.M., Pervouchine D.D., Sullivan T.J., et al. Human genomics. The human transcriptome across tissues and individuals. Science. 2015;348:660–665. doi: 10.1126/science.aaa0355.
  • 27. Uhlén M., Fagerberg L., Hallstrom B.M., Lindskog C., Oksvold P., Mardinoglu A., Sivertsson A., Kampf C., Sjostedt E., Asplund A., et al. Proteomics. Tissue-based map of the human proteome. Science. 2015;347:1260419. doi: 10.1126/science.1260419.
  • 28. Barretina J., Caponigro G., Stransky N., Venkatesan K., Margolin A.A., Kim S., Wilson C.J., Lehar J., Kryukov G.V., Sonkin D., et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603–607. doi: 10.1038/nature11003.
  • 29. Garnett M.J., Edelman E.J., Heidorn S.J., Greenman C.D., Dastur A., Lau K.W., Greninger P., Thompson I.R., Luo X., Soares J., et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature. 2012;483:570–575. doi: 10.1038/nature11005.
  • 30. Klijn C., Durinck S., Stawiski E.W., Haverty P.M., Jiang Z., Liu H., Degenhardt J., Mayba O., Gnad F., Liu J., et al. A comprehensive transcriptional portrait of human cancer cell lines. Nat. Biotechnol. 2015;33:306–312. doi: 10.1038/nbt.3080.
  • 31. D'Antonio M., Ciccarelli F.D. Integrated analysis of recurrent properties of cancer genes to identify novel drivers. Genome Biol. 2013;14:R52. doi: 10.1186/gb-2013-14-5-r52.
  • 32. Letunic I., Doerks T., Bork P. SMART: recent updates, new developments and status in 2015. Nucleic Acids Res. 2015;43:D257–D260. doi: 10.1093/nar/gku949.
  • 33. Griffith M., Griffith O.L., Coffman A.C., Weible J.V., McMichael J.F., Spies N.C., Koval J., Das I., Callaway M.B., Eldred J.M., et al. DGIdb: mining the druggable genome. Nat. Methods. 2013;10:1209–1210. doi: 10.1038/nmeth.2689.
  • 34. Kuhn M., Szklarczyk D., Pletscher-Frankild S., Blicher T.H., von Mering C., Jensen L.J., Bork P. STITCH 4: integration of protein-chemical interactions with user data. Nucleic Acids Res. 2014;42:D401–D407. doi: 10.1093/nar/gkt1207.
  • 35. Bento A.P., Gaulton A., Hersey A., Bellis L.J., Chambers J., Davies M., Kruger F.A., Light Y., Mak L., McGlinchey S., et al. The ChEMBL bioactivity database: an update. Nucleic Acids Res. 2014;42:D1083–D1090. doi: 10.1093/nar/gkt1031.
  • 36. D'Antonio M., Guerra R.F., Cereda M., Marchesi S., Montani F., Nicassio F., Di Fiore P.P., Ciccarelli F.D. Recessive cancer genes engage in negative genetic interactions with their functional paralogs. Cell Rep. 2013;5:1519–1526. doi: 10.1016/j.celrep.2013.11.033.
  • 37. Cheng F., Jia P., Wang Q., Lin C.C., Li W.H., Zhao Z. Studying tumorigenesis through network evolution and somatic mutational perturbations in the cancer interactome. Mol. Biol. Evol. 2014;31:2156–2169. doi: 10.1093/molbev/msu167.
  • 38. Haemmerle R., Phaltane R., Rothe M., Schroder S., Schambach A., Moritz T., Modlich U. Clonal dominance with retroviral vector insertions near the ANGPT1 and ANGPT2 genes in a human xenotransplant mouse model. Mol. Ther. Nucleic Acids. 2014;3:e200. doi: 10.1038/mtna.2014.51.
  • 39. Liu W., Xie H. Predicting potential cancer genes by integrating network properties, sequence features and functional annotations. Sci. China Life Sci. 2013;56:751–757. doi: 10.1007/s11427-013-4500-6.
  • 40. Liu Y., Tian F., Hu Z., DeLisi C. Evaluation and integration of cancer gene classifiers: identification and ranking of plausible drivers. Sci. Rep. 2015;5:10204. doi: 10.1038/srep10204.
  • 41. List M., Hauschild A.C., Tan Q., Kruse T.A., Mollenhauer J., Baumbach J., Batra R. Classification of breast cancer subtypes by combining gene expression and DNA methylation data. J. Integr. Bioinform. 2014;11:236. doi: 10.2390/biecoll-jib-2014-236.
  • 42. Nayak L., Tunga H., De R.K. Disease co-morbidity and the human Wnt signaling pathway: a network-wise study. OMICS. 2013;17:318–337. doi: 10.1089/omi.2012.0053.
  • 43. Phaltane R., Haemmerle R., Rothe M., Modlich U., Moritz T. Efficiency and safety of O(6)-methylguanine DNA methyltransferase (MGMT(P140K))-mediated in vivo selection in a humanized mouse model. Hum. Gene Ther. 2014;25:144–155. doi: 10.1089/hum.2013.171.
  • 44. Saadatian Z., Masotti A., Nariman Saleh Fam Z., Alipoor B., Bastami M., Ghaedi H. Single-nucleotide polymorphisms within microRNAs sequences and their 3' UTR target sites may regulate gene expression in gastrointestinal tract cancers. Iran Red Crescent Med. J. 2014;16:e16659. doi: 10.5812/ircmj.16659.
  • 45. Srivastava A., Kumar S., Ramaswamy R. Two-layer modular analysis of gene and protein networks in breast cancer. BMC Syst. Biol. 2014;8:81. doi: 10.1186/1752-0509-8-81.
  • 46. Shangguan H., Tan S.Y., Zhang J.R. Bioinformatics analysis of gene expression profiles in hepatocellular carcinoma. Eur. Rev. Med. Pharmacol. Sci. 2015;19:2054–2061.
  • 47. Yu H., Mitra R., Yang J., Li Y., Zhao Z. Algorithms for network-based identification of differential regulators from transcriptome data: a systematic evaluation. Sci. China Life Sci. 2014;57:1090–1102. doi: 10.1007/s11427-014-4762-7.
  • 48. Olszko M.E., Adair J.E., Linde I., Rae D.T., Trobridge P., Hocum J.D., Rawlings D.J., Kiem H.P., Trobridge G.D. Foamy viral vector integration sites in SCID-repopulating cells after MGMTP140K-mediated in vivo selection. Gene Ther. 2015;22:591–595. doi: 10.1038/gt.2015.20.
  • 49. Masson A.L., Talseth-Palmer B.A., Evans T.J., Grice D.M., Hannan G.N., Scott R.J. Expanding the genetic basis of copy number variation in familial breast cancer. Hered. Cancer Clin. Pract. 2014;12:15. doi: 10.1186/1897-4287-12-15.
  • 50. Zeller M., Magnan C.N., Patel V.R., Rigor P., Sender L., Baldi P. A genomic analysis pipeline and its application to pediatric cancers. IEEE/ACM Trans. Comput. Biol. Bioinformatics. 2014;11:826–839. doi: 10.1109/TCBB.2014.2330616.
  • 51. Yu G., Wang L.G., Yan G.R., He Q.Y. DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics. 2015;31:608–609. doi: 10.1093/bioinformatics/btu684.
  • 52. Collins F.S., Varmus H. A new initiative on precision medicine. N. Engl. J. Med. 2015;372:793–795. doi: 10.1056/NEJMp1500523.
  • 53. Shalem O., Sanjana N.E., Hartenian E., Shi X., Scott D.A., Mikkelsen T.S., Heckl D., Ebert B.L., Root D.E., Doench J.G., et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science. 2014;343:84–87. doi: 10.1126/science.1247005.
  • 54. Gilbert L.A., Horlbeck M.A., Adamson B., Villalta J.E., Chen Y., Whitehead E.H., Guimaraes C., Panning B., Ploegh H.L., Bassik M.C., et al. Genome-scale CRISPR-mediated control of gene repression and activation. Cell. 2014;159:647–661. doi: 10.1016/j.cell.2014.09.029.
  • 55. Wang T., Wei J.J., Sabatini D.M., Lander E.S. Genetic screens in human cells using the CRISPR-Cas9 system. Science. 2014;343:80–84. doi: 10.1126/science.1246981.
  • 56. Ran F.A., Hsu P.D., Wright J., Agarwala V., Scott D.A., Zhang F. Genome engineering using the CRISPR-Cas9 system. Nat. Protoc. 2013;8:2281–2308. doi: 10.1038/nprot.2013.143.
  • 57. Borah S., Xi L., Zaug A.J., Powell N.M., Dancik G.M., Cohen S.B., Costello J.C., Theodorescu D., Cech T.R. Cancer. TERT promoter mutations and telomerase reactivation in urothelial cancer. Science. 2015;347:1006–1010. doi: 10.1126/science.1260200.
  • 58. Vinagre J., Almeida A., Populo H., Batista R., Lyra J., Pinto V., Coelho R., Celestino R., Prazeres H., Lima L., et al. Frequency of TERT promoter mutations in human cancers. Nat. Commun. 2013;4:2185. doi: 10.1038/ncomms3185.
  • 59. McGranahan N., Swanton C. Biological and therapeutic impact of intratumor heterogeneity in cancer evolution. Cancer Cell. 2015;27:15–26. doi: 10.1016/j.ccell.2014.12.001.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
