American Journal of Cancer Research. 2014 Jul 16;4(4):394–410.

Genome-wide prediction of cancer driver genes based on SNP and cancer SNV data

Quanze He 1,*, Quanyuan He 2,*, Xiaohui Liu 3,4, Youheng Wei 1, Suqin Shen 1, Xiaohui Hu 1, Qiao Li 5, Xiangwen Peng 5, Lin Wang 6, Long Yu 1
PMCID: PMC4106657  PMID: 25057442

Abstract

Identifying cancer driver genes and exploring their functions is an essential and urgent need in basic cancer research. Developing efficient methods to differentiate driver from passenger somatic mutations in large-scale cancer genome sequencing data is critical to cancer driver gene discovery. Here, we compared the features of SNP and SNV data in detail and found that the weighted ratio of SNV to SNP (termed WVPR) is an excellent indicator of cancer driver genes. The power of WVPR was validated by accurate prediction of known drivers. We ranked most human genes by WVPR and performed functional analyses on the list. The results demonstrate that driver genes are highly enriched in genes and pathways related to chromatin organization, and that some protein complexes, such as the histone acetyltransferase, histone methyltransferase, telomerase, centrosome, Sin3 and U12-type spliceosomal complexes, are hot spots of driver mutations. Furthermore, this study identified many new potential driver genes (e.g. NTRK3 and ZIC4) and pathways, including the oxidative phosphorylation pathway, that were not identified by previous methods. Taken together, our study not only developed a method to identify cancer driver genes and pathways but also provided new insights into the molecular mechanisms of cancer development.

Keywords: Bioinformatics, SNV, SNP, mutation frequency, cancer driver gene

Introduction

Cancer is characterized by somatic mutations that accumulate during tumorigenesis, of which only a small subset contributes to tumor progression [1]. Distinguishing these “driver” mutations from the preponderance of “passenger” mutations is still a challenge because of the genetic heterogeneity of cancer [2]. In recent years, several methods have been developed for predicting driver genes by taking advantage of the wealth of data produced by high-throughput cancer genome sequencing studies [3-5]. To date, more than 125 driver genes have been identified [1]. These genes carry relatively high-frequency mutations, which are usually shared by different tumors. However, a growing number of studies suggest that many low-frequency mutations are shared by various cancers and remain to be discovered [6-9].

To identify driver genes, most current methods test whether the mutation rate of each gene is significantly higher than the background (passenger) mutation rate using a binomial or likelihood test. They share a common approach to defining the background non-silent mutation rate ρN as the product ρS × R, where ρS is the number of observed silent mutations divided by the number of base pairs and R is the average ratio of the number of potential non-silent mutation sites to the number of potential silent mutation sites. However, this model is not as simple as it looks. The most challenging part of the strategy is defining ρS and R for genes in distinct contexts. Many elaborate models address the problem with additional parameters such as mutation type, gene length and nucleotide context [3,5]. Although they work well for genes with high-frequency mutations, they have three unavoidable shortcomings for identifying low-frequency ones. First, they ignore the fact that even mutations of the same type may have different impacts on protein function when they occur at different sites (such as an active site versus a non-essential site). Second, in most cases the number of observed silent mutations is too small to estimate the silent mutation rate (ρS) accurately for each gene. For example, only 108 silent mutations were identified in the data of Ding et al., who sequenced 623 genes in 188 tumor samples; on average, each sample carries fewer than one silent mutation. Third, many non-silent mutations are effectively “silent” and have no significant impact on protein function, which leads to overestimation of R and loss of sensitivity. As it is hard for current methods to overcome these shortcomings, new strategies for identifying low-frequency driver mutations are urgently needed.
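The background model described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited methods: the function names, the number of sequenced bases, the value of R and the gene-level counts are hypothetical, and only the 108 silent mutations in 188 samples come from the text.

```python
from math import exp, lgamma, log, log1p


def background_nonsilent_rate(n_silent, n_bases, r_ratio):
    """rho_N = rho_S * R, where rho_S is the number of silent mutations per base."""
    rho_s = n_silent / n_bases
    return rho_s * r_ratio


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1.

    Computed as 1 minus the lower tail, with each term evaluated in log space.
    """
    lower = 0.0
    for i in range(k):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log1p(-p))
        lower += exp(log_term)
    return max(0.0, 1.0 - lower)


# 108 silent mutations in 188 samples (figures quoted in the text).
silent_per_sample = 108 / 188

# Hypothetical inputs: 1,000,000 sequenced bases and R = 3.0.
rho_n = background_nonsilent_rate(n_silent=108, n_bases=1_000_000, r_ratio=3.0)

# Hypothetical gene: 5 non-silent mutations observed at 1500 possible sites.
p_value = binomial_upper_tail(k=5, n=1500, p=rho_n)
```

With so few silent mutations, ρS (and therefore ρN) is estimated from very sparse counts, which is the second shortcoming listed above. Subtracting the lower tail from 1 is adequate for a sketch but loses precision for extremely small p-values.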

A single-nucleotide polymorphism (SNP) is a heritable single-nucleotide variation between members of a species or between paired chromosomes. In evolutionary terms, the SNP profile of a genome is the fixed outcome of natural selection: unfavorable mutations are typically eliminated and favorable changes are quickly fixed in a population, so only neutral (or nearly neutral) mutations, which have little effect on an organism’s fitness, accumulate across the genome [10,11]. The SNPs used here are common variants in the general population (minor allele frequency (MAF) > 1%) [10] and were derived from many whole-genome sequencing experiments. Therefore, the density of SNPs can be used to estimate the frequency of neutral mutations for each gene [12,13]. Single-nucleotide variants (SNVs) are somatic point mutations found in cancer tissues. The majority of them are non-silent mutations located in exons that alter protein structure or function. SNVs are enriched in cancer driver genes and in cellular pathways essential for tumorigenesis. Here, we propose a new method that compares cancer SNV data with common SNP data to identify novel cancer driver genes. We validated the method by accurate prediction of known drivers. Functional analyses of the driver genes uncovered new protein complexes and pathways that are enriched with driver mutations, providing new insights into the molecular mechanisms of cancer development.

Materials and methods

Data collection

In this research, we collected four types of data: ChIP-Seq data, gene expression data (RNA-Seq) and mutation data (common SNPs [14] and cancer SNVs [15]). They were downloaded from GEO [16], UCSC, SRA (NCBI Sequence Read Archive, http://www.ncbi.nlm.nih.gov/Traces/sra), the NIH website (http://dir.nhlbi.nih.gov) and the COSMIC [15] database. Detailed information about the data sources can be found in Table S4. The human reference genome [17] (NCBI37/hg19) was downloaded from the UCSC FTP site.

Data pre-processing

First, all ChIP-Seq and RNA-Seq data from SRA were converted to fastq files using the SRA Toolkit. Genomic read alignment and assembly were done with Bowtie [18] using default parameters and the human reference genome NCBI37/hg19. Second, gene locations were converted from NCBI36/hg18 to NCBI37/hg19 with liftOver [19]. Finally, all aligned ChIP-Seq and RNA-Seq data files (bed and bam files) were converted into wig files using MACS [20] with default settings. Cancer SNVs that co-localize with common SNPs were excluded from subsequent analyses.

Calculate weighted SNV/SNP ratios (WVPRs) for genes

First, we used the longest isoform of each gene to define six gene-related regions: exon, intron, promoter, tail, acceptor and donor. For each gene, the promoter region spans 1000 bp upstream to 200 bp downstream of the transcription start site (TSS); donors span about 36 bp around the 5’ splice site [21]; acceptors span 36 bp upstream to 24 bp downstream of the 3’ splice site [22]; and the tail spans 200 bp upstream to 1000 bp downstream of the transcription termination site (TTS). The locations of exons and introns were extracted from the human reference genome [17] (NCBI37/hg19). All SNVs and SNPs were mapped to these regions. For each region, the weight (W) is the percentage of all SNPs (or, respectively, all SNVs) across the genome that fall within that region. For each gene, we calculated the mutation density (M) of each region, which is the number of mutations within the region divided by the length of that region in the gene. Finally, a linear model was used to estimate the mutation risk in normal and cancer cells (MRN and MRC, respectively). The weighted SNV/SNP ratio (WVPR) is the ratio of MRC to MRN and was used to estimate the mutation risk of each gene. The formulas are as follows:

MRN = Wpa × Mpa + Wpd × Mpd + Wpe × Mpe + Wpi × Mpi + Wpp × Mpp + Wpt × Mpt

MRC = Wva × Mva + Wvd × Mvd + Wve × Mve + Wvi × Mvi + Wvp × Mvp + Wvt × Mvt

WVPR = MRC/MRN

Here, “W” and “M” represent the weight parameters and mutation densities, respectively. In each subscript, the first letter indicates the mutation type (“p” for common SNP, “v” for cancer SNV) and the second letter indicates one of the six gene regions: a, d, e, i, p and t stand for acceptor, donor, exon, intron, promoter and tail, respectively.
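The three formulas can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names and the toy counts are hypothetical, each weight is read as the genome-wide share of SNPs (or SNVs) falling in a region, and the handling of a zero MRN is a choice made here, since the text does not say how that case is treated.

```python
REGIONS = ("acceptor", "donor", "exon", "intron", "promoter", "tail")


def region_weights(genome_counts):
    """Weight of each region: its share of all mutations of one type genome-wide."""
    total = sum(genome_counts[r] for r in REGIONS)
    return {r: genome_counts[r] / total for r in REGIONS}


def mutation_risk(weights, gene_counts, gene_lengths):
    """Weighted sum over regions of (mutations in region / region length in bp)."""
    risk = 0.0
    for r in REGIONS:
        length = gene_lengths[r]
        density = gene_counts[r] / length if length > 0 else 0.0
        risk += weights[r] * density
    return risk


def wvpr(snv_weights, snp_weights, gene_snvs, gene_snps, gene_lengths):
    """WVPR = MRC / MRN for one gene."""
    mrc = mutation_risk(snv_weights, gene_snvs, gene_lengths)
    mrn = mutation_risk(snp_weights, gene_snps, gene_lengths)
    if mrn == 0.0:
        # Assumption of this sketch: a gene without SNPs gets an infinite ratio
        # if it carries SNVs, and 0.0 if it carries neither.
        return float("inf") if mrc > 0.0 else 0.0
    return mrc / mrn


# Toy data (hypothetical): equal genome-wide counts and 100 bp per region.
genome_snps = {r: 100 for r in REGIONS}
genome_snvs = {r: 100 for r in REGIONS}
lengths = {r: 100 for r in REGIONS}
gene_snps = {r: 1 for r in REGIONS}
gene_snvs = {r: 2 for r in REGIONS}

score = wvpr(region_weights(genome_snvs), region_weights(genome_snps),
             gene_snvs, gene_snps, lengths)
```

In this toy case every region has twice as many SNVs as SNPs, so the ratio comes out at about 2 regardless of the weights; with real data the weights let exonic and splice-site mutations count differently from intronic ones.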

Fisher test for KEGG pathways

Fisher’s exact test was performed to identify KEGG pathways enriched in cancer driver genes. A two-way contingency table was created based on the numbers of common SNPs and cancer SNVs in/out of a certain pathway to calculate the p values using R.
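The original analysis was run in R. As an illustration only, the same kind of 2×2 test can be sketched in pure Python; the function names are hypothetical, and the two-sided definition used here (summing the probabilities of all tables no more likely than the observed one) is an assumption, since the text does not state which alternative was tested.

```python
from math import exp, lgamma


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Layout assumed here: a = SNVs in the pathway, b = SNVs outside it,
    c = SNPs in the pathway, d = SNPs outside it.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    log_denom = _log_comb(row1 + row2, col1)

    def pmf(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return exp(_log_comb(row1, x) + _log_comb(row2, col1 - x) - log_denom)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = pmf(a)
    # Small relative tolerance so that ties with the observed table are counted.
    total = sum(p for p in (pmf(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Toy table (hypothetical counts, not data from the study).
p_toy = fisher_exact_two_sided(3, 1, 1, 3)
```

For genome-scale counts the loop over all possible tables becomes long, so a library routine would be the practical choice; the sketch only makes the contingency-table logic explicit.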

Results

Distinct characteristics of SNPs and SNVs

In this research, we collected 13,608,948 common SNPs and 2,342,135 unique cancer SNVs, of which 5,140,763 and 1,735,291 mutation sites, respectively, were found between 1000 bp upstream and 1000 bp downstream of 17,498 genes. All of these genes are expressed in at least three of six cell types (H1-ESC, CD4, K562, Testis, Ovary and Hepg2) (see ‘Materials and methods’ section). All SNPs and SNVs were classified into six groups based on the elements in which they are located: promoter (1000 bp upstream to 200 bp downstream of the TSS), exon, intron, donor (about 36 bp around the 5’ splice site) [21], acceptor (36 bp upstream to 24 bp downstream of the 3’ splice site) [22] and tail (200 bp upstream to 1000 bp downstream of the TTS) (Figure 1A). As shown in Figure 1B, although SNPs occur slightly more frequently in non-coding regions than in coding regions, the number of SNPs is in general highly correlated with the length of the elements (Figure 1C), suggesting that they behave as neutral mutations. SNVs are highly enriched in exons and at the two splicing sites (donor and acceptor), which is consistent with previous reports [23-25]. Notably, the majority of both SNPs (65,697/84,938) and SNVs (690,827/886,381) in exons are non-silent mutations, and the percentages are almost identical (77.34% and 77.93%), suggesting that many non-silent mutations are actually neutral. More importantly, no significant correlation was found between the densities of SNVs and SNPs (R = 0.115) (Figure 2A), and gene size is not correlated with the SNV/SNP ratio (Figure S1). Because it is controversial whether highly expressed genes in cancer cells have elevated mutation rates [26,27], we examined the correlation between mutation density and gene expression. We calculated the correlations between SNP density and gene expression in four normal cell lines/tissues (H1, CD4, Ovary, Testis) and between SNV density and gene expression in two cancer cell lines (K562, Hepg2). No significant correlations were found in any test, supporting the null hypothesis that gene expression does not play an essential role in shaping the distribution of SNVs and SNPs (Figure 2B).

Figure 1.


Predicting cancer driver genes using SNP and SNV data. A. A cartoon illustrating the definition of the six regions of a gene. B. The distribution of the lengths of the six regions and the proportions of common SNPs and cancer SNVs within them. C. The correlation between the relative lengths of the six regions and their numbers of SNVs and SNPs. D. The percentages of SNPs and SNVs in the six regions. E. The distribution of SNPs and SNVs in GATA1. F. The distribution of cancer-related genes annotated by databases in our ranked gene list. All genes were sorted and categorized into ten groups based on their WVPR values. The 0%-10% group contains the genes with the top 10% highest WVPRs. In total, 455 genes in the COSMIC database [15], 180 genes in the OMIM database and 168 genes in the KEGG database (ver 2011-7-13) were annotated as cancer-related genes. The driver gene list includes 125 genes and is adopted from reference 1.

Figure 2.


The correlation of SNPs and SNVs with gene expression and epigenetic marks. A. The correlation between SNV/SNP density and gene expression in six cells. B. The correlation between SNV density and SNP density in 17,498 genes. C. The profiles of seven epigenetic markers around SNV and SNP sites in H1-ESC and Hepg2 cells.

Distinct chromatin structure at SNV and SNP mutation sites

One possible mechanism affecting the generation of DNA sequence variations is the alteration of chromatin structure [28]. However, which epigenetic states correlate with the occurrence of SNPs and SNVs is still unknown. To address this question, we calculated the accumulated profiles of active transcriptional epigenetic markers (such as H3K4me1, H3K4me2, H3K4me3, H3K9ac and H3K27ac) and repressive transcriptional markers (such as H3K9me3 and H3K27me3) at SNP and SNV sites in H1 (a normal human ES cell line) and Hepg2 (a liver cancer cell line) cells. We found that the profiles of the same marker in the two cell lines are usually similar. However, the profiles of H3K4me2, H3K9ac and H3K9me3 differ significantly between SNP and SNV sites in both cell lines. SNP sites have lower levels of the two transcription-activating markers (H3K4me2, H3K9ac) than the surrounding regions and lie at the bottom of valleys, whereas SNV sites usually lie at the boundary between regions with high and low levels of these two markers. Intriguingly, the H3K9me3 profiles around SNP and SNV sites are entirely different: SNP sites usually have low H3K9me3 levels, whereas SNV sites are rich in this modification, which is consistent with a recent study [29]. A similar observation was made for the H3K27me3 profile in J1 cells. Taken together, SNPs are enriched in regions free of epigenetic markers, and SNVs are usually found at chromatin structure transition regions that carry repressive epigenetic markers (Figure 2C).

Ranking genes by the weighted SNV/SNP ratio (WVPR)

It is reasonable to speculate that driver genes usually have a higher SNV density and a lower SNP density, because individuals carrying mutations in these genes are more likely to be eliminated by cancer. This hypothesis is supported by analyses of known driver genes such as GATA1 (Figure 1E). Thus, the simplest way to identify driver genes is to rank genes by the ratio of SNV density to SNP density. However, we found that although this works well for most driver genes, it is not sensitive to some driver genes that have a high SNP density, such as TP53, PTEN and NF2. To improve performance, we used a linear model to estimate weighted SNV and SNP values by multiplying the relative frequency of each region group by its mutation density in each gene. Details and the formulas can be found in the ‘Materials and methods’ section.

All genes were then sorted by the ratio of weighted SNV to weighted SNP (WVPR) and classified into ten groups for further analysis.
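The sorting and grouping step can be sketched as follows. The function names and the toy scores are hypothetical, and splitting the ranked list into ten groups of (near-)equal size is an assumption about how the groups were formed.

```python
def rank_into_groups(wvpr_by_gene, n_groups=10):
    """Sort genes by descending WVPR and split the ranking into n_groups bins.

    groups[0] holds the genes with the highest WVPRs (the 0%-10% group
    when n_groups is 10).
    """
    ranked = sorted(wvpr_by_gene, key=wvpr_by_gene.get, reverse=True)
    n = len(ranked)
    groups = [[] for _ in range(n_groups)]
    for rank, gene in enumerate(ranked):
        groups[rank * n_groups // n].append(gene)
    return groups


def fraction_in_top_group(groups, known_genes):
    """Share of a reference gene list that falls into the top-ranked group."""
    known = set(known_genes)
    if not known:
        return 0.0
    return len(known & set(groups[0])) / len(known)


# Toy scores (hypothetical): gene g19 has the highest WVPR, g0 the lowest.
toy_scores = {f"g{i}": float(i) for i in range(20)}
toy_groups = rank_into_groups(toy_scores)
toy_fraction = fraction_in_top_group(toy_groups, ["g19", "g18", "g3", "g2"])
```

The second helper corresponds to the kind of check used for validation: given a list of known drivers, it reports what share of them lands in the top group.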

Method validation

To validate the method, we divided the sorted gene list into ten groups and tested whether known driver genes are enriched in the groups with high WVPR. We extracted potential driver genes based on annotations in the COSMIC, OMIM and KEGG databases, and also included the driver gene list presented by Vogelstein et al. [1]. As shown in Figure 1F, each dataset shows a clear trend of enrichment in the highly ranked groups. In particular, 70% of known driver genes rank in the top 10% group, and most of the well-known oncogenes and tumor suppressors (for example TP53, PTEN, VHL, NF2 and GATA1) have WVPR scores in the top 10. Some high-risk genes reported by recent GWAS studies, such as STAT4 and TNFAIP3, are also included in the top 10%. They were first linked to human disease in GWAS of hepatitis B virus-related hepatocellular carcinoma [30] and systemic lupus erythematosus [31], respectively. We categorized the top 10% of genes into 18 gene families, some of which were not reported by previous studies, such as the ANKRD family (involved in the cell cycle, immune response, cell structure and cell signaling), the ZNF family (key players in gene transcription, especially the C2H2 zinc finger proteins), the histone family from H1 to H4 (which construct nucleosomes) and the PCDHC family (involved in cell adhesion) (Table S2). The WVPR distribution in the top 10% is shown in Figure S2. These results support the high accuracy of our prediction method and indicate that more driver genes remain to be discovered.

GO analysis

To understand the functional preferences of driver genes, we identified the enriched GO terms for all gene groups using the Kolmogorov-Smirnov test. Using the value of -log p, where p is the p-value of the test, we constructed three matrices for the three GO namespaces and performed hierarchical clustering to classify the enriched GO terms. As shown in Figure 3A, chromosome organization and its related biological processes (such as histone modification) are exclusively enriched in the top groups, suggesting a significant role in cancer development, which is consistent with previous reports [32-34]. Other processes, including transcription regulation, cell cycle regulation and apoptosis, are also highly enriched in the high-ranked groups. Interestingly, genes involved in translation and in transport to organelles have fewer SNVs than others in cancer cells and are highly enriched in the group with the lowest WVPR, which suggests that, although less likely to be driver genes, these genes are important for the viability of cancer cells (Figure 3A).

Figure 3.


Discovering the functional preferences of driver genes by GO analysis. All genes were categorized into ten groups according to their WVPR values. The hypergeometric test was used to test the enrichment of genes with certain GO terms in these groups. The number in each grid cell is -log10(P), where P is the p-value of the hypergeometric test. A. The enriched GO terms in the cellular component namespace. B. The enriched GO terms in the molecular function namespace. C. The enriched GO terms in the biological process namespace.

Intriguingly, in the cellular component matrix, we found that chromatin remodeling complexes, especially the histone acetyltransferase complex and the histone methyltransferase complex, are the hottest spots of cancer driver mutations. For example, MEN1, OGT, RUVBL1 and TAF1L have high WVPRs, and MEN1 ranks 27th among the top 10% of genes by WVPR. These results suggest that alterations of histone modification are among the most fundamental driving mechanisms of tumorigenesis. Additionally, genes forming the centromere and the telomerase complex are also highly enriched in the top-ranked groups. This makes sense because centromere and telomerase mutations have long been linked to cancer development [35,36]. Furthermore, some complexes that were not identified by previous studies, such as the U12-type spliceosomal complex and the Sin3 complex, were found here for the first time to be hotspots of driver mutations and remain to be studied further. Finally, no significant enrichment of driver genes was found in other cellular components such as the Golgi apparatus, lysosome, ribosome, cytoskeleton and nuclear inner membrane (Figure 3B).

In the molecular function namespace, chromatin-binding genes and transcription cofactors are highly enriched in the top-ranked group (Figure 3C). The p-values of the GO enrichment analysis for the ten groups are shown in Tables S5, S6 and S7. This enrichment is consistent with previous observations [37,38] and reinforces the conclusion that alteration of cis and trans transcription regulation is the major driving force of cancer development.
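Each cell of the enrichment matrices holds a -log10 p-value for one GO term in one WVPR group. A minimal sketch of such a cell, using the hypergeometric test named in the Figure 3 legend, is given below; the function names and the toy counts are hypothetical, and this is not the authors' code.

```python
from math import exp, lgamma, log10


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_upper_tail(k, total_genes, annotated_genes, group_size):
    """P(X >= k): chance of seeing at least k annotated genes in a group drawn
    at random, without replacement, from all ranked genes."""
    lo = max(k, 0, group_size - (total_genes - annotated_genes))
    hi = min(annotated_genes, group_size)
    log_denom = _log_comb(total_genes, group_size)
    p = sum(exp(_log_comb(annotated_genes, i)
                + _log_comb(total_genes - annotated_genes, group_size - i)
                - log_denom)
            for i in range(lo, hi + 1))
    return min(1.0, p)


def neg_log10_p(p):
    """Matrix entry: -log10 of the p-value, floored to avoid log10(0)."""
    return -log10(max(p, 1e-300))


# Toy example (hypothetical): 10 genes, 5 carry the GO term, and all 5 genes
# of one group carry it.
p_toy = hypergeom_upper_tail(5, total_genes=10, annotated_genes=5, group_size=5)
cell = neg_log10_p(p_toy)
```

Computing one such value per GO term and per group gives a terms-by-groups matrix that can then be clustered hierarchically.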

Discovering cancer driver genes enriched pathways and networks

To discover the pathways involved in cancer development, we searched the KEGG pathway database with the top 10% of genes by WVPR. Twenty-five pathways are significantly enriched in these genes (149 genes in total) by Fisher’s exact test (P < 0.01). These pathways can be categorized into three groups: 14 cancer-related pathways, 7 cell survival pathways and 4 novel pathways (Figure 4A). In the cancer-related group, 62 high-WVPR genes were found in the common cancer pathway (hsa05200), which has the most significant p-value (6.97e-06), suggesting good accuracy of our method. The cell survival group includes the p53 signaling pathway, cell cycle, apoptosis, Wnt, MAPK, phosphatidylinositol signaling and ubiquitin-mediated proteolysis pathways. All of them have long been thought to be related to cancer development [5,39-44]. Our data suggest which components are the driver parts of these cancer pathways. The novel group, which had not been linked to cancer before, comprises four potential signaling pathways involving 50 unique genes: inositol phosphate metabolism, the neurotrophin signaling pathway, amyotrophic lateral sclerosis and Huntington’s disease. Interestingly, three of them are related to neuronal diseases. Whether some types of cancer share molecular mechanisms with these diseases remains to be explored.

Figure 4.


Pathways and networks enriched with cancer driver genes. A. Table of KEGG pathways enriched with genes in the top 10% by WVPR, classified into three groups: known cancer pathways, cell survival pathways and novel pathways. B. The core network of cancer driver genes. All genes are represented as circles. They are linked by lines whose colors represent the interactions and regulation among them. C. A novel cancer pathway whose members are located on the mitochondrial inner membrane and are involved in oxidative phosphorylation (Pathway 2).

The ranked list of cancer driver genes also presents a good opportunity to construct a core network of cancer development. We used high-confidence PPI (protein-protein interaction) data (confidence score > 0.7) extracted from the STRING database [45] to assemble the networks de novo. In total, 41 genes forming two networks were discovered, which we named Network 1 and Network 2 (Figure 4B, 4C). Network 1 contains 32 genes, more than half of which (HDAC1, TP53, AKT3, CREB3L4, CASP8, PTEN, MAPK8, PIK3CA, BCL2, RHOA, CREB3L2, CREBBP, KRAS, PIK3R1, NTRK1, PIK3CG, BRAF) have been reported in known cancer pathways (yellow nodes) and form the core of the network. For example, HDAC1 is a deacetylase responsible for deacetylating lysine residues on the N-terminal part of histones (H2-H4); it is not only involved in chronic myeloid leukemia but also acts in the cell cycle [46,47]. GSK3B is an oncogene in basal cell carcinoma, endometrial cancer, prostate cancer and colorectal cancer, and is involved in the Wnt signaling pathway and in two novel pathways in our results (neurotrophin signaling pathway, insulin signaling pathway) [48-54]. Other genes are usually involved in cell survival pathways (such as the cell cycle, Wnt, MAPK and phosphatidylinositol signaling pathways) and may serve as interfaces linking the core to other pathways. For example, DAXX is a transcription repressor and histone H3.3-specific chaperone involved in the MAPK signaling pathway. Recent reports suggested that mutations of DAXX result in telomere dysfunction and pancreatic neuroendocrine tumors [55].

Network 2 consists of 10 genes, all of which are involved in oxidative phosphorylation. Notably, four of the five complexes in the network contain high-mutation-risk genes: four genes (NDUFA1, NDUFB6, NDUFB5, NDUFB7) belong to mitochondrial respiratory chain complex I; two genes (UQCRC1 and UQCRC2) belong to complex III; two genes (COX7B and COX5B) belong to complex IV; and ATP5B and ATP5E are subunits of the F-type ATPase in complex V. As most cancer cells exhibit increased glycolysis as the main source of their ATP supply [56], this surprising result suggests that accumulated mutations and defects in the oxidative phosphorylation pathway may be an initial step in cancer development. It also partially answers the question of why cancer cells prefer glycolysis to oxidative phosphorylation even when oxygen is available.
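The network assembly step can be pictured as filtering interaction edges by score and collecting the connected components among the top-ranked genes. The sketch below assumes that reading; the function name and the toy edge scores are hypothetical and are not STRING data, and only the 0.7 cutoff and the gene symbols come from the text.

```python
from collections import defaultdict, deque


def connected_networks(edges, genes, min_score=0.7):
    """Group genes into networks linked by interactions scoring above min_score.

    edges: iterable of (gene_a, gene_b, score) tuples.
    genes: the candidate gene set (for example, the top-ranked genes).
    Returns a list of gene sets, largest first; unconnected genes are dropped.
    """
    genes = set(genes)
    adjacency = defaultdict(set)
    for a, b, score in edges:
        if score > min_score and a in genes and b in genes:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    networks = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        networks.append(component)
    networks.sort(key=len, reverse=True)
    return networks


# Toy edges with made-up scores; the weak TP53-NDUFA1 link is filtered out.
toy_edges = [
    ("TP53", "PTEN", 0.9),
    ("PTEN", "PIK3CA", 0.8),
    ("NDUFA1", "NDUFB6", 0.95),
    ("TP53", "NDUFA1", 0.4),
]
toy_genes = ["TP53", "PTEN", "PIK3CA", "NDUFA1", "NDUFB6", "KRAS"]
toy_networks = connected_networks(toy_edges, toy_genes)
```

In the toy example the low-scoring edge is discarded, so the signaling genes and the complex I genes end up in separate networks, mirroring the separation between Network 1 and Network 2.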

Novel candidates of cancer driver genes

One of the major goals of this study is to find new cancer driver genes. We discovered several gene families that are enriched in the top 100 genes (Table S2). Here we focus on the histone family and transcription-related families.

Although histone epigenetic modifications have long been linked to cancer development, missense mutations of histones have only recently received increasing attention [57,58]. Our data suggest that the histone family is a hot spot of cancer-related mutations, and more than ten histone genes have high WVPRs. More importantly, we found that most non-silent mutations are located on or next to (± 1) epigenetic modification sites in H2A, H2B, H3 and H4 (Table S3, Figure 5). For example, in HIST2H2AB and HIST2H2AC, 45 missense mutations and 6 other mutations (three deletions at N73, L115 and H123, one insertion at K126 and two nonsense mutations at Q24 and S18) accumulate at 10 sites, 7 of which are located on or next to histone modification sites (Tables S8, S9, S10, S11, S12). Although few studies have so far investigated the biological effects of mutations around these modification sites, it is reasonable to speculate that these mutations may affect the structure and epigenetic state of chromatin, because most of the sites are highly conserved in evolution. How these mutations influence histone function and cancer development is an interesting topic for further study.

Figure 5.


The distribution of driver mutations in histone family members with high WVPRs. The histone modification sites are marked by colored rectangles and mutations are represented by triangles and colored backgrounds as the legends in the figure.

Aberrant transcription/translation regulation is a key step in cancer development. Some transcription/translation factors (e.g. ZIC1, ZIC4, ZNF26, ZNF513, ZNF536, ZMYM3, HOXA1) that were not identified by previous methods are highlighted in our list. For example, ZIC1 is a sequence-specific transcription factor that is involved in developmental regulation and regulates the cell cycle and cell migration in gastric cancer [59]. Mutation analysis showed that the ZIC1 gene is rich in mutations, including 31 silent mutations, 103 missense mutations, two nonsense mutations and one unknown mutation. Intriguingly, 47/50 mutations accumulate in its five C2H2 domains, which make up only 30% of the protein length and are responsible for DNA binding (Figure 6). NOVA1 is another example: 90% of its non-silent mutations occur in its three conserved KH domains, which function as RNA-binding domains. To date, no studies have linked NOVA1 to cancer; it was thought to play a role in regulating RNA splicing or metabolism in a specific subset of developing neurons [60,61].

Figure 6.


The distribution of missense mutations in ZIC1 and NOVA1. Mutations are represented as colored triangles and the domain regions are highlighted with yellow background.

The pattern of mutations

Recent studies suggested that, aside from mutation frequency, the pattern of mutations is also an important feature for identifying Mut-driver genes. Oncogenes are usually recurrently mutated at the same positions, whereas tumor suppressor genes may be mutated evenly throughout the gene body [1]. Based on this hypothesis, we tried to classify the top-ranked genes into oncogenes and tumor suppressors by calculating the average number of mutations per site for the top 200 genes (Table S1). Ninety-eight cancer driver genes were identified as oncogenes based on the rule that more than 10% of their mutation sites are recurrently mutated. The other 102 genes may be tumor suppressor genes (Additional File). Two novel oncogene candidates with a high ratio of recurrent mutations are shown in Figure 7: NTRK3 (54 times as many SNVs as SNPs, with 12% of mutation sites repeatedly identified in different samples) and ZIC4 (14 times as many SNVs as SNPs, with 14% of mutation sites repeatedly identified in different samples). Most NTRK3 mutations were discovered in lung and colon tumors, and the fusion protein ETV6-NTRK3 has been considered a biomarker in breast carcinoma [62,63]. Although recent studies suggested that NTRK3 is a potential tumor suppressor gene [64], its high mutation frequency and high rate of recurrent mutations suggest that it behaves like an oncogene. The ZIC4 gene encodes a member of the ZIC family of C2H2-type zinc finger proteins. Although its function is unknown, members of this family have been linked to several human diseases such as visceral heterotaxy and paraneoplastic neurologic disorders [65,66]. Another interesting observation is that most cancer SNVs of ZIC4 cluster in the exons encoding the N-terminus, the C-terminus and two C2H2 domains of the protein, suggesting a potential function of ZIC4 in cancer development. Further experiments are needed to determine the roles of NTRK3 and ZIC4 in cancer development. We also show two genes (ZIC1 and WAS) as examples of tumor suppressors, which usually have an even distribution of SNVs (Figure 7).
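The classification rule stated above (more than 10% of mutation sites recurrently mutated) can be sketched as follows. The function names, the labels and the toy positions are hypothetical, and treating a site as recurrent when it is observed more than once is an assumption about how recurrence was counted.

```python
from collections import Counter


def recurrent_site_fraction(mutation_positions):
    """Fraction of distinct mutation sites that were observed more than once.

    mutation_positions holds one entry per observed mutation, so a position
    seen in several samples appears several times.
    """
    site_counts = Counter(mutation_positions)
    if not site_counts:
        return 0.0
    recurrent_sites = sum(1 for count in site_counts.values() if count > 1)
    return recurrent_sites / len(site_counts)


def classify_by_pattern(mutation_positions, threshold=0.10):
    """Label a gene by its mutation pattern using the 10% rule."""
    if recurrent_site_fraction(mutation_positions) > threshold:
        return "oncogene-like"
    return "tumor-suppressor-like"


# Toy genes (hypothetical positions): one hotspot versus evenly spread sites.
hotspot_gene = [10, 10, 20, 30, 40]
spread_gene = list(range(20))
hotspot_label = classify_by_pattern(hotspot_gene)
spread_label = classify_by_pattern(spread_gene)
```

A gene with exactly 10% recurrent sites falls on the tumor-suppressor side here because the rule is stated as "more than 10%".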

Figure 7.


The different patterns of mutations in two cancer driver genes (NTRK3 and ZIC4) and two tumor suppressors (ZIC1 and WAS). For NTRK3 and ZIC4, mutation information was obtained from the COSMIC database, and recurrent mutations, including truncations or insertions, in different samples are shown. For ZIC1 and WAS, the first 30 mutation sites from the COSMIC database are plotted.

Discussion

Our analyses suggest that although both SNPs and cancer SNVs are single-nucleotide variations, their underlying driving mechanisms might be dramatically different. This is supported by the result that SNPs are enriched in regions free of epigenetic markers, whereas SNVs are usually found at chromatin structure transition regions that accumulate some repressive epigenetic markers. Here we propose a model: SNP sites are usually more sensitive to mutagen attack in germ cells than SNV sites because they are not protected by proteins that bind epigenetic markers. Compared with the gene body, which is usually occupied and protected by transcriptional and epigenetic factors, intergenic regions have a higher chance of acquiring SNPs. The chromatin state of transition regions changes dynamically in the cell and needs to be protected by certain repressive epigenetic mechanisms such as H3K9me3 or the PRC2 complex, which recognizes the H3K27me3 marker. In cancer cells, defects in these repressive epigenetic mechanisms may result in a high frequency of mutations within these regions.

Tumorigenesis is an evolutionary process in which somatic mutations (driver mutations) accumulate and confer a selective growth advantage on cancer cells. Numerous statistical methods have been proposed to identify driver genes. Most of them estimate the background mutation rate from the ratio of the frequency of non-silent mutations to that of silent mutations, under the hypothesis that most non-silent mutations are unfavorable and affect gene function as well as the fitness of the species. However, the validity of this hypothesis is controversial. The proportion of non-silent mutations is comparable in SNPs and SNVs, which suggests that in most cases non-silent mutations are neutral and therefore that previous methods usually cannot estimate the background accurately.

One significant advantage of our new method is that it uses SNPs to estimate the background mutation rate for each gene. The SNP data across the human genome present a natural map of neutral mutations. The genomic distribution of SNPs is not homogeneous. For each gene, the final distribution of SNPs is the product of natural selection and is affected by many factors, such as mutation context, gene structure, location, size, nucleotide composition and basal mutation rate, which vary among genes. As all of these factors are already taken into account, a complex correction model is not necessary, which makes the method very simple. Because the same SNP or SNV at different gene features might have a different probability of being a neutral or driver mutation, we used a linear model to estimate the weight parameters for calculating the background and cancer mutation rates, which dramatically increased the sensitivity of the method. The new method also has some limitations. For example, as SNPs are the fixed outcome of neutral mutations in germ cells, using them to estimate the rate of somatic neutral mutations in cancer cells may be risky, especially when the state of a gene (such as its transcription state or the distribution of epigenetic markers) differs dramatically between germ cells and cancer cells. As a result, some driver genes for specific tumor types may not be identified efficiently by the new method.

The analysis of cancer genomics data is of key importance for understanding oncogenesis. Although vast amounts of cancer genome sequencing data are now available, deciphering this information to draw meaningful conclusions is still challenging. In this study we presented a large ranked list of cancer driver genes and highlighted many new candidates for further analysis. Some complexes, such as the sin3 and U12-type spliceosomal complexes (Figure S3), and pathways, such as the oxidative phosphorylation pathway (Figure S4), which were not identified by previous methods, are now linked to cancer development. Our study not only develops a method to identify cancer driver genes/pathways but also provides new insights into the molecular mechanisms of cancer development. The WVPR values of 17498 genes are shown in the additional file, and the top 200 gene list is also provided. Further experiments are needed to validate these findings.

Acknowledgements

This study was supported by the National Key Sci-Tech Special Project of China (2013ZX10002010 and 2008ZX10002-020 to L.Y.), the National Natural Science Foundation of China for Creative Research Groups (30024001 to L.Y.), the Project of the Shanghai Municipal Science and Technology Commission (to L.Y.), the National Natural Science Foundation of China (31071193 to L.Y. and 31100895 to D.-K.J.), and the Director Foundation of the State Key Laboratory of Genetic Engineering (to L.Y.).

Disclosure of conflict of interest

The authors declare no conflict of interest.

Supporting Information

Additional File

ajcr0004-0394-f9.xls (2.4MB, xls)

References

  • 1.Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA Jr, Kinzler KW. Cancer genome landscapes. Science. 2013;339:1546–1558. doi: 10.1126/science.1235122. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Zhang J, Liu J, Sun J, Chen C, Foltz G, Lin B. Identifying driver mutations from sequencing data of heterogeneous tumors in the era of personalized genome sequencing. Brief Bioinform. 2014;15:244–55. doi: 10.1093/bib/bbt042. [DOI] [PubMed] [Google Scholar]
  • 3.Youn A, Simon R. Identifying cancer driver genes in tumor genome sequencing studies. Bioinformatics. 2011;27:175–181. doi: 10.1093/bioinformatics/btq630. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H, Teague J, Butler A, Stevens C, Edkins S, O’Meara S, Vastrik I, Schmidt EE, Avis T, Barthorpe S, Bhamra G, Buck G, Choudhury B, Clements J, Cole J, Dicks E, Forbes S, Gray K, Halliday K, Harrison R, Hills K, Hinton J, Jenkinson A, Jones D, Menzies A, Mironenko T, Perry J, Raine K, Richardson D, Shepherd R, Small A, Tofts C, Varian J, Webb T, West S, Widaa S, Yates A, Cahill DP, Louis DN, Goldstraw P, Nicholson AG, Brasseur F, Looijenga L, Weber BL, Chiew YE, DeFazio A, Greaves MF, Green AR, Campbell P, Birney E, Easton DF, Chenevix-Trench G, Tan MH, Khoo SK, Teh BT, Yuen ST, Leung SY, Wooster R, Futreal PA, Stratton MR. Patterns of somatic mutation in human cancer genomes. Nature. 2007;446:153–158. doi: 10.1038/nature05610. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C, Greulich H, Muzny DM, Morgan MB, Fulton L, Fulton RS, Zhang Q, Wendl MC, Lawrence MS, Larson DE, Chen K, Dooling DJ, Sabo A, Hawes AC, Shen H, Jhangiani SN, Lewis LR, Hall O, Zhu Y, Mathew T, Ren Y, Yao J, Scherer SE, Clerc K, Metcalf GA, Ng B, Milosavljevic A, Gonzalez-Garay ML, Osborne JR, Meyer R, Shi X, Tang Y, Koboldt DC, Lin L, Abbott R, Miner TL, Pohl C, Fewell G, Haipek C, Schmidt H, Dunford-Shore BH, Kraja A, Crosby SD, Sawyer CS, Vickery T, Sander S, Robinson J, Winckler W, Baldwin J, Chirieac LR, Dutt A, Fennell T, Hanna M, Johnson BE, Onofrio RC, Thomas RK, Tonon G, Weir BA, Zhao X, Ziaugra L, Zody MC, Giordano T, Orringer MB, Roth JA, Spitz MR, Wistuba II, Ozenberger B, Good PJ, Chang AC, Beer DG, Watson MA, Ladanyi M, Broderick S, Yoshizawa A, Travis WD, Pao W, Province MA, Weinstock GM, Varmus HE, Gabriel SB, Lander ES, Gibbs RA, Meyerson M, Wilson RK. Somatic mutations affect key pathways in lung adenocarcinoma. Nature. 2008;455:1069–1075. doi: 10.1038/nature07423. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Vinagre J, Almeida A, Populo H, Batista R, Lyra J, Pinto V, Coelho R, Celestino R, Prazeres H, Lima L, Melo M, da Rocha AG, Preto A, Castro P, Castro L, Pardal F, Lopes JM, Santos LL, Reis RM, Cameselle-Teijeiro J, Sobrinho-Simoes M, Lima J, Maximo V, Soares P. Frequency of TERT promoter mutations in human cancers. Nat Commun. 2013;4:2185. doi: 10.1038/ncomms3185. [DOI] [PubMed] [Google Scholar]
  • 7.Salvesen HB, Kumar R, Stefansson I, Angelini S, MacDonald N, Smeds J, Jacobs IJ, Hemminki K, Das S, Akslen LA. Low frequency of BRAF and CDKN2A mutations in endometrial cancer. Int J Cancer. 2005;115:930–934. doi: 10.1002/ijc.20702. [DOI] [PubMed] [Google Scholar]
  • 8.Kwiatkowska E, Skasko E, Niwinska A, Wojciechowska-Lacka A, Rachtan J, Molong L, Nowakowska D, Konopka B, Janiec-Jankowska A, Paszko Z, Steffen J. Low frequency of the CHEK2*1100delC mutation among breast cancer probands from three regions of Poland. Neoplasma. 2006;53:305–308. [PubMed] [Google Scholar]
  • 9.Jiang L, Huang J, Morehouse C, Zhu W, Korolevich S, Sui D, Ge X, Lehmann K, Liu Z, Kiefer C, Czapiga M, Su X, Brohawn P, Gu Y, Higgs BW, Yao Y. Low frequency KRAS mutations in colorectal cancer patients and the presence of multiple mutations in oncogenic drivers in non-small cell lung cancer patients. Cancer Genet. 2013;206:330–9. doi: 10.1016/j.cancergen.2013.09.004. [DOI] [PubMed] [Google Scholar]
  • 10.Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, Goddard ME, Visscher PM. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–569. doi: 10.1038/ng.608. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Visscher PM, Yang J, Goddard ME. A commentary on ‘common SNPs explain a large proportion of the heritability for human height’ by Yang et al. (2010) Twin Res Hum Genet. 2010;13:517–524. doi: 10.1375/twin.13.6.517. [DOI] [PubMed] [Google Scholar]
  • 12.Patil N, Berno AJ, Hinds DA, Barrett WA, Doshi JM, Hacker CR, Kautzer CR, Lee DH, Marjoribanks C, McDonough DP, Nguyen BT, Norris MC, Sheehan JB, Shen N, Stern D, Stokowski RP, Thomas DJ, Trulson MO, Vyas KR, Frazer KA, Fodor SP, Cox DR. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science. 2001;294:1719–1723. doi: 10.1126/science.1065573. [DOI] [PubMed] [Google Scholar]
  • 13.Nachman MW. Single nucleotide polymorphisms and recombination rate in humans. Trends Genet. 2001;17:481–485. doi: 10.1016/s0168-9525(01)02409-x. [DOI] [PubMed] [Google Scholar]
  • 14.ENCODE Project Consortium; Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, Kuehn MS, Taylor CM, Neph S, Koch CM, Asthana S, Malhotra A, Adzhubei I, Greenbaum JA, Andrews RM, Flicek P, Boyle PJ, Cao H, Carter NP, Clelland GK, Davis S, Day N, Dhami P, Dillon SC, Dorschner MO, Fiegler H, Giresi PG, Goldy J, Hawrylycz M, Haydock A, Humbert R, James KD, Johnson BE, Johnson EM, Frum TT, Rosenzweig ER, Karnani N, Lee K, Lefebvre GC, Navas PA, Neri F, Parker SC, Sabo PJ, Sandstrom R, Shafer A, Vetrie D, Weaver M, Wilcox S, Yu M, Collins FS, Dekker J, Lieb JD, Tullius TD, Crawford GE, Sunyaev S, Noble WS, Dunham I, Denoeud F, Reymond A, Kapranov P, Rozowsky J, Zheng D, Castelo R, Frankish A, Harrow J, Ghosh S, Sandelin A, Hofacker IL, Baertsch R, Keefe D, Dike S, Cheng J, Hirsch HA, Sekinger EA, Lagarde J, Abril JF, Shahab A, Flamm C, Fried C, Hackermüller J, Hertel J, Lindemeyer M, Missal K, Tanzer A, Washietl S, Korbel J, Emanuelsson O, Pedersen JS, Holroyd N, Taylor R, Swarbreck D, Matthews N, Dickson MC, Thomas DJ, Weirauch MT, Gilbert J, Drenkow J, Bell I, Zhao X, Srinivasan KG, Sung WK, Ooi HS, Chiu KP, Foissac S, Alioto T, Brent M, Pachter L, Tress ML, Valencia A, Choo SW, Choo CY, Ucla C, Manzano C, Wyss C, Cheung E, Clark TG, Brown JB, Ganesh M, Patel S, Tammana H, Chrast J, Henrichsen CN, Kai C, Kawai J, Nagalakshmi U, Wu J, Lian Z, Lian J, Newburger P, Zhang X, Bickel P, Mattick JS, Carninci P, Hayashizaki Y, Weissman S, Hubbard T, Myers RM, Rogers J, Stadler PF, Lowe TM, Wei CL, Ruan Y, Struhl K, Gerstein M, Antonarakis SE, Fu Y, Green ED, Karaöz U, Siepel A, Taylor J, Liefer LA, Wetterstrand KA, Good PJ, Feingold EA, Guyer MS, Cooper GM, Asimenos G, Dewey CN, Hou M, Nikolaev S, Montoya-Burgos JI, Löytynoja A, Whelan S, Pardi F, Massingham T, Huang H, Zhang NR, Holmes I, Mullikin JC, Ureta-Vidal A, Paten B, Seringhaus M, Church D, Rosenbloom K, Kent WJ, Stone EA NISC Comparative 
Sequencing Program; Baylor College of Medicine Human Genome Sequencing Center; Washington University Genome Sequencing Center; Broad Institute; Children’s Hospital Oakland Research Institute. Batzoglou S, Goldman N, Hardison RC, Haussler D, Miller W, Sidow A, Trinklein ND, Zhang ZD, Barrera L, Stuart R, King DC, Ameur A, Enroth S, Bieda MC, Kim J, Bhinge AA, Jiang N, Liu J, Yao F, Vega VB, Lee CW, Ng P, Shahab A, Yang A, Moqtaderi Z, Zhu Z, Xu X, Squazzo S, Oberley MJ, Inman D, Singer MA, Richmond TA, Munn KJ, Rada-Iglesias A, Wallerman O, Komorowski J, Fowler JC, Couttet P, Bruce AW, Dovey OM, Ellis PD, Langford CF, Nix DA, Euskirchen G, Hartman S, Urban AE, Kraus P, Van Calcar S, Heintzman N, Kim TH, Wang K, Qu C, Hon G, Luna R, Glass CK, Rosenfeld MG, Aldred SF, Cooper SJ, Halees A, Lin JM, Shulha HP, Zhang X, Xu M, Haidar JN, Yu Y, Ruan Y, Iyer VR, Green RD, Wadelius C, Farnham PJ, Ren B, Harte RA, Hinrichs AS, Trumbower H, Clawson H, Hillman-Jackson J, Zweig AS, Smith K, Thakkapallayil A, Barber G, Kuhn RM, Karolchik D, Armengol L, Bird CP, de Bakker PI, Kern AD, Lopez-Bigas N, Martin JD, Stranger BE, Woodroffe A, Davydov E, Dimas A, Eyras E, Hallgrímsdóttir IB, Huppert J, Zody MC, Abecasis GR, Estivill X, Bouffard GG, Guan X, Hansen NF, Idol JR, Maduro VV, Maskeri B, McDowell JC, Park M, Thomas PJ, Young AC, Blakesley RW, Muzny DM, Sodergren E, Wheeler DA, Worley KC, Jiang H, Weinstock GM, Gibbs RA, Graves T, Fulton R, Mardis ER, Wilson RK, Clamp M, Cuff J, Gnerre S, Jaffe DB, Chang JL, Lindblad-Toh K, Lander ES, Koriabine M, Nefedov M, Osoegawa K, Yoshinaga Y, Zhu B, de Jong PJ. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007;447:799–816. doi: 10.1038/nature05874. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, Teague JW, Campbell PJ, Stratton MR, Futreal PA. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011;39:D945–950. doi: 10.1093/nar/gkq929. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, Serova N, Davis S, Soboleva A. NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013;41:D991–995. doi: 10.1093/nar/gks1193. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005;33:D501–504. doi: 10.1093/nar/gki025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Meyer LR, Zweig AS, Hinrichs AS, Karolchik D, Kuhn RM, Wong M, Sloan CA, Rosenbloom KR, Roe G, Rhead B, Raney BJ, Pohl A, Malladi VS, Li CH, Lee BT, Learned K, Kirkup V, Hsu F, Heitner S, Harte RA, Haeussler M, Guruvadoo L, Goldman M, Giardine BM, Fujita PA, Dreszer TR, Diekhans M, Cline MS, Clawson H, Barber GP, Haussler D, Kent WJ. The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res. 2013;41:D64–69. doi: 10.1093/nar/gks1048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS. Model-based analysis of ChIP-Seq (MACS) Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Joseph DR, Hall SH, Conti M, French FS. The gene structure of rat androgen-binding protein: identification of potential regulatory deoxyribonucleic acid elements of a follicle-stimulating hormone-regulated protein. Mol Endocrinol. 1988;2:3–13. doi: 10.1210/mend-2-1-3. [DOI] [PubMed] [Google Scholar]
  • 22.Fu QH, Zhou RF, Liu LG, Wang WB, Wu WM, Ding QL, Hu YQ, Wang XF, Wang ZY, Wang HL. Identification of three F5 gene mutations associated with inherited coagulation factor V deficiency in two Chinese pedigrees. Haemophilia. 2004;10:264–270. doi: 10.1111/j.1365-2516.2004.00896.x. [DOI] [PubMed] [Google Scholar]
  • 23.Diez O, Gutierrez-Enriquez S. BRCA2 splice site mutations in an Italian breast/ovarian cancer family. Ann Oncol. 2009;20:1285. doi: 10.1093/annonc/mdp316. author reply 1285-1286. [DOI] [PubMed] [Google Scholar]
  • 24.Bianchi F, Rosati S, Belvederesi L, Loretelli C, Catalani R, Mandolesi A, Bracci R, Bearzi I, Porfiri E, Cellerino R. MSH2 splice site mutation and endometrial cancer. Int J Gynecol Cancer. 2006;16:1419–1423. doi: 10.1111/j.1525-1438.2006.00572.x. [DOI] [PubMed] [Google Scholar]
  • 25.Walsh T, Casadei S, Coats KH, Swisher E, Stray SM, Higgins J, Roach KC, Mandell J, Lee MK, Ciernikova S, Foretova L, Soucek P, King MC. Spectrum of mutations in BRCA1, BRCA2, CHEK2, and TP53 in families at high risk of breast cancer. JAMA. 2006;295:1379–1388. doi: 10.1001/jama.295.12.1379. [DOI] [PubMed] [Google Scholar]
  • 26.Park C, Qian W, Zhang J. Genomic evidence for elevated mutation rates in highly expressed genes. EMBO Rep. 2012;13:1123–1129. doi: 10.1038/embor.2012.165. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Masica DL, Karchin R. Correlation of somatic mutation and expression identifies genes important in human glioblastoma progression and survival. Cancer Res. 2011;71:4550–4561. doi: 10.1158/0008-5472.CAN-11-0180. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Tolstorukov MY, Volfovsky N, Stephens RM, Park PJ. Impact of chromatin structure on sequence variability in the human genome. Nat Struct Mol Biol. 2011;18:510–515. doi: 10.1038/nsmb.2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Schuster-Bockler B, Lehner B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature. 2012;488:504–507. doi: 10.1038/nature11273. [DOI] [PubMed] [Google Scholar]
  • 30.Jiang DK, Sun J, Cao G, Liu Y, Lin D, Gao YZ, Ren WH, Long XD, Zhang H, Ma XP, Wang Z, Jiang W, Chen TY, Gao Y, Sun LD, Long JR, Huang HX, Wang D, Yu H, Zhang P, Tang LS, Peng B, Cai H, Liu TT, Zhou P, Liu F, Lin X, Tao S, Wan B, Sai-Yin HX, Qin LX, Yin J, Liu L, Wu C, Pei Y, Zhou YF, Zhai Y, Lu PX, Tan A, Zuo XB, Fan J, Chang J, Gu X, Wang NJ, Li Y, Liu YK, Zhai K, Zhang H, Hu Z, Liu J, Yi Q, Xiang Y, Shi R, Ding Q, Zheng W, Shu XO, Mo Z, Shugart YY, Zhang XJ, Zhou G, Shen H, Zheng SL, Xu J, Yu L. Genetic variants in STAT4 and HLA-DQ genes confer risk of hepatitis B virus-related hepatocellular carcinoma. Nat Genet. 2013;45:72–75. doi: 10.1038/ng.2483. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Han JW, Zheng HF, Cui Y, Sun LD, Ye DQ, Hu Z, Xu JH, Cai ZM, Huang W, Zhao GP, Xie HF, Fang H, Lu QJ, Xu JH, Li XP, Pan YF, Deng DQ, Zeng FQ, Ye ZZ, Zhang XY, Wang QW, Hao F, Ma L, Zuo XB, Zhou FS, Du WH, Cheng YL, Yang JQ, Shen SK, Li J, Sheng YJ, Zuo XX, Zhu WF, Gao F, Zhang PL, Guo Q, Li B, Gao M, Xiao FL, Quan C, Zhang C, Zhang Z, Zhu KJ, Li Y, Hu DY, Lu WS, Huang JL, Liu SX, Li H, Ren YQ, Wang ZX, Yang CJ, Wang PG, Zhou WM, Lv YM, Zhang AP, Zhang SQ, Lin D, Li Y, Low HQ, Shen M, Zhai ZF, Wang Y, Zhang FY, Yang S, Liu JJ, Zhang XJ. Genome-wide association study in a Chinese Han population identifies nine new susceptibility loci for systemic lupus erythematosus. Nat Genet. 2009;41:1234–1237. doi: 10.1038/ng.472. [DOI] [PubMed] [Google Scholar]
  • 32.Cross NC. Histone modification defects in developmental disorders and cancer. Oncotarget. 2012;3:3–4. doi: 10.18632/oncotarget.436. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Biancotto C, Frige G, Minucci S. Histone modification therapy of cancer. Adv Genet. 2010;70:341–386. doi: 10.1016/B978-0-12-380866-0.60013-7. [DOI] [PubMed] [Google Scholar]
  • 34.Chervona Y, Costa M. Histone modifications and cancer: biomarkers of prognosis? Am J Cancer Res. 2012;2:589–597. [PMC free article] [PubMed] [Google Scholar]
  • 35.Blackburn EH. Telomerase and Cancer: Kirk A. Landon--AACR prize for basic cancer research lecture. Mol Cancer Res. 2005;3:477–482. doi: 10.1158/1541-7786.MCR-05-0147. [DOI] [PubMed] [Google Scholar]
  • 36.Shay JW, Zou Y, Hiyama E, Wright WE. Telomerase and cancer. Hum Mol Genet. 2001;10:677–685. doi: 10.1093/hmg/10.7.677. [DOI] [PubMed] [Google Scholar]
  • 37.Shaikhibrahim Z, Wernert N. ETS transcription factors and prostate cancer: the role of the family prototype ETS-1 (review) Int J Oncol. 2012;40:1748–1754. doi: 10.3892/ijo.2012.1380. [DOI] [PubMed] [Google Scholar]
  • 38.Shimizu R, Engel JD, Yamamoto M. GATA1-related leukaemias. Nat Rev Cancer. 2008;8:279–287. doi: 10.1038/nrc2348. [DOI] [PubMed] [Google Scholar]
  • 39.Stegh AH. Targeting the p53 signaling pathway in cancer therapy - the promises, challenges and perils. Expert Opin Ther Targets. 2012;16:67–83. doi: 10.1517/14728222.2011.643299. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Collins K, Jacks T, Pavletich NP. The cell cycle and cancer. Proc Natl Acad Sci U S A. 1997;94:2776–2778. doi: 10.1073/pnas.94.7.2776. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Lowe SW, Lin AW. Apoptosis in cancer. Carcinogenesis. 2000;21:485–495. doi: 10.1093/carcin/21.3.485. [DOI] [PubMed] [Google Scholar]
  • 42.Lascorz J, Forsti A, Chen B, Buch S, Steinke V, Rahner N, Holinski-Feder E, Morak M, Schackert HK, Gorgens H, Schulmann K, Goecke T, Kloor M, Engel C, Buttner R, Kunkel N, Weires M, Hoffmeister M, Pardini B, Naccarati A, Vodickova L, Novotny J, Schreiber S, Krawczak M, Broring CD, Volzke H, Schafmayer C, Vodicka P, Chang-Claude J, Brenner H, Burwinkel B, Propping P, Hampe J, Hemminki K. Genome-wide association study for colorectal cancer identifies risk polymorphisms in German familial cases and implicates MAPK signalling pathways in disease susceptibility. Carcinogenesis. 2010;31:1612–1619. doi: 10.1093/carcin/bgq146. [DOI] [PubMed] [Google Scholar]
  • 43.Hernandez-Aya LF, Gonzalez-Angulo AM. Targeting the phosphatidylinositol 3-kinase signaling pathway in breast cancer. Oncologist. 2011;16:404–414. doi: 10.1634/theoncologist.2010-0402. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Ciechanover A, Orian A, Schwartz AL. Ubiquitin-mediated proteolysis: biological regulation via destruction. Bioessays. 2000;22:442–451. doi: 10.1002/(SICI)1521-1878(200005)22:5<442::AID-BIES6>3.0.CO;2-Q. [DOI] [PubMed] [Google Scholar]
  • 45.Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, von Mering C, Jensen LJ. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 2013;41:D808–815. doi: 10.1093/nar/gks1094. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Ammanamanchi S, Freeman JW, Brattain MG. Acetylated sp3 is a transcriptional activator. J Biol Chem. 2003;278:35775–35780. doi: 10.1074/jbc.M305961200. [DOI] [PubMed] [Google Scholar]
  • 47.Wilting RH, Yanover E, Heideman MR, Jacobs H, Horner J, van der Torre J, DePinho RA, Dannenberg JH. Overlapping functions of Hdac1 and Hdac2 in cell cycle regulation and haematopoiesis. EMBO J. 2010;29:2586–2597. doi: 10.1038/emboj.2010.136. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Boyle WJ, Smeal T, Defize LH, Angel P, Woodgett JR, Karin M, Hunter T. Activation of protein kinase C decreases phosphorylation of c-Jun at sites that negatively regulate its DNA-binding activity. Cell. 1991;64:573–584. doi: 10.1016/0092-8674(91)90241-p. [DOI] [PubMed] [Google Scholar]
  • 49.Welsh GI, Proud CG. Glycogen synthase kinase-3 is rapidly inactivated in response to insulin and phosphorylates eukaryotic initiation factor eIF-2B. Biochem J. 1993;294:625–629. doi: 10.1042/bj2940625. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Beals CR, Sheridan CM, Turck CW, Gardner P, Crabtree GR. Nuclear export of NF-ATc enhanced by glycogen synthase kinase-3. Science. 1997;275:1930–1934. doi: 10.1126/science.275.5308.1930. [DOI] [PubMed] [Google Scholar]
  • 51.Li Y, Bharti A, Chen D, Gong J, Kufe D. Interaction of glycogen synthase kinase 3beta with the DF3/MUC1 carcinoma-associated antigen and beta-catenin. Mol Cell Biol. 1998;18:7216–7224. doi: 10.1128/mcb.18.12.7216. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Yook JI, Li XY, Ota I, Fearon ER, Weiss SJ. Wnt-dependent regulation of the E-cadherin repressor snail. J Biol Chem. 2005;280:11740–11748. doi: 10.1074/jbc.M413878200. [DOI] [PubMed] [Google Scholar]
  • 53.Hashimoto YK, Satoh T, Okamoto M, Takemori H. Importance of autophosphorylation at Ser186 in the A-loop of salt inducible kinase 1 for its sustained kinase activity. J Cell Biochem. 2008;104:1724–1739. doi: 10.1002/jcb.21737. [DOI] [PubMed] [Google Scholar]
  • 54.Heyd F, Lynch KW. Phosphorylation-dependent regulation of PSF by GSK3 controls CD45 alternative splicing. Mol Cell. 2010;40:126–137. doi: 10.1016/j.molcel.2010.09.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Jiao Y, Shi C, Edil BH, de Wilde RF, Klimstra DS, Maitra A, Schulick RD, Tang LH, Wolfgang CL, Choti MA, Velculescu VE, Diaz LA Jr, Vogelstein B, Kinzler KW, Hruban RH, Papadopoulos N. DAXX/ATRX, MEN1, and mTOR pathway genes are frequently altered in pancreatic neuroendocrine tumors. Science. 2011;331:1199–1203. doi: 10.1126/science.1200609. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Gatenby RA, Gillies RJ. Why do cancers have high aerobic glycolysis? Nat Rev Cancer. 2004;4:891–899. doi: 10.1038/nrc1478. [DOI] [PubMed] [Google Scholar]
  • 57.Rheinbay E, Louis DN, Bernstein BE, Suva ML. A tell-tail sign of chromatin: histone mutations drive pediatric glioblastoma. Cancer Cell. 2012;21:329–331. doi: 10.1016/j.ccr.2012.03.001. [DOI] [PubMed] [Google Scholar]
  • 58.Moorefield B. Helicase disc breaks. Nat Struct Mol Biol. 2013;20:1242. [Google Scholar]
  • 59.Zhong J, Chen S, Xue M, Du Q, Cai J, Jin H, Si J, Wang L. ZIC1 modulates cell-cycle distributions and cell migration through regulation of sonic hedgehog, PI(3)K and MAPK signaling pathways in gastric cancer. BMC Cancer. 2012;12:290. doi: 10.1186/1471-2407-12-290. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Buckanovich RJ, Posner JB, Darnell RB. Nova, the paraneoplastic Ri antigen, is homologous to an RNA-binding protein and is specifically expressed in the developing motor system. Neuron. 1993;11:657–672. doi: 10.1016/0896-6273(93)90077-5. [DOI] [PubMed] [Google Scholar]
  • 61.Jensen KB, Dredge BK, Stefani G, Zhong R, Buckanovich RJ, Okano HJ, Yang YY, Darnell RB. Nova-1 regulates neuron-specific alternative splicing and is essential for neuronal viability. Neuron. 2000;25:359–371. doi: 10.1016/s0896-6273(00)80900-9. [DOI] [PubMed] [Google Scholar]
  • 62.Tognon C, Knezevich SR, Huntsman D, Roskelley CD, Melnyk N, Mathers JA, Becker L, Carneiro F, MacPherson N, Horsman D, Poremba C, Sorensen PH. Expression of the ETV6-NTRK3 gene fusion as a primary event in human secretory breast carcinoma. Cancer Cell. 2002;2:367–376. doi: 10.1016/s1535-6108(02)00180-0. [DOI] [PubMed] [Google Scholar]
  • 63.Li Z, Tognon CE, Godinho FJ, Yasaitis L, Hock H, Herschkowitz JI, Lannon CL, Cho E, Kim SJ, Bronson RT, Perou CM, Sorensen PH, Orkin SH. ETV6-NTRK3 fusion oncogene initiates breast cancer from committed mammary progenitors via activation of AP1 complex. Cancer Cell. 2007;12:542–558. doi: 10.1016/j.ccr.2007.11.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Luo Y, Kaz AM, Kanngurn S, Welsch P, Morris SM, Wang J, Lutterbaugh JD, Markowitz SD, Grady WM. NTRK3 is a potential tumor suppressor gene commonly inactivated by epigenetic mechanisms in colorectal cancer. PLoS Genet. 2013;9:e1003552. doi: 10.1371/journal.pgen.1003552. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Cowan J, Tariq M, Ware SM. Genetic and functional analyses of ZIC3 variants in congenital heart disease. Hum Mutat. 2014;35:66–75. doi: 10.1002/humu.22457. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Bataller L, Wade DF, Graus F, Stacey HD, Rosenfeld MR, Dalmau J. Antibodies to Zic4 in paraneoplastic neurologic disorders and small-cell lung cancer. Neurology. 2004;62:778–782. doi: 10.1212/01.wnl.0000113749.77217.01. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from American Journal of Cancer Research are provided here courtesy of e-Century Publishing Corporation
