Abstract
Background
The majority of established prostate cancer risk-associated Single Nucleotide Polymorphisms (SNPs) identified from genome-wide association studies do not fall into protein coding regions. Therefore, the mechanisms by which these SNPs affect prostate cancer risk remain unclear. Here, we used a series of bioinformatic tools and databases to provide possible molecular insights into the actions of risk SNPs.
Methodology/Principal Findings
We performed a comprehensive assessment of the potential functional impact of 33 SNPs that were identified and confirmed as associated with prostate cancer (PCa) risk in previous studies. We first mapped these 33 SNPs, together with additional SNPs in Linkage Disequilibrium (LD) with them (r² ≥ 0.5), to genomic functional annotation databases, including the Encyclopedia of DNA Elements (ENCODE), eleven genomic regulatory element databases defined by the University of California Santa Cruz (UCSC) table browser, and Androgen Receptor (AR) binding sites defined by a ChIP-chip technique. Enrichment analysis was then carried out to assess whether the risk SNP blocks were enriched in the various annotation sets. Risk SNP blocks were significantly enriched beyond what would be expected by chance in two annotation sets: AR binding sites (p=0.003) and FoxA1 binding sites (p=0.05). About one third of the 33 risk SNP blocks are located within AR binding regions.
Conclusions/Significance
The significant enrichment of risk SNPs in AR binding sites may suggest a potential molecular mechanism for these SNPs in prostate cancer initiation, and provide guidance for future functional studies.
Keywords: functional annotation, prostate cancer, bioinformatics, genome-wide association study
Introduction
Genome-wide association studies (GWAS) have identified ~33 established prostate cancer (PCa) risk-associated Single Nucleotide Polymorphisms (SNPs) (1–9). SNPs are DNA sequence variations occurring when a single nucleotide in the genome differs between individuals. Although these SNP associations have been consistently replicated in multiple studies, the functional roles of these risk SNPs remain uncharacterized, largely due to their location in DNA regions that do not encode proteins. Additionally, almost half of the SNP loci are located in regions that do not harbor any known genes. For example, multiple SNPs at 8q24 reside within a 1.3 Mb gene-desert region (10), with the closest gene, c-Myc, located ~300 kb downstream of rs1447295. Because the knowledge base for non-coding DNA is generally limited, few studies have been performed to evaluate the functional impact of the risk SNPs on PCa etiology (11,12). Indeed, understanding the functional impact of the risk SNPs will most likely require additional emphasis on potential transcriptional regulatory mechanisms in non-coding DNA sequence.
The Encyclopedia of DNA Elements (ENCODE) project aims to provide a comprehensive catalog of biological information for functionally important elements. These elements include non-protein coding DNA sequences which regulate gene transcription, gene expression, and DNA replication. The ENCODE pilot study rigorously analyzed a small proportion (1%) of the human genome using computational and experimental methods. Results of the pilot study highlighted the complexity of transcriptional regulation and demonstrated the knowledge gap in this area (13). Based on the initial success, the National Human Genome Research Institute (NHGRI) expanded the human ENCODE project to the whole genome (www.genome.gov/ENCODE). At this time, a primary focus of ENCODE is the characterization, using the ChIP-seq technique, of transcription factor (TF) binding and chromatin structure, which represent two of the major factors involved in transcriptional regulation. ChIP-seq is a state-of-the-art high-throughput approach that involves chromatin immunoprecipitation and high-throughput sequencing of immunoprecipitated DNA. Compared with other methods of characterizing regulatory elements, a key advantage of ChIP-seq is that it is a systematic and unbiased approach, which does not depend on previous knowledge of canonical promoter regions and allows for evaluation of binding complexes of transcription factors and regulatory elements in a more natural state (14). The availability of this comprehensive catalog may facilitate an improved understanding of the functional role of risk SNPs located in non-coding regions.
In addition to the ENCODE project, several recent studies have utilized the high-throughput ChIP-chip method to identify genome-wide binding sites for transcription factors, including androgen receptor (AR), Forkhead box A1 (FoxA1), and estrogen receptor (ER) (15,16). Like ChIP-seq, ChIP-chip is a high-throughput, global approach for mapping transcription factor binding. ChIP-chip analysis involves isolating target DNA through chromatin immunoprecipitation, followed by analysis on DNA microarrays that tile the human genome (14). AR is a well-known transcription factor that plays an important role in prostate cancer initiation and progression. Risk SNPs located in putative AR binding sites might change the affinity of the androgen-AR complex for its binding sequences, thus providing a mechanism leading to modification of PCa risk.
In our study, we performed a comprehensive assessment of the potential functional impact of SNPs that were associated with PCa risk by GWAS, utilizing ENCODE genomic annotation databases, as well as ~20 annotation databases from the University of California Santa Cruz table browser (UCSC table browser) (http://genome.ucsc.edu/) and TF binding sites defined by previous studies (15,16). Enrichment analysis was then performed to evaluate whether the risk SNPs were over-represented in any of the annotation sets. To our knowledge, our study is among the first attempts to comprehensively characterize the potential function of risk SNPs using existing bioinformatic databases. These results assist in the interpretation of the molecular mechanisms by which the risk SNPs affect PCa etiology and provide guidance for future functional studies.
Methods
Defining SNPs that are in Linkage Disequilibrium (LD) with established PCa risk-associated SNPs discovered from GWAS
The causative SNP may be the risk SNP itself or a SNP in LD with it. Therefore, we identified all SNPs in LD (r² ≥ 0.5) with the 33 risk SNPs discovered by GWAS (SNPs that reached a genome-wide significance level with a p value equal to or less than 10⁻⁷ in previous studies (1–9)) based on the CEU genotype data from HapMap release #27 (Phase II + Phase III) (http://hapmap.ncbi.nlm.nih.gov/). We consider each risk SNP and the SNPs that are in LD (r² ≥ 0.5) with it as one risk SNP block.
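The block definition above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the input format (tuples of risk SNP, partner SNP, r²) and the function name `build_risk_blocks` are hypothetical, and the identifiers and r² values in the toy usage are placeholders rather than real data.

```python
R2_THRESHOLD = 0.5  # LD cutoff used in the text (r^2 >= 0.5)


def build_risk_blocks(risk_snps, pairwise_ld, r2_threshold=R2_THRESHOLD):
    """Return {risk_snp: set of SNP ids in its block}.

    risk_snps   : iterable of risk SNP identifiers
    pairwise_ld : iterable of (risk_snp, other_snp, r2) tuples
                  (hypothetical format, e.g. parsed from HapMap LD output)
    """
    risk_set = set(risk_snps)
    # Every block contains at least the risk SNP itself.
    blocks = {snp: {snp} for snp in risk_set}
    for risk_snp, other_snp, r2 in pairwise_ld:
        if risk_snp in risk_set and r2 >= r2_threshold:
            blocks[risk_snp].add(other_snp)
    return blocks


# Toy usage with placeholder identifiers (not real data).
toy_ld = [
    ("risk1", "snp_a", 0.82),
    ("risk1", "snp_b", 0.31),  # below the threshold, so excluded
    ("risk2", "snp_c", 0.50),  # exactly at the threshold, so included
]
blocks = build_risk_blocks(["risk1", "risk2", "risk3"], toy_ld)
```

A risk SNP with no LD partners above the threshold (here `risk3`) still forms a block of size one, which matches the definition of a block as the risk SNP plus any SNPs in LD with it.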
Overlapping the risk SNP blocks with functionally annotated genomic regions
We mapped SNPs in each risk SNP block to the ENCODE genomic annotation databases (release #2), as well as eleven annotation databases from UCSC (http://genome.ucsc.edu/) and transcription factors defined by previous studies. We defined a risk SNP block as located within a given annotated region if the risk SNP itself, or at least one of the SNPs in LD with the risk SNP, mapped to the annotated region.
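The mapping rule just described (a block counts as mapped if any of its SNPs falls in an annotated region) can be illustrated as follows. This is only a sketch: the helper names, the per-chromosome interval dictionary, and the inclusive coordinate convention are assumptions made for the example, and the coordinates are toy values.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals; ends are treated as inclusive."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def build_index(regions_by_chrom):
    """Pre-compute sorted start/end lists per chromosome for binary search."""
    index = {}
    for chrom, intervals in regions_by_chrom.items():
        merged = merge_intervals(intervals)
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def snp_in_annotation(chrom, pos, index):
    """True if the position lies inside any annotated interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos <= ends[i]


def block_maps_to_annotation(block_positions, index):
    """block_positions: (chrom, pos) for the risk SNP and each SNP in LD with it."""
    return any(snp_in_annotation(c, p, index) for c, p in block_positions)


# Toy usage (placeholder coordinates, not real annotation data).
index = build_index({"chr1": [(100, 200), (150, 300), (1000, 1100)]})
mapped = block_maps_to_annotation([("chr1", 50), ("chr1", 250)], index)
unmapped = block_maps_to_annotation([("chr1", 500), ("chr2", 150)], index)
```

Merging the intervals first keeps the binary search correct even when annotated regions overlap one another.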
Assessment of enrichment of the risk SNP blocks in the annotated genomic regions
We counted the number of risk SNP blocks that mapped to each annotated genomic region. Each risk SNP block was counted only once, even if more than one SNP within the same block mapped to the annotated region. A simulation analysis was used to assess the statistical significance of any potential enrichment for risk SNP blocks within annotated genomic regions, under a null hypothesis that none of these blocks were truly associated with PCa risk. We began the simulation analysis by randomly generating 1,000 sets of 33 SNPs (1,000 replicates) from the ~2.5 million SNPs in the genome with minor allele frequency (MAF) ≥ 0.05 (HapMap Phase II). We then identified all SNPs in LD with the randomly selected 33 SNPs, and performed the same analysis as for the true risk SNPs, including overlapping the SNP blocks with functionally annotated genomic regions and then counting the number of SNP blocks that mapped to each annotated genomic region. Next, the mean number of randomly generated SNP blocks that mapped to each annotated region was calculated as the average count across the 1,000 replicates. Finally, empirical p-values were calculated as the number of replicates in which the count was equal to or larger than the observed number, divided by the total number of replicates. To reduce the concern of multiple testing, we limited the enrichment analysis to annotation sets with 5 or more mapped risk SNP blocks.
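The simulation procedure above can be sketched as below. This is an illustrative outline only: `block_maps` is a hypothetical caller-supplied function that expands a SNP to its LD block and checks the overlap with one annotation set, the toy SNP pool is made up, and the fixed random seed is an assumption added for reproducibility.

```python
import random


def empirical_p_value(observed_count, null_counts):
    """Fraction of null replicates whose count is >= the observed count."""
    hits = sum(1 for c in null_counts if c >= observed_count)
    return hits / len(null_counts)


def simulate_null_counts(all_snps, block_maps, n_snps=33, n_replicates=1000, seed=0):
    """Count mapped blocks in random SNP sets drawn without replacement.

    all_snps   : list of candidate SNP ids (the text draws from ~2.5 million
                 HapMap SNPs with MAF >= 0.05)
    block_maps : callable(snp) -> bool, True if the SNP's LD block overlaps
                 the annotation set (hypothetical helper)
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_replicates):
        sample = rng.sample(all_snps, n_snps)
        counts.append(sum(1 for snp in sample if block_maps(snp)))
    return counts


# Toy usage: 1,000 placeholder SNPs, of which 10% "map" to the annotation set.
toy_snps = list(range(1000))
null_counts = simulate_null_counts(toy_snps, lambda snp: snp < 100)
null_mean = sum(null_counts) / len(null_counts)
p_toy = empirical_p_value(11, null_counts)
```

With 1,000 replicates the smallest non-zero p-value this estimator can return is 0.001, so the resolution of the reported p-values is bounded by the number of replicates.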
Results
Identification of SNPs in LD with PCa risk SNPs
We identified a total of 972 SNPs in LD with the 33 risk SNPs. A list of these SNPs and the pair-wise r² values for each risk SNP is provided in Supplementary Table 1.
Defining the functional annotation databases
We further grouped the genomic annotation databases into six categories (Table 1), based mainly on potential functionality and on the techniques used to define the annotation sets: 1) Yale ENCODE (Yale Transcription Factor Binding Sites (TFBS)) characterizes the binding sites for a series of transcription factors including c-Myc, GATA-2, SIRT6, TCF7L2, STAT1, NF-κB, c-Fos, c-Jun, E2F6 and Max; 2) Broad ENCODE (Broad histone) defines genomic regions with chromatin accessibility and histone modifications, including regions that are enriched with histone marks (H3K4me1, H3K4me2, H3K4me3, H3K27ac, and H3K9ac); 3) regulatory elements defined by the UCSC table browser (http://genome.ucsc.edu/), which include 11 genomic regulatory annotation sets; 4) a conserved region annotation set retrieved from the UCSC phastConsElements28way and phastConsElements17way tables with conservation scores >500 (a conservation score is a measure of the degree of conservation of a genomic region); 5) coding regions and splice sites, which include annotation sets for protein coding regions and non-protein-coding RNAs (including transfer RNAs, ribosomal RNAs, small nuclear RNAs, and micro (mi) RNAs); 6) annotation sets for AR, ER and FoxA1 binding sites, as defined by the ChIP-on-chip technique, obtained from previous studies (15,16).
Table 1.
Annotation databases used for the bioinformatics analysis
Annotation type | Brief explanation | Sources and references |
---|---|---|
Yale TFBS | ||
Transcription factors | Transcription factor binding sites as defined by the ENCODE project, including c-Myc, GATA-2, SIRT6, TCF7L2, STAT1, NF-κB, c-Fos, c-Jun, E2F6, Max | ENCODE project performed by Yale University, UCSC Table browser |
Broad Histone | ||
Histone modifications | Genomic regions with chromatin accessibility and histone modifications, including regions that are enriched with histone marks (H3K4me1, H3K4me2, H3K4me3, H3K27ac, and H3K9ac) | ENCODE project performed by Broad Institute, UCSC Table browser |
Regulatory elements defined by UCSC table browser | ||
cpgIslandExt | Targeted methylation regions (CpG islands) predicted by computational algorithms | UCSC Table browser, ref 26 |
encodeUViennaRnaz | Noncoding RNA region predicted by three computational algorithms (EvoFold, RNAz and AlifoldZ) in ENCODE regions | UCSC Table browser, ref 27 |
eponine | Transcription start sites (TSS) predicted by a probabilistic method (Eponine), with good specificity and excellent positional accuracy. | UCSC Table browser, ref 28 |
firstEF | Computationally predicted promoter regions and first exons in the human genome | UCSC Table browser, ref 29 |
laminB1 | Genomic regions interacting with the nuclear lamina, which may contribute to the spatial organization of chromosomes inside the nucleus | UCSC Table browser, ref 30 |
oreganno | The Open REGulatory ANNOtation database (ORegAnno) is an open database of known regulatory elements from the scientific literature (regulatory regions supported by various biological experiments) | UCSC Table browser |
polyaDb | mRNA poly(A) sites that are mapped by cDNA/EST sequences | UCSC Table browser, ref 31 |
switchDbTss | Computationally predicted transcription start sites (TSS) based on cDNA alignment | UCSC Table browser, ref 32 |
targetScanS | TargetScan predicts biological targets of micro (mi) RNAs by searching for the presence of conserved 8mer and 7mer sites that match the seed region of each miRNA | UCSC Table browser, ref 33–35 |
tfbsConsSites | Location and score of transcription factor binding sites conserved in the human/mouse/rat alignment; data are computed with the Transfac Matrix Database (v7.0) and are purely computational | UCSC Table browser |
VistaEnhancers | Defines distant-acting transcriptional enhancers by combining a computational approach and a moderate-throughput mouse transgenesis enhancer assay | UCSC Table browser, ref 36 |
Conserved region | ||
Conserved region | Genomic regions that are conserved across different species | UCSC phastConsElements28way and phastConsElements17way tables with conservation score >500 |
Coding region and splicing sites | ||
Coding | Genomic regions coding for genes | UCSC Table browser |
Non-Synonymous change | Genomic regions where a nucleotide substitution leads to an amino acid change | UCSC Table browser |
Splice sites | Functional elements that affect the splicing process | UCSC snp129 |
Non-protein-coding RNAs | Transfer RNAs, ribosomal RNAs, small nuclear RNAs, and miRNAs | UCSC Table browser (rnaGene) |
Transcription factor binding sites defined by ChIP-on-chip technique | ||
AR binding | Androgen Receptor binding regions defined by ChIP-on-chip technology | ref 15 |
ER binding | Estrogen Receptor binding regions defined by ChIP-on-chip technology | ref 16 |
FoxA1 binding | FoxA1 binding regions defined by ChIP-on-chip technology | ref 15 |
Mapping of risk SNP blocks to the functional annotation sets
Detailed annotation for each of the 33 risk SNP blocks is shown in Table 2. Only annotation sets with more than one mapped risk SNP block are listed in Table 2. Detailed information about the mapped SNPs that are in LD with the risk SNPs is presented in Supplementary Table 2. Briefly, 10 risk SNP blocks fall into conserved regions. No risk SNP blocks map to coding regions, non-synonymous changes, splice sites, non-protein-coding RNAs, miRNAs, miRNA target regions or methylation sites (data not shown). Based on the category 6 genomic annotation databases (defined above), 11, 4 and 9 risk SNP blocks were found to map to AR, ER and FoxA1 binding sites, respectively.
Table 2.
Annotation information on 33 risk SNP blocks based on various genomic databases
CHR | SNPs | Note | BP* | Genes | Yale ENCODE | Broad ENCODE | Regulatory elements defined by UCSC annotation | Conserved | Transcription factor binding sites defined by previous papers (15,16) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
cpgIslandExt | encodeUViennaRnaz | firstEF | laminB1 | oreganno | tfbsConsSites | Region | AR | ER | FoxA | |||||||
2 | rs1465618 | New 2p21 | 43,407,453 | THADA | H3K4me1 | Yes | ||||||||||
2 | rs721048 | 2p15 | 62,985,235 | EHBP1 | H3K4me2,H3K27ac,H3K9ac | Yes | Yes | Yes | ||||||||
2 | rs12621278 | New 2q31.1 | 173,019,799 | ITGA6 | c-Fos,Max,TCF7L2,STAT1 | H3K4me1,H3K4me2,H3K27ac,H3K4me3 | Yes | Yes | Yes | Yes | ||||||
3 | rs2660753 | 3p12 | 87,193,364 | c-Fos,STAT1 | H3K4me1 | Yes | Yes | |||||||||
3 | rs10934853 | New 3q21.3 | 129,521,063 | STAT1,GATA-2,ZNF263 | H3K4me1,H3K4me2 | Yes | Yes | Yes | Yes | Yes | ||||||
4 | rs17021918 | New 4q22.3 | 95,781,900 | PDLIM5 | H3K4me1,H3K4me2 | Yes | Yes | Yes | Yes | Yes | ||||||
4 | rs7679673 | New 4q24 | 106,280,983 | TET2 | JunD,NFKB,STAT1 | H3K4me2,H3K4me3 | Yes | Yes | ||||||||
6 | rs9364554 | 6q25 | 160,753,654 | c-Fos | H3K4me1,H3K4me2,H3K4me3 | Yes | ||||||||||
7 | rs10486567 | 7p15 | 27,943,088 | JAZF1 | H3K4me1,H3K4me2 | Yes | Yes | Yes | ||||||||
7 | rs6465657 | 7q21 | 97,654,263 | LMTK2 | NFKB,Pol2,c-Myc,SIRT6,ZNF263 | H3K4me1,H3K4me2,H3K4me3 | Yes | Yes | Yes | Yes | Yes | |||||
8 | rs2928679 | 8p21.2 | 23,494,920 | cTBP2 | H3K4me1,H3K4me2 | Yes | Yes | Yes | ||||||||
8 | rs1512268 | 8p21.2 | 23,582,408 | NKX3.1 | Pol2,c-Myc,HA-E2F1,TCF7L2,GATA-2,NF-YB,SIRT6,ZNF263 | H3K4me1,H3K27ac,H3K4me3,H3K9ac | Yes | Yes | Yes | Yes | Yes | Yes | ||||
8 | rs10086908 | New 8q24 (5) | 128,081,119 | H3K4me1,H3K4me3 | Yes | Yes | ||||||||||
8 | rs16901979 | 8q24(2) | 128,194,098 | H3K4me1,H3K4me2 | Yes | Yes | Yes | |||||||||
8 | rs16902094 | New 8q24.21 | 128,389,528 | |||||||||||||
8 | rs620861 | New 8q24 (4) | 128,404,855 | TCF7L2 | H3K4me2 | Yes | Yes | Yes | Yes | Yes | ||||||
8 | rs6983267 | 8q24(3) | 128,482,487 | TCF7L2,STAT1,Rad21 | Yes | Yes | ||||||||||
8 | rs1447295 | 8q24(1) | 128,554,220 | Yes | ||||||||||||
9 | rs1571801 | 9q33 | 123,467,194 | Yes | ||||||||||||
10 | rs10993994 | 10q11 | 51,219,502 | MSMB | ZNF263 | H3K4me1,H3K4me2,H3K27ac,H3K9ac | Yes | Yes | Yes | Yes | Yes | Yes | ||||
10 | rs4962416 | 10q26 | 126,686,862 | CTBP2 | Pol2,E2F6,ZNF263 | H3K4me1,H3K4me3 | Yes | Yes | ||||||||
11 | rs7127900 | New 11p15.5 | 2,190,150 | IGF2, IGF2AS, INS, TH | Pol2 | H3K4me1 | Yes | |||||||||
11 | rs12418451 | 11q13(2) | 68,691,995 | AL137479, BC043531 | c-Fos,Max,NF-YA,NF-YB | H3K4me1 | ||||||||||
11 | rs10896449 | 11q13(1) | 68,751,243 | Max,c-Myc | H3K4me1,H3K4me2 | Yes | ||||||||||
17 | rs11649743 | 17q12 (2) | 33,149,092 | ZNF263 | H3K4me1,H3K4me3 | Yes | Yes | Yes | Yes | |||||||
17 | rs4430796 | 17q12(1) | 33,172,153 | TCF2 | Pol2,STAT1 | H3K4me1 | Yes | |||||||||
17 | rs1859962 | 17q24.3 | 66,620,348 | H3K4me1,H3K4me2 | Yes | Yes | Yes | |||||||||
19 | rs8102476 | New 19q13.2 | 43,427,453 | c-Myc | H3K4me1,H3K9ac | Yes | Yes | |||||||||
19 | rs887391 | 19q13 | 46,677,464 | 10 Mb to KLK3 | NF-YB | H3K27ac | Yes | |||||||||
19 | rs2735839 | 19q13 (KLK3) | 56,056,435 | KLK3 | ||||||||||||
22 | rs9623117 | New 22q13 | 38,782,065 | GATA-2 | H3K4me1,H3K4me2 | Yes | Yes | |||||||||
22 | rs5759167 | New 22q13.2 | 41,830,156 | TTLL1, BIK, MCAT, PACSIN2 | c-Fos,Max,c-Jun,c-Myc,TCF7L2,E2F6,GATA-2,SIRT6 | H3K4me1,H3K4me2 | ||||||||||
X | rs5945619 | Xp11 | 51,258,412 | NUDT10, NUDT11, LOC340602 |
*BP (base pair) position is based on NCBI build 36.
For Yale TFBS and Broad Histone annotation sets, transcription factor binding sites and regions that are enriched with specific histone markers were provided for each risk SNP block. For the remaining three annotation categories, a “yes” means the risk SNP block overlapped with the annotation category. Empty cells mean the risk SNP block does not overlap with the annotation category.
Enrichment analysis
Risk SNP blocks were significantly enriched in genomic regions containing AR binding sites, with 11 (34.4%) risk SNP blocks mapping to these sites, whereas on average only 4.0 (12.5%) blocks randomly generated from the genome are located at such sites (p=0.003). Similarly, risk SNP blocks were enriched in regions containing FoxA1 binding sites (7 (21.88%) vs 3.47 (10.84%); p=0.05) (Table 3). Risk SNP blocks were not significantly enriched in any of the other genomic annotation sets that were tested.
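One way to read these numbers is as a ratio of observed to expected counts. The snippet below is only an illustrative calculation using the counts quoted above; the fold-enrichment summary is not a statistic reported in the study, and the function name is hypothetical.

```python
def fold_enrichment(observed_count, mean_null_count):
    """Ratio of the observed block count to the mean count under the null."""
    return observed_count / mean_null_count


# Observed counts and mean counts from the random replicates, as quoted in the text.
ar_fold = fold_enrichment(11, 3.99)    # AR binding sites
foxa1_fold = fold_enrichment(7, 3.47)  # FoxA1 binding sites

print(f"AR: {ar_fold:.2f}-fold, FoxA1: {foxa1_fold:.2f}-fold")
```

The ratio is a descriptive summary only; statistical significance still comes from the empirical p-values based on the simulation replicates.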
Discussion
PCa risk-associated SNPs identified from GWA studies have been consistently replicated and confirmed in a large number of studies (1–9). The clinical utility of predicting an individual’s risk for PCa and identification of high risk men for PCa using these SNPs have been extensively discussed and explored (17–19). In contrast, the biological mechanisms by which the risk SNPs affect PCa initiation are poorly understood. In this study, we used bioinformatics tools to provide insight into the potential functional impact of these SNPs. Importantly, risk SNPs were found to be significantly enriched over that expected by chance in two functional annotation sets, consisting of AR binding sites (p=0.003) and FoxA1 binding sites (p=0.05).
AR, a member of the nuclear hormone receptor family, is a well-known transcription factor which plays an important role in prostate cancer initiation, although the precise mechanisms by which androgens promote prostate carcinogenesis remain ill-defined despite years of investigation. It is clear, however, that healthy men receiving preventive drugs that block the conversion of testosterone to dihydrotestosterone, the more potent androgen, experience a significant reduction (~25%) in PCa risk (20,21). Upon androgen binding, the AR-androgen dimer translocates from the cytosol into the cell nucleus. The AR-androgen dimer complex then binds to specific DNA sequences known as AR binding sites, recruiting coactivators and other factors which direct and regulate target gene expression. In our study, we demonstrated that almost one third of the 33 known risk SNP blocks are located in AR binding regions identified by the ChIP-chip method (15). A total of ~22,000 AR binding regions have been mapped across the prostate cell genome, using the Model-based Analysis of Tiling-arrays (MAT) algorithm (22) and based on a false discovery rate of 15% (p < 10⁻⁴) (15). The average length of the AR binding regions is 911 base pairs (bp), with a range from 299 to 5,554 bp. The significant enrichment of known risk SNP blocks in regions that harbor AR binding regions suggests a molecular mechanism that may explain the associations between the risk SNPs and PCa risk. The statistical significance (P=0.003) remains even after a stringent Bonferroni correction (13 independent tests were performed in the enrichment analysis, giving a threshold of 0.05/13 ≈ 0.0038). Known risk SNPs that are located within the putative AR binding sites may change the binding affinity of the AR-androgen complex for the binding sequence, leading to altered expression of AR target genes that presumably play rate-limiting roles in PCa formation.
Risk SNP blocks were also enriched in FoxA1 binding sites with a nominal p value of 0.05. FoxA1 acts as an AR collaborating cofactor and assists nuclear receptor binding in certain genomic regions (23,24). The coupling of FoxA1 to binding sites is required for AR binding to enhancers in multiple AR-targeted genes (15). SNPs that reside in FoxA1 binding sites may affect the binding affinity of FoxA1 protein and lead to increased risk for PCa. However, the relationship between risk SNPs and FoxA1 binding needs to be interpreted with caution, since the enrichment in FoxA1 binding sites did not reach statistical significance after Bonferroni correction (P = 0.65 after Bonferroni correction).
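The Bonferroni figures quoted for AR and FoxA1 can be reproduced with a short calculation. This is a minimal sketch; the function name and return layout are illustrative choices, while the p-values, the number of tests (13) and the 0.05 level come from the text.

```python
def bonferroni(p_value, n_tests, alpha=0.05):
    """Return (per-test threshold, adjusted p-value, significant?)."""
    threshold = alpha / n_tests              # e.g. 0.05 / 13
    adjusted = min(1.0, p_value * n_tests)   # adjusted p-value, capped at 1
    return threshold, adjusted, p_value <= threshold


N_TESTS = 13  # number of annotation sets tested in the enrichment analysis

ar_threshold, ar_adjusted, ar_significant = bonferroni(0.003, N_TESTS)
fox_threshold, fox_adjusted, fox_significant = bonferroni(0.05, N_TESTS)

print(f"threshold = {ar_threshold:.4f}")
print(f"AR:    adjusted p = {ar_adjusted:.3f}, significant = {ar_significant}")
print(f"FoxA1: adjusted p = {fox_adjusted:.2f}, significant = {fox_significant}")
```

Comparing the raw p-value against 0.05/13 and comparing the adjusted p-value (p × 13) against 0.05 are equivalent decisions, which is why the AR result survives correction while the FoxA1 result does not.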
One advantage of our study is the use of a comprehensive and unbiased approach to evaluate TF binding regions defined by ChIP-seq and ChIP-chip techniques, which allowed us to evaluate regulatory elements on a genome-wide level. Regulation of gene expression is a complicated process and current knowledge in this area is very limited. The pilot study of the ENCODE project revealed that only about 25% of the regulatory elements are located near previously identified transcription start sites (TSS). This suggests that the ChIP-on-chip method is able to identify a large number of promoter regions and regulatory elements that were previously unknown and distant from the classical TSS (13). Although we did not observe any significant enrichment of risk SNP blocks in the ENCODE annotation sets, this might be due to tissue-specific patterns of TF binding and regulation. Currently, a variety of cell lines, mainly HeLa and GM06990, have been used for the identification of regulatory elements in the ENCODE project. The LNCaP cell line is currently proposed for study by the ENCODE consortium in Tier 3 (http://genome.ucsc.edu/ENCODE/cellTypes.html). Prostate cancer tissue-specific TF binding may provide valuable information for evaluating the molecular mechanisms by which risk SNPs affect PCa risk.
The fact that no risk SNP blocks are located in protein-coding regions may be due to two reasons. First, we only evaluated SNPs that are characterized by the HapMap project, so unknown SNPs that are located in protein coding regions were not evaluated. Second, the current resources that are commonly used for gene annotation, RefSeq and Ensembl, likely represent an incomplete catalogue of human genes. SNPs may be located within exons that are not yet identified. In fact, about 60% of GENCODE exons (GENCODE is a sub-project of ENCODE, which aimed to provide a reference annotation of all protein-coding genes within the pilot study of the ENCODE project) are not annotated in RefSeq and Ensembl (25). This indicates that a large number of alternative splice forms with unique exons exist across the genome (25). With the completion of ENCODE in the near future, a richer and more complete annotation of human genes should provide more insight.
We also did not observe significant enrichment of risk SNP blocks in the regulatory annotation sets defined by the UCSC table browser. However, the null results for these annotation sets need to be interpreted with caution. The majority of the regulatory element annotation sets are defined by computational algorithms, rather than by biological experiments. In addition, the regulatory elements predicted in those annotation sets are not tissue specific.
In summary, our study is among the first to comprehensively evaluate the potential functional impact of risk loci identified for PCa through GWAS. The fact that about one third of the 33 SNP blocks fall within AR binding regions, and that the risk SNPs were statistically enriched in AR regions, may suggest a potential molecular mechanism by which risk SNPs contribute to PCa initiation. These results also provide guidance for future functional studies. In addition, the databases used for bioinformatics annotation could also be used to annotate and prioritize variants identified through GWAS and whole-genome sequencing.
Supplementary Material
Table 3.
Enrichment analysis of the 33 risk SNP blocks
Annotation type* | # of counts and frequency (%) in the risk SNP blocks | Mean # of counts and frequency (%) in the randomly generated SNP blocks | P-value# |
---|---|---|---|
Yale TFBS | |||
c-Myc | 5 (15.63) | 3.43 (11) | 0.26 |
TCF7L2 | 5 (15.63) | 3.08 (10) | 0.18 |
STAT1 | 6 (18.75) | 3.61 (11) | 0.13 |
c-Fos | 5 (15.63) | 3.58 (11) | 0.28 |
Broad Histone | |||
H3K4me1 | 23 (71.88) | 19.88 (62) | 0.18 |
H3K4me2 | 16 (50.00) | 14.04 (44) | 0.32 |
H3K4me3 | 8 (25.00) | 5.72 (18) | 0.19 |
H3K27ac | 5 (15.63) | 4.29 (13) | 0.43 |
Regulatory elements defined by UCSC table browser | |||
laminB1 | 24 (75.00) | 24.61 (76.91) | 0.53 |
tfbsConsSites | 6 (18.75) | 8.93 (28) | 0.82 |
Conserved region | |||
Conserved region | 10 (31.25) | 8.69 (27.16) | 0.38 |
Transcription factor binding sites defined by ChIP-chip technology | |||
AR binding | 11 (34.38) | 3.99 (12.47) | 0.003 |
FoxA1 binding | 7 (21.88) | 3.47 (10.84) | 0.05 |
*Enrichment analysis was only performed for annotation sets with 5 or more mapped risk SNP blocks.
#P-value is based on 1,000 simulation replicates.
Acknowledgments
Grants: National Cancer Institute (CA129684, CA140262, CA148463 to J.X.) and Department of Defense (W81XWH-09-1-0488 to J.S.)
References
- 1.Yeager M, Orr N, Hayes RB, et al. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat Genet. 2007;39:645–649. doi: 10.1038/ng2022. [DOI] [PubMed] [Google Scholar]
- 2.Gudmundsson J, Sulem P, Steinthorsdottir V, et al. Two variants on chromosome 17 confer prostate cancer risk, and the one in TCF2 protects against type 2 diabetes. Nat Genet. 2007;39:977–983. doi: 10.1038/ng2062. [DOI] [PubMed] [Google Scholar]
- 3.Duggan D, Zheng SL, Knowlton M, et al. Two genome-wide association studies of aggressive prostate cancer implicate putative prostate tumor suppressor gene DAB2IP. J Natl Cancer Inst. 2007;99:1836–1844. doi: 10.1093/jnci/djm250. [DOI] [PubMed] [Google Scholar]
- 4.Thomas G, Jacobs KB, Yeager M, et al. Multiple loci identified in a genome-wide association study of prostate cancer. Nat Genet. 2008;40:310–315. doi: 10.1038/ng.91. [DOI] [PubMed] [Google Scholar]
- 5.Gudmundsson J, Sulem P, Rafnar T, et al. Common sequence variants on 2p15 and Xp11.22 confer susceptibility to prostate cancer. Nat Genet. 2008;40:281–283. doi: 10.1038/ng.89. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Eeles RA, Kote-Jarai Z, Giles GG, et al. Multiple newly identified loci associated with prostate cancer susceptibility. Nat Genet. 2008;40:316–321. doi: 10.1038/ng.90. [DOI] [PubMed] [Google Scholar]
- 7.Yeager M, Chatterjee N, Ciampa J, et al. Identification of a new prostate cancer susceptibility locus on chromosome 8q24. Nat Genet. 2009;41:1055–1057. doi: 10.1038/ng.444. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Gudmundsson J, Sulem P, Gudbjartsson DF, et al. Genome-wide association and replication studies identify four variants associated with prostate cancer susceptibility. Nat Genet. 2009;41:1122–1126. doi: 10.1038/ng.448. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Eeles RA, Kote-Jarai Z, Al Olama AA, et al. Identification of seven new prostate cancer susceptibility loci through a genome-wide association study. Nat Genet. 2009;41:1116–1121. doi: 10.1038/ng.450. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Ghoussaini M, Song H, Koessler T, Al Olama AA, Kote-Jarai Z, Driver KE, Pooley KA, Ramus SJ, Kjaer SK, Hogdall E, DiCioccio RA, Whittemore AS, Gayther SA, Giles GG, Guy M, Edwards SM, Morrison J, Donovan JL, Hamdy FC, Dearnaley DP, Ardern-Jones AT, Hall AL, O'Brien LT, Gehr-Swain BN, Wilkinson RA, Brown PM, Hopper JL, Neal DE, Pharoah PD, Ponder BA, Eeles RA, Easton DF, Dunning AM UK Genetic Prostate Cancer Study Collaborators/British Association of Urological Surgeons' Section of Oncology; UK ProtecT Study Collaborators. Multiple loci with different cancer specificities within the 8q24 gene desert. J Natl Cancer Inst. 2008 Jul 2;100(13):962–966. doi: 10.1093/jnci/djn190. Epub 2008 Jun 24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Lou H, Yeager M, Li H, Bosquet JG, Hayes RB, Orr N, Yu K, Hutchinson A, Jacobs KB, Kraft P, Wacholder S, Chatterjee N, Feigelson HS, Thun MJ, Diver WR, Albanes D, Virtamo J, Weinstein S, Ma J, Gaziano JM, Stampfer M, Schumacher FR, Giovannucci E, Cancel-Tassin G, Cussenot O, Valeri A, Andriole GL, Crawford ED, Anderson SK, Tucker M, Hoover RN, Fraumeni JF, Jr, Thomas G, Hunter DJ, Dean M, Chanock SJ. Fine mapping and functional analysis of a common variant in MSMB on chromosome 10q11.2 associated with prostate cancer susceptibility. Proc Natl Acad Sci U S A. 2009 May 12;106(19):7933–7938. doi: 10.1073/pnas.0902104106. Epub 2009 Apr 21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Chang BL, Cramer SD, Wiklund F, Isaacs SD, Stevens VL, Sun J, Smith S, Pruett K, Romero LM, Wiley KE, Kim ST, Zhu Y, Zhang Z, Hsu FC, Turner AR, Adolfsson J, Liu W, Kim JW, Duggan D, Carpten J, Zheng SL, Rodriguez C, Isaacs WB, Grönberg H, Xu J. Fine mapping association study and functional analysis implicate a SNP in MSMB at 10q11 as a causal variant for prostate cancer risk. Hum Mol Genet. 2009 Apr 1;18(7):1368–1375. doi: 10.1093/hmg/ddp035. Epub 2009 Jan 19. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.The ENCODE Project Consortium Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 Jun 14;447(7146):799–816. doi: 10.1038/nature05874. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Euskirchen GM, Rozowsky JS, Wei CL, Lee WH, Zhang ZD, Hartman S, Emanuelsson O, Stolc V, Weissman S, Gerstein MB, Ruan Y, Snyder M. Mapping of transcription factor binding regions in mammalian cells by ChIP: comparison of array- and sequencing-based technologies. Genome Res. 2007 Jun 17;6:898–909. doi: 10.1101/gr.5583007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Wang Q, Li W, Zhang Y, Yuan X, Xu K, Yu J, et al. Androgen receptor regulates a distinct transcription program in androgen-independent prostate cancer. Cell. 2009 Jul 23;138(2):245–256. doi: 10.1016/j.cell.2009.04.056. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. Carroll JS, Meyer CA, Song J, Li W, Geistlinger TR, Eeckhoute J, Brodsky AS, Keeton EK, Fertuck KC, Hall GF, Wang Q, Bekiranov S, Sementchenko V, Fox EA, Silver PA, Gingeras TR, Liu XS, Brown M. Genome-wide analysis of estrogen receptor binding sites. Nat Genet. 2006;38:1289–1297. doi: 10.1038/ng1901.
- 17. Xu J, Sun J, Kader AK, Lindström S, Wiklund F, Hsu FC, Johansson JE, Zheng SL, Thomas G, Hayes RB, Kraft P, Hunter DJ, Chanock SJ, Isaacs WB, Grönberg H. Estimation of absolute risk for prostate cancer using genetic markers and family history. Prostate. 2009 Oct 1;69(14):1565–1572. doi: 10.1002/pros.21002.
- 18. Hsu FC, Sun J, Zhu Y, Kim ST, Jin T, Zhang Z, Wiklund F, Kader AK, Zheng SL, Isaacs W, Grönberg H, Xu J. Comparison of two methods for estimating absolute risk of prostate cancer based on single nucleotide polymorphisms and family history. Cancer Epidemiol Biomarkers Prev. 2010 Apr;19(4):1083–1088. doi: 10.1158/1055-9965.EPI-09-1176.
- 19. Sun J, Kader AK, et al. Inherited genetic markers discovered to date are able to identify a significant number of men at considerably elevated risk for prostate cancer. Prostate. doi: 10.1002/pros.21256. (In press)
- 20. Thompson IM, Goodman PJ, Tangen CM, et al. The influence of finasteride on the development of prostate cancer. N Engl J Med. 2003;349:215–224. doi: 10.1056/NEJMoa030660.
- 21. Andriole GA, Bostwick D, Brawley OW. The influence of dutasteride on the risk of biopsy-detectable prostate cancer: Outcomes of the REduction by DUtasteride of Prostate Cancer Events (REDUCE) study. N Engl J Med. 2010 Apr 1;362(13):1192–1202. doi: 10.1056/NEJMoa0908127.
- 22. Johnson WE, Li W, Meyer CA, Gottardo R, Carroll JS, Brown M, Liu XS. Model-based analysis of tiling-arrays for ChIP-chip. Proc Natl Acad Sci U S A. 2006 Aug 15;103(33):12457–12462. doi: 10.1073/pnas.0601180103.
- 23. Carroll JS, Liu XS, Brodsky AS, Li W, Meyer CA, Szary AJ, Eeckhoute J, Shao W, Hestermann EV, Geistlinger TR, Fox EA, Silver PA, Brown M. Chromosome-wide mapping of estrogen receptor binding reveals long-range regulation requiring the forkhead protein FoxA1. Cell. 2005 Jul 15;122(1):33–43. doi: 10.1016/j.cell.2005.05.008.
- 24. Wang Q, Li W, Liu XS, Carroll JS, Jänne OA, Keeton EK, Chinnaiyan AM, Pienta KJ, Brown M. A hierarchical network of transcription factors governs androgen receptor-dependent prostate cancer growth. Mol Cell. 2007 Aug 3;27(3):380–392. doi: 10.1016/j.molcel.2007.05.041.
- 25. Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D, Rossier C, Ucla C, Hubbard T, Antonarakis SE, Guigo R. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1–S4.9. doi: 10.1186/gb-2006-7-s1-s4. Epub 2006 Aug 7.
- 26. Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261–282. doi: 10.1016/0022-2836(87)90689-9.
- 27. Washietl S, Pedersen JS, Korbel JO, Stocsits C, Gruber AR, Hackermüller J, Hertel J, Lindemeyer M, Reiche K, Tanzer A, Ucla C, Wyss C, Antonarakis SE, Denoeud F, Lagarde J, Drenkow J, Kapranov P, Gingeras TR, Guigó R, Snyder M, Gerstein MB, Reymond A, Hofacker IL, Stadler PF. Structured RNAs in the ENCODE selected regions of the human genome. Genome Res. 2007 Jun;17(6):852–864. doi: 10.1101/gr.5650707.
- 28. Down TA, Hubbard TJP. Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Res. 2002 Mar;12(3):458–461. doi: 10.1101/gr.216102.
- 29. Davuluri RV, Grosse I, Zhang MQ. Computational identification of promoters and first exons in the human genome. Nat Genet. 2001 Dec;29(4):412–417. doi: 10.1038/ng780. Erratum in: Nat Genet 2002 Nov;32(3):459.
- 30. Guelen L, Pagie L, Brasset E, Meuleman W, Faza MB, Talhout W, Eussen BH, de Klein A, Wessels L, de Laat W, van Steensel B. Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature. 2008 Jun 12;453:948–951. doi: 10.1038/nature06947.
- 31. Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, Minovitsky S, Dubchak I, Holt A, Lewis KD, et al. In vivo enhancer analysis of human conserved non-coding sequences. Nature. 2006 Nov 23;444(7118):499–502. doi: 10.1038/nature05295.
- 32. Cheng Y, Miura RM, Tian B. Prediction of mRNA polyadenylation sites by support vector machine. Bioinformatics. 2006;22:2320–2325. doi: 10.1093/bioinformatics/btl394.
- 33. Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005;120:15–20. doi: 10.1016/j.cell.2004.12.035.
- 34. Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell. 2007;27:91–105. doi: 10.1016/j.molcel.2007.06.017.
- 35. Friedman RC, Farh KK, Burge CB, Bartel DP. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 2009;19:92–105. doi: 10.1101/gr.082701.108.
- 36. Down TA, Hubbard TJ. Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Res. 2002;12(3):458–461. doi: 10.1101/gr.216102.