Abstract
Genome-wide association studies (GWAS) have transformed our understanding of testicular germ cell tumour (TGCT) susceptibility but much of the heritability remains unexplained. Here we report a new GWAS, a meta-analysis with previous GWAS and a replication series, totalling 7,319 TGCT cases and 23,082 controls. We identify 19 new TGCT risk loci, approximately doubling the number of known TGCT risk loci to 44. By performing in-situ Hi-C in TGCT cells, we provide evidence for a network of physical interactions between all 44 TGCT risk SNPs and candidate causal genes. Our findings reveal widespread disruption of developmental transcriptional regulators as a basis of TGCT susceptibility, consistent with failed primordial germ cell differentiation as an initiating step in oncogenesis1. Defective microtubule assembly and dysregulation of KIT-MAPK signalling also feature as recurrently disrupted pathways. Our findings support a polygenic model of risk and provide insight into the biological basis of TGCT.
Keywords: Testicular Cancer, Germ Cell Tumour, TGCT, GWAS, Oncoarray
Testicular germ cell tumour (TGCT) is the most common cancer in men aged 18-45, with over 52,000 new cases diagnosed annually worldwide2. The development of TGCT is strongly influenced by inherited genetic factors, which contribute to nearly half of all disease risk3 and are reflected in the 4- to 8-fold increased risk shown in siblings of cases4–7. Our understanding of TGCT susceptibility has been transformed by recent genome-wide association studies (GWAS), which have so far identified 25 independent risk loci for TGCT8–18. Although projections indicate that additional risk variants for TGCT can be discovered by GWAS19, studies to date have been based on comparatively small sample sizes, which have had limited power to detect common risk variants20.
To gain a more comprehensive insight into TGCT aetiology we performed a new GWAS with substantially increased power, followed by a meta-analysis with existing GWAS and replication genotyping (totalling 7,319 cases/23,082 controls). Here we report both the discovery of 19 new TGCT susceptibility loci and refined risk estimates for the previously reported loci. In addition, we have investigated the gene regulatory mechanisms underlying the genetic associations observed at all 44 TGCT GWAS risk loci by performing in-situ chromosome conformation capture in TGCT cells (Hi-C) to characterize chromatin interactions between predisposition SNPs and target genes, integrating these data with a range of publicly available TGCT functional genomics data.
We conducted a new GWAS using the Oncoarray platform (3,206 UK TGCT cases/7,422 UK controls), followed by a meta-analysis combining the two largest published TGCT GWAS datasets11,16 (986 UK cases/4,946 UK controls, 1,327 Scandinavian cases/6,687 Scandinavian controls) (Fig. 1). To increase genomic resolution, we imputed >10 million SNPs using the 1000 Genomes Project as a reference panel. Quantile-Quantile (Q-Q) plots for SNPs with minor allele frequency (MAF) >5% post imputation did not show evidence of substantive over-dispersion (λ1000=1.03, Supplementary Fig. 1). We derived joint odds ratios (ORs) and 95% confidence intervals (CIs) under a fixed-effects model for each SNP with MAF >0.01. Finally we sought validation of 37 SNPs associated at P < 5.0 x 10-6, which did not map to known TGCT risk loci and displayed a consistent OR across all GWAS datasets, by genotyping an additional 1,801 TGCT cases and 4,027 controls from the UK. After meta-analysis of the three GWAS and replication series, we identified genome-wide significant associations (i.e. P < 5 x 10-8) at 19 new loci (Table 1). We found no evidence for significant interactions between risk loci.
Figure 1.
Study design.
Table 1.
Summary of genotyping results for all genome-wide TGCT risk SNPs (n=44). New loci (n=19) discovered through this study are marked in bold.
| SNP1 | Chr. | Pos. (b37) | Alleles2 | RAF3 | Oncoarray OR4 (95% CI) | Oncoarray Ptrend5 | Discovery Meta OR (95% CI) | Discovery Meta Ptrend | Replication OR (95% CI) | Replication Ptrend | Combined-Meta Pmeta6 | Combined-Meta I2 7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs4240895 | 1 | 9713386 | C/T | 0.39 | 1.13 (1.07-1.19) | 7.4x10-05 | 1.14 (1.09-1.19) | 1.6x10-07 | 1.24 (1.16-1.32) | 1.7X10-07 | 5.6X10-13 | 47 |
| rs2072499 | 1 | 156169610 | A/G | 0.36 | 1.14 (1.08-1.2) | 2.1x10-05 | 1.18 (1.13-1.23) | 1.9x10-10 | - | - | 1.9X10-10 | 45 |
| rs3790672 | 1 | 165873392 | T/C | 0.29 | 1.17 (1.10-1.23) | 3.2x10-06 | 1.20 (1.14-1.25) | 4.5x10-11 | - | - | 4.5X10-11 | 16 |
| rs7581030 | 2 | 71572455 | C/T | 0.24 | 1.15 (1.08-1.22) | 5.4x10-05 | 1.17 (1.12-1.23) | 7.6x10-09 | 1.17 (1.08-1.26) | 6.2X10-04 | 1.9X10-11 | 0 |
| rs10510452 | 3 | 16625048 | A/G | 0.70 | 1.15 (1.09-1.21) | 1.7x10-05 | 1.18 (1.13-1.24) | 6.7x10-10 | - | - | 6.7X10-10 | 41 |
| rs11705932 | 3 | 141818850 | C/T | 0.80 | 1.19 (1.11-1.26) | 6.8x10-06 | 1.18 (1.11-1.24) | 2.9x10-07 | - | - | 2.9X10-07 | 39 |
| rs1510272 | 3 | 156300724 | C/T | 0.75 | 1.19 (1.13-1.26) | 3.9x10-07 | 1.23 (1.17-1.29) | 1.4x10-12 | - | - | 1.4X10-12 | 32 |
| rs6821144 | 4 | 76520651 | G/A | 0.89 | 1.18 (1.08-1.28) | 7.9x10-04 | 1.22 (1.14-1.30) | 1.5x10-06 | 1.35 (1.23-1.47) | 9.8X10-07 | 1.8X10-11 | 15 |
| rs17021463 | 4 | 95224812 | T/G | 0.43 | 1.14 (1.08-1.2) | 7.0x10-06 | 1.14 (1.10-1.19) | 3.6x10-08 | - | - | 3.6X10-08 | 0 |
| rs2720460 | 4 | 104054686 | A/G | 0.63 | 1.25 (1.18-1.31) | 1.7x10-12 | 1.27 (1.21-1.32) | 4.8x10-20 | - | - | 4.8X10-20 | 9 |
| rs4862848 | 4 | 188921440 | A/G | 0.35 | 1.17 (1.10-1.25) | 1.5x10-05 | 1.21 (1.16-1.27) | 4.5x10-12 | 1.10 (1.02-1.18) | 2.4X10-02 | 2.4X10-12 | 61 |
| rs2736100 | 5 | 1286516 | C/A | 0.51 | 1.24 (1.18-1.3) | 5.8x10-13 | 1.28 (1.23-1.33) | 2.3x10-24 | - | - | 2.3X10-24 | 33 |
| rs3805663 | 5 | 134342720 | C/A | 0.58 | 1.09 (1.03-1.15) | 3.4x10-03 | 1.13 (1.08-1.18) | 4.1x10-07 | - | - | 4.1X10-07 | 49 |
| rs4624820 | 5 | 141681788 | G/A | 0.56 | 1.46 (1.4-1.52) | 4.0x10-36 | 1.47 (1.43-1.52) | 2.7x10-57 | - | - | 2.7X10-57 | 0 |
| rs210138 | 6 | 33542538 | A/G | 0.21 | 1.42 (1.35-1.49) | 8.5x10-22 | 1.48 (1.42-1.54) | 2.9x10-37 | - | - | 2.9X10-37 | 70 |
| rs11155671 | 6 | 149972132 | G/A | 0.66 | 1.15 (1.09-1.21) | 1.1x10-05 | 1.14 (1.09-1.20) | 3.4x10-07 | 1.14 (1.05-1.22) | 3.3X10-03 | 3.9X10-09 | 0 |
| rs12699477 | 7 | 1968953 | T/C | 0.39 | 1.22 (1.16-1.28) | 1.9x10-10 | 1.2 (1.15-1.25) | 9.9x10-13 | - | - | 9.9X10-13 | 0 |
| rs17689040 | 7 | 40920313 | C/G | 0.43 | 1.16 (1.10-1.22) | 1.2x10-06 | 1.15 (1.10-1.20) | 3.4x10-08 | 1.17 (1.09-1.25) | 1.2X10-04 | 1.9X10-11 | 0 |
| rs17153755 | 8 | 11611500 | C/G | 0.65 | 1.20 (1.14-1.26) | 1.0x10-08 | 1.16 (1.11-1.21) | 1.5x10-08 | 1.05 (0.97-1.14) | 2.8X10-01 | 4.5X10-08 | 69 |
| rs7010162 | 8 | 70976505 | C/T | 0.62 | 1.13 (1.07-1.19) | 5.9x10-05 | 1.15 (1.10-1.20) | 3.4x10-08 | - | - | 3.4X10-08 | 0 |
| rs7040024 | 9 | 845516 | A/C | 0.77 | 1.48 (1.41-1.55) | 1.6x10-27 | 1.53 (1.47-1.59) | 1.4x10-45 | - | - | 1.4X10-45 | 42 |
| rs7107174 | 11 | 77996403 | C/A | 0.22 | 1.15 (1.08-1.22) | 1.1x10-04 | 1.15 (1.09-1.20) | 1.8x10-06 | - | - | 1.8X10-06 | 0 |
| rs648090 | 11 | 125071163 | A/G | 0.71 | 1.18 (1.11-1.24) | 6.2x10-07 | 1.15 (1.10-1.20) | 9.6x10-08 | 1.24 (1.15-1.33) | 2.4X10-06 | 2.9X10-12 | 24 |
| rs2900333 | 12 | 14653867 | C/T | 0.63 | 1.17 (1.11-1.23) | 3.2x10-07 | 1.2 (1.15-1.25) | 8.0x10-13 | - | - | 8.0X10-13 | 16 |
| rs4931000 | 12 | 32141495 | A/G | 0.22 | 1.17 (1.09-1.24) | 2.2x10-05 | 1.17 (1.11-1.22) | 8.0x10-08 | 1.23 (1.13-1.32) | 2.3X10-05 | 1.2X10-11 | 0 |
| rs7315956 | 12 | 70563865 | A/G | 0.34 | 1.13 (1.07-1.19) | 1.5x10-04 | 1.14 (1.09-1.19) | 4.8x10-07 | 1.16 (1.08-1.25) | 2.9X10-04 | 6.5X10-10 | 0 |
| rs3782181 | 12 | 88953561 | C/A | 0.81 | 2.07 (1.99-2.14) | 5.9x10-81 | 2.07 (2.01-2.13) | 1.4x10-129 | - | - | 1.4X10-129 | 40 |
| rs1009647 | 14 | 55880047 | G/A | 0.73 | 1.13 (1.06-1.19) | 3.5x10-04 | 1.15 (1.10-1.21) | 5.0x10-07 | 1.12 (1.03-1.21) | 1.7X10-02 | 3.0X10-08 | 0 |
| rs11071896 | 15 | 66821250 | A/G | 0.26 | 1.17 (1.10-1.23) | 5.0x10-06 | 1.19 (1.13-1.25) | 9.6x10-10 | 1.18 (1.09-1.27) | 2.1X10-04 | 8.7X10-13 | 0 |
| rs56046484 | 15 | 85605427 | G/T | 0.80 | 1.17 (1.10-1.24) | 3.1x10-05 | 1.18 (1.11-1.24) | 2.4x10-07 | 1.11 (1.01-1.21) | 3.8X10-02 | 4.2X10-08 | 0 |
| rs4561483 | 16 | 11920037 | A/G | 0.35 | 1.11 (1.05-1.18) | 5.7x10-04 | 1.14 (1.09-1.19) | 2.6x10-07 | - | - | 2.6X10-07 | 0 |
| rs7404843 | 16 | 15530708 | T/G | 0.11 | 1.21 (1.11-1.3) | 6.8x10-05 | 1.27 (1.20-1.35) | 1.7x10-11 | 1.17 (1.05-1.29) | 9.8X10-03 | 1.2X10-12 | 44 |
| rs8046148 | 16 | 50142944 | A/G | 0.79 | 1.10 (1.03-1.17) | 8.6x10-03 | 1.16 (1.1-1.21) | 5.2x10-07 | - | - | 5.2X10-07 | 59 |
| rs4888262 | 16 | 74670458 | C/T | 0.50 | 1.17 (1.11-1.23) | 2.0x10-07 | 1.18 (1.13-1.23) | 1.2x10-11 | - | - | 1.2X10-11 | 0 |
| rs55637647 | 16 | 88549264 | C/G | 0.38 | 1.15 (1.09-1.22) | 6.9x10-06 | 1.17 (1.12-1.22) | 2.8x10-09 | - | - | 2.8X10-09 | 0 |
| rs7501939 | 17 | 36101156 | T/C | 0.61 | 1.22 (1.16-1.28) | 4.4x10-11 | 1.26 (1.21-1.3) | 1.5x10-20 | - | - | 1.5X10-20 | 53 |
| rs9905704 | 17 | 56632543 | G/T | 0.68 | 1.27 (1.20-1.33) | 2.2x10-13 | 1.27 (1.22-1.32) | 3.1x10-20 | - | - | 3.1X10-20 | 0 |
| rs9966612 | 18 | 649311 | A/G | 0.32 | 1.15 (1.08-1.23) | 1.1x10-04 | 1.17 (1.11-1.22) | 2.4x10-08 | 1.13 (1.05-1.22) | 5.1X10-03 | 5.1X10-10 | 0 |
| rs2195987 | 19 | 24149545 | C/T | 0.83 | 1.19 (1.08-1.29) | 1.1x10-03 | 1.22 (1.15-1.29) | 8.3x10-09 | - | - | 8.3X10-09 | 0 |
| rs2241024 | 19 | 28257393 | G/A | 0.80 | 1.24 (1.17-1.31) | 4.4x10-09 | 1.23 (1.16-1.29) | 1.3x10-10 | 1.32 (1.22-1.42) | 6.3X10-08 | 9.5X10-17 | 29 |
| rs4599029 | 19 | 54284689 | G/T | 0.74 | 1.18 (1.12-1.25) | 7.7x10-07 | 1.16 (1.11-1.22) | 5.3x10-08 | 1.10 (1.01-1.19) | 3.8X10-02 | 9.9X10-09 | 22 |
| rs12481572 | 20 | 50708054 | A/T | 0.20 | 1.21 (1.13-1.28) | 7.6x10-07 | 1.20 (1.14-1.27) | 1.1x10-08 | 1.23 (1.13-1.33) | 3.7X10-05 | 2.0X10-12 | 0 |
| rs2839186 | 21 | 47690068 | C/T | 0.48 | 1.17 (1.11-1.23) | 1.5x10-07 | 1.18 (1.14-1.23) | 6.7x10-12 | - | - | 6.7X10-12 | 0 |
| rs739525 | 22 | 21332441 | T/C | 0.53 | 1.13 (1.06-1.19) | 1.8x10-04 | 1.14 (1.09-1.19) | 2.0x10-07 | 1.10 (1.02-1.18) | 2.2X10-02 | 1.9X10-08 | 0 |
1 dbSNP rs number
2 Alleles
3 Risk Allele Frequency
4 OR: per allele odds ratio
5 Ptrend: P-value for trend, via logistic regression
6 Pmeta: P-value for fixed effects meta-analysis
7 I2: heterogeneity index (0-100)
To the extent that they have been deciphered, many GWAS risk loci map to non-coding regions of the genome and influence gene regulation. Across the 44 independent TGCT risk loci (19 new and 25 previously reported), we confirmed a significant enrichment of enhancer/promoter associated histone marks, including H3K4me1, H3K4me3 and H3K9ac, using available ChIP-Seq data from the TGCT cell line NTERA2 (P<5.0x10-3) (Supplementary Table 1). Moreover this enrichment showed tissue specificity when compared to 41 other cell lines from the ENCODE21 project (Supplementary Fig. 2). These observations support the assertion that the TGCT predisposition loci influence risk through effects on cis-regulatory networks, and are involved in transcriptional initiation and enhancement. Since genomic spatial proximity and chromatin looping interactions are fundamental for regulation of gene expression we performed in situ capture Hi-C of promoters in NTERA2 cells to link risk loci to candidate target genes. We also sought to gain insight into the possible biological mechanisms for the associations by performing tissue-specific expression quantitative trait loci (eQTL) analysis for all risk SNP and target gene pairs (Supplementary Fig. 3, Supplementary Table 2). We analysed RNA-seq data from both normal testis (GTEx project22) and TGCT (TCGA), acknowledging that the latter may be affected by the issue of tumour purity, in addition to dysregulated gene expression that typifies cancer. Accepting this limitation and that further validation may be required, eQTL analysis was conducted in both datasets based on the established network of enhancer/promoter variants, to maximise our ability to find statistically significant associations after correcting for multiple testing. We additionally annotated risk loci with variants predicted to disrupt binding motifs of germ cell specific transcription factors (TF) (see methods). 
Finally, direct promoter variants and non-synonymous coding mutations for genes within the 44 risk loci were denoted (Table 2, Fig. 2).
Table 2.
Functional annotation of all 44 TGCT risk loci. Novel risk loci are highlighted in bold. Candidate causal genes are assigned based on the presence of functional data. Where data support multiple candidates, the gene with the highest number of individual functional data points is assigned as the candidate. Where multiple genes have the same number of data points, all possible genes are listed. Competing mechanisms for the same gene (e.g. both coding and promoter variants) were allowed.
| SNP | Cyto-band | bp (b37) | Genes in LD Block | Functional Evidence: Coding Variant | Functional Evidence: Promoter Variant | Functional Evidence: Functional Chromatin (ChIP-seq peaks) | Functional Evidence: TF binding motif disruption | Functional Evidence: Hi-C Contact(s) | Functional Evidence: eQTL | Functional Evidence: Functional Study | Candidate causal Gene(s) | Functional Pathway |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs4240895 | 1p36.22 | 9,713,386 | PIK3CD C1orf200 | KLF4 | ||||||||
| rs2072499 | 1q22 | 156,169,610 | KIAA0446 SLC25A44 | PMF1 | H3k4me1, H3k4me3, H3k9ac | PRDM1, CTCF | BGLAP, CCT3, PAQR6, PMF1, SEMA4A, UBQLN4 | CCT31 | PMF1 | Microtubule/chromosomal assembly | ||
| CCT3 | ||||||||||||
| rs3790672 | 1q24.1 | 165,873,392 | UCK2 | GATA, NANOG, LHX8, POU5F1, SOX9, PRDM1, CTCF | ||||||||
| rs7581030 | 2p13.3 | 71,572,455 | ZNF638 | ZNF638 | H3k4me3, H3k9ac | NANOG, POU5F1 | PAIP2B | ZNF638 | Transcriptional Regulation | |||
| rs10510452 | 3p24.3 | 16,625,048 | DAZL | GATA, NANOG, POU5F1 | OXNAD1 | OXNAD1 | ||||||
| rs11705932 | 3q23 | 141,818,850 | TFDP2 | |||||||||
| rs1510272 | 3q25.31 | 156,300,724 | GATA, POU5F1, CTCF | |||||||||
| rs6821144 | 4q21.1 | 76,520,651 | CDKL2 G3BP2 | H3k4me3, H3k9ac | SOX2 | G3BP2 | G3BP2 | |||||
| rs17021463 | 4q22.3 | 95,224,812 | SMARCAD1 HPGDS | SMARCAD1 | SMARCAD1 | H3k4me3, H3k9ac | GATA, KLF4, NANOG, POU5F1, PRDM1 | ATOH1 | SMARCAD12 | SMARCAD1 | Transcriptional Regulation | |
| rs2720460 | 4q24 | 104,054,686 | CENPE | CENPE | H3k4me3, H3k9ac | GATA, NANOG, LHX8, POU5F1, DMRT1 | MANBA, NFKB1, SLC39A8, TACR3 | MANBA2 | CENPE | Microtubule/chromosomal assembly | ||
| MANBA | ||||||||||||
| rs4862848 | 4q35.2 | 188,921,440 | ZFP42 | H3k4me3, H3k9ac | ||||||||
| rs2736100 | 5p15.33 | 1,286,516 | TERT | |||||||||
| rs3805663 | 5q31.1 | 134,366,200 | CATSPER3 PITX1 AK026965 | H3k4me3, H3k9ac | ||||||||
| rs4624820 | 5q31.3 | 141,681,788 | SPRY4 | |||||||||
| rs210138 | 6p21.31 | 33,542,538 | BAK1 AY383626 C6orf227 | BAK1 | H3k4me3, H3k9ac | GRM4 | BAK12 | BAK1 | KIT-MAPK | |||
| rs11155671 | 6q25.1 | 149,972,132 | KATNA1 LATS1 | GATA, KLF4, NANOG, SOX2, POU5F1, DMRT1, SOX9, PRDM1, CTCF | ||||||||
| rs12699477 | 7p22.3 | 1,968,953 | MAD1L1 | H3k4me1, H3k9ac | NANOG, CTCF | |||||||
| rs17689040 | 7p14.1 | 40,920,313 | ||||||||||
| rs17153755 | 8p23.1 | 11,611,500 | GATA4 c8orf | H3k4me3, H3k9ac | CTCF | GATA4 | GATA42 | GATA4 | Transcriptional Regulation | |||
| rs7010162 | 8q13.3 | 70,976,505 | PRDM14 | PRDM14 | GATA, PRDM1 | PRDM14 | Transcriptional Regulation | |||||
| rs7040024 | 9p24.3 | 845,516 | DMRT1 | DMRT1 | H3k4me3, H3k9ac | GATA, KLF4, CTCF | DMRT1 | Transcriptional Regulation | ||||
| rs7107174 | 11q14.1 | 77,997,936 | GAB2 | USP35 | GAB2 | H3k4me3, H3k9ac | GATA, KLF4, NANOG, LHX8, | ALG8, GAB2, NARS2 | GAB2 | KIT-MAPK | ||
| USP35 | ||||||||||||
| rs648090 | 11q24.2 | 125,071,163 | PKNOX2 | H3k4me1 | CTCF | |||||||
| rs2900333 | 12p13.1 | 14,653,867 | ATF7IP PLBD1 | ATF7IP | ATF7IP | GATA, NANOG, SOX2, POU5F1, CTCF | ATF7IP | Telomerase Function | ||||
| rs4931000 | 12p11.21 | 32,141,495 | C12orf35 | KIAA1551 | NANOG, CTCF | KIAA1551 | ||||||
| rs7315956 | 12q15 | 70,563,865 | CNOT2 KCNMB4 | CNOT2 | GATA, NANOG, LHX8, SOX2 | CNOT2 | Transcriptional Regulation | |||||
| rs3782181 | 12q21.32 | 88,953,561 | KITLG | H3k4me3, H3k9ac | GATA, NANOG, SOX2, POU5F1, DMRT1, PRDM1, CTCF | KITLG | KITLG | KIT-MAPK | ||||
| rs1009647 | 14q22.3 | 55,880,047 | ATG14 DLGAP5 FBXO34 LGALS3 TBPL2 | ATG14 | ATG141 | ATG14 | Autophagy | |||||
| rs11071896 | 15q22.31 | 66,821,250 | MAP2K1 TIPIN | ZWILCH | MAP2K1 | H3k4me1, H3k4me3, H3k9ac | GATA, NANOG, SOX2, DMRT1, SOX9, PRDM1, | DENND4A, IGDCC4, LCTL, MAP2K1, RAB11A, RPL4, SCARNA14, SNAPC5 | SNAPC52 | MAP2K1 | KIT-MAPK | |
| SNAPC5 | ||||||||||||
| ZWILCH | ||||||||||||
| rs56046484 | 15q25.2 | 85,605,427 | PDE8A SLC28A1 | GATA, NANOG, LHX8, CTCF | SLC28A1, WDR73 | WDR732 | WDR73 | Microtubule/chromosomal assembly | ||||
| rs4561483 | 16p13.13 | 11,920,037 | BCAR4 CATX-11 RSL1D1 | KLF4, NANOG, CTCF | LITAF | GSPT13 | GSPT1 | Cell cycle | ||||
| rs7404843 | 16p13.11 | 15,530,708 | MPV17L | GATA, POU5F1, CTCF | ||||||||
| rs8046148 | 16q12.1 | 50,142,944 | HEATR3 AF086132 | HEATR3 | H3k4me1, H3k4me3, H3k9ac | GATA, SOX2, CTCF | HEATR3 | HEATR31 | HEATR3 | |||
| rs4888262 | 16q23.1 | 74,670,458 | RFWD3 | RFWD3 | H3k4me1, H3k4me3, H3k9ac | GATA, KLF4, NANOG, PRDM1 | CLEC18C, LDHD, RFWD3, WDR59 | RFWD31 | RFWD3 | Apoptosis/p53 pathway | ||
| rs55637647 | 16q24.2 | 88,549,264 | ZFPM1 | ZFPM1 | H3k4me1, H3k9ac | KLF4 | ZFPM1 | Transcriptional Regulation | ||||
| rs7501939 | 17q12 | 36,101,156 | HNF1B | HNF1B | H3k4me3, H3k9ac | GATA, NANOG, SOX2, POU5F1, SOX9 | HNF1B | Transcriptional Regulation | ||||
| rs9905704 | 17q22 | 56,632,543 | TEX14 | TEX14 | GATA, SOX2, DMRT1, SOX9 | TEX141 | TEX14 | Microtubule/chromosomal assembly | ||||
| rs9966612 | 18p11.32 | 649,311 | CLUL1 ENOSF1 TYMS | THOC1 | THOC1 | Apoptosis/p53 pathway | ||||||
| rs2195987 | 19p12 | 24,149,545 | AK125686 | ZNF254 | H3k4me3, H3k9ac | SOX2, POU5F1 | ZNF254 | Transcriptional Regulation | ||||
| rs2241024 | 19q11 | 28,257,393 | ||||||||||
| rs4599029 | 19q13.42 | 54,284,689 | NLRP12 | GATA, KLF4, NANOG | ||||||||
| rs12481572 | 20q13.2 | 50,708,054 | POU5F1 | NFATC2, SALL4 | SALL4 | Transcriptional Regulation | ||||||
| rs2839186 | 21q22.3 | 47,690,068 | MCM3APAS MCM3AP | C21orf58, PCNT | H3k4me1, H3k4me3, H3k9ac | KLF4, NANOG, DMRT1, CTCF | PCNT | Microtubule/chromosomal assembly | ||||
| rs739525 | 22q11.21 | 21,332,441 | AIFM3 | AIFM3 | H3k4me1, H3k4me3, H3k9ac | NANOG | AIFM3 | Apoptosis/p53 pathway | ||||
1 Significant versus threshold corrected for 96 multiple tests
2 Nominally significant at P<0.05
3 eQTL identified in previous study
Figure 2.
Circos plot of integrated functional analysis for all 44 TGCT risk loci. The inner-most ring represents the presence of a Hi-C contact in the NTERA2 cell line; the next four rings (H3k9me3, H3k9ac, H3k4me1, H3k4me3) are narrow-peak histone ChIP-seq tracks for NTERA2; the sixth ring represents -log P values of TGCT risk association from the Oncoarray GWAS data, with the green line denoting genome-wide significance; and the seventh (outer-most) ring is the functional classification of candidate causal genes (green = transcriptional regulation, blue = microtubule/chromosomal assembly, purple = KIT-MAPK), where the size of the bar represents the strength of functional data. Candidate causal genes are also annotated around the outside of the plot.
Although preliminary and requiring functional validation, three candidate disease mechanisms emerge from analysis across the 44 loci. Firstly, 10 of the risk loci contain candidate genes linked to developmental transcriptional regulation, as evidenced by Hi-C looping interactions (at 8p23.1, 20q13.2), eQTL effects (at 4q22.3, 8p23.1), promoter variants (at 8q13.3, 9p24.3, 12q15, 17q12, 19p12) and coding variants (at 2p13.3, 16q24.2) (Table 2). Notably the new TGCT risk locus at 8p23.1 features a looping chromatin interaction from risk SNP rs17153755 to the promoter of GATA4, which is supported by an overlapping predicted strong enhancer region and a nominal eQTL effect (TCGA data, P=3.1 x 10-2) (Fig. 3a). The rs17153755 risk allele was associated with down-regulation of GATA4 expression, consistent with the hypothesised role of GATA4 as a tumour suppressor gene23,24. In addition the risk locus at 16q24.2 contains only a single gene, ZFPM1 (alias FOG, Friend of GATA1), which encodes an essential regulator of GATA1 (ref. 25) and in which we noted a predicted damaging (ref. 26) missense polymorphism (rs3751673, NP_722520.2:p.Arg22Gly). The GATA family of transcription factors is expressed throughout postnatal testicular development27 and plays a key role in ensuring correct tissue specification and differentiation28. We also observed promoter variants at 8q13.3 and 9p24.3, providing support respectively for the roles of PRDM14 and DMRT1 in TGCT oncogenesis, both of which encode important transcriptional regulators of germ cell specification and sex determination29–32. Of final note the new locus at 20q13.2 was characterized by a predicted disrupted POU5F1 binding motif, together with a looping Hi-C contact from risk SNP rs12481572 to the promoter of SALL4, a gene associated with the maintenance of pluripotency in embryonic stem cells33.
Figure 3.
A-C – Regional plots of three new TGCT loci at A) 8p23.1, B) 15q25.2 and C) 15q22.31. Shown by triangles are the −log10 association P values of genotyped SNPs, based on Oncoarray data. Shown by circles are imputed SNPs at each locus. The intensity of red shading indicates the strength of LD with the sentinel SNP (labelled). Also shown are the SNP build 37 coordinates in mega-bases (Mb), recombination rates in centi-morgans (cM) per mega-base (Mb) (in light blue) and the genes in the region (in dark blue). Below the gene transcripts are Hi-C next generation sequencing read pair counts (intervals are determined by HindIII cut points, with average 3Kb resolution), where gaps represent bait locations, which are plotted along with significant Hi-C interactions, where colour and depth of ribbons represent the score. Below the axis is a zoomed-in section displaying the surrounding genes for each SNP, with the sentinel SNP marked with a red triangle and any significant regulatory markers denoted with a blue circle. Finally the predicted chromHMM states are shown (coloured as per the legend) along with a zoomed-in arc depiction of the same Hi-C contact(s).
Secondly, candidate genes with roles related to microtubule/chromosomal assembly were implicated at five TGCT risk loci, supported by Hi-C looping interactions (at 1q22, 15q25.2), eQTL effects (at 15q25.2, 17q22), promoter variants (at 1q22, 4q24) and coding variants (at 21q22.3). Notably at locus 17q22 we observed a promoter variant (rs302875) which displays a strong eQTL effect (GTEx data, P=4.9 x 10-7) on TEX14 (Testis-Expressed 14), which encodes an important regulator of kinetochore-microtubule assembly in testicular germ cells14,34,35. At the new risk locus 15q25.2 we identified a nominal eQTL association (rs2304416, TCGA data, P=3.2 x 10-2) and an accompanying chromatin looping interaction with the mitotic spindle assembly related gene WDR73 (ref. 36) (Fig. 3b). WDR73 encodes a protein with a crucial role in the regulation of microtubule organization during interphase37, and biallelic mutations cause Galloway-Mowat Syndrome, a human syndrome of nephrosis and neuronal dysmigration. Finally the functional analysis also highlighted the microtubule assembly related genes PMF1, CENPE and PCNT (refs. 38–41) as candidates at 1q22, 4q24 and 21q22.3 respectively.
Thirdly, the central role of KIT-MAPK signalling in TGCT oncogenesis was further supported at four loci, by Hi-C looping interactions (at 11q14.1, 15q22.31), eQTL effects (at 6p21.31) and promoter variants (at 6p21.31, 11q14.1, 15q22.31). Recent tumour sequencing studies have established that KIT is the major somatic driver gene for TGCT42, and a relationship between the previously identified risk SNP rs995030 (12q21) and KITLG expression has been demonstrated through allele-specific p53 binding by Zeron-Medina et al43. Here we report a new locus at 15q22.31, containing a variant within the promoter of MAP2K1 (Fig. 3c), which raises the prospect of further elucidating mechanisms of KIT-MAPK signalling in driving TGCTs. MAP2K1 (alias MEK1) is downstream of c-Kit and MEK1 inhibition slows primordial germ cell growth in the presence of KIT ligand44. If MAP2K1 is confirmed as a causal gene at 15q22.31, the study of somatic KIT mutational status in patients carrying the risk allele at 15q22.31 should be highly informative. In addition, within the 11q14.1 risk locus, we identify a candidate promoter variant for GAB2, which encodes a docking protein for signal transduction to MAPK and PI3K pathways that interacts directly with KIT (ref. 45). Finally in our analysis we identify both a candidate promoter variant and a nominal eQTL effect for BAK1 (6p21.31) (TCGA data, P=1.9 x 10-2), which encodes a protein regulating apoptosis that binds with KIT (ref. 40). While we have sought to decipher the functional basis of risk loci based on the cumulative weight of evidence across eQTL, Hi-C and ChIP-seq data, a limitation has been reliance on a relatively small sample size for eQTL analysis. Access to larger eQTL datasets in testicular tissue is likely in the future to address this deficiency, enabling a better definition of the causal basis of TGCT risk at each locus.
The 44 risk loci which have now been identified for TGCT collectively account for 34% of the (father-to-son) familial risk and hence have potential clinical utility for personalized risk profiling. To assess this potential, we constructed polygenic risk scores (PRS) for TGCT, considering the combined effect of all risk SNPs modelled under a log-normal relative risk distribution. Using this approach the men in the top 1% of genetic risk have a relative risk of 14 which translates to a 7% lifetime risk of TGCT (Supplementary Fig. 4).
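The arithmetic behind the lifetime-risk figure can be sketched as follows. The 0.5% baseline lifetime risk is the value given in Methods, and the relative risk of 14 is the figure quoted above. The helper `relative_risk_at_quantile` and its normalisation (µ = −σ²/2, so that the population mean relative risk equals 1) are illustrative assumptions rather than the authors' implementation, and the σ² value in the usage note is hypothetical.

```python
import math
from statistics import NormalDist


def lifetime_risk(baseline_risk: float, relative_risk: float) -> float:
    """Scale a population lifetime risk by a relative risk."""
    return baseline_risk * relative_risk


def relative_risk_at_quantile(sigma2: float, quantile: float) -> float:
    """Relative risk at a given PRS quantile under a log-normal model.

    Assumption (not stated in the text): log relative risk ~ N(mu, sigma2)
    with mu = -sigma2 / 2, which fixes the mean relative risk at 1.
    """
    mu = -sigma2 / 2.0
    z = NormalDist().inv_cdf(quantile)
    return math.exp(mu + math.sqrt(sigma2) * z)


# Figures taken from the text: 0.5% baseline risk, relative risk of 14.
top_percentile_risk = lifetime_risk(0.005, 14)
```

For a hypothetical variance, `relative_risk_at_quantile(1.0, 0.99)` gives the relative risk at the 99th percentile boundary; this is a threshold value, whereas the relative risk reported for "the top 1%" may instead be an average over that group.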
In summary, we have performed a new TGCT GWAS, identifying 19 new risk loci for TGCT, approximately doubling the number of previously reported SNPs. Using capture Hi-C we have generated a chromatin interaction map for TGCT, providing evidence of direct physical interactions between non-coding risk SNPs and target gene promoters. Moreover integration of these data together with ChIP-seq chromatin profiling and RNA-seq eQTL analysis, accepting certain caveats, has allowed us to gain preliminary but unbiased tissue-specific insight into the biological basis of TGCT susceptibility. This analysis suggests a model of TGCT susceptibility based on transcriptional dysregulation, which is likely to contribute to the developmental arrest of primordial germ cells, coupled with chromosomal instability through defective microtubule function and accompanied by upregulation of KIT-MAPK signalling.
Methods
Sample description
TGCT cases were from the UK (n=5,992) and Scandinavia (n=1,327). The UK cases were ascertained from two studies: (1) a UK study of familial testicular cancer and (2) a systematic UK collection of TGCT cases. Case recruitment was via the UK Testicular Cancer Collaboration, a group of oncologists and surgeons treating TGCT in the UK (Supplementary note 1). The studies were co-ordinated at the Institute of Cancer Research (ICR). Samples and information were obtained with full informed consent and Medical Research and Ethics Committee approval (MREC02/06/66 and 06/MRE06/41). Additional (n=1,327) case samples of Scandinavian origin were used from a previously published GWAS16.
Control samples for the primary GWAS were all taken from within the UK. Specifically, 2,976 cancer-free male controls were recruited through two studies within the PRACTICAL Consortium (Supplementary note 2): (1) the UK Genetic Prostate Cancer Study (UKGPCS) (age <65), a study conducted through the Royal Marsden NHS Foundation Trust, and (2) SEARCH (Study of Epidemiology & Risk Factors in Cancer), recruited via GP practices in East Anglia (2003-2009). A further 4,446 cancer-free female controls from across the UK were recruited via the Breast Cancer Association Consortium (BCAC). Controls from the previously published UK GWAS11 were from two sources within the UK: 2,482 controls were from the 1958 Birth Cohort (1958BC), and 2,587 controls were identified through the UK National Blood Service (NBS) and were genotyped as part of the Wellcome Trust Case Control Consortium. Additional (n=6,687) control samples of Scandinavian origin were used in the meta-analysis, and have been previously described16. Control samples for replication genotyping (n=4,027) were taken from two studies, the National Study of Colorectal Cancer Genetics (NSCCG)46 and the GEnetic Lung CAncer Predisposition Study (GELCAPS)47. NSCCG and GELCAPS controls were spouses of cancer patients with no personal history of cancer at time of ascertainment.
Primary GWAS
Genotyping was conducted using a custom Infinium OncoArray-500K BeadChip (Oncoarray) from Illumina (Illumina, San Diego, CA, USA), comprising a 250K SNP genome-wide backbone and 250K SNP custom content selected across multiple consortia within COGS (Collaborative Oncological Gene-environment Study). Oncoarray genotyping was conducted in accordance with the manufacturer’s recommendations by the Edinburgh Clinical Research Facility, Wellcome Trust CRF, Western General Hospital, Edinburgh EH4 2XU.
Published GWAS
The UK and Scandinavian GWAS have been previously reported8,11,13. Briefly, the UK GWAS comprised 986 cases genotyped on the Illumina HumanCNV370-Duo bead array (Illumina, San Diego, CA, USA) and 4,946 controls genotyped on the Illumina Infinium 1.2M array. We analysed data on a common set of 314,861 SNPs successfully genotyped by both arrays. The Scandinavian GWAS16 comprised 1,326 cases and 6,687 controls genotyped using the Human OmniExpressExome-8v1 Illumina array.
Quality Control of GWAS
Oncoarray data were filtered as follows. We excluded individuals with low call rate (<95%), with abnormal autosomal heterozygosity or with >10% non-European ancestry (based on multi-dimensional scaling). We filtered out all SNPs with minor allele frequency <1%, a call rate of <95% in cases or controls, or a minor allele frequency of 1–5% combined with a call rate of <99%, as well as SNPs deviating from Hardy-Weinberg equilibrium (P < 10-12 in controls and P < 10-5 in cases). The final number of SNPs passing quality control filters was 371,504. Quality control (QC) procedures for the UK and Scandinavian GWAS have been previously described8,11,13,16.
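The SNP-level filters described above can be expressed as a single predicate. The sketch below is illustrative only: the `SnpQc` record and its field names are hypothetical, the Hardy-Weinberg thresholds are read as P-value cut-offs, and the 1–5% MAF band is treated as inclusive of 5%, none of which is specified in the text.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP summary metrics (hypothetical field names)."""
    maf: float                 # minor allele frequency, 0-0.5
    call_rate_cases: float     # fraction of cases with a genotype call
    call_rate_controls: float  # fraction of controls with a genotype call
    hwe_p_cases: float         # Hardy-Weinberg test P-value in cases
    hwe_p_controls: float      # Hardy-Weinberg test P-value in controls


def passes_snp_qc(snp: SnpQc) -> bool:
    """Return True if the SNP survives all filters listed in the text."""
    min_call_rate = min(snp.call_rate_cases, snp.call_rate_controls)
    if snp.maf < 0.01:
        return False  # MAF below 1%
    if min_call_rate < 0.95:
        return False  # call rate below 95% in cases or controls
    if snp.maf <= 0.05 and min_call_rate < 0.99:
        return False  # stricter call rate for SNPs with MAF of 1-5%
    if snp.hwe_p_controls < 1e-12 or snp.hwe_p_cases < 1e-5:
        return False  # Hardy-Weinberg deviation (assumed P-value cut-offs)
    return True
```

Applying the stricter call-rate rule only to low-frequency SNPs reflects the fact that a small number of missing calls has a larger relative effect on rare-allele counts.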
Imputation
Genome-wide imputation was performed for all GWAS datasets. The 1000 Genomes phase 1 data (Sept-13 release) were used as a reference panel, with haplotypes pre-phased using SHAPEIT2 (ref. 48). Imputation was performed using IMPUTE2 software49 and association between imputed genotype and TGCT was tested using SNPTEST50, under a frequentist model of association. QC was performed on the imputed SNPs, excluding those with INFO score < 0.8 and MAF < 0.01.
Replication genotyping
Replication genotyping of the 37 SNPs was performed using allele-specific KASPar SNV primers51. Genotyping was conducted by LGC Limited, Unit 1-2 Trident Industrial Estate, Pindar Road, Hoddesdon, UK.
Statistical Analysis
Study sample size was chosen in order to achieve >50% power to detect common variants, defined as MAF > 5%, with OR > 1.3 (ref. 20). For Oncoarray data, tests of association between imputed SNPs and TGCT were performed under a probabilistic dosage model in SNPTEST v2.5 (ref. 52), adjusting for principal components. Inflation in the test statistics was observed at only modest levels (λ1000=1.03). The inflation factor λ was based on the 90% least-significant SNPs53. The adequacy of the case-control matching and possibility of differential genotyping of cases and controls were formally evaluated using Q-Q plots of test statistics (Supplementary Fig. 1). Population ancestry structure for the UK and Scandinavian cohorts was assessed through visualisation of the first two principal components (Supplementary Fig. 5); stable ancestral clustering was observed (Supplementary Table 3).
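The λ1000 statistic expresses the inflation factor as it would appear in a study of 1,000 cases and 1,000 controls. The rescaling formula is not given in the text; the sketch below uses the conventional one and should be read as an assumption about how the reported value was obtained. The function name `lambda_1000` is illustrative.

```python
def lambda_1000(lambda_obs: float, n_cases: int, n_controls: int) -> float:
    """Rescale an observed genomic inflation factor to 1,000 cases/1,000 controls.

    Conventional formula (assumed, not quoted from the paper):
        lambda_1000 = 1 + (lambda_obs - 1) * (1/n_cases + 1/n_controls)
                                           / (1/1000 + 1/1000)
    """
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lambda_obs - 1.0) * scale
```

Because the scale factor shrinks as sample size grows, a large study with a sizeable raw λ can still have a λ1000 close to 1, which is why the rescaled value is preferred when comparing studies of different sizes.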
Statistical analysis of previously reported GWAS was performed as previously described8,11,13,16,54. Meta-analyses were performed using the fixed-effects inverse-variance method based on the β estimates and standard errors from each study using META v1.6 (ref. 55). Cochran's Q-statistic to test for heterogeneity and the I² statistic to quantify the proportion of the total variation due to heterogeneity were calculated56. For each new locus we examined evidence of departure from a log-additive (multiplicative) model, to assess any genotype-specific effect. Using individual genotype data from the Oncoarray, ORs were calculated for heterozygote (ORhet) and homozygote (ORhom) genotypes, which were compared to the per-allele ORs. We tested for a difference between these 1 d.f. and 2 d.f. logistic regression models to assess for evidence of deviation (P < 0.05) from a log-additive model. Using Oncoarray data we examined for statistical interaction between any of the 44 TGCT predisposition loci by evaluating the effect of adding an interaction term to the regression model, adjusted for stage, using a likelihood ratio test (using a significance threshold of P < 2.58 × 10⁻⁵ to account for 1,936 tests). Regional plots were generated using visPIG software57 (Supplementary Fig. 6). Polygenic risk scores (PRS) were constructed using the methodology of Pharoah et al.58, based on a log-normal distribution LN(µ, σ²) with mean µ and variance σ² (i.e. relative risk is normally distributed on a logarithmic scale). The 0.5% lifetime risk of TGCT was based on 2014 UK data59, multiplied by relative risk to give lifetime risk per percentile of the PRS. For calculation of the proportion of TGCT genetic risk explained by the 44 loci, a father-to-son relative risk of four was used.
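For readers unfamiliar with the meta-analysis statistics named above, the following Python sketch computes a fixed-effects inverse-variance estimate together with Cochran's Q and I² from per-study β estimates and standard errors. It illustrates the general method only and is not the META software; the function names are hypothetical. The last helper reproduces the multiple-testing threshold quoted in the text, noting that 1,936 equals 44 × 44.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects meta-analysis.

    Returns (pooled beta, pooled SE, Cochran's Q, I-squared in percent).
    """
    weights = [1.0 / (se * se) for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I-squared: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return beta, se, q, i2


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests
```

For example, `bonferroni_threshold(0.05, 44 * 44)` gives approximately 2.58 × 10⁻⁵, matching the threshold used for the interaction tests.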
Chromatin mark enrichment analysis
To examine enrichment in specific ChIP-seq tracks across risk loci we adapted the variant set enrichment method of Cowper-Sal lari et al.60. Briefly, for each risk locus, a region of strong LD was defined (i.e. R² > 0.8 and D’ > 0.8), and SNPs mapping to these regions were termed the associated variant set (AVS). Histone ChIP-seq uniform peak data was obtained from ENCODE21 for the NTERA2 cell line, and data was included for four histone marks. For each of these marks, the overlap of the SNPs in the AVS and the binding sites was determined to produce a mapping tally. A null distribution was produced by randomly selecting SNPs with the same LD structure as the risk-associated SNPs, and the null mapping tally calculated. This process was repeated 10,000 times, and approximate P-values were calculated as the proportion of permutations where the null mapping tally was greater than or equal to the AVS mapping tally. An enrichment score was calculated by normalizing the tallies to the median of the null distribution. Thus the enrichment score is the number of standard deviations of the AVS mapping tally from the mean of the null distribution tallies. Tissue specificity was assessed by comparison of enrichment levels in NTERA2 with those in 41 other cell lines from ENCODE21, with analysis performed using the same method as above (Supplementary Fig. 2).
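The permutation logic described above can be sketched as follows. This is a simplified Python illustration and not the published variant set enrichment implementation: it assumes the mapping tally counts loci with at least one overlapping SNP, it takes the null tallies as given rather than generating LD-matched SNP sets, and it follows the "standard deviations from the mean" definition of the enrichment score (the text also mentions the median). All names are hypothetical.

```python
import statistics


def mapping_tally(variant_sets, peaks):
    """Count loci with at least one SNP inside any peak.

    variant_sets: one list of (chrom, pos) tuples per risk locus.
    peaks: list of (chrom, start, end) half-open intervals.
    """
    tally = 0
    for snps in variant_sets:
        if any(chrom == c and start <= pos < end
               for chrom, pos in snps
               for c, start, end in peaks):
            tally += 1
    return tally


def enrichment(avs_tally, null_tallies):
    """Empirical P-value and enrichment score against a null distribution."""
    n = len(null_tallies)
    # Proportion of permutations with a tally at least as large as observed.
    p_value = sum(1 for t in null_tallies if t >= avs_tally) / n
    mean = statistics.fmean(null_tallies)
    sd = statistics.pstdev(null_tallies)
    score = (avs_tally - mean) / sd if sd > 0 else float("nan")
    return p_value, score
```

With 10,000 permutations, as used in the text, the smallest non-zero P-value this approach can report is 1 × 10⁻⁴.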
Promoter Hi-C
In situ Hi-C libraries were prepared as described by Rao et al.61 with the following modifications: (i) 25 million cells were fixed and processed; (ii) HindIII enzyme (NEB, Ipswich, MA, USA) was used and digestion was performed overnight; (iii) ligation was performed overnight at 16 °C; (iv) 3 µl of 15 µM annealed PE adaptors were ligated by incubating with 3 µl of T4 DNA ligase (NEB, Ipswich, MA, USA) for 2 h at RT; (v) 6 cycles of PCR were performed to amplify the libraries before capture. A SureSelect (Agilent, Santa Clara, CA, USA) custom promoter kit was used to perform capture with the same design as described by Mifsud et al.62. For each capture reaction, 750 µg of Hi-C libraries were used. Capture was performed following the manufacturer's protocol and employing a custom reagent kit (Agilent, Santa Clara, CA, USA). Final PCR amplification was performed using 5 cycles to minimise PCR duplicates. 2 × 100 bp sequencing was performed using Illumina HiSeq2000 or 2500 technology (Illumina, San Diego, CA, USA). The HiCUP pipeline63 was used to process raw sequencing reads, map di-tag positions against the reference human genome and remove duplicate reads. The protocol was performed for two independent NTERA2 biological replicates, with cells obtained from the laboratory of Prof. Janet Shipley (The Institute of Cancer Research, London) and their identity independently confirmed through STR typing at an external laboratory (Public Health England, Porton Down, UK). Cells were tested and found to be negative for mycoplasma contamination. Both Hi-C libraries achieved the following quality control thresholds: >80% reads uniquely aligning, >80% valid pair rate, >85% unique di-tag rate and >80% of interactions being cis (Supplementary Table 4). Statistically significant interactions were called using the CHiCAGO pipeline64, with both biological replicates processed in parallel to obtain a unique list of reproducible NTERA2 contacts.
Stability of results across replicates was also verified by processing each sample individually and comparing the significance scores of called interactions; strong correlation was observed between the replicates (r = 0.8, P < 5.0 × 10⁻¹⁰, Supplementary Fig. 7). Interactions with a -log(weighted P-value) > 5 were considered significant. To avoid short-range proximity bias, interactions of <40 kb were excluded. The distribution of interaction distances closely matched the prior published dataset of Mifsud et al.62 (Supplementary Fig. 8). A Hi-C track plotting read pair counts per HindIII fragment has been added to region plot figures to demonstrate the underlying signal strength of significant Hi-C contacts.
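The two post-calling filters above (score threshold and minimum distance) amount to a simple predicate over called interactions. The Python sketch below is illustrative only and is not part of the CHiCAGO pipeline: the record layout is hypothetical, distance is assumed to be measured between fragment midpoints, and trans interactions are assumed to be exempt from the distance filter.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One called contact (hypothetical record layout)."""
    bait_chrom: str
    bait_mid: int     # midpoint of the baited HindIII fragment
    other_chrom: str
    other_mid: int    # midpoint of the interacting HindIII fragment
    score: float      # -log(weighted P-value)


def keep_interaction(ix: Interaction,
                     min_score: float = 5.0,
                     min_distance: int = 40_000) -> bool:
    """Apply the significance and short-range filters from the text."""
    if ix.score <= min_score:
        return False
    same_chrom = ix.bait_chrom == ix.other_chrom
    # Assumption: the <40 kb exclusion applies to cis contacts only.
    if same_chrom and abs(ix.bait_mid - ix.other_mid) < min_distance:
        return False
    return True
```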
3C Validation
3C was used to validate selected chromatin interactions detected by CHi-C (3p24.3, 4q24, 11q14.1, 15q22.31, 15q25.2, 16q12.1, and 16q23.1) (Supplementary Fig. 9, Supplementary Table 5). Three replicates of in situ 3C libraries were prepared using NTERA2 cells. Cell pellets were crosslinked, digested with HindIII, and ligated. Libraries were purified by phenol-chloroform extraction.
For each locus one or more bacterial artificial chromosomes (BACs; Source BioScience, Nottingham, UK) were used as an internal standard (Supplementary Table 6). Clones were streaked and grown before extracting DNA using a QIAGEN Plasmid Maxi Kit (QIAGEN, Hilden, Germany), which was purified by phenol-chloroform extraction. In loci covered by more than one clone, equimolar solutions of clones were prepared. Randomly ligated 3C libraries were generated for each BAC or equimolar solution of BACs.
Unidirectional primer pairs were designed to amplify ligation junctions of the bait and other interacting HindIII fragment (promoter-element, P-E) and around the bait and a flanking control HindIII fragment in between the promoter and distal element (promoter-control, P-C) using Primer3 (ref. 65) (Supplementary Tables 7 and 8). Regions were amplified using both P-E and P-C primer pairs in BAC and NTERA2 libraries using a QIAGEN Multiplex PCR Kit (QIAGEN, Hilden, Germany). 5 ng and 100 ng of BAC and NTERA2 library template DNA, respectively, were amplified using the following procedure: initial 15 minute denaturation at 95°C followed by 38 cycles of 94°C for 0.5 minutes, annealing temperature specific to primer pair for 1.5 minutes, 72°C extension for 1.5 minutes, followed by a final 10 minute extension at 72°C. 5 µl of each PCR reaction was visualised on 2% agarose gels stained with ethidium bromide. ImageJ66 was used to quantify intensities of PCR products and normalise for differential primer efficiency by comparing to equimolar BAC PCR products.
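The normalisation step can be illustrated with a short calculation. The Python sketch below shows one plausible form, assuming that each NTERA2 band intensity is divided by the BAC band intensity for the same primer pair and that the P-E signal is then compared with the P-C signal. The exact calculation used in the study is not specified in the text, and the function names are hypothetical.

```python
def normalised_signal(ntera2_intensity: float, bac_intensity: float) -> float:
    """Band intensity in the NTERA2 library relative to the BAC control
    amplified with the same primer pair (corrects for primer efficiency)."""
    if bac_intensity <= 0:
        raise ValueError("BAC control intensity must be positive")
    return ntera2_intensity / bac_intensity


def pe_over_pc(pe_ntera2: float, pe_bac: float,
               pc_ntera2: float, pc_bac: float) -> float:
    """Ratio of normalised promoter-element to promoter-control signal."""
    return normalised_signal(pe_ntera2, pe_bac) / normalised_signal(pc_ntera2, pc_bac)
```

Under this reading, a ratio above 1 would indicate that the promoter contacts the distal element more often than the intervening control fragment, despite the control lying closer in linear distance.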
P-E fragments were Sanger sequenced in NTERA2 libraries to confirm that the fragments visualised on agarose gels were as expected (Supplementary Fig. 10).
Chromatin state annotation
We used ChromHMM67 to infer chromatin states by integrating information on histone modifications and DNaseI hypersensitivity data to identify combinatorial and spatial patterns of epigenetic marks. Aligned next generation sequencing reads from ChIP-Seq and DNase-Seq experiments on the NTERA2 cells were downloaded from ENCODE21. Read-shift parameters for ChIP-Seq data were calculated using PHANTOMPEAKQUALTOOLS. Genome-wide signal tracks were binarized (including input controls for ChIP-Seq data) and a set of learned models was generated using ChromHMM software67. The parameters of the highest scoring model were retained and model states were iteratively reduced from 30 to 5 states. A 27-state model was found to be stable and was subsequently used for segmenting the genome at 200 bp resolution (Supplementary Fig. 11).
Expression quantitative trait locus analysis
We investigated evidence of association between the SNPs at each locus and tissue-specific changes in gene expression using two publicly available resources: (i) RNAseq and Affymetrix 6.0 SNP data for 150 TGCT patients from The Cancer Genome Atlas and (ii) normal testicular tissue data from 157 GTEx samples (ref. 22). Associations between normalized RNA counts per gene and genotype were quantified using the R package ‘Matrix eQTL’. Box plots of all eQTL associations are presented in Supplementary Fig. 3 and the tissue in which the association was observed (TGCT or normal testis), along with any other tissues resulting in a positive association, are denoted in Supplementary Table 2. To reduce multiple testing, association tests were only performed between SNP and gene pairs where either: (i) a direct promoter variant was observed (as per column six of Table 2) or (ii) a Hi-C contact to a gene promoter was observed (as per column nine of Table 2), together with functionally active chromatin (as per column seven of Table 2). The SNP used for testing at each locus was selected based on the closest available proxy (highest R²) to the functional variant (i.e. the promoter or Hi-C contact variant), rather than using the sentinel SNP with the strongest TGCT association. Finally, as a comparison all possible gene/variant eQTL combinations were also tested at each locus (ignoring the functional Hi-C/promoter/ChIP-seq data), to provide a reference overview of all possible eQTL associations at each locus (Supplementary Table 9).
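At its core, a single eQTL test of this kind is a regression of expression on genotype. The Python sketch below fits an additive linear model for one SNP and gene pair using ordinary least squares and returns the slope and its t-statistic. It is a simplified stand-in for illustration and does not reproduce the Matrix eQTL package or any covariates that may have been used; the function name and the 0/1/2 allele coding are assumptions.

```python
import math


def eqtl_linear_fit(genotypes, expression):
    """Regress expression on allele count (0, 1 or 2) for one SNP-gene pair.

    Returns (slope, t_statistic); the slope is the expression change per allele.
    """
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    # Residual sum of squares around the fitted line.
    rss = sum((e - (mean_e + slope * (g - mean_g))) ** 2
              for g, e in zip(genotypes, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.inf
    return slope, t_stat
```

Restricting tests to SNP and gene pairs with prior functional support, as described above, reduces the number of such fits and therefore the multiple-testing burden.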
Transcription factor binding motif analysis
The impact of variants on regulatory motifs was assessed for a set of transcription factors (TF) associated with germ cell development. A germ cell specific TF set was utilized, rather than all TF globally, to provide increased specificity. An OMIM68 search-term-driven method was used to define the germ cell development TF set, using the following search terms: “germ cell” AND “development” AND “transcription factor” (n=46). The TF list was then intersected with predicted TF binding motifs based on a library of position weight matrices computed by Kheradpour and Kellis (2014)69,70. The intersected dataset contained motif position data for 10 TFs: DMRT1, GATA, KLF4, LHX8, NANOG, POU5F1, PRDM1, SOX2, SOX9, and CTCF. To validate the specificity of these motifs for TGCT we conducted variant set enrichment analysis, using the same method as detailed above (based on Cowper-Sal lari et al.60), which confirmed enrichment for disruption of these 10 motifs in the 44 TGCT risk loci compared to the null distribution (Supplementary Table 10).
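To show how a position weight matrix (PWM) can be used to score the effect of a variant on a motif, the Python sketch below compares log-odds scores for the reference and alternate alleles. It illustrates the general idea only: the scoring scheme (log2 odds against a uniform background) is an assumption, the toy matrix is invented and does not correspond to any real transcription factor, and the function names are hypothetical.

```python
import math


def pwm_log_odds(pwm, seq, background=0.25):
    """Sum of per-position log2 odds for `seq` under the PWM.

    pwm: list of dicts mapping each base to its probability at that position.
    """
    if len(pwm) != len(seq):
        raise ValueError("sequence length must match motif length")
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, seq))


def motif_score_change(pwm, ref_seq, alt_seq):
    """Alternate minus reference score; negative values suggest disruption."""
    return pwm_log_odds(pwm, alt_seq) - pwm_log_odds(pwm, ref_seq)


# Toy three-position motif with consensus A-C-G (illustrative values only).
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
```

For instance, `motif_score_change(TOY_PWM, "ACG", "ATG")` is negative, because the alternate allele replaces the preferred base at the second position.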
Integration of functional data
For the integrated functional annotation of risk loci, LD blocks were defined as all SNPs in LD (R² > 0.8) with the sentinel SNP. Risk loci were then annotated with six types of functional data: (i) presence of a Hi-C contact linking to a gene promoter, (ii) presence of an expression quantitative trait locus, (iii) presence of a ChIP-seq peak, (iv) presence of a disrupted transcription factor binding motif, (v) presence of a variant within a gene promoter boundary, with boundaries defined using the Ensembl regulatory build71, (vi) presence of a non-synonymous coding change. Candidate causal genes were then assigned to TGCT risk loci using the target genes implicated in annotation tracks (i), (ii), (v) and (vi). Where the data supported multiple gene candidates, the gene with the highest number of individual functional data points was assigned to be the candidate. Where multiple genes have the same number of data points all genes are listed. Competing mechanisms for the same gene (e.g. both coding and promoter variants) were allowed.
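The gene assignment rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the authors' code: track labels and gene names are hypothetical, and each distinct (track, gene) pair is assumed to count as one functional data point.

```python
from collections import Counter

# Tracks that implicate a target gene: Hi-C (i), eQTL (ii), promoter (v), coding (vi).
GENE_TRACKS = {"hic", "eqtl", "promoter", "coding"}


def assign_candidate_genes(annotations):
    """Pick the best-supported gene(s) for one risk locus.

    annotations: iterable of (track, gene) pairs observed at the locus.
    Returns a sorted list; more than one gene is returned when support is tied.
    """
    # Count each distinct (track, gene) pair once, ignoring non-gene tracks.
    unique_pairs = {(t, g) for t, g in annotations if t in GENE_TRACKS}
    counts = Counter(gene for _, gene in unique_pairs)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(gene for gene, c in counts.items() if c == top)
```

Because support is counted per track, a gene implicated by both a coding and a promoter variant accumulates two data points, consistent with competing mechanisms being allowed for the same gene.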
Acknowledgements
We thank the subjects with TGCT and the clinicians involved in their care for participation in this study. We thank the patients and all clinicians forming part of the UK Testicular Cancer Collaboration (UKTCC) for their participation in this study. A full list of UKTCC members is included in Supplementary note 1. We acknowledge National Health Service funding to the National Institute for Health Research Biomedical Research Centre. We thank the UK Genetics of Prostate Cancer Study (UKGPCS) study teams for the recruitment of the UKGPCS controls. Genotyping of the OncoArray was funded by the US National Institutes of Health (NIH) [U19 CA 148537 for ELucidating Loci Involved in Prostate cancer SuscEptibility (ELLIPSE) project and X01HG007492 to the Center for Inherited Disease Research (CIDR) under contract number HHSN268201200008I]. Additional analytic support was provided by NIH NCI U01 CA188392 (PI: Schumacher). The PRACTICAL consortium was supported by Cancer Research UK Grants C5047/A7357, C1287/A10118, C1287/A16563, C5047/A3354, C5047/A10692, C16913/A6135, European Commission's Seventh Framework Programme grant agreement n° 223175 (HEALTH-F2-2009-223175), and The National Institute of Health (NIH) Cancer Post-Cancer GWAS initiative grant: No. 1 U19 CA 148537-01 (the GAME-ON initiative). A full list of PRACTICAL consortium members is included in Supplementary note 2. We would also like to thank the following for funding support: The Institute of Cancer Research and The Everyman Campaign, The Prostate Cancer Research Foundation, Prostate Research Campaign UK (now Prostate Action), The Orchid Cancer Appeal, The National Cancer Research Network UK, The National Cancer Research Institute (NCRI) UK. We are grateful for support of NIHR funding to the NIHR Biomedical Research Centre at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust. This study would not have been possible without the contributions of the following: M. K. Bolla (BCAC), Q. Wang (BCAC), K. Michailidou (BCAC), J. Dennis (BCAC), P. Hall (COGS); D.F. Easton (BCAC), A. Berchuck (OCAC), R. Eeles (PRACTICAL), G. Chenevix-Trench (CIMBA), J. Dennis, P. Pharoah, A. Dunning, K. Muir, J. Peto, A. Lee, and E. Dicks. We also thank the following for their contributions to this project: Jacques Simard, Peter Kraft, Craig Luccarini and the staff of the Centre for Genetic Epidemiology Laboratory; and Kimberly F. Doheny and the staff of the Center for Inherited Disease Research (CIDR) genotyping facility. The results published here are in part based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/. This study makes use of data generated by the Wellcome Trust Case Control Consortium 2 (WTCCC2). A full list of the investigators who contributed to the generation of the data is available from the WTCCC website. We acknowledge the contribution of Elizabeth Rapley and Mike Stratton to the generation of previously published UK GWAS case data. We acknowledge funding from the Swedish Cancer Society (CAN2011/484 and CAN2012/823), the Norwegian Cancer Society (grants number 418975 – 71081 – PR-2006-0387 and PK01-2007-0375) and the Nordic Cancer Union (grant number S-12/07). This study was supported by the Movember foundation and the Institute of Cancer Research. K. Litchfield is supported by a PhD fellowship from Cancer Research UK. R.S.H. and P.B. are supported by Cancer Research UK (C1298/A8362 Bobby Moore Fund for Cancer Research UK). We thank all the individuals who took part in these studies and all the researchers, clinicians, technicians and administrative staff who have enabled this work to be carried out.
Footnotes
Author Contributions
C.T., K.L., and R.S.H. designed the study. Case samples were recruited by A.R., R.H. and through UKTCC. R.E., A.D., K.M., J.P., Z.K-J., N.P. and D.E. supplied Oncoarray control data. N.O. administered genotyping of Oncoarray case samples. D.D. coordinated all case sample administration and tracking. K.L., M.L., A.H. and P.B. prepared samples for genotyping experiments. K.L., M.L., G.O., C.L., K.F. and I.A. conducted all Promoter Hi-C and 3C laboratory experiments. Bioinformatics and statistical analyses were designed by C.T., R.S.H. and K.L. K.L., G.M., C.L. and M.L. conducted all Promoter Hi-C and 3C data analysis. K.L. and P.L. conducted transcription factor enrichment analysis. K.L., C.L. and M.L. performed all other bioinformatics and statistical analyses. R.K., T.H., W.K., T.G. and F.W. provided Scandinavian GWAS data. K.L. drafted the manuscript with assistance from C.T., R.S.H., M.L., J.S., J.N. and T.B. All authors reviewed and contributed to the manuscript.
Data Availability
Case Oncoarray GWAS data and the Hi-C dataset utilized in this paper have both been deposited in the European Genome–phenome Archive (EGA), which is hosted by the European Bioinformatics Institute (EBI), under the accession codes EGAS00001001836 and EGAS00001001930 respectively.
Competing Financial Interests
The authors declare no competing financial interests.
References
- 1. Manku G, et al. Changes in the expression profiles of claudins during gonocyte differentiation and in seminomas. Andrology. 2016;4:95–110. doi: 10.1111/andr.12122.
- 2. Le Cornet C, et al. Testicular cancer incidence to rise by 25% by 2025 in Europe? Model-based predictions in 40 countries using population-based registry data. Eur J Cancer. 2014;50:831–9. doi: 10.1016/j.ejca.2013.11.035.
- 3. Litchfield K, et al. Quantifying the heritability of testicular germ cell tumour using both population-based and genomic approaches. Sci Rep. 2015;5:13889. doi: 10.1038/srep13889.
- 4. Swerdlow AJ, De Stavola BL, Swanwick MA, Maconochie NE. Risks of breast and testicular cancers in young adult twins in England and Wales: evidence on prenatal and genetic aetiology. Lancet. 1997;350:1723–8. doi: 10.1016/s0140-6736(97)05526-8.
- 5. McGlynn KA, Devesa SS, Graubard BI, Castle PE. Increasing incidence of testicular germ cell tumors among black men in the United States. J Clin Oncol. 2005;23:5757–61. doi: 10.1200/JCO.2005.08.227.
- 6. Hemminki K, Li X. Familial risk in testicular cancer as a clue to a heritable and environmental aetiology. Br J Cancer. 2004;90:1765–1770. doi: 10.1038/sj.bjc.6601714.
- 7. Kharazmi E, et al. Cancer Risk in Relatives of Testicular Cancer Patients by Histology Type and Age at Diagnosis: A Joint Study from Five Nordic Countries. Eur Urol. 2015;68:283–9. doi: 10.1016/j.eururo.2014.12.031.
- 8. Rapley EA, et al. A genome-wide association study of testicular germ cell tumor. Nat Genet. 2009;41:807–10. doi: 10.1038/ng.394.
- 9. Turnbull C, Rahman N. Genome-wide association studies provide new insights into the genetic basis of testicular germ-cell tumour. Int J Androl. 2011;34:e86–96; discussion e96–7. doi: 10.1111/j.1365-2605.2011.01162.x.
- 10. Kanetsky PA, et al. Common variation in KITLG and at 5q31.3 predisposes to testicular germ cell cancer. Nat Genet. 2009;41:811–5. doi: 10.1038/ng.393.
- 11. Turnbull C, et al. Variants near DMRT1, TERT and ATF7IP are associated with testicular germ cell cancer. Nat Genet. 2010;42:604–7. doi: 10.1038/ng.607.
- 12. Kanetsky PA, et al. A second independent locus within DMRT1 is associated with testicular germ cell tumor susceptibility. Hum Mol Genet. 2011;20:3109–17. doi: 10.1093/hmg/ddr207.
- 13. Ruark E, et al. Identification of nine new susceptibility loci for testicular cancer, including variants near DAZL and PRDM14. Nat Genet. 2013;45:686–9. doi: 10.1038/ng.2635.
- 14. Bojesen SE, et al. Multiple independent variants at the TERT locus are associated with telomere length and risks of breast and ovarian cancer. Nat Genet. 2013;45:371–84, 384e1–2. doi: 10.1038/ng.2566.
- 15. Chung CC, et al. Meta-analysis identifies four new loci associated with testicular germ cell tumor. Nat Genet. 2013;45:680–5. doi: 10.1038/ng.2634.
- 16. Kristiansen W, et al. Two new loci and gene sets related to sex determination and cancer progression are associated with susceptibility to testicular germ cell tumor. Hum Mol Genet. 2015. doi: 10.1093/hmg/ddv129.
- 17. Litchfield K, et al. Multi-stage genome-wide association study identifies new susceptibility locus for testicular germ cell tumour on chromosome 3q25. Hum Mol Genet. 2015;24:1169–76. doi: 10.1093/hmg/ddu511.
- 18. Litchfield K, et al. Identification of four new susceptibility loci for testicular germ cell tumour. Nat Commun. 2015;6:8690. doi: 10.1038/ncomms9690.
- 19. Litchfield K, Shipley J, Turnbull C. Common variants identified in genome-wide association studies of testicular germ cell tumour: an update, biological insights and clinical application. Andrology. 2015;3:34–46. doi: 10.1111/andr.304.
- 20. Skol AD, Scott LJ, Abecasis GR, Boehnke M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet. 2006;38:390–390 (vol 38, pg 209, 2006). doi: 10.1038/ng1706.
- 21. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
- 22. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45:580–5. doi: 10.1038/ng.2653.
- 23. Agnihotri S, et al. A GATA4-regulated tumor suppressor network represses formation of malignant human astrocytomas. J Exp Med. 2011;208:689–702. doi: 10.1084/jem.20102099.
- 24. Hellebrekers DM, et al. GATA4 and GATA5 are potential tumor suppressors and biomarkers in colorectal cancer. Clin Cancer Res. 2009;15:3990–7. doi: 10.1158/1078-0432.CCR-09-0055.
- 25. Tsang AP, et al. FOG, a multitype zinc finger protein, acts as a cofactor for transcription factor GATA-1 in erythroid and megakaryocytic differentiation. Cell. 1997;90:109–19. doi: 10.1016/s0092-8674(00)80318-9.
- 26. Adzhubei I, Jordan DM, Sunyaev SR. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet. 2013;Chapter 7:Unit7.20. doi: 10.1002/0471142905.hg0720s76.
- 27. Ketola I, et al. Developmental expression and spermatogenic stage specificity of transcription factors GATA-1 and GATA-4 and their cofactors FOG-1 and FOG-2 in the mouse testis. Eur J Endocrinol. 2002;147:397–406. doi: 10.1530/eje.0.1470397.
- 28. Zheng R, Blobel GA. GATA Transcription Factors and Cancer. Genes Cancer. 2010;1:1178–88. doi: 10.1177/1947601911404223.
- 29. Kurimoto K, Yamaji M, Seki Y, Saitou M. Specification of the germ cell lineage in mice: a process orchestrated by the PR-domain proteins, Blimp1 and Prdm14. Cell Cycle. 2008;7:3514–8. doi: 10.4161/cc.7.22.6979.
- 30. Ohinata Y, et al. A signaling principle for the specification of the germ cell lineage in mice. Cell. 2009;137:571–84. doi: 10.1016/j.cell.2009.03.014.
- 31. Yamaji M, et al. Critical function of Prdm14 for the establishment of the germ cell lineage in mice. Nat Genet. 2008;40:1016–22. doi: 10.1038/ng.186.
- 32. Smith CA, McClive PJ, Western PS, Reed KJ, Sinclair AH. Conservation of a sex-determining gene. Nature. 1999;402:601–2. doi: 10.1038/45130.
- 33. Rao S, et al. Differential roles of Sall4 isoforms in embryonic stem cell pluripotency. Mol Cell Biol. 2010;30:5364–80. doi: 10.1128/MCB.00419-10.
- 34. Greenbaum MP, et al. TEX14 is essential for intercellular bridges and fertility in male mice. Proc Natl Acad Sci U S A. 2006;103:4982–7. doi: 10.1073/pnas.0505123103.
- 35. Mondal G, Ohashi A, Yang L, Rowley M, Couch FJ. Tex14, a Plk1-regulated protein, is required for kinetochore-microtubule attachment and regulation of the spindle assembly checkpoint. Mol Cell. 2012;45:680–95. doi: 10.1016/j.molcel.2012.01.013.
- 36. Jinks RN, et al. Recessive nephrocerebellar syndrome on the Galloway-Mowat syndrome spectrum is caused by homozygous protein-truncating mutations of WDR73. Brain. 2015;138:2173–90. doi: 10.1093/brain/awv153.
- 37. Colin E, et al. Loss-of-function mutations in WDR73 are responsible for microcephaly and steroid-resistant nephrotic syndrome: Galloway-Mowat syndrome. Am J Hum Genet. 2014;95:637–48. doi: 10.1016/j.ajhg.2014.10.011.
- 38. Petrovic A, et al. The MIS12 complex is a protein interaction hub for outer kinetochore assembly. J Cell Biol. 2010;190:835–52. doi: 10.1083/jcb.201002070.
- 39. Rao CV, Yamada HY, Yao Y, Dai W. Enhanced genomic instabilities caused by deregulated microtubule dynamics and chromosome segregation: a perspective from genetic studies in mice. Carcinogenesis. 2009;30:1469–74. doi: 10.1093/carcin/bgp081.
- 40. Barisic M, et al. Mitosis. Microtubule detyrosination guides chromosomes during mitosis. Science. 2015;348:799–803. doi: 10.1126/science.aaa5175.
- 41. Ma W, Viveiros MM. Depletion of pericentrin in mouse oocytes disrupts microtubule organizing center function and meiotic spindle organization. Mol Reprod Dev. 2014;81:1019–29. doi: 10.1002/mrd.22422.
- 42. Litchfield K, et al. Whole-exome sequencing reveals the mutational spectrum of testicular germ cell tumours. Nat Commun. 2015;6:5973. doi: 10.1038/ncomms6973.
- 43. Zeron-Medina J, et al. A polymorphic p53 response element in KIT ligand influences cancer risk and has undergone natural selection. Cell. 2013;155:410–22. doi: 10.1016/j.cell.2013.09.017.
- 44. De Miguel MP, Cheng L, Holland EC, Federspiel MJ, Donovan PJ. Dissection of the c-Kit signaling pathway in mouse primordial germ cells by retroviral-mediated gene transfer. Proc Natl Acad Sci U S A. 2002;99:10458–63. doi: 10.1073/pnas.122249399.
- 45. Yu M, et al. The scaffolding adapter Gab2, via Shp-2, regulates kit-evoked mast cell proliferation by activating the Rac/JNK pathway. J Biol Chem. 2006;281:28615–26. doi: 10.1074/jbc.M603742200.
- 46. Penegar S, et al. National study of colorectal cancer genetics. Br J Cancer. 2007;97:1305–9. doi: 10.1038/sj.bjc.6603997.
- 47. Eisen T, Matakidou A, Houlston R, GELCAPS Consortium. Identification of low penetrance alleles for lung cancer: the GEnetic Lung CAncer Predisposition Study (GELCAPS). BMC Cancer. 2008;8:244. doi: 10.1186/1471-2407-8-244.
- 48. Delaneau O, Marchini J, Zagury JF. A linear complexity phasing method for thousands of genomes. Nat Methods. 2012;9:179–81. doi: 10.1038/nmeth.1785.
- 49. Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet. 2012;44:955–9. doi: 10.1038/ng.2354.
- 50. Marchini J, Howie B. Genotype imputation for genome-wide association studies. Nat Rev Genet. 2010;11:499–511. doi: 10.1038/nrg2796.
- 51. Cuppen E. Genotyping by Allele-Specific Amplification (KASPar). CSH Protoc. 2007;2007:pdb.prot4841. doi: 10.1101/pdb.prot4841.
- 52. Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007;39:906–13. doi: 10.1038/ng2088.
- 53. Clayton DG, et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nat Genet. 2005;37:1243–6. doi: 10.1038/ng1653.
- 54. Litchfield K, et al. Multi-stage genome wide association study identifies new susceptibility locus for testicular germ cell tumour on chromosome 3q25. Hum Mol Genet. 2014. doi: 10.1093/hmg/ddu511.
- 55. Liu JZ, et al. Meta-analysis and imputation refines the association of 15q25 with smoking quantity. Nat Genet. 2010;42:436–40. doi: 10.1038/ng.572.
- 56. Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21:1539–58. doi: 10.1002/sim.1186.
- 57. Scales M, Jager R, Migliorini G, Houlston RS, Henrion MY. visPIG--a web tool for producing multi-region, multi-track, multi-scale plots of genetic data. PLoS One. 2014;9:e107497. doi: 10.1371/journal.pone.0107497.
- 58. Pharoah PDP, et al. Polygenic susceptibility to breast cancer and implications for prevention. Nat Genet. 2002;31:33–36. doi: 10.1038/ng853.
- 59. CRUK. (2014).
- 60. Cowper-Sal lari R, et al. Breast cancer risk-associated SNPs modulate the affinity of chromatin for FOXA1 and alter gene expression. Nat Genet. 2012;44:1191–8. doi: 10.1038/ng.2416.
- 61. Rao SS, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–80. doi: 10.1016/j.cell.2014.11.021.
- 62. Mifsud B, et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat Genet. 2015;47:598–606. doi: 10.1038/ng.3286.
- 63. Wingett S, et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Res. 2015;4:1310. doi: 10.12688/f1000research.7334.1.
- 64. Cairns J, et al. CHiCAGO: Robust Detection of DNA Looping Interactions in Capture Hi-C data. BioRxiv. 2016. doi: 10.1186/s13059-016-0992-2.
- 65. Untergasser A, et al. Primer3--new capabilities and interfaces. Nucleic Acids Res. 2012;40:e115. doi: 10.1093/nar/gks596.
- 66. Schneider CA, Rasband WS, Eliceiri KW. NIH Image to ImageJ: 25 years of image analysis. Nat Methods. 2012;9:671–5. doi: 10.1038/nmeth.2089.
- 67. Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nat Methods. 2012;9:215–6. doi: 10.1038/nmeth.1906.
- 68. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005;33:D514–7. doi: 10.1093/nar/gki033.
- 69. Ward LD, Kellis M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 2012;40:D930–4. doi: 10.1093/nar/gkr917.
- 70. Kheradpour P, Kellis M. Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Res. 2014;42:2976–87. doi: 10.1093/nar/gkt1249.
- 71. Zerbino DR, Wilder SP, Johnson N, Juettemann T, Flicek PR. The ensembl regulatory build. Genome Biol. 2015;16:56. doi: 10.1186/s13059-015-0621-5.