Gastroenterology. Author manuscript; available in PMC: 2020 Apr 1.
Published in final edited form as: Gastroenterology. 2018 Dec 6;156(5):1455–1466. doi: 10.1053/j.gastro.2018.11.066

Large-scale Genome-wide Association Study of East Asians Identifies Loci Associated With Risk for Colorectal Cancer

Yingchang Lu 1, Sun-Seog Kweon 2,3, Chizu Tanikawa 4, Wei-Hua Jia 5, Yong-Bing Xiang 6, Qiuyin Cai 1, Chenjie Zeng 1, Stephanie L Schmit 7, Aesun Shin 8, Keitaro Matsuo 9,10, Sun Ha Jee 12, Dong-Hyun Kim 13, Jeongseon Kim 14, Wanqing Wen 1, Jiajun Shi 1, Xingyi Guo 1, Bingshan Li 15, Nan Wang 16, Ben Zhang 17, Xinxiang Li 18,19, Min-Ho Shin 20, Hong-Lan Li 6, Zefang Ren 21, Jae Hwan Oh 22, Isao Oze 9, Yoon-Ok Ahn 23, Keum Ji Jung 11, David V Conti 24, Fredrick R Schumacher 25, Gad Rennert 26,27,28, Mark A Jenkins 29, Peter T Campbell 30, Michael Hoffmeister 31, Graham Casey 32, Stephen B Gruber 24,33, Jing Gao 6, Yu-Tang Gao 6, Zhi-Zhong Pan 5, Yoichiro Kamatani 34,35, Yi-Xin Zeng 5, Xiao-Ou Shu 1, Jirong Long 1, Koichi Matsuda 36, Wei Zheng 1
PMCID: PMC6441622  NIHMSID: NIHMS1516051  PMID: 30529582

Abstract

Background & Aims:

Genome-wide association studies (GWASs) have associated approximately 50 loci with risk of colorectal cancer (CRC); nearly one-third of these loci were initially associated with CRC in studies conducted in East Asian populations. We conducted a GWAS of East Asians to identify CRC risk loci and to evaluate the generalizability of findings from GWASs of European populations to Asian populations.

Methods:

We analyzed genetic data from 22,775 patients with CRC (cases) and 47,731 individuals without cancer (controls) from 14 studies in the Asia Colorectal Cancer Consortium. First, we performed a meta-analysis of 7 GWASs (10,625 cases and 34,595 controls) and identified 46,554 promising risk variants, which we added to the Multi-Ethnic Global Array (MEGA) for replication genotyping in 6445 cases and 7175 controls. These data were analyzed along with data from an additional 5705 cases and 5961 controls genotyped using the OncoArray. We also obtained data from 57,976 cases and 67,242 controls of European descent. Variants at identified risk loci were functionally annotated and evaluated for correlation with gene expression levels.

Results:

A meta-analysis of all samples from people of Asian descent identified 13 loci and 1 new variant at a known locus (10q24.2) associated with risk of CRC at the genome-wide significance level of P < 5 × 10−8. We did not perform experiments to replicate these associations in additional individuals of Asian ancestry. However, the lead risk variants at 6 of these loci were also significantly associated with risk of CRC in European descendants. A strong association (44%–75% increase in risk per allele) was found for 2 low-frequency variants: rs201395236 at 1q44 (minor allele frequency, 1.34%) and rs77969132 at 12p11.21 (minor allele frequency, 1.53%). For 8 of the 13 associated loci, the most significantly associated variants were located inside or near the protein-coding genes L1TD1, EFCAB2, PPP1R21, SLCO2A1, HLA-G, NOTCH4, DENND5B, and GNAS. For the other, intergenic loci, we provided evidence for the possible involvement of the genes ALDH7A1, PRICKLE1, KLF5, WWOX, and GLP2R. We replicated findings for 41 of 52 previously reported risk loci.

Conclusions:

We showed that most of the risk loci previously associated with CRC risk in individuals of European descent were also associated with CRC risk in East Asians. Furthermore, we identified 13 loci significantly associated with risk for CRC in Asians. Many of these loci contained genes that regulate the immune response, Wnt signaling to beta-catenin, prostaglandin E2 catabolism, and cell pluripotency and proliferation. Further analyses of these genes and their variants are warranted, particularly for the 8 loci for which the lead CRC risk variants were not replicated in persons of European descent.

Keywords: colon cancer, genetic variants, immunology, epidemiology

Introduction

Colorectal cancer (CRC) is one of the most frequently diagnosed malignancies worldwide. Genetic factors play a significant role in the etiology of both familial and sporadic CRC [1, 2]. Family-based linkage studies and recent whole-exome sequencing studies have identified multiple CRC susceptibility genes, such as APC, MUTYH, MLH1, MSH2, MSH6, PMS2, PTEN, STK11, GREM1, BMPR1A, SMAD4, POLE, POLD1, NTHL1, and TP53 [3–5]. Deleterious mutations in these genes, however, are rare and account for less than 6% of CRC cases in the general population [3]. To date, genome-wide association studies (GWASs) have identified 52 independent loci associated with CRC risk [3, 6–9]. These common genetic risk variants, however, explain only a small fraction of the familial relative risk of CRC [8, 10], particularly in non-European descendants.

In 2010, we initiated the Asia Colorectal Cancer Consortium (ACCC) to identify new genetic risk factors for CRC. Given the differences in genetic architecture and environmental exposures between Asian and European descendants, we hypothesized that our study could identify CRC risk variants that are more specific to Asian populations and uncover important novel risk variants that might otherwise be difficult to identify in European descendants. Since then, we have reported 13 novel CRC risk loci and two independent risk variants at known CRC risk loci [6–8, 11]. In this study, we report the discovery of another 13 novel risk loci and one new risk variant at a known locus in a large study that includes more than 70,000 cases and controls of East Asian ancestry.

Methods

Overview of Study Population and Study Design

The current study includes 22,775 CRC cases and 47,731 controls of East Asian ancestry from 14 studies conducted in China, Japan, and South Korea (Supplementary Notes, Supplementary Figure 1 and Supplementary Table 1). In Stage 1, we meta-analyzed data from seven studies, including 10,625 cases and 34,595 controls, that were genotyped using high-density SNP arrays. We identified 46,554 variants associated with CRC risk at P < 0.005 (for common variants) or at P < 0.0005 (for rare variants). These variants were added as custom content to the Multi-Ethnic Global Array (MEGA), which includes ~2 million variants, as part of a large study of multiple complex traits. Updated data from the Japanese CRC Study (BBJ) subsequently became available and were therefore included in the Stage 1 meta-analysis [12]. In Stage 2, we analyzed 6,445 cases and 7,175 controls from five studies genotyped using the expanded MEGA (described above) and 5,705 cases and 5,961 controls from two studies genotyped using the OncoArray, which contains approximately 570,000 SNPs, including ~260,000 SNPs that form a GWAS backbone. To evaluate the generalizability of our findings to European descendants, we used data from 57,976 cases and 67,242 controls of European ancestry recruited in North America, Europe, and Australia. These cases and controls were included in three consortia: the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO), the Colorectal Transdisciplinary (CORECT) Study, and the Colon Cancer Family Registry (CCFR). The details of these studies have been described previously [9]. A description of the participating studies is presented in the Supplementary Notes. All study protocols were approved by the relevant Institutional Review Boards.

Genotyping and Imputation

Details of genotyping, genotype calling, and quality control are provided in the Supplementary Notes. The genotyping and quality control procedures for Stage 1 have been described previously [6–8, 11, 12]. Genotyping for the Guangzhou-2 samples was completed using the Asian ExomeChip, an expanded Illumina HumanExome BeadChip that includes 159,370 rare variants (minor allele frequency (MAF) < 1%) and 49,603 common variants. Genotyping and quality control of samples in the BBJ were reported previously [8, 11–13]. Genotyping for the Shanghai-3 and HCES-CRC samples in Stage 2 was completed using the Illumina OncoArray [14], and genotyping for samples from Shanghai-4, Aichi-2, Korea-NCC, Korea-Seoul, and HCES2-CRC was completed using the expanded Illumina MEGA.

To assess imputation quality, we re-genotyped a subset of samples (N = 918) for all 13 index SNPs using the Sequenom MassARRAY platform. The overall concordance rates between imputed and directly genotyped data were high, indicating excellent imputation quality for these SNPs (Supplementary Table 12).
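To make the concordance check concrete, the sketch below compares best-guess imputed genotypes with directly assayed calls. The source does not describe how its concordance rates were computed, so rounding each dosage to the nearest integer genotype, and the function name `concordance_rate`, are illustrative assumptions.

```python
def concordance_rate(imputed_dosages, genotyped_counts):
    """Fraction of samples whose best-guess imputed genotype matches the assay call.

    Assumption (not from the source): dosages are expected risk-allele counts in
    [0, 2], and the best-guess genotype is the dosage rounded to the nearest
    integer (Python's round() breaks exact .5 ties toward the even integer).
    """
    if not genotyped_counts or len(imputed_dosages) != len(genotyped_counts):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(
        int(round(dosage)) == call
        for dosage, call in zip(imputed_dosages, genotyped_counts)
    )
    return matches / len(genotyped_counts)


# Hypothetical example: four samples, one discordant call.
print(concordance_rate([0.02, 1.1, 1.96, 0.9], [0, 1, 2, 2]))
```

A dosage-based correlation (for example, squared Pearson correlation between dosages and calls) is a common alternative summary; simple agreement is shown here because the text reports concordance rates.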

We evaluated population structure with principal component (PC) analysis using EIGENSTRAT [15] and found little evidence of population stratification in any of the studies included in this analysis (Supplementary Notes, Supplementary Figures 2 and 3). To increase genome coverage and facilitate the meta-analysis, we imputed untyped genotypes using the 1000 Genomes Project Phase 3 mixed-population reference haplotypes. After quality control filtering, approximately 8.6 to 10.7 million genotyped or imputed variants on the 22 autosomes remained for the association analysis. The Guangzhou-2 study was genotyped with an exome array, and no imputation was performed for it, given the low genome coverage of this array.

Statistical Analysis

The score test implemented in rvtests [16] was used to evaluate the association of genotype dosage with colorectal cancer risk in each individual study, adjusting for age, sex, and, when appropriate, the first five PCs (Supplementary Notes). The likelihood ratio test in mach2dat was used to assess this association in the BBJ. SNPs were excluded from the analysis if they had low imputation quality (R² < 0.3) or low MAF (< 0.1%) in the combined cases and controls. Summary statistics from the 14 case-control studies in Stages 1 and 2 were meta-analyzed using the inverse variance-weighted fixed-effect model implemented in METAL [17]. Variants associated with CRC risk at P < 5 × 10−8 in the combined analysis of Stage 1 and Stage 2 were considered genome-wide significant [18]. Each unique locus was defined as the region within ±500 kb of the most significant SNP (P < 5 × 10−8). We evaluated heterogeneity across studies and subgroups with Cochran's Q test [19]. No apparent inflation was noted for any of the studies in Stage 1 or Stage 2: the genomic inflation factors λGC and λ1000 were close to 1 in models adjusted for the first five principal components (Supplementary Figure 4, Supplementary Notes).
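For readers less familiar with the fixed-effect model, the sketch below shows standard inverse-variance pooling of per-study log odds ratios, together with Cochran's Q, I², and the λGC and λ1000 inflation factors mentioned above. It is a minimal illustration under stated assumptions (per-study betas and standard errors on the log-OR scale; 1-df z statistics for λGC; the commonly used rescaling to 1,000 cases and 1,000 controls for λ1000), not the METAL or rvtests implementation, and all function names are hypothetical.

```python
import math
import statistics

CHI2_1DF_MEDIAN = 0.454936423119572  # median of a chi-square distribution with 1 df


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis of per-study log-ORs.

    Returns the pooled log-OR, its SE, a two-sided normal P value,
    Cochran's Q, and I^2 (in percent).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need non-empty, equal-length betas and SEs")
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided P under N(0, 1)
    # Cochran's Q: weighted squared deviations of study estimates from the pooled one
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": beta, "se": se, "p": p, "Q": q, "I2": i2}


def lambda_gc(z_scores):
    """Genomic inflation factor: median observed 1-df chi-square over its null median."""
    return statistics.median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def lambda_1000(lam, n_cases, n_controls):
    """Rescale lambda_GC to an equivalent study of 1,000 cases and 1,000 controls."""
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lam - 1.0) * scale
```

For example, `fixed_effect_meta([0.09, 0.07, 0.10], [0.02, 0.03, 0.025])` pools three hypothetical studies; `math.exp` of the returned `beta` gives the pooled OR, and the variant would be called genome-wide significant if the returned `p` were below 5 × 10−8.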

To identify additional association signals at the same loci, we performed conditional analyses using the GCTA-COJO approach [20] on the meta-analysis summary statistics, with a linkage disequilibrium reference matrix derived from 6,684 unrelated East Asian samples genotyped with MEGA (pruned to a pairwise genetic relatedness < 0.025).

The familial relative risk (λ) to the offspring of an affected individual due to a single locus was estimated under a log-additive model [21]:

$$\lambda = \frac{p r^{2} + q}{(p r + q)^{2}},$$

where p is the frequency of the risk allele, q = 1 − p is the frequency of the reference allele, and r is the per-allele relative risk. The proportion of the familial relative risk explained by this locus, assuming a multiplicative interaction between markers at this locus and other loci, was calculated as log(λ)/log(λ0), where λ0 = 2.2 is the overall familial relative risk for CRC derived from a meta-analysis [22]. Assuming that the risks associated with individual loci combine multiplicatively, the familial relative risks also multiply. Thus, the combined contribution of K loci to the familial relative risk is

$$\frac{\sum_{k=1}^{K} \log \lambda_k}{\log \lambda_0}.$$
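The two formulas above can be evaluated directly. The sketch below assumes, as is common in such calculations, that the per-allele OR reported in Table 1 can stand in for the per-allele relative risk r; the function names are hypothetical.

```python
import math


def familial_rr(p, r):
    """Familial relative risk to offspring of an affected individual from one locus.

    p: risk-allele frequency; r: per-allele relative risk (log-additive model).
    """
    if not 0.0 < p < 1.0 or r <= 0:
        raise ValueError("p must lie in (0, 1) and r must be positive")
    q = 1.0 - p
    return (p * r ** 2 + q) / (p * r + q) ** 2


def frr_fraction_explained(loci, lambda0=2.2):
    """Share of the overall familial RR explained by loci combining multiplicatively.

    loci: iterable of (risk-allele frequency, per-allele relative risk) pairs.
    """
    return sum(math.log(familial_rr(p, r)) for p, r in loci) / math.log(lambda0)


# Illustration with two Table 1 loci (rs1078643 and rs113569514), using the
# ACCC risk-allele frequency and OR as stand-ins for p and r.
loci = [(0.771, 1.13), (0.620, 1.10)]
print(f"{100 * frr_fraction_explained(loci):.2f}% of the familial relative risk")
```

Because the contributions are sums of logarithms, adding loci simply adds their individual shares, which is why the per-group percentages reported in the Results can be summed.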

Using an additive genetic model, we calculated weighted polygenic risk scores (PRSs), using as the variant-specific weight the natural log odds ratio (OR) obtained from the current analyses. The PRS was calculated as the sum, over variants, of the weight multiplied by the number of risk alleles, using each of the 57 replicated (P < 0.05) and newly identified GWAS index variants that had an overall imputation R² > 0.8 across the seven studies genotyped with either the OncoArray or MEGA. Using the lowest PRS quintile as the reference, ORs for the other quintile groups were estimated using logistic regression, adjusting for age, sex, study, and the first five PCs (Supplementary Table 11).
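A minimal sketch of the scoring and grouping steps follows. It assumes risk-allele counts (or imputed dosages) are coded 0 to 2 and uses sample quintile cut points from Python's `statistics.quantiles`; the covariate-adjusted logistic regression used to estimate the ORs per quintile is not shown, and the helper names are hypothetical.

```python
import bisect
import math
import statistics


def weighted_prs(risk_allele_counts, odds_ratios):
    """Weighted PRS: sum over variants of ln(OR) times the risk-allele count (or dosage)."""
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one risk-allele count per variant is required")
    return sum(math.log(or_) * n for n, or_ in zip(risk_allele_counts, odds_ratios))


def quintile_groups(scores):
    """Assign each score to a quintile group (1 = lowest) using sample cut points.

    Scores exactly equal to a cut point fall into the lower group.
    """
    cuts = statistics.quantiles(scores, n=5)  # four cut points
    return [bisect.bisect_left(cuts, s) + 1 for s in scores]


# Hypothetical individual carrying 2, 0, and 1 risk alleles at three variants.
print(weighted_prs([2, 0, 1], [1.10, 1.08, 1.16]))
```

In the analysis described above, group 1 would serve as the reference category, and each higher group would enter the logistic regression as an indicator variable.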

Pathway Analysis

The pathway analysis was performed with MAGMA [23]. This analysis mapped all quality-controlled variants onto genes, computed a gene-level association statistic from the mean of the variant-wise statistics for each gene, and then conducted a generalized gene-set analysis adjusting for gene size, gene density, and other covariates. The East Asian populations of the 1000 Genomes Project Phase 3 were used as the reference dataset for linkage disequilibrium estimation. Gene sets were obtained from MSigDB v5.2 (Supplementary Table 9). The false discovery rate (FDR) was used to adjust for multiple testing.
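The text states only that the FDR was used. Assuming the common Benjamini-Hochberg step-up procedure, adjusted P values for a list of gene-set P values can be computed as in the sketch below; it is a generic implementation, not MAGMA's, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the original order.

    Each adjusted value is min over ranks k >= i of p_(k) * m / k, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical gene-set P values; sets with adjusted P < 0.05 would be reported.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```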

In Silico Functional Characterization of Novel Loci

To identify potential functional variants and target genes at the novel risk loci uncovered in this study, we used ANNOVAR [24] to annotate all variants within 500 kb that are in linkage disequilibrium (LD) (r² ≥ 0.8 in the East Asian (EAS) samples of the 1000 Genomes Project) with the lead variants from our GWAS (189 variants in total; Supplementary Table 6). The functional impact of these variants was assessed with GERP [25], SiPhy [26], and PolyPhen2 [27]. We further characterized the 189 variants using the web-based HaploReg v4 [28]. Data from the Roadmap Epigenomics Project (ChromHMM states corresponding to enhancer or promoter elements, histone modification ChIP-seq peaks, and DNase hypersensitivity peaks) were extracted, along with regulatory protein binding data from the ENCODE project and regulatory motifs compiled from commercial databases, the literature, and ENCODE motif-finding analyses. We also examined these variants using the web-based RegulomeDB [29].
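The proxy-selection rule above (r² ≥ 0.8 with the lead variant, within 500 kb) can be sketched as follows. The source computed LD from the 1000 Genomes EAS reference; this sketch instead uses the squared Pearson correlation of genotype dosages, a common approximation of haplotype-based r², and the function names and input layout are hypothetical.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation of two genotype-dosage vectors (an LD r^2 proxy)."""
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("need two equal-length vectors with at least two samples")
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # monomorphic variant: LD undefined, so never treated as a proxy
    return cov * cov / (v1 * v2)


def ld_proxies(lead_pos, lead_dosages, candidates, r2_min=0.8, window_bp=500_000):
    """IDs of candidate variants within window_bp of the lead with r^2 >= r2_min.

    candidates: iterable of (variant_id, position_bp, dosages) tuples.
    """
    return [
        vid
        for vid, pos, dosages in candidates
        if abs(pos - lead_pos) <= window_bp
        and genotype_r2(lead_dosages, dosages) >= r2_min
    ]
```

Because r² is symmetric in allele coding, a proxy whose alleles are coded in the opposite orientation to the lead variant is still retained.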

cis-Expression Quantitative Trait Loci (cis-eQTL) Analysis

We performed cis-eQTL analyses for the newly identified risk variants using data from tumor-adjacent normal tissue samples obtained from 133 East Asian CRC patients [7, 8]. For these analyses, we defined cis as a ±1 Mb region flanking each risk variant. Germline DNA from these patients was genotyped using the MEGA, and gene expression levels were measured using RNA sequencing. Associations between gene expression levels and SNP genotypes were analyzed using a linear regression model adjusting for sex and the top two PCs. We also evaluated the association of these variants with gene expression using data from transverse colon tissues (n = 246) in the Genotype-Tissue Expression (GTEx) database [30].
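The core of each cis-eQTL test is a regression of expression on genotype dosage. The sketch below fits that single-predictor model by ordinary least squares and returns the slope and its standard error. It deliberately omits the sex and PC covariates used in the analysis above; a fuller version could add them as extra regressors, for example with `numpy.linalg.lstsq` or statsmodels `OLS`. The function name is hypothetical.

```python
import math


def eqtl_ols(dosages, expression):
    """OLS slope of expression on risk-allele dosage, with its standard error.

    Simplification (not from the source): no covariates are included.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mx = sum(dosages) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype shows no variation")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares gives the error variance with n - 2 degrees of freedom.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se
```

A negative slope for the risk-allele dosage corresponds to the "risk allele associated with reduced expression" pattern described in the Results.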

Results

In the combined analysis of 22,775 CRC cases and 47,731 controls, we identified 13 novel risk loci for CRC at the genome-wide significance level (P < 5 × 10−8) (Table 1, Supplementary Figure 5). At two of these loci, the lead SNPs have a low MAF: rs201395236 at 1q44 (MAF = 1.34%, allelic OR = 1.75 for the major allele) and rs77969132 at 12p11.21 (MAF = 1.53%, allelic OR = 1.44 for the minor allele). The risks associated with these two low-frequency variants were substantially higher than those of any common risk variant reported in previous GWASs. The lead SNPs at the remaining 11 loci were common (MAF > 5%), with MAFs ranging from 14% to 43% and ORs ranging from 1.08 to 1.16. We also identified a new risk variant, rs6584283, at a known CRC locus, 10q24.2, that was initially reported in a GWAS of European descendants. This variant was not in LD with either of the two previously reported variants (r² < 0.01 in both EUR and EAS with rs1035209 [31] and rs11190164 [32]), and its association remained genome-wide significant after adjustment for these two variants (conditional OR (95% CI) for the C allele = 1.08 (1.06–1.11), conditional P = 5.9 × 10−10).

Table 1.

Summary results of newly identified genetic risk loci/variants associated with colorectal cancer risk at P < 5 × 10−8: Results from the Asia Colorectal Cancer Consortium

Locus | SNP | Position (hg19) | Nearby genes and annotation | Alleles | RAF (%) | OR (95% CI) | P value | PHet | I² (%)
Novel risk loci
1p31.3 | rs7542665 | 62,673,037 | missense of L1TD1 | C/T | 27.3 | 1.08 (1.05–1.11) | 3.51 × 10−8 | 0.98 | 0.0
1q44 | rs201395236 | 245,181,421 | intronic of EFCAB2 | T/C | 98.7 | 1.75 (1.43–2.13) | 4.63 × 10−8 | 0.52 | 0.0
2p16.3 | rs7606562 | 48,686,695 | intronic of PPP1R21 | T/A | 81.3 | 1.10 (1.07–1.14) | 1.21 × 10−8 | 0.88 | 0.0
3q22.2 | rs113569514 | 133,748,789 | 5'UTR of SLCO2A1 | T/C | 62.0 | 1.10 (1.07–1.13) | 2.45 × 10−12 | 0.92 | 0.0
5q23.2 | rs12659017 | 125,988,175 | intergenic (ALDH7A1, PHAX) | G/A | 23.2 | 1.09 (1.06–1.12) | 4.45 × 10−8 | 0.11 | 33.5
6p22.1 | rs1476570 | 29,809,860 | intronic of HLA-G | A/G | 37.6 | 1.12 (1.08–1.16) | 6.71 × 10−9 | 0.01 | 58.0
6p21.32 | rs3830041 | 32,191,339 | intronic of NOTCH4 | T/C | 14.0 | 1.16 (1.10–1.21) | 1.65 × 10−8 | 0.03 | 46.2
12p11.21 | rs77969132 | 31,594,813 | intronic of DENND5B | T/C | 1.5 | 1.44 (1.27–1.65) | 4.85 × 10−8 | 0.18 | 26.2
12q12 | rs2730985 | 43,130,624 | 147 kb 5' of PRICKLE1 | G/A | 62.9 | 1.08 (1.05–1.11) | 1.23 × 10−8 | 0.20 | 23.9
13q22.1 | rs1886450 | 73,986,628 | intergenic (KLF5, KLF12) | G/A | 59.3 | 1.09 (1.07–1.12) | 6.28 × 10−12 | 0.60 | 0.0
16q23.2 | rs4341754 | 80,039,621 | intergenic (WWOX, MAF) | G/C | 57.0 | 1.09 (1.06–1.12) | 1.73 × 10−9 | 0.64 | 0.0
17p12 | rs1078643 | 10,707,241 | 19 kb 3' of PIRT | A/G | 77.1 | 1.13 (1.09–1.17) | 8.40 × 10−13 | 0.57 | 0.0
20q13.32 | rs13831 | 57,475,191 | intronic of GNAS | G/A | 68.4 | 1.08 (1.05–1.11) | 2.06 × 10−8 | 0.68 | 0.0
Novel risk variant at a known locus
10q24.2 | rs6584283 | 101,290,301 | 2 kb 5' of NKX2-3 | C/T | 56.4 | 1.09 (1.06–1.12) | 1.21 × 10−10 | 0.65 | 0.0

hg19: genome build 37; Alleles: risk allele/reference allele; RAF, risk allele frequency calculated with the combined Stage 1 and Stage 2 samples; OR, odds ratio; CI, confidence interval; I² and PHet: derived from the heterogeneity test across all 14 studies included in Stage 1 and Stage 2.

All 14 risk variants showed associations in a consistent direction in Stage 1 and Stage 2, and 12 of them were statistically significant at P < 0.05 in both stages (Supplementary Table 2). Although the associations for rs201395236 and rs3830041 were not significant at P < 0.05 in Stage 1, both were statistically significant in each of the two components of Stage 2 (OncoArray and MEGA) (Supplementary Table 3). Associations with CRC risk for each of the 14 new risk variants were consistent across all studies included in Stages 1 and 2, with little evidence of heterogeneity (Table 1). Only two variants (rs1476570 and rs3830041) showed some heterogeneity across the 14 participating studies (PHet = 0.01 and 0.03, respectively); however, neither remained statistically significant after adjustment for multiple comparisons. No statistically significant differences in the associations were found across ethnic groups (Supplementary Table 4).

Next, we evaluated the generalizability of the associations for the newly identified CRC risk variants to populations of European ancestry, using data from 57,976 cases and 67,242 controls. Imputed genotype data were available for 12 variants (Table 2). Six variants were associated with CRC risk at P < 0.05 in the same direction as observed in the East Asian population (rs113569514, rs6584283, rs2730985, rs1886450, rs4341754, and rs1078643), including five (rs113569514, rs6584283, rs2730985, rs4341754, and rs1078643) with P < 4 × 10−3, the Bonferroni-corrected significance level (0.05/12 variants). However, there was considerable heterogeneity between Asian and European descendants in the strength of the associations of these SNPs with CRC risk, even for SNPs that were significantly associated in both populations. Furthermore, the frequencies of the risk alleles differ considerably between the two populations (Tables 1 and 2). These results are consistent with our hypothesis and support conducting genetic association studies in non-European populations.
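The replication counts in the preceding paragraph can be re-derived from Table 2. The sketch below encodes the OR and P value for the 12 variants with imputed European data and applies the nominal (0.05) and Bonferroni (0.05/12) thresholds, treating OR > 1 for the risk allele defined in the East Asian analysis as the same direction of effect. It is a checking aid, not part of the original analysis, and the names are hypothetical.

```python
# (OR, P value) in European-ancestry data for the 12 variants with imputed
# genotypes (Table 2); ORs refer to the risk allele defined in the East Asian analysis.
EUROPEAN_RESULTS = {
    "rs7542665": (1.01, 0.23),
    "rs7606562": (1.01, 0.35),
    "rs113569514": (1.06, 1.6e-5),
    "rs12659017": (0.99, 0.46),
    "rs1476570": (1.01, 0.44),
    "rs3830041": (1.00, 0.92),
    "rs6584283": (1.03, 3.6e-4),
    "rs2730985": (1.05, 5.5e-9),
    "rs1886450": (1.02, 9.7e-3),
    "rs4341754": (1.05, 5.2e-8),
    "rs1078643": (1.08, 6.6e-12),
    "rs13831": (1.01, 0.32),
}


def replicated(results, alpha=0.05):
    """SNPs with OR > 1 (same direction as in East Asians) and P below alpha."""
    return sorted(
        snp for snp, (odds_ratio, p) in results.items() if odds_ratio > 1 and p < alpha
    )


nominal = replicated(EUROPEAN_RESULTS)
bonferroni = replicated(EUROPEAN_RESULTS, alpha=0.05 / len(EUROPEAN_RESULTS))
print(len(nominal), len(bonferroni))
```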

Table 2.

Associations of newly identified risk variants with colorectal cancer risk in populations of European Ancestry

Locus | SNP | Position (hg19) | Alleles | RAF (%)ᵃ | OR (95% CI) | P value | PHet-ASN vs EUR | I² (%)
1p31.3 | rs7542665 | 62,673,037 | C/T | 66.2 | 1.01 (0.99–1.03) | 0.23 | 6.69 × 10−5 | 93.7
1q44 | rs201395236 | 245,181,421 | T/C | 100 | NAᵇ | - | - | -
2p16.3 | rs7606562 | 48,686,695 | T/A | 65.8 | 1.01 (0.99–1.03) | 0.35 | 4.83 × 10−6 | 95.2
3q22.2 | rs113569514 | 133,748,789 | T/C | 84.7 | 1.06 (1.03–1.08) | 1.6 × 10−5 | 0.02 | 82.2
5q23.2 | rs12659017 | 125,988,175 | G/A | 73.1 | 0.99 (0.97–1.01) | 0.46 | 5.95 × 10−7 | 96.0
6p22.1 | rs1476570 | 29,809,860 | A/G | 27.6 | 1.01 (0.99–1.03) | 0.44 | 1.62 × 10−6 | 95.7
6p21.32 | rs3830041 | 32,191,339 | T/C | 9.1 | 1.00 (0.97–1.03) | 0.92 | 1.75 × 10−6 | 95.6
10q24.2 | rs6584283 | 101,290,301 | C/T | 53.3 | 1.03 (1.01–1.05) | 3.6 × 10−4 | 8.21 × 10−4 | 91.1
12p11.21 | rs77969132 | 31,594,813 | T/C | 0 | NAᵇ | - | - | -
12q12 | rs2730985 | 43,130,624 | G/A | 52.3 | 1.05 (1.03–1.07) | 5.5 × 10−9 | 0.10 | 63.9
13q22.1 | rs1886450 | 73,986,628 | G/A | 69.1 | 1.02 (1.01–1.04) | 9.7 × 10−3 | 5.04 × 10−5 | 93.9
16q23.2 | rs4341754 | 80,039,621 | G/C | 43.6 | 1.05 (1.03–1.07) | 5.2 × 10−8 | 0.04 | 77.0
17p12 | rs1078643 | 10,707,241 | A/G | 75.9 | 1.08 (1.05–1.10) | 6.6 × 10−12 | 0.02 | 81.7
20q13.32 | rs13831 | 57,475,191 | G/A | 70.5 | 1.01 (0.99–1.03) | 0.32 | 4.39 × 10−5 | 94.0

hg19: genome build 37; Alleles: risk allele/reference allele as defined in the Asia Colorectal Cancer Consortium (see Table 1); RAF, risk allele frequency; OR, odds ratio; CI, confidence interval; PHet-EUR: derived from the heterogeneity test across all studies included in the European meta-analysis. I² and PHet-ASN vs EUR: heterogeneity between East Asian and European populations calculated with Cochran's Q test.

ᵃ Risk allele frequency derived from European populations of the 1000 Genomes Project (Phase 3).

ᵇ NA, not estimated due to low imputation quality in European populations.

Evaluation of Previously Reported CRC Risk Variants

We investigated 62 previously reported CRC risk variants at 52 independent loci (Table 3 and Supplementary Table 5) [9, 33]. There were 19 independent risk variants at 18 loci initially reported in GWASs of East Asian populations. All of them were replicated at P < 0.05, and 18 of them showed an association at P < 0.005, the Bonferroni-corrected significance level (0.10/19, one-sided tests) (Table 3). Of the 43 risk variants at 37 loci initially reported in European populations, 26 variants at 24 loci were associated with CRC risk at P < 0.05 in the same direction as initially reported, including 17 variants at P < 0.002 (0.10/43, one-sided tests). Only five of the heterogeneity tests were statistically significant at P < 0.05, and none reached the multiple comparison-adjusted significance level of 9 × 10−4 (0.05/53), indicating little heterogeneity.
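One way to read the thresholds above is as one-sided tests at α = 0.05 divided by the number of variants, which, for effects in the previously reported direction, is equivalent to a two-sided P below 0.10/k. Under that reading (our interpretation, not stated explicitly in the text), the East Asian counts in Table 3 can be checked with the sketch below; all 19 variants in this group have OR > 1, so direction is not re-checked, and the names are hypothetical.

```python
# Two-sided ACCC P values for the 19 variants first reported in East Asian GWASs
# (Table 3, in table order); all have OR > 1 for the previously reported risk allele.
EAST_ASIAN_P = [
    3.10e-15, 6.17e-6, 2.39e-14, 2.06e-10, 1.17e-9, 2.94e-4, 1.77e-11,
    3.19e-15, 7.52e-12, 2.60e-9, 1.95e-5, 3.47e-3, 0.03, 2.62e-7,
    6.40e-5, 2.49e-27, 1.61e-5, 2.87e-10, 3.65e-7,
]


def two_sided_cutoff(n_tests, alpha_one_sided=0.05):
    """Two-sided P threshold matching a one-sided Bonferroni test at alpha / n_tests.

    Only meaningful when the observed effect is in the previously reported direction.
    """
    return 2 * alpha_one_sided / n_tests


cutoff = two_sided_cutoff(len(EAST_ASIAN_P))  # 0.10 / 19, about 0.0053
n_nominal = sum(p < 0.05 for p in EAST_ASIAN_P)
n_bonferroni = sum(p < cutoff for p in EAST_ASIAN_P)
print(n_nominal, n_bonferroni)
```

The only variant in this group that passes P < 0.05 but not the corrected cut-off is rs2238126 (P = 0.03).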

Table 3.

Association results in East Asian populations for risk variants reported in previous GWAS of colorectal cancer, Results from the Asia Colorectal Cancer Consortiumᵃ

Locus | SNP | Nearby genes | Alleles | RAF (1KGP, EAS/EUR) | RAF (ACCC) | OR (95% CI) | P value | PHet-ASN | I² (%)
Loci initially identified in GWAS of East Asian-ancestry populationsᵇ
5q31.1 | rs647161 | PITX1 | A/C | 0.30/0.66 | 0.33 | 1.12 (1.09–1.15) | 3.10 × 10−15 | 0.04 | 45.2
6p21.1 | rs4711689 | TFEB | A/G | 0.80/0.54 | 0.85 | 1.09 (1.05–1.13) | 6.17 × 10−6 | 0.53 | 0.0
8q23.3 | rs2450115 | EIF3H | T/C | 0.54/0.81 | 0.56 | 1.11 (1.08–1.14) | 2.39 × 10−14 | 0.92 | 0.0
8q23.3 | rs6469656 | EIF3H | A/G | 0.67/0.89 | 0.65 | 1.09 (1.06–1.12) | 2.06 × 10−10 | 0.41 | 3.3
10q22.3 | rs704017 | ZMIZ1 | G/A | 0.28/0.56 | 0.33 | 1.09 (1.06–1.12) | 1.17 × 10−9 | 0.66 | 0.0
10q24.32 | rs4919687 | CYP17A1 | G/A | 0.75/0.70 | 0.83 | 1.07 (1.03–1.11) | 2.94 × 10−4 | 0.81 | 0.0
10q25.2 | rs12241008 | VTI1A | C/T | 0.30/0.10 | 0.27 | 1.11 (1.07–1.14) | 1.77 × 10−11 | 0.39 | 5.9
10q25.2 | rs11196172 | TCF7L2 | A/G | 0.65/0.12 | 0.69 | 1.12 (1.09–1.16) | 3.19 × 10−15 | 0.41 | 3.8
11q12.2 | rs174537 | MYRF | G/T | 0.43/0.65 | 0.62 | 1.10 (1.07–1.13) | 7.52 × 10−12 | 0.36 | 8.4
12p13.32 | rs10774214 | CCND2 | T/C | 0.32/0.38 | 0.40 | 1.09 (1.06–1.13) | 2.60 × 10−9 | 0.41 | 3.5
12p13.31 | rs10849432 | PLEKHG6 | T/C | 0.81/0.90 | 0.82 | 1.08 (1.04–1.12) | 1.95 × 10−5 | 0.27 | 16.7
12p13.31 | rs11064437 | SPSB2 | C/T | 0.72/0.99 | 0.72 | 1.04 (1.01–1.07) | 3.47 × 10−3 | 0.43 | 1.6
12p13.2 | rs2238126 | ETV6 | G/A | 0.47/0.19 | 0.49 | 1.03 (1.00–1.06) | 0.03 | 0.45 | 0.0
16q24.1 | rs847208 | FENDRR | A/C | 0.68/0.62 | 0.67 | 1.08 (1.05–1.11) | 2.62 × 10−7 | 0.04 | 44.2
17p13.3 | rs12603526 | NXN | C/T | 0.20/0.01 | 0.32 | 1.06 (1.03–1.09) | 6.40 × 10−5 | 0.04 | 43.2
18q21.1 | rs7229639 | SMAD7 | A/G | 0.13/0.10 | 0.17 | 1.21 (1.17–1.25) | 2.49 × 10−27 | 0.68 | 0.0
19q13.2 | rs1800469 | TMEM91 | G/A | 0.45/0.69 | 0.50 | 1.06 (1.03–1.09) | 1.61 × 10−5 | 0.42 | 2.8
20p12.3 | rs2423279 | HAO1 | C/T | 0.33/0.27 | 0.31 | 1.10 (1.06–1.13) | 2.87 × 10−10 | 0.20 | 24.2
20q13.12 | rs6065668 | TOX2 | T/C | 0.47/0.25 | 0.40 | 1.07 (1.04–1.10) | 3.65 × 10−7 | 0.09 | 36.9
Loci initially identified in GWAS of European-ancestry populationsᶜ
1q25.3 | rs10911251 | LAMC1 | A/C | 0.48/0.55 | 0.54 | 1.05 (1.03–1.08) | 1.03 × 10−4 | 0.42 | 2.9
2q35 | rs992157 | TMBIM1 | A/G | 0.62/0.58 | 0.59 | 1.04 (1.01–1.07) | 6.26 × 10−3 | 0.79 | 0.0
3p22.1 | rs35360328 | CTNNB1 | A/T | 0.08/0.16 | 0.06 | 1.09 (1.02–1.16) | 0.01 | 0.04 | 50.0
3p14.1 | rs812481 | LRIG1 | G/C | 0.79/0.56 | 0.78 | 0.97 (0.94–1.00) | 0.07 | 0.25 | 19.4
3q26.2 | rs10936599 | MYNN | C/T | 0.42/0.76 | 0.39 | 1.06 (1.04–1.09) | 2.41 × 10−6 | 0.56 | 0.0
4q22.2 | rs1370821 | SMARCAD1 | T/C | 0.33/0.40 | 0.26 | 1.01 (0.98–1.04) | 0.53 | 0.30 | 13.9
5p15.33 | rs2735940 | TERT | G/A | 0.52/0.49 | 0.61 | 1.07 (1.04–1.10) | 6.00 × 10−6 | 0.35 | 9.3
5p13.1 | rs58791712 | PTGER4 | GT/G | 0.04/0.26 | 0.04 | 1.12 (1.02–1.22) | 0.02 | 0.61 | 0.0
6p21.31 | rs6906359 | FKBP5 | C/T | 0.93/0.90 | 0.94 | 1.04 (0.98–1.10) | 0.20 | 0.69 | 0.0
6p21.2 | rs1321311 | CDKN1A | A/C | 0.17/0.22 | 0.15 | 1.09 (1.05–1.13) | 1.35 × 10−6 | 0.21 | 22.3
6p12.1 | rs62404968 | BMP5 | C/T | 0.94/0.75 | 0.96 | 1.08 (1.01–1.15) | 0.03 | 0.34 | 10.7
8q24.21 | rs6983267 | POU5F1B | G/T | 0.39/0.50 | 0.39 | 1.17 (1.14–1.20) | 1.38 × 10−31 | 0.05 | 42.5
10p14 | rs10795668 | GATA3 | G/A | 0.63/0.68 | 0.61 | 1.14 (1.11–1.17) | 5.20 × 10−24 | 0.27 | 17.2
10q11.23 | rs10994860 | A1CF | C/T | 0.95/0.80 | 0.95 | 1.04 (0.97–1.10) | 0.25 | 0.91 | 0.0
10q24.2 | rs1035209 | SLC25A28 | T/C | 0.18/0.19 | 0.18 | 1.08 (1.04–1.12) | 1.12 × 10−5 | 0.64 | 0.0
11q13.4 | rs3824999 | POLD3 | G/T | 0.42/0.52 | 0.41 | 1.07 (1.04–1.09) | 3.16 × 10−6 | 1.00 | 0.0
11q23.1 | rs3802842 | COLCA2 | C/A | 0.40/0.27 | 0.39 | 1.07 (1.04–1.10) | 5.00 × 10−7 | 0.25 | 18.6
12p13.32 | rs3217810 | CCND2 | T/C | 0.01/0.12 | 0.01 | 1.11 (0.97–1.29) | 0.14 | 0.73 | 0.0
12q13.12 | rs7136702 | LARP4 | T/C | 0.44/0.34 | 0.53 | 1.02 (0.99–1.05) | 0.20 | 0.90 | 0.0
12q13.12 | rs11169552 | ATF1 | C/T | 0.60/0.75 | 0.67 | 1.04 (1.01–1.07) | 8.30 × 10−3 | 0.97 | 0.0
12q24.21 | rs72013726 | MED13L | C/CACAA | 0.36/0.50 | 0.39 | 1.05 (1.01–1.09) | 0.01 | 0.29 | 16.1
14q22.2 | rs4444235 | BMP4 | C/T | 0.47/0.49 | 0.55 | 1.04 (1.02–1.07) | 7.85 × 10−4 | 0.68 | 0.0
14q22.2 | rs1957636 | BMP4 | T/C | 0.66/0.41 | 0.59 | 0.99 (0.97–1.02) | 0.57 | 0.20 | 23.9
15q13.3 | rs16969681 | SCG5 | T/C | 0.37/0.07 | 0.46 | 1.08 (1.05–1.11) | 1.31 × 10−8 | 0.06 | 41.8
15q13.3 | rs4779584 | SCG5 | T/C | 0.81/0.20 | 0.83 | 1.04 (1.01–1.08) | 0.02 | 0.19 | 24.1
16q22.1 | rs9929218 | CDH1 | G/A | 0.77/0.71 | 0.83 | 1.05 (1.01–1.09) | 9.22 × 10−3 | 0.65 | 0.0
16q24.1 | rs2696839 | FOXF1 | G/C | 0.75/0.50 | 0.75 | 1.03 (1.00–1.06) | 0.04 | 0.50 | 0.0
18q21.1 | rs4939827 | SMAD7 | T/C | 0.31/0.53 | 0.24 | 1.13 (1.09–1.16) | 5.44 × 10−15 | 0.84 | 0.0
19q13.11 | rs10411210 | RHPN2 | C/T | 0.81/0.90 | 0.83 | 1.10 (1.06–1.14) | 1.22 × 10−7 | 0.17 | 27.0
20p12.3 | rs961253 | BMP2 | A/C | 0.11/0.36 | 0.10 | 1.08 (1.03–1.12) | 6.93 × 10−4 | 0.68 | 0.0
20p12.3 | rs4813802 | BMP2 | G/T | 0.23/0.32 | 0.21 | 1.10 (1.06–1.13) | 1.44 × 10−8 | 0.32 | 0.0
20q13.13 | rs6066825 | PREX1 | A/G | 0.71/0.62 | 0.71 | 1.10 (1.07–1.13) | 4.34 × 10−11 | 0.15 | 29.7
20q13.13 | rs1810502 | PTPN1 | C/T | 0.39/0.55 | 0.45 | 1.07 (1.04–1.10) | 1.38 × 10−6 | 0.02 | 21.4
20q13.33 | rs4925386 | LAMA5 | C/T | 0.74/0.67 | 0.78 | 1.02 (0.99–1.05) | 0.22 | 0.12 | 32.3

Alleles, risk/reference allele; RAF, risk allele frequency derived from the 1000 Genomes Project (Phase 3) East Asians/Europeans (1KGP) or from the Asia Colorectal Cancer Consortium (ACCC); OR, odds ratio; CI, confidence interval; I² and PHet-ASN: derived from the heterogeneity test across all studies included in Stage 1 and Stage 2.

ᵃ Variants not replicated in previous studies were not evaluated in our study. Also not evaluated were 9 variants initially reported in GWASs of European populations: 8 with a very low MAF in Asians (< 0.01%), namely rs72647484 at 1p36.12, rs6691170 at 1q41, rs140355816 at 8q23.3, rs16892766 at 8q23.3, rs76316943 at 8q23.3, rs3184504 at 12q24.12, rs73208120 at 12q24.12, and rs17094983 at 14q23.1, and 1 on the X chromosome (rs5934683 at Xp22.2) (see Supplementary Table 5 for references). Sample sizes in these analyses ranged from 62,256 to 70,506.

ᵇ Variants at each locus with r² (EAS) < 0.2 were listed; two loci (EIF3H at 8q23.3 and SMAD7 at 18q21.1) were initially reported in European populations, but the variants identified in East Asian populations were not in LD with those identified in European populations.

ᶜ Variants at each locus with r² (EUR) < 0.2 were listed; one locus (CCND2 at 12p13.32) was initially reported in East Asian populations, but the variants identified in European populations were not in LD with those identified in East Asian populations.

Functional Characterization of the Novel Risk Loci and cis-eQTL Analyses

To identify putative causal variants and genes underlying the observed associations, we used functional genomics data to annotate the lead variant at each of the 13 novel loci and its correlated variants (r² ≥ 0.80). Aligning these risk variants with histone methylation/acetylation marks and DNase hypersensitivity sites [28] revealed that variants at 8 loci (2p16.3, 3q22.2, 5q23.2, 12q12, 13q22.1, 16q23.2, 17p12, and 20q13.32) overlapped with promoter/enhancer histone marks or DNase hypersensitivity sites in gastrointestinal tissues (Supplementary Table 6), suggesting that these variants may be involved in regulating gene expression in these tissues. To identify potential target genes, we performed cis-eQTL analyses using data from adjacent normal colon tissues from 133 CRC patients of East Asian ancestry (Supplementary Table 7) and from transverse colon tissues from 246 individuals, predominantly of European ancestry, in the GTEx (Supplementary Table 8). Significant correlations at P < 0.05 were found for 21 and 37 SNP-gene pairs in the East Asian and GTEx data sets, respectively. Three of these SNP-gene pairs showed the same direction of association in both data sets: the risk A allele of rs1476570 was associated with reduced expression of HLA-G and HLA-V, and the risk T allele of rs3830041 was associated with reduced MICA expression.

Pathway analysis of genome-wide summary statistics of CRC risk

We performed pathway analyses based on genome-wide summary statistics of CRC risk. The top enriched pathways were related to mesenchymal cell proliferation, Smad protein phosphorylation, pluripotent states of reprogrammed somatic cells, embryonic development, the MHC (HLA) protein complex, gland morphogenesis, and epithelial cell migration (FDR-adjusted P < 0.05; Supplementary Table 9).

Familial Relative Risk of CRC Explained and CRC Risk Associated with the Weighted PRS

We estimated that the 14 novel risk variants identified in this study together explain approximately 3.5% of the familial relative risk of CRC in East Asian populations, comparable to the 4.1% explained by the 19 risk variants previously identified in Asian populations (Supplementary Table 9). An additional 4.1% of the familial relative risk in East Asian populations can be explained by the 24 risk variants initially identified in studies of European-ancestry populations and replicated in this study. Together, the 57 CRC risk variants newly identified (n = 14) or replicated (n = 43) in our study explain 11.7% of the familial relative risk of CRC in individuals of East Asian ancestry.

There was a clear dose-response association between the weighted PRS and CRC risk. Individuals in the highest PRS quintile (≥ 4.80) had a 3.2-fold increased CRC risk (OR (95% CI) = 3.16 (2.88–3.47)) compared with individuals in the lowest quintile (< 4.10) (Supplementary Table 11).

Discussion

In this large GWAS of 22,775 CRC cases and 47,731 controls of East Asian descent, we identified 13 new risk loci for CRC at P < 5 × 10⁻⁸, as well as an independent risk variant at a locus previously reported in a GWAS of individuals of European descent. We replicated all 19 independent CRC risk variants identified in previous GWAS of East Asian populations and 26 of the 43 risk variants initially identified in GWAS of European-ancestry populations. Using functional genomics data, we showed that most of the newly identified risk variants, or variants in high LD with them, are located in functional regions of the genome and may contribute to the regulation of genes with established roles in colorectal tumorigenesis or in cellular functions involved in cancer development. Our cis-eQTL analyses provide additional evidence that several risk variants identified in our study may regulate the expression of cancer-related genes. The pathway analysis corroborated the known biological roles of Smad protein regulation and cellular pluripotency in colorectal carcinogenesis, and suggested a role for the HLA protein complex in CRC risk. Our study provides substantial new information toward understanding the genetic and biological basis of CRC.

Of the 13 novel loci identified in our study, seven have a lead risk variant located within a protein-coding gene, including a nonsynonymous SNP (rs7542665) in the L1TD1 gene (1p31.3). However, PolyPhen2 classifies rs7542665 and two other nonsynonymous SNPs in high LD with it (rs7533274 and rs11207933; r²EAS ≥ 0.80) as “benign”, so these variants may not affect the function of the protein encoded by L1TD1. Interestingly, the risk allele of rs7542665 was associated with increased L1TD1 expression in multiple tissues investigated in GTEx (Supplementary Figure 7). L1TD1 is an RNA-binding protein that is among the most specific and abundant proteins in pluripotent stem cells and is essential for maintaining pluripotency in human cells34. Elevated L1TD1 levels have been suggested to increase cancer risk by increasing cellular pluripotency35.
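The LD criterion used to pick correlated variants (r² ≥ 0.80 in East Asian reference data) can be made concrete with the standard two-locus definition: for alleles A and B at two SNPs, D = p_AB - p_A·p_B and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)). The sketch below assumes haplotype and allele frequencies are already available, for example from a phased reference panel; in practice a dedicated tool would compute them. The function names and SNP labels are hypothetical.

```python
from typing import Dict, List, Tuple


def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic SNPs.

    p_ab is the frequency of haplotype A-B; p_a and p_b are the frequencies
    of allele A at the first SNP and allele B at the second SNP.
    """
    d = p_ab - p_a * p_b                     # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


def ld_proxies(candidates: Dict[str, Tuple[float, float, float]],
               threshold: float = 0.80) -> List[str]:
    """Candidate SNPs whose r^2 with the lead SNP meets the threshold.

    Each value is (p_AB, p_A, p_B), with A an allele of the lead SNP and
    B an allele of the candidate SNP.
    """
    return sorted(
        name for name, (p_ab, p_a, p_b) in candidates.items()
        if ld_r2(p_ab, p_a, p_b) >= threshold
    )
```

Because r² depends on population-specific haplotype frequencies, the same pair of SNPs can pass the threshold in East Asian data but not in European data, which is why the ancestry of the reference panel (here EAS) is stated alongside the threshold.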

The other six lead risk variants located within protein-coding genes are all intronic. Findings for rs1476570 and rs3830041 are particularly interesting, as they are located in the HLA-G and NOTCH4 genes, respectively. These genes play a central role in immune regulation and response. Our eQTL analyses of East Asian CRC samples showed that rs1476570 was associated with the expression of multiple HLA genes, including HLA-F, HLA-G and HLA-V, and that rs3830041 was associated with the expression of HLA-DQB2, LTB and MICA (Supplementary Table 7). The risk allele of rs3830041 was associated with reduced MICA expression, and reduced MICA expression on the cancer cell surface has been found to inhibit NK cell–mediated antitumor immunity36. Genetic variants in this region are associated with infectious diseases, Crohn’s disease in the Ashkenazi Jewish population, and blood cell phenotypes37–39. To our knowledge, no previous GWAS has identified a CRC risk locus in genes with a primary role in immune regulation. Our study provides the first evidence that germline variation in immune response genes may play a significant role in the pathogenesis of CRC.

Other intronic risk variants identified in this study also point to potential novel pathways in CRC pathogenesis. For example, the protein encoded by PPP1R21 at 2p16.3 (rs7606562) regulates the dephosphorylation activity of protein phosphatase 1, a ubiquitous and highly conserved enzyme. This enzyme and its regulatory proteins are involved in signaling pathways that regulate cell growth, the cell cycle and apoptosis40. SNP rs77969132 is located in an intron of the DENND5B gene, which encodes a DENN domain GDP-GTP exchange factor that putatively acts on Rab39 to control membrane trafficking41. SNP rs13831 is an intronic variant in the GNAS gene, which encodes the Gsα subunit of heterotrimeric G proteins. Activating mutations in GNAS have been shown to promote intestinal tumorigenesis in mice through activation of the Wnt and ERK1/2 MAPK pathways42.

Two of the intronic risk variants identified in our study are low-frequency variants. SNP rs201395236 is located in intron 2 of the EFCAB2 gene (MAF = 1.3%; OR = 1.75 for the major allele), and rs77969132 is located in intron 8 of the DENND5B gene (MAF = 1.5%; OR = 1.44 for the minor allele). To our knowledge, these are the first two low-frequency variants found to be associated with CRC risk, and the risk associated with them is substantially higher than that of the common risk variants identified for CRC. These variants may therefore be valuable for estimating individual CRC risk.
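The two odds ratios above are reported for different alleles (the major allele for rs201395236, the minor allele for rs77969132). Assuming a per-allele (log-additive) model, counting the other allele simply inverts the OR, so the two effects can be compared on the log scale, where only the sign changes:

```latex
\mathrm{OR}_{\text{minor}} = \frac{1}{\mathrm{OR}_{\text{major}}} = \frac{1}{1.75} \approx 0.57,
\qquad
\left|\ln 1.75\right| \approx 0.56 \quad \text{vs.} \quad \left|\ln 1.44\right| \approx 0.36
```

Under this assumption, each copy of the minor allele of rs201395236 corresponds to an OR of about 0.57, and its per-allele effect size is somewhat larger than that of rs77969132.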

We identified six lead risk variants located in intergenic regions. SNP rs113569514, at 3q22.2, maps to the 5’ region of SLCO2A1. This gene encodes a principal transmembrane prostaglandin transporter that mediates the uptake and clearance of extracellular prostaglandins43. Germline mutations in SLCO2A1 have been found to contribute to familial CRC risk44. Using data from the GTEx project, we found that the risk allele of rs113569514 was associated with reduced expression of SLCO2A1 in multiple tissues30, which may lead to reduced prostaglandin catabolism and, in turn, increased CRC risk43. Our eQTL analyses also identified significant correlations of rs1886450 at 13q22.1 with KLF5 expression30, of rs1078643 at 17p12 with SHISA6, DNAH9 and GLP2R expression, and of rs2730985 at 12q12 with PRICKLE1 expression (Supplementary Tables 7 and 8). The chromosomal region containing KLF5 is significantly amplified in CRC45. Krüppel-like factor 5 is a zinc-finger transcription factor that is indispensable for the integrity and oncogenic transformation of intestinal stem cells46. GLP2R is expressed predominantly in the gastrointestinal tract47, 48, and has been shown to maintain the integrity of the intestinal epithelium by both stimulating cell proliferation and inhibiting apoptotic cell death in the crypt compartment48–52. The PRICKLE1 gene encodes a Dishevelled-associated protein that negatively regulates the Wnt/β-catenin signaling pathway53.

For the other two intergenic risk variants, we found no significant correlation with the expression of any gene. SNP rs2659017 at 5q23.2 is located 57 kb upstream of ALDH7A1, which encodes an enzyme of the aldehyde dehydrogenase superfamily that degrades and detoxifies aldehydes generated by alcohol metabolism and lipid peroxidation. SNP rs4341754 at 16q23.2 is located 405 kb downstream of the proto-oncogene MAF and 793 kb downstream of the tumor suppressor gene WWOX. Several variants in this region are associated with risk of oral cavity and pharyngeal cancer (rs4284656)54 and breast cancer (rs13329835)55 in European-ancestry populations. However, rs4341754 is not in LD with these GWAS-identified risk variants (r² = 0.00 in both EUR and EAS populations). The region containing WWOX is recurrently deleted in multiple cancers56, including CRC45. The WW domain-containing oxidoreductase encoded by WWOX is involved in maintaining genomic stability, and its loss leads to genomic instability and cancer development57, 58.

We did not attempt to replicate the associations of these 13 new loci with CRC risk in additional samples of Asian descent. However, the lead risk variants at five of these loci also showed a significant association with CRC risk in individuals of European descent. Given the differences in genetic architecture between Asian and European populations, it is not surprising that some risk variants identified in one population cannot be directly replicated in another. Fine-mapping of these loci will help identify risk variants that may be more specifically related to CRC risk in individuals of European descent. On the other hand, we cannot completely rule out the possibility that some of these newly identified risk variants are false positives. Replication in additional samples of Asian descent would help validate our findings, especially for the 8 loci that were not replicated in European-ancestry populations.

In summary, we identified 14 novel risk variants for CRC, including 13 at loci not reported in previous studies. These risk variants, along with the common risk variants replicated in our study, explain about 11.7% of the familial relative risk of CRC in the East Asian population. Some of the putative target genes suggested by our results are involved in established pathways of colorectal tumorigenesis, such as Wnt/β-catenin signaling (PRICKLE1) and prostaglandin E2 catabolism (SLCO2A1). Our study also suggests that genes involved in maintaining colon stem cells (L1TD1 and KLF5) or regulating intestinal epithelial cell proliferation (GLP2R) may play a significant role in the pathogenesis of CRC. Finally, we provide the first evidence that genetic variation in the HLA gene region may contribute to CRC susceptibility, supporting a critical role for immune regulation and response in the development and progression of CRC.

Supplementary Material


Figure 1.


Manhattan plot showing the joint meta-analysis association statistics (-log10(P)) of Stage 1 and Stage 2 (Ncases = 22,775; Ncontrols = 47,731). Known loci are shown in blue and novel loci in red.
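A Manhattan plot like Figure 1 places each variant at a cumulative genomic position on the x-axis and at -log10(P) on the y-axis, usually with a horizontal line at the genome-wide significance threshold P = 5 × 10⁻⁸ used in this study (-log10 of which is about 7.30). The sketch below computes only the plot coordinates and a significance flag; the plotting library, the chromosome lengths and the example results are assumptions, and the function name is hypothetical.

```python
import math
from typing import Dict, List, Tuple

GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance threshold


def manhattan_points(
    results: List[Tuple[int, int, float]],
    chrom_lengths: Dict[int, int],
) -> List[Tuple[float, float, bool]]:
    """Map (chromosome, position, P) to (cumulative x, -log10 P, is_significant)."""
    # Offset each chromosome by the total length of the chromosomes before it.
    offsets: Dict[int, int] = {}
    total = 0
    for chrom in sorted(chrom_lengths):
        offsets[chrom] = total
        total += chrom_lengths[chrom]
    points = []
    for chrom, pos, p in results:
        x = offsets[chrom] + pos
        y = -math.log10(p)
        points.append((x, y, p < GENOME_WIDE_ALPHA))
    return points


# Height of the horizontal significance line drawn across the plot.
SIGNIFICANCE_LINE = -math.log10(GENOME_WIDE_ALPHA)
```

The returned points can be passed to any scatter-plot routine, coloring known and novel loci separately as in the figure.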

In a study of more than 70,000 individuals of East Asian descent, 13 new genetic loci associated with colorectal cancer risk were identified.

ACKNOWLEDGMENTS

The authors thank all study participants and research staff of all parent studies for their contributions and commitment to this project. The authors thank Vanderbilt staff members Ms. Jing He for data processing and analyses and Mr. Marshal Younger for editing and preparing the manuscript. The work at Vanderbilt University Medical Center was supported by U.S. National Institutes of Health grants R01CA188214, R37CA070867, R01CA124558, R01CA158473, and R01CA148667, as well as Anne Potter Wilson Chair funds from the Vanderbilt University School of Medicine. Sample preparation and genotyping assays at Vanderbilt University were conducted at the Survey and Biospecimen Shared Resources and Vanderbilt Microarray Shared Resource, which are supported in part by the Vanderbilt-Ingram Cancer Center (P30CA068485). Imputation and statistical analyses were performed on servers maintained by the Advanced Computing Center for Research and Education at Vanderbilt University.

Studies (grant support) participating in the Asia Colorectal Cancer Consortium include the Shanghai Women’s Health Study (US NIH, R37CA070867, UM1CA182910), the Shanghai Men’s Health Study (US NIH, R01CA082729, UM1CA173640), the Shanghai Breast and Endometrial Cancer Studies (US NIH, R01CA064277 and R01CA092585; contributing only controls), the Shanghai Colorectal Cancer Study 3 (US NIH, R37CA070867, R01CA188214 and Anne Potter Wilson Chair funds), the Guangzhou Colorectal Cancer Study (National Key Scientific and Technological Project, 2011ZX09307001–04; the National Basic Research Program, 2011CB504303, contributing only controls; the Natural Science Foundation of China, 81072383, contributing only controls), the Hwasun Cancer Epidemiology Study–Colon and Rectum Cancer (HCES-CRC; grant from Chonnam National University Hwasun Hospital Biomedical Research Institute, HCRI18007), the Japan BioBank Colorectal Cancer Study (grant from the Ministry of Education, Culture, Sports, Science and Technology of the Japanese government), the Aichi Colorectal Cancer Study (Grant-in-Aid for Cancer Research, grant for the Third Term Comprehensive Control Research for Cancer and Grants-in-Aid for Scientific Research from the Japanese Ministry of Education, Culture, Sports, Science and Technology, 17015018 and 221S0001), the Korea-NCC (National Cancer Center) Colorectal Cancer Study (Basic Science Research Program through the National Research Foundation of Korea, 2010–0010276 and 2013R1A1A2A10008260; National Cancer Center Korea, 0910220), and the KCPS-II Colorectal Cancer Study (National R&D Program for Cancer Control, 1631020; Seoul R&D Program, 10526).

Participating studies (grant support) in the GECCO, CORECT and CCFR GWAS meta-analysis are GECCO (US NIH, U01CA137088 and R01CA059045), DALS (US NIH, R01CA048998), DACHS (German Federal Ministry of Education and Research, BR 1704/6–1, BR 1704/6–3, BR 1704/6–4, CH 117/1–1, 01KH0404 and 01ER0814), HPFS (P01 CA 055075, UM1 CA167552, R01 137178, R01 CA151993 and P50 CA127003), NHS (UM1 CA186107, R01 CA137178, P01 CA87969, R01 CA151993 and P50 CA127003), OFCCR (US NIH, U01CA074783), PMH (US NIH, R01CA076366), PHS (US NIH, R01CA042182), VITAL (US NIH, K05CA154337), WHI (US NIH, HHSN268201100046C, HHSN268201100001C, HHSN268201100002C, HHSN268201100003C, HHSN268201100004C, and HHSN271201100004C) and PLCO (US NIH, Z01CP 010200, U01HG004446 and U01HG 004438). CORECT is supported by the National Cancer Institute as part of the GAME-ON consortium (US NIH, U19CA148107), with additional support from National Cancer Institute grants (R01CA81488 and P30CA014089), the National Human Genome Research Institute at the US NIH (T32HG000040) and the National Institute of Environmental Health Sciences at the US NIH (T32ES013678). CCFR is supported by the National Cancer Institute, US NIH under RFA CA-95–011, and through cooperative agreements with members of the Colon Cancer Family Registry and principal investigators of the Australasian Colorectal Cancer Family Registry (US NIH, U01CA097735), the Familial Colorectal Neoplasia Collaborative Group (US NIH, U01CA074799) (University of Southern California), the Mayo Clinic Cooperative Family Registry for Colon Cancer Studies (US NIH, U01CA074800), the Ontario Registry for Studies of Familial Colorectal Cancer (US NIH, U01CA074783), the Seattle Colorectal Cancer Family Registry (US NIH, U01CA074794) and the University of Hawaii Colorectal Cancer Family Registry (US NIH, U01CA074806). The GWAS work was supported by a National Cancer Institute grant (US NIH, U01CA122839). 
OFCCR was supported by a GL2 grant from the Ontario Research Fund, Canadian Institutes of Health Research and a Cancer Risk Evaluation (CaRE) Program grant from the Canadian Cancer Society Research Institute. T.J. Hudson and B.W. Zanke are recipients of Senior Investigator Awards from the Ontario Institute for Cancer Research, through support from the Ontario Ministry of Economic Development and Innovation. ASTERISK was funded by a Regional Hospital Clinical Research Program (PHRC) and supported by the Regional Council of Pays de la Loire, the Groupement des Entreprises Françaises dans la Lutte contre le Cancer (GEFLUC), the Association Anne de Bretagne Génétique and the Ligue Régionale Contre le Cancer (LRCC).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

We declare no conflicts of interest.

References:

1. Lichtenstein P, Holm NV, Verkasalo PK, et al. Environmental and heritable factors in the causation of cancer--analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med 2000;343:78–85.
2. Mucci LA, Hjelmborg JB, Harris JR, et al. Familial Risk and Heritability of Cancer Among Twins in Nordic Countries. JAMA 2016;315:68–76.
3. Peters U, Bien S, Zubair N. Genetic architecture of colorectal cancer. Gut 2015;64:1623–36.
4. Adam R, Spier I, Zhao B, et al. Exome Sequencing Identifies Biallelic MSH3 Germline Mutations as a Recessive Subtype of Colorectal Adenomatous Polyposis. Am J Hum Genet 2016;99:337–51.
5. Weren RD, Ligtenberg MJ, Kets CM, et al. A germline homozygous mutation in the base-excision repair gene NTHL1 causes adenomatous polyposis and colorectal cancer. Nat Genet 2015;47:668–71.
6. Jia WH, Zhang B, Matsuo K, et al. Genome-wide association analyses in East Asians identify new susceptibility loci for colorectal cancer. Nat Genet 2013;45:191–6.
7. Zeng C, Matsuda K, Jia WH, et al. Identification of Susceptibility Loci and Genes for Colorectal Cancer Risk. Gastroenterology 2016;150:1633–45.
8. Zhang B, Jia WH, Matsuda K, et al. Large-scale genetic study in East Asians identifies six new loci associated with colorectal cancer risk. Nat Genet 2014;46:533–42.
9. Schmit SL, Edlund CK, Schumacher FR, et al. Novel Common Genetic Susceptibility Loci for Colorectal Cancer. J Natl Cancer Inst 2018.
10. Frampton M, Houlston RS. Modeling the prevention of colorectal cancer from the combined impact of host and behavioral risk factors. Genet Med 2017;19:314–321.
11. Zhang B, Jia WH, Matsuo K, et al. Genome-wide association study identifies a new SMAD7 risk variant associated with colorectal cancer risk in East Asians. Int J Cancer 2014;135:948–55.
12. Tanikawa C, Kamatani Y, Takahashi A, et al. GWAS identifies two novel colorectal cancer loci at 16q24.1 and 20q13.12. Carcinogenesis 2018;39:652–660.
13. Cui R, Okada Y, Jang SG, et al. Common variant in 6q26-q27 is associated with distal colon cancer in an Asian population. Gut 2011;60:799–805.
14. Amos CI, Dennis J, Wang Z, et al. The OncoArray Consortium: A Network for Understanding the Genetic Architecture of Common Cancers. Cancer Epidemiol Biomarkers Prev 2017;26:126–135.
15. Price AL, Patterson NJ, Plenge RM, et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006;38:904–9.
16. Zhan X, Hu Y, Li B, et al. RVTESTS: an efficient and comprehensive tool for rare variant association analysis using sequence data. Bioinformatics 2016;32:1423–6.
17. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010;26:2190–1.
18. Skol AD, Scott LJ, Abecasis GR, et al. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet 2006;38:209–13.
19. Lau J, Ioannidis JP, Schmid CH. Quantitative synthesis in systematic reviews. Ann Intern Med 1997;127:820–6.
20. Yang J, Ferreira T, Morris AP, et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 2012;44:369–75, S1–3.
21. Zheng W, Zhang B, Cai Q, et al. Common genetic determinants of breast-cancer risk in East Asian women: a collaborative study of 23 637 breast cancer cases and 25 579 controls. Hum Mol Genet 2013;22:2539–50.
22. Johns LE, Houlston RS. A systematic review and meta-analysis of familial colorectal cancer risk. Am J Gastroenterol 2001;96:2992–3003.
23. de Leeuw CA, Mooij JM, Heskes T, et al. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput Biol 2015;11:e1004219.
24. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 2010;38:e164.
25. Cooper GM, Stone EA, Asimenos G, et al. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 2005;15:901–13.
26. Garber M, Guttman M, Clamp M, et al. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 2009;25:i54–62.
27. Adzhubei I, Jordan DM, Sunyaev SR. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet 2013;Chapter 7:Unit 7.20.
28. Ward LD, Kellis M. HaploReg v4: systematic mining of putative causal variants, cell types, regulators and target genes for human complex traits and disease. Nucleic Acids Res 2016;44:D877–81.
29. Boyle AP, Hong EL, Hariharan M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res 2012;22:1790–7.
30. GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 2015;348:648–60.
31. Whiffin N, Hosking FJ, Farrington SM, et al. Identification of susceptibility loci for colorectal cancer in a genome-wide meta-analysis. Hum Mol Genet 2014;23:4729–37.
32. Schumacher FR, Schmit SL, Jiao S, et al. Genome-wide association study of colorectal cancer identifies six new susceptibility loci. Nat Commun 2015;6:7138.
33. Wang N, Lu Y, Khankari NK, et al. Evaluation of genetic variants in association with colorectal cancer risk and survival in Asians. Int J Cancer 2017;141:1130–1139.
34. Narva E, Rahkonen N, Emani MR, et al. RNA-binding protein L1TD1 interacts with LIN28 via RNA and is required for human embryonic stem cell self-renewal and cancer cell proliferation. Stem Cells 2012;30:452–60.
35. Zhu L, Finkelstein D, Gao C, et al. Multi-organ Mapping of Cancer Risk. Cell 2016;166:1132–1146.e7.
36. Ferrari de Andrade L, Tay RE, Pan D, et al. Antibody-mediated inhibition of MICA and MICB shedding promotes NK cell-driven tumor immunity. Science 2018;359:1537–1542.
37. Tian C, Hromatka BS, Kiefer AK, et al. Genome-wide association and HLA region fine-mapping studies identify susceptibility loci for multiple common infections. Nat Commun 2017;8:599.
38. Astle WJ, Elding H, Jiang T, et al. The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease. Cell 2016;167:1415–1429.e19.
39. Kenny EE, Pe’er I, Karban A, et al. A genome-wide scan of Ashkenazi Jewish Crohn’s disease suggests novel susceptibility loci. PLoS Genet 2012;8:e1002559.
40. Figueiredo J, da Cruz ESOA, Fardilha M. Protein phosphatase 1 and its complexes in carcinogenesis. Curr Cancer Drug Targets 2014;14:2–29.
41. Yoshimura S, Gerondopoulos A, Linford A, et al. Family-wide characterization of the DENN domain Rab GDP-GTP exchange factors. J Cell Biol 2010;191:367–81.
42. Wilson CH, McIntyre RE, Arends MJ, et al. The activating mutation R201C in GNAS promotes intestinal tumourigenesis in Apc(Min/+) mice through activation of Wnt and ERK1/2 MAPK pathways. Oncogene 2010;29:4567–75.
43. Markowitz SD. Colorectal neoplasia goes with the flow: prostaglandin transport and termination. Cancer Prev Res (Phila) 2008;1:77–9.
44. Guda K, Fink SP, Milne GL, et al. Inactivating mutation in the prostaglandin transporter gene, SLCO2A1, associated with familial digital clubbing, colon neoplasia, and NSAID resistance. Cancer Prev Res (Phila) 2014;7:805–12.
45. Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 2012;487:330–7.
46. Nakaya T, Ogawa S, Manabe I, et al. KLF5 regulates the integrity and oncogenicity of intestinal stem cells. Cancer Res 2014;74:2882–91.
47. Wismann P, Barkholt P, Secher T, et al. The endogenous preproglucagon system is not essential for gut growth homeostasis in mice. Mol Metab 2017;6:681–692.
48. Drucker DJ, Yusta B. Physiology and pharmacology of the enteroendocrine hormone glucagon-like peptide-2. Annu Rev Physiol 2014;76:561–83.
49. Yusta B, Holland D, Koehler JA, et al. ErbB signaling is required for the proliferative actions of GLP-2 in the murine gut. Gastroenterology 2009;137:986–96.
50. Bahrami J, Yusta B, Drucker DJ. ErbB activity links the glucagon-like peptide-2 receptor to refeeding-induced adaptation in the murine small bowel. Gastroenterology 2010;138:2447–56.
51. Boushey RP, Yusta B, Drucker DJ. Glucagon-like peptide (GLP)-2 reduces chemotherapy-associated mortality and enhances cell survival in cells expressing a transfected GLP-2 receptor. Cancer Res 2001;61:687–93.
52. Drucker DJ, Erlich P, Asa SL, et al. Induction of intestinal epithelial proliferation by glucagon-like peptide 2. Proc Natl Acad Sci U S A 1996;93:7911–6.
53. Chan DW, Chan CY, Yam JW, et al. Prickle-1 negatively regulates Wnt/beta-catenin pathway by promoting Dishevelled ubiquitination/degradation in liver cancer. Gastroenterology 2006;131:1218–27.
54. Lesseur C, Diergaarde B, Olshan AF, et al. Genome-wide association analyses identify new susceptibility loci for oral cavity and pharyngeal cancer. Nat Genet 2016;48:1544–1550.
55. Michailidou K, Hall P, Gonzalez-Neira A, et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat Genet 2013;45:353–61, 361e1–2.
56. Baryla I, Styczen-Binkowska E, Bednarek AK. Alteration of WWOX in human cancer: a clinical view. Exp Biol Med (Maywood) 2015;240:305–14.
57. Abu-Remaileh M, Joy-Dodson E, Schueler-Furman O, et al. Pleiotropic Functions of Tumor Suppressor WWOX in Normal and Cancer Cells. J Biol Chem 2015;290:30728–35.
58. Hazan I, Abu-Odeh M, Hofmann TG, et al. WWOX guards genome stability by activating ATM. Mol Cell Oncol 2015;2:e1008288.
