Author manuscript; available in PMC: 2015 Feb 1.
Published in final edited form as: Gut. 2013 Aug 14;63(2):326–336. doi: 10.1136/gutjnl-2012-304121

Genetic variants associated with colorectal-cancer risk: comprehensive research synopsis, meta-analysis, and epidemiological evidence

Xiangyu Ma 1,2, Ben Zhang 2, Wei Zheng 2
PMCID: PMC4020522  NIHMSID: NIHMS535098  PMID: 23946381

Abstract

Objective

In the past two decades, approximately 1,000 reports have been published regarding associations between genetic variants in candidate genes and risk of colorectal cancer (CRC). Study results are inconsistent. We aim to provide a synopsis of the current understanding of genetic factors for CRC risk through systematically evaluating results from previous studies.

Design

We searched PubMed and Google Scholar to identify papers published through December 25, 2012 that investigated associations between genetic variants and CRC risk. With data from 950 papers, we conducted 910 meta-analyses for 267 genetic variants in 150 candidate genes that had at least three data sources. We used the Venice criteria and false-positive report probability tests to grade the cumulative epidemiological evidence for significant associations with CRC risk.

Results

Sixty-two variants in 50 candidate genes showed a nominally significant association with CRC risk (p<0.05). Cumulative epidemiological evidence for a significant association with CRC risk was graded strong for eight variants in five genes (APC, CHEK2, DNMT3B, MLH1, and MUTYH), moderate for two variants in two genes (GSTM1 and TERT), and weak for 52 variants in 45 genes. In addition, 40 variants in 33 genes showed convincing evidence of no association with CRC risk in meta-analyses including at least 5,000 cases and 5,000 controls.

Conclusion

Approximately 4% of genetic variants evaluated to date in candidate-gene association studies showed moderate to strong cumulative epidemiological evidence of an association with CRC risk. These genetic variants, if confirmed, may explain approximately 5% of familial CRC risk.

Keywords: colorectal cancer, meta-analysis, systematic review, genetic epidemiology, genome-wide association study, genetic variants, gene

Introduction

Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer death worldwide (1). Genetic factors play an important role in CRC development (2-6). High-penetrance germline mutations in the APC, MUTYH, SMAD4, BMPR1A, STK11, and mismatch repair genes have been estimated to account for about 6% of CRC cases (Table 1) (6-13). Since 2007, common genetic variants in approximately 21 loci have been identified through genome-wide association studies (GWAS) (Table 2) (14-24). GWAS-identified variants, however, confer only weakly to moderately elevated CRC risk and together explain approximately 8% of the familial risk of CRC (20;21).

Table 1. Known high-penetrance mutations in genes that contribute to familial colorectal cancer.

Gene Variants Hereditary syndrome Population frequency References
APC Nonsense or frameshift mutations Familial adenomatous polyposis 0.01-0.02% 6, 7
MLH1 Truncating and missense mutations Lynch syndrome 0.10% 6, 8
MSH2 Truncating and missense mutations Lynch syndrome <0.1% 6, 8
MSH6 Truncating and missense mutations Lynch syndrome <0.05% 6, 8
PMS2 Truncating and missense mutations Lynch syndrome <0.05% 6, 8
STK11 Multiple mutations Peutz-Jeghers syndrome 0.0005-0.01% 9, 10
BMPR1A Multiple mutations Juvenile polyposis syndrome <0.0005% 10
SMAD4 Multiple mutations Juvenile polyposis syndrome <0.0005% 10
MUTYH Nonsense and missense mutations MUTYH-associated polyposis <0.02% 11, 12

Table 2. Low-penetrance loci associated with colorectal-cancer risk, identified by genome-wide association studies with p<5×10⁻⁸.

Loci Genes * Variants Alleles MAF (%) OR (95% CI) P value Ethnicity Reference
1q41 DUSP10 rs6691170 T/G 36 1.06 (1.03-1.09) 9.55 × 10⁻¹⁰ European 21
1q41 DUSP10 rs6687758 G/A 20 1.09 (1.06-1.12) 2.27 × 10⁻⁹ European 21
3q26.2 MYNN rs10936599 T/C 25 0.93 (0.91-0.96) 3.39 × 10⁻⁸ European 21
5q31.1 PITX1 rs647161 A/C 31 1.11 (1.08-1.15) 1.22 × 10⁻¹⁰ Asian 24
6p21.31 CDKN1A rs1321311 A/C 23 1.10 (1.07-1.13) 1.14 × 10⁻¹⁰ European 22
6q26-q27 SLC22A3 rs7758229 T/G 22 1.28 (1.18-1.39) 7.92 × 10⁻⁹ Asian 23
8q23.3 EIF3H rs16892766 C/A 7 1.25 (1.19-1.32) 3.30 × 10⁻¹⁸ European 19
8q24.21 MYC rs10505477 A/G 51 1.17 (1.12-1.23) 3.16 × 10⁻¹¹ European 14
8q24.21 MYC rs6983267 G/T 52 1.21 (1.15-1.27) 1.27 × 10⁻¹⁴ European 15
8q24.21 MYC rs7014346 A/G 37 1.19 (1.14-1.24) 8.60 × 10⁻²⁶ European 18
10p14 FLJ3802842 rs10795668 A/G 33 0.89 (0.86-0.91) 2.50 × 10⁻¹³ European 19
11q13.4 POLD3 rs3824999 A/C 50 0.93 (0.91-0.95) 3.65 × 10⁻¹⁰ European 22
11q23 Unknown rs3802842 C/A 29 1.11 (1.08-1.15) 5.82 × 10⁻¹⁰ European 18
12p13.32 CCND2 rs10774214 T/C 35 1.09 (1.06-1.13) 3.06 × 10⁻⁸ Asian 24
12q13.13 LARP4, DIP2 rs7136702 T/C 35 1.06 (1.04-1.08) 4.02 × 10⁻⁸ European 21
12q13.3 DIP2B, ATF1 rs11169552 T/C 28 0.92 (0.90-0.95) 1.89 × 10⁻¹⁰ European 21
14q22.2 BMP4 rs4444235 C/T 46 1.11 (1.08-1.15) 8.10 × 10⁻¹⁰ European 20
15q13.3 GREM1 rs4779584 T/C 18 1.26 (1.19-1.34) 4.44 × 10⁻¹⁴ European 17
16q22.1 CDH1 rs9929218 A/G 29 0.91 (0.89-0.94) 1.20 × 10⁻⁸ European 20
18q21.1 SMAD7 rs4939827 C/T 48 0.85 (0.81-0.89) 1.00 × 10⁻¹² European 16
19q13.1 RHPN2 rs10411210 T/C 10 0.87 (0.83-0.91) 4.60 × 10⁻⁹ European 20
20p12.3 HAO1 rs2423279 C/T 30 1.10 (1.06-1.14) 6.64 × 10⁻⁹ Asian 24
20p12.3 BMP2 rs961253 A/C 36 1.12 (1.08-1.16) 2.00 × 10⁻¹⁰ European 20
20q13.33 LAMA5 rs4925386 T/C 32 0.93 (0.91-0.95) 1.89 × 10⁻¹⁰ European 21
Xp22.2 SHROOM2 rs5934683 T/C 33 1.07 (1.04-1.10) 7.30 × 10⁻¹⁰ European 22
* Candidate gene in the locus.

Alleles: minor/major alleles (per initial studies).

MAF=minor allele frequency in controls.

In addition to GWAS, approximately 1,000 papers have been published over the past 25 years investigating genetic variants in candidate genes in relation to CRC risk. Because the SNP arrays used in GWAS have limited coverage, many genetic variants evaluated in candidate gene association studies have not been adequately investigated in GWAS. Results from previous candidate gene studies have been inconsistent and difficult to interpret, and most of their findings could not be replicated. Furthermore, most previous candidate gene association studies had small sample sizes and therefore often lacked adequate power to detect a true association. Meta-analysis is a useful tool for systematically evaluating the results published to date and assessing the evidence for a true association. By pooling data from multiple studies, meta-analysis can increase statistical power and evaluate the consistency of an association, a major criterion for determining causality. Recently, an interim guideline, the Venice criteria, has been used to systematically grade the cumulative evidence of genetic associations (25;26). Systematic field synopses and meta-analyses have been used to evaluate associations of genetic variants in candidate genes with several diseases, including Alzheimer's disease (27), schizophrenia (28), breast cancer (29), cutaneous melanoma (30), and Parkinson's disease (31). Herein, we sought to systematically collect and comprehensively evaluate all candidate-gene association studies of CRC risk, perform meta-analyses for variants with at least three independent datasets, and provide a systematic synopsis of our current understanding of the genetic basis of CRC risk.

Methods

Search strategy and selection criteria

Literature searches were conducted using a two-stage strategy (Figure 1). In Stage 1, we searched the PubMed database with the key terms “(colorectal cancer OR colon cancer OR rectal cancer) AND association” for the period before October 1, 2010. This search yielded 8,443 potentially relevant articles, which were screened for eligibility by title, abstract, or full text, as necessary; 428 reports, covering 1,036 potential candidate genes, met the eligibility criteria. In Stage 2, conducted from October 1, 2010 through December 25, 2012, we used four supplementary approaches to query PubMed and Google Scholar: 1) monthly database queries combining “colorectal cancer” with each of the 1,036 gene names identified in Stage 1 (e.g., “MTHFR”); 2) monthly queries using “colorectal cancer OR colon cancer OR rectum cancer”; 3) searching the references and related articles of all gathered papers; and 4) checking previously published meta-analyses and reviews. These four approaches identified 48,521 additional reports, of which 522 met our inclusion criteria, adding genetic variants in 342 additional candidate genes. Across Stages 1 and 2, we screened a total of 56,964 articles and identified 945 that reported 3,603 variants in 1,378 independent candidate genes and met our criteria for further analysis.
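
As an illustration of how the Stage 1 query could be rerun programmatically, the sketch below builds an NCBI E-utilities ESearch request. The paper does not state how PubMed was queried, so this is an assumption: `db`, `term`, `datetype`, `mindate`, `maxdate`, `retmax` and `retmode` are standard ESearch parameters, while the helper name `build_stage1_url` and the early `min_date` bound are hypothetical. Counts returned today are likely to differ from the 8,443 articles reported, because the PubMed index has changed since 2010.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Stage 1 key terms, copied from the Methods.
STAGE1_TERM = "(colorectal cancer OR colon cancer OR rectal cancer) AND association"


def build_stage1_url(max_date="2010/09/30", min_date="1900/01/01", retmax=10000):
    """Return an ESearch URL for the Stage 1 PubMed query.

    Hypothetical helper: ESearch expects mindate and maxdate together, so
    min_date is an arbitrary early bound, not a value from the paper.
    """
    params = {
        "db": "pubmed",
        "term": STAGE1_TERM,
        "datetype": "pdat",    # restrict by publication date
        "mindate": min_date,
        "maxdate": max_date,   # last day before October 1, 2010
        "retmax": retmax,      # maximum number of PMIDs returned per request
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_stage1_url())
```

Fetching this URL (for example with `urllib.request.urlopen`) returns JSON in which the `esearchresult` object holds the hit count and PMIDs; result sets larger than `retmax` would need paging with the `retstart` parameter.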

Figure 1. Profiles of literature search, meta-analysis and evaluation of cumulative evidence.


Studies were eligible for inclusion in this meta-analysis if they met the following criteria: 1) data were published in English in a peer-reviewed journal; 2) the study used a case-control, cohort, or cross-sectional design in humans; 3) the study provided sufficient information on the genotypic or allelic distribution of individual variants for both CRC cases and controls; and 4) CRC cases were diagnosed by pathological and/or histological examination. We excluded two groups of variants from the meta-analyses: 1) high-penetrance germline mutations in known CRC susceptibility genes, and 2) risk variants identified and confirmed in recent GWAS (Table 2). When multiple publications reported on the same or overlapping data, we used the most informative or most recent publication. Only data from original published papers were included in the present analysis. All variants, regardless of their minor allele frequency (MAF), were considered for meta-analysis when genotype counts or allelic counts were provided in the original studies.

Data extraction and management

All data were extracted by two authors (XM and BZ), and disagreements were resolved by discussion. For each study we recorded the first author, year of publication, study name, geographic location, ethnicity, PubMed identification number, study design, sample size, mean ages of cases and controls, sample source, genes, variants, major and minor alleles, genotype counts or allelic counts for cases and controls, and Hardy-Weinberg equilibrium (HWE) status in controls. Ethnicity was classified as African descent, Asian (East Asian descent), White (European descent), or Other (including mixed), based on the ethnicity of at least 80% of the study population (32). If ethnicity was not reported, we used the ethnicity of the source population in which the study was conducted (32). Finally, if a report included several sources or study populations, data were extracted separately for each.
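
The HWE status recorded for control groups can be derived from the extracted genotype counts. The Methods state only that an exact test was used (34); the sketch below assumes the genotype-conditional exact test formulated by Wigginton et al. (2005), which may differ in detail from the authors' implementation. The function name `hwe_exact_p` and the example counts are hypothetical.

```python
def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Two-sided exact Hardy-Weinberg test for one biallelic variant.

    Conditions on the observed allele counts and enumerates every possible
    heterozygote count; returns the summed probability of all genotype
    configurations no more likely than the observed one.
    """
    n_homr, n_homc = sorted((n_hom1, n_hom2))   # rarer / commoner homozygotes
    n = n_het + n_homr + n_homc                  # genotyped controls
    if n == 0:
        raise ValueError("no genotypes supplied")
    rare = 2 * n_homr + n_het                    # copies of the rarer allele

    probs = [0.0] * (rare + 1)
    # Seed the recursion near the expected heterozygote count; the number of
    # heterozygotes must have the same parity as `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0

    # Recurse toward fewer heterozygotes (relative probabilities).
    het, homr = mid, (rare - mid) // 2
    homc = n - het - homr
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (homr + 1) * (homc + 1))
        het, homr, homc = het - 2, homr + 1, homc + 1

    # Recurse toward more heterozygotes.
    het, homr = mid, (rare - mid) // 2
    homc = n - het - homr
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * homr * homc / ((het + 2.0) * (het + 1.0))
        het, homr, homc = het + 2, homr - 1, homc - 1

    total = sum(probs)
    p_obs = probs[n_het] / total
    # Small tolerance so configurations tied with the observed one are included.
    p = sum(q / total for q in probs if q / total <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical control genotype counts (heterozygotes, homozygotes, homozygotes).
print(hwe_exact_p(50, 25, 25))   # counts in equilibrium
print(hwe_exact_p(0, 50, 50))    # no heterozygotes: strong deviation
```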

Statistical analysis and evaluation of cumulative evidence

Statistical analyses were performed with Stata, version 11.0. All tests were two-sided, and p<0.05 was considered statistically significant unless otherwise stated.

Summary odds ratios (ORs) with 95% confidence intervals (CIs) for alleles and genotypes were estimated with the random-effects method (33) to assess the strength of associations between genetic variants and CRC risk. Summary ORs were estimated from the genotype counts or allelic counts for cases and controls in each original study. We did not use adjusted ORs, because the original studies included in this meta-analysis adjusted for inconsistent sets of covariates. In the primary analyses, we evaluated common variants (MAF≥0.05) under the additive model and rare variants (MAF<0.05) under the dominant model. For some common variants, a few original studies did not provide sufficient data for analysis under the additive model, so the dominant or recessive model was applied in the primary analyses. For some specific variants, we used the conventional comparisons from the original studies in the primary analyses, such as GSTM1 ‘Present/Null’, NAT2 phenotype (predicted by genetic variants), and MUTYH rs36053993. We also conducted subgroup analyses by ethnicity. Dominant and recessive models were also used to assess associations between genetic variants and CRC risk where data were available. Meta-analyses were performed only for variants with at least three independent datasets. Because major and minor alleles can be reversed in populations of different ethnicities, MAFs averaged across studies might exceed 50%; when this occurred, the minor allele among White populations was used as the minor allele in all analyses. For genetic variants other than SNPs, the less prevalent variant or trait was evaluated for associated effects unless otherwise stated. HWE among controls in each study was assessed by Fisher's exact test comparing observed and expected genotype frequencies (34).
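
A minimal sketch of the pooling step, under two assumptions not stated in the paper: the random-effects method (33) is taken to be the DerSimonian-Laird estimator, and the additive-model OR is approximated by the allelic (2×2 allele-count) odds ratio computed from genotype counts. The function names, the 0.5 continuity correction and the example counts are illustrative. The same routine also returns Cochran's Q and I², the quantities used in the heterogeneity assessment.

```python
import math
from statistics import NormalDist


def allelic_log_or(case_counts, control_counts):
    """Per-allele log OR and its variance by allele counting.

    Each argument is (minor homozygotes, heterozygotes, major homozygotes).
    Adds 0.5 to every cell when any cell is zero (an illustrative choice).
    """
    a = 2 * case_counts[0] + case_counts[1]        # minor alleles in cases
    b = 2 * case_counts[2] + case_counts[1]        # major alleles in cases
    c = 2 * control_counts[0] + control_counts[1]  # minor alleles in controls
    d = 2 * control_counts[2] + control_counts[1]  # major alleles in controls
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def dersimonian_laird(log_ors, variances):
    """Random-effects summary OR with 95% CI, two-sided p, Cochran's Q, I^2 (%)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    scale = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / scale) if scale > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    z975 = NormalDist().inv_cdf(0.975)
    p = 2 * (1 - NormalDist().cdf(abs(pooled) / se))
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - z975 * se), math.exp(pooled + z975 * se)),
        "p": p, "q": q, "df": df, "tau2": tau2, "i2": i2,
    }


# Hypothetical genotype counts for three studies: (cases, controls).
studies = [((30, 120, 150), (20, 110, 170)),
           ((15, 80, 105), (12, 70, 118)),
           ((40, 160, 200), (30, 150, 220))]
ys, vs = zip(*(allelic_log_or(ca, co) for ca, co in studies))
print(dersimonian_laird(list(ys), list(vs)))
```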

We conducted a power analysis to evaluate the statistical power of the meta-analyses to detect an association (OR=1.15) for a variant with MAF=0.10 under the additive genetic model, assuming an alpha of 0.05 (35). We calculated the proportion of familial CRC risk explained by the variants using the formula provided by Houlston et al (20).
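
The two calculations named above can be sketched as follows. The power function is an assumption: it uses a normal approximation to the per-allele (allelic) log-odds-ratio test, which may differ from the method in (35). The familial-risk function assumes the form commonly attributed to Houlston et al., λ = (p r² + q)/(p r + q)², with explained fraction log λ / log λ0, where p is the MAF, q = 1 − p, r is the per-allele OR, and λ0 is the familial relative risk of CRC to first-degree relatives. The λ0 value and sample sizes used below are illustrative, not taken from the paper, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def allelic_power(n_cases, n_controls, maf_controls, odds_ratio, alpha=0.05):
    """Approximate power of a two-sided per-allele test (normal approximation
    to the log OR from a 2x2 allele-count table)."""
    p0 = maf_controls
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)   # implied MAF in cases
    a, c = 2 * n_cases, 2 * n_controls                   # allele counts
    var = 1 / (a * p1) + 1 / (a * (1 - p1)) + 1 / (c * p0) + 1 / (c * (1 - p0))
    shift = abs(math.log(odds_ratio)) / math.sqrt(var)
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(shift - z) + _STD_NORMAL.cdf(-shift - z)


def familial_risk_fraction(maf, per_allele_or, lambda0):
    """Share of the familial relative risk lambda0 explained by one variant:
    log(lam) / log(lambda0), with lam = (p r^2 + q) / (p r + q)^2."""
    p, q, r = maf, 1 - maf, per_allele_or
    lam = (p * r ** 2 + q) / (p * r + q) ** 2
    return math.log(lam) / math.log(lambda0)


# Scenario from the Methods (OR=1.15, MAF=0.10); the sample sizes and the
# lambda0 value are illustrative only.
print(allelic_power(5000, 5000, 0.10, 1.15))
print(familial_risk_fraction(0.10, 1.15, lambda0=2.2))
```

Because the explained fraction is a ratio of logarithms, contributions from independent variants can be summed under a multiplicative model, which is one way a combined figure such as the "approximately 5%" in the Abstract could be assembled.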

To assess heterogeneity, we performed Cochran's Q test (36) and calculated the I² statistic, which quantifies the proportion of total variation due to heterogeneity (37). Heterogeneity was considered significant if p<0.10. Generally, I² values <25% correspond to no or little heterogeneity, values of 25%–50% to moderate heterogeneity, and values >50% to strong heterogeneity between studies. Potential small-study bias was assessed with the modified Egger test of Harbord et al. (38). We also evaluated whether more studies reported significant findings than expected, using the method described by Ioannidis and Trikalinos (39). For small-study bias and excess significant findings, we used p<0.10 as the significance level, as recommended (38;39). For variants showing a statistically significant association with CRC risk, sensitivity analyses were performed to determine whether the association would be lost when the first published or first positive report was excluded, or when all studies whose controls deviated from HWE were excluded.
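
The heterogeneity and excess-significance checks can be sketched with the standard library alone. `chi2_sf` evaluates the chi-square upper tail in closed form for integer degrees of freedom; `excess_significance` follows the usual formulation of the Ioannidis-Trikalinos test, in which the expected count E is the sum of each study's power to detect the pooled effect and the statistic A is compared with a chi-square distribution on 1 df. These are illustrative helpers under those assumptions (names hypothetical), not the authors' code; the Harbord modified Egger test is not shown.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic with integer df >= 1,
    using the closed-form series for even and odd degrees of freedom."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return 2 * (1 - _STD_NORMAL.cdf(chi)) + 2 * _STD_NORMAL.pdf(chi) * total


def heterogeneity_class(i2_percent):
    """I^2 cut-offs used in this synopsis."""
    if i2_percent < 25:
        return "little or none"
    if i2_percent <= 50:
        return "moderate"
    return "strong"


def excess_significance(log_ors, ses, pooled_log_or, alpha=0.05):
    """Observed (O) vs expected (E) number of nominally significant studies.

    E sums each study's power to detect the pooled effect; A is referred to
    a chi-square distribution with 1 df. Returns (O, E, A, p).
    """
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    n = len(ses)
    observed = sum(1 for y, s in zip(log_ors, ses) if abs(y / s) > z)
    expected = sum(_STD_NORMAL.cdf(pooled_log_or / s - z)
                   + _STD_NORMAL.cdf(-pooled_log_or / s - z) for s in ses)
    if not 0 < expected < n:
        raise ValueError("expected count must lie strictly between 0 and n")
    diff2 = (observed - expected) ** 2
    a = diff2 / expected + diff2 / (n - expected)
    return observed, expected, a, chi2_sf(a, 1)
```

Cochran's Q for k studies can be passed to `chi2_sf` with k − 1 degrees of freedom to obtain the heterogeneity p-value that is compared with 0.10; the excess-significance p-value is judged against the same 0.10 level.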

For statistically significant associations identified by meta-analyses, the Venice criteria were applied to assess cumulative evidence (see Webappendix notes for Venice criteria); details of the Venice criteria are published elsewhere (25). We did not apply the amount-of-evidence criterion to rare variants (frequency <1%), for which an A grade is virtually unobtainable (29). For protection from bias, we also considered GWAS results for all common SNPs (MAF≥5%): if a common variant that can be adequately tagged by GWAS chips had not been identified by GWAS, its evidence of association with CRC risk was downgraded. Cumulative epidemiological evidence of a significant association in meta-analyses was considered strong if all three grades were A, moderate if all three grades were A or B, and weak if any grade was C. We also performed false-positive report probability (FPRP) analysis to determine whether a significant association could be excluded as a false-positive finding. We used the approach developed by Wacholder et al (40) to calculate the FPRP for each of the 62 significant associations, based on the p-value and OR obtained from the meta-analysis and a prior probability of 0.05. FPRP<0.05, 0.05≤FPRP≤0.2, and FPRP>0.2 were considered strong, moderate, and weak evidence of a true association, respectively. We upgraded cumulative evidence from moderate to strong, and from weak to moderate, if FPRP-based evidence of a true association was strong; we downgraded cumulative evidence from strong to moderate, and from moderate to weak, if that evidence was weak. For the 25 significant associations derived from subgroup analyses by ethnicity or under dominant or recessive models, we also assessed significance against a Bonferroni-corrected p-value (5.49×10⁻⁵=0.05/910). Regardless of Venice criteria and FPRP grades, we graded the evidence for these associations as weak if p > 5.49×10⁻⁵.
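
A sketch of the Wacholder FPRP calculation under stated assumptions: α is set to the observed meta-analysis p-value, the standard error of log OR is recovered from the reported 95% CI, and statistical power is evaluated at an alternative OR that defaults to the observed OR. Whether the authors used exactly this alternative is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def se_from_ci(lower, upper, level=0.95):
    """Standard error of log(OR) recovered from a reported confidence interval."""
    z = _STD_NORMAL.inv_cdf(0.5 + level / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z)


def power_at(or_alt, se, p_value):
    """Power to reach two-sided significance level p_value if the true OR is or_alt."""
    z = _STD_NORMAL.inv_cdf(1 - p_value / 2)
    shift = abs(math.log(or_alt)) / se
    return _STD_NORMAL.cdf(shift - z) + _STD_NORMAL.cdf(-shift - z)


def fprp(alpha, power, prior=0.05):
    """Wacholder et al.: FPRP = alpha(1 - prior) / [alpha(1 - prior) + power * prior]."""
    false_pos = alpha * (1 - prior)
    return false_pos / (false_pos + power * prior)


def fprp_from_meta(or_hat, lower, upper, p_value, prior=0.05, or_alt=None):
    """FPRP for a meta-analysis result; alpha is the observed p-value and the
    alternative OR defaults to the observed OR (an assumption)."""
    se = se_from_ci(lower, upper)
    power = power_at(or_hat if or_alt is None else or_alt, se, p_value)
    return fprp(p_value, power, prior)


def fprp_category(value):
    """Cut-offs from the Methods: <0.05 strong, 0.05-0.2 moderate, >0.2 weak."""
    if value < 0.05:
        return "strong"
    if value <= 0.2:
        return "moderate"
    return "weak"


# Inputs taken from Table 3 (OR, 95% CI, p-value).
print(fprp_from_meta(1.07, 1.04, 1.10, 2.92e-5))   # TERT rs2736100
print(fprp_from_meta(1.24, 1.05, 1.47, 0.014))     # CYP1A1 rs1048943
```

With these inputs the sketch gives values of about 0.0008 and 0.34, close to the 0.001 and 0.338 listed in Table 3; exact agreement is not expected because the published ORs and CIs are rounded.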

Results

A total of 945 articles reporting 3,603 variants in 1,378 independent genes were eligible for our analysis (Figure 1). Most of these reports (n=884, 93.5%) were published in or after 2000. We conducted 910 meta-analyses for 267 variants (241 common and 26 rare) in 150 genes that had at least three data sources (Figure 1). For the 267 main meta-analyses using all available data, the mean sample size was 9,633 (range: 519-76,991), drawn from a mean of seven (range: 3-68) independent studies (Webappendix Table 1).

Among the main meta-analyses, 37 variants (13.9%) within 28 genes showed a nominally significant association (p<0.05) with CRC risk (Table 3; Webappendix Table 2: references used; Webappendix Table 3). The 37 variants are not in linkage disequilibrium with one another (r² < 0.1). The mean pooled sample size in the 37 meta-analyses that showed a significant association was 15,912 (range: 1,730-51,971), drawn from an average of 11 independent studies (range: 3-56). MUTYH biallelic mutations were associated with an approximately 10-fold elevated risk of CRC. Strong associations with CRC (ORs 2.0-10.0) were detected for four rare variants (MLH1 rs121912963, OR=2.74; MLH1 rs63750447, OR=2.14; MUTYH rs34612342, OR=3.32; MUTYH rs36053993, OR=6.49). Moderate associations with CRC (ORs 1.5-2.0 or 0.50-0.67) were found for three rare variants (APC rs1801155, OR=1.96; CHEK2 rs17879961, OR=1.56; CHEK2 1100delC, OR=1.88) and two common variants (DNMT3B rs1569686, OR=0.57; MLH1 rs1800734, OR=1.51). Associations with ORs of 0.67-1.50 were observed for the remaining 27 variants, most of which are common. Four of the 37 positive variants (MLH1 rs1800734; MUTYH biallelic mutations; CHEK2 rs17879961; DNMT3B rs1569686) showed highly significant associations with CRC risk at p<5×10⁻⁷; 13 showed associations at p<0.01, and the remaining 20 had p<0.05 (Table 3).

Table 3. Genetic variants nominally significantly associated with colorectal-cancer risk in meta-analyses of all available data.

Column groups: Number evaluated (Studies, Cases, Controls); Colorectal-cancer risk meta-analysis (Genetic models, OR (95% CI), P value, I² (%), P(heterogeneity)).

Genes Variants Alleles* Chromosome Frequency (%) Ethnicity Studies Cases Controls Genetic models OR (95% CI) P value I² (%) P(heterogeneity) Venice criteria grade FPRP§ Cumulative evidence of association
APC rs1801155 A/T 5 6.80 Jewish 3 804 6,188 Dominant 1.96 (1.37-2.79) 1.99×10⁻⁴ 0 0.84 BAA 0.007 Strong
CHEK2 1100delC 1100delC/– 22 0.71 White 7 3,874 11,630 Dominant 1.88 (1.29-2.73) 0.001 0 0.50 ×AA 0.036 Strong
CHEK2 rs17879961 C/T 22 3.91 White 6 6,042 17,051 Dominant 1.56 (1.32-1.84) 1.22×10⁻⁷ 0 0.76 BAA <0.001 Strong
CYP1A1 rs1048943 G/A 15 10.33 All 16 6,704 8,009 Additive 1.24 (1.05-1.47) 0.014 74 0.00 ACC 0.338 Weak
CYP2E1 96-bp insertion 96-bp ins/– 10 16.98 All 4 1,412 1,781 Additive 1.24 (1.03-1.49) 0.023 35 0.20 ABA 0.462 Weak
DNMT3B rs1569686 G/T 20 16.99 All 4 1,054 1,224 Additive 0.57 (0.47-0.68) 1.86×10⁻⁹ 0 0.99 BAA <0.001 Strong
GH1 rs2665802 A/T 17 45.39 All 7 3,275 3,848 Additive 0.89 (0.80-0.99) 0.025 49 0.07 ABC 0.508 Weak
GSTM1 Present/Null NA 1 50.64 All 56 20,552 31,419 null vs present 1.10 (1.04-1.17) 0.001 48 0.00 ABC 0.046 Moderate
GSTT1 Present/Null NA 22 29.53 All 43 15,144 23,847 null vs present 1.15 (1.05-1.27) 0.004 68 0.00 ACC 0.144 Weak
IGFBP3 rs2854746 G/C 7 46.08 All 5 4,282 7,365 Additive 1.07 (1.01-1.14) 0.016 0 0.07 AAC 0.447 Weak
MLH1 rs121912963 C/G 3 0.43 All 3 1,412 1,508 Additive 2.74 (1.31-5.75) 0.008 15 0.31 ×AC 0.231 Weak
MLH1 rs1800734 A/G 3 21.11 White 5 801 10,890 Additive 1.51 (1.34-1.69) 6.74×10⁻¹² 46 0.11 AAA <0.001 Strong
MLH1 rs63750447 A/T 3 1.96 Asian 3 937 919 Additive 2.14 (1.12-4.12) 0.022 41 0.18 BBC 0.458 Weak
MMP1 rs1799750 2G/1G 11 39.63 All 8 1,477 1,751 Additive 0.76 (0.64-0.92) 0.004 61 0.01 ACC 0.138 Weak
MSH3 rs184967 A/G 5 15.25 White 3 5,085 7,136 Additive 1.11 (1.03-1.20) 0.005 0 0.38 AAC 0.182 Weak
MSH3 rs26279 G/A 5 28.13 White 4 5,691 7,665 Additive 1.10 (1.03-1.17) 0.006 17 0.31 AAC 0.157 Weak
MTHFD1 rs1950902 A/G 14 18.78 White 3 3,822 5,452 Additive 0.90 (0.84-0.98) 0.010 0 0.79 AAC 0.275 Weak
MUTYH Monoallelic mutation NA 1 1.69 White 17 25,981 18,811 Carriers vs wild homozygotes 1.17 (1.01-1.34) 0.036 0 0.84 BAC 0.546 Weak
MUTYH Biallelic mutation NA 1 0.01 White 17 25,981 18,811 Carriers vs wild homozygotes 10.19 (5.00-22.04) 5.30×10⁻¹⁰ 0 0.88 ×AA <0.001 Strong
MUTYH rs34612342 G/A 1 0.01 White 17 27,041 19,641 GG vs AA 3.32 (1.13-9.81) 0.030 0 1.00 ×AA 0.533 Strong
MUTYH rs36053993 A/G 1 0.00 White 17 26,957 19,870 AA vs GG 6.49 (2.57-1.35) 7.49×10⁻⁵ 0 0.85 ×AA 0.003 Strong
NAT2 Fast/slow NA 8 47.39 All 35 11,684 15,348 Slow vs fast 0.94 (0.89-0.99) 0.023 1 0.45 AAC 0.47 Weak
NOD2 rs2066844 T/C 16 6.15 White 9 3,297 3,088 Dominant 1.35 (1.02-1.78) 0.038 34 0.14 BBC 0.581 Weak
NOD2 rs2066847 C/– 16 6.21 White 11 4,337 5,395 Dominant 1.30 (1.02-1.65) 0.032 33 0.13 BBC 0.546 Weak
PTGS1 rs5788 A/C 9 13.35 White 4 3,989 6,659 Additive 1.13 (1.04-1.22) 0.004 0 0.64 AAC 0.113 Weak
PTGS2 rs689466 G/A 1 30.34 All 9 4,076 7,610 Additive 0.88 (0.80-0.98) 0.018 56 0.02 ACC 0.405 Weak
SCD rs7849 G/A 10 18.49 All 3 2,011 2,580 Additive 0.85 (0.73-0.98) 0.025 29 0.25 ABC 0.488 Weak
TERT rs2736100 T/G 5 49.34 White 8 16,176 18,135 Additive 1.07 (1.04-1.10) 2.92×10⁻⁵ 0 0.53 AAC 0.001 Moderate
TGFB1 rs1800469 T/C 19 38.74 All 10 4,405 5,383 Additive 0.88 (0.79-0.97) 0.013 55 0.02 ACC 0.33 Weak
TNF rs1800629 A/G 6 13.78 All 11 2,296 2,283 Additive 1.28 (1.00-1.62) 0.046 71 0.00 ACC 0.625 Weak
TP73 G4C14/A4T14 NA 1 24.02 All 4 858 1,168 Additive 1.20 (1.04-1.40) 0.015 6 0.36 AAC 0.363 Weak
UBD rs2076485 C/T 6 26.07 White 3 4,281 6,157 Additive 1.07 (1.01-1.14) 0.034 0 0.77 AAC 0.563 Weak
VDR rs11568820 A/G 12 36.61 White 4 3,228 3,455 Dominant 1.15 (1.04-1.27) 0.005 0 0.93 AAB 0.165 Weak
VDR rs1544410 A/G 12 38.96 All 17 11,687 12,301 Additive 0.85 (0.72-0.99) 0.040 93 0.00 ACC 0.87 Weak
VEGF rs3025039 T/C 6 19.43 All 6 1,925 1,884 Additive 1.19 (1.04-1.37) 0.014 29 0.22 ABC 0.347 Weak
XPA rs1800975 A/G 9 36.46 White 3 593 1,137 Additive 0.82 (0.70-0.96) 0.016 4 0.36 AAC 0.379 Weak
XPC rs2228001 C/A 3 37.38 All 9 2,978 5,204 Additive 1.08 (1.01-1.16) 0.021 0 0.83 AAC 0.486 Weak
* Minor/major alleles (per Caucasian populations); major alleles were treated as reference alleles in the analyses.

OR: allelic ORs were estimated under the additive model. For dominant or recessive models, ORs were estimated for subjects who carry one or two minor alleles or subjects homozygous for the minor alleles, respectively.

Frequency: frequency of the minor allele or effect genotype(s) in controls in the primary meta-analysis.

Venice criteria grade: grades for amount of evidence, replication of the association, and protection from bias, in that order (× = criterion not applied).

§ False-positive report probability (FPRP) was determined from the OR and P value of each variant in the meta-analysis, assuming a prior probability of 0.05.

Cumulative evidence of association: cumulative epidemiological evidence for association with colorectal-cancer risk, graded by combining the Venice criteria and FPRP results.

Of the 267 meta-analyses of all available data, 120 (44.9%) showed little or no heterogeneity, 43 (16.1%) moderate heterogeneity, and 104 (39.0%) strong heterogeneity. The proportion of meta-analyses with strong heterogeneity was significantly lower for the 37 positive variants (Table 3) than for the remaining 230 variants (19% vs 42%, Fisher's exact p < 0.01). Small-study bias was detected for 36 variants (13.5%), seven of which were positive variants. Of the 267 variants, 38 (14.2%), including four positive variants, showed evidence of an excess of studies with significant findings. When all studies included in the 267 meta-analyses were considered together, the number of studies with significant findings was also greater than expected (666 vs 301, p < 0.0001).

In sensitivity analyses, nine variants (rs7849, rs1800469, rs3025039, rs1048943, rs689466, rs1544410, rs2854746, rs1800629, G4C14/A4T14) became non-significant after exclusion of studies violating HWE, and 13 variants (rs2854746, rs121912963, rs63750447, rs26279, rs1950902, MUTYH monoallelic mutation, NAT2 Fast/slow, rs2066844, rs2066847, rs1800629, G4C14/A4T14, rs2076485, rs1544410) became non-significant after exclusion of the first positive or first published report.

We next calculated FPRP values at a prior probability of 0.05 to evaluate the probability of a true association with CRC risk for the 37 positive variants from the main analyses. FPRP values were <0.05 for nine variants in seven genes (APC rs1801155, CHEK2 1100delC and rs17879961, DNMT3B rs1569686, GSTM1 deletion, MLH1 rs1800734, MUTYH biallelic mutations and rs36053993, TERT rs2736100), 0.05-0.2 for six variants in five genes (GSTT1 deletion, MMP1 rs1799750, MSH3 rs184967 and rs26279, PTGS1 rs5788, VDR rs11568820), and >0.2 for the remaining 22 variants (Table 3).

Epidemiological credibility of significant associations was graded for the 37 positive variants identified through the main analyses (Table 3 and Webappendix Table 3). We first applied the Venice criteria. Grades of A were given to 25, 22, and 9 meta-analyses for amount of evidence, replication of association, and protection from bias, respectively; grades of B to 7, 8, and 1; and grades of C to 0, 7, and 27. Next, based on FPRP, evidence of a true association with CRC risk was rated strong, moderate, and weak for 9, 6, and 22 variants, respectively. For MUTYH rs34612342, we disregarded the FPRP value (FPRP=0.533) when evaluating cumulative evidence, because this mutation is pathogenic and there is strong evidence that it increases the risk of multiple adenomatous polyps and colorectal cancer (41). Altogether, eight variants in five genes (APC rs1801155, CHEK2 1100delC and rs17879961, DNMT3B rs1569686, MLH1 rs1800734, MUTYH biallelic mutations, rs34612342, and rs36053993) were graded strong for evidence of association with CRC risk using combined Venice criteria and FPRP results. Two variants (GSTM1 Present/Null, TERT rs2736100) were graded moderate. The remaining 27 variants scored C in one or more Venice criteria or were downgraded because of a high FPRP; these variants were graded weak for cumulative evidence of association with CRC risk, based on combined Venice criteria and FPRP results.
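
The combination rules stated in the Methods (Venice grades, one-step FPRP up- or downgrading, the Bonferroni rule for subgroup and alternative-model associations, and the FPRP override used for MUTYH rs34612342) can be expressed as a small decision function. This is an illustrative reading of those rules with hypothetical names; the published grades in Tables 3-5 reflect the authors' judgment and may not all follow mechanically from it. '×' marks a criterion that was not applied.

```python
BONFERRONI_P = 0.05 / 910          # 910 meta-analyses, about 5.49e-5
LEVELS = ("weak", "moderate", "strong")


def venice_level(grades):
    """grades: three letters for amount of evidence, replication and
    protection from bias; '×' marks a criterion that was not applied."""
    used = [g for g in grades if g in ("A", "B", "C")]
    if "C" in used:
        return "weak"
    return "strong" if all(g == "A" for g in used) else "moderate"


def fprp_level(fprp):
    """FPRP cut-offs from the Methods."""
    if fprp < 0.05:
        return "strong"
    if fprp <= 0.2:
        return "moderate"
    return "weak"


def cumulative_evidence(grades, fprp, p_value,
                        subgroup_or_alt_model=False, ignore_fprp=False):
    """Combine Venice grades and FPRP; optionally apply the Bonferroni rule."""
    level = LEVELS.index(venice_level(grades))
    if not ignore_fprp:
        fp = fprp_level(fprp)
        if fp == "strong":
            level = min(level + 1, 2)      # upgrade one step
        elif fp == "weak":
            level = max(level - 1, 0)      # downgrade one step
    if subgroup_or_alt_model and p_value > BONFERRONI_P:
        level = 0                          # weak regardless of other grades
    return LEVELS[level]


# Rows from Table 3 (grades, FPRP, p) used as examples.
print(cumulative_evidence("BAA", 0.007, 1.99e-4))                   # APC rs1801155
print(cumulative_evidence("ABC", 0.046, 0.001))                     # GSTM1 Present/Null
print(cumulative_evidence("×AA", 0.533, 0.030, ignore_fprp=True))   # MUTYH rs34612342
```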

Next, we performed meta-analyses stratified by ethnicity for 207 variants among Whites and 34 variants among Asians (Webappendix Table 5) and identified eight additional variants in eight genes that were nominally associated with CRC risk (p<0.05; Table 4 and Webappendix Table 3). Six of them (rs16260, rs28362491, rs1800566, rs1052133, rs1801394, rs7903146) were associated with CRC risk only in Whites; the other two (rs20417, rs1042522) only in Asians. We also performed meta-analyses under dominant and recessive models and identified 17 additional variants in 17 genes with significant associations, none of which were statistically significant under the additive model (Table 5 and Webappendix Table 4). As for the 37 positive variants from the main analyses, we applied the Venice criteria and FPRP to evaluate these 25 variants, and we also considered the Bonferroni-corrected p-value. All were graded weak for cumulative evidence of association with CRC risk.

Table 4. Genetic variants nominally significantly associated with colorectal-cancer risk identified from additional analyses by ethnic group in additive model.

Column groups: Number evaluated (Studies, Cases, Controls); Colorectal-cancer risk meta-analysis (OR (95% CI), P value, I² (%), P(heterogeneity)).

Genes Variants Alleles* Chromosome MAF (%) Ethnicity Studies Cases Controls OR (95% CI) P value I² (%) P(heterogeneity) Venice criteria grade FPRP§ Cumulative evidence of association
CDH1 rs16260 A/C 16 28.02 White 6 6,761 6,646 0.93 (0.87-1.00) 0.048 23 0.26 AAC 0.642 Weak
MTRR rs1801394 A/G 5 44.65 White 10 6,430 9,746 0.98 (0.93-1.02) 0.030 4 0.41 AAC 0.535 Weak
NQO1 rs1800566 T/C 16 17.88 White 8 6,293 6,566 1.09 (1.03-1.16) 0.006 0 0.50 AAC 0.183 Weak
OGG1 rs1052133 G/C 3 21.59 White 14 5,908 7,355 1.15 (1.01-1.32) 0.033 74 0.00 ACC 0.558 Weak
PTGS2 rs20417 C/G 1 2.76 Asian 4 1,285 3,040 1.44 (1.06-1.95) 0.019 28 0.25 BBC 0.420 Weak
NFKB1 rs28362491 –/ATTG 4 41.05 White 6 1,199 3,134 1.29 (1.11-1.50) 0.001 55 0.05 ACB 0.036 Weak
TCF7L2 rs7903146 T/C 10 29.08 White 3 1,960 14,290 1.12 (1.02-1.22) 0.015 0 0.47 AAC 0.335 Weak
TP53 rs1042522 C/G 17 37.15 Asian 8 3,993 4,943 1.14 (1.02-1.27) 0.021 60 0.02 ACC 0.430 Weak
* Minor/major alleles (per Caucasian populations); major alleles were treated as reference alleles in the analyses.

OR: allelic ORs were estimated under the additive model. For dominant or recessive models, ORs were estimated for subjects who carry one or two minor alleles or subjects homozygous for the minor alleles, respectively.

MAF=minor allele frequency in controls.

Venice criteria grade: grades for amount of evidence, replication of the association, and protection from bias, in that order.

§ False-positive report probability (FPRP) was determined from the OR and P value of each variant in the meta-analysis, assuming a prior probability of 0.05.

Cumulative evidence of association: cumulative epidemiological evidence for association with colorectal-cancer risk, graded by combining the Venice criteria and FPRP results.

Table 5. Additional genetic variants nominally significantly associated with colorectal-cancer risk in meta-analyses using dominant or recessive models.

Column groups: Number evaluated (Studies, Cases, Controls); Colorectal-cancer risk meta-analysis (Genetic models, OR (95% CI), P value, I² (%), P(heterogeneity)).

Genes Variants Alleles* Chromosome MAF (%) Ethnicity Studies Cases Controls Genetic models OR (95% CI) P value I² (%) P(heterogeneity) Venice criteria grade FPRP§ Cumulative evidence of association
SELS rs34713741 T/C 15 33.22 All 3 1,442 2,071 Dominant 1.21 (1.05-1.39) 0.008 0 0.40 AAC 0.235 Weak
SERPINE1/PAI-1 rs1799889 5G/4G 7 44.94 White 4 2,241 4,534 Dominant 0.87 (0.78-0.97) 0.014 0 0.56 AAC 0.337 Weak
EPHX1 rs2234922 G/A 1 19.38 All 13 5,329 6,700 Dominant 0.91 (0.85-0.99) 0.020 0 0.50 AAC 0.46 Weak
ERCC5/XPG rs17655 C/G 13 24.73 All 9 6,322 7,537 Dominant 1.13 (1.01-1.25) 0.027 38 0.12 ABC 0.48 Weak
RAD18 rs373572 C/T 3 29.22 All 3 3,174 3,397 Dominant 1.18 (1.01-1.37) 0.033 27 0.25 ABC 0.55 Weak
CCND1 rs9344 A/G 11 48.15 All 22 6,316 8,272 Dominant 1.13 (1.01-1.26) 0.035 43 0.00 ABC 0.569 Weak
IGF1 rs35767 T/C 12 24.75 All 3 2,717 4,880 Recessive 0.75 (0.62-0.91) 0.003 0 0.57 BAC 0.11 Weak
MGMT rs12917 T/C 10 12.99 All 7 4,127 7,284 Recessive 1.54 (1.14-2.08) 0.005 0 0.47 BAA 0.158 Weak
CRP rs1800947 C/G 1 5.70 All 4 2,916 3,544 Recessive 3.84 (1.38-10.74) 0.010 0 0.47 CAC 0.277 Weak
HPGD rs2612656 G/A 4 22.75 White 3 2,979 5,575 Recessive 1.31 (1.05-1.64) 0.016 21 0.28 BAC 0.380 Weak
FRZB rs7775 G/C 2 8.77 White 3 1,256 3,000 Recessive 3.20 (1.17-8.73) 0.023 64 0.06 CCC 0.468 Weak
TGFBR1 rs334354 A/G 9 26.71 All 4 1,226 2,776 Recessive 1.38 (1.04-1.84) 0.029 8 0.35 BAC 0.516 Weak
TGFB1 rs4803455 A/C 19 47.48 All 3 2,786 3,516 Recessive 1.14 (1.01-1.28) 0.030 0 0.37 AAC 0.536 Weak
LIPC rs6083 A/G 15 36.52 All 3 4,702 4,914 Recessive 0.85 (0.74-0.99) 0.032 25 0.27 AAA 0.56 Weak
MTHFR rs1801133 T/C 1 33.50 All 68 32,608 44,383 Recessive 0.92 (0.85-1.00) 0.036 52 0.00 ACC 0.61 Weak
CYP2C9 rs1799853 T/C 10 13.31 White 6 4,915 5,237 Recessive 1.36 (1.02-1.83) 0.038 0 0.76 BAA 0.60 Weak
MTRR rs10380 T/C 5 9.31 White 4 3,869 5,141 Recessive 1.61 (1.02-2.52) 0.039 6 0.36 BAA 0.597 Weak
* Minor allele/major allele (as defined in Caucasians); major alleles were treated as reference alleles in the analyses.

Allelic ORs were estimated under the additive model. Under the dominant model, ORs compare carriers of one or two minor alleles with major-allele homozygotes; under the recessive model, ORs compare minor-allele homozygotes with carriers of at least one major allele.

MAF=minor allele frequency in controls.

OR=odds ratio; CI=confidence interval.

Venice criteria grades (A, B or C) are given for the amount of evidence, replication of the association, and protection from bias, respectively.

§ False-positive report probability (FPRP) was determined from the OR and P value of each variant in the meta-analysis, assuming a prior probability of 0.05.

Cumulative epidemiological evidence as graded by combination of results from Venice criteria and FPRP for association with colorectal-cancer risk.
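The dominant and recessive contrasts defined in the footnotes above can be made concrete for a single study. The following is a minimal Python sketch, not the authors' code: the class `Genotypes`, the functions `odds_ratio` and `model_ors`, and the genotype counts are hypothetical; confidence intervals use Woolf's log method; and the allele-counting OR is shown as a common approximation to the per-allele (additive) OR rather than as the exact estimator used in the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Genotypes:
    """Genotype counts in one group (hypothetical): A = major allele, a = minor allele."""
    hom_major: int  # AA, the reference genotype
    het: int        # Aa
    hom_minor: int  # aa


def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls, z=1.96):
    """OR and Woolf 95% CI from a 2x2 table; every cell must be positive."""
    or_ = (exp_cases * unexp_controls) / (unexp_cases * exp_controls)
    se = math.sqrt(1 / exp_cases + 1 / unexp_cases + 1 / exp_controls + 1 / unexp_controls)
    log_or = math.log(or_)
    return or_, math.exp(log_or - z * se), math.exp(log_or + z * se)


def model_ors(cases: Genotypes, controls: Genotypes) -> dict:
    """ORs for one study under dominant, recessive and allele-counting contrasts."""
    # Dominant: carriers of one or two minor alleles (Aa + aa) vs AA.
    dominant = odds_ratio(cases.het + cases.hom_minor, cases.hom_major,
                          controls.het + controls.hom_minor, controls.hom_major)
    # Recessive: minor-allele homozygotes (aa) vs AA + Aa.
    recessive = odds_ratio(cases.hom_minor, cases.hom_major + cases.het,
                           controls.hom_minor, controls.hom_major + controls.het)
    # Allele counting: minor vs major alleles (two alleles per person).
    allelic = odds_ratio(cases.het + 2 * cases.hom_minor,
                         2 * cases.hom_major + cases.het,
                         controls.het + 2 * controls.hom_minor,
                         2 * controls.hom_major + controls.het)
    return {"dominant": dominant, "recessive": recessive, "allelic": allelic}


# Made-up genotype counts for one study (cases, then controls).
result = model_ors(Genotypes(400, 450, 150), Genotypes(500, 400, 100))
for model, (or_, lo, hi) in result.items():
    print(f"{model}: OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With these made-up counts the dominant OR is 1.50, the recessive OR about 1.59 and the allele-counting OR 1.40. In a meta-analysis, each study's log OR and its standard error under the chosen contrast would then be pooled across studies.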

The vast majority of variants evaluated in this project (205 variants in 130 genes) showed no evidence of a significant association in meta-analysis. These meta-analyses included a mean of six studies (range 3-34) and 7,916 participants (range 519-36,982). Table 6 shows results for 40 variants in 33 genes that showed no evidence of association with CRC risk in meta-analyses with at least 5,000 cases and 5,000 controls.

Table 6. Genetic variants showing no association with colorectal-cancer risk under the additive model in meta-analyses with at least 5,000 cases and 5,000 controls.

Genes Variants Comparisons* Frequency (%) Ethnicity Studies Cases Controls OR (95% CI) P value I² (%) Pheterogeneity
ABCB1 rs1202168 T vs C 41.25 White 4 6,318 5,805 1.05 (0.97-1.14) 0.191 55 0.08
ABCB1 rs9282564 G vs A 8.89 White 4 5,792 5,234 1.21 (0.78-1.88) 0.400 96 0.00
ABCB1/MDR1 rs1045642 C vs T 47.52 All 13 6,312 7,128 0.98 (0.89-1.07) 0.611 58 0.00
APC rs459552 A vs T 22.27 All 8 6,654 7,117 0.96 (0.91-1.02) 0.205 0 0.66
CASP8 rs3834129 6 bp ins vs del 41.14 All 10 6,922 10,750 1.05 (0.92-1.20) 0.441 82 0.00
CASR rs1042636 G vs A 8.52 White 4 6,298 7,839 1.00 (0.92-1.09) 0.936 0 0.66
CDH1 rs16260 A vs C 28.09 All 9 7,220 7,045 0.94 (0.88-1.01) 0.116 21 0.26
COMT rs4680 A vs G 48.89 White 5 5,074 5,239 1.05 (0.94-1.16) 0.390 56 0.06
CYP1A1 rs4646903 C vs T 14.42 All 15 7,258 8,154 1.05 (0.92-1.19) 0.500 65 0.00
CYP1A2 rs762551 C vs A 30.45 All 11 7,667 8,242 1.02 (0.94-1.10) 0.664 55 0.01
CYP1B1 rs1056836 G vs C 43.27 White 9 8,709 9,097 1.02 (0.97-1.06) 0.488 0 0.44
CYP1B1 rs1800440 G vs A 18.36 White 6 6,679 6,923 0.97 (0.88-1.07) 0.580 53 0.06
CYP2C9 rs1057910 G vs A 7.16 All 8 8,538 9,182 1.00 (0.86-1.16) 0.994 62 0.01
EPHX1 rs1051740 C vs T 30.95 All 18 10,478 12,372 1.02 (0.97-1.08) 0.447 36 0.06
ERCC2/XPD rs13181 C vs A 30.25 All 17 6,039 8,749 0.99 (0.92-1.05) 0.649 24 0.17
ERCC2/XPD rs1799793 A vs G 29.71 All 7 5,470 7,135 1.01 (0.96-1.07) 0.674 0 0.68
GSTP1 rs1138272 T vs C 9.84 All 10 7,160 7,789 0.92 (0.80-1.06) 0.234 56 0.02
GSTP1 rs1695 G vs A 27.54 All 33 9,986 15,562 0.98 (0.93-1.03) 0.487 25 0.10
IGF1 (CA)n non R19 vs R19 36.71 All 8 5,493 6,827 0.99 (0.89-1.09) 0.769 70 0.00
IGFBP3 rs2854744 C vs A 46.30 All 9 6,872 10,606 1.03 (0.98-1.07) 0.284 0 0.98
IL6 rs1800795 C vs G 38.55 All 14 6,952 8,657 1.01 (0.93-1.11) 0.749 65 0.00
IRS1 rs1801278 A vs G 6.92 White 7 7,048 7,533 1.08 (0.96-1.22) 0.219 39 0.13
MLH1 rs1799977 G vs A 29.56 All 10 6,384 8,972 1.01 (0.92-1.11) 0.904 58 0.01
MTHFD1 rs2236225 A vs G 45.19 White 6 6,535 9,347 0.98 (0.90-1.07) 0.603 67 0.01
MTHFR rs1801131 C vs A 30.14 All 34 14,965 22,017 0.99 (0.94-1.03) 0.514 32 0.04
MTR rs1805087 G vs A 19.42 All 19 12,945 17,655 0.99 (0.94-1.05) 0.717 33 0.09
MTRR rs1801394 A vs G 48.33 All 16 7,674 11,593 0.96 (0.91-1.01) 0.110 37 0.12
MUTYH rs3219484 A vs G 8.20 White 3 5,391 5,222 0.95 (0.68-1.34) 0.787 91 0.00
MUTYH rs3219489 C vs G 28.44 All 4 5,082 5,280 1.09 (0.92-1.28) 0.317 81 0.00
NAT1 Phenotype Fast vs slow 44.37 All 15 7,336 9,825 1.03 (0.92-1.16) 0.596 61 0.00
NQO1 rs1800566 T vs C 22.90 All 12 7,209 8,783 1.07 (0.99-1.15) 0.090 32 0.13
OGG1 rs1052133 G vs C 25.79 All 18 6,654 8,599 1.10 (0.99-1.23) 0.085 71 0.00
PPARG rs1801282 G vs C 9.10 All 18 13,758 20,300 0.97 (0.91-1.03) 0.339 18 0.24
PPARG rs3856806 T vs C 11.39 All 10 6,189 8,707 1.03 (0.96-1.11) 0.412 2 0.42
PTGS2/COX2 rs5275 C vs T 33.17 All 10 6,059 8,084 1.01 (0.97-1.07) 0.579 0 0.98
TGFBR1 rs11466445 9 bp del vs ins 9.19 All 10 6,338 6,689 1.04 (0.96-1.13) 0.379 1 0.43
TP53 rs1042522 C vs G 31.28 All 31 10,515 12,909 1.00 (0.92-1.10) 0.922 72 0.00
VDR rs2228570 A vs G 37.32 All 20 13,631 15,155 1.00 (0.94-1.06) 0.959 53 0.00
VDR rs7975232 A vs C 43.08 All 9 5,421 5,377 1.08 (0.98-1.19) 0.105 58 0.02
XRCC1 rs25487 G vs A 31.87 All 25 9,541 14,448 1.04 (0.97-1.11) 0.281 49 0.00

OR=odds ratio; CI=confidence interval.

* Genetic comparison used in meta-analysis: minor allele vs major allele.

Frequency (%): frequency of the minor allele or effect genotype(s) in controls in the primary meta-analysis.
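Table 6 reports I² and Pheterogeneity for each pooled estimate, and the reference list cites DerSimonian and Laird (33) and Higgins and Thompson (37). Assuming random-effects pooling of that kind was used (the Methods are not part of this section), a minimal sketch of DerSimonian-Laird pooling of study-level ORs reported with 95% CIs, together with Cochran's Q and I², might look like the following. The function name `dersimonian_laird` and all inputs are hypothetical.

```python
import math
from statistics import NormalDist


def dersimonian_laird(ors, lowers, uppers, level=0.95):
    """Random-effects (DerSimonian-Laird) pooling of study ORs reported with CIs.

    Each study's variance of log(OR) is back-calculated from its CI, assuming
    the CI was symmetric on the log scale.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    y = [math.log(o) for o in ors]
    var = [((math.log(u) - math.log(l)) / (2 * z)) ** 2 for l, u in zip(lowers, uppers)]
    w = [1 / v for v in var]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Cochran's Q and its degrees of freedom.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    # Method-of-moments between-study variance tau^2, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in var]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    # Higgins' I^2: percentage of total variation due to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "or": math.exp(y_re),
        "ci": (math.exp(y_re - z * se_re), math.exp(y_re + z * se_re)),
        "q": q,
        "df": df,
        "tau2": tau2,
        "i2": i2,
    }


# Two hypothetical, clearly discordant studies.
print(dersimonian_laird([0.8, 1.5], [0.7, 1.3], [0.914, 1.73]))
```

The heterogeneity P value is the upper tail of a χ² distribution with `df` degrees of freedom evaluated at Q; with SciPy installed that is `scipy.stats.chi2.sf(q, df)`. The sketch leaves it out to stay within the standard library.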

Discussion

To our knowledge, this study is the largest and most comprehensive assessment to date of the literature on candidate-gene association studies of CRC risk. We systematically evaluated data for 3,603 variants in 1,378 independent candidate genes from 950 reports published over the past two decades. Several meta-analyses have evaluated candidate-gene association studies of CRC risk for a single gene or a few genes, but these earlier analyses were limited to 52 variants in 34 genes (Webappendix Table 6). Recently, Theodoratou et al (42) evaluated genetic variants for CRC risk using data from 635 publications and conducted meta-analyses for 92 polymorphisms in 64 genes, including 18 variants identified by GWAS. We did not include GWAS-identified risk variants in this study because they have been robustly replicated and should be considered to have strong evidence of association. Our study not only updates the variants meta-analyzed previously, using data from more studies and larger sample sizes, but also assesses more than 193 variants that had not been evaluated in any previous meta-analysis, including that of Theodoratou et al (42). Of the 267 variants in 150 genes summarized by our 910 meta-analyses, 62 variants in 50 genes showed a nominally significant association with CRC risk. Using the Venice criteria plus FPRP results, we graded cumulative epidemiological evidence of association with CRC risk as strong for eight variants (APC rs1801155; CHEK2 1100delC and rs17879961; DNMT3B rs1569686; MLH1 rs1800734; and MUTYH biallelic mutations, rs34612342 and rs36053993), moderate for two variants (GSTM1 present/null and TERT rs2736100), and weak for the remaining 52 variants. Of the eight strong variants, MUTYH rs36053993 was also rated as having ‘strong’ evidence of association in Theodoratou's study (42). For 40 variants in 33 genes, we found no evidence of association with CRC risk in meta-analyses with large sample sizes (at least 5,000 cases and 5,000 controls). Our study provides a comprehensive research synopsis of candidate-gene association studies of CRC risk, and its results should help future studies evaluate genetic risk factors for CRC.

The adenomatous polyposis coli (APC) gene, a tumor suppressor gene at chromosome 5q21, encodes a large multidomain protein of 2,843 amino acids that plays a central role in the Wnt signaling pathway (43). Germline pathogenic mutations in APC cause familial adenomatous polyposis (FAP), an autosomal dominant condition in which more than 100 adenomatous polyps can develop (3;6). Our meta-analysis provides strong evidence that heterozygous carriage of a variant at codon 1307 in exon 15 of the gene (rs1801155, I1307K) is associated with a 1.96-fold increased risk of CRC among Jews (including Ashkenazi and Israeli Jews). This variant is present in 7% of Ashkenazi Jews, whereas its frequency is very low in Europeans and Asians (based on HapMap data).

The CHEK2 gene maps to chromosome 22q12.1 and encodes a protein kinase that is activated in response to DNA damage and is involved in cell-cycle arrest (44). Our meta-analysis revealed strong evidence of association with CRC risk for a truncating mutation at codon 381 in exon 10 (1100delC) and a missense polymorphism in exon 3 (rs17879961, Ile157Thr). The 1100delC mutation produces a truncated, kinase-deficient protein (45), whereas Ile157Thr yields a CHEK2 protein with defective binding and phosphorylation of downstream substrates (46). Interestingly, in a previous meta-analysis we found strong cumulative evidence of association of these two variants with breast-cancer risk (29), suggesting that CHEK2 may play a role in both CRC and breast cancer.

Our meta-analyses revealed strong evidence for an association of CRC risk with three rare variants in the MUTYH gene, based on data from 17 population-based studies that excluded cases with MUTYH-associated polyposis (MAP). Biallelic MUTYH mutation carriers are mainly either homozygotes (two copies of the same mutation) or compound heterozygotes (two different mutations) for Gly382Asp and Tyr165Cys. Tyr165Cys and Gly382Asp are located in exon 7 and exon 13 of the MUTYH gene, respectively; both are predicted to be deleterious by SIFT (47) and have been confirmed to be pathogenic (41). In contrast, monoallelic MUTYH mutations (heterozygous carriage of any of 12 mutations in the gene) showed only weak evidence of association with CRC risk in our study.

Two common variants (MLH1 rs1800734, DNMT3B rs1569686) showed strong cumulative evidence of association with CRC risk. MLH1, which maps to chromosome 3p22.2, is a human homolog of the E. coli DNA mismatch repair gene mutL and is frequently mutated in hereditary nonpolyposis colon cancer (HNPCC) (48). Approximately 85% of genetically defined HNPCC patients have germline mutations in the MLH1 gene (49). Interestingly, a meta-analysis of five studies, comprising 801 microsatellite instability-high (MSI-H) cases and 10,890 controls, identified a highly significant association of rs1800734 (-93G>A) with MSI-H CRC (p=1.67×10⁻¹²). This promoter SNP was much more strongly associated with MSI-H CRC (OR=1.51) than with CRC overall (OR=1.05, p=0.013; meta-analysis of six studies with 17,174 cases and 13,166 controls). The DNMT3B gene plays an important role in generating aberrant methylation during carcinogenesis (50). Although this gene has not been identified as a CRC susceptibility locus by GWAS, we nevertheless rated its SNP rs1569686 as having strong evidence of association, given the highly consistent results across the studies included in our meta-analysis.

Two common variants (GSTM1 null, TERT rs2736100) were graded moderate for cumulative evidence of association with CRC risk; both were upgraded from ‘weak’ because of a low false-positive report probability (<0.05). Further investigation of these variants is needed, particularly because the sample sizes available for both are relatively small. Cumulative epidemiological evidence of association with CRC was weak for the remaining 52 variants, many of which are common and were identified through ethnicity-specific meta-analyses or meta-analyses using dominant or recessive models. Well-designed studies with large samples are warranted to clarify the association of these variants with CRC.
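The FPRP used for upgrading, and reported in Table 5, follows Wacholder et al (40): FPRP = α(1−π)/[α(1−π) + (1−β)π], where π is the prior probability (0.05 here), α is taken as the observed P value, and 1−β is the power to detect an assumed alternative OR. This section does not say which alternative OR was used, so the minimal sketch below assumes it equals the observed OR; that choice, and the function name `fprp`, are our own.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def fprp(p_value, or_observed, or_alternative, prior=0.05):
    """False-positive report probability in the spirit of Wacholder et al.

    alpha is set to the observed two-sided P value; the standard error of
    log(OR) is back-calculated from the observed OR and P value; power is the
    probability of a result at least this significant if the true OR were
    or_alternative.
    """
    if not 0 < p_value < 1 or or_observed == 1:
        raise ValueError("need 0 < P < 1 and an observed OR different from 1")
    alpha = p_value
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = abs(math.log(or_observed)) / z_crit  # observed |z| equals z_crit here
    mu = math.log(or_alternative) / se
    power = _STD_NORMAL.cdf(mu - z_crit) + _STD_NORMAL.cdf(-mu - z_crit)
    return alpha * (1 - prior) / (alpha * (1 - prior) + power * prior)


# MTRR rs10380, recessive model (Table 5): OR 1.61, P=0.039.
print(round(fprp(0.039, 1.61, or_alternative=1.61), 3))
```

Under this assumption the sketch gives 0.597 for MTRR rs10380, matching Table 5. For some other rows it differs slightly (about 0.233 versus the tabulated 0.235 for SELS rs34713741), which may reflect rounding of the published P values or a different alternative OR.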

Our meta-analyses provide no evidence of association with CRC risk for 205 of the 267 variants evaluated, supporting the notion that the vast majority of genetic variants evaluated in candidate-gene association studies may not be truly related to CRC risk. Methodological limitations of previous candidate-gene studies, such as small sample sizes, may explain some of these null results. However, 40 of the 205 non-significant variants, in 33 genes, showed no association with CRC risk in meta-analyses including at least 5,000 cases and 5,000 controls, a sample size that provides approximately 85% power to detect a per-allele OR of 1.15 under the additive model for a variant with a MAF of 0.10 at a type I error rate of 0.05. Thus, future epidemiological studies of similar size are unlikely to be informative about the effects of these variants.
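The power figure above can be checked approximately. The paper does not describe how power was computed; the sketch below assumes a two-sided allele-counting test, Hardy-Weinberg proportions (so each subject contributes two independent alleles) and a normal approximation for log(OR), so it approximates rather than reproduces the authors' calculation. The function name `allelic_test_power` is hypothetical.

```python
import math
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, maf, or_per_allele, alpha=0.05):
    """Approximate power of a two-sided allele-counting test for a per-allele OR.

    maf is the minor-allele frequency in controls; Hardy-Weinberg proportions
    are assumed so that each subject contributes two independent alleles.
    """
    nd = NormalDist()
    p0 = maf
    # Minor-allele frequency in cases implied by the per-allele OR.
    p1 = or_per_allele * p0 / (1 - p0 + or_per_allele * p0)
    m1, m0 = 2 * n_cases, 2 * n_controls  # numbers of alleles
    # Large-sample variance of log(OR) from the expected allele counts.
    var = (1 / (m1 * p1) + 1 / (m1 * (1 - p1))
           + 1 / (m0 * p0) + 1 / (m0 * (1 - p0)))
    mu = math.log(or_per_allele) / math.sqrt(var)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Probability that |Z| exceeds the critical value when the true OR holds.
    return nd.cdf(mu - z_crit) + nd.cdf(-mu - z_crit)


power = allelic_test_power(5000, 5000, maf=0.10, or_per_allele=1.15)
print(f"power = {power:.2f}")
```

For 5,000 cases, 5,000 controls, a MAF of 0.10 and an OR of 1.15, this gives roughly 0.86, in line with the approximately 85% quoted above.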

This study has several limitations. First, although we systematically searched the literature in two stages to identify eligible studies, some studies may have been missed. PubMed was the main database used for our literature search; to expand the search, we also queried Google Scholar, which indexes multiple databases. Compared with previous meta-analyses that also used multiple databases (Webappendix Table 7), our search yielded more studies and a larger combined sample size for most variants included in our evaluation. Second, we did not assess gene-gene or gene-environment interactions; additional studies specifically designed to identify such interactions are needed. Third, heterogeneity across studies, including differences in study populations, study designs and genotyping platforms, may have contributed to some of the null associations in this study. More than one-third of the meta-analyses showed high heterogeneity, especially those for variants without a significant association. We attempted to address heterogeneity through stratified analyses by ethnicity, but other sources of heterogeneity exist that are difficult to address in this meta-analysis because of limited available data. Finally, the Venice criteria use P<0.05 as the significance threshold for determining association. However, most associations with P values between 0.005 and 0.05 showed only weak evidence of association with CRC in this study, so a more stringent P-value threshold would help in evaluating evidence for a true-positive association. In addition, although the Venice criteria have the advantage of considering multiple sources of potential bias, some of these, such as genotyping error, phenotype misclassification and population stratification, are difficult to assess in meta-analyses.

In our meta-analyses, we identified ten genetic variants showing strong or moderate epidemiological evidence of association with CRC risk. If all ten are confirmed to be associated with CRC risk, they could explain approximately 5% of the familial risk of CRC in European populations. Overall, genetic risk factors identified to date account for less than 30% of the familial risk of CRC. Some of the missing heritability may be due to methylation markers, copy number variations, structural variants and rare variants, which conventional candidate-gene association studies and GWAS are poorly suited to investigate. Gene-gene and gene-environment interactions may also play a significant role in the etiology of CRC. Additional research, including studies with large sample sizes, higher-density SNP arrays, next-generation sequencing technologies, imputation using data from the 1000 Genomes Project and better-defined CRC subtypes, is needed to clarify the missing heritability of CRC. Our study, the largest field synopsis of CRC candidate-gene association studies conducted to date, not only summarizes the current literature on the genetic epidemiology of CRC but also provides comprehensive data and useful clues for designing future studies of genetic risk factors for CRC.
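The estimate that ten variants could explain approximately 5% of familial CRC risk depends on a model of how individual loci contribute to familial risk, which this section does not spell out. One approximation used in the GWAS literature assumes a multiplicative per-allele model: a locus with allele frequency p and per-allele relative risk r (approximated by the OR) contributes λ = (pr² + q)/(pr + q)² to the familial relative risk of first-degree relatives, where q = 1 − p, and the fraction explained by independent loci is Σ log λ_k / log λ₀. In the sketch below, the overall familial relative risk λ₀ = 2.2, the function names and the example variants are illustrative assumptions, not values from the paper; the formula also does not apply to recessive effects such as MUTYH biallelic mutations.

```python
import math


def locus_familial_rr(freq, per_allele_or):
    """Familial relative risk to first-degree relatives from one locus,
    assuming a multiplicative per-allele model with the OR standing in for
    the relative risk: (p*r^2 + q) / (p*r + q)^2."""
    p, q, r = freq, 1 - freq, per_allele_or
    return (p * r ** 2 + q) / (p * r + q) ** 2


def fraction_familial_risk_explained(variants, overall_familial_rr=2.2):
    """Sum of log(lambda_k) over independent loci divided by log(lambda_0).

    variants: iterable of (allele frequency, per-allele OR) pairs.
    overall_familial_rr: assumed overall familial relative risk (illustrative).
    """
    total = sum(math.log(locus_familial_rr(p, r)) for p, r in variants)
    return total / math.log(overall_familial_rr)


# Hypothetical variants, not values from the paper.
example = [(0.10, 1.20), (0.30, 0.90), (0.05, 1.50)]
print(f"{100 * fraction_familial_risk_explained(example):.2f}% of familial risk")
```

Summing log λ_k treats the loci as independent and acting multiplicatively on risk; interactions or linkage disequilibrium between variants would change the result.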

Supplementary Material

Webappendix

Significance of this study.

What is already known about this subject?

  • Colorectal cancer (CRC) is one of the most commonly diagnosed cancers in the world.

  • Approximately 35% of CRC risk could be attributable to heritable factors.

  • Many studies have been conducted to evaluate associations between genetic variants in candidate genes and risk of CRC over the past two decades – with inconsistent results.

What are the new findings?

  • This study is the largest and most comprehensive assessment to date of the literature on genetic association studies of CRC risk.

  • Of the 267 variants evaluated, 62 variants in 50 candidate genes showed a nominally significant association (P<0.05) with CRC risk.

  • Eight variants in five genes showed strong cumulative evidence of association with CRC risk, and two variants in two genes showed moderate evidence.

  • This study provides clues for designing future studies to further investigate genetic risk factors for CRC.

How might it impact on clinical practice in the foreseeable future?

  • Genetic risk variants may be used to identify high-risk individuals for CRC screening and prevention.

Acknowledgments

We thank the authors of many original studies for clarification of data and providing additional information, and Mary Jo Daly for her help with manuscript preparation. This research is supported in part by NIH grant R37 CA070867 and Ingram Professorship funds.

The corresponding author has the right to grant on behalf of all authors and does grant on behalf of all authors, an exclusive licence (or nonexclusive for government employees) on a worldwide basis to the BMJ Publishing Group Ltd and its Licensees to permit this article (if accepted) to be published in Gut editions and any other BMJPGL products to exploit all subsidiary rights, as set out in our licence.

Abbreviations

CRC: colorectal cancer
GWAS: genome-wide association studies
MAF: minor allele frequency
HWE: Hardy-Weinberg equilibrium
ORs: odds ratios
CIs: confidence intervals
FPRP: false-positive report probability
APC: adenomatous polyposis coli
FAP: familial adenomatous polyposis
MAP: MUTYH-associated polyposis
HNPCC: hereditary nonpolyposis colon cancer
MSI-H: microsatellite instability high

Footnotes

Conflicts of interest: We declare no conflict of interest.

Contributors: X Ma and B Zhang conducted literature searches, data extraction, quality assessment and analyses. X Ma and B Zhang drafted the manuscript with substantial contributions from W Zheng. W Zheng reviewed results and provided guidelines for presentation and interpretation.

Reference List

1. Jemal A, Bray F, Center MM, et al. Global cancer statistics. CA Cancer J Clin. 2011;61(2):69–90. doi: 10.3322/caac.20107.
2. Lichtenstein P, Holm NV, Verkasalo PK, et al. Environmental and heritable factors in the causation of cancer--analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med. 2000;343(2):78–85. doi: 10.1056/NEJM200007133430201.
3. Jass JR. Familial colorectal cancer: pathology and molecular characteristics. Lancet Oncol. 2000;1:220–6. doi: 10.1016/s1470-2045(00)00152-2.
4. Cunningham D, Atkin W, Lenz HJ, et al. Colorectal cancer. Lancet. 2010;375(9719):1030–47. doi: 10.1016/S0140-6736(10)60353-4.
5. Tenesa A, Dunlop MG. New insights into the aetiology of colorectal cancer from genome-wide association studies. Nat Rev Genet. 2009;10(6):353–8. doi: 10.1038/nrg2574.
6. de la Chapelle A. Genetic predisposition to colorectal cancer. Nat Rev Cancer. 2004;4(10):769–80. doi: 10.1038/nrc1453.
7. Fearnhead NS, Britton MP, Bodmer WF. The ABC of APC. Hum Mol Genet. 2001;10(7):721–33. doi: 10.1093/hmg/10.7.721.
8. Kinzler KW, Vogelstein B. Lessons from hereditary colorectal cancer. Cell. 1996;87(2):159–70. doi: 10.1016/s0092-8674(00)81333-1.
9. van Lier MG, Wagner A, Mathus-Vliegen EM, et al. High cancer risk in Peutz-Jeghers syndrome: a systematic review and surveillance recommendations. Am J Gastroenterol. 2010;105(6):1258–64. doi: 10.1038/ajg.2009.725.
10. Nagy R, Sweet K, Eng C. Highly penetrant hereditary cancer syndromes. Oncogene. 2004;23(38):6445–70. doi: 10.1038/sj.onc.1207714.
11. Jasperson KW, Tuohy TM, Neklason DW, et al. Hereditary and familial colon cancer. Gastroenterology. 2010;138(6):2044–58. doi: 10.1053/j.gastro.2010.01.054.
12. Lubbe SJ, Di Bernardo MC, Chandler IP, et al. Clinical implications of the colorectal cancer risk associated with MUTYH mutation. J Clin Oncol. 2009;27(24):3975–80. doi: 10.1200/JCO.2008.21.6853.
13. Aaltonen L, Johns L, Jarvinen H, et al. Explaining the familial colorectal cancer risk associated with mismatch repair (MMR)-deficient and MMR-stable tumors. Clin Cancer Res. 2007;13(1):356–61. doi: 10.1158/1078-0432.CCR-06-1256.
14. Zanke BW, Greenwood CM, Rangrej J, et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat Genet. 2007;39(8):989–94. doi: 10.1038/ng2089.
15. Tomlinson I, Webb E, Carvajal-Carmona L, et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat Genet. 2007;39(8):984–8. doi: 10.1038/ng2085.
16. Broderick P, Carvajal-Carmona L, Pittman AM, et al. A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat Genet. 2007;39(11):1315–7. doi: 10.1038/ng.2007.18.
17. Jaeger E, Webb E, Howarth K, et al. Common genetic variants at the CRAC1 (HMPS) locus on chromosome 15q13.3 influence colorectal cancer risk. Nat Genet. 2008;40(1):26–8. doi: 10.1038/ng.2007.41.
18. Tenesa A, Farrington SM, Prendergast JG, et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat Genet. 2008;40(5):631–7. doi: 10.1038/ng.133.
19. Tomlinson IP, Webb E, Carvajal-Carmona L, et al. A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3. Nat Genet. 2008;40(5):623–30. doi: 10.1038/ng.111.
20. Houlston RS, Webb E, Broderick P, et al. Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nat Genet. 2008;40(12):1426–35. doi: 10.1038/ng.262.
21. Houlston RS, Cheadle J, Dobbins SE, et al. Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33. Nat Genet. 2010;42(11):973–7. doi: 10.1038/ng.670.
22. Dunlop MG, Dobbins SE, Farrington SM, et al. Common variation near CDKN1A, POLD3 and SHROOM2 influences colorectal cancer risk. Nat Genet. 2012;44(7):770–6. doi: 10.1038/ng.2293.
23. Cui R, Okada Y, Jang SG, et al. Common variant in 6q26-q27 is associated with distal colon cancer in an Asian population. Gut. 2011;60(6):799–805. doi: 10.1136/gut.2010.215947.
24. Jia WH, Zhang B, Matsuo K, et al. Genome-wide association analyses in East Asians identify new susceptibility loci for colorectal cancer. Nat Genet. 2013;45(2):191–6. doi: 10.1038/ng.2505.
25. Ioannidis JP, Boffetta P, Little J, et al. Assessment of cumulative evidence on genetic associations: interim guidelines. Int J Epidemiol. 2008;37(1):120–32. doi: 10.1093/ije/dym159.
26. Khoury MJ, Bertram L, Boffetta P, et al. Genome-wide association studies, field synopses, and the development of the knowledge base on genetic variation and human diseases. Am J Epidemiol. 2009;170(3):269–79. doi: 10.1093/aje/kwp119.
27. Bertram L, McQueen MB, Mullin K, et al. Systematic meta-analyses of Alzheimer disease genetic association studies: the AlzGene database. Nat Genet. 2007;39(1):17–23. doi: 10.1038/ng1934.
28. Allen NC, Bagade S, McQueen MB, et al. Systematic meta-analyses and field synopsis of genetic association studies in schizophrenia: the SzGene database. Nat Genet. 2008;40(7):827–34. doi: 10.1038/ng.171.
29. Zhang B, Beeghly-Fadiel A, Long J, et al. Genetic variants associated with breast-cancer risk: comprehensive research synopsis, meta-analysis, and epidemiological evidence. Lancet Oncol. 2011;12(5):477–88. doi: 10.1016/S1470-2045(11)70076-6.
30. Chatzinasiou F, Lill CM, Kypreou K, et al. Comprehensive field synopsis and systematic meta-analyses of genetic association studies in cutaneous melanoma. J Natl Cancer Inst. 2011;103(16):1227–35. doi: 10.1093/jnci/djr219.
31. Lill CM, Roehr JT, McQueen MB, et al. Comprehensive research synopsis and systematic meta-analyses in Parkinson's disease genetics: The PDGene database. PLoS Genet. 2012;8(3):e1002548. doi: 10.1371/journal.pgen.1002548.
32. Ioannidis JP, Ntzani EE, Trikalinos TA. 'Racial' differences in genetic effects for complex diseases. Nat Genet. 2004;36(12):1312–8. doi: 10.1038/ng1474.
33. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3):177–88. doi: 10.1016/0197-2456(86)90046-2.
34. Wigginton JE, Cutler DJ, Abecasis GR. A note on exact tests of Hardy-Weinberg equilibrium. Am J Hum Genet. 2005;76(5):887–93. doi: 10.1086/429864.
35. Skol AD, Scott LJ, Abecasis GR, et al. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet. 2006;38(2):209–13. doi: 10.1038/ng1706.
36. Lau J, Ioannidis JP, Schmid CH. Quantitative synthesis in systematic reviews. Ann Intern Med. 1997;127(9):820–6. doi: 10.7326/0003-4819-127-9-199711010-00008.
37. Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21(11):1539–58. doi: 10.1002/sim.1186.
38. Harbord RM, Egger M, Sterne JA. A modified test for small-study effects in meta-analyses of controlled trials with binary endpoints. Stat Med. 2006;25(20):3443–57. doi: 10.1002/sim.2380.
39. Ioannidis JP, Trikalinos TA. An exploratory test for an excess of significant findings. Clin Trials. 2007;4(3):245–53. doi: 10.1177/1740774507079441.
40. Wacholder S, Chanock S, Garcia-Closas M, et al. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J Natl Cancer Inst. 2004;96(6):434–42. doi: 10.1093/jnci/djh075.
41. Al-Tassan N, Chmiel NH, Maynard J, et al. Inherited variants of MYH associated with somatic G:C-->T:A mutations in colorectal tumors. Nat Genet. 2002;30(2):227–32. doi: 10.1038/ng828.
42. Theodoratou E, Montazeri Z, Hawken S, et al. Systematic meta-analyses and field synopsis of genetic association studies in colorectal cancer. J Natl Cancer Inst. 2012;104(19):1433–57. doi: 10.1093/jnci/djs369.
43. Fodde R, Smits R, Clevers H. APC, signal transduction and genetic instability in colorectal cancer. Nat Rev Cancer. 2001;1(1):55–67. doi: 10.1038/35094067.
44. Matsuoka S, Huang M, Elledge SJ. Linkage of ATM to cell cycle regulation by the Chk2 protein kinase. Science. 1998;282(5395):1893–7. doi: 10.1126/science.282.5395.1893.
45. Meijers-Heijboer H, van den Ouweland A, Klijn J, et al. Low-penetrance susceptibility to breast cancer due to CHEK2(*)1100delC in noncarriers of BRCA1 or BRCA2 mutations. Nat Genet. 2002;31(1):55–9. doi: 10.1038/ng879.
46. Kilpivaara O, Vahteristo P, Falck J, et al. CHEK2 variant I157T may be associated with increased breast cancer risk. Int J Cancer. 2004;111(4):543–7. doi: 10.1002/ijc.20299.
47. Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31(13):3812–4. doi: 10.1093/nar/gkg509.
48. Bronner CE, Baker SM, Morrison PT, et al. Mutation in the DNA mismatch repair gene homologue hMLH1 is associated with hereditary non-polyposis colon cancer. Nature. 1994;368(6468):258–61. doi: 10.1038/368258a0.
49. Goecke T, Schulmann K, Engel C, et al. Genotype-phenotype comparison of German MLH1 and MSH2 mutation carriers clinically affected with Lynch syndrome: a report by the German HNPCC Consortium. J Clin Oncol. 2006;24(26):4285–92. doi: 10.1200/JCO.2005.03.7333.
50. Robertson KD, Keyomarsi K, Gonzales FA, et al. Differential mRNA expression of the human DNA methyltransferases (DNMTs) 1, 3a and 3b during the G(0)/G(1) to S phase transition in normal and tumor cells. Nucleic Acids Res. 2000;28(10):2108–13. doi: 10.1093/nar/28.10.2108.
