Abstract
Most genome-wide association studies (GWAS) were analyzed using single marker tests in combination with stringent correction procedures for multiple testing. Thus, a substantial proportion of associated single nucleotide polymorphisms (SNPs) remained undetected and may account for missing heritability in complex traits. Model selection procedures present a powerful alternative to identify associated SNPs in high-dimensional settings. In this GWAS including 1060 colorectal cancer cases, 689 cases of advanced colorectal adenomas and 4367 controls we pursued a dual approach to investigate genome-wide associations with disease risk applying both, single marker analysis and model selection based on the modified Bayesian information criterion, mBIC2, implemented in the software package MOSGWA. For different case-control comparisons, we report models including between 1-14 candidate SNPs. A genome-wide significant association of rs17659990 (P=5.43×10-9, DOCK3, chromosome 3p21.2) with colorectal cancer risk was observed. Furthermore, 56 SNPs known to influence susceptibility to colorectal cancer and advanced adenoma were tested in a hypothesis-driven approach and several of them were found to be relevant in our Austrian cohort. After correction for multiple testing (α=8.9×10-4), the most significant associations were observed for SNPs rs10505477 (P=6.08×10-4) and rs6983267 (P=7.35×10-4) of CASC8, rs3802842 (P=8.98×10-5, COLCA1,2), and rs12953717 (P=4.64×10-4, SMAD7). All previously unreported SNPs demand replication in additional samples. Reanalysis of existing GWAS datasets using model selection as tool to detect SNPs associated with a complex trait may present a promising resource to identify further genetic risk variants not only for colorectal cancer.
Keywords: advanced colorectal adenomas, colorectal cancer, GWAS, model selection, MOSGWA
INTRODUCTION
Numerous genome-wide association studies (GWAS) in diverse complex diseases have uncovered hundreds of genetic risk factors by determining hundred thousands of single nucleotide polymorphisms (SNPs) in cohorts of thousands of individuals in a hypothesis-free approach. Although these findings provide valuable insights into the genetic architecture of common diseases they collectively account for a relatively small proportion of heritability [1].
Colorectal carcinogenesis is a complex multi-step process influenced by both, genetic and environmental risk factors. Only 5-10% [2] of all colorectal cancer (CRC) cases can be ascribed to hereditary syndromes and explained by rare but high-penetrant germline mutations. Another 30% of CRCs can be attributed to non-syndromic familial cases with increased familial risk but without evidence of predisposing mutations. The remaining CRCs evolve sporadically and are influenced by numerous genetic variants with low penetrance but of high prevalence in the population (>1%). This common disease-common variant hypothesis was formulated in the early days of GWAS, but was relativized when identified risk loci explained only a small fraction of genetic variance in complex traits. More refined concepts include the common disease-rare variant hypothesis [2], the infinitesimal and the broad sense heritability model (discussed in [3]).
GWAS of CRC conducted in European but also Asian populations have discovered so far more than 50 risk variants [4–29] mapping to 23 susceptibility loci. Although GWAS have successfully identified multiple associations of genetic variants with risk of CRC, collectively the CRC SNPs identified in European populations account only for 8% of familial CRC risk [30]. Additional rare risk variants still remain undetected and in part may account for the missing heritability of CRC.
Typically, GWAS aims at the identification of a relatively small set of SNPs associated with the investigated phenotype. SNPs exceeding a genome-wide significance threshold (P < 5×10-8) are tested for replication in independent samples. Inevitably, these necessarily stringent penalties for multiple testing have the consequence that a relatively large proportion of associated SNPs cannot be detected. Consequently, the majority of missing heritability may be due to SNPs with effects below the level of genome-wide significant associations [3].
The vast majority of GWAS have been analyzed via single marker analysis. One advantage of this approach is its computational inexpensiveness. However, this standard approach to analyze association with disease risk for each SNP individually assumes complete independence of the analyzed SNPs [31]. In contrast, genetic risk often can be explained as the influence of multiple SNPs mapping to various chromosomal regions resulting in a phenotype [32]. Furthermore, single marker tests cannot take into consideration the distinct correlation structure among SNPs caused by linkage disequilibrium (LD) and interaction effects [31]. Usually, individual effect sizes of SNPs are small, but collectively their impact on the phenotype can be substantial [32]. There are other weighty reasons for considering all genotyped SNPs simultaneously in analysis of GWAS. The predictive power of a single SNP is usually very low, but considering more disease relevant SNPs can improve the accuracy of prediction [33]. In the context of complex diseases multiple genes are involved in disease etiology, thus a joint analysis of multiple SNPs can be more informative and better reflect the relationship between genotype and phenotype than single SNP models [34].
A comprehensive overview of the advantages of model selection based approaches to analysis of GWAS is provided in Frommlet et al. 2016 [35], particularly addressing selection procedures based on modifications of the Bayesian information criterion (BIC) [36]. In high dimensional settings like GWAS where only a small number of SNPs is expected to be associated with disease (under sparsity), it has been shown repeatedly that BIC tends to select too large models. Various modifications of BIC have been proposed to solve this problem, among them mBIC2 [37, 38] which was designed to control the false discovery rate (FDR).
Here, we pursued a dual analysis strategy, reporting results from both single marker tests and MOSGWA [39], an implementation of a model selection procedure based on mBIC2. Genome-wide SNP data of 1060 CRC cases, 689 patients with advanced colorectal adenomas and 4367 controls were analyzed presenting the first GWAS of CRC in an Austrian population.
RESULTS
Downstream analysis was performed for 492,217 SNPs using the software package MOSGWA. Additionally, results from single marker analysis via PLINK are reported using Cochran Armitage trend test (CAT) as well as univariate logistic regression models including the first four principle components as covariates to account for population structure.
Our study population consisted of four different case and control groups, CRC cases (A), advanced adenomas (B), colonoscopy-negative CORSA controls (C) and KORA controls (D) (Table 1). Further clinical characteristics of CRC cases and advanced adenomas are provided in Supplementary Table 1. Specifically, we report the following four case-control comparisons: A vs. C, A vs. CD, AB vs. CD and B vs CD (Table 2).
Table 1. Study population.
TotalPre-QC | TotalPost-QC (%) | Male (%) | Female (%) | Mean age ± SD [y] | |
---|---|---|---|---|---|
CRC (A) | 1060 | 978 (100.0) | 584 (59.7) | 394 (40.3) | 63.5 ± 12.0 |
AA (B) | 689 | 636 (100.0) | 428 (67.3) | 208 (32.7) | 64.5 ± 10.3 |
ControlCORSA (C) | 928 | 855 (100.0) | 496 (58.0) | 359 (42.0) | 65.1 ± 11.8 |
ControlKORA (D) | 3439 | 3439 (100.0) | 1690 (49.1) | 1749 (50.9) | 53.8 ± 14.0 |
Total | 6116 | 5908 | 3198 | 2710 | 58.2 ± 13.9 |
CRC Colorectal cancer cases.
AA Advanced adenomas.
Table 2. Single marker tests and model selection.
SNP | Chromosome | Gene | OR (Logistic) | P (Logistic) | Rank (Logistic) | OR (Model) | P (SM) |
---|---|---|---|---|---|---|---|
A vs. C (978 cases vs. 855 controls) | |||||||
rs1912804 | 16q23.1 | WWOX | 1.69 | 3.39E-07 | 1 | 1.70 | 1.96E-07 |
rs9583269 | 13q33.3 | MYO16 | 0.69 | 6.19E-07 | 2 | 0.69 | 1.24E-06 |
rs10495672 | 2p24.2 | KCNS3 | 1.43 | 3.23E-06 | 7 | 1.46 | 1.95E-06 |
A vs. CD (978 cases vs. 4294 controls) | |||||||
rs17659990 | 3p21.2 | DOCK3 | 1.93 | 1.35E-07 | 1 | 1.98 | 1.59E-08 |
rs694339 | 18q22.3 | CBLN2 | 1.97 | 1.41E-07 | 2 | 1.99 | 7.89E-07 |
rs12916300 | 15q13.1 | HERC2 | 1.35 | 3.75E-07 | 4 | 1.35 | 1.37E-04 |
rs16845107 | 3q13.2 | WDR52 | 0.52 | 5.94E-07 | 6 | 0.52 | 1.49E-06 |
rs11927424 | 3p11.1 | C3orf38 | 1.31 | 1.17E-06 | 11 | 1.32 | 1.07E-07 |
rs16869961 | 4p15.31 | KCNIP4 | 0.75 | 1.51E-06 | 13 | 0.74 | 1.25E-04 |
rs7774435 | 6p21.32 | HLA-DQA2 | - | - | - | 1.46 | 3.52E-04 |
AB vs. CD (1614 cases vs. 4294 controls) | |||||||
rs17659990 | 3p21.2 | DOCK3 | 1.88 | 5.43E-09 | 1 | 1.96 | 2.94E-10 |
rs7742915 | 6p21.2 | BTBD9 | 1.31 | 8.52E-08 | 2 | 1.32 | 1.44E-06 |
rs16944613 | 15q26.1 | CRTC3 | 1.32 | 1.49E-07 | 4 | 1.31 | 3.71E-07 |
rs13129679 | 4p16.3 | RNF4 | 2.30 | 2.38E-07 | 5 | 2.42 | 8.82E-07 |
rs12953717 | 18q21.1 | SMAD7 | 1.26 | 3.00E-07 | 6 | 1.27 | 6.68E-08 |
rs742223 | 6p24.1 | TMEM170B | 0.60 | 5.47E-07 | 9 | 0.59 | 9.49E-08 |
rs2184857 | 1q43 | CHRM3 | 0.79 | 7.10E-07 | 11 | 0.78 | 3.35E-09 |
rs4954585 | 2q22.1 | CXCR4 | 1.26 | 9.84E-07 | 12 | 1.26 | 2.29E-08 |
rs7942260 | 11q21 | PIWIL4 | 0.69 | 7.28E-06 | 30 | 0.67 | 1.84E-06 |
rs7221059 | 17q25.2 | LINC00338 | 0.76 | 1.02E-05 | 45 | 0.74 | 2.92E-05 |
rs4361767 | 8p23.1 | LOC157273 | 0.77 | 1.31E-05 | 62 | 0.75 | 6.91E-05 |
rs340145 | 3q13.2 | TMPRSS7 | 0.82 | 3.91E-05 | 142 | 0.79 | 1.45E-04 |
rs7774435 | 6p21.32 | HLA-DQA2 | - | - | - | 1.65 | 5.82E-05 |
rs3130954 | 6p21.33 | HCG27 | - | - | - | 1.79 | 1.04E-04 |
B vs. CD (636 cases vs. 4294 controls) | |||||||
rs7944251 | 11q14.3 | FAT3 | 0.66 | 3.97E-07 | 2 | 0.66 | 1.27E-08 |
A CRC cases (CORSA).
B Advanced adenomas (CORSA).
C Controls (CORSA).
D Controls (KORA).
OR (Model) Odds ratio based on the coefficients of the model selected by MOSGWA.
P (SM) Single marker test P-value (Cochran Armitage trend test).
OR (Logistic) Odds ratio based on univariate logistic model.
P (Logistic) P-value of univariate logistic model.
Rank (Logistic) Rank of the SNP in the top SNP list of P (Logistic) sorted by P-value.
- HLA region excluded from logistic models.
Table 2 provides for each of the four comparisons some basic information and odds ratios for those SNPs corresponding to the model selected by MOSGWA. Additionally, P-values from CAT test, odds ratios, and P-values based on the univariate logistic model as well as the corresponding rank of each SNP according to the logistic model are presented. A list of the 200 top ranking SNPs for each contrast is provided as Supplementary Materials (Supplementary Table 2).
A vs. C
For the comparison A vs. C, considering only Austrian cases and controls, MOSGWA selected a model of size three including SNPs rs1912804, rs9583269, and rs10495672. The best SNP rs1912804 has a marginal P-value of 3.39×10-7 that is not significant at the commonly adopted genome-wide significance level α=5.0×10-8. The three selected SNPs are among the top seven single marker SNPs (ranks 1, 2 and 7).
A vs. CD
Adding KORA controls increased power to detect associated SNPs. Accordingly, the comparison A vs. CD yielded a model containing seven SNPs including the top SNP rs17659990 (P=1.35×10-7; DOCK3).
AB vs. CD
For the joint analysis of CRC and advanced adenomas versus all controls, AB vs. CD, MOSGWA selected 14 SNPs, including rs17659990 (P=5.43×10-9, DOCK3) that reached the generally accepted level of genome-wide significance, followed by borderline significant rs7742915 (P=8.52×10-8, BTBD9), rs16944613 (P=1.49×10-7, CRTC3), rs13129679 (P=2.38×10-7, RNF4) and rs12953717 (P=3.00×10-7, SMAD7), a well-known CRC susceptibility variant.
B vs. CD
For the comparison of advanced adenomas against the combined control (B vs. CD) MOSGWA identified one SNP on 11q14 (rs7944251, P=3.97×10-7, FAT3). Using only CORSA controls (B vs. C) there was not sufficient power to detect any SNP and MOSGWA selected the null model.
Genotype distributions of 56 CRC or colorectal adenoma susceptibility SNPs previously identified by GWAS were analyzed in the present genome-wide data set. Uncorrected P-values for all calculated case-control comparisons are provided in Table 3. For CRC SNPs, not covered by Axiom array, the distance of the proxy SNP from the original CRC SNP is provided in base pairs. P-values below 0.05 are given in bold, P-values below Bonferroni corrected significance level α=8.9×10-4 are given in bold and are underlined. Several SNPs previously identified by CRC GWAS exhibit significantly different genotype distributions in cases and controls. The strongest associations were found for rs12953717 of SMAD7 on chromosome arm 18q21.1 (P(Avs.C)=4.64×10-4, P(Avs.CD)=2.83×10-5, P(ABvs.CD)=8.64×10-6). Significant associations were also observed for the SMAD7 SNP rs4939827 (P(Avs.CD)=4.03×10-4, P(ABvs.CD)=1.53×10-4) and the RHPN2 SNP rs10411210 on 19q13.11 (P(Avs.CD)=3.28×10-4). Several SNPs of the well-known CRC susceptibility loci on chromosome 8 showed differentially distributed genotypes, among them rs16892766 on 8q23.3 (P(Avs.CD)=5.48×10-4, EIF3H), rs10505477 on 8q24.21 (P(Avs.C)=6.08×10-4, CASC8), and rs6983267 also located on 8q24.21 (P(Avs.C)=7.35×10-4, MYC). Also rs3802842 on chromosome 11q23.1 showed significant associations with CRC risk across different comparisons (P(Avs.C)=8.98×10-5, P(Avs.CD)=8.62×10-5, P(ABvs.CD)=1.86×10-5, COLCA1,2).
Table 3. Associations of CRC susceptibility SNPs identified by preceding GWAS.
SNP | Chr. | Gene | Ref. | Distance | P_AvC | P_AvCD | P_ABvC | P_ABvCD | P_BvC | P_BvCD |
---|---|---|---|---|---|---|---|---|---|---|
rs10911251 | 1q25.3 | LAMC1 | 23 | 3492 | 2.03E-01 | 5.59E-01 | 2.26E-01 | 6.08E-01 | 3.38E-01 | 8.03E-01 |
rs6687758 | 1q41 | DUSP10 | 13 | 3769 | 2.40E-01 | 5.32E-01 | 4.24E-01 | 2.57E-01 | 8.29E-01 | 3.59E-01 |
rs6691170 | 1q41 | DUSP10 | 13 | 265 | 2.29E-01 | 1.19E-01 | 3.96E-01 | 3.40E-01 | 9.64E-01 | 9.23E-01 |
rs2373859 | 2p22.1 | SLC8A1 | 20 | 797 | 4.48E-01 | 9.68E-01 | 2.75E-01 | 8.66E-01 | 1.98E-01 | 5.54E-01 |
rs11903757 | 2q32.3 | NABP1/SDPR | 23 | 4909 | 7.26E-01 | 2.92E-01 | 9.02E-01 | 2.66E-01 | 5.29E-01 | 5.34E-01 |
rs10936599 | 3q26.2 | TERC | 13 | 8296 | 5.38E-01 | 8.38E-01 | 7.26E-01 | 9.83E-01 | 9.48E-01 | 5.98E-01 |
rs35509282 | 4q32.2 | FSTL5 | 27 | 1030 | 9.40E-01 | 7.64E-01 | 9.20E-01 | 8.85E-01 | 9.35E-01 | 9.10E-01 |
rs275454 | 5p15.31 | PAPD7 | 20 | 0 | 4.71E-01 | 2.87E-01 | 6.38E-01 | 5.04E-01 | 9.34E-01 | 9.80E-01 |
rs2853668 | 5p15.33 | TERT | 20 | 0 | 5.61E-01 | 8.88E-01 | 4.40E-01 | 8.84E-01 | 5.15E-01 | 8.38E-01 |
rs647161 | 5q31.1 | PITX1/H2AFY | 22 | 0 | 3.80E-03 | 6.79E-02 | 6.78E-03 | 8.45E-02 | 7.56E-02 | 3.31E-01 |
rs1321311 | 6p21.2 | SRSF3/CDKN1A | 19 | 1541 | 2.62E-01 | 9.68E-01 | 1.88E-01 | 7.48E-01 | 3.09E-01 | 5.60E-01 |
rs1525461 | 7q35 | TPK1 | 20 | 3217 | 4.43E-01 | 5.59E-01 | 2.97E-01 | 2.87E-01 | 2.33E-01 | 1.84E-01 |
rs16888522 | 8q23.3 | EIF3H | 20 | 1580 | 8.15E-02 | 7.24E-02 | 2.27E-01 | 2.52E-01 | 9.43E-01 | 6.36E-01 |
rs16892766* | 8q23.3 | TRPS1/EIF3H/UTP23 | 10 | 0 | 7.75E-03 | 5.48E-04 | 3.67E-02 | 3.23E-03 | 4.64E-01 | 5.74E-01 |
rs10505477 | 8q24.21 | CASC8 | 25 | 0 | 6.08E-04 | 3.48E-03 | 5.44E-03 | 5.10E-02 | 3.38E-01 | 8.22E-01 |
rs10808555 | 8q24.21 | CASC8, MYC | 11 | 0 | 3.20E-03 | 2.08E-02 | 1.22E-02 | 1.28E-01 | 2.54E-01 | 9.78E-01 |
rs6983267* | 8q24.21 | CASC8, MYC | 4 | 0 | 7.35E-04 | 3.03E-03 | 5.10E-03 | 4.36E-02 | 3.03E-01 | 9.34E-01 |
rs7014346 | 8q24.21 | CASC8 | 9 | 0 | 2.31E-03 | 1.26E-02 | 4.42E-03 | 3.91E-02 | 9.48E-02 | 5.69E-01 |
rs7837328 | 8q24.21 | CASC8 | 11 | 214 | 7.89E-03 | 9.95E-02 | 4.86E-03 | 1.43E-01 | 4.49E-02 | 6.62E-01 |
rs719725 | 9p24.1 | TPD52L3/UHRF2/GLDC | 6 | 34073 | 3.07E-01 | 5.52E-01 | 2.16E-01 | 3.59E-01 | 2.57E-01 | 2.28E-01 |
rs10795668 | 10p14 | KRT8P16/TCEB1P3 | 10 | 0 | 3.26E-01 | 2.32E-01 | 1.56E-01 | 1.40E-01 | 1.65E-01 | 2.58E-01 |
rs704017 | 10q23.2 | ZMIZ1-AS1 | 29 | 10425 | 6.97E-02 | 2.32E-02 | 1.65E-01 | 2.08E-01 | 9.04E-01 | 7.31E-01 |
rs1035209 | 10q24.2 | ABCC2/MRP2 | 26 | 0 | 4.37E-01 | 5.15E-01 | 7.78E-01 | 9.56E-01 | 6.37E-01 | 4.96E-01 |
rs11196172 | 10q25.2 | TCF7L2 | 29 | 224 | 8.12E-01 | 6.13E-01 | 5.86E-01 | 7.73E-01 | 2.26E-01 | 7.71E-01 |
rs12241008 | 10q25.2 | VTI1A | 28 | 513 | 3.40E-01 | 5.47E-01 | 6.91E-01 | 8.73E-01 | 4.73E-01 | 1.92E-01 |
rs1665650 | 10q26.2 | HSPA12A | 22 | 1647 | 8.07E-01 | 5.09E-01 | 5.57E-01 | 8.96E-01 | 3.95E-01 | 8.20E-01 |
rs1535 | 11q12.2 | FADS2 | 29 | 7243 | 2.21E-01 | 2.31E-02 | 4.43E-01 | 3.77E-02 | 9.01E-01 | 4.78E-01 |
rs174550 | 11q12.2 | FADS1 | 29 | 96 | 2.15E-01 | 2.18E-02 | 3.55E-01 | 3.99E-02 | 9.99E-01 | 4.17E-01 |
rs4246215 | 11q12.2 | FEN1 | 17 | 5531 | 2.13E-01 | 3.07E-02 | 3.40E-01 | 5.52E-02 | 9.63E-01 | 4.76E-01 |
rs3824999 | 11q13.4 | POLD3 | 19 | 1383 | 1.26E-01 | 2.32E-02 | 2.23E-01 | 5.58E-02 | 6.20E-01 | 4.41E-01 |
rs3802842* | 11q23.1 | COLCA1,2 | 7 | 0 | 8.98E-05 | 8.62E-05 | 1.11E-04 | 1.86E-05 | 4.85E-03 | 2.91E-03 |
rs10849432 | 12p13.31 | CD9 | 29 | 1952 | 6.48E-01 | 7.73E-01 | 9.37E-01 | 3.89E-01 | 6.47E-01 | 2.32E-01 |
rs10774214 | 12p13.32 | CCND2 | 22 | 1816 | 3.85E-02 | 4.02E-02 | 6.87E-02 | 3.68E-02 | 3.37E-01 | 4.48E-01 |
rs3217810 | 12p13.32 | CCND2 | 23 | 887 | 4.27E-01 | 1.78E-01 | 2.74E-01 | 2.81E-02 | 2.94E-01 | 6.89E-02 |
rs3217901 | 12p13.32 | CCND2 | 23 | 0 | 3.20E-01 | 3.11E-01 | 2.41E-01 | 2.18E-01 | 1.65E-01 | 3.00E-01 |
rs11169552 | 12q13.12 | ATF1 | 13 | 0 | 5.08E-01 | 3.61E-01 | 2.72E-01 | 4.37E-02 | 2.05E-01 | 4.71E-02 |
rs7136702 | 12q13.12 | LARP4/DIP2B | 13 | 1753 | 7.08E-01 | 5.18E-01 | 5.40E-01 | 2.69E-01 | 3.91E-01 | 2.14E-01 |
rs59336 | 12q24.21 | TBX3 | 23 | 1817 | 5.48E-01 | 6.78E-01 | 7.01E-01 | 7.19E-01 | 9.24E-01 | 7.89E-01 |
rs7315438 | 12q24.21 | TBX3 | 20 | 481 | 1.47E-01 | 2.54E-01 | 2.78E-01 | 1.82E-01 | 5.32E-01 | 4.60E-01 |
rs1957636 | 14q22.2 | BMP4/ATP5C1P1/CDKN3/MIR5580 | 16 | 3869 | 5.53E-01 | 1.18E-01 | 9.52E-01 | 3.92E-01 | 4.20E-01 | 7.87E-01 |
rs4444235* | 14q22.2 | BMP4/ATP5C1P1/CDKN3/MIR5580 | 7 | 0 | 6.22E-01 | 3.63E-01 | 9.03E-01 | 5.40E-01 | 4.58E-01 | 7.19E-01 |
rs11632715 | 15q13.3 | SCG5, GREM1, FMN1 | 16 | 989 | 5.01E-01 | 3.09E-01 | 3.03E-01 | 1.58E-01 | 2.61E-01 | 3.17E-01 |
rs16969681 | 15q13.3 | SCG5, GREM1, FMN1 | 16 | 0 | 5.25E-01 | 2.82E-01 | 5.79E-01 | 2.46E-01 | 8.45E-01 | 3.70E-01 |
rs4779584 | 15q13.3 | SCG5, GREM1, FMN1 | 7 | 0 | 7.37E-02 | 1.03E-02 | 8.98E-02 | 8.61E-03 | 2.63E-01 | 5.18E-02 |
rs9929218 | 16q22.1 | CDH1/ZFP90 | 7 | 0 | 7.72E-01 | 4.05E-01 | 5.89E-01 | 8.57E-01 | 1.52E-01 | 1.06E-01 |
rs12603526 | 17p13.3 | NXN | 29 | 0 | 2.93E-01 | 1.85E-01 | 1.14E-01 | 4.43E-02 | 1.03E-01 | 6.63E-02 |
rs12953717 | 18q21.1 | SMAD7 | 5 | 0 | 4.64E-04 | 2.83E-05 | 4.55E-04 | 8.64E-06 | 3.21E-02 | 5.04E-03 |
rs4464148 | 18q21.1 | SMAD7 | 5 | 82 | 6.75E-02 | 1.80E-01 | 3.80E-02 | 1.11E-01 | 1.08E-01 | 3.22E-01 |
rs4939827* | 18q21.1 | SMAD7 | 7 | 0 | 8.37E-03 | 4.03E-04 | 9.92E-03 | 1.53E-04 | 1.31E-01 | 1.42E-02 |
rs7229639 | 18q21.1 | SMAD7 | 25 | 170 | 2.69E-01 | 9.11E-02 | 1.36E-01 | 1.90E-02 | 1.96E-01 | 4.29E-02 |
rs10411210 | 19q13.11 | RHPN2 | 7 | 0 | 3.94E-03 | 3.28E-04 | 2.07E-02 | 2.66E-03 | 4.64E-01 | 2.91E-01 |
rs2241714 | 19q13.2 | TGFB1, B9D2 | 21 | 12506 | 6.30E-01 | 8.95E-01 | 6.57E-01 | 9.93E-01 | 8.08E-01 | 8.30E-01 |
rs2423279 | 20p12.3 | BMP2/HAO1/FERMT1 | 22 | 10815 | 8.46E-01 | 4.13E-01 | 7.65E-01 | 8.11E-01 | 3.03E-01 | 5.80E-01 |
rs4813802 | 20p12.3 | BMP2/HAO1/FERMT1 | 16 | 0 | 4.40E-02 | 3.49E-02 | 5.59E-02 | 1.88E-02 | 2.09E-01 | 1.48E-01 |
rs961253 | 20p12.3 | BMP2/HAO1/FERMT1 | 7 | 0 | 1.71E-01 | 1.12E-01 | 6.02E-01 | 4.29E-01 | 4.78E-01 | 3.46E-01 |
rs4925386 | 20q13.33 | LAMA5 | 13 | 53263 | 9.93E-02 | 9.76E-02 | 1.25E-02 | 4.64E-03 | 7.60E-03 | 1.11E-03 |
P-values are uncorrected and P-values <0.05 (5.00E-02) are given in bold.
P-values <0.00089 (8.90E-04) are given in bold and are underlined.
Rs number followed by * indicates CRC SNP with experimentally confirmed functional relevance [52].
Several SNPs previously associated not only with risk of CRC but also with risk of colorectal adenoma exhibited borderline significant P-values in comparisons B vs. C and B vs. CD (rs7837328, P(Bvs.C)=4.49×10-2; rs3802842, P(Bvs.C)=4.85×10-3, P(Bvs.CD)=2.91×10-3; rs4939827, P(Bvs.CD)=1.42×10-2; rs4925386, P(Bvs.C)=7.60×10-3, P(Bvs.CD)=1.11×10-3).
DISCUSSION
Most published GWAS are based on single marker analysis in combination with correction for multiple testing, a strategy which has been shown to suffer both from unnecessarily low power and a relatively high risk of false positive detections in case of complex traits [38]. Reduced statistical power reflects one aspect of missing heritability in GWAS [1]. Simulation studies based on real SNP data provided evidence that model selection strategies may outperform multiple testing in detecting causal SNPs [39] while controlling the type I error rate of false detections and therefore, should be used to complement (standard) analysis of GWAS.
We performed – to our best knowledge – the first GWAS of CRC in an Austrian cohort including 1060 CRC cases, 689 patients with advanced colorectal adenomas, 928 colonoscopy-negative controls, and additional genotype data of 3439 population-based KORA controls from southern Germany. Model selection analysis was based on MOSGWA [39], a bioinformatical tool for analysis of GWAS using the FDR controlling modification of BIC, mBIC2, which has been shown to have certain optimality properties with respect to the number of missclassifications. Due to its fixed selection criterion, MOSGWA requires no parameter tuning like LASSO-based approaches [40]. In simulation studies [39], MOSGWA exceeded the performance of competing approaches and when re-analyzing data of complex diseases from the Wellcome Trust Case-Control Consortium [41] several SNPs could be identified, which were not detected by other algorithms, but were later confirmed by independent studies [39].
In this study, MOSGWA selected models for different case-control comparisons, including between one and 14 SNPs. The theoretically well-founded advantage of the model selection approach is its larger power to detect candidate SNPs compared to single marker tests while at the same time strictly controlling the false discovery rate. Among all four studied contrasts, single marker tests yielded only one significant SNP (rs17659990, P=5.43×10-9, DOCK3) at the usually recommended genome-wide significance level for the comparison AB vs. CD when considering the entire study population. Rs17659990 is an intronic variant of dedicator of cytokinesis 3 (DOCK3) gene, a gene specifically expressed in the central nervous system, that was associated with an attention deficit hyperactivity disorder-like phenotype [42]. DOCK3, also referred to as modifier of cell adhesion (MOCA), was also shown to be an inhibitor of Wnt/beta-catenin signaling [43], a pathway known to play an important role in colorectal carcinogenesis [44]. Moreover, multiple studies reported DOCK3 to be implicated in cancer cell invasion and migration (as recently reviewed [45]). The SNP rs17659990 was also included in the model A vs. CD (model size 7).
For the comparison AB vs. CD, MOSGWA selected a model including 14 SNPs, including apart from rs17659990 another borderline significant SNP (rs7742915, P=8.52×10-8, BTBD9). Rs7742915 of BTB domain containing 9 (BTBD9) gene, a locus encoding a BTB/POZ domain-containing protein, is involved in protein-protein interactions. Genetic variation of BTBD9 was associated with susceptibility to Restless Legs Syndrome [46]. Aside from rs17659990 and rs7742915, further 12 variants with marginal P-values (P>5.0×10-8) were selected for AB vs. CD comparison including rs16944613 (P=1.49×10-7, CRTC3), rs13129679 (P=2.38×10-7, RNF4), and rs12953717 (P=3.00×10-7, SMAD7). Rs12953717 located in intron 3 of SMAD7 gene has been previously linked to CRC risk by two GWAS [5, 9] and was subsequently confirmed as CRC susceptibility variant [47, 48] as recently discussed by Stolfi et al. [49]. SMAD7 is a negative regulator of transforming growth factor-β signaling. Depending on single marker tests only, SMAD7 rs12953717 may not have been regarded as a candidate SNP in our study.
Interestingly, rs1912804 of WW domain-containing oxidoreductase (WWOX) gene emerged in this study of CRC (A vs. C). Defects in this tumor suppressor gene were associated with multiple cancers [50] and altered WWOX expression was observed in tissues of CRC [51]. Recently, WWOX was shown to be involved in double-strand break repair [50]. Although defects in mismatch repair (MMR) genes influence both, hereditary and sporadic CRCs (recently reviewed [52]), no CRC risk SNPs annotating to MMR genes were identified by GWAS thus far.
In this study, we used model selection as a tool to detect SNPs associated with CRC, not aiming at the identification of a model which can be used later for prediction. Therefore, we do not provide model coefficients obtained by MOSGWA but only report the detected SNPs. This is crucial to understand the principle and function of model selection as tool for analysis of GWAS. Considering the identification of disease associated SNPs as a high-dimensional classification problem, SNPs can be classified as either associated or not associated with the trait. Theoretical results showed that performing model selection using the FDR controlling mBIC2 selection criterion yields a classification procedure which asymptotically minimizes the misclassification rate. The expected proportion of false positive SNPs is controlled at a level which decreases with sample size and which will be for this study below 5%. Therefore, about one or two false positive detections can be expected among the reported 14 SNPs in model AB vs. CD.
CRC SNPs identified by preceding GWAS were tested in a hypothesis-driven approach and a number of these SNPs exhibited relevant differences between cases and controls in our data set. Several risk variants were replicated in this study for the first time in the Austrian population. The strongest associations were observed for SNPs annotating to the following genes: SMAD7, RHPN2, EIF3H, CASC8, MYC, and COLCA1,2. Functional relevance was experimentally confirmed for only five common CRC risk loci [52]. Four of them (rs16892766, EIF3H; rs6983267, MYC; rs3802842, COLCA2 and rs4939827, SMAD7) also play a role in our study population.
Sporadic CRCs usually arise from premalignant lesions (adenoma-carcinoma sequence), thus high-risk adenomas impact CRC risk [53, 54]. Removal of advanced adenomas during colonoscopy reduces mortality from CRC [55]. We included advanced colorectal adenomas into this study because these precursors are important targets for CRC prevention. Previously unreported rs7944251 of FAT tumor suppressor homolog 3 (FAT3) was associated with reduced risk of advanced adenoma (OR=0.66, P=3.97×10-7) and the SNP was also selected when comparing advanced adenomas with the combined control group (B vs. CD). All previously unreported candidate SNPs demand replication in independent CRC cohorts.
A strength of this study is the dual approach to analyze genotype distributions in a genome-wide SNP dataset including CRC cases, advanced adenomas and controls. CORSA controls (C) received a complete colonoscopy within B-PREDICT screening and were known to be free of colorectal polyps and CRC. Sometimes, these colonoscopy-negative controls are also referred to as “super-controls” [12]. A recent study indicated that exclusion of controls with a family history of CRC and of controls with record of colorectal adenomas can increase power [56]. To our knowledge, this is the first GWAS of CRC investigating Austrian CRC cases and premalignant colorectal tumors. However, limitations of the study are the limited sample size, especially in the subgroup of advanced adenomas as well as limited availability of environmental data of CRC cases impeding stratification analysis for environmental risk factors. To increase statistical power, individual level genotype data of additional controls (KORA) were included in the study. Because CORSA recruitment is ongoing, further Austrian CRC cases will be genotyped and integrated into the analysis to investigate population specific SNP signatures of CRC risk. Meta-analysis of GWAS present a powerful strategy to enhance the power of identifying weak genetic associations with disease phenotype, but is often complicated by between-study heterogeneity. Precision gained by combination of datasets may be spurious due to different study designs, divergent LD structures, different patterns of correlated phenotypes or dissimilar gene-environment interactions across populations [57, 58].
The application of CRC SNP signatures to improve screening decisions is presently impeded by the fact that single risk variants account only for little heritability and thereby explain a small increment of risk. We hypothesize that potentially disease relevant variants not reaching genome-wide significance may explain a substantial part of missing heritability and are worth exploration and follow-up. Also epigenetic alterations play an important role in colorectal carcinogenesis [59]. The combination of genetic and epigenetic biomarkers to a multi-marker panel considering also environmental risk factors could be suited to complement present screening strategies and for instance be applied after a positive fecal occult blood test, but prior to an invasive colonoscopy. Genetic risk variants are ideal candidates for the development of minimal-invasive and cost-effective biomarker tests enabling personal risk profiling. In the near future, management of CRC will increasingly focus on personalized screening and treatment strategies aiming at early detection and prevention of disease. A combination of single marker tests and model selection in high dimensions may facilitate the identification of marker candidates otherwise not detected due to stringent penalties for multiple testing.
MATERIALS AND METHODS
Study population
In this GWAS, 2677 individuals of our ongoing Colorectal Cancer Study of Austria (CORSA) [60, 61] were genotyped including 1060 CRC cases, 689 patients with advanced adenomas and 928 colonoscopy-negative controls. CRC cases were patients with histologically confirmed, sporadic CRC. CRC cases with clinical record of inflammatory bowel disease (IBD) were excluded from the study. Advanced adenomas included adenomatous villous, adenomatous tubulovillous and tubular polyps larger than 1cm in diameter. All controls received a complete colonoscopy and exhibited no pathological findings.
From June 2003 to November 2012 CORSA participants had been recruited in four hospitals in the province Burgenland (Oberpullendorf, Kittsee, Oberwart and Güssing), Austria, at the Medical University of Vienna (Department of Surgery), and the Medical University of Graz (Department of Internal Medicine).
To augment statistical power, individual level genotype data of 3439 additional control individuals from the German “Cooperative Health Research in the Region of Augsburg” (KORA) platform were included in this study [62]. Population-based controls from the studies S4 and F4 were integrated. To ensure exclusion of CRC cases from the KORA control set, all individuals with evidence of malignant diseases were removed from the dataset. In total, 6116 individuals (1749 colorectal tumors and 4367 controls) were included in this study.
Ethics statement
Written informed consent was obtained from all participants of CORSA. The study was approved by the ethical review committee of the Medical University of Vienna (MUW, EK Nr. 703/2010) and the “Ethikkommission Burgenland” (KRAGES, 33/2010). Conduct of the study followed the approved study protocol and all methods were performed in accordance with the relevant guidelines and regulations. Approval for the use of KORA data was obtained from the KORA-Study Group (K072/13).
Genotyping
Genomic DNA was purified from peripheral blood following the QIAamp DNA Blood Midi Spin Protocol (QIAGEN, Valencia, CA). Genotyping was performed using population-optimized Axiom Genome-Wide CEU 1 Arrays (Affymetrix, Santa Clara, CA) analyzing 587,532 SNPs. Array processing was performed at the Institute of Human Genetics, Helmholtz Center Munich. KORA samples were genotyped on the same array type.
Statistical analysis
Extensive quality control and genotype calling was performed with Affymetrix Genotyping Console Software 4.1.3.840 (www.affymetrix.com). 2469 genotyped CORSA subjects survived QC filtering (Dish QC >0.82, call rate >97.5%).
Inclusion criteria for SNPs eligible for downstream analysis were a minor allele frequency (MAF) >1%, Hardy-Weinberg equilibrium (HWE) P-value cut-off >1.00×10-8, a SNP call rate >97.5%, and >95% calls per individual. 271 SNPs were discarded due to showing significant difference between the CORSA and KORA control group (P-values smaller than 1.00×10-7 in a simple Fisher exact test comparing controls as suggested in [63]). After filtering, 492,217 SNPs remained for which imputation of missing genotypes was performed using Beagle software v.4.0 r1274 [64].
The primary aim of the study was to find SNPs which are associated with CRC or with advanced adenomas, respectively. To this end we performed traditional single marker based analysis as well as a more involved model selection based approach. Single marker analysis was performed with PLINK 1.9 beta 3 (www.cog-genomics.org/plink2) [65]. We report P-values of CAT as well as from a logistic regression model including the factors age and the leading four principal components from a principal component analysis (PCA) which was used to adjust for population structure [66]. A PCA plot of the first four principal components plotted against each other is provided in Supplementary Figure 1. Genotype cluster plots of all reported SNPs underwent visual inspection.
For model selection analysis, the software package MOSGWA was applied (http://mosgwa.sourceforge.net) [39] using multi-marker logistic regression models including again the factors age and the leading four principal components as covariates which were not under selection. In addition to the genome-wide analysis we inspected specifically 56 SNPs which were previously reported in the GWAS literature to be involved in colorectal carcinogenesis. For SNPs not represented on the array, suitable proxies were identified and tested.
SUPPLEMENTARY MATERIALS FIGURES AND TABLES
Acknowledgments
We thank the Biobank Graz of the Medical University of Graz for contribution of samples.
We thank Peter Lichtner and Gertrud Eckstein (Helmholtz Center Munich, Germany) for genotyping.
We kindly thank Azita Deutinger-Permoon (KRAGES, Austria) and her co-workers for supporting CORSA recruitment.
Footnotes
CONFLICTS OF INTEREST
The authors declare no potential conflict of interest.
FUNDING
This study was funded by FFG BRIDGE (grant 829675, to Andrea Gsur), the “Herzfelder’sche Familienstiftung” (grant to Andrea Gsur) and was supported by COST Action BM1206.
The KORA research platform (KORA, Cooperative Research in the Region of Augsburg) was initiated and financed by the Helmholtz Zentrum München - German Research Center for Environmental Health, which is funded by the German Federal Ministry of Education and Research and by the State of Bavaria. Furthermore, KORA research was supported within the Munich Center of Health Sciences (MC Health), Ludwig-Maximilians-Universität, as part of LMUinnovativ.
The KORA-Study Group consists of A. Peters (speaker), J. Heinrich, R. Holle, R. Leidl, C. Meisinger, K. Strauch, and their co-workers, who are responsible for the design and conduct of the KORA studies.
Author contributions
Study design: AG, PH. Patient recruitment: KM, GL, JKH, AS, TBH, MB, JK, SS, AGe, KR, AG, PH, SBr. Laboratory work: PH, SBr. Data analysis: FF, MH, ED, AB, SBu, PH. Revised the manuscript: HSF. KORA controls: KORA-Study Group, MW, TM, KS, JL, CG. Wrote the paper: PH, AG.
REFERENCES
- 1.Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753. doi: 10.1038/nature08494. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Hahn MM, de Voer RM, Hoogerbrugge N, Ligtenberg MJL, Kuiper RP, van Kessel AG. The genetic heterogeneity of colorectal cancer predisposition - guidelines for gene discovery. Cell Oncol. 2016;39:491–510. doi: 10.1007/s13402-016-0284-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Gibson G. Rare and common variants: twenty arguments. Nat Rev Genet. 2012;13:135–145. doi: 10.1038/nrg3118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Tomlinson I, Webb E, Carvajal-Carmona L, Broderick P, Kemp Z, Spain S, Penegar S, Chandler I, Gorman M, Wood W, Barclay E, Lubbe S, Martin L, et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat Genet. 2007;39:984–988. doi: 10.1038/ng2085. [DOI] [PubMed] [Google Scholar]
- 5.Broderick P, Carvajal-Carmona L, Pittman AM, Webb E, Howarth K, Rowan A, Lubbe S, Spain S, Sullivan K, Fielding S, Jaeger E, Vijayakrishnan J, Kemp Z, et al. A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat Genet. 2007;39:1315–1317. doi: 10.1038/ng.2007.18. [DOI] [PubMed] [Google Scholar]
- 6.Zanke BW, Greenwood CM, Rangrej J, Kustra R, Tenesa A, Farrington SM, Prendergast J, Olschwang S, Chiang T, Crowdy E, Ferretti V, Laflamme P, Sundararajan S, et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat Genet. 2007;39:989–994. doi: 10.1038/ng2089. [DOI] [PubMed] [Google Scholar]
- 7.Houlston RS, Webb E, Broderick P, Pittman AM, Di Bernardo MC, Lubbe S, Chandler I, Vijayakrishnan J, Sullivan K, Penegar S, Carvajal-Carmona L, Howarth K, Jaeger E, et al. COGENT Study, Colorectal Cancer Association Study Consortium, CoRGI Consortium Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nat Genet. 2008;40:1426–1435. doi: 10.1038/ng.262. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Jaeger E, Webb E, Howarth K, Carvajal-Carmona L, Rowan A, Broderick P, Walther A, Spain S, Pittman A, Kemp Z, Sullivan K, Heinimann K, Lubbe S, et al. Common genetic variants at the CRAC1 (HMPS) locus on chromosome 15q13.3 influence colorectal cancer risk. Nat Genet. 2008;40:26–28. doi: 10.1038/ng.2007.41. [DOI] [PubMed] [Google Scholar]
- 9.Tenesa A, Farrington SM, Prendergast JG, Porteous ME, Walker M, Haq N, Barnetson RA, Theodoratou E, Cetnarskyj R, Cartwright N, Semple C, Clark AJ, Reid FJ, et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat Genet. 2008;40:631–637. doi: 10.1038/ng.133. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Tomlinson IP, Webb E, Carvajal-Carmona L, Broderick P, Howarth K, Pittman AM, Spain S, Lubbe S, Walther A, Sullivan K, Jaeger E, Fielding S, Rowan A, et al. A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3. Nat Genet. 2008;40:623–630. doi: 10.1038/ng.111. [DOI] [PubMed] [Google Scholar]
- 11.Berndt SI, Potter JD, Hazra A, Yeager M, Thomas G, Makar KW, Welch R, Cross AJ, Huang WY, Schoen RE, Giovannucci E, Chan AT, Chanock SJ, et al. Pooled analysis of genetic variation at chromosome 8q24 and colorectal neoplasia risk. Hum Mol Genet. 2008;17:2665–2672. doi: 10.1093/hmg/ddn166. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Tenesa A, Dunlop MG. New insights into the aetiology of colorectal cancer from genome-wide association studies. Nat Rev Genet. 2009;10:353–358. doi: 10.1038/nrg2574. [DOI] [PubMed] [Google Scholar]
- 13.Houlston RS, Cheadle J, Dobbins SE, Tenesa A, Jones AM, Howarth K, Spain SL, Broderick P, Domingo E, Farrington S, Prendergast JG, Pittman AM, Theodoratou E, et al. Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33. Nat Genet. 2010;42:973–977. doi: 10.1038/ng.670. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Hutter CM, Slattery ML, Duggan DJ, Muehling J, Curtin K, Hsu L, Beresford SA, Rajkovic A, Sarto GE, Marshall JR, Hammad N, Wallace R, Makar KW, et al. Characterization of the association between 8q24 and colon cancer: gene-environment exploration and meta-analysis. BMC Cancer. 2010;10:1–15. doi: 10.1186/1471-2407-10-670. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Kocarnik JD, Hutter CM, Slattery ML, Berndt SI, Hsu L, Duggan DJ, Muehling J, Caan BJ, Beresford SA, Rajkovic A, Sarto GE, Marshall JR, Hammad N, et al. Characterization of 9p24 risk locus and colorectal adenoma and cancer: gene-environment interaction and meta-analysis. Cancer Epidemiol Biomarkers Prev. 2010;19:3131–3139. doi: 10.1158/1055-9965.EPI-10-0878. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Tomlinson IPM, Carvajal-Carmona LG, Dobbins SE, Tenesa A, Jones AM, Howarth K, Palles C, Broderick P, Jaeger EEM, Farrington S, Lewis A, Prendergast JGD, Pittman AM, et al. Multiple common susceptibility variants near BMP pathway loci GREM1, BMP4, and BMP2 explain part of the missing heritability of colorectal cancer. PLoS Genet. 2011;7:e1002105. doi: 10.1371/journal.pgen.1002105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Liu L, Zhou C, Zhou L, Peng L, Li D, Zhang X, Zhou M, Kuang P, Yuan Q, Song X, Yang M. Functional FEN1 genetic variants contribute to risk of hepatocellular carcinoma, esophageal cancer, gastric cancer and colorectal cancer. Carcinogenesis. 2012;33:119–123. doi: 10.1093/carcin/bgr250. [DOI] [PubMed] [Google Scholar]
- 18.Carvajal-Carmona LG, Cazier JB, Jones AM, Howarth K, Broderick P, Pittman A, Dobbins S, Tenesa A, Farrington S, Prendergast J, Theodoratou E, Barnetson R, Conti D, et al. Fine-mapping of colorectal cancer susceptibility loci at 8q23.3, 16q22.1 and 19q13.11: refinement of association signals and use of in silico analysis to suggest functional variation and unexpected candidate target genes. Hum Mol Genet. 2011;20:2879–2888. doi: 10.1093/hmg/ddr190. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Dunlop MG, Dobbins SE, Farrington SM, Jones AM, Palles C, Whiffin N, Tenesa A, Spain S, Broderick P, Ooi LY, Domingo E, Smillie C, Henrion M, et al. Common variation near CDKN1A, POLD3 and SHROOM2 influences colorectal cancer risk. Nat Genet. 2012;44:770–776. doi: 10.1038/ng.2293. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Peters U, Hutter CM, Hsu L, Schumacher FR, Conti DV, Carlson CS, Edlund CK, Haile RW, Gallinger S, Zanke BW, Lemire M, Rangrej J, Vijayaraghavan R, et al. Meta-analysis of new genome-wide association studies of colorectal cancer risk. Hum Genet. 2012;131:217–234. doi: 10.1007/s00439-011-1055-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Li Y, Huang J, Amos CI. Genetic association analysis of complex diseases incorporating intermediate phenotype information. PLoS One. 2012;7:e46612. doi: 10.1371/journal.pone.0046612. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Jia WH, Zhang B, Matsuo K, Shin A, Xiang YB, Jee SH, Kim DH, Ren Z, Cai Q, Long J, Shi J, Wen W, Yang G, et al. Genome-wide association analyses in East Asians identify new susceptibility loci for colorectal cancer. Nat Genet. 2013;45:191–196. doi: 10.1038/ng.2505. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Peters U, Jiao S, Schumacher FR, Hutter CM, Aragaki AK, Baron JA, Berndt SI, Bezieau S, Brenner H, Butterbach K, Caan BJ, Campbell PT, Carlson CS, et al. Identification of genetic susceptibility loci for colorectal tumors in a genome-wide meta-analysis. Gastroenterology. 2013;144:799–807. doi: 10.1053/j.gastro.2012.12.020. e724. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Kantor ED, Hutter CM, Minnier J, Berndt SI, Brenner H, Caan BJ, Campbell PT, Carlson CS, Casey G, Chan AT, Chang-Claude J, Chanock SJ, Cotterchio M, et al. Gene-environment interaction involving recently identified colorectal cancer susceptibility loci. Cancer Epidemiol Biomarkers Prev. 2014;23:1824–1833. doi: 10.1158/1055-9965.EPI-14-0062. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Zhang B, Jia WH, Matsuda K, Kweon SS, Matsuo K, Xiang YB, Shin A, Jee SH, Kim DH, Cai Q, Long J, Shi J, Wen W, et al. Large-scale genetic study in East Asians identifies six new loci associated with colorectal cancer risk. Nat Genet. 2014;46:533–542. doi: 10.1038/ng.2985. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Whiffin N, Hosking FJ, Farrington SM, Palles C, Dobbins SE, Zgaga L, Lloyd A, Kinnersley B, Gorman M, Tenesa A, Broderick P, Wang Y, Barclay E, et al. Identification of susceptibility loci for colorectal cancer in a genome-wide meta-analysis. Hum Mol Genet. 2014;23:4729–4737. doi: 10.1093/hmg/ddu177. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Schmit SL, Schumacher FR, Edlund CK, Conti DV, Raskin L, Lejbkowicz F, Pinchev M, Rennert HS, Jenkins MA, Hopper JL, Buchanan DD, Lindor NM, Le Marchand L, et al. A novel colorectal cancer risk locus at 4q32.2 identified from an international genome-wide association study. Carcinogenesis. 2014;35:2512–2519. doi: 10.1093/carcin/bgu148. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Wang H, Burnett T, Kono S, Haiman CA, Iwasaki M, Wilkens LR, Loo LW, Van Den Berg D, Kolonel LN, Henderson BE, Keku TO, Sandler RS, Signorello LB, et al. Trans-ethnic genome-wide association study of colorectal cancer identifies a new susceptibility locus in VTI1A. Nat Commun. 2014;5:4613. doi: 10.1038/ncomms5613. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Carethers JM, Jung BH. Genetics and genetic biomarkers in sporadic colorectal cancer. Gastroenterology. 2015;149:1177–1190. doi: 10.1053/j.gastro.2015.06.047. e1173. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Al-Tassan NA, Whiffin N, Hosking FJ, Palles C, Farrington SM, Dobbins SE, Harris R, Gorman M, Tenesa A, Meyer BF, Wakil SM, Kinnersley B, Campbell H, et al. A new GWAS and meta-analysis with 1000Genomes imputation identifies novel risk variants for colorectal cancer. Sci Rep. 2015;5:10442. doi: 10.1038/srep10442. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Zuber V, Duarte Silva AP, Strimmer K. A novel algorithm for simultaneous SNP selection in high-dimensional genome-wide association studies. BMC Bioinformatics. 2012;13:284. doi: 10.1186/1471-2105-13-284. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Fridley BL. Bayesian variable and model selection methods for genetic association studies. Genet Epidemiol. 2009;33:27–37. doi: 10.1002/gepi.20353. [DOI] [PubMed] [Google Scholar]
- 33.He Q, Lin DY. A variable selection method for genome-wide association studies. Bioinformatics. 2011;27:1–8. doi: 10.1093/bioinformatics/btq600. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Wu Z, Zhao H. Statistical power of model selection strategies for genome-wide association studies. PLoS Genet. 2009;5:e1000582. doi: 10.1371/journal.pgen.1000582. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Frommlet F, Bogdan M, Ramsey D. Phenotypes and Genotypes: The Search for Influential Genes. Springer Monography, Computational Biology. 2016;18 [Google Scholar]
- 36.Schwarz G. Estimating the dimension of a model. Ann Statist. 1978:461–464. [Google Scholar]
- 37.Frommlet F, Bogdan M, Chakrabarti A. Asymptotic Bayes optimality under sparsity of selection rules for general priors. Ann Statist. 2011;39:1551–1579. [Google Scholar]
- 38.Frommlet F, Ruhaltinger F, Twaróg P, Bogdan M. Modified versions of Bayesian Information Criterion for genome-wide association studies. Comput Stat Data Anal. 2012;56:1038–1051. [Google Scholar]
- 39.Dolejsi E, Bodenstorfer B, Frommlet F. Analyzing genome-wide association studies with an FDR controlling modification of the Bayesian Information Criterion. PLoS One. 2014;9:e103322. doi: 10.1371/journal.pone.0103322. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Hoffman GE, Logsdon BA, Mezey JG. PUMA: A unified framework for penalized multiple regression analysis of GWAS data. PLoS Comput Biol. 2013;9:e1003101. doi: 10.1371/journal.pcbi.1003101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.The Wellcome Trust Case Control Consortium Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678. doi: 10.1038/nature05911. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.de Silva MG, Elliott K, Dahl HH, Fitzpatrick E, Wilcox S, Delatycki M, Williamson R, Efron D, Lynch M, Forrest S. Disruption of a novel member of a sodium/hydrogen exchanger family and DOCK3 is associated with an attention deficit hyperactivity disorder-like phenotype. J Med Genet. 2003;40:733–740. doi: 10.1136/jmg.40.10.733. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Caspi E, Rosin-Arbesfeld R. A novel functional screen in human cells identifies MOCA as a negative regulator of Wnt signaling. Mol Biol Cell. 2008;19:4660–4674. doi: 10.1091/mbc.E07-10-1046. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Novellasdemunt L, Antas P, Li VSW. Targeting Wnt signaling in colorectal cancer. A review in the theme: Cell signaling: Proteins, pathways and mechanisms. Am J Physiol Cell Physiol. 2015;309:C511–C521. doi: 10.1152/ajpcell.00117.2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Gadea G, Blangy A. Dock-family exchange factors in cell migration and disease. Eur J Cell Biol. 2014;93:466–477. doi: 10.1016/j.ejcb.2014.06.003. [DOI] [PubMed] [Google Scholar]
- 46.Winkelmann J, Schormair B, Lichtner P, Ripke S, Xiong L, Jalilzadeh S, Fulda S, Putz B, Eckstein G, Hauk S, Trenkwalder C, Zimprich A, Stiasny-Kolster K, et al. Genome-wide association study of restless legs syndrome identifies common variants in three genomic regions. Nat Genet. 2007;39:1000–1006. doi: 10.1038/ng2099. [DOI] [PubMed] [Google Scholar]
- 47.Thompson CL, Plummer SJ, Acheson LS, Tucker TC, Casey G, Li L. Association of common genetic variants in SMAD7 and risk of colon cancer. Carcinogenesis. 2009;30:982–986. doi: 10.1093/carcin/bgp086. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Slattery ML, Herrick J, Curtin K, Samowitz W, Wolff RK, Caan BJ, Duggan D, Potter JD, Peters U. Increased risk of colon cancer associated with a genetic polymorphism of SMAD7. Cancer Res. 2010;70:1479–1485. doi: 10.1158/0008-5472.CAN-08-1792. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Stolfi C, Marafini I, De Simone V, Pallone F, Monteleone G. The dual role of Smad7 in the control of cancer growth and metastasis. Int J Mol Sci. 2013;14:23774. doi: 10.3390/ijms141223774. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Abu-Odeh M, Hereema NA, Aqeilan RI. WWOX modulates the ATR-mediated DNA damage checkpoint response. Oncotarget. 2016;7:4344–4355. doi: 10.18632/oncotarget.6571. https://doi.org/10.18632/oncotarget.6571 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Kara M, Yumrutas O, Ozcan O, Celik OI, Bozgeyik E, Bozgeyik I, Tasdemir S. Differential expressions of cancer-associated genes and their regulatory miRNAs in colorectal carcinoma. Gene. 2015;567:81–86. doi: 10.1016/j.gene.2015.04.065. [DOI] [PubMed] [Google Scholar]
- 52.Peters U, Bien S, Zubair N. Genetic architecture of colorectal cancer. Gut. 2015;64:1623–1636. doi: 10.1136/gutjnl-2013-306705. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Winawer SJ, Zauber AG, Ho MN, O’Brien MJ, Gottlieb LS, Sternberg SS, Waye JD, Schapiro M, Bond JH, Panish JF, Ackroyd F, Shike M, Kurtz RC, et al. Prevention of colorectal cancer by colonoscopic polypectomy. N Engl J Med. 1993;329:1977–1981. doi: 10.1056/NEJM199312303292701. [DOI] [PubMed] [Google Scholar]
- 54.Saini SD, Kim HM, Schoenfeld P. Incidence of advanced adenomas at surveillance colonoscopy in patients with a personal history of colon adenomas: a meta-analysis and systematic review. Gastrointest Endosc. 2006;64:614–626. doi: 10.1016/j.gie.2006.06.057. [DOI] [PubMed] [Google Scholar]
- 55.Zauber AG, Winawer SJ, O’Brien MJ, Lansdorp-Vogelaar I, van Ballegooijen M, Hankey BF, Shi W, Bond JH, Schapiro M, Panish JF, Stewart ET, Waye JD. Colonoscopic polypectomy and long-term prevention of colorectal-cancer deaths. N Engl J Med. 2012;366:687–696. doi: 10.1056/NEJMoa1100370. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Lemire M, Qu C, Loo LW, Zaidi SH, Wang H, Berndt SI, Bezieau S, Brenner H, Campbell PT, Chan AT, Chang-Claude J, Du M, Edlund CK, et al. A genome-wide association study for colorectal cancer identifies a risk locus in 14q23.1. Hum Genet. 2015;134:1249–1262. doi: 10.1007/s00439-015-1598-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Ioannidis JP, Patsopoulos NA, Evangelou E. Heterogeneity in meta-analyses of genome-wide association investigations. PLoS One. 2007;2:e841. doi: 10.1371/journal.pone.0000841. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Bagos PG. Genetic model selection in genome-wide association studies: robust methods and the use of meta-analysis. Stat Appl Genet Mol Biol. 2013;12:285–308. doi: 10.1515/sagmb-2012-0016. [DOI] [PubMed] [Google Scholar]
- 59.Wang X, Kuang YY, Hu XT. Advances in epigenetic biomarker research in colorectal cancer. World J Gastroenterol. 2014;20:4276–4287. doi: 10.3748/wjg.v20.i15.4276. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Hofer P, Baierl A, Feik E, Fuhrlinger G, Leeb G, Mach K, Holzmann K, Micksche M, Gsur A. MNS16A tandem repeats minisatellite of human telomerase gene: a risk factor for colorectal cancer. Carcinogenesis. 2011;32:866–871. doi: 10.1093/carcin/bgr053. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Hofer P, Baierl A, Bernhart K, Leeb G, Mach K, Micksche M, Gsur A. Association of genetic variants of human telomerase with colorectal polyps and colorectal cancer risk. Mol Carcinog. 2012;51:E176–182. doi: 10.1002/mc.21911. [DOI] [PubMed] [Google Scholar]
- 62.Wichmann HE, Gieger C, Illig T, MONICA/KORA Study Group KORA-gen--Resource for population genetics, controls and a broad spectrum of disease phenotypes. Gesundheitswesen. 2005;67:S26–S30. doi: 10.1055/s-2005-858226. [DOI] [PubMed] [Google Scholar]
- 63.Sinnott JA, Kraft P. Artifact due to differential error when cases and controls are imputed from different platforms. Hum Genet. 2012;131:111–119. doi: 10.1007/s00439-011-1054-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Browning SR, Browning BL. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Med Genet. 2007;81:1084–1097. doi: 10.1086/521987. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:1–16. doi: 10.1186/s13742-015-0047-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–909. doi: 10.1038/ng1847. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.