Abstract
Background and Aims:
Susceptibility genes and the underlying mechanisms for the majority of risk loci identified by genome-wide association studies (GWAS) for colorectal cancer (CRC) risk remain largely unknown. We conducted a transcriptome-wide association study (TWAS) to identify putative susceptibility genes.
Methods:
Gene-expression prediction models were built using transcriptome and genetic data from 284 normal transverse colon tissue samples of European-descent individuals in the Genotype-Tissue Expression (GTEx) project, and model performance was evaluated using data from The Cancer Genome Atlas (TCGA, n = 355). We applied the gene-expression prediction models to GWAS data to evaluate associations of genetically predicted gene expression with CRC risk in 58,131 CRC cases and 67,347 controls of European ancestry. Dual-luciferase reporter assays and knockdown experiments in CRC cells and tumor xenografts were conducted.
Results:
We identified 25 genes associated with CRC risk at a Bonferroni-corrected threshold of P < 9.1 × 10−6, including five genes in four novel loci: PYGL (14q22.1), RPL28 (19q13.42), CAPN12 (19q13.2), MYH7B (20q11.22), and MAP1LC3A (20q11.22). In nine known GWAS-identified loci, we uncovered nine genes that have not been previously reported, and four genes remained statistically significant after adjusting for the lead risk variant of their locus. Through colocalization analysis in GWAS loci, we identified an additional 12 putative susceptibility genes that were supported by TWAS analysis at P < 0.01. We showed that the risk allele of the lead risk variant rs1741640 affected the promoter activity of CABLES2. Knockdown experiments confirmed that CABLES2 plays a vital role in colorectal carcinogenesis.
Conclusion:
Our study reveals new putative susceptibility genes and provides new insight into the biological mechanisms underlying CRC development.
Keywords: Colorectal cancer, susceptibility genes, TWAS, CABLES2
Lay Summary
This large-scale transcriptome-wide association study (TWAS) has revealed 25 putative susceptibility genes, including five in novel loci and an additional nine in GWAS loci that have not been previously reported.
Introduction
Genetic factors play an important role in the etiology of both sporadic and familial colorectal cancer (CRC). Multiple CRC susceptibility genes, including APC, MUTYH, MLH1, PMS2, MSH2, MSH6, PTEN, SMAD4, and BMPR1A, have been identified as responsible for familial CRC syndromes, such as Lynch syndrome1–3. In addition to these known high-penetrance genes, approximately 150 common genetic variants have been associated with CRC risk through genome-wide association studies (GWAS)4–21. Together, however, these variants explain only a small proportion of the familial relative risk of CRC18–21. Thus, additional genetic susceptibility loci for CRC remain to be identified. Furthermore, the target genes at most GWAS-identified risk loci remain largely unclear.
Most GWAS-identified risk variants are located in non-coding or intergenic regions without clear evidence of regulatory epigenetic signals. However, many functional variants in strong linkage disequilibrium (LD) with these variants have been shown to play regulatory roles in gene expression22–26. Large genomics consortia, including the Genotype-Tissue Expression (GTEx, see URLs) and The Cancer Genome Atlas (TCGA, see URLs) projects, have generated massive quantities of high-dimensional genomics data, including matched genetic and transcriptome profiles from hundreds of normal and tumor tissues of the colon and rectum. These data are valuable for expression quantitative trait loci (eQTL) analyses, which evaluate the association of SNP genotypes with gene-expression levels. Using an eQTL approach, previous studies, including our own, have revealed approximately 30 putative susceptibility genes linked to GWAS-identified SNPs (referred to as index SNPs) or lead SNPs (SNPs with the strongest association signals)27–31. Beyond the eQTL approach, advances in chromatin-chromatin interaction technology have produced large amounts of chromatin-chromatin interaction data in various normal and cancer cell lines, including CRC cell lines32–35.
Recently, transcriptome-wide association studies (TWAS)36 have been developed to systematically investigate the transcriptome for disease risk. First, models are built to predict gene expression from cis-SNPs using a reference transcriptome dataset. These models are then applied to GWAS data to evaluate the association of predicted gene expression with disease risk. A recent TWAS of approximately 27,000 cases and controls reported several genes whose predicted expression was associated with CRC risk37, 38. However, the sample size of that study was relatively small, and several of the reported associations were not statistically significant after Bonferroni correction. Herein, we report results from a large CRC TWAS conducted among 125,478 subjects of European descent to comprehensively search for susceptibility genes.
Methods
GWAS data from study cohorts
The study utilized GWAS summary statistics data from 125,478 individuals of European descent (58,131 CRC cases and 67,347 controls) from six cohort studies. The cohort studies include GECCO (cases: 11,958, controls: 14,740), CORECT (cases: 22,911, controls: 14,311), CROC (cases: 12,007, controls: 12,000), CONE (cases: 4,439, controls: 4,115), CORSA (cases: 1,460, controls: 774), and UKBio (cases: 5,356, controls: 21,407). All participants provided written informed consent, and each study was approved by the relevant research ethics committee or institutional review boards. Details on sample selection and matching, sample numbers and demographic characteristics of study participants have been described previously18.
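The per-study counts above can be cross-checked against the pooled totals with a few lines of Python; the dictionary below simply restates the numbers in this paragraph, with the study names used only as keys.

```python
# Case/control counts per study, as listed in the Methods text.
COHORTS = {
    "GECCO": (11_958, 14_740),
    "CORECT": (22_911, 14_311),
    "CROC": (12_007, 12_000),
    "CONE": (4_439, 4_115),
    "CORSA": (1_460, 774),
    "UKBio": (5_356, 21_407),
}

total_cases = sum(cases for cases, _ in COHORTS.values())
total_controls = sum(controls for _, controls in COHORTS.values())
total_subjects = total_cases + total_controls

print(total_cases, total_controls, total_subjects)  # 58131 67347 125478
```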
Gene-expression model building using GTEx
The recent GTEx release 8 includes RNA sequencing data from transverse colon samples of 284 subjects, together with whole genome sequencing (WGS) data from these subjects. Gene-expression and WGS data from these samples were processed according to the GTEx protocol, as described in our previous work39–41. For the gene-expression data, expression levels of each gene were quantified in reads per kilobase per million (RPKM) units using RNA-SeQC. We performed data quality control (QC) and normalization by filtering out lowly expressed genes, log2 transformation, and Robust Multichip Average (RMA) normalization. We further applied a rank-based inverse normal transformation to the expression of each gene across all samples. We performed a probabilistic estimation of expression residuals (PEER) analysis to generate the top PEER factors (the top 15 were used in this study) to adjust for batch effects and other potential confounding factors42 in downstream prediction model building. Only common SNPs (minor allele frequency > 0.05) with ‘PASS’ tags were included. SNPs with a call rate < 98%, a Hardy-Weinberg equilibrium P value < 10−6 (among subjects of European ancestry), or showing batch effects were excluded. Principal component analysis (PCA) was conducted using EIGENSTRAT43 (see URLs) to generate the top PCs from the genotype data.
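A minimal sketch of two of the preprocessing steps above, assuming a stdlib-only Python setting: a rank-based inverse normal transformation of one gene's expression values, and a SNP filter applying the MAF, call-rate, and Hardy-Weinberg cutoffs. The rank offset c = 0.5 and the 1-df chi-square HWE test are illustrative assumptions (pipelines may use Blom's offset or an exact HWE test instead), the function names are hypothetical, and the batch-effect exclusion is not modeled.

```python
import math
from statistics import NormalDist


def rank_inverse_normal(values, c=0.5):
    """Rank-based inverse normal transform of one gene across samples.

    Maps average ranks r to Phi^-1((r - c) / (n - 2c + 1)); c = 0.5 is an
    illustrative choice (Blom's c = 3/8 is another common one).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a block of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def hwe_p_value(n_aa, n_ab, n_bb):
    """Hardy-Weinberg P value from genotype counts (1-df chi-square test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chisq / 2))  # upper tail of chi-square(1)


def passes_snp_qc(n_aa, n_ab, n_bb, n_missing, filter_tag="PASS"):
    """Apply the MAF > 0.05, call rate >= 98%, and HWE P >= 1e-6 filters."""
    n_called = n_aa + n_ab + n_bb
    call_rate = n_called / (n_called + n_missing)
    allele_freq = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(allele_freq, 1 - allele_freq)
    return (
        filter_tag == "PASS"
        and maf > 0.05
        and call_rate >= 0.98
        and hwe_p_value(n_aa, n_ab, n_bb) >= 1e-6
    )


print(passes_snp_qc(25, 50, 25, 0), passes_snp_qc(50, 0, 50, 0))  # True False
```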
Genetic and transcriptome data from the 284 GTEx transverse colon samples were used to build the gene-expression prediction models for this study. For each gene, we trained a prediction model on the genetic variants within ±1 Mb of the gene using an elastic-net approach, implemented in the MetaXcan tool (Formula 1).
$$Y_g = \sum_{l} w_{lg} G_l + \sum_{k} \gamma_k C_k + \varepsilon \qquad (1)$$

In Formula 1, for each gene $g$, $Y_g$ is the expression level, $G_l$ is the number of effect alleles (0–2) of genetic variant $l$, and $w_{lg}$ is its weight, estimated with the elastic-net penalty. $C_k$ denotes the adjustment covariates (top PCs, sex, age, potential batch effects, and other potential confounding factors captured by PEER factors) with coefficients $\gamma_k$, and $\varepsilon$ is the random error. We focused on the cis-regulation of each gene, predicted by local genetic variants within a 2-Mb window (±1 Mb) flanking the gene region. The model parameters for each gene were tuned using tenfold cross-validation, and the squared correlation (R2) between predicted and observed gene-expression levels was used to evaluate prediction performance.
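To make the elastic-net and tenfold cross-validation steps concrete, the sketch below fits Formula 1 by coordinate descent on simulated genotype dosages and reports the pooled cross-validated R2 (squared Pearson correlation). It is not the MetaXcan/PrediXcan implementation: the mixing parameter alpha = 0.5, the penalty grid, the simulated data, and all function names are assumptions, and the adjustment covariates (PCs, sex, age, PEER factors) are assumed to have been regressed out of the expression values beforehand.

```python
import random
import statistics


def soft_threshold(z, gamma):
    """Lasso soft-thresholding operator."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net(X, y, lam, alpha=0.5, n_iter=50):
    """Coordinate descent for
    (1/2n)||y - b0 - Xw||^2 + lam * (alpha*||w||_1 + (1 - alpha)/2 * ||w||_2^2)."""
    n, p = len(X), len(X[0])
    mu = [statistics.fmean(row[j] for row in X) for j in range(p)]
    Xc = [[row[j] - mu[j] for j in range(p)] for row in X]  # centered dosages
    y_mean = statistics.fmean(y)
    resid = [yi - y_mean for yi in y]
    w = [0.0] * p
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP in this training split
            # Inner product of SNP j with the partial residual (SNP j excluded).
            z = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_wj = soft_threshold(z, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                w[j] = new_wj
    b0 = y_mean - sum(m * wj for m, wj in zip(mu, w))
    return b0, w


def predict(b0, w, X):
    return [b0 + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def r_squared(a, b):
    """Squared Pearson correlation; 0.0 if either vector is constant."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return 0.0 if saa == 0 or sbb == 0 else sab * sab / (saa * sbb)


def cv_r2(X, y, lam, k=10, seed=1):
    """Pooled k-fold cross-validated R2 between predicted and observed expression."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for f in range(k):
        test = idx[f::k]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        b0, w = fit_elastic_net([X[i] for i in train], [y[i] for i in train], lam)
        for i, yhat in zip(test, predict(b0, w, [X[i] for i in test])):
            preds[i] = yhat
    return r_squared(preds, y)


# Simulated example: 100 individuals, 10 cis-SNP dosages, two causal SNPs.
rng = random.Random(42)
X = [[sum(rng.random() < 0.3 for _ in range(2)) for _ in range(10)] for _ in range(100)]
y = [0.5 * row[0] - 0.4 * row[3] + rng.gauss(0, 0.5) for row in X]

lam_grid = [0.01, 0.05, 0.1]  # hypothetical penalty grid
scores = {lam: cv_r2(X, y, lam) for lam in lam_grid}
best_lam = max(scores, key=scores.get)
b0, weights = fit_elastic_net(X, y, best_lam)
print(best_lam, round(scores[best_lam], 3))
```

In practice, the penalty would be chosen per gene, and genes whose cross-validated R2 falls below the cutoff (0.01 here) would be dropped.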
Evaluating prediction performance of gene-expression models using TCGA
To further evaluate the prediction performance of the GTEx models externally, we downloaded TCGA colorectal adenocarcinoma data from cBioPortal (see URLs), including gene expression (RNA-seq V2, measured as median z-scores), DNA methylation (Level 3, Infinium HumanMethylation450), and somatic copy number alterations (SCNAs; data_linear_CNA.txt). We also downloaded Level 3 SNP data (COAD and READ), genotyped using the Affymetrix SNP 6.0 array, from the TCGA data portal. A total of 355 tumor samples with gene expression, DNA methylation, SCNA, and SNP data were used to evaluate the prediction models. Genotype and gene-expression data from TCGA were processed as described in our previous work25, 31. To evaluate the prediction performance for each gene predicted by a set of SNPs in GTEx, we first fitted a residual model with gene expression as the dependent variable, adjusting for DNA methylation, SCNA, and the first five PCs derived from the genotype data. We then regressed the derived residuals on the genotypes of the same set of SNPs in a linear model. The prediction performance for each gene was assessed by the adjusted R2 of this linear model, which included all genetic predictors selected in the GTEx data. The prediction performance of the gene-expression models is summarized in Supplementary Table 1.
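The two-step TCGA evaluation can be sketched as follows: residualize expression on the covariates, then regress the residuals on the model SNPs and report the adjusted R2 = 1 − (1 − R2)(n − 1)/(n − p − 1), where p is the number of SNP predictors. This is a minimal stdlib-only illustration on simulated data; the normal-equation solver, the single covariate standing in for methylation, SCNA, and PCs, and the function names are all hypothetical.

```python
import random


def ols_fitted(X, y):
    """Fitted values from least squares with an intercept (normal equations)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return [sum(b * x for b, x in zip(beta, r)) for r in rows]


def adjusted_r2(y, fitted, n_predictors):
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


def tcga_adjusted_r2(expr, covariates, genotypes):
    """Residualize expression on covariates, then regress residuals on the model SNPs."""
    resid = [e - f for e, f in zip(expr, ols_fitted(covariates, expr))]
    return adjusted_r2(resid, ols_fitted(genotypes, resid), len(genotypes[0]))


# Simulated example: one covariate (standing in for methylation, SCNA and PCs) and one SNP.
rng = random.Random(0)
covariates = [[rng.gauss(0, 1)] for _ in range(60)]
genotypes = [[rng.choice([0, 1, 2])] for _ in range(60)]
expr = [2.0 * c[0] + 1.0 * g[0] + rng.gauss(0, 0.3) for c, g in zip(covariates, genotypes)]
print(round(tcga_adjusted_r2(expr, covariates, genotypes), 3))
```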
Association analyses between predicted gene expression and CRC risk
To identify susceptibility genes for CRC risk, we applied the weight matrix to the summary statistics data on SNPs from CRC GWAS datasets using the MetaXcan tool44. The MetaXcan method, which has been described elsewhere45, 46, was used for the association analyses.
$$Z_g \approx \sum_{l \in \mathrm{Model}_g} w_{lg}\, \frac{\hat{\sigma}_l}{\hat{\sigma}_g}\, \frac{\hat{\beta}_l}{\mathrm{se}(\hat{\beta}_l)} \qquad (2)$$

In Formula 2, the Z-score $Z_g$ estimates the association between the predicted expression of gene $g$ and CRC risk. Here, $w_{lg}$ is the weight of SNP $l$ for predicting the expression of gene $g$; $\hat{\beta}_l$ and $\mathrm{se}(\hat{\beta}_l)$ are the GWAS regression coefficient and its standard error for SNP $l$; and $\hat{\sigma}_l^2$ and $\hat{\sigma}_g^2$ are the estimated variances of SNP $l$ and of the predicted expression of gene $g$, respectively. Because $\hat{\sigma}_g^2$ depends on the correlations between the SNPs included in each prediction model, we estimated these correlations using phase 3 data from European populations of the 1000 Genomes Project. The summary statistics came from a meta-analysis of GWAS data among 125,478 individuals of European descent from six cohort studies. An association result was included in the meta-analysis only if the estimated minor allele count (MAC) was > 50 and the imputation quality metric Rsq > 0.3. Notably, additional QC steps for SNPs and samples had been performed in previous GWAS by the CRC consortia18.
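Formula 2 can be computed from summary statistics alone, as in the sketch below. It assumes that the genotype standard deviations and LD correlations come from a reference panel (here, hypothetical inputs standing in for the 1000 Genomes European data) and that the variance of predicted expression is obtained as w'Γw with Γ_lm = σ_l σ_m r_lm; the function name is illustrative and not part of the MetaXcan API.

```python
import math


def twas_z(weights, snp_sd, ld_corr, gwas_z):
    """Summary-statistic TWAS z-score for one gene, following Formula 2.

    weights[l]    : w_lg, prediction weight of SNP l for gene g
    snp_sd[l]     : sigma_l, genotype standard deviation in the reference panel
    ld_corr[l][m] : LD correlation r_lm between SNPs l and m (reference panel)
    gwas_z[l]     : beta_l / se(beta_l) from the GWAS
    """
    n = len(weights)
    # sigma_g^2 = w' Gamma w, where Gamma_lm = sigma_l * sigma_m * r_lm.
    var_g = sum(
        weights[l] * weights[m] * snp_sd[l] * snp_sd[m] * ld_corr[l][m]
        for l in range(n)
        for m in range(n)
    )
    sd_g = math.sqrt(var_g)
    return sum(w * sd / sd_g * z for w, sd, z in zip(weights, snp_sd, gwas_z))


# Two SNPs in perfect LD carrying the same signal: the z-score is not double counted.
print(twas_z([0.5, 0.5], [1.0, 1.0], [[1.0, 1.0], [1.0, 1.0]], [3.0, 3.0]))  # 3.0
```

With a single SNP the gene-level z-score reduces to the SNP's GWAS z-score, signed by its weight; with independent SNPs the contributions add and are rescaled by the predicted-expression standard deviation.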
For our TWAS analyses, a Bonferroni-corrected P < 9.1 ×10−6 (0.05/5,491) was used to identify a statistically significant association.
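The Bonferroni cutoff and the conversion from a TWAS z-score to a two-sided P value can be checked with the standard library; the function names below are hypothetical. For example, the Z score of −4.60 reported for PYGL in Table 1 corresponds to P ≈ 4.2 × 10−6, below the 9.1 × 10−6 cutoff.

```python
import math

N_TESTS = 5_491
BONFERRONI_ALPHA = 0.05 / N_TESTS  # about 9.1e-6


def two_sided_p(z):
    """Two-sided P value for a standard-normal z-score: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni_significant(z, alpha=BONFERRONI_ALPHA):
    return two_sided_p(z) < alpha


print(f"{BONFERRONI_ALPHA:.2e}", f"{two_sided_p(-4.60):.2e}", bonferroni_significant(-4.60))
```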
Conditional and colocalization analyses of GWAS association signals
To determine whether the identified associations between genetically predicted gene expression and CRC risk were influenced by association signals identified in GWAS, we conducted conditional analyses by adjusting for index SNPs. We followed a previously reported approach47 to estimate adjusted odds ratios and their standard errors for the association between selected SNPs and CRC risk, given the reported index SNP. Then we re-ran the MetaXcan analyses using the adjusted summary statistics.
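When only summary statistics are available, a common approximation for conditioning one SNP's association on an index SNP uses the LD correlation r between them. The sketch below shows that two-SNP approximation, z_cond = (z − r·z_index)/√(1 − r²), which follows from a joint regression on standardized genotypes assuming small effects and an unchanged residual variance. It is an illustrative simplification and not necessarily the exact procedure of reference 47; the function name is hypothetical. In the TWAS setting, each model SNP's GWAS statistic would be adjusted this way before Formula 2 is recomputed.

```python
import math


def conditional_z(z_snp, z_index, r):
    """Approximate z-score of a SNP after conditioning on an index SNP.

    z_cond = (z_snp - r * z_index) / sqrt(1 - r^2), from the joint two-SNP
    regression on standardized genotypes (small effects, same residual variance).
    """
    if abs(r) >= 1:
        raise ValueError("cannot condition on a SNP in perfect LD")
    return (z_snp - r * z_index) / math.sqrt(1 - r * r)


# A SNP whose signal is fully explained by a correlated index SNP (r = 0.5).
print(conditional_z(5.0, 10.0, 0.5))  # 0.0
```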
We downloaded cis-eQTL results for normal transverse colon tissue from the GTEx database (version 8). Colocalization of GWAS association signals with the expression of TWAS-identified genes was assessed using summary data-based Mendelian randomization (SMR)48, based on the GTEx eQTL results and the GWAS summary statistics. Genotype data from the European populations of the 1000 Genomes Project Phase 3 were used for LD estimation. Significant colocalized signals were determined based on a nominal threshold of PSMR < 0.05.
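The SMR test statistic combines the GWAS and eQTL z-scores of the top cis-eQTL SNP. A minimal sketch, assuming the approximate form T_SMR = z_GWAS² z_eQTL² / (z_GWAS² + z_eQTL²) referred to a chi-square distribution with 1 degree of freedom, as described by the SMR authors; the function name is hypothetical. Because T_SMR is dominated by the weaker of the two signals, a strong GWAS hit with only a weak eQTL yields a small statistic.

```python
import math


def smr_test(z_gwas, z_eqtl):
    """Approximate SMR statistic T and its P value (chi-square, 1 df)."""
    t = (z_gwas ** 2 * z_eqtl ** 2) / (z_gwas ** 2 + z_eqtl ** 2)
    p = math.erfc(math.sqrt(t / 2))  # upper tail of chi-square(1)
    return t, p


t_stat, p_smr = smr_test(4.0, 4.0)
print(t_stat, p_smr < 0.05)  # 8.0 True
```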
Results
Gene expression predicted by flanking genetic variants
We used transcriptome and genotype data from GTEx to build gene-expression prediction models for normal transverse colon tissues from individuals of European descent (N = 284) (see Methods). A total of 16,082 models were built for genes predicted by flanking genetic variants (±1 Mb) using the elastic-net approach36, 49. The expression levels of 9,503 genes could be predicted by local genetic variants at R2 > 0.01 (a correlation of 0.1), with a median of 27 variants per gene (see Supplementary Table 1). To evaluate prediction performance in an external dataset, we used data from 355 primary CRC tissues from TCGA: we fitted a residual model with gene expression as the dependent variable, adjusting for DNA methylation, somatic copy number alteration (SCNA), and the first five principal components (PCs), and then regressed the residuals on the genotypes of the same set of SNPs (see Methods). After removing poorly predicted genes, we focused on 5,491 genes for the downstream association analysis between predicted gene expression and CRC risk. These genes had either 1) a model prediction R2 > 0.01 in GTEx that was replicated in TCGA, or 2) an R2 > 0.0625 (a correlation of 0.25) in GTEx regardless of TCGA results; the second set included genes that could not be evaluated in TCGA due to a lack of data (see Supplementary Table 1).
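The two inclusion criteria above can be written as a small filter. Because the text does not define what counts as replication in TCGA, the sketch takes it as a caller-supplied flag (None when TCGA data were lacking); the function name is hypothetical.

```python
def keep_for_twas(r2_gtex, tcga_replicated):
    """Return True if a gene model meets either inclusion criterion.

    r2_gtex: cross-validated R2 in GTEx.
    tcga_replicated: True/False if evaluated in TCGA, None if TCGA data were lacking.
    """
    if r2_gtex > 0.0625:  # correlation > 0.25: kept regardless of TCGA
        return True
    return r2_gtex > 0.01 and tcga_replicated is True  # correlation > 0.1 plus replication


print(keep_for_twas(0.52, None), keep_for_twas(0.02, False))  # True False
```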
Identified CRC susceptibility genes in TWAS
We evaluated the associations of predicted gene expression with CRC risk using the MetaXcan tool44, based on the weights of the SNPs in the GTEx expression-prediction models and their summary statistics from the GWAS data (N = 125,478; see Methods). In total, we identified 25 genes whose genetically predicted expression was associated with CRC risk at a Bonferroni-corrected threshold of P < 9.1 × 10−6 (Fig. 1 and Supplementary Table 4). Of these, five genes, PYGL (14q22.1), RPL28 (19q13.42), CAPN12 (19q13.2), MYH7B (20q11.22), and MAP1LC3A (20q11.22), are located in four independent loci at least 3 Mb away from any previously reported GWAS-identified CRC risk locus, suggesting that these may be novel CRC susceptibility loci. Notably, lower predicted expression of each of these genes was associated with an increased risk of CRC (Table 1). Using a less stringent threshold of P < 6.6 × 10−4, corresponding to a false discovery rate (FDR) of 0.05, we identified an additional 48 genes whose genetically predicted expression was associated with CRC risk (Fig. 1 and Supplementary Table 5).
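An FDR-based cutoff depends on the whole distribution of P values. As a sketch, assuming the Benjamini-Hochberg step-up procedure (the text does not name the FDR method), the largest P value declared significant at FDR q can be found as follows; reproducing the reported 6.6 × 10−4 would require all 5,491 TWAS P values, so the toy list below is hypothetical, as is the function name.

```python
def bh_cutoff(p_values, q=0.05):
    """Largest P value declared significant by the Benjamini-Hochberg step-up
    procedure at FDR q; returns 0.0 if nothing is significant."""
    m = len(p_values)
    cutoff = 0.0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank * q / m:
            cutoff = p  # keep the largest rank that satisfies the bound
    return cutoff


toy = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(bh_cutoff(toy))  # 0.008
```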
Figure 1. Manhattan plot of the association results from the TWAS of colorectal cancer (N = 125,478).

The blue and red lines represent a false discovery rate (FDR)-corrected significance level of P < 6.6 × 10−4 and a Bonferroni-corrected threshold of P < 9.1 × 10−6, respectively.
Table 1.
Association of colorectal cancer risk with predicted expression for five genes located at least 3 Mb away from any GWAS-reported colorectal cancer risk variant.
| Locus | Gene | Z score | P value^a | R2 (GTEx)^b | R2 (TCGA)^b | Lead SNP^c | Distance (Mb)^c |
|---|---|---|---|---|---|---|---|
| 14q22.1 | PYGL | −4.60 | 4.20 × 10−6 | 0.20 | 0.07 | rs35107139 | 3.0 |
| 19q13.2 | CAPN12 | −4.51 | 6.24 × 10−6 | 0.11 | 0.04 | rs28840750 | 5.7 |
| 19q13.42 | RPL28 | −5.03 | 4.86 × 10−7 | 0.23 | 0.04 | rs73068325 | 3.2 |
| 20q11.22^d | MYH7B | −4.78 | 1.72 × 10−6 | 0.13 | 0.02 | rs6031311 | 9.1 |
| 20q11.22^d | MAP1LC3A | −4.94 | 7.66 × 10−7 | 0.13 | <0.01 | rs6031311 | 9.5 |
^a P value derived from the TWAS association analyses; statistically significant at a Bonferroni-corrected threshold of P < 9.1 × 10−6 for 5,491 tests (0.05/5,491).
^b Prediction performance (R2) was derived using GTEx data, while the adjusted R2 was derived from TCGA data to validate each gene-expression prediction model. “−” indicates a GTEx model that could not be validated in TCGA due to a lack of data.
^c Distance between the gene and the closest lead SNP identified in previous CRC GWAS.
^d The initially GWAS-identified locus (rs6058093) was excluded in the recent meta-analysis among 125,478 subjects of European descent18.
To determine whether the identified genes were implicated in the associations of established GWAS-identified risk variants, we investigated the 20 (of 25) genes located in 16 established independent CRC risk loci (Table 2). We performed conditional analyses of the associations between these genes and CRC risk, adjusting for the closest lead SNP at each locus (see Methods). The associations for CCHCR1 (rs2516420, at a distance of 323 kb), LRP1 (rs4759277, 0 kb), RGS9BP (rs28840750, 350 kb), and PARD6B (rs6063514, 293 kb) remained statistically significant at P < 2.5 × 10−3 (0.05/20, correcting for multiple comparisons) after adjusting for the closest risk SNP (Table 2). Of the remaining 16 genes, the expression prediction models for five, SFMBT1 (rs9831861), COLCA1 (rs3087967), COLCA2 (rs3087967), AL121832.2 (rs1741640), and AL121832.3 (rs1741640), included the lead SNPs or SNPs in strong LD (R2 > 0.95) with them, suggesting that the observed associations for these genes may be driven by these lead SNPs (Table 2). Further colocalization analysis between the expression of all TWAS-identified genes and previous GWAS association signals using the SMR approach showed that the effects of eight lead SNPs on cancer risk may be mediated by the expression of TWAS-identified genes, including SF3A3, ACTR1B, AC021016.1, SFMBT1, AC004847.1, C11orf53, COLCA1, COLCA2, DACT1, AL121832.2, and AL121832.3 (Supplementary Table 5). The remaining nine genes have not been previously reported.
Table 2.
Association of colorectal cancer risk with predicted expression for 20 genes located in 16 established GWAS-identified colorectal cancer risk loci.
| Locus | Gene | Z score | P value^a | R2 (GTEx)^b | R2 (TCGA)^b | Lead SNP^c | Distance (bp)^c | P value after adjusting for lead SNP^d |
|---|---|---|---|---|---|---|---|---|
| 1p34.3 | SF3A3 | 4.99 | 5.99 × 10−7 | 0.13 | 0.01 | rs4360494 | 144 | 0.94 |
| 2q11.2 | ACTR1B | −4.8 | 1.62 × 10−6 | 0.28 | 0.02 | rs11692435 | 0 | 0.03 |
| 2q35 | AC021016.1 | 5.77 | 7.90 × 10−9 | 0.52 | - | rs35470271 | 27,542 | 0.28 |
| 2q35 | ARPC2 | 4.49 | 7.10 × 10−6 | 0.02 | 0.01 | rs3731861 | 72,177 | 0.90 |
| 3p21.1 | SFMBT1 | 4.61 | 4.03 × 10−6 | 0.20 | 0.15 | rs9831861 | 7,519 | - |
| 6p21.3 | ATP6V1G2 | −4.75 | 2.00 × 10−6 | 0.06 | 0.08 | rs2516420 | 62,619 | 7.6 × 10−3 |
| 6p21.33 | CCHCR1 | 5.24 | 1.61 × 10−7 | 0.46 | <0.01 | rs2516420 | 323,605 | 8.78 × 10−7 |
| 7p13 | AC004847.1 | −4.46 | 8.16 × 10−6 | - | - | rs12672022 | 135,915 | 0.65 |
| 11q23.1 | C11orf53 | −11.66 | 1.94 × 10−31 | 0.21 | 0.06 | rs3087967 | 0 | 7.4 × 10−3 |
| 11q23.1 | COLCA1 | −10.16 | 2.94 × 10−24 | 0.24 | 0.01 | rs3087967 | 4,676 | - |
| 11q23.1 | COLCA2 | −11.25 | 2.31 × 10−29 | 0.36 | 0.06 | rs3087967 | 12,444 | - |
| 12q13.12 | AC074032.1 | −5.35 | 8.82 × 10−8 | 0.19 | - | rs12372718 | 611,689 | 0.86 |
| 12q13.3 | LRP1 | 5.25 | 1.50 × 10−7 | 0.16 | 0.01 | rs4759277 | 0 | 4.47 × 10−4 |
| 12q24.12 | ALDH2 | −4.48 | 7.47 × 10−6 | 0.04 | 0.02 | rs597808 | 231,333 | 0.40 |
| 14q23.1 | DACT1 | −5.42 | 6.04 × 10−8 | 0.06 | <0.01 | rs17094983 | 74,322 | 0.09 |
| 19q13.11 | RGS9BP | −5.02 | 5.22 × 10−7 | 0.02 | 0.02 | rs28840750 | 350,721 | 1.12 × 10−5 |
| 19q13.43 | AC016629.3 | −4.82 | 1.45 × 10−6 | 0.13 | - | rs73068325 | 7,716 | 0.13 |
| 20q13.13 | PARD6B | 5.07 | 4.02 × 10−7 | 0.07 | 0.01 | rs6063514 | 292,799 | 1.93 × 10−6 |
| 20q13.33 | AL121832.2 | 9.68 | 3.84 × 10−22 | 0.23 | - | rs1741640 | 28,945 | - |
| 20q13.33 | AL121832.3 | −8.59 | 8.99 × 10−18 | 0.11 | - | rs1741640 | 44,878 | - |
^a P value derived from the TWAS association analyses; statistically significant at a Bonferroni-corrected threshold of P < 9.1 × 10−6 for 5,491 tests (0.05/5,491).
^b Prediction performance (R2) was derived using GTEx data, while the adjusted R2 was derived from TCGA data to validate each gene-expression prediction model. “−” indicates a GTEx model that could not be validated in TCGA due to a lack of data.
^c Distance between the gene and the closest lead SNP identified in previous CRC GWAS.
^d P value from the TWAS association analysis after adjusting for the association of the closest lead SNP. “−” indicates that the lead SNP is included in the model or is in strong LD (R2 > 0.95) with the model SNP showing the strongest association with CRC risk.
The genes in bold refer to those reported from the previous eQTL analysis in CRC and the colocalization analysis in this study.
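The conditional-analysis result described above can be reproduced directly from the last column of Table 2: applying the 0.05/20 = 2.5 × 10−3 cutoff leaves exactly the four genes named in the text. Genes marked “-” in that column (lead SNP in the model or in strong LD with a model SNP) are encoded as None and excluded.

```python
# P values after adjusting for the closest lead SNP (last column of Table 2).
# None marks "-": the lead SNP (or a proxy with R2 > 0.95) is already in the model.
ADJUSTED_P = {
    "SF3A3": 0.94, "ACTR1B": 0.03, "AC021016.1": 0.28, "ARPC2": 0.90,
    "SFMBT1": None, "ATP6V1G2": 7.6e-3, "CCHCR1": 8.78e-7, "AC004847.1": 0.65,
    "C11orf53": 7.4e-3, "COLCA1": None, "COLCA2": None, "AC074032.1": 0.86,
    "LRP1": 4.47e-4, "ALDH2": 0.40, "DACT1": 0.09, "RGS9BP": 1.12e-5,
    "AC016629.3": 0.13, "PARD6B": 1.93e-6, "AL121832.2": None, "AL121832.3": None,
}
ALPHA = 0.05 / len(ADJUSTED_P)  # 0.05 / 20 = 2.5e-3

independent = sorted(g for g, p in ADJUSTED_P.items() if p is not None and p < ALPHA)
print(independent)  # ['CCHCR1', 'LRP1', 'PARD6B', 'RGS9BP']
```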
In addition, we systematically evaluated eQTL results at the established GWAS loci18,23 using GTEx data and identified 42 eQTL genes for 19 lead variants at a Bonferroni-corrected threshold of P < 2.4 × 10−5 (Supplementary Table 6). Further colocalization analysis showed that the CRC susceptibility conferred by 18 lead variants may be mediated by cis-effects on the regulation of 38 genes (Supplementary Table 6). Beyond the genes identified in the TWAS above, an additional 12 of these genes, ASAH2B, ATF1, CABLES2, HCG20, HLA-DQA1, HLA-DQA2, HLA-DQB1, HLA-DRB5, PRRT1, TRPC6, ZNF132, and ZNF584, were supported by TWAS analysis at a less stringent threshold of P < 0.01 (Supplementary Table 6).
Putative susceptibility genes supported by functional genomic data analysis
To seek additional evidence for the 25 putative susceptibility genes identified by TWAS, we explored regulatory mechanisms for putative functional variants correlated with the SNP showing the strongest association with CRC risk in each prediction model. We performed functional annotation of variants in strong LD (R2 > 0.8) with these SNPs (Supplementary Table 7; see Supplemental Notes). We analyzed 349 putative functional variants in strong LD with these SNPs that showed evidence of epigenetic signals in Roadmap data (see URLs, Supplementary Table 7; see Supplemental Notes). Of these, 118 variants mapped to promoter regions and the remaining 231 to enhancer regions. To search for direct evidence that these variants regulate the putative susceptibility genes identified in our TWAS, we first examined whether the potential functional variants were positioned in proximal promoter regions, as such variants would most likely regulate the closest genes. Thirteen genes, C11orf53, COLCA1, COLCA2, AC074032, ALDH2, ARPC2, CAPN12, LRP1, PARD6B, PYGL, RPL28, SF3A3, and SFMBT1, were likely regulated by nearby putative functional SNPs with promoter activity (Supplementary Table 7). We further examined whether the genes could be regulated by putative functional variants in enhancer regions via long-distance promoter-enhancer interactions by analyzing chromatin-chromatin interaction data. An additional six genes, AC004847.1, AC074032.1, MAP1LC3A, ACTR1B, AL121832.2, and DACT1, showed evidence of potential regulation via distal promoter-enhancer interactions. Taken together, 19 of the 25 identified genes (76%) showed evidence of regulation by putative functional variants via proximal promoter or distal enhancer-promoter interactions.
All of these genes were further supported by evidence that their potential functional variants are located in regions of histone modification, DNase I hypersensitivity, or transcription factor (TF) binding sites in normal colorectal epithelium and CRC cell lines (Supplementary Table 8).
Functional assays for putative tumor suppressor CABLES2
Previous studies have indicated that the CABLES1 gene may be involved in cell growth and differentiation50, 51, and it appears to function as a tumor suppressor by inhibiting CRC formation and growth51–55. In line with these previous findings, our results showed that lower predicted expression of CABLES2 may mediate the effect of the lead SNP rs1741640 on increased CRC risk. Functional annotation indicates that rs1741640 is located in a region with promoter activity. To test whether rs1741640 regulates CABLES2 expression, we conducted luciferase reporter assays in both HCT116 and RKO cell lines for rs1741640 and for rs477859, a SNP in strong LD with rs1741640 (see Supplemental Notes). The fragment containing the risk allele (C) of rs1741640 significantly decreased the promoter activity of CABLES2 compared with the reference allele (T) in both cell lines (Fig. 2A). This observation was in line with the results of both the colocalization and TWAS analyses (Supplementary Table 6). For rs477859, the fragment containing the alternative allele did not significantly affect promoter activity compared with the reference allele (data not shown).
Figure 2. Dual-Luciferase reporter assay and CRC cell proliferation, colony formation, migration and invasion for the CABLES2 gene.

A) Boxplots showing that the alternative alleles (CC) of the functional SNP rs1741640 affect the promoter activity of CABLES2 compared with the reference alleles (TT), based on luciferase reporter assays in the HCT116 and RKO cell lines. The yellow and black boxes represent the CABLES2 promoter sequence and the enhancer element containing the candidate functional SNP, respectively. Error bars represent the standard deviation of promoter activities. A paired t test was performed to derive P values. B) The virus infection efficiency and C) the knockdown efficiency were verified by RFP fluorescence and qPCR, respectively, in CABLES2 shRNA-treated (sh1 and sh2) and vehicle control cells. After SW480 cells were transfected with CABLES2 shRNA, cell colonies were D) imaged, E) quantified, and F) expressed as a ratio in a colony formation assay for the CABLES2 shRNA and vehicle control cells. G) Cell viability was measured using CCK8 assays at different time points (Days 1, 2, 3, 4, 5), and H) migration and invasion were measured using Transwell assays for CABLES2 shRNA and vehicle control cells. P values were determined by t tests comparing knockdown and control cells. “*”, P < 0.05; “**”, P < 0.01; “***”, P < 0.001.
We further performed in vitro functional assays in CRC cell lines to investigate the cellular function of CABLES2. qPCR experiments were conducted to compare its relative expression in three CRC cell lines (SW480, RKO, and HCT116). CABLES2 showed higher expression in both the SW480 and HCT116 cell lines than in the RKO cell line (Supplementary Fig. 1). We therefore designed knockdown experiments in the SW480 and HCT116 cell lines using short hairpin RNA (shRNA). We used packaged lentivirus from lentiviral expression vectors containing the Red Fluorescent Protein (RFP) reporter gene open reading frame (ORF) and a fragment from CABLES2 shRNA-1, shRNA-2, or a control vehicle shRNA. Using the RFP fluorescence reporter system, we observed high transfection efficiencies for the control vehicle, shRNA-1, and shRNA-2 in SW480 cells (Fig. 2B). In SW480 cells, shRNA-1 reduced the endogenous CABLES2 mRNA level by approximately 90%, whereas shRNA-2 reduced it by 40% (Fig. 2C). CABLES2 shRNA-1 treatment significantly enhanced colony formation compared with the control vehicle, and CABLES2 shRNA-2 led to a mild increase (Fig. 2D–F; P < 0.001). Cell viability after CABLES2 shRNA-1 or shRNA-2 treatment was significantly increased compared with the control vehicle in the CCK8 assay (P < 0.01 for both; Fig. 2G). Further, shRNA-1 significantly increased cell migration and invasion compared with the control vehicle, and shRNA-2 also increased cell invasion in the Transwell assay in the SW480 and HCT116 cell lines (Fig. 2H and Supplementary Fig. 2).
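Knockdown efficiency from qPCR is commonly expressed with the 2^−ΔΔCt (Livak) method. The text does not state which quantification method was used, so the sketch below assumes ΔΔCt with about 100% amplification efficiency and normalization to a reference gene; the Ct values and function names are hypothetical. Under this model, the ~90% knockdown reported for shRNA-1 corresponds to ΔΔCt ≈ 3.3, and the 40% knockdown for shRNA-2 to ΔΔCt ≈ 0.74.

```python
def relative_expression(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """2^-ddCt relative quantification (assumes ~100% amplification efficiency)."""
    d_ct_kd = ct_target_kd - ct_ref_kd  # knockdown cells, normalized to a reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl  # control cells, same normalization
    return 2.0 ** -(d_ct_kd - d_ct_ctrl)


def knockdown_percent(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Percent reduction of target mRNA in knockdown versus control cells."""
    remaining = relative_expression(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl)
    return 100.0 * (1.0 - remaining)


# Hypothetical Ct values: a ddCt of about 3.32 corresponds to ~90% knockdown.
print(round(knockdown_percent(27.32, 18.0, 24.0, 18.0), 1))
```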
To explore potential genes and pathways regulated by CABLES2, we conducted RNA sequencing in CABLES2 shRNA-1 and control vehicle SW480 cells (see Supplemental Notes). We identified 169 genes differentially expressed in CABLES2 shRNA-1 cells compared with control vehicle cells, 85 up-regulated and 84 down-regulated (Supplementary Fig. 3A; see Supplemental Notes). An enrichment analysis using Kyoto Encyclopedia of Genes and Genomes (KEGG, see URLs) pathways revealed that these differentially expressed genes were significantly enriched in the PI3K/AKT cancer pathway (P = 7.2 × 10−3). Ten of these genes were involved in that pathway: nine up-regulated genes (TNC, ANTXR2, IL7R, CHN2, SGK1, TLR4, FN1, MET, and ARHGAP29) and only one down-regulated gene (IL2RG), suggesting that silencing CABLES2 may promote PI3K-AKT signaling (Supplementary Fig. 3B). We further validated these expression changes for the top seven genes using qPCR (Supplementary Fig. 3C).
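A common way to test pathway enrichment among differentially expressed genes is a one-sided hypergeometric test. The text does not specify the KEGG enrichment method, so the following is an assumption-labeled sketch; the background size, pathway size, and function name in the usage line are hypothetical and are not meant to reproduce the reported P = 7.2 × 10−3.

```python
from math import comb


def hypergeom_upper_tail(n_background, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap), X ~ Hypergeometric(n_background, n_pathway, n_selected)."""
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    hits = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return hits / total  # exact big-integer ratio, rounded once to float


# Hypothetical numbers: 169 DE genes from a 20,000-gene background,
# a 350-gene pathway, and 10 DE genes falling in that pathway.
print(f"{hypergeom_upper_tail(20_000, 350, 169, 10):.2e}")
```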
To confirm that CABLES2 can inhibit the growth of colonic carcinoma, we developed tumor xenografts by subcutaneous injection of SW480 CABLES2 knockdown and control shRNA cells into nude mice (see Supplemental Notes). Tumors derived from CABLES2 knockdown cells showed markedly increased volume and weight compared with those from control cells (Fig. 3A–C). To analyze cell proliferation in tumor tissue, Ki67 staining was performed on tumors from nude mice injected with knockdown or control cells. Tumors from CABLES2 knockdown cells showed more Ki67 foci than those from control cells (Fig. 3D): 77% of cells were Ki67-positive in knockdown-derived tumors compared with 35% in control tumors (Fig. 3E).
Figure 3. CABLES2 inhibits the growth of colonic carcinoma in nude mice.

A) Photograph of tumor formation in nude mice at day 48 after cell injection. B) Tumor volume, recorded by measuring tumor length, width, and height every 3–5 days during tumor growth and calculated as length × width × height/2. C) Quantification of tumor weight at day 48, after the mice were euthanized under anesthesia. D–E) Ki67 staining of tumor tissue and quantification of Ki67-positive cells in tumors from knockdown and control cells.
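The tumor-volume formula from the Figure 3 legend, V = length × width × height / 2, can be applied to a series of caliper measurements as in the sketch below; the measurement values and function name are hypothetical.

```python
def tumor_volume(length_mm, width_mm, height_mm):
    """Tumor volume (mm^3) = length x width x height / 2, as in the Figure 3 legend."""
    return length_mm * width_mm * height_mm / 2


# Hypothetical caliper measurements (length, width, height in mm) by day.
measurements = {10: (4.0, 3.0, 2.0), 14: (6.0, 4.5, 3.0), 18: (8.0, 6.0, 4.0)}
volumes = {day: tumor_volume(*dims) for day, dims in measurements.items()}
print(volumes)  # {10: 12.0, 14: 40.5, 18: 96.0}
```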
Discussion
Utilizing data from a large-scale GWAS collaboration together with data from GTEx and TCGA, we conducted a TWAS to search for susceptibility genes for CRC risk. We identified 25 putative CRC predisposition genes, including five novel genes located in regions far from any previously reported CRC susceptibility locus and nine novel genes as targets for established CRC GWAS loci. We also identified an additional 48 genes whose genetically predicted expression was associated with CRC risk at a relaxed threshold of FDR < 0.05; these genes warrant verification in future large-scale genetic studies. These findings substantially increase the number of potential susceptibility genes identified for CRC risk. In addition, through colocalization analysis we identified 38 putative target genes, including CABLES2, for the established GWAS loci, the majority of which were supported by TWAS analysis. Using both in vitro and in vivo functional assays, we further showed that the putative tumor suppressor CABLES2 plays a vital role in CRC tumorigenesis. Our findings provide novel insight into the genetic and biological basis of CRC development.
Our identification of these 25 putative susceptibility genes is supported by several additional lines of evidence. For 76% of them, we observed evidence of cis-regulation by putative functional risk SNPs via proximal promoter or distal enhancer-promoter interactions. These data suggest a possible link between each gene and a potential regulatory SNP. Functional assays, such as luciferase reporter assays in CRC cell lines, are needed to establish the underlying regulatory mechanisms, as demonstrated for CABLES2 (rs1741640) in our study. Furthermore, 11 of these genes have been implicated as putative target genes for lead SNPs based on previous eQTL analyses27–31 and our current colocalization analysis (Table 2). A particular lead SNP may be a surrogate for multiple risk variants in a locus, and some target genes of potential causal variants that are in weak LD with index SNPs may not be detected by an eQTL analysis. In addition, we conducted a differential expression analysis of the TWAS-identified genes among normal colon mucosa, adenoma, and adenocarcinoma using publicly available gene-expression data from 135 normal colon mucosa, 363 colon adenoma, and 2,760 colon adenocarcinoma samples (Supplementary Tables 5 and 6). Of the 25 genes identified in TWAS, seven, DACT1, CCHCR1, PARD6B, MYH7B, SFMBT1, PYGL, and ALDH2, showed significant differential expression between carcinoma and adenoma tissues at P < 0.05. For three of these genes, MYH7B, PYGL, and ALDH2, low predicted expression was associated with an increased risk of CRC, and expression was lower in carcinoma than in adenoma. For the remaining four genes, for which high predicted expression was associated with an increased risk of CRC, expression was higher in carcinoma than in adenoma, except for DACT1 (Supplementary Table 5).
These results provide additional evidence for a possible role of these genes in tumorigenesis. However, we did not observe the same pattern when comparing adenoma with normal mucosa. This may be due to a potential oncogenic role of these genes in the initiation of carcinogenesis at the adenoma stage. Notably, more than half of the identified genes have been implicated in cancer-driver events or cancer-related functions in in vitro or in vivo functional studies of CRC or other cancer types. These include RPL28 (ref. 56), CCHCR1 (refs. 57, 58), PARD6B (ref. 59), DACT1 (refs. 60, 61), LRP1 (refs. 62–64), SF3A3 (ref. 65), MAP1LC3A (ref. 66), ACTR1B (ref. 67), ATP6V1G2 (ref. 68), PYGL (refs. 69, 70), ARPC2 (refs. 71, 72), ALDH2 (refs. 73, 74), KCNQ1 (ref. 75) and SFMBT1 (ref. 76).
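The carcinoma-versus-adenoma comparison described above is essentially a direction check. It asks whether a gene is expressed more highly in carcinoma when higher predicted expression increases risk, or more lowly when lower predicted expression increases risk. The minimal sketch below encodes that check. The class and field names, the sign convention (a positive TWAS z-score means that higher predicted expression is associated with higher CRC risk) and the example values are all hypothetical, and none are taken from this study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneEvidence:
    name: str
    twas_z: float      # hypothetical convention: > 0 means higher predicted expression -> higher CRC risk
    de_log2fc: float   # log2 fold change of expression, carcinoma vs. adenoma
    de_pvalue: float   # P value of the differential expression test


def direction_concordant(g: GeneEvidence, alpha: float = 0.05) -> Optional[bool]:
    """Return True if the TWAS and differential-expression directions agree,
    False if they disagree, or None if the expression difference is not significant."""
    if g.de_pvalue >= alpha:
        return None  # no significant carcinoma vs. adenoma difference
    return (g.twas_z > 0) == (g.de_log2fc > 0)


if __name__ == "__main__":
    # Hypothetical genes and values for illustration only.
    genes = [
        GeneEvidence("GENE_A", twas_z=-5.1, de_log2fc=-0.8, de_pvalue=0.001),
        GeneEvidence("GENE_B", twas_z=4.7, de_log2fc=-0.3, de_pvalue=0.02),
        GeneEvidence("GENE_C", twas_z=5.0, de_log2fc=0.2, de_pvalue=0.40),
    ]
    for g in genes:
        print(g.name, direction_concordant(g))  # GENE_A True, GENE_B False, GENE_C None
```

Returning `None` keeps non-significant genes separate from discordant ones. This mirrors the text, which interprets direction only for the seven genes with P < 0.05.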
When building the gene-expression prediction models from GTEx, we used a relatively low prediction-performance cutoff of R2 > 0.01 for the downstream TWAS analysis. This cutoff corresponds to a correlation of 0.1 between predicted and observed expression. Nevertheless, all of the reported genes were well predicted, with R2 > 0.04. In addition, we evaluated the prediction performance of the GTEx-derived models using tumor tissues from TCGA. We performed additional analyses to adjust for possible confounders of gene expression in the TCGA tumor tissues. Even so, large independent samples of normal tissue would be preferable for evaluating prediction performance. Our reference panels were built using data from European populations only. Both GWAS and GTEx transcriptome data are currently limited for other racial groups, so we could not evaluate these susceptibility genes in non-European populations. Additional efforts are needed to generate transcriptome data for TWAS in those populations.
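The prediction-performance cutoffs above rest on the relation R2 = r², where r is the Pearson correlation between predicted and observed expression. Under this relation, R2 > 0.01 corresponds to r > 0.1 and R2 > 0.04 to r > 0.2. The minimal sketch below shows such a model filter. It assumes that performance is summarized as the squared Pearson correlation in an evaluation sample and that negatively correlated models are discarded. The function names and toy values are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def passes_r2_cutoff(predicted: Sequence[float], observed: Sequence[float],
                     cutoff: float = 0.01) -> bool:
    """Keep a gene model if R2 (squared Pearson r) exceeds the cutoff.

    Assumption: models whose predictions are negatively correlated with the
    observed expression are discarded, so this equals r > sqrt(cutoff).
    """
    r = pearson_r(predicted, observed)
    return r > 0 and r * r > cutoff


if __name__ == "__main__":
    # Hypothetical predicted vs. observed expression for one gene in five samples.
    predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
    observed = [2.0, 1.0, 4.0, 3.0, 5.0]
    print(pearson_r(predicted, observed))                      # about 0.8, so R2 is about 0.64
    print(passes_r2_cutoff(predicted, observed, cutoff=0.01))  # True
    print(passes_r2_cutoff(predicted, observed, cutoff=0.04))  # True
```

Changing `cutoff` from 0.01 to 0.04 expresses the stricter prediction performance that, according to the text, all reported genes met.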
Although many of the genes we identified have previously been implicated as CRC susceptibility genes, we provide additional evidence supporting their role. Furthermore, we identified 15 genes not previously reported for CRC susceptibility, including six genes located in regions not yet identified by GWAS. We performed functional genomic experiments for one gene, CABLES2, demonstrating that at least some genes identified in the TWAS are functionally important in colorectal carcinogenesis. Further in vitro and in vivo experiments will be needed to firmly establish a causal association between the other reported genes and CRC risk.
Supplementary Material
Background and Context.
Large-scale transcriptome-wide association studies (TWAS) to identify genetic risk loci and susceptibility genes for colorectal cancer (CRC) are lacking.
New Findings.
Five genes in four novel loci were found to be associated with CRC risk in individuals of European descent, along with an additional nine previously unreported genes in known GWAS loci.
Limitations.
Further functional assays are needed to establish the underlying regulatory mechanisms for the genetic risk variants and susceptibility genes identified in this study.
Impact.
Identification of CRC risk loci and susceptibility genes may help identify individuals at high risk, and these findings may have implications for genetic screening.
ACKNOWLEDGMENTS
We thank GTEx, TCGA, ENCODE and Roadmap for providing valuable data resources for this research. We thank Drs. Lang Wu, Yingchang Lu and Chenjie Zeng for valuable discussion and Marshal Younger for assistance with editing and manuscript preparation. The data analyses were conducted using the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University. This research was supported primarily by US National Institutes of Health grants R37 CA227130 to X.G. and R01 CA188214 to W.Z.
Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO): National Cancer Institute, National Institutes of Health, U.S. Department of Health and Human Services (U01 CA164930, U01 CA137088, R01 CA059045, R21 CA191312, R01 CA201407). Genotyping/Sequencing services were provided by the Center for Inherited Disease Research (CIDR) (X01-HG008596 and X01-HG007585). CIDR is fully funded through a federal contract from the National Institutes of Health to The Johns Hopkins University, contract number HHSN268201200008I. This research was funded in part through the NIH/NCI Cancer Center Support Grant P30 CA015704. PCS is supported by the National Institutes of Health (R01 CA160356, R01 CA193677, R01 CA204279 and R01 CA143237).
ASTERISK: a Hospital Clinical Research Program (PHRC-BRD09/C) from the University Hospital Center of Nantes (CHU de Nantes) and supported by the Regional Council of Pays de la Loire, the Groupement des Entreprises Françaises dans la Lutte contre le Cancer (GEFLUC), the Association Anne de Bretagne Génétique and the Ligue Régionale Contre le Cancer (LRCC).
The ATBC Study is supported by the Intramural Research Program of the U.S. National Cancer Institute, National Institutes of Health, and by U.S. Public Health Service contract HHSN261201500005C from the National Cancer Institute, Department of Health and Human Services.
CLUE II: This research was funded by the American Institute for Cancer Research and the NCI (P30 CA006973 to W.G. Nelson).
COLO2&3: National Institutes of Health (R01 CA60987).
ColoCare: This work was supported by the National Institutes of Health (grant numbers R01 CA189184 (Li/Ulrich), U01 CA206110 (Ulrich/Li/Siegel/Figueiredo/Colditz), 2P30CA015704-40 (Gilliland), R01 CA207371 (Ulrich/Li)), the Matthias Lackas-Foundation, the German Consortium for Translational Cancer Research, and the EU TRANSCAN initiative.
The Colon Cancer Family Registry (CFR) Illumina GWAS was supported by funding from the National Cancer Institute, National Institutes of Health (grant numbers U01 CA122839, R01 CA143247). The Colon CFR/CORECT Affymetrix Axiom GWAS and OncoArray GWAS were supported by funding from the National Cancer Institute, National Institutes of Health (grant number U19 CA148107 to S Gruber). The Colon CFR participant recruitment and collection of data and biospecimens used in this study were supported by the National Cancer Institute, National Institutes of Health (grant number UM1 CA167551) and through cooperative agreements with the following Colon CFR centers: Australasian Colorectal Cancer Family Registry (NCI/NIH grant numbers U01 CA074778 and U01/U24 CA097735), USC Consortium Colorectal Cancer Family Registry (NCI/NIH grant numbers U01/U24 CA074799), Mayo Clinic Cooperative Family Registry for Colon Cancer Studies (NCI/NIH grant number U01/U24 CA074800), Ontario Familial Colorectal Cancer Registry (NCI/NIH grant number U01/U24 CA074783), Seattle Colorectal Cancer Family Registry (NCI/NIH grant number U01/U24 CA074794), and University of Hawaii Colorectal Cancer Family Registry (NCI/NIH grant number U01/U24 CA074806). Additional support for case ascertainment was provided by the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute to Fred Hutchinson Cancer Research Center (Control Nos. N01-CN-67009 and N01-PC-35142, and Contract No. HHSN2612013000121), the Hawai’i Department of Health (Control Nos. N01-PC-67001 and N01-PC-35137, and Contract No. HHSN26120100037C), and the California Department of Public Health (contract HHSN261201000035C awarded to the University of Southern California). Support was also provided by the following state cancer registries: AZ, CO, MN, NC and NH, and by the Victoria Cancer Registry and Ontario Cancer Registry.
COLON: The COLON study is sponsored by Wereld Kanker Onderzoek Funds, including funds from grant 2014/1179 as part of the World Cancer Research Fund International Regular Grant Programme, by Alpe d’Huzes and the Dutch Cancer Society (UM 2012-5653, UW 2013-5927, UW2015-7946), and by TRANSCAN (JTC2012-MetaboCCC, JTC2013-FOCUS). The NQplus study is sponsored by a ZonMW investment grant (98-10030); by PREVIEW (PREVention of diabetes through lifestyle intervention and population studies in Europe and around the World), which received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant no. 312057; by funds from TI Food and Nutrition (cardiovascular health theme), a public-private partnership on pre-competitive research in food and nutrition; and by FOODBALL, the Food Biomarker Alliance, a project from JPI Healthy Diet for a Healthy Life.
Colorectal Cancer Transdisciplinary (CORECT) Study: The CORECT Study was supported by the National Cancer Institute, National Institutes of Health (NCI/NIH), U.S. Department of Health and Human Services (grant numbers U19 CA148107, R01 CA81488, P30 CA014089, R01 CA197350, P01 CA196569, R01 CA201407, and R01 CA143237) and the National Institute of Environmental Health Sciences, National Institutes of Health (grant number T32 ES013678).
CORSA: This study was funded by FFG BRIDGE (grant 829675 to Andrea Gsur), the “Herzfelder’sche Familienstiftung” (grant to Andrea Gsur) and was supported by COST Action BM1206.
CPS-II: The American Cancer Society funds the creation, maintenance, and updating of the Cancer Prevention Study-II (CPS-II) cohort. This study was conducted with Institutional Review Board approval.
CRCGEN: Colorectal Cancer Genetics & Genomics, Spanish study, was supported by Instituto de Salud Carlos III, co-funded by FEDER funds (“a way to build Europe”) (grants PI14-613 and PI09-1286), Agency for Management of University and Research Grants (AGAUR) of the Catalan Government (grant 2017SGR723), and Junta de Castilla y León (grant LE22A10-2). Sample collection of this work was supported by the Xarxa de Bancs de Tumors de Catalunya sponsored by Pla Director d’Oncologia de Catalunya (XBTC), Plataforma Biobancos PT13/0010/0013 and ICOBIOBANC, sponsored by the Catalan Institute of Oncology.
Czech Republic CCS: This work was supported by the Czech Science Foundation (grants 17-16857S and 18-09709S), the Grant Agency of the Ministry of Health of the Czech Republic (grants 15-27580A and 17-30920A), the Charles University Research Centre program (UNCE/MED/006), the Charles University Research Fund (Progres Q39 and Q28), and MEYS CR, financed from EFRR (CZ.02.1.01/0.0/0.0/16_019/0000787).
DACHS: This work was supported by the German Research Council (BR 1704/6-1, BR 1704/6-3, BR 1704/6-4, BR 1704/6-6, CH 117/1-1, HO 5117/2-1, HE 5998/2-1, KL 2354/3-1, RO 2270/8-1 and BR 1704/17-1), the Interdisciplinary Research Program of the National Center for Tumor Diseases (NCT), Germany, and the German Federal Ministry of Education and Research (01KH0404, 01ER0814, 01ER0815, 01ER1505A and 01ER1505B).
DALS: National Institutes of Health (R01 CA48998 to M. L. Slattery).
EDRN: This work is funded and supported by the NCI, EDRN Grant (U01 CA 84968-06).
EPIC: The coordination of EPIC is financially supported by the School of Public Health, Imperial College London and the International Agency for Research on Cancer. The national cohorts are supported by Danish Cancer Society (Denmark); Ligue Contre le Cancer, Institut Gustave Roussy, Mutuelle Générale de l’Education Nationale, Institut National de la Santé et de la Recherche Médicale (INSERM) (France); German Cancer Aid, German Cancer Research Center (DKFZ), Federal Ministry of Education and Research (BMBF), Deutsche Krebshilfe, Deutsches Krebsforschungszentrum and Federal Ministry of Education and Research (Germany); the Hellenic Health Foundation (Greece); Associazione Italiana per la Ricerca sul Cancro-AIRC-Italy and National Research Council (Italy); Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF), Statistics Netherlands (The Netherlands); ERC-2009-AdG 232997 and Nordforsk, Nordic Centre of Excellence programme on Food, Nutrition and Health (Norway); Health Research Fund (FIS), PI13/00061 to Granada, PI13/01162 to EPIC-Murcia, Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, ISCIII RETIC (RD06/0020) (Spain); Swedish Cancer Society, Swedish Research Council and County Councils of Skåne and Västerbotten (Sweden); Cancer Research UK (14136 to EPIC-Norfolk; C570/A16491 and C8221/A19170 to EPIC-Oxford), Medical Research Council (1000143 to EPIC-Norfolk, MR/M012190/1 to EPIC-Oxford) (United Kingdom).
Disclaimer: Where authors are identified as personnel of the International Agency for Research on Cancer / World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer / World Health Organization.
EPICOLON: This work was supported by grants from Fondo de Investigación Sanitaria/FEDER (PI08/0024, PI08/1276, PS09/02368, P111/00219, PI11/00681, PI14/00173, PI14/00230, PI17/00509, 17/00878, Acción Transversal de Cáncer), Xunta de Galicia (PGIDIT07PXIB9101209PR), Ministerio de Economia y Competitividad (SAF07-64873, SAF 2010-19273, SAF2014-54453R), Fundación Científica de la Asociación Española contra el Cáncer (GCB13131592CAST), Beca Grupo de Trabajo “Oncología” AEG (Asociación Española de Gastroenterología), Fundación Privada Olga Torres, FP7 CHIBCHA Consortium, Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR, Generalitat de Catalunya, 2014SGR135, 2014SGR255, 2017SGR21, 2017SGR653), Catalan Tumour Bank Network (Pla Director d’Oncologia, Generalitat de Catalunya), PERIS (SLT002/16/00398, Generalitat de Catalunya), CERCA Programme (Generalitat de Catalunya) and COST Action BM1206. CIBERehd is funded by the Instituto de Salud Carlos III.
ESTHER/VERDI: This work was supported by grants from the Baden-Württemberg Ministry of Science, Research and Arts and the German Cancer Aid.
Harvard cohorts (HPFS, NHS, PHS): HPFS is supported by the National Institutes of Health (P01 CA055075, UM1 CA167552, U01 CA167552, R01 CA137178, R01 CA151993, R35 CA197735, K07 CA190673, and P50 CA127003), NHS by the National Institutes of Health (R01 CA137178, P01 CA087969, UM1 CA186107, R01 CA151993, R35 CA197735, K07 CA190673, and P50 CA127003) and PHS by the National Institutes of Health (R01 CA042182).
Hawaii Adenoma Study: NCI grant R01 CA72520.
HCES-CRC (Hwasun Cancer Epidemiology Study-Colon and Rectum Cancer): grants from Chonnam National University Hwasun Hospital (HCRI15011-1), and the National Institutes of Health (R01 CA188214).
Kentucky: This work was supported by a Clinical Investigator Award from the Damon Runyon Cancer Research Foundation (CI-8) and by NCI grant R01CA136726.
LCCS: The Leeds Colorectal Cancer Study was funded by the Food Standards Agency and Cancer Research UK Programme Award (C588/A19167).
MCCS cohort recruitment was funded by VicHealth and Cancer Council Victoria. The MCCS was further supported by Australian NHMRC grants 209057, 396414 and 1074383, and by infrastructure provided by Cancer Council Victoria. Cases and their vital statuses were ascertained through the Victorian Cancer Registry (VCR) and the Australian Institute of Health and Welfare (AIHW), including the National Death Index and the Australian Cancer Database.
MEC: National Institutes of Health (R37 CA54281, P01 CA033619, R01 CA063464, and U01 CA164973).
MECC: This work was supported by the National Institutes of Health, U.S. Department of Health and Human Services (R01 CA81488 to SBG, R01 CA197350 to SBG, U19 CA148107 to SBG, N01 CN043302 assigned to SBG, 5P01 CA196569 to WG, P30 CA014089 to SBG, R01 CA144040 to SDM, and P50 CA150964 to SDM), as well as funding from the Ravitz Foundation, the Irving Weinstein Foundation, the Anton B. Burg Foundation, the Jane and Kris Popovich Chair in Cancer Research, and a generous gift from Daniel and Maryann Fong.
MSKCC: The work at Memorial Sloan Kettering Cancer Center in New York was supported by the Robert and Kate Niehaus Center for Inherited Cancer Genomics and the Romeo Milio Foundation. It is also supported by the National Cancer Institute-designated Comprehensive Cancer Center (grant number P30 CA008748).
NCCCS I & II: We acknowledge funding support for this project from the National Institutes of Health, R01 CA66635 and P30 DK034987.
NFCCR: This work was supported by an Interdisciplinary Health Research Team award from the Canadian Institutes of Health Research (CRT 43821); the National Institutes of Health, U.S. Department of Health and Human Services (U01 CA74783); and National Cancer Institute of Canada grants (18223 and 18226). Funding was provided to Michael O. Woods by the Canadian Cancer Society Research Institute.
NSHDS: NSHDS investigators thank the Biobank Research Unit at Umeå University, the Västerbotten Intervention Programme, the Northern Sweden MONICA study and Region Västerbotten for providing data and samples and acknowledge the contribution from Biobank Sweden, supported by the Swedish Research Council (VR 2017-00650). This research was also supported by funding to BVG from the Swedish Cancer Society (CAN 2017/581); the Swedish Research Council (VR 2017-01737); Region Västerbotten (VLL-841671, VLL-833291); the Lion’s Cancer Research Foundation (several grants); the Faculty of Medicine and Insamlingsstiftelsen at Umeå University; and the Margareta Dannborg Memorial Fund.
OFCCR: National Institutes of Health, through funding allocated to the Ontario Registry for Studies of Familial Colorectal Cancer (U01 CA074783); see CCFR section above. Additional funding toward genetic analyses of OFCCR includes the Ontario Research Fund, the Canadian Institutes of Health Research, and the Ontario Institute for Cancer Research, through generous support from the Ontario Ministry of Research and Innovation.
OSUWMC: OCCPI funding was provided by Pelotonia and HNPCC funding was provided by the NCI (CA16058 and CA67941).
PLCO: Intramural Research Program of the Division of Cancer Epidemiology and Genetics and supported by contracts from the Division of Cancer Prevention, National Cancer Institute, NIH, DHHS. Funding was provided by National Institutes of Health (NIH), Genes, Environment and Health Initiative (GEI) Z01 CP 010200, NIH U01 HG004446, and NIH GEI U01 HG 004438.
PMH: National Institutes of Health (R01 CA076366 to P.A. Newcomb).
SEARCH: The University of Cambridge has received salary support in respect of PDPP from the NHS in the East of England through the Clinical Academic Reserve. Additional support was provided by Cancer Research UK (C490/A16561) and the UK National Institute for Health Research Biomedical Research Centres at the University of Cambridge.
SELECT: The Selenium and Vitamin E Cancer Prevention Trial (SELECT) was supported by the National Cancer Institute of the National Institutes of Health under Award Numbers UM1CA182883 and U10CA37429.
SMS: This work was supported by the National Cancer Institute (grant P01 CA074184 to J.D.P. and P.A.N.; grants R01 CA097325, R03 CA153323, and K05 CA152715 to P.A.N.) and the National Center for Advancing Translational Sciences at the National Institutes of Health (grant KL2 TR000421 to A.N.B.-H.).
The Swedish Low-risk Colorectal Cancer Study: The study was supported by grants from the Swedish Research Council (K2015-55X-22674-01-4, K2008-55X-20157-03-3 and K2006-72X-20157-01-2) and the Stockholm County Council (ALF project).
Swedish Mammography Cohort and Cohort of Swedish Men: This work is supported by grants from the Swedish Cancer Foundation, the Swedish Research Council for the Swedish Infrastructure for Medical Population-based Life-course Environmental Research (SIMPLER) and Karolinska Institutet’s Distinguished Professor Award to Alicja Wolk.
VITAL: National Institutes of Health (K05 CA154337).
WHI: The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts HHSN268201100046C, HHSN268201100001C, HHSN268201100002C, HHSN268201100003C, HHSN268201100004C, and HHSN271201100004C.
Footnotes
Publisher's Disclaimer: This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Conflict of Interest:
The authors disclose no conflicts.
References:
- 1. Palles C, Cazier JB, Howarth KM, et al. Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nat Genet 2013;45:136–44.
- 2. Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 2012;487:330–7.
- 3. Fearon ER. Molecular genetics of colorectal cancer. Annu Rev Pathol 2011;6:479–507.
- 4. Zeng C, Matsuda K, Jia WH, et al. Identification of Susceptibility Loci and Genes for Colorectal Cancer Risk. Gastroenterology 2016.
- 5. Michailidou K, Beesley J, Lindstrom S, et al. Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer. Nat Genet 2015;47:373–80.
- 6. Al-Tassan NA, Whiffin N, Hosking FJ, et al. A new GWAS and meta-analysis with 1000Genomes imputation identifies novel risk variants for colorectal cancer. Sci Rep 2015;5:10442.
- 7. Wang H, Burnett T, Kono S, et al. Trans-ethnic genome-wide association study of colorectal cancer identifies a new susceptibility locus in VTI1A. Nat Commun 2014;5:4613.
- 8. Schmit SL, Schumacher FR, Edlund CK, et al. A novel colorectal cancer risk locus at 4q32.2 identified from an international genome-wide association study. Carcinogenesis 2014;35:2512–9.
- 9. Zhang B, Jia WH, Matsuda K, et al. Large-scale genetic study in East Asians identifies six new loci associated with colorectal cancer risk. Nat Genet 2014;46:533–42.
- 10. Figueiredo JC, Hsu L, Hutter CM, et al. Genome-wide diet-gene interaction analyses for risk of colorectal cancer. PLoS Genet 2014;10:e1004228.
- 11. Whiffin N, Hosking FJ, Farrington SM, et al. Identification of susceptibility loci for colorectal cancer in a genome-wide meta-analysis. Hum Mol Genet 2014;23:4729–37.
- 12. Zhang B, Jia WH, Matsuo K, et al. Genome-wide association study identifies a new SMAD7 risk variant associated with colorectal cancer risk in East Asians. Int J Cancer 2014;135:948–55.
- 13. Peters U, Jiao S, Schumacher FR, et al. Identification of Genetic Susceptibility Loci for Colorectal Tumors in a Genome-Wide Meta-analysis. Gastroenterology 2013;144:799–807 e24.
- 14. Dunlop MG, Dobbins SE, Farrington SM, et al. Common variation near CDKN1A, POLD3 and SHROOM2 influences colorectal cancer risk. Nat Genet 2012;44:770–6.
- 15. Peters U, Hutter CM, Hsu L, et al. Meta-analysis of new genome-wide association studies of colorectal cancer risk. Hum Genet 2012;131:217–34.
- 16. Tomlinson IP, Carvajal-Carmona LG, Dobbins SE, et al. Multiple common susceptibility variants near BMP pathway loci GREM1, BMP4, and BMP2 explain part of the missing heritability of colorectal cancer. PLoS Genet 2011;7:e1002105.
- 17. Cui R, Okada Y, Jang SG, et al. Common variant in 6q26-q27 is associated with distal colon cancer in an Asian population. Gut 2011;60:799–805.
- 18. Huyghe JR, Bien SA, Harrison TA, et al. Discovery of common and rare genetic risk variants for colorectal cancer. Nat Genet 2019;51:76–87.
- 19. Schmit SL, Edlund CK, Schumacher FR, et al. Novel Common Genetic Susceptibility Loci for Colorectal Cancer. J Natl Cancer Inst 2019;111:146–157.
- 20. Law PJ, Timofeeva M, Fernandez-Rozadilla C, et al. Association analyses identify 31 new risk loci for colorectal cancer susceptibility. Nat Commun 2019;10:2154.
- 21. Lu Y, Kweon SS, Tanikawa C, et al. Large-Scale Genome-Wide Association Study of East Asians Identifies Loci Associated With Risk for Colorectal Cancer. Gastroenterology 2019;156:1455–1466.
- 22. Gusev A, Lee SH, Trynka G, et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am J Hum Genet 2014;95:535–52.
- 23. Guo X, Cai Q, Bao P, et al. Long-term soy consumption and tumor tissue MicroRNA and gene expression in triple-negative breast cancer. Cancer 2016;122:2544–51.
- 24. Zeng C, Guo X, Long J, et al. Identification of independent association signals and putative functional variants for breast cancer risk through fine-scale mapping of the 12p11 locus. Breast Cancer Res 2016;18:64.
- 25. Guo X, Lin W, Bao J, et al. A Comprehensive cis-eQTL Analysis Revealed Target Genes in Breast Cancer Susceptibility Loci Identified in Genome-wide Association Studies. Am J Hum Genet 2018;102:890–903.
- 26. Chen Z, Wen W, Beeghly-Fadiel A, et al. Identifying Putative Susceptibility Genes and Evaluating Their Associations with Somatic Mutations in Human Cancers. Am J Hum Genet 2019.
- 27. Biancolella M, Fortini BK, Tring S, et al. Identification and characterization of functional risk variants for colorectal cancer mapping to chromosome 11q23.1. Hum Mol Genet 2014;23:2198–209.
- 28. Closa A, Cordero D, Sanz-Pamplona R, et al. Identification of candidate susceptibility genes for colorectal cancer through eQTL analysis. Carcinogenesis 2014;35:2039–46.
- 29. Peltekova VD, Lemire M, Qazi AM, et al. Identification of genes expressed by immune cells of the colon that are regulated by colorectal cancer-associated variants. Int J Cancer 2014;134:2330–41.
- 30. Hofer P, Hagmann M, Brezina S, et al. Bayesian and frequentist analysis of an Austrian genome-wide association study of colorectal cancer and advanced adenomas. Oncotarget 2017;8:98623–98634.
- 31. Chen Z, Wen W, Beeghly-Fadiel A, et al. Identifying Putative Susceptibility Genes and Evaluating Their Associations with Somatic Mutations in Human Cancers. Am J Hum Genet 2019;In press.
- 32. Rao SS, Huntley MH, Durand NC, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 2014;159:1665–80.
- 33. Jin F, Li Y, Dixon JR, et al. A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature 2013;503:290–4.
- 34. Mifsud B, Tavares-Cadete F, Young AN, et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat Genet 2015;47:598–606.
- 35. Jager R, Migliorini G, Henrion M, et al. Capture Hi-C identifies the chromatin interactome of colorectal cancer risk loci. Nat Commun 2015;6:6178.
- 36. Gamazon ER, Wheeler HE, Shah KP, et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet 2015;47:1091–8.
- 37. Bien SA, Su YR, Conti DV, et al. Genetic variant predictors of gene expression provide new insight into risk of colorectal cancer. Hum Genet 2019;138:307–326.
- 38. Su YR, Di C, Bien S, et al. A Mixed-Effects Model for Powerful Association Tests in Integrative Functional Genomics. Am J Hum Genet 2018;102:904–919.
- 39. Wu L, Shu X, Bao J, et al. Analysis of Over 140,000 European Descendants Identifies Genetically Predicted Blood Protein Biomarkers Associated with Prostate Cancer Risk. Cancer Res 2019.
- 40. Guo X, Long J, Chen Z, et al. Discovery of rare coding variants in OGDHL and BRCA2 in relation to breast cancer risk in Chinese women. Int J Cancer 2020;146:2175–2181.
- 41. Guo X, Shi J, Cai Q, et al. Use of deep whole-genome sequencing data to identify structure risk variants in breast cancer susceptibility genes. Hum Mol Genet 2018;27:853–859.
- 42. Stegle O, Parts L, Piipari M, et al. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat Protoc 2012;7:500–7.
- 43. Price AL, Patterson NJ, Plenge RM, et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006;38:904–9.
- 44. Barbeira AN, Dickinson SP, Bonazzola R, et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat Commun 2018;9:1825.
- 45. Lu Y, Beeghly-Fadiel A, Wu L, et al. A Transcriptome-Wide Association Study Among 97,898 Women to Identify Candidate Susceptibility Genes for Epithelial Ovarian Cancer Risk. Cancer Res 2018;78:5419–5430.
- 46. Wu L, Shi W, Long J, et al. A transcriptome-wide association study of 229,000 women identifies new candidate susceptibility genes for breast cancer. Nat Genet 2018;50:968–978.
- 47. Yang J, Ferreira T, Morris AP, et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 2012;44:369–75, S1–3.
- 48. Zhu Z, Zhang F, Hu H, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet 2016;48:481–7.
- 49. Gusev A, Ko A, Shi H, et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet 2016;48:245–52.
- 50. Zukerberg LR, DeBernardo RL, Kirley SD, et al. Loss of cables, a cyclin-dependent kinase regulatory protein, is associated with the development of endometrial hyperplasia and endometrial cancer. Cancer Res 2004;64:202–8.
- 51. Wu CL, Kirley SD, Xiao H, et al. Cables enhances cdk2 tyrosine 15 phosphorylation by Wee1, inhibits cell growth, and is lost in many human colon and squamous cancers. Cancer Res 2001;61:7325–32.
- 52. Bonifant CL, Waldman T. ‘Cables’ suspends cancer in mice. Cancer Biol Ther 2005;4:864–5.
- 53. Kirley SD, D’Apuzzo M, Lauwers GY, et al. The Cables gene on chromosome 18Q regulates colon cancer progression in vivo. Cancer Biol Ther 2005;4:861–3.
- 54. Park DY, Sakamoto H, Kirley SD, et al. The Cables gene on chromosome 18q is silenced by promoter hypermethylation and allelic loss in human colorectal cancer. Am J Pathol 2007;171:1509–19.
- 55. Arnason T, Pino MS, Yilmaz O, et al. Cables1 is a tumor suppressor gene that regulates intestinal tumor progression in Apc(Min) mice. Cancer Biol Ther 2013;14:672–8.
- 56. Labriet A, Levesque E, Cecchin E, et al. Germline variability and tumor expression level of ribosomal protein gene RPL28 are associated with survival of metastatic colorectal cancer patients. Sci Rep 2019;9:13008.
- 57. Suomela S, Elomaa O, Skoog T, et al. CCHCR1 is up-regulated in skin cancer and associated with EGFR expression. PLoS One 2009;4:e6030.
- 58. Chang J, Zhong R, Tian J, et al. Exome-wide analyses identify low-frequency variant in CYP26B1 and additional coding variants associated with esophageal squamous cell carcinoma. Nat Genet 2018;50:338–343.
- 59. Cunliffe HE, Jiang Y, Fornace KM, et al. PAR6B is required for tight junction formation and activated PKCzeta localization in breast cancer. Am J Cancer Res 2012;2:478–91.
- 60. Shi X, Huo J, Gao X, et al. A newly identified lncRNA H1FX-AS1 targets DACT1 to inhibit cervical cancer via sponging miR-324-3p. Cancer Cell Int 2020;20:358.
- 61. Zhu K, Jiang B, Yang Y, et al. DACT1 overexpression inhibits proliferation, enhances apoptosis, and increases daunorubicin chemosensitivity in KG-1alpha cells. Tumour Biol 2017;39:1010428317711089.
- 62.Song H, Li Y, Lee J, et al. Low-density lipoprotein receptor-related protein 1 promotes cancer cell migration and invasion by inducing the expression of matrix metalloproteinases 2 and 9. Cancer Res 2009;69:879–86. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Boye K, Pujol N, I DA, et al. The role of CXCR3/LRP1 cross-talk in the invasion of primary brain tumors. Nat Commun 2017;8:1571. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Tian Y, Wang C, Chen S, et al. Extracellular Hsp90alpha and clusterin synergistically promote breast cancer epithelial-to-mesenchymal transition and metastasis via LRP1. J Cell Sci 2019;132. [DOI] [PubMed] [Google Scholar]
- 65.Siebring-van Olst E, Blijlevens M, de Menezes RX, et al. A genome-wide siRNA screen for regulators of tumor suppressor p53 activity in human non-small cell lung cancer cells identifies components of the RNA splicing machinery as targets for anticancer treatment. Mol Oncol 2017;11:534–551. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Hamurcu Z, Delibasi N, Gecene S, et al. Targeting LC3 and Beclin-1 autophagy genes suppresses proliferation, survival, migration and invasion by inhibition of Cyclin-D1 and uPAR/Integrin beta1/Src signaling in triple negative breast cancer cells. J Cancer Res Clin Oncol 2018;144:415–430. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Di Simone N, Hall HA, Welt C, et al. Activin regulates betaA-subunit and activin receptor messenger ribonucleic acid and cellular proliferation in activin-responsive testicular tumor cells. Endocrinology 1998;139:1147–55. [DOI] [PubMed] [Google Scholar]
- 68.Pacifici R, Civitelli R, Rifas L, et al. Does interleukin-1 affect intracellular calcium in osteoblast-like cells (UMR-106)? J Bone Miner Res 1988;3:107–11. [DOI] [PubMed] [Google Scholar]
- 69.Favaro E, Bensaad K, Chong MG, et al. Glucose utilization via glycogen phosphorylase sustains proliferation and prevents premature senescence in cancer cells. Cell Metab 2012;16:751–64. [DOI] [PubMed] [Google Scholar]
- 70.Terashima M, Fujita Y, Togashi Y, et al. KIAA1199 interacts with glycogen phosphorylase kinase beta-subunit (PHKB) to promote glycogen breakdown and cancer cell survival. Oncotarget 2014;5:7040–50. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Cheng Z, Wei W, Wu Z, et al. ARPC2 promotes breast cancer proliferation and metastasis. Oncol Rep 2019;41:3189–3200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Yoon YJ, Han YM, Choi J, et al. Benproperine, an ARPC2 inhibitor, suppresses cancer cell migration and tumor metastasis. Biochem Pharmacol 2019;163:46–59. [DOI] [PubMed] [Google Scholar]
- 73.Li K, Guo W, Li Z, et al. ALDH2 Repression Promotes Lung Tumor Progression via Accumulated Acetaldehyde and DNA Damage. Neoplasia 2019;21:602–614. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Seo W, Gao Y, He Y, et al. ALDH2 deficiency promotes alcohol-associated liver cancer by activating oncogenic pathways via oxidized DNA-enriched extracellular vesicles. J Hepatol 2019;71:1000–1011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Rapetti-Mauss R, Bustos V, Thomas W, et al. Bidirectional KCNQ1:beta-catenin interaction drives colorectal cancer cell differentiation. Proc Natl Acad Sci U S A 2017;114:4159–4164. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Hulur I, Gamazon ER, Skol AD, et al. Enrichment of inflammatory bowel disease and colorectal cancer risk variants in colon expression quantitative trait loci. BMC Genomics 2015;16:138. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
