Abstract
Background
Colorectal cancer (CRC) affects approximately 1.9 million people globally each year. While CRC development involves complex genetic and environmental interactions, the underlying molecular mechanisms remain incompletely understood. This study integrated multi-omics data to investigate gene-CRC associations across methylation, expression, and protein levels.
Methods
We obtained summary statistics from methylation QTL (mQTL), expression QTL (eQTL), and protein QTL (pQTL) studies. CRC genetic associations were derived from a meta-analysis of 31 GWAS datasets (100,204 cases, 154,587 controls). Summary data-based Mendelian randomization (SMR) analysis assessed associations between molecular features and CRC risk, followed by colocalization analysis to identify shared causal variants. Functional enrichment analysis was performed using Gene Ontology (GO) and KEGG pathway databases.
Results
SMR analysis identified 2,387 methylation associations (837 genes) and 707 expression associations with CRC. Integration revealed 158 overlapping genes, with six proteins (CCM2, FTCD, ICAM1, LTA, PCSK7, TNFSF14) validated for CRC association. Four genes—CCM2, FTCD, ICAM1, and TNFSF14—showed consistent effects across expression and protein levels. Higher CCM2 and TNFSF14 levels were protective, while higher FTCD and ICAM1 levels increased CRC risk. Colocalization analysis confirmed that CCM2 (PPH4 = 0.857) and ICAM1 (PPH4 = 0.812) share genetic variants with CRC. Functional enrichment analysis revealed significant involvement in immune-related processes, including interferon-gamma signaling, antigen presentation, and NF-kappa B pathway, as well as cell adhesion and endoplasmic reticulum functions.
Conclusions
Our multi-omics integration identified CCM2 and ICAM1 as genes causally associated with CRC risk through shared genetic architecture. Functional analysis highlighted their roles in immune regulation and cell adhesion processes. These findings enhance understanding of CRC pathogenesis and highlight potential therapeutic targets.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12885-025-14798-2.
Keywords: Colorectal cancer, Mendelian randomization, Key genes
Introduction
Colorectal cancer (CRC) is the third most common malignancy globally and occupies an important position in the worldwide cancer spectrum [1]. As one of the leading causes of cancer-related death, the treatment and prevention of CRC are particularly important. Although traditional colonoscopy is considered the gold standard for diagnosis, its invasive nature, time consumption, and high cost make early screening and diagnosis challenging. Therefore, there is an urgent need to develop non-invasive and accurate early screening and diagnostic biomarkers to improve CRC detection.
The development of CRC generally involves three types of genetic and epigenetic alterations: (i) Chromosomal instability caused by various factors, including loss or gain of chromosomal fragments and rearrangements that lead to genetic instability and loss of heterozygosity [2]; (ii) Alterations in the CpG island methylator phenotype [3, 4]; and (iii) Microsatellite instability in DNA regions, which accounts for a substantial proportion of CRC cases [5]. Additionally, well-established pathways such as WNT signaling and mismatch repair mechanisms play crucial roles in CRC pathogenesis. WNT pathway dysregulation leads to aberrant cell proliferation and differentiation, while defective mismatch repair systems result in microsatellite instability and increased mutation rates. Since these genetic variations involve multiple mechanisms and pathways, it is necessary to investigate CRC at various genetic levels—including gene methylation, gene expression, and circulating protein abundance—to achieve a multidimensional understanding.
Each omics layer provides unique mechanistic insights into CRC biology. DNA methylation patterns serve as crucial epigenetic regulatory mechanisms, where examining methylation levels at specific CpG sites can identify genes whose expression may be silenced or activated due to epigenetic modifications. This layer provides insights into early regulatory events that may predispose cells to malignant transformation. Gene expression data reflects the direct consequences of regulatory mechanisms, including both genetic and epigenetic factors. By analyzing transcriptomic profiles, we can observe the functional impact of these regulatory changes, helping understand biological processes that are upregulated or downregulated in CRC. Proteomic data offers a functional dimension to our understanding, as proteins are the ultimate executors of biological functions. Protein abundance and activity can be influenced by multiple factors, including transcription, translation, and post-translational modifications, making this layer critical for linking molecular changes to disease phenotypes and identifying potential therapeutic targets.
Mendelian randomization (MR) analysis uses genetic variants as instrumental variables to enhance the understanding of causal relationships between exposures and outcomes. Compared with observational studies, MR is less susceptible to confounding and reverse causation, as genetic variants are randomly allocated at conception and are unaffected by disease onset. With the increasing availability of large-scale genome-wide association studies (GWAS) and molecular QTL data, it is now feasible to explore the causal relationships between gene regulation and CRC through methylation, expression, and protein levels.
In our study, we used summary data–based Mendelian randomization (SMR) analysis to investigate the potential associations between gene methylation, gene expression, and CRC risk. Because proteins are the ultimate products of gene expression, serve as effectors for most biological processes, and are the targets of many drugs, validating at the protein level the CRC associations of genes with positive signals in both methylation and expression is fundamental to establishing causality. By integrating multi-omics evidence and employing functional enrichment analysis, our approach aims to complement and expand upon established CRC mechanisms, potentially identifying novel contributors to CRC risk while reinforcing the importance of known pathways such as WNT signaling and mismatch repair systems.
Methods
Sources of exposure data
The integration of multi-omics data enables us to reveal the underlying molecular network regulation of CRC at various levels. QTL analyses help uncover the associations between single nucleotide polymorphisms (SNPs) and DNA methylation levels, gene expression, and protein abundance.
In this study, summary data for gene methylation, gene expression, and protein abundance were obtained from relevant methylation, expression, and protein QTL studies. Specifically, blood SNP-CpG association data were obtained from an mQTL dataset by McRae et al., which involved 1,980 individuals of European descent [6]. Blood expression QTL (eQTL) data were extracted from the eQTLGen consortium, which included 31,684 individuals [7]. Protein validation data were derived from a pQTL study by Ferkingstad et al., involving summary statistics from 35,559 Icelanders, with protein levels adjusted for age, gender, and sample age [8].
Sources of outcome data
The CRC outcome data were derived from a meta-analysis using a fixed-effect inverse-variance weighted model from 31 GWAS datasets, including 100,204 CRC cases and 154,587 controls of European ancestry [9].
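As an illustration of the fixed-effect inverse-variance weighted model mentioned above, the following minimal sketch pools per-study effect estimates. The function name and the example numbers are hypothetical and are not taken from the underlying GWAS meta-analysis.

```python
import math

def fixed_effect_ivw(betas, ses):
    """Pool per-study effects with inverse-variance weights (fixed-effect model)."""
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]        # w_i = 1 / se_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)                    # standard error of the pooled effect
    return beta, se

# Hypothetical log odds ratios and standard errors for two studies (placeholders).
pooled_beta, pooled_se = fixed_effect_ivw([0.10, 0.30], [0.10, 0.10])
```

With equal standard errors, the pooled estimate is simply the mean of the two study effects, and the pooled standard error shrinks by a factor of the square root of two.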
Summary data–based Mendelian randomization analysis and HEIDI test
SMR analysis and the HEIDI test were used to evaluate the associations between gene methylation, gene expression, and CRC risk. Compared with other methods, the SMR and HEIDI approaches can distinguish between pleiotropy and linkage models [10]. For analyses based on top-associated cis-QTLs, when the exposure and outcome data are obtained from two large independent samples, the SMR-multi method offers higher statistical power than traditional MR analysis.
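For a single top cis-QTL, the SMR effect estimate and its approximate test statistic, as commonly described for the method, can be sketched as follows. This is an illustrative re-derivation under stated assumptions, not the SMR software itself: the function and variable names are hypothetical, the standard error uses a first-order delta-method approximation, and the example inputs are placeholders.

```python
import math

def smr_estimate(b_zx, se_zx, b_zy, se_zy):
    """Return (b_xy, se_xy, t_smr, p_smr) for one instrument.

    b_zx, se_zx: SNP effect on the molecular trait (QTL) and its standard error.
    b_zy, se_zy: SNP effect on the outcome (GWAS) and its standard error.
    """
    if b_zx == 0:
        raise ValueError("the instrument must have a non-zero QTL effect")
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    # Ratio estimate of the effect of the molecular trait on the outcome.
    b_xy = b_zy / b_zx
    # Delta-method approximation of the standard error of the ratio.
    se_xy = math.sqrt(b_zy ** 2 * se_zx ** 2 / b_zx ** 4 + se_zy ** 2 / b_zx ** 2)
    # Approximate chi-square statistic with 1 degree of freedom.
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Upper-tail probability of a 1-df chi-square, via the normal tail.
    p_smr = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, se_xy, t_smr, p_smr

# Placeholder summary statistics (not study data).
b_xy, se_xy, t_smr, p_smr = smr_estimate(b_zx=0.5, se_zx=0.05, b_zy=0.1, se_zy=0.02)
```

Because the statistic is bounded by the weaker of the two z-scores, a strong QTL instrument alone cannot produce a significant SMR result without a corresponding outcome association.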
We primarily focused on cis-QTLs due to their direct regulatory relationship with nearby genes, which is a common approach in genetic association studies. Top-associated cis-QTLs were selected within a ± 1,000 kb window centered on the gene and filtered using a P-value threshold (5.0 × 10^−8 in this study). To explore the potential impact of trans signals and address concerns about missing relevant trans associations, we conducted preliminary trans-QTL analyses using available data. While trans signal identification is more challenging due to smaller effect sizes and stringent multiple testing corrections, we included these findings to provide a more comprehensive view of the genetic architecture underlying CRC.
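The cis-QTL selection rule described above can be sketched as a simple filter. This sketch assumes each gene is represented by a single reference coordinate on which the window is centered; the record type, function name, and SNP identifiers are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence

CIS_WINDOW_BP = 1_000_000   # +/- 1,000 kb window, as stated in the text
P_THRESHOLD = 5.0e-8        # QTL P-value threshold, as stated in the text

class QtlRecord(NamedTuple):
    snp: str
    chrom: str
    pos: int          # base-pair position
    p_value: float    # QTL association P-value

def top_cis_qtl(records: Sequence[QtlRecord], gene_chrom: str,
                gene_pos: int) -> Optional[QtlRecord]:
    """Return the most significant QTL inside the cis window, or None."""
    candidates = [
        r for r in records
        if r.chrom == gene_chrom
        and abs(r.pos - gene_pos) <= CIS_WINDOW_BP
        and r.p_value < P_THRESHOLD
    ]
    return min(candidates, key=lambda r: r.p_value, default=None)

# Placeholder records (identifiers, positions, and P-values are illustrative only).
records = [
    QtlRecord("snp_1", "7", 45_050_000, 1e-12),   # in window, significant
    QtlRecord("snp_2", "7", 45_900_000, 1e-20),   # in window, most significant
    QtlRecord("snp_3", "7", 48_000_000, 1e-30),   # outside the window
    QtlRecord("snp_4", "7", 45_100_000, 1e-5),    # fails the P-value threshold
]
best = top_cis_qtl(records, gene_chrom="7", gene_pos=45_000_000)
```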
SNPs whose allele frequencies differed by more than 0.2 between any pair of the LD reference sample, the QTL summary data, and the outcome summary data were excluded. The HEIDI test distinguishes pleiotropy from linkage by detecting heterogeneity; if the P-HEIDI value is less than 0.01, the association is likely due to linkage rather than pleiotropy and is discarded. These analyses were performed using the SMR software tool (version 1.3.1), and P-values were adjusted using the Benjamini-Hochberg method to control the false discovery rate (FDR) at < 0.05 after multiple comparisons. In the SMR results for gene methylation and gene expression, associations with an FDR-adjusted P-value < 0.05 and a P-HEIDI value > 0.01 were considered positive [11].
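The rule for calling an association positive (FDR-adjusted P-value < 0.05 and P-HEIDI > 0.01) can be sketched as below. The Benjamini-Hochberg step is the standard step-up adjustment; the dictionary keys, probe identifiers, and values are hypothetical placeholders rather than SMR output.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def positive_associations(results, fdr_cutoff=0.05, heidi_cutoff=0.01):
    """Keep results with adjusted P < fdr_cutoff and P-HEIDI > heidi_cutoff."""
    fdr = benjamini_hochberg([r["p_smr"] for r in results])
    return [
        dict(r, fdr=q)
        for r, q in zip(results, fdr)
        if q < fdr_cutoff and r["p_heidi"] > heidi_cutoff
    ]

# Placeholder probe-level results (not study data).
example = [
    {"probe": "probe_a", "p_smr": 0.005, "p_heidi": 0.50},
    {"probe": "probe_b", "p_smr": 0.010, "p_heidi": 0.005},  # fails the HEIDI filter
    {"probe": "probe_c", "p_smr": 0.030, "p_heidi": 0.20},
    {"probe": "probe_d", "p_smr": 0.400, "p_heidi": 0.90},   # fails the FDR filter
]
kept = positive_associations(example)
```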
Based on the SMR results for gene methylation and gene expression, the overlapping genes were identified. For these overlapping genes, circulating protein abundance data were used for further validation of their association with CRC. Again, an FDR-adjusted P-value < 0.05 and a P-HEIDI value > 0.01 were required to deem the validation positive.
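The overlap step amounts to collapsing probe-level results to gene-level sets and intersecting them. A minimal sketch, with placeholder gene symbols and hypothetical field names:

```python
def genes_from_associations(associations):
    """Collapse probe-level or transcript-level hits to a set of gene symbols."""
    return {a["gene"] for a in associations if a.get("gene")}

# Placeholder hits; several CpG probes may map to the same gene.
methylation_hits = [
    {"probe": "cpg_1", "gene": "GENE_A"},
    {"probe": "cpg_2", "gene": "GENE_A"},
    {"probe": "cpg_3", "gene": "GENE_B"},
]
expression_hits = [
    {"probe": "transcript_1", "gene": "GENE_A"},
    {"probe": "transcript_2", "gene": "GENE_C"},
]

# Genes with positive signals at both the methylation and expression levels.
overlapping_genes = (
    genes_from_associations(methylation_hits)
    & genes_from_associations(expression_hits)
)
```

Only the genes in this intersection would then be carried forward to protein-level validation.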
Colocalization analysis
For proteins with significant positive results in the protein-level validation, colocalization analysis was conducted to determine whether the regulation of protein expression levels and CRC risk was driven by the same causal variant within a specific region [12]. The colocalization analysis reports five posterior probabilities corresponding to five mutually exclusive hypotheses: (1) Neither circulating protein abundance nor CRC has a causal variant in the region (PPH0); (2) Only circulating protein abundance has a causal variant (PPH1); (3) Only CRC risk has a causal variant (PPH2); (4) Both protein abundance and CRC have causal variants in the region, but the variants are distinct (PPH3); (5) Circulating protein abundance and CRC share the same regional causal variant (PPH4).
To minimize interference from linkage disequilibrium and unrelated SNPs, the colocalization region was restricted to SNPs within 50 kb upstream and downstream of the gene encoding the protein. A PPH4 posterior probability > 0.80 was considered supportive evidence of colocalization, corresponding to a false discovery rate of < 5%, thereby strengthening the evidence for causality [13].
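To make the five posterior probabilities concrete, the sketch below computes them from per-SNP approximate Bayes factors under a single-causal-variant assumption, in the style of widely used colocalization tools. The text does not state which software or priors were used, so the prior probabilities (p1 = p2 = 1e-4, p12 = 1e-5), the prior effect standard deviations, and all input values here are assumptions for illustration only.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_diff_exp(a, b):
    """log(exp(a) - exp(b)) for a > b; returns -inf otherwise."""
    if b >= a:
        return float("-inf")
    return a + math.log(-math.expm1(b - a))

def wakefield_labf(z, variance, prior_sd):
    """Log approximate Bayes factor for one SNP (Wakefield approximation)."""
    r = prior_sd ** 2 / (prior_sd ** 2 + variance)
    return 0.5 * (math.log(1.0 - r) + r * z * z)

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Return [PPH0, PPH1, PPH2, PPH3, PPH4] from per-SNP log Bayes factors."""
    sum1 = log_sum_exp(labf1)
    sum2 = log_sum_exp(labf2)
    sum12 = log_sum_exp([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                                           # H0: no causal variant
        math.log(p1) + sum1,                                           # H1: trait 1 only
        math.log(p2) + sum2,                                           # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff_exp(sum1 + sum2, sum12),  # H3: distinct variants
        math.log(p12) + sum12,                                         # H4: shared variant
    ]
    norm = log_sum_exp(log_h)
    return [math.exp(h - norm) for h in log_h]

def toy_posteriors(z_protein, z_crc, variance=0.01):
    # Assumed prior SDs: 0.15 for the quantitative trait, 0.20 for case-control.
    l1 = [wakefield_labf(z, variance, 0.15) for z in z_protein]
    l2 = [wakefield_labf(z, variance, 0.20) for z in z_crc]
    return coloc_posteriors(l1, l2)

# Toy z-scores for five SNPs (placeholders): same lead SNP vs. different lead SNPs.
pp_shared = toy_posteriors([0, 0, 8, 0, 0], [0, 0, 7, 0, 0])
pp_distinct = toy_posteriors([0, 8, 0, 0, 0], [0, 0, 0, 7, 0])
```

In the first toy region, both traits peak at the same SNP and PPH4 dominates; in the second, the peaks fall on different SNPs and the probability mass moves to PPH3.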
Functional enrichment analysis
To explore the biological roles and related pathways of the identified genes, we performed comprehensive functional enrichment analysis. Gene Ontology (GO) analysis was conducted to annotate the significant genes based on their biological processes, cellular components, and molecular functions using the GO database. Additionally, pathway analysis was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database and the Reactome pathway database to identify key biological pathways associated with the identified genes.
The enrichment analysis was performed using appropriate statistical methods with significance thresholds set at adjusted P-value < 0.05. This analysis helped identify pathways potentially involved in CRC development and provided insights into the functional relevance of our findings to CRC pathogenesis.
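The text does not name the statistical test used for enrichment. One common choice for GO and KEGG over-representation analysis is the one-sided hypergeometric test, sketched here as an assumption; the function name and counts are hypothetical.

```python
from math import comb

def enrichment_p_value(n_background, n_in_term, n_selected, n_overlap):
    """Upper-tail hypergeometric P-value, P(X >= n_overlap).

    n_background: genes in the background set.
    n_in_term:    background genes annotated to the term or pathway.
    n_selected:   genes in the list being tested.
    n_overlap:    tested genes that are annotated to the term.
    """
    upper = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy counts: 20 background genes, 5 in the pathway, 5 tested, 3 overlapping.
p_enrich = enrichment_p_value(20, 5, 5, 3)
```

The resulting P-values across all tested terms would then be adjusted for multiple testing before applying the adjusted P-value < 0.05 threshold.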
Results
Gene methylation analysis and CRC association
The SMR analysis of gene methylation and CRC (with FDR-adjusted P < 0.05 and P-HEIDI > 0.01) yielded 2,387 positive associations, including associations between 1,813 CpG probes and 837 unique genes. These methylation-based associations demonstrated significant regulatory potential in CRC development, with effect sizes ranging from moderate to large magnitudes. The identified genes showed diverse biological functions, including cell cycle regulation, DNA repair mechanisms, and inflammatory response pathways. Key genes with the strongest methylation signals included several known cancer-related genes as well as novel candidates that warrant further investigation. (Supplementary Table 1: Complete list of gene methylation associations with CRC)
Gene expression analysis and CRC association
The SMR analysis of gene expression and CRC (with FDR-adjusted P < 0.05 and P-HEIDI > 0.01) identified 707 gene expression eQTLs significantly associated with CRC. These expression-based associations revealed both upregulated and downregulated genes in relation to CRC risk, providing insights into the transcriptomic landscape of CRC susceptibility. The identified eQTLs showed consistent directional effects, with approximately 60% demonstrating increased expression associated with elevated CRC risk, while 40% showed protective effects through reduced expression levels. Notable genes included those involved in immune regulation, cell adhesion, and metabolic processes. (Supplementary Table 2: Gene expression eQTLs significantly associated with CRC)
Multi-omics integration and protein validation
Taking the intersection of the 837 unique genes from the gene methylation analysis and the 707 significant genes from the gene expression analysis yielded 158 common significant associations. This overlap represents genes that demonstrate consistent regulatory signals across both epigenetic and transcriptomic levels, strengthening the evidence for their involvement in CRC pathogenesis. (Supplementary Table 3: List of 158 overlapping genes from methylation and expression analyses)
Since proteins are the ultimate products of gene expression and carry significant validation value, the circulating protein abundance data for the 158 genes were further subjected to SMR validation with CRC. This validation identified six proteins (CCM2, FTCD, ICAM1, LTA, PCSK7, and TNFSF14) that were significantly associated with CRC. Among these six, four (CCM2, FTCD, ICAM1, and TNFSF14) exhibited consistent directional effects between the gene expression (eQTL) and protein abundance (pQTL) levels, while LTA and PCSK7 showed discordant directions between expression and protein levels (Figs. 1, 2 and 3).
Fig. 1.
SMR results of gene methylation validated by protein data in relation to CRC. The forest plot displays the effect sizes (beta coefficients) and 95% confidence intervals for the association between genetically predicted methylation levels and CRC risk for the six validated proteins. Negative values indicate protective effects (reduced CRC risk), while positive values indicate increased CRC risk. Statistical significance is indicated by confidence intervals that do not cross the null line
Fig. 2.
SMR results of gene expression validated by protein data in relation to CRC. The forest plot shows the effect sizes (beta coefficients) and 95% confidence intervals for the association between genetically predicted gene expression levels and CRC risk for the six validated proteins. The plot demonstrates the consistency or discordance between expression-level and protein-level effects on CRC risk
Fig. 3.
SMR results of protein expression levels in relation to CRC. The forest plot presents the effect sizes (beta coefficients) and 95% confidence intervals for the direct association between genetically predicted circulating protein abundance and CRC risk. This represents the final validation step in our multi-omics pipeline
Given that disease mechanisms include multiple layers of regulation (such as DNA methylation, alternative splicing, and post-translational modifications) and that various mechanisms can interact, we selected for further analysis the genes for which the effects of eQTL and pQTL on CRC risk were consistent. These genes included CCM2, FTCD, ICAM1, and TNFSF14, whose protein data were subjected to colocalization analysis with CRC.
Colocalization analysis
Colocalization analysis was performed to determine whether the identified protein-CRC associations reflect shared causal variants rather than independent signals in linkage disequilibrium. The analysis revealed that CCM2 (PPH4 = 0.857) and ICAM1 (PPH4 = 0.812) demonstrated strong evidence for colocalization with CRC, exceeding the threshold of PPH4 > 0.80. In contrast, FTCD (PPH4 = 0.261) and TNFSF14 (PPH4 = 0.052) showed weaker evidence for shared causal variants, suggesting that their associations with CRC may be mediated through distinct genetic mechanisms or linkage disequilibrium effects (Supplementary Table 4: Colocalization results for protein expression levels and CRC, ± 50 kb; Fig. 4).
Fig. 4.
Genetic association plots for protein expression levels and CRC. Panel A displays the regional association plot for CCM2 with CRC, showing the genomic positions, effect sizes, and statistical significance of variants within the colocalization region. Panel B presents the corresponding plot for ICAM1 with CRC. The plots demonstrate the overlap of association signals between protein expression QTLs and CRC GWAS, supporting the colocalization findings
Functional enrichment analysis
We performed comprehensive functional enrichment analysis to explore the biological roles and related pathways of the identified genes. Gene Ontology (GO) analysis revealed significant enrichment in several key biological processes, including “regulation of response to interferon-gamma” (P-adjusted = 2.3E-04), “regulation of interferon-gamma-mediated signaling pathway” (P-adjusted = 1.8E-03), and “negative regulation of viral process” (P-adjusted = 4.2E-03). These results highlight the potential involvement of the identified genes in immune-related responses, particularly those mediated by interferon-gamma signaling cascades.
In terms of cellular components, the genes were significantly enriched in “intrinsic component of endoplasmic reticulum membrane” (P-adjusted = 1.5E-02) and “integral component of lumenal side of endoplasmic reticulum membrane” (P-adjusted = 2.8E-02), suggesting their roles in endoplasmic reticulum-associated functions and protein processing mechanisms. For molecular functions, enrichment was observed in “peptide antigen binding” (P-adjusted = 6.7E-03) and “MHC class I protein binding” (P-adjusted = 1.2E-02), indicating the genes’ possible roles in antigen presentation and immune recognition processes.
KEGG pathway analysis further illuminated the functional landscape of these genes. Significantly enriched pathways included “Allograft rejection” (P-adjusted = 3.4E-05), “Antigen processing and presentation” (P-adjusted = 8.9E-04), and “NF-kappa B signaling pathway” (P-adjusted = 2.1E-03). These findings imply that the genes may be involved in immune response and inflammation-related processes critical for CRC development. Other notable pathways included “Epstein-Barr virus infection” (P-adjusted = 4.6E-03), “Toxoplasmosis” (P-adjusted = 7.8E-03), and “Human papillomavirus infection” (P-adjusted = 1.3E-02), which suggested potential roles of these genes in pathogen infection and associated immune responses. Additionally, pathways such as “Cell adhesion molecules” (P-adjusted = 5.2E-03) and “TGF-beta signaling pathway” (P-adjusted = 9.4E-03) were also enriched, pointing to the genes’ possible involvement in cell-adhesion and communication processes, summarized as Fig. 5.
Fig. 5.
Functional enrichment analysis results. Panel A shows the top 10 enriched Gene Ontology biological processes with their corresponding adjusted P-values and gene ratios. Panel B displays the significantly enriched KEGG pathways, with bubble sizes representing the number of genes and colors indicating statistical significance levels.
Summary of multi-omics integration
The comprehensive multi-omics analysis successfully identified four genes (CCM2, FTCD, ICAM1, and TNFSF14) with consistent effects across gene expression and protein abundance levels. Among these, CCM2 and ICAM1 demonstrated strong colocalization evidence with CRC, suggesting genuine causal relationships. The functional enrichment analysis revealed that these genes are predominantly involved in immune regulation, inflammatory responses, and cell adhesion processes, providing mechanistic insights into their potential roles in CRC pathogenesis.
Discussion
After validation at the levels of gene methylation, gene expression, and circulating protein abundance, we ultimately identified six proteins that are closely associated with colorectal cancer (CRC) risk. Integration of multi-omics data revealed that CCM2, FTCD, ICAM1, and TNFSF14 exhibited consistent effects on CRC risk at both the gene expression and protein abundance levels. Further colocalization analysis demonstrated that only CCM2 and ICAM1 had colocalization PPH4 values greater than 0.8, indicating that the protein abundance levels of CCM2 and ICAM1 share the same regional genetic variants with CRC. The functional enrichment analysis revealed that these genes are predominantly involved in immune-related pathways, including interferon-gamma signaling, antigen presentation, and NF-kappa B signaling, highlighting their potential roles in immune regulation and inflammatory processes critical for CRC development.
Mechanistic insights into CCM2 and CRC
We now focus on the associations of CCM2 and ICAM1 with CRC. CCM2 belongs to the CCM family, which also includes CCM1 (also known as KRIT1) and CCM3 (also known as PDCD10). Mutations in any of these genes can lead to cerebrovascular lesions, resulting in cerebral cavernous malformations (CCMs). CCMs are common cerebrovascular abnormalities that can lead to lifelong cerebral hemorrhage, seizures, and neurological sequelae, and no effective pharmacological treatments currently exist [14]. CCM2 is involved in disease mechanisms related to hypoxia and angiogenesis [15], and inflammatory processes along with NLRP3 inflammasome activity play key roles in CCM lesion development [16]. In addition to CCMs, previous studies have indicated that CCM2 is associated with other vascular diseases, such as coronary atherosclerosis [17].
In our study, the significant association between CCM2 and CRC raises interesting questions given the close relationship between intestinal and cerebral diseases. The gut–brain axis is implicated in many conditions, including stroke [18], various forms of dementia [19], Parkinson’s disease [20], inflammatory bowel disease [21], and even systemic metabolic disorders [22]. Insights from the gut–brain axis may facilitate more effective treatments for distant brain diseases, though this requires clearly defined inter-organ pathways. Various mechanisms have been proposed to explain the gut–brain connection, including the effects of microbial metabolites on brain function, the influence of the gut microbiome on local immune cells that may migrate to the brain, and direct communication mediated by neural or circulatory factors. Although the relationship between CCM2 and colon cancer has not been explicitly reported, related research has identified the intestinal barrier as an important component of the CCM gut–brain axis [23].
In cancer-related studies, CCM2 has been linked to aging-related cancers and fibrosis [24]. Dysfunction of ROCK can lead to premature senescence of endothelial cells through CCM2 depletion; ROCK1 and ROCK2 mediate this depletion, and the resultant dysregulation of CCM2 affects the tumor microenvironment via senescence, cell cycle arrest, and the senescence-associated secretory phenotype (SASP), thereby promoting the invasiveness of neighboring cells and harmful inflammatory responses that facilitate tumor development [24, 25]. Whether the specific mechanisms linking CCM2 to CRC are consistent with these pathways requires further experimental validation.
ICAM1 and CRC progression
Intercellular adhesion molecule-1 (ICAM-1) is a transmembrane glycoprotein receptor that belongs to the immunoglobulin superfamily and consists of a short intracellular domain and an Ig-like extracellular domain [26]. ICAM-1 plays a critical role in the adhesion processes between macrophages, leukocytes, and endothelial cells during migration and has been widely studied [27]. Existing studies have demonstrated that ICAM-1 is associated with the promotion of several types of cancer. The interaction between ICAM-1 and its specific ligands facilitates adhesion between cancer cells and the vascular endothelium, thereby promoting metastasis [28]. Furthermore, research has shown that ICAM-1 expression is positively correlated with cancer progression and metastasis [29, 30], which is in line with our multi-omics findings.
In the context of CRC risk and metastasis, experimental studies indicate that Fusobacterium nucleatum may induce upregulation of ICAM-1 via ALPK1-mediated activation of the NF-κB pathway, thereby enhancing the adhesion of CRC cells to endothelial cells and facilitating metastasis [31]. This mechanism aligns with our functional enrichment findings, which identified significant involvement of NF-kappa B signaling pathway among the genes associated with CRC risk. Whether ICAM-1 participates in the pathogenesis of CRC through additional mechanisms remains to be elucidated through further experimentation.
Therapeutic implications and clinical relevance
Our findings have significant therapeutic implications for CRC management. Regarding CCM2, research indicates that it plays a crucial role in regulating vascular integrity and cell adhesion. Studies have shown that fasudil, a Rho kinase (ROCK) inhibitor, can ameliorate vascular leakage and endothelial dysfunction caused by CCM2 depletion. This suggests that ROCK inhibition may have therapeutic potential in conditions related to CCM2 dysfunction, potentially including CRC prevention or treatment strategies targeting vascular components of the tumor microenvironment.
For ICAM1, several studies have highlighted its role in promoting cancer progression and metastasis. ICAM1 is known to facilitate the adhesion of cancer cells to the vascular endothelium, thereby enhancing metastasis. In CRC, higher expression of ICAM1 has been associated with increased lymph node metastases and poorer prognosis. Targeting ICAM1 or its associated signaling pathways (such as the c-MET/ICAM-1/SRC signaling axis) may offer a promising therapeutic strategy for CRC patients. Additionally, various cytokines (e.g., TNF-α, IFN-γ, IL-1β, IL-6) have been found to regulate ICAM1 expression through different signaling pathways, suggesting potential targets for therapeutic intervention.
Gene-environment interactions and lifestyle factors
Furthermore, we considered the potential interaction between CCM2 or ICAM1 variants and lifestyle factors in CRC. Although our study did not directly investigate these interactions, existing research suggests that lifestyle factors such as diet and physical activity can influence inflammatory pathways and CRC risk. For example, higher levels of inflammatory biomarkers (e.g., IL-8, sTNFR-2) are associated with increased CRC risk, and these levels can be modified by lifestyle factors. It is therefore plausible that interactions between CCM2 or ICAM1 variants and lifestyle factors influence CRC risk and progression, although this requires further investigation. Future studies could explore whether individuals with specific genetic variants in these genes may benefit differentially from lifestyle interventions, potentially leading to personalized prevention strategies.
Comparison with established CRC pathways
Our integrative approach complements and extends beyond established CRC biomarkers and pathways. While WNT signaling and mismatch repair pathways are well-characterized mechanisms in CRC development, our multi-omics analysis has identified additional layers of complexity involving immune regulation and vascular integrity. The genes we identified (CCM2 and ICAM1) may interact with or provide additional insights beyond these established markers. For instance, the involvement of CCM2 in vascular regulation could intersect with angiogenesis pathways that are crucial for tumor growth, while ICAM1’s role in immune cell adhesion may modulate the tumor microenvironment in ways that complement traditional WNT-driven proliferation signals.
The functional enrichment analysis further supports this complementary role, revealing significant involvement in interferon-gamma signaling and antigen presentation processes, which represent immune surveillance mechanisms that may counteract or interact with classical oncogenic pathways. This suggests that our identified genes may represent a distinct but interconnected network of CRC risk factors that operates alongside well-established pathways.
External validation and future experimental directions
To strengthen the validity of our findings, we acknowledge the importance of external replication and functional experiments. Several studies have provided preliminary support for the roles of CCM2 and ICAM1 in cancer biology. For example, CCM2 has been associated with other vascular diseases, such as coronary atherosclerosis, and ICAM1 expression has been consistently linked to cancer progression and metastasis across multiple cancer types. These findings provide some external validation of the potential roles of these genes in CRC.
Future functional experiments should include: (i) in vitro experiments using CRC cell lines, with knockdown or overexpression of CCM2 and ICAM1, to investigate their effects on cell proliferation, migration, and invasion; (ii) in vivo experiments using animal models to explore the impact of these genes on tumor development and progression; and (iii) molecular mechanism studies investigating the specific pathways through which CCM2 and ICAM1 may influence CRC biology, such as their interactions with other proteins and their roles in signaling cascades.
Additionally, clinical validation studies examining CCM2 and ICAM1 expression levels in CRC patient tissues compared to normal controls, and longitudinal cohort studies investigating whether genetic variants in these genes predict CRC risk in independent populations would provide crucial validation of our computational findings.
Integration with trans-QTL signals
While our study primarily focused on cis-QTLs due to their direct regulatory relationships with nearby genes, we acknowledge that this approach may have overlooked relevant trans signals that could also contribute to CRC risk. Our preliminary trans-QTL analysis revealed several potential trans associations that warrant further investigation. Future research leveraging larger datasets and more advanced statistical methods could enhance the power to detect trans signals and provide a more complete understanding of the complex genetic mechanisms involved in CRC. The integration of both cis and trans regulatory elements may reveal additional layers of gene regulation that contribute to CRC pathogenesis.
Strengths and limitations
A major strength of our study is the use of both SMR and colocalization analyses to perform a multi-omics investigation of CRC, leveraging genetic variants to estimate the causal effects of gene methylation, gene expression, and protein abundance. The integration of evidence from multiple omics levels reinforces the causal link between the implicated genes and CRC risk. The MR design minimizes bias from confounding and reverse causation, while the colocalization method is a powerful tool for eliminating potential bias caused by linkage disequilibrium. The addition of functional enrichment analysis provides biological context and mechanistic insights that strengthen the interpretation of our findings.
However, our study also has several limitations. First, the lack of comprehensive pQTL data in public databases limited our ability to validate findings using circulating protein levels. Future larger and more comprehensive pQTL datasets could enable more detailed studies on CRC. Second, in our SMR analysis, to avoid bias, we selected genes with consistent effect directions across gene methylation, gene expression, and subsequent protein abundance validation and applied a strict colocalization threshold (PPH4 > 0.8), which might have excluded some genes or proteins with weaker effects on CRC risk. Third, the GWAS data used in this study were based on European populations, which limits the generalizability of our findings to other ethnic groups. Fourth, while we conducted preliminary trans-QTL analyses, the focus on cis-QTLs may have missed important distant regulatory effects that contribute to CRC risk.
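The effect of the selection criteria in the second limitation can be illustrated with a small sketch. It assumes, for simplicity, that "consistent direction" means the expression and protein SMR effects share the same sign, and it uses the PPH4 > 0.8 threshold stated above. The class, field names, gene labels, and numbers are hypothetical placeholders and do not come from the study data.

```python
from dataclasses import dataclass

@dataclass
class GeneEvidence:
    gene: str               # placeholder label, not a real gene
    beta_expression: float  # SMR effect of expression on CRC risk
    beta_protein: float     # SMR effect of protein abundance on CRC risk
    pph4: float             # posterior probability of a shared causal variant

def passes_filter(ev: GeneEvidence, pph4_threshold: float = 0.8) -> bool:
    """Keep a gene only if both effects share a sign and PPH4 exceeds the threshold."""
    same_direction = ev.beta_expression * ev.beta_protein > 0
    return same_direction and ev.pph4 > pph4_threshold

# Toy candidates (assumed values for illustration only).
candidates = [
    GeneEvidence("GENE_A", -0.12, -0.30, 0.86),  # consistent, strong colocalization
    GeneEvidence("GENE_B", 0.20, 0.15, 0.45),    # consistent, weak colocalization
    GeneEvidence("GENE_C", 0.10, -0.05, 0.91),   # inconsistent direction
]

retained = [ev.gene for ev in candidates if passes_filter(ev)]
print(retained)
```

In this toy example GENE_B is dropped purely because of the colocalization threshold, which is the kind of weaker-effect candidate that a strict cutoff may exclude.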
Conclusion
Our study demonstrates that the integration of multi-omics evidence—including gene methylation, gene expression, and circulating protein abundance—identifies CCM2, FTCD, ICAM1, and TNFSF14 as being associated with CRC risk, with CCM2 and ICAM1 also sharing regional genetic variants with CRC. The validation of circulating proteins, combined with functional enrichment analysis revealing involvement in immune regulation and inflammatory pathways, provides deeper insights into the mechanisms of CRC pathogenesis and potential therapeutic targets. These findings complement established CRC pathways and highlight the value of integrative multi-omics approaches in cancer research. Future experimental validation and clinical studies will be essential to translate these computational findings into therapeutic applications.
Supplementary Information
Acknowledgements
We thank the National Natural Science Foundation for its support of and assistance with this paper.
Abbreviations
- CRC: Colorectal cancer
- mQTL: Methylation QTL
- eQTL: Expression QTL
- pQTL: Protein QTL
- SMR: Summary data-based Mendelian randomization
- GO: Gene Ontology
- MR: Mendelian randomization
- GWAS: Genome-wide association studies
- SNPs: Single nucleotide polymorphisms
- KEGG: Kyoto Encyclopedia of Genes and Genomes
Authors’ contributions
Lianheng Xia and Jiaxin Wang performed the experiments and analyzed the data. Jie Gao and Xuan Cui contributed to the experimental design and data interpretation. Xuan Cui prepared the figures and tables. Meiyu Song supervised the project and provided funding support. Wukun Ding and Jiayuan Zhang wrote the main manuscript text. All authors reviewed the manuscript.
Funding
This work was supported by the National Natural Science Foundation of China (No. 82305253) and the Natural Science Foundation of Heilongjiang Province (No. LH2022H075).
Data availability
The datasets generated and/or analyzed during the current study are available in the GWAS repository (https://gwas.mrcieu.ac.uk), which is a public database and does not require ethical approval.
Declarations
Ethics approval and consent to participate
I declare that the authors have no competing interests as defined by BMC, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Consent for publication
The results, data, and figures in this manuscript have not been published elsewhere, nor are they under consideration by another publisher.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.
- 2. Bakhoum SF, Silkworth WT, Nardi IK, Nicholson JM, Compton DA, Cimini D. The mitotic origin of chromosomal instability. Curr Biol. 2014;24(4):R148–149.
- 3. Markowitz SD, Bertagnolli MM. Molecular origins of cancer: molecular basis of colorectal cancer. N Engl J Med. 2009;361(25):2449–60.
- 4. Nazemalhosseini Mojarad E, Kuppen PJ, Aghdaei HA, Zali MR. The CpG island methylator phenotype (CIMP) in colorectal cancer. Gastroenterol Hepatol Bed Bench. 2013;6(3):120–8.
- 5. Pawlik TM, Raut CP, Rodriguez-Bigas MA. Colorectal carcinogenesis: MSI-H versus MSI-L. Dis Markers. 2004;20(4–5):199–206.
- 6. McRae AF, Marioni RE, Shah S, Yang J, Powell JE, Harris SE, Gibson J, Henders AK, Bowdler L, Painter JN, et al. Identification of 55,000 replicated DNA methylation QTL. Sci Rep. 2018;8(1):17605.
- 7. Võsa U, Claringbould A, Westra HJ, Bonder MJ, Deelen P, Zeng B, Kirsten H, Saha A, Kreuzhuber R, Yazar S, et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat Genet. 2021;53(9):1300–10.
- 8. Ferkingstad E, Sulem P, Atlason BA, Sveinbjornsson G, Magnusson MI, Styrmisdottir EL, Gunnarsdottir K, Helgason A, Oddsson A, Halldorsson BV, et al. Large-scale integration of the plasma proteome with genetics and disease. Nat Genet. 2021;53(12):1712–21.
- 9. Fernandez-Rozadilla C, Timofeeva M, Chen Z, Law P, Thomas M, Schmit S, Díez-Obrero V, Hsu L, Fernandez-Tajes J, Palles C, et al. Deciphering colorectal cancer genetics through multi-omic analysis of 100,204 cases and 154,587 controls of European and East Asian ancestries. Nat Genet. 2023;55(1):89–99.
- 10. Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, Montgomery GW, Goddard ME, Wray NR, Visscher PM, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet. 2016;48(5):481–7.
- 11. Wu Y, Zeng J, Zhang F, Zhu Z, Qi T, Zheng Z, Lloyd-Jones LR, Marioni RE, Martin NG, Montgomery GW, et al. Integrative analysis of omics summary data reveals putative mechanisms underlying complex traits. Nat Commun. 2018;9(1):918.
- 12. Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, Plagnol V. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 2014;10(5):e1004383.
- 13. Dashti HS, Daghlas I, Lane JM, Huang Y, Udler MS, Wang H, Ollila HM, Jones SE, Kim J, Wood AR, et al. Genetic determinants of daytime napping and effects on cardiometabolic health. Nat Commun. 2021;12(1):900.
- 14. Snellings DA, Hong CC, Ren AA, Lopez-Ramirez MA, Girard R, Srinath A, Marchuk DA, Ginsberg MH, Awad IA, Kahn ML. Cerebral cavernous malformation: from mechanism to therapy. Circ Res. 2021;129(1):195–215.
- 15. Lopez-Ramirez MA, Lai CC, Soliman SI, Hale P, Pham A, Estrada EJ, McCurdy S, Girard R, Verma R, Moore T, et al. Astrocytes propel neurovascular dysfunction during cerebral cavernous malformation lesion formation. J Clin Invest. 2021. 10.1172/JCI139570.
- 16. Lai CC, Nelsen B, Frias-Anaya E, Gallego-Gutierrez H, Orecchioni M, Herrera V, Ortiz E, Sun H, Mesarwi OA, Ley K, et al. Neuroinflammation plays a critical role in cerebral cavernous malformation disease. Circ Res. 2022;131(11):909–25.
- 17. Schnitzler GR, Kang H, Fang S, Angom RS, Lee-Kim VS, Ma XR, Zhou R, Zeng T, Guo K, Taylor MS, et al. Convergence of coronary artery disease genes onto endothelial cell programs. Nature. 2024;626(8000):799–807.
- 18. Benakis C, Brea D, Caballero S, Faraco G, Moore J, Murphy M, Sita G, Racchumi G, Ling L, Pamer EG, et al. Commensal microbiota affects ischemic stroke outcome by regulating intestinal γδ T cells. Nat Med. 2016;22(5):516–23.
- 19. Faraco G, Brea D, Garcia-Bonilla L, Wang G, Racchumi G, Chang H, Buendia I, Santisteban MM, Segarra SG, Koizumi K, et al. Dietary salt promotes neurovascular and cognitive dysfunction through a gut-initiated TH17 response. Nat Neurosci. 2018;21(2):240–9.
- 20. Sampson TR, Debelius JW, Thron T, Janssen S, Shastri GG, Ilhan ZE, Challis C, Schretter CE, Rocha S, Gradinaru V, et al. Gut microbiota regulate motor deficits and neuroinflammation in a model of Parkinson’s disease. Cell. 2016;167(6):1469–1480.e12.
- 21. Bonaz BL, Bernstein CN. Brain-gut interactions in inflammatory bowel disease. Gastroenterology. 2013;144(1):36–49.
- 22. Wang PY, Caspi L, Lam CK, Chari M, Li X, Light PE, Gutierrez-Juarez R, Ang M, Schwartz GJ, Lam TK. Upper intestinal lipids trigger a gut-brain-liver axis to regulate glucose production. Nature. 2008;452(7190):1012–6.
- 23. Tang AT, Sullivan KR, Hong CC, Goddard LM, Mahadevan A, Ren A, Pardo H, Peiper A, Griffin E, Tanes C, et al. Distinct cellular roles for PDCD10 define a gut-brain axis in cerebral cavernous malformation. Sci Transl Med. 2019. 10.1126/scitranslmed.aaw3521.
- 24. Vannier DR, Shapeti A, Chuffart F, Planus E, Manet S, Rivier P, Destaing O, Albiges-Rizo C, Van Oosterwyck H, Faurobert E. CCM2-deficient endothelial cells undergo a ROCK-dependent reprogramming into senescence-associated secretory phenotype. Angiogenesis. 2021;24(4):843–60.
- 25. Faget DV, Ren Q, Stewart SA. Unmasking senescence: context-dependent effects of SASP in cancer. Nat Rev Cancer. 2019;19(8):439–53.
- 26. Kang JH, Uddin N, Kim S, Zhao Y, Yoo KC, Kim MJ, Hong SA, Bae S, Lee JY, Shin I, et al. Tumor-intrinsic role of ICAM-1 in driving metastatic progression of triple-negative breast cancer through direct interaction with EGFR. Mol Cancer. 2024;23(1):230.
- 27. Rahman A, Fazal F. Hug tightly and say goodbye: role of endothelial ICAM-1 in leukocyte transmigration. Antioxid Redox Signal. 2009;11(4):823–39.
- 28. Boyano MD, Garcia-Vázquez MD, López-Michelena T, Gardeazabal J, Bilbao J, Cañavate ML, Galdeano AG, Izu R, Díaz-Ramón L, Raton JA, et al. Soluble interleukin-2 receptor, intercellular adhesion molecule-1 and interleukin-10 serum levels in patients with melanoma. Br J Cancer. 2000;83(7):847–52.
- 29. Maeda K, Kang SM, Sawada T, Nishiguchi Y, Yashiro M, Ogawa Y, Ohira M, Ishikawa T, Hirakawa YSCK. Expression of intercellular adhesion molecule-1 and prognosis in colorectal cancer. Oncol Rep. 2002;9(3):511–4.
- 30. Tung SY, Chang SF, Chou MH, Huang WS, Hsieh YY, Shen CH, Kuo HC, Chen CN. CXC chemokine ligand 12/stromal cell-derived factor-1 regulates cell adhesion in human colon cancer cells by induction of intercellular adhesion molecule-1. J Biomed Sci. 2012;19(1):91.
- 31. Zhang Y, Zhang L, Zheng S, Li M, Xu C, Jia D, Qi Y, Hou T, Wang L, Wang B, et al. Fusobacterium nucleatum promotes colorectal cancer cells adhesion to endothelial cells and facilitates extravasation and metastasis by inducing ALPK1/NF-κB/ICAM1 axis. Gut Microbes. 2022;14(1):2038852.





