Medicine. 2024 Mar 15;103(11):e37484. doi: 10.1097/MD.0000000000037484

Identification and prognostic analysis of candidate biomarkers for lung metastasis in colorectal cancer

Yuxing Liu a, Chenming Liu b, Dong Huang a, Chenyang Ge a, Lin Chen a, Jianfei Fu a, Jinlin Du a,*
PMCID: PMC10939685  PMID: 38489730

Abstract

Colorectal cancer (CRC) is one of the most prevalent malignant tumors. Integrated bioinformatics tools offer a way to explore new biomarkers and potential therapeutic targets in CRC lung metastasis. Multiple cohort datasets and databases were integrated to identify and verify potential key candidate biomarkers and signal transduction pathways in CRC lung metastasis. DAVID, STRING, UALCAN, GEPIA, TIMER, cBioPortal, THE HUMAN PROTEIN ATLAS, GSEA 4.3.2, FUNRICH 3.1.3, and R 4.2.3 were used in this study. The enriched biological processes and pathways modulated by the differentially expressed genes (DEGs) were determined with Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analyses. The Search Tool for the Retrieval of Interacting Genes (STRING) and Cytoscape were used to construct a protein–protein interaction network among DEGs. Four hundred fifty-nine gene expression profiles of primary colorectal cancer and lung metastases were screened from 3 datasets (GSE41258, GSE68468, and GSE41568). Forty-one upregulated genes and 8 downregulated genes were identified from these 3 datasets, and the transcriptional levels of hub genes were verified in other GEO datasets and The Cancer Genome Atlas database. Two pathways (immune responses and chemokine receptors bind chemokines), 13 key DEGs, 6 hub genes (MMP3, SFTPD, ABCA3, CLU, APOE, and SPP1), and 2 biomarkers (APOE, SPP1) with significant prognostic value were screened. Forty-nine DEGs were identified as potential candidate diagnostic biomarkers for patients with CRC lung metastasis in the present study. Enrichment analysis indicated that the immune response and "chemokine receptors bind chemokines" pathways may play a leading role in lung metastasis of CRC; further studies are needed to validate these findings.

Keywords: bioinformatics analysis, colorectal cancer, differentially expressed genes, hub gene, lung metastasis

1. Introduction

Colorectal cancer (CRC) is the third most common malignant cancer and the second leading cause of cancer-related death worldwide. According to statistics, 20% of CRC patients have already developed distant metastases when first diagnosed. While 50% to 60% of patients with CRC develop distant metastases during the course of the disease, 10% to 20% of them manifest as lung metastases, second only to liver metastases.[1,2] The median survival time reported for patients with CRC combined with lung metastases is 17.7 months, which is shorter than the median survival time for patients with CRC alone.[3,4] Therefore, how to improve the prognosis and prolong the survival of patients with colorectal lung metastasis has become a hot topic.[5] However, the available therapeutic approaches are limited because the biology of lung metastasis in CRC is poorly understood. There is an urgent need to better understand the molecular mechanisms of lung metastatic CRC in order to improve existing therapeutic approaches and reduce the mortality of patients with CRC.

Previous studies have shown that many different molecules are involved in the development of CRC metastases. For example, the CXCR4 gene is involved in the occurrence of liver metastases in CRC, whereas activation of CXCR7 is thought to promote the spread of CRC cells to the lung rather than the liver.[6] In addition, some genetic changes, such as WNT pathway activation and RAS mutations, may be associated with an increased proportion of lung metastases.[7] However, these results are insufficient to provide a complete picture of lung metastases in CRC.[8] Recently, bioinformatics has emerged as an effective and promising tool for screening genes and genetic pathways whose significantly abnormal expression is associated with carcinogenic effects. This can provide a theoretical basis for identifying potential therapeutic targets for cancer and understanding cancer prognosis. In particular, many studies using integrated microarray analysis have reported that certain genes or pathways may be involved in CRC liver metastasis.[9] However, there are few studies of CRC lung metastases.[10]

To analyze and predict candidate biomarkers for lung metastasis in CRC, we screened key differentially expressed genes (DEGs) from gene expression profiles in the Gene Expression Omnibus (GEO) database. Specifically, the biological functions and signal transduction pathways of the selected DEGs were identified through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. In addition, we constructed protein–protein interaction (PPI) networks and analyzed the prognostic value of candidate genes in CRC through data mining. Finally, we analyzed RNA-seq datasets from the GEO and The Cancer Genome Atlas (TCGA) databases using copy number variation analysis, Gene Set Enrichment Analysis (GSEA), immunohistochemistry, and other methods for verification, so as to evaluate the reliability of the selected candidate biomarkers. The verification results were largely consistent with our conclusions. Therefore, this study may contribute to an in-depth understanding of the molecular mechanism, to the discovery of new suitable molecular diagnostic and therapeutic targets, and to more accurate prediction of the long-term prognosis of patients with CRC.

2. Materials and methods

2.1. NCBI-GEO

NCBI-GEO (https://www.ncbi.nlm.nih.gov/geo) is a free database of microarray/gene expression profiles and next-generation sequencing (NGS) data. In this study, the expression datasets GSE41258, GSE68468, and GSE41568 were obtained from the GEO database. GSE41258 was based on the GPL96 platform (Affymetrix Human Genome U133A Array) and was published on October 2, 2012. According to the annotation information in the platform, we selected sample data from 186 primary CRC and 20 lung metastatic CRC. GSE68468 was based on the GPL96 platform (Affymetrix Human Genome U133A Array) and was published on May 2, 2015. According to the annotation information in the platform, we selected data from 186 primary colon tumors and 20 lung metastases. GSE41568 was based on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array) and was published on December 31, 2016. According to the annotation information in the platform, we selected sample data from 39 primary colorectal tumors and 8 lung metastases. To verify the reliability of the identified DEGs, we further examined the transcriptional levels of hub genes in other GEO datasets, including GSE49355, GSE35834, GSE35144, GSE14297, and GSE68468.

2.2. GEO2R

GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r) is a data processing tool based on GEO. Statistically significant differences were identified with a classic t test, using P < .05 and |log2FC| > 1 as the cutoff criteria. In this study, we used GEO2R to filter the original data to determine the DEGs and visualized them with SANGERBOX (https://sangerbox.com/Index), FUNRICH, and R.
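
The cutoff described above can be illustrated with a short sketch. This is not GEO2R's own implementation (GEO2R runs in R on the GEO servers); `GeneStat` and `filter_degs` are hypothetical names, and the gene symbols and values are invented purely for illustration.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    symbol: str
    log2fc: float   # log2 fold change, lung metastasis vs primary tumor
    p_value: float  # P value reported for the comparison


def filter_degs(stats, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split genes passing |log2FC| > lfc_cutoff and P < p_cutoff into up/down lists."""
    up, down = [], []
    for g in stats:
        if g.p_value < p_cutoff and abs(g.log2fc) > lfc_cutoff:
            (up if g.log2fc > 0 else down).append(g.symbol)
    return up, down


# Hypothetical values for illustration only; not taken from the study.
example = [
    GeneStat("GENE_A", 2.3, 0.001),   # passes, upregulated
    GeneStat("GENE_B", -1.8, 0.010),  # passes, downregulated
    GeneStat("GENE_C", 0.6, 0.002),   # fold change too small
    GeneStat("GENE_D", 1.5, 0.200),   # not significant
]
up_genes, down_genes = filter_degs(example)
```

Both conditions must hold at once, so a gene with a large fold change but a non-significant P value (or the reverse) is not counted as a DEG.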

2.3. Database for the Annotation, Visualization and Integrated Discovery

Database for the Annotation, Visualization and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/home.jsp) is a comprehensive functional annotation website that can help researchers better elucidate the biological function of submitted genes. The GO and KEGG pathway enrichment analyses of DEGs were obtained from DAVID and visualized with R 4.2.3.[11] GO enrichment analysis covers biological processes (BP), cellular components, and molecular functions (MF).

2.4. Search Tool for the Retrieval of Interacting Genes

The Search Tool for the Retrieval of Interacting Genes (STRING) (https://string-db.org/) aims to collect, score, and integrate all publicly available sources of PPI data and to supplement them with functional predictions. We constructed a PPI network of DEG-encoded proteins by using STRING, analyzed the interactions among candidate DEG-encoded proteins, and visualized them by using Cytoscape 3.10.1.[12]

2.5. UALCAN

UALCAN (http://ualcan.path.uab.edu/analysis.html) is a comprehensive web resource based on analysis of TCGA and MET500 cohort data. In this study, we used UALCAN to analyze the expression profiles of DEGs.[13] The Student t test was used to generate a P value. The P value cutoff was .05.

2.6. GEPIA

GEPIA (http://gepia.cancer-pku.cn/index.html) is an analytical tool developed by Peking University that contains RNA sequencing expression data from 9736 tumor and 8587 normal tissue samples. In this study, we performed differential gene expression analysis in CRC lung metastases and primary tumor tissues, and prognostic analysis of DEGs was performed by using the GEPIA2 expression analysis module.[14] The cutoff for P values was .05. The Student t test was used to generate P values for expression analysis, and Kaplan–Meier curves were used for prognostic analysis.
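
For readers unfamiliar with Kaplan–Meier curves, the product-limit estimator behind them can be sketched as follows. This is a minimal illustration and not GEPIA's code; `kaplan_meier` is a hypothetical helper, the follow-up times are invented, and the comparison of high- and low-expression groups (typically a log-rank test) is not shown.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient exits the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients surviving this time point.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up data in months; not taken from the study.
km_curve = kaplan_meier([5, 8, 8, 12, 15, 20], [1, 1, 0, 1, 0, 1])
```

Censored patients do not lower the curve themselves, but they shrink the risk set, so later events produce larger drops.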

2.7. TIMER

TIMER (https://cistrome.shinyapps.io/timer/) is a comprehensive resource for systematic analysis of immune infiltration in different types of cancer and allows users to fully explore the clinical and genomic characteristics of the tumor.[15]

2.8. cBioPortal

cBioPortal (www.cBioportal.org) is a comprehensive web resource for visualizing and analyzing multi-dimensional cancer genomic data.[16] One thousand ninety-eight CRC samples were analyzed (TCGA, Firehose Legacy; Sidra-LUMC AC-ICAM, Nat Med 2023; CPTAC-2 Prospective, Cell 2019). The mRNA expression z scores were obtained by using a z score threshold of 2.0.

2.9. Gene Set Enrichment Analysis

GSEA is a computational method that determines whether a predefined set of genes shows statistically significant, concordant differences between two biological states, such as phenotypes. Genes are ranked by their correlation with the phenotype, and the method assesses whether members of the gene set tend to cluster toward the top or bottom of this ranked list, which is used to judge the contribution of the set to the phenotype.
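
The idea of testing where a gene set falls in a ranked list can be made concrete with a simplified running-sum enrichment score. The sketch below follows the weighted running-sum statistic as it is commonly described for GSEA, but it is not the GSEA software itself: it omits permutation testing and normalization, and `enrichment_score` and the example genes are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score over a ranked gene list.

    ranked_genes : gene symbols sorted by correlation with the phenotype
    scores       : ranking metric for each gene, in the same order
    gene_set     : the predefined set being tested
    p            : weight exponent (p=0 gives the unweighted Kolmogorov-Smirnov form)
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up on a set member (weighted by its score), step down otherwise.
        running += abs(s) ** p / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list; not taken from the study.
genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
metric = [3, 2, 1, -1, -2, -3]
es_top = enrichment_score(genes, metric, {"G1", "G2"})
es_bottom = enrichment_score(genes, metric, {"G5", "G6"})
```

A set concentrated at the top of the list yields a positive score and a set concentrated at the bottom yields a negative one; the sign indicates which phenotype the set is associated with.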

2.10. THE HUMAN PROTEIN ATLAS

THE HUMAN PROTEIN ATLAS is a Swedish-based program initiated in 2003 to map all human proteins in cells, tissues, and organs by integrating various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, transcriptomics, and systems biology.[17] In this study, we used THE HUMAN PROTEIN ATLAS database for protein expression profiling.

3. Results

3.1. Screening and identification of DEGs

The profiles of GSE41258, GSE68468, and GSE41568 were each analyzed with the online tool GEO2R to screen DEGs between primary CRC samples and lung metastatic CRC samples, using |log2FC| >1 and adjusted P values <.05 as cutoff criteria. In total, 117, 310, and 612 differential genes (Fig. 1A–C) were extracted from the 3 expression profile datasets, respectively. Using FUNRICH software, we identified 49 consistent DEGs (Fig. 2A) shared by all 3 datasets, including 41 upregulated and 8 downregulated genes (Table 1 and Fig. 2B, C). In addition, R (version 4.2.3) was used for cluster analysis and heatmapping to show the expression of the 49 DEGs across the 3 datasets (Fig. 1D–F).
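
The Venn-diagram step amounts to a set intersection across the three DEG lists. A minimal sketch follows; it assumes, as an illustration only, that a gene must also change in the same direction in every dataset to count as consistent (the text does not spell out this rule). `consistent_degs` and the gene names are hypothetical.

```python
def consistent_degs(*datasets):
    """Keep genes called as DEGs in every dataset with the same direction.

    Each dataset is a dict mapping gene symbol -> "up" or "down".
    """
    shared = set(datasets[0])
    for d in datasets[1:]:
        shared &= set(d)  # genes present in all DEG lists so far
    first = datasets[0]
    agree = {g for g in shared if all(d[g] == first[g] for d in datasets)}
    up = sorted(g for g in agree if first[g] == "up")
    down = sorted(g for g in agree if first[g] == "down")
    return up, down


# Hypothetical DEG calls for three datasets; not taken from the study.
ds1 = {"GENE_A": "up", "GENE_B": "down", "GENE_C": "up"}
ds2 = {"GENE_A": "up", "GENE_B": "down", "GENE_D": "up"}
ds3 = {"GENE_A": "up", "GENE_B": "up", "GENE_C": "up"}
shared_up, shared_down = consistent_degs(ds1, ds2, ds3)
```

Here GENE_B appears in all three lists but is dropped because its direction differs in the third dataset.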

Figure 1.

Distributions of differentially expressed genes in 1039 colorectal primary cancer and lung metastatic tissues (|log2FC| >1 and adjusted P value <.05). Volcano plots and heatmaps are shown for the 3 datasets: GSE41258 (A, D), GSE68468 (B, E), and GSE41568 (C, F). In the volcano plots, red indicates upregulation, green indicates downregulation, and black indicates no significant change in expression. Each point represents a gene.

Figure 2.

Venn diagram was visualized in FUNRICH software (A–C).

Table 1.

Up and downregulation of 49 DEGs in colorectal cancer lung metastasis.

DEGs Gene symbol
Up SFTPC, SFTPB, SFTPD, SLC34A2, FMO2, LTF, SCGB1A1, CYP4B1, EGFL6, C4BPA, MARCO, IGHD, ITGBL1, CADM1, CYP1B1, LTBP2, SLIT2, ABCA3, BGN, C7, NKX2-1, CCL19, MGP, PRELP, ENPP2, MYH10, CLDN5, APOC1, CCL18, CLU, PTGIS, APOE, ADH1B, SPP1, WIF1, CD52, DPT, CHI3L1, C3, CCL2, CCL5
Down MAB21L2, MMP3, NMU, ADAMDEC1, CXCL14, SPINK4, MUC2, ZG16

DEGs = differentially expressed genes.

3.2. GO and signaling pathway enrichment analysis

GO analysis from DAVID revealed that the candidate DEGs were divided into 3 functional groups: the BP group, the MF group, and the cellular component group. In the BP group (Fig. 3A), DEGs were enriched in a number of processes, for example, positive regulation of the ERK1 and ERK2 cascades, cellular response to tumor necrosis factor, immune response, and cellular response to interleukin-1. In the MF group (Fig. 3B), DEGs were enriched in the following functions: chemokine activity, extracellular matrix (ECM) structural components, CCR chemokine receptor binding, receptor binding, and others. In the cellular component group (Fig. 3C), DEGs were mainly enriched in extracellular space, extracellular region, ECM, and extracellular exosomes. In the KEGG analysis, DEGs were mainly enriched in viral protein interactions with cytokines and cytokine receptors, complement and coagulation cascades, the chemokine signaling pathway, and cytokine–cytokine receptor interaction (Fig. 3D).

Figure 3.

The enrichment analysis of 49 DEGs in CRC lung metastases (DAVID 2021). (A–C) Bubble diagram of GO enrichment in biological process terms, molecular function terms, and cellular component terms. (D) Bubble diagram of KEGG enriched terms. CRC = colorectal cancer, DEG = differentially expressed gene, GO = Gene Ontology, KEGG = Kyoto Encyclopedia of Genes and Genomes.

3.3. PPI network screening and enrichment analysis

We used the STRING database to build a PPI network from the 49 DEGs. The network contained 49 nodes and 109 edges (Fig. 4A), with an average node degree of 4.45, an average local clustering coefficient of 0.52, and a PPI enrichment P value less than 1.0e-16. Seven of the 49 DEGs (EGFL6, ENPP2, IGHD, MAB21L2, PTGIS, WIF1, SLIT2) were not connected to the PPI network and were excluded. Therefore, we finally screened out 42 DEGs that were designated as key genes. Meanwhile, the 49 genes were divided into 3 categories by K-means cluster analysis, and 35 key genes were screened out. Then, we used Cytoscape to remove the unconnected genes and drew the PPI network diagram (Fig. 4B, C) according to the interaction and expression of the nodes. Thirteen candidate genes were selected by MCODE module analysis in Cytoscape (Fig. 4D, E, Table 2). Based on the PPI network analysis in the STRING database, we divided the 35 key genes into 2 modules containing 14 genes. Module 1 contained proteins that mainly regulate the humoral immune response, while module 2 mainly corresponded to the "chemokine receptors bind chemokines" pathway. Specifically, module 1 included the genes CCL19, CXCL14, CCL18, C7, C3, LTF, CCL2, CLU, and C4BPA. Module 2 included the genes CCL5, CCL2, CCL19, CXCL14, and CCL18. Moreover, CCL19, CXCL14, CCL18, and CCL2 were involved in both pathways. In addition, C7, C3, C4BPA, and CLU were enriched in complement and coagulation cascades. According to the analysis of the 3 functional groups, 31 DEGs were enriched in the extracellular region, 25 DEGs were enriched in the extracellular space, and 12 DEGs were enriched in the ECM.
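
The reported average node degree follows directly from the node and edge counts: in an undirected network each edge adds one to the degree of two nodes, so the mean degree is 2E/N. A quick check using only the counts given above (`average_node_degree` is a hypothetical helper name):

```python
def average_node_degree(n_nodes, n_edges):
    """Mean degree of an undirected graph: each edge contributes to two nodes."""
    return 2 * n_edges / n_nodes


# 49 nodes and 109 edges, as reported for the STRING network.
avg_degree = average_node_degree(49, 109)
print(round(avg_degree, 2))
```

The result rounds to the 4.45 reported by STRING, which confirms that the 7 unconnected genes are still counted among the 49 nodes.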

Figure 4.

The PPI network was visualized in STRING and CYTOSCAPE. (A) The PPI network visualized in STRING contained 49 nodes and 109 edges, with an average node degree of 4.45, an average local clustering coefficient of 0.52, and a PPI enrichment P value less than 1.0e-16. (B) Seven of the 49 DEGs (EGFL6, ENPP2, IGHD, MAB21L2, PTGIS, WIF1, SLIT2) did not fall within the PPI network. (C, D) The PPI network of 42 DEGs was visualized in CYTOSCAPE. (E) MCODE genes were visualized in CYTOSCAPE and included 13 nodes and 39 edges. Yellow and green nodes indicate downregulation; blue and red nodes indicate upregulation. The size of a node represents its degree. Module analysis of the DEGs in the PPI network used the criteria degree cutoff = 2, node score cutoff = 0.2, k-core = 2, and max depth = 100. DEG = differentially expressed gene, PPI = protein–protein interaction, STRING = Search Tool for the Retrieval of Interacting Genes.

Table 2.

Thirteen DEGs in modular analysis.

DEGs Gene symbol
Up SFTPC, SFTPB, SFTPD, LTF, SCGB1A1, ABCA3, NKX2-1, CLU, APOE, SPP1, CHI3L1, CCL2
Down MMP3

DEGs = differentially expressed genes.

3.4. GSEA enrichment analysis

We used the GSE41258 dataset for enrichment analysis. A total of 12,548 effective genes were screened out from the GSE41258 dataset, and the gene set size filter was set to a minimum of 15 and a maximum of 500. A total of 5750 gene sets were removed, and the remaining 5427 gene sets were used for enrichment analysis. According to the analysis results, 2511 gene sets were upregulated in the lung metastasis phenotype of CRC, 536 gene sets were significantly enriched at a false discovery rate (FDR) <25%, 315 gene sets were significantly enriched at P < .01, and 535 gene sets were significantly enriched at P < .05. In the CRC phenotype, 2511 genes were upregulated, 536 were significantly enriched at an FDR <25%, 315 were significantly enriched at P < .01, and 535 gene sets were significantly enriched at P < .05. In this study, “|NES|>1, NOM P val < .05, and FDR q val < 0.25” were used as the criteria for significant pathway enrichment. The 20 gene sets with the highest enrichment scores (Fig. 5) were selected from the CRC lung metastases and CRC groups. The results showed that complement cascade, regulation of phagocytosis, multicellular organismal level chemical homeostasis, cell cycle checkpoints, and adaptive immune response were enriched in lung metastasis of CRC. In addition, 36 of the top 100 genes in the GSEA ranked gene list belonged to the 49 DEGs previously screened.
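
The two filters described in this paragraph, the gene set size limits and the significance criteria, can be written as simple predicates. This is an illustrative sketch rather than the GSEA software's own logic; the function names and example values are hypothetical, and it assumes that set size is counted after restricting each set to genes present in the expression dataset.

```python
def size_ok(gene_set, expressed_genes, min_size=15, max_size=500):
    """Apply the size filter after restricting the set to genes in the dataset."""
    n = len(set(gene_set) & set(expressed_genes))
    return min_size <= n <= max_size


def is_significant(nes, nom_p, fdr_q):
    """Criteria quoted in the text: |NES| > 1, NOM P < .05, FDR q < 0.25."""
    return abs(nes) > 1 and nom_p < 0.05 and fdr_q < 0.25


# Hypothetical gene universe and gene sets; not taken from the study.
expressed = {f"G{i}" for i in range(1000)}
small_set = {f"G{i}" for i in range(10)}   # 10 genes: below the minimum
medium_set = {f"G{i}" for i in range(50)}  # 50 genes: within limits
kept = [name for name, s in [("small", small_set), ("medium", medium_set)]
        if size_ok(s, expressed)]

# Hypothetical results as (NES, nominal P, FDR q).
calls = [is_significant(1.8, 0.01, 0.10),   # passes all three
         is_significant(-1.6, 0.03, 0.20),  # negative NES also passes
         is_significant(1.4, 0.04, 0.30)]   # fails on FDR q
```

Because the NES criterion uses the absolute value, gene sets enriched in either phenotype can pass.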

Figure 5.

(A) GSEA software was used to create a heat map of the top 50 genes with high expression levels in the CRC lung metastases and CRC phenotypes in GSE41258 (colors range from red to blue to show expression values from high to low). (B) Positive correlation (left) and negative correlation (right) between gene rank and ranking metric score. (C–G) GSEA analysis of regulation of phagocytosis and immune response gene profiles based on CRC lung metastases and CRC phenotypes in the TCGA database. CRC = colorectal cancer, TCGA = The Cancer Genome Atlas.

3.5. Verification of hub gene expression between primary and metastatic CRC in the GEO database

To evaluate the above results from bioinformatics analysis, we further examined the transcriptional levels of hub genes in other GEO datasets. Consistent with GSE41258, GSE68468, and GSE41568, the mRNA levels of APOE, CLU, SPP1, SFTPD, and ABCA3 were significantly increased (Fig. 6A, E–H), while MMP3 was significantly decreased in metastatic CRC (Fig. 6B). However, no significant difference in the transcription levels of SFTPC and SFTPB was observed between primary and metastatic CRC (Fig. 6C, D), so we focused on 6 hub genes (APOE, CLU, SPP1, SFTPD, ABCA3, MMP3) in the following analyses. We also investigated whether these 6 hub genes were related to organotropism of CRC. The results in GSE68468 and GSE41258 showed that the expression levels of ABCA3, CLU, and SFTPD were particularly enhanced in lung metastasis of CRC compared with liver metastasis (Fig. 6I–N), and similar expression trends were found in GSE35144 (Fig. 6O, P). Together, these results suggest that ABCA3, CLU, and SFTPD may be important drivers of lung-specific metastatic tropism in CRC.

Figure 6.

Expression levels of hub genes in CRC patients from the GEO database. (A–H) The mRNA expression levels of the top 6 hub DEGs in primary and metastatic (including liver, lung, omentum, or peritoneal) CRC were obtained from GEO datasets. (I–K) The transcriptional levels of ABCA3, CLU, and SFTPD in liver and lung metastatic CRC in GSE68468. (L–N) The transcriptional levels of ABCA3, CLU, and SFTPD in liver and lung metastatic CRC in GSE41258. (O, P) The transcriptional levels of ABCA3, SFTPD in liver, lung, omentum, or peritoneal metastatic CRC in GSE35144. ns P > .05; *P < .05; **P < .01; ***P < .001. CRC = colorectal cancer, DEG = differentially expressed gene, GEO = Gene Expression Omnibus.

3.6. Prognostic analysis of hub gene in CRC

To explore the prognostic value of the hub genes, we analyzed the TCGA database of CRC patients and found that the mRNA levels of CLU, SFTPD, and SPP1 were upregulated in advanced CRC stages, while the expression of MMP3 was downregulated with advancing CRC stage. However, the expression levels of ABCA3 and APOE did not change significantly across CRC stages (Fig. 7A–L). In addition, Kaplan–Meier survival analysis was performed for the 6 hub genes in CRC patients, and the results showed that high expression of APOE and SPP1 was associated with poor overall survival and disease-free survival in CRC patients. However, transcription levels of SFTPD, ABCA3, MMP3, and CLU were not found to be significantly associated with overall survival or disease-free survival (Fig. 8A–L). The results of survival curve analysis indicated that 2 DEGs (APOE, SPP1) could be used as prognostic factors. In addition, a strong correlation was observed between the expression of APOE and SPP1 (Fig. 9A–D).

Figure 7.

Expression profile of ABCA3, APOE, CLU, MMP3, SFTPD, and SPP1 in subgroups of patients with colon cancer and rectum cancer, stratified based on stage criteria (UALCAN). Data are mean ± SE. *P < .05; **P < .01; ***P < .001. SE = standard error, TCGA = The Cancer Genome Atlas.

Figure 8.

(A-L) The prognostic value of DEGs in CRC patients in the OS and DFS curve (GEPIA). CRC = colorectal cancer, DEGs = differentially expressed genes, DFS = disease-free survival, OS = overall survival.

Figure 9.

(A, B) Scatter plots of the correlation between SPP1 and APOE in CRC patients, based on the Spearman and Pearson correlation coefficients (GEPIA). (C, D) Correlation analysis of SPP1 and APOE expression levels in CRC (TIMER). CRC = colorectal cancer.
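
The two coefficients named in this caption measure different things: Pearson's r captures linear association, while Spearman's rho is Pearson's r computed on ranks and so captures any monotonic association. The sketch below shows both in plain Python; it is not the GEPIA or TIMER implementation, and the helper names and example values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Average ranks (1-based), so tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical expression values for two genes; not taken from the study.
gene_x = [1, 2, 3, 4, 5]
gene_y = [1, 4, 9, 16, 25]  # monotonic but not linear in gene_x
r = pearson(gene_x, gene_y)
rho = spearman(gene_x, gene_y)
```

For a monotonic but nonlinear relationship like this one, rho reaches its maximum while r stays below it, which is one reason both coefficients are often reported together.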

3.7. Genetic alterations and co-expression analysis

We conducted a comprehensive analysis of the molecular characteristics of the 6 hub genes using TCGA, and the results showed that MMP3, SFTPD, ABCA3, CLU, APOE, and SPP1 were altered in 7%, 7%, 9%, 9%, 6%, and 2.9% of the queried CRC samples, respectively. Enhanced mRNA expression was the most frequently observed change in these samples. We also investigated the potential co-expression of these hub genes and found that the expression of MMP3, SFTPD, ABCA3, CLU, APOE, and SPP1 showed a significant correlation. Among them, the correlation between APOE and SPP1 was strongest (Fig. 10).

Figure 10.

Genetic alteration and co-expression analyses of 6 hub genes (ABCA3, APOE, CLU, MMP3, SFTPD, and SPP1) in CRC patients. (A) Summary of alterations in 6 hub genes in CRC. (B) Summary of mutation type in CRC. (C–H) Mutation type and copy number of 6 hub genes. (I) The correlation analysis of mutation in mRNA expression between APOE and SPP1 in CRC. CRC = colorectal cancer.

3.8. Prognostic gene validation using clinical tissue samples

To further confirm the hub genes with prognostic value, we used immunohistochemical staining images from The Human Protein Atlas to assess the expression of APOE and SPP1 proteins in tumor tissues. The results showed that APOE and SPP1 proteins were detected in some CRC tissues. Meanwhile, APOE expression was not detected in normal lung tissue, and SPP1 expression was low in lung tissue, which was consistent with our conclusions (Fig. 11).

Figure 11.

IHC analysis of APOE and SPP1 with prognostic values. (A–F) Differentially expressed proteins of APOE and SPP1 with prognostic values in CRC and lung normal tissues in The Human Protein Atlas database. CRC = colorectal cancer, IHC = immunohistochemistry.

4. Discussion

CRC is the third most common malignant tumor in the world. Many patients have already developed distant metastases when the disease is found, and distant metastases can also appear during follow-up even after radical surgical resection; in both cases the treatment effect is not optimistic. Distant metastases are the main factor causing CRC-related death. As CRC is most prone to liver metastasis, most studies currently focus on colorectal liver metastasis (CRLM). In terms of treatment strategy, radical surgical resection, when it can be performed, is still the most effective approach, and the 5-year survival rate can reach 53.2%. Liver metastasis of CRC is a complex process involving multiple genes and multiple signaling pathways.[18] Although the mechanism of occurrence has not been thoroughly studied, progress has been made. Targeted therapy, chemotherapy, or immunotherapy guided by genetic testing can benefit patients. The lung is the second most common site of CRC distant metastasis, but compared with CRLM, studies of lung metastasis are relatively limited. With the wide application of gene chips, NGS, and other gene-related technologies, a large amount of chip data has been generated, most of which is stored in public databases; integrating and reanalyzing these datasets may therefore provide valuable clues for new research.

In recent years, researchers have conducted a large number of microarray studies on CRC and obtained hundreds of DEGs, but analysis of microarray data from lung metastases of CRC has been limited. Due to the heterogeneity of tissues or samples in independent studies, the results are often limited or inconsistent, and no valid or reliable biomarkers have been identified. In addition, most NGS studies do not provide a functional interpretation of these DEGs that clearly identifies suitable key genes. The combination of integrated bioinformatics methods and expression profiling techniques may address this shortcoming. In this study, we integrated 3 cohort datasets from different sources and conducted in-depth analysis of these data by using multiple bioinformatics methods. Accordingly, 13 DEGs, 6 hub genes, and 2 pathways were screened based on genomic and transcriptomic sequencing data, as well as expression profiling and prognostic analysis.

Many studies have shown that the tumor microenvironment (TME) plays an important role in mediating distant metastasis of CRC. The TME is the environment that tumor cells rely on for survival. It is mainly composed of tumor cells, immune cells and their products, and the ECM. Various cells interact with each other or secrete substances such as cytokines, chemokines, or growth factors to regulate the inflammatory or immune state in the TME. This is essential for the onset and development of CRC. The composition and proportion of immune cells in the TME are significantly different from those in normal tissues.[19] For example, the number of T cells and B cells infiltrating the TME of CRC is lower than that in normal colorectal tissue, while the number of macrophages, dendritic cells, mast cells, and neutrophils infiltrating the TME of CRC is higher than that in normal colorectal tissue. The composition and proportion of immune cells in the TME also influence patient prognosis. In particular, an increased number of infiltrating CD8+ T cells predicts a better prognosis, which is related to the production of tumor necrosis factor, interferon gamma, and other substances that target and eliminate tumor cells after activation. In contrast, an increase in tumor-associated macrophages (TAMs) predicts poor prognosis,[20,21] which may be related to the inhibition of CD8+ T cells and the secretion of cytokines that regulate other immune cells and tumor cells.
Among all kinds of immune cells in the TME, TAMs are the most abundant. They are mainly formed when circulating monocytes are recruited to the tumor site by chemokines and cytokines secreted by tumor cells and non-tumor cells in the TME.[22] TAMs are highly plastic and can be divided into 2 different phenotypes, namely the M1 type and the M2 type.[23,24] The M1 type shows anti-tumor and pro-inflammatory functions, while the M2 type shows pro-tumor and anti-inflammatory functions, which is related to the interaction between cells in the TME and the molecular factors affecting polarization. Some studies suggest that M1-type macrophages are mainly responsible for the occurrence of tumors, because the inflammatory mechanism increases carcinogenic potential.[25] In the process of tumor development, circulating monocytes are recruited into the TME and polarized toward the M2 type, promoting further tumor development. Therefore, M1-type macrophages play a dominant role in early cancer, while M2-type macrophages are prevalent in middle and advanced cancers. However, TAMs in the TME are not simply polarized M1 or M2 macrophages in their interactions with tumor cells, but exhibit both pro-inflammatory and anti-inflammatory properties.[26] Although many studies have shown that TAMs can promote the progression of cancer, the cancer-promoting mechanism remains to be explored. Some studies suggest that the cancer-promoting effect of TAMs is mainly related to their immunosuppressive effect in the TME. TAMs inhibit the activity of various immune cells, including cytotoxic T cells, by secreting chemokines and cytokines, but can maintain the activity of regulatory T cells, thus inducing immunosuppression.
In addition, TAMs can upregulate cytokines, such as IL-1ra, which promote metastasis by enhancing tumor cell stemness.[27] The PPI network analysis results indicate that module 1 genes (CCL19, CXCL14, CCL18, C7, C3, LTF, CCL2, CLU, C4BPA) are involved in the humoral immune response, while module 2 genes (CCL5, CCL2, CCL19, CXCL14, CCL18) participate in the "chemokine receptors bind chemokines" pathway. Zhou et al[28] revealed the regulatory role of CXCL14 in tumor metastasis and tumor immune cell composition through single-cell sequencing datasets of tumor tissues. Wang et al[29] analyzed the expression profiles of multiple human clinical colon cancer datasets and mouse colon cancer models to reveal the trend of CXCL14 expression across colon polyps, primary colon cancer, and liver metastases. First, CXCL14 expression levels were normally distributed in healthy tissues, but after the occurrence of intestinal polyps, the dispersion of expression levels increased, indicating slight dysregulation. After further development into primary tumors, the dispersion of expression levels increased further and the median decreased, indicating that the dysregulation intensified and that the selection of cancer cells in vivo was progressing in the direction of CXCL14 silencing. The gene silencing became more pronounced after liver metastasis or lung metastasis, indicating that CXCL14 silencing confers certain advantages for tumor metastasis or for tumor formation after metastasis. Coussens and Werb[30] have shown that chemokines and their receptors serve as important regulators of various metastatic and advanced cancers. For example, Wolf et al[31] analyzed CCL2 expression in human primary nonmetastasized colon tumors (UICC stages I and II) and in colon tumors that metastasized into the lymph nodes (UICC stage III) or into distant organs (UICC stage IV). CCL2 transcripts were more abundant in primary colon tumors of stages I, II, and III than in healthy colon samples. However, CCL2 expression was particularly high in stage IV colon tumors that developed metastases in distant organs, indicating that upregulation of CCL2 correlates with metastatic potential. In addition, elevated CCL2 and CCL5 levels have been previously linked to malignancy and increased metastasis in a number of cancers.

In the GO analysis, the DEGs were mainly enriched in the extracellular space, extracellular region, and ECM within the cellular component category. Cancer-associated fibroblasts (CAFs) in the ECM can interact with tumor cells in various ways to promote or inhibit tumor cell proliferation and invasion. CAFs are currently thought to derive mainly from tissue-resident fibroblasts, which are transformed into CAFs under stimulation by tumor cells. CAFs have been divided into a variety of subtypes under different classification schemes, but in CRC they are mainly divided into 2 gene expression subtypes, CAF-A and CAF-B.[32] CAFs are the main component of the ECM and the principal ECM-remodeling cells, maintaining matrix stability. CAFs can also promote immune escape: in CRC, they recruit monocytes and promote their adhesion to tumor cells, creating an immunosuppressive environment that inhibits natural killer cells. In addition, CAFs promote the polarization of macrophages toward immunosuppressive and tumor-promoting phenotypes. Using CRC clinical samples together with ex vivo CAF-CRC co-culture models, Zhong et al[33] found that CAFs induce expression of Leucine Rich Alpha-2-Glycoprotein 1 (LRG1) in CRC, with markedly higher expression in metastatic CRC tissues than in primary tumors. They further showed that CAF-induced LRG1 promoted CRC migration and invasion, concomitant with induction of epithelial-mesenchymal transition. This signaling axis was also confirmed in a liver metastasis mouse model, in which CAF-induced LRG1 substantially accelerated metastasis. Thus, CRC tumor cells and CAFs work together to promote CRC metastasis.

In addition, we analyzed the prognostic value of these hub genes in the TCGA database. The results showed that high expression of APOE and SPP1 was significantly associated with poorer overall survival in CRC patients. APOE is a glycoprotein with a molecular weight of 35 kDa, whose mRNA is widely expressed in liver, kidney, lung, spleen, skin, brain, and various cells (such as macrophages).[34] Studies have found that APOE plays an important role in inflammation and immune regulation. The mRNA and protein levels of IL-4 and IL-10 in the liver tissue of APOE-/- mice were significantly decreased, which inhibited humoral immunity. In addition, PGE2 increases APOE mRNA expression in activated macrophages through the ERK1/2 signaling pathway, which affects cytokine secretion by macrophages, alters the cytokine microenvironment, and disrupts the balance between M1 and M2 macrophages, thereby affecting the occurrence and development of tumors.[35] Dai et al[36] reported that APOE was increased in CRC liver metastases and CRC lung metastases, and that this increase was positively associated with advanced stage and poor overall survival in CRC. Recent studies have demonstrated that APOE also contributes to DNA synthesis, cell proliferation, angiogenesis, and metastasis, facilitating tumorigenesis and progression. SPP1 is an important ECM component secreted by a variety of cells, including tumor cells, immune cells, and fibroblasts. Upregulated SPP1 expression in the tumor tissue and serum of patients with multiple tumor types is associated with poor prognosis. Some studies have shown that SPP1 regulates the polarization of macrophages toward the M2 type in lung adenocarcinoma, but the specific mechanism remains unclear.[37,38] Liu et al[39] performed scRNA-seq analysis on 201 clinical samples from 51 treatment-naive patients with CRLM and primary CRC, and investigated their immune cellular composition and transcriptional dynamics. Strikingly, SPP1+ TAMs were dominantly enriched in CRLM and showed the highest proangiogenic potential, which might benefit the metastasis of CRC cells. In particular, SPP1+ TAMs were characterized as malignancy driven and were associated with an unfavorable prognosis in CRC patients. This is also consistent with our GO and pathway enrichment results.
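The kind of prognostic comparison described above (overall survival in patients with high versus low expression of a hub gene) can be illustrated with a minimal sketch. This section does not state the cutoff or the exact procedure used, so the median split, the function names `split_by_median` and `logrank_test`, and all numbers below are assumptions made for illustration; they are not TCGA data and not the authors' pipeline.

```python
import math
from statistics import median


def split_by_median(expression, times, events):
    """Split patients into high/low groups at the median expression.

    The median cutoff is an assumption; other cutoffs (quartiles, optimal
    cut points) are also used in practice.
    """
    cut = median(expression)
    rows = list(zip(expression, times, events))
    high = [(t, e) for x, t, e in rows if x > cut]
    low = [(t, e) for x, t, e in rows if x <= cut]
    return high, low


def logrank_test(group_a, group_b):
    """Two-group log-rank test.

    Each group is a list of (time, event) pairs, where event is 1 for death
    and 0 for censoring. Returns (chi_square, p_value) on 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)
        n_b = sum(1 for tt, _, g in data if tt >= t and g == 1)
        # Deaths observed exactly at time t (group A, and both groups)
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        n = n_a + n_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Made-up toy values (hypothetical expression, follow-up in months, death flags)
expression = [9.1, 8.7, 8.9, 9.4, 9.0, 5.2, 4.8, 5.5, 5.0, 4.9]
months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
died = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

high, low = split_by_median(expression, months, died)
chi_square, p_value = logrank_test(high, low)
print(f"log-rank chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

A small p-value here only indicates that the two toy survival curves differ. With real cohorts, a hazard ratio and adjustment for stage and other covariates would normally accompany such a test.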

5. Conclusion

In summary, bioinformatics analysis identified a total of 49 DEGs and 6 hub genes that may play a key role in regulating lung metastasis in CRC. Enrichment analysis indicated that immune responses and the "chemokine receptors bind chemokines" pathway may play a leading role in lung metastasis of CRC. APOE and SPP1 are closely related to the stage and poor prognosis of advanced CRC and may be potential therapeutic targets for the treatment and prevention of lung metastasis of CRC. These findings contribute to the understanding of lung metastasis in CRC, and further studies are needed for validation.

Author contributions

Conceptualization: Yuxing Liu.

Data curation: Chenming Liu.

Formal analysis: Dong Huang.

Investigation: Chenyang Ge.

Project administration: Lin Chen.

Methodology: Jianfei Fu.

Funding acquisition: Jinlin Du.

Abbreviations:

BP: biological processes
CAF: cancer-associated fibroblasts
CC: cellular components
CRC: colorectal cancer
CRLM: colorectal liver metastasis
DAVID: Database for Annotation, Visualization and Integrated Discovery
DEGs: differentially expressed genes
ECM: extracellular matrix
GEO: Gene Expression Omnibus
GO: Gene Ontology
GSEA: Gene Set Enrichment Analysis
IHC: immunohistochemistry
KEGG: Kyoto Encyclopedia of Genes and Genomes
MF: molecular functions
NGS: next-generation sequencing
PPI: protein–protein interaction
STRING: Search Tool for the Retrieval of Interacting Genes
TAMs: tumor-associated macrophages
TCGA: The Cancer Genome Atlas
TME: tumor microenvironment

YL and CL contributed equally to this work.

The study was financially supported by Jinhua Science and Technology Program Project (2019-3-004).

The current analysis does not require ethical approval, because our integrated bioinformatics analysis only uses data already uploaded to the GEO database. The analysis does not process any patient’s personal data and will not cause harm to any patient.

The authors have no conflicts of interest to disclose.

The datasets generated during and/or analyzed during the current study are publicly available.

How to cite this article: Liu Y, Liu C, Huang D, Ge C, Chen L, Fu J, Du J. Identification and prognostic analysis of candidate biomarkers for lung metastasis in colorectal cancer. Medicine 2024;103:11(e37484).

Contributor Information

Yuxing Liu, Email: 22118065@zju.edu.cn.

Chenming Liu, Email: 22118065@zju.edu.cn.

Dong Huang, Email: huangd2020@126.com.

Chenyang Ge, Email: gechen2018@126.com.

Lin Chen, Email: chenin2019@126.com.

Jianfei Fu, Email: fuanei2000@163.com.

References

  • [1]. Van der Geest LG, Lam-Boer J, Koopman M, et al. Nationwide trends in incidence, treatment and survival of colorectal cancer patients with synchronous metastases. Clin Exp Metastasis. 2015;32:457–65.
  • [2]. Riihimäki M, Hemminki A, Sundquist J, et al. Patterns of metastasis in colon and rectal cancer. Sci Rep. 2016;6:29765.
  • [3]. Parnaby CN, Bailey W, Balasingam A, et al. Pulmonary staging in colorectal cancer: a review. Colorectal Dis. 2012;14:660–70.
  • [4]. Marks KM, West NP, Morris E, et al. Clinicopathological, genomic and immunological factors in colorectal cancer prognosis. Br J Surg. 2018;105:e99–e109.
  • [5]. Cavallaro P, Bordeianou L, Stafford C, et al. Impact of single-organ metastasis to the liver or lung and genetic mutation status on prognosis in stage IV colorectal cancer. Clin Colorectal Cancer. 2020;19:e8–e17.
  • [6]. Keeley EC, Mehrad B, Strieter RM. CXC chemokines in cancer angiogenesis and metastases. Adv Cancer Res. 2010;106:91–111.
  • [7]. Guillemot E, Karimdjee-Soilihi B, Pradelli E, et al. CXCR7 receptors facilitate the progression of colon carcinoma within lung not within liver. Br J Cancer. 2012;107:1944–9.
  • [8]. Rao US, Hoerster NS, Thirumala S, et al. The influence of metastatic site on the expression of CEA and cellular localization of β-catenin in colorectal cancer. J Gastroenterol Hepatol. 2013;28:505–12.
  • [9]. Liu WQ, Li WL, Ma SM, et al. Discovery of core gene families associated with liver metastasis in colorectal cancer and regulatory roles in tumor cell immune infiltration. Transl Oncol. 2021;14:101011.
  • [10]. Wang X, Gao G, Chen Z, et al. Identification of the miRNA signature and key genes in colorectal cancer lymph node metastasis. Cancer Cell Int. 2021;21:358.
  • [11]. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4:44–57.
  • [12]. Szklarczyk D, Gable AL, Lyon D, et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 2019;47:D607–13.
  • [13]. Chandrashekar DS, Bashel B, Balasubramanya SAH, et al. UALCAN: a portal for facilitating tumor subgroup gene expression and survival analyses. Neoplasia. 2017;19:649–58.
  • [14]. Tang Z, Li C, Kang B, et al. GEPIA: a web server for cancer and normal gene expression profiling and interactive analyses. Nucleic Acids Res. 2017;45:W98–W102.
  • [15]. Li T, Fan J, Wang B, et al. TIMER: a web server for comprehensive analysis of tumor-infiltrating immune cells. Cancer Res. 2017;77:e108–10.
  • [16]. Gao J, Aksoy BA, Dogrusoz U, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. 2013;6:pl1.
  • [17]. Uhlén M, Fagerberg L, Hallström BM, et al. Proteomics. Tissue-based map of the human proteome. Science. 2015;347:1260419.
  • [18]. Xu D, Wang HW, Yan XL, et al. Sub-millimeter surgical margin is acceptable in patients with good tumor biology after liver resection for colorectal liver metastases. Eur J Surg Oncol. 2019;45:1551–8.
  • [19]. Yoon PS, Piccolo ND, Shirure VS, et al. Advances in modeling the immune microenvironment of colorectal cancer. Front Immunol. 2021;11:614300.
  • [20]. Pagès F, Kirilovsky A, Mlecnik B, et al. In situ cytotoxic and memory T cells predict outcome in patients with early-stage colorectal cancer. J Clin Oncol. 2009;27:5944–51.
  • [21]. Kemp MG. Crosstalk between apoptosis and autophagy: environmental genotoxins, infection, and innate immunity. J Cell Death. 2017;9:1179670716685085.
  • [22]. Duan Z, Luo Y. Targeting macrophages in cancer immunotherapy. Signal Transduct Target Ther. 2021;6:127.
  • [23]. Wu K, Lin K, Li X, et al. Redefining tumor-associated macrophage subpopulations and functions in the tumor microenvironment. Front Immunol. 2020;11:1731.
  • [24]. Li X, Yao W, Yuan Y, et al. Targeting of tumour-infiltrating macrophages via CCL2/CCR2 signalling as a therapeutic strategy against hepatocellular carcinoma. Gut. 2017;66:157–67.
  • [25]. Qian BZ, Pollard JW. Macrophage diversity enhances tumor progression and metastasis. Cell. 2010;141:39–51.
  • [26]. Pinto ML, Rios E, Durães C, et al. The two faces of tumor-associated macrophages and their clinical significance in colorectal cancer. Front Immunol. 2019;10:1875.
  • [27]. Wang W, Liu Y, Guo J, et al. miR-100 maintains phenotype of tumor-associated macrophages by targeting mTOR to promote tumor metastasis via Stat5a/IL-1ra pathway in mouse breast cancer. Oncogenesis. 2018;7:97.
  • [28]. Zhou L, Zhang Y, Wei M, et al. Comprehensive analysis of CXCL14 uncovers its role during liver metastasis in colon cancer. BMC Gastroenterol. 2023;23:273.
  • [29]. Wang Y, Wang S, Niu Y, et al. Data mining suggests that CXCL14 gene silencing in colon cancer is due to promoter methylation. Int J Mol Sci. 2023;24:16027.
  • [30]. Coussens LM, Werb Z. Inflammation and cancer. Nature. 2002;420:860–7.
  • [31]. Wolf MJ, Hoos A, Bauer J, et al. Endothelial CCR2 signaling induced by colon carcinoma cells enables extravasation via the JAK2-Stat5 and p38MAPK pathway. Cancer Cell. 2012;22:91–105.
  • [32]. Sahai E, Astsaturov I, Cukierman E, et al. A framework for advancing our understanding of cancer-associated fibroblasts. Nat Rev Cancer. 2020;20:174–86.
  • [33]. Zhong B, Cheng B, Huang X, et al. Colorectal cancer-associated fibroblasts promote metastasis by up-regulating LRG1 through stromal IL-6/STAT3 signaling. Cell Death Dis. 2021;13:16.
  • [34]. Zhang HL, Wu J, Zhu J. The immune-modulatory role of apolipoprotein E with emphasis on multiple sclerosis and experimental autoimmune encephalomyelitis. Clin Dev Immunol. 2010;2010:186813.
  • [35]. Akdis M, Burgler S, Crameri R, et al. Interleukins, from 1 to 37, and interferon-γ: receptors, functions, and roles in diseases. J Allergy Clin Immunol. 2011;127:701–21.e1-70.
  • [36]. Dai W, Guo C, Wang Y, et al. Identification of hub genes and pathways in lung metastatic colorectal cancer. BMC Cancer. 2023;23:323.
  • [37]. Cao DX, Li ZJ, Jiang XO, et al. Osteopontin as potential biomarker and therapeutic target in gastric and liver cancers. World J Gastroenterol. 2012;18:3923–30.
  • [38]. Zhang Y, Du W, Chen Z, et al. Upregulation of PD-L1 by SPP1 mediates macrophage polarization and facilitates immune escape in lung adenocarcinoma. Exp Cell Res. 2017;359:449–57.
  • [39]. Liu Y, Zhang Q, Xing B, et al. Immune phenotypic linkage between colorectal cancer and liver metastasis. Cancer Cell. 2022;40:424–437.e5.

Articles from Medicine are provided here courtesy of Wolters Kluwer Health
