Abstract
Background
Colorectal cancer (CRC) is a major global health challenge characterized by significant morbidity and mortality. The urgent need for biomarkers that improve diagnostic accuracy and prognostic stratification, supported by integrative validation, has prompted the exploration of immune-related and prognostic genes. This study aimed to systematically identify differentially expressed genes (DEGs) associated with both immunity and prognosis in CRC, validate their clinical significance, and construct a reliable prognostic model.
Methods
We examined clinical and RNA sequencing data from 698 CRC patients obtained from The Cancer Genome Atlas (TCGA). Using the Xiantao Academic Platform, we conducted differential expression analysis and identified hub genes associated with immunity and prognosis through Least Absolute Shrinkage and Selection Operator (LASSO) and Cox regression analyses, alongside five machine learning algorithms, and then constructed a prognostic model. The hub genes were validated using the Gene Expression Omnibus (GEO) database, molecular docking, molecular dynamics simulation, and single-cell and spatial transcriptomics analyses.
Results
LASSO and Cox regression analyses, along with five machine learning algorithms, were employed to identify significant genes linked to immunity and prognosis, yielding three hub genes: ULBP2, INHBB, and STC2. Validation of these genes in the GEO dataset GSE21815 demonstrated significant diagnostic performance, with area under the curve (AUC) values of 0.908, 0.742, and 0.934, respectively. A prognostic model integrating clinical factors and hub genes was developed, demonstrating high predictive accuracy for 1-, 3-, and 5-year survival rates. Further analysis revealed significant enrichment in the TGF-β signaling pathway and natural killer cell-mediated cytotoxicity, as evidenced by Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. The single-sample Gene Set Enrichment Analysis (ssGSEA)-based immune infiltration analysis revealed immune infiltration differences between groups with high and low immune phenotype scores. Molecular docking and dynamics simulations revealed valproic acid, cyclosporine, and genistein as potential therapeutic compounds with strong binding affinities to the hub genes. Single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics provided insights into hub gene expression patterns and interactions within the tumor microenvironment.
Conclusions
This comprehensive study highlights the potential of ULBP2, INHBB, and STC2 as promising biomarkers for CRC, emphasizing their roles in regulating tumor progression and immune responses. Future studies should focus on targeted therapeutic strategies that utilize these biomarkers to enhance treatment efficacy and patient prognosis.
Keywords: Colorectal cancer (CRC), biomarkers, immune phenotype score, molecular dynamics simulation, single-cell, spatial transcriptomics
Highlight box.
Key findings
• The study identified three hub genes (ULBP2, INHBB, STC2) associated with immunity and prognosis in colorectal cancer (CRC).
• A clinical prognostic model incorporating these hub genes and clinical factors was developed, demonstrating high predictive accuracy for 1-, 3-, and 5-year survival rates.
• Molecular docking and molecular dynamics simulations revealed valproic acid, cyclosporine, and genistein as potential therapeutic compounds targeting the hub genes.
What is known and what is new?
• Immune-related genes play a crucial role in CRC progression and prognosis. Existing research has investigated single or small sets of immune genes.
• This study systematically identified and validated a panel of three hub genes (ULBP2, INHBB, STC2) using comprehensive bioinformatics techniques and public databases. It established a prognostic model integrating clinical factors and hub genes, providing new insights into CRC management.
What is the implication, and what should change now?
• The findings suggest that ULBP2, INHBB, and STC2 could serve as promising biomarkers for CRC diagnosis and prognosis, facilitating personalized therapeutic interventions.
• Future research should focus on experimental validation of the biological functions and mechanisms of these hub genes. Larger multicenter studies incorporating wet lab validation are needed to enhance the generalizability and clinical applicability of the findings. Additionally, developing targeted therapeutic strategies leveraging these biomarkers holds potential to improve CRC patient outcomes.
Introduction
Colorectal cancer (CRC) is a prevalent malignant tumor characterized by high global incidence and mortality rates. According to the World Health Organization, CRC is the third most prevalent cancer worldwide, with more than 1.9 million new cases annually (1). Although existing treatment methods, such as surgery, radiotherapy, and chemotherapy, have improved patient prognosis to some extent, their efficacy remains limited by individual patient differences and the complexity of tumor biology (2). Consequently, it is crucial to identify biomarkers and therapeutic approaches through integrative validation research to enhance the survival and quality of life of CRC patients (3).
Recent studies have indicated a strong correlation between immune cell infiltration in the tumor microenvironment and CRC prognosis (4). Immune-related genes significantly influence CRC development and progression, and their expression patterns may serve as biomarkers for predicting patient prognosis and treatment response. In addition, the expression of immune checkpoint genes is significantly linked to tumor progression and immune evasion, highlighting their clinical importance in the personalized treatment of CRC (5).
While studies have investigated the role of immune-related genes in CRC, comprehensive research systematically identifying immune genes closely associated with CRC prognosis remains limited (6). Most existing research has focused on the effects of single genes or a small range of genes, lacking a comprehensive analysis of large-scale data. Developing a prognostic model using immune-related genes offers novel insights and evidence for early CRC diagnosis and personalized treatment (7).
This study sought to enhance the reliability and clinical relevance of screening outcomes by systematically identifying immune and prognostic genes in CRC. This was achieved through the integration of data from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) public databases, and the application of bioinformatics techniques including differential expression analysis, machine learning algorithms, immune infiltration analysis, molecular docking, molecular dynamics simulations, and single-cell and spatial transcriptomics (ST) analysis. In addition, by integrating various clinical factors and expression data of immune and prognostic genes in CRC, an effective prognostic model was established. This immune gene-based prognostic model may provide a new direction for managing CRC patients, predicting survival rates, and providing valuable references for clinical decision-making, thereby facilitating the development of personalized medicine. The overall workflow of this study is presented in Figure 1. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-918/rc).
Figure 1.
Schematic illustration of the overall workflow of this study. DEG, differentially expressed gene; GO, Gene Ontology; GSEA, Gene Set Enrichment Analysis; KEGG, Kyoto Encyclopedia of Genes and Genomes.
Methods
Data acquisition
The study utilized publicly accessible data primarily obtained from TCGA (https://portal.gdc.cancer.gov/) and the GEO (https://www.ncbi.nlm.nih.gov/geo/) databases. Clinical and RNA sequencing data for 698 CRC patients were obtained from TCGA. This dataset comprised 647 CRC tissue samples and 51 normal colorectal tissue samples, along with clinical details including age, gender, T stage, N stage, M stage, pathological stage, carcinoembryonic antigen (CEA) levels, and lymphatic invasion. The CRC dataset GSE21815, sourced from the GEO database, was sequenced using the Agilent-014850 Whole Human Genome Microarray 4x44K (Probe Name version; Agilent Technologies, Santa Clara, CA, USA) and includes 141 tissue samples from CRC patients, consisting of 132 CRC tissues and 9 normal colorectal tissues. Besides, we accessed the single-cell dataset GSE261388 from GEO, sequenced on the Illumina NovaSeq 6000 platform (Illumina Inc., San Diego, CA, USA), which included 3 normal colorectal mucosa samples and 3 CRC samples, totaling 6 samples for this study. ST data were sourced from the Single-Cell Colorectal Cancer Liver Metastases (CRLM) Atlas (http://www.cancerdiversity.asia/scCRLM/), from which one CRC ST sample was chosen for detailed analysis. In our analysis, TCGA samples served as the training dataset, while the GEO dataset GSE21815 was used as the validation dataset.
Screening of immune-related and prognostic differentially expressed genes (DEGs)
DEGs related to immune and prognostic factors were identified using the Xiantao Academic Platform (https://www.xiantaozi.com/). The mRNA specific to CRC was filtered, yielding DEGs with |log fold change (FC)| >2 and adjusted P<0.05. For prognostic screening, we identified mRNA associated with CRC prognosis with P<0.05. Genes associated with immune phenotypes were obtained from the ImmPort database (https://www.immport.org/home). The intersection of DEGs, prognostic genes, and immune genes yielded immune and prognostic-related DEGs. Subsequently, we employed the Least Absolute Shrinkage and Selection Operator (LASSO) regression to refine the selection of these intersecting genes, followed by univariate Cox regression and forest plot validation to ascertain their clinical prognostic significance. To further filter and evaluate hub genes in CRC, we employed five machine learning algorithms: 1,000 iterations of 10-fold cross-validation LASSO, Learning Vector Quantization (LVQ), Boruta, Treebag, and Support Vector Machine (SVM). Genes identified as features by all five algorithms were considered significant hub DEGs. The R package “Upset” was employed to visualize the interactions among these hub genes.
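As an illustration of the thresholding and set intersection described above, the following is a minimal Python sketch (the analysis itself was run on the Xiantao Academic Platform and in R). The gene names, fold changes, and P values are hypothetical placeholders, not study data, and the helper name select_degs is an assumption introduced here.

```python
# Hypothetical toy inputs: gene -> (log fold change, adjusted P value).
diff_results = {
    "GENE_A": (2.6, 0.001),
    "GENE_B": (-3.1, 0.020),
    "GENE_C": (1.2, 0.0004),  # fails the fold-change cutoff
    "GENE_D": (2.9, 0.200),   # fails the adjusted-P cutoff
    "GENE_E": (4.0, 0.003),
}
# Hypothetical prognostic and immune gene lists.
prognostic_genes = {"GENE_A", "GENE_B", "GENE_D", "GENE_E"}
immune_genes = {"GENE_A", "GENE_C", "GENE_E"}


def select_degs(results, lfc_cutoff=2.0, padj_cutoff=0.05):
    """Keep genes with |logFC| > cutoff and adjusted P < cutoff."""
    return {
        gene
        for gene, (lfc, padj) in results.items()
        if abs(lfc) > lfc_cutoff and padj < padj_cutoff
    }


degs = select_degs(diff_results)
# Candidate genes must appear in all three sets.
candidates = degs & prognostic_genes & immune_genes
print(sorted(candidates))
```

The same set intersection applies to the final step, where only genes selected by all five machine learning algorithms are retained.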
Clinical significance analysis of hub genes and validation in the validation dataset
Using the Xiantao Academic Platform, we generated a volcano plot to illustrate the specific positions of hub genes. RNA-seq data from the TCGA-COAD and TCGA-READ projects were downloaded and organized, with FPKM (fragments per kilobase of transcript per million mapped reads) format data extracted for comparative analysis. The pROC package (1.18.0, R Foundation for Statistical Computing, Vienna, Austria) was utilized to plot diagnostic receiver operating characteristic (ROC) curves, while paired sample plots were generated from corresponding cancer and adjacent tissue samples. In addition, we downloaded CRC clinical data to construct Kaplan-Meier (KM) curves using the survival package (3.3.1). The expression of hub genes was analyzed in relation to clinical M stage, T stage, and pathological stage, with results visualized using ggplot2 (3.3.6). The GSE21815 dataset was also utilized to generate comparative plots and diagnostic ROC curves to validate the significance and clinical diagnostic relevance of hub genes.
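To make the diagnostic AUC concrete, the sketch below computes it in Python as the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample, counting ties as one half. This is the standard rank-based interpretation of the AUC rather than a reproduction of the pROC workflow, and the expression values are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as P(positive score > negative score) + 0.5 * P(tie)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical expression values of one gene in tumor and normal samples.
tumor = [5.1, 6.3, 4.8, 7.0]
normal = [3.2, 4.9, 2.7]

auc = roc_auc(tumor, normal)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.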
Construction of a hub gene prognostic model
We next established a clinical prognostic model to examine factors affecting CRC prognosis, integrating age, gender, pathological M, N, and T stages, lymphatic invasion, and hub genes, and represented it with a nomogram. Calibration curves were generated to evaluate the concordance between predicted and observed survival rates at 1, 3, and 5 years. Risk scores derived from the nomogram were classified into high and low-risk groups, and risk factor plots were generated to illustrate the model’s effectiveness in stratifying samples and highlighting survival differences between these groups.
Hub gene identification and enriched pathway mapping using bioinformatics tools
We employed the STRING database (version 12.0, https://cn.string-db.org/) to investigate protein-protein interactions (PPI) across different diseases. The STRING database integrates established and predicted connections between proteins, including physical interactions and functional relationships. For PPI construction, we set the minimum interaction score to 0.150 as a low confidence level and limited the first shell to a maximum of 50 interactors, identifying genes associated with hub genes.
Gene Ontology (GO) enrichment analysis included biological processes (BP), molecular functions (MF), and cellular components (CC). The Kyoto Encyclopedia of Genes and Genomes (KEGG) was utilized as a bioinformatics tool to identify significantly altered metabolic pathways in gene lists. The R package “clusterProfiler” was used for GO and KEGG enrichment analysis of genes associated with hub genes, applying a correction threshold of P<0.05 and q<0.25 to identify the top five functions in BP, CC, MF, and KEGG, which were visualized using lollipop plots.
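Over-representation analysis of GO terms and KEGG pathways is commonly based on a hypergeometric test. The following Python sketch shows that calculation for a single term under this assumption; the counts are hypothetical, the function name is introduced here for illustration, and multiple-testing correction is omitted.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric P value, P(X >= k).

    N: genes in the background
    K: background genes annotated to the term
    n: genes in the query list
    k: query genes annotated to the term
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 20 background genes, 5 annotated to the term,
# 4 genes in the query list, 2 of which carry the annotation.
p_value = hypergeom_enrichment_p(k=2, n=4, K=5, N=20)
print(round(p_value, 4))
```

In practice the resulting P values would be adjusted across all tested terms before applying the P<0.05 and q<0.25 thresholds.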
We next conducted GSEA to identify significant gene expression differences between disease and control groups, using the R package ‘DESeq2’ for differential expression analysis. The ‘clusterProfiler’ package was utilized for GSEA, performing 1,000 gene set permutations with the c2.cp.all.v2022.1.Hs.symbols.gmt (All Canonical Pathways) reference set from MSigDB (https://www.gsea-msigdb.org/gsea/msigdb). Gene sets with adjusted P<0.05 and q<0.25 were considered significantly enriched, and the top 5 upregulated and downregulated pathways were identified and visualized.
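The selection of significant gene sets and the top five pathways in each direction can be sketched as follows. This Python example assumes a result table with a normalized enrichment score (NES), adjusted P value, and q value per pathway; the pathway names and numbers are hypothetical.

```python
def top_pathways(rows, n=5, padj_cutoff=0.05, q_cutoff=0.25):
    """Return the top-n up- and down-regulated significant pathways by NES."""
    significant = [
        r for r in rows if r["padj"] < padj_cutoff and r["qvalue"] < q_cutoff
    ]
    up = sorted(
        (r for r in significant if r["nes"] > 0),
        key=lambda r: r["nes"],
        reverse=True,
    )[:n]
    down = sorted(
        (r for r in significant if r["nes"] < 0),
        key=lambda r: r["nes"],
    )[:n]
    return [r["pathway"] for r in up], [r["pathway"] for r in down]


# Hypothetical GSEA results.
gsea_rows = [
    {"pathway": "PATHWAY_UP_1", "nes": 2.4, "padj": 0.001, "qvalue": 0.002},
    {"pathway": "PATHWAY_UP_2", "nes": 1.9, "padj": 0.010, "qvalue": 0.020},
    {"pathway": "PATHWAY_UP_3", "nes": 1.5, "padj": 0.200, "qvalue": 0.300},
    {"pathway": "PATHWAY_DOWN_1", "nes": -2.2, "padj": 0.002, "qvalue": 0.004},
    {"pathway": "PATHWAY_DOWN_2", "nes": -1.7, "padj": 0.030, "qvalue": 0.060},
    {"pathway": "PATHWAY_DOWN_3", "nes": -2.8, "padj": 0.040, "qvalue": 0.300},
]

up_names, down_names = top_pathways(gsea_rows)
print(up_names, down_names)
```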
Immune phenotype score, immune infiltration, and immune checkpoint analysis
Immune phenotype-related genes were obtained from the ImmPort database to be used as marker genes for calculating the immune phenotype score. We assessed immune infiltration levels in all CRC patient samples utilizing the ssgseaParam function from the R package ‘GSVA’.
Immune infiltration refers to the accumulation of immune cells, such as T cells, B cells, and macrophages, within the tumor or inflammatory tissues, indicating the immune system’s reaction to abnormal cells or pathogens. The single-sample Gene Set Enrichment Analysis (ssGSEA) method estimates the relative abundance of 28 immune cell types from gene expression data, yielding one enrichment score per cell type and sample. Using R packages “limma”, “reshape2”, “ggpubr”, “corrplot”, “dplyr”, “tidyverse”, “ggplot2”, “linkET”, and “ggExtra”, we generated comparative plots, correlation heatmaps, immune cell and gene expression networks, and correlation lollipop plots for each hub gene with immune cells.
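The idea behind an ssGSEA score can be illustrated with a simplified Python sketch: genes in one sample are ranked by expression, and a rank-weighted running sum compares marker genes with all other genes. This is a didactic approximation under stated assumptions (weighting exponent 0.25, no score normalization, arbitrary tie order); the GSVA implementation used in the study may differ in these details. Gene names and values are hypothetical.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one gene set.

    expression: dict mapping gene -> expression value in one sample
    gene_set: set of marker genes for one immune cell type
    """
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    # Highest expressed gene receives rank value n, lowest receives 1.
    rank_value = {gene: n - i for i, gene in enumerate(ordered)}
    hits = [gene in gene_set for gene in ordered]
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must overlap the sample only partially")

    hit_total = sum(rank_value[g] ** alpha for g, h in zip(ordered, hits) if h)
    miss_step = 1.0 / (n - n_hits)

    p_hit = p_miss = score = 0.0
    for gene, is_hit in zip(ordered, hits):
        if is_hit:
            p_hit += rank_value[gene] ** alpha / hit_total
        else:
            p_miss += miss_step
        # Accumulate the difference between the two running distributions.
        score += p_hit - p_miss
    return score


# Hypothetical sample with six genes.
sample = {"G1": 9.0, "G2": 8.0, "G3": 5.0, "G4": 4.0, "G5": 1.0, "G6": 0.5}
high_score = ssgsea_score(sample, {"G1", "G2"})  # markers highly expressed
low_score = ssgsea_score(sample, {"G5", "G6"})   # markers lowly expressed
print(high_score > 0 > low_score)
```

A marker set concentrated among the most highly expressed genes yields a positive score, while a set concentrated among the lowest yields a negative one.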
Immune checkpoint genes, which encode molecules expressed on immune cell surfaces that regulate immune activation, were identified from the literature (8,9), resulting in the selection of 30 immune checkpoints. A correlation heatmap was generated to illustrate the relationships between hub genes and immune checkpoints.
Molecular docking and molecular dynamics simulation of hub genes with predicted drugs
We employed the Comparative Toxicogenomics Database (CTD) to predict protein-drug interactions and identify small-molecule compounds targeting the hub genes, drawing on literature and clinical studies. We downloaded 3D structures of selected drugs in SDF format from PubChem and obtained PDB files for target proteins from the Protein Data Bank (PDB). Molecular docking was conducted via the CB-Dock2 web server (https://cadd.labshare.cn/cb-dock2/php/blinddock.php), where lower binding energy signifies greater receptor-ligand interaction affinity and stability. The results of molecular docking were visualized using PyMOL.
Molecular dynamics simulations were conducted to explore the interaction mechanisms between hub genes and drug molecules. This technique enabled the observation of atomic-level molecular interactions, such as bond dynamics and molecular motion.
We utilized the Gromacs software package, a widely recognized tool for molecular simulations, to perform detailed experiments. Proteins were modeled using CHARMM36 force field parameters, and ligand topology was constructed with GAFF parameters. The protein-ligand complex was positioned in a cubic box with periodic boundary conditions and filled with TIP3P water molecules. Prior to simulation, energy minimization was achieved through the steepest descent and conjugate gradient methods. Subsequently, we performed 100 picoseconds (ps) of isothermal-isochoric and isothermal-isobaric ensemble equilibrations. After energy minimization and equilibration, a 100 nanoseconds (ns) molecular dynamics simulation was performed with a 2 femtoseconds (fs) time step, recording structural coordinates every 10 ps. We conducted an analysis of the molecular dynamics simulation trajectory, focusing on root mean square deviation (RMSD), root mean square fluctuation (RMSF), solvent-accessible surface area (SASA), radius of gyration (Rg), and the number of hydrogen bonds between proteins and ligands.
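The simulation lengths above translate into integration step counts by simple arithmetic, shown here as a short Python sketch. It assumes that the 100 ps equilibration applies to each ensemble separately, which the text does not state explicitly; the variable names are illustrative and are not GROMACS parameter names.

```python
FS_PER_PS = 1_000
FS_PER_NS = 1_000_000

time_step_fs = 2          # integration time step
production_ns = 100       # production run length
equilibration_ps = 100    # assumed length of each equilibration phase
save_interval_ps = 10     # coordinates recorded every 10 ps

# Number of integration steps in the production run.
production_steps = production_ns * FS_PER_NS // time_step_fs
# Number of integration steps in one equilibration phase.
equilibration_steps = equilibration_ps * FS_PER_PS // time_step_fs
# Steps between two saved coordinate frames.
steps_between_frames = save_interval_ps * FS_PER_PS // time_step_fs
# Saved frames over the production run, not counting the starting frame.
saved_frames = production_steps // steps_between_frames

print(production_steps, equilibration_steps, steps_between_frames, saved_frames)
```

The trajectory analyzed for RMSD, RMSF, SASA, Rg, and hydrogen bonds would therefore contain on the order of 10,000 frames.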
Single-cell, cell communication, and pseudotime analysis
For the GSE261388 single-cell dataset, we performed quality control filtering based on the following criteria: 200< Count_RNA <5,000, Feature_RNA >200, complexity log10GenesPerUMI >0.9, and mitochondrial gene ratio mitoRatio <0.2. Low-expressed genes were removed, retaining only genes with expression levels greater than 10. Data normalization and standardization were conducted, and doublets were identified and removed to retain single cells. The “VariableFeaturePlot” function identified 3,000 highly variable genes. We applied the “RunPCA” function for principal component analysis (PCA) to reduce the dimensionality of the scRNA-seq data based on the top 3,000 genes, selecting dims =30. Cells were grouped into clusters using the “FindNeighbors” and “FindClusters” functions with a resolution of 0.8. Cell communication analysis was conducted to identify both incoming and outgoing pathways, as well as ligand-receptor pairs (10,11).
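The cell-level quality control thresholds can be expressed as a simple filter. The Python sketch below assumes that log10GenesPerUMI is defined as log10(detected genes) divided by log10(total counts), a common convention that the text does not spell out; the field names and cell values are hypothetical.

```python
import math


def passes_qc(cell):
    """Apply the count, feature, complexity, and mitochondrial thresholds."""
    n_count = cell["n_count"]      # total RNA counts per cell
    n_feature = cell["n_feature"]  # detected genes per cell
    if not (200 < n_count < 5000 and n_feature > 200):
        return False
    # Assumed definition of the complexity metric log10GenesPerUMI.
    complexity = math.log10(n_feature) / math.log10(n_count)
    return complexity > 0.9 and cell["mito_ratio"] < 0.2


# Hypothetical cells.
cells = {
    "cell_1": {"n_count": 3000, "n_feature": 1800, "mito_ratio": 0.05},
    "cell_2": {"n_count": 4000, "n_feature": 900, "mito_ratio": 0.05},   # low complexity
    "cell_3": {"n_count": 3000, "n_feature": 1800, "mito_ratio": 0.35},  # high mito ratio
    "cell_4": {"n_count": 8000, "n_feature": 4000, "mito_ratio": 0.05},  # too many counts
}

kept = sorted(name for name, cell in cells.items() if passes_qc(cell))
print(kept)
```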
In this study, we utilized the CellChat software package to compute and analyze intercellular communication among different cell types in CRC samples, maintaining default parameters for in-depth analysis and visualization of communication levels among signaling pathways. Pseudotime analysis was conducted using the “monocle” package to generate pseudotime trajectories, providing insights into branching and linear differentiation processes. Genes with a dispersion estimate greater than 0.3 and an average expression exceeding 0.05 were utilized to construct pseudotime trajectories. The default parameters of the DDRTree algorithm were employed. We employed Monocle’s integrated branch expression analysis modeling to visualize hub gene expression as heatmaps across various time trajectories, enabling quantitative analysis of gene expression changes during cell fate determination.
ST analysis
The Seurat package was employed for processing and visualizing ST data. We utilized the NormalizeData and ScaleData functions for normalization and scaling of ST data. An unsupervised clustering technique was then utilized to group similar ST spots. Cell population annotations were based on hematoxylin and eosin (H&E) staining sections and the expression of highly variable genes within each cluster. Differences in cell subgroups were determined through visual inspection, aided by immunohistochemistry and ST data. The SpatialDimPlot and SpatialFeaturePlot functions were used together to visualize gene expression levels in ST data (12). Moreover, the expression levels of hub genes across different cell types in ST were displayed.
Statistical analysis
Statistical analyses were conducted with R version 4.3.3, utilizing Spearman correlation tests to assess parameter relationships. The Wilcoxon test was employed to assess group differences. Statistical significance was categorized as P<0.05 (*), P<0.01 (**), and P<0.001 (***), while ‘ns’ denoted a lack of significant difference.
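For readers less familiar with these conventions, the Python sketch below shows how a Spearman correlation coefficient is obtained (Pearson correlation of average ranks) and how P values map to the significance labels defined above. It does not compute P values, and the input values are hypothetical; the study itself used R.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def significance_label(p):
    """Map a P value to the labels used in the figures."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


rho = spearman([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 25.0, 100.0])
print(rho, significance_label(0.03))
```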
Results
Identification of DEGs associated with immune response and prognosis
We utilized the filter molecular modules of the Xiantao Academic Platform to identify 2,154 DEGs in CRC samples. In the prognosis screening module, we identified 1,876 prognostic genes for CRC. We acquired 2,483 immune-related genes from the ImmPort database. We identified 18 DEGs related to both immunity and prognosis by intersecting the sets of DEGs, prognostic genes, and immune genes (Figure 2A). Through prognosis LASSO coefficient screening, we selected 14 genes from the aforementioned 18 genes (Figure 2B,2C). Univariate Cox regression confirmed the prognostic relevance of these 14 genes, which were visualized in a forest plot. All 14 genes had P values below 0.05. Specifically, KIR2DL4, TNFRSF17, CXCL11, CXCL8, and TG were identified as protective genes, whereas UCN, LEP, ULBP2, TPM2, INHBB, STC2, ANGPTL1, FGF19, and NR5A1 were classified as pathogenic genes (Figure 2D).
Figure 2.
Identification of immune and prognosis related differentially expressed genes. (A) Venn diagram of differentially expressed genes, immune genes, and prognostic genes. (B) Prognostic LASSO coefficient screening graph for colorectal cancer. (C) Prognostic LASSO variable trajectory chart in colorectal cancer. (D) Forest plot of 14 immune and prognosis-related genes. CI, confidence interval; DEG, differentially expressed gene; HR, hazard ratio; LASSO, Least Absolute Shrinkage and Selection Operator.
Using five machine learning algorithms, we further screened the nine pathogenic genes. LASSO regression identified eight genes (Figure 3A), while LVQ analysis selected seven genes (Figure 3B). Boruta retained nine genes (Figure 3C), Treebag included all nine genes (Figure 3D), and SVM selected six genes (Figure 3E). The six genes identified by all five machine learning methods were ANGPTL1, FGF19, INHBB, STC2, UCN, and ULBP2 (Figure 3F). After literature review and comprehensive clinical analysis, we ultimately selected ULBP2, INHBB, and STC2 as immunity and prognostic hub genes in CRC samples.
Figure 3.
Five machine learning algorithms for screening three hub genes. (A) LASSO. (B) LVQ. (C) Boruta. (D) Bagged Tree. (E) SVM. (F) UpSet plot shows the intersection genes of five machine learning algorithms. LASSO, Least Absolute Shrinkage and Selection Operator; LVQ, Learning Vector Quantization; SVM, Support Vector Machine.
Clinical significance analysis of hub genes and validation in the validation dataset
To analyze the significance of hub genes in clinical diagnosis and prognosis, a volcano plot was generated using CRC samples from TCGA, showing the specific positions of three hub genes. The three hub genes, ULBP2, INHBB, and STC2, were all located in the high-expression region with positive logFC values (Figure 4A). Consistent with previous findings, ULBP2 shows differential expression in CRC, and our further validation confirms its diagnostic and prognostic value (13). A group comparison plot revealed that the three hub genes exhibited significantly higher expression in the disease group than in the control group (Figure 4B). Diagnostic ROC analysis revealed that the area under the curve (AUC) values for ULBP2, INHBB, and STC2 were 0.983, 0.801, and 0.945, respectively, all exceeding 0.7, which indicated substantial diagnostic value (Figure 4C).
Figure 4.
Clinical significance analysis of three hub genes and validation in the GSE21815 dataset. (A) The volcano plot of the TCGA colorectal cancer dataset shows three hub genes. (B) The box plot shows three hub genes in colorectal cancer. (C) The diagnostic ROC curve diagram shows three hub genes in colorectal cancer. (D) The paired diagram shows three hub genes in colorectal cancer. (E) The box plot shows three hub genes in GSE21815 dataset. (F) The diagnostic ROC curve diagram shows three hub genes in GSE21815 dataset. (G) The KM curve of ULBP2 high and low expression groups. (H) The KM curve of INHBB. (I) The KM curve of STC2. (J) The group comparison of the three-hub gene in M stages. (K) The group comparison of the three-hub gene in T stages. (L) The group comparison of the three-hub gene in pathologic stages. *, P<0.05; **, P<0.01; ***, P<0.001. AUC, area under the curve; CI, confidence interval; FPKM, fragments per kilobase of transcript per million mapped reads; FPR, false positive rate; HR, hazard ratio; KM, Kaplan-Meier; ROC, receiver operating characteristic; TCGA, The Cancer Genome Atlas; TPR, true positive rate.
In both the paired sample plot of adjacent and cancer samples and the group comparison plot using the GSE21815 dataset, all three hub genes showed significantly higher expression levels in the disease group compared to the control group (Figures 4D,4E). The diagnostic ROC analysis revealed significant diagnostic value for ULBP2, INHBB, and STC2, with areas under the curve of 0.908, 0.742, and 0.934, respectively (Figure 4F). The KM curves, derived from CRC clinical data, indicated hazard ratio (HR) values of 1.48 for ULBP2, 1.73 for INHBB, and 1.52 for STC2, all with P values below 0.05. High expression of each of the three hub genes was associated with poor prognosis (Figure 4G-4I). The expression of the three hub genes correlated with clinical M stage, T stage, and pathological stage, and the group comparison plot results showed significant differences across various stages (Figure 4J-4L).
Construction of three hub gene prognostic models
A clinical prognostic model for CRC was developed to examine factors influencing prognosis, incorporating age, gender, pathologic M, N, and T stages, lymphatic invasion, and the genes ULBP2, INHBB, and STC2. A nomogram demonstrated the significant prognostic value of these three hub genes (Figure 5A). Calibration curves were constructed to evaluate the consistency between predicted and actual survival rates at 1, 3, and 5 years. The calibration curves indicated that the predicted values were generally consistent with the actual values (Figure 5B-5D).
Figure 5.
Construction of clinical prognostic model with three hub genes and other clinical indicators. (A) Nomogram of clinical prognostic model. (B) Calibration curve in 1 year. (C) Calibration curve in 3 years. (D) Calibration curve in 5 years. (E) Risk factor diagram of high and low expression groups of nomogram risk values.
Patients were next categorized into high-risk and low-risk groups based on the risk scores from the nomogram, and a risk factor plot was generated to illustrate this classification. The risk factor plot includes four components: (I) risk score, categorizing nomogram-derived scores into high and low-risk groups based on the median; (II) survival outcomes, illustrated by a dot plot indicating higher mortality in the high-risk group compared to the low-risk group; (III) risk group, represented by a horizontal bar showing the distribution of high and low-risk scores; (IV) heatmap, depicting the expression levels of ULBP2, INHBB, and STC2, which were elevated in the high-risk group and reduced in the low-risk group (Figure 5E).
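The median-based grouping can be sketched in a few lines of Python. The patient identifiers and risk scores are hypothetical, and assigning scores equal to the median to the low-risk group is an assumption, since the text does not specify how ties at the cutoff are handled.

```python
from statistics import median


def split_by_median(risk_scores):
    """Label each patient as high or low risk relative to the median score."""
    cutoff = median(risk_scores.values())
    return {
        patient: ("high" if score > cutoff else "low")
        for patient, score in risk_scores.items()
    }


# Hypothetical nomogram-derived risk scores.
scores = {"P1": 0.2, "P2": 1.4, "P3": 0.9, "P4": 2.1, "P5": 0.5, "P6": 1.1}
groups = split_by_median(scores)
print(groups)
```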
Hub gene identification and enriched pathway mapping using bioinformatics tools
Using the STRING database, we explored the interactions between the three hub genes and other proteins, identifying 53 genes. The PPI network consisted of 53 nodes and 289 edges. We conducted functional enrichment analysis to study the biological functions and pathways of the protein-coding genes related to the three hub genes. GO analysis revealed gene enrichment in several areas: activin receptor and transmembrane receptor protein serine/threonine kinase signaling pathways, natural killer cell-mediated cytotoxicity and immunity, and positive regulation of SMAD protein phosphorylation (BP); external side of plasma membrane, membrane-anchored components, serine/threonine protein kinase and protein kinase complexes, and plasma membrane signaling receptor complexes (CC); and activin binding, transmembrane receptor protein serine/threonine kinase and activin-activated receptor activities, hormone activity, and transmembrane receptor protein kinase activity (MF) (Figure 6A). The BP terms are consistent with the reported role of INHBB in regulating related signaling pathways, and our KEGG analysis further supports its involvement in the TGF-β signaling pathway (14). Integrated GO analysis of hub gene-associated proteins, incorporating logFC values, demonstrated significant functional enrichment across multiple BP terms in a circle plot (Figure 6B). KEGG analysis, visualized as a lollipop plot, revealed that these genes were enriched in the TGF-β signaling pathway, natural killer cell-mediated cytotoxicity, pluripotency regulation of stem cells, cytokine-cytokine receptor interaction, and fluid shear stress and atherosclerosis (Figure 6C), and the integrated KEGG-logFC circle plot highlighted these five pathways (Figure 6D).
Figure 6.
GO, KEGG, and GSEA functional enrichment analysis. (A) Lollipop plots show the top 5 enriched GO terms for BP, CC, and MF. (B) Circle plots show the top 5 enriched GO terms for BP, CC, and MF. (C) Lollipop plot shows the top 5 enriched KEGG pathways. (D) The circle plot shows the top 5 enriched KEGG pathways. (E) GSEA advanced visualization shows the top 5 upregulated pathways. (F) GSEA advanced visualization shows the top 5 downregulated pathways. BP, biological process; CC, cellular component; FC, fold change; GO, Gene Ontology; GSEA, Gene Set Enrichment Analysis; KEGG, Kyoto Encyclopedia of Genes and Genomes; MF, molecular function.
We conducted GSEA to identify the most significantly altered pathways between the disease group (CRC) and the control group, aiming to clarify the primary sources of differing patient risks. We identified the top 5 upregulated and downregulated pathways by referencing pathway data from the MSigDB database and evaluating normalized enrichment scores (NES). GSEA revealed the top five upregulated pathways as reactome formation of the cornified envelope, reactome keratinization, reactome meiotic recombination, reactome DNA methylation, and reactome amyloid fiber formation (Figure 6E). Conversely, the top five downregulated pathways included reactome cGMP effects, KEGG calcium signaling pathway, reactome platelet calcium homeostasis, reactome platelet homeostasis, and reactome nitric oxide stimulates guanylate cyclase (Figure 6F).
Immune phenotype scoring, immune infiltration analysis, and immune checkpoint analysis
We compared CRC samples from TCGA with high and low immune phenotype scores, utilizing 2,483 immune phenotype-related genes from the ImmPort database as markers. Samples were categorized into high and low score groups according to their immune phenotype scores.
Using the ssGSEA method, we examined the infiltration of 28 immune cell types in both high and low score groups. All 28 types of immune cells exhibited significant differences between the high and low expression groups (P<0.001), with elevated expression levels in the high expression group and reduced levels in the low expression group (Figure 7A). We performed a correlation analysis to elucidate the relationships among immune cells within the CRC immune microenvironment, revealing that most immune cells exhibited positive correlations (Figure 7B). In the immune cell expression network diagram, the hub genes ULBP2, INHBB, and STC2 are connected by lines whose thickness indicates correlation strength; orange lines denote positive correlations, while green lines signify negative correlations. In the heatmap, darker colors represent stronger correlations (Figure 7C). In the correlation lollipop charts for each hub gene and immune cell, we included only those immune cells with a P less than 0.05. ULBP2 expression showed positive correlations with 14 immune cell types, including activated dendritic cells, activated CD4 T cells, immature dendritic cells, type 2 T helper cells, natural killer T cells, myeloid-derived suppressor cells (MDSCs), regulatory T cells, T follicular helper cells, CD56 bright natural killer cells, CD56 dim natural killer cells, natural killer cells, activated CD8 T cells, type 17 T helper cells, and type 1 T helper cells. Conversely, it was negatively correlated with five immune cell types: eosinophils, immature B cells, memory B cells, mast cells, and activated B cells (Figure 7D). INHBB expression showed positive correlations with ten immune cell types, including regulatory T cells, natural killer cells, plasmacytoid dendritic cells, CD56 bright natural killer cells, MDSCs, immature dendritic cells, type 1 T helper cells, natural killer T cells, macrophages, and activated dendritic cells. 
INHBB expression was negatively correlated with three immune cell types: immature B cells, memory B cells, and activated B cells (Figure 7E). STC2 expression was positively correlated with six immune cell types: CD56 bright natural killer cells, activated CD4 T cells, immature dendritic cells, regulatory T cells, plasmacytoid dendritic cells, and activated dendritic cells. Conversely, it was negatively correlated with nine immune cell types: eosinophils, mast cells, effector memory CD8 T cells, central memory CD4 T cells, monocytes, effector memory CD4 T cells, immature B cells, memory B cells, and activated B cells (Figure 7F).
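The gene-versus-immune-cell correlations behind the lollipop charts can be illustrated as follows. The correlation method is not named in this section, so the sketch assumes Spearman rank correlation and, to stay within the standard library, estimates the P value with a permutation test; the actual analysis platform may compute both differently. All values below are hypothetical toy numbers.

```python
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    # Assumes neither input is constant (otherwise the denominator is zero).
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation P value for Spearman rho (illustrative substitute)."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: one gene's expression and two immune-cell scores.
toy_gene_expr = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
toy_cell_scores = {
    "toy cell type 1": [0.10, 0.20, 0.25, 0.40, 0.50, 0.55, 0.70, 0.90],
    "toy cell type 2": [0.90, 0.80, 0.60, 0.65, 0.40, 0.30, 0.20, 0.10],
}
toy_results = {
    cell: (spearman(toy_gene_expr, scores), permutation_p_value(toy_gene_expr, scores))
    for cell, scores in toy_cell_scores.items()
}
# Keep only cell types passing the P < 0.05 filter used for the lollipop charts.
toy_significant = {cell: res for cell, res in toy_results.items() if res[1] < 0.05}
```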
Figure 7.
ssGSEA immune infiltration analysis using the immune phenotype score. (A) Boxplots show the differences in infiltration of 28 immune cells between the high- and low-score groups. (B) The correlation heatmap shows the relationships among the 28 immune cells. (C) The network plot shows the relationships between the 28 immune cells and the expression of the 3 hub genes based on ssGSEA. (D) The lollipop plot shows the correlation between the 28 immune cells and ULBP2 expression. (E) The lollipop plot shows the correlation between the 28 immune cells and INHBB expression. (F) The lollipop plot shows the correlation between the 28 immune cells and STC2 expression. (G) The correlation heatmap shows the correlation between immune checkpoints and the expression of the three hub genes. *, P<0.05; **, P<0.01; ***, P<0.001. ssGSEA, single-sample Gene Set Enrichment Analysis.
Thirty immune checkpoints were selected from the literature, and a heatmap was generated to show their correlations with the hub genes ULBP2, INHBB, and STC2 (Figure 7G). Most immune checkpoints were positively correlated with ULBP2 expression, whereas HHLA2 and TNFRSF25 were negatively correlated with it. SIGLEC15, NPR1, LAIR1, TNFSF4, ADORA2A, CD276, TNFSF14, ICOSLG, TNFRSF25, VSIR, TNFRSF4, CD40, and TNFRSF8 were positively correlated with INHBB expression, whereas CD244, ICOS, CD48, IDO1, HHLA2, CD44, and TNFRSF9 were negatively correlated with it. Most immune checkpoints were negatively correlated with STC2 expression. All of the above correlations were statistically significant.
Molecular docking and dynamics simulations of predicted drugs targeting hub genes
The CTD database was used to screen small-molecule drugs associated with ULBP2, INHBB, and STC2, retaining those with a reference count exceeding 1; this identified 25 compounds. After comprehensive evaluation, valproic acid, cyclosporine, and genistein were selected as the key candidate compounds for ULBP2, INHBB, and STC2, respectively. Molecular docking analysis was then performed to assess the affinity of each candidate drug for its target. Affinity is the score the docking software uses to estimate binding capability, and a more negative binding energy indicates stronger predicted binding. ULBP2 docked successfully with valproic acid (Figure 8A), with a binding energy of −5.3; INHBB docked successfully with cyclosporine (Figure 8B), with a binding energy of −12.49; and STC2 docked successfully with genistein (Figure 8C), with a binding energy of −8.5. Xu et al. have reported the potential of STC2 as a CRC-related target, and our molecular docking and dynamics simulation results support its stable binding with genistein, providing a basis for targeted drug development (15). Yellow dashed lines in the figure indicate the hydrogen bonds formed between molecules; hydrogen bond lengths were calculated with PyMOL software (Schrödinger LLC, New York, NY, USA), which was also used to refine and export the images. The strong predicted binding of these small-molecule compounds to ULBP2, INHBB, and STC2 suggests that they may have therapeutic potential in CRC patients by targeting the proteins encoded by these hub genes.
Figure 8.
Molecular docking results of ULBP2, INHBB and STC2 with predicted drugs. (A) The ULBP2 protein was docked with valproic acid, and the hydrogen bonds formed between the molecules were marked with yellow dotted lines. (B) The INHBB protein was docked with cyclosporine. (C) The STC2 protein was docked with genistein.
Molecular dynamics simulations were conducted to further assess the binding stability between the small-molecule drugs and the three target proteins encoded by the hub genes. The root mean square deviation (RMSD) quantifies the conformational stability of proteins and ligands by measuring deviations in atomic positions from their initial states; a smaller deviation indicates greater conformational stability (16). RMSD was therefore used to evaluate the stability of the simulation systems. As shown in Figure 9A, all three complexes reached equilibrium after 70 ns, indicating stable binding of each drug to its target protein. The INHBB-cyclosporine complex exhibited the lowest RMSD value, around 0.6 nm, suggesting a notably stable interaction between INHBB and cyclosporine.
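As a minimal illustration of the RMSD definition above, the following sketch computes the deviation of one frame from a reference structure. It assumes the frame has already been superimposed on the reference (simulation packages normally perform this fitting step first) and uses hypothetical toy coordinates in nm.

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two equally sized coordinate sets.

    No structural alignment is performed here; both sets are assumed to be
    pre-superimposed.
    """
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(coords, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(coords))


# Hypothetical toy structure: three atoms, each displaced by (0.3, 0.4, 0.0) nm.
toy_reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
toy_frame = [(x + 0.3, y + 0.4, z) for x, y, z in toy_reference]
toy_rmsd = rmsd(toy_frame, toy_reference)  # every atom moved 0.5 nm
```

Evaluating this quantity for every frame of a trajectory gives a curve of the kind plotted in an RMSD-versus-time figure; a plateau indicates that the system has equilibrated.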
Figure 9.
Molecular dynamics simulation of three hub genes and predicted drugs. (A) RMSD of three hub genes and drugs. (B) RMSF of three hub genes and drugs. (C) SASA of three hub genes and drugs. (D) Rg of three hub genes and drugs. (E) Number of hydrogen bonds between three hub genes and drugs. Rg, radius of gyration; RMSD, root mean square deviation; RMSF, root mean square fluctuation; SASA, solvent-accessible surface area.
The root mean square fluctuation (RMSF) measures the flexibility of amino acid residues in a protein (17). As shown in Figure 9B, except for the C-terminus and N-terminus, the RMSF values of the amino acid residues in both the INHBB-cyclosporine and STC2-genistein complexes remained relatively low, ranging from 0.25 to 1 nm. The ULBP2-valproic acid complex exhibited slightly higher RMSF values than the other two complexes, suggesting greater flexibility. The solvent-accessible surface area (SASA) reflects the extent of protein surface exposure. The SASA of each target protein-drug complex was assessed in this simulation (Figure 9C). SASA values fluctuated stably for both the INHBB-cyclosporine and STC2-genistein complexes, suggesting no notable expansion or contraction after binding. The radius of gyration (Rg) reflects structural changes and the compactness of a protein, with an increase in Rg signifying expansion of the system. As shown in Figure 9D, the Rg value of the ULBP2-valproic acid complex decreased after binding, indicating that the interaction led to a tighter conformation. The Rg values of the INHBB-cyclosporine and STC2-genistein complexes remained stable throughout the 100 ns simulation, suggesting no significant structural changes. To characterize hydrogen bonding at the binding sites, we calculated the number of hydrogen bonds, the main stabilizing interactions, formed between the small-molecule ligands and the proteins. Figure 9E shows distinct hydrogen bonding patterns across the complexes: ULBP2-valproic acid maintained 1 stable bond (3–4 within 0.35 nm), INHBB-cyclosporine showed 1–2 stable bonds (2–3 within 0.35 nm), and STC2-genistein exhibited 1–3 stable bonds (5–7 within 0.35 nm). Based on this analysis of hydrogen bond formation, all three complexes exhibited good binding stability.
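Two of the quantities discussed above, RMSF and Rg, follow directly from atomic coordinates. The sketch below shows their standard definitions on a hypothetical toy trajectory (coordinates in nm, equal masses unless specified); it is not the simulation package's implementation and it skips trajectory fitting.

```python
import math


def rmsf(trajectory):
    """Per-atom root mean square fluctuation about each atom's mean position.

    `trajectory` is a list of frames; each frame is a list of (x, y, z) tuples.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for atom in range(n_atoms):
        positions = [frame[atom] for frame in trajectory]
        mean = [sum(p[axis] for p in positions) / n_frames for axis in range(3)]
        mean_sq = sum(
            sum((p[axis] - mean[axis]) ** 2 for axis in range(3)) for p in positions
        ) / n_frames
        result.append(math.sqrt(mean_sq))
    return result


def radius_of_gyration(frame, masses=None):
    """Mass-weighted radius of gyration of one frame (equal masses by default)."""
    if masses is None:
        masses = [1.0] * len(frame)
    total_mass = sum(masses)
    center = [
        sum(m * atom[axis] for m, atom in zip(masses, frame)) / total_mass
        for axis in range(3)
    ]
    weighted = sum(
        m * sum((atom[axis] - center[axis]) ** 2 for axis in range(3))
        for m, atom in zip(masses, frame)
    )
    return math.sqrt(weighted / total_mass)


# Hypothetical toy trajectory: atom 0 is fixed, atom 1 oscillates along x.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
toy_rmsf = rmsf(toy_trajectory)
toy_rg = radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

In this toy case the fixed atom has zero fluctuation and the oscillating atom has an RMSF of 1 nm, mirroring how flexible termini stand out in a per-residue RMSF plot; a falling Rg over time would correspond to a more compact complex.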
Single-cell, cell communication, and pseudotime analysis
After initial screening, logarithmic normalization and dimensionality reduction were applied to the scRNA-seq dataset. Using the uniform manifold approximation and projection (UMAP) method, 15 cell populations were annotated, and the distribution of each population was visualized in the UMAP plot (Figure 10A). The 15 cell populations were cancer-associated fibroblasts, CD4+ NKT-like cells, classical monocytes, erythroid-like and precursor cells, memory CD4+ T cells, mesenchymal cells, naive B cells, naive CD4+ T cells, plasma B cells, plasmacytoid dendritic cells, platelets, progenitor cells, stromal cells, tissue-resident memory T (TRM) cells, and an unknown population. The number of genes contained in each of the 15 populations varied, with stromal cells containing the most genes at 1,361, followed by cancer-associated fibroblasts with 1,194 genes (Figure 10B). Each cell population was characterized by a representative marker gene, and the 15 marker genes used for single-cell clustering analysis are displayed in the UMAP (Figure 10C). The dot plot highlights the three most significantly expressed genes in each cell cluster (Figure 10D). The UMAP displays the expression levels of ULBP2, INHBB, and STC2 across the different cell populations (Figure 10E). The dot plot indicated that ULBP2, INHBB, and STC2 were significantly expressed in progenitor cells (Figure 10F).
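The logarithmic normalization step can be sketched as follows. The exact settings are not given here, so the code assumes a common convention for scRNA-seq data: each cell's counts are divided by the cell's total, multiplied by a scale factor of 10,000, and transformed with the natural log of (1 + x). The gene labels and counts are hypothetical.

```python
import math


def log_normalize(cell_counts, scale_factor=10_000):
    """Library-size normalization followed by log1p for a single cell.

    `cell_counts` maps gene name -> raw count. The scale factor of 10,000 is
    an assumed convention, not a value stated in the study.
    """
    total = sum(cell_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in cell_counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in cell_counts.items()
    }


# Hypothetical raw counts for one cell.
toy_cell = {"geneA": 0, "geneB": 1, "geneC": 3}
toy_normalized = log_normalize(toy_cell)
```

Normalizing by each cell's total count makes cells with different sequencing depths comparable before dimensionality reduction, and the log transform compresses the dynamic range of highly expressed genes.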
Figure 10.
The cell populations identified from the GSE261388 scRNA-seq dataset. (A) Annotation of cell populations in colorectal cancer using UMAP. (B) Bar graph showing the number of genes contained in each cell population. (C) Feature plots showing the expression of the top marker gene in each of the 15 cell populations. (D) Dot plot showing the expression of the top 3 marker genes in each of the 15 cell populations. (E) Feature plots showing the expression of the 3 hub genes in the 15 cell populations. (F) Dot plot showing the expression of the 3 hub genes in the 15 cell populations. UMAP, uniform manifold approximation and projection.
To further investigate the interactions among the 15 cell clusters, we used the “CellChat” tool for cell interaction analysis. We found strong interactions among stromal cells, cancer-associated fibroblasts, and progenitor cells, indicating that these interactions could influence the immune microenvironment (Figure 11A,11B). We further explored the potential incoming and outgoing signaling pathways and specific ligand-receptor pairs among the 15 cell types (Figure 11C). In the outgoing signaling patterns plot, progenitor cells exhibited the highest number of ligands, followed by cancer-associated fibroblasts. The potential signaling pathways for progenitor cells included COLLAGEN, MHC-I, APP, MIF, PECAM1, LAMININ, CD99, CDH5, VISFATIN, CXCL, ESAM, NOTCH, JAM, EPHB, CD46, NECTIN, ICAM, THBS, PTPRM, and PDGF. The potential signaling pathways for cancer-associated fibroblasts included COLLAGEN and FN1 (Figure 11D). Analysis of incoming signaling patterns revealed that CD4+ NKT-like cells harbored the highest number of receptors, followed by progenitor cells. The potential signaling pathways for progenitor cells included PECAM1, LAMININ, CCL, CD99, FN1, MK, CDH5, VISFATIN, ESAM, NOTCH, VEGF, PARs, JAM, TIGIT, EPHB, ADGRE5, CD46, and PTPRM. The potential signaling pathways for cancer-associated fibroblasts included PDGF (Figure 11E). The bubble plot analysis (Figure 11F) identified three key interaction patterns: (I) strong autocrine PECAM1-PECAM1 signaling among progenitor cells, (II) multi-directional APP-CD74-mediated communication from progenitor cells to themselves, naive B cells, and plasmacytoid dendritic cells, and (III) paracrine CCL5-ACKR1 signaling from CD4+ NKT-like cells to progenitor cells.
Figure 11.
The cell communication analysis from the GSE261388 scRNA-seq dataset. (A) The network graph demonstrates interactions among 15 cell populations. (B) The network graph illustrates the interactions between progenitor cells and other cells. (C) The heatmap shows the number of interactions between different types of immune cells. The horizontal and vertical axes of the heatmap represent Sources (senders) and Targets (receivers), respectively. (D) The heatmap shows the Outgoing signaling patterns between 15 different types of immune cells and 34 signaling molecules. The horizontal axis of the heatmap represents different cell types, and the vertical axis represents different signaling molecules. (E) The heatmap shows the Incoming signaling patterns between 11 different types of immune cells and 34 signaling molecules. (F) Communication bubble diagrams show cell communication between different cell types through ligands and receptors in signaling pathways.
We next reconstructed pseudotime cell trajectories using the 15 identified cell types to pinpoint key gene expression programs influencing CRC progression. The pseudotime stages in the trajectory revealed different processes (Figure 12A,12B). Temporal profiling of hub genes ULBP2, INHBB, and STC2 in the pseudotime analysis revealed stage-specific patterns. ULBP2 exhibited high expression in the mid-to-late stages of the pseudotime analysis, while both INHBB and STC2 showed high expression in the early stages of the pseudotime analysis (Figure 12C). Cellular pseudotemporal mapping showed progenitor cells predominantly occupying initial trajectory positions, whereas cancer-associated fibroblasts were distributed bimodally, with the majority in early stages and a small portion in the mid-to-late stage (Figure 12D).
Figure 12.
Pseudotime analysis of the GSE261388 scRNA-seq dataset. (A) Pseudotime sorting trajectory diagram. (B) Pseudotime state trajectory diagram. (C) The point heatmap shows the gene expression levels of ULBP2, INHBB and STC2 in pseudotime analysis. (D) Distribution of each of the 15 cell types along the pseudotime trajectory.
ST analysis
The ST data were standardized and normalized, and five different cell populations were annotated using the UMAP method. The UMAP enables visualization of the distribution of each cell population, which includes fibroblast, lamina propria, normal epithelium, smooth muscle, and tumor (Figure 13A). Clustering visualization using the UMAP method was conducted to compare tumor-adjacent tissues with tumor tissues (Figure 13B). Spatial mapping revealed distinct distribution patterns of the five cell types (Figure 13C). Figure 13D displays the spatial information of cells in both tumor-adjacent and tumor tissues. Figure 13E illustrates the spatial expression levels of the hub genes ULBP2, INHBB, and STC2 in individual cells.
Figure 13.
Spatial transcriptome analysis in colorectal cancer. (A) Annotating cell populations in colorectal cancer utilizing UMAP. (B) Annotation of tumor cells and paracancerous cells using UMAP. (C) Annotation of cell populations in spatial transcriptome. (D) Annotation of tumor cells and paracancerous cells in spatial transcriptome. (E) Expression levels of ULBP2, INHBB and STC2 in the spatial transcriptome. UMAP, uniform manifold approximation and projection.
Discussion
CRC is a major global health issue and a leading cause of cancer-related morbidity and mortality, characterized by complex interactions between genetic and environmental factors, which contribute to its pathogenesis and progression (18). Although early detection and treatment strategies have progressed, the prognosis for CRC patients remains heterogeneous, largely due to factors like tumor stage, histological features, and the tumor microenvironment (19). Recent studies have emphasized the vital role of immune responses in CRC, suggesting that immune-related biomarkers may be valuable prognostic indicators. Comprehending the molecular foundation of CRC, especially the interplay between immune components and tumor biology, is essential for creating targeted treatments and enhancing patient outcomes (20).
This study sought to identify immune and prognostic DEGs linked to CRC progression and to develop clinical prognosis models. Through bioinformatics analysis of clinical and gene expression data from TCGA and GEO databases, we identified differentially expressed, prognostic, and immune-related genes. Through LASSO regression, univariate Cox regression analysis, and machine learning algorithms, we identified ULBP2, INHBB, and STC2 as important hub genes (21). Current evidence suggests that ULBP2 is significantly associated with immune cell interactions and may influence tumor microenvironment dynamics, affecting patient survival (22). This study provides new ideas for personalized treatment and offers clinicians a reliable prognostic assessment tool, thereby helping to improve the survival rate of CRC patients (23).
This study employed five machine learning algorithms to identify the hub genes ULBP2, INHBB, and STC2. The application of machine learning techniques can improve the accuracy and efficiency of gene screening, providing new ideas for subsequent biomarker research. Each algorithm presents unique strengths and weaknesses in gene screening, and optimizing these algorithms to improve screening efficiency is a crucial area for future research (24). The three hub genes ULBP2, INHBB, and STC2 were all highly expressed in CRC patients and were associated with poorer prognosis. The Kaplan-Meier (KM) curves indicated hazard ratio (HR) values exceeding 1, with P values below 0.05. The high expression of these genes may serve as an important indicator of CRC prognosis, assisting clinicians in decision-making when formulating treatment plans. Further research on the expression differences of these genes in CRC at different stages and their correlation with clinical characteristics will help reveal their potential mechanisms and relationship with CRC progression (25). Validation in the GSE21815 dataset showed that ULBP2, INHBB, and STC2 were significantly overexpressed in the disease group, with areas under the ROC curve all greater than 0.7. The validation results indicated that the selected hub genes demonstrated good diagnostic capabilities across different datasets, enhancing the reliability of the research findings. Integrating diverse datasets for a more comprehensive analysis and designing future multicenter studies will be crucial for advancing this field (26).
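The area under the ROC curve used in this validation has a simple interpretation: it equals the probability that a randomly chosen disease sample has higher expression than a randomly chosen control sample, with ties counted as one half. The sketch below computes the AUC this way; the expression values are hypothetical toy numbers, not data from GSE21815.

```python
def roc_auc(case_values, control_values):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    if not case_values or not control_values:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical expression values for a single gene.
toy_tumor = [5.1, 6.3, 7.0, 4.2]
toy_normal = [3.0, 4.5, 2.8, 3.9]
toy_auc = roc_auc(toy_tumor, toy_normal)  # 15 of 16 pairs are ranked correctly
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is the scale on which the threshold of 0.7 quoted above should be read.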
Next, we constructed a clinical prognostic model incorporating hub genes and validated the model’s accuracy through calibration curves. This model offers a new tool for risk assessment in CRC patients, helping clinicians develop personalized treatment plans. In practical clinical applications, the analysis of the impact weights of different clinical factors in the model will enhance its usability and accuracy (27). Survival analysis revealed significant differences in outcomes between high-risk and low-risk groups, offering crucial evidence for clinical decision-making (28). By further optimizing the risk group strategy, the clinical applicability of the model can be improved, and it can even be extended to the construction of risk assessment models for other types of cancer (29).
Analysis of immune infiltration revealed significant differences in the infiltration of 28 immune cell types between the high- and low-score groups, underscoring the role of immune cells in the CRC microenvironment. The presence of these immune cells may affect tumor progression and treatment response; therefore, regulating the level of immune cell infiltration may help improve treatment outcomes. A deeper understanding of immune cell types and their functions can provide new therapeutic insights for clinical practice, especially in the context of personalized immunotherapy (30). The present study examined 30 immune checkpoints: most were positively correlated with ULBP2 and negatively correlated with STC2, whereas INHBB showed both positive and negative correlations. These findings provide a biological basis for understanding the response of CRC to immune checkpoint inhibitors and contribute to the development of new immunotherapy strategies. Future studies should integrate immune checkpoints with hub genes to investigate novel treatment strategies that enhance survival rates and quality of life for CRC patients (31).
The binding energies of ULBP2, INHBB, and STC2 with their corresponding drugs were −5.3, −12.49, and −8.5, respectively. These drugs could offer therapeutic benefits by targeting proteins associated with hub genes, suggesting novel approaches for personalized CRC treatment. Molecular docking results revealed the drugs’ binding affinity to target proteins and suggest potential clinical applications, providing crucial evidence for future drug development (32). Following a 100 ns molecular dynamics simulation, the INHBB-cyclosporine complex exhibited the lowest RMSD value, suggesting a stable interaction. This result provides a dynamic perspective for understanding the interaction between the drug and the target protein and helps optimize drug design. By evaluating indicators such as RMSD and RMSF, we can explore the characteristics of the interaction between the drug and the target in greater depth, thereby providing a theoretical basis for subsequent experimental validation (33). The ULBP2-valproic acid, INHBB-cyclosporine, and STC2-genistein complexes exhibited 3–4, 2–3, and 5–7 hydrogen bonds within 0.35 nm, respectively, suggesting good stability. The stability of hydrogen bonds is an essential factor in drug binding, contributing to the effectiveness and selectivity of the drug. Therefore, considering the combined impact of hydrogen bonds and other forces (such as hydrophobic interactions) will be crucial in the drug development process (34).
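The 0.35 nm distance used when counting hydrogen bonds can be illustrated with a distance-only sketch. Full hydrogen bond analysis typically also applies an angle criterion between donor, hydrogen, and acceptor, which is omitted here for brevity, so this simplified count is an upper bound. The function name and the coordinates (in nm) are hypothetical.

```python
import math


def count_pairs_within(donors, acceptors, cutoff_nm=0.35):
    """Count donor-acceptor pairs whose separation is at most `cutoff_nm`.

    Distance-only simplification: no donor-hydrogen-acceptor angle check.
    """
    return sum(
        1
        for donor in donors
        for acceptor in acceptors
        if math.dist(donor, acceptor) <= cutoff_nm
    )


# Hypothetical ligand donor atoms and protein acceptor atoms for one frame.
toy_donors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_acceptors = [(0.3, 0.0, 0.0), (2.0, 0.0, 0.0)]
toy_pair_count = count_pairs_within(toy_donors, toy_acceptors)
```

Repeating the count for every frame of a trajectory yields a hydrogen-bond-versus-time profile of the kind summarized in the ranges above.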
In the single-cell RNA sequencing analysis, we further examined the interactions between different cell types and their impact on tumor progression (35). We identified 15 different cell populations, and the analysis indicated that ULBP2, INHBB, and STC2 were significantly expressed in progenitor cells. This finding highlights the diverse cell types within the tumor microenvironment and their potential roles in CRC progression (4). A comprehensive investigation into the roles of specific cell types in tumor development will yield crucial insights for future personalized therapies. Through analysis using CellChat software, we found strong interactions among stromal cells, cancer-associated fibroblasts, and progenitor cells, which may affect the immune microenvironment. This finding supports the theoretical understanding of intercellular communication and may inform immunotherapy development (36).
Future studies should leverage this communication data to devise improved treatment strategies (37). Temporal analysis revealed distinct gene expression patterns across stages, with ULBP2 elevated in later stages, while INHBB and STC2 were more prominent initially. This finding provides new insights into understanding CRC progression and helps identify critical time points and intervention points (38). The potential application of pseudotime analysis in tumor progression research will provide important evidence for formulating corresponding treatment strategies. ST analysis identified five distinct cell populations using the UMAP method, illustrating their distribution across tumor and non-tumor regions. This provides a cellular perspective for understanding the complexity of the tumor microenvironment (39). Further exploration can be conducted on the roles and functions of various cell populations in tumor development, as well as the interactions between different cell types in the tumor microenvironment.
This study is primarily limited by the absence of wet lab validation and a relatively small sample size. In addition, the diversity of the dataset may lead to inter-batch differences, which could affect the reproducibility and reliability of the results. Despite accessing extensive gene expression and clinical data from various public databases, careful consideration of data heterogeneity is essential to maintain result validity. Future research should experimentally validate the functions of these DEGs and increase the sample size to improve result generalizability. Furthermore, considering the differences in tumor microenvironments and patient individuality, future studies should also explore the expression and function of these hub genes in different clinical contexts.
Conclusions
In summary, this study identified ULBP2, INHBB, and STC2 as hub genes associated with CRC and validated their clinical significance through multiple bioinformatics analyses. Elevated expression of these genes correlates with poor patient prognosis, indicating their potential as biomarkers and therapeutic targets. While our findings hold promise for clinical applications, additional experimental research is necessary to validate their biological functions and mechanisms. This research could lead to novel strategies for early diagnosis and personalized treatment of CRC.
Acknowledgments
We thank TCGA and GEO databases for providing shared data.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Footnotes
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-918/rc
Funding: None.
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-918/coif). The authors have no conflicts of interest to declare.
References
- 1. De S, Paul S, Manna A, et al. Phenolic Phytochemicals for Prevention and Treatment of Colorectal Cancer: A Critical Evaluation of In Vivo Studies. Cancers (Basel) 2023;15:993. doi: 10.3390/cancers15030993
- 2. Li Q, Wang J, Cheng Y, et al. Long-Term Survival of Neuroblastoma Patients Receiving Surgery, Chemotherapy, and Radiotherapy: A Propensity Score Matching Study. J Clin Med 2023;12:754. doi: 10.3390/jcm12030754
- 3. Ilie-Petrov AC, Cristian DA, Diaconescu AS, et al. Molecular Deciphering of Colorectal Cancer: Exploring Molecular Classifications and Analyzing the Interplay among Molecular Biomarkers MMR/MSI, KRAS, NRAS, BRAF and CDX2 - A Comprehensive Literature Review. Chirurgia (Bucur) 2024;119:136-55. doi: 10.21614/chirurgia.2024.v.119.i.2.p.136
- 4. Zhu M, Hu Y, Gu Y, et al. Role of amino acid metabolism in tumor immune microenvironment of colorectal cancer. Am J Cancer Res 2025;15:233-47. doi: 10.62347/ZSOO2247
- 5. Wu Y, Zhuang J, Qu Z, et al. Advances in immunotyping of colorectal cancer. Front Immunol 2023;14:1259461. doi: 10.3389/fimmu.2023.1259461
- 6. Zhao L, Xi L, Liu Y, et al. The Impact of Tertiary Lymphoid Structures on Tumor Prognosis and the Immune Microenvironment in Colorectal Cancer. Biomedicines 2025;13:539. doi: 10.3390/biomedicines13030539
- 7. Li C, Wirth U, Schardey J, et al. An immune-related gene prognostic index for predicting prognosis in patients with colorectal cancer. Front Immunol 2023;14:1156488. doi: 10.3389/fimmu.2023.1156488
- 8. Zhong H, Shi Q, Wen Q, et al. Pan-cancer analysis reveals potential of FAM110A as a prognostic and immunological biomarker in human cancer. Front Immunol 2023;14:1058627. doi: 10.3389/fimmu.2023.1058627
- 9. Guo Q, Zhao L, Yan N, et al. Integrated pan-cancer analysis and experimental verification of the roles of tropomyosin 4 in gastric cancer. Front Immunol 2023;14:1148056. doi: 10.3389/fimmu.2023.1148056
- 10. Zhang T, Wu Z, Li L, et al. CellGAT: A GAT-Based Method for Constructing a Cell Communication Network Integrating Multiomics Information. Biomolecules 2025;15:342. doi: 10.3390/biom15030342
- 11. Ma Q, Li Q, Zheng X, et al. CellCommuNet: an atlas of cell-cell communication networks from single-cell RNA sequencing of human and mouse tissues in normal and disease states. Nucleic Acids Res 2024;52:D597-606. doi: 10.1093/nar/gkad906
- 12. Peng Z, Ren Z, Tong Z, et al. Interactions between MFAP5+ fibroblasts and tumor-infiltrating myeloid cells shape the malignant microenvironment of colorectal cancer. J Transl Med 2023;21:405. doi: 10.1186/s12967-023-04281-6
- 13. Yang X, Su X, Wang Z, et al. ULBP2 is a biomarker related to prognosis and immunity in colon cancer. Mol Cell Biochem 2023;478:2207-19. doi: 10.1007/s11010-022-04647-2
- 14. Yuan J, Xie A, Cao Q, et al. INHBB Is a Novel Prognostic Biomarker Associated with Cancer-Promoting Pathways in Colorectal Cancer. Biomed Res Int 2020;2020:6909672. doi: 10.1155/2020/6909672
- 15. Xu Y, Cao C, Zhu Z, et al. Novel Hypoxia-Associated Gene Signature Depicts Tumor Immune Microenvironment and Predicts Prognosis of Colon Cancer Patients. Front Genet 2022;13:901734. doi: 10.3389/fgene.2022.901734
- 16. Coutsias EA, Wester MJ. RMSD and Symmetry. J Comput Chem 2019;40:1496-508. doi: 10.1002/jcc.25802
- 17. Sharanya CS, Wilbee DS, Sathi SN, et al. Computational screening combined with well-tempered metadynamics simulations identifies potential TMPRSS2 inhibitors. Sci Rep 2024;14:16197. doi: 10.1038/s41598-024-65296-7
- 18. Feng Y, Zhang Y, Zhou D, et al. MicroRNAs, intestinal inflammatory and tumor. Bioorg Med Chem Lett 2019;29:2051-8. doi: 10.1016/j.bmcl.2019.06.013
- 19. Xu Y, Fu G, Chen X, et al. Integrated single-nuclear RNA sequencing analysis reveals distinct characteristics of mucinous adenocarcinoma in right-sided colon cancer. Int J Biol Macromol 2025;309:142744. doi: 10.1016/j.ijbiomac.2025.142744
- 20. Greco L, Rubbino F, Dal Buono A, et al. Microsatellite Instability and Immune Response: From Microenvironment Features to Therapeutic Actionability-Lessons from Colorectal Cancer. Genes (Basel) 2023;14:1169. doi: 10.3390/genes14061169
- 21. Wang L, Yu S, Chan ER, et al. Notch-Regulated Dendritic Cells Restrain Inflammation-Associated Colorectal Carcinogenesis. Cancer Immunol Res 2021;9:348-61. doi: 10.1158/2326-6066.CIR-20-0428
- 22. Lv C, Luo K, Liu S. Fucosyltransferase 4 Predicts Patient Outcome in Rectal Cancer through an Immune Microenvironment-Mediated Multi-Mechanism. J Oncol 2022;2022:4637570. doi: 10.1155/2022/4637570
- 23. Gong X, Wu Q, Tan Z, et al. Identification and validation of cuproptosis and disulfidptosis related genes in colorectal cancer. Cell Signal 2024;119:111185. doi: 10.1016/j.cellsig.2024.111185
- 24. Li J, Deng Z, Liu Y, et al. Prognostic and immunological significance of metastasis-associated protein 3 in patients with thymic epithelial tumors. Discov Oncol 2024;15:216. doi: 10.1007/s12672-024-01066-1
- 25. Yang C, Yu T, Lin Q. A Novel Signature Based on Anoikis Associated with BCR-Free Survival for Prostate Cancer. Biochem Genet 2023;61:2496-513. doi: 10.1007/s10528-023-10387-9
- 26. Martins-da-Silva A, Baroni M, Salomão KB, et al. Clinical Prognostic Implications of Wnt Hub Genes Expression in Medulloblastoma. Cell Mol Neurobiol 2023;43:813-26. doi: 10.1007/s10571-022-01217-4
- 27. Yang H, Hua J, Han Y, et al. Development and preliminary validation of five miRNAs for lung adenocarcinoma prognostic model associated with immune infiltration. Sci Rep 2025;15:528. doi: 10.1038/s41598-024-84128-2
- 28. Zhang L, Wang S, Wang L. Construction of lncRNA prognostic model related to disulfidptosis in lung adenocarcinoma. Heliyon 2024;10:e35657. doi: 10.1016/j.heliyon.2024.e35657
- 29. Wang S, Zi H, Li M, et al. Development and validation of a mitotic catastrophe-related genes prognostic model for breast cancer. PeerJ 2024;12:e18075. doi: 10.7717/peerj.18075
- 30. Christodoulou S, Sotiropoulou CD, Vassiliu P, et al. MicroRNA-675-5p Overexpression Is an Independent Prognostic Molecular Biomarker of Short-Term Relapse and Poor Overall Survival in Colorectal Cancer. Int J Mol Sci 2023;24:9990. doi: 10.3390/ijms24129990
- 31. Cotan HT, Emilescu RA, Iaciu CI, et al. Prognostic and Predictive Determinants of Colorectal Cancer: A Comprehensive Review. Cancers (Basel) 2024;16:3928. doi: 10.3390/cancers16233928
- 32. Li Q, Hujiaaihemaiti M, Wang J, et al. Identifying key transcription factors and miRNAs coregulatory networks associated with immune infiltrations and drug interactions in idiopathic pulmonary arterial hypertension. Math Biosci Eng 2023;20:4153-77. doi: 10.3934/mbe.2023194
- 33. Liu M, Xu Y. Gene Identification and Potential Drug Therapy for Drug-Resistant Melanoma with Bioinformatics and Deep Learning Technology. Dis Markers 2022;2022:2461055. doi: 10.1155/2022/2461055
- 34. Liu M, Yang F, Xu Y. Identification of Potential Drug Therapy for Dermatofibrosarcoma Protuberans with Bioinformatics and Deep Learning Technology. Curr Comput Aided Drug Des 2022;18:393-405. doi: 10.2174/1573409918666220816112206
- 35. Pathania AS. Immune Microenvironment in Childhood Cancers: Characteristics and Therapeutic Challenges. Cancers (Basel) 2024;16:2201. doi: 10.3390/cancers16122201
- 36. Zhang A, Miao K, Sun H, et al. Tumor heterogeneity reshapes the tumor microenvironment to influence drug resistance. Int J Biol Sci 2022;18:3019-33. doi: 10.7150/ijbs.72534
- 37. Liu X, Ren B, Ren J, et al. The significant role of amino acid metabolic reprogramming in cancer. Cell Commun Signal 2024;22:380. doi: 10.1186/s12964-024-01760-1
- 38.Hussain Z, Nigri J, Tomasini R. The Cellular and Biological Impact of Extracellular Vesicles in Pancreatic Cancer. Cancers (Basel) 2021;13:3040. 10.3390/cancers13123040 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Saunders AS, Bender DE, Ray AL, et al. Colony-stimulating factor 3 signaling in colon and rectal cancers: Immune response and CMS classification in TCGA data. PLoS One 2021;16:e0247233. 10.1371/journal.pone.0247233 [DOI] [PMC free article] [PubMed] [Google Scholar]













