BMC Cancer. 2025 Jan 10;25:65. doi: 10.1186/s12885-025-13437-0

A novel machine learning-based immune prognostic signature for improving clinical outcomes and guiding therapy in colorectal cancer: an integrated bioinformatics and experimental study

Yuanchun Zhao 1, Dexu Xun 1, Jiajia Chen 1, Xin Qi 1
PMCID: PMC11724613  PMID: 39794799

Abstract

Immune cells are pivotal components of the tumor microenvironment (TME); they interact with tumor cells and significantly influence cancer progression and therapeutic outcomes. Therefore, classifying cancer patients based on the status of immune cells within the TME is increasingly recognized as an effective approach to identify prognostic biomarkers, paving the way for more effective and personalized cancer treatments. Considering the high incidence and mortality of colorectal cancer (CRC), this study used an integrated machine learning survival framework incorporating 93 different algorithmic combinations to determine the optimal strategy for developing an immune-related prognostic signature (IRPS) based on the average C-index across the four CRC cohorts. Notably, the IRPS was demonstrated to be an independent risk factor for predicting the survival outcomes of CRC patients, showing superior performance compared to traditional clinical features and 63 published signatures in both training and validation cohorts. Furthermore, CRC patients classified in the low-risk group according to the IRPS showed higher sensitivity to immunotherapy than those in the high-risk group, suggesting that low-risk patients are more likely to benefit from immunotherapy. Through in silico screening of potential compounds, dasatinib, vinblastine, and YM-155 were identified as potential therapeutic agents for high-risk CRC patients. In vitro studies demonstrated that knockdown of APCDD1, a key component of the IRPS, inhibited the proliferation, migration and invasion of HT-29 cells and promoted their apoptosis. Thus, the IRPS serves as a powerful tool for predicting patient prognosis, immunotherapy response and candidate drugs, thereby enhancing clinical decision-making and treatment evaluation of CRC.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12885-025-13437-0.

Keywords: Colorectal cancer, Tumor microenvironment, Machine learning in cancer prognosis, Immune-related prognostic signature, Immune checkpoints, Therapeutic agents

Introduction

Colorectal cancer (CRC) ranks as the third most prevalent malignancy worldwide and the second leading cause of cancer-related death globally. By the year 2040, it is projected that the number of newly diagnosed cases of CRC will increase to 3.2 million, with the estimated death toll reaching 1.6 million [1]. In particular, CRC is characterized by strong heterogeneity, with differences in clinical, pathological, and molecular characteristics between patients, posing significant challenges to the customization of effective treatments. Early risk stratification is critical to improve clinical outcomes of CRC patients based on disease progression. Currently, while the TNM staging system is widely used for CRC prognosis in clinical practice, it has limitations in predicting individual outcomes and recurrence risk [2]. By comparison, prognostic signatures can integrate different molecular markers, enabling a more dynamic and individualized assessment of prognosis. Therefore, it is essential to develop efficient prognostic signatures to advance precision medicine and improve outcomes of CRC patients.

The tumor microenvironment (TME) refers to the complex environment surrounding a tumor, comprising a variety of cell types such as cancer cells, immune cells, and stromal cells, together with extracellular matrix components. In CRC, the intricate interplay between tumor cells and the immune cells within this microenvironment can significantly influence disease progression [3]. It has been reported that immune subtypes, such as “immune-hot” and “immune-cold” tumors, are closely related to CRC progression and therapeutic response. This also reflects the complexity and heterogeneity of the immune response in CRC [4]. Notably, immunotherapy, particularly immune checkpoint inhibitors, has emerged as a promising therapeutic avenue for certain subsets of CRC patients, emphasizing the growing importance of harnessing the immune system in combating this disease [5, 6]. Thus, the systematic exploration of immune phenotypes represents a valuable approach that contributes to a deeper understanding of the intricate anti-tumor response, guiding the development of effective immunotherapies for CRC.

Machine learning methods excel in capturing complex non-linear relationships and in selecting variables from high-dimensional datasets, facilitating the establishment of robust prognostic signatures. Recently, several immune-related signatures have been developed to assess the prognosis of patients with different types of cancer, including CRC, demonstrating relatively high performance across specific cohorts [7, 8]. Methodologically, these prognostic signatures are commonly constructed based on LASSO Cox regression analysis, which is a shrinkage method that combines Cox proportional hazards regression with the L1 penalty. However, introducing an L1 penalty for sparsity may result in coefficient bias, with certain variables being overestimated and others excessively overlooked. Its risk of overfitting highlights the need for complementary methods. Therefore, combining other machine learning techniques with LASSO Cox regression could enhance the robustness and predictive accuracy in identifying key factors that affect patient outcomes, making it a more effective tool for constructing prognostic models.

In this study, we aimed to establish an immune-derived prognostic signature using 93 machine-learning algorithm combinations to assess overall survival and treatment response in CRC. Its performance was compared with that of traditional clinical features and 63 published signatures in the training cohort and four validation cohorts. Moreover, the associations of the signature-derived risk score with immune infiltration levels and immunotherapy response were comprehensively explored. Candidate drugs for CRC patients were screened based on the GDSC, CTRP, and PRISM databases. In addition, the function of a representative gene within the prognostic signature was investigated in CRC cell lines (Fig. 1). This work therefore provides valuable insights into the prognosis and treatment of CRC.

Fig. 1. The pipeline of the present study

Materials and methods

Data acquisition and preprocessing

The RNA-seq read count matrices and clinical data of colon adenocarcinoma (COAD, n = 404) and rectum adenocarcinoma (READ, n = 75) were obtained from The Cancer Genome Atlas (TCGA) database (https://gdc-portal.nci.nih.gov/). The read counts were converted to transcripts per million (TPM) and then log2 transformed. Additionally, the GSE17538 (n = 232), GSE29621 (n = 65), and GSE38832 (n = 122) datasets from the Affymetrix GPL570 platform (HG-U133_Plus_2) were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). The robust multiarray average (RMA) algorithm was employed to process the raw data from the GEO datasets by using the “affy” R package [9]. Subsequently, after removing batch effects by applying the ComBat algorithm, the TCGA-COAD and TCGA-READ datasets were merged to form the TCGA cohort, and all the GEO datasets belonging to the Affymetrix GPL570 platform were combined to create the GEO-meta cohort. In the follow-up analysis, the GSE17538 dataset was used as the training cohort, and the GSE29621, GSE38832, GEO-meta, and TCGA datasets were applied as the validation cohorts. Samples lacking clinical information related to sex, age, Stage, T stage, N stage, M stage, overall survival, or survival status were excluded. The baseline table summarizing the clinical characteristics of CRC patients in the GSE17538, GSE29621, GSE38832, and TCGA datasets is presented in Table S1.
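The count-to-TPM conversion followed by a log2 transform can be sketched as follows. This is a minimal Python illustration rather than the authors' R code; the gene lengths and the pseudocount of 1 are assumptions, since the text states neither.

```python
import math

def counts_to_tpm(counts, lengths_kb):
    """Convert one sample's read counts to transcripts per million (TPM).

    counts     -- read counts per gene
    lengths_kb -- gene lengths in kilobases (assumed input; not given in the text)
    """
    # Reads per kilobase: correct each count for gene length first.
    rpk = [c / l for c, l in zip(counts, lengths_kb)]
    # Scale so that the values of one sample sum to one million.
    scale = sum(rpk) / 1e6
    return [r / scale for r in rpk]

def log2_tpm(counts, lengths_kb, pseudocount=1.0):
    """log2(TPM + pseudocount); the pseudocount is an assumption that avoids log2(0)."""
    return [math.log2(t + pseudocount) for t in counts_to_tpm(counts, lengths_kb)]
```

Because TPM is normalised within each sample, the TPM values of any sample sum to one million regardless of sequencing depth.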

Identification of CRC immune subgroups

The single-sample gene set enrichment analysis (ssGSEA) algorithm utilizing 29 immune gene sets, including genes related to different immune cell types, functions, pathways, and checkpoints (Table S2) [10], was employed to quantify the enrichment scores of every sample included in the GSE17538 dataset. This analysis was conducted using the R packages “GSVA”, “GSEABase”, and “limma”. The enrichment scores of these immune gene sets were then clustered using the “ConsensusClusterPlus” R package with 1,000 repetitions [11]. Subsequently, the consensus score matrix, cumulative distribution function (CDF) curves, the relative change in the area under the CDF curve, and the proportion of ambiguous clustering (PAC) were used collectively to determine the optimal number of clusters [12].
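The PAC statistic used for choosing K can be illustrated with a short Python sketch. PAC is the fraction of sample pairs whose consensus index falls in an intermediate interval, i.e. CDF(upper) minus CDF(lower). The bounds 0.1 and 0.9 below are commonly used defaults and are an assumption here, since the text does not report the interval used.

```python
def pac_score(consensus, lower=0.1, upper=0.9):
    """Proportion of ambiguous clustering for one consensus matrix.

    consensus -- square symmetric matrix (list of lists) of consensus
                 indices in [0, 1] for a given number of clusters K
    lower, upper -- assumed bounds of the "ambiguous" interval
    """
    n = len(consensus)
    # Use each unordered pair of samples once (upper triangle, no diagonal).
    pairs = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    # CDF(upper) - CDF(lower) equals the share of pairs with lower < value <= upper.
    ambiguous = sum(1 for v in pairs if lower < v <= upper)
    return ambiguous / len(pairs)
```

A lower PAC means that pairs of samples are almost always either clustered together or kept apart, so the K with the smallest PAC is the most stable choice.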

Verification of the effectiveness of immune clustering

To validate the effectiveness of the immune clustering approach, the tumor purity, stromal score, immune score, and ESTIMATE score of each CRC sample in the GSE17538 dataset were determined using the “ESTIMATE” package [13]. Similarly, the CIBERSORT algorithm was employed to quantify the infiltration levels of 22 immune cell types in each sample in the GSE17538 dataset [14]. Moreover, five other methodologies, namely TIMER, xCell, MCPcounter, EPIC, and quanTIseq, were used to assess the immune cell infiltration levels of the different immune subgroups by using the “IOBR” R package [15]. In addition, the expression levels of human leukocyte antigen (HLA) family genes and immune checkpoint-related genes were examined. Statistical comparisons of scores, immune infiltration levels, and gene expression abundance between the immune subgroups were performed using the Wilcoxon test.
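For two independent groups, the Wilcoxon test is the rank-sum (Mann-Whitney) test. The sketch below is a simplified Python version using the normal approximation without tie or continuity correction; statistical software such as R's wilcox.test may use an exact or corrected calculation instead, so p-values can differ slightly.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic for x, approximate two-sided p-value).
    """
    # Pool both groups, remembering the group label of every value.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Tied values share the average of the ranks they occupy.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value
```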

Identification of immune-related genes with prognostic value

The “limma” R package was utilized to perform a differential expression analysis of the mRNAs between the immune subgroups in the GSE17538 dataset [16]. Differentially expressed genes (DEGs) were selected based on the following criteria: |log2 fold change (FC)| > 0.585 and FDR < 0.05. To explore the function of the identified DEGs, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed by applying the “clusterProfiler” R package [17]. Subsequently, univariate Cox regression analysis was conducted to identify DEGs with prognostic value by using the “glmnet” R package in the GSE17538 dataset [18]. DEGs with prognostic value were selected based on the following criteria: FDR < 0.05 and a hazard ratio (HR) either above 1 (risk gene) or below 1 (protective gene).
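The DEG thresholds can be made concrete with a small Python sketch. A |log2FC| cutoff of 0.585 corresponds to a fold change of about 1.5, because 2^0.585 ≈ 1.5. The FDR is computed here with the Benjamini-Hochberg procedure, which is an assumption: the text does not name the adjustment method. Gene names and values in any call are placeholders.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_degs(genes, log2fc, pvalues, fc_cut=0.585, fdr_cut=0.05):
    """Keep genes with |log2FC| > fc_cut and adjusted p-value < fdr_cut."""
    fdr = benjamini_hochberg(pvalues)
    return [g for g, lfc, q in zip(genes, log2fc, fdr)
            if abs(lfc) > fc_cut and q < fdr_cut]
```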

Machine learning-based prognostic signature construction

To develop an immune-related prognostic signature (IRPS) with high accuracy and stability, nine machine-learning algorithms, including elastic network (Enet), LASSO, Ridge, stepwise Cox, CoxBoost, partial least squares regression for Cox (plsRcox), supervised principal components (SuperPC), generalized boosted regression modelling (GBM), and survival support vector machine (survival-SVM), were selected. Among them, LASSO, CoxBoost, and stepwise Cox are capable of dimensionality reduction and variable filtering, and were paired with the other algorithms to form 93 distinct combinations. These algorithm combinations were then applied to the DEGs with prognostic value to fit prediction models based on 10-fold cross-validation in the GSE17538 cohort. For each model, Harrell’s concordance index (C-index) was calculated across the GSE17538 dataset and three validation datasets (GSE29621, GSE38832, and TCGA), and the model with the highest average C-index on the four datasets was considered the optimal model for further analysis.
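The model-selection criterion can be sketched in Python as follows: Harrell's C-index is computed per cohort and then averaged. This is a plain illustration of the metric, not the authors' pipeline; the function names and the dictionary layout of the cohorts are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output; a higher score should mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable

def average_c_index(cohorts):
    """Mean C-index over cohorts given as {name: (times, events, risk_scores)}."""
    values = [harrell_c_index(t, e, r) for t, e, r in cohorts.values()]
    return sum(values) / len(values)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so averaging over the training and validation cohorts favours models that generalise rather than models that only fit the training data.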

Performance evaluation of IRPS in predicting CRC patients with different survival risks

According to the cutoff value of the risk score calculated using the “survminer” R package, patients were divided into high-risk and low-risk groups in the training and validation cohorts, respectively. Subsequently, Kaplan–Meier survival curve analysis was performed using the “survival” R package to determine the difference in overall survival between the high- and low-risk groups [19]. The prediction accuracy of the model was evaluated using time-dependent receiver operating characteristic (ROC) curve analysis, conducted with the “timeROC” R package [20]. In addition, the differences in risk scores among subgroups defined by different clinical factors (e.g. age, gender, Stage, T stage, N stage and M stage) were evaluated using Wilcoxon tests. Furthermore, to evaluate the accuracy and reliability of the IRPS, a total of 63 CRC prognostic models were compiled from published literature in the PubMed database (https://pubmed.ncbi.nlm.nih.gov/) (Table S3). Then, the C-index values of the IRPS and the 63 reported signatures were calculated, and the “compareC” R package was employed to compare the performance of the IRPS with that of the other signatures [21]. Finally, univariate and multivariate Cox regression analyses were conducted to evaluate the independent predictive ability of the IRPS by using the “survival” R package.
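The Kaplan–Meier curves compared between risk groups rest on the product-limit estimator, sketched below in plain Python. It is an illustration of the estimator only and does not reproduce the “survival” package, confidence bands, or the log-rank test.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up times
    events -- 1 if the event was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set.
        at_risk -= leaving
    return curve
```

Running the function separately on the high-risk and low-risk groups gives the two step curves that a Kaplan–Meier plot displays.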

Construction and validation of an IRPS-based nomogram for personalized overall survival prediction in CRC patients

To quantify the overall survival of individual CRC patients, a nomogram that integrates the IRPS and common clinical characteristics was developed by using the “rms” R package [22]. In the nomogram, a higher total score suggests a lower overall survival probability. ROC curve analysis and calibration curve analysis were applied to evaluate the nomogram’s discrimination ability and accuracy using the “timeROC” and “calibrate” R packages, respectively [23]. Moreover, decision curve analysis (DCA) was performed to evaluate the net clinical benefit of the nomogram by using the “ggDCA” R package [24].

Association between the established prognostic signature and immune status

To investigate the relationship between the IRPS and immune status, the CIBERSORT algorithm was employed to assess the distribution of 22 immune cell types in each sample in the GEO-meta and TCGA datasets [14]. Spearman correlation analysis was used to assess the association of the immune infiltration scores with both the expression of the IRPS-related genes and the IRPS-based risk score.
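Spearman correlation is the Pearson correlation of the ranks, which makes it robust to outliers and to monotone but non-linear relationships. A minimal Python sketch, assuming average ranks for ties and with hypothetical function names:

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation between two equally long sequences."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```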

Correlation analysis between risk subgroups and immunotherapy response

To assess differences in response to immunotherapy between the high- and low-risk groups, the expression levels of immune checkpoint-related genes were compared using Wilcoxon tests. Meanwhile, the correlation between the expression levels of these genes and the risk scores was evaluated using Pearson correlation analysis. In addition, the immunophenoscore (IPS) provides a comprehensive measure of immunogenicity and can predict a patient’s response to PD-L1 and CTLA4 treatment [25]. We extracted the IPS of the COAD and READ patients from The Cancer Immunome Atlas (https://tcia.at/) and compared the potential immunotherapy response between the high- and low-risk groups using the “ggpubr” R package. Moreover, tumor immune dysfunction and exclusion (TIDE) analysis (http://tide.dfci.harvard.edu/login/) was conducted to evaluate the potential for tumor immune escape in the two risk groups by using the gene expression profiles of the tumor samples, with a lower TIDE score indicating a better response to immunotherapy [26].

Potential therapeutic drug screening for CRC patients

Based on the Genomics of Drug Sensitivity in Cancer (GDSC) database (https://www.cancerrxgene.org/), the “oncoPredict” R package was utilized to predict the IC50 values of 179 drugs for each CRC patient within the GEO-meta cohort [27], and the Wilcoxon test was used to analyze the differences in IC50 values between the high- and low-risk groups. Moreover, utilizing drug sensitivity data from the Cancer Therapeutics Response Portal (CTRP, https://portals.broadinstitute.org/ctrp) and the Profiling Relative Inhibition Simultaneously in Mixtures (PRISM, https://depmap.org/portal/prism/) resource, the “pRRophetic” R package was employed to calculate the area under the curve (AUC) values for each sample in the GEO-meta cohort, where lower AUC values signify higher drug sensitivity [28]. Then, the “VennDiagram” R package was used to find the intersection of compounds with potential therapeutic effects based on the GDSC, CTRP, and PRISM databases. Next, the ADMET properties of the candidate drugs were analyzed by using the ADMETlab 2.0 (https://admetmesh.scbdd.com/) and SwissADME (http://www.swissadme.ch/) databases.

Molecular docking analysis between bottleneck genes and candidate drugs

To determine whether the candidate drugs have the potential to target bottleneck genes for the treatment of CRC, the “limma” R package was applied to identify DEGs between the high- and low-risk groups in the GEO-meta dataset, with thresholds of |log2FC| ≥ 0.585 and FDR < 0.05. Based on the identified DEGs, a protein-protein interaction (PPI) network with a confidence score above 0.400 was built by searching the STRING database (v 11.5) (https://string-db.org/) [29], and its topological characteristics were analyzed with Cytoscape software (v 3.6.1) [30]. The nodes ranked in the top 10% by betweenness centrality in the topology analysis were selected as bottleneck genes. Subsequently, the three-dimensional structures of the bottleneck proteins and candidate drugs were retrieved from the publicly available RCSB Protein Data Bank (PDB) database (https://www.rcsb.org/) [31] and PubChem (https://pubchem.ncbi.nlm.nih.gov/), respectively. After pretreatment of the proteins and ligands, AutoDock software (v 4.2.6) [32] was employed to perform molecular docking between the candidate drugs and the bottleneck proteins, and PyMOL software (v 2.4.0) was utilized to visualize and analyze the docking results with the highest affinity [33].
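The bottleneck-gene selection can be illustrated with a Python sketch that computes betweenness centrality with Brandes' algorithm on an unweighted, undirected network and keeps the top 10% of nodes. The authors used Cytoscape; the adjacency-dictionary input, the rounding of the 10% cutoff, and the function names are assumptions made for this example.

```python
import math
from collections import deque

def betweenness_centrality(adjacency):
    """Unnormalised betweenness for an undirected, unweighted graph.

    adjacency -- dict mapping each node to a list of its neighbours
    """
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search counting shortest paths from the source.
        visited_order = []
        predecessors = {v: [] for v in adjacency}
        path_count = {v: 0 for v in adjacency}
        distance = {v: -1 for v in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visited_order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes.
        dependency = {v: 0.0 for v in adjacency}
        while visited_order:
            w = visited_order.pop()
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}

def top_fraction(scores, fraction=0.10):
    """Nodes in the top `fraction` by score (at least one node is returned)."""
    k = max(1, math.ceil(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Nodes with high betweenness lie on many shortest paths between other proteins, which is why they are treated as bottlenecks of the network.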

Expression pattern analysis of genes constituting the IRPS

The Gene Expression Profiling Interactive Analysis (GEPIA2, http://gepia2.cancer-pku.cn/) database [34, 35] was employed to conduct differential analysis of gene expression levels between tumor and normal tissues retrieved from TCGA and GTEx databases.

Cell lines and cell culture

The human CRC cell lines HT29, SW480 and SW620, as well as the human normal colon mucosal epithelial cell line NCM460, were purchased from KeyGEN BioTECH Corp., Ltd (China). Among them, HT29 was cultured in McCoy’s 5A medium (KeyGEN BioTECH Corp., Ltd, China) containing 10% fetal bovine serum (FBS; KeyGEN BioTECH Corp., Ltd, China); SW480 and SW620 were cultured in Leibovitz’s L15 medium (KeyGEN BioTECH Corp., Ltd, China) containing 10% FBS; NCM460 was cultured in RPMI-1640 medium (KeyGEN BioTECH Corp., Ltd, China) containing 10% FBS. All the cells were cultured at 37 °C with 5% CO2.

qRT-PCR analysis

Total RNA was extracted using Trizol reagent (KeyGEN BioTECH Corp., Ltd, China), followed by cDNA synthesis using PrimeScript RT Master Mix (TaKaRa, Japan) according to the manufacturer’s instructions. Gene amplification was conducted using TB Green® Premix Ex Taq™ II (TaKaRa, Japan) with the StepOnePlus™ system. GAPDH was utilized as the reference gene for calculating relative gene expression levels. The primer sequences were as follows: APCDD1 forward: GGGTGGAGAAGCAGTACCTT, APCDD1 reverse: GGGTGGCATCAGTGTGAATG; GAPDH forward: CAAATTCCATGGCACCGTCA, GAPDH reverse: AGCATCGCCCCACTTGATTT. The relative expression of APCDD1 was determined by using the 2^−ΔΔCt method. Each experiment was performed at least three times.
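The 2^−ΔΔCt calculation can be written out as a short Python sketch. ΔCt is the target Ct minus the reference (GAPDH) Ct within a sample, and ΔΔCt is the ΔCt of the test sample minus the ΔCt of the control sample. The function name is hypothetical, and any Ct values passed to it here are illustrative, not data from the study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^-ddCt method.

    ct_target_* -- Ct of the target gene (e.g. APCDD1)
    ct_ref_*    -- Ct of the reference gene (e.g. GAPDH)
    *_sample    -- the condition of interest; *_control -- the calibrator
    """
    # Normalise the target gene to the reference gene within each condition.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the condition of interest with the calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)
```

For example, a ΔΔCt of 2 gives a relative expression of 0.25, meaning the target transcript is four-fold less abundant than in the control, under the usual assumption of roughly 100% amplification efficiency.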

Cell proliferation, migration, invasion and apoptosis assays

The siRNA sequences targeting APCDD1 used in this study were as follows: sense: 5’-GCACCAAGGCCGUGAACUUTT-3’ and anti-sense: 5’-AAGUUCACGGCCUUGGUGCTT-3’. The siAPCDD1 was transfected into HT29 cells using Lipofectamine 3000 reagent (Invitrogen, USA). For the cell proliferation assay, cells transfected for 48 h were seeded into 96-well cell culture plates. Cell viability was measured using the Cell Counting Kit-8 (CCK-8, KeyGEN BioTECH Corp., Ltd, China) at different incubation times. Then, the optical density (OD) values at 450 nm were quantified using a microplate reader (KeyGEN BioTECH Corp., Ltd, China). For the cell migration assay, Transwell inserts with 8.0 μm pore size membranes (Corning Corporation, USA) were placed into a 24-well plate. Meanwhile, cells transfected for 48 h were resuspended in serum-free medium. A total of 1 × 10^5 cells in 100 µL medium were added to the upper chamber (Corning, USA), and 500 µL DMEM medium containing 20% FBS was added to the lower chamber (Corning, USA). After 24 h of culture, the cells in the upper chamber were fixed with 4% paraformaldehyde and stained with 0.5% crystal violet solution. The number of migrated cells was counted in three randomly selected fields under a microscope. For the cell invasion assay, the chamber was pretreated with 10% Matrigel (Corning, USA), and the experiment was carried out following the same procedure as the migration assay. For the apoptosis assay, cells transfected for 48 h were double stained with FITC-annexin V and propidium iodide (PI) according to the manufacturer’s instructions for the apoptosis detection kit (KeyGEN BioTECH Corp., Ltd, China), and the apoptosis rates of the HT29 cells were measured by flow cytometry. The experiments were performed at least three times.

Statistical analysis

All experiments were independently conducted at least three times, and all results were expressed as the mean ± standard deviation. Statistical analysis was performed by using GraphPad Prism (v 8) and R software (v 4.3.1). P < 0.05 was considered to be statistically significant. *P < 0.05, **P < 0.01, ***P < 0.001.

Results

Characterization of immune cell infiltration landscape in CRC based on consensus clustering

Given the pivotal role of the immune system in CRC progression and metastasis, we first performed a consensus cluster analysis based on the enrichment levels of 29 immune gene sets determined by the ssGSEA algorithm in the GSE17538 cohort. As shown in Fig. 2A-D, the consistency clustering matrix, CDF curve, the relative change in the area under the CDF curve, and PAC statistics all indicated that the optimal number of clusters was K = 2. Accordingly, two immune subgroups of CRC were identified, referred to as Cluster 1 (n = 73) and Cluster 2 (n = 159). Notably, the two clusters exhibited a significant difference in immune infiltration, with Cluster 1 demonstrating higher infiltration than Cluster 2 (Fig. 2E). Moreover, the ESTIMATE algorithm was employed to verify the stability of the ssGSEA results. As shown in Fig. 2F, patients in Cluster 1 exhibited significantly higher immune scores, stromal scores, and ESTIMATE scores than those in Cluster 2, along with lower tumor purity (P < 0.001). Furthermore, as shown in Fig. S1, the CIBERSORT, TIMER, xCell, MCPcounter, EPIC and quanTIseq algorithms all indicated that the infiltration level of immune cells in Cluster 1 was higher than that in Cluster 2 (P < 0.05). In addition, the expression levels of multiple immune checkpoint genes (IDO1, LAG3, CTLA4, TNFRSF9, CD80, PDCD1LG2, CD86, PDCD1, LAIR1, TNFRSF14, CD274, HAVCR2, LGALS9, CD48, TNFRSF18 and NRP1) and human leukocyte antigen (HLA) genes (HLA-A, HLA-B, HLA-C, HLA-E, HLA-F, HLA-DPA1, HLA-DRA, HLA-DPB1, HLA-DMA, HLA-DMB, HLA-DQB1, HLA-DQA1, HLA-DOA, HLA-DOB and HLA-DQB2) were markedly elevated in Cluster 1 (Fig. 2G and H, P < 0.05). Together, these results support the existence of two distinct immune subtypes in CRC.

Fig. 2. Identification and validation of immune subgroups of CRC patients in the GSE17538 cohort. (A) The consistency clustering matrix when K = 2; (B) The CDF curves of the consensus matrix for K ranging from 2 to 9; (C) The relative change in the area under the CDF curve for K ranging from 2 to 9; (D) The proportion of ambiguous clustering (PAC) score for K ranging from 2 to 9; (E and F) Differential analysis of the immune status (E) and TME-related scores (F) between the Cluster 1 and Cluster 2 subgroups; (G and H) Differential analysis of the expression levels of immune checkpoint genes (G) and HLA family genes (H) between the Cluster 1 and Cluster 2 subgroups. *P < 0.05; **P < 0.01; ***P < 0.001

Identification of immune-related genes with prognostic value

To explore the prognostic role of immune-related genes in CRC, 289 DEGs were first identified between the above two immune subtypes (Cluster 1 and Cluster 2) in the GSE17538 dataset (Fig. 3A). Functional enrichment analyses demonstrated that these DEGs were significantly enriched in immune-related GO terms (e.g. positive regulation of cytokine production, cytokine-mediated signaling pathway, regulation of T cell activation, immune receptor activity) (Fig. 3B) and KEGG pathways (e.g. chemokine signaling pathway, intestinal immune network for IgA production, and Th1 and Th2 cell differentiation), which play crucial roles in CRC initiation and progression (Fig. 3C). Then, univariate Cox regression analysis was performed to investigate the relationship between the expression profiles of the 289 DEGs and the overall survival of CRC patients. As shown in Fig. 3D, 36 DEGs were identified as prognostic indicators in CRC patients (P < 0.05). Among them, 21 DEGs were associated with a higher risk of mortality (HR > 1) and 15 DEGs were linked to a lower risk of mortality (HR < 1).

Fig. 3. Identification of immune-related genes with prognostic value in CRC. (A) Identification of DEGs between Cluster 1 and Cluster 2 based on the GSE17538 cohort; (B) GO functional enrichment analysis of immune-related DEGs; (C) KEGG pathway enrichment analysis of immune-related DEGs; (D) Immune-related DEGs with prognostic value were screened through univariate Cox regression analysis in the GSE17538 cohort

Machine learning-based construction of an immune-related prognostic signature for CRC

To develop a robust IRPS for CRC patients, the 36 DEGs with prognostic significance were further screened by our machine learning-based computational procedure. In detail, 93 algorithm combinations were applied to establish prediction models in the GSE17538 training cohort. The C-index value of each model on the training and validation datasets was calculated. As shown in Fig. 4A, the combination of LASSO and GBM produced the optimal model with the highest average C-index (0.751) across the four CRC cohorts. This model was recognized as the IRPS, which consists of a final set of 13 immune-related DEGs (CALB1, DOCK8, CASP1, RGS1, CD3D, BAG2, SOCS1, ISG15, SERPING1, GALNT6, GUCY2C, CFTR and APCDD1) (Fig. 4B).

Fig. 4. Construction of the immune-related prognostic model for CRC through machine learning methods. (A) An integrated machine learning survival framework incorporating 93 combinations was utilized to determine the optimal strategy for developing the IRPS based on the average C-index across the four CRC cohorts; (B) The prognostic model genes were screened by LASSO Cox regression analysis, with the optimal λ value

Performance evaluation of IRPS in predicting CRC patients with distinct risks

CRC patients were stratified into high- and low-risk groups using the optimal cut-off value of the risk score derived from the "survminer" R package. As shown in Fig. 5A, the Kaplan–Meier curve analyses demonstrated that patients in the high-risk group exhibited significantly worse survival outcomes in both the GSE17538 training dataset and the four validation datasets (GSE29621, GSE38832, GEO-meta and TCGA) (all P < 0.0001). Moreover, to evaluate the predictive ability of the IRPS, time-dependent ROC curve analyses were conducted in the training and four validation cohorts. As shown in Fig. 5B, the AUC values of the IRPS in predicting 1-, 2-, and 3-year overall survival were 0.8561, 0.8802 and 0.8699 in the GSE17538 dataset; 0.7939, 0.8677 and 0.8186 in the GSE29621 dataset; 0.7746, 0.8197 and 0.814 in the GSE38832 dataset; 0.8256, 0.8651 and 0.8465 in the GEO-meta dataset; and 0.6132, 0.6452 and 0.6615 in the TCGA dataset, respectively. Furthermore, since clinical characteristics are commonly used to evaluate the prognosis of CRC patients, we assessed the relationship between the IRPS and common clinical characteristics. As shown in Fig. S2, in the training and validation datasets, the risk scores of patients with M1, N2, T3-4 and Stage III-IV disease were significantly higher than those of patients with M0, N0-1, T1-2 and Stage I-II disease, respectively (P < 0.05).

Fig. 5. Performance evaluation of IRPS in predicting CRC patients with distinct risks. (A) Kaplan–Meier survival curve analysis with 95% confidence intervals for the high- and low-risk CRC patients in the training and validation datasets; (B) Time-dependent ROC curve analysis for evaluating the prognostic performance of the IRPS in the training and validation datasets; (C and D) Univariate Cox regression analysis (C) and multivariate Cox regression analysis (D) for assessing the prognostic independence of the IRPS in the training and validation datasets

Additionally, to assess whether the IRPS is an independent predictor of overall survival in CRC patients, the IRPS-derived risk score and several important clinical factors were included in univariate and multivariate Cox regression analyses. Univariate Cox regression analysis showed that the risk score was significantly associated with overall survival in the GSE17538 training cohort (HR: 7.301 [5.218–10.215]) and in two validation cohorts, GSE29621 (HR: 15.609 [6.030–40.410]) and TCGA (HR: 2.294 [1.466–3.590]), with all P < 0.05 (Fig. 5C). Moreover, multivariate Cox regression analysis confirmed the risk score as an independent risk indicator for overall survival in the GSE17538 training cohort (HR: 6.750 [4.685–9.725]) and in the validation cohorts GSE29621 (HR: 12.227 [4.565–32.748]) and TCGA (HR: 1.768 [1.130–2.765]), with all P < 0.05 (Fig. 5D). Therefore, the IRPS established via systematic integration of machine learning algorithms could serve as an independent prognostic factor for CRC patients.
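Reported hazard ratios and their intervals can be sanity-checked with a little arithmetic. Assuming the bracketed ranges are Wald-type 95% confidence intervals (the text does not state the level), they are symmetric on the log scale, so the geometric mean of the bounds should recover the point estimate, and the interval width gives the standard error of log(HR). A Python sketch with a hypothetical function name:

```python
import math

def log_hr_summary(ci_low, ci_high, z=1.96):
    """Recover the HR point estimate and SE of log(HR) from a Wald interval.

    ci_low, ci_high -- bounds of the confidence interval for the HR
    z               -- normal quantile; 1.96 assumes a 95% interval
    """
    log_low = math.log(ci_low)
    log_high = math.log(ci_high)
    # The interval is symmetric around log(HR) on the log scale.
    center_hr = math.exp((log_low + log_high) / 2)
    standard_error = (log_high - log_low) / (2 * z)
    return center_hr, standard_error

# Univariate result for the GSE17538 training cohort, as reported in the text.
center, se = log_hr_summary(5.218, 10.215)
```

For the training cohort interval [5.218–10.215], the recovered centre is close to the reported HR of 7.301, which is consistent with the Wald-interval assumption.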

Performance comparison between IRPS and other prognostic signatures for CRC

Advances in high-throughput sequencing technologies have led to a significant surge in medical big data, facilitating the development of gene expression-based prognostic signatures using machine learning algorithms. To comprehensively assess the performance of the IRPS, a total of 63 mRNA prognostic signatures associated with critical biological processes (e.g. ferroptosis, pyroptosis, oxidative stress, and autophagy) were collected from published CRC-related literature. The predictive accuracy of the IRPS was compared with that of these reported models by calculating C-index values. As shown in Fig. 6, the IRPS ranked first in three cohorts (GSE17538, GSE29621 and GSE38832) and within the top 35% in the TCGA cohort, showing significantly higher accuracy than most other signatures (P < 0.05). Notably, the data source (i.e. the GEO or TCGA database) can affect the generalizability of a prognostic model.

Fig. 6. Performance comparisons between IRPS and 63 gene expression-based prognostic signatures. C-index values of the IRPS and 63 published signatures were compared in the training and validation datasets, including GSE17538, GSE29621, GSE38832 and TCGA. *P < 0.05; **P < 0.01; ***P < 0.001

Construction of an IRPS-based nomogram for personalized overall survival prediction in CRC patients

To develop a robust tool for quantifying the overall survival probability of individual CRC patients, a nomogram was constructed by integrating the IRPS-derived risk score and clinical factors (e.g. gender, age and stage) in the GSE17538 dataset (Fig. 7A). The calibration curves of the nomogram demonstrated good agreement between the predicted and observed overall survival probabilities (Fig. 7B). Next, ROC analysis was performed to compare the performance of the nomogram with that of the risk score and clinical factors. As shown in Fig. 7C, the nomogram had a higher AUC value than the risk score, stage, gender and age in predicting the 1-, 2- and 3-year survival probability of CRC patients. In addition, the DCA curves showed a positive net benefit for the nomogram in predicting 1-, 2- and 3-year overall survival (Fig. 7D). These findings suggest that the nomogram has strong potential for clinical application, highlighting its utility in guiding decision-making related to patient outcomes.

Fig. 7.

Construction and validation of the IRPS-based nomogram for CRC patients in the GSE17538 cohort. A Construction of the IRPS-based nomogram for predicting the overall survival probability of CRC patients; (B-D) Calibration curves (B), ROC curves (C) and DCA curves (D) of the IRPS-based nomogram for predicting the 1-, 2-, and 3-year survival probability of CRC patients

Close association between IRPS and immune status

To explore the relationship between the IRPS and the TME, the CIBERSORT algorithm was employed to investigate the correlation between the 13 genes making up the IRPS and 22 types of tumor-infiltrating immune cells in the GEO-meta and TCGA cohorts. As shown in Fig. 8A, there was a significant correlation between these genes and most of the immune cells (P < 0.05). For example, BAG2 abundance was strongly negatively correlated with the infiltration levels of Tregs and activated NK cells, and significantly positively correlated with infiltration levels of T cells follicular helper, cells and M1-type macrophages. Next, the association between the IRPS-derived risk score and the infiltration levels of the 22 immune cell types was evaluated. As shown in Fig. 8B, the IRPS-derived risk score was positively correlated with the infiltration levels of M2-type macrophages, M0-type macrophages and memory B cells, and negatively correlated with the infiltration levels of CD8 T cells, activated memory CD4 T cells and plasma cells.

Fig. 8.

Correlation analysis between the IRPS and immune status of CRC patients. A Correlation analysis between the expression levels of 13 genes constituting the IRPS and the infiltration levels of multiple immune cells in the GEO-meta and TCGA cohorts; (B) Correlation analysis between the IRPS-based risk score and the infiltration levels of immune cells in the GEO-meta and TCGA cohorts. *P < 0.05; **P < 0.01; ***P < 0.001

Pivotal role of IRPS in guiding immunotherapy response

Given the promising role of immunotherapy in the treatment of CRC, we further investigated the difference in immunotherapy response between high- and low-risk patients. First, the expression levels of the immune checkpoint genes PDCD1 and CTLA4 were significantly negatively correlated with the IRPS-derived risk score in the GSE17538 and GEO-meta cohorts (Fig. 9A and B, P < 0.05). Second, the IPS is often used to predict the response to immune checkpoint inhibitors, including those targeting PD-L1 and CTLA4 [25]. As shown in Fig. 9C, the low-risk group had a higher IPS than the high-risk group in the TCGA cohort (P < 0.05), indicating a difference in the response to PD-L1 and CTLA4 blockade between the two groups. Furthermore, the TIDE score, which serves as an indicator of the probability of immune escape, was evaluated [26]. As shown in Fig. 9D-F, in the GSE17538 and GEO-meta cohorts, CRC patients in the high-risk group showed significantly higher TIDE scores (P < 0.05), and the risk score was significantly higher in non-responders to immunotherapy than in responders (P < 0.05). These findings suggested that low-risk CRC patients are more likely to respond to immunotherapy.

Fig. 9.

Comparative analysis of the immunotherapy response in high-risk and low-risk CRC patients. A Comparison of the expression levels of immune checkpoint genes including PDCD1 and CTLA4 between high- and low-risk groups in the GSE17538 and GEO-meta cohorts; (B) The expression levels of immune checkpoint genes including PDCD1 and CTLA4 were significantly correlated with the IRPS-based risk scores in the GSE17538 and GEO-meta cohorts; (C) Comparison of the IPS scores between high- and low-risk groups in the TCGA cohort; (D) Comparison of the TIDE scores between high- and low-risk groups in the GSE17538 and GEO-meta cohorts; (E) Comparison of TIDE scores between responder and non-responder groups in the GSE17538 and GEO-meta cohorts; (F) Comparison of the number of responders and non-responders between high- and low-risk groups in the GSE17538 and GEO-meta cohorts. *P < 0.05; **P < 0.01; ***P < 0.001

Potential therapeutic drug screening for CRC patients

To identify potential drugs for treating CRC patients, the IC50 values of commonly used chemotherapy drugs for CRC patients in the GEO-meta dataset were calculated using the “oncoPredict” R package based on the GDSC database. As shown in Fig. 10A, the IC50 values of drugs such as vinblastine, staurosporine, gemcitabine, epirubicin, pevonedistat, luminespib, dactinomycin and sepantronium bromide in the high-risk group were significantly lower than those in the low-risk group (P < 0.05), suggesting that the IRPS-derived risk score might be a potential approach for selecting appropriate drugs for CRC individualized treatment.

Fig. 10.

Potential therapeutic drug screening for CRC patients. A Estimated IC50 of eight chemotherapeutic agents within the high- and low-risk groups in the GEO-meta dataset; (B) Flowchart of screening potential therapeutic compounds for CRC based on the CTRP and PRISM databases; (C and D) Spearman’s correlation analysis (C) and differential drug response analysis (D) of 4 CTRP-derived compounds; (E and F) Spearman’s correlation analysis (E) and differential drug response analysis (F) of 11 PRISM-derived compounds; (G) Venn plots show the number of potential therapeutic compounds shared between GDSC and CTRP, GDSC and PRISM, and CTRP and PRISM; (H-J) The chemical formulas of dasatinib (H), vinblastine (I) and YM-155 (J). *P < 0.05; **P < 0.01; ***P < 0.001

Moreover, we identified potential drugs by analyzing gene expression and drug sensitivity profiles from the CTRP and PRISM datasets. Compounds were selected if their AUC values differed between high- and low-risk CRC patients (log2FC > 0.1) and were negatively correlated with the IRPS-derived risk score (r < −0.35) (Fig. 10B) [28, 36, 37]. Accordingly, 4 potential drugs (YM-155, PI-103, birinapant and dasatinib) were identified from the CTRP database (Fig. 10C), and 11 potential drugs (teriflunomide, carfilzomib, elesclomol, YM-155, vindesine, atorvastatin, vinblastine, GZD824, dasatinib, temsirolimus and PHA-793887) were identified from the PRISM database (Fig. 10E). The estimated AUC values of these drugs were negatively correlated with risk scores, and were significantly reduced in the high-risk group (Fig. 10D and F). As lower AUC values signify higher sensitivity to potential drugs [28], three compounds identified in at least two datasets were selected as potential drugs for treating high-risk CRC (Fig. 10G), namely dasatinib (Fig. 10H), vinblastine (Fig. 10I) and YM-155 (Fig. 10J). In addition, analysis of ADMET properties in both the ADMETlab 2.0 and SwissADME databases showed that dasatinib was the most promising drug, followed by vinblastine and YM-155 (Fig. S3). These findings suggested that these three drugs have the potential to treat high-risk CRC patients.
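The two selection criteria described above can be sketched as follows. This is a minimal plain-Python illustration, not the authors' pipeline: the function names and toy data are hypothetical, and the direction of the fold change (mean AUC in the low-risk group divided by mean AUC in the high-risk group) is an assumption, chosen so that a positive log2FC means lower AUC, and hence higher estimated sensitivity, in high-risk patients.

```python
import math


def _average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def select_compounds(risk_scores, is_high_risk, auc_by_compound,
                     min_log2fc=0.1, max_r=-0.35):
    """Keep compounds passing both the fold-change and correlation thresholds."""
    selected = []
    for name, aucs in auc_by_compound.items():
        high = [a for a, h in zip(aucs, is_high_risk) if h]
        low = [a for a, h in zip(aucs, is_high_risk) if not h]
        # Assumed direction: low-risk mean over high-risk mean.
        log2fc = math.log2((sum(low) / len(low)) / (sum(high) / len(high)))
        r = spearman_r(risk_scores, aucs)
        if log2fc > min_log2fc and r < max_r:
            selected.append(name)
    return selected


# Hypothetical toy data: six patients, two made-up compounds.
toy_risk_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
toy_is_high_risk = [False, False, False, True, True, True]
toy_aucs = {
    "compound_A": [0.90, 0.85, 0.80, 0.50, 0.45, 0.40],  # lower AUC at high risk
    "compound_B": [0.60, 0.62, 0.58, 0.61, 0.60, 0.59],  # no group difference
}
toy_selected = select_compounds(toy_risk_scores, toy_is_high_risk, toy_aucs)
print(toy_selected)
```

Requiring both criteria at once guards against compounds that show a group difference driven by a few outliers but no monotonic relationship with the risk score, or the reverse.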

To identify potential targets for dasatinib, vinblastine and YM-155, a PPI network was constructed using the STRING database based on the 210 DEGs between the high- and low-risk groups in the GEO-meta dataset (Fig. S4A). Eighteen bottleneck genes (FN1, CXCR4, TAGLN, SERPINE1, PTGS2, RUNX2, SPP1, PXDN, FABP1, POSTN, UGT2A3, CCN2, SERPING1, GJA1, FBN1, PROM1, WWTR1 and LCN2) were identified through topological analysis (Fig. S4B). Among them, the proteins encoded by PTGS2, SERPINE1, COL10A1, CXCR4 and LCN2 had 3D protein structures in the PDB database. Molecular docking analysis was then conducted to assess the potential interactions between the five protein receptors and the three compound ligands. As shown in Fig. S5, dasatinib can bind to the pockets of the PTGS2 and CXCR4 proteins with binding energies of −10.20 kcal/mol and −9.04 kcal/mol, respectively. Similarly, YM-155 can bind to the pocket of PTGS2 with a binding energy of −8.03 kcal/mol, and vinblastine can bind to the pocket of CXCR4 with a binding energy of −7.64 kcal/mol. These findings suggested that dasatinib, vinblastine and YM-155 have the potential to treat CRC by targeting bottleneck genes.

Functional analysis of the representative gene APCDD1 in IRPS

To further explore the role of the IRPS in CRC, a functional analysis of genes within the IRPS was performed. Notably, among the genes constituting the IRPS, the expression levels and mechanisms of CASP1 [38, 39], CD3D [40], SOCS1 [41], ISG15 [42], GALNT6 [43], GUCY2C [44] and CFTR [45, 46] in CRC have been well reported. By comparison, genes such as APCDD1 [47], BAG2 [48], CALB1 [49], DOCK8, RGS1 [50] and SERPING1 have been less investigated, and their expression dynamics remain poorly understood in CRC. Based on this, the expression pattern of APCDD1, a representative gene of the IRPS, was initially analyzed using the GEPIA2 database, and the expression level of APCDD1 was found to be higher in CRC tissues than in control tissues (Fig. 11A). The qRT-PCR analysis also demonstrated markedly higher expression levels of APCDD1 in CRC cell lines (SW620, HT-29, and SW480) than in the normal human colon mucosal epithelial cell line NCM460 (P < 0.05, Fig. 11B). The expression level of the APCDD1 gene in the HT-29 cell line was successfully reduced from an average of 1.00 to 0.32 following knockdown with siRNA (a reduction of about 68%), and the si-APCDD1 cell line was then used for subsequent in vitro experiments (Fig. 11C). As shown in Fig. 11D-G, the proliferation, migration and invasion ability of HT-29 cells in the si-APCDD1 group was significantly lower than that in the si-NC group (P < 0.001), while the total apoptosis rate of HT-29 cells in the si-APCDD1 group was significantly higher than that in the si-NC group (P < 0.05). These findings suggested that APCDD1 may serve as a pivotal oncogene involved in the progression of CRC, providing valuable insights into the translational potential of the prognostic signature.

Fig. 11.

Functional analysis of the representative gene APCDD1 in IRPS (A) Comparison of the APCDD1 expression level between CRC tissues and normal control tissues based on the GEPIA2 database; (B) The expression level of APCDD1 in CRC lines SW620, HT29 and SW480 was detected by qRT-PCR with the normal colon epithelial cell line NCM460 as a control; (C) The siRNA interference efficiency of APCDD1 in HT29 cell line was detected by qRT-PCR; (D) CCK-8 assays were used to detect the effect of APCDD1 knockdown on HT29 cell proliferation; (E and F) Transwell assays were used to detect the effect of APCDD1 knockdown on HT29 cell migration (E) and invasion (F); (G) Apoptosis assays were used to detect the effect of APCDD1 knockdown on apoptosis of HT29 cells. *P < 0.05; **P < 0.01; ***P < 0.001

Discussion

CRC is a prevalent gastrointestinal malignancy that is often diagnosed at advanced stages, which contributes to limited therapeutic and prognostic outcomes despite advances in treatment strategies. In the TME, the intricate interactions between cancer cells and immune cells can influence the course of the disease. Classifying cancer patients based on the status of immune cells in the TME can contribute to the screening of effective immune biomarkers, thereby helping to accurately predict the prognosis of patients.

In this study, the CRC samples of the GSE17538 dataset were divided into Cluster 1 and Cluster 2 immune subgroups, and the infiltration levels of immune cells in Cluster 1 were significantly higher than those in Cluster 2 (P < 0.05). Then, based on 289 immune-related genes differentially expressed between Cluster 1 and Cluster 2, 36 genes with prognostic value were identified by univariate Cox regression analysis. After screening 93 combinations of machine learning algorithms, the integration of LASSO and GBM was selected as the optimal method for developing a 13-gene prognostic signature termed the IRPS. Kaplan–Meier survival curve analysis, ROC curve analysis, and univariate and multivariate Cox regression analyses showed that the IRPS had excellent prognostic performance in the GSE17538 training cohort and four validation cohorts (GSE29621, GSE38832, GEO-meta and TCGA). In particular, compared with 63 published CRC signatures, the IRPS demonstrated significantly higher accuracy in almost all cohorts, indicating the robustness of its prognostic performance. It should be mentioned that the majority of published models performed unsatisfactorily across the training and validation cohorts, which might be due to poor generalization caused by overfitting [4]. In addition, we observed significantly higher expression levels of immune checkpoint genes, such as PDCD1 and CTLA4, in low-risk patients, suggesting that they were more likely to benefit from immunotherapy. Notably, the IPS and TIDE scores, which are two widely recognized tools for predicting the sensitivity of cancer patients to immunotherapy based on expression profiles [26], consistently showed that low-risk CRC patients had a greater response rate to immunotherapy. Overall, these findings indicated that the IRPS can provide a method for identifying immunotherapy-sensitive CRC patients.

Considering the poor prognosis and poor sensitivity to immunotherapy observed in high-risk patient groups, this study integrated the GDSC, CTRP, and PRISM databases to identify potential therapeutic agents for CRC patients. Three compounds, namely dasatinib, YM-155 and vinblastine, were found to have the potential for treating CRC patients in the high-risk group. Among them, dasatinib and YM-155 have been shown to possess therapeutic effects on CRC. For example, Scott et al. [51] found that dasatinib, as a BCR-ABL kinase inhibitor, had significant inhibitory activity on the proliferation of CRC cell lines; Dunn et al. [52] discovered that dasatinib can make KRAS mutant CRC patients more sensitive to cetuximab therapy; Zhan et al. [53] found that low concentrations of YM-155 can induce apoptosis and ER stress-mediated apoptosis signaling in the CMS1 subtype of CRC. While vinblastine has been established as a treatment option for breast cancer [54] and bladder cancer [55], its potential application in CRC remains unreported. Furthermore, molecular docking analysis showed that these drugs may target the bottleneck genes in the PPI network constructed from the DEGs between high- and low-risk groups. Among them, dasatinib can bind to the pocket of the CXCR4 protein with a binding energy of −9.04 kcal/mol, which is consistent with the report that dasatinib can bind to CXCR4 for the treatment of leukemia [56]. Moreover, vinblastine is capable of binding to the pocket of the CXCR4 protein, with a binding energy of −7.64 kcal/mol. This result aligns with the finding from Cutler et al. [57] that vinblastine can reduce expression levels of CXCR4 on CRC cell lines. Additionally, the binding energies for dasatinib with PTGS2 and YM-155 with PTGS2 were −10.20 kcal/mol and −8.03 kcal/mol, respectively, which is a novel finding of this study. In the future, further cell experiments, animal experiments and clinical trials should be conducted to verify the therapeutic effects of dasatinib, vinblastine and YM-155, thereby providing more powerful support for CRC treatment.

In addition, APCDD1 was identified as a representative gene of the IRPS. It has been reported that changes in the APCDD1 expression level are closely associated with the occurrence, development, and poor prognosis of various cancers. For example, Cho et al. [58] found that APCDD1 could restrain the invasion of breast cancer cells by inhibiting the classic WNT signaling pathway; Han et al. [59] reported that APCDD1 knockdown can promote the invasion and metastasis of osteosarcoma cells; Takahashi et al. [47] found that APCDD1 is directly regulated by the β-catenin/Tcf complex, and its increased expression may contribute to the occurrence of CRC. To date, there have been relatively few studies on the function of APCDD1 in CRC. Accordingly, in vitro experiments were performed to explore the role of APCDD1 in CRC. The results showed that APCDD1 knockdown significantly inhibited the proliferation, migration, and invasion of HT29 cells, and promoted HT29 cell apoptosis. This evidence suggests that APCDD1 may be an important oncogene in the development of CRC. However, current functional studies of APCDD1 are limited to the cellular level, and further investigation using animal models and clinical experiments is necessary for a more comprehensive understanding.

Conclusion

Based on an integrated machine learning survival framework containing 93 combinations, the IRPS was developed for predicting the overall survival of CRC patients. It exhibited superior performance compared with the traditional clinical features and 63 published signatures in the training cohort and four validation cohorts. Moreover, the IRPS holds promise for predicting immunotherapeutic responses and identifying therapeutic agents for CRC patients. In addition, knockdown of APCDD1, a representative gene of the IRPS, was demonstrated to inhibit the proliferation, migration and invasion of HT-29 cells, and promote their apoptosis. Therefore, the present study provides a novel tool for risk stratification and therapeutic response assessment in patients with CRC.

Supplementary Information

12885_2025_13437_MOESM1_ESM.docx (4.1MB, docx)

Additional file 1: Fig. S1 Validation of immune subgroups in CRC by CIBERSORT, TIMER, MCPcounter, EPIC, xCell, and quanTIseq algorithms. (A) Boxplot shows the immune cell infiltration levels between Cluster 1 and Cluster 2 subgroups calculated by the CIBERSORT algorithm; (B) Heatmap shows the immune cell infiltration levels between Cluster 1 and Cluster 2 subgroups calculated by TIMER, MCPcounter, EPIC, xCell, and quanTIseq algorithms. *P < 0.05; **P < 0.01; ***P < 0.001. Fig. S2 Association between the IRPS-based risk score and common clinicopathological features in the training and validation datasets. Fig. S3 ADMET properties of YM-155, vinblastine and dasatinib analyzed by the ADMETlab 2.0 and SwissADME databases. (A) ADMET properties of YM-155, vinblastine and dasatinib were analyzed based on the ADMETlab 2.0 database; (B) ADMET properties of YM-155, vinblastine and dasatinib were analyzed based on the SwissADME database. Fig. S4 The IRPS-related PPI network constructed by DEGs between high- and low-risk CRC groups. (A) Identification of DEGs between high- and low-risk CRC groups in the GEO-meta dataset; (B) Layout of the PPI network constructed by DEGs between high- and low-risk CRC groups. Orange diamond represents an up-regulated bottleneck gene; green diamond represents a down-regulated bottleneck gene; orange circle represents an up-regulated non-bottleneck gene; green circle represents a down-regulated non-bottleneck gene. Fig. S5 Molecular docking between CRC candidate drugs and their potential targets. (A) Heatmap showing the binding energy between each candidate drug and each bottleneck gene; (B) Molecular docking result of dasatinib and PTGS2; (C) Molecular docking result of dasatinib and CXCR4; (D) Molecular docking result of vinblastine and CXCR4; (E) Molecular docking result of YM-155 and PTGS2.

12885_2025_13437_MOESM2_ESM.xlsx (14.7KB, xlsx)

Additional file 2: Table S1. The baseline table summarizing the clinical characteristics of CRC patients in the GSE17538, GSE29621, GSE38832, and TCGA datasets.

12885_2025_13437_MOESM3_ESM.xlsx (12.6KB, xlsx)

Additional file 3: Table S2. The gene sets related to immune cell types, functions, pathways and checkpoints.

12885_2025_13437_MOESM4_ESM.xlsx (31.2KB, xlsx)

Additional file 4: Table S3. Information of 63 published prognostic signatures for CRC.

Acknowledgements

Not applicable.

Authors’ contributions

XQ and YZ designed the research. YZ collected the data and performed the computational analyses. YZ and XQ drafted the manuscript. XQ, DX and JC revised the manuscript. All authors read and approved the final manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 32270705) and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant No. KYCX23_3344).

Data availability

Public data used in this work can be acquired from the TCGA Research Network portal (https://portal.gdc.cancer.gov/) and Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/).

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Morgan E, Arnold M, Gini A, Lorenzoni V, Cabasag C, Laversanne M, Vignat J, Ferlay J, Murphy N, Bray F. Global burden of colorectal cancer in 2020 and 2040: incidence and mortality estimates from GLOBOCAN. Gut. 2023;72(2):338–44.
  • 2. Schneider NI, Langner C. Prognostic stratification of colorectal cancer patients: current perspectives. Cancer Manage Res. 2014;6:291–300.
  • 3. Spill F, Reynolds DS, Kamm RD, Zaman MH. Impact of the physical microenvironment on tumor progression and metastasis. Curr Opin Biotechnol. 2016;40:41–8.
  • 4. Liu Z, Liu L, Weng S, Guo C, Dang Q, Xu H, Wang L, Lu T, Zhang Y, Sun Z, et al. Machine learning-based integration develops an immune-derived lncRNA signature for improving outcomes in colorectal cancer. Nat Commun. 2022;13(1):816.
  • 5. Chen Q, Li T, Yue W. Drug response to PD-1/PD-L1 blockade: based on biomarkers. OncoTargets Therapy. 2018;11:4673–83.
  • 6. Pagni F, Guerini-Rocco E, Schultheis AM, Grazia G, Rijavec E, Ghidini M, Lopez G, Venetis K, Croci GA, Malapelle U, et al. Targeting immune-related biological processes in solid tumors: we do need biomarkers. Int J Mol Sci. 2019;20(21):5452.
  • 7. Ning J, Sun K, Fan X, Jia K, Meng L, Wang X, Li H, Ma R, Liu S, Li F, et al. Use of machine learning-based integration to develop an immune-related signature for improving prognosis in patients with gastric cancer. Sci Rep. 2023;13(1):7019.
  • 8. Zhao B, Pei L. A macrophage related signature for predicting prognosis and drug sensitivity in ovarian cancer based on integrative machine learning. BMC Med Genom. 2023;16(1):230.
  • 9. Gautier L, Cope L, Bolstad BM, Irizarry RA. Affy–analysis of Affymetrix GeneChip data at the probe level. Bioinf (Oxford England). 2004;20(3):307–15.
  • 10. Galon J, Costes A, Sanchez-Cabo F, Kirilovsky A, Mlecnik B, Lagorce-Pagès C, Tosolini M, Camus M, Berger A, Wind P, et al. Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Sci (New York NY). 2006;313(5795):1960–4.
  • 11. Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinf (Oxford England). 2010;26(12):1572–3.
  • 12. Șenbabaoğlu Y, Michailidis G, Li JZ. Critical limitations of consensus clustering in class discovery. Sci Rep. 2014;4:6207.
  • 13. Yoshihara K, Shahmoradgoli M, Martínez E, Vegesna R, Kim H, Torres-Garcia W, Treviño V, Shen H, Laird PW, Levine DA, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun. 2013;4:2612.
  • 14. Newman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y, Hoang CD, Diehn M, Alizadeh AA. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12(5):453–7.
  • 15. Zeng D, Ye Z, Shen R, Yu G, Wu J, Xiong Y, Zhou R, Qiu W, Huang N, Sun L, et al. IOBR: Multi-omics Immuno-Oncology Biological Research to Decode Tumor Microenvironment and signatures. Front Immunol. 2021;12:687975.
  • 16. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.
  • 17. Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, Feng T, Zhou L, Tang W, Zhan L, et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innov (Cambridge (Mass)). 2021;2(3):100141.
  • 18. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized Linear models via Coordinate Descent. J Stat Softw. 2010;33(1):1–22.
  • 19. O’Quigley J, Moreau T. Cox’s regression model: computing a goodness of fit statistic. Comput Methods Programs Biomed. 1986;22(3):253–6.
  • 20. García JP, Ferreira JC, Patino CM. Receiver operating characteristic analysis: an ally in the pandemic. Jornal brasileiro de pneumologia: publicacao oficial da Sociedade Brasileira de Pneumologia e Tisilogia. 2021;47(2):e20210139.
  • 21. Kang L, Chen W, Petrick NA, Gallas BD. Comparing two correlated C indices with right-censored survival outcome: a one-shot nonparametric approach. Stat Med. 2015;34(4):685–703.
  • 22. Zhang Z, Kattan MW. Drawing nomograms with R: applications to categorical outcome and survival data. Annals Translational Med. 2017;5(10):211.
  • 23. Vickers AJ, van Calster B, Steyerberg EW. A simple, step-by-step guide to interpreting decision curve analysis. Diagn Prognostic Res. 2019;3:18.
  • 24. Fitzgerald M, Saville BR, Lewis RJ. Decision curve analysis. JAMA. 2015;313(4):409–10.
  • 25. Charoentong P, Finotello F, Angelova M, Mayer C, Efremova M, Rieder D, Hackl H, Trajanoski Z. Pan-cancer immunogenomic analyses reveal genotype-immunophenotype relationships and predictors of response to checkpoint blockade. Cell Rep. 2017;18(1):248–62.
  • 26. Jiang P, Gu S, Pan D, Fu J, Sahu A, Hu X, Li Z, Traugh N, Bu X, Li B, et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat Med. 2018;24(10):1550–8.
  • 27. Maeser D, Gruener RF, Huang RS. oncoPredict: an R package for predicting in vivo or cancer patient drug response and biomarkers from cell line screening data. Brief Bioinform. 2021;22(6):bbab260.
  • 28. Chen B, Zhou X, Yang L, Zhou H, Meng M, Zhang L, Li J. A cuproptosis activation scoring model predicts neoplasm-immunity interactions and personalized treatments in glioma. Comput Biol Med. 2022;148:105924.
  • 29. Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, Doncheva NT, Legeay M, Fang T, Bork P, et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021;49(D1):D605–12.
  • 30. Kohl M, Wiese S, Warscheid B. Cytoscape: software for visualization and analysis of biological networks. Methods Mol Biology (Clifton NJ). 2011;696:291–303.
  • 31. Rose PW, Prlić A, Altunkaya A, Bi C, Bradley AR, Christie CH, Costanzo LD, Duarte JM, Dutta S, Feng Z, et al. The RCSB protein data bank: integrative view of protein, gene and 3D structural information. Nucleic Acids Res. 2017;45(D1):D271–81.
  • 32. Norgan AP, Coffman PK, Kocher JP, Katzmann DJ, Sosa CP. Multilevel parallelization of AutoDock 4.2. J Cheminform. 2011;3(1):12.
  • 33. Seeliger D, de Groot BL. Ligand docking and binding site analysis with PyMOL and Autodock/Vina. J Comput Aided Mol Des. 2010;24(5):417–22.
  • 34. Tang Z, Li C, Kang B, Gao G, Li C, Zhang Z. GEPIA: a web server for cancer and normal gene expression profiling and interactive analyses. Nucleic Acids Res. 2017;45(W1):W98–102.
  • 35. Zhang X, Feng H, Li Z, Li D, Liu S, Huang H, Li M. Application of weighted gene co-expression network analysis to identify key modules and hub genes in oral squamous cell carcinoma tumorigenesis. OncoTargets Therapy. 2018;11:6001–21.
  • 36. Yao N, Jiang W, Wang Y, Song Q, Cao X, Zheng W, Zhang J. An immune-related signature for optimizing prognosis prediction and treatment decision of hepatocellular carcinoma. Eur J Med Res. 2023;28(1):123.
  • 37. Yang C, Huang X, Li Y, Chen J, Lv Y, Dai S. Prognosis and personalized treatment prediction in TP53-mutant hepatocellular carcinoma: an in silico strategy towards precision oncology. Brief Bioinform. 2021;22(3):bbaa164.
  • 38. Domblides C, Soubeyran I, Lartigue L, Mahouche I, Lefort F, Velasco V, Barnetche T, Blanco P, Déchanet-Merville J, Faustin B. Prognostic role of inflammasome components in human colorectal cancer. Cancers. 2020;12(12):3500.
  • 39. Peng L, Zhu N, Wang D, Zhou Y, Liu Y. Comprehensive Analysis of Prognostic Value and Immune Infiltration of NLRC4 and CASP1 in Colorectal Cancer. Int J Gen Med. 2022;15:5425–40.
  • 40. Yang Y, Zang Y, Zheng C, Li Z, Gu X, Zhou M, Wang Z, Xiang J, Chen Z, Zhou Y. CD3D is associated with immune checkpoints and predicts favorable clinical outcome in colon cancer. Immunotherapy. 2020;12(1):25–35.
  • 41. Kang XC, Chen ML, Yang F, Gao BQ, Yang QH, Zheng WW, Hao S. Promoter methylation and expression of SOCS-1 affect clinical outcome and epithelial-mesenchymal transition in colorectal cancer. Biomed Pharmacotherapy = Biomedecine Pharmacotherapie. 2016;80:23–9.
  • 42. Nguyen HM, Gaikwad S, Oladejo M, Paulishak W, Wood LM. Targeting ubiquitin-like protein, ISG15, as a novel tumor associated antigen in colorectal cancer. Cancers. 2023;15(4):1237.
  • 43. Duan J, Chen L, Gao H, Zhen T, Li H, Liang J, Zhang F, Shi H, Han A. GALNT6 suppresses progression of colorectal cancer. Am J cancer Res. 2018;8(12):2419–35.
  • 44. Pattison AM, Merlino DJ, Blomain ES, Waldman SA. Guanylyl cyclase C signaling axis and colon cancer prevention. World J Gastroenterol. 2016;22(36):8070–7.
  • 45. Fink AK, Yanik EL, Marshall BC, Wilschanski M, Lynch CF, Austin AA, Copeland G, Safaeian M, Engels EA. Cancer risk among lung transplant recipients with cystic fibrosis. J Cyst Fibrosis: Official J Eur Cyst Fibros Soc. 2017;16(1):91–7.
  • 46. Than BLN, Linnekamp JF, Starr TK, Largaespada DA, Rod A, Zhang Y, Bruner V, Abrahante J, Schumann A, Luczak T, et al. CFTR is a tumor suppressor gene in murine and human intestinal cancer. Oncogene. 2017;36(24):3504.
  • 47. Takahashi M, Fujita M, Furukawa Y, Hamamoto R, Shimokawa T, Miwa N, Ogawa M, Nakamura Y. Isolation of a novel human gene, APCDD1, as a direct target of the beta-Catenin/T-cell factor 4 complex with probable involvement in colorectal carcinogenesis. Cancer Res. 2002;62(20):5651–6.
  • 48. Tu R, Kang W, Kang Y, Chen Z, Zhang P, Xiong X, Ma J, Du RL, Zhang C. c-MYC-USP49-BAG2 axis promotes proliferation and chemoresistance of colorectal cancer cells in vitro. Biochem Biophys Res Commun. 2022;607:117–23.
  • 49. Man Y, Xin D, Ji Y, Liu Y, Kou L, Jiang L. Identification and validation of a novel six-gene signature based on mucinous adenocarcinoma-related gene molecular typing in colorectal cancer. Discover Oncol. 2024;15(1):63.
  • 50. Lange N, Unger FT, Schöppler M, Pursche K, Juhl H, David KA. Identification and validation of a potential marker of tissue quality using gene expression analysis of human colorectal tissue. PLoS ONE. 2015;10(7):e0133987.
  • 51. Scott AJ, Song EK, Bagby S, Purkey A, McCarter M, Gajdos C, Quackenbush KS, Cross B, Pitts TM, Tan AC, et al. Evaluation of the efficacy of dasatinib, a Src/Abl inhibitor, in colorectal cancer cell lines and explant mouse model. PLoS ONE. 2017;12(11):e0187173.
  • 52. Dunn EF, Iida M, Myers RA, Campbell DA, Hintz KA, Armstrong EA, Li C, Wheeler DL. Dasatinib sensitizes KRAS mutant colorectal tumors to cetuximab. Oncogene. 2011;30(5):561–74.
  • 53. Zhan T, Faehling V, Rauscher B, Betge J, Ebert MP, Boutros M. Multi-omics integration identifies a selective vulnerability of colorectal cancer subtypes to YM155. Int J Cancer. 2021;148(8):1948–63.
  • 54. Navarro M, Bellmunt J, Balañá C, Colomer R, Jolis L, del Campo JM. Mitomycin-C and vinblastine in advanced breast cancer. Oncology. 1989;46(3):137–42.
  • 55. Pfister C, Gravis G, Flechon A, Chevreau C, Mahammedi H, Laguerre B, Guillot A, Joly F, Soulie M, Allory Y, et al. Perioperative dose-dense methotrexate, vinblastine, doxorubicin, and cisplatin in muscle-invasive bladder cancer (VESPER): survival endpoints at 5 years in an open-label, randomised, phase 3 study. Lancet Oncol. 2024;25(2):255–64.
  • 56.McCaig AM, Cosimo E, Leach MT, Michie AM. Dasatinib inhibits CXCR4 signaling in chronic lymphocytic leukaemia cells and impairs migration towards CXCL12. PLoS ONE. 2012;7(11):e48929. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Cutler MJ, Lowthers EL, Richard CL, Hajducek DM, Spagnuolo PA, Blay J. Chemotherapeutic agents attenuate CXCL12-mediated migration of colon cancer cells by selecting for CXCR4-negative cells and increasing peptidase CD26. BMC Cancer. 2015;15:882. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Cho SG. APC downregulated 1 inhibits breast cancer cell invasion by inhibiting the canonical WNT signaling pathway. Oncol Lett. 2017;14(4):4845–52. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Han W, Liu J. Epigenetic silencing of the wnt antagonist APCDD1 by promoter DNA hyper-methylation contributes to osteosarcoma cell invasion and metastasis. Biochem Biophys Res Commun. 2017;491(1):91–7. [DOI] [PubMed] [Google Scholar]

Associated Data


Supplementary Materials

12885_2025_13437_MOESM1_ESM.docx (4.1MB, docx)

Additional file 1: Fig. S1 Validation of immune subgroups in CRC by the CIBERSORT, TIMER, MCPcounter, EPIC, xCell, and quanTIseq algorithms. (A) Boxplot shows the immune cell infiltration levels in the Cluster 1 and Cluster 2 subgroups calculated by the CIBERSORT algorithm; (B) Heatmap shows the immune cell infiltration levels in the Cluster 1 and Cluster 2 subgroups calculated by the TIMER, MCPcounter, EPIC, xCell, and quanTIseq algorithms. *P < 0.05; **P < 0.01; ***P < 0.001.

Fig. S2 Association between the IRPS-based risk score and common clinicopathological features in the training and validation datasets.

Fig. S3 ADMET properties of YM-155, vinblastine and dasatinib analyzed with the ADMETlab 2.0 and SwissADME databases. (A) ADMET properties of YM-155, vinblastine and dasatinib analyzed with the ADMETlab 2.0 database; (B) ADMET properties of YM-155, vinblastine and dasatinib analyzed with the SwissADME database.

Fig. S4 The IRPS-related PPI network constructed from DEGs between the high- and low-risk CRC groups. (A) Identification of DEGs between the high- and low-risk CRC groups in the GEO-meta dataset; (B) Layout of the PPI network constructed from DEGs between the high- and low-risk CRC groups. Orange diamonds represent up-regulated bottleneck genes; green diamonds represent down-regulated bottleneck genes; orange circles represent up-regulated non-bottleneck genes; green circles represent down-regulated non-bottleneck genes.

Fig. S5 Molecular docking between CRC candidate drugs and their potential targets. (A) Heatmap shows the binding energy between each candidate drug and each bottleneck gene; (B) Molecular docking result of dasatinib and PTGS2; (C) Molecular docking result of dasatinib and CXCR4; (D) Molecular docking result of vinblastine and CXCR4; (E) Molecular docking result of YM-155 and PTGS2.

12885_2025_13437_MOESM2_ESM.xlsx (14.7KB, xlsx)

Additional file 2: Table S1. The baseline table summarizing the clinical characteristics of CRC patients in the GSE17538, GSE29621, GSE38832, and TCGA datasets.

12885_2025_13437_MOESM3_ESM.xlsx (12.6KB, xlsx)

Additional file 3: Table S2. The gene sets related to immune cell types, functions, pathways and checkpoints.

12885_2025_13437_MOESM4_ESM.xlsx (31.2KB, xlsx)

Additional file 4: Table S3. Information of 63 published prognostic signatures for CRC.

Data Availability Statement

Public data used in this work can be acquired from the TCGA Research Network portal (https://portal.gdc.cancer.gov/) and Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/).


Articles from BMC Cancer are provided here courtesy of BMC
