Abstract
Colorectal cancer (CRC) is the most common malignancy in the digestive system, with a low 5-year overall survival rate. There is increasing evidence that regulators of RNA modifications such as m1A, m5C, m6A, and m7G play crucial roles in tumor progression. However, the prognostic role of integrated m6A/m5C/m1A/m7G methylation modifications in CRC has not been reported and requires further investigation. Five cohorts with 989 samples were first retrieved from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. Then, three m6A/m1A/m5C/m7G-associated molecular subtypes were identified in the TCGA cohort via consensus clustering analysis, and 1710 co-expression module genes associated with the subtypes were obtained from weighted gene co-expression network analysis (WGCNA). After conducting univariate Cox analysis in each cohort and retaining common genes, an RNA methylation-related signature (RMS) was developed through 101 algorithm combinations. The RMS exhibited strong accuracy and robustness in predicting survival outcomes across distinct cohorts (TCGA, GSE17536, GSE17537, GSE29612, and GSE38832) and demonstrated good performance compared with previously reported risk signatures. Additionally, the RMS was identified as an independent prognostic factor for overall survival in the TCGA, GSE17536, GSE17537, GSE29612, and GSE38832 cohorts. The patients were then stratified into high- and low-risk groups based on the median risk score across the five cohorts. Compared with the high-risk group, the low-risk group showed a higher immune cell infiltration level and greater benefit from immunotherapy and chemotherapy drugs. Moreover, six drugs (KU-0063794, temozolomide, DNMDP, ML162, SJ-172550, ML050) from the Cancer Therapeutics Response Portal (CTRP) and five drugs (BIBX-1382, lomitapide, ZLN005, PPT, panobinostat) from the PRISM database were identified for the high-risk group patients. By integrating data from the TCGA database and the Cancer Cell Line Encyclopedia (CCLE) database, a potential therapeutic target, TERT, was identified for the high-risk group of patients. The single-cell results indicated that TERT was highly expressed in epithelial cells. Overall, our RMS can accurately predict patients' survival outcomes and immunotherapy response, indicating promising application in clinical practice. These findings may offer guidance for the prognosis and personalized treatment of CRC.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-025-89944-8.
Keywords: Colorectal cancer, RNA modification, Machine learning, Immune cell, Immunotherapy, Candidate drugs
Subject terms: Prognostic markers, Biomarkers, Tumour biomarkers, Urological cancer
Introduction
According to global cancer statistics, CRC is among the most frequently occurring tumors worldwide, ranking third in incidence and mortality among malignant tumors of the digestive system1,2. It is estimated that there are 1.9 million new cases and 935,000 deaths from CRC each year2. Because early CRC lacks noticeable symptoms, patients are frequently diagnosed at an advanced stage. Despite significant advancements in surgery, chemotherapy, and immunotherapy in recent years, the prognosis for metastatic CRC remains poor, with a 5-year survival rate of less than 20%3,4. Increasing evidence suggests that biomarkers can also serve as therapeutic targets5. However, many published signatures have shown limitations, performing well in the training cohort but poorly in external validation cohorts. Therefore, it is imperative to develop a novel signature with high performance and robustness for risk stratification in CRC.
RNA modification is considered an important process in epigenetic regulation that plays a crucial role in gene expression, genome editing, and cell differentiation6. Depending on the modification site, RNA modification exists in various forms, including N1-methyladenosine (m1A), 5-methylcytosine (m5C), N6-methyladenosine (m6A), and 7-methylguanosine (m7G)7. Recent studies have revealed that abnormal expression of RNA modification regulators is tightly associated with tumor progression. For example, the m6A regulator METTL3 is highly expressed in multiple cancers, including bladder cancer8, thyroid cancer9, non-small cell lung cancer10, and pancreatic cancer11, and is associated with an unfavourable survival outcome. For m5C regulators, numerous studies have found that dysregulated DNMT1 is correlated with poor prognosis in head and neck squamous cell carcinoma9 and hepatocellular carcinoma12. The m1A demethylase ALKBH3, also known as prostate cancer antigen 1 (PCA-1), is enriched in prostate cancer13 and is associated with a poor outcome in breast cancer14 and gastrointestinal cancer15. N7-methylguanosine (m7G) is a modification present at the 5’ cap of RNA and within messenger RNA, and is one of the most common methylation modifications16. Numerous studies have indicated that m7G regulators are implicated in the regulation of various cancers, such as colonic adenocarcinoma17, acute myeloid leukemia18, lung squamous cell carcinoma19, high-grade glioma20, and gastric cancer21. Moreover, recent studies have reported that RNA modification can have a significant impact on tumor immunity by influencing the maturation of immune cells and the immunogenicity of RNA, opening a promising direction for cancer immunotherapy22,23.
In the present study, we developed a consensus RNA modification signature through machine learning analysis and validated its performance in multiple cohorts. We also investigated the relationship between the RMS and immune infiltration and uncovered potential drugs and targets for high-risk patients. These findings may aid in risk stratification and personalized treatment strategies.
Materials and methods
Data source and processing
To develop a robust consensus signature, a total of 989 samples with corresponding clinical information were collected from the TCGA and GEO databases. From the TCGA database, we first downloaded the raw read counts of the TCGA-COAD and TCGA-READ cohorts and transformed them to transcripts per million (TPM) values. We then merged the TCGA-COAD (n = 440) and TCGA-READ (n = 160) cohorts into a training cohort (TCGA-CRC, n = 600) after removing the batch effect using the “sva” R package24. From the GEO database, the expression profiles and corresponding clinical information of four cohorts, GSE17536 (n = 177), GSE17537 (n = 55), GSE29612 (n = 65), and GSE38832 (n = 92), were obtained. Additionally, 84 RNA modification regulators were retrieved from a previous study25 (Table S1). The overall design is shown in Fig. 1.
Fig. 1.
The overall design of the study.
Consensus clustering of CRC samples
An unsupervised consensus clustering approach utilizing the k-means algorithm was adopted to identify molecular subtypes based on the expression level of RNA modification regulators in the TCGA-CRC cohort. To obtain robust and reliable clustering results, 1000 iterations with an 80% resampling rate were run using the “ConsensusClusterPlus” R package26. Then, the cumulative distribution function (CDF) curves, delta area curve, and proportion of ambiguous clustering (PAC) scores were used to determine the optimal number of clusters.
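As an illustration of the PAC criterion mentioned above, the following minimal Python sketch computes the fraction of sample pairs whose consensus index lies strictly between two ambiguity bounds. It is not the authors' R workflow: the bounds of 0.1 and 0.9 are a commonly used choice assumed here (the text does not state them), and the two toy consensus matrices are invented for demonstration.

```python
def pac_score(consensus, lower=0.1, upper=0.9):
    """Proportion of ambiguous clustering (PAC) for one consensus matrix.

    consensus: square, symmetric list of lists with entries in [0, 1].
    lower/upper: assumed ambiguity bounds (not taken from the paper).
    """
    n = len(consensus)
    # Use each sample pair once (upper triangle, diagonal excluded).
    pairs = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    ambiguous = sum(1 for v in pairs if lower < v < upper)
    return ambiguous / len(pairs)


# Toy consensus matrices for four samples (illustrative values only).
clean = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
noisy = [
    [1.0, 0.6, 0.4, 0.0],
    [0.6, 1.0, 0.5, 0.3],
    [0.4, 0.5, 1.0, 0.7],
    [0.0, 0.3, 0.7, 1.0],
]

pac_by_solution = {"clean": pac_score(clean), "noisy": pac_score(noisy)}
# The solution with the lowest PAC is the least ambiguous one.
best = min(pac_by_solution, key=pac_by_solution.get)
```

In practice one consensus matrix is produced per candidate cluster number, and the PAC values are read alongside the CDF and delta area curves rather than used alone.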
Characterization of co-expression modules
WGCNA is a useful bioinformatics tool for identifying highly correlated gene modules and associating them with sample traits. This method helps select candidate biomarkers or therapeutic targets based on correlation networks. To identify subtype-associated co-expression modules and genes, we applied WGCNA to the TCGA-CRC cohort. Following the reference procedure, we excluded outliers and selected 12 as the soft threshold to ensure a scale-free network. Next, the weighted adjacency matrix was converted into a topological overlap measure (TOM) to estimate network connectivity among genes and to produce the dissimilarity measure disTOM (1-TOM). The dynamic tree cutting method was applied to define modules, assigning each module a color and then merging highly similar modules. Finally, Pearson correlation analysis was performed to obtain the correlation coefficient and p value between each module and the subtypes. The gene significance (GS) and module membership (MM) values were also calculated to screen modules highly correlated with the subtypes.
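The adjacency and TOM steps described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, assuming an unsigned network (adjacency equal to the absolute Pearson correlation raised to the soft-threshold power); the text does not say whether a signed or unsigned network was used, and the three toy expression profiles are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(expr, beta=12):
    """Unsigned soft-threshold adjacency; expr holds one row per gene."""
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            a = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def tom_dissimilarity(adj):
    """Return 1 - TOM for every gene pair (diagonal left at 0)."""
    g = len(adj)
    k = [sum(row) for row in adj]  # connectivity; the diagonal is zero
    diss = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(g) if u != i and u != j)
            tom = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            diss[i][j] = diss[j][i] = 1.0 - tom
    return diss


# Toy data: genes 0 and 1 are co-expressed, gene 2 is only weakly related.
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
diss = tom_dissimilarity(adjacency(expr, beta=12))
```

With a power of 12, the weak correlation of gene 2 with the other genes (|r| = 0.3) is pushed close to zero, so its dissimilarity to them is close to 1, while the co-expressed pair stays close to 0. Hierarchical clustering on such a dissimilarity matrix is what the dynamic tree cut then operates on.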
Function annotation and pathway enrichment
To identify the molecular functions and pathways of the module genes, we applied the “clusterProfiler” R package27. Significant functions or pathways were screened using the criterion of adjusted p value < 0.05. In addition, we retrieved KEGG gene sets from the MSigDB database and used gene set variation analysis (GSVA) to calculate pathway enrichment scores28. The “GseaVis” R package was applied to visualize the relationship between pathways and signatures.
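To make the "adjusted p value < 0.05" criterion concrete, the sketch below applies the Benjamini-Hochberg procedure to a handful of raw p values. The adjustment method is an assumption: the text does not name it, and Benjamini-Hochberg is simply the default in clusterProfiler. The p values are invented.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for five pathways.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = bh_adjust(raw)
# Indices of pathways passing the adjusted p value < 0.05 criterion.
significant = [i for i, p in enumerate(adjusted) if p < 0.05]
```

Here the third and fourth pathways have raw p values below 0.05 but adjusted values of about 0.051, so only the first two pass the criterion.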
Machine learning analysis
Before constructing the signature using machine learning, we first applied univariate Cox regression analysis to select significant co-expression module genes with a p value less than 0.05 in the five cohorts. Genes that were consistently risky or consistently protective in at least three cohorts were retained for further analysis. Ten machine learning algorithms, namely stepwise Cox, random survival forest [RSF], elastic network [Enet], supervised principal components [SuperPC], partial least squares regression for Cox [plsRcox], CoxBoost, survival support vector machine [survival-SVM], Lasso, Ridge, and generalized boosted regression modeling [GBM], were integrated into 101 algorithm combinations to fit the RNA modification signature based on 10-fold cross-validation in the TCGA-CRC cohort. The C-index across the five cohorts was calculated to select the optimal RMS. The patients were then categorized into high- and low-risk groups based on the median risk score across the five cohorts. Kaplan-Meier (KM) survival curve analysis was used to estimate the survival difference between the high- and low-risk groups using the “survminer” R package. Receiver operating characteristic (ROC) curve analysis was applied to evaluate the prognostic value of the RMS.
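The two evaluation steps named above, the C-index and the median split, can be illustrated with a small Python sketch. It implements Harrell's concordance index in its simplest form (pairs with tied survival times are skipped, tied risk scores count as half) and is not the authors' implementation; the survival times, event indicators, and risk scores are invented.

```python
from statistics import median


def concordance_index(times, events, risks):
    """Harrell's C-index; events are 1 for death and 0 for censoring."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has the event strictly before j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def median_split(risks):
    """Label each patient 'high' or 'low' relative to the median risk score."""
    cutoff = median(risks)
    return ["high" if r > cutoff else "low" for r in risks]


# Hypothetical follow-up times (months), event indicators, and risk scores.
times = [5, 8, 12, 20, 30, 34]
events = [1, 1, 0, 1, 0, 0]
risks = [2.1, 1.7, 0.4, 1.9, 0.3, 0.9]

c_index = concordance_index(times, events, risks)
groups = median_split(risks)
```

In this toy cohort 10 of the 11 comparable pairs are ordered correctly, giving a C-index of about 0.909; a value of 0.5 would correspond to random ordering.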
Immune cell infiltration
Immune cell infiltration was estimated by six methods, TIMER, CIBERSORT, QUANTISEQ, MCPCOUNTER, XCELL, and EPIC, in the “IOBR” R package. Moreover, we applied the tumor immune dysfunction and exclusion (TIDE) (http://tide.dfci.harvard.edu) software to assess the immunotherapy response of each CRC patient.
Drug and therapeutic target identification
To evaluate chemotherapy drug sensitivity in the high- and low-risk groups, half-maximal inhibitory concentration (IC50) values were estimated with the “pRRophetic” R package and used to predict patient benefit. Moreover, we obtained CRC cell line expression data from the CCLE database, and CERES scores for 18,333 genes from genome-scale CRISPR knockout screens in 739 cell lines were acquired from the dependency map (DepMap) portal. The CERES score estimates the dependency of a given cancer cell line (CCL) on a gene of interest. A lower CERES score suggests a greater importance of the gene for cell growth and survival in the given CCL. Furthermore, we retrieved drug targets from a previous study29. Subsequently, correlation analyses were performed between drug targets and the signature risk score to identify potential drug targets for high-risk patients in the TCGA and CCLE databases, respectively. Additionally, we downloaded the area under the curve (AUC) values of 481 compounds over 835 CCLs from the CTRP database and 1448 compounds over 482 CCLs from the PRISM database to further identify potential drugs for the treatment of high-risk patients. The k-nearest neighbor (k-NN) imputation algorithm was employed to impute the missing AUC values.
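The k-NN imputation of missing AUC values can be sketched as follows. This is a simplified illustration, not the routine the authors ran: the number of neighbours, the distance (root mean squared difference over the columns both rows have observed), and the orientation of the matrix (rows are treated as the units whose neighbours are searched) are all assumptions, and the toy matrix is invented.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k most similar rows.

    Similarity is judged only on columns observed in both rows, and only
    rows that have a value in the missing column can act as neighbours.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for h, other in enumerate(matrix):
                if h == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Toy AUC matrix (illustrative values); the first row lacks one value.
auc = [
    [0.80, 0.60, None, 0.90],
    [0.82, 0.58, 0.70, 0.88],
    [0.78, 0.62, 0.74, 0.92],
    [0.20, 0.30, 0.10, 0.25],
]
imputed = knn_impute(auc, k=2)
```

The missing entry is filled with 0.72, the mean of the two rows with the most similar profile, while the dissimilar fourth row does not contribute.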
Single-cell RNA sequencing (scRNA-seq) data processing
The raw scRNA-seq dataset GSE132257, comprising 5 normal and 5 CRC tissues, was retrieved from the GEO database and processed using the Seurat software (version: 5.1.0)30,31. Cells were excluded if they had fewer than 200 or more than 2500 detected genes or a mitochondrial gene content over 15%, and genes expressed in fewer than 3 cells were removed. The “NormalizeData” and “FindVariableFeatures” functions were applied to normalize the data and identify the top 2000 highly variable genes. Subsequently, the “ScaleData” function was used to scale gene expression, and batch effects were corrected with the “Harmony” package for dataset integration across different specimens. Principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) were used to perform dimensionality reduction. The “FindNeighbors” and “FindClusters” functions were used to determine the cell clusters. Marker genes of the clusters were identified using the “FindAllMarkers” function, and the cell types were annotated using classical markers.
Statistical analysis
All analyses were implemented in the R environment. Comparisons between the high- and low-risk groups were conducted using the Student’s t-test (for normally distributed data) or the Wilcoxon rank sum test (for non-normally distributed data). The correlation between continuous variables was assessed using Pearson or Spearman correlation analysis. A two-tailed P-value of less than 0.05 was considered significant in all statistical analyses.
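The quality-control rules described above (gene-count bounds per cell, a mitochondrial content ceiling, and a minimum number of cells per gene) can be illustrated with a small Python sketch on a toy count matrix. It is not Seurat code. The default thresholds mirror the text, but the example call uses much smaller values so that four cells and four genes suffice; identifying mitochondrial genes by the "MT-" prefix is an assumed convention, and "GENE_X" is a hypothetical gene name.

```python
def qc_filter(counts, genes, min_genes=200, max_genes=2500,
              min_cells=3, max_mito_pct=15.0):
    """Return indices of genes and cells that pass quality control.

    counts[g][c] is the raw count of gene g in cell c.
    """
    n_genes, n_cells = len(counts), len(counts[0])
    # Keep genes detected (count > 0) in at least `min_cells` cells.
    kept_genes = [g for g in range(n_genes)
                  if sum(1 for v in counts[g] if v > 0) >= min_cells]
    kept_cells = []
    for c in range(n_cells):
        detected = sum(1 for g in range(n_genes) if counts[g][c] > 0)
        total = sum(counts[g][c] for g in range(n_genes))
        mito = sum(counts[g][c] for g in range(n_genes)
                   if genes[g].startswith("MT-"))
        mito_pct = 100.0 * mito / total if total else 0.0
        if min_genes <= detected <= max_genes and mito_pct <= max_mito_pct:
            kept_cells.append(c)
    return kept_genes, kept_cells


genes = ["EPCAM", "CD3E", "MT-CO1", "GENE_X"]
counts = [
    [9, 0, 6, 1],  # EPCAM
    [0, 8, 5, 0],  # CD3E
    [1, 1, 1, 9],  # MT-CO1
    [0, 0, 1, 0],  # GENE_X
]
# Toy thresholds; the study used 200, 2500, 3, and 15%.
kept_genes, kept_cells = qc_filter(counts, genes, min_genes=2, max_genes=3,
                                   min_cells=3, max_mito_pct=15.0)
```

In the toy data the third cell is dropped for having too many detected genes and the fourth for a mitochondrial fraction of 90%, while CD3E and GENE_X are dropped because they are detected in fewer than three cells.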
Results
Consensus clustering identifies RNA modification subtypes
To construct RNA modification subtypes, we applied the ConsensusClusterPlus R package to the TCGA cohort. Based on the CDF curve, delta area curve, and PAC score results, we selected K = 3 as the optimal number of clusters (Figure S1A-C) and generated the consensus matrix plot (Figure S1D). Thus, all samples were categorized into three subtypes (C1, C2, C3), and the consensus heatmap showed that the three subtypes had high consistency. In addition, Figure S1F revealed significant expression differences of RNA modification regulators among the three molecular subtypes.
Characterization of subtype associated modules through WGCNA
We used the WGCNA approach to identify key modules and genes that are highly correlated with molecular subtypes in the TCGA-CRC cohort. After removing outliers, a soft power value of 12 was selected to ensure a scale-free network (Fig. 2A). Using this power value, we constructed a weighted co-expression network and divided the genes into 10 modules (Fig. 2B). We then analyzed the relationship between molecular subtypes and gene modules through Pearson correlation analysis, and the turquoise module was found to be the most correlated with the subtypes (Fig. 2C). The genes in this module were highly correlated with the gene significance for molecular subtypes (Fig. 2D). Pathway enrichment of the turquoise module revealed that its genes are mainly involved in endocytosis, ubiquitin-mediated proteolysis, RNA transport, spliceosome, cellular senescence, oocyte meiosis, T cell receptor signaling pathway, cell cycle, and Fanconi anemia pathway (Fig. 2E).
Fig. 2.
Characterization of molecular subtype-associated gene modules through the WGCNA algorithm. (A) Selection of the soft threshold. (B) Co-expression network construction based upon the soft threshold. (C) The relationship between gene modules and molecular subtypes. (D) A scatter plot of the turquoise module eigengenes. (E) Pathway enrichment of the turquoise module genes.
Construction and evaluation of RMS
Before model construction, we conducted univariate Cox regression analysis in the five cohorts, and 127 common genes were retained for downstream analysis. We then applied 101 algorithm combinations built from 10 machine learning algorithms (Enet, plsRcox, Ridge, StepCox, CoxBoost, GBM, SuperPC, RSF, Lasso, and survival-SVM) with 10-fold cross-validation and calculated the C-index in the five cohorts. The Enet[alpha = 0.1] model, which had the highest average C-index (0.722), was selected as the optimal signature (Fig. 3A). The patients were then divided into high- and low-risk groups based on the median risk score of each cohort. KM survival curve analysis revealed that patients in the high-risk group had a significantly shorter survival time than those in the low-risk group across the TCGA (Fig. 3B), GSE17536 (Fig. 3C), GSE17537 (Fig. 3D), GSE29612 (Fig. 3E), and GSE38832 (Fig. 3F) cohorts (P < 0.05). Additionally, the AUC values calculated from the ROC analysis ranged from 0.603 to 0.891, indicating that the prognostic model has good predictive ability in the TCGA (Fig. 3B), GSE17536 (Fig. 3C), GSE17537 (Fig. 3D), GSE29612 (Fig. 3E), and GSE38832 (Fig. 3F) cohorts. Collectively, these results demonstrated the robustness and reliability of the RMS.
Fig. 3.
Development and evaluation of the machine learning signature. (A) Construction of the RMS through 101 machine learning algorithm combinations. Kaplan-Meier survival curve (left) analysis of OS between the high and low RMS score groups in the TCGA (B), GSE17536 (C), GSE17537 (D), GSE29612 (E), and GSE38832 (F) cohorts. ROC curve (right) analysis at 1, 3, and 5 years in the TCGA (B), GSE17536 (C), GSE17537 (D), GSE29612 (E), and GSE38832 (F) cohorts.
To further assess the performance of the RMS, we collected 34 prognostic signatures from published studies and calculated their C-index values. Interestingly, the RMS showed a higher C-index in the TCGA, GSE17537, GSE17536, GSE29612, GSE38832, and meta cohorts than the other published models, indicating that the RMS is superior to other models in predicting CRC prognosis (Fig. 4). In addition, we explored the relationship between the RMS and clinicopathological factors. We observed that the high-risk group corresponded to more deaths (Fig. 5A), advanced stage (Fig. 5D), T3&T4 (Fig. 5E), M1 (Fig. 5F), and N1&N2 cases (Fig. 5G). However, no significant difference in age or gender was observed between the high- and low-risk groups (Fig. 5B-C).
Fig. 4.
Comparison of the C-index values of the RMS and other known signatures in the TCGA, GSE17536, GSE17537, GSE29612, GSE38832, and meta GEO cohorts.
Fig. 5.
Evaluation of the relationship between the RMS and clinical traits, including survival status (A), age (B), gender (C), stage (D), T (E), M (F), N (G).
To further determine whether the prognostic model could independently predict patient survival, we collected clinical information from the five cohorts. Univariate and multivariate Cox regression analyses demonstrated that the RMS can serve as an independent prognostic factor in the TCGA (Fig. 6A), GSE17536 (Fig. 6B), GSE17537 (Fig. 6C), GSE29612 (Fig. 6D), and GSE38832 (Fig. 6E) cohorts.
Fig. 6.
Univariate and multivariate Cox regression analyses were performed to assess the independence of the RMS in the TCGA (A), GSE17536 (B), GSE17537 (C), GSE29612 (D), and GSE38832 (E) cohorts.
Association between immune infiltration and RMS
To explore the relationship between the RMS and tumor immune cell infiltration, seven algorithms (XCELL, TIMER, QUANTISEQ, MCPCOUNTER, EPIC, CIBERSORT-ABS, and CIBERSORT) were applied to estimate the infiltration levels of distinct immune cell types based on the expression data. The correlation coefficients between the RMS score and immune cells were then evaluated. A high RMS score was negatively correlated with most immune cells, including B cell (XCELL), memory CD4 T cell (XCELL), mast cell (XCELL), neutrophil (XCELL), plasma T cell (XCELL), NK cell (MCPCOUNTER), CD4 T cell (EPIC), and CD8 T cell (EPIC), while it was positively correlated with stromal score (XCELL), M2 macrophage, and cancer associated fibroblast (MCPCOUNTER, QUANTISEQ) (Fig. 7A-B). These results indicate that patients with a low RMS score have higher immune infiltration, which may partly explain why low-risk patients have a favourable survival outcome. We further investigated the relationship between immune modulators and the RMS and found that most immune modulators, including chemokines (CXCL1, CXCL2, CXCL3), receptors (CXCR1, CXCR2, CXCR6), immunoinhibitors (ICOS, IDO1), and immunostimulators (TNFSF9, TNFRSF17, TMIGD2), were negatively correlated with the RMS score (Fig. 7C). In addition, using the TIDE algorithm, we calculated the TIDE value, dysfunction score, and exclusion score for each patient. We found significantly lower values in the low-risk group than in the high-risk group, indicating a higher likelihood of response to immunotherapy in the low-risk group. The bar plot showed that patients in the low-risk group might benefit from immunotherapy (P < 0.05) (Fig. 7D).
Fig. 7.
Immune infiltration landscape of CRC. (A) The correlation between immune cells calculated by distinct algorithms and RMS score. Each dot represents an immune cell type, and different colors represent algorithms. (B) A heatmap showing the relationship between immune cells and RMS score. (C) A heatmap showing the correlation between immune modulators and RMS score. (D) Evaluation of immunotherapy response between the high and low RMS score groups through TIDE analysis. *p < 0.05, **p < 0.01, ***p < 0.001.
Biological characteristic of RMS
To further explore the pathway enrichment of the RMS, we conducted single sample gene set enrichment analysis (ssGSEA) on the KEGG reference gene set. We identified seven pathways, including ECM-receptor interaction, focal adhesion, WNT signaling pathway, PI3K-Akt signaling pathway, Notch signaling pathway, Rap1 signaling pathway, and Hippo signaling pathway, that were significantly enriched in the high-risk group (Figure S2A). In contrast, the low-risk group showed enrichment in cell cycle, IL-17 signaling pathway, fatty acid metabolism, DNA replication, homologous recombination, Fanconi anemia pathway, and spliceosome (Figure S2B).
We also applied the ssGSEA algorithm to quantify the cancer immunity cycle signatures. The cancer immunity cycle contains seven steps: antigen release (step 1), cancer antigen presentation (step 2), priming and activation (step 3), tumor immune infiltrating cell recruitment (step 4), immune cell infiltration (step 5), cancer cell recognition by T cells (step 6), and cancer cell killing (step 7). We observed that RMS was negatively correlated with most of these processes, including the release of cancer antigens (step 1), cancer antigen presentation (step 2), priming and activation (step 3), CD4 T cell recruitment (step 4), CD8 T cell recruitment (step 4), MDSC recruitment (step 4), neutrophil recruitment (step 4), Th1 cell recruitment (step 4), Th2 cell recruitment (step 4), Th22 cell recruitment (step 4), Treg cell recruitment (step 4), and killing of cancer cells (step 7) (Figure S3A). Additionally, RMS was positively correlated with DNA repair pathways, such as cell cycle, DNA replication, nucleotide excision repair, homologous recombination, and mismatch repair (Figure S3B).
Characterization of drugs to the high-RMS group patients
To explore the relationship between chemotherapy drug sensitivity and the RMS, we calculated the IC50 value of each drug in the high- and low-risk groups. The lower the IC50 value, the more sensitive patients are to the drug. Patients in the low-RMS group were more sensitive to both gemcitabine (Fig. 8A) and sorafenib (Fig. 8B), while patients in the high-RMS group were more sensitive to sunitinib (Fig. 8D). However, docetaxel sensitivity showed no difference between the high- and low-RMS groups (Fig. 8C). Next, we adopted two methods to identify potential agents for patients in the high-risk group from the CTRP and PRISM drug databases. First, we screened for agents whose estimated AUC values differed between the high- and low-risk groups and retained those with lower estimated AUC values in the high-risk group. Second, we performed a correlation analysis between the RMS score and the AUC value of each agent and retained agents that were negatively correlated with the RMS score. The agents common to these two analyses were considered candidate agents. As a result, six agents (KU-0063794, temozolomide, DNMDP, ML162, SJ-172550, and ML050) (Fig. 8E) from the CTRP database and five agents (BIBX-1382, lomitapide, ZLN005, PPT, and panobinostat) (Fig. 8F) from the PRISM database with lower AUC values in the high-RMS group were identified.
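The two-step agent screen can be sketched in plain Python as an intersection of two filters. This is a simplified illustration under stated assumptions: the group comparison uses only the difference in medians (no significance test or effect-size cutoff, which the text does not specify), the correlation is Spearman's rho without a p value, and the risk scores, AUC values, and agent names are invented.

```python
from statistics import median


def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for t in range(pos, end + 1):
            result[order[t]] = avg
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def candidate_agents(risk, auc_by_agent):
    """Agents with lower AUC in the high-risk group AND negative correlation."""
    cutoff = median(risk)
    high = [i for i, r in enumerate(risk) if r > cutoff]
    low = [i for i, r in enumerate(risk) if r <= cutoff]
    step1 = {agent for agent, auc in auc_by_agent.items()
             if median([auc[i] for i in high]) < median([auc[i] for i in low])}
    step2 = {agent for agent, auc in auc_by_agent.items()
             if spearman(risk, auc) < 0}
    return step1 & step2


# Hypothetical risk scores and estimated AUC values for three agents.
risk = [0.2, 0.5, 0.9, 1.4, 1.8, 2.3]
auc_by_agent = {
    "agent_A": [0.90, 0.85, 0.80, 0.50, 0.45, 0.30],
    "agent_B": [0.40, 0.50, 0.45, 0.70, 0.80, 0.75],
    "agent_C": [0.60, 0.20, 0.90, 0.50, 0.70, 0.65],
}
selected = candidate_agents(risk, auc_by_agent)
```

Only agent_A survives both filters in this toy example, since a lower AUC indicates higher sensitivity and its AUC falls steadily as the risk score rises.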
Fig. 8.
Evaluation and identification of drugs associated with RMS. (A-D) Evaluation of sensitivity of four chemotherapy drugs including gemcitabine, docetaxel, sorafenib and sunitinib. (E) The differential drug analysis (left) and correlation analysis (right) result of six drugs from CTRP database. (F) The differential drug analysis (left) and correlation analysis (right) result of five drugs from PRISM database. *p < 0.05, **p < 0.01, ***p < 0.001.
Analysis of target genes and their potential drugs
To identify potential therapeutic targets for high-risk patients, we gathered the target information of 6125 compounds from a previous study29. We then proceeded with a two-step analysis to identify potential targets. First, we performed Spearman correlation analysis between the RMS score and target expression in the TCGA-CRC cohort. A total of 384 targets with a correlation coefficient > 0.1 and a P value < 0.05 were identified (Fig. 9A). Next, we conducted a Spearman correlation analysis between the CERES score and the RMS score, retaining 122 targets with a correlation coefficient < -0.3 and a P value < 0.05 (Fig. 9B). As a result, six genes common to the two analyses, EEF2, KCNJ11, P2RY11, RPS2, SLC2A4, and TERT, served as candidate targets. Among these genes, and in contrast to the others, TERT had a CERES score close to zero in a proportion of CRC cell lines. It has been reported that abnormally expressed TERT is associated with tumor proliferation and metastasis, but its role in CRC is still unknown. Therefore, we further evaluated its expression in normal and tumor samples in CRC. As shown in Fig. 10A-B, TERT was more highly expressed in tumor samples than in normal samples in both the unpaired and the paired comparison. Moreover, the diagnostic ROC curve demonstrated that TERT has a high accuracy in distinguishing normal and tumor samples (AUC = 0.793) (Fig. 10C). KM survival curve analysis showed that high expression of TERT corresponded to a poor survival outcome (Fig. 10D). In addition, we screened 18 potential agents correlated with TERT from the GDSC drug database. Among these drugs, erlotinib was the most positively correlated with TERT, while BMS-754807 was the most negatively correlated with TERT (Fig. 10E). Notably, erlotinib acts on the EGFR signaling pathway, while BMS-754807 acts on the IGF1R signaling pathway (Fig. 10F).
Fig. 9.
Identification of candidate drug targets for high RMS score patients. (A) The volcano plot (left) and scatter plots (right) of Pearson correlation between RMS score and RNA expression level of drug targets in CRC lines; each red dot represents a TCGA sample. (B) The volcano plot (left) and scatter plots (right) of Pearson correlation and significance between RMS score and CERES score of drug targets in CRC lines; each green dot represents a CRC cell line.
Fig. 10.
Exploration of the expression and prognostic value of TERT in the TCGA-CRC cohort. (A) Comparison of the expression of TERT in normal and tumor tissue. (B) Comparison of the expression of TERT in paired normal and tumor tissue. (C) Evaluation of the diagnostic value of TERT through ROC curve analysis. (D) Kaplan-Meier survival curve of OS between high and low expression of TERT. (E) The correlation between candidate drugs and TERT. (F) The mechanism of action of the drug candidates. *p < 0.05, **p < 0.01, ***p < 0.001.
Single cell RNA-seq analysis
After quality control and normalization, 14,855 cells were categorized into 25 cell clusters and annotated into seven cell types (CD8+ T cells, plasma cells, B cells, epithelial cells, fibroblast cells, mast cells, and endothelial cells) on the basis of differentially expressed genes (DEGs) and classical markers (Figure S4A-F). The distribution of distinct cell types between normal and CRC tissue was evaluated (Figure S4G). Moreover, TERT expression and the risk score were quantified at the single-cell level. As shown in Figure S4, TERT was highly expressed in epithelial cells (Figure S4H), while the risk score was high in endothelial cells, mast cells, and fibroblast cells (Figure S4I).
Discussion
RNA modification promotes or inhibits tumorigenesis by regulating various processes such as cell proliferation, differentiation, invasion, migration, stemness, metabolism, and drug resistance32,33. Among the numerous RNA modifications, N6-methyladenosine (m6A), N1-methyladenosine (m1A), 5-methylcytosine (m5C), and N7-methylguanosine (m7G) are tightly associated with tumorigenesis32. Additionally, RNA modification is considered an essential factor in the design and development of effective mRNA vaccines34. Following the worldwide outbreak of COVID-19 in 2019, many pharmaceutical companies, including Moderna, Pfizer/BioNTech, CureVac, and Arcturus, have been working on mRNA-based COVID-19 candidate vaccines, with some using RNA modification technology and others not35. Notably, the vaccines manufactured by Moderna and Pfizer/BioNTech contain modified nucleotides and have an efficacy rate of 90%36–38. In contrast, the unmodified CureVac vaccine has a protection rate against coronavirus infection that drops to 47%34,38. These clinical observations indicate that mRNA modification plays an important role in the preservation of mRNA vaccines.
Owing to the rapid development of next-generation sequencing, the abundance of omics data has facilitated the understanding of the molecular mechanisms of diseases and provided numerous biomarkers for prognosis and diagnosis. Several studies have revealed the prognostic role of RNA modification in various cancers, such as ovarian cancer39, pancreatic cancer40, oral squamous cell carcinoma41, lung cancer42, and soft tissue sarcoma43. Although these published signatures showed high performance in the training cohort, their performance remained poor in other cohorts. Therefore, there is an urgent need to develop a reliable and stable RNA modification signature to effectively stratify patients and to validate its prognostic value across multiple cohorts.
In the present study, three molecular subtypes with distinct expression levels were first identified based on the expression profiles of RNA modification regulators. The WGCNA algorithm was applied to identify modules and module genes correlated with the subtypes. Prognostic genes were then screened by performing univariate Cox regression analysis on the module genes in all cohorts. We then developed a consensus RMS comprising 51 RNA modification-related genes using the Enet[alpha = 0.1] algorithm. The RMS showed stable performance in predicting prognosis and was an independent prognostic factor in all cohorts. Furthermore, our RMS outperformed previously published prognostic models for CRC. These results indicate that our model could accurately identify high-risk patients and help clinicians promptly tailor treatment strategies. Moreover, the relationship between the RMS and immune infiltration revealed that patients in the low-RMS group had higher immune cell infiltration, including B cells, memory CD4 T cells, plasma cells, NK cells, and CD8 T cells. To further clarify the association between immune infiltration and immunotherapy in CRC, we calculated the TIDE score for each patient and found that patients in the low-RMS group had significantly lower TIDE scores. TIDE (Tumor Immune Dysfunction and Exclusion) provides several scores that describe TME status, including a T cell dysfunction score, a T cell exclusion score, a cytotoxic T lymphocyte score, and scores for cell types that restrict T cell infiltration, such as CAFs, MDSCs, and M2 macrophages. These metrics aid in assessing TME status and T cell function, guiding clinicians in predicting immunotherapy response from tumor RNA-seq profiles44. The lower a patient's TIDE score, the more likely the patient is to benefit from immunotherapy. Therefore, patients in the low-risk group could benefit from immunotherapy.
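The risk stratification discussed above can be illustrated with a minimal sketch. It assumes that the RMS is a linear predictor (a weighted sum of gene expression values) and that patients are split at the cohort median. The gene names, coefficients, and expression values are hypothetical placeholders, not the fitted 51-gene signature.

```python
from statistics import median

# Hypothetical coefficients for illustration only; the actual signature
# is described in the text as an elastic-net model with far more genes.
COEFFICIENTS = {"GENE_A": 0.40, "GENE_B": -0.25, "GENE_C": 0.10}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Toy cohort with made-up expression values.
cohort = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.5},
    "P2": {"GENE_A": 0.5, "GENE_B": 2.0, "GENE_C": 1.0},
    "P3": {"GENE_A": 1.5, "GENE_B": 0.5, "GENE_C": 2.0},
    "P4": {"GENE_A": 0.2, "GENE_B": 1.5, "GENE_C": 0.3},
}

scores = {pid: risk_score(expr) for pid, expr in cohort.items()}
groups = stratify_by_median(scores)
```

Computing the cutoff separately within each cohort, as in this sketch, avoids comparing raw scores across platforms with different expression scales.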
We also explored the benefit of chemotherapy drugs, including gemcitabine, docetaxel, sorafenib, and sunitinib, in the high- and low-risk groups. The results indicated that patients in the low-risk group benefited from gemcitabine and sorafenib, whereas patients in the high-risk group were more sensitive to sunitinib, suggesting tumor heterogeneity. These findings might lay the groundwork for combining immunotherapy and chemotherapy for CRC patients in the high-RMS group. Moreover, we identified several candidate drugs for the high-RMS group from CTRP and PRISM, including KU-0063794, temozolomide, DNMDP, ML162, SJ-172550, ML050, BIBX-1382, lomitapide, ZLN005, PPT, and panobinostat. Among these drugs, accumulating studies have demonstrated synergistic activity between temozolomide (TMZ) and poly-ADP-ribose polymerase (PARP) inhibitors, which block the repair of TMZ-induced DNA damage, in small cell lung cancer (SCLC)45. Lee et al. confirmed that lomitapide inhibits the growth and viability of cancer cells and has potential as a new therapeutic option for cancer46. Zhang et al. found that the combination of panobinostat and doxorubicin effectively inhibits the growth of soft tissue sarcoma cells47. Additionally, we identified TERT as a therapeutic target for patients in the high-RMS group. The telomerase reverse transcriptase (TERT) gene encodes the catalytic subunit of telomerase, which plays a crucial role in tumorigenesis by maintaining telomere stability and cell proliferation capacity48. Recent studies have demonstrated that TERT is involved in multiple biological pathways that promote cancer cell proliferation, including the MAPK, mTOR, and NF-κB signaling pathways48–50. These results provide a new direction for the development of personalized treatment of CRC.
Despite the high performance of the RMS, several limitations should be acknowledged. First, all cohorts were derived from retrospective, single-center designs and lack validation in prospective studies. Second, the functional role of the therapeutic target TERT was not fully investigated experimentally. Third, the different GPL platforms of the GEO cohorts could introduce bias into the survival analyses. Therefore, we plan to conduct more in-depth analyses of the RMS and TERT.
In conclusion, using 101 combinations of machine learning algorithms, we constructed and validated an RMS for predicting survival. The RMS demonstrated high performance and can serve as an independent prognostic factor in all cohorts. Moreover, we identified TERT as a potential therapeutic target for high-risk patients. These results provide direction and an efficient tool for personalized treatment and clinical decision-making for CRC patients.
Acknowledgements
Not applicable.
Author contributions
Weimin Zhong conceived and designed the study. Hao Wei and Qingsong Luo performed the data analysis. Hao Wei wrote the manuscript. All authors reviewed and approved the manuscript.
Funding
Not applicable.
Data availability
The datasets analyzed in this study can be retrieved from the TCGA database (https://portal.gdc.cancer.gov/) and the GEO database (http://www.ncbi.nlm.nih.gov/geo/). The code used in the study is available from the corresponding author upon reasonable request.
Declarations
Competing interests
The authors declare no competing interests.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Siegel, R., Miller, K., Fuchs, H. & Jemal, A. Cancer statistics, 2022. Cancer J. Clin. 72 (2022).
- 2. Sung, H. et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA: A Cancer J. Clin. 71 (2021).
- 3. Sur, D. et al. Colorectal cancer: Evolution of screening strategies. Clujul Medical 92 (2018).
- 4. Tauriello, D., Calon, A., Lonardo, E. & Batlle, E. Determinants of metastatic competency in colorectal cancer. Mol. Oncol. 11 (2017).
- 5. Ma, Y. et al. Multi-omics cluster defines the subtypes of CRC with distinct prognosis and tumor microenvironment. Eur. J. Med. Res. 29 (2024).
- 6. Xu, Y. et al. Role of Main RNA Methylation in Hepatocellular Carcinoma: N6-Methyladenosine, 5-Methylcytosine, and N1-Methyladenosine. Front. Cell Dev. Biol. 9 (2021).
- 7. Xie, S. et al. Emerging roles of RNA methylation in gastrointestinal cancers. Cancer Cell Int. 20 (2020).
- 8. Huang, Y. et al. Enhancing m6A modification of lncRNA through METTL3 and RBM15 to promote malignant progression in bladder cancer. Heliyon 10, e28165 (2024).
- 9. Zhou, X. et al. The m6A methyltransferase METTL3 drives thyroid cancer progression and lymph node metastasis by targeting LINC00894. Cancer Cell Int. 24 (2024).
- 10. Yang, Z. et al. The METTL3/miR-196a Axis predicts poor prognosis in non-small cell Lung Cancer. J. Cancer 15, 1603–1612 (2024).
- 11. Li, Y. et al. Increased expression of METTL3 in pancreatic cancer tissues associates with poor survival of the patients. World J. Surg. Oncol. 20 (2022).
- 12. Gu, X. et al. Uncovering the association between m5C regulator-mediated methylation modification patterns and tumour microenvironment infiltration characteristics in hepatocellular carcinoma. Front. Cell Dev. Biol. 9 (2021).
- 13. Konishi, N. et al. High expression of a new marker PCA-1 in human prostate carcinoma. Clin. Cancer Res. 11, 5090–5097 (2005).
- 14. Stefansson, O. et al. CpG promoter methylation of the ALKBH3 alkylation repair gene in breast cancer. BMC Cancer 17 (2017).
- 15. Zhao, Y. et al. m1A regulated genes modulate PI3K/AKT/mTOR and ErbB pathways in gastrointestinal Cancer. Transl. Oncol. 12, 1323–1333 (2019).
- 16. Malbec, L. et al. Dynamic methylome of internal mRNA N7-methylguanosine and its regulatory role in translation. Cell Res. 2019, 1–15 (2019).
- 17. Gao, Y., Ren, J., Chen, K. & Guan, G. Construction and validation of a prognostic signature for mucinous colonic adenocarcinoma based on N7-methylguanosine-related long non-coding RNAs. J. Gastrointest. Oncol. 15, 203–219 (2024).
- 18. Zhang, C. et al. Identification and validation of a prognostic risk-scoring model for AML based on m7G-associated gene clustering. Front. Oncol. 13, 1301236 (2024).
- 19. Liu, W. et al. A signature of five 7-methylguanosine-related genes is a prognostic marker for lung squamous cell carcinoma. J. Thorac. Dis. 15, 6265–6278 (2023).
- 20. Li, X., Li, Y., Li, N., Shen, L. & Li, Z. Integrative analyses reveal biological function and prognostic role of m7G methylation regulators in high-grade glioma. Aging 15 (2023).
- 21. Deng, K. et al. Identification and validation of a novel prognostic model for gastric cancer based on m7G-related genes. Transl. Cancer Res. 12, 1836–1851 (2023).
- 22. Zhang, M., Song, J., Yuan, W., Zhang, W. & Sun, Z. Q. Roles of RNA methylation on tumor immunity and clinical implications. Front. Immunol. 12 (2021).
- 23. Li, Y. et al. The role of RNA methylation in tumor immunity and its potential in immunotherapy. Mol. Cancer 23 (2024).
- 24. Leek, J., Johnson, W., Parker, H., Jaffe, A. & Storey, J. The SVA package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).
- 25. Ye, X. et al. m6A/m1A/m5C/m7G-related methylation modification patterns and immune characterization in prostate cancer. Front. Pharmacol. 13 (2022).
- 26. Wilkerson, M. & Hayes, D. ConsensusClusterPlus: A class discovery tool with confidence assessments and item tracking. Bioinformatics 26, 1572–1573 (2010).
- 27. Wu, T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innov. 2, 100141 (2021).
- 28. Hänzelmann, S., Castelo, R. & Guinney, J. Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinform. 14, 7 (2013).
- 29. Yang, C. et al. Prognosis and personalized treatment prediction in TP53-mutant hepatocellular carcinoma: An in silico strategy towards precision oncology. Brief. Bioinform. 22 (2020).
- 30. Lee, H. O. et al. Lineage-dependent gene expression programs influence the immune landscape of colorectal cancer. Nat. Genet. 52, 1–10 (2020).
- 31. Hao, Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42, 293–304 (2023).
- 32. Tang, Q. et al. RNA modifications in cancer. Br. J. Cancer 129 (2023).
- 33. Chen, X. H. et al. Regulations of m6A and other RNA modifications and their roles in cancer. Front. Med. (2024).
- 34. Mei, Y. & Wang, X. RNA modification in mRNA cancer vaccines. Clin. Exp. Med. 23, 1–15 (2023).
- 35. Gaviria, M. & Kilic, B. A network analysis of COVID-19 mRNA vaccine patents. Nat. Biotechnol. 39, 546–548 (2021).
- 36. Baden, L. et al. Efficacy and Safety of the mRNA-1273 SARS-CoV-2 Vaccine. N. Engl. J. Med. 384 (2020).
- 37. Thomas, S. et al. Efficacy and safety of the BNT162b2 mRNA COVID-19 vaccine in participants with a history of cancer: Subgroup analysis of a global phase 3 randomized clinical trial. Vaccine 40 (2021).
- 38. Barbier, A., Jiang, A., Zhang, P., Wooster, R. & Anderson, D. The clinical progress of mRNA vaccines and immunotherapies. Nat. Biotechnol. 40 (2022).
- 39. Zheng, P., Li, N. & Zhan, X. Ovarian cancer subtypes based on the regulatory genes of RNA modifications: Novel prediction model of prognosis. Front. Endocrinol. 13 (2022).
- 40. Li, T. et al. Identification of methyltransferase modification genes associated with prognosis and immune features of pancreatic adenocarcinoma. Mol. Cell Probes 67, 101897 (2023).
- 41. Wu, X., Tang, J. & Cheng, B. Oral squamous cell carcinoma gene patterns connected with RNA methylation for prognostic prediction. Oral Dis. 30 (2022).
- 42. Mao, S., Chen, Z., Wu, Y., Xiong, H. & Yuan, X. Crosstalk of eight types of RNA modification regulators defines tumor microenvironments, cancer hallmarks, and prognosis of lung adenocarcinoma. J. Oncol. (2022).
- 43. Wang, X. et al. Multi-omics analysis of copy number variations of RNA regulatory genes in soft tissue sarcoma. Life Sci. 265, 118734 (2020).
- 44. Jiang, P. et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat. Med. 24 (2018).
- 45. Andrini, E. et al. Challenges and future perspectives for the use of temozolomide in the treatment of SCLC. Cancer Treat. Rev. 129, 102798 (2024).
- 46. Lee, B. et al. Lomitapide, a cholesterol-lowering drug, is an anticancer agent that induces autophagic cell death via inhibiting mTOR. Cell Death Dis. 13, 603 (2022).
- 47. Zhang, Y. et al. Synergistic activities of Panobinostat and doxorubicin in soft tissue sarcomas. Biomed. Pharmacother. 176, 116895 (2024).
- 48. Nault, J. C., Ningarhari, M., Rebouissou, S. & Zucman-Rossi, J. The role of telomeres and telomerase in cirrhosis and liver cancer. Nat. Rev. Gastroenterol. Hepatol. 16 (2019).
- 49. Li, Y. & Tergaonkar, V. Noncanonical functions of telomerase: implications in telomerase-targeted Cancer therapies. Cancer Res. 74 (2014).
- 50. Dogan, F. & Biray Avcı, Ç. Correlation between telomerase and mTOR pathway in cancer stem cells. Gene 641 (2017).