Discover Oncology. 2025 Nov 18;16:2126. doi: 10.1007/s12672-025-03974-2

Development and validation of an AI-augmented deep learning model for survival prediction in de novo metastatic colorectal cancer

Merih Yalçıner 1, Efe Cem Erdat 1, Engin Eren Kavak 2, Güngör Utkan 1
PMCID: PMC12627317  PMID: 41251848

Abstract

Purpose

Accurate prognostication in metastatic colorectal cancer (mCRC) remains challenging due to disease heterogeneity. This study aimed to develop and validate an artificial intelligence-augmented deep learning model for risk stratification in patients receiving first-line treatment.

Methods/patients

We developed a deep neural network with artificial intelligence augmentation, using data from patients with de novo mCRC treated at two major reference centers between 2010 and 2024. Patients with BRAF-mutated and MSI-high tumors were excluded. The model incorporated clinical characteristics, laboratory parameters, and treatment data. The primary outcome was progression-free survival (PFS).

Results

A total of 214 patients were included in the study, with 127 patients in the training and internal validation cohort and 87 patients in the external validation cohort. The model stratified patients into three distinct risk groups with significantly different PFS (log-rank p < 0.001). The low-risk group (n = 34) achieved a median PFS of 16.8 months with a 29% event rate, the medium-risk group (n = 33) showed a median PFS of 9.3 months with a 58% event rate, and the high-risk group (n = 34) demonstrated a median PFS of 7.5 months with a 76% event rate. Feature importance analysis identified carcinoembryonic antigen, neutrophil/lymphocyte ratio, and liver function tests as the strongest predictors of PFS. The model’s performance was consistent across both internal and external validation cohorts.

Conclusions

This deep learning model demonstrates robust prognostic capabilities in mCRC, effectively stratifying patients into distinct risk groups. The model could aid in clinical decision-making and treatment planning for patients receiving first-line therapy.

Supplementary Information

The online version contains supplementary material available at 10.1007/s12672-025-03974-2.

Keywords: Colorectal cancer, Artificial intelligence, Machine learning, Prognosis

Introduction

Colorectal cancer (CRC) is one of the leading causes of cancer mortality worldwide [1]. Despite significant advances in our knowledge of tumor biology and novel treatment strategies, metastatic CRC (mCRC) still poses a serious global health issue [2, 3]. Given its biological diversity, CRC has a wide range of clinical presentations and a variable prognosis. Identifying high-risk patients, who require more prompt and intensive interventions, is of utmost importance. Prognostic factors, including biomarkers and clinical features, are areas of active research [4–6].

Although technological advancements have enabled the discovery of numerous potential predictive and prognostic biomarkers in mCRC patients, there remains a knowledge gap regarding practical and easily accessible tools for identifying high-risk patients. At present, in addition to clinical features such as performance status, tumor burden, and tumor location, factors affecting the choice of treatment include biological markers, including KRAS/NRAS/BRAF mutation status and microsatellite instability [7]. Although there is currently no standardized risk scoring system that can alter treatment algorithms, evidence is accumulating regarding the prognostic significance of easily accessible biochemical tests, including inflammatory indices [8]. The inflammatory microenvironment plays an important role in the development of tumors [9]. Inflammation promotes tumor growth via many pathways, including facilitating angiogenesis, invasion and metastasis of tumor cells [10]. Following the discovery of inflammation’s role in tumor development, numerous articles have been published highlighting the prognostic significance of inflammatory indices such as the neutrophil-to-lymphocyte ratio (NLR) [11–14]. These tests have the advantage of being inexpensive and easily accessible, which is why further investigation into their clinical implications could facilitate disease management and improve patient outcomes.

Machine learning (ML) has emerged as a significant advancement in the medical field and is an active area of research. It involves the use of algorithms and statistical models to analyze complex medical data, potentially enabling more accurate predictions and improved decision-making in clinical practice [15]. Studies utilizing ML for CRC management are being rapidly developed. These include research on the development of predictive biomarkers, examination of the tumor microenvironment, prediction of postoperative outcomes, metastasis prediction in early-stage disease, and survival analyses [16–20]. Furthermore, AI-driven biomarker assessment has shown promise in personalizing treatment decisions, with artificial intelligence algorithms successfully identifying patients likely to benefit from targeted anti-EGFR therapies based on immunohistochemical analysis [21]. Additionally, AI has been employed in developing quantitative cancer-immunity cycle models to predict disease progression and survival outcomes in advanced metastatic colorectal cancer patients [22]. Machine learning frameworks have also been developed to evaluate the generalizability of clinical trial results to real-world colorectal cancer patients, addressing the gap between controlled trial settings and clinical practice [23].

Further research has great potential to facilitate disease management and improve patient outcomes. In this study, we aimed to develop a tool, enhanced with ML, to identify high-risk mCRC patients, using clinical features and readily available biochemical tests.

Patients and methods

This study was conducted in accordance with the “Declaration of Helsinki”. The institutional ethics committee (Ankara University Ethics Committee, No: 2025000089-1) approved the study protocol. All the patient data was recorded in an electronic database, and all the identities were blinded. Written informed consent was obtained from all patients for anonymized patient information to be published in this article.

Study design

This study was conducted to develop a prognostic deep learning model with artificial intelligence augmentation. Patients with mCRC who received first-line treatment at two major reference centers in Turkey between 2010 and 2024 were included.

Data collection and preprocessing

Adult patients (≥ 18 years) with histologically confirmed de novo metastatic colorectal adenocarcinoma, measurable disease according to RECIST 1.1 criteria, ECOG performance status 0–1, adequate organ function, and initiation of first-line systemic chemotherapy combined with bevacizumab or cetuximab between 2010 and 2024 were included. A total of 267 patients were screened, with 53 exclusions (18 BRAF mutations, 12 MSI-high tumors, 15 incomplete data, 5 consent refusals, 3 lost to follow-up), yielding 214 patients for analysis (127 training/internal validation, 87 external validation, Fig. 1). The data collected comprised patient demographics (gender, age at diagnosis, comorbid diseases), disease characteristics (primary tumor region, sites of metastasis, RAS mutational status), the biological agent used in first-line treatment, and progression-free survival (PFS). Biochemical parameters comprised liver function tests (alanine aminotransferase, aspartate aminotransferase, gamma-glutamyl transferase, alkaline phosphatase, total and indirect bilirubin), kidney function tests and electrolytes (creatinine, sodium, potassium, calcium, phosphorus), and inflammatory, hematological and tumor markers (albumin, CA-19-9, carcinoembryonic antigen, hemoglobin, white blood cell count and neutrophil/lymphocyte ratio). Exclusion criteria were BRAF-mutated tumors, MSI-high tumors, incomplete clinical or laboratory data, previous systemic treatment for metastatic disease, synchronous primary cancers, ECOG performance status ≥ 2, inadequate organ function precluding standard chemotherapy, pregnancy, refusal to provide informed consent, and loss to follow-up before the first response assessment.

Fig. 1. Patient selection flowchart

PFS was defined as the time from initiation of first-line treatment to documented disease progression according to Response Evaluation Criteria in Solid Tumors (RECIST) version 1.1 criteria or death from any cause, whichever occurred first. Disease progression was determined by radiological assessment using contrast-enhanced computed tomography (CT) scans of the chest, abdomen, and pelvis. Patients underwent routine clinical and radiological assessment every 8–12 weeks during active treatment, or more frequently if clinically indicated. Imaging evaluation included baseline CT scans within 4 weeks of treatment initiation, followed by response assessment scans every 2–3 treatment cycles.

Missing data were imputed with medians, and cut-off values and cross-interactions of parameters were applied before training. AI augmentation with automated feature engineering, including polynomial transformations and generation of interaction terms, was employed to capture complex relationships among variables. Hyperparameters were optimized using Bayesian algorithms, and an ensemble learning technique was used to aggregate predictions from multiple independently trained models.
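The preprocessing steps named above (median imputation, polynomial transformations, interaction terms) can be illustrated with a minimal standard-library sketch. The function names, the toy values, and the restriction to degree-2 terms are assumptions made for illustration; the study's actual feature-engineering code is not described in this detail.

```python
from itertools import combinations
from statistics import median


def impute_median(rows):
    """Replace None in each column with the median of that column's observed values."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed))
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def expand_features(row):
    """Append squared terms and pairwise products to the original features."""
    squares = [x * x for x in row]
    interactions = [a * b for a, b in combinations(row, 2)]
    return list(row) + squares + interactions


# Hypothetical toy data; columns are CEA (ng/mL), NLR, albumin (g/dL).
raw = [
    [28.5, 3.2, 3.8],
    [None, 4.8, 3.4],
    [89.3, None, 4.2],
    [8.2, 2.1, None],
]

filled = impute_median(raw)
engineered = [expand_features(r) for r in filled]
print(filled[1])           # the missing CEA value is replaced by the column median
print(len(engineered[0]))  # 3 original + 3 squared + 3 pairwise terms
```

In a train/validation setting the medians would normally be computed on the training data only and then reused for the validation cohorts, so that no information leaks from the held-out patients.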

Artificial intelligence augmented training process

A deep neural network (mCRC-RiskNet) was developed to predict patient outcomes. Artificial intelligence augmentation comprised the analysis of all possible combinations and risk scoring, with the final configuration selected for the best and most consistent performance. The model architecture consisted of an input normalization layer followed by three hidden layers with dimensions [256, 128, 64], employing residual connections for improved gradient flow. The main output of the model was a three-level PFS risk score ranging from low to high (Fig. 2).

Fig. 2. mCRC-RiskNet development flowchart

The model was trained using the AdamW optimizer with an initial learning rate of 0.001 and weight decay of 0.0001. Training proceeded for a maximum of 100 epochs with early stopping (patience = 15) based on validation loss improvement. A reduce-on-plateau learning rate scheduler was implemented with a factor of 0.5 and patience of 5 epochs. Gradient clipping (max norm = 1.0) was applied to prevent exploding gradients.
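The control logic in this training setup (reduce-on-plateau scheduling, early stopping, gradient clipping) can be replayed without a deep learning framework. The sketch below is a simplified stand-in, not the authors' code: in PyTorch the corresponding pieces would be `torch.optim.AdamW`, `torch.optim.lr_scheduler.ReduceLROnPlateau`, and `torch.nn.utils.clip_grad_norm_`. The exact counting conventions (when the patience counters reset, strict improvement without a tolerance threshold) and the validation-loss sequence are assumptions.

```python
import math


def clip_by_norm(grads, max_norm=1.0):
    """Scale a gradient vector so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def run_schedule(val_losses, lr=1e-3, factor=0.5, lr_patience=5, stop_patience=15):
    """Replay validation losses and return (stopping epoch, final learning rate)."""
    best = math.inf
    since_best = 0     # consecutive non-improving epochs (early stopping)
    since_lr_step = 0  # non-improving epochs since the last learning-rate change
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            since_best = 0
            since_lr_step = 0
        else:
            since_best += 1
            since_lr_step += 1
        if since_lr_step > lr_patience:
            lr *= factor  # halve the learning rate on a plateau
            since_lr_step = 0
        if since_best >= stop_patience:
            return epoch, lr
    return len(val_losses), lr


# Hypothetical run: 10 improving epochs, then a plateau up to the 100-epoch cap.
losses = [1.0 - 0.05 * i for i in range(10)] + [0.6] * 90
stop_epoch, final_lr = run_schedule(losses)
print(stop_epoch, final_lr)

clipped = clip_by_norm([3.0, 4.0])  # norm 5.0 is rescaled to norm 1.0
print(clipped)
```

With these settings the learning rate is halved twice during the plateau before early stopping ends training well short of the 100-epoch maximum, which is the intended interaction between a scheduler patience of 5 and a stopping patience of 15.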

All training was conducted using PyTorch 2.0 on CUDA-enabled hardware on the Google Colab platform.

Statistical analysis

The primary outcome was PFS, defined as the time from baseline to disease progression or death from any cause. Survival curves were estimated using the Kaplan-Meier method, with differences between risk groups assessed via log-rank test. The cutoff values for these groups were determined internally by the model to optimally separate patients according to their risk distribution and clinical outcomes.
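For readers less familiar with the Kaplan-Meier estimator, the following standard-library sketch shows how a survival curve and a median PFS are obtained from follow-up times and event indicators. The study itself used the lifelines package; the function names and the toy cohort here are hypothetical, and the median is taken as the first time at which the estimated survival falls to 0.5 or below.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up in months; events: 1 = progression or death, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_removed
    return curve


def median_survival(curve):
    """First time at which S(t) is 0.5 or lower; None if the curve never gets there."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical toy cohort (months, event indicator).
times = [3, 5, 5, 8, 10, 12, 15, 20]
events = [1, 1, 0, 1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
print(curve)
print(median_survival(curve))  # 10
```

Censored patients leave the risk set without lowering the curve, which is why the estimate only steps down at times where a progression or death is recorded.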

Feature importance was quantified using integrated gradients, which assigns attribution scores by approximating the integral of gradients along the input path. The method was chosen for its theoretical soundness (satisfying axioms of attribution) and ability to capture non-linear relationships. Individual feature contributions were normalized to percentage scale for interpretability.
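As an illustration of the attribution rule, integrated gradients for feature i can be approximated as (x_i − b_i) multiplied by the average gradient of the model output along the straight path from a baseline b to the input x. The sketch below uses a hypothetical two-feature logistic score in place of the trained network and finite differences in place of automatic differentiation; the zero baseline and the normalization by absolute values are assumptions.

```python
import math


def model(x):
    """Hypothetical smooth risk score standing in for the trained network."""
    z = 0.8 * x[0] + 0.3 * x[1] + 0.5 * x[0] * x[1]
    return 1.0 / (1.0 + math.exp(-z))


def gradient(f, x, eps=1e-6):
    """Central finite-difference gradient of f at x."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def integrated_gradients(f, x, baseline, steps=200):
    """Midpoint Riemann approximation of the path integral of gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(f, point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


x = [1.5, -0.5]        # hypothetical standardized feature values
baseline = [0.0, 0.0]  # assumed all-zero reference input

attr = integrated_gradients(model, x, baseline)
abs_total = sum(abs(a) for a in attr)
percent = [100 * abs(a) / abs_total for a in attr]

print(attr, percent)
# Completeness axiom: attributions sum to the change in model output.
print(sum(attr), model(x) - model(baseline))
```

The final line checks the completeness property that motivates the method: up to numerical error, the attributions add up to the difference between the prediction at the input and at the baseline.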

Statistical analyses were performed using Python 3.10 with lifelines 0.26.0 for survival analysis and scikit-learn 1.0 for evaluation metrics.

Results

Patient characteristics

A total of 214 patients with mCRC were included in the study, with 127 patients in the training and internal validation cohort and 87 patients in the external validation cohort. All the patients had metastatic disease at initial diagnosis. The median age was 59 years (IQR: 51.5–68) in the training/internal validation cohort and 63 years (IQR: 53–69) in the external validation cohort. Male patients constituted the majority in both cohorts (66.9% and 60.9%, respectively). All the patients in both cohorts had a performance status of 0–1. Common comorbidities among the patients included hypertension (46.5% in training/internal validation, 39.1% in external validation), diabetes mellitus (24.4% and 21.8%), and atherosclerotic cardiovascular disease (18.9% and 17.2%).

The most common primary tumor location was the left colon (39.2% in training/internal validation, 47.1% in external validation), followed by rectum (33.6% and 28.7%) and right colon (27.2% and 24.1%). Liver was the predominant site of metastasis in both cohorts (78.0% and 87.4%), followed by distant lymph nodes (37.3% and 23.0%), peritoneum (32.3% and 18.4%), and lung (23.6% and 11.5%). In terms of first-line biological agents used in combination with chemotherapy, bevacizumab was administered to 51.2% of patients in the training/internal validation cohort and 54% in the external validation cohort, while cetuximab was used in 48.8% and 46% of patients, respectively. RAS mutations were present in approximately half of the patients in both cohorts (48.0% and 50.6%). Patient demographics are summarized in Table 1.

Table 1. Baseline characteristics comparison between training and external validation cohorts

| Characteristic | Training/Internal Validation (n = 127) | External Validation (n = 87) | p-value |
|---|---|---|---|
| Age, median (IQR) | 59 (51.5–68) | 63 (53–69) | 0.142 |
| Gender, n (%) | | | 0.347 |
| Male | 85 (66.9%) | 53 (60.9%) | |
| Female | 42 (33.1%) | 34 (39.1%) | |
| ECOG Performance Status 0–1, n (%) | 127 (100%) | 87 (100%) | 1.000 |
| Comorbidities, n (%) | | | |
| Diabetes mellitus | 31 (24.4%) | 19 (21.8%) | 0.661 |
| Hypertension | 59 (46.5%) | 34 (39.1%) | 0.283 |
| ASCVD | 24 (18.9%) | 15 (17.2%) | 0.760 |
| Primary tumor region, n (%) | | | 0.433 |
| Left colon | 49 (39.2%) | 41 (47.1%) | |
| Rectum | 42 (33.6%) | 25 (28.7%) | |
| Right colon | 34 (27.2%) | 21 (24.1%) | |
| Metastatic sites, n (%) | | | |
| Liver | 99 (78.0%) | 76 (87.4%) | 0.080 |
| Distant lymph node | 47 (37.3%) | 20 (23.0%) | 0.027 |
| Peritoneum | 41 (32.3%) | 16 (18.4%) | 0.024 |
| Lung | 30 (23.6%) | 10 (11.5%) | 0.025 |
| Biological agent, n (%) | | | 0.705 |
| Bevacizumab | 65 (51.2%) | 47 (54.0%) | |
| Cetuximab | 62 (48.8%) | 40 (46.0%) | |
| RAS mutation, n (%) | 61 (48.0%) | 44 (50.6%) | 0.714 |
| Laboratory values, median (IQR) | | | |
| CEA (ng/mL) | 28.5 (8.2–89.3) | 31.2 (9.1–94.7) | 0.523 |
| CA 19-9 (U/mL) | 45.3 (12.1–156.8) | 48.7 (14.3–162.4) | 0.612 |
| NLR | 3.2 (2.1–4.8) | 3.4 (2.3–5.1) | 0.389 |
| Albumin (g/dL) | 3.8 (3.4–4.2) | 3.7 (3.3–4.1) | 0.456 |
| ALT (U/L) | 24 (16–38) | 26 (17–40) | 0.512 |
| AST (U/L) | 28 (20–42) | 30 (21–44) | 0.483 |
| ALP (U/L) | 98 (72–145) | 102 (75–151) | 0.567 |
| Total bilirubin (mg/dL) | 0.8 (0.5–1.2) | 0.9 (0.6–1.3) | 0.423 |
| Hemoglobin (g/dL) | 11.8 (10.2–13.1) | 11.6 (10.0–12.9) | 0.612 |

ASCVD: Atherosclerotic cardiovascular disease; CEA: Carcinoembryonic antigen; NLR: Neutrophil-to-lymphocyte ratio; ALT: Alanine aminotransferase; AST: Aspartate aminotransferase; ALP: Alkaline phosphatase

P-values: Mann-Whitney U test for continuous variables; Chi-square test or Fisher’s exact test for categorical variables, as appropriate

Statistically significant differences (p < 0.05) were observed only in metastatic site distribution (distant lymph node, peritoneum, and lung), with no significant differences in other baseline characteristics, supporting overall cohort comparability.
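As a worked check of how the categorical p-values in Table 1 can be obtained, the sketch below applies a Pearson chi-square test to the liver metastasis row (99 of 127 versus 76 of 87). It assumes no continuity correction was used, which is not stated in the text; under that assumption the result should land near the tabulated p = 0.080. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Rows are cohorts, columns are present/absent. Returns (statistic, p-value).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Liver metastasis: 99 of 127 (training/internal) vs 76 of 87 (external).
stat, p = chi_square_2x2(99, 127 - 99, 76, 87 - 76)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

Fisher's exact test, which the footnote also mentions, would be the usual alternative when expected cell counts are small; it is not shown here.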

Feature importance analysis

The feature importance analysis revealed that the most influential individual predictor was CA-19-9, with an importance score of 54.24, followed by CEA at 17.44. Other features such as Age, GGT, LDH, ALT, TM-region, Alb, NLR, and Hemoglobin made relatively minimal contributions, all below 2 in importance score. When aggregated by category, Tumor Markers dominated the overall importance with a combined score of 71.68, indicating their critical role in the model. Liver Function parameters contributed modestly with a score of 1.91, followed by Demographics at 1.62. Other categories (Metabolic, Clinical, Inflammatory, and Hematologic) had very low importance scores, highlighting that tumor markers were the primary drivers of predictive performance in this context (Fig. 3). Model interpretability was addressed using SHapley Additive exPlanations (SHAP), and detailed feature attribution results are provided in the appendix.

Fig. 3. Feature importance analysis

Model performance and risk stratification

The AI-augmented deep learning model successfully stratified patients into three prognostic risk groups with distinct progression-free survival (PFS) outcomes. The low-risk group (n = 34) achieved a median PFS of 16.8 months with a 29% event rate. The medium-risk group (n = 33) had a median PFS of 9.3 months with a 58% event rate, while the high-risk group (n = 34) demonstrated the poorest outcomes with a median PFS of 7.5 months and a 76% event rate. Event rates by risk groups are shown in Fig. 4.

Fig. 4. Event rate by risk group

In the internal cohort, median PFS was 12.4 months (95% CI: 9.0–14.0) for the low-risk group, 10.1 months (95% CI: 7.9–13.2) for the medium-risk group, and 9.6 months (95% CI: 8.2–15.9) for the high-risk group. External validation demonstrated comparable median PFS values of 14.0 months (95% CI: 12.0–20.0), 12.0 months (95% CI: 8.0–30.0), and 9.0 months (95% CI: 8.0–15.0) for the respective risk groups (Fig. 5).

Fig. 5. Progression-free survival according to risk group

Time-dependent receiver operating characteristic (ROC) curve analysis revealed moderate discriminative performance, with area under the curve (AUC) values of 0.65 (95% CI: 0.51–0.79) at 6 months (N = 65), 0.63 (95% CI: 0.42–0.84) at 12 months (N = 32), and 0.50 (95% CI: 0.07–0.92) at 18 months (N = 10). The decreasing sample size at later time points likely contributed to wider confidence intervals and reduced stability. Corresponding ROC curves are presented in the supplementary appendix.
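To make the time-dependent AUC concrete, the sketch below computes a simple version at a fixed horizon: patients with an event by the horizon are cases, patients still event-free beyond it are controls, and the AUC is the fraction of case-control pairs in which the case has the higher risk score. This unweighted form, which simply drops patients censored before the horizon, is an assumption; the text does not state which estimator was used. All data are hypothetical.

```python
def auc_at_horizon(scores, times, events, horizon):
    """AUC for 'event by horizon' versus 'event-free past horizon'.

    Patients censored before the horizon are excluded (no censoring weights).
    """
    rows = list(zip(scores, times, events))
    cases = [s for s, t, e in rows if e == 1 and t <= horizon]
    controls = [s for s, t, e in rows if t > horizon]
    if not cases or not controls:
        return None
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties count as half
    return wins / (len(cases) * len(controls))


# Hypothetical risk scores, follow-up times (months), and event indicators.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6]
times = [4, 5, 6, 9, 14, 20, 3]
events = [1, 1, 1, 1, 1, 0, 0]

auc6 = auc_at_horizon(scores, times, events, horizon=6)
auc12 = auc_at_horizon(scores, times, events, horizon=12)
print(auc6, auc12)
```

Because only patients whose status at the horizon is known contribute, the evaluable sample shrinks as the horizon moves later, consistent with the falling N and widening confidence intervals reported above.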

Survival times by risk groups are shown in Fig. 6.

Fig. 6. Survival times by risk group

Discussion

In this study, we developed and validated an artificial intelligence-augmented deep learning model that effectively stratifies patients with de novo mCRC into three distinct prognostic groups with significantly different PFS outcomes. Our findings demonstrate that a combination of clinical and laboratory parameters, particularly carcinoembryonic antigen levels, neutrophil/lymphocyte ratio, and liver function tests, provided robust prognostic information for patients receiving first-line treatment.

Despite recent significant advances regarding novel screening methods and treatment options, mCRC, being a heterogeneous disease, still poses a significant challenge. Therefore, accurate risk stratification of the patients is of utmost importance for improved patient outcomes. There are many risk factors identified for guiding treatment, including but not limited to, performance status, tumor location and burden, as well as KRAS/NRAS/BRAF mutation status and microsatellite instability [7]. As our knowledge of tumor biology improves, many other mechanisms that may have prognostic significance are being unveiled [10, 12, 13]. Despite advances in mCRC management, a significant knowledge gap persists in readily accessible prognostic tools that can guide treatment decisions, a gap that artificial intelligence approaches have the potential to address.

The literature contains numerous AI-enhanced studies not only for prognostication but also for diagnosis, molecular characterization, and risk assessment in mCRC management, representing an area with significant potential for development. In a study of 37 cases, researchers identified key proteins (PF4 and AACT) in serum extracellular vesicles using proteomics and ML, developing a random forest model with excellent diagnostic accuracy for CRC, even at early stages [24]. Another study utilized an ML algorithm to create piRNA sequence descriptors for diagnostic purposes [25]. Apart from diagnostics, there are ML studies on drug efficacy. In a recent study that included 1065 patients, an ML model successfully predicted which early-stage colon cancer patients would benefit from adding oxaliplatin to standard adjuvant therapy [26]. A recent meta-analysis suggested that ML and deep learning models demonstrated high sensitivity and specificity for predicting lymph node metastasis in T1 CRC patients [27]. Moreover, ML models have also demonstrated high accuracy in predicting metastatic sites, disease staging, and treatment response in CRC, including liver/lung metastasis and lymph node involvement [28, 29]. Other studies have focused on using ML to enhance radiological assessment [30–32]. Although most of these models need to be prospectively validated for routine use in clinical practice, it is clear that ML-enhanced methods have significant potential to improve clinical outcomes.

While ML approaches have been applied to predict outcomes in CRC, studies specifically using AI-augmented deep learning for comprehensive prognostic stratification in de novo mCRC patients with readily available clinical and laboratory parameters remain limited in the literature. There are several large-scale studies addressing prognosis, treatment response, and survival prediction in CRC patients. A recent study identified and validated a 67-gene signature that can identify mCRC patients who would benefit from FOLFOX treatment [33]. In another study, researchers developed ML models incorporating clinical, laboratory (including NLR and tumor markers), and genetic features to predict postoperative complications and survival outcomes for colorectal liver metastases patients undergoing resection [34]. In a study that evaluated the data of 35,639 mCRC patients obtained from SEER database, ML algorithms were combined with nomograms to predict early deaths. However, this model did not include laboratory parameters [35]. In another study, using ML to analyze data from the National Cancer Database (2010–2014) including 19,364 patients with synchronous colon cancer metastases, researchers created nomograms incorporating age, metastatic sites, and CEA levels to predict 3-year overall survival [36]. Our findings are consistent with the recent analysis by Bachet et al. of 37,560 mCRC patients across multiple treatment lines, which similarly identified inflammatory indices, liver function parameters and hematological markers as prognostic factors, reinforcing the validity of these readily available laboratory parameters for risk stratification in mCRC [37]. 
Compared to these recent studies, our research shares similar prognostic indicators: laboratory parameters (NLR, tumor markers), clinical features, sites of metastasis, and tumor location. Our model differs in focusing specifically on de novo metastatic disease with PFS as the primary outcome. Following prospective validation, it has the potential to identify patients who would benefit from more intensive first-line treatments, which could improve outcomes for high-risk patients while preventing overtreatment in low-risk groups. Moreover, our model offers a practical alternative to models requiring genetic profiling, and could potentially guide treatment decision-making in clinical practice. For example, high-risk patients (median PFS 7.5 months) might benefit from consideration of intensive triplet chemotherapy or early clinical trial enrollment, while low-risk patients (median PFS 16.8 months) could be managed with standard doublet therapy and routine monitoring. Medium-risk patients may require individualized approaches based on performance status and patient preferences. The model could also inform prognostic discussions and optimize surveillance strategies. However, prospective validation studies are essential before implementing any risk-adapted treatment strategies to ensure safety and efficacy in routine clinical practice. Additionally, while TNM staging is fundamental for cancer classification, all patients in our cohort were stage IV (metastatic disease), limiting its discriminative ability for prognostic stratification. Our model addresses this limitation by providing granular risk stratification within stage IV disease, identifying clinically meaningful differences in PFS ranging from 7.5 to 16.8 months, a greater than two-fold difference that TNM staging cannot capture. By integrating readily available laboratory parameters that reflect disease biology beyond anatomical extent, our model has the potential to complement TNM staging with enhanced prognostic discrimination specifically within the metastatic population.

The strong predictive performance of tumor markers, NLR, and liver function tests in our model reflects key biological mechanisms driving mCRC progression. Elevated CEA levels indicate not only tumor burden but also enhanced metastatic potential through increased cell adhesion and angiogenesis promotion [38]. The prognostic significance of NLR reflects the well-documented role of systemic inflammation in cancer progression, where elevated neutrophil counts and reduced lymphocyte populations indicate immune dysregulation that favors tumor growth [39]. Abnormal liver function parameters reflect both hepatic metastatic burden and metabolic dysregulation that supports tumor growth while impairing drug metabolism and systemic immune function. These biomarkers represent interconnected biological networks where their combination creates a signature of aggressive disease biology, which our AI model captures through complex relationships, explaining its superior prognostic performance.

In our study, we excluded patients whose tumors had a BRAF mutation or MSI-high status to ensure a molecularly homogeneous patient population suitable for our prognostic model. BRAF-mutated tumors, in particular, present a challenge due to their aggressive nature, their poor response to standard anti-EGFR therapeutic approaches, and their requirement for specific targeted therapies, including BRAF inhibitors combined with MEK inhibitors. These tumors demonstrate significantly different survival patterns and treatment responses compared to BRAF wild-type tumors. Similarly, MSI-high tumors, representing 3–5% of metastatic cases, have emerged as a distinct therapeutic entity with preferential response to immune checkpoint inhibitors rather than conventional chemotherapy regimens [39, 40]. By excluding these molecularly distinct subgroups, our model focuses on the majority of mCRC patients (approximately 85–90%) who receive standard first-line chemotherapy combined with anti-VEGF or anti-EGFR therapy. This approach prevents the introduction of heterogeneity from fundamentally different molecular pathways and treatment paradigms that would likely confound the prognostic performance of our model. The resulting homogeneous population allows for more accurate risk stratification using conventional clinical and laboratory parameters, enhancing the model’s clinical utility for routine practice, where precise prognostic assessment is most needed for patients receiving standard systemic therapy.

Our study has several limitations. First, despite the inclusion of both internal and external validation cohorts, the total sample size (n = 214) is relatively modest compared to some registry-based studies. Second, the retrospective nature of the data collection may have introduced biases such as incomplete or missing data. Third, while our model performs well in predicting PFS, the generalizability to different patient populations requires further validation. Additionally, our model does not incorporate genomic data beyond RAS mutation status, which may limit its applicability as more targeted therapies become available. Despite these limitations, our study has notable strengths. The AI-augmented deep learning approach offers advantages over traditional statistical methods by capturing complex non-linear relationships between variables. Our model utilizes readily available clinical and laboratory parameters, making it practical for implementation in real-world settings without requiring advanced molecular testing. The clear stratification into three distinct risk groups with significantly different survival outcomes demonstrates the model’s clinical utility. Furthermore, the consistent performance across both training and external validation cohorts suggests generalizability within similar patient populations. Finally, the inclusion of patients from multiple centers enhances the diversity of the dataset and strengthens the validity of our findings.

In this study, we developed and validated an artificial intelligence-augmented deep learning model, based on readily available clinical and laboratory parameters, that stratifies patients with de novo mCRC into distinct prognostic groups with significantly different PFS outcomes. While further prospective validation is needed before widespread clinical implementation, our findings suggest that AI-augmented prognostic models represent a promising approach to enhance personalized medicine in the management of mCRC.


Author contributions

MY: writing the manuscript, conceptualization; ECE: data curation, statistical analysis; EEK: data curation, statistical analysis; GU: conceptualization, review of the final results, writing the manuscript.

Funding

None.

Data availability

The data that support the findings of this study are available on request from the corresponding author.

Declarations

Ethics approval and consent to participate

This study was conducted in accordance with the “Declaration of Helsinki”. The institutional ethics committee (Ankara University Ethics Committee, No: 2025000089-1) approved the study protocol. All the patient data was recorded in an electronic database, and all the identities were blinded. Written informed consent was obtained from all patients to participate in this study.

Consent for publication

Written informed consent was obtained from all patients for anonymized patient information to be published in this article.

Competing interests

The authors declare no competing interests.


Supplementary Materials

Additional file 1 (413.3KB, docx)

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author.

