ABSTRACT
Patient outcomes in advanced renal cell carcinoma (RCC) remain poor, with five‐year survival rates ranging from ~10% to 30%. Early projections of therapeutic outcomes could optimize precision medicine and accelerate drug development. While machine learning (ML) models integrating tumor growth inhibition (TGI) metrics have improved survival predictions over traditional models, their application in RCC remains unexplored. Herein, we used TGI metrics and baseline data to evaluate parametric (PM) and semi‐parametric (SPM) survival models alongside ML approaches for predicting progression‐free (PFS) and overall survival (OS) in 1839 RCC patients from four trials (evaluating sunitinib, axitinib, sorafenib, interferon‐alpha, and avelumab + axitinib). Data were split into training (70%) and testing (30%), and feature selection was used to determine parsimonious and robust models. Bootstrap resampling (n = 100) was employed for models' validation, and performance was assessed using C‐index and Integrated Brier Score. In brief, training data results demonstrated that tree‐based ML models (random survival forest (RSF) and XGBoost) outperformed PM and SPM models in predicting PFS (C‐index: 0.783–0.785 vs. 0.725–0.738 for PM and SPM; p < 0.05) and OS (C‐index: 0.77–0.867 vs. 0.750–0.758 for PM and SPM; p < 0.05), with RSF achieving better prediction of PFS and OS using only 3–5 covariates, compared to 9–35 with other tested methods. Tree‐based methods were also superior in the testing data. SHapley Additive exPlanations revealed nonlinear relationships among top predictors, including TGI metrics, underscoring the ability of tree‐based methods to capture complex prognostic interactions. Further validation is required to confirm models' generalizability to additional therapies and patients with differing tumor severity.
Keywords: machine learning, overall survival, prediction, progression‐free survival, random forest, random survival forest, RCC, renal cell carcinoma, tumor growth inhibition, XGBoost
Study Highlights
- What is the current knowledge on the topic?
  - Tumor growth inhibition (TGI) metrics capture changes in tumor dynamics during treatment and are promising prognostic markers of clinical outcomes in various cancers. While ML models that incorporate TGI metrics have improved survival predictions in various solid tumors, their application in RCC remains unexplored.
- What question did this study address?
  - This study aimed to use TGI metrics and baseline clinical and demographic data for predicting progression‐free survival (PFS) and overall survival (OS) in patients with RCC from four trials of anti‐cancer agents using ML and traditional parametric and semi‐parametric methodologies. Additionally, this work sought to understand the top predictive variables in ML models using SHapley Additive exPlanations (SHAP) analysis.
- What does this study add to our knowledge?
  - Tree‐based ML models, including random survival forest (RSF) and XGBoost, outperformed traditional parametric (Accelerated Failure Time) and semi‐parametric (CoxBoost, Lasso Cox, Cox Proportional‐Hazards) models in projecting both PFS and OS. TGI metrics, particularly the tumor growth rate constant (KL), were top indicators of both outcomes across modeling approaches. For PFS, both tree‐based approaches outperformed traditional approaches and did so using fewer (3–5 vs. 9–35) variables. RSF also outperformed traditional methods for OS using 5 compared to 14–34 features. SHAP analysis revealed nonlinear interactions between top TGI metrics, potentially explaining the superiority of tree‐based approaches. Survival predictions by top tree‐based methods tracked well over time with observed survival on Kaplan–Meier curves across drug treatments.
- How might this change clinical pharmacology or translational science?
  - To our knowledge, this is the first study that integrated TGI metrics using tree‐based models to improve ML‐based survival predictions in RCC across multiple drug classes. With additional validation, such models could facilitate clinical trial enrichment, risk stratification, and personalized therapeutic selection in RCC. Additionally, this workflow can potentially be applied to optimize ML‐based survival predictions and simplify the identification of important covariates for additional tumors and clinical endpoints.
1. Introduction
Early projection of therapeutic outcomes in cancer patients is a fundamental goal of precision medicine and can help guide drug development decision‐making in oncology [1]. Currently, projection of progression‐free survival (PFS) and overall survival (OS) relies heavily on traditional statistical methods, including parametric approaches, such as Accelerated Failure Time (AFT), or semi‐parametric approaches, such as the Cox proportional hazards (Cox PH) model [2]. While these methods are relatively straightforward to implement and interpret, they are limited by restrictive assumptions about the relationships between markers and outcomes and by the limited number of markers they can accommodate [3, 4]. In an era where rapid technological advancements have enabled the creation of large and complex datasets, machine learning (ML) approaches offer a promising alternative. Perhaps due to their ability to model complex non‐linear interactions and handle high‐dimensional data, ML approaches have yielded improved predictability of outcomes across multiple tumor types [3, 4, 5]. However, their adoption in clinical settings is often hindered by interpretability challenges [6]. To address this limitation, interpretable ML approaches, such as SHapley Additive exPlanations (SHAP), can be implemented to unmask the “black box” nature of some ML methods and elucidate how predictions are derived.
Tumor growth inhibition (TGI) metrics measure drug‐specific (cell‐kill rate constant [KD] and drug‐resistance rate constant [LAM (λ)]) and disease‐specific (baseline tumor size and tumor growth rate constant [KL]) tumor size parameters from longitudinal imaging data, and they are valuable prognostic markers across multiple solid tumor types [7, 8, 9]. This includes renal cell carcinoma (RCC), the most common type of kidney cancer and the ninth most common malignancy in the United States [10]. Among patients with RCC, approximately 16% present with metastatic disease for whom the 5‐year survival remains a dismal 10%–30% [11, 12]. Therefore, improving projections for treatment response in RCC for optimizing therapy selection and accelerating drug development is critical to address the unmet clinical needs of RCC patients [13].
Despite the prognostic potential of TGI metrics, their integration into ML models for survival prediction in RCC remains largely unexplored. Therefore, this study aimed to evaluate the performance of classical parametric and semi‐parametric statistical methods and ML‐based methods in predicting PFS and OS using TGI metrics and baseline clinical and demographic variables from RCC patients who participated in clinical trials of various mono‐ and combination therapies, including kinase inhibitors, immunomodulators, and monoclonal antibodies. Following model development, post hoc exploration of the observed and predicted survival by treatment subgroups was conducted to determine whether the performance of the general models differed across diverse drug classes. Finally, this work sought to identify TGI metrics and clinical and demographic variables that contribute to improved survival prediction by ML in RCC.
2. Methods
2.1. Data Sources
Patients (N = 1839) with advanced renal cell carcinoma (RCC) who participated in any of four clinical trials (one Phase I trial [NCT02493751] and three Phase III trials [NCT00083889, NCT00920816, and NCT02684006]) were included in this analysis. Patients in these studies were administered kinase inhibitors (sunitinib, sorafenib, or axitinib), an immunomodulator (interferon‐alpha), or a combination of a monoclonal antibody and a kinase inhibitor (avelumab and axitinib). Detailed descriptions of these trials have been previously published [14, 15, 16, 17] and are briefly presented in Table S1. After data preprocessing (described below), study subjects in the following cohorts were included in the analysis: interferon‐alpha (n = 328), sunitinib (n = 776), sorafenib (n = 83), axitinib (n = 175), or the avelumab and axitinib combination (n = 477). An institutional review board or independent ethics committee reviewed and approved the relevant study documents, including the protocols for those studies. Study conduct adhered to the Declaration of Helsinki principles, the Council for International Organizations of Medical Sciences international ethical guidelines, and all applicable laws and regulations. Written informed consent was obtained from all participants.
2.2. Generation of TGI Metrics
As previously described [18], the primary tumor dynamic model presented in this report took the general form described by Claret et al. [19]. The model structure utilized longitudinal tumor size data to estimate drug‐specific cell‐kill rate constant (KD), drug‐resistance rate constant (LAM), and disease‐specific parameters such as baseline tumor size and tumor growth rate constant (KL) (details and equations in Supporting Information methods). Input data comprised 1839 patients and 12,356 imaging observations over time (Table S1 for imaging timepoints). Additional TGI metrics, including time to tumor growth (TTG) and tumor size ratios at week 6 (TR6) and week 8 (TR8) after the patient's first dose of treatment, were derived using post hoc estimates from the final model.
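The derived TGI metrics have closed forms under the widely used simplified Claret‐type model, dy/dt = KL·y − KD·e^(−LAM·t)·y. The sketch below illustrates this form only; the model equations and parameter estimates for this study are in the Supporting Information, and the parameter values used here are purely illustrative, not estimates from the study data.

```python
import math

def tumor_size(t, y0, kl, kd, lam):
    """Simplified Claret-type TGI model: exponential growth at rate KL and a
    drug-kill effect KD that decays with resistance rate LAM. Returns tumor
    size at time t (weeks) from the closed-form solution of the ODE."""
    return y0 * math.exp(kl * t - (kd / lam) * (1.0 - math.exp(-lam * t)))

def time_to_growth(kl, kd, lam):
    """TTG: time at which net growth resumes (dy/dt = 0), i.e., where
    KL = KD * exp(-LAM * t). Positive only when KD > KL at baseline."""
    return math.log(kd / kl) / lam if kd > kl else 0.0

def tumor_ratio(week, y0, kl, kd, lam):
    """Tumor size ratio relative to baseline at a given week (e.g., TR6, TR8)."""
    return tumor_size(week, y0, kl, kd, lam) / y0

# Illustrative parameters (NOT study estimates): baseline size 80 mm,
# slow growth, moderate kill rate, slow emergence of resistance.
y0, kl, kd, lam = 80.0, 0.005, 0.03, 0.02
ttg = time_to_growth(kl, kd, lam)           # roughly 90 weeks until regrowth
tr6 = tumor_ratio(6.0, y0, kl, kd, lam)     # < 1.0 indicates early shrinkage
```

Under this form, TTG falls directly out of setting the net growth rate to zero, which is why it can be computed post hoc from the fitted parameters rather than observed directly.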
2.3. Variables Included and Data Preparation
In addition to the TGI variables estimated using the TGI modeling, a total of 42 baseline clinical and demographic variables were included after dataset preprocessing (Table S2). Data were split into training (70%) and testing (30%) subsets, and data preparation was carried out independently on each subset. Details regarding preprocessing analyses, including handling of missing data, imputation, and feature engineering, are summarized in the Supporting Information text.
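The split‐then‐preprocess ordering matters because fitting preprocessing on the full dataset would leak testing information into training. A minimal sketch of this pattern, assuming median imputation on a hypothetical `albumin` field (the actual preprocessing steps are described in the Supporting Information):

```python
import random
from statistics import median

def split_70_30(patients, seed=42):
    """Randomly split patient records into 70% training / 30% testing."""
    shuffled = patients[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def fit_imputer(train_rows, key):
    """Learn the imputation value (median) from the training subset only,
    so no information leaks from the testing subset."""
    observed = [p[key] for p in train_rows if p[key] is not None]
    return median(observed)

def apply_imputer(rows, key, fill):
    """Replace missing values with the training-derived fill value."""
    return [{**p, key: p[key] if p[key] is not None else fill} for p in rows]

# Toy records: baseline albumin (g/dL) with some missing values
cohort = [{"albumin": v} for v in
          [3.8, 4.1, None, 3.5, 4.0, None, 3.9, 4.2, 3.6, 4.4]]
train, test = split_70_30(cohort)
fill = fit_imputer(train, "albumin")           # estimated on training data only
train = apply_imputer(train, "albumin", fill)
test = apply_imputer(test, "albumin", fill)    # same training-derived value reused
```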
2.4. Prediction of Survival Outcomes
As highlighted in Figure S1, multiple traditional (parametric, semi‐parametric) and ML algorithms were compared to assess the ability to project (i) PFS, the time from therapeutic onset to disease worsening, and (ii) OS, the time from therapeutic onset to death. The outcome projected was a composite of survival time (in weeks) and event status (binary).
The traditional parametric survival methods, such as the Accelerated Failure Time (AFT) [20] model assessed in this work, assume that survival times follow a known distribution. Semi‐parametric survival approaches, including CoxBoost [21], Lasso Cox [22], and Cox Proportional‐Hazards [23], in contrast, do not make assumptions about the shape of the baseline hazard function. For ML methods, tree‐based methods including random survival forest (RSF) [24] and extreme gradient boosting (XGBoost) [25] were utilized. A brief description of each model's methodology is provided in the Supporting Information text.
An overview of the analysis workflow performed for all models is presented in Figure S2. In brief, after splitting the data (70/30) and performing data preprocessing, as described earlier, all models were trained on the training dataset. For ML, CoxBoost, and the Lasso Cox models, hyperparameters were tuned using 10‐fold cross‐validation. Table S3 in the Supporting Information provides more information about the tuning parameters and objective optimization metrics used. After training each model, we conducted bootstrap resampling (n = 100 iterations) of the training data to estimate the average performance and variability of each model in predicting PFS and OS, using two key metrics: the concordance index (C‐index) and the integrated Brier score (IBS). The C‐index evaluates a model's ability to correctly rank individuals by predicted risk, with values closer to 1.0 indicating better discrimination [26]. The IBS quantifies projection accuracy over time, with lower scores reflecting better calibration [27]. Additional information on these metrics is provided in the Supporting Information text.
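The bootstrap evaluation loop can be sketched with a pure‐Python Harrell's C‐index. This is a generic illustration of the metric and resampling scheme, not the implementation used in the study, and the toy cohort below is hypothetical:

```python
import random

def c_index(times, events, risks):
    """Harrell's concordance index for right-censored data. A pair (i, j) is
    comparable when i's time is shorter and i experienced the event; it is
    concordant when i also has the higher predicted risk (ties count 0.5)."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else 0.5

def bootstrap_metric(times, events, risks, n_boot=100, seed=0):
    """Resample patients with replacement and return the mean and SD of the
    C-index over bootstrap iterations (n = 100, as in the analysis)."""
    rng, n, scores = random.Random(seed), len(times), []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(c_index([times[k] for k in idx],
                              [events[k] for k in idx],
                              [risks[k] for k in idx]))
    mean = sum(scores) / n_boot
    sd = (sum((s - mean) ** 2 for s in scores) / (n_boot - 1)) ** 0.5
    return mean, sd

# Toy cohort: weeks to event, event flag (1 = progressed, 0 = censored),
# and a predicted risk score that perfectly anti-orders with time.
weeks = [10, 14, 22, 30, 38, 46, 55, 70]
event = [1, 1, 0, 1, 1, 0, 1, 1]
risk = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
mean_c, sd_c = bootstrap_metric(weeks, event, risk)
```

Because the toy risks rank patients perfectly, the C‐index sits at the 1.0 ceiling; real models land between 0.5 (random) and 1.0, as in Table 1.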
The best performing parametric and semi‐parametric models—based on training performance (C‐index and IBS)—as well as both ML models (XGBoost and RSF) were then used to project PFS and OS on the testing dataset, where bootstrap resampling (n = 100 bootstraps) was again used to assess each model's performance. Top performance was defined using the Multiple Comparisons with the Best (MCB) test [28] (described in detail below). For each model, we applied feature selection strategies on the training set to identify more parsimonious models with relevant markers that contribute most meaningfully to survival outcomes. For ML models, we assessed model performance with various subsets of markers by ranking variable importance and constructing models with the top 10, 7, 5, and 3 markers on the training set. Further methodological details and optimization metrics are described in the Supporting Information text.
2.5. Statistical Analysis and Model Visualization
To evaluate the statistical significance of performance differences among models, we applied the MCB test, a robust non‐parametric method that compares all models against the best‐performing one while controlling the family‐wise error rate (additional details in Supporting Information text). The best‐performing model's critical region (shaded region) serves as the reference; models with confidence intervals overlapping this region are statistically indistinguishable from the best, while those with intervals above it underperform significantly.
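The screening logic of the comparison can be illustrated with a simplified normal‐approximation sketch: build an interval for each model's bootstrap scores and flag models whose interval overlaps that of the best model. This is an illustrative stand‐in, not the formal rank‐based MCB procedure, and the bootstrap draws below are hypothetical:

```python
from statistics import mean, stdev

def mcb_style_screen(scores_by_model, z=1.96):
    """Simplified MCB-style screen (illustrative only): compute a normal-
    approximation CI for each model's bootstrap C-index and flag models whose
    upper bound reaches the best model's lower bound (its 'critical region')."""
    summaries = {}
    for name, scores in scores_by_model.items():
        m = mean(scores)
        se = stdev(scores) / len(scores) ** 0.5
        summaries[name] = (m - z * se, m, m + z * se)
    best = max(summaries, key=lambda k: summaries[k][1])
    best_lo = summaries[best][0]
    # Flag = True -> statistically indistinguishable from the best model
    return {name: (lo, m, hi, hi >= best_lo)
            for name, (lo, m, hi) in summaries.items()}

# Hypothetical bootstrap C-index draws per model (values are illustrative)
draws = {
    "RSF":      [0.785, 0.783, 0.787, 0.786, 0.784],
    "XGBoost":  [0.783, 0.781, 0.784, 0.785, 0.782],
    "CoxBoost": [0.738, 0.736, 0.740, 0.737, 0.739],
}
result = mcb_style_screen(draws)
```

With these illustrative draws, XGBoost's interval reaches the best model's region (indistinguishable from RSF), while CoxBoost's does not, mirroring how the MCB plots separate models in Figures 1 and 2.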
Additionally, predictions of top models on the testing dataset and 95% prediction intervals were compared to the observed survival data represented by Kaplan–Meier estimates, allowing for a visual assessment of model calibration and projection performance over time. Furthermore, to determine whether the survival projection by the top‐performing model differed across treatment subgroups, we repeated this analysis for each treatment subgroup on the testing dataset.
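The observed‐survival reference for these comparisons is the Kaplan–Meier estimator, which can be sketched compactly. The step‐curve logic below is standard; the data are toy values, not study data:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimator: at each distinct event time t, multiply the
    running survival by (1 - d_t / n_t), where d_t is the number of events at
    t and n_t is the number still at risk just before t. Censored times drop
    subjects from the risk set without a step."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    surv, curve, at_risk = 1.0, [], len(times)
    i = 0
    while i < len(order):
        t = times[order[i]]
        d = n_t = 0
        while i < len(order) and times[order[i]] == t:  # group tied times
            d += events[order[i]]
            n_t += 1
            i += 1
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= n_t
    return curve

# Toy data: weeks to progression, 1 = progressed, 0 = censored
weeks = [6, 12, 12, 20, 28, 28, 40, 52]
status = [1, 1, 0, 1, 1, 0, 0, 1]
km = kaplan_meier(weeks, status)   # list of (time, survival) steps
```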
Additional plots, including boxplots to visualize models' performance and plots of performance over time by IBS, were generated. All modeling was performed using packages and libraries in R and Python, as noted in Table S3.
2.6. Model Interpretability
To understand how the best models arrived at their projections, SHAP analysis was performed on the training samples. This method quantifies the contribution of each input variable to a model's projections, providing a useful way to verify that the model is behaving reasonably and in a manner consistent with domain expertise. Additionally, SHAP analysis can also reveal novel important features and illuminate relationships among markers. In our study, positive SHAP values indicated an increased risk of progression or non‐survival, and global SHAP variable importance was calculated by summing the absolute SHAP values of the feature across all samples [29]. The SHAPforxgboost [30] R package was used for XGBoost calculations, and the shapviz [31] and kernelshap [32] R packages were employed for RSF.
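The global‐importance aggregation described above reduces to summing absolute per‐patient SHAP values by feature. A minimal sketch with a hypothetical SHAP matrix (feature names and values are illustrative, not study outputs):

```python
def global_shap_importance(shap_values, feature_names):
    """Global importance: sum of absolute per-patient SHAP values for each
    feature, returned as (feature, total) pairs ranked descending."""
    totals = {name: 0.0 for name in feature_names}
    for row in shap_values:                      # one row per patient
        for name, value in zip(feature_names, row):
            totals[name] += abs(value)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical SHAP matrix: 4 patients x 3 features
# (positive values -> higher risk of progression or death)
features = ["KL", "LAM", "NLR"]
shap = [[0.30, -0.10, 0.05],
        [0.25,  0.12, -0.02],
        [-0.20, 0.08, 0.04],
        [0.35, -0.05, 0.03]]
ranking = global_shap_importance(shap, features)   # KL ranks first here
```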
3. Results
3.1. Prediction of Survival Outcomes
3.1.1. Progression Free Survival
By the MCB test, as shown in Figure 1A, both RSF and XGBoost achieved the best performance for predicting PFS on the training set by C‐index (0.785 [sd 0.012] and 0.783 [sd 0.009], respectively) compared to the other tested methods (p < 0.05) (Table 1A). RSF additionally achieved the best performance in terms of IBS (0.092 [sd 0.004]) compared to the tested parametric and semi‐parametric methods (p < 0.05) (Figure 1B, Table 1A, Figure S4A). On the testing set, both XGBoost and RSF outperformed (p < 0.05) the top parametric (AFT) and semi‐parametric (CoxBoost) approaches by C‐index (0.75 [sd 0.017] and 0.739 [sd 0.016], respectively) and IBS (0.132 [sd 0.006] and 0.132 [sd 0.004], respectively) (Figure 1C,D, Table 1A, Figure S4A). For further visualization of C‐index and IBS over bootstrap resamples, boxplots were generated with Wilcoxon rank sum tests comparing the C‐index and IBS of the best‐performing model to those of the other models (Figure S4A).
FIGURE 1.

MCB test of PFS model performance. FS, feature select models.
TABLE 1.
Prediction of PFS and OS.
| Model | Vars | IBS, training set (mean [95% CI]) | IBS, testing set (mean [95% CI]) | C‐index, training set (mean [95% CI]) | C‐index, testing set (mean [95% CI]) |
|---|---|---|---|---|---|
| A. Progression free survival (PFS) | | | | | |
| Machine learning methods | | | | | |
| RSF | 35 | 0.092 [0.091, 0.093] | 0.132 [0.131, 0.133] | 0.785 [0.783, 0.787] | 0.739 [0.736, 0.742] |
| XGBoost | 35 | 0.123 [0.122, 0.124] | 0.132 [0.131, 0.133] | 0.783 [0.781, 0.785] | 0.75 [0.747, 0.753] |
| Parametric methods | | | | | |
| AFT Full | 35 | 0.133 [0.132, 0.134] | 0.153 [0.152, 0.154] | 0.736 [0.734, 0.738] | 0.725 [0.722, 0.728] |
| AFT FS | 11 | 0.165 [0.162, 0.168] | 0.194 [0.192, 0.196] | 0.699 [0.697, 0.701] | 0.696 [0.693, 0.699] |
| Semi‐parametric methods | | | | | |
| CoxBoost | 27 | 0.132 [0.13, 0.134] | 0.149 [0.148, 0.15] | 0.738 [0.736, 0.74] | 0.725 [0.722, 0.728] |
| Lasso Cox | 12 | 0.136 [0.134, 0.138] | n/a | 0.729 [0.727, 0.731] | n/a |
| Cox Full | 35 | 0.133 [0.131, 0.135] | n/a | 0.735 [0.733, 0.737] | n/a |
| Cox FS | 9 | 0.138 [0.136, 0.14] | n/a | 0.725 [0.723, 0.727] | n/a |
| B. Overall survival (OS) | | | | | |
| Machine learning methods | | | | | |
| RSF | 34 | 0.09 [0.089, 0.091] | 0.137 [0.136, 0.138] | 0.867 [0.866, 0.868] | 0.726 [0.721, 0.731] |
| XGBoost | 34 | 0.135 [0.134, 0.136] | n/a | 0.77 [0.768, 0.772] | n/a |
| Parametric methods | | | | | |
| AFT Full | 34 | 0.144 [0.142, 0.146] | 0.139 [0.138, 0.14] | 0.758 [0.756, 0.76] | 0.715 [0.71, 0.72] |
| AFT FS | 15 | 0.143 [0.141, 0.145] | 0.144 [0.143, 0.145] | 0.751 [0.749, 0.753] | 0.707 [0.702, 0.712] |
| Semi‐parametric methods | | | | | |
| CoxBoost | 21 | 0.148 [0.146, 0.15] | 0.144 [0.143, 0.145] | 0.755 [0.753, 0.757] | 0.725 [0.72, 0.73] |
| Lasso Cox | 16 | 0.153 [0.15, 0.156] | n/a | 0.75 [0.748, 0.752] | n/a |
| Cox Full | 34 | 0.147 [0.145, 0.149] | n/a | 0.756 [0.754, 0.758] | n/a |
| Cox FS | 14 | 0.141 [0.139, 0.143] | n/a | 0.751 [0.749, 0.753] | n/a |
Note: Model performance by integrated Brier score (IBS) and C‐index. "Cox" denotes the Cox Proportional‐Hazards model. "Vars" indicates the number of variables included in the model; "n/a" indicates a model that was not carried forward to the testing set.
Abbreviation: AFT, parametric accelerated failure time model.
3.1.2. Overall Survival
By the MCB test, as shown in Figure 2A, RSF achieved the best performance for predicting OS on the training set by C‐index (0.867 [sd 0.006]) compared to the other tested methods (p < 0.05) (Table 1B). RSF also achieved the best performance in terms of IBS (0.09 [sd 0.004]) on the training set compared to the tested parametric and semi‐parametric methods (p < 0.05) (Figure 2B, Table 1B, Figure S4B). When the top models from the ML (RSF), parametric survival (AFT), and semi‐parametric survival (CoxBoost) approaches were brought forward to the testing set, RSF outperformed (p < 0.05) the other models by C‐index (0.726 [sd 0.028]), while RSF and the AFT model including all predictors were equally best by IBS (0.137 [sd 0.006] and 0.139 [sd 0.006], respectively) compared to the other tested models (Figure 2C,D, Table 1B, Figure S4B).
FIGURE 2.

MCB test of OS model performance. FS, feature select models.
3.2. Machine Learning Performance Following Feature Selection
We next applied feature selection strategies based on variable importance from those methods on the training set to identify more parsimonious ML models with a smaller number of features (10, 7, 5, 3) that contribute most meaningfully to survival outcomes. For predicting PFS by RSF using C‐index, the MCB plot shows that the RSF model with only 3 features performed better than the majority of the tested parametric and semi‐parametric models (Figure S5A, Table S4). Using IBS, both RSF (top 3 features: KL, LAM, KD) and RSF (top 5 features: KL, LAM, KD, neutrophil‐to‐lymphocyte ratio, estimated glomerular filtration rate [eGFR]) achieved the best performance (Figure S5B, Table S4). For predicting OS, the RSF models using the top 5 features (KL, tumor burden at baseline, number of metastases at baseline, albumin at baseline, neutrophil‐to‐lymphocyte ratio) and top 7 features (KL, tumor burden at baseline, number of metastases at baseline, albumin at baseline, neutrophil‐to‐lymphocyte ratio, Eastern Cooperative Oncology Group [ECOG] score at baseline, treatment with axitinib) had the best rank by MCB compared with alternative parametric and semi‐parametric approaches (Figure S5C,D, Table S4).
For predicting PFS using XGBoost by C‐index, the XGBoost model which included 10 features performed better than alternative parametric and semi‐parametric methods (Figure S6A, Table S4). Using IBS, XGBoost models incorporating the top 3–10 features (top 3: KL, LAM, KD) all performed better than alternative parametric and semi‐parametric methods (Figure S6B, Table S4). Predictions of OS by XGBoost after feature selection were not evaluated given that XGBoost was not a top predictive model of OS by MCB rankings.
3.3. Prediction of Survival Probability by ML Models Across Different Treatments
To assess model predictions over time, predicted survival probability in the testing set generated by top ML models for PFS (RSF (Figure 3A) and XGBoost (Figure S7)) and OS (RSF (Figure 3B)) was plotted against observed survival captured by Kaplan–Meier curves. Furthermore, individual plots by drug treatments were generated to assess the generalizability of top models across therapies. For RSF predictions of PFS, Kaplan–Meier survival curves fall within the 95% confidence interval of predicted survival in the testing dataset for axitinib‐ and sorafenib‐treated patients, while observed survival was lower than predicted survival for avelumab + axitinib, sunitinib, interferon‐alpha, and the total combined set of patients at later time points (following 60 weeks of therapy) (Figure 3A). For the XGBoost‐derived PFS (Figure S7), Kaplan–Meier curves fall within 95% confidence intervals of predicted survival in the testing dataset for each drug. The 95% confidence intervals are relatively wider for XGBoost‐derived predictions of PFS. For RSF predictions of OS, Kaplan–Meier survival curves fall within the 95% confidence interval of predicted survival in the testing dataset for all treatments, combined and individually, up until 200 weeks. At 200 weeks, the observed survival was lower than predicted survival for avelumab + axitinib and the total combined set of patients (Figure 3B).
FIGURE 3.

Random survival forest prediction intervals compared against observed survival data on the testing dataset. (A) Progression free survival. (B) Overall survival. Solid black line represents Kaplan–Meier curve, while dotted dashed line represents mean survival and color bands represent 95% confidence intervals. Note that the x‐axis has been scaled to achieve uniformity. The last datapoint observed for progression free survival was 115 weeks (Avelumab + Axitinib), 93 weeks (Axitinib), 107 weeks (Interferon‐alpha), 76 weeks (Sorafenib), and 101 weeks (Sunitinib). For overall survival, these were 277 weeks (Avelumab + Axitinib), 100 weeks (Axitinib), 193 weeks (Interferon‐alpha), 97 weeks (Sorafenib), and 198 weeks (Sunitinib).
Additional plots of IBS over time comparing top ML models to the marginal Kaplan–Meier model demonstrate the consistent superiority of ML models across time on the training set and testing sets for PFS and OS (Figure S8).
3.4. Model Interpretability and Top Predictors
To determine the contribution of each variable to model projections, SHAP waterfall and partial dependence plots were generated for predictions from the top models (RSF and XGBoost for PFS, and RSF for OS). For RSF predictions of PFS, the waterfall plots demonstrate the high importance of TGI metrics, which constitute the four most important variables (Figure 4A). Higher values (yellow) of the tumor growth rate constant (KL) and drug‐resistance rate constant (LAM), and lower values (purple) of time to tumor growth (TTG) and the cell‐kill rate constant (KD), all correspond with a higher likelihood of disease progression. Additionally, a higher neutrophil‐to‐lymphocyte ratio, a higher Eastern Cooperative Oncology Group performance status (ECOG) score, a Memorial Sloan Kettering Cancer Center (MSKCC) score of two (as compared with one), and lower albumin corresponded with a higher likelihood of disease progression, while having a reduction or interruption in sunitinib dosing corresponded with a reduced likelihood of disease progression. Partial dependence plots (Figure 4B) were generated to visualize SHAP values versus feature values for the top four variables. Plots were colored by the value of the top interacting variable to visualize predictor–predictor interactions. The plots demonstrate that the SHAP value (relative importance) of KL rises rapidly over the range 0.0–0.015 before plateauing (0.015–0.15). Likewise, LAM, KD, and TTG all demonstrate nonlinear relationships between feature values and SHAP importance. The interactions demonstrate that a low time to tumor growth corresponds with a high growth rate constant. The variable interacting most strongly with KL was determined to be KD, and high KL values are observed across the range of KD values and vice versa.
FIGURE 4.

(A) SHAP waterfall plots of random survival forest‐derived indicators for progression‐free survival. (B) Partial dependence plots of top four variables. Legend title represents strongest interacting variable: plots are colored according to strongest interacting variable by mean SHAP interaction absolute values. For example, the strongest interacting variable with KL (tumor growth rate constant) is KD (cell‐kill rate constant).
KL, ECOG score, neutrophil‐to‐lymphocyte ratio, and albumin were also among the top 10 most important predictors of OS in the RSF model, and the associations between values and outcomes trended in the same direction (Figure 5A). Additionally, for OS, higher baseline tumor burden, a higher total number of metastases, bone metastases, higher blood platelets, and an MSKCC score of 1 all corresponded with a higher likelihood of death. The SHAP value of the MSKCC score of 1 was, on average, lower than that of the MSKCC score of 2. Hemoglobin values appeared consistently low, with the majority of samples having a low‐range SHAP value (0 to −10) for hemoglobin, while some had higher‐range values (0 to 40). Partial dependence plots (Figure 5B) demonstrate that baseline tumor burden interacts most strongly with the neutrophil‐to‐lymphocyte ratio and that baseline tumor burden has a nearly linear relationship with model importance (SHAP value).
FIGURE 5.

(A) SHAP waterfall plots of random survival forest‐derived predictions for overall survival. (B) Partial dependence plots of top four variables. Legend title represents strongest interacting variable: plots are colored according to strongest interacting variable by mean SHAP interaction absolute values. For example, the strongest interacting variable with KL (tumor growth rate constant) is BPLT (baseline platelets).
Comparisons were drawn between RSF‐ and XGBoost‐derived ML predictions for PFS (Figure S9 for XGBoost SHAP). For PFS, variables consistently in the top 10 by SHAP importance included KL, KD, LAM, and the neutrophil‐to‐lymphocyte ratio, with KL being the strongest indicator in both models; its partial dependence plot mirrors that of the RSF model. The relative directionality of effects appears consistent between models, although high values of KD and the neutrophil‐to‐lymphocyte ratio trend closer to the center (SHAP score = 0). XGBoost also identified lower age, higher eGFR, and higher baseline tumor burden as associated with an increased likelihood of progression, while high values of BMI, lactate dehydrogenase, and platelets were observed across the spectrum of SHAP values.
In addition to SHAP values, variable importance was calculated for each of the five methods, with the best model presented following feature selection to determine which features are consistently retained across models (Figure 6, Table S5). For both PFS and OS, each of the five developed models retained the tumor growth rate constant (KL) as an important indicator. Additional TGI metrics, including LAM and TTG, were retained by all but one model (XGBoost—Top 3) for predicting PFS. LAM was retained by all but one model (RSF—Top 5) following feature selection for predicting OS.
FIGURE 6.

Model inputs for selected models following feature selection, selected by MCB analysis and top numerical performance via Wilcoxon tests. (A) Progression free survival. (B) Overall survival. BMI, body mass index. Cox (FS): Cox (feature select: 9 features for PFS and 14 features for OS); AFT (FS): parametric accelerated failure time model (feature select: 11 features for PFS and 15 features for OS). CoxBoost for PFS and OS includes 27 and 21 features, respectively, while Lasso Cox includes 21 and 16 features, respectively.
4. Discussion
To our knowledge, this study is the first to use ML approaches for predicting survival outcomes in patients with RCC using TGI metrics, baseline demographics, and clinical laboratory markers. By leveraging tree‐based ML models like RSF and XGBoost, this study demonstrates a significant advancement in survival predictions for advanced RCC, with superior discrimination (C‐index: 0.75–0.87) and calibration (IBS: 0.09–0.14) for PFS and OS compared to traditional parametric and semi‐parametric methods. Notably, RSF models achieved comparable or better performance of PFS and OS using only 3–5 covariates, compared to 9–35 required by conventional approaches. Additionally, XGBoost achieved better performance (i.e., IBS) for PFS than alternative parametric and semi‐parametric approaches using only 3 TGI metrics. The important features retained following feature selection for tree‐based ML approaches were also retained in traditional modeling methods, further highlighting their predictive utility while underscoring the ability of ML to reduce the number of necessary additional covariates. Together, these findings underscore the capability of tree‐based ML approaches to distill complex interactions among TGI metrics (e.g., tumor growth rate constant, KL) and clinical variables into actionable prognostic tools, offering a scalable framework for precision oncology.
SHAP analysis provided insights into how indicators influenced ML model predictions for tree‐based models. Absolute SHAP values quantified the relative importance of each feature for each individual, highlighting the importance of TGI metrics for predicting PFS and KL along with ECOG score, baseline tumor burden, and baseline neutrophil‐to‐lymphocyte ratio for predicting OS. For PFS, partial dependence plots revealed nonlinear relationships between individual TGI metrics and their corresponding SHAP values, indicating that patient risk escalated steeply (e.g., with KL and LAM) or decreased (for KD and TTG) up to a certain threshold, followed by a plateau. Partial dependence plots further enabled visualization of the interaction between TGI metrics, which also appeared nonlinear in nature. For example, while KL associated most strongly with KD in SHAP analysis, high KL values were observed across the spectrum of KD values and vice versa. This indicates a lack of simple linear relationship between the two variables. For OS, similar nonlinear dependencies were identified between KL, the neutrophil‐to‐lymphocyte ratio, and survival risk. Such observations underscore the advantage of tree‐based methods, which inherently capture complex interactions that may elude even sophisticated nonlinear parametric models, such as those assuming a log‐normal survival distribution.
Surprisingly, SHAP analyses for RSF prediction of PFS revealed that dose reductions and dose interruptions during sunitinib treatment had negative SHAP values, indicating a lower likelihood of progression. This finding aligns with previous work [33, 34] showing that patients undergoing frequent or permanent dose reductions may stay on treatment longer because of improved tolerability, without compromising efficacy, which may explain the inverse relationship observed here between drug exposure and PFS. Furthermore, SHAP analyses emphasized the importance of the neutrophil‐to‐lymphocyte ratio as a predictor, especially for OS, highlighting its potential as an affordable and accessible biomarker for informing clinical decision‐making and stratifying patients in clinical trials in both metastatic and localized RCC [35].
This study aligns with growing evidence that ML modeling can overcome the limitations of traditional survival models, which often rely on linear assumptions or struggle with high‐dimensional data [3, 36, 37]. The nonlinear survival risk patterns revealed by the SHAP analyses, particularly KL's U‐shaped association with PFS, demonstrate ML's capability to capture complex relationships that conventional methods might miss. This capability mirrors ML's role in other domains, such as pharmacokinetic/pharmacodynamic modeling [36, 38, 39], where nonlinear dose–response relationships are common. Similar to hybrid models that integrate neural networks with pharmacological ordinary differential equations (ODEs), this work highlights how ML can augment domain‐specific biomarkers such as TGI metrics to improve predictive accuracy [40]. Furthermore, the parsimony of the ML models here (3–5 variables) echoes findings in gene expression‐based survival prediction, where ML outperforms regression models by efficiently prioritizing informative features [36]. The practical utility of such models extends to clinical trial design and therapeutic personalization. By stratifying patients based on dynamic risk profiles, ML could enrich trials with high‐benefit cohorts, reducing costs and accelerating timelines, a concept exemplified by platforms such as TrialTranslator [41]. Additionally, the emergence of TGI metrics as top predictors underscores the value of longitudinal imaging data as a source of early efficacy signals, which could inform go/no‐go decisions in drug development. This approach aligns with broader efforts to leverage ML for dose optimization, adverse event projection, and synthetic control arm generation [38, 39, 42]. Of note, in the present dataset, tumor size data availability increased progressively over time, with approximately 40% of all tumor size observations available by 12 weeks, ~60% by 24 weeks, ~75% by 36 weeks, and ~88% by 52 weeks.
Preliminary analyses using truncated datasets (not shown here) revealed that TGI model‐derived parameters were sensitive to the duration of available follow‐up: all TGI parameters, including KL, KD, and LAM, increased by more than 15% when derived from data limited to 12 weeks, and KL and LAM remained >15% higher even with 24‐week data, compared with estimates derived from the full dataset. In contrast, early tumor size ratio metrics at Weeks 6 and 8 (TR6 and TR8) were more stable across truncated datasets [18]. These variables were initially considered for inclusion in the modeling. However, during data preprocessing, features that were highly correlated (Pearson r > 0.7) or had high variance inflation factors (VIF > 2.5) were excluded to reduce multicollinearity, which can degrade the performance of certain survival models, particularly linear ones such as AFT and Cox‐based methods. As a result, TR6 and TR8 were not included in the final modeling set. Although not retained in the final models, their observed stability suggests potential utility in future models focused on earlier prediction. A formal sensitivity analysis is warranted to assess whether ML survival models trained on early or incomplete tumor data can maintain predictive accuracy. This remains an important area for future work as we seek to extend model applicability to earlier decision‐making timepoints in both clinical trials and real‐world settings.
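The preprocessing rule above (exclude features with pairwise Pearson r > 0.7 or VIF > 2.5) can be sketched as a greedy screen. The paper does not specify the drop order, so the tie‐breaking choices below (keep the earlier‐listed feature of a correlated pair; iteratively remove the worst‐VIF feature) are assumptions, and the feature names are illustrative:

```python
import numpy as np

def vif(X):
    """Variance inflation factor per column: regress each feature on the
    remaining ones (with intercept) and return 1 / (1 - R^2)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        r2 = 1.0 - (y - others @ beta).var() / y.var()
        out.append(1.0 / max(1.0 - r2, 1e-12))  # guard against R^2 ~ 1
    return np.array(out)

def multicollinearity_filter(X, names, r_max=0.7, vif_max=2.5):
    """Greedy screen mirroring the thresholds in the text: drop one feature
    of any pair with |Pearson r| > r_max, then drop the worst-VIF feature
    until all remaining VIFs are <= vif_max."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    dropped = set()
    for a in range(len(names)):           # pairwise correlation screen
        for b in range(a + 1, len(names)):
            if a not in dropped and b not in dropped and abs(corr[a, b]) > r_max:
                dropped.add(b)            # keep the earlier-listed feature
    keep = [j for j in range(len(names)) if j not in dropped]
    while len(keep) > 1:                  # iterative VIF screen on survivors
        v = vif(X[:, keep])
        worst = int(np.argmax(v))
        if v[worst] <= vif_max:
            break
        keep.pop(worst)
    return [names[j] for j in keep]

# Synthetic example: LAM built to be nearly collinear with KL.
rng = np.random.default_rng(0)
kl = rng.normal(size=200)
lam = 0.95 * kl + rng.normal(scale=0.1, size=200)
nlr = rng.normal(size=200)
X = np.column_stack([kl, lam, nlr])
print(multicollinearity_filter(X, ["KL", "LAM", "NLR"]))
```

On this synthetic data the near‐duplicate LAM column is removed at the correlation step, leaving the two roughly independent features, which is the behavior the thresholds are meant to enforce.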
While promising, this work has limitations. The generalizability of these models across additional racial and ethnic groups should be evaluated, as participants in this work were largely Caucasian or Asian. In addition, future testing can inform the models' generalizability to RCC therapies with different mechanisms of action and to RCC patients with differing grades of tumor severity. Finally, while the findings reported are encouraging, prospective use of these algorithms faces a practical constraint: TGI metrics require sufficient clinical follow‐up to be estimated accurately. Deep learning could potentially address this limitation by generating tumor metrics from shorter follow‐up, an approach that has already been explored [43].
In conclusion, the findings presented here demonstrate the potential of tree‐based ML models to enhance survival prediction in RCC by efficiently distilling complex interactions among TGI metrics and clinical variables into actionable prognostic tools. While this work focuses on RCC, its methodology—combining TGI metrics with interpretable tree‐based models—holds significant potential for broader application in other malignancies, pending further validation. As shown in studies leveraging ML for trial enrollment [44], drug toxicity projection [42], and mechanistic modeling [45], ML‐driven approaches stand to impact clinical pharmacology and drug development broadly by translating heterogeneous data into clinically actionable insights [38, 46, 47]. With further validation, the framework described here could enable dynamic risk stratification, precision trial enrollment, and personalized therapeutic selection in RCC, ultimately advancing the paradigm of patient‐centric drug development.
Author Contributions
C.W.G. and M.H.S. wrote the manuscript; M.H.S. and D.O. designed the research; M.H.S., C.W.G., S.L., D.N., and J.L. performed the research; M.H.S., C.W.G., S.L., D.N., and J.L. analyzed the data; M.H.S. and D.O. contributed new reagents/analytical tools.
Conflicts of Interest
M.H.S. and J.L. are employees of, and may own stock/options in, Pfizer Inc. D.O. is an employee of Johnson & Johnson and owns stocks/options in Johnson & Johnson and Pfizer. S.L. and D.N. are former employees of Pfizer Inc., and D.N. owns stocks/options in Pfizer. C.W.G. declared no competing interests for this work. As an Associate Editor for Clinical and Translational Science, Mohamed Shahin was not involved in the review or decision process for this paper.
Supporting information
Data S1: Supporting Information.
Data S2: Supporting Information.
Funding: The authors received no specific funding for this work.
This paper was selected as a 2025 PhRMA Foundation Trainee Challenge Award winner.
References
- 1. Fountzilas E., Tsimberidou A. M., Vo H. H., and Kurzrock R., “Clinical Trial Design in the Era of Precision Medicine,” Genome Medicine 14 (2022): 101, 10.1186/s13073-022-01102-1.
- 2. Altman D. G., De Stavola B. L., Love S. B., and Stepniewska K. A., “Review of Survival Analyses Published in Cancer Journals,” British Journal of Cancer 72 (1995): 511–518, 10.1038/bjc.1995.364.
- 3. Gong X., Hu M., and Zhao L., “Big Data Toolsets to Pharmacometrics: Application of Machine Learning for Time‐To‐Event Analysis,” Clinical and Translational Science 11 (2018): 305–311, 10.1111/cts.12541.
- 4. Simon N., Friedman J., Hastie T., and Tibshirani R., “Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent,” Journal of Statistical Software 39 (2011): 1–13, 10.18637/jss.v039.i05.
- 5. Kolasseri A. E. and Bhimavarapu V., “Comparative Study of Machine Learning and Statistical Survival Models for Enhancing Cervical Cancer Prognosis and Risk Factor Assessment Using SEER Data,” Scientific Reports 14 (2024): 22203, 10.1038/s41598-024-72790-5.
- 6. Rajkomar A., Dean J., and Kohane I., “Machine Learning in Medicine,” New England Journal of Medicine 380 (2019): 1347–1358, 10.1056/NEJMra1814259.
- 7. Chan P., Marchand M., Yoshida K., et al., “Prediction of Overall Survival in Patients Across Solid Tumors Following Atezolizumab Treatments: A Tumor Growth Inhibition‐Overall Survival Modeling Framework,” CPT: Pharmacometrics & Systems Pharmacology 10 (2021): 1171–1182, 10.1002/psp4.12686.
- 8. Sheng Y., Teng S. W., Wang J., Wang H., and Tse A. N., “Tumor Growth Inhibition‐Overall Survival Modeling in Non‐Small Cell Lung Cancer: A Case Study From GEMSTONE‐302,” CPT: Pharmacometrics & Systems Pharmacology 13 (2024): 437–448, 10.1002/psp4.13094.
- 9. Velasquez E., Kassir N., Cheeti S., et al., “Predicting Overall Survival From Tumor Dynamics Metrics Using Parametric Statistical and Machine Learning Models: Application to Patients With RET‐Altered Solid Tumors,” Frontiers in Artificial Intelligence 7 (2024): 1412865, 10.3389/frai.2024.1412865.
- 10. Padala S. A., Barsouk A., Thandra K. C., et al., “Epidemiology of Renal Cell Carcinoma,” World Journal of Oncology 11 (2020): 79–87, 10.14740/wjon1279.
- 11. Chow W. H., Dong L. M., and Devesa S. S., “Epidemiology and Risk Factors for Kidney Cancer,” Nature Reviews. Urology 7 (2010): 245–257, 10.1038/nrurol.2010.46.
- 12. Ljungberg B., Campbell S. C., Cho H. Y., et al., “The Epidemiology of Renal Cell Carcinoma,” European Urology 60 (2011): 615–621, 10.1016/j.eururo.2011.06.049.
- 13. Winer A. G., Motzer R. J., and Hakimi A. A., “Prognostic Biomarkers for Response to Vascular Endothelial Growth Factor‐Targeted Therapy for Renal Cell Carcinoma,” Urologic Clinics of North America 43 (2016): 95–104, 10.1016/j.ucl.2015.08.009.
- 14. Hutson T. E., Lesovoy V., al‐Shukri S., et al., “Axitinib Versus Sorafenib as First‐Line Therapy in Patients With Metastatic Renal‐Cell Carcinoma: A Randomised Open‐Label Phase 3 Trial,” Lancet Oncology 14 (2013): 1287–1294, 10.1016/S1470-2045(13)70465-0.
- 15. Motzer R. J., Penkov K., Haanen J., et al., “Avelumab Plus Axitinib Versus Sunitinib for Advanced Renal‐Cell Carcinoma,” New England Journal of Medicine 380 (2019): 1103–1115, 10.1056/NEJMoa1816047.
- 16. Larkin J., Oya M., Martignoni M., et al., “Avelumab Plus Axitinib as First‐Line Therapy for Advanced Renal Cell Carcinoma: Long‐Term Results From the JAVELIN Renal 100 Phase Ib Trial,” Oncologist 28 (2023): 333–340, 10.1093/oncolo/oyac243.
- 17. Motzer R. J., Hutson T. E., Tomczak P., et al., “Sunitinib Versus Interferon Alfa in Metastatic Renal‐Cell Carcinoma,” New England Journal of Medicine 356 (2007): 115–124, 10.1056/NEJMoa065044.
- 18. Lin S., Li J., Shahin M., Nickens D., and Ouellet D., “American Society of Clinical Pharmacology and Therapeutics.”
- 19. Claret L., Mercier F., Houk B. E., Milligan P. A., and Bruno R., “Modeling and Simulations Relating Overall Survival to Tumor Growth Inhibition in Renal Cell Carcinoma Patients,” Cancer Chemotherapy and Pharmacology 76 (2015): 567–573, 10.1007/s00280-015-2820-x.
- 20. George L. L., “The Statistical Analysis of Failure Time Data,” Technometrics 45 (2012): 255–256, 10.1198/tech.2003.s768.
- 21. Tutz G. and Binder H., “Boosting Ridge Regression,” Computational Statistics & Data Analysis 51 (2007): 6044–6059, 10.1016/j.csda.2006.11.041.
- 22. Tibshirani R., “The Lasso Method for Variable Selection in the Cox Model,” Statistics in Medicine 16 (1997): 385–395.
- 23. Cox D. R., “Regression Models and Life‐Tables,” Journal of the Royal Statistical Society. Series B, Statistical Methodology 34 (1972): 187–202.
- 24. Ishwaran H., Kogalur U. B., Blackstone E. H., and Lauer M. S., “Random Survival Forests,” Annals of Applied Statistics 2 (2008): 841–860, 10.1214/08-AOAS169.
- 25. Vieira D., Gimenez G., Marmerola G., and Estima V., “XGBoost Survival Embeddings: Improving Statistical Properties of XGBoost Survival Analysis Implementation,” (2021).
- 26. Harrell F. E., Jr., Lee K. L., and Mark D. B., “Multivariable Prognostic Models: Issues in Developing Models, Evaluating Assumptions and Adequacy, and Measuring and Reducing Errors,” Statistics in Medicine 15 (1996): 361–387.
- 27. Park S. Y., Park J. E., Kim H., and Park S. H., “Review of Statistical Methods for Evaluating the Performance of Survival or Other Time‐To‐Event Prediction Models (From Conventional to Deep Learning Approaches),” Korean Journal of Radiology 22 (2021): 1697–1707, 10.3348/kjr.2021.0223.
- 28. Koning A. J., Franses P. H., Hibon M., and Stekler H. O., “The M3 Competition: Statistical Tests of the Results,” International Journal of Forecasting 21 (2005): 397–409, 10.1016/j.ijforecast.2004.10.003.
- 29. Lundberg S., “A Unified Approach to Interpreting Model Predictions,” (2017), arXiv preprint arXiv:1705.07874.
- 30. “SHAPforxgboost: SHAP Plots for ‘XGBoost’,” R package version 0.1.0 (2020), https://github.com/liuyanguu/SHAPforxgboost/.
- 31. “shapviz: SHAP Visualizations,” R package version 0.9.7 (2025), https://github.com/modeloriented/shapviz.
- 32. Covert I. and Lee S.‐I., “Proceedings of the 24th International Conference on Artificial Intelligence and Statistics.”
- 33. Fostvedt L. K., Nickens D. J., Tan W., and Parivar K., “Tumor Growth Inhibition Modeling to Support the Starting Dose for Dacomitinib,” CPT: Pharmacometrics & Systems Pharmacology 11 (2022): 1256–1267, 10.1002/psp4.12841.
- 34. Kalra S., Rini B. I., and Jonasch E., “Alternate Sunitinib Schedules in Patients With Metastatic Renal Cell Carcinoma,” Annals of Oncology 26 (2015): 1300–1304, 10.1093/annonc/mdv030.
- 35. Nunno V. D., Mollica V., Gatto L., et al., “Prognostic Impact of Neutrophil‐To‐Lymphocyte Ratio in Renal Cell Carcinoma: A Systematic Review and Meta‐Analysis,” Immunotherapy 11 (2019): 631–643, 10.2217/imt-2018-0175.
- 36. Bashiri A., Ghazisaeedi M., Safdari R., Shahmoradi L., and Ehtesham H., “Improving the Prediction of Survival in Cancer Patients by Using Machine Learning Techniques: Experience of Gene Expression Data: A Narrative Review,” Iranian Journal of Public Health 46 (2017): 165–172.
- 37. Chan P., Zhou X., Wang N., Liu Q., Bruno R., and Jin J. Y., “Application of Machine Learning for Tumor Growth Inhibition – Overall Survival Modeling Platform,” CPT: Pharmacometrics & Systems Pharmacology 10 (2021): 59–66, 10.1002/psp4.12576.
- 38. Shahin M. H., Barth A., Podichetty J. T., et al., “Artificial Intelligence: From Buzzword to Useful Tool in Clinical Pharmacology,” Clinical Pharmacology and Therapeutics 115 (2024): 698–709, 10.1002/cpt.3083.
- 39. Terranova N., Renard D., Shahin M. H., et al., “Artificial Intelligence for Quantitative Modeling in Drug Discovery and Development: An Innovation and Quality Consortium Perspective on Use Cases and Best Practices,” Clinical Pharmacology and Therapeutics 115 (2024): 658–672, 10.1002/cpt.3053.
- 40. Kacprzyk K., Holt S., Berrevoets J., Qian Z., and Schaar M. V. D., “ODE Discovery for Longitudinal Heterogeneous Treatment Effects Inference,” in The Twelfth International Conference on Learning Representations (ICLR) (2024), arXiv:2403.10766.
- 41. Orcutt X., Chen K., Mamtani R., Long Q., and Parikh R. B., “Evaluating Generalizability of Oncology Trial Results to Real‐World Patients Using Machine Learning‐Based Trial Emulations,” Nature Medicine 31 (2025): 457–465, 10.1038/s41591-024-03352-5.
- 42. Catacutan D. B., Alexander J., Arnold A., and Stokes J. M., “Machine Learning in Preclinical Drug Discovery,” Nature Chemical Biology 20 (2024): 960–973, 10.1038/s41589-024-01679-1.
- 43. Laurie M. and Lu J., “Explainable Deep Learning for Tumor Dynamic Modeling and Overall Survival Prediction Using Neural‐ODE,” npj Systems Biology and Applications 9 (2023): 58, 10.1038/s41540-023-00317-1.
- 44. Chow R., Midroni J., Kaur J., et al., “Use of Artificial Intelligence for Cancer Clinical Trial Enrollment: A Systematic Review and Meta‐Analysis,” Journal of the National Cancer Institute 115 (2023): 365–374, 10.1093/jnci/djad013.
- 45. Feuerriegel S., Frauen D., Melnychuk V., et al., “Causal Machine Learning for Predicting Treatment Outcomes,” Nature Medicine 30 (2024): 958–968, 10.1038/s41591-024-02902-1.
- 46. Ryan D. K., Maclean R. H., Balston A., Scourfield A., Shah A. D., and Ross J., “Artificial Intelligence and Machine Learning for Clinical Pharmacology,” British Journal of Clinical Pharmacology 90 (2024): 629–639, 10.1111/bcp.15930.
- 47. Mayer B., Kringel D., and Lotsch J., “Artificial Intelligence and Machine Learning in Clinical Pharmacological Research,” Expert Review of Clinical Pharmacology 17 (2024): 79–91, 10.1080/17512433.2023.2294005.
