2025 Jan 6;31(3):869–880. doi: 10.1038/s41591-024-03398-5

Prediction of checkpoint inhibitor immunotherapy efficacy for cancer using routine blood tests and clinical data

Seong-Keun Yoo 1,2,3,4,5,6,#, Conall W Fitzgerald 7,8,#, Byuri Angela Cho 1,2,3,4,5,6,#, Bailey G Fitzgerald 9,#, Catherine Han 7,8, Elizabeth S Koh 7,8, Abhinav Pandey 7,8, Hannah Sfreddo 7,8, Fionnuala Crowley 10,11, Michelle Rudshteyn Korostin 10, Neha Debnath 12, Yan Leyfman 13, Cristina Valero 7,8, Mark Lee 7,8, Joris L Vos 7,8, Andrew Sangho Lee 7,8, Karena Zhao 7,8, Stanley Lam 7,8, Ezekiel Olumuyide 1,2, Fengshen Kuo 14,15, Eric A Wilson 1,2,3,4,5,6, Pauline Hamon 1,2,6, Clotilde Hennequin 1,2,6, Miriam Saffern 1,2, Lynda Vuong 14,15, A Ari Hakimi 14,15, Brian Brown 1,2,5,16, Miriam Merad 1,2,6,10,17, Sacha Gnjatic 1,2,3,10,17, Nina Bhardwaj 1,10,18, Matthew D Galsky 10, Eric E Schadt 19, Robert M Samstein 1,2, Thomas U Marron 1,2,10,20, Mithat Gönen 21, Luc G T Morris 7,8, Diego Chowell 1,2,3,4,5,6
PMCID: PMC11922749  PMID: 39762425

Abstract

Predicting whether a patient with cancer will benefit from immune checkpoint inhibitors (ICIs) without resorting to advanced genomic or immunologic assays is an important clinical need. To address this, we developed and evaluated SCORPIO, a machine learning system that utilizes routine blood tests (complete blood count and comprehensive metabolic profile) alongside clinical characteristics from 9,745 ICI-treated patients across 21 cancer types. SCORPIO was trained on data from 1,628 patients across 17 cancer types from Memorial Sloan Kettering Cancer Center. In two internal test sets comprising 2,511 patients across 19 cancer types, SCORPIO achieved median time-dependent area under the receiver operating characteristic curve (AUC(t)) values of 0.763 and 0.759 for predicting overall survival at 6, 12, 18, 24 and 30 months, outperforming tumor mutational burden (TMB), which showed median AUC(t) values of 0.503 and 0.543. Additionally, SCORPIO demonstrated superior predictive performance for predicting clinical benefit (tumor response or prolonged stability), with AUC values of 0.714 and 0.641, compared to TMB (AUC = 0.546 and 0.573). External validation was performed using 10 global phase 3 trials (4,447 patients across 6 cancer types) and a real-world cohort from the Mount Sinai Health System (1,159 patients across 18 cancer types). In these external cohorts, SCORPIO maintained robust performance in predicting ICI outcomes, surpassing programmed death-ligand 1 immunostaining. These findings underscore SCORPIO’s reliability and adaptability, highlighting its potential to predict patient outcomes with ICI therapy across diverse cancer types and healthcare settings.

Subject terms: Outcomes research, Predictive markers, Prognostic markers, Machine learning, Cancer immunotherapy


Using multiple datasets from real-world evidence and completed trials, a machine learning model using routine blood and clinical data is shown to be predictive of patient response to immune checkpoint inhibitor therapy, across cancer types and outperforming standard biomarkers.

Main

Immune checkpoint inhibitors (ICIs) such as anti-cytotoxic T lymphocyte-associated antigen 4 (CTLA-4) or anti-programmed death 1 (PD-1)/programmed death-ligand 1 (PD-L1) agents can induce durable responses in a subset of patients with advanced-stage cancers1. However, most patients incur treatment costs without experiencing durable clinical benefit2,3. Thus, a model that can predict the efficacy of ICI drugs would have important ramifications in precision medicine by helping physicians identify patients who are more or less likely to benefit from these treatments.

Tumor mutational burden (TMB) and PD-L1 expression are biomarkers approved by the U.S. Food and Drug Administration (FDA) for this purpose2,4,5. However, these biomarkers have limited accuracy6,7 and practical constraints that have precluded their widespread clinical use, such as the need for sufficient tumor tissue and resources to sequence DNA in case of TMB and the lack of standardized antibody clones and scoring systems for PD-L1 immunohistochemistry3. Thus, a clinical need remains for a quantitative predictive marker that can be easily obtained at a low cost and quick turnaround time across diverse geographical regions and health systems.

An ideal candidate for assessing ICI efficacy may include the systematic integration of widely available clinical variables and standardized measurements from laboratory blood tests that are routinely used in modern medicine8,9. Notably, some studies have reported the association of body mass index (BMI), neutrophil-to-lymphocyte ratio (NLR) or albumin (ALB) with response to ICI10–13. However, a comprehensive analysis of clinical and laboratory data and their potential to predict ICI outcomes across different cancer types has not been reported.

Machine learning, a branch of artificial intelligence, enables algorithms to learn from data, identify key patterns and make predictions14. These models have already shown success in various biomedical fields7,9,15–19. In this study, we explored whether a machine learning system could predict ICI outcomes using routine blood tests and standard clinical variables. We trained, tested and externally tested a machine learning model on three real-world cohorts and 10 global phase 3 clinical trial cohorts to predict ICI efficacy.

Results

Study cohorts

This study included 9,745 patients across 21 cancer types treated with ICIs from Memorial Sloan Kettering Cancer Center (MSKCC), Mount Sinai Health System (MSHS) and 10 global phase 3 clinical trials (Fig. 1a, Extended Data Tables 1–3 and Supplementary Fig. 1).

Fig. 1. Schematic of the study design and analysis.

a, Cohort collection. Top: a real-world cohort (MSK-I) from MSKCC was used for model development. Middle: two real-world cohorts from MSKCC (MSK-II) and MSHS were used. Bottom: 10 global phase 3 clinical trials were used. ITT, intention-to-treat. b, Feature selection analysis. Top: number of features collected in the MSK-I cohort for model development. Bottom: 47 features were tested for the association with overall survival using the Cox proportional hazards regression or clinical benefit using the Cochran-Mantel-Haenszel test. Systemic therapy history was adjusted as a confounding factor in both tests. c, Machine learning analysis. Top: model construction with separate models for predicting overall survival and clinical benefit. Middle: model performance comparison using ROC and AUC (receiver operating characteristic and area under the receiver operating characteristic curve). Bottom: model performance evaluation. Among the two machine learning models, the one that performed the best on the hold-out test set was subjected to the analyses.

Extended Data Table 1.

Patient characteristics

Extended Data Table 3.

Patient characteristics: phase 3 clinical trial cohorts

To develop the model, we first retrospectively collected data from 2,035 patients across 17 cancer types treated with ICIs between 2014 and 2019 at MSKCC (hereafter referred to as MSK-I), who were randomly divided into a training set (n = 1,628) and a hold-out test set (n = 407) at an 80:20 ratio. We developed machine learning models using the training set from this cohort and then tested the models in the hold-out test set. We then further tested the model on an independent cohort of an additional 2,104 ICI-treated patients from MSKCC (hereafter referred to as MSK-II). The MSK-II cohort was collected after initial model development using the same inclusion and exclusion criteria as the MSK-I cohort, but the years of eligibility were expanded to patients treated between 2011 and 2020. We then externally tested the model on 4,447 ICI-treated patients in 10 global phase 3 clinical trials20–29. Further external testing of the model was performed on a real-world cohort of 1,159 patients treated with ICIs between 2011 and 2019 at MSHS, a large comprehensive health system serving a diverse patient population across the New York metropolitan region. We also analyzed 6,629 patients treated for cancer at MSKCC who did not receive ICI30 (hereafter referred to as MSK non-ICI; Extended Data Table 4). For details on patient inclusion and exclusion criteria, see Methods and Supplementary Figs. 2 and 3.
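The 80:20 division of MSK-I can be illustrated with a minimal sketch. The function name `split_train_holdout`, the seed and the placeholder patient IDs are assumptions made for illustration; the text does not state how the random split was implemented or whether it was stratified.

```python
import random

def split_train_holdout(patient_ids, train_fraction=0.8, seed=0):
    """Randomly split patient IDs into a training set and a hold-out test set."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 2,035 placeholder IDs standing in for the MSK-I patients.
train_ids, holdout_ids = split_train_holdout(range(2035))
print(len(train_ids), len(holdout_ids))
```

With 2,035 patients this yields 1,628 training and 407 hold-out IDs, matching the set sizes reported above.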

Extended Data Table 4.

Patient characteristics: MSK non-ICI cohort

Characteristics of the patient data

Patients were treated with inhibitors of PD-1 (n = 3,793), PD-L1 (n = 5,253), CTLA-4 (n = 72) or combinations of more than one drug (n = 627), including anti-CTLA-4 with anti-PD-1, anti-CTLA-4 with anti-PD-L1, anti-CTLA-4 with anti-PD-1 and anti-PD-L1, and anti-PD-1 with anti-PD-L1. The median follow-up duration for each cohort was: 25.38 months (interquartile range (IQR) 13.50–45.01) for the training set, 27.37 months (IQR 13.68–49.58) for the hold-out test set, 9.42 months (IQR 3.10–20.67) for the MSK-II cohort, 8.84 months (IQR 2.75–28.47) for the MSHS cohort and 13.64 months (IQR 6.72–19.86) across the clinical trials. The 10 clinical trial cohorts included patients from 12 experimental arms treated with atezolizumab (anti-PD-L1): IMbrave150 (ref. 20), IMspire150 (ref. 21), IMmotion151 (ref. 22), IMvigor211 (ref. 23), IMpower133 (ref. 24), IMpower130 (ref. 25), IMpower131 (atezolizumab plus carboplatin and nanoparticle ALB-bound paclitaxel (ACNP))26, IMpower131 (atezolizumab plus carboplatin and paclitaxel (ACP))26, IMpower132 (ref. 27), IMpower150 (atezolizumab plus bevacizumab, carboplatin and paclitaxel (ABCP))28, IMpower150 (ACP)28 and OAK29. The median follow-up duration of each clinical trial cohort is provided in Extended Data Table 3. We analyzed bladder cancer, hepatobiliary cancer, melanoma, non-small cell lung cancer (NSCLC), renal cell carcinoma (RCC) and small cell lung cancer (SCLC) as separate cancer types as they were collected in all available cohorts. The remaining cancer types were grouped as ‘Others’ in each cohort.

Clinical features and outcomes

We retrospectively collected clinical variables and standardized measurements from routine laboratory blood tests performed on the date of, or no more than 30 days before, the first ICI infusion (Fig. 1b and Supplementary Table 1). In the MSKCC cohorts, TMB was collected from patients’ tumors based on the FDA-authorized MSK-IMPACT platform31. In the clinical trial cohorts, PD-L1 immunostaining data using the SP142 or SP263 clones (Ventana Medical Systems) were collected (Methods). The two primary outcomes were overall survival and a treatment effect, measured as clinical benefit. Overall survival was measured from the first ICI infusion to death from any cause, with the first line used for patients who received multiple ICI treatments. For clinical trial cohorts, overall survival was measured from randomization to death from any cause. Patients alive at the time of review were censored at their last contact. Clinical benefit was defined as a patient’s tumor showing a complete response (CR), partial response (PR) or stable disease (SD) without progression for at least 6 months after the first infusion of ICI, as in prior studies32–35. Patients whose tumors showed progressive disease (PD) or SD for <6 months after the first ICI infusion were classified as having no clinical benefit. CR, PR, SD and PD were based on RECIST v1.1 criteria36. Both primary outcomes were available in the MSKCC and clinical trial cohorts, but only overall survival data were available in the MSHS cohort. For a description of clinical features and outcomes, see Methods.
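The clinical benefit definition above can be written as a small labeling rule. This is a sketch under one stated assumption: CR and PR count as clinical benefit regardless of duration, and the 6-month requirement applies to SD only (consistent with the shorthand "CR, PR and SD ≥6 months" used elsewhere in the text). The function name and argument names are hypothetical.

```python
def has_clinical_benefit(best_response, months_without_progression):
    """Label one patient according to the clinical benefit definition.

    best_response: "CR", "PR", "SD" or "PD" (RECIST v1.1 categories).
    months_without_progression: months from first ICI infusion without progression.
    Assumption: the 6-month threshold is applied to SD only.
    """
    if best_response in ("CR", "PR"):
        return True
    if best_response == "SD":
        return months_without_progression >= 6
    if best_response == "PD":
        return False
    raise ValueError(f"unknown response category: {best_response!r}")

print(has_clinical_benefit("SD", 7.5))  # stable disease lasting beyond 6 months
print(has_clinical_benefit("SD", 3.0))  # stable disease lasting under 6 months
```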

Development of the machine learning model

Before model training, we performed feature selection analyses on the training set to identify features associated with the target outcomes of ICI treatment (Fig. 1b and Supplementary Fig. 4). We developed two machine learning models using demographic, clinical and routine laboratory blood test data to predict outcomes after ICI administration, with one trained to predict overall survival and the other trained to predict clinical benefit (CR, PR and SD ≥6 months), and selected the one that performed the best in the hold-out test set (Fig. 1c). Each model consisted of an ensemble of three algorithms37–42 with soft-voting. A five-fold cross-validation (CV) was used to optimize each algorithm’s hyperparameters during training. During model training, the training set was divided into five equal-sized folds, each containing the same proportion of data. The algorithm then underwent five iterations of training and evaluation. In each iteration, four folds were used for training, and one fold was used for validation. Model performance was assessed using the concordance index (C-index) for overall survival and the area under the receiver operating characteristic curve (AUC) for clinical benefit. The performance metrics from the five iterations were averaged to obtain a single performance measurement. This process was repeated for all possible hyperparameter combinations, and the combination with the highest mean performance metric was selected as optimal.
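The cross-validated hyperparameter search described above can be sketched as follows. Everything specific here is an assumption for illustration: the toy one-dimensional k-nearest-neighbour classifier, the accuracy metric (standing in for the C-index or AUC), the grid values and the synthetic data. The three algorithms actually used in the ensemble are not reproduced.

```python
import itertools
import random
import statistics

def make_folds(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into n_folds near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]

def knn_accuracy(train, valid, k):
    """Toy 'fit on train, score on valid' step: 1-D k-nearest-neighbour accuracy."""
    hits = 0
    for x, y in valid:
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        vote = sum(label for _, label in nearest) / len(nearest)
        hits += (vote >= 0.5) == (y == 1)
    return hits / len(valid)

def grid_search_cv(data, grid, n_folds=5, seed=0):
    """Return (best_params, mean_scores) over every combination in grid."""
    folds = make_folds(len(data), n_folds, seed)
    names = list(grid)
    mean_scores = {}
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for held_out in range(n_folds):
            # One fold validates; the remaining four folds train.
            valid = [data[i] for i in folds[held_out]]
            train = [data[i] for f in range(n_folds) if f != held_out
                     for i in folds[f]]
            fold_scores.append(knn_accuracy(train, valid, **params))
        # Average the five fold scores into one number per combination.
        mean_scores[values] = statistics.mean(fold_scores)
    best_values = max(mean_scores, key=mean_scores.get)
    return dict(zip(names, best_values)), mean_scores

# Synthetic data: (feature, label) pairs with overlapping class distributions.
rng = random.Random(1)
toy_data = [(rng.gauss(label, 1.0), label) for label in [0, 1] * 50]
best_params, mean_scores = grid_search_cv(toy_data, {"k": [1, 5, 15]})
print(best_params, mean_scores)
```

The selection step picks the combination with the highest mean validation score, so the hold-out test set plays no part in tuning.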

The model trained to predict overall survival, SCORPIO (Standard Clinical and labOratory featuRes for Prognostication of Immunotherapy Outcomes), calculates a risk score ranging from 0 to 1, where a higher score indicates a higher probability of a poor outcome (that is, lack of efficacy or early death) after ICI administration. This model was trained using 33 features significantly associated with overall survival, identified through feature selection analysis (Supplementary Fig. 4a and Supplementary Table 2). Similarly, SCORPIO-CB, trained to predict clinical benefit, generates a probability score from 0 to 1, with a higher score indicating a higher likelihood of clinical benefit. This model was trained with 22 features significantly associated with clinical benefit, as identified in the feature selection analysis (Supplementary Fig. 4b and Supplementary Table 2). The performance of the two models was assessed using time-dependent AUC (AUC(t)) for overall survival and AUC for clinical benefit. For details of the machine learning system, see Methods.
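As a rough illustration of the two metrics, the sketch below computes AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, and a simplified AUC(t) that treats patients who died by time t as cases and patients still under follow-up after t as controls. Dropping patients censored before t is a simplifying assumption; the estimator actually used is described in Methods and may handle censoring differently. All function names and example values are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(case score > control score); ties count as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def auc_at_time(risk_scores, times, events, t):
    """Simplified AUC(t): died by t = case, followed beyond t = control.

    events: 1 = death observed, 0 = censored.
    Assumption: patients censored before t are excluded.
    """
    kept_scores, kept_labels = [], []
    for risk, time, event in zip(risk_scores, times, events):
        if time <= t and event == 1:
            kept_scores.append(risk)
            kept_labels.append(1)
        elif time > t:
            kept_scores.append(risk)
            kept_labels.append(0)
    return roc_auc(kept_scores, kept_labels)

# Made-up risk scores, follow-up times in months, and event indicators.
risk = [0.9, 0.7, 0.6, 0.2, 0.1]
months = [3, 5, 20, 30, 4]
died = [1, 1, 0, 0, 0]
print(auc_at_time(risk, months, died, t=12))
```

Because the risk score increases with the probability of a poor outcome, a higher score should rank deaths above survivors at each time horizon. The `ValueError` branch reflects that the metric is undefined when every retained patient falls in a single class.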

For the primary analysis of prognosticating clinical outcomes, patients were stratified into high-risk, moderate-risk and low-risk groups based on the first and third quartiles of the risk scores observed in the training set (Supplementary Fig. 5). The Cox proportional hazards regression tested the association of risk scores with overall survival and the Fisher’s exact test compared clinical benefit rates across the three risk groups. For details of the statistical analysis, see Methods.
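The quartile-based stratification can be sketched as below. The cutoffs come from the training-set scores only and are then applied unchanged to any new score. The handling of scores that fall exactly on a cutoff, the quantile interpolation method and the example scores are assumptions, not details taken from the text.

```python
import statistics

def fit_cutoffs(training_scores):
    """First and third quartiles of the risk scores seen in the training set."""
    q1, _median, q3 = statistics.quantiles(training_scores, n=4, method="inclusive")
    return q1, q3

def assign_risk_group(score, q1, q3):
    """Map a risk score to a group using fixed training-set cutoffs.

    Assumption: scores exactly equal to a cutoff fall in the moderate-risk group.
    """
    if score < q1:
        return "low-risk"
    if score > q3:
        return "high-risk"
    return "moderate-risk"

# Made-up training risk scores in [0, 1].
training_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
q1, q3 = fit_cutoffs(training_scores)
for new_score in (0.05, 0.5, 0.95):
    print(new_score, assign_risk_group(new_score, q1, q3))
```

Fixing the cutoffs on the training set means that test cohorts need not split 25/50/25 across the three groups.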

Model performance in the internal test datasets

In the hold-out test data, SCORPIO, the machine learning model trained to predict overall survival, prognosticated overall survival at 6, 12, 18, 24 and 30 months following ICI with a median pan-cancer AUC(t) of 0.763 (Fig. 2). SCORPIO outperformed SCORPIO-CB and TMB in predicting overall survival, as shown by AUC(t) values (Supplementary Fig. 6a). It also predicted clinical benefit with a pan-cancer AUC of 0.714, surpassing SCORPIO-CB (pan-cancer AUC 0.701) and TMB (pan-cancer AUC 0.546; Fig. 2 and Supplementary Fig. 6b). SCORPIO consistently outperformed both SCORPIO-CB and TMB across all cancer types (Supplementary Figs. 6 and 7).

Fig. 2. Performance of SCORPIO across all real-world cohorts, phase 3 clinical trials and different tumor types.

Dot plot summarizing SCORPIO’s performance in prognosticating overall survival at 6, 12, 18, 24 and 30 months and predicting clinical benefit in the three real-world cohorts and 12 experimental arms from 10 phase 3 clinical trials. RWD, real-world data; RCT, randomized clinical trials. (a) The calculation of AUC was not feasible due to the absence of clinical benefit data in the MSHS cohort. (b) The calculation of AUC(t) was not feasible, as all patients had died by this time point. (c) The calculation of AUC(t) was not feasible, as all patients remained alive at this time point.

To determine whether cancer-type-specific models provide better predictive value than SCORPIO, a pan-cancer model, we developed models trained on data specific to each cancer type. First, we conducted feature selection analyses and model training separately for each cancer type. Among the 17 cancer types in the training set, we identified 10 with features significantly associated with overall survival (Supplementary Table 2). We then trained 10 models and compared their performance to SCORPIO in the hold-out test set. SCORPIO outperformed most of the cancer-type-specific models in predicting both overall survival and clinical benefit (Supplementary Fig. 8). This indicates that SCORPIO trained on the large pan-cancer data successfully learned relevant relationships across cancer types.

Next, we compared SCORPIO’s performance to nine machine learning models from Vanguri et al.43, which predict ICI efficacy in patients with NSCLC using uni-, bi- or multi-modal data (radiology, pathology, tumor genetics and PD-L1 scoring). SCORPIO outperformed these models in prognosticating overall survival (Supplementary Fig. 9a) and showed comparable performance in predicting clinical benefit, even though it was trained on simpler, more accessible pan-cancer data (Supplementary Fig. 9b).

In the hold-out test data, the three risk groups (low-risk, moderate-risk and high-risk) showed significantly different overall survival (Supplementary Fig. 10a). Across tumor types, the hazard ratios (HRs) for death compared to the high-risk group were 0.25 (95% confidence interval (CI), 0.18–0.34) for the low-risk group and 0.48 (95% CI, 0.37–0.63) for the moderate-risk group. Furthermore, the clinical benefit rates significantly differed in each risk group across tumor types: low-risk, 55.96%; moderate-risk, 28.64%; high-risk, 12.12% (P = 3.22 × 10⁻¹¹; Supplementary Fig. 10b).

We then sought to test SCORPIO on the independent real-world MSK-II cohort. In this cohort, SCORPIO prognosticated overall survival at 6, 12, 18, 24 and 30 months following ICI with a median pan-cancer AUC(t) of 0.759 (Fig. 2). It also predicted clinical benefit from ICI with a pan-cancer AUC of 0.641. In accordance with the results from the hold-out test data, SCORPIO outperformed TMB based on both AUC(t) and AUC (Supplementary Figs. 11 and 12). The three risk groups had significantly different overall survival (Fig. 3a and Supplementary Fig. 13a). Across tumor types, HRs for death in the low-risk and moderate-risk groups compared to the high-risk group were 0.16 (95% CI, 0.14–0.19) and 0.38 (95% CI, 0.34–0.43), respectively. Furthermore, the clinical benefit rates significantly differed in each risk group across tumor types: low-risk, 65.09%; moderate-risk, 52.20%; high-risk, 32.89% (P = 2.35 × 10⁻¹¹; Fig. 3b and Supplementary Fig. 13b). In both internal test datasets, the association between risk groups and clinical outcomes was independent of the line of therapy in which ICI was administered, sex, age, Eastern Cooperative Oncology Group performance status (ECOG-PS), microsatellite instability (MSI) status and TMB (Supplementary Figs. 14 and 15).

Fig. 3. Performance of SCORPIO on the MSK-II cohort (internal test set).

a, Kaplan-Meier plots showing overall survival for the three risk groups stratified by SCORPIO. Tick marks indicate censored data. Black vertical and horizontal dashed lines represent the median survival time for each risk group. Two-sided P values were calculated using the log-rank test. Correction for multiple testing was not applied. b, Bar charts displaying clinical benefit rates for the three risk groups stratified by SCORPIO. Two-sided P values were calculated using the Fisher’s exact test. Correction for multiple testing was not applied.
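For readers unfamiliar with the curves in panel a, the sketch below shows how a Kaplan-Meier survival estimate and the median survival time marked by the dashed lines can be computed for one risk group. It is a minimal product-limit estimator on made-up data; the function names are hypothetical and the log-rank test is not included.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    events: 1 = death observed, 0 = censored (the tick marks on the plot).
    """
    ordered = sorted(zip(times, events))
    at_risk = len(ordered)
    survival = 1.0
    curve = []
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        deaths = leaving = 0
        # Group everyone whose follow-up ends at the same time t.
        while i < len(ordered) and ordered[i][0] == t:
            deaths += ordered[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve

def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower; None if not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Made-up follow-up times (months) and event indicators for one group.
curve = kaplan_meier([2, 3, 3, 5, 8, 10], [1, 1, 0, 1, 0, 1])
print(curve, median_survival(curve))
```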

To determine whether SCORPIO is specifically prognostic for ICI efficacy or generally prognostic for patients with cancer regardless of treatment, we analyzed a cohort of non-ICI-treated patients from MSKCC30. SCORPIO was able to prognosticate overall survival for non-ICI patients in some, but not all, cancer types (Supplementary Fig. 16). However, in contrast to the ICI treatment context, its prognostic accuracy decreased for specific cancers such as bladder cancer (P = 0.2713), hepatobiliary cancer (P = 0.1038), esophageal cancer (P = 0.8886) and ovarian cancer (P = 0.4305; Fig. 3a and Supplementary Figs. 10a, 13a and 16). These findings suggest that SCORPIO is more effective at prognosticating overall survival in the context of ICI treatment.

Model interpretability

To understand how each feature contributes to SCORPIO’s risk score prediction, we analyzed the relative effect of its 33 features in the training set using the SHapley Additive exPlanations (SHAP) approach44. SHAP quantified the contribution of each feature to patient-to-patient variation in ICI efficacy (Fig. 4a). The top five features contributing the most were chloride (CL), ALB, hemoglobin (HGB), ECOG-PS and eosinophil proportion among white blood cells (EOS%).
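The idea behind SHAP can be illustrated with an exact Shapley computation on a tiny model. This brute-force sketch enumerates every feature coalition, so it is only practical for a handful of features and is not how the SHAP approach would be run on a 33-feature ensemble. The function `toy_risk`, its coefficients, the feature values and the baseline are all arbitrary placeholders with no clinical meaning.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        contributions.append(total)
    return contributions

def toy_risk(features):
    """Hypothetical three-feature risk function with one interaction term."""
    a, b, c = features
    return 0.5 - 0.1 * a + 0.15 * b - 0.02 * a * c

patient = [1.2, 2.0, 0.5]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, patient, reference)
print(phi, sum(phi), toy_risk(patient) - toy_risk(reference))
```

The contributions sum to the difference between the patient's prediction and the baseline prediction, which is the property that lets per-feature values be read as shares of a risk score.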

Fig. 4. Model interpretability.

a, Global model explanation using a dot plot of aggregated SHAP values for SCORPIO features. A higher value in a feature with a negative aggregated SHAP value (yellow) lowers the risk score value, whereas a higher value in a feature with a positive aggregated SHAP value (purple) increases it. Features were sorted by absolute aggregated SHAP value. WBC, white blood cell; RBC, red blood cell; AGAP, anion gap; PROT, total protein; LYM%, lymphocyte proportion among WBCs; NEUT%, neutrophil proportion among WBCs; Smoking, smoking history; NEUT, neutrophil count; CREAT, creatinine; LYM, lymphocyte count; HCT, hematocrit; GLU, glucose; MONO, monocyte count; ALT, alanine aminotransferase; Age, age at ICI; AST, aspartate aminotransferase; MCHC, mean corpuscular HGB concentration; Stage, tumor stage at ICI; MLR, monocyte-to-lymphocyte ratio; RDW, red blood cell distribution width; ALK, alkaline phosphatase; BASO%, basophil proportion among WBCs; eGFR, estimated glomerular filtration rate; PLT, platelet; BILI, total bilirubin; BLR, basophil-to-lymphocyte ratio. b–e, Local model explanation in (b and c) two representative cases with a CR to ipilimumab/nivolumab and atezolizumab, respectively, and (d and e) two representative cases with PD to atezolizumab and pembrolizumab, respectively. Each case is depicted with a bar chart in the left panel, displaying the aggregated SHAP values that indicate the magnitude and the direction of each feature’s impact on the predicted risk score. The right panel shows pre- and post-treatment radiographic images. Feature values of the corresponding features in a given patient are provided in the bar charts. The best overall tumor response and survival of each patient are also shown. Density plots show the distribution of the risk scores in the training set, and black dashed lines indicate each patient’s predicted risk score. MSS, microsatellite stable.
The yellow arrows in b and c indicate liver and lung metastases, respectively, in the pre-immunotherapy scan (‘Pre-ICI’). The corresponding post-therapy scan (‘Post-ICI’) demonstrates complete response, with no visible lesions. The yellow lines in d represent the bidirectional diameters of malignant pleural effusion in pre- and post-immunotherapy scans, reflecting progressive disease despite treatment. The yellow dashed line in e outlines a new malignant pleural effusion that developed during ICI therapy, indicating progressive disease. f,g, Heatmaps displaying the association between 14 immune cell types and the top five features, along with the predicted risk score from SCORPIO in (f) patients with NSCLC and (g) patients with head and neck (H&N) cancer. NK, natural killer. Two-sided P values were calculated using the Spearman’s rank correlation test. * False discovery rate (FDR) adjusted P < 0.05. ** FDR adjusted P < 0.01. Number in each cell denotes Spearman’s ρ.

Figure 4b–e show representative patients from the hold-out test set with different risk scores and clinical responses. Each feature’s contribution varied in direction and magnitude based on its value and the values of other features, demonstrating the model’s complexity in predicting ICI efficacy for each patient.

Next, we investigated how the top five features and the predicted risk score reflect characteristics of the tumor microenvironment (TME). We gathered an additional cohort of 264 patients with NSCLC, with available bulk RNA-sequencing (RNA-seq), blood test values (performed on the date of, or no more than 30 days before, the tumor biopsy) and clinical data. Using the Danaher signature45, which was validated as the most accurate immune cell deconvolution method for NSCLC46, we deconvoluted 14 immune cell types. We then analyzed the correlations between their abundances and the levels of the top five features, as well as the predicted risk score (Fig. 4f). Our findings showed that higher ALB levels were associated with increased abundances of mast cells, T cells, B cells, CD45 cells and regulatory T cells. Conversely, lower ECOG-PS was linked to greater abundances of T cells, B cells, CD45 cells, exhausted CD8 cells and cytotoxic cells. Additionally, a lower predicted risk score (indicating better-predicted response to immunotherapy) corresponded with higher abundances of mast cells, T cells, B cells, CD45 cells, regulatory T cells, natural killer CD56 dim cells and Th1 cells. We further analyzed the association between the abundances of the 14 immune cell types and the levels of the top 5 features, as well as the predicted risk score, in patients with head and neck (H&N) cancer (n = 32) from the MSK-I cohort (Fig. 4g). Compared to the NSCLC cohort, there were fewer significant associations, likely due to the smaller sample size. However, very similar relationships were observed—various immune cell types were positively correlated with ALB levels, whereas ECOG-PS and predicted risk score were negatively correlated with many immune cell types. These results suggest that some features in SCORPIO reflect the TME status and a low predicted risk score corresponds to an immune-inflamed phenotype in patients.
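The correlation analysis can be sketched as Spearman's ρ between a feature and an immune cell abundance, followed by a false discovery rate adjustment across all tested pairs. The sketch assumes the Benjamini-Hochberg procedure, which the text does not name, and it takes P values as given rather than computing them. All numbers are made up.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman_rho(a, b):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def benjamini_hochberg(p_values):
    """FDR-adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # rank of this P value in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up feature levels and cell-type abundance scores for five samples.
feature_levels = [1, 2, 3, 4, 5]
cell_abundance = [5, 6, 7, 8, 7]
print(spearman_rho(feature_levels, cell_abundance))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```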

We also assessed whether the top five features correlated with TMB. Using patients from multiple MSKCC cohorts (n = 2,969), we found that TMB was generally not associated with the top five features or the risk scores, except in a few cancer types (Supplementary Fig. 17).

Model performance in the external test datasets

Among the clinical trial cohorts, SCORPIO achieved its highest performance in prognosticating overall survival at 6, 12, 18, 24 and 30 months in the IMvigor211 trial (bladder cancer) with a median AUC(t) of 0.782, and in predicting clinical benefit in IMspire150 trial (melanoma), with an AUC of 0.684 (Fig. 2). In each clinical trial cohort, the three risk groups showed significantly different overall survival rates (P < 0.0001 for each trial; Fig. 5a). Similarly, clinical benefit rates varied significantly across risk groups (P values: 0.043 for IMpower133, 0.0027 for IMpower131 (ACNP), 0.001 for IMpower131 (ACP), 0.0004 for IMpower150 (ABCP), 0.0003 for IMspire150 and OAK, 0.0001 for IMpower150 (ACP) and <0.0001 for the remaining trials; Fig. 5b). Importantly, these results were independent of sex, age and PD-L1 expression (Supplementary Fig. 18). In clinical trials, SCORPIO outperformed PD-L1 staining in predicting clinical benefit and overall survival, as indicated by various performance metrics (Supplementary Fig. 19).

Fig. 5. Performance of SCORPIO on the 10 global phase 3 clinical trial cohorts (external test sets).

a, Kaplan-Meier plots showing overall survival for the three risk groups stratified by SCORPIO in the 12 experimental arms from the 10 clinical trial cohorts. Tick marks indicate censored data. Black vertical and horizontal dashed lines represent the median survival time for each risk group. Two-sided P values were calculated using the log-rank test. Correction for multiple testing was not applied. HCC, hepatocellular carcinoma. b, Bar charts showing clinical benefit rates for the three risk groups stratified by SCORPIO in the 12 experimental arms from the 10 clinical trial cohorts. Two-sided P values were calculated using the Fisher’s exact test. Correction for multiple testing was not applied.

To further test model generalizability, we analyzed a real-world cohort of patients treated at a large, comprehensive health system (MSHS), encompassing a diverse patient population. In this cohort, SCORPIO prognosticated overall survival at 6, 12, 18, 24 and 30 months following ICI with a median pan-cancer AUC(t) of 0.725 (Fig. 2), and the three risk groups had significantly different overall survival after ICI administration (Fig. 6). Across tumor types, HRs for death in the low-risk and moderate-risk groups compared to the high-risk group were 0.25 (95% CI, 0.18–0.34) and 0.41 (95% CI, 0.33–0.50), respectively. Importantly, all these results were independent of the line of therapy in which ICI was administered, sex, age and ECOG-PS (Supplementary Fig. 20).

Fig. 6. Performance of SCORPIO on the MSHS cohort (external test set).

Kaplan-Meier plots showing overall survival for the three risk groups stratified by SCORPIO. Tick marks indicate censored data. Black vertical and horizontal dashed lines represent the median survival time for each risk group. Two-sided P values were calculated using the log-rank test. Correction for multiple testing was not applied.

Model performance comparison across cohorts and tumor types

SCORPIO performed better in prognosticating overall survival in real-world cohorts compared to phase 3 clinical trials across most cancer types (Fig. 2). For example, in bladder cancer, the median AUC(t) in real-world cohorts was 0.809 (across all time points and cohorts), outperforming the 0.782 observed in the IMvigor211 trial. Similarly, for hepatobiliary cancer, the median AUC(t) in real-world data reached 0.746, surpassing the 0.704 reported in the IMbrave150 trial. Notably, SCORPIO demonstrated the strongest performance in RCC from the real-world cohorts, with a median AUC(t) of 0.829, higher than the 0.668 observed in the IMmotion151 trial.

The model performed better in real-world cohorts, likely due to the broader range of patient characteristics, cancer types and treatment environments in the training data. This also suggests that the model effectively captures the complexities and variations found in everyday clinical practice, enhancing its applicability for predicting the efficacy of ICIs in diverse patient populations. Notably, the model performed better at prognosticating overall survival than predicting clinical benefit across most cancer types and cohorts, likely reflecting the robustness of overall survival as a reliable clinical endpoint, which is often prioritized in oncology for its clear and objective outcomes compared to clinical benefit.

Furthermore, our analysis revealed that SCORPIO’s performance in predicting clinical benefit is not uniform across different cancer types (Fig. 2). To understand the variability in performance across cancer types, we compared SHAP values between cancer-type-specific models and SCORPIO. This analysis revealed key features that SCORPIO’s pan-cancer modeling approach may have overlooked. Our findings showed some variability in the importance of specific features in different cancer types within the SCORPIO model (Supplementary Fig. 21). For example, although SHAP analyses indicated that ALB and HGB were important in SCORPIO, their importance was reduced in cancer-type-specific models, particularly in bladder cancer, ovarian cancer, H&N cancer and NSCLC. Additionally, features such as viral infection, relevant in H&N cancer due to human papillomavirus status, and platelet count, influential in both H&N cancer and melanoma, highlight the possibly unique biological characteristics of each cancer type. These variations suggest that SCORPIO’s pan-cancer approach may not fully capture the cancer-type-specific importance of certain features. Nevertheless, SCORPIO outperformed cancer-type-specific models in predicting overall survival and clinical benefit, demonstrating its robustness and generalizability (Supplementary Fig. 8). In the future, targeted refinements that incorporate cancer-specific features may further enhance SCORPIO’s accuracy in predicting clinical benefit, balancing the need for generalizability with the precision required for specific cancer types.

Discussion

There is an important clinical need to develop universally accessible biomarkers to predict patient response to ICIs. Currently, available genomic and immunological assays are not widely accessible globally. In this study, we describe SCORPIO, a machine learning model that relies on routine blood tests and basic clinical data to predict clinical outcomes after ICI administration more effectively than existing FDA-approved biomarkers like TMB and PD-L1 immunohistochemistry.

Our data were collected from two centers and 10 global phase 3 clinical trials, totaling 9,745 patients across 21 cancer types, representing the largest dataset in cancer immunotherapy to date. The MSHS cohort consists of patients with diverse backgrounds from outpatient centers across New York City. Compared to the MSKCC cohort and clinical trial cohorts, the MSHS cohort is more heterogeneous regarding ethnicity, socioeconomic status, comorbidity and health literacy. Despite this heterogeneity, we found consistent results across the MSHS, MSKCC and clinical trial cohorts. Importantly, risk group stratification was based on generalized cutoffs that prognosticated patient outcomes across cancer types.

SCORPIO outperformed TMB and PD-L1 staining in predicting ICI efficacy. PD-L1 immunohistochemistry is not universally available and is performed using various platforms, antibodies and quality assurance practices47,48. TMB estimation requires resource-intensive genomic profiling, and measured TMB varies across genomic panels due to differences in panel size, gene content and bioinformatics pipelines49,50.

Our study has some limitations. The training set was retrospectively collected over several years from MSKCC, leading to a prevalence of certain cancer types such as NSCLC, melanoma, bladder cancer and RCC. Despite this, we showed that the model could predict clinical benefit and survival across multiple external datasets from another medical center and global clinical trials. The diversity of these external datasets introduces heterogeneity but also confirms the model’s generalizability. However, the model’s performance on less common cancer types included in the ‘Others’ group should be further tested on larger datasets.

Although SCORPIO maintained consistent performance in prognosticating overall survival across various cohorts, its ability to predict clinical benefit from ICI varied across different cancer types and cohorts. This finding suggests that SCORPIO is reliable for survival prognostication but faces challenges in accurately predicting clinical benefit.

In terms of clinical endpoints, overall survival is generally regarded as the most reliable and objective endpoint for assessing treatment efficacy in oncology51,52 and is a key metric used by regulatory agencies when approving new cancer drugs53. The case of bevacizumab in metastatic breast cancer highlights this, as its initial approval based on progression-free survival was later revoked due to a lack of overall survival improvement54,55. Given SCORPIO’s strong predictive performance for overall survival, we conclude that it effectively provides prognostic insights on patient survival when treated with ICI drugs. SCORPIO’s more modest performance in predicting tumor response is expected in the context of immunotherapy, especially ICIs, where the link between tumor response and patient survival can be weak, influenced by factors like pseudoprogression, delayed responses and development of new lesions followed by responses56,57. A meta-analysis by Kaufman et al. found that 75% (6/8) of randomized trials of ICI drugs showing improved overall survival lacked improvements in progression-free survival58, underscoring the weak relationship between overall survival and surrogate endpoints like progression-free survival or objective response rate59. Nonetheless, evaluating surrogate measures such as tumor response alongside overall survival can provide a more comprehensive understanding of treatment impact, including earlier signals of anti-tumor activity. Future iterations of SCORPIO will offer improved prediction of surrogate measures such as tumor response as well as overall survival, as additional data become available.

Despite its limitations, SCORPIO remains a highly accessible model for predicting ICI efficacy and can aid clinical decision-making when used alongside other assessments like TMB, PD-L1 staining and MSI status. It can help prioritize treatment options between ICI, cytotoxic and targeted therapies, assess the risk-benefit ratio of ICI for patients at risk of immune-related adverse events and guide clinical trial design by selecting or enriching patients more or less likely to benefit from ICIs.

In conclusion, we developed and tested SCORPIO, a machine learning model that predicts outcomes for patients with cancer treated with ICIs. SCORPIO’s key advantage is its accessibility in all practice settings, including low-resource healthcare environments. All features in SCORPIO are routinely collected in hospitals and clinics worldwide and can be accessed via patient clinical records, making our approach noninvasive, cost-effective and globally accessible. Further investigation is required to prospectively validate the use of our model in various clinical settings.

Methods

Ethics approval

The study protocol was approved by the institutional review boards at Icahn School of Medicine at Mount Sinai and MSKCC, and informed consent was obtained from all patients.

Cohorts description

Cohort description for MSK-I cohort

We retrospectively assembled a real-world cohort of 3,278 patients who were treated with at least one dose of ICI from 2014 through 2019 at MSKCC (Supplementary Fig. 2a). We excluded 818 patients with a history of more than one cancer, 26 patients who were enrolled in blinded trials, 115 patients with cancer types with fewer than 25 cases and 184 patients with inadequate clinical or laboratory data. We also excluded 100 patients who received ICI in a neoadjuvant or adjuvant setting. As a result, the MSK-I cohort consisted of 2,035 patients across 17 cancer types (Extended Data Table 1). Of the 2,035 patients, the median age was 63.50 years (IQR 54.77–70.92 years), and 1,164 (57.20%) were male. Of the total, 638 patients (31.35%) were treated with ICI as the first line of therapy. The most abundant cancer types were NSCLC (n = 666, 32.73%), RCC (n = 229, 11.25%), melanoma (n = 210, 10.32%), H&N cancer (n = 168, 8.26%) and bladder cancer (n = 111, 5.45%) (Supplementary Fig. 1a).

Clinical features for MSK-I cohort

A detailed description of all features and their units is provided in Supplementary Table 1. Two features were collected for demographic data (age and sex), and eight features were collected for clinical data (BMI, drug class, chemotherapy during immunotherapy, systemic therapy history (PreChemo), ECOG-PS, smoking history, tumor stage and viral infection). From blood tests, 47 features were initially collected: 17 features from comprehensive metabolic panel (CMP; ALB, alkaline phosphatase, alanine aminotransferase, anion gap, aspartate aminotransferase, blood urea nitrogen, calcium, CL, carbon dioxide, creatinine, estimated glomerular filtration rate (eGFR), glucose, potassium, bilirubin, total protein, magnesium and phosphorus), 21 features from complete blood count (CBC; white blood cell (WBC) count, basophil count, eosinophil count, granulocyte count, lymphocyte count, monocyte count, neutrophil count, basophil proportion among WBCs, EOS%, granulocyte proportion among WBCs, lymphocyte proportion among WBCs, monocyte proportion among WBCs, neutrophil proportion among WBCs, hematocrit, HGB, mean corpuscular HGB concentration, mean corpuscular HGB, mean corpuscular volume, platelet, red blood cell and red blood cell distribution width), 3 features from coagulation panel (activated partial thromboplastin time, international normalized ratio and prothrombin time), conjugated bilirubin, direct bilirubin, glucose-6-phosphate dehydrogenase, ionized calcium, lactate dehydrogenase and lipase. 
Among these, 13 features, which had ≥70% missing values across the patients in the cohort, were removed from the subsequent analyses: 2 features from CMP (magnesium and phosphorus), 2 features from CBC (granulocyte count and granulocyte proportion among WBCs), and all features from coagulation panel (activated partial thromboplastin time, international normalized ratio and prothrombin time), conjugated bilirubin, direct bilirubin, glucose-6-phosphate dehydrogenase, ionized calcium, lactate dehydrogenase and lipase.

Then, four immune cell-to-lymphocyte ratios were manually calculated as the absolute count of each immune cell type divided by the absolute count of lymphocytes: basophil-to-lymphocyte ratio, eosinophil-to-lymphocyte ratio, monocyte-to-lymphocyte ratio and neutrophil-to-lymphocyte ratio (NLR). These immune cell-to-lymphocyte ratios were considered part of the CBC.
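As an illustration of the ratio definition above, a minimal sketch follows. The function name `cell_to_lymphocyte_ratios`, the dictionary keys, the ratio labels other than NLR and the example counts are hypothetical and are not taken from the study code. Returning `None` when the lymphocyte count is zero is also an assumption, because the text does not describe that case.

```python
from typing import Dict, Optional

# Hypothetical mapping from input key to ratio label (only "NLR" appears in the text)
RATIO_LABELS = {
    "basophil": "BLR",
    "eosinophil": "ELR",
    "monocyte": "MLR",
    "neutrophil": "NLR",
}

def cell_to_lymphocyte_ratios(counts: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Divide each absolute immune cell count by the absolute lymphocyte count."""
    lymphocytes = counts["lymphocyte"]
    ratios: Dict[str, Optional[float]] = {}
    for cell, label in RATIO_LABELS.items():
        # Assumption: report None instead of dividing by a zero lymphocyte count
        ratios[label] = counts[cell] / lymphocytes if lymphocytes > 0 else None
    return ratios

# Made-up absolute counts, all in the same unit
example_counts = {
    "basophil": 0.05,
    "eosinophil": 0.2,
    "monocyte": 0.5,
    "neutrophil": 4.0,
    "lymphocyte": 2.0,
}
ratios = cell_to_lymphocyte_ratios(example_counts)
```

All counts must share one unit for the ratios to be meaningful.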

In total, there were 48 features from four types of data modalities: demographic (n = 2), clinical (n = 8), CMP (n = 15) and CBC (n = 23). All clinical features were collected before the first ICI infusion (performed on the date of, or no more than 30 days before, the first ICI infusion). For eGFR, results are reported without race adjustment. Tumors were staged at the time of ICI administration following the guidelines from the American Joint Committee on Cancer, 8th edition60 (with the exception of primary central nervous system (CNS) malignancies, which were not staged).

TMB data from the MSK-IMPACT31 next-generation sequencing assay, which is approved by the FDA as a tumor profiling method, were available. TMB was defined as the total number of somatic nonsynonymous mutations per megabase (mut/Mb). For the subgroup analyses, patients with TMB ≥ 10 and TMB < 10 were defined as TMB-High and TMB-Low groups, respectively. MSI status was evaluated using MSIsensor61 with the following criteria: stable (0 ≤ MSI score < 3), indeterminate (3 ≤ MSI score < 10) and unstable (MSI score ≥ 10).
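The TMB and MSI thresholds stated above can be expressed as two small functions. This is only a sketch: the function names are hypothetical, and rejecting negative MSI scores is an assumption based on the stated lower bound of 0.

```python
def tmb_group(tmb_mut_per_mb: float) -> str:
    """TMB-High if TMB >= 10 mut/Mb, otherwise TMB-Low (thresholds from the text)."""
    return "TMB-High" if tmb_mut_per_mb >= 10 else "TMB-Low"

def msi_status(msi_score: float) -> str:
    """Map an MSIsensor score to stable / indeterminate / unstable."""
    if msi_score < 0:
        # Assumption: scores below the stated lower bound are treated as invalid
        raise ValueError("MSI score must be non-negative")
    if msi_score < 3:
        return "stable"
    if msi_score < 10:
        return "indeterminate"
    return "unstable"
```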

Before performing feature selection and training the machine learning algorithms, we imputed missing values in the MSK-I cohort using MissForest62 from the missingpy package (v.0.2.0) with default parameters (max_iter=10, decreasing=False, missing_values=np.nan, copy=True, n_estimators=100, criterion = (‘mse’, ‘gini’), max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features = ‘auto’, max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=True, oob_score=False, n_jobs = -1, random_state=None, verbose=0, warm_start=False, class_weight=None) in Python 3.8.8 (https://www.python.org/). The average number of missing values across the 48 features was 0.70 per patient. After missing value imputation, we randomly split the MSK-I cohort in an 80:20 ratio into the training set (n = 1,628) and the hold-out test set (n = 407). To avoid any potential bias in model performance between the training and hold-out test sets, we kept the same distribution of tumor response, cancer types and systemic therapy history between the two sets. The split was performed with the group_by and sample_frac functions from the dplyr (v.1.1.4) and tidyverse (v.2.0.0) packages using the R programming language version 4.1.1 (https://www.r-project.org/).
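The stratified split described above was done in R with group_by and sample_frac. The sketch below shows the same idea in plain Python for illustration only. The function name `stratified_split`, the record keys, the seed and the toy records are hypothetical, and the rounding of each stratum's training size is an assumption.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def stratified_split(
    records: List[Dict], strata_keys: Sequence[str], train_frac: float = 0.8, seed: int = 0
) -> Tuple[List[Dict], List[Dict]]:
    """Sample train_frac of the records within every stratum; the rest form the test set."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for idx, rec in enumerate(records):
        strata[tuple(rec[k] for k in strata_keys)].append(idx)
    train_idx = set()
    for members in strata.values():
        # Assumption: the per-stratum training size is rounded to the nearest integer
        n_train = round(len(members) * train_frac)
        train_idx.update(rng.sample(members, n_train))
    train = [r for i, r in enumerate(records) if i in train_idx]
    test = [r for i, r in enumerate(records) if i not in train_idx]
    return train, test

# Toy data: two strata of 50 patients each (values are made up)
records = (
    [{"benefit": 1, "cancer": "NSCLC", "prechemo": 1} for _ in range(50)]
    + [{"benefit": 0, "cancer": "melanoma", "prechemo": 0} for _ in range(50)]
)
train, test = stratified_split(records, ["benefit", "cancer", "prechemo"])
```

Sampling within each stratum keeps the joint distribution of the stratifying variables approximately equal in both sets.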

Cohort description for MSK-II cohort

We retrospectively collected an additional real-world cohort with 3,159 patients who were treated with at least one dose of ICI from 2011 through 2020 at MSKCC to further test our model internally, drawn from patients captured under a broader time period than the MSK-I cohort and contemporaneous patients not undergoing tumor genomic sequencing (Supplementary Fig. 2b). We excluded 660 patients with a history of more than one cancer, 14 patients who were enrolled in blinded trials, 65 patients with cancer types with fewer than 10 cases and 184 patients with inadequate clinical or laboratory data. We also excluded 132 patients who received ICI in a neoadjuvant or adjuvant setting. As a result, the MSK-II cohort consisted of 2,104 patients across 19 cancer types (Extended Data Table 1 and Supplementary Fig. 2b). Of the 2,104 patients, the median age was 67.13 years (IQR 58.59–74.33 years), and 1,180 (56.08%) were male. Of the total, 1,189 patients (56.51%) were treated with ICI as the first line of therapy. The most abundant cancer types were NSCLC (n = 755, 35.88%), bladder cancer (n = 156, 7.41%), RCC (n = 154, 7.32%), melanoma (n = 151, 7.18%) and SCLC (n = 137, 6.51%) (Supplementary Fig. 1d).

Clinical features of MSK-II cohort

For the MSK-II cohort, four types of data modalities required for SCORPIO were collected: demographic (n = 1), clinical (n = 4), CMP (n = 11) and CBC (n = 17) (for details about the 33 features required for SCORPIO, see ‘Feature selection analysis’ section). All clinical features except for tumor stage were retrieved (performed on the date of, or no more than 30 days before, the first ICI infusion). Tumors were staged at diagnosis following the guidelines from the American Joint Committee on Cancer, 8th edition (except for primary CNS malignancies, which were not staged). For eGFR, results are reported without race adjustment. In the MSK-II cohort, 934 patients (44.39%) underwent MSK-IMPACT sequencing. Hence, TMB data were only available for this subset of patients. For subgroup analyses, patients with TMB ≥ 10 and TMB < 10 were defined as TMB-High and TMB-Low groups, respectively. MSI status was evaluated using MSIsensor with the following criteria: stable (0 ≤ MSI score < 3), indeterminate (3 ≤ MSI score < 10) and unstable (MSI score ≥ 10).

We imputed missing values using MissForest from the missingpy package with default parameters after combining this cohort with the training set into a single data frame. The average number of missing variables across the 33 features was 0.57 per patient.

Cohort description for MSK non-ICI cohort

For the MSK non-ICI cohort, 6,629 patients not treated with ICI were derived from a previous study30 (Extended Data Table 4). The median age in this cohort was 61.15 years (IQR 50.83–69.55 years), and 2,912 (43.93%) were male. The most abundant cancer types were NSCLC (n = 1,160, 17.50%), colorectal cancer (n = 1,124, 16.96%), breast cancer (n = 820, 12.37%), pancreatic cancer (n = 753, 11.36%) and sarcoma (n = 541, 8.16%).

Clinical features of MSK non-ICI cohort

For the MSK non-ICI cohort, four types of data modalities required for SCORPIO were collected: demographic (n = 1), clinical (n = 4), CMP (n = 11) and CBC (n = 17). All clinical features were collected at the time of diagnosis. Tumors were also staged at diagnosis following the guidelines from the American Joint Committee on Cancer, 8th edition (with the exception of primary CNS malignancies, which were not staged). For eGFR, results are reported without race adjustment.

We imputed missing values using MissForest from the missingpy package with default parameters after combining this cohort with the training set into a single data frame.

Cohort description for MSHS cohort

An additional retrospective real-world cohort from MSHS was collected to test whether the prognostic power of our framework was generalizable to a different healthcare setting (Supplementary Fig. 2c). In the MSHS cohort, we identified 1,230 patients who were treated with at least one dose of ICI from 2011 through 2019. We excluded one patient who was enrolled in a blinded trial, 26 patients treated with ICI for hematologic malignancies, 16 patients with cancer types of fewer than 10 cases and 28 patients with inadequate clinical or laboratory data. As a result, the MSHS cohort consisted of 1,159 patients across 18 cancer types (Extended Data Table 1). Of 1,159 patients, the median age was 66.84 years (IQR 58.92–74.38 years), and 691 (59.62%) were male. Of the total, 551 patients (47.54%) were treated with ICI as the first line of therapy. The most abundant cancer types were hepatobiliary cancer (n = 304, 26.23%), NSCLC (n = 281, 24.25%), melanoma (n = 128, 11.04%), H&N cancer (n = 94, 8.11%) and bladder cancer (n = 66, 5.69%) (Supplementary Fig. 1e).

Clinical features of MSHS cohort

For the MSHS cohort, four types of data modalities that were required for SCORPIO were collected: demographic (n = 1), clinical (n = 4), CMP (n = 11) and CBC (n = 17). All clinical features were retrieved (performed on the date of, or no more than 30 days before, the first ICI infusion). For eGFR, results are reported without race adjustment. The following records of the 1,159 patients were manually reviewed for clinical data verification: ECOG-PS, cancer type, tumor stage, smoking history, drug type and systemic therapy history. Tumors were staged at the time of ICI administration following the guidelines from the American Joint Committee on Cancer, 8th edition60 (with the exception of primary CNS malignancies, which were not staged). We imputed missing values using MissForest from the missingpy package with default parameters after combining this cohort with the training set into a single data frame. The average number of missing variables across 33 features was 1.94 per patient.

Outcomes in real-world cohorts

Overall survival was calculated from the first ICI infusion to death from any cause, with patients alive at the time of review censored at their last contact. For patients who received multiple ICI treatments, the start date of the first treatment was used in the analysis. In the MSK-I cohort, both clinical benefit and overall survival data were available for all patients. In the MSK-II cohort, overall survival data were available for all patients, but only 934 patients (44.39%) had clinical benefit data. For the MSK non-ICI and MSHS cohorts, only overall survival data were available. The primary clinical outcomes were clinical benefit to ICI and overall survival after ICI. Clinical benefit was classified based on RECIST v1.1 (ref. 36). If formal RECIST reads were unavailable, the physician notes and imaging studies were manually reviewed by physician investigators to categorize the overall best response for each patient using the same criteria based on the change in the sum of diameters of target lesions. CR, PR and SD ≥6 months were classified as clinical benefit whereas SD <6 months and PD were classified as no clinical benefit. The rationale for using clinical benefit as a treatment effect outcome is derived from systematic reviews in the cancer immunotherapy context, which indicate that patients with SD ≥6 months have more similar overall survival outcomes to patients with tumor response categorized as minor PR, in contrast to patients with SD <6 months, who have overall survival outcomes more similar to patients experiencing PD35.
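The clinical benefit rule above can be sketched as a single function. The function name and argument names are hypothetical. The 0/1 coding follows the convention stated in the model construction section, and raising an error for an SD response without a duration is an assumption.

```python
from typing import Optional

def clinical_benefit(best_response: str, sd_duration_months: Optional[float] = None) -> int:
    """Return 1 for clinical benefit (CR, PR, SD >= 6 months) and 0 otherwise (SD < 6 months, PD)."""
    response = best_response.upper()
    if response in ("CR", "PR"):
        return 1
    if response == "SD":
        if sd_duration_months is None:
            # Assumption: SD cannot be classified without its duration
            raise ValueError("SD requires the duration of stable disease in months")
        return 1 if sd_duration_months >= 6 else 0
    if response == "PD":
        return 0
    raise ValueError(f"Unrecognized best overall response: {best_response}")
```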

Cohort description for clinical trial cohorts

We obtained data from 10 phase 3 clinical trials20–29 for further external testing: IMbrave150 (n = 279)20, IMspire150 (n = 256)21, IMmotion151 (n = 445)22, IMvigor211 (n = 447)23, IMpower133 (n = 197)24, IMpower130 (n = 467)25, IMpower131 (n = 680)26, IMpower132 (n = 288)27, IMpower150 (n = 793)28 and OAK (n = 595)29 (Extended Data Tables 2 and 3 and Supplementary Fig. 3). There were six different cancer types across the 10 cohorts: hepatocellular carcinoma (IMbrave150), BRAFV600E-positive melanoma (IMspire150), RCC (IMmotion151), bladder cancer (IMvigor211), SCLC (IMpower133) and NSCLC (IMpower130, IMpower131, IMpower132, IMpower150 and OAK). Patient characteristics are provided in Extended Data Table 3.

Extended Data Table 2. Clinical trial information.

For eight clinical trials, at least one additional drug was administered in addition to atezolizumab (anti-PD-L1): 1) atezolizumab plus bevacizumab for IMbrave150, 2) atezolizumab plus vemurafenib and cobimetinib for IMspire150, 3) atezolizumab plus bevacizumab for IMmotion151, 4) atezolizumab plus carboplatin and etoposide for IMpower133, 5) atezolizumab plus carboplatin and nanoparticle albumin-bound paclitaxel (nab-paclitaxel) for IMpower130, 6) atezolizumab plus carboplatin and nab-paclitaxel or paclitaxel (ACNP or ACP) for IMpower131, 7) atezolizumab plus pemetrexed and carboplatin or cisplatin for IMpower132 and 8) atezolizumab plus carboplatin and paclitaxel with or without bevacizumab (ABCP or ACP) for IMpower150. For two clinical trials, only atezolizumab was administered: IMvigor211 and OAK. All analyses were performed based on the intention-to-treat principle. Therefore, 12 experimental arms were subjected to the external testing analysis (Extended Data Table 3).

Clinical features of clinical trial cohorts

All patients with baseline laboratory test results were analyzed (results with “Y” flag in the LBBLFL column from the laboratory test file shared by Roche). For the clinical trial data, we imputed missing values using MissForest from the missingpy package with default parameters after combining each cohort with the training set into a single data frame. Among the 33 features used in SCORPIO, eGFR, mean corpuscular HGB concentration and red blood cell distribution width were unavailable for all clinical trial cohorts. Smoking history was unavailable for IMspire150 and IMmotion151. In addition, total protein was unavailable for IMspire150. The average number of missing values across the 33 features per patient in each clinical trial was as follows: 3.50 for IMbrave150, 5.30 for IMspire150, 4.84 for IMmotion151, 3.48 for IMvigor211, 3.54 for IMpower133, 3.45 for IMpower130, 3.31 for IMpower131, 3.44 for IMpower132, 3.28 for IMpower150 and 3.35 for OAK.

In the clinical trial cohorts, PD-L1 immunostaining results using the SP142 or SP263 clones (Ventana Medical Systems) were available. The SP263 clone data was available for IMbrave150 and IMpower133, and the rest of the clinical trials had the SP142 clone data available. The raw PD-L1 immunostaining values from the immune cell (IC) or tumor cell (TC) were available for IMbrave150, IMmotion151 (only raw IC value was available), IMpower133, IMpower130, IMpower131, IMpower132, IMpower150 and OAK. The raw PD-L1 staining values for IC and TC were unavailable for IMspire150 and IMvigor211, but categorical group information based on the PD-L1 staining levels was available (IC0/1/2/3 and TC0/1/2/3). To categorize patients based on the PD-L1 expression level, we applied the FDA-approved cutoffs on the clinical trials with NSCLC (IMpower130, IMpower131, IMpower132, IMpower150 and OAK; PD-L1 expression in ≥ 50% TC or ≥ 10% IC (PD-L1-High group) and < 50% TC and < 10% IC (PD-L1-Low group)) and bladder cancer (IMvigor211; PD-L1 expression in ≥ 5% IC (PD-L1-High group) and < 5% IC (PD-L1-Low group)). In cancer types without the FDA-approved cutoff, we used the same criteria from the original publications: IMbrave150 (PD-L1 expression in ≥ 1% TC or ≥ 1% IC (PD-L1-High group) and < 1% TC and < 1% IC (PD-L1-Low group))20, IMspire150 (PD-L1 expression in ≥ 1% IC (PD-L1-High group) and < 1% IC (PD-L1-Low group))21, IMmotion151 (PD-L1 expression in ≥ 1% IC (PD-L1-High group) and < 1% IC (PD-L1-Low group))22 and IMpower133 (PD-L1 expression in ≥ 5% TC or ≥ 5% IC (PD-L1-High group), ≥ 1% TC or ≥ 1% IC (PD-L1-Mid group), and < 1% TC and < 1% IC (PD-L1-Low group))24.
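Two of the PD-L1 grouping rules above are sketched here for illustration: the cutoff applied to the NSCLC trials and the three-level scheme used for IMpower133. The function names are hypothetical. Checking the High criterion before the Mid criterion in the IMpower133 rule is an assumption about how the overlapping thresholds are resolved.

```python
def pdl1_group_nsclc(tc_percent: float, ic_percent: float) -> str:
    """NSCLC trials: High if >= 50% TC or >= 10% IC, otherwise Low."""
    if tc_percent >= 50 or ic_percent >= 10:
        return "PD-L1-High"
    return "PD-L1-Low"

def pdl1_group_impower133(tc_percent: float, ic_percent: float) -> str:
    """IMpower133: High (>= 5% TC or IC), Mid (>= 1% TC or IC), Low (< 1% TC and IC)."""
    # Assumption: High takes precedence over Mid when both criteria are met
    if tc_percent >= 5 or ic_percent >= 5:
        return "PD-L1-High"
    if tc_percent >= 1 or ic_percent >= 1:
        return "PD-L1-Mid"
    return "PD-L1-Low"
```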

Outcomes in clinical trial cohorts

Overall survival was defined as the time from randomization to death from any cause. Patients alive at the time of the last follow-up were censored. For the clinical benefit, we used the best confirmed overall response by investigators. CR, PR and SD ≥6 months were classified as clinical benefit whereas SD <6 months and PD were classified as no clinical benefit. In the clinical trial protocols, patients who did not have post-baseline imaging for RECIST v1.1 evaluation (data missing, not available (NA), or not evaluated (NE)) were classified as non-responders and were therefore categorized in the no clinical benefit group in this analysis.

Machine learning model construction

Feature selection analysis

In the MSK-I cohort, 68.65% of the patients (n = 1,397) received systemic therapy as a first-line treatment before ICI. Because medications used for systemic therapy can influence the measurement of blood cell counts63–65, metabolic compositions66, or BMI67, the impact of the systemic therapy history was investigated first. Using the training set, we first tested if there was any bias in the collected data toward PreChemo. Of the total of 47 features with a missing value of less than 30% across patients in the MSK-I cohort, 7 features (age, sex, chemotherapy during immunotherapy, virus, drug class, smoking and stage) were excluded from this analysis, as they were not affected by the systemic therapy history (Supplementary Table 1). Therefore, we tested the association of 40 features (15 CMP, 23 CBC and two clinical features) with PreChemo. We found that 75.00% (30 out of 40) of the features exhibited significantly different values with respect to PreChemo in the training set (Supplementary Fig. 22). Therefore, multivariable analyses adjusting for PreChemo were performed when selecting the features associated with the two target variables (overall survival and clinical benefit).

Feature selection analyses were performed on the training set. We used a Cochran-Mantel-Haenszel test to find the association between features and clinical benefit (clinical benefit and no clinical benefit). Before applying a Cochran-Mantel-Haenszel test, we dichotomized each continuous feature based on a cutoff by its median value from the training set. We used a Cox proportional-hazard regression to identify the variables associated with overall survival. The continuous values were directly subjected to the analysis in a Cox regression. Both tests were conducted, adjusting for PreChemo as a confounding factor. We selected features with significant false discovery rate (FDR) adjusted P values (< 0.05) for each corresponding outcome. The FDR method was applied separately to the P values from the Cochran-Mantel-Haenszel test and the Cox proportional-hazard regression test. As a result, we identified 22 variables significantly associated with clinical benefit, and 33 variables showed significant associations with overall survival (Supplementary Fig. 4 and Supplementary Table 2).
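The text reports FDR-adjusted P values but does not name the adjustment procedure. The sketch below assumes the Benjamini-Hochberg method, which is a common choice, and applies the 0.05 threshold from the text. The function name and the example P values are hypothetical.

```python
from typing import List

def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the original order (assumed FDR method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up P values for four features
raw_p = [0.001, 0.02, 0.03, 0.5]
adj_p = benjamini_hochberg(raw_p)
selected = [i for i, p in enumerate(adj_p) if p < 0.05]
```

As in the text, the adjustment would be run separately on the P values from each test, once for clinical benefit and once for overall survival.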

Model construction

The first step in constructing SCORPIO-CB was to train three different classifiers, including one classic classifier (ridge logistic regression (RLR)37) and two machine learning classifiers that generally performed best among 179 classifiers68 (support vector machine (SVM)38, and random forest (RF)39), using the 22 variables from the feature selection analysis (Supplementary Fig. 4b). We implemented the three classifiers using the scikit-learn69 (v.1.2.2) package. The target outcome, clinical benefit, was coded as a dummy variable: 0 and 1 for no clinical benefit (SD <6 months and PD) and clinical benefit (CR, PR and SD ≥6 months), respectively. For hyperparameter tuning with five-fold CV, we used the GridSearchCV function for the SVM and the RF, whereas the LogisticRegressionCV function was applied for the RLR. The optimal hyperparameters for each algorithm were selected based on the highest average AUC value across folds. All tested and optimal hyperparameters of each algorithm are provided (Supplementary Table 3).

For SCORPIO, one classic survival model (ridge Cox regression (RCOX)40), and two survival models corresponding to SVM and RF (fast survival SVM (FSSVM)41, and random survival forest (RSF)42), were trained with hyperparameter tuning by five-fold CV using the selected 33 variables (Supplementary Fig. 4a). We used the scikit-survival70 (v.0.20.0) package for the three survival models. The target outcome, overall survival, was coded as two fields with the survival time (in months) and the survival status (0 and 1 for censored and deceased, respectively). For hyperparameter tuning, we used a custom script that performs grid search analysis with five-fold CV. The best hyperparameters were selected based on the highest average C-index value across folds. C-index was calculated by the concordance_index_censored function from the scikit-survival package. All tested and optimal hyperparameters of each algorithm are provided (Supplementary Table 3).
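To make the tuning criterion concrete, the following is a simplified pure-Python sketch of Harrell's concordance index. It is not the scikit-survival implementation used in the study. The function name and example values are hypothetical, and pairs with tied survival times are simply skipped here, which is a simplifying assumption.

```python
from typing import Sequence

def harrell_c_index(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Fraction of comparable patient pairs in which the earlier death has the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed death
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half-concordant
    if comparable == 0:
        raise ValueError("No comparable pairs")
    return concordant / comparable

# Made-up data: survival time in months, status (1 = deceased, 0 = censored), risk score
c_index = harrell_c_index([5, 10, 15, 20], [1, 0, 1, 0], [0.9, 0.4, 0.05, 0.1])
```

In this toy example three of the four comparable pairs are ordered correctly, so the C-index is 0.75.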

For the RCOX and the RLR, we normalized the feature values using the Z-score method before running the algorithms71. The original value $x_{i,j}$ of feature $i$ in the $j$th sample was transformed to $Z_{i,j}$ as follows:

$$Z_{i,j} = \frac{x_{i,j} - \mu_i}{\sigma_i}$$

where $\mu_i$ and $\sigma_i$ denote the mean and the standard deviation of feature $i$ across the samples in the training set, respectively. For the test sets, the mean and standard deviation from the training set were used.
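The key point of this normalization is that the mean and standard deviation are estimated on the training set only and then reused for every test set. A minimal sketch follows. The function names and values are hypothetical, and the use of the population standard deviation is an assumption because the text does not state which estimator was used.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

def fit_zscore(train_values: Sequence[float]) -> Tuple[float, float]:
    """Estimate the mean and standard deviation of one feature on the training set."""
    mu = mean(train_values)
    sigma = pstdev(train_values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("Feature is constant in the training set")
    return mu, sigma

def apply_zscore(values: Sequence[float], mu: float, sigma: float) -> List[float]:
    """Transform values with statistics that were fitted on the training set."""
    return [(x - mu) / sigma for x in values]

# Made-up feature values
mu, sigma = fit_zscore([2, 4, 4, 4, 5, 5, 7, 9])
test_z = apply_zscore([5, 9, 1], mu, sigma)
```

Reusing the training statistics avoids leaking information from the test sets into the model inputs.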

To generate unweighted ensemble models, we averaged the risk scores generated by the three algorithms, as in a previous study18. For SCORPIO, we first applied a min-max normalization before averaging the values, because the three survival models produced risk scores on different scales: RCOX (from −1.14 to 2.05 in the training set), FSSVM (from −2.83 to −1.07 in the training set) and RSF (from 103.06 to 1,628.80 in the training set). The scaled risk score ($\mathrm{risk\_score}'_j$) in the $j$th sample was calculated from the original risk score ($\mathrm{risk\_score}_j$) as follows:

$$\mathrm{risk\_score}'_j = \frac{\mathrm{risk\_score}_j - \mathrm{min}_{\mathrm{train}}}{\mathrm{max}_{\mathrm{train}} - \mathrm{min}_{\mathrm{train}}}$$

where $\mathrm{min}_{\mathrm{train}}$ and $\mathrm{max}_{\mathrm{train}}$ represent the minimum and maximum risk scores across the samples in the training set, respectively. This approach transforms the raw risk scores of each survival model into values from 0 to 1 in the training set.

In the test sets, we calculated $\mathrm{risk\_score}'_j$ using the minimum ($\mathrm{min}_{\mathrm{train}}$) and maximum ($\mathrm{max}_{\mathrm{train}}$) values from the training set. In each survival model, this step sometimes yields a $\mathrm{risk\_score}'_j$ greater than 1 for patients predicted to show an extremely poor response to ICI, whose $\mathrm{risk\_score}_j$ is greater than $\mathrm{max}_{\mathrm{train}}$. Conversely, it sometimes yields a $\mathrm{risk\_score}'_j$ less than 0 for patients predicted to show an extremely good response to ICI, whose $\mathrm{risk\_score}_j$ is less than $\mathrm{min}_{\mathrm{train}}$. For these patients, $\mathrm{risk\_score}'_j$ was set to 1 or 0 when the value was greater than 1 or less than 0, respectively. After applying this normalization step to each survival model, we averaged $\mathrm{risk\_score}'_j$ from the three survival models to generate the unweighted risk score in SCORPIO.
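The scaling, clipping and averaging steps can be sketched as follows. The training-set ranges are the ones reported in the text. The function names, the dictionary layout and the example patient are hypothetical and do not come from the study code.

```python
from typing import Dict

# Training-set minimum and maximum raw risk scores reported in the text
TRAIN_RANGES = {
    "RCOX": (-1.14, 2.05),
    "FSSVM": (-2.83, -1.07),
    "RSF": (103.06, 1628.80),
}

def scale_and_clip(raw: float, train_min: float, train_max: float) -> float:
    """Min-max scale with training-set bounds, then clip the result to [0, 1]."""
    scaled = (raw - train_min) / (train_max - train_min)
    return min(1.0, max(0.0, scaled))

def ensemble_risk_score(raw_scores: Dict[str, float]) -> float:
    """Unweighted average of the scaled risk scores from the three survival models."""
    scaled = [scale_and_clip(raw_scores[name], lo, hi) for name, (lo, hi) in TRAIN_RANGES.items()]
    return sum(scaled) / len(scaled)

# Made-up test patient: RCOX at the training maximum, FSSVM at the training minimum,
# and RSF above the training maximum (so it is clipped to 1)
score = ensemble_risk_score({"RCOX": 2.05, "FSSVM": -2.83, "RSF": 2000.0})
```

Here the three scaled scores are 1, 0 and 1, so the ensemble risk score is 2/3.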

In contrast to SCORPIO, we directly calculated the average of the predicted score from the three classifiers for SCORPIO-CB because the classifiers used here generated the output with the same scale of the predicted probability (from 0 to 1).

All analyses related to the model construction were conducted using Python 3.8.8.

Patient stratification and outcome comparison

For the primary analysis of prognosticating clinical outcomes, patients in the test sets were stratified into three risk groups. Patients were stratified using the risk scores according to the first quartile (0.24) and third quartile (0.47) of the risk scores observed in the training set: high-risk group (risk score ≥ 0.47), moderate-risk group (0.24 ≤ risk score < 0.47) and low-risk group (risk score < 0.24). The same cutoff values were used in all cohorts regardless of data source and cancer type. The distribution of risk scores and the number of patients in each risk group in each cohort are provided in Supplementary Fig. 5. To compare clinical benefit rates by risk group, Fisher’s exact test was performed. To compare overall survival by risk group, Cox proportional hazards regression and the log-rank test were used. A two-sided P < 0.05 was considered statistically significant. Kaplan-Meier plots, log-rank test P values and Cox proportional HRs were generated using the survminer package (v.0.4.9). For the real-world cohorts, we analyzed bladder cancer, hepatobiliary cancer, melanoma, NSCLC, RCC and SCLC separately because they were present in all our cohorts. The remaining cancer types were grouped as ‘other’ in each cohort and then analyzed. All statistical tests were performed with the R programming language version 4.1.1 (https://www.r-project.org/).
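The fixed cutoffs above translate directly into a small stratification function. This is only a sketch, and the function and constant names are hypothetical.

```python
LOW_CUTOFF = 0.24   # first quartile of training-set risk scores (from the text)
HIGH_CUTOFF = 0.47  # third quartile of training-set risk scores (from the text)

def risk_group(risk_score: float) -> str:
    """Assign a patient to the low-, moderate- or high-risk group."""
    if risk_score >= HIGH_CUTOFF:
        return "high"
    if risk_score >= LOW_CUTOFF:
        return "moderate"
    return "low"
```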

Comparing the prognostic performance of the machine learning models with TMB and PD-L1

In the hold-out test set, we selected the best model between SCORPIO and SCORPIO-CB for subsequent analyses on the MSK-II, clinical trials and MSHS cohorts. For this, we calculated AUC values to measure the performance for clinical benefit classification and AUC(t) values to measure the performance for prognosticating overall survival. We visualized the receiver operating characteristic (ROC) curves and calculated AUC values using the precrec package72 (v.0.14.4). AUC(t) values were calculated using the timeROC package73 (v.0.4).

In the MSKCC cohorts, the predictive power of TMB was also evaluated along with the two machine learning models. In the clinical trial cohorts, we also evaluated the predictive power of PD-L1 staining when raw immunostaining values from the IC or TC were available.

All analyses regarding the AUC and AUC(t) were performed with continuous values.

Comparing the performance of SCORPIO and other machine learning models

We compared SCORPIO’s performance with previously developed machine learning models for predicting ICI efficacy in patients with NSCLC43. The study by Vanguri et al.43 included 26 models across nine data categories: clinical, radiology, pathology, genomics, dynamic deep attention-based multiple-instance learning model with masking (DyAM) unimodal, DyAM bimodal, DyAM multi-modal (automated), DyAM multi-modal (with PD-L1 tumor proportion score) and multi-modal average. Each category’s best-performing model was subjected to the analysis. For a fair comparison with SCORPIO, we used data from 150 out of 237 patients from Vanguri et al., ensuring model scores from all nine models were available. We obtained model scores from “Source Data Extended Data Fig. 9” of Vanguri et al.’s publication. We re-evaluated overall survival and clinical benefit using RECIST v1.1 criteria, which is consistent with our study. Three patients with concurrent cancer diagnoses were included in analyses of index tumor response but not included in analyses for survival outcomes.

For visualization of ROC curves and calculation of AUC values, we used the precrec package. AUC(t) values were calculated using the timeROC package. All AUC and AUC(t) analyses were conducted with continuous values.

Model interpretability

Global model explanation

In the training set, we applied the SHAP method44 (v.0.44.1) to examine the relative importance and the direction of impact of each feature in SCORPIO (Fig. 4a). Two different explainer functions were applied in this study: the Explainer function for the RCOX and the RSF, and the KernelExplainer function for the FSSVM. To show the relative importance and the direction of impact of each variable in the ensemble model, we generated aggregated SHAP values across the three algorithms that form SCORPIO (RCOX, FSSVM and RSF) (Supplementary Fig. 23a). Because these three survival models produce risk scores and SHAP values on different scales, we normalized the SHAP values before aggregating them to avoid a biased result.

First, the mean |SHAP| value of feature i across all samples in the training set, $\overline{|\mathrm{SHAP}|}_{i,\mathrm{train}}$, which represents the average impact of the feature on model output, was scaled by min-max normalization as follows (Step 1):

$$\overline{|\mathrm{SHAP}|}^{\,\mathrm{scaled}}_{i,\mathrm{train}} = \frac{\overline{|\mathrm{SHAP}|}_{i,\mathrm{train}} - \min_{\mathrm{train}}}{\max_{\mathrm{train}} - \min_{\mathrm{train}}}$$

where $\min_{\mathrm{train}}$ and $\max_{\mathrm{train}}$ denote the minimum and maximum $\overline{|\mathrm{SHAP}|}_{\mathrm{train}}$ values across the 33 features in the training set, respectively. As a result, the average impact of a feature on model output is transformed into a value between 0 and 1, $\overline{|\mathrm{SHAP}|}^{\,\mathrm{scaled}}_{i,\mathrm{train}}$.

Second, the min-max scaled values were converted to negative values when a variable had a negative direction of impact on the predicted risk score. The direction of impact was measured by Spearman’s correlation coefficient between the original feature values and the SHAP values using all samples in the training set (Step 2). When the original feature values and the SHAP values of a given feature (feature i) had a negative correlation coefficient, the feature was considered to have a negative impact on the predicted risk score. These two steps were applied individually to each survival algorithm (Supplementary Fig. 23b). Third, the average of the signed scaled values from the three survival algorithms was calculated as the aggregated SHAP value for SCORPIO (Fig. 4a).
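The three steps of the global aggregation can be sketched as below, under stated assumptions. The helper names and the toy feature and SHAP matrices are hypothetical. Spearman's coefficient is computed here as the Pearson correlation of average ranks in plain NumPy, which is an illustrative stand-in for whichever implementation was actually used.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks, with tied values sharing their average rank."""
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)
    return (last_rank - (counts - 1) / 2.0)[inverse]


def spearman_sign(feature_values, shap_values):
    """Return -1.0 if Spearman's rho is negative, otherwise +1.0."""
    rho = np.corrcoef(average_ranks(feature_values), average_ranks(shap_values))[0, 1]
    return -1.0 if rho < 0 else 1.0


def signed_global_importance(X, shap_matrix):
    """Steps 1 and 2 for one model: scale mean |SHAP| to [0, 1], then attach a sign."""
    mean_abs = np.abs(shap_matrix).mean(axis=0)
    scaled = (mean_abs - mean_abs.min()) / (mean_abs.max() - mean_abs.min())
    signs = np.array(
        [spearman_sign(X[:, i], shap_matrix[:, i]) for i in range(X.shape[1])]
    )
    return signs * scaled


def aggregate_global(X, shap_by_model):
    """Step 3: average the signed scaled values across models."""
    return np.mean([signed_global_importance(X, s) for s in shap_by_model], axis=0)


# Toy data: 4 samples x 3 features (the study used 33 features).
X = np.array([[1, 5, 2], [2, 4, 1], [3, 3, 4], [4, 2, 3]], dtype=float)
base = np.array(
    [[-0.3, -0.4, 0.1], [-0.1, -0.2, -0.1], [0.1, 0.2, 0.1], [0.3, 0.4, -0.1]]
)
# Three hypothetical models whose SHAP values differ only in scale.
aggregated = aggregate_global(X, [base, 10.0 * base, 0.5 * base])
```

Because each model is scaled to the same 0 to 1 range before averaging, the three toy models contribute equally even though their raw SHAP magnitudes differ by a factor of 20.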

Local model explanation

To generate the aggregated SHAP values for SCORPIO at the patient level (Fig. 4b–e), we took an approach similar to the global-level analysis. We first calculated the min-max scaled |SHAP| value for a feature (feature i) in the jth patient as follows (Step 1):

$$|\mathrm{SHAP}|^{\mathrm{scaled}}_{i,j} = \frac{|\mathrm{SHAP}|_{i,j} - \min_{j}}{\max_{j} - \min_{j}}$$

where $\min_{j}$ and $\max_{j}$ denote the minimum and maximum |SHAP| values across the 33 features in the jth patient, respectively.

The patient-level SHAP values already carry the direction of effect in the sign of their raw values. Hence, calculating Spearman’s correlation coefficient between the original feature values and the SHAP values, as was done in the global model explanation, was not required in the local model explanation. Instead, we converted the min-max scaled value of a feature (feature i) in the jth patient to a negative value when its original $\mathrm{SHAP}_{i,j}$ was negative (Step 2). These two steps were applied individually to the three survival models. We then averaged the signed scaled values from the three survival models to obtain the aggregated SHAP values at the patient level.
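The patient-level procedure can be sketched in the same way. The function names and the SHAP values below are hypothetical, and only three features are shown instead of 33.

```python
import numpy as np


def signed_local_importance(shap_row):
    """Min-max scale |SHAP| within one patient, then restore the original sign."""
    shap_row = np.asarray(shap_row, dtype=float)
    abs_vals = np.abs(shap_row)
    scaled = (abs_vals - abs_vals.min()) / (abs_vals.max() - abs_vals.min())
    # Step 2: negative raw SHAP values give negative scaled values.
    return np.where(shap_row < 0, -scaled, scaled)


def aggregate_local(shap_rows_by_model):
    """Average the signed scaled values of one patient across the models."""
    return np.mean([signed_local_importance(r) for r in shap_rows_by_model], axis=0)


# Hypothetical SHAP values for one patient from three models on different scales.
shap_rcox = [0.4, -0.2, 0.0]
shap_fssvm = [8.0, -2.0, 4.0]
shap_rsf = [0.03, -0.03, 0.01]

patient_values = aggregate_local([shap_rcox, shap_fssvm, shap_rsf])
```

No correlation step is needed here because the sign is read directly from each raw SHAP value.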

Bulk RNA-seq analysis

RNA was isolated from formalin-fixed paraffin-embedded tumor samples of NSCLC. Bulk RNA-seq was performed using the Tempus xT RNA-seq protocol74, which involves exome capture with IDT xGen probes covering over 19,000 genes, requiring at least 50 ng of RNA for library construction. Sequencing was done to a minimum depth of 30 million reads on a NovaSeq 6000. Transcript abundances in transcripts per million (TPM) values were derived using Kallisto75 (v.0.44.0) pseudoalignments to Ensembl GRCh37 (Release 75). Gene-level TPM values were obtained by summing the transcript-level TPM of 20,061 genes with at least one annotated protein-coding transcript covered by the assay; then, the values were log2(TPM + 1)-transformed. Batch correction was applied for samples sequenced with different probe designs by limma76 (v.3.54.2).
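The gene-level summarization step can be illustrated with a short sketch. It assumes a transcript-to-gene lookup restricted to the genes covered by the assay; the identifiers and TPM values are hypothetical, and batch correction is not shown.

```python
from collections import defaultdict

import numpy as np


def gene_level_log_tpm(transcript_tpm, transcript_to_gene):
    """Sum transcript-level TPM per gene, then apply log2(TPM + 1)."""
    totals = defaultdict(float)
    for transcript, tpm in transcript_tpm.items():
        gene = transcript_to_gene.get(transcript)
        if gene is not None:  # transcripts outside the gene set are ignored
            totals[gene] += tpm
    return {gene: float(np.log2(total + 1.0)) for gene, total in totals.items()}


# Hypothetical transcript abundances and mapping.
transcript_tpm = {"tx1": 3.0, "tx2": 4.0, "tx3": 0.0, "tx4": 9.0}
transcript_to_gene = {"tx1": "GENE_A", "tx2": "GENE_A", "tx3": "GENE_B"}

log_tpm = gene_level_log_tpm(transcript_tpm, transcript_to_gene)
```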

For the H&N cancer samples, bulk RNA-seq reads were aligned against the hg19 reference genome by STAR77 (v.2.5.3a) 2-pass alignment. Raw read counts were computed using GenomicAlignments78 (v.1.14.2) over aligned reads with UCSC KnownGene79 in hg19 as the base gene model. The Union counting mode was used and only mapped paired reads after alignment quality filtering were considered. Finally, fragments per kilobase of transcript per million mapped reads (FPKM) values were computed by DESeq2 (ref. 80) (v.1.18.1).

We employed the Danaher signature45 to deconvolute 14 immune cell compositions from the two cohorts with bulk RNA-seq. For the NSCLC and H&N cancer cohorts, we used limma-corrected log2(TPM + 1) and log2(FPKM + 1) values, respectively. The expression values of marker genes for each cell type were averaged following the methodology described in the original paper45.
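The marker-averaging step can be sketched as follows. The marker names and expression values are placeholders, not the published Danaher gene lists, and skipping markers that are absent from the expression data is an assumption of this sketch.

```python
import numpy as np


def cell_type_scores(log_expr, marker_genes):
    """Average the log2 expression of each cell type's marker genes."""
    scores = {}
    for cell_type, genes in marker_genes.items():
        # Assumption: markers missing from the expression data are skipped.
        values = [log_expr[g] for g in genes if g in log_expr]
        scores[cell_type] = float(np.mean(values)) if values else float("nan")
    return scores


# Placeholder log2-transformed expression values for one sample.
log_expr = {"MARKER_A": 4.0, "MARKER_B": 6.0, "MARKER_C": 1.0}
# Placeholder marker sets (the study scored 14 immune cell types).
marker_genes = {
    "cell_type_1": ["MARKER_A", "MARKER_B"],
    "cell_type_2": ["MARKER_C", "MARKER_D"],
}

scores = cell_type_scores(log_expr, marker_genes)
```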

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Online content

Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41591-024-03398-5.

Supplementary information

Supplementary Information (27.9MB, pdf)

Supplementary Figs. 1–23 and Tables 1–3.

Reporting Summary (2MB, pdf)

Acknowledgements

We are grateful to our patients and their families for their bravery and support of cancer research. We thank members of the Chowell Lab and Morris Lab for helpful discussions. This study was supported in part by NIH R01 DE027738, the Department of Defense Peer Reviewed Cancer Research Program and Rare Cancer Research Program, The Geoffrey Beene Cancer Research Center, Cycle for Survival and Cycle for Survival: Team Fearless4Jen, The Jayme and Peter Flowers Fund, the Sebastian Nativo Fund (to L.G.T.M.) and the NIH/NCI Cancer Center Support Grant P30 CA008748 (institutional, to MSKCC). This work was supported in part by the Alexander and Alexandrine Sinsheimer Foundation (D.C.), Department of Defense (ME220130) (D.C.), the Melanoma Research Alliance (D.C.), US NIH grant R01 CA283469 (R.M.S. and D.C.), the NIH Cancer Target Discovery and Development (CTD2) Network (U01CA282114) (R.M.S., M.M., D.C. and L.G.T.M.). This research was supported in part by the Basic Science Research Program through the National Research Foundation of Korea, funded by the Ministry of Education (2022R1A6A3A03066899) (S.-K.Y.). S.G. acknowledges funding from U24 CA224319, U01 DK124165 and from the Mount Sinai Tisch Cancer Institute Cancer Center NCI Core Grant P30 CA196521. This work was supported in part through the computational and data resources and staff expertise provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai and supported by the Clinical and Translational Science Awards (CTSA) grant ULTR004419 from the National Center for Advancing Translational Sciences. This publication is based on research using data from data contributors Roche that has been made available through Vivli, Inc. Vivli has not contributed to or approved, and is not in any way responsible for, the contents of this publication. The funders had no role in study design, data collection and analysis, decision to publish, or manuscript preparation.

Author contributions

S.-K.Y., C.W.F., B.A.C., L.G.T.M. and D.C. conceived and designed the study. S.-K.Y., C.W.F., B.A.C., B.G.F., L.G.T.M. and D.C. wrote the manuscript. S.-K.Y. and B.A.C. developed the machine learning models. S.-K.Y., C.W.F., B.A.C., B.G.F., C.H., E.S.K., A.P., H.S., F.C., M.R.K., N.D., Y.L., C.V., M.L., J.L.V., A.S.L., K.Z., S.L., E.O., F.K., E.A.W., P.H., C.H., M.S., L.V., A.A.H., B.B., M.M., S.G., N.B., M.D.G., E.E.S., R.M.S., T.U.M., M.G., L.G.T.M. and D.C. acquired, analyzed or interpreted the data. M.G. provided statistical advice. All authors critically revised the manuscript for intellectual content. L.G.T.M. and D.C. supervised the study.

Peer review

Peer review information

Nature Medicine thanks Vassiliki Boussiotis and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Lorenzo Righetto and Ulrike Harjes, in collaboration with the Nature Medicine team.

Data availability

Individual-level patient data from the real-world datasets used in this study are not publicly available due to the number of data features drawn from the clinical testing performed as part of routine care, which could compromise the privacy of research participants. These data will be made available to researchers upon request from the corresponding authors (D.C. and L.G.T.M.) and execution of a data transfer agreement as required by the Institutional Review Boards of the authors’ institutions. Requests will be reviewed and approved within 1 month. All data for the clinical trial cohorts are available at Vivli (https://vivli.org/). The anonymized individual participant-level data shared on Vivli is accessible only within a secure research environment. Researchers interested in accessing these data must submit a detailed research proposal for approval, either by an Independent Review Committee or the original data contributors. Comprehensive guidelines and instructions for requesting access to clinical trial data can be found on the Vivli website.

Code availability

The code required to run SCORPIO is available on Zenodo (10.5281/zenodo.13646737). Model files in pickle format, necessary for running SCORPIO, will be provided upon approval of a request to replicate the study results and the signing of a data access agreement. Code requests should be sent via email to the corresponding authors (D.C. and L.G.T.M.).

Competing interests

S.-K.Y., B.A.C., C.W.F., C.H., L.G.T.M. and D.C. have a provisional patent application for using routine blood tests and clinical variables to predict cancer immunotherapy response. D.C., R.M.S. and L.G.T.M. are co-inventors on a patent (US11230599/EP4226944A3) filed by MSKCC for using TMB to predict immunotherapy response, licensed to Personal Genome Diagnostics (PGDx). S.-K.Y., C.V., L.G.T.M. and D.C. are co-inventors on a patent (US20240282410A1) filed jointly by Cleveland Clinic and MSKCC for a multi-modal machine learning model to predict immunotherapy response, licensed to Tempus. S.G. reports grants from Boehringer Ingelheim, Bristol Myers Squibb, Celgene, Genentech, Regeneron and Takeda not related to this study and personal fees from Taiho outside the submitted work. M.D.G reports grants from Bristol Myers Squibb, AstraZeneca, Merck and Genentech, and serves as an advisory board/consultant for Astellas, Bristol Myers Squibb, Merck, Genentech, AstraZeneca, Pfizer, EMD Serono, SeaGen, Janssen, Numab, Dragonfly, GlaxoSmithKline, Basilea, UroGen, Rappta Therapeutics, Alligator, Silverback, Fujifilm, Curis, Gilead, Bicycle, Asieris, Abbvie, Analog Devices, Veracyte, Daiichi and Aktis. E.E.S is an executive officer at Pathos, a clinical-stage oncology drug development and information company, and owns equity in this company. The other authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Seong-Keun Yoo, Conall W. Fitzgerald, Byuri Angela Cho, Bailey G. Fitzgerald.

Contributor Information

Luc G. T. Morris, Email: morrisl@mskcc.org

Diego Chowell, Email: diego.chowell@mssm.edu.

Extended data

is available for this paper at 10.1038/s41591-024-03398-5.

Supplementary information

The online version contains supplementary material available at 10.1038/s41591-024-03398-5.

References

  • 1. Boussiotis, V. A. Molecular and biochemical aspects of the PD-1 checkpoint pathway. N. Engl. J. Med. 375, 1767–1778 (2016).
  • 2. Havel, J. J., Chowell, D. & Chan, T. A. The evolving landscape of biomarkers for checkpoint inhibitor immunotherapy. Nat. Rev. Cancer 19, 133–150 (2019).
  • 3. Topalian, S. L., Taube, J. M., Anders, R. A. & Pardoll, D. M. Mechanism-driven biomarkers to guide immune checkpoint blockade in cancer therapy. Nat. Rev. Cancer 16, 275–287 (2016).
  • 4. Subbiah, V., Solit, D. B., Chan, T. A. & Kurzrock, R. The FDA approval of pembrolizumab for adult and pediatric patients with tumor mutational burden (TMB) ≥10: a decision centered on empowering patients and their physicians. Ann. Oncol. 31, 1115–1118 (2020).
  • 5. Samstein, R. M. et al. Tumor mutational load predicts survival after immunotherapy across multiple cancer types. Nat. Genet. 51, 202–206 (2019).
  • 6. Valero, C. et al. Response rates to anti-PD-1 immunotherapy in microsatellite-stable solid tumors with 10 or more mutations per megabase. JAMA Oncol. 7, 739–743 (2021).
  • 7. Chowell, D. et al. Improved prediction of immune checkpoint blockade efficacy across multiple cancer types. Nat. Biotechnol. 40, 499–506 (2022).
  • 8. Sinnott-Armstrong, N. et al. Genetics of 35 blood and urine biomarkers in the UK Biobank. Nat. Genet. 53, 185–194 (2021).
  • 9. Cohen, N. M. et al. Personalized lab test models to quantify disease potentials in healthy individuals. Nat. Med. 27, 1582–1591 (2021).
  • 10. McQuade, J. L. et al. Association of body-mass index and outcomes in patients with metastatic melanoma treated with targeted therapy, immunotherapy, or chemotherapy: a retrospective, multicohort analysis. Lancet Oncol. 19, 310–322 (2018).
  • 11. Yoo, S. K., Chowell, D., Valero, C., Morris, L. G. T. & Chan, T. A. Outcomes among patients with or without obesity and with cancer following treatment with immune checkpoint blockade. JAMA Netw. Open 5, e220448 (2022).
  • 12. Valero, C. et al. Pretreatment neutrophil-to-lymphocyte ratio and mutational burden as biomarkers of tumor response to immune checkpoint inhibitors. Nat. Commun. 12, 729 (2021).
  • 13. Yoo, S. K., Chowell, D., Valero, C., Morris, L. G. T. & Chan, T. A. Pre-treatment serum albumin and mutational burden as biomarkers of response to immune checkpoint blockade. npj Precis. Oncol. 6, 23 (2022).
  • 14. Obermeyer, Z. & Emanuel, E. J. Predicting the future: big data, machine learning, and clinical medicine. N. Engl. J. Med. 375, 1216–1219 (2016).
  • 15. Rajkomar, A., Dean, J. & Kohane, I. Machine learning in medicine. N. Engl. J. Med. 380, 1347–1358 (2019).
  • 16. Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25, 44–56 (2019).
  • 17. Elmarakeby, H. A. et al. Biologically informed deep neural network for prostate cancer discovery. Nature 598, 348–352 (2021).
  • 18. Sammut, S. J. et al. Multi-omic machine learning predictor of breast cancer therapy response. Nature 601, 623–629 (2022).
  • 19. Chang, T. G. et al. LORIS robustly predicts patient outcomes with immune checkpoint blockade therapy using common clinical, pathologic and genomic features. Nat. Cancer 5, 1158–1175 (2024).
  • 20. Cheng, A. L. et al. Updated efficacy and safety data from IMbrave150: atezolizumab plus bevacizumab vs. sorafenib for unresectable hepatocellular carcinoma. J. Hepatol. 76, 862–873 (2022).
  • 21. Gutzmer, R. et al. Atezolizumab, vemurafenib, and cobimetinib as first-line treatment for unresectable advanced BRAF(V600) mutation-positive melanoma (IMspire150): primary analysis of the randomised, double-blind, placebo-controlled, phase 3 trial. Lancet 395, 1835–1844 (2020).
  • 22. Rini, B. I. et al. Atezolizumab plus bevacizumab versus sunitinib in patients with previously untreated metastatic renal cell carcinoma (IMmotion151): a multicentre, open-label, phase 3, randomised controlled trial. Lancet 393, 2404–2415 (2019).
  • 23. van der Heijden, M. S. et al. Atezolizumab versus chemotherapy in patients with platinum-treated locally advanced or metastatic urothelial carcinoma: a long-term overall survival and safety update from the phase 3 IMvigor211 clinical trial. Eur. Urol. 80, 7–11 (2021).
  • 24. Liu, S. V. et al. Updated overall survival and PD-L1 subgroup analysis of patients with extensive-stage small-cell lung cancer treated with atezolizumab, carboplatin, and etoposide (IMpower133). J. Clin. Oncol. 39, 619–630 (2021).
  • 25. West, H. et al. Atezolizumab in combination with carboplatin plus nab-paclitaxel chemotherapy compared with chemotherapy alone as first-line treatment for metastatic non-squamous non-small-cell lung cancer (IMpower130): a multicentre, randomised, open-label, phase 3 trial. Lancet Oncol. 20, 924–937 (2019).
  • 26. Jotte, R. et al. Atezolizumab in combination with carboplatin and nab-paclitaxel in advanced squamous NSCLC (IMpower131): results from a randomized phase III trial. J. Thorac. Oncol. 15, 1351–1360 (2020).
  • 27. Nishio, M. et al. Atezolizumab plus chemotherapy for first-line treatment of nonsquamous NSCLC: results from the randomized phase 3 IMpower132 trial. J. Thorac. Oncol. 16, 653–664 (2021).
  • 28. Socinski, M. A. et al. Atezolizumab for first-line treatment of metastatic nonsquamous NSCLC. N. Engl. J. Med. 378, 2288–2301 (2018).
  • 29. Mazieres, J. et al. Atezolizumab versus docetaxel in pretreated patients with NSCLC: final results from the randomized phase 2 POPLAR and phase 3 OAK clinical trials. J. Thorac. Oncol. 16, 140–150 (2021).
  • 30. Valero, C. et al. The association between tumor mutational burden and prognosis is dependent on treatment context. Nat. Genet. 53, 11–15 (2021).
  • 31. Cheng, D. T. et al. Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): a hybridization capture-based next-generation sequencing clinical assay for solid tumor molecular oncology. J. Mol. Diagn. 17, 251–264 (2015).
  • 32. Rizvi, H. et al. Molecular determinants of response to anti-programmed cell death (PD)-1 and anti-programmed death-ligand 1 (PD-L1) blockade in patients with non-small-cell lung cancer profiled with targeted next-generation sequencing. J. Clin. Oncol. 36, 633–641 (2018).
  • 33. Aggarwal, C. et al. Baseline plasma tumor mutation burden predicts response to pembrolizumab-based therapy in patients with metastatic non-small cell lung cancer. Clin. Cancer Res. 26, 2354–2361 (2020).
  • 34. Alborelli, I. et al. Tumor mutational burden assessed by targeted NGS predicts clinical benefit from immune checkpoint inhibitors in non-small cell lung cancer. J. Pathol. 250, 19–29 (2020).
  • 35. Luo, J. et al. Deciphering radiological stable disease to immune checkpoint inhibitors. Ann. Oncol. 33, 824–835 (2022).
  • 36. Eisenhauer, E. A. et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur. J. Cancer 45, 228–247 (2009).
  • 37. Le Cessie, S. & Van Houwelingen, J. C. Ridge estimators in logistic regression. J. R. Stat. Soc. Ser. C. Appl. Stat. 41, 191–201 (1992).
  • 38. Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
  • 39. Breiman, L. Random Forests. Mach. Learn. 45, 5–32 (2001).
  • 40. Verweij, P. J. M. & Van Houwelingen, H. C. Penalized likelihood in Cox regression. Stat. Med. 13, 2427–2436 (1994).
  • 41. Polsterl, S., Navab, N. & Katouzian, A. Fast training of support vector machines for survival analysis. Lect. Notes Artif. Int. 9285, 243–259 (2015).
  • 42. Ishwaran, H., Kogalur, U. B., Blackstone, E. H. & Lauer, M. S. Random survival forests. Ann. Appl. Stat. 2, 841–860 (2008).
  • 43. Vanguri, R. S. et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat. Cancer 3, 1151–1164 (2022).
  • 44. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st International Conference on Neural Information Processing Systems 4768–4777 (Neural Information Processing Systems Foundation, Inc., 2017).
  • 45. Danaher, P. et al. Gene expression markers of tumor infiltrating leukocytes. J. Immunother. Cancer 5, 18 (2017).
  • 46. Rosenthal, R. et al. Neoantigen-directed immune escape in lung cancer evolution. Nature 567, 479–485 (2019).
  • 47. Mino-Kenudson, M. et al. The International Association for the Study of Lung Cancer Global Survey on Programmed Death-Ligand 1 Testing for NSCLC. J. Thorac. Oncol. 16, 686–696 (2021).
  • 48. Sharma, K. et al. Advancing oncology drug therapies for sub-Saharan Africa. PLOS Glob. Public Health 3, e0001653 (2023).
  • 49. Vega, D. M. et al. Aligning tumor mutational burden (TMB) quantification across diagnostic platforms: phase II of the Friends of Cancer Research TMB Harmonization Project. Ann. Oncol. 32, 1626–1636 (2021).
  • 50. Nassar, A. H. et al. Ancestry-driven recalibration of tumor mutational burden and disparate clinical outcomes in response to immune checkpoint inhibitors. Cancer Cell 40, 1161–1172 (2022).
  • 51. Delgado, A. & Guddati, A. K. Clinical endpoints in oncology: a primer. Am. J. Cancer Res. 11, 1121–1131 (2021).
  • 52. Driscoll, J. J. & Rixe, O. Overall survival: still the gold standard: why overall survival remains the definitive end point in cancer clinical trials. Cancer J. 15, 401–405 (2009).
  • 53. Kim, C. & Prasad, V. Cancer drugs approved on the basis of a surrogate end point and subsequent overall survival: an analysis of 5 years of US Food and Drug Administration approvals. JAMA Intern. Med. 175, 1992–1994 (2015).
  • 54. D’Agostino, R. B. Sr. Changing end points in breast-cancer drug approval—the Avastin story. N. Engl. J. Med. 365, e2 (2011).
  • 55. Sekeres, M. A. The avastin story. N. Engl. J. Med. 365, 1454–1455 (2011).
  • 56. Wolchok, J. D. et al. Guidelines for the evaluation of immune therapy activity in solid tumors: immune-related response criteria. Clin. Cancer Res. 15, 7412–7420 (2009).
  • 57. Hodi, F. S. et al. Evaluation of immune-related response criteria and RECIST v1.1 in patients with advanced melanoma treated with pembrolizumab. J. Clin. Oncol. 34, 1510–1517 (2016).
  • 58. Kaufman, H. L. et al. Evaluation of classical clinical endpoints as surrogates for overall survival in patients treated with immune checkpoint blockers: a systematic review and meta-analysis. J. Cancer Res. Clin. Oncol. 144, 2245–2261 (2018).
  • 59. Mushti, S. L., Mulkey, F. & Sridhara, R. Evaluation of overall response rate and progression-free survival as potential surrogate endpoints for overall survival in immunotherapy trials. Clin. Cancer Res. 24, 2268–2275 (2018).
  • 60. Amin, M. B. et al. The Eighth Edition AJCC Cancer Staging Manual: continuing to build a bridge from a population-based to a more “personalized” approach to cancer staging. CA Cancer J. Clin. 67, 93–99 (2017).
  • 61. Niu, B. et al. MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics 30, 1015–1016 (2014).
  • 62. Stekhoven, D. J. & Buhlmann, P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics 28, 112–118 (2012).
  • 63. Groopman, J. E. & Itri, L. M. Chemotherapy-induced anemia in adults: incidence and treatment. J. Natl Cancer Inst. 91, 1616–1634 (1999).
  • 64. Crawford, J., Dale, D. C. & Lyman, G. H. Chemotherapy-induced neutropenia: risks, consequences, and new directions for its management. Cancer 100, 228–237 (2004).
  • 65. Wang, Y., Probin, V. & Zhou, D. Cancer therapy-induced residual bone marrow injury: mechanisms of induction and implication for therapy. Curr. Cancer Ther. Rev. 2, 271–279 (2006).
  • 66. Ramadori, G. & Cameron, S. Effects of systemic chemotherapy on the liver. Ann. Hepatol. 9, 133–143 (2010).
  • 67. Ryan, A. M., Prado, C. M., Sullivan, E. S., Power, D. G. & Daly, L. E. Effects of weight loss and sarcopenia on response to chemotherapy, quality of life, and survival. Nutrition 67-68, 110539 (2019).
  • 68. Fernández-Delgado, M., Cernadas, E., Barro, S. & Amorim, D. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15, 3133–3181 (2014).
  • 69. Pedregosa, F. et al. Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
  • 70. Pölsterl, S. scikit-survival: a library for time-to-event analysis built on top of scikit-learn. J. Mach. Learn. Res. 21, Article 212 (2020).
  • 71. Marquardt, D. W. Comment: you should standardize the predictor variables in your regression models. J. Am. Stat. Assoc. 75, 87–91 (1980).
  • 72. Saito, T. & Rehmsmeier, M. Precrec: fast and accurate precision-recall and ROC curve calculations in R. Bioinformatics 33, 145–147 (2017).
  • 73. Blanche, P., Dartigues, J. F. & Jacqmin-Gadda, H. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Stat. Med. 32, 5381–5397 (2013).
  • 74. Beaubier, N. et al. Clinical validation of the tempus xT next-generation targeted oncology sequencing assay. Oncotarget 10, 2384–2396 (2019).
  • 75. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
  • 76. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
  • 77. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
  • 78. Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9, e1003118 (2013).
  • 79. Rosenbloom, K. R. et al. The UCSC Genome Browser database: 2015 update. Nucleic Acids Res. 43, D670–D681 (2015).
  • 80. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).


Articles from Nature Medicine are provided here courtesy of Nature Publishing Group
