2025 Jan 3;9:e2400157. doi: 10.1200/CCI-24-00157

Explainable Machine Learning to Predict Treatment Response in Advanced Non–Small Cell Lung Cancer

Vinayak S Ahluwalia, Ravi B Parikh
PMCID: PMC11706357  NIHMSID: NIHMS2036478  PMID: 39752605

Abstract

PURPOSE

Immune checkpoint inhibitors (ICIs) have demonstrated promise in the treatment of various cancers. Single-drug ICI therapy (immuno-oncology [IO] monotherapy) that targets PD-L1 is the standard of care in patients with advanced non–small cell lung cancer (NSCLC) with PD-L1 expression ≥50%. We sought to determine whether a machine learning (ML) algorithm can perform better as a predictive biomarker than PD-L1 alone.

METHODS

Using a real-world, nationwide electronic health record–derived deidentified database of 38,048 patients with advanced NSCLC, we trained binary prediction algorithms to predict likelihood of 12-month progression-free survival (PFS; 12-month PFS) and 12-month overall survival (OS; 12-month OS) from initiation of first-line therapy. We evaluated the algorithms by calculating the AUC on the test set. We plotted Kaplan-Meier curves and fit Cox survival models comparing survival between patients who were classified as low-risk (LR) for 12-month disease progression or 12-month mortality versus those classified as high-risk.

RESULTS

The ML algorithms achieved an AUC of 0.701 (95% CI, 0.689 to 0.714) and 0.718 (95% CI, 0.707 to 0.730) for 12-month PFS and 12-month OS, respectively. Patients in the LR group had lower 12-month disease progression (hazard ratio [HR], 0.47 [95% CI, 0.45 to 0.50]; P < .001) and 12-month all-cause mortality (HR, 0.31 [95% CI, 0.29 to 0.34]; P < .001) compared with the high-risk group. Patients deemed LR for disease progression and mortality on IO monotherapy were less likely to progress (HR, 0.53 [95% CI, 0.46 to 0.61]; P < .001) or die (HR, 0.30 [95% CI, 0.24 to 0.37]; P < .001) compared with the high-risk group.

CONCLUSION

An ML algorithm can more accurately predict response to first-line therapy, including IO monotherapy, in patients with advanced NSCLC, compared with PD-L1 alone. ML may better aid clinical decision making in oncology than a single biomarker.

INTRODUCTION

Immune checkpoint inhibitor (ICI) therapies have shown promise in treating various cancers, particularly in patients with advanced non–small cell lung cancer (NSCLC). Between 24% and 60% of patients with NSCLC are positive for the PD-L1 biomarker,1 which may qualify them for treatment with first-line ICI monotherapy (immuno-oncology [IO] monotherapy) such as pembrolizumab, as opposed to platinum-based chemotherapy (CT).2 The current standard of care is to treat patients with IO monotherapy whose tumor cells or tumor-infiltrating cells have a staining intensity of at least 50% for PD-L1.3-7 While pembrolizumab monotherapy is technically approved for patients with PD-L1 between 1% and 50%, combination therapy with pembrolizumab and CT is preferred when feasible.8-10 A two-drug immunotherapy regimen, nivolumab plus ipilimumab, is also approved for this patient population if PD-L1 expression is at least 1%. While nivolumab plus ipilimumab was shown to increase median overall survival (OS) relative to CT, treatment discontinuation secondary to side effects was more common in the immunotherapy group, demonstrating that PD-L1/CTLA4 immunotherapy is not benign.11,12

CONTEXT

  • Key Objective

  • Can a machine learning (ML) algorithm outperform PD-L1 as a predictive biomarker for response to first-line therapy in patients with advanced non–small cell lung cancer (NSCLC)?

  • Knowledge Generated

  • Using a real-world database, we trained ML algorithms that predict likelihood of 12-month progression-free survival (PFS; 12-month PFS) and 12-month overall survival (OS; 12-month OS) for patients with advanced NSCLC. The algorithms achieved an area under the receiver operating characteristic curve of 0.70 (95% CI, 0.69 to 0.71) and 0.72 (95% CI, 0.70 to 0.73) for 12-month PFS and 12-month OS, respectively.

  • Relevance

  • The reported algorithm has fair performance on a highly relevant clinical task with possible implications for clinical practice. Further improvements in performance would be necessary before clinical adoption, but this is an important step forward in using routinely collected clinical data to inform treatment effectiveness.

Garon et al4 prospectively validated the PD-L1 cutoff of 50% for patients receiving pembrolizumab monotherapy, demonstrating a response rate of 45.2% in that subgroup, as opposed to 19.4% in the entire cohort. Similarly, patients with PD-L1 over 50% had a response rate of 44.8% to pembrolizumab monotherapy as opposed to CT.6 One meta-analysis has shown that PD-L1 status has an AUC of 0.63 (95% CI, 0.61 to 0.63) for predicting patient response to IO monotherapy.13 These findings imply that while the PD-L1 threshold has utility, approximately half of patients who meet the PD-L1 threshold will still experience disease progression on IO monotherapy, indicating a need for improvement. Exploring different definitions for biomarker positivity and developing better predictors of treatment response may better allocate IO monotherapy to the patients who would benefit the most while sparing patients who would not benefit from its side effects.

Given the promise that artificial intelligence (AI) and machine learning (ML) demonstrate in the treatment and prognostication of cancer,14-17 it is likely that data-driven statistical models that incorporate temporal data can better inform clinical decision making related to immunotherapy allocation than an individual biomarker alone such as PD-L1.18 One study used ML to predict response to IO monotherapy in patients with advanced NSCLC using tumor-infiltrating lymphocyte scoring although the cohort was small (n = 685).19 Our objective was to create a novel ML algorithm that can serve as a biomarker to predict likelihood of response to various first-line therapies in patients with advanced NSCLC. We hypothesized that this new ML algorithm would outperform PD-L1 alone as a predictive biomarker in terms of AUC on an independent test set.

METHODS

We conducted a two-phase cross-sectional study. In phase I, we developed an ML algorithm that serves as a biomarker for treatment response in patients with advanced NSCLC. In phase II, we performed a comparative effectiveness study using propensity score matching (PSM), which emulates randomization, to establish whether the ML model is a predictive biomarker.

Data Source

This study was institutional review board–exempt at the University of Pennsylvania. This study used the nationwide Flatiron Health electronic health record–derived deidentified database. The Flatiron Health database is a longitudinal database, comprising deidentified patient-level structured and unstructured data, curated via technology-enabled abstraction.20,21 The study included patients diagnosed with advanced NSCLC from January 1, 2011, to May 31, 2024. During the study period, the deidentified data originated from approximately 280 cancer clinics (approximately 800 sites of care). The majority of patients in the database originate from community oncology settings. We adhered to the TRIPOD checklist for the reporting of results from prediction models.22

Inclusion Criteria

We acquired patient data from the real-world database if they were diagnosed with advanced NSCLC on or after January 1, 2011, were at least 18 years old at advanced diagnosis, and had at least two documented clinical visits after diagnosis. Moreover, these patients' tumors were stage IIIB, IIIC, or IV at initial cancer diagnosis or had an earlier stage at initial diagnosis but experienced recurrence or progression after January 1, 2011.

Data Extraction

In ML model training and statistical analysis, only patients who received first-line therapy within 365 days of advanced diagnosis were included. If the date of death was available, we calculated the number of days from treatment initiation to death. If there was recorded evidence of cancer progression, defined as radiologic, pathologic, or clinical progression, we similarly calculated the number of days from advanced diagnosis to the first evidence of progression. Progression was recorded as early as February 2011 and as late as May 2024. A designation of mixed response or pseudo-progression was not considered progression, but we noted patients who experienced mixed response or pseudo-progression before radiologic, pathologic, or clinical progression. Patients were excluded if the time between the date of first-line treatment initiation and the date of their last available clinic note was less than 365 days.

We extracted basic demographic information for each patient including birth year, sex, race, ethnicity, US state where treatment was provided, and age (in years) at the date of advanced diagnosis. Other relevant information about a patient included insurance status, cancer histology, cancer staging, smoking status (never smoker, history of smoking), Eastern Cooperative Oncology Group (ECOG) performance status,23 and the results of other biomarker testing, if available, including, but not limited to, PD-L1. If a PD-L1 status was available, it was recorded as a value corresponding to its percent staining (Data Supplement, Table S1). If PD-L1 staining information was not available, an interpretation of PD-L1 low expression was represented as 0.01 and PD-L1 high expression as 0.5. Patients who were ALK- or EGFR-positive were excluded. All variables that were available for extraction were included in the model.
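The PD-L1 encoding rule described above can be illustrated with a minimal sketch. This is not the authors' code: the function name `encode_pdl1`, the labels "low" and "high", the use of `None` for missing data, and the assumption that raw staining is stored as a percentage (0 to 100) and converted to a fraction are all hypothetical choices made for illustration.

```python
from typing import Optional

# Fallback values stated in the text for interpretation-only results.
INTERPRETATION_TO_VALUE = {"low": 0.01, "high": 0.5}


def encode_pdl1(percent_staining: Optional[float],
                interpretation: Optional[str]) -> Optional[float]:
    """Return PD-L1 as a fraction in [0, 1], or None if nothing is recorded."""
    if percent_staining is not None:
        # Assumption: the raw value is a percentage between 0 and 100.
        return percent_staining / 100.0
    if interpretation is not None:
        # Unrecognized labels fall through to None (treated as missing).
        return INTERPRETATION_TO_VALUE.get(interpretation.strip().lower())
    return None
```

Under these assumptions, a patient with 60% staining is encoded as 0.6, while a patient with only a "high" interpretation is encoded as 0.5, which places both on the same scale as the 50% clinical threshold.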

Abnormal renal function was classified as having an estimated glomerular filtration rate of <60 mL/min/1.73 m² (calculated using the CKD-EPI formula24) or a diagnosis of renal disease before treatment initiation. Abnormal liver function was defined as having a total bilirubin of 2.0 or more, AST ≥109, ALT ≥97,25 or a diagnosis of liver disease before treatment. Indicator variables representing insurance status during cancer treatment were also used (commercial health plan, Medicaid, Medicare, other government program, patient assistance program, self-pay, workers' compensation).
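The organ function definitions above translate directly into indicator variables. The sketch below is illustrative only: the function and argument names are hypothetical, the estimated glomerular filtration rate is assumed to have been computed beforehand, and treating a missing laboratory value as "not abnormal" is an assumption that the text does not specify.

```python
from typing import Optional


def abnormal_renal(egfr: Optional[float], renal_dx_before_tx: bool) -> bool:
    """eGFR below 60, or a renal disease diagnosis before treatment initiation."""
    return renal_dx_before_tx or (egfr is not None and egfr < 60)


def abnormal_liver(total_bilirubin: Optional[float],
                   ast: Optional[float],
                   alt: Optional[float],
                   liver_dx_before_tx: bool) -> bool:
    """Any laboratory value at or above its cutoff, or a prior liver diagnosis."""
    labs_abnormal = (
        (total_bilirubin is not None and total_bilirubin >= 2.0)
        or (ast is not None and ast >= 109)
        or (alt is not None and alt >= 97)
    )
    # Assumption: missing laboratory values do not trigger the flag.
    return liver_dx_before_tx or labs_abnormal
```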

Treatment with IO monotherapy was defined as first-line treatment with nivolumab, pembrolizumab, cemiplimab, atezolizumab, durvalumab, or nivolumab/ipilimumab without concurrent CT or another cancer-related immunotherapy. Treatment with CT was defined as receiving first-line cisplatin or carboplatin with concurrent vinorelbine, docetaxel, gemcitabine, pemetrexed, or paclitaxel. Combination therapy was defined as first-line immunotherapy as defined above with concurrent cisplatin or carboplatin (Data Supplement, Table S2).
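The treatment definitions above can be expressed as a small classification rule. The following is a sketch under stated assumptions rather than the authors' implementation: the function name `classify_first_line` and the category labels are hypothetical, drug names are assumed to arrive as plain strings, and additional non-immunotherapy agents alongside a platinum are assumed to be permitted in the CT and combination groups.

```python
SINGLE_IO = ("nivolumab", "pembrolizumab", "cemiplimab", "atezolizumab", "durvalumab")
# Immunotherapy regimens named in the text: five single agents plus one doublet.
IO_REGIMENS = {frozenset({drug}) for drug in SINGLE_IO} | {
    frozenset({"nivolumab", "ipilimumab"})
}
ALL_IO = frozenset().union(*IO_REGIMENS)
PLATINUMS = frozenset({"cisplatin", "carboplatin"})
CT_PARTNERS = frozenset(
    {"vinorelbine", "docetaxel", "gemcitabine", "pemetrexed", "paclitaxel"}
)


def classify_first_line(drugs) -> str:
    """Map a list of first-line drug names to a treatment category."""
    regimen = frozenset(d.strip().lower() for d in drugs)
    io_part = regimen & ALL_IO
    non_io = regimen - ALL_IO
    has_platinum = bool(non_io & PLATINUMS)
    if io_part in IO_REGIMENS:
        if not non_io:
            return "io_monotherapy"  # immunotherapy with nothing else
        if has_platinum:
            return "combination"     # immunotherapy plus a platinum agent
        return "other"
    if not io_part and has_platinum and non_io & CT_PARTNERS:
        return "chemotherapy"        # platinum doublet, no immunotherapy
    return "other"
```

Note that under this rule nivolumab plus ipilimumab counts as IO monotherapy, mirroring the definition in the text, whereas ipilimumab alone falls into the "other" category.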

Training of ML Models

We created two ML models, one to predict 12-month progression-free survival (PFS) and the other to predict 12-month OS with multiple features (Data Supplement, Table S3). The treatment received is included as a feature in the training data (ie, first-line CT, IO monotherapy, combination therapy, etc). Given the variety of available ML models, we trained XGBoost, Gradient Boost, and random forest classifiers using the sklearn library in Python 3.11, picking the model that yielded the highest AUC on the test set as the final model. The cohort was divided into training (80%) and test (20%). Each model was trained using five-fold cross-validation, and hyperparameter tuning was completed using a grid-search methodology. No information that was obtained after initiation of first-line therapy was included in the training or testing sets. Final model parameters were selected on the basis of the highest AUC during five-fold cross-validation.
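The selection protocol described above (an 80/20 split, followed by a grid search scored by mean AUC across five cross-validation folds of the training set) can be sketched with the standard library alone. This is a simplified illustration, not the pipeline used in the study: `fit_sign_scorer` is a deliberately trivial placeholder standing in for a real learner such as a gradient-boosted classifier, the data are synthetic, and all function names are hypothetical.

```python
import random
from itertools import product
from statistics import mean


def auc(y_true, y_score):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def split_80_20(n, seed=0):
    """Shuffle row indices and return (training indices, test indices)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * n)
    return idx[:cut], idx[cut:]


def grid_search_cv(X, y, train_idx, fit, grid, k=5):
    """Return (best parameters, best mean validation AUC) over k folds."""
    folds = [train_idx[i::k] for i in range(k)]
    keys = sorted(grid)
    best_params, best_auc = None, -1.0
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        fold_aucs = []
        for held_out in folds:
            held = set(held_out)
            fit_idx = [i for i in train_idx if i not in held]
            model = fit([X[i] for i in fit_idx], [y[i] for i in fit_idx], **params)
            fold_aucs.append(
                auc([y[i] for i in held_out], [model(X[i]) for i in held_out])
            )
        if mean(fold_aucs) > best_auc:
            best_params, best_auc = params, mean(fold_aucs)
    return best_params, best_auc


def fit_sign_scorer(X_train, y_train, feature):
    """Placeholder learner: score by one feature, oriented by the class means."""
    pos = mean(x[feature] for x, t in zip(X_train, y_train) if t == 1)
    neg = mean(x[feature] for x, t in zip(X_train, y_train) if t == 0)
    sign = 1.0 if pos >= neg else -1.0
    return lambda x: sign * x[feature]


# Synthetic data: feature 0 is informative, feature 1 is pure noise.
rng = random.Random(1)
y = [i % 2 for i in range(200)]
X = [[2 * label + rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]

train_idx, test_idx = split_80_20(len(y))
best_params, cv_auc = grid_search_cv(
    X, y, train_idx, fit_sign_scorer, {"feature": [0, 1]}
)
```

The held-out test indices are never touched during the search, which mirrors the separation between hyperparameter tuning and final evaluation described in the text.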

Statistical Analysis

We constructed Kaplan-Meier curves and fit Cox survival models demonstrating 12-month PFS for patients in the test set who received first-line CT or first-line IO monotherapy. In these graphs, we stratify by predicted response to first-line therapy, with responsive patients being those deemed by the ML algorithm to be at low risk (LR) for disease progression and unresponsive patients being those deemed to be at high risk for disease progression.26 The threshold we selected for classifying predictions as high risk or LR relied on the Youden index, which selects the prediction threshold that optimizes the biomarker's differentiating ability when equal weight is given to sensitivity and specificity.27,28 We performed an analogous analysis for 12-month OS.
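The Youden index is J = sensitivity + specificity - 1, and the selected threshold is the one that maximizes J. A minimal sketch follows; the function name and the convention that a score at or above the threshold is labeled high-risk are assumptions made for illustration.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is called high-risk when its score is >= threshold.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict comparison keeps the lowest tied threshold
            best_t, best_j = t, j
    return best_t, best_j
```

For example, with outcomes `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, the function returns a threshold of 0.35 with J = 0.5 (sensitivity 1.0, specificity 0.5).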

To enhance the explainability of the ML models, we used Shapley plots,29 which show how the ML classifier makes its predictions by identifying the features that are most impactful to the model's final prediction. We then removed the five most impactful features according to the Shapley plots and retrained the models to determine the drop in predictive power (Data Supplement, Table S4).
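To make the idea behind Shapley plots concrete, the sketch below computes exact Shapley values for a toy three-feature model by averaging each feature's marginal contribution over every feature ordering. This brute-force approach is for illustration only; it is not the algorithm used by the shap package for tree models, and the toy model and baseline are hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features not yet added to a coalition keep their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]          # add feature j to the coalition
            value = model(current)
            phi[j] += value - prev     # marginal contribution of feature j
            prev = value
    return [p / len(orders) for p in phi]


def toy_model(z):
    """Hypothetical model: one additive feature and one two-way interaction."""
    return 2 * z[0] + z[1] * z[2]


phi = shapley_values(toy_model, [1, 1, 1], [0, 0, 0])
```

Here the additive feature receives its full effect (2.0) and the two interacting features split their joint effect equally (0.5 each), so the attributions sum to the difference between the prediction and the baseline prediction.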

Statistical analysis post-ML model development and training was performed using Stata 18.0 (StataCorp, College Station, TX). We calculated the AUC using the predictions from the ML algorithm on the independent test set, with confidence intervals derived from 1,000 bootstraps.
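A bootstrap confidence interval for the AUC can be sketched as follows. The study's intervals were computed in Stata; this Python version is only an illustration, and the percentile method, the resampling of whole patients with replacement, and the function names are assumptions.

```python
import random


def auc(y_true, y_score):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower bound, upper bound) using percentiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC is undefined unless both classes are present
            stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(y_true, y_score), lower, upper
```

The pairwise AUC computation above is quadratic in sample size, so a rank-based implementation would be preferable for a test set of several thousand patients.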

Comparative Effectiveness Study

We conducted a comparative effectiveness study to determine if the predictions from the ML-based biomarker are indeed a predictive biomarker rather than just a prognostic biomarker.30 We performed PSM between patients who received IO monotherapy and those who received CT in the test set. Specifically, we performed 1:1 PSM without replacement, prespecifying a caliper of 0.2. The Data Supplement (Fig S1) contains a directed acyclic graph that illustrates confounders of the relationship between the treatment received and the outcome of interest. After PSM, we constructed Kaplan-Meier curves and fit Cox survival models for patients receiving IO monotherapy and those receiving CT, stratifying by those deemed high-risk and LR according to the ML algorithm. This procedure allows us to reduce the confounding by indication that is present when assigning IO monotherapy versus CT, more clearly eliciting the predictive power of the ML model.
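The matching step can be illustrated with a greedy nearest-neighbor sketch. The study performed matching in Stata, so this is not the authors' implementation: the propensity scores are assumed to have been estimated already (for example, by a model of treatment assignment), the caliper is applied here to the absolute difference in propensity score, and the processing order (treated patients sorted by score) is an arbitrary choice.

```python
def match_1to1(treated, control, caliper=0.2):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping a patient id to a propensity score.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)
    pairs = []
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement: each control is used once
    return pairs


# Hypothetical scores: t3 has no control within the caliper and stays unmatched.
io_patients = {"t1": 0.30, "t2": 0.35, "t3": 0.95}
ct_patients = {"c1": 0.32, "c2": 0.33, "c3": 0.50}
pairs = match_1to1(io_patients, ct_patients)
```

Patients without a match inside the caliper are dropped, which is why the matched cohort is smaller than the test set and why results from the matched sample may not generalize to all treated patients.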

Sensitivity and Subgroup Analyses

In the Data Supplement, we conduct multiple sensitivity and subgroup analyses, including a PSM experiment using the Benjamini-Hochberg correction31 (Data Supplement).
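For readers unfamiliar with the Benjamini-Hochberg procedure, a minimal sketch is given below. How the correction was applied in the sensitivity analyses is described in the Data Supplement and is not reproduced here; the function name and the false discovery rate of 0.05 are assumptions for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Find the largest rank whose p-value is at or below rank * q / m.
        if p_values[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected
```

With p-values `[0.001, 0.04, 0.03, 0.20]`, only the first hypothesis is rejected, even though 0.03 and 0.04 would each pass an uncorrected 0.05 cutoff.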

RESULTS

Study Population and Demographics

A total of 38,048 patients from 2011 to 2024 met inclusion criteria (Data Supplement, Fig S2). A total of 25,505 (67.0%) experienced disease progression, and 14,043 (36.9%) died within 12 months of first-line therapy initiation. A total of 18,974 (49.9%) received doublet CT, 5,064 (13.3%) received IO monotherapy, 7,295 (19.2%) received combination therapy, and 6,715 (17.7%) received another form of first-line therapy. A total of 5,622 (14.8%) had PD-L1 expression ≥50%, 4,300 (11.3%) had 1% ≤PD-L1 <50%, 6,288 (16.5%) were confirmed to be negative, and 21,838 (57.4%) did not have PD-L1 information (Table 1). Thirty-seven patients in the cohort had mixed and/or pseudo-progression and then experienced clinical, pathologic, or radiologic progression (average interval of 234 days). The distributor of data has demonstrated a 15-day agreement with the National Death Index of 95.6%-97.6%.32

TABLE 1.

Demographic Makeup of the Entire Cohort and the Test Set

Characteristic Entire Cohort (N = 38,048) Test Set (n = 7,610)
Age at diagnosis, years 67.8 ± 9.6 67.7 ± 9.8
Progression after 12 months, No. (%)
 No evidence of disease progression 12,543 (33.0) 2,488 (32.7)
 Evidence of disease progression 25,505 (67.0) 5,122 (67.3)
Mortality after 12 months, No. (%)
 Alive after 12 months 24,005 (63.1) 4,803 (63.1)
 Died within 12 months 14,043 (36.9) 2,807 (36.9)
Sex, No. (%)
 Male 20,309 (53.4) 3,600 (47.3)
 Female 17,739 (46.6) 4,010 (52.7)
Race, No. (%)
 White 27,106 (71.2) 5,434 (71.4)
 Black 3,792 (10.0) 771 (10.1)
 Asian 653 (1.7) 151 (2.0)
 Other race 2,926 (7.7) 568 (7.5)
 Unknown 3,571 (9.4) 686 (9.0)
Ethnicity, No. (%)
 Hispanic/Latino 1,192 (3.1) 228 (3.0)
 Non-Hispanic/Latino 36,856 (96.9) 7,382 (97.0)
Tumor stage at initial diagnosis, No. (%)
 Stage 0 2 (<0.1) 0 (0.0)
 Stage I 3,223 (8.5) 698 (9.2)
 Stage II 1,969 (5.2) 389 (5.1)
 Stage III 9,356 (24.6) 1,908 (25.1)
 Stage IV 22,671 (59.6) 4,454 (58.5)
 Occult 4 (<0.1) 1 (<0.1)
 Unknown 823 (2.2) 160 (2.1)
ECOG status, No. (%)
 0 7,355 (19.3) 1,496 (19.7)
 1 16,534 (43.5) 3,304 (43.4)
 2 6,968 (18.3) 1,387 (18.2)
 3 1,539 (4.0) 320 (4.2)
 4 55 (0.1) 12 (0.2)
 Unknown 5,597 (14.7) 1,091 (14.3)
Practice type, No. (%)
 Academic Medical Center 6,810 (17.9) 1,370 (18.0)
 Community Medical Center 31,238 (82.1) 6,240 (82.0)
Smoking status, No. (%)
 History of smoking 34,899 (91.7) 6,965 (91.5)
 Never smoker 2,989 (7.9) 619 (8.1)
 Unknown 160 (0.4) 26 (0.3)
Tumor histology, No. (%)
 Squamous cell carcinoma 10,693 (28.1) 2,182 (28.7)
 Nonsquamous cell carcinoma 25,680 (67.5) 5,095 (67.0)
 Unknown 1,675 (4.4) 333 (4.4)
PD-L1, No. (%)
 ≥50% 5,622 (14.8) 1,097 (14.4)
 1% ≤ PD-L1 <50% 4,300 (11.3) 861 (11.3)
 Confirmed PD-L1–negative 6,288 (16.5) 1,243 (16.3)
 Unknown 21,838 (57.4) 4,409 (57.9)
Insurance coverage during treatment, No. (%)
 Medicare 16,663 (43.8) 3,354 (44.1)
 Medicaid 3,489 (9.2) 720 (9.5)
 Commercial insurance 22,172 (58.3) 4,458 (58.6)
 Self-pay 2,060 (5.4) 424 (5.6)
 No insurance reported 6,334 (16.6) 1,235 (16.2)
First-line oncologic therapy prescribed, No. (%)
 Platinum-based doublet chemotherapy 18,974 (49.9) 3,814 (50.1)
 IO monotherapy 5,064 (13.3) 1,014 (13.3)
 Combination therapy 7,295 (19.2) 1,411 (18.5)
 Other 6,715 (17.7) 1,371 (18.0)
Mutation presence, No. (%)
 ROS1+ 271 (0.7) 57 (0.8)
 BRAF+ 1,348 (3.5) 272 (3.6)
 KRAS+ 6,786 (17.8) 1,372 (18.0)
 NTRK1+ 747 (2.0) 136 (1.8)
 NTRK2+ 301 (0.8) 56 (0.7)
 NTRK3+ 803 (2.1) 152 (2.0)
 Other NTRK mutation 77 (0.2) 14 (0.2)
 HER2/ERBB2+ 1,133 (3.0) 232 (3.1)
 RET 944 (2.5) 209 (2.8)
 MET 1,935 (5.1) 397 (5.2)
Anti-infective use before treatment, No. (%)
 No 37,054 (97.4) 7,404 (97.3)
 Yes 994 (2.6) 206 (2.7)
Glucocorticoid use before treatment, No. (%)
 No 12,248 (32.2) 2,468 (32.4)
 Yes 25,800 (67.8) 5,142 (67.6)
Reported metastases before treatment, No. (%)
 Brain 930 (2.4) 170 (2.2)
 Bone 1,567 (4.1) 300 (3.9)
 Other CNS location 40 (0.1) 12 (0.2)
 Digestive system 392 (1.0) 100 (1.3)
 Adrenal glands 271 (0.7) 56 (0.7)
 Other location/unspecified 1,214 (3.2) 270 (3.6)
Comorbidities before treatment, No. (%)
 Diabetes 2,473 (6.5) 517 (6.8)
 Connective tissue disease 447 (1.2) 86 (1.1)
 Interstitial lung disease 6 (<0.1) 1 (<0.1)
 Abnormal renal function 5,467 (14.4) 1,126 (14.8)
 Abnormal liver function 928 (2.4) 174 (2.3)

NOTE. The distributor of data has demonstrated a 15-day agreement with the NDI of 95.6%-97.6%.32

Abbreviations: ECOG, Eastern Cooperative Oncology Group; IO, immuno-oncology; NDI, National Death Index.

The test set had 7,610 patients with similar covariate balance as the original cohort. Within the test set, patients in the IO monotherapy group with PD-L1 ≥50% had decreased 12-month disease progression (hazard ratio [HR], 0.74 [95% CI, 0.62 to 0.88]; P = .001) compared with those with PD-L1 <50%, but there was no significant difference in 12-month OS (HR, 0.79 [95% CI, 0.61 to 1.03]; P = .076). Those receiving CT or combination therapy with PD-L1 ≥50% experienced decreased 12-month disease progression and 12-month OS relative to those receiving the same treatment but with PD-L1 <50% (Data Supplement, Table S5 and Figs S3 and S4).

ML Model Evaluation

The ML models achieved an area under the receiver operating characteristic curve of 0.701 (95% CI, 0.689 to 0.714) and 0.718 (95% CI, 0.707 to 0.730) for predicting 12-month PFS and 12-month OS, respectively (Table 2; Data Supplement, Fig S5). On the basis of test set AUC, the best performing models used XGBoost for predicting 12-month PFS and 12-month OS (Data Supplement, Table S6). LR patients are classified as being less likely to experience disease progression or death within 12 months, whereas high-risk patients are classified as being more likely to experience disease progression or death within 12 months. The threshold to be classified as high-risk for 12-month PFS and 12-month OS was 0.79 and 0.50, respectively, calculated using the Youden index. Kaplan-Meier survival plots illustrate two curves, one for patients deemed LR and the other for patients deemed high risk, for each of IO monotherapy, CT, and combination therapy (Fig 1).

TABLE 2.

AUROC for Each ML Model on the Independent Test Set

Biomarker Outcome AUROC (95% CI) on the Test Set
ML model 12-month PFS 0.701 (0.689 to 0.714)
12-month OS 0.718 (0.707 to 0.730)

NOTE. AUROC was calculated using 1,000 bootstraps. All patients who received first-line therapy were included in the analysis.

Abbreviations: AUROC, area under the receiver operating characteristic curve; ML, machine learning; OS, overall survival; PFS, progression-free survival.

FIG 1.


Kaplan-Meier curves demonstrating (A) 12-month PFS and (B) 12-month OS for test set patients classified as low-risk for disease progression or death (responsive) using the ML algorithm versus those classified as high-risk for disease progression or death (unresponsive). Thresholds of 0.79 and 0.50 were set for classifying patients as high-risk in the PFS and OS models, respectively. Patients receiving doublet chemotherapy, IO monotherapy, or combination therapy were included. IO, immuno-oncology; ML, machine learning; OS, overall survival; PFS, progression-free survival.

Patients classified as LR for 12-month PFS had significantly better survival compared with the high-risk group (HR, 0.47 [95% CI, 0.45 to 0.50]; P < .001) for all therapy groups (Table 3). Patients deemed LR for progression on IO monotherapy according to the model had decreased 12-month disease progression (HR, 0.53 [95% CI, 0.46 to 0.61]; P < .001) compared with the high-risk group. This survival benefit persisted when looking at patients on IO monotherapy with PD-L1 ≥50% (HR, 0.47 [95% CI, 0.36 to 0.61]; P < .001).

TABLE 3.

HRs Relating Likelihood of 12-Month Disease Progression and 12-Month Mortality for Test Set Patients Labeled as LR Relative to Those Labeled as High-Risk

Outcome n (LR) n (high-risk) HR (95% CI) P
12-month PFS
 All patients 3,685 3,925 0.47 (0.45 to 0.50) <.001a
 IO monotherapy 685 448 0.53 (0.46 to 0.61) <.001a
 IO monotherapy and PD-L1 ≥50% 369 100 0.47 (0.36 to 0.61) <.001a
 CT 1,636 2,172 0.43 (0.40 to 0.47) <.001a
 CT and PD-L1 ≥50% 212 64 0.47 (0.33 to 0.66) <.001a
 Combination therapy 854 557 0.53 (0.46 to 0.60) <.001a
 Combination therapy and PD-L1 ≥50% 182 44 0.37 (0.26 to 0.56) <.001a
12-month OS
 All patients 3,696 3,901 0.31 (0.29 to 0.34) <.001a
 IO monotherapy 668 463 0.30 (0.24 to 0.37) <.001a
 IO monotherapy and PD-L1 ≥50% 347 121 0.32 (0.22 to 0.45) <.001a
 CT 1,655 2,144 0.28 (0.24 to 0.31) <.001a
 CT and PD-L1 ≥50% 230 46 0.22 (0.14 to 0.37) <.001a
 Combination therapy 790 621 0.43 (0.36 to 0.52) <.001a
 Combination therapy and PD-L1 ≥50% 163 63 0.38 (0.23 to 0.64) <.001a

Abbreviations: CT, chemotherapy; HR, hazard ratio; IO, immuno-oncology; LR, low-risk; OS, overall survival; PFS, progression-free survival.

a Significant at the 0.01 level.

Similar results were seen when predicting 12-month OS. LR patients had improved survival compared with high-risk patients (HR, 0.31 [95% CI, 0.29 to 0.34]; P < .001). The algorithm also stratified 12-month OS for patients receiving IO monotherapy (HR, 0.30 [95% CI, 0.24 to 0.37]; P < .001) and for those receiving IO monotherapy with PD-L1 ≥50% (HR, 0.32 [95% CI, 0.22 to 0.45]; P < .001). CT and combination therapy patients deemed LR for 12-month mortality were less likely to die, with HRs of 0.28 (95% CI, 0.24 to 0.31; P < .001) and 0.43 (95% CI, 0.36 to 0.52; P < .001), respectively.

For predicting 12-month PFS, having a stage IV tumor at initial diagnosis, PD-L1, Medicare enrollment, diagnosis year, and albumin were the five most impactful factors for the ML model (Fig 2). Similar factors were important when predicting 12-month OS. Albumin had the highest impact, followed by Medicare enrollment, PD-L1, having a stage IV tumor at initial diagnosis, and whether a PD-L1 result was reported.

FIG 2.


Shapley figures for the XGBoost models demonstrating the 10 most important features for model predictions. Warmer colors indicate higher absolute values for the features, and cooler colors indicate lower absolute values. Positive Shapley values (as seen on the x-axis) indicate that a feature increases the likelihood of 12-month disease progression or mortality. Negative Shapley values (as seen on the x-axis) indicate that a feature decreases the likelihood of 12-month disease progression or mortality. Features are listed in descending order of importance, with the most important feature listed first. (A) 12-month PFS model. (B) 12-month OS model. ECOG, Eastern Cooperative Oncology Group; OS, overall survival; PFS, progression-free survival; SHAP, Shapley additive explanations.

When restricting our analysis to those who both received IO monotherapy and had a documented PD-L1 status, the ML algorithm achieved an AUC of 0.652 (95% CI, 0.612 to 0.695) and 0.652 (95% CI, 0.608 to 0.692) for predicting 12-month PFS and 12-month OS, respectively. By contrast, PD-L1 alone yielded an AUC of 0.549 (95% CI, 0.504 to 0.592) and 0.490 (95% CI, 0.440 to 0.541) for 12-month PFS and 12-month OS, respectively (Data Supplement, Table S7).

Comparative Effectiveness Study

After PSM on the test set, we plotted Kaplan-Meier curves and fit Cox survival models for patients receiving IO monotherapy and CT, stratifying by ML-derived risk (Fig 3). LR IO monotherapy and CT patients had lower 12-month disease progression compared with their high-risk counterparts (HR, 0.49 [95% CI, 0.38 to 0.63] and HR, 0.49 [95% CI, 0.38 to 0.63], respectively). There was similar stratification on the basis of 12-month OS for IO monotherapy (HR, 0.28 [95% CI, 0.19 to 0.42]) and CT (HR, 0.29 [95% CI, 0.20 to 0.41]; Data Supplement, Table S8).

FIG 3.


Kaplan-Meier curves demonstrating (A) 12-month PFS and (B) 12-month OS for test set patients classified as low-risk (responsive) using the ML algorithm versus those classified as high-risk (unresponsive). Survival curves were generated after 1:1 propensity score matching without replacement and a caliper of 0.2. Thresholds of 0.79 and 0.50 were set for classifying patients as high-risk in the PFS and OS models, respectively. IO, immuno-oncology; ML, machine learning; OS, overall survival; PFS, progression-free survival.

DISCUSSION

We have shown that an ML algorithm that uses multiple features is a superior predictor of treatment response in advanced NSCLC compared with PD-L1 alone in terms of AUC and unadjusted HRs. This ability applies not only to IO monotherapy but also to other first-line treatment options, including doublet CT and combined immunotherapy/CT. This supports the notion that ML algorithms that amalgamate multiple features into predictions can outperform a single biomarker. By emulating prospective validation on the test set using PSM, we show that the ML model can reliably stratify patients on the basis of likelihood of treatment response to IO monotherapy or CT. We further demonstrate that when we restrict the analysis to patients who received first-line IO monotherapy, the predictions of the ML model remain a more effective predictive biomarker than PD-L1 alone.

While PD-L1 is a useful biomarker for predicting disease progression with IO monotherapy in this retrospective cohort, it is not a statistically significant predictor of OS. Moreover, we demonstrate that after PSM, IO monotherapy did not demonstrate a PFS or OS benefit over CT at any arbitrarily chosen PD-L1 threshold (Data Supplement). This indicates that when emulating pseudo-randomization in cohort studies, the standard PD-L1 threshold of 50% may not always be optimal for allocating IO monotherapy over CT. Consequently, an ML algorithm such as the one we describe in this study may provide a more accurate prediction of a patient's likelihood of treatment response than a single biomarker.

The year of first-line therapy initiation was a significant contributor to the prediction for 12-month PFS, according to the Shapley plot. This could be attributed to the fact that clinicians' understanding of immunotherapy has evolved throughout the 2010s as more randomized control data became available. As a result, while a single predictive biomarker such as PD-L1 may be flawed, its use to guide immunotherapy or doublet CT allocation likely increased with time and may affect the likelihood of treatment response.

There are multiple reasons why an ML algorithm may outperform a single biomarker. First, the biomarker is subject to error as in the case of PD-L1, where a pathologist must interpret staining intensity and the patient-facing provider may make clinical decisions based solely on that interpretation. ML algorithms incorporate many features into their predictions, minimizing the contribution of error in any one given feature to the final prediction. In addition, the use of multiple features in the model allows the algorithm to provide more personalized risk prediction, enhancing precision medicine applications.

Clinicians may be wary of relying on ML models to aid clinical decision making because of problems with interpretability. Using Shapley plots, we see that PD-L1 was an impactful feature in the 12-month PFS and 12-month OS models, but it was the second and third most important feature, respectively. Other important features included stage at initial diagnosis, Medicare enrollment, and year of diagnosis. These findings are intuitive and demonstrate how using many covariates in model creation can lead to logical decisions by the AI. These automated predictions are based on routinely collected real-world data, ensuring that this algorithm is clinically feasible and can inform treatment decisions at the bedside. We propose that this method can apply to other domains that use hard-and-fast threshold-based criteria to better optimize treatment allocation. Further work should include reproducing the methods in this article with a larger cohort.

For a clinician to use this algorithm, we envision that one would input the necessary variables into a graphical user interface that is either hosted on a third-party smartphone application or online. The clinician would indicate which first-line therapy they are considering (ie, IO monotherapy, doublet CT, combination therapy, etc), and the algorithm would predict a likelihood of 12-month PFS and 12-month OS by outputting a continuous value between 0 and 1. Using the respective probability outputted by the software, the clinician can determine if they wish to continue with the therapy in question or if they should consider a different treatment.

This study has limitations. We cannot generalize the results of the PSM approach to patients with PD-L1–positive advanced NSCLC at large. Our claims only apply to this retrospective cohort; the PD-L1 threshold of 50% for allocating IO monotherapy over CT has been validated multiple times.6,9,12 We use a relatively small cohort to develop the algorithm; the algorithm could be further refined by significantly increasing the number of patients who meet inclusion criteria. Doing so would likely increase algorithmic performance in terms of AUC and better separate LR from high-risk groups.

In conclusion, we present a novel ML algorithm that can accurately predict likelihood of first-line treatment response in patients with advanced NSCLC. This algorithm is more predictive for determining the efficacy of IO monotherapy compared with the PD-L1 biomarker, which has previously been used to allocate first-line IO monotherapy over first-line CT. This study provides evidence that ML algorithms are better predictors of disease course and treatment efficacy in oncology than current threshold-based criteria such as PD-L1 status. Indeed, AI techniques may prove to be better at allocating therapy than hard-and-fast individual biomarkers for multiple use cases in medicine.

ACKNOWLEDGMENT

We would like to acknowledge Linsday Warrenburg, PhD, for the guidance she provided.

DISCLAIMER

The views and opinions expressed in this article are those of the authors and do not reflect the official positions or policies of institutions with which the authors are affiliated.

DATA SHARING STATEMENT

A data sharing statement provided by the authors is available with this article at DOI https://doi.org/10.1200/CCI-24-00157. The data that support the findings of this study originated from Flatiron Health, Inc. Requests for data sharing by license or by permission for the specific purpose of replicating results in this manuscript can be submitted to dataaccess@flatiron.com. The ML model was developed in Python v3.11.5 and required the following packages: sklearn, pandas, shap, and numpy. Statistical analysis was performed in Stata, requiring the teffects and psmatch2 packages. Code is available at https://github.com/vahluw/NSCLC_PDL1_Immunotherapy.

AUTHOR CONTRIBUTIONS

Conception and design: All authors

Financial support: Ravi B. Parikh

Administrative support: Ravi B. Parikh

Provision of study materials or patients: Ravi B. Parikh

Collection and assembly of data: Vinayak S. Ahluwalia

Data analysis and interpretation: All authors

Manuscript writing: All authors

Final approval of manuscript: All authors

Accountable for all aspects of the work: All authors

AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians (Open Payments).

Vinayak S. Ahluwalia

Research Funding: Flatiron Health (Inst)

Ravi B. Parikh

Stock and Other Ownership Interests: Merck, GNS Healthcare, Onc.AI, Thyme Care, Verve Therapeutics, Bristol Myers Squibb, AstraZeneca

Honoraria: Wake Forest School of Medicine

Consulting or Advisory Role: Thyme Care, Humana, NanOlogy, Merck, Biofourmis, ConcertAI, G1 Therapeutics, Archetype Therapeutics, Credit Suisse, Klick Inc, Onc.AI

Research Funding: Emerson Collective (Inst), Mendel AI (Inst), Prostate Cancer Foundation (Inst), Arnold Ventures (Inst), Commonwealth Fund (Inst), Schmidt Futures (Inst)

Patents, Royalties, Other Intellectual Property: Technology to integrate patient-reported outcomes into electronic health record algorithms

Open Payments Link: https://openpaymentsdata.cms.gov/physician/701967

No other potential conflicts of interest were reported.

REFERENCES

1. Yu H, Boyle TA, Zhou C, et al: PD-L1 expression in lung cancer. J Thorac Oncol 11:964-975, 2016
2. Oun R, Moussa YE, Wheate NJ: The side effects of platinum-based chemotherapy drugs: A review for chemists. Dalton Trans 47:6645-6653, 2018
3. Lahiri A, Maji A, Potdar PD, et al: Lung cancer immunotherapy: Progress, pitfalls, and promises. Mol Cancer 22:40, 2023
4. Garon EB, Rizvi NA, Hui R, et al: Pembrolizumab for the treatment of non-small-cell lung cancer. N Engl J Med 372:2018-2028, 2015
5. Brahmer JR, Drake CG, Wollner I, et al: Phase I study of single-agent anti-programmed death-1 (MDX-1106) in refractory solid tumors: Safety, clinical activity, pharmacodynamics, and immunologic correlates. J Clin Oncol 28:3167-3175, 2010
6. Reck M, Rodríguez-Abreu D, Robinson AG, et al: Pembrolizumab versus chemotherapy for PD-L1–positive non–small-cell lung cancer. N Engl J Med 375:1823-1833, 2016
7. Mok TSK, Wu YL, Kudaba I, et al: Pembrolizumab versus chemotherapy for previously untreated, PD-L1-expressing, locally advanced or metastatic non-small-cell lung cancer (KEYNOTE-042): A randomised, open-label, controlled, phase 3 trial. Lancet 393:1819-1830, 2019
8. Di Federico A, De Giglio A, Parisi C, et al: PD-1/PD-L1 inhibitor monotherapy or in combination with chemotherapy as upfront treatment for advanced NSCLC with PD-L1 expression ≥50%: Selecting the best strategy. Crit Rev Oncol Hematol 160:103302, 2021
9. Gandhi L, Rodríguez-Abreu D, Gadgeel S, et al: Pembrolizumab plus chemotherapy in metastatic non–small-cell lung cancer. N Engl J Med 378:2078-2092, 2018
10. Paz-Ares L, Luft A, Vicente D, et al: Pembrolizumab plus chemotherapy for squamous non–small-cell lung cancer. N Engl J Med 379:2040-2051, 2018
11. Hellmann MD, Paz-Ares L, Bernabe Caro R, et al: Nivolumab plus ipilimumab in advanced non-small-cell lung cancer. N Engl J Med 381:2020-2031, 2019
12. Brahmer J, Reckamp KL, Baas P, et al: Nivolumab versus docetaxel in advanced squamous-cell non-small-cell lung cancer. N Engl J Med 373:123-135, 2015
13. Seymour L, Bogaerts J, Perrone A, et al: iRECIST: Guidelines for response criteria for use in trials testing immunotherapeutics. Lancet Oncol 18:e143-e152, 2017
14. Gould MK, Huang BZ, Tammemagi MC, et al: Machine learning for early lung cancer identification using routine clinical and laboratory data. Am J Respir Crit Care Med 204:445-453, 2021
15. Liu D, Wang X, Li L, et al: Machine learning-based model for the prognosis of postoperative gastric cancer. Cancer Manag Res 14:135-155, 2022
16. Nartowt BJ, Hart GR, Muhammad W, et al: Robust machine learning for colorectal cancer risk prediction and stratification. Front Big Data 3:6, 2020
17. Stark GF, Hart GR, Nartowt BJ, et al: Predicting breast cancer risk using personal health data and machine learning models. PLoS One 14:e0226765, 2019
18. Xie J, Luo X, Deng X, et al: Advances in artificial intelligence to predict cancer immunotherapy efficacy. Front Immunol 13:1076883, 2022
19. Rakaee M, Adib E, Ricciuti B, et al: Association of machine learning-based assessment of tumor-infiltrating lymphocytes on standard histologic images with outcomes of immunotherapy in patients with NSCLC. JAMA Oncol 9:51-60, 2023
20. Birnbaum B, Nussbaum N, Seidl-Rathkopf K, et al: Model-assisted cohort selection with bias analysis for generating large-scale cohorts from the EHR for oncology research. arXiv 10.48550/arXiv.2001.09765
21. Ma X, Long L, Moon S, et al: Comparison of population characteristics in real-world clinical oncology databases in the US: Flatiron Health, SEER, and NPCR. MedRxiv 10.1101/2020.03.16.20037143
22. Collins GS, Reitsma JB, Altman DG, et al: Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. BMC Med 13:1, 2015
23. Oken MM, Creech RH, Tormey DC, et al: Toxicity and response criteria of the Eastern Cooperative Oncology Group. Am J Clin Oncol 5:649-655, 1982
24. CKD-EPI creatinine equation. National Kidney Foundation, 2021. https://www.kidney.org/content/ckd-epi-creatinine-equation-2021
25. Koyama T, Hamada H, Nishida M, et al: Defining the optimal cut-off values for liver enzymes in diagnosing blunt liver injury. BMC Res Notes 9:41, 2016
26. Manz CR, Chen J, Liu M, et al: Validation of a machine learning algorithm to predict 180-day mortality for outpatients with cancer. JAMA Oncol 6:1723-1730, 2020
27. Youden WJ: Index for rating diagnostic tests. Cancer 3:32-35, 1950
28. Ruopp MD, Perkins NJ, Whitcomb BW, et al: Youden index and optimal cut-point estimated from observations affected by a lower limit of detection. Biom J Biom Z 50:419-430, 2008
29. An introduction to explainable AI with Shapley values—SHAP latest documentation. https://shap.readthedocs.io/en/latest/example_notebooks/overviews/An%20introduction%20to%20explainable%20AI%20with%20Shapley%20values.html
30. Ballman KV: Biomarker: Predictive or prognostic? J Clin Oncol 33:3968-3971, 2015
31. Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol 57:289-300, 1995
32. Zhang Q, Gossai A, Monroe S, et al: Validation analysis of a composite real-world mortality endpoint for patients with cancer in the United States. Health Serv Res 56:1281-1287, 2021



Articles from JCO Clinical Cancer Informatics are provided here courtesy of Wolters Kluwer Health
