Abstract
The phase 3 IMpower150 trial in treatment-naïve patients with metastatic non-small-cell lung cancer (NSCLC) demonstrated significantly longer progression-free (PFS) and overall survival (OS) with first-line atezolizumab (anti-PD-L1)-bevacizumab (anti-VEGF)-carboplatin-paclitaxel (ABCP) than with bevacizumab-carboplatin-paclitaxel (BCP). We characterise four molecular NSCLC subtypes (NMF1-4), identified by unsupervised non-negative matrix factorization-based clustering of transcriptomes from 564 pre-treatment primary tumour samples from IMpower150. Each subtype has distinct tumour PD-L1 expression levels, epithelial characteristics, immune composition, and treatment outcomes. Both NMF2 (enriched in tumour proliferation signal, macrophages, and monocytes) and NMF4 (enriched in B cells and T cells) have elevated tumour PD-L1 expression, but only NMF4 demonstrates PFS and OS benefits with ABCP versus either BCP or atezolizumab-carboplatin-paclitaxel (ACP). Patients with NMF1 (enriched in basal and squamous-like cells) have improved outcomes on ABCP compared with ACP or BCP, whereas those with NMF3 (enriched in adenocarcinoma signatures) show similar outcomes across treatments. These insights could help inform individualised first-line treatment for metastatic NSCLC.
Subject terms: Tumour biomarkers, Tumour immunology, Lung cancer
Standard first-line therapy for patients with non-small cell lung cancer (NSCLC) is immunotherapy, but responses vary and consistent predictive biomarkers are lacking. Here, using RNA-sequencing data from a large clinical trial in NSCLC patients, the authors define four molecular subsets with distinct tumour-intrinsic and -extrinsic features and differing outcomes with immunotherapy combinations.
Introduction
Lung cancer is the leading cause of cancer-related deaths1,2, with non-small cell lung cancer (NSCLC) comprising up to 85% of lung cancers3. Most NSCLC tumours have non-squamous pathology4. The current standard of care for oncogenic driver mutation-negative NSCLC involves immune checkpoint inhibitors (ICIs), which relieve the inhibitory effects of tumours on T lymphocytes4. In this inhibitory pathway, programmed death ligand 1 (PD-L1) on the surface of tumour and other cells binds to programmed cell death protein 1 (PD-1) on activated T cells, suppressing cytotoxic T-cell responses to tumours5. In the absence of ICIs, high PD-L1 levels therefore facilitate immune escape, and tumour PD-L1 expression is associated with shorter survival and poor prognosis6.
The PD-L1 inhibitor atezolizumab is used to treat NSCLC either as monotherapy for patients with PD-L1–high expression (≥ 50% of tumour cells or ≥ 10% of tumour area infiltrated by immune cells)7,8 or in combination with chemotherapy and the anti-angiogenic agent bevacizumab regardless of PD-L1 levels9,10. However, PD-L1’s predictive value for immunotherapy outcomes varies across trials11, necessitating additional markers for personalised NSCLC treatment. Favourable responses to immunotherapy in NSCLC have been linked to high tumour mutational burden (TMB)12, intratumoural plasma and B cells13, and a pro-immunogenic tumour microenvironment (TME)14 characterized by abundant tumour-infiltrating lymphocytes15, dendritic cells, and macrophages14. Despite these associations, a universally accepted predictive biomarker remains elusive.
The phase 3 IMpower150 trial (NCT02366143) was conducted in treatment-naïve patients with advanced non-squamous metastatic NSCLC9. The study demonstrated significantly improved progression-free (PFS) and overall survival (OS) with the ‘ABCP’ treatment combination of atezolizumab (anti–PD-L1), bevacizumab (anti-vascular endothelial growth factor [VEGF]) and carboplatin-paclitaxel, compared with bevacizumab plus carboplatin-paclitaxel (BCP), regardless of tumour PD-L1 expression (PFS HR 0.62 [95% CI, 0.52-0.74], P < 0.001; OS HR 0.78 [95% CI, 0.64-0.96], P = 0.02). Although no clinical benefit was seen with atezolizumab plus carboplatin-paclitaxel (ACP) versus BCP in the overall study population, patients with PD-L1–high NSCLC had improved PFS with ACP vs BCP (HR 0.63 [95% CI, 0.43-0.92])10.
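Hazard ratios (HRs) such as those quoted above are estimated on the log scale, where a Wald-type 95% CI is symmetric. As a hedged illustration (the text does not state how the intervals were built, and trial P values usually come from stratified log-rank tests), the standard error and an approximate two-sided P value can be back-calculated from a reported HR and CI. The helper name `p_from_hr_ci` is hypothetical.

```python
import math

def p_from_hr_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate Wald statistics implied by an HR and its 95% CI.

    Assumes the CI was built as exp(log(HR) +/- z_crit * SE), i.e. symmetric on the log scale.
    Returns (SE of log HR, z statistic, two-sided P value, HR implied by the CI bounds).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided P value from the standard normal: P = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    # For a log-symmetric CI, the geometric mean of the bounds recovers the point estimate.
    implied_hr = math.sqrt(lower * upper)
    return se, z, p, implied_hr
```

For the OS result (HR 0.78 [0.64-0.96]) this gives P of roughly 0.016, in line with the reported P = 0.02; for PFS (HR 0.62 [0.52-0.74]) it gives P well below 0.001.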
Previous molecular subtyping of non-squamous NSCLC relied on The Cancer Genome Atlas (TCGA) tumours, which are almost exclusively early stage16,17. In this work, we identify transcriptome-based molecular subtypes of advanced non-squamous NSCLC using unsupervised machine learning-based clustering. We describe the relationships between these molecular subtypes and PD-L1 expression status, tumour/TME biology, and clinical outcomes.
Results
Identification and characterisation of four molecular subtypes of non-squamous NSCLC
We sought to identify and characterise molecular subtypes of non-squamous NSCLC that capture the molecular heterogeneity of the disease and might predict PFS and OS following combined PD-L1 and angiogenesis blockade. Consensus non-negative matrix factorisation (NMF) was used to cluster the transcriptomes of primary NSCLC tumour biopsies from 564 treatment-naïve patients in IMpower1509,10. Using the 3072 most variably expressed genes, consensus NMF clustering identified four groups as the most robust solution (i.e., four subtypes, henceforth termed NMF1-4) (Fig. 1a). Of these patients, 195 had been treated with ABCP, 178 with ACP, and 185 with BCP; 6 patients were not treated.
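The consensus procedure can be sketched in NumPy as follows. This is a minimal illustration under assumptions the text does not state: Lee-Seung multiplicative updates on a non-negative genes × samples matrix, each sample assigned to its highest-weighted metagene, and the Kim-Park dispersion coefficient as the stability score for comparing ranks k. All function names are hypothetical, and the numbers of runs and iterations are arbitrary.

```python
import numpy as np

def top_variable_genes(X, n_top):
    """Keep the n_top genes (rows) with the highest variance across samples (columns)."""
    idx = np.argsort(X.var(axis=1))[::-1][:n_top]
    return X[idx]

def nmf(V, k, n_iter=300, seed=0, eps=1e-10):
    """Lee-Seung multiplicative updates for V ~ W @ H (Frobenius loss); V must be non-negative."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def consensus_matrix(V, k, n_runs=30):
    """Fraction of randomly initialised runs in which each pair of samples shares a cluster."""
    n = V.shape[1]
    C = np.zeros((n, n))
    for run in range(n_runs):
        _, H = nmf(V, k, seed=run)
        labels = H.argmax(axis=0)  # each sample goes to its dominant metagene
        C += labels[:, None] == labels[None, :]
    return C / n_runs

def dispersion(C):
    """Kim-Park dispersion: 1 for a perfectly stable consensus, lower when pairs are ambiguous."""
    return float(np.mean(4 * (C - 0.5) ** 2))
```

Candidate ranks (for example k = 2 to 6) would then be compared via `dispersion(consensus_matrix(top_variable_genes(X, 3072), k))`, with X a non-negative genes × samples expression matrix; the heatmap in Fig. 1a corresponds to a matrix like C for the chosen k.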
Fig. 1. Transcriptional classification reveals four subtypes of non-squamous NSCLC with distinct biologies.
a RNA-sequencing data from NSCLC tumour samples from 564 patients in the IMpower150 trial9 were grouped into the four molecular subtypes shown in the consensus matrix, based on genes with the highest expression variance, using non-negative matrix factorization. Heatmap summarizing consensus clustering of samples factorized via NMF (k = 4; for details see Methods), with each sample represented in one row/column. Consensus values (the fraction of factorization runs in which each pair of samples clustered together) are represented as red when two samples always cluster together and purple when two samples never cluster together. b Heatmap for gene expression signatures18 of various immune cell types and cancer-related pathways in each NMF subtype, compared with all other patients in IMpower150 (left panel). The dot heatmap (right panel) shows the summarized gene signature enrichment or diminution in each NMF subtype. c Heatmap of gene signature enrichment or diminution for B cells, myeloid cells, stroma cells, and T effector cells by NMF subtype. d Proportions of patients whose tumour samples were well differentiated, moderately differentiated, or undifferentiated. e Proportions of patients whose tumour samples had PD-L1–high (≥ 50% using the SP263 immunohistochemistry assay), PD-L1–low (1-49%), and PD-L1–negative (< 1%) protein expression by NMF subtype. f Distribution of The Cancer Genome Atlas16 lung adenocarcinoma (LUAD) subtypes (left panel) and of LUAD subtypes defined by Roh et al.17 (right panel) among NMF subtypes. Source data are provided as a Source Data file.
cDC1 conventional type 1 dendritic cell, cDC2 conventional type 2 dendritic cell, EMT epithelial-mesenchymal transition, DC mature mature dendritic cell, NK cell natural killer cell, pDC plasmacytoid dendritic cell, sc.foll.b follicular B cell profiled from single-cell dataset, sc.GC.b germinal centre B cell profiled from single-cell dataset, sc.plasma plasma cell profiled from single-cell dataset.
To understand the biology underlying the molecular subtypes, we first compared immune cell types, biological pathways, and gene expression patterns across the groups (Fig. 1b). The NMF1 subtype (n = 103 samples, 18%; Fig. 1a) was enriched in gene signatures associated with basal/squamous-like cells, neutrophils, and endothelial cells (Fig. 1b). The NMF2 subtype (n = 184, 33%) was enriched in gene signatures for macrophages, monocytes, and proliferation. The NMF3 subtype (n = 158, 28%) was predominantly enriched in adenocarcinoma gene signatures and showed minimal expression of signatures typically associated with squamous cell carcinoma. Although all tumours in our study were non-squamous, gene expression patterns associated with squamous cell carcinoma can still be present at varying levels in non-squamous tumours; NMF3 had the lowest levels of these squamous-associated signatures of all subtypes. The NMF4 subtype (n = 119, 21%) had the most lymphocyte-inflamed transcriptomic profile, with enrichment of epithelial-to-mesenchymal transition–related gene signatures as well as B-cell, dendritic cell, stromal, and T-cell gene signatures.
We further classified NMF subtype biology using TME-associated and immune cell subtype-specific expression profiles (Fig. 1c)18. Only the NMF4 subtype showed B-cell enrichment, specifically of follicular B cells, germinal centre B cells, and plasma cells; the other three subtypes expressed low levels of B-cell signatures. Among myeloid cells, NMF2 expressed the highest levels of two myeloid subset signatures from Gavish et al.18 (myeloid_C6_MMP9 and myeloid_C4_CCL2), whereas NMF4 was enriched in myeloid_C2_CD16 and myeloid_C3_CCR2. Among stromal cells, both NMF2 and NMF4 were enriched in cancer-associated fibroblasts (CAFs; fibroblast_C10_COMP and fibroblast_C11_SERPINE1). NMF4 was the only subtype enriched in the 8-gene T-effector (Teff) signature (CD8A, GZMA, GZMB, IFNG, CXCL9, CXCL10, PRF1, TBX21). NMF3 showed low levels of immune and stromal cell signatures. Overall, the transcriptional subtypes suggest distinct tumour and TME compartments.
The transcriptomic profiles suggested differences in epithelial biology; therefore, we examined whether these differences were observable by pathology. The proportions of samples classified by pathologists as undifferentiated, moderately differentiated, and well differentiated were similar among NMF1, NMF2, and NMF4 (Fig. 1d). However, NMF3 comprised mostly well-differentiated samples (84%), in line with its highest expression of adenocarcinoma signatures and minimal expression of squamous cell carcinoma signatures (Fig. 1b). Pathological subtypes were also compared between molecular subtypes (Supplementary Fig 1f). No association between NMF and pathological subtypes was detected, although this comparison was limited by the small size of the pathological groups other than adenocarcinoma. Similarly, no significant differences were seen between NMF subtypes in prognostic features such as ECOG performance status, liver metastases, and sex (Supplementary Fig 1g).
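The text does not specify how signature scores such as the 8-gene Teff signature were computed. One common approach, assumed here purely for illustration, is to z-score each gene across samples and average over the signature genes; `signature_scores` is a hypothetical helper.

```python
from statistics import mean, pstdev

# 8-gene T-effector signature as listed in the text.
TEFF_GENES = ["CD8A", "GZMA", "GZMB", "IFNG", "CXCL9", "CXCL10", "PRF1", "TBX21"]

def signature_scores(expr, genes=TEFF_GENES):
    """Mean per-gene z-score across signature genes, one score per sample.

    expr maps gene symbol -> list of log-scale expression values, with samples in the
    same order for every gene. Genes with zero variance contribute 0 to every sample.
    """
    n_samples = len(expr[genes[0]])
    z_by_gene = []
    for gene in genes:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        z_by_gene.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    return [mean(z[i] for z in z_by_gene) for i in range(n_samples)]
```

Scores of this kind could then be averaged per NMF subtype and compared with the rest of the cohort to produce enrichment heatmaps like those in Fig. 1b, c.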
The NSCLC molecular subtypes were further analysed for PD-L1 protein expression on cancer cells using the SP263 assay (Supplementary Table 1). PD-L1 expression was significantly higher in the NMF2 and NMF4 subtypes than in NMF1 and NMF3 (P = 1.31 × 10⁻¹⁰ for NMF2 and NMF4 vs NMF1 and NMF3). Among all pairwise comparisons, only NMF2 vs NMF4 was not statistically significant (P = 0.36) (Fig. 1e). Thus, NMF2 and NMF4 have similar PD-L1 levels yet distinct immune infiltration patterns.
To compare our NMF molecular subtypes with other transcriptome-based subtypes for NSCLC, we evaluated the distribution of TCGA lung adenocarcinoma subtypes16 and the distribution of subtypes published by Roh et al. (Fig. 1f)17 within each NMF subtype. The terminal respiratory ‘bronchoid’ subtype identified in the TCGA16 was enriched in NMF3 (82%; Fig. 1f), consistent with NMF3’s adenocarcinoma features (Fig. 1b). The bronchoid subtype was also seen in 55% of NMF1 and 42% of NMF4 samples (Fig. 1f). The S5 cluster identified by Roh et al.17 was enriched in NMF3 (54%) and NMF1 (42%). The S5 cluster has low proliferation signatures, is enriched in EGFR mutations/amplifications, and is more common in patients aged > 65 years17. NMF2 included similar proportions of the S2, S3, and S4 clusters. NMF4 was most enriched in the S3 cluster (52%), previously shown to be associated with anti-PD-L1 treatment responses due to high PD-L1 protein expression and an immune-inflamed phenotype17. Nevertheless, none of the transcriptional NMF subtypes we identified overlapped exclusively with any previously described NSCLC subtype.
To validate the NMF subtypes, we trained a random forest classifier on the IMpower150 samples and used it to assign the same four NMF subtypes to 541 non-squamous advanced NSCLC samples from patients in the OAK study8, which compared atezolizumab monotherapy vs docetaxel in patients with previously treated advanced metastatic NSCLC. In OAK, NMF1 was the least common subtype (11%) and NMF2 the most common (55%); NMF3 and NMF4 comprised 14% and 20%, respectively (Supplementary Fig. 1a). We examined the OAK NMF transcriptomes using the same gene sets for cancer-associated pathways, biological processes, and immune cell types used for the IMpower150 samples (Supplementary Fig. 1b, c). The OAK NMF subtypes showed expression profiles similar to those in IMpower150, such as enriched neutrophils and natural killer cells in NMF1; enriched macrophages and proliferation signals in NMF2; enriched adenocarcinoma cells and fibroblasts in NMF3; and enrichment of B cells, myeloid cells, and lymphocytes in NMF4. As in the IMpower150 samples, PD-L1 protein levels were significantly higher in NMF2 and NMF4 than in NMF1 and NMF3 (P = 0.0005) (Supplementary Fig. 1d).
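The text does not give the classifier's features, hyperparameters, or normalisation. The sketch below assumes scikit-learn's `RandomForestClassifier`, identical gene columns in both cohorts, and gene-wise z-scoring within each cohort to reduce platform or batch offsets (an assumption, not a described step); `transfer_subtypes` is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def standardize_within_cohort(X):
    """Z-score each gene (column) within one cohort; rows are samples."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    return (X - mu) / sd

def transfer_subtypes(X_train, y_train, X_new, n_trees=300, seed=0):
    """Train on a labelled cohort (e.g. NMF1-4 calls) and predict labels for a new cohort.

    X_train and X_new must contain the same genes in the same column order.
    """
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    clf.fit(standardize_within_cohort(X_train), y_train)
    return clf.predict(standardize_within_cohort(X_new))
```

Per-cohort standardisation assumes broadly similar subtype mixes in both cohorts; given the different NMF2 prevalence in OAK, a real implementation might prefer a cohort-independent normalisation.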
To determine whether the transcriptomic subtypes also had distinct genomic profiles, we performed whole-exome sequencing of 261 available samples from patients in IMpower150. The prevalence of frequently altered genes (Fig. 2a) was consistent with previous findings in lung adenocarcinoma16. The most commonly altered genes (in ≥ 10% of tumours) were examined within each molecular subtype (Fig. 2b): TP53, STK11, KRAS, NOTCH1, KEAP1, CDKN2A, NFE2L2, SMARCA4, RBM10, EGFR, FAT1, and BRAF. TP53 variants were most prevalent in the NMF2 subtype. STK11, CDKN2A, and variants showed the highest prevalence in the NMF2 and NMF3 subtypes. EGFR variants were most frequently found in NMF3. When the proportions of samples with mutations in these genes were compared among NMF subtypes using a χ2 test, the proportions with mutations in TP53, STK11, CDKN2A, SMARCA4, and EGFR differed significantly among subtypes (P < 0.05) (Fig. 2c, d). Although these enrichments were statistically significant, no single mutation uniquely defined any subtype.
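The per-gene comparison above (mutated vs wild-type counts across four subtypes, with FDR adjustment per the Fig. 2c caption) can be sketched without dependencies. Assumptions: the FDR method is taken to be Benjamini-Hochberg, which the text does not name, and the χ2 tail probability uses the closed form for 3 degrees of freedom, which holds only for a 2 × 4 table. Function names are hypothetical.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def chi2_mutation_test(mutated, totals):
    """Pearson chi-squared test of equal mutation frequency across four subtypes.

    mutated[i] = samples with an alteration in subtype i; totals[i] = samples sequenced.
    The 2 x 4 table has (2 - 1) * (4 - 1) = 3 degrees of freedom.
    """
    if len(mutated) != 4 or len(totals) != 4:
        raise ValueError("expected counts for exactly four subtypes")
    overall = sum(mutated) / sum(totals)
    if overall in (0.0, 1.0):
        return 0.0, 1.0  # no variation to test
    stat = 0.0
    for m, t in zip(mutated, totals):
        for observed, expected in ((m, t * overall), (t - m, t * (1 - overall))):
            stat += (observed - expected) ** 2 / expected
    return stat, chi2_sf_df3(stat)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, one would call `chi2_mutation_test` once per gene with that gene's counts per subtype and pass the resulting list of P values to `benjamini_hochberg`.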
Fig. 2. Landscape of somatic mutations in the NMF molecular subtypes.
a Oncoplot of genes that had mutations and copy number variations in ≥ 2% of 261 tumour samples used in whole-exome sequencing analyses. b Oncoplots for the four NMF subtypes, showing genes altered in ≥ 10% of samples in the same order as in Fig. 2a. c FDR-adjusted P values from a Pearson’s χ2 test are reported for comparisons of mutations in each indicated gene among samples. d Percentage of NSCLC samples carrying variations in each gene by NMF subtype. The proportions of each molecular subtype with available whole exome sequencing data were 37 NMF1 samples (35.9%), 84 NMF2 samples (46.2%), 73 NMF3 samples (46.8%), and 65 NMF4 samples (54.6%). e Tumour mutation burden (TMB) in each NMF subtype. The lower and upper hinges in the box plot correspond to the first and third quartiles. The upper whisker extends from the hinge to the largest value no further than 1.5× IQR from the hinge (where IQR is the interquartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5× IQR of the hinge. There were 37 samples in NMF1, 85 samples in NMF2, 73 samples in NMF3, and 65 samples in NMF4. Source data are provided as a Source Data file. Two-sided t-test was used for pairwise comparison. P values were 0.128 (NMF2 vs NMF4), 0.6238 (NMF3 vs NMF1), 0.6907 (NMF4 vs NMF1), 0.0038 (NMF3 vs NMF2), 0.0016 (NMF4 vs NMF2), 0.9396 (NMF4 vs NMF3). *P < 0.05; **P < 0.01; ***P < 0.001.
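The whisker rule in this caption matches the Tukey 1.5 × IQR convention used by ggplot2's box plots, which is assumed (not stated) to be the plotting tool. A stdlib sketch of the hinges, whiskers, and outliers follows; `method="inclusive"` should match the linear-interpolation (R type 7) quartiles ggplot2 uses, and `box_plot_stats` is a hypothetical name.

```python
from statistics import quantiles

def box_plot_stats(values):
    """Hinges, Tukey whiskers, and outliers as described in the Fig. 2e / 4a captions."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observed values still inside the fences.
    lower_whisker = min(v for v in values if v >= low_fence)
    upper_whisker = max(v for v in values if v <= high_fence)
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {"q1": q1, "q3": q3, "whiskers": (lower_whisker, upper_whisker), "outliers": outliers}
```

For example, `box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])` gives hinges at 3.25 and 7.75, whiskers from 1 to 9, and 100 plotted as an outlier.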
The enrichment of genomic alterations in oncogenic signalling pathways (KRAS, MTOR, CDKN2A, TP53, NFE2L2, ARID1A, SETD2, and U2AF1) was evaluated in each NMF subtype (Supplementary Fig. 2a). NMF1 had the lowest rate of gene alterations in the pathways evaluated, with significant differences among subtypes for the NFE2L2, ARID1A, SETD2, and U2AF1 pathways (all P < 0.05). NMF4 had the highest alteration rates in all pathways evaluated except the CDKN2A and ARID1A pathways, and more than half of the NMF4 samples had alterations in the KRAS and mTOR pathways. NMF2 had the second-highest alteration rates in all pathways.
Given the differences in PD-L1 levels between the NMF subtypes, we tested for an association between PD-L1 expression and CD274 copy number deletions, alterations, and amplifications (Supplementary Fig. 2b). The copy number of CD274, which encodes PD-L1, has been shown to correlate with PD-L1 expression19 and is a putative biomarker for ICI benefit in patients with non-squamous NSCLC20. PD-L1 expression correlated with CD274 copy number amplification and deletion in NSCLC samples from IMpower150 (P = 0.00015) and OAK (P = 0.03053). The proportions of patients with different PD-L1 protein expression levels did not differ significantly between IMpower150 and OAK (χ2 test P = 0.145, Supplementary Fig. 2c). TMB was also compared between molecular subtypes (Fig. 2e): NMF2 had the highest TMB and differed significantly from NMF3 and NMF4 in pairwise comparisons, whereas no other pairwise difference was significant.
Notably, NMF2 and NMF4 exhibit distinct transcriptional as well as mutational profiles. Specifically, NMF2 has higher frequencies than NMF4 of STK11 (χ2 test P = 0.0016) and KEAP1 (χ2 test P = 0.014) alterations, which are known to be involved in immunotherapy resistance2. The associations depicted in Fig. 2c were weaker than this direct comparison because including the other subtypes (NMF1 and NMF3) diluted the overall strength of the association. While NMF4 is not devoid of STK11/KEAP1 alterations, their enrichment in NMF2 may contribute to its relative lack of lymphocyte infiltration. In summary, although mutation profiles displayed some differences between subtypes, no single gene alteration uniquely defined any of the NMF subtypes identified.
Molecular subtypes were associated with different outcomes after treatment with atezolizumab-bevacizumab-chemotherapy combination
Clinical outcomes with ABCP, ACP, and BCP in IMpower150 were evaluated by NMF molecular subtype in the biomarker-evaluable population (n = 413). Efficacy was similar between the intention-to-treat (ITT) population (ABCP vs BCP PFS HR: 0.56 [95% CI 0.48-0.65]; ABCP vs BCP OS HR: 0.80 [95% CI 0.68-0.95]) and the biomarker-evaluable population (ABCP vs BCP PFS HR: 0.60 [95% CI 0.41-0.88]; ABCP vs BCP OS HR: 0.63 [95% CI 0.42-0.95]). Patients with NMF1-subtype tumours had longer median PFS with ABCP than with ACP or BCP (HR 0.49 [95% CI 0.29-0.85] for ABCP vs BCP) (Fig. 3a). No differences in median PFS between treatments were observed for patients with NMF2 and NMF3 tumours. Patients with NMF4 tumours had substantially longer PFS with ABCP than with the other treatments (HR 0.26 [95% CI 0.15-0.44] for ABCP vs BCP). Patients with NMF4 tumours also had the longest median OS with ABCP, although the 95% CI of the HR included 1 (OS HR 0.59 [95% CI 0.34-1.03] for ABCP vs BCP; Fig. 3b). Patients with NMF1 tumours also had substantially longer OS with ABCP than with BCP (OS HR 0.51 [95% CI 0.29-0.90]). The median OS for patients with NMF2 and NMF3 subtypes was similar between treatment arms.
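The median PFS and OS values above are read off Kaplan-Meier curves. A minimal, dependency-free Kaplan-Meier estimator is sketched below, taking the median as the first time the curve reaches 0.5 or below (a common convention, assumed here). When the curve never falls that far, the median is not estimable, reported as NE in the figures. `kaplan_meier` and `median_survival` are illustrative names.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times: follow-up times; events: 1 if the event (progression or death) occurred, 0 if censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            censored += 1 - data[i][1]
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk  # censored at t still count as at risk at t
            curve.append((t, surv))
        n_at_risk -= deaths + censored
    return curve

def median_survival(curve):
    """Smallest time at which survival drops to 0.5 or below; None if never reached (NE)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

For example, with times [1, 2, 3, 4] and events [1, 0, 1, 1], the curve steps to 0.75 at t = 1 and to 0.375 at t = 3, so the median is 3.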
Fig. 3. Association between clinical outcomes with atezolizumab-bevacizumab-chemotherapy combinations and NMF subtype in IMpower150.
a PFS in each treatment arm by NMF subtype. The forest plot shows the PFS HRs for ABCP vs BCP for each subtype. b OS in each treatment arm by NMF subtype. The forest plot shows the OS HRs for ABCP vs BCP. c Comparison of PFS across NMF subtypes by treatment. d Comparison of PFS (top 3 panels) and OS (bottom 3 panels) with ABCP vs BCP in NMF2 and NMF4 subtypes by PD-L1 level: PD-L1–high (tumour proportion score ≥ 50% by SP263 IHC assay), PD-L1–low (1%-49%) and PD-L1–negative (< 1%). Data in the ACP arm were not substantially different from those in the BCP arm and are not shown in the graphs to aid visualization. e Results of the multivariate analysis evaluating the association between the outcomes (PFS: left panel, and OS: right panel) and multiple factors on the ABCP arm. These factors include NMF subtypes, PD-L1 levels, tumor mutation burden, STK11/KEAP1 mutational status, and smoking history. Source data are provided as a Source Data file. For the forest plots in a,b,e, each row represents a different predictor variable, with the point estimates and 95% CIs for the HRs displayed as squares and horizontal error bars, respectively. The vertical dashed line represents the line of no effect (HR = 1.0). Variables with confidence intervals that do not cross this line are considered statistically significant. For the survival analyses in a, b, e, the log-rank test was used to compare Kaplan-Meier survival curves. Cox proportional hazards regression models were used to generate hazard ratios and 95% confidence intervals. Multivariable Cox proportional hazards regression models were used to compare the interdependence of distinct biomarkers for prediction of OS benefit. ABCP atezolizumab plus bevacizumab plus carboplatin-paclitaxel, ACP atezolizumab plus carboplatin-paclitaxel, BCP bevacizumab plus carboplatin-paclitaxel, NE not estimable.
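The caption states that log-rank tests compared the Kaplan-Meier curves. The sketch below is an unstratified two-group version; trial analyses are often stratified, which this does not attempt, and `logrank_test` is a hypothetical name. The P value uses the closed-form χ2 tail for 1 degree of freedom.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Unstratified two-group log-rank test.

    events_*: 1 = event observed, 0 = censored. Returns (chi-squared statistic, P value), 1 df.
    """
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    observed_minus_expected = 0.0  # for group A
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in pooled if ti >= t)                   # at risk overall
        n_a = sum(1 for ti, _, g in pooled if ti >= t and g == 0)      # at risk in group A
        d = sum(1 for ti, e, _ in pooled if ti == t and e)             # events at t
        d_a = sum(1 for ti, e, g in pooled if ti == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of d_a
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = observed_minus_expected ** 2 / variance
    return stat, math.erfc(math.sqrt(stat / 2))  # chi-squared (1 df) upper tail
```

The quadratic scan over subjects keeps the sketch short; a production version would sort once and sweep the risk set.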
To determine whether any NSCLC subtype was associated with improved PFS, PFS with each combination was compared across subtypes (Fig. 3c). In the ABCP arm, patients with NMF4 tumours had longer median PFS than those with other NMF subtypes. However, in the ACP and BCP arms, PFS for patients with the NMF4 subtype did not differ from that for patients with other subtypes. These findings suggest that the NMF4 subtype may be predictive of response to ABCP.
To explore whether tumour-cell PD-L1 expression contributed to these subtype-specific clinical responses, PFS and OS were analysed by cancer cell PD-L1 protein level in patients with NMF2 and NMF4 subtypes, which showed the greatest enrichment for PD-L1–positive tumours (Fig. 1e). Among patients with PD-L1–high and PD-L1–low tumours, those in the NMF4 subgroup treated with ABCP had longer median OS and PFS than those in the ACP (median survival data shown in Supplementary Table 2) and BCP arms (Fig. 3d). However, among patients with PD-L1–negative tumours, ABCP was not associated with longer PFS or OS than BCP in NMF4, and OS and PFS were likewise similar with ABCP or BCP in PD-L1–negative NMF2. These findings suggest that NMF4, but not NMF2, subtype patients with PD-L1–positive tumours benefited from ABCP; however, interpretation of these analyses is limited by the small sample sizes.
The association between NMF subtype and tumour response per Response Evaluation Criteria in Solid Tumors (RECIST) 1.1 was also investigated (Supplementary Fig. 3a). Complete responses (CRs) and partial responses (PRs) collectively were most frequently observed in patients with the NMF4 subtype. These occurred most often in the ABCP arm (76%), although NMF4 patients also had CRs or PRs with ACP (59%) and BCP (57%). Among patients with the NMF1 subtype, 70% had a PR with ABCP (P = 0.012 for responders vs patients with stable or progressive disease collectively), compared with 33% who had CRs or PRs with ACP and 41% who had PRs with BCP. In patients with NMF2 tumours, CRs and PRs occurred in 63%, 47%, and 59% in the ABCP, ACP, and BCP arms, respectively. In those with NMF3 tumours, the corresponding proportions were 41%, 28%, and 46%. PFS findings were similar when patients with EGFR or ALK alterations were excluded (Supplementary Fig 3g). In summary, tumour responses to all three combinations were most frequent in patients with the NMF4 subtype, followed by NMF1 responses to ABCP.
The association between NMF molecular subtypes in patient tumour samples from OAK8 and survival with atezolizumab monotherapy or docetaxel was also investigated (Supplementary Fig. 3b). Unlike in IMpower150, the NMF4 subtype in OAK was not associated with substantially longer median PFS or OS with atezolizumab than with docetaxel (Supplementary Fig. 3b). Median PFS was generally similar in both treatment arms for all four subtypes (Supplementary Fig 3c). Median OS was substantially shorter in patients with NMF2 than in those with other subtypes in both the atezolizumab and docetaxel arms (Supplementary Fig 3c). We also compared survival in NMF2 versus NMF4 among patients with PD-L1–positive and PD-L1–negative NSCLC (Supplementary Fig. 3d) and found no significant OS or PFS improvement in either the NMF2 or the NMF4 subtype, regardless of PD-L1 level. These data suggest that patients with NMF4 NSCLC may require bevacizumab to achieve significantly improved efficacy with atezolizumab over other treatments.
We performed an additional validation analysis using a cohort of patients with unresectable hepatocellular carcinoma (HCC) from the IMbrave150 trial, which compared combination treatment with anti-PD-L1 (atezolizumab) and anti-VEGF (bevacizumab) in one arm with the RAF and VEGFR kinase inhibitor sorafenib in the other21. Because NSCLC and HCC have distinct molecular landscapes, the transcriptional subtypes identified in IMpower150 and the resulting random forest classifier cannot be directly applied to samples from IMbrave150. Therefore, we used lymphocyte gene signatures (T- and B-cell signatures; Bagaev et al. 2021 Cancer Cell) to approximate the defining features of the NMF4 subtype and to divide patients from IMbrave150 into two groups. Patients were dichotomised at the median T-cell and B-cell enrichment scores; those with both high T-cell and high B-cell enrichment were considered representative of NMF4, and the remaining patients were grouped together to represent the other subtypes (NMF1, 2, and 3). PFS trended in the same direction as in IMpower150: in the combination arm, the NMF4-like subgroup had longer PFS (median 8.81 months) than the other patients (median 5.68 months) (HR 0.709 [95% CI 0.433, 1.161]), whereas no such difference was seen in the sorafenib arm (HR 1.149 [95% CI 0.586, 2.256]) (Supplementary Fig 1e). The lack of statistical significance in the combination arm may reflect the different indication from IMpower150 and the different classification method. Patients in the NMF4-like group also had improved PFS with atezolizumab plus bevacizumab vs sorafenib (HR 0.242 [95% CI 0.108, 0.539]). These findings support our hypothesis and the robustness of the NMF4 subtype classification in the context of combination treatment with anti-PD-L1 and anti-VEGF agents.
While our analysis suggests that the NMF4 subtype may be predictive of response to ABCP in IMpower150, this group also has higher PD-L1 expression and lower rates of STK11/KEAP1 alterations, both of which have been reported to be independently associated with improved outcomes with immunotherapy22. To delineate the independent clinical impact of NMF4 status, we performed a multivariate analysis within the ABCP arm that included PD-L1 expression, TMB, smoking history, and the status of pathogenic/biallelic STK11 and KEAP1 mutations (Fig. 3e). Even after accounting for these factors, the NMF4 signature remained a significant predictor of response to ABCP, suggesting that it provides independent predictive value.
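Fig. 3e is based on multivariable Cox proportional hazards models; the software used is not stated. As a hedged sketch, a Newton-Raphson fit of the Cox partial likelihood can be written in NumPy, assuming no tied event times (where the usual tie-handling approximations coincide with the exact partial likelihood). Covariates such as an NMF4 indicator, PD-L1 category dummies, TMB, STK11/KEAP1 status, and smoking history would be columns of X; `cox_fit` is a hypothetical name.

```python
import numpy as np

def cox_fit(X, time, event, n_iter=30):
    """Cox proportional hazards fit by Newton-Raphson on the partial likelihood.

    Assumes no tied event times. X: (n, p) covariates; time: (n,) follow-up;
    event: (n,) 1 = event, 0 = censored. Returns (coefficients, standard errors);
    hazard ratios are exp(coef).
    """
    order = np.argsort(-time)  # descending, so the risk set of row i is rows 0..i
    X, event = X[order].astype(float), event[order].astype(bool)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = np.exp(X @ beta)
        s0 = np.cumsum(w)                                    # sum of weights over each risk set
        s1 = np.cumsum(w[:, None] * X, axis=0)               # weighted covariate sums
        s2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
        xbar = s1 / s0[:, None]
        grad = (X - xbar)[event].sum(axis=0)                 # score vector
        cov = s2 / s0[:, None, None] - xbar[:, :, None] * xbar[:, None, :]
        info = cov[event].sum(axis=0)                        # observed information matrix
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-9:
            break
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se
```

A 95% CI for each HR would then be exp(coef ± 1.96 × SE), which is the quantity plotted in forest plots such as Fig. 3e.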
We also performed a parallel clinical analysis using the TCGA lung adenocarcinoma subtypes. The proximal inflammatory squamoid subtype had the longest median OS and PFS with ABCP compared with BCP (Supplementary Fig. 3e). Nonetheless, the positive outcome association with NMF4 within the ABCP arm was independent of the proximal inflammatory squamoid subtype (Supplementary Fig 3f; PFS in the left panel, OS in the right panel), indicating that the NMF4 subtype captures unique features and could predict response to ABCP independently of TCGA subtypes. In summary, these outcomes suggest that patients with NMF4 tumours benefit from the ABCP combination compared with either ACP or BCP.
Pathological features are correlated with tumour transcriptional profiles and responses to immunotherapy
To elucidate why the NMF2 and NMF4 subtypes had similarly high PD-L1 levels by immunohistochemistry but different responses to ABCP, we compared the immune pathological features of each molecular subtype. A lymphocyte, immune cell, macrophage, and plasma cell (TIMAP) digital pathology model trained on histology slides was used to predict human-interpretable pathological features in the malignant (tumour) and supporting (stroma) tissue of the four NMF subtypes among IMpower150 tumour samples.
Comparing the densities of stromal lymphocytes, stromal plasma cells, and intra-epithelial lymphocytes across subtypes (Fig. 4a) revealed the following. NMF1 had the lowest density of stromal plasma cells and the highest level of intra-epithelial lymphocytes. NMF2 had the lowest density of stromal lymphocytes. NMF3 had levels of stromal lymphocytes similar to NMF1 and the lowest level of intra-epithelial lymphocytes. NMF4 had a high density of intra-epithelial lymphocytes, similar to NMF1, and the highest densities of stromal lymphocytes and plasma cells. These features were consistent with the enrichment of lymphocyte signatures in the RNA sequencing data (Supplementary Fig. 4a).
Fig. 4. NSCLC subtypes have different stromal pathologic features and treatment outcomes.
a Distribution of lymphocyte and plasma cell density in the cancer stroma and cancer epithelium by NMF subtype. *P < 0.05; **P < 0.01; *** P < 0.001. P-values can be found in the Source Data. The lower and upper hinges in the box plot correspond to the first and third quartiles. The upper whisker extends from the hinge to the largest value no further than 1.5× IQR from the hinge (where IQR is the interquartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5× IQR of the hinge. Two-sided t-test was used for the statistical test. b Scatterplots of plasma cell and lymphocyte cell densities in each NMF subtype. Each dot indicates one sample. Samples are grouped by the median of plasma cell and lymphocyte density. Red: plasma cell-high, lymphocyte-high; orange: plasma cell-low, lymphocyte-high; green: plasma cell-high, lymphocyte-low; blue: plasma cell-low, lymphocyte-low. The percentages within each quadrant indicate the fraction of samples within each group. c Forest plots show the median PFS and HR by stromal cell density ratio for patients in IMpower150 who received ABCP compared with BCP (left panel) and ACP compared with BCP (right panel). There were 74 samples in the HH group, 24 in the LH group, 20 in the HL group, and 78 in the LL group in the ABCP arm; 83 samples in the HH group, 18 in the LH group, 21 in the HL group, and 67 in the LL group in the BCP arm; 54 samples in the HH group, 24 in the LH group, 25 in the HL group, and 69 in the LL group in the ACP arm. d Proportions of tertiary lymphoid structures (TLS) in each NMF subtype. Sample numbers were NMF1: n = 106; NMF2: n = 187; NMF3: n = 169; NMF4: n = 124. Source data are provided as a Source Data file. ABCP atezolizumab plus bevacizumab plus carboplatin-paclitaxel, ACP atezolizumab plus carboplatin-paclitaxel, BCP bevacizumab plus carboplatin-paclitaxel, HH high-high, LH low-high, HL high-low, LL low-low.
Previously, we found that high lymphocyte and plasma cell infiltration identifies tumours with tertiary lymphoid structures (TLSs) or TLS-like structures13. Because TLSs require both cell types to be present, we examined the pairing of lymphocytes and plasma cells. Samples were stratified into four groups (high-high, low-high, high-low, low-low) based on the medians of plasma cell and lymphocyte density in the cancer stroma. These pairwise comparisons showed that NMF4 was the most enriched in high-plasma cell-high-lymphocyte samples compared with the other subtypes (Fig. 4b). NMF3 stroma had similar proportions of high-plasma cell-high-lymphocyte and low-plasma cell-low-lymphocyte samples. In contrast, NMF1 and NMF2 stroma were the most enriched in low-plasma cell-low-lymphocyte samples.
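The median-based stratification can be sketched as follows. This is a minimal illustration that assumes per-sample stromal densities are already available and that medians are taken across all samples in the analysed cohort; the function names and toy values are hypothetical, and samples lying exactly at a median are assigned to the low group by assumption. Labels follow the Fig. 4 convention (first letter plasma cell, second letter lymphocyte).

```python
from collections import Counter
from statistics import median


def stratify_by_medians(samples):
    """Assign each sample to HH/LH/HL/LL using cohort-wide medians.

    samples: list of (sample_id, plasma_density, lymphocyte_density) tuples.
    First letter = plasma cell density, second letter = lymphocyte density.
    """
    plasma_med = median(p for _, p, _ in samples)
    lymph_med = median(ly for _, _, ly in samples)
    groups = {}
    for sample_id, plasma, lymph in samples:
        # Assumption: only values strictly above the median count as "high".
        p_label = "H" if plasma > plasma_med else "L"
        l_label = "H" if lymph > lymph_med else "L"
        groups[sample_id] = p_label + l_label
    return groups


def group_fractions(groups):
    """Fraction of samples falling in each of the four quadrants."""
    counts = Counter(groups.values())
    n = len(groups)
    return {g: counts.get(g, 0) / n for g in ("HH", "LH", "HL", "LL")}


# Toy densities invented for illustration only (not study data).
toy = [("s1", 10, 20), ("s2", 1, 25), ("s3", 8, 2), ("s4", 2, 3)]
print(stratify_by_medians(toy))  # {'s1': 'HH', 's2': 'LH', 's3': 'HL', 's4': 'LL'}
```

To reproduce per-subtype percentages like those in Fig. 4b, group_fractions would be applied separately to the samples of each NMF subtype after the cohort-wide labels are assigned.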
To determine whether the balance between these cell types was associated with treatment outcomes irrespective of molecular subtype, HRs for PFS with ABCP and ACP versus BCP were compared across the four stromal plasma cell/lymphocyte density groups (Fig. 4c); HRs < 1 favoured the atezolizumab-containing regimen. Only patients with high-plasma cell-high-lymphocyte stroma had PFS HRs significantly favouring ABCP vs BCP (i.e., both the HR and the upper bound of its 95% CI < 1), suggesting that stromal plasma cells, together with lymphocytes, may be important for the response to ABCP. This is consistent with our previous finding that TLSs, B cells, and plasma cells are predictive of prolonged OS with atezolizumab monotherapy13. OS showed trends in the same direction across these density groups (Supplementary Fig. 4b). These patterns were not seen in patients treated with ACP, suggesting that they may be specific to the anti-PD-L1–anti-angiogenesis–chemotherapy combination.
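The significance criterion used for Fig. 4c (HR and upper 95% confidence bound both below 1) can be illustrated with a Wald-type interval on the log-HR scale. This is a minimal sketch that assumes a Cox model has already returned a log hazard ratio and its standard error for one density group; the function names and numbers below are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def hr_with_ci(log_hr, se, z=Z_95):
    """Return (HR, lower, upper) from a log hazard ratio and its standard error."""
    return math.exp(log_hr), math.exp(log_hr - z * se), math.exp(log_hr + z * se)


def significantly_favours_experimental(log_hr, se):
    """True when the whole 95% CI lies below 1, i.e. its upper bound is < 1."""
    _, _, upper = hr_with_ci(log_hr, se)
    return upper < 1.0


# Hypothetical Cox outputs for two density groups (not study data).
print(significantly_favours_experimental(math.log(0.5), 0.20))  # True  (upper bound ~0.74)
print(significantly_favours_experimental(math.log(0.8), 0.25))  # False (upper bound ~1.31)
```

Because the point estimate always lies inside its interval, checking the upper bound alone is sufficient for the "HR and 95% CI < 1" condition.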
The gene signature and digital pathology features we observed suggested potential enrichment for TLSs in NMF4. TLSs have previously been shown to correlate with favourable clinical outcomes in NSCLC and with improved responses to immunotherapies23,24, so we assessed their relationship with NMF subtype (Fig. 4d). NMF4 had the highest proportion of samples with mature or immature TLSs (34%); NMF1 had the lowest (4%), followed by NMF2 (9%). We also observed that detectable TLSs correlated with favourable clinical outcomes in the ABCP arm of IMpower150 (Supplementary Fig. 4c). In summary, these results suggest that NMF4 is a uniquely lymphocyte-rich molecular subtype of advanced non-squamous NSCLC that is associated with improved outcomes following treatment with a combination of anti-PD-L1, anti-angiogenic, and chemotherapy agents.
Discussion
Using an RNA sequencing dataset comprising more advanced non-squamous NSCLC samples than are currently publicly available, we identified four primary non-squamous NSCLC tumour subtypes in patients from the phase 3 IMpower150 study. The subtypes differed in PD-L1 protein expression and in their associated TMEs, immune cells, and cancer-associated pathways and processes. Importantly, treatment with anti–PD-L1, anti-VEGF, and platinum-based chemotherapy combinations led to different clinical outcomes among subtypes. The differences were especially pronounced between NMF2 and NMF4, even though both subtypes were the most enriched in PD-L1–high tumours. Additionally, patients with NMF1 had substantially longer median PFS and OS with ABCP than with ACP, despite having the second-lowest enrichment of PD-L1–positive tumours.
Considering tumour and TME signals, the NMF subtypes displayed distinct biology that may relate to the differences in clinical outcomes. For example, the NMF1 subtype was enriched in neutrophil signals, whereas NMF2 and NMF4 were enriched in other myeloid signatures. In NSCLC, neutrophils can promote tumour growth and spread by modulating tumour angiogenesis or by inducing local immunosuppression25. One limitation of the current study is that the myeloid cell signatures used were derived from single-cell RNA-seq datasets, which, when applied to bulk RNA-seq data, may capture expression from other cell types in addition to myeloid cells. NMF2 was enriched in TGF-β CAFs, which have been associated with a lack of response to atezolizumab in metastatic urothelial carcinoma because they restrict T-cell infiltration26. Despite enrichment of cancer-associated fibroblast signatures, NMF4 had the most lymphocyte-inflamed transcriptomic profile and was the only subtype showing B-cell, Teff, and TLS enrichment, features that have been associated with prolonged survival in patients with NSCLC treated with atezolizumab and in other cancers treated with atezolizumab plus bevacizumab13,23,27,28. Although fibroblasts are generally associated with immune suppression and poor responses, their presence within a proinflammatory context might reflect an active dynamic in which fibroblasts are modulated by inflammatory signals. The atezolizumab plus bevacizumab combination could also interact with the fibroblast population in NMF4. Given the survival results and the complexity of the TME, the mechanisms of interaction between fibroblasts and lymphocytes remain to be resolved.
Our findings are generally in agreement with outcome associations in other cancers treated with PD-L1 blockade and/or VEGF blockade. For example, in renal cell carcinoma (RCC), improved outcomes with atezolizumab plus bevacizumab vs sunitinib were seen in patients with RCC molecular subtypes that had Teff, JAK/STAT, and IFN-α and -γ expression profiles and the highest infiltration of T cells, B cells, macrophages, and dendritic cells29. This supported the view that pre-existing intratumoural adaptive immunity is an important contributor to benefit from immunotherapy-containing regimens29. In metastatic urothelial carcinoma tumours with the ‘inflamed’ phenotype, the Teff gene signature combined with TMB correlated strongly with response to atezolizumab26. Molecular characterisation of advanced hepatocellular carcinoma samples also showed that pre-existing immunity in pre-treatment tumour tissues appears to drive the clinical activity of atezolizumab plus bevacizumab28. Overall, the data from this and previous studies suggest that such pre-existing immunity contributes to the clinical activity of atezolizumab plus bevacizumab in several indications, including non-squamous NSCLC.
First-line ICI treatment decisions for patients with driver mutation-negative, non-squamous metastatic NSCLC are guided by PD-L1 expression status30. However, different clinical outcomes were seen between NMF2 and NMF4, the two subtypes most enriched for PD-L1–high tumours: patients with NMF4 tumours had substantially longer median PFS (24 months) and OS (35 months) with ABCP, whereas patients with NMF2 tumours had the shortest survival of all the subtypes (median PFS, 7 months; median OS, 16 months). Among PD-L1–positive patients, those with the NMF4 subtype, but not those with the NMF2 subtype, benefited from ABCP. Our data indicate minimal benefit from adding atezolizumab to chemotherapy and bevacizumab in PD-L1–negative patients in any of the NMF subtypes. PD-L1–negative status has previously been associated with poor response to immunotherapies31, suggesting that the lack of benefit observed across all NMF subtypes may reflect broader underlying mechanisms of immune evasion or resistance. This highlights the need to identify additional biomarkers that might predict response in PD-L1–negative patients and to investigate other therapeutic approaches that could benefit these patients.
The marked difference in outcomes between the PD-L1 expression-enriched subtypes NMF2 and NMF4 was not seen in patients treated with ACP in IMpower150, nor with atezolizumab monotherapy in OAK, although patients enrolled in OAK had received prior platinum chemotherapy. This suggests that bevacizumab may be required for optimal outcomes in patients with NMF4-subtype NSCLC, consistent with previous findings in RCC and hepatocellular carcinoma, where atezolizumab plus bevacizumab resulted in longer median PFS than atezolizumab monotherapy in patients whose tumours had high expression of Teff and myeloid signatures27,28. While NMF3 appears to be associated with a good prognosis irrespective of treatment, possibly owing to its well-differentiated status, patients with the NMF1 and NMF2 subtypes may benefit from different combination therapies. For example, NMF1 appears more akin to squamous cell carcinoma; patients with this subtype may therefore benefit from chemotherapy regimens typically given for advanced lung squamous cell carcinoma, such as platinum chemotherapy plus gemcitabine. Although tumour cell PD-L1 expression is high in NMF2, certain macrophage/monocyte populations are enriched, and TLSs and lymphocyte infiltration appear to be minimal. Patients with the NMF2 subtype might therefore benefit from additional innate immune stimulation that could repolarize these myeloid subsets into more pro-inflammatory states32.
In our study, although mutational differences do not solely define the transcriptional subtypes, we observed differences in the prevalence of STK11 and KEAP1 alterations between NMF2 and NMF4. STK11 and KEAP1 mutations are known to play crucial roles in the pathogenesis and progression of non-squamous NSCLC3,22. In particular, STK11 and KEAP1 alterations have been associated with a TME depleted of CD8 T cells22, consistent with our finding that NMF2 had the lowest level of CD8 T cells among the molecular subtypes. However, STK11 and KEAP1 mutations were not significantly associated with clinical outcomes in our study independently of subtype. The impact of STK11 and KEAP1 on the TME and on immunotherapy resistance requires further study.
A further limitation of this study is that these were retrospective analyses; it remains to be seen whether patients prospectively selected according to these profiles would obtain the comparative clinical benefits demonstrated here. Another limitation is that the TIMAP model could identify only broad classes of cell types, rather than granular subsets. Finally, these findings remain to be validated in larger prospective clinical studies and with readily deployable assays.
In conclusion, NMF4 subtype represents a patient population with advanced NSCLC enriched in Teff, B cells, plasma cells, pan-macrophages, and PD-L1–positive tumours that may benefit more from first-line ABCP than patients with the other NSCLC subtypes. These findings deepen our understanding of clinically relevant NSCLC subtypes and provide insights that could potentially inform individualised therapy for patients with metastatic NSCLC.
Methods
Study design and participants
The design of the randomised, open-label, phase 3 IMpower150 trial has been reported previously10. Patients with chemotherapy-naive metastatic NSCLC were randomly assigned (1:1:1) to receive atezolizumab plus bevacizumab plus carboplatin plus paclitaxel (ABCP), atezolizumab plus carboplatin plus paclitaxel (ACP), or bevacizumab plus carboplatin plus paclitaxel (BCP) every 3 weeks. The co-primary endpoints were overall survival (OS) and investigator-assessed progression-free survival (PFS) in the intention-to-treat population of patients with wild-type NSCLC, i.e., without epidermal growth factor receptor (EGFR) or anaplastic lymphoma kinase (ALK) genetic alterations9. In this study, pre-treatment tumour biopsy samples from 564 of 1202 patients (47%) were transcriptionally profiled by RNA sequencing analysis, and TMB was profiled by DNA whole-exome sequencing in the 261 samples for which sufficient tissue remained.
The design of the randomised, open-label, phase 3 OAK study of atezolizumab monotherapy vs docetaxel (NCT02008227) in 850 previously treated patients with advanced, metastatic NSCLC has also been reported previously8. The primary endpoint was OS. In this study, tumour samples from 541 patients with non-squamous NSCLC were transcriptionally profiled by RNA sequencing and analysed for NMF subtypes using the same methods as for the tumour samples from IMpower150.
RNA processing
Formalin-fixed paraffin-embedded (FFPE) tissue was macro-dissected for tumour area using haematoxylin and eosin as a guide. RNA was extracted using the Qiagen miRNeasy FFPE kit (Qiagen) and assessed using Qubit and Agilent Bioanalyzer for quantity and quality. First-strand cDNA synthesis was primed from total RNA using random primers, followed by the generation of second-strand cDNA with dUTP in place of dTTP in the master mix to facilitate preservation of strand information. Libraries were amplified by PCR and enriched for the mRNA fraction by positive selection using a cocktail of biotinylated oligonucleotides corresponding to coding regions of the genome. Libraries were sequenced using the Illumina sequencing-by-synthesis platform on the NovaSeq 6000, with a sequencing protocol of 50 bp paired-end sequencing and a total read depth of 80 M reads per sample.
RNA sequencing data generation and gene expression analysis
Whole-transcriptome profiles were generated using the Illumina TruSeq Stranded Total RNA method. Before generating the sequencing libraries, ribosomal (r)RNA was removed with biotinylated probes that selectively bind rRNA species using the RiboZero Magnetic Gold kit. Sequencing reads were aligned to the human reference genome (NCBI Build 38) using GSNAP version 2013-10-10, allowing a maximum of two mismatches per 75-base sequence (parameters: ‘-M 2 -n 10 -B 2 -i 1 -N 1 -w 200000 -E 1 --pairmax-rna=200000 --clip-overlap’). Transcript annotation was based on the Ensembl genes database (release 77). To quantify gene expression levels, the number of reads mapped to the exons of each RefSeq gene was calculated using the functionality provided by the R/Bioconductor package GenomicAlignments. Raw counts were adjusted for gene length and sequencing depth using transcripts-per-million (TPM) normalization and subsequently log2-transformed.
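The transcripts-per-million step can be illustrated in a few lines. This sketch assumes raw exon read counts and gene lengths (in base pairs) are already available for each gene; the pseudocount of 1 added before the log2 transform is an assumption, as the text does not state how zero values were handled, and the function names are hypothetical.

```python
import math


def tpm(counts, lengths_bp):
    """Transcripts per million for one sample.

    counts: dict gene -> raw read count; lengths_bp: dict gene -> gene length in bp.
    """
    # Reads per kilobase: correct for gene length first.
    rpk = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    total = sum(rpk.values())
    if total == 0:
        return {g: 0.0 for g in rpk}
    # Then scale so that the values in the sample sum to one million.
    return {g: v * 1e6 / total for g, v in rpk.items()}


def log2_tpm(counts, lengths_bp, pseudocount=1.0):
    """log2(TPM + pseudocount); the pseudocount value is an assumption."""
    return {g: math.log2(v + pseudocount) for g, v in tpm(counts, lengths_bp).items()}


example = tpm({"GENE_A": 100, "GENE_B": 300}, {"GENE_A": 1000, "GENE_B": 3000})
```

Because length correction is applied before depth scaling, the two hypothetical genes above receive equal TPM values (500,000 each) even though GENE_B has three times as many reads.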
Consensus non-negative matrix factorization
An unsupervised machine learning approach based on consensus non-negative matrix factorization (NMF) was applied to normalized RNA sequencing data to identify transcriptionally distinct subtypes of NSCLC. Using median absolute deviation (MAD) analysis, we selected the 3072 genes with the highest variability across the tumour samples (top 10%). Subclasses were then computed by reducing the dimensionality of the expression data from thousands of genes to a few metagenes using consensus NMF clustering (CRAN R package, version 0.22.0)33. This method computes multiple rank-k factorizations of the expression matrix and evaluates the stability of the solutions using the cophenetic correlation coefficient. Testing k = 2 to k = 8, the most robust solution for consensus NMF clustering of the 564 tumours using the 3072 most variable genes was k = 4. To evaluate the performance of the NMF classification, we used the holdout method to split the IMpower150 samples into training (70%) and testing (30%) datasets; classification accuracy in the testing data was 0.9539.
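The gene-selection and consensus steps can be sketched with NumPy. This is an illustrative reimplementation, not the R package used in the study: it assumes a non-negative genes × samples matrix (for example, log2(TPM + 1) values), uses Frobenius-norm multiplicative updates (Lee and Seung), which may differ from the package's default algorithm, and assigns each sample to the metagene with the largest coefficient. The cophenetic correlation coefficient would then be computed from hierarchical clustering of 1 − C (for example, with SciPy) and is not shown. All function names are hypothetical.

```python
import numpy as np


def top_mad_genes(expr, n_top):
    """Row indices of the n_top genes (rows) with the largest median absolute deviation."""
    med = np.median(expr, axis=1, keepdims=True)
    mad = np.median(np.abs(expr - med), axis=1)
    return np.argsort(mad)[::-1][:n_top]


def nmf(V, k, n_iter=500, seed=0, eps=1e-10):
    """Factorise non-negative V (genes x samples) as W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def consensus_matrix(V, k, n_runs=20):
    """Fraction of random restarts in which each pair of samples shares a cluster."""
    n_samples = V.shape[1]
    C = np.zeros((n_samples, n_samples))
    for run in range(n_runs):
        _, H = nmf(V, k, seed=run)
        labels = H.argmax(axis=0)  # cluster = metagene with the largest coefficient
        C += labels[:, None] == labels[None, :]
    return C / n_runs
```

In the study's setting, V would correspond to the expression matrix restricted to the rows returned by top_mad_genes (3072 genes), and consensus_matrix would be evaluated for each k from 2 to 8 before comparing stability across k.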
Validation of molecular subtypes
The molecular subtypes identified in the IMpower150 tumour samples were validated using 541 pre-treatment tumour samples from the OAK trial of atezolizumab monotherapy vs docetaxel in previously treated NSCLC8. We trained a random forest classifier (R package randomForest version 4.6.14) on the IMpower150 samples and predicted NMF clusters in the independent OAK dataset using the R package caret (version 6.0.92). A random forest classifier learns a large number of binary decision trees from random subsets of the training data (here, the IMpower150 transcripts per million [TPM] matrix used to discover the NMF classes). To improve generalisation, the training samples were down-sampled by NMF group to balance class sizes, and the genes were normalised (i.e., z-score transformed).
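The class-balancing and scaling steps applied to the training data can be sketched as follows. This assumes that down-sampling by NMF group means randomly reducing every subtype to the size of the smallest one (conceptually what caret's downSample() does) and that z-scores are computed per gene across samples; the helper names are hypothetical, and the random forest itself is not reimplemented here.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def downsample_by_group(sample_ids, labels, seed=0):
    """Randomly keep the same number of samples (the smallest group size) from every label."""
    by_group = defaultdict(list)
    for sample_id, label in zip(sample_ids, labels):
        by_group[label].append(sample_id)
    n_min = min(len(ids) for ids in by_group.values())
    rng = random.Random(seed)
    kept = []
    for label in sorted(by_group):
        kept.extend((sample_id, label) for sample_id in rng.sample(by_group[label], n_min))
    return kept


def zscore_rows(matrix):
    """z-score each gene (row) across samples; zero-variance genes become all zeros."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        scaled.append([0.0 if sd == 0 else (x - mu) / sd for x in row])
    return scaled
```

Balancing the classes keeps a large subtype (such as NMF2, n = 187 in Fig. 4d) from dominating the vote of the trees at the expense of smaller subtypes.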
Tumour whole-exome sequencing and variant calling
Whole-exome libraries were prepared from tumour FFPE DNA and matched germline DNA using Agilent SureSelect v6 and sequenced at 2 × 150 bp. FASTQ file quality checks were performed with FastQC (v0.11.9). FASTQ files were pre-processed and aligned to hg38 using Picard (v2.18), the Burrows-Wheeler Aligner (v0.7.15-r1140), and the Genome Analysis Toolkit (v4.1.4.1). Tumour/normal pairing was confirmed with NGSCheckMate. Variants were called with Mutect2, LoFreq2, and Strelka and annotated using the Ensembl Variant Effect Predictor (VEP). Nonsynonymous variants with a VEP impact of moderate or high were reported only if identified by at least two of the three variant callers.
Copy-number alteration profiling by WES
Copy-number alterations were determined from the resulting BAM files using CNVkit with default parameters (Talevich et al., PLoS Comput. Biol. 2016). Each segment was categorised as “Amplification”, “Deletion”, or “Normal”, defined as follows: log2 copy number ≥ 0.51 for Amplification; log2 copy number ≤ −0.46 for Deletion; and log2 copy number > −0.46 and < 0.51 for Normal.
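The segment-level rules translate directly into a small function. This sketch assumes the input is the segment log2 value reported by CNVkit; the function and constant names are hypothetical. Both boundaries are inclusive for the altered categories, so a value of exactly 0.51 is called an amplification and exactly −0.46 a deletion.

```python
AMP_THRESHOLD = 0.51   # log2 copy number at or above which a segment is called amplified
DEL_THRESHOLD = -0.46  # log2 copy number at or below which a segment is called deleted


def classify_segment(log2_cn):
    """Categorise one copy-number segment using the thresholds stated in the Methods."""
    if log2_cn >= AMP_THRESHOLD:
        return "Amplification"
    if log2_cn <= DEL_THRESHOLD:
        return "Deletion"
    return "Normal"
```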
Gene expression/signature analyses
Gene sets in this study were curated from public repositories, as cited throughout the text. Gene signature scores were calculated as the mean z-score of all genes in the signature, with each gene z-scored across the samples of the respective cohort.
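The signature score described above can be sketched as follows. This assumes expression is stored per gene as a list of values in a fixed sample order, that signature genes absent from the data are skipped, and that z-scores use the population standard deviation (R's scale() uses the sample SD, which rescales every score by the same constant and does not change sample ranking). The function and gene names are hypothetical.

```python
from statistics import mean, pstdev


def signature_scores(expr, signature):
    """Per-sample signature score = mean z-score of the signature genes.

    expr: dict gene -> list of expression values, one per sample (same order for every gene).
    signature: iterable of gene names; genes missing from expr are skipped.
    """
    genes = [g for g in signature if g in expr]
    if not genes:
        raise ValueError("none of the signature genes are present")
    n_samples = len(expr[genes[0]])
    z = {}
    for g in genes:
        mu, sd = mean(expr[g]), pstdev(expr[g])
        # z-score each gene across samples; constant genes contribute 0
        z[g] = [0.0 if sd == 0 else (v - mu) / sd for v in expr[g]]
    return [mean(z[g][i] for g in genes) for i in range(n_samples)]
```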
TCGA subtype assignment and molecular subtypes assignment
For the TCGA subtype assignments, the expression data were subset to only the genes used for each TCGA subtype16. The expression levels of the subset genes were standardized by subtracting the mean expression of the subset genes in each sample and then dividing by the standard deviation. For each sample, the mean of standardized expression of the subset genes was compared with the centroid values for each subtype from Wilkerson et al.34. Each sample was assigned to the subtype with the smallest difference between its mean expression and corresponding centroid value.
For subtype assignments based on the study by Roh et al.17, we first standardized the expression levels by subtracting the mean expression of all genes in each sample and then dividing by the standard deviation. Next, we identified the top 50 genes specific to each subtype as defined in the Roh et al. study. We then calculated the mean standardized expression of these subtype-specific genes to determine the enrichment score for each subtype. Finally, each sample was assigned to the subtype with the highest enrichment score.
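The enrichment-score assignment can be sketched for a single sample. The marker-gene lists here are placeholders rather than the Roh et al. genes, and the population standard deviation is an assumption; a different SD convention rescales all scores for a sample by the same factor and does not change the assigned subtype. The function name is hypothetical.

```python
from statistics import mean, pstdev


def assign_subtype(sample_expr, subtype_genes):
    """Assign one sample to the subtype with the highest mean standardized marker expression.

    sample_expr: dict gene -> expression for ALL genes in this sample.
    subtype_genes: dict subtype -> list of marker genes (up to 50 per subtype).
    """
    values = list(sample_expr.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("sample has constant expression; cannot standardize")
    # Standardize within the sample using all genes.
    z = {g: (v - mu) / sd for g, v in sample_expr.items()}
    scores = {}
    for subtype, genes in subtype_genes.items():
        present = [z[g] for g in genes if g in z]  # skip markers not measured
        scores[subtype] = mean(present) if present else float("-inf")
    return max(scores, key=scores.get), scores
```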
Pathologist review of clinical trial whole-slide images for digital pathology image quality control, metadata accuracy, differentiation, and tertiary lymphoid structure screening
A team of five board-certified pathologists reviewed all images on an internal digital pathology viewer (Roche IRISe) according to predefined criteria for inclusion in pathology readouts or analysis. Images were rejected from further study if > 40% of the tumour area was out of focus or obscured by artefact. High-quality images contained adequate tumour area, defined as in-focus areas containing more than 100 invasive tumour cells with intact, contiguous tumour-associated stroma. Case- and sample-level metadata were confirmed or adjudicated to reflect image-level content for sample diagnosis, collection type, anatomic location, general tissue type, and tumour type.
For differentiation, pathologists reviewed high-quality images for features of glandular differentiation, including characteristic acinar, papillary, micropapillary, and lepidic patterns, as well as the unequivocal presence of mucin. ‘Well differentiated’ was defined as an architectural pattern of glandular differentiation resembling benign lung tissue (i.e., lepidic) or showing well-formed, simple glands in > 95% of tumour areas, with or without mucin. If the percentage of glandular differentiation was lower (10–95%), the ‘moderate differentiation’ label was applied. In these cases, glands might have been incomplete or shown micropapillary features or solid or cribriform growth patterns. ‘Undifferentiated’ cases showed < 10% glandular differentiation, including tumours with large cell or signet ring features.
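The percentage cut-offs above can be expressed as a simple rule. This sketch encodes only the numeric thresholds (> 95%, 10–95%, < 10%); the architectural criteria that pathologists also applied, such as lepidic growth or well-formed simple glands, are not captured, and the function name is hypothetical.

```python
def differentiation_label(pct_glandular):
    """Map percent glandular differentiation to the three categories in the Methods."""
    if not 0 <= pct_glandular <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if pct_glandular > 95:
        return "well differentiated"
    if pct_glandular >= 10:
        return "moderate differentiation"
    return "undifferentiated"
```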
For tertiary lymphoid structure (TLS) screening, pathologists reviewed high-quality images at low power (digital 1–2×) to identify lymphoid aggregates, then confirmed the presence or absence of a germinal centre at high power (digital 4–8×) to categorise observations. Images from lymph node samples were excluded from TLS screening. Mature TLSs were defined as a lymphoid infiltrate of any size composed of tightly packed lymphocytes and a germinal centre recognizable as central pallor at low power, with a well-defined ovoid or round distinct border at high power, containing characteristic cell morphologies such as tingible-body macrophages, centrocytes, centroblasts, dendritic cells, and/or mitotic figures. High endothelial venules or vascular hyalinization or sclerosis could be present. Immature TLSs were defined as a distinct, dense collection of small monotonous lymphocytes measuring between 200 and 500 microns, easily recognized at 1–2×, and lacking a distinct germinal centre at high power. Some high endothelial venules or thin-walled vessels could be present, but typically not at high density.
PD-L1 immunohistochemistry and scoring
PD-L1 protein expression was assessed by immunohistochemistry using the SP263 assay (Ventana, AZ). SP263 assesses the percentage of tumour cells with membranous PD-L1 staining of any intensity. Categories were defined by tumour cell (TC) cutoff values: PD-L1 negative (TC < 1%), PD-L1 positive (TC ≥ 1%), PD-L1 low (TC 1–49%), and PD-L1 high (TC ≥ 50%).
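Because the positive category overlaps the low and high categories, each PD-L1–positive sample carries two labels at once. A minimal sketch, assuming scores are recorded as percentages and that any value below 50% (for example, 49.5%) falls into the low group; the function name is hypothetical.

```python
def pdl1_categories(tc_percent):
    """Return (status, level) for an SP263 tumour-cell percentage.

    status: 'negative' (TC < 1%) or 'positive' (TC >= 1%)
    level:  None for negatives, 'low' (1% <= TC < 50%) or 'high' (TC >= 50%)
    """
    if not 0 <= tc_percent <= 100:
        raise ValueError("TC percentage must be between 0 and 100")
    if tc_percent < 1:
        return "negative", None
    return "positive", ("high" if tc_percent >= 50 else "low")
```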
Prediction of cell types using convolutional neural networks
Digital images were obtained from 970 patients (80%) in the IMpower150 study and from 333 vendor-procured samples, and were independently examined by five pathologists. The 564 patients from IMpower150 used for the NMF1–4 analysis were part of this larger cohort. Image models specific to NSCLC were co-developed in collaboration with PathAI, using methods similar to those previously published32.
Briefly, convolutional neural network artefact, region, and cell models were trained to segment images to identify histopathological features including regions of artefact, cancer epithelium, stroma, and necrosis, as well as cell types including cancer cells, lymphocytes, macrophages, plasma cells, and fibroblasts. A total of 53,242 annotations were collected from 90 distinct pathologists, using bounding box annotations for tissue regions and point annotations for cell types of interest.
The performance of the resultant image models for cell labels was tested using a high-quality ground truth dataset generated from held-out images. The cell model was deployed on small regions of images (frames) containing representative cell populations that were separately and exhaustively annotated by five pathologists. The median cell count from the pathologist team for each frame was considered the consensus count for each cell type. The cell count from the image model and the average count from the pathologist team were compared with the consensus using Pearson rank correlation. Model performance was considered acceptable when the model-to-consensus correlation was not inferior to the single-annotator-to-consensus correlation. Image models were then deployed across all study images (n = 1218), and pre-defined features were extracted from each image.
Statistical analysis
For survival analyses, the log-rank test was used to compare Kaplan-Meier survival curves. Cox proportional hazards regression models were used to generate hazard ratios and 95% confidence intervals. Multivariable Cox proportional hazards regression models were used to assess whether distinct biomarkers were independently predictive of OS benefit.
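For readers unfamiliar with the log-rank test, the two-group statistic can be computed directly from event times. This is a from-scratch sketch of the standard test, not the survival package implementation used in the study; it assumes event indicators coded as 1 (event) or 0 (censored), and the function name and toy data are hypothetical. The p-value uses the χ² distribution with one degree of freedom, whose upper tail equals erfc(sqrt(χ²/2)).

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; events: 1 = event (e.g. progression or death), 0 = censored.

    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, grp) for tt, e, grp in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, grp in at_risk if grp == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, grp in at_risk if tt == t and e and grp == 0)
        # Observed minus expected events in group A at this time point.
        observed_minus_expected += d_a - d * n_a / n
        # Hypergeometric variance contribution.
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

The quadratic scan over at-risk sets keeps the sketch readable; a production implementation would sort once and update risk-set counts incrementally.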
Pearson’s χ² test with continuity correction (R function chisq.test) was used for categorical variables. Benjamini-Hochberg FDR-adjusted P values (q values) are reported31.
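Both procedures are short to write out. This sketch covers only the 2 × 2 case of the continuity-corrected test, mirroring the correction that R's chisq.test applies by default to 2 × 2 tables, plus the Benjamini-Hochberg step-up adjustment (the same procedure as R's p.adjust with method "BH"); the function names are hypothetical, and tables with an expected count of zero are not handled.

```python
import math


def chisq_2x2_yates(table):
    """Pearson chi-square with continuity correction for a 2 x 2 table [[a, b], [c, d]].

    Returns (statistic, p-value) using 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)] for i in range(2)]
    cells = [(i, j) for i in range(2) for j in range(2)]
    # The correction is capped so it never exceeds the smallest |O - E|.
    correction = min(0.5, min(abs(observed[i][j] - expected[i][j]) for i, j in cells))
    stat = sum((abs(observed[i][j] - expected[i][j]) - correction) ** 2 / expected[i][j]
               for i, j in cells)
    return stat, math.erfc(math.sqrt(stat / 2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q values), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q
```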
Spearman’s correlation test was used to assess associations between human-interpretable factors and RNA gene signatures.
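Spearman's coefficient is the Pearson correlation of rank-transformed values, with tied values receiving their average rank. A minimal sketch of the coefficient only (the accompanying significance test is not shown); the function names are hypothetical, and inputs with zero variance are not handled.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of this tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    return pearson(average_ranks(x), average_ranks(y))
```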
The lower and upper hinges in all box plots correspond to the first and third quartiles. The upper whisker extends from the hinge to the largest value no further than 1.5×IQR from the hinge (where IQR is the interquartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5× IQR of the hinge.
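The hinge and whisker rules can be reproduced with the standard library. This sketch assumes quartiles computed by linear interpolation (Python's method="inclusive", which matches R's default quantile type 7); the function name is hypothetical and at least two values are required.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Hinges, whiskers and outliers following the 1.5 x IQR rule described above."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme observations still inside the fences.
        "lower_whisker": min(v for v in values if v >= low_fence),
        "upper_whisker": max(v for v in values if v <= high_fence),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }
```

For the values 1 to 9 plus 100, the hinges are 3.25 and 7.75, the upper whisker stops at 9, and 100 is drawn as an outlier.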
All analyses were conducted using R v.4.1.0 with the survminer, survival, and limma packages.
Ethics statement
Details of the IMpower150 study design and primary results have been published previously9. The full study protocol and statistical analysis plan are publicly available at ClinicalTrials.gov under the identifier NCT02366143. The study protocol received approval from institutional review boards or ethics committees at each participating site, in accordance with the International Conference on Harmonisation Good Clinical Practice (ICH-GCP) guidelines and the ethical principles outlined in the Declaration of Helsinki. A complete list of the 161 ethics committees involved is provided in the Reporting Summary. All participants gave written informed consent prior to enrollment and were not compensated for their participation.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
This work was supported by F. Hoffmann-La Roche Ltd /Genentech, Inc, a member of the Roche Group. We thank the patients and their families and the IMpower150 study team and investigators. Medical writing assistance with preparation of the manuscript was provided by Samantha Santangelo, PhD, of Nucleus Global, an Inizio Company, and funded by Roche.
Author contributions
Conceptualization: T.L., H.H., S.M., D.S.S., M.B., M.X.H., M.K.S., and B.Y.N. Methodology: T.L., H.H., A.Q., A.A., and J.M.G. Software: T.L. and H.H. Validation: T.L., H.H., X.G., and Y.W. Formal analysis: T.L., H.H., and M.X.H. Investigation: T.L., H.H., M.A.S., M.K.S., F.B., R.J., E.F., H.K., J.M.G., M.B., and B.Y.N. Resources: T.L., H.H., M.A.S., M.R., F.C., F.B., R.J., H.K., J.M.G., D.S.S., M.B., and B.Y.N. Data curation: T.L., H.H., X.G., Y.W., E.F., J.M.G., and B.Y.N. Writing – Original Draft: T.L., D.S.S., M.K.S., and B.Y.N. Writing – Review and Editing: All authors. Visualization: T.L. and H.H. Supervision: M.R., S.M., H.K., J.M.G., D.S.S., M.B., M.X.H., M.K.S., and B.Y.N. Project Administration: M.A.S., D.S.S., M.B., M.K.S., and B.Y.N.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Data availability
The processed RNA-seq data and associated patient-level clinical data are available in the EGA database under accession code [EGAS50000001272, https://ega-archive.org/studies/EGAS50000001272]. Data will be made available to qualified researchers via the European Genome-Phenome Archive (EGA). Access requires submission of a request through the standard EGA process and approval by the Genentech Data Access Committee. Approved researchers must enter into a Data Access Agreement, which enforces terms on patient privacy, data security, and use consistent with informed consent and applicable data protection laws. Under the current agreement, qualified researchers may access the data for one year for purposes aligned with these terms. Agreement provisions may evolve over time in line with changes in privacy regulations and technology. The remaining data are available within the Article, Supplementary Information or Source Data. Source data are provided with this paper. For up-to-date details on Roche’s Global Policy on the Sharing of Clinical Information and how to request access to related clinical study documents, see https://go.roche.com/data_sharing. Anonymised records for individual patients across more than one data source external to Roche cannot, and should not, be linked due to a potential increase in risk of patient re-identification.
Competing interests
M.A.S. declares grants or contracts from AstraZeneca, BeOne, Enliven, Genentech, and Lilly; consulting fees from BMS, Lilly, Spectrum, and Summit; payment or honoraria from AstraZeneca, Guardant, Janssen, Jazz, and Merck/MSD; and participation on a data safety monitoring board or advisory board for BMS and Summit. M.R. declares consulting fees from Amgen, AstraZeneca, Beigene, Boehringer-Ingelheim, BMS, Daiichi-Sankyo, GSK, Janssen, Lilly, Merck/MSD, Mirati, Novartis, Pfizer, Regeneron, Roche, and Sanofi; payment or honoraria from Amgen, AstraZeneca, Beigene, Boehringer-Ingelheim, BMS, Daiichi-Sankyo, GSK, Janssen, Lilly, Merck/MSD, Mirati, Novartis, Pfizer, Regeneron, Roche, and Sanofi; support for attending meetings and/or travel from Amgen, AstraZeneca, Beigene, Boehringer-Ingelheim, and BMS; and participation on a data safety monitoring board or advisory board for Daiichi-Sankyo and Sanofi. F.C. declares grants or contracts from AbbVie, Amgen, AstraZeneca, Bayer, Beigene, BMS, Galecto, Illumina, Lilly, Merck/MSD, Mirati, Novocure, OSE, Pfizer, Pharmamar, Regeneron, Roche, Sanofi, Summit Therapeutics, Takeda, and Thermofisher; consulting fees from AbbVie, Amgen, AstraZeneca, Bayer, Beigene, BMS, Galecto, Illumina, Lilly, Merck/MSD, Mirati, Novocure, OSE, Pfizer, Pharmamar, Regeneron, Roche, Sanofi, Summit Therapeutics, Takeda, and Thermofisher; payment or honoraria from AbbVie, Amgen, AstraZeneca, Bayer, Beigene, BMS, Galecto, Illumina, Lilly, Merck/MSD, Mirati, Novocure, OSE, Pfizer, Pharmamar, Regeneron, Roche, Sanofi, Summit Therapeutics, Takeda, and Thermofisher; and participation on a data safety monitoring board or advisory board for AbbVie, Amgen, AstraZeneca, Bayer, Beigene, BMS, Galecto, Illumina, Lilly, Merck/MSD, Mirati, Novocure, OSE, Pfizer, Pharmamar, Regeneron, Roche, Sanofi, Summit Therapeutics, Takeda, and Thermofisher. F.B. declares participation on a data safety monitoring board or advisory board for AbbVie, ACEA, Amgen, AstraZeneca, Bayer, Boehringer-Ingelheim, BMS, Eli Lilly Oncology, Eisai, F. Hoffmann–La Roche Ltd, Genentech, Ignyta, Innate Pharma, Ipsen, Loxo, MedImmune, Merck/MSD, Novartis, Pfizer, Pierre Fabre, Sanofi-Aventis, Summit Therapeutics, and Takeda; payments made to their institution. R.M.J. declares receiving personal fees and advisory board participation with Mirati Therapeutics, Inc. E.F. declares consulting fees from Genentech. D.S.S. declares royalties or licenses from UTSW Medical Center, stock in BeOne Medicines, and was an employee of Genentech at the relevant time of this study. T.L., H.H., S.M., A.Q., A.A., X.G., H.K., J.M.G., M.B., M.X.H., Y.W., M.K.S., and B.Y.N. declare they are employees of Genentech/Roche and hold Roche stock.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Minu K. Srivastava, Barzin Y. Nabet.
Change history
2/20/2026
In this article the affiliation details for Federico Cappuzzo were incorrectly given as 'Istituto Nazionale Tumori Regina Elena, Rome, Italy' but should have been 'Istituto Nazionale Tumori IRCCS Regina Elena, Rome, Italy'. The original article has been updated.
Contributor Information
Minu K. Srivastava, Email: srivastava.minu@gene.com
Barzin Y. Nabet, Email: nabet.barzin@gene.com
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-025-66803-8.
References
- 1.Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin.73, 17–48 (2023). [DOI] [PubMed] [Google Scholar]
- 2. Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249 (2021).
- 3. Molina, J. R., Yang, P., Cassivi, S. D., Schild, S. E. & Adjei, A. A. Non-small cell lung cancer: epidemiology, risk factors, treatment, and survivorship. Mayo Clin. Proc. 83, 584–594 (2008).
- 4. Tang, S. et al. Immune checkpoint inhibitors in non-small cell lung cancer: progress, challenges, and prospects. Cells 11, 320 (2022).
- 5. Chen, D. S., Irving, B. A. & Hodi, F. S. Molecular pathways: next-generation immunotherapy—inhibiting programmed death-ligand 1 and programmed death-1. Clin. Cancer Res. 18, 6580–6587 (2012).
- 6. Sun, C., Mezzadra, R. & Schumacher, T. N. Regulation and function of the PD-L1 checkpoint. Immunity 48, 434–452 (2018).
- 7. Herbst, R. S. et al. Atezolizumab for first-line treatment of PD-L1–selected patients with NSCLC. N. Engl. J. Med. 383, 1328–1339 (2020).
- 8. Mazieres, J. et al. Atezolizumab versus docetaxel in pretreated patients with NSCLC: final results from the randomized phase 2 POPLAR and phase 3 OAK clinical trials. J. Thorac. Oncol. 16, 140–150 (2021).
- 9. Socinski, M. A. et al. Atezolizumab for first-line treatment of metastatic nonsquamous NSCLC. N. Engl. J. Med. 378, 2288–2301 (2018).
- 10. Socinski, M. A. et al. IMpower150 final overall survival analyses for atezolizumab plus bevacizumab and chemotherapy in first-line metastatic nonsquamous NSCLC. J. Thorac. Oncol. 16, 1909–1924 (2021).
- 11. Brueckl, W. M., Ficker, J. H. & Zeitler, G. Clinically relevant prognostic and predictive markers for immune-checkpoint-inhibitor (ICI) therapy in non-small cell lung cancer (NSCLC). BMC Cancer 20, 1185 (2020).
- 12. Gandara, D. R. et al. Blood-based tumor mutational burden as a predictor of clinical benefit in non-small-cell lung cancer patients treated with atezolizumab. Nat. Med. 24, 1441–1448 (2018).
- 13. Patil, N. S. et al. Intratumoral plasma cells predict outcomes to PD-L1 blockade in non-small cell lung cancer. Cancer Cell 40, 289–300.e4 (2022).
- 14. Genova, C. et al. Therapeutic implications of tumor microenvironment in lung cancer: focus on immune checkpoint blockade. Front. Immunol. 12, 799455 (2022).
- 15. Rakaee, M. et al. Association of machine learning-based assessment of tumor-infiltrating lymphocytes on standard histologic images with outcomes of immunotherapy in patients with NSCLC. JAMA Oncol. 9, 51–60 (2023).
- 16. Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014).
- 17. Roh, W. et al. High-resolution profiling of lung adenocarcinoma identifies expression subtypes with specific biomarkers and clinically relevant vulnerabilities. Cancer Res. 82, 3917–3931 (2022).
- 18. Gavish, A. et al. Hallmarks of transcriptional intratumour heterogeneity across a thousand tumours. Nature 618, 598–606 (2023).
- 19. Huang, R. S. P. et al. Pan-cancer landscape of CD274 (PD-L1) copy number changes in 244 584 patient samples and the correlation with PD-L1 protein expression. J. Immunother. Cancer 9, 2680 (2021).
- 20. Murugesan, K. et al. Association of CD274 (PD-L1) copy number changes with immune checkpoint inhibitor clinical benefit in non-squamous non-small cell lung cancer. Oncologist 27, 732–739 (2022).
- 21. Finn, R. S. et al. Atezolizumab plus bevacizumab in unresectable hepatocellular carcinoma. N. Engl. J. Med. 382, 1894–1905 (2020).
- 22. Skoulidis, F. et al. STK11/LKB1 mutations and PD-1 inhibitor resistance in KRAS-mutant lung adenocarcinoma. Cancer Discov. 8, 822–835 (2018).
- 23. Ren, F. et al. Tertiary lymphoid structures in lung adenocarcinoma: characteristics and related factors. Cancer Med. 11, 2969–2977 (2022).
- 24. Trüb, M. & Zippelius, A. Tertiary lymphoid structures as a predictive biomarker of response to cancer immunotherapies. Front. Immunol. 12, 674565 (2021).
- 25. Horvath, L. et al. Beyond binary: bridging neutrophil diversity to new therapeutic approaches in NSCLC. Trends Cancer 10, 457–474 (2024).
- 26. Mariathasan, S. et al. TGFβ attenuates tumour response to PD-L1 blockade by contributing to exclusion of T cells. Nature 554, 544–548 (2018).
- 27. McDermott, D. F. et al. Clinical activity and molecular correlates of response to atezolizumab alone or in combination with bevacizumab versus sunitinib in renal cell carcinoma. Nat. Med. 24, 749–757 (2018).
- 28. Zhu, A. X. et al. Molecular correlates of clinical response and resistance to atezolizumab in combination with bevacizumab in advanced hepatocellular carcinoma. Nat. Med. 28, 1599–1611 (2022).
- 29. Motzer, R. J. et al. Molecular subsets in renal cancer determine outcome to checkpoint and angiogenesis blockade. Cancer Cell 38, 803–817.e4 (2020).
- 30. Hendriks, L. E. et al. Non-oncogene-addicted metastatic non-small-cell lung cancer: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up. Ann. Oncol. 34, 358–376 (2023).
- 31. Mo, D.-C. et al. The role of PD-L1 in patients with non-small cell lung cancer receiving neoadjuvant immune checkpoint inhibitor plus chemotherapy: a meta-analysis. Sci. Rep. 14, 26200 (2024).
- 32. Barry, S. T., Gabrilovich, D. I., Sansom, O. J., Campbell, A. D. & Morton, J. P. Therapeutic targeting of tumour myeloid cells. Nat. Rev. Cancer 23, 216–237 (2023).
- 33. Brunet, J.-P., Tamayo, P., Golub, T. R. & Mesirov, J. P. Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl. Acad. Sci. USA 101, 4164–4169 (2004).
- 34. Wilkerson, M. D. et al. Differential pathogenesis of lung adenocarcinoma subtypes involving sequence mutations, copy number, chromosomal instability, and methylation. PLoS One 7, e36530 (2012).
Data Availability Statement
The processed RNA-seq data and associated patient-level clinical data are available in the European Genome-Phenome Archive (EGA) under accession code [EGAS50000001272, https://ega-archive.org/studies/EGAS50000001272]. Data will be made available to qualified researchers via the EGA. Access requires submission of a request through the standard EGA process and approval by the Genentech Data Access Committee. Approved researchers must enter into a Data Access Agreement, which enforces terms on patient privacy, data security, and use consistent with informed consent and applicable data protection laws. Under the current agreement, qualified researchers may access the data for one year for purposes aligned with these terms. Agreement provisions may evolve over time in line with changes in privacy regulations and technology. The remaining data are available within the Article, Supplementary Information, or Source Data. Source data are provided with this paper. For up-to-date details on Roche’s Global Policy on the Sharing of Clinical Information and how to request access to related clinical study documents, see https://go.roche.com/data_sharing. Anonymised records for individual patients across more than one data source external to Roche cannot, and should not, be linked due to a potential increase in risk of patient re-identification.