Science Advances. 2026 Mar 6;12(10):eady8546. doi: 10.1126/sciadv.ady8546

Epithelial plasticity shapes intratumoral heterogeneity and cell lineages in early-stage lung cancer

Yangjie Xiong 1, Xiaofang Wang 1, Kai Yan 2,3, Ning Xin 4, Weiqing Li 5, Zhengwei Zhang 5, Yumei Cheng 1, Chunling Zeng 1, Yuxiang Luo 1, Xiaoxiao Liu 1, Xiaojing Lu 1, Xinhui Yan 1, Haoqi Lan 1, Tanwen Wu 1, Yue Dong 1, Xu Lin 5, Ying Li 5, Xiaona Jia 1, Simin Wang 1, Hua Tang 2,6,*, Yuexiang Wang 1,*
PMCID: PMC12965311  PMID: 41790896

Abstract

Intratumoral heterogeneity (ITH) has been investigated primarily in locally advanced or metastatic cancer; however, much less is known about ITH in early-stage cancer, and the origins of ITH are poorly understood. Through single-cell and spatial transcriptomics of early-stage ground-glass opacity (GGO)–like lung adenocarcinoma (LUAD) (14 patients; 103,375 cells), we systematically define tumor states and demonstrate that pervasive transcriptional ITH exists in early-stage LUAD. Lineage diversification through epithelial plasticity, via a shift to less differentiated states and transdifferentiation, underlies a critical dimension of early ITH in lung cancer. We further reveal that decreased differentiation serves as a pathognomonic feature of malignant transformation and predicts poor prognosis. Notably, we identified a unique transitional state during AT2-to-AT1 transdifferentiation with activated tumor-suppressive pathways/genes. Integrative analysis of scRNA-seq, CUT&Tag, and bulk RNA-seq reveals that KLF4 and JDP2 are key transcription factors that reprogram LUAD into a transitional state and inhibit progression. These findings elucidate ITH mechanisms in early-stage cancer and suggest epithelial plasticity–targeted therapies.


Single-cell analysis reveals how cell plasticity drives early-stage lung cancer heterogeneity.

INTRODUCTION

Lung cancer is the leading cause of cancer-related death worldwide, with lung adenocarcinoma (LUAD) being the most common histological subtype (1). Tumor metastasis and drug resistance are the primary causes of mortality among patients with lung cancer (2), with intratumoral heterogeneity (ITH) widely recognized as a critical driver of these processes (3–6). ITH enables tumor cells to adapt to environmental pressures (4, 7, 8) while simultaneously increasing the complexity of therapeutic interventions (9). Therefore, a comprehensive understanding of ITH in LUAD and the mechanisms driving its emergence holds important clinical implications.

ITH is a hallmark of cancer (10, 11), referring to genetic, phenotypic, and behavioral differences among cells within the same tumor (3, 12, 13). ITH arises through both genetic and nongenetic mechanisms (12, 13). Genomic instability promotes stochastic mutation accumulation, increasing genetic diversity and generating subclones with distinct genotypes over time (14–16). Moreover, the plasticity of tumor cells, coupled with microenvironmental signals, allows them to dynamically alter their phenotypic and functional states (17, 18). ITH also arises from differentiation during tumor evolution, where mature tumor cells regain stem-like properties and undergo self-renewal and differentiation to form diverse lineages (19–22). In addition, recent studies show that tumor cells may follow normal developmental programs, generating heterogeneity by mimicking the differentiation pathways of normal cells (23, 24). These mechanisms are not mutually exclusive; rather, they interact to construct a multilayered and complex system of heterogeneity.

Single-cell transcriptome sequencing (scRNA-seq) is a robust and unbiased tool for assessing cellular and transcriptomic ITH (10, 25, 26). Current scRNA-seq studies in lung cancer have primarily concentrated on late-stage LUAD and therapeutic contexts (9, 27–30). These studies have successfully decoded the complex tumor ecosystems, meticulously mapped the immune and stromal microenvironments, and provided critical insights into the roles of tumor cell populations in driving metastasis and mediating drug resistance (9, 29–32). However, investigations into the characteristics of tumor cell populations in early-stage LUAD and the mechanisms driving the emergence of ITH remain limited.

In this study, we used scRNA-seq and spatial transcriptomic technologies, complemented by integrative computational analyses and functional validations, to dissect the cellular and transcriptomic ITH of early-stage LUAD at single-cell resolution. Clinical samples were collected from two distinct early-stage LUAD subtypes: minimally invasive adenocarcinoma (MIA) and stage I invasive adenocarcinoma (IA). We delineate the ITH landscape of early-stage LUAD with tumor sizes <3 cm and demonstrate that cell lineage development served as a key contributor to ITH. Mechanistically, lineage diversification driven by epithelial plasticity, via a shift to less differentiated states and transdifferentiation, defines a key dimension of early ITH in lung cancer. Furthermore, we characterized distinct cell populations and revealed decreased differentiation evidenced by the loss of alveolar lineage markers during tumor progression, which demonstrated significant prognostic value. Alveolar type 2 (AT2) cells are considered the cells of origin for LUAD and differentiate into AT1 cells in response to alveolar injury (33). Notably, we identified a distinct transitional state during AT2-to-AT1 transdifferentiation that exhibited activation of tumor-suppressive pathways/genes in early-stage LUAD, suggesting tumor cells may follow normal developmental constraints. Last, two key transcription factors (TFs), Krüppel-like factor 4 (KLF4) and Jun dimerization protein 2 (JDP2), were identified, whose overexpression in late-stage LUAD reversed the transitional phenotype, drove differentiation toward AT1 lineage, and suppressed tumor proliferation.

RESULTS

A single-cell transcriptional landscape of early-stage LUAD

scRNA-seq was performed on freshly dissected LUAD tumor tissues collected from 12 patients with early-stage LUAD, including 6 MIA (minimally invasive) and 6 IA (invasive, stage I) samples (Fig. 1A). The clinical and histopathological characteristics and radiology images are summarized in fig. S1. None of the patients had received prior chemotherapy, radiotherapy, or immunotherapy (fig. S1). Following stringent quality control and removal of the doublets, 103,375 cells were retained for subsequent analyses. Unsupervised clustering analysis was used to define major cell clusters with similar expression profiles (Fig. 1B). Each cluster was annotated with corresponding gene markers: epithelial cells (markers: EPCAM, CDH1, KRT18, and KRT19), endothelial cells (markers: CLDN5, FLT1, PECAM1, and RAMP2), T cells (markers: CD3D, CD3E, CD3G, and TRAC), NK cells (markers: GNLY, KLRD1, NCAM1, and NKG7), B cells (markers: CD79A, IGHA2, IGHG3, and IGHM), myeloid cells (markers: CD68, FCGR3A, MARCO, and LYZ), and mast cells (markers: GATA2, KIT, and MS4A2) (Fig. 1C). We calculated the proportion of major cell clusters in each tumor tissue sample (Fig. 1D and table S1). A significant decrease in the proportion of T cells in the IA group was observed (Fig. 1E), which is consistent with the findings of previous studies (34). Further unsupervised clustering analysis of the main cell types revealed distinct molecular characteristics of stromal and immune cells (fig. S2, A to E).
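The per-sample composition step described above can be illustrated with a small Python sketch. It only counts annotated cells per sample and converts the counts to fractions; the function name `composition`, the sample IDs, and the toy annotations are assumptions made for illustration and are not taken from the study's data or code.

```python
from collections import Counter, defaultdict


def composition(cells):
    """Return {sample: {cell_type: fraction of that sample's cells}}."""
    counts = defaultdict(Counter)
    for sample, cell_type in cells:
        counts[sample][cell_type] += 1
    return {
        sample: {ct: n / sum(c.values()) for ct, n in c.items()}
        for sample, c in counts.items()
    }


# Toy input: one (sample, annotated cell type) pair per cell (hypothetical).
cells = [
    ("MIA_1", "T"), ("MIA_1", "T"), ("MIA_1", "Epithelial"), ("MIA_1", "B"),
    ("IA_1", "T"), ("IA_1", "Epithelial"), ("IA_1", "Epithelial"), ("IA_1", "Myeloid"),
]

fractions = composition(cells)
t_cell_fraction = {sample: f.get("T", 0.0) for sample, f in fractions.items()}
```

Group-level comparisons (such as the MIA versus IA T cell difference) would then be run on these per-sample fractions with a suitable statistical test.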

Fig. 1. Global analysis of cell populations in IA and MIA LUAD patient samples.

(A) Schematic overview of the experimental design and analysis workflow. (B) UMAP plot depicting the major cell types identified in MIA (n = 6) and IA (n = 6). UMAP, uniform manifold approximation and projection. (C) Dot plot illustrating the expression levels of canonical marker genes across all major cell types. (D) Histogram displaying the cell composition percentage of each patient sample. (E) Quantification of cell cluster frequency representation among tissues from MIA to IA. The median and interquartile range are shown for each patient group, two-sided t test. (F) Heatmap showing large-scale CNVs for individual cells from 12 LUAD patient samples, clustered by the K-means clustering algorithm (rows). Nonmalignant cells were used as references (top), with large-scale CNVs observed in malignant cells (middle). The color indicates the log2 CNV ratio. Red: amplifications; blue: deletions. (G) Box plot presenting the CNV scores of different groups clustered by the K-means clustering algorithm. (H) Quantification of cell cluster frequency representation among tumor cells from MIA to IA. The median and interquartile range are shown for each patient group, two-sided Wilcoxon test. NK, natural killer; EPCAM, epithelial cell adhesion molecule.

The InferCNV (35) algorithm was used to calculate the copy number variation (CNV) scores in EPCAM+ epithelial cells to identify malignant cells. Unsupervised K-means clustering analysis of CNV profiles across all 22 autosomes identified distinct genomic alteration patterns in malignant cell clusters, while the K2 cluster showed no obvious CNV aberrations, maintaining a genetic profile consistent with normal cells (Fig. 1F). Based on both the presence of characteristic CNV patterns and statistically higher CNV scores (standard deviations of CNVs inferred across 22 autosomes, P < 2.2 × 10−16) compared with normal cells (Fig. 1G), we classified cell clusters K1 and K3-K6 as malignant cells. Notably, the K2 cluster and normal epithelial controls displayed lower variation than malignant cells (Fig. 1G). To further characterize the K2 cluster, we performed unsupervised clustering analysis and annotated the cells using established lung epithelial biomarkers (fig. S3, A and B). The resulting cell type annotation showed high concordance with normal epithelial cell signatures in the HCL reference database (fig. S3, C and D) (36). These findings demonstrate that the K2 cluster predominantly consists of normal epithelial cells, leading to its exclusion from subsequent tumor cell analyses. After identifying tumor cells among the epithelial cells, we calculated the proportion of tumor cells in each tumor tissue. The results revealed a significant increase in the proportion of tumor cells in the IA group (Fig. 1H), suggesting that tumor cells play critical roles in the progression from MIA to IA.
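As a minimal sketch of the scoring idea in this paragraph, the Python below computes a per-cell CNV score as the standard deviation of inferred CNV values and compares cluster means against a nonmalignant reference. The input layout, the function names, the fixed `fold` cutoff, and the toy numbers are illustrative assumptions; the study worked from InferCNV output and a statistical comparison, not from this simple threshold rule.

```python
from statistics import mean, pstdev


def cnv_score(profile):
    """CNV score of one cell: standard deviation of its inferred CNV values."""
    return pstdev(profile)


def flag_malignant_clusters(profiles, clusters, reference_cluster, fold=2.0):
    """Flag clusters whose mean CNV score exceeds `fold` times the reference mean.

    `fold` is a hypothetical cutoff chosen for illustration only.
    """
    scores_by_cluster = {}
    for profile, cluster in zip(profiles, clusters):
        scores_by_cluster.setdefault(cluster, []).append(cnv_score(profile))
    mean_scores = {c: mean(s) for c, s in scores_by_cluster.items()}
    cutoff = fold * mean_scores[reference_cluster]
    malignant = sorted(
        c for c, m in mean_scores.items()
        if c != reference_cluster and m > cutoff
    )
    return mean_scores, malignant


# Toy log2 CNV ratios per cell (columns stand in for genomic windows).
profiles = [
    [0.00, 0.01, -0.01, 0.00],   # reference (nonmalignant) cell
    [0.01, 0.00, 0.00, -0.01],   # reference (nonmalignant) cell
    [0.30, -0.25, 0.28, -0.31],  # K1 cell with large gains and losses
    [0.01, -0.01, 0.00, 0.01],   # K2 cell with a flat profile
]
clusters = ["ref", "ref", "K1", "K2"]

mean_scores, malignant = flag_malignant_clusters(profiles, clusters, "ref")
```

In this toy input the flat K2-like profile stays close to the reference, which mirrors why a cluster without obvious CNV aberrations would be treated as normal epithelium.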

Decreased differentiation is an indicator of early-stage LUAD progression

LUAD originates from AT2 cells and typically exhibits indolent progression during early stages (33). However, the transition from MIA to IA marks a critical turning point, characterized by aggressive clinical behavior and a phenotypic shift of tumor cells from benign to malignant states (37). To elucidate this progression, we profiled 9653 tumor cells from early-stage LUAD (MIA: 1190 cells; IA: 8463 cells) using scRNA-seq. For comprehensive comparative analysis, we integrated scRNA-seq data from healthy lung tissues (5498 AT2 cells) (38) and distinct pathological stages of LUAD, including advanced (stage III/IV; 5298 tumor cells) (31) and metastatic (12,141 tumor cells) stages (Fig. 2A) (29). We characterized the molecular and cellular transitions during LUAD progression at single-cell resolution.

Fig. 2. The decreased differentiation of epithelial cells marks tumor progression.

(A) UMAP plot depicting the major cell types identified in MIA (n = 6) and IA (n = 6). (B) Hallmark pathway enrichment in tumor cells among patients with MIA and IA, as determined by GSEA. Only pathways with a q value < 0.05 are displayed. (C) Box plots showing the MYC, P53, TNF-α, and WNT pathway scores per cell by AUCell between normal AT2 cells and tumor cells in the MIA, IA, advanced, and metastatic stages. The median and interquartile range are shown for each group, two-sided Wilcoxon test. (D) Inferred tumor cell lineages per cell using the scHCL database, comparing normal AT2 cells, AT1 cells, and tumor cells in the MIA, IA, advanced, and metastatic stages. (E) Distribution of AT1 cell and AT2 cell scores determined by AUCell across normal AT2 cells, AT1 cells, and tumor cells in the MIA, IA, advanced, and metastatic stages. Two-sided Wilcoxon test. (F) Trajectory of cells from normal AT1 cells and AT2 cells to the MIA, IA, advanced, and metastatic stages, constructed by Monocle2. Each point corresponds to a single cell. (G) AT1 cell and AT2 cell scores in the trajectory of cells from normal AT1 cells and AT2 cells to MIA, IA, advanced, and metastatic stages, constructed by Monocle2. Each point corresponds to a single cell. (H) Kaplan-Meier curves of overall survival for AT2 and AT1 scores in two independent LUAD cohorts, the TCGA LUAD cohort and the GSE72094 LUAD cohort (40). P values were calculated using a two-sided log-rank test, with scores dichotomized as high or low. High: samples within top quartile of signature score. Low: samples below the third quartile of signature score.

Hallmark gene sets were used to evaluate pathway changes in tumor cells. Distinct signaling pathway alterations were observed at each stage (fig. S4, A to E, and table S3). Tumor cells of advanced-stage and metastatic LUAD showed predominant activation of cell proliferation pathways (Fig. 2B), particularly myelocytomatosis oncogene (MYC) signaling (Fig. 2C and fig. S4, B and C). In contrast, tumor cells of early-stage LUAD displayed marked enrichment of biological processes including p53 pathway–mediated apoptosis and tumor necrosis factor–α (TNF-α) signaling–driven immune responses (Fig. 2C and fig. S4E) and, most notably, of cell differentiation programs involving Wnt, NOTCH, transforming growth factor–β, and epithelial–mesenchymal transition (EMT) signaling cascades (Fig. 2B).

To investigate the role of tumor cell differentiation in the progression of LUAD, we used the HCL (Human Cell Landscape) database, a valuable and well-annotated scRNA-seq resource for human biology, to infer tumor cell lineages across distinct pathological stages. This method was validated in healthy lung tissue and could accurately identify epithelial cell subtypes (fig. S3D). Lineage inference of tumor cells revealed that MIA and IA predominantly displayed mixed AT2/AT1 lineage features (Fig. 2D and table S4), which is consistent with previous research findings (30). Notably, when we calculated epithelial lineage features in tumor cells using AT2- and AT1-specific signature genes derived from differentially expressed genes (DEGs) in normal lung epithelial cells (table S5) (38), we observed a progressive loss of AT2/AT1 lineage features during LUAD progression from early to advanced stage (Fig. 2E). We next used pseudotime trajectory analysis to delineate the complete evolutionary pathway of tumor cells from their AT2 cell origins through successive stages. Tumor cells sequentially progressed from AT2 to MIA, IA, advanced, and metastatic stages, following a pseudotime developmental trajectory (Fig. 2F). The progressive loss of AT2/AT1 lineage characteristics marks LUAD tumor cell progression (Fig. 2G). These results are consistent with the idea that decreased differentiation of committed epithelial cells drives pathogenesis in multiple disorders, particularly malignancies (22, 39).

To characterize decreased differentiation as an essential phenotypic feature accompanying malignant transformation during early-stage LUAD progression, we analyzed an scRNA-seq dataset from a genetically engineered mouse model of LUAD spanning seven stages, ranging from preneoplastic hyperplasia to adenocarcinoma (fig. S5A) (21). This longitudinal analysis revealed a progressive loss of AT2 cell features during tumor progression, with the most significant reduction occurring during the adenoma-to-adenocarcinoma transition (fig. S5B). We also performed survival analysis in two independent early-stage LUAD (stage I/II) cohorts [The Cancer Genome Atlas (TCGA), n = 443; GSE72094, n = 321] (40). The data from both cohorts demonstrated that patients with higher AT2/AT1 lineage features exhibited significantly better survival (Fig. 2H).
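The survival comparison can be sketched in two steps: dichotomize a signature score at the third quartile, then estimate a Kaplan-Meier curve for each group. The Python below is a minimal illustration under stated assumptions: the function names, the handling of ties at the cutoff, the "inclusive" quantile method, and the toy cohort are all hypothetical, and the log-rank test used in the study is not implemented here.

```python
from statistics import quantiles


def split_by_top_quartile(scores):
    """Return (high, low) index lists; 'high' means score above the third quartile."""
    q3 = quantiles(scores, n=4, method="inclusive")[2]
    high = [i for i, s in enumerate(scores) if s > q3]
    low = [i for i, s in enumerate(scores) if s <= q3]
    return high, low


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event (death) was observed at times[i], 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # both deaths and censored cases leave the risk set
    return curve


# Toy cohort: signature score, follow-up time (months), event flag (1 = death).
scores = [1, 2, 3, 4, 5, 6, 7, 8]
times = [10, 12, 15, 20, 24, 30, 40, 50]
events = [1, 1, 1, 0, 1, 0, 0, 1]

high, low = split_by_top_quartile(scores)
km_high = kaplan_meier([times[i] for i in high], [events[i] for i in high])
km_low = kaplan_meier([times[i] for i in low], [events[i] for i in low])
```

For real cohorts, an established survival library would normally replace these hand-written helpers; the sketch only makes the grouping rule and the product-limit update explicit.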

The decreased differentiation of tumor cells is not only a biological hallmark of LUAD progression from early to advanced stages but also a key phenotypic feature of the benign-to-malignant transition in early-stage LUAD progression. This finding provides a clinically relevant framework for prognostic stratification in early-stage LUAD.

Intratumoral transcriptional heterogeneity in early-stage LUAD

Understanding the heterogeneity patterns in early-stage LUAD provides critical insights for developing new therapeutic strategies. An integrative multimodal approach was used to systematically characterize ITH in early-stage LUAD. We began by performing principal components analysis (PCA) coupled with unsupervised clustering of tumor cells (Fig. 3A and fig. S7A). By calculating cluster resolution at different levels, we found that dividing the tumor cells into four clusters provided the optimal distinction (fig. S6, A to D). The proportion of tumor clusters in each tumor tissue sample was subsequently calculated. Every tumor tissue with a tumor size <3 cm already contained at least three tumor clusters (12 of 12 patients) (Fig. 3B). Gene set enrichment analysis (GSEA) of the uniquely highly expressed genes in each cluster (Fig. 3C and table S6) revealed their distinct biological functions (fig. S7B and table S7). Furthermore, we applied non-negative matrix factorization (NMF) (41) to delineate tumor cell–specific ITH programs, with each program defined by its top 50 most highly weighted genes (fig. S7C). Five coexpression programs were extracted and functionally annotated through GSEA (fig. S7D and table S8). The robust concordance between PCA and NMF methodologies was demonstrated by calculating program activity scores within each tumor cluster using AUCell (42) (fig. S7E).
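To make the idea of a per-cell program activity score concrete, here is a simplified rank-based score written in Python. It is in the spirit of recovery-curve scoring but is not the AUCell implementation: the function name `recovery_auc`, the `top_n` cutoff, the normalization, and the two-gene toy signature are assumptions for illustration, and the study's signatures (for example, the top 50 genes of an NMF program) are not reproduced.

```python
def recovery_auc(expression, gene_set, top_n):
    """Rank-based enrichment of `gene_set` among the `top_n` highest genes of one cell.

    expression: dict mapping gene name to expression value for a single cell.
    Returns a value in [0, 1]; 1 means the gene set fills the very top ranks.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)[:top_n]
    hits = 0
    area = 0
    for gene in ranked:
        if gene in gene_set:
            hits += 1
        area += hits  # height of the recovery curve at this rank
    k = min(len(gene_set), top_n)
    # Best case: all k recoverable genes occupy the first k ranks.
    max_area = k * (k + 1) // 2 + k * (top_n - k)
    return area / max_area if max_area else 0.0


# Toy AT2-like signature and three toy cells (values are made up).
at2_signature = {"SFTPC", "LAMP3"}
cell_at2_high = {"SFTPC": 9, "LAMP3": 8, "KRT19": 5, "EPCAM": 4, "AGER": 1}
cell_at2_mid = {"AGER": 9, "SFTPC": 8, "KRT19": 7, "EPCAM": 1, "LAMP3": 0}
cell_at2_low = {"AGER": 9, "KRT19": 8, "EPCAM": 7, "SFTPC": 1, "LAMP3": 0}

score_high = recovery_auc(cell_at2_high, at2_signature, top_n=3)
score_mid = recovery_auc(cell_at2_mid, at2_signature, top_n=3)
score_low = recovery_auc(cell_at2_low, at2_signature, top_n=3)
```

Because the score depends only on within-cell gene ranks, it is insensitive to per-cell scaling, which is the main reason rank-based scores are convenient for comparing signature activity across cells.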

Fig. 3. Epithelial plasticity drives early ITH in lung cancer.

(A) UMAP plot showing 9653 tumor cells from MIA (n = 6) and IA (n = 6). (B) Histogram displaying the tumor cell composition percentage of each patient sample. (C) Heatmap showing the top 10 DEGs in each cluster. (D) Spatial transcriptome atlas depicting the spatial regions and subregions of the cancer region in patients P13 and P14. (E) Histogram displaying the cell composition percentage of the P13 and P14 patient samples. (F) Inferred tumor cell lineages per cell using the scHCL database, comparing normal AT2 cells, AT1 cells, and tumor cells in the C1 to C4 clusters, advanced, and metastatic stages. (G) Distribution of AT1 and AT2 cell scores determined by AUCell across the same groups as in (F). Two-sided Wilcoxon test. (H) Pseudotime trajectory of normal AT1 cells, AT2 cells, and tumor cells in the C1 to C4 clusters, constructed with Monocle2. (I) Proportion of normal AT1 cells, AT2 cells, and tumor cells on different branches in the trajectory. (J) AT1 cell and AT2 cell scores in the trajectory of normal AT1 cells, AT2 cells, and tumor cells in the C1 to C4 clusters, constructed with Monocle2. (K) Correlation of AT1 and AT2 score with transcriptional diversity expression in genetically engineered mouse LUAD, adapted from data reported by Marjanovic et al. (21). The Pearson’s correlation coefficient is shown. (L) Cell distribution of cells from normal AT1 cells, AT2 cells, and the C2 cluster. Each point corresponds to a single cell. (M) Dot plot illustrating the expression levels of transitional cell marker genes across all tumor clusters. (N) Cell-cell transitions estimated using scVelo, revealing distinct trajectories of the C2 cluster from AT2 cells to AT1 cells. Differentiation scores of AT1 cells, AT2 cells, and the C2 cluster were inferred via CytoTRACE. (O) Mechanism of ITH formation.

To further validate the presence of ITH within tumor tissues in early-stage LUAD, spatial transcriptomics was performed on freshly dissected LUAD tumor tissues from two patients with early-stage LUAD. The cell components in each spot (bin50) in the LUAD tissue were determined by spacexr (43) using scRNA-seq data as the reference (fig. S8, A and B). The InferCNV algorithm was used to calculate the CNV scores in EPCAM+ cells to identify malignant cells (fig. S8C). Each tumor cluster was directly observed in the tissue sections (Fig. 3, D and E). The integration of single-cell and spatial transcriptomic profiling demonstrated the existence of ITH in early-stage LUAD.

To delineate the relationship between transcriptomic heterogeneity and genomic variation, we performed InferCNV analysis on individual patients (fig. S9, A to D). The results revealed that tumor cells within individual patients with early-stage LUAD generally exhibited a homogeneous CNV pattern. The integrative analysis of CNV and transcriptional clustering demonstrated that while tumor cells from the same patient shared identical CNV profiles, they could be stratified into distinct transcriptional clusters (fig. S9, E and F). Notably, individual transcriptional clusters encompassed diverse CNV modules, suggesting that transcriptional heterogeneity may precede genetic heterogeneity.

Epithelial plasticity drives ITH in early-stage LUAD

High tumor heterogeneity is associated with poor prognosis, emphasizing the need to investigate its cellular emergence in early-stage LUAD (11, 15). The above results have demonstrated decreased differentiation as a phenotypic hallmark of early-stage LUAD progression. We therefore investigated the evolutionary relationships among tumor cells from the perspective of epithelial cell plasticity. The HCL annotation database was used to map tumor cell lineages, with healthy lung AT2/AT1 cells serving as controls and advanced-stage/metastatic cells defining malignant differentiation. We used tumor-normal epithelial cell similarity mapping to characterize the lineage attributes and composition of tumor cells, and lineage inference of tumor cells revealed that distinct tumor clusters predominantly exhibited mixed AT2/AT1 lineage features (Fig. 3F and table S9). The quantification of lineage-specific gene signatures (AT2/AT1) unveiled a marked differentiation loss spectrum across alveolar lineages (Fig. 3G). Pseudotime analysis was used to reconstruct the evolutionary relationships among tumor cell clusters. Trajectory analysis showed that (i) C1, exhibiting the lowest AT2/AT1 lineage scores, predominantly occupied terminal positions in the trajectory and (ii) C3 and C2, showing the highest AT2 and AT1 lineage features, respectively, were positioned at the trajectory origin (Fig. 3, H and I). Along the inferred tumor progression trajectory, we observed gradual attenuation of AT1/AT2 lineage features (Fig. 3J). These findings align with our above conclusion that decreased differentiation marks tumor progression.

Here, we hypothesize that the observed cellular state diversity in early-stage LUAD may originate from the accumulation of tumor cells shifting to less differentiated states. To test this hypothesis, we reanalyzed the scRNA-seq dataset from a LUAD mouse model that captured the entire course of tumor progression across seven stages, from preneoplastic hyperplasia to fully developed adenocarcinoma (fig. S5C) (21). We obtained two findings: (i) The loss of epithelial differentiation features in tumor cells significantly enhances transcriptional diversity among tumor cells (Fig. 3K), and (ii) newly emerged tumor cell clusters exhibit progressive loss of AT2 lineage features during tumor progression (fig. S5D). Together, results from human and murine models reveal a new mechanism underlying early LUAD ITH: During tumor progression, the shift of tumor cells toward less differentiated states triggers distinct cellular states, whose accumulation facilitates the emergence of ITH in early-stage LUAD (Fig. 3O).
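The relationship in finding (i) is summarized with a Pearson correlation coefficient. As a minimal sketch, the Python below computes Pearson's r between a lineage score and a diversity measure; how transcriptional diversity is quantified is not specified in this section, so the `diversity` values, like the `at2_score` values and the function name, are toy assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy values: diversity rises as the AT2 lineage score falls (made-up numbers).
at2_score = [0.9, 0.7, 0.5, 0.3]
diversity = [1.0, 2.0, 3.0, 4.0]
r = pearson_r(at2_score, diversity)
```

A strongly negative r in this setup would correspond to the reported pattern, in which loss of differentiation features accompanies higher transcriptional diversity.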

Notably, we observed that the C2 cluster primarily localized to the S1 branch of tumor evolution, coinciding with the AT2 cell and AT1 cell axis (Fig. 3, H and I). Using well-characterized marker genes defining the transitional state in AT2-to-AT1 transdifferentiation during pulmonary repair (44), we found that the C2 cluster showed high expression of these marker genes (Fig. 3M). Complementary RNA velocity (45) and CytoTRACE (46) analyses further confirmed that the C2 cluster occupies a transitional state along the AT2-to-AT1 transdifferentiation axis (Fig. 3N). These findings indicate that the conserved AT2-to-AT1 transdifferentiation program co-opted by tumor cells represents a distinct mechanistic pathway contributing to ITH development in early-stage LUAD.

In summary, our findings reveal that tumor cells drive ITH in early-stage LUAD by altering epithelial states via the accumulation of tumor cells shifting to less differentiated states and transdifferentiation (Fig. 3O). This finding provides a theoretical framework for understanding LUAD pathogenesis.

Co-occurrence of the proliferative state and immune-active state among cancer cell populations of the same early-stage LUAD

To gain deeper insights into the key cellular events during the progression of early-stage LUAD, we conducted a more detailed analysis of the specific functions of the tumor clusters (Fig. 4A). The lineage inference of tumor cells revealed that the C1 cluster exhibited the lowest AT2/AT1 features, whereas the C3 cluster showed the highest AT2 features and the second highest AT1 features (Fig. 4B). By comparing the distribution changes of tumor cell populations in MIA and IA, we found that during disease progression, the proportion of the C1 cluster significantly increased, while that of the C3 cluster significantly decreased, with no notable changes observed in other clusters (Fig. 4C). These results indicate that the expansion of the C1 cluster and the reduction of the C3 cluster may be key factors driving the progression of early-stage LUAD. To validate this conclusion, we analyzed tumor cells from single-cell datasets across early-stage (34), advanced-stage (31), and metastatic (29) LUAD samples. The results demonstrated that as tumors progressed from early to advanced and metastatic stages, the features of the C1 cluster continued to intensify, whereas those of the C3 cluster gradually diminished (Fig. 4D). This trend further supports the view that the dynamic changes in the C1 and C3 clusters not only play a crucial role in early-stage LUAD but also promote tumor progression toward advanced and metastatic stages.

Fig. 4. Tumor clusters in early-stage LUAD predict clinical outcome.

(A) UMAP plot of tumor cells, color-coded for the C1 and C3 clusters. (B) Box plots showing the AT1 cell and AT2 cell scores of different tumor clusters. The median and interquartile range are shown for each cluster, two-sided Wilcoxon test. (C) Quantification of tumor cell cluster frequency representation among tissues from MIA to IA. The median and interquartile range are shown for each patient group, two-sided Wilcoxon test. (D) Box plot showing the C1 to C4 scores across early, advanced, and metastatic stages. The median and interquartile range are shown for each cluster, two-sided Wilcoxon test. (E) Violin plot displaying marker gene expression of CRABP2 and LAMP3 in tumor cells. (F) Violin plot displaying the CRABP2+ cancer cell and LAMP3+ cancer cell marker genes and their signature gene set scores, adapted from LUAD data reported by Zhu et al. (34). (G) Dot plot illustrating different signaling pathways in C1 and C3 clusters using Gene Ontology (GO) enrichment analysis. (H) Number of significant ligand-receptor pairs between any pair of two cell populations in the C1 and C3 clusters. The edge width is proportional to the indicated number of ligand-receptor pairs. The circle size is proportional to the number of cells in each cell group, and the edge width represents the communication probability. (I) Kaplan-Meier curves of overall survival for C1 and C3 scores in two independent LUAD cohorts. P values were calculated using a two-sided log-rank test, with scores dichotomized as high or low. High: samples within top quartile of signature score. Low: samples below the third quartile of signature score.

To further characterize the expression profiles of the C1 and C3 clusters, we performed differential expression analysis and selected genes showing the most significant expression differences between clusters (table S6). CRABP2 was selected as a marker gene for the C1 cluster and LAMP3 as a marker gene for the C3 cluster (Fig. 4E). For independent validation, we analyzed tumor cells from a published early-stage LUAD scRNA-seq dataset (34). Strikingly, cluster 3 in this external dataset showed both the highest CRABP2 expression levels and the highest C1 signature score, while cluster 5 exhibited both the highest LAMP3 expression levels and the highest C3 signature score (Fig. 4F). This strongly supports the robustness of our classification system in early-stage LUAD. The GSEA of DEGs demonstrated that the C1 cluster was significantly enriched in genes associated with cell growth, differentiation, and cell junctions (Fig. 4G and table S7), representing proliferative signals characteristic of malignant cells. In contrast, the C3 cluster showed significant enrichment in immune response pathways, including T cell activation, immune cell recruitment, and major histocompatibility complex (MHC) protein complexes involved in antigen presentation (Fig. 4G and table S7), indicating a strong immune response of tumor cells.

To investigate the potential immunological implications of the C3 cluster’s strong immune pathway enrichment, we applied CellChat (47), a tool that quantitatively infers and analyzes intercellular communication networks among ligands, receptors, and their cofactors from scRNA-seq data (fig. S10, A to C). The C3 cluster cells exhibited the most frequent interactions with immune cells, which explains their high enrichment in immune response pathways and reflects characteristics of high immune infiltration (Fig. 4H and fig. S10D). In contrast, the C1 cluster showed the fewest interactions with both stromal and immune cells compared to other tumor cell clusters (Fig. 4H and fig. S10, D and E). Reduced cellular interactions may indicate a more aggressive or immune-evasive tumor phenotype, in which malignant cells become increasingly autonomous and less dependent on the surrounding microenvironment for growth and survival (48). This observation is consistent with prior findings demonstrating that immune evasion can occur even in early-stage tumors (49).

To spatially resolve cellular functions, we quantified intercellular distances within tumor tissues. This analysis revealed that C3 malignant cells exhibited immediate proximity to immune cells, while C1 cells showed maximal spatial segregation from immune cells (fig. S8, D and E). Since our spatial transcriptomics data were confined to tumor-enriched regions, we integrated publicly available 10x Visium datasets encompassing matched tumor-normal tissue sections (fig. S11A) (34). Combined hematoxylin and eosin (H&E) staining and CNV score profiling enabled precise demarcation of tumor-stromal boundaries (fig. S11B), demonstrating predominant localization of CRABP2+ C1 cancer cells at invasion fronts (fig. S11, C and D). These spatial patterns validate the designation of C1 as the “proliferative state” and C3 as the “immune-active state” through orthogonal positional evidence.
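One simple way to quantify the kind of spatial proximity described above is the mean distance from each tumor spot to its nearest immune spot. The Python sketch below shows that calculation on toy coordinates; the function name, the coordinate units, and the spot labels are assumptions, and the distance metric actually used in the study is not specified in this section.

```python
from math import dist
from statistics import mean


def mean_nearest_distance(sources, targets):
    """Mean Euclidean distance from each source spot to its closest target spot."""
    return mean(min(dist(s, t) for t in targets) for s in sources)


# Toy spot coordinates (arbitrary units), labeled by assumed dominant cell state.
c3_spots = [(0, 0), (1, 0)]
c1_spots = [(10, 0), (12, 0)]
immune_spots = [(0, 1), (2, 0)]

c3_to_immune = mean_nearest_distance(c3_spots, immune_spots)
c1_to_immune = mean_nearest_distance(c1_spots, immune_spots)
```

A smaller value for one tumor state than another would indicate closer average proximity to immune cells, which is the comparison drawn between C3 and C1 in the text.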

Based on differential gene expression analysis, we identified characteristic genes for the C1 and C3 clusters, constructing the C1 and C3 gene signatures, respectively. Using single-sample GSEA deconvolution analysis on transcriptomic data from two independent early-stage LUAD cohorts (TCGA and GSE72094), we quantified the characteristic scores of the C1 and C3 clusters in both cohorts. Kaplan-Meier survival analysis demonstrated that elevated C1 features were significantly associated with poor prognosis, whereas high C3 features correlated with favorable clinical outcomes (Fig. 4I).

Collectively, our findings demonstrate the coexistence of proliferative state (C1) and immune-active state (C3) tumor cell clusters in early-stage LUAD with a tumor size of <3 cm. The molecular features of C1 and C3 clusters can serve as predictive biomarkers for prognosis.

Transcriptional regulatory events during the AT2-to-AT1 transitional state in early-stage LUAD

We have characterized the C2 cluster as an AT2-to-AT1 transitional state (Fig. 3). The TF KLF4 was identified as a robust marker gene for the C2 cluster, as it exhibited significantly higher expression in C2 than in all other clusters (Fig. 5B). For independent validation, we analyzed tumor cells from a published early-stage LUAD single-cell RNA-seq dataset (34). Strikingly, clusters 0, 1, 2, 4, and 5 exhibited both high KLF4 expression levels and C2 features, matching our C2 cluster (Fig. 5C). This strongly supports the robustness of our classification system in early-stage LUAD.

Fig. 5. KLF4+ cancer cells represent an AT2-to-AT1 transitional state characterized by activation of tumor suppressor pathways/genes.

(A) UMAP plot of tumor cells, color-coded for the C2 cluster. (B) Violin plot displaying KLF4 gene expression in C2 cluster. (C) Violin plot displaying the KLF4+ cancer cell cluster marker gene and the corresponding signature gene set scores, adapted from LUAD data reported by Zhu et al. (34). (D) Representative images of immunofluorescence staining of LUAD tissue of P03 patient. Green color: EPCAM; yellow color: CLDN4; red color: KLF4; blue color: DAPI. (E) Box plots showing the AT1 scores of different tumor clusters. The median and interquartile range are shown for each cluster, two-sided Wilcoxon test. (F) Dot plot illustrating different signaling pathways in C2 clusters using GO enrichment analysis. (G) Box plot showing the P53 pathway scores of different tumor clusters. The median and interquartile range are shown for each cluster, two-sided Wilcoxon test. (H) Fractions of cells in each cell cycle stage among different tumor clusters. (I) Heatmap showing gene expression of tumor suppressor genes from the TSGene database among the four tumor cell clusters. (J) Dot plot showing TF regulon activity among cell clusters calculated via SCENIC and gene expression of TFs among tumor cell clusters calculated via SCENIC. (K) JDP2 and target genes inferred by RcisTarget from both the KLF4+ cancer cluster and the MP2 program. (L) Correlation of JDP2 and KLF4 expression (transcripts per million, TPM) in all LUAD samples from the TCGA dataset. The Pearson’s correlation coefficient is shown.

We sought to further investigate the association between the C2 cluster and AT2-to-AT1 transition cells. Fluorescence staining of patient-derived tissue sections confirmed the presence of KLF4+ cancer cells and revealed the coexpression of KLF4 and the transitional state marker CLDN4 within the tumor tissue (Fig. 5D). Further analysis revealed that the C2 cluster exhibited the highest AT1 features (Fig. 5E), elevated activity in tumor-suppressive pathways, such as the p53 and apoptotic signaling pathways (Fig. 5, F and G), and molecular features indicative of G2-M cell cycle arrest (Fig. 5H). These findings align with prior studies (44, 50, 51), further supporting the C2 cluster as a transitional state during AT2-to-AT1 differentiation. We also found that the C2 cluster differentially expresses tumor suppressor genes (TSGs). We screened the TSGene database for previously reported tumor suppressor genes in LUAD (52) and intersected them with the highly expressed signature genes of each tumor cluster. Strikingly, the C2 cluster harbored a significantly higher number of differentially expressed TSGs than the other clusters (Fig. 5I), suggesting a potential tumor-suppressive role during this transitional state in early-stage LUAD.

TFs play pivotal roles in governing cell fate determination (53, 54). To identify C2 cluster–specific transcriptional regulators, we implemented two complementary approaches. First, we applied Single-Cell Regulatory Network Inference and Clustering (SCENIC) (42) analysis, which integrates differential gene expression patterns with TF binding site enrichment and coexpression networks. This approach revealed distinct TF expression profiles across tumor clusters, with KLF4 emerging as a specifically enriched TF in the C2 cluster (Fig. 5J).

Second, we leveraged the notable overlap between the C2 cluster and the MP2 module (fig. S12, A and B). By integrating C2 cluster–specific DEGs with the NMF-derived MP2 gene set, we generated a coexpressed C2/MP2 signature gene set. Subsequent TF analysis using the Cytoscape iRegulon plugin identified JDP2 as a potential upstream regulator of the C2 cluster (Fig. 5K). JDP2 was also highly expressed in the C2 cluster (fig. S12, C and D), and its expression was significantly lower in LUAD tumor samples than in adjacent normal tissues (fig. S12E). The expression of JDP2 and KLF4 was positively correlated (Fig. 5L). Considering the high expression of tumor suppressor genes in the C2 cluster, JDP2 may represent a novel tumor suppressor gene in LUAD. In summary, our study identifies the C2 cluster as a tumor-suppressive transition state during AT2-to-AT1 differentiation, governed by the KLF4/JDP2 transcriptional regulatory network.

Reprogramming LUAD into AT2-to-AT1 transitional state by KLF4/JDP2 overexpression suppresses tumor progression

The C2 cluster was uniquely enriched for tumor suppressor genes (Fig. 5I). Restoring tumor cells to the C2 transitional state, with its high expression of tumor suppressor genes, therefore offers a potential therapeutic strategy. We investigated the biological function of KLF4 in human LUAD models by reexpressing KLF4 in KLF4low LUAD cells. Two human LUAD cell lines, A549 and H1437, exhibited low KLF4 expression. Lentivirus-mediated KLF4 transduction into A549 and H1437 cells induced KLF4 expression (Fig. 6A and fig. S13A). KLF4 reexpression reduced tumor cell proliferation in both short-term (Fig. 6B and fig. S13B) and long-term (Fig. 6, C and E, and fig. S13, C and D) assays and reduced cell migration (Fig. 6D). To determine whether the inhibition of cell proliferation manifested in vivo, we generated both control and KLF4-restored A549 xenografts in nude mice. KLF4 overexpression attenuated tumor growth (Fig. 6, F and G). These results demonstrate the tumor-suppressive role of KLF4 in LUAD progression. The tumor suppressor function of JDP2 was also validated in LUAD models. Lentivirus-mediated JDP2 transduction into H1437 and A549 cells induced JDP2 expression (fig. S14A). JDP2 overexpression inhibited cell proliferation (fig. S14, B to D) and migration (fig. S14E).

Fig. 6. High expression of TF KLF4 restores transitional state features.

(A) Box plots showing KLF4 mRNA expression in the A549 LUAD cell line in the overexpression group and the control group. The median and interquartile range are shown for each group, two-sided Wilcoxon test. (B) Cell viability assay showing that overexpression of the KLF4 gene in KLF4low LUAD A549 cells decreases cell proliferation. (C) Colony formation assays confirming that overexpression of the KLF4 gene decreases cancer cell proliferation in A549 cells. (D) Transwell assays confirming that overexpression of the KLF4 gene decreases cell migration in A549 cells. (E) Soft agar assay showing that overexpression of the KLF4 gene inhibits anchorage-independent growth in A549 cells. (F and G) Growth curves of xenograft tumors (F) and subcutaneous tumor size (G) of each group showing that the overexpression of the KLF4 gene inhibits tumor growth. (H) CUT&Tag-seq tracks at the loci of transitional marker genes and C2 cluster–specific genes in the indicated cells. KLF4-OE: Flag-KLF4–overexpressing A549 cells (anti-Flag antibody); Ctrl: Flag-overexpressing A549 cells (anti-Flag antibody). (I) Volcano plot showing the distribution of transitional marker genes and C2 cluster–specific genes with respect to KLF4 expression levels. (J and K) GSEA showing that the KLF4+ cancer cell (J), AT1 cell, and AT2 cell (K) features were affected by high expression of KLF4.

Through CUT&Tag profiling, we demonstrated that KLF4 and JDP2 directly regulate key transitional state markers, including established markers (KRT8 and CLDN4) (50) and C2 cluster–specific genes (Fig. 6H, fig. S14F, and table S10). RNA-seq analysis further confirmed that the overexpression of KLF4 and JDP2 reconstituted the C2 transitional state features (Fig. 6, I and J; fig. S14, G and H; and table S11). The overexpression of KLF4 and JDP2 promoted tumor cell reprogramming toward AT1-like features while preserving AT2 features (Fig. 6K and fig. S14H), indicating their specific role in driving the AT2-to-AT1 transitional program.

Our findings demonstrate that KLF4 and JDP2 function as key transcriptional regulators capable of restoring the C2 transitional state, effectively reprogramming malignant cells into a less aggressive phenotype with AT1-differentiation potential. These results highlight the therapeutic potential of targeting the KLF4/JDP2 regulatory axis to induce cellular reprogramming in LUAD.

DISCUSSION

Heterogeneity is a hallmark of cancer that drives tumor evolution and disease progression (10, 11). Despite its importance during tumorigenesis, very little is known about whether and to what extent early-stage lesions exhibit heterogeneity. The scRNA-seq technology provides a high-resolution method for assessing the transcriptome states of cells (55). Current knowledge is focused on the ITH of full-fledged cancers, in which tumoral heterogeneity has already been established (9, 27–30). How this tumoral heterogeneity evolves during the early stages of LUAD development remains poorly understood. Here, we dissociated each tumor immediately after surgery to obtain a single-cell suspension and processed it for scRNA-seq without initial sorting to ensure an unbiased analysis of the tumor cellular characteristics. Using single-cell and spatial transcriptomic technology in combination with integrative computational analyses and functional validations, we showed that, far from being homogeneous, early-stage LUAD tumor cells display remarkable transcriptomic ITH, which provides a source of diversity for tumor plasticity and evolution. We observed that more than three transcriptomically distinct tumor cell clusters coexisted in all of the early-stage LUAD cases analyzed (14 of 14 patients). Our tumor cell clusters were validated in independent early-stage LUAD datasets.

LUAD is characterized by high heterogeneity that leads to disease progression (11, 15). Identifying the mechanisms that orchestrate these heterogeneous ecosystems is crucial to improving current treatment approaches for this disease. From a lineage development perspective, we found a continuum of decreased differentiation events during LUAD progression. Unexpectedly, this decreased differentiation was already observable within early-stage LUAD tissues. Through pseudotemporal trajectory reconstruction, we established that progressively decreased differentiation represents a hallmark feature of evolving tumor cell clusters. Using a scRNA-seq dataset from a LUAD mouse model that captured seven stages of tumor progression, from preneoplastic hyperplasia to fully developed adenocarcinoma, we demonstrated that decreased differentiation promotes the generation of transcriptional diversity in early-stage LUAD, with newly emerged tumor cell clusters exhibiting progressive loss of AT2 lineage features during tumor progression. Intriguingly, we identified a unique tumor cluster exhibiting features of an AT2-to-AT1 transitional state, revealing transdifferentiation as an additional mechanism contributing to epithelial plasticity in tumor cells. Collectively, our results demonstrate that early-stage LUAD generates cellular diversity through shifting to less differentiated states and transdifferentiation-mediated epithelial plasticity, ultimately leading to the development of ITH.

Our data show that remarkable heterogeneity occurs during early-stage LUAD development, reinforcing the need to target heterogeneity at the earliest steps of treatment. We found that the coexisting proliferative state (C1) and immune-active state (C3) cell clusters can serve as predictive biomarkers for prognosis in early-stage LUAD. Most notably, we found an AT2-to-AT1 transitional state (C2 cluster) characterized by tumor suppressor activation. We established KLF4 and JDP2 as key transcriptional regulators capable of reinstating the C2 transitional state, effectively reprogramming malignant cells into a less aggressive phenotype with AT1-like features. These results underscore the therapeutic promise of targeting the KLF4/JDP2 regulatory axis to induce tumor cell reprogramming in LUAD.

Combination therapies may target coexisting cellular states, and differentiation therapies may shift cells from a proliferative state (e.g., the C1 cluster) to an immune-active state (e.g., the C3 cluster). The current findings are constrained by their LUAD-specific nature. Future multicancer investigations at single-cell resolution should establish whether the linkage we observed between cellular lineage states and transcriptional ITH represents a pan-cancer phenomenon with broad biological implications.

MATERIALS AND METHODS

Patient samples and clinical information

This study was approved by the Ethics Committee of Shanghai Changzheng Hospital (2018SL004). The scRNA-seq and spatial transcriptomics were conducted on 14 samples from patients spanning two histological stages, MIA and IA, as classified by the 2015 World Health Organization guidelines. All patients, who were diagnosed at Shanghai Changzheng Hospital, provided informed consent. They underwent surgical treatment without prior neoadjuvant chemotherapy, radiotherapy, or immunotherapy.

ScRNA-seq library preparation and sequencing

Fresh tissues were immediately transported in Hank’s balanced salt solution (HBSS; Life Technologies) on ice after surgery, minced into cubes smaller than 0.5 mm³, and transferred into a 15-ml conical tube with prewarmed HBSS, collagenase I (1 mg/ml), and collagenase IV (0.5 mg/ml). Single-cell separation, cDNA amplification, and library construction (10x Genomics Chromium) were performed at Shanghai Applied Protein Technology Co. Ltd. Libraries were sequenced on the MGISEQ-2000 platform with 150–base pair (bp) paired-end reads.

scRNA-seq data processing and analysis

Gene expression matrices were generated per sample using CellRanger (v6.1.2) and converted to Seurat (56) objects (v4.0.4). The predicted doublets were removed using DoubletFinder (v2.0.3). The cells were filtered on the basis of the number of expressed genes and the percentage of the mitochondrial genome, followed by regularized negative binomial regression for unique molecular identifier (UMI) normalization using the SCTransform() function. PCA was conducted using RunPCA() on highly variable features. Batch effects were corrected using the Harmony package (v0.1.0). Clusters were identified through shared-nearest-neighbor–based clustering, and dimensionality was reduced with RunTSNE(). The signature genes for each cluster were identified using FindMarkers(), and the cell types were annotated on the basis of common biomarkers and signature genes.

Spatial transcriptomic library preparation and sequencing

Capture chips were generated following the Stereo-seq protocol (57). Tissue samples were snap-frozen in isopentane prechilled with liquid nitrogen, embedded in Tissue-Tek OCT (optimal cutting temperature compound), and stored at −80°C. Frozen sections (10-μm thick) were prepared using a Leica CM1950 cryostat, adhered to the surface of the Stereo-seq chip, and fixed in precooled methanol at −20°C for 40 min. The sections were then stained with nucleic acid dye (Thermo Fisher Scientific, Q10212) and imaged using a Ti-7 Nikon Eclipse microscope (fluorescein isothiocyanate channel, 10× objective).

Tissue sections on the chip were permeabilized using 0.1% pepsin (Sigma-Aldrich, P7000) in 0.01 M HCl buffer at 37°C for 12 min. The released RNA was captured by the chip and reverse transcribed at 42°C for 2 hours using a reaction mix containing SuperScript II, deoxynucleotide triphosphates (dNTPs), betaine, MgCl2, dithiothreitol, ribonuclease inhibitor, and Stereo-seq template switch oligo (TSO). After tissue removal, the chip was treated with Exonuclease I [New England Biolabs (NEB), M0293L] for 1 hour. The first-strand cDNA was then amplified using KAPA HiFi Hotstart Ready Mix (15 cycles) and purified with Ampure XP beads (0.6×). A total of 20 ng of cDNA was fragmented with Tn5 transposase at 55°C for 10 min, followed by reaction termination with SDS and a second round of polymerase chain reaction (PCR) amplification (13 cycles). Library construction and sequencing were performed on the MGI DNBSEQ-Tx platform (R1: 35 bp, R2: 100 bp) at Shanghai Majorbio Bio-pharm Technology Co. Ltd.

Stereo-seq data processing and analysis

To improve RNA capture efficiency relative to the native 500-nm resolution, the raw spatial expression matrix was aggregated into larger pseudo-spots using a 50 by 50 window (bin50; 50 × 500 nm, equivalent to 25 μm by 25 μm). The cell type composition of each bin50 spot was inferred using spacexr (v2.2.1) with factorized cell type–specific topic profiles from scRNA-seq data. The potential composition was pruned and renormalized using the top cell types in descending probability order, with the primary cell type assigned for visualization.
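The bin50 aggregation described above can be illustrated with a minimal sketch. It assumes (hypothetically) that the raw matrix is available as `(x, y, gene, count)` records with coordinates in units of the 500-nm capture-spot pitch; the function name `bin_spots` and the record layout are illustrative and are not taken from the Stereo-seq software.

```python
from collections import defaultdict

SPOT_PITCH_UM = 0.5  # 500-nm capture-spot pitch, as stated in the text


def bin_spots(records, bin_size=50):
    """Sum counts from (x, y, gene, count) records into bin_size x bin_size pseudo-spots.

    Coordinates are assumed to be integer spot indices (hypothetical layout).
    """
    binned = defaultdict(int)
    for x, y, gene, count in records:
        # Integer division maps each spot to the pseudo-spot that contains it.
        key = (x // bin_size, y // bin_size, gene)
        binned[key] += count
    return dict(binned)


# Toy input: three KLF4 records and one JDP2 record.
records = [
    (0, 0, "KLF4", 1),
    (49, 49, "KLF4", 2),  # same bin as (0, 0)
    (50, 0, "KLF4", 3),   # neighboring bin along x
    (10, 10, "JDP2", 1),
]
binned = bin_spots(records)
bin_side_um = 50 * SPOT_PITCH_UM  # side length of one bin50 pseudo-spot in micrometers
```

Each bin50 pseudo-spot therefore covers a square with a side of 50 × 0.5 μm.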

Spatial distance calculation

Spatial distances were computed using the CellTrek (58) (v0.0.94) R package based on K-nearest neighbor distance analysis. The parameter k was set to 5 for all calculations. For each cell cluster, we randomly sampled 500 cells per iteration and repeated this random sampling process 100 times to obtain robust spatial distance estimates. The median values from these 100 iterations were used as the final spatial distance measurements.
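The resampling scheme described above can be sketched as follows. This is not the CellTrek implementation. It is a plain-Python illustration that assumes the per-cell distance is the mean Euclidean distance to the k nearest reference cells and that each iteration is summarized by the mean over the sampled cells; both choices, and all function names, are assumptions made for illustration.

```python
import math
import random
import statistics


def mean_knn_distance(point, targets, k=5):
    """Mean Euclidean distance from one cell to its k nearest reference cells."""
    nearest = sorted(math.dist(point, t) for t in targets)[:k]
    return sum(nearest) / len(nearest)


def cluster_distance(cluster_cells, target_cells, k=5, n_sample=500, n_iter=100, seed=0):
    """Median over n_iter random subsamples of the mean k-nearest-neighbor distance."""
    rng = random.Random(seed)
    per_iteration = []
    for _ in range(n_iter):
        # Sample up to n_sample cells from the cluster without replacement.
        size = min(n_sample, len(cluster_cells))
        sample = rng.sample(cluster_cells, size)
        distances = [mean_knn_distance(c, target_cells, k) for c in sample]
        per_iteration.append(statistics.mean(distances))
    return statistics.median(per_iteration)


# Toy coordinates: immune cells near the origin, one tumor cluster close and one far away.
immune = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
near_cluster = [(0.5, 0.5), (1.0, 0.5)]
far_cluster = [(100.0, 100.0), (101.0, 100.0)]
near_d = cluster_distance(near_cluster, immune)
far_d = cluster_distance(far_cluster, immune)
```

Taking the median across iterations makes the estimate less sensitive to any single unrepresentative subsample.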

Diversity score calculation for ITH

Diversity score was defined to quantify ITH based on the gene expression profiles of malignant cells within a tumor (59). PCA was applied to project the original expression profiles of all malignant cells into the eigenvector space, extracting principal components (PCs) to capture key information while reducing noise.

Given a tumor sample with $m$ malignant cells, each cell is represented by $n$ features. The $i$th malignant cell can be expressed as $(x_{i1}, x_{i2}, \ldots, x_{in})$, while the centroid, calculated as the arithmetic mean of PCs across all malignant cells in the tumor, is denoted as $(u_1, u_2, \ldots, u_n)$. The diversity score is defined as the average distance of all malignant cells from the centroid, computed as

$$\text{Diversity score} = \frac{1}{m}\sum_{i=1}^{m}\sqrt{\sum_{j=1}^{n}\left(x_{ij}-u_{j}\right)^{2}}$$

To mitigate the impact of extreme values, we applied a threshold of mean ± 3 × SD to identify outliers. Cells were considered extreme-value outliers if their first three PCs fell outside the range $[\mu_j - 3\sigma_j, \mu_j + 3\sigma_j]$, where $\mu_j$ and $\sigma_j$ represent the mean and SD of the $j$th PC within the tumor. These outlier cells were excluded from the diversity score calculation.
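A minimal sketch of this calculation is given below, assuming the PC coordinates of the malignant cells of one tumor are already available as a list of per-cell lists. Three points are assumptions rather than statements from the text: a cell is treated as an outlier if any of its first three PCs falls outside the range, the sample SD is used, and the centroid is computed on the retained cells. The function name `diversity_score` is illustrative.

```python
import math
import statistics


def diversity_score(pcs, n_outlier_pcs=3, n_sd=3.0):
    """Average Euclidean distance of cells from the tumor centroid in PC space.

    pcs: list of per-cell PC coordinate lists for one tumor (at least two cells).
    """
    n_check = min(n_outlier_pcs, len(pcs[0]))
    mu = [statistics.mean([cell[j] for cell in pcs]) for j in range(n_check)]
    sd = [statistics.stdev([cell[j] for cell in pcs]) for j in range(n_check)]

    # Assumption: drop a cell if ANY of the checked PCs lies outside mean +/- n_sd * SD.
    kept = [
        cell for cell in pcs
        if all(abs(cell[j] - mu[j]) <= n_sd * sd[j] for j in range(n_check))
    ]

    # Assumption: the centroid is the per-PC mean over the retained cells.
    n_features = len(kept[0])
    centroid = [statistics.mean([cell[j] for cell in kept]) for j in range(n_features)]
    return statistics.mean([math.dist(cell, centroid) for cell in kept])


# Toy example: four cells placed at unit distance around the origin.
ring = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
ring_score = diversity_score(ring)

# Toy example: twenty identical cells plus one extreme cell that is filtered out.
with_outlier = [[0.0, 0.0] for _ in range(20)] + [[100.0, 0.0]]
filtered_score = diversity_score(with_outlier)
```

A higher score indicates that malignant cells within a tumor are more widely dispersed in expression space.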

Inference of copy number and identification of cancer cells

CNV analysis was performed using inferCNV (v1.0.6) with the default parameters. Unsupervised K-means clustering analysis of CNVs across all 22 autosomes identified distinct genomic alteration patterns in malignant cell clusters. Visualizing the distribution of the relative CNVs enabled malignant cell populations to be defined on the basis of genomic alteration patterns. To quantify CNVs at the cell level, CNV scores were computed as the SDs of the CNVs inferred across the 22 autosomes. On the basis of both the presence of characteristic CNV patterns and statistically higher CNV scores, we classified clusters K1 and K3 to K6 as tumor cells. Cell populations exhibiting low CNV scores were subjected to further subanalysis to definitively exclude a malignant identity.
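The per-cell CNV score described above can be sketched as the SD of a cell's inferred relative copy number values. The sketch assumes these values are available as a flat list per cell (for example, one value per gene or window across the autosomes) and uses the population SD; the input layout, the choice of SD estimator, and the function name `cnv_score` are illustrative assumptions.

```python
import statistics


def cnv_score(relative_cnv):
    """SD of one cell's inferred relative copy number values across the autosomes."""
    return statistics.pstdev(relative_cnv)


# Toy profiles centered on 1.0 (no change): one flat, one with gains and losses.
flat_profile = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
altered_profile = [1.0, 1.0, 1.5, 1.5, 0.5, 0.5]

flat_score = cnv_score(flat_profile)
altered_score = cnv_score(altered_profile)
```

A cell with a flat profile scores zero, whereas gains and losses both increase the score.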

DEG and gene enrichment analyses

DEGs across cell types were identified using the FindAllMarkers() function in Seurat, which uses the Wilcoxon test and fold change. DEGs were retained if the adjusted P value was <0.01 and log fold change (logFC) > 0.5. GSEA (v4.2.078) was conducted via preranked gene lists from differential expression outputs. The gene sets included Hallmarks (h.all.v7.5.1) and Gene Ontology (c5.go.v7.5.1).

Gene signature scores

Gene signature scores were calculated using the AUCell method, with gene lists provided in Supplementary Materials.

Cell lineage inference

Tumor cell origins were inferred by mapping the scRNA-seq data to the HCL database using the scHCL package.

Gene set variation analysis

Pathway analyses were conducted on the 50 hallmark pathways from the molecular signature database. Metabolic pathway activities were assessed via a curated dataset of 85 pathways. Pathway activity estimates for individual cells were assigned using gene set variation analysis (GSVA) (60) with the GSVA R package (v1.30.0).

Cell cycle analysis

The cell cycle phases were analyzed using 43 G1-S genes and 54 G2-M genes, with scores calculated for each malignant cell using the CellCycleScoring function in Seurat.

Cell-cell communication analysis

Cell-cell communication was analyzed using CellChat (v1.0.0). Ligand and receptor genes expressed by each cell were projected onto a reference communication network, with communication probabilities inferred by gene expression. Probabilities between subclones and other nonmalignant cells, as well as between subclones, were statistically analyzed. Visualizations were created using the ggplot2 package (v3.3.2).

Survival analysis

To identify patient subgroups with poor prognosis, we performed survival analysis using cluster-specific mean expression of DEGs (adjusted P < 0.01, |logFC| > 0.5). Patients were stratified into quartiles based on GSVA scores derived from these DEGs, with Kaplan-Meier analysis comparing the overall survival between the lowest and highest quartiles.
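The quartile stratification step can be sketched as follows, assuming a mapping from patient identifier to signature score. The identifiers, scores, and function name are hypothetical, and the Kaplan-Meier comparison itself is not shown. The quartile cut points use the default "exclusive" method of Python's `statistics.quantiles`, which may differ slightly from the method used in the study.

```python
import statistics


def extreme_quartile_groups(scores):
    """Return (lowest-quartile, highest-quartile) patient lists from a score mapping."""
    values = list(scores.values())
    q1, _median, q3 = statistics.quantiles(values, n=4)
    low = sorted(p for p, s in scores.items() if s <= q1)
    high = sorted(p for p, s in scores.items() if s >= q3)
    return low, high


# Toy cohort of eight hypothetical patients with scores 1.0 to 8.0.
scores = {f"p{i}": float(i) for i in range(1, 9)}
low_group, high_group = extreme_quartile_groups(scores)
```

Patients in the middle two quartiles are left out of the comparison, which contrasts only the extremes of the score distribution.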

Trajectory and RNA velocity analysis

The pseudo-time trajectory analysis was performed using Monocle2 (61) to map cell subtype differentiation and conversion. The top 2000 most variable genes per cluster were used to construct trajectories, with dimensionality reduction via DDRTree and visualization through plot_cell_trajectory(). RNA velocity analysis was conducted on BAM files with Velocyto in Python, and the results were mapped onto UMAP plots derived from Seurat clustering analysis.

Identifying coregulated gene modules

To capture ITH, coregulated gene modules were identified using both PCA-based and NMF-based approaches. DEGs were first identified in each tumor, followed by NMF-based gene module identification. The resulting signatures and expression patterns matched core programs across malignant cells.

TF analysis

Dominant TFs in different subclones were explored using SCENIC (42) (v1.2.1). Potential targets of each TF were inferred through coexpression analysis (GENIE3) and DNA motif analysis against the RcisTarget database. Regulon activity per cell was analyzed via the AUCell function, with visualization via FeaturePlot() in Seurat.

CUT&Tag-seq assay and data processing

The CUT&Tag assay was performed using the NovoNGS CUT&Tag 4.0 High-Sensitivity Kit (Novoprotein, #N259-YH01-01A). Briefly, 5 × 10⁵ cells were enriched with ConA beads and incubated with primary antibody buffer of anti-Flag antibody (Abmart, #M20008) and then with the secondary antibody buffer of anti-mouse immunoglobulin G antibody (Novoprotein, #N270). The cells were then incubated with protein A/G-Tn5 transposome and the tagmentation buffer (10 mM MgCl2 in ChiTaq buffer). The DNA fragments were extracted by Tagment DNA extract beads, and the libraries were amplified using 2x HiFi AmpliMix. The DNA libraries were extracted by DNA clean beads for sequencing. The raw sequencing data underwent quality control and adapter trimming with Cutadapt (v3.5), followed by alignment to the human reference genome (GRCh38) using Bowtie2 (v2.2.5). PCR duplicates were eliminated with Sambamba (v0.8.1). Read density normalization was then performed using deepTools (v3.5.1), calculating reads per kilobase per million mapped reads values in 100-bp genomic intervals. Significant binding sites were identified through peak detection analysis implemented in MACS2 (v2.2.7) under different experimental conditions. Genomic feature annotation of the identified peaks was carried out using HOMER software (v4.11). The final results were examined and presented through local visualization with IGV (v2.5.0).

RNA-seq analysis

Total RNA was extracted from tumor tissues and cells following the standard protocol. Paired-end sequencing was performed on a BGI-500 instrument. The sequence data were processed and mapped to the human reference genome (hg38) using Bowtie2. Read counts for each feature were calculated with HTseq (v0.11.2). Gene expression was quantified as fragments per kilobase of transcript per million mapped reads (FPKM) using RNA-seq by expectation-maximization (RSEM). Differential expression analysis was conducted with DESeq2 (v1.30.1). DEGs were further analyzed via GSEA (v4.0.1) in preranked mode using log2 fold change values with default settings.

Cell culture

Human embryonic kidney (HEK) 293T cells [American Type Culture Collection (ATCC), catalog #ACS-4500] and the non–small cell lung cancer (NSCLC) cell lines A549 (ATCC, catalog #CCL-185) and NCI-H1437 (ATCC, catalog #CRL-5872) were used. HEK293T and NCI-H1437 cells were maintained in RPMI 1640 medium (HyClone, #SH30027.01) supplemented with 10% fetal bovine serum (FBS) (Thermo Fisher Scientific, #10099141) and penicillin/streptomycin (Thermo Fisher Scientific, #15140122). A549 cells were maintained in Dulbecco’s modified Eagle’s medium (Gibco, #C11995500BT) supplemented with 10% FBS and penicillin/streptomycin. All of these cells were cultured at 37°C in a 5% CO2 humidified atmosphere. None of the cell lines in this study appeared in the misidentified cell line list maintained by the International Cell Line Authentication Committee. All of the cell lines were routinely tested for microbial contamination (including Mycoplasma contamination).

Plasmid constructs and lentivirus production

The KLF4 and JDP2 lentiviral constructs were generated by cloning the corresponding cDNAs into the pCDH-CMV-MCS-EF1-Puro lentiviral expression vector (System Biosciences, catalog #CD510B-1). A 3x Flag sequence was added immediately upstream of the N terminus of KLF4 or JDP2 to obtain Flag-tagged KLF4 or Flag-tagged JDP2. The Flag-tagged KLF4 and Flag-tagged JDP2 lentiviral constructs were generated by cloning the corresponding tagged sequence into the pCDH-CMV-MCS-EF1-Puro lentiviral expression vector. Lentivirus particles were generated by cotransfecting these lentiviral constructs and helper virus packaging plasmids pCMVΔR8.9 and pHCMV-VSV-G into HEK293T cells using Lipofectamine 3000 (Invitrogen, #L3000015) or polyethylenimine (Sigma-Aldrich, #408727). Lentiviruses were harvested after 24, 36, and 48 hours and frozen at −80°C in aliquots at appropriate amounts for infection.

RT-PCR and quantitative PCR

Tissues were homogenized in TRIzol reagent (Invitrogen, #15596026), followed by total RNA isolation using the standard protocol. The RNA was then reverse-transcribed into cDNA using the HiScript III First Strand cDNA Synthesis Kit (+ genomic DNA wiper; ABclonal, #RK20400). Quantitative PCR was performed for target gene expression analysis using the ChamQ Universal SYBR qPCR Master Mix (ABclonal, #RK21203). The samples were run in triplicate with no-reverse-transcription (no-RT) or nontemplate controls. The amplification accuracy was verified by melting curve analysis.

Cell viability assay

Viability studies were performed using the CellTiter-Glo luminescent assay (Promega, #G7573). A549 (1000 per well) and NCI-H1437 (2000 per well) cells transduced with specific genes or ctrl were seeded in each well of a 96-well flat-bottomed plate. To evaluate cell proliferative potential, viability assays were conducted at the indicated time points. Luminescence was analyzed using a BioTek Gen5 Microplate Reader (BioTek, #H1210-018).

Colony formation assay

Cell proliferation was assayed by colony formation. A549 (600 per well) and NCI-H1437 (800 per well) cells transduced with specific genes or ctrl were seeded in each well of six-well plates and cultured for 2 to 3 weeks. Then, the cells were fixed with 4% paraformaldehyde for 30 min and stained with crystal violet solution (Sangon Biotechnology, #E607309-0100) for 3 hours. All assays were performed in triplicate wells.

Soft agar assay

Six-well plates were first layered with 0.6% bottom agar (BD DIFCO, #214220) containing RPMI 1640 medium with 10% FBS and penicillin/streptomycin. A549 (10,000 per well) and NCI-H1437 (10,000 per well) cells transduced with specific genes or ctrl were seeded in 0.35% top agar containing 10% FBS and penicillin/streptomycin. Cells were allowed to grow for 3 to 5 weeks and then stained with 1 ml of methyl thiazol tetrazolium (1 mg/ml; Sigma-Aldrich, #M5655) for 4 hours. The percentage of colony area was measured using ImageJ software (National Institutes of Health). All assays were performed in triplicate wells.

Cell migration assay

The cells were serum-starved for 24 hours before the assay. A549 (50000 per well) cells transduced with specific genes or ctrl in 100 μl of medium containing 1% FBS were seeded into the upper chamber of each insert. The lower chamber was filled with 700 μl of medium containing 10% FBS to serve as a chemoattractant. The cells were allowed to migrate for 24 to 36 hours at 37°C in a humidified atmosphere with 5% CO2. After incubation, nonmigratory cells on the upper surface of the membrane were carefully removed with a cotton swab. The cells that had migrated to the lower surface of the membrane were fixed with 4% paraformaldehyde for 30 min, stained with crystal violet solution for 3 hours, and then washed with phosphate-buffered saline (PBS). The stained cells were imaged using a light microscope.

Multiplexed immunofluorescence staining

To prepare the tumor samples for H&E and immunofluorescence staining, the tissues were fixed with 10% formalin followed by paraffin embedding. For multiplexed immunofluorescence staining, the slides were heated for 30 min at 60°C, deparaffinized in xylene, and rehydrated with an alcohol series. Antigen retrieval was performed in tris-EDTA (pH 9.0) at 98°C for 20 min, followed by cooling. After three washes with PBS, the slides were blocked with 3% bovine serum albumin (BSA) for 1 hour at 37°C and incubated with primary antibodies (mouse anti-EpCAM, rabbit anti-CLDN4, or mouse anti-KLF4) overnight at 4°C. The slides were rinsed with PBS and then incubated with horseradish peroxidase (HRP)–conjugated secondary antibodies for 1 hour at room temperature. The slides were rinsed with PBS and then incubated with fluorescein-conjugated tyramide signal amplification reagent (iF488-tyramide, iF555-tyramide, or iF647-tyramide) at room temperature for 10 min in the dark. For triple-label staining, the sections were subjected to antigen retrieval again with tris-EDTA (pH 9.0) antigen retrieval solution for 20 min, followed by blocking with 3% BSA. Incubation with primary and secondary antibodies was performed as described above. Last, nuclei were counterstained with 4′,6-diamidino-2-phenylindole (DAPI) for 5 min, followed by washing with PBS. The sections were mounted with antifade mounting medium.

Xenograft tumor model

A549 cells (2 × 10⁶) in PBS were injected subcutaneously into 6-week-old female BALB/c nude mice, and the tumor xenografts were allowed to grow for 4 to 5 weeks. The resulting tumors were measured every 3 days. The tumor volume was calculated using the following formula: tumor volume = length × width × width/2. Once the largest tumor diameter reached the maximal tumor diameter allowed under our institutional protocol, all of the mice were euthanized, and the tumors were collected. The maximal tumor diameter allowed by the Institutional Animal Care and Use Committee (IACUC) was 2.0 cm. All animal experiments were performed in accordance with the ethical guidelines and were approved by the IACUC of the Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences (approval number: SINH-2023-WYX-1 and SINH-2024-WYX-1).
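As a worked illustration of the volume formula above, the following sketch computes a tumor volume from caliper measurements. The function name and the example measurements are hypothetical; the unit of the result is the cube of whatever length unit is supplied.

```python
def tumor_volume(length, width):
    """Tumor volume = length x width x width / 2, as stated in the text."""
    return length * width * width / 2


# Hypothetical caliper measurements in millimeters.
example_volume_mm3 = tumor_volume(length=10.0, width=8.0)
```

For a 10 mm by 8 mm tumor, this gives 10 × 8 × 8 / 2 = 320 mm³.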

Statistics and reproducibility

Statistical analyses were conducted in R using two-sided Student’s t tests, Pearson’s correlation tests, and Wilcoxon tests. The results are expressed as the means ± SD. Boxplots display the median (center), 25th and 75th percentiles (box bounds), and minimum and maximum (whiskers). Kaplan-Meier survival curves were compared using the log-rank test. A P value of <0.05 was considered statistically significant.
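The summary quantities named above (mean ± SD and the five boxplot values) can be sketched with Python's standard library. This is not the authors' R code. The helper names are hypothetical, the SD is assumed to be the sample SD (n − 1 denominator), and the "inclusive" quantile method is assumed here because it interpolates linearly between data points, as R's default quantile rule does.

```python
# Illustrative sketch only: helper names and quantile convention are assumptions.
import statistics


def mean_sd(values):
    """Return (mean, sample standard deviation) for a sequence of numbers."""
    return statistics.mean(values), statistics.stdev(values)


def boxplot_summary(values):
    """Return the five values drawn in the boxplots described in the text."""
    # n=4 yields the three quartile cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),     # lower whisker
        "q1": q1,               # lower box bound (25th percentile)
        "median": median,       # center line
        "q3": q3,               # upper box bound (75th percentile)
        "max": max(values),     # upper whisker
    }


if __name__ == "__main__":
    example = [1, 2, 3, 4, 5]  # hypothetical measurements
    print(mean_sd(example))
    print(boxplot_summary(example))
```

Note that whiskers drawn at the minimum and maximum, as stated in the text, differ from the common 1.5 × IQR convention, so no outlier rule is applied in this sketch.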

Acknowledgments

We thank J. Fletcher at Brigham and Women’s Hospital/Harvard Medical School for critical and constructive comments. We also thank Z. Weng, K. Wang, and L. Qiu from the Institutional Center for Shared Technologies and Facilities of SINH, CAS, for technical assistance.

Funding:

This work was supported by grants from the National Key Research and Development Program of China (2023YFE0117900) (to Y.W.), the National Natural Science Foundation of China (82072974 and 82120108020) (to Y.W.), the Oriental Talent Program (Shanghai Academic/Technology Research Leader Program, BJKJ2024038) (to Y.W.), the Postdoctoral Fellowship Program of CPSF (GZB20240790) (to S.W.), and the Shanghai Post-doctoral Excellence Program (2024714) (to S.W.).

Author contributions:

Conceptualization: Y.X., H.T., and Y.W. Methodology: Y.X., X.W., Y.C., C.Z., and Y.W. Software: Y.X., Y. Luo, X. Liu, and X. Lu. Validation: Y.X., X.W., Y.C., C.Z., Y. Luo, X. Liu, X. Lu, H.L., T.W., Y.D., S.W., and Y.W. Formal analysis: Y.X., X.W., H.L., Y.D., H.T., and Y.W. Investigation: Y.X., K.Y., N.X., H.T., and Y.W. Resources: K.Y., N.X., W.L., Z.Z., X.Y., X. Lin, Y. Li, H.T., and Y.W. Data curation: Y.X., N.X., Y.C., X. Liu, X. Lu, X.J., H.T., and Y.W. Writing – original draft: Y.X., X.W., H.T., and Y.W. Writing – review and editing: Y.X., X.W., K.Y., N.X., W.L., Z.Z., Y.C., C.Z., Y. Luo, X. Liu, X. Lu, X.Y., H.L., T.W., Y.D., X. Lin, Y. Li, X.J., S.W., H.T., and Y.W. Visualization: Y.X., X.W., C.Z., Y. Luo, S.W., H.T., and Y.W. Supervision: H.T. and Y.W. Project administration: X.J., H.T., and Y.W. Funding acquisition: S.W., H.T., and Y.W.

Competing interests:

The authors declare that they have no competing interests.

Data, code, and materials availability:

This study did not generate new biological materials. All reagents and materials used in this study are commercially available or can be obtained from the indicated suppliers or public repositories. The single-cell RNA and spatial transcriptomics sequencing data generated in this study have been deposited in the National Genomics Data Center (NGDC) Genome Sequence Archive for Human under accession numbers HRA008879 (https://ngdc.cncb.ac.cn/gsa-human/browse/HRA008879) and HRA012291 (https://ngdc.cncb.ac.cn/gsa-human/browse/HRA012291). The bulk RNA-seq and CUT&Tag-seq data reported in this study have been deposited in the National Omics Data Encyclopedia under accession numbers OEP00006143 (www.biosino.org/node/project/detail/OEP00006143) and OEP00006493 (www.biosino.org/node/project/detail/OEP00006493). Code for data processing and figure generation is available at Zenodo with DOI: 10.5281/zenodo.16922603 (https://doi.org/10.5281/zenodo.16922603). The publicly available bulk WES data for early-stage LUAD were obtained from the study by Li et al. (62). The publicly available scRNA-seq data of early-stage, advanced-stage, and metastatic LUAD used in this study were downloaded from the GEO (www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE189357 (34), GSE148071 (31), and GSE131907 (29). The publicly available scRNA-seq data of normal lung tissues were downloaded from https://cellxgene.cziscience.com/collections. The publicly available spatial transcriptomics data of early-stage LUAD were downloaded from the GEO under accession number GSE189487 (34). The publicly available bulk RNA-seq data of early-stage LUAD were downloaded from the GEO under accession number GSE72094 (63).

Supplementary Materials

The PDF file includes:

Figs. S1 to S14

Legends for tables S1 to S11

Other Supplementary Material for this manuscript includes the following:

Tables S1 to S11

REFERENCES

  • 1.Siegel R. L., Giaquinto A. N., Jemal A., Cancer statistics, 2024. CA Cancer J. Clin. 74, 12–49 (2024). [DOI] [PubMed] [Google Scholar]
  • 2.Riihimaki M., Hemminki A., Fallah M., Thomsen H., Sundquist K., Sundquist J., Hemminki K., Metastatic sites and survival in lung cancer. Lung Cancer 86, 78–84 (2014). [DOI] [PubMed] [Google Scholar]
  • 3.Marusyk A., Janiszewska M., Polyak K., Intratumor heterogeneity: The Rosetta Stone of therapy resistance. Cancer Cell 37, 471–484 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Vitale I., Shema E., Loi S., Galluzzi L., Intratumoral heterogeneity in cancer progression and response to immunotherapy. Nat. Med. 27, 212–224 (2021). [DOI] [PubMed] [Google Scholar]
  • 5.Dagogo-Jack I., Shaw A. T., Tumour heterogeneity and resistance to cancer therapies. Nat. Rev. Clin. Oncol. 15, 81–94 (2018). [DOI] [PubMed] [Google Scholar]
  • 6.Lawson D. A., Kessenbrock K., Davis R. T., Pervolarakis N., Werb Z., Tumour heterogeneity and metastasis at single-cell resolution. Nat. Cell Biol. 20, 1349–1360 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Zhang J., Fujimoto J., Zhang J., Wedge D. C., Song X., Zhang J., Seth S., Chow C. W., Cao Y., Gumbs C., Gold K. A., Kalhor N., Little L., Mahadeshwar H., Moran C., Protopopov A., Sun H., Tang J., Wu X., Ye Y., William W. N., Lee J. J., Heymach J. V., Hong W. K., Swisher S., Wistuba I. I., Futreal P. A., Intratumor heterogeneity in localized lung adenocarcinomas delineated by multiregion sequencing. Science 346, 256–259 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Zilionis R., Engblom C., Pfirschke C., Savova V., Zemmour D., Saatcioglu H. D., Krishnan I., Maroni G., Meyerovitz C. V., Kerwin C. M., Choi S., Richards W. G., De Rienzo A., Tenen D. G., Bueno R., Levantini E., Pittet M. J., Klein A. M., Single-cell transcriptomics of human and mouse lung cancers reveals conserved myeloid populations across individuals and species. Immunity 50, 1317–1334.e10 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Maynard A., McCoach C. E., Rotow J. K., Harris L., Haderk F., Kerr D. L., Yu E. A., Schenk E. L., Tan W. L., Zee A., Tan M., Gui P., Lea T., Wu W., Urisman A., Jones K., Sit R., Kolli P. K., Seeley E., Gesthalter Y., Le D. D., Yamauchi K. A., Naeger D. M., Bandyopadhyay S., Shah K., Cech L., Thomas N. J., Gupta A., Gonzalez M., Do H., Tan L. S., Bacaltos B., Gomez-Sjoberg R., Gubens M., Jahan T., Kratz J. R., Jablons D., Neff N., Doebele R. C., Weissman J., Blakely C. M., Darmanis S., Bivona T. G., Therapy-induced evolution of human lung cancer revealed by single-cell RNA sequencing. Cell 182, 1232–1251.e22 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Gavish A., Tyler M., Greenwald A. C., Hoefflin R., Simkin D., Tschernichovsky R., Darnell N. G., Somech E., Barbolin C., Antman T., Kovarsky D., Barrett T., Castro L. N. G., Halder D., Chanoch-Myers R., Laffy J., Mints M., Wider A., Tal R., Spitzer A., Hara T., Raitses-Gurevich M., Stossel C., Golan T., Tirosh A., Suvà M. L., Puram S. V., Tirosh I., Hallmarks of transcriptional intratumour heterogeneity across a thousand tumours. Nature 618, 598–606 (2023). [DOI] [PubMed] [Google Scholar]
  • 11.Li Z., Seehawer M., Polyak K., Untangling the web of intratumour heterogeneity. Nat. Cell Biol. 24, 1192–1201 (2022). [DOI] [PubMed] [Google Scholar]
  • 12.Marine J. C., Dawson S. J., Dawson M. A., Non-genetic mechanisms of therapeutic resistance in cancer. Nat. Rev. Cancer 20, 743–756 (2020). [DOI] [PubMed] [Google Scholar]
  • 13.Vendramin R., Litchfield K., Swanton C., Cancer evolution: Darwin and beyond. EMBO J. 40, e108389 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Dentro S. C., Leshchiner I., Haase K., Tarabichi M., Wintersinger J., Deshwar A. G., Yu K., Rubanova Y., Macintyre G., Demeulemeester J., Vázquez-García I., Kleinheinz K., Livitz D. G., Malikic S., Donmez N., Sengupta S., Anur P., Jolly C., Cmero M., Rosebrock D., Schumacher S. E., Fan Y., Fittall M., Drews R. M., Yao X. T., Watkins T. B. K., Lee J., Schlesner M., Zhu H. T., Adams D. J., McGranahan N., Swanton C., Getz G., Boutros P. C., Imielinski M., Beroukhim R., Sahinalp S. C., Ji Y., Peifer M., Martincorena I., Markowetz F., Mustonen V., Yuan K., Gerstung M., Spellman P. T., Wang W. Y., Morris Q. D., Wedge D. C., Van Loo P., PCAWG Evolution and Heterogeneity Working Group and the PCAWG Consortium, Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell 184, 2239–2254.e39 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Martínez-Ruiz C., Black J. R. M., Puttick C., Hill M. S., Demeulemeester J., Cadieux E. L., Thol K., Jones T. P., Veeriah S., Naceur-Lombardelli C., Toncheva A., Prymas P., Rowan A., Ward S., Cubitt L., Athanasopoulou F., Pich O., Karasaki T., Moore D. A., Salgado R., Colliver E., Castignani C., Dietzen M., Huebner A., Al Bakir M., Tanic M., Watkins T. B. K., Lim E. L., Al-Rashed A. M., Lang D. Y., Clements J., Cook D. E., Rosenthal R., Wilson G. A., Frankell A. M., Trécesson S. D., East P., Kanu N., Litchfield K., Birkbak N. J., Hackshaw A., Beck S., Van Loo P., Jamal-Hanjani M., TRACERx Consortium, Swanton C., McGranahan N., Genomic-transcriptomic evolution in lung cancer and metastasis. Nature 616, 543–552 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Negrini S., Gorgoulis V. G., Halazonetis T. D., Genomic instability--An evolving hallmark of cancer. Nat. Rev. Mol. Cell Biol. 11, 220–228 (2010). [DOI] [PubMed] [Google Scholar]
  • 17.Holzel M., Bovier A., Tuting T., Plasticity of tumour and immune cells: A source of heterogeneity and a cause for therapy resistance? Nat. Rev. Cancer 13, 365–376 (2013). [DOI] [PubMed] [Google Scholar]
  • 18.de Visser K. E., Joyce J. A., The evolving tumor microenvironment: From cancer initiation to metastatic outgrowth. Cancer Cell 41, 374–403 (2023). [DOI] [PubMed] [Google Scholar]
  • 19.Friedmann-Morvinski D., Verma I. M., Dedifferentiation and reprogramming: origins of cancer stem cells. EMBO Rep. 15, 244–253 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Kreso A., Dick J. E., Evolution of the cancer stem cell model. Cell Stem Cell 14, 275–291 (2014). [DOI] [PubMed] [Google Scholar]
  • 21.Marjanovic N. D., Hofree M., Chan J. E., Canner D., Wu K., Trakala M., Hartmann G. G., Smith O. C., Kim J. Y., Evans K. V., Hudson A., Ashenberg O., Porter C. B. M., Bejnood A., Subramanian A., Pitter K., Yan Y., Delorey T., Phillips D. R., Shah N., Chaudhary O., Tsankov A., Hollmann T., Rekhtman N., Massion P. P., Poirier J. T., Mazutis L., Li R., Lee J. H., Amon A., Rudin C. M., Jacks T., Regev A., Tammela T., Emergence of a high-plasticity cell state during lung cancer evolution. Cancer Cell 38, 229–246.e13 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Moorman A., Benitez E. K., Cambulli F., Jiang Q., Mahmoud A., Lumish M., Hartner S., Balkaran S., Bermeo J., Asawa S., Firat C., Saxena A., Wu F., Luthra A., Burdziak C., Xie Y., Sgambati V., Luckett K., Li Y., Yi Z., Masilionis I., Soares K., Pappou E., Yaeger R., Kingham T. P., Jarnagin W., Paty P. B., Weiser M. R., Mazutis L., D’Angelica M., Shia J., Garcia-Aguilar J., Nawy T., Hollmann T. J., Chaligne R., Sanchez-Vega F., Sharma R., Pe’er D., Ganesh K., Progressive plasticity during colorectal cancer metastasis. Nature 637, 947–954 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Patel A. S., Yanai I., A developmental constraint model of cancer cell states and tumor heterogeneity. Cell 187, 2907–2918 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Kaiser A. M., Gatto A., Hanson K. J., Zhao R. L., Raj N., Ozawa M. G., Seoane J. A., Bieging-Rolett K. T., Wang M., Li I., Trope W. L., Liou D. Z., Shrager J. B., Plevritis S. K., Newman A. M., Van Rechem C., Attardi L. D., p53 governs an AT1 differentiation programme in lung cancer suppression. Nature 619, 851–859 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Tirosh I., Suva M. L., Cancer cell states: Lessons from ten years of single-cell RNA-sequencing of human tumors. Cancer Cell 42, 1497–1506 (2024). [DOI] [PubMed] [Google Scholar]
  • 26.Luo H., Xia X., Huang L.-B., An H., Cao M., Kim G. D., Chen H.-N., Zhang W.-H., Shu Y., Kong X., Ren Z., Li P.-H., Liu Y., Tang H., Sun R., Li C., Bai B., Jia W., Liu Y., Zhang W., Yang L., Peng Y., Dai L., Hu H., Jiang Y., Hu Y., Zhu J., Jiang H., Li Z., Caulin C., Park J., Xu H., Pan-cancer single-cell analysis reveals the heterogeneity and plasticity of cancer-associated fibroblasts in the tumor microenvironment. Nat. Commun. 13, 6619 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Stewart C. A., Gay C. M., Xi Y., Sivajothi S., Sivakamasundari V., Fujimoto J., Bolisetty M., Hartsfield P. M., Balasubramaniyan V., Chalishazar M. D., Moran C., Kalhor N., Stewart J., Tran H., Swisher S. G., Roth J. A., Zhang J. J., de Groot J., Glisson B., Oliver T. G., Heymach J. V., Wistuba I., Robson P., Wang J., Byers L. A., Single-cell analyses reveal increased intratumoral heterogeneity after the onset of therapy resistance in small-cell lung cancer. Nat. Cancer 1, 423–436 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Chan J. M., Quintanal-Villalonga A., Gao V. R., Xie Y., Allaj V., Chaudhary O., Masilionis I., Egger J., Chow A., Walle T., Mattar M., Yarlagadda D. V. K., Wang J. L., Uddin F., Offin M., Ciampricotti M., Qeriqi B., Bahr A., de Stanchina E., Bhanot U. K., Lai W. V., Bott M. J., Jones D. R., Ruiz A., Baine M. K., Li Y., Rekhtman N., Poirier J. T., Nawy T., Sen T., Mazutis L., Hollmann T. J., Pe’er D., Rudin C. M., Signatures of plasticity, metastasis, and immunosuppression in an atlas of human small cell lung cancer. Cancer Cell 39, 1479–1496.e18 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Kim N., Kim H. K., Lee K., Hong Y., Cho J. H., Choi J. W., Lee J.-I., Suh Y.-L., Ku B. M., Eum H. H., Choi S., Choi Y.-L., Joung J. G., Park W.-Y., Jung H. A., Sun J.-M., Lee S.-H., Ahn J. S., Park K., Ahn M.-J., Lee H.-O., Single-cell RNA sequencing demonstrates the molecular and cellular reprogramming of metastatic lung adenocarcinoma. Nat. Commun. 11, 2285 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Laughney A. M., Hu J., Campbell N. R., Bakhoum S. F., Setty M., Lavallée V.-P., Xie Y., Masilionis I., Carr A. J., Kottapalli S., Allaj V., Mattar M., Rekhtman N., Xavier J. B., Mazutis L., Poirier J. T., Rudin C. M., Pe’er D., Massagué J., Regenerative lineages and immune-mediated pruning in lung cancer metastasis. Nat. Med. 26, 259–269 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Wu F., Fan J., He Y., Xiong A., Yu J., Li Y., Zhang Y., Zhao W., Zhou F., Li W., Zhang J., Zhang X., Qiao M., Gao G., Chen S., Chen X., Li X., Hou L., Wu C., Su C., Ren S., Odenthal M., Buettner R., Fang N., Zhou C., Single-cell profiling of tumor heterogeneity and the microenvironment in advanced non-small cell lung cancer. Nat. Commun. 12, 2540 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Patil N. S., Nabet B. Y., Muller S., Koeppen H., Zou W., Giltnane J., Au-Yeung A., Srivats S., Cheng J. H., Takahashi C., de Almeida P. E., Chitre A. S., Grogan J. L., Rangell L., Jayakar S., Peterson M., Hsia A. W., O’Gorman W. E., Ballinger M., Banchereau R., Shames D. S., Intratumoral plasma cells predict outcomes to PD-L1 blockade in non-small cell lung cancer. Cancer Cell 40, 289–300.e4 (2022). [DOI] [PubMed] [Google Scholar]
  • 33.Desai T. J., Brownfield D. G., Krasnow M. A., Alveolar progenitor and stem cells in lung development, renewal and cancer. Nature 507, 190–194 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Zhu J. F., Fan Y., Xiong Y. L., Wang W. C., Chen J. K., Xia Y. M., Lei J., Gong L., Sun S. Q., Jiang T., Delineating the dynamic evolution from preneoplasia to invasive lung adenocarcinoma by integrating single-cell RNA sequencing and spatial transcriptomics. Exp. Mol. Med. 54, 2060–2076 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Tirosh I., Venteicher A. S., Hebert C., Escalante L. E., Patel A. P., Yizhak K., Fisher J. M., Rodman C., Mount C., Filbin M. G., Neftel C., Desai N., Nyman J., Izar B., Luo C. C., Francis J. M., Patel A. A., Onozato M. L., Riggi N., Livak K. J., Gennert D., Satija R., Nahed B. V., Curry W. T., Martuza R. L., Mylvaganam R., Iafrate A. J., Frosch M. P., Golub T. R., Rivera M. N., Getz G., Rozenblatt-Rosen O., Cahill D. P., Monje M., Bernstein B. E., Louis D. N., Regev A., Suvà M. L., Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma. Nature 539, 309–313 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Han X., Zhou Z., Fei L., Sun H., Wang R., Chen Y., Chen H., Wang J., Tang H., Ge W., Zhou Y., Ye F., Jiang M., Wu J., Xiao Y., Jia X., Zhang T., Ma X., Zhang Q., Bai X., Lai S., Yu C., Zhu L., Lin R., Gao Y., Wang M., Wu Y., Zhang J., Zhan R., Zhu S., Hu H., Wang C., Chen M., Huang H., Liang T., Chen J., Wang W., Zhang D., Guo G., Construction of a human cell landscape at single-cell level. Nature 581, 303–309 (2020). [DOI] [PubMed] [Google Scholar]
  • 37.Chen T. X., Luo J. Z., Gu H. Y., Gu Y., Huan J., Luo Q. Q., Yang Y. H., Should minimally invasive lung adenocarcinoma be transferred from stage IA1 to stage 0 in future updates of the TNM staging system? J. Thorac. Dis. 10, 6247–6253 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Sikkema L., Ramirez-Suastegui C., Strobl D. C., Gillett T. E., Zappia L., Madissoon E., Markov N. S., Zaragosi L. E., Ji Y., Ansari M., Arguel M. J., Apperloo L., Banchero M., Becavin C., Berg M., Chichelnitskiy E., Chung M. I., Collin A., Gay A. C. A., Gote-Schniering J., Kashani B. H., Inecik K., Jain M., Kapellos T. S., Kole T. M., Leroy S., Mayr C. H., Oliver A. J., von Papen M., Peter L., Taylor C. J., Walzthoeni T., Xu C., Bui L. T., De Donno C., Dony L., Faiz A., Guo M., Gutierrez A. J., Heumos L., Huang N., Ibarra I. L., Jackson N. D., Murthy P. K. L., Lotfollahi M., Tabib T., Talavera-Lopez C., Travaglini K. J., Wilbrey-Clark A., Worlock K. B., Yoshida M., Lung Biological Network Consortium, van den Berge M., Bosse Y., Desai T. J., Eickelberg O., Kaminski N., Krasnow M. A., Lafyatis R., Nikolic M. Z., Powell J. E., Rajagopal J., Rojas M., Rozenblatt-Rosen O., Seibold M. A., Sheppard D., Shepherd D. P., Sin D. D., Timens W., Tsankov A. M., Whitsett J., Xu Y., Banovich N. E., Barbry P., Duong T. E., Falk C. S., Meyer K. B., Kropski J. A., Pe’er D., Schiller H. B., Tata P. R., Schultze J. L., Teichmann S. A., Misharin A. V., Nawijn M. C., Luecken M. D., Theis F. J., An integrated cell atlas of the lung in health and disease. Nat. Med. 29, 1563–1577 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Tata P. R., Mou H., Pardo-Saganta A., Zhao R., Prabhu M., Law B. M., Vinarsky V., Cho J. L., Breton S., Sahay A., Medoff B. D., Rajagopal J., Dedifferentiation of committed epithelial cells into stem cells in vivo. Nature 503, 218–223 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Schabath M. B., Welsh E. A., Fulp W. J., Chen L., Teer J. K., Thompson Z. J., Engel B. E., Xie M., Berglund A. E., Creelan B. C., Antonia S. J., Gray J. E., Eschrich S. A., Chen D. T., Cress W. D., Haura E. B., Beg A. A., Differential association of STK11 and TP53 with KRAS mutation-associated gene expression, proliferation and immune surveillance in lung adenocarcinoma. Oncogene 35, 3209–3216 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Gaujoux R., Seoighe C., A flexible R package for nonnegative matrix factorization. BMC Bioinformatics 11, 367 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Aibar S., González-Blas C. B., Moerman T., Huynh-Thu V. A., Imrichova H., Hulselmans G., Rambow F., Marine J. C., Geurts P., Aerts J., van den Oord J., Atak Z. K., Wouters J., Aerts S., SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Cable D. M., Murray E., Zou L. S., Goeva A., Macosko E. Z., Chen F., Irizarry R. A., Robust decomposition of cell type mixtures in spatial transcriptomics. Nat. Biotechnol. 40, 517–526 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Choi J., Park J.-E., Tsagkogeorga G., Yanagita M., Koo B.-K., Han N., Lee J.-H., Inflammatory signals induce AT2 cell-derived damage-associated transient progenitors that mediate alveolar regeneration. Cell Stem Cell 27, 366–382.e7 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Bergen V., Lange M., Peidli S., Wolf F. A., Theis F. J., Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020). [DOI] [PubMed] [Google Scholar]
  • 46.Gulati G. S., Sikandar S. S., Wesche D. J., Manjunath A., Bharadwaj A., Berger M. J., Ilagan F., Kuo A. H., Hsieh R. W., Cai S., Zabala M., Scheeren F. A., Lobo N. A., Qian D. L., Yu F. B., Dirbas F. M., Clarke M. F., Newman A. M., Single-cell transcriptional diversity is a hallmark of developmental potential. Science 367, 405–411 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Jin S., Guerrero-Juarez C. F., Zhang L., Chang I., Ramos R., Kuan C.-H., Myung P., Plikus M. V., Nie Q., Inference and analysis of cell-cell communication using CellChat. Nat. Commun. 12, 1088 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Li J., Wu C., Hu H., Qin G., Wu X., Bai F., Zhang J., Cai Y., Huang Y., Wang C., Yang J., Luan Y., Jiang Z., Ling J., Wu Z., Chen Y., Xie Z., Deng Y., Remodeling of the immune and stromal cell compartment by PD-1 blockade in mismatch repair-deficient colorectal cancer. Cancer Cell 41, 1152–1169.e7 (2023). [DOI] [PubMed] [Google Scholar]
  • 49.Goto N., Westcott P. M. K., Goto S., Imada S., Taylor M. S., Eng G., Braverman J., Deshpande V., Jacks T., Agudo J., Yilmaz Ö. H., SOX17 enables immune evasion of early colorectal adenomas and cancers. Nature 627, 636–645 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Strunz M., Simon L. M., Ansari M., Kathiriya J. J., Angelidis I., Mayr C. H., Tsidiridis G., Lange M., Mattner L. F., Yee M., Ogar P., Sengupta A., Kukhtevich I., Schneider R., Zhao Z., Voss C., Stoeger T., Neumann J. H. L., Hilgendorff A., Behr J., O’Reilly M., Lehmann M., Burgstaller G., Königshoff M., Chapman H. A., Theis F. J., Schiller H. B., Alveolar regeneration through a Krt8+ transitional stem cell state that persists in human lung fibrosis. Nat. Commun. 11, (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Lu W., Ji R., Identification of significant alteration genes, pathways and TFs induced by LPS in ARDS via bioinformatical analysis. BMC Infect. Dis. 21, 852 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Zhao M., Kim P., Mitra R., Zhao J., Zhao Z., TSGene 2.0: an updated literature-based knowledgebase for tumor suppressor genes. Nucleic Acids Res. 44, D1023–D1031 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Balsalobre A., Drouin J., Pioneer factors as master regulators of the epigenome and cell fate. Nat. Rev. Mol. Cell Biol. 23, 449–464 (2022). [DOI] [PubMed] [Google Scholar]
  • 54.Lambert S. A., Jolma A., Campitelli L. F., Das P. K., Yin Y., Albu M., Chen X. T., Taipale J., Hughes T. R., Weirauch M. T., The human transcription factors. Cell 175, 598–599 (2018). [DOI] [PubMed] [Google Scholar]
  • 55.Suvà M. L., Tirosh I., Single-cell RNA sequencing in cancer: Lessons learned and emerging challenges. Mol. Cell 75, 7–12 (2019). [DOI] [PubMed] [Google Scholar]
  • 56.Butler A., Hoffman P., Smibert P., Papalexi E., Satija R., Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Chen A., Liao S., Cheng M., Ma K., Wu L., Lai Y., Qiu X., Yang J., Xu J., Hao S., Wang X., Lu H., Chen X., Liu X., Huang X., Li Z., Hong Y., Jiang Y., Peng J., Liu S., Shen M., Liu C., Li Q., Yuan Y., Wei X., Zheng H., Feng W., Wang Z., Liu Y., Wang Z., Yang Y., Xiang H., Han L., Qin B., Guo P., Lai G., Munoz-Canoves P., Maxwell P. H., Thiery J. P., Wu Q. F., Zhao F., Chen B., Li M., Dai X., Wang S., Kuang H., Hui J., Wang L., Fei J. F., Wang O., Wei X., Lu H., Wang B., Liu S., Gu Y., Ni M., Zhang W., Mu F., Yin Y., Yang H., Lisby M., Cornall R. J., Mulder J., Uhlen M., Esteban M. A., Li Y., Liu L., Xu X., Wang J., Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792.e21 (2022). [DOI] [PubMed] [Google Scholar]
  • 58.Wei R., He S., Bai S., Sei E., Hu M., Thompson A., Chen K., Krishnamurthy S., Navin N. E., Spatial charting of single-cell transcriptomes in tissues. Nat. Biotechnol. 40, 1190–1199 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Ma L., Hernandez M. O., Zhao Y., Mehta M., Tran B., Kelly M., Rae Z., Hernandez J. M., Davis J. L., Martin S. P., Kleiner D. E., Hewitt S. M., Ylaya K., Wood B. J., Greten T. F., Wang X. W., Tumor cell biodiversity drives microenvironmental reprogramming in liver cancer. Cancer Cell 36, 418–430.e6 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Subramanian A., Tamayo P., Mootha V. K., Mukherjee S., Ebert B. L., Gillette M. A., Paulovich A., Pomeroy S. L., Golub T. R., Lander E. S., Mesirov J. P., Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 102, 15545–15550 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Qiu X., Mao Q., Tang Y., Wang L., Chawla R., Pliner H. A., Trapnell C., Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Li Y., Li X., Li H., Zhao Y., Liu Z., Sun K., Zhu X., Qi Q., An B., Shen D., Li R., Liu T., Mi J., Wang L., Yang F., Bai F., Wang J., Genomic characterisation of pulmonary subsolid nodules: Mutational landscape and radiological features. Eur. Respir. J. 55, 1901409 (2020). [DOI] [PubMed] [Google Scholar]
  • 63.Schabath M. B., Welsh E. A., Fulp W. J., Chen L., Teer J. K., Thompson Z. J., Engel B. E., Xie M., Berglund A. E., Creelan B. C., Antonia S. J., Gray J. E., Eschrich S. A., Chen D. T., Cress W. D., Haura E. B., Beg A. A., Differential association of STK11 and TP53 with KRAS mutation-associated gene expression, proliferation and immune surveillance in lung adenocarcinoma. Oncogene 35, 3209–3216 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Science Advances are provided here courtesy of American Association for the Advancement of Science
