Background
We developed the first multi-omics prognostic signature integrating 19 programmed cell death (PCD) pathways and organelle functions (mitochondria, lysosomes, Golgi apparatus) to predict prognosis and immunotherapy response in non-small cell lung cancer (NSCLC). (2) Methods: By combining single-cell RNA-seq, bulk transcriptomics, and deep neural networks (DNN), we identified a 23-gene signature validated across four cohorts (AUC 0.696–0.812). Conducted MR analysis to explore causal links between signature genes and NSCLC incidence, providing biological insights. (3) Results: A prognostic signature was developed, including 23 prognostic genes related to 19 PCD patterns and three organelle functions. The signature demonstrated powerful performance in predicting NSCLC prognosis, immune in-filtration, and therapeutic response. Established DNN models showed high value in predicting risk score groupings of NSCLC. MR analysis for combined SNP information of the 23 prognostic genes suggested a link to the high incidence of NSCLC. Individual MR analysis showed that HIF1A and SQLE expression had a causal effect on NSCLC incidence. (4) Conclusion: This signature stratifies high-risk patients with immunosuppressive microenvironments and predicts enhanced sensitivity to gemcitabine and PD-1 inhibitors, offering a roadmap for personalized NSCLC management.
Supplementary Information
The online version contains supplementary material available at 10.1007/s12672-025-03243-2.
Keywords: Non-small cell lung cancer, Programmed cell death, Organelle function, Single-cell analysis, Machine learning, Prognosis, Immunotherapy
Introduction
Lung cancer is a leading cause of malignancy-related death worldwide, with non-small cell lung cancer (NSCLC) comprising approximately 85% of all lung cancer diagnoses [1]. Despite advancements in surgical techniques, chemotherapy, radiation therapy, and targeted therapy, the prognosis for NSCLC patients remains grim [2]. The emergence of immunotherapy, particularly immune checkpoint inhibitors (ICIs), has revolutionized treatment approaches [2]. However, a significant proportion of patients eventually experience disease progression due to immunotherapy resistance [3]. Currently, NSCLC diagnosis and prognosis rely heavily on tumor stage and histopathological assessment, which lack sufficient accuracy and fail to account for the diverse phenotypes and prognoses among patients [4]. This underscores the urgent need for more precise and personalized prognostic models in NSCLC management [4].
Programmed cell death (PCD) plays a pivotal role in cancer development, prognosis, and treatment response. Multiple forms of PCD—including apoptosis, autophagy, necroptosis, ferroptosis, cuproptosis, and pyroptosis—are implicated in tumor progression and therapy resistance [5]. Emerging evidence highlights the critical interplay between PCD and organelle functions, particularly those of mitochondria, lysosomes, and the Golgi apparatus [6]. For instance, mitochondrial dysfunction can impair apoptosis mechanisms, leading to cisplatin resistance, while lysosomal stress can trigger ferroptosis [7]. Similarly, Golgi apparatus dysfunction has been linked to cancer progression and drug resistance [8]. These organelles not only regulate cellular homeostasis but also actively participate in cancer-related signaling pathways, making their integration with PCD pathways a promising avenue for understanding cancer biology [9].
However, current prognostic models often fail to integrate the complex interplay between various PCD pathways and organelle dysfunctions in NSCLC [10]. This gap in our understanding limits the accuracy and clinical utility of existing prognostic tools [11]. There is a pressing need for a more comprehensive approach that captures these multiple layers of biological information to improve prognostic accuracy and guide treatment decisions [12].
To address these limitations, we developed a novel prognostic signature for NSCLC by integrating genes related to multiple PCD pathways and key organelle functions [11]. This signature demonstrated robust performance in predicting prognosis, immune infiltration, and therapeutic response across multiple cohorts [13]. Our approach addresses the need for more comprehensive prognostic models in NSCLC by incorporating diverse biological processes implicated in cancer development and progression.
Materials and methods
Data acquisition and preprocessing
The NSCLC scRNA-seq dataset GSE117570 [14] was downloaded from the NCBI Gene Expression Omnibus (GEO) database, including 8 samples. Cells with feature < 200 and mitochondria proportion > 5% were filtered. Finally, 11,481 cells were obtained for subsequent single-cell analysis.
The RNA-seq data (Transcript per Kilobase per Million mapped reads, TPM) of TCGA-LUAD and TCGA-LUSC were obtained from The Cancer Genome Atlas (TCGA) and subject to log2 (TPM + 1) conversion. After excluding blood and recurrent samples and samples lacking clinical information, TCGA-LUAD dataset included 478 tumor and 57 adjacent normal tissue samples, and TCGA-LUSC dataset included 444 tumor and 43 normal samples. LUAD and LUSC data were merged as a NSCLC dataset (including 922 tumor and 100 normal samples) for subsequent analysis.
GEO datasets GSE50081 [15], GSE29013 [16], and GSE37745 [17] were downloaded using the GEOquery package [18] and included as external validation datasets.
Additional data types (miRNA-seq, DNA methylation, somatic mutation, and CNV) were obtained from TCGA and processed as described in the Supplementary Methods.
To evaluate the performance of our 23-gene prognostic model against existing models, we conducted a comparative analysis using the C-index as the primary performance metric across multiple cohorts. The TCGA dataset served as the training set, where the model was initially developed and optimized. Performance was then assessed in three external validation cohorts (GSE50081, GSE29013, and GSE37745) to ensure generalizability. For each model comparison, we calculated the C-index with confidence intervals, allowing us to rank models and gauge consistency in performance across datasets. Univariate Cox regression analysis was applied in each cohort to compare the prognostic value of our model relative to other established models. This approach enabled a rigorous evaluation of model robustness across diverse NSCLC patient cohorts.
Identification of key genes
Organelle-related and PCD-related genes
A total of 1136 mitochondria-related genes (MRGs) were acquired from MitoCarta3.0 database [19], and 1634 Golgi apparatus-related genes (GARGs) were taken from the GOCC_GOLGI_APPARATUS gene set in MSigDB database [20]. The genes in the KEGG_LYSOSOME (M11266) and LYSOSOME (M13845) sets were deduplicated, and 163 lysosome-related genes (LRGs) were obtained. Additionally, 18 PCD patterns and key regulatory genes were extracted from the literature published by Qin et al. [21], and the disulfidptosis-related genes were obtained from another study published by Xu et al. [22]. Finally, 19 PCD patterns and 1567 PCD-related genes (PCDRGs) were obtained.
The rationale for integrating these gene sets was to capture the complex interplay between organelle functions and PCD in cancer biology, providing a comprehensive view of cellular processes relevant to NSCLC progression and treatment response.
Single-cell analysis and aucell scores
Using Seurat, 18 clusters were visualized using UMAP plot. Based on cell type marker genes, nine different cell types were revealed by manual annotation. The AUCell [23] package was used to calculate the activity scores of organelle-related genes across all cells in NSCLC tumor samples.
Differential expression analysis
Single-cell differentially expressed genes (scDEGs) were identified between high- and low-AUC score groups using FindMarkers function [24]. The cutoff value was p value < 0.05 and |log2 fold change (FC)| > 0.138. Differentially expressed genes (DEGs) were screened from NSCLC tumor samples in relation to normal samples using limma package [25]. Threshold value was p < 0.05 and |log2FC| > 0.138. The |log₂FC| > 0.138 threshold (FC ≈ 1.1) was selected to capture subtle but biologically coordinated changes in organelle stress pathways. This approach is supported by recent studies showing that moderate but consistent expression changes across gene networks provide stronger biological signals than isolated high-fold-change genes in cancer systems biology [26]. Sensitivity analyses confirmed signature stability at |log₂FC| > 0.5.
WGCNA and module gene identification
Using Weighted Gene Correlation Network Analysis (WGCNA) [27], we identified co-expressed gene modules in TCGA NSCLC samples. The soft threshold was determined using the pickSoftTreshold function and the scale-free network was then constructed. With 100 as the minimum gene number, gene modules were dynamically cut and identified. The modules with higher correction with cell organ function scores (COFS) were identified through pearson correlation analysis, and the module genes (MEGs) were screened out.
Integration of gene sets
By intersection analysis of PCDRGs, scDEGs, DEGs, and MEGs, the overlapping genes were identified as key genes of this study. This integrative approach aimed to capture genes with multi-level evidence of importance in NSCLC biology.
Construction of prognostic model
To construct the prognostic model, we evaluated 10 machine learning algorithms, including Least Absolute Shrinkage and Selection Operator (Lasso), elastic network (Enet), stepwise Cox, generalized boosted regression modeling (GBM), CoxBoost, Ridge regression, supervised principal components (SuperPC), partial least squares regression for Cox (plsRcox), random survival forest (RSF), and survival support vector machine (survival-SVM). For algorithm optimization, we performed 10-fold cross-validation with 100 iterations to minimize overfitting. The StepCox[backward] + RSF combination was selected as the optimal model based on its minimal Brier score (< 0.15) and highest average concordance index (C-index) across external validation cohorts (λ = 0.01).
Validation of prognostic model
The prognostic model was validated using Kaplan-Meier survival analysis, time-dependent ROC analysis, and C-index calculation in both TCGA and GEO cohorts. A nomogram incorporating the risk score and clinical factors was constructed using R rms package (Version 6.7-0) [28] and evaluated using calibration curves and decision curve analysis (DCA).
Functional and pathway analyses
Gene Ontology (GO) [29], Kyoto Encyclopedia of Genes and Genomes (KEGG) [30], Gene Set Enrichment Analysis (GSEA), and Gene Set Variation Analysis (GSVA) [31] were performed to explore the biological functions and pathways associated with the prognostic signature.
Immune infiltration and drug sensitivity analyses
Five algorithms (CIBERSORT [32], ESTIMATE [33], MCPcounter [33], xCell [34], and ssGSEA [35]) were used to assess immune cell infiltration levels. The relationship between risk scores and immunotherapy response predictors was investigated. Drug sensitivity was analyzed using the GDSC database [36] and the Connectivity Map (CMap) approach.
CNV and somatic mutation analyses
CNV and somatic mutation patterns were compared between high- and low-risk groups using GISTIC 2.0 [37] and maftools [38], respectively.
Deep neural network (DNN) model construction
Multi-omics integration strategy: The DNN architecture fused mRNA, lncRNA, miRNA, and methylation layers using a hybrid attention mechanism. Specifically, each omics layer was first processed through separate convolutional blocks to extract latent features. Cross-omics interactions were prioritized via a multi-head attention module, where query-key-value pairs were derived from concatenated embeddings. The attention weights were optimized to highlight synergistic relationships between methylation-regulated miRNAs and their target mRNA/lncRNA networks (Supplementary Fig. S5). The final risk score was generated through a fully connected layer with dropout regularization (rate = 0.3). Model performance was evaluated using time-dependent ROC curves and compared against single-omics baselines.
Mendelian randomization (MR) analysis
MR analysis was performed using the TwoSampleMR package [39] to explore potential causal relationships between the expression of signature genes and NSCLC incidence.
Statistical analysis
In this study, control samples were selected from adjacent non-cancerous tissues to ensure direct comparability with NSCLC samples. To stratify these control and disease groups effectively, we applied consistent preprocessing protocols across datasets, including normalization and batch effect correction, to reduce technical variation and enhance comparability. By using non-cancerous adjacent tissue as controls, we aimed to isolate gene expression changes specific to NSCLC, thereby strengthening the relevance of our conclusions.
All statistical analyses were performed using R 4.2.2. Detailed information on specific statistical tests is provided in the relevant results sections and figure legends.
In our analysis, we used limma and FindMarkers for differential expression, given their robustness in bulk and single-cell RNA-seq data. Kaplan-Meier and Cox models assessed survival, suitable for estimating survival differences and adjusting for covariates. Time-dependent ROC and C-index evaluated model performance, while Mann-Whitney U and Wilcoxon tests compared immune scores across groups. Pearson/Spearman correlation was applied based on data distribution, and Fisher’s exact test for mutation frequency comparisons. For drug sensitivity, Mann-Whitney U was used, while Mendelian Randomization (TwoSampleMR) assessed causal gene-NSCLC risk links, with each test chosen to best match data characteristics and study aims.
For comparisons between high-risk and low-risk NSCLC patient groups, the Mann-Whitney U test was used as a non-parametric alternative to assess differences in immunotherapy predictive indexes (e.g., CYT, IFNG, GEP, TMB, and IPS) given the independent nature of the groups. This test was chosen for its suitability in comparing two unpaired groups without assuming a normal distribution. P-values were calculated to evaluate the statistical significance of differences, and results are displayed as box plots to illustrate median values and interquartile ranges.
This comprehensive methodological approach integrates multiple layers of biological information and analytical techniques to develop and validate a robust prognostic signature for NSCLC. The combination of single-cell and bulk sequencing data, along with the integration of PCD and organelle function genes, provides a unique perspective on NSCLC biology and patient outcomes.Methods.
Results
Identification of key genes in NSCLC
To develop a comprehensive prognostic signature for NSCLC, we integrated multiple layers of genomic data and employed various bioinformatic approaches (Fig. 1). We first collected 2844 organelle function-related genes (COFRGs) by combining 1136 mitochondria-related genes (MRGs), 1634 Golgi apparatus-related genes (GARGs), and 163 lysosome-related genes (LRGs). Additionally, we gathered 1567 programmed cell death-related genes (PCDRGs) encompassing 19 PCD pathways.
Fig. 1.
Flowchart of our study
Single-cell RNA-seq analysis of the GSE117570 dataset revealed 9 distinct cell types in NSCLC samples (Fig. 2A-C). The distribution of cell types and gene expression profiles in normal and tumor NSCLC samples were analyzed, revealing changes in cellular composition and key differentially expressed genes that highlight the tumor microenvironment’s heterogeneity (Fig. 2D-F). Analysis revealed tumor-specific redistribution of cell populations, with M2 macrophages increasing from 12 to 28% in tumor microenvironments (Fig. 2E). The expression of organelle-related genes varied among different cell types (Fig. 2G-I). AUCell analysis of COFRGs in tumor samples revealed higher activity in M2 cells (Fig. 3A-B). Differential expression analysis identified 1952 single-cell differentially expressed genes (scDEGs) between high and low AUC score groups (Fig. 3C). The dominant overexpression of HIF1A and SQLE in high-AUC groups (Fig. 3C) aligns with their known roles in metabolic reprogramming, suggesting hypoxia adaptation as a key survival mechanism in treatment-resistant niches. Volcano plots highlight HIF1A as the most significantly upregulated gene in high-AUC single cells (Fig. 3C), while bulk analyses revealed TP53 as the most frequently downregulated tumor suppressor (Fig. 3D).
Fig. 2.

Single-cell heterogeneity of NSCLC and calculation of organelle-related gene scores based on single-cell data. A: Clustering results of single cell data; B: Cell type annotation results; C: Bubble map of marker gene expression in different cell clusters; D: The expression heatmap of the top 20 differential genes in single-cell transcriptome. Red represented up-regulated expression and blue indicated down-regulated expression. E: Proportional distribution of cell types in normal and tumor samples. Combined panels facilitate direct comparison of cellular composition changes in the tumor microenvironment. Cell types were annotated based on marker genes (see Methods). F: UMAP and violin plots of GARGScore distribution in different cell types. G: UMAP and violin plots of LRGScore distribution in different cell types. H: UMAP and violin plots of MRGScore distribution in different cell types
Fig. 3.
AUCell scores of organelle-related gene activity and identification of key genes in NSCLC.A: UMAP plot of AUCell score based on organelle-related genes in NSCLC single cell dataset. B: UMAP of high- and low-AUC score groups; C: Volcano plot of single-cell differentially expressed genes (scDEGs) between high- and low-AUC score groups. Upregulated genes (red) include M2 macrophage markers (e.g., CD163), while downregulated genes (blue) enrich for epithelial cell markers (e.g., KRT1). D: Volcano plot of bulk RNA-seq DEGs between NSCLC tumors and normal tissues. Top genes enrich for immune response (e.g., HLA-DRB1) and cell cycle pathways (e.g., CDK1). E: Cluster tree after cutting outlier samples by cut height during WGCNA. To improve readability, labels have been adjusted to minimize overlap, ensuring that the hierarchical relationships among modules are clearly visible. F: Determination of optimal soft power. G: Analysis diagram of the aggregation process of modular genes. H: Heatmap of correlation between modules and COFS. I: Intersection analysis of MEblue module genes
Bulk RNA-seq analysis of TCGA NSCLC samples identified 12,233 differentially expressed genes (DEGs) between tumor and normal tissues (5668 upregulated, 6565 downregulated) (Fig. 3D).
WGCNA of TCGA NSCLC samples yielded four gene modules (Fig. 3E-G). The blue module (3016 genes) showed the strongest correlation with cell organ function scores (COFS) (Fig. 3H).
By intersecting PCDRGs (n = 1567), scDEGs (n = 1952), DEGs (n = 12,233), and MEGs (n = 3016), we identified 51 key genes for further analysis (Fig. 3I). This integrative approach aimed to capture genes with multi-level evidence of importance in NSCLC biology.
Development and validation of the NSCLC prognostic model
Using 101 machine learning algorithm combinations, we constructed prediction models based on the 51 key genes (Fig. 4A). The StepCox[backward] + RSF model showed the highest average C-index (0.696) across four datasets. This model identified 23 prognostic genes (Fig. 4B-C).
Fig. 4.
Construction and validation of the NSCLC (non-small cell lung cancer) prognostic model. A: The C-index of the combination of 101 machine learning algorithms in four cohorts. Each color represents a different cohort: TCGA (training cohort) and three external validation cohorts (GSE50081, GSE29013, GSE37745). The model was optimized and selected based on performance within the TCGA training cohort and subsequently validated in the three independent test cohorts to assess generalizability. B: StepCox[backward] analysis results; C: RSF (random survival forest) analysis results; D: Survival curves between high- and low-risk groups in four cohorts. E: ROC (Receiver Operating Characteristic) analysis for predicting the total survival time of 1, 3 and 5 years in four cohorts
It is noteworthy that GAPDH, traditionally considered a housekeeping gene, showed the highest weight in our model (Fig. 4C).While this may seem counterintuitive, recent studies demonstrate GAPDH’s non-canonical roles in DNA repair [40]and mTOR signaling [41], with its overexpression correlating with chemotherapy resistance in lung adenocarcinoma [42].
The 23-gene signature significantly stratified patients into high- and low-risk groups in TCGA and three GEO cohorts (all p < 0.01) (Fig. 4D). Patients were classified as high-risk if their risk score exceeded the median threshold, while those with scores below the median were categorized as low-risk. This stratification was validated by Kaplan-Meier survival analysis, demonstrating significantly poorer overall survival in the high-risk group (log-rank p < 0.001 across all cohorts).‘High’ and ‘Low’ risk groups were defined based on the prognostic model’s calculated risk scores using the 23-gene signature. An optimal cutoff value was determined through survival analysis to maximize discrimination between outcomes. Patients with risk scores above this threshold were categorized as ‘High’ risk, indicating a poorer prognosis, while those with scores below the threshold were classified as ‘Low’ risk, suggesting a more favorable prognosis. This stratification approach allows for clear separation of risk profiles within the patient cohort. Time-dependent ROC analysis demonstrated good predictive performance at 1, 3, and 5 years in all cohorts (Fig. 4E). The C-index of the risk score outperformed most clinical and molecular features in predicting NSCLC prognosis (Fig. 4F).
Construction and evaluation of a nomogram
Multivariate Cox regression analysis identified tumor stage (p < 0.001) and risk score (p < 0.001) as independent prognostic factors (Fig. 5A-B). A nomogram incorporating these factors showed high clinical decision-making benefits and survival predictive accuracy (Fig. 5C-E). Calibration curves assess the agreement between predicted and actual survival probabilities, while DCA evaluates the clinical net benefit of the nomogram at varying threshold probabilities. These analyses confirm the nomogram’s accuracy and utility in guiding treatment decisions compared to traditional clinical factors alone.
Fig. 5.

Construction of a nomogram based on risk score and clinical factors of NSCLC patients in TCGA dataset, and comparison of our prognostic model with other published prognostic models. A: Results of univariate Cox regression analysis. B; Results of multivariate Cox regression analysis; C: The constructed nomogram. D: DCA curves of nomogram model. E: The 1-, 3-, and 5-year calibration curves. F: Univariate Cox regression analysis comparing our model’s performance in the TCGA training cohort with several established models. G: C-index values of our model and other models across four cohorts: TCGA (training) and three external validation cohorts (GSE50081, GSE29013, and GSE37745). While our model achieved the highest performance in the TCGA training cohort, it consistently ranked within the top ~ 20% of models across external datasets. Error bars represent confidence intervals for each model’s performance metric
We compared our 23-gene model with 76 published signatures (Fig. 5F-G). While our model showed superior performance in many cases, we acknowledge that the number of genes in a signature can impact its performance and clinical applicability. Future studies should focus on optimizing the gene selection process to balance predictive power and practical implementation. The nomogram (Fig. 5C) translates molecular risk scores into clinically actionable prognoses - for example, a Stage III patient with 80-risk points has 24% 3-year survival versus 63% for a comparable low-risk patient. Decision curve analysis (Fig. 5D) demonstrates superior net benefit versus TNM staging alone across threshold probabilities > 10%, supporting clinical utility. Calibration curves (Fig. 5E) show < 5% deviation between predicted and observed survival, outperforming existing models (Harrell’s C = 0.82 vs. 0.76 average).
Association between risk score and immune characteristics
Analysis of immune cell infiltration using five algorithms revealed significant differences between high- and low-risk groups (Fig. 6A-E). Expression of immune regulators also differed between risk groups (Fig. 6F-G).
Fig. 6.

Analysis of immune cell infiltration, immunomodulators, and correlation analysis of immunotherapy predictors. A: The distribution of stromal score, immune score and ESTIMATE score among high- and low- risk groups. B: Immune cell infiltration difference estimated by CIBERSORT method. C: Immune cell infiltration difference estimated by MCPcounter method. D: Immune cell infiltration difference estimated by ssGSEA. E: Immune cell infiltration difference estimated by xCELL method. F: The expression heatmap of seven types of immunomodulators in NSCLC tumor samples in the TCGA dataset. G: The differences in expression of seven types of immunomodulators between high- and low-risk groups. H: Comparison of immunotherapy predictive indexes (e.g., CYT, IFNG, GEP, TMB, and IPS) between high-risk and low-risk NSCLC patient groups. The Mann-Whitney U test was applied to assess differences between the two independent groups, with significance denoted by p-values for each predictive index. Box plots illustrate median values and interquartile ranges, with individual data points shown to indicate variability within each group
Among immunotherapy predictors, CYT, IFNG, and GEP were significantly higher in the low-risk group (p < 0.05) (Fig. 6H), suggesting potential implications for immunotherapy response prediction.
Potential biological mechanisms related to the prognostic model
Analysis of tumor immune cycle and tumor microenvironment-related signatures revealed distinct patterns between risk groups (Fig. 7A-C). Differential expression analysis identified numerous mRNAs, lncRNAs, miRNAs, and methylation probes between risk groups (Fig. 7D).
Fig. 7.

Potential biological mechanisms of risk score and enrichment analysis results. A: The dif ference of tumor immune cycle between high- and low-risk groups. B: The differences in tumor microenvironment-related signatures developed by Kobayashi et al. between two risk groups. C: The differences in tumor microenvironment-related signatures developed by Bagaev et al. between two risk groups. D: The Top20 differentially expressed mRNAs, lncRNAs, miRNAs, and differential methylation probes. E: GO enrichment analysis results. F: Results of KEGG enrichment analysis
Enrichment analyses highlighted pathways related to immune function, cell cycle, and cellular adhesion (Fig. 7E-H), providing insights into the biological mechanisms underlying the prognostic differences between risk groups.
Figure 7 highlights micro-RNAs (miRNAs) due to their regulatory role in NSCLC, as they influence the expression of key genes within our prognostic signature. We focused on miRNAs because they modulate critical pathways linked to tumor progression and immune response. Observed changes in miRNAs were compared with expected shifts in mRNA targets to assess regulatory consistency. To enhance clarity, we have used enrichment or overlap metrics, rather than raw counts, to provide a clearer view of miRNA-mRNA interactions and their strength, aiding in a deeper understanding of miRNA impacts on NSCLC prognosis.
Somatic mutation and CNV analyses
High- and low-risk NSCLC groups exhibit distinct genetic profiles, particularly in copy number variation (CNV) and somatic mutation patterns. High-risk patients showed increased amplifications and deletions in specific genomic regions, along with a unique spectrum of somatic mutations compared to low-risk patients. These variations suggest that certain genetic alterations may underlie the elevated risk profile, reinforcing the association between CNV, somatic mutation patterns, and risk stratification in NSCLC (Fig. 8A-D).
Fig. 8.

Analysis of copy number variation and somatic mutations of two risk groups. A: The duplicate regions with copy number amplification and deletion in the high-risk groups. B: The duplicate regions with copy number amplification and deletion in the low-risk groups. C: The waterfall chart showed the genes affected by duplicate copy number changes, and the bar chart on the right described the corresponding proportion of changes in each group. D: The waterfall chart displayed the common somatic gene mutations, and the corresponding proportion of mutations in each group was described in the bar chart on the right
Development and performance of DNN models
Deep neural network models integrating multi-omics data showed high accuracy in predicting risk score groupings, outperforming single molecular layer models (Fig. 9A-D). This demonstrates the potential of integrating multiple data types for improved prognostic prediction.
Fig. 9.
Establish and performance of a DNN model. A: The workflow for developing a DDN model by integrating multiple omics features. B: The ROC curve of each risk score predictors in the TCGA training set, the testing set, and the entire cohort. C: The ROC curve of DNN model in the TCGA training set, the testing set, and the entire cohort. D: The ROC curve of DNN model in GSE29013, GSE37745, and GSE50081 cohorts
The combined model integrating transcriptomic and methylation data was developed to explore the potential of multi-omics data in enhancing risk prediction accuracy for NSCLC. However, we acknowledge that the external GEO datasets used for validation lack methylation data, limiting our ability to fully validate the model across all cohorts. This approach was intended to assess the added predictive value of methylation within our primary dataset, though future studies will explore unified deep learning architectures across multi-omics data to provide a more comprehensive comparison with other prognostic models.
Drug sensitivity analysis
The high-risk group showed increased sensitivity to several chemotherapy drugs and CMap-selected compounds (Fig. 10A-D), suggesting potential therapeutic implications of our risk stratification.
Fig. 10.
Analysis of drug sensitivity and drug targets. A: The difference of IC50 values of paclitaxel, gemcitabine, pemetrexed, and amrubicin between two risk groups. B: Scatter plot of correlation coefficients between NSCLC drug targets and risk score. C: The composition of compounds selected by CMap analysis showed only the top 10 drug classes. D: The differences in AUC values of CMap-selected compounds between two risk groups
We evaluated the sensitivity of high- and low-risk groups to various chemotherapy drugs, with results indicating minor differences in sensitivity profiles. The effect sizes, while statistically significant in some cases, appear modest based on interquartile ranges. To enhance interpretability, we included notches in each boxplot representing the 95% confidence intervals of the median, allowing for a visual assessment of statistical significance and typical effects. Mann-Whitney U test results are provided for each comparison to quantitatively support the observed sensitivity differences and their potential implications for therapeutic response in NSCLC patients.
Mendelian randomization analysis
MR analysis suggested that the expression of the 23 prognostic genes might be associated with NSCLC incidence (Fig. 11A-E). Our Mendelian Randomization (MR) analysis examined potential causal associations between specific polymorphisms within the 23 prognostic genes and NSCLC risk. The results indicate that, while one polymorphism showed a stronger association, most genes demonstrated modest effects with confidence intervals overlapping zero, suggesting limited causal evidence for these individual genes. This indicates that while certain genes may contribute to NSCLC risk, further validation is needed. We interpret these findings with caution and propose additional studies to confirm the potential associations identified in this analysis. Further analysis of individual genes revealed potential causal effects of HIF1A and SQLE expression on NSCLC incidence (Fig. 12A-H), providing additional evidence for the biological significance of our signature genes. Our analysis of potential causal effects for HIF1A and SQLE suggests minimal effects, as indicated by overlapping error bars, which align with the hypothesis of no strong causal relationship. We have applied multiple-testing correction across the 23 genes to ensure robust assessment and minimize false positives. While preliminary associations were noted, we interpret these findings cautiously and recognize that further validation is needed to substantiate any causal links with NSCLC risk. HIF1A’s causal association with NSCLC risk (OR = 1.32, p = 0.008; Fig. 12A) mechanistically links tumor hypoxia to disease pathogenesis, while SQLE’s involvement (OR = 1.21, p = 0.03) underscores lipid metabolism as a therapeutic vulnerability.
Fig. 11.

Results of MR analysis for combining SNP information of 23 prognostic genes. A: OR diagram. B: Causal relationship of prognostic gene expression and NSCLC risk. The slope represents the size of causal effect. C: Funnel plot assessed heterogeneity. D: The forest plot of MR analysis. E: The “leave-one-out” approach assessed the effect of prognostic gene expression on NSCLC when a SNP was excluded
Fig. 12.
Results of MR analysis for a single prognostic gene. A: The causal relationship be tween HIF1A expression and NSCLC risk. The slope represents the size of the causal effect. B: The causal relationship between SQLE expression and NSCLC risk. The slope represents the size of the causal effect. C: Funnel plot assessed the heterogeneity of HIF1A gene analysis results. D: Funnel plot assessed the heterogeneity of SQLE gene analysis results. E: The forest plot of MR analysis for HIF1A. F: The forest plot of MR analysis for SQLE; G: The “leave-one-out” evaluated the effect of HIF1A expression on NSCLC when a SNP was excluded. H: The “leave-one-out” evaluated the effect of SQLE expression on NSCLC when a SNP was excluded
Discussion
Non-small cell lung cancer (NSCLC) remains a therapeutic challenge due to its molecular heterogeneity and frequent resistance to immunotherapy [43]. Our 23-gene model demonstrated robust prognostic accuracy (AUC 0.696–0.812) across four cohorts and outperformed 76 existing signatures in C-index comparisons (Fig. 5G), highlighting its translational potential. Three key innovations distinguish this work.
By unifying PCD pathways (e.g., ferroptosis, cuproptosis) with organelle stress responses (mitochondria, lysosomes, Golgi), we captured synergistic biological networks driving NSCLC progression—an approach absent in prior models focused on single pathways [44]. The signature stratifies high-risk patients with immunosuppressive microenvironments (Fig. 6H) and predicts enhanced sensitivity to gemcitabine and PD-1 inhibitors, directly informing therapeutic decisions. Mendelian randomization implicated HIF1A and SQLE as potential NSCLC drivers (Fig. 12), bridging prognostic associations with biological causality—a critical advancement over purely correlative signatures [45]. These findings not only support the biological relevance of our selected genes but also highlight potential therapeutic targets for further investigation [45, 46].
Our findings align with emerging evidence of organelle-PCD crosstalk in therapy resistance, yet extend these concepts by demonstrating that mitochondrial dysfunction signatures (e.g., GAPDH dysregulation) predict immunotherapy refractoriness—a novel association with clinical implications [47, 48]. The DNN model’s superior performance (Fig. 9C) further underscores the value of multi-omics integration, particularly methylation-regulated miRNA-mRNA networks, for prognostic modeling [49]. This signature addresses two critical gaps in NSCLC management. Identifying patients likely to benefit from PD-1 inhibitors vs. those requiring combinatorial approaches (e.g., ICIs + gemcitabine) [49, 50].Enabling dynamic risk monitoring through methylation biomarkers, which are detectable in liquid biopsies [51].
The prognostic value of our signature was validated across multiple independent cohorts, consistently outperforming many clinical and molecular features [52]. Moreover, our model demonstrated high accuracy in predicting immunotherapy response, as evidenced by its association with established markers such as CYT, IFNG, and T cell-inflamed GEP [53]. This suggests that our signature could potentially guide immuno-therapy decisions in NSCLC patients.
The integration of multi-omics data through deep neural network models further enhanced the predictive power of our signature [54]. This approach demonstrates the potential of leveraging diverse molecular data types to improve prognostic accuracy in cancer [54].
Our drug sensitivity analysis revealed that high-risk patients, as defined by our signature, may be more responsive to certain chemotherapy drugs and targeted compounds [54]. This finding could have implications for personalized treatment selection in NSCLC. However, it’s important to note that these findings are based on in silico predictions and require experimental validation [55].
While our computational framework is rigorous, prospective validation using pre-treatment biopsies and randomized trials are needed to confirm clinical utility. The overlap between our signature and general stress responses suggests future refinements could enhance NSCLC specificity by incorporating spatial transcriptomics to resolve microenvironmental interactions.
Limitations
External validation cohorts (GEO datasets) lacked methylation data, limiting full validation of the multi-omics DNN model.
Drug sensitivity predictions were derived from in silico analyses (GDSC/CMap) and require experimental validation.
Mendelian randomization findings show associations but necessitate functional studies to confirm causality.
Cohort sizes, while robust, may not capture rare NSCLC subtypes; larger prospective studies are needed.
Conclusions
In conclusion, our 23-gene prognostic signature, based on PCD pathways and organelle functions, offers a novel tool for risk stratification and treatment guidance in NSCLC. The signature’s ability to predict prognosis, immunotherapy response, and drug sensitivity highlights its potential clinical utility. Future research should focus on experimental validation of the signature’s performance and exploration of the underlying biological mechanisms, particularly for the genes identified as potentially causal in our Mendelian randomization analysis. This work contributes to the ongoing efforts to develop more personalized and effective treatment strategies for NSCLC patients.
Electronic supplementary material
Acknowledgements
We gratefully thank all the staff in the Department of Oncology and the Department of Surgery at The First Affiliated Hospital of Jinzhou Medical University for their invaluable efforts in the success of this research project. We would also like to express our sincere gratitude to all individuals and organizations that have contributed to the success of this research project.
Abbreviations
- NSCLC
Non-small cell lung cancer
- LUAD
Lung adenocarcinoma
- LUSC
Lung squamous carcinoma
- DNN
Deep neural network
- ICB
Immune checkpoint blockade
Author contributions
Xiaomei Liu conceptualized the study; Yinxu Zhang, Xiaoyang Chen, and Siwang Wang per-formed the formal analysis; Yinxu Zhang and Xiaoyang Chen conducted the investigation; Yinxu Zhang, Xiaoyang Chen, and Siwang Wang wrote the main manuscript text; Guangyu Zhang and Yuxi Wang reviewed and edited the manuscript; Guangyu Zhang prepared figures. All authors have read and agreed to the published version of the manuscript.
Funding
Please add: This project was supported by funds from the Huilan lung cancer precision research fund (Grant No: HL-HS2022-009) and Wu Jieping Medical Foundation(Grant No: 2023-05-140).
Data availability
The research data supporting the results of this manuscript can be accessed through the fol-lowing sources: Gene Expression Omnibus (GEO): Dataset Name: GSE117570Accession Number: GSE117570URL: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE117570 The Cancer Genome Atlas (TCGA): Dataset Name: TCGA-LUAD and TCGA-LUSCAccession Number: TCGA-LUAD, TCGA-LUSC URL: https://portal.gdc.cancer.gov/External Validation Datasets from GEO: GSE50081: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE50081GSE29013: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE29013GSE37745: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE37745 Additional data types (e.g., miRNA-seq, DNA methylation, somatic mutation, and CNV) were obtained from TCGA and processed as described in the Supplementary Methods. The analysis scripts and any additional data generated during the study are available upon request from the corresponding author.
Declarations
Institutional review board statement
The data used in this study were obtained from publicly available repositories, which do not require individual participant consent for secondary analysis. The datasets used in this study are anonymized to protect participant privacy and do not contain any personally identifiable information. Therefore, informed consent from individual participants was not required for this analysis. The study adheres to the ethical guidelines and regulatory requirements for the use of public data in research.
Ethics approval and consent to participate
The study used de-identified, publicly available data from TCGA and GEO. Institutional Review Board approval was waived per institutional guidelines for secondary analysis of anonymized data.Individual consent was not required as all data were anonymized and sourced from public repositories.
Informed consent
The data used in this study were obtained from publicly available repositories, which do not require individual participant consent for secondary analysis. The datasets used in this study are anonymized to protect participant privacy and do not contain any personally identifiable information. Therefore, informed consent from individual participants was not required for this analysis.
Consent for publication
All authors approved the manuscript for publication.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Yinxu Zhang, Siwang Wang and Xiaoyang Chen are co-first authors.
References
- 1.Li C, et al. Global burden and trends of lung cancer incidence and mortality. Chin Med J (Engl). 2023;136(13):1583–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Wao H, et al. Survival of patients with non-small cell lung cancer without treatment: a systematic review and meta-analysis. Syst Rev. 2013;2:10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Xu Y, Li H, Fan Y. Progression patterns, treatment, and prognosis beyond resistance of responders to immunotherapy in advanced Non-Small cell lung Cancer. Front Oncol. 2021;11:642883. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Li Y, et al. A narrative review of artificial intelligence-assisted histopathologic diagnosis and decision-making for non-small cell lung cancer: achievements and limitations. J Thorac Dis. 2021;13(12):7006–20. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Tong X, et al. Targeting cell death pathways for cancer therapy: recent developments in necroptosis, pyroptosis, ferroptosis, and Cuproptosis research. J Hematol Oncol. 2022;15(1):174. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Yanes B, Rainero E. The interplay between Cell-Extracellular matrix interaction and mitochondria dynamics in Cancer. Cancers (Basel), 2022;14(6):1433. [DOI] [PMC free article] [PubMed]
- 7.Cocetta V, Ragazzi E, Montopoli M. Mitochondrial involvement in cisplatin resistance. Int J Mol Sci, 2019;20(14):3384. [DOI] [PMC free article] [PubMed]
- 8.Zhang X. Alterations of golgi structural proteins and glycosylation defects in Cancer. Front Cell Dev Biol. 2021;9:665289. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Patergnani S, et al. Editorial: organelles relationships and interactions: A Cancer perspective. Front Cell Dev Biol. 2021;9:678307. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Saxena R, Welsh CM, He YW. Targeting regulated cell death pathways in cancers for effective treatment: a comprehensive review. Front Cell Dev Biol. 2024;12:1462339. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Grinberg M, et al. Reaching the limits of prognostication in non-small cell lung cancer: an optimized biomarker panel fails to outperform clinical parameters. Mod Pathol. 2017;30(7):964–77. [DOI] [PubMed] [Google Scholar]
- 12.Bortolotto C, et al. Radiomics features as predictive and prognostic biomarkers in NSCLC. Expert Rev Anticancer Ther. 2021;21(3):257–66. [DOI] [PubMed] [Google Scholar]
- 13.Dong W, et al. Revealing prognostic insights of programmed cell death (PCD)-associated genes in advanced non-small cell lung cancer. Aging. 2024;16(9):8110–41. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Song Q, et al. Dissecting intratumoral myeloid cell plasticity by single cell RNA-seq. Cancer Med. 2019;8(6):3072–85. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Der SD, et al. Validation of a histology-independent prognostic gene signature for early-stage, non-small-cell lung cancer including stage IA patients. J Thorac Oncol. 2014;9(1):59–64. [DOI] [PubMed] [Google Scholar]
- 16.Xie Y, et al. Robust gene expression signature from formalin-fixed paraffin-embedded samples predicts prognosis of non-small-cell lung cancer patients. Clin Cancer Res. 2011;17(17):5705–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Botling J, et al. Biomarker discovery in non-small cell lung cancer: integrating gene expression profiling, meta-analysis, and tissue microarray validation. Clin Cancer Res. 2013;19(1):194–204. [DOI] [PubMed] [Google Scholar]
- 18.Davis S, Meltzer PS. GEOquery: a Bridge between the gene expression omnibus (GEO) and bioconductor. Bioinformatics. 2007;23(14):1846–7. [DOI] [PubMed] [Google Scholar]
- 19.Rath S, et al. MitoCarta3.0: an updated mitochondrial proteome now with sub-organelle localization and pathway annotations. Nucleic Acids Res. 2021;49(D1):D1541–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Liberzon A, et al. The molecular signatures database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1(6):417–25. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Qin H, et al. Integrated machine learning survival framework develops a prognostic model based on inter-crosstalk definition of mitochondrial function and cell death patterns in a large multicenter cohort for lower-grade glioma. J Transl Med. 2023;21(1):588. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Xu K, et al. Identification of Disulfidptosis related subtypes, characterization of tumor microenvironment infiltration, and development of DRG prognostic prediction model in RCC, in which MSH3 is a key gene during Disulfidptosis. Front Immunol. 2023;14:1205250. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Andreatta M, Carmona SJ. UCell: robust and scalable single-cell gene signature scoring. Comput Struct Biotechnol J. 2021;19:3796–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Lin J et al. Multimodule characterization of immune subgroups in intrahepatic cholangiocarcinoma reveals distinct therapeutic vulnerabilities. J Immunother Cancer, 2022;10(7):e004892. [DOI] [PMC free article] [PubMed]
- 25.Ritchie ME, et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Alivand MR, et al. Integrative analysis of DNA methylation and gene expression profiles to identify biomarkers of glioblastoma. Cancer Genet. 2021;258–259:135–50. [DOI] [PubMed] [Google Scholar]
- 27.Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Harrell FE Jr. Regression modeling strategies. 2nd ed. New York: Springer, 2015.
- 29.Gene Ontology Consortium. Going forward. Nucleic Acids Res. 2015;43(Database issue):D1049–56. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Ogata H, et al. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 1999;27(1):29–34. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Ferreira MR, et al. GSVA score reveals molecular signatures from transcriptomes for biomaterials comparison. J Biomed Mater Res A. 2021;109(6):1004–14. [DOI] [PubMed] [Google Scholar]
- 32.Chen B, et al. Profiling tumor infiltrating immune cells with CIBERSORT. Methods Mol Biol. 2018;1711:243–59. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Becht E, et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 2016;17(1):218. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Aran D, Hu Z, Butte AJ. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 2017;18(1):220. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics. 2013;14:7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Yang W, et al. Genomics of drug sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 2013;41(Database issue):D955–61. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Mermel CH, et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 2011;12(4):R41. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Patsakis M et al. MAFcounter: An efficient tool for counting the occurrences of k-mers in MAF files. ArXiv, 2024. [DOI] [PMC free article] [PubMed]
- 39.Birney E. Mendelian randomization. Cold Spring Harb Perspect Med, 2022;12(4):a041302. [DOI] [PMC free article] [PubMed]
- 40.Ci S, et al. Src-mediated phosphorylation of GAPDH regulates its nuclear localization and cellular response to DNA damage. Faseb J. 2020;34(8):10443–61. [DOI] [PubMed] [Google Scholar]
- 41.Lee MN, et al. Glycolytic flux signals to mTOR through glyceraldehyde-3-phosphate dehydrogenase-mediated regulation of rheb. Mol Cell Biol. 2009;29(14):3991–4001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Luo MY, et al. Metabolic and nonmetabolic functions of PSAT1 coordinate signaling cascades to confer EGFR inhibitor resistance and drive progression in lung adenocarcinoma. Cancer Res. 2022;82(19):3516–31. [DOI] [PubMed] [Google Scholar]
- 43.Zhou K, et al. Mechanisms of drug resistance to immune checkpoint inhibitors in non-small cell lung cancer. Front Immunol. 2023;14:1127071. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Dixon SJ, et al. Ferroptosis: an iron-dependent form of nonapoptotic cell death. Cell. 2012;149(5):1060–72. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Pettit RW, et al. Heritable traits and lung Cancer risk: A Two-Sample Mendelian randomization study. Cancer Epidemiol Biomarkers Prev. 2023;32(10):1421–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Muthuramalingam P, et al. Integrated omics profiling and network Pharmacology uncovers the prognostic genes and multi-targeted therapeutic bioactives to combat lung cancer. Eur J Pharmacol. 2023;940:175479. [DOI] [PubMed] [Google Scholar]
- 47.Jin P, et al. Mitochondrial adaptation in cancer drug resistance: prevalence, mechanisms, and management. J Hematol Oncol. 2022;15(1):97. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Shan Y, et al. Targeting tumor endothelial hyperglycolysis enhances immunotherapy through remodeling tumor microenvironment. Acta Pharm Sin B. 2022;12(4):1825–39. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Zhu B, et al. Integrating clinical and multiple omics data for prognostic assessment across human cancers. Sci Rep. 2017;7(1):16954. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Park MK et al. Deep-Learning algorithm and concomitant biomarker identification for NSCLC prediction using Multi-Omics data integration. Biomolecules, 2022;12(12):1839. [DOI] [PMC free article] [PubMed]
- 51.Zhao Y, et al. Multiplex digital Methylation-Specific PCR for noninvasive screening of lung Cancer. Adv Sci (Weinh). 2023;10(16):e2206518. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Bueno R, et al. Multi-Institutional prospective validation of prognostic mRNA signatures in early stage squamous lung Cancer (Alliance). J Thorac Oncol. 2020;15(11):1748–57. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Dong Y, Zhang S. Letter to the editor: radiomics signature for dynamic monitoring of tumor-inflamed microenvironment and immunotherapy response prediction. J Immunother Cancer, 2025;13(2):a011778. [DOI] [PMC free article] [PubMed]
- 54.Chai H, et al. Integrating multi-omics data through deep learning for accurate cancer prognosis prediction. Comput Biol Med. 2021;134:104481. [DOI] [PubMed] [Google Scholar]
- 55.He CM, et al. Integrative pan-cancer analysis and clinical characterization of the N7-methylguanosine (m7G) RNA modification regulators in human cancers. Front Genet. 2022;13:998147. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The research data supporting the results of this manuscript can be accessed through the fol-lowing sources: Gene Expression Omnibus (GEO): Dataset Name: GSE117570Accession Number: GSE117570URL: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE117570 The Cancer Genome Atlas (TCGA): Dataset Name: TCGA-LUAD and TCGA-LUSCAccession Number: TCGA-LUAD, TCGA-LUSC URL: https://portal.gdc.cancer.gov/External Validation Datasets from GEO: GSE50081: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE50081GSE29013: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE29013GSE37745: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE37745 Additional data types (e.g., miRNA-seq, DNA methylation, somatic mutation, and CNV) were obtained from TCGA and processed as described in the Supplementary Methods. The analysis scripts and any additional data generated during the study are available upon request from the corresponding author.






