Abstract
Background:
Lung cancer remains one of the most prevalent and lethal cancers globally, often diagnosed at advanced stages, which impedes effective treatment. Recent advancements have highlighted exosomes as valuable biomarkers for early detection, prognosis, and therapeutic interventions in lung cancer. Exosomes, which carry molecular information from tumor cells, reflect tumor development and metastasis, offering potential for precision medicine.
Objective:
This study aimed to develop a prognostic prediction model for lung cancer therapy based on miRNA profiling in exosomes. By performing bioinformatics analyses, we identified miRNAs and target genes associated with lung cancer treatment and their potential relationship with patient survival outcomes.
Materials and Methods:
Using the GSE207715 dataset, we applied machine learning models and a Transformer-based deep learning approach to predict nivolumab treatment efficacy in lung cancer patients. Additionally, miRNA-target gene interactions were predicted via miRNA databases, followed by Gene Ontology and KEGG pathway enrichment analyses. A Cox proportional hazards regression model was used to assess the relationship between miRNA expression and patient survival.
Results:
Significant differences were observed in the miRNA profiles of exosomes from patients with different nivolumab treatment outcomes, though the differences were relatively small. Machine learning models achieved prediction accuracies ranging from 0.6731 to 0.6923, while the deep learning model outperformed these methods with an accuracy of 0.9412. The hsa-let-7c miRNA showed statistical significance in multivariate survival risk analysis (p = 0.0152).
Conclusion:
This study demonstrates the potential of miRNA profiling in exosomes for predicting treatment efficacy and survival in lung cancer patients. The deep learning model’s ability to capture subtle miRNA expression differences provides a robust platform for personalized treatment strategies in non-small cell lung cancer.
Keywords: Bioinformatics, Exosome, Lung Cancer, miRNA, Multi-Target Intervention, Prognostic Model, Survival Risk
1. Background
Lung cancer remains one of the most prevalent and lethal malignancies worldwide. According to GLOBOCAN 2022 estimates, approximately 2.5 million new lung cancer cases were diagnosed globally and 1.8 million deaths occurred, accounting for about 12.4% of total cancer incidence and 18.7% of cancer-related mortality worldwide ( 1 ). Based on the size of transformed cells, lung cancer is classified into two main types: small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). NSCLC accounts for approximately 90% of all lung cancer cases ( 2 ). Most patients with early-stage NSCLC miss the optimal treatment window due to the absence of clinical manifestations ( 3 ). For NSCLC patients, treatment strategies typically include surgical resection, targeted therapy, and immunotherapy. Historically, stage I-III NSCLC was primarily managed with surgery and chemotherapy. However, following the landmark CheckMate-816 trial, neoadjuvant nivolumab combined with chemotherapy has been approved by the FDA for resectable NSCLC ( 4 ). Given the heterogeneity in individual treatment responses, identifying clinically valuable biomarkers for early prediction of therapeutic outcomes in NSCLC is of critical importance.
All cells are capable of releasing extracellular vesicles (EVs), which encompass subtypes such as exosomes and microvesicles. Exosomes, derived from the endosomal pathway, exhibit a nano-scale size range (approximately 40–160 nm in diameter, averaging 100 nm). These vesicles are delimited by a lipid bilayer membrane and carry diverse biomolecular cargoes, including cytoplasmic constituents, membrane proteins, glycoproteins, lipids, metabolites, amino acids, RNAs, and DNA. Comprehensive profiling has revealed that exosomes harbor an extensive repertoire of biomolecules, comprising 9,769 distinct proteins, 3,408 mRNAs, 2,838 miRNAs, and 1,116 lipid species ( 5 ). In contrast, microvesicles are formed through direct budding of the plasma membrane and exhibit a larger diameter range of approximately 50 nm to 1 μm ( 6 , 7 ). Exosomes serve essential functions in maintaining cellular homeostasis and mediating intercellular communication, while actively contributing to both physiological and pathological processes. Their distinctive molecular composition and traceable cellular origins have enabled wide applications in precision medicine, including non-invasive liquid biopsies, early cancer diagnosis, and targeted drug delivery, and have particularly advanced oncology through biomarker discovery and therapeutic innovation.
Research has revealed that diverse exosomal miRNAs derived from lung cancer tissues are closely associated with tumorigenesis, progression, angiogenesis, and immune regulation, demonstrating potential diagnostic and prognostic value. For early diagnosis, miRNAs such as miR-21 and miR-210 are widely recognized as biomarkers strongly correlated with tumor development and metastasis, showing promise as early diagnostic tools ( 8 - 10 ). In tumor progression, miR-660 promotes NSCLC development by regulating the transcription factor KLF9, while miR-29a and miR-21 further drive tumor growth and metastasis through activation of Toll-like receptors in neighboring immune cells ( 10 ).
Given that lung cancer exhibits the highest global incidence and mortality rates, coupled with diagnostic challenges in early stages and remarkable molecular heterogeneity, there is an urgent need for personalized therapeutic strategies. This malignancy is particularly dependent on molecular medicine technologies. Furthermore, a substantial research foundation exists regarding exosome-mediated oncogenic mechanisms. Consequently, selecting lung cancer as a research focus not only carries clear clinical significance but also provides an ideal platform to advance translational applications of molecular medicine in cancer management. A key current research challenge lies in identifying clinically actionable molecular targets from the vast and complex compositional repertoire of exosomes. Artificial intelligence (AI) offers distinct advantages in processing and mining large-scale complex datasets. In recent years, machine learning approaches have been explored in lung cancer exosome research. For instance, studies have integrated surface-enhanced Raman spectroscopy (SERS) with deep learning algorithms to distinguish different lung cancer subtypes and disease stages ( 16 ). Concurrently, the integration of bioinformatics has substantially enhanced both the systematicity and precision of exosome research. Through analytical approaches such as protein-protein interaction (PPI) network analysis and signaling pathway enrichment, researchers can now comprehensively identify key molecular players and gain mechanistic insights into exosome-mediated disease regulation. These advances provide both theoretical foundations and technical support for prognostic evaluation and personalized therapy in lung cancer.
2. Objectives
In this study, we analyzed plasma samples from advanced NSCLC patients who received nivolumab to extract treatment-related miRNA features. Based on these features, we developed a prognostic prediction model for lung cancer treatment and predicted the target genes of miRNAs that contributed significantly to the model using gene databases. Subsequently, we performed enrichment analysis on the associated genes to identify biological processes and gene functions associated with lung cancer and other cancers. Building on this, we performed survival risk analysis of the miRNA features to explore their correlation with patient survival. Ultimately, we established a survival prediction model, providing valuable guidance for future cancer treatment and prognosis evaluation, particularly in the field of lung cancer.
3. Materials and Methods
3.1. Data
This study used the GSE207715 dataset from the Gene Expression Omnibus (GEO) database, a public repository created for storing gene expression data and related experimental information. The original study collected and analyzed plasma samples from 174 advanced NSCLC patients who were administered nivolumab as a second-line treatment. Plasma samples were collected from each patient at baseline and at the time of best response. The cohort comprised 24 ED, 81 PD, 35 SD, and 30 PR samples, plus 4 samples lost to follow-up. Based on the therapeutic evaluation, patients were divided into two categories: partial response (PR) and stable disease (SD) were defined as negative samples, while progressive disease (PD) and early death (ED) were defined as positive samples.
To explore the expression profiles of miRNAs in exosomes and their relationship with patient survival time, we obtained datasets for lung adenocarcinoma and lung squamous cell carcinoma from The Cancer Genome Atlas (TCGA) database. TCGA is a public resource, providing extensive genomic data across various cancer types to promote precision medicine research in oncology. The data, sourced from the Genomic Data Commons (GDC) TCGA platform, encompasses a range of omics data, including gene expression, mutations, and copy number variations. In this study, we selected and integrated LUAD and LUSC data from the TCGA database, ultimately including 1,066 patient samples, which contain information on overall survival (OS) and survival status (alive or deceased) at the final follow-up.
3.2. Nivolumab Efficacy Prediction
The patients treated with nivolumab were classified into two groups: the tumor remission group, comprising individuals with stable disease (SD) and partial response (PR), and the tumor progression group, consisting of those with early disease progression (ED) and progressive disease (PD).
3.2.1. Machine Learning
This study used Support Vector Machine (SVM), Lasso regression, Linear Discriminant Analysis (LDA), and Random Forest (RF) to model and predict the efficacy of nivolumab treatment for NSCLC, aiming to compare the predictive performance and applicability of the different algorithms.
SVMs generalize well on high-dimensional, small-sample data, making them particularly suitable for medical datasets characterized by numerous features but limited samples. In this study, we employed the Radial Basis Function (RBF) kernel, with two key hyperparameters: the penalty coefficient C and the kernel parameter γ. The initial parameter ranges were set as C ∈ {0.1, 1, 10, 100} and γ ∈ {0.01, 0.1, 1}, which were optimized via grid search on the training set.
Lasso regression performs simultaneous feature selection and regression modeling, making it well suited to high-dimensional sparse data. By applying L1 regularization, it eliminates irrelevant variables while limiting model overfitting. We determined the optimal regularization parameter α within the range (0.001, 1) through cross-validation.
As a classical classification method, LDA enhances class separability by maximizing the ratio of between-class to within-class variance. Given that several preprocessed variables approximately satisfied the Gaussian distribution assumption, LDA was used to construct a low-dimensional discriminant space, thereby improving classification performance.
RF is an ensemble learning method that excels at capturing nonlinear feature relationships and variable interactions. Our implementation configured 100 decision trees with maximum depths ranging from 5 to 15, with the optimal parameter combination identified through grid search.
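The SVM tuning described above can be sketched with scikit-learn. This is a minimal illustration, not the authors' code: the data are synthetic stand-ins generated with `make_classification`, the feature count of 50 is arbitrary, and the choice of ROC AUC as the selection metric and of in-pipeline standardization are assumptions. Only the C and γ grids and the 5-fold scheme come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the miRNA expression matrix (NOT the GSE207715 data):
# 118 training samples, 50 hypothetical features.
X_train, y_train = make_classification(
    n_samples=118, n_features=50, n_informative=8, random_state=0
)

# Grid taken from the text: C in {0.1, 1, 10, 100}, gamma in {0.01, 0.1, 1}.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.01, 0.1, 1],
}

# Scaling inside the pipeline is refit on each training fold,
# so validation folds never leak into the scaler statistics.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="roc_auc")
search.fit(X_train, y_train)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

The same pattern would apply to the Random Forest depth grid by swapping the estimator and the parameter names.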
The 170 samples were divided into a training dataset (73 positive/45 negative) and an independent test dataset (32 positive/20 negative). A range of machine learning techniques were employed to develop the model based on the selected features in Section 2. The optimal algorithm was selected based on performance. To identify the best hyperparameters, 5-fold cross-validation was conducted on the training dataset. The data were partitioned into five subsets, with four subsets used for training and one for validation. This procedure was repeated five times, so that each subset served once as the validation fold. The average results from these five iterations (e.g., accuracy, AUC) were computed to assess the overall model performance. Hyperparameters were fine-tuned according to the model’s performance on the validation folds. Model performance was assessed through Receiver Operating Characteristic (ROC) curve analysis, with the area under the curve (AUC) computed for quantitative evaluation. Furthermore, accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were determined at the threshold that maximized the Youden index (sensitivity + specificity - 1).
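As an illustration of the threshold rule, the following sketch scans candidate thresholds and keeps the one that maximizes the Youden index. The labels and scores are invented toy values, and the function names are hypothetical; the authors' implementation is not given in the text.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_optimal_threshold(y_true, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy labels and scores, invented for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.30, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]
threshold, j = youden_optimal_threshold(y_true, scores)
print(threshold, j)
```

On this toy input the scan selects a threshold of 0.40, where sensitivity is 1.0 and specificity is 0.75.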
3.2.2. Deep Learning
This study employed a Transformer architecture based on self-attention mechanisms. In contrast to conventional convolutional neural networks (CNNs) or recurrent neural networks (RNNs), Transformers demonstrate superior capabilities in modeling long-range dependencies and parallel data processing—particularly advantageous for analyzing high-dimensional, complex biomedical feature data. By capturing global contextual information, the Transformer enhances the model’s ability to discern deep interrelationships among input features, thereby potentially improving predictive accuracy.
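For reference, the self-attention operation in the standard Transformer formulation can be written as below, where Q, K, and V are the query, key, and value matrices derived from the input features and d_k is the key dimension. These symbols come from the general Transformer literature rather than from this study's implementation, whose exact configuration is not reported here.

```latex
\[
\mathrm{Attention}(Q, K, V)
  = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\]
```

Because every feature position attends to every other position through the product of Q and the transpose of K, the operation captures global dependencies among input features in a single step.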
We implemented a standard Transformer model with an encoder-decoder structure. The encoder, designed for input feature extraction, comprises stacked multi-head self-attention and feed-forward network modules. Training stability was enhanced through residual connections and layer normalization. The decoder incorporates an encoder-decoder attention mechanism to focus on critical segments of the input sequence while preserving its autoregressive generation properties. Positional encoding was integrated to retain sequential positional information in the input data.
During model training, input data underwent standardization to ensure compatibility with network requirements. Labels were formatted as a binary classification task (0/1 labels), maintaining strict one-to-one correspondence with input samples. The dataset was randomly partitioned into training, validation, and test sets at an 8:1:1 ratio, preserving balanced class distribution. The training set was used to optimize model parameters, the validation set to monitor overfitting, and the test set to evaluate final performance. The cross-entropy loss function was employed with the Adam optimizer, initialized at a learning rate of 1e-4 and augmented by a decay strategy. Training proceeded for 100 epochs with early stopping to prevent overfitting. Five-fold cross-validation was implemented on the training set to enhance generalization capability. Model efficacy was assessed through accuracy curves, loss trajectories, confusion matrices, and derived performance metrics for both training and validation phases.
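A class-preserving 8:1:1 partition of the kind described above can be sketched as follows. This is an assumption-laden illustration: it treats the data as one label per sample with the 105 positive and 65 negative counts from Table 1, the function name `stratified_split` and the seed are hypothetical, and the authors' actual partitioning code is not given.

```python
import random


def stratified_split(labels, parts=(8, 1, 1), seed=0):
    """Split sample indices into train/val/test, preserving class ratios."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    total = sum(parts)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Integer arithmetic keeps the split sizes deterministic.
        n_train = len(idx) * parts[0] // total
        n_val = len(idx) * parts[1] // total
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Class sizes assumed from Table 1: 105 positive, 65 negative.
labels = [1] * 105 + [0] * 65
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting each class separately keeps the positive fraction in the training set equal to the overall fraction, which matters when the classes are imbalanced and the held-out sets are small.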
3.3. Bioinformatics Analysis
Based on the high-contributing miRNA features in the model, the efficacy of nivolumab second-line treatment for NSCLC patients was predicted by analyzing the expression levels of relevant miRNAs. The miRNAs with predictive value were extracted, and their potential target genes were predicted using the miRWalk 2.0 database. Additional databases, including miRDB, miRMap, RNA22, miRanda, and TargetScan, were used to support prediction of the putative target genes. Experimentally validated target genes were downloaded from the validated target gene module, and the union of predicted and experimentally validated target genes was used to screen for differentially expressed miRNA-target gene pairs (higher scores indicate greater matching probabilities; a threshold of ≥0.95 was set as the screening criterion). Cytoscape 3.10.3 was used to display the network and to investigate potential associations among miRNAs. Pathway enrichment analysis was conducted using the R programming language, with a Benjamini-Hochberg (BH)-corrected p-value of <0.05 as the threshold.
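The BH correction applied to the enrichment p-values can be illustrated with a short pure-Python sketch. The raw p-values below are hypothetical, and the function is a generic implementation of the Benjamini-Hochberg step-up adjustment rather than the R routine used in the study.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw enrichment p-values, for illustration only.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

In this toy example the third and fourth terms have raw p-values below 0.05 but adjusted values of about 0.051, so they would not pass the BH-corrected threshold.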
3.4. Survival Analysis
We utilized miRNA expression data from tumors in the TCGA database to perform survival analysis and explore its correlation with patient prognosis. First, we preprocessed the miRNA expression data by removing low-expression or missing miRNAs. Subsequently, the Cox proportional hazards regression model was employed to evaluate the association between miRNA expression and patient survival, encompassing both overall survival and progression-free survival. In the modeling process, univariate Cox regression analysis was initially performed to identify miRNAs linked to survival, followed by multivariate Cox regression analysis to identify independent prognostic factors. Ultimately, Kaplan-Meier survival curves were constructed to estimate survival probabilities at various time points, providing a visual representation of survival outcomes through stepwise curves. The Kaplan-Meier method is a non-parametric statistical approach used to estimate the probability of events (such as death or disease progression) over time, with a key focus on handling censored data, i.e., individuals for whom the event did not occur during the study period.
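The Kaplan-Meier product-limit estimate and its handling of censoring can be sketched in a few lines. The follow-up times and event flags below are invented for illustration and are not TCGA values.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    curve = []
    survival = 1.0
    for t in sorted(set(times)):
        # Censored subjects still count as at risk up to their censoring time.
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Toy follow-up times (months) and event flags, invented for illustration.
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]
km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(t, round(s, 4))
```

The curve drops only at observed event times; censored subjects shrink the at-risk denominator for later times without producing a step, which is how the method uses partial follow-up information.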
3.5. Statistical Analysis
This study explored the relationships within the data through differential analysis and correlation analysis. In the differential analysis, the independent-samples t-test was employed for normally distributed data, whereas the Mann-Whitney U test was used for non-normally distributed data. In the correlation analysis, we selected the method based on the distribution of the data: for normally distributed data, the Pearson correlation coefficient was used to evaluate the linear association between variables; for non-normally distributed data, the Spearman rank correlation coefficient was used to assess the monotonic relationship between variables. Normality was assessed beforehand using the Kolmogorov-Smirnov test. All statistical analyses used a significance threshold of P < 0.05.
4. Results
4.1. Baseline
Table 1 presents the baseline characteristics of the enrolled samples in the GSE207715 dataset.
Table 1.
The baseline information of patients in GSE207715.
| Characteristic | Category (n) | Negative (n = 65) | Positive (n = 105) | P-value |
|---|---|---|---|---|
| Age | | 67.09 ± 9.22 | 64.90 ± 10.00 | 0.2409 |
| Histotype | Other (11) | 2 | 9 | 0.1353 |
| | Squamous cell carcinoma (37) | 11 | 26 | |
| | Adenocarcinoma (122) | 52 | 70 | |
| Gender | Female (51) | 20 | 31 | 1.0 |
| | Male (119) | 45 | 74 | |
| Smoking habit | Never smoker (15) | 7 | 8 | 0.6676 |
| | Former smoker (74) | 26 | 48 | |
| | Current smoker (81) | 32 | 49 | |
| Metastatic sites | 1 (39) | 18 | 21 | 0.293 |
| | 2 (56) | 23 | 33 | |
| | 3 (43) | 15 | 28 | |
| | 4 (20) | 7 | 13 | |
| | 5 (7) | 1 | 6 | |
| | 6 (4) | 0 | 4 | |
| | 7 (1) | 1 | 0 | |
| Brain metastases | No (149) | 62 | 87 | 0.0298 |
| | Yes (21) | 3 | 18 | |
| Liver metastases | No (132) | 56 | 76 | 0.0567 |
| | Yes (38) | 9 | 29 | |
| Prior lines of treatment | 1 (85) | 32 | 53 | 0.7623 |
| | 2 (48) | 17 | 31 | |
| | 3 (21) | 8 | 13 | |
| | 4 (10) | 6 | 4 | |
| | 5 (4) | 1 | 3 | |
| | 6 (2) | 1 | 1 | |
| Performance status | 0 (48) | 25 | 23 | 0.002 |
| | 1 (99) | 38 | 61 | |
| | 2 (23) | 2 | 21 | |
4.2. Correlation Analysis of Nivolumab Treatment Efficacy Based on Exosomal miRNA
The patients receiving nivolumab treatment were divided into two groups: the tumor response group, which includes PR and SD, and the tumor progression group, which includes ED and PD. The results of the miRNA correlation analysis are presented in Figure 1.
Figure 1.
Nivolumab treatment efficacy and the expression of exosomal miRNAs.
Numerous exosomal miRNAs were differentially expressed between patient groups with different nivolumab treatment outcomes (p < 0.05). However, the magnitude of these differences was relatively small, with most miRNAs having log2FC values below 1. This suggests that, although exosomal miRNA expression does differ among patients with varying nivolumab treatment responses, the limited size of the differences makes it difficult to predict treatment efficacy from statistical comparisons of miRNA expression levels alone. Therefore, using exosomal miRNA expression features on their own in this way may not provide sufficient clinical predictive value.
4.2. Exosomal miRNA-Based Artificial Intelligence Model for Nivolumab Treatment Efficacy
4.2.1. Exosomal miRNA-Based Machine Learning for Nivolumab Treatment Efficacy
The feature selection of the SVM, Lasso, and LDA models identified the following high-contribution plasma exosomal miRNAs: hsa-let-7g-5p, hsa-miR-22-3p, hsa-miR-146a-5p, hsa-miR-30b-5p, hsa-miR-30e-5p, hsa-miR-23a-3p, hsa-let-7c-5p, and hsa-miR-4745-5p. The RF model’s feature selection also included hsa-miR-146a-5p, hsa-miR-22-3p, hsa-miR-23a-3p, and hsa-miR-4745-5p. Due to the overfitting risk associated with RF, the other high-contribution miRNAs selected only by this model were not considered. The ROC curves and selected features of each model are shown in Figure 2.
Figure 2.
ROC curves of four machine learning models and their top contributing features: A) Support Vector Machine (SVM), B) Lasso Regression, C) Linear Discriminant Analysis (LDA), D) Random Forest (RF).
The evaluation outcomes on the test set, presented in Table 2, show the performance of the four models. Although the models performed strongly on the training set, their performance on the test set fell well below expectations. Not only was the overall performance poor, but the differences between the four models were also small. This suggests that the models likely overfit patterns in the training data, so that their generalization to the test set was poor and they failed to apply effectively to new samples.
Table 2.
AUC, Accuracy, NPV, PPV, Sensitivity, and Specificity of the Four Machine Learning Methods on the Test Set.
| model | AUC | Accuracy | NPV | PPV | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| SVM | 0.6391 | 0.6731 | 0.5652 | 0.7586 | 0.6875 | 0.6500 |
| Lasso | 0.6266 | 0.6731 | 0.5714 | 0.7419 | 0.7188 | 0.6000 |
| LDA | 0.6500 | 0.6923 | 0.6111 | 0.7353 | 0.7812 | 0.5500 |
| RF | 0.6727 | 0.6731 | 0.5484 | 0.8571 | 0.5625 | 0.8500 |
* AUC, area under the curve; NPV, negative predictive value; PPV, positive predictive value.
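The Table 2 metrics are mutually consistent with the 32 positive and 20 negative test samples, which can be checked with a short calculation. The confusion-matrix counts below are not reported in the text: they are inferred here by multiplying each reported sensitivity by 32 and each specificity by 20, and should be read as a reconstruction under that assumption.

```python
def metrics(tp, fn, tn, fp):
    """Return accuracy, NPV, PPV, sensitivity, specificity from counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "npv": tn / (tn + fn),
        "ppv": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Counts inferred (not reported) from Table 2 and the 32/20 test split:
# TP = sensitivity * 32, TN = specificity * 20.
inferred = {
    "SVM": dict(tp=22, fn=10, tn=13, fp=7),
    "Lasso": dict(tp=23, fn=9, tn=12, fp=8),
    "LDA": dict(tp=25, fn=7, tn=11, fp=9),
    "RF": dict(tp=18, fn=14, tn=17, fp=3),
}
table2 = {name: metrics(**counts) for name, counts in inferred.items()}
for name, m in table2.items():
    print(name, {k: round(v, 4) for k, v in m.items()})
```

Under these inferred counts, the four models differ by only one or two correctly classified samples out of 52, which is in line with the small performance differences noted above.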
4.2.2. Exosomal miRNA-Based Deep Learning for Nivolumab Treatment Efficacy
At the 46th epoch, the model achieved its highest accuracy on both the validation and test sets, with an accuracy of 0.9706 on the validation set and 0.9412 on the test set. This result indicates that the model improved its generalization ability during training and maintained high prediction accuracy across different datasets. The accuracy curves for the training and validation sets are shown in Figure 3A, and the loss curves in Figure 3B. These curves show that training accuracy rose as training loss fell and that the validation curves followed the same trend, with no obvious signs of overfitting. This suggests that the model learned the patterns in the data during training and steadily improved its performance. Moreover, the deep learning model’s accuracy was substantially higher than that of the traditional machine learning models, supporting its advantages when handling complex data.
Figure 3.
Performance of the deep learning model based on exosomal miRNA for nivolumab treatment efficacy. A) Accuracy curves on the training and validation sets. B) Loss curves on the training and validation sets.
The evaluation metrics for the validation and test sets were consistent, with variations of less than 5%, suggesting that the model has strong stability and generalization capability. On the validation set, the model’s accuracy was 0.9706, precision was 0.9721, recall was 0.9706, and the F1-score was 0.9705. On the test set, accuracy was 0.9412, precision was 0.9461, recall was 0.9412, and the F1-score was 0.9398. Although there were some differences between the results on the validation and test sets, the overall consistency suggests that the model performs stably and reliably across different datasets. The confusion matrices for the validation and test sets are presented in Figure 4, where the classification results can be examined in more detail.
Figure 4.
Confusion matrices of the deep learning model. A) Confusion matrix on the validation set. B) Confusion matrix on the test set.
4.3. Target Gene Screening and Enrichment Analysis
A total of 198 target genes regulated by the identified miRNAs were obtained. GO and KEGG pathway enrichment analyses were performed for miRNAs regulating at least five target genes, and the results are shown in Figure 5A. In the KEGG enrichment analysis, target genes regulated by hsa-let-7c-5p were mainly enriched in the p53 signaling pathway, microRNAs in cancer, hepatocellular carcinoma, proteoglycans in cancer, and chronic myeloid leukemia. Target genes regulated by hsa-let-7g-5p were primarily enriched in the JAK-STAT signaling pathway. Target genes regulated by hsa-miR-22-3p were mainly enriched in transcriptional misregulation in cancer, the Ras signaling pathway, Salmonella infection, acute myeloid leukemia, and bacterial invasion of epithelial cells. Target genes regulated by hsa-miR-4745-5p were significantly enriched in nitrogen metabolism, glyoxylate and dicarboxylate metabolism, arginine biosynthesis, and alanine, aspartate and glutamate metabolism, in addition to the HIV-1 life cycle. As shown in Figure 5B, in the GO enrichment analysis, the target genes regulated by hsa-miR-4745-5p were mainly enriched in (palmitoyl)-protein hydrolase activity, unmethylated CpG binding, palmitoyl hydrolase activity, siRNA binding, and U6 snRNA binding. No significant KEGG or GO enrichment was observed for the target genes regulated by the other miRNAs.
Figure 5.
miRNA-target gene network and enrichment analysis. A) The network based on the 8 miRNAs and 198 target genes. Yellow circular nodes: screened miRNAs; blue circular nodes: target genes. B) Results of the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses for the miRNAs with at least five target genes. Horizontal axis: miRNAs, with the number of genes in parentheses; vertical axis: names of GO terms or KEGG pathways; dot size: GeneRatio, where a larger dot indicates a higher ratio for that term; a smaller p-value indicates higher significance.
4.4. miRNA-based Survival Analysis in Lung Cancer Patients
To explore the relationship between miRNA and survival duration in lung cancer patients, we performed a univariate risk analysis of the miRNAs expressed within the tumors and the patients’ survival time, as shown in Figure 6.
Figure 6.
Expression of tumor-intrinsic miRNAs and patient survival.
The expression levels of tumor-intrinsic miRNAs show a strong correlation with patient survival, with significant associations between the expression of various miRNAs and survival time. Based on this, we further analyzed the features in the exosome-based deep learning model for nivolumab efficacy prediction, assessing their risk in survival time. The results are shown in Table 3. In this model, considering the impact of individual variables on survival time, hsa-let-7c, hsa-let-7g, hsa-miR-146a, hsa-miR-22, hsa-miR-23a, hsa-miR-30b, and hsa-miR-30e were all statistically significant.
Table 3.
Univariate risk analysis of features in the nivolumab efficacy deep learning model and survival time.
| miRNA | coef | HR | HR lower 95% | HR upper 95% | p |
|---|---|---|---|---|---|
| hsa-let-7c | -0.1176 | 0.8891 | 0.8195 | 0.9646 | *0.0047 |
| hsa-let-7g | -0.1714 | 0.8425 | 0.7206 | 0.9851 | *0.0317 |
| hsa-mir-146a | -0.0852 | 0.9184 | 0.8437 | 0.9997 | *0.0491 |
| hsa-mir-22 | 0.1955 | 1.2159 | 1.0295 | 1.4361 | *0.0213 |
| hsa-mir-23a | 0.1695 | 1.1847 | 1.0204 | 1.3754 | *0.0261 |
| hsa-mir-30b | -0.1295 | 0.8786 | 0.7992 | 0.9658 | *0.0074 |
| hsa-mir-30e | -0.1731 | 0.8411 | 0.7140 | 0.9907 | *0.0383 |
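In a Cox model the hazard ratio is the exponential of the regression coefficient, so the HR column of Table 3 can be recomputed from the coef column as a quick consistency check. The values below are copied from Table 3; the variable names are only illustrative.

```python
import math

# (coef, reported HR) pairs copied from Table 3.
table3 = {
    "hsa-let-7c": (-0.1176, 0.8891),
    "hsa-let-7g": (-0.1714, 0.8425),
    "hsa-mir-146a": (-0.0852, 0.9184),
    "hsa-mir-22": (0.1955, 1.2159),
    "hsa-mir-23a": (0.1695, 1.1847),
    "hsa-mir-30b": (-0.1295, 0.8786),
    "hsa-mir-30e": (-0.1731, 0.8411),
}

# HR = exp(coef); HR < 1 means higher expression goes with lower hazard.
recomputed = {name: math.exp(coef) for name, (coef, _) in table3.items()}
max_gap = max(abs(recomputed[n] - table3[n][1]) for n in table3)
protective = sorted(n for n, hr in recomputed.items() if hr < 1)
print(round(max_gap, 5), protective)
```

The recomputed ratios agree with the reported ones to within rounding of the coefficients. Five miRNAs have HR below 1 (higher expression associated with lower hazard), while hsa-mir-22 and hsa-mir-23a have HR above 1.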
In the multivariate analysis, only hsa-let-7c remained statistically significant (p = 0.0152), and its protective effect on survival persisted after adjusting for other factors, as shown in Table 4. The other miRNAs were no longer significant (p > 0.05). hsa-let-7g shifted from protective (HR < 1) to non-significant, likely because its univariate effect was attenuated after adjustment for the other miRNAs. hsa-miR-146a changed only slightly, but lost significance. hsa-miR-22, hsa-miR-23a, hsa-miR-30b, and hsa-miR-30e all became non-significant, suggesting that their association with survival may be driven by other factors.
Table 4.
Multivariate risk analysis of features in the nivolumab efficacy deep learning model and survival time.
| miRNA | coef | HR | HR lower 95% | HR upper 95% | p |
|---|---|---|---|---|---|
| hsa-let-7c | -0.1035 | 0.9017 | 0.8294 | 0.9802 | *0.0152 |
| hsa-let-7g | 0.0282 | 1.0286 | 0.8508 | 1.2437 | 0.7706 |
| hsa-mir-146a | -0.0795 | 0.9236 | 0.8387 | 1.0170 | 0.1059 |
| hsa-mir-22 | 0.1424 | 1.1530 | 0.9590 | 1.3864 | 0.1299 |
| hsa-mir-23a | 0.1299 | 1.1388 | 0.9629 | 1.3467 | 0.1290 |
| hsa-mir-30b | -0.0761 | 0.9268 | 0.8343 | 1.0294 | 0.1560 |
| hsa-mir-30e | -0.1051 | 0.9002 | 0.7424 | 1.0916 | 0.2852 |
To determine the optimal expression cutoff for hsa-let-7c, the miRNA expression values were arranged in ascending order and each value was treated as a candidate cutoff that splits patients into high- and low-expression groups. For each candidate, the log-rank test was applied to compare the survival curves of the two groups and to calculate the p-value; whenever a smaller p-value was obtained, the optimal cutoff was updated. The results are shown in Figure 7, with a cutoff value of 8.7843 for five-year survival risk and 8.6952 for ten-year survival risk.
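The cutoff scan can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions: the expression values, follow-up times, and event flags are invented toy data, the `min_group` guard against very small groups is an addition not mentioned in the text, and the p-value uses the chi-square approximation with one degree of freedom.

```python
import math


def logrank_p(times, events, in_high):
    """Two-group log-rank test p-value (chi-square, 1 degree of freedom)."""
    observed = expected = variance = 0.0
    for t in sorted({x for x, e in zip(times, events) if e == 1}):
        n = sum(1 for x in times if x >= t)
        n_high = sum(1 for x, g in zip(times, in_high) if x >= t and g)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d_high = sum(
            1 for x, e, g in zip(times, events, in_high)
            if x == t and e == 1 and g
        )
        observed += d_high
        expected += d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed - expected) ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)


def best_cutoff(expression, times, events, min_group=2):
    """Scan sorted expression values; keep the cutoff with the smallest p."""
    best = (None, 1.0)
    for cutoff in sorted(set(expression)):
        in_high = [x > cutoff for x in expression]
        n_high = sum(in_high)
        if min(n_high, len(in_high) - n_high) < min_group:
            continue  # hypothetical guard: skip very unbalanced splits
        p = logrank_p(times, events, in_high)
        if p < best[1]:
            best = (cutoff, p)
    return best


# Toy data, invented for illustration only (not TCGA values).
expression = [8.1, 8.2, 8.3, 8.4, 8.8, 8.9, 9.0, 9.1]
times = [12, 5, 10, 8, 50, 40, 60, 45]
events = [1, 1, 1, 1, 0, 0, 0, 1]
cutoff, p_value = best_cutoff(expression, times, events)
print(cutoff, round(p_value, 4))
```

On this toy input the scan selects 8.4, the value that separates the short-survival low-expression patients from the long-survival high-expression patients.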
Figure 7.
Kaplan-Meier Analysis of 5- and 10-Year Survival Rates. A) 5-year Kaplan-Meier (KM) survival curve and B) 10-year Kaplan-Meier (KM) survival curve.
5. Discussion
This study is based on the GSE207715 dataset from the GEO database, utilizing sequencing data from the GPL25134 platform. We extracted miRNA features from plasma samples and performed correlation analysis to identify features highly associated with treatment outcomes. Subsequently, models were developed using both machine learning and deep learning approaches to predict exosomal molecules related to the prognosis of NSCLC patients receiving nivolumab treatment. Through bioinformatics analysis, we further identified the most contributing miRNAs in the model and their target genes, and explored the complex effects of these genes on cell pathways and lung cancer treatment outcomes. Finally, we evaluated the relationship between the miRNA features in the model and the survival time, aiming to identify potential stable biomarkers.
Cancer is one of the most challenging diseases in modern medicine, with unresolved problems in early diagnosis, treatment, and prognosis, and the discovery of exosomes has opened new avenues for addressing them. In cancers of other organ systems, machine learning has been used increasingly in recent years to identify exosomal cancer-related biomarkers. For instance, Yin H et al. ( 17 ) used machine learning to identify key proteomic features in serum exosomes, establishing a promising exosome-related random forest (RF) model for colorectal cancer diagnosis; other researchers used random forest algorithms to accurately predict the prognosis of liver cancer patients based on 13 exosomal gene features ( 18 ); Yap et al. ( 19 ) employed machine learning techniques to analyze 114,602 exosomal RNAs for predicting hepatocellular carcinoma (HCC), finding nine exosomal RNAs with potential as minimally invasive biomarkers for clinical HCC; and Cui et al. ( 20 ) constructed an ovarian cancer exosome-related lncRNA signature using multivariable Cox regression to predict prognostic risk and immunotherapy response. A number of studies have thus substantiated the viability of using exosomes for early detection and for assessing tumor proliferation, metastasis, and prognosis. Researchers have also used machine learning to predict abnormal responses to nivolumab monotherapy in NSCLC based on circulating microRNAs ( 21 ). In this study, we found that although exosomal miRNAs are correlated with therapeutic efficacy, the differences between subgroups with different treatment outcomes are relatively small. With a limited sample size, these differences were difficult to capture with conventional machine learning, whereas the deep learning model, with its stronger feature extraction ability, produced a more accurate predictive model.
Exosomes are involved in lung cancer and have been studied in relation to early diagnosis, proliferation, metastasis, and prognosis. Zhang et al. ( 22 ) identified upregulated miRNAs (miR-500a-3p, miR-501-3p, and miR-502-3p) in plasma exosomes, revealing the potential of plasma exosomes for early lung cancer diagnosis. Another research team developed a diagnostic model based on plasma exosomal miRNAs to distinguish between benign and malignant pulmonary nodules ( 23 ); Shin et al. ( 24 ) verified the feasibility of using human plasma exosomes as potential tumor-associated biomarkers with deep learning algorithms. In studies on exosomes and lung cancer proliferation, researchers found that platelet exosomes transfer CD41 to the surface of lung cancer cells, inducing the expression of cyclin D2 in these cells. Moreover, the level of miR-660-5p in plasma from NSCLC patients was significantly elevated ( 11 ); researchers also found that exosomal miR-126 could serve as a circulating biomarker that regulates cancer progression in NSCLC ( 8 ). In terms of treatment and prognosis, Zhang et al. ( 25 ) discovered that exosomes and circRNA_101093 are critical for desensitizing lung adenocarcinoma cells to ferroptosis, so blocking exosomes may help in the treatment of lung adenocarcinoma in the future. A panel of miRNAs from serum exosomes can be used as diagnostic and prognostic biomarkers for small cell lung cancer ( 26 ); another study showed that molecules carried by exosomes from small cell lung cancer cells can also serve as prognostic indicators ( 27 ). Zhang et al. ( 28 ) recently identified 13 key diagnostic biomarkers in exosomes from small cell lung cancer patients and constructed a prognostic model for this type of cancer. In this study, we found that exosomal miRNAs associated with therapeutic efficacy in tumors are also significantly correlated with patient survival.
Because hsa-let-7c is present in both tumor cells and exosomes, and is correlated with both therapeutic efficacy and survival time, it has potential as a stable biomarker.
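To make the survival association concrete, the sketch below fits a Cox proportional hazards model with a single covariate by Newton-Raphson maximization of the partial likelihood. It is a simplified stand-in and not the multivariate model used in this study: it assumes one covariate and no tied event times, the function name `fit_cox_univariate` is hypothetical, and the follow-up times, event indicators, and expression values are invented toy numbers rather than patient data.

```python
import math

def fit_cox_univariate(times, events, x, n_iter=50, tol=1e-9):
    """Fit a one-covariate Cox model; returns (beta, standard_error)."""
    beta = 0.0
    info = 0.0
    for _ in range(n_iter):
        grad, info = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored subjects contribute only via risk sets
            # Risk set: everyone still under observation at time t_i.
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            grad += x[i] - s1 / s0             # score contribution
            info += s2 / s0 - (s1 / s0) ** 2   # observed information
        if info <= 0:
            break  # no variation in any risk set: beta is not identifiable
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    se = 1 / math.sqrt(info) if info > 0 else float("inf")
    return beta, se

# Hypothetical toy cohort (not patient data): time in months,
# event 1 = death observed, 0 = censored, and one miRNA expression value.
times = [2, 3, 5, 6, 8, 9, 12, 14]
events = [1, 1, 1, 0, 1, 1, 0, 1]
expression = [2.1, 1.0, 1.8, 1.5, 1.2, 0.6, 0.9, 0.7]

beta, se = fit_cox_univariate(times, events, expression)
hazard_ratio = math.exp(beta)  # hazard ratio per unit increase in expression
```

A positive coefficient corresponds to a hazard ratio above 1, meaning higher expression is associated with shorter survival in the toy data. In practice, an established survival analysis package with tie handling and multivariate support would be preferable to this hand-written fit.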
Despite these findings, this study has several limitations. First, the diversity and scope of the dataset are limited. While the dataset is sufficient for a preliminary analysis of exosomal miRNAs, it is small compared with large databases, which limits the statistical power and generalizability of the results. Second, the data come from a single source and lack diversity across regions and ethnic groups, so the findings require further validation. Third, the study relies primarily on bioinformatics analysis and lacks functional experimental validation, which prevents the establishment of causal relationships and makes it difficult to fully assess the biological significance of the results. Fourth, exosomal miRNAs may change dynamically during disease progression, but the samples in this study did not capture time-series information. Finally, the absence of an independent validation set limits the generalizability of the models. Future research could incorporate multi-center data, integrate experimental validation, and examine temporal changes in exosomal miRNAs.
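The effect of sample size on the certainty of a reported accuracy can be illustrated with a Wilson score interval for a proportion. The sketch below is only an illustration: the function name `wilson_interval` and the counts are hypothetical and are not taken from this study. The counts were chosen to give the same accuracy of about 0.94 on a small and on a larger held-out set.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives ~95% coverage)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half

# Hypothetical counts, not taken from the study: the same ~0.94 accuracy
# measured on a small and on a ten times larger held-out set.
small = wilson_interval(16, 17)
large = wilson_interval(160, 170)

small_width = small[1] - small[0]
large_width = large[1] - large[0]
```

The interval for the small set is much wider than for the larger one, which is why an independent and larger validation cohort would strengthen any accuracy estimate obtained from a limited sample.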
6. Conclusion
This research developed a prognostic prediction model for NSCLC patients undergoing nivolumab therapy, utilizing plasma miRNA profiles as features. We identified 7 treatment-related miRNAs and their 198 target genes, further demonstrating the potential of miRNAs as prognostic biomarkers for treatment outcomes. Through bioinformatics analysis, we uncovered key miRNAs and their target genes associated with lung cancer treatment and validated the effectiveness of this model in assessing patient survival. These findings provide new insights for the early diagnosis, prognostic assessment, and personalized treatment of lung cancer.
Acknowledgments
Conflict of Interests
The authors declared no conflict of interest.
Funding
This work was supported by the Natural Science Foundation of Hunan Province - Science and Health Joint Project (No.2022JJ70160).
References
- 1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74(3):229–263. doi: 10.3322/caac.21834.
- 2. Srivastava S, Mohanty A, Nam A, Singhal S, Salgia R. Chemokines and NSCLC: Emerging role in prognosis, heterogeneity, and therapeutics. Semin Cancer Biol. 2022;86(Pt2):233–246. doi: 10.1016/j.semcancer.2022.06.010.
- 3. Chen P, Liu Y, Wen Y, Zhou C. Non-small cell lung cancer in China. Cancer Commun. 2022;42(10):937–970. doi: 10.1002/cac2.12359.
- 4. Grant C, Hagopian G, Nagasaka M. Neoadjuvant therapy in non-small cell lung cancer. Crit Rev Oncol Hemat. 2023;190:104080. doi: 10.1016/j.critrevonc.2023.104080.
- 5. Karbasi S, Erfanian N, Dehghan H, Zarban A, Namaei MH, Hanafi-Bojd MY, et al. Assessment of the Anti-cancer Effects of Camel Milk Exosomes (CMEXOs) on Murine Colorectal Cancer Cell Line (CT-26). Iran J Allergy Asthm. 2024;23(3):321–329. doi: 10.18502/ijaai.v23i3.15641.
- 6. Kalluri R, LeBleu VS. The biology, function, and biomedical applications of exosomes. Science. 2020;367(6478). doi: 10.1126/science.aau6977.
- 7. Wortzel I, Dror S, Kenific CM, Lyden D. Exosome-Mediated Metastasis: Communication from a Distance. Dev Cell. 2019;49(3):347–360. doi: 10.1016/j.devcel.2019.04.011.
- 8. Grimolizzi F, Monaco F, Leoni F, Bracci M, Staffolani S, Bersaglieri C, et al. Exosomal miR-126 as a circulating biomarker in non-small-cell lung cancer regulating cancer progression. Sci Rep. 2017;7(1):15277. doi: 10.1038/s41598-017-15475-6.
- 9. Hsu YL, Hung JY, Chang WA, Lin YS, Pan YC, Tsai PH, et al. Hypoxic lung cancer-secreted exosomal miR-23a increased angiogenesis and vascular permeability by targeting prolyl hydroxylase and tight junction protein ZO-1. Oncogene. 2017;36(34):4929–4942. doi: 10.1038/onc.2017.105.
- 10. Liu Y, Luo F, Wang B, Li H, Xu Y, Liu X, et al. STAT3-regulated exosomal miR-21 promotes angiogenesis and is involved in neoplastic processes of transformed human bronchial epithelial cells. Cancer Lett. 2016;370(1):125–135. doi: 10.1016/j.canlet.2015.10.011.
- 11. Fabbri M, Paone A, Calore F, Galli R, Gaudio E, Santhanam R, et al. MicroRNAs bind to Toll-like receptors to induce prometastatic inflammatory response. Proc Natl Acad Sci USA. 2012;109(31):E2110–E2116. doi: 10.1073/pnas.1209414109.
- 12. Qi Y, Zha W, Zhang W. Exosomal miR-660-5p promotes tumor growth and metastasis in non-small cell lung cancer. J BUON. 2019;24(2):599–607.
- 13. Tang YT, Huang YY, Li JH, Qin SH, Xu Y, An TX, et al. Correction: Alterations in exosomal miRNA profile upon epithelial-mesenchymal transition in human lung cancer cell lines. BMC Genomics. 2023;24(1):753. doi: 10.1186/s12864-023-09810-7.
- 14. Zhuang G, Wu X, Jiang Z, Kasman I, Yao J, Guan Y, et al. Tumour-secreted miR-9 promotes endothelial cell migration and angiogenesis by activating the JAK-STAT pathway. EMBO J. 2012;31(17):3513–3523. doi: 10.1038/emboj.2012.183.
- 15. Xu K, Zhang C, Du T, Gabriel A, Wang X, Li X, et al. Progress of exosomes in the diagnosis and treatment of lung cancer. Biomed Pharmacother. 2021;134:111111. doi: 10.1016/j.biopha.2020.111111.
- 16. Liu Y, Cai C, Xu W, Li B, Wang L, Peng Y, et al. Interpretable Machine Learning-Aided Optical Deciphering of Serum Exosomes for Early Detection, Staging, and Subtyping of Lung Cancer. Anal Chem. 2024;96(41):16227–16235. doi: 10.1021/acs.analchem.4c02914.
- 17. Yin H, Xie J, Xing S, Lu X, Yu Y, Ren Y, et al. Machine learning-based analysis identifies and validates serum exosomal proteomic signatures for the diagnosis of colorectal cancer. Cell Rep Med. 2024;5(8):101689. doi: 10.1016/j.xcrm.2024.101689.
- 18. Zhu K, Tao Q, Yan J, Lang Z, Li X, Li Y, et al. Machine learning identifies exosome features related to hepatocellular carcinoma. Front Cell Dev Biol. 2022;10:1020415. doi: 10.3389/fcell.2022.1020415.
- 19. Yap J, Goh L, Lim A, Chong SS, Lim LJ, Lee CG. Machine Learning Identifies a Signature of Nine Exosomal RNAs That Predicts Hepatocellular Carcinoma. Cancers. 2023;15(14). doi: 10.3390/cancers15143749.
- 20. Cui Y, Zhang W, Lu W, Feng Y, Wu X, Zhuo Z, et al. An exosome-derived lncRNA signature identified by machine learning associated with prognosis and biomarkers for immunotherapy in ovarian cancer. Front Immunol. 2024;15:1228235. doi: 10.3389/fimmu.2024.1228235.
- 21. Zhang Y, Goto Y, Yagishita S, Shinno Y, Mizuno K, Watanabe N, et al. Machine learning-based exceptional response prediction of nivolumab monotherapy with circulating microRNAs in non-small cell lung cancer. Lung Cancer. 2022;173:107–115. doi: 10.1016/j.lungcan.2022.09.004.
- 22. Zhang JT, Qin H, Man CF, Su J, Zhang DD, Liu SY, et al. Plasma extracellular vesicle microRNAs for pulmonary ground-glass nodules. J Extracell Vesicles. 2019;8(1):1663666. doi: 10.1080/20013078.2019.1663666.
- 23. Zheng D, Zhu Y, Zhang J, Zhang W, Wang H, Chen H, et al. Identification and evaluation of circulating small extracellular vesicle microRNAs as diagnostic biomarkers for patients with indeterminate pulmonary nodules. J Nanobiotechnol. 2022;20(1):172. doi: 10.1186/s12951-022-01366-0.
- 24. Shin H, Oh S, Hong S, Kang M, Kang D, Ji YG, et al. Early-Stage Lung Cancer Diagnosis by Deep Learning-Based Spectroscopic Analysis of Circulating Exosomes. ACS Nano. 2020;14(5):5435–5444. doi: 10.1021/acsnano.9b09119.
- 25. Zhang X, Xu Y, Ma L, Yu K, Niu Y, Xu X, et al. Essential roles of exosome and circRNA_101093 on ferroptosis desensitization in lung adenocarcinoma. Cancer Commun. 2022;42(4):287–313. doi: 10.1002/cac2.12275.
- 26. Kim DH, Park H, Choi YJ, Im K, Lee CW, Kim DS, et al. Identification of exosomal microRNA panel as diagnostic and prognostic biomarker for small cell lung cancer. Biomark Res. 2023;11(1):80. doi: 10.1186/s40364-023-00517-1.
- 27. Cao B, Wang P, Gu L, Liu J. Use of four genes in exosomes as biomarkers for the identification of lung adenocarcinoma and lung squamous cell carcinoma. Oncol Lett. 2021;21(4):249. doi: 10.3892/ol.2021.12510.
- 28. Zhang K, Zhang C, Wang K, Teng X, Chen M. Identifying diagnostic markers and constructing a prognostic model for small-cell lung cancer based on blood exosome-related genes and machine-learning methods. Front Oncol. 2022;12:1077118. doi: 10.3389/fonc.2022.1077118.







