Scientific Reports. 2025 Sep 29;15:33609. doi: 10.1038/s41598-025-19007-5

Analysis of diagnostic apoptosis-related biomarkers and immune cell infiltration characteristics in endometriosis by integrating bioinformatics and machine learning

Xiulan Weng 1,2,3,4,#, Jingxuan Ye 1,2,3,#, Wenyu Lin 1,2,3, Dingjie Wang 1,2,3, Jinsong Yi 4, Zhenhong Wang 1,4, Pengming Sun 1,2,3,4,5
PMCID: PMC12480475  PMID: 41023129

Abstract

Endometriosis (EMs) is a chronic disease affecting millions of women worldwide, yet its pathogenesis remains unclear and current diagnostic methods are limited. Using EMs datasets from the Gene Expression Omnibus (GEO), this study identified key genes related to apoptosis in EMs through differential expression analysis and machine learning. Nomogram construction, immune infiltration analysis and drug prediction were then performed based on these key genes. Three apoptosis-related genes (FAS, PRKAR2B and CSF2RB) were identified as key genes. The nomogram constructed from these key genes showed good predictive performance. Immune infiltration analysis revealed associations between CSF2RB and activated B cells and immature dendritic cells, while FAS correlated with myeloid-derived suppressor cells (MDSCs). Additionally, potential therapeutic agents targeting these genes were identified. Reverse transcription-quantitative polymerase chain reaction (RT-qPCR) analysis revealed that FAS and CSF2RB expression levels were significantly downregulated in the EMs group compared to controls (P < 0.05). In conclusion, FAS, PRKAR2B and CSF2RB are promising diagnostic biomarkers for EMs and are associated with specific immune cell populations, offering potential targets for future therapeutic interventions.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-19007-5.

Keywords: Endometriosis, Bioinformatics, Machine learning, Apoptosis biomarkers, Immune infiltration

Subject terms: Diagnostic markers

Introduction

Endometriosis (EMs) is a common gynaecological disorder characterised by the presence of endometrial glands and stroma outside the uterine cavity, predominantly affecting women of reproductive age1,2. Its global prevalence is estimated at 5–15%, reaching approximately 10% in China and 30–45% among women with infertility3. A 2022 study estimated the direct treatment costs of EMs at $1,459–$20,239 per patient annually, with indirect costs ranging from $4,572 to $14,0794. EMs is associated with symptoms such as dysmenorrhoea, dyspareunia, menorrhagia and infertility, with infertility affecting 30–40% of patients5. Notably, approximately 30% of affected individuals are asymptomatic.

The current diagnostic gold standard is the surgical visualisation of pelvic organs via laparoscopy. However, despite advances in minimally invasive techniques, laparoscopy remains an invasive procedure requiring general anaesthesia and carries perioperative and postoperative complications6,7. Consequently, the average time from symptom onset to diagnosis is approximately 6–7 years8. Various molecules in biological fluids have been investigated as noninvasive biomarkers for EMs, with serum cancer antigen 125 (CA-125) being widely used for therapeutic monitoring. However, its low specificity and sensitivity prevent it from replacing laparoscopy as a diagnostic tool9. Therefore, there is a critical need for highly specific and sensitive noninvasive biomarkers to facilitate early diagnosis and assess disease severity.

The pathogenesis of EMs involves hormonal, neurological and immunological factors that contribute to ectopic endometrial proliferation and survival10. An impaired immune response, leading to inadequate clearance of refluxed menstrual debris, has been proposed as a key mechanism in disease development. Immune cells, inflammatory mediators and cytokines collectively contribute to an abnormal immune microenvironment in EMs11. Another widely recognised theory suggests that decreased apoptosis of ectopic endometrial cells enhances their survival, promoting disease progression12,13. Studies indicate that the apoptosis rate of shed endometrial cells is significantly reduced in patients with EMs14, suggesting that cells reaching the peritoneal cavity have a higher survival rate in individuals with progressive disease. Accordingly, the relationships between apoptosis-related genes (ARGs) and infiltrating immune cells were investigated here to gain a deeper understanding of the molecular immunological mechanisms involved in the development of EMs.

This study integrates multiple high-throughput sequencing datasets and employs machine learning to identify key ARGs. Additionally, the relationships between ARGs and immune cell infiltration are explored to identify potential diagnostic biomarkers and therapeutic targets for EMs.

Results

Identification and characterization of ARG-related subtypes in EMs

To investigate the relationship between ARGs and EMs subtypes, consensus clustering analysis was performed using the GSE120103 dataset. As the number of clusters (k) was increased from 2 to 5, k = 2 gave the highest intra-group correlation and relatively low inter-group correlation (Fig. 1a,b). A significant change in the relative area under the cumulative distribution function (CDF) curve was also observed as k increased from 2 to 5 (Fig. 1c). The tracking plot further confirmed the clustering stability of the samples at k = 2 (Fig. 1d). These results indicated that the optimal number of clusters was two, with the cohort evenly distributed between cluster 1 (n = 9) and cluster 2 (n = 9). Differential expression analysis identified 33 differentially expressed ARGs (DEGs_ARGs) between cluster 1 and cluster 2 in GSE120103, comprising 17 upregulated and 16 downregulated genes (Fig. 1e,f; Table S1). To explore the biological functions of the DEGs_ARGs, GO and KEGG analyses were performed. The results indicated that these DEGs_ARGs were associated with the execution phase of apoptosis, regulation of programmed necrotic cell death and the tumor necrosis factor (TNF) signalling pathway (Fig. 1g,h). To characterise the types, distribution and functional status of immune cells in the EMs immune microenvironment, we conducted immune cell enrichment analysis, which revealed significant differences in 16 immune cell populations between the two clusters (P < 0.05). Notably, cluster 1 exhibited a significantly higher fraction of enriched immune cells than cluster 2 (P < 0.05) (Fig. 1i,j). Thus, cluster 1 is not only closely related to apoptosis-related signalling pathways but is also accompanied by extensive immune cell infiltration.

Fig. 1.


Identification and characterization of ARG-related subtypes in EMs. (a) Consensus clustering matrix when k = 2. (b) Relative alterations in CDF delta area curves. (c) Consensus CDF curves when k = 2 to 5. (d) Consensus tracking plot when k = 2 to 5. (e) Volcano plot of DEGs_ARGs in GSE120103. (f) Heatmap of DEGs_ARGs in GSE120103. (g) GO analysis of the intersection of genes. (h) Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis of the intersection of genes. (i, j) Identification of immune cells in GSE120103. * P < 0.05, ** P < 0.01, *** P < 0.001, **** P < 0.0001.

Identification of candidate hub genes via machine learning

A similar analysis comparing EMs and control samples in GSE120103 identified 6,089 DEGs_EM, of which 3,737 were upregulated and 2,352 were downregulated (Fig. 2a,b; Table S2). Overlapping the DEGs_ARGs with the DEGs_EM yielded seven DEARGs: FAS (Fas cell surface death receptor), CSF2RB (Colony-stimulating factor 2 receptor beta), FASLG (FAS ligand), TNF, PRKAR2B (Protein kinase cAMP-dependent type II regulatory subunit beta), CASP6 (Caspase-6) and IL1RAP (Interleukin 1 receptor accessory protein) (Fig. 2c). Protein-protein interaction (PPI) network analysis showed that CASP6, FAS and FASLG interacted strongly with TNF (Fig. 2d). Because the effectiveness of machine learning methods in disease diagnosis varies, we used both support vector machine recursive feature elimination (SVM-RFE) and least absolute shrinkage and selection operator (LASSO) logistic regression for biomarker screening, which mitigates the limitations of relying on a single machine learning approach. The SVM-RFE algorithm gave the lowest error rate at n = 4, yielding four hub feature genes: FAS, IL1RAP, CSF2RB and PRKAR2B (Fig. 2e; Table S3). Subsequent LASSO regression analysis, with an optimal λ value reported as 5, selected FAS, CSF2RB, FASLG, PRKAR2B and IL1RAP (Fig. 2f,g; Table S4). In summary, FAS, IL1RAP, CSF2RB and PRKAR2B were selected by both machine learning algorithms and were identified as central DEARGs (Fig. 2h), supporting their potential as biomarkers for the diagnosis of EMs.
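The overlap step described above can be reproduced with a simple set intersection. The following is a minimal Python sketch for illustration only (the authors worked in R); the gene lists are those reported in the text, while the variable names are hypothetical.

```python
# Gene lists as reported in the Results; variable names are illustrative.
svm_rfe_genes = {"FAS", "IL1RAP", "CSF2RB", "PRKAR2B"}
lasso_genes = {"FAS", "CSF2RB", "FASLG", "PRKAR2B", "IL1RAP"}

# Genes retained by both feature-selection methods (the Venn overlap).
central_deargs = sorted(svm_rfe_genes & lasso_genes)

# Genes retained by only one of the two methods.
lasso_only = sorted(lasso_genes - svm_rfe_genes)
svm_rfe_only = sorted(svm_rfe_genes - lasso_genes)

print("Selected by both:", central_deargs)
print("LASSO only:", lasso_only)
print("SVM-RFE only:", svm_rfe_only)
```

With the reported lists, FASLG is the only gene selected by LASSO but not by SVM-RFE, which is why four genes remain after the overlap.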

Fig. 2.


Machine learning in identifying key diagnosis genes. (a) Volcano plot of DEGs_EM. (b) Circo heatmap of DEGs_EM. Red and blue data points represent upregulation and downregulation, respectively. (c) The Venn diagram revealed seven DEARGs between DEGs_ARGs and DEGs_EM. (d) Protein interaction (PPI) network diagram demonstrated that TNF displayed strong interactions with CASP6, FAS and FASLG. (e) SVM-RFE analysis. The horizontal coordinate indicates the number of genes, and the vertical coordinate indicates the error rate. (f, g) LASSO regression analysis chart. (h) The Venn diagram revealed four intersection genes from the overlap of SVM-RFE and LASSO regression.

Diagnosis value evaluation

To evaluate the diagnostic efficacy of the four central DEARGs for EMs, we calculated the area under the receiver operating characteristic (ROC) curve (AUC) for each gene in GSE120103 (Fig. 3a). The results were as follows: FAS (AUC = 0.988), CSF2RB (AUC = 0.802), PRKAR2B (AUC = 0.719) and IL1RAP (AUC = 0.645). Genes with an AUC > 0.7 were considered candidate key genes. Expression levels of FAS, PRKAR2B and CSF2RB were significantly lower in EMs than in controls in both GSE120103 and GSE51981 (P < 0.05) (Fig. 3b,c), so these three genes were defined as key genes. To improve diagnostic precision, a nomogram model incorporating these key genes was constructed (Fig. 3d). In the decision curve analysis (DCA) (Fig. 3e), the nomogram model showed a higher net benefit than the individual genes, indicating that the predictive model had higher efficacy and clinical applicability. The calibration curve, with P > 0.05 and a mean absolute error (MAE) < 0.1, indicates that the nomogram model predicts the disease with high accuracy (Fig. 3f), and an AUC > 0.7 in the ROC curve indicates that the model has excellent predictive performance (Fig. 3g). The diagnostic model was externally validated using the GSE23339 dataset: the calibration curve and an AUC of 0.933 indicate that the nomogram model has predictive value (Fig. S1). Together, these results support the potential of the three key genes as diagnostic markers.
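For readers unfamiliar with DCA, the quantity plotted on a decision curve is the net benefit at a given threshold probability. The sketch below uses the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt); it is an illustrative Python example rather than the authors' R workflow, and the function names and toy data are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of classifying as positive everyone with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy: classify every sample as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = EMs, 0 = control); not values from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.15, 0.1]

for pt in (0.2, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = treat_all_net_benefit(y_true, pt)
    print(f"threshold={pt}: model={model_nb:.4f}, treat-all={all_nb:.4f}")
```

A model is considered useful over the range of thresholds where its curve lies above both the treat-all curve and the treat-none line (net benefit of zero).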

Fig. 3.


Construction of the ROC and nomogram and the diagnosis value assessment. (a) The ROC curve of each central DEARGs (FAS, IL1RAP, CSF2RB and PRKAR2B). (b, c) The expression of FAS, PRKAR2B and CSF2RB in the training set (b) and verification set (c). (d) Nomogram was constructed based on FAS, CSF2RB and PRKAR2B. (e) DCA curves. (f) The calibration curve. (g) ROC curve of the diagnostic efficacy verification. * P < 0.05, ** P < 0.01, *** P < 0.001.

Immune cell infiltration and gene interaction analysis

To understand the differences in immune cell enrichment between EMs and control samples, immune infiltration analysis was conducted. Gamma delta T cells, immature dendritic cells, monocytes and natural killer (NK) cells were significantly more abundant in EMs than in controls (P < 0.05), whereas activated B cells, effector memory CD4+ T cells, eosinophils and mast cells were significantly reduced (P < 0.05) (Fig. S2a,b). Correlation analysis between the biomarkers and the immune cells that differed between groups showed that CSF2RB, FAS and PRKAR2B were strongly associated with multiple immune cell types. CSF2RB exhibited the strongest positive correlation with activated B cells (R = 0.751, P < 0.001) and the strongest negative correlation with immature dendritic cells (R = −0.629, P < 0.001) (Fig. S2c). FAS was significantly positively correlated with myeloid-derived suppressor cells (MDSCs) (R = 0.886, P < 0.001) (Fig. S2d). Further gene-disease correlation analysis identified FAS as the most relevant gene in EMs (Inference score = 18.88), followed by PRKAR2B (Inference score = 11.54) and CSF2RB (Inference score = 3.42) (Fig. S2e).

To explore the upstream transcription factors of the biomarkers and their role in regulating the expression of the key genes, we conducted transcription factor regulatory network analysis, which showed that the network contained 126 nodes and 192 relationships. CREB1, TFAP4, ID1, TFDPA, MITF, NFYA and PCBP1 were identified as key regulators influencing CSF2RB, PRKAR2B and FAS (Fig. S2f). In addition to transcription factor prediction, we analyzed the interactions between the biomarkers and miRNAs. Using the starBase database, we predicted miRNAs associated with the three key genes and the corresponding lncRNAs for these miRNAs. We identified a common miRNA, hsa-miR-576-5p, and its corresponding lncRNA, MALAT1, shared by all three genes (Fig. S3). This indicates a complex regulatory relationship among the key genes, transcription factors and ceRNAs, with transcription factors and ceRNAs playing a crucial role in the regulatory network.

Subcellular localisation and drug prediction

To investigate the biological functions of the identified hub proteins, we predicted their subcellular localisation. The results indicated that FAS, CSF2RB and PRKAR2B were primarily localised to the plasma membrane (Fig. 4a–c). Based on key DEGs, we used the DGIdb database of gene-drug interaction data to identify potential therapeutic drugs and demonstrate their interactions. THEOPHYLLINE was identified as a potential therapeutic agent for PRKAR2B. Four drugs, including LERIDISTIM, were identified as therapeutic agents for CSF2RB. Additionally, 19 drugs, including aspirin, were identified as potential therapeutic agents for FAS (Fig. 4d).

Fig. 4.


Subcellular localisation and drug prediction. (a–c) Subcellular localisation of FAS, CSF2RB and PRKAR2B. (d) Prediction of small molecule drugs targeting the key genes. Dark blue represents key genes, light blue represents drugs predicted for FAS, green represents drugs predicted for CSF2RB, and yellow represents drugs predicted for PRKAR2B.

Expression validation analysis

The RT-qPCR results showed that the expression of FAS and CSF2RB was notably higher in the control group than in the EMs group, whereas the expression of PRKAR2B did not differ significantly between groups but showed the same trend (Fig. 5). The expression trends of FAS, PRKAR2B and CSF2RB were consistent with those in the training and validation sets, which provides a reference for subsequent studies.

Fig. 5.


RT-qPCR validation of the three key genes. (a–c) Expression validation analysis revealed differences in key genes (FAS, PRKAR2B and CSF2RB) in control and EMs groups. * P < 0.05, ** P < 0.01, *** P < 0.001.

Discussion

EMs is a common benign gynaecological disorder characterised by immune dysregulation, inflammation and hormone dependence15,16. A decline in endometrial spontaneity has been implicated as one of the causes of EMs17. Apoptosis plays a crucial role in the pathogenesis of EMs. However, the exact molecular mechanisms underlying apoptosis in EMs remain unclear, and current treatment options are limited. Previous studies suggest that reduced cell death may be a primary factor contributing to the development of EMs. Therefore, exploring the underlying mechanisms of EMs and identifying effective, non-invasive apoptosis-related biomarkers are urgent clinical needs. In this study, we identified three apoptosis-related key genes (FAS, PRKAR2B and CSF2RB) through bioinformatics analysis and machine learning. These key genes were validated in both the training and validation datasets. DCA and ROC curve analysis of the nomogram indicated that the predictive model based on these three biomarkers has high efficacy and clinical utility, with robust predictive performance. To further explore the functional relevance of these key genes, we conducted a series of analyses, including gene enrichment analysis, immune infiltration analysis, gene-disease co-expression analysis, transcription factor regulatory network analysis and ceRNA network construction. These analyses provided insights into the relationships between the key genes and differential immune cells, diseases and upstream transcription factors. Validation using human samples showed that FAS and CSF2RB expression was notably higher in the control group than in the EMs group, whereas PRKAR2B expression did not differ significantly, although it showed the same trend.

FAS is a member of the TNF receptor superfamily, expressed in various normal and neoplastic cells. Fas-mediated apoptosis plays a crucial role in regulating the endometrial cycle. The FAS/FASL system serves as a key apoptotic mediator in various reproductive tissues, including the ovary, mammary gland and endometrium, particularly in response to hormonal changes18. In 2006, M.K.O. Gomes et al. discovered that administering GnRH agonists (GnRHa) to patients with EMs led to an increase in FAS levels, indicating that FAS is downregulated in EMs19. Studies have shown that overexpression of erythropoietin-producing hepatocellular carcinoma A3 can inhibit the mTOR signaling pathway in uterine tissues and mice, promoting autophagy and apoptosis of macrophages in mice with EMs, while also upregulating the expression of FAS20. These findings are consistent with our results, indicating that FAS is significantly downregulated in EMs and may serve as a potential diagnostic biomarker for EMs. One of the typical characteristics of EMs is the abnormal proliferation and apoptosis imbalance of endometrial cells and stromal cells21. As the main regulatory molecule of apoptosis, FAS can activate the downstream caspase cascade and trigger apoptosis after binding to its ligand FASL19. Therefore, we speculate that after the decrease of FAS expression level, the activity of FAS-FASL signaling pathway is reduced, which leads to the inhibition of apoptosis initiation, the sensitivity of ectopic cells to death signals is weakened, and ectopic endometrial cells are resistant to apoptosis, eventually forming a persistent growth advantage of ectopic lesions.

CSF2RB is on chromosome 22 and encodes the β chain of the GM-CSF receptor. It is the shared subunit βc of interleukin 3 (IL3), GM-CSF, and the IL5 receptor22,23. These cytokines play a pivotal role in the proliferation, differentiation, and functional activation of immune cells. Zhu et al. found that CSF2RB affected the cycle and proliferation of tumor cells through JAK/STAT and MAPK signaling pathways24. CBAP interacts with CSF2RB to induce apoptosis through mitochondrial dysfunction25. These studies demonstrate that the CSF2RB gene played an important regulatory role in the process of cell growth and proliferation. Furthermore, the GM-CSF signaling pathway mediated by CSF2RB can activate monocytes, promoting their migration and the release of inflammatory mediators26,27. However, the relationship between CSF2RB and EMs has not yet been reported. In this study, through comprehensive bioinformatics analysis and experimental validation, we found that CSF2RB is downregulated in EMs and can predict the occurrence of EMs. In EMs, abnormal activation of the immune system and inflammatory responses are critical pathological mechanisms. The CSF2RB-mediated signaling pathway may be involved in regulating these immune responses, thereby influencing the onset and progression of EMs.

Studies have shown that the absence of PRKAR2B can up-regulate the expression of the anti-apoptotic protein Bcl-xL and activate the PKA and MEK/ERK signaling pathways, promoting resistance to apoptosis28. This may also be one of the mechanisms leading to EMs. PRKAR2B encodes the type II beta regulatory subunit of protein kinase A (PKA), which is a primary target of the cAMP signaling pathway29,30. The cAMP signaling pathway plays a critical role in diverse physiological processes, including cell growth, differentiation, and metabolism31. In EMs, abnormal activation of the cAMP signaling pathway may lead to aberrant proliferation and migration of endometrial cells. Additionally, PRKAR2B is potentially involved in the regulation of inflammatory responses31, which are a significant pathological mechanism in EMs. Therefore, PRKAR2B may influence the development and progression of EMs by modulating the cAMP signaling pathway and inflammatory responses. In the final clinical sample validation of this study, the lower expression of PRKAR2B in EMs was not statistically significant, probably because of the small sample size; we plan to repeat the experiments in a larger cohort to clarify this result.

In this study, the nomogram model constructed based on the key genes FAS, PRKAR2B, and CSF2RB demonstrated good discrimination and calibration in predicting the risk of EMs, indicating its potential clinical application value. Compared to the nomogram developed by Jiang et al. in EMs research, the calibration curve slope of the nomogram in this study is closer to 1, and the net benefit of the DCA is higher, suggesting superior diagnostic performance32. Given that the definitive diagnosis of EMs still relies on laparoscopic surgery, the development of a non-invasive and reliable predictive model holds significant importance for early screening and risk stratification of high-risk populations33. This model integrates the expression profiles of three key genes, enabling the quantification of individual disease risk, which may assist clinicians in preoperative assessment of EMs likelihood and reduce unnecessary invasive procedures. Furthermore, this model could be applied to early screening in high-risk populations (e.g., patients with infertility or chronic pelvic pain), facilitating more precise intervention and treatment. Future studies should validate its diagnostic efficacy through large-scale prospective cohorts and explore its combined application with imaging and serum biomarkers to further enhance the accuracy of non-invasive EMs diagnosis.

Growing immunological evidence indicates significant dysregulation of both cellular and humoral immune responses in patients with EMs. Key findings include diminished cytotoxicity of NK cells, increased macrophage infiltration with enhanced pro-inflammatory activity, abnormal activation of T and B lymphocytes, elevated levels of pro-inflammatory cytokines (e.g., IL-6, TNF-α), and the production of autoantibodies. These collective immune disturbances strongly support the hypothesis that dysregulated immune homeostasis plays a pivotal role in the pathogenesis of EMs, potentially contributing to the survival of ectopic lesions and disease progression34,35. Given the role of immune dysregulation in EMs pathogenesis, we investigated correlations between hub genes and immune cell infiltration. CSF2RB was significantly associated with activated B cells and dendritic cells, and FAS was related to MDSC. CSF2RB mutations have been shown to disrupt alveolar macrophage maturation and function, leading to severe respiratory insufficiency35. Previous studies have reported increased B cell numbers in the peritoneal fluid (PF) of patients with EMs36,37. Notably, overexpression of B cell lymphoma 6 (BCL6) in PF has been linked to EMs progression and EMs-associated infertility38. Ibrutinib, a B cell inhibitor, has been found to suppress EMs lesion growth39. Additionally, Fas signalling promotes lung cancer growth in vivo by enhancing the MDSC and Treg accumulation in the tumour, contributing to the immune escape of lung cancer40.

The subcellular localization results of this study demonstrated that FAS, PRKAR2B, and CSF2RB were all expressed on the plasma membrane, suggesting their potential involvement in the pathological processes of EMs through membrane receptor-mediated signal transduction. As a death receptor, the plasma membrane localization of FAS may regulate apoptosis, thereby influencing the abnormal survival of endometrial cells. PRKAR2B may enhance the sensitivity of the cAMP signaling pathway through its membrane localization, promoting EMs-associated cell proliferation and inflammatory responses. CSF2RB, a cytokine receptor, likely mediates immune cell recruitment at the plasma membrane, exacerbating the local inflammatory microenvironment in EMs. Notably, this subcellular localization pattern shares similarities with disease-associated genes in other pathologies. For instance, in tumors, the membrane localization of FAS is closely linked to dysregulation of programmed cell death pathways41, while the membrane localization of PRKAR2B and CSF2RB is directly associated with cAMP pathway activation and cytokine-mediated immune signaling, respectively. Collectively, the plasma membrane localization of these genes may contribute to EMs pathogenesis by interfering with apoptosis, hormonal signaling, and immune regulation, thereby providing novel insights into therapeutic strategies targeting membrane-associated pathways.

Our drug prediction results identified 24 potential drugs targeting the three key genes, including Floxuridine and Progesterone. Floxuridine, a pyrimidine antimetabolite commonly used in the treatment of various cancers, primarily functions as a deoxynucleoside analog of fluorouracil (5-FU), exerting its antitumor effects by interfering with DNA synthesis42,43. A hallmark of EMs is the abnormal proliferation of endometrial cells outside the uterus. We therefore speculate that Floxuridine may alleviate the condition by inhibiting DNA synthesis and reducing the proliferation of ectopic endometrial cells. Clinically, Floxuridine could potentially be used in combination with other EMs treatments, such as traditional Chinese medicine, to enhance therapeutic efficacy through multi-target synergistic effects. Progesterone, a steroid hormone, plays a critical role in female reproductive health. Progesterone and its synthetic analogs (progestins) are considered first-line treatments for EMs44,45. They act through multiple mechanisms, such as inhibiting ovarian steroidogenesis, which leads to anovulation and low serum estrogen levels and thereby suppresses the growth of ectopic endometrial tissue. Additionally, progestins can induce decidualization of ectopic endometrial tissue, causing its atrophy and apoptosis46. Moreover, progesterone can alleviate inflammation associated with EMs47. In summary, progesterone holds broad application prospects in the treatment of EMs. A deeper understanding of the mechanisms underlying progesterone resistance, together with novel therapeutic strategies, may enhance the efficacy of progesterone-based treatments and improve the quality of life of patients with EMs.

In summary, this study identified three ARGs potentially involved in EMs through a series of bioinformatics approaches, including differential expression analysis and machine learning. Additionally, we explored the associated functional pathways and protein networks, which may provide insights into the mechanisms of EMs and contribute to the development of novel therapeutic strategies. However, this study has certain limitations. The results of bioinformatics analysis are often context-specific and dataset-dependent, and thus may not fully capture the complexity of biological phenomena. The computational models and algorithms used are influenced by factors such as data quality, sample heterogeneity, and algorithm selection. Although we validated the identified biomarkers by RT-qPCR in clinical tissue samples, further in vivo and in vitro experiments are necessary to strengthen the findings. Therefore, we plan to conduct follow-up studies based on more comprehensive datasets, employing multi-level, multi-dimensional analyses as well as gene knockout and animal models, to enhance the reliability and depth of the research. Despite these limitations, our study has uncovered key genes associated with EMs, providing a theoretical foundation for understanding the molecular mechanisms of apoptosis in EMs. These findings may help advance early diagnosis, prognostic evaluation, and personalized treatment of EMs.

Conclusions

In this study, we identified three hub apoptosis-related candidate genes (FAS, CSF2RB and PRKAR2B) using bioinformatics analysis and machine learning algorithms. Additionally, we constructed a nomogram for EMs diagnosis, which demonstrates potential as a diagnostic tool for patients with EMs. Our findings also highlight the presence of immune dysfunction in EMs, suggesting a possible link between apoptosis-related mechanisms and immune regulation in disease progression. Further validation studies are necessary to confirm the clinical relevance of these genes and their role in EMs-associated immune dysregulation.

Materials and methods

Clinical sample collection

Lesion tissues and ovarian cyst wall tissues were collected from five patients with EMs for RT-qPCR experiments. All surgical procedures were performed by an experienced surgical team, and pathological evaluations were conducted by expert gynecologic pathologists to confirm the final diagnoses. Prior to sample collection, each patient provided written informed consent and signed an authorisation form after receiving a detailed explanation of the experimental procedures, instrumentation and potential risks. This study was approved by the Ethics Commission of Fujian Maternity and Child Health Hospital (approval number: 2023KYLLR01015). All experiments were conducted in strict accordance with relevant ethical guidelines and regulations.

Gene expression profiles

The GSE120103 dataset (GPL6480 platform) was downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/), comprising 18 EMs and 18 normal endometrial samples. This dataset was designated as the training set. The GSE51981 dataset (GPL570 platform), comprising 77 EMs and 71 normal endometrial samples, was used as a validation set. The GSE23339 dataset (GPL6102 platform), comprising 9 EMs and 10 normal endometrial samples, was used for external validation of the diagnostic model.

In total, 87 ARGs (Table S5) were identified from the literature for further analysis48 (Fig. 6).

Fig. 6.


The flowchart of the study design.

DEGs: differentially expressed genes. PPI: protein-protein interaction. SVM-RFE: support vector machine recursive feature elimination. LASSO: least absolute shrinkage and selection operator. ROC: receiver operating characteristic curve.

Consensus clustering and identification of DEGs

Consensus clustering is an unsupervised clustering method widely used in cancer subtype classification studies. By clustering samples from different omics datasets, new disease subtypes can be identified and compared. In this study, we used the k-means (KM) clustering method from the ‘ConsensusClusterPlus’ package49 in R to perform consensus clustering on the case samples in the GSE120103 dataset. We used the ‘GSVA’ package50 to analyse the training set samples and, with immune-related gene sets from the literature as background gene sets, calculated the enrichment scores of immune cells. We used the ‘ggplot2’ package in R to create a heatmap of immune cell infiltration scores in EMs patients, visualising the distribution of immune cells. To analyse the differences in immune cell enrichment scores between clusters, we performed the Wilcoxon test, with P < 0.05 as the significance threshold for differentially abundant immune cells. Next, we used the ‘limma’ package51 in R to conduct differential expression analysis of ARGs in the GSE120103 dataset, identifying the DEGs_ARGs between the subtypes with thresholds of |log2FC| > 0.5 and P < 0.05. To further understand the biological functions and pathways of the differentially expressed apoptosis genes, we used the ‘clusterProfiler’ package52 in R to perform GO and KEGG53–55 enrichment analyses, selecting significantly enriched pathways (P < 0.05). Additionally, we used the ‘limma’ package to identify the DEGs2 between Control and Case samples in the training set, with the same thresholds of |log2FC| > 0.5 and P < 0.05. We then intersected the DEGs1 between apoptosis subtypes with the DEGs2 between Control and Case samples to identify potential DE-ARGs. Finally, to explore interactions between the proteins encoded by the DE-ARGs, these genes were submitted to the STRING database (https://cn.string-db.org/) with an interaction score threshold of 0.4, and a PPI network was constructed and visualised using Cytoscape software (v3.9.1)56.
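The thresholding and intersection steps described above can be sketched in a few lines. This is only an illustration in Python rather than the R workflow used in the study: the gene names, log2FC values and P values are invented, and `select_degs` is a hypothetical helper, not a function from ‘limma’.

```python
# Hypothetical limma-style result rows: (gene, log2FC, P value).
# All names and numbers are invented for illustration only.
subtype_results = [
    ("GENE_A", 0.92, 0.003),
    ("GENE_B", -0.71, 0.020),
    ("GENE_C", 0.30, 0.001),   # fails the |log2FC| cutoff
    ("GENE_D", 1.10, 0.200),   # fails the P cutoff
]
case_control_results = [
    ("GENE_A", -0.66, 0.010),
    ("GENE_B", 0.20, 0.040),   # fails the |log2FC| cutoff
    ("GENE_E", 0.80, 0.004),
]


def select_degs(rows, lfc_cutoff=0.5, p_cutoff=0.05):
    """Return the set of genes with |log2FC| > lfc_cutoff and P < p_cutoff."""
    return {gene for gene, lfc, p in rows
            if abs(lfc) > lfc_cutoff and p < p_cutoff}


degs1 = select_degs(subtype_results)        # DEGs between subtypes
degs2 = select_degs(case_control_results)   # DEGs between Control and Case
de_args = degs1 & degs2                     # candidate DE-ARGs (set intersection)
print(sorted(de_args))
```

Note that the filter uses the absolute value of log2FC, so a gene can enter the intersection even if its direction of change differs between the two comparisons.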

Machine learning

Although differential analysis can identify potential candidate biomarkers, it does not guarantee that these features will have the best predictive performance in classification tasks. Therefore, we used SVM-RFE and LASSO for feature selection. SVM-RFE can identify key features in high-dimensional spaces while limiting overfitting, which makes it well suited to gene screening with limited sample sizes. LASSO adds an L1 regularization term to the loss function, which can shrink the coefficients of irrelevant genes exactly to 0, yielding a sparse model and thereby performing gene screening automatically. The ‘e1071’ package (v1.7-12)57 was applied for the SVM-RFE algorithm, while the ‘glmnet’ package (v4.1-7)58 was utilised for LASSO regression analysis. Subsequently, the lambda.min parameter was set to 0.001, corresponding to the lambda value at which the model’s error rate was minimised. After 10-fold cross-validation, genes with non-zero regression coefficients were identified as characteristic genes. Candidate genes were selected by overlapping the results of the two algorithms. The diagnostic potential of these candidate genes was evaluated using ROC curve analysis, performed with the ‘pROC’ package (v1.18.0)59. Genes that had an AUC value > 0.7, exhibited significant expression differences between EMs and control groups, and maintained consistent expression trends in both the training and validation sets were considered key genes.
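To illustrate why the L1 penalty sets some coefficients exactly to zero, the sketch below uses the textbook special case of an orthonormal design, where the LASSO solution is the soft-thresholded least-squares estimate. It is not the ‘glmnet’ procedure used in the study (which fits a penalised model by coordinate descent); the gene names, coefficients, penalty value and the SVM-RFE gene set are all hypothetical.

```python
def soft_threshold(b, lam):
    """Shrink b towards zero by lam; values with |b| <= lam become exactly 0."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


# Hypothetical unpenalised coefficients for four genes (illustration only).
unpenalised = {"GENE_A": 0.80, "GENE_B": -0.05, "GENE_C": 0.02, "GENE_D": -0.60}
lam = 0.10  # hypothetical penalty strength

lasso = {gene: soft_threshold(b, lam) for gene, b in unpenalised.items()}
lasso_selected = {gene for gene, b in lasso.items() if b != 0.0}

# Hypothetical SVM-RFE result; candidates are the overlap of the two methods.
svm_rfe_selected = {"GENE_A", "GENE_C"}
candidates = lasso_selected & svm_rfe_selected
print(sorted(lasso_selected), sorted(candidates))
```

A larger penalty zeroes more coefficients, which is why the choice of lambda by cross-validation determines how many genes survive the screen.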

Construction of nomogram

A nomogram was constructed on the training set using the R package ‘rms’. Each biomarker was assigned a score according to its expression level, and the total score (TotalPoint) was the sum of the individual scores. The probability of disease diagnosis was then predicted from the total score. The nomogram regression equation was as follows:

[Equation supplied as an image in the source (d33e932.gif); not reproduced here.]

Decision curves and ROC curves were plotted for the individual biomarkers and for the nomogram model (a multivariate logistic regression model). We then performed external validation of the diagnostic model using the GSE23339 dataset.
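As a rough sketch of how a multivariate logistic model turns biomarker expression into a diagnosis probability, and how an AUC summarises its discrimination, consider the Python example below. The intercept, coefficients and expression values are hypothetical and are not the fitted values from this study; the AUC is computed with the standard pairwise (Mann-Whitney) definition rather than with ‘pROC’.

```python
import math


def predicted_probability(expr, coefs, intercept):
    """Logistic model: probability = 1 / (1 + exp(-linear predictor))."""
    lp = intercept + sum(coefs[g] * expr[g] for g in coefs)
    return 1.0 / (1.0 + math.exp(-lp))


def auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical coefficients and samples (illustration only).
coefs = {"FAS": -1.0, "PRKAR2B": 0.5, "CSF2RB": -0.8}
intercept = 6.0
samples = [
    ({"FAS": 5.0, "PRKAR2B": 6.0, "CSF2RB": 4.0}, 1),  # EMs
    ({"FAS": 6.0, "PRKAR2B": 5.0, "CSF2RB": 5.0}, 1),  # EMs
    ({"FAS": 7.0, "PRKAR2B": 5.0, "CSF2RB": 6.0}, 0),  # control
    ({"FAS": 8.0, "PRKAR2B": 4.0, "CSF2RB": 6.0}, 0),  # control
]

probs = [predicted_probability(expr, coefs, intercept) for expr, _ in samples]
labels = [label for _, label in samples]
model_auc = auc(probs, labels)
print(model_auc)
```

A nomogram is a graphical rescaling of the same linear predictor: each gene's contribution is mapped to points, and the total points map back to the predicted probability.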

Function analysis of genes and immune infiltration

The composition and distribution of immune cells within individual samples were examined. Differences in immune cell enrichment scores between EMs and control samples in GSE120103 were analysed using the Wilcoxon test. Additionally, Spearman correlation analysis was used to explore the correlation between different immune cells and key genes. To further investigate the relevance of key genes to EMs, we queried the CTD database (http://ctdbase.org/). To explore the molecular mechanisms underlying key gene regulation, the KnockTF database (https://bio.liclab.net/KnockTF/index.php) was used to predict the transcription factors (TFs) of key genes, and Cytoscape software (v3.8.2) was utilised to plot the TF-mRNA regulatory networks. To gain further insights into the post-transcriptional regulation of key genes, the starBase database (http://starbase.sysu.edu.cn/) was used to predict miRNAs and lncRNAs that interact with the key genes. A ceRNA network was constructed from the key genes and the predicted miRNAs and lncRNAs, and the lncRNA-miRNA-mRNA network was visualised with Cytoscape software (v3.8.2).
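The Spearman analysis mentioned above amounts to a Pearson correlation computed on ranks. The following Python sketch shows the calculation, with tied values given average ranks; the expression values and enrichment scores are invented for illustration, and the variable names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical key-gene expression and immune cell enrichment scores.
gene_expr = [2.1, 3.4, 1.8, 4.0, 2.9]
cell_score = [0.30, 0.52, 0.25, 0.49, 0.41]
rho = spearman(gene_expr, cell_score)
print(round(rho, 3))
```

Because only ranks enter the calculation, the coefficient is insensitive to monotone transformations of either variable, which suits enrichment scores whose scale is arbitrary.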

Analysis of functional similarity and drug prediction

To understand the subcellular distribution of the key genes and gain preliminary insights into their potential functions, Cell-PLoc (v3.0) was utilised to predict their subcellular localisation. Finally, for a deeper understanding of EMs pathogenesis and to support the development of new drugs, the DGIdb database (http://www.dgidb.org/) was employed to predict small-molecule drugs that interact with the key genes, and a gene-drug interaction network was constructed to illustrate the relationships between key genes and potential drug candidates.

Reverse transcription-quantitative polymerase chain reaction (RT-qPCR)

RNA was extracted from 10 samples using the TRIzol kit; samples 1–5 were controls and samples 6–10 were EMs samples, all collected from Fujian Maternity and Child Health Hospital. Total RNA extraction was performed according to the manufacturer’s instructions. The concentration and purity of the extracted RNA were measured in 1 µL using a NanoPhotometer N50 and recorded to calculate the RNA input for subsequent reverse transcription. RNA was then reverse-transcribed into cDNA using the SureScript-First-strand-cDNA-synthesis-kit (G3333-50, Servicebio) according to the manufacturer’s instructions. The cDNA was diluted 5- to 20-fold with RNase/DNase-free ddH2O, and 3 µL of cDNA, 5 µL of SweScript RT I Enzyme Mix (NO. 11904018, ThermoFisher), 1 µL of forward primer (10 µM) and 1 µL of reverse primer (10 µM) were added to the reaction mixture. Amplification was run for 40 cycles on a CFX96 real-time quantitative PCR instrument (BIO-RAD) under the reaction conditions given in Table 1. The primer sequences are listed in Table 2, and GAPDH was used as the reference gene. Gene expression levels were calculated using the 2^(−ΔΔCT) method.

Table 1.

RT-qPCR procedure.

Step                  Temperature  Time
Initial denaturation  95 °C        1 min
Denaturation          95 °C        20 s
Annealing             55 °C        20 s
Extension             72 °C        30 s

Table 2.

Primer sequences.

Primer     Sequence (5'–3')
FAS F      AACAACCATGCTGGGCATCT
FAS R      TGATGCAGGCCTTCCAAGTT
PRKAR2B F  CATTGCTCAGGGAGATTCGG
PRKAR2B R  GCGATTTCTACTGCACCATTCT
CSF2RB F   CTACAAGCCCAGCCCAGATG
CSF2RB R   CCCTCCTTGGCTGAACAGAG
GAPDH F    CGAAGGTGGAGTCAACGGATTT
GAPDH R    ATGGGTGGAATCATATTGGAAC
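The relative quantification used for the RT-qPCR data can be written out as a short worked example. The Ct values below are hypothetical, and the use of the mean control ΔCt as the calibrator is an assumption of this sketch (a common convention), not a detail stated in the text.

```python
from statistics import mean


def delta_ct(target_ct, reference_ct):
    """ΔCt: target gene Ct minus reference gene (here GAPDH) Ct."""
    return target_ct - reference_ct


def relative_expression(sample_dct, calibrator_dct):
    """Fold change = 2^(−ΔΔCt), where ΔΔCt = sample ΔCt − calibrator ΔCt."""
    return 2.0 ** -(sample_dct - calibrator_dct)


# Hypothetical (target Ct, GAPDH Ct) pairs; illustration only.
control_ct = [(24.0, 18.0), (24.5, 18.5), (23.5, 17.5)]
ems_ct = [(25.0, 18.0), (25.5, 18.5)]

control_dct = [delta_ct(t, r) for t, r in control_ct]
calibrator = mean(control_dct)  # assumed calibrator: mean ΔCt of controls

ems_fold = [relative_expression(delta_ct(t, r), calibrator) for t, r in ems_ct]
print(ems_fold)
```

In this toy example each EMs sample has a ΔCt one cycle higher than the control mean, so its relative expression is half that of the controls.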

Statistical analysis

Statistical analysis was performed in RStudio (v4.3.1). Differences between groups were analysed using the Wilcoxon test. P < 0.05 was considered statistically significant.
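For readers unfamiliar with the Wilcoxon (rank-sum) test, the sketch below shows a two-sided version based on the normal approximation with a tie correction. It is an illustration only: the function name is hypothetical, no continuity correction is applied, and statistical software may instead compute exact P values for small samples, so results can differ from those of R's `wilcox.test`. The data are invented.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate P value).
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                      key=lambda t: t[0])
    n1, n2 = len(x), len(y)
    ranks = [0.0] * len(combined)
    tie_term = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2.0 + 1.0           # average rank for a block of ties
        for k in range(i, j + 1):
            ranks[k] = avg
        t = j - i + 1
        tie_term += t ** 3 - t              # contributes to the tie correction
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)    # undefined if all values are tied
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return u1, p


# Hypothetical enrichment scores for two groups (illustration only).
control_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
ems_scores = [6.0, 7.0, 8.0, 9.0, 10.0]
u_stat, p_value = rank_sum_test(control_scores, ems_scores)
print(u_stat, p_value < 0.05)
```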

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 2 (9.9KB, xlsx)
Supplementary Material 3 (10.7KB, xlsx)
Supplementary Material 5 (687.1KB, csv)
Supplementary Material 6 (1.4MB, tiff)
Supplementary Material 7 (1.4MB, tiff)
Supplementary Material 9 (4.6MB, docx)

Author contributions

Xiulan Weng: Writing – review & editing, Writing – original draft, Funding acquisition, Data curation, Conceptualization. Jingxuan Ye: Writing – review & editing, Writing – original draft, Software, Formal analysis, Data curation. Wenyu Lin: Writing – review & editing, Software, Formal analysis. Jinsong Yi: Validation, Methodology, Formal analysis. Dingjie Wang: English proofreading. Zhenhong Wang: Writing – review & editing, Supervision, Data curation. Pengming Sun: Writing – review & editing, Supervision, Project administration, and Conceptualization.

Funding

This study was funded by the Startup Fund for scientific research, Fujian Medical University (Grant number: 2023QH2045).

Data availability

The Gene Expression Omnibus (GEO) databases (https://www.ncbi.nlm.nih.gov/geo/) were used to retrieve all raw data sets. Further information is available from the corresponding author upon request.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xiulan Weng and Jingxuan Ye contributed equally to this work.

Contributor Information

Zhenhong Wang, Email: zhen@fjmu.edu.cn.

Pengming Sun, Email: fmsun1975@fjmu.edu.cn, Email: sunfemy@hotmail.com.

References

  • 1. Tomassetti, C. et al. An international terminology for endometriosis, 2021. J. Minim. Invasive Gynecol. 28, 1849–1859 (2021).
  • 2. Leonardi, M., Hicks, C., El-Assaad, F., El-Omar, E. & Condous, G. Endometriosis and the microbiome: a systematic review. BJOG 127, 239–249 (2020).
  • 3. Peiris, A. N., Chaljub, E. & Medlock, D. Endometriosis. JAMA 320, 2608 (2018).
  • 4. Darbà, J. & Marsà, A. Economic implications of endometriosis: a review. Pharmacoeconomics 40, 1143–1158 (2022).
  • 5. Muse, K. N. & Wilson, E. A. How does mild endometriosis cause infertility? Fertil. Steril. 38, 145–152 (1982).
  • 6. Slayden, O. et al. Targeted nanoparticles for imaging and therapy of endometriosis. Biol. Reprod. 110, 1191–1200 (2024).
  • 7. Omidvar-Mehrabadi, A., Ebrahimi, F., Shahbazi, M. & Mohammadnia-Afrouzi, M. Cytokine and chemokine profiles in women with endometriosis, polycystic ovary syndrome, and unexplained infertility. Cytokine 178, 156588 (2024).
  • 8. Koller, D. et al. Epidemiologic and genetic associations of endometriosis with depression, anxiety, and eating disorders. JAMA Netw. Open 6, e2251214 (2023).
  • 9. Chen, G. et al. Diagnostic value of the combination of circulating serum miRNAs and CA125 in endometriosis. Medicine (Baltimore) 102, e36339 (2023).
  • 10. Saunders, P. T. K. & Horne, A. W. Endometriosis: etiology, pathobiology, and therapeutic prospects. Cell 184, 2807–2824 (2021).
  • 11. Vallvé-Juanico, J., Houshdaran, S. & Giudice, L. C. The endometrial immune environment of women with endometriosis. Hum. Reprod. Update 25, 564–591 (2019).
  • 12. Gebel, H. M. et al. Spontaneous apoptosis of endometrial tissue is impaired in women with endometriosis. Fertil. Steril. 69, 1042–1047 (1998).
  • 13. Nishida, M. et al. Endometriotic cells are resistant to interferon-gamma-induced cell growth inhibition and apoptosis: a possible mechanism involved in the pathogenesis of endometriosis. Mol. Hum. Reprod. 11, 29–34 (2005).
  • 14. Vaskivuo, T. E. et al. Apoptosis and apoptosis-related proteins in human endometrium. Mol. Cell. Endocrinol. 165, 75–83 (2000).
  • 15. Rolla, E. Endometriosis: advances and controversies in classification, pathogenesis, diagnosis, and treatment. F1000Res. 8, F1000 Faculty Rev-529 (2019).
  • 16. Bulun, S. E. et al. Endometriosis. Endocr. Rev. 40, 1048–1079 (2019).
  • 17. Samimi, M., Pourhanifeh, M. H., Mehdizadehkashi, A., Eftekhar, T. & Asemi, Z. The role of inflammation, oxidative stress, angiogenesis, and apoptosis in the pathophysiology of endometriosis: basic science and new insights based on gene expression. J. Cell. Physiol. 234, 19384–19392 (2019).
  • 18. Anggraeni, T. D., Rustamadji, P. & Aziz, M. F. Fas ligand (FasL) in association with tumor-infiltrating lymphocytes (TILs) in early-stage cervical cancer. Asian Pac. J. Cancer Prev. 21, 831–835 (2020).
  • 19. Gomes, M. K. et al. Effects of the levonorgestrel-releasing intrauterine system on cell proliferation, Fas expression and steroid receptors in endometriosis lesions and normal endometrium. Hum. Reprod. 24, 2736–2745 (2009).
  • 20. Xu, H., Gao, Y., Shu, Y., Wang, Y. & Shi, Q. EPHA3 enhances macrophage autophagy and apoptosis by disrupting the mTOR signaling pathway in mice with endometriosis. Biosci. Rep. 39, BSR20182274 (2019).
  • 21. Harada, T. et al. Apoptosis in human endometrium and endometriosis. Hum. Reprod. Update 10, 29–38 (2004).
  • 22. Watanabe-Smith, K. M. et al. Oncogenic CSF2RB mutation found in a primary leukemia sample results in constitutive signaling, increased receptor stability, and formation of intermolecular complexes. Blood 126, 3670 (2015).
  • 23. Côrte-Real, B. F. et al. Dissecting the role of CSF2RB expression in human regulatory T cells. Front. Immunol. 13, 1005965 (2022).
  • 24. Zhu, N. et al. CSF2RB is a unique biomarker and correlated with immune infiltrates in lung adenocarcinoma. Front. Oncol. 12, 822849 (2022).
  • 25. Kao, C. J. et al. CBAP interacts with the un-liganded common beta-subunit of the GM-CSF/IL-3/IL-5 receptor and induces apoptosis via mitochondrial dysfunction. Oncogene 27, 1397–1403 (2008).
  • 26. Croxford, A. L. et al. The cytokine GM-CSF drives the inflammatory signature of CCR2+ monocytes and licenses autoimmunity. Immunity 43, 502–514 (2015).
  • 27. Shang, D. S. et al. Intracerebral GM-CSF contributes to transendothelial monocyte migration in APP/PS1 Alzheimer’s disease mice. J. Cereb. Blood Flow Metab. 36, 1978–1991 (2016).
  • 28. Basso, F. et al. Comparison of the effects of PRKAR1A and PRKAR2B depletion on signaling pathways, cell growth, and cell cycle control of adrenocortical cells. Horm. Metab. Res. 46, 883–888 (2014).
  • 29. Yoon, H. et al. Knockdown of PRKAR2B results in the failure of oocyte maturation. Cell. Physiol. Biochem. 45, 2009–2020 (2018).
  • 30. Xia, L. et al. Transcriptional regulation of PRKAR2B by miR-200b-3p/200c-3p and XBP1 in human prostate cancer. Biomed. Pharmacother. 124, 109863 (2020).
  • 31. Lucia, K. et al. Hypoxia and the hypoxia inducible factor 1α activate protein kinase A by repressing RII beta subunit transcription. Oncogene 39, 3367–3380 (2020).
  • 32. Jiang, H. et al. Bioinformatics identification and validation of biomarkers and infiltrating immune cells in endometriosis. Front. Immunol. 13, 944683 (2022).
  • 33. Bafort, C., Beebeejaun, Y., Tomassetti, C., Bosteels, J. & Duffy, J. M. Laparoscopic surgery for endometriosis. Cochrane Database Syst. Rev. 10, CD011031 (2020).
  • 34. Kyama, C. M., Debrock, S., Mwenda, J. M. & D’Hooghe, T. M. Potential involvement of the immune system in the development of endometriosis. Reprod. Biol. Endocrinol. 1, 123 (2003).
  • 35. Suzuki, T. et al. Hereditary pulmonary alveolar proteinosis caused by recessive CSF2RB mutations. Eur. Respir. J. 37, 201–204 (2011).
  • 36. Antsiferova, Y. S. et al. Changes in the T-helper cytokine profile and in lymphocyte activation at the systemic and local levels in women with endometriosis. Fertil. Steril. 84, 1705–1711 (2005).
  • 37. Scheerer, C. et al. Characterization of endometriosis-associated immune cell infiltrates (EMaICI). Arch. Gynecol. Obstet. 294, 657–664 (2016).
  • 38. Evans-Hoeker, E. et al. Endometrial BCL6 overexpression in eutopic endometrium of women with endometriosis. Reprod. Sci. 23, 1234–1241 (2016).
  • 39. Riccio, L. G. C. et al. B lymphocytes inactivation by ibrutinib limits endometriosis progression in mice. Hum. Reprod. 34, 1225–1234 (2019).
  • 40. Zhang, Y. et al. Fas signal promotes lung cancer growth by recruiting myeloid-derived suppressor cells via cancer cell-derived PGE2. J. Immunol. 182, 3801–3808 (2009).
  • 41. Ma, H. et al. Pirin inhibits FAS-mediated apoptosis to support colorectal cancer survival. Adv. Sci. (Weinh.) 11, e2301476 (2024).
  • 42. Sato, A., Hiramoto, A., Kim, H. S. & Wataya, Y. Anticancer strategy targeting cell death regulators: switching the mechanism of anticancer floxuridine-induced cell death from necrosis to apoptosis. Int. J. Mol. Sci. 21, 5876 (2020).
  • 43. Krishna, A. et al. Esterase-responsive floxuridine-tethered multifunctional nanoparticles for targeted cancer therapy. ACS Appl. Bio Mater. 7, 6276–6285 (2024).
  • 44. Tang, W., Zhu, X., Bian, L. & Zhang, B. Research progress of dydrogesterone in the treatment of endometriosis. Eur. J. Obstet. Gynecol. Reprod. Biol. 296, 120–125 (2024).
  • 45. Clemenza, S. et al. Progesterone receptor ligands for the treatment of endometriosis. Minerva Obstet. Gynecol. 75, 288–297 (2023).
  • 46. Barra, F., Scala, C. & Ferrero, S. Current understanding on pharmacokinetics, clinical efficacy and safety of progestins for treating pain associated to endometriosis. Expert Opin. Drug Metab. Toxicol. 14, 399–415 (2018).
  • 47. Li, Y. et al. Progesterone alleviates endometriosis via inhibition of uterine cell proliferation, inflammation and angiogenesis in an immunocompetent mouse model. PLoS One 11, e0165347 (2016).
  • 48. Zou, R., Zhao, W., Xiao, S. & Lu, Y. A signature of three apoptosis-related genes predicts overall survival in breast cancer. Front. Surg. 9, 863035 (2022).
  • 49. Hu, X., Ni, S., Zhao, K., Qian, J. & Duan, Y. Bioinformatics-led discovery of osteoarthritis biomarkers and inflammatory infiltrates. Front. Immunol. 13, 871008 (2022).
  • 50. Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinform. 14, 7 (2013).
  • 51. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
  • 52. Yu, G., Wang, L. G., Han, Y. & He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).
  • 53. Kanehisa, M., Furumichi, M., Sato, Y., Matsuura, Y. & Ishiguro-Watanabe, M. KEGG: biological systems database as a model of the real world. Nucleic Acids Res. 53, D672–D677 (2025).
  • 54. Kanehisa, M. Toward understanding the origin and evolution of cellular organisms. Protein Sci. 28, 1947–1951 (2019).
  • 55. Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
  • 56. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
  • 57. Hu, X. et al. Combining metabolome and clinical indicators with machine learning provides some promising diagnostic markers to precisely detect smear-positive/negative pulmonary tuberculosis. BMC Infect. Dis. 22, 707 (2022).
  • 58. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
  • 59. Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 12, 77 (2011).


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
