Abstract
This study aims to identify angiogenesis-associated genes (AAGs) in endometriosis (EM) by integrating bioinformatics analysis with machine learning, and to investigate their underlying mechanisms. Differentially expressed genes (DEGs) were screened from integrated EM-related datasets in the Gene Expression Omnibus database. These DEGs were integrated with AAGs retrieved from the AMIGO2 database. Weighted gene co-expression network analysis (WGCNA) was then employed to identify potential EM-AAGs, followed by functional enrichment analysis using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). Five machine learning algorithms – Random Forest, LASSO, XGBoost, Gradient Boosting Machine (GBM), and SVM-RFE – were utilized for cross-validated screening of hub genes. The diagnostic efficacy of these genes was evaluated through receiver operating characteristic curves, calibration curves, and decision curve analysis. Further analyses included single-gene gene set enrichment analysis (GSEA), immune infiltration profiling, prediction of regulatory transcription factors, and construction of a competitive endogenous RNA (ceRNA) network. This study identified FZD4, SRPX2, and COL8A1 as hub genes for angiogenesis in EM. These genes were significantly upregulated in EM patients and demonstrated excellent diagnostic efficacy. Immune infiltration analysis revealed their associations with immune cell subpopulations, including M1/M2 macrophages and neutrophils. Single-gene GSEA and ceRNA network construction further indicated their involvement in cell cycle control and multi-tiered molecular networks. Integrated bioinformatics and machine learning revealed FZD4, SRPX2, and COL8A1 as hub genes of angiogenesis in EM, suggesting candidate targets for anti-angiogenic therapeutic strategies in EM.
Keywords: angiogenesis, bioinformatics analysis, biomarkers, endometriosis, machine learning
1. Introduction
Endometriosis (EM) is a chronic, estrogen-dependent, inflammatory disease defined by the presence of endometrial-like tissue (lesions) outside the uterine cavity.[1] It affects approximately 10% of reproductive-age women.[2] Despite well-established theories of retrograde menstruation, coelomic metaplasia, and lymphatic and vascular dissemination, no single mechanism fully accounts for the multifaceted etiology of EM. Consequently, the pathogenesis underlying EM remains a subject of considerable controversy.
The essential “3A” pathogenic sequence of “Attachment,” “Aggregation,” and “Angiogenesis” constitutes a mandatory pathway for viable endometrial fragments to develop into EM lesions.[3] Sprouting angiogenesis describes the process of new blood vessel formation through budding from established vasculature.[4] Neovascularization in ectopic lesions and their adjacent tissues is essential to sustain the survival of implanted ectopic endometrium and to promote lesion growth and progression to EM.[5] Most EM lesions are surrounded by abdominal blood vessels and are highly vascularized.[6,7] This indicates that angiogenesis plays a fundamental role in the progression of EM. Consequently, suppressing and blocking angiogenesis may represent a crucial therapeutic strategy for controlling the implantation and growth of ectopic lesions.
By integrating bioinformatics analysis and machine learning, this study aims to identify angiogenesis hub genes and elucidate their molecular mechanisms in EM, thereby proposing innovative strategies for clinical management.
2. Materials and methods
2.1. Data collection and processing
The GSE7305 (10 EM patients vs 10 healthy controls), GSE23339 (10 EM patients vs 9 healthy controls) and GSE25628 (16 EM patients vs 6 healthy controls) datasets were retrieved from the Gene Expression Omnibus (GEO) database[8] (https://www.ncbi.nlm.nih.gov/geo/). Detailed information about the datasets is provided in Table S1 (Supplemental Digital Content, https://links.lww.com/MD/Q431). Angiogenesis-associated genes (AAGs) were systematically obtained from AMIGO2[9] (Gene Ontology Consortium; http://amigo.geneontology.org). First, we performed principal component analysis (PCA) on the merged raw dataset to assess the presence of batch effects. The results indicated a certain degree of heterogeneity among samples from different batches. To address this, we applied the widely recognized ComBat algorithm (implemented in the R package “sva”) for batch correction. This algorithm is based on a linear model: “gene expression ~ disease status (EM/healthy control) + batch + potential confounders.” It specifically targets batch-related systematic variation while preserving biological variation associated with disease status. The corrected data were then used for all subsequent analyses. Differentially expressed genes (DEGs) were identified using a threshold of adjusted P-value < .05 and |log fold change (logFC)| > 1. This criterion was selected based on 3 considerations: First, the Benjamini-Hochberg correction was applied to control the false discovery rate (FDR) at 5%, ensuring the statistical significance of the results. Second, the |logFC| > 1 criterion ensures that the detected gene expression changes are large enough to be biologically meaningful. Third, this threshold is widely adopted in endometriosis (EM) studies and was confirmed in preliminary analyses to effectively enrich pathways related to angiogenesis.
The volcano plot visualizing the DEG distribution was generated using the “ggplot2” package in R (version 3.5.2). This work utilized exclusively publicly available data and involved no human or animal experimentation, thus qualifying for ethical review exemption.
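As an illustration of the DEG criterion described above, the following minimal Python sketch applies a Benjamini-Hochberg adjustment and then filters on adjusted P < .05 and |logFC| > 1. It is not the pipeline used in this study (which was run in R); the function names, gene labels, and numeric values are hypothetical and serve only to show the logic.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the adjusted
    # values are monotone in the rank of the raw P-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log_fc, pvalues, fc_cutoff=1.0, alpha=0.05):
    """Keep genes with adjusted P < alpha and |logFC| > fc_cutoff."""
    adjusted = benjamini_hochberg(pvalues)
    return [
        (gene, "up" if fc > 0 else "down")
        for gene, fc, q in zip(genes, log_fc, adjusted)
        if q < alpha and abs(fc) > fc_cutoff
    ]


# Hypothetical toy input, not taken from the study.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log_fc = [2.1, -1.6, 0.4, 1.3]
pvalues = [0.001, 0.004, 0.002, 0.30]

degs = select_degs(genes, log_fc, pvalues)
print(degs)
```

In this toy input, GENE_C passes the significance filter but fails the fold-change filter, and GENE_D does the opposite, which is why both conditions are applied jointly.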
2.2. Weighted gene correlation network analysis (WGCNA)
Initially, the “WGCNA” package in R (version 1.73) was used to process the sample data and remove outliers. The soft-threshold power was determined with the pickSoftThreshold function, and the resulting adjacency matrix was converted into a topological overlap matrix. Hierarchical clustering was then performed on the dissimilarity derived from this matrix. Genes with highly similar co-expression patterns were clustered into the same module, each summarized by its module eigengene (ME), and the module most strongly related to EM was selected for subsequent analysis. We intersected the DEGs, the module genes obtained from WGCNA, and the AAGs to obtain the endometriosis-angiogenesis-associated genes (EM-AAGs).
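To make the soft-thresholding step concrete, the sketch below computes an adjacency value for each gene pair as the absolute Pearson correlation raised to the power β. This assumes an unsigned network, which the text does not specify; the expression values and gene labels are hypothetical, and the code is a simplified Python illustration rather than the WGCNA implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expr, beta):
    """Unsigned adjacency (assumption): a_ij = |cor(gene_i, gene_j)| ** beta."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Hypothetical expression profiles across 4 samples.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with GENE_A
    "GENE_C": [1.0, 3.0, 2.0, 4.0],  # correlation of 0.8 with GENE_A
}

adjacency = soft_threshold_adjacency(expr, beta=9)
print(round(adjacency[("GENE_A", "GENE_C")], 4))
```

Raising the correlation to a high power keeps strong co-expression links near 1 while pushing moderate ones toward 0, which is what lets the network approximate a scale-free topology.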
2.3. GO and KEGG enrichment analysis
Kyoto Encyclopedia of Genes and Genomes (KEGG)[10] pathway enrichment and gene ontology (GO)[11] functional enrichment analyses, the latter covering biological process, cellular component, and molecular function, were performed on the EM-AAGs using the “clusterProfiler” package in R (version 4.10.1). Enrichment was considered statistically significant at P < .05.
2.4. Machine learning screening for hub genes
To further identify the key EM-AAGs critical for EM diagnosis, this study constructed a binary classification prediction model and analyzed data using 5 machine learning algorithms: Random Forest (RF), Least Absolute Shrinkage and Selection Operator (LASSO), eXtreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), and Support Vector Machine-Recursive Feature Elimination (SVM-RFE). The overlapping genes identified by the 5 machine learning algorithms were ultimately selected as the hub genes for EM-AAGs.
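The final selection rule, keeping only genes chosen by every algorithm, amounts to a set intersection. The Python sketch below illustrates it; the per-algorithm lists are hypothetical (the placeholder names GENE_1 to GENE_5 are invented for illustration), and only the three hub gene symbols come from this study.

```python
from functools import reduce


def consensus_genes(selections):
    """Return the genes selected by every algorithm, sorted by name."""
    gene_sets = (set(genes) for genes in selections.values())
    return sorted(reduce(set.intersection, gene_sets))


# Hypothetical per-algorithm selections for illustration only.
selections = {
    "RF": ["FZD4", "SRPX2", "COL8A1", "GENE_1", "GENE_2"],
    "LASSO": ["FZD4", "SRPX2", "COL8A1", "GENE_3"],
    "XGBoost": ["FZD4", "SRPX2", "COL8A1", "GENE_1", "GENE_4"],
    "GBM": ["FZD4", "SRPX2", "COL8A1", "GENE_2", "GENE_4"],
    "SVM-RFE": ["FZD4", "SRPX2", "COL8A1", "GENE_5"],
}

hub_genes = consensus_genes(selections)
print(hub_genes)
```

Requiring agreement across all methods is deliberately conservative: a gene favored by only one model family is dropped even if it ranks highly there.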
2.5. Stability verification and external validation methods
For stability verification, 56 EM-AAGs from the original study were used as input, and cross-validation (CV) was conducted for 5 machine learning algorithms (i.e., RF, LASSO, XGBoost, GBM, SVM-RFE). To reduce random errors, the process was repeated 10 times, with “model classification accuracy” as the core index to verify algorithm stability across different data subsets. This addressed the original study limitation of only screening hub genes via algorithm intersection, verifying the reliability of FZD4, SRPX2, and COL8A1 as EM angiogenesis hub genes and eliminating the risk of “hub genes being caused by algorithmic preference for specific data.”
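One way to organize such a repeated cross-validation is sketched below in Python. The number of folds (k = 5) and the use of stratified splitting are assumptions, since neither is stated in the text; the class sizes mirror the training cohorts listed in section 2.1 (36 EM, 25 controls), and the majority-class "model" is a hypothetical stand-in for the five algorithms.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, rng):
    """Assign each sample index to one of k folds, balancing classes."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def repeated_cv_accuracy(labels, fit_predict, k=5, repeats=10, seed=0):
    """Return one pooled held-out accuracy per repeat of stratified k-fold CV."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        correct = 0
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            predictions = fit_predict(train_idx, test_idx)
            correct += sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(labels))
    return accuracies


# 1 = EM, 0 = healthy control (36 + 25 = 61 training samples).
labels = [1] * 36 + [0] * 25


def majority_class(train_idx, test_idx):
    """Hypothetical baseline: predict the most frequent training label."""
    ones = sum(labels[i] for i in train_idx)
    majority = 1 if ones * 2 >= len(train_idx) else 0
    return [majority] * len(test_idx)


accuracies = repeated_cv_accuracy(labels, majority_class)
print(len(accuracies), round(accuracies[0], 3))
```

The baseline gives a useful floor: any real classifier should exceed the accuracy obtained by always predicting the larger class.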
For external validation, 3 independent GEO datasets (GSE11691, GSE120103, GSE7846) not included in the original study were used to build an external validation set (62 EM patients, 45 healthy controls), with no sample overlap with the original training set (GSE7305 + GSE23339 + GSE25628). The ComBat algorithm (R “sva” package) was applied to eliminate batch effects, verified via PCA before and after removal. Meanwhile, FZD4, SRPX2, and COL8A1 expression trends in the validation set were checked for consistency with the original training set. Subsequently, a “FZD4 + SRPX2 + COL8A1” combined diagnostic model was built using the original study’s nomogram weights. Area under the curve (AUC, with 95% CI) served as the core index to compare the model’s efficacy with single genes and the original training set’s combined model; additionally, with sensitivity = 0.85 and specificity = 0.80 as thresholds, decision curve analysis (DCA) was used to assess clinical net benefit.
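For readers unfamiliar with DCA, the quantity plotted is the net benefit at a given threshold probability, conventionally defined as TP/n minus FP/n weighted by pt/(1 − pt). The Python sketch below computes it for a model and for the "treat all" reference strategy; the labels and predicted probabilities are hypothetical, and this is an illustration of the standard formula rather than the code used in this study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of classifying as positive when probability >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Net benefit of the reference strategy that labels everyone positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = EM) and model probabilities.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

model_nb = net_benefit(y_true, y_prob, threshold=0.5)
all_nb = treat_all_net_benefit(y_true, threshold=0.5)
print(model_nb, all_nb)
```

A model shows clinical net benefit at a threshold when its curve lies above both the "treat all" value and zero (the "treat none" strategy).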
2.6. Diagnostic model construction and assessment
Receiver operating characteristic (ROC) curves were generated using the “pROC” package in R (version 1.18.5), with the AUC quantifying the diagnostic efficacy of hub genes. Subsequently, a nomogram was constructed, and calibration curves with DCA were performed to evaluate prediction accuracy.
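The AUC reported for each gene can be read as the probability that a randomly chosen EM sample has a higher expression value than a randomly chosen control, with ties counted as one half. The short Python sketch below computes the empirical AUC in this way; the expression values are hypothetical and the function name is illustrative.

```python
def roc_auc(scores_pos, scores_neg):
    """Empirical AUC: P(positive score > negative score), ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical expression values of one gene.
em_samples = [7.2, 6.8, 6.1, 5.9]
control_samples = [5.0, 6.1, 4.8]

auc = roc_auc(em_samples, control_samples)
print(auc)
```

This pairwise form is equivalent to the area under the empirical ROC curve obtained with the trapezoidal rule, so it is a convenient cross-check on values produced by plotting packages.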
2.7. Single-gene GSEA analysis
To explore potential pathways associated with hub genes in the pathogenesis of EM, we performed gene set enrichment analysis (GSEA). The C2.cp.KEGG.v7.4.symbols.gmt gene set from the Molecular Signatures Database (MSigDB, https://www.gsea-msigdb.org/gsea/msigdb) served as the reference. Significantly enriched pathways were identified using the following thresholds: absolute normalized enrichment score (|NES|) > 1.5, nominal P < .05, and FDR < 0.25.
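The three significance thresholds are applied jointly. As a minimal Python sketch, assuming GSEA results are available as records with NES, nominal P, and FDR fields (the field names, pathway labels, and values below are hypothetical):

```python
def significant_pathways(results, nes_cutoff=1.5, p_cutoff=0.05, fdr_cutoff=0.25):
    """Keep pathways with |NES| > 1.5, nominal P < .05, and FDR < 0.25."""
    kept = [
        r for r in results
        if abs(r["nes"]) > nes_cutoff and r["p"] < p_cutoff and r["fdr"] < fdr_cutoff
    ]
    # Sort from most negative to most positive enrichment.
    return sorted(kept, key=lambda r: r["nes"])


# Hypothetical GSEA output rows for illustration only.
results = [
    {"pathway": "PATHWAY_1", "nes": -2.10, "p": 0.001, "fdr": 0.02},
    {"pathway": "PATHWAY_2", "nes": 1.80, "p": 0.010, "fdr": 0.10},
    {"pathway": "PATHWAY_3", "nes": 1.20, "p": 0.001, "fdr": 0.01},  # |NES| too small
    {"pathway": "PATHWAY_4", "nes": 1.90, "p": 0.040, "fdr": 0.30},  # FDR too high
]

selected = [r["pathway"] for r in significant_pathways(results)]
print(selected)
```

Taking the absolute value of NES keeps both positively and negatively enriched gene sets, which matters here because the downstream interpretation focuses on a downregulated pathway.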
2.8. Immune infiltration analysis
This study used the “Cibersort” package in R (version 1.03) and the immune cell feature matrix gene expression profiles provided by the CIBERSORTx[12] (cibersortx.stanford.edu) to calculate the immune cell proportions of EM patients and the healthy control group. The “ggplot2” package in R was used to draw box plots and cluster overlay histograms. Subsequently, Spearman correlation analysis was used to explore the correlation between hub genes and immune cells.
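Spearman correlation is the Pearson correlation of rank-transformed values, with tied observations given their average rank. The Python sketch below illustrates the calculation for one gene against one immune cell fraction; the input values are hypothetical and the helper names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical hub gene expression and immune cell fractions per sample.
expression = [5.1, 6.3, 7.0, 7.8, 8.4]
cell_fraction = [0.12, 0.10, 0.10, 0.05, 0.02]

rho = spearman(expression, cell_fraction)
print(round(rho, 3))
```

Because it depends only on ranks, the statistic is robust to the skewed, zero-heavy distributions that estimated immune cell fractions often show.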
2.9. Transcription factors prediction and ceRNA network construction
Transcriptional regulatory networks involving hub genes and transcription factors (TFs) were analyzed using Network Analyst 3.0[13] (https://www.networkanalyst.ca/) to assess their interactions and functional impacts. The resulting networks were visualized using Cytoscape 3.8.0 software. To identify candidate regulatory miRNAs, predictions from 5 databases were integrated: miRWalk[14] (http://mirwalk.umm.uni-heidelberg.de/), miRNet[15] (https://www.mirnet.ca/), miRTarBase[16] (https://mirtarbase.cuhk.edu.cn/~miRTarBase/miRTarBase_2025/php/index.php), StarBase[17] (https://rna.sysu.edu.cn/encori/index.php), and TargetScan Human 8.0[18] (http://www.targetscan.org/vert_80/). Only miRNAs predicted by all 5 databases were retained.
3. Results
3.1. Integration of datasets and identification of DEGs
The GSE7305, GSE23339, and GSE25628 datasets were integrated and corrected for batch effects. Significant batch effects were observed across the 3 datasets before correction (Fig. 1A), whereas gene expression distributions converged after correction (Fig. 1B). Comparison between the EM group and the healthy controls identified 1528 DEGs, comprising 821 upregulated and 707 downregulated genes; the corresponding volcano plot is presented in Figure 1C. Additionally, 555 AAGs were retrieved from the AMIGO2 database using “angiogenesis” as the search term.
Figure 1.
Differential gene expression analysis. (A) Datasets before batch correction; (B) datasets after batch correction; (C) Volcano plot of DEGs: red represents significantly upregulated genes; green represents significantly downregulated genes. DEGs = differentially expressed genes.
3.2. WGCNA and discovery of EM-AAGs
Cluster analysis validated that all samples met quality control criteria without outliers (Fig. 2A). Guided by scale-free topology fit indices and mean connectivity metrics, a soft threshold power (β = 9) was empirically determined (Fig. 2B) to construct the topological overlap matrix. Hierarchical clustering analysis identified 14 co-expression modules (Fig. 2C), among which the MEgrey60 module exhibited the strongest biological relevance, comprising 1543 genes. Intersection analysis between module genes, DEGs, and AAGs uncovered 56 EM-AAGs (Fig. 2D). Differential expression patterns of EM-AAGs were visualized in a heatmap (Fig. 2E).
Figure 2.
Screening of co-expression gene modules using weighted gene correlation network analysis (WGCNA). (A) Hierarchical clustering dendrogram of EM and control group samples; (B) optimal soft threshold fitting analysis diagram; (C) heatmap of the correlation between modules and the EM trait; (D) Venn diagram of DEGs, AAGs and MEgrey60; (E) heatmap of EM-AAGs. AAGs = angiogenesis-associated genes, DEGs = differentially expressed genes, EM = endometriosis.
3.3. GO and KEGG enrichment analysis of EM-AAGs
To elucidate potential regulatory mechanisms, GO and KEGG enrichment analyses were performed on the 56 identified EM-AAGs. The results demonstrated that biological process terms were predominantly enriched in angiogenesis regulation, vasculature development regulation, and positive regulation of angiogenesis; cellular component terms were primarily associated with the basement membrane, cell junction, and collagen trimer; and molecular function terms were significantly enriched in glycosaminoglycan binding, heparin binding, and sulfur compound binding (Fig. 3A). KEGG analysis revealed significant enrichment in focal adhesion, cell adhesion molecules, and the PI3K-Akt signaling pathway. The top 10 enriched pathways are visualized in Figure 3B.
Figure 3.
GO and KEGG enrichment analyses of candidate genes. (A) GO enrichment bubble plot, showing significantly enriched terms in biological process (BP, green), cellular component (CC, red), and molecular function (MF, blue) for 56 candidate genes. The y-axis denotes term names, and the x-axis represents enrichment score. (B) KEGG enrichment results. Left: Chord diagram of “candidate gene–KEGG pathway” associations (lines indicate associations). Right: KEGG enrichment bubble plot, where the x-axis is the gene ratio, the y-axis is pathway name, dot color reflects −log10(adjusted P-value), and dot size represents the count of candidate genes in the pathway. GO = gene ontology, KEGG = Kyoto Encyclopedia of Genes and Genomes.
3.4. Machine learning reveals angiogenesis hub genes in EM
Feature importance evaluation of the 56 EM-AAGs was first performed using RF, identifying the top 15 genes by significance score (Fig. 4A and B). Subsequent LASSO regression analysis yielded 9 candidate genes (Fig. 4C and D), while GBM and XGBoost algorithms identified the top 15 candidate genes by ranking (Fig. 4E and F). SVM-RFE analysis identified 9 genes (Fig. 4G and H). Intersection analysis revealed FZD4, SRPX2, and COL8A1 as robustly overlapping hub genes across all 5 machine learning methods. Notably, FZD4 consistently ranked highest in feature importance scores throughout all algorithmic evaluations (Fig. 4I), suggesting its pivotal role in EM regulatory networks.
Figure 4.
Machine learning identifies biomarkers. (A and B) Feature importance identification based on the Random Forest algorithm; (C) LASSO regression cross-validation curve; (D) LASSO regression coefficient path diagram; (E) GBM algorithm; (F) XGBoost algorithm; (G and H) SVM-RFE algorithm accuracy and error curves; (I) Intersection of biomarkers of the 5 algorithms. GBM = gradient boosting machine, LASSO = least absolute shrinkage and selection operator.
3.5. Stability verification and external validation
For CV of hub gene screening algorithms, 56 original EM-associated angiogenesis genes (EM-AAGs) were used as input. Five algorithms – RF, LASSO, XGBoost, GBM, and SVM-RFE – underwent 10 repeated CVs to reduce random errors, with model classification accuracy as the core index to quantify algorithm stability across data subsets (Fig. 5A). This fixed the original study’s flaw of “screening hub genes only via multi-algorithm intersection”: FZD4, SRPX2, and COL8A1 were stably identified as core genes in all validations, eliminating the risk of “hub genes being biased by specific algorithms toward the original training set” and supporting their reliability as EM angiogenesis hub genes.
Figure 5.
Stability and external validation of hub genes and diagnostic model. (A) Changes in accuracy of 10 repeated cross-validations for 5 machine learning algorithms (GBM, LASSO, RF, SVM-RFE, XGBoost), used to evaluate algorithm stability and verify the reliability of hub gene screening. (B) PCA plot of the external validation set before batch effect correction, where different markers represent datasets GSE11691, GSE120103, and GSE7846, showing obvious batch differences in the original data. (C) PCA plot of the external validation set after batch effect correction using the ComBat algorithm, with increased overlap of sample distribution indicating effective batch effect correction. (D) ROC curves of individual genes FZD4, SRPX2, and COL8A1 in the external validation set, showing the diagnostic efficacy of each gene. (E) ROC curve of the “FZD4 + SRPX2 + COL8A1” combined diagnostic model in the external validation set, with an AUC of 0.933 (95% confidence interval: 0.906–0.952), reflecting the diagnostic performance of the combined model. AUC = area under the curve, GBM = gradient boosting machine, LASSO = least absolute shrinkage and selection operator, PCA = principal component analysis, RF = Random Forest, ROC = receiver operating characteristic, SVM-RFE = support vector machine-recursive feature elimination, XGBoost = extreme gradient boosting.
Regarding external validation of the diagnostic model (to address the original lack of independent validation), 3 new, non-overlapping GEO datasets (GSE11691, GSE120103, GSE7846) were used to build an external set (62 EM patients, 45 healthy controls; no overlap with the original training set GSE7305 + GSE23339 + GSE25628). Batch effects were corrected via the ComBat algorithm (R “sva” package), confirmed by improved sample overlap in PCA plots before/after correction (Fig. 5B and C).
External validation results are shown in Fig. 5D (single-gene efficacy) and Fig. 5E (combined model efficacy): FZD4, SRPX2, and COL8A1 were significantly upregulated in EM patients (P < .01) in the external set, consistent with the original training set and supporting disease-associated expression. The “FZD4 + SRPX2 + COL8A1” combined model (using original nomogram weights) had an AUC of 0.933 (95% CI: 0.906–0.952) in the external set, higher than that of single genes (e.g., COL8A1, AUC = 0.876) and close to the original training set’s AUC (0.945). At sensitivity = 0.85 and specificity = 0.80, DCA showed a clinical net benefit consistent with that in the original set.
3.6. Assessment of clinical diagnostic value
To visualize the expression levels of characteristic genes, violin plots were generated (Fig. 6A). The results showed that compared with the healthy controls, the expression of hub genes was significantly upregulated in EM samples.
Figure 6.
Construction and validation of diagnostic model based on hub genes. (A) Expression of hub genes in EM; (B) ROC curve of hub genes; (C) nomogram model; (D) calibration curve; (E) DCA curve. DCA = decision curve analysis, EM = endometriosis, ROC = receiver operating characteristic.
Subsequently, ROC curve analysis was conducted for the hub genes FZD4, SRPX2, and COL8A1 to evaluate their clinical diagnostic value. In the training set, all 3 hub genes exhibited AUC values > 0.9 (Fig. 6B). Based on this, a nomogram was constructed using the 3 hub genes to quantify their diagnostic efficacy for EM (Fig. 6C). The calibration curve showed good agreement with the ideal curve, indicating high prediction accuracy of the model (Fig. 6D). The DCA curve (Fig. 6E) further confirmed the model’s significant clinical net benefit. These results indicate that all 3 hub genes exhibited satisfactory diagnostic performance and may serve as potential diagnostic markers for angiogenesis in EM.
3.7. Single-gene GSEA reveals potential effector pathways of hub genes
To investigate signaling pathways associated with hub genes in EM pathogenesis, we performed GSEA on each hub gene individually. GSEA revealed distinct pathway associations: FZD4 showed significant enrichment in cytoskeleton in muscle cells and cell cycle related pathways (Fig. 7A), while SRPX2 was predominantly associated with cytokine-cytokine receptor interaction and nucleocytoplasmic transport (Fig. 7B). COL8A1 demonstrated enrichment in cytokine-cytokine receptor interaction and Spearman disease related pathways (Fig. 7C). Visualization of the top 15 enriched pathways (Fig. 7D–F) showed that the cell cycle pathway was downregulated in association with all 3 hub genes. These findings suggest that high expression of the hub genes is associated with reduced cell cycle pathway activity, pointing to cell cycle regulation as a shared mechanism through which they may contribute to EM pathogenesis.
Figure 7.
GSEA and pathway enrichment visualization results of hub genes FZD4, SRPX2, and COL8A1. (A) GSEA results of hub gene COL8A1, showing significantly enriched pathways (e.g., cytokine-cytokine receptor interaction, osteoclast differentiation, complement and coagulation cascades); (B) GSEA results of hub gene FZD4, showing significantly enriched pathways (e.g., cytoskeleton in muscle cells, complement and coagulation cascades, olfactory transduction); (C) GSEA results of hub gene SRPX2, showing significantly enriched pathways (e.g., cytokine-cytokine receptor interaction, cytoskeleton in muscle cells, focal adhesion); (D–F) waterfall plots of the top 15 enriched pathways for FZD4 (D), SRPX2 (E), and COL8A1 (F), where red represents upregulated pathways and blue represents downregulated pathways; the results show that all 3 genes are associated with downregulation of cell cycle pathways. GSEA = gene set enrichment analysis.
3.8. Immune infiltration analysis of hub genes
In this study, we used the CIBERSORT algorithm to analyze the composition of immune cells and explore the differences in the immune microenvironment between EM patients and healthy controls.
The results showed that EM patients had higher proportions of resting memory CD4+ T cells, M1 macrophages, M2 macrophages, activated mast cells, and neutrophils. In contrast, plasma cells, follicular helper T cells, resting NK cells, activated NK cells, and activated dendritic cells were present at lower proportions than in the control group (Fig. 8A). There were also individual differences in the proportions of immune cells among samples. The proportions of immune cells in all 61 samples were calculated, as shown in Figure 8B.
Figure 8.
Immune infiltration analysis of hub genes. (A) Analysis of immune cell infiltration in EM group and healthy control group; (B) relative percentages of 22 immune cell subpopulations in 61 samples; (C) correlation between FZD4 expression level and immune cells; (D) correlation between SRPX2 expression level and immune cells; (E) correlation between COL8A1 expression level and immune cells; (F) correlation analysis of FZD4 and NK cells; (G) correlation analysis of SRPX2 and neutrophils; (H) correlation analysis of SRPX2 with regulatory T cells; (I) correlation analysis of COL8A1 with resting memory CD4+ T cells; (J) correlation analysis of COL8A1 with regulatory T cells. EM = endometriosis, NK cells = natural killer cells.
To better understand the functional roles of hub genes in immune infiltration, we performed correlation analyses for each gene separately (Fig. 8C–E). The analysis revealed a negative correlation between FZD4 expression and NK cell infiltration (R = −0.42, P = .012) (Fig. 8F), whereas none of its positive correlations with other immune cells reached statistical significance. SRPX2 exhibited a positive correlation with neutrophil infiltration (R = 0.42, P = .011) and a negative correlation with regulatory T cell infiltration (R = −0.36, P = .033) (Fig. 8G and H). COL8A1 demonstrated a positive correlation with resting memory CD4+ T cell infiltration (R = 0.47, P = .0041) and a negative correlation with regulatory T cell infiltration (R = −0.47, P = .0039) (Fig. 8I and J). These correlations indicate that hub gene expression is associated with the immune infiltration profile, supporting a link between these genes and the immune microenvironment in EM.
3.9. Prediction of TFs and construction of ceRNA network
TFs prediction analysis indicated that FZD4 is regulated by 5 TFs, SRPX2 by 6 TFs, and COL8A1 by 7 TFs. FOXC1 can concurrently regulate all 3 genes (Fig. 9A). By merging the prediction results from 5 databases and identifying the common elements, a total of 13 miRNAs were identified (Fig. 9B). Subsequently, a ceRNA network was constructed to explore the regulatory mechanisms of FZD4, SRPX2, and COL8A1 (Fig. 9C). The network comprises 21 nodes (3 genes, 13 miRNAs, 5 lncRNAs) and 28 interacting edges. C10orf91 was found to bind to miR-31-5p and regulate COL8A1, FZD4, and SRPX2 simultaneously.
Figure 9.
Prediction of TFs for hub genes and construction of ceRNA network. (A) hub genes – TFs interaction network: yellow arrow indicates hub genes; red ellipse indicates upregulated TFs; green ellipse indicates downregulated TFs; purple ellipse indicates non-differentially expressed TFs; (B) 13 miRNAs were obtained from intersection of 5 databases; (C) ceRNA network: pink diamonds for hub genes; purple ovals for miRNAs; green arrows for lncRNAs. ceRNA = competing endogenous RNA, lncRNAs = long non-coding RNAs, miRNAs = microRNAs, TFs = transcription factors.
4. Discussion
Through analysis of DEGs and WGCNA, we identified 56 EM-AAGs. These genes were significantly enriched in angiogenesis regulation and cell adhesion, suggesting that they may participate in the physiological and pathological processes of EM by regulating the vascular microenvironment, cell interactions, and signal transduction, thereby providing direction for elucidating the molecular mechanisms of EM. Through multi-algorithm CV, 3 hub genes, FZD4, SRPX2, and COL8A1, were ultimately identified. Their expression levels were significantly upregulated in EM patients and demonstrated good diagnostic efficacy.
FZD4 is a transmembrane receptor for WNT ligands that orchestrates canonical WNT/β-catenin signaling. In EM lesions, FZD4 expression was significantly elevated, corroborating previous reports implicating aberrant WNT signaling in endometrial pathophysiology. Upon WNT ligand binding, FZD4 recruits LRP5/6 co-receptors, leading to β-catenin stabilization, nuclear translocation, and transcriptional activation of proangiogenic and proliferative target genes including VEGF and Cyclin D1.[19,20] Inhibition of FZD4 or blockade of upstream activators in EM models has been shown to attenuate neovascularization and lesion growth.[21] These findings support a model in which FZD4-mediated WNT/β-catenin activation constitutes a critical driver of angiogenesis and tissue expansion in endometriosis, making it a promising target for therapeutic intervention.
SRPX2 functions as an extracellular matrix protein that promotes early angiogenic remodeling. Knockout studies in endothelial cell models demonstrate that loss of SRPX2 specifically impairs endothelial cell migration and delays vascular sprouting.[22] Mechanistic investigations reveal that SRPX2 interacts with focal adhesion kinase (FAK) and integrinβ1, triggering FAK phosphorylation and downstream activation of Src, PI3K/Akt, and Rac1 pathways.[23,24] This signaling cascade enhances endothelial cell adhesion, motility, and tube formation – hallmarks of active angiogenesis. Aberrant FAK activation within EM lesions potentiates the proangiogenic microenvironment, reinforcing SRPX2 as a candidate biomarker and therapeutic target to disrupt lesion vascularization.
COL8A1 is a short-chain collagen family member localized to basement membranes and smooth muscle layers. COL8A1 overexpression in EM lesions contributes to angiogenesis via 2 complementary mechanisms: it enhances VEGF secretion to activate endothelial cells, and it increases matrix stiffness through augmented collagen deposition, thereby facilitating mechanotransduction signals that stabilize nascent vessels. Furthermore, COL8A1 cooperates with matrix metalloproteinases (MMPs) to remodel the extracellular matrix, creating conduits for endothelial invasion.[25] Given its dual role in biochemical and biomechanical regulation of angiogenesis, COL8A1 represents a novel therapeutic entry point to attenuate aberrant vascular support for EM lesions.
GSEA revealed that FZD4, COL8A1, and SRPX2 were all associated with suppression of the cell cycle pathway. Crucially, aberrant cell cycle regulation is closely linked to angiogenesis in EM. Studies demonstrate that within endometriotic lesions, dysregulated cell cycle proteins, specifically overexpression of cyclins D1, A, and B1 and reduced expression of cyclin-dependent kinase inhibitors, including p21 and p27kip1, drive aberrant cell proliferation. Concurrently, by upregulating pro-angiogenic factors such as VEGF, this dysregulation drives angiogenesis in ectopic lesions.[26] Furthermore, Arcyriaflavin A, a targeted inhibitor of the cyclin D1-CDK4 complex, induces apoptosis and suppresses proliferation in ectopic cells while reducing VEGF secretion, thereby inhibiting angiogenesis.[27] Notably, cell cycle proteins can modulate angiogenesis-related signaling pathways by participating in transforming growth factor (TGF)-β-mediated epithelial-mesenchymal transition (EMT). This ultimately promotes angiogenesis in ectopic lesions.[28] These findings reveal an interplay between the cell cycle regulatory network and angiogenesis in the pathogenesis of EM. Targeting this crosstalk holds promise as a novel therapeutic strategy to intervene in cell cycle dysregulation and curb lesion progression in EM.
Immune infiltration analysis showed that EM patients had higher proportions of M1/M2 macrophages than healthy women, whereas protective immune cells like NK cells were notably reduced. Dysregulated M1/M2 macrophage balance drives pathological angiogenesis in EM. Research indicates that M2 macrophages directly promote angiogenesis in ectopic lesions by secreting pro-angiogenic factors such as VEGF and platelet-derived growth factor (PDGF). For instance, Bacci et al[29] demonstrated in animal models that macrophages from EM patients exhibit an M2-polarized phenotype. These macrophages significantly upregulate VEGF expression through PI3K/Akt pathway activation while secreting interleukin (IL)-8 and TGF-β to collaboratively promote endothelial cell migration and lumen formation. Furthermore, lactate derived from glycolytic activity in ectopic endometrial stromal cells induces M2 polarization, which via the Mettl3/Trib1/ERK/STAT3 axis enhances VEGF secretion and augments pro-angiogenic capacity. Conversely, in the initial phase of EM, M1 macrophages exert anti-angiogenic effects through IFN-γ/TNF-α secretion and concurrently initiate Th1-polarized immunity to eradicate ectopic endometrial cells. Thiruchelvam et al[30] revealed that M1 macrophage-derived IL-12/Angptl4 suppress endothelial proliferation. Critically, M2 polarization counteracts this effect by downregulating anti-angiogenic factors in their secretome. Disease progression is thus characterized by M1-to-M2 transition, skewing the inflammatory milieu toward pro-angiogenic dominance.
Dysfunction of natural killer (NK) cells is closely linked to vascularization of ectopic lesions. Studies demonstrate that NK cells within peritoneal fluid and ectopic lesions of EM patients exhibit significant phenotypic and functional alterations, including overexpression of inhibitory receptors and downregulation of activating receptors. These changes impair cytotoxic activity, compromising the clearance of ectopic endometrial cells. Consequently, this defect in immune surveillance creates a favorable microenvironment for ectopic tissue survival and angiogenesis.[31]
Transcription factor prediction analysis suggests that FOXC1 may co-regulate all 3 hub genes. Previous studies have confirmed significantly elevated mRNA and protein expression of FOXC1 in ectopic endometrial tissues of EM patients. FOXC1 promotes cell proliferation, migration, and invasion by activating the PI3K/Akt signaling pathway.[32] The PI3K/Akt pathway is a key regulatory pathway for angiogenesis, capable of upregulating pro-angiogenic factors and promoting neovascularization, thereby supporting the survival of ectopic endometrial tissues.[33,34] In the constructed ceRNA network, we observed an interaction between C10orf91 and miR-31-5p. Current research indicates that miR-31-5p can promote endothelial cell proliferation, migration, and angiogenesis.[35] The pro-angiogenic effects mediated by the FOXC1-PI3K/Akt axis, coupled with the potential regulation by miR-31-5p, may collectively form the molecular basis of a multi-level regulatory network for angiogenesis in EM. This offers novel insights for future mechanistic exploration and targeted therapeutic interventions.
5. Conclusion and outlook
By integrating bioinformatics analysis and machine learning algorithms, this study identified FZD4, SRPX2, and COL8A1 as hub genes associated with angiogenesis in endometriosis (EM). Gene enrichment analysis clarified the potential molecular pathways through which these genes regulate EM-related angiogenesis. Immune infiltration analysis revealed the regulatory role of key immune cell subsets in the endometrial microenvironment on EM angiogenesis. The constructed ceRNA regulatory network outlined the multi-level molecular regulatory mechanisms underlying EM angiogenesis. In addition, potential active ingredients and traditional Chinese medicines that may act on this process were predicted. This study may provide a theoretical reference for the screening of EM angiogenesis-related biomarkers and may open new research directions for the development of EM-targeted drugs.
This study has several limitations. First, it was conducted entirely as a computational analysis of public data, without qRT-PCR, animal experiments, or functional experiments; the lack of in vitro/in vivo experimental support and clinical validation constitutes a major limitation. Future research should verify FZD4, SRPX2, and COL8A1 in independent tissue cohorts or through in vitro angiogenesis experiments. Second, genes related to pathways such as estrogen and hypoxia were not excluded a priori; instead, screening was conducted throughout based on uniform statistical criteria. Because no module was designed to quantify crosstalk, the degree of overlap between angiogenesis and the estrogen/hypoxia pathways could not be systematically evaluated and will need to be clarified in subsequent studies through hypergeometric tests or pathway interaction analyses. Third, potential confounding factors of the samples, such as disease subtype, staging, and hormonal status, were not fully adjusted for, which may affect the accuracy of the results. Fourth, the diagnostic model has not been validated in external independent cohorts, so it carries a risk of overfitting and its generalizability remains to be confirmed. Finally, both the immune deconvolution analysis and the ceRNA network rely on algorithmic predictions, and the reliability of these results still requires direct experimental corroboration.
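As one way to make the hypergeometric test named among the limitations concrete, the following Python sketch estimates whether the overlap between two gene sets is larger than expected by chance. It is a minimal illustration under stated assumptions, not an analysis performed in this study: the function names, the gene labels (G1, G2, ...), and the background universe are hypothetical placeholders. A real analysis would substitute the curated angiogenesis and estrogen/hypoxia gene lists and the set of genes actually measured on the expression platform.

```python
from math import comb


def hypergeom_overlap_pvalue(universe_size, set_a_size, set_b_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of set A genes found when set_b_size genes are drawn
    without replacement from a universe that contains set_a_size set A genes.
    """
    max_overlap = min(set_a_size, set_b_size)
    total = comb(universe_size, set_b_size)
    tail = sum(
        comb(set_a_size, k) * comb(universe_size - set_a_size, set_b_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def overlap_test(set_a, set_b, universe):
    """Restrict both gene sets to the universe, then test their overlap."""
    universe = set(universe)
    a = set(set_a) & universe
    b = set(set_b) & universe
    shared = a & b
    p_value = hypergeom_overlap_pvalue(len(universe), len(a), len(b), len(shared))
    return shared, p_value


if __name__ == "__main__":
    # Hypothetical toy data: placeholder gene labels, not real annotations.
    background = [f"G{i}" for i in range(1, 11)]   # 10 measured genes
    angiogenesis_like = ["G1", "G2", "G3"]         # placeholder set A
    hypoxia_like = ["G2", "G3", "G4"]              # placeholder set B
    shared, p = overlap_test(angiogenesis_like, hypoxia_like, background)
    print(sorted(shared), round(p, 4))
```

The choice of background universe strongly affects the result, so it should be limited to genes that could have been detected in the datasets analyzed; p-values from several pathway comparisons would also need multiple-testing correction (for example, FDR).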
Author contributions
Conceptualization: Jiaoyue Li, Xiaona Ma.
Data curation: Jiaoyue Li, Fawei Li, Sijia Zhang, Xiaona Ma.
Formal analysis: Fawei Li, Sijia Zhang, Changming Zhai.
Funding acquisition: Xiaona Ma.
Supplementary Material
Abbreviations:
- AAGs: angiogenesis-associated genes
- AUC: area under the curve
- BP: biological process
- CC: cellular component
- ceRNA: competing endogenous RNA
- DCA: decision curve analysis
- DEGs: differentially expressed genes
- EM: endometriosis
- EMT: epithelial-mesenchymal transition
- FAK: focal adhesion kinase
- FDR: false discovery rate
- FOXC1: Forkhead Box C1
- GBM: gradient boosting machine
- GEO: Gene Expression Omnibus
- GO: gene ontology
- GSEA: gene set enrichment analysis
- IL: interleukin
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- LASSO: least absolute shrinkage and selection operator
- lncRNAs: long non-coding RNAs
- MF: molecular function
- miRNAs: microRNAs
- MMPs: matrix metalloproteinases
- MSigDB: Molecular Signatures Database
- NES: normalized enrichment score
- NK cells: natural killer cells
- PDGF: platelet-derived growth factor
- PI3K/Akt: phosphoinositide 3-kinase/protein kinase B
- RF: Random Forest
- ROC: receiver operating characteristic
- SVM-RFE: support vector machine-recursive feature elimination
- TFs: transcription factors
- TGF-β: transforming growth factor-beta
- Th1: T helper 1 cells
- VEGF: vascular endothelial growth factor
- WGCNA: weighted gene co-expression network analysis
- XGBoost: extreme gradient boosting
All authors read and approved the final version of the manuscript and consent to its publication. The authors declare the originality of this work and attest that it has not been previously published or submitted to another journal.
The datasets generated and analyzed during this study are available in the Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/geo/). Additional datasets analyzed are available from the corresponding author upon reasonable request.
This study was supported by the National Natural Science Foundation of China (No. 81973895).
The authors have no conflicts of interest to disclose.
Supplemental Digital Content is available for this article.
How to cite this article: Li J, Li F, Zhang S, Zhai C, Ma X. Integrated bioinformatics analysis and machine learning identifies FZD4, SRPX2, and COL8A1 as angiogenesis hub genes in endometriosis. Medicine 2025;104:43(e45341).
Contributor Information
Jiaoyue Li, Email: xuetianguying@163.com.
Fawei Li, Email: xuetianguying@163.com.
Sijia Zhang, Email: 1204723554@qq.com.
Changming Zhai, Email: zhaichangming1989@163.com.
References
- [1]. Allaire C, Bedaiwy MA, Yong PJ. Diagnosis and management of endometriosis. CMAJ. 2023;195:E363–71.
- [2]. Zondervan KT, Becker CM, Missmer SA. Endometriosis. N Engl J Med. 2020;382:1244–56.
- [3]. Jinghe L. Past history, current situation and progress in the knowledge of endometriosis (Chinese). Chin J Pract Gynecol Obstetrics. 2020;36:193–6.
- [4]. Eelen G, Treps L, Li X, Carmeliet P. Basic and therapeutic aspects of angiogenesis updated. Circ Res. 2020;127:310–29.
- [5]. Nana L, Xiaohan T, Meisong L. Research progress of vascular endothelial growth factor and related MicroRNA in endometriosis (Chinese). J Int Obstetrics Gynecol. 2021;48:272–6.
- [6]. Healy DL, Rogers PA, Hii L, Wingfield M. Angiogenesis: a new theory for endometriosis. Hum Reprod Update. 1998;4:736–40.
- [7]. Feng D, Menger MD, Wang H, Laschke MW. Luminal epithelium in endometrial fragments affects their vascularization, growth and morphological development into endometriosis-like lesions in mice. Dis Model Mech. 2014;7:225–32.
- [8]. Barrett T, Wilhite SE, Ledoux P, et al. NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013;41:D991–5.
- [9]. Carbon S, Ireland A, Mungall CJ, Shu SQ, Marshall B, Lewis S; AmiGO Hub. AmiGO: online access to ontology and annotation data. Bioinformatics. 2008;25:288–9.
- [10]. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30.
- [11]. Gene Ontology Consortium. The gene ontology resource: enriching a GOld mine. Nucleic Acids Res. 2021;49:D325–34.
- [12]. Newman AM, Liu CL, Green MR, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12:453–7.
- [13]. Zhou G, Soufan O, Ewald J, Hancock REW, Basu N, Xia J. NetworkAnalyst 3.0: a visual analytics platform for comprehensive gene expression profiling and meta-analysis. Nucleic Acids Res. 2019;47:W234–41.
- [14]. Sticht C, De La Torre C, Parveen A, Gretz N. miRWalk: an online resource for prediction of microRNA binding sites. PLoS One. 2018;13:e0206239.
- [15]. Chang L, Zhou G, Soufan O, Xia J. miRNet 2.0: network-based visual analytics for miRNA functional analysis and systems biology. Nucleic Acids Res. 2020;48:W244–51.
- [16]. Huang HY, Lin YC, Cui S, et al. miRTarBase update 2022: an informative resource for experimentally validated miRNA-target interactions. Nucleic Acids Res. 2022;50:D222–30.
- [17]. Li JH, Liu S, Zhou H, Qu LH, Yang JH. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res. 2014;42:D92–7.
- [18]. McGeary SE, Lin KS, Shi CY, et al. The biochemical basis of microRNA targeting efficacy. Science. 2019;366:eaav1741.
- [19]. Steinhart Z, Angers S. Wnt signaling in development and tissue homeostasis. Development. 2018;145:dev146589.
- [20]. Mariadas H, Chen JH, Chen KH. The molecular and cellular mechanisms of endometriosis: from basic pathophysiology to clinical implications. Int J Mol Sci. 2025;26:2458.
- [21]. Yin M, Wang J, Ying X, Fang Z, Zhang X. Long non coding RNA, C8orf49, a novel diagnostic and prognostic biomarker, enhances PTEN/FZD4-mediated cell growth and metastasis by sponging miR-1323 in endometriosis. Mol Cell Endocrinol. 2023;575:112040.
- [22]. Miljkovic-Licina M, Hammel P, Garrido-Urbani S, Bradfield PF, Szepetowski P, Imhof BA. Sushi repeat protein X-linked 2, a novel mediator of angiogenesis. FASEB J. 2009;23:4105–16.
- [23]. Tanaka K, Arao T, Maegawa M, et al. SRPX2 is overexpressed in gastric cancer and promotes cellular migration and adhesion. Int J Cancer. 2009;124:1072–80.
- [24]. Mitra SK, Schlaepfer DD. Integrin-regulated FAK-Src signaling in normal and cancer cells. Curr Opin Cell Biol. 2006;18:516–23.
- [25]. Yi L, Xie H, Zhang X, et al. LPAR3 and COL8A1, as matrix stiffness-related biomarkers, promote nasopharyngeal carcinoma metastasis by triggering EMT and angiogenesis. Cell Signal. 2025;131:111712.
- [26]. Gonçalves GA, Camargo-Kosugi CM, Bonetti TC, et al. p27kip1 overexpression regulates VEGF expression, cell proliferation and apoptosis in cell culture from eutopic endometrium of women with endometriosis. Apoptosis. 2015;20:327–35.
- [27]. Hirakawa T, Nasu K, Aoyagi Y, Takebayashi K, Narahara H. Arcyriaflavin a, a cyclin D1-cyclin-dependent kinase4 inhibitor, induces apoptosis and inhibits proliferation of human endometriotic stromal cells: a potential therapeutic agent in endometriosis. Reprod Biol Endocrinol. 2017;15:53.
- [28]. Szymański M, Bonowicz K, Antosik P, et al. Role of cyclins and cytoskeletal proteins in endometriosis: insights into pathophysiology. Cancers (Basel). 2024;16:836.
- [29]. Bacci M, Capobianco A, Monno A, et al. Macrophages are alternatively activated in patients with endometriosis and required for growth and vascularization of lesions in a mouse model of disease. Am J Pathol. 2009;175:547–56.
- [30]. Thiruchelvam U, Saunders PTK, Critchley HOD. Angiogenic crosstalk within the endometrium reveals a pivotal role for the endometrial macrophage. Biol Reprod. 2012;87(Suppl_1):100.
- [31]. Reis JL, Rosa NN, Ângelo-Dias M, Martins C, Borrego LM, Lima J. Natural killer cell receptors and endometriosis: a systematic review. Int J Mol Sci. 2023;24:331.
- [32]. Zhou X, Chen Z, Pei L, Sun J. MicroRNA miR-106a-5p targets forkhead box transcription factor FOXC1 to suppress the cell proliferation, migration, and invasion of ectopic endometrial stromal cells via the PI3K/Akt/mTOR signaling pathway. Bioengineered. 2021;12:2203–13.
- [33]. Jianqing Q, Jinghua W, Xiaying Z, Hongyu W, Lifang Y, Zhenhua S. The expression of HIF-1α and VEGF in proliferative-phase endometriosis and their correlation with angiogenesis (Chinese). Chin J Clin Obstetrics Gynecol. 2016;17:458–9.
- [34]. Xiaoxia C, Lili Q. Correlation between the expression levels of adipoQ receptor 3, cell adhesion molecules, and hypoxia-inducible factor-1α of infertility women with endometriosis and their fertility index (Chinese). Chin J Family Plan. 2022;30:1819–22.
- [35]. Yan C, Chen J, Wang C, et al. Milk exosomes-mediated miR-31-5p delivery accelerates diabetic wound healing through promoting angiogenesis. Drug Deliv. 2022;29:214–28.