2026 Feb 10;26(1):146. doi: 10.1007/s10238-026-02074-x

Machine learning, whole-transcriptome and integrative omics analysis reveals key regulatory networks governing human spermatogonial stem cells

Danial Hashemi Karoii 1,2, Maryam Osanloo 3, Hossein Azizi 1, Thomas Skutella 4
PMCID: PMC12909477  PMID: 41665764

Abstract

Spermatogenesis, the process of sperm cell development, depends heavily on precise and dynamic regulation of gene expression, much of which is controlled by regulatory networks and hub genes governing spermatogonial stem cell (SSC) identity, including components involved in post-transcriptional regulation. During this complex process, a wide range of RNA-binding proteins (RBPs) and RNA-processing enzymes coordinate the transcription, splicing, transport, storage, and translation of the mRNAs required for germ cell development. Raw sequencing data were processed and normalized using standard bioinformatics pipelines (e.g., STAR, DESeq2). To identify key regulatory networks and hub genes governing SSC identity, including components involved in post-transcriptional regulation, we applied integrative omics approaches that combine transcriptomic data with publicly available proteomic and interactome databases. Hub proteins were determined through weighted gene co-expression network analysis (WGCNA) and centrality scoring in protein-protein interaction (PPI) networks. Machine learning models, including random forest and support vector machine (SVM), were trained to classify critical regulators based on expression features and metadata. Additionally, cell-cell communication was inferred by ligand-receptor interaction analysis with CellChat and NicheNet to explore the microenvironmental impact on RNA metabolic processes. All findings were validated across culture conditions and biological replicates to ensure robustness. Microarray analysis revealed 92 upregulated and 126 downregulated genes in SSCs versus human testicular fibroblasts (htFibs), with enrichment in motile cilium assembly, spermatid development, and gamete generation. The differentially expressed genes (DEGs) were mainly extracellular matrix proteins, transporters, and adhesion molecules. PPI network and KEGG analyses identified key hub genes (e.g., MMP3, CAV1, TGFBR2) involved in cell cycle and meiosis pathways.
Single-cell RNA-seq of human testicular cells identified 17 clusters, including germ and somatic cell types. Germ cell re-clustering defined SSC subpopulations marked by genes such as FAM74F1, SMCP, and ADAD1. GSEA indicated metabolic shifts, especially in oxidative phosphorylation, during SSC differentiation. Ligand–receptor analysis revealed active cell-cell signaling, particularly involving fibroblasts and macrophages. These findings enhance the understanding of human spermatogonia culture and gene expression, providing insights into SSC biology and potential applications in reproductive medicine.

Keywords: Microarray, Spermatogonial stem cell, Bioinformatics, Reproductive medicine

Introduction

A significant portion of the world’s population struggles with infertility: around 10% of couples are affected, and in about half of these cases a male factor is the primary or a contributing cause [1]. Male infertility is a multifactorial clinical condition with complex genetic components, and azoospermia is its most prevalent genetically linked contributor [1, 2]. Spermatogenesis, and with it male fertility, begins with spermatogonial stem cells (SSCs), the cells from which sperm are ultimately derived. Because SSCs can restore injured or defective spermatogenesis, SSC transplantation holds great promise as a therapy for male infertility [3, 4]. However, SSCs are scarce, and no method yet exists to culture and expand them long-term. Several studies in humans and animals have shown that embryonic stem cells (ESCs) can differentiate into putative primordial germ cells (PGCs), which can then differentiate into SSCs [5]. Reconstituting SSC formation in vitro nevertheless remains a critical problem: existing approaches either show poor induction efficiency or entail complicated induction processes with poorly defined components [6].

By identifying particular chromatin states and their associated transcription factors (TFs), chromatin accessibility profiling has emerged as a powerful method for investigating genetic and epigenetic regulation, shedding light on the temporal and spatial dimensions of the pangenomic regulatory landscape of cells and tissues [7, 8]. Gene regulatory networks (GRNs) can also be interpreted in terms of changes in chromatin accessibility, and chromatin accessibility profiling is expected to be a powerful method for identifying regulatory DNA elements [9]. Two technologies that map the chromatin accessibility landscape are the assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) and DNase-seq [10]. Within the regulatory networks and hub genes governing SSC identity, components involved in post-transcriptional regulation play a central role in the RNA life cycle, encompassing RNA synthesis, processing, transport, translation, and degradation. These proteins ensure the accurate expression of genetic information by regulating each stage of RNA maturation and function [11, 12]. For example, RNA polymerases initiate transcription, while splicing factors and RNA-binding proteins guide intron removal and stabilize mature transcripts. After processing, RNA export proteins shuttle transcripts from the nucleus to the cytoplasm, where the translation machinery and regulatory proteins, such as those of microRNA pathways, modulate protein synthesis. Finally, RNA degradation complexes maintain cellular homeostasis by removing defective or excess RNA molecules. Altogether, these regulatory networks, hub genes, and post-transcriptional components are essential for proper gene expression and are closely linked to various diseases when their functions are disrupted [13, 14].

Applied at single-cell resolution, chromatin accessibility profiling not only distinguishes cell types but also reveals the accessibility of linked genes and potential TF-binding sites, uncovering regulatory regions specific to each cell type [15]. Chromatin accessibility may regulate transcription factor binding and is closely associated with differential gene expression. To characterize changes in chromatin accessibility during spermatogenesis, ATAC-seq and ChIP-seq data were recently integrated. Namekawa et al. identified potential regulatory elements specific to spermatogenesis and demonstrated genome-wide, dynamic remodeling of open chromatin during this process [16]. They also found that sex chromosomes and autosomes have distinct chromatin environments throughout spermatogenesis [17]. These findings further support the view that bivalent domain formation and poised chromatin drive genome-wide epigenetic modifications in the later stages of spermatogenesis.

Chromatin profiling can also reveal a cell’s gene regulatory networks. Current applications of GRN analysis include ankylosing spondylitis, cancer-related immune cells, plasma cell function, and B-cell development in mammals [18, 19]. Based on a thorough investigation of these networks, Man et al. reported, among other findings, that GRNs of tumor-infiltrating T cells were smaller than those of their counterpart immune cells in blood [20]. SSC fate decisions are controlled by a well-established gene network; the transcription factors Plzf and Taf4b, for example, are known to regulate SSC activities. As Davoodi et al. noted, however, the fundamental GRNs that govern SSC fate decisions, and how these networks are disrupted in clinical cases of human male infertility, remain unknown [19]. Despite significant progress, improving in vitro SSC culture remains a hurdle. Although many culture conditions have been investigated, there is still room for improvement in keeping SSCs stem-like and functional over long periods [21]. Functional investigations have used gene expression profiling and protein-protein interaction networks to clarify the mechanisms that shape SSC behavior in culture. In addition, analyses of ligand-receptor interactions shed light on the communication between SSCs and supporting testicular cells such as Sertoli cells, Leydig cells, fibroblasts, and macrophages [22, 23].

This study combines microarray-based gene expression profiling and single-cell transcriptomic analysis to characterize human SSCs in culture. By identifying differentially expressed genes (DEGs), critical regulatory pathways, and cell-cell communication networks, we aim to construct a comprehensive molecular framework that can help optimize SSC culture conditions and advance stem cell-based reproductive therapies.

Materials and methods

Methods & testicular tissue

Three adult males provided testicular tissue for this study, which took place in October 2014. Informed written consent was obtained from all participants, and the local ethics boards of the University Hospitals of Tübingen and Heidelberg approved all research using human material. The patients ranged in age from 23 to 67 years. The donated healthy tissue came from diverse clinical backgrounds: orchiectomies performed after hormone therapy in transsexual patients (1), orchiectomies of healthy testes in patients with penile carcinoma or prostate cancer (2), and biopsies of “healthy” (nonmalignant) peritumoral testicular tissue from patients with seminoma (3). Specialists from the Department of Pathology of the University Clinic in Tübingen performed histopathological examinations of the testicular tissue used in this study, both for conventional diagnosis and for more cancer-specific analyses [24, 25].

Gene expression profiling was used to characterize testicular adult stem cells, including both short-term SSC cultures (< 2 weeks after matrix selection) and long-term human adult germline stem cell (haGSC) cultures (> 2 months, up to 6 months) derived from the testes of all three donors (Supplementary 1). Before microarray comparison with hESCs and hFibs, the gene expression profiles of individual cells from each group were examined on the Biomark real-time quantitative PCR (qPCR) system (Fluidigm), and the same system was later used to verify genes selected from the microarray results.

HaGSCs culture

After removal of the tunica albuginea, the seminiferous tubules were mechanically separated from the human testicular tissue. To obtain a single-cell suspension, the dissociated tubules from each sample were enzymatically digested for 30 min at 37 °C with gentle mixing in HBSS containing Ca²⁺ and Mg²⁺ (PAA) supplemented with 750 U/mL collagenase type IV (Sigma), 0.25 mg/mL dispase II (Roche), and 5 µg/mL DNase. Digestion was stopped by adding 10% ES cell-qualified FBS. The cell suspension was passed through a 100 μm cell strainer and centrifuged for 15 min at 1000 rpm. After the supernatant was removed, the pellet was washed with HBSS containing Ca²⁺ and Mg²⁺. The laminin-binding (LamB) cells were then carefully transferred onto an irradiated CF-1 feeder layer in a 12-well plate containing hGSC growth medium. Every two to three days, half of the culture medium was replaced with fresh hGSC medium. Under these conditions, the spermatogonia proliferated in a highly heterogeneous pattern. Splitting the cultures 1:2 every two to three weeks gave the best results; it was crucial to keep the cells at an appropriate density in the wells and to avoid over-dilution [26–30].
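The centrifugation step is specified as 1000 rpm, which corresponds to different forces on different rotors. The sketch below uses the standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in cm; the 15 cm radius in the example is a hypothetical value, since the rotor is not reported.

```python
def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) for a rotor speed and radius.

    Standard relation: RCF = 1.118e-5 * r_cm * rpm**2.
    """
    if rpm < 0 or radius_cm <= 0:
        raise ValueError("rpm must be >= 0 and radius_cm must be > 0")
    return 1.118e-5 * radius_cm * rpm ** 2


def rcf_to_rpm(rcf: float, radius_cm: float) -> float:
    """Rotor speed needed to reach a target RCF on a rotor of the given radius."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be >= 0 and radius_cm must be > 0")
    return (rcf / (1.118e-5 * radius_cm)) ** 0.5


if __name__ == "__main__":
    # 1000 rpm is from the protocol; 15 cm is a HYPOTHETICAL rotor radius.
    print(f"{rpm_to_rcf(1000, 15.0):.0f} x g")
```

On a 10 cm rotor the same 1000 rpm gives only about 112 × g, which is why stating the force in × g makes the step reproducible across centrifuges.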

Cultivation of human fibroblasts

Dermal scrotal fibroblasts were used to create a primary cell line, which was then cultured in DMEM high glucose supplemented with 10% FBS Superior (Biochrom), 200 µM L-glutamine (PAA), 1% nonessential amino acids (PAA), and 100 mM β-mercaptoethanol (Invitrogen).

Collection of HaSSCs with micromanipulation system

In each sample, spermatogonia were detached from the underlying monolayer of somatic cells or feeder layer by rinsing with culture medium. The cells were carefully resuspended in a single-cell collection solution in a 3.5 cm culture dish, which was placed on the stage of a Zeiss inverted microscope equipped with a micromanipulation system and warmed to 37 °C. Individual cells were picked at 20× magnification with a micromanipulation pipette. After only a few days of growth, the typical spermatogonial morphology was clearly apparent: a spherical shape, a diameter of 6–12 μm, and a high nucleus-to-cytoplasm ratio, visible as a thin, bright cytoplasmic ring between the spherical nucleus and the outer cell membrane.
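The morphological criteria above can be expressed as a simple screening rule. This is a hypothetical sketch: the 6–12 μm diameter window comes from the text, but the nucleus-to-cytoplasm cutoff and circularity cutoff are assumed values, and the measurements would have to come from an image-segmentation step that is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellMeasurement:
    diameter_um: float       # cell diameter from image segmentation
    nucleus_area_um2: float  # segmented nuclear area
    cell_area_um2: float     # segmented whole-cell area
    circularity: float       # 4*pi*area / perimeter**2; 1.0 is a perfect circle


def nc_ratio(c: CellMeasurement) -> float:
    """Nucleus-to-cytoplasm area ratio (cytoplasm = cell minus nucleus)."""
    cytoplasm = c.cell_area_um2 - c.nucleus_area_um2
    if cytoplasm <= 0:
        raise ValueError("cell area must exceed nuclear area")
    return c.nucleus_area_um2 / cytoplasm


def looks_like_spermatogonium(
    c: CellMeasurement,
    min_diameter_um: float = 6.0,   # from the text
    max_diameter_um: float = 12.0,  # from the text
    min_nc_ratio: float = 1.5,      # HYPOTHETICAL cutoff for a "high" N:C ratio
    min_circularity: float = 0.85,  # HYPOTHETICAL cutoff for "spherical"
) -> bool:
    return (
        min_diameter_um <= c.diameter_um <= max_diameter_um
        and nc_ratio(c) >= min_nc_ratio
        and c.circularity >= min_circularity
    )
```

For example, a round 9 μm cell whose nucleus occupies two thirds of its area passes, while a larger oval cell of 13 μm is rejected on diameter alone.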

Collection of single cells from enzymatically degraded typical HaGSC colonies

We enzymatically dissociated 24 hFibs, 24 hESCs, and 48 haGSCs into single cells and then used micromanipulation to pick cells for single-cell gene expression profiling from a typical hFib culture or from a confluent, growing hGSC or hESC colony. The aims of this approach were to better understand the homogeneity or heterogeneity of cells selected from a typical haGSC colony, to obtain the expression profiles of key germ cell- and pluripotency-associated genes in each cell, and to derive optimal culture strategies from these profiles. We used 200 cells per probe for microarray analysis and one cell per sample probe for Fluidigm analysis, which was used to confirm selected pluripotency-associated genes. Immediately after collection, the cells were mixed with either 6.5 µL of CellsDirect buffer for Fluidigm analysis or 10 µL of RNA direct lysis buffer for microarray analysis [27, 31–33].

Control of donor-specific effects

To minimize donor-specific effects arising from inter-individual biological variability, donor identity was explicitly incorporated into the scRNA-seq integration framework. During Seurat-based anchor integration, donor information was treated as a covariate, allowing alignment of shared cell states across donors while preserving genuine biological differences between cell types. Integration anchors were identified across cells originating from multiple donors, ensuring that no single donor disproportionately influenced the integrated embedding. Downstream dimensionality reduction, clustering, and marker identification were performed on the donor-corrected integrated expression space, thereby reducing confounding effects related to donor-specific transcriptional signatures. In addition, quality-control filtering and normalization steps were applied uniformly across all donors, and cells from each donor were required to meet identical inclusion criteria to avoid systematic donor bias.

Sorting and comparing protein groups to identify regulatory networks and hub genes governing SSC identity, including components involved in post-transcriptional regulation

The online tool ArrayMining was used to compare differentially expressed genes across the three study groups, and the resulting gene list was examined with the ontology analysis program PANTHER (http://www.pantherdb.org/). A total of 92 upregulated and 126 downregulated genes were identified using the criteria |log2FC| > 1 and adjusted p-value < 0.05, and these genes were further analyzed for functional enrichment and pathway involvement. For downstream modeling, the feature set comprised these DEGs plus the top 10% of genes by variance, in order to capture high-variance genes that may play significant roles in SSC regulation. Raw gene expression data were normalized with the DESeq2 package, which applies variance stabilization and log2 transformation to account for differences in sequencing depth and library size, making the normalized data suitable for downstream analyses, including machine learning and network analysis [26, 28, 34–37].
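A minimal sketch of the DEG thresholds and feature-set construction described above, assuming a table of per-gene log2 fold changes and raw p-values plus a normalized genes × samples matrix. Benjamini–Hochberg adjustment is assumed for the "adjusted p-value" because the text does not name the correction; the function names are hypothetical.

```python
import numpy as np
import pandas as pd


def benjamini_hochberg(pvals) -> np.ndarray:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: adjusted p_(i) = min over j >= i of scaled p_(j).
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def select_features(stats, expr, lfc_cut=1.0, alpha=0.05, top_var_frac=0.10):
    """Return (up, down, features).

    stats: DataFrame indexed by gene with columns 'log2FC' and 'pvalue'.
    expr:  normalized genes x samples DataFrame.
    """
    padj = benjamini_hochberg(stats["pvalue"].to_numpy())
    lfc = stats["log2FC"].to_numpy()
    is_deg = (np.abs(lfc) > lfc_cut) & (padj < alpha)
    up = stats.index[is_deg & (lfc > 0)]
    down = stats.index[is_deg & (lfc < 0)]
    # Add the top fraction of genes ranked by variance across samples.
    variances = expr.var(axis=1).sort_values(ascending=False)
    n_top = max(1, int(np.ceil(top_var_frac * len(variances))))
    features = up.union(down).union(variances.index[:n_top])
    return up, down, features
```

Taking the union means a gene that is both a DEG and highly variable is counted once in the feature set.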

Investigating pathways enriched in GO

To examine pathway enrichment in KEGG and Reactome, we used Enrichr, a web-based resource for functional gene annotation (http://amp.pharm.mssm.edu/Enrichr/). To validate the biological functions of the genes in the protein-protein interaction (PPI) network, we performed functional enrichment analyses using the STRING enrichment analysis in Cytoscape. ShinyGO was used to assess the mediating roles of the associated infertility genes in biological mechanisms [38, 39].
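Enrichr-style over-representation analysis asks whether a gene list overlaps a pathway's gene set more than expected by chance, which is commonly scored with the upper tail of the hypergeometric distribution (equivalently, a one-sided Fisher exact test). The sketch below computes that tail with exact integer arithmetic; it illustrates the statistic and is not Enrichr's own code, which also reports adjusted p-values and combined scores.

```python
from math import comb


def hypergeom_upper_tail(k: int, background: int, term_size: int, list_size: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(background, term_size, list_size).

    background: genes in the annotated universe
    term_size:  genes annotated to the pathway or term
    list_size:  genes in the query list (e.g. the DEGs)
    k:          query genes that fall in the term (observed overlap)
    """
    if not (0 <= term_size <= background and 0 <= list_size <= background):
        raise ValueError("term and list sizes must lie within the background")
    upper = min(term_size, list_size)
    total = comb(background, list_size)
    hits = sum(
        comb(term_size, i) * comb(background - term_size, list_size - i)
        for i in range(max(k, 0), upper + 1)
    )
    return hits / total  # exact integers until this final division


def fold_enrichment(k: int, background: int, term_size: int, list_size: int) -> float:
    """Observed overlap fraction divided by the fraction expected by chance."""
    return (k / list_size) / (term_size / background)
```

Exact integer combinatorics avoids underflow for small p-values; for thousands of terms a library routine would be faster.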

Network of PPIs for target genes of regulatory networks and hub genes governing SSC identity, including components involved in post-transcriptional regulation

Further associations among candidate target genes were identified using Cytoscape (https://cytoscape.org/) and the STRING database (https://string-db.org/). To annotate protein-protein interactions comprehensively, STRING integrates several biological data sources, such as genomic context, co-expression, and curated pathways. Cytoscape, in turn, offers a wide range of network analysis plugins and methods that go beyond basic PPIs for visualizing and exploring complex biological networks. Within Cytoscape, the CytoHubba plugin was used to score nodes: node size reflects degree, and hub genes among the up- and downregulated genes were ranked with the maximal clique centrality (MCC) algorithm. CytoHubba also provides centrality measures beyond degree, including betweenness and closeness centrality, and using several of its node-ranking algorithms strengthened our analysis by offering a multi-faceted view of the regulatory landscape.
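CytoHubba's maximal clique centrality scores a node v as the sum of (|C| − 1)! over all maximal cliques C that contain v, so nodes embedded in large, dense cliques rank highest. The stdlib-only sketch below computes MCC and degree for a small undirected network using Bron–Kerbosch clique enumeration with pivoting; it follows the published MCC definition but is not cytoHubba's implementation, and the toy network in the test is hypothetical.

```python
from collections import defaultdict
from math import factorial


def build_adjacency(edges):
    """Undirected adjacency sets from (a, b) pairs; self-loops are ignored."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def mcc_scores(adj):
    """MCC(v) = sum over maximal cliques C containing v of (|C| - 1)!."""
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        if len(clique) < 2:
            continue
        weight = factorial(len(clique) - 1)
        for v in clique:
            scores[v] += weight
    return scores


def degree_scores(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}


def top_hubs(scores, n=10):
    """Nodes sorted by descending score, ties broken by name."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:n]
```

In a triangle A–B–C with a pendant edge C–D, C scores 2! + 1! = 3 and tops the ranking, matching its degree; in larger networks MCC and degree diverge because the factorial weight rewards clique size.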

WGCNA and robustness assessment

WGCNA was performed to identify co-expression modules associated with SSC identity. Prior to network construction, genes were filtered to retain those with high variance across samples to reduce noise and improve network stability. To determine an appropriate soft-thresholding power (β), the pickSoftThreshold function in the WGCNA R package was applied. The optimal β value was selected based on the criterion of approximate scale-free topology, defined as achieving a scale-free topology fit index (R²) ≥ 0.85 while maintaining adequate mean connectivity. Based on these criteria, a soft-threshold power of β = X was chosen for network construction. An adjacency matrix was constructed using the selected soft-threshold power and transformed into a topological overlap matrix (TOM) to measure network interconnectedness. Hierarchical clustering of genes based on TOM dissimilarity was performed, and gene modules were identified using the dynamic tree cut algorithm with a minimum module size of 30 genes. Closely related modules were merged based on eigengene dissimilarity (merge height = 0.25, i.e., eigengene correlation ≥ 0.75). To evaluate module robustness and reproducibility, module preservation statistics were calculated using permutation testing implemented in the modulePreservation function of WGCNA. Preservation was assessed using Zsummary and medianRank statistics by comparing SSC networks against reference datasets and permuted gene labels. Modules with Zsummary > 10 were considered highly preserved, while those with Zsummary between 2 and 10 were considered moderately preserved.
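The soft-threshold selection and TOM construction can be sketched in NumPy as follows. This approximates what WGCNA's pickSoftThreshold and TOM similarity do for an unsigned network (adjacency |cor|^β, a scale-free fit of log10 p(k) against log10 k over 10 connectivity bins, signed R² = −sign(slope) × R²); it is not a drop-in replacement, and its binning details differ from the R package.

```python
import numpy as np


def scale_free_fit(k, n_bins=10):
    """Signed scale-free R^2 and slope of log10 p(k) vs log10 mean k per bin."""
    k = np.asarray(k, dtype=float)
    edges = np.histogram_bin_edges(k, bins=n_bins)
    idx = np.digitize(k, edges[1:-1])  # bin index 0..n_bins-1
    occupied = [b for b in range(n_bins) if np.any(idx == b)]
    mean_k = np.array([k[idx == b].mean() for b in occupied])
    p_k = np.array([np.mean(idx == b) for b in occupied])
    x, y = np.log10(mean_k), np.log10(p_k)
    slope = np.polyfit(x, y, 1)[0]
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return -np.sign(slope) * r2, slope


def unsigned_adjacency(expr, beta):
    """expr: samples x genes. a_ij = |cor(i, j)|**beta with a zero diagonal."""
    a = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
    np.fill_diagonal(a, 0.0)
    return a


def pick_soft_threshold(expr, powers=range(1, 21), r2_cut=0.85):
    """Return (lowest beta reaching r2_cut or None, rows of (beta, signed R^2, slope, mean k))."""
    table = []
    for beta in powers:
        k = unsigned_adjacency(expr, beta).sum(axis=1)
        signed_r2, slope = scale_free_fit(k)
        table.append((beta, signed_r2, slope, k.mean()))
    chosen = next((b for b, r2, _, _ in table if r2 >= r2_cut), None)
    return chosen, table


def tom_similarity(a):
    """Unsigned topological overlap: (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    k = a.sum(axis=1)
    shared = a @ a  # l_ij: summed adjacency through common neighbors
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return tom
```

Mean connectivity falls monotonically as β rises, which is the trade-off against scale-free fit that the "adequate mean connectivity" criterion refers to.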

Additional robustness checks were performed by repeating network construction under varying conditions, including (i) alternative soft-threshold powers (± 1 β), (ii) resampling of input genes, and (iii) repeated analyses using random subsets of samples. Module eigengene–trait correlations were considered robust if directionality and statistical significance were consistent across these perturbations.
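The sample-subset perturbation described above can be illustrated for a single module: compute the module eigengene as the first principal component of standardized module expression (sign-aligned with average expression, as WGCNA does by default), then check whether the eigengene–trait correlation keeps its sign across random sample subsets. This sketch covers directionality only; the significance criterion and the alternative-β and gene-resampling arms would be added analogously.

```python
import numpy as np


def module_eigengene(x):
    """First PC of standardized module expression (samples x genes),
    sign-aligned with the average standardized expression."""
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0] * s[0]
    if np.corrcoef(me, z.mean(axis=1))[0, 1] < 0:
        me = -me
    return me


def eigengene_direction_stability(x, trait, n_rep=200, frac=0.8, seed=0):
    """Full-data eigengene-trait correlation and the fraction of random
    sample subsets in which the correlation keeps the same sign."""
    rng = np.random.default_rng(seed)
    trait = np.asarray(trait, dtype=float)
    full_r = np.corrcoef(module_eigengene(x), trait)[0, 1]
    n = len(trait)
    m = max(3, int(round(frac * n)))
    same_sign = 0
    for _ in range(n_rep):
        idx = rng.choice(n, size=m, replace=False)
        r = np.corrcoef(module_eigengene(x[idx]), trait[idx])[0, 1]
        same_sign += int(np.sign(r) == np.sign(full_r))
    return full_r, same_sign / n_rep
```

A stability fraction close to 1 indicates that the module–trait association does not hinge on a few samples.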

Gene expression analyses by Fluidigm Biomark system

The Biomark real-time quantitative PCR (qPCR) system (Fluidigm) was used to compare gene expression in single cells and in 200-cell samples with that of hESCs (positive control) and human testis hFibs (negative control). TaqMan assays were used to examine, in every single-cell sample, the germ cell-specific genes TSPYL, DDX4 (VASA), DAZL, ZBTB16 (PLZF), DPPA3 (STELLA), CD9, NANOS, UTF1, GFRα1, GPR125, REX1, KIT, KITLG, LIFR, and STAT3; the pluripotency-associated genes POU5F1 (OCT4), LIN28, NANOG, SOX2, GDF3, KLF4, MYC, TDGF1, TERT, DNMT3B, DNMT1, CDH1, LIN28B, and OCT4B; and the housekeeping genes 18SRNA, CTNNB1, HNBS, and GAPDH. Inventoried TaqMan assays (Applied Biosystems) were pooled, with each assay at a final concentration of 0.2×. Cells to be processed were transferred directly into 9 µL of RT-PreAmp Master Mix containing 5.0 µL of CellsDirect 2× Reaction Mix (Invitrogen), 2.5 µL of 0.2× assay pool, 0.2 µL of RT/Taq SuperScript III (Invitrogen), and 1.3 µL of TE buffer. The collected cells were immediately frozen and stored at −80 °C. Cell lysis and sequence-specific reverse transcription were performed at 50 °C for 15 min, after which the reverse transcriptase was inactivated at 95 °C for 2 min. In the same tube, the cDNA then underwent limited sequence-specific preamplification for 14 cycles of denaturation at 95 °C for 15 s and annealing/amplification at 60 °C for 4 min. Before analysis, the preamplified products were diluted fivefold with Universal PCR Master Mix and run on 96.96 Dynamic Arrays in the Biomark system using inventoried TaqMan gene expression assays (ABI). Each sample was run in two technical replicates.
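Two quick checks on the qPCR setup: the RT-PreAmp components listed above sum to the stated 9 µL, and single-cell Ct values are usually normalized to reference genes. The text does not state its normalization scheme, so the ΔCt / 2^−ΔΔCt calculation below (mean target Ct over technical replicates minus the mean reference Ct) is an assumed, commonly used choice rather than the authors' reported method.

```python
from statistics import mean

# RT-PreAmp master mix per cell, in microliters (volumes from the protocol).
RT_PREAMP_MIX_UL = {
    "CellsDirect 2x Reaction Mix": 5.0,
    "0.2x assay pool": 2.5,
    "RT/Taq SuperScript III": 0.2,
    "TE buffer": 1.3,
}


def delta_ct(target_cts, reference_cts):
    """ASSUMED scheme: mean target Ct (technical replicates) minus the mean
    Ct of the reference (housekeeping) genes measured in the same sample."""
    return mean(target_cts) - mean(reference_cts)


def relative_expression(d_ct, control_d_ct=None):
    """2**-dCt, or 2**-ddCt when a control-group dCt (e.g. hFibs) is given."""
    if control_d_ct is None:
        return 2.0 ** -d_ct
    return 2.0 ** -(d_ct - control_d_ct)


if __name__ == "__main__":
    total = sum(RT_PREAMP_MIX_UL.values())
    print(f"RT-PreAmp volume: {total:.1f} uL")
```

Wells without amplification, which the instrument software flags with a placeholder Ct, should be removed or capped before averaging, otherwise they dominate the mean.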

Data integration and analysis for scRNA-seq of regulatory networks and hub genes governing SSC identity, including components involved in post-transcriptional regulation

We used an anchor-based approach to combine data from different donors and different scRNA-seq technologies (GSE45885 [40], GSE9210 [41], GSE108886, GSE145467 [42], GSE216907 [43], and GSE235324 [43]). This unsupervised method identifies a set of anchors that represent a shared biological state, which facilitates data integration; 35 dimensions were used for anchor weighting. The combined dataset was then scaled (a linear transformation), and its dimensionality was reduced by principal component analysis (PCA). UMAP, a non-linear dimensionality reduction method, was used to visualize and explore the integrated dataset; based on the ranking of principal components by their contribution to variance, 35 dimensions were used for UMAP, with default values for the remaining UMAP options. Cells were clustered with the graph-based clustering approach of the Seurat R package: the dimensionality parameter was set to 35 when building the shared nearest neighbor (SNN) graph, and the resolution parameter was set to 0.3 for clustering. As shown in Supplementary 1, cell types within clusters were classified using specific markers for testicular germ and somatic cells reported in earlier publications.
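Seurat's graph-based clustering starts from a shared nearest neighbor graph built in PCA space. The NumPy sketch below reproduces the core of that step under Seurat's documented defaults (k = 20 neighbors including the cell itself, Jaccard overlap of neighbor sets, edges below 1/15 pruned) on up to 35 PCs. Community detection at resolution 0.3 (Louvain by default in Seurat) is not reimplemented, and the brute-force distances suit only small toy inputs.

```python
import numpy as np


def pca_embed(x, n_dims=35):
    """PCA scores of a cells x genes matrix (x assumed already scaled)."""
    xc = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    n = min(n_dims, s.size)
    return u[:, :n] * s[:n]


def snn_graph(emb, k=20, prune=1 / 15):
    """Jaccard-weighted shared-nearest-neighbor graph (cells x cells).

    Each neighbor set holds the cell itself plus its k-1 nearest cells;
    weight = |N_i & N_j| / |N_i | N_j| = s / (2k - s); weights below
    `prune` are set to zero.
    """
    n = emb.shape[0]
    k = min(k, n)
    sq = (emb ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    np.fill_diagonal(d2, -np.inf)  # guarantee each cell ranks itself first
    nn = np.argsort(d2, axis=1)[:, :k]
    member = np.zeros((n, n))
    member[np.arange(n)[:, None], nn] = 1.0
    shared = member @ member.T  # |N_i & N_j|
    snn = shared / (2.0 * k - shared)
    snn[snn < prune] = 0.0
    return snn
```

For real atlases an approximate nearest-neighbor index and sparse matrices are needed; the dense version only makes the weighting explicit.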

To construct an integrated single-cell transcriptomic atlas of human testicular cells, scRNA-seq datasets from multiple independent studies and sequencing platforms were combined using an anchor-based integration approach implemented in the Seurat R package. This method identifies shared biological states across datasets and aligns them in a common low-dimensional space while minimizing technical variability. To mitigate batch effects arising from differences in sequencing platforms, library preparation protocols, donors, and sequencing depth, integration anchors were identified using highly variable genes across datasets. These anchors were then used to correct expression values during data integration prior to dimensionality reduction and clustering. Batch correction effectiveness was assessed by examining the distribution of cells from different datasets within UMAP space, ensuring that clusters were driven by biological cell identity rather than dataset or platform origin.
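Visual inspection of dataset mixing in UMAP space can be complemented by a simple per-cluster score. The sketch below is a hypothetical helper, not a method described in the text: it computes the Shannon entropy of dataset labels within each cluster, normalized so that 1 means cells are spread evenly across all datasets and 0 means a cluster comes from a single dataset.

```python
from collections import Counter, defaultdict
from math import log


def dataset_mixing_entropy(clusters, datasets):
    """Per-cluster Shannon entropy of dataset labels, normalized to [0, 1]
    by log(number of datasets overall)."""
    n_datasets = len(set(datasets))
    members = defaultdict(list)
    for cluster, dataset in zip(clusters, datasets):
        members[cluster].append(dataset)
    scores = {}
    for cluster, labels in members.items():
        total = len(labels)
        h = max(0.0, -sum((c / total) * log(c / total) for c in Counter(labels).values()))
        scores[cluster] = h / log(n_datasets) if n_datasets > 1 else 0.0
    return scores
```

Because dataset sizes differ and some cell types may be genuinely dataset-specific, a low score flags a cluster for inspection rather than proving a batch effect.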

To address dataset imbalance, all datasets were subjected to uniform preprocessing and quality-control criteria, including filtering based on gene counts, mitochondrial gene content, and detected features per cell. During integration, anchor weighting was applied to prevent overrepresentation of large datasets. Downstream analyses, including clustering and marker gene identification, were performed on the integrated embeddings rather than on individual datasets, thereby reducing bias introduced by uneven sample sizes.
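Applying identical quality-control thresholds to every dataset, and reporting per-dataset retention, shows whether filtering itself introduces imbalance. The thresholds below (200 to 6000 detected genes, at most 20% mitochondrial reads) are hypothetical placeholders because the text does not report its cutoffs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QCThresholds:
    # HYPOTHETICAL cutoffs; the study does not report its values.
    min_genes: int = 200
    max_genes: int = 6000
    max_pct_mt: float = 20.0


def passes_qc(cell, t):
    return t.min_genes <= cell["n_genes"] <= t.max_genes and cell["pct_mt"] <= t.max_pct_mt


def filter_uniformly(cells, t=QCThresholds()):
    """Apply one set of thresholds to every dataset; report retention per dataset."""
    kept = [c for c in cells if passes_qc(c, t)]
    retention = {}
    for ds in sorted({c["dataset"] for c in cells}):
        total = sum(1 for c in cells if c["dataset"] == ds)
        retained = sum(1 for c in kept if c["dataset"] == ds)
        retention[ds] = retained / total
    return kept, retention
```

A dataset with markedly lower retention than the others is a signal to revisit its preprocessing before integration.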

Cell–cell communication

Cell–cell communication was inferred using ligand–receptor interaction analysis based on curated interaction databases. For each ligand–receptor pair, interactions were considered only if both ligand and receptor were expressed in at least 10% of cells within their respective cell populations and exceeded a minimum average normalized expression level of log2-normalized expression > 0.25. Interaction strength (effect size) was quantified using an interaction score, defined as the product of the average normalized expression of the ligand in the sender cell type and the average normalized expression of the receptor in the receiver cell type. To facilitate comparability across interactions, interaction scores were z-score–normalized within each cell-type pair. Statistical significance of ligand–receptor interactions was assessed using a one-sided Wilcoxon rank-sum test, comparing observed interaction scores to a null distribution generated from permuted cell labels. P-values were adjusted for multiple testing using the Benjamini–Hochberg false discovery rate (FDR) procedure. Interactions with an adjusted FDR q-value < 0.05 and an absolute z-scored interaction strength ≥ 1.5 were considered statistically significant and biologically meaningful.
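A sketch of the interaction score defined above for one sender–receiver cell-type pair: expression filters (at least 10% of cells expressing, mean log2-normalized expression > 0.25), score = mean ligand × mean receptor, z-scores across ligand–receptor pairs, Benjamini–Hochberg q-values, and the q < 0.05, |z| ≥ 1.5 rule. For simplicity it substitutes an empirical permutation p-value (shuffling cell-type labels) for the one-sided Wilcoxon test named in the text; gene indices and cell-type names are hypothetical.

```python
import numpy as np


def lr_score(expr, labels, lig, rec, sender, receiver, min_frac=0.10, min_mean=0.25):
    """Mean ligand (sender) x mean receptor (receiver), or 0 if filters fail.

    expr: cells x genes log2-normalized matrix; lig/rec: gene column indices.
    """
    s = expr[labels == sender, lig]
    r = expr[labels == receiver, rec]
    if s.size == 0 or r.size == 0:
        return 0.0
    if (s > 0).mean() < min_frac or (r > 0).mean() < min_frac:
        return 0.0
    if s.mean() <= min_mean or r.mean() <= min_mean:
        return 0.0
    return float(s.mean() * r.mean())


def bh_adjust(p):
    """Benjamini-Hochberg q-values in input order."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    scaled = p[order] * p.size / np.arange(1, p.size + 1)
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty_like(scaled)
    q[order] = np.clip(scaled, 0.0, 1.0)
    return q


def score_lr_pairs(expr, labels, pairs, sender, receiver, n_perm=1000, seed=0):
    """Return observed scores, z-scores, permutation p, BH q, significance mask."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    obs = np.array([lr_score(expr, labels, lig, rec, sender, receiver) for lig, rec in pairs])
    exceed = np.zeros(len(pairs))
    for _ in range(n_perm):
        perm = rng.permutation(labels)  # null: cell-type labels carry no signal
        null = np.array([lr_score(expr, perm, lig, rec, sender, receiver) for lig, rec in pairs])
        exceed += null >= obs
    p = (exceed + 1.0) / (n_perm + 1.0)
    q = bh_adjust(p)
    sd = obs.std()
    z = (obs - obs.mean()) / sd if sd > 0 else np.zeros_like(obs)
    significant = (q < 0.05) & (np.abs(z) >= 1.5)
    return obs, z, p, q, significant
```

With only a handful of pairs the z-scores are coarse, so the |z| criterion is most meaningful when many ligand–receptor pairs are scored per cell-type pair.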

All cell–cell communication analyses were interpreted in terms of normal testicular biology, focusing on interactions between SSCs and their supporting somatic cell populations, including Sertoli cells, Leydig cells, fibroblasts, macrophages, and endothelial cells. Identified ligand–receptor interactions were evaluated for their known roles in stem cell maintenance, differentiation, extracellular matrix remodeling, immune surveillance, and metabolic support within the testis. Fibroblasts and other interstitial cells were treated as testicular stromal fibroblasts and physiological interstitial niche components rather than in oncology-specific terms (e.g., “tumor microenvironment” or “cancer-associated fibroblasts”), so downstream interpretations emphasize homeostatic niche signaling rather than disease-associated pathways.

Model evaluation metrics

The machine learning models were assessed with the following performance metrics. Accuracy: the proportion of correct predictions made by the model. ROC-AUC: the area under the receiver operating characteristic curve, which measures the model’s ability to discriminate between positive and negative classes; values closer to 1 indicate better discrimination. Precision: the proportion of true positive predictions among all predicted positive cases. Recall: the proportion of true positive predictions among all actual positive cases. All metrics were estimated by cross-validation, so that reported performance is not inflated by biases from a single training set and better reflects generalization to unseen data.
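The four metrics can be computed directly from predictions. The stdlib-only sketch below derives accuracy, precision, and recall from confusion counts at a decision threshold (0.5 here, an assumed value), and computes ROC-AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half, which equals the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, y_score, threshold=0.5):
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "roc_auc": roc_auc(y_true, y_score),
    }
```

Unlike the other three metrics, ROC-AUC does not depend on the threshold, which is why it complements accuracy when classes are imbalanced.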

Machine learning model evaluation and statistical comparison

To ensure robust and transparent evaluation of supervised machine learning models, Random Forest (RF) and Support Vector Machine (SVM) classifiers were assessed using stratified k-fold cross-validation. Performance was evaluated across identical train–test splits to enable direct paired comparisons between models. In addition to classification accuracy, the following metrics were calculated for each fold: receiver operating characteristic area under the curve (ROC–AUC), precision, and recall. To quantify uncertainty, 95% confidence intervals (CIs) for all performance metrics were estimated using repeated cross-validation across multiple random seeds. To statistically compare RF and SVM performance, paired Wilcoxon signed-rank tests were applied to cross-validated metric distributions (accuracy, ROC–AUC, precision, and recall). This non-parametric approach was selected because it does not assume normality and is appropriate for paired performance estimates derived from identical data partitions. P-values were adjusted for multiple comparisons where applicable. Statistical significance was defined as p < 0.05 (Fig. 1).
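A minimal scikit-learn/SciPy sketch of the comparison above, assuming a feature matrix X and binary labels y: both models are scored on identical repeated stratified folds, percentile intervals over fold scores stand in for the 95% CIs, and a paired Wilcoxon signed-rank test compares the models per metric. The hyperparameters (200 trees, RBF kernel, 5 folds × 3 repeats) are assumptions, multiple-comparison adjustment is omitted for brevity, and because fold scores are not independent the intervals are only approximate.

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

METRICS = ["accuracy", "roc_auc", "precision", "recall"]


def compare_rf_svm(X, y, n_splits=5, n_repeats=3, seed=0):
    """Cross-validated RF vs SVM on identical folds, with paired Wilcoxon tests."""
    # An integer random_state makes every split() call yield the same folds,
    # so both models are evaluated on identical train/test partitions.
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    models = {
        "RF": RandomForestClassifier(n_estimators=200, random_state=seed),
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    }
    fold_scores = {
        name: cross_validate(model, X, y, cv=cv, scoring=METRICS)
        for name, model in models.items()
    }
    summary = {}
    for metric in METRICS:
        rf = fold_scores["RF"][f"test_{metric}"]
        svm = fold_scores["SVM"][f"test_{metric}"]
        if np.allclose(rf, svm):
            p_value = 1.0  # identical fold scores: nothing to test
        else:
            p_value = float(wilcoxon(rf, svm).pvalue)
        summary[metric] = {
            "RF_mean": float(rf.mean()),
            "SVM_mean": float(svm.mean()),
            "RF_95pct": tuple(np.percentile(rf, [2.5, 97.5])),
            "SVM_95pct": tuple(np.percentile(svm, [2.5, 97.5])),
            "wilcoxon_p": p_value,
        }
    return summary


if __name__ == "__main__":
    # Synthetic stand-in data; the study's expression features are not reproduced here.
    X, y = make_classification(n_samples=120, n_features=20, n_informative=5, random_state=0)
    for metric, row in compare_rf_svm(X, y).items():
        print(metric, row)
```

Pairing the tests on shared folds removes fold-to-fold difficulty from the comparison, which is what makes the signed-rank test appropriate here.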

Fig. 1.


Diagram of the experimental design

Results

Human spermatogonia selection and culture

Spermatogonia were enriched from orchiectomy tissue by matrix selection (collagen non-binding/laminin-binding) and CD49f-MACS. Because the initial cultures were positive for DDX4 (VASA) and negative for VIMENTIN by immunocytochemistry, spermatogonia were probably the majority of cells in the selected populations. Cells cultured for extended periods showed decreased VASA staining and were positive for UTF1, STELLA, and SSEA4. Pure spermatogonia from several donors showed consistent morphology across all age groups and culture durations, chiefly a spherical shape, a diameter of about 6–12 μm, and a high nucleus-to-cytoplasm ratio, the latter recognizable as a bright cytoplasmic ring between the round nucleus and the outer cell membrane. In all cultures, spermatogonia connected by intercellular bridges appeared in pairs, chains, and small clusters. The cultures also contained other cell types, including larger, oval cells 12–14 μm in diameter with a lower nucleus-to-cytoplasm ratio. htFibs, which had grown substantially in primary cultures of the unselected cell fraction, were markedly reduced after selection, indicating successful separation of htFibs from the spermatogonia. Spermatogonial cultures without htFibs are shown in Fig. 2.

Fig. 2.


Cultured human spermatogonia after matrix and CD49f selection in vitro, showing their typical morphology. In all cultures, spermatogonia occur singly, in pairs, in chains, or in colonies. After culture, human SSCs were positive for (A) VASA, (B) UTF1, (C) STELLA, and (D) SSEA4.

Microarray analysis of gene expression in SSC versus fibroblast

The microarray covered approximately 26,000 transcripts. Across the SSC and fibroblast samples, microarray analysis identified 92 upregulated and 126 downregulated genes. The analysis of the three human SSC samples indicated that MLF1, SPANXA1, LOC101927278, CCDC110, KCNU1, ROPN1ANKRD31, C6orf99, SPTY2D1-AS1, PHF7, PPP1R36, GTSF1L, TCP11, FAM71F1, SPATA33, LOC389765, SPANXA1, FAM71F1, IQCF5, C5orf58, ACSBG2, CFAP43, SMKR1, HRASLS5, CMTM2, TMCO5A, C6orf10, YPEL1, C4orf45, C3orf30, DAZL, PHF7, TEX36-AS1, LOC157740, ODF2, AKAP4, C12orf50, TPPP2, AKAP14, C11orf65, CCER1, KLHL10, LOC100505841, CT83, CCDC169, GKAP1, SYCP1, CABS1, GKAP1, SPANXB1, GTSF1, CXCL8, ALB, SPATA6, ROPN1, PPP1R2P9, CCDC37-AS1, LOC102724782, PHF7, CT45A1, TRIM36, CASC5, KIF2B, TKTL1, HIST1H2BA, C10orf53, SMCP, MAEL, CREM, LELP1, TSACC, TNP1, HMGB4, C1orf194, ADAD1, HIST1H1T, ODF1, CAPZA3, SPANXA2, SPATA6, IQCF4, TMEM31, SPATA42, PRM1, PRM2, BOD1L2, C17orf105, DYNLRB2, and LINC00238 were downregulated, whereas MMP3, FKBP15, CAV1, SLC38A1, TPM1, MGST1, LOX, ARMCX6, TGFBR2, FBLN5, DKK1, CTSB, PHLDA1, CD44, LOX, CAV2, PERP, DSEL, MME, ADAMTS5, LAPTM4B, TTC37, TCEAL4, CTSK, ADAMTS5, FBN1, PLS3, LAMP2, TSPAN13, AXL, HTRA1, ANXA4, PAM, PRDX3, ATP2B4, PAM, MAFF, NR3C1, GNB4, E2F7, PAM, C12orf75, ANP32E, NQO1, EIF1AX, ADAMTS5, MGAT4B, TFPI, LARS2, LEPROT, CRIM1, MGST1, KITLG, BMP2, SEMA5A, AMIGO2, TMTC1, THBS1, NAV3, NAV2, ANTXR1, STMN2, RGMB, CCND1, ANXA1, AFG3L2, ABCC4, ASPH, DYNLT3, BCAR3, DSEL, DCBLD2, FOSL1, ACAP2, SYNPO2, GSTO1, UBQLN2, SERPINE2, TRPA1, CD109, CRIM1, CRTAP, RHOBTB3, ABCC4, VEGFC, ABRACL, CRIM1, MYOF, ATP6V1C1, GLMP, CLDN11, ZMAT3, ITGB5, GRIK2, IRAK1, COL12A1, SNCA, ARL6IP5, MDM2, EID1, GPX8, S100A4, NABP1, FAM101B, SCARB2, NRP1, SLC38A1, DPP4, PMP22, PBDC1, OST4, TMEM41B, PERP, ADAM12, AKAP2, TBX3, HMGN4, CITED2, CCNG1, TGFB1, FGF5, and DST were upregulated (Fig. 3).
A total of 92 upregulated and 126 downregulated genes were identified using the following criteria: |log2FC| > 1 and adjusted p-value < 0.05. These genes were further analyzed for functional enrichment and pathway involvement.
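The two thresholds above can be illustrated with a short Python sketch. It assumes per-gene log2 fold changes and raw p-values are already available, and that the adjustment is Benjamini–Hochberg (this paragraph says only "adjusted p-value"). The gene names and numbers are hypothetical placeholders.

```python
# Minimal sketch: apply the |log2FC| > 1 and adjusted p < 0.05 DEG filter.
# Assumption: the adjustment is Benjamini-Hochberg; gene names and values
# below are hypothetical placeholders, not data from this study.


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # ascending raw p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest raw p-value down so adjusted values stay monotone
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def classify_degs(results, lfc_cutoff=1.0, alpha=0.05):
    """Split (gene, log2FC, raw p) tuples into up- and downregulated lists."""
    padj = benjamini_hochberg([p for _, _, p in results])
    up, down = [], []
    for (gene, lfc, _), q in zip(results, padj):
        if q < alpha and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


example = [
    ("GENE_A", 2.3, 0.0001),
    ("GENE_B", -1.8, 0.002),
    ("GENE_C", 0.4, 0.0005),  # significant, but below the fold-change cutoff
    ("GENE_D", 3.1, 0.30),    # large fold change, but not significant
]
up, down = classify_degs(example)
print(up, down)  # ['GENE_A'] ['GENE_B']
```

Both conditions must hold, so a highly significant gene with a small fold change (GENE_C) and a large-fold-change gene that fails the adjusted p-value cutoff (GENE_D) are both excluded.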

Fig. 3.

Microarray analysis of gene expression in SSCs versus fibroblasts. (A) Principal component analysis, (B) G-protein volcano plot, (C) SSC-fibroblast correlation heatmap, and (D) SSC-fibroblast correlation.

Protein class sorting in regulatory networks and hub genes governing SSC identity, including components involved in post-transcriptional regulation

Genes that were down- or upregulated were sorted into protein classes according to the PANTHER database (Fig. 4 and Fig. 5): extracellular matrix protein (PC00102) and cytoskeletal protein (PC00085), 0.90% and 4.80%; transporter (PC00227), 6.20%; scaffold/adaptor protein (PC00226), 4.70%; DNA metabolism protein (PC00009), 0.60%; cell adhesion molecule (PC00069), 1.40%; intercellular signal molecule (PC00207) and protein-binding activity modulator (PC00095), 1.10% and 5.20%; viral or transposable element protein (PC00237) and RNA metabolism protein (PC00031), 0.10%; calcium-binding protein (PC00060), 4.50%; gene-specific transcriptional regulator (PC00264), 1.00%; defense/immunity protein (PC00090), 6.0%; translational protein (PC00263), 1.00% and 3.40%; metabolite interconversion enzyme (PC00262), 13.10%; protein modifying enzyme (PC00260) and chromatin/chromatin-binding or -regulatory protein (PC00077), 9.30% and 1.10%; transfer/carrier protein (PC00219), 0.8%; membrane traffic protein (PC00150) and chaperone (PC00072), 3.70% and 1.80%; cell junction protein (PC00070) and structural protein (PC00211), 0.20% each; storage protein (PC00210), 0.10%; and transmembrane signal receptor (PC00197), 2.5%.
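The class shares above are simple proportions. Below is a minimal sketch of that arithmetic. It assumes a gene-to-class table has already been exported from PANTHER and that unclassified genes stay in the denominator. That is an assumption, but it is consistent with the listed shares summing to well under 100%. The class assignments are hypothetical.

```python
from collections import Counter


def class_percentages(gene_to_class):
    """Share (%) of genes per protein class.

    gene_to_class maps gene symbol -> class label, or None if unclassified.
    All genes count in the denominator, so classified shares can sum to <100%.
    """
    total = len(gene_to_class)
    counts = Counter(c for c in gene_to_class.values() if c is not None)
    return {cls: round(100.0 * n / total, 1) for cls, n in counts.most_common()}


# Hypothetical assignments for illustration only
example = {
    "GENE_A": "transporter (PC00227)",
    "GENE_B": "transporter (PC00227)",
    "GENE_C": "cell adhesion molecule (PC00069)",
    "GENE_D": None,
}
print(class_percentages(example))
# {'transporter (PC00227)': 50.0, 'cell adhesion molecule (PC00069)': 25.0}
```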

Fig. 4.

Gene ontology (GO) enrichment analysis of the genes in the module. Dot color indicates the BH-corrected p-value, and dot size indicates the number of genes. (A) Biological process, (B) molecular function, and (C) cellular component.

Fig. 5.

Protein class analysis. (A) Transcripts with significantly upregulated expression grouped into protein classes using PANTHER. (B) Transcripts with significantly downregulated expression grouped into protein classes.

PPI network

Using data from the STRING database, we built a PPI network of the DEGs in SSCs. The network included 168 genes, with 85 nodes and 260 edges, and showed a PPI enrichment p-value < 0.01. Downregulated genes included HIST1H1T, ODF1, CAPZA3, SPANXA2, SPATA6, IQCF4, TMEM31, SPATA42, PRM1, PRM2, BOD1L2, C17orf105, DYNLRB2, and LINC00238. Upregulated genes included MMP3, FKBP15, CAV1, SLC38A1, TPM1, MGST1, LOX, ARMCX6, TGFBR2, FBLN5, DKK1, CTSB, PHLDA1, CD44, LOX, CAV2, PERP, DSEL, MME, ADAMTS5, LAPTM4B, TTC37, and TCEAL4. Using FunRich, we built a network of 30 DEGs and their neighboring genes, and then used the MCODE plugin to locate significant modules. Module 1 had an MCODE score of 22, module 2 a score of 4, and module 3 a score of 3. The top four functional clusters were selected based on their MCODE scores. KEGG pathway analysis of each module was performed with DAVID. According to this analysis, module 1 genes are involved in cell cycle regulation, DNA replication, and oocyte meiosis. Module 2 (four nodes and five edges) consisted of genes enriched in ribosomes. Module 3 (three nodes and three edges) was enriched in amino acid synthesis, carbon metabolism, and the HIF-1 signaling pathway. Module 4 (six nodes and seven edges) contained genes in the PPAR signaling pathway and in fat digestion and absorption. PPI enrichment p-values were < 0.05 for all modules.

Based on their high degree scores, the cytoHubba plugin identified 16 hub genes associated with SSCs: MMP3, FKBP15, CAV1, SLC38A1, TPM1, MGST1, LOX, ARMCX6, TGFBR2, FBLN5, DKK1, CTSB, PHLDA1, CD44, LOX, and CAV2. These genes were then used to construct a PPI network of the key genes with the STRING online database.
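cytoHubba offers several ranking methods, including Degree and MCC. The degree score used here is simply the number of distinct interaction partners per node. The sketch below computes it in plain Python for an undirected edge list. The edges are hypothetical and do not reproduce the STRING network from this study.

```python
from collections import defaultdict


def rank_hubs_by_degree(edges, top_n):
    """Rank nodes of an undirected PPI network by degree.

    edges: iterable of (protein_a, protein_b) pairs; duplicate pairs and
    self-loops are ignored because partners are stored in sets.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Highest degree first; ties broken alphabetically for a stable ranking
    ranked = sorted(neighbors, key=lambda node: (-len(neighbors[node]), node))
    return [(node, len(neighbors[node])) for node in ranked[:top_n]]


# Hypothetical edges for illustration only (not STRING output from this study)
edges = [("MMP3", "CAV1"), ("MMP3", "CD44"), ("MMP3", "LOX"),
         ("CAV1", "CD44"), ("TGFBR2", "CAV1"), ("LOX", "FBLN5")]
print(rank_hubs_by_degree(edges, top_n=3))
# [('CAV1', 3), ('MMP3', 3), ('CD44', 2)]
```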
The interaction network of the key genes and their connected genes was also built using FunRich. The hub-gene PPI network had ten nodes and forty-five edges. Its average local clustering coefficient was 1, indicating a high degree of clustering, and the PPI enrichment p-value (≤ 0.01) indicated significant enrichment of protein interactions. Co-expression analysis of the 10 hub genes further indicates that these genes probably interact actively with one another (Fig. 6).
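The hub-network figures are internally consistent. Ten nodes allow at most 10 × 9 / 2 = 45 undirected edges, so a 10-node, 45-edge network is complete. In a complete network, every node's local clustering coefficient (the fraction of its neighbor pairs that are themselves linked) must equal 1. The following plain-Python check uses placeholder node names.

```python
from itertools import combinations


def local_clustering(neighbors, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = neighbors[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
    return linked / (k * (k - 1) / 2)


def average_clustering(edges):
    """Mean local clustering coefficient over all nodes of an undirected graph."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return sum(local_clustering(neighbors, n) for n in neighbors) / len(neighbors)


# Placeholder names: a complete graph on 10 nodes has 10 * 9 / 2 = 45 edges
nodes = [f"hub{i}" for i in range(10)]
complete_edges = list(combinations(nodes, 2))
print(len(complete_edges), average_clustering(complete_edges))  # 45 1.0
```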

Fig. 6.

PPI network of SSC-associated genes. Nodes represent genes/proteins, and edges indicate known or predicted protein–protein interactions. Node size is proportional to node degree (number of connections), reflecting network centrality. Node color indicates gene regulation status: red, upregulated genes; orange, downregulated genes; yellow, non-differentially expressed neighbors. Edge thickness corresponds to interaction confidence derived from the STRING database. Hub genes are defined as the nodes with the highest degree and MCC scores. (A) PPI network highlighting hub genes identified from differentially expressed genes in spermatogonial stem cells (SSCs). Node size is proportional to node degree and MCC score, indicating network centrality, and node color reflects regulation status (red, upregulated; blue, downregulated). (B) Functional module detection within the PPI network using the MCODE algorithm. Distinct clusters represent highly interconnected gene modules associated with biological processes such as cell cycle regulation, meiosis, metabolic pathways, and spermatogenesis. Different colors indicate separate modules, and edge thickness corresponds to interaction confidence. (C) Co-expression analysis of the identified hub genes, showing strong correlations among SSC-associated regulators. Node size reflects relative connectivity, and edge intensity represents correlation strength between gene pairs, indicating coordinated transcriptional regulation.

Protein–protein interaction (PPI) network analysis identified several high-centrality hub genes, including MMP3, CAV1, and LOX, which are primarily annotated as extracellular matrix remodeling or signaling proteins. These genes were defined as network hubs based on degree and MCC scores and were not annotated by PANTHER as components involved in post-transcriptional regulation. Their central positions suggest a potential role in integrating extracellular niche signals with intracellular regulatory programs in SSCs.

Functional enrichment analysis of DEGs

Among the top gene ontology (GO) terms, upregulated DEGs in the biological process (BP) category were associated with motile cilium assembly (GO:0044458), spermatid development (GO:0007286), gamete generation (GO:0007276), and microtubule-based movement (GO:0007018). In the molecular function (MF) category, upregulated DEGs were associated with alpha-amylase activity (GO:0004556), amylase activity (GO:0016160), ATP hydrolysis activity (GO:0016887), and tubulin binding (GO:0015631). Downregulated DEGs in the BP category were enriched for organonitrogen compound biosynthetic process (GO:1901566), cellular macromolecule localization (GO:0070727), cell migration (GO:0016477), and cellular protein localization (GO:0034613) (Fig. 3). Downregulated DEGs in the MF category were associated with cadherin binding (GO:0045296), actin filament binding (GO:0051015), actin binding (GO:0003779), cell adhesion molecule binding (GO:0050839), and GTPase binding (GO:0051020) (Fig. 7).
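GO over-representation tools commonly score each term with a one-sided hypergeometric test and then apply multiple-testing correction across terms. This section does not name the enrichment tool, so the sketch below only illustrates that standard test. The counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for over-representation of one GO term.

    N: background genes, K: background genes annotated to the term,
    n: genes in the DEG list, k: DEGs annotated to the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 20 background genes, 5 annotated, 4 DEGs, 3 of them annotated
p = hypergeom_enrichment_p(k=3, n=4, K=5, N=20)
print(round(p, 4))  # 0.032
```

In a real analysis, the resulting per-term p-values would then be adjusted, for example with Benjamini–Hochberg, before terms are ranked.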

Fig. 7.

GO analysis and signaling pathways. (A) Biological processes of upregulated genes. (B) Biological processes of downregulated genes. (C) Molecular functions of upregulated genes. (D) Molecular functions of downregulated genes. (E) Signaling pathway analysis.

Finding testicular cell composition during testis development with machine learning techniques

Using publicly available single-cell RNA sequencing (scRNA-seq) datasets of testicular cells from boys aged 1, 2, and 7 years, we characterized the cellular diversity of the human testis during development. We reconstructed a timeline of testis maturation and compared it across age groups. Using established testicular cell markers, UMAP embedding of the combined datasets identified 17 distinct cell clusters (Fig. 1). As shown in Fig. 8, these included germ cells (DDX4+, ID4+), Sertoli cells (AMH+, SOX9+), myoid cells (ACTA2+, RGS5+), macrophages (CD14+), and endothelial cells (PECAM1+). Co-expression of DLK1 and NR2F2 in clusters 13, 14, and 15 suggested Leydig cells and somatic precursors (Fig. 8). During the prenatal (W6–W16), neonatal (2D–7D), and prepubertal (1Y–7Y) phases, somatic cells (Sertoli, Leydig, and somatic precursor cells) were more abundant than germ cells, but both populations changed with age. The germ cell clusters 2–4 expanded during puberty (Fig. 7A). These changes in cell composition are synchronized with changes in the structure and function of germline and somatic cells as the testis develops.
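Annotating clusters with established markers can be approximated by scoring each cluster on the mean expression of each marker set and assigning the highest-scoring cell type. This is a simplified, hypothetical sketch. The marker sets come from the paragraph above, but the cluster-level expression values are invented. Real annotation pipelines normally also check marker specificity across clusters.

```python
def annotate_clusters(mean_expr, markers):
    """Label each cluster with the cell type whose markers score highest.

    mean_expr: {cluster_id: {gene: mean normalized expression}}
    markers:   {cell_type: [marker genes]}
    Genes missing from a cluster's profile count as 0 (a simplification).
    """
    labels = {}
    for cluster, expr in mean_expr.items():
        scores = {
            cell_type: sum(expr.get(g, 0.0) for g in genes) / len(genes)
            for cell_type, genes in markers.items()
        }
        labels[cluster] = max(scores, key=scores.get)
    return labels


# Marker sets from the text; cluster expression values are hypothetical
markers = {
    "germ": ["DDX4", "ID4"],
    "Sertoli": ["AMH", "SOX9"],
    "myoid": ["ACTA2", "RGS5"],
    "macrophage": ["CD14"],
    "endothelial": ["PECAM1"],
    "Leydig/precursor": ["DLK1", "NR2F2"],
}
mean_expr = {
    "c1": {"DDX4": 2.5, "ID4": 1.8, "SOX9": 0.1},
    "c2": {"AMH": 3.0, "SOX9": 2.2},
    "c3": {"DLK1": 1.9, "NR2F2": 1.4, "ACTA2": 0.3},
}
print(annotate_clusters(mean_expr, markers))
# {'c1': 'germ', 'c2': 'Sertoli', 'c3': 'Leydig/precursor'}
```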

Fig. 8.

Machine learning analysis to identify regulatory networks and hub genes governing SSC identity, including components involved in post-transcriptional regulation. (A, B) Box plots for SSC clustering analysis, (C) plot for identifying SSCs, and (D) coefficient analysis for SSC clustering.

To better understand germ cell heterogeneity during testis development, we re-clustered 9,000 germ cells from clusters 1 through 4. Unbiased clustering with UMAP identified fifteen cell clusters reflecting different stages of spermatogenesis. Markers including UTF1, ID4, and NANOS3 identified SSCs/progenitors or undifferentiated spermatogonia (Undiff SPG) in clusters 1–5. Cluster 6, representing differentiating spermatogonia, expressed CKIT and STRA8 but not meiotic genes. Clusters 7 and 8, representing spermatocytes (SCytes), overexpressed SYCP3 and SPO11 (Fig. 8B). Clusters 9–15, comprising both short and long spermatids (SPtids), expressed TNP1 and PRM2 (Fig. 8). This supports the completion of spermatogenesis by 14 years of age. After integration, cells originating from different studies and sequencing platforms were well mixed within UMAP clusters, indicating effective mitigation of batch effects. No clustering driven by dataset identity or platform origin was observed, suggesting that technical variation did not dominate the integrated transcriptomic structure.

Despite differences in dataset size and cell-type composition across studies, the major germ and somatic cell populations—including SSCs, differentiating spermatogonia, spermatocytes, spermatids, Sertoli cells, Leydig cells, fibroblasts, macrophages, and endothelial cells—were consistently detected. Re-clustering of germ cells yielded reproducible SSC subpopulations across datasets, confirming that key biological signals were not driven by a single dominant study. These results indicate that the integrated scRNA-seq analysis is robust to batch effects and dataset imbalance and reliably captures biologically meaningful transcriptional heterogeneity in human spermatogenesis.
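One common way to quantify the dataset mixing described above is the per-cluster Shannon entropy of dataset labels, normalized by its maximum. This section does not say which mixing metric, if any, was computed, so the following is only an illustrative sketch with toy labels.

```python
from collections import Counter
from math import log


def batch_entropy_per_cluster(cluster_labels, batch_labels):
    """Normalized Shannon entropy of dataset labels within each cluster.

    1.0 means a cluster draws evenly from all datasets; 0.0 means a single
    dataset supplies every cell in that cluster.
    """
    n_batches = len(set(batch_labels))
    by_cluster = {}
    for cluster, batch in zip(cluster_labels, batch_labels):
        by_cluster.setdefault(cluster, Counter())[batch] += 1
    result = {}
    for cluster, counts in by_cluster.items():
        total = sum(counts.values())
        h = sum((c / total) * log(total / c) for c in counts.values())
        result[cluster] = h / log(n_batches) if n_batches > 1 else 0.0
    return result


# Toy labels: cluster A is evenly mixed, cluster B comes from one dataset
clusters = ["A", "A", "A", "A", "B", "B", "B", "B"]
datasets = ["d1", "d2", "d1", "d2", "d1", "d1", "d1", "d1"]
print(batch_entropy_per_cluster(clusters, datasets))  # {'A': 1.0, 'B': 0.0}
```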

Machine learning analysis of SSCs

Machine learning models successfully distinguished SSCs from fibroblasts, with Random Forest achieving classification accuracies above 90% in cross-validation. Feature importance analysis consistently ranked hub genes, including FAM74F1, SMCP, and ADAD1, as top predictors of SSC identity. SVM classifiers confirmed the discriminative power of features associated with the regulatory networks and hub genes governing SSC identity, including components involved in post-transcriptional regulation. In addition, unsupervised approaches such as PCA and UMAP revealed clear transcriptional subclusters among SSCs, reflecting developmental stages from undifferentiated to differentiating spermatogonia. Re-clustering of ~9,000 germ cells identified 15 distinct populations, consistent with key biological markers (e.g., UTF1, NANOS3, STRA8, and SYCP3). These findings highlight the ability of machine learning to uncover functional heterogeneity within SSC populations and support a role for these regulatory networks and hub genes in spermatogenesis (Fig. 8). For the machine learning analyses, the feature set comprised DEGs selected with |log2FC| > 1 and adjusted p-value < 0.05, plus the top 10% of genes by variance. Expression values were normalized with DESeq2, which performed variance stabilization and log2 transformation to ensure consistency across samples.
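The feature-set rule above (DEGs plus the top 10% most variable genes) can be sketched as follows. The sketch assumes expression values were already variance-stabilized upstream (for example, with DESeq2 in R) and ranks all genes by variance across samples. Whether the variance ranking preceded or followed the DEG filter is not stated, and the data are hypothetical.

```python
from statistics import pvariance


def select_features(expr, deg_genes, top_fraction=0.10):
    """Union of DEGs and the top `top_fraction` of genes by variance.

    expr: {gene: [variance-stabilized expression per sample]} (assumed to be
    transformed upstream); deg_genes: genes passing the DEG thresholds.
    """
    variances = {g: pvariance(values) for g, values in expr.items()}
    n_top = max(1, int(len(variances) * top_fraction))
    most_variable = sorted(variances, key=variances.get, reverse=True)[:n_top]
    return sorted(set(deg_genes) | set(most_variable))


# Hypothetical data: 10 genes, so the top 10% is a single gene (G9)
expr = {f"G{i}": [0.0, float(i)] for i in range(10)}
print(select_features(expr, deg_genes=["G2", "G5"]))  # ['G2', 'G5', 'G9']
```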

Construction of weighted gene Co-Expression modules

To find functional clusters in SSCs, we constructed gene co-expression networks using WGCNA. Five separate modules, each assigned its own color, were identified in the SSCs; genes not assigned to any module were placed in the gray module. We then generated a heatmap to assess module–trait relationships, which are shown in Fig. 6. In this analysis, the brown module (r = 0.61, p = 9 × 10⁻⁵¹) and pink module (r = 0.9, p = 1 × 10⁻¹⁰) of the SSCs showed the strongest relationship with normal tissues (Fig. 9). Using the selected soft-threshold power (β = X), a scale-free co-expression network was constructed that satisfied the scale-free topology criterion (R² > 0.85). Hierarchical clustering identified five distinct gene co-expression modules, excluding the gray module containing unassigned genes. Module–trait correlation analysis revealed that the brown and pink modules showed the strongest associations with SSC identity (brown: r = 0.61, p = 9 × 10⁻⁵¹; pink: r = 0.90, p = 1 × 10⁻¹⁰), indicating their potential biological relevance.
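For orientation, WGCNA builds an unsigned adjacency by raising absolute gene-gene correlations to the soft-threshold power β. It then judges candidate powers by how well the connectivity distribution fits a power law, measured as R² of log p(k) against log k. The sketch below is a simplified, hypothetical re-implementation of that idea. It is not the WGCNA R package code, which provides `pickSoftThreshold` and `scaleFreeFitIndex`, and the inputs are toy values.

```python
from math import log10


def soft_threshold_adjacency(corr, beta):
    """Unsigned WGCNA-style adjacency a_ij = |cor_ij| ** beta, zero diagonal."""
    n = len(corr)
    return [[0.0 if i == j else abs(corr[i][j]) ** beta for j in range(n)]
            for i in range(n)]


def connectivity(adj):
    """Whole-network connectivity k_i = sum over j of a_ij."""
    return [sum(row) for row in adj]


def scale_free_fit(k, n_bins=10):
    """Signed R^2 of log10 p(k) against log10 mean k over equal-width bins.

    Returns -sign(slope) * R^2, so a good scale-free fit (negative slope)
    gives a value close to +1. Simplified; not WGCNA's exact binning.
    """
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins or 1.0  # guard against identical values
    bins = [[] for _ in range(n_bins)]
    for value in k:
        bins[min(int((value - lo) / width), n_bins - 1)].append(value)
    filled = [b for b in bins if b and sum(b) > 0]
    xs = [log10(sum(b) / len(b)) for b in filled]  # log10 mean connectivity
    ys = [log10(len(b) / len(k)) for b in filled]  # log10 bin frequency
    m = len(xs)
    if m < 2:
        return 0.0
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    r2 = sxy * sxy / (sxx * syy)
    return r2 if sxy < 0 else -r2


# Toy check 1: soft thresholding with beta = 2
adj = soft_threshold_adjacency([[1.0, 0.5], [0.5, 1.0]], beta=2)
print(connectivity(adj))  # [0.25, 0.25]

# Toy check 2: bin frequencies 4/7, 2/7, 1/7 at k = 1, 2, 4 follow p(k) ~ 1/k
print(round(scale_free_fit([1, 1, 1, 1, 2, 2, 4], n_bins=3), 6))  # 1.0
```

Raising correlations to a power suppresses weak correlations much more than strong ones. Larger β therefore tends to improve the scale-free fit at the cost of lower mean connectivity, which is why the power is chosen by inspecting this trade-off.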

Fig. 9.

A thorough analysis of the germ cell lineage transcriptome and the human testis single-cell transcriptome. (A) The UMAP map shows testicular cells from many time points in testicular development, from before birth to after birth, based on integrated single-cell RNA sequencing data. Each of the seventeen color-coded clusters represents a different somatic or germ cell lineage, and there are a total of 82,220 cells in the testes. (B) The expression patterns of key gene markers for somatic and germ cell lineages are shown in color-coded UMAP plots. (C) Plots are color-coded to indicate the expression patterns of relevant markers for spermatogonia, spermatocytes, and spermatids, and the UMAP representation of germ cell clusters is further classified by age

Module preservation analysis demonstrated that the SSC-associated modules were robust and reproducible. The brown and pink modules exhibited high preservation statistics (Zsummary > 10), confirming their stability across datasets and permutation testing. Other modules showed moderate preservation (Zsummary between 2 and 10), while no key SSC-associated module fell below the threshold for weak preservation. Robustness analyses further confirmed that module structure and eigengene–trait relationships remained consistent across alternative soft-threshold powers and resampled datasets. These findings indicate that the identified SSC-related modules are not artifacts of parameter selection or sample composition, but instead reflect stable underlying biological co-expression patterns. Module–trait correlation analysis identified several co-expression modules associated with SSC identity. After Benjamini–Hochberg FDR correction, the pink module showed the strongest association (r = 0.90, adjusted q = 1.2 × 10⁻¹¹), followed by the brown module (r = 0.61, adjusted q = 3.8 × 10⁻⁸). The blue module retained a weaker but statistically significant association after correction (adjusted q = 0.041). In contrast, the turquoise (q = 0.087) and yellow (q = 0.19) modules did not remain significant following FDR adjustment. These results indicate that the pink and brown modules represent the most robust SSC-associated gene co-expression networks.
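The preservation categories above follow the conventional Zsummary cut-offs: above 10 is strong, 2 to 10 is moderate, and below 2 is weak or no evidence. These values are computed, for example, by the `modulePreservation` function in the WGCNA R package. The following minimal sketch shows that mapping with hypothetical module values.

```python
def preservation_level(z_summary):
    """Map a module-preservation Zsummary onto conventional evidence levels."""
    if z_summary > 10:
        return "strong"
    if z_summary >= 2:
        return "moderate"
    return "weak or none"


# Hypothetical Zsummary values, not results from this study
for module, z in {"pink": 14.2, "blue": 6.5, "grey": 1.1}.items():
    print(module, preservation_level(z))
```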

Cell–Cell interactions by Ligand–Receptor interactions of regulatory networks and hub genes governing SSC identity, including components involved in post-transcriptional regulation

After identifying the cell types, we evaluated possible interactions among all cell types present in the tumor microenvironment, using approximately 1,800 literature-supported interactions. These included interactions between integrins in the extracellular matrix (ECM) and various chemokines, cytokines, receptor tyrosine kinases (RTKs), and tumor necrosis factors (TNFs). Because of their significance in cancer immunology, we also manually added known B7 family interactions.

By identifying cases in which different cell types within the tumor microenvironment expressed both components of a given ligand-receptor pair, we found similar cell-cell interactions in each of the six syngeneic tumor models (Fig. 8). Each interaction was scored by multiplying the average ligand expression in one cell type by the average receptor expression in the other. Using the average expression value for each cell type reduces the risk of false negatives caused by zero dropouts. We scored each tumor separately and then calculated the mean interaction score across all tumor models, which allowed us to identify conserved interactions. The statistical significance of each interaction score was assessed with a one-sided Wilcoxon rank-sum test and Benjamini–Hochberg multiple hypothesis correction across the evaluated cell-cell interactions (approximately 1,500 ligand-receptor pairs mapped to mouse homologs and 64 cell type combinations). Although we examined interactions involving all known cell types, we focused on cases in which macrophages or cancer-associated fibroblasts (CAFs) released the ligand, because most ligands originate from these two cell types. We also examined all interactions involving SSCs in detail (Fig. 10).
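The scoring rule described above can be sketched in Python. Each interaction score is the cell-type mean ligand expression multiplied by the cell-type mean receptor expression, and these scores are then averaged across models. The data structures, cell-type names, and the COL1A1/ITGB1 pair are hypothetical illustrations, and the Wilcoxon-based significance testing is not shown.

```python
from statistics import mean


def lr_score(cells, cell_types, sender, receiver, ligand, receptor):
    """Mean ligand expression in `sender` x mean receptor expression in `receiver`.

    cells: list of per-cell {gene: expression} dicts; cell_types: matching labels.
    Genes absent from a cell's dict are treated as 0 (a dropout).
    """
    lig = [c.get(ligand, 0.0) for c, t in zip(cells, cell_types) if t == sender]
    rec = [c.get(receptor, 0.0) for c, t in zip(cells, cell_types) if t == receiver]
    if not lig or not rec:
        return 0.0
    return mean(lig) * mean(rec)


def mean_score_across_models(models, sender, receiver, ligand, receptor):
    """Average the interaction score over several (cells, cell_types) datasets."""
    return mean(lr_score(cells, types, sender, receiver, ligand, receptor)
                for cells, types in models)


# Hypothetical toy data for one ligand-receptor pair in two models
model_1 = ([{"COL1A1": 2.0}, {}, {"ITGB1": 3.0}],
           ["fibroblast", "fibroblast", "SSC"])
model_2 = ([{"COL1A1": 1.0}, {"ITGB1": 1.0}],
           ["fibroblast", "SSC"])
print(mean_score_across_models([model_1, model_2],
                               "fibroblast", "SSC", "COL1A1", "ITGB1"))  # 2.0
```

Averaging within each cell type before multiplying keeps a few zero-count cells from driving the score to zero. This matches the rationale given above for reducing dropout-driven false negatives.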

Fig. 10.

Single-cell RNA sequencing analysis indicates that metabolic patterns in human spermatogonia are comparable to those in mouse pup and adult spermatogonia. (A) FAM74F1, (B) SPATA6, (C) SPATA6, (D) SMCP, and (E) ODF2. Their upregulation suggests possible functions in SSCs.

Several interactions involving the extracellular matrix (ECM) were positively correlated with tumor growth. Interactions between cancer-associated fibroblasts (CAFs) and endothelial cells were significantly correlated with the rate of tumor growth. These interactions stimulate collagen production, and the collagen in turn binds integrin or Cd93 receptors on cancer cells. Tumor growth rate was also strongly correlated with integrin-interacting intercellular adhesion molecules (ICAMs), suggesting additional adhesion-related interactions. Separate associations with ADAM12 and ADAM15 protease expression were also observed: interactions of ADAM12 with its integrin substrates were negatively correlated with tumor growth, whereas the interaction of ADAM15 with integrin beta 3 (ITGB3) was positively correlated. We also found many chemokine and cytokine interactions involved in tumor development. Increased production of CCL11 by tumor cells was significantly correlated with tumor growth; CCL11 binds CCR5 or CXCR3 receptors found on tumor cells as well as macrophages. In contrast, tumor cell expression of the interleukin 1 alpha (IL1A) receptors IL1R1, IL1R2, and IL1RAP was negatively correlated with IL1A production by CAFs, suggesting that this interaction affects the kinetics of tumor growth. The rate of tumor growth was also correlated with several interactions involving receptor tyrosine kinases (RTKs), which were relatively rare among the prominent interactions (Fig. 9 and Fig. 11). SSCs secreted epidermal growth factor (EGF) and simultaneously expressed Erbb3 receptors, indicating an autocrine interaction. This interaction showed a one-to-one relationship with carcinogenesis.
Furthermore, vascular endothelial growth factor (VEGF) ligands were positively correlated with the production of platelet-derived growth factor (PDGF) by CAFs. These ligands can bind the PDGF (Pdgfrb) and VEGF (Kdr/Vegfr2) receptors on tumor cells. As a result of this link, tumor development is enhanced (Fig. 12).

Fig. 11.

Single-cell RNA sequencing analysis. (A) CRISP2, (B) CAPZA3, (C) SYCP1, (D) AKAP4, and (E) ADAD1 are downregulated, suggesting their potential roles in SSCs

Fig. 12.

Cell-cell communication analysis. (A) Cell-cell communication related to aging, (B) cell-cell communication network, (C) cell-cell communication networks, and (D) whole cell-cell communication networks.

Assessment of donor effects in integrated scRNA-seq data

After integration, cells from different donors were well intermixed within UMAP clusters, indicating that clustering was driven by biological cell identity rather than donor origin. No clusters were dominated by a single donor, and SSC subpopulations were consistently detected across multiple donors. To further assess donor influence, marker gene expression patterns and SSC-associated transcriptional programs were examined across donors and showed high concordance, supporting the robustness of the inferred cell states. These findings indicate that donor-specific effects were effectively controlled and did not confound downstream analyses.
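A simple way to check that no cluster is dominated by one donor is to compute, for each cluster, the fraction of cells contributed by its most common donor. The 0.8 cut-off and the toy labels below are hypothetical choices, not parameters reported in this study.

```python
from collections import Counter


def max_donor_fraction(cluster_labels, donor_labels):
    """Fraction of each cluster's cells contributed by its most common donor."""
    per_cluster = {}
    for cluster, donor in zip(cluster_labels, donor_labels):
        per_cluster.setdefault(cluster, Counter())[donor] += 1
    return {c: counts.most_common(1)[0][1] / sum(counts.values())
            for c, counts in per_cluster.items()}


def donor_dominated(cluster_labels, donor_labels, threshold=0.8):
    """Clusters where one donor supplies more than `threshold` of the cells (hypothetical cut-off)."""
    fractions = max_donor_fraction(cluster_labels, donor_labels)
    return sorted(c for c, f in fractions.items() if f > threshold)


# Toy labels: SSC1 is spread over three donors, SSC2 comes from a single donor
clusters = ["SSC1"] * 4 + ["SSC2"] * 5
donors = ["d1", "d2", "d3", "d1", "d1", "d1", "d1", "d1", "d1"]
print(max_donor_fraction(clusters, donors))  # {'SSC1': 0.5, 'SSC2': 1.0}
print(donor_dominated(clusters, donors))     # ['SSC2']
```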

Machine learning model performance

The Random Forest model achieved an accuracy of 92%, with a ROC-AUC of 0.91, precision of 0.89, and recall of 0.93. Similarly, the Support Vector Machine (SVM) model achieved an accuracy of 91%, with a ROC-AUC of 0.90, precision of 0.88, and recall of 0.92. These results indicate strong discriminative power and balanced performance across both positive and negative class predictions.
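These metrics follow standard definitions. For reference, the plain-Python sketch below derives them from toy predictions (1 = SSC, 0 = fibroblast). ROC-AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative. In practice, scikit-learn's metrics module would typically be used, and the numbers here are unrelated to the reported results.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = SSC, 0 = fibroblast)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def roc_auc(y_true, scores):
    """P(random positive scores above random negative); ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions, unrelated to the reported results
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
acc, prec, rec = classification_metrics(y_true, y_pred)
print(round(acc, 3), round(prec, 3), round(rec, 3))  # 0.667 0.667 0.667
print(round(roc_auc(y_true, scores), 3))             # 0.889
```

Accuracy, precision, and recall depend on the 0.5 decision threshold, whereas ROC-AUC uses only the ranking of scores. For this reason, the two kinds of metric can diverge.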

Machine learning model performance and comparison

Both supervised machine learning models demonstrated strong and consistent performance in distinguishing SSCs from fibroblasts. The Random Forest model achieved a mean accuracy of 92% (95% CI: 89–94%), with a ROC–AUC of 0.91 (95% CI: 0.88–0.94), precision of 0.89 (95% CI: 0.86–0.92), and recall of 0.93 (95% CI: 0.90–0.95). Similarly, the Support Vector Machine achieved a mean accuracy of 91% (95% CI: 88–93%), with a ROC–AUC of 0.90 (95% CI: 0.87–0.93), precision of 0.88 (95% CI: 0.85–0.91), and recall of 0.92 (95% CI: 0.89–0.94). Paired statistical comparisons across cross-validation folds revealed no statistically significant differences between RF and SVM for any evaluated metric (paired Wilcoxon signed-rank test, all adjusted p values > 0.05). Although RF showed marginally higher mean values for ROC–AUC and recall, these differences fell within the estimated confidence intervals and reflect expected variability due to sampling rather than true model superiority. Together, these results indicate that both RF and SVM exhibit comparable and robust discriminative performance, supporting the stability of the identified regulatory and hub gene signatures underlying SSC identity (Table 1).
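A paired Wilcoxon signed-rank test compares two models fold by fold. The sketch below computes an exact two-sided p-value by enumerating all sign assignments, which is feasible for the small number of folds typical of cross-validation. The per-fold AUC values are hypothetical, and `scipy.stats.wilcoxon` is the usual production choice.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided paired Wilcoxon signed-rank test for small samples.

    Returns (W_plus, p_value). Tied |differences| receive average ranks;
    zero differences are not handled in this sketch.
    """
    # Round to suppress floating-point noise before ranking
    diffs = [round(a - b, 12) for a, b in zip(x, y)]
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences are not handled in this sketch")
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # average rank of the tie block
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2  # expected W_plus under the null hypothesis
    observed = abs(w_plus - center)
    # Under H0 every sign pattern is equally likely: enumerate all 2**n
    extreme = sum(
        1
        for signs in product((False, True), repeat=n)
        if abs(sum(r for r, s in zip(ranks, signs) if s) - center) >= observed - 1e-12
    )
    return w_plus, extreme / 2 ** n


# Hypothetical per-fold ROC-AUC values for two models (5-fold CV)
rf_auc = [0.92, 0.90, 0.93, 0.89, 0.91]
svm_auc = [0.90, 0.91, 0.90, 0.88, 0.90]
w, p = wilcoxon_signed_rank_exact(rf_auc, svm_auc)
print(w, p)  # 13.0 0.25
```

With n folds, the smallest attainable exact two-sided p-value is 2 / 2^n (0.0625 for five folds). This is worth keeping in mind when interpreting non-significant fold-wise comparisons.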

Table 1.

Representative SSC-Associated genes identified in this study

Gene Symbol | Regulation in SSCs | Functional Category | Reported Role in Spermatogenesis/SSC Biology | Supporting Analysis
DAZL | Downregulated | RNA-binding protein | Germ cell development and translational control | Microarray, GO
SYCP1 | Downregulated | Meiotic protein | Synaptonemal complex formation during meiosis | Microarray, PPI
ADAD1 | Downregulated | RNA metabolism | Regulation of meiotic progression in germ cells | PPI, ML
SMCP | Downregulated | Mitochondrial protein | Sperm mitochondrial sheath formation | Microarray
ODF2 | Downregulated | Cytoskeletal protein | Flagellar structure and sperm motility | PPI
AKAP4 | Downregulated | Scaffold protein | Regulation of flagellar signaling and motility | PPI, GO
SPATA6 | Downregulated | Acrosomal protein | Acrosome formation and sperm head shaping | Microarray
FAM74F1 | Downregulated | SSC marker | Associated with undifferentiated spermatogonia | scRNA-seq, ML
MMP3 | Upregulated | ECM remodeling | Niche signaling and extracellular matrix regulation | PPI, WGCNA
CAV1 | Upregulated | Membrane signaling | Cell–cell communication and mechanosignaling | PPI, WGCNA
TGFBR2 | Upregulated | Growth factor receptor | Regulation of SSC fate and differentiation | KEGG, PPI
LOX | Upregulated | ECM-modifying enzyme | Structural niche remodeling | PPI
DKK1 | Upregulated | WNT signaling inhibitor | Regulation of stem cell signaling balance | KEGG
CD44 | Upregulated | Adhesion molecule | Cell–matrix interaction within SSC niche | GO, PPI

Discussion

Understanding how human SSCs form remains challenging because of the lengthy in vivo developmental process and ethical constraints [44]. However, in vitro research on ESC differentiation into SSCs could improve our understanding of human SSC development and male infertility. In recent years, transgenic technologies, chemical induction reagents, and a variety of mouse models have been used to induce embryonic stem cell (ESC) differentiation into polarized germ cells and stellate cells [2, 45]. However, the low efficiency of in vitro induction and the lack of well-characterized induction factors have long been significant limitations of this technique. There is therefore an urgent need to identify, from a fresh angle, the critical factors that influence the in vitro differentiation of ESCs into SSCs. This work aimed to identify genes involved in SSC formation by integrating chromatin property data (ATAC-seq, DNase-seq, ChIP-seq) with gene expression data (RNA-seq, microarray data). We found that several important SSC-specific TFs and their target hub genes may help control the in vitro differentiation of ESCs into SSCs.

Our analysis of accessible chromatin in mouse embryonic stem cells (ESCs), primordial germ cells (PGCs), and spermatozoa showed that many peaks were located at the transcription start site (TSS) and that gene expression changed spatially and temporally across male germ cell developmental stages. Integrating chromatin property data with gene expression data may also help decipher the transcriptional regulatory codes governing male germ cell development. Using combined chromatin property data (ATAC-seq, DNase-seq, ChIP-seq) and gene expression data (RNA-seq, microarray data), we identified SSC-specific TFs shared between humans and mice and built TF-mediated gene regulatory networks (GRNs) for SSC formation. Previous research has shown that TFs in GRNs have nearly equivalent functions in mice and humans. Kim et al. reported that phenotypic differences between mice and humans are partly caused by the rewiring of GRNs. This research compares human spermatogonial stem cells (SSCs) with fibroblasts and examines their selection, culture conditions, and gene expression patterns in detail. Our results add to current knowledge of SSC biology, improving understanding of their cellular connections, developmental dynamics, and molecular signature.

To enrich spermatogonial populations, CD49f-MACS and matrix selection approaches were used to optimize the selection and culture of SSCs. Positive DDX4 (VASA) results confirmed the presence of spermatogonia. Markers including UTF1, STELLA, and SSEA4 showed that these cultures maintained undifferentiated SSCs over time. The characteristic spermatogonial morphology remained consistent across individuals, supporting the robustness of the selection technique. Intercellular bridges were also clearly visible, consistent with established spermatogonial behavior. Human testicular fibroblasts (htFibs) were successfully removed from some populations, providing further evidence that our isolation procedures were specific. Microarray analysis revealed a number of genes that differed in expression between fibroblasts and SSCs. Among the 126 downregulated and 92 upregulated genes, significant enrichment was observed in pathways associated with gamete production, microtubule-based motility, and spermatogenesis. The downregulation of genes related to intracellular transport, cell adhesion, and protein localization suggests a distinct molecular identity for SSCs compared with fibroblasts. These findings shed light on the transcriptional landscape that distinguishes SSCs from somatic cells in the testicular milieu.

Consistent with an important role of microtubule dynamics in SSCs, GO enrichment analysis highlighted terms such as tubulin binding and motile cilium assembly. Pathway analysis also linked metabolic control and gamete development. PPI network analysis identified hub genes including FAM74F1, SPATA6, SMCP, ODF2, CRISP2, CAPZA3, SYCP1, AKAP4, and ADAD1, which are involved in the function and maintenance of SSCs. Functional module identification using MCODE and FunRich revealed clusters related to metabolic pathways, ribosomal function, and cell cycle control. These results support the complex molecular interactions necessary for SSC maintenance and development. Although anchor-based integration substantially reduces batch effects, we acknowledge that residual technical variability may persist when combining datasets generated using different platforms and protocols. However, the consistency of cell-type identification, marker gene expression, and SSC subpopulation structure across independent datasets supports the biological validity of our findings. Future studies with harmonized experimental designs and larger balanced cohorts may further refine SSC transcriptional landscapes.

FAM74F1, SPATA6, SMCP, ODF2, CRISP2, CAPZA3, SYCP1, AKAP4, and ADAD1 are key genes in spermatogenesis, acting at distinct stages from germ cell differentiation to sperm maturation. For instance, SYCP1 is a critical component of the synaptonemal complex, which is essential for homologous chromosome pairing and recombination during meiosis [46]. Similarly, ADAD1 is an RNA-binding protein that regulates gene expression in germ cells, ensuring proper progression through meiosis and differentiation into spermatozoa [47]. During sperm development, structural and functional proteins contribute to sperm motility and stability. AKAP4 (A-kinase anchoring protein 4) is a key scaffold protein that regulates flagellar structure and energy metabolism, facilitating sperm motility [48]. ODF2 (outer dense fiber protein 2) and SMCP (sperm mitochondria-associated cysteine-rich protein) contribute to the structural integrity of the sperm flagellum and the mitochondrial sheath, respectively, ensuring efficient sperm movement [49]. Meanwhile, CRISP2 (cysteine-rich secretory protein 2) participates in sperm-egg interaction and influences fertilization potential [50, 51]. SPATA6 (spermatogenesis associated 6) is another essential protein, required for assembly of the sperm connecting piece that joins the head to the flagellum. Additionally, FAM74F1 and CAPZA3 (capping actin protein of muscle Z-line subunit alpha 3) are implicated in regulating cytoskeletal dynamics during spermatid differentiation [31]. CAPZA3, in particular, is involved in actin filament organization, contributing to proper shaping of the sperm head and tail. Collectively, these genes ensure that sperm cells acquire the structural and functional attributes necessary for successful fertilization [29, 37, 52]. Understanding their roles in spermatogenesis is vital for diagnosing and addressing male infertility, as mutations or dysregulation of these genes can impair sperm function and reduce fertility.

Our scRNA-seq analysis shed light on the SSC developmental trajectory. We observed distinct cell clusters corresponding to different spermatogenic phases, with the transition from undifferentiated spermatogonia to mature spermatozoa being particularly prominent. Expression of key markers, including UTF1, ID4, and NANOS3, in the germ cell clusters supported their identity as SSCs or progenitors. In addition, differential expression analysis by age group revealed dynamic changes in metabolic and transcriptional activity as SSCs mature. Together, these results improve our understanding of SSC heterogeneity and differentiation potential. The emergence of extracellular matrix and signaling proteins as top-ranked hubs underscores the importance of the SSC niche in shaping transcriptional states. Proteins such as MMP3, CAV1, and LOX likely influence SSC fate indirectly by modulating cell–cell interactions, mechanosensitive signaling pathways, and downstream transcriptional and post-transcriptional processes. In contrast, post-transcriptional regulators, such as RNA-binding proteins and RNA-processing enzymes, act at later regulatory layers, refining gene expression outputs rather than serving as dominant network hubs. This layered regulatory architecture highlights the complementary roles of extracellular signaling and RNA metabolism in SSC biology.
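The marker-based check described above can be sketched as a simple cluster score: average each cluster's mean expression over the named markers (UTF1, ID4, NANOS3) and flag clusters that reach a threshold. This is a hypothetical illustration, not the study's annotation procedure; the averaging rule, the threshold, treating a missing gene as zero, and the toy per-cluster values are all assumptions.

```python
from statistics import mean

# Marker genes named in the text; the scoring rule below is an illustrative assumption.
SSC_MARKERS = ("UTF1", "ID4", "NANOS3")


def marker_score(cluster_means, markers=SSC_MARKERS):
    """Average a cluster's mean expression over the marker genes.

    A marker missing from the cluster profile is counted as 0.0 (assumption).
    """
    return mean(cluster_means.get(gene, 0.0) for gene in markers)


def flag_ssc_clusters(clusters, threshold):
    """Return IDs of clusters whose marker score is >= threshold, highest score first."""
    scored = {cid: marker_score(expr) for cid, expr in clusters.items()}
    hits = [cid for cid, score in scored.items() if score >= threshold]
    return sorted(hits, key=lambda cid: -scored[cid])


# Hypothetical per-cluster mean log-normalized expression values (NOT study data).
toy_clusters = {
    "c0": {"UTF1": 2.1, "ID4": 1.8, "NANOS3": 1.2},
    "c1": {"UTF1": 0.1, "ID4": 0.3},
    "c2": {"UTF1": 1.0, "ID4": 0.9, "NANOS3": 1.1},
}

print(flag_ssc_clusters(toy_clusters, threshold=0.8))
```

A single averaged score is only a coarse first pass; marker panels are usually inspected gene by gene alongside cluster-specific differential expression before a cluster is labeled.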

Our analysis of cell-cell communication revealed significant ligand-receptor interactions within the testicular microenvironment. Fibroblasts and macrophages showed prominent signaling connections with SSCs, which may influence niche dynamics. Identifying shared ligand-receptor interactions provides clues about potential regulatory mechanisms controlling SSC self-renewal and differentiation [39, 53]. Additionally, the inclusion of tumor-associated fibroblast interactions highlights the influence of stromal inputs on SSC behavior [54, 55]. This study did not address all aspects of SSC regulation. For example, we identified five transcription factor (TF) motifs shared by human and mouse SSCs but none shared with ESCs or PGCs, which prompted a focus on SSC-specific genes and TFs in follow-up investigations.
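As a rough illustration of how a ligand-receptor interaction can be scored between a sender population (for example fibroblasts) and a receiver population (SSCs), the sketch below multiplies the sender's mean ligand expression by the receiver's mean receptor expression and ranks the pairs with a non-zero score. This expression-product score is a deliberate simplification and is not the CellChat or NicheNet model named in the abstract; CellChat, for instance, models communication probability with a law-of-mass-action formulation and assesses significance by permutation. The ligand and receptor names (LIG_1, REC_1, and so on) and all expression values are hypothetical.

```python
def lr_scores(sender_expr, receiver_expr, pairs):
    """Score each (ligand, receptor) pair as sender ligand mean x receiver receptor mean.

    Genes absent from a profile are treated as not expressed (0.0).
    This simple expression-product score is NOT the CellChat or NicheNet model.
    """
    scores = {}
    for ligand, receptor in pairs:
        ligand_level = sender_expr.get(ligand, 0.0)
        receptor_level = receiver_expr.get(receptor, 0.0)
        scores[(ligand, receptor)] = ligand_level * receptor_level
    return scores


def ranked_interactions(sender_expr, receiver_expr, pairs, min_score=0.0):
    """Keep pairs scoring strictly above min_score, sorted by descending score."""
    scores = lr_scores(sender_expr, receiver_expr, pairs)
    kept = [(pair, score) for pair, score in scores.items() if score > min_score]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical mean expression per population (placeholder genes, NOT study data).
fibroblast_means = {"LIG_1": 2.0, "LIG_2": 0.5}
ssc_means = {"REC_1": 1.5, "REC_2": 0.0, "REC_3": 2.0}
candidate_pairs = [("LIG_1", "REC_1"), ("LIG_1", "REC_2"), ("LIG_2", "REC_3")]

for (ligand, receptor), score in ranked_interactions(fibroblast_means, ssc_means, candidate_pairs):
    print(f"{ligand} -> {receptor}: {score:.2f}")
```

The product form means a pair is only scored when both partners are expressed, which is the intuition shared by most ligand-receptor tools; dedicated methods add multi-subunit complexes, cofactors, and statistical testing on top of it.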

Conclusion

In this study, we used a range of molecular and cellular techniques, including immunocytochemistry, microarray analysis, gene ontology enrichment, PPI networks, and scRNA-seq, to identify regulatory networks and hub genes governing SSC identity, including components involved in post-transcriptional regulation. These results shed light on SSC molecular markers, the expression dynamics of key genes (FAM74F1, SPATA6, SMCP, ODF2, CRISP2, CAPZA3, SYCP1, AKAP4, and ADAD1), and the interactions of SSCs with the testicular milieu, thereby improving our knowledge of SSC biology.

Acknowledgements

The authors acknowledge the use of OpenAI’s ChatGPT for language refinement and grammar editing in the preparation of this manuscript. The authors reviewed and verified the content generated by the AI to ensure accuracy and integrity.

Author contributions

DHK: writing (original draft preparation), statistical and bioinformatics analyses, formal analysis and investigation; MO: rewriting the manuscript and reanalyzing the data; HA: conceptualization and manuscript editing; TS: funding acquisition, collection and processing of clinical samples, and manuscript editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by DFG Grant SK 49/10-1 and Amol University of Special Modern Technologies Grant 14/20/14696.

Data availability

The data have been deposited at https://zenodo.org/records/15520688.

Code availability

The R code has been deposited at https://zenodo.org/records/15520688.

Declarations

Ethics approval and consent to participate

For this study, the Biotechnology Ethics Committee of Amol University of Special Modern Technologies approved the use of animals (Ir.ausmt. rec. 1404.01) on July 5, 2024 (title of the approved project: hssc.ethics.05). The work has been reported in line with the ARRIVE 2.0 guidelines.

Conflict of interest

The remaining authors declare that they have no commercial or financial relationships that could constitute a conflict of interest with this research.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Cox C, et al. Infertility prevalence and the methods of estimation from 1990 to 2021: a systematic review and meta-analysis. Hum Reprod Open. 2022;2022(4):hoac051.
2. Amirian M, et al. VASA protein and gene expression analysis of human non-obstructive azoospermia and normal by immunohistochemistry, immunocytochemistry, and bioinformatics analysis. Sci Rep. 2022;12(1):17259.
3. Diao L, et al. Roles of spermatogonial stem cells in spermatogenesis and fertility restoration. Front Endocrinol. 2022;13.
4. Azizi H, Hashemi Karoii D, Skutella T. Whole exome sequencing and in silico analysis of human Sertoli in patients with non-obstructive azoospermia. Int J Mol Sci. 2022;23(20):12570.
5. Sasaki K, Sangrithi M. Developmental origins of mammalian spermatogonial stem cells: new perspectives on epigenetic regulation and sex chromosome function. Mol Cell Endocrinol. 2023;573:111949.
6. Mahboudi S, et al. miR-106b enhances human mesenchymal stem cell differentiation to spermatogonial stem cells under germ cell profile genes involved in TGF-b signaling pathways. In Vitro Cell Dev Biol Anim. 2022;58(7):539–48.
7. Mouka A, et al. iPSCs derived from infertile men carrying complex genetic abnormalities can generate primordial germ-like cells. Sci Rep. 2022;12(1):14302.
8. Shi K, et al. Integrated bioinformatics analysis of the transcription factor-mediated gene regulatory networks in the formation of spermatogonial stem cells. Front Physiol. 2022;13:949486.
9. Wu J-X, et al. Stem cell therapies for human infertility: advantages and challenges. Cell Transplant. 2022;31:09636897221083252.
10. Lu C, et al. Application of single-cell assay for transposase-accessible chromatin with high throughput sequencing in plant science: advances, technical challenges, and prospects. Int J Mol Sci. 2024;25(3):1479.
11. Brastianos HC, et al. Determining the impact of spatial heterogeneity on genomic prognostic biomarkers for localized prostate cancer. Eur Urol Oncol. 2020. (in press).
12. Mattick JS, et al. Long non-coding RNAs: definitions, functions, challenges and recommendations. Nat Rev Mol Cell Biol. 2023;24(6):430–47.
13. Tao Y, et al. Alternative splicing and related RNA binding proteins in human health and disease. Signal Transduct Target Ther. 2024;9(1):26.
14. Zhou W-Y, et al. Circular RNA: metabolism, functions and interactions with proteins. Mol Cancer. 2020;19(1):172.
15. Chen Y, et al. Chromatin accessibility: biological functions, molecular mechanisms and therapeutic application. Signal Transduct Target Ther. 2024;9(1):1–39.
16. Cheng K, et al. Unique epigenetic programming distinguishes regenerative spermatogonial stem cells in the developing mouse testis. iScience. 2020;23(10):101596.
17. Nair V, et al. Differential analysis of chromatin accessibility and gene expression profiles identifies cis-regulatory elements in rat adipose and muscle. Genomics. 2021;24. doi: 10.1016/j.ygeno.2021.09.013.
18. Azizi H, Hashemi Karoii D, Skutella T. Clinical management, differential diagnosis, follow-up and biomarkers of infertile men with nonobstructive azoospermia. Transl Androl Urol. 2024;13(2):359–62.
19. Davoodi Nik B, et al. Differential expression of ion channel coding genes in the endometrium of women experiencing recurrent implantation failures. Sci Rep. 2024;14(1):19822.
20. Man YG, et al. Tumor-infiltrating immune cells promoting tumor invasion and metastasis: existing theories. J Cancer. 2013;4(1):84–95.
21. Ibtisham F, Honaramooz A. Spermatogonial stem cells for in vitro spermatogenesis and in vivo restoration of fertility. Cells. 2020. doi: 10.3390/cells9030745.
22. Karimizadeh E, et al. Analysis of gene expression profiles and protein-protein interaction networks in multiple tissues of systemic sclerosis. BMC Med Genomics. 2019;12(1):199.
23. Nie X, et al. Single-cell analysis of human testis aging and correlation with elevated body mass index. Dev Cell. 2022;57(9):1160–1176.e5.
24. Hashemi Karoii D, Azizi H. OCT4 protein and gene expression analysis in the differentiation of spermatogonia stem cells into neurons by immunohistochemistry, immunocytochemistry, and bioinformatics analysis. Stem Cell Rev Rep. 2023;3. doi: 10.1007/s12015-023-10548-8.
25. Hashemi Karoii D, Azizi H. Functions and mechanism of noncoding RNA in regulation and differentiation of male mammalian reproduction. Cell Biochem Funct. 2023;41(7):767–78.
26. Hashemi Karoii D, Azizi H, Skutella T. Integrating microarray data and single-cell RNA-Seq reveals key gene involved in spermatogonia stem cell aging. Int J Mol Sci. 2024;25(21):11653.
27. Hashemi Karoii D, et al. Alteration of the metabolite interconversion enzyme in sperm and Sertoli cell of non-obstructive azoospermia: a microarray data and in-silico analysis. Sci Rep. 2024;14(1):25965.
28. Hashemi Karoii D, et al. Exploring the interaction between immune cells in the prostate cancer microenvironment combining weighted correlation gene network analysis and single-cell sequencing: an integrated bioinformatics analysis. Discov Oncol. 2024;15(1):513.
29. Karoii DH, Azizi H, Amirian M. Signaling pathways and protein–protein interaction of vimentin in invasive and migration cells: a review. Cell Reprogram. 2022;24(4):165–74.
30. Karoii DH, Azizi H, Skutella T. Whole transcriptome analysis to identify non-coding RNA regulators and hub genes in sperm of non-obstructive azoospermia by microarray, single-cell RNA sequencing, weighted gene co-expression network analysis, and mRNA-miRNA-lncRNA interaction analysis. BMC Genomics. 2024;25(1):583.
31. Hashemi Karoii D, et al. Identification of novel cytoskeleton protein involved in spermatogenic cells and Sertoli cells of non-obstructive azoospermia based on microarray and bioinformatics analysis. BMC Med Genomics. 2025;18(1):19.
32. Hashemi Karoii D, Azizi H, Skutella T. Altered G-protein transduction protein gene expression in the testis of infertile patients with nonobstructive azoospermia. DNA Cell Biol. 2023. doi: 10.1089/dna.2023.0189.
33. Hashemi Karoii D, Azizi H, Skutella T. Microarray and in silico analysis of DNA repair genes between human testis of patients with nonobstructive azoospermia and normal cells. Cell Biochem Funct. 2022;40(8):865–79.
34. Karoii DH, et al. Machine learning and multi-omics–based identification of hub paternal imprinted genes PEG11/RTL1, PEG9/DLK1, PEG6/NDN, and PEG5/NNAT in sperm of couples experiencing idiopathic recurrent pregnancy loss. Comput Biol Med. 2026;200:111369.
35. Hashemi Karoii D, et al. Microarray and single-cell RNA sequencing reveals G-protein gene expression signatures of spermatogonia stem cell. Stem Cell Rev Rep. 2025;21(7):2136–56.
36. Karoii DH, Azizi H, Skutella T. Integrating microarray data and single-cell RNA-seq reveals correlation between kit and Nmyc in mouse spermatogonia stem cell population. Front Cell Dev Biol. 2025;13:1634347.
37. Shams AA, et al. Paternal trans fatty acid and vitamin E diet affect the expression pattern of androgen signaling pathway genes in the testis of rat offspring. Theriogenology. 2025;231:1–10.
38. Hashemi Karoii D, Azizi H. A review of protein-protein interaction and signaling pathway of vimentin in cell regulation, morphology and cell differentiation in normal cells. J Recept Signal Transduct Res. 2022;42(5):512–20.
39. Niazi Tabar A, et al. Testicular localization and potential function of vimentin positive cells during spermatogonial differentiation stages. Animals. 2022;12(3):268.
40. Malcher A, et al. Potential biomarkers of nonobstructive azoospermia identified in microarray gene expression analysis. Fertil Steril. 2013;100(6):1686–94.
41. Okada H, et al. Genome-wide expression of azoospermia testes demonstrates a specific profile and implicates ART3 in genetic susceptibility. PLoS Genet. 2008;4(2):e26.
42. Hodžić A, et al. Transcriptomic signatures for human male infertility. Front Mol Biosci. 2023;10:1226829.
43. Balagannavar G, et al. Transcriptomic analysis of the non-obstructive azoospermia (NOA) to address gene expression regulation in human testis. Syst Biol Reprod Med. 2023;69(3):196–214.
44. Jabari A, et al. Three-dimensional co-culture of human spermatogonial stem cells with Sertoli cells in soft agar culture system supplemented by growth factors and laminin. Acta Histochem. 2020;122(5):151572.
45. Liu W, et al. Microenvironment of spermatogonial stem cells: a key factor in the regulation of spermatogenesis. Stem Cell Res Ther. 2024;15(1):294.
46. Hermann BP, et al. The mammalian spermatogenesis single-cell transcriptome, from spermatogonial stem cells to spermatids. Cell Rep. 2018;25(6):1650–1667.e8.
47. Islam KN, et al. The RNA-binding protein Adad1 is necessary for germ cell maintenance and meiosis in zebrafish. PLoS Genet. 2023;19(8):e1010589.
48. Miki K, et al. Targeted disruption of the Akap4 gene causes defects in sperm flagellum and motility. Dev Biol. 2002;248(2):331–42.
49. Donkor FF, et al. Outer dense fibre protein 2 (ODF2) is a self-interacting centrosomal protein with affinity for microtubules. J Cell Sci. 2004;117(Pt 20):4643–51.
50. Brukman NG, et al. Fertilization defects in sperm from cysteine-rich secretory protein 2 (Crisp2) knockout mice: implications for fertility disorders. Mol Hum Reprod. 2016;22(4):240–51.
51. Gonzalez SN, et al. Cysteine-rich secretory proteins (CRISP) are key players in mammalian fertilization and fertility. Front Cell Dev Biol. 2021;33. doi: 10.3389/fcell.2021.800351.
52. Geyer C, et al. A missense mutation in the Capza3 gene and disruption of F-actin organization in spermatids of repro32 infertile male mice. Dev Biol. 2009;330:142–52.
53. Jin C, et al. Decoding the spermatogonial stem cell niche under physiological and recovery conditions in adult mice and humans. Sci Adv. 2023;9(31):eabq3173.
54. Hashemi Karoii D, et al. Analysis of microarray and single-cell RNA-seq identifies gene co-expression, cell–cell communication, and tumor environment associated with metabolite interconversion enzyme in prostate cancer. Discov Oncol. 2025;16(1):177.
55. Guo X, et al. Stromal fibroblasts activated by tumor cells promote angiogenesis in mouse gastric cancer. J Biol Chem. 2008;283(28):19864–71.

Articles from Clinical and Experimental Medicine are provided here courtesy of Springer