Summary
Spermatogenesis is a highly regulated process that produces sperm to transmit genetic information to the next generation. Although extensively studied in mice, our current understanding of primate spermatogenesis is limited to populations defined by state-specific markers from rodent data. As between-species differences have been reported in the duration and differentiation hierarchy of this process, it remains unclear how molecular markers and cell states are conserved or have diverged from mice to man. To address this challenge, we employ single-cell RNA-sequencing to identify transcriptional signatures of major germ and somatic cell-types of the testes in human, macaque, and mice. This approach reveals similarities and differences in expression throughout spermatogenesis, including the stem/progenitor pool of spermatogonia, markers of differentiation, potential regulators of meiosis, RNA turnover during spermatid differentiation, and germ cell-soma communication. These datasets provide a rich foundation for future targeted mechanistic studies of primate germ cell development and in vitro gametogenesis.
Introduction
Sperm are highly specialized terminally differentiated cells that carry genetic information from father to offspring, thus providing a continuous link between the past and future of a species. In all male mammals, the foundational unit of fertility is the spermatogonial stem cell (SSC), which balances self-renewal with differentiation to sustain continuous sperm production throughout a male’s adult life. Isolated SSCs divide to form networks of connected cells which progress through a series of mitotic and meiotic divisions, followed by post-meiotic morphological changes of spermiogenesis to produce mature sperm. Despite the global similarities in the spermatogenesis program, it remains difficult to translate knowledge generated in mice to higher primates. For example, the classical terminology for spermatogonia in rodents (Asingle (As), Apaired (Apr), Aaligned (Aal), A1–4, Int, B) (Huckins, 1971, Oakberg, 1971, de Rooij, 1973) differs from that in monkeys (Adark, Apale, B1–4) and humans (Adark, Apale, B) (Clermont, 1966a, Clermont, 1966b, Clermont, 1972, Clermont, 1969). Additionally, there are numerous interspecies differences, including the histological organization of the testis, duration of the seminiferous epithelium cycle, dynamics of stem cell renewal and differentiation, and the quantity and morphology of sperm produced (reviewed in (Fayomi and Orwig, 2018, Ramm et al., 2014)). Therefore, to translate the knowledge from mice to the study of spermatogenesis and fertility in higher primates, and to make better use of the diverse genetic and physiological tools in non-human models, we need a comprehensive and unbiased analysis of the gametogenesis process that allows identification and direct comparison of equivalent cell types and states.
Here we analyzed ~14K adult human and ~22K macaque testicular cells, and combined these datasets with our previously published mouse single cell data (Green et al., 2018). By defining homologous cell types across species using hundreds to thousands of genes, we uncovered both known and understudied cell types, including transient cell states too rare to be detected with low-throughput approaches. This leads to a high-resolution three-species atlas with aligned somatic and germ cell types/states. First, we describe six molecularly defined consensus spermatogonia states (SPG1–6) that are conserved in mice, monkeys and humans, providing a harmonized vocabulary to facilitate cross-species comparisons. Furthermore, we use a cell surface marker of SPG1, TSPAN33, to isolate and enrich for transplantable human spermatogonia, although colonizing potential is not limited to this fraction. Second, we generate a mammalian spermatogenesis pseudotime map and describe several conserved and diverged molecular pathways operating within or across species. Finally, we describe between-species differences in somatic cell type relatedness/plasticity, functional roles, and their potential communications with the germline. Overall, this study provides a systematic single-cell comparative analysis of the spermatogenesis program between primates and rodents. Such a resource is expected to improve our knowledge base for future studies of germ cell development and inform fertility restoration efforts, including SSC culture and in vitro gametogenesis.
Results
Single-cell sequencing identifies major germ and somatic cell types of adult human and macaque testes.
Using the Drop-seq platform, we generated single cell transcriptome data from cryopreserved adult human and rhesus macaque testes, separately identifying major cell types and states in each species, and combining them with our previously reported mouse data (Green et al., 2018) to define evolutionarily conserved or diverged programs (Figure 1A). The doublet rates for the human and macaque datasets are estimated at <2%, allowing reliable analysis of individual cells (Figure S1A). Systematic comparisons of technical batches (e.g. Human 1.1–1.5, Figure S1B–C) and biological replicates (4 humans and 5 macaques, Figure S1D–E) confirmed high batch-to-batch or individual-to-individual concordance, despite the variable proportions of cell types in different samples, likely due to differences in tissue processing.
Figure 1. Overview of major cell types and cellular attributes inferred from single-cell RNA-Seq analyses of human and macaque testes.

A. Workflow overview of data collection and analysis.
B. Visualization of major testis cell types from global clustering of all 13,837 human (left) and 21,574 macaque (right) cells in UMAP (top). Inset: Focused clustering of 3,722 human (left) and 2,098 macaque (right) somatic cells identifies 7 distinct cell types (bottom).
C. Heatmap of marker gene expression in the 11 major cell types for human (left) and macaque (right). Shown are each gene’s mean expression in each cell type (i.e., the centroid), standardized over the 11 centroids. Representative markers are listed.
D. Distribution profiles of per-cell attributes compared across the 11 cell types, including mouse data from Green et al., 2018. From top to bottom: nGene, total number of detected genes per cell; nUMI, total number of Unique Molecular Identifiers (UMI) per cell; %ChrX, percent of total UMIs from genes on the X chromosome.
E. Putative MSCI escapee genes, shown as red dots, in context of other highly variable genes (in gray). The x and y axes show the mean expression level in Scyte and SPG, respectively. The vertical line demarcates genes with an average expression >= 0.5 log normalized UMI. The diagonal solid line highlights genes with a fold increase >2 from spermatogonia to spermatocytes.
F. Intron FISH localizes RIBC1 transcripts (magenta) to the XY-body (H2AX, green) in macaque. Arrows indicate co-localization. Scale bars 10 μm.
By unsupervised clustering of human and macaque cells separately, we identified 5 major cell types in both species including a somatic cell cluster and four major germ cell types: spermatogonia (SPG), meiotic spermatocytes (SCytes), post-meiotic haploid round spermatids (STids) and elongating spermatids (Elong) (Figure 1B–C, Table S1). Focused re-clustering of somatic cells revealed seven major somatic cell types: macrophages, T cells, endothelial cells, peritubular myoid cells, two pericyte subpopulations (with either smooth muscle or ECM secreting properties), and a Leydig cell precursor population (Figure 1B inset, 1C, Table S1). Altogether, our atlas captures the majority of cell types in both species and allows us to directly compare testis cell types/states across three species and define recurrent or divergent molecular programs spanning 85 million years of mammalian evolution.
Between-species differences of cellular attributes defined by transcriptomic properties.
To better understand differences in cell attributes we extracted and compared several transcriptome-derived indices (Figure 1D). In both macaque and human testis, germ cells tend to express a larger number of genes (“nGene” in Figure 1D) than somatic cells, consistent with our previous findings in mice (Green et al., 2018) and prior bulk RNA-seq observations (Soumillon et al., 2013, Hammoud et al., 2014). Interestingly, the cell type(s) with the largest number of detected genes varies across species: spermatocytes and spermatids for macaques and mice, and spermatogonia for humans (Figure 1D), although the functional significance of this difference remains to be determined.
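The per-cell indices named in Figure 1D can be illustrated with a minimal sketch. It assumes a genes-by-cells UMI count matrix and a boolean mask marking X-linked genes; the function name, the toy matrix, and the mask are hypothetical and are not taken from the study's pipeline.

```python
import numpy as np

def per_cell_attributes(counts, is_chrx):
    """Return nGene, nUMI and %ChrX per cell.

    counts  : genes x cells matrix of UMI counts (hypothetical input layout)
    is_chrx : boolean vector over genes, True for X-linked genes
    """
    counts = np.asarray(counts, dtype=float)
    is_chrx = np.asarray(is_chrx, dtype=bool)
    n_gene = (counts > 0).sum(axis=0)        # genes detected in each cell
    n_umi = counts.sum(axis=0)               # total UMIs in each cell
    chrx_umi = counts[is_chrx].sum(axis=0)   # UMIs from X-linked genes
    pct_chrx = 100.0 * np.divide(
        chrx_umi, n_umi, out=np.zeros_like(n_umi), where=n_umi > 0
    )
    return n_gene, n_umi, pct_chrx

# Toy example: 4 genes x 3 cells, last gene treated as X-linked.
toy_counts = np.array([[5, 0, 2],
                       [0, 0, 1],
                       [3, 4, 0],
                       [2, 0, 1]])
toy_is_chrx = np.array([False, False, False, True])
n_gene, n_umi, pct_chrx = per_cell_attributes(toy_counts, toy_is_chrx)
```

Comparing the distributions of these three vectors across annotated cell types gives the kind of summary shown in Figure 1D.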
As germ cells tend to have a higher number of total transcripts (“nUMI” in Figure 1D) sequenced per cell, this elevated “library size factor” could partially account for the higher number of genes detected. To discern the distribution pattern of transcripts within individual cells, we calculated a Gini index, a measure of transcriptional inequality, for each cell and compared it across cell types in all three species. Similar to our previous reports for mice (Green et al., 2018), somatic cells and the spermatogonia population in human and macaque have the lowest Gini index, which increases progressively in differentiating germ cell populations (Figure S1F–G). This trend indicates that more differentiated germ cells devote a higher fraction of their transcriptome to a narrower set of genes for specific biological functions (Teschendorff and Enver, 2017).
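For readers unfamiliar with the Gini index, the following sketch computes it for one cell's expression vector using the standard rank-based formula. It is an illustration only; whether the study used raw or normalized counts, and which genes were included, is not stated here, and the function name is hypothetical.

```python
import numpy as np

def gini_index(expr):
    """Gini index of one cell's expression vector (0 = even, near 1 = concentrated)."""
    x = np.sort(np.asarray(expr, dtype=float))   # ascending order
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Standard formula for sorted data: G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1.0) / n)

even_cell = [1, 1, 1, 1]      # transcripts spread evenly over genes
skewed_cell = [0, 0, 0, 4]    # all transcripts from a single gene
```

A cell whose transcripts come from a few dominant genes scores higher than one with evenly spread expression, which is the pattern described for differentiating germ cells.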
During meiosis, sex chromosomes are transcriptionally silenced via meiotic sex chromosome inactivation (MSCI) (Turner, 2007). In all three species, X-linked genes account for ~2–5% of the cell’s total transcripts in the somatic cells and spermatogonia, but this fraction is markedly lower in spermatocytes and partially recovers in the round and elongated spermatids, although to varying degrees in different species (“%ChrX” in Figure 1D). Furthermore, the suppression of ChrX transcripts in spermatocytes is less complete in primates than in mice (“%ChrX” in Figure 1D, Figure S1H), while the non-ChrX gene expression in spermatocytes and the ChrX gene expression in spermatogonia are comparable across species (Figure S1H). To identify MSCI escapee genes we required (1) a minimal expression level and (2) an expression fold increase from spermatogonia to spermatocytes, to ensure that the escapee genes are actively being transcribed rather than protected from degradation (Methods, Figure 1E). Using these criteria we identified 16 human and 14 macaque genes that escape MSCI, whereas the same threshold in mice identified a single gene, Tsga8 (Table S1). These escapees are not enriched in the pseudoautosomal regions, but rather are scattered along both ChrX arms and lack an expressed Y-chromosome homologue. Of the primate escapee genes, seven are shared between the two species. Intronic hybridization chain reaction (HCR) probes for the shared escapee RIBC1 were used to confirm the presence of newly generated transcripts in the XY body of pachytene spermatocytes of macaques (Figure 1F). The observed incomplete MSCI in human/macaque relative to mice is consistent with earlier cytological data (de Vries et al., 2012, Turner et al., 2005). Therefore, our single cell data extend these cytological observations and identify additional candidate genes escaping MSCI.
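The two escapee criteria can be sketched as a simple filter over X-linked genes. The thresholds (mean expression >= 0.5 and fold increase > 2) come from the Figure 1E legend; everything else is an assumption made for illustration: that "log normalized UMI" means ln(1 + x), that the expression threshold is applied to the spermatocyte mean, and that the fold change is evaluated after back-transforming. Gene names in the example are placeholders.

```python
import math

def msci_escapee_candidates(mean_log_expr, min_expr=0.5, min_fold=2.0):
    """Filter X-linked genes by the two criteria described in the text.

    mean_log_expr : dict gene -> (SPG mean, Scyte mean), each assumed to be
                    ln(1 + normalized UMI); this scale is an assumption.
    """
    hits = []
    for gene, (spg, scyte) in mean_log_expr.items():
        if scyte < min_expr:            # criterion 1: minimal expression level
            continue
        spg_lin = math.expm1(spg)       # back-transform to the linear scale
        scyte_lin = math.expm1(scyte)
        if scyte_lin > min_fold * spg_lin:   # criterion 2: fold increase SPG -> Scyte
            hits.append(gene)
    return sorted(hits)

# Placeholder genes, not real measurements.
toy = {
    "gene_a": (0.2, 1.0),   # rises strongly in spermatocytes
    "gene_b": (1.5, 0.3),   # silenced in spermatocytes
    "gene_c": (1.0, 1.2),   # expressed, but less than 2-fold increase
}
candidates = msci_escapee_candidates(toy)
```

Requiring an increase, rather than mere persistence, is what separates active transcription from transcripts that simply survive from the spermatogonial stage.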
Iterative clustering of germ cells reveals analogous patterns of discrete and continuous developmental trajectories.
Previous re-clustering of 20K mouse germ cells identified a discrete cell type, SPGs, and a continuous developmental trajectory from Scytes to STids (Green et al., 2018). Here, a similar analysis in human and macaque revealed a clear partition between SPG and non-SPG germ cells for both species (Figure S1I–J) and similar rank correlation patterns (Figure S1K–L). A provisional clustering analysis divides the germ cells into seven clusters for each species (hGC1–7 and mGC1–7) (Figure S1I–J). Using the expression of known cell state markers, we annotate hGC1–2, mGC1–2 as spermatogonia, hGC3 and mGC3–4 as meiotic spermatocytes, hGC4–5, mGC5 as post-meiotic round, and hGC6–7, mGC6–7 as elongated spermatids (Table S2). While this seven-cluster solution will be subsequently updated by finer partitions when we zoom in to the SPGs and non-SPG cells separately (below), this mid-level analysis provides independent catalogs of cell types/states and comprehensive lists of molecular markers for human and macaque that are not restricted to 1:1:1 orthologous genes.
Provisional clustering of SPGs identifies four spermatogonial states for each species.
To gain a finer view of spermatogonia heterogeneity, we re-clustered human SPGs (cells in hGC1–2) and uncovered four transcriptionally distinct states, hSPG1–4 (Figure S1I inset, Table S2). The four hSPG clusters are highly correlated with the transcriptional states 0–4 recently described by Guo et al. (r = 0.83 – 0.93, Figure S2A) and similarly by others (Guo et al., 2018, Sohni et al., 2019, Hermann et al., 2018). Likewise, re-clustering of macaque spermatogonia (mGC1–2) uncovers four states: mSPG1–4 (Figure S1J inset, Table S2). A similar analysis of our mouse spermatogonia also identified four molecular states (Green et al., 2018). The parallel identification of four SPG states independently in each species raises the question as to whether these four SPG clusters map across species in a one-to-one fashion. In the past, a common practice to align cellular states across species was reliant on a small selection of markers. However, here we have the opportunity to perform transcriptome-wide alignment to identify equivalent cellular states.
Joint analysis of human, macaque, and mouse spermatogonia uncovers six molecularly analogous states.
To identify comparable molecular states across species we used 1-1-1 homologous genes to merge the data for 1688 human, 747 macaque, and 2174 mouse SPGs. Compared to single-species analyses, this three-species merging significantly increases the total number of SPG cells, allowing more sensitive detection of rarer cell states that may have been missed in individual species. Re-clustering of ~4500 SPGs identifies six consensus states, ordered as SPG1–6 (Figure 2A, S2B). Expression patterns of previously established conserved marker genes suggest that SPG1/2 are undifferentiated spermatogonia, SPG3–5 represent progressive stages of differentiation, and SPG6 consists of cells preparing for meiotic entry (Table S3).
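The gene-space step of such a merge can be sketched as follows: keep only ortholog groups present in every species, reorder each matrix to that shared gene list, and concatenate cells. This is a minimal illustration under assumed data structures (the function name, input layout, and toy values are hypothetical); it does not cover normalization or batch correction, which the joint analysis would also require.

```python
import numpy as np

def merge_by_orthologs(species_data, ortholog_table):
    """Restrict each species to shared one-to-one orthologs and concatenate cells.

    species_data   : dict species -> (list of gene symbols, genes x cells matrix)
    ortholog_table : list of dicts, each mapping species -> that species' symbol
    """
    species = list(species_data)
    index = {s: {g: i for i, g in enumerate(species_data[s][0])} for s in species}
    # Keep ortholog groups whose member gene is present in every species.
    kept = [grp for grp in ortholog_table
            if all(grp[s] in index[s] for s in species)]
    blocks, labels = [], []
    for s in species:
        _, mat = species_data[s]
        rows = [index[s][grp[s]] for grp in kept]
        sub = np.asarray(mat, dtype=float)[rows, :]   # shared genes, same order
        blocks.append(sub)
        labels += [s] * sub.shape[1]
    return kept, np.hstack(blocks), labels

# Toy example with two species; values are arbitrary.
toy_data = {
    "human": (["PIWIL4", "KIT", "VCX"], [[1, 2], [3, 4], [5, 6]]),
    "mouse": (["Kit", "Piwil4"], [[7, 8, 9], [10, 11, 12]]),
}
toy_orthologs = [
    {"human": "PIWIL4", "mouse": "Piwil4"},
    {"human": "KIT", "mouse": "Kit"},
]
kept, merged, labels = merge_by_orthologs(toy_data, toy_orthologs)
```

Genes without an ortholog in every species (VCX in the toy input) drop out of the merged matrix, which is why species-specific markers must be examined in the single-species analyses.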
Figure 2: Merged analysis of mouse, human and macaque spermatogonia identifies six molecular states (SPG1–6).

A. Re-clustering of 1688, 747, and 2174 SPG cells from human, macaque and mouse, respectively, identifies six molecular states, shown in t-SNE space.
B. Distribution of SPG cells across the six states, colored by species.
C. Biological annotation of SPG1–6 based on expression patterns of established conserved markers for different stages of spermatogonia. Shown are expression levels in individual cells without distinguishing by species.
D-E. Expression patterns of several newly identified noncanonical markers of spermatogonia states. Each column is for a gene, showing the expression heatmap over all cells without distinguishing by species (top), and the corresponding violin plots for each species (bottom). Note that VCX genes are primate-specific but lack annotated 1–1 orthologs in macaque.
F. Similarity among SPG states within and across species, shown as heatmaps of average Jaccard index among cell clusters.
G. Graphic summary of between-species comparison of state-dependent expression patterns for select marker genes, colored by species.
As expected, the three species are not distributed evenly across the six consensus states (Figure 2B, S2C, S2D upper panels). A close examination of the cells’ assignment to (1) the four SPG states initially defined in each species and (2) the six consensus states reveals that the cells largely maintain developmental ordering, but also shows notable between-species differences in state occupancy (Figure S2D lower panels). Most cells in the undifferentiated SPG1–2 clusters are from macaque and human, whereas most cells in differentiating SPG3–4 are from mouse. All three species are represented in SPG5 and SPG6. The between-species shifts in state occupancy reflect differences in the relative size of the stem/progenitor pool and differing numbers of transit amplifying divisions (Figure S2C–E), consistent with previous observations by histology (reviewed in (Fayomi and Orwig, 2018)).
Consensus SPG1–2.
Although cells in SPG1 and SPG2 are similar to each other (Figure S2B) and at first glance express many of the well-established markers of undifferentiated spermatogonia (e.g., GFRA1, ZBTB16 (PLZF), ITGA6, ID4, UCHL1, LIN28, DPPA4 (macaque), FGFR3 and UTF1 (human)) (Figure 2D, S2F, Table S3) (reviewed in (von Kopylow and Spiess, 2017, Fayomi and Orwig, 2018)), they enrich for distinct subsets of canonical and noncanonical markers that can be used to clearly distinguish SPG1 from SPG2 (Figure 2C,D,E). Specifically, SPG1 cells are enriched for MORC1, MSL3, ZBTB43, TSPAN33, TCF3, FMR1, MAGEB2 and CDK17 (Figure 2E (TSPAN33, PIWIL4, MORC1 shown), see Table S3). Interestingly, PIWIL4 is a SPG1-specific marker for all three species (Figure 2D–E; Table S3). However, mouse contributes very few cells to SPG1 (~40 cells, or ~2% of all mouse spermatogonia, Figure S2C), too rare to have clustered independently in our earlier mouse-only analysis. The rarity of mouse SPG1 is consistent with prior knowledge that the stem/progenitor pool of spermatogonia in mice is much smaller than in primates (reviewed in (Fayomi and Orwig, 2018)). In SPG2 cells, the noncanonical markers enriched in SPG1 (described above) are diminished, whereas certain canonical markers for undifferentiated spermatogonia increase in expression (ZBTB16 (mouse and human), GFRA1 (all three species)), as do several noncanonical markers (DUSP6, TCF4, L1TD1 (macaque and human only)) (Figure 2D,E (L1TD1 shown); Table S3). Given these distinct patterns, we conclude that consensus SPG1–2 correspond to two distinct undifferentiated spermatogonia populations.
To validate newly identified noncanonical SPG1–2 markers (TCF3, PIWIL4 (previously described (Sohni et al., 2019, Guo et al., 2018)), MAGEB2, CDK17, MORC1, DNAJB6, and FMR1), we co-stained them with UCHL1, a broad undifferentiated spermatogonia marker (Figure 3A–D, S3A–E, upper panels), or cKIT, a differentiating spermatogonia marker (Figure 3A–D, S3A–E, lower panels) in human testis tissue. Consistent with our scRNA-seq data predictions, the protein markers co-localize with UCHL1+ spermatogonia cells lining the basement membrane of the human testis (Figure 3A–D, S3A–E, upper panels), but less so with cKIT+ differentiating spermatogonia (Figure 3A–D, S3A–E, lower panels), confirming that SPG1–2 are undifferentiated spermatogonial cells.
Figure 3. Localization and functional characterization of primate undifferentiated spermatogonia.

A-D. Immunofluorescence co-staining analysis of SPG1–2 markers, TCF3 (A), MAGEB2 (B), CDK17 (C), and MORC1 (D), with undifferentiated spermatogonia marker UCHL1 (top) or differentiating spermatogonia marker cKIT (bottom) in human seminiferous epithelium cross-sections. Dot plots show the number of single positive (left, right), or double positive (middle) cells per tubule. Error bars represent standard deviation.
E-H. Co-localization analysis of SPG markers PIWIL4 (E), MORC1 (F) and TCF3 (G) with human spermatogonia subtypes: Adark, Apale, B-type, and Spermatocytes. Dot plots summarize the number of marker positive cells per cross section for each SPG subtype. H demonstrates background staining. Error bars represent standard deviation.
I. Workflow of human to mouse xenotransplantation experiments.
J. FACS gating strategy to enrich for TSPAN33+ cells.
K. Number of colonies obtained from TSPAN33-negative and TSPAN33-positive cells, per 10⁵ cells transplanted. At least 10 testes were counted per fraction from 3 different donors. Error bars represent standard deviation.
L. Normalization of colonies from xenotransplantation of FACS sorted cell fractions by total fraction size as estimated by FACS. Error bars represent standard deviation.
All scale bars = 50 μm.
See also Figure S3.
Next, we explored whether SPG1 and SPG2 represent cells in different phases of the mitotic cycle. We used ~500 known cell cycle genes (Whitfield et al., 2002) to determine each cell’s likely phase (Figure S3F–G), and found that SPG1 and SPG2 are similarly enriched for cells in M/G1 and G1/S (Figure S3H). Thus, cell cycle state does not account for the difference between SPG1 and SPG2; rather, they define two transcriptionally distinct states of undifferentiated spermatogonia.
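One common way to assign a likely phase from phase-specific gene sets is to score each cell by the mean standardized expression of each set and take the highest-scoring phase. The sketch below illustrates that idea only; it is not necessarily the procedure used in this study, and the gene names, phase labels, and toy values are placeholders.

```python
import numpy as np

def assign_phase(expr, gene_names, phase_genes):
    """Assign each cell the phase whose gene set has the highest mean z-score.

    expr        : genes x cells expression matrix
    gene_names  : list of gene symbols, one per row of expr
    phase_genes : dict phase -> list of gene symbols (each phase is assumed
                  to have at least one gene present in gene_names)
    """
    expr = np.asarray(expr, dtype=float)
    mu = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, keepdims=True)
    z = (expr - mu) / np.where(sd > 0, sd, 1.0)   # z-score each gene across cells
    row = {g: i for i, g in enumerate(gene_names)}
    phases = list(phase_genes)
    scores = np.vstack([
        z[[row[g] for g in phase_genes[p] if g in row]].mean(axis=0)
        for p in phases
    ])
    return [phases[i] for i in scores.argmax(axis=0)]

# Placeholder genes and values for three cells.
toy_genes = ["g1s_a", "g1s_b", "g2m_a", "g2m_b"]
toy_expr = [[5, 0, 3],
            [4, 1, 3],
            [0, 6, 0],
            [1, 5, 0]]
toy_sets = {"G1/S": ["g1s_a", "g1s_b"], "G2/M": ["g2m_a", "g2m_b"]}
phase_calls = assign_phase(toy_expr, toy_genes, toy_sets)
```

Tabulating such calls per cluster gives the kind of phase composition compared between SPG1 and SPG2.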
Consensus SPG3–5.
A combination of markers is often used to identify differentiating spermatogonia populations across species (Fayomi and Orwig, 2018, von Kopylow and Spiess, 2017). States SPG3–SPG5 are more similar to each other than to SPG1–2 or SPG6 (Figure S2B). Human and macaque SPG3–5 cells share many markers with mouse SPG3, such as DMRT1, KIT, STRA8, SOHLH1 and SOHLH2 (possibly equivalent to Aaligned - A1 spermatogonia in mice, Apale - B1 macaque; Apale - B human), but many markers are not restricted to a specific state; rather, they are broadly expressed in multiple differentiating SPG clusters and are often temporally unaligned across species (Figure 2C–D, S2F, summarized in Figure 2G). For example, KIT positive cells are sporadic in SPG1–3 yet increase in abundance in human SPG4 (unannotated in macaque), while STRA8 is absent in SPG3, but sharply upregulated in human and macaque SPG5–6. MKI67 is upregulated in SPG3 of mice and macaques, but not until SPG4 in humans (Figure S2F; Table S3). Cells in macaque and human SPG4 express markers such as MKI67, PCNA, numerous cyclins, DMRT1, DMRTB1 and KIT (unannotated in macaque), indicating that this population is actively proliferating and appears to be molecularly analogous to cells transitioning from Type A1–4/AIntermediate to B spermatogonia in mouse (Table S3, Figure S2F) (Zhang et al., 2014). This transition has a distinct cell cycle program, depicted in Figure S3F and G, which appears to have either an extended S/G2 phase or a more transient G1 phase, possibly reducing the likelihood of premature meiotic entry of germ cells. The macaque and human SPG5 population continues to express DMRT1 and DMRTB1 (unannotated in monkey) while turning down KIT and turning on STRA8 as well as the meiotic gene REC8 (Table S3, Figure S2F).
Together, these data suggest that consensus SPG4 and SPG5 may represent two transcriptionally distinct Type B spermatogonia populations, referred to here as early (SPG4) and late (SPG5) Type B, in all three species. In contrast, classical descriptions of the spermatogenic lineage suggest one Type B population in mouse and human and four in monkey (B1–4).
Consensus SPG6.
SPG6 cells turn off DMRT1 while maintaining DMRTB1 (not annotated in macaque), a specific signature of mouse Type B spermatogonia (Zhang et al., 2014, Green et al., 2018). They also begin to express many markers previously identified in mouse preleptotene spermatocytes: TEX101, LY6K and established meiotic genes such as SYCP2/3, SYCE1/2, MEIOB, PRDM9, SPATA22, and DNAJB11 (Green et al., 2018). Therefore, SPG6 marks a transition from type B spermatogonia to early meiotic preleptotene cells in all three species. However, we identify many genes that are either expressed in all three species (i.e. FMR1NB, ZCWPW1, DPEP3, IQBP1, CALR), in primates only (BEND2, ZRANB2, PAGE4, ZNFX1, HLTF, ZBED5) or are singularly expressed in mouse (Swt1, Tgfbr1, Rhox13, PhTF1, Fmr1), macaque (ZNF850, MAGEB17, ZMYM1, TCEA1) and human (PRDM7, full VCX family, ERBB3, ERVK13-1, SSX3, TRIML2) (Table S3A). While the functional significance of many of these species-specific genes remains unknown, they may provide insights into the evolution of the meiotic entry program.
Since the nomenclature for spermatogonia populations varies across species, it has been difficult to translate data produced in one species to the others. To harmonize our transcriptionally defined populations with classical morphometric descriptors, we updated a model comparing spermatogenic lineages across species (Figure S2E; Fayomi and Orwig, 2018). Briefly, SPG1 and SPG2 represent two distinct populations of undifferentiated spermatogonia which do not strictly map to classic Adark and Apale spermatogonia (see below), but are analogous to the mouse undifferentiated spermatogonia populations (As-Apaired-Aaligned). SPG3 cells are transitioning from undifferentiated (Aal mouse; Ad/p monkey and human) to differentiated (A1 mouse; B1 macaque; B human). SPG4 and SPG5 are transcriptionally distinct sub-populations of transit-amplifying differentiating spermatogonia (A2–4, Intermediate mouse, and Type B; B2-B4 monkey and B human). SPG6 cells have activated the meiotic program and represent Type B to preleptotene spermatocytes in all three species.
Altogether, we identify six consensus SPG states across species using global transcriptomes rather than select markers. The equivalency of the SPG states across species is confirmed by the average cell-cell Jaccard distances between clusters (Figure 2F). On a global scale, cluster-cluster similarities between species are weaker in earlier SPG states when compared to the late SPG5–6 states, suggesting that the early states and molecular pathways for early spermatogonia differentiation are more variable across species, but these cells ultimately reach a more conserved molecular identity, or converge on a similar differentiation program prior to meiosis. While previous studies assumed that well known markers are transferable across species, we now provide a foundational set of universal and species-specific markers, along with their expression timing (summarized in Figure 2G), that can be used to isolate and analyze equivalent states in human, monkey and mouse.
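As an illustration of a cluster-to-cluster Jaccard comparison, the sketch below averages the pairwise Jaccard index between cells of two clusters. How cells were represented for this calculation is not described in this section, so the binarization used here (a gene counts as present in a cell if its count is above zero) is an assumption, as are the function name and toy matrices.

```python
import numpy as np

def mean_jaccard(cluster_a, cluster_b):
    """Average cell-cell Jaccard index between two clusters.

    cluster_a, cluster_b : genes x cells matrices over the same gene order.
    Each cell is reduced to its set of detected genes (count > 0); this
    binarization is an assumption made for the sketch.
    """
    a = (np.asarray(cluster_a) > 0).astype(float)
    b = (np.asarray(cluster_b) > 0).astype(float)
    inter = a.T @ b                                   # shared detected genes per cell pair
    union = a.sum(axis=0)[:, None] + b.sum(axis=0)[None, :] - inter
    jac = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    return float(jac.mean())

# Toy clusters over 3 genes: two cells versus one cell.
toy_a = [[1, 0],
         [1, 1],
         [0, 1]]
toy_b = [[2],
         [3],
         [0]]
similarity = mean_jaccard(toy_a, toy_b)
```

Computing this value for every pair of clusters, within and across species, yields a similarity matrix of the kind displayed as a heatmap in Figure 2F; the corresponding Jaccard distance is one minus the index.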
SPG1 and SPG2 do not correlate with Adark and Apale histological states in human.
Historically, undifferentiated spermatogonia in humans and macaques are divided by intensity of nuclear staining as Adark and Apale spermatogonia (Clermont and Leblond, 1959, Clermont and Antar, 1973, Clermont, 1966b), which are proposed to be quiescent (reserve) and active stem cells, respectively (Clermont, 1969). To determine whether SPG1 corresponds to either histological state, we combined PAS and hematoxylin staining with immunohistochemistry for several markers in human testis cross sections. PIWIL4 and MORC1 are enriched in SPG1, while TCF3 is more equally distributed between SPG1 and SPG2 (Figure 2D–E). All three markers were localized to the basement membrane of human seminiferous tubules and were expressed by both Adark and Apale spermatogonia (Figure 3E–H). However, TCF3, which is expressed in both SPG1 and SPG2, was skewed toward Apale spermatogonia (Figure 3G). Thus, SPG1 contains both Adark and Apale spermatogonia, indicating that these descriptions of nuclear morphology do not segregate transcriptionally. However, there may be a subpopulation of TCF3+ Apale with a distinct transcriptional program (SPG2).
Human spermatogonial stem cell activity is enriched in, but not exclusive to, SPG1.
While a population’s stem cell potential can be predicted from pseudotime ordering of single cell data, autologous or heterologous transplantation experiments are needed for functional validation (Brinster and Zimmermann, 1994). To test the potential of human spermatogonial stem cells (hSSCs), xenotransplantation into a busulfan-treated immunodeficient mouse testis is the gold standard (Dovey et al., 2013, Valli et al., 2014a). While the xenotransplanted hSSCs cannot complete spermatogenesis, they can migrate and colonize the basement membrane of the seminiferous tubule, where they proliferate and produce characteristic chains of spermatogonia that survive for months (Figure 3I).
To examine the colonization potential of the SPG1 population, we enriched for these cells by FACS using the SPG1 cell surface marker TSPAN33 (Figure 2E, Figure 3J), as described previously (Guo et al., 2018). Unsorted, TSPAN33-, and TSPAN33+ cells were xenotransplanted into recipient nude mice (Figure 3I–J). Two months later, the number of colonies in recipient testes was significantly higher in the TSPAN33+ group (46 ± 15 colonies/10⁵ cells transplanted) than the Unsorted (10.7 ± 3 colonies/10⁵ cells) or TSPAN33- group (3.2 ± 0.9 colonies/10⁵ cells) (Figure 3K). This confirms that the TSPAN33+ population (SPG1) is enriched for transplantable human spermatogonia, even though it does not capture the entire population. In fact, by adjusting for the total number of cells in each fraction, almost half of colonizing activity resides outside of the TSPAN33+ fraction (Figure 3L). Thus, transplantable human spermatogonia are transcriptionally heterogeneous, at least with respect to TSPAN33 expression. Future studies using combinatorial staining, sorting, and transplantation will be needed to determine the phenotype(s) of TSPAN33- human SSCs.
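The adjustment for fraction size amounts to weighting each fraction's colony rate by the share of cells it contains. The sketch below shows only the arithmetic. The colony rates are the means quoted in the text, but the fraction sizes are HYPOTHETICAL placeholders chosen to make the calculation visible; the measured FACS fractions underlying Figure 3L are not reproduced here, so the resulting shares should not be read as results.

```python
def colonizing_share(colonies_per_1e5, fraction_of_cells):
    """Share of total colonizing activity contributed by each sorted fraction.

    colonies_per_1e5  : dict fraction -> colonies per 10^5 cells transplanted
    fraction_of_cells : dict fraction -> proportion of all sorted cells (sums to 1)
    """
    weighted = {k: colonies_per_1e5[k] * fraction_of_cells[k]
                for k in colonies_per_1e5}
    total = sum(weighted.values())
    return {k: v / total for k, v in weighted.items()}

rates = {"TSPAN33+": 46.0, "TSPAN33-": 3.2}    # mean colony rates from the text
sizes = {"TSPAN33+": 0.10, "TSPAN33-": 0.90}   # HYPOTHETICAL fraction sizes
shares = colonizing_share(rates, sizes)
```

Because the negative fraction contains far more cells, even a low per-cell colony rate can add up to a substantial share of total activity, which is the point made by the normalization in Figure 3L.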
Generation of a universal pseudo-timescale for mammalian spermatogenesis.
The duration of a full spermatogenic cycle varies across species: ~35d in mouse (Kofman-Alfaro and Chandley, 1970, Bennett, 1977), ~42d in macaque (Arsenieva M, 1961, de Rooij et al., 1986, Rosiepen et al., 1997), ~74d in human (Heller and Clermont, 1963, Bennett, 1977). These early pulse chase and cell ablation experiments have provided rough estimates for the length of major spermatogenic processes in each species. However, we lack the molecular knowledge to align or compare equivalent cell states across species. Our scRNA-seq data provided the opportunity to fill this gap.
If germ cell differentiation starts and ends in equivalent molecular states and advances at the same pace through intermediates, a straight diagonal line is expected when comparing ordered molecular states between species. However, when 200 biological steps are defined separately for each species, heterochrony is observed (Figure 4A), i.e., pseudotime proceeds at different rates (Methods). This pseudotime-warping pattern is not due to gene-selection methods (Methods, Figure S4A), and as expected we find that more distantly related species showed greater misalignment, both at the start and the end of the trajectories.
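The comparison of ordered states between two species can be sketched as follows: order cells by pseudotime, split them into a fixed number of bins, average expression within each bin to obtain centroids, and correlate every centroid of one species with every centroid of the other. The sketch assumes equal-size bins of ordered cells, Pearson correlation, and a shared gene order in both matrices; these choices, the function names, and the toy data are illustrative assumptions rather than the method described in Methods.

```python
import numpy as np

def pseudotime_centroids(expr, pseudotime, n_bins):
    """Average expression in n_bins consecutive groups of pseudotime-ordered cells.

    expr : genes x cells matrix; pseudotime : one value per cell.
    Returns a genes x n_bins matrix of centroids.
    """
    expr = np.asarray(expr, dtype=float)
    order = np.argsort(pseudotime)
    chunks = np.array_split(order, n_bins)   # near-equal numbers of cells per bin
    return np.column_stack([expr[:, idx].mean(axis=1) for idx in chunks])

def cross_species_correlation(centroids_a, centroids_b):
    """Pearson correlation between every bin of species A and every bin of B."""
    n_a = centroids_a.shape[1]
    full = np.corrcoef(centroids_a.T, centroids_b.T)   # rows are bins
    return full[:n_a, n_a:]

# Toy data: 3 shared genes, 4 cells per species, 2 bins.
expr_a = [[4, 4, 0, 0], [1, 1, 1, 1], [0, 0, 4, 4]]
time_a = [0.1, 0.2, 0.8, 0.9]
expr_b = [[0, 4, 0, 4], [1, 1, 1, 1], [4, 0, 4, 0]]
time_b = [0.9, 0.1, 0.8, 0.2]
corr = cross_species_correlation(
    pseudotime_centroids(expr_a, time_a, 2),
    pseudotime_centroids(expr_b, time_b, 2),
)
```

If two species progressed at the same pace, the highest correlations would lie along the diagonal of this matrix; departures from the diagonal correspond to the heterochrony described in the text.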
Figure 4. Three-species comparison of gene expression dynamics across the germ cell differentiation trajectory.

A. Pseudotime heterochrony between pairs of species, shown as correlation matrices between 200 ordered centroids in each species, defined from initial pseudotime assignments in each species. SC, spermatocyte. Elong, elongating spermatid.
B. Construction of a consensus pseudotime by principal curve (black line). The spermatogonia centroid is shown as an anchor (triangle) to indicate the direction of differentiation. Unaligned pseudo-time bins at the end of macaque trajectory are indicated as crosses. Size of symbol indicates the number of cells contained by each bin.
C. Rank correlation for pairs of species using 20 pseudotime bins after aligning to the consensus pseudotime.
D. Relative cell abundance across 20 aligned pseudotime bins (GC1–20), preceded by 6 spermatogonia bins representing consensus SPG 1–6. Size of points indicate number of cells contained by each bin. Mouse data used only the cells from non-enriched experiments.
E. Expression patterns of established marker genes across the 6 SPG states and 20 germ cell states allow mapping between the molecular states and the major cell types/processes, showing on the top of 4D.
Although the human and macaque pseudotimes were largely aligned, the final ~1/3 of the macaque trajectory maps to the final ~1/5 of the human trajectory, suggesting that most mature or terminal state (elongating spermatids) in human corresponded biologically to a wider span of the mature states in macaque (Figure 4A, left panel). Furthermore, ~10% of molecular final states in macaque appear to extend beyond the final states of either human or mouse, suggesting that macaque spermatids progress molecularly further (Figure 4B). These “overhanging” states in macaque revealed no unique markers (data not shown), but rather progressive loss of pre-existing transcripts (Figure S4B–C), consistent with the reported RNA degradation observed in mouse sperm (Gou et al., 2015). Whether these differences between species are reflected in the transcript content of mature sperm or paternal contribution to the zygote remains unknown.
To directly compare equivalent molecular states across species, we developed a common pseudotime shared by the three species by fitting the three trajectories in PC1-PC3 to a single principal curve, which was then divided into 20 bins (Methods, Figure 4B–C). The final, extended states of the macaque trajectory were set aside and added as a single group at the end, representing the 21st bin in macaque (Figure 4D). As expected, the 20 centroids show synchrony after reassigning cells to the universal time, indicating proper alignment (Figure 4C).
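The binning step described above can be illustrated with a minimal sketch. It assumes each cell already has a position along the consensus curve and that bins are of equal width; the actual binning rule is given in the Methods and may differ. The function names `assign_bins` and `bin_centroids` and all toy values are hypothetical.

```python
from collections import defaultdict

def assign_bins(pseudotime, n_bins=20):
    """Map each pseudotime value to a bin index in 1..n_bins (equal-width bins)."""
    lo, hi = min(pseudotime), max(pseudotime)
    width = (hi - lo) / n_bins
    bins = []
    for t in pseudotime:
        idx = int((t - lo) / width) + 1 if width > 0 else 1
        bins.append(min(idx, n_bins))  # the maximum value belongs to the last bin
    return bins

def bin_centroids(expression, bins):
    """Average the expression vectors of all cells that share a bin."""
    grouped = defaultdict(list)
    for vec, b in zip(expression, bins):
        grouped[b].append(vec)
    return {b: [sum(col) / len(vecs) for col in zip(*vecs)]
            for b, vecs in grouped.items()}

# Toy data (made up): 6 cells, 2 genes, 3 bins instead of 20
pseudotime = [0.0, 0.1, 0.5, 0.6, 0.9, 1.0]
expression = [[1.0, 0.0], [3.0, 0.0], [2.0, 2.0],
              [2.0, 4.0], [0.0, 5.0], [0.0, 7.0]]
bins = assign_bins(pseudotime, n_bins=3)
centroids = bin_centroids(expression, bins)
```

Per-bin centroids of this kind are the inputs to the between-species correlations shown in Figure 4C.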
With analogous states properly matched, we compared cell occupancy across the germ cell differentiation states: GC1–20 (mouse and human) or GC1–21 (macaque), along with the spermatogonia states: SPG1–6 (Methods, Figure 4D). Using conserved markers (Figure 4E) we mapped these states to major spermatogenesis processes: mitosis (SPG1–6), meiosis (GC1–4), and spermiogenesis (GC5–14, round spermatids, and GC15–20/21, elongating spermatids). The number of cells per state reflects the rate of traversing developmental states and/or the balance of cell division and death. Humans and macaques have a large pool of undifferentiated spermatogonia (SPG1–2; Figure 4D) that undergo a limited number of transit-amplifying divisions, whereas mice have fewer such cells, which undergo more mitotic divisions (see Spermatogonia section). Cell numbers in SPG5–6 remained relatively constant in macaque and human, but were reduced in mouse, consistent with a 50% reduction of mouse spermatogonia estimated in vivo (de Rooij, 1973, Huckins and Oakberg, 1978). In GC1–4, which represent the transition to meiosis and the meiotic divisions, mouse and macaque display the expected expansion of cells, whereas humans appear to experience a significant bottleneck. However, only a portion of this reduction in humans can be explained by histological estimates of degeneration rates (Johnson et al., 1983), suggesting that human cells may proceed more quickly through the initial stages of meiosis. Finally, more cells accumulate towards the end of the trajectory in primates, likely reflecting molecular differences in spermatid clearing, as there is no cell division after meiosis.
As a resource for the community, we share details of the molecular coordinate system for the temporal progression of male germ cell differentiation for three mammalian species and provide markers for individual steps in each species (Table S4).
Species-specific gene expression dynamics during germ cell differentiation.
To identify genes with distinct temporal patterns, we employed K-means to cluster 1-1-1 orthologs, focusing on 2,000 highly variable genes across aligned pseudotime points independently in mouse, human, and macaque (Figure 5A). This unbiased gene clustering analysis revealed six successive waves of activation corresponding to the major processes of mitosis (K1), meiosis (K2–3), and spermiogenesis (K4–6), as confirmed by gene ontology (Table S5). To determine whether homologous genes share the same phase across a given pair of species, we used the union of highly variable 1-1-1 orthologs in the pair, ordered them by the six K-means gene groups, and linked orthologous genes by lines (Figure 5A). Genes expressed “in phase” in two species (e.g., in K1 of both species) are linked by horizontal lines and are considered to have conserved temporal expression patterns, whereas genes with “shifted” phase are linked by diagonal lines. “Unique” genes without a connecting line are those whose ortholog is not detectable or not dynamically expressed in the other species.
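The in-phase, shifted, and unique categories defined above can be sketched as a simple comparison of cluster labels. This is an illustration only: it assumes the K-means group of each ortholog has already been computed per species, and the function name `classify_phase` and the gene identifiers are placeholders rather than values from the study.

```python
from collections import Counter

def classify_phase(clusters_a, clusters_b):
    """Label each gene 'in_phase', 'shifted', or 'unique' for a species pair.

    clusters_a / clusters_b map an ortholog ID to its K-means group (1..6)
    in each species. A gene missing from one map is treated as 'unique',
    i.e. not dynamically expressed in the other species.
    """
    labels = {}
    for gene in set(clusters_a) | set(clusters_b):
        ka, kb = clusters_a.get(gene), clusters_b.get(gene)
        if ka is None or kb is None:
            labels[gene] = "unique"
        elif ka == kb:
            labels[gene] = "in_phase"   # would be drawn as a horizontal line
        else:
            labels[gene] = "shifted"    # would be drawn as a diagonal line
    return labels

# Hypothetical cluster assignments for two species
species_a = {"GENE_A": 1, "GENE_B": 2, "GENE_C": 5}
species_b = {"GENE_A": 1, "GENE_B": 3, "GENE_D": 6}
labels = classify_phase(species_a, species_b)
summary = Counter(labels.values())
```

Dividing the counts in `summary` by the number of genes in each K group would give percentages of the kind reported in Figure S5A.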
Figure 5. Three-species comparison of dynamically expressed genes in the germ cells.

A. Comparison of dynamically expressed genes between pairs of species. Rows indicate genes, ordered by six gene clusters; and columns indicate the 20 aligned bins for germ cells, plus one SPG bin. Grey lines link orthologous pairs. Genes lacking connecting lines have orthologs that are not highly expressed or variable in the other species.
B. Number of genes with “shifted” phase between species. Arrows show the direction of change and are labeled with the main enriched biological processes (gene ontology).
C. Between-species similarity of transcriptome across the 6 SPG and 20 GC states. Values are rank correlations of gene expression centroid between human and macaque (blue) or human and mouse (pink), for either all 11,023 1-1-1 orthologs (solid line) or 835 transcription factor orthologs (dotted line). Line indicates loess best fit. TF, transcription factor.
D. Values are rank correlations of gene expression centroid between human and macaque (blue) or human and mouse (pink) for 835 transcription factor orthologs (dashed line) or 835 randomly subsampled orthologs (solid line) whose expression-level distribution matches that of the TFs.
E. Group-level gene expression changes across the 26 germ cell molecular states, for several mutually exclusive groups of orthologs. Venn diagram (top left) shows the groups of genes that are unique to one species (α’s), with orthologs in two species (β’s), or in three species (γ). Group δ are 1-1-1 orthologs. Graphs show the percentage of total expression accounted for by 1-1-1 orthologs (top right), species-specific genes (bottom left), and primate-specific genes (bottom right).
As expected, human and macaque showed more in-phase (40–56%) than phase-shifted (25–38%) genes, whereas human-mouse had the opposite trend (20–38% in phase vs. 31–48% shifted), as did macaque-mouse (26–45% vs. 36–49%) (Figure S5A). Shifted genes are enriched for broad biological processes such as DNA repair, RNA splicing/processing/binding, and transcription regulation, though the direction of shift varies by species pair (Figure 5B). Specifically, human genes whose expression lagged behind that in other species included those for mRNA export and splicing, such as RNPC3, NONO, PRPF8, and SNRPB. Furthermore, we find that many DEAD-box (DDX) genes, an evolutionarily conserved family of RNA helicases, are also phase shifted. For example, DDX18 and DDX46 were expressed early (K1) in humans, but not until K2 and K3 in mouse and monkey. In contrast, DDX23 (K1) and DDX52 (K2) are expressed earlier in mouse but later in human and macaque (K2, K5).
Finally, we examined the dynamics of genes with unique, species-specific expression across the different clusters and found that the proportion of these genes increased progressively across the six gene groups. For example, between human and macaque, the K1 group has 10% unique genes whereas K6 has ~25% (Figure S5A). While the precise functions of the unique or shifted genes are unclear, these patterns may reflect regulatory changes of germ cell differentiation over the course of evolution.
Gene expression conservation varies by stage and by gene groups.
While the testis is one of the most rapidly evolving tissues at both the sequence and bulk transcriptome levels (Brawand et al., 2011), our study offers the opportunity to compare the degree of expression conservation across successive stages of spermatogenesis. Using 1-1-1 orthologs, we calculated the rank correlation between the gene expression centroids of two species for each of the 26 ordered stages: 6 SPG and 20 subsequent germ cell states. Between-species expression correlation is highest in spermatogonia and decreases progressively through differentiation to spermatids (Figure 5C, solid lines). A similar pattern was observed for 835 known transcription factors (TF), albeit at lower levels (Figure 5C, dotted lines). Since TFs tend to have lower expression than other genes, we re-calculated the correlations using 835 randomly subsampled orthologs whose expression distribution matched that of the TFs (Figure 5D). This showed that TF correlations are no longer lower for germ cell states but remained lower in SPG states, suggesting a greater between-species programmatic divergence for transcription factors than for non-TF genes, mainly in SPGs.
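As an illustration of the per-stage comparison, the sketch below computes a Spearman rank correlation between two centroids, with tied values receiving average ranks. It is a generic implementation, not the authors' code; the stage labels follow the text, but the expression values are invented and cover only four orthologs.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented centroids: one value per shared ortholog, same gene order in both species
human_centroids = {"SPG1": [5.0, 3.0, 1.0, 0.0], "GC20": [0.0, 1.0, 4.0, 2.0]}
mouse_centroids = {"SPG1": [4.0, 2.5, 0.5, 0.1], "GC20": [3.0, 0.0, 1.0, 2.0]}
per_stage = {stage: spearman(human_centroids[stage], mouse_centroids[stage])
             for stage in human_centroids}
```

Repeating the calculation for each of the 26 stages, and for each gene set (all orthologs, TFs, or an expression-matched random subset), yields curves of the kind plotted in Figure 5C–D.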
To better estimate the functional effects of TFs, we examined the concerted changes of their downstream target genes, or “regulon”, using SCENIC (Aibar et al., 2017). This analysis identified 82 and 88 regulons with 10+ target genes for human and mouse, respectively (Table S5). (This analysis was not performed in macaque for lack of a TF database.) Human-mouse conservation is ranked by the correlation of regulon scores across the 26 stages, and the 17 most conserved and 5 most divergent regulons are shown in Figure S5B. Many of the top TF regulons (e.g., EZH2, E2F1, YY1, and SOX6) have been previously implicated in spermatogenesis, and their phasic patterns seen here are consistent with previously described functional dynamics in mice (Jin et al., 2017, Rotgers et al., 2015, Wu et al., 2009, Hagiwara, 2011). However, we also identify a number of regulons that have not previously been explored in the context of spermatogenesis (e.g., UBTF, GABPA, BRG1, GABPB1, BACH1) and may play a role in epigenetic memory or transcriptional regulation.
While this comparison takes into account only orthologous genes, it is possible that other species-specific genes have a temporal bias in expression and thus contribute to the observed divergence in the spermatogenesis program. We therefore examined the transcript levels for genes of various ortholog classes (Figure 5E, S5C). For macaque, the percentage of 1-1-1 orthologous genes that are expressed appeared largely unchanged across states and macaque-specific genes contribute little to total expression, likely due to incomplete annotation of the macaque genome, since the vast majority of annotated genes are orthologs (Figure 5E, S5C). This is further underscored by the high number of orthologs between human and mouse (Figure S5C), many of which likely remain unannotated in macaque. In contrast, human and mouse show a decrease in 1-1-1 ortholog expression across spermatogenesis, complemented by an increase in the expression of species-specific genes (those lacking an ortholog in the other two species) (Figure 5E). Curiously, the amount of primate-specific transcripts (found exclusively in human and macaque) peaks at the entry into meiosis. These primate-specific transcripts include a large number of C2H2 zinc-finger-containing proteins (227/875 human orthologs), most of which (198) are KRAB domain-containing, a family thought to target endogenous retroelements (ERE). Indeed, many of these proteins have been shown to bind EREs in vitro, in particular ERVs (Figure S5D,E) (Imbeault et al., 2017), and are also testis-enriched in the Human Protein Atlas (Uhlen et al., 2015). Therefore, these genes may aid in identifying primate-specific regulators of entry into meiosis.
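The quantity plotted in Figure 5E, the percentage of total expression contributed by each ortholog class in a given state, amounts to a grouped sum. The sketch below shows that arithmetic under the assumption that expression values are already normalized and comparable; the function name `class_share`, the class labels, and the numbers are illustrative only.

```python
def class_share(expression_by_gene, gene_class):
    """Percentage of total expression contributed by each gene class in one state."""
    totals = {}
    for gene, value in expression_by_gene.items():
        cls = gene_class.get(gene, "unclassified")
        totals[cls] = totals.get(cls, 0.0) + value
    grand_total = sum(totals.values())
    return {cls: 100.0 * v / grand_total for cls, v in totals.items()}

# Made-up expression values for one molecular state
state_expression = {"g1": 60.0, "g2": 30.0, "g3": 10.0}
gene_class = {"g1": "1-1-1 ortholog",
              "g2": "species-specific",
              "g3": "primate-specific"}
shares = class_share(state_expression, gene_class)
```

Computing `shares` for each of the 26 states in turn gives one curve per gene class.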
Transcriptomic differences in interstitial somatic cells suggest functional divergence between rodents and primates
The importance of germline-soma compatibility has been underscored by in vivo cross-species transplantation experiments, where rat SSCs can complete spermatogenesis in the mouse testis (Clouthier et al., 1996), whereas macaque or human SSCs colonize the mouse testis but fail to initiate meiosis (Nagano et al., 2001). The incompatibility between SSCs and somatic compartments of distant organisms may reflect changes in signaling pathways or microenvironment.
To create an improved somatic cell atlas, we re-clustered the 3,722 human and 2,098 macaque somatic cells, yielding 7 major types: myoid cells, endothelial cells, Leydig cell precursors, macrophages, and three rarer cell types not extensively studied in testis, namely T cells and two subsets of pericytes (Figure 6A–B, S6A–C). Unlike the mouse dataset, our human and macaque datasets failed to capture Sertoli cells, innate lymphoid cells, or mature Leydig cells (Green et al., 2018), likely due to the mechanical or physiological stress of the cryopreservation and dissociation methods.
Figure 6. Somatic cells show divergence of transcriptome and signaling relationships.

A. Expression of a representative marker for each of the 7 somatic cell types for human (top) and macaque (bottom).
B. Schematic of somatic cell localization patterns in primate testis.
C. Hematoxylin and eosin staining of human, macaque, and mouse seminiferous tubules. Arrowheads indicate peritubular myoid cell nuclei; red dotted lines indicate layers of myoid cells.
D. Cross-species comparison of somatic transcriptome by a joint PCA of cell type centroids from all 3 species, using 673 1-1-1 ortholog genes that are highly variable in each species. Symbol shape represents the species. Symbol color represents the cell type. Ellipsoids group cell types with similar transcriptome across species. Elliptic lines depict 50% confidence interval for cell types observed across species.
E. Cross-species comparison of somatic cell types using rank correlation of gene expression centroids for 7 human, 7 macaque, and 6 mouse cell types. Correlations are calculated using two gene sets: 673 orthologs described in Figure 6D (upper right triangle) or 311 1-1-1 ortholog genes encoding signaling ligands (lower left triangle).
One of the most abundant populations captured in primates was peritubular myoid cells, defined by the expression of MYH11, ACTA2, collagens, laminins, versican (VCAN), and fibronectin (Figure 1C, 6A, S6C, Table S6). This overrepresentation of myoid cells in primates reflects a structural difference in the seminiferous tubule basement membrane, with multiple layers of myoid cells in human/macaque, but a single layer in mouse (reviewed in (Maekawa et al., 1996)) (Figure 6C). Contractile myoid cells are involved in the transport of spermatozoa and testicular fluid in the tubule (Ross, 1967), and recent studies suggest that they also produce factors that may regulate neighboring somatic cells or spermatogonial cell proliferation. Consistent with these findings, human and macaque myoid cells express many genes for muscle contraction, as well as some conserved (fibronectin and collagens) or unique (decorin and biglycan) extracellular matrix (ECM) proteins (Table S6) (Adam et al., 2012). Unlike in mouse, primate myoid cells do not express the spermatogonial stem cell proliferative factor GDNF, but rather express components of other signaling pathways not detected in mouse, such as PDGFRA, EGFR, PTCH1, and PPARA (Table S6). Therefore, although both mouse and primate myoid cells have contractile properties, they have acquired species-specific signaling functions.
Another major population identified in human and macaque is that of immature Leydig cells, which are observed in neonatal rodent testis but thought to be absent in the adult (reviewed in (Chen et al., 2010)) and are lacking in our adult mouse testis dataset (Green et al., 2018). This population expressed multiple markers (DLK1 [unannotated in macaque], IGF1/2, CFD, SFRP, PTCH2, and IGFB3; Figure 6A, S6C, Table S6) previously shown to be restricted to spindle-shaped putative progenitor (Lottrup et al., 2014) or differentiating Leydig cells (Hu et al., 2010, Wang et al., 2003), whereas their expression of steroidogenic enzymes (STAR and Cytochrome P450 genes) was very low (data not shown). These cells also expressed ECM genes and are transcriptionally similar to myoid cells (Figure S6B), suggesting that myoid and Leydig cell lineages may share a common progenitor in human and macaque, as seen in mice (Liu et al., 2016).
The somatic compartment also contains endothelial cells (VWF, NOSTRIN, AQP1, CD34 (human)) and pericytes (MCAM, RGS5, PDGFRB, NG2, CCL2, ABCC9, and NOTCH3) (Vanlandewijck et al., 2018, Du et al., 2015), which jointly form the mural wall of small blood vessels (Figure 6A, S6C, Table S6). We found that the testis pericyte population splits into two distinct sub-clusters, which we annotate as muscular (m-pericytes: MCAM (higher expression), CRIP1/2, RERGL, ADIRF (unannotated in macaque), MYL9, PTM1/2) or fibroblastic (f-pericytes: STEAP4, GUCY1A1/2, ITGA1, CD36/44 (human), collagens, laminins) (Figure 6B, S6C, Table S6). Several m-pericyte markers are more restricted to blood vessels, whereas cells expressing f-pericyte markers are also present in the interstitial space (Figure S6D). Despite the similarity between the two pericyte populations, cluster-cluster centroid correlations suggested that m-pericytes resembled myoid and immature Leydig cells, whereas f-pericytes were more distinct (Figure S6B), suggesting that f- and m-pericytes might arise from different cell lineages in the testis.
Finally, we identified two immune cell types: a major population of macrophages (CD163, HLA-DRA/MAMU-DRA, LYZ, TYROBP) and a smaller population of T cells (CD3D, TRAC, TRBC2, CD69) (Figure 6B, S6C, Table S6), a fraction of which express CD8 (23% human, 10% macaque).
To compare somatic cells globally and uncover changes over the course of mammalian evolution, we merged the somatic cells from all three species, performed principal component analysis (Figure 6D), and compared cell type centroids (Figure 6E) using 1-1-1 orthologs (Methods). In the PC1-PC2 plot, immune cell populations separated from the remaining somatic cells, confirming that the distinction between cell types overshadows species differences (Figure 6D). While endothelial and macrophage transcriptional programs appeared to be largely conserved (r=0.6–0.8; Figure 6E, upper half of matrix), there are notable between-species differences in the remaining somatic cells. For example, mouse myoid cells appeared to be more transcriptionally similar to primate m-pericytes than to primate myoid cells. Similarly, although Leydig precursor cells in primates somewhat resembled adult mouse Leydig cells, they were more similar to our previously identified Tcf21+ interstitial mesenchymal cells in the mouse testis (Green et al., 2018). Thus, many of the somatic cells within the testis have morphed transcriptionally across species. Similar cellular association and correlation patterns were confirmed using several gene sets (Figure S6E, Methods), including ligand expression (Figure 6E, lower half of matrix), indicating that the observed shifts in cell identity relationships are likely due to genuine programmatic changes in the transcriptome.
Evolutionary differences in the germ cell - soma cell communication.
Single-cell data provide an unprecedented opportunity to explore communications between the soma and germline in the testis. By calculating Interaction Scores for curated ligand-receptor (L-R) pairs across all 182 cell type pairs (7 somatic × 26 germ cell states) (Methods, Table S7), we created a somatic-to-germ cell interaction map that depicts putative interactions (Figure 7A). Many somatic cells have the potential (i.e., expression of ligands) to communicate with germ cells, supporting an open niche model for the mammalian testis. In primates, the immature Leydig, myoid, and f-pericyte cells have more extensive putative interactions than other somatic cell types. In mouse, the Tcf21+ interstitial population, myoid, macrophage, and endothelial cells exhibit the greatest number of interactions, while Sertoli and Leydig cells, two central niche cell populations, have the least (Figure 7A). Among the germ cells, SPG1–6 tend to have more putative interactions than the later germ cell states, consistent with the role of the blood-testis barrier, which prevents meiotic and post-meiotic germ cells from directly communicating with the interstitial compartment.
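To make the idea of an interaction map concrete, the sketch below scores each ligand-receptor pair for every somatic type and germ cell state and keeps the top-scoring fraction. It assumes, purely for illustration, that the score is the product of ligand expression in the somatic cell type and receptor expression in the germ cell state; the definition of the Interaction Score actually used is given in the Methods and may differ. The function names and expression values are hypothetical.

```python
def interaction_scores(ligand_expr, receptor_expr, lr_pairs):
    """Score every (ligand, receptor, somatic type, germ state) combination.

    Assumption: score = ligand level in the somatic type multiplied by
    receptor level in the germ cell state. Missing genes count as 0.
    """
    scores = {}
    for ligand, receptor in lr_pairs:
        for soma, lig_levels in ligand_expr.items():
            for germ, rec_levels in receptor_expr.items():
                scores[(ligand, receptor, soma, germ)] = (
                    lig_levels.get(ligand, 0.0) * rec_levels.get(receptor, 0.0))
    return scores

def top_fraction(scores, fraction=0.05):
    """Keep the highest-scoring fraction of interactions (at least one)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return dict(ranked[:n_keep])

# Invented expression levels: 2 somatic types x 2 germ cell states, one L-R pair
ligand_expr = {"myoid": {"FGF2": 2.0}, "macrophage": {"FGF2": 0.5}}
receptor_expr = {"SPG1": {"FGFR1": 3.0}, "GC10": {"FGFR1": 0.0}}
scores = interaction_scores(ligand_expr, receptor_expr, [("FGF2", "FGFR1")])
strongest = top_fraction(scores, fraction=0.25)
```

With 7 somatic types and 26 germ cell states the same loops would produce the 182 cell type pairs per L-R pair, and `fraction=0.05` would correspond to the top 5% cutoff used for Figure 7A.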
Figure 7. Potential soma-germline signaling relationships by ligand-receptor analysis.

A. Summary of the extent of potential interactions between ligands expressed in somatic cells (left) and receptors expressed in germ cells (right), counting the top 5% of all interactions for each species. Symbol size indicates the number of receptor-ligand interactions contributed by a cell type, and line width shows the number of interactions between the two cell types.
B. Pattern of expression for selected ligand-receptor pairs. Symbol size indicates the level of expression; line width shows the expected signaling strength. Predicted interactions are shown as arrows (details in Methods).
Given the significant signaling differences among cell types, we next explored their conservation patterns. For each L-R pair we calculated its between-species correlations across the 182 cell type pairs, and used this measure to rank the degree of conservation of signaling pathways (Table S7). Figure S7A–B shows the 30 most conserved and 30 least conserved L-R pairs, with three patterns highlighted by examples in Figure 7B, S7C. First, the target cell is conserved but the source cell is not (e.g., FGF2-FGFR1, COL4A1-ITGB1, WNT5A-RYK; CXCL12-ITGB1), some with better conservation within primates (FGF2-SDC2; BMP4-BMPR2). Second, the source cell of the ligand is conserved but the target cell is not (FGF2-SDC2). Third, both the source and target cell type vary across species (SLIT2-ROBO; APOE-SCARB1; WNT5A-ROR1; BMP4-BMPR1B; FGF2-SDC4; JAG1-NOTCH2; FGF7-FGFR3). These patterns suggest that while some pathways may be similarly employed in multiple mammalian species to regulate similar stages of spermatogenesis, the timing of activation and the origin or target of the signal may have diverged. Thus, by leveraging ligand-receptor expression levels in specific cell types, we generate testable hypotheses regarding intercellular signaling and evolutionary shifts, to be validated by future perturbation experiments and spatial transcriptomics or proteomics analyses.
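The conservation ranking described above can be sketched as follows. The example assumes that each L-R pair has one score per cell type pair, listed in the same order in both species, and uses a Pearson correlation; the text does not state which correlation was used, so this choice, the function names, and the placeholder pair names and scores are all assumptions.

```python
def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: constant score vectors are uninformative
    return cov / (vx * vy) ** 0.5

def rank_conservation(scores_a, scores_b):
    """Sort L-R pairs from most to least conserved between two species.

    scores_a / scores_b map an L-R pair name to its list of interaction
    scores, one per cell type pair, in matching order across species.
    """
    shared = set(scores_a) & set(scores_b)
    corr = {pair: pearson(scores_a[pair], scores_b[pair]) for pair in shared}
    return sorted(corr.items(), key=lambda kv: kv[1], reverse=True)

# Placeholder L-R pairs with invented scores over 4 cell type pairs
species_a_scores = {"PAIR_X": [1.0, 2.0, 3.0, 4.0], "PAIR_Y": [4.0, 1.0, 0.0, 2.0]}
species_b_scores = {"PAIR_X": [2.0, 4.0, 6.0, 8.0], "PAIR_Y": [0.0, 3.0, 4.0, 1.0]}
ranking = rank_conservation(species_a_scores, species_b_scores)
```

Here `ranking` lists `PAIR_X` first because its scores rise and fall together in both species, while `PAIR_Y` is last because its pattern is reversed.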
Discussion
Molecular and genetic analyses of mouse testes have provided substantial insights into spermatogenesis and are often used to guide parallel studies in primates. However, most comparative studies rely on a handful of markers to define presumably equivalent cellular states across species, an approach that may overlook important functional divergence between species. Therefore, an unbiased comparative analysis of germ cell differentiation and the somatic microenvironment is needed to elucidate similarities and differences across mammalian spermatogenesis.
We addressed this challenge by employing single-cell RNA-seq to define the spermatogenic and somatic cell types in normal adult human and macaque testes, which we then compared with our previous mouse dataset. On a global scale, we observe inter-species differences in transcription regulation, suggesting evolutionary changes of some aspects of the spermatogenesis program. Specifically, we find that meiotic sex chromosome inactivation (MSCI) is incomplete in primates (identifying 16 human and 14 macaque escapees), and demonstrate by intron HCR that RIBC1 escapes XY silencing in spermatocytes. Although we have only validated one escapee in vivo, many of the additional escapees appear to be critical for normal germ cell development; DYNLT3 is involved in meiotic chromosome segregation (Huang et al., 2011), and AKAP14, AKAP4, CYLC1, and PIH1D3 are important for sperm structure and motility (Montjean et al., 2012, Baccetti et al., 2005, Miki et al., 2002, Paff et al., 2017). Additionally, several escapees (SPANXN3, SPANXN5, SPANXD) are members of a primate ampliconic gene family (Kouprina et al., 2004). Copy number variants involving SPANXN5 have been linked to human infertility (Tuttelmann et al., 2011), indicating this family may also have critical roles in spermatogenesis.
To directly compare germ cell states, we combined and re-clustered spermatogonia from all three species, identifying six consensus states. The proportion of cells contributed by each species to these states varies, reflecting inherent differences in the size of the stem/progenitor pool and the number of transit-amplifying divisions of differentiating spermatogonia. In these six states (SPG1-SPG6), we identify two molecularly distinct undifferentiated spermatogonial cell states (SPG1: TSPAN33, PIWIL4, CDK17, MORC1, ID4, ZBTB16, and SPG2: L1TD1, ID4, GFRA1, and ZBTB16), consistent with earlier findings (Guo et al., 2018). Immunohistochemistry analysis of these markers shows that SPG1 and SPG2 do not have a direct correspondence to the two classic histologically defined spermatogonial populations in the human testis, known as Adark or Apale. Rather, together SPG1 and 2 represent a heterogeneous pool of undifferentiated spermatogonia. For validation, we performed human-to-mouse xenotransplants which demonstrate that stem cell activity is enriched in the TSPAN33-positive fraction of SPG1, but not restricted to that fraction. Based on the current data it is not possible to know whether stem cell activity in the TSPAN33-negative fraction is from TSPAN33-negative cells in SPG1 or might also include cells in SPG2. Nonetheless, these data clearly indicate that transplantable hSSCs are heterogeneous, at least with respect to the TSPAN33 cell surface marker.
Although SPG1 cells in our dataset mainly come from macaque and human samples, they do contain a small number of PIWIL4+ (Miwi2) cells from mouse. Earlier studies in mice have shown that Miwi2 is detected in a small number of Asingle but is more prevalent in Apaired and Aaligned undifferentiated spermatogonia. Furthermore, loss of the MIWI2 gene results in a progressive loss of germ cells (Carmell et al., 2007, De Fazio et al., 2011), suggesting a role in spermatogonia self-renewal. Consistent with this observation, transplantation experiments confirm that Miwi2-expressing cells possess SSC activity (Chan et al., 2014), and MIWI2 cell ablation experiments confirm its requirement for efficient germ cell regeneration after injury in the adult mouse testis (Carrieri et al., 2017). Together, these findings suggest that a rare population of undifferentiated SPG cells (As, Apr, Aal) may exist in mouse that are analogous to SPG1 cells in primates.
The consensus SPG3–5 contains cells that are transitioning from undifferentiated (Aal mouse; Ad/p monkey and human) to differentiating (A1–4, Aintermediate and B spermatogonia mouse; B1–4 macaque; B human). We link these molecular states with morphometric states described across species and identify two molecularly distinct Type B molecular states. Further, SPG6 includes type B spermatogonial cells transitioning to preleptotene spermatocytes based on the expression of conserved meiotic genes, such as PRDM9, ZCWPW1, and REC8. In addition to classic meiotic genes, our dataset uncovers many primate-specific genes enriched in human and macaque SPG6, including the simian-specific ampliconic Variably Charged (VC) gene family (VCX, VCX2, VCX3A, VCX3B) - theorized to play a role in meiotic drive (Zanders and Unckless, 2019, Lahn and Page, 2000) and associated with nonobstructive azoospermia (Ji et al., 2016). We also identify many primate-specific zinc-finger proteins (ZFP) that are rapidly expanding and evolving (Nowick et al., 2010). Earlier studies propose that ZFPs target endogenous retroelements and the wide diversity of ZFPs may stem from dynamic competition between transposable elements and KRAB-ZFPs (reviewed in (Bruno et al., 2019)). Based on the timing of expression of the primate-specific KRAB-ZF genes shown here, these genes may be activated in order to protect against the sequelae of active or inactive EREs, which are often also species-specific.
Although the post-SPG portion of the spermatogenesis program (meiosis and spermiogenesis) is largely continuous, our multi-species comparisons of germ cell trajectories suggest that different species do not always follow the same progression through transcriptional intermediates during germ cell differentiation. Rather they exhibit heterochrony in the alignment of cellular states, both in meiosis and spermiogenesis stages. To directly compare molecular states across species, we constructed a universal spermatogenesis “pseudotime” and aligned molecularly analogous populations across species. With cells from all three species now mapped on the same scale, we were able to directly compare molecular states to reveal similarities and differences in the transcriptome and temporal shifts in gene expression, including numerous transcription factors and RNA binding proteins.
Finally, we also discovered interspecies differences in the somatic cells that provide the surrounding niche for developing germ cells. A direct comparison of somatic populations across species shows that the transcriptomes of endothelial cells and some immune cells are highly conserved across species, while testis-specific interstitial cells show greater divergence. By mapping the expression of known ligand-receptor pairs, we find that although certain pathways are likely employed to regulate similar stages of spermatogenesis, many have changed the source (secreting cells), the target (receptor cells), or both, generating interesting hypotheses regarding the evolution of germ cell – soma communication in mammals. One example is the FGF signaling pathway, which provides regulatory cues for spermatogonial dynamics (Kubota et al., 2004). FGF signaling depends on proteoglycans, such as syndecans like SDC4 in mice, to bind and transfer FGFs to receptor tyrosine kinases (FGFRs) on target cells (Kitadate et al., 2019). We find that this interaction appears to be conserved in human and macaque, but there is an additional syndecan (SDC2) expressed in primates that may bind FGFs secreted by various somatic cells.
Altogether, our analyses provide a reference system for integrated interspecies comparisons of spermatogenesis, highlighting multiple ways by which this process has evolved while maintaining the capacity to reach a common goal – gamete production. We identify molecularly equivalent states between human, macaque, and mouse, so that despite the apparent differences, mechanistic parallels can be drawn between species to enable future studies that make use of these complementary models. This knowledge is expected to reduce barriers to studying human infertility and accelerate progress towards developing novel therapies, such as in vitro gametogenesis.
STAR Methods
RESOURCE AVAILABILITY
Lead Contact
Further information and requests for reagents may be directed to, and will be fulfilled by, the lead contact S. Sue Hammoud (hammou@med.umich.edu).
Materials Availability
This study did not generate new unique reagents.
Data and Software Availability
Raw and processed data files for Drop-seq experiments are available under the GEO accession number GSE142585 with token qjgrsgcutpkrpat for reviewer access.
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Animals and Tissue Donors
All experiments utilizing animals in this study were approved by the Institutional Animal Care and Use Committees of Magee-Womens Research Institute and the University of Pittsburgh (assurance A3654–01), the Oregon National Primate Research Center, Oregon Health and Sciences University (assurance A3304–01), and the University of Michigan (PRO00008135) and were performed in accordance with the National Institutes of Health Guide for the care and use of laboratory animals. Briefly, mice were housed in an environment controlled for light (12 hours on/off) and temperature (21 to 23°C) with ad libitum access to water and food (Lab Diet #5058). For detailed mouse strain information, see below and Key Resources Table.
KEY RESOURCES TABLE
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Antibodies | ||
| Mouse monoclonal anti Human Protein Gene Product 9.5 (UCHL1) | BioRad | Cat#7863-1004; RRID:AB_620256 |
| Goat polyclonal anti Human CD117 (c-kit) | R&D Biosystems | Cat#AF332; RRID:AB_355302 |
| Rabbit polyclonal anti Human TCF3 | Sigma Aldrich | Cat#HPA062476; RRID:AB_2684775 |
| Rabbit polyclonal anti Human GPX1 | Sigma Aldrich | Cat#HPA044758; RRID:AB_2679076 |
| Rabbit polyclonal anti Human FMR1 | Sigma Aldrich | Cat# HPA050118; RRID:AB_2681023 |
| Rabbit polyclonal anti Human MORC1 | Thermo Fisher | Cat# PA5–57647; RRID:AB_2644047 |
| Rabbit polyclonal anti Human PIWIL4 | Sigma Aldrich | Cat# HPA036588; RRID:AB_10673502 |
| Rabbit polyclonal anti Human TSPAN33 | Abcam | Cat# ab79130; RRID:AB_2213444 |
| Rabbit polyclonal anti Human DNAJB6 | Sigma Aldrich | Cat# HPA024258; RRID:AB_1847795 |
| Rabbit polyclonal anti Human MAGEB2 | Sigma Aldrich | Cat# HPA036588; RRID:AB_2732258 |
| PE-conjugated mouse polyclonal anti Human TSPAN33 | R&D Biosystems | Cat# FAB8405P025; RRID:AB_2814817 |
| Rabbit polyclonal anti Primate | Hermann et al., 2007 | N.A. |
| Chemicals, Peptides, and Recombinant Proteins | ||
| Deoxyribonuclease I | Worthington Biochemical Corp. | Cat#LS002139 |
| Collagenase Type IV | Worthington Biochemical Corp. | Cat# LS004210 |
| HBSS | Gibco | Cat# 14025076 |
| Trypsin-EDTA | Thermo Fisher | Cat# 25200056 |
| autoMACS Rinsing Solution | Miltenyi Biotec | Cat#130-091-222 |
| MACS BSA Stock Solution | Miltenyi Biotec | Cat#130-091-376 |
| DAPI | Sigma | Cat#D9542 |
| FBS | ThermoFisher Scientific | Cat#10437010 |
| ProLong™ Gold Antifade Mountant | ThermoFisher Scientific | Cat#P36930 |
| Ficoll PM-400 | Sigma | Cat#F5415 |
| Sarkosyl, Sodium Salt Solution | Sigma | Cat#L7414 |
| 1H,1H,2H,2H-Perfluoro-1-octanol | Sigma | Cat#370533 |
| dNTP Mix | ThermoFisher Scientific | Cat#R0192 |
| NxGen RNAse Inhibitor | Lucigen | Cat#30-281-1 |
| Maxima H Minus Reverse Transcriptase | ThermoFisher Scientific | Cat#EP0753 |
| Exonuclease I | New England BioLabs | Cat#M0293S |
| Power SYBR™ Green PCR Master Mix | ThermoFisher Scientific | Cat#4367659 |
| Agencourt AMPure XP Beads | Beckman Coulter | Cat#A63880 |
| Tween-20 | ThermoFisher Scientific | Cat#00–3005 |
| Triton X-100 | Sigma | Cat#T8787 |
| BSA | MP Biochemicals | Cat# 9048-46-8 |
| Normal Donkey Serum | Jackson ImmunoResearch | Cat# 017-000-121 |
| DAB Substrate | Sigma Aldrich | Cat# 11718096001 |
| Transwell baskets | Corning | Cat#10567522 |
| OneComp ebeads | ThermoFisher Scientific | Cat# 01-1111-41 |
| Busulfan | Sigma | Cat# B2635 |
| Hematoxylin 560 | Leica Biosystems | Cat#3801575 |
| Define MX-aq | Leica Biosystems | Cat#3803595 |
| Blue Buffer | Leica Biosystems | Cat#3802915 |
| Eosin with phloxine | Leica Biosystems | Cat#3801606 |
| Permount | ThermoFisher Scientific | Cat# SP15–100 |
| Vectashield with DAPI | Vector Laboratories | Cat# H-1200–10 |
| Paraformaldehyde | EMD Millipore | Cat#818715 |
| Sucrose, Rnase and Dnase free | Amresco | Cat#0335 |
| Citric Acid | Sigma | Cat#791725 |
| Formamide, deionized | Ambion | Cat#AM9342 |
| Heparin | Sigma | Cat#H3393 |
| Denhardt’s solution | ThermoFisher Scientific | Cat#750018 |
| Dextran Sulfate | Sigma | Cat#D6001 |
| Salmon Sperm DNA | ThermoFisher Scientific | Cat#15632–011 |
| HCR Amplifier B2-Cy3B | Molecular Technologies | N/A |
| HCR Amplifier B5-AF647 | Molecular Technologies | N/A |
| SuperBlock™ (PBS) Blocking Buffer | ThermoFisher Scientific | Cat#37515 |
| UltraPure™ BSA | ThermoFisher Scientific | Cat#AM2618 |
| OneComp eBeads™ Compensation Beads | Thermofisher | Cat# 01-1111-42 |
| Biological Samples | ||
| Adult Human 1 (Age 20–25 yrs) Testicular Tissue | University of Pittsburgh Health Sciences Tissue Bank and Center for Organ Recovery | N.A. |
| Adult Human 2 (Age 37 yrs) Testicular Tissue | University of Pittsburgh Health Sciences Tissue Bank and Center for Organ Recovery | N.A. |
| Adult Human 3 (Age 30–40 yrs) Testicular Tissue | University of Pittsburgh Health Sciences Tissue Bank and Center for Organ Recovery | N.A. |
| Adult Human 4 (Age 30–40 yrs) Testicular Tissue | University of Pittsburgh Health Sciences Tissue Bank and Center for Organ Recovery | N.A. |
| Adult Macaque 1 (Age >=4 yrs) Testicular Tissue | Oregon National Primate Research Center | N.A. |
| Adult Macaque 2 (Age 9.3 yrs) Testicular Tissue | Oregon National Primate Research Center | N.A. |
| Adult Macaque 3 (Age 7.3 yrs) Testicular Tissue | Oregon National Primate Research Center | N.A. |
| Adult Macaque 4 (Age 13 yrs) Testicular Tissue | Oregon National Primate Research Center | N.A. |
| Adult Macaque 5 (Age 5.9 yrs) Testicular Tissue | Oregon National Primate Research Center | N.A. |
| Critical Commercial Assays | ||
| KAPA HiFi HotStart ReadyMix PCR Kit | Kapa Biosystems | Cat#KK2602 |
| Nextera XT DNA SMP Prep Kit | Illumina | Cat#FC-131–1096 |
| MACS dead cell removal kit | Miltenyi Biotec | Cat# 130-090-101 |
| Deposited Data | ||
| Raw data files for RNA-sequencing | This paper or NCBI Gene Expression Omnibus | GEO: GSE142585 |
| Experimental Models: Organisms/Strains | ||
| Mouse: C57BL/6J | The Jackson Laboratory | JAX: 000664 |
| Mouse: NCr nu/nu (CrTac:NCr-Foxn1nu) | Taconic | NCRNU-M |
| Oligonucleotides | ||
| Drop-seq primers | Macosko et al., 2015 | N.A. |
| Drop-seq beads | ChemGenes | Macosko201110 |
| Software and Algorithms | ||
| Drop-seq_tools (v1.12) | Macosko et al., 2015 | http://mccarrolllab.com/dropseq/ |
| picard (v2.6.0) | Broad Institute, 2016 | http://broadinstitute.github.io/picard/ |
| samtools (v1.2) | Li et al., 2009 | http://samtools.sourceforge.net/ |
| STAR (v2.5.2b) | Dobin et al., 2013 | https://github.com/alexdobin/STAR |
| R (v3.4.1) | R Core Team, 2017 | https://www.R-project.org/ |
| Seurat (v2.3.4) | Satija et al., 2015 | https://github.com/satijalab/seurat |
| seriation (v1.2–2) | Hahsler et al., 2008 | https://CRAN.R-project.org/package=seriation |
| Monocle (v2.10.2) | Qiu et al., 2017 | https://github.com/cole-trapnell-lab/monocle-release |
| biomaRt (v2.38.0) | Durinck et al., 2009 | https://bioconductor.org/packages/release/bioc/html/biomaRt.html |
| cellAlign (v0.1.0) | Alpert et al., 2018 | https://github.com/shenorrLab/cellAlign |
| SoupX (v1.2.1) | Young et al, 2020 | https://github.com/constantAmateur/SoupX |
| SCENIC (v1.1.2.2) | Aibar et al., 2017 | https://github.com/aertslab/SCENIC |
Collection and Preparation of Testicular Tissue
Healthy adult (~20–40 years, see Key Resources Table) de-identified human testes were procured through the University of Pittsburgh Health Sciences Tissue Bank and Center for Organ Recovery under the University of Pittsburgh CORID 686. Adult rhesus macaque testis tissue (≥ 4–13 years, see Key Resources Table) was collected by castration through the Oregon National Primate Research Center. Following surgical removal, testes were transported on ice in lactated Ringer’s solution to the laboratory. After removal of the tunica albuginea, testicular parenchyma was either directly cryopreserved or digested sequentially with 2 mg/ml Collagenase IV (Worthington Biochemical Corporation), then 0.25% Trypsin-EDTA (Invitrogen) and 3.5 mg/ml DNase I (Worthington Biochemical Corporation) in Hank’s Balanced Salt Solution (HBSS, Gibco), as previously described (Hermann et al., 2007). In some experiments dissociated cells were used immediately after isolation, while in other experiments cells were cryopreserved in liquid nitrogen following a slow freezing process as described (Hermann et al., 2007) and thawed later (see details below). Frozen samples were then processed for single cell sequencing analysis at the University of Michigan (IRB #HUM00165673) as previously described (Green et al., 2018) (see below).
METHOD DETAILS
Histological Methods
Immunohistochemistry
Testicular tissue pieces were fixed in 4% paraformaldehyde overnight, paraffin embedded and sectioned (5 μm). Fixed tissue slides were deparaffinized, rehydrated in PBS and incubated at 97.5°C for 30 minutes in sodium citrate (10 mM sodium citrate, pH 6.0, 0.05% Tween20) or TRIS (tris base, pH 10, 0.05% Tween20) antigen retrieval buffer. Sections were blocked with donkey blocking buffer containing 3% bovine serum albumin (MP Biochemicals) and 5% normal donkey serum (Jackson ImmunoResearch) for 60 minutes at room temperature. Slides were then stained with primary antibodies (refer to antibody table) for 90 minutes or overnight at 4°C; isotype-matched normal IgG was used as the negative control. Sections were incubated with Alexa Fluor 488-, Alexa Fluor 568- or Alexa Fluor 647-conjugated donkey secondary antibodies (1:200, Thermofisher) for 45 minutes to detect primary antibodies. Slides were washed in PBST (1X phosphate buffered saline, 0.01% Tween20) and mounted with Vectashield mounting medium containing DAPI (Vector Laboratories) for nuclei detection.
Colorimetric Immunohistochemistry
Sectioned tissue slides were deparaffinized, rehydrated and incubated in antigen retrieval buffer as described above. Slides were incubated in peroxidase block for 10 minutes, washed in PBS, blocked with goat blocking buffer containing 3% bovine serum albumin and 5% normal goat serum (Jackson ImmunoResearch), and then stained with primary antibodies or negative control (refer to antibody table). Slides were then incubated with goat anti-rabbit horseradish peroxidase (HRP)-conjugated secondary antibody for 30 minutes. DAB substrate was used to detect staining, and sections were counterstained with periodic acid-Schiff and hematoxylin (Sigma-Aldrich).
Imaging and Staining Quantification
Slides were observed under a Nikon Eclipse 90i microscope and analyzed using the NIS Elements Advanced Research software. To quantify marker overlap, single-positive cells for each marker and double-positive cells were manually counted in cross sections of human seminiferous epithelium under the Nikon Eclipse 90i fluorescence microscope (≥ 100 tubules per sample × 3 replicate samples).
Hematoxylin and Eosin Staining
Tissue sections (human, macaque, and C57BL6 mouse) were deparaffinized and rehydrated in a graded ethanol series with final hydration in water. Sections on slides were then stained in hematoxylin 560 (Leica biosystems Inc) for 5 mins and rinsed in running tap water (1 min). Sections were then placed in Define MX-aq (Leica biosystems) for 1 min also followed by rinsing in running tap water for 1 min. This was followed by placement of sections in blue buffer (Leica biosystems) for 1 min and rinsing in tap water for 1 min. Sections were then dipped in 70% alcohol 3 times and counterstained in Eosin with phloxine (Leica biosystems) for 30 secs. Sections were then dehydrated and mounted with Permount® (Fisher Scientific).
Immunofluorescence-Hybridization Chain Reaction
Testes from adult male macaques were cut into ~5 × 5 mm cubes and fixed in 4% RNase-free paraformaldehyde (PFA) (EMD Millipore) for 8 hours at 4°C. Testes were then transferred to 30% RNase-free sucrose (Amresco) and rotated overnight at 4°C, embedded in OCT cryogel, and cryosectioned at 10 μm thickness. Hybridization chain reaction (HCR) was performed as described (Green et al., 2018) with slight modifications. Tissue sections were warmed to room temperature for 10 min, cross-linked with 4% RNase-free PFA for 10 min, washed with 1X PBS, and permeabilized in 0.5% Triton X-100 (Sigma) for 1 hour at room temperature. Coverslips were then washed in 5X SSCT (ThermoFisher Scientific) and equilibrated in HCR hybridization buffer (30% Formamide (Ambion), 5X SSC (ThermoFisher Scientific), 9 mM Citric acid (pH 6.0) (Sigma), 0.1% Tween-20 (ThermoFisher Scientific), 50 mg/mL Heparin (Sigma), 1X Denhardt’s solution (ThermoFisher Scientific), and 10% Dextran Sulfate (Sigma)) for 1 hour at 37°C. Probes were then added to the sections at a final concentration of 2 nM and hybridized overnight in a humidified chamber at 37°C. A custom probe set for the intronic sequences of RIBC1 (ENSMMUG00000009295) and two probe sets for GusB (ENSMMUG00000049089) were designed using OligoMiner (Beliveau et al., 2018), masked against the Mmul_10 genome assembly, and ordered from Integrated DNA Technologies. After hybridization, samples were washed three times in HCR wash buffer (30% Formamide (Ambion), 5X SSC (ThermoFisher Scientific), 9 mM Citric acid (pH 6.0) (Sigma), 0.1% Tween-20 (ThermoFisher Scientific), 50 mg/mL Heparin (Sigma)) for 15 min at 37°C and then twice for 15 min at 37°C in 5X SSCT (ThermoFisher Scientific). The probe sets were amplified with HCR hairpins for 120 min at room temperature in HCR amplification buffer (5X SSC (ThermoFisher Scientific), 0.1% Tween-20 (ThermoFisher Scientific), 10% Dextran Sulfate (Sigma), and 100 μg/mL salmon sperm DNA (ThermoFisher Scientific)).
Fluorescently-conjugated DNA hairpins used in the amplification were ordered from Molecular Technologies (B2-Cy3B or B5-AF647). Prior to use, the hairpins were ‘snap heated’ at 95°C for 90 seconds and allowed to cool to room temperature for 30 min in the dark. After amplification, the samples were washed once in 5X SSCT (ThermoFisher Scientific) for 30 min.
After HCR, the samples were blocked for immunofluorescence (IF) staining in blocking buffer (1X PBS (Invitrogen), 1% ultrapure BSA (Ambion), and 0.1% TritonX-100 (ThermoFisher Scientific)) for 60min at room temperature then incubated overnight with an Anti-phospho-Histone H2A.X antibody (Millipore Sigma 05–636, Lot #3292608) at a 1:200 dilution in the blocking buffer. The samples were then washed four times for 15min in PBST (1X PBS (Invitrogen) with 0.1% TritonX-100 (ThermoFisher Scientific)), then were incubated for 1 hour at room temperature with 10ng/mL DAPI (Sigma) and an anti-mouse-488nm secondary antibody (Invitrogen A21121, Lot #845809) at a 1:1000 dilution in the blocking buffer. Finally, the samples were washed another three times for 15min in PBST and then mounted in Vectashield.
Samples were imaged using a 60X 1.4NA oil immersion objective on a Nikon A1R-HD25 inverted confocal microscope. Representative images were taken using an additional 8.33X hardware zoom (for an XY-pixel size of 30 nm) with 24 Z-steps of 250 nm (6 μm total thickness). The images were deconvolved using NIS Elements software and are shown as maximum-intensity projections. RNA integrity of the samples was confirmed by the multispectral overlap between the two probe sets designed against GusB, as previously described (Shah et al., 2016).
| Antibody | Company | Product # | Dilution |
|---|---|---|---|
| UCHL1(PGP9.5) | BioRad | 7863–1004 | 1:100 |
| cKIT(CD117) | R&D systems | AF332 | 1:50 |
| TCF3 | Sigma-Aldrich | HPA062476 | 1:50 |
| GPX1 | Sigma-Aldrich | HPA044758 | 1:500 |
| FMR1 | Sigma-Aldrich | HPA050118 | 1:100 |
| MORC1 | Thermo Fisher | PA5–57647 | 1:500; 1:100 (colorimetric) |
| TSPAN33 | Abcam | ab79130 | 1:25 |
| DNAJB6 | Sigma-Aldrich | HPA024258 | 1:50 |
| MAGEB2 | Sigma-Aldrich | HPA074544 | 1:500 |
| TSPAN33-PE conjugated | R&D systems | FAB8405P025 | 10 μl per 10^6 cells |
| phospho-Histone H2A.X | Millipore Sigma | 05–636 | 1:200 |
Flow Cytometry
Cells were isolated from cryopreserved samples as described above and stained on ice in PBS buffer containing 10% FBS for 20 minutes with fluorescently conjugated TSPAN33 antibodies (refer to antibody table) as previously described (Valli et al., 2014b). FACS analysis was performed using the BD FACSAria II flow cytometer. Compensation controls were prepared with OneComp ebeads (Thermofisher) and isotype matched IgG monoclonal antibodies were used as negative control.
Xenotransplantation
Busulfan Treatment
Nude mice (NCr nu/nu; Taconic) were treated with the chemotherapy agent busulfan (40 mg/ml, Sigma) at 5–6 weeks of age.
Human to Mouse Xenotransplantation
FACS-sorted cells were suspended in MEMα medium (Invitrogen) containing 10% FBS and 10% trypan blue. Transplants were performed using ultrasound-guided rete testis injections into the seminiferous tubules of recipient ~8–12-week-old mouse testes as previously described (Valli et al., 2014b). Testes were recovered 8 weeks after transplantation, the tunica was removed, and seminiferous tubules were dispersed with 1 mg/mL Collagenase IV and DNase I in D-PBS. Tubules were fixed for 2 hrs in 4% PFA at 4°C and washed in PBS for 1 hr (3–4X).
Whole Mount Immunofluorescence to stain xenotransplantation derived colonies.
Following PFA fixation, tubules were dehydrated in a series of graded methanol dilutions and then incubated in a 4:1:1 ratio of MeOH:DMSO:H2O2 for 3 hours in 12-mm Transwell baskets (12-μm pore size; Corning Life Sciences, Acton, MA). Tubules were then washed with PBSMT blocking buffer (PBS, 0.02g/ml blotto dry milk powder, 10% Triton-X100) and stained with a rabbit anti-primate testis cell primary antibody (Hermann et al., 2007) at 4°C overnight. Goat anti-rabbit Alexa Fluor 488 was used to detect the primary antibody. Tubules were mounted on slides with Vectashield mounting medium containing DAPI under a raised cover slip and imaged by fluorescence microscopy.
Quantification of SSC Colonization Activity
SSC colonies were counted if they contained at least 4 cells in a continuous area (≤ 100 μm between cells), were located on the basement membrane of the seminiferous tubules, were ovoid shaped, and had a high nuclear to cytoplasmic ratio.
scRNA-sequencing
Single Cell Suspension Preparation
Cryopreserved human tissue was thawed quickly at 37°C, rinsed with HBSS, and dissociated as described above. To enrich for interstitial cells (one sample), after seminiferous tubules were dispersed, the tissue was rinsed with HBSS and centrifuged at 200g for 5 min. The supernatant containing interstitial cells was diluted with FBS to prevent further digestion and centrifuged at 600g for 8 min at 4°C. The cells were rinsed in PBS, filtered through a 100 μm strainer and again centrifuged.
Cryopreserved macaque cell suspensions were thawed quickly at 37°C, diluted with HBSS and FBS, and then spun at 600g for 8 min. The cells were rinsed in PBS, filtered through a 100 μm strainer and again centrifuged.
Live cells were selected by either DAPI exclusion by FACS (unenriched samples) or using a MACS dead cell removal kit (Miltenyi Biotec, human somatic cell enrichment).
Droplet Generation and Sequencing
Single cell suspensions were diluted to 280 cells/ul and processed using the Drop-seq platform as previously published (Macosko et al., 2015, Green et al., 2018). Barcoded microparticle beads (MACOSKO-2011–10, Lots 090316, 072817, and 060718, ChemGenes Corporation) were used. Briefly, cells, barcoded microparticle beads and lysis buffer were co-flown into a microfluidic device and captured in nanoliter-sized droplets. After droplet collection and breakage, the beads were washed, and cDNA synthesis occurred on the bead using Maxima H-minus RT (Thermo Fisher Scientific) and a Template Switch Oligo. Excess oligos were removed by exonuclease I digestion. cDNA amplification was done for 15 cycles from pools of 2,000 beads using HotStart ReadyMix (Kapa Biosystems) and the SMART PCR primer. Individual PCRs were purified and pooled for library generation. A total of 600 pg of amplified cDNA was used for a Nextera XT library preparation (Illumina) with the New-P5-SMART PCR hybrid oligo, and a modified P7 Nextera oligo with 10 bp barcodes. Sequencing was performed on a HiSeq-2500 (Illumina) for read length of 112 nt or 115 nt, with the Read1CustomSeqB primer. Oligo sequences are the same as previously described (Macosko et al., 2015, Green et al., 2018).
Preprocessing of Drop-seq data
Read Filtering and Alignment
Raw paired-end sequence data were converted to queryname-sorted BAM files using Picard v2.6.0 FastqToSam (Broad Institute, 2016), and processed using Drop-seq tools v1.12 from the McCarroll laboratory as described before (Shekhar et al., 2016, Macosko et al., 2015). Briefly, the first read comprises, from left to right, a 12-base cell barcode, an 8-base unique molecular index (UMI), and a poly-T segment of 6 bases or longer. Read pairs with a base quality of less than 10 for any base of the cell barcode were removed. The second read, with a read length of 112 nt or 115 nt, was trimmed at the 5’ end to remove any SMART adaptor sequence, and at the 3’ end to remove poly-A tails of 6 consecutive bases or greater. The trimmed reads were then aligned to the GRCh37 reference genome for human samples, Mmul_8.0.1 for macaque samples, or a combined mouse (GRCm38) – human/macaque mega-reference using STAR v2.5.2a (Dobin et al., 2013) with default settings. Reads uniquely mapped to the sense strand of gene exons were recorded and grouped by cell barcode. Throughout this study we used the Ensembl transcriptomic annotation (GRCh37 from Ensembl release 75, Mmul_8.0.1 from Ensembl release 93, and GRCm38 from Ensembl release 81). We corrected for cell barcodes with synthesis errors as described before (Shekhar et al., 2016). We obtained the digital gene expression matrix using the standard Drop-seq pipeline.
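The read 1 layout and the two read-level filters described above can be illustrated with a minimal sketch. This is not the Drop-seq tools implementation: the function names are hypothetical, qualities are assumed to be already decoded to integer Phred scores, and the poly-A trimming is simplified to cutting read 2 at the first run of six or more consecutive A bases.

```python
import re

CELL_BARCODE_LEN = 12     # cell barcode length stated in the text
UMI_LEN = 8               # UMI length stated in the text
MIN_BARCODE_QUALITY = 10  # pairs with any barcode base below this are removed
MIN_POLYA_RUN = 6         # poly-A runs of this length or longer are trimmed


def parse_read1(seq, quals):
    """Split read 1 into (cell_barcode, umi).

    Returns None when the read is too short or when any base of the
    cell barcode has a quality below MIN_BARCODE_QUALITY.
    """
    if len(seq) < CELL_BARCODE_LEN + UMI_LEN:
        return None
    if min(quals[:CELL_BARCODE_LEN]) < MIN_BARCODE_QUALITY:
        return None
    barcode = seq[:CELL_BARCODE_LEN]
    umi = seq[CELL_BARCODE_LEN:CELL_BARCODE_LEN + UMI_LEN]
    return barcode, umi


def trim_polya(seq):
    """Cut read 2 at the first run of >= MIN_POLYA_RUN A's (simplified)."""
    match = re.search("A{%d,}" % MIN_POLYA_RUN, seq)
    return seq[:match.start()] if match else seq
```

In this sketch a read pair is kept only when `parse_read1` returns a barcode and UMI; the trimmed read 2 would then be passed to the aligner.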
QC for the Impact of Ambient RNA
For each batch, an early step of data processing was to select real cells and discard empty droplets by finding the “kink” in the cumulative reads distribution plot, where the x-axis shows the cell barcodes ranked by decreasing number of reads (Figure S8A). Droplets containing more than a certain number of reads are more likely to have captured a cell, whereas those with too few reads are more likely to be empty and to contain only ambient RNA. This raises two questions: what is the chance that cells we selected for analysis are actually empty, and, for those that are not empty, what fraction of the transcripts is contributed by ambient RNA? These questions can be answered by first defining a transcription profile for the “soup” by pooling hundreds to thousands of cells to the right of the “kink” (Figure S8A), e.g., either the 2–3 K cells in the magenta group, or the next 2–3 K cells further to the right, as they contain even fewer reads and are even more likely to be empty. Next, using this soup transcriptome, constructed in silico, we can calculate the “distance” of every cell to (1) the centroid of every known cell type identified and (2) the soup. By comparing the relative distances to real cell types and to the soup, we can verify whether cells to the left of the kink are mostly “real cells”, and whether those to the right of the kink are increasingly likely to be soup-only (Figure S8B).
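The text does not state how the “kink” was located on the cumulative reads curve. As an illustration only, one common heuristic is to take the rank at which the cumulative curve lies furthest above the diagonal joining its endpoints. The sketch below implements that assumed heuristic; `find_kink` is a hypothetical name and is not part of Drop-seq tools.

```python
def find_kink(reads_per_barcode):
    """Return how many top-ranked barcodes lie to the left of the kink.

    Barcodes are ranked by decreasing read count. The kink is taken as
    the rank where the cumulative read fraction exceeds the fraction of
    barcodes seen so far by the largest margin (an assumed heuristic).
    """
    counts = sorted(reads_per_barcode, reverse=True)
    total = float(sum(counts))
    n = len(counts)
    best_rank, best_gap, running = 0, float("-inf"), 0.0
    for rank, count in enumerate(counts, start=1):
        running += count
        gap = running / total - rank / n
        if gap > best_gap:
            best_rank, best_gap = rank, gap
    return best_rank
```

Barcodes ranked at or before the returned value would be treated as putative cells, and those after it as candidate empty droplets for building the soup profile.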
We focused on three batches (Human1–5, Human1–4, and Monkey1–2) with human (or monkey)-to-mouse mixing experiments, where the small-read-count droplets (the after-kink group) can be further selected for those with an intermediate mixing ratio from the two species, as they are more likely to contain no cell and only the ambient RNA. For Human1–5, we further selected ~2–3 K cells on the immediate right side of the kink as an after-kink comparison group (colored in magenta). In the species-mixing plot (Figure S1A), the x axis shows the logarithm of total read count, and the y axis shows the percentage of human transcripts in the human-mouse mixture. The red and blue cells are designated human and mouse cells, respectively, because they are before the kink and have high human or mouse ratios. The green cells are considered human-mouse doublets due to their intermediate human/mouse ratios.
For the magenta group, while some cells have high or low Y values, most are concentrated near y=0.6 (60% are human transcripts), as shown in the 2D density heatmap (Figure S8C) and the accompanying histogram of Y values. This plot allowed us to select a pool of high-confidence soup-only cells by requiring both a mid-range mixing ratio (Y) and below-cutoff total read counts (X).
After constructing the soup profile with 1,227 cells (out of the 2,435 in the magenta group) selected by the additional condition 0.5<Y<0.75, we calculated the correlation coefficient between (1) cells in both the before-kink and the after-kink group and (2) the centroid for each of the five major cell types identified, plus “soup”, and displayed the results in a heatmap (Figure S8F). On the right side of the heatmap we indicated the current assignment of the before-kink-group cells into the five discovered cell types, and at the bottom of the heatmap are the magenta-group cells. The heatmap showed that nearly all the cells we selected from the left of the kink had higher correlations with one of the recognized real cell types than with the soup, confirming that the analyses described in this manuscript have been based on stringently selected cells and were not affected by inadvertently selecting empty droplets.
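The soup construction and the cell-versus-centroid comparison can be sketched as follows. The 0.5<Y<0.75 window comes from the text; the use of the Pearson coefficient is an assumption (the text says only “correlation coefficient”), and the function names and the dictionary layout of a droplet are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def build_soup_profile(droplets, read_cutoff, lo=0.5, hi=0.75):
    """Pool after-kink droplets with an intermediate human fraction.

    Each droplet is a dict with 'counts' (per-gene list), 'total_reads'
    and 'human_frac'. Returns the pooled profile scaled to sum to 1.
    """
    selected = [d for d in droplets
                if d["total_reads"] < read_cutoff and lo < d["human_frac"] < hi]
    n_genes = len(selected[0]["counts"])
    pooled = [sum(d["counts"][g] for d in selected) for g in range(n_genes)]
    total = sum(pooled)
    return [value / total for value in pooled]


def closest_profile(cell, profiles):
    """Name of the profile (cell-type centroid or 'soup') best correlated with a cell."""
    return max(profiles, key=lambda name: pearson(cell, profiles[name]))
```

A selected cell is considered well supported when `closest_profile` returns one of the real cell types rather than the soup.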
We further performed two-way deconvolution analyses for the before-kink-group cells by using SoupX (Young and Behjati, 2020) to estimate the likely mixing ratio of a real cell and the ambient RNA. First, we found that SoupX does not work well when tested with simulated situations, and it is difficult to troubleshoot. Second, under our best effort, SoupX reported that ~7% of the transcriptome of our selected cells is from ambient RNA. The other two batches with two-species mixing showed a similar result (not shown), confirming that our selected cells are distinct from the soup profile.
We also observed that the human (or monkey) - mouse ratio of the ambient RNA varied across the three experiments with 2-species mixing: ~0.6 in Human1–5, 0.7 in Human1–4, and 0.5 in Monkey1–2 (Figure S8D–E). This confirmed the intuition that the ambient RNA is contributed by all the cells in the experiment and that its profile is batch-dependent, influenced by the cell isolation procedures.
In summary, we concluded that ambient RNA is not likely to have affected the results we reported in this study. Our cell selection criteria have been conservative. That is, many of the cells we did not select (those in the magenta group) in fact have very high or very low human transcript ratios and likely contain real cells. Our choice of cutoff value at the kink has thus erred on the side of having few false positives in the selected group and a fair number of false negatives in the rejected group.
Cell Filtering, Normalization, and Gene Standardization
Among the 4 sequenced human subjects over 10 replicates - Human Subject 1 (Human1) with 5 replicates, Human2 and Human3 with 2 replicates each, and Human4 with 1 replicate - there were 13,837 human testis cells satisfying the following criteria: 1) cell size factor and integrity filter: cells with >500 detected genes and with <10% of transcripts from mitochondrial genes were kept; 2) doublet removal: cells in two clusters that corresponded to somatic and germ cells from iterative clustering were removed (N=221). These 13,837 cells were used for further analysis. There were 46,541 genes detected in the human testis cells, among which 3,469 were selected as highly variable genes using the R package Seurat v2.3.4 (Satija et al., 2015). These genes were used for global analyses of all cells, while subsequent, more focused analyses relied on new rounds of selection for highly variable genes.
Among the 5 sequenced macaque subjects with 9 replicates - Macaque2 with 1 replicate, and the other 4 macaques with 2 replicates each - there were 21,574 macaque testis cells after the following filtering: 1) cell size factor and integrity filter: cells with >500 detected genes and with <10% of transcripts from mitochondrial genes were kept; 2) doublet filter: cells in two clusters that corresponded to somatic and germ cells from iterative clustering were removed (N=225). These 21,574 cells were used for further analysis. There were 23,029 genes detected in the macaque testis cells, among which 3,284 were selected as highly variable genes using the R package Seurat v2.3.4 (Satija et al., 2015).
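The cell size and integrity filter applied to both species can be written as a short check on per-cell counts. This is a sketch under stated assumptions rather than the Seurat code used in the study: `qc_metrics` and `passes_qc` are hypothetical names, a cell is represented as a gene-to-UMI-count dictionary, and the set of mitochondrial gene names is supplied by the caller.

```python
def qc_metrics(counts, mito_genes):
    """Return (n_gene, n_umi, pct_mt) for one cell.

    counts     -- dict mapping gene name to UMI count
    mito_genes -- set of mitochondria-encoded gene names
    """
    n_umi = sum(counts.values())
    n_gene = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene in mito_genes)
    pct_mt = 100.0 * mito / n_umi if n_umi else 0.0
    return n_gene, n_umi, pct_mt


def passes_qc(counts, mito_genes, min_genes=500, max_pct_mt=10.0):
    """Keep cells with >500 detected genes and <10% mitochondrial transcripts."""
    n_gene, _, pct_mt = qc_metrics(counts, mito_genes)
    return n_gene > min_genes and pct_mt < max_pct_mt
```

The doublet-removal step is not covered here, because it depends on the iterative clustering results rather than on per-cell thresholds.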
For each cell, we normalized transcript counts by (1) dividing by the total number of UMIs per cell and (2) multiplying by 10,000 to obtain a transcripts-per-10K measure, and then log-transformed it by E=ln(transcripts-per-10K+1). For PCA, we used standardized expression values obtained by centering and scaling for each gene using (E-mean(E))/sd(E).
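The two transformations above can be restated as a small worked example. This is an illustrative sketch, not the Seurat code: the function names are hypothetical, and the sample standard deviation (n-1 denominator, as in R's `sd`) is assumed for the per-gene scaling.

```python
from math import log
from statistics import mean, stdev


def log_normalize(cell_counts, scale=10_000):
    """E = ln(transcripts-per-10K + 1) for one cell's UMI counts."""
    total = sum(cell_counts)
    return [log(count / total * scale + 1) for count in cell_counts]


def standardize_gene(values):
    """(E - mean(E)) / sd(E) for one gene across cells."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return [0.0] * len(values)  # a constant gene carries no signal for PCA
    return [(v - m) / s for v in values]
```

For example, a cell with counts `[1, 1]` has 5,000 transcripts-per-10K for each gene, so both normalized values equal ln(5001).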
Evaluation of Technical Variability
Between-batch and Across-subject reproducibility:
We first evaluated reproducibility between the replicates of each individual subject, and then analyzed reproducibility across subjects using merged replicates.
For human, each of the 10 replicates contained 775 – 2,279 filtered cells. For each replicate, we performed PCA using all detected genes and Louvain-Jaccard clustering on the top PCs using Seurat v2, resulting in 8 clusters for each replicate, except that the 2nd replicate of Human Subject 2 (Human2–2) had 9 clusters. We then ordered the clusters by minimizing the Euclidean distance between cluster centroids with the optimal leaf ordering algorithm in the R package Seriation, as described before (Green et al., 2018, Hahsler et al., 2008). For each subject, the placement of the ordered clusters in PC1-PC3 and t-SNE spaces was similar across replicates, indicating reproducible patterns of heterogeneity (Figure S1B). We also cross-tabulated rank correlation coefficients among all pairs of centroids across the replicates of each single subject, which demonstrated that the clusters were largely reproducible across replicates.
Next, we merged the gene expression matrices for all replicates of each subject and repeated the analysis above for each subject. For each of the 4 human subjects, we performed PCA using ~3K highly variable genes selected for each subject by Seurat v2 and Louvain-Jaccard clustering on the top PCs, resulting in 12 clusters for each subject. We then ordered the clusters by minimizing the Euclidean distance between cluster centroids with the optimal leaf ordering algorithm in the R package Seriation. The placement of the ordered clusters in PC1-PC3 and t-SNE spaces was consistent among the 4 human subjects, except that the majority of cells in Human4 were somatic cells. We cross-tabulated rank correlation coefficients among all pairs of centroids across the 4 human subjects, which confirmed that the clusters were largely reproducible across different human subjects, even though Human4 mainly contained somatic cells (Figure S1D).
For macaque, each of the 9 replicates contained 796 – 3,618 filtered cells. For each macaque replicate, we performed PCA using all detected genes and did Louvain-Jaccard clustering using top PCs using Seurat2, resulting in 11 clusters for each replicate except that Macaque2 had 10 clusters. We then ordered clusters by minimizing Euclidean distance of cluster centroids as before. For each subject, the placement of the ordered clusters in PC1-PC3 and t-SNE space were similar across replicates, indicating reproducible patterns of heterogeneity (Figure S1C). We cross-tabulated rank correlation coefficients among all pairs of centroids across the replicates of each subject, which demonstrated that the clusters were largely reproducible across replicates of each subject.
Next, we merged the gene expression matrices for all replicates of each macaque subject and repeated the analysis above. For each subject, we performed PCA using ~3K highly variable genes selected for each subject by Seurat v2 and Louvain-Jaccard clustering on the top PCs, resulting in 11 clusters for each subject. We then ordered clusters as before. The placement of the ordered clusters in PC1-PC3 and t-SNE space was similar for the 5 macaque subjects. We cross-tabulated rank correlation coefficients among all pairs of centroids across the 5 macaque subjects, which confirmed that the clusters were largely reproducible across macaque subjects (Figure S1E).
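The cross-tabulation of rank correlation coefficients between cluster centroids, used above for both replicates and subjects, can be sketched as follows. The text does not name the rank correlation; Spearman's coefficient is assumed here, and the function names are hypothetical.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def cross_tabulate(centroids_a, centroids_b):
    """Matrix of rank correlations between two lists of centroid vectors."""
    return [[spearman(a, b) for b in centroids_b] for a in centroids_a]
```

Reproducible clusters show up as one high value per row of the resulting matrix, typically along the diagonal when both sets of clusters are ordered consistently.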
The global cell atlas for spermatogenesis
Dimensionality Reduction, Louvain-Jaccard Clustering, and Cluster Annotation
For human samples, we performed principal component analysis (PCA) on the standardized gene expression matrix of 13,837 testis cells and 3,469 highly variable genes, and projected back to all detected genes using the R package Seurat v2.3.4 (Satija et al., 2015). We chose the top 8 PC scores for downstream clustering based on the estimated “elbow” point in the scree plot. For visualization we performed t-SNE and UMAP using the top 8 PCs to further reduce the dimensionality to 2. To assess global cellular heterogeneity, we used Louvain-Jaccard clustering to identify cell clusters (Blondel et al., 2008). We initially detected 15 clusters. Our first step was to identify major cell types in the global atlas, and in the second step, we relied on focused subset re-clustering to identify subtypes within each major cell type. To achieve the first goal, we merged the clusters in the initial set of 15 by statistical evaluation and biological marker interpretation. We computed a 15 × 15 Euclidean distance matrix for the 15 cluster centroids, which were defined as ln(mean of normalized expression+1) over all cells in each cluster, and then ordered and renumbered the 15 clusters using the optimal leaf ordering (OLO) algorithm in the R package Seriation. Further merging of the ordered clusters was based on known markers for major cell types. Overall, for the 15 ordered clusters, we annotated clusters 1–6 as somatic, 7–9 as SPG, 10–11 as spermatocytes, 12–13 as round spermatids, and 14–15 as elongating spermatids (Figure 1B). In the second step, we did focused clustering on somatic cells to further delineate somatic cell types as described below in the somatic subclustering section. This process led to the identification of seven somatic cell types (T cells, Macrophage, Endothelial, m-Pericytes, f-Pericytes, Myoid, and ImmLeydig) for human testis (Figure 1B inset).
For macaque, we performed PCA on the standardized gene expression matrix of 21,574 macaque testis cells and 3,284 highly variable genes and projected back to all detected genes using Seurat v2.3.4 (Satija et al., 2015). We chose the top 8 PC scores for Louvain-Jaccard clustering based on the estimated “elbow” point in the scree plot. We performed t-SNE and UMAP using top 8 PCs to further reduce the dimensionality to 2. We initially detected 13 clusters, and ordered them by minimizing Euclidean distance of the cluster centroids using the optimal leaf ordering algorithm in the R Package Seriation. Overall, of the 13 ordered clusters, we assigned clusters 1–2 as somatic, 3–4 as SPG, 5–7 as spermatocytes, 8–10 as round spermatids, and 11–13 as elongating spermatids. As described in the focused somatic subset clustering section, we further detected 7 somatic cell types (T cells, Macrophage, Endothelial, m-Pericytes, f-Pericytes, Myoid, and ImmLeydig) for macaque testis.
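The centroid and distance calculation described above can be sketched as follows. This is a minimal illustration in plain Python rather than the R code used in the study; the function names and the toy expression values are hypothetical, and the optimal leaf ordering step (performed with Seriation) is not reproduced.

```python
import math

def log_centroid(cells):
    """Per-gene ln(mean normalized expression + 1) over the cells of one cluster."""
    n_genes = len(cells[0])
    return [math.log(sum(cell[g] for cell in cells) / len(cells) + 1.0)
            for g in range(n_genes)]

def euclidean_matrix(centroids):
    """Symmetric matrix of pairwise Euclidean distances between centroids."""
    return [[math.dist(a, b) for b in centroids] for a in centroids]

# Toy data (made-up values): 3 clusters, 2 cells each, 3 genes.
clusters = {
    "c1": [[0.0, 2.0, 4.0], [0.0, 4.0, 2.0]],
    "c2": [[0.0, 3.0, 3.0], [0.0, 3.0, 3.0]],
    "c3": [[9.0, 0.0, 0.0], [11.0, 0.0, 0.0]],
}
centroids = [log_centroid(cells) for cells in clusters.values()]
dist = euclidean_matrix(centroids)
```

In this toy case, c1 and c2 have identical mean profiles and therefore zero distance, so an ordering algorithm would place them next to each other.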
Marker Selection for the Major Cell Types in Global Atlas
Markers for each of the major cell types were obtained by comparing a given cell type with all other cell types using the binomial likelihood test embedded in R package Seurat v2.3.4. Selection criteria are (1) at least 20% difference in detection rate; (2) a minimum of 2-fold higher average expression levels in the cell type compared to all other cell types, and (3) p-value < 0.01 in the binomial test. To display the markers in the cell type-centroid heatmap (Figure 1C) while accommodating their wide range of absolute expression levels, we centered each marker’s expression levels across all major cell type centroids and scaled by its standard deviation.
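The three selection criteria and the heatmap scaling can be sketched as below. This is an illustrative Python sketch, not the Seurat call used in the study: the function names are hypothetical, the detection-rate difference is assumed to be directional (in-group minus out-group), and the p-value is assumed to be supplied by a separate test.

```python
import statistics

def passes_marker_filters(pct_in, pct_out, mean_in, mean_out, p_value):
    """Apply the three criteria: detection-rate gap, fold change, p-value."""
    detection_ok = (pct_in - pct_out) >= 0.20             # (1) at least 20% difference
    fold_ok = mean_in > 0 and mean_in >= 2.0 * mean_out   # (2) at least 2-fold higher
    return detection_ok and fold_ok and p_value < 0.01    # (3) p < 0.01

def center_and_scale(values):
    """Center one marker across cell type centroids and divide by its sample SD."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]
```

Scaling each marker separately lets lowly and highly expressed markers share one color scale in the heatmap.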
Cellular Attributes Derived from Single-Cell Data
We calculated a series of per-cell attributes based on the transcriptome data (Figure 1D).
nGene and nUMI are cell size factors: the total number of detected genes and of transcripts per cell, respectively.
%MT is the percentage of transcripts accounted for by the 37 mitochondria-encoded genes, 35 of which were detected in our Drop-seq data for either human or macaque. It serves as an index of cell injury or viability, with a smaller %MT indicating a healthier cell.
%ChrX and %ChrY are the percentages of transcripts accounted for by the ChrX genes (1,707 detected for human, and 956 for macaque) or ChrY genes (189 detected for human, and 48 for macaque), respectively.
Gini index is a measure of gene expression inequality, calculated using either all genes or only the detected genes for each cell.
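The attributes listed above can be sketched for a single cell as follows. This is a hedged Python illustration: the function names, the dictionary layout, and the toy counts are hypothetical, and the Gini formula shown is the standard sorted-rank form, which is assumed rather than confirmed to match the implementation used in the study.

```python
def gini(values):
    """Gini index of non-negative values (0 = perfectly even expression)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n

def cell_attributes(counts, mito_genes):
    """counts: dict gene -> UMI count for one cell; mito_genes: set of gene names."""
    detected = {g: c for g, c in counts.items() if c > 0}
    n_umi = sum(detected.values())
    mt_umi = sum(c for g, c in detected.items() if g in mito_genes)
    return {
        "nGene": len(detected),
        "nUMI": n_umi,
        "pctMT": 100.0 * mt_umi / n_umi if n_umi else 0.0,
        "gini_all": gini(counts.values()),         # all genes, zeros included
        "gini_detected": gini(detected.values()),  # detected genes only
    }

# Toy cell with made-up counts.
cell = {"MT-CO1": 2, "DAZL": 6, "SYCP3": 2, "PRM1": 0}
attrs = cell_attributes(cell, mito_genes={"MT-CO1"})
```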
Germ cell comparative analysis
Gene Orthology Annotation
Orthologues are retrieved from Ensembl Biomart (Ensembl Gene Version 94, Oct 2018) using the biomaRt R package (version 2.38.0) from Bioconductor (Durinck et al., 2009). Pairwise orthologues (human vs macaque, human vs mouse) are retrieved separately and merged using the human Ensembl gene ID as reference. Genes that have no symbol name in the macaque annotation (but exist in our data) are assigned the gene name of their corresponding human orthologue. One-to-one-to-one (denoted as 1-1-1 later) orthologues across the three species are used in several analyses. These are genes that are annotated as “ortholog_one2one” in the homolog_ortholog_type column in Ensembl data for both the human-macaque pair and the human-mouse pair. Orthologues are further filtered by removing genes that are not detected in any of the three species in our study. Three genes (NDUFAB1, RPS26, SKP1) are removed due to the existence of other genes with the same name in macaque. In all, 11,023 1-1-1 orthologues are left and used for species comparison analysis (both for SPG and germ cell trajectory alignment).
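The merge of the two pairwise tables into 1-1-1 orthologues can be sketched as below. This Python sketch assumes each table has already been exported as rows of (human ID, other-species ID, orthology type); the function name and the gene IDs are hypothetical placeholders, and the detection and duplicate-name filters are not shown.

```python
def one_to_one_to_one(human_macaque, human_mouse):
    """Keep human genes annotated ortholog_one2one in BOTH pairwise tables."""
    def one2one(rows):
        return {h: other for h, other, kind in rows if kind == "ortholog_one2one"}
    macaque, mouse = one2one(human_macaque), one2one(human_mouse)
    # The human ID is the central reference that joins the two tables.
    return {h: (macaque[h], mouse[h]) for h in macaque if h in mouse}

# Placeholder IDs (not real Ensembl identifiers).
human_macaque = [
    ("H1", "Q1", "ortholog_one2one"),
    ("H2", "Q2", "ortholog_one2many"),
    ("H3", "Q3", "ortholog_one2one"),
]
human_mouse = [
    ("H1", "M1", "ortholog_one2one"),
    ("H2", "M2", "ortholog_one2one"),
    ("H3", "M3a", "ortholog_one2many"),
    ("H3", "M3b", "ortholog_one2many"),
]
triplets = one_to_one_to_one(human_macaque, human_mouse)
```

Only H1 survives here, because H2 and H3 each fail the one-to-one requirement in one of the two species pairs.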
Categories of Orthologues (Figure 5D)
Human, macaque and mouse genes, and known orthologue pairs between two species, are retrieved from Ensembl as described above. Only genes annotated as “protein_coding” are kept in this analysis. We define the following classes of orthologues.
α1, α2, α3: Species-specific genes, defined as genes that have no annotated orthologues in the other two species. When multiple Ensembl IDs map to the same gene symbol, they are merged as one gene. If only some of the Ensembl genes with the same symbol have annotated orthologues (while the others do not), the gene is classified as having orthologues and is removed from the species-specific category.
β1, β2, β3: two-species specific genes, defined as genes that are annotated to have orthologues between two species but not in a third species. Here the orthologues may include one-to-one and one-to-many orthologues between two species. Genes classified as both two-species specific and 1-1-1 orthologues are removed from two-species specific and kept in 1-1-1 orthologues (this happened when orthologue search showed inconsistencies when using human or mouse as the reference).
γ: x:y:z orthologues, defined as genes that have orthologues in all three species, but not in a 1-1-1 orthology relationship (may include multi-orthologues in one or more species, e.g., 1-many-many or 1–1-many relationships).
δ: 1-1-1 orthologues, defined as genes that have annotated one-to-one orthologues between human-mouse and human-macaque, using human as central reference.
Cell Filtering for Comparative Analyses
Spermatogonia (SPG) cells and non-SPG germ cells counts data are extracted from the first round of major cell type clustering for human, macaque and mouse. Expression data for 1-1-1 orthologues are kept for further analysis. To reduce noise and improve accuracy for species alignment, in all three species, only cells with more than 600 detected genes (among the 1-1-1 orthologues) are kept for further analysis.
SPG Comparative Analysis
Spermatogonia cells from the three species are jointly analyzed using Seurat CCA (R package version 2.3.4, the same version as all Seurat functions used in this study), using only 1-1-1 orthologues. Cells from the three species are normalized and scaled separately by species using Seurat, since we had verified that within-species batch effects (both between-individual and between-replicate) are relatively small compared to between-species differences. 2,000 highly variable genes are selected from each species separately, and the union of the highly variable genes is used for CCA analysis. We then used multipleCCA in Seurat to generate a three-species merged data object, where 15 CCs are calculated, and 8 CCs are used for three-species alignment according to the elbow point in the Shared Correlation Strength plot. Eight clusters are detected from the aligned data using parameter resolution = 0.6. Clusters 1–6 are kept as SPG clusters, and cluster 7 (112 cells) is removed from SPG analysis since these cells are suspected doublets due to the expression of mixed marker genes for two other clusters. Cluster 8 (298 cells) is moved from SPG to non-SPG germ cell analysis since these cells expressed markers for transitioning states from SPG to spermatocytes.
Marker calling:
Markers for each cluster in SPG1–6 are detected using built-in functions from Seurat: (1) FindMarkers function is used to call markers from 1-1-1 orthologs without considering species factor. FindConservedMarkers function is used to call markers from each species individually and the default meta-analysis method (minimump) is used to calculate p-values and corrected p-values. (2) For detecting markers using all genes from each species, data with all gene expression are imported into Seurat and cell cluster labels 1–6 from the SPG CCA analysis are used as the cluster labels. Then, FindMarkers function is used to call markers. This process is done for each species independently.
Cluster-cluster correlation:
We calculated the cluster centroids by averaging the expression level (before log transformation) for each gene across all the single cells within each cluster. Then, we calculated the Spearman rank correlation matrix of the transcriptomic profiles for the six cluster centroids (Figure S2B).
Cell-cell and cluster-cluster Jaccard distance:
Following the Seurat pipeline, we obtained 8 shared CCs, based on which we merged the three species and did species/batch correction. We then identified the 50 nearest neighbors for each cell, and for every pair of cells, we calculated their degree of neighbor sharing, which we refer to as their Jaccard distance, formally defined as the number of neighbors in their intersection divided by the number in their union (so larger values indicate more similar neighborhoods). To summarize it into a cluster-cluster distance among the three species, we calculated the average Jaccard distance between all the pairs of cells and ordered them first by clusters within a species and then between species (Figure S2F).
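The neighbor-sharing measure and its cluster-level summary can be sketched as follows. This is a minimal Python illustration: the function names and the toy neighbor lists (3 neighbors per cell instead of 50) are hypothetical, and the nearest-neighbor search in aligned CC space is assumed to have been done already.

```python
def jaccard_sharing(neighbors_a, neighbors_b):
    """Shared neighbors divided by the union of neighbors for two cells."""
    a, b = set(neighbors_a), set(neighbors_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mean_cluster_sharing(knn, cluster_a, cluster_b):
    """Average neighbor sharing over all pairs of distinct cells from two clusters."""
    pairs = [(i, j) for i in cluster_a for j in cluster_b if i != j]
    return sum(jaccard_sharing(knn[i], knn[j]) for i, j in pairs) / len(pairs)

# Toy nearest-neighbor lists keyed by cell index (made-up values).
knn = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [4, 5, 6]}
within = mean_cluster_sharing(knn, [0, 1], [2])
between = mean_cluster_sharing(knn, [0, 1, 2], [3])
```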
SPG Cell Cycle Analysis
We used the Whitfield et al. study as the reference dataset for the mitotic cell cycle (Whitfield et al., 2002). Whitfield et al. collected synchronized cycling HeLa cells and analyzed microarray-based gene expression data, providing a reference for periodic gene expression patterns during the cell cycle. A dataset with 47 samples is used in this analysis. One cycle for cultured HeLa cells is ~15.4 hrs. Samples are collected every hour for 47 hrs, covering ~3 cycles. To model the cyclic patterns, we folded the 47 samples into one cell cycle according to their relative time in a cycle. Genes are ranked by their peak time (calculated as the atan2 value of the phase). IMAGE IDs were used as feature IDs in the cDNA microarray. We converted gene symbols used in our data into IMAGE IDs by using the conversion table downloaded from MatchMiner (https://discover.nci.nih.gov/matchminer). Since one IMAGE ID may correspond to multiple gene symbols, the genes in our single-cell expression data are converted to IMAGE IDs by summing the expression levels of all gene symbols that correspond to the same IMAGE ID. Pseudo-cycle-time is inferred for SPG cells as follows: for each cell, we calculated its Spearman rank correlation with each of the 47 reference samples of known cycle phase using 492 cyclic genes (579 corresponding to IMAGE IDs). Then, a pseudo-cycle-time in the range of 1–47 is assigned to the cell according to the reference sample with which it has the highest correlation.
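The assignment step can be sketched as below: a cell receives the 1-based index of the reference sample with which its profile has the highest Spearman correlation. This is a plain-Python illustration with hypothetical function names and toy profiles (3 reference samples and 4 features instead of 47 samples and the cyclic gene set); the folding of samples into one cycle and the IMAGE ID conversion are not shown, and constant profiles are not handled.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return out

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def pseudo_cycle_time(cell, reference_samples):
    """1-based index of the best-correlated reference sample."""
    corrs = [spearman(cell, ref) for ref in reference_samples]
    return corrs.index(max(corrs)) + 1

# Toy reference samples in time order (made-up values).
reference = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 4, 1, 3]]
```

For example, `pseudo_cycle_time([10, 20, 30, 45], reference)` picks the first reference sample, whose gene ranking matches that of the cell.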
Germ Cell Trajectory Comparative Analysis: Universal Pseudotime
To align the three species, we collapsed the spermatogonia populations to a single group, and ordered the remaining non-SPG germ cells (N=18,537 mouse, N=12,016 macaque, N=4,293 human) along a linear trajectory using 2,000 highly expressed and highly variable genes selected independently for each species. We inferred the trajectory pseudo-time using Monocle2 (R package version 2.10.1) (Qiu et al., 2017) for each species independently. The pseudo-time values from Monocle were used as input for cellAlign (R package version 0.1.0) (Alpert et al., 2018). cellAlign evenly cut the Monocle trajectory into 200 segments, and estimated the expression profile for each of the 200 pseudo-points as the mean profile of all the cells assigned to it. We combined the pseudo-time point centroids of the three species (200 each) and for each species added the transcriptomic centroid for spermatogonia as an “anchor”, resulting in 201 pseudo-time points per species. We performed PCA for the merged centroid datasets (201 × 3 = 603 centroids) using 247 genes, which is the intersection of the 2,000 highly variable genes for each species initially selected by Monocle (Figure 4B). If the differentiation process started and ended in equivalent states between two species and advanced at the same pace through intervening states, a perfect diagonal line would be observed. However, heterochrony occurs between species, as deviations from the diagonal are observed (Figure 4A), suggesting that the biological steps defined separately for each species, using the current standard method, proceed at different rates, even when the comparison is based on the same set of 247 genes.
To directly compare equivalent molecular states across human, macaque, and mouse, we fit a principal curve in PC1-PC3 space for the 545 points in three trajectories (201 from human, 201 from mouse and 143 from macaque, excluding the final 58 points of the macaque trajectory). Each of the pseudo-time points is mapped to a point on the principal curve, yielding a one-dimensional coordinate for each pseudo-point. To further smooth the trajectory alignment and reduce noise (as some of the 201 points contained too few cells), we cut the principal curve evenly into 20 segments. Each segment forms a cell cluster for each species.
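As a rough illustration of the final binning step, the sketch below assigns each pseudo-point to one of 20 equal-width segments of its one-dimensional curve coordinate. Equal width along the coordinate is an assumed reading of “evenly”; the function name and the toy coordinates are hypothetical, and the principal-curve fit itself is not reproduced.

```python
def assign_segments(coords, n_segments=20):
    """Map 1-D curve coordinates to segment labels 1..n_segments (equal width)."""
    lo, hi = min(coords), max(coords)
    width = (hi - lo) / n_segments
    # The maximum coordinate falls on the upper edge, so clamp it into the last bin.
    return [min(int((c - lo) / width), n_segments - 1) + 1 for c in coords]

# Toy coordinates along a curve of length 20 (made-up values).
coords = [0.0, 0.5, 7.25, 19.9, 20.0]
segments = assign_segments(coords)
```

Pseudo-points from different species that land in the same segment are then treated as one aligned cluster per species.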
To assess whether the between-species alignment curve is driven by the choice of highly variable genes, we repeated the initial Monocle pseudo-time assignment step using only the 247 intersected highly variable genes (rather than the 2,000 genes selected for that species). This led to slight shifts of the cells’ assignment into the original 200 pseudo-points, but the major patterns of two-species heterochrony remained (Figure S4A).
Non-SPG Germ Cell Expression Dynamics via Gene Clustering by K-means
After mapping the Monocle pseudotime to the universal time, we obtained the gene expression centroids for the 21 cell clusters (1 SPG cluster + 20 aligned non-SPG germ clusters). For each species, we used k-means clustering of genes to identify major dynamic patterns across the 21 ordered states. Cluster centroids are log-transformed and standardized by gene. Only 1-1-1 orthologues are used in this analysis. We selected highly variable genes for each species independently: for each gene, we calculated the gene expression mean and variance. By setting thresholds for gene mean and gene variance/mean according to their distributions, different numbers of HVGs are selected for each species (2,063, 1,472, and 2,115 for human, macaque and mouse, respectively). The k-means function in R (parameter: k=6) is used to cluster highly variable genes in each species into 6 categories. Genes are ordered by their assignment to the six k-means clusters and displayed as a heatmap for each species, with orthologous pairs connected by lines across pairs of species (Figure 5A). GO enrichment analyses for genes in each cluster are conducted using DAVID (v6.8) to determine the functional themes in each gene cluster (Figure 5B). To focus on informative GO terms, we removed GO terms that contain too many genes (>500) or too few genes (<50).
For each pairwise comparison (for example, human vs. macaque), we initially clustered genes that are highly variable in one species, and some of these genes are not included in the other species’ clustering results because the gene is not considered highly variable, or not highly expressed. To “bring back” a missing gene like this into the two-species comparison, we calculated its correlation with the mean expression profiles for the 6 gene clusters. If the highest correlation is higher than 0.8, the gene is assigned to that cluster (even though it initially failed to be selected as a highly variable gene). We did this for both species reciprocally, and repeated for each pairwise comparison. In Figure 6B, each gene cluster consists of both genes that are highly variable in both species and genes that are highly variable in one species but not in the other, yet have an expression pattern that is highly correlated (>0.8) with one of the 6 cluster centroids.
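The “bring back” rule can be sketched as below. This Python sketch assumes Pearson correlation, which the text does not specify; the function names and the toy profiles (two cluster mean profiles over 4 states instead of six over 21) are hypothetical.

```python
import math

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def rescue_gene(profile, cluster_means, threshold=0.8):
    """Return the 1-based cluster whose mean profile best matches, or None."""
    corrs = [pearson(profile, mean) for mean in cluster_means]
    best = max(range(len(corrs)), key=corrs.__getitem__)
    return best + 1 if corrs[best] > threshold else None

# Toy cluster mean profiles across ordered states (made-up values).
cluster_means = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
```

A gene whose best correlation stays at or below 0.8 returns `None` and remains outside the two-species comparison.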
Dynamics of TF Activity in Germ Cells via Regulon Analysis
We inferred regulon activities for germ cells from scRNA-seq data using the R package SCENIC v1.1.2.2 (Aibar et al., 2017), which has three steps. First, we identified potential modules based on co-expression between TFs and candidate targets for 26 germ cell type/state centroids in human or mouse using GENIE3. We then performed the TF-motif enrichment analysis and identified direct targets in species-specific databases using RcisTarget. Each TF with its potential direct targets constitutes one regulon. Lastly, we evaluated the activity of each regulon for each germ cell type centroid using AUCell, which calculates an AUC score by integrating the expression ranks across all genes in a regulon. We obtained 82 regulons with at least 10 target genes for human germ cell data, and 88 regulons with ≥10 target genes for mouse germ cell data. Among these, 22 regulons overlapped between human and mouse data (Table S5). To determine if the dynamic patterns of these 22 regulons are conserved between human and mouse, we calculated Spearman’s rank correlation of the activity of each regulon between human and mouse germ cell centroids. The regulons with rank correlation of >0.40 (N=17) are considered conserved, while those with rank correlation between −0.11 and 0.40 (N=5) are considered divergent (Figure S5B).
Somatic cells in testis niche
Focused Somatic Cell Subclustering and Cell Type Annotation
For human data, we extracted somatic cells (N=3,722) from the global analysis. Since the within-subject batch effect was minimal, while cross-subject batch effect was not trivial, we corrected for cross-subject batch effect by canonical correlation analysis (CCA) using Seurat2. We selected the union of top 1K highly variable genes for somatic cells of each subject, and merged gene expression data across subjects by Multi-CCA. We identified common sources of variation across the 4 subjects and corrected for batch effect by aligning top 15 CC subspaces. We performed Louvain-Jaccard clustering using the top 15 aligned CCs (ACCs) and obtained 7 clusters. We then ordered the 7 clusters by minimizing Euclidean distance of cluster centroids using OLO by Seriation. We selected markers for each cluster by comparing the expression level in one cluster against that in all other clusters using a nonparametric binomial test as described above. For functional enrichment analysis of the 7 somatic marker lists, we analyzed Gene Ontology (GO) enrichment using GOrilla (Eden et al., 2009) (accessed on Feb 4, 2019). GO terms that contain too many (>500) or too few (<50) genes were removed as their enrichment p-values were either too easy to be significant or too unstable. By evaluating the markers and GO terms for each cluster, and by referring to known markers for testis niche cell types, we annotated the 7 clusters as T cells, Macrophage, Endothelial, m-Pericytes, f-Pericytes, Myoid, and ImmLeydig, respectively, and incorporated these into major cell types in the global atlas (Figure 1B insert). The Jaccard distance from Louvain-Jaccard clustering for all cells of the 7 human testis somatic cell types was displayed in a heatmap (Figure S6A).
For macaque data, we extracted somatic cells (N=2,098) from the global clustering results. While both within-animal and cross-animal batch effects for macaque somatic cells were minimal, in order to be consistent with the analysis of human somatic cells, we corrected for batch effect using CCA for macaque somatic cells. We selected the union of top 1K highly variable genes for somatic cells of each animal, and merged gene expression matrix across animals by Multi-CCA. We corrected for batch effect by aligning top 14 CC subspaces. We performed Louvain-Jaccard clustering using top 14 ACCs and obtained 6 clusters. We then ordered the 6 clusters by minimizing Euclidean distance of cluster centroids using OLO by Seriation. We selected markers for each cluster by comparing the expression level in one cluster against that in all other clusters using a nonparametric binomial test as described above. By evaluating the known markers for testis niche cell types and markers for each cluster, we merged 2 clusters and classified the resulting 5 clusters as Immune cells, Endothelial, Pericytes, Myoid, and ImmLeydig. For macaque Pericytes (N=174), we combined Pericytes from the 5 animals given there were very few cells in each and trivial batch effect. We reclustered the Pericytes and obtained 2 subclusters. We obtained markers distinguishing the two Pericytes subclusters and annotated them as m-Pericytes and f-Pericytes. We then cross-tabulated rank correlation of the two Pericyte cell type centroids between human and macaque, and confirmed that the two macaque Pericytes subclusters correspond to the human m-Pericytes and f-Pericytes. For macaque immune cells (N=37), we did focused clustering and obtained 2 subclusters. We obtained markers distinguishing the two immune cell subclusters and annotated them as T cells and macrophages. 
Given there were too few cells in some cell types of macaque, we also did focused subclustering of somatic cells using all 3 species to borrow information across species. We used 1-1-1 orthologue genes, merged somatic cells from the 3 species by multi-CCA using the union of HVGs, and corrected for batch effect by aligning the top 17 CCs. We obtained 11 clusters by Louvain-Jaccard clustering using the top 17 CCs, which split macaque immune cells into two subclusters, one grouped together with human T cells and mouse Innate lymphoid cells, and the other grouped together with human and mouse macrophages. The two immune cell subclusters obtained from focused immune cell subclustering in macaque alone or from 3-species somatic clustering were consistent. After incorporating the two Pericyte cell types and two immune cell types obtained above, there are 7 somatic cell types for macaque: T cells, Macrophage, Endothelial, m-Pericytes, f-Pericytes, Myoid, and ImmLeydig (Figure 1B insert). The Jaccard distance from Louvain-Jaccard clustering for all cells of the 7 macaque testis somatic cell types was displayed in a heatmap (Figure S6A).
Cross-species Comparison for Somatic Cell Types
Unlike for mouse where we did Sertoli-enrichment and 1n-depletion experiments and obtained a sufficient number of Sertoli cells, we did not detect any Sertoli cell for human or macaque testis samples. We therefore excluded Sertoli cells in the cross-species comparison of somatic cell types. We calculated centroids for all somatic cell types from the 3 species (7 somatic cell types for human, 7 for macaque, and 6 for mouse after excluding Sertoli), and then centered the somatic cell type centroids of each species by the mean of the 7 (or 6) centroids for that species, to correct for cross-species difference. To calculate centroid-centroid correlations, we extracted from 1-1-1 orthologues 4 sets of genes: 1) the intersect of top 6K highly variable genes selected for each species (N=673 genes); 2) the union of top 50 markers for each somatic cell type from each of 3 species (N=575 genes); 3) the union of top 6K highly variable genes selected for each species (N=6,124); and 4) genes encoding known ligands (N=311). The first two sets of genes overlapped by 188 genes. Using these gene sets, we calculated the rank correlation coefficients of all pairs of somatic cell type centroids over three species (Figure 6E, S6E). The 4 sets of genes gave similar results. In addition, we performed PCA for all centroids in three species using intersect of HVG (Figure 6D). We overlaid to the PC plot ellipses to highlight groups of similar cell types, with 50% confidence interval for cell types observed across species. To aid visualization we used symbol color to indicate a set of cell types matched across species, and used symbol shapes to indicate species.
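The per-species centering of centroids can be sketched as follows. This is a minimal Python illustration with a hypothetical function name and made-up values for two genes; the subsequent rank correlation across species is not shown.

```python
def center_by_species(centroids):
    """Subtract, per gene, the mean over all cell type centroids of ONE species.

    centroids: dict cell_type -> list of per-gene expression values.
    """
    n_types = len(centroids)
    n_genes = len(next(iter(centroids.values())))
    gene_means = [sum(vals[g] for vals in centroids.values()) / n_types
                  for g in range(n_genes)]
    return {cell_type: [x - m for x, m in zip(vals, gene_means)]
            for cell_type, vals in centroids.items()}

# Toy centroids for one species (made-up values).
human = {"Myoid": [1.0, 4.0], "Endothelial": [3.0, 0.0]}
centered = center_by_species(human)
```

Centering within each species removes a species-wide offset for every gene, so that correlations reflect differences among cell types rather than between species.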
Ligand-Receptor Analysis
We used the previously curated list of ligand-receptor (L-R) pairs in human as reference (Ramilowski et al., 2015). Their corresponding orthologues are used for macaque and mouse. We selected L-R pairs with either a highly variable ligand gene among the 7 somatic cell centroids, or a highly variable receptor gene among the 26 germ cell centroids (6 SPG and 20 non-SPG clusters). Thresholds for highly variable genes were set for both gene mean and variance across the clusters according to the observed distribution (human germ: mean>0.15, var/mean>0.2; human somatic: mean>0.08, var/mean>0.08; monkey germ: mean>0.1, var/mean>0.15; monkey somatic: mean>0.15, var/mean>0.2; mouse germ: mean>0.08, var/mean>0.08; mouse somatic: mean>0.15, var/mean>0.15). Each species was analyzed separately. For each ligand-receptor pair we calculated its apparent signaling strength as an “Interaction Score”, defined as the product of the mean expression level of the ligand in a somatic cell type and that of the receptor in a germ cell type. In all, we calculated such an Interaction Score matrix for each species, for 7 × 26 = 182 cell type pairs and 1,059, 529, and 785 L-R pairs for human, monkey and mouse, respectively.
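The Interaction Score defined above is a simple product, sketched below in Python. The function name, the nested-dictionary layout, and the expression values are hypothetical; KITLG-KIT is used only as a familiar example pair, not as a reported result.

```python
def interaction_scores(ligand_expr, receptor_expr, lr_pairs):
    """Score = mean ligand expression (somatic) x mean receptor expression (germ)."""
    scores = {}
    for ligand, receptor in lr_pairs:
        for soma, ligand_means in ligand_expr.items():
            for germ, receptor_means in receptor_expr.items():
                scores[(soma, germ, ligand, receptor)] = (
                    ligand_means.get(ligand, 0.0) * receptor_means.get(receptor, 0.0)
                )
    return scores

# Toy centroid means (made-up values).
ligand_expr = {"Myoid": {"KITLG": 2.0}, "Endothelial": {"KITLG": 0.5}}
receptor_expr = {"SPG1": {"KIT": 3.0}, "SPC": {"KIT": 0.0}}
scores = interaction_scores(ligand_expr, receptor_expr, [("KITLG", "KIT")])
```

With 7 somatic and 26 germ cell centroids, the same loops yield the 182 cell type pairs per L-R pair described in the text.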
Overall number of L-R interactions:
To compare the general signaling pattern across somatic and germ cells and then across species, we defined “strong interactions” for each species by keeping the highest 5% of Interaction Scores for that species. For example, for human data, we have a matrix of such scores with 182 rows and 1,059 columns, and we kept the highest 5% of the entries as “strong interactions”. We then calculated the number of such strong L-R interactions for each pair of somatic and germ cell types as their overall interaction strength, and displayed them as the line width of arrows in the pairwise interaction plots (Fig 7A).
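The thresholding and counting step can be sketched as follows. In this Python sketch the function name and data layout are hypothetical, the cutoff count is rounded down, and ties at the cutoff are broken by sort order; none of these details is specified in the text.

```python
from collections import Counter

def strong_interaction_counts(scores, top_fraction=0.05):
    """Count, per (somatic, germ) pair, the entries among the top-scoring fraction.

    scores: dict (somatic_type, germ_type, lr_pair) -> Interaction Score.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    n_keep = max(1, int(top_fraction * len(ranked)))
    return Counter((soma, germ) for (soma, germ, _pair), _score in ranked[:n_keep])

# Toy matrix flattened to 40 entries (made-up scores 0..39).
toy_scores = {
    ("Myoid" if i >= 38 else "ImmLeydig", "SPG", f"pair{i}"): float(i)
    for i in range(40)
}
counts = strong_interaction_counts(toy_scores)
```

The resulting count per somatic-germ pair is the quantity that would set the arrow width.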
Individual L-R pairs:
For each L-R pair, we modeled the distribution of the 182 scores (7 somatic times 26 germ cell types) as a mixture of three distributions, or modes, using k-means clustering at k=3. A loose interpretation of the three modes is that one corresponds to high expression of both ligand and receptor, and the other two correspond to L-R pairs in which either one or both members are lowly expressed. Of the three modes, the scores in the highest group were shown as line width in the interaction arrow plots (Fig 7B, Fig S7C), while those in the middle and lowest modes were omitted.
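Splitting one L-R pair's scores into three modes and keeping the highest can be sketched as below. This is a small one-dimensional Lloyd-style k-means in plain Python with a deterministic start, written for illustration only; the study's k-means implementation may initialize and iterate differently, and the function names and toy scores are hypothetical.

```python
def kmeans_1d(values, k=3, n_iter=100):
    """Cluster scalars into k groups; returns (labels, centers)."""
    xs = sorted(values)
    # Deterministic start: spread initial centers over the sorted values.
    centers = [xs[(len(xs) - 1) * i // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def top_mode_members(values):
    """Indices of the scores that fall in the mode with the highest center."""
    labels, centers = kmeans_1d(values)
    top = max(range(len(centers)), key=centers.__getitem__)
    return [i for i, lab in enumerate(labels) if lab == top]

# Toy scores for one L-R pair (7 values instead of 182).
toy = [0.0, 0.1, 0.2, 1.0, 1.1, 5.0, 5.2]
kept = top_mode_members(toy)
```

Only the entries in `kept` would be drawn as arrows; the middle and lowest modes are dropped.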
Further three-species comparisons:
To discover L-R pairs with conserved or divergent interaction patterns among the three species, we collected L-R pairs from the individual species analyses and focused on the L-R pairs where both the ligand gene and the receptor gene are 1-1-1 orthologues. Then, for each L-R pair, we calculated the two-species correlation of the interaction scores across the 182 cell type pairs, repeated this over all three two-species comparisons, and averaged the three pairwise correlations as a measure of this L-R pair’s level of conservation. Since only five somatic cell types in mouse can be matched to those in human and monkey, we only used five somatic cell types when calculating the human-mouse and monkey-mouse correlations, involving 5 × 26 = 130 cell type pairs. L-R pairs with the highest and lowest average correlations are shown in Fig S7A, as examples of high and low conservation of signaling pathways, respectively.
QUANTIFICATION AND STATISTICAL ANALYSIS
For the Drop-seq experiments, human cells were collected from 9 experiments without enriching specific cell types of the seminiferous tubules, and 1 experiment with interstitial-targeted enrichment. Macaque cells were collected from 9 experiments without targeted enrichment.
For quantification of IF/IHC experiments, single positive cells for each marker and double positive cells were counted in cross sections of seminiferous tubules. At least 100 tubules were counted per sample for 3 replicate samples, each from a unique donor.
For quantification of xenotransplantation experiments, SSC colonies were counted from entire whole-mount testes if colonies contained at least 4 cells in a continuous area (≤ 100 μm between cells), were located on the basement membrane of the seminiferous tubules, were ovoid shaped, and had a high nuclear to cytoplasmic ratio. The average number of positive colonies was determined from ~10 testes counted per fraction (10 unsorted, 12 TSPAN33-negative, 11 TSPAN33-positive) from 3 replicate samples, each from a unique donor. Two-tailed t-tests were performed in Prism.
Statistical methodologies and software packages used for analysis of single cell data are described according to the STAR Methods format above. All analyses were performed in R.
Supplementary Material
Table S1. Markers for 11 global testis cell types, Related to Figure 1 and Figure S1.
A. Average expression of all genes for 11 human testis cell types. For each detected gene in human testis, we calculated its mean value for each of the 11 major cell types by averaging its normalized expression value within each cell type, then adding 1 and taking the natural log. Each row is a detected gene in human testis; each column is one of the 11 major cell types for human testis.
B. Markers for 11 human testis cell types. Markers were obtained by comparing each cell type against all other 10 cell types for human testis using binomial likelihood test embedded in Seurat v2.3.4. For each cell type, only genes that satisfied three conditions were considered as marker genes: 1) with a minimum of 20% difference in detection rate in the two groups; 2) with a minimum of 2-fold higher expression in this cell type; 3) p < 0.01. Each row is a marker for a given cell type ranked by p-value; from left to right, the columns are: cell type, gene name, p-value, natural log-scale fold change, fraction of cells expressing the marker in this cell type, fraction of cells expressing the marker in all other cell types, and Bonferroni adjusted p-value. On the right side of the table, we summarized the number of cells and markers for each of the 11 major cell types for human testis. In total, we identified 3,035 markers in 13,837 cells in 11 major cell types for human testis.
C. Average expression of all genes for 11 macaque testis cell types. For each detected gene in macaque testis, we calculated its mean value in each of the 11 major cell types by averaging its normalized expression value within each cell type, then adding 1 and taking the natural log. Each row is a detected gene in macaque testis; each column is one of the 11 major cell types for macaque testis.
D. Markers for 11 macaque testis cell types. Markers were obtained by comparing each cell type against all other 10 cell types for macaque testis using binomial likelihood test with the same thresholds as described above; from left to right, the columns are: cell type, gene name, p-value, natural log-scale fold change, fraction of cells expressing the marker in this cell type, fraction of cells expressing the marker in all other cell types, and adjusted p-value. On the right side of the table, we summarized the number of cells and markers for each of the 11 major cell types for macaque testis. In total, we identified 4,380 markers in 21,574 cells in 11 major cell types for macaque testis.
E. Human and macaque putative MSCI escapee genes. Escapees were defined as having average (normalized expression) >0.5 in spermatocytes and >2 fold increase in expression from spermatogonia. From left to right, the columns are: gene name, mean expression level in spermatocytes (SPC), mean expression level in spermatogonia (SPG), expression ratio between SPC and SPG, and chromosomal location.
Table S2. Markers from independent iterative clustering of human and macaque germ cells, Related to Figure S1 and STAR Methods.
A. Markers for human germ cell clusters 1–7. Markers were obtained by comparing each cell type against all other cell types for human testis using the FindMarkers function in Seurat v2.3.4. For each cell type, only genes that satisfied three conditions were considered as markers: 1) a minimum of 20% difference in detection rate between the two groups; 2) a minimum of 2-fold higher expression in this cell type; 3) p < 0.01. All other parameters were left at their defaults. Each row is a marker for a given cell type, ranked by p-value; from left to right, the columns are: gene name, p-value, log-scale fold change, fraction of cells expressing the marker in this cell type, fraction of cells expressing the marker in all other cell types, adjusted p-value, and cluster number. In total, we identified 2,050 markers in 7 germ cell types for human testis.
B. Markers for macaque germ cell clusters 1–7. Marker detection method and display format are the same as in A. In total, we identified 4,061 markers in 7 germ cell types for macaque testis.
C. Markers for human spermatogonia cell clusters 1–4. Marker detection method and display format are the same as in A. In total, we identified 769 markers in 4 spermatogonia cell types for human.
D. Markers for macaque spermatogonia cell clusters 1–4. Marker detection method and display format are the same as in A. In total, we identified 443 markers in 4 spermatogonia cell types for macaque.
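The three marker conditions listed for Table S2 can be read as a filter on per-gene summary statistics. The sketch below assumes that these statistics have already been computed elsewhere (for example, exported from a differential expression test) and only illustrates the thresholding and ranking step. The record field names are hypothetical, the fold change is assumed to be on the natural log scale, and the detection-rate difference is assumed to be taken as an absolute difference.

```python
import math


def select_markers(stats, min_diff_pct=0.20, min_fold=2.0, max_p=0.01):
    """Keep genes meeting all three conditions, ranked by p-value.

    stats: list of dicts with keys
      "gene", "pct_in", "pct_out" (fractions of expressing cells),
      "ln_fc" (natural-log fold change; an assumption), "p_value".
    """
    ln_fc_cutoff = math.log(min_fold)  # 2-fold corresponds to ln(2), about 0.693
    kept = [
        row for row in stats
        if abs(row["pct_in"] - row["pct_out"]) >= min_diff_pct
        and row["ln_fc"] >= ln_fc_cutoff
        and row["p_value"] < max_p
    ]
    return sorted(kept, key=lambda row: row["p_value"])


# Hypothetical records, for illustration only.
example_stats = [
    {"gene": "G1", "pct_in": 0.80, "pct_out": 0.20, "ln_fc": 1.5, "p_value": 1e-20},
    {"gene": "G2", "pct_in": 0.50, "pct_out": 0.45, "ln_fc": 1.0, "p_value": 1e-5},
    {"gene": "G3", "pct_in": 0.60, "pct_out": 0.10, "ln_fc": 0.5, "p_value": 1e-8},
    {"gene": "G4", "pct_in": 0.70, "pct_out": 0.30, "ln_fc": 0.9, "p_value": 1e-3},
    {"gene": "G5", "pct_in": 0.90, "pct_out": 0.20, "ln_fc": 2.0, "p_value": 0.05},
]
marker_genes = [row["gene"] for row in select_markers(example_stats)]
```

Passing min_diff_pct=0.10 and min_fold=1.3 gives the looser thresholds described for the consensus spermatogonia clusters in Table S3.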
Table S3. Markers for consensus spermatogonia clusters. Related to Figure 2 and Figure S2.
A. Markers detected using meta-analysis for three species. Markers were obtained by comparing each SPG type against all other SPG types using the FindConservedMarkers function in Seurat v2.3.4. In each selection, only genes that satisfied two conditions were considered: 1) a minimum of 10% difference in detection rate between the two groups; 2) a minimum of 1.3-fold higher expression in the cell type. All other parameters were left at their defaults. In total, we identified 432 markers in six consensus clusters for three species.
B. Markers detected by merging three species together. Markers were obtained by comparing each SPG type against all other SPG types for human testis using the FindMarkers function in Seurat v2.3.4. In each selection, only genes that satisfied three conditions were considered: 1) a minimum of 10% difference in detection rate between the two groups; 2) a minimum of 1.3-fold higher expression in the cell type; 3) p < 0.01. All other parameters were left at their defaults. In total, we identified 3,751 markers in six consensus clusters for three species.
C. Human markers detected for the consensus clusters using all human genes. Human SPG cells were extracted from the original expression count matrix with all genes included, and the six consensus cluster labels were applied. Marker detection method and table layout are the same as in B. In total, we identified 2,180 markers in six consensus clusters for human.
D. Macaque markers detected for the consensus clusters using all macaque genes. Macaque SPG cells were extracted from the original expression count matrix with all genes included, and the six consensus cluster labels were applied. Marker detection method and table layout are the same as in B. In total, we identified 1,949 markers in six consensus clusters for macaque.
E. Mouse markers detected for the consensus clusters using all mouse genes. Mouse single cells were extracted from the original expression count matrix with all genes included, and the six consensus cluster labels were applied. Marker detection method, thresholds, and table layout are the same as in B. In total, we identified 1,555 markers in six consensus clusters for mouse.
F. SPG consensus cluster centroid profiles.
Table S4. Markers for 20 aligned germ cell clusters, Related to Figure 4.
A. Human markers. Human single cells were extracted from the original expression count matrix with all genes included, and the three-species aligned 20-stage labels were applied. We identified genes that have high expression in one stage and low expression in its flanking stages by selecting genes with a high rank correlation (cutoff 0.35–0.7, varied by the number of markers in each stage) with a delta function that peaks at a specific stage. The second column denotes the stage at which the gene has peak expression. The following 20 columns show the mean expression of each gene in each stage, standardized across the 20 stages.
B. Macaque markers. Same method as in A, applied to macaque.
C. Mouse markers. Same method as in A, applied to mouse.
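The stage-specific selection described for Table S4 can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the legend states only that a rank correlation was used, so Spearman correlation is an assumption here, as are the function names and the rule of assigning each gene to the stage whose delta function gives the highest correlation.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a flat profile carries no stage signal
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def peak_stage(profile, cutoff=0.35):
    """Return (stage, rho): the 1-based stage whose delta function correlates
    best with the profile, or (None, rho) if the best rho is below the cutoff."""
    n = len(profile)
    best_stage, best_rho = None, -1.0
    for stage in range(n):
        delta = [1.0 if s == stage else 0.0 for s in range(n)]
        rho = spearman(profile, delta)
        if rho > best_rho:
            best_stage, best_rho = stage + 1, rho
    return (best_stage, best_rho) if best_rho >= cutoff else (None, best_rho)


# Hypothetical 20-stage profile expressed only at stage 5.
sharp_profile = [0.0] * 4 + [5.0] + [0.0] * 15
sharp_stage, sharp_rho = peak_stage(sharp_profile)
```

Under this reading, a gene detected in a single stage reaches a correlation of 1, while a gene expressed at distinct levels in all 20 stages cannot exceed roughly 0.38 even when its maximum falls at the target stage, so raising the cutoff selects for more stage-restricted expression.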
Table S5. Germ cell gene k-means clustering (species pairwise comparison), Related to Figure 5 A, B.
A. Human vs. Macaque. Gene labels from k-means clustering of genes according to their expression profiles across the 20 aligned stages (see Methods for details). The first two columns are the names of the orthologues in human and macaque. The last two columns are the cluster numbers into which the genes are classified in each species. 1–6 denote the six gene clusters; “NULL” denotes that the gene could not be classified into any of the six clusters in that species (see Methods for details).
B. Macaque vs. Mouse. Same method as in A, applied to macaque and mouse.
C. Human vs. Mouse. Same method as in A, applied to human and mouse.
D. Human GO enrichment analyses. Gene ontology terms enriched in each gene cluster for human against default human background genes using GOrilla. The last column denotes the cluster number of the genes.
E. Macaque GO enrichment analyses. Same method and format as in D.
F. Mouse GO enrichment analyses. Same method and format as in D.
G. Human Regulon Activity. Activities (i.e., AUC scores, see Methods) of 82 regulons identified across the 26 germ cell type centroids of human. Each row is a regulon composed of a TF with at least 10 potential direct targets; the “extended” regulons include motifs that have been linked to the TF by lower confidence annotations (e.g. inferred by motif similarity). Each column is one of the 26 germ cell types (6 SPG stages and 20 non-SPG germ cell stages).
H. Mouse Regulon Activity. Activities (i.e., AUC scores) of 88 regulons identified across the 26 germ cell type centroids of mouse. Same method and format as in G.
Table S6. Markers for 7 somatic cell types, Related to Figure 6 and Figure S6.
A. Average normalized expression of all genes for 7 human somatic cell types. For each detected gene in human somatic cells, we calculated its mean value in each of the 7 somatic cell types by averaging the normalized expression of the gene within each cell type, then adding 1 and taking the natural logarithm. Each row is a detected gene; each column is a somatic cell type for human testis.
B. Markers for 7 human somatic cell types. Markers were obtained by comparing each cell type against the other 6 somatic cell types for human using the binomial likelihood test. In each selection, genes that satisfied three conditions were considered: 1) a minimum of 20% difference in detection rate between the two groups; 2) at least 2-fold higher expression in the cell type; 3) p < 0.01. Each row is a marker for a given cell type, ranked by p-value; from left to right, the columns are: cell type, gene name, p-value, log-scale fold change, fraction of cells expressing the marker in this cell type, fraction of cells expressing the marker in all other cell types, and adjusted p-value.
C. Average normalized expression of all genes for 7 macaque somatic cell types. For each detected gene in macaque somatic cells, we calculated its mean value in each of the 7 somatic cell types by averaging the normalized expression of the gene within each cell type, then adding 1 and log-transforming. Each row is a detected gene; each column is a somatic cell type. On the right side of the table, we report the fraction of cells expressing each gene in each somatic cell type.
D. Markers for 7 macaque somatic cell types. Markers were obtained by comparing each cell type against the other 6 somatic cell types for macaque using the binomial likelihood test with the same thresholds as described above. From left to right, the columns are: cell type, gene name, p-value, log-scale fold change, fraction of cells expressing the marker in this cell type, fraction of cells expressing the marker in all other cell types, and adjusted p-value.
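The order of operations behind the average expression tables (average within each cell type first, then add 1 and take the natural logarithm) can be made explicit with a short sketch. The input layout, the function name, and the gene and cell type values below are assumptions for illustration; genes absent from a cell's record are treated as zero.

```python
import math
from collections import defaultdict


def log_mean_expression(cells):
    """Compute ln(mean normalized expression + 1) per gene and cell type.

    cells: list of (cell_type, {gene: normalized expression}) tuples.
    Returns {cell_type: {gene: value}} covering every detected gene.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    genes = set()
    for cell_type, expression in cells:
        counts[cell_type] += 1
        for gene, value in expression.items():
            totals[cell_type][gene] += value
            genes.add(gene)
    return {
        cell_type: {
            # Average over all cells of the type, then log-transform the mean.
            gene: math.log1p(totals[cell_type][gene] / counts[cell_type])
            for gene in sorted(genes)
        }
        for cell_type in counts
    }


# Hypothetical cells, for illustration only.
example_cells = [
    ("Sertoli", {"G1": 2.0}),
    ("Sertoli", {"G1": 4.0, "G2": 1.0}),
    ("Leydig", {"G2": 3.0}),
]
log_means = log_mean_expression(example_cells)
```

Note that this is not the same as averaging log-transformed values per cell; the legend describes transforming the mean.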
Table S7. Germ-Soma Receptor-Ligand Expression, Related to Figure 7 and Figure S7.
A. Human highly variable receptor-ligand pair scores. Each row is one ligand-receptor pair, and each column is a pairing of one of the 7 somatic clusters with one of the 26 SPG-germ cell clusters. We selected receptor-ligand pairs in which at least one member is highly variable among somatic or germ cell clusters. For each pair, we calculated the interaction strength, defined as the product of the mean expression level of the ligand in the somatic cells and the receptor in the SPG-germ cells (details in Methods).
B. Macaque highly variable receptor-ligand pair scores. Ligand-receptor interaction strength matrix for macaque using the same method described in A.
C. Mouse highly variable receptor-ligand pair scores. Ligand-receptor interaction strength matrix for mouse using the same method described in A.
D. Human common receptor-ligand pair scores (union of pairs A-C). Ligand-receptor interaction matrix for human for the ligand-receptor pairs found in the three species (union of the lists from A-C), restricted to pairs in which both the ligand and the receptor can be matched across species as orthologues (details in Methods). Interaction strength is calculated as described in A.
E. Macaque common receptor-ligand pair scores (union of pairs A-C). Ligand-receptor interaction matrix for macaque using the same 721 ligand-receptor pairs and calculated using the same method as described in D.
F. Mouse common receptor-ligand pair scores (union of pairs A-C). Ligand-receptor interaction matrix for mouse using the same 721 ligand-receptor pairs and calculated using the same method as described in D.
G. Conservation score for shared orthologous receptor-ligand pairs. For the list of 721 ligand-receptor pairs in D-F, we calculated the correlation of the interaction scores between each pair of species as a measure of conservation. Then, we calculated the average of the three pairwise correlations as the overall conservation score for each L-R pair among the three species.
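The interaction strength and conservation score described for Table S7 can be illustrated with a small sketch for a single ligand-receptor pair. The function names and toy values are hypothetical, and the use of Pearson correlation is an assumption, since the legend does not state which correlation was used.

```python
import math
from itertools import combinations


def interaction_scores(ligand_by_soma, receptor_by_germ):
    """Product of mean ligand expression in each somatic cluster and mean
    receptor expression in each germ cell cluster, flattened row by row."""
    return [lig * rec for lig in ligand_by_soma for rec in receptor_by_germ]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: constant scores are treated as uncorrelated
    return cov / math.sqrt(vx * vy)


def conservation_score(scores_by_species):
    """Average of all pairwise correlations between species score vectors."""
    pairs = list(combinations(sorted(scores_by_species), 2))
    correlations = [
        pearson(scores_by_species[a], scores_by_species[b]) for a, b in pairs
    ]
    return sum(correlations) / len(correlations)


# Toy example with 2 somatic and 2 germ cell clusters (the tables use 7 and 26).
human_scores = interaction_scores([1.0, 2.0], [3.0, 0.5])
toy_scores = {
    "human": human_scores,
    "macaque": [2.0 * s for s in human_scores],
    "mouse": [0.5 * s for s in human_scores],
}
toy_conservation = conservation_score(toy_scores)
```

Because correlation is insensitive to overall scale, a pair whose interaction pattern across cell type combinations is the same in all three species scores close to 1 even if absolute expression differs between species.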
Acknowledgements
We thank members of the Hammoud, Li, Orwig, and Yamashita Labs for scientific discussions and manuscript comments. This research was supported by National Institutes of Health (NIH) grants 1R21HD090371-01A1 (S.S.H., J.Z.L.), 1DP2HD091949-01 (S.S.H.), P01 HD075795, R01 HD076412 (K.E.O.), R01 HD092084 (K.E.O., S.S.H.), and F30HD097961 (A.N.S.); training grants 5T32HD079342 (A.N.S.), 5T32GM007863 (A.N.S.), and T32HD0871494 (S.K.M.); a Michigan Institute for Data Science (MIDAS) Health Sciences Challenge Award (J.Z.L., S.S.H.); and Open Philanthropy Grant 2019-199327 (5384).
Declaration of interests
The authors have no competing interests.
References:
- ADAM M, URBANSKI HF, GARYFALLOU VT, WELSCH U, KOHN FM, ULLRICH SCHWARZER J, STRAUSS L, POUTANEN M & MAYERHOFER A 2012. High levels of the extracellular matrix proteoglycan decorin are associated with inhibition of testicular function. Int J Androl, 35, 550–61.
- AIBAR S, GONZALEZ-BLAS CB, MOERMAN T, HUYNH-THU VA, IMRICHOVA H, HULSELMANS G, RAMBOW F, MARINE JC, GEURTS P, AERTS J, VAN DEN OORD J, ATAK ZK, WOUTERS J & AERTS S 2017. SCENIC: single-cell regulatory network inference and clustering. Nat Methods, 14, 1083–1086.
- ALPERT A, MOORE LS, DUBOVIK T & SHEN-ORR SS 2018. Alignment of single-cell trajectories to compare cellular expression dynamics. Nat Methods, 15, 267–270.
- ARSENIEVA M DN, ORLOVA NN, BAKULINA ED 1961. Radiation analysis duration of meiotic phases in spermatogenesis of monkey (Macaca mulatta). Dokl Biol Sci (Engl Transl), 141, 984–986.
- BACCETTI B, COLLODEL G, ESTENOZ M, MANCA D, MORETTI E & PIOMBONI P 2005. Gene deletions in an infertile man with sperm fibrous sheath dysplasia. Hum Reprod, 20, 2790–4.
- BELIVEAU BJ, KISHI JY, NIR G, SASAKI HM, SAKA SK, NGUYEN SC, WU CT & YIN P 2018. OligoMiner provides a rapid, flexible environment for the design of genome-scale oligonucleotide in situ hybridization probes. Proc Natl Acad Sci U S A, 115, E2183–E2192.
- BENNETT MD 1977. The time and duration of meiosis. Philos Trans R Soc Lond B Biol Sci, 277, 201–26.
- BLONDEL VD, GUILLAUME J-L, LAMBIOTTE R & LEFEBVRE E 2008. Fast unfolding of communities in large networks. Journal of Statistical Mechanics, P10008.
- BRAWAND D, SOUMILLON M, NECSULEA A, JULIEN P, CSARDI G, HARRIGAN P, WEIER M, LIECHTI A, AXIMU-PETRI A, KIRCHER M, ALBERT FW, ZELLER U, KHAITOVICH P, GRUTZNER F, BERGMANN S, NIELSEN R, PAABO S & KAESSMANN H 2011. The evolution of gene expression levels in mammalian organs. Nature, 478, 343–8.
- BRINSTER RL & ZIMMERMANN JW 1994. Spermatogenesis following male germ-cell transplantation. Proc Natl Acad Sci U S A, 91, 11298–302.
- BROADINSTITUTE. 2016. Picard [Online]. Available: http://broadinstitute.github.io/picard [Accessed 2016].
- BRUNO M, MAHGOUB M & MACFARLAN TS 2019. The Arms Race Between KRAB-Zinc Finger Proteins and Endogenous Retroelements and Its Impact on Mammals. Annu Rev Genet, 53, 393–416.
- CARMELL MA, GIRARD A, VAN DE KANT HJ, BOURC’HIS D, BESTOR TH, DE ROOIJ DG & HANNON GJ 2007. MIWI2 is essential for spermatogenesis and repression of transposons in the mouse male germline. Dev Cell, 12, 503–14.
- CARRIERI C, COMAZZETTO S, GROVER A, MORGAN M, BUNESS A, NERLOV C & O’CARROLL D 2017. A transit-amplifying population underpins the efficient regenerative capacity of the testis. J Exp Med, 214, 1631–1641.
- CHAN F, OATLEY MJ, KAUCHER AV, YANG QE, BIEBERICH CJ, SHASHIKANT CS & OATLEY JM 2014. Functional and molecular features of the Id4+ germline stem cell population in mouse testes. Genes Dev, 28, 1351–62.
- CHEN H, STANLEY E, JIN S & ZIRKIN BR 2010. Stem Leydig cells: from fetal to aged animals. Birth Defects Res C Embryo Today, 90, 272–83.
- CLERMONT Y 1966a. Renewal of spermatogonia in man. Am J Anat, 118, 509–24.
- CLERMONT Y 1966b. Spermatogenesis in man. A study of the spermatogonial population. Fertil Steril, 17, 705–21.
- CLERMONT Y 1969. Two classes of spermatogonial stem cells in the monkey (Cercopithecus aethiops). Am J Anat, 126, 57–71.
- CLERMONT Y 1972. Kinetics of spermatogenesis in mammals: seminiferous epithelium cycle and spermatogonial renewal. Physiol Rev, 52, 198–236.
- CLERMONT Y & ANTAR M 1973. Duration of the cycle of the seminiferous epithelium and the spermatogonial renewal in the monkey Macaca arctoides. Am J Anat, 136, 153–65.
- CLERMONT Y & LEBLOND CP 1959. Differentiation and renewal of spermatogonia in the monkey, Macacus rhesus. Am J Anat, 104, 237–73.
- CLOUTHIER DE, AVARBOCK MR, MAIKA SD, HAMMER RE & BRINSTER RL 1996. Rat spermatogenesis in mouse testis. Nature, 381, 418–21.
- DE FAZIO S, BARTONICEK N, DI GIACOMO M, ABREU-GOODGER C, SANKAR A, FUNAYA C, ANTONY C, MOREIRA PN, ENRIGHT AJ & O’CARROLL D 2011. The endonuclease activity of Mili fuels piRNA amplification that silences LINE1 elements. Nature, 480, 259–63.
- DE ROOIJ DG 1973. Spermatogonial stem cell renewal in the mouse. I. Normal situation. Cell Tissue Kinet, 6, 281–7.
- DE ROOIJ DG, VAN ALPHEN MM & VAN DE KANT HJ 1986. Duration of the cycle of the seminiferous epithelium and its stages in the rhesus monkey (Macaca mulatta). Biol Reprod, 35, 587–91.
- DE VRIES M, VOSTERS S, MERKX G, D’HAUWERS K, WANSINK DG, RAMOS L & DE BOER P 2012. Human male meiotic sex chromosome inactivation. PLoS One, 7, e31485.
- DOBIN A, DAVIS CA, SCHLESINGER F, DRENKOW J, ZALESKI C, JHA S, BATUT P, CHAISSON M & GINGERAS TR 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29, 15–21.
- DOVEY SL, VALLI H, HERMANN BP, SUKHWANI M, DONOHUE J, CASTRO CA, CHU T, SANFILIPPO JS & ORWIG KE 2013. Eliminating malignant contamination from therapeutic human spermatogonial stem cells. J Clin Invest, 123, 1833–43.
- DU Y, GUO M, WHITSETT JA & XU Y 2015. ‘LungGENS’: a web-based tool for mapping single-cell gene expression in the developing lung. Thorax, 70, 1092–4.
- DURINCK S, SPELLMAN PT, BIRNEY E & HUBER W 2009. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc, 4, 1184–91.
- FAYOMI AP & ORWIG KE 2018. Spermatogonial stem cells and spermatogenesis in mice, monkeys and men. Stem Cell Res, 29, 207–214.
- GOU LT, DAI P, YANG JH, XUE Y, HU YP, ZHOU Y, KANG JY, WANG X, LI H, HUA MM, ZHAO S, HU SD, WU LG, SHI HJ, LI Y, FU XD, QU LH, WANG ED & LIU MF 2015. Pachytene piRNAs instruct massive mRNA elimination during late spermiogenesis. Cell Res, 25, 266.
- GREEN CD, MA Q, MANSKE GL, SHAMI AN, ZHENG X, MARINI S, MORITZ L, SULTAN C, GURCZYNSKI SJ, MOORE BB, TALLQUIST MD, LI JZ & HAMMOUD SS 2018. A Comprehensive Roadmap of Murine Spermatogenesis Defined by Single-Cell RNA-Seq. Dev Cell, 46, 651–667.e10.
- GUO J, GROW EJ, MLCOCHOVA H, MAHER GJ, LINDSKOG C, NIE X, GUO Y, TAKEI Y, YUN J, CAI L, KIM R, CARRELL DT, GORIELY A, HOTALING JM & CAIRNS BR 2018. The adult human testis transcriptional cell atlas. Cell Res, 28, 1141–1157.
- HAGIWARA N 2011. Sox6, jack of all trades: a versatile regulatory protein in vertebrate development. Dev Dyn, 240, 1311–21.
- HAHSLER M, HORNIK K & BUCHTA C 2008. Getting Things in Order: An Introduction to the R Package seriation. J Stat Softw, 25, 1–34.
- HAMMOUD SS, LOW DHP, YI C, CARRELL DT, GUCCIONE E & CAIRNS BR 2014. Chromatin and transcription transitions of mammalian adult germline stem cells and spermatogenesis. Cell Stem Cell, 15, 239–253.
- HELLER CG & CLERMONT Y 1963. Spermatogenesis in man: an estimate of its duration. Science, 140, 184–6.
- HERMANN BP, CHENG K, SINGH A, ROA-DE LA CRUZ L, MUTOJI KN, CHEN IC, GILDERSLEEVE H, LEHLE JD, MAYO M, WESTERNSTROER B, LAW NC, OATLEY MJ, VELTE EK, NIEDENBERGER BA, FRITZE D, SILBER S, GEYER CB, OATLEY JM & MCCARREY JR 2018. The Mammalian Spermatogenesis Single-Cell Transcriptome, from Spermatogonial Stem Cells to Spermatids. Cell Rep, 25, 1650–1667.e8.
- HERMANN BP, SUKHWANI M, LIN CC, SHENG Y, TOMKO J, RODRIGUEZ M, SHUTTLEWORTH JJ, MCFARLAND D, HOBBS RM, PANDOLFI PP, SCHATTEN GP & ORWIG KE 2007. Characterization, cryopreservation, and ablation of spermatogonial stem cells in adult rhesus macaques. Stem Cells, 25, 2330–8.
- HU GX, LIN H, CHEN GR, CHEN BB, LIAN QQ, HARDY DO, ZIRKIN BR & GE RS 2010. Deletion of the Igf1 gene: suppressive effects on adult Leydig cell development. J Androl, 31, 379–87.
- HUANG X, WANG HL, QI ST, WANG ZB, TONG JS, ZHANG QH, OUYANG YC, HOU Y, SCHATTEN H, QI ZQ & SUN QY 2011. DYNLT3 is required for chromosome alignment during mouse oocyte meiotic maturation. Reprod Sci, 18, 983–9.
- HUCKINS C 1971. The spermatogonial stem cell population in adult rats. I. Their morphology, proliferation and maturation. Anat Rec, 169, 533–57.
- HUCKINS C & OAKBERG EF 1978. Morphological and quantitative analysis of spermatogonia in mouse testes using whole mounted seminiferous tubules, I. The normal testes. Anat Rec, 192, 519–28.
- IMBEAULT M, HELLEBOID PY & TRONO D 2017. KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature, 543, 550–554.
- JI J, QIN Y, WANG R, HUANG Z, ZHANG Y, ZHOU R, SONG L, LING X, HU Z, MIAO D, SHEN H, XIA Y, WANG X & LU C 2016. Copy number gain of VCX, X-linked multi-copy gene, leads to cell proliferation and apoptosis during spermatogenesis. Oncotarget, 7, 78532–78540.
- JIN C, ZHANG Y, WANG ZP, WANG XX, SUN TC, LI XY, TANG JX, CHENG JM, LI J, CHEN SR, DENG SL & LIU YX 2017. EZH2 deletion promotes spermatogonial differentiation and apoptosis. Reproduction, 154, 615–625.
- JOHNSON L, PETTY CS & NEAVES WB 1983. Further quantification of human spermatogenesis: germ cell loss during postprophase of meiosis and its relationship to daily sperm production. Biol Reprod, 29, 207–15.
- KITADATE Y, JORG DJ, TOKUE M, MARUYAMA A, ICHIKAWA R, TSUCHIYA S, SEGI-NISHIDA E, NAKAGAWA T, UCHIDA A, KIMURA-YOSHIDA C, MIZUNO S, SUGIYAMA F, AZAMI T, EMA M, NODA C, KOBAYASHI S, MATSUO I, KANAI Y, NAGASAWA T, SUGIMOTO Y, TAKAHASHI S, SIMONS BD & YOSHIDA S 2019. Competition for Mitogens Regulates Spermatogenic Stem Cell Homeostasis in an Open Niche. Cell Stem Cell, 24, 79–92.e6.
- KOFMAN-ALFARO S & CHANDLEY AC 1970. Meiosis in the male mouse. An autoradiographic investigation. Chromosoma, 31, 404–20.
- KOUPRINA N, MULLOKANDOV M, ROGOZIN IB, COLLINS NK, SOLOMON G, OTSTOT J, RISINGER JI, KOONIN EV, BARRETT JC & LARIONOV V 2004. The SPANX gene family of cancer/testis-specific antigens: rapid evolution and amplification in African great apes and hominids. Proc Natl Acad Sci U S A, 101, 3077–82.
- KUBOTA H, AVARBOCK MR & BRINSTER RL 2004. Growth factors essential for self-renewal and expansion of mouse spermatogonial stem cells. Proc Natl Acad Sci U S A, 101, 16489–94.
- LAHN BT & PAGE DC 2000. A human sex-chromosomal gene family expressed in male germ cells and encoding variably charged proteins. Hum Mol Genet, 9, 311–9.
- LIU C, RODRIGUEZ K & YAO HH 2016. Mapping lineage progression of somatic progenitor cells in the mouse fetal testis. Development, 143, 3700–3710.
- LOTTRUP G, NIELSEN JE, MAROUN LL, MOLLER LM, YASSIN M, LEFFERS H, SKAKKEBAEK NE & RAJPERT-DE MEYTS E 2014. Expression patterns of DLK1 and INSL3 identify stages of Leydig cell differentiation during normal development and in testicular pathologies, including testicular cancer and Klinefelter syndrome. Hum Reprod, 29, 1637–50.
- MACOSKO EZ, BASU A, SATIJA R, NEMESH J, SHEKHAR K, GOLDMAN M, TIROSH I, BIALAS AR, KAMITAKI N, MARTERSTECK EM, TROMBETTA JJ, WEITZ DA, SANES JR, SHALEK AK, REGEV A & MCCARROLL SA 2015. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell, 161, 1202–1214.
- MAEKAWA M, KAMIMURA K & NAGANO T 1996. Peritubular myoid cells in the testis: their structure and function. Arch Histol Cytol, 59, 1–13.
- MIKI K, WILLIS WD, BROWN PR, GOULDING EH, FULCHER KD & EDDY EM 2002. Targeted disruption of the Akap4 gene causes defects in sperm flagellum and motility. Dev Biol, 248, 331–42.
- MONTJEAN D, DE LA GRANGE P, GENTIEN D, RAPINAT A, BELLOC S, COHEN-BACRIE P, MENEZO Y & BENKHALIFA M 2012. Sperm transcriptome profiling in oligozoospermia. J Assist Reprod Genet, 29, 3–10.
- NAGANO M, MCCARREY JR & BRINSTER RL 2001. Primate spermatogonial stem cells colonize mouse testes. Biol Reprod, 64, 1409–16.
- NOWICK K, HAMILTON AT, ZHANG H & STUBBS L 2010. Rapid sequence and expression divergence suggest selection for novel function in primate-specific KRAB-ZNF genes. Mol Biol Evol, 27, 2606–17.
- OAKBERG EF 1971. Spermatogonial stem-cell renewal in the mouse. Anat Rec, 169, 515–31.
- PAFF T, LOGES NT, APREA I, WU K, BAKEY Z, HAARMAN EG, DANIELS JMA, SISTERMANS EA, BOGUNOVIC N, DOUGHERTY GW, HOBEN IM, GROSSE-ONNEBRINK J, MATTER A, OLBRICH H, WERNER C, PALS G, SCHMIDTS M, OMRAN H & MICHA D 2017. Mutations in PIH1D3 Cause X-Linked Primary Ciliary Dyskinesia with Outer and Inner Dynein Arm Defects. Am J Hum Genet, 100, 160–168.
- QIU X, MAO Q, TANG Y, WANG L, CHAWLA R, PLINER HA & TRAPNELL C 2017. Reversed graph embedding resolves complex single-cell trajectories. Nat Methods, 14, 979–982.
- RAMILOWSKI JA, GOLDBERG T, HARSHBARGER J, KLOPPMANN E, LIZIO M, SATAGOPAM VP, ITOH M, KAWAJI H, CARNINCI P, ROST B & FORREST AR 2015. A draft network of ligand-receptor-mediated multicellular signalling in human. Nat Commun, 6, 7866.
- RAMM SA, SCHARER L, EHMCKE J & WISTUBA J 2014. Sperm competition and the evolution of spermatogenesis. Mol Hum Reprod, 20, 1169–79.
- ROSIEPEN G, ARSLAN M, CLEMEN G, NIESCHLAG E & WEINBAUER GF 1997. Estimation of the duration of the cycle of the seminiferous epithelium in the non-human primate Macaca mulatta using the 5-bromodeoxyuridine technique. Cell Tissue Res, 288, 365–9.
- ROSS MH 1967. The fine structure and development of the peritubular contractile cell component in the seminiferous tubules of the mouse. Am J Anat, 121, 523–57.
- ROTGERS E, NURMIO M, PIETILA E, CISNEROS-MONTALVO S & TOPPARI J 2015. E2F1 controls germ cell apoptosis during the first wave of spermatogenesis. Andrology, 3, 1000–14.
- SATIJA R, FARRELL JA, GENNERT D, SCHIER AF & REGEV A 2015. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol, 33, 495–502.
- SHAH S, LUBECK E, ZHOU W & CAI L 2016. In Situ Transcription Profiling of Single Cells Reveals Spatial Organization of Cells in the Mouse Hippocampus. Neuron, 92, 342–357.
- SHEKHAR K, LAPAN SW, WHITNEY IE, TRAN NM, MACOSKO EZ, KOWALCZYK M, ADICONIS X, LEVIN JZ, NEMESH J, GOLDMAN M, MCCARROLL SA, CEPKO CL, REGEV A & SANES JR 2016. Comprehensive Classification of Retinal Bipolar Neurons by Single-Cell Transcriptomics. Cell, 166, 1308–1323.e30.
- SOHNI A, TAN K, SONG HW, BUROW D, DE ROOIJ DG, LAURENT L, HSIEH TC, RABAH R, HAMMOUD SS, VICINI E & WILKINSON MF 2019. The Neonatal and Adult Human Testis Defined at the Single-Cell Level. Cell Rep, 26, 1501–1517.e4.
- SOUMILLON M, NECSULEA A, WEIER M, BRAWAND D, ZHANG X, GU H, BARTHES P, KOKKINAKI M, NEF S, GNIRKE A, DYM M, DE MASSY B, MIKKELSEN TS & KAESSMANN H 2013. Cellular source and mechanisms of high transcriptome complexity in the mammalian testis. Cell Rep, 3, 2179–90.
- TESCHENDORFF AE & ENVER T 2017. Single-cell entropy for accurate estimation of differentiation potency from a cell’s transcriptome. Nat Commun, 8, 15599.
- TURNER JM 2007. Meiotic sex chromosome inactivation. Development, 134, 1823–31.
- TURNER JM, MAHADEVAIAH SK, FERNANDEZ-CAPETILLO O, NUSSENZWEIG A, XU X, DENG CX & BURGOYNE PS 2005. Silencing of unsynapsed meiotic chromosomes in the mouse. Nat Genet, 37, 41–7.
- TUTTELMANN F, SIMONI M, KLIESCH S, LEDIG S, DWORNICZAK B, WIEACKER P & ROPKE A 2011. Copy number variants in patients with severe oligozoospermia and Sertoli-cell-only syndrome. PLoS One, 6, e19426.
- UHLEN M, FAGERBERG L, HALLSTROM BM, LINDSKOG C, OKSVOLD P, MARDINOGLU A, SIVERTSSON A, KAMPF C, SJOSTEDT E, ASPLUND A, OLSSON I, EDLUND K, LUNDBERG E, NAVANI S, SZIGYARTO CA, ODEBERG J, DJUREINOVIC D, TAKANEN JO, HOBER S, ALM T, EDQVIST PH, BERLING H, TEGEL H, MULDER J, ROCKBERG J, NILSSON P, SCHWENK JM, HAMSTEN M, VON FEILITZEN K, FORSBERG M, PERSSON L, JOHANSSON F, ZWAHLEN M, VON HEIJNE G, NIELSEN J & PONTEN F 2015. Proteomics. Tissue-based map of the human proteome. Science, 347, 1260419.
- VALLI H, PHILLIPS BT, SHETTY G, BYRNE JA, CLARK AT, MEISTRICH ML & ORWIG KE 2014a. Germline stem cells: toward the regeneration of spermatogenesis. Fertil Steril, 101, 3–13.
- VALLI H, SUKHWANI M, DOVEY SL, PETERS KA, DONOHUE J, CASTRO CA, CHU T, MARSHALL GR & ORWIG KE 2014b. Fluorescence- and magnetic-activated cell sorting strategies to isolate and enrich human spermatogonial stem cells. Fertil Steril, 102, 566–580.e7.
- VANLANDEWIJCK M, HE L, MAE MA, ANDRAE J, ANDO K, DEL GAUDIO F, NAHAR K, LEBOUVIER T, LAVINA B, GOUVEIA L, SUN Y, RASCHPERGER E, RASANEN M, ZARB Y, MOCHIZUKI N, KELLER A, LENDAHL U & BETSHOLTZ C 2018. A molecular atlas of cell types and zonation in the brain vasculature. Nature, 554, 475–480.
- VON KOPYLOW K & SPIESS AN 2017. Human spermatogonial markers. Stem Cell Res, 25, 300–309.
- WANG GM, O’SHAUGHNESSY PJ, CHUBB C, ROBAIRE B & HARDY MP 2003. Effects of insulin-like growth factor I on steroidogenic enzyme expression levels in mouse leydig cells. Endocrinology, 144, 5058–64.
- WHITFIELD ML, SHERLOCK G, SALDANHA AJ, MURRAY JI, BALL CA, ALEXANDER KE, MATESE JC, PEROU CM, HURT MM, BROWN PO & BOTSTEIN D 2002. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell, 13, 1977–2000.
- WU S, HU YC, LIU H & SHI Y 2009. Loss of YY1 impacts the heterochromatic state and meiotic double-strand breaks during mouse spermatogenesis. Mol Cell Biol, 29, 6245–56.
- YOUNG MD & BEHJATI S 2020. SoupX removes ambient RNA contamination from droplet based single-cell RNA sequencing data. bioRxiv, 303727.
- ZANDERS SE & UNCKLESS RL 2019. Fertility Costs of Meiotic Drivers. Curr Biol, 29, R512–R520.
- ZHANG T, MURPHY MW, GEARHART MD, BARDWELL VJ & ZARKOWER D 2014. The mammalian Doublesex homolog DMRT6 coordinates the transition between mitotic and meiotic developmental programs during spermatogenesis. Development, 141, 3662–71.
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Table S1. Markers for 11 global testis cell types, Related to Figure 1 and Figure S1.
A. Average expression of all genes for 11 human testis cell types. For each gene detected in human testis, we calculated its mean value for each of the 11 major cell types by averaging its normalized expression within each cell type, then adding 1 and taking the natural log. Each row is a gene detected in human testis; each column is one of the 11 major cell types for human testis.
B. Markers for 11 human testis cell types. Markers were obtained by comparing each cell type against the other 10 cell types for human testis using the binomial likelihood test implemented in Seurat v2.3.4. For each cell type, only genes that satisfied three conditions were considered marker genes: 1) a minimum 20% difference in detection rate between the two groups; 2) at least 2-fold higher expression in this cell type; 3) p < 0.01. Each row is a marker for a given cell type, ranked by p-value; from left to right, the columns are: cell type, gene name, p-value, natural log-scale fold change, fraction of cells expressing the marker in this cell type, fraction of cells expressing the marker in all other cell types, and Bonferroni adjusted p-value. On the right side of the table, we summarize the number of cells and markers for each of the 11 major cell types for human testis. In total, we identified 3,035 markers in 13,837 cells across 11 major cell types for human testis.
C. Average expression of all genes for 11 macaque testis cell types. For each gene detected in macaque testis, we calculated its mean value in each of the 11 major cell types by averaging its normalized expression within each cell type, then adding 1 and taking the natural log. Each row is a gene detected in macaque testis; each column is one of the 11 major cell types for macaque testis.
D. Markers for 11 macaque testis cell types. Markers were obtained by comparing each cell type against the other 10 cell types for macaque testis using the binomial likelihood test with the same thresholds as described in B; from left to right, the columns are: cell type, gene name, p-value, natural log-scale fold change, fraction of cells expressing the marker in this cell type, fraction of cells expressing the marker in all other cell types, and adjusted p-value. On the right side of the table, we summarize the number of cells and markers for each of the 11 major cell types for macaque testis. In total, we identified 4,380 markers in 21,574 cells across 11 major cell types for macaque testis.
E. Human and macaque putative MSCI escapee genes. Escapees were defined as genes with average normalized expression > 0.5 in spermatocytes and a > 2-fold increase in expression relative to spermatogonia. From left to right, the columns are: gene name, mean expression level in spermatocytes (SPC), mean expression level in spermatogonia (SPG), expression ratio between SPC and SPG, and chromosomal location.
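For readers who want to recompute the summary values in Table S1A, S1C, and S1E, the following Python sketch illustrates the two calculations described above: the ln(mean + 1) average per cell type and the putative escapee filter. It is not the authors' code. The function names, the genes-by-cells array layout, and the choice to apply the escapee thresholds to means on the untransformed normalized scale are assumptions made for illustration; any restriction to particular chromosomes is left out.

```python
import numpy as np

def log_mean_expression(expr, labels):
    """Return (cell_types, genes x cell_types matrix of ln(mean + 1)).

    expr   : genes x cells array of normalized expression (assumed layout)
    labels : one cell-type label per cell (column of expr)
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    cell_types = sorted(set(labels.tolist()))
    # Average each gene within a cell type, then add 1 and take the natural log.
    means = np.column_stack(
        [expr[:, labels == ct].mean(axis=1) for ct in cell_types]
    )
    return cell_types, np.log1p(means)

def putative_escapees(mean_spc, mean_spg, min_spc=0.5, min_ratio=2.0):
    """Boolean mask of genes with mean SPC expression > min_spc and an
    SPC/SPG expression ratio > min_ratio (hypothetical helper)."""
    mean_spc = np.asarray(mean_spc, dtype=float)
    mean_spg = np.asarray(mean_spg, dtype=float)
    # Assumption: a gene with zero SPG expression is given an infinite ratio.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(mean_spg > 0, mean_spc / mean_spg, np.inf)
    return (mean_spc > min_spc) & (ratio > min_ratio)
```

Both thresholds are strict inequalities, matching the "> 0.5" and "> 2-fold" wording of the legend.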
Table S2. Markers from independent iterative clustering of human and macaque germ cells, Related to Figure S1 and STAR Methods.
A. Markers for human germ cell clusters 1–7. Markers were obtained by comparing each cell type against all other cell types for human testis using the FindMarkers function in Seurat v2.3.4. For each cell type, only genes that satisfied three conditions were considered markers: 1) a minimum 20% difference in detection rate between the two groups; 2) at least 2-fold higher expression in this cell type; 3) p < 0.01. All other parameters were left at their defaults. Each row is a marker for a given cell type, ranked by p-value; from left to right, the columns are: gene name, p-value, log-scale fold change, fraction of cells expressing the marker in this cell type, fraction of cells expressing the marker in all other cell types, adjusted p-value, and cluster number. In total, we identified 2,050 markers in 7 germ cell types for human testis.
B. Markers for macaque germ cell clusters 1–7. Marker detection method and display format are the same as in A. In total, we identified 4,061 markers in 7 germ cell types for macaque testis.
C. Markers for human spermatogonia cell clusters 1–4. Marker detection method and display format are the same as in A. In total, we identified 769 markers in 4 spermatogonia cell types for human.
D. Markers for macaque spermatogonia cell clusters 1–4. Marker detection method and display format are the same as in A. In total, we identified 443 markers in 4 spermatogonia cell types for macaque.
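The three marker criteria used for Tables S1 and S2 can be written as a simple filter on per-gene summary statistics. The sketch below is illustrative only and does not reproduce Seurat's FindMarkers. The function names are hypothetical, the fold change is assumed to be on the natural-log scale, and the detection-rate difference is assumed to favor the cell type of interest. The second helper shows the standard Bonferroni correction, of the kind reported in the adjusted p-value column.

```python
import math

def is_marker(pct_in, pct_out, ln_fold_change, p_value,
              min_pct_diff=0.20, min_fold=2.0, max_p=0.01):
    """Apply the three thresholds from the legend to one gene.

    pct_in / pct_out : fraction of cells expressing the gene inside and
                       outside the cell type (values between 0 and 1)
    ln_fold_change   : natural-log fold change (assumed scale)
    """
    return (
        pct_in - pct_out >= min_pct_diff            # 1) detection-rate difference
        and ln_fold_change >= math.log(min_fold)    # 2) at least 2-fold higher
        and p_value < max_p                         # 3) p < 0.01
    )

def bonferroni(p_value, n_tests):
    """Standard Bonferroni adjustment, capped at 1."""
    return min(1.0, p_value * n_tests)
```

For example, a gene detected in 60% of cells in the cluster and 30% elsewhere, with a 2.5-fold increase and p = 0.001, passes all three conditions.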
Table S3. Markers for consensus spermatogonia clusters, Related to Figure 2 and Figure S2.
A. Markers detected using meta-analysis for three species. Markers were obtained by comparing each SPG type against all other SPG types using the FindConservedMarkers function in Seurat v2.3.4. In each selection, only genes that satisfied two conditions were considered: 1) a minimum 10% difference in detection rate between the two groups; 2) at least 1.3-fold higher expression in the cell type. All other parameters were left at their defaults. In total, we identified 432 markers in six consensus clusters for three species.
B. Markers detected by merging three species together. Markers were obtained by comparing each SPG type against all other SPG types for human testis using the FindMarkers function in Seurat v2.3.4. In each selection, only genes that satisfied three conditions were considered: 1) a minimum 10% difference in detection rate between the two groups; 2) at least 1.3-fold higher expression in the cell type; 3) p < 0.01. All other parameters were left at their defaults. In total, we identified 3,751 markers in six consensus clusters for three species.
C. Human markers detected for the consensus clusters using all human genes. Human SPG cells were extracted from the original expression count matrix with all genes included, and the six consensus cluster labels were applied. Marker detection method and table layout are the same as in B. In total, we identified 2,180 markers in six consensus clusters for human.
D. Macaque markers detected for the consensus clusters using all macaque genes. Macaque SPG cells were extracted from the original expression count matrix with all genes included, and the six consensus cluster labels were applied. Marker detection method and table layout are the same as in B. In total, we identified 1,949 markers in six consensus clusters for macaque.
E. Mouse markers detected for the consensus clusters using all mouse genes. Mouse single cells were extracted from the original expression count matrix with all genes included, and the six consensus cluster labels were applied. Marker detection method, thresholds, and table layout are the same as in B. In total, we identified 1,555 markers in six consensus clusters for mouse.
F. SPG consensus cluster centroids profile.
Table S4. Markers for 20 aligned germ cell clusters, Related to Figure 4.
A. Human markers. Human single cells were extracted from the original expression count matrix with all genes included, and the three-species aligned 20-stage labels were applied. We identified genes with high expression in one stage and low expression in its flanking stages by selecting genes whose expression profile has a high rank correlation (cutoff 0.35–0.7, varied according to the number of markers in each stage) with a delta function that peaks at a specific stage. The second column gives the stage at which the gene has peak expression. The following 20 columns show the mean gene expression in each stage, standardized across the 20 stages for each gene.
B. Macaque markers. The same method as in A, applied to macaque.
C. Mouse markers. The same method as in A, applied to mouse.
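The stage-marker selection described for Table S4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: "rank correlation" is taken to mean Spearman correlation with average ranks for ties, the function names are hypothetical, and a single cutoff is used although the legend states that the cutoff varied between 0.35 and 0.7 by stage.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def delta_rank_correlation(profile, stage):
    """Spearman correlation between a gene's per-stage mean expression
    and a delta function equal to 1 at `stage` and 0 elsewhere."""
    profile = np.asarray(profile, dtype=float)
    delta = np.zeros(len(profile))
    delta[stage] = 1.0
    r_profile = average_ranks(profile)
    if r_profile.std() == 0:  # flat profile: correlation undefined, report 0
        return 0.0
    return float(np.corrcoef(r_profile, average_ranks(delta))[0, 1])

def peak_stage_markers(stage_means, cutoff=0.35):
    """stage_means: genes x stages matrix of mean expression.
    Returns (gene_index, best_stage, rho) for genes whose best rho >= cutoff."""
    hits = []
    for g, profile in enumerate(np.asarray(stage_means, dtype=float)):
        rhos = [delta_rank_correlation(profile, s) for s in range(len(profile))]
        best = int(np.argmax(rhos))
        if rhos[best] >= cutoff:
            hits.append((g, best, rhos[best]))
    return hits
```

Under these assumptions, a gene expressed in exactly one of 20 stages scores 1, whereas a gene whose expression rises steadily to a final-stage peak scores about 0.38, so higher cutoffs favor genes that are silent outside their peak stage.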
Table S5. Germ cell gene k-means clustering (species pairwise comparison), Related to Figure 5A, B.
A. Human vs. Macaque. Gene labels from k-means clustering of genes according to their expression profiles across the 20 aligned stages (see Methods for details). The first two columns are the names of the orthologues in human and macaque. The last two columns are the cluster numbers into which the genes are classified. 1–6 denotes one of the six gene clusters. “NULL” denotes that the gene could not be classified into any of the six clusters in this species (see Methods for details).
B. Macaque vs. Mouse. The same method as in A, applied to macaque and mouse.
C. Human vs. Mouse. The same method as in A, applied to human and mouse.
D. Human GO enrichment analyses. Gene ontology terms enriched in each gene cluster for human against the default human background genes using GOrilla. The last column denotes the cluster number of the genes.
E. Macaque GO enrichment analyses. Same method and format as in D.
F. Mouse GO enrichment analyses. Same method and format as in D.
G. Human Regulon Activity. Activities (i.e., AUC scores, see Methods) of 82 regulons identified across the 26 germ cell type centroids of human. Each row is a regulon composed of a TF with at least 10 potential direct targets; the “extended” regulons include motifs that have been linked to the TF by lower confidence annotations (e.g. inferred by motif similarity). Each column is one of the 26 germ cell types (6 SPG stages and 20 non-SPG germ cell stages).
H. Mouse Regulon Activity. Activities (i.e., AUC scores) of 88 regulons identified across the 26 germ cell type centroids of mouse. Same method and format as in G.
Table S6. Markers for 7 somatic cell types, Related to Figure 6 and Figure S6.
A. Average normalized expression of all genes for 7 human somatic cell types. For each gene detected in human somatic cells, we calculated its mean value in each of the 7 somatic cell types by averaging the normalized expression of the gene within each cell type, then adding 1 and taking the natural log. Each row is a detected gene; each column is a somatic cell type for human testis.
B. Markers for 7 human somatic cell types. Markers were obtained by comparing each cell type against the other 6 somatic cell types for human using the binomial likelihood test. In each selection, genes that satisfied three conditions were considered: 1) a minimum 20% difference in detection rate between the two groups; 2) at least 2-fold higher expression in the cell type; 3) p < 0.01. Each row is a marker for a given cell type, ranked by p-value; from left to right, the columns are: cell type, gene name, p-value, log-scale fold change, fraction of cells expressing the marker in this cell type, fraction of cells expressing the marker in all other cell types, and adjusted p-value.
C. Average normalized expression of all genes for 7 macaque somatic cell types. For each gene detected in macaque somatic cells, we calculated its mean value in each of the 7 somatic cell types by averaging the normalized expression of the gene within each cell type, then adding 1 and log-transforming. Each row is a detected gene; each column is a somatic cell type. On the right side of the table, we give the fraction of cells expressing a gene for each somatic cell type.
D. Markers for 7 macaque somatic cell types. Markers were obtained by comparing each cell type against the other 6 somatic cell types for macaque using the binomial likelihood test with the same thresholds as described in B; from left to right, the columns are: cell type, gene name, p-value, log-scale fold change, fraction of cells expressing the marker in this cell type, fraction of cells expressing the marker in all other cell types, and adjusted p-value.
Table S7. Germ-Soma Receptor-Ligand Expression, Related to Figure 7 and Figure S7.
A. Human highly variable receptor-ligand pair scores. Each row is one ligand-receptor pair and each column is a pair of cell types between 7 somatic clusters and 26 SPG-germ cell clusters. We selected receptor-ligand pairs such that at least one of them is highly variable among somatic or germ cell clusters. For each pair, we calculated the interaction strength, defined as the product of the mean expression level of the ligand in the somatic cells and the receptor in the SPG-germ cells (details in Methods).
B. Macaque highly variable receptor-ligand pair scores. Ligand-receptor interaction strength matrix for macaque using the same method described in A.
C. Mouse highly variable receptor-ligand pair scores. Ligand-receptor interaction strength matrix for mouse using the same method described in A.
D. Human common receptor-ligand pair scores (union of pairs A-C). Ligand-receptor interaction matrix for human, covering the ligand-receptor pairs found in the three species (union of the lists from A-C) and restricted to pairs in which both the ligand and the receptor can be matched across species as orthologues (details in Methods). Interaction strength is calculated in the same way as described in A.
E. Macaque common receptor-ligand pair scores (union of pairs A-C). Ligand-receptor interaction matrix for macaque using the same 721 ligand-receptor pairs and calculated using the same method as described in D.
F. Mouse common receptor-ligand pair scores (union of pairs A-C). Ligand-receptor interaction matrix for mouse using the same 721 ligand-receptor pairs and calculated using the same method as described in D.
G. Conservation score for shared orthologous receptor-ligand pairs. For the list of 721 ligand-receptor pairs in D-F, we calculated the correlation of the interaction scores between each pair of species as a measure of conservation. We then calculated the average of the three pairwise correlations as the overall conservation score for each L-R pair among the three species.
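The interaction strength in A and the conservation score in G can be illustrated with a short Python sketch. It is a simplified reading of the legend rather than the authors' code: the function names and dictionary layout are hypothetical, and the correlation is assumed to be Pearson's r computed over all somatic-germ cluster combinations, since the legend does not name the correlation measure.

```python
from itertools import combinations

import numpy as np

def interaction_strength(ligand_somatic, receptor_germ):
    """Product of mean ligand expression in each somatic cluster and mean
    receptor expression in each germ cell cluster, for one L-R pair.
    Returns a (somatic clusters) x (germ clusters) matrix."""
    return np.outer(np.asarray(ligand_somatic, dtype=float),
                    np.asarray(receptor_germ, dtype=float))

def conservation_score(scores_by_species):
    """Average of the pairwise correlations between species.

    scores_by_species : dict mapping species name to the interaction
                        matrix of the same L-R pair in that species
    """
    flat = {sp: np.asarray(m, dtype=float).ravel()
            for sp, m in scores_by_species.items()}
    cors = [np.corrcoef(flat[a], flat[b])[0, 1]
            for a, b in combinations(sorted(flat), 2)]
    return float(np.mean(cors))
```

With 7 somatic and 26 germ cell clusters, each interaction matrix has 182 entries, and with three species the score averages three pairwise correlations.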
Data Availability Statement
Raw and processed data files for Drop-seq experiments are available under the GEO accession number GSE142585 with token qjgrsgcutpkrpat for reviewer access.
