SUMMARY
The challenges in recapitulating in vivo human T cell development in laboratory models have posed a barrier to understanding human thymopoiesis. Here we used single cell RNA-Seq to interrogate the rare CD34+ progenitor and the more differentiated CD34− fractions in the human postnatal thymus. CD34+ thymic progenitors comprised a spectrum of specification and commitment states characterized by multilineage priming followed by gradual T cell commitment. The earliest progenitors in the differentiation trajectory were CD7− and expressed a stem cell-like transcriptional profile but had also initiated T cell priming. Clustering analysis identified a CD34+ subpopulation primed for the plasmacytoid dendritic lineage, suggesting an intrathymic dendritic specification pathway. CD2 expression defined T cell commitment stages in which loss of B cell potential preceded that of myeloid potential. These datasets delineate gene expression profiles spanning key differentiation events in human thymopoiesis and provide a resource for the further study of human T cell development.
eTOC Blurb
Le et al. use single cell RNA-Seq to interrogate the rare CD34+ progenitor and the more differentiated CD34− fractions in the human postnatal thymus, identifying a CD34+ subpopulation primed for the plasmacytoid dendritic lineage and revealing multilineage priming followed by gradual commitment to the T cell lineage during the initial stages of thymopoiesis.
INTRODUCTION
While commitment to most hematopoietic lineages occurs in the bone marrow (BM), T lineage commitment uniquely occurs in the thymus. In humans, the multilineage progenitor cells that migrate from the BM and subsequently undergo T lineage commitment through differentiation in the thymus are characterized by expression of the CD34 antigen (Plum et al., 2008). Single cell RNA-Seq (sRNA-Seq) studies of human BM hematopoietic stem and progenitor cells (HSPC) have uncovered intermediate differentiation states revealing lineage commitment pathways that could not be resolved in bulk studies of immunophenotypically defined populations (Pellin et al., 2019; Velten et al., 2017). However, the progenitor cells in the postnatal thymus that initiate T cell differentiation were not included in these studies, leaving critical gaps in the understanding of T cell lymphopoiesis.
CD34+ progenitor cells constitute <1% of human thymocytes, and the overwhelming majority of these CD34+ cells express CD7. The initial stages of thymopoiesis are characterized by the induction of T lineage genes (specification) and repression of alternative lineage genes (commitment). Multilineage (CD34+CD7−CD1a− (Thy1, <5% of CD34+ cells) and CD34+CD7+CD1a− (Thy2)) and committed (CD34+CD7+CD1a+, Thy3) populations with distinct bulk level transcriptomes have been immunophenotypically defined within the CD34+ thymocyte fraction (Casero et al., 2015; Hao et al., 2008; Plum et al., 2008; Weerkamp et al., 2006). The population (Thy1 or Thy2) found to represent the earliest thymic progenitors and results regarding whether Thy2 cells possess myeloid and B lineage potentials vary between studies from different groups (Hao et al., 2008; Weerkamp et al., 2006). Furthermore, inter-study differences in the immunophenotypes used to define rare progenitor populations and the unclear relationship between the populations in the different studies confound the interpretation of how findings from various groups relate to each other. Bulk studies suggest that Thy1 cells have a hematopoietic stem cell (HSC) like gene expression profile along with T lineage transcriptional priming (Casero et al., 2015; Hao et al., 2008). However, due to the inability of bulk profiles to determine whether stem and lineage genes are co-expressed in individual cells, it is unclear whether these cells represent HSC transiting through the thymus or a thymic progenitor population.
Two recent studies of unfractionated human thymocytes have defined ontogenetically distinct T cell precursors during the earliest phases of thymus organogenesis (Zeng et al., 2019) as well as T cell subtypes among committed CD34− cells in late fetal and postnatal thymopoiesis (Park et al., 2020). However, the liver derived precursors seen prior to the establishment of T cell generation in the fetus have unknown differentiation fate. Furthermore, the use of unfractionated cells precludes the dissection of the initial stages of postnatal T cell differentiation, which occur in the rare CD34+ thymocytes. Taken together, the rarity of CD34+ cells in the thymus and biological differences between fetal and postnatal thymopoiesis including those in the thymus seeding HSPC (fetal liver vs BM) (Holyoake et al., 1999; Mold et al., 2010) make it essential to directly investigate the heterogeneity within purified postnatal thymic CD34+ cells at the single cell transcriptome level for the elucidation of lineage fate regulation during the postnatal thymopoiesis that generates a majority of the T cell repertoire in human life. While bulk studies have revealed species specific expression profiles of genes during thymopoiesis (Taghon et al., 2009; Van de Walle et al., 2016), these findings are confounded by species differences in progenitor immunophenotypes (Parekh and Crooks, 2013). Single cell transcriptomic data, which can reveal differentiation trajectories independent of immunophenotype information (Zheng et al., 2018), would also enable the definition of species related differences in thymopoiesis.
Here, we performed sRNA-Seq (~35000 cells) of cells spanning T cell differentiation in the human postnatal thymus including the rare CD34+ progenitor cells to define the transcriptional landscape of T lineage specification, commitment, and subsequent differentiation. We found that the initial stages of human thymopoiesis are characterized by multilineage transcriptional priming followed by a gradual transition to a T lineage restricted gene program, a process similar to the one underlying lineage commitment of HSPC in the BM (Velten et al., 2017). sRNA-Seq clustering analysis revealed a spectrum of specification and commitment cell states within the CD34+ fraction of thymocytes, uncovered CD2 as a marker for discriminating stages of lineage restriction within the Thy2 population, and revealed a rare CD34+ progenitor population in the human thymus that showed plasmacytoid dendritic lineage transcriptional priming in vivo. Reconstruction of the differentiation topology from sRNA-Seq data independently identified Thy1 cells as the earliest progenitor cells in the thymus and uncovered transcriptional changes underlying cell state transitions within heterogeneous populations. We found species related differences in expression profiles of transcription factors (TF). Our data provide a resource for elucidating mechanisms in T cell development and T cell disorders.
RESULTS
Identification of cell clusters within the CD34+ progenitor fraction in the human thymus
To define the transcriptional landscape of the initial stages of human thymopoiesis, we performed droplet-based sRNA-Seq of CD34+Lin− cells isolated by fluorescence-activated cell sorting (FACS) from 5 thymuses (~25000 cells) using 10x Genomics (Thymuses 1-3) or inDrop (Thymuses 4 and 5) platforms. Input cells for sRNA-Seq of Thymuses 1-3 also included additionally sorted Thy1 cells (~125 cells per thymus) to allow sufficient representation of the rare Thy1 population (Figure 1A). A median of 2184-3527 genes per cell and 564-793 genes per cell showed detectable expression in 10x and inDrop data respectively (Table S1), a result consistent with previous studies (Crinier et al., 2018; Zemmour et al., 2018). After removing cells with low numbers of expressed genes, high mitochondrial gene expression, or very high outlier UMI counts (likely doublets), the remaining bioinformatically identified cells from all 5 thymuses (~20000 cells) were combined for downstream analysis (Table S1).
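The three cell-level filters described above (too few expressed genes, high mitochondrial expression, outlier UMI counts) can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: the thresholds (`min_genes`, `max_mito_frac`), the median-plus-MAD outlier fence, and the toy cell records are all assumptions chosen for the example.

```python
from statistics import median

def mad_upper_bound(values, n_mads=5.0):
    """Upper outlier fence: median + n_mads * median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med + n_mads * mad

def filter_cells(cells, min_genes=200, max_mito_frac=0.10, n_mads=5.0):
    """Keep cells passing all three QC filters (hypothetical thresholds)."""
    umi_cap = mad_upper_bound([c["n_umi"] for c in cells], n_mads)
    kept = []
    for c in cells:
        if c["n_genes"] < min_genes:        # low number of expressed genes
            continue
        if c["mito_frac"] > max_mito_frac:  # high mitochondrial fraction
            continue
        if c["n_umi"] > umi_cap:            # very high UMI count (likely doublet)
            continue
        kept.append(c)
    return kept

# Toy records; values are invented for illustration only.
cells = [
    {"id": "c1", "n_genes": 2500, "n_umi": 9000,  "mito_frac": 0.03},
    {"id": "c2", "n_genes": 150,  "n_umi": 400,   "mito_frac": 0.02},
    {"id": "c3", "n_genes": 2300, "n_umi": 8500,  "mito_frac": 0.35},
    {"id": "c4", "n_genes": 6000, "n_umi": 90000, "mito_frac": 0.04},
    {"id": "c5", "n_genes": 2700, "n_umi": 10000, "mito_frac": 0.05},
    {"id": "c6", "n_genes": 2400, "n_umi": 9500,  "mito_frac": 0.04},
]
kept_ids = [c["id"] for c in filter_cells(cells)]
print(kept_ids)
```

In practice the cutoffs would be chosen per platform, since the 10x and inDrop data differ markedly in genes detected per cell.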
Figure 1.
Traditional immunophenotypically defined CD34+ thymic progenitor populations are comprised of a spectrum of lineage specification and commitment cell states. Human thymic CD34+ cells were analyzed by single cell RNA-Seq (sRNA-Seq). (A) Experimental schema. Fluorescence activated cell sorting gate for sorting CD34+Lin− cells (top plot) and frequencies of Thy1-3 among CD34+ cells (bottom plot) (Thymus 1) shown. (B) sRNA-Seq cell clusters (1-11: P1-11) in combined data from Thymuses 1-5 and representation of these clusters in individual thymuses. (C) Expression of a subset of HSPC (Hematopoietic stem and progenitor), alternative lineage, and T cell genes in clusters from (B). Aggregated data from Thymuses 1-3 is shown. % cells with detectable expression (% expression) and z scored mean expression shown. Data for Thymuses 4, 5 for (B,C) in Figure S1. Genes showing cluster specific increased or decreased expression in Table S2. (D) Frequencies of clusters in thymuses. (E) Index sorted sRNA-Seq data (Thymus 6, Figure S2) was mapped to data from (A) (Thymuses 1-5 combined). Plot: CD7 and CD1a protein data for Thymus 6 cells and the clusters from (B) that these cells mapped to (source data in Table S3). Each data-point represents one Thymus 6 cell. See also Figures S1 and S2.
Following correction of principal component analysis (PCA) results for technical and inter-donor biological variation related batch effects using Harmony (Figure S1A) (Korsunsky et al., 2019), we performed Seurat (v2.3) (Butler et al., 2018) clustering analysis on principal component (PC) values. Cell cycle effects were regressed out prior to PCA. CD34+ cells segregated into 13 clusters. Clustering results were consistent across a wide range of algorithm parameters (Figure S1B). Clusters 1 and 10, and likewise clusters 3 and 13, showed very similar expression profiles and were each manually merged into one cluster. Based on expression of known HSPC and lineage genes, the 11 resulting clusters were labeled P1-11, with P1 and P11 showing the least (lowest expression of T cell genes) and most (highest expression of T cell genes) differentiated profiles respectively (Figures 1B, C, and S1C,D, Table S2). The 11 clusters were represented across all thymuses, indicating inter-donor reproducibility (Figures 1B,D and S1C). Also, averaged sRNA-Seq expression profiles of P1-11 recapitulated previous bulk RNA-Seq data from CD34+ thymic populations (Casero et al., 2015), validating our sRNA-Seq data (Figures S2A and S2B). The proportions of the different clusters varied across thymuses and low numbers of P1 and P2 cells were seen in the inDrop data (Thymuses 4 and 5), observations consistent with inter-donor variation in frequencies of Thy1-3 immunophenotypes (Figure S2C), the lower number of cells in inDrop data, and the lack of an enrichment step for Thy1 cells during FACS for inDrop.
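The two per-cluster summaries used to compare clusters in the expression plots (percentage of cells with detectable expression, and mean expression z-scored across clusters) can be computed as sketched below for a single gene. The function name, the use of the population standard deviation, and the toy values are assumptions for illustration; they are not taken from the Seurat workflow.

```python
from statistics import mean, pstdev

def cluster_summary(expr_by_cluster):
    """For one gene, return (% cells with detectable expression, z-scored mean)
    per cluster. expr_by_cluster maps cluster name -> per-cell expression values."""
    pct = {k: 100.0 * sum(1 for v in vals if v > 0) / len(vals)
           for k, vals in expr_by_cluster.items()}
    means = {k: mean(vals) for k, vals in expr_by_cluster.items()}
    mu = mean(list(means.values()))
    sd = pstdev(list(means.values()))  # assumption: population SD across clusters
    z = {k: (m - mu) / sd if sd > 0 else 0.0 for k, m in means.items()}
    return pct, z

# Toy expression values for a hypothetical T cell gene in three clusters.
toy = {
    "P1":  [0, 0, 1, 0],
    "P3":  [2, 0, 3, 1],
    "P11": [4, 5, 3, 4],
}
pct, z = cluster_summary(toy)
print(pct)
print(z)
```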
Based on CD7, CD1A, and CD34 mRNA profiles, we predicted P1 and P11 to consist of Thy1 and Thy3 cells respectively, P2-7 and P10 to be enriched for Thy2, and P8 and P9 to be enriched for Thy3 cells. Using Harmony, we next mapped immunophenotypes to P1-11 clusters via an integrated analysis of droplet sRNA-Seq data (Thymuses 1-5) with sRNA-Seq data from index sorted cells from a separate thymus (Thymus 6; 192 cells; BD genomics Precise WTA Single Cell Kit; Figures S2C–E, 1E). Among the immunophenotypes, only Thy1 mapped to P1 and the index sorted cells mapping to P11 were largely Thy3. Thy1 cells mapping to P1 tended to have low CD7 protein expression while Thy1 cells mapping to P3 and P5 were largely in the higher CD7 (fluorescence intensity of 150-1000) region of the Thy1 gate. Index sorted cells mapping to P2-7 and P10 were enriched for Thy2 cells and very few of these cells mapped to Thy3. In contrast, cells mapping to P8 and P9 were enriched for Thy3 cells. Consistent with the minimal transcriptional differences between Thy2 and Thy3 in bulk RNA-Seq studies (Casero et al., 2015), some clusters (P8, P9) mapped to both Thy2 and Thy3. Taken together, transcriptome mapping to immunophenotype data was mostly consistent with CD34, CD7 and CD1A mRNA levels, Thy2 and Thy3 cells spanned several cell states, and CD34+ cells consisted of a spectrum of cell states marked by gradual changes in CD7, CD1A, CD34, CD44 and CD2 mRNA levels (Figure 1C). These results emphasize the heterogeneity within immunophenotypes and the existence of transcriptionally similar cells across bordering regions of adjacent gates in FACS data for a few proteins.
CD34+ thymic progenitor cells encompass a spectrum of lineage specification and commitment cell states
To delineate the heterogeneity of cell states during the initial stages of human thymopoiesis, we examined the transcriptional profiles of the sRNA-Seq cell clusters (P1-11) within the CD34+ thymocyte fraction. Of note, the clustering analysis de novo (i.e. independent of immunophenotype information) identified the most primitive CD34+ cells in the thymus as a cluster with low CD7 mRNA expression (P1, 1.5% of cells, Figure 1B,C) that mapped to the Thy1 immunophenotype (Figure 1E, Table S3). Single cells co-expressing HSPC and T lineage genes were seen within the P1 cluster (Figures 2, S3A) and among index sorted Thy1 cells (Figure S3B) indicating an early onset of T lineage priming that occurs prior to the repression of stem cell genes. These data are consistent with the Thy1 immunophenotype being enriched for the earliest progenitor cells in the thymus, which show a transcriptional program that is HSC like but are also distinguished from HSC by the expression of T cell genes.
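The single-cell co-expression argument above can be made concrete with a small sketch: a cell counts as co-expressing when it has detectable counts for at least one gene from each of two gene sets. The gene sets below are short illustrative selections, the "any gene in the set" rule is an assumption, and the toy counts are invented; none of these reproduce the actual gene panels or data behind Figure 2.

```python
# Illustrative gene sets (assumed selections, not the full panels used in the study).
HSPC_GENES = {"BAALC", "MEIS1"}
T_GENES = {"CD7", "TCF7", "BCL11B"}

def classify_cell(counts, hspc_genes=HSPC_GENES, t_genes=T_GENES):
    """Classify one cell by detectable (>0) expression in each gene set."""
    has_hspc = any(counts.get(g, 0) > 0 for g in hspc_genes)
    has_t = any(counts.get(g, 0) > 0 for g in t_genes)
    if has_hspc and has_t:
        return "co-expressing"
    if has_hspc:
        return "HSPC only"
    if has_t:
        return "T only"
    return "neither"

# Toy UMI counts per cell (invented).
toy_cells = {
    "cell_a": {"BAALC": 3, "TCF7": 1},
    "cell_b": {"MEIS1": 2},
    "cell_c": {"BCL11B": 4, "CD7": 2},
    "cell_d": {"GAPDH": 10},
}
labels = {name: classify_cell(c) for name, c in toy_cells.items()}
frac_coexpressing = sum(v == "co-expressing" for v in labels.values()) / len(labels)
print(labels, frac_coexpressing)
```

A bulk profile of these four cells would show both programs expressed, but only the per-cell labels distinguish co-expression within a cell from a mixture of "HSPC only" and "T only" cells.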
Figure 2.
The earliest thymic progenitor cells co-express HSPC (Hematopoietic stem and progenitor) and T lineage genes. Expression of HSPC and lineage genes in individual cells in the earliest thymic progenitor cluster (cluster 1 from Figure 1B) is depicted. Red: detectable expression. Blue: no detectable expression. Data from Thymus 2 shown; similar results were seen in Thymus 1 and 3. See also Figure S3.
Within Thy2 cells, high expression of HSPC genes was mostly limited to P3, and substantial expression of alternative lineage genes was largely restricted to P2 and P3 (Figure 1C). P2 cells, a rare cluster (~0.9% of cells, ratio of P2 cells to P1 cells among CD34+ cells = 0.6) marked by high IRF8 expression, were uniquely characterized by co-expression of T cell genes and a number of innate genes, in particular those known to be highly expressed in plasmacytoid dendritic cells (pDCs) (Balan et al., 2014; Lee et al., 2017; Marafioti et al., 2008). In addition, P2 showed high expression of CCR7, a homing gene highly expressed in Thy1 cells that potentially marks recently arrived cells from the BM (Hao et al., 2008; Zlotoff et al., 2010) (Figures 1C and 3A,B, Table S2). P2 cells were not flagged as doublets in a Scrublet analysis (Figure S3C) and showed UMI counts similar to those of P3 cells (mean UMI per cell = 9988-19461 vs 9403-22422, P2 vs P3 for Thymuses 1-3, p>0.1), making them unlikely to be doublets. P2 showed very little expression of PAX5 (expressed in 7% of cells) and CD19 (5% of cells) (Figure 3B) and did not express the mature pDC markers NRP1 (Collin and Bigley, 2018) or CD4 (Schmitt et al., 2007), making it unlikely to represent a B lineage committed or mature pDC population (Medvedovic et al., 2011). P2 also showed relatively lower expression of HSPC genes than other early clusters (P1 and P3, Figure 1C), further supporting it as being transcriptionally distinct from other uncommitted thymic progenitor populations.
Figure 3.
The initial stages of human thymopoiesis are characterized by multilineage priming followed by a continual transition to a T cell restricted gene program. (A) Expression of plasmacytoid dendritic cell (pDC) genes in P1-11 (1-11: P1-11 clusters from Figure 1B). Percentage cells with detectable expression (% expression) and z scored mean expression shown. (B) Expression of pDC, T, and B genes in individual P2 cells. Red: expression detected. Blue: no detectable expression. (C) Expression averaged across genes for HSPC (Hematopoietic stem and progenitor), T, B, NK, and myeloid genes in P1-3 and P11. Each data-point represents a cell. Data from Thymuses 1-3. Data for Thymuses 4 and 5 in Figure S4.
P3 cells co-expressed T cell and HSPC, myeloid, B cell, and/or NK genes (Figures 3B,C and S4A,B). Co-expression of T cell and HSPC and/or alternative lineage genes was also seen in index sorted Thy2 cells (Figure S4C). These results indicate concomitant priming of multiple lineage transcriptional programs rather than just the presence of unilineage gene program expressing subsets among uncommitted progenitor cells.
A gradual increase in the mRNA levels of T cell genes was seen from P4-11 (Figure 1C). Expression of CD44, alternative lineage (DTX1) and HSPC (BAALC) genes was still seen in P4 and to a slightly lesser extent in P5. P6-11 showed largely committed profiles with lower or minimal expression of HSPC and alternative lineage genes. However, only P11 showed high RAG1 expression, indicating that commitment occurs independent of T cell receptor (TCR) rearrangement (Figure 1C). Taken together, CD34+ thymocytes encompass a spectrum of lineage specification and restriction cell states that is marked by multilineage priming of single cells followed by repression of HSPC and alternative lineage genes along with a continued increase in the expression levels of T cell genes.
IRF8 and CD2 expression distinguish subpopulations within the Thy2 population
To characterize, through functional assays, the lineage potentials of the uncommitted progenitor subpopulations inferred via sRNA-Seq, we examined the sRNA-Seq data for cluster specific marker genes that could inform the isolation of these cell types by FACS. P2 showed increased IL3RA, CD44, and CCR7 expression, intermediate CD7, and minimal CD1A (Figures 1C, 3A). However, consistent with the challenges in FACS isolation of cell types inferred from high dimensional sRNA-Seq data (Zheng et al., 2018), these mRNA profiles did not perfectly separate P2 from all other clusters. While IL3RA expression was quite specific to P2, IL3RA mRNA was not detectable in 70% of P2 cells. Although technical dropouts account for the absence of expression in many of these cells, these findings also suggest that P2 includes cells that truly lack IL3RA expression. Since IRF8 mRNA was detected in >75% of P2 cells, we first performed intracellular flow cytometry for IRF8, which revealed a rare distinct population of CD34+CD7+CD1a−CD19−CD4−CD8− cells with high IRF8 and CCR7 protein levels in two additional thymuses, validating an IRF8+ subpopulation within Thy2 (Figure 4A).
Figure 4.
IRF8 and CD2 expression distinguish subpopulations within the Thy2 population. (A) Flow cytometry data (FC) showing a rare IRF8+ population within Thy2 and CCR7 expression in Thy1, IRF8− Thy2, and IRF8+ Thy2 cells (n=2 thymuses). (B) FC showing a rare CD123+ population within Thy2 (n=7 thymuses), and CD44, CD7, and IRF8 expression in the CD123+ population (n=4 thymuses). Plots pre-gated on CD34+CD4−CD8−CD19− cells (A,B). (C) Table: Pearson correlation between P1-11 single cell RNA-Seq (sRNA-Seq) data (aggregated from Thymuses 1-3; 1-11: P1-11 clusters from Figure 1B) and bulk RNA-Seq data from CD123+ Thy2 cells (B_R1, B_R2: 2 thymuses) isolated by Fluorescence-activated cell sorting (FACS). Expression of pDC and T cell genes and enrichment of genes with increased expression in P2 or pooled P3 and P4 among genes ranked by CD123+ Thy2 vs Thy2 expression are shown. NES: normalized enrichment score. FDR: False discovery rate adjusted p value. (D) Frequencies of pDC lineage (HLA-DR+CD14−CD66b−CD303+CD123+) cells in cultures (day 7-9) initiated with Thy1, CD123+Thy2 (CD123+), or CD123−CD44+ Thy2 (CD123−) populations. EXP: experiment number, each experiment done in single to triplicate wells, using a different thymus. Mean and standard error of mean (SEM) shown for duplicate and triplicate wells. Paired t-test p value for CD123+ vs CD123−. (E) FACS for discriminating commitment stages within Thy2. (F) Day 21 FC from T cell differentiation cultures initiated with Thy2a or Thy2b (n=3 thymuses). (G) B (CD19+), NK (NK1: CD56+, NK2: CD56+CD11c+), and myeloid (M1: CD33+, M2: CD66b+, M3: CD14/15+) outputs from Thy2a and Thy2b in alternative (non-T) lineage assays (day 14). Each experiment (Exp) done in triplicate (mean, SEM) using a different thymus. Multifactorial ANOVA P value <0.001 for Thy2a vs Thy2b lineage outputs. FC: B and myeloid (M1) differentiation. No live CD45+ cells seen when Thy2c or Thy3 were cultured. (H) Expression heatmap (Z-score scaled) for genes independently associated with cell state in a Bayesian multivariate analysis of sRNA-Seq data. 1, 3, and 4: P1, P3, and P4 clusters. Cell state: categorical variable (possible values: P1, P3, or P4). See also Figure S5.
We then used IL3RA (CD123), CD7, and CD1a expression to isolate a population enriched for P2 cells by FACS for further characterization studies. FACS uncovered a rare CD123+CD19−CD4−CD8− population (“CD123+”) almost entirely made up of Thy2 cells. CD123+ cells showed CD7, CD1a, CD44, and IRF8 protein levels consistent with P2 sRNA-Seq mRNA levels (Figure 4B). The ratio of CD123+ to Thy1 cells among CD34+ cells varied from 0.14-0.77 (5 thymuses). To transcriptionally validate P2, we compared the bulk RNA-Seq profile of the sorted CD123+ population with P1-11 sRNA-Seq profiles. Among P1-11, P2 was the cluster with the most correlated transcriptome with the CD123+ population. Furthermore, the CD123+ population showed a transcriptional signature strikingly similar to that of P2 with increased expression of a multitude of pDC genes (Figure 4C).
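The comparison of a bulk profile against cluster-averaged sRNA-Seq profiles can be sketched as a Pearson correlation over a shared gene list, followed by selection of the best matching cluster. The gene order, the expression values, and the function names below are illustrative assumptions; the actual analysis would use many more genes and the normalization described in the methods.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return sxy / sqrt(sxx * syy)

def best_matching_cluster(bulk, cluster_profiles):
    """Return (best cluster name, all correlations) for one bulk profile."""
    scores = {name: pearson(bulk, prof) for name, prof in cluster_profiles.items()}
    return max(scores, key=scores.get), scores

# Toy profiles over the same gene order: IRF8, IL3RA, TCF4, CD1A, RAG1 (values invented).
bulk_cd123 = [8.0, 6.0, 7.0, 0.5, 0.2]
cluster_profiles = {
    "P1":  [2.0, 1.0, 3.0, 0.1, 0.1],
    "P2":  [9.0, 5.0, 8.0, 0.3, 0.1],
    "P11": [0.5, 0.2, 1.0, 6.0, 7.0],
}
best, scores = best_matching_cluster(bulk_cd123, cluster_profiles)
print(best, scores)
```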
Sorted Thy1, CD123+ Thy2 (CD123+), CD123−CD44+ Thy2 (CD123− Thy2), and Thy3 populations were cultured in MS5-DLL1 artificial thymic organoids (ATO) or MS5 stromal co-cultures to assess T lineage or alternative (B, NK, myeloid, pDC) lineage differentiation respectively. MS5 does not support T cell differentiation. Thy1, CD123+, and CD123− Thy2 generated pDC (Figure 4D, S5A) and T cell lineage cells (Figure S5B). Consistent with the high expression of innate genes in the CD123+ population, these cells also generated monocytes (Figure S5C). However, unlike other uncommitted progenitors (Thy1 and CD123− Thy2), CD123+ cells showed minimal B or NK lineage differentiation (Figure S5C). Cultures initiated with CD123+ cells tended to show higher proportions of pDC lineage cells relative to those initiated with CD123− Thy2 cells (Figure 4D). pDC lineage cell counts on day 8 in cultures initiated with CD123+ cells were 2-4 fold higher than the number of input CD123+ cells. As expected, the committed Thy3 population showed robust T lineage output (Figure S5B) but did not generate non-T lineage cells (data not shown). Overall, sRNA-Seq informed the identification of a rare IRF8+ pDC lineage primed CD34+ thymic progenitor population and supported a pDC specification pathway during the initial stages of human thymopoiesis.
Of note, CD2 showed significantly decreased mRNA expression in P3 cells indicating this gene as a potential marker for discrimination of stages of commitment within the Thy2 population (Figure 1C and Table S2). We performed lineage assays of FACS sorted Thy3, Thy2a (CD34+CD7+CD1a−CD44+CD2−), Thy2b (CD34+CD7+CD1a−CD44+CD2+), and Thy2c (CD34+CD7+CD1a−CD44−CD2+) populations to test the prediction of CD2 as a marker for degree of commitment (Figure 4E). All 4 populations showed robust T lineage differentiation in ATOs (Figures 4F, S5D). Thy2c and Thy3, which are T lineage committed (Cante-Barrett et al., 2017), generated little to no cell output in alternative lineage assays. In contrast, Thy2a and Thy2b showed alternative lineage potentials. Furthermore, compared to Thy2b, Thy2a generated a significantly higher output of alternative lineage (B, NK, and myeloid) cells, a difference most pronounced for B lineage output (Figures 4G, S5E). Taken together, these results imply that the loss of B cell potential, an event that coincides with a gain in CD2 protein expression, precedes the silencing of myeloid and NK potentials, later commitment events that are associated with the loss of CD44 protein expression. Furthermore, B cell potential is confined to a minor fraction (Thy2a) of the Thy2 population. T lineage output from Thy2b was not lower than that from Thy2a (Figure S5F) indicating that the differences in alternative lineage output between them were not merely due to an inherently reduced proliferative ability in the case of Thy2b. Moreover, sRNA-Seq did not show differences in cell cycling or MKI67 expression between P3 and P4 clusters (Figure S5G,H).
Prioritization of candidate genes for functional studies based on cell type-gene expression associations uncovered by univariate differential expression analyses is challenging since these methods do not take into account the strong expression correlations that often exist between genes. In contrast, multivariate methods reveal associations independent of correlations between genes. To aid the use of our data as a resource for hypothesis generation, we identified transcription factor (TF) genes whose expression independently correlated with cell type identity using multivariate Bayesian Network (BN) modeling (BNOmics; Gogoshin et al., 2017) of sRNA-Seq data. We used BN modeling to compare P1, P3, and P4, early progenitor clusters with closely related transcriptomes. The resulting list of independent TF-cell type associations (where possible values for identity = P1, P3, or P4) recapitulated known lineage fate regulators (TCF7, BCL11B) (Ha et al., 2017; Johnson et al., 2018; Li et al., 2010; Van de Walle et al., 2016) and suggested candidate regulators (TFDP2 and HOPX) (Figure 4H). We next performed a BN analysis comparing the pDC primed P2 cluster with all other CD34+ cells. While many TFs showed increased expression in P2 in a univariate analysis (e.g., IRF8, IRF7, TCF4, and SPIB, Table S2), only IRF8, an essential TF for dendritic differentiation (Sontag et al., 2017), was independently associated with the P2 cell type in a BN multivariate analysis.
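The distinction between a univariate association and an independent association can be illustrated with a first-order partial correlation. This is a deliberately simpler stand-in and is not Bayesian network modeling or the BNOmics method: it only shows how a "passenger" gene can correlate with a cell-state score while that correlation vanishes once a correlated "driver" gene is accounted for. All variable names and values are hypothetical toy data constructed for this purpose.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Toy data for 8 cells (invented): a driver gene, a passenger gene that tracks
# the driver plus unrelated noise, and a numeric cell-state score that also
# tracks the driver plus unrelated noise.
driver     = [0, 0, 0, 0, 2, 2, 2, 2]
passenger  = [2, 0, 2, 0, 4, 2, 4, 2]
cell_state = [2, 2, 0, 0, 4, 4, 2, 2]

r_univariate = pearson(passenger, cell_state)
r_partial = partial_corr(passenger, cell_state, driver)
print(r_univariate, r_partial)
```

In this toy example the passenger gene shows a univariate correlation of about 0.5 with the cell-state score, but its partial correlation given the driver is essentially zero, which is the kind of dependence a multivariate model is meant to resolve.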
In summary, the losses of B cell and myeloid potentials represent distinct commitment stages in CD7+ cells, and pDC priming is represented by a rare IRF8+ subpopulation within the Thy2 population that is transcriptionally distinct from other uncommitted progenitor cells.
Transcriptional changes along the single-cell differentiation trajectory during the initial stages of human thymopoiesis
We ordered CD34+ cells along the differentiation trajectory via Monocle (v3) pseudotime analysis (Cao et al., 2019) of 10x data (Thymuses 1-3) to delineate expression patterns of genes during the initial stages of thymopoiesis (Figure 5A). Overall, the order of P1-P11 cells (P1-11: clusters from Seurat analysis, Figure 1B) relative to each other was in agreement with the inter-cluster transcriptional relationships seen in the prior Seurat analysis. Namely, the trajectory downstream of the root cells in P1 sequentially passed through the remaining P1 cells, P3, and a complex series of branches and reunions through P4-11 with P11 being toward the end of the trajectory. Of note, P2 originated from the earliest P1 cells suggesting an early divergence of at least some pDC primed progenitors from the main T cell pathway (Figure 5B).
Figure 5.
Transcriptional changes along the single-cell differentiation trajectory during the initial stages of human thymopoiesis. (A) Analysis schema. (B) Pseudotime trajectory. 1-11: P1-11 clusters from Figure 1B. (C) Modules of genes whose expression varied with pseudotime (Thymus 1 shown; 99% overlap seen with Thymus 2 and 3 modules; genes in Table S4). (D-G) Cells along the shortest path between the root node in P1 and the circled (red) node in P11 were used to model pseudotime expression profiles. Pseudotime distributions of cells (D, circles: metastable states), clusters (E), cycling status (F), and maximal rates of change in gene expression (G) along the shortest path. (E) includes data for the cell cycling gene PCNA. (H) Expression profiles of a subset of Hematopoietic stem and progenitor (HSPC) genes and lineage genes.
Genes that varied with pseudotime were clustered into 5 modules (Figure 5C, Table S4). Genes with increased expression in early progenitors segregated into 3 modules: increased expression in P1 (Progenitor 1), increased expression in P2 or P2 and P3 (Progenitor 2), or increased expression in P1, P2 and P3 (Progenitor 3). We found four metastable states (regions of high cell density in pseudotime; Haghverdi et al., 2016), which most likely represented the earliest progenitor, proliferating progenitor undergoing lineage restriction, earliest fully committed progenitor (coinciding with complete repression of DTX1), and TCRβ rearrangement phases respectively. These states coincided with or were immediately preceded by the regions of maximal transcriptional changes. While cells late in the trajectory (enriched for P11 cells) showed cell cycle arrest, preceding cells with committed transcriptomes (P6, P10, and late P5) included cycling cells, indicating that commitment, which precedes TCRβ rearrangement, is not associated with proliferation arrest (Figure 5D–G). Although cell cycle effects were regressed out, given the known strong differences in proliferation between thymic progenitor differentiation stages (Rothenberg, 2019), we cannot exclude that cell cycle differences contributed to the observed changes in cell density over pseudotime.
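Treating metastable states as regions of high cell density in pseudotime, candidate states can be located by binning cells along pseudotime and reporting local maxima of the bin counts. The sketch below is a simplified histogram approach under assumed settings (`n_bins`, `min_count`, strict local maxima, toy pseudotime values); it is not the density estimation used for Figure 5D.

```python
def density_peaks(pseudotimes, n_bins=10, min_count=2):
    """Histogram cells over pseudotime and return (counts, peaks), where peaks
    are (bin center, count) for bins strictly higher than both neighbours."""
    lo, hi = min(pseudotimes), max(pseudotimes)
    if hi <= lo:
        raise ValueError("pseudotime values must span a non-zero range")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for t in pseudotimes:
        idx = min(int((t - lo) / width), n_bins - 1)  # last bin includes hi
        counts[idx] += 1
    peaks = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else -1
        right = counts[i + 1] if i < n_bins - 1 else -1
        if c >= min_count and c > left and c > right:
            peaks.append((lo + (i + 0.5) * width, c))
    return counts, peaks

# Toy pseudotime values with three dense regions (invented).
toy_pseudotime = [0.0, 0.9, 1.0, 1.1, 1.2, 3.0, 4.8, 5.0,
                  5.1, 5.2, 5.3, 7.0, 8.9, 9.0, 9.1, 10.0]
counts, peaks = density_peaks(toy_pseudotime)
print(counts)
print(peaks)
```

Because bin counts depend on how many cells were sampled at each stage, differences in proliferation between stages can shift such peaks, which is the caveat noted above.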
To facilitate the use of the pseudotime data as a resource, we provide gene expression profiles smoothed over pseudotime (tradeSeq, Table S4). These profiles finely distinguished transcriptional changes along the differentiation trajectory: e.g. genes repressed earlier (e.g. MEIS1) vs later (e.g. DTX1) (Figure 5H). In summary, pseudotime analysis delineated the major transcriptional shifts during the initial stages of thymopoiesis and revealed an early offshoot of pDC primed progenitors from the canonical T cell differentiation pathway. In contrast to expression profiles defined across a limited number of data-points in bulk studies, pseudotime analysis provided a resource with high statistical power for the inference of gene regulatory networks in thymopoiesis.
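How smoothed pseudotime profiles separate earlier from later repression can be sketched with a centered moving average and a half-maximum rule. This swaps in a simple smoother for the additive models fitted by tradeSeq, and the window size, the 50% threshold, and the two toy genes are assumptions for illustration only; the values are not measurements of MEIS1 or DTX1.

```python
def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def repression_time(pseudotime, expression, window=3, fraction=0.5):
    """First pseudotime after the smoothed peak at which smoothed expression
    drops below `fraction` of that peak; None if it never does."""
    order = sorted(range(len(pseudotime)), key=lambda i: pseudotime[i])
    t = [pseudotime[i] for i in order]
    smooth = moving_average([expression[i] for i in order], window)
    peak_idx = max(range(len(smooth)), key=lambda i: smooth[i])
    threshold = fraction * smooth[peak_idx]
    for i in range(peak_idx, len(smooth)):
        if smooth[i] < threshold:
            return t[i]
    return None

# Toy profiles over 8 cells ordered in pseudotime (invented values).
pt = [0, 1, 2, 3, 4, 5, 6, 7]
early_repressed_gene = [4, 4, 3, 1, 0, 0, 0, 0]
late_repressed_gene = [4, 4, 4, 4, 3, 1, 0, 0]

t_early = repression_time(pt, early_repressed_gene)
t_late = repression_time(pt, late_repressed_gene)
print(t_early, t_late)
```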
The transcriptional landscape underlying key post-commitment differentiation checkpoints in human thymopoiesis
To define the transcriptional landscape of the post-commitment stages of thymopoiesis, we performed sRNA-Seq of CD34−CD45+GlyA− cells (3 thymuses: Thymuses 1, 3, and 7; ~15000 cells; 10x platform; Figure 6A). An integrated analysis of all thymuses revealed 18 clusters (P1-18neg, Figures 6A–C and S6B,C) with the following predicted identities: P1neg: pre β-selection cells, P2-5neg: cells undergoing β-selection mediated proliferation (includes CD4+CD8+ double positive (DP) cells), P6-11neg: late (quiescent) DP cells, and P12-18neg: single positive (SP) T cells.
Figure 6.
Transcriptional landscape of the post-commitment stages of human thymopoiesis. Human thymic CD34− cells were analyzed by single cell RNA-Seq (sRNA-Seq). (A) Experimental schema. (B) Clusters (1-18: P1-18neg) in merged sRNA-Seq data (CD34− cells) from Thymuses 1, 3, and 7. (C) Expression of genes across clusters in (A). % cells with detectable expression (% expression) and z scored mean expression shown. (D) Overlay of sorted CD4SP and CD8SP sRNA-Seq data onto CD34− cells (Thymus 7, different t-SNE space from A). (E) Expression of a subset of transcription factor genes in P11-18neg (11-18). (F) Genes (rows) differentially expressed between early CD4SP and CD8SP. Late DP (P11neg), early CD4SP, and early CD8SP (defined in Figure S6) shown. Gene clusters 1 and 2 in Table S5. (A-C, E, and F) Aggregated data from Thymuses 1, 3, and 7. See also Figure S6.
The lack of immunophenotypic markers precisely delineating the initiation of β-selection during human thymopoiesis has meant the acute transcriptional changes underlying human β-selection remain poorly defined. We propose that the changes in expression between P1neg and P2/P3neg represent the transcriptional changes underlying the initiation of β-selection. DNA damage repair, spliceosome, and mitochondrial genes showed increased expression with β-selection. PBK, a pro-proliferative gene expressed in cytotoxic T cells (Hu et al., 2010), the anti-apoptotic gene BIRC5 (Li et al., 1998), and the cell cycle regulator BTG3 showed increased expression in P2/P3neg cells (Table S5).
An integrated analysis of sRNA-Seq data from sorted CD4 SP, CD8 SP, and CD34− cells from the same thymus (Thymus 7) identified P13,15, and 17neg as CD8SP, P14neg as CD4SP, P11neg as a mixture of DP and emerging CD4SP, P12neg as a mixture of emerging CD4SP and CD8SP, and P16 and 18neg as overwhelmingly made up of CD4SP. Based on cluster specific transcriptional signatures, we predicted the following identities: P13neg: CD8αα+ innate like T cells (high GNG4), P14neg: conventional CD4SP, P15neg: conventional CD8SP, P16neg: Treg like cells (FOXP3+, TNFRSF9+), P17neg: NK like (NKT and γδ) T cells (high KLRB1 and NKG7), and P18neg: Treg cells (high FOXP3 and IL2RA). We found additional cell type-TF associations such as GTF3A (CD8αα+ T cells), HOXB2 (Treg), and HMGN3 (NK like T cells) (Figures 6D,E and S6D, Table S5).
Heterogeneity within DP and SP populations limits the ability to delineate the transcriptional changes underlying the CD4 vs CD8 fate choice via bulk studies. To identify candidate genes underlying CD4 vs CD8 fate, we screened for genes differentially expressed between the earliest CD4SP and CD8SP cells (tradeSeq) in a Monocle pseudotime trajectory encompassing terminal DP and conventional SP cells (P11, 12, 14, and 15 neg) (Figure S6E,F). While 147 genes including CD4 (cluster 1) showed higher expression in early CD4SP, only 33 genes including CD8A and CD8B (cluster 2) showed higher expression in early CD8SP (Figure 6F, Table S5). NF-kappaB, TCR signaling, and P53 like TF genes showed increased expression in the CD4 branch. GATA3 and NR4A1, TFs important for CD4SP survival (Ho et al., 2009) and regulation of effector T cell differentiation (Liu et al., 2019) respectively, showed increased expression in CD4SP. CD8SP showed increased ZNF496 expression. Overall, sRNA-Seq data from CD34− cells recapitulated known regulators and uncovered candidate regulators (e.g. NR4A1 and ZNF496) underlying CD4 vs CD8 fate choice and SP subtypes.
The earliest human thymic progenitors are transcriptionally distinct from murine thymic progenitors
To compare the transcriptional landscapes of the initial stages of human and murine thymopoiesis, we analyzed putative DN1-3a cells (murine counterparts of human thymic CD34+ cells) (Rothenberg et al., 2016) in published sRNA-Seq data from unfractionated murine fetal and post-natal thymic cells (Kernfeld et al., 2018). DN1-3a cells were mainly made up of fetal cells. Seurat clustering revealed 12 clusters (Pm1-12, Figure 7A). We predicted Pm1-2, Pm3, Pm4 and 6, Pm5, Pm7-10, and Pm11-12 to represent DN1, DN2a, innate lymphoid, DN1 like, DN2b/DN3a, and DN3a cells respectively (Figures 7A,B) (Rothenberg et al., 2016; Zhou et al., 2019).
Figure 7.
The earliest human thymic progenitors are transcriptionally distinct from murine thymic progenitors. (A) Cell clusters (1-12: Pm1-12) in single cell RNA-Seq (sRNA-Seq) data from murine thymic progenitors. (B) Expression of a subset of HSPC (Hematopoietic stem and progenitor cell) and lineage genes in clusters. Percentage cells with detectable expression (% expression) and Z-scored mean expression shown. (C) Overlay of murine data onto human CD34+ thymic cells (combined data from Thymuses 1-5, Figure 1B). The human P1 cluster segregated into 1A and 1B. 1A and 1B are shown on the human pseudotime trajectory (bottom right). (D and E) Transcription factor (TF) genes with divergent (D) and HSPC and lineage genes with conserved (E) profiles between human and murine progenitors. A public dataset containing murine fetal thymic progenitors was used. See also Figure S7.
We used Harmony to integrate murine and human data (Shafer, 2019). Murine uncommitted and committed cells tended to overlap or be in closer proximity to human uncommitted and committed clusters respectively (Figure 7C). The innate Pm4 and Pm6 cells overlapped with the human P2 cluster, a finding consistent with the increased expression of innate genes in P2. However, neither Pm4 nor Pm6 showed increased expression of pDC genes (Irf8, Irf7, Il3ra, Spib, Figures 7A, S7A), and a pDC primed cluster similar to P2 was not seen among murine cells. Cells belonging to the earliest human progenitor cluster (P1) segregated into two subclusters (1A and 1B). While 1B completely overlapped with murine uncommitted progenitors (mainly Pm2 and 3), 1A showed minimal overlap with murine cells. Similarly, the earliest murine progenitors (Pm1) showed minimal overlap with human cells. Of note, 1A occupied an earlier position on the human pseudotime trajectory than 1B. Thus, while human and murine progenitors show many transcriptomic similarities, the earliest human and murine progenitor cells have distinct transcriptomes (Figure 7C).
To compare gene expression profiles between species, we performed pseudotime analysis of murine data (Figures S7B,C). Several HSPC (MEF2C, SPI1, and HHEX) (Casero et al., 2015; Jackson et al., 2017; Stehling-Sun et al., 2009; Unnisa et al., 2012), alternative lineage (BTK, TYROBP, MPO, IRF8, and ZBTB16) (Casero et al., 2015; Hosokawa et al., 2018; Lee et al., 2017; Tomasello and Vivier, 2005), and T lineage (TCF12 and LEF1) (Casero et al., 2015) genes showed conserved profiles. However, divergences were observed for a number of TFs. Among the critical T lineage TFs, the onset of Bcl11b expression was preceded by acute increases in the expression levels of Tcf7 and Gata3 in mice, a finding consistent with bulk RNA-Seq studies (Kueh et al., 2016). In contrast, in humans the onset of BCL11B expression was not preceded by rising TCF7 and GATA3 expression levels; instead, BCL11B and TCF7 expression levels increased concurrently. The HSPC TF BAALC (Casero et al., 2015) and B lineage TF EBF1, genes expressed in early stages of human thymopoiesis, and AEBP1, a gene whose expression level increases with T commitment in humans, were not expressed or barely expressed in murine progenitors. The erythroid TF NFE2 (Woon Kim et al., 2011) was silenced earlier in murine relative to human thymopoiesis. On the other hand, the HSPC TFs HOPX and MEIS1 (Casero et al., 2015) were silenced earlier in human thymopoiesis. The HSPC TF BCL11A (Casero et al., 2015) was silenced with commitment in murine cells but continued to be expressed, albeit at lower levels (relative to uncommitted cells), in committed human cells. Consistent with previous studies, expression of the essential T cell TF NOTCH1 (Radtke et al., 1999; Taghon et al., 2005) and its target DTX1 (Van de Walle et al., 2016), a TF required for thymic NK differentiation, declined with commitment in the human thymus but was sustained at high levels during commitment in mice.
Furthermore, the expression level of ID2, a TF that promotes NK differentiation (Hosokawa et al., 2018) sharply declined early in human but not murine cells (Figures 7D,E and S7D).
Overall, these data indicate conserved features as well as species related differences in progenitor cell transcriptomes and the expression profiles of TFs in the initial stages of thymopoiesis independent of immunophenotype information.
Discussion
We defined the transcriptional landscape underlying lineage fate in human thymopoiesis using cell states identified de novo in data from all CD34+ cells thereby minimizing confounding effects from the use of FACS gates for the delineation of differentiation stages, the immunophenotype definitions for which vary between studies from different groups. Our results underscore the importance of defining single cell transcriptomes in purified rare progenitor cells for the delineation of the earliest events in lineage differentiation.
PDC are present in the thymus in humans and mice, and uncommitted thymic progenitors can generate pDC in vitro (Dontje et al., 2006; Martin-Gayo et al., 2017). However, non-physiological experimental conditions make it challenging to interpret the lineage pathways during in vivo human thymopoiesis from in vitro or xenotransplantation assays, and it remains unknown whether in vivo the thymic pDC originate from thymic progenitors or extrathymic sites. Furthermore, transcriptional evidence of pDC specification in unperturbed primary human thymic progenitor cells has been lacking, in part because the rarity of pDC primed cells in vivo and the heterogeneity in their immunophenotype limit their amenability to isolation. SRNA-Seq de novo identified a rare population of cells with pDC transcriptional priming among unperturbed human thymic CD34+ cells. The rarity of the pDC progenitor and the inhibitory effect of NOTCH1 signaling on pDC differentiation (Dontje et al., 2006) precluded clonal T/pDC or extensive multilineage assays. Nevertheless, FACS confirmed a rare CD34+ thymic population with increased expression of numerous pDC genes that showed robust T cell and pDC potentials and monocytic potential but no B or NK differentiation in vitro, findings consistent with the co-expression of T cell, pDC, and innate genes seen in the P2 sRNA-Seq cluster. A rare CD34 low CD123+ population with pDC potential has been reported in the human thymus in one study (Martin-Gayo et al., 2017). However, T, B, NK, or myeloid lineage potentials or the transcriptional profiles of the primary CD123+ cells were not investigated in the aforementioned study. The pDC primed progenitor population identified in our study likely includes the previously reported CD123+ population. 
The robust pDC differentiation from Thy1 in vitro (Martin-Gayo et al., 2017), a finding also observed in our study, is consistent with the pseudotime prediction of the origin of pDC primed progenitors from the earliest Thy1 cells. Our findings support the existence of a pDC developmental pathway from thymic progenitors in vivo and provide comprehensive single cell transcriptional profiles of the earliest pDC primed cells, a resource for dissecting mechanisms in thymic pDC differentiation. Interestingly, the CD24+CD117− subset of murine DN1 cells (DN1d) possesses T cell potential in vitro (Porritt et al., 2004) and is transcriptionally pDC primed (Moore et al., 2012). Further studies are needed to determine whether the murine DN1d and the human thymic IRF8 high population are equivalent cell types.
Of note, while CD2 mRNA expression is increased in fully committed cells relative to uncommitted progenitors in the thymus (Casero et al., 2015), the Thy2 population is reported to have a CD2+ immunophenotype (Marquez et al., 1998). Our results indicate the presence of CD2− and CD2+ subpopulations with differing lineage potentials within the uncommitted CD44+ fraction of Thy2 cells. Given the lack of engraftment of human thymic progenitors in immunodeficient mice and the lack of robust clonal assays that support T and B lineage output from the same cell (Hao et al., 2008; Weerkamp et al., 2006), clonal or in vivo studies were not performed for Thy2a cells. The loss of B cell potential first followed by the loss of myeloid and NK potentials later in human thymopoiesis seen in our study is consistent with findings in mice, where the repression of B cell potential precedes the silencing of myeloid and NK potentials (Bhandoola et al., 2007; Scripture-Adams et al., 2014).
A cross species analysis of thymic sRNA-Seq data revealed several inter-species differences, a finding that bears implications for applying findings from mice to human T cell development. While we cannot exclude that ontogeny (fetal progenitors in the murine dataset) accounted for some of the differences, the analysis also recapitulated known species differences (Van de Walle et al., 2016) indicating that our results at least in part reflect species specific aspects of thymopoiesis.
SRNA-Seq has identified a fetal liver progenitor like CD34+ population in the human early fetal thymic primordium (Zeng et al., 2019). Of note, SP T cells are not seen at this gestational stage, a cellular landscape distinct from the postnatal thymus. Two progenitor cell states, a quiescent one and a proliferating one, are distinguishable in fetal thymic sRNA-Seq data (Zeng et al., 2019). A recent sRNA-Seq study of unfractionated thymocytes (Park et al., 2020) has revealed several SP T cell subtypes but does not delineate stages of differentiation within CD34+ cells. In contrast, we identified a multitude of CD34+ cell states in the postnatal thymus representing a spectrum of lineage specification and commitment stages. Differences between BM derived postnatal thymopoiesis and fetal liver derived differentiation in thymic primordia, the low frequency of CD34+ cells in unfractionated thymocytes, the rarity of the earliest progenitor subset among CD34+ cells, and the use of purified CD34+ cells along with enrichment for the earliest progenitors (Thy1 cells) in our study likely account for the differing progenitor landscapes seen in our and these other studies.
The limited resolution of differentiation achievable with bulk population studies has represented a barrier in the study of thymopoiesis. The single cell database presented here finely parses stages of lineage specification and restriction and defines transcriptomic changes underlying cell state transitions across T cell differentiation in the human thymus.
STAR METHODS
RESOURCE AVAILABILITY
Lead contact
Requests regarding further information and datasets should be directed to the lead contact Chintan Parekh, cparekh@chla.usc.edu
Materials availability
This study did not generate new unique reagents.
Data and code availability
Sequencing data are deposited in the Gene Expression Omnibus (GEO, accession GSE139042).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Thymuses
Deidentified thymuses from children undergoing cardiac surgery were collected through a Children’s Hospital Los Angeles institutional review board approved protocol. CD34+Lin− and CD34−CD45+GlyA− thymic cells were isolated by magnetic column separation (MACS, Miltenyi Biotec, CA) followed by FACS (FACSAria, BD). The sex and age of the thymus donors are listed in Table S1. The limited number of thymuses precluded an analysis of the effect of sex on the results of the study.
Lineage differentiation assays
The following CD34+CD4−CD8−CD19− populations were FACS sorted from thymic cells: CD1a−CD7− (Thy1), CD7+CD1a−CD123+ (CD123+), CD7+CD1a−CD44+CD123− (CD123− Thy2), and CD7+CD1a+ (Thy3). Given the rarity of the CD123+ population, a large number of events (500,000-700,000) were recorded to visualize the CD123+ population. Sorted cells were cultured in artificial thymic organoids (ATO) (Seet et al., 2017) to assess their T lineage potential or co-cultured with MS5 cells in B-NK-myeloid (Ha et al., 2017) or pDC media to assess their alternative (non-T lineage) potentials.
The following CD34+CD7+CD4−CD8−GlyA− populations were FACS sorted from thymic cells to determine whether CD2 can be used to discriminate stages of commitment: Thy2a (CD1a−CD44+CD2−), Thy2b (CD1a−CD44+CD2+), Thy2c (CD1a−CD44−CD2+), and Thy3 (CD7+CD1a+). Sorted cells were cultured in artificial thymic organoids (ATO) to assess their T lineage potential or co-cultured with MS5 cells in B-NK-myeloid, NK, or myelo-erythroid media to assess their alternative (non-T lineage) potentials.
Sorted thymic cells mixed with 150,000 MS5-DLL1 cells were centrifuged, resuspended in 5-10 microliters of PBS +1% FBS and deposited on a cell culture insert, which was then cultured in a 6-well plate containing T cell differentiation medium to create an artificial thymic organoid (ATO) (1 ATO per well containing 1 ml of medium). Culture medium was replaced with fresh medium twice a week. ATOs were analyzed by flow cytometry for T lineage differentiation. ATOs were initiated with 1000 thymic cells for all the experiments except one of the experiments involving CD123+ cells where 300 thymic cells were used to initiate ATOs. In a given experiment, equal numbers of thymic cells were used to initiate ATOs from all the sorted populations.
Equal numbers of Thy1, CD123+, CD123− Thy2, and Thy3 cells (350-433 cells) were co-cultured with MS5 cells in 200 microliters per well of B-NK-myeloid or pDC differentiation media on a 96-well TC-treated plate in single to triplicate wells to assess their alternative (non-T lineage) potentials (Ha et al., 2017). Equal numbers of Thy2a, Thy2b, Thy2c, and Thy3 cells were co-cultured with MS5 cells in 200 microliters per well (2500 cells per well in Experiment 1 and 3600 cells per well in experiment 2) of B-NK-myeloid, NK, or Myelo-erythroid differentiation media in a 96-well TC-treated plate (triplicate wells) to assess their alternative (non-T lineage) potentials (Ha et al., 2017). 15,000 MS5 cells were plated per well one day prior to plating thymic cells to achieve 70-80% confluency of MS5 cells at the time of plating thymic cells. Half medium changes were performed twice a week with medium containing 2x concentration of cytokines. In the case of pDC medium cultures, cells were harvested after 7-9 days of culture, counted (trypan blue exclusion counts), and analyzed by flow cytometry for pDC differentiation. For cultures in B-NK-myeloid, NK, or Myelo-erythroid differentiation media, cells were harvested after two weeks of culture, counted (trypan blue exclusion counts), and analyzed by flow cytometry for B, NK, and myelo-erythroid differentiation. Details regarding the composition of culture media are included in Table S6.
METHODS
SRNA-Seq library preparation and sequencing
Unique molecular identifier (UMI) tagged 3’ sRNA-Seq libraries were prepared from CD34+Lin− (Thymuses 1-6, Lin: CD4, CD8 and GlyA), CD34−CD45+GlyA− (Thymuses 1, 3, 7), CD3+CD4+CD8− (CD4SP, Thymus 7) or CD3+CD4−CD8+ (CD8SP, Thymus 7) cells using 10x Genomics (Thymuses 1-3 and 7), inDrop (Thymuses 4 and 5) (Wolock et al., 2019a; Zilionis et al., 2017) or BD genomics Precise WTA Single Cell Kit (Thymus 6) platforms. Fresh samples were used for Thymuses 1, 3, and 5-7. Cells cryopreserved after MACS were used for Thymuses 2 and 4. Libraries were prepared from 5000 CD34+ or 5000 CD34− cells for Thymus 1, 5000 CD34+ cells for Thymus 2, 5000 CD34+ or 5000 CD34− cells for Thymus 3, 2500 CD34+ cells for Thymus 4, 7500 CD34+ cells for Thymus 5, 192 index sorted CD34+ cells for Thymus 6 (one cell sorted per well), and 6000 CD34−, 2000 CD4SP, or 2000 CD8SP cells for Thymus 7. For Thymuses 1-3, CD34+ cells used for library preparation also included 2.5% (~125 cells) separately sorted CD34+CD7−CD1a−Lin− (Thy1) cells. FACS parameters for Thymus 6 were set to include at least 40 Thy1 and 40 Thy3 (CD34+CD7+CD1a+Lin−) cells among the sorted cells. The BD genomics Precise WTA Single Cell Kit user guide (910000014 Rev. 03) protocol was used for Thymus 6. Libraries were sequenced on Illumina Hiseq or Nextseq machines (sequencing parameters listed in Table S1).
Data filtration for SRNA-Seq analysis
Gene expression counts were generated using Cellranger (Thymuses 1-3 and 7), BD™ Precise WTA Analysis v2.0 (Thymus 6, Seven Bridges Genomics, RSEC adjusted UMI counts), or inDrop (Thymuses 4, 5) (Wolock et al., 2019a; Zilionis et al., 2017) pipelines. Since the focus of this study was not on non-coding genes, and non-coding genes tend to be lowly expressed compared with coding genes, only genes annotated as ‘protein coding’ in Ensembl or with the biotype value “protein coding” (Seven Bridges Genomics pipeline) were included in downstream analysis for Thymus 5 or 6 respectively. Publicly available sRNA-Seq data was used for mouse thymic cells (Kernfeld et al., 2018) (Table S1).
Based on barcode abundance plots (Zilionis et al., 2017), cell barcodes with <1000 reads and <10000 reads were excluded for Thymus 4 and Thymus 5 respectively. Data was first analyzed separately for each cell type (CD34+, CD34−, CD4SP, or CD8SP) for each thymus using Seurat (v2.3.4) (Butler et al., 2018). Genes lowly expressed across all cells and cells with a low number of expressed genes, high mitochondrial gene expression (likely stressed cells or loss of cytoplasmic RNA during library generation), or very high UMI counts (likely doublets) were removed, following which cell cycle effects were regressed out. Since CD34+ cells from Thymus 5 were sequenced as 3 libraries, library preparation related batch effect was additionally regressed out for Thymus 5 in the Seurat analysis done individually on Thymus 5. Following dimension reduction by principal component analysis (PCA) of highly variable genes (as defined by Seurat), clustering was done on PC values of the most variable PCs. The most variable PCs were selected based on an elbow plot. Filtering and other computational analysis parameters as well as mitochondrial and cell cycling gene lists are included in Table S1.
CD34+ cells in Thymus 2 contained a cluster of cells with increased expression of mitochondrial genes, which possibly represented stressed cells and was removed. CD34− cells showed a rare cluster of cells with high expression of PAX5, CD19, and CD22 in all 3 thymuses that most likely represented thymic B cells, and these cells were excluded from further analyses. Two small clusters with DP like transcriptional profiles were removed from the CD4SP and CD8SP datasets.
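The cell-level filters described in this section (low number of expressed genes, high mitochondrial gene expression, very high UMI counts) can be illustrated with a minimal Python sketch. This is not the code used in the study, which relied on Seurat; the function name, the input layout, and all threshold values are hypothetical placeholders, and the actual parameters are those listed in Table S1.

```python
def filter_cells(counts, mito_genes, min_genes, max_mito_frac, max_umis):
    """Return the IDs of cells that pass three simple QC filters.

    counts: dict mapping cell ID -> dict of gene -> UMI count.
    Thresholds are illustrative assumptions, not the study's values.
    """
    kept = []
    for cell, genes in counts.items():
        total = sum(genes.values())
        if total == 0:
            continue  # empty barcode
        n_genes = sum(1 for c in genes.values() if c > 0)
        mito = sum(c for g, c in genes.items() if g in mito_genes)
        if n_genes < min_genes:
            continue  # too few expressed genes
        if mito / total > max_mito_frac:
            continue  # likely stressed cell or loss of cytoplasmic RNA
        if total > max_umis:
            continue  # very high UMI count, likely doublet
        kept.append(cell)
    return kept


# Toy input with made-up counts, one cell per failure mode.
toy = {
    "cell_ok": {"CD34": 5, "CD7": 3, "MT-CO1": 1},
    "cell_mito": {"CD34": 1, "CD7": 1, "MT-CO1": 8},
    "cell_doublet": {"CD34": 50, "CD7": 40, "MT-CO1": 2},
    "cell_sparse": {"CD34": 4},
}
passed = filter_cells(toy, {"MT-CO1"}, min_genes=2, max_mito_frac=0.2, max_umis=50)
```

In this toy example only `cell_ok` survives; each of the other three cells trips exactly one of the filters.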
SRNA-Seq batch integration and clustering
Following data filtration, we merged the sRNA-Seq data from different thymuses. We merged data from CD34+ cells for Thymuses 1-5 into a single combined CD34+ dataset using the MergeSeurat function. Data from CD34− cells for Thymuses 1, 3 and 7 was merged into a single combined CD34− dataset. Data from CD34− , CD4SP, and CD8SP cells were merged into an additional single combined dataset for overlaying SP cells onto CD34− cells (SP-CD34− dataset). These merged datasets were then used for clustering analysis.
The first step in the clustering analysis of a given merged dataset involved regressing gene expression values for cell cycle effects and scaling the residuals. Cell cycle scores that were calculated during the separate analysis of each thymus were used for the regression in the case of the merged CD34+ and CD34− datasets. Cell cycle scores were calculated using the merged dataset in the case of the SP-CD34− dataset. Scaled expression values of highly variable genes (HVG) were used for PCA. HVG defined by Seurat on the merged dataset were used for PCA of the SP-CD34− dataset. HVG for the CD34+ and CD34− merged dataset consisted of the union of the top 1000 HVG for each of the individual thymuses defined during the separate analysis of each thymus. The HVG union set was first filtered to include only those genes that were common to the scale.data slot of the separate Seurat objects for all individual thymuses. The first 20 PCs (merged dataset PCA) were batch corrected using Harmony to remove confounding effects from variation between individual thymuses for the CD34+ and CD34− merged datasets. The thymus of origin (thymus 1,2,3,4 or 5) and the sequencing method (10x or inDrop) were used as co-variates for Harmony batch correction of the merged CD34+ dataset. The thymus of origin (thymus 1,3, or 7) was used as a single variate for Harmony batch correction of the merged CD34− dataset. Clustering for the CD34+ and CD34− merged datasets was done on the batch corrected values of the first 20 PCs using Seurat (K=30, resolution =1 for CD34+ cells, and resolution = 1.5 for CD34− cells).
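The construction of the HVG set for the merged CD34+ and CD34− datasets (union of the top HVG per thymus, restricted to genes available in every thymus) can be sketched as follows. This is a hedged illustration in Python rather than the Seurat code used in the study; the function name, the dictionary inputs, and the toy gene lists are assumptions made for the example.

```python
def hvg_union(hvg_ranked_by_thymus, scaled_genes_by_thymus, top_n=1000):
    """Union of the top_n HVG per thymus, kept only if the gene is present
    in the scaled data of every thymus.

    hvg_ranked_by_thymus: thymus -> list of HVG ordered from most to least variable.
    scaled_genes_by_thymus: thymus -> iterable of genes with scaled values.
    Both inputs are hypothetical stand-ins for the per-thymus Seurat objects.
    """
    union = set()
    for ranked in hvg_ranked_by_thymus.values():
        union.update(ranked[:top_n])
    common = set.intersection(*(set(g) for g in scaled_genes_by_thymus.values()))
    return sorted(union & common)


# Toy example with two thymuses and top_n=2 (made-up rankings).
hvg = {"T1": ["CD34", "CD7", "MEIS1"], "T2": ["CD7", "BCL11B", "IRF8"]}
scaled = {
    "T1": ["CD34", "CD7", "BCL11B", "MEIS1"],
    "T2": ["CD34", "CD7", "BCL11B", "IRF8"],
}
selected = hvg_union(hvg, scaled, top_n=2)
```

Restricting the union to genes shared by all thymuses ensures that every gene entering the PCA has a value in every batch.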
Cells clustered by cell type rather than thymus. Results from the clustering analysis done on the merged dataset were compared with clustering results from separate analyses of individual thymuses to confirm that the segregation between cells belonging to different cell types that was seen in the individual analysis was maintained in the merged analysis. Cluster assignments of cells in the merged analysis were largely consistent with the co-clustering and segregation patterns seen in the separate individual analyses, indicating that the Harmony batch correction approach preserved the inter-cell variance within individual thymuses. DiceR was used to generate clustering consensus matrices across clustering parameter values. We generated one consensus matrix per thymus for CD34+ cells from Thymuses 1-5 using the cell cluster assignments from clustering analyses done on the merged CD34+ dataset across a total of 25 combinations of the two parameters, number of nearest neighbors (k, value = 15, 20, 25, 30 or 35) and resolution (value = 0.8, 0.9, 1, 1.1 or 1.2) (Zheng et al., 2018). Scrublet (Wolock et al., 2019b) was used to determine whether any of the clusters in the CD34+ merged dataset represented doublets rather than true biological clusters. Each of three thymuses (Thymuses 1, 2 and 3) was analyzed separately using Scrublet. The threshold doublet score for calling doublets was manually set at 0.25 based on the doublet score distribution of simulated doublets. Since high outlier total UMI counts are indicative of doublets, we also compared total UMI counts per cell between the P2 and P3 clusters in sRNA-Seq data from CD34+ cells to determine whether P2 represented a doublet cluster. The P2 vs P3 UMI counts per cell comparison was done separately for each of three thymuses (Thymuses 1-3) using the Welch Two Sample t-test (R 3.6.1).
Clustering for the merged SP-CD34− dataset was done on PC values of the most variable PCs selected based on an elbow plot (First 5 PCs, K=30, resolution = 1). CD4SP and CD8SP cells were overlaid on CD34− cells in a TSNE plot generated from PC values of the most variable PCs. Image J (v 1.50a) was used to overlay CD4SP and CD8SP cells on CD34− cells in a single plot (Figure 6D). CD34− cells were labeled by their cluster identity from the cluster analysis in the merged CD34− dataset in order to enable mapping CD4SP and CD8SP cells to CD34− clusters. CD4SP and CD8SP cells were mapped to CD34− clusters via visual inspection of the overlaid TSNE plot. Results from visual inspection were confirmed by assessing the degree of co-clustering of CD4SP cells, CD8SP cells, and cells from the different CD34− clusters within clusters identified via clustering analysis of the merged SP-CD34− dataset.
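The idea behind a clustering consensus matrix can be sketched in a few lines of Python. The sketch uses the generic definition (the fraction of clustering runs in which two cells share a cluster) and is not the DiceR implementation; the function name and toy labels are hypothetical, while the parameter grid reproduces the 25 k and resolution combinations stated above.

```python
from itertools import product


def consensus_matrix(labelings):
    """Fraction of runs in which each pair of cells is assigned to the same cluster.

    labelings: list of label lists, one per clustering run; position i in every
    list refers to the same cell.
    """
    n = len(labelings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1
    runs = len(labelings)
    return [[value / runs for value in row] for row in matrix]


# The 5 x 5 grid of nearest-neighbor (k) and resolution values from the text.
param_grid = list(product([15, 20, 25, 30, 35], [0.8, 0.9, 1, 1.1, 1.2]))

# Toy labels for three cells across four runs (made-up values).
toy_runs = [[0, 0, 1], [0, 0, 0], [1, 1, 0], [0, 1, 1]]
toy_consensus = consensus_matrix(toy_runs)
```

Values near 1 indicate cell pairs whose co-clustering is stable across parameter choices, whereas values near 0 indicate pairs that are consistently separated.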
Mapping of index sorted cells to clusters
SRNA-Seq data from index CD34+ sorted cells (Thymus 6) was mapped to droplet sRNA-Seq data (Thymus 1-5) by doing an integrated Seurat clustering analysis of Harmony batch corrected sRNA-Seq data from Thymuses 1-6 (thymus of origin and sequencing methods as covariates in Harmony). The HVG set that was previously defined for the merged CD34+ dataset (in the analysis in Figure 1A) was used for PCA. The matching droplet sRNA-Seq cluster type (possible types being P1-11 as per the merged CD34+ (Thymus 1-5) dataset analysis cluster assignments) for each cell in the index sorted dataset was determined by assessing the clusters identified in the integrated analysis of droplet and index sorted data (termed as integrated clusters). Each index sorted cell was assigned the droplet cluster type that accounted for the majority of the droplet data cells within the same integrated cluster as the index sorted cell. Powerpoint was used to crop empty spaces between rows for the gene dotplots for Thymus 6 in Figure S2E.
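The assignment rule described above (each index sorted cell receives the droplet cluster type that is most frequent among droplet cells in its integrated cluster) can be sketched as follows. This Python sketch is illustrative only; the function name and input dictionaries are assumptions, ties are not handled specially, and the toy labels are made up.

```python
from collections import Counter


def assign_droplet_types(integrated_cluster, droplet_type, index_cells):
    """Map index sorted cells to droplet cluster types by vote within integrated clusters.

    integrated_cluster: cell ID -> integrated cluster ID (droplet and index sorted cells).
    droplet_type: droplet cell ID -> droplet cluster type (e.g. "P1" to "P11").
    index_cells: iterable of index sorted cell IDs.
    """
    votes = {}
    for cell, ptype in droplet_type.items():
        votes.setdefault(integrated_cluster[cell], Counter())[ptype] += 1
    assigned = {}
    for cell in index_cells:
        counter = votes.get(integrated_cluster[cell])
        # None marks an integrated cluster that contains no droplet cells.
        assigned[cell] = counter.most_common(1)[0][0] if counter else None
    return assigned


# Toy example: four droplet cells (d1-d4) and two index sorted cells (i1, i2).
integrated = {"d1": "A", "d2": "A", "d3": "A", "d4": "B", "i1": "A", "i2": "B"}
droplet = {"d1": "P1", "d2": "P1", "d3": "P2", "d4": "P5"}
mapped = assign_droplet_types(integrated, droplet, ["i1", "i2"])
```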
SRNA-Seq differential expression analysis
Several methods for computing batch corrected gene expression are available, many of which output corrected values in cosine space. The alteration in background variance of genes induced by these methods interferes with the modeling assumptions underlying statistical expression analyses. Furthermore, correction methods that output corrected values in cosine space do not preserve the scale of the original values and thus introduce changes in the magnitude and possibly even the direction of gene expression changes, a phenomenon that could induce spurious inter-cell type differences in expression within some of the batches. We thus performed inter-cluster differential expression analysis using gene expression values without batch correction and included batch identity as a covariate, an approach most suitable for statistical analysis of gene expression (Luecken and Theis, 2019). Genes with increased or decreased expression in each cluster were identified by the Seurat FindConservedMarkers function (min.pct = 0.1, min.diff.pct = 0.09, test.use = “MAST”). The min.pct and min.diff.pct parameters were not set when analyzing for genes with increased or decreased expression in cluster P5 in the merged CD34+ dataset. The FindConservedMarkers function performs a separate differential expression analysis for each thymus and outputs the differentially expressed genes that are common to all the thymuses in the merged dataset. The power of MAST to detect differentially expressed genes is dependent on the number of cells in a cluster. Hence, given the very low number of cells in P1 and P2 clusters in the inDrop data and the lower number of genes detected in inDrop data (Thymuses 4 and 5) relative to those detected in 10x data, the analysis of cluster specific up or down regulated genes in the merged CD34+ dataset was restricted to the cells sequenced using the 10x platform (Thymuses 1-3).
Genes with cluster specific increased expression for the Pneg11-18 clusters in the merged CD34− dataset were defined by performing a FindConservedMarkers differential expression analysis restricted to Pneg11-18 cells.
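The logic of retaining only genes that are differentially expressed in every thymus can be illustrated with a simplified Python sketch. It mirrors the description given above and does not reproduce the internals of the Seurat function or the MAST test; the function name, the input layout, the significance cutoff, and the toy statistics are all assumptions.

```python
def conserved_markers(de_by_thymus, alpha=0.05):
    """Genes significantly upregulated in the cluster of interest in every thymus.

    de_by_thymus: thymus -> {gene: (log_fold_change, adjusted_p_value)} from a
    per-thymus differential expression test run beforehand.
    alpha is an illustrative cutoff, not a parameter reported in the study.
    """
    conserved = None
    for results in de_by_thymus.values():
        up = {g for g, (lfc, p) in results.items() if p < alpha and lfc > 0}
        conserved = up if conserved is None else conserved & up
    return sorted(conserved or [])


# Toy per-thymus results with made-up statistics.
toy_de = {
    "T1": {"IRF8": (1.2, 0.001), "SPIB": (0.8, 0.01), "CD7": (-0.5, 0.001)},
    "T2": {"IRF8": (0.9, 0.02), "SPIB": (0.7, 0.2)},
    "T3": {"IRF8": (1.0, 0.003), "SPIB": (0.6, 0.01)},
}
shared_up = conserved_markers(toy_de)
```

Because each thymus is tested separately on uncorrected values, a gene driven by a single batch (the second gene in the toy input, which is not significant in one thymus) does not reach the final list.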
SRNA-Seq gene co-expression analysis
Log normalized expression values of individual genes were used to calculate mean expression of HSPC, T, B, NK, or myeloid genes per cell. Many transcriptional programs are shared between lineages and very few genes are truly lineage specific. Hence, genes were defined as being HSPC, T, B, NK, or myeloid genes in the context of thymopoiesis based on published knowledge about gene expression profiles during human lymphopoiesis, effects of these genes on lineage differentiation of human thymic progenitors, and function of these genes in hematopoiesis (Balan et al., 2014; Casero et al., 2015; Collins et al., 2017; Hosokawa et al., 2018; Jackson et al., 2017; Lee et al., 2017; Marafioti et al., 2008; Stehling-Sun et al., 2009; Tomasello and Vivier, 2005; Unnisa et al., 2012; Van de Walle et al., 2016; Woon Kim et al., 2011). Quadrants on co-expression plots were set based on how cells clustered on contour plots as well as expression levels of HSPC and alternative lineage genes in the most differentiated P11 cluster, which was assumed to consist of cells with little to no expression of these genes.
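A per-cell lineage score of this kind, together with quadrant assignment on a co-expression plot, can be sketched in Python as follows. The function names, the two-gene sets, the expression values, and the quadrant cutoffs are hypothetical; in the study, gene sets were curated from the cited literature and quadrants were set from contour plots and P11 expression levels.

```python
def module_score(cell_expr, gene_set):
    """Mean log normalized expression of a gene set in one cell (missing genes count as 0)."""
    values = [cell_expr.get(gene, 0.0) for gene in gene_set]
    return sum(values) / len(values)


def quadrant(x_score, y_score, x_cut, y_cut):
    """Label a cell as low or high on each axis of a co-expression plot."""
    x_label = "high" if x_score > x_cut else "low"
    y_label = "high" if y_score > y_cut else "low"
    return x_label + "/" + y_label


# Toy cell with made-up log normalized values and illustrative gene sets.
cell = {"BCL11B": 2.0, "TCF7": 1.0, "MEIS1": 0.0, "HOPX": 0.5}
t_score = module_score(cell, ["BCL11B", "TCF7"])
hspc_score = module_score(cell, ["MEIS1", "HOPX"])
label = quadrant(hspc_score, t_score, x_cut=0.5, y_cut=1.0)  # HSPC on x, T on y
```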
Bayesian network sRNA-Seq analysis
Bayesian Network (BN) multivariate analysis of TF (Charoensawan et al., 2010; Fulton et al., 2009) association with cell type among CD34+ cells was done using BNOmics software (Gogoshin et al., 2017) (TF list in Table S1). Separate BNs were first constructed for Thymuses 1, 2, and 3 using either all CD34+ cells (P2 vs rest comparison) or a subset made up of P1, P3, and P4 cells (one BN per thymus per analysis type = 6 BNs). Heuristic network model selection search (Gogoshin et al., 2017) was carried out with 1,000 random restarts to ensure convergence. Discretization was done using 4-bin max entropy, which, in our experience, achieves the optimal over/under-fitting balance. The resulting top genes in the networks (immediate Markov neighborhoods of cell type nodes) proved to be robust to network reconstruction parameter variations. Direct gene-cell type associations in respective BNs from the three thymuses were intersected to generate the final list of genes associated with cell type.
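Two steps of this procedure lend themselves to a short illustration: discretization into four bins and intersection of the per-thymus associations. The Python sketch below interprets max entropy discretization as equal-frequency (quantile) binning, which is an assumption about how BNOmics operates rather than a description of its code; the function names and the placeholder gene labels are hypothetical.

```python
def equal_frequency_bins(values, n_bins=4):
    """Assign each value to one of n_bins bins holding (nearly) equal numbers of cells.

    Equal-frequency binning maximizes the entropy of the discretized variable.
    Tied values are split by their order in the input, a simplification.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def intersect_associations(neighbors_by_thymus):
    """Genes directly linked to the cell type node in the BN of every thymus."""
    sets = [set(genes) for genes in neighbors_by_thymus.values()]
    return sorted(set.intersection(*sets))


# Toy expression values for eight cells (made up).
toy_bins = equal_frequency_bins([0.0, 5.0, 1.0, 3.0, 2.0, 4.0, 7.0, 6.0])

# Placeholder TF labels, not results from the study.
toy_neighbors = {
    "T1": ["TF_A", "TF_B", "TF_C"],
    "T2": ["TF_A", "TF_C"],
    "T3": ["TF_C", "TF_A", "TF_D"],
}
robust_tfs = intersect_associations(toy_neighbors)
```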
Integrated analysis of human and murine data
A published sRNA-Seq dataset (Drop-Seq data) of unfractionated murine fetal and postnatal thymic cells (Kernfeld et al., 2018) was used. sRNA-Seq gene expression data (UMI counts) were downloaded from GEO (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE107910) and used for downstream analysis. We used the 13 genes with the highest mitochondrial association score in the Mouse.MitoCarta2.0 database (https://www.broadinstitute.org/scientific-community/science/programs/metabolic-disease-program/publications/mitocarta/mitocarta-in-0) for calculating mitochondrial gene expression. Data filtering and analysis parameters as well as the mitochondrial gene list for the murine dataset are included in Table S1.
We first performed Seurat clustering on all cells in the dataset without regressing out cell cycle effects. Non-hematopoietic cell clusters were excluded based on lack of Ptprc expression (Kernfeld et al., 2018). Hematopoietic cells were then re-clustered following regression based removal of cell cycle effects, and the dataset was filtered based on expression of hematopoietic lineage and murine thymic progenitor genes to only include putative DN1-3a clusters (cells equivalent to human CD34+ thymic progenitor cells, DN1-3a dataset). The DN1-3a dataset was then re-clustered using Seurat.
The murine DN1-3a dataset was mapped to the human thymic CD34+ dataset through an integrated Seurat analysis of merged Harmony batch corrected human (Thymuses 1-5) and murine data. Mouse gene IDs were converted into human IDs using ortholog information from Ensembl (http://www.ensembl.org/biomart/martview/f21f6d3f6c139e32bd08fb6fbe59b5e8) and the Jackson database (http://www.informatics.jax.org/homology.shtml). Only genes with one to one orthologs were used. We then merged data from the human CD34+ merged dataset (from Figure 1B, Thymuses 1-5) and the murine DN1-3a dataset (from Figure 7A) into a single combined human-murine dataset (merged HM dataset) using the MergeSeurat function.
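The restriction to one to one orthologs can be sketched as follows. The pairs are hypothetical stand-ins for rows of an ortholog table exported from Ensembl or the Jackson database, and the function name is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical (mouse_gene, human_gene) rows from an ortholog table.
pairs = [
    ("Cd34", "CD34"),
    ("Ptprc", "PTPRC"),
    ("GeneA1", "GENEA"),   # two mouse genes map to one human gene
    ("GeneA2", "GENEA"),
    ("GeneB", "GENEB1"),   # one mouse gene maps to two human genes
    ("GeneB", "GENEB2"),
]


def one_to_one(pairs):
    """Keep only pairs in which both the mouse and the human gene occur exactly once."""
    unique_pairs = set(pairs)  # drop exact duplicate rows before counting
    mouse_counts = Counter(mouse for mouse, _ in unique_pairs)
    human_counts = Counter(human for _, human in unique_pairs)
    return {
        mouse: human
        for mouse, human in unique_pairs
        if mouse_counts[mouse] == 1 and human_counts[human] == 1
    }


mouse_to_human = one_to_one(pairs)
```

Renaming the rows of the murine count matrix with such a mapping, and dropping unmapped genes, gives both species a shared gene space before merging.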
The first step in the analysis of the merged HM dataset involved regressing gene expression values for cell cycle effects and scaling the residuals. Cell cycle scores that were calculated during the previous separate analysis of each individual human or murine thymus were used for the regression. The HVG set that was previously defined for the merged CD34+ dataset (in the analysis in Figure 1A) was used for PCA of the merged HM dataset. The thymus of origin (thymus 1-5 or murine) and sequencing method (10x, inDrop or Drop-Seq) were included as covariates in Harmony. Clustering and TSNE dimensional reduction for the merged HM dataset (HM TSNE space) were then done on the Harmony batch corrected values of the first 20 PCs using Seurat (K=30, resolution =1).
Human and murine cells were then labeled by their cluster identity from the prior separate clustering analyses of the merged CD34+ dataset (P1-11 from Figure 1B) and the murine dataset (P1-12m from Figure 7A) and visualized separately in HM TSNE space in order to enable mapping between murine and human cell types. ImageJ (v 1.50a) was used to overlay human cells onto mouse cells in a single TSNE plot (Figure 7C). Mapping between cells from the two species was done via visual inspection of the overlaid TSNE plot. Results from visual inspection were confirmed by assessing the degree of co-clustering of human (P1-11) and murine (P1-12m) cell types within clusters identified via the batch corrected clustering analysis done on the merged HM dataset.
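One simple way to quantify the degree of co-clustering is to tabulate, for every species-specific label, how its cells distribute over the merged HM clusters. The sketch below is an assumption about how such a table could be computed and is not the exact procedure used here; the cell records are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: (species-specific cluster label, merged HM cluster).
cells = [
    ("P1", 0), ("P1", 0), ("P1", 1),
    ("P1m", 0), ("P1m", 0),
    ("P11", 2),
    ("P12m", 2), ("P12m", 2), ("P12m", 1),
]


def label_distribution(cells):
    """For each original label, return the fraction of its cells in each merged cluster."""
    counts = defaultdict(Counter)
    for label, merged_cluster in cells:
        counts[label][merged_cluster] += 1
    return {
        label: {cluster: n / sum(counter.values()) for cluster, n in counter.items()}
        for label, counter in counts.items()
    }


dist = label_distribution(cells)
```

Human and murine labels whose cells concentrate in the same merged cluster (here the hypothetical P1 and P1m in cluster 0) would be candidates for equivalent cell types.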
Pseudotime trajectory construction
Monocle (v3) was used for pseudotime analysis. Harmony batch corrected PC values for the merged CD34+ or CD34− datasets (i.e. the same PC values that were used for Seurat clustering analysis of merged datasets) were used as input for the Umap dimension reduction based construction of pseudotime trajectories for CD34+ or CD34− human thymic cells. For CD34+ thymic cells, Monocle analysis was restricted to the subset of thymuses that had been profiled using the 10x platform (thymuses 1-3).
In the case of CD34− cells, Monocle identified a total of 4 separate trajectories, one each for pre β-selection cells (P1neg, trajectory 1), DP cells (P2-10neg, trajectory 2), SP cells and the DP cells transitioning into CD8αα+ cells and the CD4SP vs CD8SP branchpoint (P11-16neg, trajectory 3), and Treg and NK like T cells (P17 and 18neg, trajectory 4). The existence of 4 separate trajectories suggests that transient transitioning cells between these 4 groups of CD34− cells were not represented in the sRNA-Seq datasets due to their extreme rarity. Since the CD4SP vs CD8SP branchpoint was of most interest among the CD34− cells, of the 4 trajectories we focused on the trajectory encompassing P11-16neg cells (trajectory 3). The get_earliest_principal_node helper function as defined at https://cole-trapnell-lab.github.io/monocle3/docs/trajectories/ was used to assign a node for which the highest fraction of closest cells belonged to the P1 cluster (CD34+ dataset) or P11neg cluster (CD34− datasets) as the root node.
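The root node criterion can be restated as a short sketch. It follows the description in the text (highest fraction of closest cells from the root cluster) and is not the R helper from the Monocle 3 documentation, which may differ in detail; the node names, cell names, and tie-breaking rule are assumptions.

```python
from collections import Counter


def pick_root_node(closest_node, cluster, root_cluster):
    """Return the graph node whose closest cells have the highest fraction from root_cluster.

    closest_node maps each cell to its nearest principal graph node;
    cluster maps each cell to its cluster label.
    """
    total = Counter(closest_node.values())
    in_root = Counter(
        node for cell, node in closest_node.items() if cluster[cell] == root_cluster
    )
    # Assumption: ties in fraction are broken by the larger number of root cluster cells.
    return max(total, key=lambda node: (in_root[node] / total[node], in_root[node]))


# Hypothetical assignments of six cells to three graph nodes.
closest_node = {"c1": "Y_1", "c2": "Y_1", "c3": "Y_1", "c4": "Y_2", "c5": "Y_2", "c6": "Y_3"}
cluster = {"c1": "P1", "c2": "P1", "c3": "P2", "c4": "P1", "c5": "P1", "c6": "P3"}
root = pick_root_node(closest_node, cluster, "P1")
```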
Following the construction of the pseudotime trajectory for the combined data from CD34+ cells from Thymuses 1-3, the Monocle cds object was subsetted based on thymus of origin into three separate cds objects, one for each thymus (cds1-3). The graph_test function was then run on cds1-3 separately to identify genes whose expression varied with pseudotime (p value < 10^-8, pseudotime variable genes). We then intersected the sets of genes with pseudotime variable expression from cds1-3 to generate a list of genes common to all three thymuses. These genes were then segregated into expression profile modules by running the find_gene_modules function (resolution = c(10^seq(−6, −1))) separately on cds1-3. Modules with similar expression profiles were manually merged to yield 5 final modules with distinct expression profiles in the case of each subsetted cds. The set of final expression profile modules was remarkably reproducible across all three thymuses, and modules with similar expression profiles from different thymuses showed high overlap of module member genes.
The pseudotime trajectory for murine data was constructed via Umap dimension reduction of PCs using Monocle v3. The get_earliest_principal_node helper function was used to assign a node for which the highest fraction of closest cells belonged to the Pm1 cluster as the root node.
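The thresholding and intersection of pseudotime variable genes across the three thymuses can be sketched as below. The gene names and p values are hypothetical, and in practice the per-gene p values would come from the graph_test output for each cds object.

```python
P_CUTOFF = 1e-8  # threshold stated in the text

# Hypothetical per-thymus test results: gene -> p value.
results = {
    "cds1": {"GENE_A": 1e-12, "GENE_B": 1e-9, "GENE_C": 0.2},
    "cds2": {"GENE_A": 1e-20, "GENE_B": 1e-3, "GENE_C": 1e-10},
    "cds3": {"GENE_A": 1e-9, "GENE_B": 1e-15, "GENE_C": 1e-11},
}


def shared_variable_genes(results, cutoff=P_CUTOFF):
    """Return genes that pass the p value cutoff in every thymus."""
    per_thymus = [
        {gene for gene, p in table.items() if p < cutoff} for table in results.values()
    ]
    return set.intersection(*per_thymus)


shared = shared_variable_genes(results)
```

Requiring a gene to pass in every thymus is conservative: a gene that misses the cutoff in a single donor (GENE_B and GENE_C in this toy example) is dropped.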
Generation of pseudotime expression profiles
TradeSeq (Van den Berge et al., 2020) was used to generate gene expression profiles smoothed over pseudotime. A list of cells of interest and their pseudotime values were extracted from Monocle data and then used as input for the tradeSeq analysis.
Cells along the shortest path between the root node in P1 and a manually selected node in P11 (node shown in Figure 5) were extracted and used for smoothing expression values over pseudotime in the case of CD34+ cells. Cells were extracted by using the igraph shortest path to identify all the nodes on the shortest path between the root node and the manually selected node and then employing the principal_graph_aux(cds)$UMAP$pr_graph_cell_proj_closest_vertex command to extract the cells neighboring the identified nodes.
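The extraction of cells along a path can be sketched in two steps: find the nodes on the shortest path between the root node and the selected end node, then keep the cells whose closest node lies on that path. The sketch uses an unweighted breadth-first search as a stand-in for the igraph shortest path call, which is an assumption; the graph, node names, and cells are hypothetical.

```python
from collections import deque


def shortest_path(edges, start, end):
    """Breadth-first search for a shortest path in an undirected, unweighted graph."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    if end not in previous:
        return []  # no path between the two nodes
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


# Hypothetical principal graph with one side branch (Y_3) off the main path.
edges = [("Y_1", "Y_2"), ("Y_2", "Y_3"), ("Y_2", "Y_4"), ("Y_4", "Y_5")]
closest_node = {"c1": "Y_1", "c2": "Y_2", "c3": "Y_3", "c4": "Y_4", "c5": "Y_5", "c6": "Y_3"}

path = shortest_path(edges, "Y_1", "Y_5")
on_path = set(path)
selected_cells = sorted(cell for cell, node in closest_node.items() if node in on_path)
```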
P11neg, P12neg, P14neg and P15neg cells along the shortest paths from the root node in P11neg to manually selected nodes in P14neg and P15neg respectively were extracted in the case of CD34− cells. The root node and the following 13 nodes on the trajectory constituted the path up to the CD4 vs CD8 branchpoint and cells neighboring these 14 nodes were labelled as DP cells. The manually selected endpoint nodes represented the 14th and 15th node on the P14neg and P15neg branches respectively when counted from the last DP node. Due to the looped back reunion of the two branches past the selected endpoint nodes and the close proximity of the P14neg and P15neg cells in the Umap manifold, the neighboring cells for the selected nodes on the P14neg branch included 7% P15neg cells, almost all of which localized to the border between P14neg and P15neg cells i.e. in the region of terminal P15neg cells. Since the focus of the CD4 vs CD8 SP branchpoint analysis was the determination of genes differentially expressed between early CD4SP and CD8SP cells, these cells were excluded from the tradeSeq analysis to avoid distorting effects on modeling of smoothed gene expression values for the P14neg branch. Cells along the pseudotime trajectory consisting of the shortest path between the root node in Pm1 and a manually selected node in Pm12 (node shown in Figure S7) were extracted in the case of the murine sRNA-Seq dataset.
A single combined tradeSeq analysis with K (number of knots) = 6 and the thymus of origin as a covariate was done for CD34+ or CD34− cells (one combined analysis for CD34+ cells and a separate combined analysis for CD34− cells) on the genes depicted in Figure 5 and Figure 7 (CD34+ cells) and HVG. All cells were assigned the same value for ‘lineage’ in the tradeSeq analysis on CD34+ cells. Cells preceding the CD4 vs CD8 branchpoint (DP cells) were assigned as belonging to both CD4 and CD8 lineages and post branchpoint cells were assigned to the CD4 or CD8 lineage based on whether they were on the branch ending in P14neg or P15neg cells respectively. fitGAM modeling was done for the same HVG that were used for PCA during Seurat analysis of the merged CD34+ or CD34− datasets. All genes (and not just the HVG) were used for normalization in tradeSeq. Default parameters were used for the tradeSeq predictSmooth function. Rate of change of gene expression with pseudotime for CD34+ cells was calculated as a rolling average over five datapoints of the differential of smoothed expression against pseudotime.
Genes differentially expressed between the earliest CD4SP and CD8SP cells were identified using the earlyDETest function on the pseudotime segment between knots 3 and 4 (pvalue<0.05 & waldStat >= 10 & fcMedian >= 0.5). These differentially expressed genes were segregated into two clusters based on their smoothed expression in SP cells extending from the CD4 vs CD8 branchpoint to knot 4. Gene clustering was done using the clusterGenes function from Monocle v2.
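The rate of change calculation can be sketched as a finite difference of smoothed expression against pseudotime followed by a rolling mean. Applying the five-point window to consecutive slopes, without centering or edge padding, is an assumption; the input values are hypothetical.

```python
def rate_of_change(pseudotime, smoothed, window=5):
    """Rolling mean (over `window` points) of the slope of smoothed expression vs pseudotime."""
    # Finite-difference slope between each pair of consecutive points.
    slopes = [
        (smoothed[i + 1] - smoothed[i]) / (pseudotime[i + 1] - pseudotime[i])
        for i in range(len(smoothed) - 1)
    ]
    # Average each run of `window` consecutive slopes; no padding at the ends.
    return [
        sum(slopes[i:i + window]) / window
        for i in range(len(slopes) - window + 1)
    ]


# Hypothetical smoothed expression sampled at evenly spaced pseudotime values.
pseudotime = [0, 1, 2, 3, 4, 5, 6]
smoothed = [0, 1, 4, 9, 16, 25, 36]
rates = rate_of_change(pseudotime, smoothed)
```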
OmniGraffle was used to enhance line weights in the pseudotime expression profile plots from ggplot2 to retain visibility when plot sizes were reduced for incorporation into figures.
Geneset enrichment and gene ontology analysis
GSEA v4.0 (Subramanian et al., 2005) and DAVID (Huang da et al., 2009) were used for geneset enrichment (GSEA) and gene ontology (GO) analysis respectively. The following genesets derived from MAST differential expression analysis of sRNA-Seq data were used for GSEA: genes with increased expression in Thy1 (Thymus 6), Thy3 (Thymus 6), the least differentiated CD34+Lin− cluster (P1), the most differentiated CD34+Lin− cluster (P11), the CD34+Lin− P2 cluster, and the pooled CD34+Lin− P3 and P4 clusters. These genesets were defined based on Thy1 vs Thy3 (FindMarkers function, min.pct = 0.1, min.diff.pct = 0.09, test.use = “MAST”, p <0.05; Thymus 6), P1 vs P11 CD34+Lin− clusters, or P2 vs pooled P3 and P4 CD34+Lin− clusters Seurat differential expression analyses. The FindConservedMarkers function (min.pct = 0.1, min.diff.pct = 0.09, test.use = “MAST”; data from Thymuses 1-3) was used for the P1 vs P11 and P2 vs pooled P3 and P4 differential expression analyses. We tested for enrichment of these genesets among datasets from bulk RNA-Seq studies consisting of genes ranked based on fold change observed in Thy1 vs Thy3 or CD123+ vs Thy2 DESeq2 differential expression analysis (Love et al., 2014). Previously published Thy1, Thy2, and Thy3 bulk RNA-Seq data (Casero et al., 2015) were used for these differential expression analyses.
Bulk RNA-Seq sequencing
Bulk RNA-Seq was performed on FACS sorted CD123+ cells (using same sorting gates as described in the Lineage differentiation assays section). The Arcturus Picopure RNA extraction kit was used to extract RNA from 367 (biological replicate 1) or 600 (biological replicate 2) sorted CD123+ cells from two different thymuses. The Smart-Seq V4 ultralow input RNA-Seq kit (Clontech) was used to make libraries, which were then sequenced on an Illumina HiSeq (150 bp paired end reads, 26 million paired end reads per sample).
Bulk RNA-Seq data analysis
To analyze high throughput RNA-Seq data, we used the web based Galaxy platform (https://usegalaxy.org/). The reads were first trimmed to remove adapter sequences (Nextera paired-ended adapter sequences) using Trimmomatic (Galaxy version 0.38.0). TopHat (Galaxy version 2.1.1) was used to align trimmed reads to a pseudoautosomal region masked GRCh38 version of the human genome (GCA_000001405.15_GRCh38_no_alt_analysis_set.fna, http://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/analysisSet/). Gene counts were computed by running HTSeq (Galaxy version 0.9.1) against the gencode.v31.annotation.gff3.gz annotation file (https://www.gencodegenes.org/human/release_31.html) on BAM files that had first been sorted using Samtools (Galaxy version 2.0.3) (Li et al., 2009). Default parameters were used for Trimmomatic (Bolger et al., 2014), TopHat (Kim et al., 2013) and HTSeq (Anders et al., 2014) with the following exceptions: minimum quality of Trimmomatic operation = 2 for Trimmomatic, mean inner distance between mate pairs = 0 for TopHat (since the fragment size for these libraries tends to be short relative to the read length we used (150 base paired end reads)), and mode = Intersection (nonempty) and ID attribute = gene_name for HTSeq. DESeq2 was used to perform a CD123+ vs Thy2 differential expression analysis using published bulk RNA-Seq data for Thy2. For the combined analysis of the bulk RNA-Seq data from CD123+ cells and the sRNA-Seq data from CD34+Lin− thymic cells we used HTSeq and UMI count data for protein-coding genes from bulk RNA-Seq and sRNA-Seq data respectively. HTSeq counts for protein-coding genes were converted into counts per kilobase gene length, then normalized to a total library size of 10,000, and then log transformed. Similarly, averaged raw sRNA-Seq UMI counts for each of the clusters (P1-11) from sRNA-Seq data were normalized to a total library size of 10,000 for the averaged libraries, and then log transformed.
The cor function in R was then used to calculate Pearson correlation coefficients between bulk RNA-Seq libraries and averaged sRNA-Seq libraries for P1-11.
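The normalization and correlation steps can be sketched as follows. The use of log(1 + x) for the log transformation is an assumption, since the base and pseudocount are not stated; gene names, counts, and gene lengths are hypothetical, and the Pearson coefficient is written out by hand as a stand-in for the R cor function.

```python
import math


def normalize_bulk(counts, gene_length_bp, target=10_000):
    """Counts per kilobase of gene length, scaled to a library size of `target`, then log1p."""
    per_kb = {gene: count / (gene_length_bp[gene] / 1000) for gene, count in counts.items()}
    total = sum(per_kb.values())
    return {gene: math.log1p(value / total * target) for gene, value in per_kb.items()}


def normalize_umi(mean_counts, target=10_000):
    """Cluster-averaged UMI counts scaled to a library size of `target`, then log1p."""
    total = sum(mean_counts.values())
    return {gene: math.log1p(value / total * target) for gene, value in mean_counts.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical bulk counts, gene lengths, and cluster-averaged UMI counts.
bulk = normalize_bulk({"G1": 200, "G2": 400, "G3": 50}, {"G1": 1000, "G2": 4000, "G3": 500})
cluster = normalize_umi({"G1": 4.0, "G2": 2.0, "G3": 2.0})
genes = sorted(bulk)
r = pearson([bulk[g] for g in genes], [cluster[g] for g in genes])
```

The length correction is applied only to the bulk data because UMI counts from 3' tagged droplet libraries are not expected to scale with gene length in the same way as full-length read counts.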
Flow cytometry for IRF8 expression
Thymus CD34+ cells were stained with surface antibodies followed by fixation, permeabilization, and staining with IRF8 or isotype control antibody. Given the rarity of the IRF8+ population, a large number of events (500,000-700,000) were recorded to visualize the IRF8+ population. An equal or greater number of events was recorded for the control isotype stained sample relative to the sample stained with IRF8 antibody.
QUANTIFICATION AND STATISTICAL ANALYSIS
A two sided paired t test on logit transformed values was used to compare frequencies of pDC lineage cells in cultures initiated by CD123+ and CD123− cells respectively. Multifactorial ANOVA (R 3.6.1) was used to compare alternative lineage outputs (Thy2a vs Thy2b). Alternative lineage log10 transformed cell outputs were analyzed by multifactorial ANOVA that included donor, input cell type (Thy2a or Thy2b), and output cell type (CD19+, CD56+, CD33+, CD56+CD11c+, CD66b+, or CD14/15+) factors (3 factors) and donor: input cell type and input cell type: output cell type interactions. A similar analysis was performed for total cell output in B-NK-myeloid, NK, or Myelo-erythroid cultures. A two sided paired t test was used to compare cell outputs (log10 transformed value) in T cell differentiation cultures between Thy2a and Thy2b. Given the small sample sizes for the in vitro studies, statistical tests to determine whether the data met the normality assumption would lack the power to detect departures from such an assumption and were thus not performed. The individual biological replicate data-points (mean of the technical replicate values for the given biological replicate where applicable) used for each of the statistical analyses are shown in the figures. Error bars (Standard error of mean, SEM) for technical replicate values are depicted in the figures, and the meaning of these error bars (SEM) is indicated in the figure legend.
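The paired comparison on logit transformed frequencies can be sketched as below. The frequencies are hypothetical and are not data from this study. The sketch computes only the t statistic and degrees of freedom; the two sided p value would then be read from a t distribution with those degrees of freedom, for example with t.test(..., paired = TRUE) in R.

```python
import math
from statistics import mean, stdev


def logit(p):
    """Log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def paired_t_statistic(x, y):
    """Return the paired t statistic and degrees of freedom for matched samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical pDC frequencies (as proportions) from three matched donors.
cd123_pos = [0.40, 0.55, 0.35]
cd123_neg = [0.05, 0.10, 0.02]

t_stat, df = paired_t_statistic(
    [logit(p) for p in cd123_pos],
    [logit(p) for p in cd123_neg],
)
```

The logit transformation maps proportions bounded between 0 and 1 onto an unbounded scale, which makes the paired differences better suited to a t test than raw frequencies.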
Supplementary Material
Table S1. Single cell RNA-Seq protocol and computational parameters, related to STAR Methods.
Table S2. Cluster specific genes for CD34+ thymic cells, related to Figure 1.
Table S3. Mapping of Index sorted cells to droplet single cell RNA-Seq data, related to Figure 1.
Table S4. Pseudotime gene expression profiles for CD34+ thymic cells, related to Figure 5.
Table S5. Early CD4SP vs CD8SP differentially expressed genes, genes differentially expressed with β-selection, and cluster specific genes for SP T cells, related to Figure 6.
Table S6. Reagents and media, related to STAR Methods.
KEY RESOURCES TABLE
REAGENT or RESOURCE | SOURCE | IDENTIFIER |
---|---|---|
Antibodies | ||
CD4 FITC (clone: RPA-T4) | BD Biosciences | Cat#555346; RRID: AB_395751 |
CD8 PerCP (clone: SK1) | BD Biosciences | Cat#347314; RRID: AB_400280 |
CD3 APC (clone: UCHT1) | BD Biosciences | Cat#561810; RRID: AB_10893350 |
CD45 APC/Cy7 (clone: HI30) | Biolegend | Cat#304014; RRID: AB_314402 |
Glycophorin A PE (clone: 11E4B-7-6 (KC16)) | Beckman Coulter | Cat#IM2211U; RRID: AB_131213 |
CD34 PerCP-Cy5.5 (clone: 8G12) | BD Biosciences | Cat#347203; RRID: AB_400266 |
CD4 APC (clone: RPA-T4) | BD Biosciences | Cat#555349; RRID: AB_398593 |
CD8 APC (clone: SK1) | Biolegend | Cat#344722; RRID: AB_2075388 |
CD235a APC (clone: GA-R2) | BD Biosciences | Cat#551336; RRID: AB_398499 |
CD7 FITC (clone: 4H9) | BD Biosciences | Cat#347483; RRID: AB_400309 |
CD1a PE (clone: HI149) | BD Biosciences | Cat#555807; RRID: AB_396141 |
CD1a APC (clone: HI149) | Biolegend | Cat#300110; RRID: AB_350438 |
CD2 FITC (clone: RPA-2.10) | Biolegend | Cat#300206; RRID: AB_314030 |
CD4 PerCP (clone: RPA-T4) | Biolegend | Cat#300528; RRID: AB_893321 |
CD7 PE (clone: CD7-6B7) | Biolegend | Cat#343106; RRID: AB_1732011 |
CD8 PerCP (clone: SK1) | Biolegend | Cat#344708; RRID: AB_1967149 |
CD34 PE-Cy7 (clone: 581) | Biolegend | Cat#343516; RRID: AB_1877251 |
CD44 Alexa Fluor 700 (clone: BJ18) | Biolegend | Cat#338813; RRID: AB_2715998 |
CCR7 APC-Cy7 (clone: G043H7) | Biolegend | Cat#353212; RRID: AB_10916390 |
CD19 PerCP (clone: SJ25-C1) | BD Biosciences | Cat#340421; RRID: AB_400028 |
CD7 FITC (clone: CD7-6B7) | Biolegend | Cat#343104; RRID: AB_1659216 |
CD123 PE (clone: 6H6) | Biolegend | Cat#306005; RRID: AB_314579 |
CD14 BV605 (clone: 63D3) | Biolegend | Cat#367125; RRID: AB_2716230 |
CD303 APC (clone: 201A) | Biolegend | Cat#354206; RRID: AB_11150412 |
HLA-DR FITC (clone: L243) | BD Biosciences | Cat#347363; RRID: AB_400291 |
CD14 APC (clone: MΦP9) | BD Biosciences | Cat#340436; RRID: AB_400509 |
CD66b PE-Cy7 (clone: G10F5) | Biolegend | Cat#305116; RRID: AB_2566605 |
CD303 AF 700 (clone: 201A) | Biolegend | Cat#354228; RRID: AB_2629744 |
CD3 APC (clone: UCHT1) | Biolegend | Cat#300439; RRID: AB_2562045 |
CD45 PerCP (clone: HI30) | Biolegend | Cat#304026; RRID: AB_893337 |
CD4 PE (clone: A161A1) | Biolegend | Cat#357409; RRID: AB_2565661 |
CD8 APC-Cy7 (clone: SK1) | Biolegend | Cat#344714; RRID: AB_2044006 |
TCRa/b PE (clone: IP26) | Biolegend | Cat#306708; RRID: AB_314646 |
CD56 PE (clone: HCD56) | Biolegend | Cat#318306; RRID: AB_604101 |
CD11c APC (clone: 3.9) | Biolegend | Cat#301614; RRID: AB_493023 |
CD33 PE (clone: P67.6) | BD Biosciences | Cat#347787; RRID: AB_400350 |
CD56 APC (clone: HCD56) | Biolegend | Cat#318310; RRID: AB_604106 |
CD45 PerCP (clone: HI30) | Biolegend | Cat#304026; RRID: AB_893337 |
CD14 FITC (clone: M5E2) | Biolegend | Cat#301804; RRID: AB_314186 |
CD15 FITC (clone: HI98) | Biolegend | Cat#301904; RRID: AB_314196 |
CD66b PE (clone: G10F5) | Biolegend | Cat#305106; RRID: AB_305106 |
CD1a PE (clone: HI149) | Biolegend | Cat#300106; RRID: AB_314020 |
CD4 PerCP (clone: RPA-T4) | Biolegend | Cat#300528; RRID: AB_893321 |
CD8 PerCP (clone: SK1) | Biolegend | Cat#344708; RRID: AB_1967149 |
CD19 PerCP (clone: SJ25-C1) | BD Biosciences | Cat#340421; RRID: AB_400028 |
CD34 PE-Cy7 (clone: 581) | Biolegend | Cat#343516; RRID: AB_1877251 |
APC Mouse Anti-Human IRF8 (clone: V3GYWCH) | eBioscience | Cat#17-9852-82; RRID: AB_2573318 |
Mouse IgG1 K Isotype Control APC (clone: P3.6.2.8.1) | eBioscience | Cat#17-4714-81; RRID: AB_1603315 |
Biological Samples | ||
Human Thymuses | Children’s Hospital at Los Angeles | N/A |
Chemicals, Peptides, and Recombinant Proteins | ||
HI-FBS | Gemini Bio-products | Cat#100-106 |
B-27 Supplement | GIBCO | Cat#17504-044 |
Ascorbic Acid | Gemini Bio-Products | Cat#400-109 |
FBS Defined | Hyclone | Cat#SH30070.03 |
IL-7 (used for R-B27 Medium) | Peprotech | Cat#200-07 |
IL-7 (used for B/NK/M Medium; PDC Differentiation) | Miltenyi Biotec | Cat#130-095-367 |
Flt-3 ligand (used for R-B27 Medium) | Peprotech | Cat#300-19 |
Flt-3 (used for B/NK/M Medium; PDC Differentiation) | Miltenyi Biotec | Cat#130-096-474 |
TPO (used for B/NK/M Medium; PDC Differentiation) | Miltenyi Biotec | Cat#130-094-011 |
IL-15 (used for NK Medium) | Peprotech | Cat#200-15 |
SCF (used for NK Medium) | Miltenyi Biotec | Cat#130-096-692 |
GM-CSF (used for PDC Differentiation) | Peprotech | Cat#300-03 |
Fixation Buffer | Biolegend | Cat#420801 |
Cell Staining Buffer | Biolegend | Cat#420201 |
10x Permeabilization Wash Buffer | Biolegend | Cat#421002 |
DAPI | Invitrogen | Cat#D1306 |
Immune Globulin Intravenous (Human) | Carimune | Cat#44206-417-06 |
Critical Commercial Assays | ||
CD34 Microbead Kit | Miltenyi Biotec | Cat#130-100-453 |
ATO assay: Cell Culture Insert | Millipore | Cat#PICM0RG50 |
BD genomics Precise WTA Single Cell Kit | BD Biosciences | Cat#634100 |
Picopure RNA extraction kit | ThermoFisher Scientific | Cat#KIT0204 |
Smart-Seq V4 ultralow input RNA-Seq kit | Clontech | Cat#634888 |
Deposited Data | ||
Human gene annotation GRCh38.p12 (Release 31) | EMBL-EBI | https://www.gencodegenes.org/human/release_31.html |
Raw and analyzed data | This paper | GEO: GSE139042 |
GRCh38 version of the human genome (GCA_000001405.15_GRCh38_no_alt_analysis_set.fna) | NCBI | http://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/analysisSet/ |
Mouse.MitoCarta2.0 database | Broad Institute | https://www.broadinstitute.org/scientific-community/science/programs/metabolic-disease-program/publications/mitocarta/mitocarta-in-0 |
Jackson database (for conversion of Mouse gene IDs to human gene IDs) | The Jackson Laboratory | http://www.informatics.jax.org/homology.shtml |
Ensembl (for conversion of Mouse gene IDs to human gene IDs) | EMBL-EBI | http://www.ensembl.org/biomart/martview/f21f6d3f6c139e32bd08fb6fbe59b5e8 |
Experimental Models: Cell Lines | ||
Human MS5 cell line | Gay M. Crooks Laboratory | Seet et al. 2017 |
Human MS5-DLL cell line | Gay M. Crooks Laboratory | Seet et al. 2017 |
Software and Algorithms | ||
Cellranger (v2.2.0 and v3.0.2) | 10x Genomics | https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/installation |
BDTM Precise WTA Analysis (v2.0) | SevenBridges | https://www.sevenbridges.com/ |
inDrop (v2 and v3) | GitHub | https://github.com/indrops/indrops |
Seurat (v2.3.4) | Butler et al., 2018 | http://satijalab.org/seurat/ |
Scrublet | Wolock et al., 2019b | https://github.com/AllonKleinLab/scrublet |
BNOmics | Gogoshin et al., 2017 | https://omictools.com/bnomics-tool |
Monocle (v3) | Cao et al., 2019 | https://cole-trapnell-lab.github.io/monocle3/docs/trajectories/ |
tradeSeq | Van den Berge et al., 2020 | https://github.com/statOmics/tradeSeq |
GSEA v4.0 | Subramanian et al., 2005 | https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/GSEA_v4.0.x_Release_Notes |
DAVID | Huang da et al., 2009 | https://david.ncifcrf.gov/ |
Galaxy | N/A | https://usegalaxy.org/ |
Trimmomatic (Galaxy version 0.38.0) | Bolger et al., 2014 | https://toolshed.g2.bx.psu.edu/repository?repository_id=ef9e620e9ac844b3 |
TopHat (Galaxy version 2.1.1) | Kim et al., 2013 | https://toolshed.g2.bx.psu.edu/repository?repository_id=3bece861a6a3608c |
Samtools (Galaxy version 2.0.3) | Li et al., 2009 | https://toolshed.g2.bx.psu.edu/repository?repository_id=67974b61f56a2c93 |
HTSeq (Galaxy version 0.9.1) | Anders et al., 2014 | https://toolshed.g2.bx.psu.edu/repository?repository_id=2df7e24ce6c1f224 |
Highlights.
CD34+ human thymic progenitors present a spectrum of specification and commitment states
Earliest progenitors are CD7− and exhibit stem cell like and T-primed transcriptomes
Loss of B cell potential precedes that of myeloid and NK potential during T lineage commitment
A CD34+ subpopulation of cells is primed for the plasmacytoid dendritic lineage
Acknowledgments
We thank F. Codrea, J. Scholes, C. Daly, and H. Long for technical assistance, A. Hartley, Dr. W. Wells, and Dr. R. Subramanyan from the Cardiothoracic Surgery Department at the Children’s Hospital Los Angeles (CHLA) for collecting thymuses, the CHLA and Harvard University Bauer next generation sequencing cores, the CHLA, University of Alabama, and Broad Stem Cell Research Center flow cytometry facilities, the University of Southern California Libraries Bioinformatics services, and Dr. G. Crooks (University of California Los Angeles) for the MS5 and MS5-DLL1 cell lines. This work was supported by the St. Baldrick’s (C.P.), the Nautica (C.P.), the Shirley Mckernan, the Couples Against Leukemia (C.P.), Hyundai Hope on Wheels (C.P.) and the SHARE, Inc. (C.P.) Foundations, the Susumu Ohno Chair in Theoretical and Computational Biology (R.A.S.), the Susumu Ohno Distinguished Investigator Fellowship (G.G.) and NCI Cancer Systems Biology Consortium 1U01CA23221601A1 (R.A.S., B.S., G.G.).
Footnotes
Declaration of interests
The authors have no competing financial interests to declare.
References
- Anders S, Pyl PT, and Huber W (2014). HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169.
- Balan S, Ollion V, Colletti N, Chelbi R, Montanana-Sanchis F, Liu H, Vu Manh TP, Sanchez C, Savoret J, Perrot I, et al. (2014). Human XCR1+ dendritic cells derived in vitro from CD34+ progenitors closely resemble blood dendritic cells, including their adjuvant responsiveness, contrary to monocyte-derived dendritic cells. J Immunol 193, 1622–1635.
- Bhandoola A, von Boehmer H, Petrie HT, and Zuniga-Pflucker JC (2007). Commitment and developmental potential of extrathymic and intrathymic T cell precursors: plenty to choose from. Immunity 26, 678–689.
- Bolger AM, Lohse M, and Usadel B (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120.
- Butler A, Hoffman P, Smibert P, Papalexi E, and Satija R (2018). Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol 36, 411–420.
- Cante-Barrett K, Mendes RD, Li Y, Vroegindeweij E, Pike-Overzet K, Wabeke T, Langerak AW, Pieters R, Staal FJ, and Meijerink JP (2017). Loss of CD44(dim) Expression from Early Progenitor Cells Marks T-Cell Lineage Commitment in the Human Thymus. Front Immunol 8, 32.
- Cao J, Spielmann M, Qiu X, Huang X, Ibrahim DM, Hill AJ, Zhang F, Mundlos S, Christiansen L, Steemers FJ, et al. (2019). The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502.
- Casero D, Sandoval S, Seet CS, Scholes J, Zhu Y, Ha VL, Luong A, Parekh C, and Crooks GM (2015). Long non-coding RNA profiling of human lymphoid progenitor cells reveals transcriptional divergence of B cell and T cell lineages. Nat Immunol 16, 1282–1291.
- Charoensawan V, Wilson D, and Teichmann SA (2010). Genomic repertoires of DNA-binding transcription factors across the tree of life. Nucleic Acids Res 38, 7364–7377.
- Collin M, and Bigley V (2018). Human dendritic cell subsets: an update. Immunology 154, 3–20.
- Collins A, Rothman N, Liu K, and Reiner SL (2017). Eomesodermin and T-bet mark developmentally distinct human natural killer cells. JCI Insight 2, e90063.
- Crinier A, Milpied P, Escaliere B, Piperoglou C, Galluso J, Balsamo A, Spinelli L, Cervera-Marzal I, Ebbo M, Girard-Madoux M, et al. (2018). High-Dimensional Single-Cell Analysis Identifies Organ-Specific Signatures and Conserved NK Cell Subsets in Humans and Mice. Immunity 49, 971–986 e975.
- Dontje W, Schotte R, Cupedo T, Nagasawa M, Scheeren F, Gimeno R, Spits H, and Blom B (2006). Delta-like1-induced Notch1 signaling regulates the human plasmacytoid dendritic cell versus T cell lineage decision through control of GATA-3 and Spi-B. Blood 107, 2446–2452.
- Fulton DL, Sundararajan S, Badis G, Hughes TR, Wasserman WW, Roach JC, and Sladek R (2009). TFCat: the curated catalog of mouse and human transcription factors. Genome Biol 10, R29.
- Gogoshin G, Boerwinkle E, and Rodin AS (2017). New Algorithm and Software (BNOmics) for Inferring and Visualizing Bayesian Networks from Heterogeneous Big Biological and Genetic Data. J Comput Biol 24, 340–356.
- Ha VL, Luong A, Li F, Casero D, Malvar J, Kim YM, Bhatia R, Crooks GM, and Parekh C (2017). The T-ALL related gene BCL11B regulates the initial stages of human T cell differentiation. Leukemia 31, 2503–2514.
- Haghverdi L, Buttner M, Wolf FA, Buettner F, and Theis FJ (2016). Diffusion pseudotime robustly reconstructs lineage branching. Nat Methods 13, 845–848.
- Hao QL, George AA, Zhu J, Barsky L, Zielinska E, Wang X, Price M, Ge S, and Crooks GM (2008). Human intrathymic lineage commitment is marked by differential CD7 expression: identification of CD7- lympho-myeloid thymic progenitors. Blood 111, 1318–1326.
- Ho IC, Tai TS, and Pai SY (2009). GATA3 and the T cell lineage: essential functions before and after T-helper-2-cell differentiation. Nat Rev Immunol 9, 125–135.
- Holyoake TL, Nicolini FE, and Eaves CJ (1999). Functional differences between transplantable human hematopoietic stem cells from fetal liver, cord blood, and adult marrow. Exp Hematol 27, 1418–1427.
- Hosokawa H, Romero-Wolf M, Yui MA, Ungerback J, Quiloan MLG, Matsumoto M, Nakayama KI, Tanaka T, and Rothenberg EV (2018). Bcl11b sets pro-T cell fate by site-specific cofactor recruitment and by repressing Id2 and Zbtb16. Nat Immunol 19, 1427–1440.
- Hu F, Gartenhaus RB, Eichberg D, Liu Z, Fang HB, and Rapoport AP (2010). PBK/TOPK interacts with the DBD domain of tumor suppressor p53 and modulates expression of transcriptional targets including p21. Oncogene 29, 5464–5474.
- Huang da W, Sherman BT, and Lempicki RA (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4, 44–57.
- Jackson JT, Shields BJ, Shi W, Di Rago L, Metcalf D, Nicola NA, and McCormack MP (2017). Hhex Regulates Hematopoietic Stem Cell Self-Renewal and Stress Hematopoiesis via Repression of Cdkn2a. Stem Cells 35, 1948–1957.
- Johnson JL, Georgakilas G, Petrovic J, Kurachi M, Cai S, Harly C, Pear WS, Bhandoola A, Wherry EJ, and Vahedi G (2018). Lineage-Determining Transcription Factor TCF-1 Initiates the Epigenetic Identity of T Cells. Immunity 48, 243–257 e210.
- Kernfeld EM, Genga RMJ, Neherin K, Magaletta ME, Xu P, and Maehr R (2018). A Single-Cell Transcriptomic Atlas of Thymus Organogenesis Resolves Cell Types and Developmental Maturation. Immunity 48, 1258–1270 e1256.
- Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, and Salzberg SL (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14, R36.
- Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, Baglaenko Y, Brenner M, Loh PR, and Raychaudhuri S (2019). Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 16, 1289–1296.
- Kueh HY, Yui MA, Ng KK, Pease SS, Zhang JA, Damle SS, Freedman G, Siu S, Bernstein ID, Elowitz MB, et al. (2016). Asynchronous combinatorial action of four regulatory factors activates Bcl11b for T cell commitment. Nat Immunol 17, 956–965.
- Lee J, Zhou YJ, Ma W, Zhang W, Aljoufi A, Luh T, Lucero K, Liang D, Thomsen M, Bhagat G, et al. (2017). Lineage specification of human dendritic cells is marked by IRF8 expression in hematopoietic stem cells and multipotent progenitors. Nat Immunol 18, 877–888.
- Li F, Ambrosini G, Chu EY, Plescia J, Tognin S, Marchisio PC, and Altieri DC (1998). Control of apoptosis and mitotic spindle checkpoint by survivin. Nature 396, 580–584.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079.
- Li L, Leid M, and Rothenberg EV (2010). An early T cell lineage commitment checkpoint dependent on the transcription factor Bcl11b. Science 329, 89–93.
- Liu X, Wang Y, Lu H, Li J, Yan X, Xiao M, Hao J, Alekseev A, Khong H, Chen T, et al. (2019). Genome-wide analysis identifies NR4A1 as a key mediator of T cell dysfunction. Nature 567, 525–529.
- Love MI, Huber W, and Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550.
- Luecken MD, and Theis FJ (2019). Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol 15, e8746.
- Marafioti T, Paterson JC, Ballabio E, Reichard KK, Tedoldi S, Hollowood K, Dictor M, Hansmann ML, Pileri SA, Dyer MJ, et al. (2008). Novel markers of normal and neoplastic human plasmacytoid dendritic cells. Blood 111, 3778–3792.
- Marquez C, Trigueros C, Franco JM, Ramiro AR, Carrasco YR, Lopez-Botet M, and Toribio ML (1998). Identification of a common developmental pathway for thymic natural killer cells and dendritic cells. Blood 91, 2760–2771.
- Martin-Gayo E, Gonzalez-Garcia S, Garcia-Leon MJ, Murcia-Ceballos A, Alcain J, Garcia-Peydro M, Allende L, de Andres B, Gaspar ML, and Toribio ML (2017). Spatially restricted JAG1-Notch signaling in human thymus provides suitable DC developmental niches. J Exp Med 214, 3361–3379.
- Medvedovic J, Ebert A, Tagoh H, and Busslinger M (2011). Pax5: a master regulator of B cell development and leukemogenesis. Adv Immunol 111, 179–206.
- Mold JE, Venkatasubrahmanyam S, Burt TD, Michaelsson J, Rivera JM, Galkina SA, Weinberg K, Stoddart CA, and McCune JM (2010). Fetal and adult hematopoietic stem cells give rise to distinct T cell lineages in humans. Science 330, 1695–1699.
- Moore AJ, Sarmiento J, Mohtashami M, Braunstein M, Zuniga-Pflucker JC, and Anderson MK (2012). Transcriptional priming of intrathymic precursors for dendritic cell development. Development 139, 373–384.
- Parekh C, and Crooks GM (2013). Critical differences in hematopoiesis and lymphoid development between humans and mice. J Clin Immunol 33, 711–715.
- Park JE, Botting RA, Dominguez Conde C, Popescu DM, Lavaert M, Kunz DJ, Goh I, Stephenson E, Ragazzini R, Tuck E, et al. (2020). A cell atlas of human thymic development defines T cell repertoire formation. Science 367.
- Pellin D, Loperfido M, Baricordi C, Wolock SL, Montepeloso A, Weinberg OK, Biffi A, Klein AM, and Biasco L (2019). A comprehensive single cell transcriptional landscape of human hematopoietic progenitors. Nat Commun 10, 2395.
- Plum J, De Smedt M, Leclercq G, Taghon T, Kerre T, and Vandekerckhove B (2008). Human intrathymic development: a selective approach. Semin Immunopathol 30, 411–423.
- Porritt HE, Rumfelt LL, Tabrizifard S, Schmitt TM, Zuniga-Pflucker JC, and Petrie HT (2004). Heterogeneity among DN1 prothymocytes reveals multiple progenitors with different capacities to generate T cell and non-T cell lineages. Immunity 20, 735–745.
- Radtke F, Wilson A, Stark G, Bauer M, van Meerwijk J, MacDonald HR, and Aguet M (1999). Deficient T cell fate specification in mice with an induced inactivation of Notch1. Immunity 10, 547–558.
- Rothenberg EV (2019). Programming for T-lymphocyte fates: modularity and mechanisms. Genes Dev 33, 1117–1135.
- Rothenberg EV, Ungerback J, and Champhekar A (2016). Forging T-Lymphocyte Identity: Intersecting Networks of Transcriptional Control. Adv Immunol 129, 109–174.
- Schmitt N, Cumont MC, Nugeyre MT, Hurtrel B, Barre-Sinoussi F, Scott-Algara D, and Israel N (2007). Ex vivo characterization of human thymic dendritic cell subsets. Immunobiology 212, 167–177.
- Scripture-Adams DD, Damle SS, Li L, Elihu KJ, Qin S, Arias AM, Butler RR 3rd, Champhekar A, Zhang JA, and Rothenberg EV (2014). GATA-3 dose-dependent checkpoints in early T cell commitment. J Immunol 193, 3470–3491.
- Seet CS, He C, Bethune MT, Li S, Chick B, Gschweng EH, Zhu Y, Kim K, Kohn DB, Baltimore D, et al. (2017). Generation of mature T cells from human hematopoietic stem and progenitor cells in artificial thymic organoids. Nat Methods 14, 521–530.
- Shafer MER (2019). Cross-Species Analysis of Single-Cell Transcriptomic Data. Front Cell Dev Biol 7, 175.
- Sontag S, Forster M, Qin J, Wanek P, Mitzka S, Schuler HM, Koschmieder S, Rose-John S, Sere K, and Zenke M (2017). Modelling IRF8 Deficient Human Hematopoiesis and Dendritic Cell Development with Engineered iPS Cells. Stem Cells 35, 898–908.
- Stehling-Sun S, Dade J, Nutt SL, DeKoter RP, and Camargo FD (2009). Regulation of lymphoid versus myeloid fate ‘choice’ by the transcription factor Mef2c. Nat Immunol 10, 289–296.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, and Mesirov JP (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545–15550.
- Taghon T, Van de Walle I, De Smet G, De Smedt M, Leclercq G, Vandekerckhove B, and Plum J (2009). Notch signaling is required for proliferation but not for differentiation at a well-defined beta-selection checkpoint during human T cell development. Blood 113, 3254–3263.
- Taghon TN, David ES, Zuniga-Pflucker JC, and Rothenberg EV (2005). Delayed, asynchronous, and reversible T lineage specification induced by Notch/Delta signaling. Genes Dev 19, 965–978.
- Tomasello E, and Vivier E (2005). KARAP/DAP12/TYROBP: three names and a multiplicity of biological functions. Eur J Immunol 35, 1670–1677.
- Unnisa Z, Clark JP, Roychoudhury J, Thomas E, Tessarollo L, Copeland NG, Jenkins NA, Grimes HL, and Kumar AR (2012). Meis1 preserves hematopoietic stem cells in mice by limiting oxidative stress. Blood 120, 4973–4981.
- Van de Walle I, Dolens AC, Durinck K, De Mulder K, Van Loocke W, Damle S, Waegemans E, De Medts J, Velghe I, De Smedt M, et al. (2016). GATA3 induces human T cell commitment by restraining Notch activity and repressing NK-cell fate. Nat Commun 7, 11171.
- Van den Berge K, Roux de Bezieux H, Street K, Saelens W, Cannoodt R, Saeys Y, Dudoit S, and Clement L (2020). Trajectory-based differential expression analysis for single-cell sequencing data. Nat Commun 11, 1201.
- Velten L, Haas SF, Raffel S, Blaszkiewicz S, Islam S, Hennig BP, Hirche C, Lutz C, Buss EC, Nowak D, et al. (2017). Human haematopoietic stem cell lineage commitment is a continuous process. Nat Cell Biol 19, 271–281.
- Weerkamp F, Baert MR, Brugman MH, Dik WA, de Haas EF, Visser TP, de Groot CJ, Wagemaker G, van Dongen JJ, and Staal FJ (2006). Human thymus contains multipotent progenitors with T/B lymphoid, myeloid, and erythroid lineage potential. Blood 107, 3131–3137.
- Wolock SL, Krishnan I, Tenen DE, Matkins V, Camacho V, Patel S, Agarwal P, Bhatia R, Tenen DG, Klein AM, and Welner RS (2019a). Mapping Distinct Bone Marrow Niche Populations and Their Differentiation Paths. Cell Rep 28, 302–311 e305.
- Wolock SL, Lopez R, and Klein AM (2019b). Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data. Cell Syst 8, 281–291 e289.
- Woon Kim Y, Kim S, Geun Kim C, and Kim A (2011). The distinctive roles of erythroid specific activator GATA-1 and NF-E2 in transcription of the human fetal gamma-globin genes. Nucleic Acids Res 39, 6944–6955.
- Zemmour D, Zilionis R, Kiner E, Klein AM, Mathis D, and Benoist C (2018). Single-cell gene expression reveals a landscape of regulatory T cell phenotypes shaped by the TCR. Nat Immunol 19, 291–301.
- Zeng Y, Liu C, Gong Y, Bai Z, Hou S, He J, Bian Z, Li Z, Ni Y, Yan J, et al. (2019). Single-Cell RNA Sequencing Resolves Spatiotemporal Development of Prethymic Lymphoid Progenitors and Thymus Organogenesis in Human Embryos. Immunity 51, 930–948 e936.
- Zheng S, Papalexi E, Butler A, Stephenson W, and Satija R (2018). Molecular transitions in early progenitors during human cord blood hematopoiesis. Mol Syst Biol 14, e8041.
- Zhou W, Yui MA, Williams BA, Yun J, Wold BJ, Cai L, and Rothenberg EV (2019). Single-Cell Analysis Reveals Regulatory Gene Expression Dynamics Leading to Lineage Commitment in Early T Cell Development. Cell Syst 9, 321–337 e329.
- Zilionis R, Nainys J, Veres A, Savova V, Zemmour D, Klein AM, and Mazutis L (2017). Single-cell barcoding and sequencing using droplet microfluidics. Nat Protoc 12, 44–73.
- Zlotoff DA, Sambandam A, Logan TD, Bell JJ, Schwarz BA, and Bhandoola A (2010). CCR7 and CCR9 together recruit hematopoietic progenitors to the adult thymus. Blood 115, 1897–1905.
Supplementary Materials
Table S1. Single cell RNA-Seq protocol and computational parameters, related to STAR Methods.
Table S2. Cluster specific genes for CD34+ thymic cells, related to Figure 1.
Table S3. Mapping of Index sorted cells to droplet single cell RNA-Seq data, related to Figure 1.
Table S4. Pseudotime gene expression profiles for CD34+ thymic cells, related to Figure 5.
Table S5. Early CD4SP vs CD8SP differentially expressed genes, genes differentially expressed with β-selection, and cluster specific genes for SP T cells, related to Figure 6.
Table S6. Reagents and media, related to STAR Methods.
Data Availability Statement
Sequencing data are deposited in the Gene Expression Omnibus (GEO) under accession number GSE139042.