Author manuscript; available in PMC: 2024 Feb 26.
Published in final edited form as: Cell Stem Cell. 2024 Jan 5;31(2):244–259.e10. doi: 10.1016/j.stem.2023.12.001

A time and single-cell resolved model of murine bone marrow hematopoiesis

Iwo Kucinski 1,#, Joana Campos 2,#, Melania Barile 1,3,#, Francesco Severi 4,5,#, Natacha Bohin 2, Pedro N Moreira 4,5, Lewis Allen 2, Hannah Lawson 2, Myriam L R Haltalli 1, Sarah J Kinston 1, Dónal O’Carroll 4,5,8, Kamil R Kranc 2,6,8, Berthold Göttgens 1,7,8
PMCID: PMC7615671  EMSID: EMS194148  PMID: 38183977

Abstract

The paradigmatic hematopoietic tree model is increasingly recognized to be limited as it is based on heterogeneous populations and largely inferred from non-homeostatic cell fate assays. Here, we combine persistent labeling with time-series single-cell RNA-Seq to build a real-time, quantitative model of in vivo tissue dynamics for murine bone marrow hematopoiesis. We couple cascading single-cell expression patterns with dynamic changes in differentiation and growth speeds. The resulting explicit linkage between molecular states and cellular behavior reveals widely varying self-renewal and differentiation properties across distinct lineages. Transplanted stem cells show strong acceleration of differentiation at specific stages of erythroid and neutrophil production, illustrating how the new model can quantify the impact of perturbations. Our reconstruction of dynamic behavior from snapshot measurements is akin to how a kinetoscope allows sequential images to merge into a movie. We posit that this approach is generally applicable to understanding tissue scale dynamics at high resolution.

Keywords: Differentiation rate, self-renewal, hematopoiesis, progenitors, stem cells, scRNA-Seq, dynamics, modelling, Hoxb5

Introduction

A continuous flow of cells replenishes blood cells throughout life to maintain hematopoietic homeostasis. This flow originates from the hematopoietic stem cells (HSCs) and progresses through a complex hierarchy of multipotent, bipotent and unipotent progenitors, collectively called hematopoietic stem and progenitor cells (HSPCs). Decades of research have made it possible to identify HSPCs immunophenotypically and define their functionality, thus positioning them within the hematopoietic hierarchy and establishing the hematopoietic tree model1,2. While the advent of scRNA-Seq provided high resolution and made it possible to resolve heterogeneity within HSPCs, scRNA-Seq is typically used to obtain snapshot measurements lacking temporal information. Thus, while undeniably useful, the classical hematopoietic tree model, even complemented by scRNA-Seq, remains static and qualitative, and does not capture the highly dynamic and complex biology of HSPCs in real time.

To pave the way for real-time modelling of HSPC dynamics under near-native conditions, a previous study3 induced a persistent fluorescent reporter within the HSC compartment and assessed label propagation into downstream progenitors and mature cells by flow cytometry. However, immunophenotyping has limited resolution, and HSPC populations defined by flow cytometry are known to be functionally heterogeneous. This is particularly evident within common myeloid progenitors (CMP)4,5 and lymphoid-primed multipotent progenitors (LMPP)6,7, whose subpopulations resolved by scRNA-Seq were found to be functionally distinct in transplantation experiments. Further high-throughput scRNA-Seq studies charted putative gradual molecular transitions from HSCs toward 8 distinct lineages8, including specific stages of erythroid differentiation9. Nonetheless, while molecular states captured by scRNA-Seq can be predictive of progenitor fate potential when assessed in vitro10–12, gaining insights into single-cell fates in vivo during homeostasis has remained more challenging13.

Recent work in non-hematopoietic tissue demonstrated that lineage tracing and scRNA-Seq can be combined to understand progenitor cell differentiation into the airway epithelial lineage14. Nevertheless, such an approach has never been applied to a complex multilineage differentiation process, such as hematopoiesis. Furthermore, it has remained unclear whether predictive tissue-scale computational models of steady-state tissue homeostasis at single-cell resolution can be constructed based on such approaches. Here, we aimed to uncover high-resolution HSPC kinetics of multilineage bone marrow (BM) hematopoiesis in vivo. To achieve this, we combined inducible HSC-labelling to track label-propagation to downstream progeny during steady-state hematopoiesis with scRNA-Seq at different time-points after label induction. This enabled us to reveal real-time dynamics (instead of mere pseudotime or latent time) and build quantitative cellular flow models. These models describe numbers of cells produced and transported across the stem and progenitor compartment, properties which have so far only been measured for a selected few subpopulations. Notably, the ample molecular information allowed us to construct continuous models to associate gene expression changes with cell behaviors, such as increased proliferation or accelerated differentiation, thus directly connecting tissue and cellular behavior with the underpinning layer of molecular processes. Finally, we demonstrate that our dynamic HSPC model, unlike static immunophenotypic data, is transferable and able to predict HSPC fate outcomes based on published datasets. To showcase this, we compared near-native hematopoiesis with an HSC transplantation setting, which revealed drastic upregulation of differentiation at specific stages of erythroid and neutrophilic differentiation.

Results

Hoxb5-CreERT2-Tomato reporter tracks HSC differentiation over time

To analyze HSPC dynamics, we aimed to employ a labelling approach (based on principles from Busch et al.3), in which an inducible HSC-specific CRE excises a STOP cassette in a Rosa26-LoxP-STOP-LoxP-tdTomato (R26LSL-tdTomato) reporter to permanently label HSCs and their subsequent progeny. We hypothesized that Hoxb5, which is specifically expressed in HSCs15, would be a suitable driver locus. To validate the specificity of Hoxb5 expression at the protein level, first we generated Hoxb5mKO2 mice, where the expression of the HOXB5 and mKO2 fluorescent reporter protein is driven by the endogenous Hoxb5 locus (Figure S1A). mKO2 expression was confined to the BM Lin-Sca-1+c-Kit+ (LSK) stem and progenitor cell compartment and was absent from Lin-Sca-1-c-Kit+ progenitors and Lin+ differentiated cells (Figure S1B-D, E1A). Within the LSK compartment, Hoxb5mKO2 was highly expressed in the LSK CD48-CD150+ HSC fraction and enriched this population (Figure S1B-D). Low-level expression was also detected in LSK CD48-CD150- multipotent progenitors (MPPs), although the highest expression was exclusive to the HSC population (Figure S1C). At the functional level, we observed robust long-term multilineage repopulation activity of mKO2+ HSCs upon serial transplantation. Notably, chimerism in the HSC compartment of primary recipients was significantly lower in the mKO2- cohort, and mKO2- HSCs failed to efficiently propagate all lineages in secondary recipients (Figure S1E, E1B-D). These results point to the Hoxb5-positive HSC fraction as the population with the most robust stem cell activity. To corroborate this observation, we investigated the molecular properties of mKO2+ and mKO2- HSC/MPP populations. For that we generated a Smart-seq2 plate-based scRNA-Seq dataset consisting of 384 cells sorted by FACS for mKO2 as well as surface markers. 
We then scored the cells for expression of HSC marker genes, which demonstrated that mKO2+ cells indeed express canonical HSC marker genes at protein and mRNA level and display the highest HSC-score based on full transcriptomic analysis16, a molecular signature associated with LT-HSC function (Figure E2A-C). Projection of the newly sequenced transcriptomes on our previously reported, high-resolution HSPC landscape7 confirmed that almost all mKO2+ HSCs tightly occupy the region of the most immature stem cells (Figure E2D-F). Altogether, HOXB5 expression selectively marks HSCs with the long-term multilineage reconstitution potential and stem cell signature.

Having validated Hoxb5 as a suitable locus, we generated Hoxb5CreERT2 mice15 and crossed them with the R26LSL-tdTomato reporter17 to establish the Hoxb5CreERT2; R26LSL-tdTomato mice (here referred to as Hoxb5-Tom, Figure 1A), which allow for inducible labelling of HSCs in situ by tamoxifen administration and subsequent tracking of HSC progeny over time (Figure 1B-D). To validate this system, we used flow cytometry to track label propagation across HSPC sub-populations in the BM and differentiated cell types in the peripheral blood (PB) at indicated intervals (Figure 1B-E, S1F-H, E3). Upon tamoxifen administration, we observed specific labelling of 1.8% of cells within the HSC compartment, which over 2 months gradually accumulated in downstream cell compartments (Figure 1C-D). Importantly, internal controls (i.e. vehicle-treated Hoxb5-Tom mice or those lacking the driver allele) show no background labelling (Figure E3A). Labelled differentiated cells are detectable in PB within 1-2 months after labelling HSCs, with particularly fast contribution to the platelet lineage, followed by erythrocytes and myeloid cells, with T and B cells appearing later (Figure 1D-E & E3B-D). We observed non-decreasing labelling for at least 9 months after the treatment (Figure 1C-D, S1G-H), indicating that the label is persistent and inert.

Figure 1. Hoxb5-Tom persistent labelling system enables time-resolved tracking of stem cells and their progeny.


(A) Diagram of the genetic construct used to introduce the inducible and persistent Hoxb5-Tom label in the respective mouse line. (B) Schematic of the time-course experiment analyzing Hoxb5-Tom label frequency in the indicated populations of mouse bone marrow (BM) and peripheral blood (PB). Upon tamoxifen administration, Hoxb5-expressing cells are labelled with heritable Tom expression. (C) Fractions of Tom+ cells in the HSPC subpopulations within the BM at indicated time-points after label induction. Mice were analyzed at 0.5 (n=5), 1 (n=3), 2 (n=8), 3 (n=10), 5 (n=4) and 9 (n=7) months after label induction. Dots represent individual mice and bars indicate mean ± SEM. (D, E) Fractions of Tom+ cells in peripheral blood of lymphoid/myeloid cells (D) and erythrocytes/platelets (E) analyzed at the indicated time-points after label induction. Shown as mean with error bars denoting SEM of 4-32 animals. (F) Diagram portraying the concept of inferring population dynamics from heritable label propagation. The rate of label accumulation in the downstream compartments is proportional to the differentiation rate between the compartments. (G) Diagrams providing analogy between the shape of the Waddington landscape and the key population parameters estimated in this work: differentiation rate is akin to the slope of the landscape; self-renewal (and the related residence time or half-life) depends on the input, output and proliferation; flux is the number of cells multiplied by the slope. (H) Comparison of Tie2-YFP and Hoxb5-Tom label progression displayed as relative labelling frequency between MPP or HPC-1 and HSC compartments. Red dots - Hoxb5-Tom data points (see Figure 2), grey line - rolling average for matching Tie2-YFP data, as published previously22. LSK – Lin-, Sca1+, cKit+; HSCs – LSK, CD150+, CD48-; MPP – LSK, CD150-, CD48-; HPC-1 – LSK, CD150-, CD48+; HPC-2 – LSK, CD150+, CD48+ cells.

Computational inference of population dynamics relies on a simple principle (Figure 1F): as heritable label propagates down from the label-rich upstream compartment, the speed of differentiation is proportional to label equilibration (Figure 1G, see methods). To benchmark our new experimental model, we compared flow cytometry data obtained from tamoxifen-treated Hoxb5-Tom mice with previously published results of analogous label propagation obtained with the Tie2-YFP mouse line3. As shown in Figure 1H, our data are highly consistent for both MPP/HSC and HPC-1/HSC relative abundances across the entire time range, thus validating our new transgenic models and unlocking our next goal - modelling of population dynamics.
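The principle stated above can be illustrated with a minimal two-compartment sketch. It rests on simplifying assumptions that are not taken from the Methods: a single upstream compartment with constant labelling frequency, a downstream compartment of constant size fed only by that upstream compartment, and labelled and unlabelled cells behaving identically. Under these assumptions the downstream labelling frequency relaxes toward the upstream frequency at a rate equal to the differentiation rate scaled by the size ratio of the two compartments. All function and parameter names are hypothetical.

```python
import math


def relative_labelling(t_days, diff_rate, n_up, n_down):
    """Closed-form downstream/upstream labelling ratio at time t_days.

    Assumes constant compartment sizes and constant upstream labelling,
    so that d(f_down)/dt = k * (f_up - f_down) with k = diff_rate * n_up / n_down.
    """
    k = diff_rate * n_up / n_down  # influx per downstream cell per day
    return 1.0 - math.exp(-k * t_days)


def simulate_relative_labelling(t_days, diff_rate, n_up, n_down, dt=0.01):
    """Forward-Euler integration of the same equation, with f_up fixed at 1."""
    k = diff_rate * n_up / n_down
    f_down = 0.0
    steps = int(round(t_days / dt))
    for _ in range(steps):
        # Label flows in proportionally to the remaining labelling gap.
        f_down += dt * k * (1.0 - f_down)
    return f_down
```

In this toy setting a faster differentiation rate gives faster equilibration of the relative labelling frequency, which is the quantity compared between mouse lines in Figure 1H. The model described in the following sections is considerably richer (many clusters, time-dependent sizes, net proliferation), so this sketch only conveys the intuition.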

A unified reference HSPC landscape with time-resolved differentiation

Having validated the HoxB5-Tom system, we designed a strategy to capture scRNA-Seq profiles of cells traversing the HSPC landscape over time (Figure 2A). We harvested BM from tamoxifen-treated mice at 9 time-points ranging between 3 days (providing just enough time for Tom protein expression) and 269 days, when the label is mostly equilibrated. At each time-point we sorted cells together from two overlapping populations: (Lin-cKit+) and (Lin-Sca1+) from the bone marrow which contain all stem cells and a broad view of progenitor cells8 (Figure E3E). To ensure accuracy and reproducibility, we profiled multiple independent biological replicates for each time-point (36 animals in total). While our focus was labelled Tom+ cells, we also profiled Tom- cells at each time-point to obtain accurate background cell density in case it changes over time. We generated a common reference landscape by integrating all single-cell profiles followed by clustering, embedding in a UMAP projection and manual annotation (Figures 2B,C, S2A-E). Clusters disjointed from the main landscape body (mostly mature cell types) and those representing technical artifacts (e.g. doublets or dying cells) were excluded (unfiltered data in Figure S2F-G). The refined landscape (>115,000 cells) served as the basis for our analysis. To place our data within the broader scope of hematopoiesis research and extend its interpretability, we provide multiple layers of annotation. Manual annotation8,11 used lineage marker expression, cell cycle phases, HSC-score (molecular signature of long-term repopulating HSCs16) and pseudotime (Figure 2A-D, Supplementary Table S1) to highlight the upstream cluster containing HSCs (Figure S2C) (cluster 0) and 8 terminal clusters (Figure 2C), where clear expression of definitive markers is observed. 
Please note that we refer to the populations as terminal within the constraints of our stem and progenitor landscape, but most of them are not mature cells and cells can progress to further differentiation stages beyond our landscape. To add more functional information, we mapped external scRNA-Seq datasets using our Cellproject package. Firstly, we overlaid canonical immunophenotypic subpopulations with our scRNA-Seq landscape (Figures 2D,E, S3A,B) (data from Nestorowa et al.7) comprising: highly purified LT-HSCs, multipotent progenitors (MPPs) 1 and 3, ST-HSCs, granulocyte-monocyte progenitors (GMPs), LMPPs and megakaryocyte-erythroid progenitors (MEPs). Secondly, we highlighted cell states associated with specific cell fate outcomes based on in vitro lineage tracing experiments11 (Figures 2F and S3C). Importantly, the in vitro cell potency is broadly aligned with the manual cluster annotation. Finally, we included information about the active/inactive HSC status under proliferative challenge based on lineage tracing data from Bowling et al.18 (Figure 2G). Together, these annotations place cell clusters into a functional framework to facilitate interpretation of the population dynamics models discussed below.

Figure 2. Time-resolved reference HSPC landscape at single-cell level.


(A) Experimental design for HSPC dynamics analysis with flow cytometry and scRNA-Seq. Table indicates specific time-point and the number of mice (replicates) used for Tom+ scRNA-Seq analysis; 2 mice in each time-point were used for the Tom- fraction estimation. (B) UMAP projection of the integrated HSPC scRNA-Seq landscape (all Tom+ and Tom- cells combined) with color-coded clusters. Outlier or aberrant clusters were removed for clarity (see Figure S2F,G). (C) Manual annotation of the landscape in B. Most differentiated clusters with clearly defined lineage markers are color-coded, intermediate undifferentiated states are shown in grey (Int prog), cluster containing HSCs is shown in pink. (D,E) Projection from B in grey, with embedded and color-coded immunophenotypic sub-populations from Nestorowa et al. data.7 Up to 60 randomly selected cells in each category are plotted. All cells are plotted in Figure S3A. (F) Projection from B in grey, with embedded and color-coded cKit+ progenitors, based on their output in lineage tracing in vitro cultures. Color-coded points correspond to cells harvested at day 2 with sufficient clonal information available at day 4 and day 6 of culture. Data from Weinreb et al.11. (G) Projection from B in grey, with embedded and color-coded HSCs with no detected cellular output (inactive - childless) or contributing to haematopoiesis (active - parent) following 5-FU challenge in mice (data from Bowling et al.18). (H) Projection from B in grey, with Hoxb5-Tom+ cells harvested at indicated time-points shown in blue. Nestorowa et al.7 population definitions: LT-HSC – Lin-, cKit+, Sca1+, CD34-, Flt3-, MPP1 – Lin-, cKit+, Sca1+, Flt3-, CD34+, CD150+, CD48-, ST-HSC – Lin-, cKit+, Sca1+, Flt3-, CD34+, CD150-, CD48-, GMP – Lin-, cKit+, Sca1+, CD16/32+, CD34+, LMPP – Lin-, cKit+, Sca1+, Flt3+, CD34+, MEP – Lin-, cKit+, Sca1+, CD16/32-, CD34-, MPP3 – Lin-, cKit+, Flt3-, CD34+, CD150-, CD48+, CMP – Lin-, cKit+, Sca1+, CD16/32-, CD34+.

Abbreviations: B prog - B cell progenitor, Bas - basophils, Bas/MC prog - Basophil and Mast Cell progenitors, DC prog - dendritic cell progenitors, Eos - eosinophils, Ery prog - erythroid progenitors, HSC - hematopoietic stem cells, Int prog - intermediate progenitors, Ly prog - lymphoid progenitors, Meg prog - megakaryocyte progenitors, Mono/DC prog - monocyte and dendritic cells progenitors, Neu prog - neutrophil progenitors, pDC - plasmacytoid dendritic cells

The HSPC landscape split by time-point shows clear propagation of labelled cells (Figure 2H). A full quantification of labelled/unlabelled cell ratios for all time-points is provided in Figure S4A and follows the behaviour expected from label propagation experiments3 (Figure 1F). Certain clusters (e.g. clusters 8 and 7) accumulate labelled cells very quickly, others more slowly (clusters 11 or 10) and some very slowly (clusters 13 or 14) (Figures 2H and S4A). Eventually the label largely equilibrated, as compared to the Tom- population (Figure S4A,B). Importantly, scRNA-Seq clustering resolves heterogeneity within cell populations defined by conventional flow cytometry gates (Figure S3A-B)4,7,19 and is predictive of cell fate11. To provide a quantitative description of population dynamics, we employed two types of models: discrete and continuous, each built for specific purposes. The former captures dynamics across the entire compartment and intuitively combines hierarchical tree models of hematopoiesis with a new quantitative view based on more precisely defined cell types. It also serves as a necessary reference for the latter, a more advanced continuous modelling approach, which focuses on specific trajectories, but provides cellular flux parameter estimates for each single cell and thus directly connects single cell transcriptomic profiles with tissue-scale cellular behavior.

Discrete model reveals HSPCs with lineage-specific patterns of self-renewal and differentiation

To capture the flow of cells through the HSPC compartment in real time, we utilized the concepts from previous label propagation studies3,20 to build a discrete model consisting of multiple, interconnected cell clusters (Figure 3A-C). We explain two variables changing over time: number of labelled cells (Tom+ cells, Figures 3D and E4, Supplementary Table S2) and size (Tom- cells, Figure E5, Supplementary Table S2) for each cluster (labelling frequency is provided in Figure E6). The model considers two basic properties of each cluster: net proliferation (number of divisions reduced by the number of cells lost e.g. by cell death) and differentiation rates (number of ingoing and outgoing cells between clusters per unit of time, scaled to a single cell). Thus, our model simultaneously estimates (net) proliferation, balancing it with the influx, efflux and time-dependent cluster size. Importantly, a common set of parameters fits both labelled and unlabelled cells (except cluster 0, see the next section), indicating similar dynamics. Additionally, we introduce two derived parameters that are useful for interpreting cell behavior (Figure 1G). Residence time, which is analogous to the mean lifetime of a cell in a cluster, is the time required for the cluster to shrink by 63% (to 1/e of its original size, where e is Euler’s number) in the absence of any incoming cells. Residence time is defined as the inverse of ((death+differentiation)-proliferation); it therefore increases as the proliferation rate rises or as the death/differentiation rates fall, and vice versa. Finally, flux depicts the total number of cells transported between clusters in a unit of time (i.e. differentiation rate multiplied by cluster size). We limited the number of differentiation parameters by assuming that cells travel only between adjacent clusters (i.e. with highest PAGA21 connectivities – Figure 3A). 
While PAGA is a robust method with relatively few assumptions, there is currently no consensus in trajectory inference methodology. Thus, we also provide the tools to explore alternative topologies (see Methods) and apply a cluster-independent, continuous model (see later).
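The two derived parameters defined above can be written down directly from their definitions. The snippet below is a minimal sketch, not the authors' implementation: function names are hypothetical, rates are per cell per day, and the numerical values used in any call are purely illustrative rather than fitted values from the paper.

```python
import math


def residence_time(proliferation, differentiation_out, death=0.0):
    """Inverse of ((death + differentiation) - proliferation), in days.

    Returns infinity when losses do not exceed proliferation, i.e. for a
    fully self-renewing cluster.
    """
    net_loss = (death + differentiation_out) - proliferation
    if net_loss <= 0:
        return math.inf
    return 1.0 / net_loss


def flux(differentiation_rate, cluster_size):
    """Cells transported between two clusters per unit of time."""
    return differentiation_rate * cluster_size


def remaining_fraction(t_days, proliferation, differentiation_out, death=0.0):
    """Fraction of a cluster left after t_days without any incoming cells."""
    net_loss = (death + differentiation_out) - proliferation
    return math.exp(-net_loss * t_days)
```

Evaluating `remaining_fraction` at `t_days = residence_time(...)` returns 1/e, which corresponds to the 63% shrinkage mentioned in the definition.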

Figure 3. Quantitative discrete model of the HSPCs highlights progenitor-specific self-renewal and differentiation properties.


(A) Annotated UMAP projection overlaid with PAGA graph abstraction view of the HSPC landscape. The graph shows putative transitions between clusters (related to Figure 2B). (B) The absolute number of labelled cells observed in each cluster over time displayed as a graph view from A. 4 out of 9 time-points are shown for clarity. (C) Graph abstraction view of the discrete cellular flow model. Size of the nodes is proportional to square roots of relative cluster size, node color is proportional to the residence time (log-scale), arrows indicate differentiation directions, arrow stem thickness is proportional to cell flux. Note: cluster 0a is fully self-renewing and thus exhibits infinite residence time. (D) Best discrete model fit (with 95% confidence intervals) for Tom+ cell number in chosen clusters relative to cluster 0. Error bars indicate pooled standard error of the mean. (E) Scatter plot showing relation of pseudotime distance to differentiation rates, each point corresponding to a transition between clusters. Only transitions among clusters 0-12 and differentiation rates greater than 10^-12 are shown. Please note that in the case of the transitions between clusters 4 and 8 two differentiation rates are plotted (one for each direction). Blue line indicates linear model fit with shaded 95% confidence interval. (F) UMAP projection of the HSPC landscape, with cells color-coded by simulated time required for 1 cell to accumulate in the corresponding cluster starting from cluster 0. Please note that the color scale is logarithmic. (G) Simulated relative cluster size of chosen clusters following complete ablation of cluster 0.

Of note, we observed changes in relative cluster size over time (i.e. the background unlabelled cells), in particular a quick increase in relative abundance (compared to cluster 0) of clusters 7 and 8 (>50% in <20 days) and a coordinated relative decrease in other major clusters (Figures E5, S5C). Cluster 0 also modestly increases in size in the same time-frame. Previous tamoxifen-based label propagation studies also observed a quick rise in ST-HSC, MPP2 and MPP3 total numbers (Figure S5D), but no explanations were provided22. It had previously been suggested that application of tamoxifen interferes with JAK-STAT signalling23. Consistent with recovery from cell depletion caused by tamoxifen interference with JAK/STAT, this pathway was most active in the depleted clusters 7 and 8, in addition to cluster 0 (Figure S6A). To assess how recovery from short-term cell depletion may influence model parameters, we compared our main model with a bi-phasic fit, which permits a switch in differentiation/proliferation rates between the recovery and homeostasis phases, albeit at some cost of increased parameter uncertainty (Figure S6B-C). We observed changes in 14 out of 58 rates between the two phases (Figure S6D-E, Supplementary Table S3). Of note, all bar one of the homeostasis rates in the biphasic model are essentially the same as the rates in the main model. We thus explain and account for a previously overlooked side-effect of using tamoxifen for label induction.

We formulated our main model into a graph in Figures 3C and S6F, where node sizes are proportional to the average cluster size, node color indicates residence time (or net proliferation in Figure S6F) and arrows indicate cell flux (differentiation rate in Figure S6F). Please note that some transitions occur infrequently (transition rates and their confidence intervals are provided in Supplementary Table S3) and we cannot exclude that some may be redundant (for the discussion on the minimal model, please see the methods section “Model selection”). Interestingly, differentiation rates poorly correlate with similarities between gene expression states (Figures 3E, S6G), indicating that discovery of real-time dynamics requires temporal information. Moreover, the compartment-wide view clearly shows lineage-specific dynamics (Figure 3C). Megakaryocyte progenitors emerge through a rapid transition via the fast-proliferating cluster 8, which also generates erythroid cells, albeit more slowly (cluster 1). Substantial erythroid output is achieved via sequential cell states with considerable self-renewal (clusters 1 and 9) and proliferation (cluster 9), followed by fast differentiation between clusters 9 and 11. Furthermore, myeloid progenitors transition from cluster 0 either into cluster 4 or via a shared route with the erythroid and megakaryocytic progenitors into cluster 8, with gradually increasing differentiation rates from cluster 2 onward. The myeloid branch therefore employs additional progenitor populations analogously to the erythroid trajectory, albeit with lower proliferation rates (Figure S6F).

The lymphoid trajectory is altogether different showing exclusively slow transitions via clusters 5 and 2 into cluster 14 (which overlaps mostly with a subset of MPP4 cells). Cluster 5, compared to the more myeloid-biased cluster 4, proliferates and differentiates more slowly, while expressing higher levels of key lymphoid factors, including Flt3, Satb1, Pou2f2 (and to some extent the monocytic factor Irf8, discussed later) (Figure E7A). The lymphoid program therefore displays restricted proliferation and differentiation rates already from its immature stages. Plasmacytoid dendritic cell (cluster 13, pDCs) differentiation through the lymphoid cluster 14 and myeloid clusters 6 and 16 is similarly slow. The emergence of mast cell, basophil and eosinophil progenitors in the adult BM is unclear24,25. Our results are consistent with a model whereby basophil and mast cell progenitors (cluster 12) are continuously generated and originate at least by a transition from the early myeloid cluster 2 but may also have some contributions from other clusters (dashed lines). Furthermore, despite limited cell numbers, we observed some label accumulation in eosinophil progenitors (cluster 17), most likely originating from neutrophil progenitors (cluster 10).

Interestingly, residence time (self-renewal) varies widely across the HSPC landscape, with lineage-specific patterns (Figure 3C, Supplementary Table S3). As expected, cluster 0 contains the only perfectly self-sustaining population; intermediate populations show an extensive range of residence times, from just 2.5 days for Erythroid/Megakaryocytic progenitor (cluster 8), 11 days for Monocyte/Granulocyte progenitors (cluster 2) and up to 53 days for the medial cluster 4. The latter example falls close to the residence time previously estimated for MPPs (70 days)3 and highlights that progenitors can also show considerable self-renewal. Importantly, cells in clusters 8, 2 and 4 fall within the immunophenotypic CMP and MPP definitions (Figures 2D-E and S3A-B), illustrating how historically used flow cytometry gates capture populations with vastly different dynamics. We also note that among some intermediate clusters our model permits a degree of forward and backward differentiation suggesting that some states may exist in an equilibrium, with each cluster having distinct differentiation properties. Thus, diverse hematopoietic progenitors exhibit widely different, lineage-specific dynamics consistent with distinct mechanisms maintaining cell output.

Composition of the top HSPC compartment changes over time

Based on immunophenotype annotations (Figure 2C), the top cluster 0 contains virtually all LT-HSC and a large subset of ST-HSC and MPP1 cells. The overall cluster size increases over time (Figure S5B,C), reminiscent of previous reports noting the expansion of ST-HSCs and MPP3s as mice age (Figure S5D)22. Of note, the Hoxb5-Tom labelled cells within cluster 0 grow almost exponentially (Figure S5A), which mirrors the previously reported behavior of Tie2-YFP labelled LT-HSCs22 and is consistent with the observation of dramatic expansion of Hoxb5-, Tie2- or Fgd5-labelled cells in aging animals26. This suggests that the Hoxb5 and Tie2 systems mark, in addition to the canonically quiescent LT-HSCs, a subset of immature cells with high self-renewal or proliferation capacity.

To take account of this experimentally revealed heterogeneity within cluster 0, we next tested multiple models and put forward a potential explanation, which assumes logistic growth for cluster 0 and three sub-clusters within it: a top, perfectly self-renewing cluster 0a, the megakaryocyte & myeloid-biased cluster 0b, and the multipotent cluster 0c (Figure 3C, dashed box). We constrained cluster 0a size and differentiation rate to match previously reported LT-HSC numbers but left clusters 0b and 0c sizes unconstrained. We defined the tip cluster by finely subclustering cluster 0 and picking as cluster 0a the subcluster with the highest HSC-score (subcluster 8, Figure S5E, F). Reassuringly, the size of this cluster is compatible with that predicted by our model, it is enriched as expected in Procr and Ly6a, and, most importantly, it has a non-growing labelling frequency, as one would expect from the candidate tip cluster (Figure S5G-I). Cluster 0c remains stable over time but it proliferates quickly and feeds both downstream progenitors and cluster 0b, which in turn grows over time (Figure S5B,C). Hence, the flux between clusters 0b and 8 increases with mouse age. This is in line with the increased myeloid output27,28 and relative proportion of megakaryocyte-biased and myeloid-biased HSCs in aged animals29. Of note, cluster 0b shows high self-renewal (residence time of 180 days), consistent with high repopulation potential of lineage-biased HSCs29. Altogether, our discrete model, in addition to faithfully recapitulating cell flux through the HSPC compartment, also provides a possible explanation of aging-associated changes in HSC behavior.

Continuous model of hematopoiesis connects dynamics of gene expression with cell behavior

While our discrete model has provided the HSPC compartment-wide dynamics, a complementary model is required to associate gene expression changes at the single cell level with cell behavior, such as increased proliferation or accelerated differentiation. To directly connect cellular behavior with the underpinning layer of molecular processes, we employed a continuous model based on the Pseudodynamics framework30. For tractability, we considered one lineage at a time, based on cells with highest fate probabilities towards each lineage31,32(Figure 4A-B, E8, E9). The continuous model assigns differentiation and net proliferation rates to each cell (Figure 4A) by solving partial differential equations describing cell densities along pseudotime over real-time. Hence, model parameters and gene expression share a common pseudotime (and real-time) axis, enabling direct comparison. Of particular interest are states (i.e. pseudotime ranges) with changes in proliferation or differentiation rates. An increase in proliferation rates indicates an expansion stage, whereas a rise in differentiation rates marks a potentially irreversible molecular transition.
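For orientation, population models of cell density along a pseudotime coordinate are commonly written as an advection-diffusion-reaction equation. The form below is a generic sketch under that assumption and is not a transcription of the authors' equations: the symbols u, v, g and D are introduced here for illustration only, and the exact terms, parametrisation and boundary conditions are those given in the Methods and in the Pseudodynamics framework30.

```latex
\frac{\partial u(s,t)}{\partial t}
  = \frac{\partial}{\partial s}\left( D(s)\,\frac{\partial u(s,t)}{\partial s} \right)
  - \frac{\partial}{\partial s}\bigl( v(s)\,u(s,t) \bigr)
  + g(s)\,u(s,t)
```

Here u(s,t) denotes cell density at pseudotime s and real time t, v(s) plays the role of the differentiation rate (directed movement along pseudotime), g(s) is the net proliferation rate, and D(s) is a diffusion term capturing undirected spread. In this reading, a pseudotime range where g(s) rises corresponds to an expansion stage, and a range where v(s) rises corresponds to accelerated differentiation.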

Figure 4. Continuous models capture single cell growth and differentiation rates alongside their molecular state.

Figure 4

(A) Diagrammatic representation of megakaryocyte trajectory analysis with pseudodynamics. Following the arrows: putative cell transitions (pseudotime kernel) were used to estimate megakaryocyte cell fate, from which the megakaryocyte trajectory was isolated (dashed line). Along pseudotime, cell densities were computed for each time-point (color-coded density profiles) and analyzed using the pseudodynamics framework, providing differentiation and net proliferation rate estimates for each cell. (B) (left) UMAP projection of the HSPC landscape color-coded by cell fate probability of neutrophil lineage (estimated with pseudotime kernel, see A). Panels on the right show UMAP projections of the isolated neutrophil trajectory color-coded by indicated parameters or gene expression. (C) Pseudodynamics fitted net proliferation parameter (red) and differentiation rate parameters (blue) along pseudotime for megakaryocyte trajectory. Vertical lines indicate the region of interest with increasing proliferation. (D) Heatmap of genes differentially expressed around the region of interest shown in C. Left columns indicate genes belonging to enriched gene categories - E2F target (FDR < 10^-38), G2-M checkpoint (FDR < 10^-24) and cell cycle (FDR < 10^-38). (E) Pseudodynamics fitted net proliferation (red) and differentiation rate (blue) parameters along pseudotime for neutrophil trajectory. Vertical lines indicate the region of interest with increasing differentiation. (F) Fitted gene expression values along pseudotime for neutrophil markers and two TF groups (full analysis in Figure E10). Grey, dashed line indicates differentiation rates shown in E. Gene expression was scaled around the mean.

We set out to analyze gene expression dynamics occurring at such changes in cell behavior over time. For instance, correlating the first derivative of the differentiation rates with gene expression highlights complex matching patterns and shortlists potential regulators driving cell differentiation in an unbiased manner (Figure E11, extended data Table E1). A more targeted approach tests for differential expression around specific stages of differentiation and changes in cell behavior. For brevity, we showcase the megakaryocyte and neutrophil trajectories (Figures 4, E9, E10) but also provide analogous analyses for the erythroid and monocytic/dendritic lineages (Figures E9 and Supplementary Tables S4, S5, extended data Table E2). As shown in Figure 4A, megakaryocyte progenitors display characteristic changes in growth and differentiation rates. Cells rapidly increase their net proliferation early on, ahead of the peak in differentiation and around the stage where Pf4 (megakaryocyte marker) mRNA becomes detectable. In this growth phase, we identified 170 dynamically expressed genes with distinct patterns along pseudotime (Figure 4C-D; a similar analysis of the differentiation phase is shown in Figure E9C-D). These genes are strongly enriched for cell growth and proliferation genes, with almost all of them showing an upward trend in the relevant pseudotime range. This serves as a proof of principle, as the model, based solely on total cell numbers, predicts a growth stage matching the respective gene signature.
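The unbiased screen described above (correlating the first derivative of the differentiation rate with gene expression) can be sketched as follows. This is a minimal illustration on synthetic data, assuming Pearson correlation on a shared pseudotime grid; the function name, the toy sigmoid rate and the two toy genes are hypothetical and are not taken from the study.

```python
import numpy as np

def derivative_correlation(pseudotime, diff_rate, expression):
    """Pearson correlation between d(diff_rate)/d(pseudotime) and each gene.

    pseudotime: (n,) increasing grid; diff_rate: (n,) fitted rate;
    expression: (n_genes, n) fitted expression along the same grid.
    Genes with constant expression would give a zero denominator and
    should be filtered out beforehand.
    """
    d_rate = np.gradient(diff_rate, pseudotime)          # first derivative
    d_centered = d_rate - d_rate.mean()
    e_centered = expression - expression.mean(axis=1, keepdims=True)
    numerator = e_centered @ d_centered
    denominator = np.linalg.norm(e_centered, axis=1) * np.linalg.norm(d_centered)
    return numerator / denominator

# Synthetic example (all values hypothetical).
s = np.linspace(0.0, 1.0, 101)
rate = 1.0 / (1.0 + np.exp(-20.0 * (s - 0.5)))           # rate rises around s = 0.5
gene_up = np.exp(-((s - 0.5) / 0.1) ** 2)                # peaks where the rate rises
gene_down = 1.0 - gene_up                                # mirror image
corr = derivative_correlation(s, rate, np.vstack([gene_up, gene_down]))
print(corr)
```

Ranking genes by such a correlation is only one possible way to shortlist candidates; the choice of correlation measure and any smoothing applied before differentiation are assumptions of this sketch.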

While following the neutrophil differentiation kinetics (Figure 4B,E), we found gradually increasing differentiation rates (blue line) accompanied by a complex pattern of gene expression. Indeed, we observed two phases of neutrophil-affiliated gene expression (Figure 4F), with Cebpe, Cst7, Elane, Fcgr3, and Gfi1 appearing almost simultaneously at the onset of differentiation, while Clec4a2, Wfdc21 and S100a8 increased at different intervals later. To gain insight into potential mechanisms regulating the process, we scrutinized transcription factors with dynamic expression along the trajectory (Figure E10A) and classified them into 4 groups based on their distinct expression patterns. Group 2 (Figure 4F) largely mirrored the expression of early neutrophil markers described above, and reassuringly contained Gfi1, a key determinant of the neutrophil fate, which indeed suppresses Irf8 expression33, a member of the downregulated group 1 TFs. Group 3 (Figure E10B) contained factors with the highest expression in the most immature HSPCs (e.g. Gata2, Hlf, Meis1) and showed early and nearly synchronous decay in expression, suggesting involvement in self-renewal. Finally, Group 1 (Figure 4F) TFs exhibit unique patterns of expression with peaks at different stages, all of which ultimately decay as late neutrophil markers appear. These contain multiple TFs associated with specific lineages, such as Irf8 (Monocyte/DC fate33), Aff3 (lymphoid/B cells34), Dach1 (myeloid35), Hmga2 (myeloid, erythroid, megakaryocytic36) and Pou2f2 (lymphoid/B cells37), or important for HSPC self-renewal, including Ikzf238 or Ssbp239. Thus, our analysis indicates that progenitors exhibit transient expression of major lineage determinants at specific differentiation stages on their way to becoming neutrophils (see Gfi1, Flt3, Irf8 in Figure E10D,E).
Early accumulation of these factors is correlated with increased differentiation rate, but eventually a single programme takes over and accelerates the differentiation even further. Thus, the continuous model unlocks access to full single cell transcriptome data and enables integrated analysis of cellular and molecular dynamics, revealing new mechanistic insights into cell behavior during differentiation.

HSPC models simulate cell journeys in real-time consistent with basic properties of hematopoiesis

Mathematical models combined with our new datasets offer unique prediction capabilities allowing us to unravel fundamental facets of hematopoiesis. Specifically, we focused on computing cell journeys in real-time and consequences of cluster ablation. Firstly, we estimate the ‘average journey times’ with the discrete model. We placed a single cell in cluster 0 and computed the average time required to accumulate one cell for each target cluster. The required time depends on the specific influx/efflux and proliferation rates, including the loss of cells out of the terminal populations (via differentiation/death). Highly transient populations can therefore take longer to be populated stably. As shown in Figures 3F and S6H, average journey time varies widely between terminal states of different lineages (Supplementary Table S3). For instance, accumulating a cell in Meg progenitors (cluster 7) requires 27 days, in neutrophil progenitors (cluster 10) or late erythroid progenitors (cluster 11) >80 days, and producing pDCs takes about 150 days. Secondly, we predict what would happen if, under normal conditions, the self-renewing cluster 0 was ablated. As expected, without cluster 0 input, downstream cluster sizes would gradually decline over time (Figure 3G), due to limited self-renewal of intermediate progenitors. As we described above, progenitor self-renewal is lineage-specific, hence corresponding clusters wane at different rates, with megakaryocyte progenitors depleted to 50% after 2-3 days, whereas lymphoid progenitors are maintained for >50 days. Of note, the substantial effect of the depletion in some compartments is due to the fact that we are simulating ablation of all cells in cluster 0, which includes progenitors immediately downstream of HSCs. For comparison, we also simulated the effect of the depletion of just cluster 0a and ascertained that the effect on the downstream populations is barely noticeable (Figure E7B).
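The two simulations described above (journey time from a single cell in cluster 0, and decline after ablating cluster 0) can be illustrated with a toy linear compartment model. The sketch below is an assumption-laden illustration: the three-cluster chain, all rate values, the forward-Euler integrator and the helper names are hypothetical and do not reproduce the fitted model.

```python
import numpy as np

# Hypothetical 3-cluster chain 0 -> 1 -> 2; all rates (per day) are made up.
# A[j, i] (j != i) is the differentiation rate from cluster i into cluster j;
# A[i, i] is net proliferation minus total efflux of cluster i.
A = np.array([
    [0.00,  0.00,  0.00],   # cluster 0: proliferation exactly balances efflux
    [0.10, -0.05,  0.00],   # cluster 1: fed by cluster 0, net loss 0.05/day
    [0.00,  0.08, -0.02],   # cluster 2: fed by cluster 1, net loss 0.02/day
])

def simulate(rate_matrix, x0, t_end, dt=0.01):
    """Forward-Euler integration of dx/dt = A x; returns times and trajectory."""
    n_steps = int(round(t_end / dt))
    times = np.arange(n_steps + 1) * dt
    traj = np.empty((n_steps + 1, len(x0)))
    traj[0] = x0
    for k in range(n_steps):
        traj[k + 1] = traj[k] + dt * (rate_matrix @ traj[k])
    return times, traj

def first_time(times, mask):
    """First time at which the boolean mask is True, or None if never."""
    hits = np.nonzero(mask)[0]
    return float(times[hits[0]]) if hits.size else None

# Journey time: start with one cell in cluster 0, wait for one cell downstream.
times, traj = simulate(A, np.array([1.0, 0.0, 0.0]), t_end=100.0)
journey_to_1 = first_time(times, traj[:, 1] >= 1.0)
journey_to_2 = first_time(times, traj[:, 2] >= 1.0)

# Ablation: remove cluster 0 and follow the decline of cluster 1 to 50%.
start = np.array([0.0, 2.0, 8.0])
times_abl, traj_abl = simulate(A, start, t_end=100.0)
half_time_1 = first_time(times_abl, traj_abl[:, 1] <= 0.5 * start[1])

print(journey_to_1, journey_to_2, half_time_1)
```

In this toy chain both the journey time to cluster 1 and its post-ablation half-time equal ln(2)/0.05 days analytically, which provides a simple check on the integrator; in a fitted model the two quantities generally differ because they depend on the full set of influx, efflux and proliferation rates.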

Predictions revealed by our model agree with the order of lineage emergence inferred from transplantation11,29,40-42 or cell culture11,40 experiments. The time-frame of the process is, as expected, much longer but is compatible with previous studies of HSPC dynamics in vivo3. Our approach is therefore anchored firmly in the long tradition of hematopoiesis research and opens the opportunity to serve as a predictive framework for in vivo experiments.

Integrative model is predictive and resolves the effects of transplantation on HSPC dynamics

Our models serve as a reference framework for near-native hematopoiesis, capable of transferring information across experiments and systems. To test the predictive capabilities of our models, we utilised data from an independent study (Upadhaya et al.43). In this setting, HSCs and their descendants were labelled using the Pdzk1ip1-CreER;tdTomato system (analogous to Hoxb5-Tom but using a different HSC-specific driver) and analysed after 3, 7 and 14 days. Upadhaya et al.43 profiled cells by scRNA-Seq, thus we were able to integrate them into our HSPC landscape (Supplementary Table S6). As the limited number of replicates and cells was not sufficient for building a standalone model, we used the Hoxb5 model parameters to predict expected cell abundances using the day 3 time-point as initial condition and compared the predictions with the observed data. As shown in Figures S7A and E12, both the discrete and continuous models faithfully predict the evolution of the system over time for the majority of the large clusters and trajectories. Curiously, our model indicates faster differentiation towards megakaryocytes (see clusters 7 and 8) at the expense of erythroid (clusters 9 and 1). We noted that Upadhaya et al.43 used a milder tamoxifen treatment than our study, hence we consulted the Hoxb5 bi-phasic model (Figure S6E) for a potential explanation. Reassuringly, the bi-phasic parameters show that shortly after our tamoxifen treatment megakaryocytic differentiation occurs faster while erythroid differentiation is slower, thus suggesting that the discrepancy is associated with the difference in tamoxifen dosage. Thus, our model, with some uncertainty, is able to quantitatively predict dynamics of adult in vivo hematopoiesis. Furthermore, our approach paves the way for future studies, which, avoiding the transient tamoxifen effect, will provide even more accurate models.

We next employed the same approach to predict multi-lineage differentiation trajectories in vitro (Figure E13) using previously published data11. We found that almost all clusters and trajectories accumulate differentiating cells much faster in vitro than in vivo, though interestingly megakaryocytic differentiation occurs at roughly the same speed as in vivo.

To demonstrate how our model can be used to generate new insights, we analyzed a previous study44, which used scRNA-Seq to track the progeny of highly-purified HSCs in transplanted animals over time (Figure 5A). After integrating the scRNA-Seq profiles into our reference landscape (Figure 5B-F), we derived cell frequencies per cluster at day 3, and used the discrete model to predict the cell abundance expected under non-transplantation conditions (Figures 5G and S7B). While some general features match normal hematopoiesis, for instance megakaryocyte progenitors being the first emerging lineage, cells under transplantation conditions differentiate much faster in most directions, particularly towards the neutrophil fate (Figure 5G, cluster 10). The erythroid lineage behaves differently; while early megakaryocyte and erythrocyte differentiation is accelerated upon transplantation (Figure 5G, cluster 8), late erythroid progenitor cell emergence is delayed compared to the steady-state counterparts (Figure 5G, cluster 11). To go beyond qualitative interpretation, we performed a combinatorial model re-fit of the transplantation data to pinpoint the changes in differentiation rates and proliferation rates in each cluster/transition most likely to be responsible for the altered transplantation landscape dynamics (Figure E14A). This procedure highlighted stage and lineage-specific effects. For instance, the erythroid lineage differentiates around 10 times faster between clusters 1 and 9, while myeloid progenitor cluster 2 exhibits 2-fold higher net proliferation, 7-fold faster differentiation towards neutrophil progenitors and 3-fold faster differentiation towards monocyte/DC progenitors (Figure E14B). In conclusion, we demonstrated that our model can be easily applied to other datasets and provides quantitative predictions and interpretation, which would not be available from static measurements alone.

Figure 5. Growth and differentiation rates of HSPCs adapt to cellular stress conditions.

Figure 5

(A) Diagram of the experiment performed by Dong et al.44, with HSCs transplanted into an irradiated animal and followed over time with scRNA-Seq. (B-F) UMAP projections of the HSPC landscape (grey) with embedded cells from Dong et al.44 in blue. (G) Relative cluster size; points indicate observed data from Dong et al.44. Red line indicates our discrete model prediction (shaded area indicates the 95% confidence interval) starting from the day 3 time-point. Error bars indicate propagated standard error of the mean.

Discussion

Quantitative models describing cell differentiation (e.g. Waddington landscape) were conceptualized decades ago45. However, the generation of dynamic and quantitative abstractions of native haematopoiesis has been hampered by lack of suitable experimental approaches, particularly in terms of getting down to single cell resolution. Here, we report a major effort, combining persistent HSC labelling, time-series scRNA-Seq analyses and mathematical modelling to build a predictive model of in vivo hematopoiesis dynamics. Analogously to the moving images in a kinetoscope, our approach employs multiple high-resolution snapshots of differentiation to reconstruct the real-time cellular flow between single-cell states within the BM multilineage hematopoiesis. Our model describes cell behavior with self-renewal and differentiation rates, which intuitively can be represented as the shape of a Waddington-like landscape (Figure 6). Using this analogy, the discrete model is a set of fixed platforms connected with slides, whereas the continuous model follows the curvature for all observed states (here: single cells). Differentiation rate indicates the slope between two states, with steeper slopes indicating faster transition. In turn, stable states, the flat areas, have little or no downward slope and combined with proliferation, constitute areas of high self-renewal (Figure 1G).

Figure 6. The quantitative model of HSPC dynamics in the mouse bone marrow.

Figure 6

Diagram highlighting the transferable information and the model utility.

Differentiation rate and cell fate are naturally connected, but, crucially, exist in specific experimental contexts. CMPs were originally proposed as a multipotent population with combined erythroid, megakaryocytic, neutrophilic and monocytic potential46. However, later studies reported that most CMPs are transcriptionally and epigenetically primed towards specific lineages4, exhibit lineage bias and are primarily unipotent5 in transplantation cell fate assays. Importantly, transplantation, as we show in this work, is associated with greatly increased differentiation rates, most likely due to high proliferative demand, as other means of ablating cells, such as 5-FU treatment, also cause accelerated differentiation3. Furthermore, in in vitro assays performed under cytokine-rich conditions driving rapid differentiation, CMPs also rarely show combined megakaryocyte, erythroid, granulocyte and monocyte output11,46. However, if differentiation is slowed down and cells are given the opportunity to expand (for approx. 3 divisions) under cytokine-restricted conditions (SCF, IL-11, TPO only), >50% CMP clones generate multipotent output after switching to a cytokine-rich secondary culture46. Similarly, LMPPs have been described as largely unipotent cells in transplantation assays47 but in fact can produce multipotent output in two-phase culture assays analogous to those used for CMPs48, i.e. when given the opportunity to grow first under slower differentiation conditions. Our model, describing the physiologically-relevant slow differentiation system close to native conditions, suggests that intermediate clusters 8, 4, 5, which largely overlap with CMPs, are able to slowly transition among each other.
In particular, cells can shift from the transient megakaryocyte/erythroid-biased cluster 8 to the long-lived myeloid-biased cluster 4, but bidirectional transitions are also permitted by our model (while the best-fit value for the transition from 4 to 8 is small, its upper bound is considerable). This prediction is consistent with cell fates estimated from the static data (using cellrank), where only a small subset of cells is assigned to a single lineage (e.g. ~5% to neutrophil fate), whereas within the more mature cluster 2, 60% of cells are predicted to become neutrophils (Figure E14C). Thus, a subset of CMP cells is balanced and behaves as multipotent progenitor states. This is also consistent with the in vivo observation of progenitors with combined myeloid and megakaryocytic/erythroid outputs13,19,49. Importantly, we find that transitions between clusters 4 and 8 are slow, thus under strong differentiation conditions (e.g. transplantation or differentiation-promoting media), progenitor cells simply do not have time to ‘explore’ the multipotent states but instead roll down to a committed state and thus generate only a limited number of lineages. Moreover, if a primitive progenitor cell does not divide before being channeled down a particular lineage, alternative fates can never be realized (as illustrated in Figure E14D).

While tamoxifen has broadly been used to activate CRE in multiple studies3,22,50,51, we found that tamoxifen treatment perturbs steady-state hematopoiesis in the short term (i.e. first two weeks). Indeed, we observed changes in cluster sizes and differentiation rates associated with tamoxifen treatment, which we teased apart using a bi-phasic model (Figure S6B-E). Development of tamoxifen-independent models will help avoid such confounding effects. In the long-term, as mice age, we observed only modest differences in most cluster sizes but striking differences in cluster 0 composition. While further work will be required to better resolve the HSC sub-populations (in cluster 0) and their age-related dynamics, we consider the tentative sub-structure provided here as a critical first step in this endeavor, as it fits both our data and experimental evidence of HSC behavior in aging mice3,22,26,29.

We fully leverage the scRNA-Seq approach to extend our model’s applicability. To ensure broad accessibility and interpretability, we integrated published annotation from multiple sources.7,11,18 This places our unified landscape (and its sub-populations) in the biological context of previous immunophenotyping and lineage tracing experiments. Moreover, static cell properties (cluster, pseudotime) and model parameters (differentiation rates, self-renewal) are transferable. Crucially, new scRNA-Seq data can be readily incorporated into our landscape and our model is capable of predicting differentiation outcomes for chosen time-points given initial conditions, as we demonstrated using an independent time-course dataset43. Finally, our model can be used to simulate putative explanations for changes in cell abundance, e.g. between healthy and disease tissues, even if only a few snapshot measurements are available. We showcased this capability by shedding new light on changes in cell dynamics after HSC transplantation, which displays stage and lineage-specific acceleration of differentiation in the erythroid and neutrophilic/monocytic-DC lineages (see transitions 1-9 and 2-3/2-6 respectively).

Differentiation and growth involve coordinated up- and down-regulation of thousands of genes, where it remains unknown for the vast majority of those genes whether and if so, how, they play a role in controlling cell behavior. To access the relevant molecular states with high precision, we introduce the continuous model of near-native hematopoiesis, which includes per-cell growth and differentiation rates, thus providing a direct comparison between cellular behavior and underlying gene expression. We observed complex, sequential gene expression patterns, some of which overlap with increasing differentiation rates, implying irreversible molecular changes. For example, we show that neutrophil differentiation is coupled with expression of multiple lineage determinants (Irf8, Flt3, Pou2f2, Gfi1) followed by a single programme taking over and a further increase in differentiation.

The current and predominant view of haematopoiesis has been constructed through the identification of progenitor populations by FACS and definition of their potential by transplantation1. This approach not only lacks resolution, but more importantly, transplantation defines potential in a non-homeostatic assay and therefore does not reveal the actual contribution of any given population to steady-state haematopoiesis. The revolution of single cell transcriptomics has provided evidence for additional progenitor populations4,6,7,19, but so far had been severely limited by having to place those putative populations on a static transplantation-defined map of hematopoiesis. Here we have overcome all these shortcomings by observing near-native haematopoiesis in situ and over time.

The combination of lineage tracing with a single cell transcriptomics chase delivered a truly quantitative and dynamic model of hematopoiesis including previously unknown dynamic relationships between precisely defined stem and progenitor cells. The model also reveals fundamental quantitative system properties from cell trajectories, cell division rates, and number of cell divisions to individual lineage-specific differentiation rates.

Unshackling the field from the static transplantation-defined view of haematopoiesis shifts the paradigm from qualitative models with limited predictive capabilities to integrative, quantitative and predictive models. The latter are highly transferable and thus key to providing insight into human hematopoiesis, where experimental options are limited. As recently demonstrated, scRNA-Seq data can be integrated across species52-54, thus potentially enabling the mapping of HSPC dynamics onto human counterparts. Self-renewal and differentiation capacities are particularly relevant to leukemia research, because they are the precise cellular behaviors whose dysregulation causes the malignant phenotype. As we show here, and as supported by previous studies3,20, progenitors can also operate close to self-renewal and a small proliferative advantage may be sufficient to immortalize them. Finally, population dynamic models are universally applicable across biological fields, as adult tissues are commonly replenished from their own stem cell pools55. To inspire such future endeavors, we showcase how to build a model connecting high-resolution molecular information with tissue-scale cell behavior.

Limitations of the Study

Despite vastly improved resolution over immunophenotyping, scRNA-Seq does not capture cellular states in full. Additional variables such as chromatin state or protein levels also affect cell behavior and may manifest in unappreciated heterogeneity and dynamic properties. These characteristics may be heritable in which case they may be tractable with lineage tracing approaches. In addition, the discrete model relies on hard clustering, which averages any finer cell heterogeneity. While most of the early cell fate decisions will occur within the landscape presented in this work, with increased throughput a BM-wide landscape could be generated, thus providing better insight into the entire lymphoid and myeloid differentiation trajectories.

STAR Methods

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Berthold Göttgens (bg200@cam.ac.uk).

Materials availability

Plasmids and mouse lines generated in this work are available upon request.

Method details

Hoxb5CreERT2 and Hoxb5mKO2 mouse lines

The Hoxb5CreERT2 and Hoxb5mKO2 alleles were generated using CRISPR-Cas9 gene editing technology employing fertilized 1-cell zygotes on the B6CBAF1/Crl genetic background. For the Hoxb5CreERT2 allele, we injected a single 15 ng/µl sgRNA (tcctccggatgggctca)15 together with 25 ng/µl CAS9 mRNA and 17.5 ng/µl single strand donor DNA encoding the P2A-CRE-ERT2 protein flanked by 70 nucleotides of homology arms (Supplementary Table S7). For the Hoxb5mKO2 allele, we used the same concentrations of sgRNA (tcctccggatgggctca), CAS9 mRNA and single strand donor DNA encoding the mKO2-P2A-mKO2-CAAX (Supplementary Table S7). The F0 offspring were screened by PCR and Sanger sequencing. The Hoxb5CreERT2 and Hoxb5mKO2 lines were each established from one founder animal and back-crossed several times to the C57BL/6N genetic background. Mice were genotyped by PCR using primers detailed in Supplementary Table S7.

Transplantation assays and hematopoietic reconstitution analysis

Primary and secondary transplanted recipient mice (CD45.1+/CD45.2+) were lethally irradiated with a split dose of 8 Gy (two doses of 4 Gy administered at least 4 hours apart). For primary transplantations, mice were tail-vein injected with 200 Hoxb5+ or Hoxb5- HSCs (LSK CD48-CD150+) sorted from Hoxb5-mKO2 animals, together with 2x10^5 support CD45.1+ unfractionated BM cells. For secondary transplantations, 3,000 CD45.2+ LSK cells sorted from BM of primary recipients were mixed with 2x10^5 support CD45.1+ unfractionated BM cells and re-transplanted. Peripheral blood of all recipient mice was analyzed up to 21 weeks after primary and secondary transplantations. Leukocytes and HSCs (LSK CD48-CD150+) were stained as described below for flow cytometry analysis of PB and BM, except cells were also incubated with CD45.1-BV605 (Biolegend 110738) and CD45.2-PerCP (Biolegend 109826) antibodies. For each mouse, the percentage of donor chimerism in the analyzed cell compartment was defined as the percentage of CD45.1-/CD45.2+ cells among total CD45.1-/CD45.2+ and CD45.1+/CD45.2- cells, after exclusion of the recipient fraction (CD45.1+/CD45.2+).
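The chimerism definition above amounts to a simple ratio of gated event counts. The sketch below illustrates it, assuming donor cells are the CD45.2 single-positive events and support cells the CD45.1 single-positive events, with double-positive recipient events already gated out; the function name and the example counts are hypothetical.

```python
def donor_chimerism(donor_events, support_events):
    """Percentage of donor events among donor plus support events.

    Recipient (CD45.1/CD45.2 double-positive) events are assumed to have
    been excluded by gating before the counts are passed in.
    """
    total = donor_events + support_events
    if total == 0:
        raise ValueError("no donor or support events in this compartment")
    return 100.0 * donor_events / total

# Hypothetical counts for one compartment of one mouse.
example_chimerism = donor_chimerism(donor_events=300, support_events=700)
print(example_chimerism)
```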

Induction of reporter gene expression by tamoxifen

Tamoxifen (1 g) was dissolved in 10 mL absolute ethanol and 90 mL corn oil at 37°C. Aliquots of tamoxifen (10 mg/mL) were stored at -20°C. 8-12-week-old Hoxb5CreERT2; tdTomato mice were injected intraperitoneally (i.p.) with tamoxifen at 100 mg/kg body weight daily for 7 days. As controls for subsequent lineage tracing experiments, mice with the same genotype were injected with an equivalent volume of corn oil to determine whether any labelling was present in the absence of induction. Hoxb5WT; tdTomato mice treated with tamoxifen were also analysed to confirm the absence of background or tamoxifen-induced changes.
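As a worked example of the dosing arithmetic, the stock concentration and the per-injection volume can be computed as below. The sketch assumes that the ethanol and corn oil volumes are additive and uses a hypothetical 25 g mouse; neither the body weight nor the helper name comes from the protocol.

```python
TAMOXIFEN_MG = 1000.0      # 1 g of tamoxifen
ETHANOL_ML = 10.0
CORN_OIL_ML = 90.0
DOSE_MG_PER_KG = 100.0

# Stock concentration, assuming the two solvent volumes simply add up.
stock_mg_per_ml = TAMOXIFEN_MG / (ETHANOL_ML + CORN_OIL_ML)

def injection_volume_ml(body_weight_g, dose_mg_per_kg=DOSE_MG_PER_KG,
                        concentration_mg_per_ml=stock_mg_per_ml):
    """Volume of stock to inject for one dose at the given body weight."""
    dose_mg = dose_mg_per_kg * body_weight_g / 1000.0
    return dose_mg / concentration_mg_per_ml

# Hypothetical 25 g mouse: 2.5 mg tamoxifen per injection.
volume_25g_ml = injection_volume_ml(25.0)
print(stock_mg_per_ml, volume_25g_ml)
```

The computed stock concentration matches the 10 mg/mL stated for the aliquots.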

Flow cytometry

At end point analyses, the fraction of mKO2+ and Tom+ cells was determined in various hematopoietic compartments of BM, PB, spleens, thymi and lymph nodes. Cells from those tissues were prepared and analyzed as described previously56,57.

For HSC and progenitor cell analyses, unfractionated BM cells were incubated with Fc block, followed by biotin-conjugated anti-lineage marker antibodies (CD4, CD5, CD11b, B220, CD8a, Gr1 and Ter119), cKit-BV711, Sca1-APC/Cy7, CD48-APC and CD150-PE/Cy7 antibodies. Biotin-conjugated antibodies were then stained with Pacific blue-conjugated streptavidin. DAPI was used for dead cell exclusion.

For staining of megakaryocyte and erythroid progenitors, unfractionated BM cells were incubated with antibodies against lineage markers as described above, except Ter119 antibody was replaced by biotin-conjugated anti-CD19. Cells were stained together with cKit-BV711, Sca1-PB, CD150-PE/Cy7, CD16/32-APC/Cy7, CD41-BV605, CD105-APC and Ter119-FITC antibodies. Biotin-conjugated antibodies were then stained with PerCP-conjugated streptavidin.

For analyses of differentiated cells in the BM, cell suspensions were stained with B220-APC and CD19-APC/Cy7 antibodies for B cells, CD11b-PB and Gr1-PE/Cy7 for myeloid cells and Ter119-FITC for erythroid cells.

PB samples were collected from the tail vein into EDTA-coated capillary tubes (Sarstedt). 1-2 µL of unfractionated PB was used for analysis of erythrocytes, mixed with 10 µL of platelet solution. Platelets were separated by centrifugation of PB samples at 100g for 10 min at room temperature. Platelets were identified as Ter119-PE/Cy5- CD150-PE/Cy7+ CD41-BV605+, and Ter119-PE/Cy5+ cells were erythrocytes.

For analyses of leukocytes in PB, spleen and lymph node, myeloid cells were stained as above for BM cells, T cells with CD8a-APC and CD4-APC antibodies, and CD19-APC/Cy7 antibodies were used to detect B cells.

Cell suspensions from thymus were incubated with the biotin-conjugated anti-lineage marker antibodies described above together with CD4-APC, CD8b-APC/Cy7, CD25-PB and CD44-PE/Cy7 antibodies. Biotin-conjugated antibodies were then stained with PerCP-conjugated streptavidin.

Flow cytometry data were acquired by LSRFortessa (BD) and analysed with FlowJo software (TreeStar, v10).

Cell isolation for the scRNA-Seq experiments

Hoxb5-Tom experiments

All steps in this section (unless otherwise indicated) were performed on ice, and centrifugation steps were performed at 300g, 4°C for 5 min. 8-12-week-old mice carrying the Hoxb5-Cre and the Rosa26-LoxP-STOP-LoxP-tdTomato constructs were treated with 7 daily injections of tamoxifen (as described above) and sacrificed at the indicated time-points. BM cells were extracted from ilia, tibiae and femora by grinding with mortar and pestle in PBS supplemented with 2% Fetal Bovine Serum (cell buffer). The suspension was filtered through a 50 µm filter, centrifuged and resuspended in 3 ml of cell buffer. Red blood cells were removed using ammonium chloride solution: 5 ml of 0.8% ammonium chloride was added to the suspension and incubated for 10 min with intermittent mixing. Afterwards, cells were diluted with 7 ml of cell buffer, centrifuged and resuspended in 1 ml of cell buffer. Subsequently, lineage depletion was performed as follows: 20 µl of the EasySep mouse hematopoietic progenitor cell isolation cocktail was added and cells were incubated for 15 min; 30 µl of magnetic particles was added and cells were incubated for 10 min; 1.5 ml of cell buffer was added, tubes were placed in a magnet and incubated for 3 min at room temperature, and cells were eluted twice (with an additional 2.5 ml of cell buffer). Afterwards, cells were centrifuged, resuspended in 200 µl of cell buffer and stained with the antibody panel as follows: the antibody mix was added, cells were incubated for 30 min, washed with 2 ml of cell buffer, centrifuged and resuspended in 200 µl of cell buffer. For the secondary staining, Streptavidin-BV510 was added, and cells were washed with 2 ml of cell buffer, centrifuged, and resuspended in 1000 µl of cell buffer supplemented with 7AAD. Afterwards, cells were sorted with a BD Influx sorter into either 96-well plates containing 2.3 µl lysis buffer (for the Smart-Seq2 protocol) or 100 µl of PBS with 0.04% BSA in Eppendorf tubes (‘droplet buffer’) when used for the 10x Genomics scRNA-Seq protocol.
The Smart-Seq2 plates were vortexed, centrifuged at 800g for 2 min and stored at -80°C.

Both Tom+ and Tom- cells within the Lin- (cKit OR Sca1)+ gate were sorted. (cKit OR Sca1)+ is a superset of the cKit+ gate used previously8 and contains more lymphoid progenitors and pDCs.

Hoxb5-mKO2 experiments

All steps in this section (unless otherwise indicated) were performed on ice, and centrifugation steps were performed at 500g, 4°C for 5 min. 8-12-week-old mice carrying the Hoxb5-mKO2 reporter were sacrificed and cells were isolated from bone marrow (femurs and tibiae) by grinding with mortar and pestle in PBS supplemented with 2% Fetal Bovine Serum (cell buffer). Cells were stained as described under the Flow cytometry section for analysis of HSPCs. The cell suspension was filtered as above and sorted with a BD Influx sorter into 96-well plates containing 2.3 µl lysis buffer (for the Smart-Seq2 protocol). The Smart-Seq2 plates were vortexed, centrifuged at 800g for 2 min and stored at -80°C. The isolated populations were Lin-, Sca1+, cKit+, CD48-, CD150- (MPPs) and Lin-, Sca1+, cKit+, CD48-, CD150+ (HSCs).

scRNA-seq data generation

Smart-Seq2

When cell numbers were limiting, single cells were profiled with a modified version of the Smart-Seq2 protocol58,59 rather than the 10x Genomics kit. Single cells were sorted into 96-well plates with 2.3 µl lysis buffer containing 0.115 µl of SUPERase-In RNase Inhibitor at 20 U/µl concentration and 0.23 µl of 10% Triton X-100 solution; plates were vortexed and stored at -80°C. After thawing, 2 µl of the annealing solution (0.1 µl of ERCC RNA Spike-In solution (1:300,000 dilution), 0.02 µl of the oligo-dT primer (100 µM stock concentration) and 1 µl of dNTP (10 mM stock concentration)) was added. The plate was incubated at 72°C for 3 min and cooled down on ice, and reverse transcription was performed by adding 5.7 µl of RT buffer (0.1 µl of Maxima H minus reverse transcriptase at 200 U/µl concentration, 0.25 µl of SUPERase-In RNase Inhibitor at 20 U/µl concentration, 2 µl of the Maxima enzyme buffer, 0.2 µl of TSO oligo at 100 µM concentration, 1.875 µl of PEG 8000 solution at 40% v/v concentration and 1.275 µl water) and incubating at 42°C for 90 min followed by 70°C for 15 min. Immediately after, cDNA was amplified by PCR by adding 1 µl of the Terra PCR Direct Polymerase (1.25 U/µl), 25 µl of the Terra PCR Direct buffer and 1 µl of the ISPCR primer (10 µM stock concentration) to a total volume of 50 µl using the following PCR conditions: 98°C for 3 min, 98°C for 15 s, 65°C for 30 s, 68°C for 4 min (21 cycles), 72°C for 10 min. The amplified cDNA was purified using AMPure XP beads, quantified using the PicoGreen assay (ThermoFisher P7589) and used for Nextera library preparation. The libraries were generated using either a standard protocol (batch 7d and mKO2 data) or a modified protocol (batches 3d7d, 2w4w and 3dr2, see the corresponding metadata) described below. No obvious batch effects were observed among cells analyzed with either of the protocols.

The standard Nextera protocol: cDNA was diluted to approximately 50-150 pg/µl and 1.25 µl of the solution was used; 2.5 µl of Tagment DNA buffer and 1.25 µl of Amplicon Tagment Mix (Nextera XT kit) were added, samples were incubated at 55°C for 10 min, and the reaction was stopped by addition of 1.25 µl of NT buffer. Tagmentation products were amplified by PCR by adding 1.25 µl of each of the N and S primers and 3.75 µl of NPM solution and using the following thermocycler settings: 72°C 3 min, 95°C 30 s, 12 cycles of 95°C 30 s, 55°C 30 s, 72°C 60 s and a final extension at 72°C for 5 min.

The modified Nextera protocol follows the same principle as the standard Nextera protocol and includes the following steps: cDNA was diluted to approximately 50-150 pg/µl and 1.03 µl of the solution was used, 1.63 µl of Tagment DNA buffer and 0.6 µl Amplicon Tagment Mix was added, samples were incubated at 55°C for 10 min, the reaction was stopped by adding 0.82 µl of NT buffer. Tagmentation products were amplified by adding 1.23 µl of each N and S primers (as above but diluted 5 times), 2.3 µl of Phusion HF buffer, 0.1 µl of dNTP (25 mM stock concentration), 0.07 µl of Phusion polymerase and 2.5 µl of water and using the following thermocycler settings: 72°C 3 min, 98°C 3 min s, 12 cycles of 98°C 10s, 55°C 30s, 72°C 30s and a final extension at 72°C for 5 min.

Libraries were sequenced using the Illumina HiSeq 4000 or NovaSeq instruments, obtaining an average of 1,271,307 reads per cell.

10X genomics

For the 10x Genomics scRNA-Seq protocol up to 20,000 cells were pooled in pairs corresponding to male and female animals, centrifuged and resuspended in a volume of droplet buffer optimal for recovery of up to 10,000 cells and immediately processed with the 10x Genomics Single Cell 3’ v3 protocol following the manufacturer’s instructions.

Libraries were sequenced using the Illumina NovaSeq instrument, obtaining at least 20,000 reads per cell in each run (33,843 reads per cell total average).

Quantification and statistical analysis

scRNA-Seq data analysis

Smart-Seq2 sequencing reads were aligned to the mouse genome (mm10) using the STAR aligner (version 2.7.3a) with default parameters. Reads mapping to exons were counted with featureCounts (version 2.0.0) using the ENSEMBL v93 annotation. Each cell was subjected to quality control; cells with <100,000 reads, <23% of reads mapped to exons, >8.5% of reads mapped to ERCC transcripts, >10% mitochondrial reads or <2000 genes detected above 10 counts per million were discarded. 1288 out of 1533 cells passed quality control. Data were normalized to 10,000 total counts and ln(n+1) transformed.
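The quality-control thresholds and normalisation described above can be expressed as a short filter. The following Python is a minimal sketch rather than the analysis code itself: the CellQC record, its field names and both helper functions are hypothetical, and only the numeric thresholds are taken from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC summary (hypothetical record; field names are assumptions)."""
    total_reads: int
    frac_exonic: float    # fraction of reads mapped to exons
    frac_ercc: float      # fraction of reads mapped to ERCC spike-ins
    frac_mito: float      # fraction of mitochondrial reads
    genes_detected: int   # genes detected above 10 counts per million


def passes_qc(cell: CellQC) -> bool:
    """Apply the Smart-Seq2 thresholds quoted in the text."""
    return (
        cell.total_reads >= 100_000
        and cell.frac_exonic >= 0.23
        and cell.frac_ercc <= 0.085
        and cell.frac_mito <= 0.10
        and cell.genes_detected >= 2000
    )


def log_normalise(counts, target=10_000):
    """Scale one cell's counts to `target` total counts, then take ln(n + 1)."""
    total = sum(counts)
    return [math.log1p(c * target / total) for c in counts]
```

A cell failing any single criterion is discarded, which is why the conditions are combined with a logical AND on the passing side.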

10x Genomics reads were pre-processed using cellranger (version 3.1.0, reference genome and annotation version 3.0.0) with default settings. Downstream analysis was performed mainly using the scanpy60 framework with additional packages where indicated. Low-quality barcodes with fewer than 1000 genes were excluded from the analysis, doublet scores were estimated using the scrublet tool (using 30 principal components) and potential doublets were removed. Male and female cells were distinguished based on the expression of the Xist gene and Y chromosome genes: cells with detectable Xist expression and undetectable Y chromosome gene expression were classified as female and vice versa, while ambiguous cells or potential doublets were excluded. Data were normalised to 10,000 total counts and ln(n+1) transformed.

To determine highly variable genes, scanpy’s highly_variable_genes function was used to select the top 5000 genes within the 10x Genomics data. From the list of highly variable genes, genes associated with cell cycle, Y-chromosome genes and Xist were excluded. Genes associated with cell cycle were a union of cell-cycle genes from Dahlin et al.8 and genes with at least 0.1 Pearson correlation with the following gene set: Ube2c, Hmgb2, Hmgn2, Tuba1b, Ccnb1, Tubb5, Top2a, Tubb4b, following a previously established method11. Putative cell cycle phase was assigned to both 10x and Smart-Seq2 cells using scanpy’s score_genes_cell_cycle function. Following that, 10x and Smart-Seq2 data were combined and subjected to Seurat CCA batch correction61. Among a variety of batch correction tools (Harmony62, Scanorama63, BBKNN64, fastMNN65, MNNcorrect), only Seurat CCA generated seamless integration best matching the cell frequencies based on flow cytometry analysis. After applying batch correction, we observed no obvious segregation of Smart-Seq2 and 10x scRNA-Seq profiles (Figure S2E). Corrected log-normalized counts were scaled and used to compute 50 principal components, find nearest neighbors and calculate a UMAP projection66. A minor batch effect between 10x samples was corrected using the Harmony batch correction tool62. The corrected principal components were used to calculate 12 neighbors followed by cell clustering using the Leiden algorithm67 and calculation of the UMAP projection. Clusters were manually annotated based on marker gene expression as described in Supplementary Table S1. To reduce the complexity for the discrete model, clusters meeting any of the following criteria were excluded from further analysis: clusters that appeared disjointed from the main landscape body, represented low-quality/dying cells or had unclear origins based on the UMAP projection and PAGA analysis. 
This included: T cells, innate lymphoid cells (ILCs), cells with high mitochondrial gene counts, mature B cells, interferon-activated cells, cells with high complement expression and small clusters with unclear annotation, likely to represent doublet cells. Unfiltered landscape is displayed in Figure S2G.

To visualize the relative proportions of cells per cluster over time (Figure S4A), we averaged fractions of Tom+ cells in each cluster for each time-point and divided by the respective values for matching Tom- cells.
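The ratio described above can be sketched as follows. The input layout (a mapping from time-point to one dictionary of cluster fractions per animal) and the function name are assumptions made for this illustration and do not reflect the structure of the published code.

```python
from statistics import mean


def relative_labelling(tom_pos, tom_neg):
    """Ratio of mean Tom+ to mean Tom- cluster fractions per time-point.

    Both inputs map time-point -> list of {cluster: fraction} dicts,
    one dict per animal (a hypothetical layout chosen for this sketch).
    """
    ratios = {}
    for timepoint, pos_reps in tom_pos.items():
        neg_reps = tom_neg[timepoint]
        ratios[timepoint] = {}
        for cluster in pos_reps[0]:
            pos = mean(rep[cluster] for rep in pos_reps)
            neg = mean(rep[cluster] for rep in neg_reps)
            # The ratio is undefined when no Tom- cells fall in the cluster
            ratios[timepoint][cluster] = pos / neg if neg > 0 else float("nan")
    return ratios
```

Values above 1 indicate clusters in which labelled cells are over-represented relative to unlabelled cells at that time-point.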

mKO2 cells analysis

Smart-Seq2 sequencing reads were aligned to the mouse genome (mm10) using the STAR aligner (version 2.7.3a) with default parameters. Reads mapping to exons were counted with featureCounts (version 2.0.0) using the ENSEMBL v93 annotation. Cells with <100,000 reads, <10% of reads mapped to exons, >10% of reads mapped to ERCC transcripts or >10% mitochondrial reads were discarded. 374 out of 384 cells passed quality control. Counts were normalized using the scran package in R and ln(n+1) transformed. Log-normalized counts were used to generate the corresponding violin plots, compute HSC-scores16 and the projections on the Nestorowa et al.7 landscape. Specifically, the projections were performed within scanpy: log-normalized counts of the mKO2 experiment and of the published datasets were combined, subsetted to highly variable genes, and scaled. 50 PCs were then computed and corrected with the mnn_correct package65. Adjacency scores were determined based on the fraction of cells in the reference landscape that are neighbours of the cells to be projected according to the Euclidean metric (method adapted from Dahlin et al.8).

Subclustering of cluster 0

To verify whether the HSC tip population has a constant labelling frequency, we subset cluster 0 from our landscape. We then focussed on 10x data only, to avoid artefacts deriving from the integration of different data types when it comes to very high detail. We then subclustered cluster 0 cells with higher resolution (1.3) of the Leiden algorithm. Among these subclusters, we identified the subcluster that has the highest HSC score16 as the putative cluster 0a.

Embedding external datasets into the integrated HSPC landscape

For each external dataset, the log-normalised counts for cells passing quality control were used as in the original work. Annotation was either obtained from the respective GEO repositories or the literature, or kindly provided by the authors.

Each dataset was integrated with the HSPC landscape (below denoted as reference) using the indicated batch correction tools and the Cellproject package as follows. Log-normalized counts for the Nestorowa et al.7 dataset were concatenated with the reference and the batch effect was removed using the Seurat CCA method61; only highly variable genes selected in the reference landscape were used. The corrected values were scaled and used to compute PCA (50 components) in the reference dataset. The corrected values of the Nestorowa et al.7 dataset were fit into the reference PCA space, in which 15 nearest neighbors were identified between the datasets. These nearest neighbors were used for two purposes: (1) to transfer the cluster identity to the new data (based on the most frequent label) and (2) to predict coordinates in the original reference PCA space (used as a basis for UMAP projection) using nearest-neighbor regression. Finally, the new PCA coordinates were used to embed the new data into UMAP space. As immunophenotypic populations we used the ‘narrow’ classification provided in the original study.
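The two uses of the nearest neighbors (label transfer and coordinate prediction) can be illustrated with a small, self-contained sketch. This is not the Cellproject implementation: the function name and arguments are hypothetical, and the unweighted mean used for the regression step is an assumption, since the text does not specify the weighting.

```python
import math
from collections import Counter


def project_cell(query_pcs, ref_pcs, ref_labels, ref_coords, k=15):
    """Transfer a cluster label and predict coordinates for one query cell.

    query_pcs  : position of the query cell in the shared (corrected) PCA space
    ref_pcs    : positions of the reference cells in the same space
    ref_labels : cluster label of each reference cell
    ref_coords : coordinates of each reference cell in the original
                 reference PCA space
    """
    # Rank reference cells by Euclidean distance to the query cell
    order = sorted(range(len(ref_pcs)),
                   key=lambda i: math.dist(query_pcs, ref_pcs[i]))
    neighbours = order[:k]
    # (1) label transfer: most frequent cluster among the neighbours
    label = Counter(ref_labels[i] for i in neighbours).most_common(1)[0][0]
    # (2) nearest-neighbour regression: unweighted mean of neighbour coordinates
    n_dims = len(ref_coords[0])
    coords = [sum(ref_coords[i][d] for i in neighbours) / len(neighbours)
              for d in range(n_dims)]
    return label, coords
```

The predicted coordinates could then be passed to a fitted UMAP model to place the new cell on the reference embedding.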

Bowling et al.18 data was concatenated with the reference and a common PCA space was calculated, which was subsequently corrected with the Harmony batch correction tool. Within the corrected space 8 nearest neighbors were identified across the datasets, followed by label transfer and UMAP embedding as described above.

Weinreb et al.11 and Upadhaya et al.43 data were integrated analogously to the Nestorowa et al.7 data. For Figures 2F and S3C only ‘state-fate’ clones were used, i.e. cells captured at an early time-point (day 2) with measured fate outcomes at later time-points. Only fates with more than 7 cells were considered for the analysis. To enable model predictions (Figures S7, E12-14) all cells and time-points were integrated using the same method.

Trajectory inference and selection

To pinpoint the most immature stem cells, the HSC score was calculated (default parameters)16 and denoised by averaging values over the nearest neighbors for each cell. The cell with the highest smoothed HSC score was selected as the root for diffusion pseudotime; a diffusion map was calculated and served as the basis for trajectory inference and the continuous population models described below.
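The smoothing and root selection can be sketched in a few lines. The function names and the neighbour-list input are hypothetical, and whether a cell's own score is included in its average is an assumption of this sketch.

```python
def smooth_scores(scores, neighbours):
    """Average each cell's score with those of its nearest neighbours.

    `neighbours[i]` lists the indices of the neighbours of cell i. Including
    the cell itself in the average is an assumption of this sketch.
    """
    return [
        (scores[i] + sum(scores[j] for j in nbrs)) / (1 + len(nbrs))
        for i, nbrs in enumerate(neighbours)
    ]


def pick_root(scores, neighbours):
    """Index of the cell with the highest smoothed score."""
    smoothed = smooth_scores(scores, neighbours)
    return max(range(len(smoothed)), key=smoothed.__getitem__)
```

Smoothing protects the choice of root against a single cell with a spuriously high score that sits among low-scoring neighbours.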

To infer putative trajectories Tom+ cells were used (matching the Pseudodynamics analysis below) for calculating cell transition probabilities using the Pseudotime Kernel method (based on the Palantir tool32) from the CellRank package31. To define the end states clusters 6, 7, 10, 11, 13, 14, 15, 16, 17, 18, 19 were selected and within them 50 cells with the highest pseudotime values. These states are largely consistent with an unsupervised method of macrostate selection Generalized Perron Cluster Analysis with Schur decomposition31. To assign cell fate probabilities Cellrank’s compute_absorption_probabilities function was used.

Cells belonging to trajectories for the continuous models were selected as follows. In the case of the megakaryocytic trajectory, cells belonging to clusters 0, 7 and 8 and with the respective fate probability >0.3 were chosen. For the erythroid trajectory, cells with the respective fate probability <0.2 and falling within the pseudotime range between 0.015 and 0.294 (to exclude the small, variable number of cells at the end of the trajectory) were used. Neutrophil and monocyte trajectories share a long stretch of progenitors with high probabilities towards both lineages, thus a different approach was used, motivated by the apparent locations of bipotent cells with neutrophil and monocyte/DC potential based on cell fate assays (Figure 2F)11. Neutrophil progenitors (terminal state 10) were selected with fate probability >0.24 and Mono/DC probability <0.38, excluding a small number of cells falling into clusters 12, 17 and 14. Conversely, for the Mono/DC progenitors (terminal state 6), cells were selected with Mono/DC fate probability >0.18 and neutrophil probability <0.49, and a small number of cells falling into clusters 12, 17 and 14 was excluded.

Discrete population model analysis

As input to the discrete models the estimated total number of Tom+ or Tom- cells per cluster was used (Supplementary Table S2). The numbers were estimated based on the fraction of cells assigned to each cluster adjusted by the total number of cells (based on the flow cytometry analysis of the entire sample). One out of 5 mice analyzed at day 3 exhibited abnormally high labelling frequency, the sample was excluded to avoid introducing bias but we provide the corresponding data within the GEO submission files and source code for individual assessment.
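As a worked illustration of this estimate, the number of cells per cluster is the fraction of sequenced cells assigned to the cluster multiplied by the total cell count of the sample. The function below is a hypothetical helper written for this example only.

```python
from collections import Counter


def estimate_cluster_sizes(cluster_labels, total_cells):
    """Scale per-cluster fractions of sequenced cells to a total cell count.

    cluster_labels : cluster assigned to each sequenced cell of one sample
    total_cells    : total number of such cells in the sample, e.g. from
                     flow cytometry
    """
    counts = Counter(cluster_labels)
    n_sequenced = len(cluster_labels)
    return {cluster: total_cells * n / n_sequenced
            for cluster, n in counts.items()}
```

For example, a sample in which half of the sequenced cells fall into one cluster and which contains 1000 cells in total would be assigned 500 cells in that cluster.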

To assess the kinetics of differentiation and growth of the different hematopoietic populations, we first considered a discrete compartment model, using the HSPC landscape clusters as compartments. To establish the available differentiation pathways, PAGA connections and pseudotime ordering were considered. We used a lenient PAGA connectivity threshold of 0.05, preserving the majority of connections between ‘adjacent’ clusters, consistent with the hematopoiesis models9,11,13,19. This reduced the number of model parameters and prohibited ‘jumps’ between distant states (e.g. from HSC to neutrophil progenitor directly), in line with the common assumptions of trajectory inference methods. Beyond identifying putative transitions, the connectivity weights do not feed into our dynamics models. Furthermore, no back differentiation (i.e. against pseudotime ordering) was permitted into cluster 0 and from the most differentiated clusters with clear expression of commitment genes: 1, 3, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19. Other above-threshold transitions were considered potentially bidirectional. Each compartment is assigned a growth rate and as many differentiation rates as the number of its progeny compartments. Assuming the following:

  • the label is neutral and stably propagated

  • the kinetics parameters of each cluster are constant over time and independent of the size of any cluster

  • the labeled and unlabeled cells have identical kinetics,

Population dynamics can be modelled as an ODE system of coupled equations:

$$\dot{x}_i(t) = \Big(\beta_i - \sum_{j=1}^{n_c} \alpha_{i,j}\Big)\, x_i(t) + \sum_{j=1}^{n_c} \alpha_{j,i}\, x_j(t)$$

where $x_i(t)$ is the number of cells in population $i$, $\alpha_{j,i}$ is the differentiation rate from compartment $j$ to $i$, and $\beta_i$ the growth rate of population $i$. For the terminal and initial clusters, the equations take the following forms, respectively:

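$$\dot{x}_i(t) = \beta_i\, x_i(t) + \sum_{j=1}^{n_c} \alpha_{j,i}\, x_j(t)$$

$$\dot{x}_0(t) = \Big(\beta_0 - \sum_{j=1}^{n_c} \alpha_{0,j}\Big)\, x_0(t)$$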
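As an illustration of compartment equations of this kind, the sketch below integrates such a system with a fixed-step forward-Euler scheme in plain Python. The function name, the step size and the toy parameters in the usage note are assumptions for illustration; the published analysis code should be consulted for the actual solver and parameter values. Here alpha[j][i] denotes the differentiation rate from compartment j to compartment i, so each compartment loses cells through its outgoing rates and gains cells from its upstream compartments.

```python
def simulate_compartments(x0, beta, alpha, t_end, dt=0.01):
    """Integrate dx_i/dt = (beta_i - sum_j alpha[i][j]) x_i + sum_j alpha[j][i] x_j.

    x0    : initial number of cells per compartment
    beta  : growth rate per compartment (per day)
    alpha : alpha[j][i] is the differentiation rate from compartment j to i;
            entries for disallowed transitions are set to 0
    Forward Euler is used for brevity (an assumption of this sketch); a
    stiff or adaptive solver would be preferable for real parameter fitting.
    """
    n = len(x0)
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        dx = []
        for i in range(n):
            outflow = sum(alpha[i][j] for j in range(n))
            inflow = sum(alpha[j][i] * x[j] for j in range(n))
            dx.append((beta[i] - outflow) * x[i] + inflow)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x
```

With two compartments, zero growth rates and a single rate of 0.5 per day from compartment 0 to compartment 1, the total number of cells is conserved while compartment 0 decays approximately exponentially.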

Please note that differentiation rates are set to zero if they have not passed the thresholding criteria as explained above. The differentiation rates were allowed to vary between 0 and 4 per day, with the exception of cluster 0a’s rates, which were bounded to vary between 0 and 0.02 per day, based on previous knowledge of HSCs’ low activity22,41. The growth rates were bounded between -4 and 4 per day, to allow for cell death (negative values), additional differentiation towards more mature cell states outside the presented HSPC landscape, or cell migration. The number of clusters, nc, is equal to 22: one for each of the 20 Leiden clusters, plus 2 additional subpopulations within cluster 0, the most immature cluster. The reason for this choice lies in 2 characteristics observed in the data: the ratio of labelled to unlabeled cells in cluster 0 (labelling frequency) grows over time, and the labelling frequency of some downstream clusters overshoots that of cluster 0. Based on Barile et al.22 and Takahashi et al.20, this implies that the progenitor cluster must be heterogeneous. Indeed, the most immature HSCs occupy only the tip of cluster 0 (Figure 2C).

Specifically, we chose to add 2 more sub-compartments to allow for differentiation bias in the HSCs22,41. The growth rate in the most immature subcluster 0a was fixed so as to balance the differentiation rates, given the a priori knowledge that pure functional haematopoietic stem cells show only limited growth over time. The proliferation estimates we chose range from one division per 145 days to one per 50 days3,22,26,41. We accounted for this by modelling the overall number of cells in cluster 0 with a logistic function, and thus added a logistic parameter ρ and a carrying capacity K. Both parameters are positive and unconstrained. Specifically, we implemented the following equations for cluster 0a:

$$\dot{x}_0(t) = \rho\, x_0(t)\,\Big(1 - \frac{x_0(t)}{K}\Big)$$

$$\dot{x}_{0a}(t) = \dot{x}_0(t) - \dot{x}_{0b}(t) - \dot{x}_{0c}(t),$$

while the time evolution of clusters 0b and 0c is analogous to that of all other clusters. Since we calibrated the ODE system to both the labelled and unlabeled cells time courses, we also included as parameters 22*2 initial conditions (corresponding to labelling frequencies and cluster sizes), all positive and unbounded, except for the number of cells in cluster 0a, set to range between 500 and 1500 based on previous HSC number estimates68 and factoring in cell isolation efficiency. The model allows the initial number of labelled cells to be greater than zero, thus accounting for any unspecific labelling.
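The logistic constraint on cluster 0 has a well-known closed-form solution, which the following sketch evaluates alongside the rate used for subcluster 0a. The function names and the numerical values in the usage note are hypothetical; only the functional forms follow the logistic equation and the subtraction rule for cluster 0a stated above.

```python
import math


def logistic_rate(x, rho, K):
    """Logistic growth rate of the whole of cluster 0: rho * x * (1 - x / K)."""
    return rho * x * (1 - x / K)


def logistic_size(t, x_init, rho, K):
    """Closed-form solution of the logistic equation starting from x_init at t = 0."""
    return K / (1 + (K - x_init) / x_init * math.exp(-rho * t))


def rate_0a(x0, rho, K, dx0b, dx0c):
    """Rate of change of subcluster 0a, obtained by subtracting the rates of
    subclusters 0b and 0c from the logistic rate of cluster 0 as a whole."""
    return logistic_rate(x0, rho, K) - dx0b - dx0c
```

The cluster size approaches the carrying capacity K at long times, at which point the logistic rate vanishes and any change in 0a is balanced by opposite changes in 0b and 0c.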

We calibrated our model to 4 types of observables:

  • The number of labeled cells in each cluster over time and relative to cluster 0 as computed via scRNA-Seq analysis

  • The number of unlabeled cells in each cluster over time and relative to cluster 0 as computed via scRNA-Seq analysis

  • The number of labeled cells in cluster 0 over time as computed via FACS sorting and scRNA-Seq analysis

  • The number of unlabeled cells in cluster 0 over time as computed via FACS sorting and scRNA-Seq analysis

To estimate the parameters, we minimized a cost function of the squared sum of residuals. Each residual is weighted by the squared error, which was computed as pooled variance per time course. We computed the 95% confidence bounds on the parameters’ best fit with the profile likelihood method as in22,69. To compute error bounds on the model, we ran ≈4000 bootstrap simulations, where data is resampled with replacement per time-point, and the cost function is re-minimized on the new dataset. For each simulation, a new parameter vector is found, and a model curve generated. 95% bootstrap confidence bounds are then determined cutting upper and lower 0.025 quantiles per time-point.
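The fitting objective and the bootstrap procedure can be sketched as below. All function names and the data layout are hypothetical, and the quantile rule (simple index truncation) is a simplification chosen for this sketch; in the full procedure the cost function would be re-minimised on each resampled dataset before the model curve is generated.

```python
import random


def weighted_cost(observed, predicted, variance):
    """Sum of squared residuals, each divided by its error variance."""
    return sum((o - p) ** 2 / v
               for o, p, v in zip(observed, predicted, variance))


def resample_per_timepoint(data, rng):
    """Resample replicates with replacement, separately for each time-point.

    data : {time-point: [replicate values]} (layout assumed for this sketch)
    """
    return {t: [rng.choice(values) for _ in values]
            for t, values in data.items()}


def percentile_bounds(curves, lower=0.025, upper=0.975):
    """Per-time-point bounds from a collection of bootstrap model curves."""
    bounds = []
    for values in zip(*curves):
        ordered = sorted(values)
        last = len(ordered) - 1
        bounds.append((ordered[int(lower * last)], ordered[int(upper * last)]))
    return bounds
```

Resampling within each time-point preserves the experimental design, so every bootstrap dataset has the same number of replicates per time-point as the original.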

The bi-phasic model was generated analogously for data split into the recovery phase (days 3-27) and the homeostasis phase (remaining time-points). We observe the vast majority of changes in Tom- cell abundance within the first 12 days, thus we conservatively chose day 27 as a boundary.

To simulate the ablation of any population, the initial condition of the unlabeled cells for the corresponding compartment can be set to 0. To ablate the HSCs, we simultaneously set to 0 the initial condition of all 3 subclusters.

To compute the journey times, we generated the model in the time interval 1-300 days with 1 day steps, assuming that cells are initially only in cluster 0 and with the unlabeled cells initial condition. We then computed the smallest time for which the number of cells in a population reaches one and dubbed that journey time.
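Given a simulated time course for one population, the journey time is simply the first time-point at which the predicted cell number reaches one. A minimal sketch, with a hypothetical function name and input layout:

```python
def journey_time(times, cell_numbers, threshold=1.0):
    """Smallest time at which a population reaches `threshold` cells.

    times        : simulated time-points in days, in increasing order
    cell_numbers : predicted number of cells in the population at each time
    Returns None if the threshold is never reached in the simulated interval.
    """
    for t, n in zip(times, cell_numbers):
        if n >= threshold:
            return t
    return None
```

With 1-day steps over the interval 1-300 days, the result has a resolution of one day.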

Generalized model for testing alternative topologies

As explained in the main text and in the above section of the methods, our model constrains the topology based on the PAGA-predicted edges. In principle, though, one could test any topology, including backwards differentiation and unlikely connections such as HSCs differentiating directly into a terminal compartment. We have thus implemented additional code with which the user can test the performance of any model by setting to 1 the entries of a 22 × 22 table representing the existence of a differentiation rate from any cluster to any cluster.

Model selection for perturbed systems

To infer which parameters may change in non-homeostatic conditions, we developed a model selection-based method. We first fixed the parameters describing the challenged system to our best fit, and then allowed the parameters of specific populations to change. We considered 14 populations whose proliferation and differentiation rates may change, these being the 14 out of 20 populations that have at least one progeny in the ‘challenge’ dataset. Out of these 14 populations, any subgroup may change its parameters or not, for a total of 2^14 = 16,384 models. These models were all fit to the transplantation data. In order to rank these models, we employed the Akaike information criterion, and retained only those models that simultaneously have the lowest possible number of populations whose parameters change in order to fit the data and whose corrected Akaike index is not greater than the best ranking Akaike index plus 10.
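The enumeration and ranking can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, the corrected Akaike index is taken to be the standard small-sample AICc, and the retention rule is read as "among models within 10 units of the best AICc, keep those that free the fewest populations", which is one interpretation of the description above.

```python
from itertools import combinations


def enumerate_models(populations):
    """Yield every subset of populations whose parameters are allowed to change."""
    for size in range(len(populations) + 1):
        for subset in combinations(populations, size):
            yield frozenset(subset)


def aicc(log_likelihood, n_params, n_obs):
    """Standard corrected Akaike information criterion."""
    aic = 2 * n_params - 2 * log_likelihood
    return aic + 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)


def select_models(aicc_by_model, delta=10.0):
    """Keep the most parsimonious models within `delta` of the best AICc."""
    best = min(aicc_by_model.values())
    admissible = [m for m, a in aicc_by_model.items() if a <= best + delta]
    fewest = min(len(m) for m in admissible)
    return [m for m in admissible if len(m) == fewest]
```

For 14 candidate populations, enumerate_models yields 2^14 = 16,384 subsets, including the empty subset in which no population changes its parameters.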

Continuous population model analysis

In order to compute pseudotime-dependent kinetic rates, we relied on the pseudodynamics framework30. Briefly, the compartment model explained in the previous section has a one to one correspondence to the continuous model if the compartment index is treated as a continuous variable, namely the diffusion pseudotime coordinate s, the number of cells is replaced by the cell density over pseudotime and real time u(s, t), and the differentiation and net proliferation rates are replaced by the drift v(s) and the growth rate g(s), respectively. Given these substitutions, the ODE system becomes a PDE system. In addition, the Pseudodynamics framework also introduced an extra parameter D(s) that allows for diffusion of the cells on the pseudotime axis to account for stochasticity in the differentiation process. The 3 kinetics parameters, drift, growth rate and diffusion, are modelled as natural cubic splines with 9 nodes. The nodes boundaries were kept as in the original publication: between 0 and 1 per day for drift and diffusion, and between -5 and 6 per day for the growth rate. To simplify the computation, we estimated such rates independently for 4 different trajectories, which avoids introducing parameters that describe the branching process. The trajectories were chosen based on the affinity to each terminal state as estimated by CellRank (see section ‘Trajectory inference and selection’). For each trajectory, the PDE reads:

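$$\frac{\partial u(s,t)}{\partial t} = \frac{\partial}{\partial s}\Big(D(s)\,\frac{\partial u(s,t)}{\partial s}\Big) - \frac{\partial}{\partial s}\big(v(s)\,u(s,t)\big) + g(s)\,u(s,t)$$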

For the boundaries, we assumed no-flux Robin conditions, as in the original publication. To solve the PDE, we used the non-branching pseudodynamics model as compiled in MATLAB 2017b, with only one difference: we did not enforce differentiation to be 0 at the end of the trajectory which, together with the growth rates taking also negative values, accounts for the fact that the populations in our landscape are all transient and that fully mature cells are not captured by our gating strategy. The model was calibrated to the time-dependent density and total number of labelled cells only. The error was computed as variance among replicates. For each trajectory, at least 240 simulations were launched, with regularization parameters 0, 1, or 10 to penalize big differences in the splines’ nodes. The solution was chosen based on the highest log-likelihood, and the regularization parameter as the highest that visually fits the data well.

Differential expression analysis

For the DE analysis, cells were selected to match the continuous model trajectories. The shapes of differentiation and net proliferation rates were inspected for potential regions of interest and the respective ranges of pseudotime values were chosen. Prior to the analysis, genes with low expression were filtered out: only genes detected in more than 2.5% of cells and with overall mean expression above 0.05 (data normalized with logNormCounts from the scuttle package) were included. To select genes with dynamic expression in the chosen intervals, the fitGAM function followed by startVsEndTest from the TradeSeq package were used. Genes were considered significant if they showed an FDR of at most 0.1 and a log2(fold change) of at least 1. Predicted and smoothed gene expression was obtained using the predictSmooth function from the same package. In heatmaps, genes were clustered with hierarchical clustering using the hclust R function with default settings. Transcription factors were selected based on the gene list established in70; TF groups were established by cutting the tree at the level of 4. Gene enrichment was performed using the GSEAPY interface to the enrichr tool71. The first derivative of the differentiation rate was calculated using interpolation at the same pseudotime points that were used to predict gene expression using the TradeSeq model described above.

Transplantation data analysis

Dong et al. data44 were integrated into the HSPC landscape analogously to the Nestorowa et al.7 data integration described in section ‘Embedding external datasets into the integrated HSPC landscape’. Cells in each HSPC cluster were counted and used as an input into the discrete model prediction. Day 3 data were used as the initial condition and cell abundances per cluster were predicted from day 3 to day 7. The bootstrap confidence bounds were recomputed upon substituting the initial conditions. Given that the experimental data in relevant clusters vastly exceed the model prediction bounds, we concluded that the dynamics of perturbed haematopoiesis are different from normal conditions and suggest increased differentiation.

Key Resource Table.
REAGENT or RESOURCE SOURCE IDENTIFIER
Antibodies
Anti-Mouse CD45.1 (BV605 conjugated, clone A20) Biolegend Cat#110738;
RRID: AB_11204076
Anti-Mouse CD45.2 (PerCP conjugated, clone 104) Biolegend Cat#109826; RRID:
AB_893349
Anti-Mouse CD4 (Biotin conjugated, clone H129.19) BD Biosciences Cat#553649;
RRID: AB_394969
Anti-Mouse CD5 (Biotin conjugated, clone 53-7.3) BD Biosciences Cat#553019;
RRID: AB_394557
Anti-Mouse CD8a (Biotin conjugated, clone 53-6.7) BD Biosciences Cat#553029;
RRID: AB_394567
Anti-Mouse CD11b (Biotin conjugated, clone M1/70) BD Biosciences Cat#553309;
RRID: AB_394773
Anti-Mouse CD45R/B220 (Biotin conjugated, clone RA3-6B2) BD Biosciences Cat#553086;
RRID: AB_394616
Anti-Mouse Gr-1/Ly-6G/C (Biotin conjugated, clone RB6-8C5) BD Biosciences Cat#553125;
RRID: AB_394641
Anti-Mouse Ter119 (Biotin conjugated, clone TER-119) BD Biosciences Cat#553672;
RRID: AB_394985
Anti-Mouse c-Kit/CD117 (BV711 conjugated, clone 2B8) Biolegend Cat#105835;
RRID: AB_2565956
Anti-Mouse Sca-1 (APC-Cy7 conjugated, clone D7) Biolegend Cat#108126;
RRID: AB_10645327
Anti-Mouse CD48 (APC conjugated, clone HM48-1) Biolegend Cat#103411; RRID:
AB_571996
Anti-Mouse CD150 (PE-Cy7 conjugated, clone 12F12.2) Biolegend Cat#115914;
RRID: AB_439797
Anti-Mouse CD19 (Biotin conjugated, clone 1D3) Biolegend Cat#1; 553784;
RRID: AB_395048
Anti-Mouse Sca-1 (PB conjugated, clone D7) Biolegend Cat#108120; RRID:
AB_493273
Anti-Mouse CD16/32 (APC-CY7 conjugated, clone 93) Biolegend Cat#101328; RRID:
AB_2104158
Anti-Mouse CD41 (BV605 conjugated, clone MWReg30) Biolegend Cat#133921; RRID:
AB_2563933
Anti-Mouse CD105 (APC conjugated, clone MJ7/18) Biolegend Cat#120413; RRID:
AB_2277915
Anti-Mouse Ter119 (FITC conjugated, clone TER-119) Biolegend Cat#116206;
RRID: AB_313707
Anti-Mouse CD45R/B220 (APC conjugated, clone RA3-6B2) Biolegend Cat#103212;
RRID: AB_312997
Anti-Mouse CD19 (APC-Cy7 conjugated, clone 6D5) Biolegend Cat#115529;
RRID: AB_830707
Anti-Mouse CD11b (PB conjugated, clone M1/70) Biolegend Cat#101224;
RRID: AB_755986
Anti-Mouse Gr-1/Ly-6G/C (PE-Cy7 conjugated, clone RB6-8C5) Biolegend Cat#108416;
RRID: AB_313381
Anti-Mouse Ter119 (PE-Cy5 conjugated, clone TER-119) Biolegend Cat#116210;
RRID: AB_313711
Anti-Mouse CD8a (APC conjugated, clone 53-6.7) Biolegend Cat#100712;
RRID: AB_312751
Anti-Mouse CD4 (APC conjugated, clone GK1.5) Biolegend Cat#100411;
RRID: AB_312696
Anti-Mouse CD8a (APC-CY7 conjugated, clone YTS156.7.7) Biolegend Cat#126620; RRID:
AB_2563951
Anti-Mouse CD25 (PB conjugated, clone PC61) Biolegend Cat#102022;
RRID:AB_493643
Anti-Mouse CD44 (PE-CY7 conjugated, clone IM7) Biolegend Cat#103030;
RRID: AB_830787
Fc Block (anti-mouse CD16/32, clone 93) Biolegend Cat#101320;
RRID: AB_1574975
Streptavidin (PerCP conjugated) Biolegend Cat#405213
Streptavidin (Pacific Blue conjugated) ThermoFisher Scientific Cat# S11222
DAPI BD Biosciences Cat#564907;
RRID: AB_2869624
Mouse hematopoietic progenitor cell isolation cocktail Stem Cell Technologies 19856
CD48-APC ThermoFischer 17-0481-82,
RRID:AB_469408
c-Kit-APC/Cy7 Biolegend 105826,
RRID:AB_1626278
Sca1-BV421 Biolegend 108133,
RRID:AB_2650926
CD150-PE/Cy7 Biolegend 115914,
RRID:AB_439797
Streptavidin-BV510 Biolegend 405234
Chemicals, peptides, and recombinant proteins
Tamoxifen Sigma T5648; CAS: 10540-29-1
Corn Oil Sigma C8267; CAS: 8001-30-7
DAPI BD Pharmigen 564907
Ammonium Chloride Stem Cell Technologies 07800
SUPERase-In RNase Inhibitor ThermoFisher AM2694
dNTP mix ThermoFisher 10319879
ERCC RNA Spike-In Mix ThermoFisher 4456740
Maxima H minus Reverse Transcriptase ThermoFisher EP0753
Terra PCR Direct Polymerase Mix Takara 639270
Agencourt AMPure XP beads Beckman Coulter A63881
Nextera XT DNA sample preparation kit 96 samples Illumina FC-131-1096
Triton X-100 solution Sigma 93443
PEG 8000 solution Sigma P1458
Phusion polymerase ThermoFischer F530L
Critical commercial assays
10x Genomics Single Cell 3’ v3 10X Genomics PN-1000268
Deposited data
Sequencing data This paper GEO: GSE207412
Pre-processed input data This paper https://doi.org/10.5281/zenodo.10156542
Extended data figures This paper https://doi.org/10.5281/zenodo.10156542
Extended data tables This paper https://doi.org/10.5281/zenodo.10156542
Experimental models: Organisms/strains
Mouse: Hoxb5mKO2 This paper N/A
Mouse: Hoxb5CreERT2 This paper N/A
Oligonucleotides
See Table S7 for list of oligonucleotides and ssDNA This paper N/A
TSO 5′-
AAGCAGTGGTATCAACGCAGAGTACATrGrG+G-3’
IDT NA
Oligo-dT30VN 5′–
AAGCAGTGGTATCAACGCAGAGTAC(T30)VN-3’
IDT NA
ISPCR oligo 5′-AAGCAGTGGTATCAACGCAGAGT-3’ IDT NA
Nextera XT 96-Index kit, 384 samples Illumina FC-131-1002
Recombinant DNA
Software and algorithms
FlowJo v10 FlowJo,Tree Star Inc. N/A
GraphPad Prism 6 software GraphPad Software, Inc. N/A
Analysis code This paper https://doi.org/10.5281/zenodo.10156542 and
https://github.com/Iwo-K/HSPCdynamics2022
Singularity container (containing all scRNA-Seq analysis software) This paper https://doi.org/10.5281/zenodo.10156542
Cellproject https://github.com/Iwo-K/cellproject NA
Cellranger v3.1.0 10X Genomics NA

Supplementary Material

Acknowledgements

The authors thank Reiner Schulte, Chiara Cossetti and Gabriela Grondys-Kotarba from the Cambridge Institute for Medical Research Flow Cytometry Core facility for their assistance with cell sorting. We would also like to thank Katarzyna Kania and others at the Cancer Research UK Cambridge Institute Genomics Core Facility for generating the 10x Genomics libraries and performing high-throughput sequencing. The authors are also grateful to all staff of the Biological Services Unit at Queen Mary University of London for their technical support. Work in the Kranc Laboratory is supported by Cancer Research UK (awards C29967/A14633 and C29967/A26787), The Barts Charity, Blood Cancer UK, and the Kay Kendall Leukaemia Fund. The O’Carroll laboratory is supported by the Wellcome Trust Investigator Award (106144), the Wellcome Centre for Cell Biology (203149) and a Wellcome multi-user equipment grant (108504). Work in the Göttgens laboratory is supported by Wellcome (206328/Z/17/Z and 203151/Z/16/Z), Blood Cancer UK (18002), Cancer Research UK (C1163/A21762) and UKRI Medical Research Council (MC_PC_17230). For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

Footnotes

Author contributions

Part 1 - Hoxb5-mKO2 and Hoxb5-Tom model: conceptualisation, generation and characterisation

Conceptualization K.R.K. and D.O.C.; Methodology J.C., F.S., N.B., P.N.M., K.R.K. and D.O.C.; Software M.B.; Validation J.C., F.S., N.B., P.N.M., L.A., H.L., K.R.K. and D.O.C.; Formal Analysis I.K., J.C., M.B., F.S., N.B., K.R.K., D.O.C. and B.G.; Investigation J.C., F.S., N.B., P.N.M., L.A. and H.L.; Resources J.C., F.S., N.B., P.N.M., L.A., H.L., K.R.K. and D.O.C.; Data Curation J.C., F.S., N.B., L.A., H.L., K.R.K. and D.O.C.; Writing - Original Draft I.K.; Writing - Review & Editing I.K., J.C., M.B., K.R.K., D.O.C. and B.G.; Visualisation I.K., J.C., M.B., F.S., N.B., K.R.K., D.O.C. and B.G.; Supervision H.L., K.R.K., D.O.C. and B.G.; Project Administration J.C., F.S., N.B., P.N.M., L.A., H.L., K.R.K., D.O.C. and B.G.; Funding Acquisition K.R.K. and D.O.C.

Part 2 - Single-cell transcriptomics and dynamics modelling: conceptualisation, data generation and analysis

Conceptualization I.K., M.B. and B.G.; Methodology I.K., M.B. and B.G.; Software I.K. and M.B.; Validation I.K., M.B. and B.G.; Formal Analysis I.K., M.B. and B.G.; Investigation I.K., J.C., N.B., M.L.R.H. and S.J.K.; Resources I.K., J.C., M.B., K.R.K., D.O.C. and B.G.; Data Curation I.K., M.B. and B.G.; Writing - Original Draft I.K. and M.B.; Writing - Review & Editing I.K., M.B., K.R.K., D.O.C. and B.G.; Visualisation I.K., M.B. and B.G.; Supervision I.K. and B.G.; Project Administration I.K., J.C., M.B. and B.G.; Funding Acquisition K.R.K. and B.G.

Conflicts of interest

N.B. is now an employee of AstraZeneca. I.K. is now an employee of Xap Therapeutics. The other authors declare that they have no conflict of interest.

Data and code availability

https://gottgens-lab.stemcells.cam.ac.uk/bgweb2/HSPC_dyn2022/

External online data will be maintained long-term using: the GEO repository (sequencing data), Mendeley Data repository (code, pre-processed input data and software environment) and a dedicated server maintained by the University of Cambridge Stem Cell Institute (interactive visualization).

