Abstract
In a recent clone-tracking experiment, millions of uniquely tagged hematopoietic stem cells (HSCs) and progenitor cells were autologously transplanted into rhesus macaques, and peripheral blood containing thousands of tags was sampled and sequenced over 14 years to quantify the abundance of hundreds to thousands of tags or “clones.” Two major puzzles have been observed in the data: consistent differences among clone populations and massive temporal fluctuations of those populations. The large sample-to-sample variability can lead clones to occasionally go “extinct” but “resurrect” themselves in subsequent samples. Although heterogeneity in HSC differentiation rates (potentially due to tagging), random sampling of the animals’ blood, and cellular demographic stochasticity might be invoked to explain these features, we show that random sampling cannot explain the magnitude of the temporal fluctuations. Moreover, we show through simpler neutral mechanistic and statistical models of hematopoiesis of tagged cells that a broad distribution in clone sizes can arise from stochastic HSC self-renewal instead of tag-induced heterogeneity. The very large clone population fluctuations that often lead to extinctions and resurrections can be naturally explained by a generation-limited proliferation constraint on the progenitor cells. This constraint leads to bursty cell population dynamics underlying the large temporal fluctuations. We analyzed experimental clone abundance data using a new statistic that counts clonal disappearances and provided least-squares estimates of two key parameters in our model, the total HSC differentiation rate and the maximum number of progenitor-cell divisions.
Author summary
Hematopoiesis of virally tagged cells in rhesus macaques is analyzed in the context of a mechanistic and statistical model. We find that the clone size distribution and the temporal variability in the abundance of each clone (viral tag) in peripheral blood are consistent with (i) stochastic HSC self-renewal during bone marrow repair, (ii) clonal aging that restricts the number of generations of progenitor cells, and (iii) infrequent and small-size samples. By fitting data, we infer two key parameters that control the level of fluctuations of clone sizes in our model: the total HSC differentiation rate and the maximum proliferation capacity of progenitor cells. Our analysis provides insight into the mechanisms of hematopoiesis and a framework to guide future multiclone barcoding/lineage tracking measurements.
Introduction
Hematopoiesis is a process by which hematopoietic stem cells (HSCs) produce all the mature blood in an animal through a series of proliferating and differentiating divisions [1]. Maintenance of balanced hematopoietic output is critical for an organism’s survival and determines its response to disease and clinical procedures such as bone marrow transplantation [2–5]. How the relatively small HSC population generates more than 10¹¹ cells of multiple types daily over an organism’s lifetime has yet to be fully understood. HSCs are defined primarily by their function but are often quiescent [6]. In vivo, it is hard to track the dynamics of individual HSCs, while HSCs in vitro do not typically proliferate or differentiate as efficiently. Therefore, the dynamics of HSCs can be inferred only from analyses of populations of progenitors and differentiated blood cells [7] and it is useful to investigate HSC dynamics through mathematical modeling and simulations [8–10].
While most studies model population-level HSC behavior [5, 11, 12], certain aspects of HSCs, such as individual-level heterogeneity in repopulation and differentiation dynamics, have to be studied on a single-cell or clonal level [13]. Single HSC transplant mouse data [14] and clonal tracking of HSCs [15, 16] in mice have shed some light on repopulation dynamics under homeostasis and after bone marrow transplantation [5, 17, 18]. However, murine studies usually involve only one or a few clones. How each individual HSC contributes to the blood production process over long times in much larger human and non-human primates is less clear and more difficult to study. Also, unlike in mice, there is no way to isolate and mark HSC populations in humans [19].
Recently, results of a long-term clonal tracking of hematopoiesis in normal-state rhesus macaques have been made available [13, 20]. The experiment extracted and uniquely “labelled” hematopoietic stem and progenitor cells (HSPCs) from four rhesus macaques with viral tags that also carry an enhanced green fluorescent protein gene. After autologous transplantation, if any of the tagged HSPCs divides and differentiates, its progeny will inherit its unique tag and ultimately appear in the peripheral blood. Blood samples were drawn every few months over 4 − 14 years (depending on the animal) and the sampled cells were counted and sequenced. Of the ∼10⁶ − 10⁷ unique HSPC tags transplanted, ∼10² − 10³ clones were detected in the sampled peripheral blood. In the original paper describing the clonal tracking experiment, Kim et al. [13] observed “A small fraction (4 − 10%) of tagged clones predominately contribute to a large fraction (25 − 71%) of total blood repopulation.” They described the fluctuations of tags that appeared in each sample as “waves of clones”, but did not address why some clones can disappear at certain times and reappear in a later sample.
In this study, we seek to better understand the observed clone size distributions and the large temporal variability in clonal populations. To address these observations, we ask: Is heterogeneity in HSCs necessary for peripheral blood clone size heterogeneity, or can a neutral model explain clone size differences? Are clones that disappear and reappear from sample to sample simply missed by random blood sampling, or do other mechanisms of temporal variability need to be invoked?
Unlike other previous models that describe the evolution of lineages of different cell types and their regulation [8–10, 21], we will consider simpler neutral models that describe the dynamics of specifically granulocyte populations carrying different tags. Of central interest is the competition among the thousands of clones under a neutral environment that gives rise to fluctuations, extinctions, and resurrections in individual clone populations. Even when considering only one cell type, realistic mathematical models may need to include complex multilevel biochemical feedback mechanisms of regulation [8, 22–27]. Many mechanisms may contribute to temporal fluctuations, including extrinsic noise and heterogeneity of HSCs, progenitors, or mature granulocytes. Large time gaps between samplings (5 − 11 months) and small sample sizes also add to the uncertainty of the underlying dynamics. Trying to infer all possible mechanisms and associated parameters from the experimental data would essentially be an overfitting problem. In order to feasibly compare with experimental data, our modeling philosophy will be to recapitulate these complexities into simple, effective models and infer parameters that subsume some of these regulatory effects. This approach and level of modeling are similar to those taken by e.g., Yang, Sun, and Komarova [28, 29].
After careful consideration of a number of key physiological mechanisms, we hypothesize that stochastic HSC self-renewal, generation-limited progenitor cell proliferation, and small-size sampling frequency statistics provide the simplest reasonable explanation for the observed clonal size variability and large temporal fluctuations. HSCs that are generated from self-renewal of the founder population share the same tag as their founder HSC. Thus, during intense self-renewal after myeloablative treatment and HSPC transplantation, each originally transplanted HSC begets a clonal HSC subpopulation. Subsequently, heterogeneous clone sizes are stochastically generated even though each tag was initially represented by only a single cell. These expanded HSC clones then go on to repopulate the clones in the progenitor and mature blood population, which are also distinguishable by their corresponding tags.
Relative to HSCs, progenitor cells have limited proliferative potential that can explain the apparent extinctions of clones in blood samples. This limited proliferation potential can be thought of as an “aging” process. Different types of aging, including organism aging [23, 30, 31], replicative senescence of stem cells [32], and generation-dependent birth and death rates, have been summarized by Edelstein et al. [33]. Here, the clonal “aging” mechanism we invoke imposes a limit to the number of generations that can descend from each newly created (from HSC differentiation) “zeroth generation” progenitor cell. Possible sources of such a limit include differentiation-induced loss of division potential [34] and telomere shortening (as in the Hayflick limit) [35–37]. Mathematically, genealogical aging can be described by tracking cell populations within each generation. After a certain number of generations, progenitor cells of the final generation stop proliferating and can only differentiate into circulating mature cells or die.
In the following sections, we first present the mathematical equations and corresponding solutions (whenever possible) of a model that incorporates the above processes. We then develop a new statistical measure that tracks the numbers of absences of clones across the samples. Measured clone abundances of animal RQ5427 are statistically analyzed within our mechanistic model to infer estimates for key model parameters. The data and corresponding statistical analyses for animals 2RC003 and RQ3570 are also provided in the Results section.
Materials and methods
Below, we describe available clonal abundance data, mechanistic models, and a statistical model we will use for parameter inference.
Clone abundance data
In the experiments of Kim et al. [13], cells in samples of peripheral blood were sequenced and counted to extract Ŝ+(tj), the total number of EGFP+ tagged cells in sample 1 ≤ j ≤ J taken at time tj. After PCR amplification and sequencing, f̂i(tj), the relative abundance of the ith tag among all sampled, tagged cells, is also quantified. The “^” notation will henceforth indicate experimentally measured quantities.
Within mature peripheral blood, lymphocytes such as T cells and B cells proliferate or transform in response to unpredictable but clone-specific immune signals [38]. They also vary greatly in their lifespans, ranging from days in the case of regular T and B cells to years in the case of memory B cells. On the other hand, mature granulocytes do not proliferate in peripheral blood and have relatively short life spans [7]. Granulocyte dynamics can thus be analyzed with fewer confounding factors [11]. Thus, in this paper, we restrict our analysis to granulocyte repopulation and extract all variables, including Ŝ+(tj) and f̂i(tj) described above, that are associated exclusively with granulocyte populations.
In Fig 1(a), we plot the total numbers of sampled granulocytes from one of the macaques, RQ5427. The subpopulation of EGFP+ granulocytes and the subset of EGFP+ granulocytes that were extracted for PCR amplification and analysis are also plotted. Data for two other animals, 2RC003 and RQ3570, are qualitatively similar. Blood samples from a fourth animal, 95E132, were not separated into granulocytes and peripheral blood mononuclear cells (PBMCs) before sequencing. Thus, clonal abundances for granulocytes are not available from 95E132, leaving only three animals for which we can analyze clonal abundances of granulocytes. For more specifics on the data, see supplemental files of the original experimental paper [13]. As shown in Fig 1(b), not only are the clone abundances heterogeneous, but individual clone abundances vary across samples taken at different times. The variation is so large that many clones can go extinct and reappear from one sample to another, as shown in Fig 1(c). Since large numbers of progenitor and mature cells are involved in blood production, the observed clone size fluctuations cannot arise from intrinsic demographic stochasticity of progenitor- and mature-cell birth and death. Moreover, we will show later in the Results section that random sampling alone cannot explain the observed clonal variances and mechanisms that involve other sources of variation are required.
Fig 1. Blood sample data from animal RQ5427 [13].
(a) The total numbers of sampled granulocytes (blue triangles), EGFP+ granulocytes (green squares), and the subset of EGFP+ granulocytes that were properly tagged, quantifiable, and extracted for PCR amplification and analysis (black circles). This last population, defined by Ŝ+(tj), is used to normalize clone cell counts. We excluded the first sample at month 2 in our subsequent analysis so, for example, the sample at month 56 is labeled the 7th sample. There were 536 clones detected at least once across the eight samples taken over 67 months, comprising an average fraction 0.052 of all granulocytes. The abundances of granulocyte clones are shown in (b). The relative abundance of granulocytes from the ith clone measured at month tj is indicated by the vertical distance between two adjacent curves. The relative abundances of individual clones feature large fluctuations over time. “Extinctions” followed by subsequent “resurrections” were repeatedly seen in certain clones, as indicated by the black circles in (b) and in the inset (c).
Nomenclature and lumped mechanistic model
Fig 2 depicts our neutral model of hematopoiesis which is composed of five successive stages, or compartments, describing the initial single-cell tagged HSC clonal populations immediately after transplantation (Compartment 0), the heterogeneous HSC clonal populations after a short period of intense self-renewal (Compartment 1), the transit-amplifying progenitor cell compartment (Compartment 2), the peripheral blood pool (Compartment 3), and the sampled peripheral blood (Compartment 4), respectively. Each distinct color or shape in Fig 2 represents a distinct clone of cells with the same tag.
Fig 2. Schematic of a neutral multi-stage or multi-compartment hematopoiesis model.
BM and PB refer to bone marrow and peripheral blood, respectively. Cells of the same clone have the same color. White circles represent untagged cells which were not counted in the analysis. Stages 0, 1, and 2 describe cell dynamics that occur mainly in the bone marrow. Stage 1 describes HSC clones (Ch = 6 in this example) after self-renewal that starts shortly after transplantation with rate rh. After self-renewal, the relatively stable HSC population (H+ = 20 in this example) shifts its emphasis to differentiation (with per-cell differentiation rate α). Larger clones in Stage 1 (e.g., the circular blue clone, hblue = 4) will have a larger total differentiation rate αhblue while smaller clones (e.g., the red hexagonal clone, hred = 1) will have smaller αhred. The processes of progenitor-cell proliferation (with rate rn) and maturation (with rate ω) in Compartments 2 and 3 are considered deterministic because of the large numbers of cells involved. The darker-colored symbols correspond to cells of later generations. For illustration, the maximum number of progenitor-cell generations allowed is taken to be L = 4. Compartment 4 represents a small sampled fraction (ε(tj) ≈ 2.8 × 10⁻⁵ − 2 × 10⁻⁴) of Compartment 3, the entire peripheral blood of the animal. In the example pictured above, Cs = 4. Such small samples can lead to considerable sampling noise but are not the key driver of sample-to-sample variability.
In each compartment, relevant parameters include (using Compartment 1 as example): the total cell count H(t), the untagged cell count H−(t), the tagged cell count H+(t), the total number of tagged clones Ch(t), and the number hi(t) of HSCs carrying the ith tag. These quantities are related through H(t) = H−(t) + H+(t) and H+(t) = Σi hi(t), where the sum runs over the tagged clones i = 1, …, Ch(t).
In the progenitor pool, the total number of cells and the number with tag i are denoted N(t) and ni(t), respectively. Further resolving these progenitor populations into those of the ℓth generation, we define N(ℓ)(t) and ni(ℓ)(t). In the mature granulocyte pool, the total granulocyte population and that with tag i are labelled M(t) and mi(t). In the sampled blood compartment, we use S(tj), S+(tj), si(tj), and Cs(tj) to denote, at time tj, the total number of sampled cells, the number of tagged sampled cells, the number of sampled cells of clone i, and the total number of clones in the sample, respectively. In Compartment 4, we further define fi(tj) = si(tj)/S+(tj) to denote the relative abundance of the ith clone among all tagged clones.
By lumping together all clones (tagged and untagged) in each compartment, we can readily model the dynamics of total populations in each pool. After myeloablative treatment, the number of BM cells, including HSCs, is severely reduced. Repopulation of autologously transplanted HSCs occurs quickly via self-renewal until their total number H(t) reaches a steady state. The repopulation of the entire HSC population and the subsequent entire progenitor and mature cell populations may be described via simple deterministic mass-action growth laws
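$$\frac{dH(t)}{dt} = \left[r_h(H(t)) - \mu_h\right] H(t) \tag{1}$$

$$\frac{dN^{(\ell)}(t)}{dt} = \begin{cases} \alpha H(t) - \left(r_n^{(0)} + \mu_n\right) N^{(0)}(t), & \ell = 0, \\ 2 r_n^{(\ell-1)} N^{(\ell-1)}(t) - \left(r_n^{(\ell)} + \mu_n\right) N^{(\ell)}(t), & 1 \le \ell \le L-1, \\ 2 r_n^{(L-1)} N^{(L-1)}(t) - \left(\omega + \mu_n^{(L)}\right) N^{(L)}(t), & \ell = L, \end{cases} \tag{2}$$

$$\frac{dM(t)}{dt} = \omega N^{(L)}(t) - \mu_m M(t) \tag{3}$$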
HSC self-renewal is a regulated process involving signaling and feedback [22–24, 39, 40] and rh may be a complicated function of many factors; however, we will subsume this complexity into a simple population-dependent logistic growth law rh(H(t)) ≡ ph(1 − H(t)/Kh) and assume a constant death rate μh. Alternatively, other studies have employed Hill-type growth functions [12, 28].
We assume the per-cell HSC differentiation rate α is independent of the tag and that differentiation is predominantly an asymmetric process by which an HSC divides into one identical HSC and one progenitor cell that commits to differentiation into granulocytes. An initial generation-zero progenitor cell further proliferates with rate rn(0), contributing to the overall progenitor-cell population. Subsequent generation-ℓ progenitors, with population N(ℓ), proliferate with rate rn(ℓ) until a maximum number of generations L is reached. By keeping track of the generation index ℓ of any progenitor cell, we limit the proliferation potential associated with an HSC differentiation event by requiring any progenitor cell of the final Lth generation to terminally differentiate into peripheral blood cells with rate ω or to die with rate μn(L). For simplicity, we neglect any other source of regulation and assume α, rn(ℓ) ≡ rn, and ω are all unregulated constants.
Our model analysis and data fitting will be performed using clone abundances sampled a few months after transplantation under the assumption that granulopoiesis in the animals has reached steady state [4] after initial intensive HSC self-renewal. Steady-state solutions of Eqs (1), (2) and (3) are denoted by Hss, Nss(ℓ), and Mss. The first constraint our model provides relates these steady-state total populations through
$$\mu_m M_{\rm ss} = \omega N^{(L)}_{\rm ss} = A_{\rm ss}\,\beta \tag{4}$$
where we have defined
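$$A_{\rm ss} \equiv \alpha H_{\rm ss}, \qquad \beta \equiv \frac{\omega N^{(L)}_{\rm ss}}{\alpha H_{\rm ss}} = \left(\frac{2 r_n}{r_n + \mu_n}\right)^{L} \frac{\omega}{\omega + \mu_n^{(L)}} \tag{5}$$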
as the total rate of HSC differentiation and the average number of granulocytes generated per HSC differentiation, respectively. These constraints also hold for the virally tagged, EGFP+ subset (about 5% − 10%) of HSCs, e.g., A+ss = αH+ss and μm M+ss = A+ss β. Since M+ss is inferred from the experiment, Eq (4) places a constraint between the total differentiation rate A+ss of labeled HSCs and the typical per-differentiation amplification number β. This steady-state constraint will eventually be combined with statistics of the fluctuating clone abundance data to infer estimates for the underlying model parameters.
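The steady-state bookkeeping can be checked numerically. The following is a minimal sketch, assuming the lumped model is a generation-structured mass-action chain with constant rates (each generation-ℓ cell divides with rate rn into two generation-(ℓ+1) cells, and generation-L cells mature with rate ω or die with rate μn(L)). The function names and the parameter values in the usage lines are illustrative assumptions, not values fitted in this paper.

```python
import math

def amplification(r_n, mu_n, mu_nL, omega, L):
    """beta: mature cells produced per HSC differentiation (assumed closed form)."""
    return (2.0 * r_n / (r_n + mu_n)) ** L * omega / (omega + mu_nL)

def steady_state(alpha, H_ss, r_n, mu_n, mu_nL, omega, mu_m, L):
    """Solve the generation chain at steady state, one generation at a time (L >= 1)."""
    A_ss = alpha * H_ss                        # total HSC differentiation rate
    N = [A_ss / (r_n + mu_n)]                  # generation 0: influx / outflux rate
    for _ in range(1, L):                      # generations 1 .. L-1
        N.append(2.0 * r_n * N[-1] / (r_n + mu_n))
    N.append(2.0 * r_n * N[-1] / (omega + mu_nL))  # terminal generation L
    M_ss = omega * N[-1] / mu_m                # mature granulocyte pool
    return A_ss, N, M_ss

# Illustrative (hypothetical) values chosen inside the ranges of Table 1.
A_ss, N, M_ss = steady_state(alpha=0.01, H_ss=1.1e4, r_n=2.5, mu_n=0.0,
                             mu_nL=0.0, omega=0.16, mu_m=1.0, L=20)
beta = amplification(2.5, 0.0, 0.0, 0.16, 20)
print(A_ss, beta, M_ss)
```

Solving the chain generation by generation, rather than only evaluating the closed form, makes it easy to confirm that the mature-cell death flux μm Mss balances the production flux Ass β.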
Clone-resolved mechanistic model
Although the lumped model above provides important constraints among the steady-state populations within each compartment, the clone-tracking experiment keeps track of the populations of sampled granulocytes that arise from “founder” HSCs that carry the same tag. Thus, we need to resolve the lumped model into the clonal subpopulations described by hi, ni(ℓ), and mi.
Even though the total HSC populations H(t) and H±(t) are large, the total number of clones Ch ≫ 1 in compartment 1 is also large, and the number of cells with any tag (the size of any clone) can be small. The population of cells with any specific tag i is thus subject to large demographic fluctuations. Thus, we model the stochastic population of HSCs of any tag using a master equation for P(h, t), the probability that at time t the number of HSCs of any clone is h:
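$$\frac{\partial P(h,t)}{\partial t} = (h-1)\, r_h(H)\, P(h-1,t) + (h+1)\, \mu_h\, P(h+1,t) - h \left[r_h(H) + \mu_h\right] P(h,t) \tag{6}$$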
Recall that immediately after transplantation, each HSC carries a distinct tag before self-renewal (hi(0) = 1), leading to the initial condition P(h, 0) = 1(h, 1), where the indicator function 1(x, y) = 1 if and only if x = y. Because h = 0 is an absorbing boundary, clones start to disappear at long times, resulting in a decrease in the total number Ch(t) of HSC clones. Before this “coarsening” process significantly depletes the entire population, each clone constitutes a small subpopulation among all EGFP+ cells, h(t) ≪ H(t), and the stochastic dynamics of the population h of any clone can be approximated by the solution to Eq (6) with the logistic self-renewal rate rh(H) ≡ ph(1 − H/Kh) replaced by rh(t) = ph(1 − H(t)/Kh). Hence, evolution of each HSC clone follows a generalized birth-death process with time-dependent birth rate and constant death rate. We show in Appendix A in S1 Appendix that for H ≫ 1 the solution to Eq (6) can be written in the form [41]
$$P(h,t) = \left(1 - P(0,t)\right)\left(1 - \lambda(t)\right)\lambda(t)^{h-1}, \qquad h \ge 1, \tag{7}$$
where 0 ≤ λ(t) < 1 depends on rh(t) and μh. Here, λ(t) determines the “broadness” (level of clone size heterogeneity) of the clone size distribution. For the relevant initial condition of unique tags at t = 0, λ(0) = 0 and λ(t → ∞) → 1. When λ(t) is small, the distribution is weighted towards small h. For λ(t) = 0, every clone has size h = 1, which was the limit used in Goyal et al. [4], who assumed no HSC self-renewal after transplantation. In the limit λ(t) → 1, the distribution becomes flat and a clone is equally likely to be of any size 1 ≤ h ≤ H.
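To illustrate how the shape factor λ controls clone-size heterogeneity, the sketch below draws surviving clone sizes from a geometric law P(h) = (1 − λ)λ^(h−1) for h ≥ 1, which is the standard form for a linear birth-death process started from a single cell and conditioned on survival. It is an assumption that this is the distribution meant here; the cap at H is ignored, and the function name and λ values are hypothetical.

```python
import random

def sample_clone_sizes(num_clones, lam, seed=0):
    """Draw HSC clone sizes h >= 1 with P(h) = (1 - lam) * lam**(h - 1).

    lam = 0 reproduces the no-self-renewal limit (every clone has size 1);
    lam close to 1 gives a very broad distribution. Requires 0 <= lam < 1.
    """
    rng = random.Random(seed)
    sizes = []
    for _ in range(num_clones):
        h = 1
        # Each additional cell is added with probability lam (geometric law).
        while rng.random() < lam:
            h += 1
        sizes.append(h)
    return sizes

for lam in (0.0, 0.5, 0.99):
    sizes = sample_clone_sizes(1000, lam)
    print(lam, sum(sizes) / len(sizes), max(sizes))
```

The mean clone size under this assumed law is 1/(1 − λ), so a few large clones dominate the tagged HSC pool as λ approaches 1 even though every clone follows the same neutral dynamics.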
To further resolve the progenitor population into cells with distinct tags, we define n(ℓ)(t) as the number of generation-ℓ progenitor cells carrying any one of the viral tags. The total number of progenitor cells with a specific tag is n(t) = Σℓ n(ℓ)(t), summed over generations ℓ = 0, 1, …, L. Since the sizes hi of individual clones may be small, differentiation of HSCs within each clone may be rare. However, since the size of each tagged progenitor clone quickly becomes large (n(t) ≫ 1), we model the dynamics of n(ℓ)(t) using deterministic mass-action growth laws:
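$$\frac{dn^{(\ell)}(t)}{dt} = \begin{cases} \text{Poisson}(\alpha h(t)) - \left(r_n + \mu_n\right) n^{(0)}(t), & \ell = 0, \\ 2 r_n n^{(\ell-1)}(t) - \left(r_n + \mu_n\right) n^{(\ell)}(t), & 1 \le \ell \le L-1, \\ 2 r_n n^{(L-1)}(t) - \left(\omega + \mu_n^{(L)}\right) n^{(L)}(t), & \ell = L. \end{cases} \tag{8}$$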
Our model is neutral (all clones have the same birth, death, and maturation rates), so these equations are identical to Eq (2). However, since creation of the zeroth-generation subpopulation n(0)(t) derives only from differentiation of HSCs of the corresponding clone, which has a relatively small population h(t), we invoke a Poisson process with rate αh(t) to describe stochastic “injection” events associated with asymmetric differentiation of HSCs of said clone. Each discrete differentiation event leads to a temporal burst in n(ℓ)(t).
Finally, the dynamics of the population m(t) of any granulocyte clone in the peripheral blood are described by an equation analogous to Eq (3):
$$\frac{dm(t)}{dt} = \omega\, n^{(L)}(t) - \mu_m m(t) \tag{9}$$
where we have assumed that only the generation-L progenitor cells undergo terminal differentiation with rate ω. An alternative model allows progenitor cells of earlier generations (ℓ < L) to also differentiate and circulate but does not give rise to qualitatively different results (See Appendix B in S1 Appendix).
To study the dynamics of the burst in nb(ℓ)(t) immediately following a single, isolated asymmetric HSC differentiation event at t = 0, we set the initial condition nb(ℓ)(0) = 1(ℓ, 0), remove the Poisson(αh(t)) term in Eq (8), and find
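$$n_b^{(\ell)}(t) = \frac{(2 r_n t)^{\ell}}{\ell!}\, e^{-(r_n + \mu_n) t}, \quad 0 \le \ell \le L-1; \qquad \frac{dn_b^{(L)}(t)}{dt} = 2 r_n n_b^{(L-1)}(t) - \left(\omega + \mu_n^{(L)}\right) n_b^{(L)}(t). \tag{10}$$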
Bounded analytic solutions to nb(L)(t) involving the lower incomplete gamma function can be found. Upon using the solution in Eq (9), the mature blood population within a clone associated with a single HSC differentiation event is described by
$$m_b(t) = \omega \int_0^t n_b^{(L)}(\tau)\, e^{-\mu_m (t - \tau)}\, d\tau \tag{11}$$
The populations associated with a single HSC differentiation event, nb(ℓ)(t) and mb(t), are plotted below in Fig 3 of the Results section. Then, the total number mi(t) of mature granulocytes with the ith tag at time t is obtained by summing up all mb(t − τk) bursts initiated by HSC differentiations at separate times τk ≤ t with the ith tag.
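A single burst can be reproduced numerically. The sketch below assumes constant rates, uses the closed form (2 rn t)^ℓ e^(−(rn+μn)t)/ℓ! for the generation-(L−1) population after one differentiation event, and integrates the terminal-generation and mature-cell equations with a forward-Euler step. The function names, the step size, the choice μn = μn(L) = 0, and the sampling fraction ε used for the threshold are illustrative assumptions.

```python
import math

def n_prev(t, L, r_n, mu_n):
    """Closed-form generation-(L-1) population after one differentiation at t = 0."""
    k = L - 1
    if t == 0.0:
        return 1.0 if k == 0 else 0.0
    # Work in log space so large L does not overflow the power or the factorial.
    log_val = k * math.log(2.0 * r_n * t) - math.lgamma(k + 1) - (r_n + mu_n) * t
    return math.exp(log_val)

def burst(L, r_n, mu_n, mu_nL, omega, mu_m, t_max, dt):
    """Forward-Euler integration of the terminal generation and mature cells (L >= 1)."""
    times, m_b = [], []
    nL, m = 0.0, 0.0
    for k in range(int(round(t_max / dt))):
        t = k * dt
        times.append(t)
        m_b.append(m)
        d_nL = 2.0 * r_n * n_prev(t, L, r_n, mu_n) - (omega + mu_nL) * nL
        d_m = omega * nL - mu_m * m
        nL += dt * d_nL
        m += dt * d_m
    return times, m_b

# Illustrative run; mu_n = mu_nL = 0 and eps are assumptions, not fitted values.
dt = 0.002
times, m_b = burst(L=24, r_n=2.5, mu_n=0.0, mu_nL=0.0, omega=0.16,
                   mu_m=1.0, t_max=60.0, dt=dt)
eps = 1e-4                                   # hypothetical sampling fraction
width = dt * sum(1 for m in m_b if m > 1.0 / eps)
print(max(m_b), width)
```

Because every mature cell eventually dies, the time integral of μm mb(t) equals the per-event amplification, which gives a simple consistency check on the integration.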
Fig 3.
(a) A burst of cells is triggered by a single HSC differentiation event at time t = 0. A plot of representative solutions to Eqs (10) and (11) for rn = 2.5, L = 24, μm = 1, and ω = 0.16, with the progenitor death rates μn and μn(L) taken from Table 1. Curves of different colors represent nb(ℓ)(t), the progenitor cell population within each generation ℓ = 0, 1, 2, …, L, and mb(t), the number of mature granulocytes associated with the differentiation burst. All populations rise and fall. (b) Realizations of peripheral blood (PB) populations in a single clone arising from multiple successive differentiation events. The fluctuating populations are generated by adding together mb(t) associated with each differentiation event. Time series resulting from small (hi/H+ = 0.0003) and large (hi/H+ = 0.03) HSC clones are shown. Small clones are characterized by separated bursts of cells, after which the clone vanishes for a relatively long period of time. The number of mature peripheral blood cells of large clones reaches a relatively constant level and almost never vanishes.
Besides the burst dynamics described above, the data shown in Fig 1(a) are subject to the effects of small sampling size, uncertainty and bias induced by experimental processing such as PCR amplification, and data filtering. In this experimental system, PCR generates a smaller uncertainty than blood sampling, so we focus on the statistics of random sampling. Each blood sample drawn from monkey RQ5427 contains about 10 μg of genomic DNA [13]. After PCR amplification, deep sequencing, and data filtering, the total number of quantifiable tags corresponds to ∼5 × 10³ − 3 × 10⁴ tagged cells. The sample ratio is defined by ε(tj) = Ŝ+(tj)/M+ss, where M+ss is the estimated total number of tagged granulocytes in the peripheral blood. The number si(tj) of sampled cells with the ith tag from the jth sample then approximately follows a Binomial distribution with mi(tj) trials and success probability ε(tj) in our model. To quantitatively explore the feature of apparent extinctions of clones from a sample, we calculate the probability that no peripheral blood cell from clone i is found in a sample of size Ŝ+(tj): P(si(tj) = 0) = (1 − ε(tj))^mi(tj) ≈ exp(−ε(tj) mi(tj)). Thus, if mi(tj) ≲ ε⁻¹(tj), the ith clone is likely to be missed in the sample. The value ε⁻¹ is also used to threshold the population mb(t) to define the measurable duration Δτb of a burst (as indicated in Fig 3(a)).
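The sampling argument can be made concrete with a short calculation. This is a minimal sketch, assuming each of the mi circulating cells of a clone is captured independently with probability ε, so that the chance of missing the clone entirely is (1 − ε)^mi. The function names and the numerical values are illustrative assumptions.

```python
import math

def prob_missed(m_i, eps):
    """P(no cell of the clone is sampled) = (1 - eps)**m_i, evaluated stably."""
    return math.exp(m_i * math.log1p(-eps))

def prob_missed_approx(m_i, eps):
    """Small-eps approximation exp(-eps * m_i)."""
    return math.exp(-eps * m_i)

eps = 1e-4                      # hypothetical sampling fraction
for m_i in (1e2, 1e4, 1e6):     # clone sizes below, at, and above 1/eps
    print(m_i, prob_missed(m_i, eps), prob_missed_approx(m_i, eps))
```

Clones with far fewer than 1/ε circulating cells are missed with probability close to one, while clones far above 1/ε are essentially always detected, which is why 1/ε serves as a natural detection threshold.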
Parameter values
Parameters determined by the experimental procedure or estimated directly from the experiments include the weight of the animal, the sampling times tj, the EGFP+ ratio, and the total number of tagged cells detected in each sample, Ŝ+(tj). Since the tagged granulocyte population does not fluctuate much across samples, we use its average for M+ss. These experimental parameters are specified for each animal and will also be used as inputs to our models.
Our multi-stage model also contains many other intrinsic parameters, including the rates α, μh, rn, μn, ω, and μm and the maximum generation number L. We first found parameter values that have been reliably and independently measured. Some parameters were measured in human clinical studies rather than in rhesus macaques but can nonetheless serve as reasonable approximations for non-human primates due to multiple physiological similarities [42]. These estimates can certainly be improved once direct measurements on rhesus macaques become available. Model parameters, their estimates, and the associated references are given in Table 1 below.
Table 1. Summary of parameters, including their biological interpretation, ranges of values, and references.
All rate parameters are quoted in units of per day. Other parameters are chosen to be within their corresponding reported ranges from the referenced literature. How variations in parameter values affect our analysis will be described in the subsequent sections.
Parameter | Interpretation | Values & References |
---|---|---|
HSC pool (Compartment 1) | ||
Hss | total number of HSCs at steady state | 1.1 × 10⁴ − 1.1 × 10⁶ [4, 11, 12] |
α | per-cell HSC differentiation rate | 5.6 × 10⁻⁴ − 0.02 [4, 11, 12] |
μh | HSC death rate | 10⁻³ − 0.1 [12, 34] |
Transit-Amplifying Progenitor pool (Compartment 2) | ||
rn | growth rate of progenitor cell | 2 − 3 [12] |
μn | death rate of progenitor cell (generation ℓ < L) | 0 [12, 34] |
μn(L) | death rate of progenitor cell (generation ℓ = L) | 0 − 0.27 [12, 34] |
ω | maturation rate of generation-L cells | 0.15 − 0.17 [43, 44] |
L | maximum generation of progenitor cells | 15 − 21 [12, 34] |
Peripheral Blood pool (Compartment 3) | ||
Mss | total number of peripheral blood granulocytes at steady state | (2.5 − 5) × 10⁹ [13, 42] |
μm | death rate of peripheral blood granulocytes | 0.2 − 2 [34, 44, 45] |
Model properties and implementation
Using parameter estimates, we summarize the dynamical properties of our model and describe how the key model ingredients including stability of HSC clone distributions and subsequent “bursty” clone dynamics that follow differentiation can qualitatively generate the observed clone-size variances.
Slow homeostatic birth-death of HSCs. The first important feature to note is the slow homeostatic birth-death of HSCs. After the bone marrow is quickly repopulated, rh(H(t)) − μh ≈ 0, and stochastic self-renewal slows down. Because h = 0 is an absorbing state, the size distribution of the clones may still slowly evolve and coarsen due to stochastic dynamics, leading to the slow successive extinction of smaller clones. The typical timescale for overall changes in h can be estimated by approximating rh(Hss) ≈ μh [46] and considering the mean time T(h) of extinction of a clone initially at size h ≪ Hss. The standard result given in Gardiner [47] and also derived in Appendix C in S1 Appendix is months (for μh = 10⁻², Hss = 10⁴, h = 10¹; see Table 1 for applicable values). Since this timescale is larger than the time of the experiment (67 months for monkey RQ5427), mean HSC clone sizes do not change dramatically during the experiment, consistent with the stable number of clones observed in the samples shown in Fig 1(b). Thus, as a first approximation, we will use a static configuration {hi} drawn from P(h) to describe how, through differentiation, HSC clones feed the progenitor pool.
Fast clonal aging of progenitors. In contrast to slow HSC coarsening, progenitor cells proliferate “transiently.” In Fig 3(a) we plot a single population burst of progenitor and mature granulocytes, given by Eqs (10) and (11) and using the parameter values listed in Table 1. The characteristic duration, or “width” Δτb, associated with each temporal burst of cells is defined as the length of time during which the number mb(t) is above the detection threshold within a sample of peripheral blood, i.e., the total time for which mb(t) > ε⁻¹.
According to Eq (11), the burst width and height depend nonlinearly on the parameters L, rn, μn, μm, and ω in their physiological ranges (see Table 1). The characteristic width of a burst scales as Δτb ∼ L/rn + 1/ω + 1/μm. This estimate is derived by considering the L rounds of progenitor cell division, each of which takes time ∼ 1/rn. Terminal-generation progenitors then require time ∼1/ω to mature, after which mature granulocytes live for time ∼ 1/μm. In total, the expected life span of ∼ L/rn + 1/ω + 1/μm approximates the timescale of a HSC-differentiation-induced burst of cells fated to be granulocytes. Using realistic parameter values, the typical detectable burst duration Δτb ∼ 1 − 2 months is much shorter than the typical sampling gaps Δtj = 5 − 11 months.
With this “burst” picture in mind, we now show how fluctuations of sampled clone sizes can be explained. Small-h clones (where the clone-wise HSC differentiation rate αhi ≪ 1/Δτb) rarely appear in blood samples. Their appearance also depends on whether sampling is frequent and sensitive enough to catch the burst of cells after rare HSC differentiation events. On the other hand, large-h clones (αhi ≫ 1/Δτb) differentiate frequently and consistently appear in the peripheral blood. Their populations in blood samples are less sensitive to the frequency of taking samples. Fig 3(b) shows two multi-burst realizations of peripheral-blood populations mi(t) of clone i corresponding to a small clone and a large clone. The 2000-day trajectories were simulated by fixing hi and stochastically initiating the progenitor proliferation process. Population bursts described by Eq (11) were added after each differentiation event distributed according to Poisson(αhi). Using simulations, we confirm that the statistics of clone extinctions and resurrections are more sensitive to the overall clonal differentiation rate αhi than to the precise shape of a mature cell burst, allowing a reduction in the number of effective parameters (Appendix D in S1 Appendix).
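The contrast between small and large clones can be explored with a simplified simulation. The sketch below replaces the full burst shape with a rectangular detectable window of width Δτb (an assumption made only for illustration), generates differentiation events as a Poisson process with rate αhi, and counts how often a sample falls outside every window. The sampling times, rates, and window width are hypothetical values.

```python
import bisect
import math
import random

def absence_fraction(rate, width, sample_times, n_rep=500, seed=1):
    """Fraction of samples in which no burst of the clone is detectable.

    rate: clone-wise differentiation rate alpha * h_i (per day)
    width: detectable burst duration in days (rectangular approximation)
    """
    rng = random.Random(seed)
    t_end = max(sample_times)
    missed = 0
    for _ in range(n_rep):
        # Poisson event times, started one window early so the first sample
        # can also see bursts that began before t = 0.
        events = []
        t = -width + rng.expovariate(rate)
        while t <= t_end:
            events.append(t)
            t += rng.expovariate(rate)
        for s in sample_times:
            # A clone is seen at time s if some event lies in [s - width, s].
            k = bisect.bisect_left(events, s - width)
            if k == len(events) or events[k] > s:
                missed += 1
    return missed / (n_rep * len(sample_times))

samples = [150, 400, 650, 900, 1200, 1450, 1700, 2000]  # hypothetical days
print(absence_fraction(0.01, 45.0, samples))   # small clone: often absent
print(absence_fraction(0.5, 45.0, samples))    # large clone: almost never absent
```

Under this rectangular approximation the absence probability per sample is exp(−αhi Δτb), so only the product of the clonal differentiation rate and the burst width matters, in line with the reduced sensitivity to the precise burst shape noted above.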
We can further pare down the number of remaining parameters by finding common dependences in the model and defining an effective maximum generation number. We can rewrite Eq (5) as , where
(12) |
is an effective (and noninteger) maximum generation parameter. Later in Appendix D in S1 Appendix, we show that uncertainties of the model structure, alternative mechanisms, and parameter values can be subsumed into Le. Henceforth, in our quantitative data analysis, we will set the unmeasurable parameters and subsume their uncertainties into an effective maximum generation Le. Finally, we will invoke Eq (4) to find the constraint
(13) |
Since we can estimate of the animals in the experiment and the death rate of mature granulocytes μm has been reliably measured in the literature, Eq (13) provides a relationship between the total steady-state differentiation rate and the maximum number of progenitor generations Le.
After assigning values to parameters using Table 1 (setting μn = 0, ω = 0.16 and μm = 1), subsuming parameters into Le (setting ), describing the configuration {hi} through the distribution shape factor λ and the total number of HSC clones Ch (setting the HSC death rate μh = 0), and applying the constraint , we are left with four effective model parameters θmodel = {λ, Ch, rn, Le}. Here we have included rn in the key model parameters since it is not reliably measured and the cell burst width is sensitive to rn. Once Le is inferred, Eq (13) can be used to find .
Statistical model
The total number of tags observed across all samples (obtained by summing up the observed numbers of unique tags over J samples) can be used as a lower bound on Ch. Even though estimates for animal RQ5427 give Ch ∼ 550 − 1100, uncertainties in the HSC self-renewal rate parameters ph, Kh, and the initial HSC population H(0) make λ and P(h, t) difficult to quantify. Even if P(h, t) were known, it is unlikely that the drawn {hi} would accurately represent those in the monkey, especially when λ ≈ 1 and P(h) becomes extremely broad (the variance of P(h) approaches infinity). Thus, we are motivated to find a statistical measure of the data that is insensitive to the exact configuration of {hi}. The goal is to study the statistical correlations between various features of only the outputs, which should be insensitive to the input configuration {hi} but still encode information about the differentiation dynamics.
Two such features commonly used to fit simulated fi(tj) to measured are the mean and the variance . However, the small number of measurement time points J and the frequent disappearance of clones motivated us to propose an even more convenient statistic that is based on
\[ z_i = \sum_{j=1}^{J} \mathbb{1}\big(f_i(t_j), 0\big), \tag{14} \]
the number of absences across all samples of a clone, rather than on σi. Here, the indicator function \(\mathbb{1}(x, x′) = 1\) when x = x′ and \(\mathbb{1}(x, x′) = 0\) otherwise. In Appendix E in S1 Appendix, we illustrate alternatives such as data fitting based on σi and on an autocorrelation function but also describe the statistical insights gained from using statistics of zi.
The level of correlation between the observed number of absences of clone i and its average abundance is measured by the average of conditioned on (dashed curve). In Fig 4, the distribution of the values of at each is clearly shown. To combine the correlated stochastic quantities zi and yi into a useful objective function, we take the expectation of yi over only those clones that have a specific number zi = z absences across the time samples:
\[ Y_z = \frac{\sum_i y_i\, \mathbb{1}(z_i, z)}{\sum_i \mathbb{1}(z_i, z)}. \tag{15} \]
The normalizing denominator is simply the number of clones with exactly z absences. In case no simulated or data-derived trajectories fi(tj) exhibit exactly z absences, we set Yz = 0 or . We then determine Yz(θmodel) from simulating our model and from experiment and use the mean squared error (MSE) between the two as the objective function:
(16) |
where θmodel = {λ, Ch, rn, Le}. Y0 is excluded from the MSE calculation because the yi values of clones that have zi = 0 are not constrained by the burstiness of the model and Y0 can be sensitive to the underlying configuration {hi} (see the Discussion and Appendix E in S1 Appendix).
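The statistics described above can be computed directly from a clone-by-sample abundance matrix. The sketch below is a minimal illustration: it counts absences zi, computes mean abundances yi, forms the conditional means Yz (set to 0 for empty bins), and evaluates a mean squared error over the bins z ≥ 1. The plain average over bins used in `mse` is an assumption about the normalization of Eq (16), and the toy matrix is invented for illustration.

```python
def absences_and_means(f):
    """f[i][j] is the relative abundance of clone i in sample j. Returns (z, y)."""
    z = [sum(1 for x in row if x == 0) for row in f]   # number of absences per clone
    y = [sum(row) / len(row) for row in f]             # mean abundance per clone
    return z, y

def conditional_means(z, y, n_samples):
    """Y[k] = mean of y_i over clones with exactly k absences (0.0 if the bin is empty)."""
    Y = []
    for k in range(n_samples + 1):
        vals = [yi for zi, yi in zip(z, y) if zi == k]
        Y.append(sum(vals) / len(vals) if vals else 0.0)
    return Y

def mse(Y_model, Y_data, z_values):
    """Assumed normalization: unweighted average of squared differences over z_values."""
    return sum((Y_model[k] - Y_data[k]) ** 2 for k in z_values) / len(z_values)

# Toy abundance matrix (4 clones, 3 samples); values are illustrative only.
f_toy = [[0.5, 0.4, 0.6],
         [0.0, 0.2, 0.1],
         [0.0, 0.0, 0.3],
         [0.1, 0.0, 0.0]]
z_toy, y_toy = absences_and_means(f_toy)
Y_toy = conditional_means(z_toy, y_toy, n_samples=3)
print(z_toy, Y_toy)
# Y_0 is excluded from the objective, as in the text: compare only bins z = 1, 2.
print(mse(Y_toy, Y_toy, z_values=[1, 2]))
```

The same functions would be applied to both the simulated and the measured abundance matrices, so that the two sets of Yz are computed identically before being compared.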
Fig 4. Scatterplot of clone trajectories of animal RQ5427 displayed in terms of ln , the log mean abundance of clone i, and , the number of samples in which clone i is undetected.
The trajectory of each clone i is represented by a symbol located at a coordinate determined by its value of ln and . A trajectory of a clone that exhibits one absence within months 8 − 67 is shown in the inset. The first sample at month 2 is excluded because only long-term repopulating clones are considered. Clones that are absent in all eight samples are also excluded, so the largest number of absences considered for animal RQ5427 is 7. The dashed black line denotes ln , where is the average of calculated over i within each bin of z as shown in Eq (15). When later analyzing , (red circles) is not included.
We are now in a position to compare results of our model with experimental data. The general approach will be to choose a set of parameters, simulate the forward model (including sampling) to generate clone abundances {fi(tj)}, number of absences zi, and ultimately Yz(θmodel), which is then compared to data-derived . By minimizing Eq (16) with respect to θmodel, we obtain the least square estimates (LSE) of θmodel. A schematic of our workflow is shown in Fig 5. We describe the details of the simulation of our model in Appendix F in S1 Appendix.
Fig 5. Workflow for comparing parameter-dependent simulated data with measured clone abundances.
The first step is drawing a configuration {hi}, which is experimentally unmeasurable, from the HSC clone distribution P(h). To define P(h) requires an initial estimate of λ and Ch. Using known experimental parameters θexp and choosing rn, Le ∈ θmodel, we compute the theoretical quantities yi and zi by simulating the multi-compartment mechanistic model and the peripheral-blood sampling. The corresponding and are extracted from data, and the theoretical Yz(θmodel) and the experimental are compared through the MSE defined in Eq (16). The MSE is then minimized to find least squares estimates for θmodel.
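The fitting loop of this workflow can be sketched as a grid search over Le with repeated stochastic simulations at each grid point. In the sketch below, `simulate_Y` is a user-supplied forward model returning Yz for z = 1, 2, …, and `toy_simulate_Y` is a purely hypothetical stand-in that is not the mechanistic model of this paper. The grid, the noise level, and the number of repetitions are illustrative assumptions.

```python
import random
import statistics

def fit_Le(simulate_Y, Y_data, Le_grid, n_rep=200, seed=0):
    """Grid search for the Le that minimizes the mean MSE over repeated simulations.

    simulate_Y(Le, rng) must return a list of Y_z values aligned with Y_data.
    Returns (best_Le, {Le: (mean MSE, standard deviation of MSE)}).
    """
    rng = random.Random(seed)
    table = {}
    for Le in Le_grid:
        errs = []
        for _ in range(n_rep):
            Y = simulate_Y(Le, rng)
            errs.append(sum((a - b) ** 2 for a, b in zip(Y, Y_data)) / len(Y_data))
        table[Le] = (statistics.mean(errs), statistics.stdev(errs))
    best = min(table, key=lambda Le: table[Le][0])
    return best, table

def toy_simulate_Y(Le, rng):
    """Hypothetical forward model: Y_z doubles per unit of Le, with 5% multiplicative noise."""
    return [2.0 ** (Le - 24.0) * 10.0 ** (-z) * (1.0 + rng.gauss(0.0, 0.05))
            for z in range(1, 8)]

Y_data = [10.0 ** (-z) for z in range(1, 8)]   # synthetic target, not experimental data
Le_grid = [19 + 0.5 * k for k in range(19)]    # 19.0, 19.5, ..., 28.0
best_Le, mse_table = fit_Le(toy_simulate_Y, Y_data, Le_grid, n_rep=50)
print(best_Le, mse_table[best_Le])
```

Averaging the MSE over repetitions at each grid point, and recording its standard deviation, mirrors the way the means and error bars of the MSE are reported in the Results.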
Results
By implementing the protocol outlined in Fig 5, we find a number of results including least-squares-estimates (LSE) of the parameters, their sensitivity to other model features, validation of the mechanistic model, and robustness of our statistical methods to missing data and clone sampling thresholds. Our analyses allow us to effectively compare the results from the three different animals.
MSE function and estimates of Le and for animal RQ5427
We first fix the HSC distribution shape parameter λ = 0.99 and the total number of HSC clones Ch = 500; this choice will be justified in the next subsection. The MSE objective function can now be plotted as a function of the proliferation rate rn ∈ [0.01, 10] and proliferation potential Le ∈ [19, 28] of progenitor cells in their respective biologically relevant ranges. Even after specifying θmodel = {λ = 0.99, Ch = 500, rn, Le}, there is still uncertainty in the simulated values of Yz = {Y1, Y2, …, Y7} due to the uncertainty in the drawn configuration of HSC clone sizes {hi}, the intrinsic stochastic mechanisms of the model (Poissonian HSC differentiation events), and random peripheral blood sampling. Therefore, we performed 200 simulations for each set of {rn, Le}, producing 200 sets of Yz. The means of Yz are used to construct the mean of MSE(λ = 0.99, Ch = 500, rn, Le), plotted in Fig 6.
Fig 6. Dependence of the mean MSE defined in Eq (16) on rn and Le.
For visualization purposes, we took the natural logarithms of the MSE values and plotted them as a function of Le and rn. Blue areas denote smaller MSE values and thus better fits. This energy surface was generated by averaging over 200 simulations using Ch = 500 and λ = 0.99.
In the reported progenitor growth rate range of rn = 2 − 3 (Table 1), the MSE function is quite insensitive to rn. To interpret this observation, note that rn does not affect the absolute value of β according to Eq (13), but it affects the typical time ∼ L/rn + 1/ω it takes for a generation-0 progenitor cell to form a mature granulocyte. When rn < μm, the proliferation of progenitors cannot “catch up” with the loss of granulocytes, resulting in a quickly vanishing burst in the granulocyte population mb(t) arising from a single differentiation event. A larger Le would be required to compensate. When rn ≫ μm, the growth of any clone is much quicker than its loss, so the burst size is relatively stable and is not very sensitive to rn. Thus, the MSE objective function is fairly insensitive to rn in its biologically meaningful value range.
We then fix the progenitor proliferation rate rn = 2.5 and plot the mean MSE(λ = 0.99, Ch = 500, rn = 2.5, Le) in Fig 7(a), which indicates a clear minimum at . The error bars denote the standard deviation of MSEs obtained from the 200 simulations at different values of Le and show that the variability is negligible for the purpose of determining the minimum. Upon applying the steady-state granulocyte balance constraint in Eq (13), we obtain a total HSC differentiation rate .
Fig 7. Finding the least squares estimate (LSE) for animal RQ5427 by fitting the simulated Yz to the experimental .
The values of (λ, Ch, rn) are chosen to be (0.99, 500, 2.5). Simulations with {hi} set to instead of drawing from P(h) generate similar results. (a) The LSE is . Averages and standard deviations (error bars) of the 200 MSEs are plotted. (b) Comparisons between the experimental (solid) and simulated (dashed) Yz with fixed . The error bars are determined by considering the standard deviation of the average abundances (yi or ) of all clones exhibiting z absences.
If we approximate , Le ≈ L. Substituting LSE values for L into the model for the peripheral blood bursts (the analytic solutions to n(L)(t) and mb(t) in Eqs (10) and (11)) yields a single burst duration of Δτb ≈ 32 days, consistent with our assumption Δτb ⪡ Δtj = 5 − 11 months. Note that even though L is interpreted as an integer in Eq (8), analytic solutions of Eqs (10) and (11), and mb(t), depend on L in a continuous manner, interpolating the behavior to arbitrary values of L. Fig 7(b) shows how one simulation of fits the experimentally measured . Here, each error bar denotes the standard deviation across all mean abundances yi (or ) within each value of z absences.
Insensitivity of analysis to HSC configurations
In Fig 8, we demonstrate the weak dependence of our least-squares estimate on λ, the parameter controlling the shape of the probability distribution of HSC clone sizes P(h, t). For each λ, we sample a fixed number (Ch = 500) of HSC clones from the theoretical distribution P(h, t), fix rn = 2.5, and let Le vary between 19 and 28. The averages of the 200 simulated MSEs at each value of Le are compared and the that corresponds to the minimal average MSE is selected. The selected as a function of λ is plotted in Fig 8(a). Fig 8(b) shows the averages and standard deviations of MSE at each value of λ. We then repeat the simulations with Ch = 1000. These results together show that is insensitive to the distribution of hi. This insensitivity might be understood by noticing that the quantity Yz is defined as the mean of the values of yi that are associated with z absences (dashed curve in Fig 4) and is not necessarily sensitive to how these values are distributed (vertically distributed markers at each value of z in Fig 4). Instead, Yz incorporates the intrinsic relationship between a clone’s mean abundance yi and its number of absences zi, averaged over all clones. It thus also encodes how heterogeneity in the HSC clone populations is translated into the burstiness seen in the sampled clone abundances fi(tj). Although it is generally impossible to recover the exact {hi} configuration, we find that the HSC self-renewal-induced geometric distribution described by Eq (7) generally generates better fits to the sampled data when λ is large (≳ 0.5), suggesting significant heterogeneity in the values of hi.
Fig 8. The LSE is insensitive to the geometric distribution factor λ > 0 and to Ch ≫ 1.
This implies that for a wide range of values of λ and Ch the LSEs are insensitive to the HSC configuration {hi}. (a) s found at each value of λ. (b) Averages and standard deviations (error bars) of MSE as a function of λ. The LSE and MSE() values associated with self-consistently using from experimental data are marked by arrows and “exp.”
Comparison of variability from simple sampling and best-fit model
We can check how our LSE result performs against the null hypothesis that clone size variations arise only from random sampling. An estimate of sampling-induced variability can be obtained by assuming a specific number of peripheral blood granulocytes of tag i and randomly drawing an experimentally determined fraction ε(tj) of peripheral blood cells. This is repeated J times from a constant peripheral pool {mi}. Each draw results in si(tj) cells of clone i in the simulated sample. Normalizing by S+(tj), the total number of tagged cells in the sample, we obtain simulated fi(tj) from which we extract the mean abundance yi and its standard deviation for each clone i. The simulated quantities ln yi and σi associated with each clone i are indicated by the green triangles in Fig 9(a). The corresponding values and derived from the data shown in Fig 1(b) are indicated by the blue dots. This simple heuristic test shows that the experimental fluctuations in clone abundances are significantly larger than those generated from random sampling alone and that additional mechanisms are responsible for the fluctuation of clone abundances in peripheral blood. Using LSE parameter values, Fig 9(b) shows the fluctuations in clone abundances obtained from random sampling of fluctuating mature clones simulated from our model. Here, the variability is a convolution of the fluctuations arising from intrinsic burstiness and from random sampling. The total variability fits those of the experimental data well except for several large-sized outlier clones.
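The sampling-only null test can be sketched as follows. The example holds the peripheral pool of each clone constant, draws J samples with time-dependent sampling fractions, normalizes by the total number of sampled tagged cells, and returns the mean and standard deviation of each clone's sampled abundance. Modeling each draw as independent binomial sampling is an assumption, and the pool sizes and sampling fractions are illustrative values rather than the experimental ones.

```python
import numpy as np

def sampling_only_variability(pool, eps, seed=0):
    """Mean and standard deviation over time of sampled abundances from a constant pool.

    pool: number of peripheral-blood cells of each clone (length C), held fixed in time.
    eps:  sampling fraction at each time point (length J).
    """
    rng = np.random.default_rng(seed)
    pool = np.asarray(pool, dtype=np.int64)
    # s_i(t_j): each cell is sampled independently with probability eps[j] (assumption)
    counts = np.stack([rng.binomial(pool, e) for e in eps], axis=1)  # shape (C, J)
    totals = counts.sum(axis=0)                                      # S+(t_j)
    f = counts / totals                                              # f_i(t_j)
    return f.mean(axis=1), f.std(axis=1, ddof=1)

# Illustrative constant pool with broadly distributed clone sizes (hypothetical numbers).
pool = [int(1e8 / (k + 1) ** 2) for k in range(200)]
eps = [2.8e-5, 4e-5, 6e-5, 8e-5, 1e-4, 1.3e-4, 1.6e-4, 2e-4]
y, sigma = sampling_only_variability(pool, eps)
print(np.log(y[:3]), sigma[:3])
```

Plotting `sigma` against `np.log(y)` for such a constant pool gives the sampling-only reference against which the data-derived standard deviations can be compared.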
Fig 9.
(a) A plot of the standard deviation vs. the log of the mean , extracted from abundance data (blue dots). For comparison, clonal tags distributed within the peripheral blood cells were randomly sampled (with the same sampling fraction ε(tj) at times tj as in the experiment). The analogous quantity σi shown by the green triangles indicates a much lower standard deviation for a given value of ln yi. This simple test implies that the clonal variability across time cannot be explained by random sampling. (b) The same test is performed after applying our model with the LSE parameter Le = 23.4 (and the average of parameters listed in Table 1).
Robustness of to sampling frequency and threshold
We checked the robustness of our inference by leaving out time points from the experiment. Recall that the experimental data matrix for animal RQ5427 contains 536 rows, each representing a clone, and 8 columns, each representing a time point measured by month. By using only the first j = 8, 7, …1 time points of data (leaving out 8 − j time points), seven additional simulation studies to find were performed. As shown in Fig. G1 in Appendix G of S1 Appendix, reduction in the number of time samples flattens the MSE but preserves its minimum near provided at least 2-3 samples are used. We have also excluded intermediate samples to mimic larger sampling gaps Δtj and found similar results.
Next, we examined the effects of sample thresholding on our parameter inference. By eliminating clones whose average abundances are under a certain threshold, we will observe fewer clones in the large-z bins depicted in Fig 4. Since larger clones with fewer absences contribute most to the MSE, our results will not be affected as long as the threshold is not too large. Provided we apply the same threshold to both the simulated and experimental data, there should not be systematic bias in our results. The MSEs generated using different thresholds are plotted in Fig. G2 in Appendix G of S1 Appendix and show that the inferred value remains essentially unchanged provided the threshold level is low enough to retain at least approximately 40% (about 200) of the clones (see Fig. G2(a-f) in Appendix G). With fewer clones retained (< 200), the LSE of Le shifts only modestly to . Thus, we conclude that our inference of Le is robust to increases in sampling threshold as long as a reasonable number of clones (≳ 200) are counted.
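The two robustness checks amount to simple preprocessing steps applied identically to the simulated and experimental abundance matrices. The sketch below keeps only the first j time points (dropping clones that are then never detected) and removes clones whose mean abundance falls below a threshold. The function names, the toy matrix, and the threshold value are hypothetical.

```python
def first_samples(f, j):
    """Keep the first j time points and drop clones absent in all retained samples."""
    truncated = [row[:j] for row in f]
    return [row for row in truncated if any(x > 0 for x in row)]

def apply_threshold(f, threshold):
    """Keep clones whose mean abundance across samples is at least `threshold`."""
    return [row for row in f if sum(row) / len(row) >= threshold]

# Toy abundance matrix (4 clones, 3 samples); values are illustrative only.
f_toy = [[0.5, 0.4, 0.6],
         [0.0, 0.2, 0.1],
         [0.0, 0.0, 0.3],
         [0.1, 0.0, 0.0]]
kept_time = first_samples(f_toy, 2)          # third clone is undetected in the first two samples
kept_size = apply_threshold(f_toy, 0.05)     # fourth clone falls below the threshold
print(len(kept_time), len(kept_size))
```

Whatever filter is chosen, applying the same function to both data sets before computing Yz is what avoids a systematic bias between simulation and experiment.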
Data analysis and fitting for animals 2RC003 and RQ3570
The data from the three different monkeys vary in their numbers of tagged clones transplanted and the lengths of the experiments. For animal RQ5427/2RC003/RQ3570, there are 536/1371/442 clones that are detected at least once within 67/103/38 months. The fraction of cells in all tracked clones in animal RQ5427/2RC003/RQ3570 was approximated by the average fraction of cells that were EGFP+ marked over time, around 0.052/0.049/0.086 (the ratios between green square and blue triangle markers in Figs 1(a), 10(a) and 11(a)), respectively. Figs 10 and 11 also show the clone abundances, the MSE functions, and the statistics of Y(z).
Fig 10.
(a-b) Experimental data for animal 2RC003. (c) Difference between experimental and simulated Yz(Le) as a function of Le. The values of hi are set to be equal to , and the model was simulated 200 times at each value of Le. Other parameters are taken from Tables 1 and 2. The LSE and . (d) Comparison of the optimal Yz to the experimental .
Fig 11. Experimental data (a-b) and fitting results (c-d) for animal RQ3570.
The values of hi are set to be equal to . Other parameters are taken from Tables 1 and 2. The LSE fitting results are and .
Despite differences among the animals and the large variability in the estimated values of α and Hss individually reported in the literature [4, 11, 12], the estimates of and are rather similar across the three animals. For animal 2RC003, the optimal estimates are , while for animal RQ3570, . The corresponding estimates for A*, after considering the constraint Eq (13) and the EGFP+ ratios in Table 2, are 282.7, 136.7, and 224.4.
Table 2. Summary of specific parameter values for monkeys 2RC003 and RQ3570 derived from experimental measurements [13] or obtained by calculations ( and ).
Parameter | Reference range or LSE value | | |
---|---|---|---|
 | RQ5427 | 2RC003 | RQ3570 |
 | 536 | 442 | 1371 |
 | 14.7 | 6.7 | 19.3 |
 | 282.7 | 136.7 | 224.4 |
 | 23.4 | 25.0 | 24.0 |
Mss | 3.2 × 10^9 | 4.6 × 10^9 | 3.8 × 10^9 |
S+(tj) | (5.0 − 30) × 10^3 | (2.1 − 8.6) × 10^3 | (7.0 − 10.8) × 10^3 |
EGFP+ ratio | 0.052 | 0.049 | 0.086 |
ε(tj) | (2.8 − 20) × 10^−5 | (1.2 − 4.2) × 10^−5 | (2.4 − 3.0) × 10^−5 |
Δtj | 150 − 330 | 180 − 660 | 150 − 260 |
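A quick arithmetic check suggests how two rows of Table 2 are related: for each animal, the values 282.7, 136.7, and 224.4 (quoted in the text as the estimates for A*) equal the row of values 14.7, 6.7, and 19.3 divided by the EGFP+ ratio, to the precision shown. This reading is an inference from the numbers and from the statement that the EGFP+ ratios were taken into account. The dictionary names below are hypothetical.

```python
# Values copied from Table 2, in the column order RQ5427, 2RC003, RQ3570.
tagged_rate = {"RQ5427": 14.7, "2RC003": 6.7, "RQ3570": 19.3}      # row above the A* values
egfp_ratio = {"RQ5427": 0.052, "2RC003": 0.049, "RQ3570": 0.086}   # EGFP+ ratio row
total_rate = {"RQ5427": 282.7, "2RC003": 136.7, "RQ3570": 224.4}   # A* values quoted in the text

# Rescale the tagged-cell value by the tagged fraction and compare with the tabulated total.
rescaled = {animal: tagged_rate[animal] / egfp_ratio[animal] for animal in tagged_rate}
for animal, value in rescaled.items():
    print(animal, round(value, 1), total_rate[animal])
```

Under this reading, the smaller values refer to the tagged (EGFP+) cells only, and dividing by the tagged fraction rescales them to the whole HSC population.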
We also compared how the simulated LSE fits the experimental for all three animals. Note that for each specific z, the value of Yz is the conditional mean of the values of yi for which each clone i exhibits exactly z absences. Even though for any specific z the distribution of the corresponding yi values is unknown, their mean Yz should approximately follow a normal distribution according to the central limit theorem. We use a one-sample t-test to compare against the mean of the s generated from 10000 simulations using the optimal . For animal RQ5427, we performed seven one-sample t-tests on Yz = {Y1, Y2, …, Y7} and found the seven p-values {0.69, 0.53, 0.58, 0.17, 0.68, 0.01, 3 × 10^−5}. Except for the last two p-values (corresponding to the bins z = 6 and z = 7), all other bins easily pass the one-sample t-test at a significance level of 0.05. Clones with z = 6, 7 are much smaller and more severely corrupted by noise, such as that induced during PCR amplification, and thus provide less reliable information.
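A one-sample t-test of this kind can be sketched with the standard library alone. In the example below, `sample` plays the role of a set of values whose mean is compared with a reference value `mu0`. Which quantity serves as the sample and which as the reference is an assumed reading of the procedure, and the p-value uses a normal approximation to the t distribution that is reasonable only for large samples. The input numbers are synthetic.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic for H0: mean(sample) == mu0, with a large-sample two-sided p-value."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.fmean(sample) - mu0) / standard_error
    # Normal approximation to the Student t distribution (assumes n is large).
    p = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))
    return t, p

# Synthetic sample with mean 1.0 (illustrative values only).
sample = [0.9, 1.1] * 50
t_same, p_same = one_sample_t(sample, mu0=1.0)   # reference equals the sample mean
t_far, p_far = one_sample_t(sample, mu0=2.0)     # reference far from the sample mean
print(p_same > 0.05, p_far < 0.05)
```

For small samples, an exact Student t p-value would be needed in place of the normal approximation.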
Comparisons of the test results among the three animals, together with comparisons among Figs 7(b), 10(d) and 11(d), show qualities of fit ordered according to RQ3570 > RQ5427 > 2RC003. This sequence of fitting qualities is consistent with the increasing experimental times RQ3570 < RQ5427 < 2RC003 (38, 67, and 103 months, respectively), suggesting that age-associated changes of stem cell clone sizes, which we neglected by fixing {hi}, cannot be fully ignored [48]. As is evident from Fig 10(a), several clones start to dominate after month 64; this coarsening phenomenon is not evident in the data of the other two animals. Animal RQ3570 was sacrificed at month 38, so no obvious coarsening is observed and no clones strongly dominate (see Fig 11). A summary of the parameters and fitting results for all animals is given in Table 2.
Discussion
In this study, we analyzed a decade-long clonal tracking experiment in rhesus macaques and developed mechanistic and statistical models that helped us understand two salient features of clone abundance data: the heterogeneous (nonuniform) distribution of clone sizes and the temporal fluctuation of clone sizes. Below, we further discuss the implications of our results, the structure of our mechanistic model, and the potential effects of including additional biological processes.
Comparison to previous studies
The long-term clonal tracking data we analyzed were generated from a huge number of initially tagged HSPCs (Ch(0) ∼ 10^6 − 10^7) [13], a large number of observed clones (Cs ∼ 10^2 − 10^3), small numbers of sequenced cells that carry tags (), and infrequent sampling (Δtj > 5 months). These features present significant challenges to modeling and analysis compared with previous studies that mostly focused on one or a few clones [5, 15, 17, 18].
In a previous analysis, Goyal et al. [4] aggregated the clone abundance data across all mature cell types and studied the distribution of the number of clones of specific size. At each time point, they ordered the clones according to their sizes. Thus, the ordering can change across samples as some clones expand while others diminish. They found that the cumulative clone-number distribution (defined as the number of clones of a specific size or less) of the size-ordered clones becomes stationary as soon as a few months after transplantation. They proposed a neutral birth-death description of progenitor cells and fitted the expected value of clone counts in each sample by assuming and tuning parameters in the downstream progenitor and mature-cell compartments. By focusing on aggregate clone counts, this study could not distinguish the dynamics of individual clones, nor could it predict the persistence of clone sizes over time. Since individual clone sizes (hi, ni, mi, si of the same tag i) were not tracked, mechanisms driving the dynamics, and in particular, the variability and fluctuations of individual clone sizes that drive disappearances and reappearances, remain unresolved [4].
In our model, heterogeneity of clone sizes is explicitly generated by stochastic HSC self-renewal of cells of each tag, and extinctions and resurrections arise from a generation-limited progenitor proliferation assumption. We infer model parameters as listed in Table 2. Combining the results with previous experimental and theoretical estimates of Hss ≈ 1.1 × 10^4 − 2.2 × 10^4 [4, 49] results in α = 0.0045 − 0.027, slightly larger than, but still consistent with, the estimates α = 0.0013 − 0.009 by Shepherd et al. [11]. Previous studies that modeled total peripheral blood population estimated α ≈ 0.022 and Hss ≈ 1.1 × 10^6/kg for dogs and α ≈ 0.044 and Hss ≈ 1.1 × 10^6/kg for humans [12]. These estimates yield a value of αHss about 10^2 − 10^3 times greater than ours, which is nonetheless consistent with our steady-state constraint Eq (13) because they assumed a much smaller L ≈ 15 − 18 for dog and 16 − 21 for human. This difference in the estimates of L may be partially attributed to the transplant conditions under which the rhesus macaque experiments were performed [13]. Alternative model assumptions and differing values of other parameters may also contribute to this difference. For example, the extremely large value of Hss ≈ 10^7 used in [34] will naturally decrease their estimate for relative to that of our analysis.
Model structure, sensitivity to parameters, and cellular heterogeneity
Uncertainties in values of parameters such as μh, ph, Kh, and other factors that tune the symmetric-asymmetric modes of HSC differentiation or involve HSC activation processes [50] will impart uncertainty in determining P(h) and {hi}. We have assumed P(h) satisfies a master equation and depends on only two effective parameters λ and Ch. However, we have demonstrated that the statistical properties of Yz are quite insensitive to the upstream configuration {hi} and hence to λ and Ch for a wide range of their values (see Fig 8). In other words, very little information in {hi} is retained in the sampled abundances after HSCs differentiate and trigger random bursty peripheral blood-cell population dynamics.
Another feature we have ignored in our neutral model is cellular heterogeneity such as tag-dependent differentiation, proliferation, and death rates. Cellular heterogeneity in HSC differentiation rates could be described by different αi for each clone i, and the total differentiation rate would be . Differences in αi can be subsumed into a modified configuration {hi} which, as we have seen, does not strongly influence our parameter estimation based on the Yz statistics. Thus, given the available data and how information is lost along the stages of hematopoiesis and sampling, the present quasi-steady-state analyses cannot resolve heterogeneity across HSC clones.
We have not investigated how cellular heterogeneity in progenitor and mature cells would affect our results, but clone-dependences in their birth and death rates could affect sizes and durations of population bursts and quantitatively affect our analysis. However, unless the statistics of inter-burst times are highly variable across clones, we do not expect cellular heterogeneity to qualitatively affect our conclusions.
Changing downstream parameters such as μm or invoking alternative mechanisms of terminal differentiation (see Appendix B in S1 Appendix) can affect the shape of clonal bursts. We show in Appendix D in S1 Appendix that these effects can be subsumed into the effective maximum progenitor generation Le. We have performed additional simulations to confirm that changing μm to 2 does not influence the fitting of but increases by one. In other words, inference of is robust against many upstream and downstream parameters, indicating that the intrinsic clone size fluctuations observed in the experimental data strongly constrain the total rate of HSC differentiation. On the other hand, uncovering the actual maximal generation L* from is possible only when uncertainties in these other parameters are resolved.
Clonal stability vs clonal succession
Our model reduction was based on the separation of timescales of the slow HSC dynamics and the fast clonal aging dynamics. Since HSC clone sizes vary extremely slowly for primates ( months), we ignored the homeostatic births/deaths of HSCs when fitting the temporal clonal variations. This is partially justified by visual inspection of Figs 1(b), 10(b) and 11(b), which show no significant variations in the abundances of large clones before 60 months. Instead, the random intermittent HSC differentiation events induce relatively short ( months) bursts of granulopoietic progeny that contribute strongly to temporal fluctuations of clone sizes. Such behavior is consistent with the “clonal stability” hypothesis [51–53], which assumes that a fixed group of HSCs randomly contributes to an organism’s blood production at all times.
The alternative hypothesis of “clonal succession” [16, 54, 55] assumes that different groups of HSCs are sequentially recruited to the blood production at different times. This hypothesis would be consistent with our model only under a different set of parameters where HSCs self-renew/die at a rate comparable to that of Δτb, the duration of a granulocyte burst. For example, murine HSC turnover rates μh are hypothesized to be 10-fold higher than those in primates while the clonal aging dynamics (and its timescale Δτb) are relatively conserved across species [56]. According to our result in Appendix C in S1 Appendix, such a 10-fold increase in HSC death rate would lead to a 10-fold increase in HSC clone extinction rate, bringing the lifespans of HSC clones closer to the (progenitor) clonal aging timescale Δτb. This interpretation is consistent with the fact that hematopoiesis in large primates has been described in terms of “clonal stability” while hematopoiesis in mice has been described in terms of “clonal succession” [16, 51–55]. We thus predict that with even longer tracking (> 100 months), the “clonal succession” mechanism could be significant in primates also.
Summary and future directions
In summary, we have built mechanistic and statistical models that enable the quantitative analysis of noisy and infrequent clonal tracking data. We focused on the huge temporal variability observed in the sampled clone abundances and defined a robust statistical measure Yz of sample-to-sample clone size variability through the number of clonal disappearances. Of course, there is a nearly endless list of details, such as cellular heterogeneity and more complex biology, that we did not include, but given the noisy data, we propose and quantify the simplest explanation for the observed heterogeneous clone abundances and the temporal “extinctions and resurrections”. The key ingredients in our mechanistic model are HSC self-renewal (quantified by the effective parameter λ), intermittent HSC differentiation (quantified by the parameter ), and an effective maximum progenitor generation (quantified by the effective parameter Le). Although we cannot fully resolve λ from data, the obvious mismatch between experiment and our model when λ is small shows that a certain level of HSC clone-size heterogeneity (larger λ ≈ 1) is necessary to match the sampled data. Similarly, we cannot fully resolve α and , but their product, the total tagged HSC differentiation rate , is one of the key parameters constrained by our modeling. By minimizing an objective function of Yz over effective model parameters, we found LSE values and for the three rhesus macaques. These quantities could not be inferred from the total, more static cell populations. These results also imply that true dynamical changes in and Le could be masked by the intrinsically bursty dynamics of each clone but provide a framework for future study into extrinsic perturbations.
Our analysis provides insight into the variables and experimental conditions to which parameter inference is most sensitive, possibly guiding the design of future experiments. The approach and models can also be readily extended to quantify white blood cells of other types. For example, the mechanistic model can be directly applied to monocytes since they also have relatively simple dynamics and do not proliferate in the periphery [57]. Peripheral lymphocytes, however, would require additional experimental information because their populations are more sensitive to the state of the animal and can homeostatically proliferate [38].
Supporting information
(PDF)
Acknowledgments
The authors thank S. K. Lyons for help editing.
Data Availability
Data analyzed have been previously published in Cell Stem Cell, volume 14, no. 4, pp 473-485, 2014 (https://doi.org/10.1016/j.stem.2013.12.012). No figures, plots, or tables in this manuscript, or any parts of them, have been previously published. Figures, plots, and tables have all been generated de novo from published raw data, specifically for this manuscript.
Funding Statement
This study was supported by grants from the National Institutes of Health (http://www.nih.gov, R56HL126544, R01AI110297, R01HL125030, U19AI117941 to ISYC and R00HL116234 to SK), the UCLA Center for AIDS Research (http://aidsinstitute.ucla.edu/cfar, AI028697 to ISYC), the UCLA AIDS Institute (http://aidsinstitute.ucla.edu/), the James B. Pendleton Charitable Trust, the McCarthy Family Foundation (http://mccarthyfamilyfdn.org/), the Army Research Office (http://www.arl.army.mil, W911NF-14-1-0472 to TC), and the National Science Foundation (http://www.nsf.gov, DMS-1516675 and DMS-1814364 to TC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
1. Abkowitz JL, Catlin SN, Guttorp P. Evidence that hematopoiesis may be a stochastic process in vivo. Nature Medicine. 1996;2(2):190–197. doi:10.1038/nm0296-190
2. Mendelson A, Frenette PS. Hematopoietic stem cell niche maintenance during homeostasis and regeneration. Nature Medicine. 2014;20(8):833. doi:10.1038/nm.3647
3. Stiehl T, Ho A, Marciniak-Czochra A. The impact of CD34+ cell dose on engraftment after SCTs: personalized estimates based on mathematical modeling. Bone Marrow Transplantation. 2014;49(1):30. doi:10.1038/bmt.2013.138
4. Goyal S, Kim S, Chen IS, Chou T. Mechanisms of blood homeostasis: lineage tracking and a neutral model of cell populations in rhesus macaques. BMC Biology. 2015;13(1):85. doi:10.1186/s12915-015-0191-8
5. Busch K, Klapproth K, Barile M, Flossdorf M, Holland-Letz T, Schlenner SM, et al. Fundamental properties of unperturbed haematopoiesis from stem cells in vivo. Nature. 2015;518(7540):542–546. doi:10.1038/nature14242
6. Seita J, Weissman IL. Hematopoietic stem cell: self-renewal versus differentiation. Wiley Interdisciplinary Reviews: Systems Biology and Medicine. 2010;2(6):640–653. doi:10.1002/wsbm.86
7. Bystrykh LV, Verovskaya E, Zwart E, Broekhuis M, de Haan G. Counting stem cells: methodological constraints. Nature Methods. 2012;9(6):567–574. doi:10.1038/nmeth.2043
8. Székely T, Burrage K, Mangel M, Bonsall M. Stochastic dynamics of interacting haematopoietic stem cell niche lineages. PLoS Computational Biology. 2014;10:e1003794. doi:10.1371/journal.pcbi.1003794
9. Stiehl T, Marciniak-Czochra A. Characterization of stem cells using mathematical models of multistage cell lineages. Mathematical and Computer Modelling. 2011;53:1505–1517. doi:10.1016/j.mcm.2010.03.057
10. Höfer T, Rodewald H. Output without input: the lifelong productivity of hematopoietic stem cells. Current Opinion in Cell Biology. 2016;43:69–77. doi:10.1016/j.ceb.2016.08.003
11. Shepherd BE, Kiem HP, Lansdorp PM, Dunbar CE, Aubert G, LaRochelle A, et al. Hematopoietic stem-cell behavior in nonhuman primates. Blood. 2007;110(6):1806–1813. doi:10.1182/blood-2007-02-075382
12. Zhuge C, Lei J, Mackey MC. Neutrophil dynamics in response to chemotherapy and G-CSF. Journal of Theoretical Biology. 2012;293:111–120. doi:10.1016/j.jtbi.2011.10.017
13. Kim S, Kim N, Presson A, Metzger M, Bonifacino A, Sehl M, et al. Dynamics of HSPC repopulation in nonhuman primates revealed by a decade-long clonal-tracking study. Cell Stem Cell. 2014;14(4):473–485. doi:10.1016/j.stem.2013.12.012
14. Sieburg HB, Rezner BD, Muller-Sieburg CE. Predicting clonal self-renewal and extinction of hematopoietic stem cells. Proceedings of the National Academy of Sciences. 2011;108(11):4370–4375. doi:10.1073/pnas.1011414108
15. Copley MR, Beer PA, Eaves CJ. Hematopoietic stem cell heterogeneity takes center stage. Cell Stem Cell. 2012;10(6):690–697. doi:10.1016/j.stem.2012.05.006
16. Sun J, Ramos A, Chapman B, Johnnidis JB, Le L, Ho YJ, et al. Clonal dynamics of native haematopoiesis. Nature. 2014;514(7522):322–327. doi:10.1038/nature13824
17. Muller-Sieburg CE, Sieburg HB, Bernitz JM, Cattarossi G. Stem cell heterogeneity: implications for aging and regenerative medicine. Blood. 2012;119(17):3900–3907. doi:10.1182/blood-2011-12-376749
18. Verovskaya E, Broekhuis MJ, Zwart E, Ritsema M, van Os R, de Haan G, et al. Heterogeneity of young and aged murine hematopoietic stem cells revealed by quantitative clonal analysis using cellular barcoding. Blood. 2013;122(4):523–532. doi:10.1182/blood-2013-01-481135
19. Doulatov S, Notta F, Laurenti E, Dick JE. Hematopoiesis: A human perspective. Cell Stem Cell. 2012;10(2):120–136. doi:10.1016/j.stem.2012.01.006
20. Kim S, Kim N, Presson AP, An DS, Mao SH, Bonifacino AC, et al. High-throughput, sensitive quantification of repopulating hematopoietic stem cell clones. Journal of Virology. 2010;84(22):11771–11780. doi:10.1128/JVI.01355-10
21. Höfer T, Busch K, Klapproth K, Rodewald H. Fate mapping and quantitation of hematopoiesis in vivo. Annual Review of Immunology. 2016;34(1):449–478. doi:10.1146/annurev-immunol-032414-112019
22. Crauste F, Pujo-Menjouet L, Génieys S, Molina C, O G. Mathematical model of hematopoiesis dynamics with growth factor-dependent apoptosis and proliferation regulations. Journal of Theoretical Biology. 2008;250:322–338.
23. Adimy M, Crauste F. Mathematical model of hematopoiesis dynamics with growth factor-dependent apoptosis and proliferation regulations. Mathematical and Computer Modelling. 2009;49:2128–2137. doi:10.1016/j.mcm.2008.07.014
24. Hoyem M, Maloy F, Jakobsen P, Brandsdal B. Stem cell regulation: Implications when differentiated cells regulate symmetric stem cell division. Journal of Theoretical Biology. 2015;380:203–219. doi:10.1016/j.jtbi.2015.05.009
25. Østby I, Rusten LS, Kvalheim G, Grøttum P. A mathematical model for reconstitution of granulopoiesis after high dose chemotherapy with autologous stem cell transplantation. Journal of Mathematical Biology. 2003;47(2):101–136. doi:10.1007/s00285-003-0198-6
26. Marciniak-Czochra A, Stiehl T, Ho AD, Jäger W, Wagner W. Modeling of asymmetric cell division in hematopoietic stem cells-regulation of self-renewal is essential for efficient repopulation. Stem Cells and Development. 2009;18(3):377–386. doi:10.1089/scd.2008.0143
27. Manesso E, Teles J, Bryder D, Peterson C. Dynamical modelling of haematopoiesis: an integrated view over the system in homeostasis and under perturbation. Journal of the Royal Society Interface. 2013;10(80):20120817. doi:10.1098/rsif.2012.0817
28. Sun Z, Komarova N. Stochastic modeling of stem-cell dynamics with control. Mathematical Biosciences. 2012;240:231–240. doi:10.1016/j.mbs.2012.08.004
29. Yang J, Sun Z, Komarova N. Analysis of stochastic stem cell models with control. Mathematical Biosciences. 2015;266:93–107. doi:10.1016/j.mbs.2015.06.001
30. Greenman CD, Chou T. Kinetic theory of age-structured stochastic birth-death processes. Physical Review E. 2016;93:012112. doi:10.1103/PhysRevE.93.012112
31. Chou T, Greenman CD. A hierarchical kinetic theory of birth, death and fission in age-structured interacting populations. Journal of Statistical Physics. 2016;164:49–76. doi:10.1007/s10955-016-1524-x
32. Marciniak-Czochra A, Stiehl T, Wagner W. Modeling of replicative senescence in hematopoietic development. Aging (Albany NY). 2009;1(8):723. doi:10.18632/aging.100072
33. Edelstein-Keshet L, Israel A, Lansdorp P. Modelling perspectives on aging: Can mathematics help us stay young? Journal of Theoretical Biology. 2001;213(4):509–525. doi:10.1006/jtbi.2001.2429
34. Bernard S, Bélair J, Mackey MC. Oscillations in cyclical neutropenia: new evidence based on mathematical modeling. Journal of Theoretical Biology. 2003;223(3):283–298. doi:10.1016/S0022-5193(03)00090-0
35. Rufer N, Brümmendorf TH, Kolvraa S, Bischoff C, Christensen K, Wadsworth L, et al. Telomere fluorescence measurements in granulocytes and T lymphocyte subsets point to a high turnover of hematopoietic stem cells and memory T cells in early childhood. The Journal of Experimental Medicine. 1999;190(2):157–168. doi:10.1084/jem.190.2.157
36. Hodes RJ. Telomere length, aging, and somatic cell turnover. The Journal of Experimental Medicine. 1999;190(2):153–156. doi:10.1084/jem.190.2.153
37. Miller R. Telomere diminution as a cause of immune failure in old age: an unfashionable demurral. Biochemical Society Transactions. 2000;28(2):241–245. doi:10.1042/bst0280241
38. De Boer RJ, Perelson AS. Quantifying T lymphocyte turnover. Journal of Theoretical Biology. 2013;327:45–87. doi:10.1016/j.jtbi.2012.12.025
39. Muller-Sieburg C, Cho R, Thoman M, Adkins B, Sieburg H. Deterministic regulation of hematopoietic stem cell self-renewal and differentiation. Blood. 2002;100(4):1302–1309.
40. Seita J, Weissman IL. Hematopoietic stem cell: self-renewal versus differentiation. Systems Biology and Medicine. 2010;2(6):640–653. doi:10.1002/wsbm.86
41. Kendall DG. On the generalized “birth-and-death” process. The Annals of Mathematical Statistics. 1948; p. 1–15. doi:10.1214/aoms/1177730285
42. Chen Y, Qin S, Ding Y, Wei L, Zhang J, Li H, et al. Reference values of clinical chemistry and hematology parameters in rhesus monkeys (Macaca mulatta). Xenotransplantation. 2009;16(6):496–501. doi:10.1111/j.1399-3089.2009.00554.x
43. Dancey JT, Deubelbeiss KA, Harker LA, Finch CA. Neutrophil kinetics in man. Journal of Clinical Investigation. 1976;58(3):705. doi:10.1172/JCI108517
44. Lahoz-Beneytez J, Elemans M, Zhang Y, Ahmed R, Salam A, Block M, et al. Human neutrophil kinetics: modeling of stable isotope labeling data supports short blood neutrophil half-lives. Blood. 2016;127(26):3431–3438. doi:10.1182/blood-2016-03-700336
45. Pillay J, den Braber I, Vrisekoop N, Kwast LM, de Boer RJ, Borghans JA, et al. In vivo labeling with 2H2O reveals a human neutrophil lifespan of 5.4 days. Blood. 2010;116(4):625–627. doi:10.1182/blood-2010-01-259028
46. Parsons TL, Quince C, Plotkin JB. Absorption and fixation times for neutral and quasi-neutral populations with density dependence. Theoretical Population Biology. 2008;74(4):302–310. doi:10.1016/j.tpb.2008.09.001
47. Gardiner CW. Handbook of Stochastic Methods: For physics, chemistry, and natural sciences. Springer, Berlin; 1985.
48. Yu KR, Espinoza DA, Wu C, Truitt L, Shin TH, Chen S, et al. The impact of aging on primate hematopoiesis as interrogated by clonal tracking. Blood. 2018;131:1195–1205. doi:10.1182/blood-2017-08-802033
49. Abkowitz JL, Catlin SN, McCallie MT, Guttorp P. Evidence that the number of hematopoietic stem cells per animal is conserved in mammals. Blood. 2002;100(7):2665–2667. doi:10.1182/blood-2002-03-0822
50. Wilson A, Laurenti E, Oser G, van der Wath RC, Blanco-Bose W, Jaworski M, et al. Hematopoietic stem cells reversibly switch from dormancy to self-renewal during homeostasis and repair. Cell. 2008;135(6):1118–1129. doi:10.1016/j.cell.2008.10.048
51. Abkowitz JL, Persik MT, Shelton GH, Ott RL, Kiklevich JV, Catlin SN, et al. Behavior of hematopoietic stem cells in a large animal. Proceedings of the National Academy of Sciences. 1995;92(6):2031–2035. doi:10.1073/pnas.92.6.2031
52. Prchal JT, Prchal JF, Belickova M, Chen S, Guan Y, Gartland GL, et al. Clonal stability of blood cell lineages indicated by X-chromosomal transcriptional polymorphism. Journal of Experimental Medicine. 1996;183(2):561–567. doi:10.1084/jem.183.2.561
53. McKenzie JL, Gan OI, Doedens M, Wang JC, Dick JE. Individual stem cells with highly variable proliferation and self-renewal properties comprise the human hematopoietic stem cell compartment. Nature Immunology. 2006;7(11):1225–1233. doi:10.1038/ni1393
54. Jordan CT, Lemischka IR. Clonal and systemic analysis of long-term hematopoiesis in the mouse. Genes & Development. 1990;4(2):220–232. doi:10.1101/gad.4.2.220
55. Drize NJ, Keller JR, Chertkov JL. Local clonal analysis of the hematopoietic system shows that multiple small short-living clones maintain life-long hematopoiesis in reconstituted mice. Blood. 1996;88(8):2927–2938.
56. Catlin SN, Busque L, Gale RE, Guttorp P, Abkowitz JL. The replication rate of human hematopoietic stem cells in vivo. Blood. 2011;117(17):4460–4466. doi:10.1182/blood-2010-08-303537
57. Auffray C, Sieweke MH, Geissmann F. Blood monocytes: development, heterogeneity, and relationship with dendritic cells. Annual Review of Immunology. 2009;27:669–692. doi:10.1146/annurev.immunol.021908.132557