Abstract
Clonal expansions driven by somatic mutations become pervasive across human tissues with age, including in the haematopoietic system, where the phenomenon is termed clonal haematopoiesis1–4. The understanding of how and when clonal haematopoiesis develops, the factors that govern its behaviour, how it interacts with ageing and how these variables relate to malignant progression remains limited5,6. Here we track 697 clonal haematopoiesis clones from 385 individuals 55 years of age or older over a median of 13 years. We find that 92.4% of clones expanded at a stable exponential rate over the study period, with different mutations driving substantially different growth rates, ranging from 5% (DNMT3A and TP53) to more than 50% per year (SRSF2P95H). Growth rates of clones with the same mutation differed by approximately ±5% per year, proportionately affecting slow drivers more substantially. By combining our time-series data with phylogenetic analysis of 1,731 whole-genome sequences of haematopoietic colonies from 7 individuals from an older age group, we reveal distinct patterns of lifelong clonal behaviour. DNMT3A-mutant clones preferentially expanded early in life and displayed slower growth in old age, in the context of an increasingly competitive oligoclonal landscape. By contrast, splicing gene mutations drove expansion only later in life, whereas TET2-mutant clones emerged across all ages. Finally, we show that mutations driving faster clonal growth carry a higher risk of malignant progression. Our findings characterize the lifelong natural history of clonal haematopoiesis and give fundamental insights into the interactions between somatic mutation, ageing and clonal selection.
Subject terms: Acute myeloid leukaemia, Myelodysplastic syndrome, Risk factors, Computational models, Mutation
A long-term study of 385 human donors reports that driver gene mutations and age determine the lifelong dynamics of clonal haematopoiesis
Main
Human haematopoiesis produces hundreds of billions of specialized blood cells every day, through a hierarchy of progressively more differentiated and numerous cells originating from a pool of long-lived haematopoietic stem cells (HSCs). Haematopoiesis remains highly efficient for decades, but is inevitably challenged by the erosive effects of ageing7–9 and the inexorable acquisition of somatic DNA mutations10. Mutations that augment HSC fitness can drive clonal expansion of a mutant HSC and its progeny, a phenomenon known as clonal haematopoiesis1–4. Clonal haematopoiesis becomes ubiquitous with advancing age and is associated with an increased risk of myeloid leukaemias and some non-haematological diseases1,2,4,5,11,12.
The observation that clonal haematopoiesis-associated mutations affect a restricted set of genes that are also frequently mutated in leukaemia1–4—most commonly those involved in epigenetic regulation (DNMT3A, TET2 and ASXL1), splicing (SF3B1 and SRSF2) and apoptosis (TP53 and PPM1D)—implies that these mutations inherently confer fitness to HSCs. In fact, recent evolutionary models assume that each specific mutation carries a fixed fitness advantage, and find that this largely explains the relative proportions and clonal sizes of clonal haematopoiesis driven by different mutations13. However, several observations suggest that non-mutational factors are also influential. For example, a handful of clonal haematopoiesis cases studied at two time points propose that clones driven by the same or similar mutations can behave differently between individuals12,14. Also, the relative prevalence of different clonal haematopoiesis-driver gene mutations changes significantly depending on context; for example, in aplastic anaemia, clonal haematopoiesis is commonly driven by mutations that enhance immune evasion15–18, whereas genotoxic stress favours clones with mutations in DNA damage genes19–21. Furthermore, factors such as inflammation22 and heritable genetic variation23–25 can affect the emergence of clonal haematopoiesis.
A major limitation to our understanding of the determinants of clonal haematopoiesis behaviour and fate up to now has been its reliance on cross-sectional studies capturing clonal haematopoiesis at single time points. Here, by tracking blood cell clones over long periods in a large cohort, and by reconstructing haematopoietic phylogenies, we uncover the lifelong dynamics and natural history of clonal haematopoiesis.
Mutational landscape of clonal haematopoiesis
We analysed 1,593 blood DNA samples from 385 adults aged 55–93 years at the time of entry into the SardiNIA longitudinal study26. The participants, who had no history of haematological malignancy, were sampled up to 5 times (median 4) over 3.2–16 years (median 12.9 years) (Fig. 1a, Extended Data Fig. 1a–c). We performed deep sequencing (mean 1,065× coverage) of 56 genes associated with clonal haematopoiesis and haematological malignancy (Supplementary Table 1) and identified somatic mutations in 52 of these genes (Supplementary Table 2). To test for selection, we used the dNdScv algorithm27, an implementation of dN/dS (the ratio of the number of non-synonymous substitutions per non-synonymous site to the number of synonymous substitutions per synonymous site) that corrects for trinucleotide mutation rates, sequence composition and variable mutation rates across genes. This identified positive selection of missense and/or truncating variants in 17 of these genes (dN/dS > 1 with q < 0.1) (Extended Data Figs. 1d–e, 2, Supplementary Table 3). We focussed on these genes for further analysis.
At least one somatic non-synonymous mutation was identified in 305 of 385 individuals (79.2%). Clonal haematopoiesis prevalence, average clone size and number of mutations per individual all increased with advancing age, and clonal haematopoiesis was identified in more than 90% of those aged 85 years or older (Fig. 1b, c). Mutations were most common in the epigenetic regulator genes TET2 and DNMT3A, and were also frequent in ASXL1, TP53, PPM1D and spliceosome genes (Fig. 1d, top). Notably, in this elderly cohort, advancing age affected the prevalence of different driver mutations in a gene-dependent manner (Fig. 1d, bottom). In particular, the prevalence of DNMT3A mutations showed no significant relationship with age overall (P = 0.12 for a binomial regression of gene prevalence versus age, controlling for sex). By contrast, the prevalence of clones with TET2 mutations increased consistently with age, averaging 6.8% per year (P = 0.00037), as did that of clones with mutations in splicing genes (U2AF1, SRSF2 and SF3B1), which increased by 5.4% per year (P = 0.025). These changes in driver prevalence with age could not have resulted from exclusion of individuals with haematological malignancies, as the incidence of these in the complete SardiNIA cohort was only 0.28% (22 out of 7,816), the majority of which were lymphoid.
Most clones expand steadily in older age
To investigate clonal behaviour over time, we used serial variant allele fraction (VAF) measurements—the fraction of sequencing reads reporting a mutation—as a surrogate for clone size, and fitted a saturating (logistic) exponential curve with a constant growth rate over time to each clonal trajectory. Such logistic growth behaviour is supported by simulations of evolutionary dynamics using Wright–Fisher models with constant fitness28 (Extended Data Fig. 3a, b). Remarkably, by assessing the fit between serial VAF measurements and the trajectories inferred by our model, we find that the great majority of clones (92.4%) expanded at a constant exponential rate over the study period (Fig. 2a, b, Extended Data Fig. 3c). The predominance of fixed-rate growth was particularly marked for genes such as DNMT3A and TET2, for which 99% and 94.3% of clones, respectively, grew steadily over time. Nevertheless, some clones behaved unpredictably, with proportions varying by mutant gene. Most notable were JAK2V617F-mutant clones, which showed irregular growth trajectories, with only 58% displaying stable growth. The likelihood of mutant clones displaying non-constant growth at older age was not affected either by the number of mutations in the same individual or by the number of available serial samples (Extended Data Fig. 3d, e).
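The saturating trajectory described above can be illustrated with a short sketch. It assumes a heterozygous clone whose VAF follows a logistic curve capped at 0.5; the function names, the growth coefficient and the offset are hypothetical values chosen for illustration, not fitted estimates from this study.

```python
import math


def ilogit(x: float) -> float:
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-x))


def expected_vaf(age: float, growth: float, offset: float) -> float:
    """Expected VAF of a clone at a given age under logistic growth.

    Assumption: the mutation is heterozygous, so the VAF saturates at 0.5
    when the clone makes up the whole sampled population.
    """
    return 0.5 * ilogit(growth * age + offset)


# Hypothetical clone: growth coefficient 0.1 per year, offset -10.
trajectory = [(age, expected_vaf(age, 0.1, -10.0)) for age in range(60, 91, 10)]
for age, vaf in trajectory:
    print(f"age {age}: expected VAF {vaf:.4f}")
```

While the clone is small the curve is close to a pure exponential, which is why a constant growth coefficient corresponds to a stable exponential rate over the study period.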
We further assessed the consistency of clonal trajectories by testing our ability to predict future clonal growth. Using additional prospectively-obtained blood samples from 11 individuals, we compared observed versus predicted VAFs (Extended Data Fig. 3f–h, Supplementary Table 4) and found good concordance (mean absolute error 3.5%), corroborating our model and providing further evidence that fixed-rate growth of clones is the norm in old age.
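As an illustration of the concordance metric, the mean absolute error between observed and predicted VAFs could be computed as in the sketch below. The function name and the example values are hypothetical and are not data from the study.

```python
def mean_absolute_error(observed, predicted):
    """Mean absolute difference between paired observed and predicted VAFs."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical VAFs (as fractions) for three clones at a later time point.
observed_vaf = [0.10, 0.20, 0.05]
predicted_vaf = [0.12, 0.17, 0.05]
print(f"MAE = {mean_absolute_error(observed_vaf, predicted_vaf):.4f}")
```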
Determinants of clonal growth rate
To delineate the factors that determine each clone’s growth rate, our logistic regression model fits the following contributions of the driver mutation: (1) mutated gene; (2) specific amino acid change (in recurrently mutated sites), and (3) mutation type (truncating versus non-truncating) (Supplementary Table 5). An additional component in our model, measuring variation not captured by (1)–(3), was also used and termed ‘unknown-cause growth’ (Extended Data Fig. 3i).
We found that clones bearing mutations in different genes expanded at different rates, with mutations affecting DNMT3A and TP53 displaying the slowest average annual growth rates of approximately 5% per year (Fig. 2c, Supplementary Table 6). Clones with mutations in the other most common driver genes (TET2, ASXL1, PPM1D and SF3B1) expanded at roughly twice this rate, that is, about 10% per year. The most rapidly expanding clones were those carrying mutations in SRSF2, PTPN11 and U2AF1, which grew at 15–20% per year on average. The only specific mutation displaying distinctive behaviour was SRSF2P95H, which was associated with significantly faster expansion compared with other SRSF2 mutations. By contrast, all other hotspot mutations drove growth at rates similar to mutations elsewhere in the same gene, including commonly mutated sites such as DNMT3AR882, SF3B1K666N and SF3B1K700E.
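To put these annual rates in perspective, they can be converted to doubling times. The sketch below assumes that the quoted percentages are compounding annual increases in clone size during the exponential phase, which is an interpretation rather than something stated in the text; the function name is hypothetical.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for a clone to double in size.

    Assumption: `annual_growth` is a compounding fractional increase per
    year (0.05 means 5% per year) and growth is far from saturation.
    """
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    return math.log(2.0) / math.log(1.0 + annual_growth)


for label, rate in [("5% per year", 0.05), ("10% per year", 0.10), ("50% per year", 0.50)]:
    print(f"{label}: doubling time about {doubling_time_years(rate):.1f} years")
```

Under this assumption a clone growing at 5% per year takes roughly 14 years to double, whereas one growing at 50% per year doubles in under 2 years.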
For most genes, truncating and missense mutations drove similar rates of growth, including TET2 and DNMT3A, in keeping with the similar functional consequences of these two types of mutation in these genes29,30. Exceptions were (1) TP53, for which clones with missense mutations expanded by 10% per year (90% confidence interval [3–18%]) faster than truncating mutations (which usually did not expand or even contracted), consistent with the reported strong dominant-negative effect of missense mutations in this gene31, and (2) CBL, for which clones with missense mutations grew 11% per year (90% confidence interval [3–19%]) slower than truncating mutations (Fig. 2c, Extended Data Fig. 3j, Supplementary Table 6).
To quantify the impact of factors other than driver mutations, we compared the observed growth rate of each clone with that predicted by the mutation (Fig. 2d). In Fig. 2d, vertical spread represents variability in growth rate between clones with the same driver mutation. On average, this growth of unknown cause contributed approximately ±5% per year to clonal expansion (Fig. 2e). Consequently, for fast-growing clones, including those associated with SRSF2P95H or mutant U2AF1, this effect was proportionately small and there was relatively little inter-individual variability in growth rate. By contrast, the effect on slow drivers such as DNMT3A was more substantial, with some clones growing twice as rapidly as predicted by the mutation, and others showing negligible expansion. Clones harbouring JAK2V617F mutations were an exception as they displayed an unusually high degree of inter-individual variability in relation to average growth rate (Fig. 2d, e, Extended Data Fig. 4a). In view of the well-described heritable contribution to myeloproliferative neoplasm (MPN) susceptibility23,24, we tested whether JAK2V617F-mutant clones grew more quickly in individuals carrying MPN risk alleles, but found no such relationship (Extended Data Fig. 4b, Supplementary Table 7).
The more general observation that certain individuals harboured more mutations in the same gene than would be expected by chance (Extended Data Fig. 4c) suggests that non-mutation factors influencing clonal growth are both individual- and gene-specific. We found no evidence that these non-mutation factors include either sex or smoking history, and initial clone size made only a small contribution. By contrast, age was a significant factor specifically for TET2-mutant clones, which grew faster in older individuals (Spearman’s rho = 0.31; sum of squared rank differences (S) = 1.15 × 10⁶; n = 216 TET2 clones; adjusted P = 2 × 10⁻⁶) (Extended Data Fig. 4d–g).
Lifelong natural history of clonal haematopoiesis
To compare the longitudinal clonal behaviours we observed in older age with lifelong clonal dynamics, we began by deriving haematopoietic colonies and subjecting them to whole-genome sequencing (WGS). We analysed 96 colonies, each originating from a single stem or progenitor cell and expanded in vitro to a clone of hundreds to thousands of cells, from each of three individuals with splicing gene mutations (Fig. 3a–c, Extended Data Fig. 5). We focused on splicing gene mutations because previous reports suggested a sharp increase in the prevalence of these driver mutations late in life3. We constructed phylogenetic trees using somatic mutations as lineage-tracing barcodes and, since HSCs accumulate mutations at a near constant rate, we used phylogenetic branch lengths to time the onset of clonal expansions (‘clades’)32–37. In PD41276, the phylogeny was dominated by an SF3B1K666N-mutant clone, beginning between 23 and 47 years of age, with only a single SF3B1-wild-type colony, consistent with a near-complete clonal sweep (Fig. 3a). In PD34493, SF3B1K666N was acquired before the age of 35 years, whereas U2AF1Q157R initiated clonal growth later (41–61 years of age) in a previously expanded clade lacking recognizable drivers (Fig. 3b). Notably, an additional apparently driverless expansion, a phenomenon that has been recognized to occur in old age2,36, was observed in this individual (Fig. 3b), and three further such expansions were observed in PD41305 (Fig. 3c). In PD41305, since the SRSF2P95H mutation was present in only one colony, we could time its acquisition only to the broad interval between 13 years of age and the age of sampling (73 years of age).
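The principle of timing a clonal expansion from branch lengths can be sketched as follows. This is a deliberately simplified illustration that assumes a strictly constant mutation rate and ignores corrections for mutations acquired during development or missed by sequencing; the function name, mutation counts and rate are hypothetical.

```python
def age_interval_for_branch(muts_at_start: float, muts_at_end: float,
                            muts_per_year: float) -> tuple:
    """Age window in which a mutation on a given branch was acquired.

    `muts_at_start` and `muts_at_end` are cumulative mutation counts from
    the root of the tree to the start and end of the branch carrying the
    driver. Assumption: mutations accumulate at a constant rate from birth.
    """
    if muts_per_year <= 0:
        raise ValueError("muts_per_year must be positive")
    if muts_at_end < muts_at_start:
        raise ValueError("branch end must not precede branch start")
    return (muts_at_start / muts_per_year, muts_at_end / muts_per_year)


# Hypothetical branch spanning 320 to 640 mutations at 16 mutations per year.
lower, upper = age_interval_for_branch(320, 640, 16.0)
print(f"driver acquired between {lower:.0f} and {upper:.0f} years of age")
```

A driver found on a long branch therefore yields a wide interval, which is why a mutation present in a single colony can only be timed to a broad window.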
We next used the timing and density of clonal branchings (also known as ‘coalescences’) to reconstruct the entire growth trajectories of expanded clades using phylodynamic principles33,38,39 (Fig. 3d–h). This revealed that the three clades with identified drivers (SF3B1K666N and U2AF1Q157R in PD34493, and SF3B1K666N in PD41276), expanded (Fig. 3d–f) at calculated rates similar to those observed in our time-series VAF measurements during older age (Fig. 3i, left). Of note, SF3B1K666N was associated with a substantially different growth rate in PD41276, where it expanded at 28% per year according to serial VAFs (29% per year by phylodynamic estimate), versus 10% per year in PD34493 (17% per year by phylodynamics) (Fig. 3i). Reasons for this difference are unclear, but it is notable that the faster-growing clone had antecedent Y loss (Fig. 3a), an aberration seen in clades from all three individuals and associated with only modest clonal expansion when isolated (Fig. 3a–c). Of note, clones without known drivers began to expand within the first two decades of life and grew over their lifetimes at rates similar to clones with known drivers (14–32% per year) (Fig. 3g, h, Extended Data Fig. 6).
Many clones decelerate before older age
As the phylodynamic reconstruction of a clone goes back to its inception, we investigated whether clonal growth dynamics during earlier life deviate from the stable growth observed during older age. To corroborate observations from the three individuals depicted in Fig. 3, we conducted additional phylodynamic analyses of trees derived from 1,461 whole-genome-sequenced single cell-derived colonies from another four individuals 75–81 years of age from the study by Mitchell et al.36. This revealed that, in many instances, the reconstructed effective population size (Neff) of any individual clone grew more slowly towards the sampling date and before it saturated the HSC compartment (Fig. 4a, b, Extended Data Fig. 7a–c). This characteristic deceleration was quantified by fitting a biphasic exponential growth model to early and late parts of the trajectories (Fig. 4c). In most cases, extrapolating early growth (a consistent estimator of the fitness advantage of a clone in Wright–Fisher simulations; Extended Data Figs. 7d, 8) led to substantial overestimations of clade size (median 35×; Fig. 4d, Extended Data Fig. 7e).
We used our longitudinal cohort to orthogonally test the lifelong stability of clonal growth by extrapolating the observed (fitted) trajectory of each clone backwards in time to infer the age at clonal onset. To account for stochastic drift, which can lead to faster growth of small clones, and the finite carrying capacity of the HSC population, which naturally limits or slows large clones, we derived and used an approximation to a Wright–Fisher process (Extended Data Fig. 4a, b). Whereas estimates of age at clonal onset agreed with phylogenetic estimates for the fast-growing splice factor mutations (Fig. 3i), for many other clones, constant lifelong growth at the rate we observed during old age would be too slow to explain the observed VAFs (Fig. 4e–g), suggesting that clonal expansion was faster in earlier life. These observations reveal that, at least for some clones and genes, the dynamics observed in later life are not representative of those that prevail earlier.
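The logic of backward extrapolation can be illustrated with a deterministic simplification. The sketch below is not the Wright–Fisher approximation used in the study: it ignores stochastic drift and assumes logistic growth from a single mutant cell in a fixed HSC pool. The pool size, function names and example values are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def age_at_onset(age_observed: float, vaf: float, growth_coef: float,
                 n_hsc: int = 100_000) -> float:
    """Extrapolate a clone backwards to the age at which it was one cell.

    Assumptions: heterozygous mutation (clone fraction = 2 x VAF), a
    constant growth coefficient on the logit scale, and a hypothetical
    pool of `n_hsc` stem cells.
    """
    if not 0.0 < vaf < 0.5:
        raise ValueError("vaf must lie strictly between 0 and 0.5")
    clone_fraction = 2.0 * vaf
    start_fraction = 1.0 / n_hsc
    elapsed = (logit(clone_fraction) - logit(start_fraction)) / growth_coef
    return age_observed - elapsed


# Hypothetical clones observed at VAF 5% at 70 years of age.
slow = age_at_onset(70.0, 0.05, math.log(1.05))  # about 5% per year
fast = age_at_onset(70.0, 0.05, math.log(1.50))  # about 50% per year
print(f"slow clone onset: {slow:.1f} years; fast clone onset: {fast:.1f} years")
```

In this toy setting the slow clone is projected to start long before birth, which is the kind of implausible solution that points to faster growth earlier in life.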
We then assessed the minimum lifetime rate at which clones must have grown in order to reach the observed VAFs in our longitudinal data (hereafter termed ‘historical growth’) by restricting fits and solutions to growth rates that would place the age of clonal onset within individuals’ lifetimes (Fig. 4h, Supplementary Table 8). As expected, this minimal historical growth rate was typically higher than the growth rate observed during the study period (that is, in older age; Fig. 4i, Extended Data Fig. 7f). Moreover, the fold changes between historical and observed growth rates derived from longitudinal data were qualitatively in good agreement with the fold changes between late growth and expected growth (the latter assuming growth is constant through life and carrying capacity is fixed) derived from phylodynamic data (Fig. 4c, i, Extended Data Fig. 7f). Thus, many clones grew more rapidly early in life than in old age.
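The idea of a minimum historical growth rate can be sketched with the same kind of deterministic simplification. The example assumes logistic growth from a single cell in a fixed HSC pool and places the earliest possible onset at birth (age 0) rather than at conception; the pool size, function names and example values are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def minimum_historical_growth(age_observed: float, vaf: float,
                              n_hsc: int = 100_000) -> float:
    """Smallest constant growth coefficient (per year, logit scale) that
    lets a clone starting as one cell at age 0 reach `vaf` by `age_observed`.

    Assumptions: heterozygous mutation (clone fraction = 2 x VAF) and a
    hypothetical pool of `n_hsc` stem cells.
    """
    if not 0.0 < vaf < 0.5:
        raise ValueError("vaf must lie strictly between 0 and 0.5")
    if age_observed <= 0:
        raise ValueError("age_observed must be positive")
    return (logit(2.0 * vaf) - logit(1.0 / n_hsc)) / age_observed


# Hypothetical clone observed at VAF 5% at 70 years of age.
s_min = minimum_historical_growth(70.0, 0.05)
annual_percent = (math.exp(s_min) - 1.0) * 100.0
print(f"minimum historical growth: about {annual_percent:.1f}% per year")
```

If the rate measured during the study period is lower than this bound, the clone must have grown faster at some earlier point.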
Driver genes and lifelong clonal growth
The effect of deceleration was most marked for clones bearing mutations in DNMT3A, BRCC3 and TP53, whose early growth was at least twice as fast as that measured during old age (Fig. 4i, j). Conversely, we observed almost no deceleration of fast-growing clones harbouring U2AF1, SRSF2P95H, PTPN11 or IDH1 mutations (Fig. 4i, j). It is particularly notable that the TET2-mutant clones were much less susceptible to deceleration than DNMT3A-mutant clones (Fig. 4i, j). This is consistent with the observation that the prevalence of TET2-mutant clonal haematopoiesis is higher at older ages and eventually exceeds that of DNMT3A-mutant clonal haematopoiesis, which is more prevalent at younger ages (Fig. 1d). A declining relative advantage of DNMT3A mutations in older age was also suggested by the much lower proportion of DNMT3A-mutant clones reaching detectable limits during our study period compared with clones bearing mutations in other genes (‘incipient clones’) (Extended Data Fig. 9a).
To derive representative ranges for age at clone onset for each driver gene, we capped individual estimates at conception, thus avoiding estimates that projected beyond individuals’ lifetimes (Fig. 4k, Extended Data Fig. 9b, c). We also validated this method using simulations and confirmed that these ranges are not affected by changes in Neff or generation time (Extended Data Fig. 9d, e). We estimated that the average latency between clone foundation and detection in peripheral blood at VAF ≥ 0.2% (Supplementary Note 1) was 30 years across all clones, with considerable variability between mutant genes, ranging from 38 years for DNMT3A-mutant clones to 12 years for U2AF1-mutant clones. Most drivers were projected to initiate expansions of clones throughout life, compatible with the notion that somatic mutations occur at a constant rate32–34. However, solutions for DNMT3A-mutant clones concentrated earlier in life, consistent with early initiation and rapid expansion followed by marked deceleration then slow growth, as previously mentioned. Of note, capping onset at conception is arbitrary and it remains possible that some clones start later and exhibit faster initial growth followed by even stronger deceleration, a scenario that would be more consistent with published fitness estimates of 11–19% per year based on cross-sectional VAF measurements13. By contrast, SRSF2P95H and U2AF1 mutations initiated clonal expansion always after 30 years of age and with a median age at onset of 58 and 57 years, respectively (Fig. 4k). This indicates that the reported rarity of these mutant clones1–3 in people aged less than 60 years is not owing to slow growth over decades, but rather owing to their late onset followed by rapid expansion, and provides a plausible explanation for the high risk of leukaemic progression associated with these mutations5,40.
Clonal haematopoiesis dynamics and malignancy
To investigate the links between mutation fitness and malignant progression, we built on our previous study of acute myeloid leukaemia (AML) risk prediction5, and revealed that among clonal haematopoiesis driver genes, a faster growth rate was associated with a higher AML risk (adjusted R² = 0.55, P = 0.0037; Fig. 5a). For example, genes driving fast clonal haematopoiesis growth, such as SRSF2 and U2AF1, were associated with the highest risks of leukaemogenesis, whereas slow-growing clones, such as those bearing DNMT3A mutations, conferred a lower risk. To confirm our findings in larger studies and include myeloid malignancies other than AML, we analysed large published datasets of AML41 (n = 1,540) and myelodysplastic syndromes42 (MDS) (n = 738) using a site-specific extension of the dNdScv algorithm to formally quantify the extent to which individual hotspots are under the influence of positive selection in these cancers25 (Supplementary Tables 9, 10). This analysis revealed a positive correlation between each hotspot’s growth coefficient in clonal haematopoiesis and its selection strength in myeloid cancer (adjusted R² = 0.19, P = 0.0016; Fig. 5b), corroborating the AML risk analysis. Nevertheless, the observation that the same clonal haematopoiesis driver gene can progress to either AML or MDS, with variable predilections as quantified by gene-level dN/dS comparison (Extended Data Fig. 10, Supplementary Table 10), suggests that factors other than growth rate can also influence a mutation’s malignant potential.
Discussion
The phenomenon of clonal haematopoiesis has served as an exemplar in the developing understanding of somatic mutation, clonal selection and oncogenesis in human tissues10,43. However, the nature of these interrelated processes can change over time and their consequences develop only slowly, making them difficult to investigate. Here, we studied the longitudinal behaviour of clonal haematopoiesis over long periods (median 13 years) and combined this with lifelong phylodynamic analyses of haematopoiesis to derive new insights into these fundamental biological processes.
First, we found that most clones (92%) display stable exponential growth dynamics in older age, at rates influenced by their driver mutations. This enabled us to predict future clonal growth trajectories, a finding with potentially useful implications for clinical practice (Extended Data Fig. 3f–h). Notably, mutations in DNMT3A, reportedly the most common clonal haematopoiesis driver gene1,2,4, were associated with slower clonal expansion than most other clonal haematopoiesis genes. Also, DNMT3A hotspot mutations (for example, at codon R882) were not associated with faster growth than other DNMT3A mutations (Fig. 2c). By contrast, TET2-mutant clones expanded significantly faster over the study period (Fig. 2c) and, reflecting this, also reached detectable levels much more frequently on-study than DNMT3A-mutant clones (Extended Data Fig. 9a). This resulted in TET2 becoming the most prevalent clonal haematopoiesis driver after the age of 75 years (Fig. 1d).
These findings suggested that, although clonal growth is remarkably stable in old age, dynamics in earlier life may deviate from this behaviour, challenging the premise that mutation fitness is constant over the human lifespan13. To test this, we first attempted to derive when individual clonal haematopoiesis clones were founded, using simple retrograde extrapolation of observed trajectories. This led to projected ages at clonal foundation that preceded conception for a large number of clones (Fig. 4f, g), implying that their early growth must have been faster than that we observed during old age. This was most striking for DNMT3A, for which more than two thirds of projections were implausible (that is, onset pre-conception), but less common for TET2 and very uncommon for splicing factor genes (Fig. 4g).
To further investigate lifelong clonal behaviour, we analysed haematopoietic phylogenies from healthy old individuals and found that aged haematopoiesis was dominated by a small number of expanded HSC clones, some of which lacked recognizable drivers36. Using phylodynamic approaches to track clonal growth rates through life, in conjunction with findings from our longitudinal cohort, we reveal widespread clonal deceleration prior to the period of stable growth during old age, in the context of an increasingly competitive oligoclonal HSC compartment (Fig. 4i). DNMT3A-mutant clones, as well as those bearing mutations in TP53 and BRCC3 and also apparently driverless clones, were among those displaying the most marked degree of deceleration (Fig. 4i). The faster growth of DNMT3A-mutant clones in early life is supported by comparison with the findings of Watson et al.13, who analysed cross-sectional VAF spectra from 50,000 individuals and estimated average clonal growth rates across the first 55 years of life: their estimate for younger individuals (15.0% per year) was substantially higher than the rate we observed in older individuals in our study (6.2% per year) (Supplementary Table 11). By contrast, TET2 mutations appeared to drive more stable lifelong growth (Fig. 4h–j), which may underlie their apparent ability to initiate clonal expansion fairly uniformly through life (Fig. 4k) and the fact that TET2 ‘overtakes’ DNMT3A as the most common clonal haematopoiesis driver after 75 years of age (Fig. 1d and ref. 44).
In diametric contrast to DNMT3A and unlike other genes, clonal haematopoiesis driven by mutant U2AF1 and SRSF2P95H initiated only late in life (Fig. 4k) and exhibited some of the fastest expansion dynamics (Fig. 2c). These data were corroborated by phylogenetic analyses (Fig. 3b, f) and tally with the sharp increase in prevalence of splice factor-mutant clonal haematopoiesis3, MDS42,45,46 and AML41,47 in old age and the high risk of progression to myeloid cancers associated with these mutations5. The particular behaviour of these clones suggests a specific interaction with ageing, which could relate to cell-intrinsic factors or to cell-extrinsic changes in the ageing haematopoietic niche that favour splice factor mutations48,49.
Finally, we explored the relationship between clonal growth rate in clonal haematopoiesis and the development of myeloid cancers. We find that mutations associated with faster clonal haematopoiesis growth are also those associated with higher risk of progression to AML (Fig. 5a) and are under the strongest selective pressure in AML and MDS (Fig. 5b). Indeed, we show that the average annual growth per gene explains more than 50% of the variance in the risk of progression to AML. This shows that an improved understanding of growth dynamics in clonal haematopoiesis can help identify those at risk of myeloid malignancies.
Collectively, our work gives new insights into the lifelong clonal dynamics of different subtypes of clonal haematopoiesis, the impact of ageing on haematopoiesis, and the processes linking somatic mutation, clonal expansion and malignant progression.
Methods
Study participants
Ethical permission for this study was granted by The East of England (Essex) Research Ethics Committee (REC reference 15/EE/0327). The SardiNIA longitudinal study recruited individuals from four towns in the Lanusei Valley in Sardinia, capturing 5 phases of sample and data collection26 over more than 20 years. Informed consent was obtained from all participants. We analysed serial samples from 385 individuals in the SardiNIA project.
Targeted sequencing and variant calling
Target enrichment of whole-blood DNA was performed using a custom RNA bait set (Agilent SureSelect ELID 3156971), designed complementary to 56 genes implicated in clonal haematopoiesis and haematological malignancies (Supplementary Table 1). Libraries were sequenced on Illumina HiSeq 2000 and variant calling was performed as we described previously5,50. In brief, somatic single-nucleotide variants and small indels were called using Shearwater (v.1.21.5), an algorithm designed to detect subclonal mutations in deep sequencing experiments51. Two additional variant-calling algorithms were applied to complement this approach: CaVEMan (v.1.11.2) for single-nucleotide variants, and Pindel (v.2.2) for small indels52,53. VAF correction was performed using an in-house script (https://github.com/cancerit/vafCorrect). Finally, allele counts at recurrent mutation hotspots were verified using an in-house script (https://github.com/cancerit/allelecount). Variants were filtered as we described previously5,50, but were not curated with regard to existing notions of oncogenicity, that is, all somatic variants passing quality filters were retained for analysis.
If a variant was identified in an individual at any time point in the study, this site was re-queried in the same individual at all other time points, using an in-house script (cgpVAF) to provide pileup (SNV) and Exonerate (indel) output (https://github.com/cancerit/vafCorrect). No additional filters were applied to these back-called variants.
Selection analyses
To quantify selection, we used the dNdScv algorithm, a maximum-likelihood implementation of dN/dS, which measures the ratio of non-synonymous (N) to synonymous (S) mutations, while controlling for gene sequence composition and variable substitution rates27. We first applied this method to the mutation calls from the longitudinal SardiNIA cohort in order to identify which genes are under positive selection in the context of clonal haematopoiesis. For this analysis, any mutation that was present in a single individual at multiple time points was counted only once. We also compared dN/dS ratios at the beginning and end of study, and found the latter to be higher, consistent with stronger cumulative effects of selection at older ages (Supplementary Note 2).
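For intuition, the quantity being estimated can be shown with a naive dN/dS ratio. The sketch below is not the dNdScv algorithm: it applies no correction for sequence composition or variable substitution rates, and the function name, counts and site numbers are hypothetical.

```python
def naive_dnds(n_nonsyn: int, n_syn: int,
               nonsyn_sites: float, syn_sites: float) -> float:
    """Naive dN/dS: non-synonymous substitutions per non-synonymous site
    divided by synonymous substitutions per synonymous site.

    Values above 1 suggest positive selection; this uncorrected ratio is
    for illustration only.
    """
    if n_syn == 0 or nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("need at least one synonymous mutation and positive site counts")
    return (n_nonsyn / nonsyn_sites) / (n_syn / syn_sites)


# Hypothetical gene with three times as many non-synonymous as synonymous sites.
print(naive_dnds(n_nonsyn=30, n_syn=10, nonsyn_sites=300, syn_sites=100))  # neutral
print(naive_dnds(n_nonsyn=90, n_syn=10, nonsyn_sites=300, syn_sites=100))  # excess
```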
To characterize patterns of selection in AML and MDS, we applied dNdScv to two published data sets. The AML set was derived from 1,540 patients enrolled in three prospective trials of intensive therapy41. The MDS set included 738 patients with MDS or closely related neoplasms such as chronic myelomonocytic leukaemia42. Both used deep targeted sequencing of 111 cancer genes, which overlapped with 13 of the 17 genes of interest in our longitudinal clonal haematopoiesis study (PPM1D, CTCF, GNB1 and BRCC3 were not sequenced in the AML or MDS studies). We called and filtered variants in the 13 overlapping genes using the strategy described above (‘Targeted sequencing and variant calling’). Variants were identified in all 13 genes in both AML and MDS datasets (Supplementary Table 10). We calculated dN/dS values both at the level of individual genes, and at single-site level for hotspots, the latter using the sitednds function in the dNdScv R package.
Finally, we compared dN/dS ratios in shared and private branches of the three phylogenies, and found selection to be stronger in the former, consistent with the fact that mutations along shared branches were the ones driving subsequent clonal expansions (and therefore were more strongly selected) (Supplementary Note 2).
Modelling of clone trajectories through time
We use Bayesian hierarchical modelling to model clonal trajectories. Since we are unable to reliably phase different mutations into specific clones (Supplementary Note 3) and given that individual clonal haematopoiesis clones typically harbour a single driver mutation54, we assume that each mutation is heterozygous and its VAF is representative of the prevalence of a single clone. Accordingly, for a given individual $j$ and mutation $i$, we have a mutant clone $c_{ij}$. We model the counts for $c_{ij}$ at age $t$ as a binomial distribution (Bin), such that $\mathrm{counts}_{ij}(t) \sim \mathrm{Bin}(\mathrm{cov}_{ij}(t), p_{ij}(t))$, with $\mathrm{cov}_{ij}(t)$ as the coverage of this mutation at age $t$ and $p_{ij}(t)$ as the expected proportion of mutant allele copies. As such, $\mathrm{counts}_{ij}(t) \sim \mathrm{BB}(\mathrm{cov}_{ij}(t), \alpha_{ij}(t), \beta)$, where BB is the beta binomial distribution. Here, $\beta \sim N(\mu_{od}, \sigma_{od})$ is the technical overdispersion parameterized as a normal distribution whose parameters ($\mu_{od}$ and $\sigma_{od}$, the mean and standard deviation, respectively) are estimated using replicate data (details below) and $\alpha_{ij}(t) = \beta\, p_{ij}(t) / (1 - p_{ij}(t))$, where $p_{ij}(t) = \tfrac{1}{2}\,\mathrm{ilogit}((b_{\mathrm{gene},i} + b_{\mathrm{site},i} + b_{\mathrm{clone},ij})\, t + u_{ij})$, with ilogit representing the inverse logit function. We use this parameterization to guarantee that $E[\mathrm{counts}_{ij}(t) / \mathrm{cov}_{ij}(t)] = p_{ij}(t)$. $b_{\mathrm{gene},i}$ and $b_{\mathrm{site},i}$ are the gene and site growth effects for mutation $i$, respectively. $b_{\mathrm{clone},ij}$ is the growth effect associated exclusively with mutation $i$ in individual $j$ (that is, of mutant clone $c_{ij}$) and $u_{ij}$ is the offset accounting for the onset of different clones at different points in time. We also define the growth effect of $c_{ij}$ as $b_{ij} = b_{\mathrm{gene},i} + b_{\mathrm{site},i} + b_{\mathrm{clone},ij}$. Throughout this work we will refer to $b_{\mathrm{gene},i} + b_{\mathrm{site},i}$ as the driver (growth) effect and to $b_{\mathrm{clone},ij}$ as the unknown-cause (growth) effect: the fraction of growth that is quantifiable but not explained by the driver mutation, and is attributable to other factors that may affect clonal growth, but differ between individuals, such as age, sex, interclonal competition and others.
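The structure of this likelihood can be sketched in a few lines. The sketch assumes that the expected VAF of a heterozygous clone is half of an inverse-logit function of a linear term in age, and that the beta binomial is parameterized so that its mean equals that expected VAF. The function names and all numerical values are illustrative and are not taken from the study's code.

```python
import math


def ilogit(x):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-x))


def expected_vaf(age, growth, offset):
    """Expected VAF of a heterozygous clone: half of the logistic clone fraction."""
    return 0.5 * ilogit(growth * age + offset)


def beta_binomial_logpmf(k, n, alpha, beta):
    """Log-probability of k mutant reads out of n under BB(n, alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_num = (math.lgamma(k + alpha) + math.lgamma(n - k + beta)
               - math.lgamma(n + alpha + beta))
    log_den = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return log_choose + log_num - log_den


def read_count_logpmf(k, n, age, growth, offset, overdispersion):
    """Beta-binomial log-likelihood with alpha chosen so that the mean VAF is p."""
    p = expected_vaf(age, growth, offset)
    alpha = overdispersion * p / (1.0 - p)  # alpha / (alpha + beta) == p
    return beta_binomial_logpmf(k, n, alpha, overdispersion)


# Illustrative values only: growth per year, offset, coverage and overdispersion
growth, offset, coverage, overdispersion = 0.15, -12.0, 50, 200.0
p70 = expected_vaf(70, growth, offset)
loglik = read_count_logpmf(5, coverage, 70, growth, offset, overdispersion)
```

In the full model the growth term is the sum of gene, site and clone-specific effects, and all parameters are inferred jointly rather than fixed as here.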
Preventing identifiability issues and reducing uninformed estimates
To address possible identifiability issues in our model, when a gene has a single mutation (JAK2V617F and IDH2R140Q), the effect is considered to occur only at the site level. To avoid estimating the dynamics of a site from a single individual, we only model the site growth effect $b_{\mathrm{site},i}$ when two or more individuals have a missense mutation on site $i$; we refer to these sites as ‘recurrent sites’. Overall, we consider a total of 17 genes and 39 recurrent sites (Supplementary Table 5).
Estimating and validating growth parameters
We fit the model described above using Markov chain Monte Carlo (MCMC) with a Hamiltonian Monte Carlo (HMC) sampler with 150–300 leapfrog steps, as implemented in greta55. We sample for 5,000 iterations and discard the initial 2,500 to obtain estimates for the distribution of our parameters. As such, our estimates for each parameter are obtained from the mean, median and 95% highest posterior density interval of 2,500 samples.
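The post-processing of a chain can be sketched as follows. The chain here is simulated from a normal distribution purely for illustration (it is not output from greta), the function names are hypothetical, and the highest-density interval is computed as the narrowest window containing 95% of the retained samples, which assumes a unimodal posterior.

```python
import math
import random
import statistics


def hpd_interval(samples, mass=0.95):
    """Narrowest interval holding at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    window = math.ceil(mass * n)  # number of samples inside the interval
    best_low, best_high = ordered[0], ordered[-1]
    for start in range(n - window + 1):
        low, high = ordered[start], ordered[start + window - 1]
        if high - low < best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high


def summarize_chain(chain, burn_in):
    """Discard the burn-in and report the mean, median and 95% interval."""
    kept = chain[burn_in:]
    return {
        "n_samples": len(kept),
        "mean": statistics.fmean(kept),
        "median": statistics.median(kept),
        "hpd95": hpd_interval(kept, 0.95),
    }


random.seed(1)
chain = [random.gauss(0.10, 0.02) for _ in range(5000)]  # stand-in for MCMC draws
summary = summarize_chain(chain, burn_in=2500)
```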
We assess the goodness-of-fit using the number of outliers detected in any trajectory and consider only trajectories with no outliers as being explained by our model and, as such, growing at a constant rate. Outliers are assessed by calculating the tail probabilities of the counts under our model with a hard cut-off at 2.5%. Thus, an observation is classified as an outlier if its tail probability is below 0.025, and as a non-outlier otherwise. We validate this approach using Wright–Fisher simulations (Supplementary Methods). We additionally assess the predictive power of this model on an additional time point that was available for a subset of individuals and that was not used in the inference of parameters in our model (Supplementary Methods).
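A minimal sketch of this outlier rule is given below. For brevity it uses a plain binomial in place of the beta binomial used in the model, and it assumes that an observation is flagged when either its lower or its upper tail probability falls below the cut-off. The function names and the example counts are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """Binomial probability computed in log space to avoid overflow (0 < p < 1)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def tail_probability(k, n, p):
    """Smaller of the lower tail P(X <= k) and the upper tail P(X >= k)."""
    lower = sum(binomial_pmf(i, n, p) for i in range(0, k + 1))
    upper = sum(binomial_pmf(i, n, p) for i in range(k, n + 1))
    return min(lower, upper)


def is_outlier(k, n, p, cutoff=0.025):
    """Flag an observation whose tail probability is below the cut-off."""
    return tail_probability(k, n, p) < cutoff


def trajectory_is_explained(observations, cutoff=0.025):
    """A trajectory is kept only if none of its time points is an outlier."""
    return not any(is_outlier(k, n, p, cutoff) for k, n, p in observations)


# Hypothetical observations: (mutant reads, coverage, model-expected VAF)
good = [(52, 1000, 0.05), (48, 1000, 0.05)]
bad = [(52, 1000, 0.05), (90, 1000, 0.05)]
```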
Estimating the technical overdispersion parameter
Technical VAF overdispersion was estimated using two distinct sets of data:
(1) Horizon Tru-Q-1 was serially diluted to VAFs of 0.05, 0.02, 0.01, 0.005 and 0 using Horizon Tru-Q-0 (verified wild-type at these variant sites), then sequenced in duplicate or triplicate;
(2) 19 SardiNIA samples with mutations across 15 genes at a range of VAFs were sequenced in triplicate.
Sample processing and analysis were performed as described in the ‘Targeted sequencing and variant calling’ section. Replicate samples were picked from the same stock of DNA, then library preparation and sequencing steps were performed in parallel. Variant calls for these replicate samples are in Supplementary Table 12.
For (1), we model the distribution over the expected VAF as a beta distribution such that $\mathrm{VAF} \sim \mathrm{Beta}(\alpha, \beta)$, and for (2) we adopt a model identical to the one described earlier in this section but use only gene growth effects ($b_{\mathrm{site}} = 0$, $b_{\mathrm{clone}} = 0$). Here, we model the counts as beta binomial (BB) with the overdispersion $\beta$ as a variable with no prior. We use MCMC with HMC sampling with 400–500 leapfrog steps as implemented in greta55 to estimate the mean and standard deviation of $\beta$. For this estimate we use 1,000 samples from the posterior distribution.
Non-mutation factors and clonal growth rate
Inherited polymorphisms and JAK2-mutant clonal growth
The SardiNIA cohort had previously been characterized using two Illumina custom arrays: the Cardio-MetaboChip and the ImmunoChip26. Inherited genotypes at 12 loci previously associated with MPN risk were extracted for the 12 individuals with JAK2V617F mutation23,24. The relationship between each individual’s total number of inherited risk alleles and JAK2-mutant clonal growth rate was assessed by Pearson’s correlation. The 46/1 haplotype, which harbours 4 SNPs in complete linkage disequilibrium, was considered as a single risk allele.
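The two steps of this analysis, counting risk alleles with a fully linked haplotype collapsed to a single locus and then correlating the count with growth rate, can be sketched as below. The SNP labels, dosages and function names are hypothetical placeholders, not genotypes from the cohort.

```python
import math


def total_risk_alleles(genotypes, haplotype_snps):
    """Sum risk-allele dosages, counting SNPs of one linked haplotype only once."""
    total = 0
    haplotype_counted = False
    for snp, dosage in genotypes.items():
        if snp in haplotype_snps:
            if not haplotype_counted:
                total += dosage
                haplotype_counted = True
        else:
            total += dosage
    return total


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical individual: four linked haplotype SNPs plus two independent loci
haplotype = {"hap_snp1", "hap_snp2", "hap_snp3", "hap_snp4"}
genotypes = {"hap_snp1": 1, "hap_snp2": 1, "hap_snp3": 1, "hap_snp4": 1,
             "locus_x": 2, "locus_y": 0}
burden = total_risk_alleles(genotypes, haplotype)
```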
Age, sex and smoking experience
We assess the association between unknown-cause growth and age by calculating Pearson correlations, both across all genes together and for each gene separately, while controlling for multiple testing. We also assess the association between unknown-cause growth and sex and smoking history using a multivariate regression in which unknown-cause growth is the dependent variable and sex and previous smoking experience are the covariates, while also controlling for age.
Determining the age at clone onset
We consider that HSC clones grow according to a Wright–Fisher model. Under this model, for a given initial HSC population size, we can consider two scenarios: a single growth process, defined by the time at which the cell first starts growing, or a two-step growth process, which additionally depends on the number of generations per year. The latter scenario is the one chosen, owing to its strong theoretical foundation and previous application to mathematical modelling of cancer evolution56. The two regimes that describe it are an initial stochastic growth regime and, once the clone reaches a sufficient population size, a deterministic growth regime. The adjustment made to the age at onset in the two-step scenario can be interpreted as first estimating the age at which the clone reached the deterministic growth phase, followed by subtracting the expected time for a clone to overcome its stochastic growth phase. For both the HSC population size and the number of generations per year we use estimates based on ref. 33. We validate this approach using simulations (Supplementary Methods), test it against our serial VAF data, and verify that changes in these two parameters do not have a marked effect on age at onset estimates by considering a range of values for each.
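The two scenarios can be sketched numerically under stated assumptions; the expressions below are a heuristic illustration and are not the formulas used in the study. The sketch assumes that the clone fraction follows an inverse-logit function of age with growth rate `growth` and offset `offset`, that the deterministic phase starts once the clone holds about one cell per unit of per-generation fitness advantage (`generations_per_year / growth` cells), and that the stochastic phase lasts about `1 / growth` years. All parameter values are hypothetical.

```python
import math


def logit(x):
    """Logit function, the inverse of the logistic curve."""
    return math.log(x / (1.0 - x))


def ilogit(x):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-x))


def onset_single_step(growth, offset, n_hsc):
    """Age at which the fitted logistic clone fraction equals one cell in n_hsc."""
    return (logit(1.0 / n_hsc) - offset) / growth


def onset_two_step(growth, offset, n_hsc, generations_per_year):
    """Heuristic two-step estimate: deterministic-phase start minus stochastic phase."""
    establishment_cells = generations_per_year / growth  # assumed establishment size
    age_deterministic = (logit(establishment_cells / n_hsc) - offset) / growth
    stochastic_years = 1.0 / growth                      # assumed stochastic duration
    return age_deterministic - stochastic_years


# Hypothetical inputs, not the values used in the study
growth, offset, n_hsc, generations_per_year = 0.15, -12.0, 100_000, 2.0
t_single = onset_single_step(growth, offset, n_hsc)
t_two = onset_two_step(growth, offset, n_hsc, generations_per_year)
```

Varying `n_hsc` and `generations_per_year` over a grid is a direct way to check how sensitive the onset estimate is to these two inputs.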
Cell colonies and phylogenetic trees
Sample preparation and sequencing
We selected 3 individuals with splicing gene mutations from the SardiNIA cohort for detailed blood phylogenetic analysis. Peripheral blood samples were drawn into Lithium-heparin tubes (vacutest, kima, 9 ml) and buccal samples were taken (Orangene DNA OG-250). Peripheral blood mononuclear cells were isolated from blood and plated at 50,000 cells per ml in MethoCult 4034 (Stemcell Technologies). After 14 days in culture, 96 single haematopoietic colonies were plucked per individual (total 288 colonies, each made up of hundreds to thousands of cells) and lysed in 50 μl of RLT lysis buffer (Qiagen).
Library preparation for WGS was performed using our low-input pipeline as previously described57,58. The 150 bp paired-end sequencing reads were generated using the NovaSeq 6000 platform to a mean sequencing depth of 15× per sample. Reads were aligned to the human reference genome (NCBI build37) using BWA-MEM.
Variant calling and filtering
Single-nucleotide variants (SNVs) and small indels were called against an unmatched reference genome using the in-house pipelines CaVEMan and Pindel, respectively52,53. ‘Normal contamination of tumour’ was set to 0.05; otherwise, standard settings and filters were applied. For all mutations passing quality filters in at least one sample, in-house software (cgpVAF, https://github.com/cancerit/vafCorrect) was used to produce matrices of variant and normal reads at each mutant site for all colonies from that individual. Copy-number aberrations and structural variants were identified using matched-normal ASCAT59 and BRASS (https://github.com/cancerit/BRASS). Low-coverage samples (mean <4×) were excluded from downstream analysis (n = 1, PD41305). Samples in which the peak density of somatic mutation VAFs was lower than expected for heterozygous changes (in practice VAF < 0.4) were suspected to be contaminated or mixed colonies, and were also excluded from further analysis (n = 3, PD41305; n = 9, PD41276; n = 3, PD34493).
Multiple post-hoc filtering steps were then applied to remove germline mutations, recurrent library prep or sequencing artefacts, and in vitro mutations, as described previously60 and detailed in custom R scripts (https://github.com/margaretefabre/Clonal_dynamics). Buccal samples were used as an additional filter; mutations were removed if the variant:normal count in the buccal sample was consistent with that expected for a germline mutation (0.5 for autosomes and 0.95 for X and Y chromosomes, binomial probability >0.01), and were retained if (1) the variant:normal count in the buccal sample was not consistent with germline (binomial probability <1 × 10−4) and (2) the mutation was not present in either of 2 large SNP databases (1000 Genomes Project and Kaviar) with MAF > 0.001.
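The buccal-sample filter can be sketched as a simple decision rule. The sketch assumes that the binomial probability is the lower-tail probability of seeing at most the observed number of variant reads given the germline expectation; the source does not state which tail is used, nor how variants falling between the two thresholds are handled, so these are returned as "unspecified". Function names and read counts are hypothetical.

```python
import math


def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    total = 0.0
    for i in range(0, k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return total


def classify_against_buccal(variant_reads, depth, on_sex_chromosome, in_snp_database):
    """Apply the germline thresholds quoted in the text to one mutation."""
    expected_vaf = 0.95 if on_sex_chromosome else 0.5
    probability = binomial_lower_tail(variant_reads, depth, expected_vaf)
    if probability > 0.01:
        return "remove"       # buccal counts consistent with a germline variant
    if probability < 1e-4 and not in_snp_database:
        return "retain"       # inconsistent with germline and absent from SNP databases
    return "unspecified"      # handling not stated in the text


germline_like = classify_against_buccal(15, 30, False, False)
somatic_like = classify_against_buccal(0, 40, False, False)
known_snp = classify_against_buccal(0, 40, False, True)
```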
Phylogenetic tree construction and assignment of mutations back to the tree
These steps were also performed as described previously60 and are detailed here: https://github.com/margaretefabre/Clonal_dynamics. In brief, samples were assigned a genotype for each mutation site passing filtering steps (‘present’ = ≥2 variant reads and probability > 0.05 that counts came from a somatic distribution; ‘absent’ = 0 variant reads and depth ≥6; ‘unknown’ = neither ‘absent’ nor ‘present’ criteria met). The proportion of ‘unknown’ genotypes going into tree-building was low: 1.5% (PD34493), 1.4% (PD41276) and 1.3% (PD41305; Extended Data Fig. 5a–c). A genotype matrix of shared mutations was fed into the MPBoot program61, which constructs a maximum parsimony phylogenetic tree with bootstrap approximation. The in-house-developed R package treemut (https://github.com/NickWilliamsSanger/treemut), which uses original count data and a maximum likelihood approach, was then used to assign mutations back to individual branches on the tree. Since individual edge length is influenced by the sensitivity of variant calling, lengths were scaled by 1/sensitivity, where sensitivity was calculated as the proportion of germline variants called (mean sensitivity: 85.4%, 87.0% and 83.5% for PD41305, PD41276 and PD34493, respectively). The approaches we used to validate the phylogenies, including comparison of MPBoot with an alternative phylogeny-inference algorithm, SCITE62, are detailed in Supplementary Methods.
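The genotype rule and the sensitivity correction quoted above translate directly into code. In this sketch the probability that the counts came from a somatic distribution is passed in as an argument, because its calculation is not described here; function names are hypothetical.

```python
def assign_genotype(variant_reads, depth, somatic_probability):
    """Classify one sample at one mutation site as present, absent or unknown."""
    if variant_reads >= 2 and somatic_probability > 0.05:
        return "present"
    if variant_reads == 0 and depth >= 6:
        return "absent"
    return "unknown"


def scale_branch_length(n_mutations, sensitivity):
    """Correct a branch length for incomplete variant-calling sensitivity."""
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    return n_mutations / sensitivity


def unknown_fraction(genotypes):
    """Proportion of 'unknown' calls in a list of genotypes."""
    return sum(1 for g in genotypes if g == "unknown") / len(genotypes)


# Hypothetical read counts for four samples at one site
calls = [
    assign_genotype(3, 15, 0.40),  # enough variant reads and plausible somatic VAF
    assign_genotype(0, 10, 0.00),  # no variant reads at adequate depth
    assign_genotype(0, 4, 0.00),   # depth too low to call absence
    assign_genotype(1, 20, 0.30),  # a single variant read is not enough
]
scaled = scale_branch_length(100, 0.854)  # 85.4% is the sensitivity quoted for PD41305
```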
Reconstruction of population trajectories
Phylogenies were made ultrametric (branch lengths normalized) using a bespoke R function (make.tree.ultrametric, https://github.com/margaretefabre/Clonal_dynamics/my_functions). With the root of the tree representing conception and the tips representing age at sampling, we scaled the age axis in two phases by: (1) assigning the first 55 mutations to the period between conception and birth (in light of evidence for a higher rate of mutation acquisition during this period36,60), and (2) scaling the axis linearly throughout life after birth (in light of evidence for a constant rate of mutation acquisition in HSCs during postnatal life32–37). We then analysed population size trajectories by fitting Bayesian nonparametric phylodynamic reconstructions (BNPR), as implemented in the phylodyn R package38,39, to clades (sets of samples in a phylogenetic tree sharing a most recent common ancestor (MRCA)) defined by either having a driver mutation on the MRCA or an MRCA branch length that spans more than 10% of the tree depth and with 5 tips or more. We also estimated the lower and upper bounds for age at onset of clonal expansion to be the limits of the branch containing the most recent common ancestor.
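The two-phase age scaling can be sketched as a piecewise-linear map from mutation depth to age. The sketch assumes that depth is measured in mutations from the root of the ultrametric tree, that birth is age 0, and that the prenatal period lasts `gestation_years`; the default of 0.75 years and the example depths are assumptions introduced for illustration.

```python
def mutations_to_age(depth, tip_depth, age_at_sampling,
                     prenatal_mutations=55, gestation_years=0.75):
    """Map mutation depth on an ultrametric tree to age in years (birth = 0)."""
    if tip_depth <= prenatal_mutations:
        raise ValueError("tip depth must exceed the prenatal mutation count")
    if depth <= prenatal_mutations:
        # Phase 1: conception (negative age) to birth, linear in mutation depth
        return -gestation_years * (1.0 - depth / prenatal_mutations)
    # Phase 2: birth to sampling, linear in the remaining mutations
    postnatal_fraction = (depth - prenatal_mutations) / (tip_depth - prenatal_mutations)
    return age_at_sampling * postnatal_fraction


# Hypothetical tree: tips at 1,200 mutations, individual sampled at age 80
age_root = mutations_to_age(0, 1200, 80.0)
age_birth = mutations_to_age(55, 1200, 80.0)
age_mid = mutations_to_age(627.5, 1200, 80.0)
age_tip = mutations_to_age(1200, 1200, 80.0)
```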
Detection of clonal deceleration
We detect deceleration using two different approaches: (1) the ratio between expected and observed clone size, using phylodynamic estimates; and (2) ratios between growth rates, namely between observed and historical growth (from longitudinal data) and between late and expected growth (from phylogenetic data). To obtain the late growth rate we fit a biphasic log-linear model to our phylodynamic estimation of Neff; this enables us to obtain an early and a late growth rate (details in the Supplementary Methods).
Expected and observed clone size
The expected clone size is calculated by extrapolating the early growth rate until the age of sampling, and is then compared with the observed clone size. The ratio between these quantities is used as a measure of deceleration (details in the Supplementary Methods).
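A minimal sketch of this extrapolation follows. It assumes exponential growth at the early rate from the point at which the early phase ends, and it takes the ratio as observed over expected so that values below 1 indicate deceleration; the direction of the ratio, the function names and the numbers are assumptions, and the exact procedure is in the Supplementary Methods.

```python
import math


def expected_clone_size(size_at_switch, early_rate, age_at_switch, age_at_sampling):
    """Extrapolate exponential growth at the early rate up to the age of sampling."""
    return size_at_switch * math.exp(early_rate * (age_at_sampling - age_at_switch))


def deceleration_ratio(observed_size, expected_size):
    """Observed over expected clone size; values below 1 suggest deceleration."""
    return observed_size / expected_size


# Hypothetical clone: 100 cells at age 40, early growth of 20% per year, sampled at 50
expected = expected_clone_size(100.0, 0.2, 40.0, 50.0)
ratio = deceleration_ratio(300.0, expected)
```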
Growth ratio in phylogenetic data
The late growth rate is the one obtained from the biphasic log-linear fit described above. The expected growth rate for the phylogenies is calculated as the growth coefficient of a sigmoidal regression that assumes a population size of 200,000 HSCs as the carrying capacity. We then use the ratio between these quantities as a measure of deceleration (1 implies no deceleration; <1 implies deceleration).
Growth ratio in longitudinal data
The observed growth rate is defined as the growth rate inferred directly from the data. The minimal historical growth is the growth rate estimate obtained by restricting clone initiation to a time after conception (age at onset > −1).
Clonal haematopoiesis dynamics and malignant progression
To calculate the association between clonal haematopoiesis dynamics and AML, we used the risk coefficients from our previous work on predicting the onset of AML5. These were calculated by fitting a Cox proportional hazards model that estimated the risk of AML onset associated with each gene (agnostic of clone size) while controlling for age, sex and cohort. We then estimated the coefficient of correlation between the AML progression risk and the expected value of the annual growth from the posterior distribution of each gene (considering gene, site and unknown-cause effects).
The association between clonal haematopoiesis dynamics and selection in MDS and AML uses the dN/dS values calculated with dNdScv, as described above (‘Selection analyses’), for two distinct cohorts from previous studies41,42. dN/dS values were calculated for all hotspots, and we calculated their coefficient of correlation with the expected value of the annual growth from the posterior distribution of each hotspot (also considering gene, site and unknown-cause effects).
Statistical analyses
All statistical analyses were conducted using the R software63. MCMC models were fitted using greta55, and hypothesis testing, generalized linear models and maximum likelihood fits were performed in base R.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41586-022-04785-z.
Acknowledgements
This work was funded by a joint grant from the Leukemia and Lymphoma Society (RTF6006–19) and the Rising Tide Foundation for Clinical Cancer Research (CCR-18–500) and by the Wellcome Trust (WT098051). M.A.F. is funded by a Wellcome Clinical Research Fellowship (WT098051). J.G.d.A. is supported by the NIHR Cambridge BRC and their opinions are not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. G.S.V. is funded by a Cancer Research UK Senior Cancer Fellowship (C22324/A23015) and work in his lab is also funded by the European Research Council, Kay Kendall Leukaemia Fund, Blood Cancer UK and the Wellcome Trust. E.F.M. is supported by the Wellcome Trust and Beit Foundation (104064/Z/14/Z) and by the EC H2020. The collection of samples and data from the SardiNIA longitudinal cohort study was supported by the Intramural Research Program of the NIH, National Institute on Aging (NIA) of the National Institute of Health (NIH) with contracts N01-AG-1- 2109 and HHSN271201100005C; and by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement 633964 (ImmunoAgeing). We thank J. Blundell and C. Watson for helpful discussions relating to comparison between the findings in this manuscript and their published work13.
Author contributions
G.S.V. and M.G. conceived and supervised the study. M.A.F., J.G.d.A. and M.S.V. carried out analyses and generated data figures. M.G. and J.G.d.A. developed and implemented the statistical modelling of clonal dynamics. V.O., E.F., M.M. and F.C. oversaw the SardiNIA cohort. V.O., E.F., M.M., E.F.M. and F.C. provided samples and data from the Immunoageing study. A.D., J.R., C.H., J.B., M.A.F. and G.S.V. processed participant samples and performed assays. F.A., N.W., J.N. and I.M. generated computational code used in this paper. E.M., M.S.C. and P.J.C. provided single-cell-derived colony WGS data and helped with data analysis/interpretation.
Peer review
Peer review information
Nature thanks Sudhir Kumar, Johannes Reiter and the other, anonymous, reviewers for their contribution to the peer review of this work. Peer review reports are available.
Data availability
The data files necessary to run the analysis in https://github.com/josegcpa/clonal_dynamics are freely available at 10.6084/m9.figshare.15029118. All sequencing data have been deposited in the European Genome–phenome Archive (EGA) (https://www.ebi.ac.uk/ega/). Targeted sequencing data have been deposited with EGA accession numbers EGAD00001007682 and EGAD00001007683; WGS data have been deposited with accession number EGAD00001007684. Data from the EGA are accessible for research use only to all bona fide researchers, as assessed by the Data Access Committee (https://www.ebi.ac.uk/ega/about/access). Data can be accessed by registering for an EGA account and contacting the Data Access Committee.
Code availability
All analyses reported in this study used the statistical software R (v.3.6.3). All R files used for the longitudinal and phylodynamic modelling and validation are publicly available at https://github.com/josegcpa/clonal_dynamics. All files used for the construction of phylogenetic trees are publicly available at https://github.com/margaretefabre/Clonal_dynamics.
Competing interests
G.S.V. is a consultant for STRM.BIO and receives a research grant from AstraZeneca. The other authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Margarete A. Fabre, José Guilherme de Almeida
Contributor Information
Moritz Gerstung, Email: moritz.gerstung@dkfz.de.
George S. Vassiliou, Email: gsv20@cam.ac.uk
Extended data
Extended data is available for this paper at 10.1038/s41586-022-04785-z.
Supplementary information
The online version contains supplementary material available at 10.1038/s41586-022-04785-z.
References
- 1. Jaiswal S, et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 2014;371:2488–2498. doi: 10.1056/NEJMoa1408617.
- 2. Genovese G, et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 2014;371:2477–2487. doi: 10.1056/NEJMoa1409405.
- 3. McKerrell T, et al. Leukemia-associated somatic mutations drive distinct patterns of age-related clonal hemopoiesis. Cell Rep. 2015;10:1239–1245. doi: 10.1016/j.celrep.2015.02.005.
- 4. Xie M, et al. Age-related mutations associated with clonal hematopoietic expansion and malignancies. Nat. Med. 2014;20:1472–1478. doi: 10.1038/nm.3733.
- 5. Abelson S, et al. Prediction of acute myeloid leukaemia risk in healthy individuals. Nature. 2018;559:400–404. doi: 10.1038/s41586-018-0317-6.
- 6. Sellar RS, Jaiswal S, Ebert BL. Predicting progression to AML. Nat. Med. 2018;24:904–906. doi: 10.1038/s41591-018-0114-7.
- 7. Lipschitz DA, Udupa KB, Milton KY, Thompson CO. Effect of age on hematopoiesis in man. Blood. 1984;63:502–509.
- 8. de Haan G, Lazare SS. Aging of hematopoietic stem cells. Blood. 2018;131:479–487. doi: 10.1182/blood-2017-06-746412.
- 9. Mohrin M, et al. Hematopoietic stem cell quiescence promotes error-prone DNA repair and mutagenesis. Cell Stem Cell. 2010;7:174–185. doi: 10.1016/j.stem.2010.06.014.
- 10. Jaiswal S, Ebert BL. Clonal hematopoiesis in human aging and disease. Science. 2019;366:eaan4673. doi: 10.1126/science.aan4673.
- 11. Desai P, et al. Somatic mutations precede acute myeloid leukemia years before diagnosis. Nat. Med. 2018;24:1015–1023. doi: 10.1038/s41591-018-0081-z.
- 12. Young AL, Challen GA, Birmann BM, Druley TE. Clonal haematopoiesis harbouring AML-associated mutations is ubiquitous in healthy adults. Nat. Commun. 2016;7:12484. doi: 10.1038/ncomms12484.
- 13. Watson CJ, et al. The evolutionary dynamics and fitness landscape of clonal hematopoiesis. Science. 2020;367:1449–1454. doi: 10.1126/science.aay9333.
- 14. McKerrell T, et al. JAK2 V617F hematopoietic clones are present several years prior to MPN diagnosis and follow different expansion kinetics. Blood Adv. 2017;1:968–971. doi: 10.1182/bloodadvances.2017007047.
- 15. Heuser M, et al. Genetic characterization of acquired aplastic anemia by targeted sequencing. Haematologica. 2014;99:e165–e167. doi: 10.3324/haematol.2013.101642.
- 16. Kulasekararaj AG, et al. Somatic mutations identify a subgroup of aplastic anemia patients who progress to myelodysplastic syndrome. Blood. 2014;124:2698–2704. doi: 10.1182/blood-2014-05-574889.
- 17. Lane AA, et al. Low frequency clonal mutations recoverable by deep sequencing in patients with aplastic anemia. Leukemia. 2013;27:968–971. doi: 10.1038/leu.2013.30.
- 18. Yoshizato T, et al. Somatic mutations and clonal hematopoiesis in aplastic anemia. N. Engl. J. Med. 2015;373:35–47. doi: 10.1056/NEJMoa1414799.
- 19. Coombs CC, et al. Therapy-related clonal hematopoiesis in patients with non-hematologic cancers is common and associated with adverse clinical outcomes. Cell Stem Cell. 2017;21:374–382.e4. doi: 10.1016/j.stem.2017.07.010.
- 20. Gibson CJ, et al. Clonal hematopoiesis associated with adverse outcomes after autologous stem-cell transplantation for lymphoma. J. Clin. Oncol. 2017;35:1598–1605. doi: 10.1200/JCO.2016.71.6712.
- 21. Wong TN, et al. Role of TP53 mutations in the origin and evolution of therapy-related acute myeloid leukaemia. Nature. 2015;518:552–555. doi: 10.1038/nature13968.
- 22. Meisel M, et al. Microbial signals drive pre-leukaemic myeloproliferation in a Tet2-deficient host. Nature. 2018;557:580–584. doi: 10.1038/s41586-018-0125-z.
- 23. Bick AG, et al. Inherited causes of clonal haematopoiesis in 97,691 whole genomes. Nature. 2020;586:763–768. doi: 10.1038/s41586-020-2819-2.
- 24. Hinds DA, et al. Germ line variants predispose to both JAK2 V617F clonal hematopoiesis and myeloproliferative neoplasms. Blood. 2016;128:1121–1128. doi: 10.1182/blood-2015-06-652941.
- 25. Zink F, et al. Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood. 2017;130:742–752. doi: 10.1182/blood-2017-02-769869.
- 26. Orrù V, et al. Genetic variants regulating immune cell levels in health and disease. Cell. 2013;155:242–256. doi: 10.1016/j.cell.2013.08.041.
- 27. Martincorena I, et al. Universal patterns of selection in cancer and somatic tissues. Cell. 2017;171:1029–1041.e21. doi: 10.1016/j.cell.2017.09.042.
- 28. Beerenwinkel N, et al. Genetic progression and the waiting time to cancer. PLoS Comput. Biol. 2007;3:e225. doi: 10.1371/journal.pcbi.0030225.
- 29. Huang Y-H, et al. Systematic profiling of DNMT3A variants reveals protein instability mediated by the DCAF8 E3 ubiquitin ligase adaptor. Cancer Discov. 2022;12:220–235. doi: 10.1158/2159-8290.CD-21-0560.
- 30. Ferrone CK, Blydt-Hansen M, Rauh MJ. Age-associated TET2 mutations: common drivers of myeloid dysfunction, cancer and cardiovascular disease. Int. J. Mol. Sci. 2020;21:626. doi: 10.3390/ijms21020626.
- 31. Boettcher S, et al. A dominant-negative effect drives selection of TP53 missense mutations in myeloid malignancies. Science. 2019;365:599–604. doi: 10.1126/science.aax3649.
- 32. Blokzijl F, et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature. 2016;538:260–264. doi: 10.1038/nature19768.
- 33. Lee-Six H, et al. Population dynamics of normal human blood inferred from somatic mutations. Nature. 2018;561:473–478. doi: 10.1038/s41586-018-0497-0.
- 34. Osorio FG, Huber AR, Oka R, Verheul M, Patel SH. Somatic mutations reveal lineage relationships and age-related mutagenesis in human hematopoiesis. Cell Rep. 2018;25:2308–2316.e4. doi: 10.1016/j.celrep.2018.11.014.
- 35. Abascal F, et al. Somatic mutation landscapes at single-molecule resolution. Nature. 2021;593:405–410. doi: 10.1038/s41586-021-03477-4.
- 36. Mitchell E, et al. Clonal dynamics of haematopoiesis across the human lifespan. Nature. 2021. doi: 10.1038/s41586-022-04786-y.
- 37. de Kanter JK, et al. Antiviral treatment causes a unique mutational signature in cancers of transplantation recipients. Cell Stem Cell. 2021;28:1726–1739.e6. doi: 10.1016/j.stem.2021.07.012.
- 38. Karcher MD, Palacios JA, Lan S, Minin VN. phylodyn: an R package for phylodynamic simulation and inference. Mol. Ecol. Resour. 2017;17:96–100. doi: 10.1111/1755-0998.12630.
- 39. Lan S, Palacios JA, Karcher M, Minin VN, Shahbaba B. An efficient Bayesian inference framework for coalescent-based nonparametric phylodynamics. Bioinformatics. 2015;31:3282–3289. doi: 10.1093/bioinformatics/btv378.
- 40. Desai P, et al. Somatic mutations precede acute myeloid leukemia years before diagnosis. Nat. Med. 2018;24:1015–1023. doi: 10.1038/s41591-018-0081-z.
- 41. Papaemmanuil E, et al. Genomic classification and prognosis in acute myeloid leukemia. N. Engl. J. Med. 2016;374:2209–2221. doi: 10.1056/NEJMoa1516192.
- 42. Papaemmanuil E, et al. Clinical and biological implications of driver mutations in myelodysplastic syndromes. Blood. 2013;122:3616–3627. doi: 10.1182/blood-2013-08-518886.
- 43. Martincorena I, Campbell PJ. Somatic mutation in cancer and normal cells. Science. 2015;349:1483–1489. doi: 10.1126/science.aab4082.
- 44. Rossi M, et al. Clinical relevance of clonal hematopoiesis in the oldest-old population. Blood. 2021;138:2093–2105. doi: 10.1182/blood.2021011320.
- 45. Haferlach T, et al. Landscape of genetic lesions in 944 patients with myelodysplastic syndromes. Leukemia. 2014;28:241–247. doi: 10.1038/leu.2013.336.
- 46. Schwartz JR, et al. The genomic landscape of pediatric myelodysplastic syndromes. Nat. Commun. 2017;8:1557. doi: 10.1038/s41467-017-01590-5.
- 47. Takita J, et al. Novel splicing-factor mutations in juvenile myelomonocytic leukemia. Leukemia. 2012;26:1879–1881. doi: 10.1038/leu.2012.45.
- 48. Latchney SE, Calvi LM. The aging hematopoietic stem cell niche: phenotypic and functional changes and mechanisms that contribute to hematopoietic aging. Semin. Hematol. 2017;54:25–32. doi: 10.1053/j.seminhematol.2016.10.001.
- 49. Griffith JF. Age-related changes in the bone marrow. Curr. Radiol. Rep. 2017;5:24.
- 50. Fabre MA, et al. Concordance for clonal hematopoiesis is limited in elderly twins. Blood. 2020;135:269–273. doi: 10.1182/blood.2019001807.
- 51. Gerstung M, Papaemmanuil E, Campbell PJ. Subclonal variant calling with multiple samples and prior knowledge. Bioinformatics. 2014;30:1198–1204. doi: 10.1093/bioinformatics/btt750.
- 52. Jones D, et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics. 2016;56:15.10.1–15.10.18. doi: 10.1002/cpbi.20.
- 53. Raine KM, et al. cgpPindel: identifying somatically acquired insertion and deletion events from paired end sequencing. Curr. Protoc. Bioinformatics. 2015;52:15.7.1–15.7.12. doi: 10.1002/0471250953.bi1507s52.
- 54. Miles LA, et al. Single-cell mutation analysis of clonal evolution in myeloid malignancies. Nature. 2020;587:477–482. doi: 10.1038/s41586-020-2864-x.
- 55. Golding N. greta: simple and scalable statistical modelling in R. J. Open Source Softw. 2019;4:1601.
- 56. Beerenwinkel N, Schwarz RF, Gerstung M, Markowetz F. Cancer evolution: mathematical models and computational inference. Syst. Biol. 2015;64:e1–e25. doi: 10.1093/sysbio/syu081.
- 57. Ellis P, et al. Reliable detection of somatic mutations in solid tissues by laser-capture microdissection and low-input DNA sequencing. Nat. Protoc. 2021;16:841–871. doi: 10.1038/s41596-020-00437-6.
- 58. Moore L, et al. The mutational landscape of normal human endometrial epithelium. Nature. 2020;580:640–646. doi: 10.1038/s41586-020-2214-z.
- 59. Van Loo P, et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA. 2010;107:16910–16915. doi: 10.1073/pnas.1009843107.
- 60. Spencer Chapman M, et al. Lineage tracing of human development through somatic mutations. Nature. 2021;595:85–90. doi: 10.1038/s41586-021-03548-6.
- 61. Hoang DT, et al. MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation. BMC Evol. Biol. 2018;18:11. doi: 10.1186/s12862-018-1131-3.
- 62. Jahn K, Kuipers J, Beerenwinkel N. Tree inference for single-cell data. Genome Biol. 2016;17:86. doi: 10.1186/s13059-016-0936-x.
- 63. R Core Team. R: A Language and Environment for Statistical Computing (2020).
Data Availability Statement
The data files necessary to run the analysis in https://github.com/josegcpa/clonal_dynamics are freely available on figshare (doi: 10.6084/m9.figshare.15029118). All sequencing data have been deposited in the European Genome–phenome Archive (EGA) (https://www.ebi.ac.uk/ega/). Targeted sequencing data have been deposited with EGA accession numbers EGAD00001007682 and EGAD00001007683; WGS data have been deposited with accession number EGAD00001007684. Data from the EGA are accessible for research use only to all bona fide researchers, as assessed by the Data Access Committee (https://www.ebi.ac.uk/ega/about/access). Data can be accessed by registering for an EGA account and contacting the Data Access Committee.
All analyses reported in this study used the statistical software R (v.3.6.3). All R files used for the longitudinal and phylodynamic modelling and validation are publicly available at https://github.com/josegcpa/clonal_dynamics. All files used for the construction of phylogenetic trees are publicly available at https://github.com/margaretefabre/Clonal_dynamics.