Author manuscript; available in PMC: 2025 Apr 1.
Published in final edited form as: Hum Genet. 2023 Feb 4;143(4):545–549. doi: 10.1007/s00439-023-02526-4

Interpreting variants in genes affected by clonal hematopoiesis in population data

Sanna Gudmundsson 1,2,3, Colleen Carlston 1,3, Anne O’Donnell-Luria 1,2,3
PMCID: PMC10400727  NIHMSID: NIHMS1886285  PMID: 36739343

Abstract

Reference population databases like the Genome Aggregation Database (gnomAD) have improved our ability to interpret the human genome. Variant frequencies and frequency-derived tools (such as depletion scores) have become fundamental to variant interpretation and the assessment of variant-gene-disease relationships. Clonal hematopoiesis (CH) obstructs variant interpretation as somatic variants that provide proliferative advantage will affect variant frequencies, depletion scores, and downstream filtering. Further, default filtering of variants or genes associated with CH risks filtering bona fide germline variants as variants associated with CH can also cause Mendelian conditions. Here we provide our insights on interpreting population variant data in genes affected by clonal hematopoiesis, as well as recommendations for careful review of 36 established CH genes associated with neurodevelopmental conditions.

Keywords: Clonal hematopoiesis, population database, variant interpretation, somatic variants


Reference population databases are a fundamental tool in variant interpretation, demonstrating the spectrum of genetic variation observed (and not observed) in the general population, and guiding the assessment of variant- and gene-disease relationships. The Genome Aggregation Database (gnomAD) is the most widely used publicly available population database, providing human sequencing data from more than 195,000 individuals through the gnomAD browser (https://gnomad.broadinstitute.org/) (Gudmundsson et al. 2022). We have previously highlighted difficulties with interpreting population variant data in genes affected by clonal hematopoiesis (Carlston et al. 2017; Karczewski et al. 2020), where somatic variation that provides a growth advantage can accumulate in blood cells (Silver, Bick, and Savona 2021). Since variant allele frequencies that are higher than expected for a Mendelian phenotype are used as strong evidence against pathogenicity (Richards et al. 2015), the effect of clonal hematopoiesis on presumed germline allele frequencies in population databases can have a major impact on variant interpretation. Here we provide our insights and recommendations for interpreting population variant data in genes affected by clonal hematopoiesis.

Somatic variants arise by chance over time, and the number of somatic variants accumulates with age (Figure 1A). Most impart no competitive advantage and will remain at a low allele balance, but occasionally a variant arises that increases cell proliferation. Whole blood is the most common source of DNA for genomic sequencing studies, including the vast majority of samples in gnomAD, so a hematopoietic proliferative advantage can result in somatic variation rising to higher allele balance and inflating variant frequencies, thereby complicating the measurement of germline allele frequency (Avramović et al. 2021; Brunet et al. 2022; Carlston et al. 2017; Karczewski et al. 2020). Efforts are made to exclude somatic variants from reference allele frequencies; for example, gnomAD includes heterozygous variant calls only when their allele balance is between 20% and 80%. However, somatic variants in genes classically known to be affected by clonal hematopoiesis, such as ASXL1, DNMT3A, and TET2, persist after this quality control process, as the proliferative advantage of these variants is so strong that it gives rise to allele balances comparable to those seen with germline variants (Karczewski et al. 2020). Variants in these genes must therefore be interpreted with caution, as somatic variation will be present in databases and can affect downstream applications and analysis. For instance, variant depletion might not be reflected in pLI (probability of being loss-of-function intolerant), LOEUF (loss-of-function observed/expected upper bound fraction), or missense constraint scores (Figure 1B), and thus downstream variant filtering in research and clinical applications using allele frequencies or these scores may require adjustment.
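The allele balance filter described above can be illustrated with a minimal Python sketch. It assumes that allele balance is the fraction of reads supporting the alternate allele, uses the 20–80% window stated in the text, and treats both bounds as inclusive (an assumption). The `HetCall` class and the function names are hypothetical and are not part of any gnomAD tooling.

```python
from dataclasses import dataclass


@dataclass
class HetCall:
    """Read counts for one heterozygous call (hypothetical structure)."""
    ref_reads: int
    alt_reads: int


def allele_balance(call: HetCall) -> float:
    """Fraction of reads that support the alternate allele."""
    total = call.ref_reads + call.alt_reads
    if total == 0:
        raise ValueError("call has no supporting reads")
    return call.alt_reads / total


def passes_het_ab_filter(call: HetCall, low: float = 0.20, high: float = 0.80) -> bool:
    """True if the allele balance falls inside the window (inclusive bounds assumed)."""
    return low <= allele_balance(call) <= high


if __name__ == "__main__":
    examples = {
        "germline-like heterozygote": HetCall(ref_reads=52, alt_reads=48),
        "small somatic clone": HetCall(ref_reads=92, alt_reads=8),
        "expanded somatic clone": HetCall(ref_reads=75, alt_reads=25),
    }
    for label, call in examples.items():
        print(label, round(allele_balance(call), 2), passes_het_ab_filter(call))
```

The third example is the case the paragraph warns about: a clone that has expanded enough to reach an allele balance of 0.25 passes the same filter as a true germline heterozygote.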

Figure 1: Clonal hematopoiesis effect on population data.

(A) Pathogenic variants that would cause a genetic condition if present as germline variants may arise as somatic variants (star). These variants can have a proliferative advantage leading to inflated allele frequencies in population databases. (B) Metrics based on presumed germline allele frequency, like gnomAD constraint scores, may be difficult to interpret in the context of known somatic mosaicism (Lek et al. 2016; Samocha et al. 2014; Brunet et al. 2022). (C) Somatic variants can be recognized by lower allele balance distributions and higher age distributions.

Deprioritization of variants with higher than expected population frequencies for disease prevalence is not infallible. For example, pathogenic gain-of-function missense variants in DNMT3A associated with both hematologic malignancies and Tatton-Brown-Rahman syndrome are enriched in population databases of blood-derived DNA (Shen et al. 2017; Brunet et al. 2022). Similarly, pathogenic ASXL1 predicted loss-of-function variants are observed in gnomAD, which could suggest incomplete penetrance of Bohring-Opitz syndrome, but this has not been reported to date (Carlston et al. 2017). Somatic mosaicism with clonal expansion in the blood is a more parsimonious explanation for the presence of these variants in reference databases, and is supported by the skewed allele balance and older age of individuals with these variants (Figure 1C). Figure 2 demonstrates skewed allele balance and sample age distribution for representative pathogenic variants in DNMT3A, ASXL1, SETBP1, PTPN11, and NRAS. Of note, age data is available for most but not all individuals in gnomAD.
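The manual assessment described above, checking whether carriers of a variant show low allele balance and older age, can be sketched as follows. This is an illustration only: the cutoffs (`ab_cutoff`, `age_cutoff`) are arbitrary placeholders rather than recommended thresholds, the function name is hypothetical, and per-carrier ages are assumed to be available as numbers, with `None` for individuals lacking age data.

```python
from statistics import median
from typing import Optional, Sequence


def summarize_carriers(
    allele_balances: Sequence[float],
    ages: Sequence[Optional[float]],
    ab_cutoff: float = 0.35,   # placeholder cutoff, not a recommended value
    age_cutoff: float = 60.0,  # placeholder cutoff, not a recommended value
) -> dict:
    """Summarize allele balance and age for the carriers of one variant."""
    if not allele_balances:
        raise ValueError("no carriers to summarize")

    # Age is missing for some individuals, so drop unknown values first.
    known_ages = [age for age in ages if age is not None]

    median_ab = median(allele_balances)
    median_age = median(known_ages) if known_ages else None

    return {
        "median_allele_balance": median_ab,
        "median_age": median_age,
        "low_allele_balance": median_ab < ab_cutoff,
        "older_carriers": median_age is not None and median_age > age_cutoff,
    }


if __name__ == "__main__":
    # Invented numbers for illustration; not taken from gnomAD.
    baseline = summarize_carriers([0.45, 0.50, 0.52, 0.48], [34, 51, None, 42])
    suspect = summarize_carriers([0.22, 0.25, 0.31, 0.28], [71, None, 68, 77])
    print(baseline)
    print(suspect)
```

A variant for which both flags are set would be a candidate for closer review as possibly somatic; neither flag alone establishes somatic origin.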

Figure 2: Variant allele balance and sample age distribution.

Manual assessment of allele balance and age distributions can differentiate variants more likely to be somatic than germline. The top row shows the expected distribution of (A) allele balance and (B) age for the ASXL1 synonymous germline variant chr20:31023500C>T, which does not show evidence of clonal hematopoiesis and provides a baseline for these parameters. The remaining rows show examples of skewed allele balance (low) and age (high) for variants in DNMT3A, ASXL1, SETBP1, PTPN11, and NRAS that are reported in ClinVar to cause autosomal dominant Mendelian disorders but that also have inflated allele frequencies, likely due to clonal hematopoiesis. Age data is available only for a subset of gnomAD individuals.

Pathogenic variants associated with disorders with autosomal recessive inheritance patterns (DNMT3B, ERCC2, TET2) are expected to be present in population databases due to unaffected carriers and are less likely to affect variant interpretation. However, variants associated with severe, early-onset conditions of autosomal or X-linked dominant inheritance are generally expected to be absent from population databases. We have provided a list of 36 previously established clonal hematopoiesis genes (Pich et al. 2022; Brunet et al. 2022) that are also associated with neurodevelopmental conditions with dominant inheritance patterns (Box 1, Supplementary Table 1). It is also worthwhile to note that pathogenic variants in genes associated with mosaicism as a primary mechanism of disease (e.g. GNAS, PIK3CA, SMO) may also be present in population databases. All the examples provided have disease-gene relationships curated in the Online Mendelian Inheritance in Man (OMIM 2022) (Amberger et al. 2019) or are established candidates for neurodevelopmental conditions per previous review (Brunet et al. 2022). It is intriguing that there are numerous chromatin modifiers (e.g. histone lysine methyltransferases and acetyltransferases) that play a dual role in both age-related clonal hematopoiesis and neurodevelopment. It has been observed that neurodevelopmental genes are enriched for genes with an epigenetic function (Ciptasari and van Bokhoven 2020), and that epigenetic genes often have important roles in cancer (Paul, Pillai, and Kumar 2021), but the core cell function(s) linking pathways of early neurodevelopment and tumorigenesis remains to be elucidated.

Box 1: Genes associated with clonal hematopoiesis (Brunet et al. 2022; Pich et al. 2022) and neurodevelopmental disorders of dominant autosomal or X-linked inheritance. Genes in bold were reported by both Brunet et al., and Pich et al.

AFF3, ARID2, ASXL1, BCOR, BRAF, BRCC3, CBL, CREBBP, CTCF, CUX1, DNMT3A, EZH2, FOXP1, GNB1, IDH2, KDM6A, KMT2C, KMT2D, KRAS, LZTR1, MYCN, NF1, NOTCH1, NRAS, PPM1D, PTPN11, PTPRD, RAD21, SETD2, SETDB1, SF3B1, SMC1A, SRSF2, STAG2, SUZ12, U2AF1

Apart from clonal hematopoiesis, the presence of disease-associated variants in population databases could also be due to post-zygotic constitutional mosaicism, variable expressivity, incomplete penetrance, age of onset beyond the age of sample collection, sample swap, or sequencing artifact (Gudmundsson et al. 2021). Importantly, genetic regions prone to sequencing artifacts can also genuinely be associated with clonal hematopoiesis or with disease, depending on context. For example, the ASXL1 p.Gly646Trpfs*12 variant is causative of Bohring-Opitz syndrome as confirmed by Sanger sequencing in a previously reported case (Urreizti et al. 2018). However, this variant is present in 121 individuals in gnomAD v2 and 87 individuals in the gnomAD genome (v3) non-v2 dataset. This variant resides in a G-homopolymer region that is prone to polymerase slippage and its presence in population sequencing data is likely due to a combination of sequencing artifacts and clonal hematopoiesis, with the latter more likely for variants with higher allele balance and no strand bias (Alberti et al. 2018). We recommend that any pathogenic/likely pathogenic variants that may be present in population databases at a higher-than-expected allele frequency be flagged to ensure that they are reviewed during analysis. At the Broad Center for Mendelian Genomics, we routinely flag ClinVar pathogenic/likely pathogenic variants with <5% allele frequency for review in exome and genome analysis.
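The flagging recommendation above can be sketched as a simple rule set. The gene set is the Box 1 list; everything else is an assumption made for illustration: the function and flag names are hypothetical, ClinVar significance is assumed to arrive as a plain string, and the 5% threshold is the one stated in the text. This is not the Broad Center for Mendelian Genomics pipeline.

```python
# The 36 genes listed in Box 1.
CH_NDD_GENES = frozenset({
    "AFF3", "ARID2", "ASXL1", "BCOR", "BRAF", "BRCC3", "CBL", "CREBBP",
    "CTCF", "CUX1", "DNMT3A", "EZH2", "FOXP1", "GNB1", "IDH2", "KDM6A",
    "KMT2C", "KMT2D", "KRAS", "LZTR1", "MYCN", "NF1", "NOTCH1", "NRAS",
    "PPM1D", "PTPN11", "PTPRD", "RAD21", "SETD2", "SETDB1", "SF3B1",
    "SMC1A", "SRSF2", "STAG2", "SUZ12", "U2AF1",
})

# Assumed spelling of the ClinVar significance terms in the input.
PLP_TERMS = frozenset({
    "pathogenic",
    "likely pathogenic",
    "pathogenic/likely pathogenic",
})


def review_flags(gene: str, clinvar_significance: str,
                 allele_frequency: float, max_af: float = 0.05) -> list:
    """Return review flags (hypothetical labels) for one variant."""
    flags = []
    is_plp = clinvar_significance.strip().lower() in PLP_TERMS
    if is_plp and allele_frequency < max_af:
        # Keep ClinVar P/LP variants visible even when a stricter
        # frequency filter would otherwise hide them.
        flags.append("clinvar_plp_below_af_threshold")
    if gene in CH_NDD_GENES:
        # Prompt a manual look at allele balance and carrier age.
        flags.append("box1_ch_gene")
    return flags


if __name__ == "__main__":
    print(review_flags("ASXL1", "Pathogenic", 0.0005))
    print(review_flags("DNMT3A", "Uncertain significance", 0.001))
```

Returning flags rather than dropping variants reflects the point made in the text: the aim is to ensure review, not to filter automatically.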

The converse scenario - a pathogenic/likely pathogenic variant that is reported in testing for a patient without any stigmata of that particular condition due to clonal hematopoiesis in the patient - also has implications for clinical practice. Brunet et al. provide examples of this involving DNMT3A, ASXL1, and PPM1D, where variant allele balance and testing of other available tissues, such as fingernails, in patients ranging from 59–85 years of age helped clarify that the variants identified by sequencing blood samples were in fact somatic. Anecdotally, we have also encountered this scenario in an asymptomatic older patient undergoing exome sequencing for cancer predisposition who was found to have a pathogenic FBN1 variant. As clinical sequencing becomes more broadly available, including large gene panels, exomes, and genomes where testing is not necessarily targeted to genes of interest, these cases reinforce the importance of careful phenotyping and, in some cases, further analysis to aid in the interpretation of a genetic test result.

In summary, population reference databases like gnomAD are essential in variant interpretation but must be applied judiciously. Prior to the debut of population databases many human genetic studies relied on relatively small numbers of cases and controls and were often underpowered to detect true disease-variant relationships. Now we have a wealth of population sequencing data mainly based on blood-derived DNA; however, it is important to recognize the phenomenon of clonal hematopoiesis and how this affects blood-derived sequencing. In some cases, manual assessment of allele balance and age distributions may help differentiate somatic versus germline variation. Furthermore, downstream analysis derived from presumed germline variant frequencies, such as depletion scores or constraint metrics (e.g. missense Z-score, pLI, and LOEUF scores), may be confounded by clonal hematopoiesis. Providing sufficient clinical information when ordering genetic testing aids clinical laboratories in assessing when a pathogenic variant is a poor fit for a clinical phenotype and thus more likely to be somatic. We hope that a broader awareness of genes susceptible to clonal hematopoiesis may help avert both missed diagnoses and misdiagnoses.

Supplementary Material

Supplementary Table 1: Genes from Box 1. Phenotype and references for 36 genes previously associated with clonal hematopoiesis and neurodevelopmental disorders of autosomal or X-linked dominant inheritance.

Acknowledgment

This work was supported by National Human Genome Research Institute grant U24HG011450 and UM1HG008900. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. S.G. was supported by The Wallenberg Foundation scholarship program for postdoctoral studies at MIT and the Broad Institute.

Funding

National Human Genome Research Institute, Grant/Award Number: U24HG011450 and UM1HG008900. S.G. was supported by the Knut and Alice Wallenberg Foundation scholarship program for postdoctoral studies at MIT and the Broad Institute.

Footnotes

Competing Interests

The authors have no relevant financial or non-financial interests to disclose.

Ethics approval

This review did not generate novel data and does not require ethics approval or consent from participants (all data is publicly available).

Data Availability

The gnomAD data is displayed on the browser https://gnomad.broadinstitute.org/, available for download on https://gnomad.broadinstitute.org/downloads through Google Cloud Public Datasets, the Registry of Open Data on AWS, Azure Open Datasets, and the UCSC genome browser.

References

  1. Alberti Michael O., Srivatsan Sridhar Nonavinkere Jin, McNulty Samantha N., Chang Gue Su, Miller Christopher A., Dunlap Jennifer B., et al. 2018. “Discriminating a Common Somatic ASXL1 Mutation (c.1934dup; p.G646Wfs*12) from Artifact in Myeloid Malignancies Using NGS.” Leukemia 32 (8): 1874–78.
  2. Amberger Joanna S., Bocchini Carol A., Scott Alan F., and Hamosh Ada. 2019. “OMIM.org: Leveraging Knowledge across Phenotype-Gene Relationships.” Nucleic Acids Research 47 (D1): D1038–43.
  3. Avramović Vladimir, Frederiksen Simona Denise, Brkić Marjana, and Tarailo-Graovac Maja. 2021. “Driving Mosaicism: Somatic Variants in Reference Population Databases and Effect on Variant Interpretation in Rare Genetic Disease.” Human Genomics 15 (1): 71.
  4. Brunet Theresa, Berutti Riccardo, Dill Veronika, Hecker Judith S., Choukair Daniela, Andres Stephanie, Deschauer Marcus, et al. 2022. “Clonal Hematopoiesis as a Pitfall in Germline Variant Interpretation in the Context of Mendelian Disorders.” Human Molecular Genetics, February. doi: 10.1093/hmg/ddac034.
  5. Carlston Colleen M., O’Donnell-Luria Anne H., Underhill Hunter R., Cummings Beryl B., Weisburd Ben, Minikel Eric V., Birnbaum Daniel P., et al. 2017. “Pathogenic ASXL1 Somatic Variants in Reference Databases Complicate Germline Variant Interpretation for Bohring-Opitz Syndrome.” Human Mutation 38 (5): 517–23.
  6. Ciptasari Ummi, and van Bokhoven Hans. 2020. “The Phenomenal Epigenome in Neurodevelopmental Disorders.” Human Molecular Genetics 29 (R1): R42–50.
  7. Gudmundsson Sanna, Karczewski Konrad J., Francioli Laurent C., Tiao Grace, Cummings Beryl B., Alföldi Jessica, Wang Qingbo, et al. 2021. “Addendum: The Mutational Constraint Spectrum Quantified from Variation in 141,456 Humans.” Nature, August. doi: 10.1038/s41586-021-03758-y.
  8. Gudmundsson Sanna, Singer-Berk Moriel, Watts Nicholas A., Phu William, Goodrich Julia K., Solomonson Matthew, Genome Aggregation Database Consortium, Rehm Heidi L., MacArthur Daniel G., and O’Donnell-Luria Anne. 2022. “Variant Interpretation Using Population Databases: Lessons from gnomAD.” Human Mutation 43 (8): 1012–30.
  9. Karczewski Konrad J., Francioli Laurent C., Tiao Grace, Cummings Beryl B., Alföldi Jessica, Wang Qingbo, Collins Ryan L., et al. 2020. “The Mutational Constraint Spectrum Quantified from Variation in 141,456 Humans.” Nature 581 (7809): 434–43.
  10. Lek Monkol, Karczewski Konrad J., Minikel Eric V., Samocha Kaitlin E., Banks Eric, Fennell Timothy, O’Donnell-Luria Anne H., et al. 2016. “Analysis of Protein-Coding Genetic Variation in 60,706 Humans.” Nature 536 (7616): 285–91.
  11. Paul Aswathy Mary, Pillai Madhavan Radhakrishna, and Kumar Rakesh. 2021. “Prognostic Significance of Dysregulated Epigenomic and Chromatin Modifiers in Cervical Cancer.” Cells 10 (10). doi: 10.3390/cells10102665.
  12. Pich Oriol, Reyes-Salazar Iker, Gonzalez-Perez Abel, and Lopez-Bigas Nuria. 2022. “Discovering the Drivers of Clonal Hematopoiesis.” Nature Communications 13 (1): 4267.
  13. Richards Sue, Aziz Nazneen, Bale Sherri, Bick David, Das Soma, Gastier-Foster Julie, Grody Wayne W., et al. 2015. “Standards and Guidelines for the Interpretation of Sequence Variants: A Joint Consensus Recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology.” Genetics in Medicine: Official Journal of the American College of Medical Genetics 17 (5): 405–24.
  14. Samocha Kaitlin E., Robinson Elise B., Sanders Stephan J., Stevens Christine, Sabo Aniko, McGrath Lauren M., Kosmicki Jack A., et al. 2014. “A Framework for the Interpretation of de Novo Mutation in Human Disease.” Nature Genetics 46 (9): 944–50.
  15. Shen Wei, Heeley Jennifer M., Carlston Colleen M., Acuna-Hidalgo Rocio Willy M., Dent Karin M., Douglas Ganka V., et al. 2017. “The Spectrum of DNMT3A Variants in Tatton-Brown-Rahman Syndrome Overlaps with That in Hematologic Malignancies.” American Journal of Medical Genetics. Part A 173 (11): 3022–28.
  16. Silver Alexander J., Bick Alexander G., and Savona Michael R. 2021. “Germline Risk of Clonal Haematopoiesis.” Nature Reviews. Genetics 22 (9): 603–17.
  17. Urreizti Roser, Gürsoy Semra, Castilla-Vallmanya Laura, Cunill Guillem, Rabionet Raquel, Erçal Derya, Grinberg Daniel, and Balcells Susana. 2018. “The ASXL1 Mutation p.Gly646Trpfs*12 Found in a Turkish Boy with Bohring-Opitz Syndrome.” Clinical Case Reports 6 (8): 1452–56.
