Nature. Author manuscript; available in PMC: 2026 Feb 6.
Published in final edited form as: Nature. 2025 Jul 2;643(8070):47–59. doi: 10.1038/s41586-025-09096-7

The Somatic Mosaicism across Human Tissues Network

Tim H H Coorens 1,*, Ji Won Oh 2,3,*, Yujin Angelina Choi 2, Nam Seop Lim 2,3, Boxun Zhao 1,4,5,6, Adam Voshall 1,4,5,6, Alexej Abyzov 7, Lucinda Antonacci-Fulton 8, Samuel Aparicio 9,10,11, Kristin Ardlie 1, Thomas J Bell 12, James T Bennett 13,14, Bradley E Bernstein 1,15,16, Thomas G Blanchard 17, Alan P Boyle 18,19, Jason D Buenrostro 1,20, Kathleen H Burns 1,16,21, Fei Chen 1,20, Rui Chen 22,23, Sangita Choudhury 1,4,6, Harsha V Doddapaneni 22, Evan E Eichler 24,25, Gilad D Evrony 26,27, Melissa A Faith 28,29, Thomas G Fazzio 30, Robert S Fulton 31, Manuel Garber 32, Nils Gehlenborg 33, Soren Germer 11, Gad Getz 1,16,34, Richard A Gibbs 22,23, Raquel G Hernandez 35,36, Fulai Jin 37,38, Jan O Korbel 39,40, Dan A Landau 11,41,42, Heather A Lawson 43, Niall J Lennon 1, Heng Li 33,44, Yan Li 37, Po-Ru Loh 1,45, Gabor Marth 46, Michael J McConnell 47, Ryan E Mills 18,19, Stephen B Montgomery 48,49,50, Pradeep Natarajan 1,51,52, Peter J Park 33,53, Rahul Satija 11,54, Fritz J Sedlazeck 22,23,55, Diane D Shao 4,6,56, Hui Shen 57, Andrew B Stergachis 24,58,59, Hunter R Underhill 60,61, Alexander E Urban 49,62, Melissa W VonDran 12, Christopher A Walsh 4,6,63, Ting Wang 8,43, Tao P Wu 23,64,65, Chenghang Zong 23, Eunjung Alice Lee 1,4,5,6, Flora M Vaccarino 66,67,68; the Somatic Mosaicism across Human Tissues Network‡,§
PMCID: PMC12875085  NIHMSID: NIHMS2133093  PMID: 40604182

Abstract

From fertilization onwards, the cells of the human body acquire variations in their DNA sequence, known as somatic mutations. These post-zygotic mutations arise from intrinsic errors in DNA replication and repair, as well as from exposure to mutagens. Somatic mutations have been implicated in some diseases, but a fundamental understanding of the frequency, type and patterns of mutations across healthy human tissues has been limited. This is primarily because any specific somatic variant is carried by only a small proportion of an individual's cells, which makes somatic variants more challenging to detect than inherited variants. Here, we describe the Somatic Mosaicism across Human Tissues (SMaHT) Network, which aims to create a reference catalog of somatic mutations and their clonal patterns across 19 different tissue sites from 150 non-diseased donors, and to develop new technologies and computational tools to detect somatic mutations and assess their phenotypic consequences, including clonal expansions. This strategy enables a comprehensive examination of the mutational landscape across the human body and provides a baseline for comparison with somatic mutations in disease. This will lead to a deeper understanding of somatic mutations and clonal expansions across the lifespan, as well as their roles in health, aging and, by comparison, disease.

Introduction

Genetic diversity within the human population has been well described. The Human Genome Project resulted in the first near-complete mapping of the human DNA sequence1, and was followed by large-scale projects, such as the 1000 Genomes Project2 and the Pangenome project3, that mapped the genetic diversity between individuals and populations. Now, there is growing recognition that extensive genetic variation exists within individuals, among different tissues and cells. Two decades after completion of the first draft human genome, the Somatic Mosaicism across Human Tissues (SMaHT) Network plans to map the genetic diversity across different tissues and cells within individuals.

From fertilization onwards, the cells of the human body continuously experience damage to their genome, either from intrinsic causes or from exposure to mutagens4–9. While the vast majority of DNA damage is repaired, and the genome is replicated with extremely high fidelity, cells steadily acquire somatic mutations throughout life. All cells within an individual harbor somatic mutations, but any given mutation is present in only a subset of the cells, or even in a single cell. Hence, somatic mutations are often described as mosaic10,11.

The detection of somatic mutations is challenging. In contrast to inherited variants, somatic mutations exist only in small and variable proportions of cells, ranging from embryonic mutations present in most cells down to mutations present in just a single cell (Fig. 1a). This challenge is exacerbated by artifacts and errors, introduced during DNA library preparation and sequencing, that resemble low-frequency mutations12. Current short-read sequencing technologies limit the detection of mutations in repetitive regions of the genome and are likely to be poorly suited to detecting somatic structural variants.

Figure 1 | Somatic mutations, causes and patterns.

a, Schematic comparison between inherited variants, an early somatic mutation and a late somatic mutation. b, Overview of causes and types of somatic mutations. RT, reverse transcriptase; EN, endonuclease. c, Overview of the reported mutation rates of somatic SNVs across developmental stages and tissues. Data for the first cell divisions3,4,6,40 and later cell divisions3,4,6,40 are SNVs per cell per division; ZGA, zygotic genome activation. Data from fetal development of the early central nervous system (CNS)6 and placenta41 are SNVs per cell per day. Adult data are SNVs per year, estimated for seminiferous tubules42, hematopoietic stem cells26,43,44, B lymphocytes43, neurons21,30,45, T lymphocytes43, bronchial epithelium46, gastric epithelium47, endometrial epithelium48, hepatocytes19, small bowel epithelium19,49, colorectal epithelium19,24,29 and cardiomyocytes50.

While most somatic mutations are likely functionally neutral13, some can profoundly alter the phenotype of a cell and are implicated in a wide variety of diseases. Many insights have come from sequencing the genomes of cancers14, the best-known example of disease arising from somatic mutation, but mutagenesis in tumors is often accelerated, and normal mutational patterns are distorted by genome instability. More recently, mapping the patterns of somatic mutations in normal tissues, exemplified by the efforts of the Brain Somatic Mosaicism Network (BSMN) and other studies3–6,15–26, has identified a role for somatic mutations in developmental syndromes, neurological diseases and inflammatory disorders27–39. Despite these efforts, there is currently no robust reference data set of somatic mosaicism across many tissues from a large pool of donors.

In this Perspective, we describe the SMaHT Network, initiated by the NIH Common Fund, which aims to generate a reference catalog of somatic variation from 150 donors across 19 non-diseased tissue sites. To advance the field beyond what is currently known, the SMaHT Network will perform a comprehensive discovery and analysis of all types of somatic mutations at an unprecedented scale, including the joint analysis of mosaicism across many tissues per donor; the robust discovery of structural variants through long-read sequencing and donor-specific assemblies; and the widespread, robust application of ultrasensitive sequencing technologies, such as duplex sequencing, across sequencing centers. Furthermore, beyond applying established sequencing assays at scale, the SMaHT Network places a strong emphasis on developing technologies and computational tools that improve the detection of all types of somatic variation and enable the next generation of somatic mutation studies. Before describing the network goals in detail, we briefly review current knowledge about somatic mutations in health and disease, as well as the technical challenges in mutation detection.

Somatic mutations in healthy tissues

Throughout the human lifespan, from conception to death, cells acquire mutations in their DNA (Fig. 1b)1–4,6,51. These somatic mutations can be the consequence of erroneous repair of damaged DNA bases or DNA strand breaks, errors during replication, chromosome missegregation, or the integration of mobile elements. Somatic mutations can be divided into different types52: substitutions, the vast majority of which are single nucleotide variants (SNVs); small (<50 base pairs) insertions and deletions (indels); structural variants (SVs), including segmental duplications, large deletions, translocations, inversions, mobile element insertions (MEIs) and complex SVs, such as chromothripsis and chromoplexy; and other large chromosomal aberrations, such as whole-chromosome gains and losses. Duplications, deletions and whole-chromosome gains and losses are also referred to as copy number variants (CNVs) or mosaic chromosomal alterations. These classes differ profoundly in their underlying causes, their patterns across tissues and their phenotypic effects on cells. In normal tissues, SNVs are by far the most common type of somatic variation, followed by indels. SVs and large chromosomal aberrations are observed less frequently27 but typically affect more base pairs and thus may have larger functional effects. However, most previous studies of somatic mutations relied on short-read DNA sequencing, which may fail to detect various types of SVs. Studies of germline variation have shown that SVs are far more abundant than previously recognized, and that the majority are missed by short-read approaches53.

Different mutagenic processes cause distinct patterns of somatic mutations, depending on the types of DNA damage incurred and the pathways responsible for DNA lesion repair. Research over the past decade has deconvolved these patterns into mutational signatures and linked certain signatures to specific mutagens, such as ultraviolet light, tobacco smoke, chemotherapy, or natural age-related accumulation of endogenous mutations54. Mutational signatures are most commonly applied to SNVs55, but they have also been defined for other classes of somatic mutations, including indels55, chromosomal alterations56,57 and SVs55,58. In the context of SNVs, mutational signatures reflect the distribution of specific base changes within their trinucleotide contexts, that is, the mutated base together with its immediate 5′ and 3′ neighbors.
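
As an illustration of the trinucleotide convention, the Python sketch below assigns an SNV to one of the 96 single-base-substitution channels commonly used in signature analysis. The mutated base is reported on the pyrimidine (C or T) strand, so a change at a purine reference base is reverse-complemented together with its flanking bases. The function names and input format are hypothetical and are not taken from any SMaHT tool; the sketch assumes the caller already has the forward-strand reference trinucleotide and the alternate base.

```python
# Sketch: map an SNV to one of the 96 SBS channels (pyrimidine-centred convention).
# Hypothetical helper names; input is the forward-strand reference trinucleotide
# (5' flank, reference base, 3' flank) and the alternate base.

BASES = frozenset("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(ref_trinuc: str, alt: str) -> str:
    """Return a label such as 'A[C>T]G' with the reference base as C or T."""
    ref_trinuc, alt = ref_trinuc.upper(), alt.upper()
    if len(ref_trinuc) != 3 or not set(ref_trinuc) <= BASES:
        raise ValueError("ref_trinuc must be three bases from ACGT")
    if alt not in BASES or alt == ref_trinuc[1]:
        raise ValueError("alt must be a single base that differs from the reference")
    if ref_trinuc[1] in "AG":
        # Purine reference: describe the same change on the opposite strand.
        ref_trinuc = revcomp(ref_trinuc)
        alt = alt.translate(COMPLEMENT)
    return f"{ref_trinuc[0]}[{ref_trinuc[1]}>{alt}]{ref_trinuc[2]}"


def all_channels() -> list:
    """Enumerate the 96 channels: 6 substitution classes x 16 flanking contexts."""
    substitutions = [("C", a) for a in "AGT"] + [("T", a) for a in "ACG"]
    return [f"{five}[{ref}>{alt}]{three}"
            for ref, alt in substitutions
            for five in "ACGT"
            for three in "ACGT"]


print(sbs96_channel("ACG", "T"))  # A[C>T]G
print(sbs96_channel("TGA", "A"))  # T[C>T]A (G>A reported on the opposite strand)
```

Counting these labels across all SNVs in a sample yields the 96-element mutational spectrum that signature-extraction methods then decompose into signatures such as SBS1, SBS5 or SBS7.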

All normal tissues, including post-mitotic cells, exhibit SNV mutational signatures linked to clock-like endogenous processes (single base substitution signatures 1 (SBS1) and 5 (SBS5)) and, to a lesser extent, oxidative damage (SBS18)42,50,59. Mutational signatures linked to mutagenic exposure can be confined to specific organs, such as UV damage (SBS7) in the skin60 or skin-resident T-lymphocytes43, damage from tobacco smoke (SBS4) in the bronchial epithelium of the lung46, and exposure to a genotoxic strain of E. coli (SBS88)24 in the large intestine. These exposure differences drive some of the variation in the types of somatic mutations observed across different tissues of the human body3,42,61. Further, different mutational processes show different correlations with genomic features, such as replication timing, replication strand and transcription strand62–65, reflecting genomic biases of DNA damage and repair.

The somatic mutation rate varies across human tissues and life stages (Fig. 1c). During the initial embryonic cell divisions, somatic SNVs accumulate at a high rate of approximately 3 per division, likely due to the high division rate and delayed activation of the zygotic genome3,4,40. Afterwards, the mutation rate decreases (~1 SNV per division) during development in utero, in both embryonic tissues, such as the fetal brain6,66,67, and extraembryonic tissues, such as the placenta41. After birth, mutation rates decline a further 5- to 10-fold and vary substantially across tissues, from 16–20 SNVs per year in post-mitotic cells such as neurons21,45,59,67 to 44 SNVs per year in colonic stem cells24 (Fig. 1c). Germ cells have the lowest somatic mutation rate reported23, in line with the parental age effect on de novo germline mutations42. While division rate may influence the endogenous somatic mutation rate, other factors likely modulate both mutagenesis and the repair of DNA damage68–70.
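
To put these rates in perspective, the sketch below assumes that SNVs accumulate linearly with age after birth, a clock-like simplification that ignores tissue-specific exposures and changes in rate over life, and converts the per-year rates quoted above, which we treat as per cell, into an approximate postnatal burden. The `developmental_baseline` parameter is a hypothetical placeholder for mutations acquired before birth and defaults to zero.

```python
# Sketch: approximate postnatal SNV burden per cell under linear accumulation.
# Assumptions: the yearly rates quoted in the text are per cell and constant
# over adult life; developmental_baseline is a hypothetical placeholder.

def expected_postnatal_snvs(rate_per_year: float, age_years: float,
                            developmental_baseline: float = 0.0) -> float:
    """Return baseline + rate x age, the expected SNV count per cell."""
    if rate_per_year < 0 or age_years < 0 or developmental_baseline < 0:
        raise ValueError("rate, age and baseline must be non-negative")
    return developmental_baseline + rate_per_year * age_years


# Rates taken from the ranges quoted in the text (SNVs per year).
RATES_PER_YEAR = {
    "neuron (low estimate)": 16.0,
    "neuron (high estimate)": 20.0,
    "colonic stem cell": 44.0,
}

for cell_type, rate in RATES_PER_YEAR.items():
    burden = expected_postnatal_snvs(rate, age_years=60)
    print(f"{cell_type}: ~{burden:.0f} SNVs by age 60")
```

Under these assumptions, neurons would carry roughly 960 to 1,200 postnatal SNVs by age 60, compared with roughly 2,640 in a colonic stem cell; measured burdens will differ because rates are not strictly constant across life.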

Large somatic mutations such as SVs, chromosomal alterations and MEIs are detected much less frequently than SNVs and indels. While somatic aneuploidy appears to be rare, sub-chromosomal structural variations affect 13–41% of neurons18,34,71,72. Frequent CNVs, mostly duplications, of likely developmental origin have been detected in approximately 7% of brains from the BSMN consortium34, and mosaic chromosomal alterations were observed in approximately 5% of blood samples in the UK Biobank73. Single-neuron DNA sequencing of mobile element-enriched libraries or whole genomes has revealed MEI events that appear to occur during development and create mosaicism in the human brain5,74,75. Bulk sequencing approaches have also detected a few examples of somatic MEIs in the brain76 and in non-brain tissues, including the heart77, fibroblasts77 and liver78. Recent somatic MEI profiling in colorectal epithelial single-cell clones has indicated peak insertion rates during early embryogenesis79. Considering the potential impact of these large mutations on the sequence, splicing or expression of genes80,81, it is valuable to understand their prevalence across human tissues during development and aging.

While most somatic mutations do not discernibly affect the phenotype of a cell, some are under selection in different tissues. Such driver mutations may confer a proliferative advantage or increased survival on the cell and its progeny, resulting in clonal expansions in tissues. Cancer is the canonical example of somatic evolution and often involves the stepwise accumulation of key somatic mutations and genomic instability1,82. Mutations typically associated with cancer can become abundant across normal tissues with age. For example, in a typical 60-year-old individual, approximately 90% of the endometrial epithelium harbors a driver mutation48, whereas this is true of only about 1% of the colonic epithelium24, despite the latter having a much higher somatic mutation rate24,26. This difference is likely caused by the menstrual cycles of shedding and regrowth in the endometrium. Likely owing to similar clonal expansions during development or aging, about 6% of individuals harbor 3- to 20-fold more detectable SNVs in their brain than average6. These varying proportions of clonally expanded cell populations likely reflect differences in tissue architecture, cell turnover, regeneration and selection pressures, but much is still unknown.

While many driver mutations in normal tissues can be identical to those found in the corresponding cancer types, their abundance and phenotypic consequences may differ profoundly, as normal tissues may experience different selection pressures than cancers. For example, clones with NOTCH1 mutations are exceedingly abundant in normal esophageal epithelium, at even higher frequencies than in esophageal cancers83. NOTCH1-mutant clones have a lower propensity for malignant transformation and can even outcompete precancerous clones in the esophagus84,85. These observations suggest that characterizing the somatic mutation landscape in normal individuals will be important for understanding the role of these mutations in pathological processes such as cancer.

Lastly, somatic mutations can be used as intrinsic barcodes to create phylogenies and trace the ancestries of cells, such that it becomes possible to quantitatively study human development from somatic mutations ascertained in adult donors3,4,6,20,25,51,65,66,74,86. This approach has been applied to studying embryogenesis, clonal expansions across the lifespan and the origins of childhood cancers87. Since the allele frequency of a mutation reflects the fraction of cells within a population that harbors it, this method can be used to quantitatively assess the contribution of embryonic progenitors to the adult body. Intriguingly, such studies have found that one of the two daughter cells of the zygote often has at least twice as many descendant cells as the other3,4,25,66,67,88,89, likely due to cellular bottlenecks in embryogenesis, developmental cell death, or migratory patterns, and confirming earlier observations in mice51,90.
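
The relationship between allele frequency and cell fraction can be made explicit. For a heterozygous mutation in diploid cells, with no copy-number change at the locus and no admixture from other tissues, each carrier cell contributes one mutant allele out of two, so the fraction of carrier cells is roughly twice the VAF. The sketch below applies this to two hypothetical lineage-marking mutations, one carried by the descendants of each daughter cell of the zygote; the VAFs are invented for illustration and are not measured values.

```python
# Sketch: cell fraction from bulk VAF and the resulting lineage asymmetry.
# Assumptions: heterozygous mutation, diploid cells, no copy-number change,
# pure sample. The VAFs below are hypothetical.

def cell_fraction_from_vaf(vaf: float) -> float:
    """Fraction of cells carrying a heterozygous mutation (2 x VAF, capped at 1)."""
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("VAF must lie in [0, 1]")
    return min(2.0 * vaf, 1.0)  # sampling noise can push 2 x VAF above 1


def lineage_ratio(vaf_lineage_a: float, vaf_lineage_b: float) -> float:
    """Ratio of descendant-cell fractions for two sister lineages (larger / smaller)."""
    frac_a = cell_fraction_from_vaf(vaf_lineage_a)
    frac_b = cell_fraction_from_vaf(vaf_lineage_b)
    if min(frac_a, frac_b) == 0:
        raise ValueError("both lineages must be observed")
    return max(frac_a, frac_b) / min(frac_a, frac_b)


# Hypothetical markers of the two lineages descending from the zygote's daughter cells.
print(cell_fraction_from_vaf(0.32), cell_fraction_from_vaf(0.16))
print(lineage_ratio(0.32, 0.16))
```

With these illustrative values, the two lineages contribute about 64% and 32% of sampled cells, a 2:1 asymmetry of the kind described above; copy-number changes or tissue admixture would require a more careful model.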

Taken together, these initial studies on somatic mutations in normal tissues have shown the variability of rates, patterns, and selection of mutations across tissues. It is unknown, however, how variable these patterns are between individuals and how different types of somatic mutations are correlated with inherited genetic background, environmental exposures, or other behavioral characteristics. In addition, mutation discovery is severely hampered in poorly mapped regions of the genome, including acrocentric chromosomes, centromeric and repetitive regions, and hence, the mutational patterns in these regions are largely unknown. Thus, identification of the differences in mutational patterns between tissues and individuals, particularly in the context of specific organs21,26,29,30,34,91,92, may have profound clinical implications.

Somatic mutations and disease

Somatic mutations can profoundly alter the phenotype of a cell and have been implicated in human diseases. Besides cancer, various other diseases and conditions can result from somatic mutations, including cardiovascular anomalies and immunological and neurological disorders26,30–34,43,93,94. Notably, early somatic mutations can cause clonal expansions and alterations in the differentiation programs of precursor cells that can subsequently lead to paediatric cancers and organ overgrowth87,95,96. Among the first described instances of pathogenic somatic mutation, mutations in the PI3K-AKT-mTOR pathway in the brain were associated with brain malformations leading to intractable epilepsy33,97,98. Other examples are NRAS mutations leading to congenital melanocytic nevi99 and UBA1 mutations in hematopoietic stem cells100 leading to VEXAS syndrome, a rare and severe inflammatory disorder. Somatic expansions of short tandem repeats in the brain can cause cell death and neurodegeneration101, and underpin Huntington’s disease102,103. Large SVs, including CNVs and MEIs, have also been implicated in neurodevelopmental and neurodegenerative disorders76,104,105.

The effects of somatic mutations can be highly specific to the timing and tissue of origin. For example, an activating PIK3CA mutation acquired during development can lead to widespread overgrowth across organs and vascular malformations94. However, PIK3CA mutations acquired after development can lead to cavernomas in the brain106 and are also a common driver mutation observed in normal colonic24 and endometrial epithelium48.

Clonal expansions can also indirectly lead to or influence other diseases26. An example is clonal hematopoiesis of indeterminate potential (CHIP), characterized by a clonal expansion within the hematopoietic stem cell compartment driven by somatic mutations. CHIP is highly prevalent in the context of normal aging26. Besides acting as a potential cancer precursor clone, CHIP has been linked to a variety of non-cancer conditions, such as an increased risk of cardiovascular disease107 and infections108.

Conversely, diseases can also select clones with certain adaptive somatic mutations. Recently, it has been shown that inflammatory bowel disease leads to the preferential remodeling of the colonic epithelium with clones harboring IL17 and Toll-like receptor pathway mutations29,109. Likewise, chronic liver disease selects for clones of hepatocytes that escape the toxicity imposed by the disease, notably by recurrent, independent mutations in FOXO1, CIDEB and GPAM, all involved in lipid metabolism92.

Taken together, research over the past years has shown that somatic evolution is ubiquitous in normal tissues and is fundamental to our understanding of the causes, mechanisms, and consequences of disease, and the normal process of aging.

The SMaHT Network

The Somatic Mosaicism across Human Tissues (SMaHT) Network, funded by the NIH Common Fund, was established with the goal of transforming our understanding of how somatic variation in human cells influences biological processes. SMaHT will accomplish this through the following aims: 1) generate a comprehensive dataset of somatic variants across human tissues (Fig. 2); 2) develop tools and technologies to optimize detection and characterization of various types of somatic variants; and 3) create a somatic mutation database that is widely used by researchers and the general public, and interoperable with similar data sets.

Figure 2 | Tissue sampling.

Overview of sampling from 19 primary tissue sites, spanning the three developmental germ layers (endoderm, mesoderm and ectoderm) and germ cells. While organs represent mixtures of cells derived from different germ layers (for example, skin epidermis (ectoderm) versus dermis (mesoderm), and adrenal gland medulla (ectoderm) versus cortex (mesoderm)), we have indicated the major germ layer represented by each organ. Gonads represent germ cells and their supportive structures (mesoderm), while buccal swabs are variable mixtures of germ layers (mesoderm and ectoderm).

The Network comprises five Genome Characterization Centers (GCCs), 14 Tool and Technology Development projects (TTDs), an Organizational Center (OC), a Data Analysis Center (DAC) and a Tissue Procurement Center (TPC), and includes over 250 researchers from 52 institutions. The GCCs are tasked with producing a core dataset of somatic mutations for the SMaHT Network from multiple tissues collected by the TPC, while the TTDs are tasked with developing novel experimental assays and computational tools. The DAC will integrate the data generated by the GCCs and TTDs to build the somatic mutation catalog, the data portal and the analysis workbench for the Network. The OC will coordinate Network activities and focus on outreach efforts and on building liaisons with other genomics consortia. The SMaHT Network has implemented a set of policies (https://smaht.org/policies/), including a policy that allows external researchers to apply for associate membership of the Network.

The tissues to be profiled by the Network include those arising from the three germ layers and the germline, which will provide the opportunity to delineate early somatic mutations that are common across all tissues, as well as later mutations that are unique to certain tissues (Fig. 2). The TPC is partnering with multiple organ procurement organizations (OPOs) in the US for the screening, authorization and recovery of tissues from post-mortem organ and tissue donors. Tissues will be collected following transplant recovery and include ascending and descending colon, esophagus, lung and liver (predominantly endoderm); blood, heart, aorta and skeletal muscle (predominantly mesoderm); and brain, adrenal gland, and sun-exposed and non-sun-exposed skin (predominantly ectoderm). We also aim to collect buccal swabs to assess how much of the somatic mutation landscape can be gleaned from clinically accessible tissues in living donors. To study mutagenesis in germ cells, we also aim to collect ovaries and testes. Lastly, to enable a variety of experimental techniques requiring live cells, we will derive fibroblast cultures from the dermis (skin). All tissues are requested to be recovered from each donor approached for SMaHT tissue collection. The number and type of samples collected from each donor will vary based on donor authorization and eligibility (Box 1), but the goal is to recover as many tissues as possible from each donor. To study the mechanisms and consequences of somatic mosaicism across the lifespan, these post-mortem donors will span adult ages from 18 to over 85 years. Race and ethnicity of donors are assessed using a single-question framework.

Box 1.

Inclusion criteria:

  • Donor over 18 years old

  • Collection can be completed within 24 hours of cross-clamp or cardiac cessation

Exclusion criteria:

  • History of HIV, HCV or HBV

  • History of IV drug use in the last 5 years

  • Chemotherapy or radiation treatment in the past 24 months

  • Known chromosomal or genetic disorder

  • Current positive blood cultures (sepsis)

  • Active/metastatic cancer

  • Diagnosed with multisystem organ failure

  • Received organ or allogeneic bone marrow transplant

  • Received whole blood transfusion in 48 hours prior to cross clamp or cardiac cessation

Exclusion criteria for brain donation specifically:

  • Cause of death related to penetrating brain injury or head trauma

  • Brain dead or ventilator dependent for greater than 24 hours

To maximize the scientific and clinical impact of the data set, the TPC will collect a large amount of donor metadata during donation and biospecimen collection, building on practices developed for the Genotype-Tissue Expression (GTEx)110 and developmental GTEx projects111. De-identified donor-level data will include demographic information, medical history, sample-based laboratory test results and death circumstances. Sample-level data will include tissue type and location, ischemic time, and tissue metrics from pathology review. Pathology images will be made publicly available. When possible, tissue sampling will align with the common coordinate framework structure of other large-scale projects. For all of these biospecimens, sufficient fresh-frozen material will be collected and banked to enable all core assays as well as the implementation of novel emerging technologies. Fixed samples for pathology review will be collected from sites adjacent to the fresh-frozen specimens, using a standardized collection schema developed for each tissue type.

To pursue a demographically diverse and evenly sex-distributed pool of donors, the SMaHT Network includes an Ethical, Legal and Social Implications (ELSI) project112 to promote diversity, equity and inclusion efforts. In a recent call to action, the American Society of Human Genetics stated that “addressing underrepresentation in human genomics starts with meaningful engagement of underrepresented communities”112. This ELSI substudy seeks to implement a model called Diversity Equity and Inclusion 360 (DEI 360), which includes engaging geographically, racially-ethnically and socio-culturally diverse stakeholders. These stakeholders include family decision-makers, tissue requesters, community advisory board members and multi-disciplinary specialty committee members throughout the entire duration of the SMaHT Network. Feedback from community stakeholders will be leveraged to inform communication and enrollment efforts as well as the dissemination of study findings.

The SMaHT Network is uniquely positioned to collaborate with many other large consortia and programs. These include the Human Pangenome Reference Consortium (HPRC)9, to leverage methods for constructing haplotype-phased genome assemblies; the Impact of Genomic Variation on Function (IGVF) Consortium113, to understand the functional consequences of genetic variation; the developmental GTEx project111, to access datasets from tissues at early developmental stages; the Human Tumor Atlas Network (HTAN) and PreCancer Atlas, to further understand the progression from normal cells to tumor cells through somatic mutations; and PsychENCODE114, to inform on the phenotypic consequences of brain somatic mosaicism. These collaborations will enrich the individual studies and, ultimately, through data integration and cross-network analyses, further enhance our understanding of the context and consequences of somatic mutations.

Producing the somatic mutation catalog

To produce the first phase of the somatic mutation catalog, the SMaHT Network will strike a balance between standard genomic assays, productionized and applied uniformly by the GCCs to all tissues, and bespoke assays developed by the TTD projects, focusing on novel technological approaches. As part of the initial phase of the SMaHT project, benchmarking efforts are nearing completion, using both primary human tissues and cell lines. We have used this benchmarking to determine optimal sequencing coverage, compare the accuracy of variant calling algorithms, and evaluate the utility of long- and short-read sequencing data generated on diverse sequencing platforms from multiple GCCs.

The GCCs will deploy three core assays across all tissue specimens that meet quality thresholds: deep short-read whole-genome sequencing (WGS; over 300x coverage), long-read WGS (over 30x coverage) and RNA sequencing (over 50 million reads). The deep short-read WGS will enable the discovery of somatic mutations acquired early in embryogenesis, which are present at high allele frequency across tissues, as well as the discovery of large clonal expansions arising later in life. Since these core assays will be performed on bulk tissues composed of heterogeneous cell types, only mutations with a relatively high variant allele frequency (VAF; above 1–2%) will be accurately detectable at the proposed sequencing depth. The long-read WGS will facilitate the detection of complex SVs, MEIs and variants in complex genetic loci that have been challenging to study accurately using short-read data, such as the MHC region, centromeres, telomeres, acrocentric DNA including ribosomal DNA, and other tandem-repeat regions of the genome. Ultralong-read sequencing will enable us to generate near telomere-to-telomere donor-specific reference genome assemblies (DSAs) for at least 50 donors and, by reducing misalignment, enhance the discovery of diverse types of variants within an individual59, including complex somatic SVs and other mutations in previously unmappable regions of the genome115. Lastly, RNA sequencing may allow us to assess the transcriptional consequences of early mutations and late clonal expansions, as well as, by comparison with single-cell RNA sequencing atlases116, the cell type composition of heterogeneous tissues.
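
Why sensitivity falls off at low VAF can be sketched with a simple binomial model: at a site covered by D reads, the number of reads supporting a variant with allele frequency f is roughly Binomial(D, f). The sketch below computes the probability of observing at least a chosen number of supporting reads. The five-read threshold is a hypothetical calling rule, and the model ignores sequencing errors, mapping bias and variation in depth across the genome, so it is optimistic relative to a real somatic variant caller.

```python
# Sketch: binomial detection probability for a somatic variant in bulk WGS.
# Assumptions: uniform depth, no sequencing/mapping error, and a hypothetical
# calling rule that requires at least `min_alt_reads` supporting reads.
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(depth, vaf)."""
    if depth < 0 or min_alt_reads < 0 or not 0.0 <= vaf <= 1.0:
        raise ValueError("invalid depth, VAF or read threshold")
    # Direct summation of P(X = k) for k below the threshold;
    # adequate for the modest depths and thresholds used here.
    p_below = sum(
        comb(depth, k) * vaf ** k * (1.0 - vaf) ** (depth - k)
        for k in range(min(min_alt_reads, depth + 1))
    )
    return max(0.0, 1.0 - p_below)


for vaf in (0.005, 0.01, 0.02, 0.05):
    p = detection_probability(depth=300, vaf=vaf, min_alt_reads=5)
    print(f"VAF {vaf:.1%}: P(>=5 supporting reads at 300x) = {p:.2f}")
```

Under these assumptions, a variant at 1% VAF reaches five supporting reads at only a minority of sites, whereas one at 5% VAF almost always does, which illustrates why a practical lower bound of about 1–2% VAF applies to bulk WGS at this depth.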

In addition to these core assays, GCCs will deploy three approaches specifically designed to profile low-frequency somatic mutation: duplex sequencing, single cell WGS, and transcript-based detection of mutations. These technologies, while published and well-tested, represent recent innovations and have not yet been systematically deployed across sequencing centers, or applied at large scale.

As conventional DNA sequencing platforms have a non-trivial sequencing error rate (on the order of 1 in 1,000 to 1 in 10,000 bases), a putative mutation needs to be detected in multiple independent reads to ensure that it is not artifactual. By contrast, sequencing both the forward and reverse strands of each individual DNA duplex molecule drastically reduces this error rate, because a genuine mutation is present on both strands whereas most errors affect only one. Since the residual error rate is much lower than the per-base frequency of somatic mutations in most tissues, an average mutation burden and mutational profile can be obtained from shallow genome-wide duplex coverage (0.5–2x)45,117. Duplex sequencing of bulk tissue samples is well suited to estimating average mutation burdens and spectra of SNVs and indels within cell populations, but the low depth generally precludes the discovery of somatic CNVs and SVs, or the precise inference of the variant allele frequency of specific mutations.
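
The logic of duplex consensus calling can be summarized in a few lines. Reads derived from the same molecule are first collapsed into a single-strand consensus for each of the two original strands, and a base is reported only if both strand consensuses exist and agree, so a damage or amplification artifact confined to one strand is discarded. The thresholds (at least three reads per strand, 90% agreement) and function names are hypothetical; the sketch assumes reads have already been grouped into families and that bottom-strand bases are reported in reference orientation, and it is not the implementation of any specific published pipeline. A burden estimate then divides the number of mutations by the number of duplex base pairs interrogated.

```python
# Sketch: duplex consensus at a single position and a simple burden estimate.
# Assumptions: reads are already grouped by molecule and strand, bottom-strand
# bases are given in reference orientation, and thresholds are hypothetical.
from collections import Counter
from typing import Optional, Sequence


def strand_consensus(bases: Sequence[str], min_reads: int = 3,
                     min_agreement: float = 0.9) -> Optional[str]:
    """Majority base for one strand's reads, or None if support is insufficient."""
    if len(bases) < min_reads:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_agreement else None


def duplex_call(top_reads: Sequence[str], bottom_reads: Sequence[str],
                min_reads: int = 3, min_agreement: float = 0.9) -> Optional[str]:
    """Report a base only when both single-strand consensuses exist and agree."""
    top = strand_consensus(top_reads, min_reads, min_agreement)
    bottom = strand_consensus(bottom_reads, min_reads, min_agreement)
    return top if top is not None and top == bottom else None


def mutation_burden(n_mutations: int, duplex_bases: int) -> float:
    """Mutations per base pair of duplex consensus sequence interrogated."""
    if duplex_bases <= 0:
        raise ValueError("duplex_bases must be positive")
    return n_mutations / duplex_bases


print(duplex_call("TTT", "TTTT"))  # T: both strands support the change
print(duplex_call("TTT", "CCC"))   # None: single-strand artifact is discarded
print(duplex_call("TTT", "TT"))    # None: too few bottom-strand reads
```

Requiring agreement between two independently sequenced strands is what pushes the residual error rate far below that of single reads; real pipelines add further filters, for example for end-repair artifacts near read ends.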

Even with a reduced sequencing error rate, bulk DNA sequencing averages out the mutational patterns of all cells and does not allow assessment of the variability of mutational patterns between cells or reconstruction of cell lineages. Instead, sequencing the DNA of single cells or single-cell-derived clones will enable the most detailed discovery of somatic mutations. This can be achieved either by expanding single cells in vitro3,6,25,26,43 or by using laser capture microdissection to isolate naturally occurring clonal populations of cells4,24,48,49,91.
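
How shared mutations translate into a lineage tree can be sketched under the infinite-sites assumption (each mutation arises once and is never lost) and with error-free genotypes in every clone, which real single-cell and clonal data do not guarantee because of dropout and false positives. Mutations are grouped by the exact set of clones that carry them; each distinct set defines a clade, the sets must be pairwise nested or disjoint, and each clade's parent is the smallest clade that strictly contains it. The clone and mutation names are hypothetical, and this nested-set construction is a simplified stand-in for the more robust methods that noisy real data require.

```python
# Sketch: build a lineage tree from mutations shared among single-cell clones.
# Assumptions: infinite-sites model and error-free genotypes; names are hypothetical.
from collections import defaultdict
from typing import Dict, FrozenSet, Optional, Set

Clade = FrozenSet[str]


def clades_from_genotypes(clone_mutations: Dict[str, Set[str]]) -> Dict[Clade, Set[str]]:
    """Group mutations by the exact set of clones that carry them."""
    carriers: Dict[str, Set[str]] = defaultdict(set)
    for clone, mutations in clone_mutations.items():
        for mutation in mutations:
            carriers[mutation].add(clone)
    clades: Dict[Clade, Set[str]] = defaultdict(set)
    for mutation, clones in carriers.items():
        clades[frozenset(clones)].add(mutation)
    return dict(clades)


def is_compatible(clades: Dict[Clade, Set[str]]) -> bool:
    """True if every pair of clades is either nested or disjoint."""
    keys = list(clades)
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if a & b and not (a <= b or b <= a):
                return False
    return True


def parent_map(clades: Dict[Clade, Set[str]]) -> Dict[Clade, Optional[Clade]]:
    """Map each clade to the smallest clade strictly containing it (None for roots)."""
    parents: Dict[Clade, Optional[Clade]] = {}
    for clade in clades:
        supersets = [other for other in clades if clade < other]
        parents[clade] = min(supersets, key=len) if supersets else None
    return parents


EXAMPLE = {  # hypothetical clones A-D and mutations m1-m5
    "A": {"m1", "m2", "m4"},
    "B": {"m1", "m2"},
    "C": {"m1", "m3"},
    "D": {"m1", "m3", "m5"},
}

example_clades = clades_from_genotypes(EXAMPLE)
print("compatible:", is_compatible(example_clades))
for clade, parent in parent_map(example_clades).items():
    label = sorted(parent) if parent is not None else "root"
    print(sorted(clade), "mutations:", sorted(example_clades[clade]), "parent:", label)
```

In this toy example, m1 is shared by all clones (an early, embryonic-like mutation), m2 and m3 split the clones into two sister clades, and m4 and m5 are private to single clones. A mutation carried only by clones B and C would make the sets incompatible, signalling either a genotyping error or a violation of the assumptions.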

Alternatively, direct single cell DNA sequencing is applicable to all cell types, including non-dividing cells. However, whole-genome amplification can cause allelic or locus dropout, uneven coverage across the genome, and artifactual variants introduced during biochemical amplification. The Direct Library Preparation (DLP+)118,119 method avoids whole-genome amplification and allows accurate detection of CNVs at the single cell level and of other mutations at the population level. The Primary Template-directed Amplification (PTA)30,120 method offers a substantial improvement in data quality over previous single cell amplification methods, yielding more uniform genome coverage and fewer artifactual variants. A more recent version of PTA, the ResolveOme approach, profiles both the transcriptome and the genome of the same single cell. If validated, this approach will represent a major advance by allowing mutation detection and cellular phenotyping at the same time. Profiling somatic mutations in single cells will enable us to characterize mutational patterns and associations between mutation types, and to reconstruct phylogenetic trees of normal cells across tissues. In polyploid cells, the VAFs of somatic mutations may deviate from the expected 0.5, and ploidy will need to be taken into consideration in downstream analyses.
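The ploidy adjustment can be illustrated with a small sketch. Assuming a mutation present on m of the p genome copies in a cell, its expected VAF is m/p, and the most likely m can be chosen by binomial likelihood from the observed read counts. This ignores the allelic imbalance introduced by whole-genome amplification, which real single cell callers must model; the function names are hypothetical.

```python
from fractions import Fraction
from math import log


def expected_vaf(mutant_copies: int, ploidy: int) -> Fraction:
    """Expected VAF of a mutation carried on `mutant_copies` of `ploidy` copies."""
    if ploidy < 1 or not 0 <= mutant_copies <= ploidy:
        raise ValueError("need ploidy >= 1 and 0 <= mutant_copies <= ploidy")
    return Fraction(mutant_copies, ploidy)


def most_likely_copies(alt_reads: int, depth: int, ploidy: int, eps: float = 1e-3) -> int:
    """Mutant copy number (1..ploidy) maximizing the binomial log-likelihood.

    Assumes the mutation is present; `eps` keeps probabilities away from 0 and 1
    so log() stays finite. The binomial coefficient is identical for every copy
    number and is therefore omitted.
    """
    best_copies, best_ll = 1, float("-inf")
    for copies in range(1, ploidy + 1):
        p = min(max(float(expected_vaf(copies, ploidy)), eps), 1 - eps)
        ll = alt_reads * log(p) + (depth - alt_reads) * log(1 - p)
        if ll > best_ll:
            best_copies, best_ll = copies, ll
    return best_copies


if __name__ == "__main__":
    # 12 variant reads out of 50 fit 1 of 4 copies in a tetraploid cell
    print(expected_vaf(1, 2), expected_vaf(1, 4), most_likely_copies(12, 50, 4))
```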

Lastly, at least some somatic mutations can be inferred from RNA121–123. Methods that interrogate the full-length transcriptome of single cells, such as Smart-seq3124 or STORM-seq125,126, can facilitate the detection of somatic mutations, such as SNVs, indels and fusion genes, within transcribed regions of the genome. This makes it possible to assess the cell type specificity of clonal expansions carrying certain genetic variants. Furthermore, STORM-seq enables quantification of transposable element expression at single cell resolution, which has proven challenging with other single cell RNA-seq methods125. The single cell data also provide references for more precise deconvolution of cell types in bulk tissues.

Each of these methods for detecting somatic mosaic variants has its own advantages and disadvantages, and they are therefore complementary (Table 1). For example, genome-wide duplex sequencing has a lower error rate and excels at population-level inference of the patterns of short mutations acquired over the entire lifespan, but its low depth precludes estimating the precise allele frequency of a specific variant. Bulk sequencing at medium-high coverage (300x) will only detect variants present at a sufficiently high frequency in tissues (i.e., above 1-2%), which are mostly acquired in early embryogenesis. Single cell sequencing can in principle detect all variants present in a single cell and allows the reconstruction of cell phylogenies, but it requires substantial cost and effort to address genome amplification artifacts. RNA-based mutation discovery allows direct integration of mutations with transcriptomic information but is inherently confined to expressed regions of the genome. Taken together, these genomic assays will enable robust interrogation of mutational patterns across human tissues.

Table 1. Comparison between somatic mutation discovery methods.

A combination of different methods can achieve comprehensive mutation discovery and accurate analysis.

Method (analysis by) | Bulk sequencing | Duplex sequencing | Clonal expansion | Single cell WGA (PTA) | LCM clones
Applicability to any tissue | Yes | Yes | No | Yes | No
Mutation types discovered | All | SNVs, indels | All | All | All
Applicability to long reads | Yes | Yes | Yes | Inefficient | Inefficient
Fraction of genome sampled | 100% | 30%-100%, depending on fragmentation method | 100% | ~90% per cell; 100% across cells | 100%
Detection of early and clonally expanded mutations | Most | Minority | Depending on clone number | Depending on cell number | Depending on clone number
Overall mutation spectrum | No | Yes | Yes | Yes | Yes
Likely amount of artifacts | Small (10^-4) | Very small (<10^-8) | Small (10^-4) | Some | Small (10^-4)
Information on cell lineages | No | No | Yes | Yes | Yes
Advantage(s) | Sensitive detection of high frequency mutations of all types | Overall mutation spectrum obtainable even at low (0.5-2x) coverage | Accurate mutation discovery at the single cell level | Mutation discovery at the single cell level in any tissue | Accurate mutation discovery at the single cell level
Limitation(s) | Requires high coverage; misses low frequency mutations | Only SNVs and indels; misses high frequency mutations at low (0.5-2x) coverage | Applicable only to culturable or reprogrammable cells | <50% sensitivity because of dropout and amplification artifacts | Applicable only to tissues with visible clonal substructures

Areas of technology development

As new technologies for interrogating somatic mutations with high resolution or sensitivity are constantly emerging, a large part of the SMaHT Network is devoted to developing new tools and technologies (Table 2). The first area of innovation aims to increase the accuracy of mutation detection in single cells or molecules by further reducing background noise. For single cell WGS, a limited cloning step that creates small pools of cells can reduce allelic dropout and amplification artifacts. In parallel, the SMaHT Network aims to reduce the error rate of amplification and sequencing for single cells and molecules through various adaptations of duplex sequencing technologies45,127–129. These approaches will allow interrogation of the landscape of somatic variation in single cells and complex multicellular tissues with high precision, which is crucial for studying tissues without large-scale clonal expansions.
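Why a limited cloning step helps can be seen with a simple probability model. As a sketch, assume that each genome copy in a pool of n sister cells drops out independently with a hypothetical probability d, and that amplification is otherwise uniform. A heterozygous mutation shared by all n cells is then missed only if all n mutant copies drop out, while an amplification error arising on one template copy is diluted to at most about 1/(2n) of the reads. Both the independence assumption and the parameter values are illustrative, and the function names are hypothetical.

```python
def shared_mutation_dropout(per_copy_dropout: float, n_cells: int) -> float:
    """P(all n mutant copies drop out), assuming independent dropout per cell."""
    if not 0.0 <= per_copy_dropout <= 1.0 or n_cells < 1:
        raise ValueError("dropout must be in [0, 1] and n_cells >= 1")
    return per_copy_dropout**n_cells


def max_artifact_fraction(n_cells: int) -> float:
    """Upper bound on the read fraction derived from one erroneous template copy,
    assuming diploid cells and uniform amplification of all 2n copies."""
    if n_cells < 1:
        raise ValueError("n_cells must be >= 1")
    return 1.0 / (2 * n_cells)


if __name__ == "__main__":
    # Hypothetical per-copy dropout of 30%
    for n in (1, 2, 4, 8):
        print(n, shared_mutation_dropout(0.3, n), max_artifact_fraction(n))
```

With a single cell (n = 1), an early amplification error can reach the same 0.5 fraction as a true heterozygous mutation; pooling even a few sister cells separates the two.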

Table 2. Experimental and analytical methods adopted by the SMaHT Network.

The SMaHT Network will apply a set of standard approaches as well as benchmark new methods to improve detection, analysis, and functional annotation of somatic variants across tissues and individuals.

Category | Methods
Core assays (bulk) | Short-read DNA sequencing (Illumina); short-read RNA sequencing (Illumina); long-read DNA sequencing (PacBio, ONT); long-read full-length transcript sequencing (PacBio)
Extended assays | Single cell DNA sequencing with Primary Template-Directed Amplification (PTA); duplex sequencing: Concatenating Original Duplex for Error Correction (CODEC) / Nanorate Sequencing (NanoSeq) / Tn5-duplex-seq (CompDuplex-seq and VISTA-seq) / Ultima ppmSeq
Scale-up approaches and technology developments: single molecule sequencing | Hairpin Duplex Enhanced Fidelity sequencing (HiDEF-seq)
Scale-up approaches and technology developments: single cell duplex sequencing | scNanoSeq and scUduplex-seq
Scale-up approaches and technology developments: structural variants detection | Single cell full length RNA-seq for MEI detection; single-cell, mini-bulk ME-targeted sequencing (PTA-HAT-seq); Single-cell Total RNA-seq Miniaturized sequencing (STORM-seq); Strand-seq
Scale-up approaches and technology developments: spatial variant detection | Slide-tags
Genotype-to-phenotype multi-omics: epigenome | Single cell 3C-seq protocols (e.g., Dip-C); Assay for Transposase-Accessible Chromatin (ATAC-seq); Genotyping of Targeted loci with Chromatin Accessibility (GoT-ChA, GoT-EpiM); Fiber-seq (single-molecule chromatin accessibility, nucleosome occupancy, TF occupancy, RNA polymerase occupancy, CpG methylation, and genetic variant identification); var-CUT&Tag (enrichment and epigenetic annotation of variants in regulatory elements); Strand-seq (nucleosome occupancy and SVs in the same cell)
Genotype-to-phenotype multi-omics: transcriptome | ResolveOme method; GoT-ChA-RNA; duplex-seq-based large-scale single-cell dual omics profiling
Genotype-to-phenotype multi-omics: proteome | GoT-ChA-Pro
Computational tools | Donor-specific reference genome assemblies (DSAs) (paired Fiber-seq, ultra-long ONT, and Hi-C data); hybrid approach for scWGA; RUFUS (a reference-free, k-mer-based variant detection algorithm)
Infrastructure | Centralized three germ layer tissue collection banking with donor metadata; automated analysis pipelines for quality control; a variant catalog containing a curated set of annotated somatic mutations; computational pipelines with multi-omics platforms; human pangenome visualization with somatic mutation detection pipelines; a cloud-based infrastructure and a web portal

Secondly, the SMaHT Network aims to increase the sensitivity of SV detection down to single molecules or cells. Because SVs extend beyond the length of a typical short read, long-read sequencing unlocks SV detection across the genome, especially for MEIs and other rearrangements in repetitive regions130,131. However, many single cell DNA amplification approaches produce short fragments. We are therefore applying long-read sequencing to clonal populations such as iPSC lines, which have been used25 in lieu of single cells for lineage reconstruction because they can be expanded and analyzed by bulk sequencing without in vitro DNA amplification. In addition, MEIs can be assessed cost-effectively by target enrichment assays, because new insertions share conserved sequences within each transposon subfamily. We are developing targeted detection of MEIs using Cas9-targeted long-read sequencing132 and PTA-amplified micro-bulk or single cells77. These efforts will enable the study of SVs and MEIs in all tissues and across the lifespan, even in the absence of clonal expansions.

Thirdly, the SMaHT Network will develop scalable platforms that can perform variant detection spatially in human tissues, through single cell DNA and RNA sequencing with resolved spatial barcodes133,134. This will allow us to study the prevalence and extent of clonal expansions across ages and tissues, especially in organs without a clearly organized tissue architecture.
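Conceptually, spatially resolved variant detection joins per-read genotypes to spatial coordinates through barcodes. A minimal sketch, assuming each read carries a barcode already resolved to an (x, y) position and has been classified as supporting the reference or the variant allele; the `ReadObs` type and `mutant_positions` function are hypothetical and do not describe the Slide-tags data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadObs:
    """One read covering the variant site (hypothetical record type)."""
    barcode: str
    is_alt: bool


def mutant_positions(observations, barcode_xy, min_alt_reads=1):
    """Return (coordinates of barcodes with variant reads, fraction of covered barcodes).

    Barcodes missing from `barcode_xy` (unresolved positions) are ignored.
    """
    alt = defaultdict(int)
    total = defaultdict(int)  # per-barcode coverage at the site
    for obs in observations:
        total[obs.barcode] += 1
        if obs.is_alt:
            alt[obs.barcode] += 1
    covered = [bc for bc in total if bc in barcode_xy]
    mutant = [barcode_xy[bc] for bc in covered if alt[bc] >= min_alt_reads]
    fraction = len(mutant) / len(covered) if covered else 0.0
    return mutant, fraction


if __name__ == "__main__":
    reads = [ReadObs("a", True), ReadObs("a", False), ReadObs("b", False), ReadObs("c", True)]
    print(mutant_positions(reads, {"a": (0, 0), "b": (1, 0), "c": (5, 5)}))
```

Because per-cell coverage at a given site is sparse, a covered barcode without variant reads is not proof of a wild-type cell, so the returned fraction is best read as a lower bound on clone prevalence among covered positions.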

An outstanding question is how specific somatic mutations affect the phenotype of the cells that harbor them. Although certain mutations are under positive selection and lead to clonal expansions, how these mutations alter cellular phenotypes is mostly unknown. The consequence of a mutation can be assessed by combining mutational readouts, obtained either by genotyping specific mutations135,136 or by genome sequencing, with functional readouts of cells, such as the transcriptome, proteome, epigenome, methylome and chromatin accessibility landscape137–142. Interpreting the phenotypic effects of somatic mutations will greatly improve our understanding of their clinical consequences.
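Once each cell carries both a genotype call and a functional readout, the simplest genotype-to-phenotype comparison asks whether a phenotype differs between mutant and wild-type cells. A sketch using a two-sided permutation test on one hypothetical expression measurement per cell; the function name and data are illustrative, and real analyses would also need to handle genotyping dropout, cell type confounding and multiple testing.

```python
import random
from statistics import mean


def permutation_pvalue(mutant, wildtype, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed mean difference, p-value); the +1 terms count the
    observed labeling so the p-value is never exactly zero.
    """
    if not mutant or not wildtype:
        raise ValueError("both groups need at least one cell")
    observed = mean(mutant) - mean(wildtype)
    pooled = list(mutant) + list(wildtype)
    n_mut = len(mutant)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign genotype labels
        diff = mean(pooled[:n_mut]) - mean(pooled[n_mut:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical expression values for one gene in mutant vs wild-type cells
    mutant_cells = [5.1, 6.0, 7.2, 6.4, 5.5, 6.9]
    wildtype_cells = [1.2, 2.0, 1.1, 2.3, 1.4, 1.8]
    print(permutation_pvalue(mutant_cells, wildtype_cells))
```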

The efforts in tool and technology development within the SMaHT Network are focused on improving precision in somatic mutation detection and interpretation at scale, each addressing vital shortcomings of current assays, with the goal of productionizing and deploying many of these within the Network at large. After the development phase, the precise extent and scope of the deployment of these assays across the SMaHT tissues and donors will depend on the cost, scalability and priorities of the Network.

Integration and analysis of data

The low VAF of mosaic variants poses unique challenges for bioinformatic analysis143, and we expect that novel computational methods and tools will be needed to fully analyze the data and to increase the sensitivity and specificity of variant detection. Somatic mutation detection algorithms developed in cancer genomics are often inadequate for detecting variants with allele fractions below 2-5%, and simply increasing the depth of sequencing is not cost-effective. More sophisticated machine learning algorithms that efficiently incorporate various local features near candidate variants may therefore prove useful138–140,144.
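The cost argument can be made concrete. Under a simplified binomial model of read sampling (error-free reads and a hypothetical rule of at least five supporting reads), the depth needed for 80% detection power grows roughly in inverse proportion to the VAF. With real error rates the read threshold itself must rise with depth, so this sketch understates the true cost; the function names are hypothetical.

```python
from math import comb


def detection_power(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` variant reads) under Binomial(depth, vaf)."""
    return 1.0 - sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )


def min_depth_for_power(vaf: float, target: float = 0.8, min_alt_reads: int = 5) -> int:
    """Smallest depth reaching `target` power, found by doubling then bisection.

    Power is non-decreasing in depth for a fixed read threshold, so bisection is valid.
    """
    if not 0.0 < vaf <= 1.0 or not 0.0 < target < 1.0:
        raise ValueError("vaf must be in (0, 1] and target in (0, 1)")
    lo = hi = min_alt_reads
    while detection_power(hi, vaf, min_alt_reads) < target:
        lo, hi = hi, hi * 2
    while lo < hi:
        mid = (lo + hi) // 2
        if detection_power(mid, vaf, min_alt_reads) >= target:
            hi = mid
        else:
            lo = mid + 1
    return hi


if __name__ == "__main__":
    for vaf in (0.05, 0.01, 0.001):
        print(f"VAF {vaf:.1%}: ~{min_depth_for_power(vaf)}x")
```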

Other challenges include the optimal integration of long-read and short-read data, the inference of lineage relationships from bulk and single cell data, and effective strategies for integrative and comparative analysis of samples across tissues and individuals. An important aspect of our analysis will be the use of donor-specific diploid genomes assembled from short Illumina reads, long PacBio reads, ultra-long Nanopore reads and Hi-C data. Alignment to the donor-specific reference genome137 will allow more accurate variant identification, especially in repetitive regions, as well as examination of allele-specific transcriptional and epigenetic modulations associated with genetic variants.
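Table 2 lists RUFUS, a reference-free, k-mer-based variant detection algorithm. The core idea behind such approaches, which sidestep reference alignment entirely, can be sketched by counting k-mers in a sample and reporting those that recur in the sample but are absent from a control, here assumed to be a matched sample from the same donor. This toy version is not the RUFUS implementation: it uses a tiny k, exact in-memory counting, no reverse-complement canonicalization and no assembly of novel k-mers into variant calls. The function names are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer in the given reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i : i + k]] += 1
    return counts


def novel_kmers(sample_reads, control_reads, k=5, min_sample_count=2):
    """k-mers seen at least `min_sample_count` times in the sample but never in the control."""
    sample = kmer_counts(sample_reads, k)
    control = kmer_counts(control_reads, k)
    return {
        kmer: n
        for kmer, n in sample.items()
        if n >= min_sample_count and kmer not in control
    }


if __name__ == "__main__":
    control = ["AAACCCGGGTTT", "AAACCCGGGTTT"]
    # Two sample reads carry a C>A change at position 6 (1-based)
    sample = ["AAACCCGGGTTT", "AAACCAGGGTTT", "AAACCAGGGTTT"]
    print(novel_kmers(sample, control))
```

Every k-mer overlapping the changed base (five here, one per offset) appears only in the sample; a reference-free caller then assembles such k-mers back into a variant call.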

The SMaHT Data Analysis Center (DAC) will lead an effort to collect, curate, and analyze the vast amount of multi-modal data generated on multiple platforms and to create a data resource for the scientific community. The DAC will ensure high data standards through various quality control steps and compile extensive metadata describing experimental and data processing protocols, following the FAIR (Findable, Accessible, Interoperable and Reusable) guidelines145. Scalable and cost-effective analytical workflows will be implemented on a cloud platform with full provenance tracking and Docker images to enable reproducibility of the analysis output.

The data generated by the consortium will be made available to the wider scientific community via a user-friendly and secure web portal (https://data.smaht.org). This portal will feature: (i) a reference catalog of somatic variants that can be searched (e.g., by locus, tissue, or phenotypic features such as age) and annotated with information from other genomics databases; (ii) a workbench that enables users to apply the computational pipelines developed by the SMaHT Network to their own data; and (iii) data visualization tools, including a multi-scale browser that allows users to navigate the data from a genome-level view to a sequencing read-level view. Visual inspection of variants using such a browser will be particularly helpful in assessing their quality, and the annotations will enable rapid identification of variants that may be functionally relevant.

Conclusion

The SMaHT Network aims to produce a comprehensive reference catalog of somatic mutations, across tissues and individuals, by harnessing the full potential of many different genomic assays, including short and long-read bulk WGS, duplex sequencing, ultralong read sequencing, single cell DNA sequencing, and RNA sequencing (Fig. 3). The Network will develop new tools and technologies to increase our ability to detect somatic mutations as well as infer their phenotypic consequences at greater resolution. All these various data modalities will be integrated, analyzed, and released to the research community and wider public.

Fig. 3 | Methods, assays and questions.

Overview of sampling methods and sequencing assays deployed in the SMaHT Network, as well as the biological questions, outcomes and inferred mutational patterns from downstream analyses of the catalog of somatic mutations across normal tissues, including mutation rates or burdens, selection, lineage tracing and mutational signatures (reference signatures obtained from https://cancer.sanger.ac.uk/signatures)55.

An extensive catalog of somatic mutations will reveal mutational patterns, rates, and signatures across tissues, allowing us to infer the biological and molecular processes that govern somatic mutagenesis and their adaptive and maladaptive consequences for development and disease (Fig. 3). Our assays can inform on mutations under selection in tissues, which result in clonal expansions and potentially tissue dysfunction. Single cell analyses added to the bulk readouts will further allow us to generate cellular phylogenies of human development, infer embryonic differentiation dynamics and improve our future assessment of de novo germline mutations.

Delineating the full extent of somatic mosaicism greatly exceeds the scope of the Human Genome Project. A typical cell may acquire hundreds to thousands of somatic mutations in a lifetime. There are trillions of cells in a human body and so the total number of somatic mutations acquired in a single individual may well exceed quadrillions, millions of times the size of the human genome. Beyond cataloging somatic variation across tissues, it is essential to understand the causes, patterns, and consequences of somatic mutations in normal cells, and provide a crucial comparison baseline for disease research. The efforts of the SMaHT Network will substantially contribute to our insights into the role of somatic variation in health, aging, and disease.
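The scale argument above is simple arithmetic. A sketch, assuming roughly 3.7 × 10^13 cells in an adult human body (a commonly cited estimate, used here only for order of magnitude) and a haploid genome of about 3.1 × 10^9 bp; the constant and function names are hypothetical.

```python
CELL_COUNT = 3.7e13  # assumed number of cells in an adult body (order of magnitude)
GENOME_BP = 3.1e9    # approximate haploid human genome length in base pairs


def total_mutation_instances(mutations_per_cell: float, n_cells: float = CELL_COUNT) -> float:
    """Mutations summed over all cells; a mutation inherited from a shared
    ancestral cell is counted once per descendant cell that carries it."""
    return mutations_per_cell * n_cells


if __name__ == "__main__":
    for per_cell in (100, 1_000, 3_000):
        total = total_mutation_instances(per_cell)
        print(f"{per_cell} per cell: {total:.1e} in total, {total / GENOME_BP:.1e} genome lengths")
```

Even at the low end of a hundred mutations per cell, the total reaches a few quadrillion, about a million genome lengths.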

Supplementary Material

Supplement: participants in the SMaHT network

Acknowledgements

This research is supported by the NIH Common Fund, through the Office of Strategic Coordination/Office of the NIH Director under awards U24 MH133204, U24 NS132103, UG3 NS132024, UG3 NS132061, UG3 NS132084, UG3 NS132105, UG3 NS132127, UG3 NS132128, UG3 NS132132, UG3 NS132134, UG3 NS132135, UG3 NS132136, UG3 NS132138, UG3 NS132139, UG3 NS132144, UG3 NS132146, UM1 DA058219, UM1 DA058220, UM1 DA058229, UM1 DA058230, UM1 DA058235, and UM1 DA058236. E.E.E. and C.A.W. are investigators of the Howard Hughes Medical Institute.

COMPETING INTEREST STATEMENT

F.C. is an academic founder of Curio Biosciences and Doppler Biosciences, and scientific advisor for Amber Bio. F.C.’s interests were reviewed and managed by the Broad Institute in accordance with their conflict-of-interest policies. G.G. receives research funds from IBM, Pharmacyclics/Abbvie, Bayer, Genentech, Calico, Ultima Genomics, Inocras, Google, Kite, and Novartis; is an inventor on patent applications filed by the Broad Institute related to MSMuTect, MSMutSig, POLYSOLVER, SignatureAnalyzer-GPU, MSEye, and MinimuMM-seq; is a founder of, consultant to, and holds privately held equity in Scorpion Therapeutics and PreDICTA Biosciences; and was also a consultant to Merck, all unrelated to the present work. E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc. C.Z. is a co-founder and equity holder of Pioneer Genomics Inc. and reports that Baylor College of Medicine filed a patent application related to the CompDuplex-seq or CompDup method. P.N. reports research grants from Allelica, Amgen, Apple, Boston Scientific, Genentech / Roche, and Novartis, personal fees from Allelica, Apple, AstraZeneca, Bain Capital, Blackstone Life Sciences, Bristol Myers Squibb, Creative Education Concepts, CRISPR Therapeutics, Eli Lilly & Co, Esperion Therapeutics, Foresite Capital, Foresite Labs, Genentech / Roche, GV, HeartFlow, Magnet Biomedicine, Merck, Novartis, Novo Nordisk, TenSixteen Bio, and Tourmaline Bio, equity in Bolt, Candela, Mercury, MyOme, Parameter Health, Preciseli, and TenSixteen Bio, and spousal employment at Vertex Pharmaceuticals, all unrelated to the present work. C.T. is the founder of C2T, a consultant for Bayer, a member of the Scientific Advisory Board of PrognomiQ, and receives royalties from Exact Sciences.
J.W.O. is the founder and CEO of Absolute DNA Inc., with no direct relation to this study, and these interests are managed by the University-Industry Foundation in Yonsei University Health System in accordance with its conflict-of-interest policies. E.A.L. is a member of the scientific advisory board for Inocras. All other authors declare no competing interests.

Participants in the Somatic Mosaicism across Human Tissues Network

National Institutes of Health

Richard S. Conroy69, Brionna Y. Hair69, Walter J. Koroshetz70, Roger Little71, Amy C. Lossie71*, Jill A. Morris70, Dena C. Procaccini69, Wendy Wang72

*Dr. Lossie was substantially involved in UM1DA058219, UM1DA058220, UM1DA058229, UM1DA058230, UM1DA058235, and UM1DA058236, consistent with her role as Program Officer. She is also the NIH Working Group Coordinator, and has involvement with the remaining awards, consistent with this role.

Organizational Center (U24NS132103)

Ting Wang8,43*, Casey Andrews8, Lucinda Antonacci-Fulton8, Sarah Cody8, Milinn Kremitzki8, Heather A Lawson43, Daofeng Li43, Tina Lindsay8, Wenjin Zhang43

*Contact: twang@wustl.edu

Tissue Procurement Center (U24MH133204)

Thomas J. Bell12*, Thomas G. Blanchard17, Valerie J. Estela-Pro12, Melissa A. Faith28,29, Kayla Giancarlo73, Melissa Grimm73, Azra Hasan12, Raquel G. Hernandez35,36, M. Kathryn Leonard12, Phoebe McDermott29, Mary Pfeiffer73, Isabel Sleeman12, Melissa W. VonDran12

*Contact: tbell@ndriresource.org

Genome Characterization Centers

University of Washington – James Bennett (UM1DA058220)

James T Bennett13,14*, Stephanie Bohaczuk24, Colleen P Davis24, Evan E Eichler24,25, Chris Frazar24, Kendra Hoekzema24, Meng-Fan Huang24, Caitlin Jacques24, Dana M. Jensen13, Tom Kolar24, Youngjun Kwon24, Kelsey Loy13, Yizi Mao24, Sohn Min-Hwan24, Katherine M Munson24, Shane Neph24, Jeffrey Ou13, Nancy L Parmalee13, Minh-Hang M Pham13, Jane Ranchalis59, Luyao Ren24, Adriana E Sedeño-Cortés5, Josh Smith24, Melanie Sorensen24, Andrew B Stergachis24,58,59, Lila Sutherlin13, Mitchell R Vollger24, Chia-Lin Wei24, Jeffrey M Weiss24, Christina Zakarian24

*Contact: jtbenn@uw.edu

Genome Characterization Center

Baylor College of Medicine – Richard Gibbs (UM1DA058229)

Richard A. Gibbs22,23*, Elizabeth G. Atkinson23, Sravya Bhamidipati22, Hsu Chao22, Rui Chen22,23, Christopher M. Grochowski22, Harsha Doddapaneni22, Divya Kalra22, Ziad Khan22, Kavya Kottapalli22, Marie-Claude Gingras22, Walker Hale22, Heer Mehta22, Donna M. Muzny22, Muchun Niu23, Luis Paulin23, Jeffrey Rogers22, Evette Scott22, Fritz J. Sedlazeck22,23, Kimberly Walker22, Tao Wu23, Chenghang Zong23

*Contact: agibbs@bcm.edu

Genome Characterization Center

Broad Institute of MIT and Harvard – Kristin Ardlie (UM1DA058235)

Kristin G. Ardlie1, Viktor Adalsteinsson1, Lisa Anderson1, Carrie Cibulskis1, Tim H. H. Coorens1, Laura Domènech1, Kiran Garimella1, Whitney Hornsby1, Steve Huang1, Satoshi Koyama1, Niall Lennon1, Stephen B. Montgomery48,49,50, Tetsushi Nakao1, Azeet Narayan1, Pradeep Natarajan1,51,52, Evin Padhi49, Constantijn Scharlee1, Md Mesbah Uddin1,51, Liying Xue1, Zhi Yu1, Shadi Zaheri1

*Contact: kardlie@broadinstitute.org

Genome Characterization Center

Washington University – Ting Wang (UM1DA058219)

Ting Wang8,43*, Derek Albracht8, Eddie Belter8, Emma Casey8, Justin Chen43, Yuchen Cheng8, Shihua Dong8, Qichen Fu, Robert Fulton8, John Garza, H. Josh Jang57, Sheng Chih Jin8, Benjamin K. Johnson57, Nahyun Kong8, Daofeng Li57, Vivien Li8, Tina Lindsay8, Shane Liu8, Juan Macias-Velasco8, Elvisa Mehinovic8, Benpeng Miao8, Theron Palmer57, Purva Patel8, Mary Rhodes57, Dan Rohrer57, Andrew Ruttenberg8, Ayush Semwal57, Hui Shen57, Jiawei Shen8, Zitian Tang8, Chad Tomlinson8, Wenjin Zhang8, Xin Zilan8

*Contact: twang@wustl.edu

Genome Characterization Center

New York Genome Center – Soren Germer (UM1DA058236)

Soren Germer11*, Samuel Aparicio9,10,11, Jade E. B. Carter11, Bill Driscoll11, Uday Evani11, Heather Geiger11, Tausif Hasan11, Manisha Kher11, Dan A. Landau11,41,42, Rajeeva Musunuri11, Giuseppe Narzisi11, Nicolas Robine11, Alexi Runnels11

*Contact: sgermer@nygenome.org

Data Analysis Center (UM1DA058230)

Peter J. Park33,53*, Mingyun Bae4,5, Michele Berselli33, Ann Caplin33, Hye-Jung Elizabeth Chun33, Niklas Engel33, William Feng33, Yan Gao33, Nils Gehlenborg33, Dominik Glodzik33, Yoo-Jin Ha33, Hu Jin33, Sehi L’Yi33, Eunjung Alice Lee1,4,5,6, Heng Li33,44, Po-Ru Loh1,45, Lovelace J. Luquette33, Maximilian Marin33,44, Julia Markowski33, Dominika Maziec33, Huang Neng33,44, Sarah Nicholson33, Junseok Park4,5, Qin Qian44, Han Flora Qu33, Douglas Rioux33, William Ronchetti33, Andrew Schroeder33, Corinne Sexton33, Yichen Si, Kar-Tong Tan44, David Tang45, Alexander D. Veit33, Vinayak V. Viswanadham33, Suenghyun Wang33, Kate Woo33, Xi Zeng4,5, Yuwei Zhang33, Yifan Zhao33, Ying Zhou44

*Contact: peter_park@hms.harvard.edu

Tool and Technology Development

University of Michigan – Ryan Mills (UG3NS132084)

Ryan E. Mills18,19*, Brandt A. Bessell18, Alan P. Boyle18,19, Ingrid Flashpohler18, Steve Losh18, Michael J McConnell47, Torrin L. McDonald18, Camille Mumm19, Jessica A. Switzenberg18, Jinhao Wang18, Weichen Zhou18

*Contact: remills@med.umich.edu

Tool and Technology Development

Boston Children’s Hospital – Sangita Choudhury (UG3NS132144)

Sangita Choudhury1,4,6*, Guanlan Dong1,4,6, Nazia Hilal1,4,6, Se-Young Jo4,6, Eunjung Alice Lee1,4,5,6, Shayna L Mallett1,4,6, Monica Devi Manam4,6, Shulin Mao4,6, Diane D. Shao4,6,56, Christopher Walsh4,6,63, Sijing Zhao4,6

*Contact: Sangita.Choudhury@childrens.harvard.edu

Tool and Technology Development

Stanford University – Alexander Urban (UG3NS132146)

Alexander E Urban49,62*, Yiling Elaine Huang49, Jan O. Korbel39,40, Abhiram Natu66, Reenal Pattni49, Carolin Purman49, Flora M. Vaccarino66,67,68, Bo Zhou49, Xiaowei Zhu74

*Contact: aeurban@stanford.edu

Tool and Technology Development

Baylor College of Medicine – Chenghang Zong (UG3NS132132)

Chenghang Zong23*, Jiayi Luo, Muchun Niu23, Yichi Niu23, Rohan Thakur75, David A. Weitz75, Yang Zhang23

*Contact: Chenghang.Zong@bcm.edu

Tool and Technology Development

Baylor College of Medicine – Fritz Sedlazeck (UG3NS132105)

Fritz J. Sedlazeck22,23,55*, Yilei Fu22, Michal B. Izydorczyk22, Luis F. Paulin22, Tao P. Wu23,64,65, Xiaomei Zhan22, Xinchang Zheng22

*Contact: Fritz.Sedlazeck@bcm.edu

Tool and Technology Development

Mayo Clinic – Alexej Abyzov (UG3NS132128)

Alexej Abyzov7*, Geon Hue Bae2,3, Taejeong Bae7, Areum Cho2,3, June Hyug Choi2,3, Yujin Angelina Choi2, Hyungbin Chun2,3, Mrunal Dehankar7, Yeonjun Jang7, Seok-Won Jeong2,3, Min Ji3, Mee Sook Jun3,4, Su Rim Kim2,3, Seong Gyu Kwon2,3, Soung-Hoon Lee2,3, Nam Seop Lim2,3, Nanda Maya Mali2,3, Ji Won Oh2,3, Arijit Panda7, Jung Min Park2,3, JaeEun Shin2,3, Milovan Suvakov7

*Contact: Abyzov.Alexej@mayo.edu

Tool and Technology Development

University of Utah – Gabor Marth (UG3NS132134)

Gabor T. Marth46*, Brad Demarest46, Stephanie Gardiner46, Stephanie J. Georges46, Hunter R. Underhill60,61, Yingqi Zhang60

*Contact: gmarth@genetics.utah.edu

Tool and Technology Development

Weill Cornell Medicine – Dan Landau (UG3NS132139)

Dan A. Landau11,41,42*, Samantha Avaylon11, Alexandre Cheng11,41, Wei-Yu Chi41, Mariela Cortés-Lopéz41, Andrew R. D’Avino11,41, Husain Danish11,41, Elliot Eton11,41, Foteini Fotopoulo11,41, Saravanan Ganesan11,2, Yiyun Lin11,41, Qing Luo11,41, Levan Mekerishvili11,41, Joe Pelt11,41, Catherine Potenski11,41, Tamara Prieto11,41, Jake Qiu11,41, Ivan Raimondi11,2, Rahul Satija11,54, Manu Singh11,41, Dennis Yuan11,41, John Zinno11,41

*Contact: dlandau@nygenome.org

Tool and Technology Development

New York University – Gilad Evrony (UG3NS132024)

Gilad D. Evrony26,27*, Benjamin Costa26,27, Jonathan Evan Shoag79,80, Jimin Tan76, Aristotelis Tsirigos77,78

*Contact: Gilad.Evrony@nyulangone.org

Tool and Technology Development

Boston Children’s Hospital – Christopher Walsh (UG3NS132138)

Christopher A. Walsh4,6,63*, Emre Caglayan4,6, Hayley Cline4,6, Niklas L. Engel33, Shelbi E. Gill4,6, Robert Sean Hill4,6, Andrea J. Kriz4,6, Julia Markowski1, Alisa Mo4,6, Peter J. Park33,53, Daniel Snellings4,6, Vinayak V. Viswanadham33

*Contact: Christopher.Walsh@childrens.harvard.edu

Tool and Technology Development

Dana Farber Cancer Institute – Kathleen Burns (UG3NS132127)

Kathleen H. Burns1,16,21*, Justin S. Becker1,15,16, Bradley E. Bernstein1,15,16, Aidan H. Burn1,16,21, Wen-Chih Cheng1,16,21, Jennifer A. Karlow1,16,21, Cheuk-Ting Law1,16,21, Eunjung Alice Lee1,4,5,6, Shayna L. Mallet1,4,5, Carlos Mendez-Dorantes1,16,21, Khue H. Nguyen1,4,5, Adam Voshall1,4,5, Boxun Zhao1,4,5

*Contact: KathleenH_Burns@DFCI.HARVARD.EDU

Tool and Technology Development

Broad Institute of MIT and Harvard – Fei Chen (UG3NS132135)

Fei Chen1,20*, Jason D. Buenrostro1,20, Tim H.H. Coorens1, Gad Getz1,16,34, Benno Orr1, Andrew J.C. Russell1

*Contact: chenf@broadinstitute.org

Tool and Technology Development

Case Western Reserve University – Fulai Jin (UG3NS132061)

Fulai Jin37,38*, He Li37,38, Yan Li37,38, Leina Lu37,38, Xiaofeng Zhu81

*Contact: fxj45@case.edu

Tool and Technology Development

University of Massachusetts – Thomas Fazzio (UG3NS132136)

Thomas G. Fazzio30*, Trishita Basak30, Manuel Garber32, Azita Ghodssi30, Katrina Newcomer30, Yuqing Wang32

*Contact: Thomas.Fazzio@umassmed.edu

Affiliations (continued from main list)

69. Office of Strategic Coordination, Division of Program Coordination, Planning, and Strategic Initiatives, National Institutes of Health, Bethesda, MD, USA

70. National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA

71. Division of Neuroscience and Behavior, National Institute on Drug Abuse, National Institutes of Health, Bethesda, MD, USA

72. Division of Cancer Prevention, National Cancer Institute, Bethesda, MD, USA

73. ConnectLife, Williamsville, NY, USA

74. Department of Neuroscience, City University of Hong Kong, Hong Kong, China

75. Department of Physics, Harvard University, Boston, MA, USA

76. Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY, USA

77. Department of Medicine, NYU Grossman School of Medicine, New York, NY, USA

78. Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA

79. Department of Urology, University Hospitals Cleveland Medical Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA

80. Case Comprehensive Cancer Center, Cleveland, OH, USA

81. Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA

References

  • 1.IHGSC. Initial sequencing and analysis of the human genome. Nature 409, 860–921, doi: 10.1038/35057062 (2001).
  • 2.Durbin RM et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073, doi: 10.1038/nature09534 (2010).
  • 3.Wang T et al. The Human Pangenome Project: a global resource to map genomic diversity. Nature 604, 437–446, doi: 10.1038/s41586-022-04601-8 (2022).
  • 4.Stratton MR, Campbell PJ & Futreal PA The cancer genome. Nature 458, 719–724, doi: 10.1038/nature07943 (2009).
  • 5.Saini N & Gordenin DA Somatic mutation load and spectra: A record of DNA damage and repair in healthy human cells. Environ Mol Mutagen 59, 672–686, doi: 10.1002/em.22215 (2018).
  • 6.Park S et al. Clonal dynamics in early human embryogenesis inferred from somatic mutation. Nature 597, 393–397, doi: 10.1038/s41586-021-03786-8 (2021). By amplifying cells into clones and subsequent sequencing, this study reconstructs embryonic dynamics through somatic mutation patterns.
  • 7.Coorens THH et al. Extensive phylogenies of human development inferred from somatic mutations. Nature 597, 387–392, doi: 10.1038/s41586-021-03790-y (2021). This study reconstructs phylogenetic trees of human development through the detection of somatic mutations in many different tissues of the same donors.
  • 8.Evrony GD et al. Single-neuron sequencing analysis of L1 retrotransposition and somatic mutation in the human brain. Cell 151, 483–496, doi: 10.1016/j.cell.2012.09.035 (2012). This study represents a first foray into single cell DNA sequencing to uncover somatic mutations in single neurons.
  • 9.Bae T et al. Different mutational rates and mechanisms in human cells at pregastrulation and neurogenesis. Science 359, 550–555, doi: 10.1126/science.aan8690 (2018). Using somatic SNVs discovered in human brain progenitor cell clones, this study obtains a human embryonic lineage tree and estimates mutation frequencies at pregastrulation and neurogenesis.
  • 10.Biesecker LG & Spinner NB A genomic view of mosaicism and human disease. Nat Rev Genet 14, 307–320, doi: 10.1038/nrg3424 (2013).
  • 11.Martínez-Glez V et al. A six-attribute classification of genetic mosaicism. Genet Med 22, 1743–1757, doi: 10.1038/s41436-020-0877-3 (2020).
  • 12.Salk JJ, Schmitt MW & Loeb LA Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations. Nat Rev Genet 19, 269–285, doi: 10.1038/nrg.2017.117 (2018).
  • 13.Martincorena I et al. Universal Patterns of Selection in Cancer and Somatic Tissues. Cell 171, 1029–1041.e1021, doi: 10.1016/j.cell.2017.09.042 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Consortium ITP-CA o. W. G. Pan-cancer analysis of whole genomes. Nature 578, 82–93, doi: 10.1038/s41586-020-1969-6 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Laurie CC et al. Detectable clonal mosaicism from birth to old age and its relationship to cancer. Nat Genet 44, 642–650, doi: 10.1038/ng.2271 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Jacobs KB et al. Detectable clonal mosaicism and its relationship to aging and cancer. Nat Genet 44, 651–658, doi: 10.1038/ng.2270 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Abyzov A et al. Somatic copy number mosaicism in human skin revealed by induced pluripotent stem cells. Nature 492, 438–442, doi: 10.1038/nature11629 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]; Using iPSC lines to perform clonal analysis, this study outlines somatic copy number variations in skin fibroblasts from multiple families.
  • 18.McConnell MJ et al. Mosaic copy number variation in human neurons. Science 342, 632–637, doi: 10.1126/science.1243472 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Blokzijl F et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264, doi: 10.1038/nature19768 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]; This study is a first to use single cell derived organoids across tissues in humans to demonstrate a variability in somatic mutation rate.
  • 20. Lee-Six H et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478, doi: 10.1038/s41586-018-0497-0 (2018).
  • 21. Lodato MA et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559, doi: 10.1126/science.aao4426 (2018).
  • 22. Milholland B et al. Differences between germline and somatic mutation rates in humans and mice. Nat Commun 8, 15183, doi: 10.1038/ncomms15183 (2017).
  • 23. Zhang L et al. Single-cell whole-genome sequencing reveals the functional landscape of somatic mutations in B lymphocytes across the human lifespan. Proc Natl Acad Sci U S A 116, 9014–9019, doi: 10.1073/pnas.1902510116 (2019).
  • 24. Lee-Six H et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537, doi: 10.1038/s41586-019-1672-7 (2019).
  • 25. Fasching L et al. Early developmental asymmetries in cell lineage trees in living individuals. Science 371, 1245–1248, doi: 10.1126/science.abe0981 (2021). By reconstructing lineage trees from somatic mutations in living humans, this study demonstrates that embryonic lineages are often asymmetric.
  • 26. Mitchell E et al. Clonal dynamics of haematopoiesis across the human lifespan. Nature 606, 343–350, doi: 10.1038/s41586-022-04786-y (2022).
  • 27. Jourdon A, Fasching L, Scuderi S, Abyzov A & Vaccarino FM The role of somatic mosaicism in brain disease. Curr Opin Genet Dev 65, 84–90, doi: 10.1016/j.gde.2020.05.002 (2020).
  • 28. Breuss MW et al. Somatic mosaicism reveals clonal distributions of neocortical development. Nature 604, 689–696, doi: 10.1038/s41586-022-04602-7 (2022).
  • 29. Olafsson S et al. Somatic Evolution in Non-neoplastic IBD-Affected Colon. Cell 182, 672–684.e11, doi: 10.1016/j.cell.2020.06.036 (2020).
  • 30. Miller MB et al. Somatic genomic changes in single Alzheimer’s disease neurons. Nature 604, 714–722, doi: 10.1038/s41586-022-04640-1 (2022).
  • 31. Heimlich JB & Bick AG Somatic Mutations in Cardiovascular Disease. Circ Res 130, 149–161, doi: 10.1161/circresaha.121.319809 (2022).
  • 32. Poduri A et al. Somatic activation of AKT3 causes hemispheric developmental brain malformations. Neuron 74, 41–48, doi: 10.1016/j.neuron.2012.03.010 (2012).
  • 33. Chung C et al. Comprehensive multi-omic profiling of somatic mutations in malformations of cortical development. Nat Genet 55, 209–220, doi: 10.1038/s41588-022-01276-9 (2023).
  • 34. Bae T et al. Analysis of somatic mutations in 131 human brains reveals aging-associated hypermutability. Science 377, 511–517, doi: 10.1126/science.abm6222 (2022). Using data collected by the Brain Somatic Mosaicism Network, this paper describes the types and frequencies of early somatic mutations in the human brain.
  • 35. Lim ET et al. Rates, distribution and implications of postzygotic mosaic mutations in autism spectrum disorder. Nat Neurosci 20, 1217–1224, doi: 10.1038/nn.4598 (2017).
  • 36. Baldassari S et al. Dissecting the genetic basis of focal cortical dysplasia: a large cohort study. Acta Neuropathol 138, 885–900, doi: 10.1007/s00401-019-02061-5 (2019).
  • 37. D’Gama AM & Walsh CA Somatic mosaicism and neurodevelopmental disease. Nat Neurosci 21, 1504–1514, doi: 10.1038/s41593-018-0257-3 (2018).
  • 38. Evans MA & Walsh K Clonal hematopoiesis, somatic mosaicism, and age-associated disease. Physiol Rev 103, 649–716, doi: 10.1152/physrev.00004.2022 (2023).
  • 39. Park JS et al. Brain somatic mutations observed in Alzheimer’s disease associated with aging and dysregulation of tau phosphorylation. Nat Commun 10, 3090, doi: 10.1038/s41467-019-11000-7 (2019).
  • 40. Spencer Chapman M et al. Lineage tracing of human development through somatic mutations. Nature 595, 85–90, doi: 10.1038/s41586-021-03548-6 (2021).
  • 41. Coorens THH et al. Inherent mosaicism and extensive mutation of human placentas. Nature 592, 80–85, doi: 10.1038/s41586-021-03345-1 (2021).
  • 42. Moore L et al. The mutational landscape of human somatic and germline cells. Nature 597, 381–386, doi: 10.1038/s41586-021-03822-7 (2021). By studying many tissues from the same donors, this study demonstrates the variability in mutation burden and signatures across human tissues.
  • 43. Machado HE et al. Diverse mutational landscapes in human lymphocytes. Nature 608, 724–732, doi: 10.1038/s41586-022-05072-7 (2022).
  • 44. Osorio FG et al. Somatic Mutations Reveal Lineage Relationships and Age-Related Mutagenesis in Human Hematopoiesis. Cell Rep 25, 2308–2316.e4, doi: 10.1016/j.celrep.2018.11.014 (2018).
  • 45. Abascal F et al. Somatic mutation landscapes at single-molecule resolution. Nature 593, 405–410, doi: 10.1038/s41586-021-03477-4 (2021). This study developed a duplex sequencing protocol with ultra-low error rates, allowing somatic mutations to be interrogated in single DNA molecules.
  • 46. Yoshida K et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272, doi: 10.1038/s41586-020-1961-1 (2020).
  • 47. Coorens THH et al. The somatic mutation landscape of normal gastric epithelium. bioRxiv, 2024.03.17.585238, doi: 10.1101/2024.03.17.585238 (2024).
  • 48. Moore L et al. The mutational landscape of normal human endometrial epithelium. Nature 580, 640–646, doi: 10.1038/s41586-020-2214-z (2020).
  • 49. Wang Y et al. APOBEC mutagenesis is a common process in normal human small intestine. Nat Genet 55, 246–254, doi: 10.1038/s41588-022-01296-5 (2023).
  • 50. Choudhury S et al. Somatic mutations in single human cardiomyocytes reveal age-associated DNA damage and widespread oxidative genotoxicity. Nat Aging 2, 714–725, doi: 10.1038/s43587-022-00261-5 (2022).
  • 51. Behjati S et al. Genome sequencing of normal cells reveals developmental lineages and mutational processes. Nature 513, 422–425, doi: 10.1038/nature13448 (2014). Using single-cell-derived organoids of mouse tissues, this paper demonstrates the use of single-nucleotide variants for in vivo lineage tracing and is an early report of developmental asymmetry.
  • 52. Pleasance ED et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196, doi: 10.1038/nature08658 (2010).
  • 53. Chaisson MJP et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat Commun 10, 1784, doi: 10.1038/s41467-018-08148-z (2019).
  • 54. Tate JG et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res 47, D941–D947, doi: 10.1093/nar/gky1015 (2019).
  • 55. Alexandrov LB et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101, doi: 10.1038/s41586-020-1943-3 (2020).
  • 56. Drews RM et al. A pan-cancer compendium of chromosomal instability. Nature 606, 976–983, doi: 10.1038/s41586-022-04789-9 (2022).
  • 57. Macintyre G et al. Copy number signatures and mutational processes in ovarian carcinoma. Nat Genet 50, 1262–1270, doi: 10.1038/s41588-018-0179-8 (2018).
  • 58. Li Y et al. Patterns of somatic structural variation in human cancer genomes. Nature 578, 112–121, doi: 10.1038/s41586-019-1913-9 (2020).
  • 59. Nurk S et al. The complete sequence of a human genome. Science 376, 44–53, doi: 10.1126/science.abj6987 (2022).
  • 60. Martincorena I et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886, doi: 10.1126/science.aaa6806 (2015). This study demonstrated the abundance of clones harboring cancer driver mutations in a normal tissue using ultradeep sequencing.
  • 61. Li R et al. A body map of somatic mutagenesis in morphologically normal human tissues. Nature 597, 398–403, doi: 10.1038/s41586-021-03836-1 (2021).
  • 62. Supek F & Lehner B Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84, doi: 10.1038/nature14173 (2015).
  • 63. Haradhvala NJ et al. Mutational Strand Asymmetries in Cancer Genomes Reveal Mechanisms of DNA Damage and Repair. Cell 164, 538–549, doi: 10.1016/j.cell.2015.12.050 (2016).
  • 64. Vöhringer H, Hoeck AV, Cuppen E & Gerstung M Learning mutational signatures and their multidimensional genomic properties with TensorSignatures. Nat Commun 12, 3628, doi: 10.1038/s41467-021-23551-9 (2021).
  • 65. Lodato MA et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98, doi: 10.1126/science.aab1785 (2015).
  • 66. Bizzotto S et al. Landmarks of human embryonic development inscribed in somatic mutations. Science 371, 1249–1253, doi: 10.1126/science.abe1544 (2021).
  • 67. Bizzotto S & Walsh CA Genetic mosaicism in the human brain: from lineage tracing to neuropsychiatric disorders. Nat Rev Neurosci 23, 275–286, doi: 10.1038/s41583-022-00572-x (2022).
  • 68. Rahbari R et al. Timing, rates and spectra of human germline mutation. Nat Genet 48, 126–133, doi: 10.1038/ng.3469 (2016).
  • 69. Bloom JC, Loehr AR, Schimenti JC & Weiss RS Germline genome protection: implications for gamete quality and germ cell tumorigenesis. Andrology 7, 516–526, doi: 10.1111/andr.12651 (2019).
  • 70. Maklakov AA & Immler S The Expensive Germline and the Evolution of Ageing. Curr Biol 26, R577–R586, doi: 10.1016/j.cub.2016.04.012 (2016).
  • 71. Cai X et al. Single-cell, genome-wide sequencing identifies clonal somatic copy-number variation in the human brain. Cell Rep 8, 1280–1289, doi: 10.1016/j.celrep.2014.07.043 (2014).
  • 72. Sun C et al. Mapping the Complex Genetic Landscape of Human Neurons. bioRxiv, doi: 10.1101/2023.03.07.531594 (2023).
  • 73. Loh PR et al. Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature 559, 350–355, doi: 10.1038/s41586-018-0321-x (2018).
  • 74. Evrony GD et al. Cell lineage analysis in human brain using endogenous retroelements. Neuron 85, 49–59, doi: 10.1016/j.neuron.2014.12.028 (2015). This paper represents early work using single-cell somatic mutational signals to reconstruct cell lineages.
  • 75. Erwin JA et al. L1-associated genomic regions are deleted in somatic cells of the healthy human brain. Nat Neurosci 19, 1583–1591, doi: 10.1038/nn.4388 (2016).
  • 76. Zhu X et al. Machine learning reveals bilateral distribution of somatic L1 insertions in human neurons and glia. Nat Neurosci 24, 186–196, doi: 10.1038/s41593-020-00767-4 (2021).
  • 77. Zhao B et al. Somatic LINE-1 retrotransposition in cortical neurons and non-brain tissues of Rett patients and healthy individuals. PLoS Genet 15, e1008043, doi: 10.1371/journal.pgen.1008043 (2019).
  • 78. Shukla R et al. Endogenous retrotransposition activates oncogenic pathways in hepatocellular carcinoma. Cell 153, 101–111, doi: 10.1016/j.cell.2013.02.032 (2013).
  • 79. Nam CH et al. Widespread somatic L1 retrotransposition in normal colorectal epithelium. Nature 617, 540–547, doi: 10.1038/s41586-023-06046-z (2023).
  • 80. Chimeric transcripts of transposable elements and genes are a source of tumor-specific antigens. Nat Genet 55, 538–539, doi: 10.1038/s41588-023-01361-7 (2023).
  • 81. Gebrie A Transposable elements as essential elements in the control of gene expression. Mob DNA 14, 9, doi: 10.1186/s13100-023-00297-3 (2023).
  • 82. Martincorena I & Campbell PJ Somatic mutation in cancer and normal cells. Science 349, 1483–1489, doi: 10.1126/science.aab4082 (2015).
  • 83. Martincorena I et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917, doi: 10.1126/science.aau3879 (2018).
  • 84. Colom B et al. Mutant clones in normal epithelium outcompete and eliminate emerging tumours. Nature 598, 510–514, doi: 10.1038/s41586-021-03965-7 (2021).
  • 85. Abby E et al. Notch1 mutations drive clonal expansion in normal esophageal epithelium but impair tumor growth. Nat Genet 55, 232–245, doi: 10.1038/s41588-022-01280-z (2023).
  • 86. Choi SH, Ku EJ, Choi YA & Oh JW Grave-to-cradle: human embryonic lineage tracing from the postmortem body. Exp Mol Med 55, 13–21, doi: 10.1038/s12276-022-00912-y (2023).
  • 87. Coorens THH et al. Embryonal precursors of Wilms tumor. Science 366, 1247–1251, doi: 10.1126/science.aax1323 (2019).
  • 88. Kwon SG et al. Asymmetric Contribution of Blastomere Lineages of First Division of the Zygote to Entire Human Body Using Post-Zygotic Variants. Tissue Eng Regen Med 19, 809–821, doi: 10.1007/s13770-022-00443-7 (2022).
  • 89. Ju YS et al. Somatic mutations reveal asymmetric cellular dynamics in the early human embryo. Nature 543, 714–718, doi: 10.1038/nature21703 (2017).
  • 90. Zernicka-Goetz M First cell fate decisions and spatial patterning in the early mouse embryo. Semin Cell Dev Biol 15, 563–572, doi: 10.1016/j.semcdb.2004.04.004 (2004).
  • 91. Brunner SF et al. Somatic mutations and clonal dynamics in healthy and cirrhotic human liver. Nature 574, 538–542, doi: 10.1038/s41586-019-1670-9 (2019).
  • 92. Ng SWK et al. Convergent somatic mutations in metabolism genes in chronic liver disease. Nature 598, 473–478, doi: 10.1038/s41586-021-03974-6 (2021).
  • 93. Sherman MA et al. Large mosaic copy number variations confer autism risk. Nat Neurosci 24, 197–203, doi: 10.1038/s41593-020-00766-5 (2021).
  • 94. Keppler-Noreuil KM et al. PIK3CA-related overgrowth spectrum (PROS): diagnostic and testing eligibility criteria, differential diagnosis, and evaluation. Am J Med Genet A 167A, 287–295, doi: 10.1002/ajmg.a.36836 (2015).
  • 95. Custers L et al. Somatic mutations and single-cell transcriptomes reveal the root of malignant rhabdoid tumours. Nat Commun 12, 1407, doi: 10.1038/s41467-021-21675-6 (2021).
  • 96. Pilet J et al. Preneoplastic liver colonization by 11p15.5 altered mosaic cells in young children with hepatoblastoma. Nat Commun 14, 7122, doi: 10.1038/s41467-023-42418-9 (2023).
  • 97. Checri R et al. Detection of brain somatic mutations in focal cortical dysplasia during epilepsy presurgical workup. Brain Commun 5, fcad174, doi: 10.1093/braincomms/fcad174 (2023).
  • 98. Lee JH et al. De novo somatic mutations in components of the PI3K-AKT3-mTOR pathway cause hemimegalencephaly. Nat Genet 44, 941–945, doi: 10.1038/ng.2329 (2012).
  • 99. Kinsler VA et al. Multiple congenital melanocytic nevi and neurocutaneous melanosis are caused by postzygotic mutations in codon 61 of NRAS. J Invest Dermatol 133, 2229–2236, doi: 10.1038/jid.2013.70 (2013).
  • 100. Beck DB et al. Somatic Mutations in UBA1 and Severe Adult-Onset Autoinflammatory Disease. N Engl J Med 383, 2628–2638, doi: 10.1056/NEJMoa2026834 (2020).
  • 101. Higham CF, Morales F, Cobbold CA, Haydon DT & Monckton DG High levels of somatic DNA diversity at the myotonic dystrophy type 1 locus are driven by ultra-frequent expansion and contraction mutations. Hum Mol Genet 21, 2450–2463, doi: 10.1093/hmg/dds059 (2012).
  • 102. De Rooij KE, De Koning Gans PA, Roos RA, Van Ommen GJ & Den Dunnen JT Somatic expansion of the (CAG)n repeat in Huntington disease brains. Hum Genet 95, 270–274, doi: 10.1007/bf00225192 (1995).
  • 103. Swami M et al. Somatic expansion of the Huntington’s disease CAG repeat in the brain is associated with an earlier age of disease onset. Hum Mol Genet 18, 3039–3047, doi: 10.1093/hmg/ddp242 (2009).
  • 104. Maury EA et al. Schizophrenia-associated somatic copy-number variants from 12,834 cases reveal recurrent NRXN1 and ABCB11 disruptions. Cell Genom 3, 100356, doi: 10.1016/j.xgen.2023.100356 (2023).
  • 105. Kim J et al. Prevalence and mechanisms of somatic deletions in single human neurons during normal aging and in DNA repair disorders. Nat Commun 13, 5918, doi: 10.1038/s41467-022-33642-w (2022).
  • 106. Ren AA et al. PIK3CA and CCM mutations fuel cavernomas through a cancer-like mechanism. Nature 594, 271–276, doi: 10.1038/s41586-021-03562-8 (2021).
  • 107. Jaiswal S et al. Age-related clonal hematopoiesis associated with adverse outcomes. N Engl J Med 371, 2488–2498, doi: 10.1056/NEJMoa1408617 (2014).
  • 108. Zekavat SM et al. Hematopoietic mosaic chromosomal alterations increase the risk for diverse types of infection. Nat Med 27, 1012–1024, doi: 10.1038/s41591-021-01371-0 (2021).
  • 109. Nanki K et al. Somatic inflammatory gene mutations in human ulcerative colitis epithelium. Nature 577, 254–259, doi: 10.1038/s41586-019-1844-5 (2020).
  • 110. Carithers LJ et al. A Novel Approach to High-Quality Postmortem Tissue Procurement: The GTEx Project. Biopreserv Biobank 13, 311–319, doi: 10.1089/bio.2015.0032 (2015).
  • 111. Coorens THH et al. The human and non-human primate developmental GTEx projects. Nature 637, 557–564, doi: 10.1038/s41586-024-08244-9 (2025).
  • 112. Lemke AA et al. Addressing underrepresentation in genomics research through community engagement. Am J Hum Genet 109, 1563–1571, doi: 10.1016/j.ajhg.2022.08.005 (2022).
  • 113. The Impact of Genomic Variation on Function (IGVF) Consortium. arXiv (2023).
  • 114. Akbarian S et al. The PsychENCODE project. Nat Neurosci 18, 1707–1712, doi: 10.1038/nn.4156 (2015).
  • 115. Aganezov S et al. A complete reference genome improves analysis of human genetic variation. Science 376, eabl3533, doi: 10.1126/science.abl3533 (2022).
  • 116. Regev A et al. The Human Cell Atlas. eLife 6, doi: 10.7554/eLife.27041 (2017).
  • 117. Cheng AP et al. Whole genome error-corrected sequencing for sensitive circulating tumor DNA cancer monitoring. bioRxiv, 2022.11.17.516904, doi: 10.1101/2022.11.17.516904 (2022).
  • 118. Zahn H et al. Scalable whole-genome single-cell library preparation without preamplification. Nat Methods 14, 167–173, doi: 10.1038/nmeth.4140 (2017).
  • 119. Laks E et al. Clonal Decomposition and DNA Replication States Defined by Scaled Single-Cell Genome Sequencing. Cell 179, 1207–1221.e22, doi: 10.1016/j.cell.2019.10.026 (2019).
  • 120. Gonzalez-Pena V et al. Accurate genomic variant detection in single cells with primary template-directed amplification. Proc Natl Acad Sci U S A 118, doi: 10.1073/pnas.2024176118 (2021).
  • 121. Yizhak K et al. RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science 364, doi: 10.1126/science.aaw0726 (2019).
  • 122. Muyas F et al. De novo detection of somatic mutations in high-throughput single-cell profiling data sets. Nat Biotechnol, doi: 10.1038/s41587-023-01863-z (2023).
  • 123. Gao T et al. A pan-tissue survey of mosaic chromosomal alterations in 948 individuals. Nat Genet, doi: 10.1038/s41588-023-01537-1 (2023).
  • 124. Hagemann-Jensen M et al. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nat Biotechnol 38, 708–714, doi: 10.1038/s41587-020-0497-0 (2020).
  • 125. Shao W & Wang T Transcript assembly improves expression quantification of transposable elements in single-cell RNA-seq data. Genome Res 31, 88–100, doi: 10.1101/gr.265173.120 (2021).
  • 126. Benjamin KJ et al. Single-cell Total RNA Miniaturized sequencing (STORM-seq) reveals differentiation trajectories of primary human fallopian tube epithelium. bioRxiv, 2022.03.14.484332, doi: 10.1101/2022.03.14.484332 (2022).
  • 127. Xing D, Tan L, Chang CH, Li H & Xie XS Accurate SNV detection in single cells by transposon-based whole-genome amplification of complementary strands. Proc Natl Acad Sci U S A 118, doi: 10.1073/pnas.2013106118 (2021).
  • 128. Bae JH et al. Single duplex DNA sequencing with CODEC detects mutations with high sensitivity. Nat Genet 55, 871–879, doi: 10.1038/s41588-023-01376-0 (2023).
  • 129. Liu MH et al. DNA mismatch and damage patterns revealed by single-molecule sequencing. Nature 630, 752–761, doi: 10.1038/s41586-024-07532-8 (2024).
  • 130. Ebert P et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, doi: 10.1126/science.abf7117 (2021).
  • 131. Zhou W et al. Identification and characterization of occult human-specific LINE-1 insertions using long-read sequencing technology. Nucleic Acids Res 48, 1146–1163, doi: 10.1093/nar/gkz1173 (2020).
  • 132. McDonald TL et al. Cas9 targeted enrichment of mobile elements using nanopore sequencing. Nat Commun 12, 3586, doi: 10.1038/s41467-021-23918-y (2021).
  • 133. Zhao T et al. Spatial genomics enables multi-modal study of clonal heterogeneity in tissues. Nature 601, 85–91, doi: 10.1038/s41586-021-04217-4 (2022).
  • 134. Russell AJC et al. Slide-tags enables single-nucleus barcoding for multimodal spatial genomics. Nature, doi: 10.1038/s41586-023-06837-4 (2023).
  • 135. Nam AS et al. Somatic mutations and cell identity linked by Genotyping of Transcriptomes. Nature 571, 355–360, doi: 10.1038/s41586-019-1367-0 (2019).
  • 136. Myers RM et al. Integrated Single-Cell Genotyping and Chromatin Accessibility Charts JAK2V617F Human Hematopoietic Differentiation. bioRxiv, 2022.05.11.491515, doi: 10.1101/2022.05.11.491515 (2022).
  • 137. Feusier J et al. Pedigree-based estimation of human mobile element retrotransposition rates. Genome Res 29, 1567–1577, doi: 10.1101/gr.247965.118 (2019).
  • 138. Dou Y et al. Accurate detection of mosaic variants in sequencing data without matched controls. Nat Biotechnol 38, 314–319, doi: 10.1038/s41587-019-0368-8 (2020).
  • 139. Poplin R et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36, 983–987, doi: 10.1038/nbt.4235 (2018).
  • 140. Ding J et al. Feature-based classifiers for somatic mutation detection in tumour–normal paired sequencing data. Bioinformatics 28, 167–175, doi: 10.1093/bioinformatics/btr629 (2011).
  • 141. Gaiti F et al. Epigenetic evolution and lineage histories of chronic lymphocytic leukaemia. Nature 569, 576–580, doi: 10.1038/s41586-019-1198-z (2019).
  • 142. Grimes K et al. Cell type-specific consequences of mosaic structural variants in hematopoietic stem and progenitor cells. bioRxiv, 2023.07.25.550502, doi: 10.1101/2023.07.25.550502 (2023).
  • 143. Dou Y, Gold HD, Luquette LJ & Park PJ Detecting Somatic Mutations in Normal Cells. Trends Genet 34, 545–557, doi: 10.1016/j.tig.2018.04.003 (2018).
  • 144. Yang X et al. Control-independent mosaic single nucleotide variant detection with DeepMosaic. Nat Biotechnol 41, 870–877, doi: 10.1038/s41587-022-01559-w (2023).
  • 145. Wilkinson MD et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018, doi: 10.1038/sdata.2016.18 (2016).

Supplementary Materials

Supplement: participants in the SMaHT network
