A Universal Duplex Sequencing Approach for Accurate Detection of Somatic Mutations

Shuvro P Nandi; Yuhe Cheng; Shams Al-Azzam; Safa Saeed; Audrey Kristin; Nadia Sunico; Isabella R Stuewe; Zichen Jiang; Luka Culibrk; Maria Zhivagui; Xiaoxu Yang; Rachel M Wise; Foster C Jacobs; Bérénice Chavanel; Michael Korenjak; Mia Petljak; Silvia Balbo; Laurie G Hudson; Ke Jian Liu; Jiri Zavadil; Joseph G Gleeson; Ludmil B Alexandrov

doi:10.1101/2025.09.14.676103

This is a preprint.

It has not yet been peer reviewed by a journal.


[Preprint]. 2025 Sep 16:2025.09.14.676103. [Version 1] doi: 10.1101/2025.09.14.676103


Shuvro P Nandi ¹,²,³,^, Yuhe Cheng ¹,²,³,^, Shams Al-Azzam ¹,²,³,⁴, Safa Saeed ¹,²,³, Audrey Kristin ¹,²,³, Nadia Sunico ¹,²,³, Isabella R Stuewe ¹,²,³, Zichen Jiang ¹,²,³, Luka Culibrk ⁵,⁶, Maria Zhivagui ⁷, Xiaoxu Yang ⁸,⁹,¹⁰, Rachel M Wise ¹¹, Foster C Jacobs ¹², Bérénice Chavanel ¹³, Michael Korenjak ¹³, Mia Petljak ⁵,⁶, Silvia Balbo ¹², Laurie G Hudson ¹¹, Ke Jian Liu ¹⁴,¹⁵, Jiri Zavadil ¹³, Joseph G Gleeson ⁸,⁹, Ludmil B Alexandrov ¹,²,³,¹⁶,*

¹Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, USA

²Department of Bioengineering, UC San Diego, La Jolla, CA, USA

³Moores Cancer Center, UC San Diego, La Jolla, CA, USA

⁴Biomedical Sciences Graduate Program, UC San Diego, La Jolla, CA, USA

⁵Department of Pathology, Grossman Medical School, New York University, NY, USA

⁶Perlmutter Cancer Center, NYU Grossman School of Medicine, New York, NY, USA.

⁷Kirk Kerkorian School of Medicine, University of Nevada, Las Vegas, NV, USA

⁸Department of Neurosciences and Pediatrics, UC San Diego, La Jolla, CA, USA

⁹Rady Children’s Institute for Genomic Medicine, San Diego, CA, USA

¹⁰Department of Human Genetics, University of Utah, Salt Lake City, UT, USA

¹¹Department of Pharmaceutical Sciences, College of Pharmacy, University of New Mexico, Albuquerque, NM, USA

¹²Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA

¹³International Agency for Research on Cancer WHO, Epigenomics and Mechanisms Branch, Lyon, France

¹⁴Stony Brook Cancer Center, Stony Brook University, Stony Brook, NY, USA

¹⁵Department of Pathology, Stony Brook University School of Medicine, Stony Brook, NY, USA

¹⁶Sanford Stem Cell Institute, UC San Diego, La Jolla, CA, USA

^These authors contributed equally

AUTHOR CONTRIBUTIONS

S.P.N. and L.B.A. conceptualized the study and designed the UDSeq protocol with assistance from Y.C. and advice from M.P., J.Z., and J.G.G. S.P.N. optimized the protocol and performed comparisons with assistance and advice from S.A., S.S., A.K., N.S., I.R.S., Z.J., L.C., and M.Z. Access and analysis of human samples was performed by S.P.N., Y.C., X.Y., and J.G.G. Access and analysis of mouse samples was performed by S.P.N., Y.C., R.M.W., L.G.H., and K.J.L. Access and analysis of rat samples was performed by S.P.N., Y.C., F.C.J., and S.B. Access and analysis of cell line experiments was performed by S.P.N., Y.C., B.C., M.K., and J.Z. S.P.N. and L.B.A. wrote the manuscript with input from all co-authors. All authors read and approved the final manuscript.

Correspondence should be addressed to L2alexandrov@health.ucsd.edu.

PMCID: PMC12458366 PMID: 41000705

Abstract

Ultra-accurate detection of rare somatic mutations is critical for understanding mutational processes in human disease, aging, and environmental exposures, yet current methods are limited by error rates, restricted genome coverage, and high DNA input. We present UDSeq, a duplex sequencing protocol combining random fragmentation, efficient UMI ligation, and quantitative input control to achieve near-complete genome/exome representation from as little as 100 pg DNA. Benchmarking in human sperm estimates a UDSeq error rate of ~2.5×10⁻⁹ per base pair. UDSeq captures mutational signatures from heterogeneous populations without clonal expansion, reproduces exposure-specific patterns in cell lines and rodent models, and enables cross-species profiling. Compared with prior duplex methods, UDSeq yields up to fourfold more usable duplex molecules, improves library conversion, and remains cost-effective. We include a step-by-step protocol with quality-control checkpoints for fragment size, ligation yield, library conversion, and duplication rate. UDSeq provides a scalable, low-input platform for accurate profiling of somatic mutagenesis.

INTRODUCTION

Somatic mutations, which can be caused by both endogenous and exogenous mutagenic processes, are present in all human cells¹. These mutations accumulate gradually over time and often go unnoticed, as most have minimal or no effect on cellular function¹⁻³. However, certain mutations can disrupt key biological processes⁴, lead to cell death⁵, or confer a selective growth advantage, resulting in clonal expansion⁶,⁷. Cancer is the most well-known example of a disease driven by somatic mutations, where specific alterations initiate tumorigenesis⁸, promote progression⁹, and can confer treatment resistance¹⁰. Beyond cancer, somatic mutations are increasingly recognized as contributors to other diseases¹¹, including neurodegenerative disorders¹² and cardiovascular conditions¹³. However, their roles in these areas remain much less studied, as detecting low-frequency mutations in non-cancer tissues presents significant technical challenges¹⁴,¹⁵.

From a technical standpoint, the study of somatic mutations in cancer has been greatly facilitated by the fact that most tumors originate from a single mutated cell, whose proliferation produces thousands of descendants carrying the same mutations as the progenitor⁸. Although additional mutations may emerge during clonal expansion, the original mutations remain uniformly present across the tumor tissue⁹. This clonal nature enables reliable detection of tumor mutations with conventional sequencing, despite its inherent error rates ranging from one error per 1,000 base pairs (bp; i.e., 10⁻³ per bp) to six errors per 1,000 bp (i.e., 6 × 10⁻³ per bp)¹⁶. Because these shared mutations appear in every cancer cell, repeated resequencing generates a strong, reproducible signal that stands out from the random noise of sequencing errors. In contrast, detecting somatic mutations in most non-cancer tissues is far more challenging, as each cell typically harbors a unique set of mutations¹⁷,¹⁸. As such, accurately studying somatic mutations in non-cancerous healthy or diseased tissues requires methods with error rates below 1 × 10⁻⁸ errors per bp (less than one error per 100 million sequenced bp)¹⁸.
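To make the scale of the problem concrete, the following sketch converts per-base error rates into an expected number of erroneous calls. It assumes roughly three billion sequenced bases (about one haploid human genome's worth, the figure used later for sperm); the function name and that default are illustrative assumptions, not part of the UDSeq protocol.

```python
# Minimal sketch: expected number of erroneous base calls for a given
# per-base error rate. The 3e9 default is an assumed round figure for one
# haploid human genome's worth of sequenced bases.

def expected_errors(error_rate_per_bp: float, bases_sequenced: float = 3.0e9) -> float:
    """Return the expected count of erroneous base calls."""
    return error_rate_per_bp * bases_sequenced


if __name__ == "__main__":
    rates = {
        "conventional sequencing, lower bound (1e-3)": 1e-3,
        "conventional sequencing, upper bound (6e-3)": 6e-3,
        "threshold cited for non-clonal tissues (1e-8)": 1e-8,
    }
    for label, rate in rates.items():
        print(f"{label}: ~{expected_errors(rate):,.0f} expected errors")
```

Under these assumptions, conventional error rates correspond to millions of expected errors per three billion bases, whereas an error rate of 10⁻⁸ corresponds to about 30, which is why non-clonal mutations cannot be separated from noise without error correction.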

The need for sequencing protocols with low error rates is further evident in efforts to evaluate the mutagenic potential of known carcinogens in experimental systems. For example, in vitro studies have required intricate experimental setups, where cells are exposed to a potential mutagenic carcinogen, followed by isolation and clonal expansion of single cells from the exposed population before sequencing¹⁹,²⁰. The clonal amplification step, though labor-intensive, is crucial for accurately detecting mutations, as it ensures that mutations present in the progeny of a single cell can be distinguished from background noise¹⁹,²⁰. Without this step, bulk sequencing cannot distinguish low-frequency mutations from background noise, since each cell harbors a distinct set of somatic mutations.

To overcome these limitations, several approaches have been developed to detect rare somatic mutations. Single-cell DNA sequencing²¹⁻²⁵ and single-cell clonal expansion²⁶⁻²⁸ can, in principle, resolve mutations at the level of individual cells²⁶,²⁸. However, these methods are labor-intensive, costly, and require sequencing large numbers of cells to obtain a representative view of tissue-wide mutagenesis. Additionally, amplification artifacts²⁹ and allelic dropout³⁰ can compromise accuracy for detecting rare mutations. As a scalable alternative, duplex sequencing has emerged as a powerful method for profiling somatic mutations at extremely low frequencies³¹. By exploiting DNA’s double-stranded nature, duplex sequencing independently evaluates each strand³²,³³ and confirms mutations only when they are independently detected on both strands, greatly reducing error rates and enabling confident identification of rare somatic mutations.

Advances in duplex sequencing have improved the detection of rare somatic mutations, but no current method offers universal, single-molecule resolution in a cost-effective format that supports both whole-genome profiling and targeted capture from limited input DNA across species. An ideal approach would achieve an error rate below 10⁻⁸, offer full genome compatibility for profiling human tissues, human and non-human model systems, and non-model organisms, while requiring minimal DNA input, supporting targeted enrichment, and remaining cost-effective. To address this need, we developed Universal Duplex Sequencing (UDSeq), a novel single-molecule duplex sequencing protocol for rapid, accurate detection of rare somatic mutations across diverse biological systems. We put UDSeq in the context of existing error-corrected sequencing methods, demonstrating superior performance and broad applicability. To showcase its versatility, we applied UDSeq to samples from humans, mice, rats, chickens, and sheep, successfully detecting mutations in whole genomes and targeted regions derived from cell lines and multiple tissue types.

RESULTS

Overview of Existing Error-Corrected Sequencing Methods

Over the past decade, duplex sequencing has revolutionized the detection of rare somatic mutations¹⁸. In this approach, a ‘duplex consensus’ is generated by independently sequencing both the Watson and Crick strands of the same DNA molecule and typically confirming mutations only if present on both strands. Most protocols use unique molecular identifiers (UMIs) and exploit DNA strand complementarity to perform this process³³. To our knowledge, the first method of this kind was introduced in 2012, enabling sequencing of small genomic panels (generally under 1 megabase) with error rates of approximately 10⁻⁷ errors per bp³². However, this original DupSeq method had low efficiency in generating duplex consensuses³², limiting its practical application by necessitating large amounts of input DNA and extensive sequencing. Subsequently, Hoang et al. developed BotSeqS to address this challenge, introducing a dilution step immediately before library amplification³⁴. This dilution step creates a bottleneck, enabling efficient random sampling of double-stranded template molecules and substantially reducing the required amount of sequencing. Notably, BotSeqS could be applied to input DNA amounts as low as 50 nanograms (ng). Despite this improvement, the error rate of BotSeqS was similar to that of DupSeq, with independent analysis estimating it at ~2 × 10⁻⁷ errors per bp³⁵.
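The duplex consensus idea can be illustrated with a toy sketch. It is not the UDSeq or any published bioinformatics pipeline: it assumes reads are already grouped by UMI and strand, are of equal length, and are oriented to the same reference strand, and it uses a deliberately strict rule (all reads within a strand family must agree, and both strands must agree) that real tools relax with quality-aware thresholds. All names below are hypothetical.

```python
from collections import Counter, defaultdict

# Toy duplex consensus caller (illustrative assumptions only).

def strand_consensus(reads, min_reads=2):
    """Consensus for one strand family; 'N' wherever the reads disagree."""
    if len(reads) < min_reads:
        return None  # too few reads to trust this strand family
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count == len(column) else "N")
    return "".join(consensus)


def duplex_consensus(families):
    """families maps (umi, strand) -> list of reads, strand in {'top', 'bottom'}."""
    by_umi = defaultdict(dict)
    for (umi, strand), reads in families.items():
        by_umi[umi][strand] = strand_consensus(reads)
    result = {}
    for umi, strands in by_umi.items():
        top, bottom = strands.get("top"), strands.get("bottom")
        if top is None or bottom is None:
            continue  # a duplex call needs both strands of the molecule
        # Keep a base only when both strand consensuses agree on it.
        result[umi] = "".join(a if a == b else "N" for a, b in zip(top, bottom))
    return result


example_families = {
    ("AAT", "top"): ["ACGT", "ACGT"],
    ("AAT", "bottom"): ["ACGT", "ACTT"],   # one read carries an error
    ("CCG", "top"): ["ACGA", "ACGA"],      # no bottom strand: discarded
    ("GGA", "top"): ["ACTT", "ACTT"],      # change seen on one strand only
    ("GGA", "bottom"): ["ACGT", "ACGT"],
}

if __name__ == "__main__":
    print(duplex_consensus(example_families))
```

In this example the molecule tagged `GGA` carries a change supported by every read of the top strand but absent from the bottom strand, so the position is masked rather than called. This is the mechanism by which single-strand damage and amplification errors are excluded.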

To further enhance BotSeqS and reduce error rates, NanoSeqV1 replaced sonication and standard end repair during library preparation with restriction enzyme-based fragmentation using HpyCH4V, thereby avoiding end-repair-associated errors³⁵. This innovation reduced the error rate to approximately 5 × 10⁻⁹ errors per base pair and allowed NanoSeqV1 to be applied to input DNA amounts as low as 50 ng. Nonetheless, it limited coverage to only 30% of the human genome and restricted its applicability to other genomes due to the specificity of the restriction enzyme³⁵. More recently, the CODEC (Concatenating Original Duplex for Error Correction) method employed specially designed quadruplex adaptors to physically link the Watson and Crick strands into a single-duplex molecule, enabling sequencing on a standard Illumina short-read platform³⁶. The original CODEC method achieved error rates of approximately 10⁻⁷ errors per bp, comparable to those of DupSeq and BotSeqS, while offering greater cost-effectiveness compared to DupSeq³⁶. Additionally, the original CODEC method could be applied to input DNA amounts as low as 2.5 ng. A modified version of the CODEC protocol, incorporating fragmentation steps similar to those used by NanoSeqV1, reduced the error rate to approximately 10⁻⁸ errors per bp. However, this modified version inherited the same limitations as NanoSeqV1, including coverage restricted to only 30% of the human genome³⁶. Additionally, to overcome the partial genome coverage limitations of the original NanoSeq, a second version, NanoSeqV2, was developed using an alternative genome fragmentation strategy. Nonetheless, its low library conversion efficiency necessitates a substantially larger amount of input DNA, which may restrict its broader applicability³⁷.

The methods described above all rely on short-read duplex sequencing. In contrast, a recently developed long-read sequencing technique, HiDEF-seq (Hairpin Duplex Enhanced Fidelity sequencing), leveraged the PacBio platform to achieve single-molecule fidelity³⁸. HiDEF-seq utilized the inherent single-molecule nature of PacBio’s technology³⁹ to achieve high accuracy, performing 5 to 20 sequencing passes per strand with estimated error rates below 10⁻⁹ errors per bp³⁸. Notably, HiDEF-seq can resolve some single-strand mismatches, a capability not achievable with other duplex sequencing methods. However, it required a high input of DNA and incurred higher costs due to the expense of PacBio long-read sequencing compared to short-read technologies. Specifically, it needed at least 500 ng of high-quality DNA or 1,500 ng of degraded DNA to achieve 40% genome coverage, with even larger amounts required for complete genome sequencing³⁸.

Each of the previously discussed approaches—DupSeq, BotSeqS, NanoSeq, CODEC, and HiDEF-seq—represents a significant advancement in the detection of rare somatic mutations, introducing innovative methodologies, enhanced efficiency, and reduced error rates (Table 1). However, each method also comes with its own set of limitations, including challenges related to efficiency, error rates, genome coverage, DNA input requirements, or cost. To overcome many of these limitations, we present UDSeq, a single-molecule duplex sequencing protocol optimized for rapid, accurate detection of rare somatic mutations (Table 1), with the complete protocol provided in Supplementary Note 1.

Table 1:

Comparative overview of DupSeq, BotSeqS, NanoSeq, HiDEF-seq, CODEC, and UDSeq.

Sequencing protocol	Primary innovation	Genome coverage	Minimum input DNA	Approx. error rate (per bp)	Key limitations
DupSeq ³²	Independent tagging and sequencing of both DNA strands to enable duplex consensus mutation calling	Targeted panels	~1 μg	2 × 10⁻⁷	Requires high DNA input; generally limited to defined panels; labor-intensive workflow; High error rate when compared to other protocols
BotSeqS ³⁴	Bottleneck dilution to enrich randomly sampled duplex DNA molecules for consensus mutation calling	Genome-wide or targeted	50 ng	2 × 10⁻⁷	High error rate when compared to other protocols. Low efficiency.
NanoSeqV1 ³⁵	Removes end-repair–associated errors via restriction-enzyme fragmentation	~30% of genome	50 ng	5 × 10⁻⁹	Restriction sites limit coverage and species portability; needs redesign for new targets
NanoSeqV2 ³⁷	Alternative genome fragmentation methods that provide full genome coverage whilst retaining the original error rates	Genome-wide or targeted	30 ng	5 × 10⁻⁹	Low efficiency. Still unpublished but available as preprint
HiDEF-seq ³⁸	High-fidelity duplex consensus from PacBio long-read sequencing with detection of single-strand mismatches	~40% of genome	500 ng – 1.5 μg	4 × 10⁻⁹	Complex workflow, elevated sequencing depth required; partial genome coverage; high cost
CODEC ³⁶	Custom quadruplex adapters physically linking Watson & Crick strands	Genome-wide or targeted	2.5 ng	1 × 10⁻⁷	Requires specialized reagents, sequencing customization, and intensive library preparation; higher error rate than other protocols.
CODEC-HpyCH4V ³⁶	Restriction enzyme–based CODEC enabling targeted duplex capture	~30% of genome	50 ng	3 × 10⁻⁸	Requires specialized reagents, sequencing customization, and intensive library preparation; higher error rate than other protocols.
UDSeq	Near-complete genome coverage with ultra-low input via random fragmentation and efficient duplex consensus.	Genome-wide or targeted	100 pg	2.5 × 10⁻⁹	Similar to all other duplex methods, does not capture large structural variants or copy-number changes


Innovation Over Prior Protocols

To develop the UDSeq protocol, we built upon the advances of NanoSeqV1 over BotSeqS and introduced targeted innovations that overcome key limitations in existing duplex sequencing methods. Each improvement in the protocol was designed not only to enhance performance but also to expand the scope, versatility, and practicality of single-molecule mutation detection (Figure 1a; Supplementary Figure 1a).

Figure 1: — ***(a)*** High-level workflow illustrating the versatility of UDSeq across whole-genome and targeted sequencing approaches. ***(b)*** Comparison of error rates among UDSeq, other duplex sequencing methods, and germline *de novo* mutation (DNM) studies. UDSeq and other duplex approaches were applied to human sperm samples, whereas the DNM studies analyzed germline data from trios. Scatter plot shows the relationship between paternal age and the number of single base substitutions (SBSs) per haploid sperm genome across different sequencing approaches. Data points represent individual samples analyzed by UDSeq (n=8), HiDEF-Seq (n=5), and NanoSeqV1 (n=7), as well as parental DNM estimates from Jónsson (n=1,548) and Halldorsson (n=2,963). Regression lines with corresponding equations and coefficients of determination (R²) are shown for each dataset: error-corrected sequencing (UDSeq, HiDEF-Seq, NanoSeqV1) and parental DNM studies. ***(c)*** Left panel shows estimated slopes for the number of SBS accumulated per haploid sperm genome per year, and the right panel shows estimated y-intercepts representing the predicted number of SBS present at birth. Both values are derived from the regression analyses in panel b. Error bars indicate standard error of the estimate. ***(d)*** Top panel shows the SBS-96 mutational profile from sperm samples analyzed by UDSeq (n=8), and the bottom panel shows the SBS-96 profile from parental DNMs. The SBS-96 profile encompasses all single-base substitutions (C>A, C>G, C>T, T>A, T>C, and T>G) and their immediate trinucleotide sequence context. The two profiles have a cosine similarity of 0.92. Relative contributions of the aging-associated signatures SBS1 and SBS5 are shown on the right for each profile.

First, we replaced sonication- and HpyCH4V-based fragmentation with random fragmentation using either NEBNext dsDNA Fragmentase (M0348L) or UltraShear (M7634L). Both methods enable unbiased fragmentation across the genome, allowing UDSeq to achieve near-complete coverage (≥95%; comparable to bulk sequencing) of the genome and exome—an advance over NanoSeqV1, which was limited to ~30% of the genome (Supplementary Figure 1b), and comparable to NanoSeqV2, which also provides near-complete coverage³⁷. dsDNA Fragmentase is more cost-effective and widely accessible, though it produces short overhangs that require additional trimming during bioinformatics processing. In contrast, UltraShear generates highly uniform fragment sizes without overhangs, reducing computational preprocessing steps—but at higher reagent cost. This flexibility allows users to balance performance, cost, and bioinformatics complexity based on experimental needs.

Second, we adopted the xGen™ cfDNA & FFPE DNA Kit (IDT) for UMI ligation. This kit provides high ligation efficiency even with minimal DNA input, enabling accurate duplex sequencing from as little as 0.1 ng (100 picograms) of starting material. Compared to NanoSeqV2, it delivers a substantial improvement in library conversion efficiency, yielding up to four times more femtomoles of usable library from the same input DNA (p=0.00022; Supplementary Figure 1c). This enhancement reduces input requirements and broadens applicability to samples with limited DNA, such as clinical biopsies or environmental isolates.
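For a sense of scale, the sketch below converts a DNA mass into approximate haploid genome equivalents. The average mass of about 650 g/mol per base pair and the haploid genome size of three billion base pairs are textbook approximations assumed here; they are not values taken from the protocol.

```python
# Minimal sketch: how many haploid human genome copies a DNA mass represents.
# Both constants below are assumed approximations, not protocol parameters.

AVOGADRO = 6.022e23            # molecules per mole
BP_MASS_G_PER_MOL = 650.0      # assumed average mass of one base pair
HAPLOID_GENOME_BP = 3.0e9      # assumed haploid human genome size


def haploid_genome_equivalents(dna_pg: float, genome_bp: float = HAPLOID_GENOME_BP) -> float:
    """Convert a DNA mass in picograms to haploid genome equivalents."""
    genome_mass_pg = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO * 1e12
    return dna_pg / genome_mass_pg


if __name__ == "__main__":
    for label, picograms in [("100 pg", 100.0), ("50 ng", 50_000.0)]:
        print(f"{label}: ~{haploid_genome_equivalents(picograms):,.0f} haploid genome equivalents")
```

Under these assumptions, 100 pg corresponds to roughly 30 haploid genome equivalents (about 15 diploid cells), compared with about 15,000 for a 50 ng input.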

Third, we incorporated accurate quantification of UMI-ligated molecules using the NEBNext Library Quant Kit for Illumina and iTaq Universal SYBR Green Supermix on a Bio-Rad real-time PCR system. This ensures precise input into PCR amplification, reducing over-amplification artifacts and preserving single-molecule fidelity. PCR was then performed with UDI primers (IDT) and NEBNext® Ultra™ II Q5® Master Mix (M0544L), which maintains high fidelity during amplification.

Finally, by integrating random fragmentation with low-input, high-efficiency UMI ligation and precise quantification, UDSeq uniquely enables ultra-accurate, single-molecule somatic mutation detection across species, with support for whole-genome coverage or targeted panels, even from limited input material. Despite offering substantial advantages in sensitivity, flexibility, and scalability, the protocol remains cost-efficient, with a cost comparable to or lower than that of other duplex sequencing methods (Supplementary Figure 1d). In the sections that follow, we systematically evaluate the protocol’s error rate and demonstrate its applicability across a range of experimental settings. Together, these advances position UDSeq as a cost-effective, scalable platform for widespread genomic applications (Figure 1a; Supplementary Figure 1a).

Assessing and Comparing the Error Rate of UDSeq

To assess the error rate of UDSeq, we sequenced DNA extracted from sperm samples provided by eight males ranging in age from 19 to 70 years. We compared our results with two large-scale Icelandic population studies of trios (mother, father, and child), which estimated sperm mutation rates in fathers at different ages based on phasing de novo mutations (DNM) observed in the offspring⁴⁰,⁴¹. These DNM trio studies reported sperm mutation rates of 1.54 and 1.40 single base substitutions (SBS) per year, respectively, with the number of SBS at birth (i.e., age zero) estimated at 4.96 and 6.58, respectively (Figure 1b–c). Consistent with the estimates from the DNM trio studies, the UDSeq data revealed that sperm accumulate 1.58 SBS per year, with an estimated 12.60 mutations at age zero. From the difference between the UDSeq and DNM trio estimates at age zero, we estimated that the error rate of UDSeq is between 6 and 7.6 artifactual SBS per sequenced haploid sperm genome of approximately three billion base pairs. This corresponds to an error rate of approximately 2.5 × 10⁻⁹ errors per bp (Supplementary Figure 1d). Applying the same approach to previously sequenced sperm samples, we also estimated the error rates of NanoSeqV1 (n=7 sperm samples) and HiDEF-seq (n=5), which yielded error rates of about 4.8 × 10⁻⁹ and 4.3 × 10⁻⁹ errors per bp, respectively (Figure 1c). Although derived using a different approach, our estimated error rates closely match values reported in previous studies; for example, 4.8 × 10⁻⁹ in our analysis compared to 5 × 10⁻⁹ for NanoSeqV1 in its original publication³⁵. Overall, given the limited numbers of sperm samples, the error rates of UDSeq, NanoSeq, and HiDEF-seq were effectively similar, with fewer than 5 errors per billion sequenced base pairs (i.e., <5 × 10⁻⁹ errors per bp; Figure 1c).

Lastly, as expected⁴²,⁴³, the mutational profile observed in the eight sperm samples profiled by UDSeq exhibited the patterns of clock-like signatures SBS1 and SBS5 and closely resembled that of paternal de novo mutations⁴¹ (cosine similarity = 0.92; Figure 1d).
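The intercept-difference arithmetic described above can be reproduced in a few lines. The sketch uses only the intercepts quoted in the text and assumes a haploid genome of three billion base pairs; the function and variable names are illustrative, and the underlying regressions are not re-fitted here.

```python
# Minimal sketch of the error-rate estimate: excess SBS at age zero,
# relative to trio-based DNM estimates, divided by the haploid genome size.

HAPLOID_GENOME_BP = 3.0e9          # "approximately three billion base pairs"

UDSEQ_INTERCEPT = 12.60            # SBS at age zero estimated from UDSeq sperm data
DNM_INTERCEPTS = (4.96, 6.58)      # SBS at age zero from the two DNM trio studies


def excess_error_rate(method_intercept: float, reference_intercept: float,
                      genome_bp: float = HAPLOID_GENOME_BP) -> float:
    """Artifactual SBS per bp implied by the excess mutations at age zero."""
    return (method_intercept - reference_intercept) / genome_bp


rates = [excess_error_rate(UDSEQ_INTERCEPT, ref) for ref in DNM_INTERCEPTS]

if __name__ == "__main__":
    for ref, rate in zip(DNM_INTERCEPTS, rates):
        print(f"reference intercept {ref}: {UDSEQ_INTERCEPT - ref:.2f} excess SBS "
              f"-> {rate:.2e} errors per bp")
```

With these inputs the excess is 7.64 and 6.02 SBS per haploid genome, that is, roughly 2.0 × 10⁻⁹ to 2.5 × 10⁻⁹ errors per bp, the upper end of which corresponds to the ~2.5 × 10⁻⁹ figure quoted in the text.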

In vitro Assessment of Mutagenesis

Traditional sequencing protocols for evaluating environmental carcinogen exposure in vitro generally require months of precise exposure, clonal expansion, and sequencing¹⁹,⁴⁴,⁴⁵ (Figure 2a), whereas UDSeq enables direct mutational detection in heterogeneous cell populations, significantly reducing the timeline (Figure 2b). To showcase the utility of the UDSeq protocol, we applied it to three human cell lines exposed to four environmental carcinogens with well-documented mutagenic properties. These in vitro experiments included: (i) HepG2 human liver cancer cell line, derived from a well-differentiated hepatocellular carcinoma, exposed to 4-Nitroquinoline 1-oxide (4NQO; 0.5 μM for 4 hours) and aristolochic acid-I (AA-I; 80 μM for 24 hours); (ii) immortalized normal oral keratinocytes (NOK) exposed to the tobacco-specific nitrosamine 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK)⁴⁶; and (iii) N/TERT-1 keratinocytes exposed to solar-simulated ultraviolet radiation (ssUVR; 3 kJ/m²)⁴⁷. A heterogeneous population of cells was cultured for a single passage following exposure, after which DNA was extracted and subjected to UDSeq at whole-genome resolution (Figure 2b). The resulting mutational profiles from this duplex sequencing approach were then compared to those obtained from clonally expanded cells that were exposed to the same carcinogens (Figure 2b).

The mutational profile induced by AA-I closely matched the COSMIC signature SBS22a, consistent with SBS22a’s AA-I etiology as established from human cancer samples⁴⁸ and in vitro models⁴⁹. The profile also showed strong concordance with bulk sequencing data derived from clonally expanded AA-I-exposed cells (cosine similarity = 0.98; Figure 2c). Similarly, the pattern of ssUVR matched the known COSMIC experimental mutational signature for solar-simulated radiation⁴⁷, while the pattern of 4NQO was identical to that observed in clonally expanded human cells exposed to 4NQO (Figure 2c). Lastly, for NNK acetate, the mutational profile was also nearly identical to that found in clonally expanded human lung cells exposed to the same compound⁵⁰. Overall, these results confirm that UDSeq can replicate mutational patterns observed in clonally expanded cells, without the time or resource burden of extended culture.
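The cosine similarities reported here compare two SBS-96 profiles, each a vector of 96 mutation counts or fractions. The sketch below enumerates the 96 channels and computes the similarity; the channel ordering and label format (for example `A[C>A]A`) follow a common convention but are assumptions, and the profiles themselves are not reproduced.

```python
import math
from itertools import product

# Minimal sketch: SBS-96 channel labels and cosine similarity between two
# profiles. The label format and ordering are assumed conventions.

SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def sbs96_channels():
    """Six substitution classes x 4 five-prime bases x 4 three-prime bases."""
    return [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product("ACGT", repeat=2)]


def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length, non-negative profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    channels = sbs96_channels()
    print(len(channels), channels[0], channels[-1])
```

Because cosine similarity depends only on the shape of a profile and not on its total count, it is suited to comparing a duplex-derived profile from a few hundred mutations with a reference signature or with a profile from clonally expanded cells.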

The in vitro results described above were generated from 100 ng of input DNA (Figure 2c). To evaluate UDSeq’s performance with low-input DNA, we applied it to 100 picograms (pg) of DNA from 4NQO- and AA-I-exposed HepG2 cells. Across all conditions, UDSeq reliably detected the expected mutational patterns, yielding results consistent with those obtained from higher DNA input (Figure 2d).

To demonstrate UDSeq’s compatibility with custom pull-down sequencing, we also performed whole-exome sequencing (using the xGen™ Exome Hybridization Panel) and targeted gene panel sequencing encompassing 127 known cancer-associated genes (xGen™ Pan-Cancer Hybridization Panel). The resulting mutational profiles closely matched expectations, with AA-I exposure aligning with SBS22a (cosine similarity = 0.98) and 4NQO exposure mirroring patterns observed in HepG2 cells (cosine similarity = 0.94; Figure 2e).

In vivo Assessment of Mutagenesis

To showcase the utility of the UDSeq protocol for assessing in vivo mutagenesis, we exposed SKH-1 hairless mice to ssUVR (14 kJ/m²; ~0.5 minimal erythema dose) three times per week for 30 weeks (Figure 3a) and F344 rats to 5 parts per million NNK in drinking water for 15 weeks (Figure 3b), and compared their mutational profiles to those of unexposed controls. As expected, in SKH-1 hairless mice, mutational burden analysis revealed 105-fold and 6.5-fold higher mutational loads in the dorsal and ventral skin of ssUVR-exposed mice, respectively (p<0.05), compared to controls (Figure 3a). ssUVR-associated mutational signatures were present in the dorsal and ventral skin of exposed mice but absent in the skin of unexposed controls (Figure 3a). Similarly, F344 rats exposed to NNK exhibited a 4.7-fold higher mutational burden compared to controls (p=0.013; Figure 3b), with an NNK-specific mutational pattern closely resembling that observed in cell line experiments (Figure 2b; cosine similarity = 0.90).

Figure 3: ***(a)*** *Left:* Schematic of the in vivo mutagenesis workflow in SKH-1 hairless mice, with cohorts either unexposed (controls) or subjected to solar-simulated UVR (14 kJ/m²; ~0.5 minimal erythema dose) three times per week for 30 weeks. *Middle:* Box plots showing mutation burden per base pair in ventral skin from control mice (Control; n=4), ventral skin from ssUVR-exposed mice (ssUVR_VS, n=4), and dorsal skin from ssUVR-exposed mice (ssUVR_DS, n=4). The y-axis represents mutation burden per base pair on a log scale. Horizontal lines within boxes indicate medians; boxes represent interquartile ranges (IQR), and whiskers extend to 1.5× IQR. P-values were calculated using a two-sided t-test: control vs. ssUVR_VS, p=0.014; ssUVR_VS vs. ssUVR_DS, p=0.010. *Right*: SBS-96 mutational profiles of control, ssUVR_VS, and ssUVR_DS skin samples, with contributing COSMIC reference mutational signatures shown adjacent to each profile. Each SBS-96 profile represents all single-base substitutions (C>A, C>G, C>T, T>A, T>C, T>G) within their trinucleotide sequence context. ***(b)*** *Left:* Schematic of the in vivo mutagenesis workflow in F344 rats: control and 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK)–exposed groups (NNK administered in drinking water for 15 weeks), with lung tissue collected for analysis. *Middle:* Box plots showing mutation burden per base pair in lung tissue for control (n=3) and NNK-exposed (n=4) lungs. The y-axis represents mutation burden per base pair on a log scale, and the format of the box plots is identical to that in *(a)*. P-values were calculated using a two-sided t-test: control vs. NNK, p=0.010. *Right*: SBS-96 mutational profiles of control and NNK-exposed lung tissues, with contributing COSMIC reference signatures shown alongside each profile. The *in vitro*–derived NNK experimental signature from Figure 2c was included in the assignment and was detected exclusively in the NNK-exposed samples.

Additionally, we evaluated the capability of UDSeq to profile tissues from non-model organisms by whole-genome sequencing breast, pancreas, and skin tissues from healthy chickens, as well as different layers of kidney tissue samples from healthy sheep. As anticipated⁵¹, the tissues from both chickens and sheep displayed distinct patterns of clock-like mutational signatures SBS1 and SBS5, along with SBS18 in skin tissue of chickens (Supplementary Figure 3a-c). Furthermore, we observed that the cortex has a 1.49-fold higher mutational burden than the medulla in the same kidney samples (Supplementary Figure 3c)⁵².

Examining Mutational Processes in Healthy Human Tissues

To demonstrate UDSeq’s ability to study mutational processes in normal somatic tissues of healthy individuals, we applied the protocol at whole-genome resolution to five organs from a single 70-year-old individual: left cortex, right cortex, left kidney, right kidney, and liver (Figure 4a). The data revealed that the brain had the lowest mutational burden, followed by the kidneys and liver (Figure 4a). Interestingly, the left kidney exhibited more mutations than the right kidney. However, since DNA was extracted from bulk tissue, this difference may stem from variations in capturing the kidney cortex and medulla, as observed in prior reports⁵² and our data from sheep kidney (Supplementary Figure 3c). To investigate the mutational processes active in these organs, we analyzed the mutational signatures present⁵³ across tissues and identified patterns consistent with known biology. As expected, clock-like mutational signatures SBS1 and SBS5—associated with cell proliferation and aging—were detected in all samples (Figure 4b). SBS40, a signature commonly observed in renal tissues and cancers despite its unknown etiology⁵⁴, was present in both kidney samples. SBS4, which is associated with tobacco smoking across multiple cancer types and has also been observed in liver cancer⁵⁵, was uniquely detected in the liver, although the smoking status of the donor was unavailable. Together, these findings confirm the ability to detect the presence of expected tissue-specific mutational processes (Figure 4b).

Figure 4: (a) Schematic of the human body highlighting sampled organs and their mutation burden estimates from a single 70-year-old individual. Mutation burden is expressed as single base substitutions (SBS) per base pair, and the total number of SBS per diploid genome is shown for each organ (denoted by m). (b) SBS-96 mutational profiles are shown for each tissue. Each SBS-96 profile represents all single-base substitutions (C>A, C>G, C>T, T>A, T>C, T>G) within their trinucleotide sequence context. Contributing COSMIC mutational signatures are displayed adjacent to each profile. Y-axis scales are adjusted individually to optimally display the percentage of mutations within each tissue.
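The relationship between the two burden units used in Figure 4a can be illustrated with a short calculation. The sketch below is not the DupCaller computation: the function names are hypothetical, the diploid genome size of 6.0 × 10⁹ bp is an assumed round figure, and the example values are invented for illustration only.

```python
# Assumed approximate size of a diploid human genome, in base pairs.
DIPLOID_GENOME_BP = 6.0e9


def burden_per_bp(n_mutations: int, duplex_bases: float) -> float:
    """Mutations per interrogated base pair (simple uncorrected ratio)."""
    if duplex_bases <= 0:
        raise ValueError("duplex_bases must be positive")
    return n_mutations / duplex_bases


def sbs_per_diploid_genome(burden: float,
                           genome_bp: float = DIPLOID_GENOME_BP) -> float:
    """Scale a per-base-pair burden to an estimated count per diploid genome."""
    return burden * genome_bp


def fold_change(burden_a: float, burden_b: float) -> float:
    """Ratio of two burdens, e.g. kidney cortex relative to medulla."""
    return burden_a / burden_b


# Illustrative values only (not taken from the study).
example_burden = burden_per_bp(n_mutations=30, duplex_bases=3.0e8)
print(sbs_per_diploid_genome(example_burden))
```

The same ratio function expresses comparisons such as the cortex-to-medulla difference reported for sheep kidney.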

DISCUSSION

In this study, we introduce UDSeq, a novel and cost-efficient single-molecule duplex sequencing protocol designed to overcome the key limitations of existing error-corrected sequencing technologies. By enabling both whole-genome and targeted sequencing from as little as 100 picograms of input DNA, UDSeq combines ultra-low error rates (~2.5 × 10⁻⁹ errors per base pair; Table 1) with high sensitivity and versatility, making it well-suited for detecting rare somatic mutations across a wide range of biological contexts. Benchmarking against human sperm DNA validated its accuracy, yielding mutation rates that align with parent-offspring trio-based de novo mutation studies and faithfully recapitulating clock-like mutational signatures SBS1 and SBS5.

Unlike prior duplex sequencing protocols—such as NanoSeqV1, CODEC-HpyCH4V, and HiDEF-seq—that rely on enzyme-based fragmentation and are limited to partial genome coverage, UDSeq leverages random fragmentation to achieve near-complete genome and exome representation. This enzyme-independent approach expands applicability across species and simplifies targeted sequencing without requiring protocol modifications. Additionally, our optimized library preparation pipeline enhances library conversion efficiency, generating up to four times more duplex molecules than NanoSeqV2 from the same DNA input. Combined with its compatibility with low-input samples and streamlined workflow, UDSeq offers both technical performance and cost-effectiveness, making it particularly valuable for studies involving scarce clinical material or environmental specimens.

In this study, we demonstrated the power of UDSeq across diverse applications. In vitro, it captured carcinogen-induced mutational signatures in heterogeneous cell populations—without the need for laborious clonal expansion. In vivo, it revealed exposure-specific mutation patterns in rodent models and enabled genome-wide mutation detection in non-model organisms including chickens and sheep. In human tissue biopsies, UDSeq recovered expected organ-specific mutational signatures and identified differences in mutational burden, further validating its utility for studying tissue-specific mutagenesis. While some of these applications could, in principle, be addressed with other duplex sequencing protocols, to the best of our knowledge UDSeq is the only method optimized to perform all of them, with experiments conducted across different laboratories confirming its versatility and with the protocol streamlined to facilitate adoption and use by others.

While UDSeq has demonstrated versatility across a wide range of applications and offers clear advantages over existing duplex sequencing protocols, it still shares certain limitations inherent to all short-read duplex sequencing. Specifically, the protocol is not well-suited for detecting large structural variants, complex rearrangements, or copy number alterations, which require long-range genomic context. Future integration with long-read sequencing technologies or complementary genomic platforms could overcome these challenges, further extending UDSeq’s utility to capture both small-scale mutations and large-scale genomic alterations.

In summary, UDSeq is a robust, scalable, and cost-efficient duplex sequencing protocol that enables accurate detection of rare somatic mutations at single-molecule resolution. Its flexibility across species, compatibility with limited input material, and high technical fidelity position UDSeq as a powerful tool for advancing studies of mutagenesis, somatic mosaicism, aging, cancer biology, and environmental exposures. Importantly, we provide a clear and streamlined protocol (Supplementary Note 1) that is easy to use and has been extensively validated across multiple independent laboratories, ensuring broad reproducibility and accessibility.

METHODS

Human biospecimens

All human biospecimens were collected with informed consent from all human research participants or their families. The tissue samples used in this study were collected post-mortem from deceased human participants by LIBD, not from living individuals. The collection was conducted in accordance with applicable national and state Institutional Review Board (IRB) regulations (study number: 1126332; IRB tracking number: 20111080). Sperm samples were collected from healthy ethnically diverse males enrolled according to approved human subjects’ protocols from the Institutional Review Board (IRB) of the University of California for blood, saliva, and semen sampling (140028, 161115). Genomic DNA was extracted using the DNeasy Blood and Tissue kit (QIAGEN, Cat# 69506, Valencia, CA) following the manufacturer’s recommendations.

Cytotoxicity assessment

Cytotoxicity assessment was performed for all in vitro experiments. Specifically, cell viability was determined using the CellTiter-Glo® Luminescent Cell Viability Assay (Promega, Cat# G7572, Madison, WI), which quantifies ATP as an indicator of metabolically active cells. The reagent was added to each well of a 96-well plate at a 1:10 ratio. After a 10-minute incubation at room temperature, luminescence was recorded using a Cytation 5 Cell Imaging Multi-Mode Reader (BioTek, Winooski, VT). Relative cell viability was calculated as the percentage of luminescent signal from treated cells compared to untreated controls.
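As a minimal sketch of the relative viability calculation described above, the snippet below averages replicate wells and expresses the treated signal as a percentage of the untreated signal. The function name is hypothetical, and the optional blank (background) subtraction is an assumption that is not stated in the text; it defaults to zero.

```python
from statistics import mean
from typing import Sequence


def relative_viability(treated: Sequence[float],
                       untreated: Sequence[float],
                       blank: float = 0.0) -> float:
    """Percent luminescence of treated wells relative to untreated controls.

    `blank` is an optional background reading (assumed, defaults to 0).
    """
    control_signal = mean(untreated) - blank
    if control_signal <= 0:
        raise ValueError("control signal must exceed the blank")
    return 100.0 * (mean(treated) - blank) / control_signal


# Illustrative luminescence readings (arbitrary units, invented values).
print(relative_viability([480.0, 520.0], [1000.0, 1000.0]))
```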

In vitro experiments

The HepG2 human liver cancer cell line, derived from a well-differentiated hepatocellular carcinoma, was purchased from ATCC (HB-8065). An hTERT-immortalized non-cancerous human keratinocyte cell line (i.e., N/TERT-1) was purchased from Cellosaurus (RRID: CVCL_CW92). Normal oral keratinocyte (NOK) cells were a kind gift from Dr. Paul Lambert (University of Wisconsin-Madison, United States of America). The cells were generated by retroviral insertion of the human hTERT gene in oral epithelial cells derived from gingival tissue. The cells were propagated in Keratinocyte Growth Medium 2 (PromoCell GmbH, Heidelberg, Germany) supplemented with 1% penicillin/streptomycin. All other cells were cultured following the supplier's recommended maintenance procedures using T25 (Thermo Fisher, 169900) or T75 (Thermo Fisher, 156800) flasks. Following cytotoxicity assessment, each environmental carcinogen was applied at its half-maximal inhibitory concentration (IC₅₀) for a defined exposure duration. For in vitro experiments profiled with UDSeq, no single-cell clonal bottlenecking or passaging was performed after exposure. For each experiment, cells were passaged only once after exposure, followed by DNA extraction. Following treatment, genomic DNA extraction was performed using the DNeasy Blood & Tissue Kit (QIAGEN, Cat# 69506, Valencia, CA), including RNase A treatment to eliminate RNA contamination. DNA concentrations were measured using the Qubit™ dsDNA Broad Range Assay Kit (Thermo Fisher Scientific, Cat# Q32850).

For the in vitro experiments involving bottlenecking, clonal expansion, and profiling with bulk sequencing, primary human foreskin fibroblasts (HFFs) were passaged and clonally expanded following the methods in Zhivagui et al.⁵⁶. Cells were washed weekly until clones reached confluency and were transferred progressively to T-75 flasks. 4NQO exposure (0.5 μM for 4 hours) was performed following cytotoxicity assessment to determine the IC₅₀ concentration. Following exposure, cells underwent an additional clonal passage for ~35 rounds of cell division, after which DNA was extracted and subjected to bulk whole-genome sequencing using the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® (E7645S). Clonal expansion results for other cells were based on previously generated sequencing data as reported in the original publications.

In vivo experiments

Male SKH-1 mice (21–25 days old) were purchased from Charles River Laboratories (Wilmington, MA). These studies were performed under an approved Institutional Animal Care and Use Committee (IACUC) protocol 25–201636-HSC at the University of New Mexico. Mice were either controls (i.e., unexposed to ssUVR) or exposed to ssUVR (14 kJ/m²; ~0.5 minimal erythema dose) 3 times per week for 30 weeks. Animals were sacrificed 4 weeks after the last ssUVR treatment. Animals were euthanized using CO₂ followed by cervical dislocation and tissues were collected. Skin tissue was collected in 10% neutral buffered formalin, in RNAlater, or snap-frozen, and epidermal scrapings were obtained from both ventral and dorsal skin. We have complied with all relevant ethical regulations for animal use. Genomic DNA extraction was performed using the DNeasy Blood & Tissue Kit (QIAGEN, Cat# 69506, Valencia, CA), including RNase A treatment to eliminate RNA contamination. DNA concentrations were measured using the Qubit™ dsDNA Broad Range Assay Kit (Thermo Fisher Scientific, Cat# Q32850).

F344 rats (21–25 days old) were purchased from Charles River Laboratories (Wilmington, MA). These studies were performed under an approved Institutional Animal Care and Use Committee (IACUC) protocol (#1802–35549A) at the University of Minnesota. Following one week of acclimation, rats were treated with NNK (5 parts per million in drinking water) and were euthanized after 15 weeks. Control rats were provided with normal drinking water. Animals were euthanized using CO₂ followed by cervical dislocation and tissues were collected. Lung tissues were collected from both control and NNK-exposed rats and flash-frozen. Genomic DNA extraction was performed using the DNeasy Blood & Tissue Kit (QIAGEN, Cat# 69506, Valencia, CA), including RNase A treatment to eliminate RNA contamination. DNA concentrations were measured using the Qubit™ dsDNA Broad Range Assay Kit (Thermo Fisher Scientific, Cat# Q32850).

Chicken and sheep organs were obtained from a butcher shop in San Diego. Genomic DNA extraction was performed using the DNeasy Blood & Tissue Kit (QIAGEN, Cat# 69506, Valencia, CA), including RNase A treatment to eliminate RNA contamination. DNA concentrations were measured using the Qubit™ dsDNA Broad Range Assay Kit (Thermo Fisher Scientific, Cat# Q32850).

UDSeq Library Preparation

The complete step-by-step UDSeq protocol is provided in Supplementary Note 1. Briefly, to minimize DNA damage during fragmentation, intact genomic DNA was enzymatically fragmented using NEBNext dsDNA Fragmentase (M0348S) or UltraShear (M7634L) to achieve an average fragment size of ~350 bp. Fragmentation conditions were carefully optimized for each species and sample type. For human samples, both sperm and cell lines were fragmented for 15 minutes, while human tissue required 20 minutes. Mouse cell lines were also fragmented for 20 minutes, but mouse tissue needed a longer duration of 25 minutes. Similarly, rat tissues were fragmented for 25 minutes to achieve optimal results. Fragmented DNA was then used for UMI adapter ligation with the xGen™ cfDNA & FFPE DNA Library Preparation Kit. All steps were carried out on magnetic beads to reduce DNA loss during purification, thereby improving library conversion efficiency (Supplementary Figure 1b). In the final step, an appropriate femtomole input amount was used for PCR amplification to incorporate sample index sequences compatible with Illumina® sequencing platforms.

DNA Quantification, Dilution, and PCR Amplification

A key strength of UDSeq lies in the accurate quantification of adapter-ligated DNA using qPCR. To avoid the variability introduced by mixed primer sets during quantification, we utilized NEBNext® Library Quant DNA Standards, which reliably quantify UMI-ligated molecules. For size correction, we used 330 bp for the standards. For adapter-ligated DNA, we estimated fragment size by adding 82 bp (accounting for UMI-containing adapters) to the average fragment length determined by TapeStation. For example, a sample with an average fragment size of 370 bp was quantified using 452 bp as the effective fragment length. Additional details are provided in Supplementary Note 1.
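The size adjustment described above can be written out as a short calculation. The sketch below takes the 82 bp adapter length and the 330 bp standard size from the text; the function names are hypothetical, and the correction formula (measured concentration multiplied by the ratio of standard length to library length) is the usual qPCR library-quantification convention, assumed here rather than quoted from Supplementary Note 1.

```python
ADAPTER_BP = 82     # UMI-containing adapters added to each fragment (from the text)
STANDARD_BP = 330   # size used for the qPCR standards (from the text)


def effective_length(tapestation_bp: float) -> float:
    """Effective fragment length after adapter ligation, in bp."""
    return tapestation_bp + ADAPTER_BP


def size_corrected_concentration(qpcr_conc: float,
                                 tapestation_bp: float) -> float:
    """Scale a qPCR-derived molar concentration by standard/library length.

    Assumed convention: conc * (standard length / effective library length).
    The result keeps the units of `qpcr_conc`.
    """
    return qpcr_conc * STANDARD_BP / effective_length(tapestation_bp)


# Worked example from the text: a 370 bp average fragment size.
print(effective_length(370))  # 452
```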

For library amplification, we used 0.2 fmol of input DNA and 15 PCR cycles to achieve ~90× whole-genome coverage in human samples. For mouse samples, we used 0.15 fmol with the same number of cycles. For other species, input amounts were adjusted as appropriate to target ~80% duplicated and ~20% unique reads (Supplementary Note 1).
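To make the input amounts above more concrete, the following sketch converts femtomoles to molecule counts, computes the volume needed to deliver a target input from a quantified library, and reports the duplicate fraction of a sequenced library. All function names are hypothetical, the example concentration and read counts are invented, and the duplicate fraction is computed with the simple definition 1 − unique/total, which is an assumption about how the ~80% target is measured.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def fmol_to_molecules(fmol: float) -> float:
    """Number of molecules in a given amount expressed in femtomoles."""
    return fmol * 1e-15 * AVOGADRO


def volume_for_input(target_fmol: float, conc_nM: float) -> float:
    """Microlitres required to deliver `target_fmol`.

    Uses the identity 1 nM = 1 fmol per microlitre.
    """
    if conc_nM <= 0:
        raise ValueError("concentration must be positive")
    return target_fmol / conc_nM


def duplicate_fraction(total_reads: int, unique_reads: int) -> float:
    """Fraction of reads that are duplicates (assumed: 1 - unique/total)."""
    if total_reads <= 0 or not 0 <= unique_reads <= total_reads:
        raise ValueError("need 0 <= unique_reads <= total_reads and total_reads > 0")
    return 1.0 - unique_reads / total_reads


# 0.2 fmol (human input from the text) is roughly 1.2e8 molecules.
print(f"{fmol_to_molecules(0.2):.3e}")
# Hypothetical library at 0.1 nM and hypothetical read counts.
print(volume_for_input(0.2, 0.1), duplicate_fraction(1_000_000, 200_000))
```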

Targeted hybrid capture was performed using 6–8 multiplexed samples per reaction, with 500 ng of adapter-ligated DNA per capture. The complete targeted capture protocol, including exome and panel-based enrichment, is described in Supplementary Note 1. The prepared UDSeq libraries were sequenced on Illumina NovaSeq 6000 and NovaSeq X platforms using 150 bp paired-end chemistry to the data volume required for each experiment.

Trimming, Alignment, and Mutation Identification

All bioinformatics analyses were performed on the Triton Shared Computing Cluster (San Diego Supercomputer Center (2022): Triton Shared Computing Cluster. University of California, San Diego. Service. https://doi.org/10.57873/T34W2R). Somatic mutations and mutational burdens in UDSeq data with matched normals were identified using DupCaller⁵⁷ version 1.0.1. Briefly, paired-end FASTQ files with equal-length barcodes at the start of each read were preprocessed to remove the barcodes, and the trimmed reads were aligned to the reference genome using BWA. PCR and optical duplicates were marked using GATK. DupCaller constructs sample-specific error profiles by analyzing single-strand mismatches and single-read discrepancies. These profiles are stratified by trinucleotide context for substitutions and by homopolymer length for indels. A strand-aware probabilistic model calculates genotype likelihoods and assigns confidence scores to candidate mutations. Mutations exceeding a confidence threshold are retained. Post-calling filters were used to exclude low-quality reads, common germline variants, and noisy loci.
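The barcode-removal step described above can be illustrated with a small, self-contained sketch. This is not DupCaller's implementation: all function names are hypothetical, and the grouping rule (the two strands of one molecule carry the same barcode pair in swapped order between read 1 and read 2) is the classic duplex-sequencing convention, assumed here for illustration. A real pipeline would also use alignment coordinates when defining read families.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Read = Tuple[str, str]  # (sequence, quality string)


def split_barcode(seq: str, qual: str, barcode_len: int) -> Tuple[str, str, str]:
    """Return (barcode, trimmed sequence, trimmed quality) for one read."""
    if len(seq) != len(qual) or len(seq) < barcode_len:
        raise ValueError("read shorter than barcode or seq/qual length mismatch")
    return seq[:barcode_len], seq[barcode_len:], qual[barcode_len:]


def duplex_tag(bc_read1: str, bc_read2: str) -> Tuple[Tuple[str, str], str]:
    """Strand-independent family key plus a strand label.

    Sorting the barcode pair gives both strands of a molecule the same key;
    the original order tells them apart (ambiguous if both barcodes match).
    """
    strand = "top" if bc_read1 <= bc_read2 else "bottom"
    return tuple(sorted((bc_read1, bc_read2))), strand


def count_strand_support(pairs: Iterable[Tuple[Read, Read]],
                         barcode_len: int) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Count read pairs per strand for every barcode-defined family."""
    families: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(
        lambda: {"top": 0, "bottom": 0})
    for (seq1, qual1), (seq2, qual2) in pairs:
        bc1, _, _ = split_barcode(seq1, qual1, barcode_len)
        bc2, _, _ = split_barcode(seq2, qual2, barcode_len)
        key, strand = duplex_tag(bc1, bc2)
        families[key][strand] += 1
    return dict(families)


# Two toy read pairs that originate from opposite strands of one molecule.
toy_pairs = [(("AAACGT", "IIIIII"), ("CCCTTT", "IIIIII")),
             (("CCCGGG", "IIIIII"), ("AAATTT", "IIIIII"))]
print(count_strand_support(toy_pairs, barcode_len=3))
```

Families with support on both strands are the ones that can yield a duplex consensus; single-strand families inform only the error profile.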

Mutational profile and signature analysis

The variant call format (VCF) files from DupCaller were used for mutational profile generation and signature assignment. Mutational profiles were analyzed using our previously established methodology with the SigProfiler suite of tools. Briefly, mutational matrices for SBS, DBS, and indels were generated with SigProfilerMatrixGenerator⁵⁸ (version 1.2.16). Each mutational profile was plotted with SigProfilerPlotting (version 1.3.13). Mutational signatures were assigned to samples with SigProfilerAssignment⁵⁹. Mutational profiles of sperm samples from NanoSeqV1³⁵ and HiDEF-seq³⁸ were obtained from their corresponding publications. Parental de novo mutations were obtained from Halldorsson et al.⁴¹, and their mutational patterns were plotted with SigProfilerMatrixGenerator. For the regression plots, mutation rates were calculated as previously described³⁸. The corrected mutation burdens output by DupCaller were plotted using the R statistical language⁶⁰.
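For readers unfamiliar with the SBS-96 classification, the sketch below shows how a single substitution and its flanking bases map to one of the 96 channels. It is an illustration of the standard pyrimidine-centered convention, not the SigProfilerMatrixGenerator code; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def sbs96_channel(context: str, alt: str) -> str:
    """Map a substitution to its SBS-96 channel, e.g. 'A[C>T]G'.

    `context` is the 3-base reference sequence centered on the mutated base;
    `alt` is the alternate base. Purine references are reverse-complemented
    so that the mutated base is always reported as C or T.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or set(context + alt) - set("ACGT"):
        raise ValueError("expected a 3-base context and a single alternate base")
    if context[1] == alt:
        raise ValueError("alternate base equals the reference base")
    if context[1] in "AG":
        context, alt = reverse_complement(context), reverse_complement(alt)
    five_prime, ref, three_prime = context
    return f"{five_prime}[{ref}>{alt}]{three_prime}"


def sbs96_counts(mutations: Iterable[Tuple[str, str]]) -> Counter:
    """Tally (context, alt) pairs into SBS-96 channel counts."""
    return Counter(sbs96_channel(context, alt) for context, alt in mutations)


# A G>A change in a CGT context is the same event as C>T in ACG on the other strand.
print(sbs96_counts([("ACG", "T"), ("CGT", "A")]))
```

Collapsing complementary events in this way is what reduces the 192 possible strand-specific substitutions in trinucleotide context to 96 channels.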

Supplementary Material

Supplement 1

Supplementary Figure 1: UDSeq protocol overview and comparative performance. (a) Detailed schematic of the UDSeq protocol, comprising four major steps: (i) DNA extraction and fragmentation, (ii) end repair and adapter ligation for library preparation, (iii) library quantification and sequencing, and (iv) data analysis. (b) Comparative overview of genomic coverage achieved for human cortex (UDSeq), human cortex (NanoSeqV1), and human blood (bulk WGS) samples across different target regions (exome and genome) at varying coverage thresholds. (c) Quantitative polymerase chain reaction (quantitative PCR) amplification curves and quantification of unique molecular identifier–ligated molecules (in femtomoles) demonstrating library conversion efficiency of Universal Duplex Sequencing (UDSeq) versus NanoSeqV2 using 10 nanograms of fragmented DNA input. UDSeq achieved approximately four-fold higher conversion efficiency. Horizontal lines indicate medians; boxes represent interquartile ranges (IQR), and whiskers extend to 1.5× IQR. Statistical significance was assessed using a two-sided t-test (p=0.00022). (d) Cost-effectiveness comparison of UDSeq, NanoSeq, CODEC, and HiDEF-seq. Projected error rates and estimated cost per megabase of duplex coverage for CODEC (using HpyCH4V enzymatic fragmentation), CODEC (sonication), NanoSeqV1, HiDEF-seq, and UDSeq. Error rates are shown on the y-axis.

media-1.pdf (1.6 MB, pdf)

Supplement 2

Supplemental Figure 2: Mutational profiles and genomic analyses across species and tissues. (a) SBS-96 mutational profiles from distinct anatomical layers of the kidney in three individual sheep, with the relative contributions of the clock-like COSMIC signatures SBS1 and SBS5 shown alongside each profile. Each SBS-96 profile represents all single-base substitutions (C>A, C>G, C>T, T>A, T>C, T>G) within their trinucleotide sequence context. Y-axis scales are adjusted independently to optimize visualization of mutation percentages within each trinucleotide context. (b) SBS-96 mutational profiles from skin, breast, and pancreas tissues in chicken, with the contributions of the clock-like COSMIC signatures SBS1 and SBS5, as well as the reactive oxygen species–associated signature SBS18, displayed adjacent to each profile. Profiles are displayed in a format consistent with (a). (c) Box plots showing mutation burden per base pair (log scale) for kidney cortex and two replicates of kidney medulla from sheep, as well as for skin, breast, and two replicates of pancreas from chicken. Horizontal lines indicate medians; boxes represent interquartile ranges (IQR), and whiskers extend to 1.5× IQR. (d) Bar plots showing the percentage of whole-genome duplex coverage achieved for chicken, sheep, rat, and mouse using the UDSeq approach. Coverage is expressed as the proportion of the genome successfully sequenced at the desired depth.

media-2.pdf (1.3 MB, pdf)

Supplement 3

media-3.pdf (1.1 MB, pdf)

ACKNOWLEDGMENTS

The authors would like to thank Cécilia Sirand for her technical support in performing some of the cell line experiments and Dr Fekadu Kassie for assistance with the NNK rat study. This work was supported by the US National Institutes of Health grants R01ES032547, R01ES036931, R01CA269919, R01CA296974, P01CA281819, and U01CA290479 to L.B.A. and R01CA220376 to S.B. as well as by L.B.A.’s Packard Fellowship for Science and Engineering and the UC San Diego Sanford Stem Cell Institute. The work presented here is also supported by a network grant from The Larry L. Hillblom Foundation to L.B.A. and J.G.G. as well as by UK Grand Challenge 2016 Award “Mutographs of Cancer” C98/A24032 to L.B.A. and J.Z. This work was supported in part by NIH award R00HD111686 to X.Y. The computational analyses reported in this manuscript have utilized the Triton Shared Computing Cluster at the San Diego Supercomputer Center of UC San Diego. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Footnotes

DISCLAIMER

Where members are identified as personnel of the International Agency for Research on Cancer/World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer/World Health Organization.

COMPETING INTERESTS

L.B.A. is a co-founder, CSO, scientific advisory member, and consultant for Acurion (formerly io9), has equity and receives income. The terms of this arrangement have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. L.B.A. is also a compensated member of the scientific advisory board of Inocras. L.B.A.’s spouse is an employee of Hologic, Inc. L.B.A. declares U.S. provisional applications filed with UCSD with serial numbers: 63/269,033; 63/289,601; 63/483,237; 63/412,835; 63/492,348; and 63/366,392 as well as a European patent application with application number EP25305077.7. L.B.A. and S.P.N. also declare provisional patent application PCT/US2023/010679. L.B.A. is also an inventor of a US Patent 10,776,718 for source identification by non-negative matrix factorization. All other authors declare that they have no competing interests.

Data availability

All whole-genome sequencing data have been or will be deposited in the Sequence Read Archive (SRA) or the database of Genotypes and Phenotypes (dbGaP), as appropriate. Duplex sequencing data from N/TERT-1 and HepG2 cell lines, as well as SKH-1 mouse, F344 rat, sheep, and chicken tissues, are available under accession number PRJNA1262723. Duplex sequencing data for NOK cells are accessible via PRJNA1196807. Bulk clonal expansion sequencing data for iPSC, BEAS-2B, and N/TERT-1 were obtained from the respective publications cited in the manuscript. Data for human foreskin fibroblasts (HFF), generated as part of this study, are also deposited under PRJNA1262723. Whole-genome sequencing data from human subjects will be made available in dbGaP upon acceptance of the manuscript. Patient ID 7614 data can be accessed via PRJNA799597. Sequencing data for sperm samples are available under accession numbers PRJNA660493, PRJNA753973, and PRJNA588332. All other data are available from the corresponding authors or other sources upon reasonable request.

REFERENCES

1. Martincorena I. & Campbell P. J. Somatic mutation in cancer and normal cells. Science 349, 1483–1489 (2015). doi:10.1126/science.aab4082
2. Ren P., Zhang J. & Vijg J. Somatic mutations in aging and disease. Geroscience 46, 5171–5189 (2024). doi:10.1007/s11357-024-01113-3
3. Alexandrov L. B. et al. Clock-like mutational processes in human somatic cells. Nat Genet 47, 1402–1407 (2015). doi:10.1038/ng.3441
4. Reva B., Antipin Y. & Sander C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Research 39, e118 (2011).
5. Evan G. & Littlewood T. A matter of life and cell death. Science 281, 1317–1322 (1998).
6. Maeda H. & Kakiuchi N. Clonal expansion in normal tissues. Cancer Sci 115, 2117–2124 (2024). doi:10.1111/cas.16183
7. Martincorena I. Somatic mutation and clonal expansions in human tissues. Genome Medicine 11, 35 (2019).
8. Stratton M. R., Campbell P. J. & Futreal P. A. The cancer genome. Nature 458, 719–724 (2009). doi:10.1038/nature07943
9. Greaves M. & Maley C. C. Clonal evolution in cancer. Nature 481, 306–313 (2012).
10. Blair B. G., Bardelli A. & Park B. H. Somatic alterations as the basis for resistance to targeted therapies. J Pathol 232, 244–254 (2014). doi:10.1002/path.4278
11. Shendure J. & Akey J. M. The origins, determinants, and consequences of human mutations. Science 349, 1478–1483 (2015).
12. Proukakis C. Somatic mutations in neurodegeneration: An update. Neurobiology of Disease 144, 105021 (2020).
13. Jaiswal S. & Ebert B. L. Clonal hematopoiesis in human aging and disease. Science 366, eaan4673 (2019).
14. Pfeifer G. P. & Jin S.-G. Methods and applications of genome-wide profiling of DNA damage and rare mutations. Nature Reviews Genetics, 1–18 (2024).
15. Fowler J. C. & Jones P. H. Somatic mutation: what shapes the mutational landscape of normal epithelia? Cancer Discovery 12, 1642–1655 (2022).
16. Stoler N. & Nekrutenko A. Sequencing error profiles of Illumina sequencing instruments. NAR Genom Bioinform 3, lqab019 (2021). doi:10.1093/nargab/lqab019
17. Dou Y., Gold H. D., Luquette L. J. & Park P. J. Detecting somatic mutations in normal cells. Trends in Genetics 34, 545–557 (2018).
18. Menon V. & Brash D. E. Next-Generation Sequencing Methodologies to Detect Low-Frequency Mutations: “Catch Me If You Can”. Mutation Research/Reviews in Mutation Research, 108471 (2023).
19. Kucab J. E. et al. A compendium of mutational signatures of environmental agents. Cell 177, 821–836 (2019).
20. Koh G., Zou X. & Nik-Zainal S. Mutational signatures: experimental design and analytical framework. Genome Biol 21, 37 (2020). doi:10.1186/s13059-020-1951-5
21. Lodato M. A. et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98 (2015). doi:10.1126/science.aab1785
22. Xing D., Tan L., Chang C. H., Li H. & Xie X. S. Accurate SNV detection in single cells by transposon-based whole-genome amplification of complementary strands. Proc Natl Acad Sci U S A 118 (2021). doi:10.1073/pnas.2013106118
23. Gonzalez-Pena V. et al. Accurate genomic variant detection in single cells with primary template-directed amplification. Proc Natl Acad Sci U S A 118 (2021). doi:10.1073/pnas.2024176118
24. Dong X. et al. Accurate identification of single-nucleotide variants in whole-genome-amplified single cells. Nat Methods 14, 491–493 (2017). doi:10.1038/nmeth.4227
25. Petljak M. et al. Characterizing Mutational Signatures in Human Cancer Cell Lines Reveals Episodic APOBEC Mutagenesis. Cell 176, 1282–1294 e1220 (2019). doi:10.1016/j.cell.2019.02.012
26. Blokzijl F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016). doi:10.1038/nature19768
27. Mitchell E. et al. Clonal dynamics of haematopoiesis across the human lifespan. Nature 606, 343–350 (2022). doi:10.1038/s41586-022-04786-y
28. Yoshida K. et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020). doi:10.1038/s41586-020-1961-1
29. Luquette L. J. et al. Single-cell genome sequencing of human neurons identifies somatic point mutation and indel enrichment in regulatory elements. Nature Genetics 54, 1564–1571 (2022).
30. Hou Y. et al. Comparison of variations detection between whole-genome amplification methods used in single-cell resequencing. GigaScience 4, s13742-015-0068-3 (2015).
31. Menon V. & Brash D. E. Next-generation sequencing methodologies to detect low-frequency mutations: “Catch me if you can”. Mutat Res Rev Mutat Res 792, 108471 (2023). doi:10.1016/j.mrrev.2023.108471
32. Schmitt M. W. et al. Detection of ultra-rare mutations by next-generation sequencing. Proc Natl Acad Sci U S A 109, 14508–14513 (2012). doi:10.1073/pnas.1208715109
33. Kennedy S. R. et al. Detecting ultralow-frequency mutations by Duplex Sequencing. Nat Protoc 9, 2586–2606 (2014). doi:10.1038/nprot.2014.170
34. Hoang M. L. et al. Genome-wide quantification of rare somatic mutations in normal human tissues using massively parallel sequencing. Proceedings of the National Academy of Sciences 113, 9846–9851 (2016). doi:10.1073/pnas.1607794113
35. Abascal F. et al. Somatic mutation landscapes at single-molecule resolution. Nature 593, 405–410 (2021).
36. Bae J. H. et al. Single duplex DNA sequencing with CODEC detects mutations with high sensitivity. Nature Genetics 55, 871–879 (2023).
37. Lawson A. R. et al. Somatic mutation and selection at epidemiological scale. medRxiv, 2024.10.30.24316422 (2024).
38. Liu M. H. et al. DNA mismatch and damage patterns revealed by single-molecule sequencing. Nature, 1–10 (2024).
39. Wenger A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nature Biotechnology 37, 1155–1162 (2019).
40. Jónsson H. et al. Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature 549, 519–522 (2017).
41.Halldorsson B. V. et al. Characterizing mutagenic effects of recombination through a sequence-level genetic map. Science 363, eaau1043 (2019). [DOI] [PubMed] [Google Scholar]
42.Axelsson J. et al. Frequency and spectrum of mutations in human sperm measured using duplex sequencing correlate with trio-based de novo mutation analyses. Scientific Reports 14, 23134 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
43.Rahbari R. et al. Timing, rates and spectra of human germline mutation. Nat Genet 48, 126–133 (2016). 10.1038/ng.3469 [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Zou X. et al. Validating the concept of mutational signatures with isogenic cell models. Nature communications 9, 1744 (2018). [Google Scholar]
45.Zavadil J. & Rozen S. G. Experimental Delineation of Mutational Signatures Is an Essential Tool in Cancer Epidemiology and Prevention. Chem Res Toxicol 32, 2153–2155 (2019). 10.1021/acs.chemrestox.9b00339 [DOI] [PubMed] [Google Scholar]
46.Korenjak M. et al. Human cancer genomes harbor the mutational signature of tobacco-specific nitrosamines NNN and NNK. bioRxiv, 2024.2006. 2028.600253 (2024). [Google Scholar]
47. Speer R. M. et al. Arsenic is a potent co-mutagen of ultraviolet light. Communications Biology 6, 1273 (2023).
48. Senkin S. et al. Geographic variation of mutagenic exposures in kidney cancer genomes. Nature 629, 910–918 (2024). doi:10.1038/s41586-024-07368-2
49. Nik-Zainal S. et al. The genome as a record of environmental exposure. Mutagenesis 30, 763–770 (2015).
50. Mingard C. et al. Dissection of cancer mutational signatures with individual components of cigarette smoking. Chemical Research in Toxicology 36, 714–723 (2023).
51. Ivanov D., Hwang T., Sitko L. K., Lee S. & Gartner A. Experimental systems for the analysis of mutational signatures: no ‘one-size-fits-all’ solution. Biochemical Society Transactions 51, 1307–1317 (2023).
52. Martin G. M. et al. Somatic mutations are frequent and increase with age in human kidney epithelial cells. Human Molecular Genetics 5, 215–221 (1996).
53. Díaz-Gay M. et al. Assigning mutational signatures to individual samples and individual somatic mutations with SigProfilerAssignment. Bioinformatics 39, btad756 (2023).
54. Alexandrov L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
55. Alexandrov L. B. et al. Mutational signatures associated with tobacco smoking in human cancer. Science 354, 618–622 (2016). doi:10.1126/science.aag0299
56. Zhivagui M. et al. DNA damage and somatic mutations in mammalian cells after irradiation with a nail polish dryer. Nature Communications 14, 276 (2023).
57. Cheng Y. et al. Improved Mutation Detection in Duplex Sequencing Data with Sample-Specific Error Profiles. bioRxiv, 2025.07.13.664565 (2025).
58. Bergstrom E. N. et al. SigProfilerMatrixGenerator: a tool for visualizing and exploring patterns of small mutational events. BMC Genomics 20, 1–12 (2019).
59. Díaz-Gay M. et al. Assigning mutational signatures to individual samples and individual somatic mutations with SigProfilerAssignment. bioRxiv, 2023.07.10.548264 (2023).
60. R Core Team. R: A language and environment for statistical computing. (2013).

Associated Data

Supplementary Materials

Supplement 1

media-1.pdf (1.6 MB, pdf)

Supplement 2

media-2.pdf (1.3 MB, pdf)

Supplement 3

media-3.pdf (1.1 MB, pdf)

