2025 Jun 30;66(6-7):311–326. doi: 10.1002/em.70020

Transferability, Reproducibility and Sensitivity of Mutation Quantification by Duplex Sequencing

Shaofei Zhang 1, Barbara L Parsons 2, Devon Fitzgerald 3, Anne Ashford 4, James Todd Auman 5, Tao Chen 2, Annette Dodge 6, Azeddine Elhajouji 7, Lena Pfaller 7, Shawn Harris 8, Jake Higgins 3, Cheryl A Hobbs 5, Francesco Marchetti 6, Matthew J Meier 6, Meagan B Myers 2, Jesse Salk 3, Rebecca Sahroui 9, David Schuster 10, Raja Settivari 9, Stephanie L Smith‐Roe 11, Carole L Yauk 10, Jian Yan 2, Andrew Williams 6, Connie L Chen 12
PMCID: PMC12451229  PMID: 40586373

ABSTRACT

Duplex Sequencing (DS) is an ultra‐accurate, error‐corrected next generation sequencing (ecNGS) technology for mutation analysis. A working group (WG) within Health and Environmental Sciences Institute's Genetic Toxicology Technical Committee is investigating the suitability of ecNGS for regulatory mutagenicity testing, using DS as a model. Initial steps to promote acceptance require demonstrating technical reproducibility across DS‐experienced and inexperienced laboratories and establishing the method's sensitivity relative to conventional tests. Thus, the WG conducted a ‘reconstruction experiment’ to evaluate the transferability, reproducibility, and sensitivity of DS. TwinStrand Biosciences first applied DS to establish mutation frequency (MF) in DNA samples extracted from the livers of an untreated Sprague Dawley rat, or rats treated with either 100 mg/kg/day benzo[a]pyrene (B[a]P) for ten days or 40 mg/kg/day N‐ethyl‐N‐nitrosourea (ENU) for three days. Using the measured MF in these original samples, mixtures were then constructed using the B[a]P‐ and ENU‐treated samples to create “MF standards” with target MFs 1.2‐, 1.5‐, and 2‐fold greater than the untreated control. Aliquots of these standards were distributed to seven laboratories in North America and Europe. DS libraries were prepared by each laboratory and TwinStrand. All eight laboratories met library preparation and assay performance metrics to yield high quality sequencing data with MF in the expected ‘MF standard’ range. The measured MF and mutation spectra were nearly identical across the laboratories and a 2‐fold increase in MF could readily be identified in all labs relative to the untreated controls. The results confirm the high reproducibility and sensitivity of DS for mutagenicity assessment.

Keywords: ecNGS, mutation frequency, mutational spectra, power analysis, reconstruction experiment

1. Introduction

Next‐generation DNA sequencing (NGS) has made it possible to analyze genetic alterations in millions of nucleotides of DNA at a time, but limitations in accuracy (error‐rates on the order of one‐in‐one‐hundred to one‐in‐one‐thousand) preclude measurement of the much lower mutation frequencies (MFs) required for genetic toxicology applications. Duplex Sequencing (DS) is a double‐stranded tag‐based error correction technology that improves the accuracy of NGS by more than 10,000‐fold and allows for sensitive detection of extremely rare mutations (Salk et al. 2018), such as those induced by chemical mutagens (Salk and Kennedy 2020). Multiple studies provide evidence of DS utility in the detection of induced mutation (Wang et al. 2021; Cho et al. 2023; Smith‐Roe et al. 2023; Seo et al. 2024); while others have evaluated DS equivalence with the transgenic rodent (TGR) mutation assays currently used in regulatory genetic safety assessment (Valentine 3rd et al. 2020; LeBlanc et al. 2022; Bercu et al. 2023; Dodge et al. 2023; Cheung et al. 2024). In aggregate, these studies identified mutation induction by at least eight known mutagens in at least six tissues across four strains of rodents, as well as in human cells in culture and organoids. Additionally, this body of work collectively demonstrates that mutational sensitivities vary by genomic locus and DNA strand, based on contributing factors that may include transcription, methylation and chromatin state (Valentine 3rd et al. 2020; LeBlanc et al. 2022; Dodge et al. 2023). Given the apparent success of these early studies, an error‐corrected NGS workgroup (ecNGS WG) under the auspices of the Health and Environmental Sciences Institute's (HESI) Genetic Toxicology Technical Committee (GTTC) enumerated the potential advantages of DS over current genetic toxicology testing methods and described opportunities for DS to advance other areas of mutation research (Marchetti et al. 2023a, 2023b).

The DS approach has a number of advantages over conventional technologies with Organization for Economic Co‐operation and Development (OECD) test guidelines in terms of the richness of data generated and the broad applicability across species and tissues. Due to the very low rate at which even mutagen‐exposed cells accumulate mutations, nearly all conventional (e.g., OECD) assays to date have relied on biological selection systems. Specifically, mutations in defined reporter genes that alter a phenotype to promote preferential growth over non‐mutated cells can be quantified using various assay methods. These include the bacterial reverse mutation (Ames) test (bacteria in vitro) (OECD 2020), HPRT/APRT/HGPRT (mammalian in vitro) (OECD 2016), and transgenic reporter systems like Big Blue and MutaMouse (rodent in vivo) (OECD 2025). However, these conventional tests have several limitations. Bacteria are convenient and inexpensive to work with but are far from perfect surrogates for mammals. Transgenic rodent models make it possible to test mutagenicity in vivo (Lambert et al. 2005); however, the availability of transgenic rodents can be an issue, a complex ex vivo laboratory component is required for their readout, and overall, the approach is expensive. Moreover, determining the spectra of mutations induced is very labor‐intensive and impractical outside a research setting (Besaratinia et al. 2012; Beal et al. 2020). HPRT and related assays rely on genes natively present in immortalized mammalian cells and make it possible to directly detect mutation induction in human cells, yet the selective growth condition required to obtain single cell‐derived colonies is not amenable to the majority of human cell types, and the very small reporter locus limits the generalizability of spectral determination and the ability to extrapolate to the rest of the genome.
Because DS measures chemically induced mutations directly and identifies the resultant mutational signatures using native DNA without the need for phenotypic selection, it enables analysis in any cell type from any organism with a reference genome, be it cells in culture or tissues from rodents or humans.

Recognizing the promise of DS, the ecNGS WG considered the data needed for adoption by regulatory agencies. Before a technology can be formally investigated using appropriate procedures (i.e., ring trial as per OECD), its methodology must first be demonstrated to be robust, sufficiently detailed, and validated (OECD 2002). Although groups of investigators have used DS successfully for mutational analyses, a large inter‐laboratory study demonstrating successful technology transfer and reproducibility had not been performed. Goals of this study, therefore, were to demonstrate the transferability and reproducibility of DS among laboratories with and without previous DS experience. Additional goals were to establish consensus on critical methodological parameters and gather information on technical sensitivity; this information is necessary to evaluate its equivalence with TGR methods. To accomplish this, a set of rat DNA samples with defined MFs was prepared and shared among investigators, who independently prepared libraries, submitted their libraries for sequencing, and jointly evaluated the study results.

While the DS validation study underscores the promising potential of ecNGS approaches in genetic toxicology, the discontinuation of DS kits presents a significant hurdle for this particular technology. Promptly addressing this issue and exploring alternative ecNGS methodologies will be crucial for the continued advancement of these technologies. The current work offers a potential model for assessing transferability, reproducibility, and sensitivity of other ecNGS technologies, aligning with the WG's broader efforts to promote the adoption of ecNGS methods in regulatory testing.

2. Material and Methods

2.1. Chemicals

N‐Ethyl‐N‐nitrosourea (ENU; CAS #759‐73‐9) and benzo[a]pyrene (B[a]P; CAS #50‐32‐8) were obtained from Sigma–Aldrich (St. Louis, MO). Formulations of ENU were prepared daily in phosphate buffered saline (Nova‐Tech, Kingwood, TX) at a concentration of 8 mg/mL. B[a]P formulations were prepared weekly in sesame oil (Spectrum Chemical, New Brunswick, NJ) at a concentration of 20 mg/mL.

2.2. Ethics Statement

The in‐life portion of this study was conducted by Integrated Laboratory Systems (ILS; Research Triangle Park, NC) with prior approval by the ILS Institutional Animal Care and Use Committee. All procedures complied with the Animal Welfare Act Regulations, 9 CFR 1–4 and animals were handled and treated according to the Guide for the Care and Use of Laboratory Animals (National Research Council 2011).

2.3. Animal Treatment and Tissue Collection

Male Sprague Dawley rats (Hsd:Sprague Dawley; Envigo Laboratories, Frederick, MD) were maintained in a Specific Pathogen‐Free facility in constant temperature rooms (20°C–25°C) with a relative humidity of 30%–70% on a 12‐h light/12‐h dark cycle. Animals were housed in polycarbonate cages with micro‐isolator tops containing absorbent heat‐treated hardwood bedding (Northeastern Products Corp., Warrensburg, NY). Certified Purina Pico Chow No. 5001 (Ralston Purina Co., St. Louis, MO) and reverse osmosis treated tap water were provided ad libitum. The animals were assigned to a dose group such that the mean body weight of each group was not statistically different from any other group. At 8–10 weeks of age, three animals per group were left untreated or administered 40 mg/kg/day ENU or 100 mg/kg/day B[a]P by oral gavage daily for 3 or 10 consecutive weekdays, respectively. The dose levels were selected to maximize the mutational load under this dosing regimen. To be consistent with the TG 488 tissue harvest protocol, animals were humanely euthanized at Day 31 following initiation of dosing using carbon dioxide asphyxiation, and death was confirmed by exsanguination.

All lobes of the liver were harvested and rinsed in cold mincing solution (Mg2+‐ and Ca2+‐free Hanks Balanced Salt Solution, 10% v/v DMSO, and 20 mM EDTA pH 7.4–7.7) to remove residual blood. Tissue was cut into ~5 × 5 × 5 mm sections and placed into tubes embedded in dry ice. Tubes were transferred to a −80°C freezer until shipped on dry ice to TwinStrand Biosciences (Seattle, WA). Care was taken to avoid cross‐contamination of cells between different organ types within each animal and between animals. Clean instruments and gloves were used for each animal.

2.4. DNA Extraction and Mixture Preparation

DNA was extracted using a modified “salting out” method (DNA Extraction Kit, Agilent Technologies, Santa Clara, CA), according to the manufacturer's protocol for extraction from whole tissue. Extractions were performed with approximately 250 mg of liver tissue and overnight proteinase digestion at 37°C. Final DNA pellets were resuspended in IDTE (10 mM Tris, 0.1 mM EDTA) (Integrated DNA Technologies, Coralville, IA). To yield sufficient DNA, between 4 and 12 tissue sections were extracted per rat, and the resulting DNA was pooled together after initial quality assessment. Pooled DNA was concentrated using AMPure XP beads (Beckman Coulter, Brea, CA) at a 1:1 ratio. DNA samples were quantified using a Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA) and quality was assessed using a Genomic DNA ScreenTape Assay (Agilent Technologies, Santa Clara, CA). All pooled DNA samples had concentrations > 50 ng/μL and DNA integrity numbers (DIN) > 7.

For each of the three pure DNA samples, DS libraries were prepared according to the standard protocol (see below). For each sample, three replicate libraries were prepared with 1000 ng of DNA input mass per library. MFs from replicate libraries were averaged to estimate the true MF for each pure sample. Average MFs were as follows: 8.12 × 10⁻⁸ for untreated, 4.36 × 10⁻⁷ for B[a]P‐treated, and 1.82 × 10⁻⁶ for ENU‐treated rats.

DNA extractions and mixture preparations were performed at TwinStrand Biosciences, and aliquots of pure and mixed DNA samples were distributed to seven other laboratories for DS library preparation (Figure 1A).

FIGURE 1.


Study site locations and sample descriptions. (A) Locations of eight study sites spanning four countries and two continents. (B) Pure and mixture samples used in the study and previously measured or expected mutation frequencies (MFmin). Expected MFmin values targeted by the DNA mixtures are italicized. Colored boxes at left of the table indicate sample color coding used in subsequent figures. See Methods for details regarding preparation of mixtures.

Mixtures of DNA from untreated and mutagenized rats were made to simulate samples with MFs 1.2× (120%), 1.5× (150%), and 2× (200%) higher than the pure untreated sample. These will be referred to as B[a]P (or ENU) 1.2× mixture, 1.5× mixture, and 2× mixture. One series of mixtures was made by combining DNA from untreated and B[a]P‐treated rats and another by combining DNA from untreated and ENU‐treated rats (Figure 1B). The quantities of control DNA, B[a]P pure DNA, and ENU pure DNA used in mixtures to achieve the desired MFmin are detailed in Table S1.
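
As a rough illustration of how such mixtures can be designed, the sketch below solves for the mass fraction of treated DNA that yields a target fold increase. It assumes that the MF of a mixture is the mass‐weighted average of the component MFs (linear mixing) and uses the pure‐sample MFs reported above; the function name `treated_fraction` is hypothetical, and the quantities actually used in the study are those in Table S1.

```python
# Measured MFs of the pure samples, as reported in Section 2.4.
MF_UNTREATED = 8.12e-8
MF_BAP = 4.36e-7
MF_ENU = 1.82e-6


def treated_fraction(mf_control, mf_treated, fold_target):
    """Mass fraction of treated DNA giving fold_target x the control MF.

    Assumption (not from the paper): linear mixing,
    MF_mix = f * mf_treated + (1 - f) * mf_control.
    """
    target = fold_target * mf_control
    return (target - mf_control) / (mf_treated - mf_control)


for label, mf_treated in (("B[a]P", MF_BAP), ("ENU", MF_ENU)):
    for fold in (1.2, 1.5, 2.0):
        f = treated_fraction(MF_UNTREATED, mf_treated, fold)
        print(f"{label} {fold}x mixture: {100 * f:.2f}% treated DNA by mass")
```

Under this assumption, the 2× mixtures would need roughly 23% B[a]P‐treated DNA or roughly 5% ENU‐treated DNA by mass, reflecting the much higher MF of the pure ENU sample.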

2.5. Library Preparation and Sequencing

DS libraries were prepared by seven participating labs independently using TwinStrand DuplexSeq Mutagenesis kits according to the manufacturer's protocol. Briefly, 1000 ng or 500 ng of genomic DNA was digested to DNA fragments with a median size of about 250 to 300 base pairs using the TwinStrand DuplexSeq Kit‐Enzymatic Fragmentation Module, followed by end repair, A‐tailing, and DuplexSeq Adapter ligation reactions. A unique set of indexing primers for the indexing reaction was pre‐assigned to each participating lab. The rat mutagenesis hybrid capture panel (TwinStrand product #06–1007‐02) includes 20 genomic targets (Table S2), each 2.4 kb (48 kb total), spread across the rat genome. The panel was designed to provide a representative sampling of the rat genome with regard to GC% and trinucleotide content and genic/intergenic regions, while excluding repetitive elements and loci known to drive positive or negative selection. The size of final libraries was checked using the Agilent TapeStation system and libraries were quantified with the Qubit fluorometer. The sets of libraries generated by the first three participating labs (A, B, and C) were pooled and sequenced on site at TwinStrand on an Illumina (San Diego, CA) NovaSeq 6000 system using an S4 flow cell with 2 × 150 bp read length. Libraries from the remaining four labs (D, E, F, and G) were pooled and sequenced at Psomagen on another S4 flow cell with 2 × 150 bp read length. The DS libraries prepared by each lab included one set of duplicate libraries made from the untreated rat DNA sample (DNA 01). The eighth set of libraries, previously prepared and sequenced by TwinStrand (TS), was used as a reference for comparison with results from the seven other labs.

2.6. Duplex Consensus Making and Variant Calling

Raw sequencing data for all libraries were processed simultaneously using the TwinStrand DuplexSeq FASTQ to VCF Parallel App (version 3.21.0) in the DNAnexus cloud platform. The app performed consensus making, consensus read postprocessing, and variant calling according to previously described methods (Kennedy et al. 2014; Valentine 3rd et al. 2020; LeBlanc et al. 2022). When a clear duplex consensus base call could not be made, an N was called at that base in that molecule's duplex consensus sequence.

2.7. Calculation of MF and Spectrum

Total and subtype MF calculations, hierarchical clustering of base substitution spectra, and trinucleotide spectra generation were performed for all libraries simultaneously using the TwinStrand DuplexSeq Mutagenesis Report App (version 4.3.0) on DNAnexus. Only somatic variants with variant allele fraction (VAF) < 0.01 were included in the analysis. Analysis was further restricted to a custom reportable region, which excluded the following loci. First, any position with an apparent germline variant (VAF > 0.3) in one or more of the pure samples was excluded from analysis in all samples. This was done so that germline variants diluted to low frequency during DNA mixture preparation would not impact MFs and spectra. Additionally, an approximately 100 bp region within the target locus on chromosome 20 was excluded due to a previous observation of a very high number of artifactual variant calls flanking a cluster of closely spaced germline variants.
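
The variant filtering logic described above can be sketched as follows. This is an illustrative reimplementation, not the TwinStrand app: the record layout (dictionaries with `chrom`, `pos`, and `vaf` keys) and the function names are hypothetical, and the chromosome 20 interval is omitted because its coordinates are not given here.

```python
SOMATIC_VAF_MAX = 0.01   # only variants with VAF < 0.01 are analyzed
GERMLINE_VAF_MIN = 0.3   # VAF > 0.3 in a pure sample marks a germline site


def germline_positions(pure_sample_calls):
    """Positions with an apparent germline variant in one or more pure samples."""
    return {
        (call["chrom"], call["pos"])
        for calls in pure_sample_calls
        for call in calls
        if call["vaf"] > GERMLINE_VAF_MIN
    }


def somatic_calls(calls, excluded_positions):
    """Keep rare variants that fall outside the excluded positions."""
    return [
        call for call in calls
        if call["vaf"] < SOMATIC_VAF_MAX
        and (call["chrom"], call["pos"]) not in excluded_positions
    ]


# Hypothetical calls: one germline site in a pure sample, then a mixture library.
pure = [[{"chrom": "chr1", "pos": 100, "vaf": 0.5}],
        [{"chrom": "chr2", "pos": 5, "vaf": 0.002}]]
mixture = [{"chrom": "chr1", "pos": 100, "vaf": 0.004},  # diluted germline variant
           {"chrom": "chr3", "pos": 7, "vaf": 0.001},    # rare somatic variant
           {"chrom": "chr3", "pos": 9, "vaf": 0.2}]      # too frequent to be counted
excluded = germline_positions(pure)
kept = somatic_calls(mixture, excluded)
print(kept)
```

The diluted germline variant at chr1:100 would pass the VAF threshold in the mixture, which is why the exclusion is applied by position across all samples rather than by VAF within each sample.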

Unless otherwise noted, MFs were calculated using the minimum method (MFmin) with the assumption that multiple observations of the same mutation result from cell division rather than multiple independent mutation events, as described previously (Valentine 3rd et al. 2020; Dodge et al. 2023). Briefly, each unique rare variant was counted once, regardless of how many consensus reads supported the variant, and the total number of unique variants was divided by total duplex bases (excluding Ns). Subtype MFmin values were calculated similarly, but within context‐specific total duplex bases. Alternatively, MFmax was calculated by counting all occurrences of each unique rare variant. Wilson binomial confidence intervals (95%) were calculated for all MFs.
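
A minimal sketch of the two counting conventions and the Wilson interval is given below. The input layout (a mapping from each unique variant to the number of duplex consensus reads supporting it) and the example values are hypothetical; the Wilson score formula is the standard one.

```python
import math


def mutation_frequencies(variant_read_counts, duplex_bases):
    """Return (MFmin, MFmax) for one library.

    variant_read_counts maps each unique rare variant to the number of
    duplex consensus reads supporting it; duplex_bases excludes Ns.
    """
    mf_min = len(variant_read_counts) / duplex_bases           # each variant once
    mf_max = sum(variant_read_counts.values()) / duplex_bases  # every observation
    return mf_min, mf_max


def wilson_interval(successes, trials, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default 95%)."""
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2)) / denom
    return centre - half, centre + half


# Hypothetical library: two unique variants, one of them seen three times.
variants = {("chr1", 1000, "C>A"): 1, ("chr2", 2000, "T>C"): 3}
bases = 2_000_000_000
mf_min, mf_max = mutation_frequencies(variants, bases)
low, high = wilson_interval(len(variants), bases)
print(mf_min, mf_max, low, high)
```

In this example the variant observed three times contributes once to MFmin and three times to MFmax, which is how clonal expansion inflates MFmax but not MFmin.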

For hierarchical clustering by base substitution spectra, simple base substitution variants were converted into pyrimidine context and the proportion of each substitution type was calculated for each library. A cosine similarity matrix was calculated on substitution type proportions and unsupervised hierarchical clustering was performed using the Ward clustering algorithm.
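
The first two steps, conversion to pyrimidine context and the cosine similarity between two libraries, can be sketched as below. Function names and inputs are hypothetical, and the Ward clustering step is not reproduced here because it would normally be delegated to a statistics library.

```python
import math
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SBS_TYPES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def to_pyrimidine(ref, alt):
    """Express a substitution relative to the pyrimidine (C or T) strand."""
    if ref in "AG":  # purine reference: report the complementary change
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def sbs_proportions(substitutions):
    """Proportion of each of the six substitution types in one library."""
    counts = Counter(to_pyrimidine(ref, alt) for ref, alt in substitutions)
    total = sum(counts.values())
    return [counts[t] / total for t in SBS_TYPES]


def cosine_similarity(u, v):
    """Cosine of the angle between two proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical (ref, alt) calls for two small libraries.
lib_a = sbs_proportions([("G", "T"), ("C", "A"), ("A", "G"), ("T", "C")])
lib_b = sbs_proportions([("C", "A"), ("C", "A"), ("T", "C")])
print(lib_a, lib_b, cosine_similarity(lib_a, lib_b))
```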

To generate trinucleotide spectra, the frequency of each base substitution type in each trinucleotide context was derived by dividing the count of each substitution type by the relative abundance (total duplex depth, excluding Ns) of that context in the reportable region. Frequencies were normalized by the sum of the total single base substitution frequency, so that the proportions for each sample sum to one. Wilson binomial confidence intervals (95%) were calculated for context‐specific substitution frequencies and were normalized by the sum of the total single base substitution frequencies.
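
The normalization can be illustrated with a short sketch. The dictionary layout and the example counts and depths are hypothetical; the calculation follows the description above (count divided by context depth, then scaled so that the proportions sum to one).

```python
def normalized_spectrum(counts, context_depth):
    """Normalized trinucleotide spectrum for one sample.

    counts maps (trinucleotide_context, substitution) to a mutation count;
    context_depth maps each context to its total duplex depth (excluding Ns).
    """
    frequencies = {
        key: n / context_depth[key[0]] for key, n in counts.items()
    }
    total = sum(frequencies.values())
    return {key: f / total for key, f in frequencies.items()}


# Hypothetical counts: the rarer context ends up with the larger proportion.
counts = {("ACA", "C>A"): 10, ("TCG", "C>T"): 5}
depth = {"ACA": 1_000_000, "TCG": 250_000}
spectrum = normalized_spectrum(counts, depth)
print(spectrum)
```

Dividing by context depth before normalizing keeps abundant contexts from dominating the spectrum simply because more duplex bases were sequenced in them.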

2.8. Generalized Linear Modeling for Analyses of MFs

MFs were analyzed using a generalized linear model (GLM) with extra‐binomial dispersion, assuming sample independence. All analyses were performed in the R statistical environment (R Core Team 2023). The GLM included main effects of laboratory and treatment, along with a laboratory‐by‐treatment interaction, using the quasibinomial family of models. Mean estimates and pairwise comparisons were computed using the doBy R package for groupwise statistics, LS‐means, linear estimates, and utilities (Højsgaard and Halekoh 2023). As this family of models employs a log link function, the estimates from the model parameters obtained through the esticon() function were exponentiated (i.e., back‐transformed). Standard errors of the back‐transformed estimates were approximated using the delta method. The p‐values for the fold change estimates were adjusted for multiple comparisons using the Sidak method (Bonferroni 1935; Sidak 1967; Wright 1992). Fold changes between treatment and controls were calculated independently within each laboratory.
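
The back‐transformation and the multiplicity adjustment can be written out in a few lines. This is a sketch under stated assumptions: a log link, the first‐order delta method, SE(exp(b)) ≈ exp(b) × SE(b), and the single‐step Sidak formula, p_adj = 1 − (1 − p)^m. It does not reproduce the doBy or esticon() calls, and the function names and example values are hypothetical.

```python
import math


def back_transform(log_estimate, log_se):
    """Exponentiate a log-scale estimate; delta-method SE on the original scale."""
    estimate = math.exp(log_estimate)
    return estimate, estimate * log_se


def sidak_adjust(p_values):
    """Single-step Sidak adjustment for a family of m tests."""
    m = len(p_values)
    return [1 - (1 - p) ** m for p in p_values]


# Hypothetical contrast: log fold change of log(2) with SE 0.1 on the log scale.
fold_change, fold_se = back_transform(math.log(2), 0.1)
adjusted = sidak_adjust([0.01, 0.02, 0.03])
print(fold_change, fold_se, adjusted)
```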

For group‐based analysis of MFmin, libraries prepared from the same DNA samples were treated as replicates. A quasi‐Poisson generalized linear model was used to perform all possible pairwise comparisons between sample groups, and Benjamini‐Hochberg false‐discovery rate adjustment (Benjamini and Hochberg 1995) was used to correct for multiple comparisons.
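
For reference, the Benjamini‐Hochberg step‐up adjustment can be sketched as follows. The function name and example p‐values are hypothetical; the procedure itself is the standard one (sort, scale each p‐value by m/rank, then enforce monotonicity from the largest rank down).

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is
    # no larger than the one ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(fdr)
```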

2.9. Sampling‐Based Replicate Analysis

To assess whether a ~1.2‐fold increase in MFmin could be detected using fewer than eight technical replicates, we randomly selected replicates to make smaller groups and repeated the group‐based analysis. We tested group sizes between 2 and 7 and took 100 random samples of replicates at each group size. Libraries generated with 500 ng of input DNA (Lab B) were excluded from this analysis because of the lower data yield of these libraries. Sampling and pairwise statistical tests were performed separately for the B[a]P 1.2×, ENU 1.2×, B[a]P 1.5×, and ENU 1.5× mixtures compared to the untreated control sample, without correction for multiple comparisons. A similar analysis was repeated for each of the six single base substitution (SBS) subtype MFs.
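
The resampling loop can be sketched as below. Note the substitution: the study used a quasi‐Poisson GLM for each comparison, whereas this sketch plugs in a simple two‐sided z‐test on pooled mutation rates that ignores overdispersion, purely so the example stays self‐contained. All function names and library values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pooled_rate_p_value(control, treated):
    """Two-sided z-test on pooled mutation rates (stand-in for the GLM).

    Each library is a (unique_mutation_count, duplex_bases) tuple.
    """
    c0, n0 = sum(c for c, _ in control), sum(n for _, n in control)
    c1, n1 = sum(c for c, _ in treated), sum(n for _, n in treated)
    pooled = (c0 + c1) / (n0 + n1)
    z = (c1 / n1 - c0 / n0) / math.sqrt(pooled * (1 / n0 + 1 / n1))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_significant(control, treated, group_size, p_value_fn,
                      iterations=100, alpha=0.05, seed=0):
    """Number of random subsamples of group_size per group with p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(iterations):
        p = p_value_fn(rng.sample(control, group_size),
                       rng.sample(treated, group_size))
        hits += p < alpha
    return hits


# Hypothetical libraries with roughly a 1.2-fold difference in mutation counts.
bases = 2_200_000_000
control_libs = [(c, bases) for c in (170, 175, 180, 185, 190, 178, 182)]
mixture_libs = [(c, bases) for c in (205, 210, 216, 220, 225, 212, 218)]
for k in range(2, 8):
    print(k, count_significant(control_libs, mixture_libs, k, pooled_rate_p_value))
```

Passing the test as a function argument keeps the sampling logic independent of the statistical model, so a GLM‐based p‐value function could be swapped in without changing the loop.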

2.10. Mutation Spectra

The trinucleotide spectra were analyzed using the likelihood ratio test as described in Piegorsch and Bailer (1994). Pairwise comparisons were conducted on 2 × 96 contingency tables of counts for each trinucleotide as well as on 2 × 10 contingency tables of counts for each variant type and SNV (single‐nucleotide variant) subtype normalized to its pyrimidine context. These comparisons included treatments to controls and comparisons across laboratories for each treatment. Adjusted p‐values were obtained using the Sidak method (Sidak 1967), which was applied independently to each family of tests. The families of tests were defined as within laboratories and across laboratories, for each treatment independently.
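
For a 2 × K table of counts, a likelihood ratio comparison can be sketched with the standard G² statistic, G² = 2 Σ O ln(O/E), where E is the expected count under identical spectra. Whether this matches the exact formulation in Piegorsch and Bailer (1994) is an assumption; the function name and example counts are hypothetical, and the p‐value (a chi‐square tail probability with the returned degrees of freedom) is left out because it needs a statistics library.

```python
import math


def g_statistic(row_a, row_b):
    """Likelihood-ratio (G^2) statistic and degrees of freedom for a 2 x K table."""
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    g = 0.0
    used_columns = 0
    for a, b in zip(row_a, row_b):
        column = a + b
        if column == 0:
            continue  # an empty category carries no information
        used_columns += 1
        for observed, row_total in ((a, total_a), (b, total_b)):
            if observed > 0:  # a zero cell contributes nothing to the sum
                expected = row_total * column / grand
                g += 2 * observed * math.log(observed / expected)
    return g, used_columns - 1


# Hypothetical substitution counts for two samples across three categories.
g_same, df_same = g_statistic([10, 20, 30], [20, 40, 60])
g_diff, df_diff = g_statistic([10, 20, 30], [60, 40, 20])
print(g_same, df_same, g_diff, df_diff)
```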

2.11. Power Analysis

A power analysis was conducted to determine the minimum sample size and informative duplex bases required to observe a significant increase in MF. Random samples were generated repeatedly to simulate experiments using estimates from this study. Over‐dispersed binomial samples were randomly generated using the rbinom() function with an MF of 2 × 10⁻⁹ and overdispersion assuming a normal distribution with a mean of zero and standard deviations ranging from 0.05 to 0.15. A GLM was employed to calculate the significance between the control and treated samples. The bisection method was used to estimate the minimum detectable fold change of the simulated experiment, yielding a power of 80%. For each iteration of the bisection method, the power was estimated by simulating 2000 data sets and calculating the proportion of times a significant result was achieved at the 0.05 significance level. The sample size, sample variance, and number of informative duplex bases were varied, and the results from these analyses were used to produce power curves.

3. Results

3.1. Mortalities, Clinical Signs, and Body Weights

All animals survived to the scheduled termination; clinical observations indicated no signs of toxicity during the study. Rats administered the test articles had lower terminal body weights than untreated control animals (−10.3% and −7.8% for ENU and B[a]P, respectively).

3.2. QC Metrics Overview of Sequencing Data From Eight Participating Labs

Eight labs each constructed ten libraries from the nine DNA samples provided, with the untreated control prepared in duplicate by each lab. Libraries were sequenced across three different flow cells with each library allocated ~700 million raw paired‐end reads (696,136,363 ± 26%) or ~350 million sequencing clusters (Table S3). Most libraries were prepared using 1000 ng of input DNA per library, but Lab B used 500 ng of input DNA per library. Because DS data yield depends greatly on the input DNA mass, with a roughly linear relationship, performance of the 500 ng and 1000 ng libraries was considered separately. Libraries prepared with 500 ng of input DNA yielded ~1.2 billion informative duplex bases each (1,222,407,630 ± 11%, Figure 2A, left) and libraries prepared with 1000 ng of input DNA yielded ~2.2 billion informative duplex bases each (2,233,908,655 ± 14%, Figure 2A, right). These total data yields equate to ~20,000× or ~37,000× average duplex molecular depth across the target regions for 500 ng and 1000 ng inputs, respectively (19,646× ± 10% and 36,810× ± 13%, Figure 2B). Peak tag family size (PTFS) is the modal number of reads representing each strand in duplex consensus families, with a PTFS of 10 representing a balance between optimizing data yield and sequencing cost. All libraries had PTFS > 8, indicating sufficient sequencing, and libraries prepared with 500 ng of input DNA had higher PTFS values (48.30 ± 9%) than the 1000 ng libraries (16.36 ± 26%), as expected (Figure 2C). Additional assay performance metrics are summarized in Table S3. Overall, all eight labs generated high quality DS libraries from the provided DNA samples. Data yields per library were relatively consistent within and between labs and scaled with input DNA mass, as expected.

FIGURE 2.


Assay performance metrics across eight laboratories. Per‐library informative duplex bases (A), mean duplex depth (B), and peak tag family size (C) are plotted as points, grouped by lab along the x‐axis and color‐coded by sample type. Each plot is divided into sub‐panels for libraries prepared with 500 ng (left) or 1000 ng (right) of input mass.

3.3. Reproducibility Measured by MFs (MFmin)

Rigorous evaluation of DS sensitivity and reproducibility is needed to support its use as a regulatory test for mutagenicity. To evaluate reproducibility, eight labs analyzed the same set of DNA samples. Technical sensitivity of DS was analyzed concurrently by determining the lowest quantity of treated rat DNA that, when spiked into untreated rat DNA, could be detected as a significant increase in MFmin relative to the untreated control. The DS MFmin measurements from the analysis of the set of 10 samples by each of the eight labs are shown in Figure 3. The results from the different labs are highly reproducible. Importantly, the results from each lab showed comparable and expected increases in MF based on the amounts of pure DNA from mutagenized rats spiked into the different samples. MFmin is the preferred endpoint for the DS analysis of mutagenesis because it eliminates possible confounding due to clonal expansion of individual mutants. However, high interlaboratory reproducibility was also observed based on quantitation using MFmax (Figure S1).

FIGURE 3.


High interlaboratory reproducibility demonstrated in DS analyses of shared DNA samples. MFmin was calculated as the unique mutation count divided by the total duplex bases (excluding no‐calls). Error bars represent 95% binomial proportion Wilson score intervals calculated from the mutation frequency data.

Some participating laboratories had considerable previous experience using DS. Other laboratories were trained immediately preceding library preparation of study samples. Given that reproducible results were obtained irrespective of extent of laboratory experience, it is concluded that the DS technology (including kit design, instruction, and support) has great across‐laboratory transferability.

3.4. Reproducibility Measured by MFs

A generalized linear model was used to estimate the effects of laboratory and treatment on MFmin. Mean estimates (Figure 4A) did not differ significantly between any of the eight laboratories for any of the treatment groups (p > 0.05). Fold changes from control were consistent with the expected fold increases in MFmin based on the amount of treated DNA in each sample across all labs (Figure 4B). Fold changes across laboratories ranged from 1.1–1.3×, 1.4–2.3×, 1.8–2.8×, and 5.3–7.2× for B[a]P 1.2× mixture, 1.5× mixture, 2× mixture, and pure B[a]P treated samples, respectively. For the ENU treated samples or mixtures, fold changes ranged from 1.1–1.4×, 1.4–2.1×, 1.9–3.0×, and 22–33× for ENU 1.2× mixture, 1.5× mixture, 2× mixture, and pure ENU treated samples, respectively. Significant increases from control (p < 0.05) were detected for most individual laboratories (two control replicates vs. one mixture replicate) starting at the 1.5× mixtures for both B[a]P and ENU. Laboratory D was able to detect significant increases from control at the 1.2× mixtures for both chemicals. All laboratories detected significant increases in MF at 2× mixtures and pure B[a]P and ENU treated samples.

FIGURE 4.


Highly reproducible results across labs for both B[a]P and ENU mixtures as measured by mutation frequencies. (A) MFs for each sample estimated using a generalized linear model. Error bars represent the standard error. (B) Fold change in MFs from control for each mixture or pure sample across labs. Error bars represent the standard error. Asterisks indicate significance at the 0.05 level.

3.5. Technical Sensitivity of DS

To further examine the technical sensitivity of DS, we treated all libraries prepared from the same DNA sample as technical replicates and performed group‐based analysis (Figure 5A). Considering libraries from all labs, both 1.2× mixtures had significantly increased MFmin relative to the untreated control (FDR‐adjusted p values, B[a]P 1.2× mixture p = 1.82 × 10⁻³, ENU 1.2× mixture p = 6.63 × 10⁻⁴). The 1.5× and 2× mixtures were significantly increased as well (p < 1 × 10⁻⁸).

FIGURE 5.


Group and replicate analysis of mutation frequencies. (A) MFs plotted by sample type, treating libraries prepared by different labs as technical replicates. Each dot represents one library, with horizontal lines representing the group mean. Asterisks indicate FDR‐adjusted p‐values less than 1 × 10⁻² (*), 1 × 10⁻³ (**), and 1 × 10⁻⁴ (***). Replicate libraries of the untreated control and 1.2× mixtures were randomly selected to make smaller groups and statistical testing was performed on the sampled groups. Independent pairwise comparisons were performed for untreated and the B[a]P 1.2× mixture (B) and for untreated and the ENU 1.2× mixture (C). For each group size, indicated by the x‐axis, 100 independent samplings were performed and the p‐value for each comparison is plotted on the y‐axis. Points indicate individual iterations of sampling and “violin” shapes represent the density distribution of the resulting p‐values for each group size. The number above each set of points indicates the number of p‐values < 0.05 out of the 100 sampling iterations.

To assess whether a ~1.2‐fold increase in MFmin could be detected using fewer than 8 replicates, we randomly selected replicates to make smaller groups and repeated the group‐based analysis (Figure 5B,C). For the B[a]P 1.2× mixture, 97% of randomly sampled groups of n = 6 yielded a significant test result (p < 0.05) and for the ENU 1.2× mixture, 98% of randomly sampled groups of n = 5 yielded a significant test result (p < 0.05). This analysis suggests that five to six 1000 ng libraries provide enough statistical power to detect an ~1.2‐fold change with high confidence, assuming a similar baseline MF and that the variability between replicates is similar to what is observed in inter‐lab technical replicates in this study. The analysis was repeated for the 1.5× mixtures, with the results suggesting that 2–3 replicates per group provide sufficient power to detect an ~1.5‐fold increase in MFmin (Figure S2).

3.6. Single Base Substitution Spectra Analyses

Unsupervised hierarchical clustering was performed to classify libraries based on SBS spectra (Figure 6). Libraries prepared with pure DNA samples from untreated, B[a]P‐treated, and ENU‐treated rats showed distinct SBS spectra and formed discrete clusters. Libraries prepared with DNA mixture samples also resolved into near‐perfect clusters by DNA sample, with the lower fold‐change mixtures being more similar to the pure untreated rat libraries and the higher fold‐change mixtures being more similar to the pure B[a]P‐ or ENU‐treated rat samples. While the clustering analysis does not directly measure reproducibility across different labs, it shows that the variation within a specific mix rate (such as 1.2× or 1.5×) is less than the variation observed between different mix rates or treatments.

FIGURE 6.

Unsupervised hierarchical clustering by SBS spectra. For each library, the normalized proportions of simple base substitution types (pyrimidine notation) are plotted as a stacked bar (middle panel). The dendrogram (top) reflects the results of unsupervised hierarchical clustering. The sample identity for each library is indicated along the x‐axis by color‐coded boxes.

3.7. Subtype MF Analysis

Next, we assessed whether focusing on mutagen‐specific mutation subtypes could boost the sensitivity of DS. We repeated the replicate‐sampling analysis using subtype MFmin instead of total MFmin for the B[a]P and ENU 1.2× mixtures (Figure 7). For the B[a]P 1.2× mixture, 92% of randomly sampled groups of n = 3 showed a significant increase in C>A MF (Figure 7A) and 99% of randomly sampled groups of n = 5 showed a significant increase in C>G MF (Figure 7B), compared to n = 6 needed to see a significant increase in total MFmin (Figure 5B). For the ENU 1.2× mixture, 100% of randomly sampled groups of n = 4 showed a significant increase in T>A MF (Figure 7C) and 100% of randomly sampled groups of n = 3 showed a significant increase in T>C MF (Figure 7D), compared to n = 5 needed to see a significant increase in total MFmin (Figure 5C). For all other mutation subtypes, either no significant change was detectable with any number of replicates or more replicates were needed than for total MFmin (Figure S3).

FIGURE 7.

Replicate analysis of subtype mutation frequencies. Similar to the replicate sampling analysis performed for total MF, replicates were randomly selected into smaller groups and pairwise testing was performed to determine if a significant increase in subtype MF could be detected. For each subtype in each sample pairing, 100 iterations of sampling and statistical testing were performed, represented by individual points. Larger shapes show the density distribution of p‐values for each group size and numbers represent the number of iterations that yielded a p‐value < 0.05. For each mutagen, the most dominant mutation subtypes are shown: C>A (A) and C>G (B) mutations for the B[a]P 1.2× mixture and T>A (C) and T>C (D) mutations for the ENU 1.2× mixture.

3.8. Simple and Trinucleotide Spectra Analysis

Normalized proportions of mutation subtypes were compared across treatment groups and across laboratories using the likelihood ratio test within contingency tables of mutation counts (min). Comparisons were first made based on the simple mutation spectra (Figure 8A), including all variant types and SNV subtypes. Categories included MNV (multiple‐nucleotide variants), insertions, deletions, complex variants, symbolic/structural variants, and the six SNV subtypes. Proportions for each mutation subtype were normalized to the sequencing depth. There was no significant difference (p > 0.05) between laboratories for any of the mixtures or pure samples, except that the pure ENU‐treated sample from laboratory C was significantly different from those of all other laboratories. The small relative increase in C>A mutations found in the sample from laboratory C is not fully understood but may arise from the process of sample preparation or library construction. Expected differences in mutation proportions were observed between mixtures (or pure samples) and controls within each laboratory. Significant shifts in the mutation spectra were detected starting at the B[a]P 1.5× mixture for six out of the eight labs. Labs B and E detected significant shifts in the spectrum starting at the B[a]P 2× mixture. Significant differences in the mutation spectra were detected starting at the ENU 1.5× mixture for all laboratories. Labs A and F were further able to detect significant differences in the mutation spectra for the ENU 1.2× mixture (Table S4).
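For readers unfamiliar with the likelihood ratio test on contingency tables, the statistic itself is simple to compute. The sketch below is a generic illustration with a hypothetical function name (`g_statistic`), and it omits the depth normalization mentioned above. It returns the G statistic and its degrees of freedom; the p-value would then be read from a chi-square distribution using any statistics library.

```python
import math

def g_statistic(table):
    """Likelihood ratio (G) statistic for an r x c table of mutation counts.

    Rows could be samples (e.g., control and mixture) and columns mutation
    subtypes. Returns (G, degrees_of_freedom). Cells with zero observed
    counts contribute nothing to the sum. Assumes every row and column has
    a non-zero total; empty categories should be dropped beforehand.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / total
                g += 2.0 * observed * math.log(observed / expected)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return g, dof

# Made-up counts for two samples and two subtypes (not data from this study).
g, dof = g_statistic([[10, 20], [20, 10]])
print(f"G = {g:.3f} on {dof} degree(s) of freedom")
```

Under the null hypothesis of identical subtype proportions in all rows, G is approximately chi-square distributed, which is why larger tables with many sparse cells (such as the trinucleotide spectrum) tend to need more mutations before a shift becomes significant.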

FIGURE 8.

Comparison of mutation spectra between treatment groups and across labs. (A) The normalized proportions of simple base substitution (SNV) types (pyrimidine notation) and non‐SNV variant types are plotted as a stacked bar for each library. The treatment for each library is indicated along the x‐axis by color‐coded boxes. Samples are organized by lab, which are denoted at the top of the plot. Asterisks indicate significant differences between a sample and its within‐lab control. Daggers represent comparisons of a treatment group between labs; samples with the same dagger are not significantly different from one another. All comparisons were made at a significance level of 0.05. (B) The normalized proportions of SNV subtypes (pyrimidine notation) in their trinucleotide context are plotted in a heatmap for each library. The treatment for each library is indicated along the y‐axis by color‐coded boxes. Samples are faceted by lab (right) and include the total mutation count for all samples per lab. Trinucleotide subtypes are faceted by SNV subtype and include the total mutation count for each subtype across all samples. Asterisks indicate significant differences between a sample and its within‐lab control. Daggers represent comparisons of a treatment group between labs; samples with the same dagger are not significantly different from one another. All comparisons were made at a significance level of 0.05.

Similar results were found when the 96‐base trinucleotide spectrum was compared across treatment groups and laboratories (Figure 8B). The trinucleotide spectrum consists of the normalized proportions of the six SNV subtypes (pyrimidine notation) within the context of their two flanking nucleotides. Again, the proportions of mutation subtypes did not differ across laboratories for any treatment groups except pure ENU, which differed between lab C and all other laboratories. The pure ENU treatment groups had many more mutations than any other group; therefore, it is possible that only small differences in the spectra are being detected in this case. Shifts in the mutation spectra between the treatment groups and the control were consistent across labs. Four out of the eight laboratories detected a significant shift in the trinucleotide spectrum starting at B[a]P 2× mixture. Lab A was able to detect a significant difference in B[a]P 1.5× mixture. Labs B, G, and TS were only able to detect a significant difference in trinucleotide spectra at pure B[a]P treated samples. Five out of eight laboratories detected a significant shift in the trinucleotide spectrum starting at ENU 2× mixture. Labs A and D detected a significant difference at ENU 1.5× mixture. Finally, lab B only detected a significant difference at pure ENU treated samples.
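The 96 categories of the trinucleotide spectrum follow from six pyrimidine-notation substitutions times four possible 5′ bases times four possible 3′ bases. The sketch below shows one way to assign a substitution to its category. The names (`CATEGORIES`, `label`, `spectrum`) are hypothetical, and the proportions are plain count fractions without the depth normalization used in the study.

```python
from collections import Counter
from itertools import product

SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
# 6 substitutions x 4 five-prime bases x 4 three-prime bases = 96 categories
CATEGORIES = [f"{five}[{sub}]{three}"
              for sub in SUBSTITUTIONS
              for five, three in product("ACGT", repeat=2)]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def label(context, alt):
    """Return the pyrimidine-notation category, e.g. 'T[C>A]A'.

    `context` is the reference trinucleotide centered on the mutated base and
    `alt` is the mutant base. If the reference base is a purine, both are
    reverse-complemented so that the reference base becomes C or T.
    """
    context, alt = context.upper(), alt.upper()
    if context[1] in "AG":
        context = context.translate(_COMPLEMENT)[::-1]
        alt = alt.translate(_COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"

def spectrum(mutations):
    """Proportion of mutations in each of the 96 categories."""
    counts = Counter(label(context, alt) for context, alt in mutations)
    total = sum(counts.values())
    return {cat: counts[cat] / total for cat in CATEGORIES}

# A G>T change in a TGA context is reported as T[C>A]A.
print(label("TGA", "T"))
```

Spreading the same number of mutations over 96 categories instead of 11 leaves fewer counts per cell, which is consistent with the lower sensitivity of the trinucleotide comparison reported here.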

3.9. Power Analysis

A limitation of empirical evaluations is that the underlying distributions of the data may not reflect true experimental conditions; for example, the present evaluation captures only technical variability. Simulation is a tool often used to test new methodologies based on estimates from the empirical evaluation. Simulating data from known distributions allows us to measure the performance of the test system under different scenarios.

Using the binomial variation observed in this study, a MF of 6.7 × 10⁻⁸ and an average number of informative bases (1.71 × 10⁹), our simulation shows that DS can detect a 1.5‐fold increase with an n of four with 80% power (the red line in panel A of Figure 9). With n increased to 8, DS can detect a 1.2‐fold change at a Bonferroni‐adjusted significance level of 0.05 for eight comparisons (i.e., eight labs). As the number of informative bases decreases from 100% to 80% (orange), 60% (green), 40% (blue) and 20% (purple), the minimal detectable fold change increases. With only 3.42 × 10⁸ informative duplex bases, detecting a 1.5‐fold change would require an n of 8 per group.
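A short calculation helps explain why fewer informative bases raise the minimal detectable fold change. The expected number of mutations per library is the MF multiplied by the number of informative duplex bases, and counting noise grows in relative terms as that number shrinks. The sketch below assumes counts are approximately Poisson (a standard approximation to the binomial when the MF is very small); the function names are hypothetical.

```python
import math

def expected_mutations(mf, informative_bases):
    """Expected mutation count in one library: MF x informative duplex bases."""
    return mf * informative_bases

def poisson_relative_sd(expected_count):
    """Relative counting noise (SD / mean) if counts are approximately Poisson."""
    return 1.0 / math.sqrt(expected_count)

MF = 6.7e-8  # baseline MF used in the simulation
for bases in (1.71e9, 3.42e8):  # 100% and 20% of the average informative bases
    lam = expected_mutations(MF, bases)
    print(f"{bases:.2e} bases: ~{lam:.0f} expected mutations, "
          f"relative SD ~{poisson_relative_sd(lam):.0%}")
```

At the full depth a control library is expected to carry about 115 mutations, whereas at one fifth of that depth it carries about 23, so sampling noise alone is more than twice as large relative to the mean.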

FIGURE 9.

Power analysis of minimum detectable fold‐change using different sample sizes. The calculation was based on a MF of 6.7 × 10⁻⁸ with a total number of informative duplex bases of: 1.71 × 10⁹ (Red), 1.34 × 10⁹ (Orange), 1.03 × 10⁹ (Green), 6.84 × 10⁸ (Blue), and 3.42 × 10⁸ (Purple). The gray line represents the minimum detectable fold change of 1.5 fold. Power analysis was conducted assuming no additional binomial variation or three different sample variances (sigma = 0.05, 0.10, and 0.15) with a desired power of 80%.

Panels B, C and D in Figure 9 illustrate the minimal detectable fold change of DS when overdispersion, or additional binomial variation, is present. These analyses attempt to simulate biological variability using estimates of standard deviations that have been previously published or cover a range of values observed in unpublished in‐house data. As the additional binomial variation increases, the minimal detectable fold change also increases. For example, with a standard deviation of 0.15, the sample size to detect a 1.5‐fold change would be seven per group, the point at which the red line crosses the gray line in panel D of Figure 9.
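The general logic of a simulation-based power analysis with extra sample-to-sample variation can be sketched as follows. This is not the simulation used for Figure 9 and is not expected to reproduce its numbers. Several choices are assumptions made only for illustration: overdispersion is modeled as log-normal variation of the per-sample MF with standard deviation `sigma` on the log scale, counts are Poisson, every library has the same number of informative bases, and significance is assessed with an exact one-sided permutation test. All function names are hypothetical.

```python
import math
import random
from itertools import combinations

def poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (fine for lam up to a few hundred)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_counts(rng, n, mf, bases, sigma):
    """Mutation counts for n libraries with optional log-normal spread in MF."""
    counts = []
    for _ in range(n):
        sample_mf = mf * math.exp(rng.gauss(0.0, sigma)) if sigma > 0 else mf
        counts.append(poisson(rng, sample_mf * bases))
    return counts

def permutation_p(control, treated):
    """Exact one-sided permutation p-value for treated counts exceeding control."""
    pooled = control + treated
    observed = sum(treated)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(treated)):
        total += 1
        if sum(pooled[i] for i in idx) >= observed:
            extreme += 1
    return extreme / total

def power(fold, n, mf=6.7e-8, bases=1.71e9, sigma=0.0,
          alpha=0.05, sims=200, seed=1):
    """Fraction of simulated experiments in which the increase is detected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        control = simulate_counts(rng, n, mf, bases, sigma)
        treated = simulate_counts(rng, n, mf * fold, bases, sigma)
        hits += permutation_p(control, treated) < alpha
    return hits / sims

for sigma in (0.0, 0.05, 0.10, 0.15):
    print(sigma, power(fold=1.2, n=5, sigma=sigma))
```

Repeating the call over a grid of fold changes and group sizes, and recording the smallest fold change that reaches 80% power, gives curves of the same kind as those in Figure 9. Note that an exact permutation test cannot reach p < 0.05 with three libraries per group, because only 20 distinct splits exist; a parametric test would be needed for groups that small.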

4. Discussion

Recognition that ecNGS approaches bring improved precision and a broader range of applicability to the characterization of genomic change created enthusiasm for their further development and validation. Indeed, practitioners have described the potential for ecNGS methods to revolutionize the fields of genetic toxicology and cancer risk assessment (Marchetti et al. 2023a, 2023b). Establishing the robustness of ecNGS methods is a critical step toward regulatory adoption. As noted in the roadmap by Marchetti et al. (2023a), essential steps toward the regulatory adoption of ecNGS test methods must include favorable demonstration of assay sensitivity, transferability, and inter‐laboratory reproducibility through a ring trial (OECD series on testing and assessment –number 34, https://ntp.niehs.nih.gov/sites/default/files/iccvam/suppdocs/feddocs/oecd/oecd‐gd34.pdf).

DS is one type of ecNGS that has been used in various studies to assess low‐frequency mutations, both in vitro and in vivo (Wang et al. 2021; Armijo et al. 2023; Cho et al. 2023; Smith‐Roe et al. 2023; Zhang et al. 2024). Given that the number of laboratories investigating ecNGS for genetic toxicology applications was greater for DS than for other ecNGS approaches (e.g., PacBio HiFi sequencing (Miranda et al. 2023), SMM‐seq (Maslov et al. 2022), and Hawk‐Seq (Matsumura et al. 2019; Otsubo et al. 2021)), investigating the technical sensitivity, transferability, and reproducibility of DS was the goal of this international collaboration.

The results from the eight laboratories participating in this study demonstrate the outstanding reproducibility of DS in the measurement of MF and mutation spectra. Indeed, the results of the current study robustly meet the criteria for OECD pre‐validation of a test method. Further, the interlaboratory reproducibility (Figures 3 and 6) supports the conclusion that the methods employed in the current study are sufficiently robust to be considered as a ring trial.

Importantly, several of the laboratories entered this study with minimal or no prior experience building libraries for DS. With appropriate training by TwinStrand personnel, these laboratories were able to produce results, both MF and mutation spectrum, nearly identical to those generated by laboratories with more experience. Thus, the DS method was established as having high transferability. Our observations regarding DS reproducibility and transferability support the conclusion that interlaboratory variability is unlikely to limit the regulatory use of this method for MF determination.

One of the interesting observations of this study is that the use of 500 ng input DNA by lab B generated results very similar to the seven other labs that used 1000 ng of input DNA. Thus, the method appears to have good tolerance regarding variation in the quantity of input DNA. Both quantities of input DNA yielded large numbers of total informative duplex bases. Comparison of the performance metrics shows that libraries prepared using 500 ng of input DNA had a mean PTFS of 48, as compared to a mean PTFS of 16 for libraries prepared using 1000 ng input DNA. However, 500 ng input DNA generated only about half of the total informative duplex bases generated using 1000 ng input DNA with the same sequencing settings. Generally, a PTFS of 10 to 15 is considered ideal for high quality DS data. Although there are no negative impacts on the mutational analysis, greater PTFS values may signal the creation of unnecessary raw reads (and sequencing cost), without a proportional increase in informative duplex bases. Therefore, this study documented the relationship between the input DNA and raw reads as critical parameters of DS study design that must be considered to achieve optimal PTFS.

MF data can be expressed as MFmin (mutations recovered more than once within the same sample are counted only once) or as MFmax (all mutations are counted). Sequencing of mutants is sometimes performed as part of a TGR assay, particularly when high inter‐individual variability is observed because “sequencing can be used to rule out the possibility of jackpots or clonal events by identifying the proportion of unique mutants from a particular tissue” (OECD 2025). In assessment of mutagenesis, this conveys a recognized preference for the analysis of unique mutations as compared to mutation data confounded by clonal events. The current study investigated MFs calculated as MFmin and MFmax. Initially some variation in MFmax was observed across replicated samples. Some of this variation was due to a few variants with germline frequencies in the pure samples that were erroneously left out of the initial list intended to mask these positions from analysis. This resulted in apparent large clones in some mixture samples. Furthermore, in mixtures where those variants were present at ~1%, they fell below the threshold in some samples, and so contributed hundreds of variant counts to the MFmax calculation, and fell above the threshold in others, and so were excluded from MFmax calculations. When the reportable region was updated to exclude these germline variants, the variability in MFmax greatly decreased. This demonstrates that, in samples with true clones, both the presence of clones and the exact size of the clones (barely above or below the somewhat arbitrary threshold of 1%) can have a dramatic impact on MFmax. Specifically, the dilution necessary for the construction of the DNA mixtures caused germline mutations to appear as large clones in some but not all samples. This example highlights the need to carefully scrutinize DS data. After addressing the artifactual variant calls, the MFmin and MFmax metrics both yielded reproducible results across the eight participating laboratories. Nevertheless, based on this experience and existing TGR guidance, MFmin is recommended as the appropriate metric to use for ecNGS assessment of mutagenicity, whereas MFmax may be a more relevant metric for carcinogenicity testing.
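The difference between the two metrics, and the effect of a variant sitting near the 1% threshold, can be made concrete with a small sketch. The function name, the input layout, and the numbers are hypothetical. The sketch also assumes that a variant whose allele fraction exceeds the cutoff is dropped from both metrics, which is a simplification of the masking described above.

```python
def mutation_frequencies(variants, informative_bases, vaf_cutoff=0.01):
    """Return (MFmin, MFmax) for one library.

    `variants` is a list of (alt_count, depth) pairs, one per distinct variant.
    Variants with alt_count / depth above `vaf_cutoff` are treated as
    germline-like and excluded (assumed here to apply to both metrics).
    MFmin counts each remaining variant once; MFmax counts every observation.
    """
    kept = [(alt, depth) for alt, depth in variants if alt / depth <= vaf_cutoff]
    mf_min = len(kept) / informative_bases
    mf_max = sum(alt for alt, _ in kept) / informative_bases
    return mf_min, mf_max

# Made-up library: two singleton-like variants, one clone just under 1%,
# and one variant well above 1% (not data from this study).
BASES = 1_000_000_000
below = [(1, 20_000), (3, 20_000), (150, 20_000), (300, 20_000)]
# Same library, but the third variant now lands just over the cutoff.
above = [(1, 20_000), (3, 20_000), (210, 20_000), (300, 20_000)]

print(mutation_frequencies(below, BASES))
print(mutation_frequencies(above, BASES))
```

In the first library the clone at 0.75% contributes 150 observations to MFmax but only one to MFmin; in the second the same variant at 1.05% is excluded altogether. MFmax therefore swings by well over an order of magnitude between the two libraries while MFmin changes by a single variant, which mirrors the behavior described for the mixture samples.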

It is well known that different mutagens and DNA repair defects may leave unique signatures in the mutational spectra (Poon et al. 2014; Kucab et al. 2019; Boysen et al. 2025). In addition to the measurement of MF, DS yields a comprehensive view of mutation types, thereby providing insights into mutagenesis mechanisms. Identification of changes in mutational spectra could also provide additional confidence in the mutagenic effects observed from a specific treatment. In this study, significant changes in the simple mutation spectra were observed between control and the B[a]P 1.5× mixture (six out of eight laboratories) and between control and ENU 1.5× mixture (eight out of eight laboratories). The sensitivity to detect a shift based on the trinucleotide spectra was not as great as that observed using the simple mutation spectra. In comparisons with control, it was possible to detect a significant shift in the trinucleotide spectra for the B[a]P 2× mixture (four out of eight laboratories) and the ENU 2× mixture (five out of eight laboratories). Although regulatory applications do not mandate assessment of a shift in the mutation spectrum caused by a test article, detecting significant changes in simple and/or trinucleotide spectrum may help identify a positive response in conjunction with the MF analysis.

As a first step toward systematically evaluating DS sensitivity, this study developed data on the technical sensitivity of DS. Technical sensitivity was evaluated using replicate analyses of constructed and shared DNA mixtures (n = 1, without biological variability) designed to represent a range of MFs. Using the library preparations and sequencing approaches described, the results indicate a MF 1.5× greater than control was statistically distinguishable from control MF. When data from different laboratories were pooled, a MF 1.2× greater than control was statistically distinguishable from control MF. This sensitivity was augmented by focusing on mutagen‐specific mutation subtypes. For example, three samples were sufficient to demonstrate a statistically significant increase in C>A MF in the B[a]P 1.2× mixture versus control in 92% of randomly selected sets of samples, whereas six samples were needed to detect a significant increase based on total MF. Similarly, in the comparison of control and the ENU 1.2× mixture, three samples were sufficient to detect a significant increase in T>C MF in 100% of randomly selected sets of samples, whereas five samples were required to observe a significant increase based on total MF. This demonstration of the ability of DS to detect subtle elevations in the induced total or subtype MF over the background level with a relatively small number of replicates should not be construed as a recommendation that a 1.2‐fold increase should be used as a cutoff for positive/negative calling in mutation testing because such a small increase may not be biologically relevant. Instead, this result means that the technical sensitivity of DS is unlikely to hinder its ability to detect biologically relevant changes in MF.

The simulation conducted in this paper provides further evidence to support the findings of the empirical evaluation and to explore the sensitivity of the method when incorporating different levels of biological variability, overdispersion, or additional binomial variation. The power analysis shows that it is possible to detect a 2‐fold change using only three samples per group, 1.71 × 10⁹ informative bases, and assuming a standard deviation of 0.1. Currently, a minimum of five animals per group is required for the TGR assay according to OECD Test Guideline 488 (OECD 2025). The requirement for five animals is largely due to the relatively high inter‐animal variability observed with TGR assays. A recent study, which included a power analysis, supported the use of as few as three animals to detect a 1.5‐fold increase in MF due to the very low inter‐animal variability observed using DS (Dodge et al. 2023). If the possibility that fewer animals can be used in DS studies than TGR studies is confirmed, mutagenicity testing by DS would be consistent with the 3R principles of reducing, refining, and replacing animal use in experimentation.

One potential limitation of our study design is that it would have been possible to sequence each lab's libraries as discrete pools in separate lanes or on separate days. However, practical/real world sequencing analyses are expected to involve pooling of libraries for sequencing on a given flow cell or “lane”. While theoretically, it cannot be ruled out that concurrent sequencing of the libraries prepared as in this study minimized potential variation, it is generally understood that library quality rather than competently performed input of samples for sequencing is the primary driver of sequence quality.

Recent studies have compared the performance of DS and TGR assays in transgenic animals and proven the equivalency of the two methods in mutagenicity testing (LeBlanc et al. 2022; Bercu et al. 2023; Dodge et al. 2023). Notably, the current study demonstrates the applicability of DS to wild‐type animals, laying the groundwork for its application as an alternative to the conventional TGR assay. Overall, our data from eight participating laboratories demonstrated that DS has high technical sensitivity, as well as great reproducibility and transferability. The data shown in this study will be useful in the design and interpretation of future interlaboratory studies that examine DS reproducibility and sensitivity in the context of test‐article exposed and unexposed rodents. As an intermediate step, however, additional studies are needed to establish the acceptability criteria for DS studies, i.e., the minimal number of animals required per group for in vivo studies, sufficiency in terms of target sequences analyzed, PTFS range, and/or number of informative duplex bases.

In summary, this DS validation study was undertaken with the view that ecNGS approaches may yield improved precision and greater flexibility in the characterization of genomic changes. A significant advantage of DS over TGR assays is its flexibility to be performed using DNA isolated from any tissue of any model organism. This study's results expand existing evidence indicating DS may improve the precision of the mutational analyses used in regulatory testing and the perception that ecNGS methods, in general, may revolutionize the field of genetic toxicology. However, to the authors' knowledge at the time of manuscript submission, DS kits and onsite trainings are not currently commercially available, and it is unclear when they will be once again. Long‐term unavailability of DS kits would impose obvious limitations on future validation studies and widespread adoption of this particular ecNGS technology. This will become a significant roadblock for analysis of in vivo mutation by DS if the current situation is not resolved. Further, it highlights the need to characterize/validate multiple ecNGS methodologies. The present paper presents a possible model of how transferability, reproducibility, and sensitivity could be assessed for other ecNGS methods. Independent replication of this study using alternative sequencing providers, reagents, and bioinformatic tools would be beneficial for further validation.

Author Contributions

James Todd Auman: data curation, writing – original draft, writing – review and editing. Anne Ashford: investigation, writing – review and editing. Connie L. Chen: conceptualization, writing – original draft, writing – review and editing, supervision, project administration, funding acquisition. Tao Chen: investigation, writing – review and editing. Annette Dodge: software, data curation, writing – original draft, writing – review and editing, visualization. Azeddine Elhajouji: investigation, writing – review and editing. Devon Fitzgerald: methodology, validation, formal analysis, resources, data curation, writing – original draft, writing – review and editing, visualization. Shawn Harris: formal analysis, writing – review and editing, visualization. Jake Higgins: methodology, validation, resources, data curation, writing – original draft, writing – review and editing. Cheryl A. Hobbs: methodology, investigation, writing – original draft, writing – review and editing. Francesco Marchetti: conceptualization, methodology, writing – original draft, writing – review and editing. Matthew J. Meier: software, formal analysis, data curation, writing – original draft, writing – review and editing, visualization. Meagan B. Myers: formal analysis, writing – review and editing. Barbara L. Parsons: methodology, software, validation, formal analysis, writing – original draft, writing – review and editing, project administration. Lena Pfaller: investigation, writing – review and editing. Rebecca Sahroui: investigation, writing – review and editing. Jesse Salk: conceptualization, methodology, resources, writing – review and editing. David Schuster: investigation, writing – review and editing. Raja Settivari: methodology, investigation, writing – review and editing. Stephanie L. Smith‐Roe: investigation, resources, writing – review and editing. Andrew Williams: formal analysis, writing – original draft, writing – review and editing. Carole L. Yauk: methodology, investigation, writing – original draft, writing – review and editing. Jian Yan: investigation, writing – review and editing. Shaofei Zhang: conceptualization, methodology, investigation, writing – original draft, writing – review and editing, project administration.

Disclosure

TwinStrand Biosciences Inc. has developed DS technology, provided key reagents and analysis tools for this study and may financially benefit from this assay and the findings of this study. No authors are current employees, but former employees J. H. and J. S. are private equity holders. Shaofei Zhang is an employee of, and owns shares in, Pfizer Inc. Anne Ashford is an employee of, and owns shares in, AstraZeneca plc.

Supporting information

Data S1. Supporting Information.

EM-66-311-s001.docx (662.1KB, docx)

Acknowledgments

The experimental work described in this paper was performed under the auspices of the HESI GTTC. HESI is a non‐profit scientific organization that facilitates public and private partnerships in human and environmental health in a pre‐competitive space. The GTTC aims to improve the scientific basis of the interpretation of results from genetic toxicology tests for purposes of more accurate hazard identification and assessment of human risk; to develop follow‐up strategies for determining the relevance of test results to human health; to provide a framework for integration of testing results into a risk‐based assessment of the effects of chemical exposures on human health; to promote the integration and use of new techniques and scientific knowledge in the evaluation of genetic toxicology; and to monitor and promote the development of innovative tests and testing strategies. HESI GTTC provided funding for the library sequencing for laboratories D–G. All other experimental work was supported by in‐kind contributions (from public and private sector participants) of time, expertise, and experimental and/or development effort from the participating laboratories. These contributions are supplemented by direct funding (that primarily supports program infrastructure and management and some project‐related direct expenses) provided by HESI's corporate sponsors. A list of supporting organizations (public and private) is available at http://hesiglobal.org. Work performed for the National Toxicology Program, National Institute of Environmental Health Sciences, National Institutes of Health, US Department of Health and Human Services, USA, was conducted under contract 75N96020C00001 (genetic toxicity testing) and ES103378‐01.

Zhang, S. , Parsons B. L., Fitzgerald D., et al. 2025. “Transferability, Reproducibility and Sensitivity of Mutation Quantification by Duplex Sequencing.” Environmental and Molecular Mutagenesis 66, no. 6‐7: 311–326. 10.1002/em.70020.

Accepted by: R. Heflich

Shaofei Zhang and Barbara L. Parsons contributed equally to this study.

Data Availability Statement

The data that support the findings of this study are openly available in Sequence Read Archive at https://www.ncbi.nlm.nih.gov/sra, reference number BioProject PRJNA1206152.

References

  1. Armijo, A. L., Thongararm P., Fedeles B. I., et al. 2023. “Molecular Origins of Mutational Spectra Produced by the Environmental Carcinogen N‐Nitrosodimethylamine and S(N)1 Chemotherapeutic Agents.” NAR Cancer 5, no. 2: zcad015.
  2. Beal, M. A., Meier M. J., LeBlanc D. P., et al. 2020. “Chemically Induced Mutations in a MutaMouse Reporter Gene Inform Mechanisms Underlying Human Cancer Mutational Signatures.” Communications Biology 3, no. 1: 438.
  3. Benjamini, Y., and Hochberg Y. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society. Series B, Statistical Methodology 57, no. 1: 289–300.
  4. Bercu, J. P., Zhang S., Sobol Z., Escobar P. A., Van P., and Schuler M. 2023. “Comparison of the Transgenic Rodent Mutation Assay, Error Corrected Next Generation Duplex Sequencing, and the Alkaline Comet Assay to Detect Dose‐Related Mutations Following Exposure to N‐Nitrosodiethylamine.” Mutation Research, Genetic Toxicology and Environmental Mutagenesis 891: 503685.
  5. Besaratinia, A., Li H., Yoon J. I., Zheng A., Gao H., and Tommasi S. 2012. “A High‐Throughput Next‐Generation Sequencing‐Based Method for Detecting the Mutational Fingerprint of Carcinogens.” Nucleic Acids Research 40, no. 15: e116.
  6. Bonferroni, C. E. 1935. “Il calcolo delle assicurazioni su gruppi di teste. Studi in Onore del Professore Salvatore Ortu Carboni. Roma.” 13–60.
  7. Boysen, G., Alexandrov L. B., Rahbari R., et al. 2025. “Investigating the Origins of the Mutational Signatures in Cancer.” Nucleic Acids Research 53, no. 1: gkae1303. 10.1093/nar/gkae1303.
  8. Cheung, J., Dobo K., Zhang S., et al. 2024. “Evaluation of the Nitrosamine Impurities of ACE Inhibitors Using Computational, In Vitro, and In Vivo Methods Demonstrate no Genotoxic Potential.” Environmental and Molecular Mutagenesis 65: 203–221.
  9. Cho, E., Swartz C. D., Williams A., et al. 2023. “Error‐Corrected Duplex Sequencing Enables Direct Detection and Quantification of Mutations in Human TK6 Cells With Strong Inter‐Laboratory Consistency.” Mutation Research, Genetic Toxicology and Environmental Mutagenesis 889: 503649.
  10. Dodge, A. E., LeBlanc D. P. M., Zhou G., et al. 2023. “Duplex Sequencing Provides Detailed Characterization of Mutation Frequencies and Spectra in the Bone Marrow of MutaMouse Males Exposed to Procarbazine Hydrochloride.” Archives of Toxicology 97, no. 8: 2245–2259.
  11. Højsgaard, S., and U. Halekoh. 2023. “doBy: Groupwise Statistics, LSmeans, Linear Estimates, Utilities. R package version 4.6.20.” https://CRAN.R-project.org/package=doBy.
  12. Kennedy, S. R., Schmitt M. W., Fox E. J., et al. 2014. “Detecting Ultralow‐Frequency Mutations by Duplex Sequencing.” Nature Protocols 9, no. 11: 2586–2606.
  13. Kucab, J. E., Zou X., Morganella S., et al. 2019. “A Compendium of Mutational Signatures of Environmental Agents.” Cell 177, no. 4: 821–836.e16.
  14. Lambert, I. B., Singer T. M., Boucher S. E., and Douglas G. R. 2005. “Detailed Review of Transgenic Rodent Mutation Assays.” Mutation Research 590, no. 1–3: 1–280.
  15. LeBlanc, D. P. M., Meier M., Lo F. Y., et al. 2022. “Duplex Sequencing Identifies Genomic Features That Determine Susceptibility to Benzo(a)pyrene‐Induced In Vivo Mutations.” BMC Genomics 23, no. 1: 542.
  16. Marchetti, F., Cardoso R., Chen C. L., et al. 2023a. “Error‐Corrected Next‐Generation Sequencing to Advance Nonclinical Genotoxicity and Carcinogenicity Testing.” Nature Reviews. Drug Discovery 22, no. 3: 165–166.
  17. Marchetti, F., Cardoso R., Chen C. L., et al. 2023b. “Error‐Corrected Next Generation Sequencing ‐ Promises and Challenges for Genotoxicity and Cancer Risk Assessment.” Mutation Research, Reviews in Mutation Research 792: 108466.
  18. Maslov, A. Y., Makhortov S., Sun S., et al. 2022. “Single‐Molecule, Quantitative Detection of Low‐Abundance Somatic Mutations by High‐Throughput Sequencing.” Science Advances 8, no. 14: eabm3259.
  19. Matsumura, S., Sato H., Otsubo Y., Tasaki J., Ikeda N., and Morita O. 2019. “Genome‐Wide Somatic Mutation Analysis via Hawk‐Seq Reveals Mutation Profiles Associated With Chemical Mutagens.” Archives of Toxicology 93, no. 9: 2689–2701.
  20. Miranda, J. A., Fenner K., McKinzie P. B., Dobrovolsky V. N., and Revollo J. R. 2023. “Unbiased Whole Genome Detection of Ultrarare Off‐Target Mutations in Genome‐Edited Cell Populations by HiFi Sequencing.” Environmental and Molecular Mutagenesis 64, no. 7: 374–381.
  21. National Research Council. 2011. Guide for the Care and Use of Laboratory Animals. 8th ed. National Academies Press.
  22. OECD. 2002. Guidance Document for the Development of OECD Guidelines for Testing of Chemicals. OECD.
  23. OECD. 2016. “Test No. 476: In Vitro Mammalian Cell Gene Mutation Tests Using the HPRT and XPRT Genes.” In OECD Guidelines for the Testing of Chemicals, Section 4. OECD Publishing. 10.1787/9789264264809-en.
  24. OECD. 2020. “Test No. 471: Bacterial Reverse Mutation Test.” In OECD Guidelines for the Testing of Chemicals, Section 4. OECD Publishing. 10.1787/9789264071247-en.
  25. OECD. 2025. “Test No. 488: Transgenic Rodent Somatic and Germ Cell Gene Mutation Assays.” In OECD Guidelines for the Testing of Chemicals, Section 4. OECD Publishing. 10.1787/9789264203907-en.
  26. Otsubo, Y., Matsumura S., Ikeda N., and Morita O. 2021. “Hawk‐Seq Differentiates Between Various Mutations in Salmonella typhimurium TA100 Strain Caused by Exposure to Ames Test‐Positive Mutagens.” Mutagenesis 36, no. 3: 245–254.
  27. Piegorsch, W. W., and Bailer A. J. 1994. “Statistical Approaches for Analyzing Mutational Spectra: Some Recommendations for Categorical Data.” Genetics 136, no. 1: 403–416.
  28. Poon, S. L. , McPherson J. R., Tan P., Teh B. T., and Rozen S. G.. 2014. “Mutation Signatures of Carcinogen Exposure: Genome‐Wide Detection and New Opportunities for Cancer Prevention.” Genome Medicine 6, no. 3: 24. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. R Core Team . 2023. “R: A Language and Environment for Statistical Computing.” R Foundation for Statistical Computing, Vienna, Austria.
  30. Salk, J. J. , and Kennedy S. R.. 2020. “Next‐Generation Genotoxicology: Using Modern Sequencing Technologies to Assess Somatic Mutagenesis and Cancer Risk.” Environmental and Molecular Mutagenesis 61, no. 1: 135–151. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Salk, J. J. , Schmitt M. W., and Loeb L. A.. 2018. “Enhancing the Accuracy of Next‐Generation Sequencing for Detecting Rare and Subclonal Mutations.” Nature Reviews. Genetics 19, no. 5: 269–285. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Seo, J. E. , Le Y., Revollo J., et al. 2024. “Evaluating the Mutagenicity of N‐Nitrosodimethylamine in 2D and 3D HepaRG Cell Cultures Using Error‐Corrected Next Generation Sequencing.” Archives of Toxicology 98, no. 6: 1919–1935. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Sidak, Z. 1967. “Rectangular Confidence Regions for the Means of Multivariate Normal Distributions.” Journal of the American Statistical Association 62: 626–633. [Google Scholar]
  34. Smith‐Roe, S. L. , Hobbs C. A., Hull V., et al. 2023. “Adopting Duplex Sequencing Technology for Genetic Toxicity Testing: A Proof‐Of‐Concept Mutagenesis Experiment With N‐Ethyl‐N‐Nitrosourea (ENU)‐exposed Rats.” Mutation Research: Genetic Toxicology and Environmental Mutagenesis 891: 503669. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Valentine, C. C., 3rd , Young R. R., Fielden M. R., et al. 2020. “Direct Quantification of In Vivo Mutagenesis and Carcinogenesis Using Duplex Sequencing.” Proceedings of the National Academy of Sciences of the United States of America 117, no. 52: 33414–33425. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Wang, Y. , Mittelstaedt R. A., Wynne R., et al. 2021. “Genetic Toxicity Testing Using Human In Vitro Organotypic Airway Cultures: Assessing DNA Damage With the CometChip and Mutagenesis by Duplex Sequencing.” Environmental and Molecular Mutagenesis 62, no. 5: 306–318. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Wright, S. P. 1992. “Adjusted p‐values for Simultaneous Inference.” Biometrics 48: 1005–1013. [Google Scholar]
  38. Zhang, S. , Coffing S. L., Gunther W. C., et al. 2024. “Assessing the Genotoxicity of N‐Nitrosodiethylamine With Three In Vivo Endpoints in Male Big Blue(R) Transgenic and Wild‐Type C57BL/6N Mice.” Environmental and Molecular Mutagenesis 65, no. 6–7: 190–202. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Data S1. Supporting Information.

EM-66-311-s001.docx (662.1KB, docx)

Data Availability Statement

The data that support the findings of this study are openly available in Sequence Read Archive at https://www.ncbi.nlm.nih.gov/sra, reference number BioProject PRJNA1206152.


Articles from Environmental and Molecular Mutagenesis are provided here courtesy of Wiley
