Abstract
To determine the phase of NUDT15 sequence variants for more comprehensive star (*) allele diplotyping, we developed a novel long-read single molecule real-time HiFi amplicon sequencing method. A 10.5 kb NUDT15 amplicon assay was validated using reference material positive controls and additional samples for specimen type and blinded accuracy assessment. Triplicate NUDT15 HiFi sequencing of two reference material samples had non-reference genotype concordances of >99.9%, indicating that the assay is robust. Notably, short-read genome sequencing of a subset of samples was unable to determine the phase of star (*) allele-defining NUDT15 variants, resulting in ambiguous diplotype results. In contrast, long-read HiFi sequencing phased all variants across the NUDT15 amplicons, including a *2/*9 diplotype that previously was characterized as *1/*2 in the 1000 Genomes Project v3 dataset. Assay throughput was also tested using 8.5 kb amplicons from 100 Ashkenazi Jewish individuals, which identified a novel NUDT15 *1 sub-allele (c.−121G>A) and a rare likely-deleterious coding variant (p.Pro129Arg). Both novel alleles were Sanger confirmed and assigned as *1.007 and *20, respectively, by the PharmVar Consortium. Taken together, NUDT15 HiFi amplicon sequencing is an innovative method for phased full-gene characterization and novel allele discovery, which could improve NUDT15 pharmacogenomic testing and subsequent phenotype prediction.
Keywords: NUDT15, single molecule real-time (SMRT) HiFi sequencing, long-read sequencing, amplicon sequencing, PacBio, pharmacogenetics, pharmacogenomics
INTRODUCTION
The NUDT15 diphosphatase metabolizes several nucleotide substrates, including its direct role in converting oxidized guanosine-5’-triphosphate (GTP) to its monophosphate form, which prevents the integration of damaged purine nucleotides into genomic DNA. Importantly, NUDT15 catalyzes the conversion of cytotoxic thioguanine triphosphate (TGTP) to the less toxic thioguanine monophosphate (TGMP) (Moriyama et al., 2016), which is the biological rationale behind the discovery of germline NUDT15 variants as genetic determinants of mercaptopurine intolerance among children with acute lymphoblastic leukemia (J. J. Yang et al., 2015). Increasing evidence supporting the role of NUDT15 in thiopurine response variability (Moriyama et al., 2017; Nishii et al., 2018; Schaeffeler et al., 2019; Tsujimoto et al., 2018) prompted the incorporation of NUDT15 (with TPMT) into the 2018 Clinical Pharmacogenetics Implementation Consortium (CPIC) guideline on genotype-guided thiopurine dosing (Relling et al., 2018) and the recently reported FDA pharmacogenetic association tables (Rubinstein & Pacanowski, 2021). There is also growing evidence that NUDT15 is involved in the metabolism of antiviral drugs such as acyclovir and ganciclovir (Nishii et al., 2021; Zhang et al., 2021).
Potentially novel NUDT15 variants are evaluated and catalogued as full-gene star (*) allele haplotypes by the Pharmacogene Variation (PharmVar) consortium (J. J. Yang et al., 2019), consistent with other pharmacogenomic genes (Gaedigk et al., 2019). Prior to this study, PharmVar defined 19 star (*) alleles composed of 22 independent sequence variants (promoter, coding, and UTR), underscoring that some functional NUDT15 variants occur as part of multiple haplotypes (J. J. Yang et al., 2019). Like other clinically relevant pharmacogenomic genes (e.g., CYP2D6) (Gaedigk, Sangkuhl, Whirl-Carrillo, Klein, & Leeder, 2017; Nofziger et al., 2020), the frequencies of NUDT15 alleles vary significantly across ancestral populations. Given the complex architecture of NUDT15 star (*) alleles and their variable multi-ancestral frequencies, it is important to carefully perform variant phasing when implementing and interpreting NUDT15 testing for both research and clinical applications. For example, NUDT15*2 is defined by c.415C>T (rs116855232) and c.50_55dup (rs746071566) in cis, whereas *3 and *6 are defined by the independent occurrence of c.415C>T (rs116855232) and c.50_55dup (rs746071566), respectively (J. J. Yang et al., 2019).
Given the increasing clinical significance of NUDT15 and the importance of variant phasing, we sought to develop a method to sequence the full-length gene using long-read single molecule real-time (SMRT) HiFi sequencing, based on our previously reported analogous approaches for both CYP2D6 (Qiao et al., 2016; Y. Yang, Botton, Scott, & Scott, 2017) and SLC6A4 (Botton, Yang, Scott, Desnick, & Scott, 2020). The NUDT15 gene is located on the positive strand of chromosome 13q14.2 and comprises three coding exons distributed over ~8 kb, which further justified the application of a long-range PCR amplicon SMRT sequencing strategy. Compared to our initial report on CYP2D6 SMRT sequencing (Qiao et al., 2016), the novel NUDT15 HiFi sequencing assay employs highly accurate sequence reads, updated sequencing chemistry, and updated variant calling workflows, which were validated using NUDT15 accuracy and reproducibility reference materials and other controls. Our NUDT15 assay also has improved sequencing throughput compared to the original CYP2D6 assay (Qiao et al., 2016), which further demonstrates the utility of HiFi sequencing for high-throughput, phased full-gene characterization and novel pharmacogenomic allele discovery.
MATERIALS and METHODS
Samples and Subjects
Reference material DNA samples were identified from the 1000 Genomes Project v3 dataset using VarCover (https://varcover.org/) (E. R. Scott, Bansal, Meacham, & Scott, 2020) and acquired from the Coriell Institute for Medical Research (Camden, NJ) (Table 1). Additional de-identified NUDT15 DNA controls with blinded results from previous testing were provided by St. Jude Children’s Research Hospital. In addition, peripheral blood DNA samples were obtained from 100 unrelated, healthy Ashkenazi Jewish individuals from the greater New York City metropolitan area as previously described (S. A. Scott, Edelmann, Kornreich, & Desnick, 2008; S. A. Scott et al., 2010; S. A. Scott et al., 2012). All personal identifiers were removed and isolated DNA samples were tested anonymously. Peripheral blood was collected in EDTA vacutainer tubes using standard practices, and DNA was isolated using the QIAsymphony (Qiagen, Valencia, CA) or Chemagic (Perkin Elmer, Baesweiler, Germany) according to the manufacturer’s instructions. Saliva samples were collected using the Oragene Dx kit (OGD-500; DNA Genotek, Ottawa, ON, Canada), and DNA was isolated using the QIAsymphony (Qiagen).
Table 1:
Long-read NUDT15 HiFi sequencing validation samples and results
| Sample | Source | Intended Use | VarCover Result a | Targeted Genotyping Diplotype | 1KG NYGC WGS Diplotype b | PacBio HiFi Diplotype |
|---|---|---|---|---|---|---|
| HG00437 | Coriell | Accuracy | *5 carrier | *1/*5 | - | *1/*5 |
| HG01086 | Coriell | Accuracy/Precision | *2 carrier or *3/*6 | *1/*2(*3) c | *2 (or *3 and *6) and *9 carrier d | *2/*9 |
| HG01359 | Coriell | Accuracy | *4 carrier | *1/*4 | *1/*4 | *1/*4 |
| HG01979 | Coriell | Accuracy | *2 carrier or *3/*6 | *1/*2(*3) c | *1/*2 or *3/*6 d | *1/*2 |
| HG02259 | Coriell | Accuracy | *2 carrier or *3/*6 | *1/*2(*3) c | *1/*2 or *3/*6 d | *1/*2 |
| NA18526 | Coriell | Accuracy | *2 carrier or *3/*6 | *1/*2(*3) c | *1/*2 or *3/*6 d | *1/*2 |
| NA18564 | Coriell | Accuracy | *3 carrier | *1/*2(*3) c | *1/*3 | *1/*3 |
| NA19079 | Coriell | Accuracy | *2 (or *3 and *6) and *5 carrier | *2(*3)/*5 | *2 (or *3 and *6) and *5 carrier d | *2/*5 |
| NA19095 | Coriell | Accuracy | No star (*) allele variant | *1/*1 | *1/*1 | *1/*1 |
| NA19109 | Coriell | Accuracy/Precision | No star (*) allele variant | *1/*1 (20 variants) e | - | *1/*1 |
| PGXS6 | Saliva | Accuracy/Specimen | - | *1/*2(*3) c | - | *1/*3 |
| SJ01 | Cell line | Blinded Accuracy/Specimen | - | *1/*5 f | - | *1/*5 |
| SJ02 | Cell line | Blinded Accuracy/Specimen | - | *2/*3 f | - | *2/*3 |
1KG: 1000 Genomes Project; NYGC: New York Genome Center; WGS: whole genome sequencing.
a Variants identified from the low-coverage 1000 Genomes Project v3 dataset.
b Variants generated from high-coverage NYGC 1000 Genomes Project data.
c Previous targeted genotyping of *2(*3), *4, and *5, which could not distinguish the *2 and *3 haplotypes because it interrogated c.415C>T (rs116855232) but not c.50_55dup (rs746071566).
d Star (*) allele variants detected, but phase and haplotypes not determined.
e Sample selected for variant calling precision/reproducibility based on the total number of variants across the NUDT15 amplicon.
f Genotyping performed by Sanger sequencing of all NUDT15 coding regions, with haplotype phase inferred.
NUDT15 Targeted Genotyping
Genotyping of three variant NUDT15 alleles [*2 (*3) (rs116855232), *4 (rs147390019) and *5 (rs186364861)] was performed using a comprehensive pharmacogenetic genotyping panel (S. A. Scott et al., 2020), which employs multiplex PCR and single base extension (SBE) using the Agena® SpectroCHIP® II and MassARRAY® Analyzer 4 platform as per manufacturer instructions (Agena Biosciences, San Diego, CA). Genotypes at the targeted loci were determined by SBE peak intensity and Typer software v4.1 (Agena Biosciences), and NUDT15 diplotypes were inferred by a haplotype translation table and Typer software v4.1. Of note, the Agena chemistry was unable to interrogate the c.50_55dup (rs746071566) variant, which independently defines *6 and is found in cis with c.415C>T (rs116855232) on the *2 haplotype. Given that c.415C>T (rs116855232) is found on both *2 and *3, the Agena genotyping panel is unable to distinguish between NUDT15*2 and *3 when c.415C>T (rs116855232) is detected (S. A. Scott et al., 2020).
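To illustrate why a panel that does not interrogate c.50_55dup cannot separate *2 from *3, the following minimal Python sketch groups star (*) alleles by the subset of their defining variants that an assay can actually see. The allele table is a hypothetical subset restricted to variants named in this article (*4 is omitted for brevity), not the PharmVar definition set, and the function name is illustrative only.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set

# Hypothetical subset of NUDT15 core-variant definitions, limited to variants
# named in this article; real testing must use the current PharmVar definitions.
DEFINITIONS: Dict[str, Set[str]] = {
    "*1": set(),
    "*2": {"c.415C>T", "c.50_55dup"},
    "*3": {"c.415C>T"},
    "*5": {"c.52G>A"},
    "*6": {"c.50_55dup"},
}


def indistinguishable_groups(
    definitions: Dict[str, Set[str]], assayed_sites: Set[str]
) -> List[List[str]]:
    """Group alleles that look identical when only `assayed_sites` are genotyped."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for allele, variants in definitions.items():
        # An allele's "signature" is the part of its definition the assay can see.
        signature = frozenset(variants & assayed_sites)
        groups[signature].append(allele)
    # Only groups with more than one member are ambiguous.
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


if __name__ == "__main__":
    # Hypothetical panel that interrogates c.415C>T and c.52G>A but not c.50_55dup.
    panel = {"c.415C>T", "c.52G>A"}
    print(indistinguishable_groups(DEFINITIONS, panel))
    # [['*1', '*6'], ['*2', '*3']]
```

The same missing site also means such a panel would report a *6 carrier as *1; adding c.50_55dup to the assayed set makes every allele in this toy table distinguishable, although it still would not phase the variants.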
Full Gene NUDT15 Amplicon Preparation and Barcoding
Two amplification strategies were used to sequence the entire NUDT15 gene: a one-step PCR barcoded 10.5 kb amplicon and a two-step PCR barcoded 8.5 kb amplicon. Long-range PCR reactions for both the 10.5 and 8.5 kb amplicons were performed in 15 μl containing ~50 ng of DNA, 1X SequalPrep™ Reaction buffer (Invitrogen), 3% DMSO, 1X Enhancer A, 1X Enhancer B, 0.5 μM of forward and reverse primers (Supp. Table S1), and 1.8 units of SequalPrep™ Polymerase. Amplification consisted of an initial denaturation step at 94°C for 2 min, followed by 10 amplification cycles (94°C for 10 sec, 60°C for 30 sec, and 68°C for 12 min), another 20 amplification cycles (94°C for 10 sec, 60°C for 30 sec, and 68°C for 12 min + 20 sec/cycle), and a final extension at 72°C for 5 min.
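The stepped extension program can be checked with simple arithmetic. The Python sketch below assumes that the 20 sec increment is applied cumulatively starting with the first of the 20 stepped cycles (thermal cyclers differ in whether the first stepped cycle already carries the increment) and counts only the 68°C extension steps, not denaturation, annealing, or ramping. The function name and parameters are hypothetical.

```python
from typing import List


def extension_schedule(
    fixed_cycles: int = 10,
    stepped_cycles: int = 20,
    base_seconds: int = 12 * 60,
    increment_seconds: int = 20,
) -> List[int]:
    """Return the 68 °C extension time (seconds) for every amplification cycle."""
    fixed = [base_seconds] * fixed_cycles
    # Assumption: stepped cycle k (k = 1..20) extends for base + k * increment.
    stepped = [base_seconds + increment_seconds * k for k in range(1, stepped_cycles + 1)]
    return fixed + stepped


if __name__ == "__main__":
    times = extension_schedule()
    print(len(times), times[-1], sum(times) / 60)
    # 30 1120 430.0
```

Under this assumption the final cycle extends for 18 min 40 sec and the extension steps alone total 430 min (about 7 h 10 min); if the increment only starts on the second stepped cycle, the final extension is 18 min 20 sec.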
The 8.5 kb products were used as templates for a second round of amplification using universal oligonucleotide tags to embed barcodes prior to SMRT sequencing. Amplification conditions were as above, except with 0.05 μM of forward and reverse primers (Supp. Table S2), a 67°C annealing temperature for the initial 10 cycles, and a 1:10 dilution of the first-round amplicon as template. Because the 10.5 kb amplicon primers incorporated a unique barcode sequence into each primer pair, no second round of amplification was needed for that amplicon.
Single-Molecule Real-Time (SMRT) HiFi Sequencing
Multiplex SMRT HiFi sequencing was performed as previously described (Botton et al., 2020; Qiao et al., 2016; Y. Yang et al., 2015), with an updated platform and enzyme chemistry. In brief, PCR amplicons were purified with Agencourt® AMPure® XP beads and quantified with a NanoDrop 1000. After purification, equimolar quantities of PCR amplicons were pooled, with the required volume of each amplicon calculated using the following formula (Qiao et al., 2016):
$$V(i) = \frac{M}{n \times C(i)}$$

where M is the total mass of pooled PCR amplicons, n is the total number of samples, V(i) is the volume of each PCR amplicon, and C(i) is the concentration of each amplicon. A total of 500 ng of pooled PCR amplicons was submitted for HiFi sequencing. HiFi sequencing was performed according to the Pacific Biosciences 3.0 protocol on the Sequel instrument with a movie collection time of 20 hours, as per the manufacturer’s instructions and as previously described (Botton et al., 2020; Qiao et al., 2016).
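Because each of the n amplicons should contribute M/n ng to the pool, the volume to pipette for amplicon i is M divided by n × C(i). A minimal Python sketch of that calculation, assuming concentrations in ng/µl and the 500 ng total pooled mass stated above; the function name and the example concentration readings are hypothetical.

```python
from typing import List, Sequence


def pooling_volumes(
    concentrations_ng_per_ul: Sequence[float], total_mass_ng: float = 500.0
) -> List[float]:
    """Volume (µl) of each amplicon so that every sample contributes M / n ng."""
    n = len(concentrations_ng_per_ul)
    if n == 0:
        raise ValueError("at least one amplicon is required")
    if any(c <= 0 for c in concentrations_ng_per_ul):
        raise ValueError("concentrations must be positive")
    # V(i) = M / (n * C(i))
    return [total_mass_ng / (n * c) for c in concentrations_ng_per_ul]


if __name__ == "__main__":
    # Hypothetical concentration readings (ng/µl) for four barcoded amplicons.
    for volume in pooling_volumes([25.0, 50.0, 100.0, 125.0]):
        print(f"{volume:.2f} µl")
    # 5.00 µl, 2.50 µl, 1.25 µl, 1.00 µl (125 ng from each sample)
```

Equal mass approximates equimolar pooling only when all pooled amplicons have the same length, as in a pool of a single amplicon design; amplicons of different lengths would need each sample's mass scaled by its amplicon length.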
NUDT15 HiFi Sequencing Data Analysis
NUDT15 HiFi sequencing analysis included demultiplexing, alignment, sequencing quality score recalibration, and variant calling. Raw sequencing data in FASTQ format were demultiplexed into individual samples using NGSUtils (Breese & Liu, 2013). HiFi circular consensus sequencing (CCS) reads were aligned to the targeted NUDT15 gene region (chr13:48610000-48622000; hg19) using BWA-MEM version 0.7.12 with parameter settings dedicated to Pacific Biosciences HiFi sequencing (Li, 2013). To assess variant, genotype, haplotype, and metabolizer phenotype calling accuracy with the computational pipeline described below, subsampled Coriell reference material CCS BAM files containing 50, 100, and 250 randomly selected reads were created using BBMap (version 38.86, reformat.sh samplereads).
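The subsampling step draws a fixed number of reads at random. As an illustration of the idea, and not of BBMap's internal implementation, the following Python sketch uses reservoir sampling (Algorithm R), which selects k reads uniformly without replacement in a single pass over a read stream of unknown length. The function name, the seed handling, and the read names are assumptions made for the example.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(reads: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)  # fixed seed -> reproducible subsets
    reservoir: List[T] = []
    for i, read in enumerate(reads):
        if i < k:
            reservoir.append(read)  # fill the reservoir with the first k reads
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = read  # read i is kept with probability k / (i + 1)
    return reservoir


if __name__ == "__main__":
    # Hypothetical CCS read names standing in for parsed FASTQ/BAM records.
    read_names = [f"ccs_read_{n}" for n in range(2000)]
    for depth in (50, 100, 250):
        subset = reservoir_sample(read_names, depth, seed=depth)
        print(depth, len(subset), len(set(subset)))
```

If a sample has fewer reads than the requested depth, every read is returned, so downstream tools see the full (smaller) dataset rather than an error.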
Variant, Genotype, Star (*) Allele Haplotype, and Diplotype Analysis
Phased NUDT15 genotypes were called with DeepVariant (version 1.3) using BAM files haplotagged with WhatsHap (version 1.2.1). When two different variants were called at the same position, the haplotagged BAMs were split into two files and DeepVariant was run separately for each haplotype. Genotype calls were subsequently removed if the DeepVariant-generated FILTER label was not ‘PASS’ or the genotype quality was less than 5. Genotype calls were left-aligned using vt (version 0.5) (Tan, Abecasis, & Kang, 2015). Four non-coding thymidine homopolymer sites were excluded from analysis because of their challenging genomic contexts (GRCh37 chr13: 48612952–48612858, 48613760–48613785, 48614736–4861475, and 48617672–48617681). For the Ashkenazi Jewish cohort, Amplicon Long-read Error Correction (ALEC) was implemented as previously described (Qiao et al., 2016). Identified NUDT15 genotypes were manually translated to star (*) allele nomenclature according to the PharmVar Consortium (Gaedigk et al., 2019; J. J. Yang et al., 2019).
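The genotype filter described above (keep a call only if FILTER is PASS and GQ is at least 5) can be sketched in plain Python on single-sample VCF text. This is a hypothetical helper, not part of DeepVariant or vt; it assumes an uncompressed, single-sample VCF with GQ in the FORMAT column, and it treats a missing GQ as failing the threshold.

```python
from typing import Iterable, Iterator


def passes_filters(record: str, min_gq: int = 5) -> bool:
    """Return True if a single-sample VCF data line has FILTER == PASS and GQ >= min_gq."""
    fields = record.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected a VCF data line with FORMAT and one sample column")
    filter_value = fields[6]
    # Pair FORMAT keys (e.g. GT:GQ:DP) with the sample's values (e.g. 0|1:40:120).
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    gq = sample.get("GQ", ".")
    # Assumption: a missing GQ ('.') counts as failing the quality threshold.
    if filter_value != "PASS" or gq == ".":
        return False
    return int(gq) >= min_gq


def filter_vcf(lines: Iterable[str], min_gq: int = 5) -> Iterator[str]:
    """Yield header lines unchanged and only the data lines that pass both criteria."""
    for line in lines:
        if line.startswith("#") or passes_filters(line, min_gq):
            yield line


if __name__ == "__main__":
    # Hypothetical records with made-up positions, for illustration only.
    vcf = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
        "chr13\t100\t.\tG\tA\t40\tPASS\t.\tGT:GQ:DP\t0|1:40:120",    # kept
        "chr13\t200\t.\tC\tT\t0\tRefCall\t.\tGT:GQ:DP\t0/0:30:110",  # dropped: not PASS
        "chr13\t300\t.\tT\tC\t3\tPASS\t.\tGT:GQ:DP\t0/1:3:90",       # dropped: GQ < 5
    ]
    for line in filter_vcf(vcf):
        print(line)
```

Keeping the two criteria in one small function makes the threshold easy to audit; the same logic could equally be expressed with standard VCF-processing tools.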
Publicly Available NUDT15 Short-Read Sequencing Data
To compare long-read and short-read NUDT15 sequencing results, New York Genome Center short-read genome sequencing data for the 1KG reference material samples noted above (Pratt et al., 2016) were acquired from the National Center for Biotechnology Information (NCBI) FTP server (ftp://ftp-trace.ncbi.nlm.nih.gov/1000genomes) using Samtools (version 1.3.1). GRCh38 BAM alignments were converted to GRCh37 coordinates using CrossMap.py. Phased variant, genotype, haplotype, and diplotype calls were then generated using the long-read HiFi amplicon sequencing pipeline described above.
Statistical Analysis
Genotype phasing obtained from 100 randomly subsampled long reads (a depth comparable to the mean short-read depth) was compared with phasing from high-coverage short-read data for seven Coriell samples using a binomial test (n=50 heterozygous genotype sites).
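The Methods do not specify exactly how the binomial test was formulated. One plausible formulation, shown here as an assumption rather than as the authors' analysis, is an exact two-sided sign test (the exact form of McNemar's test) on discordant sites: under the null hypothesis, a site phased by only one technology is equally likely to be phased by either. The illustration uses the eight-sample counts in Table 2, where HiFi phased all 58 sites, so all 32 short-read-phased sites were also HiFi-phased, leaving 26 sites phased only by HiFi and none phased only by short reads.

```python
from math import comb


def exact_binomial_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Two-sided exact binomial p-value: total probability of outcomes no more likely than k."""
    if not 0 <= k <= n:
        raise ValueError("k must lie in 0..n")
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so outcomes tied with the observed one are not lost to rounding.
    p_value = sum(prob for prob in pmf if prob <= observed * (1 + 1e-7))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Discordant sites from Table 2: phased only by HiFi vs. only by short reads.
    hifi_only, short_only = 26, 0
    print(exact_binomial_two_sided(short_only, hifi_only + short_only))
    # about 3.0e-08 (= 2 ** -25)
```

This gives p = 2^-25 (about 3 × 10^-8), consistent with the reported p < 0.0001, although the original analysis may have used a different formulation or site set.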
RESULTS
Long-Read NUDT15 HiFi Sequencing Read Length Metrics
An average of 2,592 aligned reads (standard deviation (SD): 1782; minimum (min): 526) with an average read length of 7.91 kb (SD: 728 bp) were generated using the 10.5 kb NUDT15 amplicon (n=17 amplicons), and an average of 680 aligned reads (SD: 387; min 38) with an average read length of 6.51 kb (SD: 510 bp) were generated using the 8.5 kb NUDT15 amplicon (n=200 amplicons).
Long-Read NUDT15 HiFi Sequencing Accuracy and Precision
Ten reference material samples were used to assess NUDT15 long-read HiFi sequencing accuracy and precision (Table 1). Results were compared with the expected diplotypes from the alleles identified by VarCover (E. R. Scott et al., 2020) and orthogonal clinical NUDT15 targeted genotyping (S. A. Scott et al., 2020). Importantly, all samples had completely phased NUDT15 diplotypes by long-read HiFi sequencing, which included clarification of the related *2, *3, and *6 alleles, as well as the identification of a sample with an unexpected *9 allele (c.50_55del) (Table 1, Figures 1 and 2). In addition, blinded testing of two samples with previously determined NUDT15 diplotypes (SJ01 and SJ02) correctly identified the expected results using long-read HiFi sequencing. Saliva DNA was also evaluated (PGXS6); this sample had previously undergone clinical targeted genotyping (*1/*2(*3)), and long-read HiFi sequencing determined that its true diplotype was *1/*3 (Table 1).
FIGURE 1. NUDT15 HiFi amplicon sequencing of HG01086.

(A) Full gene view of both short-read and long-read HiFi sequencing of the HG01086 cell line, which was characterized by the 1000 Genomes Project v3 dataset as having the c.415C>T (rs116855232) and c.50_55dup (rs746071566) variants found in the NUDT15 *2, *3 and/or *6 alleles. High-depth short-read sequencing detected these variants; however, they could not be phased. In contrast, long-read HiFi sequencing identified these variants and an overlapping c.50_55del variant, which is the NUDT15 *9 defining variant. (B) Zoomed-in view of NUDT15 exon one, underscoring the phased identification of c.50_55dup and c.50_55del, resulting in a *2/*9 diplotype. (C) Zoomed-in view of the overlapping NUDT15 c.50_55dup and c.50_55del variants, resulting in a *2/*9 diplotype.
FIGURE 2. NUDT15 HiFi amplicon sequencing of NA19079.

Full gene view of both short-read and long-read HiFi sequencing of the NA19079 cell line, which was characterized by the 1000 Genomes Project v3 dataset as having the c.415C>T (rs116855232) and c.50_55dup (rs746071566) variants found in the NUDT15 *2, *3 and/or *6 alleles, and the c.52G>A variant that defines *5. High-depth short-read sequencing detected these variants; however, they could not be phased. In contrast, long-read HiFi sequencing unambiguously phased all variants (red boxes), resulting in a *2/*5 diplotype.
In addition, genotype, haplotype, and star (*) allele assignments for all reference material samples were accurately called after down-sampling to 100 and 250 long reads. Down-sampling to 50 reads also yielded accurate calls; however, when these data were processed with the split-read pipeline, some false-positive variants with low genotype quality were called. NUDT15 HiFi sequencing precision was measured by evaluating triplicate amplifications of two reference material samples (HG01086 and NA19109) across the 100 and 250 long-read down-sampled datasets. HG01086 had a unique NUDT15 *2/*9 diplotype, and NA19109 was selected because it had the most variants (n=20 coding and non-coding) across the NUDT15 amplicon in the 1000 Genomes Project v3 dataset. Importantly, genotype phasing reproducibility for long-read HiFi sequencing across replicates and across the 100 and 250 subsampled read depths was 99.9% (943/944 variant calls).
NUDT15 Short-Read Sequencing Accuracy and Phasing
Eight of the 10 reference material samples had publicly available high-depth short-read genome sequencing data. Targeted analysis of the NUDT15 region showed that short-read sequencing, coupled with the same computational pipeline used for long-read HiFi sequencing, identified the expected star (*) allele variants. However, in contrast to long-read HiFi sequencing, samples with multiple star (*) allele variants could not be unambiguously phased into haplotypes and diplotypes (Table 1, Figures 1 and 2). Importantly, more genotype sites were phased using HiFi sequencing (100%, 58/58 genotypes) than using short-read sequencing (55%, 32/58 genotypes) in the eight Coriell samples with available data (binomial p < 0.0001) (Table 2).
Table 2:
NUDT15 genotype phasing performance with short- and long-read sequencing
| Sample | Short-read WGS: Phased | Short-read WGS: Unphased | Long-read HiFi: Phased | Long-read HiFi: Unphased |
|---|---|---|---|---|
| HG01086 | 3 | 5 | 8 | 0 |
| HG01359 | 0 | 2 | 2 | 0 |
| HG01979 | 4 | 5 | 9 | 0 |
| HG02259 | 7 | 2 | 9 | 0 |
| NA18526 | 5 | 3 | 8 | 0 |
| NA18564 | 3 | 4 | 7 | 0 |
| NA19079 | 7 | 2 | 9 | 0 |
| NA19095 | 3 | 3 | 6 | 0 |
WGS: whole genome sequencing.
NUDT15 Variation in an Ashkenazi Jewish Cohort
Given the NUDT15 accuracy and concordance results obtained with the reference material samples, long-read HiFi sequencing throughput was evaluated by determining the frequency of NUDT15 haplotypes in a cohort of 100 Ashkenazi Jewish samples, run in duplicate. Star (*) allele assignments were perfectly concordant between replicates (Figure 3). Of the 795 heterozygous variant calls at 33 unique sites, a single variant in one sample could not be phased in one of the two replicates, yielding a phasing concordance of 99.7% between the two replicates (396/397 in replicate 1 and 398/398 in replicate 2). In addition, 152 homozygous alternate genotypes were concordantly called in each replicate. Importantly, a novel *1 sub-allele (c.−121G>A) and a rare coding variant (c.386C>G; p.Pro129Arg) were also detected; both were Sanger confirmed and assigned as *1.007 (two individuals) and *20.001 (one individual), respectively, by the PharmVar Consortium (J. J. Yang et al., 2019) (Table 3 and Figure 3). Of note, both of these variants have previously been catalogued in the gnomAD database (rs544736228 and rs768324690, respectively).
FIGURE 3. NUDT15 HiFi amplicon sequencing of 100 Ashkenazi Jewish DNA samples.

(A) Full gene view of the Ashkenazi Jewish long-read HiFi sequencing results, highlighting all identified genotypes. Light blue: homozygous alternate; dark blue: heterozygous alternate; green: heterozygous star (*) allele-defining variants. The dashed pink boxes highlight the novel NUDT15*1.007 alleles (c.[−121G>A;−79G>A]) identified in this cohort, and the dashed red box highlights the novel NUDT15*20 allele (c.386C>G; p.Pro129Arg) identified in this cohort. (B) Full-length long-read HiFi sequencing results of the Ashkenazi Jewish sample with the novel NUDT15*20 allele. The red box highlights two star (*) allele-defining variants in trans in this sample, NUDT15*1.005 (c.*7G>A) and *20 (c.386C>G). (C) Zoomed-in view of NUDT15 exon three in the Ashkenazi Jewish sample with the novel NUDT15*20 allele (red asterisk). The red boxes highlight the two star (*) allele-defining NUDT15*1.005 (c.*7G>A) and *20 (c.386C>G) variants in trans in this sample, resulting in a *1.005/*20 diplotype. The black box shows a further zoomed-in view of the exon three c.386C>G coding variant that defines *20.
Table 3:
Ashkenazi Jewish NUDT15 haplotypes detected by long-read HiFi sequencing
| NUDT15 Allele | *1.001 | *1.003 | *1.005 | *1.007 a | *20.001 a |
|---|---|---|---|---|---|
| n (frequency) b | 187 (0.935) | 4 (0.020) | 6 (0.030) | 2 (0.010) | 1 (0.005) |
a Novel allele detected in this study.
b Total n=200 haplotypes.
DISCUSSION
Given the increasing clinical significance of NUDT15 and the importance of variant phasing in pharmacogenomics, we sought to develop an innovative method to sequence the full-length gene using long-read HiFi sequencing. Phased haplotypes can be determined directly from sequencing reads that span all relevant variants (read-based phasing) or inferred from previously phased reference haplotypes (statistical phasing). We leveraged our previous experience with CYP2D6 and SLC6A4 HiFi sequencing (Botton et al., 2020; Qiao et al., 2016) to develop an updated and innovative HiFi amplicon sequencing assay that can phase the entire ~8 kb of NUDT15, a clinically actionable gene implicated in thiopurine toxicity (Relling et al., 2018). Our NUDT15 HiFi sequencing assay was validated against reference materials with previously reported variants and diplotypes, which yielded the expected star (*) alleles along with additional variant detection, diplotype reclassification, and novel star (*) allele discovery (NUDT15*1.007, *20). The importance of phasing in clinical pharmacogenomics is highlighted by reference material sample HG01086, which targeted genotyping assigned as *1/*2(*3) but HiFi sequencing reclassified as *2/*9. These diplotypes translate to intermediate and poor metabolizer phenotypes, respectively, which have dramatically different risks for thiopurine toxicity and different recommended clinical management (Relling et al., 2018).
Targeted genotyping of NUDT15 is increasingly available from clinical laboratories and is typically performed on multiplexed PCR-based platforms. Although this strategy can efficiently detect the presence or absence of specific nucleotides of interest, these targeted assays cannot phase the identified NUDT15 genotypes. Likewise, neither Sanger nor short-read sequencing of NUDT15 can phase identified variants across the gene without parental studies. As such, most clinical NUDT15 assays rely on haplotype translation tables to infer star (*) allele diplotypes; however, this can be problematic when a haplotype is defined by multiple variants in cis and other haplotypes carry the same variants individually, so that the same variants in trans indicate a different diplotype. For example, NUDT15*2 is defined by c.415C>T (rs116855232) and c.50_55dup (rs746071566) in cis, whereas *3 and *6 are defined by the independent occurrence of c.415C>T (rs116855232) and c.50_55dup (rs746071566), respectively (J. J. Yang et al., 2019).
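The cis/trans problem can be made concrete by translating every possible arrangement of a sample's unphased heterozygous variants. The minimal Python sketch below does this against a hypothetical four-allele table limited to the *2, *3, and *6 example above; names such as `candidate_diplotypes` are illustrative, "?" is a placeholder for haplotypes absent from the toy table, and the exhaustive 2^k enumeration is only practical for the handful of heterozygous allele-defining sites typical of a single gene.

```python
from itertools import product
from typing import AbstractSet, Dict, FrozenSet, List

# Hypothetical subset of NUDT15 definitions (only the variants in the example above).
DEFINITIONS: Dict[FrozenSet[str], str] = {
    frozenset(): "*1",
    frozenset({"c.415C>T", "c.50_55dup"}): "*2",
    frozenset({"c.415C>T"}): "*3",
    frozenset({"c.50_55dup"}): "*6",
}


def _rank(allele: str) -> int:
    # Sort numerically ("*2" before "*10"); unmatched haplotypes ("?") sort last.
    return int(allele[1:]) if allele[1:].isdigit() else 10**6


def candidate_diplotypes(
    het: AbstractSet[str], hom: AbstractSet[str] = frozenset()
) -> List[str]:
    """Translate every cis/trans arrangement of unphased heterozygous variants."""
    het_sorted = sorted(het)
    diplotypes = set()
    for sides in product((0, 1), repeat=len(het_sorted)):
        # Homozygous variants sit on both haplotypes; each het variant goes to one side.
        hap_a = set(hom) | {v for v, s in zip(het_sorted, sides) if s == 0}
        hap_b = set(hom) | {v for v, s in zip(het_sorted, sides) if s == 1}
        alleles = sorted((DEFINITIONS.get(frozenset(h), "?") for h in (hap_a, hap_b)), key=_rank)
        diplotypes.add("/".join(alleles))
    return sorted(diplotypes)


if __name__ == "__main__":
    print(candidate_diplotypes({"c.415C>T", "c.50_55dup"}))
    # ['*1/*2', '*3/*6']
```

Two heterozygous calls at c.415C>T and c.50_55dup therefore map to both *1/*2 and *3/*6, which is the ambiguity that read-based phasing resolves.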
Although the vast majority of clinical sequencing currently employs short-read sequencing platforms, long-read sequencing platforms continue to improve in performance and throughput, which has translated to higher variant calling accuracy in historically challenging genomic contexts (Olson et al., 2022; Wenger et al., 2019). As such, clinical long-read sequencing is emerging as an important alternative to short-read sequencing (Cohen et al., 2022; Logsdon, Vollger, & Eichler, 2020), as evidenced by improved interrogation of clinically significant regions, including structural variants, repeat expansions, and homologous gene families, as well as the inherent benefit of variant phasing (Ameur, Kloosterman, & Hestand, 2019; Ardui, Ameur, Vermeesch, & Hestand, 2018; Reiner et al., 2018). Cataloguing phased haplotypes among pharmacogenomic genes has been an ongoing effort of the PharmVar Consortium (Gaedigk et al., 2019), as previous technologies have had to rely on statistical phasing and/or haplotype inference. Our long-read HiFi amplicon sequencing assays for CYP2D6, SLC6A4, and now NUDT15 underscore the value of long-read sequencing for genes implicated in drug response variability, which has recently also been reported using HiFi whole-genome sequencing for a panel of pharmacogenomic genes (van der Lee et al., 2022). Of note, a limitation of the long-read HiFi amplicon sequencing strategy is its inability to detect large structural variants at the NUDT15 locus; however, copy number variants are not currently documented within this region in the Database of Genomic Variants (http://dgv.tcag.ca/).
In addition to the accuracy and precision analyses performed with the NUDT15 HiFi amplicon sequencing method, which correctly phased two unique reference material samples (HG01086: *2/*9; NA19079: *2/*5), assay throughput was also evaluated by processing 100 Ashkenazi Jewish DNA samples in a single SMRT Cell (run in duplicate on a second SMRT Cell). Owing to geographical isolation and a history of significant population bottlenecks followed by rapid expansion, founder mutations for autosomal recessive diseases have reached appreciable carrier frequencies among Ashkenazi Jews, which has resulted in increased risks for several genetic disorders compared to other racial and ethnic groups (Y. Yang, Peter, & Scott, 2014).
Consistent with the identification of recessive founder mutations among the Ashkenazi Jewish population, pharmacogenomic studies have identified several unique variants not found at appreciable frequencies in other populations (e.g., VKORC1 p.D36Y; CYP2C19*4.002) (S. A. Scott et al., 2008; S. A. Scott et al., 2012). As such, it is notable that NUDT15 HiFi sequencing identified the novel *20 allele (c.386C>G; p.Pro129Arg) within this cohort, which has an allele frequency of 0.0007922 in 10,098 Ashkenazi Jewish alleles in the gnomAD v2.1.1 database (0.00004025 overall frequency; 273,322 total alleles). The clinical significance of c.386C>G (p.Pro129Arg) is currently uncertain; however, common in silico tools predict this variant to be deleterious (e.g., EIGEN, MutPred, MutationAssessor, MutationTaster, SIFT). In addition, the c.−121G>A variant that defines the *1.007 allele has an allele frequency of 0.008641 in 3,472 Ashkenazi Jewish alleles in the gnomAD v3.1.2 database (0.0003021 overall frequency; 152,252 total alleles). The clinical significance of this *1 sub-allele promoter variant is also currently uncertain.
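gnomAD reports allele frequency as AF = AC/AN (allele count divided by allele number), so the approximate number of observed alleles behind each frequency quoted above can be back-calculated. A minimal Python sketch; the counts are inferred from the rounded frequencies rather than read from gnomAD, and the helper name is hypothetical.

```python
def allele_count(allele_frequency: float, allele_number: int) -> int:
    """Recover the integer allele count (AC) from a reported AF = AC / AN."""
    return round(allele_frequency * allele_number)


if __name__ == "__main__":
    # (description, AF, AN) values quoted in the text above.
    reported = [
        ("c.386C>G, Ashkenazi Jewish (gnomAD v2.1.1)", 0.0007922, 10_098),
        ("c.386C>G, all (gnomAD v2.1.1)", 0.00004025, 273_322),
        ("c.-121G>A, Ashkenazi Jewish (gnomAD v3.1.2)", 0.008641, 3_472),
        ("c.-121G>A, all (gnomAD v3.1.2)", 0.0003021, 152_252),
    ]
    for label, af, an in reported:
        print(f"{label}: AC ~ {allele_count(af, an)}")
    # 8, 11, 30 and 46 alleles, respectively
```

The small counts (8 and 30 Ashkenazi Jewish alleles) show how few observations underlie these population-specific frequency estimates.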
In conclusion, these data underscore the accuracy and robust performance of an innovative long-read NUDT15 HiFi amplicon sequencing assay, which was validated using reference materials and other control samples and included a throughput assessment using a 100-sample cohort. These results support the use of long-read HiFi sequencing for clinically significant amplicon targets, which has utility for full-gene haplotype phasing and novel pharmacogenomic allele discovery. Although full-gene sequencing strategies undoubtedly can lead to the increased identification of novel variants of uncertain significance, NUDT15 variant classification can be supported by recently reported massively parallel functional analyses of NUDT15 coding variants (Suiter et al., 2020). Moreover, NUDT15 testing has the most clinical utility when coupled with TPMT interrogation, and long-read HiFi sequencing could also resolve ambiguous, unphased TPMT diplotypes (e.g., *1/*3A vs *3B/*3C). As such, future directions include the development of a comprehensive pharmacogenomic HiFi sequencing panel that includes TPMT and NUDT15, as well as other clinically actionable genes implicated in interindividual drug response variability.
Supplementary Material
ACKNOWLEDGEMENTS
This work was supported in part by NIH grant R35GM141947 (J.J.Y.).
Footnotes
CONFLICT OF INTEREST STATEMENT
N.C., P.N., L.E., and E.E.S. are paid employees of Sema4; J.H., P.B., S.C., and J.K. are paid employees of Pacific Biosciences. All other authors declare no conflicts of interest.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
REFERENCES
- Ameur A, Kloosterman WP, & Hestand MS (2019). Single-Molecule Sequencing: Towards Clinical Applications. Trends Biotechnol, 37(1), 72–85. doi: 10.1016/j.tibtech.2018.07.013
- Ardui S, Ameur A, Vermeesch JR, & Hestand MS (2018). Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics. Nucleic Acids Res, 46(5), 2159–2168. doi: 10.1093/nar/gky066
- Botton MR, Yang Y, Scott ER, Desnick RJ, & Scott SA (2020). Phased Haplotype Resolution of the SLC6A4 Promoter Using Long-Read Single Molecule Real-Time (SMRT) Sequencing. Genes (Basel), 11(11). doi: 10.3390/genes11111333
- Breese MR, & Liu Y (2013). NGSUtils: a software suite for analyzing and manipulating next-generation sequencing datasets. Bioinformatics, 29(4), 494–496. doi: 10.1093/bioinformatics/bts731
- Cohen ASA, Farrow EG, Abdelmoity AT, Alaimo JT, Amudhavalli SM, Anderson JT, . . . Pastinen T (2022). Genomic answers for children: Dynamic analyses of >1000 pediatric rare disease genomes. Genet Med. doi: 10.1016/j.gim.2022.02.007
- Gaedigk A, Sangkuhl K, Whirl-Carrillo M, Klein T, & Leeder JS (2017). Prediction of CYP2D6 phenotype from genotype across world populations. Genet Med, 19(1), 69–76. doi: 10.1038/gim.2016.80
- Gaedigk A, Sangkuhl K, Whirl-Carrillo M, Twist GP, Klein TE, Miller NA, & PharmVar Steering C (2019). The Evolution of PharmVar. Clin Pharmacol Ther, 105(1), 29–32. doi: 10.1002/cpt.1275
- Li H (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv 1303.3997v2, 1–3.
- Logsdon GA, Vollger MR, & Eichler EE (2020). Long-read human genome sequencing and its applications. Nat Rev Genet, 21(10), 597–614. doi: 10.1038/s41576-020-0236-x
- Moriyama T, Nishii R, Perez-Andreu V, Yang W, Klussmann FA, Zhao X, . . . Yang JJ (2016). NUDT15 polymorphisms alter thiopurine metabolism and hematopoietic toxicity. Nat Genet, 48(4), 367–373. doi: 10.1038/ng.3508
- Moriyama T, Yang YL, Nishii R, Ariffin H, Liu C, Lin TN, . . . Yang JJ (2017). Novel variants in NUDT15 and thiopurine intolerance in children with acute lymphoblastic leukemia from diverse ancestry. Blood, 130(10), 1209–1212. doi: 10.1182/blood-2017-05-782383
- Nishii R, Mizuno T, Rehling D, Smith C, Clark BL, Zhao X, . . . Yang JJ (2021). NUDT15 polymorphism influences the metabolism and therapeutic effects of acyclovir and ganciclovir. Nat Commun, 12(1), 4181. doi: 10.1038/s41467-021-24509-7
- Nishii R, Moriyama T, Janke LJ, Yang W, Suiter CC, Lin TN, . . . Yang JJ (2018). Preclinical evaluation of NUDT15-guided thiopurine therapy and its effects on toxicity and antileukemic efficacy. Blood, 131(22), 2466–2474. doi: 10.1182/blood-2017-11-815506
- Nofziger C, Turner AJ, Sangkuhl K, Whirl-Carrillo M, Agundez JAG, Black JL, . . . Gaedigk A (2020). PharmVar GeneFocus: CYP2D6. Clin Pharmacol Ther, 107(1), 154–170. doi: 10.1002/cpt.1643
- Olson ND, Wagner J, McDaniel J, Stephens SH, Westreich ST, Prasanna AG, . . . Zook JM (2022). PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions. Cell Genom, 2(5). doi: 10.1016/j.xgen.2022.100129
- Pratt VM, Everts RE, Aggarwal P, Beyer BN, Broeckel U, Epstein-Baak R, . . . Kalman LV (2016). Characterization of 137 Genomic DNA Reference Materials for 28 Pharmacogenetic Genes: A GeT-RM Collaborative Project. J Mol Diagn, 18(1), 109–123. doi: 10.1016/j.jmoldx.2015.08.005
- Qiao W, Yang Y, Sebra R, Mendiratta G, Gaedigk A, Desnick RJ, & Scott SA (2016). Long-Read Single Molecule Real-Time Full Gene Sequencing of Cytochrome P450–2D6. Hum Mutat, 37(3), 315–323. doi: 10.1002/humu.22936
- Reiner J, Pisani L, Qiao W, Singh R, Yang Y, Shi L, . . . Scott SA (2018). Cytogenomic identification and long-read single molecule real-time (SMRT) sequencing of a Bardet-Biedl Syndrome 9 (BBS9) deletion. NPJ Genom Med, 3, 3. doi: 10.1038/s41525-017-0042-3
- Relling MV, Schwab M, Whirl-Carrillo M, Suarez-Kurtz G, Pui CH, Stein CM, . . . Yang JJ (2018). Clinical Pharmacogenetics Implementation Consortium (CPIC) guideline for thiopurine dosing based on TPMT and NUDT15 genotypes: 2018 update. Clin Pharmacol Ther, submitted.
- Rubinstein WS, & Pacanowski M (2021). Pharmacogenetic Gene-Drug Associations: FDA Perspective on What Physicians Need to Know. Am Fam Physician, 104(1), 16–19.
- Schaeffeler E, Jaeger SU, Klumpp V, Yang JJ, Igel S, Hinze L, . . . Schwab M (2019). Impact of NUDT15 genetics on severe thiopurine-related hematotoxicity in patients with European ancestry. Genet Med, 21(9), 2145–2150. doi: 10.1038/s41436-019-0448-7
- Scott ER, Bansal V, Meacham C, & Scott SA (2020). VarCover: Allele Min-Set Cover Software. J Mol Diagn, 22(2), 123–131. doi: 10.1016/j.jmoldx.2019.10.005
- Scott SA, Edelmann L, Kornreich R, & Desnick RJ (2008). Warfarin pharmacogenetics: CYP2C9 and VKORC1 genotypes predict different sensitivity and resistance frequencies in the Ashkenazi and Sephardi Jewish populations. Am J Hum Genet, 82(2), 495–500. doi: 10.1016/j.ajhg.2007.10.002
- Scott SA, Edelmann L, Liu L, Luo M, Desnick RJ, & Kornreich R (2010). Experience with carrier screening and prenatal diagnosis for 16 Ashkenazi Jewish genetic diseases. Hum Mutat, 31(11), 1240–1250. doi: 10.1002/humu.21327
- Scott SA, Martis S, Peter I, Kasai Y, Kornreich R, & Desnick RJ (2012). Identification of CYP2C19*4B: pharmacogenetic implications for drug metabolism including clopidogrel responsiveness. Pharmacogenomics J, 12(4), 297–305. doi: 10.1038/tpj.2011.5
- Scott SA, Scott ER, Seki Y, Chen AJ, Wallsten R, Owusu Obeng A, . . . Edelmann L (2020). Development and Analytical Validation of a 29 Gene Clinical Pharmacogenetic Genotyping Panel: Multi-Ethnic Allele and Copy Number Variant Detection. Clin Transl Sci. doi: 10.1111/cts.12844
- Suiter CC, Moriyama T, Matreyek KA, Yang W, Scaletti ER, Nishii R, . . . Yang JJ (2020). Massively parallel variant characterization identifies NUDT15 alleles associated with thiopurine toxicity. Proc Natl Acad Sci U S A, 117(10), 5394–5401. doi: 10.1073/pnas.1915680117
- Tan A, Abecasis GR, & Kang HM (2015). Unified representation of genetic variants. Bioinformatics, 31(13), 2202–2204. doi: 10.1093/bioinformatics/btv112
- Tsujimoto S, Osumi T, Uchiyama M, Shirai R, Moriyama T, Nishii R, . . . Kato M (2018). Diplotype analysis of NUDT15 variants and 6-mercaptopurine sensitivity in pediatric lymphoid neoplasms. Leukemia, 32(12), 2710–2714. doi: 10.1038/s41375-018-0190-1
- van der Lee M, Rowell WJ, Menafra R, Guchelaar HJ, Swen JJ, & Anvar SY (2022). Application of long-read sequencing to elucidate complex pharmacogenomic regions: a proof of principle. Pharmacogenomics J, 22(1), 75–81. doi: 10.1038/s41397-021-00259-z
- Wenger AM, Peluso P, Rowell WJ, Chang PC, Hall RJ, Concepcion GT, . . . Hunkapiller MW (2019). Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol, 37(10), 1155–1162. doi: 10.1038/s41587-019-0217-9
- Yang JJ, Landier W, Yang W, Liu C, Hageman L, Cheng C, . . . Relling MV (2015). Inherited NUDT15 variant is a genetic determinant of mercaptopurine intolerance in children with acute lymphoblastic leukemia. J Clin Oncol, 33(11), 1235–1242. doi: 10.1200/JCO.2014.59.4671
- Yang JJ, Whirl-Carrillo M, Scott SA, Turner AJ, Schwab M, Tanaka Y, . . . Gaedigk A (2019). Pharmacogene Variation Consortium Gene Introduction: NUDT15. Clin Pharmacol Ther, 105(5), 1091–1094. doi: 10.1002/cpt.1268
- Yang Y, Botton MR, Scott ER, & Scott SA (2017). Sequencing the CYP2D6 gene: from variant allele discovery to clinical pharmacogenetic testing. Pharmacogenomics, 18(7), 673–685. doi: 10.2217/pgs-2017-0033
- Yang Y, Peter I, & Scott SA (2014). Pharmacogenetics in Jewish populations. Drug Metabol Drug Interact, 29(4), 221–233. doi: 10.1515/dmdi-2013-0069
- Yang Y, Sebra R, Pullman BS, Qiao W, Peter I, Desnick RJ, . . . Scott SA (2015). Quantitative and multiplexed DNA methylation analysis using long-read single-molecule real-time bisulfite sequencing (SMRT-BS). BMC Genomics, 16, 350. doi: 10.1186/s12864-015-1572-7
- Zhang SM, Rehling D, Jemth AS, Throup A, Landazuri N, Almlof I, . . . Helleday T (2021). NUDT15-mediated hydrolysis limits the efficacy of anti-HCMV drug ganciclovir. Cell Chem Biol, 28(12), 1693–1702 e1696. doi: 10.1016/j.chembiol.2021.06.001