Abstract
Recombinant adeno-associated viruses (rAAVs) are currently the most prominently investigated vector platform for human gene therapy. The rAAV capsid serves as a potent and efficient vehicle for delivering genetic payloads into the host cell, while the vector genome determines the function and effectiveness of these biotherapies. However, current production schemes yield vectors that may consist of heterogeneous populations, compromising their potencies. The development of next-generation sequencing methods within the past few years has helped investigators profile the diversity and relative abundances of heterogeneous species in vector preparations. Specifically, long-read sequencing methods, like single molecule real-time (SMRT) sequencing, have been used to uncover truncations, chimeric genomes, and inverted terminal repeat (ITR) mutations in vectors. Unfortunately, these sequencing platforms may be inaccessible to investigators with limited resources, may require large amounts of input material, or may require long wait times for sequencing and analyses. Recent advances with nanopore sequencing have helped to bridge the gap for quick and relatively inexpensive long-read sequencing needs. However, its limitations and sample biases are not well-defined for sequencing rAAV. In this study, we explored the capacity for nanopore sequencing to directly interrogate rAAV content to obtain full-length resolution of encapsidated genomes. We found that the nanopore platform can cover the entirety of rAAV genomes from ITR to ITR without the need for pre-fragmentation. However, the accuracy for base calling was low, resulting in a high degree of miscalled bases and false indels. These false indels led to read-length compression; thus, assessing heterogeneity based on read length is not advisable with current nanopore technologies. Nonetheless, nanopore sequencing was able to correctly identify truncation hotspots in single-strand and self-complementary vectors similar to SMRT sequencing. In summary, nanopore sequencing can serve as a rapid and low-cost alternative for proofing AAV vectors.
Keywords: adeno-associated virus/nanopore/vector heterogeneity/inverted terminal repeat
INTRODUCTION
Adeno-associated viruses (AAVs) are currently the most prominent viral vector platform for human gene therapy. However, recombinant (r)AAV preparations can contain contaminants that originate from plasmid and host cell genomic DNA used in production schemes, or adventitious viruses.1,2 In addition, non-unit length and defective genomes can arise from truncations driven by secondary structures with high thermostabilities within the vector sequence.3–5 These contaminants contribute to the overall heterogeneity and reduced functionality of preparations. The greater the vector heterogeneity, the higher the doses that are required to achieve therapeutic effect and, subsequently, the greater the risks for tissue or cellular toxicity.
Next-generation sequencing (NGS) workflows have recently been developed to examine the composition of encapsidated rAAV genomes with single-particle resolution.4,5 The ability to characterize rAAV vector genomes with NGS has established new definitions for vector quality. Long-read sequencing by Oxford Nanopore Technologies (ONT) is one method that can enable the interrogation of viral vector genomes as intact strands.6 The nanopore sequencing platform relies on a recombinant helicase that unwinds a double-stranded DNA fragment, driving the lagging strand in a 5′ to 3′ direction through an alpha-hemolysin pore. As each nucleotide is threaded through the nanopore, changes in ionic current across the membrane are detected by a sensor and recorded.7
Nanopore sequencing has been used previously to sequence rAAVs by other groups;8,9 however, these reports fragmented the genomes before adapter ligation and sequencing. Unfortunately, fragmented reads fail to fully capture the structure of the intact genome and heterogeneity of the preparation. In addition, one of these initial reports unexpectedly revealed that the predominant means of genome replication during production was exclusively via rolling-circle replication and not by rolling-hairpin replication,9 the canonical means for AAV vector replication.
In this study, we describe nanopore sequencing of single-stranded (ss) and self-complementary (sc)AAV vector genomes from inverted terminal repeat (ITR) to ITR on ONT's MinION platform to evaluate its ability to profile vector populations. We show that nanopore sequencing can be used directly without fragmentation of genomes to capture vector heterogeneity and identify truncation hotspots. Sequencing of the cis plasmids used to produce the vectors also confirmed that truncation events are detected only in vector preparations and are not an artifact of the sequencing platform. Finally, we provide support that particles are packaged with vector genomes that arise predominantly by rolling-hairpin replication. These findings demonstrate that nanopore sequencing can provide fast-proofing of rAAV preparations with the capacity to profile the heterogeneity of vector populations.
MATERIALS AND METHODS
Vector library preparation and nanopore sequencing
Three vector genomes that were previously profiled by AAV-genome population sequencing (AAV-GPseq) were evaluated with nanopore sequencing: two single-stranded vectors, ssAAV-TBG-SaCas9-U6-sgRNA and ssAAV-2xT2T, and one self-complementary vector, scAAV-Intron-R.4,5 rAAVs were produced in HEK293 cells by the triple plasmid transfection method as described previously.5,10 Vector DNA samples (<1 μg) were treated with heating and slow annealing to promote Watson-Crick base pairing of the plus and minus stranded genomes that are packaged into separate virions.11 The annealed material was subjected to library preparations for ONT sequencing. The Native Barcoding Expansion 1–12 (PCR-free) (EXP-NBD104) and Ligation Sequencing Kit (SQK-LSK109) were used for native barcoding of genomic DNA.
The cis plasmids used for vector production were prepared for nanopore sequencing by first digesting with PacI, which cuts the plasmids at sites that are immediately beyond the 5′- and 3′-ITRs. The resulting DNA fragments were isolated by gel electrophoresis and purified with the QIAquick Gel Extraction Kit (Cat No. 28706; Qiagen, Valencia, CA), following manufacturer's recommended procedures. The enzyme-digested plasmid fragments were not subjected to heating and slow annealing, since they are already double stranded and do not require an annealing step. Libraries were then multiplexed and sequenced on an ONT MinION instrument using two commercially available flow cell versions, R9.4.1 and R10.0.1, herein referred to as R9 and R10, respectively. All flow cells were subjected to the recommended quality control checks before sequencing was performed.
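The PacI excision step can be previewed in silico before running a digest. The following is a minimal sketch, not the authors' procedure: it assumes the plasmid is available as a plain circular sequence string, uses the PacI recognition sequence TTAATTAA with a top-strand cut after the fifth base, and the function names are hypothetical.

```python
PACI_SITE = "TTAATTAA"   # PacI recognition sequence
PACI_CUT_OFFSET = 5      # assumed top-strand cut position: TTAAT^TAA


def paci_cut_positions(plasmid):
    """Return sorted top-strand cut positions on a circular plasmid.

    Assumes the plasmid is longer than the recognition site.
    """
    seq = plasmid.upper()
    n = len(seq)
    # Append the first bases so that a site spanning the origin is found.
    wrapped = seq + seq[: len(PACI_SITE) - 1]
    cuts = set()
    start = wrapped.find(PACI_SITE)
    while start != -1:
        cuts.add((start + PACI_CUT_OFFSET) % n)
        start = wrapped.find(PACI_SITE, start + 1)
    return sorted(cuts)


def circular_fragment_lengths(plasmid_length, cuts):
    """Return fragment lengths produced by cutting a circle at `cuts`."""
    if not cuts:
        return []
    if len(cuts) == 1:
        return [plasmid_length]  # a single cut only linearizes the circle
    lengths = [b - a for a, b in zip(cuts, cuts[1:])]
    # The last fragment wraps around the origin of the circular sequence.
    lengths.append(plasmid_length - cuts[-1] + cuts[0])
    return lengths
```

For a plasmid with two PacI sites flanking the ITRs, one of the two returned lengths would correspond to the ITR-to-ITR fragment and the other to the plasmid backbone.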
Sequencing data analyses
MinKNOW software (v19.10.1) was used for base calling, adapter trimming, and demultiplexing. Sequencing reads were aligned to the appropriate references by BWA-MEM12 within the Galaxy web-based interface.13–16 Alignments were visualized using Integrative Genomics Viewer (IGV; version 2.6.3)17 with soft-clipped bases displayed. Start and end positions of aligned reads were defined by converting BAM format to BED format with BEDTools (version 2.27.0).18,19 The abundances of start and end positions of all mapped reads were tabulated, binned into 10-nt intervals, and plotted using GraphPad Prism (version 8.2.1 for Windows; GraphPad Software, San Diego, CA).
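The tabulation and binning step can be illustrated with a short script. This is a sketch under stated assumptions rather than the pipeline used in the study: it assumes tab-delimited BED records (0-based, half-open coordinates) such as those produced by a BAM-to-BED conversion, and the function name `bin_read_ends` is hypothetical.

```python
from collections import Counter

BIN_SIZE = 10  # nt, matching the 10-nt intervals described in the text


def bin_read_ends(bed_lines, bin_size=BIN_SIZE):
    """Count alignment start and end positions per bin from BED records.

    Returns two Counters keyed by the left edge of each bin.
    """
    starts, ends = Counter(), Counter()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        # BED is half-open, so the last aligned base is at end - 1.
        starts[(start // bin_size) * bin_size] += 1
        ends[((end - 1) // bin_size) * bin_size] += 1
    return starts, ends
```

The two counters can then be exported as columns and plotted as start and end traces across the reference.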
Figure 1 provides an illustrated summary of the workflow for sequencing rAAV vector or plasmid DNAs described in this study.
Figure 1.
Workflow for ONT nanopore sequencing and analyses of ssAAVs and scAAVs. (A) rAAVs were subjected to DNase I and Pronase treatment, followed by phenol-chloroform extraction to isolate viral DNAs. The single-stranded viral DNA strands were heated and slowly cooled to promote annealing of the strands to produce adaptable 5′ and 3′ ends. Plasmid DNAs were linearized by enzymatic digestion. (B) Heat-treated rAAV genomes or nonheat-treated plasmid DNA were processed following standard procedures for ONT nanopore sequencing, including DNA repair and end-prep, and ligation of barcodes and sequencing adapters. (C) Prepared libraries were loaded onto MinION flow cells (R9 or R10) for sequencing. Sequences were base called using ONT's MinKNOW software. Read alignments and analyses were processed by Galaxy and visualized using IGV. IGV, Integrative Genomics Viewer; ONT, Oxford Nanopore Technologies; ssAAV, single-stranded adeno-associated virus.
Data availability
The datasets generated and/or analyzed in the current study are available at the NCBI Sequence Read Archive under the BioProject ID: PRJNA849253.
RESULTS
Direct sequencing of rAAV vectors with nanopore can interrogate fully intact genomes
We aimed to determine the accuracy of the ONT platform to directly sequence rAAV genomes without the need for fragmentation.8,9 We first selected an ssAAV vector preparation that was previously shown by the AAV-GPseq workflow to predominantly harbor full-length vector genomes.5 This vector (ssAAV-TBG-SaCas9-U6-sgRNA) is an all-in-one construct that contains a humanized Staphylococcus aureus Cas9 (SaCas9) gene driven by the thyroxine-binding globulin (TBG) promoter, and a single-guide RNA (sgRNA) driven by the U6 RNA pol III promoter (Fig. 2). Strikingly, the coverage of nanopore reads from an R9 flow cell was similar to what was observed by single molecule real-time (SMRT) sequencing and AAV-GPseq analyses.5 We observed a minor truncation hotspot centered at the sgRNA cassette (Fig. 2A, red arrow). We also found that a proportion of reads were truncated at the TBG promoter (Fig. 2A, blue arrow).
Figure 2.
IGV display of packaged vector and plasmid DNA reads aligned to the reference genome. Nanopore reads of ssAAV-SaCas9-sgRNA vector genomes (A) and restriction enzyme-excised parental plasmid DNA spanning from ITR to ITR (B). The vector genome reference with indicated functional domains is shown above. Coverage summaries highlight positions with high frequency (>20%) mismatches. Aligned reads are shown in a squished display to illustrate the full diversity of reads. Red asterisks indicate regions of high read mismatches attributed to flip and flop ITR configurations. Blue and red arrows indicate truncated reads among vector DNA reads centered at the TBG promoter and on the sgRNA cassette, respectively. High-frequency mismatches shared between vector and plasmid reads are demarcated by magenta arrowheads. A few selected read groups in B that do not span the full reference are demarcated in red lines. ITR, inverted terminal repeat; sgRNA, single-guide RNA; TBG, thyroxine-binding globulin.
In addition, sequencing the rAAV genome without fragmentation allowed coverage of the ITRs, yielding the different flip and flop configurations as indicated by the high degree of alignment mismatches observed at the 5′- and 3′-ITRs (Fig. 2A). One notable observation was the frequency of mismatched bases throughout the individual reads. Several positions throughout the genome showed a greater than 20% frequency of mismatches (Fig. 2A), which typically reflect polymorphisms among the reads.
We next aimed to determine whether the sequence mismatches and truncations accurately reflected vector genome heterogeneity or whether they were due to sequencing error. We therefore sought to sequence the parental cis plasmid, which represents a template unaffected by rAAV replication and packaging error. Conveniently, the plasmid harbors PacI cut sites that flank directly beyond the 5′- and 3′-ITRs. The plasmid was PacI digested to liberate the ITR-to-ITR vector genome as a double-stranded 4.7-kb fragment. Similar to the vector genome reads, we observed complete coverage of the reference from ITR-to-ITR (Fig. 2B), demonstrating that nanopore sequencing can span the entirety of the fragment and is not inherently impeded by the ITR elements. Importantly, we also observed a high degree of mismatches throughout the length of the reads.
In fact, positions with greater than 20% frequency of mismatches were more numerous in the enzyme digest-retrieved fragment than in the vector genome. Of the nine positions demonstrating high-frequency mismatches among the vector reads, four were also detected to have high frequencies of mismatches in the plasmid reads (Fig. 2A, B). These findings demonstrate that the majority of sequence mismatches are likely attributable to the inherent errors in nanopore sequencing and are partially stochastic. We note that some reads did not span the full length of the target (Fig. 2B). These species may represent degraded DNA fragments or a lack of processivity throughout the entire strand, resulting in randomized read termination.
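The comparison of high-frequency mismatch positions between vector and plasmid reads can be sketched as follows. This is an illustration only: the source identified these positions from IGV coverage summaries, whereas the sketch assumes per-position base counts have already been extracted from the alignments, and the function name and input layout are hypothetical.

```python
MISMATCH_THRESHOLD = 0.20  # positions above 20% mismatch frequency are flagged


def high_mismatch_positions(reference, base_counts, threshold=MISMATCH_THRESHOLD):
    """Return the set of 0-based positions whose mismatch fraction exceeds threshold.

    base_counts[i] maps each observed base to its read count at position i.
    """
    flagged = set()
    for pos, counts in enumerate(base_counts):
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage, nothing to evaluate
        mismatches = depth - counts.get(reference[pos], 0)
        if mismatches / depth > threshold:
            flagged.add(pos)
    return flagged
```

Intersecting the flagged sets from the vector and plasmid samples (`vector_flagged & plasmid_flagged`) gives the positions shared by both, which are the ones most likely to reflect systematic sequencing error rather than vector heterogeneity.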
Nanopore cannot reliably interrogate vector genome lengths
We sought to validate the accuracy of nanopore sequencing to represent full or partial rAAV genomes. With the AAV-GPseq workflow, this is typically achieved by observing the lengths of reads that align to the reference. We therefore tested the capacity for nanopore reads from an R9 flow cell to determine the lengths of restriction enzyme-excised DNAs from plasmids. Interestingly, the resulting lengths from these reads were shorter than the expected 4.7-kb full-length fragment size (Fig. 3A). Approximately 72% of the reads had sizes within ±5% (±240 nt) of the full sequence length. There are two possible sources of length reduction: read truncations occurring at the ends of the reads (loss of processivity at the ITRs) or inaccuracies across homopolymers (cumulative loss of bases that lead to length compression).
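The read-length criterion can be expressed as a short calculation. This is a minimal sketch, assuming read lengths are available as a list of integers; the expected length of 4,688 bp is taken from the Figure 3 caption, and the function name is hypothetical.

```python
EXPECTED_LENGTH = 4688  # bp, PacI fragment length stated in the Figure 3 caption


def fraction_within_tolerance(read_lengths, expected=EXPECTED_LENGTH, tolerance=0.05):
    """Return the fraction of reads whose length is within +/- tolerance of expected."""
    if not read_lengths:
        return 0.0
    low = expected * (1 - tolerance)
    high = expected * (1 + tolerance)
    within = sum(1 for length in read_lengths if low <= length <= high)
    return within / len(read_lengths)
```

Applied to all reads from one flow cell, the returned value corresponds to the percentage reported above the histograms.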
Figure 3.
Analyses of nanopore read lengths and start/end positions from R9 and R10 flow cells. (A) Histogram displaying the counts of nanopore read lengths from restriction enzyme-excised DNA obtained with R9 (top) or R10 (bottom) flow cells. The expected length of the PacI fragment is 4,688 bp (demarcated by solid line). Dotted lines indicate the size range within ±5% of the full sequence length (±240 nt). The percentage of reads falling within this range is indicated above the histogram. (B) Histogram displaying counts of start (orange trace) and end (blue trace) positions of DNA fragment reads obtained from R9 (top) and R10 (bottom) flow cells. Diagram of the ssAAV-SaCas9-sgRNA genome reference is shown above. Counts were binned into 10-nt increments.
A more recent R10 flow cell was also evaluated. The R10 flow cell deploys dual ionic sensors to improve base calling accuracy across homopolymers. However, the R10 flow cell was untested for its capacity to be processive through AAV ITRs. When tested for its accuracy to profile read lengths, the R10 flow cell also revealed shorter than expected read lengths (Fig. 3A). Only 66% of reads were within ±5% of the full sequence length. Furthermore, there were nearly twofold more R9 reads than R10 reads (Table 1), despite using equal volumes and concentrations of the library on the two flow cells.
Table 1.
Comparison of read counts between flow cells tested
| Construct | DNA Template | R9 Reads | R10 Reads |
|---|---|---|---|
| ssAAV-SaCas9-sgRNA | Vector genome | 56,769 | 15,714 |
| ssAAV-SaCas9-sgRNA | Plasmid | 68,539 | 24,426 |
| ssAAV-2xT2T | Vector genome | 190,427 | 44,230 |
| ssAAV-2xT2T | Plasmid | 1,053,788 | 356,163 |
AAV, adeno-associated virus.
To determine whether sequence information was lost at the ends of the fragments, we captured read alignment start and end positions from R9 and R10 reads and tabulated their counts across the reference (Fig. 3B). The count frequencies revealed that the majority of read alignments had start and end positions terminating at the PacI sites that flank the ITRs for both flow cells. Disregarding the overall read count differences between R9 and R10 runs, the distribution of read alignment start and end positions had similar trends.
These observations showed that although the overall read lengths were shorter than unit-length, the reads spanned from ITR to ITR. Therefore, we concluded that the reads were becoming compressed as a result of sequence loss throughout the read and not exclusively at the 5′ and 3′ ends. This interpretation is further supported by the observation of gaps in reads visible in the aligned reads (Fig. 2).
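Length compression from internal deletions can be quantified directly from an alignment's CIGAR string. The sketch below is not part of the published analysis; it assumes standard SAM CIGAR operators and a hypothetical helper name, and it compares the number of read bases placed in the alignment with the span of reference they cover.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
READ_ALIGNED_OPS = set("MI=X")   # operators that consume aligned read bases
REFERENCE_OPS = set("MDN=X")     # operators that consume reference bases


def cigar_compression(cigar):
    """Summarize how an alignment's read length compares with its reference span."""
    aligned_read = ref_span = deleted = inserted = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in READ_ALIGNED_OPS:
            aligned_read += n
        if op in REFERENCE_OPS:
            ref_span += n
        if op == "D":
            deleted += n
        elif op == "I":
            inserted += n
    return {
        "aligned_read_bases": aligned_read,
        "reference_span": ref_span,
        "deleted": deleted,
        "inserted": inserted,
        # Positive values mean the read is shorter than the reference it covers.
        "net_compression": ref_span - aligned_read,
    }
```

A read that spans the reference from end to end but shows a positive `net_compression` is consistent with bases being lost throughout the read rather than at its ends.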
Nanopore sequencing can reveal truncated genomes, but has low base-calling accuracy
We next assessed the ability for the nanopore platform to profile genome heterogeneity. For this study, we aimed to profile the vectors previously reported by AAV-GPseq to have a high degree of heterogeneity and known truncation hotspots.4,5 The first vector contained two sgRNA cassettes configured in a tail-to-tail (TT) configuration and the mCherry transgene placed in tandem (ssAAV-2xT2T) (Fig. 4A).5 The TT sgRNA design formed a long palindromic structure that drove substantial genome truncations. The second vector is a self-complementary design that contains a short-interfering (si)RNA cassette targeting firefly luciferase (siFLuc) inserted in the intronic region of an eGFP transgene that is driven by the hybrid cytomegalovirus (CMV) enhancer, chicken β-actin promoter (CB6) regulatory cassette (scAAV-Intron-R)4 (Fig. 4B). The siFLuc sequence served as a short-hairpin structure that conferred template-switching during vector genome replication, resulting in truncated genomes.4
Figure 4.
Nanopore sequencing of vector genomes known to generate design-influenced truncations. (A, B) IGV displays and start/stop positions of read alignments from nanopore sequencing of ssAAV-2xT2T (A), and scAAV-Intron-R vector genomes (B). (C, D) Sequencing of DNA fragments excised from plasmids with PacI. Reads of ssAAV-2xT2T (C) and scAAV-Intron-R (D) plasmid-excised fragments are shown. Aligned reads are presented in squished display to show all reads. Each alignment is accompanied by a coverage summary track. Positions that show a greater than 20% mismatches are indicated in colored regions. Counts of start (orange trace) and end (blue trace) positions of mapped reads were tabulated and displayed for each sample (bottom graphs). Vector genome references displaying construct functional domains are shown above (A, B). Red arrowheads show positions with a high frequency of gaps.
Analyses of nanopore read alignments and their start and end positions revealed the same truncation hotspots as defined by previous SMRT sequencing runs (Fig. 4A, B).4,5 We did not observe any full-length genomes for the ssAAV-2xT2T vector by nanopore sequencing (Fig. 4A), which was identical to our SMRT sequencing analysis.5 The majority of reads were revealed to be truncated at the dual sgRNA cassettes.
The scAAV-Intron-R vector demonstrated truncation hotspots at the siFLuc cassette and at the 5′ region of the eGFP ORF (Fig. 4B). As shown before by SMRT sequencing, nearly all genomes encompassed the 3′-ITR, with truncation hotspots acting as positions for template-switching events.4 We also observed a substantial abundance of read alignments with start positions at the polyA region. These reads may represent small snapback genomes, short ITR-bearing genomes with self-complementary configurations, and lengths of <1 kb in size.20
We next aimed to determine whether the detection of these events could in any way be impacted by low processivity of strands containing strong secondary structures through the pore, limiting the technology from accurately assessing the frequency of truncation events. As before, we sequenced enzyme-digested plasmid DNA fragments spanning ITR to ITR to directly determine whether elements in vectors could compromise nanopore processivity, leading to the false-positive identification of truncations. Interestingly, the summary alignment did reveal a slightly higher coverage of the 5′ end of the reference.
However, alignment start and end positions were predominantly enriched at the 5′- and 3′-PacI sites for both the ssAAV-2xT2T and scAAV-Intron-R fragment targets, respectively (Fig. 4C, D). Therefore, read processivity across DNA elements that promote vector truncations during rAAV production (thermostable hairpins and GC-rich sequences) does not inherently influence the accuracy of identifying truncations with nanopore sequencing.
Despite our observations that sequencing is processive through thermostable hairpins and GC-rich regions, we found that for the ssAAV-2xT2T vector, read accuracy across the dual sgRNA cassette was very poor. In fact, the reads carried a high frequency of gaps in this region (Fig. 4C). In both ssAAV-2xT2T and scAAV-Intron-R vectors, read accuracies across the CB6 promoters were also poor (Fig. 4C, D). The CB6 promoter harbors a GC-rich region, which likely contributed to the low accuracy of base calling across this region.
Rolling-hairpin replication is confirmed as the dominant means for rAAV genome duplication in HEK293 cells
The ITR is a self-primed structure that initiates replication using its own contiguous strand as the template. The ITR is resolved by nicking activity at the terminal resolution site (TRS) by Rep, separating the nascent strand from the template.21–23 The previous 3′-ITR becomes the new 5′-ITR of the newly synthesized molecule (Fig. 5A). Since the ITR sequence is asymmetric, the actions of rolling-hairpin replication and resolution generate four flip and flop ITR-configured genomes: flip:flip, flop:flop, flip:flop, and flop:flip.24 If the TRS remains un-nicked, two genomes remain joined together in a contiguous sequence. These intermediate configurations exist as head-to-head (HH)- or TT-oriented molecules.
Figure 5.
Analyses of unresolved ITR junctions as readouts for replication. (A) Diagram of rolling-hairpin replication model for the rAAV genome (left) and the rolling-circle replication model following HT recombination of two genomes (right). (B) Histogram of reads detected that match HH, TT, or HT and TH configurations among the ssAAV-SaCas9-sgRNA and scAAV-Intron-R vector reads. HH, head-to-head; HT, head-to-tail; TH, tail-to-head; TT, tail-to-tail.
Curiously, the previous report that used nanopore sequencing to profile a small subset of unresolved genomes from rAAVs showed that head-to-tail (HT) or tail-to-head (TH) configurations were the only unresolved species identified.9 This surprising finding suggested that the predominant mode of replication was by rolling-circle replication of circularized monomers or recombined dimers that produce HT-oriented genomes (Fig. 5A).
To address whether nanopore can inform on the mode of rAAV replication within pTx/HEK293 production cell lines, we identified ssAAV-SaCas9-sgRNA and scAAV-Intron-R vector reads with unresolved genomes to determine whether HH, TT, or HT configurations were present among single-strand and self-complementary genomes, respectively. We found that the ssAAV genomes showed more unresolved ITRs with HH and TT configurations than HT forms (approximate ratio of 4:1) (Fig. 5B). With the scAAV genomes, we observed an even higher abundance of HH/TT configurations in unresolved ITRs than HT/TH configurations (approximate ratio of 60:1) (Fig. 5B).
However, it should be noted that due to the lack of a TRS at the mutant (m)ITR of the scAAV vector, which is required for ITR resolution, the rate of unresolved genomes at the mITR (HH configuration) may mask the true frequency of rolling-circle replication revealed by this end. Nevertheless, based on the configurations identified for the ssAAV vector and those at the wild-type (wt)ITR, rolling-hairpin replication is the dominant form of vector genome replication, while rolling-circle replication may also exist at lower frequencies. The HT- and TH-configured species are likely a product of rescued or replicated genomes that have recircularized by intra- or intermolecular recombination of the ITRs.
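One way to sort unresolved reads into HH, TT, and HT/TH classes is by the strand orientation of the two genome units on either side of the junction. The source does not describe how its reads were classified, so the following is only a sketch under explicit assumptions: each unresolved read yields two alignment segments in read order, the "head" is the 5′ end of the reference, and the function names are hypothetical.

```python
from collections import Counter


def classify_junction(first_strand, second_strand):
    """Classify the junction between two genome units listed in read order.

    A '+' unit is read head to tail; a '-' unit is read tail to head.
    """
    for strand in (first_strand, second_strand):
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand: {strand!r}")
    if first_strand == second_strand:
        return "HT/TH"  # tandem units in the same orientation
    # '+' then '-': the tail of unit 1 meets the tail of unit 2.
    # '-' then '+': the head of unit 1 meets the head of unit 2.
    return "TT" if first_strand == "+" else "HH"


def tally_junctions(strand_pairs):
    """Count junction classes over an iterable of (first, second) strand pairs."""
    return Counter(classify_junction(a, b) for a, b in strand_pairs)
```

Under these assumptions, the ratio of HH plus TT counts to HT/TH counts gives a figure comparable to the approximate ratios reported for the ssAAV and scAAV reads.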
DISCUSSION
Through long-read NGS, we have found that heterogeneity can be influenced by the transgene as well as the packaging platform used to generate vectors.3,5,25 Due to recent reports of dysfunction, sepsis, liver failure, shock, and death from high-dose rAAV immunotoxicity,26,27 there is an immediate need to develop methods to identify and reduce vector heterogeneity and contaminants. In this study, we evaluated the capacity of nanopore sequencing as an NGS alternative to profile rAAVs directly without the need for template fragmentation. We show that the representation obtained from nanopore sequencing is similar to that obtained by SMRT sequencing. It is important to note that the ability of nanopore sequencing to process single-strand DNA inputs should not be overstated, since it still favors a double-stranded template.9
We observed read-length compression with nanopore, where mapped vector genome reads were ∼2–3% shorter than the expected length. This phenomenon was found to be related to the high frequency of gaps/deletions found in the reads and not due to poor processivity. Since the genome lengths defined by the nanopore reads are less than reliable, we recommend that truncations be defined by start and end positions of aligned reads. Although this strategy will not necessarily yield information regarding the structure of the vector genome, profiling nanopore reads by start and end alignment positions can still provide information regarding the functional regions that are spanned. This information can be used to infer the percentage of functional genomes in the preparation.
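The inference of functional genomes from alignment start and end positions can be sketched as a simple interval test. This is an illustrative assumption rather than a method from the study: it treats a read as functional when its alignment fully spans a user-defined region, and the function name and any coordinates supplied to it are hypothetical.

```python
def fraction_spanning(alignments, region_start, region_end):
    """Return the fraction of alignments that fully span a reference region.

    alignments: iterable of (start, end) pairs, 0-based and half-open.
    region_start, region_end: bounds of the functional region, same convention.
    """
    alignments = list(alignments)
    if not alignments:
        return 0.0
    spanning = sum(
        1 for start, end in alignments
        if start <= region_start and end >= region_end
    )
    return spanning / len(alignments)
```

The region bounds would need to be chosen per construct, for example from the start of the promoter to the end of the polyadenylation signal in the reference annotation.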
Another limitation of the single-pass reads inherent to nanopore sequencing is their inability to accurately determine genomic variation (e.g., indels and base substitutions). The data presented outline the limitations of nanopore sequencing of rAAVs and support its use only in early-stage basic research, for example, as a means of general vector preparation validation and for screening studies of multiple vector design candidates. Subsequent validation of lead research candidates and translational development may require the more accurate NGS approaches.
Finally, we detected HH- and TT-configured unresolved rAAV genomes using nanopore sequencing. These atypical species that span two genomic units, each of which may not be full-length, are attributed to genomes produced by rolling-hairpin replication and presumably packaged into capsids. We have shown previously that genomes with unresolved ITRs can be packaged as truncated scAAV genomes and likely result from mutated ITRs or unknown mechanism(s) that influence Rep function.25 Interestingly, some of the observed unresolved genomes were greater than the 4.7-kb packaging capacity of AAV. It is currently unknown whether these oversized species accurately reflect what is packaged into rAAVs, or whether they are due to artifacts related to the nanopore library build steps. Until there is a more direct method to address these questions, it remains unknown why and how genomes larger than 5 kb are detected by nanopore sequencing.
Nevertheless, we provide evidence that rolling-hairpin replication is the major means of genome replication over rolling-circle replication in both ssAAV and scAAV genomes. This result was expected given the presence of different flip and flop ITR conformations from the rAAV reads, which cannot be generated by rolling-circle replication alone. However, the presence of HT- and TH-configured rAAV genomes suggests that rAAV genomes may have recombined to form circular species and can contribute to rAAV genome replication and packaging at lower frequencies.
We also note that unresolved ITRs at the wtITRs, which mark rolling-hairpin replication, were more frequently detected in the scAAV-Intron-R vector than the mITR. The reason is likely attributed to the generation of truncated scAAV vectors during replication, which all carry the wtITRs but are missing the mITR.3,4 These new findings hint at a mechanism of AAV genome replication yet to be explored. Further validation of these species by standard molecular biology approaches is needed.
Using AAV-GPseq, we recently reported differences in the heterogeneity of vector genomes packaged into virions made by plasmid transfection of human cells (pTx/HEK293) and by recombinant baculovirus infection of insect cells (rBV/Sf9).25 It was shown that ITR mutations in rBV/Sf9-produced vectors occurred at higher rates than in pTx/HEK293-produced vectors, thereby leading to higher degrees of unresolved ITRs and heterogeneously packaged particles. As a method that can profile genome heterogeneity, nanopore sequencing can also be used as an alternative to SMRT sequencing to observe platform-related differences in vector genomes. However, we caution that the ability to analyze ITR mutations may be limited with nanopore sequencing, since base call resolution is generally poorer at GC-rich regions, a hallmark of the ITR B and C arms.24
CONCLUSIONS
Nanopore sequencing can be used to directly profile rAAV vectors to reveal the full genome structures and heterogeneities of encapsidated genomes. However, due to its single-pass coverage of DNA strands, base calling at each position is relatively poor. It is not recommended to use nanopore sequencing to assess rAAV genome mutations, variants, and other queries that rely on high base call resolution. It is an excellent sequencing platform, whose major advantages are its relatively low cost, quick turnaround time, relative ease of use, and accessibility.
ACKNOWLEDGMENTS
We thank Athma Pai and Nida Javeed for sharing MinION flow cells when they were in short supply. We thank the UMass Chan Viral Vector Core for producing the vectors used in this study, and Dan Wang and Alex Brown for sharing vectors for this project.
AUTHORS' CONTRIBUTIONS
S.N.: Conceptualization (equal); Data curation (lead); Formal analysis (equal); Investigation (equal); Methodology (lead); Software (lead); Validation (lead); Visualization (equal); and Writing – original draft (equal). N.T.T.: Investigation (equal); Methodology (equal); Resources (equal); Software. S.M.: Investigation (equal); and Resources (equal). R.H.: Investigation (equal); Resources (equal). Q.S.: Investigation (equal); and Resources (equal). J.X.: Investigation (equal) and Resources (equal). G.G.: Conceptualization (equal); Funding acquisition (lead); Project administration (equal); Supervision (equal); and Writing – review and editing (equal). P.W.L.T.: Conceptualization (equal); Formal analysis (equal); Methodology (equal); Project administration (equal); Supervision (equal); Visualization (equal); Writing – original draft (equal); and Writing – review and editing (equal).
AUTHOR DISCLOSURE
G.G. is a scientific cofounder of Voyager Therapeutics and Aspa Therapeutics, and holds equity in these companies. G.G. and P.W.L.T. are inventors on patents with royalties licensed to biopharmaceutical companies. The remaining authors declare no competing interests.
FUNDING INFORMATION
G.G. is supported by grants from the UMass Chan Medical School (an internal grant) and by the National Institutes of Health (R01NS076991-01, P01HL131471-05, R01AI121135, UG3HL147367-01, R01HL097088, R01HL152723-02, U19AI149646-01, and UH3HL147367-04).
REFERENCES
- 1. Srivastava A, Mallela KMG, Deorkar N, et al. Manufacturing challenges and rational formulation development for AAV viral vectors. J Pharm Sci 2021;110(7):2609–2624; doi: 10.1016/j.xphs.2021.03.024
- 2. Wright JF. Product-related impurities in clinical-grade recombinant AAV vectors: Characterization and risk assessment. Biomedicines 2014;2(1):80–97; doi: 10.3390/biomedicines2010080
- 3. Xie J, Mao Q, Tai PWL, et al. Short DNA hairpins compromise recombinant adeno-associated virus genome homogeneity. Mol Ther 2017;25(6):1363–1374; doi: 10.1016/j.ymthe.2017.03.028
- 4. Tai PWL, Xie J, Fong K, et al. Adeno-associated virus genome population sequencing achieves full vector genome resolution and reveals human-vector chimeras. Mol Ther Methods Clin Dev 2018;9:130–141; doi: 10.1016/j.omtm.2018.02.002
- 5. Tran NT, Heiner C, Weber K, et al. AAV-genome population sequencing of vectors packaging CRISPR components reveals design-influenced heterogeneity. Mol Ther Methods Clin Dev 2020;18:639–651; doi: 10.1016/j.omtm.2020.07.007
- 6. Ibraheim R, Tai PWL, Mir A, et al. Self-inactivating, all-in-one AAV vectors for precision Cas9 genome editing via homology-directed repair in vivo. Nat Commun 2021;12(1):6267; doi: 10.1038/s41467-021-26518-y
- 7. Wang Y, Zhao Y, Bollas A, et al. Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol 2021;39(11):1348–1365; doi: 10.1038/s41587-021-01108-x
- 8. Keiser MS, Ranum PT, Yrigollen CM, et al. Toxicity after AAV delivery of RNAi expression constructs into nonhuman primate brain. Nat Med 2021;27:1982–1989; doi: 10.1038/s41591-021-01522-3
- 9. Radukic MT, Brandt D, Haak M, et al. Nanopore sequencing of native adeno-associated virus (AAV) single-stranded DNA using a transposase-based rapid protocol. NAR Genom Bioinform 2020;2(4):lqaa074; doi: 10.1093/nargab/lqaa074
- 10. Sena-Esteves M, Gao G. Introducing genes into mammalian cells: Viral vectors. Cold Spring Harb Protoc 2020;2020:095513; doi: 10.1101/pdb.top095513
- 11. Nakai H, Storm TA, Kay MA. Recruitment of single-stranded recombinant adeno-associated virus vector genomes and intermolecular recombination are responsible for stable transduction of liver in vivo. J Virol 2000;74(20):9451–9463; doi: 10.1128/jvi.74.20.9451-9463.2000
- 12. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997 2013.
- 13. Blankenberg D, Von Kuster G, Coraor N, et al. Galaxy: A web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol 2010;Chapter 19:Unit 19.10:11–21; doi: 10.1002/0471142727.mb1910s89
- 14. de Koning W, Miladi M, Hiltemann S, et al. NanoGalaxy: Nanopore long-read sequencing data analysis in Galaxy. Gigascience 2020;9(10):giaa105; doi: 10.1093/gigascience/giaa105
- 15. Giardine B, Riemer C, Hardison RC, et al. Galaxy: A platform for interactive large-scale genome analysis. Genome Res 2005;15(10):1451–1455; doi: 10.1101/gr.4086505
- 16. Goecks J, Nekrutenko A, Taylor J. Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 2010;11(8):R86; doi: 10.1186/gb-2010-11-8-r86
- 17. Robinson JT, Thorvaldsdóttir H, Winckler W, et al. Integrative genomics viewer. Nat Biotechnol 2011;29(1):24–26; doi: 10.1038/nbt.1754
- 18. Quinlan AR. BEDTools: The Swiss-Army Tool for genome feature analysis. Curr Protoc Bioinformatics 2014;47:11.12.11–34; doi: 10.1002/0471250953.bi1112s47
- 19. Quinlan AR, Hall IM. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 2010;26(6):841–842; doi: 10.1093/bioinformatics/btq033
- 20. Zhang J, Yu X, Guo P, et al. Satellite subgenomic particles are key regulators of adeno-associated virus life cycle. Viruses 2021;13(6):1185; doi: 10.3390/v13061185
- 21. Chiorini JA, Wiener SM, Owens RA, et al. Sequence requirements for stable binding and function of Rep68 on the adeno-associated virus type 2 inverted terminal repeats. J Virol 1994;68(11):7448–7457; doi: 10.1128/JVI.68.11.7448-7457.1994
- 22. Chiorini JA, Yang L, Safer B, et al. Determination of adeno-associated virus Rep68 and Rep78 binding sites by random sequence oligonucleotide selection. J Virol 1995;69(11):7334–7338; doi: 10.1128/jvi.69.11.7334-7338.1995
- 23. Snyder RO, Im DS, Muzyczka N. Evidence for covalent attachment of the adeno-associated virus (AAV) rep protein to the ends of the AAV genome. J Virol 1990;64(12):6204–6213; doi: 10.1128/jvi.64.12.6204-6213.1990
- 24. Wilmott P, Lisowski L, Alexander IE, et al. A user's guide to the inverted terminal repeats of adeno-associated virus. Hum Gene Ther Methods 2019;30(6):206–213; doi: 10.1089/hgtb.2019.276
- 25. Tran NT, Lecomte E, Saleun S, et al. Human and insect cell-produced rAAVs show differences in genome heterogeneity. Hum Gene Ther 2022;33(7–8):371–388; doi: 10.1089/hum.2022.050
- 26. High-dose AAV gene therapy deaths. Nat Biotechnol 2020;38(8):910; doi: 10.1038/s41587-020-0642-9
- 27. Hinderer C, Katz N, Buza EL, et al. Severe toxicity in nonhuman primates and piglets following high-dose intravenous administration of an adeno-associated virus vector expressing human SMN. Hum Gene Ther 2018;29(3):285–298; doi: 10.1089/hum.2018.015
Data Availability Statement
The datasets generated and/or analyzed in the current study are available at the NCBI Sequence Read Archive under the BioProject ID: PRJNA849253.





