Molecular Therapy. 2017 Sep 15;26(1):115–131. doi: 10.1016/j.ymthe.2017.09.015

Genome-wide Mapping of Off-Target Events in Single-Stranded Oligodeoxynucleotide-Mediated Gene Repair Experiments

Sarah Radecke 1, Klaus Schwarz 1,2, Frank Radecke 2
PMCID: PMC5763015  PMID: 28988714

Abstract

Short single-stranded oligodeoxynucleotides are versatile molecular tools used in different applications. They enable gene repair and genome editing, and they are central to the antisense technology. Because the usability of single-stranded oligodeoxynucleotides depends on their efficiencies, as well as their specificities, analyzing their genotoxic off-target activities is important. Thus, we have developed a protocol that follows the fate of a biotin-labeled single-stranded oligodeoxynucleotide in human cells based on its physical incorporation into the targeted genome. Affected chromosomal fragments are enriched and preferably sequenced by nanopore sequencing. This protocol was validated in gene repair experiments without intentionally inducing a DNA double-strand break. For a 21-nucleotide-long phosphorothioate-modified oligodeoxynucleotide, we compiled a broad array of error-free incorporations, point mutations, indels, and structural rearrangements from actively dividing HEK293-derived cells. Additionally, we demonstrated the usefulness of this approach for primary cells by treating human CD34+ hematopoietic stem and progenitor cells with a 100-nucleotide-long unmodified oligodeoxynucleotide directed against the endogenous CYBB locus. This work should pave the way for future genotoxicity analyses. Concerning genome engineering approaches based on nuclease-induced DNA double-strand breaks, this protocol could aid in detecting the unwanted effects caused by the donor fragments themselves.

Keywords: cycling cell, gene repair, genome engineering, genome-wide off-target event, genotoxicity, nanopore sequencing, single-stranded oligodeoxyribonucleotide, LM-PCR, DNA replication

Graphical Abstract



The usability of single-stranded oligodeoxynucleotides (ssODNs) depends on their efficiencies and their specificities. To enable specificity analyses, Radecke et al. have developed a protocol that they applied to a human cell line and to primary human hematopoietic stem and progenitor cells. ssODNs were found to cause diverse genome-wide genotoxic off-target events.

Introduction

Site-directed mutagenesis1 has developed into a platform for high-precision genome engineering of cells and whole organisms. Potential applications include medical therapies, where such approaches complement viral vector-based efforts.2, 3 In the simplest form, desired changes in a genomic target region4 can be introduced using a single-stranded oligodeoxyribonucleotide (ssODN) as short as 13 nucleotides.5 To enhance engineering efficiencies by exploiting defined DNA double-strand breaks (DSBs), designer endonucleases (meganucleases, zinc-finger nucleases [ZFNs], transcription-activator-like effector nucleases [TALENs],6 and, more recently, the highly versatile CRISPR/nuclease systems7) have been developed. To analyze the off-target activities of these DSB inducers, different protocols are already in place (e.g., IDLV capture,8 BLESS,9 HTGTS,10 GUIDE-seq,11 END-seq,12 and DSBCapture13). However, no procedure has yet been presented for examining the off-target activities of ssODNs, each of which might match thousands of positions in the human genome. Besides their use in antisense technology and gene repair, ssODNs are now also used in concert with designer nucleases,14, 15, 16, 17, 18 so there is an additional need to assess the off-target activities caused by the ssODNs themselves. Of note, von Kalle et al. developed LAM-PCR19 and nrLAM-PCR20 because cases of leukemia21, 22, 23 occurred in clinical trials24, 25, 26 with viral vectors. These LAM-PCRs enable extensive investigations into the genome-wide integration profiles and dynamics of these vectors. Although ssODNs can also be integrated into chromosomal target sites,27 analyses based on these powerful PCRs are impractical for them: (1) the possibly rather short ssODN sequences (<20 nucleotides) at genomic sites would be overwritten by the primers, and (2) structural rearrangements cannot be detected per se.
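The remark that a single ssODN might match thousands of genomic positions can be made concrete with a back-of-envelope estimate. The sketch below assumes an idealized random genome of ~3.1 Gbp with independent, equiprobable bases (an assumption, not a property of the real, GC-skewed and repeat-rich genome): a fixed k-mer is then expected about 2(G − k + 1)/4^k times on both strands, i.e., roughly 6,000 times for a 10-mer but far less than once for a full 21-mer. The function names are illustrative.

```python
"""Idealized estimate of chance genomic matches for short ssODN stretches.

Assumption: a random genome of independent, equiprobable bases. The real
human genome deviates from this, so the numbers are orders of magnitude only.
"""

GENOME_SIZE = 3.1e9  # approximate haploid human genome length in bp (assumed)


def expected_matches(k: int, genome_size: float = GENOME_SIZE) -> float:
    """Expected exact occurrences of one fixed k-mer, counting both strands."""
    start_positions = genome_size - k + 1   # per strand
    return 2 * start_positions / 4 ** k     # each base matches with p = 1/4


def expected_window_matches(oligo_len: int, k: int) -> float:
    """Expected matches summed over every k-long window of an oligo.

    By linearity of expectation this is windows * expected_matches(k);
    a genomic site matched by overlapping windows is counted once per window.
    """
    windows = oligo_len - k + 1
    return windows * expected_matches(k)


if __name__ == "__main__":
    for k in (10, 13, 21):
        print(f"{k:>2}-mer: ~{expected_matches(k):,.1f} chance sites")
    print(f"10-nt windows of a 21-mer: ~{expected_window_matches(21, 10):,.0f}")
```

Under these assumptions, it is short partial matches rather than the full-length sequence that can recur thousands of times across the genome.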

Utilizing our HEK293-based model cell system carrying multiple copies of a reporter transgene,5 we established an assay and analyzed off-target events from ssODN-mediated gene repair, notably without further manipulations such as cell synchronization. To make it possible to trace the fate of the repair matrix, the ssODN itself introduces a biotin moiety into affected genomic sites, enabling their enrichment on beads. A subsequent linker-mediated PCR (LM-PCR) amplifies labeled fragments from (presumably mostly) unique events.

Our protocol was used to gain a first insight into ssODN-related off-target events by combining as much data as possible from different transfection experiments. Sequencing analyses began with the classical Sanger chain-termination method. To reduce material use and experimental time and to evaluate next-generation sequencing (NGS) techniques, selected LM-PCR amplicons were reanalyzed by pyrosequencing.28 A further resequencing was prompted by the availability of the first single-molecule nanopore sequencer, the MinION device.29, 30, 31, 32, 33 Additionally, genome-wide in silico predictions using ssODN sequence information provided extensive collections of possibly relevant interaction sites for comparison with those from the cell culture experiments.

To demonstrate the broader applicability of the protocol, we finally ran first analyses with human primary CD34+ hematopoietic stem and progenitor cells. These cells were treated with a 100-nt-long ssODN devoid of backbone modifications. The endogenous CYBB locus on the human X chromosome was the target site for this ssODN.

Results

Methodology for Genome-wide ssODN Off-Target Analyses

Genome-wide ssODN off-target analyses should be highly sensitive and free of biases for, e.g., subregions or mutation types. With this in mind, we developed a workflow (Figure 1A; detailed in Figure S1) that takes advantage of our reporter cell line 293mEGFP-M12.5 The biotinylated ssODN cor21BdT (Figure 1B) was used to correct a point-mutated enhanced GFP (mEGFP) transgene sequence. As a surrogate for therapy-targeted cell populations, reporter gene-corrected cells could readily be collected by fluorescence-activated cell sorting (FACS; for repair frequencies, dead cell counts, and sorting purities, see Figure S2). The transfections comprised calcium phosphate-mediated co-precipitation (CaP) and six nucleofection runs (Ai–Av and “pool”). In each of the CaP and Ai–Aiii experiments, genomic DNA (gDNA) equivalent to 500 sorted gene-repaired cells was scrutinized. To test the protocol on unsorted cells, experiments Aiv and Av each involved 250,000 treated cells (∼0.5% of them EGFP-positive). To estimate the distribution of multiple off-target events among reporter gene-corrected cells of one nucleofection, a pool experiment used 96 × 50 EGFP-positive cells.
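The cell numbers above can be cross-checked against the total of 506,800 targeted cells reported in the section on captured gDNA fragments. This tally reads the 250,000 cells of experiments Aiv and Av as per run (an interpretation, but the one that makes the sum match) and counts 500 cells each for CaP and Ai–Aiii; the dictionary keys are just labels.

```python
# Cells scrutinized per transfection, as stated in the text.
# Assumption: Aiv and Av each involved 250,000 (unsorted) cells.
cells_per_run = {
    "CaP": 500,        # sorted gene-repaired cells
    "Ai": 500,
    "Aii": 500,
    "Aiii": 500,
    "Aiv": 250_000,    # unsorted, ~0.5% EGFP-positive
    "Av": 250_000,
    "pool": 96 * 50,   # 96 pools of 50 EGFP-positive cells
}

total_cells = sum(cells_per_run.values())
gdna_fold = cells_per_run["Aiv"] // cells_per_run["Aii"]  # input relative to a sorted run

print(f"total targeted cells: {total_cells:,}")  # total targeted cells: 506,800
print(f"Aiv/Av input vs. a 500-cell run: {gdna_fold}x")  # Aiv/Av input vs. a 500-cell run: 500x
```

The 500-fold ratio also agrees with the "500 times more gDNA" stated for experiments Aiv and Av.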

Figure 1.


Methodology for Analyzing Genome-wide ssODN-Related Off-Target Events

(A) Schematic workflow. ssODN-BdT, single-stranded oligodeoxyribonucleotide carrying a biotinylated deoxythymidine; mEGFP, point-mutated enhanced GFP target gene; DpnII, restriction endonuclease (4-bp recognition site); LM-PCR, linker-mediated PCR. (B) Characterization of the matrix for gene correction (cor). Upper part: structure of the 21-nt-long biotin-labeled cor21BdT (= cB; the unlabeled control is cor21 [= c]); BdT, biotin-deoxythymidine; PS, phosphorothioate linkage; gray box, stabilized DNA residues. Lower part: sequence string scr21, a cor21 variant used only in in silico analyses (the alignment highlights the relationship of both sequences). (C) Gel electrophoretic analysis after LM-PCR amplification of captured gDNA fragments. Each lane carries one-tenth of the purified amplicons originally derived from ∼500 cells of experimental run Aii. Upper part: 1% agarose gel stained with ethidium bromide. The sequences of the 15- and 13-mer of cor21 can be found in Figure S10A. Lower part: corresponding Southern blot probed with 5′-digoxigenin-labeled cor21 (for other runs, see Figures S3–S6). untreated, cells from passaging cultures; t, total, i.e., unsorted cells; cB, biotinylated ssODN cor21; c, unlabeled ssODN cor21; EGFP(−), EGFP-negative cells; EGFP(+), EGFP-positive cells; NTC, non-template control; Pfu, Pyrococcus furiosus; M, 100-bp size marker plus additional fragments with cor21 sequences; kbp, kilobase pair; 4.868-kbp-long fragment, internal loading and blotting control (contains the full-length cor21 sequence).

When viewed on agarose gels, all cell samples unexpectedly showed comparable LM-PCR results (Figure 1C). However, Southern blots revealed cor21-specific fragments—indicating incorporation—only for sorted cells treated with the biotinylated ssODN (Figure 1C, lanes 7 and 9; for other transfections, see Figures S3–S6). With the very efficient CaP transfection or 500 times more gDNA during bead-based capture (experiments Aiv, Av), not only sorted but also unsorted cells showed strong signal patterns.

Technical Issues of the Methodology

First, we scrutinized false-positive off-target signals. One type was occasionally observed in non-template controls, but only on ethidium bromide-stained agarose gels (Figure 1C, lanes 1 and 10). These signals originated from bacterial DNA likely contained in reaction buffers (data not shown).

Another type concerned LM-PCR artifacts, in which free cor21BdT co-isolated during gDNA preparation might serve as a primer. Titration experiments to estimate the amount of co-isolated ssODN led us to conclude that it was <5 pg per 100 ng of gDNA (Figure S7). Nevertheless, high gDNA inputs yielded false-positive signals, albeit notably weaker ones (Figure S6, lanes 2 and 7). Furthermore, adding increasing amounts of cor21BdT also resulted in artifacts (Figure S8, lane 2). Sequencing such fragments revealed two patterns. In the first, the complete ssODN sequence was located terminally (data not shown), a pattern absent from the experimental data. In the second, regions similar to cor21 were found near the linker sequence, but these regions were never mutated. Here, the ssODN might have trapped fragments by binding to their temporarily single-stranded termini. Note that 18 entries (∼3%) with such a sequence pattern are part of our experimental off-target data (Table S1), even though they might actually be false-positive events. Taken together, LM-PCR artifacts had at most a marginal influence on our analyses.
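To put the titration result into perspective, the upper bound of 5 pg of co-isolated ssODN per 100 ng of gDNA can be converted into molecule numbers. The sketch assumes ~330 g/mol per nucleotide (ignoring the biotin and phosphorothioate modifications) and ~3.3 pg per haploid human genome; both are textbook approximations, not values measured in this study, and the helper names are illustrative.

```python
AVOGADRO = 6.022e23        # molecules per mol
NT_MASS_G_PER_MOL = 330.0  # average ssDNA nucleotide mass (approximation)
HAPLOID_GENOME_PG = 3.3    # approximate mass of one haploid human genome


def ssodn_molecules(mass_pg: float, length_nt: int) -> float:
    """Approximate molecule count for a given ssODN mass (modifications ignored)."""
    moles = mass_pg * 1e-12 / (length_nt * NT_MASS_G_PER_MOL)
    return moles * AVOGADRO


def haploid_genome_copies(gdna_ng: float) -> float:
    """Approximate number of haploid genome copies in a gDNA mass."""
    return gdna_ng * 1e3 / HAPLOID_GENOME_PG   # ng -> pg, then divide


oligos = ssodn_molecules(mass_pg=5, length_nt=21)   # upper bound from titration
genomes = haploid_genome_copies(gdna_ng=100)
print(f"~{oligos:.1e} ssODN molecules vs ~{genomes:,.0f} haploid genomes")
print(f"~{oligos / genomes:,.0f} ssODN molecules per haploid genome")
```

Under these assumptions, even <5 pg corresponds to roughly 10^4 ssODN molecules per haploid genome copy, which illustrates why residual free ssODN could still act as an LM-PCR primer.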

Next, we observed an underrepresentation of the corrected EGFP gene copies: the expected strong 641-base pair (bp) EGFP-specific LM-PCR product signal was missing (Figure 1C). A bead-capture experiment with specific primers showed that the EGFP fragments were in fact captured (Figure S9) but could hardly be detected by LM-PCR, likely due to inefficient linker ligation. The unexpected dropout of these abundant templates, however, serendipitously helped in amplifying the (mostly) unique off-target regions.

Last, the vast background of gDNA fragments had to be reduced. For Sanger sequencing, fragments were selected by bacterial colony hybridization (Figure S10) if they carried cor21 sequences of roughly ≥13 nt, dubbed “contiguous cor21 portions.” Background data from pyrosequencing and nanopore sequencing were filtered out bioinformatically. In conclusion, the protocol enabled the collection of ssODN-marked gDNA fragments.
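The notion of a “contiguous cor21 portion” can be illustrated as the longest exact stretch shared between a fragment and the ssODN sequence, checked on both strands. The sketch below uses a classic longest-common-substring dynamic program; PROBE is a hypothetical 21-nt placeholder (the actual cor21 sequence is given in Figure 1B), and the default 13-nt cutoff mirrors the approximate colony-hybridization limit, whereas 10 nt corresponds to the later bioinformatic searches.

```python
from typing import Optional, Tuple

# Hypothetical 21-nt placeholder; NOT the real cor21 sequence (see Figure 1B).
PROBE = "ACGTACGGATCCTTAGCAGTC"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def longest_shared_stretch(read: str, probe: str) -> Tuple[int, int, int]:
    """(length, start in read, start in probe) of the longest exact common substring."""
    best = (0, 0, 0)
    prev = [0] * (len(probe) + 1)
    for i, base in enumerate(read, start=1):
        cur = [0] * (len(probe) + 1)
        for j, pbase in enumerate(probe, start=1):
            if base == pbase:
                cur[j] = prev[j - 1] + 1          # extend the diagonal run
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def contiguous_portion(read: str, probe: str = PROBE, min_len: int = 13) -> Optional[str]:
    """Longest probe stretch found on either read strand, if at least min_len nt."""
    read = read.upper()
    hits = []
    for strand in (read, reverse_complement(read)):
        length, start, _ = longest_shared_stretch(strand, probe)
        hits.append((length, strand[start:start + length]))
    length, stretch = max(hits)                   # prefer the longer strand hit
    return stretch if length >= min_len else None
```

Because the search is exact, a single mismatch splits a stretch in two; the mismatch-aware “footprint” classification described below is a separate step.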

Reference Datasets

To analyze the involvement of genes and repeat elements in off-target events, we compiled lists of UCSC knownCanonical genes and RepeatMasker elements (for characterizations, see Figure S11 and the Materials and Methods).

To examine the in silico predictability of ssODN-relevant off-target regions, we generated two datasets of chromosomal sites with Bowtie 2:34 the positive prediction dataset cor21_B and, as a sequence-dependent negative control, dataset scr21_B (based on the scr21 sequence string; Figure 1B, bottom). The dataset cor21_R, comprising randomly called chromosome coordinates, served as a sequence-independent negative control (for characterizations, see Figures S12–S16 and the Materials and Methods).
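A sequence-independent control such as cor21_R can be thought of as intervals drawn uniformly at random over the genome. The following is one minimal way to draw such coordinates, weighting each chromosome by its number of valid start positions; the function name, the 21-nt interval length, and the toy chromosome lengths are illustrative assumptions, and the procedure actually used is described in the Materials and Methods.

```python
import random
from typing import Dict, List, Tuple

Site = Tuple[str, int, int]  # (chromosome, 0-based start, end exclusive)


def random_sites(chrom_lengths: Dict[str, int], n: int,
                 site_len: int = 21, seed: int = 0) -> List[Site]:
    """Draw n intervals uniformly over all valid start positions of a genome."""
    rng = random.Random(seed)  # seeded for reproducibility
    names = list(chrom_lengths)
    # Weight chromosomes by their number of valid start positions.
    weights = [chrom_lengths[c] - site_len + 1 for c in names]
    sites: List[Site] = []
    for chrom in rng.choices(names, weights=weights, k=n):
        start = rng.randrange(chrom_lengths[chrom] - site_len + 1)
        sites.append((chrom, start, start + site_len))
    return sites


if __name__ == "__main__":
    toy_genome = {"chrA": 1_000, "chrB": 3_000}   # hypothetical lengths
    for site in random_sites(toy_genome, n=5):
        print(site)
```

Weighting by length makes every genomic position equally likely, which is what makes such a dataset a sequence-independent baseline.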

Characterization of Captured gDNA Fragments

From a total of 506,800 targeted cells, we collected 608 independent gDNA fragments using three different sequencing technologies. Two biases in fragment properties indicated a non-random distribution. Compared with genomic DpnII fragments randomly chosen in silico as a reference (see the Materials and Methods), the median length was ∼3 times greater (supported by nanopore sequencing with its unrestricted read lengths; Figure S17), and the GC content of ∼51% was higher than the genomic average of ∼42% (data not shown). These 608 fragments were scored as off-target relevant because they carried the biotinylated thymidine in a 10- to 21-nt-long “contiguous cor21 portion.” The portion lengths showed a wider distribution for mutated (median: 16 nt) than for non-mutated sites (median: 11 nt; Figure S18). The portions also grouped according to mutation status: 10- to 13-mers were mostly non-mutated, whereas portions ≥17 nt were always associated with mutations (Figure 2A). To delineate important ssODN nucleotides, sequence logos were generated for the “contiguous cor21 portions.” The block of relevant residues was broader for mutated than for non-mutated fragments (Figure S18). Clearly, non-stabilized residues escaped nuclease activities and were incorporated at chromosomal sites, sometimes also introducing mutations.
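The two fragment properties compared here, length relative to an in silico DpnII digest and GC content, are straightforward to compute. The sketch below digests a sequence at DpnII sites (^GATC) and reports the median fragment length and pooled GC content; it illustrates the comparison only, is not the authors' reference-sampling procedure, and does not model the single-stranded overhangs. Function names are illustrative.

```python
import re
import statistics
from typing import List, Tuple

DPNII_SITE = "GATC"  # DpnII cleaves immediately 5' of GATC (^GATC)


def dpnii_fragments(seq: str) -> List[str]:
    """Top-strand pieces of an in silico DpnII digest of a linear sequence.

    Every fragment after the first begins with GATC, because the cut lies
    just before the site; the 4-nt 5' overhangs are not modeled separately.
    """
    seq = seq.upper()
    cuts = [m.start() for m in re.finditer(DPNII_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


def summarize(fragments: List[str]) -> Tuple[float, float]:
    """Median fragment length and pooled GC content of a fragment set."""
    median_len = statistics.median(len(f) for f in fragments)
    return median_len, gc_content("".join(fragments))
```

Applying the same summary to a reference fragment set and to the captured fragments gives directly comparable medians and GC fractions.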

Figure 2.


ssODN-Related Off-Target Events in 293mEGFP-M12 Cells

(A) Length distributions of “contiguous cor21 portions.” Event numbers are given above each column. N, dataset size. (B) Distribution of mutation types, based on 193 mutated cor21 “footprints” (total: 627). i, insertion; p, point mutation; d, deletion; r, structural rearrangement. (C) (Left panel) Exemplary off-target sites. Alignment against the cor21 sequence (for the complete dataset and descriptions, see Table S1). Capital letter, base of “contiguous cor21 portion”; underlined bases, cor21 “footprint”; dash, gap/break in structural rearrangement; filled red triangle, deletion. Red question marks denote mismatched 3′ ends. (Right panel) Scheme of a structural rearrangement. Bases in italics are from GRCh37/hg19. Black question mark denotes unknown fates of reciprocal chromosome pieces. (D) Chromosome-specific distributions. (Top panel) Relative percentages are related to data of chromosome 1 (set to 100%). Dotted lines are intended to ease contextual viewing. (Middle panel) Absolute counts of experimental off-target events. (Bottom panel) Exemplary partial karyotype. For the entire karyotype with all 627 cor21 “footprints,” see Figure S19. G-banded chromosome schemes are depicted with off-target events (red dots) and prediction data cor21_B, scr21_B, and cor21_R (red, blue, and gray histograms, respectively). Mbp, megabase pair. (E) cor21 “footprints” length distribution. (F) cor21 “footprints” sequence logos. (G) Mutational status versus lengths of predicted mismatch-free cor21-specific regions. Some predicted sites were not observed, others more than once; for example, of five different predicted 16-mers, only one was found, but three times.

Off-Target Events Collected from ssODN-Mediated Gene Repair

We defined cor21 “footprints” as the presumed major chromosomal interaction sites of the ssODN during off-target events. To this end, we started from the “contiguous cor21 portions” and classified chromosomal subregions based on matches and mismatches. This resulted in 627 cor21 “footprints,” or “off-target sites,” which constituted the basis for our off-target analyses (Table S1). Overall, 193 (31%) of the sites were mutated. Grouping the mutations into seven major subcategories revealed that only three subcategories accounted for >75% of all mutations (Figure 2B). The main cor21BdT-specific mutation types were insertions and point mutations (Table S1). Examples of sequences found at chromosomal off-target sites are presented in Figure 2C (left panel). Remarkably, structural rearrangements constituted ∼20% of the mutated off-target sites, sometimes with additional mutations. Note that the fates of the reciprocal chromosome parts remained unknown (Figure 2C, right panel). For technical reasons, we cannot completely rule out that some of these rearrangement events originated as artifacts.

The chromosome-specific distribution of the cor21 “footprints” did not correlate with relative chromosome lengths (Figure 2D, top panel; Figure S19A). Especially visible for chromosomes 17 through X, off-target events were found more frequently in chromosomes with relatively high overall GC contents, consistent with the increased GC content of the captured fragments. The experimental data tended to accumulate where more sites were predicted (Figure 2D, bottom panel; for the entire karyotype, see Figure S19C). Interestingly, the predicted sites for cor21 and scr21 showed strikingly similar chromosomal distributions, likely due to their related primary sequences (Figures 1B, S12, and S13).

To determine the distribution of multiple off-target events among cells from one nucleofection, pooled subsets of EGFP-positive cells were analyzed. More than one pool showed fragments (Figure S20; verified by Sanger sequencing); therefore, off-target events must have occurred in more than one cell of this nucleofection sample. In addition, only one or a few bands were observed per pool, suggesting that gene-repaired cells mostly carried one additional unwanted event.

Next, we sought more insight into the interaction of the ssODN with chromosomal regions. First, we analyzed the length distributions of the cor21 “footprints” (Figure 2E). The median length was 13 nt for non-mutated and 15 nt for mutated sites, which also had a larger size range. Second, the sequence logo for non-mutated cor21 “footprints” showed the phosphorothioate (PS)-stabilized bases with near-maximal information content (Figure 2F). Mutated sites appeared more complex: the AA dinucleotide separated two major cor21BdT subregions. The longer, central region also comprised non-stabilized residues, corroborating the findings for the “contiguous cor21 portions.” Note that in both cor21 “footprint”-based sequence logos, the stabilized 3′-terminal C residue was missing in ∼50% of cases. Taken together, the cor21 “footprints” from mutated sites tended to be longer and more complex, possibly due to differential intracellular processing of the ssODN molecules. Interestingly, sites predicted in silico to be at least 14 nt long were never found mutated when captured (Figure 2G).

Across the different experimental runs, 73 off-target regions (∼13%) were found more than once, two of them four times (Figure S21); thus, the 627 independently scored regions comprised 544 different regions. Most (>95%) of these multiply called regions were not mutated. We categorized them not as contaminants but as independent and thus valid events, because all respective negative controls yielded, if anything, only bacterial DNA-specific signals.

With respect to genes, off-target events occurred ∼1.35 times more often than expected from background (Figure 3A), slightly more often than predicted by the sequence-based datasets. Introns were involved 12 times more often than exons. Two off-target events implied mutated proteins (Table S1).
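The gene-involvement ratio follows the calculation described in the Figure 3A legend: the fraction of off-target sites lying in genes, divided by the fraction of the genome covered by genes (this background is set to 1). A minimal sketch with plain interval lists follows; the function names and the overlap rule (any shared base counts as a hit) are assumptions, and an analysis over the 30,502 knownCanonical genes would use an interval tree or a dedicated tool rather than this linear scan.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, 0-based start, end exclusive)


def merged_bp(intervals: List[Interval]) -> int:
    """Total base pairs covered, counting overlapping intervals only once."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start > cur_end:            # disjoint: close the current block
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:                          # overlapping or adjacent: extend
                cur_end = max(cur_end, end)
        total += cur_end - cur_start
    return total


def sites_in_genes(sites: List[Interval], genes: List[Interval]) -> int:
    """Count sites overlapping at least one gene (naive O(n*m) scan)."""
    return sum(
        any(c == gc and s < ge and gs < e for gc, gs, ge in genes)
        for c, s, e in sites
    )


def gene_enrichment(sites: List[Interval], genes: List[Interval],
                    genome_bp: int) -> float:
    """Fraction of sites in genes relative to the genic fraction of the genome."""
    background = merged_bp(genes) / genome_bp
    observed = sites_in_genes(sites, genes) / len(sites)
    return observed / background
```

A value above 1 means that sites fall into genes more often than their base-pair coverage alone would predict.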

Figure 3.


ssODN Off-Target Events: Involvement of Genome Features and Predictability

(A) Involvement of genes. The background frequency of a hit inside a gene was calculated as the total base pair coverage of the 30,502 UCSC knownCanonical genes and set to 1. For prediction datasets, frequencies were calculated as the number of hits in genes per all predicted regions and then related to the base pair coverage. (B) Involvement of repeat elements. The eight repeat classes presented were encountered in the experimental data. For a bias-free reference, the base pair coverage of each repeat class was calculated as its total number of base pairs divided by the total base-pair length of all 24 human chromosomes. For prediction datasets and the experimental data, the numbers of sites associated with each repeat class were divided by the respective dataset size. Dashed lines are intended to aid contextual viewing. (C) Chromosome-specific distribution. Predicting ssODN-specific off-target regions with prediction dataset cor21_B. The numbers of sites found for cor21_B and the experimental data were set in relation to the values of chromosome 1. Dashed lines are intended to aid contextual viewing. Gray bars depict the chromosome lengths, also in relation to chromosome 1. N, dataset size. (D) Prediction rates for ssODN-specific off-target events. Bars represent the prediction rates for sites without and with mutations. The random prediction dataset cor21_R and dataset scr21_B served as negative controls.

Repeat elements were involved as expected from their bp coverages, except for SINEs and LINEs (Figure 3B). As predicted, LINEs were underrepresented in the off-target data relative to their bp coverage. SINEs, in contrast, were found at higher rates than their bp coverage would suggest, but more rarely than predicted in silico. This relative reduction was due to proportionally more 10- to 12-nt-long cor21 “footprints” in the experimental data than in the in silico prediction datasets. Looking only at mutated sites, however, the number of SINE regions hit would even be slightly higher than predicted. Notable again is the congruence between the two motif-driven prediction datasets cor21_B and scr21_B.

In terms of the relative frequencies of sites per chromosome, the prediction dataset cor21_B and our experimental data showed correlating values (Figure 3C). Of note, cor21_B and scr21_B had superimposed graphs (Figure S13). In terms of prediction performance, however, the positive prediction dataset cor21_B outperformed both controls (Figure 3D): all non-mutated sites and 74% of the mutated regions were predicted. The overall prediction rate reached 92%.
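The overall rate of 92% follows from the per-class rates: weighting the 434 non-mutated sites (627 minus 193 mutated) at 100% and the mutated sites at 74% gives the arithmetic below, shown as a quick consistency check of the reported numbers.

```python
# Site counts and per-class prediction rates as reported in the text.
total_sites = 627
mutated = 193
non_mutated = total_sites - mutated          # 434

rate_non_mutated = 1.00                      # "all non-mutated sites"
rate_mutated = 0.74                          # "74% of the mutated regions"

predicted = non_mutated * rate_non_mutated + mutated * rate_mutated
overall_rate = predicted / total_sites
print(f"~{predicted:.0f} of {total_sites} sites predicted ({overall_rate:.0%})")
# ~577 of 627 sites predicted (92%)
```

Because non-mutated sites make up about two-thirds of the dataset, their complete prediction dominates the overall rate.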

Analyses Based on High-Throughput Sequencing Techniques

For high-throughput sequencing, previously Sanger-sequenced LM-PCR products of experiments Aii and Aiii (pyrosequencing) and Aiv and Av (nanopore sequencing) were selected. The required reamplifications were limited to ≤10 cycles to largely maintain the original patterns (Figure S22).

As expected, the median read length was ∼3 times higher for nanopore sequencing (Figure S23). Pyrosequencing yielded up to 15,300 reads per sample (Figure S24), while a MinION Flow Cell (version R7.3) produced up to 300,000 reads (Figure 4A). The accuracy of nanopore reads was still limited with the R7.3 nanopore. We used only the 2D reads “pass,” which exploit both template and complement sequence information. The 2D reads “pass” constituted about 50% of the initial “squiggle” data (Figures 4A and S24). Their identities were ∼86.5% based on Bowtie 2 alignments (Figure S25). To work with sequence data of comparable quality from both platforms, we analyzed the influence of different coverages on nanopore sequence identities. With median coverages ≥5, nanopore-based consensus sequences also reached identities of ∼98%. Therefore, we set a threshold of 5 for successfully mapped loci. With the CLC Genomics Workbench aligner, ≥95% of pyrosequencing reads and about 60%–80% of the nanopore 2D reads “pass” were mapped (Figure S26A; for absolute numbers of mapped loci, see Figure 4B).
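Why a coverage threshold raises consensus identity can be illustrated with a simple majority-vote consensus: independent errors in individual reads are outvoted once enough reads cover a position. The sketch assumes reads that are already aligned to a common coordinate frame and uses a plain column-wise majority; it is not the CLC Genomics Workbench or Metrichor algorithm, and the helper names are hypothetical.

```python
from collections import Counter
from typing import List

MIN_COVERAGE = 5  # coverage threshold used in the text for mapped loci


def consensus(aligned_reads: List[str], min_coverage: int = MIN_COVERAGE) -> str:
    """Column-wise majority consensus of reads already aligned to equal length.

    '-' marks positions a read does not cover. Columns supported by fewer
    than min_coverage reads are reported as 'N'; ties go to the base seen
    first in that column.
    """
    if len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("reads must be pre-aligned to a common length")
    calls = []
    for column in zip(*aligned_reads):
        bases = [b for b in column if b != "-"]
        if len(bases) < min_coverage:
            calls.append("N")
        else:
            calls.append(Counter(bases).most_common(1)[0][0])
    return "".join(calls)


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

With five reads that each carry a different single error, every column still resolves to the correct base, whereas any single read alone would score below 100% identity.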

Figure 4.


Pyrosequencing and Nanopore Sequencing

(A) Proportion of nanopore 2D reads “pass.” Gray bars: numbers of all reads (“squiggle data”) from a flow cell run collected by the MinKNOW software as separate FAST5 files. Dark gray bars: 2D reads placed into “pass” folders by the Metrichor-provided cloud-based basecalling service. Percentages were calculated as the number of 2D reads “pass” divided by the number of all reads. Aiv, Av, nucleofection runs 4 and 5; c, unlabeled ssODN; cB, biotinylated cor21; t, total, i.e., unsorted cells. (B) Number of all loci (GRCh37/hg19) to which reads were mapped. The thresholds define the numbers of reads required per locus to be scored. Aii, Aiii, nucleofection runs 2 and 3; G+, enhanced green fluorescent protein (EGFP)-positive, i.e., sorted gene-corrected cells. (C and D) Statistics of reads harboring cor21 full-length or partial sequences. (C) Mapped loci exhibiting capture-based enrichment of cor21 sequence parts. The enrichment factors were calculated against the negative control data Aiii_c_G+ and Av_c_t. Asterisks mark samples whose enrichment factors were arbitrarily set to 100 to avoid division by zero. (D) Number of ssODN-positive reads. The percentages were calculated based on mapping with the CLC Genomics Workbench aligner. (E) Retrieval of off-target-affected gDNA fragments using high-throughput sequencing technologies. Data referenced to those of the colony hybridizations are shown as area-proportional Venn diagrams. CH, colony hybridization.

In captured fragments, the colony hybridization protocol detected cor21 sequences only if they were roughly ≥13 nt long. However, the high-throughput data bioinformatically revealed >10-fold enrichment of biotin-positive cor21 sequences with lengths ≥10 nt (Figure 4C). We therefore searched the mapped loci for cor21 sequences ≥10 nt to collect off-target-relevant genome fragments. For both platforms, only 2%–5% of the mapped reads (Figure 4D) yielded cor21-specific off-target regions. Of the off-target sites characterized previously by Sanger sequencing, 36%–69% were observed again (Figure 4E). While some known regions were not retrieved, 459 new sites (of the 627 sites), mostly with short cor21-specific sequences (10–12 nt), extended the Sanger sequencing-based list (Table S1).
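The enrichment factors in Figure 4C compare a captured sample with a negative control, with a fixed value of 100 substituted where the control has no hits. One plausible form of that calculation is sketched below; normalizing the cor21-positive counts by the number of mapped reads is an assumption of this sketch, as the legend does not specify the normalization, and the function name is hypothetical.

```python
CAP = 100.0  # value the Figure 4C legend uses when the control count is zero


def enrichment_factor(sample_hits: int, sample_reads: int,
                      control_hits: int, control_reads: int) -> float:
    """Per-read frequency of cor21-positive reads, sample relative to control.

    Normalizing by read counts is an assumption of this sketch. A zero control
    count would divide by zero, so CAP is returned instead, mirroring the
    arbitrary value of 100 described in the legend.
    """
    if control_hits == 0:
        return CAP
    sample_freq = sample_hits / sample_reads
    control_freq = control_hits / control_reads
    return sample_freq / control_freq
```

Note that a capped value is a placeholder rather than a measurement, which is why the legend flags such samples with asterisks.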

For the retrieved off-target-relevant fragments, median coverages for nanopore reads were rarely above 5 and were also low for pyrosequencing data (Figure S27). Clearly, streptavidin-coated beads co-isolated a strong background of gDNA fragments, in particular satellite-class DNA (e.g., HSATII, ALR/Alpha; Figure S28) and most, but not all, regions of the mitochondrial genome (Figure S29). These high numbers of background molecules reduced the retrieval efficiency for off-target sites in both pyrosequencing and nanopore sequencing. The overall low coverages were responsible for two further observations: (1) the median lengths of pyrosequencing-based mapped off-target loci were relatively short (Figure S26B), and (2) a 5-fold median coverage was often not attained for nanopore-derived sequences. Taken together, new high-throughput sequencing techniques, especially nanopore sequencing, enable fast off-target analyses that detect relevant ssODN sequence stretches as short as 10 nt. The unrestricted length of nanopore sequencing reads also allows efficient mapping of repeat element regions.

Applying the Protocol to Primary Human Cells Treated with an ssODN against an Endogenous Locus

Because the protocol was developed using an artificial transgenic cell line, it was important to test whether the approach would also be more widely useful. To this end, we chose human primary CD34+ hematopoietic stem and progenitor cells, which are also of considerable clinical interest. As the ssODN, the backbone-modified cor21BdT targeting a transgene locus was replaced with a natural-backbone 100-nt-long ssODN (dubbed “cybb100BdT”; Figure S30). This ssODN was taken from work by De Ravin et al.35 using the CRISPR/Cas9/single-guide RNA system, in which the endogenous CYBB locus was successfully corrected in a model system for treating the immunodeficiency disorder X-linked chronic granulomatous disease (X-CGD).

Preliminary tests with cybb100BdT (biotinylated thymine at position 17) were run with transgene-free HEK293 cells, the parental line of our 293mEGFP-M12 cells. These experiments provided two important insights. First, they showed that, owing to the greater length of cybb100, reads up to ∼500 nt long could fail to align with the CLC Genomics Workbench aligner. The alignment of longer reads might be problematic even when full-length cybb100 sequences are part of the reads, because consensus sequences might be generated that render the cybb100 sequences unrecognizable. These problems were overcome by generating individually tailored reference sequences based on information extracted from the affected read groups (see the Materials and Methods). Second, the HEK293-based experiments led to adaptations of our protocol, particularly further reducing the potential for artifactual off-target events (Figure S1). Following gDNA preparation, the DNA molecules are dephosphorylated at their 5′ ends to inhibit unwanted ligation between gDNA fragments and/or co-purified free ssODN strands. Our attempt to shear DNA physically (using Covaris Adaptive Focused Acoustics [AFA] technology) had to be abandoned because it generated pronounced artifactual off-target events. This was likely caused by the ensuing DNA end repair step joining unrelated DNA fragments that had hybridized with their recessed ends. Thus, we switched from the restriction endonuclease DpnII, which generates 5′ overhangs, to the blunt-cutting HaeIII, which provided two improvements. First, no DNA end repair is necessary. Second, our correspondingly adapted LM-PCR linker is compatible with a whole series of blunt-cutting nucleases, which could help reduce genome region biases. Furthermore, reducing the LM-PCR cycle number from 50 to 30 reduced the occurrence of single-stranded PCR products, which might hybridize to experimentally unrelated DNA strands and again lead to false-positive signals. Finally, the LM-PCR primer now carries a 5′ phosphate, again abolishing the need for DNA end repair during library preparation for nanopore sequencing.
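The switch from DpnII to HaeIII turns on the type of ends each enzyme leaves. For a palindromic recognition site, the bottom-strand cut mirrors the top-strand cut, so the single-stranded overhang length equals the site length minus twice the top-strand cut offset. The sketch encodes this rule with the well-known cut positions ^GATC (DpnII) and GG^CC (HaeIII); the Enzyme class and its field names are illustrative.

```python
from typing import NamedTuple


class Enzyme(NamedTuple):
    name: str
    site: str      # palindromic recognition sequence, 5'->3'
    top_cut: int   # cut position on the top strand, counted from the site's 5' end


def end_type(enzyme: Enzyme) -> str:
    """Classify the ends left by a palindromic type II restriction enzyme.

    Because the bottom-strand cut mirrors the top-strand cut, the overhang
    length is len(site) - 2 * top_cut: positive means a 5' overhang,
    negative a 3' overhang, and zero blunt ends.
    """
    overhang = len(enzyme.site) - 2 * enzyme.top_cut
    if overhang > 0:
        return f"{overhang}-nt 5' overhang"
    if overhang < 0:
        return f"{-overhang}-nt 3' overhang"
    return "blunt"


DPNII = Enzyme("DpnII", "GATC", 0)    # ^GATC
HAEIII = Enzyme("HaeIII", "GGCC", 2)  # GG^CC

for enzyme in (DPNII, HAEIII):
    print(enzyme.name, end_type(enzyme))
# DpnII 4-nt 5' overhang
# HaeIII blunt
```

Blunt ends need no fill-in before linker ligation, which corresponds to the first improvement named above.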

Based on the adapted protocol (Figure S1), we observed off-target events in the HEK293 cells. The most frequent types were inter- and intrachromosomal structural rearrangements bridged by long (>40 nt) parts of cybb100BdT (Table S2). The control with the unbiotinylated cybb100 did not show any off-target events. However, in the control in which biotinylated cybb100BdT was added to the gDNA preparation of cells treated with unbiotinylated cybb100, we observed five events. Likely, bead-bound cybb100BdT molecules trapped, via their homologies, gDNA fragments that had been involved in off-target events caused by the nucleofected biotin-free cybb100. With cybb100BdT, about 20 times more consensus sequences with coverage ≥5 were found than in the controls (Figure S31). One possible reason is metabolization of the ssODN in the HEK293 cells, which could generate new biotinylated dT precursors used during S phase. We also observed short non-mutated cybb100 sequences (13- to 14-mers).

Next, we applied the adapted protocol to human primary CD34+ hematopoietic stem and progenitor cells. Notably, the negative control, in which cybb100BdT was added to gDNA of cells treated with the unrelated 96-nt-long ssODN DSB-insertion,36 now also remained negative. This corroborated the “trapping” hypothesis. Of a total of 1,484,320 reads, ∼47% of which carried the cybb100BdT-related barcode (Figure S32), we retained 695,547 reads after (1) removing reads with mixed barcodes and (2) trimming off the LM-PCR linker. After mapping, these reads yielded 1,595 consensus sequences with coverage ≥5. We finally identified 22 off-target events (Figures 5, S30, and S32; Table S3). In contrast to the preliminary HEK293 experiment, the majority of gDNA fragments carried cybb100BdT sequences (>80 nt) at one of their termini. Here, the cybb100BdT had evidently already been ligated with its 5′ region to a chromosomal region, while the ssODN’s 3′ terminus was still free. Besides one interchromosomal structural rearrangement, as in the HEK293 experiment, four long (>80 nt) cybb100BdT portions were detected inserted into a narrowly defined chromosomal site (Table S3). This mutation type had not been found in the preliminary HEK293 experiment.
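The two read-processing steps named above, discarding reads with mixed barcodes and trimming off the LM-PCR linker, can be sketched as follows. The barcode and linker sequences are hypothetical placeholders, exact substring matching is a simplification (real pipelines tolerate sequencing errors), and placing barcode and linker upstream of the genomic insert is an assumption made only for illustration.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical placeholders; the real barcode and linker sequences differ.
BARCODES: Dict[str, str] = {"cybb100BdT": "ACAGTC", "control": "TGCATG"}
LINKER = "GTAGCTAGCA"


def assign_barcode(read: str, barcodes: Dict[str, str] = BARCODES) -> Optional[str]:
    """Name of the single barcode present in the read; None if none or several."""
    found = [name for name, seq in barcodes.items() if seq in read]
    return found[0] if len(found) == 1 else None


def trim_linker(read: str, linker: str = LINKER) -> str:
    """Drop everything up to and including the first linker occurrence."""
    idx = read.find(linker)
    return read[idx + len(linker):] if idx >= 0 else read


def filter_reads(reads: Iterable[str], wanted: str) -> List[str]:
    """Keep reads carrying only the wanted barcode, with the linker trimmed off."""
    kept = []
    for read in reads:
        if assign_barcode(read) != wanted:
            continue  # no barcode, mixed barcodes, or another sample's barcode
        kept.append(trim_linker(read))
    return kept
```

Only the reads that pass both steps would then go on to mapping and consensus building.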

Figure 5.


ssODN-Related Off-Target Events in Human Primary CD34+ Hematopoietic Stem and Progenitor Cells

(A) Statistics of consensus sequences from mapped loci. The post-capture enrichment of consensus sequences harboring cybb100 full-length or partial sequences is shown. The enrichment factors were calculated against the negative control data “DSB-insertion + cybb100BdT: 10-mer–12-mer.” Note that no ≥13-mers of cybb100 were observed in samples “DSB-insertion + cybb100BdT” and “cybb100.” The asterisk marks the sample whose enrichment factor was arbitrarily set to 100 to avoid division by zero. (B) Distribution of mutation types found in treated human CD34+ cells. (C) Graphical representations of off-target events. The two long black lines symbolize the DNA top and bottom strands. The upper scheme depicts an event with cybb100BdT inserted into a narrowly defined chromosomal site. The lower scheme details an event in which the entire ssODN was terminally appended. The gray circles mark the top-strand region, which is shown below at the nucleotide level. Note that only one A in the insertion was expected as the result of the dA-tailing reaction. cybb100, the non-biotinylated ssODN molecule itself or the DNA sequence string.
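The enrichment factors in panel (A) are ratios against a negative-control count, with a fixed placeholder where that count is zero. The sketch below shows one way to compute such a factor while keeping track of which values are placeholders. It assumes a plain ratio of consensus-sequence counts (the exact normalization behind the figure is not spelled out here); the function name and example counts are hypothetical.

```python
from typing import Tuple


def enrichment_factor(sample_count: int, control_count: int,
                      placeholder: float = 100.0) -> Tuple[float, bool]:
    """Return (factor, is_placeholder) for sample vs. negative-control counts.

    A zero control count makes the ratio undefined; in that case an arbitrary
    placeholder is returned and flagged, so it can be marked (e.g., with an
    asterisk) when reported instead of being read as a measured value.
    """
    if sample_count < 0 or control_count < 0:
        raise ValueError("counts must be non-negative")
    if control_count == 0:
        return placeholder, True
    return sample_count / control_count, False


if __name__ == "__main__":
    # Hypothetical counts of consensus sequences carrying cybb100 stretches.
    print(enrichment_factor(40, 2))  # (20.0, False)
    print(enrichment_factor(7, 0))   # (100.0, True)
```

Returning the flag alongside the value keeps placeholder factors from silently entering downstream comparisons.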

Discussion

The protocol successfully detected ssODN-specific off-target events across the entire genome of the human-derived 293mEGFP-M12 cells. Note that the labeled ssODN was likely incorporated into the lagging strand during DNA replication.37, 38, 39, 40, 41, 42 Thus, we have shown that in actively dividing cells, almost any chromosomal locus might be involved in off-target events caused by ssODNs. Higher concentrations of ssODNs inside the target cells likely increase the risk of these unwanted events.

Non-mutated sites were more likely associated with shorter cor21 “footprints,” whereas longer footprints, often composed of several subregions, correlated with a broad spectrum of mutations. However, there were also extreme cases in which minimal matches and mismatching 3′ ends yielded chromosomal rearrangements (e.g., Figure 2C, left panel, last entry). Such events are impossible to predict. Nevertheless, our in silico dataset cor21_B was helpful in compiling vulnerable sites and the genomic features likely involved. Of note, the negative control scr21_B, based on a partially altered sequence string, had broadly similar distribution properties but considerably lower prediction rates. Therefore, Watson-Crick base pairing was the main mode of interaction with the chromosomal sites. This was corroborated by the repeated capture of predicted 100% identical sites longer than 14 nt, all of which were non-mutated. As deduced from sequence logos, chromosomal sites were selected mainly by hybridization to the PS-stabilized ssODN regions. The resulting SINE enrichment did not obviously depend on genome location. However, besides the often missing 3′-terminal cytosine, additional ssODN residues at multiple positions might also be important for the outcome. This fuzziness is in line with the complexity of our off-target data. It will be interesting to learn how the actual design of the ssODN might influence the off-target spectrum. For example, the predominance of insertions and point mutations might be due to the PS-stabilized ssODN backbone. Another issue to address in the future is the evaluation of different cell types, notably those that are not actively dividing.
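Sequence logos such as those mentioned above summarize, for each ssODN position, how often each base occurs among the captured chromosomal sites and how strongly that position is conserved. A minimal sketch, assuming the sites have already been aligned to ssODN coordinates and have equal length; the example sequences are hypothetical, and this is the standard per-position frequency and information-content calculation rather than the software used for the paper's logos (which may also apply a small-sample correction).

```python
import math
from collections import Counter
from typing import Dict, List

ALPHABET = "ACGT"


def position_frequency_matrix(aligned: List[str]) -> List[Dict[str, float]]:
    """Per-position base frequencies of equal-length, pre-aligned sequences.

    Characters outside ACGT (gaps, N) are ignored when computing frequencies.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    matrix = []
    for pos in range(width):
        column = Counter(s[pos].upper() for s in aligned)
        total = sum(column[b] for b in ALPHABET)
        matrix.append({b: column[b] / total if total else 0.0 for b in ALPHABET})
    return matrix


def information_content(freqs: Dict[str, float]) -> float:
    """Bits at one position: log2(alphabet size) minus the Shannon entropy.

    No small-sample correction is applied.
    """
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(freqs)) - entropy


if __name__ == "__main__":
    sites = ["ACGT", "ACGA", "ACTC", "ACAG"]  # hypothetical aligned sites
    for pos, column in enumerate(position_frequency_matrix(sites)):
        print(pos, column, round(information_content(column), 2))
```

In a logo, each letter's height is its frequency multiplied by the position's information content, so fully conserved positions reach 2 bits and uniformly mixed positions drop to zero.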

Our experimental off-target dataset, though still limited, correlates well with the cor21-based prediction data. Thus, the question arises: are all ∼2.7 × 10⁶ predicted sites at risk in similar experimental settings? Based on the observed data, the answer is yes. However, our protocol cannot in principle reveal all off-target events, and the extent of this underestimation is unknown for multiple reasons: human genome regions might be unavailable for analysis because of problems with fragmentation, linker ligation, or PCR (e.g., total failure or mutation of the “contiguous cor21 portion”); unique gDNA fragments might be lost during the procedure; sites <10 bp cannot be verified against the background; the fates of the reciprocal chromosome parts of structural rearrangements are practically undetectable; the protocol might not detect events in which the ssODN (or part of it) has not yet been fully integrated into the host DNA strand; with cor21BdT, the special design of the 3′ terminus makes it impossible to examine the off-target activity of the complementary 5′ part; and, depending on sequencing read distributions and mapping algorithms (e.g., the coverage required), affected sites might be missed. Of note, an off-target event might sometimes be scored unnecessarily: with one strand of the DNA double helix still wild-type (as seen in high-throughput re-sequencing), the affected strand might be corrected again.
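Why sites shorter than ∼10 bp cannot be verified against the background becomes clearer with a rough estimate of how often a fixed k-mer occurs by chance. This sketch assumes a random genome with equal, independent base frequencies and an approximate genome size of 3.1 × 10⁹ bp, and counts both strands; real genomes are repetitive (for example, SINEs), so actual counts can differ substantially.

```python
GENOME_SIZE_BP = 3.1e9  # approximate haploid human genome size (assumption)


def expected_random_hits(k: int, genome_size: float = GENOME_SIZE_BP,
                         both_strands: bool = True) -> float:
    """Expected chance occurrences of one fixed k-mer in a random genome.

    Assumes equal, independent base frequencies:
    E = strands * (genome_size - k + 1) / 4**k.
    """
    if k < 1:
        raise ValueError("k must be positive")
    positions = max(genome_size - k + 1, 0)
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** k


if __name__ == "__main__":
    for k in range(10, 16):
        print(f"{k:2d} nt: ~{expected_random_hits(k):,.1f} chance matches")
```

Under these assumptions a fixed 10-mer is expected roughly 6,000 times by chance, a 13-mer roughly 90 times, and a 15-mer only a handful of times, which fits with using ≥10-nt and ≥13-nt ssODN stretches as screening thresholds.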

The experimental runs Aiv and Av with populations of unsorted 293mEGFP-M12 cells mimic samples in which successfully treated cells cannot be selected. In our model, ∼2 per 10,000 treated cells carried mutated genome regions. Whatever the numbers for other settings and cell types might be, nothing can yet be said about the effects of these mutations. Therefore, future studies must primarily examine biological end-points (e.g., changes in growth rates, in vitro immortalization assays,43, 44 and colony formation assays for hematopoietic cells), but also exemplary mutations generated in model systems via, e.g., CRISPR/Cas9. Nevertheless, with this first survey of possible gDNA alterations, our work has already revealed some biologically relevant features of off-target events: with an ssODN directed toward an artificial EGFP open reading frame, off-target events were detected across the entire human genome; a disproportionately large number of SINE regions were hit; genes were involved slightly more often than expected, and two of these mutations also changed the respective protein sequences; and mutated non-coding RNAs were found, calling for appropriate analyses of their potential effect(s) on gene regulatory networks. Our successful predictions demonstrate that the choice of an ssODN, such as cor21BdT, can be guided by its in silico observed propensity to hit certain genomic features. This should help in selecting matrices that are less likely to affect unwanted regions important for the cell type of interest. Even at this stage of protocol development, the analyses are valuable, because the findings can increasingly be related to data from ongoing research on the structure and function of the human genome.

The central unresolved problem of the protocol is, despite efficient washing, a high background (∼95% irrelevant gDNA fragments). This results in lower coverage and thus less efficient nanopore sequencing of the regions of interest. Improvements based on the R9 nanopore (https://nanoporetech.com/about-us/news/update-new-r9-nanopore-faster-more-accurate-sequencing-and-new-ten-minute-preparation) help, because higher read quality reduces the number of reads needed per locus. Genome region bias could be further reduced by physical shearing of the gDNAs. However, the DNA end repair required thereafter is prone to artifact formation, and structural rearrangements might be missed if the DNA break occurs inside the ssODN sequence. In this work, we made our restriction endonuclease-based protocol more versatile by switching to blunt-end cutting nucleases. Hotspots might be detected by analyzing a larger number of experimental runs.
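The cost of the ∼95% background can be expressed numerically with a simple model in which on-target reads are spread evenly over the captured loci. This is a deliberately idealized sketch (real read distributions are uneven); the number of loci is a hypothetical input, and the target coverage of 5 mirrors the coverage ≥5 threshold used for consensus sequences.

```python
def reads_needed(target_coverage: float, n_loci: int, background_fraction: float) -> float:
    """Total reads so that each locus reaches the target coverage on average.

    Idealized model: only (1 - background_fraction) of all reads are on target,
    and those reads are divided evenly among n_loci captured loci.
    """
    if not 0 <= background_fraction < 1:
        raise ValueError("background_fraction must be in [0, 1)")
    if target_coverage < 0 or n_loci < 0:
        raise ValueError("target_coverage and n_loci must be non-negative")
    return target_coverage * n_loci / (1 - background_fraction)


if __name__ == "__main__":
    loci = 1000  # hypothetical number of captured loci
    for bg in (0.0, 0.5, 0.95):
        print(f"background {bg:.0%}: ~{reads_needed(5, loci, bg):,.0f} reads")
```

With a background fraction of 0.95, the number of reads needed for the same average coverage rises 20-fold compared with a background-free library, which is why higher per-read accuracy (fewer reads per locus) partly offsets the problem.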

First work with primary human cells led to the intriguing finding of terminally located, mostly full-length cybb100 sequences, especially among the cybb100BdT-treated CD34+ cells. In most cases, we observed nucleotides of unknown origin between the 3′ end of the cybb100 sequence and the LM-PCR linker sequence. Because longer DNA strands can fold into multiple different secondary and tertiary structures, we hypothesize that an ssODN already ligated at its 5′ terminus during an ongoing off-target event is elongated at its free 3′ end while hybridized to adjacent sites. This DNA structure would then be harvested in this state during gDNA preparation. Because the Blunt/TA Ligase Master Mix also ligates difficult DNA termini very efficiently, it is conceivable that our LM-PCR linker is attached to these cybb100 ends. Thus, we apparently captured off-target events not yet completely resolved by the cell.

Using the backbone-unmodified long cybb100BdT, we observed in these first assays a different distribution of mutation types compared with cor21BdT. From these limited data, however, we cannot yet conclude whether cybb100’s greater length and/or its natural DNA backbone underlies this finding. It is also likely that cell type-specific DNA repair mechanisms influence the outcomes.

Concerning the important issue of artifact formation, the protocol adaptations applied in the experiments with cybb100BdT helped to reduce false-positive events.

Certainly, to better understand off-target activities in genome engineering, combinations of experiments and analysis platforms are needed. Our ssODN protocol represents a further building block, and it should also be easily adaptable to double-stranded donor fragments. Because it relies on fast, length-unrestricted nanopore sequencing, it is affordable even for smaller laboratories.

Material and Methods

Cell Culture, Transfection, and Cell Sorting

The 293mEGFP-M12 cell line (derivative of the hypotriploid human embryonic kidney HEK293 cell line with ∼65 mEGFP transgene copies5) and HEK293 cells were tested mycoplasma-negative by PCR.45 Their identities were confirmed by short tandem repeat (STR) analysis and comparison with the “Deutsche Sammlung von Mikroorganismen und Zellkulturen” (DSMZ, Braunschweig, Germany) database (https://www.dsmz.de/services/services-human-and-animal-cell-lines/online-str-analysis.html). Cells were cultivated at 37°C and 7.5% CO2 in DMEM (31885, GIBCO/Life Technologies/Thermo Fisher Scientific, Schwerte, Germany) supplemented with 2 mM L-glutamine (25030, GIBCO/Life Technologies) and 10% fetal bovine serum (A15-101, PAA, Cölbe, Germany, or S1810, Biowest, Nuaillé, France). In preparation for the study, cells were initially expanded and cryopreserved. Then, for all experiments, cell aliquots of this passage number were thawed 5 days before the transfections. ssODNs cor21 and cor21BdT for EGFP gene correction experiments were purchased from Thermo Fisher Scientific (CaP, Ai, Aii, Aiii) and from Eurofins MWG (Aiv, Av) (for sequence, see Figure 1B). Because the presence of a 5′-terminal phosphate group did not influence repair rates,27 the group was omitted. ssODNs cor21 and cor21BdT were tested for the absence of contaminating EGFP expression plasmids by calcium phosphate co-precipitation (CaP)-based transfections of HEK293 cells. ssODNs cybb100 and cybb100BdT were purchased from Eurogentec with a 5′-terminal phosphate group (see Figure S30 for the primary sequence and the position of the biotin label). The control ssODN DSB-insertion (sequence: 5′-GGTGTCAAAAAGGATCCGAAATTACCCTGTTAACCCTATTAATTATGCATTTAAAAGCCGAGGTGAGATCCATCGCCACCATGGTGAGCAAGGGCG-3′) was ordered from Thermo Fisher Scientific. The DSB-insertion, cybb100, and cybb100BdT ssODNs were PAGE-purified. For CaP-based gene correction experiments5, a total of 45 μg of ssODN was added to 4.5 × 10⁶ cells plated in a 75-cm² flask 21–22 hr before transfection.
Nucleofections with cor21 ssODNs involved 1.5 × 10⁶ cells and 20 μg of ssODN per cuvette and were carried out using amaxa kit V (VCA-1003, Lonza, Cologne, Germany) and program A23 in the Nucleofector I. Nucleofections with cybb100 ssODNs involved 1.5 × 10⁶ cells and 500 pmol of ssODN per cuvette and were carried out using amaxa kit SF (V4XC-2024, Lonza) and program CM130 in the 4D-Nucleofector X. Transfected cells from 2–7 cuvettes were pooled and cultivated for 2 days. mEGFP gene repair rates were measured by FACS as the percentage of EGFP+ cells among single cells. Cells treated with cor21 ssODNs were sorted into EGFP+ and EGFP− cell fractions with a BD FACSAria cell sorter (BD Biosciences, Heidelberg, Germany; 100-μm nozzle; pressure: 20 psi) without live-dead staining. Cells treated with cybb100 ssODNs were sorted according to FSC-A/SSC-A gating.
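Doses are given in μg for the cor21 nucleofections but in pmol for the cybb100 experiments. A rough conversion between the two can be sketched from the ssODN sequence. This sketch assumes the anhydrous residue masses and terminal correction used by common oligonucleotide calculators for unmodified oligos with a 5′-OH; it ignores 5′-phosphates, phosphorothioate linkages, and the biotin-dT label, so it applies only approximately to the modified ssODNs. The DSB-insertion sequence is taken from the text above.

```python
# Anhydrous residue masses (g/mol) and terminal correction as used by common
# oligonucleotide calculators (e.g., OligoCalc); assumed values, not from the paper.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.2}
NO_5P_PHOSPHATE_CORRECTION = -61.96  # oligo with a 5'-OH terminus

# Control ssODN DSB-insertion, copied from the Methods (5'->3').
DSB_INSERTION = (
    "GGTGTCAAAAAGGATCCGAAATTACCCTGTTAACCCTATTAATTATGCATTTAAAAGCCG"
    "AGGTGAGATCCATCGCCACCATGGTGAGCAAGGGCG"
)


def ssdna_molecular_weight(seq: str) -> float:
    """Approximate MW (g/mol) of an unmodified ssDNA oligo without 5'-phosphate."""
    seq = seq.upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported characters: {sorted(unknown)}")
    return sum(RESIDUE_MASS[base] for base in seq) + NO_5P_PHOSPHATE_CORRECTION


def pmol_to_ug(amount_pmol: float, mw_g_per_mol: float) -> float:
    """Convert an amount in pmol into a mass in micrograms."""
    return amount_pmol * 1e-12 * mw_g_per_mol * 1e6


if __name__ == "__main__":
    mw = ssdna_molecular_weight(DSB_INSERTION)
    print(f"{len(DSB_INSERTION)} nt, MW ~{mw:,.0f} g/mol")
    print(f"500 pmol ~ {pmol_to_ug(500, mw):.1f} ug")
```

For the 96-nt DSB-insertion, this gives a molecular weight of roughly 29,700 g/mol, so 500 pmol correspond to roughly 15 μg.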

The human primary CD34+ hematopoietic stem and progenitor cells had been prepared after mobilization into the peripheral blood (purity of the cell preparation based on the marker CD34+: 99.7%). The cryopreserved cells were thawed and seeded into ultra-low attachment 6-well plates (3471, Corning, NY, USA) at a density of 1 × 10⁶ cells/mL in culture medium for CD34+ cells 29 hr before transfection. Culture medium for the CD34+ cells was composed of SCGM (20802, CellGenix, Freiburg, Germany) supplemented with 100 ng/mL hSCF (255-SC-010/CF, R&D Systems, MN, USA), 25 ng/mL hIL-3 (203-IL-010/CF, R&D Systems), and 20 ng/mL hIL-6 (206-IL-010/CF, R&D Systems). To remove dead cells, density gradient centrifugation with Lymphoprep (07801, Stemcell Technologies, Köln, Germany) was performed immediately before transfection: 2 mL of Lymphoprep and 4 mL of cell suspension combined in a 15-mL tube were centrifuged for 20 min at 200 × g. Cells were washed once with 10 mL PBS (GIBCO, 14190, Thermo Fisher Scientific) supplemented with 0.5% BSA (130-091-376, Miltenyi Biotec, Bergisch Gladbach, Germany) and centrifuged for 5 min at 800 × g. Nucleofections involved 1.5 × 10⁶ cells and 500 pmol of ssODN per cuvette and were carried out using amaxa kit P3 (V4XP-3012, Lonza) and program EO-100 in the 4D-Nucleofector X. Transfected cells from two cuvettes were pooled and cultivated for 2 days. Single cells were sorted according to FSC-A/SSC-A gating with a BD FACSAria cell sorter (BD Biosciences; 100-μm nozzle; pressure: 20 psi).
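Centrifugation steps are specified as relative centrifugal force (200 × g, 800 × g), whereas many benchtop centrifuges are set in rpm. The standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in cm, is sketched below; the 15-cm radius is a hypothetical placeholder and must be replaced by the radius of the rotor actually used.

```python
import math

# Empirical constant in RCF = 1.118e-5 * r(cm) * rpm^2.
RCF_CONSTANT = 1.118e-5


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given relative centrifugal force (x g)."""
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("rcf and radius_cm must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


def rcf_for_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) reached at a given rotor speed."""
    if radius_cm <= 0:
        raise ValueError("radius_cm must be positive")
    return RCF_CONSTANT * radius_cm * rpm ** 2


if __name__ == "__main__":
    radius = 15.0  # hypothetical swing-bucket radius in cm; check your rotor
    for rcf in (200, 800):
        print(f"{rcf} x g ~ {rpm_for_rcf(rcf, radius):.0f} rpm at r = {radius} cm")
```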

gDNA Preparation

The Wizard Genomic DNA Purification Kit (A1120, Promega, Mannheim, Germany) was used for gDNA isolation from samples with ≥2 × 10⁵ cells. For smaller cell samples, the protocol was adapted: 2.5 μg of polyadenylic acid potassium salt (81142, Sigma-Aldrich, Munich, Germany) were added, solution volumes were quartered, RNase digestion was omitted, and centrifugation after isopropanol addition lasted 60 min. Because of low gDNA amounts in EGFP+-sorted samples, quality and concentration were estimated on ethidium bromide-stained agarose gels by comparison with standardized DNA preparations. gDNA concentrations from samples transfected with cybb100 ssODNs were measured with the Qubit dsDNA BR Assay Kit (Q32850, Thermo Fisher Scientific).

Bead Experiment and LM-PCR

To avoid PCR product carry-over, the workflow started in a dedicated pre-PCR environment. Before bead incubation, gDNAs from experiments with cor21 ssODNs were digested with DpnII (R0543S, New England Biolabs, Frankfurt, Germany), generating compatible ends for linker ligation. The linker consisted of the annealed oligonucleotides 5′-phosphate-GATCGGCTATTCGGCTATGACTAG-3′ and 5′-CTAGTCATAGCCGAATAGCC-3′. Digestion conditions for 4.5 ng of gDNA (equivalent to ∼500 hypotriploid cells): 1× DpnII buffer, 5 U of DpnII, 100 μg/mL bovine serum albumin (BSA; B9001S, New England Biolabs), 15 μL digestion volume in 0.2-mL PCR tubes (710980, Biozym Scientific GmbH, Hessisch Oldendorf, Germany), incubation at 37°C for 1 hr. Digestion conditions for 2.25 μg of gDNA (equivalent to ∼2.5 × 10⁵ cells): 1× DpnII buffer, 15 U of DpnII, 100 μg/mL BSA, 50 μL digestion volume in 0.2-mL PCR tubes, overnight incubation at 37°C. For each sample, bead experiments used 12.5 μg of magnetic Dynabeads MyOne Streptavidin C1 (65001, Invitrogen/Thermo Fisher Scientific) in 1× binding and washing buffer (BWB; 10 mM Tris-HCl [pH 7.5], 1 mM EDTA, 1 M NaCl) supplemented with 0.1% BSA (A9647, Sigma Aldrich). Beads were washed twice with 500 μL of 2× BWB and incubated for 1 hr at room temperature in 2× BWB; the bead solution was then added to the digested gDNAs, yielding 1× BWB conditions. Samples were incubated at room temperature for 1 hr under agitation and washed five times with 100 μL of 1× BWB. During the last washing step, beads were transferred into new tubes to avoid PCR amplification of gDNAs adhering nonspecifically to tube walls. One washing step with 100 μL of 0.1% BSA solution reduced the NaCl concentration. Then, the ligation mix (without ligase) was added to the beads.
The Fast-Link DNA Ligation Kit (LK6201H, Epicentre/Biozym Scientific GmbH) was used according to the standard protocol for sticky ends with reduced volumes (5 or 10 μL for samples with low or high gDNA amounts, respectively) and 0.05 pmol/μL linker. Ligase was added after the reactions had been cooled down (35°C for 10 min, then −0.5°C per 45 s until 4°C was reached; GeneAmp PCR System 9700 cycler [Applied Biosystems/Thermo Fisher Scientific]). Samples were kept on ice during ligase addition. Cycler conditions thereafter: +0.5°C per 45 s from 4°C to 10°C, 10 min at 10°C, and then 16°C for 3 hr or overnight. After ligation, beads were washed once with 100 μL of 0.1% BSA solution. To each sample, 50 μL of PCR mix were added containing 2.5 U of Pfu polymerase (600159, Agilent Technologies, Waldbronn, Germany) and 1.2 μM primer (5′-CTAGTCATAGCCGAATAGCC-3′). The LM-PCR program started with an initial denaturation at 96°C for 3 min, followed by 50 (or 30) cycles of 94°C for 20 s, 58°C for 30 s, and 72°C for 2 min 30 s; the final elongation was carried out at 72°C for 5 min. For quality control, 10% of each reaction was run on a 1.5% agarose gel (stained with ethidium bromide, 1× TBE buffer). EGFP-specific PCR used the Taq PCR Kit (201205, QIAGEN, Hilden, Germany; 50 μL, 2.5 U of Taq, 0.5 μM of primers 5′-GACGTAAACGGCCACAAGTT-3′ and 5′-GAAGTCGTGCTGCTTCATGTGG-3′). The EGFP-specific PCR program started with an initial denaturation at 96°C for 3 min, followed by 35 cycles of 94°C for 30 s, 54°C for 45 s, and 72°C for 30 s; the final elongation was carried out at 72°C for 5 min.
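The stated equivalences between gDNA mass and cell number imply a fixed gDNA content per cell: 4.5 ng for ∼500 hypotriploid 293mEGFP-M12 cells gives ∼9 pg per cell, and 1.5 μg for ∼2.5 × 10⁵ CD34+ cells (used in the cybb100 experiments) gives ∼6 pg per cell, close to the expected content of a diploid human cell. The sketch back-calculates these values from the text rather than from independent measurements, and converts between mass and cell equivalents.

```python
# Per-cell gDNA masses (pg) back-calculated from equivalences stated in the
# Methods: 4.5 ng ~ 500 hypotriploid cells; 1.5 ug ~ 2.5e5 CD34+ cells.
PG_PER_CELL = {
    "HEK293-derived": 4.5e3 / 500,   # 9 pg per cell
    "CD34+": 1.5e6 / 2.5e5,          # 6 pg per cell
}


def cell_equivalents(gdna_ng: float, pg_per_cell: float) -> float:
    """Number of genomes (cell equivalents) in a gDNA mass given in ng."""
    if pg_per_cell <= 0:
        raise ValueError("pg_per_cell must be positive")
    return gdna_ng * 1e3 / pg_per_cell


def gdna_ng_for_cells(n_cells: float, pg_per_cell: float) -> float:
    """gDNA mass (ng) corresponding to a number of cell equivalents."""
    return n_cells * pg_per_cell / 1e3


if __name__ == "__main__":
    print(cell_equivalents(2250, PG_PER_CELL["HEK293-derived"]))  # 250000.0
    print(cell_equivalents(1500, PG_PER_CELL["CD34+"]))           # 250000.0
```

Both digestion inputs thus correspond to the same number of genomes (∼2.5 × 10⁵), which keeps bead-capture experiments comparable across cell types with different ploidy.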

For gDNAs from experiments with cybb100 ssODNs, the following adaptations were made. The linker for dA-tailed gDNA fragments consisted of the two annealed oligonucleotides 5′-phosphate-GGCTATTCGGCTATGACTAG-3′ and 5′-CTAGTCATAGCCGAATAGCCT-3′. gDNAs were dephosphorylated before HaeIII digestion: gDNAs equivalent to ∼2.5 × 10⁵ cells (2.25 μg [HEK293] or 1.5 μg [CD34+]) were treated with 4 U of Shrimp Alkaline Phosphatase (M0371S, New England Biolabs) in 1× CutSmart Buffer (B7204S, New England Biolabs). Reaction samples were incubated at 37°C for 30 min and then at 65°C for 10 min to inactivate the phosphatase. Afterward, 60 U of the blunt-cutting restriction enzyme HaeIII (27-0866-02, Pharmacia) were added, and the samples were incubated overnight at 37°C. After washing of the beads (see above), dA-tailing reactions (NEBNext dA-tailing, E6053S, New England Biolabs) were performed in a 10-μL reaction volume at 37°C for 30 min. Beads were then washed 3× with 1× BWB and 1× with 0.1% BSA solution. Ligations were done with the Blunt/TA Ligase Master Mix (M0367L, New England Biolabs) in a 5-μL reaction volume with 0.5 pmol linker at 16°C for 20 min. Afterward, beads were washed 2× with 1× BWB. Then, two washes with 0.1 M NaOH/0.05 M NaCl were performed to retain only fully ligated fragments (see Radecke et al.46). Before LM-PCR, beads were washed 1× with 1× BWB and 1× with 0.1% BSA solution. For the LM-PCR, a 5′-phosphorylated version of the primer was used, and the cycle number was reduced to 30 (instead of 50).
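Both linkers are designed to carry the overhang matching the ends they are ligated to: a 5′-GATC overhang for DpnII-cut fragments and a single 3′-T for dA-tailed fragments. The sketch below checks this directly from the two oligonucleotide sequences of each linker, assuming the strands pair perfectly and are flush at the opposite end. The function is a hypothetical helper, not part of any kit or tool mentioned here.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (ACGT only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def single_overhang(strand_a: str, strand_b: str) -> Tuple[str, str, str]:
    """Describe the unpaired end of a duplex of two oligos (both given 5'->3').

    Assumes perfect pairing with both strands flush at one end, so any length
    difference forms a single overhang at the other end.
    Returns (strand carrying the overhang, overhang type, overhang sequence).
    """
    a = strand_a.upper()
    rc_b = revcomp(strand_b)
    if a == rc_b:
        return ("none", "blunt", "")
    if a.endswith(rc_b):      # extra bases at A's 5' end
        return ("A", "5'", a[: len(a) - len(rc_b)])
    if a.startswith(rc_b):    # extra bases at A's 3' end
        return ("A", "3'", a[len(rc_b):])
    if rc_b.endswith(a):      # extra bases at B's 3' end
        return ("B", "3'", revcomp(rc_b[: len(rc_b) - len(a)]))
    if rc_b.startswith(a):    # extra bases at B's 5' end
        return ("B", "5'", revcomp(rc_b[len(a):]))
    raise ValueError("strands do not form a simple single-overhang duplex")


if __name__ == "__main__":
    # DpnII linker (cor21 experiments) and TA linker (cybb100 experiments).
    print(single_overhang("GATCGGCTATTCGGCTATGACTAG", "CTAGTCATAGCCGAATAGCC"))
    print(single_overhang("GGCTATTCGGCTATGACTAG", "CTAGTCATAGCCGAATAGCCT"))
```

The first linker yields a 5′-GATC overhang on the phosphorylated strand, matching DpnII ends; the second yields a single 3′-T on the unphosphorylated strand, matching the single A added by dA-tailing.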

50-Cell Pools Experiment, “One Tube Protocol”

Two days after ssODN addition, 96 × 50 EGFP-positive cells or untreated EGFP-negative cells were sorted into pre-chilled (4°C) lysis buffer (2.5 M NaCl, 0.1 M EDTA, 10 mM Tris [pH 10], 5% DMSO, and 1% Triton X-100) and incubated at 4°C overnight. A DpnII digestion mix (100 μL final volume, 3 U of DpnII) was added to the lysed cells and incubated at 37°C for 4 hr. Bead assays and LM-PCRs were done as above with the following modifications: bead incubation at 4°C overnight; after linker ligation, additional single washing steps with 0.1 M NaOH, then with wash buffer, and finally with 0.1% BSA; LM-PCR was carried out with the PfuUltra II Fusion HS PCR system (600670, Agilent Technologies; 1 μL of polymerase, 50-μL reaction, and 1.2 μM primer). The program started with an initial denaturation at 96°C for 5 min, followed by 10 cycles of 94°C for 30 s, 58°C for 60 s, and 72°C for 30 s, and then by 35 cycles of 94°C for 30 s, 58°C for 30 s, and 72°C for 30 s; the final elongation was carried out at 72°C for 3 min.

Southern Blot

A cor21-specific probe was used. The amplification products were purified with the QIAquick PCR Purification Kit (28104, QIAGEN). Samples equivalent to 10% of the original LM-PCRs were run on a 1.5% agarose gel (1× TBE running buffer, no ethidium bromide). DNAs were blotted overnight onto positively charged nylon membranes (11209299001, Roche/Sigma Aldrich) with an alkaline transfer buffer (0.4 M NaOH/0.6 M NaCl); after transfer, DNA was UV crosslinked. PCR products containing cor21 sequence information were detected using the digoxigenin (DIG) system (Roche; DIG Applications Manual for Filter Hybridization). Hybridization was performed with DIG Easy Hyb (11603558001, Roche/Sigma Aldrich) and 5 pmol/mL 5′-DIG-cor21 (for the cor21 sequence, see Figure 1B) at 30°C for 3 hr. After four washing steps (high-stringency buffer washing at 43°C), the probe was detected with Anti-Digoxigenin-AP Fab fragments (11093274910; dilution 1:10,000–1:20,000), the DIG Wash and Block Buffer Set (11585762001), and CDP-Star ready-to-use (11685627001; all from Roche/Sigma Aldrich) according to the supplier’s instructions, except that the 1× blocking solution was supplemented with 5% milk powder (T145.1, Carl Roth, Karlsruhe, Germany).

To detect EGFP gene copies, a 656-bp-long DIG-labeled fragment was generated from the EGFP open reading frame (primer sequences: 5′-GACGTAAACGGCCACAAGTT-3′ and 5′-GACTTGTACAGCTCGTCCAT-3′; DIG PCR Probe Synthesis Kit [11636090910, Roche/Sigma Aldrich]). The protocol was modified: hybridization with 12.5 ng of probe per mL at 43°C overnight and washing with high stringency buffer at 65°C. The Southern blot to detect remaining cor21BdT in gDNA preparations was modified: 4% phor agarose gel (850181, Biozym Scientific GmbH), gel run and 4 hr blotting on ice, and probe DIG-21cor_as (5′-DIG-GAGGGTTGGCCAGGGCACGGG-3′).

Subcloning of PCR Products

The standard TOPO TA Cloning Kit (K202040, Invitrogen/Thermo Fisher Scientific) protocol (4 μL of template) was used. Of the ligation reactions, 2–3 μL were transformed into XL2-Blue Ultracompetent Cells (200150, Agilent Technologies).

Colony Hybridization

The procedure is described in the DIG Applications Manual for Filter Hybridization (Roche; https://lifescience.roche.com/wcsstore/RASCatalogAssetStore/Articles/05353149001_08.08.pdf; Chapter 3.6). The following modifications enabled detection of colonies carrying plasmids with ssODN sequences down to a length of ∼13 nt: to obtain uniformly sized colonies, bacteria were spread onto membranes (Nylon Membranes for Colony and Plaque Hybridization, 11699075001, Roche/Sigma Aldrich) layered on top of agar; RNase A was applied, requiring Nuclei Lysis Solution (A7941, Promega) and 4–10 mg of RNase A per mL of solution (A7973, Promega/Fluka). A volume of transformation reaction containing about 300 colony-forming units was spread onto the membrane. Reference clones contained plasmids with no ssODN cor21 sequence (negative control), a representative 13-mer, a representative 15-mer, or the entire 21-mer (for sequences, see Figure S10A). Plates were incubated at 37°C overnight. Then a replica membrane was generated. Both membranes were incubated on agar plates at 37°C for 3 hr. After most of the growing colonies had been removed with a dry Whatman paper, the replica membrane was incubated face up on Nuclei Lysis Solution for 5 min and on 2× SSC twice for 5 min. RNase A treatment in 2× SSC was carried out at 37°C for 1 hr. After denaturation and neutralization, DNA was UV crosslinked (Stratalinker 2400; Stratagene). To reduce the amount of bacterial protein, the membrane was incubated with colonies facing down on a 1:10 dilution of proteinase K (∼10 mg/mL; 1245680100, Merck KGaA, Darmstadt, Germany) in 2× SSC at 37°C or 56°C for 1 hr. Then, the membrane was treated at 37°C with 50 mL of stripping solution (0.2 M NaOH [Carl Roth]/0.1% SDS) for 15 min, followed by washing with 2× SSC at room temperature. Probe hybridization with 5′-DIG-21cor and antibody staining (1:15,000 dilution of Anti-Digoxigenin-AP Fab fragments) were done as described for Southern blots.
Scans or digital images were taken from the membranes with growing bacteria colonies and from the exposed films. Overlays were generated using program CorelDraw X5 (Corel Corporation) to identify positive bacteria clones.

Plasmid Preparation and Sequencing

Positive bacteria clones were grown overnight in LB medium containing 30 μg/mL kanamycin. Plasmids were harvested with the QIAprep Miniprep Kit (QIAGEN). Inserts were sequenced with the BigDye Terminator (version 1.1) Cycle Sequencing Kit (4337451, Applied Biosystems/Thermo Fisher Scientific) using primers M13F and M13R; additional fragment-specific primers were utilized to complete longer fragments. Sequencing reactions were run on a 3100xl Genetic Analyzer (Applied Biosystems). Alignments and annotations were based on UCSC BLAT.47

GS FLX (Titanium Run) Next-Generation Sequencing (Roche, 454)

This sequencing project was carried out as commissioned work by GATC Biotech AG (Constance, Germany) using samples Aii_cB_G+, Aiii_c_G+ (negative control), and Aiii_cB_G+. Because some of our LM-PCR products were longer than ∼500 bp, the amplicons of each sample were first ligated into long DNA strands mimicking genomic DNA. These strands were used for standard genomic library preparations. Because ∼2 μg of DNA per sample was required, the original LM-PCR products were re-amplified (Figure S22A) using the Pfu PCR System (Agilent Technologies) with the equivalent of 2–3 μL of original LM-PCRs as template. The program started with an initial denaturation at 96°C for 3 min, followed by 7–10 cycles of 94°C for 45 s, 58°C for 1 min, and 72°C for 2 min; the final elongation was carried out at 72°C for 5 min. Pyrosequencing data generated after de-multiplexing and trimming of the library adapters constituted the “raw” data. Next, the LM-PCR linker sequences (5′-CTAGTCATAGCCGAATAGCC-3′) were localized with the program cross_match (http://www.phrap.org/phredphrap/general.html) and further processed (i.e., splitting and linker trimming) with proprietary GATC Biotech AG scripts, yielding trimmed reads; reads <15 bp were rejected.
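The linker localization and splitting were performed with cross_match and proprietary scripts, which tolerate sequencing errors. As a simplified illustration only, the sketch below splits a read at exact occurrences of the linker in either orientation, removes the linker, and rejects pieces shorter than 15 bp. Exact matching is an assumption made for brevity; it would miss linker copies containing sequencing errors.

```python
import re
from typing import List

LINKER = "CTAGTCATAGCCGAATAGCC"  # LM-PCR linker/primer sequence from the Methods
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (ACGT only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def split_and_trim(read: str, linker: str = LINKER, min_len: int = 15) -> List[str]:
    """Split a read at every exact linker occurrence (either orientation),
    drop the linker itself, and reject pieces shorter than min_len."""
    pattern = re.compile(f"{re.escape(linker.upper())}|{re.escape(revcomp(linker))}")
    pieces = pattern.split(read.upper())  # no capture groups: linkers are dropped
    return [piece for piece in pieces if len(piece) >= min_len]


if __name__ == "__main__":
    # Hypothetical concatemer-derived read: two inserts, one too short to keep.
    read = "A" * 20 + LINKER + "C" * 10 + revcomp(LINKER) + "G" * 30
    print(split_and_trim(read))
```

Splitting is needed here because the LM-PCR products were ligated into long concatemers before library preparation, so one read can span several original amplicons.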

Nanopore Sequencing

Samples Aiv_cB_t, Av_c_t (negative controls), and Av_cB_t were prepared as follows: the SQK-MAP006 Kit (Oxford Nanopore Technologies Ltd. [ONT]) was used. To generate sufficient amounts, original LM-PCR products were re-amplified (Figure S22B) using the Phusion PCR System (F530L, Thermo Fisher Scientific) with 1 μL of original LM-PCRs as template in 50-μL reactions. The program started with an initial denaturation at 98°C for 30 s, followed by 10 cycles with 98°C for 10 s, 58°C for 15 s, and 72°C for 1 min; the final elongation was carried out at 72°C for 3 min.

Re-amplified products were purified with the QIAquick PCR Purification Kit. DNA LoBind Tubes (022431005, Eppendorf, Hamburg, Germany) were used for all steps of the library preparation. Input for the NEBNext Ultra II End Repair/dA-Tailing Module (E7546S, New England Biolabs) was 500 ng of purified PCR products. Incubation was at 20°C for 30 min and then at 65°C for 30 min. Purification was done twice with 1.8× Agencourt AMPure XP Beads (A63880, Beckman Coulter, Krefeld, Germany) according to the manufacturer’s protocol. Samples were eluted with 40 μL of nuclease-free water. With an estimated average PCR product length of 1,000 bp, the maximal input of 0.2 pmol per preparation corresponded to ∼125 ng of DNA. Ligation was carried out according to the ONT protocol with the Blunt/TA Ligase Master Mix (M0367S, New England Biolabs) in a 100-μL reaction volume with 0.2 pmol input of purified, end-repaired PCR products. Ligation products were purified with Dynabeads MyOne Streptavidin C1 according to the ONT protocol and eluted with 30 μL of elution buffer (ONT). Libraries were sequenced for up to 48 hr after loading onto MinION Flow Cells (R7.3) (run with MinKNOW software [version 0.50.2.15] b201507272122 for Aiv/Av_cB_t and 0.051.1.62 b201602101407 for Av_c_t) on a portable MinION device. The generated “squiggle” data were uploaded to the cloud-based Metrichor service for basecalling and trimming of the library adapters (workflow: 2D Basecalling for SQK-MAP006 rev. 1.34 for Aiv/Av_cB_t and rev. 1.69 for Av_c_t). From reads downloaded in hdf5 format, an in-house script written in the programming language R48 extracted FASTQ sequence lists for import into CLC Genomics Workbench. Metrichor-based basecalling sorts reads into a “pass” and a “fail” folder. For analyses, the 2D reads in the pass folder (2D reads “pass”) were chosen.
Linker sequences were trimmed with the “Trim sequence tool” (CLC Genomics Workbench; settings: internal score 10, end match no, mismatch cost 2, and gap cost 2). If the linker (5′-CTAGTCATAGCCGAATAGCC-3′) was recognized, it was removed; otherwise, 45 nucleotides were deleted. These data constituted the nanopore sequencing “trimmed data.”
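The input amounts quoted for the nanopore libraries follow from a standard mass-per-base-pair conversion for dsDNA. This sketch assumes the commonly used formula MW = (bp × 617.96) + 36.04 g/mol; the paper does not state which formula was used.

```python
# Average mass per base pair of dsDNA plus an end correction, as in a common
# vendor formula: MW = bp * 617.96 + 36.04 (assumed; not stated in the paper).
BP_MASS = 617.96
END_CORRECTION = 36.04


def dsdna_mw(length_bp: float) -> float:
    """Approximate molecular weight (g/mol) of a linear dsDNA fragment."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return length_bp * BP_MASS + END_CORRECTION


def pmol_to_ng(amount_pmol: float, length_bp: float) -> float:
    """Mass (ng) of an amount (pmol) of dsDNA with the given mean length."""
    return amount_pmol * dsdna_mw(length_bp) * 1e-3  # pmol * g/mol = pg


def ng_to_pmol(mass_ng: float, length_bp: float) -> float:
    """Amount (pmol) of dsDNA with the given mean length in a mass (ng)."""
    return mass_ng * 1e3 / dsdna_mw(length_bp)


if __name__ == "__main__":
    for length in (1000, 1500):
        print(f"0.2 pmol of {length} bp ~ {pmol_to_ng(0.2, length):.1f} ng")
```

It gives ≈123.6 ng for 0.2 pmol of 1,000-bp products (quoted above as ∼125 ng) and ≈185.4 ng for 0.2 pmol of 1,500-bp products (quoted as about 185 ng in the cybb100 library preparation).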

Samples from experiments with cybb100 ssODNs were prepared as follows. To generate sufficient amounts, original LM-PCR products were re-amplified using Pfu polymerase with 0.7–1.7 ng of original LM-PCRs as template in 50-μL reactions. The PCR program was the same as for the LM-PCR, with 9–12 cycles. Re-amplified products were purified with the QIAquick PCR Purification Kit. The kit SQK-LSK108 and the barcoding kit EXP-NBD103 (both from ONT) were used. Library preparations followed the manufacturer’s protocols with the following exception: instead of a DNA end repair step, only a dA-tailing reaction (NEBNext dA-tailing) was performed. The maximal input of 0.2 pmol was used, corresponding to about 185 ng (based on an assumed mean PCR product length of 1,500 bp). The barcoding ratio per run for the “HEK-293” and “CD34+” samples was adjusted to 20% for each of the two negative control samples and 60% for the sample transfected with cybb100BdT. Libraries were sequenced for up to 12 hr after loading onto MinION Flow Cells (R9.5) (run with MinKNOW software version 1.7.3 [HEK293] or 1.7.10 [CD34+] and a MinION MK 1B device). The generated “squiggle” data were basecalled locally with the ONT Albacore Sequencing Pipeline Software (version 1.1.2) (Oxford Nanopore Technologies Ltd.) using the arguments --config r94_450bps_linear.cfg and --barcoding. The demultiplexed data, obtained as FASTQ sequence lists, were imported into CLC Genomics Workbench. To remove sequences with mixed barcodes, the “Trim sequence tool” (CLC Genomics Workbench; settings: minimum internal match score 9, end match no, mismatch cost 2, and gap cost 2) was used: if a different barcode (BC) was recognized, the sequence was discarded. The resultant data were called “withoutMixedBC data.” Linker sequences were trimmed as described above, except that, if the linker was not recognized, 90 nucleotides were deleted from the 5′ end and 70 nucleotides from the 3′ end to also remove the barcode sequences.
These resultant data constituted the nanopore sequencing “trimmed data.”
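The two filtering steps for the barcoded cybb100 libraries, discarding reads that contain a foreign barcode and trimming the linker with a fixed-length fallback, can be sketched as follows. The real steps used the scored CLC “Trim sequence tool”; this sketch uses exact matching instead, the barcode names and sequences are hypothetical (not the EXP-NBD103 barcodes), and applying the 90-nt/70-nt fallback separately to each end without a recognized linker is an assumption.

```python
from typing import Dict, List

LINKER = "CTAGTCATAGCCGAATAGCC"  # LM-PCR linker/primer from the Methods
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (ACGT only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_foreign_barcode(read: str, assigned: str, barcodes: Dict[str, str]) -> bool:
    """True if any barcode other than the assigned one occurs (either orientation)."""
    read = read.upper()
    return any(
        bc.upper() in read or revcomp(bc) in read
        for name, bc in barcodes.items()
        if name != assigned
    )


def trim_linker(read: str, linker: str = LINKER,
                clip_5p: int = 90, clip_3p: int = 70) -> str:
    """Keep the insert between the 5' linker and the 3' reverse-complemented linker.

    If a linker copy is not found, a fixed number of bases is clipped from that
    end instead (assumed per-end fallback).
    """
    read = read.upper()
    left = read.find(linker.upper())
    start = left + len(linker) if left != -1 else clip_5p
    right = read.rfind(revcomp(linker))
    end = right if right != -1 else len(read) - clip_3p
    return read[start:end] if end > start else ""


def filter_and_trim(reads: List[str], assigned: str, barcodes: Dict[str, str]) -> List[str]:
    """Drop mixed-barcode reads, trim the rest, and discard empty results."""
    kept = []
    for read in reads:
        if has_foreign_barcode(read, assigned, barcodes):
            continue
        trimmed = trim_linker(read)
        if trimmed:
            kept.append(trimmed)
    return kept
```

Filtering before trimming mirrors the order described above, so that reads with mixed barcodes never reach the mapping step.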

Analysis of Sequencing Data

Using CLC Genomics Workbench, trimmed sequencing data were mapped against the reference genome GRCh37/hg19 (download created from ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.chromosome.1.fa.gz on March 3, 2015, 12:30:48ed, etc.). Settings were as follows: masking mode no masking, mismatch cost 2, deletion cost 2, insertion cost 2, cost of insertions and deletions linear gap cost, length fraction 0.9, similarity fraction 0.8, global alignment yes, non-specific match handling map randomly. Local realignment was done with CLC Genomics Workbench (realign unaligned ends: no; multi-pass realignment: 3; guidance-variant track: not set; output mode: create reads track; output track of realigned regions: yes). Consensus sequences were extracted with the following settings: threshold 1 (pyrosequencing) or threshold 4 (nanopore sequencing, corresponding to 5× coverage), action remove regions with low coverage, and post-remove action split into separate sequences. With in-house R scripts, exported FASTQ consensus sequences were converted into FASTA format and screened for cor21 sequences; a consensus sequence scored positive if it contained a cor21 stretch of at least 10 nt encompassing the biotin-labeled thymidine. Next, a list of consensus sequences containing a cybb100 stretch of at least 13 nt encompassing the biotin-labeled thymidine was generated. For samples treated with cybb100 ssODNs, the “Trim sequence tool” was run (CLC Genomics Workbench; settings: minimum internal match score 13, end match no, mismatch cost 2, and gap cost 3; discard when found), saving discarded sequences. The annotation of affected genome sites was carried out using the UCSC Genome Browser website. Structural rearrangements were identified as follows: if the cor21 sequence 5′-GCCAACCCT-3′ was present, reads left unmapped in the first round were split into two parts at the cor21 site, with the respective cor21 sequence assigned to both fragments. All split reads were mapped with CLC Genomics Workbench.
If both parts mapped, the sequences were extracted and analyzed for the presence of a cor21 sequence of ≥10 nt. If such a sequence was present, the hit was scored as a structural rearrangement. The 100-nt-long cybb100 sequence added to a chromosomal DNA region constitutes a major part of a DNA fragment that is only a few hundred base pairs long. For this reason, the CLC Genomics Workbench aligner often scores such sequence reads as “unmapped.” To overcome this problem, unmapped sequences were analyzed against individually tailored reference sequences as follows. All unmapped reads were screened for the presence of a 10-nt-long part of cybb100 around the biotin-labeled position using the “Trim sequence tool” (CLC Genomics Workbench; settings: minimum internal match score 10, end match no, mismatch cost 2, and gap cost 3; discard when found), and the discarded sequences were saved. Sequences with cybb100 parts were mapped against all off-target events already identified by mapping against GRCh37/hg19 (global alignment no; other settings as above). This yielded a list of reads that were still unidentified. These sequences were assembled among themselves with the CLC Genomics Workbench “assemble sequences” tool (alignment stringency low, minimum aligned read length 200 bp, vote (A,C,G,T) yes). The contigs were then screened again for cybb100 parts of at least 13 nt. Positive contigs were first analyzed with the “BLAT tool” of the UCSC Genome Browser website to identify the affected genome regions. Then a new reference sequence was built from the GRCh37/hg19 sequence(s), the cybb100 ssODN sequence, and the linker sequence. Linker sequences were included to check the junction between the terminally appended cybb100 sequence and the linker sequence. “withoutMixedBC” data were mapped against these new reference sequences (CLC Genomics Workbench settings: global alignment no; length fraction 0.6–0.9, depending on reference sequence length; other settings as above), and consensus sequences were generated.
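The two cor21 checks described above can be sketched in Python. The first is the test for a ≥10-nt cor21 stretch spanning the biotin-labeled thymidine. The second splits unmapped reads at 5′-GCCAACCCT-3′ so that both fragments keep the cor21 sequence. This is not the authors' R code. The position of the biotin-labeled thymidine within cor21 is not stated in this section, so BIOTIN_INDEX is a hypothetical placeholder. Searching both strands is also an added assumption.

```python
# Illustrative Python sketch; the original analysis used in-house R scripts.
COR21 = "CCCGTGCCCTGGCCAACCCTC"   # cor21 ssODN, 5'->3'
SPLIT_MOTIF = "GCCAACCCT"          # cor21 sequence used to split unmapped reads
BIOTIN_INDEX = 19                  # HYPOTHETICAL 0-based index of the biotin-labeled T

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def contains_cor21_stretch(seq, min_len=10, biotin_index=BIOTIN_INDEX):
    """True if seq (either strand, an assumption) holds a cor21 stretch of
    at least min_len nt that spans the biotin-labeled position."""
    # Any longer matching stretch that spans the labeled position contains a
    # window of exactly min_len that also spans it, so these windows suffice.
    first = max(0, biotin_index - min_len + 1)
    last = min(biotin_index, len(COR21) - min_len)
    windows = [COR21[s:s + min_len] for s in range(first, last + 1)]
    seq = seq.upper()
    rc = revcomp(seq)
    return any(w in seq or w in rc for w in windows)


def split_at_cor21(read, motif=SPLIT_MOTIF):
    """Split a read at the first motif hit; both halves keep the motif,
    mirroring 'the respective cor21 sequence assigned to both fragments'."""
    i = read.find(motif)
    if i < 0:
        return None
    return read[: i + len(motif)], read[i:]
```

In this sketch, each half returned by `split_at_cor21` would be mapped separately. A read would count as a candidate rearrangement only if both halves map and one of them still satisfies `contains_cor21_stretch`.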
To identify as many off-target events with a “terminally appended” cybb100 as possible, an in-house R script was used to screen for consensus sequences with at least 15 positions carrying the lowest quality score in the first or last 100 nucleotides. This step was necessary because some consensus sequences no longer showed the cybb100 ssODN sequence that was still visible in the single reads. This was due to the global alignment option, which forced all nucleotides of a read to be mapped to the respective genomic region. From newly identified regions, reads were extracted and processed further as described for unmapped reads, starting with the assembly into contigs.
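The quality-based screen can be sketched as follows. The sketch assumes Phred+33-encoded FASTQ consensus sequences. It also assumes that "lowest quality score" means Phred 0; the `lowest` parameter can be changed if the exported files use a different minimum. Both function names are hypothetical.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) from well-formed 4-line FASTQ records."""
    lines = [l.rstrip("\n") for l in lines if l.strip()]
    for i in range(0, len(lines) - 3, 4):
        # lines[i] is the '@name' header, lines[i + 2] the '+' separator
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def terminal_low_quality(qual, window=100, min_count=15, lowest=0, offset=33):
    """Flag a consensus read whose first or last `window` bases contain at
    least `min_count` positions with the lowest quality score."""
    scores = [ord(c) - offset for c in qual]       # decode Phred+33 characters
    head, tail = scores[:window], scores[-window:]
    return head.count(lowest) >= min_count or tail.count(lowest) >= min_count
```

A flagged read would then mark a region whose reads are re-extracted and assembled, as described above.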

Identities of mapped reads were calculated with in-house R scripts from the SAM output files (outputfile.sam) of Bowtie 2. Mapping with Bowtie 2 against GRCh37/hg19 (source: NCBI) used the following settings: bowtie2 -f -p --end-to-end --very-sensitive -L 15 --mp 1,1 --np 1 --rdg 0,1 --rfg 0,1 --score-min L,0,-0.2 -S.
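The exact identity formula used in the R scripts is not given here. The sketch below uses one common definition as an assumption: matching alignment columns divided by all aligned columns. Aligned columns are the M, =, X, I, and D operations in the CIGAR string. Mismatches and indel bases are taken from the NM tag, which Bowtie 2 reports for aligned reads. The function names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def identity_from_sam_line(line):
    """Identity of one SAM record, or None if unmapped or lacking NM.
    ASSUMED definition: (aligned columns - NM) / aligned columns."""
    fields = line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 4 or cigar == "*":          # FLAG bit 0x4: segment unmapped
        return None
    columns = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MID=X")
    nm = None
    for tag in fields[11:]:               # optional fields such as "NM:i:2"
        name, _type, value = tag.split(":", 2)
        if name == "NM":
            nm = int(value)
    if nm is None or columns == 0:
        return None
    return (columns - nm) / columns


def identities(sam_lines):
    """Identities of all mapped records, skipping '@' header lines."""
    values = (identity_from_sam_line(l) for l in sam_lines if not l.startswith("@"))
    return [v for v in values if v is not None]
```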

To map the 3′-terminal stretch of captured fragment #240/241 (Table S1), which was only 16 nt long, we used the Ensembl BLAST/BLAT search (Ensembl GRCh37 release 84; http://grch37.ensembl.org/Homo_sapiens/Tools/Blast) with the following settings: Species: Human (Homo sapiens); Assembly: GRCh37; Search type: BLASTN (NCBI Blast) / Short sequences; Sequence: GCCAACCCTGCGCTGC (as FASTA); Query type: DNA; DB type: DNA database; Source: Genomic sequence. Configuration, General options: Maximum number of hits to report: 100; Maximum E-value for reported alignments: 1,000; Word size for seeding alignments: 7. Scoring options: Match/Mismatch scores: 1,-3; Gap penalties: Opening: 5, Extension: 2; Disallow gaps in Alignment: not ticked. Filters and masking options: Filter low complexity regions: not ticked; Filter query sequences using RepeatMasker: not ticked.

Generation of Gene and Repeat Lists

For analyzing off-target events in gene regions, a representative list of human genes for genome assembly GRCh37/hg19 was compiled on the UCSC Genome Browser website (https://genome.ucsc.edu/). To this end, all UCSC knownCanonical genes with a known transcript were collected via the UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables). Removing “abParts” with ≥450 exons, mitochondrion-related data, and alternative assembly data yielded a list of 30,502 UCSC knownCanonical genes (note: some genes were listed multiple times as different canonical splice variants; no 28S rDNA genes are included). Each gene entry contained the following fields: hg19.knownGene.chrom, hg19.knownGene.txStart, hg19.knownGene.txEnd, hg19.knownGene.strand, hg19.knownGene.name, hg19.kgXref.geneSymbol, hg19.knownGene.cdsStart, hg19.knownGene.cdsEnd, hg19.knownGene.exonCount, hg19.knownGene.exonStarts, hg19.knownGene.exonEnds, and hg19.kgXref.refseq.

The list of all repeat regions annotated by the RepeatMasker program (http://www.repeatmasker.org) was generated using the following settings for the UCSC Table Browser: clade = Mammal, genome = Human, assembly = Feb. 2009 (GRCh37/hg19), group = All Tracks, track = RepeatMasker, table = rmsk, region = genome, output format = selected fields from primary and related tables, “Select Fields from hg19.rmsk” = genoName, genoStart, genoEnd, strand, repName, repClass, repFamily. After removal of mitochondrial and alternative assembly data, the list comprised 5,232,237 regions. Because the lists had been exported via the UCSC Table Browser, all chromosomal start coordinates were 0-based.
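UCSC Table Browser exports use 0-based, half-open coordinates. A 1-based position therefore has to be shifted by one before it is compared with txStart/txEnd or exonStarts/exonEnds. The sketch below is a hypothetical helper that classifies a position as exonic, intronic, or intergenic. It assumes gene entries stored as dictionaries keyed by the knownGene column names, without the hg19.knownGene prefix.

```python
def parse_coord_list(field):
    """UCSC exonStarts/exonEnds are comma-separated, usually with a trailing comma."""
    return [int(x) for x in field.split(",") if x]


def classify_position(pos1, chrom, genes):
    """Classify a 1-based position against knownGene-like dictionaries.

    Gene coordinates are 0-based, half-open [start, end), as exported by the
    UCSC Table Browser.
    """
    pos0 = pos1 - 1                      # convert to the 0-based system
    result = "intergenic"
    for g in genes:
        if g["chrom"] != chrom or not (g["txStart"] <= pos0 < g["txEnd"]):
            continue
        starts = parse_coord_list(g["exonStarts"])
        ends = parse_coord_list(g["exonEnds"])
        if any(s <= pos0 < e for s, e in zip(starts, ends)):
            return "exon"                # exonic in at least one transcript
        result = "intron"                # inside a transcript, outside its exons
    return result
```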

In Silico Generation of Genome-wide Off-Target Prediction and Control Datasets

The prediction datasets cor21_B (N = ∼2.7 × 10⁶) and scr21_B (N = ∼2.1 × 10⁶) were generated with Bowtie 2 (version 2.2.3; ref. 34) for the human genome assembly GRCh37/hg19. They were based on the sequences of cor21 (5′-CCCGTGCCCTGGCCAACCCTC-3′) and scr21 (5′-CCCATCCGACCGCCCTGGCTC-3′), respectively. For each of these sequence-specific searches, two different datasets were generated. Settings for dataset 1: bowtie2 -f -p -a -D 15 -R 2 -N 1 -L 9 -i S,1,0 --ma 2 --mp 2,2 --rdg 1,2 --rfg 1,2 --local --score-min G,7,2.8 -S. Settings for dataset 2 (compiling 100% identical hits, most of which were not reported with the first setting for unknown reasons): bowtie2 -f -p -a -D 15 -R 2 -N 0 -L 9 -i S,1,0 --ma 2 --mp 40,40 --rdg 40,40 --rfg 40,40 --local --score-min G,10,2.8 -S. With in-house R scripts, the datasets were converted into the Galaxy interval datatype, combined, cleared of identical entries, sorted (natural order: chr, start coordinate, end coordinate), and split into “plus”- and “minus”-strand entries. These two groups, saved as individual files, were imported into Microsoft Excel 2010 to retain the longer region of nested entries so that each region was represented only once. The “plus”- and “minus”-strand data were then recombined and sorted (natural order: chr, start coordinate, end coordinate).
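The clean-up of duplicate and nested entries, which was done in R and Excel, can be sketched in Python. Entries are sorted in natural chromosome order and identical entries are dropped. Then, per strand, any interval fully contained in a longer one is removed. This is an illustrative re-implementation with hypothetical names. It treats "nested" as full containment (an assumption), so partially overlapping entries are kept.

```python
import re


def natural_key(chrom):
    """Natural chromosome order: chr2 before chr10; non-numeric names (X, Y) last."""
    m = re.match(r"(?:chr)?(\d+)$", chrom)
    return (0, int(m.group(1)), "") if m else (1, 0, chrom)


def remove_nested(intervals):
    """intervals: (chrom, start, end, strand) tuples.

    Drops exact duplicates and any interval fully contained in another
    interval on the same chromosome and strand.
    """
    kept = []
    for strand in ("+", "-"):
        # Start ascending, end descending: containing intervals come first.
        subset = sorted({iv for iv in intervals if iv[3] == strand},
                        key=lambda iv: (natural_key(iv[0]), iv[1], -iv[2]))
        last_chrom, max_end = None, -1
        for chrom, start, end, s in subset:
            if chrom != last_chrom:
                last_chrom, max_end = chrom, -1
            if end > max_end:            # not inside any earlier interval
                kept.append((chrom, start, end, s))
                max_end = end
    # Recombine both strands in natural order (chr, start, end).
    return sorted(kept, key=lambda iv: (natural_key(iv[0]), iv[1], iv[2]))
```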

The random dataset cor21_R (N = ∼2.7 × 10⁶) was generated to match the size and length distribution of the prediction dataset cor21_B. With an in-house R script, the sample() function drew start-site coordinates scattered across all GRCh37/hg19 chromosomes. The number of start sites per chromosome was proportional to the chromosome's length relative to the total length of all 24 chromosomes. To match the length distribution of the random regions to that of cor21_B, the length frequencies of cor21_B were used to generate the required number of region lengths. These values were added in random order to the start sites to generate the regions' end coordinates. For dataset characterizations, see Figures S12–S16.
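A Python sketch of the random control set is shown below; the original used R's sample(). The function name and its inputs are hypothetical. Start sites are drawn without replacement, as R's sample() does by default, and the shuffled cor21_B-derived lengths are added to them. Assigning the rounding remainder to the longest chromosome is an assumption. Following a literal reading of the description, end coordinates are not clipped at the chromosome end.

```python
import random


def random_regions(chrom_lengths, length_freqs, seed=None):
    """Random control regions matched to a prediction set.

    chrom_lengths: {chrom: length in bp}
    length_freqs:  {region_length: count}, e.g. tabulated from cor21_B
    Returns (chrom, start, end) tuples with 0-based start coordinates.
    """
    rng = random.Random(seed)
    # Expand the length table, then shuffle so lengths are added in random order.
    lengths = [length for length, count in length_freqs.items() for _ in range(count)]
    rng.shuffle(lengths)
    n, total = len(lengths), sum(chrom_lengths.values())
    # Regions per chromosome proportional to relative length; the rounding
    # remainder goes to the longest chromosome (assumption) so counts sum to n.
    counts = {c: n * size // total for c, size in chrom_lengths.items()}
    longest = max(chrom_lengths, key=chrom_lengths.get)
    counts[longest] += n - sum(counts.values())
    regions = []
    for chrom, k in counts.items():
        # Distinct start sites drawn without replacement.
        for start in rng.sample(range(chrom_lengths[chrom]), k):
            regions.append((chrom, start, start + lengths.pop()))
    return regions
```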

To analyze whether regions of the different datasets coincided, we used the tool “Operate on Genomic Intervals → Join the intervals of two datasets side-by-side” (minimal interval overlap: 1 bp) on the web-based platform Galaxy (https://usegalaxy.org/).
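Outside Galaxy, the same side-by-side join with a minimal overlap of 1 bp can be sketched as follows. The sketch assumes 0-based, half-open (chrom, start, end) tuples, as in BED-like interval files. It is an illustrative stand-in with a hypothetical name, not the Galaxy tool's implementation.

```python
from collections import defaultdict


def join_intervals(a, b, min_overlap=1):
    """Pair every interval in `a` with every interval in `b` that overlaps
    it by at least `min_overlap` bp (0-based, half-open coordinates)."""
    by_chrom = defaultdict(list)
    for iv in b:
        by_chrom[iv[0]].append(iv)
    for ivs in by_chrom.values():
        ivs.sort(key=lambda iv: iv[1])           # sort each chromosome by start
    pairs = []
    for ia in a:
        for ib in by_chrom.get(ia[0], []):
            if ib[1] >= ia[2]:
                break  # later b intervals start even further right: no overlap
            overlap = min(ia[2], ib[2]) - max(ia[1], ib[1])
            if overlap >= min_overlap:
                pairs.append((ia, ib))
    return pairs
```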

Generation of Reference DpnII Fragments from the Human Genome

A 100,000-bp region was selected for each chromosome (including chromosome Y, even though it is not present in HEK293-derived cells). We started on chromosome 1 at position 9,970,025 (at 4% of its total length), which served as the 0-based start coordinate; adding 100,000 to it yielded the region. For the subsequent chromosomes, the position, expressed as a percentage of chromosome length, was systematically increased by factors of 2.5, 3.5, and so forth. To avoid sequences consisting of Ns, start coordinates were shifted upstream (for chromosome 9: 10,000,000 bp; for chromosome Y: 40,000,000 bp). Upon in silico DpnII digestion, 6,094 fragments bounded by DpnII sites were retained, with a median length of 256 bp.
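The in silico DpnII digestion can be sketched as follows. DpnII recognizes 5′-GATC-3′ and cuts 5′ of it (^GATC). Only fragments bounded by a site on both ends are kept, so the pieces before the first site and after the last site are discarded. Measuring fragment length on the top strand from cut to cut is a simplifying assumption, and the function names are hypothetical.

```python
import re
from statistics import median


def dpnii_fragments(seq):
    """Fragments bounded by DpnII sites on both ends (top strand, cut ^GATC)."""
    cuts = [m.start() for m in re.finditer("GATC", seq.upper())]
    # Consecutive cut positions delimit internal fragments; edge pieces are dropped.
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


def fragment_summary(seq):
    """Number of bounded fragments and their median length."""
    lengths = [len(f) for f in dpnii_fragments(seq)]
    return (len(lengths), median(lengths)) if lengths else (0, None)
```

Applied to the 24 concatenated 100,000-bp regions chromosome by chromosome, this kind of count and median is the statistic reported above. The sketch itself has not been run on those regions.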

Generation of Sequence Logos

Sequence logos were generated using the WebLogo application (version 3) (http://weblogo.threeplusone.com/manual.html).49 Sequence strings were taken directly from the alignments in column G of Table S1. For “contiguous cor21 portions,” the lowercase letters in the cor21 sequence were replaced by dashes. For cor21 “footprints,” the complete chromosomal sequence strings were used if they were 21 nt long. Both datasets, also grouped into “mutated” and “non-mutated,” were used as input in FASTA format.
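Preparing the WebLogo input for "contiguous cor21 portions" can be sketched as below. The sketch assumes the Table S1 alignment strings are available as Python strings. The helper name and the default FASTA record names are hypothetical.

```python
import re


def logo_fasta(alignments, names=None):
    """Build FASTA input for a sequence logo: lowercase letters in each
    cor21 alignment string are replaced by '-'."""
    if len({len(a) for a in alignments}) > 1:
        raise ValueError("sequence logo input must be equally long (aligned)")
    records = []
    for i, aln in enumerate(alignments):
        name = names[i] if names else f"seq{i + 1}"
        records.append(f">{name}\n{re.sub('[a-z]', '-', aln)}")
    return "\n".join(records) + "\n"
```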

Generation of Karyotypes

Chromosome-specific distributions of prediction and experimental data were visualized with the R package “chromPlot”50 (version 1.0.0) (Figure 2D, bottom panel; Figure S19C).

Generation of Venn Diagrams

The retrieval, by pyrosequencing and nanopore sequencing, of fragments defined by Sanger-based analyses was visualized with Venn diagrams (Figure 4E). These were generated with the R package “VennDiagram” (version 1.6.17; Hanbo Chen [2016]. VennDiagram: generate high-resolution Venn and Euler plots. R package version 1.6.17. https://CRAN.R-project.org/package=VennDiagram).
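The numbers a Venn diagram displays are the counts of items in each exclusive combination of sets. The minimal Python sketch below computes them, assuming fragments are represented by hypothetical identifiers. It does not reproduce the R package's plotting.

```python
def venn_regions(sets):
    """sets: {label: set of item ids}.

    Returns {tuple of labels: number of items found in exactly those sets}.
    """
    labels = list(sets)
    universe = set().union(*sets.values())
    counts = {}
    for item in universe:
        key = tuple(label for label in labels if item in sets[label])
        counts[key] = counts.get(key, 0) + 1
    return counts
```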

Calculation of Chromosome-Specific GC Contents

Using CLC Genomics Workbench, the tool “Create GC Content Graph Track” (window size: 25) was run on the genome reference Ensembl Homo_sapiens GRCh37/hg19 release 75. Chromosomal values were generated with the tool “Identify Graph Threshold Areas” (window size: 25; lower threshold: 0.0; upper threshold: 1,000.0) (Figure 2D, top panel; Figure S19A).
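A simple stand-in for a 25-nt GC-content track is a sliding window over the chromosome sequence. The sketch below uses prefix sums so that each window is evaluated in constant time. Skipping windows that contain N is an assumption, CLC's exact handling may differ, and the function name is hypothetical.

```python
from itertools import accumulate


def gc_windows(seq, window=25, step=1):
    """(0-based start, GC fraction) for each window that contains no N."""
    seq = seq.upper()
    gc = [0, *accumulate(1 if b in "GC" else 0 for b in seq)]   # prefix GC counts
    nn = [0, *accumulate(1 if b == "N" else 0 for b in seq)]    # prefix N counts
    out = []
    for s in range(0, len(seq) - window + 1, step):
        e = s + window
        if nn[e] - nn[s]:
            continue                     # assumption: skip windows with unknown bases
        out.append((s, (gc[e] - gc[s]) / window))
    return out
```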

Picture Processing

TIFF files were generated with the software Quantum-Capt 15.14 (VILBER LOURMAT Deutschland GmbH, Eberhardzell, Germany). Brightness, contrast, and intensity were adjusted uniformly across entire gel photographs or exposure films to reflect the appearance of the original data.

Ethics Statement

The analyses were performed on a cryopreserved anonymized hematopoietic stem cell preparation. This preparation was obtained after informed consent from a healthy donor, and it was no longer required for transplantation purposes.

Author Contributions

F.R., K.S., and S.R. conceived and designed the experiments. F.R. and S.R. performed the experiments. F.R., K.S., and S.R. analyzed the data. F.R. wrote the paper. All authors contributed to and approved the final version of the manuscript.

Conflicts of Interest

F.R. was part of the Oxford Nanopore MinION Access Programme (MAP). Oxford Nanopore Technologies Ltd. have contributed free of charge early-access reagents in support of the data presented.

Acknowledgments

We are grateful to Ingrid Peter for her contributions at the beginning of the project. We thank all the members of our sequencing facility for their assistance with Sanger sequencing. We thank the members of the Schwarz laboratory for their critical discussions and Marita Fuehrer for critically reading the manuscript. We gratefully acknowledge assistance from the employees of Oxford Nanopore Technologies Ltd. We thank Daniel Fuerst for an introduction to the programming language R and further helpful advice. We are grateful to Stefan Rau for his help with hdf5 files. The Bowtie 2 alignments were performed on the computational resource bwUniCluster funded by the Ministry of Science, Research and the Arts Baden-Württemberg and the Universities of the State of Baden-Württemberg, Germany, within the framework program bwHPC. We gratefully acknowledge the support from the kiz of Ulm University, the bwSupport Portal, and our in-house IT department. We are especially thankful to Marek Dynowski (University of Tuebingen; present address: Cancer Research UK Manchester Institute, The University of Manchester) for his invaluable help in getting started with the bwUniCluster. This work was supported by the Institute for Clinical Transfusion Medicine and Immunogenetics Ulm, German Red Cross Blood Service (Baden-Württemberg-Hessen, Ulm, Germany), and the EU project ZNIP (grant LSHB-CT2006-037783) (to K.S.).

Footnotes

Supplemental Information includes Supplemental Materials and Methods, thirty-two figures, and three tables and can be found with this article online at https://doi.org/10.1016/j.ymthe.2017.09.015.

Supplemental Information

Document S1. Supplemental Materials and Methods, Figures S1–S32, and Tables S1–S3
mmc1.pdf (6.4MB, pdf)
Table S1. Primary Data for ssODN Off-Target Events from cor21-Treated 293mEGFP-M12 Cells
mmc2.xlsx (380.6KB, xlsx)
Table S2. Primary Data for ssODN Off-Target Events from cybb100-Treated HEK-293 Cells
mmc3.xlsx (36.9KB, xlsx)
Table S3. Primary Data for ssODN Off-Target Events from cybb100-Treated Human CD34+ Hematopoietic Stem and Progenitor Cells
mmc4.xlsx (31.4KB, xlsx)
Document S2. Article plus Supplemental Information
mmc5.pdf (8.6MB, pdf)

References

1. Flavell R.A., Sabo D.L., Bandle E.F., Weissmann C. Site-directed mutagenesis: generation of an extracistronic mutation in bacteriophage Q beta RNA. J. Mol. Biol. 1974;89:255–272. doi: 10.1016/0022-2836(74)90517-8.
2. Collins M., Thrasher A. Gene therapy: progress and predictions. Proc. Biol. Sci. 2015;282:20143003. doi: 10.1098/rspb.2014.3003.
3. Naldini L. Gene therapy returns to centre stage. Nature. 2015;526:351–360. doi: 10.1038/nature15818.
4. Parekh-Olmedo H., Czymmek K., Kmiec E.B. Targeted gene repair in mammalian cells using chimeric RNA/DNA oligonucleotides and modified single-stranded vectors. Sci. STKE. 2001;2001:pl1. doi: 10.1126/stke.2001.73.pl1.
5. Radecke F., Radecke S., Schwarz K. Unmodified oligodeoxynucleotides require single-strandedness to induce targeted repair of a chromosomal EGFP gene. J. Gene Med. 2004;6:1257–1271. doi: 10.1002/jgm.613.
6. Segal D.J., Meckler J.F. Genome engineering at the dawn of the golden age. Annu. Rev. Genomics Hum. Genet. 2013;14:135–158. doi: 10.1146/annurev-genom-091212-153435.
7. Doudna J.A., Charpentier E. Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science. 2014;346:1258096. doi: 10.1126/science.1258096.
8. Gabriel R., Lombardo A., Arens A., Miller J.C., Genovese P., Kaeppel C., Nowrouzi A., Bartholomae C.C., Wang J., Friedman G. An unbiased genome-wide analysis of zinc-finger nuclease specificity. Nat. Biotechnol. 2011;29:816–823. doi: 10.1038/nbt.1948.
9. Crosetto N., Mitra A., Silva M.J., Bienko M., Dojer N., Wang Q., Karaca E., Chiarle R., Skrzypczak M., Ginalski K. Nucleotide-resolution DNA double-strand break mapping by next-generation sequencing. Nat. Methods. 2013;10:361–365. doi: 10.1038/nmeth.2408.
10. Chiarle R., Zhang Y., Frock R.L., Lewis S.M., Molinie B., Ho Y.-J., Myers D.R., Choi V.W., Compagno M., Malkin D.J. Genome-wide translocation sequencing reveals mechanisms of chromosome breaks and rearrangements in B cells. Cell. 2011;147:107–119. doi: 10.1016/j.cell.2011.07.049.
11. Tsai S.Q., Zheng Z., Nguyen N.T., Liebers M., Topkar V.V., Thapar V., Wyvekens N., Khayter C., Iafrate A.J., Le L.P. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 2015;33:187–197. doi: 10.1038/nbt.3117.
12. Canela A., Sridharan S., Sciascia N., Tubbs A., Meltzer P., Sleckman B.P., Nussenzweig A. DNA breaks and end resection measured genome-wide by end sequencing. Mol. Cell. 2016;63:898–911. doi: 10.1016/j.molcel.2016.06.034.
13. Lensing S.V., Marsico G., Hänsel-Hertsch R., Lam E.Y., Tannahill D., Balasubramanian S. DSBCapture: in situ capture and sequencing of DNA breaks. Nat. Methods. 2016;13:855–857. doi: 10.1038/nmeth.3960.
14. Chen F., Pruett-Miller S.M., Huang Y., Gjoka M., Duda K., Taunton J., Collingwood T.N., Frodin M., Davis G.D. High-frequency genome editing using ssDNA oligonucleotides with zinc-finger nucleases. Nat. Methods. 2011;8:753–755. doi: 10.1038/nmeth.1653.
15. Bialk P., Rivera-Torres N., Strouse B., Kmiec E.B. Regulation of gene editing activity directed by single-stranded oligonucleotides and CRISPR/Cas9 systems. PLoS ONE. 2015;10:e0129308. doi: 10.1371/journal.pone.0129308.
16. Soldner F., Laganière J., Cheng A.W., Hockemeyer D., Gao Q., Alagappan R., Khurana V., Golbe L.I., Myers R.H., Lindquist S. Generation of isogenic pluripotent stem cells differing exclusively at two early onset Parkinson point mutations. Cell. 2011;146:318–331. doi: 10.1016/j.cell.2011.06.019.
17. Nakagawa Y., Sakuma T., Nishimichi N., Yokosaki Y., Yanaka N., Takeo T., Nakagata N., Yamamoto T. Ultra-superovulation for the CRISPR-Cas9-mediated production of gene-knockout, single-amino-acid-substituted, and floxed mice. Biol. Open. 2016;5:1142–1148. doi: 10.1242/bio.019349.
18. Renaud J.-B., Boix C., Charpentier M., De Cian A., Cochennec J., Duvernois-Berthet E., Perrouault L., Tesson L., Edouard J., Thinard R. Improved genome editing efficiency and flexibility using modified oligonucleotides with TALEN and CRISPR-Cas9 nucleases. Cell Rep. 2016;14:2263–2272. doi: 10.1016/j.celrep.2016.02.018.
19. Schmidt M., Zickler P., Hoffmann G., Haas S., Wissler M., Muessig A., Tisdale J.F., Kuramoto K., Andrews R.G., Wu T. Polyclonal long-term repopulating stem cell clones in a primate model. Blood. 2002;100:2737–2743. doi: 10.1182/blood-2002-02-0407.
20. Gabriel R., Eckenberg R., Paruzynski A., Bartholomae C.C., Nowrouzi A., Arens A., Howe S.J., Recchia A., Cattoglio C., Wang W. Comprehensive genomic access to vector integration in clinical gene therapy. Nat. Med. 2009;15:1431–1436. doi: 10.1038/nm.2057.
21. Hacein-Bey-Abina S., von Kalle C., Schmidt M., Le Deist F., Wulffraat N., McIntyre E., Radford I., Villeval J.L., Fraser C.C., Cavazzana-Calvo M., Fischer A. A serious adverse event after successful gene therapy for X-linked severe combined immunodeficiency. N. Engl. J. Med. 2003;348:255–256. doi: 10.1056/NEJM200301163480314.
22. Deichmann A., Brugman M.H., Bartholomae C.C., Schwarzwaelder K., Verstegen M.M., Howe S.J., Arens A., Ott M.G., Hoelzer D., Seger R. Insertion sites in engrafted cells cluster within a limited repertoire of genomic areas after gammaretroviral vector gene therapy. Mol. Ther. 2011;19:2031–2039. doi: 10.1038/mt.2011.178.
23. Braun C.J., Boztug K., Paruzynski A., Witzel M., Schwarzer A., Rothe M., Modlich U., Beier R., Göhring G., Steinemann D. Gene therapy for Wiskott-Aldrich syndrome--long-term efficacy and genotoxicity. Sci. Transl. Med. 2014;6:227ra33. doi: 10.1126/scitranslmed.3007280.
24. Fischer A., Hacein-Bey-Abina S., Cavazzana-Calvo M. Gene therapy for primary adaptive immune deficiencies. J. Allergy Clin. Immunol. 2011;127:1356–1359. doi: 10.1016/j.jaci.2011.04.030.
25. Mullen C.A., Snitzer K., Culver K.W., Morgan R.A., Anderson W.F., Blaese R.M. Molecular analysis of T lymphocyte-directed gene therapy for adenosine deaminase deficiency: long-term expression in vivo of genes introduced with a retroviral vector. Hum. Gene Ther. 1996;7:1123–1129. doi: 10.1089/hum.1996.7.9-1123.
26. Muul L.M., Tuschong L.M., Soenen S.L., Jagadeesh G.J., Ramsey W.J., Long Z., Carter C.S., Garabedian E.K., Alleyne M., Brown M. Persistence and expression of the adenosine deaminase gene for 12 years and immune reaction to gene transfer components: long-term results of the first clinical gene therapy trial. Blood. 2003;101:2563–2569. doi: 10.1182/blood-2002-09-2800.
27. Radecke S., Radecke F., Peter I., Schwarz K. Physical incorporation of a single-stranded oligodeoxynucleotide during targeted repair of a human chromosomal locus. J. Gene Med. 2006;8:217–228. doi: 10.1002/jgm.828.
28. Ronaghi M., Karamohamed S., Pettersson B., Uhlén M., Nyrén P. Real-time DNA sequencing using detection of pyrophosphate release. Anal. Biochem. 1996;242:84–89. doi: 10.1006/abio.1996.0432.
29. Church G., Deamer D.W., Branton D., Baldarelli R., Kasianowicz J. Characterization of individual polymer molecules based on monomer-interface interactions. U.S. patent 5795782. 1998.
30. Kasianowicz J.J., Brandin E., Branton D., Deamer D.W. Characterization of individual polynucleotide molecules using a membrane channel. Proc. Natl. Acad. Sci. USA. 1996;93:13770–13773. doi: 10.1073/pnas.93.24.13770.
31. Clarke J., Wu H.-C., Jayasinghe L., Patel A., Reid S., Bayley H. Continuous base identification for single-molecule nanopore DNA sequencing. Nat. Nanotechnol. 2009;4:265–270. doi: 10.1038/nnano.2009.12.
32. Deamer D.W., Akeson M. Nanopores and nucleic acids: prospects for ultrarapid sequencing. Trends Biotechnol. 2000;18:147–151. doi: 10.1016/s0167-7799(00)01426-8.
33. History - About Us - Oxford Nanopore Technologies. https://nanoporetech.com/about-us/history.
34. Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
35. De Ravin S.S., Li L., Wu X., Choi U., Allen C., Koontz S., Lee J., Theobald-Whiting N., Chu J., Garofalo M. CRISPR-Cas9 gene repair of hematopoietic stem cells from patients with X-linked chronic granulomatous disease. Sci. Transl. Med. 2017;9:eaah3480. doi: 10.1126/scitranslmed.aah3480.
36. Radecke F., Peter I., Radecke S., Gellhaus K., Schwarz K., Cathomen T. Targeted chromosomal gene modification in human cells by single-stranded oligodeoxynucleotides in the presence of a DNA double-strand break. Mol. Ther. 2006;14:798–808. doi: 10.1016/j.ymthe.2006.06.008.
37. Brachman E.E., Kmiec E.B. DNA replication and transcription direct a DNA strand bias in the process of targeted gene repair in mammalian cells. J. Cell Sci. 2004;117:3867–3874. doi: 10.1242/jcs.01250.
38. Igoucheva O., Alexeev V., Yoon K. Mechanism of gene repair open for discussion. Oligonucleotides. 2004;14:311–321. doi: 10.1089/oli.2004.14.311.
39. Wu X.-S., Xin L., Yin W.-X., Shang X.-Y., Lu L., Watt R.M., Cheah K.S., Huang J.D., Liu D.P., Liang C.C. Increased efficiency of oligonucleotide-mediated gene repair through slowing replication fork progression. Proc. Natl. Acad. Sci. USA. 2005;102:2508–2513. doi: 10.1073/pnas.0406991102.
40. Olsen P.A., Randøl M., Luna L., Brown T., Krauss S. Genomic sequence correction by single-stranded DNA oligonucleotides: role of DNA synthesis and chemical modifications of the oligonucleotide ends. J. Gene Med. 2005;7:1534–1544. doi: 10.1002/jgm.804.
41. Aarts M., te Riele H. Parameters of oligonucleotide-mediated gene modification in mouse ES cells. J. Cell. Mol. Med. 2010;14(6B):1657–1667. doi: 10.1111/j.1582-4934.2009.00847.x.
42. Papaioannou I., Simons J.P., Owen J.S. Oligonucleotide-directed gene-editing technology: mechanisms and future prospects. Expert Opin. Biol. Ther. 2012;12:329–342. doi: 10.1517/14712598.2012.660522.
43. Modlich U., Bohne J., Schmidt M., von Kalle C., Knöss S., Schambach A., Baum C. Cell-culture assays reveal the importance of retroviral vector design for insertional genotoxicity. Blood. 2006;108:2545–2553. doi: 10.1182/blood-2005-08-024976.
44. Modlich U., Navarro S., Zychlinski D., Maetzig T., Knoess S., Brugman M.H., Schambach A., Charrier S., Galy A., Thrasher A.J. Insertional transformation of hematopoietic cells by self-inactivating lentiviral and gammaretroviral vectors. Mol. Ther. 2009;17:1919–1928. doi: 10.1038/mt.2009.179.
45. Hopert A., Uphoff C.C., Wirth M., Hauser H., Drexler H.G. Specificity and sensitivity of polymerase chain reaction (PCR) in comparison with other methods for the detection of mycoplasma contamination in cell lines. J. Immunol. Methods. 1993;164:91–100. doi: 10.1016/0022-1759(93)90279-g.
46. Radecke S., Radecke F., Cathomen T., Schwarz K. Zinc-finger nuclease-induced gene repair with oligodeoxynucleotides: wanted and unwanted target locus modifications. Mol. Ther. 2010;18:743–753. doi: 10.1038/mt.2009.304.
47. Kent W.J. BLAT--the BLAST-like alignment tool. Genome Res. 2002;12:656–664. doi: 10.1101/gr.229202.
48. R Development Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing; 2016. https://www.R-project.org/.
49. Crooks G.E., Hon G., Chandonia J.-M., Brenner S.E. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190. doi: 10.1101/gr.849004.
50. Oróstica K.Y., Verdugo R.A. chromPlot: global visualization tool of genomic data. Bioinformatics. 2016;32:2366–2368. doi: 10.1093/bioinformatics/btw137.


Articles from Molecular Therapy are provided here courtesy of The American Society of Gene & Cell Therapy
