Abstract
Despite widespread use of CRISPR, comprehensive data on frequency and impact of Cas9-mediated off-targets in modified rodents is limited. Here, we present deep sequencing data from 81 mouse and rat genome editing projects and whole-genome sequencing data from 10 mouse embryos and their genetic parents. We compared the predictive ability of algorithms to an adapted version of GUIDE-seq, and we show that eSpCas9(1.1) and Cas9-HF1 reduce off-target mutation rates in vivo.
Several studies describe analysis of CRISPR off-targets in cell lines, and a number of methods have been developed that generate a list of off-targets in an unbiased manner: HTGTS1, GUIDE-seq2, Digenome-seq3, BLESS4 and most recently SITE-seq5 and CIRCLE-seq6. To our knowledge, these methods are not currently used for testing of sgRNAs prior to generating rodent models. Instead, analysis for off-target risk in rodent genome editing relies on CRISPR design algorithms7–9.
Most facilities generating genome-edited mice and rats apply a minimal set of guidelines to limit off-target effects. While guidelines are useful, even the most optimal sgRNAs have a predicted list of potential off-targets which is only valuable if followed up with experimental testing of the resulting animals. Due to limited understanding of long-range enhancers and other regulatory elements, potential off-targets outside annotated genes cannot be disregarded10. A key advantage of CRISPR for genome editing is the speed with which animal models with new mutations can be generated and avoiding the need for back-crossing to the original strain for more than one generation is desirable. Furthermore, the success of “knock-in” projects depends on proximity of the initial double-strand break to the target site11 and in some cases it is therefore necessary to compromise on sgRNA specificity.
Our CRISPR workflow for rodent genome editing projects is outlined in Supplementary Fig. 1a. A similar sequencing approach has been described previously12, but therein data from only five sgRNAs and 56 potential off-targets were described and no off-target mutations were identified. In addition to identification of lower frequency off-targets that might otherwise be missed (Supplementary Note 1), deep sequencing analysis of the mosaic G0 founders rather than their G1 progeny also allows for selection of founders with high contribution of the desired allele for more efficient breeding.
Twenty-three percent (19/81) of our animal model projects had off-targets (Fig. 1a) defined by at least one animal with at least one allele with Cas9-induced mutations in >3% of the sequence reads in at least one of the analyzed off-target loci. Some animal model projects require the use of two sgRNAs, for example to generate a large knock-out deletion. Eighteen percent of the sgRNAs (21/119, Supplementary Fig. 1b) had off-target activity. Eight of the 21 sgRNAs had activity at more than one predicted off-target. CRISPR/Cas9 activity was detected at 2% (32/1,423) of predicted off-target loci with informative sequence reads (Supplementary Fig. 1c). Fifty-six percent (18/32) of the off-targets were located in introns or within a conservatively defined promoter/UTR region (10 kb upstream or downstream) of known genes (Fig. 1b). A summary for each of the 21 off-target-positive sgRNAs and a complete list of all sgRNAs analyzed in this study with predicted off-targets are provided in Supplementary Figs. 2 and 3 and Supplementary Table 1, respectively.
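The project-level definition used here (at least one animal, at least one allele, more than 3% of reads, at least one analyzed off-target locus) can be written as a nested "any" test. The following is a minimal sketch only: the function name `project_has_off_target`, the nested-dictionary layout and the example values are hypothetical and do not reproduce the study's pipeline or data.

```python
def project_has_off_target(project, threshold=0.03):
    """Return True if any animal carries any mutant allele above `threshold`
    at any analyzed off-target locus.

    project: {animal_id: {off_target_locus: [mutant allele read fractions]}}
    (hypothetical layout, chosen for illustration).
    """
    return any(
        fraction > threshold
        for loci in project.values()        # each animal
        for alleles in loci.values()        # each off-target locus
        for fraction in alleles             # each mutant allele
    )


# Hypothetical example: one founder carries a 12% allele at locus "OT3".
example_project = {
    "founder_1": {"OT1": [0.01, 0.02]},
    "founder_2": {"OT1": [0.0], "OT3": [0.12]},
}
has_off_target = project_has_off_target(example_project)
```

Under this definition a single founder with a single qualifying allele is enough to classify the whole project as off-target-positive.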
Figure 1. Off-targets identified from CRISPR projects and in vivo reduction of off-target frequency by re-engineered Cas9.
a, Fraction of projects (animal models) with off-targets. b, Distribution of identified off-targets with respect to known coding genes. c-i, Box-and-whisker plots showing the distribution of 7/21 targets (black dots) and 9/32 off-targets (red dots), sorted by project. Y-axes show fraction of NGS-reads with evidence of Cas9 activity at each locus (one minus fraction of wildtype reads). c, EYFP, n=6 mice. n/d: no data; the EYFP sgRNA targets a mouse knock-in allele and on-target efficiency was analyzed by Sanger sequencing. d, Cflar, n=7 mice. e, Clu, n=5 mice. f, Dhps, n=15 mice. g, Ehmt2, n=14 mice. h, Gsdma, n=10 mice. i, Gsdmc, n=7 mice. All box-and-whisker plots depict min-max range, four quartiles, and center line represents median. Data points depict individual animals. OT: Off-target rank on top 15 list of predicted off-target locations. j, Percent mutation reads at the mouse Pnpla3 target and the originally identified 4 off-targets in vivo. Each datapoint represents one day of microinjection and NGS-analysis of pooled blastocysts. n=3 blastocyst pools comprised of the following number of blastocysts per day and condition; SpCas9_wt (n=45, 32, 56 blastocysts), SpCas9_1.1 (n=50, 51, 65 blastocysts), and SpCas9_HF1 (n=49, 45, 73 blastocysts). Off-target numbering as in Supplementary Fig. 3c. Un-paired two-tailed t-test for on-target means being identical (df=4): wt vs. 1.1, t=0.9025, p=0.42; wt vs. HF1, t=0.4260, p=0.69; 1.1 vs. HF1, t=0.3259, p=0.76. k, Percent mutation reads at the rat Map3k14 target and originally identified off-target in vivo. Each datapoint represents NGS-analysis of one rat embryo, SpCas9_wt n=17, SpCas9_1.1 n=30, SpCas9_HF1 n=24 embryos. Off-target numbering as in Supplementary Fig. 3g. Un-paired two-tailed t-test for on-target means being identical: wt vs. 1.1, df=45, t=3.488, p=0.0011; wt vs. HF1, df=39, t=10.66, p<0.0001; 1.1 vs. HF1, df=52, t=6.994, p<0.0001. For j, k, center line represents mean and error bars show SEM.
An overview of the 32 off-targets is provided in Supplementary Fig. 4 and the distribution of on- and off-target mutation frequencies is presented in Fig. 1c–i and Supplementary Fig. 5. A break-down of allele frequencies for all G0 founders is provided in Supplementary Figs. 6 and 7 and Supplementary Table 2. For 5 projects, G0 founders with off-target alleles were bred, and data for off-target allele transmission from these founders to their G1 progeny is provided in Supplementary Fig. 6. Our data demonstrates that even off-target alleles representing about 10% of reads can be transmitted to the next generation. Examples of genomic alignments are provided in Supplementary Fig. 8. The majority of Cas9-induced indel mutations would be small enough to be identified by our analysis13. Therefore, apart from the 113 predicted off-target loci lacking sequence data, it is unlikely that predicted and top-ranked off-target-positive loci were missed.
sgRNAs resulting in off-target activity are not more active than sgRNAs with no such activity (Supplementary Fig. 9a). We do not observe a strong correlation between specificity score and fraction of animals with off-target hits (r2=0.034, Supplementary Fig. 9b). Nevertheless, our data suggests, in agreement with previous observations9, a specificity score7 cut-off of 66. The odds-ratio of identifying an sgRNA without off-target mutations is 18 if the score is ≥66 (p=0.0001) (see Supplementary Note 2). Although the identified off-targets tend to rank higher among the top 15 predicted off-targets (Supplementary Fig. 9c), several were ranked lower, suggesting that true off-targets could be missed by restricting analysis to the top 15 predicted genomic sites.
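To illustrate how an odds ratio and a two-tailed Fisher's exact p-value of this kind are obtained from a 2×2 table (specificity score at or above versus below a cut-off, by sgRNAs without versus with off-target mutations), a minimal sketch follows. The counts are hypothetical and are not the study's data (those are described in Supplementary Note 2). The function names are illustrative, and the study itself used Prism rather than this code.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact p-value: sum the probabilities of all tables
    with the observed margins that are no more likely than the observed one."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )


# Hypothetical counts (NOT from the study):
# rows: score >= cut-off, score < cut-off; columns: no off-target, off-target.
toy_or = odds_ratio(8, 2, 1, 5)
toy_p = fisher_exact_two_tailed(8, 2, 1, 5)
```

Repeating the calculation for each candidate cut-off and keeping the score with the highest odds ratio mirrors the scan described in the statistical methods.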
Using two of the sgRNAs found to have off-targets (mouse Pnpla3 and rat Map3k14), we compared engineered Cas9 variants with improved specificity14, 15 to wildtype Cas9 in embryos (Fig. 1j and k) as well as in mouse and rat cell lines (Supplementary Fig. 10), respectively. Both Cas9 variants reduced the off-target mutation frequency. For Pnpla3 embryos we did not observe a reduction in on-target efficiency for the variants (Fig. 1j). In cells, eSpCas9(1.1) activity was slightly higher than both wt and SpCas9-HF1 (Supplementary Fig. 10a). For Map3k14, engineered Cas9 on-target efficiency was more variable and lower than wt Cas9 activity (Fig. 1k, Supplementary Fig. 10b), in agreement with other findings (Supplementary Note 3). We recommend that eSpCas9(1.1)14 and SpCas9-HF115 or other engineered Cas9 variants (e.g. HypaCas916) be considered for routine generation of animal models, especially for projects where lower on-target efficiency is acceptable.
As we observed four off-targets among the predicted top 15 for the mouse Pnpla3 sgRNA, and since the specificity score was very low (26.4), we next took an unbiased approach to identify all true off-targets for this sgRNA, allowing us to more precisely test the predictive value of existing algorithms. Using a mouse cell line, we first performed Target-Enriched GUIDE-seq (TEG-seq), a variant of GUIDE-seq2 adapted for the Ion Torrent sequencing platform (Online Methods) and identified 170 DNA tag insertion sites. Deep sequencing analysis of amplicons (ampli-seq) confirmed 105 of the sites as off-targets (Supplementary Table 3). As shown in Supplementary Fig. 11, 63 of the 105 ampli-seq validated off-targets overlap with our list of 10,360 CRISPOR9-predicted off-targets having up to 5 mismatches. Forty-two off-targets had 6–8 mismatches and some required bulging17 of base pairs for proper alignment (Supplementary Table 3).
It is unknown how well GUIDE-seq analysis predicts off-targets generated in vivo. We therefore performed in vitro fertilization (IVF) and microinjection of the Pnpla3 sgRNA and Cas9 mRNA into zygotes from C57BL/6J, followed by whole genome sequencing of 10 embryos and their genetic parents, at an average of 80x coverage (Online Methods). Filtering (Supplementary Table SN1) included subtraction of all variants found in the genetic parents and we identified 43 true Cas9-generated off-targets (Supplementary Fig. 12 and Supplementary Table 3). Examples of Pnpla3 off-target genomic alignments are provided in Supplementary Figure 13 and all 43 can be viewed from a data track hub on the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=max&hgS_otherUserSessionName=GenentechNoInsertion). Thirty of the 43 off-targets overlap with the validated TEG-seq hits, and 25/30 also overlap with the predicted off-targets (Supplementary Fig. 11, see also Supplementary Note 4.)
The distribution of the 43 off-targets among all 10 embryos is shown in Fig. 2a, Supplementary Figs. 14 and 15a. We observed a strong correlation (r2=0.85, p<0.0001) between the ampli-seq mutation frequencies and average embryo mutation frequency (Fig. 2b). The number of TEG-seq reads was less strongly correlated with average embryo mutation frequency (r2=0.58, p<0.0001, Supplementary Fig. 15b) and we did not observe a strong correlation between average embryo mutation frequency and MIT7 (r2=0.30, p=0.0001) or CFD8 (r2=0.38, p<0.0001) off-target scores (Supplementary Fig. 15c, d).
Figure 2. TEG-seq is a good predictor of in vivo activity.
a, Radar plots showing the distribution of on- and off-target activity across the target and 43 off-target loci identified by whole-genome sequencing. Off-target numbers are indicated on the periphery. Data from two embryos is shown (185, least number of off-targets, 231, most off-targets). Percent mutation read quartiles are indicated by grey circles. b, Correlation between ampli-seq mutation frequency in Neuro-2a cells and average embryo mutation frequency (n=119 loci, 118 off-targets, black dots, and Pnpla3 target, red dot). r2 value (coefficient of determination) is indicated (correlation coefficient r=0.9225. Two-tailed t-test: r value significantly different from zero, p<0.0001). c, box-and-whisker plot depicting number of off-targets with at least 10% mutation efficiency per embryo, n=10 mouse embryos. Plot shows min-max range, four quartiles, and center line represents median.
To calculate the ability of ampli-seq to predict true off-targets in vivo, we chose off-targets with at least 10% average mutation frequency in mosaic mouse embryos, since lower frequency alleles are less likely to be transmitted to the next generation (Supplementary Fig. 6). Fig. 2c shows the number of off-targets with ≥10% frequency, per embryo, but only 15/43 off-targets had an average frequency above 10%. All of these 15 off-targets were observed in at least two embryos. Of the 15 off-targets, one had 6 mismatches, two had 5 mismatches, and the remaining 12 had 1–4 mismatches. Supplementary Fig. 15e shows the fraction of the 15 off-targets captured at a given ampli-seq mutation frequency cut-off. With a 2.5% ampli-seq cut-off, 14 out of 15 off-targets would be identified (93%). Off-target 21 (Supplementary Table 3) was identified by TEG-seq but not subsequently confirmed by ampli-seq. By comparison, algorithm off-target score cut-offs of 0.1 (1,423 loci, MIT) or 0.2 (250 loci, CFD) are needed to capture the majority (73% and 93%, respectively) of these 15 hits (Supplementary Fig. 15 f, g, Supplementary Note 5).
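The capture fraction at a given ampli-seq cut-off is a simple proportion: of the in vivo off-targets above the embryo frequency threshold, how many also reach the cut-off in the cell-line assay. The sketch below shows the arithmetic on hypothetical site names and frequencies; `fraction_captured` is an illustrative name, and sites with no assay value are assumed to count as not captured.

```python
def fraction_captured(in_vivo_hits, assay_frequency, cutoff):
    """Fraction of in vivo off-targets whose assay mutation frequency
    (percent of reads) is at or above `cutoff`.

    Sites missing from `assay_frequency` are treated as 0% (an assumption).
    """
    if not in_vivo_hits:
        raise ValueError("no in vivo off-targets supplied")
    captured = [
        site for site in in_vivo_hits
        if assay_frequency.get(site, 0.0) >= cutoff
    ]
    return len(captured) / len(in_vivo_hits)


# Hypothetical values (percent mutant reads in the cell-line assay).
toy_hits = ["OT1", "OT2", "OT3", "OT4"]
toy_assay = {"OT1": 40.0, "OT2": 3.1, "OT3": 0.4}
toy_fraction = fraction_captured(toy_hits, toy_assay, cutoff=2.5)
```

With the numbers reported in the text, 14 of 15 captured sites corresponds to 14/15 ≈ 93%.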
While the Pnpla3 sgRNA produced a considerable number of off-target mutations, it probably represents a worst-case scenario (see Supplementary Note 6). By selecting sgRNAs with higher specificity scores and/or using Cas9 with increased fidelity, it should be possible to reduce the risk and impact of off-targets. However, to avoid unexpected phenotypes altogether, we also recommend prediction of off-targets using unbiased methods followed by ampli-seq screening of G0 founder animals.
METHODS
Methods, including statements of data availability and any associated accession codes and references are available in the online version of the paper.
ONLINE METHODS
Animals
All mice and rats were generated at Genentech and maintained in accordance with American Association of Laboratory Animal Care (AALAC) guidelines. The experiments were conducted in compliance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and were approved by the Genentech Institutional Animal Care and Use Committee (IACUC).
sgRNA design and off-target prediction
Depending on type of animal model to be engineered, sgRNAs in the genomic region of interest were identified initially (first 9 animal models) using a previously published scoring algorithm (crispr.mit.edu7) and subsequently a Molecular Biology CRISPR design tool (Benchling) that uses the same algorithm providing “MIT” specificity scores for each sgRNA as well as the top 50 predicted off-target loci and corresponding “MIT” off-target scores, that also highlights mismatches. The first edition of the Benchling tool masked repeat regions similar to the Hsu et al. tool. The current version of Benchling’s CRISPR design tool allows unmasking of repeat regions and includes any potential off-target hits in the top 50 list based on scores. Final sgRNAs used for editing were chosen based on a qualitative balance of specificity scores, distance to desired mutation/insertion and manual assessment of the off-target list. The off-target list assessment includes avoiding sgRNAs with potential hits in coding regions, avoiding sgRNAs with off-target hits on the same chromosome as the intended target, when possible avoiding any sgRNAs that had many predicted off-targets lacking mismatches in the seed region (10–12 nt proximal to PAM), and considering whether off-targets had NGG or NAG PAMs. Once an sgRNA decision was finalized, the off-target list was used to identify the top 11, and later top 15, off-target loci per sgRNA and NGS amplicon primers were designed for the on-target locus and each of the off-targets (more details below).
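Two of the manual checks described above, mismatches in the PAM-proximal seed region and the PAM type of a predicted off-target, are easy to express in code. The following is a minimal sketch under stated assumptions: the sequences are made up, the function names are hypothetical, a 12-nt seed is used as the default (the text gives 10–12 nt), and bulged alignments are not handled.

```python
def mismatch_positions(guide, off_target):
    """1-based positions (5' to 3') at which two equal-length protospacers differ."""
    if len(guide) != len(off_target):
        raise ValueError("sequences must have equal length")
    pairs = zip(guide.upper(), off_target.upper())
    return [i + 1 for i, (g, o) in enumerate(pairs) if g != o]


def seed_mismatches(guide, off_target, seed_len=12):
    """Number of mismatches within the PAM-proximal seed (last `seed_len` nt)."""
    seed_start = len(guide) - seed_len + 1
    return sum(1 for pos in mismatch_positions(guide, off_target) if pos >= seed_start)


def pam_class(pam):
    """Classify a 3-nt PAM as 'NGG', 'NAG' or 'other'."""
    pam = pam.upper()
    if len(pam) == 3 and pam[1:] == "GG":
        return "NGG"
    if len(pam) == 3 and pam[1:] == "AG":
        return "NAG"
    return "other"


# Made-up 20-nt protospacers: mismatches at position 2 (distal) and 20 (seed).
toy_guide = "GACGTTACCGGATCAAGCTT"
toy_site = "GTCGTTACCGGATCAAGCTA"
toy_positions = mismatch_positions(toy_guide, toy_site)
toy_seed_count = seed_mismatches(toy_guide, toy_site)
```

A predicted site with no seed mismatches and an NGG PAM would be weighted as a higher risk in this kind of manual assessment than one with several seed mismatches or an NAG PAM.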
Microinjection of mouse and rat embryos
sgRNAs were generated by MEGAshortscript T7 in vitro transcription (ThermoFisher, AM1354) according to standard methods (performed in-house or purchased directly from ThermoFisher). For all sgRNAs, a fragment was cloned into a plasmid backbone, sequence-verified, linearized and used as template for in vitro transcription. The scaffold for all sgRNAs in this study was 5’-gttttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgc-3’.
In vitro transcribed sgRNAs were purified (MEGAclear kit, ThermoFisher, AM1908) and purity QC was performed in one of three ways (Agilent Small RNA kit #5067–1548, Agilent RNA 6000 Nano kit #5067–1511, or Advanced Analytics Standard RNA Analysis kit #DNF-471). Using Bioanalyzer (Agilent #G2939B), a more accurate concentration was calculated as area under the expected peak curve, which, depending on sequence and strength of secondary structure, ranged from 75–105 nt. Reagent concentrations for microinjection: 25 ng/µl Cas9 mRNA (ThermoFisher, #A29378) + 13 ng/µl each sgRNA. For the whole genome sequencing project, pronuclear stage embryos from C57BL/6J mice (The Jackson Laboratory) were generated by in vitro fertilization (IVF). Three-week-old females were injected with 0.1 ml of HyperOva (Cosmo Bio)18 followed by 5 IU of human chorionic gonadotropin (hCG, NHPP) 47 hours later. IVF was performed 14 hours after the hCG injection in Modified Tyrode’s Solution (in g/L, 7.31 NaCl, 0.20 KCl, 0.04 NaH2PO4, 2.10 NaHCO3, 0.10 MgCl2·6H2O, 0.26 CaCl2·2H2O, 1.00 glucose, 0.5 Penicillin G, and 4 BSA) supplemented with 1.25 mM reduced glutathione (Sigma). Oocytes (126 and 103, respectively) collected from the two females underwent IVF in separate dishes with sperm from a single 11-week-old male. Spleen samples were collected from all three donor mice after oocyte or sperm collection, and the samples were snap frozen in liquid nitrogen and stored at −80°C until genomic DNA preparation. Approximately six hours after IVF, zygotes with two visible pronuclei were sorted using an inverted microscope (Nikon Eclipse, TS100). Groups of 25 to 50 pronuclear stage embryos were transferred into an injection slide/chamber containing M2 medium (Zenith Bio) supplemented with 5 µg/ml of Cytochalasin B (Sigma) for 10 minutes19. Subsequently, cytoplasmic injection was performed with a microinjection needle crafted using a P-97 micro pipette puller (Sutter Instruments).
Embryos surviving microinjection were incubated (37°C, 6% CO2) overnight in KSOM+AA (Zenith Bio) drops covered with mineral oil (Sigma). Each e0.5 pseudopregnant ICR female received 22 to 30 2-cell stage embryos by oviduct transfer surgery. The injection mixture was prepared the day of microinjection with RNAse- and DNAse-free reagents. Injection buffer: 10 mM Tris-HCl, 0.25 mM EDTA, pH8.0 (Fisher Scientific). Cas9 mRNA and sgRNA concentrations used are listed above. Embryos from each donor female were kept separate during IVF, microinjection and embryo transfer. Similar conditions were used for the additional microinjection projects, except natural mating rather than IVF was used to obtain zygotes.
Multiplex PCR Amplicon NGS sample Generation
On-target and off-target primer pairs were designed using NCBI Primer-Blast20 with the following modifications to the default setting: amplicon size 200–300, Tm min 59°C, Tm max 61°C, pair specificity against appropriate Genome (reference assembly). If primers could not be designed that generate a unique amplicon in the 200–300 bp range, repeat filter was turned off, avoiding low complexity regions was unselected, and amplicon size was broadened to 150–400 bp. Primers were not permitted to reside within a 50 nt region to either side of the expected sgRNA cut site to ensure coverage of varied deletion sizes. All on- and off-target primer pairs associated with each sgRNA were ordered as 50 µM RxnReady stocks (Integrated DNA Technologies). Multiplex PCR (1 animal = 1 reaction/sgRNA) was performed using a polymerase kit (Qiagen, #206143) and the manufacturer’s protocol was followed. A subset of early projects employed primer sets that amplified each on- and off-target region individually using Advantage GC 2 polymerase mix (Clontech, 639119) and reactions were pooled post-PCR. A single un-related wildtype animal control (appropriate species and strain) was routinely included to account for locus specific SNPs, indels, and problematic sequences not associated with CRISPR editing (data not included). When wildtype animals were not included, off-target-negative littermates served as negative controls and provided confirmation of reference sequence accuracy. Each multiplex PCR reaction (or reactions if using two sgRNAs) was purified with a DNA Clean & Concentrator-5 kit (ZymoResearch, D4004) and eluted in 25 µl H2O. Amplicon DNA concentrations were obtained using a Qubit High Sensitivity kit (Invitrogen, Q32854). For single sgRNA knock-out and some knock-in animal models, all G0 mosaic founders were screened by targeted amplicon deep sequencing. 
For dual sgRNA knock-out and remainder of knock-in animal models, G0 animals were first analyzed by PCR/gel electrophoresis or droplet digital PCR (BioRad), and only desired dropout deletion- or intended mutation-positive G0 mosaic founders were screened by targeted amplicon NGS.
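The primer placement rules described above (amplicon of 200–300 bp, no primer within 50 nt of either side of the expected cut site) can be checked programmatically once primer coordinates are known. This is a minimal sketch, not part of Primer-BLAST or of the study's workflow: `primer_pair_ok` is a hypothetical helper, and coordinates are assumed to be 0-based positions on the reference strand.

```python
def primer_pair_ok(fwd_start, fwd_len, rev_start, rev_len, cut_site,
                   min_amplicon=200, max_amplicon=300, exclusion=50):
    """Check amplicon size and primer distance from the expected cut site.

    The forward primer covers [fwd_start, fwd_start + fwd_len) and the reverse
    primer covers [rev_start, rev_start + rev_len) on the reference strand.
    """
    fwd_end = fwd_start + fwd_len
    rev_end = rev_start + rev_len
    amplicon_size = rev_end - fwd_start
    size_ok = min_amplicon <= amplicon_size <= max_amplicon
    # Neither primer may overlap the window [cut - exclusion, cut + exclusion).
    clear_of_cut = fwd_end <= cut_site - exclusion and rev_start >= cut_site + exclusion
    return size_ok and clear_of_cut


# Hypothetical coordinates: a 250-bp amplicon with the cut site near its center.
default_ok = primer_pair_ok(1000, 20, 1230, 20, cut_site=1120)
# Relaxed size range for loci where no unique 200-300 bp amplicon exists.
relaxed_ok = primer_pair_ok(1000, 20, 1340, 20, cut_site=1180,
                            min_amplicon=150, max_amplicon=400)
```

Keeping primers outside the exclusion window means that deletions extending up to about 50 nt from the cut site still leave both primer-binding sites intact.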
Preparation of libraries and deep sequencing
For library preparation of each sample, 66 ng of the PCR products were used with the Ovation Library system for Low Complexity Samples kit (NuGEN, #9092–256). Sequencing of the libraries was completed with the Illumina MiSeq or HiSeq 2500 in rapid mode and 200 cycle single-end runs with V2 chemistry reading dual barcodes.
Analysis of CRISPR on- and off-targets using NGS data
Sequencing reads were aligned to the genome (GRCm38/mm10 for mouse, RGSC 5.0/Rn5 for rat) using GSNAP21 as packaged in gmap-2014-11-14 with the following options: -m 5 -i 1 -N 1 -B 5 --split-output=alignment/gsnap -E 4 -n 10 -w 200000 --quality-protocol=sanger --format=sam -t 18. We only used the uniquely mapped reads for downstream analysis. InDel allele frequencies were computed by counting every type of InDel with a specific start position and length. Only reads that fully cross a 51 bp window around the predicted target site are counted. Reads containing only base mismatches likely introduced during PCR or sequencing were counted as wildtype reads. Unique mutant InDel alleles with >3% of total reads were flagged as potential off-targets by the analysis pipeline. Potential off-targets were then analyzed by visual inspection in IGV and only considered true off-targets if the InDel occurred at the expected position upstream of the PAM site. Mutant reads with a frequency <3%, including reads with additional SNPs likely resulting from PCR or the sequencing reaction in addition to an InDel, were pooled (=“sum of alleles <3%” in Supplementary Figs. 6 and 7). To identify potential Cas9-induced SNPs, the wt reads were visually inspected in IGV as SNPs would automatically be included in the wildtype bin.
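The counting logic described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: each read is represented by a hypothetical tuple of alignment start, alignment end and the InDels it carries inside the window; reads with several InDels are assumed to be counted as one combined allele; and parsing of the actual SAM output is omitted.

```python
from collections import Counter


def allele_frequencies(reads, cut_site, window=51):
    """Fraction of window-spanning reads supporting each InDel allele.

    reads: iterable of (aln_start, aln_end, indels) in 0-based, half-open
    reference coordinates; `indels` is a tuple of (ref_pos, length, kind)
    events inside the window, and is empty for wildtype or mismatch-only reads.
    """
    half = window // 2
    win_start, win_end = cut_site - half, cut_site + half + 1
    counts = Counter()
    for aln_start, aln_end, indels in reads:
        if aln_start > win_start or aln_end < win_end:
            continue  # read does not fully cross the window
        counts[tuple(indels) or "wt"] += 1
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}


def flag_alleles(frequencies, threshold=0.03):
    """Mutant alleles whose read fraction exceeds `threshold`."""
    return {a: f for a, f in frequencies.items() if a != "wt" and f > threshold}


# Hypothetical reads around a cut site at reference position 100.
toy_reads = (
    [(50, 150, ())] * 90                       # wildtype, spanning
    + [(50, 150, ((100, 2, "del"),))] * 6      # 2-bp deletion, spanning
    + [(50, 150, ((98, 1, "ins"),))] * 2       # 1-bp insertion, spanning
    + [(90, 150, ((100, 2, "del"),))] * 5      # not spanning, ignored
)
toy_freqs = allele_frequencies(toy_reads, cut_site=100)
toy_flagged = flag_alleles(toy_freqs)
```

In this toy example only the 2-bp deletion (6 of 98 counted reads) passes the 3% threshold; the insertion (2 of 98) would fall into the pooled "sum of alleles <3%" bin.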
Whole genome sequencing
Spleens from the male (#188) and the two female parents (#187 and #232), and all viable e10.5 embryos (#180–186 and #229–231) were harvested and digested completely for genomic DNA extraction (DNeasy Blood & Tissue kit, Qiagen, #69506). Genomic DNA was sheared to lengths between 200 and 700 bp using Covaris (LE220) and fragments ranging between 400–600 bp were selected for the libraries, prepared from 100 ng DNA using the Nano library kit (Truseq, FC-121–4001). Libraries were sequenced on Illumina HiSeq 4000 instruments with paired end 1×150 bp read length at an average of 80x coverage. Sequencing reads were aligned to the mouse genome (GRCm38/mm10) using BWA 0.7.10 as follows: bwa mem -t 32 -M. The alignments were then processed through the GATK pipeline to end up with a joint genotyped VCF file. All indel events with a GQ quality score of at least 80 in one of the offspring were then used as targets for our CRISPR analysis workflow described above using a 21 bp window and indel allele frequencies in each animal were computed. InDels present in at least one offspring and absent in parental mice were retained.
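The parental subtraction step can be illustrated with a small filter over per-sample calls. The sketch below is a simplification under stated assumptions: it takes a hypothetical dictionary of per-sample (variant present, GQ) pairs rather than a real VCF, it uses the sample numbers given above only as labels, and in the study presence or absence was assessed from read-level allele frequencies rather than from genotype calls alone.

```python
def de_novo_indels(calls, parents, offspring, min_gq=80):
    """Indels confidently present in at least one offspring and absent in all parents.

    calls: {indel_key: {sample_id: (has_variant, gq)}} (hypothetical layout).
    """
    kept = []
    for indel, by_sample in calls.items():
        in_parent = any(by_sample.get(p, (False, 0))[0] for p in parents)
        confident_in_offspring = any(
            by_sample.get(o, (False, 0))[0] and by_sample.get(o, (False, 0))[1] >= min_gq
            for o in offspring
        )
        if confident_in_offspring and not in_parent:
            kept.append(indel)
    return kept


# Hypothetical calls; keys are (chromosome, position, description).
toy_calls = {
    ("chr1", 100, "del2"): {"180": (True, 99), "188": (False, 99)},
    ("chr2", 200, "ins1"): {"181": (True, 99), "187": (True, 99)},   # inherited
    ("chr3", 300, "del5"): {"182": (True, 40)},                      # low GQ
}
toy_kept = de_novo_indels(
    toy_calls, parents=["187", "188", "232"], offspring=["180", "181", "182"]
)
```

Only the first toy indel survives: the second is also seen in a parent and the third does not reach the GQ threshold in any offspring.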
Cell culture
Hepa1–6 cells (ATCC, CRL-1830, derived from C57L/J mice22) were cultured in DMEM-high glucose with 10% FBS (Seradigm, #1500–100), 10 mM L-glutamine, penicillin (100 U/mL) and streptomycin (100 µg/mL) (Gibco, 15140–122) and used for mouse sgRNA studies. Rat-2 cells (ATCC, CRL-1764, derived from Fischer rat embryo23) were cultured in RPMI with 10% FBS (Seradigm, #1500–100), 10 mM L-glutamine, penicillin (100 U/mL) and streptomycin (100 µg/mL) (Gibco, 15140–122) and used for rat sgRNA studies. For the Cas9 variant study transfections, 3 × 10^5 cells were nucleofected (Lonza Nucleofector 4D) with 1 µg pRK-Cas9 (or variants eSpCas9(1.1)/SpCas9-HF1, cloned in the same configuration as wt Cas9 to ensure accurate comparison) and 0.5 µg sgRNA plasmids using solution SF + program EN-138 or solution SG + DS-189 for Hepa1–6 or Rat-2 cells, respectively. Cells were recovered post nucleofection in 100 µl RPMI media for 30 min to 1 hr prior to seeding in 6 well plates. Genomic DNA from cells was harvested with a Quick-gDNA Microprep kit (ZymoResearch, D3021) 5 days post-nucleofection.
Target-Enriched GUIDE-seq (TEG-seq)
TEG-seq is comparable to GUIDE-seq2. Both methods use protected short double-stranded DNAs to tag break sites in the genome. These tags are then used as universal primer sites for amplification and NGS mapping of the sequences flanking the break site. The TEG-seq protocol uses amplification and sequencing primer design that makes it specifically compatible with the Ion Torrent line of NGS platforms. Comparisons with data from Illumina-based GUIDE-seq experiments suggest that both methods yield similar results (Tang et al. manuscript submitted). TEG-seq was performed by ThermoFisher. Briefly, 150,000 Neuro-2a mouse cells (ATCC® CCL131™, derived from the A/J strain24) were transfected with Cas9 protein (1 µg) and a full-length synthetic Pnpla3 sgRNA (10 pmol, Synthego) complexed as ribonucleoprotein (RNP) along with a dsTag (1.25 pmol) using the Neon electroporation system, incubated for 3 days after which genomic DNA was isolated (PureLink® Genomic DNA, ThermoFisher) and subjected to the TEG-seq/GUIDE-seq procedure. Synthetic sgRNA was used for TEG-seq as we have found it to be more efficient than in vitro transcribed sgRNA for RNP electroporation. Amplicon reads were aligned to the reference mouse genome (GRCm38/mm10). The mapped reads were further processed through Motif-Search, a plugin software for off-target search and read count (ThermoFisher). For easier comparison, total reads from the different samples were normalized by using Reads Per Million (RPM): total reads from the sample multiplied by one million and divided by total number of mapped reads of the NGS run. The off-target candidates with RPM > 1 were subjected to Targeted Amplicon re-sequencing (Ampli-seq). Primers flanking the cleavage sites and used for Ampli-seq are listed in Supplementary Table 2. Off-targets were confirmed by detection of either the dsTag or presence of indels >3 bp in the expected position upstream of the PAM site.
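The RPM normalization and candidate selection described above amount to the following arithmetic. This is a minimal sketch with hypothetical site names and read counts; it does not reproduce the Motif-Search plugin, and the function names are illustrative.

```python
def reads_per_million(site_reads, total_mapped_reads):
    """Reads at a site scaled to one million mapped reads in the run."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return site_reads * 1_000_000 / total_mapped_reads


def select_candidates(site_counts, total_mapped_reads, min_rpm=1.0):
    """Sites whose RPM exceeds `min_rpm`, mapped to their RPM values."""
    rpm = {
        site: reads_per_million(count, total_mapped_reads)
        for site, count in site_counts.items()
    }
    return {site: value for site, value in rpm.items() if value > min_rpm}


# Hypothetical counts from a run with 2,000,000 mapped reads.
toy_candidates = select_candidates({"site_a": 50, "site_b": 1}, 2_000_000)
```

Here `site_a` has an RPM of 25 and would proceed to amplicon re-sequencing, whereas `site_b` (0.5 RPM) would not.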
Statistical analysis
All analysis was carried out using Prism 6 (Graphpad Software). All box-and-whisker plots (Figs. 1c–i, 2c, Supplementary Figs. 5 and 15a) show median, minimum to maximum range, and first, second, third and fourth quartiles. All data points are shown. Dot plots (Fig. 1j and k, Supplementary Figs. 9a, 10) show mean value and error bars indicate standard error of the mean (SEM). All data points are shown. For Supplementary Fig. 9a, average on-target values were calculated in Microsoft Excel for each sgRNA from individual animal on-target efficiencies for that sgRNA. Each data point thus represents the average on-target efficiency for each sgRNA and the means of the distributions represent the mean of average values. The distributions of on-target values for each individual sgRNA were not calculated. Two-tailed un-paired t-test was used to test for significance of difference between the means. For Figs. 1j, k and Supplementary Fig. 10, mean on-target values were compared in a pairwise manner using the two-tailed un-paired t-test for means being identical. For Fig. 2b, and Supplementary Figs. 15b, c, d, Pearson correlation coefficient (r) and r2 value (coefficient of determination) were computed. Odds ratios (Supplementary Fig. 9b) were calculated using Fisher’s exact test (two-tailed) for each specificity score and the score with the highest odds ratio (66, OR=18) was plotted in the figure.
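For readers who wish to reproduce the correlation statistics outside Prism, the Pearson coefficient and its square can be computed directly from paired values. The sketch below uses only the Python standard library and made-up input; `pearson_r` is an illustrative helper, not the software used in the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up paired frequencies (percent), for illustration only.
toy_r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
toy_r_squared = toy_r ** 2

# Consistency check on reported values: r = 0.9225 corresponds to r2 of about 0.85.
reported_r_squared = round(0.9225 ** 2, 2)
```

The last line shows that the correlation coefficient given in the Fig. 2b legend (r=0.9225) and the r2 value quoted in the text (0.85) agree.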
Figures
Plots and graphs were generated in Prism 6 (GraphPad Software). Tables included in Supplementary Figs. 2, 3, 4 and 12 were generated in Microsoft Word. IGV alignment data in supplementary figures were generated from snapshots of files loaded in IGV2.3.96. Radar plots were generated using ggplot225. All figures were annotated and assembled in Adobe Illustrator CS6.
Life Sciences Reporting Summary
Further information on experimental design is available in the Life Sciences Reporting Summary.
Data availability
Whole-genome sequencing, TEG-seq and ampli-seq data from this study is available through the NCBI Sequence Read Archive under accession number SRP124981 (https://www.ncbi.nlm.nih.gov/sra/SRP124981). The BAM files for the 43 off-targets identified from WGS can be viewed as a track in the UCSC browser (http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=max&hgS_otherUserSessionName=GenentechNoInsertion). The authors declare that all other data supporting the findings of this study are available within the paper and its Supplementary Information tables.
Plasmids
All plasmids used in this study are available from us upon request.
ACKNOWLEDGMENTS
We thank the Genentech animal core groups for animal care and preparation of genomic DNA; B. Haley and V. Dixit for helpful discussions; S. Seshagiri for NGS support; K. Kawamura, C. Reyes, and P.-Z. Tang (Thermo Fisher) for generating the TEG-seq data; A. Bruce for generating Supplementary Fig. 1. Most authors were Genentech employees at time of study and all studies were funded by Genentech, a member of the Roche Group. M. Haeussler was funded by NIH/NHGRI 5U41HG002371-15.
Footnotes
Any Supplementary Information and Source Data files are available in the online version of the paper.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
References
- 1. Frock RL et al. Genome-wide detection of DNA double-stranded breaks induced by engineered nucleases. Nat Biotechnol 33, 179–186 (2014).
- 2. Tsai SQ et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat Biotechnol 33, 187–197 (2014).
- 3. Kim D et al. Digenome-seq: genome-wide profiling of CRISPR-Cas9 off-target effects in human cells. Nat Meth 12, 237–243 (2015).
- 4. Crosetto N et al. Nucleotide-resolution DNA double-strand break mapping by next-generation sequencing. Nat Meth 10, 361–365 (2013).
- 5. Cameron P et al. Mapping the genomic landscape of CRISPR-Cas9 cleavage. Nat Meth 14, 600–606 (2017).
- 6. Tsai SQ et al. CIRCLE-seq: a highly sensitive in vitro screen for genome-wide CRISPR-Cas9 nuclease off-targets. Nat Meth 14, 607–614 (2017).
- 7. Hsu PD et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31, 827–832 (2013).
- 8. Doench JG et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat Biotechnol 34, 184–191 (2016).
- 9. Haeussler M et al. Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biol 17, 148 (2016).
- 10. Hnisz D et al. Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science 351, 1454–1458 (2016).
- 11. Paquet D et al. Efficient introduction of specific homozygous and heterozygous mutations using CRISPR/Cas9. Nature 533, 125–129 (2016).
- 12. Singh P, Schimenti JC & Bolcun-Filas E A mouse geneticist’s practical guide to CRISPR applications. Genetics 199, 1–15 (2015).
- 13. Dow LE et al. Inducible in vivo genome editing with CRISPR-Cas9. Nat Biotechnol 33, 390–394 (2015).
- 14. Slaymaker IM et al. Rationally engineered Cas9 nucleases with improved specificity. Science 351, 84–88 (2016).
- 15. Kleinstiver BP et al. High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490–495 (2016).
- 16. Chen JS et al. Enhanced proofreading governs CRISPR-Cas9 targeting accuracy. Nature 550, 407–410 (2017).
- 17. Lin Y et al. CRISPR/Cas9 systems have off-target activity with insertions or deletions between target DNA and guide RNA sequences. Nucleic Acids Res 42, 7473–7485 (2014).
- 18. Takeo T & Nakagata N Superovulation Using the Combined Administration of Inhibin Antiserum and Equine Chorionic Gonadotropin Increases the Number of Ovulated Oocytes in C57BL/6 Female Mice. PLoS ONE 10, e0128330 (2015).
- 19. Hu L-L et al. Cytochalasin B treatment of mouse oocytes during intracytoplasmic sperm injection (ICSI) increases embryo survival without impairment of development. Zygote 20, 361–369 (2012).
- 20. Ye J et al. Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics 13, 134 (2012).
- 21. Wu TD & Nacu S Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881 (2010).
- 22. Darlington GJ, Bernhard HP, Miller RA & Ruddle FH Expression of liver phenotypes in cultured mouse hepatoma cells. J Natl Cancer Inst 64, 809–819 (1980).
- 23. Topp WC Normal rat cell lines deficient in nuclear thymidine kinase. Virology 113, 408–411 (1981).
- 24. Klebe RJ & Ruddle FH Neuroblastoma: Cell culture analysis of a differentiating cell system. J Cell Biol 43, 69A (1969).
- 25. Wickham H ggplot2 (Springer New York, 2009). doi: 10.1007/978-0-387-98141-3