Author manuscript; available in PMC: 2023 Oct 18.
Published in final edited form as: Nature. 2022 Aug 24;609(7926):384–393. doi: 10.1038/s41586-022-05059-4

Extended Data Fig. 2 | Additional analyses of off-target DNA binding from ChIP–seq data.


a, Scatter plot showing the correlation between two biological replicates of ChIP–seq experiments with 3×Flag-tagged Cas8 and crRNA-4. ChIP–seq enrichment values are plotted for n = 424 peaks called by MACS3 that were present in both datasets, as identified by overlapping peak start and end coordinates. A linear regression fit and the Pearson correlation coefficient (r) are shown; the on-target site is indicated. b, Scatter plot showing the correlation between off-target peaks observed in Cas8 and TnsC ChIP–seq experiments with crRNA-287. ChIP–seq enrichment values are plotted for n = 60 peaks called by MACS3 that were present in both datasets, as identified by overlapping peak start and end coordinates. A linear regression fit and the Pearson correlation coefficient (r) are shown; the on-target site was excluded because it is off-scale for TnsC. c, Histogram of the distance between Cas8 and TnsC peak summits for the overlapping peaks in b; most summit pairs lie within 20 bp of each other. d, Heat maps providing a global visualization of off-target ChIP–seq peaks with crRNA-291, plotted side by side for Cascade (left), TnsC (middle), and TnsB (right); data are shown as in Fig. 2b, c. A 2-kb window is plotted for each of 487 genomic loci (y axis), ordered by decreasing enrichment of the peaks called in the Cascade dataset; the RPKM scale bar is shown. The inset for TnsB highlights the strong enrichment at the on-target site (row 1) and the immediate drop-off at other Cascade off-target sites (rows 2–5). e, Global visualization of ChIP–seq peaks with crRNA-295, plotted as in d. f, Venn diagrams showing the overlap in off-target peaks called for Cascade, TnsC, and TnsB. Peaks were called individually for each ChIP–seq dataset using MACS3, and overlaps were analysed based on the coordinates of each peak. Data are shown for the same two crRNAs as in d and e.
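Panels a–c rest on three simple computations: pairing peaks from two datasets by interval overlap, computing a Pearson correlation and least-squares fit on their enrichment values, and measuring the distance between matched peak summits. The Python sketch below illustrates these steps under stated assumptions. It is not the authors' analysis code; the `Peak` record, the function names and the toy coordinates are hypothetical, and each peak is paired with the partner it overlaps most (the legend states only that peaks were matched by overlapping start and end coordinates).

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Peak:
    """Hypothetical minimal peak record, e.g. parsed from a MACS3 narrowPeak file."""
    chrom: str
    start: int         # 0-based interval start
    end: int           # exclusive interval end
    summit: int        # absolute genomic coordinate of the peak summit
    enrichment: float  # ChIP-seq enrichment value used on the scatter-plot axes


def overlap_bp(a: Peak, b: Peak) -> int:
    """Base pairs shared by two peaks (0 if disjoint or on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def match_peaks(peaks_a, peaks_b):
    """Assumed rule: pair each A peak with the B peak it overlaps most; drop A peaks with no overlap."""
    pairs = []
    for a in peaks_a:
        best = max(peaks_b, key=lambda b: overlap_bp(a, b), default=None)
        if best is not None and overlap_bp(a, best) > 0:
            pairs.append((a, best))
    return pairs


def pearson_and_fit(xs, ys):
    """Pearson r plus the ordinary least-squares slope and intercept of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    return r, slope, my - slope * mx


def summit_distances(pairs):
    """Absolute distance (bp) between the summits of each matched peak pair, as in panel c."""
    return [abs(a.summit - b.summit) for a, b in pairs]


# Hypothetical toy peaks standing in for Cas8 and TnsC MACS3 calls.
cas8 = [Peak("chr", 100, 300, 200, 10.0), Peak("chr", 1000, 1200, 1105, 4.0),
        Peak("chr", 5000, 5200, 5100, 2.0), Peak("chr", 9000, 9100, 9050, 1.5)]
tnsc = [Peak("chr", 150, 350, 210, 8.0), Peak("chr", 1100, 1300, 1110, 3.5),
        Peak("chr", 5150, 5400, 5130, 1.0), Peak("chr", 20000, 20100, 20050, 5.0)]

pairs = match_peaks(cas8, tnsc)
r, slope, intercept = pearson_and_fit([a.enrichment for a, _ in pairs],
                                      [b.enrichment for _, b in pairs])
dists = summit_distances(pairs)
print(len(pairs), round(r, 3), dists, sum(d <= 20 for d in dists))
```

On the toy data, three of the four Cas8 peaks find a TnsC partner, and two of those three summit pairs lie within 20 bp. For genome-scale peak sets, a sort-and-sweep or interval-tree overlap search would replace the quadratic scan used here for brevity.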
g, Multiple Cascade ChIP–seq datasets from experiments with distinct crRNAs share off-target peaks that contain a common motif, as identified by MEME analysis. Shown for each dataset (top to bottom) are the motif, displayed above the corresponding nucleotides at the 5′ end of the pseudo-crRNA spacer (left), and the motif probability graph, showing the probability of a motif match occurring at each position in the input sequence (right). The PAM and seed regions are indicated at the top, as is the lack of discrimination at position 6 (red asterisk/arrow). n, number of peaks contributing to the motif (and their percentage of all peaks called); E, E-value (statistical significance) of the motif reported by MEME. h, The motifs of the common off-target peaks in g can be ascribed to pseudo-crRNAs that use the spacer-like sequence downstream of the terminal repeat within the CRISPR array. The schematic shows the architecture of the CRISPR array encoded by pQCascade and the presumed mechanism of crRNA biogenesis. A pseudo-crRNA relies on a 5′-handle derived from the second repeat and a pseudo-spacer derived from the downstream sequence, which is common to all pQCascade vectors used in this study. Notably, the pseudo-crRNA is expected to lack the repeat-derived 3′-handle (the stem-loop bound by Cas6), is unlikely to have a single defined length, and is predicted to form a minimal Cascade complex lacking both Cas6 and TniQ.
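The motif probability graph in g summarizes where motif matches fall within the input peak sequences. MEME discovers the motif itself by expectation maximization, and the sketch below does not reproduce that step. Assuming the aligned motif sites are already known, it builds a simple log-odds position weight matrix (uniform background, pseudocount 0.5 and forward-strand scanning are all assumptions) and tabulates the fraction of sequences whose best-scoring match starts at each offset. The function names and sequences are hypothetical and only illustrate the idea of a positional match distribution.

```python
from collections import Counter
from math import log2

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5):
    """Log-odds PWM (bits vs. an assumed uniform 0.25 background) from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: log2((counts[b] + pseudocount) / total / 0.25) for b in BASES})
    return pwm


def best_site(pwm, seq):
    """(start offset, score) of the highest-scoring window; forward strand only, earliest offset wins ties.

    Non-ACGT characters (e.g. N) contribute a neutral score of 0.
    """
    seq = seq.upper()
    width = len(pwm)
    scored = (
        (i, sum(col.get(base, 0.0) for col, base in zip(pwm, seq[i:i + width])))
        for i in range(len(seq) - width + 1)
    )
    return max(scored, key=lambda hit: hit[1])


def site_position_distribution(pwm, seqs, min_score=0.0):
    """Fraction of input sequences whose best motif match (score >= min_score) starts at each offset."""
    hits = Counter()
    for seq in seqs:
        pos, score = best_site(pwm, seq)
        if score >= min_score:
            hits[pos] += 1
    return {pos: n / len(seqs) for pos, n in sorted(hits.items())}


# Hypothetical toy motif sites and peak sequences.
pwm = build_pwm(["GATTACA", "GATTACA", "GATAACA"])
peaks = ["TTGATTACATT", "TTGATTACACC", "CCCCCGATTACA"]
print(site_position_distribution(pwm, peaks))
```

In the toy input, two of the three sequences have their best match at offset 2 and one at offset 5. A fuller analysis would also scan the reverse complement and use a background model estimated from the genome rather than a uniform one.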