SUMMARY
We set out to exhaustively characterize the impact of the cis-chromatin environment on prime editing, a precise genome engineering tool. Using a highly sensitive method for mapping the genomic locations of randomly integrated reporters, we discover massive position effects, exemplified by editing efficiencies ranging from ~0% to 94% for an identical target site and edit. Position effects on prime editing efficiency are well-predicted by chromatin marks, e.g. positively by H3K79me2 and negatively by H3K9me3. Next, we developed a multiplex perturbational framework to assess the interaction of trans-acting factors with the cis-chromatin environment on editing outcomes. Applying this framework to DNA repair factors, we identify HLTF as a context-dependent repressor of prime editing. Finally, several lines of evidence suggest that active transcriptional elongation enhances prime editing. Consistent with this, we show we can robustly decrease or increase the efficiency of prime editing by preceding it with CRISPR-mediated silencing or activation, respectively.
Graphical Abstract

In Brief
Prime editing exhibits strong, chromatin-related position effects and is promoted by active transcriptional elongation. Leveraging these features via epigenetic conditioning of a locus modulates prime editing efficiency.
INTRODUCTION
Prime editing facilitates the precise installation of diverse genetic variants with minimal off-target effects1–5. The prime editor consists of a fusion of a Cas9 nickase and reverse transcriptase (RTase), while the prime editing guide RNA (pegRNA) specifies both the target site and desired edit. The high programmability and specificity of prime editing, together with its avoidance of double-stranded breaks (DSBs), offer considerable advantages over alternatives in the context of therapeutic genome editing6,7, molecular recording8–10 and the functional characterization of genetic variants11.
A counterpoint to this promise is that prime editing’s efficiency is highly variable and often low1,12–14. Relevant factors likely include: 1) properties of the prime editing ribonucleoprotein (RNP) complex; 2) the target sequence and edit type; 3) trans-acting factors, e.g. endogenous DNA repair proteins; and 4) the cis-chromatin environment of the target site. The first three factors have been explored to great effect, resulting in optimized and/or specialized prime editors1,13,15,16, sequence-based machine learning models that facilitate pegRNA design12,17–20, and MMR-targeting strategies that enhance prime editing efficiency1,13,14, respectively. However, how the cis-chromatin environment impacts prime editing remains largely unexplored.
It is well established that conventional Cas9-mediated genome editing is strongly influenced by chromatin, both through the steric effects of nucleosomes21,22 and through epigenetic effects on the balance of endogenous DNA repair pathways used to repair Cas9-mediated DSBs23–26. Although prime editing leverages a different set of endogenous DNA repair pathways than conventional Cas9-mediated genome editing, we hypothesized that its outcomes would also be strongly influenced by chromatin.
RESULTS
Design and efficient mapping of prime editing reporters
To measure cis-chromatin effects, we developed a strategy for efficiently mapping the precise genomic locations of randomly integrated reporters (Fig. 1A). The method relies on linear amplification of insertion junctions by in vitro transcription (IVT) on genomic DNA (gDNA) from a bacteriophage T7 promoter embedded within the reporter. In vitro transcribed RNAs are then reverse transcribed (RT), amplified by semi-specific PCR, and sequenced to associate reporter-specific barcodes with neighboring genomic sequences. Sequencing reads span insertion junctions, such that integration sites are mapped at base-pair (bp) resolution (Supplementary Fig. 1A). T7-assisted reporter mapping offers two key advantages over conventional reporter mapping approaches27–30. First, it eliminates digestion and ligation steps that are characterized by bias and/or suboptimal efficiency. Second, T7 polymerase linearly amplifies molecules carrying positional information before RT-PCR, which further increases sensitivity.
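As a rough illustration of the read-processing step, the Python sketch below splits a junction-spanning read into its reporter barcode and the adjacent genomic sequence. It assumes a hypothetical read layout of [barcode][constant reporter end][genomic DNA]; the anchor sequence and the `parse_junction_read` helper are placeholders, not the actual synHEK3 sequence or the authors' pipeline.

```python
from typing import Optional, Tuple

# Hypothetical placeholder for the constant reporter sequence lying between
# the barcode and the reporter end; NOT the actual synHEK3 sequence.
ANCHOR = "GATCGGAAGAGC"
BARCODE_LEN = 16


def parse_junction_read(read: str, anchor: str = ANCHOR,
                        bc_len: int = BARCODE_LEN,
                        min_flank: int = 20) -> Optional[Tuple[str, str]]:
    """Split a junction-spanning read into (barcode, genomic flank).

    Assumed layout: [barcode][anchor][genomic sequence ...].
    Returns None if the anchor is missing, the barcode is truncated,
    or too little genomic sequence remains for alignment.
    """
    idx = read.find(anchor)
    if idx < bc_len:
        return None  # anchor absent (-1) or barcode truncated
    barcode = read[idx - bc_len:idx]
    flank = read[idx + len(anchor):]
    if len(flank) < min_flank:
        return None
    return barcode, flank
```

The first base of the returned flank marks the insertion junction, which is what lets an alignment of the flank place each reporter at base-pair resolution.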
Figure 1. Efficient genome-wide mapping of integration sites of synHEK3 reporters.

A) Measuring cis-chromatin effects on prime editing efficiency. A library of synHEK3 reporters is randomly inserted throughout the genome. Genomic locations of individual reporters are determined with a T7-assisted reporter mapping method. The cis-chromatin contexts of mapped reporters are used to model prime editing outcomes as measured from each reporter. B) Genome browser view of a read pileup pinpointing the precise coordinates of a synHEK3 reporter integration. Barcode sequence, orientation and coordinates of the reporter are annotated. C) Motif enrichment analysis of 20-bp windows surrounding synHEK3 integration sites. D) Coverage plot of unique synHEK3 reporters identified in the bottlenecked pool (n = 4,273). Vertical bar lengths correspond to read counts. E) UpSet plot of genomic annotations of synHEK3 integration sites.
We developed a compact piggyBac-based (PB) prime editing reporter bearing the T7 promoter, the target sequence for a highly efficient pegRNA (HEK3), and a 16-bp degenerate barcode (BC). The final construct was 358 bp and lacked any known regulatory elements that could potentially interfere with the local chromatin environment (Supplementary Fig. 1A). We randomly integrated a complex library of these “synHEK3” reporters into a K562 cell line constitutively expressing the PE2 prime editor2 (Supplementary Fig. 1B). After piggyBac transposase was diluted out and all synHEK3 integrations were stable, we estimated by qPCR a mean of 15.5 reporter copies per cell (Supplementary Fig. 1C). We bottlenecked these cells to ~500 clones, each presumably containing a unique set of integrated reporters. Downstream work was performed on this bottlenecked population (Supplementary Fig. 1B).
To map the genomic locations of synHEK3 integrations, we performed T7-assisted reporter mapping. Subsets of aligned reads defined sharp boundaries when visualized on a genome browser, each corresponding to the precise insertion point of a synHEK3 reporter (Fig. 1B). After barcode error correction, we identified 10,095 insertion sites (Supplementary Fig. 1D). Saturation analysis indicated the mapping was near-complete (Supplementary Fig. 1E). Motif analysis of insertion junctions revealed the expected TTAA motif for piggyBac transposition31 (Fig. 1C). We removed sites lacking this motif (6.4%) and assigned the genomic coordinates of TTAA motifs as locations of individual reporters. Of note, only 4,273 mapped sites (42.3%) bore a unique barcode (Fig. 1D), while the remaining 5,177 (51.3%) bore one of 880 barcodes that recurred at multiple integration sites (Supplementary Fig. 1F). These recurrent barcodes are not due to limited barcode complexity in the plasmid library, but rather reflect piggyBac transposon excision and re-integration events occurring after DNA replication but before the transposase was fully diluted out32.
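The barcode error correction, TTAA filtering and detection of recurrent barcodes described above could be sketched as follows. The error-correction rule (collapsing a barcode into a strictly more abundant neighbor one substitution away) is a common heuristic assumed here, not necessarily the rule used in this study; all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def hamming1_neighbors(bc: str) -> Iterable[str]:
    """All sequences exactly one substitution away from bc."""
    for i, base in enumerate(bc):
        for alt in "ACGT":
            if alt != base:
                yield bc[:i] + alt + bc[i + 1:]


def correct_barcodes(counts: Counter) -> Dict[str, str]:
    """Map each barcode to its most abundant 1-mismatch neighbor if that
    neighbor is strictly more abundant; otherwise map it to itself."""
    mapping = {}
    for bc, n in counts.items():
        candidates = [(counts[nb], nb) for nb in hamming1_neighbors(bc)
                      if counts.get(nb, 0) > n]
        mapping[bc] = max(candidates)[1] if candidates else bc
    # Follow chains (a -> b -> c); each hop goes to a strictly more abundant
    # barcode, so this loop terminates.
    for bc in mapping:
        target = mapping[bc]
        while mapping[target] != target:
            target = mapping[target]
        mapping[bc] = target
    return mapping


def classify_sites(sites: List[Tuple[str, str, int, str]]):
    """sites: (corrected barcode, chrom, position, genomic flank).

    Keeps sites whose flank starts with the piggyBac TTAA target motif, then
    splits barcodes into those seen at one site and those seen at several.
    """
    locations = defaultdict(set)
    for bc, chrom, pos, flank in sites:
        if flank.startswith("TTAA"):
            locations[bc].add((chrom, pos))
    unique = {bc: next(iter(locs)) for bc, locs in locations.items() if len(locs) == 1}
    recurrent = {bc: sorted(locs) for bc, locs in locations.items() if len(locs) > 1}
    return unique, recurrent
```

Barcodes in the `recurrent` dictionary correspond to the class attributed above to excision and re-integration of the transposon rather than to limited library complexity.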
SynHEK3 reporters were distributed across all major genomic annotations, with a bias towards genic regions (Fig. 1E; Supplementary Fig. 1G,H). Compared to randomly selected genomic or TTAA sites, piggyBac integrations were most strongly enriched near active transcriptional start sites (TSS) and enhancers, and most strongly depleted from quiescent regions, which are mostly constitutive heterochromatin33 (Supplementary Fig. 1I). Given these biases, we sought to assess how broad a range of epigenetic environments was sampled by integrated reporters. For this, we computed epigenetic scores for 2-kb windows surrounding synHEK3 integrations for various chromatin features in K562 cells34 (Tables S2–S3), and compared these against equivalent scores for randomly selected genomic and TTAA sites. While synHEK3 integration sites tended to have higher scores for markers of open chromatin due to piggyBac integration bias, they nonetheless sampled the full dynamic ranges of the chromatin features surveyed, including those associated with poorly accessible chromatin environments (Supplementary Fig. 1J).
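A minimal sketch of a window-based epigenetic score, assuming the score is the base-weighted mean signal of a bedGraph-like track over ±1 kb around the integration site, with uncovered bases counted as zero. The exact scoring behind Tables S2–S3 may differ; `window_score` is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int, float]  # half-open [start, end), signal value


def window_score(site: int, track: List[Interval], half_width: int = 1000) -> float:
    """Base-weighted mean signal in [site - half_width, site + half_width).

    Bases not covered by any interval contribute zero signal.
    """
    w_start, w_end = site - half_width, site + half_width
    total = 0.0
    for start, end, value in track:
        overlap = min(end, w_end) - max(start, w_start)
        if overlap > 0:
            total += overlap * value
    return total / (w_end - w_start)
```

Applying the same function to randomly drawn genomic or TTAA positions gives the background distributions against which integration sites are compared.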
Impact of chromatin features on prime editing efficiency
To measure prime editing outcomes across different chromatin contexts, we transfected the bottlenecked cell population with a pegRNA designed to install a CTT insertion at the HEK3 target sequence1. After 4 days, we measured prime editing efficiencies for all uniquely barcoded reporters (Fig. 2A, left). This pegRNA drove 17% editing at the endogenous HEK3 locus. However, a broad range of efficiencies (~0% to 94%, mean: 19.8%) was observed for the HEK3 target when integrated at different genomic locations via the synHEK3 reporter, demonstrating that prime editing outcomes are highly influenced by position effects (Fig. 2A, right).
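Per-reporter editing efficiency is the fraction of amplicon reads from a given barcode that carry the intended edit. The sketch below assumes reads have already been split into (barcode, target-window sequence) pairs; the sequences are placeholders rather than the real HEK3 target, and the 10-read minimum is an arbitrary assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Placeholder sequences for illustration only (NOT the real HEK3 target):
# the edited allele carries a CTT insertion relative to the unedited one.
UNEDITED = "ACGTACGGTACC"
EDITED = "ACGTACCTTGGTACC"


def editing_efficiency(reads: Iterable[Tuple[str, str]], edited: str = EDITED,
                       min_reads: int = 10) -> Dict[str, float]:
    """Fraction of reads per barcode containing the edited allele.

    Reads with any other allele (unedited or byproducts) stay in the
    denominator; barcodes with fewer than min_reads reads are dropped.
    """
    edited_n = defaultdict(int)
    total = defaultdict(int)
    for barcode, seq in reads:
        total[barcode] += 1
        if edited in seq:
            edited_n[barcode] += 1
    return {bc: edited_n[bc] / n for bc, n in total.items() if n >= min_reads}
```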
Figure 2. Chromatin context has a major impact on prime editing efficiency.

A) Left: workflow of experiment. Right: density plot of CTT insertion frequency in all uniquely barcoded synHEK3 reporters (n = 4,273). Red line indicates prime editing efficiency (17%) at the endogenous HEK3 locus in K562 cells. B) Heatmap of fractions of highly editable (>25%) sites among synHEK3 reporters stratified by chromatin feature scores. SynHEK3 reporters are binned into 10 equally sized bins with increasing chromatin feature scores. The chromatin features are ordered left-to-right by their correlation coefficient (Spearman’s ρ) with prime editing efficiency. C) Scatter plot of observed vs. predicted prime editing efficiencies using reporters in a holdout test set. Points colored by the number of neighboring points. D) Scatter plot of Spearman’s ρ between chromatin feature scores and prime editing efficiencies, calculated separately for intragenic and intergenic reporters. E) Prime editing efficiency for gene-proximal reporters. Distance was scaled by gene length and binned. Negative values refer to synHEK3 sites located upstream of TSS. Values >100% refer to synHEK3 sites located downstream of transcription termination site (TTS). Points colored based on expression levels (log10) of the genes. TPM, transcripts per million. F) Genome browser views of the 4 most highly editable sites. Sites of integration and measured editing efficiencies are shown as a dot plot at top and aligned with selected epigenetic tracks. For each synHEK3 insertion, editing efficiency, number of reads with edit (numerator), and total number of reads (denominator) are annotated. The dashed vertical lines mark locations of the insertions. G) Scatter plot of sequence-based prediction (DeepPrime, x-axis) vs. normalized editing rate (y-axis, log10 scale) for epegRNAs designed for prime editing at endogenous genomic sites. H) Scatter plot of chromatin-based prediction (our model, x-axis) vs. normalized editing rate (y-axis, log10 scale). I) Scatter plot of combined score (x-axis) vs. normalized editing rate (y-axis, log10 scale).
We examined correlations between chromatin feature scores34 and prime editing efficiencies across synHEK3 genomic locations. The chromatin feature scores of H3K79me2, POLR2AS2, H3K9ac and HDAC2 were most positively correlated with prime editing efficiency, while H3K9me3 was most negatively correlated (Supplementary Fig. 2A,B). When we binned synHEK3 reporters based on each chromatin feature score, the same positive features were strongly associated with the proportion of highly edited sites (>25%; Fig. 2B).
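The two analyses above, a rank correlation per feature and the fraction of highly edited reporters (>25%) across ten equally sized bins of feature score (Fig. 2B), can be sketched in plain Python as follows. Spearman's ρ is computed as Pearson's r on average ranks, so ties are handled; the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # correlation undefined for constant input
    return cov / math.sqrt(var_a * var_b)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def fraction_high_by_bin(scores: Sequence[float], efficiencies: Sequence[float],
                         n_bins: int = 10, threshold: float = 0.25) -> List[float]:
    """Sort reporters by feature score, cut into n_bins equally sized bins
    (low to high score), and return the fraction above threshold per bin."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    fractions = []
    for b in range(n_bins):
        idx = order[b * len(order) // n_bins:(b + 1) * len(order) // n_bins]
        high = sum(1 for i in idx if efficiencies[i] > threshold)
        fractions.append(high / len(idx) if idx else float("nan"))
    return fractions
```

Each row of a Fig. 2B-style heatmap corresponds to one call of `fraction_high_by_bin` for one chromatin feature, with rows ordered by `spearman` against efficiency.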
Among all features examined, H3K79me2 levels were most strongly correlated with prime editing efficiency (Spearman’s ρ = 0.56; Supplementary Fig. 2A,B). The lowest and highest deciles of H3K79me2 scores exhibited median efficiencies of 1.7% and 54.5%, respectively, a 32-fold difference (Supplementary Fig. 2B). In a beta regression model trained on the chromatin feature scores, H3K79me2 (p = 1.47 × 10−12) and H3K9me3 (p = 1.15 × 10−14) were the best positive and negative predictors of efficiency, respectively (Table S2). This model accurately predicted prime editing efficiencies on a holdout test set (Spearman’s ρ = 0.62, Pearson’s r = 0.67, p < 2.2 × 10−16 for both; Fig. 2C).
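The beta regression's exact parameterization is not given here. The block below assumes the standard mean-precision form with a logit link, in which the efficiency y_i of reporter i, with chromatin feature scores x_{i1}, …, x_{ip}, has mean μ_i and precision φ; the boundary squeeze in the last line is one common way, assumed here, to handle efficiencies of exactly 0 or 1 across n reporters.

```latex
\begin{gather*}
y_i \sim \mathrm{Beta}\bigl(\mu_i\phi,\ (1-\mu_i)\phi\bigr), \qquad
\operatorname{logit}(\mu_i) = \log\frac{\mu_i}{1-\mu_i} = \beta_0 + \sum_{j=1}^{p}\beta_j x_{ij} \\
\mathbb{E}[y_i] = \mu_i, \qquad \operatorname{Var}(y_i) = \frac{\mu_i(1-\mu_i)}{1+\phi} \\
\ell(\boldsymbol{\beta},\phi) = \sum_{i=1}^{n}\Bigl[\log\Gamma(\phi) - \log\Gamma(\mu_i\phi)
  - \log\Gamma\bigl((1-\mu_i)\phi\bigr) + (\mu_i\phi-1)\log y_i
  + \bigl((1-\mu_i)\phi-1\bigr)\log(1-y_i)\Bigr] \\
y_i' = \frac{y_i(n-1) + 1/2}{n}
\end{gather*}
```

Under this form, a positive coefficient (as reported for H3K79me2) raises the expected efficiency on the logit scale and a negative one (as for H3K9me3) lowers it.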
Surprisingly, a small fraction (11%) of H3K79me2-high sites (top 10%) had near-zero editing efficiencies (Supplementary Fig. 2A). We surmised that some of these underedited outliers might be due to a technical artifact, wherein some clones lose expression of PE2 due to excision during synHEK3 integration, as both were integrated via piggyBac. To check this, we engineered two new pools of synHEK3 reporters in wild-type K562 cells, mapped reporter locations, and measured editing efficiencies with transiently expressed PE2 (Table S3). Consistent with our hypothesis, a smaller fraction of H3K79me2-high sites had near-zero editing efficiencies (4.7% vs. 11%). Importantly, chromatin features predicting high prime editing efficiencies were similar, and the beta regression model trained on the original data accurately predicted the editing efficiencies of these two independently generated sets of synHEK3 reporters (Spearman’s ρ = 0.71, Pearson’s r = 0.70, p < 2.2 × 10−16 for both; Supplementary Fig. 2C,D).
In mammalian cells, H3K79 methylation is solely deposited by DOT1L in a process coupled to RNA Pol II transcriptional elongation35–37. As H3K79me2 is overwhelmingly intragenic, a global analysis might mask differences in the interaction of chromatin and prime editing in intragenic vs. intergenic regions. We therefore repeated the correlation analysis and beta regression modeling on intragenic and intergenic synHEK3 reporters separately (Fig. 2D; Table S2). These analyses showed that H3K79me2’s value as a positive predictor is restricted to intragenic regions of the genome, while H3K9me3 is a strong negative predictor of prime editing efficiency in both intragenic and intergenic regions.
The ChIP-seq signal of H3K79me2 is more spread out and downstream-biased than that of H3K4me3, a marker of promoters (Supplementary Fig. 2E). We hypothesized that prime editing is most efficient in transcriptionally active regions, and particularly so immediately downstream of active TSS. To evaluate this, we stratified synHEK3 reporters by distance from the nearest TSS as well as the mRNA expression level of the overlapping or nearest gene. Prime editing efficiencies were indeed correlated with both expression levels and TSS proximity (Fig. 2E; Supplementary Fig. 2F–G). In addition, we identified a weak correlation between prime editing efficiency and transcript orientation, as synHEK3 sites on the same strand as the coding strand of transcription exhibited slightly higher editing efficiencies (p = 0.05; Supplementary Fig. 2H).
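The gene-length scaling used in Fig. 2E can be expressed as a strand-aware percentage of the distance from TSS to TTS, where negative values fall upstream of the TSS and values above 100 fall downstream of the TTS. A minimal sketch, assuming TSS and TTS are given in transcript orientation (so TSS > TTS for minus-strand genes); `scaled_gene_position` is hypothetical.

```python
def scaled_gene_position(site: int, tss: int, tts: int, strand: str) -> float:
    """Percent of gene length from TSS (0) to TTS (100), strand-aware.

    Negative: upstream of the TSS; > 100: downstream of the TTS.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    length = abs(tts - tss)
    if length == 0:
        raise ValueError("TSS and TTS must differ")
    # Distance measured in the direction of transcription.
    offset = site - tss if strand == "+" else tss - site
    return 100.0 * offset / length
```

Binning these values and coloring by the expression of the same gene reproduces the layout of Fig. 2E.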
To visualize the contrasting epigenetic environments, we prepared genome browser views of the top 4 highly editable synHEK3 sites (90–94% editing) and 4 poorly editable synHEK3 sites (~1% editing) (Fig. 2F; Supplementary Fig. 2I). The top sites are intragenic, in highly expressed genes, and within 3.5 kb of the TSS. The poorly editable examples are also intragenic, but within unexpressed genes and in some cases hundreds of kilobases from the TSS. Similar visualizations of the epigenetic environments of all synHEK3 sites from this experiment, together with their efficiencies, can be browsed at [link].
Impact of chromatin environment on prime editing in diverse endogenous target sites
To ask whether findings based on synHEK3 reporters generalize, we designed epegRNAs with DeepPrime19 to install 3-bp CCT insertions at 121 endogenous genomic target sites and tested these in K562 cells (predicted efficiencies 65–85%; Tables S1 and S4). Interestingly, the resulting editing efficiencies were uncorrelated with DeepPrime scores, probably because all epegRNAs were selected for high predicted efficiency, leaving a narrow range of scores (Spearman’s ρ = 0.03, p = 0.74, Pearson’s r = 0.05, p = 0.62; Fig. 2G). However, despite variation in guide and target sequences, editing efficiencies were well-predicted by the synHEK3-derived beta regression model (Spearman’s ρ = 0.56, p = 4.6 × 10−9, Pearson’s r = 0.52, p = 5.0 × 10−8; Fig. 2H; Supplementary Fig. 2J). Combining the DeepPrime score with our model did not further increase prediction accuracy (Spearman’s ρ = 0.57, p = 1.2 × 10−9, Pearson’s r = 0.53, p = 2.0 × 10−8; Fig. 2I). We conclude that the epigenetic factors shaping prime editing efficiencies at synHEK3 reporters also apply to endogenous target sites. Furthermore, a prediction model based on chromatin features can be used to rank the outputs of sequence-based pegRNA design tools12,18,19.
Differential position effects on Cas9 vs. prime editing
To the extent that the wide range of prime editing efficiencies at integrated synHEK3 reporters is due to differential accessibility for the prime editor, we would expect Cas9 editing to be similarly affected. We therefore transfected the K562 synHEK3 reporter pool with Cas9 RNP bearing a gRNA with the same HEK3-targeting spacer, and compared editing outcomes for prime vs. Cas9 editing across an identical set of genomic locations. Cas9 editing was more efficient than prime editing: frequent indels were observed as early as Day 1, 75% of reporters had indel frequencies >90% by Day 4, and Cas9 successfully edited synHEK3 reporters that were resistant to prime editing (Fig. 3A; Supplementary Fig. 3A). As expected, synHEK3 reporters near highly expressed genes were more frequently edited by Cas9. Reporters immediately downstream of TSSs of highly expressed genes had higher indel frequencies at Day 1, but the differences were negligible at later time points (Supplementary Fig. 3B).
Figure 3. Comparison between Cas9 and prime editing, leveraging a common set of integrated reporters.

A) Comparison of editing efficiencies for Cas9 (Day 1, 2 or 4 after transfection) vs. prime editing (Day 4 after transfection) at an identical set of synHEK3 reporters. Plots colored based on the number of reporters assigned to each bin. 1-D histograms of editing efficiencies plotted at top and right. B) Hierarchical clustering of synHEK3 reporters based on prime and Cas9 editing efficiencies. C) Density plot of prime and Cas9 editing efficiencies for 14 groups of synHEK3 reporters, ordered by mean prime editing efficiency. D) Bar graph of the log2 ratio between number of intragenic vs. intergenic sites in each of the 14 groups. E-G) Comparison of properties of intragenic sites in selected groups. P-value: two-sided Kolmogorov–Smirnov test. E) Boxplot of ATAC-seq scores of selected groups. F) Expression levels of the overlapping genes in TPM of selected groups (x-axis; log10 scale). G) Distance (bp) between the synHEK3 reporters in selected groups and the nearest TSS (x-axis; log10 scale).
To explore differences, we clustered synHEK3 reporters into 6 groups based on measured efficiencies of prime and Cas9 editing (Fig. 3B). The reporters in each group naturally clustered in PCA plots of chromatin feature scores, suggesting that the epigenetic environment shapes the editing efficiencies exhibited by members of each group (Supplementary Fig. 3C,D). We further clustered the larger, more highly edited groups (Groups 3–6) into subclusters, resulting in 14 clusters overall (Fig. 3B; Supplementary Fig. 3E).
While prime and Cas9 editing efficiencies were generally correlated, there were a few pairs of groups whose properties were informative beyond that correlation: 1) those highly edited by both prime and Cas9 editing (Groups 6.1 & 6.2); 2) those highly edited by Cas9 but poorly by prime editing (Groups 4.2 & 4.3); and 3) those poorly edited by both prime and Cas9 editing (Groups 1.0 & 2.0) (Fig. 3C).
Groups 6.1 and 6.2 exhibited similarly high Cas9 editing (mean: 96.1% vs. 96.8%, p = 0.1), but Group 6.2 exhibited significantly higher prime editing (mean: 69.3% vs. 82.5%, p = 6.6 × 10−16). Both Group 6.1 and Group 6.2 are overwhelmingly intragenic (Fig. 3D) and similarly accessible (p = 0.63; Fig. 3E). However, Group 6.2 sites are more highly expressed than Group 6.1 sites, suggesting that higher levels of transcription specifically promote prime editing (p = 2.8 × 10−3; Fig. 3F).
Groups 4.2 and 4.3 exhibited high levels of Cas9 RNP editing (mean: 86.4% and 89.9%) but low levels of prime editing (mean: 17.1% and 2.2%). Many synHEK3 sites in Group 4.3 exhibited near-zero prime editing efficiency, which we attributed to the technical artifact of piggyBac excision of the PE2 construct discussed above. In contrast, Group 4.2 sites tended to be intragenic and accessible (Fig. 3D,E), but lowly expressed (p = 8.3 × 10−14 for Group 4.2 vs. 6; Fig. 3F) and far from any TSS (p = 4.1 × 10−11 for Group 4.2 vs. 6; Fig. 3G). The fact that Group 4.2 sites are amenable to Cas9 editing but not to prime editing suggests that, while both Cas9 and prime editing benefit from chromatin accessibility, more active transcription specifically promotes prime editing.
The most poorly editable sites fell into Groups 1.0 and 2.0 (Fig. 3C). Both exhibited very low prime editing efficiencies (mean: 0.8% vs. 2.2%, p = 9.3 × 10−9), but Group 1.0 had substantially lower Cas9 editing (mean: 25.8% vs. 38.3%, p < 2.2 × 10−16). We suspect Group 1.0 corresponds to constitutive heterochromatin marked by higher H3K9me3 that remains silenced throughout the cell cycle and development38. In contrast, Group 2.0 is marked by higher H3K27me3 and might correspond to facultative heterochromatin, which is silenced upon differentiation39,40 (Supplementary Fig. 3F).
Finally, we examined the frequencies of Cas9-induced alleles inferred to derive from non-homologous end joining (NHEJ) vs. microhomology-mediated end joining (MMEJ) repair pathways. We found MMEJ usage was significantly higher in synHEK3 sites near H3K27me3 and H3K9me3, consistent with observations by Schep et al.24 (Supplementary Fig. 3G). Moreover, MMEJ usage increased as editing efficiencies decreased, with Group 1.0 sites exhibiting the highest MMEJ / (MMEJ + NHEJ) ratios (median: 0.54; Supplementary Fig. 3H). Within genes, MMEJ usage steadily increased along the gene body (Supplementary Fig. 3I,J), suggesting that chromatin exerts a graded effect on DNA break repair kinetics such that faster repair mechanisms are favored immediately downstream of TSSs24,41. We speculate that a similar TSS-distance gradient may affect which repair mechanisms ultimately resolve prime editing-induced single-strand breaks (SSBs) and intermediate structures.
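The MMEJ / (MMEJ + NHEJ) ratio depends on how alleles are assigned to pathways, which is not specified here. The sketch below assumes a common heuristic: deletions whose ends share at least 2 bp of microhomology are called MMEJ, and all other indels (including insertions) NHEJ. The threshold, allele representation and function names are assumptions.

```python
from typing import Iterable, Tuple

Allele = Tuple[str, int, int, int]  # (kind "del"/"ins", start, length, read count)


def microhomology_length(ref: str, start: int, length: int) -> int:
    """Length of sequence repeated at both ends of a deletion of
    ref[start:start + length], capped at the deletion length."""
    right = 0  # deleted prefix that reappears just after the deletion
    while (right < length and start + length + right < len(ref)
           and ref[start + right] == ref[start + length + right]):
        right += 1
    left = 0  # deleted suffix that also occurs just before the deletion
    while (left < length and start - left - 1 >= 0
           and ref[start + length - left - 1] == ref[start - left - 1]):
        left += 1
    return min(left + right, length)


def mmej_fraction(alleles: Iterable[Allele], ref: str, min_mh: int = 2) -> float:
    """MMEJ / (MMEJ + NHEJ), weighting each allele by its read count."""
    mmej = nhej = 0
    for kind, start, length, n_reads in alleles:
        if kind == "del" and microhomology_length(ref, start, length) >= min_mh:
            mmej += n_reads
        else:
            nhej += n_reads
    total = mmej + nhej
    return mmej / total if total else float("nan")
```

Capping the microhomology at the deletion length keeps 1-bp deletions in homopolymers from being counted as MMEJ under the 2-bp threshold.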
Investigating the cis-chromatin context-dependencies of trans-acting factors influencing prime editing
DNA damage repair (DDR) is tightly regulated by local chromatin context42. To explore mechanisms mediating chromatin context-specific regulation of prime editing, we sought to develop a multiplex perturbational framework with which we could capture differences in prime editing outcomes in various chromatin environments upon knocking down various DDR-relevant factors. Potential strategies for this goal include Repair-seq13 or well-based screening of pre-integrated DNA repair reporters43. However, the former averages over genomic contexts, while the latter does not scale effectively. We therefore designed a DDR-focused genetic screen, in which perturbations are coupled to outcomes at pre-integrated synHEK3 reporters with single cell molecular profiling44–46. To co-profile the perturbation(s) received by a given cell and prime editing outcomes at its synHEK3 reporters, we sought to use T7 in situ transcription (IST)47,48 to drive RNA production from both perturbational and reporter constructs in fixed, permeabilized nuclei, and to then capture these molecules with sci-RNA-seq3, a scalable method for single cell transcriptional profiling49.
We focused on 22 and 28 uniquely barcoded synHEK3 reporters in two monoclonal K562 lines derived from the original pool (‘clone 3’ and ‘clone 5’), whose integration sites spanned a range of chromatin environments (Fig. 4A; Table S3). Prime editing efficiencies in these monoclonal lines were highly correlated with those estimated for the same reporters in the original polyclonal population (Supplementary Fig. 4A).
Figure 4. Dissecting chromatin context-dependent regulation of prime editing using a modified sci-RNA-seq3 workflow.

Experimental workflow of the pooled shRNA screen. A) The two monoclonal K562 lines used in this experiment stably expressed PE2 and reverse tetracycline transactivator (rtTA), and together contained 50 unique synHEK3 reporters. Cells were transduced with the TRE-shRNA library at a high multiplicity of infection (MOI) and treated with doxycycline. On Day 2, cells were transfected with pegRNAs to introduce random 6-bp insertions at synHEK3 reporters. After 3–4 days, nuclei were extracted and fixed. TRE: tetracycline response element. B) Fixed nuclei were subjected to IST with T7 polymerase (pink circle) to produce transcripts from synHEK3 and shRNA constructs. C) Nuclei were distributed into 96-well plates for indexed RT. In each well, a cocktail of three indexed RT primers was used: oligo-dT primers and synHEK3- and shRNA-specific primers. D) After RT, nuclei were pooled and redistributed into 96-well plates for indexed hairpin ligation. They were then pooled and split into final 96-well plates. After second-strand synthesis, nuclei were lysed and the resulting lysates were split into two plates. One plate underwent Tn5 tagmentation and indexed PCR to generate a transcriptome library. The other plate was used for indexed enrichment PCR targeting the synHEK3 and shRNA transcripts. E) For each synHEK3 reporter, editing outcomes were computed and compared between cells with vs. without a specific shRNA.
To perturb trans-acting factors, we chose a doxycycline (Dox)-inducible short hairpin RNA (shRNA) system, which can trigger a potent knockdown effect at single copy and is orthogonal to prime editing50,51. We constructed a library of 304 shRNAs against 76 genes, including 74 DDR-related genes (10 unexpressed) and 2 luciferase genes (Table S5). The DDR-related, expressed genes comprised hits found by Repair-seq13,52, genes in other major DDR pathways, and epigenetic factors involved in H3K79me2 metabolism (Supplementary Fig. 4B). To enable post-fixation enrichment and targeted capture of the shRNA transcripts, we modified the lentiviral construct to contain a T7 promoter upstream of the shRNA and an RT primer binding site (PBS) (Supplementary Fig. 4C).
We chose to model 6-bp insertions for several reasons. First, random insertions minimized potential biases due to the insertion sequence18. Second, 6-bp insertions achieved high baseline editing efficiencies (~20% across 50 synHEK3 sites), increasing signal-to-noise. Third, the random 6-mers serve as unique molecular identifiers (UMIs) for counting unique prime editing events at each integration site from sequencing data.
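A sketch of how the random 6-mers could serve as UMIs: for each reporter barcode, reads are parsed for the sequence between two flanks surrounding the insertion point, and distinct 6-bp inserts are counted as independent editing events. The flank sequences and function name are placeholders. With only 4,096 possible 6-mers, two independent edits at the same reporter occasionally share a sequence, so distinct counts slightly underestimate events at heavily edited sites.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Placeholder flanks around the pegRNA insertion point (NOT real sequences).
LEFT, RIGHT = "ACGTAC", "GGTTCC"


def summarize_insertions(reads: Iterable[Tuple[str, str]], left: str = LEFT,
                         right: str = RIGHT, ins_len: int = 6) -> Dict[str, dict]:
    """Per reporter barcode: distinct inserted 6-mers (unique edits),
    reads carrying a 6-bp insertion, and unedited reads."""
    pattern = re.compile(re.escape(left) + "([ACGTN]*?)" + re.escape(right))
    umis = defaultdict(set)
    edited_reads = defaultdict(int)
    unedited_reads = defaultdict(int)
    for barcode, seq in reads:
        match = pattern.search(seq)
        if match is None:
            continue  # flanks not found: uninformative read
        insert = match.group(1)
        if insert == "":
            unedited_reads[barcode] += 1
        elif len(insert) == ins_len and "N" not in insert:
            umis[barcode].add(insert)  # the random 6-mer doubles as a UMI
            edited_reads[barcode] += 1
    barcodes = set(umis) | set(unedited_reads)
    return {bc: {"unique_edits": len(umis[bc]),
                 "edited_reads": edited_reads[bc],
                 "unedited_reads": unedited_reads[bc]} for bc in barcodes}
```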
To co-capture perturbations and prime editing outcomes, cells were profiled via a modified sci-RNA-seq3 workflow (Fig. 4A). The first modification occurred before the RT step, in that we performed T7 IST on methanol-fixed nuclei to drive RNA expression from both synHEK3 reporters and the shRNA construct (Fig. 4B). Second, at the RT step, a primer cocktail targeting mRNAs (oligo-dT), synHEK3 and shRNA transcripts was used to simultaneously capture the transcriptome, editing outcome, and perturbation(s) in a given cell (Fig. 4C). Finally, towards the end of the protocol, nuclear lysate was split; half was subjected to conventional sci-RNA-seq3 library preparation, and half to enrichment PCR targeting the synHEK3 and shRNA-derived transcripts (Fig. 4D).
For each experiment, three libraries were sequenced, corresponding to the cellular transcriptome, shRNA integrants and synHEK3 integrants (Supplementary Fig. 4D). Cellular transcriptomes were used to determine high-quality cells, and captured shRNA and synHEK3 transcripts were assigned to these cells based on matching combinatorial indices. Thus, for each profiled cell, we obtained the identity of the shRNA(s) present in that cell as well as the editing outcome at each of its 22 (clone 3) or 28 (clone 5) synHEK3 sites. We then applied a quantitative trait locus (QTL)-style analysis to this matrix: for each synHEK3-shRNA pair, we compared the editing frequency at the synHEK3 reporter between cells that did vs. did not receive the shRNA (Fig. 4E).
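The assignment and QTL-style tabulation described above can be sketched as follows, assuming each cell is identified by its tuple of RT, ligation and PCR indices and that each cell-reporter pair has already been reduced to a single edited/unedited call. Only cells that pass transcriptome QC are kept, and only cells with at least one detected shRNA enter the comparison; all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

CellKey = Tuple[str, str, str]  # (RT index, ligation index, PCR index)


def assemble_cells(cells: Set[CellKey],
                   shrna_reads: Iterable[Tuple[CellKey, str]],
                   reporter_calls: Iterable[Tuple[CellKey, str, bool]]):
    """Attach shRNA identities and per-reporter edit calls to QC-passing cells."""
    shrnas: Dict[CellKey, Set[str]] = defaultdict(set)
    edits: Dict[CellKey, Dict[str, bool]] = defaultdict(dict)
    for cell, shrna in shrna_reads:
        if cell in cells:
            shrnas[cell].add(shrna)
    for cell, reporter, edited in reporter_calls:
        if cell in cells:
            edits[cell][reporter] = edited
    return shrnas, edits


def pair_counts(shrnas, edits, reporter: str, shrna: str):
    """(edited, total) at `reporter` for cells with vs. without `shrna`."""
    with_shrna = [0, 0]
    without = [0, 0]
    for cell, calls in edits.items():
        if reporter not in calls or not shrnas.get(cell):
            continue  # no call at this reporter, or no shRNA detected
        bucket = with_shrna if shrna in shrnas[cell] else without
        bucket[0] += calls[reporter]  # bool counts as 0/1
        bucket[1] += 1
    return tuple(with_shrna), tuple(without)
```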
Identification of trans-acting regulatory factors of prime editing
We recovered 667,810 high-quality cells as assessed by complexity of the cellular transcriptome. SynHEK3-derived reads assigned to the two clones were well-separated, indicating a low collision rate (Supplementary Fig. 5A). After quality control filters, we retrieved 229,657 cells (105,583 and 124,074 for clones 3 and 5, respectively), for which at least one shRNA and one synHEK3 integrant were detected. For clones 3 and 5, respectively, the median numbers of cells per shRNA were 1,604 and 2,208, shRNAs detected per cell were 4 and 5, and synHEK3 BCs detected per cell were 7 and 11 (Supplementary Fig. 5B). To determine whether we could reliably estimate editing efficiency with this approach, we sampled a small fraction of the cells in this screen and performed amplicon sequencing of the synHEK3 reporters. Prime editing efficiencies estimated from single cell data were almost perfectly correlated with those estimated by bulk amplicon sequencing (Supplementary Fig. 5C).
To detect shRNA-mediated knockdown effects on prime editing outcomes, we calculated and compared editing efficiencies in cells with vs. without a given shRNA (binomial test). Across all synHEK3-shRNA combinations in the two monoclonal lines, we observed a clear excess of significant synHEK3-shRNA pairs compared to pairs involving control shRNAs (Fig. 5A). 337/7,168 and 250/5,632 synHEK3-shRNA pairs were significant at a false discovery rate (FDR) of 5% in clone 5 and clone 3, respectively. This excess of highly significant p values does not appear to be driven by differences in the number of cells associated with candidate-targeting vs. control shRNAs (Supplementary Fig. 5D).
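A minimal sketch of the per-pair test, assuming that the edited count among cells carrying a given shRNA is compared by an exact two-sided binomial test against the editing rate observed in cells lacking it, followed by Benjamini-Hochberg adjustment across all pairs. The two-sided p value sums the probabilities of all outcomes no more likely than the observed one (the convention of R's `binom.test`); function names are hypothetical.

```python
import math
from typing import List


def binom_logpmf(k: int, n: int, p: float) -> float:
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_test_two_sided(k: int, n: int, p: float) -> float:
    """Total probability of outcomes no more likely than k under Bin(n, p)."""
    log_pk = binom_logpmf(k, n, p)
    total = 0.0
    for i in range(n + 1):
        lp = binom_logpmf(i, n, p)
        if lp <= log_pk + 1e-7:  # small tolerance for floating-point ties
            total += math.exp(lp)
    return min(1.0, total)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """BH-adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Pairs with an adjusted value below 0.05 would be called significant at 5% FDR. Working in log space via `math.lgamma` avoids overflow in the binomial coefficients for the thousands of cells per shRNA reported above.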
Figure 5. Effects of perturbing MMR-related genes on prime editing.

A) Q-Q plot of statistical significance (-log10) of synHEK3-shRNA pairs in clones 3 (left) and 5 (right). Candidate shRNAs (green) and control shRNAs (gray) are plotted separately. B) Plots of adjusted p values (-log10) of all synHEK3-shRNA pairs. Target genes with high statistical significance are annotated. Points colored by editing efficiency changes caused by corresponding shRNAs. C) Effects of shRNAs targeting MMR-related genes in clone 5. Log2 fold-changes of prime editing efficiencies of synHEK3-shRNA pairs are plotted and colored by their corresponding adjusted p values (-log10). D) Effects of shRNAs against MLH1 and PMS2. Pink lines: editing frequencies in cells with individual shRNAs; red line: mean editing frequencies of the gene-targeting shRNAs; light blue lines: control editing frequencies for individual shRNAs (not visible because their variance is low relative to the mean line, shown in blue); blue line: mean control editing frequencies.
shRNAs targeting several genes were both highly significant and reproducible across the two clones (Fig. 5B). The most prominent targets include major components of the MMR pathway (PMS2 and MLH1), previously reported to influence prime editing outcomes13,14. To better visualize these effects, we consolidated synHEK3-shRNA pairs targeting each MMR-related factor (Fig. 5C; Supplementary Fig. 5E). The strongest prime editing-promoting effects were observed for shRNAs against PMS2 and MLH1 (Fig. 5D), homologs of bacterial MutL that form the MutLα heterodimer and coordinate multiple repair steps after mismatch recognition53. Knocking down FEN1, a 5’ DNA flap endonuclease, led to strong suppression of prime editing (Supplementary Fig. 5F), consistent with its reported role13. Perturbing MSH2 and MSH6, which form the MutSα heterodimer (homologous to bacterial MutS)53, did not yield consistent effects, with MSH6 knockdown even leading to a slight decrease in prime editing efficiency in clone 5. The MSH2/MSH6 dimer recognizes 1–2-bp mismatches, while the MSH2/MSH3 dimer detects longer indels (>2 bp). We speculate that knocking down MSH6 releases sequestered MSH2 to pair with MSH3, allowing for more efficient detection and correction of the 6-bp insertions. The mild effect of MSH2 knockdown might suggest that a different mechanism is used to recognize 6-bp insertions. Knocking down other MMR-related factors (e.g. EXO1, LIG1, SSBP1, MSH4) did not yield consistent effects (Fig. 5C; Supplementary Fig. 5E).
In addition to MMR genes, targeting EP300 decreased prime editing efficiencies across all sites (Supplementary Fig. 5G). EP300 encodes the p300 transcriptional coactivator54, and its direct involvement in the DDR remains unclear. We cannot rule out the possibility that its effects here result from downregulation of global transcriptional output, including expression of the prime editor complex.
To our surprise, knocking down either the H3K79 methyltransferase (DOT1L) or the H3K79 demethylase (KDM2B)55 did not significantly change prime editing efficiencies (Supplementary Fig. 5H). To confirm this result, we performed bulk RNA sequencing and measured prime editing efficiencies in synHEK3-bearing K562 cells treated with EPZ-5676, a potent DOT1L inhibitor56. Expression levels of genes containing synHEK3 insertions in the single-cell screen were not significantly affected, despite massive expression changes induced by EPZ-5676 (Supplementary Fig. 5I). There was a global reduction in prime editing efficiency, but changes in prime editing efficiency at intragenic synHEK3 reporters were not significantly related to either baseline H3K79me2 levels or gene expression changes (Supplementary Fig. 5J). These results suggest that H3K79me2 is a bystander of high prime editing efficiency rather than playing a direct role.
HLTF works as a chromatin context-dependent repressor of prime editing
Another strong hit in the screen was HLTF, a member of the SWI2/SNF2 family that contains an E3 ubiquitin ligase domain. HLTF mediates polyubiquitination of PCNA, which promotes error-free replication through DNA lesions57. A previous report showed that knocking down HLTF led to a slight decrease in the frequency of prime editing-mediated single-nucleotide substitutions13. In contrast, we observed strong upregulation of 6-bp insertions, comparable to the changes induced by shRNAs against MLH1 and PMS2 (Fig. 6A), suggesting that HLTF acts through different mechanisms in the repair of short vs. longer mismatches.
Figure 6. Chromatin context-specific response to HLTF inhibition.

A) Violin plot of fold-changes in editing efficiency of synHEK3 sites in response to inhibition of HLTF, MLH1 and PMS2. Points are colored by shRNA identity. B) Heatmap of synHEK3 reporters (rows) and their responses to shRNAs against HLTF (left: clone 5; right: clone 3). The leftmost bar indicates whether each synHEK3 reporter overlaps a GRCh38 gene annotation. The second bar from the left indicates the expression (TPM) of the overlapping or nearest gene. The third bar from the left indicates the distance (bp) to the TSS of the overlapping gene (for gene-overlapping reporters) or of the nearest gene (for reporters outside genes). The middle heatmap shows scaled chromatin feature scores of synHEK3 reporters, clustered by column. The right line plot shows effects of shRNAs against HLTF. Pink lines: editing frequencies in cells with individual shRNAs; red line: mean editing frequencies of the gene-targeting shRNAs; light blue lines: control editing frequencies for individual shRNAs (not visible owing to low variance relative to the mean line, shown in blue); blue line: mean control editing frequencies. Dashed lines: sites showing differential response to HLTF inhibition. C) Bar plot of synHEK3 reporter counts grouped by responsiveness to HLTF inhibition and overlap with GRCh38 gene annotations. P value: Fisher’s exact test. D) Expression of genes overlapping or proximal (within 10 kb) to a synHEK3 reporter. Genes above the dashed line (TPM = 3) are considered expressed. P value: two-sided Kolmogorov–Smirnov test.
Inspection of individual synHEK3 insertion sites revealed heterogeneous responses to HLTF inhibition (Fig. 6B; Supplementary Fig. 6A). Because editing frequencies are bounded on [0, 1], baseline editing efficiencies are inversely correlated with fold-changes in response to perturbations (Supplementary Fig. 6C). We therefore focused on synHEK3 reporters with intermediate baseline editing frequencies (0.2–0.9). In clone 5, 4 of 21 such sites showed much weaker increases in editing frequency than reporters with similar baseline editing frequencies. In clone 3, reporters were overall less responsive to HLTF inhibition, but 5 of 14 sites showed stronger and statistically significant upregulation of editing frequency (Supplementary Fig. 6A,D). Sites that were less responsive to HLTF inhibition showed a slightly higher response to shRNAs against PMS2, an effect more obvious in clone 5 (Supplementary Fig. 6B). We grouped the selected synHEK3 reporters by their responsiveness to HLTF knockdown and tallied whether they overlapped annotated genes. The responsive group was enriched for gene-overlapping sites (5.4-fold, Fisher’s exact test p = 0.033; Fig. 6C). Furthermore, when the expression of overlapping or nearby genes (within 10 kb) was taken into account, responsive sites tended to be near actively transcribed genes, whereas unresponsive sites were mostly in non-transcribed regions (9.2-fold difference in mean gene expression; p = 0.021; Fig. 6D). We validated these observations in cells transduced with individual shRNAs (Supplementary Fig. 6E).
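The enrichment and expression comparisons above rely on two standard tests, sketched here with SciPy's `fisher_exact` and `ks_2samp`. The 2×2 layout (responsiveness by gene overlap) follows Fig. 6C, but the counts and TPM values below are hypothetical placeholders, not the study's data. `fisher_exact` reports the sample odds ratio (ad/bc) alongside the p value.

```python
from scipy.stats import fisher_exact, ks_2samp

# Rows: responsive / unresponsive to HLTF knockdown.
# Columns: overlaps an annotated gene / does not.
# Hypothetical counts for illustration only.
table = [[6, 1],
         [10, 6]]
odds_ratio, p_fisher = fisher_exact(table, alternative="two-sided")

# Hypothetical TPMs of overlapping or nearby (<10 kb) genes in each group.
tpm_responsive = [45.0, 12.3, 80.1, 33.4, 5.2, 60.8]
tpm_unresponsive = [0.0, 0.4, 2.1, 0.0, 1.3, 7.9]
ks_stat, p_ks = ks_2samp(tpm_responsive, tpm_unresponsive)

print(f"odds ratio = {odds_ratio:.1f}, Fisher p = {p_fisher:.3f}")
print(f"KS D = {ks_stat:.2f}, KS p = {p_ks:.3f}")
```

The Kolmogorov–Smirnov statistic compares whole expression distributions rather than means, which suits TPM values that are heavily skewed and often zero in non-transcribed regions.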
HLTF might suppress prime editing in transcribed regions through its role in epigenetic regulation or, alternatively, through its role in DNA repair. To distinguish between these possibilities, we performed bulk RNA-seq and ATAC-seq in cells expressing shRNAs targeting HLTF (shHLTF.2367, shHLTF.2623). We observed minimal changes upon HLTF knockdown, including at genes and ATAC peaks near HLTF-responsive synHEK3 sites (Supplementary Fig. 6F,G; Table S6). This result suggests that the context-specific effect of HLTF on prime editing is not mediated through its role in chromatin remodeling.
Modulating prime editing outcomes by targeted epigenetic reprogramming
Several lines of evidence presented here suggest that active transcriptional elongation enhances prime editing. First, H3K79me2 is deposited by DOT1L, which is part of the RNA Pol II transcriptional elongation complex. H3K79me2 is strongly predictive of intragenic prime editing efficiency. At the same time, shRNA-mediated knockdown of DOT1L failed to appreciably alter prime editing efficiencies, a result corroborated by pharmacological inhibition of DOT1L. Second, intragenic prime editing efficiencies were also correlated with transcription levels, proximity to TSS, and, more weakly, transcript orientation. Third, our analyses of the differential outcomes of prime and Cas9 editing on identical synHEK3 reporters suggested that while chromatin accessibility may enhance both prime and Cas9 editing, higher levels of transcription and TSS proximity appear to specifically promote prime editing.
We next sought to exploit the apparent relationship between active transcriptional elongation and prime editing to modulate prime editing efficiency. We chose three intronic synHEK3 reporters identified in the monoclonal line used above (clone 5), located in the genes USP7, METTL2A and LRRC8C (Supplementary Fig. 7A). We transiently transfected these cells with CRISPRoff-v2, an epigenetic editing tool58, together with pairs of gRNAs to silence each gene's promoter, and subsequently measured prime editing efficiency across synHEK3 reporters (Fig. 7A). After calibrating to synHEK3 reporters outside of the target genes (Fig. 7B; Table S7), we observed 39%, 26% and 47% reductions in editing frequencies of synHEK3 reporters in USP7, METTL2A and LRRC8C, respectively (Fig. 7C). These reductions are likely direct effects of gene silencing, as other synHEK3 reporters were not affected and RNA-seq indicated highly specific silencing of the target genes (Supplementary Fig. 7B). Interestingly, the magnitude of the editing efficiency reductions was on par with the reductions in gene expression induced by CRISPRoff (53%, 30% and 55%, respectively).
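The calibration against reporters outside the target genes can be sketched as follows: fit a line relating editing efficiency in one CRISPRoff condition to a reference condition using only off-target reporters, then compare each target-gene reporter with the line's prediction. This is a hedged illustration, not the authors' code. The choice of reference condition (assumed here to be cells receiving gRNAs against a different promoter), the ordinary least-squares fit via `numpy.polyfit`, the helper name `calibrated_reduction`, and all numbers are assumptions.

```python
import numpy as np


def calibrated_reduction(ref_eff, treated_eff, is_target):
    """Percent reduction of target-gene reporters vs. a calibrated control.

    ref_eff, treated_eff: editing efficiencies of the same reporters in a
    reference condition and in the CRISPRoff condition of interest.
    is_target: boolean mask marking reporters inside the silenced gene.
    """
    ref_eff = np.asarray(ref_eff, dtype=float)
    treated_eff = np.asarray(treated_eff, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)

    # Fit treated = slope * ref + intercept on non-target reporters only.
    slope, intercept = np.polyfit(ref_eff[~is_target], treated_eff[~is_target], deg=1)
    predicted = slope * ref_eff[is_target] + intercept
    return 100.0 * (1.0 - treated_eff[is_target] / predicted)


# Hypothetical data: seven off-target reporters and one target reporter.
ref = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 0.30, 0.60]
treated = [0.11, 0.26, 0.41, 0.56, 0.71, 0.86, 0.31, 0.36]
target = [False] * 7 + [True]
print(calibrated_reduction(ref, treated, target))
```

Training only on reporters outside the silenced gene means that global shifts between conditions (e.g. transfection efficiency) are absorbed by the fitted slope and intercept, so the residual drop is attributable to the targeted locus.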
Figure 7. Modulating prime editing outcomes by epigenetic conditioning.

A) Workflow of the CRISPRoff experiment. B) Scatter plots of mean prime editing efficiencies of synHEK3 reporters in cells transfected with CRISPRoff gRNAs targeting USP7, METTL2A and LRRC8C promoters. SynHEK3 reporters in the corresponding target genes are labeled. Error bars correspond to the standard deviation of measured editing efficiencies. C) Bar plot of prime editing efficiency changes of synHEK3 reporters in the CRISPRoff experiment. Control editing efficiencies are efficiencies predicted by linear models trained on synHEK3 reporters that are not in the corresponding CRISPRoff target genes, as shown in panel B. D) Workflow of the CRISPRa experiments. E) Prime editing efficiency (%) at endogenous gene targets in K562 cells, with or without epigenetic editing via CRISPRa. Green bars show mean prime editing efficiencies in a wild-type K562 cell line that received only PEmax and (e)pegRNA. Gray bars show mean prime editing efficiencies when control promoters were activated via CRISPRa. Blue bars show mean prime editing efficiencies when target gene promoters were activated. Fold-changes are calculated between the CRISPRa (blue) and control (gray) groups. Inset shows a zoomed view of the first three genes. F) Prime editing efficiency (%) at endogenous gene targets in a human iPSC line, with or without epigenetic editing via CRISPRa. Gray bars show mean prime editing efficiencies when control promoters were activated via CRISPRa. Blue bars show mean prime editing efficiencies when target gene promoters were activated. G) Scatter plots of mean prime editing efficiencies measured in control vs. CRISPRa cells in exons of IL2RB. Points are colored by edit type. The x- and y-axes are on a log10 scale. The pink and red lines indicate 2-, 5- and 10-fold differences between the CRISPRa and control groups. H) Boxplot of prime editing efficiency fold-change for all variants. The dashed line indicates a fold-change of 2.
Enhancing prime editing efficiency by transient gene activation
Encouraged by these results, we next examined whether intragenic prime editing efficiency could be increased by preceding it with transient gene activation by CRISPRa59. For these experiments, we sought to enhance prime editing of the endogenous genome rather than of synHEK3 reporters. We initially selected four target genes, CXCR4, IL2RB, EGFR and CDKL5, and used prime editing to introduce a different substitution in each target, some of them clinically relevant60–62. We first transfected a K562 cell line stably expressing the dCas9-VP64 fusion63 with a pair of gRNAs (2XMS2) targeting the gene's promoter, along with an MCP-p65-Rta fusion protein. Two days later, we transfected the cells with PEmax and pegRNA to program the desired mutations (Fig. 7D). For three of the targets, we observed increases in prime editing efficiency, ranging from 1.4-fold (a 40% increase; EGFR, p = 0.076) to 21.9-fold (CXCR4, p = 0.001) (Supplementary Fig. 7C–E; Table S7).
A potential limitation was that these pegRNAs had low baseline efficiencies. Focusing on a subset of the previous targets and some new targets that could be strongly upregulated by CRISPRa (Supplementary Fig. 7D), we designed new (e)pegRNAs with DeepPrime. The newly designed (e)pegRNAs had baseline editing efficiencies ranging from 0.5% to 13.5%. After CRISPRa of the corresponding promoters, we observed 2.9- to 16.5-fold upregulation of prime editing efficiencies (Fig. 7E; Table S7). The newly designed pegRNA for IL2RB had the highest baseline efficiency (13.5%), which CRISPRa boosted to 40% (2.9-fold). Our best post-CRISPRa efficiency was for a target in CREB3L3, which was boosted from 4% to 66% (16.5-fold). Of note, we measured editing efficiencies of the same (e)pegRNAs in a wild-type K562 cell line and observed similar baselines, suggesting that prime editing is not suppressed by the CRISPRa machinery (Fig. 7E). To test whether this strategy is effective in a different cellular context, we preceded prime editing of CXCR4, WNT3A and EGFR with CRISPRa in human iPSCs64 and observed 7.8-fold, 1.7-fold and 1.5-fold increases in editing, respectively (Fig. 7F; Table S7).
To evaluate whether this strategy could facilitate various types of edits, we designed 7 libraries of epegRNAs to install 19 different variants within 7 different exons of IL2RB, including all possible substitutions, small insertions and deletions, and a long insertion (Supplementary Fig. 7F,G; Tables S1,S7). Preceding prime editing with CRISPRa resulted in 1.6- to 5.8-fold increases in editing efficiency for the 7 exons (Supplementary Fig. 7H). The epegRNAs targeting exon 1 of IL2RB were the least responsive to CRISPRa, potentially because the CRISPRa and editing target sites are only 300 bp apart. Small insertions (1, 3, 6 bp) were the most responsive edits (median fold-changes ranging from 6.6- to 10.2-fold), followed by other edit types (Fig. 7G,H). Among single-nucleotide substitutions, G→A and A→G edits were the most responsive to CRISPRa (median fold-changes of 5.3-fold and 4.7-fold, respectively; Fig. 7H). We conclude that this epigenetic conditioning strategy is effective for enhancing the prime editing efficiencies of all edit types tested.
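Per-variant fold-changes and their per-edit-type medians, as summarized in Fig. 7H, can be computed along the lines below. This minimal sketch uses only the standard library; the record layout, the helper name `fold_change_summary`, and the values are hypothetical, and the 2-fold threshold simply mirrors the dashed line in Fig. 7H.

```python
from collections import defaultdict
from statistics import median


def fold_change_summary(records, threshold=2.0):
    """Summarize CRISPRa/control fold-changes per edit type.

    records: iterable of (edit_type, control_eff, crispra_eff), with
    efficiencies as fractions (control_eff must be > 0).
    Returns {edit_type: (median_fold_change, fraction_at_or_above_threshold)}.
    """
    by_type = defaultdict(list)
    for edit_type, control, crispra in records:
        by_type[edit_type].append(crispra / control)
    return {
        edit_type: (median(fcs), sum(fc >= threshold for fc in fcs) / len(fcs))
        for edit_type, fcs in by_type.items()
    }


# Hypothetical (control, CRISPRa) efficiencies for three variants per edit type.
records = [
    ("ins3", 0.010, 0.080), ("ins3", 0.020, 0.150), ("ins3", 0.005, 0.030),
    ("G→A", 0.050, 0.250), ("G→A", 0.040, 0.120), ("G→A", 0.100, 0.150),
]
for edit_type, (med, frac) in fold_change_summary(records).items():
    print(f"{edit_type}: median {med:.1f}-fold, {frac:.0%} of variants >= 2-fold")
```

Medians are used here because per-variant fold-changes are ratios with a long right tail, where a few very low baselines can inflate the mean.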
Taken together, these experiments reinforce the strong link between transcriptional activity and prime editing outcome. They also demonstrate the feasibility of enhancing prime editing outcomes via transient activation of the gene to which edits are being introduced.
DISCUSSION
At the core of this study is a simple T7 promoter-bearing reporter construct and a straightforward protocol that leverages T7 IVT and IST to tackle classic questions about chromatin position effects in either bulk or single-cell format. When applied in bulk, T7 IVT enabled near-complete mapping of densely integrated reporters and measurement of position-dependent regulatory effects on prime editing. When incorporated into a single-cell RNA-seq protocol, T7 IST enabled co-profiling of non-transcribing genomic constructs. By further incorporation into an shRNA genetic screen, the interaction between trans-acting factors and genome editing could be stratified by chromatin context. Of note, the method is straightforward to adapt, requiring only the inclusion of a 19-bp T7 promoter in any reporter construct, such that it has the potential to make capturing precise genomic coordinates routine for any bulk or single-cell assay in which reporter or effector constructs are randomly integrated.
We measured prime editing efficiency at over 4,000 genomic locations and quantified the correlation of 23 chromatin features with those efficiencies. Through beta regression modeling, we show that H3K79me2 is the best predictor of efficient prime editing, but multiple lines of evidence suggest that this is probably due to its strong correlation with active transcriptional elongation rather than a direct effect. Importantly, this same beta regression model, based entirely on position effects on prime editing of the synHEK3 reporter, successfully predicts the relative editing efficiencies of high-quality epegRNAs targeting endogenous genomic sites, providing information orthogonal to sequence-based prediction tools.
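For readers unfamiliar with beta regression: it models a response bounded in (0, 1), such as an editing frequency, with a beta distribution whose mean is linked to predictors through a logit. The sketch below is a minimal Python analogue of the mean-precision model fitted by the R betareg package, assuming a logit link, a constant precision parameter, and the Smithson–Verkuilen squeeze to move exact 0s and 1s into the open interval. It is not the authors' code, and the two simulated predictors merely stand in for chromatin features.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln


def squeeze(y):
    """Map values in [0, 1] into (0, 1) (Smithson & Verkuilen, 2006)."""
    n = len(y)
    return (y * (n - 1) + 0.5) / n


def negative_log_likelihood(params, X, y):
    """Beta log-likelihood (negated) with logit mean link and log precision."""
    coefs, log_phi = params[:-1], params[-1]
    mu = expit(X @ coefs)          # mean editing frequency per site
    phi = np.exp(log_phi)          # precision, kept positive via exp
    a, b = mu * phi, (1.0 - mu) * phi
    ll = (gammaln(phi) - gammaln(a) - gammaln(b)
          + (a - 1.0) * np.log(y) + (b - 1.0) * np.log1p(-y))
    return -np.sum(ll)


def fit_beta_regression(features, y):
    """Return (coefficients incl. intercept, precision phi) by maximum likelihood."""
    X = np.column_stack([np.ones(len(y)), features])
    y = squeeze(np.asarray(y, dtype=float))
    start = np.zeros(X.shape[1] + 1)
    result = minimize(negative_log_likelihood, start, args=(X, y), method="BFGS")
    return result.x[:-1], float(np.exp(result.x[-1]))


# Simulated data: two standardized "chromatin features" with known effects.
rng = np.random.default_rng(0)
n = 2000
features = rng.normal(size=(n, 2))
true_coefs = np.array([-0.5, 1.0, -0.8])  # intercept, feature 1, feature 2
mu = expit(np.column_stack([np.ones(n), features]) @ true_coefs)
phi = 20.0
y = rng.beta(mu * phi, (1.0 - mu) * phi)

coefs, phi_hat = fit_beta_regression(features, y)
print(np.round(coefs, 2), round(phi_hat, 1))
```

Compared with ordinary least squares on raw frequencies, this likelihood respects the [0, 1] bound and lets variance shrink near the boundaries, which is why beta regression is a natural fit for editing efficiencies.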
Examining the intersection of trans-acting factors with the cis-chromatin environment in shaping prime editing, we identified HLTF as a context-dependent repressor of prime editing, as knocking it down preferentially enhanced prime editing at sites undergoing active transcription. Although we cannot rule out a mechanism involving some unknown function of HLTF, our observations more plausibly relate to HLTF’s role in DNA repair57,65 than to its role in chromatin remodeling. HLTF has previously been shown to repress PE3 but not PE213. Taken together with our observations here, we hypothesize that HLTF might be unevenly distributed along chromatin or preferentially recognize particular SSB repair intermediates. Actively transcribed regions of the genome might be under intense surveillance by HLTF, such that DNA lesions are corrected immediately by the DDR to avoid accumulation of mutations in the transcribed genome over successive cell cycles. However, further studies, in particular biochemical profiling of HLTF’s transient interactions during prime editing of transcribed regions, will be necessary to elucidate the underlying mechanism.
The correlation between active transcriptional elongation and prime editing efficiency prompted us to explore whether manipulating the epigenetic context of a target site could alter the efficiency with which it was edited. Indeed, with CRISPRoff, we showed that targeted gene silencing reduces prime editing efficiency for intragenic targets, with effect sizes on par with the fold-reduction in gene expression. Conversely, prime editing efficiencies of endogenous, intragenic targets can be markedly enhanced by first “conditioning” the locus with CRISPRa. We validated the effectiveness of this approach for multiple loci, at a range of distances from the TSS, in both K562 and iPSC cell lines, and for all edit types.
Across target sites in 6 genes that were successfully upregulated with CRISPRa in K562 cells, we observed a mean 10.2-fold (median 8.4-fold) increase in prime editing efficiency (Fig. 7E), while across 133 combinations of edit types and target sites within the IL2RB locus, we observed a mean 4.1-fold (median 2.6-fold) increase (Fig. 7G). These effect sizes are comparable to those of MMR inhibition (mean 7.7-fold increase)13,14. Of note, the effectiveness of MMR inhibition is highly mutation-dependent, with certain types of edits (e.g. longer indels) being less sensitive. While more experiments are required to fully understand the edit biases of our own strategy, we showed that all edit types tested were robustly enhanced by CRISPRa conditioning; for example, 6-bp insertions in IL2RB exhibited a mean 10.0-fold (median 10.2-fold) enhancement (Fig. 7H). The two strategies are thus complementary and, because this approach probably works through mechanisms orthogonal to MMR, they could potentially be combined.
In summary, we show that prime editing’s efficiency is strongly impacted by the cis-chromatin landscape, and promoted by active transcription in particular. Leveraging that insight, we further show how CRISPRa conditioning can be used to enhance the efficiency of intragenic prime editing, a strategy that may be useful in both basic research and therapeutic contexts. Finally, we note that the methods described here, in particular T7-assisted reporter mapping, may be generally useful for studying position effects on other chromatin-modulated processes.
Limitations of the Study
Several limitations and technical nuances merit discussion. First, the epigenome likely contains more information than is captured by 23 chromatin features. We note that synHEK3 reporters in distal intergenic regions exhibit highly variable efficiencies (Supplementary Fig. 2K), and we speculate that heterogeneity within heterochromatin beyond what ENCODE assays capture34 or, alternatively, higher-order structures (e.g. chromatin looping) might explain some of this variability.
Second, our initial experiment leveraged a single pegRNA introducing the same insertion to a constant target site in a single cell type, thereby neglecting potential interactions between the mutation type, target sequence, cis-chromatin environment, and cell type. Although downstream experiments suggest that our results generalize, there may be nuances that we have missed.
Third, in our shRNA single-cell screen, the effect sizes of the perturbations on prime editing efficiencies were not as large as those measured with Repair-seq13. However, we do not attribute this to inefficient knockdown by shRNAs (Supplementary Fig. 6H). Instead, this observation likely reflects the physiological regulation of 6-bp insertions, which tend to escape MMR and have higher baseline editing efficiencies (20–90% for the visualized sites), which may mute effect sizes (Supplementary Fig. 6C).
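The muting effect of high baselines follows directly from the bound on editing frequencies: a site edited at baseline frequency p can increase at most 1/p-fold. The short sketch below makes this arithmetic explicit across the baseline range discussed above; the function name `max_fold_change` is hypothetical.

```python
def max_fold_change(baseline):
    """Largest possible fold-change for an editing frequency capped at 1."""
    if not 0.0 < baseline <= 1.0:
        raise ValueError("baseline must be in (0, 1]")
    return 1.0 / baseline


for baseline in (0.02, 0.2, 0.5, 0.9):
    print(f"baseline {baseline:.0%}: at most {max_fold_change(baseline):.2f}-fold")
```

A site starting at 90% editing can gain at most about 1.11-fold, whereas a 2% baseline leaves room for up to 50-fold, consistent with the explanation above for why screens of low-baseline edits can report larger effect sizes.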
Finally, a reasonable worry in applying epigenetic reprogramming to condition loci for prime editing is the possibility of interference between editors. We do not observe any consistent suppression of prime editing in cells constitutively expressing the CRISPRa machinery (Fig. 7E), but an anecdotal example (exon 1 of IL2RB) is consistent with steric interference if the CRISPRa and prime editing target sites are too close (Supplementary Fig. 7H). Using orthogonal (epi)genome engineering tools and/or transiently present CRISPR RNPs might further reduce the risk of cross-talk between different CRISPR-mediated modules.
STAR METHODS
RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Jay Shendure (shendure@uw.edu).
Materials availability
All unique reagents generated in this study are available from the lead contact with a completed Materials Transfer Agreement.
Data and code availability
All sequencing data have been deposited at GEO (GSE228465) and are publicly available. This paper also analyzes existing, publicly available data; the accession numbers for these datasets are listed in the key resources table. The UCSC track hub created for visualizing results of this work is: https://shendure-web.gs.washington.edu/content/members/xyli10/public/nobackup/hub.txt.
This paper does not report original code.
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
Key Resources Table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Antibodies | ||
| Mouse monoclonal anti-β-actin | Sigma-Aldrich | Cat# A2228; Clone: AC-74; RRID: AB_476697 |
| Donkey anti-Rabbit IgG polyclonal antibody (IRDye 800CW) | LI-COR Biosciences | Cat# 926-32213; RRID: AB_621848 |
| Donkey anti-Mouse IgG polyclonal antibody (IRDye 680RD) | LI-COR Biosciences | Cat# 926-68072; RRID: AB_10953628 |
| Rabbit monoclonal anti-HLTF (E9H5I) | Cell Signaling Technology | Cat# 45965; RRID: AB_3094661 |
| Rabbit monoclonal anti-PMS2 (E9U4P) | Cell Signaling Technology | Cat# 27884; RRID: AB_3094662 |
| Mouse monoclonal anti-MLH1 (4C9C7) | Cell Signaling Technology | Cat# 3515; RRID: AB_2145615 |
| Bacterial and virus strains | ||
| NEB Stable Competent E. coli | New England Biolabs | Cat# C3040H |
| NEB 10-beta Electrocompetent E. coli | New England Biolabs | Cat# C3020K |
| Chemicals, peptides, and recombinant proteins | ||
| RPMI 1640 Medium | Gibco | Cat# 11875119 |
| DMEM, High Glucose | Gibco | Cat# 11965118 |
| FBS | Hyclone | Cat# SH30396.03 |
| Penicillin-Streptomycin (10,000 U/mL) | Gibco | Cat# 15140122 |
| mTeSR™ Plus kit | STEMCELL technologies | Cat# 100-0276 |
| Y-27632 | Stemgent | Cat# 04-0012-02 |
| Geltrex™ | Gibco | Cat# A1413302 |
| Trimethoprim | Sigma-Aldrich | Cat# 92131 |
| DNeasy Blood & Tissue Kit | QIAGEN | Cat# 69504 |
| HiScribe™ T7 High Yield RNA Synthesis Kit | New England Biolabs | Cat# E2040S |
| HiScribe® T7 Quick High Yield RNA Synthesis Kit | New England Biolabs | Cat# E2050 |
| RNase-Free DNase Set | QIAGEN | Cat# 79254 |
| TURBO™ DNase | Invitrogen | Cat# AM2238 |
| TRIzol LS Reagent | Invitrogen | Cat# 10296010 |
| RNeasy Mini Kit | QIAGEN | Cat# 74104 |
| Glycogen (5 mg/mL) | Invitrogen | Cat# AM9510 |
| SuperScript™ IV Reverse Transcriptase | Invitrogen | Cat# 18091050 |
| RNaseOUT™ Recombinant Ribonuclease Inhibitor | Invitrogen | Cat# 10777019 |
| Proteinase K | Thermo Scientific | Cat# EO0491 |
| AMPure XP Reagent | Beckman Coulter Life Sciences | Cat# A63882 |
| KAPA2G Robust HotStart ReadyMix PCR Kit | Roche | Cat# KK5702 |
| Power SYBR™ Green PCR Master Mix | Applied Biosystems | Cat# 4367659 |
| NEBuilder® HiFi DNA Assembly Cloning Kit | New England Biolabs | Cat# E5520S |
| NEBridge® Golden Gate Assembly Kit (BsmBI-v2) | New England Biolabs | Cat# E1602S |
| NEBridge® Golden Gate Assembly Kit (BsaI-HF® v2) | New England Biolabs | Cat# E1601S |
| BsmBI-v2 | New England Biolabs | Cat# R0739S |
| BsaI-HF® v2 | New England Biolabs | Cat# R3733S |
| FastDigest Esp3I (IIs class) | Thermo Scientific | Cat# FD0454 |
| I-SceI | New England Biolabs | Cat# R0694S |
| XhoI | New England Biolabs | Cat# R0146S |
| EcoRI-HF | New England Biolabs | Cat# R3101S |
| BamHI-HF | New England Biolabs | Cat# R3136S |
| XbaI | New England Biolabs | Cat# R0145S |
| NotI-HF | New England Biolabs | Cat# R3189S |
| T4 DNA ligase | New England Biolabs | Cat# M0202L |
| SF Cell Line 4D-Nucleofector™ X Kit | Lonza Bioscience | Cat# V4XC-2012 |
| SF Cell Line 96-well Nucleofector™ Kit | Lonza Bioscience | Cat# V4SC-2096 |
| P3 Primary Cell 96-well Nucleofector™ Kit | Lonza Bioscience | Cat# V4SP-3096 |
| ViraPower™ Lentiviral Packaging Mix | Invitrogen | Cat# K497500 |
| Lipofectamine™ 3000 Transfection Reagent | Invitrogen | Cat# L3000001 |
| PEG-it™ Virus Precipitation Solution (5×) | System Biosciences | Cat# LV810A-1 |
| Polybrene | Millipore | Cat# TR-1003-G |
| Geneticin | Gibco | Cat# 10131035 |
| Blasticidin S HCl (10 mg/mL) | Gibco | Cat# A1113903 |
| DEPC (Diethyl Pyrocarbonate) | Millipore Sigma | Cat# D5758-25ML |
| YOYO™-1 Iodide (491/509) - 1 mM Solution in DMSO | Invitrogen | Cat# Y3601 |
| dNTP mix | New England Biolabs | Cat# N0447L |
| NEBNext® Ultra™ II Non-Directional RNA Second Strand Synthesis Module | New England Biolabs | Cat# E6111L |
| Protease | QIAGEN | Cat# 19157 |
| Buffer EB | QIAGEN | Cat# 19086 |
| Tn5 transposase | Diagenode | Cat# C01070010-20 |
| BSA | New England Biolabs | Cat# B9000S |
| Monarch® DNA Gel Extraction Kit | New England Biolabs | Cat# T1020S |
| NEBNext® High-Fidelity 2X PCR Master Mix | New England Biolabs | Cat# M0541L |
| OneTaq® Hot Start 2X Master Mix with Standard Buffer | New England Biolabs | Cat# M0484L |
| SYBR™ Green I Nucleic Acid Gel Stain | Invitrogen | Cat# S7563 |
| Alt-R™ S.p. Cas9 Nuclease V3 | Integrated DNA Technologies | Cat# 1081059 |
| Alt-R® Cas9 Electroporation Enhancer, 2 nmol | Integrated DNA Technologies | Cat# 1075915 |
| Critical commercial assays | ||
| TruSeq RNA Library Prep Kit v2 | Illumina | Cat# RS-122-2001 |
| TruSeq Stranded mRNA kit | Illumina | Cat# 20020594 |
| Illumina TruSeq RNA UD Indexes | Illumina | Cat# 20023785 |
| NextSeq 1000/2000 P2 Reagents (100 Cycles) v3 | Illumina | Cat# 20046811 |
| NextSeq 500/550 Mid Output Kit v2.5 (150 Cycles) | Illumina | Cat# 20024904 |
| MiSeq Reagent Kit v2 (300-cycles) | Illumina | Cat# MS-102-2002 |
| MiSeq Reagent Kit v3 (150-cycle) | Illumina | Cat# MS-102-3001 |
| MiSeq Reagent Kit v2 (50-cycles) | Illumina | Cat# MS-102-2001 |
| Tagment DNA TDE1 Enzyme | Illumina | Cat# 20034197 |
| Deposited data | ||
| Sequencing data generated in this study | This manuscript | GEO: GSE228465 |
| K562 DHS data | ENCODE Project Consortium, 201234 | GEO:GSM816655 and ENCFF972GVB |
| K562 H3K79me2 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733653 and ENCFF957YJT |
| K562 CTCF ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM935407 and ENCFF682MFJ |
| K562 POLR2A ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSE91721 and ENCFF806LCJ |
| K562 ATAC-seq data | ENCODE Project Consortium, 201234 | GEO:GSE170378 and ENCFF093IIW |
| K562 H3K9ac ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733778 and ENCFF286WRJ |
| K562 H3K9me3 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733776 and ENCFF601JGK |
| K562 H3K9me1 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733777 and ENCFF654SLZ |
| K562 H4K20me1 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733675 and ENCFF605FAF |
| K562 BRD4 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSE101225 and ENCFF251SRH |
| K562 EZH2 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM1003576 and ENCFF587SWK |
| K562 H2AFZ ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733786 and ENCFF621DJP |
| K562 POLR2AS2 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM935402 and ENCFF434PYZ |
| K562 SMC3 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM935310 and ENCFF469OWD |
| K562 HDAC1 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSE10583 and ENCFF684RNO |
| K562 HDAC2 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSE91451 and ENCFF954LGE |
| K562 HDAC3 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSE127356 and ENCFF975DCO |
| K562 H3K4me3 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSE96303 and ENCFF253TOF |
| K562 H3K4me2 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733651 and ENCFF959YJV |
| K562 H3K4me1 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733692 and ENCFF834SEY |
| K562 H3K27me3 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733658 and ENCFF405HIO |
| K562 H3K27ac ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733656 and ENCFF849TDM |
| K562 H3K36me3 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733714 and ENCFF163NTH |
| Experimental models: Cell lines | ||
| K562 | ATCC | CCL-243 |
| K562 PE2-Puro | Choi et al., 20222 | N/A |
| K562 dCas9-VP64 | Chardon et al., 202363 | N/A |
| K562 PEmax | This manuscript | N/A |
| WTC11 DHFR-dCas9-VPH | Tian et al., 202164 | N/A |
| Oligonucleotides | ||
| List of oligonucleotides | This manuscript, Table S1 | N/A |
| Recombinant DNA | ||
| PB-T7-HEK3-BC (synHEK3) | This manuscript | N/A |
| LT3-GFP-T7-miR-E-CS1-PGK-Neomycin | This manuscript | N/A |
| Lenti-rtTA-P2A-Blast | This manuscript | N/A |
| pU6-Sp-pegRNA-HEK3-ins6N | This manuscript | N/A |
| pU6-Sp-pegRNA-HEK3-ins3N | This manuscript | N/A |
| pU6-Sp-dual-gRNA | This manuscript | N/A |
| pU6-Sp-gRNA-2XMS2 | This manuscript | N/A |
| PB-CMV-MCP-XTEN80-p65-Rta-3xNLS-P2A-T2AmPlum | This manuscript | N/A |
| PB-CMV-PEmax-EF1a-Puro | This manuscript | N/A |
| PB-UCOE-EF1a-PEmax-P2A-mCherry-PGK-Blast | This manuscript | N/A |
| PB-CMV-MCS-EF1α-Puro | System Biosciences | Cat# PB510B-1 |
| pCMV-HyPBase | Yusa et al., 201173 | N/A |
| PB-CMV-PE2-EF1a-Puro | Choi et al., 20222 | N/A |
| pU6-Sp-pegRNA-HEK3-insCTT | Anzalone et al., 20191 | Addgene: 132778 |
| pCMV-PEmax | Chen et al., 202113 | Addgene: 174820 |
| pCMV-PEmax-P2A-hMLH1dn | Chen et al., 202113 | Addgene: 174828 |
| CRISPRoff-v2.1 | Nuñez et al., 202158 | Addgene: 167981 |
| LT3GEN | Fellmann et al., 201350 | Addgene: 111173 |
| Software and algorithms | ||
| Bcl2fastq (v2.20) | Illumina | https://support.illumina.com/sequencing/sequencing_software/bcl2fastq-conversion-software.html |
| Samtools (v1.9) | Danecek et al., 202184 | http://www.htslib.org |
| Bedops (v2.4.35) | Neph et al., 201280 | https://bedops.readthedocs.io/en/latest/index.html |
| featureCounts | Liao et al., 201481 | https://subread.sourceforge.net/featureCounts.html |
| Needleall | Needleman et al., 197095; Kruskal, 198396 | https://emboss.sourceforge.net/apps/release/6.6/emboss/apps/needleall.html |
| STAR (v2.7.6a) | Dobin et al., 201377 | https://github.com/alexdobin/STAR |
| Salmon (v1.9.0) | Patro et al., 201776 | https://salmon.readthedocs.io/en/latest/ |
| Cutadapt (v4.1) | Martin, 201182 | https://cutadapt.readthedocs.io/en/stable/ |
| TrimGalore (v0.6.6) | Martin, 201182 | https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/ |
| Bwa (v0.7.17) | Li et al., 200983 | https://bio-bwa.sourceforge.net/ |
| GenomicRanges | Lawrence et al., 201385 | https://bioconductor.org/packages/release/bioc/html/GenomicRanges.html |
| ChIPseeker | Yu et al., 201588; Wang et al., 202289 | https://bioconductor.org/packages/release/bioc/html/ChIPseeker.html |
| Homer | Heinz et al., 201090 | http://homer.ucsd.edu/homer/ |
| Genomation | Akalin et al., 201592 | https://bioconductor.org/packages/release/bioc/html/genomation.html |
| HTSeq (v2.0.2) | Putri et al., 202297 | https://htseq.readthedocs.io/en/master/ |
| Seurat (v4.0.0) | Hao et al., 202198 | https://satijalab.org/seurat/ |
| Hmisc (v5.1-1) | Harrell and Dupont | https://cran.r-project.org/web/packages/Hmisc/index.html |
| Betareg (v3.1.4) | Cribari-Neto et al., 201099 | https://cran.r-project.org/web/packages/betareg/index.html |
| DESeq2 | Love et al., 201478 | https://bioconductor.org/packages/release/bioc/html/DESeq2.html |
| WebLogo 3 | Crooks et al., 200487 | https://weblogo.threeplusone.com/ |
| ComplexHeatmap | Gu et al., 201693 | https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html |
| Gviz | Hahne and Ivanek, 201694 | https://bioconductor.org/packages/release/bioc/html/Gviz.html |
| IGV | Robinson et al., 201186 | http://software.broadinstitute.org/software/igv/ |
| FlashFry | McKenna and Shendure, 201874 | https://github.com/mckennalab/FlashFry |
| GuideScan2 | Schmidt et al., 202275 | https://www.guidescan.com/ |
| DeepPrime | Yu et al., 202319 | https://deepcrispr.info/DeepPrime/ |
| Other | ||
| UCSC trackhub for visualizing prime editing results in synHEK3 reporters and surrounding epigenetic environments | This manuscript | https://shendure-web.gs.washington.edu/content/members/xyli10/public/nobackup/hub.txt |
EXPERIMENTAL MODEL AND STUDY PARTICIPANT DETAILS
Cell lines and cell culture
K562 cells (CCL-243, female) were purchased from ATCC. PE22, PEmax and dCas9-VP6463 K562 lines were generated previously or in this study. All K562 cells were maintained in RPMI 1640 medium (Gibco) supplemented with 10% FBS (Hyclone) and penicillin/streptomycin (Gibco, 100 U/mL). HEK293T cells were maintained in DMEM (Gibco) supplemented with 10% FBS and penicillin/streptomycin. The DHFR-dCas9-VPH (VP48-P65-HSF1) WTC11 line (human iPSCs, male) was generated and kindly provided by the Martin Kampmann lab64. iPSCs were maintained on Geltrex (Gibco)-coated plates in mTeSR Plus medium (STEMCELL Technologies) supplemented with 10 μM Y-27632 (Stemgent). All cells were kept in a humidified incubator at 37 °C with 5% CO2.
METHOD DETAILS
Molecular cloning
PB-T7-HEK3-BC (synHEK3):
First, a minimal piggyBac cargo construct was created by deleting all intervening sequences between the 5’ and 3’ terminal repeats (including the core insulators) of the PB-CMV-MCS-EF1α-Puro vector (System Biosciences, PB510B-1). A gBlock (Integrated DNA Technologies, IDT) consisting of a filler sequence and flanking scaffold sequences (from GFP) was inserted to create a shuttle vector. The filler sequence contains two divergent BsmBI recognition sites and can be removed scarlessly. Second, the shuttle vector was digested with BsmBI (New England Biolabs). An 87-bp region around the HEK3 gRNA target site was synthesized by IDT and amplified with a pair of primers that introduced a T7 promoter upstream and a 16-bp barcode downstream of it. The resulting PCR product was inserted into the linearized shuttle vector using NEB HiFi assembly. Assembled products were cleaned up and concentrated with AMPure XP beads (Beckman Coulter) and electroporated into NEB 10-beta electrocompetent E. coli. Electroporation was performed in a 0.1 cm electroporation cuvette using a Bio-Rad GenePulser electroporator at 2.0 kV, 200 Ω and 25 μF.
LT3-GFP-T7-miR-E-CS1-PGK-Neomycin:
The LT3GEN vector was purchased from Addgene (#111173)50 and digested with I-SceI (New England Biolabs). A fragment containing a T7 promoter and homologous sequences was ordered from IDT and assembled into the backbone. The LT3-GFP-T7-PGK-Neomycin vector was digested with XhoI and EcoRI-HF (New England Biolabs). An shRNA targeting the Renilla luciferase (Ren713) was inserted along with a Capture Sequence 1 (CS1) after the EcoRI site. For shRNA cloning, the LT3-GFP-T7-miR-E-CS1-PGK-Neomycin construct was digested with XhoI and EcoRI-HF. shRNAs were ordered from IDT as 97-nt 4 nmole Ultramers (Table S1) or oPool (Table S5) and amplified with primers p1 (5’-ATTACTTCGACTTCTTAACCCAACAGAAGGCTCGAGAAGGTATATTGCTGTTGACAGTGAGCG-3’) and p2 (5’-AATTGCTCTTGCTAGGACCGGCCTTAAAGCGAATTCTAGCCCCTTGAAGTCCGAGGCAGTAGGCA-3’). The PCR products were assembled into the backbone using NEB HiFi assembly and transformed into NEB Stable Competent E. coli (single shRNA vectors) or electroporated into NEB 10-beta electrocompetent E. coli (library).
Lenti-rtTA-P2A-Blast:
The Lenti-Cas9-P2A-Blast (Addgene: 52962) vector66 was digested with XbaI and BamHI (New England Biolabs) to create a backbone. rtTA was amplified from the LT3GEPIR vector (Addgene: 111177)50 and cloned into the backbone using NEB HiFi assembly.
pU6-Sp-pegRNA-HEK3-ins6N and pU6-Sp-pegRNA-HEK3-ins3N:
The pU6-Sp-pegRNA-HEK3-insCTT vector (Addgene: 132778)1 was linearized by PCR with the 5’-phosphorylated oligo p3 (5’-TCTGCCATCANNNNNNCGTGCTCAGTCTGTTTTTTTAAGCTTG-3’, ins6N) or p4 (5’-TCTGCCATCANNNCGTGCTCAGTCTGTTTTTTTAAGCTTG-3’, ins3N) together with p5 (5’-GGACCGACTCGGTCCCACTT-3’), and ligated with T4 DNA ligase. The ligation product was concentrated and electroporated into NEB 10-beta electrocompetent E. coli.
pU6-Sp-dual-gRNA vectors:
A pU6-Sp-dual-gRNA scaffold vector was generated by replacing the pegRNA expressing cassette of pU6-Sp-pegRNA-HEK3-insCTT vector with a dual U6-gRNA cassette from the PX333 vector (Addgene: 64073)67. The second gRNA cloning site (two BsaI sites) was modified to two BsmBI sites. Spacer sequences were cloned into this vector sequentially using the oligo annealing method68.
pU6-Sp-gRNA-2XMS2 vectors:
A pU6-Sp-gRNA-2XMS2 scaffold vector was generated by modifying the pU6-Sp-pegRNA-HEK3-insCTT backbone. The scaffold sequence is 5’-GTTTAAGAGCTAAGCCAACATGAGGATCACCCATGTCTGCAGGGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGGCCAACATGAGGATCACCCATGTCTGCAGGGCCAAGTGGCACCGAGTCGGTGCTTTTTTT-3’58. Spacer sequences were cloned between two BsmBI sites using the oligo annealing method68.
PB-CMV-MCP-XTEN80-p65-Rta-3xNLS-P2A-T2A-mPlum:
The MCP(N55K) sequence69 was synthesized as an IDT gBlock and amplified. XTEN80 and 3XNLS-P2A were amplified from TETv4 (Addgene: 167983)58. p65-Rta was amplified from sadCas9-VPR (Addgene: 188514)70. mPlum was amplified from mPlum-C1 (Addgene: 54839)71. All PCR products were purified and cloned into the PB-CMV-MCS-EF1α-Puro vector between the NotI site and the SV40 polyA sequence using NEB HiFi assembly. Then, a T2A sequence was inserted next to P2A by NEB HiFi assembly.
PB-CMV-PEmax-EF1a-Puro:
PE2max was amplified from pCMV-PEmax-P2A-hMLH1dn (Addgene: 174828)13 and cloned into the PB-CMV-MCS-EF1α-Puro vector.
PB-UCOE-EF1a-PEmax-P2A-mCherry-PGK-Blast:
The UCOE-EF1a sequence was amplified from pMH0006 (Addgene: 135448)72. PE2max was amplified from pCMV-PEmax-P2A-hMLH1dn. The remaining sequences were synthesized as IDT gBlocks and amplified. All PCR products were cloned into an empty piggyBac transposon vector.
(e)pegRNA plasmids:
(e)pegRNAs (pU6-Sp-(e)pegRNA) used in this study were ordered as 4 nmole Ultramers (IDT) or as long primers containing the spacer and 3’ extension sequences, and cloned into the backbone of pU6-Sp-pegRNA-HEK3-CTT using NEB HiFi assembly. The epegRNA libraries were ordered as individual IDT eBlocks, pooled and cloned into the same backbone by Golden Gate assembly (New England Biolabs).
General transfection strategies
Transfections of K562 cells were performed using a Lonza Bioscience 4D-Nucleofector system and the SF Cell Line 4D-Nucleofector X kits (Lonza). For single Nucleocuvettes (100 μL), 1–3 × 10⁶ cells were transfected with up to 9 μg DNA. For 96-well Nucleocuvette plates (20 μL), 2 × 10⁵ to 4 × 10⁵ cells were transfected with up to 1.2 μg DNA. The program code was FF-120.
Transfections of WTC11 cells were performed using a Lonza Bioscience 4D-Nucleofector system and the P3 Primary Cell 96-well Nucleofector Kit (Lonza). For each transfection, 2 × 10⁵ cells were transfected with up to 3.5 μg DNA. The program code was CB-150.
Transfections of the HEK293T cell line were performed using Lipofectamine 3000 (Invitrogen) following manufacturer’s instructions.
Cell line generation
Wild-type K562 cells were transfected with PB-UCOE-EF1a-PEmax-P2A-mCherry-PGK-Blast and pCMV-HyPBase at a 3:1 ratio in a 100 μL nucleofection reaction. Cells were selected with 10 μg/mL blasticidin (Gibco) for 7 days. Monoclonal lines were isolated, and the clone with the brightest mCherry signal was used for subsequent experiments.
Prime editing experiment in synHEK3 reporters
3 × 10⁶ PE2(+) or wild-type K562 cells were transfected with synHEK3 and the pCMV-HyPBase plasmid73 at a 3:1 ratio (9 μg total) in a 100 μL nucleofection reaction. Cells were cultured in a large flask to maintain complexity. After 10 days, gDNA was extracted to estimate synHEK3 copy number by qPCR. Cells were counted, and 500 cells were seeded into a new dish. gDNA was extracted from a portion of the expanded population and used for T7-assisted reporter mapping (described below).
For measuring prime editing efficiency, 2 × 10⁶ PE2(+) or wild-type K562 cells carrying integrated synHEK3 reporters from the bottlenecked 500-cell pools generated above were transfected with 2 μg pU6-Sp-pegRNA-HEK3-insCTT, or with 1 μg pU6-Sp-pegRNA-HEK3-insCTT plus 1 μg PB-CMV-PE2-EF1a-Puro2, respectively. Starting 24 hours after transfection, wild-type K562 cells were selected with 2 μg/mL puromycin for 2 days. Cells were lysed 4 days after transfection and subjected to amplicon sequencing (described below).
Prime editing experiment in endogenous sites
SpCas9 spacers for the pegRNAs were identified using FlashFry74 in a set of 1-kb windows located 500 bp away from a different set of randomly integrated synHEK3 reporters in K562 cells (n = 3,634). gRNAs were filtered by specificity (Hsu 2013 specificity score > 80) and ranked by cutting efficiency (Doench 2014), and the best gRNAs were selected, yielding 3,157 gRNAs. DeepPrime (HEK293T PE2max models) was used to predict the best pegRNAs or epegRNAs for installing 3-bp insertions (CCT) at the cut sites. Across spacers, the predicted scores of the pegRNAs and epegRNAs were highly correlated (Pearson’s r = 0.83). Spacers that scored highly as both pegRNAs (>40%) and epegRNAs (>65%) were selected (n = 121). The final library was synthesized as epegRNAs (with the evopreQ1 hairpin; Table S1).
The epegRNAs were grouped into 2 separate pools (N = 60 and 61, respectively), each with a 2% spike-in of the pU6-Sp-pegRNA-HEK3-insCTT vector. The abundance of each pegRNA was measured by sequencing the plasmid libraries. The plasmid pools (2 μg) were then transfected into 1 × 10⁶ PEmax(+) K562 cells in duplicate. Editing rates were measured at the 121 sites and normalized using the abundance of the HEK3 plasmid and the editing efficiency at the HEK3 locus in each pool (Table S4).
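The spacer selection logic above can be sketched as a small filter. This is a minimal illustration, not the authors' script: the `Candidate` record and its field names are hypothetical, the scores are assumed to have been exported from FlashFry and DeepPrime beforehand, and keeping one top-ranked guide per window is an assumption about how "the best gRNAs" were chosen.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    window: str             # hypothetical ID of the 1-kb search window
    spacer: str
    hsu_specificity: float  # FlashFry "Hsu 2013" specificity score
    doench_2014: float      # FlashFry "Doench 2014" on-target score
    peg_score: float        # DeepPrime prediction for the pegRNA design (%)
    epeg_score: float       # DeepPrime prediction for the epegRNA design (%)


def select_spacers(cands: List[Candidate], min_spec: float = 80.0,
                   min_peg: float = 40.0, min_epeg: float = 65.0) -> List[Candidate]:
    # 1) drop non-specific guides
    specific = [c for c in cands if c.hsu_specificity > min_spec]
    # 2) keep the highest Doench 2014 score per window (assumption: one guide per window)
    best: Dict[str, Candidate] = {}
    for c in specific:
        if c.window not in best or c.doench_2014 > best[c.window].doench_2014:
            best[c.window] = c
    # 3) keep spacers predicted to edit well as both pegRNA and epegRNA
    return [c for c in best.values() if c.peg_score > min_peg and c.epeg_score > min_epeg]


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation, e.g. between pegRNA and epegRNA scores across spacers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Because the specificity filter is applied before ranking, a window whose top on-target guide is non-specific can still contribute its next-best specific guide.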
Cas9 RNP editing experiment
Alt-R sgRNAs were ordered from IDT and resuspended in TE buffer to 100 μM. The RNP complex was assembled by combining 2.1 μL PBS, 1.2 μL Alt-R sgRNA (100 μM) and 1.7 μL Alt-R Cas9 Nuclease (61 μM, IDT) and incubating at room temperature for 10–20 min. The RNP mixture was added to 1 × 10⁶ cells from the PE2(+) K562 500-cell pool, resuspended in nucleofection solution, along with 0.5 μg pmax-GFP (Lonza) and 1 μL electroporation enhancer (100 μM, IDT). gDNA was extracted on Days 1, 2 and 4 after transfection for amplicon sequencing of the synHEK3 reporters.
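The RNP recipe above implies a slight molar excess of sgRNA over Cas9. A minimal sketch of the arithmetic, using only the volumes and concentrations stated in the text (μL × μM = pmol):

```python
def rnp_composition(sgrna_ul: float = 1.2, sgrna_um: float = 100.0,
                    cas9_ul: float = 1.7, cas9_um: float = 61.0,
                    pbs_ul: float = 2.1) -> dict:
    """Amounts in one RNP assembly; 1 uL of a 1 uM stock contains 1 pmol."""
    sgrna_pmol = sgrna_ul * sgrna_um
    cas9_pmol = cas9_ul * cas9_um
    return {
        "sgRNA_pmol": sgrna_pmol,
        "Cas9_pmol": cas9_pmol,
        "sgRNA_to_Cas9": sgrna_pmol / cas9_pmol,
        "volume_ul": sgrna_ul + cas9_ul + pbs_ul,
    }
```

With the default values this gives 120 pmol sgRNA and about 104 pmol Cas9 in 5 μL, a sgRNA:Cas9 ratio of roughly 1.16:1.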
CRISPRoff experiment
On Day 0, 2 × 10⁶ clone 5 cells were transfected with CRISPRoff-v2.1 (3 μg), pU6-Sp-dual-gRNA (1 μg) and pmax-GFP (500 ng, Lonza) using the SF Cell Line 4D-Nucleofector X Kit L (Lonza). On Day 2, cells with high GFP expression (top 20%) were sorted on a flow cytometer and expanded. On Day 11, 4 × 10⁵ cells were transfected with the pU6-Sp-pegRNA-HEK3-insCTT plasmid (1 μg). Cells were lysed on Day 15 for amplicon sequencing of the synHEK3 reporter. A portion of cells was collected on Day 11 for total RNA extraction and RNA sequencing (described below).
CRISPRa experiments
K562 experiments:
On Day 0, 4 × 10⁵ K562 dCas9-VP64 cells were transfected with PB-CMV-MCP-XTEN80-p65-Rta-3xNLS-P2A-T2A-mPlum (600 ng) and a pair of pU6-Sp-gRNA-2XMS2 plasmids (200 ng each) targeting the same promoter. On Day 2, 4 × 10⁵ cells from the previous transfection were transfected with pCMV-PEmax (Addgene: 174820)13 or PB-CMV-PEmax-EF1a-Puro (800 ng) and the pU6-Sp-(e)pegRNA plasmids (400 ng). Cells transfected with PB-CMV-PEmax-EF1a-Puro were selected with 2 μg/mL puromycin for 2 days, starting 24 hours after transfection. Cells were lysed on Day 5 or 6 for amplicon sequencing.
iPSCs experiments:
On Day 0, 2 × 10⁵ DHFR-dCas9-VPH WTC11 cells were transfected with PB-CMV-MCP-XTEN80-p65-Rta-3xNLS-P2A-T2A-mPlum (2.1 μg) and a pair of pU6-Sp-gRNA-2XMS2 plasmids (700 ng each) targeting the same promoter. Trimethoprim (20 μM; Sigma-Aldrich) was added to induce dCas9-VPH. On Day 3, 2 × 10⁵ cells from the previous transfection were transfected with PB-CMV-PEmax-EF1a-Puro (2 μg) and the pU6-Sp-(e)pegRNA plasmids (1 μg). Cells were continuously treated with 20 μM trimethoprim and were selected with 2 μg/mL puromycin for 1 day, starting 24 hours after transfection. Cells were lysed on Day 5 or 6 for amplicon sequencing.
Design and test of IL2RB epegRNA libraries
SpCas9 spacers for selected exons of human IL2RB (exons 1, 2, 4, 5, 8, 9 and 10) were generated using GuideScan275, sorted by the number of off-target sites and by cutting efficiency, and filtered for spacers containing all four bases (A, T, C and G) within the first 7 bp from the cut site, which allowed all 12 possible substitutions to be modeled close to the cut site. For substitutions, the first occurrence of each of A, T, C and G was converted to each of the other 3 bases. For 1-bp insertions, a single base complementary to the first base after the cut site was inserted. For 3- and 6-bp insertions, CCT and CGTCAT, respectively, were inserted at the cut site. For long insertions, the 34-bp loxP sequence was inserted at the cut site. For deletions, 1, 3 or 6 bp after the cut site were deleted. epegRNAs for substitutions and for insertions and deletions of ≤3 bp were designed using DeepPrime; the resulting epegRNAs for each exon had a median DeepPrime score > 40%. For 6-bp insertions and deletions and for loxP insertions, epegRNAs were designed manually with a 13-bp primer binding site and a 10-bp template (excluding the inserted or deleted sequence). See Table S1 for the epegRNA sequences.
Nineteen epegRNAs derived from the same spacer were synthesized as a pool (IDT oPool) and cloned into the pU6-Sp-pegRNA-HEK3-CTT backbone as a library. The 7 epegRNA libraries were tested in dCas9-VP64 K562 cells following the CRISPRa procedure described above.
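The edit set described above (12 substitutions, four insertions and three deletions) adds up to the 19 epegRNAs per spacer. The sketch below enumerates that set for one target sequence. It is illustrative only: `design_edits` is a hypothetical helper, the convention that `cut` indexes the first base 3’ of the nick on the protospacer strand is an assumption, and pegRNA design itself (DeepPrime or manual) is not modeled.

```python
from typing import List, Tuple

# Standard 34-bp loxP site
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def design_edits(target: str, cut: int, window: int = 7) -> List[Tuple[str, str]]:
    """Enumerate (label, edited target) pairs for one spacer.

    target: protospacer-strand sequence around the site (uppercase ACGT).
    cut: index of the first base after the cut site (assumed convention).
    """
    near = target[cut:cut + window]
    if not set("ACGT") <= set(near):
        raise ValueError("window lacks at least one of A/C/G/T")
    edits = []
    # 12 substitutions: first occurrence of each base -> each of the other 3 bases
    for base in "ACGT":
        pos = cut + near.index(base)
        for new in "ACGT":
            if new != base:
                label = f"{base}{pos - cut + 1}{new}"
                edits.append((label, target[:pos] + new + target[pos + 1:]))
    # 4 insertions at the cut site
    inserts = [("ins1", COMPLEMENT[target[cut]]), ("ins3", "CCT"),
               ("ins6", "CGTCAT"), ("loxP", LOXP)]
    for label, seq in inserts:
        edits.append((label, target[:cut] + seq + target[cut:]))
    # 3 deletions starting immediately after the cut site
    for n in (1, 3, 6):
        edits.append((f"del{n}", target[:cut] + target[cut + n:]))
    return edits
```

For example, `design_edits("GGCCCAGACTGAGCACGTGATGGCAGAGG", 17)` (an HEK3-like site with the nick between positions 17 and 18) returns 19 labelled edits.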
Quantitative PCR analysis
qPCRs were performed on purified gDNA or on cDNA reverse transcribed from total RNA using SuperScript IV reverse transcriptase (200 U/μL, Invitrogen) following the manufacturer’s instructions. qPCRs were performed with Power SYBR Green PCR Master Mix (Invitrogen) or KAPA2G Robust HotStart ReadyMix (Roche) supplemented with SYBR Green (Invitrogen). For copy number estimation of synHEK3 reporters, Cq values of synHEK3 were normalized to those of SNRPB (3 copies per genome). See Table S1 for the list of primers used.
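The normalization to SNRPB can be read as a ΔCq copy-number estimate. The exact formula is not stated, so the sketch below assumes equal amplification efficiency for both amplicons (`amp_efficiency = 2.0`, i.e. 100%) and averages replicate Cq values; the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def synhek3_copy_number(cq_synhek3: Sequence[float], cq_snrpb: Sequence[float],
                        snrpb_copies: float = 3.0,
                        amp_efficiency: float = 2.0) -> float:
    """Estimate synHEK3 copies per genome by the delta-Cq method.

    Assumes both amplicons amplify with the same per-cycle fold increase.
    """
    # Positive delta means synHEK3 crosses threshold earlier, i.e. more copies
    delta_cq = mean(cq_snrpb) - mean(cq_synhek3)
    return snrpb_copies * amp_efficiency ** delta_cq
```

One cycle earlier than SNRPB therefore corresponds to about 6 reporter copies per genome; if the two primer pairs differ in efficiency, a standard curve would be needed instead.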
T7-assisted reporter mapping
gDNA of K562 cells was purified using the DNeasy Blood & Tissue Kit (QIAGEN) and transcribed in vitro with the HiScribe T7 High Yield RNA Synthesis Kit (New England Biolabs). Briefly, each reaction (20 μL) contained 0.3–1 μg gDNA, NTPs (10 mM each) and 2 μL T7 RNA Polymerase Mix. The reaction mixture was incubated at 37 °C for 16 hours. gDNA was then digested with 2.5 μL DNase (QIAGEN) in a 100 μL reaction at room temperature for 30 min. RNA was extracted with TRIzol LS Reagent (Invitrogen), and the aqueous phase was precipitated with 1 volume of isopropanol and 5 μg glycogen (Invitrogen) at −80 °C for 1 hour. The RNA pellet was collected by centrifugation at 21,000 × g and 4 °C for 1 hour, washed with ice-cold 80% ethanol and resuspended in 11.5 μL nuclease-free water.
For reverse transcription, RNA was incubated with 0.5 μL 100 μM RT primer p6 (5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNNNNNNN-3’) and 1 μL 10 mM dNTP at 65 °C for 5 min and cooled on ice. Then, 4 μL 5X RT buffer, 1 μL 100 mM DTT, 1 μL RNaseOUT (40 U/μL) and 1 μL SuperScript IV RT (200 U/μL) were added and the reaction mixture was incubated at 23 °C for 10 min, 50 °C for 15 min and 80 °C for 10 min.
The cDNA library was amplified with KAPA2G Robust HotStart polymerase using primers p7 (5’-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGAAAGGAAGCCCTGCTTCCTCCAGAGGG-3’, 0.5 μM) and p8 (5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3’, 0.5 μM). PCR was performed as follows: 95 °C 3 min; 95 °C 15 s, 65 °C 15 s, 72 °C 30 s, 16–18 cycles; and 72 °C 1 min. The PCR product was subjected to double-sided size selection (0.5X, 0.9X) and cleaned up with AMPure XP beads; the resulting product ranged from 200 to 1000 bp. To prepare Illumina sequencing libraries, 5–10 ng PCR product was re-amplified for 5 cycles with the Nextera P5 and TruSeq P7 library index primers listed in Table S1. The final PCR product underwent another round of double-sided size selection (0.5X, 0.9X) and clean-up with AMPure XP beads. The library was sequenced on an Illumina MiSeq in paired-end mode (Read 1: 254 bp; Read 2: 55 bp).
synHEK3 editing library preparation
gDNA of K562 cells was purified using the DNeasy Blood & Tissue Kit. Alternatively, cells were lysed in lysis buffer [10 mM Tris-HCl pH 8.0, 0.05% SDS and 0.04 mg/mL proteinase K (Thermo Scientific)] and incubated at 50 °C for 60 min and 80 °C for 30 min. 100–250 ng gDNA or cell lysate was amplified using KAPA2G Robust HotStart ReadyMix with primers p9 (5’-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCTACCCCGACCACATGAAGCAGC-3’, 0.5 μM) and p10 (5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNNNNNNNNNGACCATGTCATCGCGCTTCTCGT-3’, 0.5 μM). PCR was performed as follows: 95 °C 3 min; 95 °C 15 s, 68-N °C 15 s, 72 °C 30 s, for 9 cycles (N was cycle number); 95 °C 15 s, 65 °C 15 s, 72 °C 30 s, for 11 cycles; and 72 °C 1 min. To ensure sufficient coverage and accurate measurement of editing efficiencies for the K562 synHEK3 pool, we pooled products from at least 16 PCR reactions. The PCR product was purified with AMPure XP beads. 5 ng PCR product was re-amplified with the Nextera P5 and TruSeq P7 library index primers in Table S1 for 5 cycles. The final libraries were cleaned up with AMPure XP beads and sequenced on an Illumina NextSeq 500 or NextSeq 2000 sequencer.
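The "68-N °C" notation describes a touchdown PCR in which the annealing temperature drops by 1 °C per cycle before a fixed-temperature phase. The sketch below expands such a program into per-cycle annealing temperatures. Whether N starts at 0 or 1 is not specified, so `first_n` is an explicit assumption; the function name is hypothetical.

```python
from typing import List


def annealing_schedule(start_c: int = 68, touchdown_cycles: int = 9,
                       final_c: int = 65, final_cycles: int = 11,
                       first_n: int = 0) -> List[int]:
    """Annealing temperature (deg C) for every cycle of a touchdown PCR.

    Touchdown cycles anneal at (start_c - N), N being the cycle number;
    first_n (0 or 1) is an assumption about how N is counted.
    """
    touchdown = [start_c - (first_n + i) for i in range(touchdown_cycles)]
    return touchdown + [final_c] * final_cycles
```

The same helper covers the amplicon-sequencing program for endogenous targets, e.g. `annealing_schedule(66, 8, 60, m)` with m the qPCR-determined number of additional cycles.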
Lentivirus production and transduction
1.6 × 10⁷ HEK293T cells were seeded in a 10-cm dish the day before transfection. 5.6 μg lentiviral vector and 14.4 μg ViraPower Lentiviral Packaging Mix (Invitrogen) were mixed and transfected using Lipofectamine 3000 following the manufacturer’s instructions. Medium was changed 24 hours post-transfection. Viruses were collected at 48 and 72 hours post-transfection, filtered through 0.45 μm filters and concentrated 100-fold using PEG-it Virus Precipitation Solution (System Biosciences).
In general, K562 cells were transduced with lentivirus in the presence of 8 μg/mL polybrene (Millipore). Medium was replaced after 24 hours.
Pooled shRNA screen
To prepare for the pooled shRNA screen, two monoclonal lines (clone 3 and clone 5) were isolated from the original 500-cell synHEK3 pool and genotyped; they carried 22 and 28 unique synHEK3 reporters, respectively. These cells were transduced with Lenti-rtTA-P2A-Blast virus and selected in 10 μg/mL blasticidin for 7 days. rtTA(+) monoclonal lines were then generated for downstream experiments.
The shRNA lentiviral library was first titrated and then used to transduce clone 3 and clone 5 rtTA(+) cells at an MOI of 10 with >1000X coverage. Transduced cells were selected in 800 μg/mL Geneticin (Invitrogen) for 7 days. 1 × 10⁶ cells were treated with 1 μg/mL doxycycline for 2 days to induce shRNA expression. Then 1.5 × 10⁶ to 2.5 × 10⁶ cells were transfected with 4.5–6.0 μg pU6-Sp-pegRNA-HEK3-ins6N (the plasmid used to introduce a 6-bp insertion at HEK3). After 3–4 days, cells from the two clones were collected and mixed at a 1:1 ratio for sci-RNA-seq3 library preparation.
The sci-RNA-seq3 libraries were prepared using a modified version of a recently published protocol from our lab49. Briefly, K562 cells were counted, washed with PBS and lysed in 5 mL Hypotonic lysis buffer B with DEPC (Sigma) for 10 min on ice. Nuclei were collected by centrifugation at 500 × g and 4 °C for 3 min and resuspended in 1 mL 0.3 M SPBSTM buffer with DEPC. Nuclei were fixed with 4 mL ice-cold methanol for 15 min on ice with occasional swirling. After rehydration by adding 10 mL SPBSTM, nuclei were collected by centrifugation and washed once with 1 mL 0.3 M SPBSTM. For T7 in situ transcription, fixed nuclei resuspended in 171 μL SPBSTM were mixed with 99 μL NTP buffer and 30 μL T7 polymerase mix (New England Biolabs) and incubated in a 1.5 mL LoBind tube (Eppendorf) at 37 °C for 30 min. Afterwards, nuclei were washed with 1 mL SPBSTM and sonicated using a Diagenode sonicator for 12 seconds. Nuclei were then stained with YOYO-1 Iodide (Invitrogen) and counted on a Countess automated cell counter.
Counted nuclei were resuspended in SPBSTM at 4 × 10⁶/mL. For each RT plate, 500 μL nuclei were combined with 56 μL 10 mM dNTPs (New England Biolabs) and distributed to a low-bind 96-well plate (5 μL per well). 1 μL of indexed oligo-dT, HEK3 and CS1 primers (10 μM) was added to each well. The plate was incubated at 55 °C for 5 min and cooled on ice. RT mix was prepared by combining 240 μL 5X SuperScript IV Buffer, 60 μL SuperScript IV and 60 μL water, and 3 μL RT mix was added to each well. The RT plate was incubated at 55 °C for 10 min and cooled on ice. 5 μL ice-cold SPBSTM was added to each well, and all nuclei were pooled in pre-chilled LoBind tubes. Nuclei were washed once with 1 mL SPBSTM and resuspended in 1,200 μL SPBSTM (per ligation plate).
11 μL nuclei were distributed to each well of a new 96-well plate and mixed with 2 μL 10 μM ligation primers. For each ligation plate, 195 μL 10X T4 Ligation Buffer was mixed with 65 μL T4 DNA Ligase, and 2 μL of the ligation mixture was added to each well. Ligation was performed at room temperature for 20 min on the bench. For the shRNA screen, nuclei were distributed into 4 ligation plates (384 ligation indices) to increase cell index complexity. The ligation plate was then cooled on ice and 10 μL ice-cold SPBSTM was added to pool nuclei from all wells. Nuclei were washed twice with 1 mL SPBSTM.
Nuclei were resuspended in 1X Second Strand Synthesis Buffer (New England Biolabs), counted with YOYO-1 Iodide and diluted to 1.3 × 10⁵ to 2.5 × 10⁵ per 400 μL. 4 μL nuclei were distributed to each well of multiple 96-well plates, with the number of plates determined by the number of nuclei retrieved at this step. Extra plates with nuclei were stored at −80 °C. Second strand synthesis mix was prepared by combining 10.5 μL 10X Second Strand Synthesis Buffer, 35 μL 20X Second Strand Synthesis Enzyme and 94.5 μL water, and 1 μL was added to each well. The plate was incubated at 16 °C for 2.5 hours. To lyse nuclei, 1 μL ~1.07 AU/mL protease (QIAGEN) was added to each well and incubated at 37 °C for 40 min. Protease was inactivated at 75 °C for 20 min. 5 μL EB buffer (QIAGEN) was added to each well and mixed. 5 μL was removed from each well for enrichment PCR, and the remainder was used for Tn5 tagmentation and transcriptome library preparation.
To prepare the transcriptomic library, Tn5-N7 and mosaic end oligos were resuspended to 100 μM in annealing buffer (50 mM NaCl, 40 mM Tris-HCl pH 8.0), mixed at a 1:1 ratio and annealed on a thermocycler using the following program: 95 °C 5 min, cool to 65 °C (0.1 °C/s), 65 °C 5 min, cool to 4 °C (0.1 °C/s). 20 μL Tagmentase (Tn5 transposase - unloaded, Diagenode) was mixed with 20 μL annealed oligos and incubated on a thermomixer at 350 rpm and 23 °C for 30 min. 20 μL glycerol was added to the loaded Tn5 before storage at −20 °C. For tagmentation, 13.75 μL N7-loaded Tn5 (Diagenode) was mixed with 550 μL TD buffer, and 5 μL of this mixture was added to each well. The plate was incubated at 55 °C for 5 min. To remove Tn5 transposase, 50 μL 1% SDS, 50 μL BSA (New England Biolabs) and 225 μL water were mixed, and 2.6 μL of the mixture was added to each well and incubated at 55 °C for 15 min. SDS was then quenched by adding 2 μL 10% Tween-20 to each well. For PCR, 96 indexed P5 primers were used with constant or indexed P7 primers (see Table S1). A PCR master mix was prepared by combining 2,200 μL NEBNext High-Fidelity 2X PCR Master Mix (New England Biolabs), 22 μL common P7 primer (100 μM) and 352 μL water. 2 μL indexed TruSeq P5 primer and 23.4 μL PCR master mix were added to each well. If indexed P7 primers were used, 2 μL of 10 μM primer was added to each well. PCR was performed as follows: 70 °C 3 min; 98 °C 30 s; 98 °C 10 s, 63 °C 30 s, 72 °C 60 s, 16 cycles; and 72 °C 5 min. 3 μL from each well was pooled and cleaned up with 0.8X AMPure XP beads. The library was resolved on a 1% agarose gel, and the smear between 300 and 600 bp was extracted using the Monarch DNA Gel Extraction Kit (New England Biolabs).
For HEK3 and shRNA libraries, enrichment PCRs were performed directly on protease-treated cDNA. To match PCR indices with those of the transcriptome library, the same indexed TruSeq P5 primers were used. The enrichment PCR master mix contained 2,200 μL OneTaq 2X Master Mix (New England Biolabs), 16.5 μL synHEK3 P7 enrichment primer (100 μM), 16.5 μL shRNA P7 enrichment primer (100 μM), 44 μL 100X SYBR Green (Invitrogen) and 1,353 μL water. 2 μL indexed P5 primer and 33 μL PCR master mix were added to each well. PCR was performed as follows: 95 °C 3 min; 95 °C 15 s, 68-N °C 15 s, 72 °C 30 s, for 9 cycles (N was cycle number); 95 °C 15 s, 65 °C 15 s, 72 °C 30 s, for M cycles (determined by qPCR); and 72 °C 1 min. We monitored the reaction on a real-time qPCR machine and terminated it in the log phase. All PCR products were pooled and concentrated using 0.9X AMPure XP beads. The library was resolved on a 1% agarose gel, and the two discrete bands corresponding to the HEK3 and shRNA constructs were extracted using the Monarch DNA Gel Extraction Kit.
Finally, all libraries were sequenced on an Illumina NextSeq 500 or NextSeq 2000 sequencer. For HEK3 and shRNA libraries, custom Index 1 and Read 2 primers were used. Typically, 34 cycles were allocated to Read 1 to read the RT index, ligation index and unique molecular identifier (UMI). See Table S1 for a list of oligos and sequencing primers used in this experiment.
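Spreading nuclei over 384 rather than 96 ligation indices lowers the chance that two nuclei in the same PCR well share an RT × ligation barcode. The sketch below estimates that collision fraction. It assumes 96 RT indices (a single RT plate) and independent, uniformly random index assignment, both simplifications; the well loading of 1,300 to 2,500 nuclei follows from 4 μL of 1.3 × 10⁵ to 2.5 × 10⁵ nuclei per 400 μL.

```python
def barcode_collision_fraction(nuclei_per_well: int, rt_indices: int,
                               ligation_indices: int) -> float:
    """Expected fraction of nuclei in one PCR well whose RT x ligation
    barcode is shared with at least one other nucleus in that well.

    Assumes each nucleus receives an independent, uniformly random
    RT and ligation index (a simplification of split-pool sorting).
    """
    combos = rt_indices * ligation_indices
    return 1.0 - (1.0 - 1.0 / combos) ** (nuclei_per_well - 1)


for ligation in (96, 384):
    print(ligation, barcode_collision_fraction(2500, 96, ligation))
```

Under these assumptions, at 2,500 nuclei per well the collision fraction falls from roughly 24% with 96 ligation indices to roughly 7% with 384.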
Amplicon sequencing
Cells were lysed in lysis buffer (10 mM Tris-HCl pH 8.0, 0.05% SDS and 0.04 mg/mL proteinase K) and incubated at 50 °C for 60 min and 80 °C for 30 min. Lysates were used directly for PCR with the KAPA2G Robust HotStart ReadyMix and primers (0.5 μM each) designed for the endogenous targets. PCR was performed as follows: 95 °C 3 min; 95 °C 15 s, 66-N °C 15 s, 72 °C 40 s, for 8 cycles (N was cycle number); 95 °C 15 s, 60 °C 15 s, 72 °C 40 s, for M cycles (determined by qPCR); and 72 °C 1 min. The PCR product was purified with AMPure XP beads. 5 ng PCR product was re-amplified with the Nextera P5 and TruSeq P7 library index primers in Table S1 for 5 cycles. The final libraries were cleaned up with AMPure XP beads and sequenced on an Illumina MiSeq, NextSeq 500 or NextSeq 2000 sequencer.
Sequencing reads were demultiplexed using the bcl2fastq software (Illumina). A custom script was used to determine editing efficiency.
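The custom script is not described further. A minimal sketch of one common definition, assuming efficiency is the fraction of all amplicon reads that contain an exact match to a short window spanning the intended edit; `edited_window` is a hypothetical, user-chosen sequence long enough that it cannot occur in unedited reads, and reads are assumed to be in the amplicon orientation.

```python
from typing import Iterable


def editing_efficiency(reads: Iterable[str], edited_window: str) -> float:
    """Fraction of amplicon reads containing the intended edit.

    Reads with other outcomes (unedited, indel byproducts) stay in the
    denominator, so this is a precise-edit rate over all reads.
    """
    n_reads = n_edited = 0
    for read in reads:
        n_reads += 1
        if edited_window in read:
            n_edited += 1
    if n_reads == 0:
        raise ValueError("no reads")
    return n_edited / n_reads
```

Counting byproducts in the denominator is a design choice; restricting the denominator to edited plus unedited reads would report a higher value.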
RNA sequencing
For bulk RNA sequencing of K562 PE2(+) cells and EPZ-5676-treated cells, total RNA was purified using TRIzol LS Reagent following the manufacturer’s instructions, treated with RNase-free DNase (QIAGEN) and cleaned up using the RNeasy Mini Kit (QIAGEN). 1 μg total RNA (RNA integrity ≥ 9.7) was used for unstranded mRNA library preparation using the Illumina TruSeq RNA Library Prep Kit v2. For the CRISPRoff experiment and the HLTF knockdown experiment, total RNA was purified using TRIzol LS Reagent, treated with Turbo DNase (Invitrogen) and cleaned up using the RNeasy Mini Kit. 1 μg total RNA (RNA integrity ≥ 9.9) was used for stranded mRNA library preparation using the Illumina TruSeq Stranded mRNA Library Prep Kit. The RNA libraries were indexed using the Illumina TruSeq RNA UD Indexes. All libraries were sequenced on an Illumina NextSeq 500 or NextSeq 2000 sequencer in paired-end mode.
Sequencing reads were demultiplexed using the bcl2fastq software. For K562 PE2(+) and EPZ-5676 treated cells, an average of 34 million 75-bp paired-end reads were obtained. For the CRISPRoff experiment, on average, 55 million 50-bp paired-end reads were obtained per sample. For the HLTF knockdown experiment, 20 million 59-bp paired-end reads were obtained per sample. For calculating TPM of genes, sequencing reads from the unstranded RNA libraries were aligned to the GRCh38 reference genome (Gencode V43) using Salmon (v1.9.0)76. Reads were aligned to the GRCh38 reference genome and counted against all Ensembl genes using STAR (2.7.6a)77. Raw counts were analyzed with DESeq278.
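Salmon reports TPM directly (using effective transcript lengths), so no extra step is needed; the sketch below only restates the definition for readers comparing TPM with raw counts. Gene names and lengths in the usage are made up.

```python
from typing import Dict


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million from read counts and (effective) lengths."""
    # 1) length-normalize: reads per kilobase
    rpk = {g: counts[g] / (lengths_bp[g] / 1_000) for g in counts}
    # 2) rescale so that all values sum to one million
    per_million = sum(rpk.values()) / 1_000_000
    return {g: v / per_million for g, v in rpk.items()}
```

For example, a 1-kb gene with 10 reads and a 2-kb gene with 20 reads have equal read density, so each receives 500,000 TPM.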
ATAC sequencing
1 × 10⁵ cells were collected, washed with PBS and pelleted by centrifugation at 500 × g and 4 °C for 5 min. Cells were then lysed with 50 μL lysis buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% NP-40, 0.1% Tween-20, 0.01% digitonin) on ice for 3 min and neutralized with 250 μL RSB buffer with Tween-20 (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween-20). Nuclei were counted on a hemocytometer, and 5 × 10⁴ nuclei were used for tagmentation. Each 50 μL tagmentation reaction contained 25 μL 2x TD Buffer, 8.25 μL PBS, 0.5 μL 1% digitonin, 0.5 μL 10% Tween-20, 2.5 μL Tn5 enzyme (Illumina, 2.5 μM) and 13.25 μL water, and was incubated at 37 °C for 30 min. DNA was purified using the Clean and Concentrate kit (Zymo) and eluted in 10–20 μL EB Buffer (QIAGEN). All or half of the eluted DNA was used for qPCR. In each reaction, 10 μL tagmented DNA was mixed with 25 μL NEBNext High-Fidelity 2X PCR Master Mix (NEB), 2.5 μL i5 primer, 2.5 μL i7 primer (Table S1), 0.25 μL 100X SYBR Green (Invitrogen) and 9.75 μL water. The qPCR reaction was performed as follows: 72 °C 5 min, 98 °C 30 s; 98 °C 10 s, 63 °C 30 s, 72 °C 1 min. The reaction was stopped after 7 cycles. Libraries were pooled, purified with AMPure XP beads and sequenced on an Illumina NextSeq 2000 sequencer.
On average, 34 million 59-bp paired-end reads were obtained per sample. Reads were processed using the ENCODE79 ATAC-seq pipeline. Filtered peaks were sorted and merged across samples using bedops (v2.4.35)80 and were used to generate a peak count table using featureCounts (v2.0.2)81. Raw counts were analyzed with DESeq2 (Table S6).
T7-assisted reporter mapping library analysis
Sequencing reads were first demultiplexed using the bcl2fastq software. Under the current library design and sequencing scheme, Read 1 started in genomic sequence and extended into the integrated synHEK3 construct, while Read 2 contained the reporter barcode. Reads were processed as follows: 1) For each read pair, the 16-bp reporter barcode was extracted from Read 2 and appended to the read name of Read 1. 2) Read 1 was trimmed using cutadapt (v4.1)82 with the following parameters: --cores=4 --discard-untrimmed -e 0.2 -m 10 -O 8 -a CCCTAGAAAGATAGTCTGCGTAAAATTGACGCATG. The adapter corresponds to the 3’ ITR of the piggyBac transposon. Because --discard-untrimmed was used, only reads spanning insertion junctions were retained (and trimmed). 3) Trimmed reads were aligned to the GRCh38 reference genome using bwa mem (v0.7.17)83, with the -Y option to use soft clipping for supplementary alignments. 4) Reads that mapped uniquely (no XA:Z tag) and contiguously near putative piggyBac landing pads (TTAA motifs) were retained using samtools (v1.9)84 and a custom script. 5) Aligned reads were converted to BED format using the sam2bed function in bedops (v2.4.35), and only reads aligned to standard chromosomes were kept. 6) Insertion points were calculated for all reads based on the strand of alignment, and reads were sorted by insertion coordinate. 7) The first 8 bp of each read were used as a UMI. A custom script collapsed reads on a per-location, per-barcode, per-UMI basis to generate a barcode-location-UMI count table. 8) At each location, synHEK3 barcodes within a Levenshtein distance of <3 were collapsed, and barcodes with >3 UMIs were retained. 9) The count table was converted to a GenomicRanges85 object in R. The coordinates of the last 4 bp of the aligned reads were designated as the genomic locations of the inserted synHEK3 reporters. 10) The “landing pads” of the mapped synHEK3 reporters were retrieved using the getSeq() function in the BSgenome package. Reporters lacking a TTAA sequence (which could result from PCR error, mapping error, or the use of non-canonical landing pads) were removed. 11) SynHEK3 barcodes mapped to more than one location were removed.
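The barcode and UMI bookkeeping in steps 6–8, 10 and 11 can be sketched in a few lines of Python. This is a minimal sketch, not the custom script used in this study. It assumes BED-style 0-based, half-open alignment coordinates, takes the 4-bp junction (the TTAA landing pad) at the 3’ end of the read (the alignment end for '+' reads, the alignment start for '-' reads), and merges each barcode greedily into the most abundant previously kept barcode within the distance cutoff. The function names and the `landing_pad` lookup (standing in for getSeq()) are hypothetical.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def landing_pad_interval(chrom, start, end, strand):
    """4-bp junction at the read's 3' end, in 0-based half-open coordinates (assumption)."""
    if strand == "+":
        return (chrom, end - 4, end)
    return (chrom, start, start + 4)


def collapse_barcodes(records, max_dist=3, min_umis=3):
    """records: (location, barcode, umi) tuples -> {(location, barcode): n_umis}."""
    # Step 7: collapse on a per-location, per-barcode, per-UMI basis.
    umis = defaultdict(set)
    for loc, bc, umi in records:
        umis[(loc, bc)].add(umi)
    by_loc = defaultdict(dict)
    for (loc, bc), umi_set in umis.items():
        by_loc[loc][bc] = len(umi_set)
    # Step 8: merge barcodes within Levenshtein distance < max_dist at each location,
    # then keep barcodes supported by more than min_umis UMIs.
    kept = {}
    for loc, counts in by_loc.items():
        reps = {}
        for bc in sorted(counts, key=lambda b: (-counts[b], b)):
            target = next((r for r in reps if levenshtein(bc, r) < max_dist), None)
            if target is None:
                reps[bc] = counts[bc]
            else:
                reps[target] += counts[bc]
        for bc, n in reps.items():
            if n > min_umis:
                kept[(loc, bc)] = n
    return kept


def filter_reporters(kept, landing_pad):
    """Steps 10-11: require a TTAA landing pad, then drop barcodes seen at >1 location."""
    on_ttaa = {key: n for key, n in kept.items() if landing_pad.get(key[0]) == "TTAA"}
    locations = defaultdict(set)
    for loc, bc in on_ttaa:
        locations[bc].add(loc)
    return {(loc, bc): n for (loc, bc), n in on_ttaa.items() if len(locations[bc]) == 1}
```

Because TTAA is palindromic, the same landing-pad check works for reads aligned to either strand.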
Aligned reads were visualized in the Integrative Genomics Viewer (IGV; Fig. 1B)86. Motif enrichment analysis and visualization (Fig. 1C) were performed using WebLogo 387. Genomic coverage analysis and annotation of the synHEK3 reporters (Fig. 1D,E) were performed using the ChIPseeker package88,89 in R. Enrichment of synHEK3 reporters across genomic features (Supplementary Fig. 1G,H) was computed using the annotatePeaks.pl script in Homer90. A custom script was used to identify overlapping genes and to calculate the distance to the closest TSS.
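As an illustration of the gene-proximity annotation (and of the scaled distances used in Supplementary Fig. 2F), the following Python sketch computes the absolute distance from an insertion to the closest TSS, lists overlapping genes, and expresses a position along a gene as a percentage of gene length. It is hypothetical, not the custom script used here: it assumes a pre-sorted, non-empty list of TSS coordinates on the same chromosome and genes given as (start, end, strand) with start < end.

```python
import bisect


def distance_to_closest_tss(pos, tss_sorted):
    """Absolute distance (bp) from pos to the nearest TSS in a sorted, non-empty list."""
    i = bisect.bisect_left(tss_sorted, pos)
    neighbors = tss_sorted[max(i - 1, 0):i + 1]  # nearest TSS lies just before or at/after pos
    return min(abs(pos - t) for t in neighbors)


def overlapping_genes(pos, genes):
    """Names of genes whose [start, end] interval contains pos; genes maps name -> (start, end, strand)."""
    return [name for name, (start, end, _) in genes.items() if start <= pos <= end]


def scaled_gene_position(pos, start, end, strand):
    """Position relative to the TSS as % of gene length: <0 is upstream of the TSS, >100 is past the TTS."""
    tss, tts = (start, end) if strand == "+" else (end, start)
    return 100.0 * (pos - tss) / (tts - tss)
```

Handling strand inside `scaled_gene_position` keeps "upstream" and "downstream" consistent for genes on either strand.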
ENCODE dataset analysis
ENCODE datasets were downloaded from the ENCODE portal91. Accessions of the datasets are listed in the Key Resources Table and Table S2. The GenomicRanges object containing all unique synHEK3 reporters was resized to 2 kb in width. The R package Genomation92 was used to extract values within the 2-kb windows from the bigWig files of the chromatin features. For a specific dataset, scores of the synHEK3 reporters were calculated as , where x_i was the value at base i, and c was the minimum non-zero value of the chromatin feature score at any base within all the windows surveyed. To generate the heatmap in Fig. 6B, chromatin feature scores were scaled using the scale() function in R. Clustering and visualization of synHEK3 reporters were performed using the ComplexHeatmap package93 in R. Epigenetic tracks were visualized using Gviz94 in R (Fig. 2F; Supplementary Fig. 2I).
synHEK3 editing library analysis
Sequencing reads from the Illumina NextSeq platform were first demultiplexed using the bcl2fastq software. The 16-bp reporter barcode was extracted from each read and appended to the read name. Sequences surrounding the CRISPR cut site were extracted from all reads, and a barcode-editing outcome table was generated. For prime editing experiments, a custom script was used to align sequences to a reference sequence and count mutation frequencies for every barcode. For Cas9 mutagenesis analysis, all sequences were aggregated and the editing outcomes (alleles) with the highest counts were selected. These most frequent alleles were then aligned to the reference sequences using needleall95,96 with the following parameters: -gapopen 20 -gapextend 0.5 -endopen 20. A custom script was used to annotate the mutational events. MMEJ alleles were selected based on the following criteria: 1) a microhomology of at least 2 bp; and 2) an observed allele frequency 6-fold higher than the expected frequency. The 9 most frequent non-wild-type alleles were used to calculate the MMEJ/(MMEJ+NHEJ) ratio in Supplementary Fig. 3G–I.
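The two MMEJ criteria and the ratio can be illustrated with a short Python sketch. It rests on several assumptions not stated above: a deletion is described by the 0-based half-open interval it removes from the reference; microhomology is measured as the number of positions the deletion can slide while yielding the same allele, capped at the deletion length; expected allele frequencies are supplied from elsewhere; "6-fold higher" is read as "at least 6-fold"; every non-MMEJ mutant allele counts as NHEJ; and the ratio is weighted by allele counts. All names are hypothetical.

```python
def microhomology_length(ref, start, end):
    """Microhomology flanking a deletion of ref[start:end], capped at the deletion length."""
    right = 0  # deleted 5' bases repeated immediately after the deletion
    while start + right < end and end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    left = 0  # deleted 3' bases repeated immediately before the deletion
    while end - 1 - left >= start and start - 1 - left >= 0 and ref[end - 1 - left] == ref[start - 1 - left]:
        left += 1
    return min(left + right, end - start)


def is_mmej(mh_len, observed_freq, expected_freq, min_mh=2, min_fold=6.0):
    """MMEJ call: >= min_mh bp of microhomology and observed >= min_fold x expected frequency."""
    return mh_len >= min_mh and observed_freq >= min_fold * expected_freq


def mmej_ratio(alleles, top_n=9):
    """alleles: (count, is_wild_type, is_mmej) tuples; uses the top_n most frequent non-WT alleles."""
    mutant = sorted((a for a in alleles if not a[1]), key=lambda a: a[0], reverse=True)[:top_n]
    mmej = sum(count for count, _, flag in mutant if flag)
    total = sum(count for count, _, _ in mutant)  # MMEJ + NHEJ counts
    return mmej / total if total else float("nan")
```

For example, deleting `GCAAAA` from `TTTTGCAAAAGCTTTT` leaves a 2-bp `GC` microhomology, which meets the first criterion.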
sci-RNA-seq3 transcriptome library analysis
Sequencing reads from the Illumina NextSeq platform were first demultiplexed based on the P5 PCR index using the bcl2fastq software. Reads were filtered based on the RT and ligation indices, allowing a Hamming distance of <2 from the reference. Filtered reads were trimmed with TrimGalore (v0.6.6)82 using the parameters -a AAAAAAAA --three_prime_clip_R1 1. Reads were then aligned to the GRCh38 reference genome using STAR (v2.7.6a). PCR duplicates were collapsed based on the RT index, ligation index, UMI sequence and read end coordinate. Reads were further demultiplexed based on the combination of RT, ligation and PCR indices and split into per-cell files. To generate gene expression count matrices, reads were assigned to the exonic and intronic regions of the closest genes with HTSeq (v2.0.2)97. Reads with ambiguous assignments were discarded. Cells were further filtered based on total UMI count (>100) and the percentage of mitochondrial reads (<10%). Cells whose number of detected features fell between the 10th and 90th percentiles of all cells were kept and considered high-quality cells. Single-cell analysis was performed using the Seurat (v4.0.0) package98 in R.
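The cell-level filters can be sketched as follows. This Python sketch is illustrative only: it assumes per-cell totals have already been tabulated, treats the percentile bounds as inclusive, computes them over all cells (following the wording above), and uses the same linear interpolation as R's default quantile() (type 7). The data layout and function names are hypothetical.

```python
import math


def quantile_type7(values, q):
    """Linear-interpolation quantile, equivalent to R's default quantile(type = 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def high_quality_cells(cells, min_umis=100, max_mito=0.10, q_low=0.10, q_high=0.90):
    """cells: {cell_id: (total_umis, mito_umis, n_features)} -> sorted list of retained cell IDs."""
    features = [n for _, _, n in cells.values()]
    f_low, f_high = quantile_type7(features, q_low), quantile_type7(features, q_high)
    return sorted(
        cid for cid, (umis, mito, n) in cells.items()
        # the UMI check comes first, so the mitochondrial fraction never divides by zero
        if umis > min_umis and mito / umis < max_mito and f_low <= n <= f_high
    )
```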
Pooled shRNA screen analysis
Sequencing reads from the Illumina NextSeq platform were first demultiplexed based on the P5 PCR index using the bcl2fastq software. A custom script was used to further demultiplex and filter the synHEK3 and shRNA libraries based on the combination of RT, ligation and PCR indices (cell ID), allowing a Hamming distance of <2 from the reference. For the synHEK3 library, reporter barcodes (Hamming distance <2 from the reference barcode set), prime editing outcomes and read UMIs were extracted. For the shRNA library, shRNA sequences (Hamming distance <2 from the reference set) and read UMIs were extracted. UMIs within a Hamming distance of <3 of each other were collapsed.
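Assignment of sequences to a reference set and UMI collapsing by Hamming distance can be sketched in Python as below. This is a hypothetical illustration, not the custom script used here. It discards sequences that match more than one reference entry within the cutoff, and it collapses UMIs greedily, folding each UMI into the most abundant previously kept UMI within the distance cutoff.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def assign_to_reference(seq, reference, max_dist=1):
    """Return the unique reference entry within max_dist (i.e. distance < 2), else None."""
    hits = [r for r in reference if len(r) == len(seq) and hamming(seq, r) <= max_dist]
    return hits[0] if len(hits) == 1 else None


def collapse_umis(umi_reads, max_dist=2):
    """umi_reads: {umi: read_count}. Keep a UMI only if no kept UMI lies within max_dist (i.e. < 3)."""
    kept = []
    for umi in sorted(umi_reads, key=lambda u: (-umi_reads[u], u)):
        if not any(hamming(umi, k) <= max_dist for k in kept):
            kept.append(umi)
    return kept
```

The number of UMIs returned by `collapse_umis` is the molecule count for that cell-feature combination.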
A series of pre-processing steps was applied to the data. 1) Cell IDs were matched between the sci-RNA-seq3 transcriptome libraries and the capture libraries, and only high-quality cells (number of features between the 10th and 90th percentiles) were kept. 2) For the shRNA library, cells with <3 or >200 UMIs, or with >20 shRNAs, were removed. 3) For the synHEK3 library, cells with <4 UMIs were first removed. We counted UMIs for synHEK3 reporters belonging to each of the two clones and calculated a clone UMI/total UMI ratio. If this ratio was >80% for a given clone, the cell was assigned to that clone. Cells with extremely high UMI counts (cutoff between 100 and 250 UMIs, depending on the library complexity and sequencing depth of the plate) were also removed before downstream analysis. 4) For each cell-synHEK3 barcode combination, if multiple insertion sequences were observed, sequences within a Hamming distance of <3 were collapsed. Cell-synHEK3 barcode pairs with conflicting editing outcomes were discarded. 5) A cell-synHEK3 barcode-editing outcome-shRNA matrix was generated for the common set of cells identified by both the shRNA and synHEK3 libraries.
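The shRNA cell filter in step 2 and the cell-to-clone assignment in step 3 can be sketched as follows. The thresholds mirror the text, but the upper UMI cutoff is plate-specific (100 to 250), so the default used here is only a placeholder. The data structures and function names are hypothetical.

```python
from collections import Counter


def keep_shrna_cell(n_umis, n_shrnas):
    """Step 2: drop cells with <3 or >200 shRNA UMIs, or with >20 distinct shRNAs."""
    return 3 <= n_umis <= 200 and n_shrnas <= 20


def assign_clone(reporter_umis, reporter_to_clone, min_umis=4, max_umis=250, min_frac=0.8):
    """Step 3: reporter_umis maps synHEK3 barcode -> UMI count in one cell.

    Returns the clone contributing >min_frac of the cell's synHEK3 UMIs, or None
    if the cell is filtered out or cannot be assigned.
    """
    total = sum(reporter_umis.values())
    if total < min_umis or total > max_umis:  # too few UMIs, or suspiciously many
        return None
    per_clone = Counter()
    for barcode, n in reporter_umis.items():
        clone = reporter_to_clone.get(barcode)
        if clone is not None:
            per_clone[clone] += n
    for clone, n in per_clone.items():
        if n / total > min_frac:
            return clone
    return None
```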
In general, the prime editing efficiency of a given synHEK3 reporter was estimated across all cells containing that reporter, after collapsing reads on a per-cell, per-barcode basis. To assess the effect of an shRNA on a specific synHEK3 reporter, the editing efficiency of this reporter was first calculated in cells lacking that shRNA; this value served as the hypothesized probability of editing. Then, treating prime editing outcomes as binary events (unedited vs. insertion), we counted the total number of cells containing the shRNA and the number of those cells with 6-bp insertions, and computed a p value using the binom.test() function in R. For each clone, all synHEK3-shRNA pairs were assessed by binomial tests. Q-Q plots in Fig. 5A were generated using uncorrected p values. The p values were then corrected for multiple testing using the p.adjust() function in R with the Benjamini-Hochberg method. Empirical p values of candidate synHEK3-shRNA pairs were calculated as (n_lower + 1)/(N + 1), where n_lower was the number of control tests with a raw p value lower than that of the candidate test, and N was the total number of control tests. Empirical p values were then Benjamini-Hochberg corrected, and those with eFDR < 5% were reported. Processed screen results for clone 3 and clone 5 are provided in Table S5.
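The per-pair test and the two multiple-testing corrections can be mirrored in a dependency-free Python sketch, which may help when reimplementing the analysis outside R. It is an illustration, not the code used in this study. The two-sided binomial p value sums the probabilities of all outcomes no more likely than the observed one, which is the rule R's binom.test() uses, including its small relative tolerance. The Benjamini-Hochberg step is intended to reproduce p.adjust(method = "BH"). The function names are hypothetical.

```python
import math


def binom_pmf(i, n, p):
    """Binomial probability mass, computed in log space to avoid overflow for large n."""
    if p == 0.0:
        return 1.0 if i == 0 else 0.0
    if p == 1.0:
        return 1.0 if i == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)


def binom_test_two_sided(k, n, p):
    """Two-sided exact binomial test of k successes in n trials against probability p."""
    d = binom_pmf(k, n, p)
    total = sum(q for q in (binom_pmf(i, n, p) for i in range(n + 1)) if q <= d * (1 + 1e-7))
    return min(1.0, total)


def shrna_pvalue(n_cells_with_shrna, n_edited_with_shrna, control_efficiency):
    """Edited cells carrying the shRNA, tested against the efficiency in cells lacking it."""
    return binom_test_two_sided(n_edited_with_shrna, n_cells_with_shrna, control_efficiency)


def benjamini_hochberg(pvals):
    """BH-adjusted p values, intended to match R's p.adjust(method = "BH")."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def empirical_pvalue(candidate_p, control_ps):
    """(n_lower + 1) / (N + 1), where n_lower counts control tests with a smaller raw p value."""
    n_lower = sum(1 for c in control_ps if c < candidate_p)
    return (n_lower + 1) / (len(control_ps) + 1)
```

The empirical p values would then be passed through `benjamini_hochberg` and compared against the 5% eFDR threshold.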
QUANTIFICATION AND STATISTICAL ANALYSIS
The statistical tests used for each experiment are described in the text, figure legends or STAR Methods. In Fig. 2C, the beta regression model was fit using the betareg() function in the R package betareg99 with default parameters. In Fig. 2B–D,G–I, Supplementary Fig. 2A,D, Fig. 3A, Supplementary Fig. 4A and Supplementary Fig. 5C, Spearman’s and Pearson’s correlation coefficients and p values were computed using the cor() function in R or the rcorr() function in the Hmisc package (Harrell and Dupont). In Supplementary Fig. 3I,J and Supplementary Fig. 5J, fits with confidence intervals were produced using the geom_smooth() function in the R package ggplot2 with ‘method = “lm”’ and all other parameters at their defaults. In Fig. 5, binomial tests were performed to measure the effects of shRNAs, and the resulting p values were corrected for multiple hypothesis testing using the Benjamini-Hochberg method. Empirical p values of candidate synHEK3-shRNA pairs were calculated as (n_lower + 1)/(N + 1), where n_lower was the number of control tests with a raw p value lower than that of the candidate test, and N was the total number of control tests. Empirical p values were then Benjamini-Hochberg corrected, and those with eFDR < 5% were reported. In Supplementary Fig. 5I,J, Supplementary Fig. 6F,G and Supplementary Fig. 7B, differential gene expression or accessibility analyses were performed using the DESeq2 package in R (Wald tests). In Fig. 6C, Fisher’s exact test was used to compare the enrichment of gene-overlapping synHEK3 reporters between the two groups. In Supplementary Fig. 2H, Fig. 3E–G, Supplementary Fig. 3F–G, Fig. 6D and Supplementary Fig. 6D,E, p values were calculated using two-sided Kolmogorov–Smirnov tests. In Fig. 7B, scatter plots report the mean editing efficiencies of two biological replicates, and error bars correspond to standard deviations. In Fig. 7C, control editing efficiencies were predicted with a linear model, fit using the lm() function in R and trained on synHEK3 reporters that do not lie within the corresponding CRISPRoff target genes. In Supplementary Fig. 7C, Welch’s two-sample t-tests were used to compare prime editing efficiencies at genes before and after CRISPR activation.
Supplementary Material
Supplementary Figure 1. Characterization of synHEK3 insertion sites determined by the T7-assisted reporter mapping assay, related to Figure 1. A) Schematic of the synHEK3 reporter construct and the T7-assisted reporter mapping method. B) Experimental workflow. A complex library of synHEK3 reporters was transfected into PE2(+) K562 cells along with the piggyBac transposase. After transposons were stably integrated, the population of cells was bottlenecked to ~500 clones. Cells derived from the expansion of this bottlenecked pool were then subjected to T7-assisted reporter mapping. C) Copy number of synHEK3 reporters was estimated by qPCR, using SNRPB (3 copies) as a reference. D) Histogram of T7 mapping counts of all synHEK3 reporters (n = 10,095). x-axis on log10 scale. E) Scatter plot of the number of integration sites recovered at different sequencing depths by subsampling the original dataset. Overlaid polynomial regression lines were fit to all (blue) or unique (red) sites. x- and y-axes on log10 scale. F) Histogram of number of distinct genomic integration sites per synHEK3 barcode. y-axis on log10 scale. G) Log2 enrichment of number of synHEK3 reporters across major genomic features relative to expected numbers based on feature sizes. H) Log2 enrichment of number of synHEK3 reporters across major genomic features relative to expected number of TTAA motifs in each feature. Blue points: log2 enrichment calculated using 10 sets of randomly sampled TTAA motifs (n = 4,273). Error bar: standard deviation. I) Barplot showing fractions of genomic sites overlapping with the Roadmap annotations. For piggyBac integrations, the overlapping status of all integrations (n = 9,450, green) and integrations marked by unique barcodes (n = 4,273, orange) are shown. For background, the mean fraction of 10 sets of TTAA sites (n = 9,450 each, blue) and 10 sets of random 4-bp sites (n = 9,450 each, gray) are shown. 
J) Density plots of chromatin feature scores of the selected genomic sites. For piggyBac integrations, distributions of all integrations (n = 9,450, green) and integrations marked by unique barcodes (n = 4,273, orange) are plotted. For background, 10 sets of TTAA sites (n = 9,450 each, blue) and 10 sets of random 4-bp sites (n = 9,450 each, gray) are plotted.
Supplementary Figure 2. Chromatin context has a major impact on prime editing efficiency, related to Figure 2. A) Scatter plots of chromatin feature scores vs. prime editing efficiencies for individual synHEK3 reporters. Points are colored by the number of neighboring points. The Spearman’s (ρ) and Pearson’s (r) correlation coefficients between the chromatin feature score and prime editing efficiency are annotated. B) Boxplots of prime editing efficiency in synHEK3 sites, binned by chromatin feature scores. Q1–Q10 correspond to 10 equally sized bins of synHEK3 reporters with increasing chromatin feature scores. C) Scatter plots of Spearman’s ρ between chromatin feature scores and prime editing efficiency observed in two independent pools of synHEK3 reporters in wild-type (x-axis) or PE2(+) K562 cells (y-axis). D) Scatter plots of predicted (beta regression model) vs. observed prime editing efficiencies in two independent pools of synHEK3 reporters in wild-type K562 cells. Points are colored by the number of neighboring points. E) Histogram of ChIP-seq signal of H3K79me2 (red), H3K4me3 (blue) and H3K36me3 (green) in a 10-kb window surrounding the TSS of genes that are expressed (n = 11,514, TPM > 3; top) or unexpressed (n = 10,506, TPM < 3; bottom) in K562 cells. F) Boxplot of prime editing efficiencies of gene-proximal synHEK3 reporters. Distances were calculated relative to the closest TSS, scaled by gene length and binned. Negative values refer to synHEK3 sites located upstream of TSS. Values >100% refer to synHEK3 sites located downstream of TTS. The reporters were binned based on the TPM of the overlapping/nearest genes. Bin 1 contains unexpressed genes, while Bins 2–5 are equally sized in terms of the number of assigned genes (though not necessarily in terms of the number of assigned synHEK3 reporters), sorted by increasing expression levels. G) Boxplot of prime editing efficiencies of gene-proximal reporters. 
Same as panel F, except that reporters are grouped by absolute rather than scaled distance from the TSS. H) Cumulative distribution function (CDF) plot of prime editing efficiencies of intragenic synHEK3 reporters of different orientations with respect to the direction of transcription. “Opposite” denotes a synHEK3 reporter on the strand opposite the coding strand, and “same” denotes a synHEK3 reporter on the coding strand. Genes with detectable expression (TPM > 3) and synHEK3 reporters within 50 kb of the TSSs were selected for this analysis. P value: two-sided Kolmogorov–Smirnov test. I) Genome browser views of 4 very poorly editable sites. Sites of integration and measured editing efficiencies are shown as a dot plot at the top and aligned with selected epigenetic tracks. For each synHEK3 insertion, editing efficiency, number of reads with the edit (numerator), and total number of reads (denominator) are annotated. The dashed vertical lines mark locations of the insertions. J) Boxplots of normalized editing rates (y-axis) for epegRNAs designed for prime editing at endogenous genomic sites, stratified by chromatin feature scores. The epegRNAs are grouped into quartiles based on each chromatin feature score (Q1 = lowest; Q4 = highest). The y-axis is on a log10 scale. K) Density plot of CTT insertion frequency of synHEK3 reporters in intragenic, proximal intergenic (<10 kb) and distal intergenic (>10 kb) regions.
Supplementary Figure 3. Comparison between prime editing and Cas9 editing, leveraging a common set of integrated editing reporters, related to Figure 3. A) PCA plots of synHEK3 reporters generated using chromatin scores as features. The first two PCs are plotted (PC1: variance 62%; PC2: variance 9%). Points are colored by prime editing efficiency at Day 4 and Cas9 indel frequency measured at Days 1, 2 and 4. B) Cas9 indel frequency measured at Days 1, 2 and 4 for gene-proximal reporters. Distance was calculated relative to the closest TSS, scaled by gene length and binned. Negative values refer to synHEK3 sites located upstream of TSS. Values >100% refer to synHEK3 sites located downstream of TTS. Points are colored based on expression levels (log10) of the genes. C) PCA plots of synHEK3 reporters generated using chromatin scores as features. The first two PCs are plotted (PC1: variance 62%; PC2: variance 9%). Points are colored by the 6 main groups as in Fig. 3B. D) Scatter plot of Cas9 editing (Day 1) vs. prime editing (Day 4) efficiency. Points are colored by the 6 main groups as in panel C. E) Scatter plot of Cas9 editing (Day 1) vs. prime editing (Day 4) efficiency, colored by the 14 subgroups. F) Boxplot of H3K27me3 and H3K9me3 scores of synHEK3 reporters in Groups 1.0 and 2.0. P-value: two-sided Kolmogorov–Smirnov test. G) The MMEJ / (MMEJ + NHEJ) ratio in all synHEK3 sites and in sites overlapping with H3K27me3 and H3K9me3 peaks. Red line: the median MMEJ / (MMEJ + NHEJ) ratio. P-value: two-sided Kolmogorov–Smirnov test. H) The MMEJ / (MMEJ + NHEJ) ratio in the 6 groups of synHEK3 sites as in panel C. Red line: the median MMEJ / (MMEJ + NHEJ) ratio. I-J) Scatter plots of the MMEJ / (MMEJ + NHEJ) ratio or allele frequencies of intragenic synHEK3 reporters vs. their distance to the corresponding TSSs (x-axis; log10 scale). Points are colored based on expression levels (log10) of the genes. Blue line: linear regression line with confidence interval in gray. 
I) The MMEJ / (MMEJ + NHEJ) ratio is plotted. J) Allele frequencies of the most frequent NHEJ (left) and MMEJ (right) alleles are plotted. The “NHEJ del/0/G” allele contains a single-base deletion of G at the Cas9 cut site. The “MMEJ del/5/GCACGTGATG” allele contains a bidirectional deletion of a 10-bp sequence (GCACGTGATG) around the cut site.
Supplementary Figure 4. Experimental setup of the pooled shRNA screen and its readout via T7 IST-assisted sci-RNA-seq3, related to Figure 4. A) Correlation between efficiencies of random 3-bp insertions measured in the monoclonal lines versus efficiencies measured for the same reporters in the original polyclonal population. B) List of genes targeted by the shRNA library. Genes are grouped by pathways. Control genes are shown separately. C) Schematic of the lentiviral shRNA construct and the synHEK3 reporter, with key features relevant to the sci-RNA-seq3 workflow highlighted. TRE: tetracycline response element. D) Schematic structures of the sci-RNA-seq3, shRNA and synHEK3 libraries. UMI: unique molecular identifier; RT PBS: reverse transcription primer binding site.
Supplementary Figure 5. A pooled shRNA screen with sci-RNA-seq3 and effects of perturbing the MMR pathway on prime editing, related to Figure 5. A) Scatter plot of synHEK3 UMIs detected in single cells in sci-RNA-seq3 data. Cells assigned to the two clones are colored (green: clone 5; pink: clone 3). Mixed cells are in gray. B) Histograms of cell count per shRNA (left), number of shRNAs captured per cell (middle), and number of synHEK3 reporters captured per cell (right) in the two clones. C) Scatter plot of prime editing frequencies of synHEK3 reporters estimated with sci-RNA-seq3 vs. bulk amplicon sequencing. D) Scatter plots of the number of cells per synHEK3-shRNA pair and corresponding adjusted p values (-log10). Candidate shRNAs (left) and control shRNAs (right) are plotted separately. E) Effects of shRNAs targeting MMR-related genes in clone 3. Log2 fold-changes of prime editing efficiencies of synHEK3-shRNA pairs are plotted and colored by their corresponding adjusted p values (-log10). F-G) Effects of shRNAs against FEN1 and EP300. Pink lines: editing frequencies in cells with individual shRNAs; red line: mean editing frequencies of the gene-targeting shRNAs; light blue lines: control editing frequencies for individual shRNAs (not visible owing to low variance relative to the mean line, shown in blue); blue line: mean control editing frequencies. H) Effects of shRNAs against DOT1L and KDM2B. Barcode sequences are omitted but are in the same order as in panels F and G. Colors as in panels F and G. I) Volcano plots of gene expression changes in EPZ-5676-treated cells (5 μM for 6 days). Genes with synHEK3 insertions in clone 3 or clone 5 of the single-cell screen are colored in red. J) Scatter plots of log2 fold-change of prime editing efficiency vs. gene expression change (log2 fold-change; top) and H3K79me2 score (bottom) for a set of intragenic synHEK3 reporters with baseline efficiency > 20%. Blue line: linear regression line with confidence interval in gray.
Supplementary Figure 6. Chromatin context-specific response to HLTF inhibition, related to Figure 6. A) Effects of shRNAs against HLTF on all synHEK3 reporters in clone 5 (left) and clone 3 (right). For every synHEK3-shRNA pair, editing frequencies in cells with candidate shRNA are plotted and colored by their statistical significance. Control editing frequencies are shown in gray. Regions of the plot containing synHEK3 reporters that are significantly less responsive to HLTF inhibition are shaded green. Regions of the plot containing synHEK3 reporters with low (<0.2) or very high (>0.9) editing frequencies are shaded gray. Dashed horizontal lines indicate editing frequency of 0.2 and 0.9. B) Effects of shRNAs against PMS2 on all synHEK3 reporters in clone 5 (left) and clone 3 (right). Layout as in panel A. C) Scatter plots of baseline editing efficiencies (x-axes) vs. fold-changes induced by shRNAs against HLTF, MLH1 and PMS2 (y-axes) across 50 synHEK3 reporters. Green vertical lines correspond to baseline editing efficiencies of 0.2 and 0.9. D) CDF plots of log2 fold-change of mean editing frequency induced by shRNAs against HLTF. synHEK3 reporters are colored by their responsiveness to HLTF inhibition based on the shRNA screen. P value: two-sided Kolmogorov–Smirnov test. E) Validation of differential responsiveness to HLTF inhibition in cells transduced with individual shRNAs. CDF plots of log2 fold-change of editing frequency induced by shRNAs against HLTF (shHLTF.2367: used in the shRNA screen; shHLTF.2623: an orthogonal shRNA) or MLH1 (shMLH1.1911). synHEK3 reporters are colored by their responsiveness to HLTF inhibition based on the shRNA screen. P value: two-sided Kolmogorov–Smirnov test. F) Scatter plot of RNA log2 fold-changes induced by two shRNAs against HLTF. 
Genes overlapping or near shHLTF-responsive sites are in pink; genes overlapping or near shHLTF-unresponsive sites are in blue; all other genes are in gray; HLTF is shown in purple (bottom left corner; log2 fold-changes of −1.5 and −2). G) CDF plot of log2 fold-changes of ATAC-seq peak counts induced by shHLTF.2367. Peaks near synHEK3 sites were defined as those either 1) within 5 kb of a synHEK3 site; and/or 2) in the promoter or body of a gene overlapping or proximal (within 10 kb) to a synHEK3 insertion. H) Western blot analysis of HLTF, PMS2 and MLH1. Clone 5 or a wild-type (WT) K562 line was transduced with rtTA and shRNAs targeting the candidate genes. Protein expression was compared to cells without doxycycline (Dox) or cells transduced with a shRNA targeting a different gene.
Supplementary Figure 7. Modulating prime editing outcomes by epigenetic reprogramming, related to Figure 7. A) Schematic diagrams of target genes in the CRISPRoff experiment, with the locations of the CRISPRoff gRNAs (green) and synHEK3 reporters (red) annotated. B) Scatter plots of mean normalized counts of gene expression in cells transfected with CRISPRoff gRNAs at Day 11 (2 replicates each). x- and y-axes on log10 scale. Insets are bar plots of normalized counts of the CRISPRoff target genes. NTC: non-targeting control. C) Prime editing efficiency (%) at endogenous gene targets in K562 cells, with or without epigenetic editing via CRISPRa. Gray bars show mean prime editing efficiencies when control promoters were activated via CRISPRa. Blue bars show mean prime editing efficiencies when target gene promoters were activated via CRISPRa. P value: Welch’s two-sample t-test. D) mRNA fold-change of CRISPRa target genes quantified by qPCR. E) Relative expression levels of selected CRISPRa target genes compared to a set of reference genes. Circle: reference genes; triangle: endogenous expression levels of the target genes; square: expression levels of the target genes after CRISPRa. y-axis on log10 scale. F) Schematic of the IL2RB gene with locations of gRNA (for CRISPRa) or (e)pegRNA (for prime editing) targets annotated. Red lollipops indicate pegRNAs that were separately tested in experiments shown in Fig. 7E and Supplementary Fig. 7C. Dashed lollipops indicate locations of the new epegRNA pools. G) List of epegRNA edit types tested, including all possible single-nucleotide substitutions, small insertions of 1, 3 and 6 bp, insertion of a loxP site (34 bp), and small deletions of 1, 3 and 6 bp. H) Total editing efficiencies measured at the individual IL2RB exons. Gray bars show mean prime editing efficiencies in control groups and blue bars show mean prime editing efficiencies when the IL2RB promoter was first activated via CRISPRa.
Table S1. List of oligos and nucleic acid sequences used in this study, related to STAR Methods.
Table S2. List of ENCODE datasets used in this study and information on the beta regression models, related to Figures 1, 2 and STAR Methods.
Table S3. Lists of synHEK3 reporters and relevant features used in this study, related to Figures 2, 3, 6.
Table S4. Editing outcomes of epegRNAs targeting endogenous loci, related to Figure 2.
Table S5. The DDR-focused shRNA library and results of the pooled shRNA screen, related to Figures 4–6.
Table S6. The RNA-seq and ATAC-seq datasets for HLTF knockdown in K562 cells, related to Figure 6.
Table S7. Prime editing efficiency measured in the CRISPRoff and CRISPRa experiments, related to Figure 7.
Highlights.
T7-assisted reporter mapping assay determines precise locations of integrated constructs.
Active transcriptional elongation enhances prime editing.
Pooled shRNA screens reveal HLTF as a context-dependent suppressor of prime editing.
Epigenetic conditioning of a locus modulates prime editing efficiency.
ACKNOWLEDGMENTS
We thank members of the Shendure Lab and Yi Yin (UCLA) for ongoing input and advice over the course of this project. This research is supported by research grants from the National Human Genome Research Institute (NHGRI; UM1HG011966 to J.S., R01HG010632 to J.S., K99HG012973 to J.C., F32HG011817 to D.C.). J.C. was a Howard Hughes Medical Institute Fellow of the Damon Runyon Cancer Research Foundation (DRG-2403-20). T.A.M. was supported by a Banting Postdoctoral Fellowship from the Natural Sciences and Engineering Research Council of Canada (NSERC). H.K. is a Washington Research Foundation Postdoctoral Fellow. J.B.L. is a Fellow of the Damon Runyon Cancer Research Foundation (DRG-2435-21). J.F.N. is supported by the UW ISCRM Fellows Program. J.S. is an Investigator of the Howard Hughes Medical Institute.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
DECLARATION OF INTERESTS
J.S. is an SAB member, consultant and/or co-founder of Prime Medicine, Cajal Neuroscience, Guardant Health, Maze Therapeutics, Camp4 Therapeutics, Phase Genomics, Adaptive Biotechnologies, Scale Biosciences, Sixth Street Capital and Pacific Biosciences. University of Washington has filed a provisional patent application based on this work, on which J.S., X.L., W.C. and J.C. are co-inventors. All other authors declare no competing interests.
REFERENCES
- 1. Anzalone AV, Randolph PB, Davis JR, Sousa AA, Koblan LW, Levy JM, Chen PJ, Wilson C, Newby GA, Raguram A, et al. (2019). Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157.
- 2. Choi J, Chen W, Suiter CC, Lee C, Chardon FM, Yang W, Leith A, Daza RM, Martin B, and Shendure J (2022). Precise genomic deletions using paired prime editing. Nat. Biotechnol 40, 218–226.
- 3. Jiang T, Zhang X-O, Weng Z, and Xue W (2022). Deletion and replacement of long genomic sequences using prime editing. Nat. Biotechnol 40, 227–234.
- 4. Anzalone AV, Gao XD, Podracky CJ, Nelson AT, Koblan LW, Raguram A, Levy JM, Mercer JAM, and Liu DR (2022). Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nat. Biotechnol 40, 731–740.
- 5. Chen PJ, and Liu DR (2023). Prime editing for precise and highly versatile genome manipulation. Nat. Rev. Genet 24, 161–177.
- 6. Liu B, Dong X, Cheng H, Zheng C, Chen Z, Rodríguez TC, Liang S-Q, Xue W, and Sontheimer EJ (2022). A split prime editor with untethered reverse transcriptase and circular RNA template. Nat. Biotechnol 40, 1388–1393.
- 7. Jang H, Jo DH, Cho CS, Shin JH, Seo JH, Yu G, Gopalappa R, Kim D, Cho S-R, Kim JH, et al. (2022). Application of prime editing to the correction of mutations and phenotypes in adult mice with liver and eye diseases. Nat Biomed Eng 6, 181–194.
- 8. Choi J, Chen W, Minkina A, Chardon FM, Suiter CC, Regalado SG, Domcke S, Hamazaki N, Lee C, Martin B, et al. (2022). A time-resolved, multi-symbol molecular recorder via sequential genome editing. Nature 608, 98–107.
- 9. Chen W, Choi J, Nathans JF, Agarwal V, Martin B, Nichols E, Leith A, Lee C, and Shendure J (2021). Multiplex genomic recording of enhancer and signal transduction activity in mammalian cells. bioRxiv, 2021.11.05.467434. 10.1101/2021.11.05.467434.
- 10. Loveless TB, Carlson CK, Hu VJ, Dentzel Helmy CA, Liang G, Ficht M, Singhai A, and Liu CC (2021). Molecular recording of sequential cellular events into DNA. bioRxiv, 2021.11.05.467507. 10.1101/2021.11.05.467507.
- 11. Erwood S, Bily TMI, Lequyer J, Yan J, Gulati N, Brewer RA, Zhou L, Pelletier L, Ivakine EA, and Cohn RD (2022). Saturation variant interpretation using CRISPR prime editing. Nat. Biotechnol 40, 885–895.
- 12. Mathis N, Allam A, Kissling L, Marquart KF, Schmidheini L, Solari C, Balázs Z, Krauthammer M, and Schwank G (2023). Predicting prime editing efficiency and product purity by deep learning. Nat. Biotechnol 10.1038/s41587-022-01613-7.
- 13. Chen PJ, Hussmann JA, Yan J, Knipping F, Ravisankar P, Chen P-F, Chen C, Nelson JW, Newby GA, Sahin M, et al. (2021). Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. Cell 184, 5635–5652.e29.
- 14. Ferreira da Silva J, Oliveira GP, Arasa-Verge EA, Kagiou C, Moretton A, Timelthaler G, Jiricny J, and Loizou JI (2022). Prime editing efficiency and fidelity are enhanced in the absence of mismatch repair. Nat. Commun 13, 760.
- 15. Doman JL, Pandey S, Neugebauer ME, An M, Davis JR, Randolph PB, McElroy A, Gao XD, Raguram A, Richter MF, et al. (2023). Phage-assisted evolution and protein engineering yield compact, efficient prime editors. Cell 186, 3983–4002.e26.
- 16. Nelson JW, Randolph PB, Shen SP, Everette KA, Chen PJ, Anzalone AV, An M, Newby GA, Chen JC, Hsu A, et al. (2021). Engineered pegRNAs improve prime editing efficiency. Nat. Biotechnol 10.1038/s41587-021-01039-7.
- 17. Kim HK, Yu G, Park J, Min S, Lee S, Yoon S, and Kim HH (2021). Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol 39, 198–206.
- 18. Koeppel J, Weller J, Peets EM, Pallaseni A, Kuzmin I, Raudvere U, Peterson H, Liberante FG, and Parts L (2023). Prediction of prime editing insertion efficiencies using sequence features and DNA repair determinants. Nat. Biotechnol 10.1038/s41587-023-01678-y.
- 19. Yu G, Kim HK, Park J, Kwak H, Cheong Y, Kim D, Kim J, Kim J, and Kim HH (2023). Prediction of efficiencies for diverse prime editing systems in multiple cell types. Cell 186, 2256–2272.e23.
- 20. Mathis N, Allam A, Tálas A, Benvenuto E, Schep R, Damodharan T, Balázs Z, Janjuha S, Schmidheini L, Böck D, et al. (2023). Predicting prime editing efficiency across diverse edit types and chromatin contexts with machine learning. bioRxiv, 2023.10.09.561414. 10.1101/2023.10.09.561414.
- 21. Yarrington RM, Verma S, Schwartz S, Trautman JK, and Carroll D (2018). Nucleosomes inhibit target cleavage by CRISPR-Cas9 in vivo. Proc. Natl. Acad. Sci. U. S. A 115, 9351–9358.
- 22. Isaac RS, Jiang F, Doudna JA, Lim WA, Narlikar GJ, and Almeida R (2016). Nucleosome breathing and remodeling constrain CRISPR-Cas9 function. Elife 5. 10.7554/eLife.13450.
- 23. Janssen JM, Chen X, Liu J, and Gonçalves MAFV (2019). The Chromatin Structure of CRISPR-Cas9 Target DNA Controls the Balance between Mutagenic and Homology-Directed Gene-Editing Events. Mol. Ther. Nucleic Acids 16, 141–154.
- 24. Schep R, Brinkman EK, Leemans C, Vergara X, van der Weide RH, Morris B, van Schaik T, Manzo SG, Peric-Hupkes D, van den Berg J, et al. (2021). Impact of chromatin context on Cas9-induced DNA double-strand break repair pathway balance. Mol. Cell 81, 2216–2230.e10.
- 25. Daer RM, Cutts JP, Brafman DA, and Haynes KA (2017). The Impact of Chromatin Dynamics on Cas9-Mediated Genome Editing in Human Cells. ACS Synth. Biol 6, 428–438.
- 26.Gisler S, Gonçalves JP, Akhtar W, de Jong J, Pindyurin AV, Wessels LFA, and van Lohuizen M (2019). Multiplexed Cas9 targeting reveals genomic location effects and gRNA-based staggered breaks influencing mutation efficiency. Nat. Commun 10, 1598. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Ochman H, Gerber AS, and Hartl DL (1988). Genetic applications of an inverse polymerase chain reaction. Genetics 120, 621–623. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Potter CJ, and Luo L (2010). Splinkerette PCR for mapping transposable elements in Drosophila. PLoS One 5, e10168. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Schmidt M, Zickler P, Hoffmann G, Haas S, Wissler M, Muessig A, Tisdale JF, Kuramoto K, Andrews RG, Wu T, et al. (2002). Polyclonal long-term repopulating stem cell clones in a primate model. Blood 100, 2737–2743. [DOI] [PubMed] [Google Scholar]
- 30.Hamada M, Nishio N, Okuno Y, Suzuki S, Kawashima N, Muramatsu H, Tsubota S, Wilson MH, Morita D, Kataoka S, et al. (2018). Integration Mapping of piggyBac-Mediated CD19 Chimeric Antigen Receptor T Cells Analyzed by Novel Tagmentation-Assisted PCR. EBioMedicine 34, 18–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Ding S, Wu X, Li G, Han M, Zhuang Y, and Xu T (2005). Efficient transposition of the piggyBac (PB) transposon in mammalian cells and mice. Cell 122, 473–483. [DOI] [PubMed] [Google Scholar]
- 32.Wilson MH, Coates CJ, and George AL Jr (2007). PiggyBac transposon-mediated gene transfer in human cells. Mol. Ther 15, 139–145. [DOI] [PubMed] [Google Scholar]
- 33. Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, Ziller MJ, et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330.
- 34. ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74.
- 35. Okada Y, Feng Q, Lin Y, Jiang Q, Li Y, Coffield VM, Su L, Xu G, and Zhang Y (2005). hDOT1L links histone methylation to leukemogenesis. Cell 121, 167–178.
- 36. Steger DJ, Lefterova MI, Ying L, Stonestrom AJ, Schupp M, Zhuo D, Vakoc AL, Kim J-E, Chen J, Lazar MA, et al. (2008). DOT1L/KMT4 recruitment and H3K79 methylation are ubiquitously coupled with gene transcription in mammalian cells. Mol. Cell. Biol 28, 2825–2839.
- 37. Feng Q, Wang H, Ng HH, Erdjument-Bromage H, Tempst P, Struhl K, and Zhang Y (2002). Methylation of H3-lysine 79 is mediated by a new family of HMTases without a SET domain. Curr. Biol 12, 1052–1058.
- 38. Padeken J, Methot SP, and Gasser SM (2022). Establishment of H3K9-methylated heterochromatin and its functions in tissue differentiation and maintenance. Nat. Rev. Mol. Cell Biol 23, 623–640.
- 39. Peters AHFM, Kubicek S, Mechtler K, O’Sullivan RJ, Derijck AAHA, Perez-Burgos L, Kohlmaier A, Opravil S, Tachibana M, Shinkai Y, et al. (2003). Partitioning and plasticity of repressive histone methylation states in mammalian chromatin. Mol. Cell 12, 1577–1589.
- 40. Plath K, Fang J, Mlynarczyk-Evans SK, Cao R, Worringer KA, Wang H, de la Cruz CC, Otte AP, Panning B, and Zhang Y (2003). Role of histone H3 lysine 27 methylation in X inactivation. Science 300, 131–135.
- 41. Brinkman EK, Chen T, de Haas M, Holland HA, Akhtar W, and van Steensel B (2018). Kinetics and Fidelity of the Repair of Cas9-Induced Double-Strand DNA Breaks. Mol. Cell 70, 801–813.e6.
- 42. Price BD, and D’Andrea AD (2013). Chromatin remodeling at DNA double-strand breaks. Cell 152, 1344–1354.
- 43. Vergara X, Manjon AG, Morris B, Schep R, Leemans C, Sanders MA, Beijersbergen RL, Medema RH, and van Steensel B (2022). Widespread chromatin context-dependencies of DNA double-strand break repair proteins. bioRxiv, 2022.10.07.511243. 10.1101/2022.10.07.511243.
- 44. Dixit A, Parnas O, Li B, Chen J, Fulco CP, Jerby-Arnon L, Marjanovic ND, Dionne D, Burks T, Raychowdhury R, et al. (2016). Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens. Cell 167, 1853–1866.e17.
- 45. Adamson B, Norman TM, Jost M, Cho MY, Nuñez JK, Chen Y, Villalta JE, Gilbert LA, Horlbeck MA, Hein MY, et al. (2016). A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response. Cell 167, 1867–1882.e21.
- 46. Datlinger P, Rendeiro AF, Schmidl C, Krausgruber T, Traxler P, Klughammer J, Schuster LC, Kuchler A, Alpar D, and Bock C (2017). Pooled CRISPR screening with single-cell transcriptome readout. Nat. Methods 14, 297–301.
- 47. Yin Y, Jiang Y, Lam K-WG, Berletch JB, Disteche CM, Noble WS, Steemers FJ, Camerini-Otero RD, Adey AC, and Shendure J (2019). High-Throughput Single-Cell Sequencing with Linear Amplification. Mol. Cell 76, 676–690.e10.
- 48. Askary A, Sanchez-Guardado L, Linton JM, Chadly DM, Budde MW, Cai L, Lois C, and Elowitz MB (2020). In situ readout of DNA barcodes and single base edits facilitated by in vitro transcription. Nat. Biotechnol 38, 66–75.
- 49. Martin BK, Qiu C, Nichols E, Phung M, Green-Gladden R, Srivatsan S, Blecher-Gonen R, Beliveau BJ, Trapnell C, Cao J, et al. (2022). Optimized single-nucleus transcriptional profiling by combinatorial indexing. Nat. Protoc 10.1038/s41596-022-00752-0.
- 50. Fellmann C, Hoffmann T, Sridhar V, Hopfgartner B, Muhar M, Roth M, Lai DY, Barbosa IAM, Kwon JS, Guan Y, et al. (2013). An optimized microRNA backbone for effective single-copy RNAi. Cell Rep. 5, 1704–1713.
- 51. Pelossof R, Fairchild L, Huang C-H, Widmer C, Sreedharan VT, Sinha N, Lai D-Y, Guan Y, Premsrirut PK, Tschaharganeh DF, et al. (2017). Prediction of potent shRNAs with a sequential classification algorithm. Nat. Biotechnol 35, 350–353.
- 52. Hussmann JA, Ling J, Ravisankar P, Yan J, Cirincione A, Xu A, Simpson D, Yang D, Bothmer A, Cotta-Ramusino C, et al. (2021). Mapping the genetic landscape of DNA double-strand break repair. Cell 184, 5653–5669.e25.
- 53. Kunkel TA, and Erie DA (2005). DNA mismatch repair. Annu. Rev. Biochem 74, 681–710.
- 54. Eckner R, Ewen ME, Newsome D, Gerdes M, DeCaprio JA, Lawrence JB, and Livingston DM (1994). Molecular cloning and functional analysis of the adenovirus E1A-associated 300-kD protein (p300) reveals a protein with properties of a transcriptional adaptor. Genes Dev. 8, 869–884.
- 55. Kang J-Y, Kim J-Y, Kim K-B, Park JW, Cho H, Hahm JY, Chae Y-C, Kim D, Kook H, Rhee S, et al. (2018). KDM2B is a histone H3K79 demethylase and induces transcriptional repression via sirtuin-1-mediated chromatin silencing. FASEB J. 32, 5737–5750.
- 56. Daigle SR, Olhava EJ, Therkelsen CA, Basavapathruni A, Jin L, Boriack-Sjodin PA, Allain CJ, Klaus CR, Raimondi A, Scott MP, et al. (2013). Potent inhibition of DOT1L as treatment of MLL-fusion leukemia. Blood 122, 1017–1025.
- 57. Unk I, Hajdú I, Fátyol K, Hurwitz J, Yoon J-H, Prakash L, Prakash S, and Haracska L (2008). Human HLTF functions as a ubiquitin ligase for proliferating cell nuclear antigen polyubiquitination. Proc. Natl. Acad. Sci. U. S. A 105, 3768–3773.
- 58. Nuñez JK, Chen J, Pommier GC, Cogan JZ, Replogle JM, Adriaens C, Ramadoss GN, Shi Q, Hung KL, Samelson AJ, et al. (2021). Genome-wide programmable transcriptional memory by CRISPR-based epigenome editing. Cell 184, 2503–2519.e17.
- 59. Konermann S, Brigham MD, Trevino AE, Joung J, Abudayyeh OO, Barcena C, Hsu PD, Habib N, Gootenberg JS, Nishimasu H, et al. (2015). Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 517, 583–588.
- 60. Sockolosky JT, Trotta E, Parisi G, Picton L, Su LL, Le AC, Chhabra A, Silveria SL, George BM, King IC, et al. (2018). Selective targeting of engineered T cells using orthogonal IL-2 cytokine-receptor complexes. Science 359, 1037–1042.
- 61. Tian S, Choi W-T, Liu D, Pesavento J, Wang Y, An J, Sodroski JG, and Huang Z (2005). Distinct functional sites for human immunodeficiency virus type 1 and stromal cell-derived factor 1alpha on CXCR4 transmembrane helical domains. J. Virol 79, 12667–12673.
- 62. Thress KS, Paweletz CP, Felip E, Cho BC, Stetson D, Dougherty B, Lai Z, Markovets A, Vivancos A, Kuang Y, et al. (2015). Acquired EGFR C797S mutation mediates resistance to AZD9291 in non-small cell lung cancer harboring EGFR T790M. Nat. Med 21, 560–562.
- 63. Chardon FM, McDiarmid TA, Page NF, Martin BK, Domcke S, Regalado SG, Lalanne J-B, Calderon D, Starita LM, Sanders SJ, et al. (2023). Multiplex, single-cell CRISPRa screening for cell type specific regulatory elements. bioRxiv, 2023.03.28.534017. 10.1101/2023.03.28.534017.
- 64. Tian R, Abarientos A, Hong J, Hashemi SH, Yan R, Dräger N, Leng K, Nalls MA, Singleton AB, Xu K, et al. (2021). Genome-wide CRISPRi/a screens in human neurons link lysosomal failure to ferroptosis. Nat. Neurosci 24, 1020–1034.
- 65. Lin J-R, Zeman MK, Chen J-Y, Yee M-C, and Cimprich KA (2011). SHPRH and HLTF act in a damage-specific manner to coordinate different forms of postreplication repair and prevent mutagenesis. Mol. Cell 42, 237–249.
- 66. Sanjana NE, Shalem O, and Zhang F (2014). Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods 11, 783–784.
- 67. Maddalo D, Manchado E, Concepcion CP, Bonetti C, Vidigal JA, Han Y-C, Ogrodowski P, Crippa A, Rekhtman N, de Stanchina E, et al. (2014). In vivo engineering of oncogenic chromosomal rearrangements with the CRISPR/Cas9 system. Nature 516, 423–427.
- 68. Ran FA, Hsu PD, Wright J, Agarwala V, Scott DA, and Zhang F (2013). Genome engineering using the CRISPR-Cas9 system. Nat. Protoc 8, 2281–2308.
- 69. Tsai BP, Wang X, Huang L, and Waterman ML (2011). Quantitative Profiling of In Vivo-assembled RNA-Protein Complexes Using a Novel Integrated Proteomic Approach. Mol. Cell. Proteomics 10, M110.007385.
- 70. Escobar M, Li J, Patel A, Liu S, Xu Q, and Hilton IB (2022). Quantification of Genome Editing and Transcriptional Control Capabilities Reveals Hierarchies among Diverse CRISPR/Cas Systems in Human Cells. ACS Synth. Biol 11, 3239–3250.
- 71. Kremers G-J, Hazelwood KL, Murphy CS, Davidson MW, and Piston DW (2009). Photoconversion in orange and red fluorescent proteins. Nat. Methods 6, 355–358.
- 72. Chen JJ, Nathaniel DL, Raghavan P, Nelson M, Tian R, Tse E, Hong JY, See SK, Mok S-A, Hein MY, et al. (2019). Compromised function of the ESCRT pathway promotes endolysosomal escape of tau seeds and propagation of tau aggregation. J. Biol. Chem 294, 18952–18966.
- 73. Yusa K, Zhou L, Li MA, Bradley A, and Craig NL (2011). A hyperactive piggyBac transposase for mammalian applications. Proc. Natl. Acad. Sci. U. S. A 108, 1531–1536.
- 74. McKenna A, and Shendure J (2018). FlashFry: a fast and flexible tool for large-scale CRISPR target design. BMC Biol. 16, 74.
- 75. Schmidt H, Zhang M, Mourelatos H, Sánchez-Rivera FJ, Lowe SW, Ventura A, Leslie CS, and Pritykin Y (2022). Genome-wide CRISPR guide RNA design and specificity analysis with GuideScan2. bioRxiv, 2022.05.02.490368. 10.1101/2022.05.02.490368.
- 76. Patro R, Duggal G, Love MI, Irizarry RA, and Kingsford C (2017). Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419.
- 77. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, and Gingeras TR (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21.
- 78. Love MI, Huber W, and Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550.
- 79. ENCODE Project Consortium, Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, Adrian J, Kawli T, Davis CA, Dobin A, et al. (2020). Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710.
- 80. Neph S, Kuehn MS, Reynolds AP, Haugen E, Thurman RE, Johnson AK, Rynes E, Maurano MT, Vierstra J, Thomas S, et al. (2012). BEDOPS: high-performance genomic feature operations. Bioinformatics 28, 1919–1920.
- 81. Liao Y, Smyth GK, and Shi W (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930.
- 82. Martin M (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10–12.
- 83. Li H, and Durbin R (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760.
- 84. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, et al. (2021). Twelve years of SAMtools and BCFtools. Gigascience 10. 10.1093/gigascience/giab008.
- 85. Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Morgan MT, and Carey VJ (2013). Software for computing and annotating genomic ranges. PLoS Comput. Biol 9, e1003118.
- 86. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, and Mesirov JP (2011). Integrative genomics viewer. Nat. Biotechnol 29, 24–26.
- 87. Crooks GE, Hon G, Chandonia J-M, and Brenner SE (2004). WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190.
- 88. Yu G, Wang L-G, and He Q-Y (2015). ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 31, 2382–2383.
- 89. Wang Q, Li M, Wu T, Zhan L, Li L, Chen M, Xie W, Xie Z, Hu E, Xu S, et al. (2022). Exploring Epigenomic Datasets by ChIPseeker. Curr Protoc 2, e585.
- 90. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, and Glass CK (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589.
- 91. Sloan CA, Chan ET, Davidson JM, Malladi VS, Strattan JS, Hitz BC, Gabdank I, Narayanan AK, Ho M, Lee BT, et al. (2016). ENCODE data at the ENCODE portal. Nucleic Acids Res. 44, D726–32.
- 92. Akalin A, Franke V, Vlahoviček K, Mason CE, and Schübeler D (2015). Genomation: a toolkit to summarize, annotate and visualize genomic intervals. Bioinformatics 31, 1127–1129.
- 93. Gu Z, Eils R, and Schlesner M (2016). Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849.
- 94. Hahne F, and Ivanek R (2016). Visualizing Genomic Data Using Gviz and Bioconductor. In Statistical Genomics: Methods and Protocols, Mathé E and Davis S, eds. (New York: Springer), pp. 335–351.
- 95. Needleman SB, and Wunsch CD (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol 48, 443–453.
- 96. Kruskal JB (1983). An Overview of Sequence Comparison: Time Warps, String Edits, and Macromolecules. SIAM Rev. 25, 201–237.
- 97. Putri GH, Anders S, Pyl PT, Pimanda JE, and Zanini F (2022). Analysing high-throughput sequencing data in Python with HTSeq 2.0. Bioinformatics 38, 2943–2945.
- 98. Hao Y, Hao S, Andersen-Nissen E, Mauck WM 3rd, Zheng S, Butler A, Lee MJ, Wilk AJ, Darby C, Zager M, et al. (2021). Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29.
- 99. Cribari-Neto F, and Zeileis A (2010). Beta Regression in R. J. Stat. Softw 34. 10.18637/jss.v034.i02.
Supplementary Materials
Supplementary Figure 1. Characterization of synHEK3 insertion sites determined by the T7-assisted reporter mapping assay, related to Figure 1. A) Schematic of the synHEK3 reporter construct and the T7-assisted reporter mapping method. B) Experimental workflow. A complex library of synHEK3 reporters was transfected into PE2(+) K562 cells along with the piggyBac transposase. After the transposons had stably integrated, the cell population was bottlenecked to ~500 clones. Cells derived from expansion of this bottlenecked pool were then subjected to T7-assisted reporter mapping. C) Copy number of synHEK3 reporters was estimated by qPCR, using SNRPB (3 copies) as a reference. D) Histogram of T7 mapping counts for all synHEK3 reporters (n = 10,095). x-axis on log10 scale. E) Scatter plot of the number of integration sites recovered at different sequencing depths, obtained by subsampling the original dataset. Overlaid polynomial regression lines are fit to all (blue) or unique (red) sites. x- and y-axes on log10 scale. F) Histogram of the number of distinct genomic integration sites per synHEK3 barcode. y-axis on log10 scale. G) Log2 enrichment of the number of synHEK3 reporters across major genomic features relative to the numbers expected from feature sizes. H) Log2 enrichment of the number of synHEK3 reporters across major genomic features relative to the numbers expected from the number of TTAA motifs in each feature. Blue points: log2 enrichment calculated using 10 sets of randomly sampled TTAA motifs (n = 4,273). Error bars: standard deviation. I) Barplot showing the fractions of genomic sites overlapping the Roadmap annotations. For piggyBac integrations, the overlap status of all integrations (n = 9,450, green) and of integrations marked by unique barcodes (n = 4,273, orange) is shown. For background, the mean fractions for 10 sets of TTAA sites (n = 9,450 each, blue) and 10 sets of random 4-bp sites (n = 9,450 each, gray) are shown. J) Density plots of chromatin feature scores at the selected genomic sites. For piggyBac integrations, distributions for all integrations (n = 9,450, green) and for integrations marked by unique barcodes (n = 4,273, orange) are plotted. For background, 10 sets of TTAA sites (n = 9,450 each, blue) and 10 sets of random 4-bp sites (n = 9,450 each, gray) are plotted.
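Panels C, G and H rest on two simple calculations. The sketch below illustrates both under assumptions that this legend does not state: copy number is inferred from the qPCR cycle-threshold (Ct) difference between the synHEK3 amplicon and SNRPB (taken as 3 copies), assuming ~100% amplification efficiency for both amplicons; and expected integration counts are distributed across non-overlapping features in proportion to their size (in bp, or in TTAA motifs for panel H). The function names are hypothetical.

```python
import math


def copy_number_from_ct(ct_target: float, ct_reference: float,
                        reference_copies: float = 3.0) -> float:
    """Estimate per-cell copies of a target amplicon from qPCR Ct values.

    ASSUMPTION: both amplicons amplify with ~100% efficiency, so each cycle
    of Ct difference corresponds to a 2-fold difference in input template.
    """
    return reference_copies * 2.0 ** (ct_reference - ct_target)


def log2_enrichment(observed: dict[str, int],
                    feature_sizes: dict[str, float]) -> dict[str, float]:
    """log2(observed / expected) integration counts per genomic feature.

    ASSUMPTION: features do not overlap, and expected counts split the total
    number of integrations in proportion to feature size (bp or TTAA count).
    Observed counts must be > 0, otherwise log2 is undefined.
    """
    total_obs = sum(observed.values())
    total_size = sum(feature_sizes.values())
    result = {}
    for feature, size in feature_sizes.items():
        expected = total_obs * size / total_size
        result[feature] = math.log2(observed[feature] / expected)
    return result
```

For example, a synHEK3 amplicon crossing threshold 2 cycles earlier than SNRPB would be estimated at 3 × 2² = 12 copies per cell. If the measured amplification efficiency differs from 100%, the base 2 would be replaced by the measured per-cycle amplification factor.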
Supplementary Figure 2. Chromatin context has a major impact on prime editing efficiency, related to Figure 2. A) Scatter plots of chromatin feature scores vs. prime editing efficiencies for individual synHEK3 reporters. Points are colored by the number of neighboring points. Spearman’s (ρ) and Pearson’s (r) correlation coefficients between chromatin feature score and prime editing efficiency are annotated. B) Boxplots of prime editing efficiency at synHEK3 sites, binned by chromatin feature score. Q1–Q10 correspond to 10 equally sized bins of synHEK3 reporters with increasing chromatin feature scores. C) Scatter plots of Spearman’s ρ between chromatin feature scores and prime editing efficiency, observed in two independent pools of synHEK3 reporters in wild-type (x-axis) or PE2(+) K562 cells (y-axis). D) Scatter plots of predicted (beta regression model) vs. observed prime editing efficiencies in two independent pools of synHEK3 reporters in wild-type K562 cells. Points are colored by the number of neighboring points. E) Histograms of ChIP-seq signal for H3K79me2 (red), H3K4me3 (blue) and H3K36me3 (green) in a 10-kb window surrounding the TSSs of genes that are expressed (n = 11,514, TPM > 3; top) or unexpressed (n = 10,506, TPM < 3; bottom) in K562 cells. F) Boxplot of prime editing efficiencies of gene-proximal synHEK3 reporters. Distances were calculated relative to the closest TSS, scaled by gene length and binned. Negative values refer to synHEK3 sites located upstream of the TSS; values >100% refer to synHEK3 sites located downstream of the TTS. Reporters were binned by the TPM of the overlapping/nearest gene. Bin 1 contains unexpressed genes, while Bins 2–5 are equally sized in terms of the number of assigned genes (though not necessarily in terms of the number of assigned synHEK3 reporters), sorted by increasing expression level. G) Boxplot of prime editing efficiencies of gene-proximal reporters. Same as panel F, except that distances are grouped by absolute rather than scaled distance from the TSS. H) Cumulative distribution function (CDF) plot of prime editing efficiencies of intragenic synHEK3 reporters in different orientations with respect to the direction of transcription. “Opposite” denotes synHEK3 reporters on the strand opposite the coding strand, and “same” denotes synHEK3 reporters on the coding strand. Genes with detectable expression (TPM > 3) and synHEK3 reporters within 50 kb of the TSS were selected for this analysis. P value: two-sided Kolmogorov–Smirnov test. I) Genome browser views of 4 very poorly editable sites. Integration sites and measured editing efficiencies are shown as a dot plot at the top, aligned with selected epigenetic tracks. For each synHEK3 insertion, the editing efficiency, number of reads with the edit (numerator) and total number of reads (denominator) are annotated. Dashed vertical lines mark the locations of the insertions. J) Boxplots of normalized editing rates (y-axis) for epegRNAs designed for prime editing at endogenous genomic sites, stratified by chromatin feature score. The epegRNAs are grouped into quartiles based on each chromatin feature score (Q1 = lowest; Q4 = highest). y-axis on log10 scale. K) Density plot of CTT insertion frequency for synHEK3 reporters in intragenic, proximal intergenic (<10 kb) and distal intergenic (>10 kb) regions.
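Panel F (and Supplementary Fig. 3B) places each reporter on a gene-length-normalized axis, and panel B splits reporters into ten equally sized groups by chromatin score. A minimal sketch of both steps follows. It assumes integer genomic coordinates, a gene described by its TSS, TTS and strand, and that "scaled distance" means the signed offset from the TSS in the direction of transcription divided by gene length (as a percentage); these definitions and the function names are illustrative assumptions, not the study's code.

```python
def scaled_position(site: int, tss: int, tts: int, strand: str) -> float:
    """Percent position of a site along a gene, oriented by transcription.

    0 = TSS and 100 = TTS; negative values lie upstream of the TSS and
    values > 100 lie downstream of the TTS.
    ASSUMPTION: for '-' strand genes the TSS has the larger coordinate.
    """
    gene_length = abs(tts - tss)
    if gene_length == 0:
        raise ValueError("gene must have nonzero length")
    # Flip the sign on the minus strand so 'downstream' is always positive.
    offset = site - tss if strand == "+" else tss - site
    return 100.0 * offset / gene_length


def equal_size_bins(scores: list[float], n_bins: int = 10) -> list[int]:
    """Assign each score a rank-based bin from 1 (lowest) to n_bins."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(scores) + 1
    return bins
```

Ties in `equal_size_bins` are broken by input order, so reporters with identical scores can fall into adjacent bins.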
Supplementary Figure 3. Comparison between prime editing and Cas9 editing, leveraging a common set of integrated editing reporters, related to Figure 3. A) PCA plots of synHEK3 reporters generated using chromatin scores as features. The first two PCs are plotted (PC1: 62% of variance; PC2: 9% of variance). Points are colored by prime editing efficiency at Day 4 and by Cas9 indel frequency measured at Days 1, 2 and 4. B) Cas9 indel frequency measured at Days 1, 2 and 4 for gene-proximal reporters. Distance was calculated relative to the closest TSS, scaled by gene length and binned. Negative values refer to synHEK3 sites located upstream of the TSS; values >100% refer to synHEK3 sites located downstream of the TTS. Points are colored by the expression level (log10) of the genes. C) PCA plots of synHEK3 reporters generated using chromatin scores as features. The first two PCs are plotted (PC1: 62% of variance; PC2: 9% of variance). Points are colored by the 6 main groups as in Fig. 3B. D) Scatter plot of Cas9 editing (Day 1) vs. prime editing (Day 4) efficiency. Points are colored by the 6 main groups as in panel C. E) Scatter plot of Cas9 editing (Day 1) vs. prime editing (Day 4) efficiency, colored by the 14 subgroups. F) Boxplot of H3K27me3 and H3K9me3 scores of synHEK3 reporters in Groups 1.0 and 2.0. P value: two-sided Kolmogorov–Smirnov test. G) The MMEJ / (MMEJ + NHEJ) ratio at all synHEK3 sites and at sites overlapping H3K27me3 and H3K9me3 peaks. Red line: median MMEJ / (MMEJ + NHEJ) ratio. P value: two-sided Kolmogorov–Smirnov test. H) The MMEJ / (MMEJ + NHEJ) ratio in the 6 groups of synHEK3 sites as in panel C. Red line: median MMEJ / (MMEJ + NHEJ) ratio. I–J) Scatter plots of the MMEJ / (MMEJ + NHEJ) ratio or allele frequencies of intragenic synHEK3 reporters vs. their distance to the corresponding TSS (x-axis; log10 scale). Points are colored by the expression level (log10) of the genes. Blue line: linear regression line with confidence interval in gray. I) The MMEJ / (MMEJ + NHEJ) ratio is plotted. J) Allele frequencies of the most frequent NHEJ (left) and MMEJ (right) alleles are plotted. The “NHEJ del/0/G” allele contains a single-base deletion of G at the Cas9 cut site. The “MMEJ del/5/GCACGTGATG” allele contains a bidirectional deletion of a 10-bp sequence (GCACGTGATG) around the cut site.
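Panels G–J summarize repair outcomes as the fraction of Cas9-induced deletions attributed to microhomology-mediated end joining. The sketch below shows one way such a ratio could be computed from per-site deletion allele counts. The classification rule is an assumption for illustration only: a deletion `ref[start:end]` is called MMEJ if its first bases are repeated immediately after the deleted segment for at least 2 bp (microhomology), and NHEJ otherwise; the study's actual allele-calling criteria may differ, and the function names are hypothetical.

```python
def microhomology_length(ref: str, start: int, end: int) -> int:
    """Microhomology length for the deletion ref[start:end].

    Counts how many leading bases of the deleted sequence are repeated
    immediately after the deletion, capped at the deletion length.
    """
    k = 0
    while (k < end - start and end + k < len(ref)
           and ref[start + k] == ref[end + k]):
        k += 1
    return k


def mmej_fraction(ref: str, deletions: list[tuple[int, int, int]],
                  min_mh: int = 2) -> float:
    """MMEJ / (MMEJ + NHEJ) from (start, end, read_count) deletion alleles.

    ASSUMPTION: deletions with >= min_mh bp of microhomology count as MMEJ,
    all others as NHEJ. Returns NaN when there are no deletion reads.
    """
    mmej = nhej = 0
    for start, end, count in deletions:
        if microhomology_length(ref, start, end) >= min_mh:
            mmej += count
        else:
            nhej += count
    total = mmej + nhej
    return mmej / total if total else float("nan")
```

Under this rule, a 1-bp deletion such as the “NHEJ del/0/G” allele can never be called MMEJ, because its microhomology is capped at 1 bp.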
Supplementary Figure 4. Experimental setup of the pooled shRNA screen and its readout via T7 IST-assisted sci-RNA-seq3, related to Figure 4. A) Correlation between efficiencies of random 3-bp insertions measured in the monoclonal lines versus efficiencies measured for the same reporters in the original polyclonal population. B) List of genes targeted by the shRNA library. Genes are grouped by pathways. Control genes are shown separately. C) Schematic of the lentiviral shRNA construct and the synHEK3 reporter, with key features relevant to the sci-RNA-seq3 workflow highlighted. TRE: tetracycline response element. D) Schematic structures of the sci-RNA-seq3, shRNA and synHEK3 libraries. UMI: unique molecular identifier; RT PBS: reverse transcription primer binding site.
Supplementary Figure 5. A pooled shRNA screen with sci-RNA-seq3 and effects of perturbing the MMR pathway on prime editing, related to Figure 5. A) Scatter plot of synHEK3 UMIs detected in single cells in sci-RNA-seq3 data. Cells assigned to the two clones are colored (green: clone 5; pink: clone 3); mixed cells are in gray. B) Histograms of cell count per shRNA (left), number of shRNAs captured per cell (middle) and number of synHEK3 reporters captured per cell (right) in the two clones. C) Scatter plot of prime editing frequencies of synHEK3 reporters estimated with sci-RNA-seq3 vs. bulk amplicon sequencing. D) Scatter plots of the number of cells per synHEK3-shRNA pair vs. the corresponding adjusted p values (-log10). Candidate shRNAs (left) and control shRNAs (right) are plotted separately. E) Effects of shRNAs targeting MMR-related genes in clone 3. Log2 fold-changes of prime editing efficiencies of synHEK3-shRNA pairs are plotted and colored by their corresponding adjusted p values (-log10). F–G) Effects of shRNAs against FEN1 and EP300. Pink lines: editing frequencies in cells with individual shRNAs; red line: mean editing frequencies of the gene-targeting shRNAs; light blue lines: control editing frequencies for individual shRNAs (not visible because of their low variance relative to the mean line, shown in blue); blue line: mean control editing frequencies. H) Effects of shRNAs against DOT1L and KDM2B. Barcode sequences are omitted but are in the same order as in panels F and G. Colors as in panels F and G. I) Volcano plots of gene expression changes in EPZ-5676-treated cells (5 μM for 6 days). Genes with synHEK3 insertions in clone 3 or clone 5 of the single-cell screen are colored red. J) Scatter plots of log2 fold-change of prime editing efficiency vs. gene expression change (log2 fold; top) and H3K79me2 score (bottom) for a set of intragenic synHEK3 reporters with baseline efficiency > 20%. Blue line: linear regression line with confidence interval in gray.
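Panels D and E report, for each synHEK3-shRNA pair, a log2 fold-change in editing efficiency relative to control shRNAs and an adjusted p value. The statistical test behind the raw p values is not described in this legend, so the sketch below covers only two generic steps under explicit assumptions: editing efficiency is taken as edited over total reads with a small pseudocount (hypothetical default 0.5) to keep zero counts finite, and raw p values are adjusted with the Benjamini–Hochberg procedure, a standard false-discovery-rate correction that may or may not match the study's choice.

```python
import math


def log2_fold_change(edited: int, total: int, ctrl_edited: int,
                     ctrl_total: int, pseudocount: float = 0.5) -> float:
    """log2(editing efficiency with shRNA / efficiency with control shRNAs).

    ASSUMPTION: a pseudocount (hypothetical default 0.5) is added so that
    pairs with zero edited reads still yield a finite fold-change.
    """
    eff = (edited + pseudocount) / (total + 2 * pseudocount)
    ctrl = (ctrl_edited + pseudocount) / (ctrl_total + 2 * pseudocount)
    return math.log2(eff / ctrl)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```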
Supplementary Figure 6. Chromatin context-specific response to HLTF inhibition, related to Figure 6. A) Effects of shRNAs against HLTF on all synHEK3 reporters in clone 5 (left) and clone 3 (right). For every synHEK3-shRNA pair, editing frequencies in cells with the candidate shRNA are plotted and colored by their statistical significance; control editing frequencies are shown in gray. Regions of the plot containing synHEK3 reporters that are significantly less responsive to HLTF inhibition are shaded green. Regions containing synHEK3 reporters with low (<0.2) or very high (>0.9) editing frequencies are shaded gray. Dashed horizontal lines indicate editing frequencies of 0.2 and 0.9. B) Effects of shRNAs against PMS2 on all synHEK3 reporters in clone 5 (left) and clone 3 (right). Layout as in panel A. C) Scatter plots of baseline editing efficiencies (x-axes) vs. fold-changes induced by shRNAs against HLTF, MLH1 and PMS2 (y-axes) across 50 synHEK3 reporters. Green vertical lines correspond to baseline editing efficiencies of 0.2 and 0.9. D) CDF plots of log2 fold-change of mean editing frequency induced by shRNAs against HLTF. synHEK3 reporters are colored by their responsiveness to HLTF inhibition based on the shRNA screen. P value: two-sided Kolmogorov–Smirnov test. E) Validation of differential responsiveness to HLTF inhibition in cells transduced with individual shRNAs. CDF plots of log2 fold-change of editing frequency induced by shRNAs against HLTF (shHLTF.2367: used in the shRNA screen; shHLTF.2623: an orthogonal shRNA) or MLH1 (shMLH1.1911). synHEK3 reporters are colored as in panel D. P value: two-sided Kolmogorov–Smirnov test. F) Scatter plot of RNA log2 fold-changes induced by two shRNAs against HLTF. Genes overlapping or near shHLTF-responsive sites are in pink; genes overlapping or near shHLTF-unresponsive sites are in blue; all other genes are in gray; HLTF itself is shown in purple (bottom left corner; log2 fold-changes of −1.5 and −2). G) CDF plot of log2 fold-changes of ATAC-seq peak counts induced by shHLTF.2367. Peaks near synHEK3 sites were defined as those either: 1) within 5 kb of a synHEK3 site; and/or 2) in the promoter or body of a gene overlapping or proximal to (within 10 kb of) a synHEK3 insertion. H) Western blot analysis of HLTF, PMS2 and MLH1. Clone 5 or a wild-type (WT) K562 line was transduced with rtTA and shRNAs targeting the candidate genes. Protein expression was compared to that in cells without doxycycline (Dox) or in cells transduced with an shRNA targeting a different gene.
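The two-part definition of peaks "near" synHEK3 sites in panel G amounts to a simple interval rule. The Python sketch below is a hypothetical illustration rather than the authors' pipeline: it assumes half-open genomic intervals, represents each synHEK3 insertion as a 1-bp interval, and expects gene intervals that already span the promoter as well as the gene body, because the legend does not state the promoter window. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int


def overlaps(a, b):
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def distance(a, b):
    """Gap in bp between two intervals: 0 if they overlap or abut, inf on different chromosomes."""
    if a.chrom != b.chrom:
        return float("inf")
    if overlaps(a, b):
        return 0
    return max(a.start, b.start) - min(a.end, b.end)


def peaks_near_sites(peaks, sites, genes, site_window=5_000, gene_window=10_000):
    """Select peaks meeting criterion 1 and/or criterion 2.

    1) the peak lies within site_window bp of a synHEK3 site;
    2) the peak overlaps a gene (promoter + body) that overlaps or lies within
       gene_window bp of a synHEK3 insertion.
    """
    linked_genes = [g for g in genes if any(distance(g, s) <= gene_window for s in sites)]
    selected = []
    for peak in peaks:
        near_site = any(distance(peak, s) <= site_window for s in sites)
        in_linked_gene = any(overlaps(peak, g) for g in linked_genes)
        if near_site or in_linked_gene:
            selected.append(peak)
    return selected
```

The quadratic scans are fine for tens of reporters; for genome-wide peak sets, an interval index (for example, the overlap utilities in GenomicRanges or Bedops listed in the key resources table) would be the practical choice.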
Supplementary Figure 7. Modulating prime editing outcomes by epigenetic reprogramming, related to Figure 7. A) Schematic diagrams of target genes in the CRISPRoff experiment, with the locations of the CRISPRoff gRNAs (green) and synHEK3 reporters (red) annotated. B) Scatter plots of mean normalized counts of gene expression in cells transfected with CRISPRoff gRNAs at Day 11 (2 replicates each). x- and y-axes on log10 scale. Insets are bar plots of normalized counts of the CRISPRoff target genes. NTC: non-targeting control. C) Prime editing efficiency (%) at endogenous gene targets in K562 cells, with or without epigenetic editing via CRISPRa. Gray bars show mean prime editing efficiencies when control promoters were activated via CRISPRa; blue bars show mean prime editing efficiencies when target gene promoters were activated via CRISPRa. P value: Welch’s two-sample t-test. D) mRNA fold-change of CRISPRa target genes quantified by qPCR. E) Relative expression levels of selected CRISPRa target genes compared to a set of reference genes. Circle: reference genes; triangle: endogenous expression levels of the target genes; square: expression levels of the target genes after CRISPRa. y-axis on log10 scale. F) Schematic of the IL2RB gene with the locations of gRNA (for CRISPRa) and (e)pegRNA (for prime editing) targets annotated. Red lollipops indicate pegRNAs that were separately tested in the experiments shown in Fig. 7E and Supplementary Fig. 7C. Dashed lollipops indicate locations of the new epegRNA pools. G) List of epegRNA edit types tested, including all possible single-nucleotide substitutions, small insertions of 1, 3 and 6 bp, insertion of a loxP site (34 bp), and small deletions of 1, 3 and 6 bp. H) Total editing efficiencies measured at individual IL2RB exons. Gray bars show mean prime editing efficiencies in control groups; blue bars show mean prime editing efficiencies when the IL2RB promoter was first activated via CRISPRa.
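Panel C uses Welch’s two-sample t-test, which compares group means without assuming equal variances. A minimal sketch of the statistic and its Welch–Satterthwaite degrees of freedom, using only the Python standard library, is shown below; the function name is hypothetical, and a p value for the same test can be obtained with `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: replicate editing efficiencies for two groups (e.g. target vs. control
    promoters activated via CRISPRa). Sample variances use an n - 1 denominator,
    so each group needs at least two values.
    """
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df
```

Because the degrees of freedom are estimated from the data, they are generally non-integer and shrink toward the smaller group's n - 1 when that group's variance dominates.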
Table S1. List of oligos and nucleic acid sequences used in this study, related to STAR Methods.
Table S2. List of ENCODE datasets used in this study and information on the beta regression models, related to Figures 1, 2 and STAR Methods.
Table S3. Lists of synHEK3 reporters and relevant features used in this study, related to Figures 2, 3, 6.
Table S4. Editing outcomes of epegRNAs targeting endogenous loci, related to Figure 2.
Table S5. The DDR-focused shRNA library and results of the pooled shRNA screen, related to Figures 4–6.
Table S6. The RNA-seq and ATAC-seq datasets for HLTF knockdown in K562 cells, related to Figure 6.
Table S7. Prime editing efficiencies measured in the CRISPRoff and CRISPRa experiments, related to Figure 7.
Data Availability Statement
All sequencing data have been deposited at GEO (GSE228465) and are publicly available. This paper also analyzes existing, publicly available data; the accession numbers for these datasets are listed in the key resources table. The UCSC trackhub created for visualizing results of this work is available at: https://shendure-web.gs.washington.edu/content/members/xyli10/public/nobackup/hub.txt.
This paper does not report original code.
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
Key Resources Table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Antibodies | ||
| Mouse monoclonal anti-β-actin | Sigma-Aldrich | Cat# A2228; Clone: AC-74; RRID: AB_476697 |
| Donkey anti-Rabbit IgG polyclonal antibody (IRDye 800CW) | LI-COR Biosciences | Cat# 926-32213; RRID: AB_621848 |
| Donkey anti-Mouse IgG polyclonal antibody (IRDye 680RD) | LI-COR Biosciences | Cat# 926-68072; RRID: AB_10953628 |
| Rabbit monoclonal anti-HLTF (E9H5I) | Cell Signaling Technology | Cat# 45965; RRID: AB_3094661 |
| Rabbit monoclonal anti-PMS2 (E9U4P) | Cell Signaling Technology | Cat# 27884; RRID: AB_3094662 |
| Mouse monoclonal anti-MLH1 (4C9C7) | Cell Signaling Technology | Cat# 3515; RRID: AB_2145615 |
| Bacterial and virus strains | ||
| NEB Stable Competent E. coli | New England Biolabs | Cat# C3040H |
| NEB 10-beta Electrocompetent E. coli | New England Biolabs | Cat# C3020K |
| Chemicals, peptides, and recombinant proteins | ||
| RPMI 1640 Medium | Gibco | Cat# 11875119 |
| DMEM, High Glucose | Gibco | Cat# 11965118 |
| FBS | Hyclone | Cat# SH30396.03 |
| Penicillin-Streptomycin (10,000 U/mL) | Gibco | Cat# 15140122 |
| mTeSR™ Plus kit | STEMCELL Technologies | Cat# 100-0276 |
| Y-27632 | Stemgent | Cat# 04-0012-02 |
| Geltrex™ | Gibco | Cat# A1413302 |
| Trimethoprim | Sigma-Aldrich | Cat# 92131 |
| DNeasy Blood & Tissue Kit | QIAGEN | Cat# 69504 |
| HiScribe™ T7 High Yield RNA Synthesis Kit | New England Biolabs | Cat# E2040S |
| HiScribe® T7 Quick High Yield RNA Synthesis Kit | New England Biolabs | Cat# E2050 |
| RNase-Free DNase Set | QIAGEN | Cat# 79254 |
| TURBO™ DNase | Invitrogen | Cat# AM2238 |
| TRIzol LS Reagent | Invitrogen | Cat# 10296010 |
| RNeasy Mini Kit | QIAGEN | Cat# 74104 |
| Glycogen (5 mg/mL) | Invitrogen | Cat# AM9510 |
| SuperScript™ IV Reverse Transcriptase | Invitrogen | Cat# 18091050 |
| RNaseOUT™ Recombinant Ribonuclease Inhibitor | Invitrogen | Cat# 10777019 |
| Proteinase K | Thermo Scientific | Cat# EO0491 |
| AMPure XP Reagent | Beckman Coulter Life Sciences | Cat# A63882 |
| KAPA2G Robust HotStart ReadyMix PCR Kit | Roche | Cat# KK5702 |
| Power SYBR™ Green PCR Master Mix | Applied Biosystems | Cat# 4367659 |
| NEBuilder® HiFi DNA Assembly Cloning Kit | New England Biolabs | Cat# E5520S |
| NEBridge® Golden Gate Assembly Kit (BsmBI-v2) | New England Biolabs | Cat# E1602S |
| NEBridge® Golden Gate Assembly Kit (BsaI-HF® v2) | New England Biolabs | Cat# E1601S |
| BsmBI-v2 | New England Biolabs | Cat# R0739S |
| BsaI-HF® v2 | New England Biolabs | Cat# R3733S |
| FastDigest Esp3I (IIs class) | Thermo Scientific | Cat# FD0454 |
| I-SceI | New England Biolabs | Cat# R0694S |
| XhoI | New England Biolabs | Cat# R0146S |
| EcoRI-HF | New England Biolabs | Cat# R3101S |
| BamHI-HF | New England Biolabs | Cat# R3136S |
| XbaI | New England Biolabs | Cat# R0145S |
| NotI-HF | New England Biolabs | Cat# R3189S |
| T4 DNA ligase | New England Biolabs | Cat# M0202L |
| SF Cell Line 4D-Nucleofector™ X Kit | Lonza Bioscience | Cat# V4XC-2012 |
| SF Cell Line 96-well Nucleofector™ Kit | Lonza Bioscience | Cat# V4SC-2096 |
| P3 Primary Cell 96-well Nucleofector™ Kit | Lonza Bioscience | Cat# V4SP-3096 |
| ViraPower™ Lentiviral Packaging Mix | Invitrogen | Cat# K497500 |
| Lipofectamine™ 3000 Transfection Reagent | Invitrogen | Cat# L3000001 |
| PEG-it™ Virus Precipitation Solution (5×) | System Biosciences | Cat# LV810A-1 |
| Polybrene | Millipore | Cat# TR-1003-G |
| Geneticin | Gibco | Cat# 10131035 |
| Blasticidin S HCl (10 mg/mL) | Gibco | Cat# A1113903 |
| DEPC (Diethyl Pyrocarbonate) | Millipore Sigma | Cat# D5758-25ML |
| YOYO™-1 Iodide (491/509) - 1 mM Solution in DMSO | Invitrogen | Cat# Y3601 |
| dNTP mix | New England Biolabs | Cat# N0447L |
| NEBNext® Ultra™ II Non-Directional RNA Second Strand Synthesis Module | New England Biolabs | Cat# E6111L |
| Protease | QIAGEN | Cat# 19157 |
| Buffer EB | QIAGEN | Cat# 19086 |
| Tn5 transposase | Diagenode | Cat# C01070010-20 |
| BSA | New England Biolabs | Cat# B9000S |
| Monarch® DNA Gel Extraction Kit | New England Biolabs | Cat# T1020S |
| NEBNext® High-Fidelity 2X PCR Master Mix | New England Biolabs | Cat# M0541L |
| OneTaq® Hot Start 2X Master Mix with Standard Buffer | New England Biolabs | Cat# M0484L |
| SYBR™ Green I Nucleic Acid Gel Stain | Invitrogen | Cat# S7563 |
| Alt-R™ S.p. Cas9 Nuclease V3 | Integrated DNA Technologies | Cat# 1081059 |
| Alt-R® Cas9 Electroporation Enhancer, 2 nmol | Integrated DNA Technologies | Cat# 1075915 |
| Critical commercial assays | ||
| TruSeq RNA Library Prep Kit v2 | Illumina | Cat# RS-122-2001 |
| TruSeq Stranded mRNA kit | Illumina | Cat# 20020594 |
| Illumina TruSeq RNA UD Indexes | Illumina | Cat# 20023785 |
| NextSeq 1000/2000 P2 Reagents (100 Cycles) v3 | Illumina | Cat# 20046811 |
| NextSeq 500/550 Mid Output Kit v2.5 (150 Cycles) | Illumina | Cat# 20024904 |
| MiSeq Reagent Kit v2 (300-cycles) | Illumina | Cat# MS-102-2002 |
| MiSeq Reagent Kit v3 (150-cycle) | Illumina | Cat# MS-102-3001 |
| MiSeq Reagent Kit v2 (50-cycles) | Illumina | Cat# MS-102-2001 |
| Tagment DNA TDE1 Enzyme | Illumina | Cat# 20034197 |
| Deposited data | ||
| Sequencing data generated in this study | This manuscript | GEO: GSE228465 |
| K562 DHS data | ENCODE Project Consortium, 201234 | GEO:GSM816655 and ENCFF972GVB |
| K562 H3K79me2 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733653 and ENCFF957YJT |
| K562 CTCF ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM935407 and ENCFF682MFJ |
| K562 POLR2A ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSE91721 and ENCFF806LCJ |
| K562 ATAC-seq data | ENCODE Project Consortium, 201234 | GEO:GSE170378 and ENCFF093IIW |
| K562 H3K9ac ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733778 and ENCFF286WRJ |
| K562 H3K9me3 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733776 and ENCFF601JGK |
| K562 H3K9me1 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733777 and ENCFF654SLZ |
| K562 H4K20me1 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733675 and ENCFF605FAF |
| K562 BRD4 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSE101225 and ENCFF251SRH |
| K562 EZH2 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM1003576 and ENCFF587SWK |
| K562 H2AFZ ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733786 and ENCFF621DJP |
| K562 POLR2AS2 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM935402 and ENCFF434PYZ |
| K562 SMC3 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM935310 and ENCFF469OWD |
| K562 HDAC1 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSE10583 and ENCFF684RNO |
| K562 HDAC2 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSE91451 and ENCFF954LGE |
| K562 HDAC3 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSE127356 and ENCFF975DCO |
| K562 H3K4me3 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSE96303 and ENCFF253TOF |
| K562 H3K4me2 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733651 and ENCFF959YJV |
| K562 H3K4me1 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733692 and ENCFF834SEY |
| K562 H3K27me3 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733658 and ENCFF405HIO |
| K562 H3K27ac ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733656 and ENCFF849TDM |
| K562 H3K36me3 ChIP-seq data | ENCODE Project Consortium, 201234 | GEO:GSM733714 and ENCFF163NTH |
| Experimental models: Cell lines | ||
| K562 | ATCC | CCL-243 |
| K562 PE2-Puro | Choi et al., 20222 | N/A |
| K562 dCas9-VP64 | Chardon et al., 202363 | N/A |
| K562 PEmax | This manuscript | N/A |
| WTC11 DHFR-dCas9-VPH | Tian et al., 202164 | N/A |
| Oligonucleotides | ||
| List of oligonucleotides | This manuscript, Table S1 | N/A |
| Recombinant DNA | ||
| PB-T7-HEK3-BC (synHEK3) | This manuscript | N/A |
| LT3-GFP-T7-miR-E-CS1-PGK-Neomycin | This manuscript | N/A |
| Lenti-rtTA-P2A-Blast | This manuscript | N/A |
| pU6-Sp-pegRNA-HEK3-ins6N | This manuscript | N/A |
| pU6-Sp-pegRNA-HEK3-ins3N | This manuscript | N/A |
| pU6-Sp-dual-gRNA | This manuscript | N/A |
| pU6-Sp-gRNA-2XMS2 | This manuscript | N/A |
| PB-CMV-MCP-XTEN80-p65-Rta-3xNLS-P2A-T2AmPlum | This manuscript | N/A |
| PB-CMV-PEmax-EF1a-Puro | This manuscript | N/A |
| PB-UCOE-EF1a-PEmax-P2A-mCherry-PGK-Blast | This manuscript | N/A |
| PB-CMV-MCS-EF1α-Puro | System Biosciences | Cat# PB510B-1 |
| pCMV-HyPBase | Yusa et al., 201173 | N/A |
| PB-CMV-PE2-EF1a-Puro | Choi et al., 20222 | N/A |
| pU6-Sp-pegRNA-HEK3-insCTT | Anzalone et al., 20191 | Addgene: 132778 |
| pCMV-PEmax | Chen et al., 202113 | Addgene: 174820 |
| pCMV-PEmax-P2A-hMLH1dn | Chen et al., 202113 | Addgene: 174828 |
| CRISPRoff-v2.1 | Nuñez et al., 202158 | Addgene: 167981 |
| LT3GEN | Fellmann et al., 201350 | Addgene: 111173 |
| Software and algorithms | ||
| Bcl2fastq (v2.20) | Illumina | https://support.illumina.com/sequencing/sequencing_software/bcl2fastq-conversion-software.html |
| Samtools (v1.9) | Danecek et al., 202184 | http://www.htslib.org |
| Bedops (v2.4.35) | Neph et al., 201280 | https://bedops.readthedocs.io/en/latest/index.html |
| featureCounts | Liao et al., 201481 | https://subread.sourceforge.net/featureCounts.html |
| Needleall | Needleman et al., 197095; Kruskal, 198396 | https://emboss.sourceforge.net/apps/release/6.6/emboss/apps/needleall.html |
| STAR (v2.7.6a) | Dobin et al., 201377 | https://github.com/alexdobin/STAR |
| Salmon (v1.9.0) | Patro et al., 201776 | https://salmon.readthedocs.io/en/latest/ |
| Cutadapt (v4.1) | Martin, 201182 | https://cutadapt.readthedocs.io/en/stable/ |
| TrimGalore (v0.6.6) | Martin, 201182 | https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/ |
| Bwa (v0.7.17) | Li et al., 200983 | https://bio-bwa.sourceforge.net/ |
| GenomicRanges | Lawrence et al., 201385 | https://bioconductor.org/packages/release/bioc/html/GenomicRanges.html |
| ChIPseeker | Yu et al., 201588; Wang et al., 202289 | https://bioconductor.org/packages/release/bioc/html/ChIPseeker.html |
| Homer | Heinz et al., 201090 | http://homer.ucsd.edu/homer/ |
| Genomation | Akalin et al., 201592 | https://bioconductor.org/packages/release/bioc/html/genomation.html |
| HTSeq (v2.0.2) | Putri et al., 202297 | https://htseq.readthedocs.io/en/master/ |
| Seurat (v4.0.0) | Hao et al., 202198 | https://satijalab.org/seurat/ |
| Hmisc (v5.1-1) | Harrell and Dupont | https://cran.r-project.org/web/packages/Hmisc/index.html |
| Betareg (v3.1.4) | Cribari-Neto et al., 201099 | https://cran.r-project.org/web/packages/betareg/index.html |
| DESeq2 | Love et al., 201478 | https://bioconductor.org/packages/release/bioc/html/DESeq2.html |
| WebLogo 3 | Crooks et al., 200487 | https://weblogo.threeplusone.com/ |
| ComplexHeatmap | Gu et al., 201693 | https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html |
| Gviz | Hahne and Ivanek, 201694 | https://bioconductor.org/packages/release/bioc/html/Gviz.html |
| IGV | Robinson et al., 201186 | http://software.broadinstitute.org/software/igv/ |
| FlashFry | McKenna and Shendure, 201874 | https://github.com/mckennalab/FlashFry |
| GuideScan2 | Schmidt et al., 202275 | https://www.guidescan.com/ |
| DeepPrime | Yu et al., 202319 | https://deepcrispr.info/DeepPrime/ |
| Other | ||
| UCSC trackhub for visualizing prime editing results in synHEK3 reporters and surrounding epigenetic environments | This manuscript | https://shendure-web.gs.washington.edu/content/members/xyli10/public/nobackup/hub.txt |
