Author manuscript; available in PMC: 2023 Jun 10.
Published in final edited form as: Nat Biotechnol. 2022 Nov 24;41(4):500–512. doi: 10.1038/s41587-022-01527-4

Drag-and-drop genome insertion of large sequences without double-strand DNA cleavage using CRISPR-directed integrases

Matthew T N Yarnall 1,*, Eleonora I Ioannidi 1,2,*, Cian Schmitt-Ulms 1,*, Rohan N Krajeski 1,*, Justin Lim 1, Lukas Villiger 1, Wenyuan Zhou 1, Kaiyi Jiang 1,3, Sofya K Garushyants 4, Nathaniel Roberts 5, Liyang Zhang 5, Christopher A Vakulskas 5, John A Walker II 6, Anastasia P Kadina 6, Adrianna E Zepeda 6, Kevin Holden 6, Hong Ma 7, Jun Xie 7, Guangping Gao 7, Lander Foquet 8, Greg Bial 8, Sara K Donnelly 9, Yoshinari Miyata 9, Daniel R Radiloff 9, Jordana M Henderson 10, Andrew Ujita 10, Omar O Abudayyeh 1,†, Jonathan S Gootenberg 1,†
PMCID: PMC10257351  NIHMSID: NIHMS1902911  PMID: 36424489

Abstract

Programmable genome integration of large, diverse DNA cargo without DNA repair of exposed DNA double-strand breaks (DSBs) remains an unsolved challenge in genome editing. We present Programmable Addition via Site-specific Targeting Elements (PASTE), which uses a CRISPR-Cas9 nickase fused to both a reverse transcriptase and serine integrase for targeted genomic recruitment and integration of desired payloads. We demonstrate integration of sequences as large as ~36 kb at multiple genomic loci across three human cell lines, primary T cells, and non-dividing primary human hepatocytes. To augment PASTE, we discover 25,614 serine integrases and cognate attachment sites from metagenomes and engineer orthologs with higher activity and shorter recognition sequences for efficient programmable integration. PASTE has editing efficiencies similar to or exceeding those of homology directed repair and non-homologous end joining–based methods, with activity in non-dividing cells and in vivo with fewer detectable off-target events. PASTE expands the capabilities of genome editing by allowing large, multiplexed gene insertion without reliance on DNA repair pathways.

Ed summary:

Large sequences are integrated site-specifically into the human genome without double-strand DNA cleavage.


Programmable genome insertion is vital for both gene therapy and basic research. Common methods to insert long DNA sequences rely on cellular responses to double-strand breaks (DSBs), using programmable nucleases such as CRISPR-Cas9 [1–3] to induce repair pathways such as non-homologous end joining (NHEJ) [4], as with the homology-independent targeted insertion (HITI) [5] technology, or homology directed repair (HDR) [6–8]. However, DSB-based approaches have limitations: genome damage causes undesirable outcomes including insertions/deletions, translocations, and activation of p53 [9,10]; NHEJ can generate off-target insertions at unintended DSBs [11]; and HDR has low efficiency in non-dividing cells, including many cell types in vivo, and requires long DNA templates that are labor-intensive to produce [12]. Genome editing technologies such as base editing [13–15] and prime editing [16] alleviate DSB dependencies, but are limited to nucleotide edits, small insertions (less than ~50 nucleotides), or short deletions (less than ~80 nucleotides) [16], and cannot install or replace large sequences of DNA. More recent paired-guide prime editing approaches, which use two pegRNAs with complementary reverse transcription template regions, have enabled insertion of large sequences by biasing repair towards the edited strands [17,18]. However, these approaches have diminishing efficiency in the 1 to 5.6 kb range and cannot insert larger sequences.

Natural transposable element systems, which include several families of integrases and transposases, provide efficient routes for genome integration without DSBs, but lack the programmability of CRISPR effector nucleases. Transposases insert varying copies of a donor sequence into cells at loosely defined sites, such as TA dinucleotides, resulting in semi-random gene insertion throughout the genome [19]. In contrast, site-specific integrases, such as large serine phage integrases, efficiently integrate their DNA cargo into sequence-defined landing sites that are ~30-50 nucleotides long [20] and have been used to insert therapeutic transgenes at naturally occurring pseudo-sites in the human genome in pre-clinical models [21]. While targeted integration can be achieved by a two-step approach involving prior insertion of integrase landing sites at a desired location using HDR [22], this approach is limited by the inefficiency of two-step integration with HDR and the risks associated with DSBs. Furthermore, a major issue limiting clinical application of certain integrases, such as phiC31, is chromosomal rearrangements between pseudo-sites, which can also lead to significant DNA damage responses [23,24].

Engineered systems to direct integrases, recombinases, or transposases to genomic sites for integration of gene cargos without DNA cleavage rely on fusions with programmable DNA binding proteins. Approaches fusing either zinc finger, transcription activator-like effector (TALE), or catalytically inactive Cas9 programmable DNA binding proteins to transposases [25–29] or recombinases [30–36] have been demonstrated in mammalian cells, but their reported integration efficiency is low at genomic loci. Moreover, transposase fusions are hindered by excessive promiscuity and off-target insertions, while recombinase fusions have limited targets in the genome due to intrinsic sequence restrictions.

To overcome the current limitations of gene integration approaches, we married advances in programmable CRISPR-based gene editing, such as prime editing, with precise site-specific integrases. Fusing Cas9, reverse transcriptases, and large serine integrases, we demonstrate programmable integration of cargos up to ~36 kb in a single-delivery reaction with efficiencies up to ~50-60% in cell lines and ~4-5% in primary human hepatocytes and T cells. This approach, termed PASTE (Programmable Addition via Site-specific Targeting Elements), is easily retargeted to new genes, can be delivered with a single dose of plasmids, and functions in non-dividing and primary cells. By profiling thousands of guide designs in a pooled screen, we determine guide rules for optimal programming to loci. We engineer PASTE for orthogonal integration and sequence replacement, simultaneously introducing three genes at three separate loci, and concurrently deleting and inserting sequences using guide pairs. With genome-wide sequencing, we show that PASTE is much more specific than HITI, with higher insertion purity than HDR and HITI. Comparing PASTE to other prime-editing based insertion approaches [17], we find 8.3- to 42.1-fold higher integration efficiencies by PASTE at three endogenous targets. To further improve PASTE, we mine bacterial genomes and metagenomes for integrases, find 25,614 new integrase orthologs and predict associated attachment sites, select multiple new large serine recombinases and demonstrate activity in mammalian cells, and use these integrases for high-efficiency integration as part of the PASTE system. For therapeutic relevance, we show diverse templates are compatible with PASTE, including AAV and adenovirus, allowing for DNA integration of viruses and other DNA templates. We extend our use of PASTE into mouse models for in vivo programmable gene insertion in the liver. As an integration tool, PASTE opens multiple applications for gene insertion and tagging in biomedical research and therapeutic development.

Results

PASTE combines CRISPR editing and site-specific integration

We envisioned a programmable integration system coupling a CRISPR-based targeting approach with efficient insertion via serine integrases, which typically insert sequences containing an AttP attachment site into a target containing the related AttB attachment site. By using programmable genome editing to place integrase landing sites at desired locations in the genome, this system would direct the activity of the associated integrase to the specific genomic site. As prime editors have been reported to insert 44 bp sequences [16], we hypothesized that the ~46 bp AttB landing site of serine integrases could be incorporated into the prime editing guide RNA (pegRNA) design and be copied into the genome via reverse transcription and flap repair (Fig. 1a–b). This “beacon” would serve as a target for an integrase, which could either be supplied in trans or directly fused to the Cas9 protein for additional recruitment. By simultaneously delivering a circular double-strand DNA template containing the AttP attachment site, the expressed integrase could directly integrate the DNA cargo at the desired target site with a single delivery mechanism (Fig. 1a–b).

Figure 1: PASTE editing allows for programmable gene insertion independent of DNA repair pathways.


a) Schematic of programmable gene insertion with PASTE. The PASTE system involves insertion of landing sites via Cas9-directed reverse transcriptases, followed by landing site recognition and integration of cargo via Cas9-directed integrases. b) Schematic of PASTE insertion at the ACTB locus, showing guide and target sequences. c) Comparison of GFP cargo integration efficiency between BxbINT and Cre recombinase at the 5’ end of the ACTB locus. d) Comparison of PASTE integration efficiency of GFP with a panel of integrases targeting the 5’ end of the ACTB locus. Both orientations of landing sites are profiled (F, forward; R, reverse). e) Optimization of PASTE constructs with a panel of linkers and RT modifications for EGFP integration at the ACTB and LMNB1 loci with different payloads. f) Gel electrophoresis showing complete insertion by PASTE for multiple cargo sizes. g) Effect of cargo size on PASTEv3 insertion efficiency at the endogenous ACTB and LMNB1 targets. Cargos were transfected with fixed molar amounts. h) PASTEv3 insertion of 36 kb cargo template at the ACTB locus. Data are mean (n= 3) ± s.e.m.

We engineered pegRNAs with AttB sequences, hereafter referred to as attachment site-containing guide RNA (atgRNA), and surveyed a panel of atgRNAs with different length AttB truncations, successfully inserting sequences up to 56 bp at the beta-actin (ACTB) gene locus, with higher efficiency at lengths below 31 bp (Ext. Data Fig. 1a–b). As prime editing has been reported to insert LoxP beacons for Cre-based insertion [16], we tested a Cre-based integration approach with co-expression of PE2 and a Cre recombinase; however, tyrosine recombinases showed inefficient insertion (Fig. 1c). Given the high efficiency of serine recombinases [37], we evaluated a panel of multiple enzymes, including Bxb1 (hereafter referred to as BxbINT), TP901 (hereafter referred to as Tp9INT), and phiBT1 (hereafter referred to as Bt1INT) phage serine integrases (Supplementary Table 1), and could insert all landing sites tested, with efficiencies between 10-30% (Ext. Data Fig. 1c). To test the complete system, we combined all components and delivered them in a single transfection: the prime editing vector, the atgRNA, a nicking guide for stimulating repair of the other strand, a mammalian expression vector for the corresponding integrase or recombinase, and a 969 bp minicircle [38] DNA cargo encoding green fluorescent protein (GFP) (Fig. 1d). We compared GFP integration rates among the four integrases and recombinases and found that BxbINT integrase showed the highest integration rate (~15%) at the targeted ACTB locus and required the nicking guide for optimal performance (Fig. 1d; Ext. Data Fig. 1d–f). This combined system, termed PASTEv1, resulted in programmable, efficient insertion of the EGFP transgene.

We next hypothesized that we could improve PASTE editing through a series of protein and guide engineering efforts. We tested modified scaffold designs (atgRNAv2) for increased stabilization and expression from PolIII promoters [39], improving both atgRNA landing site insertion and overall PASTE efficiency (Ext. Data Fig. 1g). To optimize other potential bottlenecks for PASTE activity, we screened a panel of protein modifications at the ACTB and lamin B1 (LMNB1) loci, including alternative reverse transcriptase fusions and mutations; various linkers between the Cas9, reverse transcriptase, and integrase domains; and reverse transcriptase and BxbINT domain mutants (Fig. 1e and Ext. Data Fig. 1h–k, Supplementary Tables 2 and 3). A number of protein modifications, including a 48 residue XTEN linker between the Cas9 and reverse transcriptase, as well as the fusion of MMuLV to the Sto7d DNA binding domain or mutation of L139P [40], improved PASTE integration efficiency (Ext. Data Fig. 1h–j). When these top modifications were combined with a (GGS)6 linker between the reverse transcriptase and BxbINT, they produced up to ~30% gene integration, highlighting the importance of directly recruiting the integrase to the target site (Fig. 1e and Ext. Data Fig. 1k). We refer to this optimized construct, SpCas9-(XTEN-48)-RT(L139P)-(GGS)6-BxbINT, as PASTEv2. We combined PASTEv2 with atgRNAv2 to generate PASTEv3, which achieved precise integration of templates as large as ~36,000 bp with ~10-20% integration efficiency at ACTB and LMNB1 (Fig. 1f–h and Ext. Data Fig. 2a–e), with complete integration of the full-length cargo confirmed by Sanger sequencing (Ext. Data Fig. 2f–g).

atgRNA and AttB site parameters influence PASTE efficiency

To optimize atgRNA parameters for PASTE, we explored the impact of atgRNA and integrase parameters on integration efficiency. Relevant atgRNA parameters for PASTEv1 include the primer binding site (PBS), reverse transcription template (RT), and AttB site lengths, as well as the relative locations and efficacy of the atgRNA spacer and nicking guide (Ext. Data Fig. 3a). We tested a range of PBS and RT lengths at ACTB and LMNB1, and found that rules governing integration efficiency varied between loci, with shorter PBS lengths and longer RT designs having higher integration rates at the ACTB locus (Ext. Data Fig. 3b) and longer PBS and shorter RT designs performing better at LMNB1 (Ext. Data Fig. 3c). These differences may be related to locus-dependent efficiency of priming and resolution of flap insertion observed in other prime editing applications [16]. The length of the AttB landing site must balance two conflicting factors: the higher efficiency of prime editing for smaller inserts [16] and reduced efficiency of Bxb1 integration at shorter AttB lengths [41]. We evaluated AttB lengths at ACTB, LMNB1, and nucleolar phosphoprotein p130 (NOLC1) loci, finding that the optimal AttB length was locus dependent. At the ACTB locus, long AttB lengths could be inserted (Ext. Data Fig. 1a) and overall PASTE efficiencies for the insertion of GFP were highest for long AttB lengths (Ext. Data Fig. 3d). In contrast, intermediate AttB lengths had higher overall integration efficiencies (> 20%) at LMNB1 (Ext. Data Fig. 3e) and NOLC1 (Ext. Data Fig. 3f), indicating that the increased efficiency of installing shorter AttB sequences overcame the reduction of BxbINT integration at these sites. We tested a panel of shorter RT and PBS guides at ACTB and LMNB1 loci in comparison to our previous optimized guides and found that while shorter RT and PBS sequences did not increase integration at ACTB (Ext. Data Fig. 3g), they had improved integration at LMNB1 (Ext. Data Fig. 3h). Moreover, manual design of a variety of atgRNAs to different targets had varying levels of performance and integration outcomes at seven different gene loci (ACTB, SUPT16H, SRRM2, NOLC1, DEPDC4, NES, and LMNB1) (Ext. Data Fig. 3i).

To develop thorough rules for design, we tested atgRNA designs in high throughput via pooled library screening (Fig. 2a). Using pooled oligo synthesis and cloning, we generated a library of 10,580 atgRNA designs for 11 spacers across 8 target genes (ACTB, LMNB1, NOLC1, SUPT16H, DEPDC4, NES, CFTR, and SERPINA1). For each spacer/target pair, we were able to evaluate PBS lengths between 5-19 bp, RT lengths between 6-36 bp (increments of 2 bases), and AttB lengths of 38, 40, 43, and 46 bp, generating a distribution of edits (Fig. 2b and Extended Data 1 and 2). Across the screen, every gene had atgRNAs with significant AttB insertion rates (Fig. 2b–c). Upon analyzing the results, we found that, on a per-target basis, shorter AttBs generally gave more AttB insertion and that a wide range of RT and PBS lengths was permissible, although the exact optimal combinations differed across genes (Fig. 2d and Ext. Data Fig. 4). Across the 8 targets, RTs longer than 20 bp tended to yield higher AttB insertion rates, whereas PBS lengths could be between 5-19 bp without any clear trend. To validate the screen, we tested a panel of top predicted atgRNAs and found that they were all capable of higher efficiency AttB insertion (Fig. 2e) and PASTE integration (Fig. 2f) than our previous set of manually designed atgRNAs derived from our arrayed screening of parameters.
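The parameter ranges above can be enumerated to get a sense of the size of such a design space. The following is a minimal sketch, not the authors' library-generation code: the spacer labels are hypothetical, and a PBS step of 1 bp is an assumption (only the RT step of 2 bases is stated). A full grid under these assumptions gives 960 designs per spacer and 10,560 across 11 spacers, close to but not identical to the 10,580 designs reported, so the actual library presumably differs from a plain grid in details not given here.

```python
from itertools import product

# Ranges as stated in the text; the PBS step of 1 bp is an assumption.
PBS_LENGTHS = range(5, 20)        # 5..19 bp
RT_LENGTHS = range(6, 37, 2)      # 6..36 bp in increments of 2
ATTB_LENGTHS = (38, 40, 43, 46)   # bp

def design_grid(spacer_ids):
    """Enumerate every (spacer, PBS, RT, AttB) length combination."""
    return [
        {"spacer": s, "pbs_len": p, "rt_len": r, "attb_len": a}
        for s, p, r, a in product(spacer_ids, PBS_LENGTHS, RT_LENGTHS, ATTB_LENGTHS)
    ]

# Hypothetical labels standing in for the 11 spacers used in the screen.
spacers = [f"spacer_{i}" for i in range(1, 12)]
grid = design_grid(spacers)
per_spacer = len(grid) // len(spacers)
print(per_spacer, len(grid))  # 960 10560
```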

Figure 2: Evaluating design rules for efficient PASTE insertion at endogenous genomic loci.


a) Schematic of pooled oligo library design for high-throughput screening of atgRNA designs at endogenous gene targets. b) Box plots depicting the editing rates of AttB addition at the different endogenous targets across 10,580 different atgRNA designs. Box indicates between 25th and 75th percentiles, whiskers indicate 1.5 times interquartile-range. Center line indicates the 50th percentile. c) Scatter plot depicting AttB site insertion rates versus significance of the editing (−log(p-value)) as measured by a Student’s two-tailed t-test against a no Cas9-RT control. d) Heatmaps depicting percent AttB site insertion for LMNB1 guide 1 across different RT, PBS, and AttB lengths. Bar charts indicating normalized summation across relevant PBS, RT, or AttB parameter axes are shown on heatmap sides. e) Top atgRNA hits from the screen are compared for AttB site insertion against manually designed atgRNAs (grey bars). f) PASTEv3 efficiency for insertion of an EGFP cargo at different endogenous targets is compared between screen validated atgRNAs and manually designed atgRNAs. g) Accuracy results by 5-fold cross validation of a MLP classifier trained on data from the 10,580 atgRNAs. h) PASTE integration rates of previously evaluated atgRNAs predicted by the MLP classifier to be efficient (pos. guides) or not efficient (neg. guides). i) PASTE integration rates of top atgRNAs predicted to be efficient (dark pink) or not efficient (light pink) by the MLP classifier. Solid line indicates median, dotted lines indicate 25th and 75th percentiles. j) PASTE integration rates and indel formation for integration of ten therapeutically relevant payloads at the ACTB locus. k) Endogenous protein tagging with GFP via PASTE by in-frame endogenous gene tagging at four loci (ACTB, SRRM2, NOLC1, and LMNB1). Immunofluorescence images of representative cells are shown. Cells have nuclear DAPI staining and antibody staining of the labeled proteins to show correlation to the endogenous PASTE tagging signal. Data are mean (n= 3) ± s.e.m.

To build an explicit predictive model for designing atgRNAs for PASTE, we trained a classifier using a kmer-based multi-layer perceptron for modeling the effect of an atgRNA sequence on the final editing rate of AttB insertion. Feature optimization and model training had high accuracy (AUC = 0.84, Fig. 2g), and scoring of atgRNAs not seen by the model against LMNB1, NOLC1, and ACTB revealed clear differences in efficiency between guides nominated by the model and those rejected (Fig. 2h–i). Because our screening results have shown that rational design rules are difficult to generalize across gene targets, we release this prediction model as a guide design tool via a software package (https://github.com/abugoot-lab/pegRNA_rank) that takes a user’s target sequence as input and produces a list of atgRNAs rank-ordered by predicted efficiency score.
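As an illustration of what a k-mer representation of a guide sequence can look like, the sketch below turns a nucleotide string into a fixed-length count vector and ranks candidates by a score. It is not the released pegRNA_rank code: the choice of k, the function names, and the GC-fraction scorer are all hypothetical stand-ins, and in practice the score would come from the trained multi-layer perceptron.

```python
from collections import Counter
from itertools import product

def kmer_features(seq, k=3, alphabet="ACGT"):
    """Return a fixed-length vector of k-mer counts for a nucleotide sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]  # 4**k entries
    return [counts.get(kmer, 0) for kmer in vocab]

def rank_designs(designs, score_fn):
    """Sort candidate sequences by a scoring function, highest score first."""
    return sorted(designs, key=score_fn, reverse=True)

def gc_fraction(seq):
    """Placeholder scorer (assumption): a trained model would be used instead."""
    return sum(base in "GC" for base in seq) / len(seq)

vec = kmer_features("ACGTACGT", k=2)           # 16 dinucleotide counts
ranked = rank_designs(["AAAA", "GGCC", "ACGT"], gc_fraction)
print(len(vec), sum(vec), ranked)              # 16 7 ['GGCC', 'ACGT', 'AAAA']
```

Feeding `kmer_features` vectors and binary efficient/inefficient labels to any standard classifier would reproduce the general shape of the approach, though the features and architecture actually used may differ.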

The PE3 version of prime editing combines PE2 and an additional nicking guide to bias resolution of the flap intermediate towards insertion. To test the importance of nicking guide selection on PASTE editing, we tested integration at ACTB and LMNB1 loci with two nicking guide positions. Suboptimal nicking guide positions reduced PASTE efficiency up to 30% (Ext. Data Fig. 5a–b), in agreement with the 75% reduction of PASTE efficiency in the absence of nicking guide (Ext. Data Fig. 1d, 5c). We also found, as expected, that the atgRNA spacer sequence was necessary for PASTE integration, and substitution of the spacer sequence with a non-targeting guide eliminated editing (Ext. Data Fig. 5d).

PASTE integration with diverse payloads at endogenous sites

Because PASTE does not require locus homology on cargo plasmids, integration of diverse cargo sequences is modular and easily scaled across different loci. With PASTEv3, we tested a panel of 10 different gene cargos, consisting of the common therapeutic genes CEP290, HBB, PAH, GBA, ADA, SERPINA1, and the NYESO T-cell receptor, at the ACTB locus and a subset of these cargos at LMNB1. These cargos, which varied in size from 969 bp to 4,906 bp, had integration frequencies between 4% and 22% depending on the gene and insertion locus, with minimal indel formation (Fig. 2j and Ext. Data Fig. 5e). We next tested if PASTE could insert with base-pair resolution, useful for in-frame protein tagging or expressing cargo without disruption of endogenous gene expression. As BxbINT leaves residual sequences in the genome (termed AttL and AttR) after cargo integration, we hypothesized that these genomic scars could serve as protein linkers. We positioned the frame of the AttR sequence through strategic placement of the AttP on the minicircle cargo, achieving a suitable protein linker, GGLSGQPPRSPSSGSSG. Using this linker, we tagged four genes (ACTB, SRRM2, NOLC1, and LMNB1) with GFP using PASTEv1. To assess correct gene tagging, we compared the subcellular location of GFP with the tagged gene product by immunofluorescence. For all four targeted loci, GFP co-localized with the tagged gene product as expected, indicating successful tagging (Fig. 2k).

PASTE efficiencies exceed DSB-based insertion methods

To benchmark PASTE against other gene integration methods, we compared PASTEv3 to DSB-dependent gene integration using either NHEJ (i.e. HITI) or HDR [6,7] pathways (Fig. 3a). PASTE had better gene insertion efficiencies than HITI (Fig. 3b). As DSB generation can lead to insertions or deletions (indels) as an alternative and undesired editing outcome, we assessed the indel frequency by next-generation sequencing, finding significantly fewer indels generated with PASTEv3 than HITI in both HEK293FT and HepG2 cells (Fig. 3b and Ext. Data Fig. 6a), showcasing the high purity of gene integration outcomes with PASTE due to the lack of DSB formation. On a panel of 7 different endogenous targets, PASTEv3 exceeded HITI editing at 6 out of 7 genes, with similar efficiency for the 7th gene (Fig. 3c). We also compared PASTEv3 to previously validated HDR constructs at the N-terminus of ACTB and LMNB1 for EGFP tagging [42], finding that although PASTE had similar efficiency at the ACTB locus and lower efficiency at the LMNB1 locus, it generated significantly fewer indels than HDR (Fig. 3d). Notably, both HDR and HITI generate more indels than desired on-target integrations at the ACTB locus. To comprehensively profile PASTE outcomes, we analyzed all possible intermediate or alternative editing outcomes for PASTEv2 and PASTEv3 at the ACTB and LMNB1 loci, including presence of residual AttB sites and indels at either end of the integration junction. Residual AttB sites were a minority event, with an integration frequency into available AttB sites at ~70-75% (Fig. 3e), and testing the effect of these residual AttB sites via western blots showed they had no effect on protein expression (Ext. Data Fig. 6b). Additionally, we found no indel formation at the integration junctions (Fig. 3e).

Figure 3: Characterization of genome-wide PASTE specificity and purity of integration compared to other integration approaches.


a) Schematic of PASTE, homology-independent targeted integration (HITI), and homology-directed repair (HDR) gene integration approaches. b) Integration of a GFP template by PASTE at the ACTB and LMNB1 loci compared to HITI at the same target. Quantification is by ddPCR. Integration efficiency is compared to the rate of byproduct indel generation. c) GFP integration efficiency at a panel of genomic loci by PASTE compared to insertion rates by HITI. d) Integration of a GFP template by PASTE at the ACTB and LMNB1 loci compared to HDR at the same target. Quantification is by single-cell clone counting, since HDR homology precludes use of ddPCR. Integration efficiency is compared to the rate of byproduct indel generation. e) Analysis of all possible editing outcomes for PASTEv3 at the ACTB and LMNB1 sites. f) Schematic of next-generation sequencing method to assay genome-wide off-target integration sites by PASTE and HITI. g) Alignment of reads at the on-target ACTB site using our unbiased genome-wide integration assay, showing expected on-target PASTE integration outcomes. h) Manhattan plot of averaged integration events for multiple single-cell clones with PASTE editing. The on-target site is at the ACTB gene on chromosome 7 (labeled). Number of off-targets with greater than 0.1% integration is shown. i) Manhattan plot of averaged integration events for multiple single-cell clones with HITI editing. The on-target site is at the ACTB gene on chromosome 7 (labeled). Number of off-targets with greater than 0.1% integration is shown. Data are mean (n= 3) ± s.e.m.

Off-target characterization of PASTE and HITI integration

As off-target editing is a critical consideration for genome editing technologies, we explored the specificity of PASTE at specific sites through two hypotheses: 1) off-targets generated by BxbINT integration into pseudo-AttB sites in the human genome and 2) off-targets generated via guide- and Cas9-dependent editing in the human genome. While BxbINT lacks documented integration into the human genome at pseudo-attachment sites [43], we computationally identified potential sites with partial similarity to the natural BxbINT AttB core sequence. We tested for BxbINT integration by ddPCR across these sites and found no off-target activity (Ext. Data Fig. 6c–g). To assay Cas9 off-targets for our ACTB atgRNA, we identified two potential off-target sites via computational prediction and found no off-target integration for PASTE (Ext. Data Fig. 6c), but substantial off-target activity by HITI at one of the sites (Ext. Data Fig. 6d). While PASTE is shown to be specific for our targets, Cas9-based off-target analysis should be performed for each new PASTE target to ensure specificity.
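One simple way to nominate candidate pseudo-sites by partial sequence similarity is a mismatch-tolerant scan of both strands. The sketch below illustrates that general idea only; the method actually used here is not described, and the motif and genome strings are toy sequences, not the BxbINT AttB core or human sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_pseudo_sites(genome, motif, max_mismatches):
    """Return sorted (position, strand, mismatches) for windows close to the motif."""
    hits = []
    n = len(motif)
    for strand, query in (("+", motif), ("-", revcomp(motif))):
        for i in range(len(genome) - n + 1):
            d = hamming(genome[i:i + n], query)
            if d <= max_mismatches:
                hits.append((i, strand, d))
    return sorted(hits)

# Toy inputs (assumptions): one exact forward match, one 1-mismatch reverse match.
toy_genome = "AAGACCGTAAACGGACAA"
toy_motif = "GACCGT"
hits = find_pseudo_sites(toy_genome, toy_motif, max_mismatches=1)
print(hits)  # [(2, '+', 0), (10, '-', 1)]
```

A scan like this is quadratic in the worst case and only practical for short references; genome-scale searches would normally rely on an indexed aligner, and any nominated site still needs experimental testing such as the ddPCR assays described above.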

As computationally predicted sites may not account for all possible off-targets, we additionally evaluated genome-wide off-targets due to either Cas9 or BxbINT through tagging and PCR amplification of insert-genomic junctions (Fig. 3f). We isolated single cell clones for conditions with PASTEv3 integration and negative controls missing PE2, and deep sequencing of insert-genomic junctions from these clones showed all reads aligning to the on-target ACTB site, confirming no off-target genomic insertions (Fig. 3g–i and Ext. Data Fig. 6h). We also used this genome-wide pipeline to analyze HITI off-targets using the same ACTB guide and HITI EGFP insertion template and found substantial off-target activity across the genome, with 12 different sites identified across 10 chromosomes. Moreover, the on-target ACTB edit accounted for only 34.8% of the reads identified, with two off-target sites having higher efficiency. These results show that linear template-based integration approaches have significant off-target activity and highlight the benefits of using circular templates with a dual-nicking PASTE system.

Expression of reverse transcriptases and integrases involved in PASTE may have detrimental effects on cellular health. To determine the extent of these effects, we transfected the complete PASTEv1 system, the corresponding guides and cargo with only PE2, and the corresponding guides and cargo with only BxbINT, and compared them to both GFP control transfections and guides without protein expression via transcriptome-wide RNA sequencing. We found that, while BxbINT expression in the absence of prime editing had several significant off-targets, the complete PASTE system had only one differentially regulated gene with more than a 1.5-fold change (Ext. Data Fig. 6i–j). Genes upregulated by BxbINT overexpression included stress response genes, such as TENT5C and DDIT3, but these changes were not seen with expression of the PASTE system (Ext. Data Fig. 6i–j), potentially due to differences in expression when BxbINT is linked to the PASTE construct.

AttB engineering, gene replacement, and multiplexed PASTE

To optimize PASTE efficiency, we profiled attachment site mutants for optimization of integration kinetics of BxbINT, especially for shorter AttB sites that have reduced integration efficiency. Testing a panel of different AttP sequences previously shown to affect BxbINT integration [44,45], we found AttP sequence variants that substantially improved the integration rate (Ext. Data Fig. 7a–b). To further improve integration, we expanded our AttP mutagenesis using a pooled screen to evaluate over 5,775 AttP variants containing single and double mutations for enhanced integration activity (Fig. 4a and Ext. Data Fig. 7c), finding a mutant with improved integration activity over the WT AttP at both the ACTB and LMNB1 target sites with PASTEv3 (Fig. 4b).

Figure 4: Multiplexed and orthogonal gene insertion with PASTE.


a) Schematic for AttP mutagenesis screen for identifying AttP mutants that promote higher integration efficiencies with PASTE. b) Evaluation of two AttP variants from the pooled screen for PASTE integration activity at the ACTB and LMNB1 loci. c) AttB site replacement efficiency with the PASTE-Replace system at the LMNB1 locus. d) EGFP gene replacement efficiency with the PASTE-Replace system at the LMNB1 locus using payloads with either AttP mutant 1 or WT AttP. e) Schematic of multiplexed integration of different cargo sets at specific genomic loci. Three fluorescent cargos (GFP, mCherry, and YFP) are inserted orthogonally at three different loci (ACTB, LMNB1, NOLC1) for in-frame gene tagging. f) Orthogonality of top 4 AttB/AttP dinucleotide pairs evaluated for GFP integration with PASTE at the ACTB locus. g) Efficiency of multiplexed PASTE insertion of combinations of fluorophores at ACTB, LMNB1, and NOLC1 loci. Data are mean (n= 3) ± s.e.m.

Using the optimized AttP site mut 1, we tested whether it might be possible to replace a target sequence by combining integration of a transgene with simultaneous deletion, building on recent developments using prime editing for replacing genomic DNA with short sequences [46,47]. Using paired atgRNAs at the LMNB1 locus with a 38 bp AttB sequence and RTs that bridge to the other landing site, we replaced 130 bp and 385 bp of genomic sequence at a rate of 7-10% (Fig. 4c). Combining deletion with integration, we could insert the EGFP payload at ~8% integration efficiency to replace 130 bp of genomic sequence, with higher efficiencies for the AttP mutant 1 insertion template versus the WT AttP (Fig. 4d). This version of PASTE, which we termed PASTE-Replace, only requires two atgRNAs containing the PBS and AttB sequence, with an optional inclusion of RT to bridge the deletion. We further profiled PASTE-Replace at the ACTB, NOLC1, and CCR5 loci, finding that PASTEv3 could insert an EGFP payload at 21%, 25%, and 4.5% efficiency, respectively, using both single atgRNA/nicking guide combinations and dual guide RNAs (Ext. Data Fig. 7d–f). We also compared PASTE-Replace to the recently published paired guide integrase approach, TwinPE mediated knock-in [17], finding that PASTE-Replace had 8.3-, 42.1-, and 9-fold higher integration efficiencies at ACTB, NOLC1, and CCR5, respectively (Ext. Data Fig. 7d–f). Quantifying residual AttB placement, we found that improved efficiency of PASTE-Replace was not primarily driven by the efficiency of AttB integration, which did not have significant differences between the two approaches (Ext. Data Fig. 7e–g). Integrating performance across all three loci, the improved integration efficiency of PASTE-Replace was driven by a combination of the PASTEv3 construct and longer Bxb1 AttB and AttP lengths, which are more optimal for integrase efficiency (Ext. Data Fig. 7d–f).

The central dinucleotide of the AttP and AttB sites of BxbINT is intimately involved in the association of these attachment sites for integration41, and changing the matched central dinucleotide sequences can modify integrase activity and provide orthogonality for insertion of two genes48. We hypothesized that expanding the set of AttB/AttP dinucleotides could enable multiplexed gene insertion with PASTE, using orthogonal atgRNA combinations (Fig. 4e). To find optimal AttB/AttP dinucleotides for PASTE insertion, we profiled the efficiency of GFP integration at the ACTB locus with PASTE across all 16 dinucleotide AttB/AttP sequence pairs. We found several dinucleotides with integration efficiencies greater than the wild-type GT sequence (Ext. Data Fig. 7h). The majority of dinucleotides had integration efficiencies of at least 75% of the wild-type AttB/AttP efficiency, implying that these dinucleotides could serve as orthogonal channels for multiplexed gene insertion with PASTE.

Next, we explored the specificity of matched and unmatched AttB/AttP dinucleotide interactions. We comprehensively profiled the interactions between all dinucleotide combinations in a scalable fashion using a pooled assay to compare AttB/AttP integration (Ext. Data Fig. 7i). By barcoding 16 AttP dinucleotide plasmids with unique identifiers, co-transfecting this AttP pool with the BxbINT integrase expression vector and a single AttB dinucleotide acceptor plasmid, and sequencing the resulting integration products, we measured the relative integration efficiencies of all possible AttB/AttP pairs (Ext. Data Fig. 7j). We found that dinucleotide specificity varied wildly, with some dinucleotides (GG) exhibiting strong self-interaction with negligible crosstalk, and others (AA) showing minimal self-preference. Sequence logos of AttP preferences (Ext. Data Fig. 7k) reveal that dinucleotides with C or G in the first position have stronger preferences for AttB dinucleotide sequences with shared first bases, while other AttP dinucleotides, especially those with an A in the first position, have reduced specificity for the first AttB base.
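
The normalization behind a pooled readout of this kind can be sketched as follows. This is a minimal illustration and not the analysis code used in this study: the function names and the toy barcode counts are hypothetical, and it assumes that the relative efficiency for a given AttB acceptor is simply each AttP barcode's share of the integration reads.

```python
from itertools import product

# All 16 possible central dinucleotides (AA, AC, ..., TT).
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def relative_efficiencies(barcode_counts):
    """Convert integration read counts per AttP barcode into fractions.

    barcode_counts maps an AttP dinucleotide to its read count for one
    AttB acceptor (hypothetical input format, assumed for illustration).
    """
    total = sum(barcode_counts.get(d, 0) for d in DINUCLEOTIDES)
    if total == 0:
        return {d: 0.0 for d in DINUCLEOTIDES}
    return {d: barcode_counts.get(d, 0) / total for d in DINUCLEOTIDES}


def self_preference(attb_dinucleotide, barcode_counts):
    """Share of integration reads coming from the matched AttP dinucleotide."""
    return relative_efficiencies(barcode_counts)[attb_dinucleotide]


# Toy counts invented purely for illustration (not data from this study).
toy_counts = {"GG": 900, "GA": 60, "AG": 40}
print(self_preference("GG", toy_counts))
```

Under this assumed definition, a self-preference close to 1 corresponds to a dinucleotide with little crosstalk, while a value near 1/16 would indicate no preference for the matched AttP.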

Informed by the efficiency and specificity of the central dinucleotides, we tested GA, AG, AC, and CT dinucleotide atgRNAs for GFP integration at ACTB with PASTEv3, either paired with their corresponding AttP cargo or mispaired with the other three dinucleotide AttP sequences. We found that all four of the tested dinucleotides efficiently integrated cargo only when paired with the corresponding AttB/AttP pair, with no detectable integration across mispaired combinations (Fig. 4f).

Selecting the three top dinucleotide attachment site pairs (CT, AG, and GA), we designed atgRNAs that target ACTB (CT), LMNB1 (AG), and NOLC1 (GA) and corresponding minicircle cargo containing GFP (CT), mCherry (AG), and YFP (GA). Upon co-delivering these reagents to cells, we found that we could achieve single-plex, dual-plex, and tri-plex editing, as read out by bulk genomic DNA harvesting and ddPCR, of all possible combinations of these atgRNAs and cargo in the range of 5%-25% integration with PASTEv1 (Fig. 4g).

A useful application of multiplexed gene integration is labeling different proteins to visualize intracellular localization and interactions within the same cell. We used PASTEv1 to simultaneously tag ACTB (GFP) and NOLC1 (mCherry) or ACTB (GFP) and LMNB1 (mCherry) in the same cell. We observed no overlap of GFP and mCherry fluorescence and confirmed that the tagged proteins were visible in their appropriate cellular compartments, based on the known subcellular localizations of the ACTB, NOLC1 and LMNB1 protein products (Ext. Data Fig. 7l).

Programmable gene integration provides a modality for expression of therapeutic protein products, and we tested protein production of therapeutically relevant proteins Alpha-1 antitrypsin (encoded by SERPINA1) and Carbamoyl phosphate synthetase I (encoded by CPS1), involved in the diseases Alpha-1 antitrypsin deficiency and CPS1 deficiency, respectively. By tagging gene products with the luminescent protein subunit HiBiT49, we could independently assess transgene production and secretion in response to PASTE treatment (Ext. Data Fig. 8a). We transfected PASTEv1 with SERPINA1 or CPS1 cargo in HEK293FT cells and a human hepatocellular carcinoma cell line (HepG2) and found efficient integration at the ACTB locus (Ext. Data Fig. 8bc). This integration resulted in robust protein expression, intracellular accumulation of transgene products, and secretion of proteins into the media (Ext. Data Fig. 8dg).

Discovery and development of serine integrases for PASTE

As we found that integrase choice can have implications for integration activity (Fig. 1cd), we decided to mine bacterial and metagenomic sequences for new phage-associated serine integrases (Fig. 5a). Exploring over 10 TB worth of data from NCBI, JGI, and other sources, we found 25,614 integrases containing the putative catalytic residues (Fig. 5bc and Extended Data 3) and annotated their associated attachment sites by evaluating the presence of repetitive structures in potential 50 bp attachment sites near phage boundaries. Analysis of the integrase sequences led to the identification of five distinct clusters: INTa-e with diverse domain architectures (Fig. 5c). About 20% of the integrases (5203) derived from metagenomic sequences, presumably from prophages, and 4452 of these specifically derived from human microbiome metagenomic samples. An initial screen of integrase activity using a reporter system revealed that several integrases were active in HEK293FT cells, including multiple with more activity than BxbINT, a member of the INTd family (Fig. 5d). Using the predicted 50 bp sequences encoded in atgRNAs along with minicircles containing the complementary AttP sites, we found that these integrases were compatible with PASTE, but performed less effectively than BxbINTd-based PASTE (Ext. Data Fig. 8h). We hypothesized that this reduction in performance of the new integrases was due to their longer 50 bp AttB sequences, so we explored truncations of these AttBs in the hope of finding more minimal attachment sites. Truncation screening on integrase reporters revealed that AttB truncations of all the integrases, including as short as 34 bp, were still active and many had more activity than BxbINTd (Ext. Data Fig. 8i). Upon porting these new shorter AttBs to atgRNAs for PASTE, we found that a number of integrases had more activity in the PASTE system than BxbINTd-based PASTE at the ACTB locus, including the integrase from B. cereus (BceINTa), an integrase from a stool sample from China (SscINTd), and an integrase from a stool sample from an adult in China (SacINTd), while others like the integrases from B. cytotoxicus (BcyINTd) and S. lugdunensis (SluINTd) did not (Fig. 5ef). Additionally, we computationally nominated a set of integrases with shorter AttB sites of 30 nt and tested them in PASTE, finding that several candidates, such as Sss2INTd and SscINTd, functioned as a complete PASTE system. To improve PASTE with our new integrases, we fused BceINTa to SpCas9-MLV-RTL139P, termed PASTEv4, and found that it performed better than BxbINTd-based PASTE across a number of endogenous gene loci (Fig. 5g).

Figure 5: Discovery of phage-derived integrases for programmable gene integration with PASTE.

a) Schematic of integrase discovery pipeline from bacterial and metagenomic sequences. b) Phylogenetic tree of discovered integrases showing distinct subfamilies. Synthesized orthologs are shown as orange dots. c) Domain architecture of the five integrase sub-families. RES, resolvase (cd00338); REC, recombinase (PF07508); ZR, zinc ribbon (PF13408); DF, unknown domain (DUF4368), SMRES, resolvase (smart00857). d) Screening integrase integration activity using reporters in HEK293FT cells compared to BxbINT and phiC31. e) PASTE integration activity with BceINT and BcyINT with truncated attachment sites compared to BxbINT. f) PASTE integration activity with SscINT and SacINT with truncated attachment sites compared to BxbINT. g) Integration of EGFP at different endogenous gene targets for PASTE with either BceINT or BxbINT. Data are mean (n= 3) ± s.e.m.

PASTE efficiency in non-dividing and primary cells

As PASTE does not rely on DSB repair pathways that are only active in dividing cells, we tested PASTE activity in non-dividing cells by transfecting either Cas9 and HDR templates or PASTE into HEK293FT cells and arresting cell division50 via aphidicolin treatment (Ext. Data Fig. 9a). In this model of blocked cell division, we found that PASTEv1 maintained GFP gene integration activity greater than 20% at the ACTB locus whereas HDR-mediated integration was abolished (Ext. Data Fig. 9bc). To evaluate the size limits for therapeutic transgenes, we tested insertion of cargos up to 13.3 kb in length in both dividing and aphidicolin-treated cells, and found insertion efficiency greater than 10% (Ext. Data Fig. 9d). Because large inserts may be delivered to cells less efficiently, we tested larger amounts of insert DNA and found that this could significantly improve gene integration efficiency (Ext. Data Fig. 9e).

We also expanded PASTE editing to additional cell types, testing PASTE in the K562 lymphoblast line, primary human T cells, and primary human hepatocytes. We found that PASTEv1 had ~15% gene integration activity in K562 cells and around 5% efficiency in primary human T cells (Fig. 6a and Ext. Data Fig. 9f). In addition, in non-dividing quiescent human primary hepatocytes, we found that PASTEv1 was capable of ~5% gene integration at the ACTB locus (Fig. 6b) after sorting for transfected cells, consistent with the non-dividing activity we observed with the aphidicolin-treated HEK293FT cells.

Figure 6: PASTE is compatible with multiple delivery approaches and can be delivered to primary cell types and in vivo animal models.

a) PASTE integration efficiency with single vector designs in primary human T cells. Data are mean (n= 3) ± s.e.m. b) PASTE integration efficiency with single vector designs in primary human hepatocytes. Data are mean (n= 3) ± s.e.m. c) Schematic of the adenoviral constructs used to deliver PASTE and the EGFP payload template. d) AdV delivery of all PASTE components in HEK293FT and HepG2 cells. Data are mean (n= 3) ± s.e.m. e) Integration efficiency of AdV delivery of integrase, guides, and cargo in primary human hepatocytes (PXB-cells®). Viral components were listed at dosages indicated. (n= 1). f) Adenoviral EGFP template integration efficiency at the human ACTB locus in the liver of a liver-humanized mouse model using adenovirally delivered PASTE. Integration efficiency is measured 4 weeks post-injection. For integration conditions, points represent different regions of the liver analyzed for editing. At the top is shown a schematic for in vivo targeted gene integration with PASTE via retroorbital injection. Data are mean (n = 8).

Viral therapeutic payload delivery with PASTE

To explore compatibility of PASTEv3 with therapeutically relevant delivery modalities, we tested whether components of the PASTE system could be delivered with either adeno-associated viral (AAV) or adenoviral (AdV) vectors. Testing AAV-delivered cargo with an AttP-containing payload in conjunction with other PASTE components delivered via transfection, we found ~4-10% integration of the viral payload in a dose-dependent fashion (Ext. Data Fig. 9gi). The AAV genome serving as a suitable template for serine integrase-mediated insertion is consistent with reports of AAV genome circularization in cells51.

In order to package larger cargos in viral vectors, we utilized an AdV vector, an emerging approach for clinical delivery of large genes52. We evaluated whether adenovirus could deliver a suitable template for BxbINT-mediated insertion along with plasmids for PASTEv3 and guide expression, or AdV delivery of guides and BxbINT with plasmid delivery of SpCas9-RT, finding that we could achieve 10-20% integration of the ~36 kb adenovirus genome carrying EGFP in HEK293FT and HepG2 cells (Ext. Data Fig. 9j).

To further demonstrate PASTE would be amenable for in vivo delivery, we developed an mRNA version of the PASTEv1 protein components as well as chemically-modified synthetic atgRNA and nicking guide against the LMNB1 target. Electroporation of the mRNA and guides along with delivery of the template via adenovirus or plasmid yielded high efficiency integration up to ~20% (Ext. Data Fig. 9km). As we hypothesized more sustained BxbINT expression would allow for integration into newly placed AttB sites in the genome, we tested circular mRNA expression53 and found that this boosted the efficiency of integration to ~30% (Ext. Data Fig. 10a).

To package the complete PASTE system in viral vectors, we devised a vector strategy to package the PASTEv1 components across two additional adenoviral vectors, allowing the cargo and PASTEv1 system to be delivered across three AdV vectors (Fig 6c). We found that the complete PASTE system (Cas9-reverse transcriptase, integrase and guide RNAs, and cargo) could be substituted by adenoviral delivery, with integration of up to ~50-60% with viral-only delivery in HEK293FT and HepG2 cells (Fig. 6d and Ext. Data Fig. 10b). As an evaluation of therapeutic feasibility of adenovirally-delivered PASTE, we tested complete AdV delivery at three different cargo amounts in primary human hepatocytes (PXB-cells®), finding editing efficiencies up to 10.5% in a cargo-dependent fashion using an NGS-based integration analysis, with up to 3.8% integration using an AAV template (Fig. 6e and Ext. Data Fig. 10cd).

In vivo delivery of PASTE for liver gene integration

We next applied adenoviral PASTE delivery for in vivo targeting of the liver. As our adenoviral PASTE components were designed to target the human ACTB locus, we performed experiments in ~5.5-month-old, liver-humanized FRG mice (Fah−/−Rag2−/−Il2rg−/− on C57BL/6 with ≥70% human hepatocyte repopulation54). Mice were retro-orbitally injected with the triple-vector PASTE cocktail and maintained for 3 weeks prior to liver harvesting and NGS-based integration analysis. We found that PASTE was capable of integration rates as high as 2.5% in the human hepatocytes in the chimeric liver, with indel byproduct formation at the ACTB locus of 0.1%-0.2% (Fig. 6f and Ext. Data Fig. 10ei).

Discussion

We develop PASTE by engineering of Cas9, reverse transcriptase, and integrase linkers to create a fusion protein capable of efficient integration (5-50%) of diverse cargos at precisely defined target locations within the human genome with small, stereotyped scars that can serve as protein linkers. We demonstrate the versatility of PASTE for gene tagging, gene replacement, gene delivery, and protein production and secretion. Through extensive characterization of integrase attachment sites, we engineer multiplexed gene integration with PASTE, enabling applications such as the specific fusion of three different endogenous genes with three different fluorescent cargos. Overall, we show PASTE insertions at 9 different endogenous sites with 13 different cargos ranging in size from 779 bp to ~36,000 bp, which would enable insertion of greater than 99.7% of human cDNAs as transgenes55. We additionally benchmark PASTE against other prime-editing and integrase-based insertion approaches17, finding significant improvements, driven by a combination of more optimized AttB and AttP sequences and the fusion-based design of the PASTEv3 editor. In agreement with previous studies of serine integrases and prime editing, we find no off-target activity with PASTE.

Metagenomic mining enabled the discovery of thousands of putative integrase/attachment site combinations, and we engineered multiple integrase orthologs with improved activity and reduced attachment site requirements to further optimize the activity of PASTE, generating a PASTEv4 system using the BceINT integrase. We anticipate that the compendium of 25,614 serine recombinases we discover and characterize will be useful for additional PASTE and synthetic biology applications, although more work is needed to fully characterize the activity of these integrases and any natural pseudo-sites of integration in the human genome that might serve as off-target sites. Moreover, in contrast to transposase-based integration systems56,57, PASTE integration is stereotyped, allowing for precise design of integration and predictable gene fusions. As PASTE does not rely on HDR, it can function in non-dividing cells, including in primary hepatocytes and T-cells, and we demonstrate human hepatocyte editing in vivo via adenoviral delivery to liver-humanized mouse models. In addition, as delivery conditions were not optimized and adenovirus can be hepatotoxic, we anticipate in vivo activity can be substantially improved.

Programmable insertion is a fundamental tool for genetics, for applications such as tagging of gene products, interrogating variants of unknown function, and developing disease models. PASTE also enables therapeutic correction of genetic disease through insertion of full length, functional genes at native loci, a viable strategy for both treating recessive loss of function mutations that cover 4,122 genetic diseases58 and overcoming dominant negative mutations. Current genome editing approaches for diseases such as cystic fibrosis or Leber’s congenital amaurosis59 are limited, as systems must be tailored for specific mutations60,61, requiring unique genome editing therapies for each subset of the patient population. Programmable insertion of the wild-type gene at the endogenous location could address most potential patient mutations, serving as a blanket therapy. Beyond direct correction of hereditary disease, gene insertion provides a promising avenue for cell therapies, and efficient integration of engineered transgenes, such as chimeric antigen receptors, at specific loci can produce improved therapeutic products in comparison to random integration62.

The development of PASTE marries engineering of CRISPR nucleases with the discovery and mammalian characterization of a variety of serine integrases with diverse sequence preferences. By providing efficient, multiplexed integration of transgenes in dividing and non-dividing cells and in animal models, the PASTE platform builds upon fundamental developments in both integrase and CRISPR biology to expand the scope of genome editing and enable new applications across basic biology and therapeutics.

Methods

Cloning of atgRNAs, nicking guides.

atgRNA and nicking guides were cloned by Golden Gate assembly of PCR products. Guide products were amplified by PCR (KAPA HiFi HotStart DNA Polymerase, Roche) off of the Cas9 sgRNA scaffold, with the forward primer containing spacer sequences and, in the case of the atgRNA, the reverse primer containing the desired PBS, RT and AttB insertion sequences. PCR products were purified by gel extraction (Monarch gel extraction kit, NEB), and assembled in a Golden Gate assembly containing 6.25 ng pU6-atgRNA-GG-acceptor (Addgene #132777), purified PCR product (approximately 2-to-4-fold molar excess), 0.125 μL Fermentas Eco31I (Thermo Fisher Scientific), 0.0625 μL T7 DNA ligase (Enzymatics), 0.0625 μL of 20 mg/mL Bovine Serum Albumin (NEB), 2x Reaction Ligation Buffer (Enzymatics) and water for 6.25 μL total reaction volume. Reactions were cycled between 37°C and 20°C for 5 minutes each for a total of 15 cycles. 2 μL of assembled reactions were transformed into 20 μL of competent Stbl3 cells generated with the Mix and Go! competency kit (Zymo) and plated on agar plates supplemented with appropriate antibiotics. After overnight growth at 37°C, colonies were picked into Terrific Broth (TB) media (Thermo Fisher Scientific) and incubated with shaking at 37°C for 24 hours. Cultures were harvested using QIAprep Spin Miniprep Kit (Qiagen) according to the manufacturer’s instructions. All guides used in experiments are summarized in supplementary table 4 and all AttB sequences used in the paper are listed in supplementary table 5.

Cloning of PASTE and cargo constructs.

Expression constructs for Cas9-RT fusions, RT mutants, integrases and recombinases, and Bxb1 mutants were cloned for mammalian expression via Gibson cloning using HiFi Assembly mix (NEB) according to the manufacturer’s instructions. All enzyme expression plasmids used in mammalian experiments are summarized in supplementary table 6. Sequences of linkers used are listed in supplementary table 2, and sequences of Bxb1 and RT mutants are listed in supplementary table 3. For cloning of minicircle cargo plasmids, the Bxb1 or equivalent integrase/recombinase AttP sites and the cargo sequence were introduced into a minicircle parental plasmid with Gibson cloning using HiFi Assembly mix (NEB) according to the manufacturer’s instructions. The parental plasmid was digested so that the sequences could be cloned between the bacterial attB and attP sites recognized by the ZYCY10P3S2T E. coli Minicircle Strain (System Biosciences). All transgene and cargo plasmids used in experiments are summarized in supplementary table 7.

For all Gibson clonings, 2 μL of assembled reactions were transformed into 20 μL of competent Stbl3 generated by Mix and Go! competency kit (Zymo) and plated on agar plates supplemented with appropriate antibiotics. After growth overnight at 37°C, colonies were picked into Terrific Broth (TB) media (Thermo Fisher Scientific) and incubated with shaking at 37°C for 24 hours. Cultures were harvested using QIAprep Spin Miniprep Kit (Qiagen) according to the manufacturer’s instructions.

For screening integrases discovered computationally, gene fragments were synthesized by Twist Biosciences. These genes were then cloned into separate expression vectors for comparing activity on reporters in mammalian cells and top integrases were cloned into PASTE vectors fused to SpCas9-RT constructs.

Minicircle production.

To produce the minicircle plasmids containing only the integrase AttP site and the transgene sequence, the parental plasmid was transformed into the ZYCY10P3S2T E. coli Minicircle Strain (System Biosciences, catalog #: MN900A-1) and grown overnight at 37°C. The next day a colony was picked into TB media containing kanamycin and grown for approximately 12 hours at 32°C in an incubator shaker. When the OD600 reached 4-6, the induction media was added to the sample in a 1:1 ratio. For the preparation of 100.5 mL induction media, 100 mL of Lysogeny Broth media (Thermo Fisher Scientific) were mixed with 400 μL of 10 M Sodium Hydroxide Solution (Sigma Aldrich) and 100 μL of 20% L-Arabinose (Sigma Aldrich). The induced bacterial culture was then incubated at 32°C in the shaker for 4-5 hours. After spinning down at 5000 x g for 15 minutes, the media was removed, leaving only the cell pellet at the bottom of the tube. Plasmid DNA was purified with an endotoxin-free midiprep kit (NucleoBond Xtra Midi EF, Takara Bio) following the manufacturer’s protocol. Minicircle production was then confirmed by restriction enzyme digestion and subsequent gel electrophoresis, which allowed the minicircle and parental plasmid fractions in the purified DNA to be distinguished.
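
The composition of the induction media can be checked with a simple dilution calculation, sketched below. This is an illustrative calculation only: the helper name is hypothetical, and the final step assumes that the 1:1 addition to the culture halves each concentration and that the culture itself contributes no sodium hydroxide or arabinose.

```python
def final_concentration(stock_conc, stock_volume_ml, total_volume_ml):
    """Dilution relation: C_final = C_stock * V_stock / V_total."""
    return stock_conc * stock_volume_ml / total_volume_ml


LB_ML = 100.0         # Lysogeny Broth
NAOH_ML = 0.4         # 400 uL of 10 M sodium hydroxide
ARABINOSE_ML = 0.1    # 100 uL of 20% L-arabinose

total_ml = LB_ML + NAOH_ML + ARABINOSE_ML  # 100.5 mL of induction media

# Concentrations in the induction media (10 M expressed as 10,000 mM).
naoh_mM = final_concentration(10_000.0, NAOH_ML, total_ml)
arabinose_pct = final_concentration(20.0, ARABINOSE_ML, total_ml)

# Assumption: 1:1 mixing with the culture dilutes both components two-fold.
naoh_in_culture_mM = naoh_mM / 2
arabinose_in_culture_pct = arabinose_pct / 2

print(f"{total_ml:.1f} mL, {naoh_mM:.1f} mM NaOH, {arabinose_pct:.4f}% arabinose")
```

Under these assumptions the induction media contains roughly 40 mM sodium hydroxide and 0.02% L-arabinose before it is mixed with the culture.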

Mammalian cell culture.

HEK293FT cells (Thermo Fisher R70007) were cultured in Dulbecco’s Modified Eagle Medium with high glucose, sodium pyruvate, and GlutaMAX (Thermo Fisher Scientific), additionally supplemented with 10% (v/v) fetal bovine serum (FBS) and 1× penicillin-streptomycin (Thermo Fisher Scientific). For puromycin selection, HEK293FT cells were replated at a 1:3 dilution one day post-transfection into media supplemented with 1 μg/mL final concentration puromycin (Thermo Fisher Scientific). HepG2 cells (American Type Culture Collection (ATCC) - HB8065) were seeded in Eagle’s Minimum Essential Medium (Thermo Fisher Scientific), additionally supplemented with 10% (v/v) FBS, at 37°C and 5% CO2. Adherent cells were maintained at confluency below 80-90% at 37°C and 5% CO2. K562 cells (American Type Culture Collection (ATCC) - CCL-243) were cultured in Gibco Roswell Park Memorial Institute 1640 Medium (Thermo Fisher Scientific), additionally supplemented with 10% (v/v) FBS, and maintained at 37°C and 5% CO2. Primary human peripheral blood CD8+ T cells (Stemcell Technologies - #70027) were expanded using fresh complete ImmunoCult-XF T Cell Expansion Medium (Stemcell Technologies - #10981) additionally supplemented with cytokines (Human Recombinant IL-2; Stemcell Technologies - #78036). To stimulate the cells, 25 μL/mL of ImmunoCult Human CD3/CD28/CD2 T Cell Activator (Stemcell Technologies - #10970) were used. Primary human hepatocytes pooled from 5 donors (Thermo Fisher Scientific #HMCPP5) were plated on collagen-coated 96-well plates and transfected 24 hours post-plating. 96-well plates were coated using Collagen I, Rat Tail (Thermo Fisher Scientific #A10483-01). Stock Collagen I was diluted to 50 μg/mL with 20 mM acetic acid (# A6283) and added to plates at 5 μg/cm2. Plates were incubated at room temperature for 1 hour then rinsed three times with sterile 1X PBS.
Thawed hepatocytes were transferred into Hepatocyte Thaw Medium (Thermo Fisher Scientific #CM7500) and centrifuged at 100 x g for 10 minutes at room temperature. Pelleted cells were resuspended and plated at 2.5e4 using William’s E Medium (Thermo Fisher Scientific #A1217601) supplemented with Primary Hepatocyte Thawing and Plating Supplements (Thermo Fisher Scientific #CM3000). Initial media change occurred 6 hours post-plating, with subsequent media changes occurring every 24 hours using William’s E Medium supplemented with Primary Hepatocyte Maintenance Supplements (Thermo Fisher Scientific #CM4000). For PhoenixBio primary human hepatocyte experiments, we obtained live human hepatocyte cultures (PXB-cells®) from the provider in 96-well plates in DMEM without FBS. Upon arrival, hepatocytes were switched to hepatocyte growth medium (dHCGM) (PhoenixBio) and maintained at 37°C. The culture media was refreshed every 3-4 days.

Transfection.

Cells were plated at 5-15K the day prior to transfection in a 96-well plate coated with poly-D-lysine (BD Biocoat). HEK293FT and HepG2 cells were transfected with Lipofectamine 2000 (Thermo Fisher Scientific) and GenJet HepG2 reagent (SignaGen, SL100489-HEPG2), respectively, according to the manufacturer’s specifications. For PASTE insertions, 100 ng atgRNA guide-encoding plasmid, 250 ng cargo plasmid, 50 ng nicking guide-encoding plasmid, and 375 ng SpCas9-RT-P2A-Bxb1 complex-encoding plasmid were delivered to each well unless otherwise specified. For HITI insertions, 100 ng guide-encoding plasmid, 250 ng cargo plasmid, and 75 ng SpCas9 plasmid were delivered to cells. For HDR insertion of a large EGFP cargo, 100 ng guide-encoding plasmid, 200 ng SpCas9 plasmid, and 250 ng insertion template plasmid were delivered to cells. Cells were replated 72 hours later via limiting dilution in order to isolate clonal outgrowth in a 96-well plate for quantification of fluorescent colonies compared to PASTE. For HDR gene editing at the EMX1 locus for non-dividing cell experiments, 300 ng of a single vector encoding the guide RNA, SpCas9, and HDR editing template were transfected and cells were harvested 72 hours later for analysis by next-generation sequencing. For PASTE experiments with hepatocytes, plasmids were transfected with standard Lipofectamine 3000 protocols with 400 ng of total plasmid. For Twin-PE knock-in experiments, transfection was performed as previously described17 with Lipofectamine 2000 for NOLC1 targeting or Lipofectamine 3000 for ACTB and CCR5.

Plasmid Electroporation.

K562 and primary T cells were electroporated using a Lonza 4D-Nucleofector device (Lonza). The SF Cell Line 4D X Kit S (Lonza) was used for K562s and the P3 Primary Cell 4D Kit (Lonza) was used for the unstimulated primary T cells. Approximately 1.5e6 K562 cells were electroporated in a final volume of 20 μL in a 16-well nucleocuvette strip (Lonza). For the T-cell experiments, 7.25e6 primary T cells were electroporated in a final volume of 100 μL in a cuvette.

For the single vector and two vector PASTE systems delivered to K562 cells, 900 ng of prime-Bxb1 complex-encoding plasmid or 800 ng of prime-encoding plasmid and 100 ng of Bxb1-encoding plasmid were electroporated, respectively. For both systems, 250 ng cargo plasmid, 200 ng atgRNA guide-encoding plasmid and 80 ng RNA nicking guide-encoding plasmid were added.

For T-cell electroporations, 990 ng of a guide vector expressing both the atgRNA and nicking guide, 875 ng of the EGFP-containing minicircle plasmid, and 3,150 ng of the PASTE plasmid (SpCas9-RT-P2A-Bxb1) were electroporated.

Electroporations were performed according to the manufacturer’s protocol and after 72 hours the cells were harvested for genomic DNA isolation and digital droplet PCR quantification.

Cloning of atgRNA efficiency screen library.

atgRNA library members were computationally designed to cover corresponding ranges of PBS, RT, and AttB length. Each library member was also paired with a unique barcode, as well as additional padding sequence after the poly-T transcriptional terminator to maintain consistent oligo length. For each of the 12 spacer sequences in the library, corresponding library members were flanked by spacer-specific subpooling binding regions. The 10,580-member library was synthesized as a pool by Twist Biosciences and PCR amplified to generate 12 subpools. Each subpool was Golden Gate cloned into a corresponding backbone containing both the spacer sequence and a 200-300 nucleotide region for targeting. Each library was independently electroporated into Endura electrocompetent cells (Lucigen), plated on agar bioassay plates, and harvested the next day for plasmid purification.

Pooled Screening of atgRNA efficiency.

The complete library was co-transfected with psPAX2 and pMD2.G with Lipofectamine 3000 (Thermo Fisher Scientific) to produce lentivirus for atgRNA library testing. Two days post-transfection, supernatant containing virus was harvested, filtered using 0.45 μm syringe filters, and titered via spinfection, puromycin selection, and CellTiter-Glo viability readout (Promega). After titering, the atgRNA viral library was used to infect 80M HEK293FT cells at a 0.3 MOI to ensure single integration. Post-spinfection, cells were selected for 2 days with puromycin, then allowed to expand and recover without drug for an additional two days, and then were transfected with either PASTE constructs or Bxb1 integrase controls. Three days after transfection, cells were harvested with the Quick gDNA midi kit (Zymo), and the corresponding library region was prepared for sequencing via PCR amplification. Prepared libraries were paired-end sequenced on an Illumina NextSeq 500.
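
The rationale for the low MOI can be illustrated with a short calculation. The sketch below assumes that the number of integrations per cell follows a Poisson distribution with mean equal to the MOI, which is a common modeling assumption and not something measured here; the variable names are hypothetical, and the coverage estimate ignores any cell loss during selection and expansion.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean**k / math.factorial(k)


MOI = 0.3                  # multiplicity of infection used in the screen
N_CELLS = 80_000_000       # cells infected
LIBRARY_SIZE = 10_580      # atgRNA library members

frac_infected = 1 - poisson_pmf(0, MOI)
# Among infected cells, the fraction carrying exactly one integration.
frac_single_among_infected = poisson_pmf(1, MOI) / frac_infected
# Approximate number of infected cells per library member.
cells_per_member = N_CELLS * frac_infected / LIBRARY_SIZE

print(f"infected: {frac_infected:.3f}")
print(f"single integrant among infected: {frac_single_among_infected:.3f}")
print(f"cells per library member: {cells_per_member:.0f}")
```

Under this model about a quarter of the cells are infected and most infected cells carry a single atgRNA, while the library is still represented by well over a thousand cells per member.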

Computational analysis of the pooled atgRNA screen.

Forward reads were trimmed to the corresponding barcode region to extract barcode sequences. Extracted barcodes were paired with corresponding targeted regions in the reverse read, which were trimmed to the region within 20 nucleotides of the putative AttB region. To test for the presence or absence of editing, the region corresponding to the editing target was aligned to either the AttB-insertion outcome or the wildtype outcome, with reads aligning closer to the AttB-insertion outcome being ranked as edited. Editing frequency was then taken as the ratio of edited to total reads for each barcode, with a pseudocount adjustment of 1.
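
The classification and per-barcode frequency calculation described above can be sketched as follows. This is a simplified illustration and not the pipeline used here: a mismatch count stands in for the alignment step, the function names and input format are hypothetical, and the pseudocount is assumed to be added to both the edited and unedited read counts, since the exact placement is not specified.

```python
from collections import defaultdict


def mismatch_distance(a, b):
    """Position-wise mismatches plus any length difference (alignment stand-in)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def is_edited(read, edited_ref, wildtype_ref):
    """Call a read edited if it is closer to the AttB-insertion outcome."""
    return mismatch_distance(read, edited_ref) < mismatch_distance(read, wildtype_ref)


def editing_frequency(n_edited, n_total, pseudocount=1):
    """Edited fraction with a pseudocount added to each class (assumption)."""
    return (n_edited + pseudocount) / (n_total + 2 * pseudocount)


def frequencies_by_barcode(pairs, edited_ref, wildtype_ref):
    """pairs: iterable of (barcode, target_region_read) tuples."""
    edited = defaultdict(int)
    total = defaultdict(int)
    for barcode, read in pairs:
        total[barcode] += 1
        if is_edited(read, edited_ref, wildtype_ref):
            edited[barcode] += 1
    return {bc: editing_frequency(edited[bc], total[bc]) for bc in total}


# Toy reads invented for illustration only.
toy_pairs = [("BC1", "AAGG"), ("BC1", "AATT"), ("BC1", "AAGG"), ("BC2", "AATT")]
print(frequencies_by_barcode(toy_pairs, edited_ref="AAGG", wildtype_ref="AATT"))
```

The pseudocount keeps barcodes with very few reads from being assigned frequencies of exactly 0 or 1.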

Multilayer perceptron modelling of atgRNA efficiency

Three different sequence-to-function models were considered for accurate prediction of atgRNA efficiency: simple linear/logistic regression, a random forest classifier, and a multilayer perceptron (MLP) classifier. After the initial round of screening, we found that the MLP classifier performed the best, and decided to move forward with a two-hidden-layer MLP model built in PyTorch. After initial optimization, the MLP classifier contains an input layer of 125 neurons, a first hidden layer of 512 neurons, a second hidden layer of 10 neurons, and an output layer of 2 neurons. ReLU is used as the activation function and a dropout rate of 0.1 is applied for each layer. The output layer is transformed by a softmax function to predict the probability of each class. To represent an atgRNA as a vector, we considered a simple one-hot vector or a k-mer breakdown of the atgRNA. We varied the k-mer length from 1 to 7 and found that breaking the atgRNA into short 3-nucleotide sequences (3-mers) was the most effective for training the MLP. atgRNA sequences shorter than 198 nt are padded with “N” so that the input matrix has a uniform size. During the training of the MLP model, we varied the Adam optimizer’s learning rate from 0.0001 to 0.01, batch size from 30 to 100, and epoch number from 10 to 100. We minimized the validation loss in 5-fold cross-validation with cross entropy as the loss function and chose a learning rate of 0.001, 50 epochs, and a batch size of 64 as the final training hyperparameters. ROC AUC analysis was carried out using sklearn’s roc_auc function. Code to predict atgRNA efficiency and corresponding setup instructions are available at the following GitHub repository: https://github.com/abugoot-lab/atgRNA_rank
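
One way a padded atgRNA could be turned into a 125-element input is sketched below. This is an assumption made for illustration and may differ from the released code: it treats the input as counts of overlapping 3-mers over the five-letter alphabet A, C, G, T, N (5^3 = 125 possible 3-mers), and all names in the block are hypothetical.

```python
from itertools import product

ALPHABET = "ACGTN"   # four bases plus the padding symbol
K = 3                # 3-mer breakdown
MAX_LEN = 198        # sequences are padded to this length

# Map each of the 5**3 = 125 possible 3-mers to a vector position.
KMER_INDEX = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=K))}


def pad_sequence(seq, length=MAX_LEN, pad_char="N"):
    """Upper-case the sequence, write U as T, and right-pad with N."""
    seq = seq.upper().replace("U", "T")
    if len(seq) > length:
        raise ValueError("sequence is longer than the padded length")
    return seq + pad_char * (length - len(seq))


def kmer_count_vector(seq):
    """Count overlapping 3-mers in the padded sequence (assumed encoding)."""
    padded = pad_sequence(seq)
    vector = [0] * len(KMER_INDEX)
    for i in range(len(padded) - K + 1):
        vector[KMER_INDEX[padded[i:i + K]]] += 1
    return vector


features = kmer_count_vector("ACG")
print(len(features), sum(features))
```

A vector of this shape could then be passed to a 125-512-10-2 network of the kind described above; whether the original model uses counts, overlapping windows, or a different layout is not stated in the text.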

mRNA and synthetic guides electroporation.

Before in vitro transcription, the DNA template was linearized by FastDigest MssI restriction enzyme (Thermo Fisher) and purified by QIAprep 2.0 Spin Miniprep Columns (Qiagen). PASTE mRNA (SpCas9-RT-P2A-Bxb1) and Bxb1 mRNA (NLS-Bxb1) were transcribed and poly-A tailed using the HiScribe T7 ARCA mRNA Kit (NEB, E2065S) with 50% supplement of 5-Methyl-CTP and Pseudo-UTP (Jena Biosciences), following the manufacturer’s protocol. The mRNA was then purified using the MEGAclear Transcription Clean-Up Kit (Thermo Fisher, AM1908). For circularized Bxb1 mRNA, in vitro transcription was conducted using the HiScribe T7 ARCA mRNA Kit without modified nucleotides or the poly-A tailing step. mRNA was subsequently circularized as previously reported53 and cleaned up again using the MEGAclear Transcription Clean-Up Kit. Additional full length PASTE and Bxb1 mRNAs were prepared by Trilink with CleanCap or ’OMe CleanCap AG modifications and were fully substituted with N1-methylpseudouridine. Chemically modified synthetic atgRNAs and nicking guides (Integrated DNA Technologies and Synthego Corporation) were provided by the corresponding parties. HEK293FT cells were electroporated using a Lonza 4D-Nucleofector device and the SF Cell Line 4D-Nucleofector™ X Kit S (Lonza). For each sample, 4000 ng PASTE mRNA and 1000 ng Bxb1 mRNA were mixed with the designated amount of guide RNAs in a total volume of 15 μL SF buffer solution. Cells (2.0e5 per sample) were spun down at 100×g for 10 minutes, resuspended in 5 μL SF buffer solution, and added to the 15 μL RNA solution. The 20 μL mixture was placed in one well of the cuvette strip and subjected to electroporation using the CM-130 program. Electroporated cells were resuspended in their culture medium and incubated at 37°C and 5% CO2 for 72 hours before analysis.

Genomic DNA extraction and purification.

DNA was harvested from transfected cells by removal of media, resuspension in 50 μL of QuickExtract (Lucigen), and incubation at 65 °C for 15 min, 68 °C for 15 min, and 98 °C for 10 min. After thermocycling, lysates were purified via addition of 45 μL of AMPure magnetic beads (Beckman Coulter), mixing, and two 75% ethanol wash steps. After purification, genomic DNA was eluted in 25 μL water.

Genome editing quantification by digital droplet polymerase chain reaction (ddPCR).

To quantify PASTE and HITI editing efficiency by digital droplet PCR, 24 μL solutions were prepared in a 96-well plate containing 1) 12 μL 2x ddPCR Supermix for Probes (Bio-Rad), 2) primers for amplification of the integration junction at 250 nM-900 nM, 3) FAM probe for detection of the integration junction amplicon at 250 nM, 4) 1.44 μL RPP30 HEX reference mix (Bio-Rad), 5) 0.12 μL FastDigest restriction enzyme for degradation of primer off-targets (Thermo Fisher), and 6) sample DNA at 1-10 ng/μL. All primers and probes used for ddPCR are listed in supplementary table 8. 20 μL of reaction mix was transferred to a DG8 Cartridge (Bio-Rad) and loaded into a QX200 droplet generator (Bio-Rad). 40 μL of droplets suspended in ddPCR droplet reader oil were transferred to a new 96-well plate and thermocycled according to the manufacturer’s specifications. Lastly, the 96-well plate was transferred to a QX200 droplet reader (Bio-Rad) and the generated data were analyzed using Quantasoft Analysis Pro to quantify DNA editing.
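The droplet counts in this assay are converted to target concentrations by the vendor software. The principle behind that conversion is a Poisson correction for droplets that contain more than one template copy. The Python sketch below shows that general principle and expresses integration as junction copies per RPP30 reference copy. It is not the Quantasoft implementation, the normalization to RPP30 is an assumption about how the reference channel is used, and the function names are hypothetical.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson-corrected mean template copies per droplet.

    If a fraction p of droplets is positive, the mean occupancy is -ln(1 - p).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    return -math.log(1 - positive / total)


def ddpcr_editing_percent(fam_positive, hex_positive, total_droplets):
    """Junction (FAM) copies per reference (HEX) copy, as a percentage.

    Assumes both channels are read from the same accepted droplets.
    """
    junction = copies_per_droplet(fam_positive, total_droplets)
    reference = copies_per_droplet(hex_positive, total_droplets)
    if reference == 0:
        raise ValueError("no reference-positive droplets")
    return 100 * junction / reference
```

Because the ratio is taken between two channels of the same well, the droplet volume cancels and does not need to be known for this calculation.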

Genome editing quantification by targeted deep sequencing.

To quantify integration of AttB/AttP pairs in the Bxb1 orthogonality assay and genome editing for prime editing and HDR integration at the EMX1 locus, target regions were PCR amplified and analyzed by deep sequencing. Genomic DNA samples were isolated using 50 μL of QuickExtract (Lucigen) per well, and target regions were PCR amplified with NEBNext High-Fidelity 2X PCR Master Mix (NEB) based on the manufacturer’s protocol. PCR amplicon primers are listed in supplementary table 9. Barcodes and adapters for Illumina sequencing were added in a subsequent PCR amplification. Amplicons were pooled and prepared for sequencing on a MiSeq (Illumina). Reads were demultiplexed and analyzed with appropriate pipelines. To analyze the Bxb1 orthogonality assay, AttP barcodes were extracted and normalized to overall barcode counts, and WebLogos were generated with LogoMaker65. To analyze prime and HDR editing, amplicons were processed with custom scripts to quantify the relative number of reads with the inserted sequence. We also developed a three-primer NGS assay to quantify left junction integration, using a common forward primer, a reverse primer to detect the unintegrated genomic locus, and another reverse primer for detecting the insertion template. This assay was performed as above with each reverse primer at half concentration.
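For the orthogonality assay, normalizing extracted AttP barcodes to overall barcode counts amounts to converting counts into fractions of the pool. A minimal Python sketch of that step follows; the function name and the use of dinucleotide strings as barcode labels are illustrative assumptions.

```python
from collections import Counter


def normalized_barcode_fractions(barcodes):
    """Return {barcode: fraction of all extracted barcodes} for one sample."""
    counts = Counter(barcodes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no barcodes supplied")
    return {barcode: n / total for barcode, n in counts.items()}
```

Fractions computed this way are comparable across samples sequenced to different depths, which is what makes per-AttB preference profiles possible.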

RNA sequencing library preparation for analysis of transcriptome-wide off-targets.

For analysis of Bxb1, prime, and PASTE transcriptome effects, HEK293FT cells were transfected with the corresponding vectors and harvested after 3 days. Total RNA was purified using a Quick-RNA kit (Zymo) and mRNA was isolated from total RNA with a NEBNext Poly(A) mRNA Magnetic Isolation Module. Purified mRNA was prepared for next generation sequencing with a NEBNext Ultra Directional RNA Library Prep Kit, and libraries were sequenced on an Illumina NextSeq instrument with a target of at least 5 million reads per library.

RNA sequencing analysis pipeline for analysis of transcriptome-wide off-targets

RNA-seq samples were demultiplexed using custom scripts, checked for read quality with FastQC, and aligned to the human GRCh38 genome index using the STAR (Spliced Transcripts Alignment to a Reference)66 aligner package. Differential gene analysis was performed with the edgeR, Limma, and voom packages67–69 to remove lowly expressed genes, normalize gene expression distributions, and correct for heteroscedasticity of count data when comparing Bxb1, prime, and PASTE effects on the transcriptome. Differentially expressed genes were considered significant when the absolute log-fold change in expression was > 0.585 and the p-value was < 0.05 after Benjamini-Hochberg correction. Volcano plots were generated to visualize the significance of differentially expressed transcripts in different libraries.
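The significance rule described above combines a Benjamini-Hochberg adjustment with a log-fold change cutoff. The Python sketch below applies the two criteria to precomputed log-fold changes and p-values. It is a simplified stand-in for illustration, not the edgeR/Limma/voom workflow, and it does not reproduce the moderated statistics those packages compute. The function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(genes, log_fold_changes, pvalues, lfc_cutoff=0.585, alpha=0.05):
    """Genes with |log-fold change| above the cutoff and adjusted p below alpha."""
    adjusted = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, lfc, q in zip(genes, log_fold_changes, adjusted)
        if abs(lfc) > lfc_cutoff and q < alpha
    ]
```

Filtering on fold change after testing, as done here, is less stringent than testing directly against a fold-change threshold, so results from this sketch would not necessarily match those of the original pipeline.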

Genome-wide off-target integration and integrase integration quantification by UMI TN5 and next generation sequencing.

To quantify the off-target integration of cargo payloads by PASTE and HITI throughout the human genome, single cell clones were harvested three days post transfection with QuickExtract (Lucigen) and purified using AMPure magnetic beads (Beckman Coulter) according to the manufacturer’s protocol. Cellular gDNA was eluted in water and normalized to 6.25 ng/μL. A 2x Tn5 dialysis buffer was prepared with the following components according to Picelli et al. 2014: 100 mM HEPES-KOH at pH 7.2, 0.2 M NaCl, 0.2 mM EDTA, 2 mM DTT, 0.2% Triton X-100, and 20% glycerol. Tn5 was assembled with equimolar pre-annealed mosaic-end double-stranded oligonucleotides by incubating the following components at room temperature for one hour: 25 μL of 100 μM (final concentration, see supplementary figure 10) oligonucleotide mix in TE, 80 μL 100% glycerol, 24 μL 2x Tn5 dialysis buffer, and 72 μL Tn5 at A280 = 3.0. 2.9 μL of normalized DNA was incubated with 0.1 μL of this Tn5 oligonucleotide mix and 0.75 μL of 5x Tris-HCl buffer (50 mM Tris-HCl pH 8, 25 mM MgCl2) for 10 minutes at 55°C. 1.875 μL of this Tn5 transposition reaction was used as the template in a PCR reaction using Platinum SuperFi PCR Master Mix (Thermo Fisher) according to the manufacturer’s protocol. Next, 1 μL of this reaction was used as the template in a next generation sequencing reaction (see protocol below); UMI Tn5 primers for genome-wide off-target integration detection are listed in supplementary table 10. After NGS barcoding, all samples were diluted 1:1 and pooled; 20 μL of this pool was run on a 1% agarose gel and a smear from 280-800 bp was extracted, purified, and prepared for next generation sequencing on a MiSeq (Illumina).

To compare and quantify the integration efficiency of integrases, HEK293T cells were transfected with an atgRNA-expressing plasmid containing the AttB site of the putative integrase along with a minicircle and an integrase-expressing plasmid; integration efficiency of the putative integrase was measured as the integration of the minicircle into the atgRNA vector. To quantify this integration, the above UMI protocol was followed with different primer sets. The mosaic-end double-stranded oligonucleotides used in Tn5 preparation remained constant, as did the indexing reverse primer used in the SuperFi PCR mix and first round NGS thermocycling steps. The forward primers for these thermocycling steps were replaced with primers homologous to the atgRNA acceptor plasmid. Integrase reporter primers can be found in supplementary table 10.

Quantification of in vivo editing efficiency by three-primer next generation sequencing.

To quantify in vivo PASTE editing efficiency by three-primer next generation sequencing, a 5 μL reaction was prepared containing 2.5 μL NEBNext PCR mix (New England Biolabs), 2 μL water/primer mix, and 0.5 μL mouse liver DNA normalized to 40 ng/μL (purified as described above). For left (AttL) junction analysis, a pool of forward primers binding upstream of the endogenous edited site was paired with two reverse primers, one binding downstream of the endogenous edited site and one binding in the PASTE-integrated minicircle; for right (AttR) junction analysis, a single reverse primer binding downstream of the endogenous edited site was paired with two forward primer pools, one binding upstream of the endogenous edited site and one binding in the PASTE-integrated minicircle (Supplementary table 9). To avoid PCR bias, primers were positioned to ensure both amplicons generated in the subsequent PCR reaction were of equivalent length (±5 bp); the final concentrations of reverse and forward primers in solution were equivalent (1 μM total). The first round PCR reaction was run for 18 cycles, then barcoded for an additional 20 cycles according to the manufacturer’s protocol; samples were then prepared for next generation sequencing on a MiSeq (Illumina) as described above.

Validation of three-primer next generation sequencing method with PCR standards.

To validate the accuracy of our three-primer next generation sequencing method for quantifying PASTE integration efficiency, a series of PCR standards was prepared. An amplicon with the sequence of the unedited endogenous site and an amplicon with the sequence of the PASTE-edited endogenous site were mixed in a dilution series of 100:0, 75:25, 50:50, 25:75, and 0:100. Three-primer next generation sequencing was performed on these PCR standards as described above and the measured ‘editing efficiency’ of each standard was compared with its amplicon composition (Extended data fig. 10a). The measured editing efficiency of the standards had strong concordance with the amplicon composition of the input standards, validating the accuracy of the three-primer next generation sequencing method for quantifying PASTE integration efficiency.
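The readout of the three-primer assay reduces to the fraction of reads that derive from the integrated junction amplicon, which can then be compared with the known composition of each standard. The Python sketch below shows that arithmetic under the assumption that every read has already been assigned to one of the two amplicons; the function names are hypothetical.

```python
def three_primer_efficiency(integrated_reads, unintegrated_reads):
    """Fraction of reads from the integrated junction amplicon."""
    total = integrated_reads + unintegrated_reads
    if total <= 0:
        raise ValueError("no reads assigned to either amplicon")
    return integrated_reads / total


def largest_deviation(expected_fractions, read_counts):
    """Largest absolute gap between expected and measured edited fractions.

    read_counts is a sequence of (integrated, unintegrated) tuples, one per standard.
    """
    return max(
        abs(expected - three_primer_efficiency(integrated, unintegrated))
        for expected, (integrated, unintegrated) in zip(expected_fractions, read_counts)
    )
```

A small largest deviation across the dilution series is one simple way to express the concordance reported for the standards; the measure itself is an illustrative choice and is not taken from the text.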

Computational identification of Bxb1 and Cas9 Off-targets.

To identify potential off-target sites for Bxb1 integration or Cas9 cleavage, similar sequences were identified in the human genome using either BLAST70, for similar AttP sequences, or Cas9 off-target prediction algorithms71. To validate successful amplicon generation by primer sets, positive control off-target amplicons were ordered as oligonucleotides, annealed, and tested by ddPCR. Off-target sites are listed in supplementary table 11.

For genome-wide characterization of off-targets, reads were filtered for those carrying the cargo sequence and trimmed to the genomic junction. Reads were then aligned to the human genome (GRCh38/hg38) with BWA and filtered for alignment lengths of at least 30 bp. Filtered reads were then analyzed for coverage and plotted.
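The read filtering and trimming step that precedes alignment can be sketched in Python as follows. The sketch assumes that each read runs from the cargo into the genome and that a fixed string at the end of the cargo (here called `cargo_anchor`, a hypothetical name) marks the junction. Alignment itself is left to BWA; the length check is only a cheap pre-filter, since a fragment shorter than 30 nt cannot yield a 30 bp alignment.

```python
def trim_to_genomic_junction(read, cargo_anchor, min_genomic_len=30):
    """Return the genomic sequence after the cargo anchor, or None if unusable."""
    position = read.find(cargo_anchor)
    if position == -1:
        return None  # read does not carry the cargo sequence
    genomic = read[position + len(cargo_anchor):]
    if len(genomic) < min_genomic_len:
        return None  # too short to produce a 30 bp alignment
    return genomic


def junction_reads(reads, cargo_anchor, min_genomic_len=30):
    """Keep only the trimmed genomic portions of cargo-carrying reads."""
    trimmed = (trim_to_genomic_junction(r, cargo_anchor, min_genomic_len) for r in reads)
    return [t for t in trimmed if t is not None]
```

The surviving fragments would then be written to FASTQ or FASTA for alignment to GRCh38/hg38 and filtered again on the reported alignment length.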

Imaging.

To prepare samples for imaging, cover slips were placed at the bottom of a 24-well plate prior to plating HEK293FT cells. After transfection at ~70% confluency and an incubation period of three days, the media was removed, and the wells were washed with 1 mL PBS pH 7.4 (Thermo Fisher Scientific). The cells were fixed with 500 μL of 4% Pierce Formaldehyde (Thermo Fisher Scientific) for 30 minutes. The wells were then washed three times with 1 mL PBS pH 7.4. If no immunostaining was to be performed, slides were processed to be mounted.

If immunostaining was to be performed, the cells were blocked in 1 mL 2.5% goat serum (Cell Signaling Technology) and in 0.1% Triton-X (Sigma Aldrich) for 1 hour at room temperature. For the primary stain, the primary antibodies were mixed with 1.25% goat serum and 300 μL were added per well according to the following dilutions: 1:1500 for the anti-ACTB antibody (NB600-501SS, NovusBio), 1:200 for the anti-SRRM2 antibody (NBP2-55697, NovusBio), 1:200 for the anti-NOLC1 antibody (11815-1-AP, ProteinTech), and 1:200 for the anti-LMNB1 antibody (12987-1-AP, ProteinTech). After shaking overnight at 4 °C, the wells were washed three times with 1 mL PBS pH 7.4. For the secondary staining, a 1:1000 dilution of secondary antibody, either goat anti-mouse IgG Alexa Fluor 568 (Thermo Fisher Scientific, A-11004) or goat anti-rabbit IgG Alexa Fluor 647 (Thermo Fisher Scientific, A21244), was mixed with 1.25% goat serum. After 1 hour at room temperature, the wells were washed three times with PBS and the slides were then mounted.

To mount the slides, a drop of ProLong Gold Antifade Mountant with DAPI (Thermo Fisher Scientific) was placed on the top of the slide and the cover slips were removed from the 24-well plate and inverted onto the drop. The coverslips were left to dry for 24 hours protected from the light at room temperature and then sealed with nail polish. For acquisition of images, a scanning laser confocal microscope (Zeiss LSM900) was used with a 40x oil objective and three different filter sets for visualizing EGFP, DAPI, and the immunofluorescence stain (either 568 or 647).

AAV production and transduction.

To produce AAV vectors for delivery of PASTE cargo, HEK293FT cells were transfected in T25 flasks using Lipofectamine 3000 (Thermo Fisher Scientific) with 1.6 μg GFP AAV cargo plasmid, 1.96 μg AAV8 capsid vector, and 4.13 μg AAV helper pAdDeltaF6 plasmid (Addgene #112867) per T25 flask according to manufacturer’s protocol. Two days after transfection, the media containing the loaded viral vector was filtered using a 0.45 μm filter (Sigma Aldrich) and the final product was stored at −80°C. One day after the transfection of PASTE components (PE2, Bxb1, nicking guide, and atgRNA) into HEK293FT cells, AAVs containing the GFP cargo template were introduced directly into the cells according to the indicated volumes. Three days after the transduction, the cells were harvested and ddPCR readout took place.

AdV production and transduction.

Adenoviral vectors were cloned using the AdEasy-1 system obtained from Addgene. Briefly, SpCas9-RT-P2A-Blast, Bxb1 and guide RNAs, and an EGFP cargo gene were cloned into separate adenoviral template backbones and recombined to add the full adenoviral genome with the AdEasy-1 plasmid in BJ5183 E. coli cells. These recombined plasmids were sent to Vector BioLabs for commercial production. Additional adenoviral vectors were produced for in vivo experiments by the University of Massachusetts Medical School Viral Vector Core, as previously described72–74.

The EGFP cargo vector was added at 6.7e6 PFU per well of a 96-well plate of HEK293FT cells and 1.3e6 PFU per well of a 96-well plate of HepG2 cells. For experiments with the three-vector adenoviral delivery of all PASTE components, we used 8e5 PFU of each viral vector per well of a 96-well plate of HEK293FT cells. For three-vector adenoviral delivery to HepG2 cells, we used 40e6 PFU of the EGFP cargo vector, 10e6 PFU of the SpCas9-RT-P2A-Blast vector, and 20e6 PFU of the Bxb1 and guides vector per well of a 96-well plate. For three-vector adenoviral delivery to PhoenixBio primary human hepatocytes (PXB-cells®), we used the vector amounts listed in the figure and transduced for 3 days prior to harvesting for ddPCR analysis.

Quantification of protein expression.

Three days after the transfection of HepG2 cells, the Nano-Glo HiBiT Lytic Detection System (Promega) was used for the quantification of the HiBiT-tagged proteins, SERPINA1 and CPS1, in cell lysates or media. For the preparation of the Nano-Glo HiBiT Lytic Reagent, the Nano-Glo HiBiT Lytic Buffer (Promega) was mixed with Nano-Glo HiBiT Lytic Substrate (Promega) and the LgBiT Protein (Promega) according to the manufacturer’s protocol. The volume of Nano-Glo HiBiT Lytic Reagent added was equal to the volume of culture medium present in each well, and the samples were placed on an orbital shaker at 600 rpm for 3 minutes. After incubation for 10 minutes at room temperature, luminescence was read with a gain of 125 and an integration time of 2 seconds using a plate reader (Biotek Synergy Neo 2). The control background was subtracted from the final measurements.

Computational discovery of serine integrases.

Prokaryotic genomic and metagenomic sequences were retrieved from various public databases and datasets including NCBI, ENA, Ensembl, MetaSUB, MGnify and JGI. Protein coding genes were predicted with Prodigal v2.6.3 (ref. 75). The protein sequences were scanned for large serine integrase domains with hmmsearch (HMMER v3.3.2; ref. 76) using Pfam models PF00239, PF07508 and PF13408 with model-specific gathering cutoffs. Protein sequences not containing at least a resolvase and a recombinase domain were discarded and the remaining sequences were marked as putative large serine integrases. The source contigs of these putative integrases were passed to VirSorter v1.0.6 (ref. 77) for prophage prediction. Contigs that were predicted to have a putative integrase completely embedded in a prophage region were kept and the 1000 bp around the predicted prophage boundaries were searched for k-mer matches of 2-18 bp with the 100 bp around the predicted integrase. Matching k-mer sites were then expanded to 50 bp and scanned for inverted repeats. Sites with a high number of di-nucleotide inverted repeats (based on an experimentally derived cutoff) were nominated as putative attachment sites. To expand the set of integrases and improve attachment site prediction accuracy, another mining approach was applied to all sequences with a species-level taxonomic annotation. All of the assemblies with a predicted integrase were paired with an assembly from the same species without a predicted integrase. MGEfinder v1.0.6 (ref. 78) was used on each pair to predict mobile genetic elements. Predicted integrases that were completely embedded in a predicted MGE region were kept and the same attachment site prediction algorithm was applied to their contigs with a reduced search width of 30 bp. The two sets of integrases were pooled, and the attachment sites predicted using the MGEfinder method took precedence in the case of multiple predictions.
The pooled set of integrases was then used to search the NCBI CDD using RPS-BLAST with an E-value threshold of 0.001 and integrases without a hit to a serine recombinase resolvase domain were discarded. The remaining sequences were clustered with CD-HIT v4.8.1 using the -c 0.7 -s 0.8 options (70% sequence identity and 80% coverage of the shorter sequence). The sequences were aligned with MUSCLE v5.0.1278 and sequences not aligning to the integrase catalytic residues were discarded. The remaining sequences were used to generate a phylogenetic tree with FastTree v2.1.11 using the LG+CAT substitution model. Clades were chosen with manual inspection and domain architectures were visualized with HHpred.
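The k-mer matching step of the attachment site search, in which sequence near a prophage boundary is compared with sequence near the integrase, can be illustrated with a short Python sketch. It reports the longest k-mers within the 2-18 bp range that two windows share. The function names are hypothetical, and the subsequent expansion to 50 bp and inverted-repeat scoring are not reproduced here because their details are not given in the text.

```python
def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def longest_shared_kmers(boundary_window, integrase_window, k_min=2, k_max=18):
    """Return (k, sorted shared k-mers) for the largest k with any exact match.

    Returns (0, []) if the windows share no k-mer in the allowed range.
    """
    longest_possible = min(k_max, len(boundary_window), len(integrase_window))
    # Try the longest k-mers first so the most specific match is reported
    for k in range(longest_possible, k_min - 1, -1):
        common = kmers(boundary_window, k) & kmers(integrase_window, k)
        if common:
            return k, sorted(common)
    return 0, []
```

In a fuller implementation each shared k-mer would also carry its coordinates in both windows, so that the surrounding 50 bp could be extracted for the inverted-repeat scan.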

Western blot analysis of protein levels.

Cells were lysed in Cell Lysis Buffer (Invitrogen). Equal volumes of cell lysate were run on Mini-PROTEAN TGX Stain-Free precast gel (Bio-Rad) and transferred to nitrocellulose membranes using the iBlot 2 Dry Blotting System (Thermo Scientific). Non-specific antigen binding was blocked with LICOR Intercept (PBS) Blocking Buffer for 1 hr. at room temperature. Membranes were then incubated with primary antibodies (β-Actin antibody [4967S] with GAPDH antibody [97166S] or Lamin B1 antibody [12586S] with GAPDH antibody [97166S]) from Cell Signaling Technology. Antibodies were diluted at 1:1,000 in Intercept (PBS) Blocking Buffer with 0.2% Tween-20 and the membranes were incubated for 16 hrs at 4°C. Membranes were washed for four 5-min washes with PBS-T (0.2% Tween-20) and further incubated with LICOR Donkey Anti-Rabbit IgG Polyclonal Antibody (IRDye® 800CW) and Goat Anti-Mouse IgG Polyclonal Antibody (IRDye® 680RD) diluted 1:15,000 in Intercept (PBS) Blocking Buffer with 0.2% Tween-20. The membranes were incubated for 1 hr at room temperature followed by four 5-min washes in PBS-T. The membranes were imaged using an Odyssey scanner (LI-COR Biosciences) and analyzed by band densitometry using ImageJ.

In vivo injections of PASTE adenovirus.

PASTE adenoviral preparations were prepared for the two in vivo conditions: 1) 8e+10 PFU of the EGFP cargo vector (UMass Medical), 2e+8 PFU of the SpCas9-RT-P2A-Blast vector (Vector Biolabs), and 2e+10 PFU of the Bxb1 and guides vector (UMass Medical); and 2) 8e+10 PFU of the EGFP cargo vector (UMass Medical) and 2e+10 PFU of the Bxb1 and guides vector (UMass Medical). These preparations were constituted in 150 μL of PBS and shipped to Yecuris for injections in their liver-humanized FRG® KO mouse model (Fah−/−Rag2−/−Il2rg−/− KO on female C57BL/6 with ≥70% human hepatocyte repopulation). Mice were injected at ~5.5 months of age via retro-orbital injection. All mice enrolled in the study were removed from 2-(2-Nitro-4-trifluoromethylbenzoyl)-1,3-cyclohexanedione (Nitisinone, NTBC) for ≥ 20 days and SMX/TMP for ≥ 3 days prior to dosing to reduce the contribution of mouse hepatocytes. Liver humanization was evaluated ≤ 7 days prior to the start of the study by human serum albumin ELISA (Bethyl Laboratory, Catalog #E90-134). All mice were weighed prior to the start of the study and evenly grouped based on their human serum albumin concentration and body weights. One day post-dosing, the mice were treated with NTBC for three days and then continued on the standard water formulation containing dextrose and vitamin C for the duration of the study. Mice were maintained at a Yecuris Corporation-affiliated, IACUC-accredited facility. General procedures for animal care and housing were as described in the Guide for the Care and Use of Laboratory Animals, National Research Council, Yecuris IACUC Policy and Yecuris General Mouse Handling Care and Euthanasia. Cages were changed every two weeks and the testing facility was sanitized weekly. Animal studies were carried out in accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health.
The protocols were approved by the Institutional Animal Care and Use Committee at the Massachusetts Institute of Technology (Protocol Number: 0919-065-22) and Yecuris Corporation IACUC Policy.

At ~3 weeks post injection, the chimeric livers were harvested and snap frozen. Liver pieces were sectioned from different regions and genomic DNA was purified using the Qiagen DNeasy Blood & Tissue Kit. Genomic DNA was then subjected to a three-primer sequencing assay for quantifying the left junction integration of the adenoviral template at the human-specific ACTB locus.

Extended Data

Extended Data Figure 1: Evaluation of prime integration activity for diverse AttB sequences and optimization of PASTE editing through dosage and mutagenesis.

a) Prime editing efficiency for the insertion of different length BxbINT AttB sites at ACTB. Data are mean (n = 2 or 3) ± s.e.m. b) Prime editing efficiency for the insertion of a BxbINT AttB site at ACTB with targeting and non-targeting guides. Data are mean (n= 3) ± s.e.m. c) Prime editing efficiency for the insertion of different integrases’ AttB sites at ACTB. Both orientations of landing sites are profiled (F, forward; R, reverse). Data are mean (n= 3) ± s.e.m. d) PASTE editing efficiency for the insertion of EGFP at ACTB with and without a nicking guide. Data are mean (n= 3) ± s.e.m. e) PASTE integration efficiency of EGFP at ACTB measured with different doses of a single-vector delivery of components. Data are mean (n = 2 or 3) ± s.e.m. f) PASTE integration efficiency of EGFP at ACTB measured with different ratios of a single-vector delivery of components to the EGFP template vector. Data are mean (n= 3) ± s.e.m. g) PASTE efficiency at the ACTB target compared between atgRNAs containing either the v1 or v2 scaffold designs. Data are mean (n= 3) ± s.e.m. h) PASTE integration efficiency of EGFP at ACTB with different RT domain fusions. Data are mean (n = 2 or 3) ± s.e.m. i) PASTE integration efficiency of EGFP at ACTB with different RT domain fusions and linkers. Data are mean (n = 2 or 3) ± s.e.m. j) PASTE integration efficiency of EGFP at ACTB with mutant RT domains. Data are mean (n= 3) ± s.e.m. k) Optimization of PASTE constructs with a panel of linkers and RT modifications for Gluc integration at the ACTB locus using atgRNAs with the v2 scaffold. Data are mean (n= 3) ± s.e.m.

Extended Data Figure 2: Characterization of PASTE payload sizes and integration junctions.

a) PASTE integration efficiency at the ACTB locus of varying sized cargos transfected at a fixed DNA amount and variable molar ratio. b) PASTE integration efficiency at the ACTB locus of varying sized cargos transfected at variable DNA amounts. c) Schematic of PASTE integration, including resulting AttR and AttL sites that are generated and PCR primers for assaying the integration junctions. d) PCR and gel electrophoresis readout of left integration junction from PASTE insertion of GFP at the ACTB locus. Insertion is analyzed for in-frame and out-of-frame GFP integration experiments as well as for a no prime control. Expected sizes of the PCR fragments are shown for the primers depicted in the schematic in subpanel c. e) PCR and gel electrophoresis readout of right integration junction from PASTE insertion of GFP at the ACTB locus. Insertion is analyzed for in-frame and out-of-frame GFP integration experiments as well as for a no prime control. Expected sizes of the PCR fragments are shown for the primers depicted in the schematic in subpanel c. f) Sanger sequencing shown for the right integration junction for an in-frame fusion of GFP via PASTE to the N-terminus of ACTB. g) Sanger sequencing shown for the left integration junction for an in-frame fusion of GFP via PASTE to the N-terminus of ACTB. Data are mean (n= 3) ± s.e.m.

Extended Data Figure 3: Validation of design rules for efficient PASTE insertion at endogenous genomic loci.

a) Schematic of various parameters that affect PASTE integration of ~1 kb GFP insert. On the atgRNA, the PBS, RT, and AttB lengths can alter the efficiency of AttB insertion. Nicking guide selection also affects overall gene integration efficiency. b) The impact of PBS and RT length on PASTE integration of GFP at the ACTB locus. c) The impact of PBS and RT length on PASTE integration of GFP at the LMNB1 locus. d) The impact of AttB length on PASTE integration of GFP at the ACTB locus. e) The impact of AttB length on PASTE integration of GFP at the LMNB1 locus. f) The impact of AttB length on PASTE integration of GFP at the NOLC1 locus. g) The impact of minimal PBS, RT, and AttB lengths on PASTE integration efficiency of GFP at the ACTB locus. h) The impact of minimal PBS, RT, and AttB lengths on PASTE integration efficiency of GFP at the LMNB1 locus. i) PASTE integration efficiency of EGFP at varying endogenous loci. Data are mean (n= 3) ± s.e.m.

Extended Data Figure 4: Heatmaps depicting the effect of PBS, RT, and AttB lengths on atgRNA efficiency of attachment site insertion from high-throughput pooled screening of 10,580 guides targeting a variety of loci.

Bar charts indicating normalized summation across relevant PBS, RT, or AttB parameter axes are shown on heatmap sides.

Extended Data Figure 5: Effect of nicking guides on insertion of diverse cargos.

a) PASTE insertion efficiency at ACTB and LMNB1 loci with two different nicking guide designs. b) Attachment site insertion at the SERPINA1 locus with a panel of different nicking guides at varying distances. c) Effect of nicking guides on PASTE integration efficiency at the LMNB1 locus with two different atgRNA designs. d) PASTE integration efficiency at ACTB and LMNB1 with targeting and non-targeting spacers and matched atgRNAs with and without BxbINT expression. e) Integration of a panel of different gene cargos at the LMNB1 locus via PASTE. Data are mean (n= 3) ± s.e.m.

Extended Data Figure 6: Further characterization of PASTE specificity and effects on cellular transcriptome.

a) Comparison of indel rates generated by PASTE and HITI mediated insertion of EGFP at the ACTB and LMNB1 loci in HepG2 cells. b) Effect of AttB site integration on protein production. Samples treated with ACTB, LMNB1, or non-targeting guides were harvested and analyzed for protein expression by Western blot. Quantified band intensities relative to GAPDH controls are shown below samples. c) GFP integration activity at predicted BxbINT and PASTE ACTB Cas9 guide off-target sites in the human genome. d) GFP integration activity at predicted HITI ACTB Cas9 guide off-target sites. e) Validation of ddPCR assays for detecting editing at predicted BxbINT off-target sites using synthetic amplicons. f) Validation of ddPCR assays for detecting editing at predicted PASTE ACTB Cas9 guide off-target sites using synthetic amplicons. g) Validation of ddPCR assays for detecting editing at predicted HITI ACTB Cas9 guide off-target sites using synthetic amplicons. h) Analysis of on-target and off-target integration events across 3 single-cell clones for PASTE and 3 single-cell clones for the no prime condition. i) Volcano plots depicting the fold expression change of sequenced mRNAs versus significance (p-value). Each dot represents a unique mRNA transcript and significant transcripts are shaded according to either upregulation (red) or downregulation (blue). Fold expression change is measured against ACTB-targeting guide-only expression (including cargo). Significance is determined by moderated t-statistic63 adjusted for a log-fold cutoff of 0.585 (ref. 64). j) Top significantly upregulated and downregulated genes for BxbINT-only conditions. Genes are shown with their corresponding Z-scores of counts per million (cpm) for BxbINT only expression, GFP-only expression, PASTE targeting ACTB for EGFP insertion, Prime targeting ACTB for EGFP expression without BxbINT, and guide/cargo only. Data are mean (n= 3) ± s.e.m.

Extended Data Figure 7: Additional characterization of AttP mutants for improved editing and multiplexing.

a) Integration efficiencies of wildtype and mutant AttP sites with PASTE at the ACTB locus. b) AttP single mutants are characterized for PASTE EGFP integration at the ACTB locus. c) Relative enrichment values (calculated as ratio of integrated reads to total reads) for the wildtype Bxb1 and top 5 mutants from the mutagenesis screen. d) Comparison of integration efficiency between PASTEv3 and Twin-PE integration at the ACTB locus, with either a single atgRNA (46 bp) or dual atgRNAs with PASTE-Replace (38 bp). e) Comparison of integration efficiency and residual AttB formation between PASTEv3 with PASTE-Replace and Twin-PE integration at the NOLC1 locus with dual atgRNAs containing either a 46 bp or 42 bp AttB sequence. f) Comparison of integration efficiency and residual AttB formation between PASTEv3 with PASTE-Replace and Twin-PE integration at the CCR5 locus with dual atgRNAs containing a 38 bp AttB sequence. g) Comparison of residual AttB formation between PASTEv3 with PASTE-Replace and Twin-PE integration at the ACTB locus. h) Characterization of integration of a 5 kb payload at the ACTB locus with all 16 possible dinucleotides for AttB/AttP pairs between the atgRNA and minicircle. i) Schematic of the pooled AttB/AttP dinucleotide orthogonality assay. Each AttB dinucleotide sequence is co-transfected with a barcoded pool of all 16 AttP dinucleotide sequences and BxbINT, and relative integration efficiencies are determined by next generation sequencing of barcodes. All 16 AttB dinucleotides are profiled in an arrayed format with AttP pools. j) Relative insertion preferences for all possible AttB/AttP dinucleotide pairs determined by the pooled orthogonality assay. k) Orthogonality of BxbINT dinucleotides as measured by a pooled reporter assay. Each web logo motif shows the relative integration of different AttP sequences in a pool at a denoted AttB sequence with the listed dinucleotide.
l) Representative fluorescence images of multiplexed PASTE gene tagging of ACTB, LMNB1, and NOLC1. Data are mean (n= 3) ± s.e.m.
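
As a minimal sketch of the quantities used in this legend: panel c describes relative enrichment as the ratio of integrated reads to total reads, and summary values are reported as mean ± s.e.m. over replicates. The Python below illustrates both calculations. The function names and read counts are hypothetical, chosen only for illustration, and are not taken from the study's analysis code.

```python
import math
import statistics


def enrichment(integrated_reads: int, total_reads: int) -> float:
    """Ratio of integrated reads to total reads for one sample."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return integrated_reads / total_reads


def mean_sem(values):
    """Return (mean, s.e.m.), where s.e.m. = sample SD / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


# Hypothetical (integrated, total) read counts for three replicates.
replicates = [(120, 1000), (150, 1000), (135, 1000)]
ratios = [enrichment(i, t) for i, t in replicates]
mean, sem = mean_sem(ratios)
print(f"{mean:.3f} ± {sem:.3f}")
```

The sample standard deviation (n - 1 denominator) is used here, which is the usual convention for small replicate counts; whether the original analysis used the same convention is an assumption.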

Extended Data Figure 8: Therapeutic applications of PASTE and further characterization of integrases.

a) Schematic of protein production assay for PASTE-integrated transgene. SERPINA1 and CPS1 transgenes are tagged with HIBIT luciferase for readout with both ddPCR and luminescence. b) Integration efficiency of SERPINA1 and CPS1 transgenes in HEK293FT cells at the ACTB locus. c) Integration efficiency of SERPINA1 and CPS1 transgenes in HepG2 cells at the ACTB locus. d) Intracellular levels of SERPINA1-HIBIT and CPS1-HIBIT in HepG2 cells. e) Secreted levels of SERPINA1-HIBIT and CPS1-HIBIT in HepG2 cells. f) Integration of HIBIT-tagged SERPINA1 and CPS1 genes as measured by a protein expression luciferase assay. g) Integration of HIBIT-tagged SERPINA1 and CPS1 genes as measured by a protein expression luciferase assay normalized to a standardized HIBIT ladder, enabling accurate quantification of protein levels. h) PASTE integration activity with the most active integrases compared to BxbINT. i) Characterization of integrase activity on truncated attachment sites using integrase reporters in HEK293FT cells. j) PASTE integration activity with computationally selected integrases with shorter AttB sites. Data are mean (n = 3) ± s.e.m.

Extended Data Figure 9: Evaluation of viral templates for PASTE and characterization of editing in non-dividing cells.

a) Schematic of PASTE performance in the presence of cell cycle inhibition. Cells are transfected with plasmids for insertion with PASTE or Cas9-induced HDR and treated with aphidicolin to arrest cell division. Efficiencies of PASTE and HDR are read out with ddPCR or amplicon sequencing, respectively. b) Editing efficiency of single mutations by HDR at the EMX1 locus with two Cas9 guides in the presence or absence of cell division, read out with amplicon sequencing. Data are mean (n = 3) ± s.e.m. c) HDR-mediated editing of the EMX1 locus is significantly diminished in non-dividing HEK293FT cells blocked by 5 μM aphidicolin treatment. Data are mean (n = 3) ± s.e.m. d) Integration efficiency of variously sized GFP inserts up to 13.3 kb at the ACTB locus with PASTE in the presence or absence of cell division. Data are mean (n = 3) ± s.e.m. e) Effect of insert minicircle DNA amount on PASTE-mediated insertion at the ACTB locus in dividing and non-dividing HEK293FT cells blocked by 5 μM aphidicolin treatment. Data are mean (n = 3) ± s.e.m. f) PASTE efficiency of EGFP integration at the ACTB locus in K562 cells. Data are mean (n = 3) ± s.e.m. g) Insertion templates delivered via AAV transduction. Templates were co-delivered via AAV dosing at levels indicated. Data are mean (n = 3) ± s.e.m. h) PASTE integration of GFP at the ACTB locus with the GFP template delivered via AAV in HEK293FT cells. i) PASTE integration of GFP at the ACTB locus with the GFP template delivered via AAV at different doses in HEK293FT cells. Data are mean (n = 3) ± s.e.m. j) Integration efficiency of AdV delivery of integrase, guides, and cargo in HEK293FT and HepG2 cells. BxbINT and guide RNAs or cargo were delivered either via plasmid transfection (Pl), AdV transduction (AdV), or omitted (−). SpCas9-RT was only delivered as plasmid or omitted. Data are mean (n = 3) ± s.e.m. k) Delivery of PASTE system components with mRNA and synthetic guides, paired with either AdV or plasmid cargo. Data are mean (n = 3) ± s.e.m. l) Attachment site insertion efficiency at the LMNB1 locus using PASTE delivered as mRNA with synthetic atgRNA and nicking guides. Data are mean (n = 3) ± s.e.m. m) Integration efficiency at the LMNB1 locus using PASTE delivered as mRNA (TriLink versions), synthetic atgRNA and nicking guides, and adenovirally delivered EGFP cargo. All conditions contain full-length PASTE mRNA and are optionally supplemented with additional Bxb1 mRNA as indicated. Data are mean (n = 2) ± s.e.m.

Extended Data Figure 10: Additional characterization of in vivo liver editing with PASTE.

a) PASTE integration using delivery of circular mRNA with synthetic guides and either AdV or plasmid cargo. Data are mean (n = 3) ± s.e.m. b) PASTE integration of GFP at the ACTB locus with dose titration of PASTE components and GFP cargo delivered as AdV in HepG2 cells. Data are mean (n = 3) ± s.e.m. c) Evaluation of a 3-primer NGS assay for measuring integration efficiency, akin to junctional readouts by ddPCR. Amplicon standards mixed at predefined ratios (x-axis) are used to assess the accuracy of the editing measured by NGS (y-axis). d) Analysis of primary human hepatocyte (PXB-cells®) EGFP integration at the ACTB locus using adenoviral delivery for PASTEv1 and guides and AAV for the EGFP template. Viral doses are as indicated. Data are mean (n = 2) ± s.e.m. e) Analysis of all liver editing outcomes for adenoviral EGFP template integration at the ACTB locus using PASTE in vivo. f) Analysis of AttB site insertion efficiency at the ACTB locus using PASTE in vivo. Data are mean (n = 8). g) Analysis of adenoviral EGFP template integration efficiency into available AttB sites at the ACTB locus using PASTE in vivo. Data are mean (n = 8). h) Analysis of indel frequency at the ACTB locus using PASTE in vivo. Data are mean (n = 8). i) Analysis of AttB site-associated indels during in vivo integration with PASTE via alignment of representative reads to the ACTB locus containing the desired AttB site.

Supplementary Material

Supplementary Text

Acknowledgments:

We would like to thank B. Desimone, F. Chen, J. Joung, A. Serj-Hansen, G. Feng, J. Wilde, M. Calos, T. Aida, Y. Cha, and M. Mittens for helpful discussions; E.V. Koonin and K. Makarova for helpful discussions on integrase discovery and annotation; P. Reginato, D. Weston, and E. Boyden for MiSeq instrumentation; S. Jacobs and A. Ainbinder for droplet digital PCR instrumentation; S. Bhatia and S. March Riera for hepatocyte assistance; G. Paradis and M. Griffin for flow cytometry assistance; and J. Crittenden for editing the manuscript.

Funding:

L.V. is supported by a Swiss National Science Foundation Postdoc Mobility Fellowship. O.O.A. and J.S.G. are supported by NIH grants 1R21-AI149694, R01-EB031957, and R56-HG011857; The McGovern Institute Neurotechnology (MINT) program; the K. Lisa Yang and Hock E. Tan Center for Molecular Therapeutics in Neuroscience; G. Harold & Leila Y. Mathers Charitable Foundation; MIT John W. Jarve (1978) Seed Fund for Science Innovation; Impetus Grants; Cystic Fibrosis Foundation Pioneer Grant; Google Ventures; FastGrants; and the McGovern Institute. S.K.G. was supported by the Intramural Research Program of the National Library of Medicine (NLM), National Institutes of Health.

Footnotes

Code Availability: Code to predict atgRNA efficiency and supporting information are available at https://github.com/abugoot-lab/atgRNA_rank.

Competing interests: O.O.A. and J.S.G. are co-inventors on patent applications filed by MIT relating to work in this manuscript. O.O.A. and J.S.G. are co-founders of Sherlock Biosciences, Proof Diagnostics, Moment Biosciences, and Tome Biosciences. O.O.A. and J.S.G. were advisors for Beam Therapeutics during the course of this project. K.H., J.A.W., A.P.K., and A.E.Z. are employees and shareholders of Synthego. S.K.D., Y.M., and D.R.R. are employees of PhoenixBio. L.F. and G.B. are employees of Yecuris Corporation. N.R., L.Z., and C.A.V. are employees of Integrated DNA Technologies. The remaining authors declare no competing interests.

Data availability:

Raw reads for RNA sequencing and the atgRNA efficiency screen are available at Sequence Read Archive under BioProject accession number PRJNA700575. Expression plasmids are available from Addgene at https://www.addgene.org/browse/article/28223250/ under UBMTA. The human genome GRCh38 can be accessed at https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26/.
