Nucleic Acids Research. 2023 Apr 20;51(9):4519–4535. doi: 10.1093/nar/gkad270

Novel molecular requirements for CRISPR RNA-guided transposition

Matt W G Walker 1,3, Sanne E Klompe 2,3, Dennis J Zhang 3, Samuel H Sternberg 4
PMCID: PMC10201428  PMID: 37078593

Abstract

CRISPR-associated transposases (CASTs) direct DNA integration downstream of target sites using the RNA-guided DNA binding activity of nuclease-deficient CRISPR-Cas systems. Transposition relies on several key protein-protein and protein-DNA interactions, but little is known about the explicit sequence requirements governing efficient transposon DNA integration activity. Here, we exploit pooled library screening and high-throughput sequencing to reveal novel sequence determinants during transposition by the Type I-F Vibrio cholerae CAST system (VchCAST). On the donor DNA, large transposon end libraries revealed binding site nucleotide preferences for the TnsB transposase, as well as an additional conserved region that encoded a consensus binding site for integration host factor (IHF). Remarkably, we found that VchCAST requires IHF for efficient transposition, thus revealing a novel cellular factor involved in CRISPR-associated transpososome assembly. On the target DNA, we uncovered preferred sequence motifs at the integration site that explained previously observed heterogeneity with single-base pair resolution. Finally, we exploited our library data to design modified transposon variants that enable in-frame protein tagging. Collectively, our results provide new clues about the assembly and architecture of the paired-end complex formed between TnsB and the transposon DNA, and inform the design of custom payload sequences for genome engineering applications with CAST systems.

Graphical Abstract


Pooled DNA libraries reveal nucleotide preferences for the TnsB transposase, permit the design of functional linker sequences for in-frame protein tagging, and uncover the involvement of integration host factor (IHF).

INTRODUCTION

Transposons are pervasive genetic elements capable of mobilizing between distinct genetic contexts using various targeting pathways, which serve as a potent force for genome evolution in all domains of life (1–3). Despite their abundance and sheer diversity, most transposons share a common feature in the presence of inverted repeat sequences that dictate the boundaries of the mobile element (4). These terminal transposon sequences are referred to as the left and right ends and typically encode one or more binding sites recognized by oligomeric transposase proteins through specific protein-DNA interactions.

DNA transposons (Class II) may be classified by their hallmark transposase domain, leading to major classes of DD(E/D) transposons, serine transposons, tyrosine transposons and Y1/Y2-transposons (5). DD(E/D)-family transposable elements encompass all known examples of ‘cut-and-paste’ DNA transposons, which are excised from the donor site and inserted into the target site. This reaction relies on mechanistic and enzymatic symmetry, with transposase subunits executing identical chemical steps on both transposon ends that involve coordinated nucleophilic attacks during strand cleavage and joining (4). The transposase binding sites themselves, however, are not always positioned symmetrically. Whereas some transposons encode symmetrically-positioned transposase binding sites within identical left and right ends, including Tc1/mariner-family transposons such as Mos1 (6), other elements encode asymmetrically-positioned binding sites within distinct left and right ends, including Hermes, piggyBac, Bacteriophage Mu, P elements, and Tn7-family transposons (7–11). For the well-characterized Tn7 transposon, the asymmetric transposon ends facilitate strict control over integration orientation (7,12), allowing one transposon end to be preferentially integrated adjacent to a target site, but the underlying mechanistic basis explaining this preference is unresolved.

Tn7-family transposons exhibit a modular nature in that they encode conserved core transposition proteins—TnsA, TnsB and TnsC—but diverse target site selection components. TnsB is a DDE-family integrase that recognizes and binds the transposon ends, catalyzes 3’ cleavage of both transferred strands at the donor site, and catalyzes the transesterification reaction at the target site. TnsA is an endonuclease protein that forms a heteromeric complex with TnsB and cleaves the 5’ end of the non-transferred strand, collectively allowing for full excision of the transposon from the donor site (13,14). TnsC is an AAA+ ATPase protein that communicates between the transposase module and the targeting component (15,16). Tn7-like transposons encode a sequence-specific DNA-binding protein as their targeting module, often of the TniQ family and referred to as TnsD, which directs transposition to a safe-harbor locus such as glmS, comM, yhiN and parE (17–22). In contrast, CRISPR-associated transposons (CASTs) use nuclease-deficient CRISPR-Cas systems to catalyze programmable, RNA-guided DNA transposition (18,23–27).

We previously reported RNA-guided transposition activity for VchCAST, a Tn7-like transposon from Vibrio cholerae (also referred to as Tn6677), which mediates efficient and highly specific DNA integration in an Escherichia coli heterologous host (23). VchCAST encodes a Type I-F CRISPR-Cas system that specifies integration sites through RNA-guided DNA targeting by a multi-subunit complex called Cascade (23,28). Importantly, Cascade binds DNA in complex with an accessory transposition protein, TniQ, which ultimately recruits TnsC and the heteromeric TnsAB transposase, in complex with the donor DNA, to assemble the catalytically active transpososome (Figure 1A) (28,29). DNA integration occurs roughly 50-bp downstream of the R-loop formed by TniQ-Cascade, but the preferred transposon insertion site varies across separate targets, with the dominant integration position ranging between 46–52-bp downstream of the target site (23,30). The sequence determinants underlying these heterogeneous integration distances remained enigmatic in our previous work (23,30). Additionally, the role and requirement for multiple putative TnsB binding sites within both transposon ends is unclear, limiting efforts to further engineer VchCAST as a DNA integration tool. Like other Tn7-family transposons, the transposon left and right ends encode asymmetrically positioned TnsB binding sites, which may also relate to the biased orientation with which transposon insertions occur (7,31).

Figure 1.


Pooled library approach to investigate transposon end mutability. (A) Schematic of RNA-guided transposition with VchCAST. (B) Integration efficiency of the WT mini-transposon in both orientations (illustrated in panel D) when directed to a genomic lacZ target site, as measured by qPCR. (C) Number of transposon right and left end library variants tested in each category. ‘Truncations’ encode serial mutations to shorten the WT transposon end sequence and reveal minimal sequence requirements. ‘Substitutions’ mutate the transposon ends in either 1-bp, 2-bp or 4-bp increments. Mutations to the TnsB binding sites include ‘identity’ variants that swap TnsB binding sites individually or combinatorially, ‘spacing’ variants that modulate the distance between binding sites, ‘inversions’ that invert the orientation of binding sites, and ‘additions’ that add a fourth binding site to the left end between L2 and L3 or to the right end distal to R3. Finally, ‘linker’ variants encode mutations that eliminate stop codons and obtrusive amino acids from the open reading frames encoded by the right transposon end, in order to permit in-frame protein tagging. Since some variants are represented in multiple categories, we specified the total number of variants including duplicates (‘Total’) and excluding duplicates (‘Total (unique)’). (D) Pooled library transposition approach. Library members were synthesized as single-stranded oligos and cloned into a plasmid donor library (pDonor), with 10-bp barcodes (gray) located between the transposon end and cargo to uniquely identify each variant. The donor library was used for transposition into the E. coli genome, and junction amplicons were generated to determine the representation of each library member within integrated products by NGS. (E) Schematic of the native VchCAST transposon end sequences (top) and relative T-RL integration activity for library members in which the left and right ends were sequentially mutagenized beginning internally (bottom). Each point represents one biological replicate.

Here, we employ library-based experiments in combination with high-throughput sequencing to investigate DNA sequence requirements during RNA-guided transposition by VchCAST. By individually mutating both transposon ends and measuring resulting DNA integration activity, we revealed sequence requirements of transposase binding sites that also mediate transposase-transposon cognate specificity. Interestingly, our results indicated that the relative positioning of each transposase binding site plays a crucial role in defining the proper architecture of the transpososome complex, with spacing patterns that correspond to the helical pitch of double-stranded DNA. These mutational data also revealed the importance of an integration host factor (IHF) binding site within the left transposon end, and subsequent genetic knockout and rescue experiments confirmed the role of IHF in stimulating transposition efficiency in E. coli. Finally, we uncovered sequence preferences at the site of integration, and we exploited our mutagenesis data to rationally engineer the transposon right end to enable in-frame tagging of endogenous protein-coding genes. Collectively, this work expands our understanding of both protein and DNA sequence requirements of Tn7-like transposons, reveals insights into the architecture of the transpososome complex, and provides new knowledge to inform the design of custom transposon sequences for genome engineering applications.

MATERIALS AND METHODS

Cloning, testing, and analysis of pooled pDonor libraries

Donor plasmid (pDonor) libraries were generated by cloning 1549 transposon left end variants or 1849 transposon right end variants into a donor plasmid, which was co-transformed with an effector plasmid (pEffector) that directed transposition into the E. coli genome (schematized in Figure 1D). Each transposon end variant was associated with a unique 10-bp barcode that identified the variant in our sequencing approach. We sequenced the starting plasmid libraries (input) and the integrated products from genomic DNA (output) by NGS to determine the representation of each library member before and after transposition. To sequence the output, we independently amplified integration events in the T-RL and T-LR orientations using a cargo-specific primer flanking the transposon end and a genomic primer either upstream or downstream of the integration site. We wrote custom Python scripts to compare each library member's representation in the output to its representation in the input, allowing us to calculate the relative transposition efficiency of our custom transposon end variants.

To clone the transposon donor libraries, we first generated library variants as 200-nt single-stranded pooled oligos (Twist Bioscience). All variants are listed in Supplementary Tables S3 and S4. We wrote custom MATLAB scripts to automate the design of our substitution and truncation variants. The remainder of variants were designed by hand in spreadsheets. 1 ng of oligoarray library DNA was amplified by PCR for 12 cycles in 40 μl reactions using Q5 High-Fidelity DNA Polymerase (NEB) and primers specific to the right or left end library, in order to add restriction enzyme digestion sites. All plasmids used in this study are listed in Supplementary Table S1, and oligos are listed in Supplementary Table S2. Amplicons were cleaned up and eluted in 45 μl Milli-Q H2O (QIAquick PCR Purification Kit). As the backbone vector, we used a plasmid encoding a 775-bp mini-transposon, delineated by 147-bp of the native transposon left end and 75-bp of the native transposon right end, on a pUC57 backbone. The backbone vector and library insert amplicons were digested (AscI and SapI for the right end library, and NcoI and NotI for the left end library) at 37°C for 1 h, gel purified, and ligated in 20 μl reactions with T4 DNA Ligase (NEB) at 25°C for 30 min. Ligation reactions were cleaned up and eluted in 10 μl Milli-Q H2O (MinElute PCR Purification Kit), and then used to transform electrocompetent NEB 10-beta cells in five individual electroporation reactions according to the manufacturer's protocol. After recovery (37°C for 1 h), transformed cells were plated on large 245 mm x 245 mm bioassay plates containing LB-agar with 100 μg/ml carbenicillin. Plates were scraped to collect cells, and plasmid DNA was isolated using the QIAGEN Plasmid Midi Kit.
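As an illustration of how substitution variants in fixed-size increments might be enumerated, the following is a minimal Python sketch. It is not the MATLAB code used in this study. The function name `substitution_variants` and the choice to replace each window with its complement are assumptions made only for this example, since the text does not state which bases were substituted.

```python
# Hypothetical sketch: enumerate substitution variants of a transposon end.
# ASSUMPTION: each window is replaced by its base-wise complement; the
# actual substitution rule used in the study is not described here.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def substitution_variants(seq, window):
    """Return one variant per non-overlapping window of `window` bp.

    Windows are tiled from the start of `seq`; a trailing remainder shorter
    than `window` is left unmutated.
    """
    variants = []
    for start in range(0, len(seq) - window + 1, window):
        mutated = seq[start:start + window].translate(COMPLEMENT)
        variants.append(seq[:start] + mutated + seq[start + window:])
    return variants


# Example with a toy 4-bp sequence and 2-bp windows.
toy_variants = substitution_variants("ACGT", 2)
```

Calling the function with `window` set to 1, 2 or 4 would mirror the 1-bp, 2-bp and 4-bp increments described for the library.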

Transposition experiments were performed in E. coli BL21(DE3) cells. pEffector encoded a CRISPR array (repeat-spacer-repeat), a native tniQ-cas8-cas7-cas6 operon, and a native tnsA-tnsB-tnsC operon, all under the control of a single T7 promoter on a pCDFDuet-1 backbone (30). 2 μl of DNA solution containing 200 ng of pDonor and pEffector in equimolar amounts was used to co-transform electrocompetent cells according to the manufacturer's protocol (Sigma-Aldrich). Four transformations were performed for each sample, and following recovery at 37°C for 1 h, each transformation was plated on a large bioassay plate containing LB-agar with 100 μg/ml spectinomycin, 100 μg/ml carbenicillin, and 0.1 mM IPTG. Cells were grown at 37°C for 18 h. Thousands of colonies were scraped from each plate, and genomic DNA was extracted using the Wizard Genomic DNA Purification Kit (Promega).

Next-generation sequencing (NGS) amplicons were prepared by PCR amplification using Q5 High-Fidelity DNA Polymerase (NEB). 250 ng of template DNA was amplified in 15 cycles during the PCR1 step. PCR1 samples were diluted 20-fold and amplified in 10 cycles during the PCR2 step. PCR1 primer pairs contained one pDonor backbone-specific primer and one transposon-specific primer (input library), or one genomic target-specific primer and one transposon-specific primer (output library). PCR amplicons were resolved by 2% agarose gel electrophoresis and gel-purified (QIAGEN Gel Extraction Kit). Libraries were quantified by qPCR using the NEBNext Library Quant Kit for Illumina (NEB). Sequencing for both input and output libraries was performed using a NextSeq Mid or High Output Kit with 150-cycles (Illumina). Additionally, the input libraries were also sequenced with a paired-end run using a MiSeq Reagent Kit v2 with 500-cycles (Illumina).

NGS data analysis was performed using custom Python scripts. Demultiplexed reads were filtered to remove reads that did not contain a perfect match to the 19-bp primer binding sequence at the 3’-terminus of the transposon end. Then, the 10-bp sequence directly downstream of the primer binding sequence was extracted, which encodes a barcode that uniquely identifies each transposon end variant. The number of reads containing each library member barcode was counted. If a read did not contain a barcode that matched a library member barcode, it was discarded. The barcode counts were summed across two NGS runs using the same PCR2 samples for the input libraries. Two biologically independent replicates were performed for the output libraries. The relative abundance of each library member was then determined by dividing the barcode count of each library member by the total number of barcode counts. The fold-change between the output and input libraries was calculated by dividing the relative abundance of each library member in the output library by its relative abundance in the input library. This fold-change was then normalized by dividing the fold-change of each library member by the average fold-change of four wildtype library members that contained identical transposon ends but unique barcodes.
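The barcode counting and normalization steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the scripts used in the study. The function names, the in-memory list of reads, and the use of `str.find` to locate the primer binding sequence are all hypothetical choices for this example.

```python
from collections import Counter


def count_barcodes(reads, primer_site, barcodes, barcode_len=10):
    """Count library barcodes located directly downstream of the primer site.

    ASSUMPTION: reads are plain strings and the primer binding sequence is
    located with an exact substring search.
    """
    counts = Counter()
    for read in reads:
        start = read.find(primer_site)
        if start == -1:
            continue  # no perfect match to the primer binding sequence
        bc_start = start + len(primer_site)
        barcode = read[bc_start:bc_start + barcode_len]
        if barcode in barcodes:  # discard barcodes not in the library
            counts[barcode] += 1
    return counts


def relative_abundance(counts):
    """Divide each barcode count by the total number of barcode counts."""
    total = sum(counts.values())
    return {bc: n / total for bc, n in counts.items()}


def normalized_fold_change(input_counts, output_counts, wt_barcodes):
    """Output/input fold-change, normalized to the mean of WT library members."""
    in_ab = relative_abundance(input_counts)
    out_ab = relative_abundance(output_counts)
    # Library members absent from the input are skipped to avoid dividing by zero.
    fold = {bc: out_ab.get(bc, 0.0) / ab for bc, ab in in_ab.items() if ab > 0}
    wt_mean = sum(fold[bc] for bc in wt_barcodes) / len(wt_barcodes)
    return {bc: fc / wt_mean for bc, fc in fold.items()}
```

With this normalization, a value of 1 corresponds to wildtype-like activity, and values below 1 indicate variants that are depleted among integrated products relative to wildtype.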

One source of experimental noise in our approach came from PCR recombination (32), in which barcodes became uncoupled from their associated transposon end variants during PCR amplification. PCR amplification has been previously observed to be a source of uncoupling between paired elements during lentiviral production, which can confound analysis in CRISPR screens (32). We quantified the frequency of uncoupling by performing long-read Illumina sequencing (MiSeq, 500-cycles) to sequence both the barcode and full-length transposon end, and found that 64.4% of the right end and 40.9% of the left end barcodes were coupled to their correct transposon end sequence (Supplementary Figure S1C). However, uncoupled reads mapped to a diverse pool of sequences, with the most abundant incorrect sequence for each library member representing only a low percentage of total reads (Supplementary Figure S1D). Indeed, on average, the most abundant incorrect sequence for a given library member was only 2.8% and 9.0% for the right and left end libraries, respectively. These data therefore indicate that uncoupling events did not largely affect the ability to calculate relative integration efficiencies for each library member.
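To make the coupling analysis concrete, the sketch below shows one way the per-barcode statistics could be computed from paired (barcode, transposon end) observations. It is a hypothetical illustration rather than the analysis code used here. The function name `coupling_stats` and the input layout are assumptions.

```python
from collections import Counter, defaultdict


def coupling_stats(pairs, expected):
    """Summarize barcode/transposon-end coupling.

    pairs: iterable of (barcode, end_sequence) tuples from long reads.
    expected: dict mapping each barcode to its designed end sequence.
    Returns {barcode: (fraction_correct, fraction_top_incorrect)}.
    """
    by_barcode = defaultdict(Counter)
    for barcode, end_seq in pairs:
        if barcode in expected:  # ignore barcodes outside the library
            by_barcode[barcode][end_seq] += 1

    stats = {}
    for barcode, seq_counts in by_barcode.items():
        total = sum(seq_counts.values())
        correct = seq_counts.get(expected[barcode], 0)
        wrong = [n for s, n in seq_counts.items() if s != expected[barcode]]
        # Fraction of reads carrying the single most abundant incorrect sequence.
        stats[barcode] = (correct / total, max(wrong, default=0) / total)
    return stats
```

A low value for the second element of each tuple corresponds to the situation described above, in which uncoupled reads are spread over many different incorrect sequences rather than concentrated on one.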

Sequence logos were generated with WebLogo 3.7.4. The VchCAST sequence logo in Figure 2B was generated from the six predicted TnsB binding sites.

Figure 2.


Transposase binding site (TBS) requirements for VchCAST. (A) Schematic representation of the VchCAST transposon end sequences. Bioinformatically predicted transposase binding site (TBS) sequences are indicated with blue boxes and labeled L1–L3 and R1–R3. The 8-bp terminal end sequences that dictate the transposon boundaries are marked with yellow boxes. (B) WebLogo depicting the sequence conservation of the six bioinformatically predicted TBSs. Grey shading indicates residues that are completely conserved across all TnsB binding sites. (C) Relative integration efficiencies (log2-transformed) for mutagenized TBS sequences averaged over all six binding sites, shown as the mean for two biological replicates. Grey shading indicates the position of residues that are conserved across all TnsB binding sites. (D) Top: TBS sequences of the Tn7002 CAST system are shown with individual bases colored based on VchCAST transposon end library data, where red indicates a relatively inefficient residue. Bottom: relative integration efficiencies of VchCAST/Tn7002 chimeric ends verify critical compatibility sequence requirements of TBSs. Data are shown for two biological replicates. (E) Relative integration efficiencies for transposon variants containing altered distances between the indicated TBSs. Orange arrows highlight the 10-bp periodic pattern of activity. Data are shown for two biological replicates.

Cloning, testing and analysis of pooled pTarget libraries

pTarget libraries were designed to include an 8-bp degenerate sequence positioned 42-bp downstream of one of two potential target sites, as schematized in Figure 3B. Integration was directed to one of the two target sites flanking the degenerate sequence by a single plasmid (pSPIN) encoding both the donor molecule and transposition machinery under the control of a T7 promoter, on a pCDF backbone [described in (33)]. To generate insert DNA for cloning the pTarget libraries, two partially overlapping oligos (oSL2241 and oSL2245, Supplementary Table S2) were annealed by heating to 95°C for 2 min and then cooling to room temperature. Annealed DNA was treated with DNA Polymerase I, Large (Klenow) Fragment (NEB) in 40 μl reactions and incubated at 37°C for 30 min, then gel-purified (QIAGEN Gel Extraction Kit). Double-stranded insert DNA and vector backbone were digested with BamHI and AvrII (37°C, 1 h); the digested insert was cleaned up (MinElute PCR Purification Kit) and the digested backbone was gel-purified. Backbone and insert were ligated with T4 DNA Ligase (NEB), and ligation reactions were used to transform electrocompetent NEB 10-beta cells in four individual electroporation reactions according to the manufacturer's protocol. After recovery (37°C for 1 h), cells were plated on large bioassay plates containing LB-agar with 50 μg/ml kanamycin. Thousands of colonies were scraped from each plate, and plasmid DNA was isolated using the QIAGEN Plasmid Midi Kit. Plasmid DNA was further purified by mixing with Mag-Bind TotalPure NGS Beads (Omega) at a vol:vol ratio of 0.60× and extracting the supernatant to remove contaminating fragments smaller than ∼450 bp.

Figure 3.


Transposase sequence preferences influence integration site patterns. (A) Deep sequencing of VchCAST integration sites revealed biases in integration site preference between four target sites (4–7) located in the lac operon of the E. coli BL21(DE3) genome (top row) or encoded on a separate target plasmid (second row). Chimeric target plasmids that either maintain the 32-bp target site (third row) or 60-bp downstream region (bottom row) of target 4 were also tested. These data reveal that sequence identity of the downstream region (including the integration site), but not the target site, governs the observed differences in integration site distribution. (B) Schematic of integration site library experiments, in which integration was directed into an 8-bp degenerate sequence encoded on a target plasmid (pTarget). (C) Sequence logo of preferred integration site, generated by selecting nucleotides from the top 5000 enriched sequences across all integration positions in each library, with a minimum threshold of four-fold enrichment in the integrated products compared to the input. (D) The preferred 5’-CWG-3’ motif in the center of the TSD is predictive of integration site distribution, as the displacement of this motif within the degenerate sequence shifts the preferred integration site distance, indicated by the red number.

2 μl of DNA solution containing 200 ng of pTarget and pSPIN at equal mass amounts was used to co-transform electrocompetent E. coli BL21(DE3) cells according to the manufacturer's protocol (Sigma-Aldrich). Three transformations were performed and plated on large bioassay plates containing LB-agar with 100 μg/ml spectinomycin and 50 μg/ml kanamycin. Thousands of colonies were scraped from each plate, and plasmid DNA was isolated using the QIAGEN Plasmid Midi Kit.

Integration into pTarget yielded a larger plasmid than the starting input plasmid. To isolate the larger plasmid, we performed a digestion step that facilitated resolution of the integrated and unintegrated bands on an agarose gel, for extraction of the larger integrated plasmid. We performed this digestion step on both input and output libraries, digesting with NcoI-HF (37°C for 1 h) and running them on a 0.7% agarose gel. The products were gel-purified (QIAGEN Gel Extraction Kit) and eluted in 15 μl EB in a MinElute Column (QIAGEN). 6.5 μl of cleaned-up DNA was used in each PCR1 amplification with Q5 High-Fidelity DNA Polymerase (NEB) for 15 cycles. PCR1 samples were diluted 20-fold and amplified in 10 cycles for PCR2. PCR1 primer pairs contained pTarget backbone-specific primers flanking a 45-bp region encompassing the degenerate sequence. Sequencing was performed with a paired-end run using a NextSeq High Output Kit with 150-cycles (Illumina).

NGS data analysis was performed using a custom Python script. Demultiplexed reads were filtered to remove reads that did not contain a perfect match to the 34- to 35-bp sequence upstream of the degenerate sequence for any i5-reads, or to the 45- to 46-bp sequence for any i7-reads (the 35-bp and 46-bp sequences were used for reads that were amplified from primers containing an additional nucleotide, which were used in PCR1 to generate cluster diversity during sequencing). For all reads that passed filtering, the 8-bp degenerate sequence was extracted and counted. The integration distance was determined in the output libraries by examining the i5 read sequence at an integration distance of 43- to 56-bp downstream of each target for the presence of the transposon right or left end sequence (20-nt of each end). The degenerate sequence was then extracted from either or both of the i5 and i7 reads, depending on the integration position. The degenerate sequence counts were summed across the two primer pairs. The relative abundance was determined by dividing the degenerate sequence count by the total number of degenerate sequence counts. Finally, the fold-change between the output and input libraries was calculated by dividing the relative abundance of each degenerate sequence at each integration position in the output library by its relative abundance in the input library, and then log2-transformed.
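The final enrichment calculation can be sketched as follows. This is a minimal illustration, not the script used in the study. The function name, the dictionary layouts, and the choice to compute the output total across all integration positions are assumptions for this example.

```python
import math


def log2_enrichment(input_counts, output_counts):
    """Log2 fold-change of each degenerate sequence at each integration position.

    input_counts: {degenerate_seq: count} from the input library.
    output_counts: {(degenerate_seq, integration_distance): count} from the
        output library.
    ASSUMPTION: output relative abundance is computed against the total
    count over all sequences and positions.
    """
    in_total = sum(input_counts.values())
    out_total = sum(output_counts.values())
    result = {}
    for (seq, distance), n in output_counts.items():
        in_n = input_counts.get(seq, 0)
        if in_n == 0 or n == 0:
            continue  # fold-change is undefined without input or output reads
        in_ab = in_n / in_total
        out_ab = n / out_total
        result[(seq, distance)] = math.log2(out_ab / in_ab)
    return result
```

Positive values indicate degenerate sequences that are over-represented among integration products at a given distance, and negative values indicate depletion.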

Sequence logos were generated with WebLogo 3.7.4. The preferred integration site logos in Supplementary Figure S3A were generated from all degenerate sequences that were enriched four-fold in the integrated products compared to the input. The overall preferred integration site logos in Figure 3C and Supplementary Figure S3E were generated by first applying the minimum threshold of four-fold enrichment in the integrated products compared to the input, and then selecting nucleotides from the top 5000 enriched sequences across all integration positions. We selected nucleotides from the top 5000 sequences from each library, yielding a total of 10 000 nucleotides at each position.
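The selection of sequences for the logos can be illustrated with the short Python sketch below, which applies a fold-enrichment threshold, keeps the most enriched sequences, and tallies nucleotides per position. The function names and input format are hypothetical. The logos themselves were generated with WebLogo, which this sketch does not reproduce.

```python
from collections import Counter


def top_enriched(fold_changes, min_fold=4.0, top_n=5000):
    """Return up to `top_n` sequences with fold-change >= `min_fold`,
    ordered from most to least enriched.

    fold_changes: {sequence: linear (not log2) fold-change}.
    """
    passing = [(seq, fc) for seq, fc in fold_changes.items() if fc >= min_fold]
    passing.sort(key=lambda item: item[1], reverse=True)
    return [seq for seq, _ in passing[:top_n]]


def position_counts(seqs):
    """Tally nucleotides at each position across equal-length sequences."""
    if not seqs:
        return []
    length = len(seqs[0])
    return [Counter(s[i] for s in seqs) for i in range(length)]
```

The per-position tallies from two libraries could then be combined position by position before being passed to a logo-drawing tool.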

Endogenous protein tagging experiments

All VchCAST constructs were subcloned from pEffector and pDonor as described previously, using a combination of inverse (around-the-horn) PCR, Gibson assembly, restriction digestion-ligation, and ligation of hybridized oligonucleotides (23,30). pEffector encodes a CRISPR array (repeat-spacer-repeat), a native tniQ-cas8-cas7-cas6 operon, and a native tnsA-tnsB-tnsC operon, all under the control of a single T7 promoter on a pCDFDuet-1 backbone (30). Donor plasmids (pDonor) were designed to encode a mini-transposon (mini-Tn) with a wild-type 147-bp transposon left end and 57-bp linker-coding right end variant, on a pUC19 backbone. For endogenous protein tagging experiments, superfolder GFP (sfGFP) lacking a ribosome binding site (RBS) and start codon was cloned into the mini-Tn cargo region, and the mini-Tn was further cloned into a temperature-sensitive pSIM6 backbone.

Linker functionality constructs were designed to encode sfGFP with an extended 32-amino acid (aa) loop region between the 10th and 11th β-strands, under the control of a single T7 promoter, as described by Feng and colleagues (34). Linker variants encoding 18–19 aa were subcloned into the 32-aa loop region as follows. An entry vector was generated on a pCOLADuet-1 (pCOLA) vector harboring sfGFP, such that the 11th β-strand (GFP11) was replaced by the aforementioned extended 32-aa loop (34). Fragments encoding transposon right end linker variants and GFP11 were then amplified by conventional PCR and inserted into the extended loop region of the entry vector downstream of β-strands 1–10 (GFP1-10), such that total length of the loop remained constant at 32 aa.

To perform linker functionality assays, chemically competent E. coli BL21(DE3) cells were co-transformed with T7-controlled sfGFP linker functionality constructs (pCOLA) and an equimolar amount of empty pUC19 vector. Negative control transformants harbored both unfused sfGFP1-10 and sfGFP11 fragments on separate pCOLA and pUC19 backbones, respectively, or harbored isolated sfGFP fragments. Transformed cells were plated on LB-agar plates with antibiotic selection (100 μg/ml carbenicillin, 50 μg/ml kanamycin), and single colonies were used to inoculate 200 μl of LB medium (100 μg/ml carbenicillin, 50 μg/ml kanamycin, 0.1 mM IPTG) in a 96-well optical-bottom plate. The optical density at 600 nm (OD600) was measured every 10 min, in parallel with the fluorescence signal for sfGFP, using a Synergy Neo2 microplate reader (Biotek) while shaking at 37°C for 15 h. To derive normalized fluorescence intensities (NFI), all measured fluorescence intensities were divided by their corresponding OD600 values across all time points.
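The normalization of fluorescence by optical density can be written as a brief Python sketch. The function name and the list-based inputs are assumptions for illustration. Handling of non-positive OD600 readings is also an assumption, since the text does not describe it.

```python
def normalized_fluorescence(fluorescence, od600):
    """Divide each fluorescence reading by the OD600 at the same time point.

    ASSUMPTION: time points with a non-positive OD600 return NaN rather
    than raising an error.
    """
    if len(fluorescence) != len(od600):
        raise ValueError("fluorescence and OD600 series must be the same length")
    return [
        f / od if od > 0 else float("nan")
        for f, od in zip(fluorescence, od600)
    ]


# Example with two time points from a single well.
nfi = normalized_fluorescence([200.0, 300.0], [0.5, 0.25])
```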

Transposition experiments were performed by transforming chemically competent E. coli BL21(DE3) cells harboring pEffector plasmids with pDonor plasmids by heat shock at 42°C for 30 s, followed by recovery in fresh LB medium. Recovery was performed at 30°C for 1.5 h for temperature-sensitive pDonor plasmids, and 37°C for 1 h for all other pDonor plasmids. Transformants were isolated on LB-agar plates containing the proper antibiotics and inducer (100 μg/ml carbenicillin, 100 μg/ml spectinomycin, 0.1 mM IPTG). After 43 h growth at 30°C for temperature-sensitive pDonor plasmids, and 18 h growth at 37°C for all other pDonor plasmids, samples were prepared for downstream qPCR analysis of integration efficiency or colony PCR identification of integration events.

For qPCR quantification, colonies were scraped from plates and resuspended in LB medium, and cell lysates were prepared for qPCR as described by Klompe and colleagues (23). Pairs of transposon- and target DNA-specific primers were designed to amplify fragments from integrated transposition products at the expected loci in either of two possible orientations. In parallel, a separate pair of genome-specific primers was designed to amplify an E. coli reference gene (rssA) for normalization purposes. qPCR reactions (10 μl) contained 5 μl of SsoAdvanced Universal SYBR Green Supermix (BioRad), 1 μl Milli-Q H2O, 2 μl of 2.5 μM primers, and 2 μl of hundredfold-diluted cell lysate. Reactions were prepared in 384-well clear/white PCR plates (BioRad), and measurements were obtained in a CFX384 Real-Time PCR Detection System (BioRad). Transposition efficiency was calculated for each orientation as 2^ΔCq, in which ΔCq is the Cq difference between the experimental and control reactions. All measurements presented were determined from three independent biological replicates.
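The efficiency calculation can be expressed as a short Python sketch. The sign convention is an assumption: here ΔCq is taken as Cq(reference) minus Cq(integration junction), so that a junction amplifying later than the reference gene yields an efficiency below 1. The function name is hypothetical.

```python
def transposition_efficiency(cq_junction, cq_reference):
    """Return 2^dCq for one orientation.

    ASSUMPTION: dCq = Cq(reference gene) - Cq(integration junction), and
    both primer pairs amplify with ideal (two-fold per cycle) efficiency.
    """
    delta_cq = cq_reference - cq_junction
    return 2.0 ** delta_cq


# Example: the junction product crosses threshold two cycles after the reference.
example_efficiency = transposition_efficiency(cq_junction=22.0, cq_reference=20.0)
```

Under these assumptions, a two-cycle delay corresponds to an efficiency of 0.25, which would be reported as 25% if expressed as a percentage.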

For colony PCR identification of integration events, colonies were scraped from plates after transposition assays, resuspended in fresh LB medium, and re-streaked on LB-agar plates with the appropriate antibiotics and without IPTG inducer. To generate lysates, individual colonies were each transferred to 10 μl of Milli-Q H2O, followed by incubation at 95°C for 2 min and centrifugation at 4000 × g for 5 min to pellet cell debris. Pairs of transposon- and target DNA-specific primers were designed to amplify fragments from integrated transposition products in the expected locus and orientation. In parallel, a separate pair of genome-specific primers was designed to amplify an E. coli reference gene (rssA) and determine whether the crude lysates were sufficiently dilute to allow successful amplification of the integrated transposition product. To verify in-frame integration events, amplicons of the expected length were excised after gel electrophoresis, purified by the Gel Extraction Kit (QIAGEN), and sent for Sanger sequencing (GENEWIZ).

Fluorescence microscopy experiments were performed as follows. A pEffector plasmid was designed to C-terminally tag the native E. coli msrB gene by integrating a mini-Tn encoding a linker variant (ORF2a) and sfGFP cargo in-frame with the coding sequence, thereby interrupting the endogenous stop codon. Transposition experiments were performed as described above by transforming chemically competent E. coli BL21(DE3) cells harboring pEffector plasmids with temperature-sensitive pDonor plasmids. Colonies were then scraped and resuspended in fresh LB medium. Resuspensions were diluted and re-streaked on double antibiotic LB-agar plates lacking IPTG (100 μg/ml carbenicillin, 100 μg/ml spectinomycin). After overnight growth on solid medium at 30°C, individual colonies were used to inoculate liquid cultures (100 μg/ml spectinomycin) for overnight heat-curing at 37°C, followed by replica plating on single and double antibiotic plates to isolate heat-cured samples. In tandem, colony PCR and Sanger sequencing (GENEWIZ) were performed to identify colonies with in-frame transposition products as described above. On the day of imaging, 500 μl of saturated overnight cultures was transferred to 5 ml of fresh LB medium with the appropriate antibiotics. Aliquots of the newly inoculated cultures were removed around the stationary or mid-log phases and immobilized on glass slides coated with partially dehydrated aqueous 1% agarose-TAE pads. Immediately after immobilization, fluorescence microscopy was performed with a Nikon ECLIPSE 80i microscope using a 100× oil immersion objective, which was equipped with a Spot CCD camera and SpotAdvance software. All images were processed in ImageJ by normalizing background fluorescence.

Western blots were performed as follows. E. coli cultures were diluted 1:100, and cells were grown to an OD600 of 0.6 by shaking at 37°C. 2.5 ml of culture were pelleted by centrifugation, and pellets were resuspended in 150 μl lysis buffer (20 mM Tris pH 7.5, 150 mM NaCl, 0.5 mg/ml lysozyme). Samples were incubated at room temperature for 10 min before adding 1% N-lauroyl sarcosine, 10 mM DTT, and 2X SDS loading dye (100 mM Tris-Cl (pH 6.8), 4% (w/v) SDS, 0.07% (w/v) bromophenol blue, 30% (v/v) glycerol), and boiling at 95°C for 10 min. 3 μl of each sample was run on a precast PAGE gel (Bio-Rad, Mini-PROTEAN TGX) at 100 V for 90 min. Proteins were transferred to a PVDF membrane (Invitrogen iBlot 2 Transfer Stack), and subsequent washing, blocking, and antibody incubation steps were performed by gentle nutation at room temperature. Membranes were washed once in 1× PBS, followed by three times in 1× PBS + 0.1% Tween 20. Blocking was performed for 1 h in blocking buffer (1× PBS, 0.1% Tween 20, 5% BSA), followed by two washes in 1× PBS + 0.1% Tween 20. Blots were incubated in primary antibodies diluted in blocking buffer for 1 h at room temperature with either a 1:2000 dilution of anti-GFP antibody (Anti-GFP from Mouse IgG1, Roche SKU 11814460001) or a 1:5000 dilution of anti-GAPDH antibody (GAPDH Loading Control Monoclonal Antibody, Invitrogen SKU MA5-15738). Primary antibodies were detected with HRP-conjugated anti-mouse-IgG1 (Goat Anti-Mouse IgG1 Heavy Chain HRP, Abcam SKU ab97240) diluted 1:50 000 in blocking buffer. Finally, enhanced chemiluminescence was performed (ThermoFisher SuperSignal West Dura Extended Duration Substrate), and blots were visualized on an Amersham Imager 600 (GE).

Generating and testing E. coli knockout mutants

E. coli genomic knockouts of ihfA, ihfB, ycbG, hupA, hupB, hns and fis were generated using Lambda Red recombineering, as previously described (35). Knockouts were designed to replace each gene with a kanamycin resistance cassette, which was amplified by PCR with Q5 High-Fidelity DNA Polymerase (NEB) using primers that contained 50-nt homology arms to the knockout gene locus. PCR amplicons were resolved on a 1% agarose gel and gel-purified, eluting with 40 μl Milli-Q H2O (QIAGEN Gel Extraction Kit). Electrocompetent E. coli BL21(DE3) cells were prepared containing a temperature-sensitive plasmid that encodes the Lambda Red machinery under the control of a temperature-sensitive promoter (pSIM6). Protein expression from the temperature-sensitive promoter was induced by incubating cells at 42°C for 25 min immediately prior to electrocompetent cell preparation. 300–600 ng of each insert was used to transform cells via electroporation (2 kV, 200 Ω, 25 μF), and cells were recovered overnight at 30°C by shaking in 3 ml of SOC medium. After recovery, 250 μl of culture was spread on 100 mm standard plates (LB-agar with 50 μg/ml kanamycin) and grown overnight at 30°C. Kanamycin-resistant colonies were isolated, and the genomic knock-in was confirmed by PCR amplification and Sanger sequencing (GENEWIZ) using primer pairs flanking the knock-in locus.
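The primer design described above can be sketched in a few lines of Python. This is only an illustration of how 50-nt homology arms might be joined to cassette-priming sequences: the function names, the argument layout, and the short priming sequences used in the usage note are hypothetical and are not taken from the original protocol.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def knockout_primers(upstream, downstream, cassette_fwd, cassette_rev, arm_len=50):
    """Build a primer pair for amplifying a resistance cassette with homology arms.

    upstream     : top-strand genomic sequence ending immediately 5' of the gene
    downstream   : top-strand genomic sequence starting immediately 3' of the gene
    cassette_fwd : priming sequence at the start of the cassette (top strand)
    cassette_rev : priming sequence at the end of the cassette, already written
                   5'->3' as a reverse primer (hypothetical input convention)
    """
    if len(upstream) < arm_len or len(downstream) < arm_len:
        raise ValueError("flanking sequences must be at least arm_len long")
    # Forward primer: last arm_len nt before the gene, then the cassette primer.
    forward = upstream[-arm_len:].upper() + cassette_fwd.upper()
    # Reverse primer: reverse complement of the first arm_len nt after the gene.
    reverse = reverse_complement(downstream[:arm_len]) + cassette_rev.upper()
    return forward, reverse
```

For example, `knockout_primers(upstream, downstream, "ATG", "TTA")` returns a forward primer whose 5' half matches the 50 nt upstream of the gene and a reverse primer whose 5' half is the reverse complement of the 50 nt downstream.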

VchCAST transposition experiments in E. coli knockout strains were performed by first preparing chemically competent WT and mutant cells and then transforming these strains with a single plasmid (pSPIN), which encodes the donor molecule and the native transposition machinery under the control of a T7 promoter and a crRNA targeting the lacZ genomic locus, on a pCDF backbone. After transformation by heat shock, cells were plated onto LB-agar with 100 μg/ml spectinomycin and 0.1 mM IPTG to induce protein expression, and incubated at 37°C for 18 h. Hundreds of colonies were scraped from each plate, and integration efficiencies were quantified using the same qPCR assay described for the endogenous protein tagging experiments. Transposition experiments for other Type I-F homologs were performed as in the VchCAST experiments, except that the concentration of IPTG was reduced to 0.01 mM to mitigate toxicity.

Experiments that tested protein expression conditions in WT and ΔIHF cells were performed as described in the VchCAST transposition experiments. Promoters were varied from constitutive promoters (J23119, J23101) to inducible promoters (T7), for which different concentrations of IPTG were also tested.

For the complementation experiments, cells were co-transformed with pSPIN and a rescue plasmid (pRescue) that encoded both E. coli ihfA and ihfB under the control of separate T7 promoters on a pACYC backbone, and plated onto LB-agar with 100 μg/ml spectinomycin, 25 μg/ml chloramphenicol, and 0.1 mM IPTG to induce protein expression. Cells were incubated at 37°C for 18 h, before colonies were scraped from each plate and integration efficiencies in both orientations were measured by qPCR.

To test DNA donor molecules with symmetric transposon ends, we cloned mutant pDonor encoding two right or two left transposon ends, and measured integration efficiency by co-transforming pDonor with pEffector under the control of a T7 promoter on a pCDF backbone. Cells were plated onto LB-agar with 100 μg/ml spectinomycin, 100 μg/ml carbenicillin, and 0.1 mM IPTG and incubated at 37°C for 18 h, before colonies were scraped from each plate and integration efficiencies in both orientations were measured by qPCR.

E. coli Tn7 transposition experiments and NGS analysis

To measure the integration efficiencies and distance distributions of E. coli Tn7 in WT and E. coli mutant cells, we cloned genomic primer binding sites into the mini-Tn cargo of a single plasmid for Tn7 transposition, which encoded a native tnsA-tnsB-tnsC-tnsD operon under the control of a constitutive pJ23119 promoter, on a pCDF backbone. The genomic primer binding sites were cloned adjacent to the transposon left and right ends such that the NGS amplicon length would be the same for unintegrated products and integrated products in either orientation (schematized in Supplementary Figure S7A). To quantify integration efficiencies using qPCR, we used primer pairs designed to amplify integrated products in both orientations, with one primer adjacent to the right transposon end and a second primer either upstream or downstream of the integration site.

To quantify integration efficiencies by NGS, we amplified genomic DNA using a single primer pair with one primer complementary to the genomic primer binding site and the second primer complementary to the 3’-end of the glmS locus. Genomic DNA was extracted using the Wizard Genomic DNA Purification Kit (Promega). 250 ng of genomic DNA was used in each PCR1 amplification with Q5 High-Fidelity DNA Polymerase (NEB) for 15 cycles. PCR1 samples were diluted 20-fold and amplified in 10 cycles for PCR2. Sequencing was performed with a paired-end run using a NextSeq High Output Kit with 150-cycles (Illumina).

NGS data analysis was performed using a custom Python script. Demultiplexed reads were filtered to remove reads that did not contain a perfect match to the first 65-bp of expected sequence resulting from either non-integrated genomic products or from integration events spanning 0- to 30-bp downstream of the glmS locus, and then we counted the number of reads matching each of these possible products.
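The filtering and counting step can be illustrated with the following minimal sketch. It is not the authors' script: the function name, the dictionary of expected products, and the requirement that every expected product has a distinct prefix are assumptions made for the example.

```python
from collections import Counter


def count_products(reads, expected_products, match_len=65):
    """Count reads whose first `match_len` bases perfectly match an expected product.

    expected_products maps a label (for example "unintegrated", or an
    integration distance in bp) to the expected amplicon sequence.
    Reads that match no expected prefix are discarded.
    """
    prefix_to_label = {}
    for label, seq in expected_products.items():
        prefix = seq[:match_len].upper()
        if len(prefix) < match_len:
            raise ValueError(f"expected sequence for {label!r} is shorter than match_len")
        if prefix in prefix_to_label:
            # Assumption: products must be distinguishable within match_len.
            raise ValueError(f"prefix for {label!r} is not unique")
        prefix_to_label[prefix] = label

    counts = Counter()
    for read in reads:
        label = prefix_to_label.get(read[:match_len].upper())
        if label is not None:
            counts[label] += 1
    return counts
```

In practice, `expected_products` would hold one entry for the non-integrated genomic product and one for each integration distance from 0 to 30 bp, and the reads would come from the demultiplexed FASTQ files.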

RESULTS

A pooled library approach to investigate transposon end sequence requirements

We set out to mutagenize the transposon left and right end sequences of V. cholerae Tn6677 (VchCAST) using large pooled oligoarray libraries, building off our previous study of the VchCAST system (23). Starting with a minimal pDonor design that directed efficient genomic integration in both of two possible orientations (Figure 1B), we designed thousands of variants of the left (L) and right (R) end sequences, including truncations, base-pair substitutions, and transposase binding site modifications (Figure 1C, Supplementary Figure S1A and Supplementary Tables S3 and S4). We assigned each variant a unique 10-bp barcode located between the transposon end variant and the cargo, obviating the requirement to sequence across the entire transposon end to identify each variant. Each library also included four wildtype (WT) variants associated with unique barcodes, which we used for downstream validation of our experimental setup and to approximate the relative integration efficiency of other library members. Libraries were then synthesized as single-stranded oligos, cloned into a mini-transposon donor (pDonor), and carefully characterized using next-generation sequencing (NGS), which demonstrated that all members were represented in the input sample for both transposon left and right end libraries (Supplementary Figure S1B–E).
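The barcode-based identification of library members can be sketched as a simple lookup. The constant `anchor` sequence assumed to sit immediately upstream of the barcode, the function name, and the variant labels are hypothetical; only the 10-bp barcode length comes from the text.

```python
from collections import Counter


def count_barcodes(reads, barcode_to_variant, anchor, barcode_len=10):
    """Assign each read to a library variant using its barcode.

    The barcode is taken as the `barcode_len` bases that immediately follow
    the first occurrence of `anchor` (a hypothetical constant flanking
    sequence). Reads without the anchor, or with a barcode that is not in
    the library, are skipped.
    """
    counts = Counter()
    for read in reads:
        anchor_start = read.find(anchor)
        if anchor_start < 0:
            continue
        barcode_start = anchor_start + len(anchor)
        barcode = read[barcode_start:barcode_start + barcode_len]
        variant = barcode_to_variant.get(barcode)
        if variant is not None:
            counts[variant] += 1
    return counts
```

Because only the barcode is read, the variant can be identified from a short read without sequencing across the full transposon end.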

We performed transposition experiments by transforming E. coli BL21(DE3) cells expressing the transposition machinery with pDonor encoding either the left end or right end library, amplifying successful genomic integration products in both orientations via junction PCR (Figure 1D), and subjecting PCR products to NGS analysis. An enrichment score was then calculated for each variant, revealing a wide range of integration efficiencies, with most library members exhibiting diminished integration relative to the four WT samples (Supplementary Figure S1E). Finally, we used enrichment scores of the WT library members for normalization, yielding a score for each variant that represented its relative activity. To validate our approach, we performed two biological replicates for each library transposition experiment and found strong concordance between both datasets, especially in the dominant orientation, ‘T-RL,’ in which the transposon was integrated in the right–left (‘RL’) orientation relative to the target site (Supplementary Figure S1F). Importantly, we also determined the background level of library member–barcode uncoupling, given the high degree of sequence similarity between library members, which established contributors of experimental noise in our datasets (Supplementary Figure S1C, D and Materials and Methods).
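A minimal sketch of this kind of normalization is shown below. The exact enrichment formula is defined in the Materials and Methods; the ratio of output to input read fractions used here, and the function and variable names, are assumptions chosen only to illustrate WT-normalized scoring.

```python
def relative_activity(input_counts, output_counts, wt_ids):
    """Return a WT-normalized enrichment score for each library variant.

    Enrichment is computed (as an assumption for this sketch) as the fraction
    of reads in the output sample divided by the fraction in the input sample.
    Scores are then divided by the mean enrichment of the WT members, so that
    WT activity is approximately 1.
    """
    input_total = sum(input_counts.values())
    output_total = sum(output_counts.values())

    enrichment = {}
    for variant, n_in in input_counts.items():
        if n_in == 0:
            continue  # a variant absent from the input cannot be scored
        input_fraction = n_in / input_total
        output_fraction = output_counts.get(variant, 0) / output_total
        enrichment[variant] = output_fraction / input_fraction

    wt_mean = sum(enrichment[wt] for wt in wt_ids) / len(wt_ids)
    return {variant: score / wt_mean for variant, score in enrichment.items()}
```

With this convention, a variant scoring 0.5 would be recovered at half the frequency of the WT members after accounting for its abundance in the input library.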

In previous work, we tested truncations of the transposon ends by cloning and testing transposon end variants individually (23). To explore the strength and verify the robustness of our pooled library approach, we generated a similar but vastly expanded panel of truncations by sequentially mutating the transposon end sequences, effectively creating end truncations, albeit without a change in overall mini-transposon size (Figure 1E and Supplementary Figure S1A). A single pooled transposition experiment then confirmed that, to facilitate efficient integration, a minimum of three transposase (TnsB) binding sites are required within the left end but only two are required within the right end. These findings are consistent with previous literature and add information at single-bp resolution to the minimal transposon end sequences for efficient integration (23).

Transposase activity depends on specific sequence requirements

TnsB is integral to the mobilization of Tn7-like transposons, in that it catalyzes the excision and integration chemistry while also conferring sequence specificity for the transposon ends through recognition of repetitive sequence elements known as TnsB binding sites (TBSs) (7,14,36). Sequence analysis of the native VchCAST ends revealed three conserved TBSs in both the left and right ends (Figure 2A, B and Supplementary Figure S2A) (23), and we verified these sequence requirements by examining a mutational panel at single-bp resolution (Figure 2C and Supplementary Figure S2B). This dataset revealed that individual TBS point mutations can affect efficiency, particularly for positions 1, 6–9 and 12–14 (of which all but position 9 are completely conserved across TnsB binding sites), but are not critical for integration. This lenient sequence requirement is in line with recently published cryo-EM structures of DNA-bound TnsB from Tn7 and Type V-K CAST systems, which revealed that many protein-DNA interactions occur with the phosphodiester backbone rather than specific nucleobases (37–39).

Experiments with E. coli Tn7 showed that the internal TBSs are occupied before the more terminal sites (7). Even though the majority of bases within the six TBSs of VchCAST are conserved (Figure 2B and Supplementary Figure S2A), we wondered if the existing differences might be biologically important, perhaps by enforcing a specific assembly pathway. To test this hypothesis, we examined all possible combinations of TBSs for the left and right ends, which we defined as L1–L3 and R1–R3 (Supplementary Figure S2C). For both VchCAST ends, site 1 displayed the strongest TBS preference and preferred the L1, L3 or R1 sequence, whereas site 2 preferred L1, R1 or R2; site 3 exhibited the weakest TBS preference but favored L3. We observed a preference for R1 in the first position on the left end, and a preference for L1 in the first position on the right end, suggesting that transposition might be favored when the terminal end sequences are identical (whether based on equal affinity or otherwise).

Apart from regulating transposition frequency, TBS sequence identity could also explain the propensity of a given CAST system to cross-react with related transposon substrates (17). We previously showed that VchCAST could efficiently mobilize mini-transposon substrates from three homologous CAST systems, but not from Tn7002. To determine which Tn7002 sequences were incompatible with mobilization by VchCAST machinery, we designed chimeric transposon ends that contain parts of both the VchCAST and Tn7002 transposon ends (Figure 2D). The data revealed that chimeric left ends allowed for near WT integration efficiencies whereas chimeric right ends drastically decreased integration efficiency, likely due to the deleterious presence of a cytidine at position 9 of R1–R3 (Figure 2D). These data thus demonstrate that TBS sequence identity imparts specific constraints on the substrate recognition of a transposase for its cognate transposon DNA.

Finally, we sought to investigate the conserved positioning of TBSs within the transposon ends, after hypothesizing that the specific distance between TBSs might facilitate proper assembly of transposase subunits within a paired-end-complex (PEC) (17). After testing a panel of variants in which the length between TBSs was systematically varied (Figure 2E and Supplementary Figure S2D), we found that even single-bp perturbations caused drastic changes in integration efficiency. Additionally, we detected an intriguing pattern of increasing and decreasing integration efficiencies at roughly 10-bp intervals, suggesting that the three-dimensional positioning of transposase proteins on helical DNA is important for transposition.
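The roughly 10-bp periodicity is what would be expected if the rotational phase of the binding sites around the DNA helix matters. The sketch below computes how far an inserted or deleted spacer rotates one site relative to another. It assumes a helical repeat of about 10.5 bp per turn for B-form DNA, and the function names are illustrative only.

```python
BP_PER_TURN = 10.5  # approximate helical repeat of B-form DNA (assumption)


def rotation_deg(delta_bp, bp_per_turn=BP_PER_TURN):
    """Rotation (0-360 degrees) introduced by changing a spacer by delta_bp."""
    return (delta_bp * 360.0 / bp_per_turn) % 360.0


def phase_offset_deg(delta_bp, bp_per_turn=BP_PER_TURN):
    """Smallest angular distance (0-180 degrees) from the original helical face."""
    rotation = rotation_deg(delta_bp, bp_per_turn)
    return min(rotation, 360.0 - rotation)


if __name__ == "__main__":
    for delta in range(0, 22):
        print(f"{delta:>3} bp  offset = {phase_offset_deg(delta):6.1f} deg")
```

Under this assumption, spacer changes near 10 or 21 bp return a site to nearly the same face of the helix, whereas changes near 5 or 16 bp place it on the opposite face, which is the kind of pattern that a periodic rise and fall in integration efficiency would reflect.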

Together, these data highlight the impact of TBS mutations and TBS sequence positioning on transposition, and provide clues about how TBSs may have evolved to direct efficient assembly of synaptic paired-end complexes.

Transposase sequence preferences influence integration site patterns

In our previous work, we showed that VchCAST integration patterns differed in subtle but reproducible ways between distinct genomic target sites (23,30). Since integration is the result of both RNA-guided DNA targeting and transposase-mediated DNA integration, we wondered which DNA sequences and protein machineries were responsible for the heterogeneity in integration products. First, we used deep sequencing to compare integration site patterns for four endogenous E. coli target sequences, designated 4–7, either at their native genomic location or on an ectopic target plasmid (Figure 3A). Integration site patterns were notably distinct between the four targets but were highly consistent between genomic and plasmid contexts, suggesting that these patterns are dependent on local sequence alone and independent of other factors such as DNA replication or local transcription. Next, to disentangle contributions of the 32-bp target sequence (complementary to crRNA guide) from the downstream region including the integration site, we tested target plasmids that contained chimeras of the four target regions (Figure 3A). Remarkably, integration patterns for these chimeric substrates closely mirrored the patterns observed for the non-chimeric substrates when the ‘downstream region’ was kept constant, clearly indicating that the sequence identity of the 32-bp target region alone does not modulate selection of the integration site.

We hypothesized that, like other transposases, TnsB might exhibit local sequence preferences immediately at the site of DNA insertion, and that these preferences could explain the observed heterogeneity in integration site patterns (40). To test this possibility, we generated a target plasmid (pTarget) library encoding two target sequences, designated ‘Target A’ and ‘Target B,’ flanking an 8-bp degenerate sequence, such that integration events directed by a crRNA matching either target would lead to insertion into the degenerate 8-mer sequence (Figure 3B). We sequenced the target plasmids before and after transposition and compared the representation of integration site sequences to determine which sequences were enriched after transposition. These analyses revealed striking nucleotide preferences at conserved positions relative to the integration site (Figure 3C and Supplementary Figure S3A). Specifically, there were clear biases for a YWR motif within the central three nucleotides of the target-site duplication (TSD), as well as a preference for D (A, T or G) and H (A, T or C) at the –3 and +3 positions relative to the TSD, respectively. Similar TSD preferences were previously observed for the Type V-K ShCAST system (24), suggesting that they may be broadly applicable to TnsB-family transposases.
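The preferences described above can be written as a single IUPAC pattern and checked against a candidate integration site. The sketch below assumes a 5-bp TSD and an 11-nt window that runs from the –3 position to the +3 position (D, two unconstrained bases, the TSD with its central YWR, two unconstrained bases, H). The window layout and function names are assumptions for illustration, not the analysis used in the study.

```python
import re

# Standard IUPAC nucleotide codes expressed as regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "D": "[AGT]", "H": "[ACT]",
    "B": "[CGT]", "V": "[ACG]", "N": "[ACGT]",
}

# Assumed window: -3 | -2 -1 | 5-bp TSD (N Y W R N) | +1 +2 | +3
SITE_PATTERN = "DNN" + "NYWRN" + "NNH"


def iupac_to_regex(motif):
    """Compile an IUPAC motif into a regular expression."""
    return re.compile("".join(IUPAC[code] for code in motif.upper()))


def matches_preferred_site(window, pattern=SITE_PATTERN):
    """True if an 11-nt window satisfies all of the preferred positions."""
    return iupac_to_regex(pattern).fullmatch(window.upper()) is not None
```

For example, `matches_preferred_site("ACCACAGTGGA")` is true because the window carries A at –3, CAG at the center of the TSD, and A at +3, whereas changing the first base to C violates the D preference.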

To further explore the deterministic role of the preferred motif within the TSD, we plotted the distribution of reads containing a central 5’-CWG-3’ motif at different positions within the degenerate sequence. We focused on this motif because it favored a more unimodal distribution for the integration site by avoiding a centrally-preferred A or T nucleotide flanking the W. We found that this motif was indeed predictive of the preferred integration site distance that was sampled by VchCAST (Figure 3D). We extended this observation by plotting the distribution of reads containing multiple 5’-CWG-3’ motifs within the integration site and found that two copies of this preferred motif within the integration site conferred a bimodal distribution, wherein there were not one but two preferred integration sites within the degenerate sequence (Supplementary Figure S3B). Finally, we examined the integration site distribution of previously targeted locations (23) and found that they corresponded to the preferred sequence motifs determined in our library experiment (Supplementary Figure S3C). Indeed, the dominant integration distance(s) always encoded more preferred motifs within the TSD relative to the motifs found at neighboring positions.
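Locating the 5’-CWG-3’ motif within each degenerate 8-mer is a short pattern search. The sketch below returns the 0-based start positions of every CWG (CAG or CTG) in a sequence, so that library members can be grouped by motif position or by motif count. The function name is illustrative.

```python
import re

CWG = re.compile(r"C[AT]G")


def cwg_positions(seq):
    """Return the 0-based start index of every CWG motif in the sequence."""
    return [match.start() for match in CWG.finditer(seq.upper())]
```

A sequence with one hit would be binned by that position, and a sequence with two hits would fall into the group expected to show two preferred integration sites.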

Both of the two distinct crRNAs and corresponding target sites on pTarget yielded consistent sequence preferences for both the TSD and ±3-bp positions (Supplementary Figure S3A), but we were surprised to find that the preferred integration distance was shifted by 1 bp when comparing the two (Supplementary Figure S3D). We suspected that this difference could be due to sequence preferences at the ±3-bp position that fell outside the degenerate sequence, and indeed, when we examined the sequences flanking the 8-mer library, we found that the downstream target (Target B) contained a disfavored nucleotide in the –3-bp position for insertions that would occur with the 49-bp distance (Supplementary Figure S3E). Interestingly, the role for these positions in modulating transposition behavior is supported by two recent structures of TnsB from Type V-K ShCAST bound to strand-transfer DNA (38,39), which revealed residue K290 of both terminal TnsB protomers in close proximity to the ±3-bp position outside of the target site duplication.

Role of boundary sequences and right end internal features on DNA integration

We next focused our attention on additional sequence features at the outermost edges of mini-transposon substrates. VchCAST and many other Tn7-like transposons encode an 8-bp terminal end immediately adjacent to the first transposase binding site, with the terminal TG dinucleotide highly conserved among a broad spectrum of transposons including IS3, Tn7, Mu and even retrotransposons (41–44). Integration data with library variants that featured mutations within these terminal residues revealed that positions 1–3, but not 4–8, were critical for efficient transposition (Supplementary Figure S4B). This result is consistent with the DNA-bound cryo-EM structure of TnsB from a Type V-K CAST system, in which base-specific interactions were observed for the terminal TG dinucleotide (38), and with experiments indicating that these terminal dinucleotides are important for the formation of a stable Mu transpososome complex (42,45). Sequences beyond the terminal TG are also acted upon during excision of Tn7-like transposons, since the endonuclease TnsA cleaves the 5’ ends of the donor DNA 3-bp outside the transposon end boundaries (46). Also, the sequences flanking the donor impact transposition efficiency of TnsB from a Type V-K CAST (47). These observations suggested the possibility that the sequence context of the transposon donor itself might play a role in efficient transposition. However, library variants with mutations in the 5-bp sequence flanking the mini-transposon were integrated with equivalent efficiencies (Supplementary Figure S4A), indicating that transposition machinery does not exhibit sequence specificity within this region.

To investigate whether the spacing between the terminal TG dinucleotide and the first TBS mattered, we tested variants that modulated the distance between the 8-bp terminal end and TBS1 (Supplementary Figure S4C). Adding a single base pair in either the left or right end still allowed for efficient transposition, whereas transposition was completely ablated with the removal of 1 bp or addition of 2 bp, indicating tight control over this spacing. Interestingly, larger bp additions or deletions between the TG dinucleotide and first TBS were in some cases also permitted, but always with a concomitant shift in the transposon boundary that was actually mobilized and integrated at the target site (Supplementary Figure S4C); in all cases, transposition still required a terminal TG. These data therefore suggest that the essential feature within the terminal end sequence is the TG dinucleotide, and that the ∼8-bp spacing between this dinucleotide and the first TBS is critical for efficient transposition.

We also further investigated the importance of a palindromic sequence found 97–107 bp from the transposon right end boundary. Previous work suggested that this sequence might affect integration orientation, possibly by promoting transcription of the tnsABC operon, which would be consistent with empirical expression data and the AT-richness of the transposon end (48). To test this possibility, we mutated the sequence and found that not the palindromic nature, but the sequence of only one arm of the palindrome (PB) was sufficient to shift the orientation bias away from T-RL (Supplementary Figure S4D, E). We also included bona fide constitutive promoters in place of the palindromic sequence and found that promoters directing transcription inwards (towards the cargo) did not impact integration orientation, whereas promoters directed outwards (across the right end) shifted the orientation preference towards T-LR, perhaps by antagonizing stable assembly of TnsB selectively at the right end (Supplementary Figure S4F). These data highlight the role of this right end sequence region on integration orientation, which should be considered when designing custom cargo sequences.

Endogenous protein tagging with rationally engineered right ends

The left and right end sequences are critical for transposon DNA recognition and excision/integration, and transposition products therefore necessarily include these sequences as ‘scars’ at the site of insertion. We sought to exploit this feature and use our new knowledge of the mutability of the transposon ends to convert these scars into functional sequences that encode amino acid linkers for downstream protein tagging applications. We focused on the shorter right end, starting with a minimal 57-bp sequence, and observed that stop codons were present in all three possible open reading frames (ORF) for the WT sequence (Figure 4A) (23). When we tested a library of rationally designed right end variants that replaced stop codons and codons encoding bulky and/or charged amino acids (Supplementary Figure S5A), we identified numerous candidates for each possible ORF that maintained near-wild-type integration efficiency (Supplementary Figure S5B). After validating library data by testing individual linker variants for genomic integration in E. coli (Figure 4B), we next set up a fluorescence-based assay to test for functionality of the encoded amino acid linkers.
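Checking a candidate right end for stop codons in each reading frame is straightforward to sketch. The function below scans only the strand it is given and reports stop codon positions for frames 0, 1 and 2. Which strand and frame correspond to the tagged gene depends on the construct, so that mapping is left to the user, and the function name is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stops_by_frame(seq):
    """Map each reading frame (0, 1, 2) to the start indices of its stop codons."""
    seq = seq.upper()
    result = {}
    for frame in range(3):
        result[frame] = [
            i
            for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS
        ]
    return result
```

A right end variant is a candidate linker for a given frame only if the list for that frame is empty.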

Figure 4.

Engineered transposon right ends enable functional in-frame protein tagging. (A) An illustration of the minimal transposon right end sequence (‘WT-min’) and the amino acids it encodes in three different reading frames. The 8-bp terminal end (yellow box) and TBSs (blue boxes) are shown. (B) Integration efficiencies for individual pDonor variants in which stop codons and codons encoding bulky/charged amino acids were replaced, as determined by qPCR. ‘Vector only’ refers to the negative control condition where pEffector was co-transformed with a vector that did not encode a transposon. (C) Select right end linker variants were cloned in between the 10th and 11th β-strands of GFP, in order to identify stable polypeptide linkers that still allow for proper formation and fluorescence activity of GFP. Normalized fluorescence intensity (NFI) was calculated using the optical density of each culture and is plotted for each linker variant alongside wildtype GFP. (D) Schematic of a proof-of-concept experiment in which the endogenous E. coli gene msrB is tagged with GFP by targeted, site-specific RNA-guided transposition. (E) Fluorescence microscopy images reveal functional tagging of MsrB with the linker variant right end, but not the WT, stop codon-containing right end. Scale bar represents 10 μm. (F) Western blots with anti-GFP antibody (top) and anti-GAPDH antibody (bottom) as loading control. The four samples are unmodified BL21(DE3) cells (‘–’), cells that underwent transposition with a GFP-encoding donor plasmid using either the WT transposon end (‘WT’) or the modified ORF2a transposon end (‘Variant’), and cells expressing a plasmid encoding GFP driven by a T7 promoter (‘pGFP’). The expected size of GFP alone is 26.8 kDa, while the expected size of the MsrB-GFP fusion product is ∼42 kDa.

GFP naturally consists of 11 β-strands that are connected by small loop regions, and a prior study demonstrated that the loop region between the 10th and 11th β-strand can be extended with novel linker sequences while still allowing for proper folding and fluorescence of the variant GFP protein (34). We cloned selected transposon right end variants into the loop region between β-strand 10 and 11 and measured GFP fluorescence intensity after expression of each construct, which revealed a subset of variants that were fully functional (Figure 4C and Supplementary Figure S5C). Next, we selected the endogenous E. coli gene msrB for C-terminal tagging in a proof-of-concept experiment (Figure 4D). msrB encodes the enzyme MsrB, a methionine sulfoxide reductase (49,50), which has been fluorescently tagged and imaged in an endogenous context by others (51) and contains a PAM sequence that allows for DNA insertion within the msrB stop codon, providing an ideal target for this initial trial. After generating a pDonor construct that encodes a right end linker variant with an adjacent, in-frame GFP gene lacking a promoter or start codon, we performed transposition experiments and used Sanger sequencing to verify that integration interrupted the endogenous stop codon while placing the linker and GFP sequence directly in-frame. Proper expression of MsrB-GFP fusion proteins was analyzed by fluorescence microscopy of cells that received either the WT transposon right end or the linker variant, demonstrating that only the modified right end variant elicited the expected cellular fluorescence (Figure 4E and Supplementary Figure S5D). Finally, to confirm that GFP was translationally fused to MsrB, we performed an anti-GFP western blot and found that GFP was not detected in the WT transposon end fusion but was detected at the expected size in the modified linker variant (Figure 4F). Together, these data provide the basis for new genome engineering tools that allow for facile, endogenous protein tagging with single-bp control.

Integration host factor (IHF) binds the left transposon end to stimulate transposition

Closer inspection of the transposon left end mutational data revealed a sequence between the two terminal TnsB binding sites (TBSs) that, when mutated, led to reproducible transposition defects (Figure 5A). We noticed that the corresponding DNA sequence perfectly matched a consensus binding sequence for integration host factor (IHF) (52,53), a heterodimeric nucleoid-associated protein (NAP) that binds to the consensus sequence 5’-WATCARNNNNTTR-3’ and induces a DNA bend of more than 160° (54). First identified as a host factor for bacteriophage λ integration, IHF is also involved in diverse cellular activities including chromosome replication initiation, transcriptional regulation, and various site-specific recombination pathways (55–57). This observation suggested the intriguing possibility that IHF might also play a role in RNA-guided transposition by CAST systems.
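A scan for the consensus on both strands can be sketched as follows. The pattern is a literal transcription of 5’-WATCARNNNNTTR-3’ as written above; the function names and the synthetic sequence in the usage note are illustrative, and hits on the minus strand are reported in the coordinates of the reverse-complemented sequence.

```python
import re

# 5'-WATCARNNNNTTR-3' written with explicit character classes.
# The lookahead lets overlapping candidate sites be reported.
IHF_CONSENSUS = re.compile(r"(?=([AT]ATCA[AG][ACGT]{4}TT[AG]))")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_ihf_sites(seq):
    """Return (strand, start, site) tuples for every consensus match.

    For the "-" strand, `start` is an index into reverse_complement(seq).
    """
    hits = []
    for strand, template in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for match in IHF_CONSENSUS.finditer(template):
            hits.append((strand, match.start(), match.group(1)))
    return hits
```

For instance, scanning the synthetic sequence `"GGAATCAGCCCCTTAGG"` reports a single plus-strand hit starting at index 2.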

Figure 5.

IHF involvement in RNA-guided transposition by VchCAST. (A) Library mutagenesis data for the transposon left end. Each point represents the effect of 4-bp mutations, averaged across four variants per base. (B) Integration activity of VchCAST in WT, ΔihfA, and ΔihfB cells. Integration activity was rescued by a plasmid encoding both ihfA and ihfB (pRescue). Each point represents integration efficiency measured by qPCR for one independent biological replicate. (C) Integration activity when the IHF binding site is mutated (Mut), in which all consensus bases within the binding site were modified (from 5’-AATCAGCAAACTTA-3’ to 5’-CCGACTCAACGGC-3’). (D) Conservation of the IHF binding site in the transposon left end of 20 Type I-F CAST systems [first described in (17)]. (E) Sequence logo generated by aligning the left end sequence of all homologs around the conserved IHF binding site. (F) Integration activity in WT and ΔihfA cells for five Type I-F CAST systems. Asterisks indicate the degree of statistical significance: * P ≤ 0.05, ** P ≤ 0.01, ***P ≤ 0.001. (G) Model: IHF binds the left end to resolve the spacing between the first two TBSs, bringing together TnsB protomers to form an active transpososome.

To test whether the IHF binding site in the left transposon end functions to promote transposition, we first generated IHF knockout strains by mutating either ihfA or ihfB, and then measured integration efficiency with WT VchCAST. Deletion of either ihfA or ihfB decreased integration efficiency in the mutant strains by ∼20-fold (Figure 5B), and this effect was completely rescued when we introduced a plasmid encoding recombinant ihfA and ihfB (pRescue), confirming the IHF knockouts as causative genetic perturbations (Figure 5B). Interestingly, the reduction in integration efficiency was sensitive to vector design and expression conditions, as integration was less dependent on IHF when the donor DNA was encoded on a separate plasmid from the transposition machinery than when both were encoded on the same plasmid (Supplementary Figure S6A). This sensitivity to vector design may be due to differences in the expression of transposition proteins. Even though cells were always grown for 18 h after transformation, when a separate plasmid was used to express the transposition machinery (‘pEffector + pDonor’ conditions), cells already contained the effector plasmid before they were transformed with donor DNA. This longer time for effector proteins to be expressed may have increased transposition efficiency in ΔIHF cells for these conditions. When we selectively mutated the conserved IHF binding site residues of a transposon donor, we found that transposition efficiency decreased (Figure 5C). Moreover, sensitivity to ΔIHF was dependent on the presence of an intact IHF binding site, since the loss of IHF in cells containing a mutant binding site did not cause an additional decrease in transposition efficiency. In other words, our results indicate that mutating the IHF binding site is epistatic to the loss of IHF. These experiments indicate that IHF binds the left transposon end to stimulate RNA-guided transposition.

We next wondered whether the IHF requirement was conserved across diverse I-F CAST systems, taking advantage of the twenty homologous systems that we recently described (17). Visual examination of the transposon left ends revealed a highly conserved IHF binding site across all homologs (Figure 5D, E), and aligning the sequence between the first two TBSs using Clustal Omega also revealed the binding site consensus as a conserved feature (Supplementary Figure S6B). To test whether IHF stimulated transposition for these systems, we performed experiments in WT and ΔIHF cells for five other systems and found that only two (Tn7000 and Tn7014) showed a strong IHF dependence (Figure 5F). These data suggest that the IHF dependence may not be conserved across all I-F CAST systems.

Given the involvement of IHF and, more generally, the importance of donor/target DNA supercoiling and topology for other mobile elements (58,59), we decided to test whether other E. coli NAPs might play a role in transposition. We generated knockout strains of 5 additional NAP genes (ycbG, hupA, hupB, hns, and fis), which play architectural roles in DNA compaction and organization and affect a variety of cellular processes such as transcription, replication, recombination, repair and SOS response (60–62). We measured integration efficiency within these mutant backgrounds and found that only the loss of fis affected transposition, decreasing integration efficiency by 2-fold (Supplementary Figure S6F). When we tested the same cohort of NAP knockouts for transposition with the prototypic Tn7 system, IHF had no effect whereas Fis again influenced integration efficiency, though with a ∼4-fold increase in the knockout strain (Supplementary Figure S7B). Fis (factor for inversion stimulation) plays diverse roles in altering DNA topology, mediating DNA inversions, and regulating gene expression (63–65); these varied roles, and the lack of a clearly defined consensus sequence, make it difficult to know how Fis impacts transposition in either system, or whether changes in integration efficiency might instead be indirect effects. Interestingly, our amplicon-sequencing detection approach for E. coli Tn7 transposition (Supplementary Figure S7A) also yielded new information about the nature of DNA integration products for the well-studied TnsABCD pathway. Whereas prior studies identified a single integration site downstream of the essential glmS gene (66–68), our more high-throughput analyses were able to uncover additional integration events that sampled a wider sequence space, including rare but reproducible transposition products in the less-common T-LR orientation (Supplementary Figure S7C). These findings highlight the value of deep sequencing to thoroughly and unbiasedly query the range of potential integration products for a given transposable element.

Finally, we decided to investigate whether IHF might also bias the orientation of transposon integration for CAST systems, since the IHF binding site is uniquely present within the transposon left end. After testing bidirectional transposition for two CAST systems in both a WT and ΔIHF strain of E. coli, we found that although the loss of IHF did not affect orientation preference for VchCAST, its loss reversed the dominant orientation for Tn7000 from T-RL to T-LR (Supplementary Figure S6C). This result raises the intriguing possibility that IHF may be involved in establishing a transpososome architecture that controls the directionality of DNA insertions. Previous work with the prototypic Tn7 system found that transposon substrates with two right ends were competent for integration whereas two left ends were not (12), and we wondered whether a symmetric VchCAST donor with two right ends would similarly be competent for transposition while also eliminating IHF dependency. In agreement with this hypothesis, the loss of IHF had no impact on transposition with a substrate containing two transposon right ends, which was integrated without orientation bias, while a substrate containing two left ends exhibited severely reduced integration efficiency that retained a dependence on IHF (Supplementary Figure S6D, E). Overall, our data support a model (Figure 5G) in which IHF binds the region between TBSs L1 and L2 to bend the transposon left end and drive DNA integration, akin to the proposed role of HU in Mu transposition (11). This model is also similar to the role of IHF in CRISPR adaptation, during which IHF binds and bends the leader sequence of the CRISPR array to recruit the Cas1–Cas2 integrase and drive the specificity of leader-proximal integrations (69–71).

DISCUSSION

RNA-guided DNA integration by CRISPR-associated transposons depends on diverse, sequence-specific nucleic acid determinants. Focusing on VchCAST, a highly efficient and accurate CAST system derived from Vibrio cholerae (also known as Tn6677) (23,30), we employed high-throughput screening methods to systematically investigate and characterize these sequence requirements in this study. We first determined the minimal transposon sequences needed for robust activity and validated the importance of each transposase binding site (TBS) found within both left and right ends. Interestingly, our data revealed a broad degree of tolerance to mutagenesis of individual TBSs, a feature corroborated by recent TnsB transposase-DNA structures that show interactions mainly with the DNA backbone rather than specific nucleobases (37–39). The presence of multiple binding sites within each transposon end might allow for cumulative specificity and affinity, and likely plays a role in regulating transposition frequency. Our results furthermore suggest that the asymmetric nature of the two transposon ends controls the idiosyncratic preferences of a given element for integrating in one orientation over another.

One limitation of our experimental setup for the transposon end libraries is that we could not directly compare relative integration orientation within the same NGS libraries, since integration events were amplified independently in the T-RL and T-LR orientations. Instead, we inferred approximate integration efficiencies by comparing the enrichment scores of transposon end variants to those of wildtype variants within the same library. We also note that our strategy involved separate mutagenesis of either the left end or right end, but not both transposon ends simultaneously. Lastly, we stress that all transposition assays with pDonor libraries were performed heterologously in E. coli under overexpression conditions, and thus subtleties of transposon end recognition and binding that depend on regulated TnsB expression levels may be obscured.
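The comparison of variant enrichment to wildtype enrichment within one library can be sketched as follows. This is a minimal illustration, not the authors' analysis scripts (those are in the repository cited under Data Availability): it assumes that enrichment is the log2 ratio of a variant's read fraction after integration to its read fraction in the input library, and the function names, the pseudocount, and the read counts are all hypothetical.

```python
import math


def enrichment_scores(input_counts, output_counts, pseudocount=1.0):
    """Return log2(output fraction / input fraction) for every variant.

    Assumed definition of enrichment; the pseudocount (hypothetical choice)
    keeps variants with zero output reads from producing log2(0).
    """
    variants = sorted(set(input_counts) | set(output_counts))
    in_total = sum(input_counts.get(v, 0) + pseudocount for v in variants)
    out_total = sum(output_counts.get(v, 0) + pseudocount for v in variants)
    scores = {}
    for v in variants:
        in_frac = (input_counts.get(v, 0) + pseudocount) / in_total
        out_frac = (output_counts.get(v, 0) + pseudocount) / out_total
        scores[v] = math.log2(out_frac / in_frac)
    return scores


def relative_to_wildtype(scores, wildtype_ids):
    """Subtract the mean wildtype score so that wildtype sits at 0."""
    wt_mean = sum(scores[w] for w in wildtype_ids) / len(wildtype_ids)
    return {v: s - wt_mean for v, s in scores.items()}


# Hypothetical read counts for one library (one orientation only).
input_reads = {"WT_1": 100, "WT_2": 100, "mut_A": 100, "mut_B": 100}
output_reads = {"WT_1": 200, "WT_2": 200, "mut_A": 50, "mut_B": 0}

relative = relative_to_wildtype(
    enrichment_scores(input_reads, output_reads), ["WT_1", "WT_2"]
)
for name, value in sorted(relative.items()):
    print(f"{name}: {value:+.2f}")
```

Because each score is normalized to wildtype members of the same library, values from the T-RL and T-LR libraries remain on separate scales, which mirrors the limitation described above.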

We uncovered additional regions within the transposon ends that drastically affect integration efficiencies, including a sensitive region within the left end that ultimately revealed a conserved binding site for integration host factor (IHF). Transposition assays with perturbations of the IHF binding site, and in E. coli strains lacking IHF, demonstrated that IHF is critical for efficient transposition of VchCAST and some, but not all, homologous Type I-F CAST systems, at least under the conditions we tested. Systems that were insensitive may still exploit IHF to increase transposition in native environments, where other transposition components may not be as abundant as in our overexpression set-up, or these systems may have evolved to bypass this molecular requirement altogether. For VchCAST, where the effect is clear, we propose that IHF is important for the proper quaternary organization of the transpososome, given the role that IHF plays in bending its bound DNA (54,57). This hypothesis is further supported by transposon end variants containing alternate spacing between the TBSs, which revealed a conserved periodicity that is consistent with the helical nature of double-stranded DNA. It is striking that, although Type I-F CASTs rely on a multitude of transposon-encoded genes, diverse DNA sequence determinants, and potential additional host-encoded factors, heterologous assays in E. coli with twenty CASTs from a range of gammaproteobacteria revealed active transposition for all (17). How and why mobile genetic elements would evolve dependencies on host-specific factors are questions that encourage further research into the regulation of transposition and search for additional accessory factors (72), especially in native host organisms.
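The link between spacing and helical periodicity can be illustrated with a short calculation. The sketch below is not taken from our analysis pipeline: it assumes an approximate B-form helical repeat of 10.5 bp per turn, and the function name is hypothetical. It converts a change in spacer length between two binding sites into the rotational offset that change would introduce between proteins bound at those sites.

```python
BP_PER_TURN = 10.5  # assumed approximate helical repeat of B-form DNA


def rotational_offset_deg(delta_bp, bp_per_turn=BP_PER_TURN):
    """Angular offset, in degrees within (-180, 180], caused by lengthening
    (positive delta_bp) or shortening (negative delta_bp) a spacer."""
    angle = (delta_bp / bp_per_turn * 360.0) % 360.0
    return angle - 360.0 if angle > 180.0 else angle


# Spacer changes near a full turn keep two sites on roughly the same face of
# the helix, whereas changes near half a turn place them on opposite faces.
for delta in (-10, -5, 0, 5, 10, 21):
    print(f"{delta:+3d} bp -> {rotational_offset_deg(delta):+7.1f} degrees")
```

Under this assumption, spacing variants that differ by about one helical turn would be expected to behave similarly, while those that differ by about half a turn would not, which is the kind of periodic pattern described above.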

We also analyzed sequence biases at the site of integration and found a clear preference for insertions into sites containing a central 5’-YWR-3’ motif, with additional nucleotide preferences 3-bp upstream and downstream of the TSD. Interestingly, these are the same regions that appear to make direct contacts with the TnsB transposase from a Type V-K CAST (38). Remarkably, by projecting this new information onto the integration site patterns we previously obtained for a panel of genomic target sites in E. coli, we were able to explain the observed product heterogeneity, thus enabling guide RNA selection with high predictability for integration products at single-bp resolution. Finally, we exploited our dataset on transposon end mutability and integration site preference to design modified transposon variants that enabled in-frame tagging of endogenous protein-coding genes. In a proof-of-concept experiment, we tagged the endogenous E. coli MsrB protein with GFP using a modified short transposon right end and an in-frame gfp gene within the transposon cargo, and similar efforts should enable in-frame tagging in other cell types, where transposon end ‘scars’ are converted into functional sequence modifications.
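The central 5’-YWR-3’ preference can be checked against a candidate site with a few lines of code. This is a minimal sketch rather than our analysis code: it uses the standard IUPAC ambiguity codes (Y = C/T, W = A/T, R = A/G), assumes a 5-bp TSD whose middle three bases carry the motif (an assumption not restated in this section), and the function names and example sequences are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes (subset sufficient for this motif).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "N": "ACGT",
}


def matches_iupac(seq, motif):
    """True if seq matches the IUPAC motif position by position."""
    if len(seq) != len(motif):
        return False
    return all(
        base in IUPAC[code] for base, code in zip(seq.upper(), motif.upper())
    )


def central_motif_matches(tsd, motif="YWR"):
    """True if the central bases of the TSD match the motif.

    Assumes the motif can be centred exactly, e.g. 3 bases within a 5-bp TSD.
    """
    extra = len(tsd) - len(motif)
    if extra < 0 or extra % 2:
        raise ValueError("motif cannot be centred in this TSD")
    start = extra // 2
    return matches_iupac(tsd[start:start + len(motif)], motif)


# Hypothetical 5-bp TSD sequences.
for tsd in ("GCAGT", "ATTAC", "GGAGT", "ACACC"):
    label = "preferred" if central_motif_matches(tsd) else "disfavored"
    print(tsd, label)
```

Applied across the possible insertion positions downstream of a target site, a check of this kind could help flag which position is expected to dominate, although it ignores the additional flanking preferences noted above.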

Our work demonstrates the power of combining rationally designed libraries with deep sequencing approaches. We reveal new insights on the molecular mechanism of RNA-guided transposition while also building a register, at single-bp resolution, of which bases can and cannot be mutated for engineering purposes. We envision combining these insights with future structural data to enable opportunities for rational design of hyperactive transposon end sequences that improve integration activity in other cellular contexts. Collectively, these insights inform both the biology and application potential of CAST systems.

DATA AVAILABILITY

High-throughput sequencing data are available at the National Center for Biotechnology Information (NCBI) Sequence Read Archive (BioProject Accession: PRJNA919078). Custom scripts used for analyses of high-throughput sequencing data are available at GitHub (https://github.com/sternberglab/Walker_Klompe_etal_2023) and on Zenodo (DOI 10.5281/zenodo.7776252). Datasets generated and analyzed in the current study are available from the corresponding authors on reasonable request.

ACKNOWLEDGEMENTS

We thank S.R. Pesari for laboratory support; N.E. Sanjana for helpful discussions about pooled library experiments; M.A. Hydorn and J.E. Dworkin for fluorescence microscopy support and microscope access; L.F. Landweber for qPCR instrument access; E. Moore and A. Mor for the anti-GFP antibody; and the staff at the JP Sulzberger Columbia Genome Center for NGS support.

Contributor Information

Matt W G Walker, Department of Biological Sciences, Columbia University, New York, NY 10027, USA.

Sanne E Klompe, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

Dennis J Zhang, Department of Biological Sciences, Columbia University, New York, NY 10027, USA.

Samuel H Sternberg, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Institutes of Health [DP2HG011650, R21AI168976 to S.H.S.]; Pew Biomedical Scholars Program (to S.H.S.); Alfred Sloan Foundation Research Fellowship (to S.H.S.); Irma T. Hirschl Career Scientist Award (to S.H.S.); National Science Foundation (GRFP to M.W.G.W.). Funding for open access charge: Sternberg Lab, Columbia University.

Conflict of interest statement. Columbia University has filed a patent application related to this work for which M.W.G.W., S.E.K., D.J.Z. and S.H.S. are inventors. M.W.G.W., S.E.K. and S.H.S. are inventors on other patents and patent applications related to CRISPR-Cas systems and uses thereof. M.W.G.W. is a co-founder of Can9 Bioengineering. S.H.S. is a co-founder and scientific advisor to Dahlia Biosciences, a scientific advisor to CrisprBits and Prime Medicine, and an equity holder in Dahlia Biosciences and CrisprBits.

REFERENCES

  • 1. Feschotte C., Pritham E.J. DNA transposons and the evolution of eukaryotic genomes. Genetics. 2007; 41:331–368.
  • 2. Dubin M.J., Scheid O.M., Becker C. Transposons: a blessing curse. Curr. Opin. Plant Biol. 2018; 42:23–29.
  • 3. Kidwell M.G., Lisch D.R. Perspective: transposable elements, parasitic DNA, and genome evolution. Evolution. 2001; 55:1–24.
  • 4. Hickman A.B., Dyda F. Mechanisms of DNA transposition. Microbiol Spectr. 2015; 3:MDNA3–A0034–2014.
  • 5. Hickman A.B., Dyda F. DNA transposition at work. Chem. Rev. 2016; 116:12758–12784.
  • 6. Richardson J.M., Dawson A., O’hagan N., Taylor P., Finnegan D.J., Walkinshaw M.D. Mechanism of Mos1 transposition: insights from structural analysis. EMBO J. 2006; 25:1324–1334.
  • 7. Arciszewska L.K., Craig N.L. Interaction of the Tn7-encoded transposition protein TnsB with the ends of the transposon. Nucleic Acids Res. 1991; 19:5021–5029.
  • 8. Ghanim G.E., Rio D.C., Teixeira F.K. Mechanism and regulation of P element transposition. Open Biol. 2020; 10:200244.
  • 9. Hickman A.B., Ewis H.E., Li X., Knapp J.A., Laver T., Doss A.-L., Tolun G., Steven A.C., Grishaev A., Bax A. et al. Structural basis of hAT transposon end recognition by Hermes, an octameric DNA transposase from Musca domestica. Cell. 2014; 158:353–367.
  • 10. Chen Q., Luo W., Veach R.A., Hickman A.B., Wilson M.H., Dyda F. Structural basis of seamless excision and specific targeting by piggyBac transposase. Nat. Commun. 2020; 11:3446.
  • 11. Montaño S.P., Pigli Y.Z., Rice P.A. The Mu transpososome structure sheds light on DDE recombinase evolution. Nature. 2012; 491:413–417.
  • 12. Arciszewska L.K., Drake D., Craig N.L. Transposon Tn7 cis-acting sequences in transposition and transposition immunity. J. Mol. Biol. 1989; 207:35–52.
  • 13. Sarnovsky R.J., May E.W., Craig N.L. The Tn7 transposase is a heteromeric complex in which DNA breakage and joining activities are distributed between different gene products. EMBO J. 1996; 15:6348–6361.
  • 14. Tang Y., Lichtenstein C., Cotterill S. Purification and characterisation of the TnsB protein of Tn7: a transposition protein that binds to the ends of Tn7. Nucleic Acids Res. 1991; 19:3395–3402.
  • 15. Choi K.Y., Spencer J.M., Craig N.L. The Tn7 transposition regulator TnsC interacts with the transposase subunit TnsB and target selector TnsD. Proc. Natl. Acad. Sci. U.S.A. 2014; 111:E2858–E2865.
  • 16. Stellwagen A.E., Craig N.L. Mobile DNA elements: controlling transposition with ATP-dependent molecular switches. Trends Biochem. Sci. 1998; 23:486–490.
  • 17. Klompe S.E., Jaber N., Beh L.Y., Mohabir J.T., Bernheim A., Sternberg S.H. Evolutionary and mechanistic diversity of Type I-F CRISPR-associated transposons. Mol. Cell. 2022; 82:616–628.
  • 18. Petassi M.T., Hsieh S.-C., Peters J.E. Guide RNA categorization enables target site choice in Tn7-CRISPR-Cas transposons. Cell. 2020; 183:1757–1771.
  • 19. Mitra R., McKenzie G.J., Yi L., Lee C.A., Craig N.L. Characterization of the TnsD-attTn7 complex that promotes site-specific insertion of Tn7. Mobile DNA-UK. 2010; 1:18.
  • 20. Waddell C.S., Craig N.L. Tn7 transposition: recognition of the attTn7 target sequence. Proc. Natl. Acad. Sci. U.S.A. 1989; 86:3958–3962.
  • 21. Peters J.E., Fricker A.D., Kapili B.J., Petassi M.T. Heteromeric transposase elements: generators of genomic islands across diverse bacteria. Mol. Microbiol. 2014; 93:1084–1092.
  • 22. Peters J.E. Targeted transposition with Tn7 elements: safe sites, mobile plasmids, CRISPR/Cas and beyond. Mol. Microbiol. 2019; 112:1635–1644.
  • 23. Klompe S.E., Vo P.L.H., Halpin-Healy T.S., Sternberg S.H. Transposon-encoded CRISPR–Cas systems direct RNA-guided DNA integration. Nature. 2019; 571:219–225.
  • 24. Strecker J., Ladha A., Gardner Z., Schmid-Burgk J.L., Makarova K.S., Koonin E.V., Zhang F. RNA-guided DNA insertion with CRISPR-associated transposases. Science. 2019; 365:48–53.
  • 25. Saito M., Ladha A., Strecker J., Faure G., Neumann E., Altae-Tran H., Macrae R.K., Zhang F. Dual modes of CRISPR-associated transposon homing. Cell. 2021; 184:2441–2453.
  • 26. Peters J.E., Makarova K.S., Shmakov S., Koonin E.V. Recruitment of CRISPR-Cas systems by Tn7-like transposons. Proc. Natl. Acad. Sci. U.S.A. 2017; 114:E7358–E7366.
  • 27. Faure G., Shmakov S.A., Yan W.X., Cheng D.R., Scott D.A., Peters J.E., Makarova K.S., Koonin E.V. CRISPR–Cas in mobile genetic elements: counter-defence and beyond. Nat. Rev. Microbiol. 2019; 17:513–525.
  • 28. Halpin-Healy T.S., Klompe S.E., Sternberg S.H., Fernández I.S. Structural basis of DNA targeting by a transposon-encoded CRISPR–Cas system. Nature. 2020; 577:271–274.
  • 29. Hoffmann F.T., Kim M., Beh L.Y., Wang J., Vo P.L.H., Gelsinger D.R., George J.T., Acree C., Mohabir J.T., Fernández I.S. et al. Selective TnsC recruitment enhances the fidelity of RNA-guided transposition. Nature. 2022; 609:384–393.
  • 30. Vo P.L.H., Ronda C., Klompe S.E., Chen E.E., Acree C., Wang H.H., Sternberg S.H. CRISPR RNA-guided integrases for high-efficiency, multiplexed bacterial genome engineering. Nat. Biotechnol. 2021; 39:480–489.
  • 31. Lichtenstein C., Brenner S. Site-specific properties of Tn7 transposition into the E. coli chromosome. Mol. Gen. Genet. 1981; 183:380–387.
  • 32. Hegde M., Strand C., Hanna R.E., Doench J.G. Uncoupling of sgRNAs from their associated barcodes during PCR amplification of combinatorial CRISPR screens. PLoS One. 2018; 13:e0197547.
  • 33. Vo P.L.H., Acree C., Smith M.L., Sternberg S.H. Unbiased profiling of CRISPR RNA-guided transposition products by long-read sequencing. Mobile DNA-UK. 2021; 12:13.
  • 34. Feng S., Sekine S., Pessino V., Li H., Leonetti M.D., Huang B. Improved split fluorescent proteins for endogenous protein labeling. Nat. Commun. 2017; 8:370.
  • 35. Sharan S.K., Thomason L.C., Kuznetsov S.G., Court D.L. Recombineering: a homologous recombination-based method of genetic engineering. Nat. Protoc. 2009; 4:206–223.
  • 36. McKown R.L., Waddell C.S., Arciszewska L.K., Craig N.L. Identification of a transposon Tn7-dependent DNA-binding activity that recognizes the ends of Tn7. Proc. Natl. Acad. Sci. U.S.A. 1987; 84:7807–7811.
  • 37. Kaczmarska Z., Czarnocki-Cieciura M., Górecka-Minakowska K.M., Wingo R.J., Jackiewicz J., Zajko W., Poznański J.T., Rawski M., Grant T., Peters J.E. et al. Structural basis of transposon end recognition explains central features of Tn7 transposition systems. Mol. Cell. 2022; 82:2618–2632.
  • 38. Park J.-U., Tsai A.W.-L., Chen T.H., Peters J.E., Kellogg E.H. Mechanistic details of CRISPR-associated transposon recruitment and integration revealed by cryo-EM. Proc. Natl. Acad. Sci. U.S.A. 2022; 119:e2202590119.
  • 39. Tenjo-Castaño F., Sofos N., López-Méndez B., Stutzke L.S., Fuglsang A., Stella S., Montoya G. Structure of the TnsB transposase-DNA complex of type V-K CRISPR-associated transposon. Nat. Commun. 2022; 13:5792.
  • 40. Green B., Bouchier C., Fairhead C., Craig N.L., Cormack B.P. Insertion site preference of Mu, Tn5, and Tn7 transposons. Mobile DNA-UK. 2012; 3:3.
  • 41. Fayet O., Ramond P., Polard P., Prère M.F., Chandler M. Functional similarities between retroviruses and the IS3 family of bacterial insertion sequences? Mol. Microbiol. 1990; 4:1771–1777.
  • 42. Tang Y., Cotterill S., Lichtenstein C.P. Genetic analysis of the terminal 8-bp inverted repeats of transposon Tn7. Gene. 1995; 162:41–46.
  • 43. Craig N.L., Chandler M., Gellert M., Lambowitz A.M., Rice P.A., Sandmeyer S.B. Mobile DNA III. 2015; Washington, DC: ASM Press.
  • 44. Mizuuchi K. Transpositional recombination: mechanistic insights from studies of Mu and other elements. Annu. Rev. Biochem. 1992; 61:1011–1051.
  • 45. Lee I., Harshey R.M. Importance of the conserved CA dinucleotide at Mu termini. J. Mol. Biol. 2001; 314:433–444.
  • 46. Bainton R., Gamas P., Craig N.L. Tn7 transposition in vitro proceeds through an excised transposon intermediate generated by staggered breaks in DNA. Cell. 1991; 65:805–816.
  • 47. Tou C.J., Orr B., Kleinstiver B.P. Precise cut-and-paste DNA insertion using engineered type V-K CRISPR-associated transposases. Nat. Biotechnol. 2023; 10.1038/s41587-022-01574-x.
  • 48. Zhang Y., Yang J., Yang S., Zhang J., Chen J., Tao R., Jiang Y., Yang J., Yang S. Programming cells by multicopy chromosomal integration using CRISPR-associated transposases. Crispr J. 2021; 4:350–359.
  • 49. Ezraty B., Aussel L., Barras F. Methionine sulfoxide reductases in prokaryotes. Biochim. Biophys. Acta. 2005; 1703:221–229.
  • 50. Boschi-Muller S., Olry A., Antoine M., Branlant G. The enzymology and biochemistry of methionine sulfoxide reductases. Biochim. Biophys. Acta. 2005; 1703:231–238.
  • 51. Watt R.M., Wang J., Leong M., Kung H., Cheah K.S.E., Liu D., Danchin A., Huang J.-D. Visualizing the proteome of Escherichia coli: an efficient and versatile method for labeling chromosomal coding DNA sequences (CDSs) with fluorescent protein genes. Nucleic Acids Res. 2007; 35:e37.
  • 52. Friedman D.I. Integration host factor: a protein for all reasons. Cell. 1988; 55:545–554.
  • 53. Wang S., Cosstick R., Gardner J.F., Gumport R.I. The specific binding of Escherichia coli integration host factor involves both major and minor grooves of DNA. Biochemistry-US. 1995; 34:13082–13090.
  • 54. Rice P.A., Yang S., Mizuuchi K., Nash H.A. Crystal structure of an IHF-DNA complex: a protein-induced DNA U-turn. Cell. 1996; 87:1295–1306.
  • 55. Miller H.I., Kikuchi A., Nash H.A., Weisberg R.A., Friedman D.I. Site-specific recombination of bacteriophage: the role of host gene products. Cold Spring Harb. Symp. 1979; 43:1121–1126.
  • 56. Kikuchi A., Flamm E., Weisberg R.A. An Escherichia coli mutant unable to support site-specific recombination of bacteriophage λ. J. Mol. Biol. 1985; 183:129–140.
  • 57. Swinger K.K., Rice P.A. IHF and HU: flexible architects of bent DNA. Curr. Opin. Struct. Biol. 2004; 14:28–35.
  • 58. Wang Z., Harshey R.M. Crucial role for DNA supercoiling in Mu transposition: a kinetic study. Proc. Natl. Acad. Sci. U.S.A. 1994; 91:699–703.
  • 59. Chalmers R., Guhathakurta A., Benjamin H., Kleckner N. IHF modulation of Tn10 transposition: sensory transduction of supercoiling status via a proposed protein/DNA molecular spring. Cell. 1998; 93:897–908.
  • 60. Dillon S.C., Dorman C.J. Bacterial nucleoid-associated proteins, nucleoid structure and gene expression. Nat. Rev. Microbiol. 2010; 8:185–195.
  • 61. Hołówka J., Zakrzewska-Czerwińska J. Nucleoid associated proteins: the small organizers that help to cope with stress. Front Microbiol. 2020; 11:590.
  • 62. Mercier R., Petit M.-A., Schbath S., Robin S., Karoui M.E., Boccard F., Espéli O. The MatP/matS site-specific system organizes the terminus region of the E. coli chromosome into a macrodomain. Cell. 2008; 135:475–485.
  • 63. Schneider R., Lurz R., Lüder G., Tolksdorf C., Travers A., Muskhelishvili G. An architectural role of the Escherichia coli chromatin protein FIS in organising DNA. Nucleic Acids Res. 2001; 29:5107–5114.
  • 64. Bradley M.D., Beach M.B., Koning A.P.J.d., Pratt T.S., Osuna R. Effects of Fis on Escherichia coli gene expression during different growth stages. Microbiology+. 2007; 153:2922–2940.
  • 65. Finkel S.E., Johnson R.C. The Fis protein: it's not just for DNA inversion anymore. Mol. Microbiol. 1992; 6:3257–3265.
  • 66. Kuduvalli P.N., Rao J.E., Craig N.L. Target DNA structure plays a critical role in Tn7 transposition. EMBO J. 2001; 20:924–932.
  • 67. Waddell C.S., Craig N.L. Tn7 transposition: two transposition pathways directed by five Tn7-encoded genes. Gene Dev. 1988; 2:137–149.
  • 68. Gay N.J., Tybulewicz V.L., Walker J.E. Insertion of transposon Tn7 into the Escherichia coli glmS transcriptional terminator. Biochem. J. 1986; 234:111–117.
  • 69. Nuñez J.K., Bai L., Harrington L.B., Hinder T.L., Doudna J.A. CRISPR immunological memory requires a host factor for specificity. Mol. Cell. 2016; 62:824–833.
  • 70. Santiago-Frangos A., Buyukyoruk M., Wiegand T., Krishna P., Wiedenheft B. Distribution and phasing of sequence motifs that facilitate CRISPR adaptation. Curr. Biol. 2021; 31:3515–3524.
  • 71. Fagerlund R.D., Wilkinson M.E., Klykov O., Barendregt A., Pearce F.G., Kieper S.N., Maxwell H.W.R., Capolupo A., Heck A.J.R., Krause K.L. et al. Spacer capture and integration by a type I-F Cas1–Cas2-3 CRISPR adaptation complex. Proc. Natl. Acad. Sci. U.S.A. 2017; 114:E5122–E5128.
  • 72. Schmitz M., Querques I., Oberli S., Chanez C., Jinek M. Structural basis for the assembly of the type V CRISPR-associated transposon complex. Cell. 2022; 185:4999–5010.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
