Abstract
Diversity-generating retroelements (DGRs) are in vivo sequence diversification machines that are widely distributed in bacterial, phage, and plasmid genomes. They function to introduce vast amounts of targeted diversity into protein-encoding DNA sequences via mutagenic homing. Adenine residues are converted to random nucleotides in a retrotransposition process from a donor template repeat (TR) to a recipient variable repeat (VR). Using the Bordetella bacteriophage BPP-1 element as a prototype, we have characterized requirements for DGR target site function. Although sequences upstream of VR are dispensable, a 24 bp sequence immediately downstream of VR, which contains short inverted repeats, is required for efficient retrohoming. The inverted repeats form a hairpin or cruciform structure and mutational analysis demonstrated that, while the structure of the stem is important, its sequence can vary. In contrast, the loop has a sequence-dependent function. Structure-specific nuclease digestion confirmed the existence of a DNA hairpin/cruciform, and marker coconversion assays demonstrated that it influences the efficiency, but not the site of cDNA integration. Comparisons with other phage DGRs suggested that similar structures are a conserved feature of target sequences. Using a kanamycin resistance determinant as a reporter, we found that transplantation of the IMH and hairpin/cruciform-forming region was sufficient to target the DGR diversification machinery to a heterologous gene. In addition to furthering our understanding of DGR retrohoming, our results suggest that DGRs may provide unique tools for directed protein evolution via in vivo DNA diversification.
Author Summary
Diversity-generating retroelements function through a unique, reverse transcriptase–mediated “copy and replace” mechanism that enables repeated rounds of protein diversification, selection, and optimization. The ability of DGRs to introduce targeted diversity into protein-coding DNA sequences has the potential to dramatically accelerate the evolution of adaptive traits. The utility of these elements in nature is underscored by their widespread distribution throughout the bacterial domain. Here we define DNA sequences and structures that are necessary and sufficient to direct the diversification machinery to specified target sequences. In addition to providing mechanistic insights into conserved features of DGR activity, our results provide a blueprint for the use of DGRs for a broad range of protein engineering applications.
Introduction
Diversity-generating retroelements (DGRs) have been identified in numerous bacterial phyla [1], [2]. Although most DGRs are bacterial chromosomal elements, they are prevalent in phage and plasmid genomes as well. The prototype DGR was identified in a temperate bacteriophage, BPP-1, on the basis of its ability to switch tropism for different receptor molecules on host Bordetella species [3]. Tropism switching is mediated by a phage-encoded DGR which introduces nucleotide substitutions in a gene that specifies a host cell-binding protein, Mtd (major tropism determinant), positioned at the distal tips of phage tail fibers. This allows phage adaptation to the dynamic changes in cell surface molecules that occur during the infectious cycle of its bacterial host [3]. Comparative bioinformatics predicts that all DGRs function by a fundamentally similar mechanism using conserved components ([1]; Gingery et al., unpublished data). These include unique reverse transcriptase (RT) genes (brt for BPP-1), accessory loci (avd or HRDC), short DNA repeats, and target genes that are specifically diversified [1]–[4].
As illustrated by the BPP-1 DGR shown in Figure 1A, diversity results from the introduction of nucleotide substitutions in a variable repeat (VR) located at the 3′ end of the mtd gene [1]–[4]. Variable sites in VR correspond to adenine residues in a homologous template repeat (TR), which remains unchanged throughout the process [1]–[4]. Transcription of TR provides an essential RNA intermediate that is reverse transcribed by Brt, creating a cDNA product which ultimately replaces the parental VR [4]. During this unidirectional retrotransposition process of mutagenic homing, TR adenines are converted to random nucleotides which subsequently appear at corresponding positions in VR [1]–[4]. Adenine mutagenesis appears to occur during cDNA synthesis and is likely to be an intrinsic property of the DGR-encoded RT [4].
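The information flow described above can be illustrated with a minimal simulation. This is a sketch only: the sequence `toy_tr` is hypothetical (it is not the BPP-1 TR), and drawing each replacement uniformly from the four nucleotides is an assumption, since the text states only that TR adenines are converted to random nucleotides.

```python
import random

NUCLEOTIDES = "ACGT"

def mutagenic_homing(tr: str, rng: random.Random) -> str:
    """Return a VR-like copy of `tr` in which every adenine is replaced by a
    randomly chosen nucleotide (assumed uniform; A itself may be redrawn)."""
    return "".join(rng.choice(NUCLEOTIDES) if base == "A" else base for base in tr)

# Hypothetical template repeat, for illustration only.
toy_tr = "GCAACGTTAACGGCAC"
rng = random.Random(0)  # fixed seed so the run is reproducible
toy_vr = mutagenic_homing(toy_tr, rng)

# Positions (0-based) that correspond to TR adenines and may therefore vary in VR.
variable_sites = [i for i, base in enumerate(toy_tr) if base == "A"]

print("TR:", toy_tr)  # the template itself is never modified
print("VR:", toy_vr)
print("variable sites:", variable_sites)
```

Only positions that are adenines in the template can differ between the two printed sequences; all other positions are copied unchanged, which mirrors the unidirectional TR-to-VR transfer.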
Located at the 3′ end of VR is the IMH (initiation of mutagenic homing) region, which consists of at least two functional elements: a 14 bp GC-only sequence [(GC)14] which is identical to the corresponding segment of TR, and a 21 bp sequence containing 5 mismatches with TR that determines the directionality of information transfer [1]. Using a saturating co-conversion assay, we have precisely mapped a marker transition boundary that appears to represent the point at which 3′ cDNA integration occurs and information transfer begins [4]. This maps within the (GC)14 element and we previously postulated that it represents the site of a nick or double-strand break in the target DNA [4]. If true, the resulting 3′ hydroxyl could serve to prime reverse transcription of the TR-derived RNA intermediate in a target DNA-primed reverse transcription (TPRT) mechanism [4]–[7]. cDNA integration at the 5′ end of VR requires TR/VR homology and may occur via template switching during cDNA synthesis [4].
There are 23 adenines upstream of the (GC)14 element in the BPP-1 TR, each of which is capable of variation [3]. The theoretical maximum DNA sequence diversity is ∼10¹⁴, which translates to a maximum protein diversity of nearly 10 trillion distinct polypeptides at the C-terminus of Mtd. For Mtd and other DGR-diversified proteins, co-evolution has resulted in the precise positioning of TR adenines to correspond to solvent exposed residues in the ligand binding pockets of variable proteins [8], [9]. As illustrated in Figure 1A, mutagenic homing occurs through a “copy and replace” mechanism that precisely regenerates all cis-acting components required for further rounds of diversification [4]. This allows the system to operate over and over again to optimize ligand-receptor interactions.
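The DNA diversity estimate follows from simple counting, assuming that each of the 23 variable adenines can independently become any of the four nucleotides. A short check of the arithmetic (the constant names are illustrative):

```python
N_TR_ADENINES = 23  # adenines upstream of the (GC)14 element, from the text
N_NUCLEOTIDES = 4   # A, C, G, T

# Each variable site contributes a factor of 4 to the number of possible VRs.
dna_diversity = N_NUCLEOTIDES ** N_TR_ADENINES

print(f"4^{N_TR_ADENINES} = {dna_diversity:,}")
```

The result, 4^23 = 70,368,744,177,664, is about 7 × 10^13, that is, on the order of 10^14. The protein-level figure depends on which codons contain the variable positions and is not reproduced by this calculation.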
The goal of this study was to characterize requirements for target site recognition by the BPP-1 DGR. Along with insights into the mechanism of mutagenic homing, our results reveal engineering principles that allow DGRs to be exploited to diversify heterologous genes through a process that is entirely contained within bacterial cells.
Results
Boundaries of the BPP-1 DGR target sequence
5′ and 3′ boundaries of the BPP-1 DGR target sequence were delineated using a PCR-based assay that specifically detects VR sequences that have been modified by DGR-mediated retrohoming [4]. The system consists of a donor plasmid (pMX-ΔTR23-96, Figure 1B) carrying avd, a modified TR containing a 30 bp tag (TG2), and brt co-expressed from a BvgAS-regulated promoter [4], and a recipient prophage genome deleted for avd, TR, and brt (BPP-1ΔATR, Figure 1C). TR retrotransposition from the donor plasmid to the recipient prophage VR creates a “tagged” VR that can be detected using primer pairs specific for the tag and VR-flanking sequences (P1/P4 and P2/P3 in Figure 1B; Table 1). Controls include the demonstration that homing products are Brt-dependent and contain mutagenized adenines. An advantage of this assay is that it does not require infectious phage particle formation and consequently allows manipulation of sequences that are required for Mtd function.
Table 1. List of oligonucleotides used in this study.
Name | Sequence
P1 | 5′ TTCGGTACCTGCTAGGCGTCAACCACCTG |
P2 | 5′ AGCAAGCTTGTCCTGTTTGCGCGTGATGCT |
P3 | 5′ AAATCTAGATCTGTCTGCGTTTGTGTT |
P4 | 5′ AGCAAGCTTAGCACAGGAACACAAACG |
P5 | 5′ GGTCACCATGAGCATTTGGTCGTAGCA |
P6 | 5′ GTACAGCGGGCCGTCGTTCTCGTTCGCGTT |
P7 | 5′ CCCTCTAGAGCTCCGGTTGCTTGTGGACG |
P8 | 5′ AGCAAGCTTCCTCGATGGGTTCCAT |
P9 | 5′ ATATCTAGACGTTTTCTTGGGTCTACCGTTTAATGTCG |
P10 | 5′ ATAAAGCTTCGACATTAAACGGTAGACCCAAGAAAA |
Deletions were introduced into VR and adjacent sequences in BPP-1ΔATR lysogens (Figure 1C and Figure S1) and the abilities of mutated prophages to serve as recipients in retrohoming assays were measured (see Materials and Methods). As shown in Figure 1D, sequences upstream of VR were dispensable for DGR homing (lanes 4&13). A deletion mutation that truncates the first 20 bp of VR still supported homing, although at a decreased level (lanes 5&14). Sequence analysis of homing products for this mutant suggested that 5′ cDNA integration occurred at cryptic sites within the truncated VR, although 3′ cDNA integration occurred in a normal manner (Figure 1D, lanes 5&14; Figure S2). At the 3′ end, homing was highly dependent on a 35 bp region located downstream of VR (lanes 6&15 vs. lanes 7&16). This implicated sequences with 8 bp inverted repeats that could potentially form a hairpin structure in ssDNA or a cruciform structure in dsDNA as a possible determinant of DGR target function (Figure 1C). Additional analysis showed that deletion of sequences immediately downstream of the stem was well tolerated (3′Δ58, Figure 1C and 1E), while further deletions at the 3′ end (3′Δ68) reduced target function to essentially non-detectable levels in homing assays.
In the experiments in Figure 1, homing products were not detected using a donor plasmid expressing enzymatically inactive Brt (BrtSMAA, in which the active site motif YMDD is replaced by SMAA; [1], [3], [4]), and sequence analysis of products generated with primer sets P1/P4 and P2/P3 demonstrated transfer of the TG2 tag from TR to VR. Adenine mutagenesis was observed in ∼53% of clones containing P1/P4 products and ∼32% of clones containing P2/P3 products (data not shown), which had 3 and 2 TR adenine residues available for mutagenesis, respectively. These observations indicated that true DGR homing products were being detected. Equivalent amounts of template phage DNA, as measured by quantitative PCR, were included in each experiment (lanes 19–27, Figure 1D; lanes 17–24, Figure 1E).
Stem structure, but not sequence, is critical for DGR homing and phage tropism switching
We next determined whether the primary sequence or the secondary structure of the putative hairpin/cruciform located downstream of VR is important for function. To disrupt the structure, 7 consecutive residues proximal to the loop on the 3′ half of the stem were changed to their complementary residues (StMut, Figure 2A). The resulting mutant was essentially unable to support DGR homing at a level that could be detected in PCR-based assays (lanes 3&9, Figure 2B). Complementary substitutions were subsequently introduced to the 5′ half of the stem to generate StRev (Figure 2A). If the primary sequence is important, the StRev recipient should remain non-functional. Alternatively, if the structure of the stem is the critical element, restoring base pairing interactions might restore DGR target function. As shown in Figure 2B (lanes 5&11), this appears to be the case, as the StRev mutant regained DGR homing activity. Homing products were verified by sequencing and adenine mutagenesis was observed (Figure S3).
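The logic of the StMut/StRev comparison can be sketched in a few lines. The 8 nt arm used here is hypothetical and is not the BPP-1 sequence; the sketch counts Watson-Crick pairs only and makes no claim about thermodynamic stability.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(seq: str) -> str:
    """Base-by-base complement, without reversing."""
    return "".join(COMPLEMENT[base] for base in seq)

def reverse_complement(seq: str) -> str:
    return complement(seq)[::-1]

def paired_positions(arm5: str, arm3: str) -> int:
    """Count Watson-Crick pairs when the 3' arm folds back onto the 5' arm.
    Both arms are written 5'->3', so the arms pair in antiparallel fashion."""
    return sum(COMPLEMENT[a] == b for a, b in zip(arm5, reversed(arm3)))

# Hypothetical GC-rich stem: the 5' arm, then (after a loop) the 3' arm.
wt_arm5 = "CGGCGGCA"
wt_arm3 = reverse_complement(wt_arm5)

N_MUT = 7  # residues changed proximal to the loop, as in the text

# StMut-like: complement the 7 loop-proximal residues of the 3' arm
# (its first 7 residues when written 5'->3').
stmut_arm3 = complement(wt_arm3[:N_MUT]) + wt_arm3[N_MUT:]

# StRev-like: additionally complement the 7 loop-proximal residues of the
# 5' arm (its last 7 residues), restoring pairing with a new sequence.
strev_arm5 = wt_arm5[:-N_MUT] + complement(wt_arm5[-N_MUT:])

print("WT pairs:   ", paired_positions(wt_arm5, wt_arm3))
print("StMut pairs:", paired_positions(wt_arm5, stmut_arm3))
print("StRev pairs:", paired_positions(strev_arm5, stmut_arm3))
```

In this toy model the StMut-like change leaves a single base pair, while the StRev-like change restores all eight pairs even though both arms now differ in sequence from WT. This is the distinction between structure and primary sequence that the experiment exploits.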
Phage tropism switching assays provide a quantitative measure of DGR function [1], [3], [4]. Although the evolution of new ligand specificities is an inherently stochastic process, the frequency at which it occurs reflects the combined efficiencies of retrohoming and adenine mutagenesis. In Figure 2C, tropism switching was measured using BPP-1ΔATR or mutant derivatives complemented with plasmid pMX1, which provides avd, TR and brt in trans (see Materials and Methods). The StMut mutation resulted in over a 1000-fold decrease in tropism switching, which was restored to near WT levels by the StRev allele. Sequence analysis of VR regions in phages with switched tropisms (5 random clones each) confirmed adenine mutagenesis in every case (Figures S4, S5, S6).
Taken together, these data argue that the ability to form a hairpin or cruciform structure, as opposed to the primary sequence of the inverted repeats, is a critical determinant of target site recognition. The residual tropism switching activity of StMut phage suggests that hairpin/cruciform-independent pathways may exist, although they operate at a much lower efficiency.
Physical evidence for hairpin/cruciform formation in negatively supercoiled DNA
To determine if the hairpin/cruciform structure can form in vitro, supercoiled plasmids carrying WT or mutant BPP-1 DGR target sequences were isolated and treated with phage T7 DNA endonuclease I, followed by primer extension with 5′ end-labeled primers to identify specific cleavage sites [10], [11]. T7 DNA endonuclease I is a structure-specific enzyme that resolves DNA four-way (Holliday) junctions and has previously been used to identify DNA hairpin or cruciform formation [10], [11]. As shown in Figure 3, cleavage sites were detected on both DNA strands in the hairpin/cruciform structure, with major cleavage sites at or near the four-way junction. Minor cleavage sites were also detected at or near the loop, as T7 DNA endonuclease I also has some activity on single-stranded DNA [12]. T7 endonuclease I cleavage at the hairpin/cruciform region requires structure formation, as plasmids containing a disrupted stem (StMut) were not cleaved in the corresponding region. Linearization of plasmids containing the WT sequence eliminated cleavage, suggesting that negative supercoiling is required for hairpin/cruciform formation [13], [14]. These results demonstrate that hairpins can form on either strand of the target DNA. Although it is likely that they form simultaneously on both strands to create cruciforms, this is not directly addressed by enzyme cleavage assays, hence the hairpin/cruciform designation.
The BPP-1 DGR target sequence functions in an orientation-independent manner
We next determined whether the orientation of the target sequence relative to the phage genome is important for DGR retrohoming. In the experiment in Figure 4A, a segment of the BPP-1ΔATR prophage that includes VR and its flanking sequences was inverted, and PCR-based DGR homing assays were performed with donor plasmid pMX-ΔTR23-96. DGR homing into the inverted target occurred at a level comparable to that of the WT control (Figure 4B), and sequence analysis indicated that normal homing products were produced (Figure S7). These results show that the polarity of phage replication is not important for DGR homing, and that the hairpin/cruciform structure functions in a manner that is independent of its orientation relative to the leading or lagging strands formed during DNA replication.
Conservation and functional characteristics of DGR hairpin/cruciform structures
Inverted repeats are nearly always found downstream of VR sequences in target genes [Gingery et al., unpublished data], as illustrated by the phage DGR sequences shown in Figure 5. These elements display a striking pattern of similarity, suggesting they have conserved and important functions. In each case, hairpin/cruciform structures with 7–10 bp GC-rich stems and 4 nt loops can potentially be formed. Although stems are always GC-rich, their sequences differ, while loops are more conserved with the consensus sequence (5′GRNA3′, with R = A or G, N = any nucleotide) in the sense strand. The exact distance between the hairpin/cruciform structures and the 3′ ends of their respective VRs appears to be quite flexible. We took advantage of the BPP-1 DGR system to test the relevance of these patterns of conservation, with the goal of generating a more comprehensive understanding of parameters important for target site recognition.
We first studied requirements for stem length and sequence and found that although minor changes are tolerated, the WT configuration appears to be optimized for BPP-1 DGR function. Of the stem length variants in Figure 6A, extensions are better tolerated than deletions. Removal of 2, 4 or 6 bp proximal to the loop results in markedly decreased activity in both PCR-based homing (Figure 6B) and phage tropism switching assays (Figure 6C), to levels similar to those observed with the StMut allele in which the stem is completely abolished (Figure 2A). Insertion of 2 bp next to the loop had little effect on activity, while longer insertions gradually decreased target site function. Keeping the length of the stem constant, a sequence change in the middle of the stem that converts 4 GC base pairs to AT base pairs (StAT, Figure 6A) greatly reduced, but did not eliminate function. We next tested the effects of altering the sequence and size of the loop using the mutant constructs shown in Figure 7A. Substituting CTTT for the consensus loop sequence GAAA, or increasing the size of the loop by as little as 2 nt, decreased activity in PCR-homing (Figure 7B) and tropism switching assays (Figure 7C) to near background levels. Based on these experiments, it appears that an 8–10 bp GC-rich stem is optimal for BPP-1 DGR homing, and that both the size and sequence of the 4 nt loop are critical for function. Our results correlate with the patterns of conservation shown in Figure 5.
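The loop consensus 5′GRNA3′ can be written as a simple pattern, which makes the size and sequence constraints explicit. This is an illustrative sketch; the function name is hypothetical, and matching the consensus is not by itself a prediction of homing activity.

```python
import re

# IUPAC codes: R = A or G, N = any nucleotide. fullmatch also enforces
# the 4 nt loop size, because longer or shorter loops cannot match.
LOOP_CONSENSUS = re.compile(r"G[AG][ACGT]A")

def matches_loop_consensus(loop: str) -> bool:
    """True if `loop` (sense strand, 5'->3') fits the 5'GRNA3' consensus."""
    return LOOP_CONSENSUS.fullmatch(loop.upper()) is not None

for loop in ("GAAA", "CTTT", "GAAAAA"):
    print(loop, matches_loop_consensus(loop))
```

The WT loop GAAA fits the pattern, whereas the CTTT substitution and a loop enlarged by 2 nt (written here as a hypothetical GAAAAA) do not, in line with the loss of activity reported for altered loops.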
Shifting the position of the hairpin/cruciform element alters the efficiency of target site recognition but not the site of 3′ cDNA integration
In the experiments in Figure 8, we tested the effects of altering the position of the hairpin/cruciform with respect to the 3′ boundary of VR and probed sequence requirements for the intervening region. SpM4 (Figure 8A), in which the 4 residues in the spacer were switched to the complementary nucleotides, retained WT activity (Figure 8B). In contrast, deletion of the spacer (SpD4) resulted in a significant decrease in target function. The SpM4 and SpD4 mutations eliminate the mtd stop codon and generate non-infective phages, obviating the ability to measure tropism switching. Nonetheless, their relative levels of activity were readily apparent in PCR-homing assays. Expansion of the spacer was tolerated to a greater extent than deletion. SpI3, which has a 3 bp insertion in the spacer (Figure 8A), showed no significant defect in PCR-homing or phage tropism switching assays (Figure 8B and 8C), but longer insertions gradually decreased target site function.
The SpI6 insertion, which increases the distance between the hairpin/cruciform structure and the 3′ end of VR by 6 bp, retains a measurable level of activity. We took advantage of this and used a marker coconversion assay (Figure S8; [4]) to determine the relationship between the position of the hairpin/cruciform structure and the site at which information transfer initiates. As summarized in Figure 8D, our coconversion assay measured transfer of nucleotide polymorphisms from tagged TR donors to a recipient VR carrying the SpI6 mutation using PCR-based homing assays (data not shown). With the WT recipient, a coconversion boundary occurs between positions 107 and 112, and this was interpreted as representing the site at which TR-derived cDNA synthesis initiates [4]. As shown in Figure 8D, the coconversion boundary remains essentially unchanged in the SpI6 mutant. Although the position of the hairpin structure affects the efficiency of DGR homing, it does not determine the site at which cDNA is integrated at the 3′ end of VR.
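The idea of a coconversion boundary can be expressed as a small helper that locates the pair of adjacent markers between which transfer status changes. The marker positions and outcomes below are hypothetical placeholders chosen for illustration; they are not the measured data from Figure 8D.

```python
def coconversion_boundary(transferred: dict[int, bool]) -> tuple[int, int]:
    """Return the two adjacent marker positions between which the
    transfer status (marker copied to VR or not) changes.

    Raises ValueError unless there is exactly one such transition."""
    positions = sorted(transferred)
    changes = [
        (a, b)
        for a, b in zip(positions, positions[1:])
        if transferred[a] != transferred[b]
    ]
    if len(changes) != 1:
        raise ValueError(f"expected one transition, found {len(changes)}")
    return changes[0]

# Hypothetical markers: position in the donor -> was the marker found in VR?
toy_markers = {10: True, 20: True, 30: True, 40: False, 50: False}

boundary = coconversion_boundary(toy_markers)
print("boundary lies between markers", boundary)
```

Comparing the boundary obtained for a WT recipient with that obtained for a spacer-insertion recipient is the comparison the assay makes: an unchanged boundary indicates that the insertion did not move the site at which information transfer begins.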
Engineering the BPP-1 DGR to target a heterologous gene
To determine if the results presented here complete our understanding of DGR-encoded requirements for retrohoming to a target gene, we applied them as engineering principles in an attempt to construct a functional, synthetic, TR/VR system. For a DNA sequence to serve as a recipient VR, three conditions must be met. First, it must be adjacent to an IMH region with functional (GC)14 and 21 bp elements at its 3′ end [1], [4]. Second, the IMH region must be followed by inverted repeats capable of forming a hairpin/cruciform structure of appropriate size, composition and distance from IMH. And finally, sufficient VR/TR sequence homology must be provided to allow efficient upstream (5′) cDNA integration. In recent studies we have shown that although short stretches of nucleotide identity (≥8 bp) between the TR-derived cDNA and VR target sequences are sufficient to complete the homing reaction, homing efficiency is increased with longer (≥19 bp) stretches of homology [4]. With these parameters in mind, we tested our ability to engineer the BPP-1 DGR to target a heterologous reporter gene (aph3′Ia; [15]) which provides facile detection of targeting events by antibiotic selection.
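When designing a synthetic TR/VR pair, the homology condition can be checked by measuring the longest contiguous stretch of identity between the two sequences. The sketch below uses hypothetical sequences and compares same-strand strings only; the 8 bp and 19 bp thresholds are those quoted in the text, and the category labels are informal.

```python
def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest contiguous stretch present in both sequences
    (classic longest-common-substring dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def homology_class(run_length: int) -> str:
    """Informal label based on the thresholds quoted in the text."""
    if run_length >= 19:
        return "long (>=19 bp): increased homing efficiency reported"
    if run_length >= 8:
        return "short (>=8 bp): sufficient to complete homing"
    return "below 8 bp: not covered by the reported thresholds"

# Hypothetical TR-derived and VR-side sequences sharing a 10 nt core.
core = "GACCGGTACG"
toy_tr_side = "TT" + core + "ATCC"
toy_vr_side = "AA" + core + "TT"

run = longest_shared_run(toy_tr_side, toy_vr_side)
print(run, "->", homology_class(run))
```

A real design check would also need to confirm the IMH and hairpin/cruciform conditions listed above; this helper addresses only the third condition.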
The recipient VR-KanS cassette shown in Figure 9A contains an aph3′Ia kanamycin resistance (KanR) allele with a 3′ deletion that renders it nonfunctional by removing coding sequences for 6 essential C-terminal residues. The truncated gene was placed immediately upstream of IMH, followed by the hairpin/cruciform-forming inverted repeats from the BPP-1 DGR. Transcription is directed by the native aph3′Ia promoter. The donor plasmid expresses avd, brt, and one of two engineered TRs (TR-Km1, TR-Km2) from the Pfha promoter. Both TRs contain the intact 3′ end of the aph3′Ia open reading frame, followed by two consecutive stop codons and sequences 97–134 from the 3′ end of the BPP-1 TR. For TR-Km2, the aph3′Ia fragment is also flanked, at its 5′ end, by the first 22 residues of the BPP-1 TR. DGR-mediated retrotransposition from the donor TR constructs to the VR-KanS recipient should regenerate a full-length aph3′Ia gene conferring KanR.
We first tested whether targeting can occur in the context of a replicating phage. BPP-1ΔATR*KanS carries the VR-KanS cassette inserted between attL and bbp1 on the left arm of the prophage genome [16], along with a deletion of avd, TR and brt and a series of synonymous substitutions in IMH to inactivate the mtd VR (Figure 9A). B. bronchiseptica RB50 carrying the TR-Km1 or -Km2 donor plasmid, or derivatives with a null mutation in brt, were infected with BPP-1ΔATR*KanS and targeting efficiencies were determined by infecting RB50 with progeny phages and measuring relative numbers of KanR lysogens. KanR lysogens were readily detected when targeting occurred from Brt+ TR donors, but not Brt− donors (Figure 9B), and sequence analysis showed that KanR resulted from the regeneration of full-length aph3′Ia alleles which often contained mutations at positions corresponding to adenines in donor TRs (Figures S9 and S10). It is interesting to note that the TR-Km1 donor was significantly more efficient than TR-Km2. This suggests that the majority of cDNAs are extended to the 5′ termini of these short synthetic TRs, and target (VR) homology to the extreme 3′ ends of the extension products may be advantageous for cDNA integration.
We also tested the ability to target the VR-KanS cassette when present on a resident prophage in the bacterial chromosome or on a plasmid. In the experiment in Figure 9C, RB50/BPP-1ΔATR*KanS lysogens were transformed with donor plasmids under conditions that suppress Pfha promoter activity. Following a 6 hr pulse of Pfha induction, cells were plated under promoter-suppressing conditions on media with or without kanamycin. In Figure 9D, a similar protocol was used to target a VR-KanS cassette carried on a medium copy number plasmid in RB50 cells containing a TR donor plasmid, but no other phage sequences. In both experiments, KanR colonies were readily detected when targeting occurred from Brt+, but not Brt− TR donors, and sequence analysis showed characteristic patterns of adenine mutagenesis (Figures S11, S12, S13, S14). Taken together, our results demonstrate the ability to engineer a VR/TR system that targets a heterologous reporter gene on a phage, plasmid or bacterial genome. The data in Figure 9D show that no BPP-1 phage products, other than those encoded in the DGR, are required for mutagenic retrohoming.
Discussion
Understanding DGR target site recognition requires a precise definition of cis-acting sequences important for retrohoming. Our analysis of the boundaries of the BPP-1 DGR target showed that sequences upstream of VR are dispensable, as predicted by previous results [4]. More importantly, we show that homing is facilitated by an element downstream of VR, beyond the point at which TR/VR homology ends. Sequence analysis, mutagenesis, and structure-specific nuclease assays demonstrated that GC-rich inverted repeats directly following VR form a hairpin/cruciform structure that plays a critical role in retrohoming. Highly similar elements are present in analogous locations in many phage- or prophage-related DGRs (Figure 5), and hairpin/cruciform structures are predicted for the majority of DGRs that naturally reside on bacterial chromosomes and plasmids as well [Gingery et al., unpublished data]. We propose that DNA hairpin formation near the 3′ end of VR is a conserved requirement for DGR-mediated retrohoming.
For the BPP-1 DGR target, the 8 bp stem appears to function as a structure that is dependent on nucleotide composition but not sequence. In contrast, the loop of the hairpin/cruciform structure is constrained in size and sequence and conforms to the consensus, 5′-GRNA, derived from comparisons with other phage-related DGRs. This suggests that loop sequence and size may be important for stabilizing the hairpin/cruciform structure [17], or for creating a strand bias in DNA cleavage by a host-encoded endonuclease. It is also possible that the loop is in direct physical contact with a critical component, such as Brt, Avd, a TR-containing RNA transcript, or other parts of the DGR target. By testing the effects of length and sequence variations between the hairpin/cruciform and VR, we found that distance is an important parameter, although some flexibility exists. Extending the spacer by 6 bp did not shift the marker coconversion boundary in the (GC)14 region during DGR homing [4], showing that the position of the hairpin/cruciform does not determine the site at which 3′ cDNA integration occurs.
DGRs are evolutionarily related to group II introns [1] and it is interesting to note that a subset of these retroelements, the group IIC introns, also target motifs with stem-loop structures [18]–[20]. In nature, group IIC introns are often found to be located short distances downstream of sequences encoding known or predicted factor-independent transcription terminators, which are composed of GC-rich stems with loops of varying sizes followed by poly-uridine stretches [18]–[20]. Using an in vitro mobility assay, Robart et al. [19] have shown that reconstituted ribonucleoprotein particles from the Bacillus halodurans B.h.I1 group IIC intron recognize structures in ssDNA that correspond to RNA hairpins formed during transcription termination. As observed with the BPP-1 DGR, the B.h.I1 mobility reaction was highly dependent on stem formation but not absolute sequence [19]. Stems shorter than 9 bp had significantly reduced activities in in vitro mobility assays, a longer stem (14 bp) retained function, and the efficiency of targeting correlated with GC content and predicted stem stability [19]. In contrast to our observations with the BPP-1 DGR, alterations in loop sequence had little effect on B.h.I1 mobility in vitro [19]. The adaptation of group IIC introns to recognize and insert downstream of factor-independent transcriptional terminators was proposed to provide a selective advantage by limiting their expression, avoiding the interruption of essential coding sequences, and facilitating horizontal spread as intrinsic terminators are common and conserved in bacteria [19]. For DGRs, we speculate that the ability to target sequences upstream of terminator-like stem-loop structures may have played a role in directing their sequence diversification capabilities to the 3′ coding regions of target genes.
The TPRT model for DGR homing postulates that cDNA synthesis initiates with a nick or double-strand break in the IMH (GC)14 sequence, providing a primer for reverse transcription of a TR-containing RNA transcript [4]. Analogous to target recognition by group IIC introns, the hairpin/cruciform structure may serve as a recognition element for a retrohoming complex that includes trans-acting DGR-encoded factors. A DNA endonuclease that might be responsible for cleavage awaits identification, and possibilities include Avd, Brt, a TR-derived catalytic RNA, or an unidentified host factor. It is also possible that the DNA hairpin/cruciform actively promotes single- or double-strand breaks. If DNA repair synthesis extends to the (GC)14 region, the elongating antisense strand could then be used for cDNA priming. DNA breaks at the hairpin/cruciform structure could be created by an endonuclease that cleaves the single-stranded loop, or by a structure-specific enzyme similar to T7 endonuclease I [21]. Since DNA cruciforms are structurally similar to Holliday junctions, host-encoded recombination proteins that function in resolving recombination intermediates could be involved [22]. The cDNA priming mechanism of the BPP-1 DGR appears to be different from that of mobile group II introns that lack a DNA endonuclease activity in their intron-encoded proteins [23]–[25]. Reverse transcription in retrohoming and ectopic transposition of these elements is proposed to be primed by either the leading or lagging strand during DNA replication, and strong strand-specific biases are observed [23]–[25]. Our observation that the BPP-1 DGR target sequence is orientation-independent suggests that DNA replication polarity does not play a significant role in cDNA priming.
Although our results to date are consistent with TPRT, further studies are required to definitively characterize the mechanism of cDNA initiation and integration at the 3′ end of VR and to determine the precise role of the hairpin/cruciform structure in the retrohoming process.
The broad distribution of DGRs in nature attests to their utility, and prospects for adapting these elements for protein engineering applications are compelling. Our results demonstrate that the region containing the (GC)14 and 21 bp sequences in IMH, and an adjacent hairpin/cruciform, is sufficient to direct the DGR mutagenic homing machinery to a heterologous target gene through appropriate engineering of a cognate TR. Using similar design principles we have successfully targeted a tetracycline resistance determinant as well (HG and JFM, unpublished data). For DGRs to be useful tools, it will be necessary to engineer their activity to allow efficient and controlled diversification. Having defined the DGR-encoded cis- and trans-acting factors required to diversify heterologous sequences, efforts to optimize their activities can now proceed in an informed and comprehensive way. It will also be important to determine the effects of TR/VR size, composition, and position relative to cis-acting DGR elements, on the efficiency of diversifying heterologous sequences. In preliminary experiments, insertions of moderate size (up to ∼200 bp) at position 84 in the BPP-1 TR (134 bp) are transferred to VR and mutagenized at adenines, suggesting that sequences of >300 bp could be diversified by an engineered system (LVT, HG and JFM, unpublished data).
In addition to providing prodigious levels of diversity, mutagenic homing is a regenerative process that allows DGRs to operate through unlimited rounds to optimize variable protein functions [4]. This may be particularly advantageous for directed protein evolution since desired traits can be selected and continuously evolved in iterative cycles, without the need for library construction or other interventions, through a process that takes place entirely within bacterial cells.
Materials and Methods
Bacterial strains and phages
B. bronchiseptica strains RB50, RB53Cm, RB54 and ML6401 have been described [16]. The BPP-1ΔATR lysogen was constructed from ML6401, an RB50 strain lysogenized with phage BPP-1, by deleting sequences from avd position 48 to position 882 of brt. Target region deletions/insertions and hairpin/cruciform modifications were introduced into the BPP-1ΔATR lysogen through allelic exchange [1], [4] and are diagramed in the figures. The BPP-1ΔATR* lysogen contains multiple silent mutations at both the 5′ and 3′ ends of VR to inactivate it as a DGR target. It was used as the parental strain to create the BPP-1ΔATR*KanS lysogen, in which the KanR gene aph3′Ia has sequences encoding the C-terminal 6 amino acid residues truncated and is placed upstream of IMH and the hairpin/cruciform structure as a reporter for heterologous gene targeting. The aph3′Ia allele also contains an AAA to CGC substitution resulting in K260R. The VR-KanS reporter cassette was inserted between attL and bbp1 of the phage genome. Phage BPP-1ΔATR and its various derivatives were produced from the above lysogens.
Plasmid constructs
Plasmid pMX-ΔTR23–96 has TR positions 23–96 deleted and replaced by a 30 bp PCR tag as in pMX-ΔTR23–84 [4]. Its RT-deficient derivative contains the YMDD to SMAA mutation at Brt positions 213–216 [3], [4]. Plasmids pMX1 and pMX1SMAA were used for phage tropism switching assays and have previously been described [4].
pUC-StWT is a pUC18-based plasmid containing the WT BPP-1 DGR target from position −6 upstream of VR to position +82 downstream of VR. pUC-StMut is its derivative with 7 residues in the 3′ half of the stem, proximal to the loop, mutated to their complementary nucleotides.
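The effect of the StMut design on stem pairing can be sketched in Python as follows. The 9-nt stem arm is a hypothetical placeholder, not the BPP-1 sequence, and the helper names are illustrative only. The sketch counts Watson-Crick pairs between the two arms before and after the 7 loop-proximal residues of the 3′ arm are changed to their complements.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def stem_pairs(arm5: str, arm3: str) -> int:
    """Count Watson-Crick pairs formed when arm3 folds back onto arm5."""
    return sum(a == b for a, b in zip(arm5, reverse_complement(arm3)))


# Hypothetical 5' stem arm; the 3' arm is its perfect reverse complement.
arm5 = "AGGCCGTAC"
arm3 = reverse_complement(arm5)

# StMut-like change: complement the 7 residues of the 3' arm nearest the loop
# (the loop lies between arm5 and arm3, so these are the first 7 of arm3).
arm3_mut = arm3[:7].translate(COMPLEMENT) + arm3[7:]

wt_pairs = stem_pairs(arm5, arm3)       # all positions pair in the WT stem
mut_pairs = stem_pairs(arm5, arm3_mut)  # only the unmutated positions pair
```

Because a base can never equal its own complement, every mutated position loses its pairing partner, so this kind of substitution is expected to disrupt the stem without changing its length or base composition by strand.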
Plasmids pMX-TRC85T, pMX-TRC91T, pMX-TRC97T, pMX-TRC100T, pMX-TRC105T, pMX-TRC107T, pMX-TRC109T, pMX-TRC112T, pMX-TRC115T, pMX-TRC120T and pMX-TRC125T have been previously described [4].
Plasmids pMX-Km1 and pMX-Km2 were constructed from pMX-ΔTR23–96 for KanR gene targeting, both containing the last 36 bp of aph3′Ia. The 36 bp sequence and its following two stop codons replace TR positions 1–96 in pMX-Km1 and TR positions 23–96 in pMX-Km2.
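The coordinate arithmetic behind the two donor designs can be checked with a short Python sketch. Only the lengths and positions come from the text (a 134 bp TR, a 36 bp aph3′Ia segment, two stop codons). The sequences themselves are placeholders, the stop codons shown are hypothetical, and the helper name is illustrative.

```python
def replace_positions(seq: str, start: int, end: int, insert: str) -> str:
    """Replace 1-based, inclusive positions start..end of `seq` with `insert`."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("positions out of range")
    return seq[:start - 1] + insert + seq[end:]


TR_LENGTH = 134                   # BPP-1 TR length given in the text
tr = "t" * TR_LENGTH              # placeholder, not the real TR sequence
cargo = "k" * 36 + "TAATGA"       # 36 bp placeholder + two hypothetical stop codons

km1_tr = replace_positions(tr, 1, 96, cargo)   # pMX-Km1-like: TR 1-96 replaced
km2_tr = replace_positions(tr, 23, 96, cargo)  # pMX-Km2-like: TR 23-96 replaced
```

In this sketch the pMX-Km2-like template retains the first 22 positions of TR upstream of the cargo, whereas the pMX-Km1-like template begins directly with the cargo; both keep TR positions 97–134 downstream.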
Plasmid pHGT-KanS contains the VR-KanS cassette described above and was used as the recipient plasmid for KanR targeting. The plasmid also carries a tetracycline resistance gene.
Phage production for DGR homing and tropism switching assays
Phage production for DGR functional assays was carried out by either single-cycle lytic infection or mitomycin C induction from lysogens as previously described [4], except for minor modifications as noted. For single-cycle lytic infection, B. bronchiseptica RB50 cells transformed with appropriate donor plasmids were grown overnight at 37°C in Luria-Bertani (LB) media containing 25 µg/ml of chloramphenicol (Cam), 20 µg/ml streptomycin (Str), and 10 mM nicotinic acid to modulate to the Bvg− phase and prevent transcription from the Pfha promoter. An amount of cells equal to 1 ml of culture (OD600 = 1.0) was pelleted, rinsed, and resuspended in 2.5 ml Stainer Scholte (SS) medium [26] containing 25 µg/ml Cam and 20 µg/ml Str (SS+Cam+Str). Cultures were grown for 3 hr at 37°C to modulate bacteria to the Bvg+ phase and activate Pfha promoter expression. An aliquot of 500 µl from each culture was used for OD600 measurement and cell number calculation. Phage particles were added to the rest of the culture at a multiplicity of infection of ∼2.0. Following 1 hr incubation at 37°C for phage adsorption, infected cells were pelleted and resuspended in 1 ml of fresh, prewarmed SS+Cam+Str media and incubated at 37°C for 3 hr post phage addition to allow completion of a single cycle of phage development. Progeny phages were harvested following chloroform extraction.
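The cell number and multiplicity-of-infection calculation in this step can be sketched in Python as below. The OD600-to-cell-density conversion factor and the phage stock titer are not given in the text, so they are explicit parameters here, and the values in the example call are placeholders chosen only to make the arithmetic concrete. The function name is hypothetical.

```python
def phage_volume_ul(od600: float, culture_ml: float, cells_per_ml_per_od: float,
                    moi: float, titer_pfu_per_ml: float) -> float:
    """Volume of phage stock (in µl) needed to infect a culture at a given MOI."""
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    cells = od600 * cells_per_ml_per_od * culture_ml  # total cells in the culture
    pfu_needed = moi * cells                          # phage particles required
    return pfu_needed / titer_pfu_per_ml * 1000.0     # ml -> µl


# Example with placeholder values: 2.0 ml remain after the 500 µl aliquot is
# removed from the 2.5 ml culture; the conversion factor and titer are assumed.
volume = phage_volume_ul(od600=0.5, culture_ml=2.0, cells_per_ml_per_od=1e9,
                         moi=2.0, titer_pfu_per_ml=1e10)
```

The conversion factor should be calibrated for the strain and spectrophotometer in use, since it directly scales the amount of phage added.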
For phage production from lysogens, RB50 derivatives carrying appropriate prophages and donor plasmids were grown and modulated to the Bvg+ phase as in single-cycle lytic infections. Phage production was induced with 0.2 µg/ml mitomycin C for 3 hr at 37°C. Progeny phages were harvested by chloroform extraction.
BPP-1 phage tropism switching and PCR-based DGR homing assays
Phage tropism switching and DGR homing assays have been previously described [4].
Analysis of hairpin/cruciform formation in plasmid DNA in vitro
Plasmids containing the WT BPP-1 DGR target and the StMut mutation were isolated from E. coli DH5αλpir cells using the QIAprep Spin miniprep kit (Qiagen). Plasmids were linearized by digestion with BglI as indicated. To analyze hairpin/cruciform structure formation in supercoiled or relaxed DNAs, 0.5 µg of supercoiled or linearized plasmids were treated with 10 units of T7 DNA endonuclease I (New England Biolabs, Ipswich, MA) for 40 minutes as in Miller et al. [11]. The reactions were terminated by phenol-chloroform-isoamyl alcohol (25∶24∶1) extraction and DNAs were precipitated with ethanol. T7 DNA endonuclease I cleavage sites were determined by primer extension with 5′-end 32P-labeled primers using Vent (exo-) DNA polymerase (New England Biolabs, Ipswich, MA) as in Miller et al. [11], except that 5% DMSO was added for GC-rich templates. Primer extension products were resolved on 6% polyacrylamide/8 M urea gels, alongside Sanger sequencing ladders generated with the same labeled primers and a plasmid template containing the WT target.
Targeting of a KanR gene by engineered BPP-1 phage DGRs
To target the KanR gene on a replicating phage, BPP-1ΔATR*KanS phage particles were used for single-cycle lytic infection of RB50 cells transformed with appropriate donor plasmids, similar to phage production by single-cycle lytic infection described above. Progeny phages were titered and ∼10^11 pfu of different phages were added to 25 ml RB50 cells (OD600 = 1.2) in SS+Str media for 8.0 hr to reestablish lysogens. Cells were pelleted and resuspended in 5 ml LB and serial dilutions were plated on LB+NA+Str and LB+NA+Str+Kan (50 µg/ml) to determine KanR gene targeting frequencies. Lysogen reestablishment efficiencies ranged from 60% to 100% based on PCR analysis of 10 colonies each picked on LB+NA+Str plates using phage specific primers. KanR targeting efficiency for each donor plasmid was determined as the ratio of colony forming units (cfu) on LB+NA+Str+Kan plates to those on LB+NA+Str, calibrated with the lysogen reestablishment efficiency for that sample.
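The efficiency calculation described above can be written as a short Python sketch. It assumes that "calibrated with the lysogen reestablishment efficiency" means dividing the KanR/total cfu ratio by the fraction of colonies that carry the prophage; that reading, the function names, and the example numbers are assumptions for illustration only.

```python
def cfu_per_ml(colonies: int, dilution_factor: float, plated_ml: float) -> float:
    """Colony forming units per ml of undiluted culture from one plate count."""
    if plated_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor / plated_ml


def kan_targeting_efficiency(cfu_kan: float, cfu_total: float,
                             lysogen_fraction: float) -> float:
    """KanR targeting efficiency per lysogen.

    Assumes calibration means dividing the KanR/total cfu ratio by the
    fraction of colonies that are lysogens (e.g. 0.6 for 6 of 10 by PCR).
    """
    if cfu_total <= 0:
        raise ValueError("total cfu must be positive")
    if not 0 < lysogen_fraction <= 1:
        raise ValueError("lysogen fraction must be in (0, 1]")
    return (cfu_kan / cfu_total) / lysogen_fraction


# Placeholder counts, not data from the study.
kan = cfu_per_ml(colonies=30, dilution_factor=1, plated_ml=0.1)       # Kan plate
total = cfu_per_ml(colonies=60, dilution_factor=1e5, plated_ml=0.1)   # Str plate
efficiency = kan_targeting_efficiency(kan, total, lysogen_fraction=0.6)
```

Dividing by the lysogen fraction expresses the frequency relative to cells that actually carry the reporter, so samples with different reestablishment efficiencies can be compared.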
To target the KanR gene on a prophage in the bacterial chromosome, RB50 cells lysogenized with phage BPP-1ΔATR*KanS were transformed with appropriate donor plasmids. Starting cultures were grown overnight in LB+NA+Str+Cam as described above. An amount of cells equal to 1 ml of culture (OD600 = 1.0) was pelleted, rinsed, and resuspended in 2.5 ml SS+Cam+Str and grown at 37°C for 6 hours. Serial dilutions were plated on LB+NA+Str and LB+NA+Str+Kan (50 µg/ml) to determine KanR gene targeting frequencies. KanR targeting efficiencies were determined as relative numbers of KanR cells as above. To target the KanR gene on a plasmid, the recipient plasmid pHGT-KanS and appropriate donors were transformed into RB50 cells and analyzed similarly. Tetracycline was added to 5.0 µg/ml for recipient plasmid maintenance.
Supporting Information
Acknowledgments
We thank members of the JFM laboratory and David W. Martin and other scientists at AvidBiotics for constructive input.
Footnotes
JFM is a founder of AvidBiotics Corporation and a member of its Scientific Advisory Board. HG is a consultant of the company. SW is a company employee.
This work was supported by NIH grants RO1AI071204 and R21DE021528 (JFM). SW was partially supported by grants 1R43AI088979 and R43AI088863 from NIAID. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Doulatov S, Hodes A, Dai L, Mandhana N, Liu M, et al. Tropism switching in Bordetella bacteriophage defines a family of diversity-generating retroelements. Nature. 2004;431:476–481. doi: 10.1038/nature02833.
- 2. Medhekar B, Miller JF. Diversity-generating retroelements. Curr Opin Microbiol. 2007;10:388–395. doi: 10.1016/j.mib.2007.06.004.
- 3. Liu M, Deora R, Doulatov SR, Gingery M, Eiserling FA, et al. Reverse transcriptase-mediated tropism switching in Bordetella bacteriophage. Science. 2002;295:2091–2094. doi: 10.1126/science.1067467.
- 4. Guo H, Tse LV, Barbalat R, Sivaamnuaiphorn S, Xu M, et al. Diversity-generating retroelement homing regenerates target sequences for repeated rounds of codon rewriting and protein diversification. Mol Cell. 2008;31:813–823. doi: 10.1016/j.molcel.2008.07.022.
- 5. Zimmerly S, Guo H, Perlman PS, Lambowitz AM. Group II intron mobility occurs by target DNA-primed reverse transcription. Cell. 1995;82:545–554. doi: 10.1016/0092-8674(95)90027-6.
- 6. Luan DD, Korman MH, Jakubczak JL, Eickbush TH. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell. 1993;72:595–605. doi: 10.1016/0092-8674(93)90078-5.
- 7. Cost GJ, Feng Q, Jacquier A, Boeke JD. Human L1 element target-primed reverse transcription in vitro. EMBO J. 2002;21:5899–5910. doi: 10.1093/emboj/cdf592.
- 8. McMahon SA, Miller JL, Lawton JA, Kerkow DE, Hodes A, et al. The C-type lectin fold as an evolutionary solution for massive sequence variation. Nat Struct Mol Biol. 2005;12:886–892. doi: 10.1038/nsmb992.
- 9. Miller JL, Le Coq J, Hodes A, Barbalat R, Miller JF, et al. Selective ligand recognition by a diversity-generating retroelement variable protein. PLoS Biol. 2008;6:e131. doi: 10.1371/journal.pbio.0060131.
- 10. Dai X, Greizerstein MB, Nadas-Chinni K, Rothman-Denes LB. Supercoil-induced extrusion of a regulatory DNA hairpin. Proc Natl Acad Sci U S A. 1997;94:2174–2179. doi: 10.1073/pnas.94.6.2174.
- 11. Miller A, Dai X, Choi M, Glucksmann-Kuis MA, Rothman-Denes LB. Single-stranded DNA-binding proteins as transcriptional activators. Methods Enzymol. 1996;274:9–20. doi: 10.1016/s0076-6879(96)74004-1.
- 12. Lu M, Guo Q, Studier FW, Kallenbach NR. Resolution of branched DNA substrates by T7 endonuclease I and its inhibition. J Biol Chem. 1991;266:2531–2536.
- 13. Mizushima T, Kataoka K, Ogata Y, Inoue R, Sekimizu K. Increase in negative supercoiling of plasmid DNA in Escherichia coli exposed to cold shock. Mol Microbiol. 1997;23:381–386. doi: 10.1046/j.1365-2958.1997.2181582.x.
- 14. Witz G, Stasiak A. DNA supercoiling and its role in DNA decatenation and unknotting. Nucleic Acids Res. 2010;38:2119–2133. doi: 10.1093/nar/gkp1161.
- 15. Siregar JJ, Miroshnikov K, Mobashery S. Purification, characterization, and investigation of the mechanism of aminoglycoside 3′-phosphotransferase type Ia. Biochemistry. 1995;34:12681–12688. doi: 10.1021/bi00039a026.
- 16. Liu M, Gingery M, Doulatov SR, Liu Y, Hodes A, et al. Genomic and genetic analysis of Bordetella bacteriophages encoding reverse transcriptase-mediated tropism-switching cassettes. J Bacteriol. 2004;186:1503–1517. doi: 10.1128/JB.186.5.1503-1517.2004.
- 17. Senior MM, Jones RA, Breslauer KJ. Influence of loop residues on the relative stabilities of DNA hairpin structures. Proc Natl Acad Sci U S A. 1988;85:6242–6246. doi: 10.1073/pnas.85.17.6242.
- 18. Dai L, Zimmerly S. Compilation and analysis of group II intron insertions in bacterial genomes: evidence for retroelement behavior. Nucleic Acids Res. 2002;30:1091–1102. doi: 10.1093/nar/30.5.1091.
- 19. Robart AR, Seo W, Zimmerly S. Insertion of group II intron retroelements after intrinsic transcriptional terminators. Proc Natl Acad Sci U S A. 2007;104:6620–6625. doi: 10.1073/pnas.0700561104.
- 20. Granlund M, Michel F, Norgren M. Mutually exclusive distribution of IS1548 and GBSi1, an active group II intron identified in human isolates of group B streptococci. J Bacteriol. 2001;183:2560–2569. doi: 10.1128/JB.183.8.2560-2569.2001.
- 21. Nishino T, Ishino Y, Morikawa K. Structure-specific DNA nucleases: structural basis for 3D-scissors. Curr Opin Struc Biol. 2006;16:60–67. doi: 10.1016/j.sbi.2006.01.009.
- 22. Declais AC, Lilley DM. New insight into the recognition of branched DNA structure by junction-resolving enzymes. Curr Opin Struc Biol. 2008;18:86–95. doi: 10.1016/j.sbi.2007.11.001.
- 23. Ichiyanagi K, Beauregard A, Lawrence S, Smith D, Cousineau B, et al. Retrotransposition of the Ll.LtrB group II intron proceeds predominantly via reverse splicing into DNA targets. Mol Microbiol. 2002;46:1259–1272. doi: 10.1046/j.1365-2958.2002.03226.x.
- 24. Zhong J, Lambowitz AM. Group II intron mobility using nascent strands at DNA replication forks to prime reverse transcription. EMBO J. 2003;22:4555–4565. doi: 10.1093/emboj/cdg433.
- 25. Lambowitz AM, Zimmerly S. Mobile group II introns. Annu Rev Genet. 2004;38:1–35. doi: 10.1146/annurev.genet.38.072902.091600.
- 26. Stainer DW, Scholte MJ. A simple chemically defined medium for the production of phase I Bordetella pertussis. J Gen Microbiol. 1970;63:211–220. doi: 10.1099/00221287-63-2-211.