Abstract
The R1Bm element, found in the silkworm Bombyx mori, is a member of a group of widely distributed retrotransposons that lack long terminal repeats. Some of these elements are highly sequence-specific and others, like the human L1 sequence, are less so. The majority of R1Bm elements are associated with ribosomal DNA (rDNA). R1Bm inserts into 28S rDNA at a specific sequence; after insertion it is flanked by a specific 14-bp target site duplication of the 28S rDNA. The basis for this sequence specificity is unknown. We show that R1Bm encodes an enzyme related to the endonuclease found in the human L1 retrotransposon and also to the apurinic/apyrimidinic endonucleases. We expressed and purified the enzyme from bacteria and showed that it cleaves in vitro precisely at the positions in rDNA corresponding to the boundaries of the 14-bp target site duplication. We conclude that the function of the retrotransposon endonucleases is to define and cleave target site DNA.
Retrotransposons that lack long terminal repeats are very diverse in structure and can insert into a wide variety of different types of DNA targets. Some of these elements, such as the human L1 element, insert into a relatively wide array of targets distributed on all host chromosomes. In contrast, related retroelements from insects and trypanosomes integrate at very specific sequences. The basis for this extreme specificity is in most cases unknown. We recently showed that all of the elements that lack sequence specificity as well as a subset of those that are sequence-specific encode an endonuclease (EN) domain, usually at the N terminus of the second ORF (1). ORF2 also encodes reverse transcriptase (RT), and an attractive model is that the bifunctional ORF2 protein nicks target DNA and then primes reverse transcription of transposon RNA from the target DNA nick. The EN domain resembles apurinic/apyrimidinic (AP) ENs, which are important for DNA repair (2), but the human L1 EN does not cleave at AP sites.
The silkworm Bombyx mori genome contains two sequence-specific retroelements, each of which is inserted at a specific position in 28S rDNA; R1Bm encodes an AP EN-like domain and R2Bm does not. The R2Bm element has a single ORF that nevertheless encodes both an EN activity and an RT activity, and it has served as the best model system for the target-primed reverse transcription (TPRT) model of transposition in these elements (3). In contrast, R1Bm has two ORFs. ORF1 encodes a protein with certain similarities to retroviral Gag proteins, and ORF2 encodes a protein with homology to both RT and an EN (1, 4) (Fig. 1). The retrotransposition of R1Bm has not been studied. The basis for its sequence specificity is unknown; in principle, it could be specified by the EN itself, by the RT, or by host factors. We wished to determine the function of the R1Bm EN and specifically to test the hypothesis that the R1Bm EN specifies target sequence cleavage.
The R1 elements are widely distributed among insect orders and are inserted at precisely the same site in the rDNA of these diverse species (5). Remarkably, in some strains of Drosophila, these insertions interrupt as many as 50–70% of the rDNA copies (6, 7), although insertion into other genomic sites can also occur (8, 9). The basis for the exquisite target specificity of this element was, however, unknown until now.
We expressed the R1Bm EN domain in bacteria, purified the protein, and showed that it is a sequence-specific EN. It cleaves Bombyx rDNA specifically at both boundaries of the 14-bp target site (Fig. 1), indicating that the R1Bm EN defines and cleaves the DNA target for R1Bm. Because R1Bm ORF2 also encodes an RT activity, we propose a TPRT model to explain the R1Bm retrotransposition mechanism.
The existence of a conserved AP EN-like domain in diverse retrotransposons that lack long terminal repeats raised questions about its function. AP ENs have three biochemical activities: endonucleolytic cleavage, RNase H, and 3′ → 5′-exonuclease. Any of these activities, if present in the transposon enzymes, could potentially play a role in retrotransposition: endonucleolytic cleavage in target site definition and cleavage, RNase H in degradation of an RNA/DNA hybrid, and exonuclease in proofreading. We have shown that the EN encoded by the human L1 retrotransposon is not an AP EN but rather a simple nicking enzyme; we found no evidence for RNase H or 3′ → 5′-exonuclease activity. We used a functional assay for L1 retrotransposition (10) to demonstrate that the L1 EN was required for retrotransposition, suggesting an essential function for L1 EN in retrotransposition and arguing against a proofreading function (1). In vitro assays failed to provide any evidence for an RNase H activity in L1 EN (Q.F. and J.D.B., data not shown). The cleavage specificity of the L1 EN is consistent with target site definition and cleavage, but because L1 can insert at thousands of different genomic sites, we had no direct positive evidence for this function of the EN domain in retrotransposition. To obtain such evidence, we studied the R1Bm element, which inserts at a specific sequence. If the role of the EN domain is to recognize and cleave the target DNA, then R1Bm EN should cleave specifically at the boundaries of the R1Bm target site duplication (tsd). In addition, the TPRT model of Luan et al. (3) predicts that the bottom strand of the target site should be cleaved before the top strand, because bottom strand cleavage generates the primer for reverse transcription of the RNA. We show here that R1Bm EN protein indeed recognizes and cleaves the target sequence in vitro.
MATERIALS AND METHODS
Plasmids, Protein Expression, and Purification.
The R1Bm EN domain was amplified by PCR with primers JB1158 (5′-TACCATGGATATTAGGCCCCGAC-3′) and JB1159 (5′-GCCCATGGTACCGCCCCCCACCCC-3′) and p78.Xho-1.9kb (4) as template. Both primers contain an NcoI site (CCATGG) near their 5′ ends. The resulting 670-bp fragment was cloned into pCRII (Invitrogen) and confirmed by DNA sequencing to contain no unwanted mutations. pGS405 was constructed by inserting the 670-bp NcoI fragment from this construct into the NcoI site of expression vector pET-15b (Novagen). Point mutants were created by oligonucleotide-directed mutagenesis with a QuikChange site-directed mutagenesis kit (Stratagene). For E40A, PCR was performed with primers JB1476 (5′-GTTCTTGTACAGGCCCAATATTCCATG-3′) and JB1477 (5′-CATGGAATATTGGGCCTGTACAAGAAC-3′) and Pfu polymerase on a pGS405 template. The PCR product was digested with DpnI and transformed into Escherichia coli. The mutant construct pQF371 was confirmed by DNA sequencing. Similarly, the D186A mutant was made with primers JB1478 (5′-GAATCTTATGTCGCTGTCACGCTGTCT-3′) and JB1479 (5′-AGACAGCGTGACAGCGACATAAGATTC-3′) to generate pQF372. The rDNA target plasmid pB109, kindly provided by T. Eickbush, consists of a 1,055-bp HincII–SpeI fragment of B. mori rDNA cloned between the HincII and XbaI sites of pUC19 (3).
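The primer sequences above can be checked mechanically. The sketch below (plain Python; the helper names are hypothetical and not part of any published tool) confirms that JB1158 and JB1159 each carry an NcoI site (CCATGG) and that each mutagenic pair is an exact reverse-complement pair, as QuikChange-style mutagenesis with two complementary primers requires.

```python
# Sketch: sanity checks on the primer sequences listed in the Methods.
NCOI_SITE = "CCATGG"  # NcoI recognition sequence
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_complementary_pair(fwd: str, rev: str) -> bool:
    """True if rev is exactly the reverse complement of fwd."""
    return reverse_complement(fwd) == rev.upper()


def ncoi_positions(seq: str) -> list[int]:
    """0-based start positions of every NcoI site in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(NCOI_SITE) + 1)
            if seq.startswith(NCOI_SITE, i)]


PRIMERS = {
    "JB1158": "TACCATGGATATTAGGCCCCGAC",
    "JB1159": "GCCCATGGTACCGCCCCCCACCCC",
    "JB1476": "GTTCTTGTACAGGCCCAATATTCCATG",
    "JB1477": "CATGGAATATTGGGCCTGTACAAGAAC",
    "JB1478": "GAATCTTATGTCGCTGTCACGCTGTCT",
    "JB1479": "AGACAGCGTGACAGCGACATAAGATTC",
}

if __name__ == "__main__":
    for name in ("JB1158", "JB1159"):
        print(name, "NcoI site starts at", ncoi_positions(PRIMERS[name]))
    print("E40A pair complementary:",
          is_complementary_pair(PRIMERS["JB1476"], PRIMERS["JB1477"]))
    print("D186A pair complementary:",
          is_complementary_pair(PRIMERS["JB1478"], PRIMERS["JB1479"]))
```

The same two helpers would flag a transcription error in any primer pair before it is ordered.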
Proteins were expressed and purified exactly as previously described (1), except that they were eluted with 0.25 ml of washing buffer containing 150 or 300 mM imidazole. Most of the protein eluted in the 300 mM imidazole fraction. The two fractions were pooled, dialyzed against storage buffer (50 mM Tris⋅HCl, pH 7.6/300 mM NaCl/10% glycerol/10 mM 2-mercaptoethanol), and concentrated. Aliquots of 5–10 μg of protein in storage buffer at 0.2–0.5 μg/μl were stored at −70°C. Freeze–thaw had no apparent effect on protein activity.
Activity Assay.
Optimal salt, divalent cation, temperature, and pH conditions were defined for the R1Bm EN activity based on its ability to nick pB109 (data not shown). The EN reaction mix contained 50 mM Pipes⋅HCl at pH 6.0, 30 mM NaCl, 1 mM CoCl2, 0.2 μg of supercoiled plasmid DNA, and 0.5 μg of purified protein in a total volume of 25 μl. MgCl2 was later shown to substitute for CoCl2. Incubation was at 25°C for 1 h. The reaction was stopped by adding EDTA to a final concentration of 25 mM. Half of the reaction mix was loaded on a 1% agarose gel in TTE (Tris⋅taurine EDTA) buffer containing 0.5 μg/ml ethidium bromide.
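To assemble the 25-μl reaction from concentrated stocks, each volume follows from C1V1 = C2V2, or from mass divided by concentration for the DNA and protein. The stock concentrations below are hypothetical placeholders, not values from the text; the second helper shows how much NaCl the protein's storage buffer (300 mM NaCl) carries into the reaction.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    stock: float           # stock: mM for solutes, ug/ul for DNA and protein
    target: float          # final mM for solutes, total ug for DNA and protein
    by_mass: bool = False  # True when target is an absolute mass in ug


def pipetting_volumes(components: list[Component], total_ul: float) -> dict[str, float]:
    """Volume (ul) of each stock plus the water needed to reach total_ul."""
    vols: dict[str, float] = {}
    for c in components:
        if c.by_mass:
            vols[c.name] = c.target / c.stock             # ug / (ug/ul) -> ul
        else:
            vols[c.name] = c.target * total_ul / c.stock  # C1V1 = C2V2
    water = total_ul - sum(vols.values())
    if water < 0:
        raise ValueError("stocks are too dilute for this reaction volume")
    vols["water"] = water
    return vols


def carried_over_mM(stock_mM: float, added_ul: float, total_ul: float) -> float:
    """Final concentration contributed by a solute present in an added volume."""
    return stock_mM * added_ul / total_ul


# Hypothetical stock concentrations; final concentrations and amounts follow the text.
reaction = [
    Component("Pipes-HCl pH 6.0", stock=1000.0, target=50.0),
    Component("NaCl", stock=1000.0, target=30.0),
    Component("CoCl2", stock=25.0, target=1.0),
    Component("pB109 plasmid", stock=0.1, target=0.2, by_mass=True),
    Component("R1Bm EN", stock=0.25, target=0.5, by_mass=True),
]

if __name__ == "__main__":
    vols = pipetting_volumes(reaction, total_ul=25.0)
    for name, ul in vols.items():
        print(f"{name:18s} {ul:6.2f} ul")
    print("NaCl carried in with the enzyme:",
          carried_over_mM(300.0, vols["R1Bm EN"], 25.0), "mM")
```

With these placeholder stocks, 2 μl of enzyme in storage buffer contributes 24 mM NaCl on its own. The text does not say whether the stated 30 mM includes this carry-over, so the sketch keeps the two quantities separate.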
Cleavage Site Mapping.
End-labeled DNA molecules containing the R1Bm target site were created by PCR with pB109 as a template and a combination of one kinased primer and one unlabeled primer. The sequences of the primers used were: JB1291, 5′-TCCTTACAATGCCAGACTAG-3′; JB1296, 5′-CTTAAGGTAGCCAAATGC-3′; JB1531, 5′-AACGTGAAGAAATTCAAGC-3′; JB1534, 5′-GTTTTTCAGCGACGATCG-3′.
R1Bm EN protein (1–2 μg) was used to digest approximately 100 ng of PCR product in the presence of MgCl2. The protein was inactivated by adding EDTA to a final concentration of 25 mM. Initially, formamide was added to a final concentration of 35%, but this resulted in incomplete denaturation; in subsequent experiments, the samples were precipitated, resuspended in 95% formamide, and boiled for 10 min. The products were run on 6% polyacrylamide DNA sequencing gels alongside size standards consisting of dideoxy sequencing reactions performed with the same radiolabeled primers and the pB109 template. The double-strand cleavage reactions were done exactly as above, except that the products were not mixed with formamide and were run on a nondenaturing polyacrylamide gel.
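Because each cleavage product retains the 5′ label, its length, read against a dideoxy ladder made with the same primer, places the nick directly. The sketch below converts a product length into the pair of nucleotides flanking the nick in a single top-strand coordinate system. The function names and the 1-based coordinate convention are illustrative assumptions, and the sketch ignores the small mobility differences that may arise between enzymatic products and dideoxy-terminated ladder bands.

```python
def nick_from_top_strand_label(label_5p: int, length: int) -> tuple[int, int]:
    """Top strand 5'-labeled at top-strand coordinate label_5p (1-based).
    A labeled product of `length` nt ends at label_5p + length - 1, so the
    nick lies between that nucleotide and the next one."""
    last = label_5p + length - 1
    return last, last + 1


def nick_from_bottom_strand_label(label_5p: int, length: int) -> tuple[int, int]:
    """Bottom strand 5'-labeled opposite top-strand coordinate label_5p.
    The bottom strand runs toward lower top-strand coordinates, so a product of
    `length` nt ends opposite label_5p - length + 1 and the nick lies just below."""
    last = label_5p - length + 1
    return last - 1, last


if __name__ == "__main__":
    # Hypothetical example on a 175-bp fragment labeled at either end.
    print(nick_from_top_strand_label(1, 60))
    print(nick_from_bottom_strand_label(175, 60))
```

Expressing both strands in top-strand coordinates makes it easy to check whether a top strand nick and a bottom strand nick are offset by the expected 14 bp.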
Gel Filtration.
Purified EN protein (24 μg in 200 μl of storage buffer; 4.1 μM) was applied to a Superose-12 HR 10/30 column on a Pharmacia FPLC system. The column was equilibrated with storage buffer, eluted at a flow rate of 0.4 ml/min, and monitored for protein by absorbance at 280 nm. Catalase, aldolase, ovalbumin, and ribonuclease A were used as calibration standards. Values of [−log(Kav)]^1/2 were plotted against the corresponding Stokes radii of the standards. The partition coefficient Kav of each protein was calculated as Kav = (Ve − V0)/(Vt − V0), where Ve is the elution volume of the protein, V0 is the column void volume determined with blue dextran 2000, and Vt is the total bed volume of the column. Fractions were assayed for R1Bm EN protein by immunoblotting with antibody G-18 (Santa Cruz Biotechnology), which recognizes the C-terminal tag.
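The calibration described above can be sketched as follows. It assumes base-10 logarithms, since the base only rescales the ordinate and leaves the relationship linear, and an ordinary least-squares line. The function names are hypothetical, and no elution volumes from this study are used.

```python
import math


def kav(ve: float, v0: float, vt: float) -> float:
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0)."""
    return (ve - v0) / (vt - v0)


def porath_coordinate(k: float) -> float:
    """[-log10(Kav)]^(1/2); defined for 0 < Kav < 1."""
    if not 0.0 < k < 1.0:
        raise ValueError("Kav must lie strictly between 0 and 1")
    return math.sqrt(-math.log10(k))


def fit_line(xs, ys):
    """Ordinary least squares y = a + b*x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def stokes_radius(ve, v0, vt, standards):
    """Interpolate a Stokes radius from standards given as (Ve, radius) pairs,
    using the linear relation between radius and [-log Kav]^(1/2)."""
    xs = [porath_coordinate(kav(v, v0, vt)) for v, _ in standards]
    ys = [radius for _, radius in standards]
    a, b = fit_line(xs, ys)
    return a + b * porath_coordinate(kav(ve, v0, vt))
```

In use, `standards` would hold the measured elution volumes of catalase, aldolase, ovalbumin, and ribonuclease A paired with their Stokes radii, and `ve` the elution volume of the R1Bm EN peak.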
Sedimentation Equilibrium.
The molecular mass of R1Bm EN was examined by analytical ultracentrifugation to determine the multimeric state of EN in solution. The experiments were conducted on a Beckman Optima XL-A analytical ultracentrifuge with an An-60Ti rotor and a standard six-sector cell. A 100-μl sample of 15 μM EN (1.5 nmol) in storage buffer was centrifuged at 4°C at rotor speeds of 12,000 and 15,000 rpm, and equilibrium data were collected at a wavelength of 280 nm. Equilibrium was verified by comparing scans taken at various times up to 24 h. Data were analyzed with nonlin, a program that performs a global nonlinear least-squares fit of sedimentation equilibrium data (11). An extinction coefficient at 280 nm of 20,400 M−1⋅cm−1 was estimated from the amino acid composition of R1Bm EN.
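nonlin performs a global fit. As a much simpler illustration, the sketch below assumes a single ideal species, for which ln c is linear in r²/2 with slope M(1 − v̄ρ)ω²/RT. The partial specific volume v̄, solvent density ρ, and molar masses used here are placeholders, not values from the text. Glycerol in the storage buffer raises ρ, so a real analysis would use the measured buffer density. The Beer–Lambert helper converts the stated 15 μM loading and ε = 20,400 M−1⋅cm−1 into an expected absorbance, assuming a 1.2-cm optical path (a typical 12-mm centerpiece).

```python
import math

R_CGS = 8.314e7  # gas constant, erg mol^-1 K^-1


def omega(rpm: float) -> float:
    """Rotor angular velocity in rad/s."""
    return 2.0 * math.pi * rpm / 60.0


def ideal_profile(r, r0, c0, M, vbar, rho, rpm, T):
    """c(r) for one ideal species (r in cm, M in g/mol):
    c0 * exp[M (1 - vbar*rho) w^2 (r^2 - r0^2) / (2 R T)]."""
    buoyant = M * (1.0 - vbar * rho)
    return c0 * math.exp(buoyant * omega(rpm) ** 2 * (r * r - r0 * r0) / (2.0 * R_CGS * T))


def mass_from_profile(rs, cs, vbar, rho, rpm, T):
    """Molar mass from the least-squares slope of ln c versus r^2 / 2."""
    xs = [r * r / 2.0 for r in rs]
    ys = [math.log(c) for c in cs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope * R_CGS * T / ((1.0 - vbar * rho) * omega(rpm) ** 2)


def absorbance(conc_molar, eps=20400.0, path_cm=1.2):
    """Beer-Lambert law: A = eps * c * l, with eps in M^-1 cm^-1."""
    return eps * conc_molar * path_cm


def oligomer_state(m_observed, m_monomer):
    """Apparent number of subunits; compare with the nearest integer."""
    return m_observed / m_monomer


if __name__ == "__main__":
    # Synthetic round trip with placeholder parameters (not data from this study).
    T, vbar, rho = 277.15, 0.73, 1.0
    rs = [6.90 + 0.01 * i for i in range(20)]
    cs = [ideal_profile(r, rs[0], 0.3, 1.0e5, vbar, rho, 12000, T) for r in rs]
    print(f"recovered M = {mass_from_profile(rs, cs, vbar, rho, 12000, T):.0f} g/mol")
    print(f"expected A280 at 15 uM: {absorbance(15e-6):.3f}")
```

A mixture of oligomers or nonideal behavior bends the ln c versus r² plot, which is one reason a global nonlinear fit across speeds is preferred in practice.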
RESULTS
Expression and Purification of R1Bm EN.
We expressed the R1Bm EN domain (Fig. 1b) in bacteria with a C-terminal His6 tag and purified the protein by nickel chelate affinity chromatography (Fig. 2a). By expressing the EN domain separately from the rest of the ORF2 protein, we could test whether the specificity of retrotransposition was conferred by the EN domain itself. We used a very sensitive plasmid nicking assay to detect endonucleolytic activity of the R1Bm EN protein. We found that it had weak nicking activity on supercoiled plasmids, and we defined conditions under which supercoiled plasmids bearing B. mori rDNA (the in vivo target of the R1Bm transposon) were nicked (Fig. 2b). We optimized cleavage of the rDNA plasmid substrate pB109 by R1Bm EN with respect to monovalent cation, pH, buffer, and temperature, and showed that cleavage absolutely required the divalent cation Mg2+ or Co2+ (data not shown). A control plasmid lacking the target site was also cleaved, so the presence of an R1Bm target site was not required for cleavage. The specific activity of this protein was extremely low, about 1% of the specific activity of L1 EN (1). The R1Bm enzyme, like L1 EN protein, was inactive on apurinic DNA (data not shown). The L1 EN is itself about 20,000-fold less active than DNase I, a distantly related nicking EN (12). Thus the specific activity of the isolated R1Bm EN domain is about 2 × 10⁶-fold lower than that of the digestive enzyme DNase I. We attempted to increase the specific activity of R1Bm EN by expressing lengthened versions (containing amino acid residues 1–230, 1–239, and 1–259), but these proteins instead showed modest decreases in specific activity (data not shown), suggesting that we had not omitted critical amino acid sequences in the design of our original construct. It is formally possible that the low specific activity of the R1Bm EN reflects a low fraction of properly folded molecules rather than an intrinsic property of the enzyme.
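The fold-difference chain in this paragraph is simple arithmetic. An activity of 1% of L1 EN, with L1 EN itself 20,000-fold below DNase I, gives a 100 × 20,000 = 2 × 10⁶-fold deficit relative to DNase I. As a quick check using only the ratios stated in the text:

```python
# Ratios as stated in the text; no new measurements.
r1bm_vs_l1 = 0.01           # R1Bm EN specific activity is ~1% of L1 EN
l1_vs_dnase_i = 1 / 20_000  # L1 EN is ~20,000-fold less active than DNase I

r1bm_vs_dnase_i = r1bm_vs_l1 * l1_vs_dnase_i
fold_lower = 1 / r1bm_vs_dnase_i
print(f"R1Bm EN vs DNase I: about {fold_lower:.1e}-fold lower")
```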
Because of the extremely low levels of enzyme activity detected in our sensitive nicking assay, it was necessary to demonstrate that the detected activity was encoded by R1Bm ORF2. We mutated the highly conserved E40 and D186 residues in our R1Bm EN construct; the corresponding residues are known to be critical for catalysis in pancreatic DNase, exonuclease III (12), human AP EN (13), and the human L1 EN domain (1). Multiple sequence alignments (1, 14) show that these residues are absolutely conserved among retrotransposon ENs as well as in AP ENs. We purified the E40A and D186A mutant proteins and found that they were inactive in our nicking assay (Fig. 2). Thus the observed activity is indeed encoded by R1Bm ORF2.
Cleavage Specificity.
We next tested whether cleavage occurred at specific sequences. We prepared end-labeled 175-bp fragments of B. mori rDNA containing the R1Bm insertion site and incubated them with or without wild-type R1Bm enzyme. The cleavage products were then separated on polyacrylamide DNA sequencing gels alongside sequence standards. On the bottom strand, the most prominent cleavage product corresponded precisely to the left boundary of the 14-bp tsd that R1Bm generates in vivo, whereas on the top strand, a prominent cleavage product corresponded precisely to the right boundary of the tsd (Fig. 3a). Additional cleavage products were observed on the top strand, indicating that cleavage by R1Bm EN protein is not absolutely sequence-specific in vitro. In time course experiments (e.g., Fig. 3b), bottom strand cleavage was faster than top strand cleavage. This is consistent with the TPRT model (3), in which bottom strand cleavage defines the initial target site primer for reverse transcription. Top strand cleavage would then occur subsequently and may be required for priming second-strand synthesis.
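Why cuts at both tsd boundaries produce the duplication can be shown with a simplified string model. The two strands are nicked at staggered positions, an element is joined between the pieces, and the single-stranded gap is filled by copying the template. This is a generic staggered-cut sketch, not the authors' TPRT model. The function name and sequences are hypothetical, and the sketch assumes that the top strand cut lies 5′ of the bottom strand cut.

```python
COMP = str.maketrans("ACGT", "TGCA")


def staggered_insertion(top: str, top_cut: int, bottom_cut: int, element: str) -> str:
    """Cut the top strand at top_cut and the bottom strand at bottom_cut
    (0-based cut points in top-strand coordinates, top_cut < bottom_cut), join the
    element's top strand between the pieces, and fill the single-stranded gap by
    copying the template. Returns the final top strand."""
    if not 0 < top_cut < bottom_cut < len(top):
        raise ValueError("expected 0 < top_cut < bottom_cut < len(top)")
    bottom = top.translate(COMP)  # bottom strand, written base-by-base under the top
    left_top = top[:top_cut]      # left duplex: bottom strand overhangs to bottom_cut
    right_top = top[top_cut:]     # right duplex: top strand overhangs back to top_cut
    # Gap repair on the left piece copies the single-stranded bottom overhang.
    fill = bottom[top_cut:bottom_cut].translate(COMP)
    return left_top + fill + element + right_top


if __name__ == "__main__":
    target = "AAAA" + "CGTACGTACGTACG" + "TTTT"  # 14-bp segment between the cuts
    print(staggered_insertion(target, 4, 18, "GGGGG"))
```

The segment between the two nicks ends up on both sides of the element, so the duplication length equals the nick spacing (14 bp for R1Bm).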
We also investigated the specificity of cleavage on other substrates, including a fragment 100 bp longer than the above substrate, generated with primers JB1531 and JB1534. Sequences other than the preferred target can clearly be cleaved, because vector lacking the rDNA insert is nicked to about the same extent as the plasmid containing the rDNA target site. The R1Bm in vivo target site was not consistently the most efficiently cleaved site in every fragment tested, and other cleavage sites (lacking obvious sequence similarity to the target site) were sometimes equally prominent, as in the 300-bp substrate mentioned above (data not shown). Thus the nature of the flanking sequences appears to affect the relative rates of cleavage at various sites. We conclude that R1Bm EN is a sequence-specific nuclease whose specificity can be modulated by flanking sequences.
Double-Strand Cleavage and Evidence for Multimerization.
We next examined whether R1Bm EN could mediate double-strand cleavage. Although only a small fraction of the substrate was nicked on each strand under the conditions used in Fig. 3, similarly digested products run on a nondenaturing gel included species with electrophoretic mobilities consistent with double-strand cleavage at the tsd boundaries (Fig. 4). The ability of the enzyme to make a double-strand break suggested two possibilities: (i) a monomer might make both cleavages, or (ii) the enzyme might be multimeric. We examined the latter possibility by gel-filtration chromatography and equilibrium sedimentation and found that bulk R1Bm EN protein indeed behaves as a multimer, probably a tetramer (Fig. 5). Thus the fragment of the ORF2 protein we purified contains both an EN active site and a multimerization domain. Because of the low activity of the enzyme, it was not possible to determine unambiguously whether the tetramer represented the active form of the enzyme, so it remains formally possible that the active species is monomeric. Nevertheless, evidence of tetramerization in a retrotransposon EN is interesting, because other enzymes involved in integration, including retroviral integrases and Mu transposase, have multimeric active forms (15–17).
DISCUSSION
We recently showed that the human retrotransposon L1, which inserts into many sites in human DNA, encodes an EN with nicking activity (1). L1 EN also nicks specific DNA sites, but with less specificity than R1Bm EN: it prefers to nick A+T-rich sequences conforming to the rather degenerate consensus Yn ↓ Rn. Although L1 EN in vitro cleavage sites resemble the sites of TPRT inferred from the sequences of various L1 in vivo transposition events, the similarities were restricted to runs of one to a few consecutive purines immediately 3′ to the cleavage site. Because ENs of this general class (such as E. coli Exo III) have ribonuclease H activity (13), and because ribonuclease H activity could in principle be important for retrotransposition, Barzilay and Hickson (2) proposed that these ENs are ribonucleases rather than target site definition ENs. Our finding that R1Bm EN cleaves with sequence specificity precisely at the boundaries of the R1Bm tsd provides the strongest evidence yet that the critical role of the retrotransposon ENs is instead to define and cleave the target DNA. Furthermore, the sluggish specific activity of R1Bm EN is consistent with target site cleavage, which requires only two cleavages. In addition, because retrotransposition is predicted to take place in the nucleus, cellular RNase H could carry out this degradative role, unlike the case for retroviruses and LTR retrotransposons, which carry out reverse transcription in the cytoplasm. However, it remains formally possible that the R1Bm EN has multiple functions in retrotransposition. Finally, recent studies of retrotransposon Tx1, a Xenopus element proposed to be specific for a sequence within a small DNA transposon (18), have provided independent support for our findings on R1Bm EN: the Tx1 EN domain was shown to have a precise, sequence-specific bottom strand nicking activity in vitro on a substrate containing the putative Tx1 target DNA (S. Christensen and D. Carroll, University of Utah, personal communication).
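The degenerate L1 EN preference described above can be illustrated as a scan of one strand for a pyrimidine run immediately followed by a purine run, with an optional A+T filter on that window. The run lengths and A+T threshold are hypothetical parameters, not values from the text. Scanning the other strand requires applying the same function to the reverse complement.

```python
import re


def yr_junctions(seq: str, min_y: int = 3, min_r: int = 2, min_at: float = 0.0) -> list[int]:
    """0-based nick positions k (between seq[k-1] and seq[k]) where at least min_y
    pyrimidines (C/T) are immediately followed by at least min_r purines (A/G), and
    the A+T fraction of that min_y + min_r window is at least min_at."""
    seq = seq.upper()
    # Zero-width lookahead so that overlapping candidate windows are all examined.
    pattern = re.compile(rf"(?=[CT]{{{min_y}}}[AG]{{{min_r}}})")
    hits = []
    for m in pattern.finditer(seq):
        start = m.start()
        window = seq[start:start + min_y + min_r]
        at_fraction = sum(base in "AT" for base in window) / len(window)
        if at_fraction >= min_at:
            hits.append(start + min_y)  # nick at the Y/R junction
    return hits


if __name__ == "__main__":
    print(yr_junctions("GGTTTTAAGC", min_at=1.0))
    print(yr_junctions("CCCGG"), yr_junctions("CCCGG", min_at=0.5))
```

A match requires the last `min_y` pyrimidines to sit directly against the purine run, so a long pyrimidine tract produces one junction rather than several.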
The fact that R1Bm EN can make paired cleavages at each end of the tsd suggests a double TPRT model for the complete R1Bm retrotransposition process (Fig. 6). The multimeric state of the EN we expressed is consistent with the possibility that the bottom and top strand cleavages are made by independent subunits of a multimer of ORF2 protein. In this model, a multimer of bifunctional ORF2 proteins (each monomer containing both EN and RT activities) first nicks the bottom strand of the target site DNA and then uses the 3′ end generated to prime reverse transcription of R1Bm RNA. In support of this model, we have expressed in E. coli a larger fragment of R1Bm ORF2 that includes the region homologous to RT and shown that this protein has RT activity in vitro (Q.F. and J.D.B., data not shown). The detailed structure of R1Bm RNA is unknown, but a low level of cotranscription (readthrough) with rRNA (corresponding to fewer than one such transcript per cell) has been reported (19). Such readthrough transcripts would have rRNA sequences flanking the transposon sequences. Target sequences flanking the element's own sequences in its RNA would be expected to increase the precision of a TPRT mechanism, as has been shown for R2Bm (20). Sequence complementarity between the cotranscript RNA and target DNA should increase the precision of retrotransposition and could explain how sequence-specific insertion can be achieved even without precise cleavage at the top strand tsd boundary (D. Carroll, personal communication). A simple series of nicking and priming events and a strand-transfer event could readily explain the observed R1Bm structure (Fig. 6). Experiments are under way to understand the mechanism of this unique retrotransposition pathway more completely.
Acknowledgments
We are especially grateful to Tom Eickbush for freely providing clones and useful information, and to an anonymous reviewer for insightful observations. We thank Shawn Christensen and Dana Carroll for communicating unpublished data. We thank Shani Waninger, Dyche Mullins, David Symer, and Cynthia Wolberger for assistance with analytical ultracentrifugation, and Tom Kelly for helpful comments on the manuscript. This work was funded in part by National Institutes of Health Grant CA16519.
ABBREVIATIONS
- EN: endonuclease
- RT: reverse transcriptase
- AP: apurinic/apyrimidinic
- TPRT: target-primed reverse transcription
- tsd: target site duplication
References
- 1. Feng Q, Moran J, Kazazian H, Boeke J D. Cell. 1996;87:905–916. doi: 10.1016/s0092-8674(00)81997-2.
- 2. Barzilay G, Hickson I D. BioEssays. 1995;17:713–719. doi: 10.1002/bies.950170808.
- 3. Luan D D, Korman M H, Jakubczak J L, Eickbush T H. Cell. 1993;72:595–605. doi: 10.1016/0092-8674(93)90078-5.
- 4. Xiong Y, Eickbush T H. Mol Cell Biol. 1988;8:114–123. doi: 10.1128/mcb.8.1.114.
- 5. Jakubczak J L, Burke W D, Eickbush T H. Proc Natl Acad Sci USA. 1991;88:3295–3299. doi: 10.1073/pnas.88.8.3295.
- 6. Wellauer P K, Dawid I B. Cell. 1977;10:193–212. doi: 10.1016/0092-8674(77)90214-8.
- 7. Hollocher H, Templeton A R, DeSalle R, Johnston J S. Genetics. 1992;130:355–366. doi: 10.1093/genetics/130.2.355.
- 8. Dawid I B, Botchan P. Proc Natl Acad Sci USA. 1977;74:4233–4237. doi: 10.1073/pnas.74.10.4233.
- 9. Xiong Y, Burke W D, Jakubczak J L, Eickbush T H. Nucleic Acids Res. 1988;16:10561–10573. doi: 10.1093/nar/16.22.10561.
- 10. Moran J V, Holmes S E, Naas T P, DeBerardinis R J, Boeke J D, Kazazian H H Jr. Cell. 1996;87:917–927. doi: 10.1016/s0092-8674(00)81998-4.
- 11. Johnson M L, Correia J J, Yphantis D A, Halvorson H R. Biophys J. 1981;36:575–588. doi: 10.1016/S0006-3495(81)84753-4.
- 12. Mol C D, Kuo C-F, Thayer M M, Cunningham R P, Tainer J A. Nature (London). 1995;374:381–386. doi: 10.1038/374381a0.
- 13. Barzilay G, Walker L J, Robson C N, Hickson I D. Nucleic Acids Res. 1995;23:1544–1550. doi: 10.1093/nar/23.9.1544.
- 14. Martín F, Marañon C, Olivares M, Alonso C, Lopez M C. J Mol Biol. 1995;247:49–59. doi: 10.1006/jmbi.1994.0121.
- 15. Engelman A, Bushman F D, Craigie R. EMBO J. 1993;12:3269–3275. doi: 10.1002/j.1460-2075.1993.tb05996.x.
- 16. Aldaz H, Schuster E, Baker T A. Cell. 1996;85:257–269. doi: 10.1016/s0092-8674(00)81102-2.
- 17. Jones K S, Coleman J, Merkel G W, Laue T M, Skalka A M. J Biol Chem. 1992;267:16037–16040.
- 18. Garrett J E, Knutzon D S, Carroll D. Mol Cell Biol. 1989;9:3018–3027. doi: 10.1128/mcb.9.7.3018.
- 19. Long E O, Dawid I B. Cell. 1979;18:1185–1196. doi: 10.1016/0092-8674(79)90231-9.
- 20. Luan D, Eickbush T H. Mol Cell Biol. 1995;15:3882–3891. doi: 10.1128/mcb.15.7.3882.