Proceedings of the National Academy of Sciences of the United States of America. 2004 Oct 1;101(41):14719–14724. doi: 10.1073/pnas.0406281101

Reverse transcriptase and endonuclease activities encoded by Penelope-like retroelements

Konstantin I Pyatkov, Irina R Arkhipova, Natalia V Malkova, David J Finnegan, Michael B Evgen'ev
PMCID: PMC522041  PMID: 15465912

Abstract

Penelope-like elements are a class of retroelement that have now been identified in >50 species belonging to at least 10 animal phyla. The Penelope element isolated from Drosophila virilis is the only transpositionally active representative of this class isolated so far. The single ORF of Penelope and its relatives contains regions homologous to a reverse transcriptase of atypical structure and to the GIY-YIG, or Uri, endonuclease (EN) domain, which has not previously been found in retroelements. We have expressed the single ORF of Penelope in a baculovirus expression system and have shown that it encodes a polyprotein with reverse transcriptase activity that requires divalent cations (Mn2+ and Mg2+). We have also expressed and purified the EN domain in Escherichia coli and have demonstrated that it has EN activity in vitro. Mutations in the conserved residues of the EN catalytic module abolish its nicking activity, whereas the DNA-binding properties of the mutant proteins remain unaffected. Only one strand of the target sequence is cleaved, and there is a certain degree of cleavage specificity. We propose that the Penelope EN cleaves the target DNA during transposition, generating a primer for reverse transcription. Our results show that an active Uri EN has been adopted by a retrotransposon.

Keywords: GIY-YIG endonuclease, retrotransposons, Drosophila virilis, Uri domain


There are two major classes of autonomous retrotransposons, the long terminal repeat (LTR) retrotransposons and the non-LTR retrotransposons, also known as LINEs. These retrotransposons are present in various amounts in genomes of most animals and plants studied so far and can be distinguished on the basis of their sequence organization and mode of transposition (1). Many new retrotransposon families have been discovered as eukaryotic genomes were being sequenced, and most have turned out to be representatives of the previously described retrotransposon classes, although noncanonical groups have also been identified (2). The Penelope-like elements (PLEs) are highly unusual elements discovered in Drosophila and other organisms. Penelope was isolated from Drosophila virilis after the discovery of hybrid dysgenesis in this species (3, 4). This phenomenon is observed when females from strains lacking Penelope are crossed with males carrying multiple active Penelope copies. Elements from several different families, both transposons and retrotransposons, transpose in the progeny of such a cross, with Penelope apparently playing a key role in mobilizing the others (4). PLEs have been reported in genomes of crustaceans, echinoderms, fish, amphibians, flatworms, roundworms, and rotifers (5–8). Phylogenetic analysis indicates that the reverse transcriptases (RTs) of Penelope and other PLEs do not belong to any of the characterized major RT lineages, being closer to telomerase RTs (TERTs) than to any other RT (8). The C-terminal domain of the Penelope polyprotein is predicted to encode an endonuclease (EN) of the Uri family that could function as an integrase. This family includes bacterial/organellar group I intron-encoded GIY-YIG ENs and UvrC excision repair ENs (5, 9). To our knowledge, no retroelements containing a predicted EN of this family have been characterized previously.

Our recent investigation of PLEs in D. virilis and bdelloid rotifers revealed that the majority of PLE genomic copies in these species contain spliceosomal introns (8). The GIY-YIG domain of PLEs is of special interest in this respect because of its potential to participate in a DNA-dependent transposition pathway similar to that used by group I and occasionally group II introns (10, 11). The ability of PLEs to retain introns during transposition, their peculiar structural organization, and the unusual sequence of their RT domain indicate that the members of this clade are clearly different from both LTR and non-LTR retrotransposons and constitute a third, probably very ancient, class of eukaryotic retroelement (8). The mechanism by which PLEs transpose is unknown, and from the amino acid sequence of the highly divergent RT it is not clear whether it can function as a RT, given the ability of PLEs to retain introns. The predicted EN domain is also highly diverged from that of other Uri domain proteins. We report here that the Penelope RT and EN domains do indeed have their predicted enzyme activities, providing experimental evidence that a Uri EN has been adopted by a retrotransposon.

Materials and Methods

DNA Manipulations. Full-length Penelope ORF was amplified by PCR from the clone p6 (4, 12) (Fig. 1A) with primer pair A and cloned in NcoI–XhoI sites of the pFastBac HT A vector (Bac-to-Bac baculovirus expression system, GIBCO/BRL/Life Technologies, Grand Island, NY). This plasmid was used to generate the recombinant virus BacPenORF. All primers are listed in Table 1, which is published as supporting information on the PNAS web site. The EN domain was amplified from p6 with primer pair B and cloned in NdeI–XhoI sites of pET19mod (Novagen). This plasmid, pET-PenEndo, was used to make EN mutants with the aid of QuikChange Site-Directed Mutagenesis Kit (Stratagene) and primer pairs C–H. The ps1 integration site was amplified from genomic DNA of a D. virilis M-like strain lacking Penelope with primer pair I complementary to the flanking sequences in the p1 genomic clone (4). The resulting 391-bp DNA fragment (GI:17432717) was cloned into the SmaI site of pUC19. Fragments PB, 1B, 2B, and 3B, containing 366, 106, 92, and 80 bp of Penelope sequence, respectively, were amplified from p6 with primer pairs J–M and cloned into CaSpeR-AUG-β-gal (13).

Fig. 1.


Structure of Penelope and purification of its ORF-encoded activities. (A) The transpositionally active p6 Penelope copy (12). The 5′ fragment of cDNA with the splice donor/acceptor sites for the 75-bp intron is shown above the diagram. The position of the 34-bp tail in the 5′ terminal part of Penelope is shown by a gray circle outlined in black, and its position at the 3′ ends of other copies of Penelope (but not p6) is shown by a gray circle with no outline. (B) Polyacrylamide gel electrophoresis of protein extracts from S. frugiperda cells infected by BacPenORF at different stages of multistep chromatography: 1, crude extract; 2, phosphocellulose P11; 3, heparin-Sepharose; 4, Mono S; and 5, 50-kDa Centricon. (C) Polyacrylamide gel electrophoresis of His6-tagged Penelope EN after purification by metal-chelate chromatography: 1, IPTG-induced E. coli lysate; 2, flowthrough after Ni-NTA column; 3, wash with 50 mM imidazole; and 4, elution with 250 mM imidazole. Arrows indicate the position of 93-kDa (B) and 22.4-kDa (C) proteins.

Expression and Purification of Penelope Full-Length Protein. Penelope ORF was expressed in Spodoptera frugiperda (strain Sf21) with the recombinant BacPenORF virus. Virus growth, titer determination, and infection of Sf21 cells were carried out according to instructions of the Bac-to-Bac system (see above). Cells (1.5 × 10^9) were infected at multiplicity of infection 50, incubated for 72 h at 28°C in 500-ml flasks, and centrifuged at 1,000 × g for 15 min at 4°C. The pellet was washed with ice-cold PBS and centrifuged as above, and cells were lysed in 20 ml of A0.1 buffer (25 mM Hepes, pH 7.3/100 mM NaCl/5% glycerol/1 mM DTT/0.02% Nonidet P-40; A0.1 means 100 mM NaCl). The pellet was resuspended in 20 ml of A0.1 buffer and sonicated on ice, the lysate was centrifuged at 40,000 × g for 1 h, and the crude extract was loaded onto a 5-ml phosphocellulose P11 column (Whatman) equilibrated in A0.1 buffer. The flowthrough containing the RT activity was loaded onto 1 ml of heparin-Sepharose CL-6B (Amersham Pharmacia) equilibrated in A0.1 buffer and washed with 10 ml of A0.1 buffer, and the protein was eluted with an increasing gradient of NaCl in A0.1-1 buffer. RT-containing fractions eluted at 0.35–0.45 M NaCl were collected, dialyzed against A0.05 buffer, applied to the Mono S HR5/5 column (FPLC, Amersham Pharmacia), equilibrated in A0.05 buffer, and eluted with an increasing gradient of NaCl in A0.05-1 buffer. Active fractions were concentrated with 50-kDa Centricon (Millipore), dialyzed against A buffer containing 50% glycerol, aliquoted, and stored at –20°C.

RT Assays. RT activity was measured in a 20-μl standard reaction mix containing 50 mM Tris·HCl (pH 8.0), 60 mM KCl, 10 mM MgCl2, 10 mM DTT, 0.1% Nonidet P-40, 0.5 mM MnSO4, 0.04 mM dTTP, 0.1 A260 unit of poly(rA)·(dT)12, 0.5 μCi of [3H]dTTP (1 Ci = 37 GBq), and the protein. Reactions were incubated for 20 min at 30°C, and aliquots were removed, spotted on DE-81 paper (Whatman), washed five times for 5 min in 2× standard saline citrate (SSC), and dried, and radioactivity was measured by liquid scintillation counting. As a negative control, activity was measured in a reaction mix containing either 0.1 A260 unit of poly(rA) or 5 pmol of (dT)12. No incorporation of [3H]dTTP or [3H]TTP was observed in either reaction. RT activity was also assayed with the Penelope primer AATATTTATTTGTTGGCTGGCTCGA (position 3391) and short (Ps, 277-nt, position 3111, gaaattaatacgactcactatagggagaGCGCTCACTTGTACAGACACC) or long (Pl, 2,779-nt, position 614, gaaattaatacgactcactatagggagaAATATGGAAAGGTCGCCAG) Penelope templates synthesized by T7 RNA polymerase in vitro (14). T7 promoter sequence is in lowercase, and homology to p6 (GI:15559193) begins at indicated positions.
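The layout of the two template primers described above (a lowercase T7 promoter followed by uppercase Penelope homology) can be checked with a short script. This is only an illustrative sketch: the sequences are copied from the text, whereas the names `TEMPLATE_PRIMERS` and `split_promoter` are hypothetical helpers introduced here and are not part of the original protocol.

```python
# Primer sequences copied from the RT assay description; the lowercase prefix
# is the T7 promoter and the uppercase suffix is homologous to Penelope p6.
TEMPLATE_PRIMERS = {
    "Ps": "gaaattaatacgactcactatagggagaGCGCTCACTTGTACAGACACC",
    "Pl": "gaaattaatacgactcactatagggagaAATATGGAAAGGTCGCCAG",
}


def split_promoter(primer: str) -> tuple[str, str]:
    """Split a primer into its lowercase (promoter) and uppercase (homology) parts."""
    # The first uppercase base marks the start of the Penelope homology.
    boundary = next(i for i, base in enumerate(primer) if base.isupper())
    return primer[:boundary], primer[boundary:]


parts = {name: split_promoter(seq) for name, seq in TEMPLATE_PRIMERS.items()}
for name, (promoter, homology) in parts.items():
    print(f"{name}: promoter {len(promoter)} nt, Penelope homology {len(homology)} nt")
```

Both primers should yield the same promoter segment, which is a quick way to catch a transcription error in either sequence.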

Purification and Renaturation of Penelope EN. E. coli strain Bl21(DE3)/pRIL/pET-PenEndo was grown at 37°C in LB until OD600 reached 0.6, induced by 1 mM isopropyl β-d-thiogalactoside (IPTG), and incubated for 5 h. Cells from 0.5 liter of culture were harvested by centrifugation, washed in cold 50 mM Tris·HCl (pH 8.0), resuspended in 15 ml of buffer B (50 mM NaH2PO4, pH 8.0/10 mM EDTA/1 mM PMSF/2 μg/ml pepstatin/2 μg/ml leupeptin), and sonicated on ice. The lysate was centrifuged at 30,000 × g for 30 min at 4°C to pellet the inclusion bodies. The pellet was resuspended in 15 ml of buffer C (50 mM NaH2PO4, pH 8.0/2 M urea/1 mM PMSF/2 μg/ml pepstatin/2 μg/ml leupeptin) and centrifuged as above. Inclusion bodies were lysed in 5 ml of L buffer (100 mM NaH2PO4, pH 8.0/10 mM Tris·HCl/300 mM NaCl/8 M urea/10 mM imidazole) for 2 h at room temperature. The lysate was centrifuged at 30,000 × g for 30 min, loaded onto 1 ml of Ni-NTA Superflow column (Qiagen, Valencia, CA) equilibrated in L buffer, washed with 10 ml of L buffer and 10 ml of W buffer (L buffer with 50 mM imidazole), eluted with 2 ml of E buffer (L buffer with 250 mM imidazole), and dialyzed against 100 vol of D buffer (20 mM NaH2PO4, pH 7.1/4 M urea/150 mM NaCl/20 mM 2-mercaptoethanol/5% glycerol/1 mM EDTA/0.02% Triton X-100) for 12 h at room temperature. The enzyme was refolded by using size exclusion chromatography on Sephacryl S-300 (Amersham Pharmacia). The S-300 column K9/60 was equilibrated in R buffer (20 mM NaH2PO4, pH 7.1/150 mM NaCl/5 mM 2-mercaptoethanol/5% glycerol/0.5 mM EDTA/0.02% Triton X-100). The protein (0.5 ml, 1–3 mg/ml) was loaded onto the column and eluted with R buffer at a flow rate of 0.3 ml/min. The refolded protein was eluted in the volume corresponding to 20–30 kDa, concentrated with 10-kDa Centricon, dialyzed against G buffer (20 mM NaH2PO4, pH 7.1/50 mM NaCl/5 mM 2-mercaptoethanol/50% glycerol/0.5 mM EDTA/0.02% Triton X-100) for 12 h, aliquoted, and stored at –20°C.

EN Activity Assays. Primer pair N, corresponding to the ps1 insertion site, was annealed, extended by the Klenow fragment of DNA polymerase I with [32P]dATP, and precipitated with ethanol. A 10-μl standard reaction mix containing 10 mM Tris·HCl (pH 7.5), 2.5 mM MgCl2, 0.5 mM DTT, 0.02% Triton X-100, 0.1 pmol of 32P-labeled DNA, and 0.1 μg of protein was incubated for 2 h at 30°C, mixed with 3 μl of formamide loading dye (95% formamide/20 mM EDTA/0.05% bromophenol blue/0.05% xylene cyanol) and analyzed by electrophoresis on a 10% polyacrylamide/8 M urea gel. For ligation assays, the 32P-labeled ps1 fragment was digested with WT Penelope EN as above, quantitatively precipitated by 0.1 ml of 1-butanol, washed twice with 80% ethanol, dried, and dissolved in 9.5 μl of 1× T4 DNA ligase buffer. After addition of 0.5 Weiss unit of T4 DNA ligase (Fermentas, Hanover, MD), reactions were incubated at 16°C and analyzed by denaturing gel electrophoresis as above.

DNA-Binding Assays. DNA-binding activity was assayed in a 5-μl standard reaction mix containing 5 mM Tris·HCl (pH 7.5), 0.5 mM DTT, 0.5 mM EDTA, 0.02% Triton X-100, 0.1 μg of poly(dI)·poly(dC) homopolymers, 0.1 pmol of 32P-labeled DNA, and 0.1 μg of enzyme, incubated for 20 min at 25°C, mixed with 1 μl of loading dye (1× binding buffer with 50% glycerol/1 mM EDTA/0.05% bromophenol blue/0.05% xylene cyanol), and analyzed by electrophoresis on a 10% polyacrylamide gel in 0.3× TBE (1× TBE = 89 mM Tris/89 mM boric acid/2 mM EDTA, pH 8.3) at 4°C.

Determination of the Cleavage Sites. The top and bottom strands of the 34-bp ps1 fragment were labeled independently by using primer pairs O and P. Each pair was annealed, extended by the Klenow fragment of DNA polymerase I with [32P]dATP, and divided into two parts, one of which was used to prepare a 1-bp sequence ladder by chemical degradation (15). The other part was precipitated with ethanol, incubated for 2 h at 30°C with 0.1 μg of protein in a 10-μl reaction mix containing 10 mM Tris·HCl (pH 7.5), 2.5 mM MgCl2, 0.5 mM DTT, 0.02% Triton X-100, and 0.1 pmol of 32P-labeled DNA, precipitated with ethanol, heat-denatured in formamide loading dye, and run along with the sequence ladder on a 60-cm 10% polyacrylamide/urea gel.

Results

Sequence Conservation of RT and EN Domains. A comparison of RT domains from several PLEs with the well studied RT of non-LTR retrotransposons and LTR retrotransposons, group II introns, retrons, and TERTs demonstrates that the homology encompasses only the most conserved RT core motifs 1–7 (8). To determine the degree of RT conservation between different PLEs, we obtained additional PLE sequences from databases. In all of these elements, the core RT motifs carry noncanonical residues, which are conserved between PLEs but differ characteristically from their counterparts in other RTs (such as AMG instead of the usual PFG/RQG in motif 4, the absence of a G residue in motif 6, FLD instead of hLG in motif 7, and extremely tight spacing between motifs 6 and 7) (Fig. 5 Upper, which is published as supporting information on the PNAS web site). Conservation of the seven core RT motifs argues in favor of its ability to function as an RNA-dependent DNA polymerase, although the enzyme may exhibit some unusual properties because of noncanonical highly conserved residues in the most essential catalytic domains.

Alignment of the C-terminal EN domain of the ORFs encoded by a number of eukaryotic PLEs (Fig. 5 Lower) shows that it shares with other members of the GIY-YIG family the most conserved residues thought to constitute the active site (5, 16, 17). In particular, the Penelope EN contains residues (Y730, Y746, G748, R757, E800, and N816) presumably corresponding to invariant or highly conserved residues required for catalytic activity of the prototype GIY-YIG EN I-TevI from bacteriophage T4 (Y6, Y17, G19, R27, E75, and N90). This finding allowed us to model the 3D structure of the Penelope EN domain with I-TevI as a template (Fig. 6, which is published as supporting information on the PNAS web site). Overall, the structure fits very well to its template, the primary differences being an extended loop between β-strands 1 and 2 and an apparent deletion between β-strand 3 and α-helix 3 leading to shortening of the helix by 1.5 turns and of the β-sheet by 5 amino acid residues. Notably, helix 1 exhibits a strong positive charge in both molecules, with side chains of five lysine residues exposed on the surface of the Penelope EN. This feature is thought to be responsible for interaction with DNA through phosphate backbone contacts (17). The conserved N816 at the C terminus is able to form hydrogen bonds with I747 in the backbone of β-strand 2, thus, possibly keeping helix 3 in place. The absence of major inconsistencies in the modeled structure suggests that it can adopt the same conformation as I-TevI, indicating that their catalytic properties may be similar.
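The residue correspondence listed above can be tabulated to show how the numbering offset between the Penelope EN and I-TevI drifts along the domain. The sketch below uses only the residue pairs given in the text; the function name `numbering_offsets` is a hypothetical helper, and reading a change in offset as a relative insertion or deletion is an assumption of this sketch, not a substitute for the alignment in Fig. 5.

```python
# Residue pairs as listed in the text: (amino acid, Penelope EN position, I-TevI position).
CORRESPONDENCE = [
    ("Y", 730, 6),
    ("Y", 746, 17),
    ("G", 748, 19),
    ("R", 757, 27),
    ("E", 800, 75),
    ("N", 816, 90),
]


def numbering_offsets(pairs):
    """Return (Penelope residue label, Penelope position minus I-TevI position)."""
    return [(f"{aa}{pen}", pen - tev) for aa, pen, tev in pairs]


offsets = numbering_offsets(CORRESPONDENCE)
for label, offset in offsets:
    print(f"{label}: offset {offset}")
```

Under that assumption, the offset rises by 5 between Y730 and Y746 and falls by 5 between R757 and E800, which would be compatible with the extended loop and the deletion described for the model, although the paired residues alone cannot locate either event.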

Characterization of Penelope-Encoded RT Activity. We chose the full-length ORF of D. virilis Penelope (Fig. 1A) for studies of PLE RT activity because this element can transpose after interspecific transformation (12). Initial detection of RT activity was performed in cell-free extracts of S. frugiperda infected by recombinant baculovirus BacPenORF. A significant rate of [3H]dTTP incorporation was observed in the presence of poly(A)·oligo(dT)15, whereas no incorporation was detected in extracts from cells infected by the control virus (data not shown). The protein was purified by multistep chromatography (Fig. 1B). RT activity always copurified with the protein and was the highest after the final purification step.

Fig. 2 summarizes the results of analysis of various enzymatic properties of Penelope RT. Initial determinations of optimal conditions for RT assays were done by using different preincubation and incubation times, [3H]dTTP concentrations, incubation temperatures, and template/primers (Fig. 2 A–D). The temperature optimum of Penelope RT lies between 25°C and 30°C (Fig. 2C), like that of other insect RTs (18, 19). In subsequent experiments, we used 30°C as incubation temperature and 20 min as incubation time because the enzyme was not sufficiently stable to justify longer incubation. The activity was also assayed with different template/primer combinations, such as short (277 bases) and long (2,779 bases) in vitro transcripts of Penelope and poly(rA)·oligo(dT) (Fig. 2D). The homopolymeric template was chosen for further comparisons because of the possibility of secondary self-priming of newly synthesized Penelope fragments.

Fig. 2.


Enzymatic properties of RT encoded by Penelope ORF. Activity was measured as incorporation (I) of [3H]dTTP, pmol/μg protein. Unless otherwise indicated, standard reaction conditions described in Materials and Methods were used. (A) RT activity at increasing incubation times (solid line) and at increasing preincubation times without [3H]dTTP followed by 20-min incubation at 30°C after [3H]dTTP addition (dashed line). (B) Dependence of RT activity on [3H]dTTP concentration. (C) RT activity at different temperatures. (D) RT activity on different templates. Ps and Pl, short and long Penelope transcripts, respectively; pA, poly(rA)·oligo(dT). (E) RT activity in the presence of monovalent and divalent cations: 1, requirements for K+, Mg2+, and Mn2+; 2, replacement of K+ by different monovalent cations; 3, replacement of Mn2+ by 5 mM NiCl2, ZnCl2, or CaCl2.
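For readers who wish to relate scintillation counts to the pmol/μg units used here, the conversion can be sketched from the standard reaction composition given in Materials and Methods (0.04 mM dTTP in 20 μl, 0.5 μCi of [3H]dTTP). The sketch assumes that the mass of the labeled nucleotide is negligible relative to the unlabeled pool and treats counting efficiency as a free parameter. The function names and the example counts are hypothetical and are not taken from the experiments reported here.

```python
# 1 Ci = 37 GBq = 3.7e10 decays/s, so 1 uCi = 3.7e4 decays/s = 2.22e6 decays/min.
DPM_PER_MICROCURIE = 2.22e6


def dttp_pmol_in_reaction(conc_mM: float = 0.04, volume_ul: float = 20.0) -> float:
    """Total dTTP in the reaction in pmol (mM x ul gives nmol; x 1000 gives pmol)."""
    return conc_mM * volume_ul * 1000.0


def dpm_per_pmol(label_uCi: float = 0.5) -> float:
    """Specific activity of the dTTP pool, assuming the label adds negligible mass."""
    return label_uCi * DPM_PER_MICROCURIE / dttp_pmol_in_reaction()


def incorporation_pmol_per_ug(sample_cpm: float, protein_ug: float,
                              counting_efficiency: float = 1.0) -> float:
    """Convert measured counts to pmol dTTP incorporated per ug of protein.

    counting_efficiency is an assumed parameter (cpm / dpm); 1.0 means ideal counting.
    """
    sample_dpm = sample_cpm / counting_efficiency
    return sample_dpm / dpm_per_pmol() / protein_ug


# Hypothetical example values, not data from this study.
example = incorporation_pmol_per_ug(sample_cpm=13875.0, protein_ug=1.0)
print(f"{dttp_pmol_in_reaction():.0f} pmol dTTP, {dpm_per_pmol():.1f} dpm/pmol, "
      f"{example:.1f} pmol/ug")
```

In practice the counting efficiency for 3H on filter paper is well below 1, so the default used here would underestimate incorporation unless a measured efficiency is supplied.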

The Penelope RT activity, like that of many other characterized RTs (20–22), requires divalent cations. The activity is higher in the presence of Mn2+ than Mg2+, as is the case for murine leukemia virus RT (23), and Mg2+ can be replaced by Zn2+ but not by Ca2+ or Ni2+ (Fig. 2E). In contrast to avian myeloblastosis virus, murine leukemia virus, or Rous sarcoma virus RT (20, 23, 24), Penelope RT is quite active within a wide range of KCl and NaCl concentrations, and K+ can be replaced by monovalent cations Na+, Rb+, or Li+ (Fig. 2E). All of the above properties are characteristic of retroviral and retroelement RTs and indicate that the Penelope-encoded activity represents a genuine RNA-directed DNA polymerase.

Role of Conserved Residues in the EN Domain. To find out whether the identified EN domain is functional, we expressed the C-terminal part (residues 684–757) of the Penelope ORF in E. coli and purified it by using an N-terminal His6 tag (Fig. 1C). We also expressed and purified six mutant proteins with substitutions of the conserved Tyr residues in the VVY-YIG motif (Y730H and Y746H) and four highly conserved residues identified in comparisons of GIY-YIG domains (R757A, H781G, D792A, and E800G) (ref. 5 and Fig. 6) to determine whether they are important for cleavage. We chose to analyze the bacterially expressed rather than baculovirus-expressed EN because of the need for appropriate negative controls, which could be provided only by generating a set of site-specific mutants, a difficult and time-consuming procedure in large baculovirus vectors containing the full-length ORF.

Purified proteins were tested in EN activity assays by using the 38-bp labeled double-stranded DNA fragment containing the naturally occurring Penelope insertion site (ps1; Materials and Methods) as a target. To maximize the sensitivity of the assay, so that reduced cleavage by mutant proteins could be detected, the target fragment was labeled to high specific activity by incorporation of two [α-32P]dATP molecules at both ends (Fig. 3A Upper). The assays revealed that the EN mutants Y730H, Y746H, and E800G had no detectable catalytic activity, whereas both H781G and D792A exhibited a reduced level of cleavage compared with that of the WT enzyme (Fig. 3A). Interestingly, the R757A mutant in terms of catalytic activity behaves essentially as the WT protein.

Fig. 3.


Properties of Penelope-encoded EN and its mutant derivatives. (A and B) Cleavage and DNA-binding assays. The 38-bp ps1 target (A Upper) was labeled with [α-32P]dATP at both ends (in bold italics). Lane C, control without protein. The remaining seven lanes in A and B represent cleavage and DNA-binding assays, respectively, for the EN protein and its mutant variants: Y730H (2%); Y746H (0%); R757A (100%); H781G (27%); D792A (5%); E800G (0%); and WT, wild-type Penelope EN (100%); numbers in parentheses designate the relative cleavage activity with respect to WT obtained from PhosphorImager quantitative scans of gel A. To suppress target cleavage, binding reactions in B were performed without Mg2+ and in the presence of EDTA; however, residual cleavage is still visible for the most active R757A and WT proteins. (C) Binding of WT Penelope EN to the ps1 target at different concentrations of poly(dI)·poly(dC). Lane C, control without protein; WT, wild-type Penelope EN in a standard binding reaction with 0.1 mg/ml BSA. In the remaining lanes, increasing amounts of poly(dI)·poly(dC) (from 0.025 to 1 μg) were added to the binding reaction, as indicated. (D) Determination of cleavage sites of Penelope EN. The 34-bp fragment of ps1 was used as a target, with the top and bottom strand labeled as indicated. Lane L, 1-bp DNA ladder; WT, labeled fragments digested by WT EN; Y746H, labeled fragments digested by the EN mutant. Target site duplication (TSD) is shown by brackets, and the most prominent cleavage products, which become visible after a 1-h incubation, are shown in bold italics. (E) Ligation of the labeled ps1 fragment after digestion by the WT Penelope EN. Lane C, labeled fragment without protein; WT, labeled fragment digested with EN. The remaining lanes show ligation of the cleavage products after 3-, 10-, and 30-min incubation with T4 DNA ligase.
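One plausible way to express cleavage relative to WT from quantified band signals is sketched below. The exact normalization applied to the PhosphorImager scans is not described in the text, so both the formula (fraction of signal in cleavage products, scaled to the WT lane) and the function names are assumptions. The band intensities in the example are hypothetical and are not measurements from gel A.

```python
def percent_cleaved(product_signal: float, substrate_signal: float) -> float:
    """Percentage of the total lane signal found in cleavage products."""
    total = product_signal + substrate_signal
    if total <= 0:
        raise ValueError("lane has no signal to quantify")
    return 100.0 * product_signal / total


def relative_activity(mutant_percent: float, wt_percent: float) -> float:
    """Cleavage by a mutant expressed as a percentage of WT cleavage."""
    if wt_percent <= 0:
        raise ValueError("WT lane shows no cleavage; cannot normalize")
    return 100.0 * mutant_percent / wt_percent


# Hypothetical band intensities in arbitrary units (illustration only).
wt_percent = percent_cleaved(product_signal=40.0, substrate_signal=60.0)
mutant_percent = percent_cleaved(product_signal=10.0, substrate_signal=90.0)
print(f"WT cleaved {wt_percent:.0f}%, mutant cleaved {mutant_percent:.0f}%, "
      f"relative activity {relative_activity(mutant_percent, wt_percent):.0f}% of WT")
```

Normalizing within each lane before comparing to WT makes the result insensitive to differences in loading between lanes, which is the main reason for preferring it over a direct comparison of product band intensities.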

Nucleic Acid Binding and Cleavage Properties of the EN Domain. Gel mobility-shift assays (Fig. 3B) demonstrate that mutations in the EN domain do not affect DNA-binding properties of the protein. However, neither mutant nor WT proteins exhibit a high degree of specificity in terms of binding with the target DNA because the addition of a 1,000-fold excess of poly(dI)·poly(dC) to the labeled target completely inhibits binding (Fig. 3C). Nevertheless, the addition of poly(dI)·poly(dC) in high concentrations after brief incubation of the target with the EN protein does not displace the label from the preformed DNA–protein complexes (data not shown).

Use of the target labeled on both strands (Fig. 3A), although providing increased sensitivity, precluded high-resolution mapping of cleavage sites. To map their position precisely, we used a 34-bp fragment labeled at either end independently, enabling us to distinguish nicks in the top and bottom strand and to generate the sequencing ladder for each strand. The results presented in Fig. 3D indicate that only the bottom strand is cleaved (WT, bottom strand) and show the exact position of cleavage sites within the fragment. The hot spot does not exactly coincide with the target-site duplication (TSD), but definitely occupies at least one-half of it. Digestion of the top strand appears to be extremely weak if it occurs at all.

To find out whether the cleavage products possessed 5′ PO4 and 3′ OH ends, which would be essential for their ability to serve as primers for reverse transcription, we incubated these products with T4 DNA ligase (Fig. 3E). Efficient ligation of the cleavage products after 30-min incubation with T4 DNA ligase, which requires the presence of the 5′ PO4 and 3′ OH groups for its activity, is consistent with generation of 3′ OH groups by Penelope EN.

Penelope EN Specifically Acts on a Sequence Within the Element. An interesting property of Penelope and its relatives is the ability to form a partial-tandem structure, with a full-length copy preceded by another, often 5′ truncated, copy in the same orientation. This ability may lead to the formation of structures reminiscent of retroviral LTRs, and it is noteworthy that the only Penelope copy shown to be functionally active (p6; Fig. 1 A) is organized in exactly this way. A peculiar feature of this structure, also found in fish, frog, and sea urchin PLEs (refs. 6 and 7; I.R.A., unpublished data), is a short (34- to 37-bp) tail, which is normally present at the 5′ end of the full-length copy and may or may not be present at its 3′ end, thus creating a 3′ end heterogeneity (Fig. 1 A). Because insertion of an additional copy into this target would result in creation of an LTR-like structure, we sought to test whether the Penelope EN may exhibit any preferences in recognition and/or cleavage of this potential target sequence.

A deletion series of four plasmids was used to test this hypothesis (Fig. 4). Incubation of plasmids PB and 1B, containing the entire 34-bp tail, and plasmid 2B, retaining 20 bp of this sequence, with the WT EN resulted in a significant decrease in levels of supercoiled plasmid DNA, whereas incubation with the mutant version (Y746H) did not result in any such decrease. Plasmid 3B, which had 26 of 34 nucleotides of the potential target site removed, did not undergo any changes upon incubation (Fig. 4B) and neither did pUC19 without an insert (data not shown). Thus, removal of most of the 34-bp tail from the target plasmid results in loss of its ability to be recognized by EN and/or to serve as a substrate for cleavage. It may also be noted that the ps1 site is cleaved less efficiently in comparison with PB, 1B, and 2B.

Fig. 4.


Assay for cleavage activity of the Penelope EN with different targets. (A) A diagram of Penelope-containing inserts of the PB-3B deletion series. Numbers correspond to the position of the endpoints with respect to the transcription start site in the 5′ LTR (position 1 in GI:16152117). The 34-bp tail, located at the boundary between the upstream (5′ LTR) and downstream Penelope copies in a partial-tandem (Fig. 1), is designated by a thick line, and its corresponding nucleotide sequence is given in uppercase in Upper. (B) Agarose gel electrophoresis of the plasmids PB, 1B, 2B, 3B, and ps1 incubated for 5 h at 27°C in the cleavage buffer with wild-type (w) or mutant Y746H (m) Penelope EN or without protein addition (-). The positions of supercoiled (sc) and open-circular (oc) plasmid DNA are indicated. Lane L, 1-kb plus DNA ladder (GIBCO/BRL).

Discussion

Penelope ORF Encodes an Active RT. Alignments of the predicted Penelope RT with other RT sequences revealed conservation of the seven diagnostic RT motifs (25), suggesting that Penelope encodes an RT (4). These motifs are well conserved in all PLEs but exhibit significant deviations, both in sequence and in spacing, from those of canonical RT from non-LTR and LTR retrotransposons. In this study, we have expressed and purified the Penelope polyprotein and detected an associated RT activity. This finding provides biochemical evidence that Penelope encodes an active RT, indicates that many intact PLEs inhabiting diverse eukaryotic genomes may also code for functional RT, and confirms that PLEs are bona fide retroelements.

The presence of introns in genomic PLE copies in D. virilis and bdelloid rotifers (8) could be explained either by frequent use of an RNA-independent pathway of PLE transposition, or by reverse transcription of minor unspliced transcripts, such as the one recently detected in D. virilis (K.I.P. and M.B.E., unpublished data). It is worth noting that retroviruses primarily use the full-length unspliced RNA for reverse transcription and integration, although they also encode subgenomic RNAs generated by splicing, which are necessary for production of env and/or gag proteins (26). In addition, the fact that at least one intronless Penelope fragment was detected in Drosophila melanogaster genomic DNA after germ-line transformation (12) also provides support in favor of the ability of PLEs to use, at least in some cases, the conventional RT-based pathway of transposition. Detection of RT activity in the Penelope polyprotein opens avenues for developing an in vitro system to further investigate the molecular mechanisms of PLE transposition, which may reveal novel mechanistic features that are different from those used by non-LTR or LTR retrotransposons.

Penelope ORF Encodes an Active EN. PLEs are the only retrotransposons known to contain a Uri EN domain, and here we show that in the case of Penelope it exhibits EN activity and requires amino acids corresponding to the most conserved residues in GIY-YIG proteins. This activity could cleave the target DNA to initiate reverse transcription and integration by a mechanism similar to that used by R2Bm, L1, and probably other non-LTR retrotransposons (27–29), as well as group II introns (30), arguing in favor of the ancestral nature of this mechanism.

Whereas mutagenesis of most of the highly conserved residues results in substantial or complete loss of cleavage activity, this is not the case for the highly conserved R757. Although R27 is essential for catalytic activity of I-TevI (31), as is the equivalent R42 of the UvrC excision repair EN (32), in our experiments the R757A substitution did not affect Penelope EN activity. However, several active GIY-YIG ENs (i.e., all Seg proteins) do not contain either motif B with the invariant Arg or any other positively charged amino acids in this region, which argues against the indispensable role of this Arg (16, 33). The neighboring highly conserved H761 may be involved in catalysis instead, as suggested for I-TevI (17).

Penelope appears to have preferred sites of integration in D. virilis and D. melanogaster genomic DNA (12, 34, 35). Examination of Penelope insertion sites in these species did not reveal any consensus recognition sequence, although all target-site duplications were very A+T-rich. Our results suggest that the Penelope EN cuts DNA with some sequence specificity, but this is probably insufficient to account for the observed hot spots of integration.

In our experiments cleavage of only one DNA strand was observed, whereas on the other strand it was practically undetectable. This selectivity is not unusual, because UvrC and related structure-specific Slx1 ENs cut only one strand (32, 36), although GIY-YIG homing ENs and freestanding Seg ENs normally exhibit cleavage of both DNA strands (37, 38). For group II retrointrons, the HNH EN domain associated with RT was shown to be responsible for the antisense-strand but not sense-strand cleavage, which is provided by intron RNA (39). If dimerization, either transient or stable, is required for cleavage of both strands, it is possible that the EN domain on its own lacks the sequences involved in dimerization. However, in our experiments with a two-plasmid system in E. coli, expression of the full-length Penelope ORF led to conversion of the supercoiled target plasmid into open-circular molecules (K.I.P. and M.B.E., unpublished data). It is also possible that cleavage of the second strand occurs only at very low levels and/or with a significant delay after the first-strand cleavage, or requires the presence of the corresponding RNA.

The cleavage specificity of the Penelope EN observed in plasmids containing the 34-bp tail provides a plausible explanation for the previously noticed tendency of Penelope to form partial-tandem structures, resulting in an LTR-like arrangement (4, 12). Such specificity may serve as a mechanism that ensures preservation of the complete element by enabling the 5′ LTR formed in this way to provide the site of transcription initiation for the ORF located downstream, in a manner analogous to the tandemly arranged HeT-A elements of Drosophila (40).

PLEs as a Previously Unrecognized Class of Retroelements: Evolutionary Considerations. The demonstration of RT and EN activities associated with the Penelope polyprotein is of evolutionary significance, taking into account the recently documented wide distribution of PLEs (refs. 5–8 and this study) and the key role of Penelope in D. virilis hybrid dysgenesis (4). Along with the discovery of a retroelement from Caenorhabditis elegans that encodes a cysteine protease instead of a typical aspartate protease (41), these observations suggest that nonorthologous domain displacement (42) might have occurred during evolution of retroelements. The Uri domain could have been acquired by Penelope through recombination with a DNA virus encoding a GIY-YIG EN, such as a baculovirus. However, given the early divergence of the PLE-TERT branch from the non-LTR-retrotransposon and LTR-retrotransposon lineages (8), it also appears likely that this domain could have been acquired by independent fusion with the ancestral RT domain after its divergence from the TERT lineage. A third, more remote, possibility would be a direct descent of the PLE lineage from a hypothetical group II intron-like ancestor containing both the RT and the GIY-YIG domain, with subsequent branching of the TERT lineage accompanied by loss of the GIY-YIG domain. Like the PLE RT, the predicted EN sequence is quite distinct from other Uri-domain proteins, supporting the ancient and monophyletic origin of this retroelement lineage.

Supplementary Material

Supporting Information
pnas_101_41_14719__.html (21.4KB, html)

Acknowledgments

We thank I. Granovsky and M. Shlyapnikov (Institute of Biochemistry and Physiology of Microorganisms, Pushchino) for expert assistance with protein expression and purification. Research was supported by grants for Russian Basic Science, a grant from the Physico-Chemical Biology Program of the Russian Academy of Sciences (to M.B.E.), Wellcome Trust Collaborative Research Grant 06522 (to D.J.F. and M.B.E.), and National Science Foundation Grant MCB-0239075 (to M. S. Meselson).

Author contributions: K.I.P., I.R.A., and M.B.E. designed research; K.I.P., I.R.A., and N.V.M. performed research; K.I.P., I.R.A., D.J.F., and M.B.E. analyzed data; and I.R.A., D.J.F., and M.B.E. wrote the paper.

Abbreviations: EN, endonuclease; PLEs, Penelope-like elements; RT, reverse transcriptase; TERT, telomerase RT.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. AF418571, AF446087, and U49102).
