Abstract
Non-long-terminal-repeat (non-LTR) retrotransposons amplify their copies by reverse transcribing mRNA from the 3′ end, but the initial processes of reverse transcription are still unclear. We have shown that a telomere-specific non-LTR retrotransposon of the silkworm, SART1, requires the 3′ untranslated region (3′ UTR) for retrotransposition. With an in vivo retrotransposition assay, we identified several novel motifs within the 3′ UTR involved in precise and efficient reverse transcription. Of 461 nucleotides (nt) of the 3′ UTR, the central region, from nt 163 to nt 295, was essential for SART1 retrotransposition. Of five putative stem-loops formed in RNA for the SART1 3′ UTR, the second stem-loop (nt 159 to 221) is included in this region. Loss of the 3′ region (nt 296 to 461) in the 3′ UTR and the poly(A) tract resulted in decreased and inaccurate reverse transcription, which starts mostly from several telomeric repeat-like GGUU sequences just downstream of the second stem-loop. These results suggest that short telomeric repeat-like sequences in the 3′ UTR anneal to the bottom strand of (TTAGG)n repeats. We also demonstrated that the mRNA for green fluorescent protein (GFP) could be retrotransposed into telomeric repeats when the GFP coding region is fused with the SART1 3′ UTR and SART1 open reading frame proteins are supplied in trans.
Non-long-terminal-repeat (non-LTR) retrotransposons are endogenous mobile genetic elements that are widespread in eukaryotic genomes. Non-LTR retrotransposons multiply their copies through reverse transcription of RNA intermediates with a self-encoded reverse transcriptase (RT). Non-LTR retrotransposons, some of which are called long interspersed nuclear elements (LINEs) in vertebrates, contribute to genome structure and evolution through their replication process (5, 8, 10, 21). In humans, LINEs have accumulated to make up as much as 21% of the genome, and they are the major source of insertional mutagenesis. LINEs shape the mammalian genome through exon shuffling, mobilization of short interspersed nuclear elements, and processed pseudogene formation (4, 6, 10, 18, 20). However, we have little knowledge of the molecular basis underlying these genomic events because the retrotransposition mechanisms of non-LTR retrotransposons are poorly understood compared with those of other retroelements, such as LTR retrotransposons and retroviruses.
Non-LTR retrotransposons encode an RT domain and an endonuclease (EN) domain. Pioneering studies of the Bombyx non-LTR element R2 showed that the EN domain nicks the bottom strand of target DNA and that the RT domain uses the 3′-hydroxyl end of the nicked DNA as a primer for reverse transcribing non-LTR element RNA (12). This reverse transcription initiation process is termed target-primed reverse transcription (TPRT) and is inherent to all non-LTR retrotransposons. During TPRT, reverse transcription is initiated at the 3′ end of non-LTR elements. Most genomic copies of non-LTR retrotransposons are 5′ truncated, presumably because of the arrest of reverse transcription. However, all integrated copies have the precise 3′-terminal region, and 3′ truncation is rarely observed (5, 8). These observations reflect the processes of reverse transcription peculiar to non-LTR retrotransposons and show that some structure near the 3′ end must be recognized in the initial steps. A requirement of the 3′ UTR for retrotransposition has been reported for R2 (13) and for UnaL2 elements in eels (9). However, there are elements, such as human L1s, that apparently have no strict sequence requirements in the 3′-end sequences except for the polyadenylation tail (4, 6, 19). Therefore, we lack a general picture of the initial processes of reverse transcription, especially of how the RT recognizes the 3′ UTR RNA of non-LTR elements.
A telomere-specific non-LTR retrotransposon, SART1, that inserts itself between TT and AGG of the silkworm (TTAGG)n telomeric repeats (23, 25) is a good model with which to study the TPRT mechanisms. We have developed an in vivo assay system in which retrotransposition is easily detected by PCR (24). With this system, we are trying to enumerate all of the functional regions in open reading frame 1 (ORF1) and ORF2 required for SART1 retrotransposition, which should enable us to clarify the TPRT mechanisms from various aspects (17). In a previous report, we showed that SART1 loses its retrotransposition ability when 461 nucleotides (nt) containing the 3′ UTR and a poly(A) tract are deleted from the element (24). The retrotransposition activity of the 3′ UTR-deficient SART1 mutant was not rescued in trans by mutants with amino acid substitutions in Zn fingers and EN and RT domains, which lost their activities. However, the activities of the latter mutants were trans complemented by the 3′ UTR-deficient mutant. These observations show that the 3′ UTR of SART1 RNA, but not other RNA regions, is essential for initiation of TPRT.
In this study, to understand which regions of the 3′ UTR function in SART1 retrotransposition and how they do so, we analyzed the effects of deletions in the 3′ UTR on retrotransposition of SART1. The retrotransposition assay showed that the 133 nt of the central region of the 3′ UTR (nt 163 to 295), which features a firm stem-loop structure, are essential for retrotransposition. The remaining part of the 3′ UTR and the poly(A) tail were required for precise initiation of reverse transcription and efficient retrotransposition. We also show that possession of the 3′ UTR sequences is sufficient for an RNA to be retrotransposed by SART1 proteins.
MATERIALS AND METHODS
Northern hybridization.
Approximately 3 × 10⁵ Spodoptera frugiperda 9 (Sf9) cells in a 12-well plate were infected with SART1-containing Autographa californica nuclear polyhedrosis virus (AcNPV) at a multiplicity of infection of 10 PFU per cell. Total RNA was isolated from Sf9 cells with TRIZOL (GIBCO-BRL) at 48 h postinfection. Aliquots of 3 μg of RNA per lane were electrophoresed at 5 V/cm on 18% formaldehyde-20 mM MOPS (3-morpholinopropanesulfonic acid, pH 7.0)-5 mM sodium acetate-1 mM EDTA-0.9% agarose gels and blotted onto nylon membranes (Biodyne A Membrane; Pall BioSupport) in 10× SSC (1.5 M NaCl, 0.15 M sodium citrate). After prehybridization, the membranes were hybridized with each probe at 42°C overnight in 40% formamide-10× Denhardt's solution (0.2% each bovine serum albumin, Ficoll, and polyvinylpyrrolidone)-5× SSC-250 μg of salmon sperm DNA per ml-50 mM NaPO4 (pH 7.0)-10% dextran sulfate. The probes were labeled with [α-³²P]dCTP with Ex-Taq polymerase (Takara) by PCR. The primer sets used to generate the probes are listed in Table 1.
TABLE 1.
Primer | Sequence (5′→3′) | Generated derivative(s) or use |
---|---|---|
SART1-S3014-NcoI | AAAAAACCATGGGCAGCAGCCCTTATCATATACTAC | ORF2+3′UTR |
SART1-S6222-NotI | AATAATAATTGCGGCCGCGGACCGTCGGGCG | SART1 WT (+NotI)-pAcGHLTB |
SART1-A6600-BglII | AAAAAAAGATCTGGAAGAAACAGGAAGAAGTCG | 1-379, 73-379 |
SART1-S6501-NotI | AATAATAATAGCGGCCGCTGAACTCAGCCCAGC | 279-461/(A)20 |
SART1-A6682-BglII | AAAAAAAGATCTGGTATCGATGGGGAATCCC | ΔpolyA |
SART1-S6294-NotI | AATAATAATTGCGGCCGCGGGCGCTGTGGCTC | 73-379, 73-379/(A)20, 73-295 |
SART1-A6516-BglII | AAAAAAAGATCTGGGCTGAGTTCAGCTC | 73-295 |
SART1-A6600+20A-BglII | AAAAAAAGATCT(T)18TGGAAGAAACAGGAAGAAGTCG | 1-379/(A)20, 73-379/(A)20 |
SART1-A6704-BglII/BamHI | TTTTTTGGATCCAGATCT(T)19GGTATCGATGGGGAATC | 279-461/(A)20 |
SART1-S6517 | CGCGCCTTTTTCAAGGCGTAGTCTCC | Δ163-297, Δ228-297 |
SART1-A6293 | TATACCCTCACCACCACCACTGGACTATCG | Δ73-162, Δ73-227 |
SART1-S6384 | TGTGGGGGGCCTGCGGGG | Δ73-162, Δ73-227 |
SART1-S6449 | GAGCTCGTTGGGTTTTAGTCGGTAGTCGTTAAG | Δ73-227, Δ163-227 |
SART1-A6383 | CCGGCTCCCAGCCTGACGAAC | Δ163-227, Δ163-297 |
SART1-A6448 | CTATCTTTCCGGCATAGGGGGAACCTACGATAC | Δ228-297 |
pAcGHLTB-S2183 | CCTATAAATACGGATCTGTATTCATGTCCC | GST-His6 probe |
pAcGHLTB-A2810 | GGCCATGCTATATACTTGCTGGATTTCAAG | GST-His6 probe |
pAcGHLTB-S3032 | CGACTCTGCTGAAGAGGAGGAAATTC | Polyhedrin 3′ region probe |
pAcGHLTB-A3430 | CAAGATTTGGCAAGTTTTGTGGCGTTGAG | Polyhedrin 3′ region probe |
hsp_pEGFP1-A2330T-S | GCGGCCGCGACTCTAGATCTTAATCAGCCATAC | hsp-EGFP1-BglII |
hsp_pEGFP1-A2330T-A | GTATGGCTGATTAAGATCTAGAGTCGCGGCCGC | hsp-EGFP1-BglII |
pEGFP1-S96-EcoRI | AAAAAAGAATTCATGGTGAGCAAGGGCGA | EGFP1/S1-3′UTR-pVL1393 |
pEGFP1-A901 | GGGGGAGGTGTGGG | EGFP1/S1-3′UTR-pVL1393 |
pEGFP1-S688 | GACAACCACTACCTGAGCACC | 3′ junction PCR |
pEGFP1-A576 | GTTCTTCTGCTTGTCGGCCATGATATAG | 5′ junction PCR |
Plasmid construction.
To construct plasmid clone SART1 ORF2 + 3′-pAcGHLTB, the SART1 ORF2 3′ UTR portion was amplified by PCR from genomic library clone BS103 with primers SART1-S3014-NcoI (Table 1) and SAX3P-NotI (24). PCR was conducted for 30 cycles with Pfu Turbo DNA polymerase (Stratagene). The PCR product was subcloned between the NcoI and NotI sites of the pAcGHLTB plasmid (Pharmingen). Constructs used in the retrotransposition assay to identify the essential region of the 3′ UTR were generated in either of two ways. The primers used for plasmid construction are listed in Table 1. Constructs SART1 WT (+NotI)-pAcGHLTB, ΔpolyA, 1-379/(A)20, 1-379, 73-379/(A)20, 73-379, 73-295, and 279-461-(A)20 were generated by PCR amplifying the SART1 3′ UTR portion of SART1 WT-pAcGHLTB and subcloning it into the NotI and BglII sites of SART1Δ3′-pAcGHLTB (24). To generate constructs SART1 Δ73-295, Δ73-227, Δ73-162, Δ163-227, Δ163-295, and Δ228-295, portions of SART1 WT-pAcGHLTB other than those deleted were amplified by inverse PCR with 5′-phosphorylated primers and then self-ligated.
EGFP1/S1-3′UTR-pVL1393 was generated by the following procedure. First, the Drosophila hsp promoter was subcloned into the HindIII site of pEGFP1 (Clontech). A BglII site was introduced into this plasmid by inverse PCR with primers hsp_pEGFP1-A2330T-S and hsp_pEGFP1-A2330T-A. This construct, hsp_pEGFP1-BglII, contains a NotI site and a BglII site that immediately follow the EGFP1 coding region. Next, the SART1 3′ UTR portion was PCR amplified with SART1-S6222-NotI and SART1-A6704-BglII/BamHI from SART1 WT-pAcGHLTB and subcloned between the NotI and BglII sites of hsp_pEGFP1-BglII. In the resulting plasmid, hsp-pEGFP1-SART1-3′UTR, the SART1 3′ UTR resides downstream of EGFP1. Finally, the EGFP1-SART1 3′ UTR portion was amplified with pEGFP1-S96-EcoRI and pEGFP1-A906 from hsp-pEGFP1-SART1-3′UTR and subcloned into the EcoRI and BglII sites of pVL1393 (Pharmingen).
Recombinant AcNPV generation.
Sf9 cells were propagated as monolayer cultures at 27°C in TC-100 medium supplemented with 10% fetal bovine serum (Katakura Co., Nagano, Japan) in the presence of penicillin-streptomycin (GIBCO-BRL). The recombinant baculovirus containing the wild-type, mutant, or chimeric SART1 portion driven by the polyhedrin promoter was produced by cotransfection of the corresponding pAcGHLTB- or pVL1393-based plasmid with BaculoGold DNA (Pharmingen) into Sf9 cells with the Tfx-20 lipofection reagent (Promega). The medium was collected 4 days later and used for plaque purification and subsequent virus propagation in accordance with the manufacturer's (Pharmingen) instructions.
In vivo retrotransposition assay by PCR.
Approximately 3 × 10⁵ Sf9 cells were infected in a 12-well plate with SART1-containing AcNPV at a multiplicity of infection of 10 PFU per cell. The genomic DNA was extracted 72 h postinfection as previously described (24). PCR assays were conducted with Ex-Taq or LA-Taq (Takara) in the presence of TaqStart Antibody (Clontech) with ∼1 μg of Sf9 DNA. The reaction mixture was denatured at 94°C for 3 min, followed by 30, 35, or 40 cycles of 98°C for 20 s, 62°C for 30 s, and 72°C for 1 min for the SART1 3′ junction and 40 cycles for the 5′ junction. Five microliters of each mixture was subjected to 2% agarose electrophoresis in Tris-acetate-EDTA buffer and visualized by ethidium bromide staining. PCR products were directly cloned into the pGEM-T Easy vector (Promega). The cloned products were sequenced with a BigDye Terminator cycle sequencing kit (Applied Biosystems) on an ABI 310 genetic analyzer and an ABI 3100 genetic analyzer. Sequence analysis was carried out with Vector NTI Suite version 7.1 (Informax).
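As a worked illustration of the infection conditions above, the amount of virus needed per well follows directly from the cell number and the multiplicity of infection. The sketch below is not part of the original protocol; the function names and the stock titer of 1 × 10⁸ PFU/ml are hypothetical and serve only to show the arithmetic.

```python
def total_pfu(cells: float, moi: float) -> float:
    """Plaque-forming units needed to infect `cells` at `moi` PFU per cell."""
    return cells * moi


def inoculum_volume_ul(cells: float, moi: float, titer_pfu_per_ml: float) -> float:
    """Volume of virus stock (in microliters) that supplies the required PFU."""
    return total_pfu(cells, moi) * 1000.0 / titer_pfu_per_ml


# Conditions stated in the text: 3 x 10^5 cells at 10 PFU per cell.
pfu_needed = total_pfu(3e5, 10)                 # 3 x 10^6 PFU per well
# Hypothetical stock titer (an assumption, not reported in the source).
volume_ul = inoculum_volume_ul(3e5, 10, 1e8)    # 30 microliters
```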
RESULTS
Primary and secondary structures of the SART1 3′ UTR.
First, to focus on the 3′ UTR sequences essential for SART1 retrotransposition, we compared the 3′ UTR sequences of SART1 (GenBank accession number D85594) with those of WISHBm1 (NV060754), SARTPx1 (AB078931), and the Bombyx genomic clone of SART1 (SARTBmGC; AB088394) (Fig. 1A) (11). WISHBm1 is a derivative of SART1 that has lost its sequence specificity. SARTPx1 is a SART1 element from the swallowtail butterfly, Papilio xuthus, that also integrates into the same site of (TTAGG/CCTAA)n telomeric repeats. SARTPx1 and WISHBm1 form a monophyletic group with SART1. The sequence alignment was conducted by simply matching bases. The 3′ UTR sequences of SART1 and SARTBmGC were highly conserved; only 21 of 461 nt in total are substituted, and only 4 nt are deleted in SARTBmGC. Between SART1 and WISHBm1, the only conserved regions are the first ∼60 nt and the last ∼80 nt in the 3′ UTR. The 3′ UTR sequence of SARTPx1 does not show such remarkable sequence homology with SART1, except for ∼60 nt of the 3′-terminal region. Thus, at the level of the primary sequence, only ∼60 nt at the 3′ terminus were highly conserved among the SART1-related elements.
Next, we predicted the RNA secondary structure of the SART1 3′ UTR by computer with the mFOLD program (16, 28). The SART1 3′ UTR showed five putative firm stem-loop structures (Fig. 1B). Stem-loops 1, 2, and 5 were also conserved in SARTBmGC, where all base substitutions (shown by /N) were compensating changes, and deletions were in paired nucleotides of stems (G-C pairs in stem 2). However, three out of four substitutions in stem-loop 3 altered the secondary structure, suggesting that this region is less important than the others. In WISHBm1 and SARTPx1, the only conserved structure was stem-loop 5, as was predicted from its primary structure.
Readthrough transcripts of AcNPV-expressed SART1.
To understand the initial process of SART1 reverse transcription, we next attempted to characterize by Northern hybridization the 3′ ends of the SART1 transcripts, which were produced in a baculovirus-mediated in vivo assay system (Fig. 2). We extracted total RNAs from Sf9 cells infected with AcNPV carrying various SART1 constructs (SART1 WT, wild-type SART1; Δ3′, SART1 without the 3′ UTR; ORF2+3′UTR, SART1 lacking ORF1) (Fig. 2A). We prepared a glutathione S-transferase (GST)-His6 probe (probe a) and a polyhedrin 3′ region probe (probe b) that both originated from the baculovirus transfer vector (Fig. 2A). Hybridization with probe a showed smeared bands of ca. 8 kb, which correspond to GST-fused SART1 transcripts, in all of the constructs (Fig. 2B, probe a). Probe b hybridized with transcripts from Sf9 cells infected with AcNPV expressing each of the three SART1 constructs but not with RNAs from uninfected Sf9 cells (Fig. 2B, probe b). The major band that hybridized with probe b was about 8 kb in SART1 WT and in Δ3′ but about 6 kb in ORF2+3′UTR, which reflects the deletion of the ORF1 region (2,148 bp long). This indicates that detectable amounts of readthrough transcripts that include the downstream polyhedrin 3′ UTR sequences are synthesized in baculovirus-mediated SART1 expression. In SART1 WT and ORF2+3′UTR, the original poly(A) tract (A20) contiguous to the 3′ UTR was subcloned into the transfer vector. Thus, transcription continues through the poly(A) tract until reaching the polyhedrin 3′ regions in these clones.
The in vivo retrotransposition assay showed that the 3′ junction to the telomeric repeat of retrotransposed SART1 WT is the poly(A) tract adjacent to the 3′ UTR sequence (24) (Fig. 3A, WT). These observations suggest that reverse transcription is started from the SART1 poly(A) tract in the readthrough RNA. In this case, SART1 RT must recognize both the 3′ UTR and the poly(A) tract of the RNA in initiating reverse transcription. Another possibility is that there are transcripts ending with the SART1 poly(A) tract that could not be effectively distinguished by Northern hybridization and that they serve as the template for reverse transcription.
A poly(A) tract at the end of the 3′ UTR is necessary for efficient and accurate retrotransposition of SART1.
To investigate the domains in the 3′ UTR including the poly(A) tract that are required for SART1 retrotransposition, we generated a series of SART1-expressing AcNPVs with partial deletions (Fig. 3A, constructs 1 to 10) and assayed their in vivo retrotransposition abilities with a system we have established (24). The AcNPV-expressed SART1 in Sf9 cells retrotransposes into the telomeric repeats (TTAGG)n in a highly sequence-specific manner. We detected the retrotransposition of SART1 by amplifying the 3′ junction of SART1 to telomeric repeats by PCR with primers +6096 and (CCTAA)6 (Fig. 3A) by using genomic DNAs from SART1-expressing AcNPV-infected Sf9 cells as the PCR template.
Figure 3B shows the results of the in vivo retrotransposition assay for the 3′ UTR mutants listed in Fig. 3A. To clarify the differences in the retrotransposition frequencies (RFs) and abilities of respective SART1 constructs, we performed PCRs with 30, 35, and 40 cycles and compared the band patterns (Fig. 3B). In all cycles, the wild-type SART1 construct, SART1 WT (+NotI), showed an intense 600-bp PCR band, which represents the retrotransposed 3′ junction region composed of 570 bp plus telomeric repeats. On the basis of the band intensity, we defined the RF for this construct as +++ (Fig. 3A).
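The expected size of the wild-type 3′ junction product can be checked with simple arithmetic, assuming that the telomeric contribution to the shortest product equals the length of the (CCTAA)6 primer itself. The helper below is an illustrative sketch only; its name and parameters are not taken from the original work.

```python
def junction_band_bp(element_bp: int, repeat_unit: str = "CCTAA", n_repeats: int = 6) -> int:
    """Shortest expected 3' junction product: element-derived part plus the repeat primer."""
    return element_bp + len(repeat_unit) * n_repeats


# 570 bp of SART1-derived sequence plus the 30-nt (CCTAA)6 primer.
wt_band = junction_band_bp(570)   # 600 bp
```

If the repeat primer anneals deeper within the telomeric array, the product would be longer in 5-bp steps, so the observed band is expected at or slightly above this minimum.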
Compared to wild-type SART1, ΔpolyA (Fig. 3A, construct 3), which is the same as SART1 WT but without the 20-bp poly(A) tract, showed only a very weak PCR band of 400 bp at 35 cycles (Fig. 3B, lane 3). Sequence analysis of PCR products indicated that the band represented retrotransposition initiated from an internal sequence of the 3′ UTR (Fig. 4A, lane 3). The ΔpolyA band became more intense at 40 cycles; however, it remained fainter than the wild-type band, which seemed to have reached the plateau phase by 30 cycles. These data suggest that the RF of this construct is lower (shown as +) and that its retrotransposition is inaccurate (Fig. 4A). Thus, the poly(A) tract at the end of the 3′ UTR of SART1 is required not for retrotransposition itself but for accurate and efficient reverse transcription of SART1 mRNA.
Regions in the SART1 3′ UTR essential for in vivo retrotransposition.
As with ΔpolyA, we also found that a number of 3′ UTR mutations did not abolish retrotransposition but reduced its efficiency or altered its specificity. Constructs 1-379 and 1-379/(A)20, in which about 80 bp at the 3′ terminus are deleted, showed weak 400-bp PCR bands (Fig. 3, constructs and lanes 5 and 6), which are about 100 bp smaller than the size expected if reverse transcription were initiated correctly. Sequence analyses revealed that these bands also represented inaccurate retrotransposition (Fig. 4A, lanes 5 and 6). Similar results were obtained for constructs 73-379, 73-379/(A)20, and 73-295 (Fig. 3B, lanes 7, 8, and 9, respectively), whether or not the constructs include the poly(A) tract, suggesting that at least the most 3′-terminal region (380 to 461) is involved in efficient and accurate retrotransposition but is not required for retrotransposition itself.
A more important finding is that a mutant with a large deletion in the 5′ portion of the 3′ UTR [construct 279-461-(A)20 (Fig. 3A)] showed no band but only smears even at 40 cycles (Fig. 3B, lane 4), as in Sf9 (negative control) and Δ3′UTR (Fig. 3A, construct 2). We cloned and sequenced the PCR products, which were produced from primer-dimers but not from retrotransposed copies (Fig. 4A, lanes 2 and 4). This confirmed that the retrotransposition ability was abolished in construct 279-461-(A)20. Because the 72 nt in the 5′-terminal region of the 3′ UTR were not essential in three constructs (73-379, 73-379/(A)20, and 73-295), we next generated construct Δ73-295 and found that this mutant did not produce PCR bands (Fig. 3, construct and lane 10) or retrotransposed copies (Fig. 4A, lane 10). These results indicate that region 73-295 is indispensable for SART1 retrotransposition. In construct 73-295, out of the three conserved stem-loops, 1 and 5 are disrupted and only stem-loop 2 presumably remains.
Internal start of reverse transcription from telomeric repeat-like sequences within the 3′ UTR in several mutants.
The in vivo retrotransposition assay demonstrated that mutants without a poly(A) tract or without region 380-461 showed decreased accuracy of retrotransposition. To understand how reverse transcription is started in these mutants, we cloned and sequenced the PCR products from mutants that showed abnormal-sized bands. We obtained 52 clones from ΔpolyA, 1-379, 73-379/(A)20, 73-379, and 73-295 and have summarized the 3′ junction sequences of retrotransposed SART1 mutants in Fig. 4A. Remarkably, we could not detect nontemplated additional nucleotides at the 3′ end of the inserted sequence of SART1, although such additional nucleotides were observed in R2Bm (13) and human L1 (3) retrotransposition.
Of the 3′ junction sequences of ΔpolyA (Fig. 4A, lane 3), six out of nine clones initiated reverse transcription from an internal sequence within the 3′ UTR. There were three internal initiation sites, +236, +241, and +271 (+1 being the 5′ end of the 3′ UTR of SART1 [Fig. 4B]), that reside within a short sequence of only 36 nt. Surprisingly, all of these clones end with GTT or GGTT at the 3′ terminus of the retrotransposed copy and are further followed by the telomeric repeats AGG(TTAGG)n of the host genome. Of the remaining three, two clones initiated reverse transcription from just 5 nt downstream of the 3′ UTR (AGATC is a readthrough sequence and is derived from the vector). The remaining single clone (+3338) initiated reverse transcription from ∼250 bp downstream of the 3′ UTR in the readthrough RNA derived from the vector sequence. Interestingly, the boundary sequence to telomeric repeats was GGTTT, similar to the case of internal initiation.
The other mutants also initiated reverse transcription, mostly from internal sequences, among which the majority were telomeric repeat-like GGTT (+206, +241, and +271), GTT (+236), and GT (+240 and +270): 4 clones in 1-379, 5 clones in 73-379/(A)20, 16 clones in 73-379, and 6 clones in 73-295. Other than the GGTT-like sequences above, AG (+245, +252, and +261), AGG (+311), and TAG (+252) were used to start reverse transcription. All of these sequences are part of the TTAGGTTAGG telomeric repeats.
In constructs 73-379 and 73-295, we found three clones that initiated reverse transcription from the poly(A) tail, which was apparently generated by polyadenylation at position +239 or +240. The irregular polyadenylation may be due to severe deletion of 3′ UTR sequences and to the lack of (A)20 in these mutants.
In summary, inaccurate reverse transcription in the above mutants started with GGUU and GUU as templates, which accounted for 71% of the total 3′ junction sequences. In these clones, it is interesting that reverse transcription appears to continue telomeric repeats such as GGTTAGGTT (presumed reverse-transcribed portion underlined). Furthermore, other telomeric repeat-like start sites, AG, AGG, and TAG, also ensured the continuity of the telomeric repeats, such as TAGGTTAGG or AGGTTAGG. In all, telomeric repeat-related initiation sites add up to 85%. Most (76%) of the above inaccurate reverse transcription was initiated from only four sites, shown in Fig. 4B.
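The notion of a "telomeric repeat-like" initiation site used above can be made concrete: a short motif qualifies if it occurs somewhere within tandem copies of TTAGG. The following is a minimal sketch under that assumption, not the authors' analysis procedure; the function names and the toy RNA are hypothetical, and the toy sequence is not the SART1 3′ UTR.

```python
from typing import List

TELOMERIC_UNIT = "TTAGG"


def is_telomeric_repeat_like(motif: str, unit: str = TELOMERIC_UNIT) -> bool:
    """True if `motif` (DNA or RNA letters) occurs within tandem copies of `unit`."""
    dna = motif.upper().replace("U", "T")
    if not dna:
        return False
    copies = len(dna) // len(unit) + 2   # enough copies to cover every phase
    return dna in unit * copies


def find_sites(rna: str, motif: str) -> List[int]:
    """1-based start positions of all (possibly overlapping) occurrences of `motif`."""
    rna, motif = rna.upper(), motif.upper()
    return [i + 1 for i in range(len(rna) - len(motif) + 1) if rna.startswith(motif, i)]


# Start sequences reported in the text (written as RNA).
reported = ["GGUU", "GUU", "GU", "AG", "AGG", "UAG"]
all_repeat_like = all(is_telomeric_repeat_like(m) for m in reported)   # True

# Vector-derived readthrough sequence from the text, which is not repeat-like.
vector_like = is_telomeric_repeat_like("AGAUC")                         # False

# Hypothetical toy RNA used only to demonstrate site scanning.
toy_sites = find_sites("ACGGUUAGGUUC", "GGUU")                          # [3, 8]
```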
The stem-loop structure and the internal reverse transcription initiation sites make up the essential region.
To further characterize the 3′ UTR that is essential for retrotransposition, we divided region 73-295 into three sections (73-162, 163-227, and 228-295) and generated mutants with each section deleted (Fig. 5A). Stem-loop 2 comprises region 163-227 except for the first 5 nt. The four major start sites for inaccurate reverse transcription (+236, +241, +252, and +271 [Fig. 4B]) are included in region 228-295. An in vivo retrotransposition assay showed that a weak PCR band representing the retrotransposition event was observed only in Δ73-162 (Fig. 5B). The decrease in the PCR band size coincides with the size of the deletion in this mutant. In contrast, none of the constructs that lack either region 163-227 or region 228-295 showed retrotransposition, indicating that both the stem-loop 2 region (163-227) and the region including telomeric repeat-like reverse transcription initiation sites (228-295) are necessary for SART1 retrotransposition. Although we do not know the exact function of the telomeric repeat-like sequences in region 228-295, we suggest that the GGUU (or UAG) telomeric repeat-like sequences of SART1 RNA interact with the target telomeric DNA (the AACC bottom strand [see Fig. 7 and Discussion]).
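The coordinates of the three sections can be checked by simple bookkeeping with 1-based, inclusive positions. The sketch below is illustrative only; the helper names are hypothetical, and a placeholder string of 461 N's stands in for the 3′ UTR because the actual sequence is not reproduced here.

```python
SECTIONS = {
    "Δ73-162": (73, 162),
    "Δ163-227": (163, 227),
    "Δ228-295": (228, 295),
}


def region_length(start: int, end: int) -> int:
    """Length of a region given 1-based inclusive coordinates."""
    return end - start + 1


def delete_region(seq: str, start: int, end: int) -> str:
    """Return `seq` with positions start..end (1-based, inclusive) removed."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("region lies outside the sequence")
    return seq[:start - 1] + seq[end:]


lengths = {name: region_length(*span) for name, span in SECTIONS.items()}
# {"Δ73-162": 90, "Δ163-227": 65, "Δ228-295": 68}; together 223 nt = region 73-295

placeholder_utr = "N" * 461                                   # stand-in for the 461-nt 3' UTR
mutant_len = len(delete_region(placeholder_utr, 73, 162))     # 371
```

Under these coordinates, the Δ73-162 product would be expected to run about 90 bp below the corresponding wild-type product.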
SART1 ORF proteins can retrotranspose GFP mRNA with the SART1 3′ UTR into telomeric repeats.
We observed that a large portion of the 3′ UTR in SART1 is necessary either for retrotransposition or for precision and efficiency of retrotransposition. Next, to determine whether the SART1 3′ UTR is sufficient to confer retrotransposition, we asked whether an enhanced GFP (EGFP) mRNA fused to the SART1 3′ UTR could retrotranspose into telomeric repeats.
For this purpose, we made construct EGFP1/S1-3′UTR by connecting the EGFP-encoding gene with SART1 3′ UTR sequences and poly(A) (Fig. 6Aa). Because two different SART1 mutants can recover the ability to retrotranspose by trans complementation (24), we coinfected SART1 Δ3′ (Fig. 6Ab) or the 2D699V mutant (Fig. 6Ac), which has abolished RT activity (24), with EGFP1/S1-3′UTR and assayed whether EGFP1/S1-3′UTR was inserted into telomeric repeats. Sets of primers that were designed for amplifying EGFP sequences and telomeric repeats (Fig. 6A) were used for the in vivo retrotransposition assay. A distinct 700-bp band was observed in the 3′ junction PCR assay when EGFP1/S1-3′UTR was coinfected with Δ3′ (Fig. 6B, left, lane a + b). To confirm that the retrotransposition occurred accurately, the PCR products were cloned and sequenced. Of the nine clones sequenced, only one initiated reverse transcription from the internal sequences within the SART1 3′ UTR, whereas the remaining eight were reverse transcribed precisely from the poly(A) tract at the end of the 3′ UTR of EGFP1/S1-3′UTR (data not shown). Furthermore, when primers pEGFP1-A576 and (TTAGG)6 were used to detect the 5′ junction of the retrotransposed copy, a PCR band was also detected (Fig. 6B, right, lane a + b). The PCR band sizes were in good accordance with a full-length insertion: 610 bp plus the telomeric repeat length for the 3′ junction and 543 bp plus the telomeric repeat length for the 5′ junction.
In contrast, EGFP1/S1-3′UTR did not show retrotransposition when infected alone (Fig. 6B, lane a) or with the 2D699V mutant (Fig. 6B, lanes a+c), indicating that retrotransposition of EGFP1/S1-3′UTR is mediated by the RT activity of the SART1 ORF proteins provided in trans from Δ3′. This suggests that the 3′ UTR sequence of SART1 is sufficient for recognition of the RNA by the SART1 RT unit and that any gene carrying the SART1 3′ UTR can be retrotransposed effectively into the telomeric repeats when trans complemented by 3′ UTR-deficient SART1.
DISCUSSION
Functional structures in the 3′ UTR of SART1.
Previous studies of the R2 and UnaL2 elements have suggested that the RT unit in non-LTR retrotransposons recognizes a specific RNA secondary structure within the 3′ UTR in the initial step of TPRT (9, 15), although the protein-RNA interaction has not been characterized.
In this study, we found five putative stem-loops in the 3′ UTR of SART1. Comparative studies showed that stem-loop 5 (+403 to +429) near the 3′ end of the 3′ UTR is the most conserved feature of the SART1-related elements (Fig. 1). Deletion of the 3′-terminal part of the 3′ UTR (+380 to +461) resulted in a decrease in retrotransposition efficiency and loss of precision in reverse transcription initiation. Stem-loop 5 may function in initiating reverse transcription from the poly(A) tail, because mutants rarely initiate reverse transcription from the poly(A) tract when region 380-461 is deleted. However, in vivo retrotransposition assays revealed that stem-loops 1, 4, and 5 are not involved in essential steps of retrotransposition. Stem-loop 2 was conserved only between the two SART1 genomic clones of the silkworm, but when it was deleted, the retrotransposition ability was abolished, suggesting that this stem-loop is an essential structure recognized by the RT unit. Because stem-loop 2 is not conserved in SARTPx1 and WISHBm1, stem-loops formed at different positions may be recognized in these elements.
From the retrotransposition assay, we found that deletion of poly(A) from SART1 WT causes aberrant and inefficient reverse transcription initiation (Fig. 3 and 4, ΔpolyA). This indicates that the poly(A) tract is not critical for retrotransposition but is important in initiating reverse transcription. This is different from human L1, which is believed to recognize just the poly(A) tract itself (19). Drosophila I factor has TAA repeats at the 3′ end instead of the poly(A) tail. The deletion of TAA repeats also affects the reverse transcription initiation site and the efficiency of reverse transcription in I factor. The functional role of the poly(A) tract in SART1 may be similar to TAA repeats of I factor (1) rather than the poly(A) tail of L1.
Reverse transcription from telomeric repeat-like sequences.
Deletion of the poly(A) tract and the latter part of the 3′ UTR decreased retrotransposition efficiency and altered the reverse transcription initiation sequences. The 3′ junction of wild-type SART1 on the genome is composed of 3′ UTR-(A)nAGGTTAGGTTAGG, whereas most of the 3′ junction sequences of the above mutants were GGTTAGGTTAGG or TAGGTTAGG (the presumed SART1-derived sequence is underlined). In these mutants, reverse transcription was initiated mainly at the internal GGUU sequences concentrated from +236 to +271 within the 3′ UTR, not from the poly(A) tract. There are many GGUU or UAG sequences in SART1 mRNA, but only four sites in a very restricted region of the 3′ UTR are selected for initiating reverse transcription. Why does reverse transcription start exactly from the telomeric repeat-like sequences in region 236-271? One possible explanation is that GGUU telomeric repeat-like sequences might interact with the target telomeric (CCTAA) strand.
We propose a hypothetical model in Fig. 7, which shows the specific interaction between the SART1 RNA and the target telomeric DNA during initiation of the TPRT reaction. Although the biochemical features of the SART1 EN domain have not been analyzed, SART1 EN is presumed to nick specifically between CCT and AA of the (CCTAA) bottom strand, on the basis of junction sequence analyses of SART1 in the Bombyx genome (25), in vivo retrotransposition assays (24), and trans complementation experiments (T. Anzai and H. Fujiwara, unpublished data). After the bottom strand of the target telomeric DNA is nicked by the SART1 EN, RNA of SART1 mutants is presumably bound to the RT domain by stem-loop 2 and bound to the 3′-CCAA-5′ sequence of the future primer strand DNA by the 5′-GGUU-3′ sequences (Fig. 7A). When the mutant SART1 RNA anneals to the telomeric DNA CCAA sequence, a thymine residue (T) at the 3′ end of the nicked bottom strand will be removed because the T does not pair with the RNA strand. Recently, human apurinic/apyrimidinic EN 1, with a structure similar to that of the EN domains of many LINEs including SART1 (14), was reported to have 3′-to-5′ exonuclease activity on mismatched DNA pairs (2). Thus, the EN domain of SART1 might remove the superfluous mismatched nucleotides on the bottom target DNA, which results in the start of reverse transcription from the nucleotide next to the CCAA site. If this hypothesis is correct, reverse transcription must start from the next nucleotide 5′ to the GGUU sequence as an RNA template with the CCAA sequence as a DNA primer (Fig. 7Ba). However, we do not know exactly whether the junction GGTT sequence in the retrotransposed copies of mutants originates from the host genome or is newly synthesized by reverse transcription.
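The base-pairing logic of the GGUU case in this model can be expressed as a small sketch: the DNA partner of 5′-GGUU-3′ is 5′-AACC-3′ (that is, 3′-CCAA-5′), and a bottom strand nicked after CCT ends in one unpaired T that must be removed before the primer terminus pairs with the RNA. This is a toy illustration of the hypothesis only, not a description of enzyme behavior; the function names, the `max_trim` limit, and the example primer are assumptions.

```python
from typing import Optional, Tuple

# RNA base -> DNA base that pairs with it
PAIR = {"A": "T", "C": "G", "G": "C", "U": "A"}


def dna_partner(rna_5to3: str) -> str:
    """DNA strand (written 5'->3') that base-pairs with an RNA written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(rna_5to3.upper()))


def trim_to_anneal(primer_5to3: str, rna_site: str, max_trim: int = 5) -> Optional[Tuple[str, str]]:
    """Remove unpaired 3' nucleotides until the primer ends with the RNA's partner.

    Returns (trimmed primer, removed 3' nucleotides), or None if no pairing
    terminus is found within `max_trim` removals.
    """
    partner = dna_partner(rna_site)
    for removed in range(max_trim + 1):
        end = len(primer_5to3) - removed
        if end >= len(partner) and primer_5to3[:end].endswith(partner):
            return primer_5to3[:end], primer_5to3[end:]
    return None


partner = dna_partner("GGUU")                       # "AACC"
# Bottom strand (CCTAA)n nicked between CCT and AA; example length is arbitrary.
result = trim_to_anneal("CCTAACCTAACCT", "GGUU")    # ("CCTAACCTAACC", "T")
```

In this sketch the single removed nucleotide corresponds to the unpaired T described above; the other start sites (AGG, AG, and TAG) are not modeled here.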
Other telomeric repeat-like sequences, AGG, AG, and TAG, in the SART1 3′ UTR were also selected as initiation sites for reverse transcription in the above mutants (Fig. 4A). Similar to the GGUU initiation mechanisms, the selection of these sites seems to be dependent on the annealing of 3′ UTR RNA to the telomeric DNA (Fig. 7Bb). The SART1 EN originally nicks the bottom strand between A and T, but its 3′-to-5′ exonuclease activity further eliminates the CTCA sequence because it is not annealed to the 3′ UTR RNA.
Recently, we found similar phenomena in an in vivo retrotransposition assay of the R1 element (Anzai and Fujiwara, unpublished data), which is a 28S ribosomal DNA-specific retrotransposon (27). Wild-type R1 frequently used TG or TGT in the 3′ UTR of RNA as the initiation sites for reverse transcription. The TG or TGT nucleotides correspond to the 5′ end of the target DNA of R1 (TGTCCCTATCTACT). These observations suggest that the interaction between the RNA template and the target DNA may occur and facilitate the initial step of TPRT in wild-type SART1 (Fig. 7C), although we do not have direct evidence of this. The fact that four of the initiation sites of SART1 RNA mentioned above are located within the essential region between positions +228 and +295 supports this hypothesis.
Recognition of 3′ UTR by RT of non-LTR retrotransposons.
The necessity of the 3′ UTR for retrotransposition has been reported in other non-LTR retrotransposons. For instance, the common 3′ UTR structure of UnaL2 and UnaSINE1 enables retrotransposition of both elements (9). SART1 is similar to UnaL2 in the sense that they both have stem-loop structures in the essential region. However, SART1 requires the full-length 3′ UTR sequence for efficient and precise insertion, so the entire 3′ UTR may play a role in determining RNA structure, as was suggested for R2 (15).
Human L1 is able to retrotranspose even when most of the 3′ UTR is deleted, and it is likely that its RT recognizes the poly(A) tail to initiate reverse transcription (19). Owing to such relaxed recognition of template RNA, human L1 can potentially act on the large number of cellular mRNAs that carry a poly(A) tail. Both L1-mediated processed pseudogene formation and 3′ transduction have been demonstrated in the human genome and in experimental systems (6, 8, 18, 26). Human L1 prefers acting on its own mRNA (cis preference) rather than on other mRNAs (trans complementation) (26). This feature of L1 should minimize the frequency of genome rearrangement caused by its relaxed recognition of template mRNA.
In contrast, SART1 recognizes the 3′ UTR sequences strictly and works in trans effectively on other SART1 mRNAs. This is supported by the observation that the full-length SART1 3′ UTR fused with the gene for EGFP is more effectively retrotransposed than SART1 mutants with deletions in the 3′ UTR when SART1 ORF proteins are provided in trans. Because SART1 recognizes its 3′ UTR strictly, its action in trans on other mRNAs is restricted to SART1 species and does not cause processed pseudogene formation. These specialized features of SART1 (strict targeting, strict 3′ UTR recognition, and in-trans action) may be correlated with the control of telomere length in the silkworm, which has lost or attenuated its telomerase activity (7, 22).
The above-described retrotransposition system has several advantages for gene delivery. A long mRNA to be delivered can be manipulated easily because only the 461-bp SART1 3′ UTR needs to be placed at the 3′ end of the desired mRNA. Furthermore, the retrotransposed copies are stable in the genome because the enzymatic unit, the SART1 ORF proteins, is provided separately in trans.
Acknowledgments
This work was supported by a grant from the Ministry of Education, Science and Culture of Japan (15370003) and by a Grant-in-Aid from the Research for the Future Program of the Japan Society for the Promotion of Science.
REFERENCES
- 1. Chambeyron, S., A. Bucheton, and I. Busseau. 2002. Tandem UAA repeats at the 3′-end of the transcript are essential for the precise initiation of reverse transcription of the I factor in Drosophila melanogaster. J. Biol. Chem. 277:17877-17882.
- 2. Chou, K. M., and Y. C. Cheng. 2002. An exonucleolytic activity of human apurinic/apyrimidinic endonuclease on 3′ mispaired DNA. Nature 415:655-659.
- 3. Cost, G. J., Q. Feng, A. Jacquier, and J. D. Boeke. 2002. Human L1 element target-primed reverse transcription in vitro. EMBO J. 21:5899-5910.
- 4. Dewannieux, M., C. Esnault, and T. Heidmann. 2003. LINE-mediated retrotransposition of marked Alu sequences. Nat. Genet. 35:41-48.
- 5. Eickbush, T. H. 1992. Transposing without ends: the non-LTR retrotransposable elements. New Biol. 4:430-440.
- 6. Esnault, C., J. Maestre, and T. Heidmann. 2000. Human LINE retrotransposons generate processed pseudogenes. Nat. Genet. 24:363-367.
- 7. Fujiwara, H., Y. Nakazato, S. Okazaki, and O. Ninaki. 2000. Stability and telomere structure of chromosomal fragments in two different mosaic strains of the silkworm, Bombyx mori. Zool. Sci. 17:743-750.
- 8. International Human Genome Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.
- 9. Kajikawa, M., and N. Okada. 2002. LINEs mobilize SINEs in the eel through a shared 3′ sequence. Cell 111:433-444.
- 10. Kazazian, H. H., Jr. 2000. Genetics. L1 retrotransposons shape the mammalian genome. Science 289:1152-1153.
- 11. Kojima, K. K., and H. Fujiwara. 2003. Evolution of target specificity in R1 clade non-LTR retrotransposons. Mol. Biol. Evol. 20:351-361.
- 12. Luan, D. D., M. H. Korman, J. L. Jakubczak, and T. H. Eickbush. 1993. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72:595-605.
- 13. Luan, D. D., and T. H. Eickbush. 1995. RNA template requirements for target DNA-primed reverse transcription by the R2 retrotransposable element. Mol. Cell. Biol. 15:3882-3891.
- 14. Malik, H. S., W. D. Burke, and T. H. Eickbush. 1999. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16:793-805.
- 15. Mathews, D. H., A. R. Banerjee, D. D. Luan, T. H. Eickbush, and D. H. Turner. 1997. Secondary structure model of the RNA recognized by the reverse transcriptase from the R2 retrotransposable element. RNA 3:1-16.
- 16. Mathews, D. H., J. Sabina, M. Zuker, and D. H. Turner. 1999. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288:911-940.
- 17. Matsumoto, T., H. Takahashi, and H. Fujiwara. 2004. Targeted nuclear import of open reading frame 1 protein is required for in vivo retrotransposition of a telomere-specific non-long terminal repeat retrotransposon, SART1. Mol. Cell. Biol. 24:105-122.
- 18. Moran, J. V., R. J. DeBerardinis, and H. H. Kazazian, Jr. 1999. Exon shuffling by L1 retrotransposition. Science 283:1530-1534.
- 19. Moran, J. V., S. E. Holmes, T. P. Naas, R. J. DeBerardinis, J. D. Boeke, and H. H. Kazazian, Jr. 1996. High frequency retrotransposition in cultured mammalian cells. Cell 87:917-927.
- 20. Okada, N., M. Hamada, I. Ogiwara, and K. Ohshima. 1997. SINEs and LINEs share common 3′ sequences: a review. Gene 205:229-243.
- 21. Ostertag, E. M., and H. H. Kazazian, Jr. 2001. Twin priming: a proposed mechanism for the creation of inversions in L1 retrotransposition. Genome Res. 11:2059-2065.
- 22. Sasaki, T., and H. Fujiwara. 2000. Detection and distribution patterns of telomerase activity in insects. Eur. J. Biochem. 267:3025-3031.
- 23. Takahashi, H., and H. Fujiwara. 1999. Transcription analysis of the telomeric repeat-specific retrotransposons TRAS1 and SART1 of the silkworm Bombyx mori. Nucleic Acids Res. 27:2015-2021.
- 24. Takahashi, H., and H. Fujiwara. 2002. Transplantation of target site specificity by swapping the endonuclease domains of two LINEs. EMBO J. 21:408-417.
- 25. Takahashi, H., S. Okazaki, and H. Fujiwara. 1997. A new family of site-specific retrotransposons, SART1, is inserted into telomeric repeats of the silkworm, Bombyx mori. Nucleic Acids Res. 25:1578-1584.
- 26. Wei, W., N. Gilbert, S. L. Ooi, J. F. Lawler, E. M. Ostertag, H. H. Kazazian, J. D. Boeke, and J. V. Moran. 2001. Human L1 retrotransposition: cis preference versus trans complementation. Mol. Cell. Biol. 21:1429-1439.
- 27. Xiong, Y., and T. H. Eickbush. 1988. The site-specific ribosomal DNA insertion element R1Bm belongs to a class of non-long-terminal-repeat retrotransposons. Mol. Cell. Biol. 8:114-123.
- 28. Zuker, M., D. H. Mathews, and D. H. Turner. 1999. Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide, p. 11-43. In J. Barciszewski and B. F. C. Clark (ed.), RNA biochemistry and biotechnology. Kluwer Academic Publishers, Dordrecht, The Netherlands.