Abstract
The LINE-1 (L1) retrotransposon is an ancient genetic parasite that has written around one-third of the human genome through a ‘copy and paste’ mechanism catalysed by its multifunctional enzyme, open reading frame 2 protein (ORF2p)1. ORF2p reverse transcriptase (RT) and endonuclease activities have been implicated in the pathophysiology of cancer2,3, autoimmunity4,5 and ageing6,7, making ORF2p a potential therapeutic target. However, a lack of structural and mechanistic knowledge has hampered efforts to rationally exploit it. We report structures of the human ORF2p ‘core’ (residues 238–1061, including the RT domain) by X-ray crystallography and cryo-electron microscopy in several conformational states. Our analyses identified two previously undescribed folded domains, extensive contacts to RNA templates and associated adaptations that contribute to unique aspects of the L1 replication cycle. Computed integrative structural models of full-length ORF2p show a dynamic closed-ring conformation that appears to open during retrotransposition. We characterize ORF2p RT inhibition and reveal its underlying structural basis. Imaging and biochemistry show that non-canonical cytosolic ORF2p RT activity can produce RNA:DNA hybrids, activating innate immune signalling through cGAS/STING and resulting in interferon production6–8. In contrast to retroviral RTs, L1 RT is efficiently primed by short RNAs and hairpins, which probably explains cytosolic priming. Other biochemical activities including processivity, DNA-directed polymerization, non-templated base addition and template switching together allow us to propose a revised L1 insertion model. Finally, our evolutionary analysis demonstrates structural conservation between ORF2p and other RNA- and DNA-dependent polymerases. We therefore provide key mechanistic insights into L1 polymerization and insertion, shed light on the evolutionary history of L1 and enable rational drug development targeting L1.
Subject terms: X-ray crystallography, Transposition, Cryoelectron microscopy
X-ray crystallography, cryo-electron microscopy, structural modelling, biochemistry, cell biology, and evolutionary analysis enable characterization of ORF2p, the reverse transcriptase of the ancient ‘parasitic’ LINE-1 retrotransposon that has written around one-third of the human genome.
Main
Recent primate transposon evolution is dominated by RNA ‘copy and paste’ retrotransposons that insert RNA intermediates into the genome by encoded reverse transcriptase (RT) activity9. These retrotransposons are divided into two classes: (1) endogenous retroviruses (ERVs), flanked by long terminal repeats (LTRs); and (2) the non-LTR retrotransposon long interspersed element-1 (LINE-1, L1)1. ERVs are no longer thought to be active in humans1. By contrast, each person inherits about 100 potentially active polymorphic and fixed L1s, a small subset of the approximately half a million inactive L1 copies and fragments1. LINEs have been coevolving with their hosts for 1–2 billion years, since the emergence of eukaryotes. Human L1 encodes two proteins, ORF1p10 and ORF2p, the latter having endonuclease (EN) and RT activities11–13, along with three other domains of unknown function (Fig. 1a,b). ORF2p cotranslationally binds its encoding L1 RNA, a property termed ‘cis preference’14–17, forming a ribonucleoprotein (RNP) complex with many copies of ORF1 and host proteins10,15,17–19 (Fig. 1b). New insertions begin with the target-primed reverse transcription (TPRT) priming mechanism: an EN nick on the ‘bottom’ DNA strand liberates a DNA 3′-OH that primes reverse transcription, generating an RNA:DNA hybrid intermediate20–23. The details of TPRT in L1, second-strand synthesis and how the resulting intermediates are resolved remain unclear, although it is known that a subsequent staggered break in the second ‘top’ DNA strand24 results in a characteristic target site duplication, typically less than 20 base pairs (bp) long, flanking L1-mediated insertions24,25. Despite its cis preference, ORF2p also binds and inserts other RNAs, including messenger RNA sequences and short interspersed element RNAs such as Alu.
Derepressed L1 elements can contribute to the pathology of cancer, ageing, neurodegeneration and inflammation (mechanisms posited in Fig. 1b). Consistent with this, RT inhibitors have shown promising results in model systems6–8,26,27 and in clinical studies of colorectal cancer28 and Aicardi–Goutières syndrome, a rare Mendelian interferonopathy characterized by accumulation of L1 intermediates4,27,29. However, our knowledge of the mechanistic details of both L1 insertion and how L1 contributes to pathophysiology is limited. The best characterized L1 relatives are insect R2 LINE elements21 and bacterial group II mobile introns30,31, which lack the amino-terminal apurinic/apyrimidinic EN (APE)-like EN of ORF2p12,13 and diverged from the human lineage around 700 million and 4 billion years ago, respectively. Both recognize and mobilize unique DNA and RNA sequences, limiting comparison with L1.
To address knowledge gaps in L1 biology and facilitate drug discovery, we have established systems to purify both full-length ORF2p and a minimal ‘core’, characterized ORF2p RT activity, and determined its structure using several modalities. Our investigation revealed (1) efficient RT priming by short RNAs and hairpins; (2) direct cytosolic synthesis of RNA:DNA hybrids that activate cGAS-STING, resulting in interferon production; (3) a series of conformational adaptations in the ‘right-handed’ fingers, palm and thumb RT fold that are likely to modulate biochemical activities required for the replication cycle of L1; (4) the presence of two previously undescribed domains in the RT core, which we name ‘tower’ and ‘wrist’; and (5) concerted dynamics of the N-terminal EN and carboxy-terminal domain (CTD). Informed by this structure, we elucidate the evolutionary relationships between conserved structural features in ORF2p. Our results shed light on previously enigmatic steps in the L1 replication cycle, its roles in pathophysiology and potential routes to therapeutics.
Purification of highly active ORF2p RT
Previous efforts to measure ORF2p enzymatic activity have been limited by an inability to purify more than trace amounts of ORF2p RT, with limited characterization of impure enzyme indicating that ORF2p may be able to perform DNA synthesis using RNA or DNA templates20,32,33. Here, we optimized purification of the ORF2p core (residues 238–1061) to yield milligram quantities of more than 99% pure enzyme (Fig. 1c) that was monomeric (Extended Data Fig. 1a) and highly active against oligo(A) templates (Extended Data Fig. 1b), enabling structural and kinetic analyses, as well as single-base-resolution assays with various substrates and inhibitors.
A 2.1 Å crystal structure of the ORF2p core
To characterize domains of ORF2p of previously unknown function, understand how these domains interact during priming and reverse transcription, and elucidate the structural basis of differential RT inhibition as a basis for rational drug design, we solved the crystal structure of ORF2p core in an active configuration, using an AlphaFold model for molecular replacement (Extended Data Table 1 and Extended Data Fig. 1c). The structure represents a ternary complex with an incoming deoxythymidine triphosphate (dTTP) nucleotide and a template–primer heteroduplex containing a three-nucleotide (nt) 5′ overhang in the RNA template and 3′ dideoxy-terminated DNA primer. The complex crystallized in space group C2, with one monomer in the asymmetric unit. The structure (Fig. 1d) reveals the fingers, palm and thumb of a characteristic right-hand RT fold but also shows key differences compared with other RTs. Two folded domains which we name ‘wrist’ (863–1061) and ‘tower’ (240–440, Figs. 1d and 2, described below) are absent from other known structures of RT enzymes from viruses or mobile elements. All five domains make extensive contact with the bound nucleic acid (Supplementary Methods, Fig. 1d inset diagram and Extended Data Fig. 1e).
Extended Data Table 1.
Five ORF2p core domains all bind nucleic acid
As in other RTs, the fingers, palm and thumb domains form a groove that cradles the RNA template–DNA primer heteroduplex. Nucleotide positions in the template and primer are numbered n−3 to n+10 relative to the 5′ end of the template, and n−1 is the templating ribonucleoside paired with the incoming deoxyribonucleoside triphosphate (dNTP) (Fig. 1d, insets, and Extended Data Fig. 1e). We identify template contacts in both new domains: the tower contacts the 5′ RNA template at the n−3 base, and the wrist makes multiple contacts with the downstream region of the template (3′ end). The overall configuration of the active site and the resulting catalytic mechanism are highly conserved throughout RTs and related polymerases30,34: in a region of the palm termed the N-site, the incoming dNTP base pairs with the n−1 base on the template and is poised for covalent linkage to the 3′ hydroxyl of the primer n+1 deoxyribose ring. The catalytic triad of aspartic acids (D600, D702, D703) resides at the active site and coordinates a Mg2+ ion and the dNTP; D702 and D703 form the base of the FADD loop (Fig. 1d, inset). The gatekeeping residue F605 has an aromatic side chain that selects against ribonucleotides with a 2′ hydroxyl, which probably explains the inability of ORF2p to function as an RNA-dependent RNA polymerase (RdRp; Extended Data Figs. 1d and 4c and Supplementary Fig. 3c). The 5′ upstream RNA template enters ORF2p above the fingertips, with eight residues contacting n−3, including hydrogen bonds to the base from an extended palm loop and from the tower. The template next interacts with the R0 loop, which forms a ‘lid’ over the template RNA. This loop is part of the R0 region, also called N-terminal extension 0 (NTE-0), which is found in non-LTR retrotransposons, the group IIC intron and HCV RdRp, but not in viral RTs30, and has been shown to be important for template jumping and/or switching activity35,36 (‘Domain comparison of ORF2p and other RTs’).
The downstream template makes extensive interactions with the fingers, palm, wrist and thumb, continuing to the n+8 position (Fig. 1d, inset diagram). The DNA primer is contacted through the n+5 position, held upstream by the primer grip and downstream by the thumb, with the helix clamp at its base.
Structure of the L1 wrist domain
The wrist domain (863–1061) has not been previously recognized, although experiments deleting large portions of the wrist and the subsequent CTD have shown that both domains are required for efficient retrotransposition37. Scanning mutagenesis has also identified numerous wrist regions required for retrotransposition38. The fold consists of 12 helices and is anchored to the RT through interactions with the thumb helices, and to the palm through a helix at residues 573–581 and a short β-turn at residues 688–695. Searches with the structural similarity servers Dali and Foldseek show weak similarity to a sterile alpha motif-like domain, suggesting possible roles in nucleic acid binding or protein–protein interactions. In the structure, the wrist makes numerous backbone contacts with the RNA template from n+4 to n+7, and trialanine mutants spanning these residues have reduced or no retrotransposition activity38.
ORF2p cryo-electron microscopy structures in three states
We next measured the thermal stability of ORF2p in differential scanning fluorometry assays, in which heat-induced denaturation results in increasing exposure of the hydrophobic core of the protein and resultant binding and fluorescence of the SYPRO Orange dye. Apo ORF2p, lacking bound nucleic acid, was unstable, with a melting temperature (Tm) of 34.1 ± 0.4 °C. ORF2p was markedly stabilized by binding single-stranded RNA (ssRNA) (ΔTm = 14.4 ± 0.6 °C) and further stabilized by binding an RNA:DNA hybrid (Fig. 2a; ΔTm from ssRNA-bound = 2.7 ± 0.4 °C, ΔTm from apo = 16.1 ± 0.4 °C). To understand the structural changes resulting from binding of the primer and template, we used single-particle cryo-electron microscopy (cryo-EM; Extended Data Table 2 and Supplementary Figs. 1 and 2) to obtain reconstructions of ORF2p in three distinct states: in an active ternary complex with incoming dTTP and template–primer; bound to oligo-25(A) ssRNA; and in apo form (to 3.30, 3.66 and 4.06 Å resolution, respectively; Extended Data Fig. 2a). This is the first reported structure of an RT bound with ssRNA in the active site.
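Tm values like those above are commonly read off a DSF melt curve as the temperature of the steepest fluorescence increase (the peak of dF/dT). The sketch below assumes the curve has already been trimmed to the unfolding transition, before the post-peak decline often caused by aggregation. The function names and the synthetic two-state curve are illustrative and are not the analysis pipeline used here; fitting a sigmoidal (Boltzmann) model is a common alternative.

```python
import math


def melting_temperature(temps, signal):
    """Estimate Tm (deg C) from a DSF melt curve as the point of steepest rise.

    Assumes temps are strictly increasing and the curve has been trimmed to the
    unfolding transition. Returns the midpoint of the sampling interval with the
    largest forward-difference slope dF/dT.
    """
    if len(temps) != len(signal) or len(temps) < 2:
        raise ValueError("need >= 2 paired temperature/signal points")
    best_slope, best_tm = -math.inf, None
    for i in range(len(temps) - 1):
        slope = (signal[i + 1] - signal[i]) / (temps[i + 1] - temps[i])
        if slope > best_slope:
            best_slope, best_tm = slope, 0.5 * (temps[i] + temps[i + 1])
    return best_tm


def two_state_curve(temps, tm, width=1.5, f_min=100.0, f_max=1000.0):
    """Synthetic sigmoidal melt curve for illustration only (not measured data)."""
    return [f_min + (f_max - f_min) / (1.0 + math.exp((tm - t) / width)) for t in temps]


if __name__ == "__main__":
    grid = [20.0 + 0.5 * i for i in range(101)]  # 20-70 deg C in 0.5 deg C steps
    apo = melting_temperature(grid, two_state_curve(grid, tm=34.1))
    bound = melting_temperature(grid, two_state_curve(grid, tm=48.5))
    print(f"apo Tm ~ {apo:.2f} C, bound Tm ~ {bound:.2f} C, dTm ~ {bound - apo:.2f} C")
```

A ligand-induced ΔTm is then the difference between the bound and apo estimates; with this approach the precision is limited to about half the temperature step of the ramp.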
Extended Data Table 2.
The density for the active ternary complex was complete and enabled building of a structural model with clear density for the incoming dNTP, Mg2+ and template–primer (Fig. 2b, inset left). The cryo-EM-derived atomic model was largely indistinguishable from the crystal structure, with an overall root mean square deviation (RMSD) of 1.01 Å over the tower, fingers, palm and thumb. There was apparent flexibility between the wrist and the rest of ORF2p, but the wrist fold itself was largely unchanged between the two structures (wrist backbone RMSD of 4.04 Å, aligned wrist RMSD = 1.01 Å, overall RMSD including wrist 3.68 Å; Extended Data Fig. 2b). Comparison of heteroduplex and ssRNA-bound states revealed distinct template paths (template RMSD of 3.76 Å; Fig. 2b, inset right) but overall maintenance of similar contacts through movement of flexible loops, notably in the palm and wrist domains. Intriguingly, although it was determined at lower resolution, the apo ORF2p structure was found in a ‘thumb up’ conformation, in which the template binding and active sites were accessible; by contrast, apo viral RTs adopt an inactive ‘thumb down’ conformation, in which the thumb occupies the nucleic-acid-binding site (Extended Data Fig. 2c,d). This ‘thumb up’ conformation, the instability of the apo protein and tight RNA binding are likely to contribute to the cis preference of L1.
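The RMSD values above depend on how the structures are superposed before comparison. A domain that moves as a rigid body relative to the rest of the protein gives a large RMSD when the structures are aligned on another region, but a small RMSD once the domain is superposed onto itself. The pure-Python sketch below (hypothetical function names) contrasts the two. It uses Horn's quaternion formulation of optimal superposition, which yields the same minimum RMSD as the SVD-based Kabsch method. It assumes matched atom lists of equal length and is not the software used for the reported values.

```python
import math


def _centered(points):
    """Translate an (x, y, z) list so that its centroid sits at the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in points]


def rmsd_in_place(a, b):
    """RMSD between matched atoms in their current frames (no superposition)."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    sq = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(sq / len(a))


def rmsd_superposed(a, b, iterations=500):
    """Minimum RMSD after optimal rigid-body superposition of a onto b.

    Horn's method: the best fit's squared-deviation sum is
    |A|^2 + |B|^2 - 2 * lambda_max, where lambda_max is the largest eigenvalue
    of a 4x4 symmetric matrix built from the cross-covariance of the centred
    coordinates. lambda_max is found here by shifted power iteration.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    A, B = _centered(a), _centered(b)
    # s[i][j] = sum over atoms of A_i * B_j, with i, j indexing x, y, z
    s = [[sum(p[i] * q[j] for p, q in zip(A, B)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n_mat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # A Gershgorin shift makes every eigenvalue non-negative, so the dominant
    # eigenvalue of the shifted matrix corresponds to lambda_max of n_mat.
    shift = max(sum(abs(x) for x in row) for row in n_mat)
    m = [[n_mat[i][j] + (shift if i == j else 0.0) for j in range(4)] for i in range(4)]
    lam_max = -math.inf
    for start in range(4):  # basis-vector starts: at least one overlaps the top eigenvector
        v = [1.0 if k == start else 0.0 for k in range(4)]
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        rayleigh = sum(v[i] * n_mat[i][j] * v[j] for i in range(4) for j in range(4))
        lam_max = max(lam_max, rayleigh)
    ga = sum(x * x for p in A for x in p)
    gb = sum(x * x for q in B for x in q)
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam_max) / len(a)))


if __name__ == "__main__":
    # Invented coordinates standing in for a rigidly displaced domain.
    domain = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0),
              (0.0, 1.2, 0.9), (0.4, -0.7, 1.3), (-1.0, 0.5, -0.8)]
    # The same domain rotated 90 degrees about z and translated.
    moved = [(-y + 3.0, x - 2.0, z + 1.0) for x, y, z in domain]
    print("in place:", round(rmsd_in_place(domain, moved), 3))
    print("superposed:", round(rmsd_superposed(domain, moved), 6))
```

In the example, the rotated and translated copy has a large in-place RMSD but a superposed RMSD close to zero, analogous to a wrist that shifts as a rigid body while its fold is preserved.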
Structure of the L1 tower domain
ORF2p contains an N-terminal APE-like EN13, and L1 is the first such retrotransposon to be structurally characterized; other classes of non-LTR retrotransposons have C-terminal restriction-like ENs (RLE)22–24. The tower domain (239–440) corresponds to the region between the EN and RT domains and encompasses regions previously termed ‘cryptic’ or ‘desert’38,39. It consists of four key components: (1) a baseplate (residues 254–300); (2) the protruding tower helices (residues 301–370); (3) the subsequent tower lock (residues 374–382); and (4) a PIP box helix (PCNA-interacting protein, residues 404–419). Structure similarity searches did not show significant similarities to other proteins. The tower baseplate (Fig. 2c) was resolved to residue 304 in the crystal structure and to residue 310 in our EM model. The tower and lock are anchored to the RT at two points: (1) by the baseplate to the fingers, through mostly hydrophobic contacts; and (2) by the PIP box to the palm and fingers, through a mix of hydrophobic and polar interactions. Mutation of key residues in the baseplate reduces retrotransposition39, and the PIP box mediates an ORF2p–PCNA interaction that depends on EN and RT activities and is required for retrotransposition17,18,39. AlphaFold2 modelling indicates that the intervening helices form an elongated hairpin-like tower, which seems to be flexible. Modelling using molecular dynamics simulations and AlphaFold indicated that the tower lock is consistent with orphan density above the n+4 base in low-pass-filtered cryo-EM maps of ssRNA-bound ORF2p, and it may therefore fold down and ‘cap’ the RNA template (Extended Data Fig. 2d). A functionally similar tower lock is present in the smaller tower-like domain of R2, despite sequence divergence (see domain comparison below)22,23. To test the importance of the unresolved tower and tower lock for RT activity, we purified ORF2p mutants that truncated the tower (Δ302–363) or the tower and tower lock (Δ302–389), replacing them with short flexible linkers (Extended Data Fig. 3a,b). Both constructs were as active as the wild type in RT assays (Fig. 2d and Extended Data Fig. 3c,d), whereas trialanine mutants in various regions of the tower and in the lock show no retrotransposition38. Together, these data demonstrate that the ORF2p tower is important for L1 retrotransposition but not for RT activity. They also suggest that ORF2p fragments consisting of portions of the tower base may be able to bind to the rest of ORF2p in trans, enabling ‘bipartite’ Alu retrotransposition39.
ORF2p RT and polymerase activities
ORF2p can polymerize DNA on RNA or DNA templates (RT or pol activities) with approximately equal efficiency using DNA primers. RNA priming of cDNA synthesis on an RNA template is less efficient but still occurs at a significant rate (Fig. 3a and Supplementary Fig. 3a,b). This reduced but significant level of RNA-primed synthesis on RNA templates by L1 ORF2p is in stark contrast with HIV-1 RT, for which only specialized RNA primers are used in initiation, at an efficiency reduced by orders of magnitude40. L1 ORF2p RNA synthesis (RdRp activity) was strongly selected against, with minimal detectable activity (Extended Data Fig. 4c and Supplementary Fig. 3c). In single-nucleotide additions with long 20 nt primers, ORF2p had no apparent preference for an RNA or DNA template. HIV-1 RT and human ERV K (HERV-K) RT34 also accept both templates and have roughly ten-fold and two-fold higher efficiency of single-nucleotide incorporation than L1 ORF2p, respectively. However, whereas ORF2p efficiently extended 5 nt DNA primers on DNA or RNA templates, HIV-1 RT had markedly reduced efficiency with 6 nt primers in RT reactions, was incapable of reverse transcribing from a 5 nt primer, and did not extend primers 5–10 nt long on DNA templates (Extended Data Figs. 4a,b and 5a,b). ORF2p was highly processive and unaffected by a heparin competitor, whereas HIV-1 RT was significantly less processive at baseline and did not produce full-length products with a heparin competitor in any condition (Extended Data Fig. 5c).
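Processivity in heparin-trap experiments is often summarized as a per-position probability of continuing synthesis, P_i = I(>i)/I(≥i). This probability is computed from background-corrected band intensities under single-hit conditions, in which the trap prevents rebinding. The sketch below assumes that convention and a lane containing only extended products. It is a generic analysis, not necessarily the quantification used for Extended Data Fig. 5c, and the function names are hypothetical.

```python
def microscopic_processivity(band_intensities):
    """Per-position processivity P_i = I(>i) / I(>=i) from a single-hit gel lane.

    band_intensities[i] is the background-corrected signal of products that
    terminated after i + 1 added nucleotides; the final entry is the run-off
    (full-length) product, which has no defined P. Positions with no signal at
    or beyond them return None.
    """
    if not band_intensities or any(x < 0 for x in band_intensities):
        raise ValueError("need a non-empty list of non-negative intensities")
    # tail[i] = total signal in bands at position i and beyond
    tail = [0.0] * (len(band_intensities) + 1)
    for i in range(len(band_intensities) - 1, -1, -1):
        tail[i] = tail[i + 1] + band_intensities[i]
    return [tail[i + 1] / tail[i] if tail[i] > 0 else None
            for i in range(len(band_intensities) - 1)]


def full_length_fraction(band_intensities):
    """Fraction of extended products that reached the end of the template."""
    total = sum(band_intensities)
    return band_intensities[-1] / total if total > 0 else 0.0
```

A highly processive enzyme gives P_i values close to 1 along the template and a large full-length fraction; a lane in which signal halves at every step gives P_i = 0.5 throughout.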
ORF2p also consistently produced larger products of two types, which increased with both longer reaction times and higher concentrations of reaction components: (1) non-templated addition (NTA, or 3′ tailing) products, in which single bases are added beyond the 5′ end of the template; and (2) template jumping or template switching products, in which polymerization of the same cDNA strand (copy of template1) continues on a new incoming template molecule (template2) that is accepted and copied, making a concatemer (copy of template1 + copy of template2; Supplementary Fig. 4). Neither NTA nor template jumping was detectable with HIV-1 RT (Extended Data Fig. 5b). These activities have been well characterized in other non-LTR transposons and are thought to be important for completion of an insertion (‘Discussion’) but have not previously been shown for ORF2p. NTA activity provides a mechanistic explanation for the previously reported ‘5′ extra nucleotides’ or ‘microhomologies’ observed in naturally occurring25 and engineered L1 insertions41,42.
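When reading gels for these activities, it can help to tabulate the product sizes that each mechanism predicts. The hypothetical helper below makes three assumptions: full extension to the template 5′ end (run-off); NTA of one to `max_nta` untemplated bases; and a seamless template switch in which all of template2 is copied with no microhomology overlap. Real switch products may be shorter if untemplated bases pair with the 3′ end of the incoming template.

```python
def expected_product_lengths(primer_len, template_overhang, max_nta=0, template2_len=None):
    """Predict cDNA product sizes (nt) for an annealed template-primer.

    Hypothetical helper: assumes run-off to the template1 5' end, NTA of
    1..max_nta untemplated bases, and a template switch that copies all of
    template2 without overlap between untemplated bases and template2.
    """
    if min(primer_len, template_overhang, max_nta) < 0:
        raise ValueError("lengths must be non-negative")
    run_off = primer_len + template_overhang
    nta = [run_off + k for k in range(1, max_nta + 1)]
    switched = []
    if template2_len is not None:
        # 0..max_nta untemplated bases bridge the jump before template2 is copied
        switched = [run_off + k + template2_len for k in range(max_nta + 1)]
    return {"run_off": run_off, "nta": nta, "template_switch": switched}
```

For a substrate like the RNA17–DNA14 hybrid (a 14 nt primer with a 3 nt 5′ template overhang), `expected_product_lengths(14, 3, max_nta=2, template2_len=17)` predicts a 17 nt run-off product, NTA products of 18 and 19 nt, and template-switch concatemers of 34 to 36 nt if the incoming template is another 17 nt copy.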
ORF2p is known to tolerate some terminal mismatches in priming in crude RNP complex preparations15,16. In assays with an RNA template terminating in A, ORF2p showed little discrimination against terminal mismatches, with the exception of A:G, which was disfavoured but retained some detectable activity. These results are similar to those of previous studies using RNP preparations16, in which the predominant template was presumed to be the poly(A) tail, and this similarity is consistent with most ORF2p in L1 RNP preparations resting on the poly(A) tail15–17. C:U and T:U internal mismatches at the second-to-last position are also tolerated, as is a UA:TC double mismatch, to a lesser extent. Overall, ORF2p is as active as HIV-1 RT but tolerates more mismatches (A:A and A:G mismatches are not tolerated by HIV-1 RT; Extended Data Fig. 4d). This reduced specificity may facilitate priming against diverse cellular sequences.
Requirements for ORF2p priming
ORF2p efficiently extends DNA primers as short as 5 nt on RNA or DNA templates, with slightly lower efficiency at 5 and 6 nt than at 7–20 nt (Fig. 3c and Extended Data Figs. 4b and 5b). This is consistent with the 4–6 bp annealing requirement seen in RNP preparation assays, in which the predominant template is assumed to be the poly(A) tail16, and with the five primer bases that contact ORF2p (Fig. 1d). These priming results led us to investigate whether L1 ORF2p might directly accept and extend short RNA hairpin substrates. ORF2p efficiently extended a previously published 29 nt RNA hairpin containing a 7 bp duplex (Fig. 3d) and a similar hairpin derived from the substrates tested above (Supplementary Fig. 5), even at the lowest dNTP concentration tested (0.1 µM), which was at least ten-fold lower than the physiologic dNTP concentration43. This activity was barely detectable with HIV-1 RT even at 100 µM dNTPs, a 1,000-fold higher concentration, indicating a difference in activity of at least three orders of magnitude; by contrast, the two enzyme preparations were similarly active in RT reactions (Fig. 3d and Extended Data Figs. 4d and 5b). As recent studies report cytosolic synthesis of Alu cDNA and indicate possible priming against the oligo(A) tail by the RNA polymerase III terminal U-tract26, we tested an Alu-derived sequence and found that this hairpin was also efficiently extended by ORF2p (Fig. 3e and Supplementary Fig. 5). In all cases, RNA synthesis was strongly selected against, although more activity was consistently seen at 1 mM NTPs; this concentration is likely to be supraphysiologic for all but ATP43. Together, these results demonstrate that ORF2p can synthesize cDNA primed only by short RNA sequences and hairpins at physiologic concentrations of dNTPs, providing a potential mechanistic basis for its cytosolic RT activity6,7,26.
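Whether an RNA can prime its own reverse transcription in this way depends on its 3′ end folding back into a stem that leaves a single-stranded 5′ region to copy. The scan below is a simplified, hypothetical check for that geometry. It finds the longest perfectly paired stem ending at the 3′ nucleotide, with a loop of at least `min_loop` nt, and ignores thermodynamics, bulges and internal loops; a folding tool would be needed for those. The example sequence is invented for illustration and is not one of the substrates tested here.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def three_prime_hairpin(seq, min_loop=3, allow_wobble=False):
    """Find the longest stem pairing the 3'-terminal nucleotides back onto an
    upstream segment, leaving an extendable, base-paired 3' end.

    Returns (stem_bp, template_nt): the stem length in base pairs and the number
    of single-stranded nucleotides 5' of the stem available as template, or
    (0, None) if the 3' end cannot pair. Exact-match scan only.
    """
    seq = seq.upper().replace("T", "U")
    pairs = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    n = len(seq)
    for stem in range((n - min_loop) // 2, 0, -1):
        # 5' arm seq[i:i+stem] pairs antiparallel with the 3' arm seq[n-stem:],
        # and the loop between them must be at least min_loop nt long.
        for i in range(0, n - 2 * stem - min_loop + 1):
            if all((seq[i + k], seq[n - 1 - k]) in pairs for k in range(stem)):
                return stem, i
    return 0, None


if __name__ == "__main__":
    # Invented: 8 nt template + 7 bp stem arm + 4 nt loop + complementary arm.
    example = "ACGUACGU" + "GCCAGUC" + "GAAA" + "GACUGGC"
    print(three_prime_hairpin(example))
```

The example yields a 7 bp stem with 8 nt of 5′ template. Comparing the stem length with the roughly 5 bp minimum suggested by the primer-length results above is one possible, assumption-laden way to shortlist candidate self-priming RNAs.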
ORF2p synthesizes cDNA in the cytosol
Various cytosolic single-stranded DNAs (ssDNAs), double-stranded nucleic acids and Alu cDNAs have been identified, along with L1 ORF1 protein, in senescent cells6,7, retinal cells26 and neural progenitors27. Although RT inhibitors often reduce or ablate these cDNAs, their origin has remained uncertain. We transfected HeLa and U2-OS cells with plasmids expressing L1 and found robust cytosolic RNA:DNA hybrids in transfected cells that colocalized with both L1 proteins, depended on RT activity and were unaffected by loss of EN activity. Their formation was inhibited by 50 µM d4T treatment (Fig. 3f and Extended Data Fig. 6a–c). Hybrids were seen using synthetic ORFeus-Hs L1 and native L1RP sequences and with two different detection reagents: S9.6, a well-established monoclonal antibody also known to bind dsRNA under some conditions; and purified catalytically inactive human RNase H1 (dRNH1), which has recently been reported to be more specific for hybrids in imaging experiments. Hybrids were also detectable in some cells as smaller puncta when ORF2p was expressed in the absence of ORF1 (Fig. 3f and Extended Data Fig. 6a–c). Because hybrid formation did not require EN activity, whereas EN-independent retrotransposition occurs at levels at least 100-fold lower than wild type44, these results rule out a nuclear origin for these cytosolic hybrids and demonstrate that L1 can directly synthesize RNA:DNA hybrids in the cytosol.
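Colocalization of two imaging channels is often quantified with Pearson's correlation coefficient over pixels in a region of interest. The analysis behind the colocalization described above is not specified here, so the sketch below shows only one common, assumed approach. The names are hypothetical, and background-subtracted intensities are supplied as flat lists.

```python
import math


def pearson_colocalization(channel_a, channel_b, mask=None):
    """Pearson's correlation between two channels' pixel intensities.

    channel_a, channel_b: equal-length flat sequences of background-subtracted
    intensities; mask: optional booleans selecting pixels (for example a
    cytoplasmic region of interest). Returns a value in [-1, 1].
    """
    if len(channel_a) != len(channel_b):
        raise ValueError("channels must have the same number of pixels")
    if mask is None:
        mask = [True] * len(channel_a)
    pairs = [(a, b) for a, b, keep in zip(channel_a, channel_b, mask) if keep]
    if len(pairs) < 2:
        raise ValueError("need at least two pixels")
    mean_a = sum(a for a, _ in pairs) / len(pairs)
    mean_b = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel is constant within the mask")
    return cov / math.sqrt(var_a * var_b)
```

Values near 1 indicate that the two signals rise and fall together pixel by pixel. Restricting the mask to the cytoplasm keeps nuclear signal from dominating the estimate.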
Synthesized cDNAs activate cGAS/STING
To investigate the consequences of cytosolic L1 RT activity, we used a secreted luciferase interferon reporter in THP1 cells, a leukaemia cell line with monocytic differentiation. Treating THP1 cells with 1 µM decitabine derepresses L1 expression by preventing DNA methylation during replication and results in interferon production28,45,46 (Fig. 3g). Knockout of TREX1 (three-prime repair exonuclease 1), a nuclease that is mutated in Aicardi–Goutières syndrome and systemic lupus erythematosus and that has been shown to degrade cytosolic L1 DNA4,27,29, increased both baseline and decitabine-induced interferon levels (Fig. 3g). Both baseline and decitabine-induced interferon levels were reduced by treatment with a cGAS inhibitor (10 µM G140) or an RT inhibitor (RTI; 50 µM d4T) (Fig. 3g and Extended Data Fig. 6d,e). As d4T potency was modest in this assay, we tested whether triphosphorylation of d4T was limiting inhibition by synthesizing a POC prodrug of d4T (POC d4T, d4T bis(isopropoxycarbonyloxymethyl)phosphate; Supplementary Fig. 6b). POC d4T was approximately 30-fold more potent than d4T in suppressing interferon secretion, which provides compelling evidence that d4T triphosphate is the active form that inhibits ORF2p (Fig. 3g and Extended Data Fig. 6e). Together, these results demonstrate that cytosolic cDNA synthesis by L1 results in interferon production through the cGAS/STING pathway.
In vitro inhibition of ORF2p
A critical path towards treating diseases associated with RT activity, such as HIV and HBV infections, is the use of RTIs40. Given the emerging role of L1 in disease, we sought to determine whether current RTIs had activity against ORF2p. Titrating nucleoside triphosphate (NTP) forms of nucleoside RTIs (NRTIs) into gel-based L1 RT assays showed that 3TC (lamivudine, Extended Data Fig. 7a) and carbovir (the active metabolite of abacavir) were modest ORF2p inhibitors (half-maximal inhibitory concentration (IC50) 5–7 µM), whereas d4T (stavudine) and entecavir were more potent (IC50 0.4–0.6 µM, Extended Data Fig. 7a). To enable robust high-throughput inhibition analysis, we developed homogeneous time-resolved fluorescence assays for ORF2p RT. NRTI NTPs all inhibited ORF2p to varying extents, with thymidine analogues dideoxythymidine (ddT) and d4T the most potent (IC50 < 10 nM), followed by AZT and 3TC as modest inhibitors under these conditions (IC50 200–750 nM)33 (Fig. 4a and Extended Data Fig. 7b,c). By contrast, none of the six tested allosteric HIV-1 non-nucleoside RTIs (NNRTIs) inhibited ORF2p; notably, even 1 mM nevirapine showed no inhibition (Fig. 4a, Extended Data Fig. 7c and Supplementary Fig. 6a,b). Using a stable dual luciferase retrotransposition reporter system in HeLa cells, we confirmed previously published modest inhibition of L1 by d4T, 3TC, FTC (emtricitabine), AZT, tenofovir and GBS-149 (IC50 1–5 µM)33 (Extended Data Fig. 7d). GBS-149 potency was not significantly different from that of related 3TC and FTC; the HCV inhibitor sofosbuvir did not inhibit L1 at up to 30 µM (Extended Data Fig. 7d). Differences between the in vitro and cell-based assays may be attributable to differential triphosphorylation of NRTIs.
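IC50 values such as these are usually obtained by fitting a four-parameter logistic curve to a titration. As a lighter-weight stand-in, the sketch below interpolates the 50% crossing linearly in log concentration. It assumes activity normalized to the uninhibited control and monotonic inhibition, and the function name is hypothetical. Fold differences in potency, such as the roughly 30-fold shift for POC d4T described above, are then ratios of IC50 values.

```python
import math


def ic50_log_interpolation(concentrations, activity_percent, threshold=50.0):
    """Estimate IC50 by linear interpolation in log10(concentration).

    concentrations: strictly increasing positive inhibitor concentrations.
    activity_percent: activity at each concentration normalized to the
    uninhibited control (100 %). Returns the first concentration at which
    activity falls to `threshold`, or None if it never does.
    """
    if len(concentrations) != len(activity_percent) or len(concentrations) < 2:
        raise ValueError("need >= 2 paired points")
    points = list(zip(concentrations, activity_percent))
    for (c0, y0), (c1, y1) in zip(points, points[1:]):
        if c0 <= 0 or c1 <= c0:
            raise ValueError("concentrations must be positive and increasing")
        if y0 == threshold:
            return c0
        if y0 > threshold >= y1:
            frac = (y0 - threshold) / (y0 - y1)
            log_c = math.log10(c0) + frac * (math.log10(c1) - math.log10(c0))
            return 10 ** log_c
    if activity_percent[-1] == threshold:
        return concentrations[-1]
    return None
```

For a simulated titration following a simple one-site model with an IC50 of 0.5 (in whatever units the concentrations use), sampled at half-log steps, this interpolation recovers the IC50 to within a few per cent; a logistic fit is preferable when data are noisy or do not bracket 50%.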
Structural basis of inhibition of ORF2p
Potency against ORF2p varied almost 200-fold between the NRTIs tested, and AZT and 3TC were not potent inhibitors (Fig. 4a). In HIV-1, resistance to 3TC can arise from M184 mutations in RT (YMDD to YVDD/YIDD), which cause a steric clash with the oxathiolane ring47. HIV-1 M184A mutants (YADD, analogous to FADD in ORF2p) have been studied with respect to 3TC potency, demonstrating that van der Waals interactions between M184 and the 3TC oxathiolane ring are stabilizing; these interactions are not present with the smaller A701 (FADD) in ORF2p, and this difference may explain the relatively lower potency of 3TC against L1 ORF2p RT. Modelling the related 3TT-TP analogue into the active site of L1, using the cocrystal structure with dTTP, showed that M701 in an A701M model lies close to the oxathiolane ring, whereas A701 in wild-type L1 is further away. Further supporting this mode of inhibition, 3TC was approximately 15-fold more potent in inhibiting the A701M mutant of full-length ORF2p (FMDD) than wild type (FADD; Fig. 4b and Extended Data Fig. 7e). On the basis of these results, HIV-1 inhibition data40 and analyses of HERV-K34, we conclude that 3TC and the related FTC and GBS-149 are unlikely to be selective for L1 ORF2p.
To understand the structural basis underlying differences between AZT and more potent thymidine analogues, we modelled the triphosphates of thymidine-based NRTIs into the ORF2p ternary crystal structure containing dTTP in the N-site. As expected, ddTTP and d4T-TP did not show any clashes with the protein, as they closely resemble the shape of dTTP. However, the AZT-TP model showed a clash of the middle nitrogen of the 3′-azido group with the backbone amide hydrogen of F605 (distance 2.03 Å, Fig. 4c), which was not relieved by energy minimization. This clash was not observed in the crystal structure of AZT-TP bound to HIV-1 RT (corresponding distance 2.28 Å, Fig. 4c). The persistence of the clash in ORF2p may be explained by a difference in conformational flexibility of the region around the 3′-azido group (residues 602–607 in ORF2p and 112–117 in HIV-1 RT). In ORF2p, this segment contains two internal salt bridges that are absent from HIV-1 RT, and its average backbone B factors, relative to the complete dNTP site, are lower than in HIV-1 RT (the site being defined as all residues within 6 Å of dTTP; region versus site in ORF2p, 43.4 versus 48.1; HIV-1 RT, 114.3 versus 110.7). Free energy perturbation simulations of the relative binding of these nucleotides to ORF2p showed an insignificant difference in relative binding free energy (ΔΔG) between ddTTP and d4T, but a large positive difference between these and AZT (Supplementary Fig. 6c), consistent with the greater than 20-fold loss of ORF2p inhibitory activity of AZT compared with ddTTP and d4T (Fig. 4a).
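Free energy differences and fold changes in affinity are related by ΔΔG = RT ln(K_ratio). Suppose, as a simplifying assumption, that the IC50 ratio between competitive NRTI triphosphates measured under identical conditions tracks the ratio of their binding constants. Under that assumption, a greater than 20-fold loss of potency corresponds to roughly 1.8 kcal/mol or more at 25 °C. The small helper below does this conversion; the names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def ddg_from_fold(fold_change, temperature_k=298.15):
    """ddG (kcal/mol) equivalent to a fold change in an equilibrium constant.

    Positive values mean weaker binding when fold_change is the ratio
    K_d(weaker ligand) / K_d(stronger ligand).
    """
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temperature_k * math.log(fold_change)


def fold_from_ddg(ddg_kcal, temperature_k=298.15):
    """Fold change in an equilibrium constant implied by ddG (kcal/mol)."""
    return math.exp(ddg_kcal / (R_KCAL * temperature_k))
```

`ddg_from_fold(20)` evaluates to about 1.77 kcal/mol, which gives a rough scale against which computed relative binding free energies can be compared.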
As inhibition of telomerase RT (TERT) would be a potential source of toxicity in a therapy, we investigated the relative selectivity of NRTI triphosphates for L1 versus TERT, testing the panel in a biochemical TERT assay. The tested compounds were generally around 1,000-fold less potent inhibitors of TERT than of L1 RT, with IC50 values in the mid-micromolar range (for example, the IC50 of d4T-TP was 9 nM against ORF2p and 15 µM against TERT; Supplementary Fig. 7a); this result was in line with expectations, because these drugs are all tolerated therapeutically in patients. The structures of the active sites of the two enzymes explain these stark differences, with a more hydrophobic environment in the ORF2p active site (Supplementary Fig. 7b,c). NRTIs designed for HCV RdRp are also unlikely to inhibit L1, because drugs of this class, such as sofosbuvir, contain 2′ modifications mimicking the 2′-OH of an incoming ribonucleoside triphosphate. Modelling of sofosbuvir into the ORF2p active site supported this prediction, revealing a clash between the sofosbuvir 2′-F and the gatekeeping residue F605, and cell-based L1 assays confirmed it, showing no inhibition by sofosbuvir (Extended Data Fig. 7d and Supplementary Fig. 7d). Together, these results demonstrate that the ORF2p crystal structure provides a useful starting point for structure-based design of new ORF2p-specific NRTIs.
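Fold selectivity is the ratio of off-target to on-target IC50 after both values are converted to the same units. For the d4T-TP example above, 15 µM against TERT and 9 nM against ORF2p give a ratio of about 1,700. The helper below is a trivial, hypothetical sketch of that arithmetic.

```python
# Molar conversion factors; "uM" stands for micromolar.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def selectivity_ratio(ic50_offtarget, offtarget_unit, ic50_target, target_unit):
    """Fold selectivity = IC50(off-target) / IC50(target), after unit conversion."""
    off = ic50_offtarget * UNIT_TO_MOLAR[offtarget_unit]
    on = ic50_target * UNIT_TO_MOLAR[target_unit]
    if off <= 0 or on <= 0:
        raise ValueError("IC50 values must be positive")
    return off / on
```

For example, `selectivity_ratio(15, "uM", 9, "nM")` returns about 1,667.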
NRTIs act at the RT active site and are known to inhibit ORF2p with varying potency, whereas HIV-1 NNRTIs33 bind to an induced allosteric site in the palm between the primer grip, the β-sheet containing the YMDD loop and the 94–102 segment40; this pocket is absent from HBV, HIV-2 and HERV-K34. HIV-1 NNRTIs do not inhibit ORF2p (Fig. 4a and Extended Data Fig. 7c,d), and structural and sequence differences between the HIV-1 NNRTI pocket and the equivalent region in ORF2p explain this lack of inhibition (Fig. 4d). As HIV-1 RT undergoes a conformational change when NNRTIs bind, the HIV-1 RT structure in the absence of NNRTI was compared with the ORF2p crystal structure. The most striking difference was replacement of the 94–102 segment of HIV-1 RT with a longer α-helix formed by residues 572–588 in ORF2p, making none of these positions structurally equivalent. In addition, residues Y181 and Y188, which have been implicated in aromatic ring stacking with nevirapine and other NNRTIs40, were replaced with S698 and I705, respectively, and the small residue G190 in HIV-1 RT was replaced with bulky Y707 in ORF2p. These differences, taken together, explain why ORF2p does not form a pocket that binds HIV-1 NNRTIs.
Structure of full-length ORF2p
Purified full-length ORF2p was as active as the ORF2p core in single-nucleotide-resolution RT assays and was similarly inhibited by 3TC (Fig. 4e, Extended Data Fig. 7f and Supplementary Fig. 8a–c), indicating that EN and CTD may not directly modulate RT activity. Monodisperse full-length ORF2p, bound to the same short RNA17–DNA14 hybrid used above for cryo-EM of the ORF2p core, was analysed by negative stain transmission electron microscopy and found to be monomeric and probably flexible, with two-dimensional classes indicating multiple conformations (Fig. 4f, raw contour, and Supplementary Figs. 9–10). To elucidate the conformational landscape of ORF2p, we used cryo-EM maps, cross-linking mass spectrometry, AlphaFold2 and molecular dynamics simulations to generate an ensemble of conformational states using the Integrative Modeling Platform (Supplementary Figs. 8d,e, 9 and 10 and Supplementary Tables 1 and 2). Informed by AlphaFold2 and molecular dynamics simulations, we first segmented the EN, tower and CTD into 15 rigid bodies connected by 14 flexible linkers and computed an ensemble of integrative models satisfying the input data (Fig. 4g; conformational heterogeneity and model uncertainty are represented as localization densities). The ensemble was then validated by matching computed two-dimensional model projections to negative stain two-dimensional class averages: each class average was assigned a best-matching model, and each matched model fit the data better than the parental AlphaFold model (Fig. 4f and Supplementary Fig. 10). Structural clustering of these best-matching models indicated two distinct groups (Fig. 4g and Supplementary Fig. 10), which we named the ORF2p open and closed-ring states, characterized by distinct positions of the EN and tower. Closure of the ring entailed an approximately 48 Å movement of the tower domain (measured from the top of the tower), hinging at the baseplate and bringing it adjacent to the CTD.
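The validation step above assigns each negative stain class average its best-matching model projection. The sketch below shows one common way to score such matches: normalized cross-correlation (NCC) between two images. It is not the authors' pipeline. It assumes the class averages and projections are already flattened to equal-length pixel lists and aligned in-plane. Practical workflows also search over in-plane rotations and shifts, which is omitted here.

```python
import math
from typing import List, Sequence, Tuple


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized cross-correlation of two equally sized, flattened images."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # Constant images carry no structure to correlate, so score them 0.
    return sum(x * y for x, y in zip(da, db)) / denom if denom > 0 else 0.0


def assign_best_models(
    class_averages: Sequence[Sequence[float]],
    projections: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """For each class average, return (index of best-matching projection, score)."""
    assignments = []
    for image in class_averages:
        scores = [ncc(image, proj) for proj in projections]
        best = max(range(len(scores)), key=scores.__getitem__)
        assignments.append((best, scores[best]))
    return assignments


# Toy 2x2 "images" flattened to length-4 lists.
classes = [[0, 1, 0, 1], [1, 1, 0, 0]]
projs = [[1, 1, 0, 0], [0, 2, 0, 2]]
print(assign_best_models(classes, projs))  # [(1, 1.0), (0, 1.0)]
```

NCC is insensitive to overall brightness and contrast, which is useful because stain intensity differs between class averages and simulated projections.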
To test potential roles of these states, we repeated the negative stain EM with ORF2p bound instead to a 376-nt RNA derived from the 3′ end of L1RP with a 14-nt poly(A) tail. Many classes overlapped between the two samples, but the L1 RNA-bound sample showed a significant increase in closed-ring states and a reduction in open states (Supplementary Fig. 10b–d). We interpret these differences to mean that the closed state may represent a predominant conformation when ORF2p is bound to messenger RNA, whereas the open state may be involved in retrotransposition.
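The text does not state which statistical test underlies the shift toward closed-ring states. One standard option for comparing the fraction of particles in closed-ring classes between two samples is a pooled two-proportion z-test, sketched below. The counts are HYPOTHETICAL placeholders, not data from the study. The test also assumes that particle class assignments are independent observations.

```python
import math


def two_proportion_z_test(k1: int, n1: int, k2: int, n2: int):
    """Two-sided pooled z-test for a difference between proportions k1/n1 and k2/n2."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# HYPOTHETICAL counts of particles assigned to closed-ring classes (not study data):
# 200 of 1,000 for the hybrid-bound sample, 280 of 1,000 for the L1 RNA-bound sample.
z, p = two_proportion_z_test(200, 1000, 280, 1000)
print(f"z = {z:.2f}, two-sided p = {p:.1e}")
```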
Domain comparison of ORF2p and other RTs
To better understand specific adaptations of ORF2p, we compared it with diverse structurally characterized RTs: the R2 LINE element from the silk moth Bombyx mori (R2Bm)22, the distantly related mobile group IIC intron RT from Geobacillus stearothermophilus (GsI-IIC)30, the RT from LTR element HERV-K34 and HIV-1 RT (Extended Data Fig. 8a). The structure of the group IIC intron was chosen over the evolutionarily closer group IIB intron31 because it represents the same active form with substrate in the active site and is at higher resolution, although members of the IIB family were included in the wider evolutionary analysis (see below). ORF2p is larger than the other enzymes, with limited similarity outside the conserved right-hand fingers–palm–thumb subdomains in RTs. Structural alignment of all five enzymes by palm superposition highlighted conserved RT sequence blocks and showed that ORF2p had insertions in fingers (motifs 0 and 2a) and palm (motifs 3a and 6a) and permutation of the thumb helices compared with both HIV-1 and HERV-K.
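Palm superposition, as used above, reduces to finding the rigid rotation that best overlays matched atoms from two structures. A standard way to compute it is the Kabsch algorithm, sketched below with NumPy. This is an illustration, not the authors' alignment software. It assumes the palm Cα atoms have already been paired by a prior sequence or structure alignment. The function name `kabsch_superpose` is hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Return (rotation, rmsd) that best superposes `mobile` onto `target`.

    Both inputs are (N, 3) arrays of matched atoms, e.g. palm C-alpha atoms
    paired beforehand by a sequence or structure alignment.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("need two (N, 3) coordinate arrays of equal length")
    # Remove translation by centring both point sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Cross-covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate each centred mobile point and measure the residual deviation.
    diff = Pc @ R.T - Qc
    rmsd = float(np.sqrt((diff ** 2).sum(axis=1).mean()))
    return R, rmsd
```

RMSD after superposition works well for a single rigid subdomain such as the palm. It is less suited to whole multidomain proteins, where differently oriented domains inflate the score.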
Viral and LTR transposon RTs, represented by HIV-1 and HERV-K, are distinct from the non-LTR RTs in that they encode their own RNase H, located C-terminally, and GsI-IIC has a DNA-binding D domain in this position (Extended Data Fig. 8c,d and Supplementary Fig. 11). Other than GsI-IIC D, these CTDs all stabilize the polymerase complex by coordinating downstream nucleic acids but do so in distinct ways. The ORF2p wrist binds the template close to the active site; the connection and RNase H domains of viral/LTR elements bind distally; and, although the linker of R2Bm makes limited and distinct nucleic acid contacts, most of its function seems to be coordination of the activity of the C-terminal RLE domain22,48. In R2Bm, RLE cuts ssDNA, which in the context of initiation is melted from the dsDNA target by the adjacent C-terminal CCHC zinc finger (ZnF)22,24,48. The ORF2p CTD is required for retrotransposition37,38 and has a similarly positioned CCHC motif (Extended Data Fig. 9 and Supplementary Fig. 11) that may also melt target DNA and/or bind single-stranded nucleic acid49, but its function remains unclear.
In comparison with R2Bm, the ORF2p domain topology is reversed: the ORF2p apurinic/apyrimidinic endonuclease (APE)-like EN is located N-terminally and cuts dsDNA rather than ssDNA12,13,22,50. Structurally, ORF2p EN sits on the opposite wall of the polymerase groove to R2Bm RLE, atop fingers rather than thumb (Extended Data Fig. 9 and Supplementary Fig. 11). This seems to position the target DNA in reverse orientation to the active site for the two enzymes, although other orientations are possible (Extended Data Fig. 9). The tower of ORF2p seems to play a part in dynamic positioning of the EN. A smaller domain that we term ‘tower-like’ is present in R2 (residues 305–374); this region was previously annotated as NTE-1 and contains the tower lock as well as helices analogous to ORF2p PIP that anchor the tower lock to fingers and palm. However, the PIP box, tower and tower baseplate are not present in R2. R2Bm also has two N-terminal domains, Myb and N-ZnF, that recognize specific ribosomal DNA sequences unique to the element, reflecting the extremely high sequence specificity of R2 for a single site in the ribosomal DNA.
Structural adaptations of ORF2p RT
There are numerous contrasting features of the N-terminal regions of the four RT families (Extended Data Fig. 8b). Viral and LTR RTs have an α-helix posterior to the fingertips, which is absent from the group II intron RT but occupied by the tower-like helix of R2Bm and the PIP helix in ORF2p. The fingertips of all four representative RTs are similar in that they provide a hydrophobic surface for sliding the template bases (notably I515, I517 and I533 in ORF2p), but ORF2p and R2Bm both have a distinctive insertion in the fingertips loop. The upstream template path differs significantly among the four enzymes: in viral and LTR RTs, the 5′ template is pushed away from the fingertips by π-stacking with a characteristic tryptophan (W38 HERV-K, W24 HIV-1), whereas the non-LTR transposons and group II intron have a groove formed by the conserved R0 region with a loop that forms a lid for the template. Here, ORF2p is also distinct: the fingertips for group II intron and R2Bm have an arginine (R63 and R446, respectively) that forms a salt bridge with the n−2 phosphate, pushing the n−3 base away from the posterior side of the fingertips, whereas the analogous residue in ORF2p (T638) is significantly smaller and allows the n−3 base to fold into a hydrophobic pocket created by a loop from the palm anchored by I642. The result is an apparently different entry path of the template RNA. The R0 region also differs significantly between ORF2p and the group II intron and R2Bm: the R0 loop in ORF2p is the longest of the three and makes no primer contacts; by contrast, the group II intron and R2Bm both contact the n+6 primer backbone.
In these RT families, the proximal primer is anchored by a conserved primer grip in the palm, which contains a characteristic hydrophobic motif helix clamp (Extended Data Fig. 8c). C-terminal to the primer grip is the thumb domain, a parallel three-helix bundle that occupies the minor groove of the template–primer heteroduplex and makes extensive primer contacts. The thumb in LTR RTs is permuted relative to the other families: the second helix of ORF2p, R2Bm, and the group II intron is functionally analogous to the first α-helix in viral and LTR RTs and contains the helix clamp subdomain at its base30 (Extended Data Fig. 8c). The helix clamp proline in non-LTR RTs (P819 in ORF2p) assumes a similar function to the glycine in LTR RTs and the group II intron, allowing proximity to the minor groove, and the subsequent aromatic residue (Y823 in ORF2p) forms π-interactions with the primer n+2 or n+3 nucleotide backbone. The wrist of ORF2p makes more extensive contacts with the downstream template than either the group II intron D domain or the R2Bm linker.
Structural insight into L1 evolution
L1 dates to at least the Precambrian era51; on the basis of limited sequence similarity, it is thought to share a common ancestor with bacterial mobile group II introns51 and has no clear evolutionary ancestor among extant viruses. We therefore sought to use protein structure to shed light on the conserved features and evolutionary origin of ORF2p that cannot be identified by sequence alignment alone. We used multiple sequence/structural alignments and AlphaFold2 predictions to examine conservation of the human ORF2p structure relative to 57 other L1 ORF2p sequences from vertebrates and plants. By computing and plotting the residue-level diversity of the aligned ORF2ps as the Shannon entropy (Fig. 5a and Supplementary Methods), we found high concordance between the two multiple alignment strategies (sequence versus structural) in the RT domain (fingers–palm–thumb, Supplementary Fig. 12a). Despite relatively lower sequence conservation in regions of the tower, wrist and CTD domains, the structure was conserved, indicating that domain topology may be more important than the sequence of these domains for L1 function. Leveraging data from a published trialanine mutagenesis library of 417 consecutive AAA ORF2p mutants, in which residual function of mutants was compared with that of the wild type (100%)38, we found that structural entropy was significantly correlated with residues dispensable for retrotransposition activity (Fig. 5b,c and Supplementary Fig. 12a). As most mutations resulted in reduced function, these results together indicate that optimization of retrotransposition is a main evolutionary driving force.
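Residue-level diversity in an alignment is commonly summarized as the Shannon entropy of each column, H = −Σ p_a log2 p_a, where p_a is the frequency of amino acid a in that column. The sketch below computes this for equal-length aligned sequences. Two choices here are assumptions and may differ from the Supplementary Methods: gap characters (`-`) are ignored, and entropy is reported in bits.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(column: Sequence[str], gap: str = "-") -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return float("nan")  # fully gapped column: entropy undefined under this convention
    n = len(residues)
    counts = Counter(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def alignment_entropy(sequences: Sequence[str], gap: str = "-") -> List[float]:
    """Per-position entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(col, gap) for col in zip(*sequences)]


# Toy alignment: column 1 is invariant, column 2 is variable, column 3 has a gap.
toy = ["MKLV", "MRLV", "MK-I", "MELV"]
print(alignment_entropy(toy))
```

Low-entropy columns are conserved. High-entropy columns tolerate substitution, which is why entropy can be compared against the residual function of alanine mutants.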
We next compared ORF2p and other proteins with the intention of identifying shared structural features and inferring evolutionary relationships. First, we manually curated a set of 50 experimental protein structures that represented the main families: RTs, RdRps, DdDps (DNA-dependent DNA polymerases) and DdDps/RdRps, as well as ‘negative controls’ that should have little resemblance to the other proteins (Supplementary Table 3). We then sought to represent structural similarity in a manner that would faithfully account for differences in protein length, account for inherent alignment quantity/quality trade-offs, and address a limitation of other methods, such as RMSD, in which different relative orientations of otherwise identical domains result in poor scores. We developed a new information-theoretic algorithm, named ‘Plexy’, which represents a high-quality alignment as one that reduces the structural perplexity between the coordinates of the two aligned structures (Supplementary Methods). The smaller this value, the more likely it is that one can ‘guess’ the coordinates of one structure knowing the coordinates of the other. Plotting structural perplexity from ORF2p RT for this set (Fig. 5d and Supplementary Figs. 12b,c and 13) showed that Plexy recapitulates close relationships between ORF2p, R2Bm and group II introns, and that ‘negative control’ proteins have extremely high perplexities from ORF2p. To better understand relationships between full-length ORF2p and other proteins, we computed the pairwise structural distances across all pairs of proteins and normalized them with respect to the size of the two proteins and their alignment, anchoring the plot on the ORF2p crystal structure (Supplementary Methods, Fig. 5e). Across both datasets, proteins in the same functional class typically clustered together in an unsupervised manner, with R2Bm and group II introns again closest to ORF2p.
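To make the idea of "structural perplexity" concrete, the toy sketch below illustrates the general concept. It is NOT the Plexy algorithm, which is described in the Supplementary Methods and implemented in the linked repository. The toy model treats the residual displacements between already-paired, already-superposed residues as zero-mean isotropic Gaussian noise. It then reports perplexity as exp(differential entropy) = (2πeσ²)^(3/2). Smaller values mean one structure's coordinates are easier to "guess" from the other's. The function name and the `sigma_floor` of 0.1 Å (which avoids log(0) for identical inputs) are hypothetical choices.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def displacement_perplexity(
    coords_a: Sequence[Coord], coords_b: Sequence[Coord], sigma_floor: float = 0.1
) -> float:
    """Toy perplexity of guessing coords_b from coords_a (not the Plexy method).

    Inputs are equal-length lists of (x, y, z) tuples for residues that are
    already paired and superposed. Displacements are modelled as zero-mean
    isotropic Gaussian noise whose variance is the maximum-likelihood estimate.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    sq = sum((a - b) ** 2 for pa, pb in zip(coords_a, coords_b) for a, b in zip(pa, pb))
    sigma2 = max(sq / (3 * len(coords_a)), sigma_floor ** 2)
    # Differential entropy (nats) of a 3D isotropic Gaussian with variance sigma2.
    entropy_nats = 1.5 * math.log(2 * math.pi * math.e * sigma2)
    return math.exp(entropy_nats)
```

Unlike the full method, this toy ignores protein length and alignment coverage, both of which the text notes must be accounted for.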
Group IIB introns are thought to be evolutionarily closer to L1 than group IIC introns, but intriguingly both have similar perplexities from ORF2p with subtle differences in subdomains, highlighting structural conservation (Supplementary Fig. 13). Domesticated cellular RTs were next closest to ORF2p RT, but normalized distances between full-length ORF2p and Prp8 and TERT were larger owing to the incorporation of unrelated structural elements (Supplementary Fig. 12b). Viral RdRps such as those of HCV and influenza B have remarkable similarity to ORF2p RT30; non-LTR and viral RTs are more distant. Notably, the inactive p51 HIV-1/2 RT subunit was predicted to be far more distant from ORF2p than the active p66 HIV-1/2 RT, despite an identical amino acid sequence (p51 lacks the C-terminal RNase H domain). Therefore, this analytical framework quantifies conformational similarity in a manner that is sensitive to function.
Discussion
Our integrated analyses reveal the inner workings of the molecular machine that has written around one-third of the human genome. Understanding L1 structure and function matters both for evolution and, increasingly, for human disease. Accumulating evidence links L1 activity and the host response to common pathologies including cancer, ageing, neurodegeneration and autoimmunity2–7,26,27. Our biochemical, structural and evolutionary analyses show that ORF2p contains a highly active polymerase that is uniquely adapted for its parasitic replication cycle, with both conserved and new structural features that preserve optimal retrotransposition throughout evolution. Together, these data provide insights into two key underlying mechanisms through which L1 may cause disease: (1) nuclear insertional mutagenesis and resultant genomic havoc, and (2) cytosolic sensing of the products of ORF2p reverse transcription.
Although nuclear L1 activity has been correlated with DNA damage and structural genomic rearrangements2,41,42,52, a mechanistic understanding of L1 insertion has been elusive. The insertion process can be understood as two half reactions: first and second strand synthesis. Second strand synthesis has been challenging to study, and it was unclear whether it is performed by L1 or the host. Our data demonstrate that ORF2p is competent to perform all enzymatic steps required to prime and execute both first and second strand syntheses: it effectively synthesizes DNA with short RNA or DNA primers on both RNA and DNA templates (Fig. 3, Extended Data Figs. 4 and 5 and Supplementary Figs. 3–5). Interpreting our results in the context of high-quality biochemical data from decades of studying the R2 LINEs in insects21,24,36,48 provides us with the opportunity to update the L1 insertion model (Fig. 6). The mechanism describes a canonical insertion that is intentionally simplified and omits numerous supportive and repressive host proteins, including topoisomerase TOP1, PARP1, purine-rich element binding proteins, the Fanconi pathway (including BRCA1) and p53 (refs. 8,17–19). Furthermore, alternative pathways such as host-catalysed second strand synthesis may occur in different contexts or following ORF2p failure, and the host may combat insertion by, for example, cleaving intermediates.
Our data also shed light on other areas of the canonical L1 replication cycle. ORF2p cis RNA binding is thought to occur at the ribosome53,54. Newly translated apo ORF2p is unstable until RNA is bound, and it assumes a ‘thumb up’ conformation competent to tightly bind RNA; we speculate that initial RNA binding occurs cotranslationally, potentially before the CTD has even been translated. PCNA binding, which is required for retrotransposition17 and recruits RNase H2 to allow second strand cleavage29, does not seem to be occluded in any identified state; this, together with EN and RT dependence17,18, indicates that PCNA may be recruited to ORF2p by the developing genomic lesion. Most new LINE insertions are heavily 5′ truncated1; often they comprise only a few hundred base pairs, but the reasons are not well understood. ORF2p is efficient and highly processive, consistent with previous observations16,32, adding support to the idea that host cleavage of the L1 RNA or intermediates is more likely to cause 5′ truncation than inefficiency of the polymerase55. Nuclear ORF1p levels are limited17,18, and bound ORF1p chaperones would be displaced from L1 RNA during RT, potentially leaving the large single-stranded cDNA loop intermediate unprotected (steps 3–7, Fig. 6). This could represent both a unique vulnerability and a potential nidus for translocations41,42,52, given its homology to much of the genome.
Cytosolic double-stranded nucleic acids, viral mimicry and resultant interferon signalling are known to contribute to pathology in several contexts, and NRTIs have been shown to limit the production of interferon and of these nucleic acids6,7, but their origin has remained controversial. First, our data show that ORF2p can use RNA primers and short RNA hairpins to initiate RT reactions; an Alu-like sequence is readily extended, and uridylation of the L1 RNA56 might convert it into a similar substrate as well. RNA priming of ORF2p RT in the cytoplasm can parsimoniously explain the origin of these nucleic acids. We also show that DNA primers as short as 5 nt can prime L1; it is possible that shorter primers are also tolerated16. Second, we demonstrate that L1 can directly synthesize RNA:DNA hybrids in the cytosol; these are RT-dependent but EN-independent, ruling out a nuclear origin in this system. Third, we show that L1-synthesized cDNAs activate cGAS/STING, resulting in interferon production. Our observations further support a potentially critical role of L1 and its RT products in viral mimicry57,58, as inferred from genome and cancer evolution59,60. Moreover, our robust inhibitor data provide a framework for evaluating the involvement of L1 in these phenotypes and for targeting this in the future. In summary, our structural elucidation of ORF2p will facilitate rational design of new therapeutics and lays the groundwork for future studies needed to dissect and improve our understanding of the insertion mechanism of L1, its evolution and its roles in disease.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41586-023-06947-z.
Acknowledgements
We thank P. Cole and D. Sabatini for resources for protein expression and purification and for helpful discussions; J. Boeke, S. Whedon, D. T. Ting, L. Dai and R. Trachman for helpful discussions; C. Feschotte and J. Wells for sharing L1 sequences from numerous organisms and helpful discussions; N. Rusk for editorial assistance; D. Kocincova and E. Woolner for excellent technical assistance in the expression of the full-length L1 ORF2p RT used in gel-based assays; Y. Zhang and X. Du of Pharmaron for running the L1 homogeneous time-resolved fluorescence, telomerase and cell-based retrotransposition assays and the THP1–TREX knockout assays; J. Zhang and M. Hagel for developing the HeLa retrotransposition assays; J. Baker-Lepain for managing the Wynton computer cluster at QBI@UCSF; W.-C. Cheng, J. Heaps and J. Kalinowski for excellent technical assistance in assessing L1 expression in cells; B. Smail for purified L1-derived RNA; and the DFCI Molecular Imaging Core for microscopy assistance. Cryo-EM data were collected at the Rockefeller University Evelyn Gruss Lipper Cryo-electron Microscopy Resource Center (RRID:SCR_021146), where we thank M. Ebrahim, J. Sotiris and H. Ng; the cryo-EM facility at UMass Chan Medical School, where we thank C. Xu, K. Song and C. Ouch; and the National Center for CryoEM Access and Training (NCCAT) and the Simons Electron Microscopy Center located at the New York Structural Biology Center, where we thank E. T. Eng and H. Kuang. NCCAT is supported by the National Institutes of Health (NIH) Common Fund Transformative High Resolution Cryo-Electron Microscopy program (U24 GM129539) and by grants from the Simons Foundation (SF349247) and New York State.
This work was supported in part by NIH grants K08DK129824 and T32CA009216 (M.S.T.), R01GM130680 (K.H.B.), P41GM109824 (M.P.R., A.S.), NIGMS R01GM083960 and NCI P0557533 (A.S.), R01AI027690 (F.X.R., E.A.), R01GM126170 and R01AG078925 (J.L.), R01AI081848, R01CA240924 and U01CA228963 (B.D.G.), NIH/NCI Cancer Center Support Grant P30CA008748 (D.H., B.D.G.), an ASPIRE award from the Mark Foundation (D.H., B.D.G.), Friends of Dana-Farber Cancer Institute (K.H.B.), an Anderson Center for Cancer Research Fellowship at The Rockefeller University (T.v.E.) and Worldwide Cancer Research grant 19-0223 (J.L.).
Author contributions
Authors E.T.B., T.v.E., D.H., A.Z. and M.S.T. contributed equally; and authors E.P.T., R.S., B.D.M. and L.H.D. contributed equally. M.S.T., J.L., E.T.B. and D.L.R. conceptualized the study. M.S.T., E.T.B., T.v.E., D.H., A.Z., K.H.B., M.G., M.P.R., E.A., B.D.G., D.L.R. and J.L. formulated the research plan and interpreted experimental results with assistance from E.P.T., R.S., B.D.M., L.H.D., F.X.R., M.H., K.B.R., S.M.C., R.K., D.M.Z., A.S. and O.W. T.W., A.M.S., O.W. and M.S.T. developed the method to express and purify the ORF2p core with assistance from P.W., R.H.-K., D.L.R., C.N. and E.T.B. M.S.T., T.W., C.N., E.P.T., P.W., K.R., R.H.-K., M.A., A.L., A.J., K.X., S.C., M.H., K.B.R. and A.M.S. expressed and purified ORF2p, designed and prepared constructs, and carried out preliminary structural experiments with supervision by O.W. and M.G. K.R. performed and analysed the results of the colorimetric ORF2p RT assay. T.v.E. and M.S.T. designed and performed differential scanning fluorometry assays with assistance from E.T.B. E.P.T., M.S.T., T.v.E., O.W. and M.G. designed and analysed the results of the single-nucleotide polymerase assays, which were performed by E.P.T. C.N. crystallized and solved the structure of ORF2p with assistance from P.W., E.T.B., E.A. and D.L.R. T.v.E. performed cryo-EM experiments and analysis with assistance from M.S.T., F.X.R. and E.A. L.H.D. performed and analysed the results of cross-linking mass spectrometry experiments with assistance from M.S.T., T.v.E., A.Z. and J.L. E.I., M.S.T., B.D.M. and C.M.D. performed imaging experiments. N.H., R.K., D.M.Z., D.L.R. and W.M. designed the THP1 assay, and B.D. and N.H. conducted some of the THP1 assays. W.M. designed the homogeneous time-resolved fluorescence assays with assistance from D.L.R. T.v.E. and P.U. performed and analysed the results of negative stain EM experiments. A.Z. performed integrative modelling with assistance from M.H., T.v.E., A.S., M.P.R., J.L. and M.S.T. R.S. designed and performed inhibitor molecular modelling with input from D.L.R. and M.S.T. D.H. performed the evolutionary analysis with assistance from F.X.R., T.v.E., A.Z., E.A., M.S.T. and B.D.G. D.H. and B.D.G. developed the Plexy algorithm. G.S.B., O.L.S. and D.L.R. acquired and oversaw synthesis and testing of inhibitors. S.M.C., T.v.E., M.S.T., O.W. and J.E.F. developed the insertion model with assistance from K.H.B., J.L. and M.P.R. J.E.F. illustrated the manuscript along with M.S.T. M.S.T., T.v.E. and B.D.G. wrote the manuscript with assistance from R.S., D.H. and A.Z. All authors reviewed and edited the manuscript.
Peer review
Peer review information
Nature thanks the anonymous reviewers for their contribution to the peer review of this work.
Data availability
The coordinates for the ORF2p crystal structure have been deposited in the Protein Data Bank (PDB ID: 8C8J). Single-particle cryo-EM maps for the ORF2p core have been deposited in the Electron Microscopy Data Bank and their associated model coordinates in the Protein Data Bank under accession codes EMD-40858, PDB ID: 8SXT (heteroduplex); EMD-40859, PDB ID: 8SXU (oligo(A)); EMD-40856 (apo). Raw videos and motion-corrected micrographs for apo ORF2p have been deposited in the Electron Microscopy Public Image Archive under accession code EMPIAR-11556. The mass spectrometry proteomics data have been deposited at the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) through the PRIDE partner repository with dataset identifier PXD038615. Files containing the input data, scripts and results of integrative modelling are available at https://github.com/integrativemodeling/ORF2p and the nascent integrative modelling section of the worldwide Protein Data Bank (wwPDB) PDB-Dev repository for integrative structures and corresponding data under accession code PDBDEV_00000211. AlphaFold2 predictions, molecular dynamics simulation results and full-atom versions of best-matching models are available from the ModelArchive repository (https://www.modelarchive.org/doi/10.5452/ma-fejd6, https://www.modelarchive.org/doi/10.5452/ma-joo4d, https://www.modelarchive.org/doi/10.5452/ma-lzyrq, https://www.modelarchive.org/doi/10.5452/ma-xlzzy, https://www.modelarchive.org/doi/10.5452/ma-9wovj). New plasmids have been deposited at Addgene.
Code availability
Software for the evolutionary analysis is available at https://github.com/dfhoyosg/Plexy.
Competing interests
M.S.T., B.D.G., M.G., K.H.B. and E.A. hold equity in and have received consulting fees from ROME Therapeutics. J.L. holds equity in ROME Therapeutics. D.H. and E.T. have received consulting fees from ROME Therapeutics. Research conducted at Proteros Biostructures and Charles River Laboratory was contracted by ROME Therapeutics. Research for this project in the Götte laboratory was sponsored by ROME Therapeutics. M.S.T. has received consulting fees from Tessera Therapeutics. K.H.B. declares relationships with Alamar Biosciences, Genscript, Oncolinea/PrimeFour Therapeutics, Scaffold Therapeutics, Tessera Therapeutics and Transposon Therapeutics.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Eric T. Baldwin, Trevor van Eeuwen, David Hoyos, Arthur Zalevsky, Martin S. Taylor
These authors jointly supervised this work: Kathleen H. Burns, Matthias Götte, Michael P. Rout, Eddy Arnold, Benjamin D. Greenbaum, Donna L. Romero, John LaCava, Martin S. Taylor
Contributor Information
Kathleen H. Burns, Email: kathleenh_burns@dfci.harvard.edu
Matthias Götte, Email: gotte@ualberta.ca
Michael P. Rout, Email: rout@rockefeller.edu
Eddy Arnold, Email: arnold@cabm.rutgers.edu
Benjamin D. Greenbaum, Email: greenbab@mskcc.org
Donna L. Romero, Email: dlromero@rometx.com
John LaCava, Email: j.p.lacava@rug.nl
Martin S. Taylor, Email: mstaylor@mgh.harvard.edu
Extended data is available for this paper at 10.1038/s41586-023-06947-z.
Supplementary information
The online version contains supplementary material available at 10.1038/s41586-023-06947-z.
References
- 1.Kazazian HH, Jr, Moran JV. Mobile DNA in health and disease. N. Engl. J. Med. 2017;377:361–370. doi: 10.1056/NEJMra1510092. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Rodriguez-Martin B, et al. Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nat. Genet. 2020;52:306–319. doi: 10.1038/s41588-019-0562-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Taylor, M. S. et al. Ultrasensitive detection of circulating LINE-1 ORF1p as a specific multi-cancer biomarker. Cancer Discov.13, 2532–2547 (2023). [DOI] [PMC free article] [PubMed]
- 4.Rice GI, et al. Reverse-transcriptase inhibitors in the Aicardi-Goutieres syndrome. N. Engl. J. Med. 2018;379:2275–2277. doi: 10.1056/NEJMc1810983. [DOI] [PubMed] [Google Scholar]
- 5.Carter V, et al. High prevalence and disease correlation of autoantibodies against p40 encoded by long interspersed nuclear elements in systemic lupus erythematosus. Arthritis Rheumatol. 2020;72:89–99. doi: 10.1002/art.41054. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.De Cecco M, et al. L1 drives IFN in senescent cells and promotes age-associated inflammation. Nature. 2019;566:73–78. doi: 10.1038/s41586-018-0784-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Simon M, et al. LINE1 derepression in aged wild-type and SIRT6-deficient mice drives inflammation. Cell Metab. 2019;29:871–885.e875. doi: 10.1016/j.cmet.2019.02.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Ardeljan D, et al. Cell fitness screens reveal a conflict between LINE-1 retrotransposition and DNA replication. Nat. Struct. Mol. Biol. 2020;27:168–178. doi: 10.1038/s41594-020-0372-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Boeke JD, Garfinkel DJ, Styles CA, Fink GR. Ty elements transpose through an RNA intermediate. Cell. 1985;40:491–500. doi: 10.1016/0092-8674(85)90197-7. [DOI] [PubMed] [Google Scholar]
- 10.Hohjoh H, Singer MF. Cytoplasmic ribonucleoprotein complexes containing human LINE-1 protein and RNA. EMBO J. 1996;15:630–639. doi: 10.1002/j.1460-2075.1996.tb00395.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Mathias SL, Scott AF, Kazazian HH, Jr, Boeke JD, Gabriel A. Reverse transcriptase encoded by a human transposable element. Science. 1991;254:1808–1810. doi: 10.1126/science.1722352. [DOI] [PubMed] [Google Scholar]
- 12.Feng Q, Moran JV, Kazazian HH, Jr, Boeke JD. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell. 1996;87:905–916. doi: 10.1016/S0092-8674(00)81997-2. [DOI] [PubMed] [Google Scholar]
- 13.Weichenrieder O, Repanas K, Perrakis A. Crystal structure of the targeting endonuclease of the human LINE-1 retrotransposon. Structure. 2004;12:975–986. doi: 10.1016/j.str.2004.04.011. [DOI] [PubMed] [Google Scholar]
- 14.Wei W, et al. Human L1 retrotransposition: cis preference versus trans complementation. Mol. Cell. Biol. 2001;21:1429–1439. doi: 10.1128/MCB.21.4.1429-1439.2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Kulpa DA, Moran JV. Cis-preferential LINE-1 reverse transcriptase activity in ribonucleoprotein particles. Nat. Struct. Mol. Biol. 2006;13:655–660. doi: 10.1038/nsmb1107. [DOI] [PubMed] [Google Scholar]
- 16.Monot C, et al. The specificity and flexibility of l1 reverse transcription priming at imperfect T-tracts. PLoS Genet. 2013;9:e1003499. doi: 10.1371/journal.pgen.1003499. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Taylor MS, et al. Affinity proteomics reveals human host factors implicated in discrete stages of LINE-1 retrotransposition. Cell. 2013;155:1034–1048. doi: 10.1016/j.cell.2013.10.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Taylor MS, et al. Dissection of affinity captured LINE-1 macromolecular complexes. eLife. 2018;7:e30094. doi: 10.7554/eLife.30094. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Liu N, et al. Selective silencing of euchromatic L1s revealed by genome-wide screens for L1 regulators. Nature. 2018;553:228–232. doi: 10.1038/nature25179. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Cost GJ, Feng Q, Jacquier A, Boeke JD. Human L1 element target-primed reverse transcription in vitro. EMBO J. 2002;21:5899–5910. doi: 10.1093/emboj/cdf592. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Luan DD, Korman MH, Jakubczak JL, Eickbush TH. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell. 1993;72:595–605. doi: 10.1016/0092-8674(93)90078-5. [DOI] [PubMed] [Google Scholar]
- 22. Wilkinson ME, Frangieh CJ, Macrae RK, Zhang F. Structure of the R2 non-LTR retrotransposon initiating target-primed reverse transcription. Science. 2023;380:301–308.
- 23. Deng P, et al. Structural RNA components supervise the sequential DNA cleavage in R2 retrotransposon. Cell. 2023;186:2865–2879.e2820. doi: 10.1016/j.cell.2023.05.032.
- 24. Khadgi BB, Govindaraju A, Christensen SM. Completion of LINE integration involves an open ‘4-way’ branched DNA intermediate. Nucleic Acids Res. 2019;47:8708–8719. doi: 10.1093/nar/gkz673.
- 25. Kojima KK. Different integration site structures between L1 protein-mediated retrotransposition in cis and retrotransposition in trans. Mob. DNA. 2010;1:17. doi: 10.1186/1759-8753-1-17.
- 26. Fukuda S, et al. Cytoplasmic synthesis of endogenous Alu complementary DNA via reverse transcription and implications in age-related macular degeneration. Proc. Natl Acad. Sci. USA. 2021;118:e2022751118. doi: 10.1073/pnas.2022751118.
- 27. Thomas CA, et al. Modeling of TREX1-dependent autoimmune disease using human stem cells highlights L1 accumulation as a source of neuroinflammation. Cell Stem Cell. 2017;21:319–331.e318. doi: 10.1016/j.stem.2017.07.009.
- 28. Rajurkar M, et al. Reverse transcriptase inhibition disrupts repeat element life cycle in colorectal cancer. Cancer Discov. 2022;12:1462–1481. doi: 10.1158/2159-8290.CD-21-1117.
- 29. Benitez-Guijarro M, et al. RNase H2, mutated in Aicardi-Goutieres syndrome, promotes LINE-1 retrotransposition. EMBO J. 2018;37:e98506. doi: 10.15252/embj.201798506.
- 30. Stamos JL, Lentzsch AM, Lambowitz AM. Structure of a thermostable group II intron reverse transcriptase with template-primer and its functional and evolutionary implications. Mol. Cell. 2017;68:926–939.e924. doi: 10.1016/j.molcel.2017.10.024.
- 31. Pyle AM. Group II intron self-splicing. Annu. Rev. Biophys. 2016;45:183–205. doi: 10.1146/annurev-biophys-062215-011149.
- 32. Piskareva O, Schmatchenko V. DNA polymerization by the reverse transcriptase of the human L1 retrotransposon on its own template in vitro. FEBS Lett. 2006;580:661–668. doi: 10.1016/j.febslet.2005.12.077.
- 33. Dai L, Huang Q, Boeke JD. Effect of reverse transcriptase inhibitors on LINE-1 and Ty1 reverse transcriptase activities and on LINE-1 retrotransposition. BMC Biochem. 2011;12:18. doi: 10.1186/1471-2091-12-18.
- 34. Baldwin ET, et al. Human endogenous retrovirus-K (HERV-K) reverse transcriptase (RT) structure and biochemistry reveals remarkable similarities to HIV-1 RT and opportunities for HERV-K-specific inhibition. Proc. Natl Acad. Sci. USA. 2022;119:e2200260119. doi: 10.1073/pnas.2200260119.
- 35. Lentzsch AM, Stamos JL, Yao J, Russell R, Lambowitz AM. Structural basis for template switching by a group II intron-encoded non-LTR-retroelement reverse transcriptase. J. Biol. Chem. 2021;297:100971. doi: 10.1016/j.jbc.2021.100971.
- 36. Pimentel SC, Upton HE, Collins K. Separable structural requirements for cDNA synthesis, nontemplated extension, and template jumping by a non-LTR retroelement reverse transcriptase. J. Biol. Chem. 2022;298:101624. doi: 10.1016/j.jbc.2022.101624.
- 37. Christian CM, Sokolowski M, deHaro D, Kines KJ, Belancio VP. Involvement of conserved amino acids in the C-terminal region of LINE-1 ORF2p in retrotransposition. Genetics. 2017;205:1139–1149. doi: 10.1534/genetics.116.191403.
- 38. Adney EM, et al. Comprehensive scanning mutagenesis of human retrotransposon LINE-1 identifies motifs essential for function. Genetics. 2019;213:1401–1414. doi: 10.1534/genetics.119.302601.
- 39. Christian CM, deHaro D, Kines KJ, Sokolowski M, Belancio VP. Identification of L1 ORF2p sequence important to retrotransposition using Bipartile Alu retrotransposition (BAR). Nucleic Acids Res. 2016;44:4818–4834. doi: 10.1093/nar/gkw277.
- 40. Ruiz F, Arnold E. Evolving understanding of HIV-1 reverse transcriptase structure, function, inhibition, and resistance. Curr. Opin. Struct. Biol. 2020;61:113–123. doi: 10.1016/j.sbi.2019.11.011.
- 41. Gilbert N, Lutz-Prigge S, Moran JV. Genomic deletions created upon LINE-1 retrotransposition. Cell. 2002;110:315–325. doi: 10.1016/S0092-8674(02)00828-0.
- 42. Symer DE, et al. Human L1 retrotransposition is associated with genetic instability in vivo. Cell. 2002;110:327–338. doi: 10.1016/S0092-8674(02)00839-5.
- 43. Traut TW. Physiological concentrations of purines and pyrimidines. Mol. Cell. Biochem. 1994;140:1–22. doi: 10.1007/BF00928361.
- 44. Morrish TA, et al. DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nat. Genet. 2002;31:159–165. doi: 10.1038/ng898.
- 45. Roulois D, et al. DNA-demethylating agents target colorectal cancer cells by inducing viral mimicry by endogenous transcripts. Cell. 2015;162:961–973. doi: 10.1016/j.cell.2015.07.056.
- 46. Chiappinelli KB, et al. Inhibiting DNA methylation causes an interferon response in cancer via dsRNA including endogenous retroviruses. Cell. 2015;162:974–986. doi: 10.1016/j.cell.2015.07.011.
- 47. Sarafianos SG, et al. Lamivudine (3TC) resistance in HIV-1 reverse transcriptase involves steric hindrance with beta-branched amino acids. Proc. Natl Acad. Sci. USA. 1999;96:10027–10032. doi: 10.1073/pnas.96.18.10027.
- 48. Pradhan M, Govindaraju A, Jagdish A, Christensen SM. The linker region of LINEs modulates DNA cleavage and DNA polymerization. Anal. Biochem. 2020;603:113809. doi: 10.1016/j.ab.2020.113809.
- 49. Piskareva O, Ernst C, Higgins N, Schmatchenko V. The carboxy-terminal segment of the human LINE-1 ORF2 protein is involved in RNA binding. FEBS Open Bio. 2013;3:433–437. doi: 10.1016/j.fob.2013.09.005.
- 50. Miller I, et al. Structural dissection of sequence recognition and catalytic mechanism of human LINE-1 endonuclease. Nucleic Acids Res. 2021;49:11350–11366. doi: 10.1093/nar/gkab826.
- 51. Malik HS, Burke WD, Eickbush TH. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 1999;16:793–805. doi: 10.1093/oxfordjournals.molbev.a026164.
- 52. Katz-Summercorn AC, et al. Multi-omic cross-sectional cohort study of pre-malignant Barrett’s esophagus reveals early structural variation and retrotransposon activity. Nat. Commun. 2022;13:1407. doi: 10.1038/s41467-022-28237-4.
- 53. Ahl V, Keller H, Schmidt S, Weichenrieder O. Retrotransposition and crystal structure of an Alu RNP in the ribosome-stalling conformation. Mol. Cell. 2015;60:715–727. doi: 10.1016/j.molcel.2015.10.003.
- 54. Doucet AJ, Wilusz JE, Miyoshi T, Liu Y, Moran JV. A 3′ poly(A) tract is required for LINE-1 retrotransposition. Mol. Cell. 2015;60:728–741. doi: 10.1016/j.molcel.2015.10.012.
- 55. Suzuki J, et al. Genetic evidence that the non-homologous end-joining repair pathway is involved in LINE retrotransposition. PLoS Genet. 2009;5:e1000461. doi: 10.1371/journal.pgen.1000461.
- 56. Warkocki Z, et al. Uridylation by TUT4/7 restricts retrotransposition of human LINE-1s. Cell. 2018;174:1537–1548.e1529. doi: 10.1016/j.cell.2018.07.022.
- 57. Ahmad S, et al. Breaching self-tolerance to Alu duplex RNA underlies MDA5-mediated inflammation. Cell. 2018;172:797–810.e713. doi: 10.1016/j.cell.2017.12.016.
- 58. Mehdipour P, et al. Epigenetic therapy induces transcription of inverted SINEs and ADAR1 dependency. Nature. 2020;588:169–173. doi: 10.1038/s41586-020-2844-1.
- 59. Šulc P, et al. Repeats mimic pathogen-associated patterns across a vast evolutionary landscape. Preprint at bioRxiv. 2023. doi: 10.1101/2021.11.04.467016.
- 60. Sun S, et al. Cancer cells co-evolve with retrotransposons to mitigate viral mimicry. Preprint at bioRxiv. 2023. doi: 10.1101/2023.05.19.541456.
- 61. Ago H, et al. Crystal structure of the RNA-dependent RNA polymerase of hepatitis C virus. Structure. 1999;7:1417–1426. doi: 10.1016/S0969-2126(00)80031-3.
- 62. Hsiou Y, et al. Structure of unliganded HIV-1 reverse transcriptase at 2.7 Å resolution: implications of conformational changes for polymerization and inhibition mechanisms. Structure. 1996;4:853–860. doi: 10.1016/S0969-2126(96)00091-3.
- 63. An W, et al. Characterization of a synthetic human LINE-1 retrotransposon ORFeus-Hs. Mob. DNA. 2011;2:2. doi: 10.1186/1759-8753-2-2.
- 64. Crossley MP, et al. Catalytically inactive, purified RNase H1: a specific and sensitive probe for RNA-DNA hybrid imaging. J. Cell Biol. 2021;220:e202101092. doi: 10.1083/jcb.202101092.
- 65. Ren J, et al. Structural mechanisms of drug resistance for mutations at codons 181 and 188 in HIV-1 reverse transcriptase and the improved resilience of second generation non-nucleoside inhibitors. J. Mol. Biol. 2001;312:795–805. doi: 10.1006/jmbi.2001.4988.
- 66. Das K, Martinez SE, Bandwar RP, Arnold E. Structures of HIV-1 RT-RNA/DNA ternary complexes with dATP and nevirapine reveal conformational flexibility of RNA/DNA: insights into requirements for RNase H cleavage. Nucleic Acids Res. 2014;42:8125–8137. doi: 10.1093/nar/gku487.
- 67. Rhee SY, et al. Human immunodeficiency virus reverse transcriptase and protease sequence database. Nucleic Acids Res. 2003;31:298–303. doi: 10.1093/nar/gkg100.
- 68. Melikian GL, et al. Non-nucleoside reverse transcriptase inhibitor (NNRTI) cross-resistance: implications for preclinical evaluation of novel NNRTIs and clinical genotypic resistance testing. J. Antimicrob. Chemother. 2014;69:12–20. doi: 10.1093/jac/dkt316.
- 69. Vingerhoets J, et al. Resistance profile of etravirine: combined analysis of baseline genotypic and phenotypic data from the randomized, controlled phase III clinical studies. AIDS. 2010;24:503–514. doi: 10.1097/QAD.0b013e32833677ac.
- 70. Azijn H, et al. TMC278, a next-generation nonnucleoside reverse transcriptase inhibitor (NNRTI), active against wild-type and NNRTI-resistant HIV-1. Antimicrob. Agents Chemother. 2010;54:718–727. doi: 10.1128/AAC.00986-09.
- 71. Ren J, et al. Crystal structures of HIV-1 reverse transcriptases mutated at codons 100, 106 and 108 and mechanisms of resistance to non-nucleoside inhibitors. J. Mol. Biol. 2004;336:569–578. doi: 10.1016/j.jmb.2003.12.055.
- 72. Tambuyzer L, et al. Characterization of genotypic and phenotypic changes in HIV-1-infected patients with virologic failure on an etravirine-containing regimen in the DUET-1 and DUET-2 clinical studies. AIDS Res. Hum. Retroviruses. 2010;26:1197–1205. doi: 10.1089/aid.2009.0302.
- 73. Xie Y, et al. Cell division promotes efficient retrotransposition in a stable L1 reporter cell line. Mob. DNA. 2013;4:10. doi: 10.1186/1759-8753-4-10.
- 74. Boyer PL, et al. YADD mutants of human immunodeficiency virus type 1 and Moloney murine leukemia virus reverse transcriptase are resistant to lamivudine triphosphate (3TCTP) in vitro. J. Virol. 2001;75:6321–6328. doi: 10.1128/JVI.75.14.6321-6328.2001.
- 75. Jamburuthugoda VK, Eickbush TH. Identification of RNA binding motifs in the R2 retrotransposon-encoded reverse transcriptase. Nucleic Acids Res. 2014;42:8405–8415. doi: 10.1093/nar/gku514.
- 76. Blocker FJ, et al. Domain structure and three-dimensional model of a group II intron-encoded reverse transcriptase. RNA. 2005;11:14–28. doi: 10.1261/rna.7181105.
- 77. Chung K, et al. Structures of a mobile intron retroelement poised to attack its structured DNA target. Science. 2022;378:627–634. doi: 10.1126/science.abq2844.
Data Availability Statement
The coordinates for the ORF2p crystal structure have been deposited in the Protein Data Bank (PDB ID: 8C8J). Single-particle cryo-EM maps for the ORF2p core have been deposited in the Electron Microscopy Data Bank and their associated model coordinates in the Protein Data Bank under accession codes EMD-40858 and PDB ID: 8SXT (heteroduplex); EMD-40859 and PDB ID: 8SXU (oligo(A)); and EMD-40856 (apo). Raw videos and motion-corrected micrographs for apo ORF2p have been deposited in the Electron Microscopy Public Image Archive under accession code EMPIAR-11556. The mass spectrometry proteomics data have been deposited at the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) through the PRIDE partner repository with dataset identifier PXD038615. Files containing the input data, scripts and results of integrative modelling are available at https://github.com/integrativemodeling/ORF2p and in PDB-Dev, the nascent integrative modelling section of the worldwide Protein Data Bank (wwPDB) for integrative structures and corresponding data, under accession code PDBDEV_00000211. AlphaFold2 predictions, molecular dynamics simulation results and full-atom versions of best-matching models are available from the ModelArchive repository (https://www.modelarchive.org/doi/10.5452/ma-fejd6, https://www.modelarchive.org/doi/10.5452/ma-joo4d, https://www.modelarchive.org/doi/10.5452/ma-lzyrq, https://www.modelarchive.org/doi/10.5452/ma-xlzzy, https://www.modelarchive.org/doi/10.5452/ma-9wovj). New plasmids have been deposited at Addgene.
Software for the evolutionary analysis is available at https://github.com/dfhoyosg/Plexy.