Abstract
Hepatitis B virus (HBV) chronically infects >250 million people. It replicates by a unique protein‐primed reverse transcription mechanism, and the primary anti‐HBV drugs are nucleos(t)ide analogs targeting the viral polymerase (P). P has four domains compared to only two in most reverse transcriptases: the terminal protein (TP) that primes DNA synthesis, a spacer, the reverse transcriptase (RT), and the ribonuclease H (RNase H). Despite being a major drug target and catalyzing a reverse transcription pathway very different from the retroviruses, HBV P has resisted structural analysis for decades. Here, we exploited computational advances to model P. The TP wrapped around the RT domain rather than forming the anticipated globular domain, with the priming tyrosine poised over the RT active site. The orientation of the RT and RNase H domains resembled that of the retroviral enzymes despite the lack of sequences analogous to the retroviral linker region. The model was validated by mapping residues with known surface exposures, docking nucleic acids, mechanistically interpreting mutations with strong phenotypes, and docking inhibitors into the RT and RNase H active sites. The HBV P fold, including the orientation of the TP domain, was conserved among hepadnaviruses infecting rodent to fish hosts and a nackednavirus, but not in other non‐retroviral RTs. Therefore, this protein fold has persisted since the hepadnaviruses diverged from nackednaviruses >400 million years ago. This model will advance mechanistic analyses into the poorly understood enzymology of HBV reverse transcription and will enable drug development against non‐active site targets for the first time.
Keywords: hepatitis B virus, polymerase, predicted structure, reverse transcription
1. INTRODUCTION
Hepatitis B virus (HBV) is a small, partially double‐stranded DNA virus in the Hepadnaviridae virus family that replicates by protein‐primed reverse transcription. 1 HBV has nine genotypes differing by >8% at the nucleotide level. 2 Over 250 million people are chronically infected with the virus, 3 leading to >850,000 deaths annually. 4 The primary drugs for HBV infection are nucleos(t)ide analogs that block reverse transcription, but treatment is not curative and is life‐long for most patients. 5 , 6
The HBV polymerase (P) that catalyzes reverse transcription has two additional domains, the terminal protein (TP) and spacer domains, that are absent in the much better understood retroviral reverse transcriptases (RT). Together, these unique domains account for almost half of P's sequence (Figure 1a). The TP domain contains about 180 amino acids (aa), including Y63 that primes DNA synthesis. This is followed by the poorly conserved ~175 aa spacer with little known function. 1 The catalytic core of the enzyme contains the RT and ribonuclease H (RNase H) domains and is homologous to retroviral RTs despite sharing only about 20% aa sequence homology and lacking sequences analogous to the linker region between the RT and RNase H domains. The RT domain (~335 aa) carries the A‐E reverse transcription motifs, 7 including the YMDD motif that chelates 2 Mg2+ ions essential for DNA synthesis. The ~155 aa RNase H domain contains the D‐E‐D‐D motif that chelates 2 Mg2+ ions essential for RNA hydrolysis. 8 , 9 , 10 The C‐terminal ~35 residues of P are dispensable for RNase H activity in vitro 11 and are presumed to be unstructured.
FIGURE 1.

Gene organization for HBV P and conformations adopted by the enzyme. (a) Gene organization. Genotype B numbers are used for the domain boundaries, and the Y63 priming residue, YMDD RT active site motif, and D‐E‐D‐D RNase H active site motif are indicated. (b) Conformations adopted by P identified by prior analyses. The TP domain is in light gray, the RT domain in dark gray, the RNase H domain as an unshaded oval, and the spacer domain is indicated by thick lines. AS, RT active site; T3, T3 motif in the TP domain; RT1, RT1 motif in the RT domain; ε, the HBV ε stem loop needed for specific RNA encapsidation and priming. Reprinted with permission from 16
P is 843 aa long in most HBV genotypes, with the genotypes sharing 86–88% identity. Genotype A has a two‐residue insertion between positions 16 and 17 in the terminal protein (genotype B numbering employed unless indicated otherwise), genotype D lacks 11 aa in the spacer domain (residues 184–194), and genotypes E and G lack aa 184 in the spacer.
HBV reverse transcription 12 , 13 begins with binding of P to the ε RNA stem‐loop on the viral pregenomic RNA (pgRNA) via an HSP90‐mediated reaction. 14 , 15 ε binding depends on a bi‐partite binding element comprised of the T3 motif in the TP domain and the RT1 motif in the RT domain. 16 The RT activity initiates DNA synthesis using Y63 in the TP domain as a primer and a bulge in ε as the template. 17 This covalently links P to the viral minus‐polarity DNA strand, but it is unknown how Y63 accesses the RT active site. The RNase H degrades the pgRNA after it has been copied into minus‐polarity DNA. Studies with duck hepatitis B virus (DHBV), a distant homolog of HBV, reveal that the RNase H leaves a 15–18 nt capped RNA that primes plus‐polarity DNA strand synthesis due to the distance between the RT and RNase H active sites. 18 Reverse transcription employs three strand transfers to make the mature partially double‐stranded viral DNA in viruses. P is a monomer and remains covalently attached to the viral DNA within mature virions. 19 Only the enzymology of HBV pgRNA encapsidation and DNA chain elongation is understood in any depth.
Structural analyses of P have been stymied for >30 years despite its importance as a drug target, the presence of two unique domains, and the unusual reverse transcription pathway it catalyzes. First, P is extremely difficult to produce in an active, soluble form due to its instability in most protein production systems and aggregation issues. Both problems likely stem from P's existence in a complex with HSP90 chaperones 14 , 15 in which the chaperones are believed to cover hydrophobic regions on the protein surface. Second, P has at least five known conformations, identified primarily using DHBV P (Figure 1b). 16 , 20 , 21 , 22 , 23 Flexibility is promoted by HSP90 in the holoenzyme complex and is necessary to permit reverse transcription to occur while P is covalently attached to one end of its product DNA. A globular ab initio predicted model exists for the TP domain based on secondary structure predictions. 24 , 25 Multiple homology models generated against retroviral RTs exist for the RT domain of HBV P. 26 , 27 The RT models are accurate enough in the active site to permit interpretation of the mechanisms of resistance to nucleoside analog drugs, but little is known of their accuracy outside of the active site. Three homology models exist for the RNase H domain. 28 , 29 , 30 None of them contains the full domain 31 and only one 28 correctly predicts the last D‐E‐D‐D active site residue. 8 , 32
AlphaFold is an ab initio protein structure prediction program. 33 It uses correlated sequence variations among homologous sequences to define intra‐protein distance constraints plus structural information from homologous proteins (when available) to predict the structure of the α‐carbon backbone of a protein. Side chain orientations are then refined by energy minimization. Structures were predicted by AlphaFold for 10,795 proteins with experimentally determined structures, and >50% of the models had α‐carbon chain root‐mean‐square deviation (RMSD) values ≤2 Å compared to experimentally determined structures. 33 Accuracy of the predictions is measured in part by the local distance difference test (lDDT). 34 The predicted lDDT scores (pLDDT) generated by AlphaFold correlate well with lDDT scores for known protein structures. 33 These studies validate AlphaFold's utility, with the understanding that the predicted structures need to be experimentally validated.
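To illustrate what an lDDT‐style score measures, the following minimal sketch computes a simplified, α‐carbon‐only version of the test. It is an illustration under stated assumptions rather than the implementation used by AlphaFold or by the lDDT authors: the function name `ca_lddt`, the 15 Å inclusion radius, and the 0.5, 1, 2, and 4 Å tolerance thresholds are assumptions taken from the commonly described form of the score, and the published score uses all atoms rather than α‐carbons only.

```python
import math


def ca_lddt(ref, model, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified C-alpha lDDT between two equal-length coordinate lists.

    ref, model: sequences of (x, y, z) tuples in Angstroms, one per residue.
    cutoff and thresholds are assumed defaults, not values from this paper.
    """
    if len(ref) != len(model):
        raise ValueError("reference and model must have the same length")
    preserved = 0
    considered = 0
    n = len(ref)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= cutoff:
                continue  # only pairs that are local in the reference count
            diff = abs(d_ref - math.dist(model[i], model[j]))
            for t in thresholds:
                considered += 1
                if diff < t:
                    preserved += 1  # distance preserved within tolerance t
    return preserved / considered if considered else 0.0
```

Because the score compares internal distances, it does not require superposing the two structures, in contrast to RMSD.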
Here, we predicted the structure of HBV P and validated the model using data from prior analyses of P and molecular docking. Structures of P proteins from animal hepadnaviruses and nackednaviruses were predicted and compared to the HBV P. These studies reveal the fold of the enzyme, permit approximation of its antiquity, generate predictions for how P primes reverse transcription, and enable drug discovery against non‐active sites for the first time.
2. RESULTS AND DISCUSSION
2.1. Modeling
HBV P models were primarily generated using genotype B, Genbank AB554017. Models were generated for each domain independently, the catalytic core of the enzyme, and the full‐length enzyme using AlphaFold2. Models with the highest pLDDT scores among the five primary models for each fold were energy‐minimized and used for all analyses. The domain boundaries were from Donlin et al., 35 with the N‐terminus of the RT domain moved to residue 357 to include a β‐sheet at the boundary of the spacer and RT domains conserved in all models. The individual domain models are in Figures S1–S4.
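The model selection step can be sketched as follows. This is a hypothetical illustration, not the procedure used in this study: it assumes that each candidate model is available as PDB‐format text with the per‐residue pLDDT stored in the B‐factor column (the convention used in AlphaFold output files), and the function names `mean_plddt` and `best_model` are invented for the example.

```python
def mean_plddt(pdb_text):
    """Mean pLDDT over C-alpha atoms, read from the PDB B-factor field.

    Assumes fixed-column PDB ATOM records: atom name in columns 13-16
    and B-factor (here pLDDT) in columns 61-66.
    """
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    return sum(scores) / len(scores)


def best_model(models):
    """Return the name of the model with the highest mean pLDDT.

    models: dict mapping a model name to its PDB-format text.
    """
    return max(models, key=lambda name: mean_plddt(models[name]))
```

Averaging over α‐carbons gives one value per residue; other summaries, such as the fraction of residues with pLDDT >80, could be substituted without changing the selection logic.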
2.2. Catalytic core
A model of the HBV catalytic core containing the RT and RNase H domains folded with high confidence (pLDDT >80) for the bulk of the model, with lower confidence at the N‐terminus of the RT domain, ~80 aa in the RT that includes the YMDD motif, and the extreme C‐terminus (Figure S5).
The RT domain adopted the expected right‐hand shape of a DNA polymerase with fingers, palm, and thumb subdomains (Figure 2a). The conserved A‐E DNA polymerase motifs 7 were in the palm, similar to their locations in other DNA polymerases. The model accommodated two Mg2+ ions separated by 3.23 Å in the RT active site that were coordinated by the YMDD motif in positions analogous to those in the HIV RT. The RT domain within the predicted HBV catalytic core model could be superimposed with the RT domain model from Das et al. 26 with an RMSD = 3.38 Å, with the AlphaFold structure predicting more of the conserved α‐helixes and β‐sheets in the fingers subdomain (Figure 3a). The HBV RT domain from the catalytic core model aligned to the HIV RT domain (PDB: 5XN1) with an RMSD = 3.34 Å (Figure 3b). The orientations of the α‐helixes and β‐sheets in the fingers subdomain from the AlphaFold HBV RT model were more similar to the HIV structure than the Das et al. model, 26 although they were rotated outward more than expected.
FIGURE 2.

Models for the catalytic core of HBV P. (a) Genotype B RT and RNase H domains. (b) Genotype D RT and RNase H domains. Yellow, RT domain; Green, RNase H domain; Magenta spheres, Mg2+ ions. The amino termini are marked with cyan spheres
FIGURE 3.

Superpositions of HBV P models. (a) HBV genotype B RT domain from the catalytic core model (yellow) versus the HBV RT domain model (purple) from Das et al. 26 (b) HBV RNase H (green) versus HIV RNase H (magenta). (c) HBV genotype B catalytic core model (orange) versus HIV RT‐RNase H structure (cyan). Top panels, superpositions; Bottom panels, plots of RMSD values by residue number in the superposition alignment. RT, RT domain; RH, RNase H domain; Magenta spheres, Mg2+ ions
The RNase H domain adopted the canonical RNase H fold 9 , 10 with a β‐sheet platform overlaid by α‐helices arranged in an H (Figures S5 and S24b–d). The model accommodated 2 Mg2+ ions 3.96 Å apart that were coordinated by the D‐E‐D‐D motif that binds divalent cations in other RNases H, as expected because HBV RNase H activity requires Mg2+ or Mn2+ and mutating the D‐E‐D‐D residues yields an RNase H‐deficient phenotype. 8 , 32 The RNase H domain from the catalytic core model aligned well with the HIV RNase H (PDB: 3K2P, RMSD = 2.84 Å; Figure S24b) and human RNase H1 (PDB: 2QKK, RMSD = 2.66 Å; Figure S24c). The AlphaFold model is likely superior to our prior RNase H model 28 due to AlphaFold's better performance 33 compared to the Phyre2 modeling software 36 used for the older model, and because it predicted an α‐helix and a β‐sheet strand that are conserved among known RNase H structures 9 , 10 but were poorly formed in the older model.
The relative orientation of the two domains in the HBV catalytic core model was similar to the HIV RT‐RNase H (Figures 3b and S24f) despite the HIV enzyme being a heterodimer that carries a second copy of an enzymatically inactive RT domain. The HBV genotype B catalytic core aligned with the HIV enzyme with an RMSD of 3.75 Å (Figure S24f) and with the monomeric Ty3 RT‐RNase H (PDB: 4OL8) with an RMSD of 3.60 Å (Figure S24h), but alignment qualities were suppressed because the HBV model lacks the linker sequences between the two catalytic domains, which help accommodate the heteroduplex in HIV and other retroviral RT‐RNases H. 37 , 38 A plausible nucleic acid binding channel between the two active sites featuring positively charged and neutral residues was present, although the channel was not as positively charged as in HIV. The reduced affinity for nucleic acids implied by the relatively low positive charge of the binding channel in the HBV enzyme is plausible because reverse transcription occurs within a capsid particle with an interior diameter of 250 Å, so the effective concentration of the nucleic acids is very high. This consideration also applies to the HIV enzyme which is similarly active within a viral capsid during viral replication.
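The effective concentration argument can be made concrete with a short calculation. The sketch below is a back‐of‐the‐envelope estimate under simplifying assumptions that are not from the original analysis: the capsid interior is treated as an empty sphere of 250 Å diameter, a single nucleic acid molecule is counted, and the volume occupied by P and the nucleic acid itself is ignored. The function name `effective_molarity` is hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def effective_molarity(n_molecules, interior_diameter_angstrom):
    """Molar concentration of n molecules confined in a spherical cavity.

    Assumes an empty sphere; excluded volume is ignored.
    """
    # 1 Angstrom = 1e-10 m = 1e-9 dm, and 1 dm^3 = 1 L
    radius_dm = (interior_diameter_angstrom / 2.0) * 1e-9
    volume_litres = (4.0 / 3.0) * math.pi * radius_dm ** 3
    return n_molecules / (AVOGADRO * volume_litres)


if __name__ == "__main__":
    conc = effective_molarity(1, 250.0)
    print(f"{conc * 1e6:.0f} micromolar")
```

Under these assumptions, a single molecule in a 250 Å cavity corresponds to roughly 0.2 mM, which is consistent with the statement that the effective concentration is very high.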
We also predicted a two‐domain model for HBV genotype D as it is commonly used for inhibitor screening (Figure S6). The genotype B and D RT‐RNase H models aligned with an RMSD = 2.15 Å (Figure S24e). The orientation of the fingers subdomain in the genotype D model is more plausible than in the genotype B model because it resembles the orientation commonly seen in DNA polymerases.
2.3. Full‐length HBV polymerase
Full‐length P folded with high confidence (pLDDT >80), with the exception of the extreme N‐terminus, spacer, the C‐terminal ~50 residues, and short regions within each domain (Figures 4a and S8). A full‐length P model for genotype D was also predicted that aligned well with the genotype B model (RMSD = 1.96 Å; Figures S10 and S25c).
FIGURE 4.

Predicted model of genotype B P. (a) Full‐length P. The amino terminus is marked with cyan spheres. (b) RT active site. (c) RNase H active site. Red, TP domain; Gray, Spacer domain; Yellow, RT domain; Green, RNase H domain. Magenta spheres, Mg2+ ions. Y63 and the residues in the YMDD and D‐E‐D‐D motifs are labeled and shown as sticks
Intriguingly, the TP domain formed a novel C‐shaped structure that cupped around the catalytic core of P rather than the expected globular fold (Figure 4a). Y63 in the TP domain that primes DNA synthesis and covalently links P to the viral minus‐polarity DNA strand 17 was predicted to be on a loop above the YMDD RT active site motif. Y63 is near the N‐terminal edge of a region of high pLDDT confidence that was flanked by regions with pLDDT values of 50–60 that imply poorly structured and/or flexible sequences. The putative flexibility of the loop is consistent with the need for Y63 to bend downwards to prime DNA synthesis in the RT active site and then be displaced from the active site by the growing DNA. It is also consistent with the conformational changes that must occur in the priming loop during the three DNA strand transfers during reverse transcription 12 , 13 while P remains attached to the 5′ end of the minus‐polarity DNA strand via Y63.
The spacer domain was predicted to form an unstructured loop (Figures 4a and S2) that was different in all five primary models generated by AlphaFold. This is consistent with its low sequence conservation, the ability of the TP and RT domains to complement each other in trans, 39 , 40 , 41 and with the ability to alter sequences in the spacer without affecting reverse transcription. 19 , 42 , 43 , 44
The RT and RNase H domain predictions in the two‐ and four‐domain genotype B HBV models aligned with an RMSD of 2.37 Å (Figure S26e). The fingers subdomain of the AlphaFold HBV P model was more similar to the HIV RT than the fingers predicted by the Das et al. HBV RT domain model 26 and had α‐helixes and β‐sheets with lengths and orientations similar to those in the HIV RT (Figures S26d and S27a). Both catalytic domains in the full‐length model accommodated Mg2+ ions coordinated by the YMDD and D‐E‐D‐D motifs in positions very similar to those in the HIV RT and RNase H active sites (Figure 4b,c).
2.4. Nucleic acid docking
RNA:DNA heteroduplexes (21 and 24 bp) were extracted from HIV RT‐RNase H co‐crystal structures (PDB: 4B3Q and 6BSH) and docked into the HBV RT‐RNase H catalytic core and full‐length P models for genotypes B and D employing the Schrödinger BioLuminate module. The heteroduplexes docked into the RT active site but were usually located just outside the RNase H active site (Figures 5a,b and S28b,d). One pose for the genotype D catalytic core model simultaneously aligned the DNA strand near the Mg2+ ions in the RT active site and the RNA strand opposite the Mg2+ ions in the RNase H active site (Figure S28e–g), with the heteroduplex following the nucleic acid binding channel predicted by the apo models. The 19 bp spanning the RT and RNase H active sites is similar to the 15–18 bp measured with DHBV P. 18 The infrequency of binding poses that engage both active sites is similar to what has been seen with HIV RT‐RNase H co‐crystals where the heteroduplexes were located outside the RNase H active site. 45 , 46 , 47
FIGURE 5.

Computational heteroduplex nucleic acid docking into the HBV genotype B catalytic core and full‐length P models. Docking was conducted using the PIPER program in the BioLuminate module within the Schrödinger molecular analysis suite. (a and b) Representative binding poses between the indicated model and a heteroduplex. The fingers, palm, and thumb subdomains of the RT domain are labeled. Residues in the RT and RNase H domains contacting the RNA:DNA heteroduplex are shown as sticks and labeled. Locations of residues that interact with ε among the various binding poses are indicated: Violet, RT1 motif; Cyan, T3; Yellow, all other interacting residues. (c) Positions of the T3 and RT1 motifs. Magenta spheres, Mg2+ ions
The HBV ε RNA stem‐loop that templates DNA priming [PDB: 6VAR 48 ] was also docked into P (Figure S29a–c). Three classes of binding poses were found. Two classes wrapped end‐to‐end around the side of P, and the third class docked perpendicularly to the other poses. All poses contacted the palm subdomain, the TP domain, and the spacer, but none placed ε in the RT active site. These binding poses may represent the non‐specific early phase in binding between ε and P (Figure 1b). 16 Overall, the ε docking studies were less informative than the heteroduplex dockings because of the diversity of binding poses, which may reflect the inability of the docking algorithm to recapitulate the dynamic, chaperone‐driven ε binding mechanism.
2.5. Assessing plausibility of the predicted HBV P model
The novelty of the HBV P fold necessitated careful validation. This was done computationally and using the wealth of experimental data generated by us and others over the last 30 years.
Plausibility of the TP interface with the catalytic core of the enzyme was evaluated computationally by assessing binding interactions between the TP and the catalytic core of P comprised of the RT and RNase H domains. The model predicted 31 hydrogen bonds and one salt bridge between residues in the TP and RT domains, and three hydrogen bonds and two salt bridges between the TP and RNase H domains (Table 1). The model predicted 184 residues with extensive hydrophobic regions that were buried within the structure at the interface between the TP domain and the catalytic core (76 in the TP, 91 in the RT, and 17 in the RNase H domains; Tables 2 and 3). The large majority of these hydrophobic residues formed patches between the TP and RT domains. These observations are consistent with binding of the TP domain to the rest of P being strong enough to support folding of the enzyme into this conformation.
TABLE 1.
Molecular bonds predicted between the TP domain and the rest of P
| TP‐RT H‐bonds | | | TP‐RT salt bridges | | |
|---|---|---|---|---|---|
| TP residue | Distance (Å) | RT residue | TP residue | Distance (Å) | RT residue |
| L41 | 1.77 | S670 | K130 | 2.60 | D391 |
| T53 | 1.93 | P627 | TP‐RNase H H‐bonds | | |
| T53 | 2.14 | D629 | TP residue | Distance (Å) | RH residue |
| T60 | 1.85 | R635 | E22 | 3.70 | R739 |
| L62 | 2.18 | D629 | N34 | 2.25 | L763 |
| N71 | 1.93 | T383 | F95 | 1.84 | R829 |
| W74 | 1.89 | D377 | TP‐RNase H salt bridges | ||
| Q75 | 2.15 | S386 | TP residue | Distance (Å) | RH residue |
| I82 | 1.74 | K514 | E22 | 2.96 | R739 |
| Q85 | 1.98 | L577 | E103 | 3.45 | R539 |
| V96 | 2.09 | R539 | |||
| E103 | 1.37 | S535 | |||
| R106 | 2.13 | Q528 | |||
| A113 | 1.90 | H440 | |||
| A113 | 1.97 | P438 | |||
| T120 | 1.85 | H446 | |||
| K121 | 1.93 | Q394 | |||
| Y122 | 2.15 | P445 | |||
| L123 | 1.78 | V369 | |||
| D126 | 2.32 | R460 | |||
| G128 | 1.80 | V373 | |||
| K130 | 1.91 | V373 | |||
| K130 | 1.07 | D391 | |||
| H146 | 2.00 | Y504 | |||
| I156 | 2.03 | K514 | |||
| I156 | 2.33 | R513 | |||
| R160 | 1.89 | T365 | |||
| R164 | 2.90 | T362 | |||
| F168 | 2.05 | H358 | |||
| F168 | 2.08 | E357 | |||
| Q180 | 2.71 | R359 | |||
RH, RNase H domain.
TABLE 2.
TP domain residues with buried hydrophobic regions at the interface with the RT or RNase H domains
| Residue | Solvent accessibility (%) | Residue | Solvent accessibility (%) | Residue | Solvent accessibility (%) |
|---|---|---|---|---|---|
| S4 | 9.7 | T76 | 15.4 | H136 | 46.1 |
| F8 | 9.3 | F79 | 6.9 | V137 | 8.6 |
| L11 | 23.5 | I82 | 0.2 | V138 | 15.4 |
| L12 | 2.6 | L84 | 0.1 | Y141 | 0 |
| L13 | 8.4 | Q85 | 39.4 | F142 | 0 |
| L21 | 40.3 | I88 | 0.8 | Q143 | 10.6 |
| L25 | 0 | V89 | 22.7 | T144 | 0 |
| P26 | 8.6 | C92 | 0 | R145 | 1.9 |
| L28 | 14.5 | F95 | 32.5 | L148 | 0 |
| A29 | 6.1 | L99 | 0 | H149 | 7.3 |
| L33 | 34.8 | T100 | 19.7 | L151 | 0 |
| V37 | 17.5 | Q103 | 1.5 | W152 | 0.7 |
| N42 | 46.4 | R106 | 23.6 | K153 | 47 |
| L43 | 3.9 | L107 | 0.3 | I156 | 0.1 |
| L46 | 48.4 | I110 | 0.3 | L157 | 0 |
| V48 | 35.9 | M111 | 5.1 | Y158 | 0.1 |
| W52 | 37.5 | A113 | 0 | K159 | 11.5 |
| T53 | 10.6 | F115 | 0.9 | A166 | 0 |
| H54 | 34.4 | Y116 | 45.3 | F168 | 0 |
| V56 | 0.3 | T120 | 5.5 | C169 | 6 |
| G57 | 26.4 | Y122 | 28.8 | G170 | 25.1 |
| F59 | 11.1 | L123 | 27.3 | Y173 | 1.2 |
| G61 | 4.8 | L125 | 12 | W175 | 12 |
| L62 | 46.4 | D126 | 21.5 | E176 | 0.4 |
| F70 | 1.8 | I129 | 0.3 | L179 | 37.3 |
| N71 | 28.8 | Y132 | 37.3 | ||
| W74 | 0.9 | Y133 | 3.6 |
TABLE 3.
Catalytic core residues with buried hydrophobic regions at the interface with the TP domain
| Residue | Solvent accessibility (%) | Residue | Solvent accessibility (%) | Residue | Solvent accessibility (%) |
|---|---|---|---|---|---|
| RT domain | | RT domain (continued) | | RT domain (continued) | |
| I360 | 1.0 | R460 | 31.5 | P623 | 22.7 |
| T362 | 0 | L461 | 27.8 | P627 | 4.7 |
| P366 | 28.1 | L486 | 4.9 | I628 | 0 |
| G371 | 0.4 | S489 | 26.8 | W630 | 15 |
| G372 | 2.4 | L493 | 8.3 | L631 | 34.4 |
| V373 | 0 | Y497 | 7.3 | V632 | 0 |
| F374 | 13.9 | G498 | 32.1 | I636 | 0 |
| L375 | 1.0 | L501 | 0 | Q648 | 17.4 |
| V376 | 29 | L503 | 17.3 | C649 | 0 |
| S386 | 0.7 | S505 | 0 | P656 | 5.3 |
| L388 | 3.1 | I508 | 7 | L657 | 0 |
| V389 | 38.9 | I509 | 7.4 | C660 | 5.7 |
| V390 | 2.8 | L510 | 0 | Q665 | 45.6 |
| Q394 | 19 | L514 | 1.3 | A666 | 11.4 |
| F395 | 0.8 | I515 | 0.6 | T668 | 10 |
| V402 | 27.4 | P516 | 0.2 | F676 | 3.2 |
| W404 | 16.7 | M517 | 34.3 | N678 | 38.7 |
| S431 | 13.2 | L521 | 12 | RNase H domain | |
| F434 | 16.6 | S522 | 1.6 | P719 | 0.6 |
| Y435 | 5.8 | L525 | 6.9 | L723 | 13.5 |
| L437 | 0 | L526 | 0.1 | T727 | 2.7 |
| P438 | 0.3 | Q528 | 23.2 | L731 | 0 |
| L439 | 1.0 | F529 | 0.4 | C734 | 0 |
| H440 | 24.1 | A532 | 0 | F735 | 0 |
| P441 | 0 | I533 | 1.3 | P761 | 28 |
| A443 | 0 | V536 | 0.1 | W762 | 11.1 |
| M444 | 0 | R539 | 8.6 | L763 | 5.4 |
| P445 | 7.1 | A540 | 1.2 | L764 | 0 |
| L447 | 0.3 | F541 | 4.2 | C766 | 1.3 |
| V449 | 0 | V570 | 0 | A767 | 0 |
| G450 | 0 | F573 | 0.4 | A768 | 0 |
| S452 | 0.6 | L574 | 0.6 | W770 | 21.3 |
| G453 | 0 | L577 | 0.3 | I771 | 10.5 |
| L454 | 0.6 | I579 | 1.3 | S819 | 18.9 |
| Y457 | 0.7 | H580 | 34.3 | R829 | 32.6 |
| V458 | 12.5 | C618 | 0.1 | ||
| A459 | 48.1 | L622 | 0 | ||
The T3 motif in the TP domain and the RT1 motif in the RT domain were previously found to form two parts of a discontinuous RNA binding motif in HBV and DHBV that is obscured when P is initially translated but that promotes binding of ε to P upon being exposed by cellular chaperones (Figure 1b). 16 , 20 , 21 , 49 , 50 Despite being separated by 207 residues, the T3 and RT1 motifs in the models formed a nearly contiguous motif wrapping over the fingers subdomain and into the interior face of the RT domain (Figure 5c). All the binding poses for ε (Figure S29) had at least one contact between ε and RT1, but few contacts were found with T3. A strand of the TP domain overlaid the RT1 motif near where it contacts T3, and one of the few structured portions of the spacer domain obscured most of T3. Comparing the full P model to previously identified conformations for P implies the AlphaFold model may approximate the closed complex 16 initially formed upon translation of P that cannot bind specifically to ε (Figure 1b).
Further support for the hepadnaviral P fold comes from the DHBV P model (Figure S16; see below). Partial proteolysis revealed that E17 or E18, E369, E370, E371, D468, E555, E565, E571, and E572 are protease accessible, and that D468 is inaccessible. 21 , 23 The location of these 11 residues on the DHBV P model is consistent with these observations. E164, E176, and E199 are resistant to proteolysis in the absence of a chaperone‐mediated conformational change but become sensitive upon chaperone action. 21 E164 is partially obscured in a groove but could be exposed by minor structural changes, and E199 is buried but could be exposed by shifting the helix in which it is located. These observations are consistent with the model reflecting the closed complex (Figure 1b). E176 is at the end of an α‐helix that could reduce its proteolytic sensitivity. Remodeling that helix upon chaperone activation is plausible because E176 is part of the T3 motif that participates in chaperone‐dependent ε binding. E106 and E124 are protease resistant 21 but were exposed in the model. E124 is in an α‐helix that could reduce its sensitivity to proteolysis, but E106 is in a loop. The E106 result indicates either a weakness in the model or that the site is obscured by the spacer or chaperones in the native holoenzyme complex. The epitope for monoclonal antibody 11 (residues 53–59) is constitutively exposed, and the epitopes for monoclonal antibodies 5 (residues 141–147) and 6 (residues 191–197) are obscured before the chaperone‐mediated conformational change leading to formation of the open complex (Figure 1b). 16 , 21 The antibody 11 epitope is fully exposed in the DHBV P model, the antibody 6 epitope is in a deep groove that could be exposed by shifting the C‐terminus of the spacer, and the spacer domain may overlay the antibody 5 epitope and be removed by chaperone activation.
We next evaluated the ability of the P model to provide mechanistic explanations for the effects of mutations affecting HBV RNA binding, DNA priming, or RNase H activity. We introduced these mutations into the full‐length genotype B model, re‐energy minimized the wild‐type and mutant models using the Schrödinger OPLS4 force field, and identified polar interactions involving the residues. Limitations to the interpretation of the mutant phenotypes were anticipated because the extent of the functional limitations imposed on P by the closed conformation that the model appears to reflect is unknown.
Four sets of mutations in the TP that impair RNA binding and/or RNA packaging into viral capsids can be explained by the positions of the mutated residues (Figure 6). First, R105A is deficient in RNA packaging, R114E reduces RNA binding and packaging, and both mutations ablate protein priming and DNA synthesis [R105 50 , 51 ; R114 24 ]. The model places these residues close to each other on the outside of the fingers subdomain (Figure 6a,b), and docking studies predict that they may form salt bridges with the sugar‐phosphate backbone of ε (Figures 7a and S30d,e). Mutating these residues removes these interactions, which would impair ε binding and subsequent DNA priming. Second, K130 in the TP domain is predicted to form hydrogen bonds with D391 and V373 in the RT1 motif, and to form a salt bridge with D391 (Figures 6a and 7b). It also binds to ε in docking pose 2 (Figure S29). K130L ablates these interactions (Figure S30f) and is defective in packaging RNA into viral capsids. 52 This would be expected if the orientation, flexibility, and/or exposure of RT1 were changed by disrupting K130:D391 binding, or alternatively, if K130 works together with RT1 to promote ε binding. Third, Y147 and L148 in the TP domain are adjacent to the T3 motif, and Y147 hydrogen bonds to S386 and L388 in the RT1 motif (Figure 7c). Y147A/L148A removes these contacts (Figure S30g) and is deficient in RNA packaging into capsids for both HBV and DHBV, 16 , 49 as would be anticipated if disrupting these interactions between the TP and RT1 impaired T3 and/or RT1 function. Finally, T162 is two residues C‐terminal to the T3 motif in the TP domain. It is predicted to be the N‐terminal residue of a turn between two strands of a β‐sheet. It hydrogen bonds with R164 (the last residue in the turn) and S165 (Figure 7d). T162P is defective in RNA packaging into capsids, 53 which requires T3‐dependent binding to ε. 16 Inserting a proline disrupts the hydrogen bonds with R164 and S165 (Figure S30h), which may disrupt the local structure and/or impede conformational changes required for the specific binding of ε to P.
FIGURE 6.

Location of motifs and residues used for assessing validity of the predicted model for full‐length P. (a) Positions of the mutated residues in the context of the full‐length genotype B model. (b) Positions of R105 and R114 in the context of the P model docked to the ε RNA stem loop. This image reflects only one of the three docking poses proposed for ε (Figure S29) and these interactions should be interpreted conservatively
FIGURE 7.

Interactions among motifs and residues used for assessing validity of the model for P. Interactions among HBV motifs and key residues are shown in the genotype B model. (a–g) Networks of polar interactions involving the wild‐type residues at the indicated positions are indicated as colored dashed lines. Mutating the indicated residues yielded strong phenotypes as described in the text. Comparisons between the wild‐type and mutant interaction networks are in Figure S30. Red, sites that were mutated and used for validation analyses; Orange, ε RNA with phosphates interacting with P shown; Violet, RT1 motif; Cyan, T3 motif; Blue, Y63 priming residue; Magenta, YMDD RT active site motif; Light blue, D‐E‐D‐D motif. Blue dashed lines, salt bridges; Yellow dashed lines, hydrogen bonds
Two interactions predicted by the HBV P model can explain mutations defective in DNA priming. T60 is in the priming loop in the TP domain and is predicted to form hydrogen bonds with S64 in the TP domain and R653 in the RT domain (Figure 7e). T60E is defective in protein priming, but T60A has no phenotype. 24 The T60A mutation would lessen its affinity with R653 in the RT domain and retain flexibility of the priming loop (Figure S30a). In contrast, T60E would create a salt bridge with R653 (Figure S30b) which would increase affinity for R653 and impair folding of the priming loop downward, reducing Y63's ability to access the RT active site during priming. Second, W74 at the C‐terminus of the priming loop hydrogen bonds with T383 and D377 within the RT1 motif to help place Y63 above the RT active site (Figure 7f). W74A removes these interactions (Figure S30c) and is deficient in protein priming but not RNA packaging, as expected if the mutation disrupted a network of interactions anchoring the priming loop in a position where Y63 can shift downward into the RT active site during priming. 50 , 54
Finally, the RNase H‐deficient R714A mutation 32 was assessed. R714 hydrogen bonds to the backbone of R799. R799 in turn hydrogen bonds with A783 and D789 and forms salt bridges with D788 and D789, the last residue in the RNase H D‐E‐D‐D motif (Figure 7g). This region forms a loop in the HBV model but is an α‐helix in most other RNase H enzymes. R714A removes a hydrogen bond at the base of this interaction network (Figure S30i), which would increase the flexibility of the loop. Shifting the loop containing D788 would inhibit RNase H activity by altering the location or conformation of the D‐E‐D‐D motif.
2.6. Inhibitor docking
2.6.1. Reverse transcriptase active site
Plausibility of the RT active site in the models was evaluated by docking the active triphosphate forms of three nucleos(t)ide analog HBV reverse transcriptase inhibitors, Entecavir, Tenofovir, and Lamivudine, 5 into the active site in the genotype B and D catalytic core and full‐length P models. A double‐stranded DNA duplex from the HIV:substrate co‐crystal (PDB: 1RTD) was superimposed on the HBV RT active site in the four models, and the inhibitors were docked using Glide XP within the Schrödinger Maestro suite. The inhibitors bound in the expected poses in all four models, with the α and β phosphates coordinated by the active site Mg2+ ions and the inhibitor pairing with the template strand of the primer‐template (Figure 8a). Docking energies ranged from −8.86 to −13.3 kcal/mol (Table 4), corresponding to K d values of 0.32 to 0.00018 μM. The near identity of the AlphaFold model to the prior RT domain models (e.g., Das et al. 26 ) in the RT active site indicates that the ability of the older RT models to reveal mechanisms of drug resistance mutations is retained by the AlphaFold model of P.
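The K d values quoted for the docking energies are consistent with the standard thermodynamic relation ΔG = RT ln K d. The following is a minimal sketch of that conversion, assuming a temperature of 298.15 K and assuming the Glide XP score can be read as a binding free energy; neither the temperature nor the formula is stated in the text, and the function name is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def docking_score_to_kd_um(score_kcal_per_mol, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to a dissociation constant (micromolar).

    Assumption: the docking score is treated as delta G, with
    delta G = R * T * ln(Kd) and Kd expressed relative to a 1 M standard state.
    """
    kd_molar = math.exp(score_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))
    return kd_molar * 1e6  # mol/L -> micromol/L


if __name__ == "__main__":
    # Boundary scores reported in the text for the RT and RNase H active sites.
    for score in (-8.86, -13.3, -5.1, -10.8):
        print(f"{score:7.2f} kcal/mol -> {docking_score_to_kd_um(score):.3g} uM")
```

Under these assumptions the boundary scores approximately reproduce the reported K d ranges. Docking scores are approximate rankings rather than measured affinities, so the converted values are best read as order-of-magnitude estimates.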
TABLE 4.
Docking scores for the most stable poses of compounds into the RT and RNase H active sites
| Compound a | Class b | gtB RT‐RNase H | gtB P | gtD RT‐RNase H | gtD P |
|---|---|---|---|---|---|
| RT active site c | | | | | |
| Entecavir‐TP | NA | −13.31 | −13.01 | −12.92 | −10.42 |
| Tenofovir‐TP | NA | −11.05 | −8.86 | −9.27 | −12.96 |
| Lamivudine‐TP | NA | −11.76 | −8.94 | −9.61 | −11.31 |
| RNase H active site c | | | | | |
| 208 | HPD | −10.82 | −9.45 | −9.58 | −9.38 |
| A25 | HPD | −10.69 | −9.71 | −9.92 | −10.74 |
| 110 | αHT | −7.88 | −7.78 | −7.64 | −7.80 |
| 404 | αHT | −8.64 | −8.72 | −8.31 | −8.47 |
| 12 | HNO | −7.44 | −8.12 | −5.10 | −7.46 |
| 1,073 | HNO | −7.72 | −7.73 | −7.39 | −7.64 |
2.6.2. Ribonuclease H active site
The HBV RNase H active site was validated by docking inhibitors into the active site in the genotype B and D catalytic core and full‐length models using Glide XP. Inhibitors were α‐hydroxytropolones (compounds 110 and 404), N‐hydroxypyridinediones (208 and A25), and N‐hydroxynaphthyridinones (12 and 1,073). 8 , 28 , 55 , 56 , 57 , 58 All four models docked HBV RNase H inhibitors in poses where the compounds coordinated the Mg2+ ions via their trident of metal chelating atoms (Figure 8b), as predicted from the experimentally determined binding poses of similar compounds against the HIV RNase H 59 , 60 , 61 , 62 , 63 and the need for an intact metal chelating trident for the inhibitors to work against HBV. The compounds adopted many binding poses, as anticipated from their structural diversity, usually along the nucleic acid binding groove, with docking energies of −5.1 to −10.8 kcal/mol (Table 4), corresponding to K d values of 182 to 0.012 μM.
2.7. Effects of HBV's genetic diversity on the models
HBV has nine genotypes 2 with P proteins being 832–845 aa long and differing by 11–16% at the aa level (Table S1). To evaluate how these variations may affect P's structure, models for P proteins from all genotypes were generated using AlphaFold (Figures S7–S15) and compared to the genotype B model. All models had similar pLDDT score profiles, with the C‐terminal half of the TP domain and the catalytic core having high pLDDT values (~70–90), with much lower pLDDT values for the spacer domain and the N‐ and C‐termini. Superpositioning all full‐length models against the genotype B model revealed the same overall fold, with an average pairwise RMSD of 1.98 Å (1.67–2.16 Å) outside the highly variable spacer domain (Figure S25a–h). The key features of the genotype B model were conserved in all models, including cupping of the TP domain around the catalytic core, positioning the priming tyrosine residue above the YMDD active site motif, the orientation of the T3 and RT1 motifs, and the orientation of the RT and RNase H domains (Figures S7–S15).
2.8. Models of animal P proteins and non‐retroviral reverse transcriptases
To evaluate phylogenetic conservation of the HBV P fold, models were constructed for P proteins from animal hepadnaviruses 2 , 64 including woodchuck hepatitis virus (WHV, rodent), DHBV (bird), skink HBV (SkHBV, reptile), Tibetan frog HBV (TFHBV, amphibian), and tetra metahepadnavirus (TMDV, fish; Figure 9, Table S1, and Figures S16–S20). These viruses share the same four‐domain structure of P and replicate by protein‐primed reverse transcription. 64 We also generated a model for P from the rockfish nackednavirus (RNDV; Figure S21). Nackednaviruses and hepadnaviruses diverged about 400 million years ago, before the lineage that became the hepadnaviruses acquired its surface glycoproteins. 64 Consequently, nackednavirus P lacks the spacer domain that encodes part of the surface protein gene in a different reading frame. The priming mechanism is conserved between HBV, DHBV, and RNDV because their P proteins can prime DNA synthesis using ε RNAs from the other viruses with only minor nucleotide substitutions in ε. 65
FIGURE 8.

Computational docking of compounds into the RT and RNase H active sites. Compounds were docked using the Glide XP program within the Schrödinger molecular analysis suite. Magenta spheres, Mg2+ ions. (a) Triphosphate form of nucleos(t)ide analog drugs docked into the RT active site. Red, DNA duplex; Orange, Tenofovir triphosphate; Green, Entecavir triphosphate; Cyan, Lamivudine triphosphate. (b) RNase H inhibitors docked into the RNase H active site. Yellow, Compound A25; Cyan, 208; Pink, 404; Orange, 110; Blue, 1,073; Green, 12
All animal virus P proteins shared the same basic predicted structure, with the catalytic core of the RT and RNase H domains forming a globular unit featuring a contiguous nucleic acid binding groove, the TP domain cupping around the catalytic core, and the priming tyrosine residue poised over the RT active site (Figures 9 and S16–S21). All models had the A‐E DNA polymerase active site motifs and the D‐E‐D‐D RNase H motif in positions analogous to their locations in the HBV P model. The T3 and RT1 motifs in all six animal P protein models were in the same relative position as in HBV P. As expected, the spacer domain in the animal hepadnaviruses was unstructured, and RNDV had only a very short sequence linking the TP to the RT domain (Figure 9d). The genotype B HBV P model superimposed well with the RNDV RT model (RMSD = 2.76 Å; Figure S27d), with the largest differences being in the flexible priming loop, the tip of the fingers subdomain, and the position of the RNase H domain.
FIGURE 9.

Predicted folds of animal hepadnavirus and nackednavirus P proteins. (a) DHBV (avian hepadnavirus). (b) TFHBV (frog hepadnavirus). (c) TMDV (fish hepadnavirus). (d) RNDV (fish nackednavirus). Red, TP domain; Gray, spacer domain; Yellow, RT domain; Green, RNase H domain. The priming tyrosine residues, YMDD RT active site motif residues, and D‐E‐D‐D RNase H active site residues are shown as sticks. Putative priming residues are labeled. Magenta spheres, Mg2+ ions. The amino termini are marked with cyan spheres
These predicted structures imply that the hepadnaviral P fold existed before the split between the nackednaviruses and hepadnaviruses 400 million years ago. 64 The models also imply that the acquisition of an envelope in the hepadnaviruses included insertion of sequences for part of the surface glycoproteins into sequences encoding residues corresponding to 174–192 of the modern nackednavirus RNDV.
Finally, we compared the predicted hepadnaviral P fold to models we generated for reverse transcriptases from cauliflower mosaic virus (CaMV) 66 , 67 and the mitochondrial retrotransposon pFOXC3 from the fungus Fusarium oxysporum f. sp. matthiolae 68 , 69 (Figures S22 and S23). CaMV primes reverse transcription using a tRNA, as the retroviruses do, whereas the pFOXC3 RT primes reverse transcription using an unidentified tyrosine residue in a mechanism similar to that of the hepadnaviruses. The predicted folds for the CaMV and pFOXC3 RTs revealed a nucleic acid binding groove and YVDD (CaMV) and YADD (pFOXC3) motifs analogous to the hepadnaviral YMDD RT active site motif in their expected positions. Neither the CaMV nor the pFOXC3 model had domains analogous to the TP or spacer domains. The pFOXC3 model aligned to the HBV RT‐RNase H domains in the full‐length model with an RMSD of 4.52 Å (Figure S27e) and placed Y35 in a plausible position for priming reverse transcription. The pFOXC3 model did not have a readily identifiable RNase H domain or an identifiable D‐E‐D‐D RNase H active site motif. The HBV RT and RNase H domains aligned with the CaMV RT‐RNase H domains with an RMSD of 3.96 Å (Figure S27g). The CaMV RNase H active site was predicted to contain a D‐E‐E‐D motif rather than D‐E‐D‐D, and the CaMV N‐terminal aspartic proteinase formed a globular domain absent in the hepadnaviral proteins. Overall, these models imply that the hepadnaviral P fold is a feature of the nackednavirus/hepadnavirus lineages rather than being widespread among non‐retroviral eukaryotic RTs.
2.9. Limitations
There are two limitations to this analysis. First, all models are predictions. The models are well supported by the available biological and biochemical data, but they are not based on experimental structural data. Second, P is structurally dynamic, so there is no single structure that can represent all of its biologically relevant conformations. The full‐length P models correspond best to the closed conformation, 16 but they could represent an average of multiple conformations adopted by P.
2.10. Utility of the models
HBV P has resisted empirical structural analyses for decades due to intractable protein production problems, so these models provide the first structural approximation for the whole enzyme. They reveal an orientation of the TP domain relative to the rest of the enzyme that makes clear predictions for ε binding, DNA priming, and conformational dynamics of P during viral replication. They enable formulation of detailed hypotheses regarding the mechanisms of DNA elongation during reverse transcription and how the RT and RNase H domains coordinate during reverse transcription. Finally, the models can support structure‐guided drug design against the RT and RNase H active sites, and they open the door to rational drug discovery against targets other than the enzymatic active sites.
3. MATERIALS AND METHODS
3.1. Sequences and protein structures
Protein sequences are in Dataset S1. Table S1 contains Genbank numbers or literature references for the sequences plus their pairwise identities with the genotype B reference sequence (AB554017). Appendix S1 lists the protein domain and motif boundaries employed. The Tibetan frog hepatitis B virus (TFHBV) sequence lacks an in‐frame ATG for P, so the start site was set to match the N‐terminus of HBV genotype B P.
3.2. Molecular modeling with AlphaFold
Sequences were folded using AlphaFold2 Advanced (https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb) with default parameters. Five models were generated for each sequence, and side chain positions for the model having the highest pLDDT value were refined by the Amber‐Relax module. The relaxed structures were used for all analyses. Predicted structure files in Protein Data Bank (PDB) format are in Dataset S2.
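The model selection step can be illustrated with a minimal sketch. AlphaFold writes per‐residue pLDDT values into the B‐factor column of its PDB output; ranking the five models by the mean pLDDT over Cα atoms is an assumption made here for illustration, since the text does not specify how the highest pLDDT value was computed. The function names are hypothetical.

```python
def mean_plddt(pdb_text):
    """Mean pLDDT over C-alpha atoms, read from the B-factor column of PDB text."""
    values = []
    for line in pdb_text.splitlines():
        # PDB ATOM records: atom name in columns 13-16, B-factor in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no C-alpha ATOM records found")
    return sum(values) / len(values)


def pick_best_model(models):
    """Return the name of the model with the highest mean pLDDT.

    `models` maps a model name to the full text of its PDB file.
    """
    return max(models, key=lambda name: mean_plddt(models[name]))
```

A caller would read each of the five PDB files into a dictionary keyed by model name and pass it to `pick_best_model`; the chosen model would then go on to relaxation.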
Hydrophobic surfaces on the HBV P genotype B model were analyzed using the Protein Surface Analyzer program within the Schrödinger suite. Residues at the interfaces between the TP domain and the catalytic core of HBV P were manually identified. Solvent accessibility of each residue was measured using the web server GETAREA (http://curie.utmb.edu/getarea.html). Residues with GETAREA ratio values >50% were considered solvent exposed, whereas residues with ratio values <20% were considered buried. The Schrödinger Protein Preparation Wizard was used to identify hydrogen bonds and salt bridges. Hydrogen bonds and salt bridges at the interface of the TP domain and the catalytic core of HBV polymerase were then manually identified.
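The solvent exposure thresholds above amount to a simple three‐way classification. A minimal sketch follows; the function name and the "intermediate" label for residues between the two cutoffs are assumptions, as the text names only the exposed and buried classes.

```python
def classify_exposure(ratio_percent, exposed_above=50.0, buried_below=20.0):
    """Classify a residue from its GETAREA ratio value (in percent).

    Ratios above 50% count as solvent exposed and ratios below 20% as buried,
    following the thresholds in the text. The "intermediate" label for values
    in between is a placeholder introduced for this sketch.
    """
    if ratio_percent > exposed_above:
        return "exposed"
    if ratio_percent < buried_below:
        return "buried"
    return "intermediate"
```

Keeping the cutoffs as keyword arguments makes it easy to check how sensitive a surface exposure comparison is to the exact thresholds.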
3.3. Superpositions
Superpositioning was done using Protein Structure Alignment in the Maestro module of the Schrödinger suite (Schrödinger LLC, New York, NY). Superpositioning employed no constraints, except that the HBV P and catalytic core superpositions with HIV RT‐RNase H, Ty3 RT, and CaMV RT were done by aligning selected residues in the fingers, palm, and thumb subdomains of the HBV RT domain because HBV P lacks the linker region. Graphs of the RMSD distributions were generated using the STAMP Structural Alignment utility in the Visual Molecular Dynamics (VMD) software, followed by plotting each residue's RMSD value against the amino acid alignment generated by the superpositioning. In some cases, nonaligned residues were deleted before structure alignment to improve the alignment of similar regions between two structures.
3.4. Nucleic acid docking
A 21 bp heteroduplex (PDB: 4B3Q) and a 24 bp heteroduplex (PDB: 6BSH) were extracted from HIV RT‐RNase H co‐crystals and docked into HBV P using the PIPER program in the BioLuminate module of the Schrödinger suite. Protonation state was set using Epik at pH 7.5 ± 2, hydrogen bonds were assigned employing PROPKA at pH 7.5, and energy minimization was done with the OPLS4 force field. Models were docked with both substrates separately without constraints. Final poses were selected based on substrate placement into the binding channel between the active sites of the RT and RNase H domains while considering the engagement of RNA or DNA strands with active site residues. Similarly, an NMR structure for ε (PDB: 6VAR) was docked into HBV P. Protein and nucleic acid structures were prepared with the Protein Preparation wizard in the BioLuminate module.
3.5. Compound docking
Compound docking into the models employed Glide XP in the Schrödinger Maestro module. Triphosphates for the nucleos(t)ide analogs Entecavir, Lamivudine, and Tenofovir were extracted from HIV‐1 RT‐RNase H:inhibitor co‐crystals (PDB: 5XN1, 6KDJ, and 3JSM) for docking. RNase H inhibitors were selected for structural diversity. Ligands were prepared with LigPrep (Schrödinger LLC), where compound energy minimizations were done with the OPLS4 force field, protonation states of the ligands were defined using Epik at pH 7.5 ± 2 to set the ionization state of the metal binding motifs, and compounds were desalted and tautomerized while retaining chirality. The RT‐RNase H and full‐length P models containing Mg2+ ions in the appropriate ionization states were prepared with the Schrödinger Protein Preparation wizard as described above. A 20 bp double‐stranded DNA was placed into the RT active site of the models for nucleos(t)ide analog docking by superposition of the HIV RT‐RNase H:dsDNA co‐crystal (PDB: 1RTD). The RT active site docking grid was defined by placing Entecavir‐triphosphate from the HIV RT‐RNase H co‐crystal structure (PDB: 5XN1) into the RT active site of the HBV models by superposing the YMDD motifs. The RNase H docking grid was defined by placing β‐thujaplicinol into the active site by superposing the D‐E‐D‐D motif from the HIV RNase H:β‐thujaplicinol co‐crystal (PDB: 3K2P) onto the HBV model. The centroids of the ligands in the active sites were used to create 10 Å receptor grids for docking. Docking employed Schrödinger Glide XP with default settings.
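The grid definition step, centering a receptor grid on the centroid of a placed ligand, can be sketched in a few lines. This is an illustration only and not the Glide implementation: it assumes that the 10 Å figure is the edge length of a cubic box, which the text does not specify, and the function names and any coordinates passed in are hypothetical.

```python
def centroid(coords):
    """Unweighted geometric center of a list of (x, y, z) atom coordinates in Å."""
    n = len(coords)
    if n == 0:
        raise ValueError("at least one atom coordinate is required")
    return tuple(sum(atom[i] for atom in coords) / n for i in range(3))


def cubic_grid_bounds(coords, edge=10.0):
    """Return (center, lower corner, upper corner) of a cubic box around a ligand.

    Assumption: `edge` is the full side length of the box in Å, centered on
    the ligand centroid.
    """
    center = centroid(coords)
    half = edge / 2.0
    lower = tuple(c - half for c in center)
    upper = tuple(c + half for c in center)
    return center, lower, upper
```

In practice the ligand coordinates would come from the superposed Entecavir‐triphosphate or β‐thujaplicinol pose, and the resulting center would be supplied to the docking program's own grid generation tool.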
3.6. Mutations employed for model validation
Mutations to HBV P were identified from the literature, especially the study by Clark et al. 24 The mutations were mapped onto the full‐length HBV genotype B P model (genotype B numbers are reported, which may differ from the source if a different genotype was used). Hydrogen bonds were identified using PyMOL's Find polar contacts function (h‐bond cutoff center = 4.0 Å); all predicted polar interactions assessed were ≤3.0 Å, implying moderate to strong interactions.
AUTHOR CONTRIBUTIONS
Razia Tajwar: Conceptualization (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (lead); project administration (equal); resources (equal); software (lead); supervision (supporting); validation (equal); visualization (lead); writing – original draft (supporting); writing – review and editing (equal). Daniel P. Bradley: Conceptualization (supporting); data curation (supporting); formal analysis (equal); methodology (supporting); resources (supporting); software (supporting); validation (equal); visualization (supporting); writing – original draft (supporting); writing – review and editing (equal). Nathan L. Ponzar: Data curation (supporting); formal analysis (supporting); methodology (supporting); resources (supporting); visualization (supporting); writing – original draft (supporting). John Tavis: Conceptualization (lead); data curation (equal); formal analysis (equal); funding acquisition (lead); investigation (equal); methodology (equal); project administration (lead); resources (equal); software (equal); supervision (lead); validation (equal); visualization (equal); writing – original draft (lead); writing – review and editing (lead).
CONFLICT OF INTEREST
The authors have no conflicts to declare.
Supporting information
Appendix S1. Domain and motif boundaries used in these analyses
Figures S1–S23. Images and key quality metrics for all predicted models
Figures S24–S27. Images of the superpositions used in these analyses
Figures S28–S29. Detailed images for the docking experiments
Figure S30. Comparisons of interaction networks for the wild‐type and mutant residues used to validate the HBV P model
Table S1. Pairwise identities of P proteins relative to P from HBV genotype B
Dataset S1. Sequences employed in modeling
Dataset S2. Protein Data Bank (PDB) format files for all predicted models.
ACKNOWLEDGEMENTS
This work was supported by NIH grants R01 AI150610 and R01 AI148362 and Department of Defense grant W81XWH‐18‐1‐0307 to John E. Tavis. We thank Dr John Kennell for input on the pFOX‐C3 reverse transcriptase, Dr Juan Villa for early modeling work on the HBV RNase H, and the AlphaFold team for enabling this study. A version of this article is posted on the BioRxiv preprint server under the Attribution‐NonCommercial 4.0 International (CC BY‐NC 4.0) license.
Tajwar R, Bradley DP, Ponzar NL, Tavis JE. Predicted structure of the hepatitis B virus polymerase reveals an ancient conserved protein fold. Protein Science. 2022;31(10):e4421. 10.1002/pro.4421
Review Editor: Nir Ben‐Tal
Funding information Department of Defense, Grant/Award Number: W81XWH‐18‐1‐0307; National Institutes of Health, Grant/Award Numbers: R01 AI148362, R01 AI150610
REFERENCES
- 1. Seeger C, Zoulim F, Mason WS. Hepadnaviridae. Fields virology, Vol 2: DNA viruses. Philadelphia, PA: Wolters Kluwer, 2021; p. 640–682.
- 2. Glebe D, Goldmann N, Lauber C, Seitz S. HBV evolution and genetic variability: Impact on prevention, treatment and development of antivirals. Antiviral Res. 2021;186:104973.
- 3. Polaris Observatory Collaborative. Global prevalence, treatment, and prevention of hepatitis B virus infection in 2016: A modelling study. Lancet Gastroenterol Hepatol. 2018;3:383–403.
- 4. Trepo C, Chan HL, Lok A. Hepatitis B virus infection. Lancet. 2014;384:2053–2063.
- 5. Pierra Rouviere C, Dousson CB, Tavis JE. HBV replication inhibitors. Antiviral Res. 2020;179:104815.
- 6. Ghany MG. Current treatment guidelines of chronic hepatitis B: The role of nucleos(t)ide analogues and peginterferon. Best Pract Res Clin Gastroenterol. 2017;31:299–309.
- 7. Poch O, Sauvaget I, Delarue M, Tordo N. Identification of four conserved motifs among the RNA‐dependent polymerase encoding elements. EMBO J. 1989;8:3867–3874.
- 8. Tavis JE, Cheng X, Hu Y, et al. The hepatitis B virus ribonuclease H is sensitive to inhibitors of the human immunodeficiency virus ribonuclease H and integrase enzymes. PLoS Pathog. 2013;9:e1003125.
- 9. Yang W, Steitz TA. Recombining the structures of HIV integrase, RuvC and RNase H. Structure. 1995;3:131–134.
- 10. Nowotny M. Retroviral integrase superfamily: The structural perspective. EMBO Rep. 2009;10:144–151.
- 11. Villa JA, Pike DP, Patel KB, et al. Purification and enzymatic characterization of the hepatitis B virus ribonuclease H, a new target for antiviral inhibitors. Antiviral Res. 2016;132:186–195.
- 12. Tavis JE, Badtke MP. Hepadnaviral genomic replication. In: Cameron CE, Goette M, Raney KD, editors. Viral genome replication. New York: Springer Science+Business Media, LLC, 2009; p. 129–143.
- 13. Beck J, Nassal M. Hepatitis B virus replication. World J Gastroenterol. 2007;13:48–64.
- 14. Hu J, Toft DO, Seeger C. Hepadnavirus assembly and reverse transcription require a multi‐component chaperone complex which is incorporated into nucleocapsids. EMBO J. 1997;16:59–68.
- 15. Hu J, Flores D, Toft D, Wang X, Nguyen D. Requirement of heat shock protein 90 for human hepatitis B virus reverse transcriptase function. J Virol. 2004;78:13122–13131.
- 16. Badtke MP, Khan I, Cao F, Hu J, Tavis JE. An interdomain RNA binding site on the hepadnaviral polymerase that is essential for reverse transcription. Virology. 2009;390:130–138.
- 17. Lanford RE, Notvall L, Lee H, Beames B. Transcomplementation of nucleotide priming and reverse transcription between independently expressed TP and RT domains of the hepatitis B virus reverse transcriptase. J Virol. 1997;71:2996–3004.
- 18. Loeb DD, Hirsch RC, Ganem D. Sequence‐independent RNA cleavages generate the primers for plus strand DNA synthesis in hepatitis B viruses: Implications for other reverse transcribing elements. EMBO J. 1991;10:3533–3540.
- 19. Zhang Z, Tavis JE. The duck hepatitis B virus reverse transcriptase functions as a full‐length monomer. J Biol Chem. 2006;281:35794–35801.
- 20. Cao F, Badtke MP, Metzger LM, et al. Identification of an essential molecular contact point on the duck hepatitis B virus reverse transcriptase. J Virol. 2005;79:10164–10170.
- 21. Stahl M, Beck J, Nassal M. Chaperones activate hepadnavirus reverse transcriptase by transiently exposing a C‐proximal region in the terminal protein domain that contributes to epsilon RNA binding. J Virol. 2007;81:13354–13364.
- 22. Wang X, Hu J. Distinct requirement for two stages of protein‐primed initiation of reverse transcription in hepadnaviruses. J Virol. 2002;76:5857–5865.
- 23. Lin L, Wan F, Hu J. Functional and structural dynamics of hepadnavirus reverse transcriptase during protein‐primed initiation of reverse transcription: Effects of metal ions. J Virol. 2008;82:5703–5714.
- 24. Clark DN, Flanagan JM, Hu J. Mapping of functional subdomains in the terminal protein domain of hepatitis B virus polymerase. J Virol. 2017;91:e01785‐16.
- 25. Buhlig TS, Bowersox AF, Braun DL, et al. Molecular, evolutionary, and structural analysis of the terminal protein domain of hepatitis B virus polymerase, a potential drug target. Viruses. 2020;12:570.
- 26. Das K, Xiong X, Yang H, et al. Molecular modeling and biochemical characterization reveal the mechanism of hepatitis B virus polymerase resistance to lamivudine (3TC) and emtricitabine (FTC). J Virol. 2001;75:4771–4779.
- 27. Xu X, Thai H, Kitrinos KM, et al. Modeling the functional state of the reverse transcriptase of hepatitis B virus and its application to probing drug‐protein interaction. BMC Bioinformatics. 2016;17(Suppl 8):280.
- 28. Li Q, Lomonosova E, Donlin MJ, et al. Amide‐containing alpha‐hydroxytropolones as inhibitors of hepatitis B virus replication. Antiviral Res. 2020;177:104777.
- 29. Potenza N, Salvatore V, Raimondo D, et al. Optimized expression from a synthetic gene of an untagged RNase H domain of human hepatitis B virus polymerase which is enzymatically active. Protein Expr Purif. 2007;55:93–99.
- 30. Hayer J, Rodriguez C, Germanidis G, et al. Ultradeep pyrosequencing and molecular modeling identify key structural features of hepatitis B virus RNase H, a putative target for antiviral intervention. J Virol. 2014;88:574–582.
- 31. Hyjek M, Figiel M, Nowotny M. RNases H: Structure and mechanism. DNA Repair (Amst). 2019;84:102672.
- 32. Ko C, Shin YC, Park WJ, Kim S, Kim J, Ryu WS. Residues Arg703, Asp777, and Arg781 of the RNase H domain of hepatitis B virus polymerase are critical for viral DNA synthesis. J Virol. 2014;88:154–163.
- 33. Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589.
- 34. Mariani V, Biasini M, Barbato A, Schwede T. lDDT: A local superposition‐free score for comparing protein structures and models using distance difference tests. Bioinformatics. 2013;29:2722–2728.
- 35. Donlin MJ, Szeto B, Gohara DW, Aurora R, Tavis JE. Genome‐wide networks of amino acid covariances are common among viruses. J Virol. 2012;86:3050–3063.
- 36. Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJ. The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc. 2015;10:845–858.
- 37. Das D, Georgiadis MM. The crystal structure of the monomeric reverse transcriptase from Moloney murine leukemia virus. Structure. 2004;12:819–829.
- 38. Sarafianos SG, Marchand B, Das K, et al. Structure and function of HIV‐1 reverse transcriptase: Molecular mechanisms of polymerization and inhibition. J Mol Biol. 2009;385:693–713.
- 39. Lanford RE, Kim YH, Lee H, Notvall L, Beames B. Mapping of the hepatitis B virus reverse transcriptase TP and RT domains by transcomplementation for nucleotide priming and by protein‐protein interaction. J Virol. 1999;73:1885–1893.
- 40. Beck J, Nassal M. Reconstitution of a functional duck hepatitis B virus replication initiation complex from separate reverse transcriptase domains expressed in Escherichia coli. J Virol. 2001;75:7410–7419.
- 41. Boregowda RK, Adams C, Hu J. TP‐RT domain interactions of duck hepatitis B virus reverse transcriptase in cis and in trans during protein‐primed initiation of DNA synthesis in vitro. J Virol. 2012;86:6522–6536.
- 42. Tavis JE, Ganem D. Expression of functional hepatitis B virus polymerase in yeast reveals it to be the sole viral protein required for correct initiation of reverse transcription. Proc Natl Acad Sci USA. 1993;90:4107–4111.
- 43. Beck J, Nassal M. Efficient Hsp90‐independent in vitro activation by Hsc70 and Hsp40 of duck hepatitis B virus reverse transcriptase, an assumed Hsp90 client protein. J Biol Chem. 2003;278:36128–36138.
- 44. Hu J, Toft D, Anselmo D, Wang X. In vitro reconstitution of functional hepadnavirus reverse transcriptase with cellular chaperone proteins. J Virol. 2002;76:269–279.
- 45. Huang H, Chopra R, Verdine GL, Harrison SC. Structure of a covalently trapped catalytic complex of HIV‐1 reverse transcriptase: Implications for drug resistance. Science. 1998;282:1669–1675.
- 46. Ha B, Larsen KP, Zhang J, et al. High‐resolution view of HIV‐1 reverse transcriptase initiation complexes and inhibition by NNRTI drugs. Nat Commun. 2021;12:2500.
- 47. Sarafianos SG, Das K, Tantillo C, et al. Crystal structure of HIV‐1 reverse transcriptase in complex with a polypurine tract RNA:DNA. EMBO J. 2001;20:1449–1461.
- 48. LeBlanc RM, Kasprzak WK, Longhini AP, et al. Structural insights of the conserved “priming loop” of hepatitis B virus pre‐genomic RNA. J Biomol Struct Dyn. 2021;1–13. 10.1080/07391102.2021.1934544
- 49. Cao F, Jones S, Li W, et al. Sequences in the terminal protein and reverse transcriptase domains of the hepatitis B virus polymerase contribute to RNA binding and encapsidation. J Viral Hepat. 2014;21:882–893.
- 50. Jones SA, Clark DN, Cao F, Tavis JE, Hu J. Comparative analysis of hepatitis B virus polymerase sequences required for viral RNA binding, RNA packaging, and protein priming. J Virol. 2014;88:1564–1572.
- 51. Shin YC, Park S, Ryu WS. A conserved arginine residue in the terminal protein domain of hepatitis B virus polymerase is critical for RNA pre‐genome encapsidation. J Gen Virol. 2011;92:1809–1816.
- 52. Roychoudhury S, Faruqui AF, Shih C. Pregenomic RNA encapsidation analysis of eleven missense and nonsense polymerase mutants of human hepatitis B virus. J Virol. 1991;65:3617–3624.
- 53. Blum HE, Galun E, Liang TJ, von Weizsacker F, Wands JR. Naturally occurring missense mutation in the polymerase gene terminating hepatitis B virus replication. J Virol. 1991;65:1836–1842.
- 54. Shin YC, Ko C, Ryu WS. Hydrophobic residues of terminal protein domain of hepatitis B virus polymerase contribute to distinct steps in viral genome replication. FEBS Lett. 2011;585:3964–3968.
- 55. Edwards TC, Mani N, Dorsey B, et al. Inhibition of HBV replication by N‐hydroxyisoquinolinedione and N‐hydroxypyridinedione ribonuclease H inhibitors. Antiviral Res. 2019;164:70–80.
- 56. Edwards TC, Lomonosova E, Patel JA, et al. Inhibition of hepatitis B virus replication by N‐hydroxyisoquinolinediones and related polyoxygenated heterocycles. Antiviral Res. 2017;143:205–217.
- 57. Lu G, Lomonosova E, Cheng X, et al. Hydroxylated tropolones inhibit hepatitis B virus replication by blocking the viral ribonuclease H activity. Antimicrob Agents Chemother. 2015;59:1070–1079.
- 58. Chauhan R, Li Q, Woodson ME, Gasonoo M, Meyers MJ, Tavis JE. Efficient inhibition of hepatitis B virus (HBV) replication and cccDNA formation by HBV ribonuclease H inhibitors during infection. Antimicrob Agents Chemother. 2021;65:e0146021.
- 59. Lansdon EB, Liu Q, Leavitt SA, et al. Structural and binding analysis of pyrimidinol carboxylic acid and N‐hydroxy quinazolinedione HIV‐1 RNase H inhibitors. Antimicrob Agents Chemother. 2011;55:2905–2915.
- 60. Kirschberg TA, Balakrishnan M, Squires NH, et al. RNase H active site inhibitors of human immunodeficiency virus type 1 reverse transcriptase: Design, biochemical activity, and structural information. J Med Chem. 2009;52:5781–5784.
- 61. Himmel DM, Maegley KA, Pauly TA, et al. Structure of HIV‐1 reverse transcriptase with the inhibitor beta‐Thujaplicinol bound at the RNase H active site. Structure. 2009;17:1625–1635.
- 62. Chung S, Himmel DM, Jiang JK, et al. Synthesis, activity, and structural analysis of novel alpha‐hydroxytropolone inhibitors of human immunodeficiency virus reverse transcriptase‐associated ribonuclease H. J Med Chem. 2011;54:4462–4473.
- 63. Billamboz M, Bailly F, Lion C, et al. 2‐hydroxyisoquinoline‐1,3(2H,4H)‐diones as inhibitors of HIV‐1 integrase and reverse transcriptase RNase H domain: Influence of the alkylation of position 4. Eur J Med Chem. 2011;46:535–546.
- 64. Lauber C, Seitz S, Mattei S, et al. Deciphering the origin and evolution of hepatitis B viruses by means of a family of non‐enveloped fish viruses. Cell Host Microbe. 2017;22:387–399.e6.
- 65. Beck J, Seitz S, Lauber C, Nassal M. Conservation of the HBV RNA element epsilon in nackednaviruses reveals ancient origin of protein‐primed reverse transcription. Proc Natl Acad Sci USA. 2021;118:e2022373118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66. Haas M, Bureau M, Geldreich A, Yot P, Keller M. Cauliflower mosaic virus: Still in the news. Mol Plant Pathol. 2002;3:419–429. [DOI] [PubMed] [Google Scholar]
- 67. Volovitch M, Modjtahedi N, Yot P, Brun G. RNA‐dependent DNA polymerase activity in cauliflower mosaic virus‐infected plant leaves. EMBO J. 1984;3:309–314. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68. Walther TC, Kennell JC. Linear mitochondrial plasmids of F. oxysporum are novel, telomere‐like retroelements. Mol Cell. 1999;4:229–238. [DOI] [PubMed] [Google Scholar]
- 69. Galligan JT, Marchetti SE, Kennell JC. Reverse transcription of the pFOXC mitochondrial retroplasmids of Fusarium oxysporum is protein primed. Mob DNA. 2011;2:1. [DOI] [PMC free article] [PubMed] [Google Scholar]
Supplementary Materials
Appendix S1. Domain and motif boundaries used in these analyses
Figures S1–S23. Images and key quality metrics for all predicted models
Figures S24–S27. Images of the superpositions used in these analyses
Figures S28–S29. Detailed images for the docking experiments
Figure S30. Comparisons of interaction networks for the wild‐type and mutant residues used to validate the HBV P model
Table S1. Pairwise identities of P proteins relative to P from HBV genotype B
Dataset S1. Sequences employed in modeling
Dataset S2. Protein Data Bank (PDB) files for all predicted models
