ABSTRACT
Viruses with large, double-stranded DNA genomes captured the majority of their genes from their hosts at different stages of evolution. The origins of many virus genes are readily detected through significant sequence similarity with cellular homologs. In particular, this is the case for virus enzymes, such as DNA and RNA polymerases or nucleotide kinases, that retain their catalytic activity after capture by an ancestral virus. However, a large fraction of virus genes have no readily detectable cellular homologs, meaning that their origins remain enigmatic. We explored the potential origins of such proteins that are encoded in the genomes of orthopoxviruses, a thoroughly studied virus genus that includes major human pathogens. To this end, we used AlphaFold2 to predict the structures of all 214 proteins that are encoded by orthopoxviruses. Among the proteins of unknown provenance, structure prediction yielded clear indications of origin for 14 of them and validated several inferences that were previously made via sequence analysis. A notable emerging trend is the exaptation of enzymes from cellular organisms for nonenzymatic, structural roles in virus reproduction that is accompanied by the disruption of catalytic sites and by an overall drastic divergence that precludes homology detection at the sequence level. Among the 16 orthopoxvirus proteins that were found to be inactivated enzyme derivatives are the poxvirus replication processivity factor A20, which is an inactivated NAD-dependent DNA ligase; the major core protein A3, which is an inactivated deubiquitinase; F11, which is an inactivated prolyl hydroxylase; and more similar cases. For nearly one-third of the orthopoxvirus virion proteins, no significantly similar structures were identified, suggesting exaptation with subsequent major structural rearrangement that yielded unique protein folds.
KEYWORDS: AlphaFold2, exaptation, orthopoxviruses, protein structure analysis, virus evolution
INTRODUCTION
Viruses are ubiquitous, obligatory, intracellular parasites of all life forms. Virus genome sizes vary over 3 orders of magnitude, from about two kilobases comprising a single gene to more than two megabases, consisting of thousands of genes (1). Virus genes comprise three major functional classes: (i) genes encoding components of virus replication machinery, (ii) genes for structural components of virions and proteins involved in morphogenesis, and (iii) genes encoding proteins involved in virus-host interactions. The fractions of the genes in these classes depend on the size of the virus genome. Viruses with small genomes, in particular, most of the RNA viruses and all ssDNA viruses, primarily encompass genes of the first two classes, with few genes dedicated to interactions with the hosts. In contrast, in viruses with large dsDNA genomes, many of the genes are involved in various aspects of virus-host interactions, particularly in counterdefense. The first two gene classes include a small set of virus hallmark genes that are conserved in a broad range of viruses (2). Some of the virus hallmark genes that encode proteins that are involved in replication appear to originate in a primordial pool of genetic elements, whereas hallmark genes encoding major virion components can be traced to ancient acquisitions of cellular genes (3). For many other genes from all three functional classes, more recent cellular ancestry is readily traceable through significant sequence similarity with the apparent cellular ancestors. However, the provenance of numerous other virus genes remains obscure because no cellular homologs are detectable, even with the most sensitive methods for protein sequence comparisons.
Protein structures are uniformly more strongly conserved in evolution than are sequences, meaning that structural comparison can illuminate the origin and function of many proteins that remain intractable at the sequence level. However, until recently, the utility of structural comparisons for the study of protein evolution remained severely hampered by the technical difficulty as well as by the time and labor costs of protein structure determination. The recent revolution in protein structure prediction that has been ushered in by the new artificial intelligence-based methods, namely, AlphaFold and RoseTTAFold, has dramatically expanded the opportunities for detecting homologous relationships among proteins via the comparison of protein structure models to experimentally solved structures or other models (4, 5). For instance, recent benchmarking suggests that structural similarities now can be detected for half of the human proteins that have been considered “dark matter” (6, 7).
We were interested in exploring the potential of this new generation of protein structure prediction methods in uncovering the origins of the “dark matter” of virus genomes. We selected a thoroughly studied group of viruses with large (about 200 kb) dsDNA genomes, namely, the orthopoxviruses (ORPV), as the target. The ORPV include one of the most historically devastating human pathogens, variola virus, and the current major threat to human health, Mpox (monkeypox) virus, as well as vaccinia virus (VACV), the major vaccine that was instrumental in the eradication of smallpox and one of the most popular model systems in virology (8, 9).
Together, the ORPV possess 214 genes (OPGs), of which subsets have been differentially lost in different virus lineages (10). Over decades of study, each of these genes has been extensively analyzed computationally, and, for the most part, experimentally, as well (8, 9). Nevertheless, for nearly half of the ORPV proteins, no homologs from cellular life forms could be detected, even using the most sensitive methods of sequence analysis, and, thus, the provenance of these genes remained enigmatic (10). We employed AlphaFold2 to predict the structures of all ORPV proteins and compared the resulting models to the databases of experimentally solved protein structures as well as precomputed AlphaFold2 models. This analysis identified apparent cellular ancestors for 14 ORPV genes that, until now, belonged to the dark matter. An emerging trend is the exaptation of host enzymes for nonenzymatic, structural roles in virus reproduction, which is typically accompanied by the disruption of the catalytic sites and extensive divergence such that homologous relationships become undetectable at the sequence level. However, the origins of many OPGs remain indecipherable, even through structure comparison, suggesting the emergence of multiple unique protein folds during poxvirus evolution.
RESULTS
Structure predictions for orthopoxvirus proteins.
Structures were predicted for a representative sequence of each of the 214 OPGs using AlphaFold2 (4). High-quality models that were indicative of globular structure were obtained for 186 proteins (mean predicted local distance difference test [pLDDT] score of >70) (Table S1). For an additional 8 proteins, although the overall prediction was of comparatively low quality, individual globular domains had a mean pLDDT score of ≥70 (Table S1). These 194 high-quality models of orthopoxvirus proteins were then compared to the Protein Data Bank (PDB) database of protein structures and to AlphaFold.db, the database of precomputed models, using FoldSeek (fast but with relatively low sensitivity) (11) and Dali (relatively slow but with higher sensitivity) (12). Similar structures with significant scores (E < 0.001 for FoldSeek and/or z > 5 for Dali) were detected for 188 proteins. For 179 of these, the similar structures included cellular proteins, whereas for the remaining 9 proteins, only viral structures were retrieved (Table S1). All outputs were manually examined for the extent and quality of structure superposition and alignment, and, for uncharacterized proteins without matches above the threshold, hits with lower scores were assessed. The known homologous relationships (10) were accurately reflected by the structure predictions. In addition, predictions with significant scores were obtained for 14 OPGs for which no homologs were previously detected; these are considered in detail below.
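The selection criteria described above can be summarized in a few lines of code. The following Python sketch is illustrative only: the function names are hypothetical and not part of any pipeline reported here, and only the three thresholds (mean pLDDT > 70, FoldSeek E < 0.001, Dali z > 5) are taken from the text. The example values are the FoldSeek and Dali scores listed for OPG20, OPG31, and OPG98 in Table 1.

```python
from statistics import mean

# Thresholds quoted in the text; everything else in this sketch is an assumption.
PLDDT_MIN = 70.0        # mean pLDDT above this value -> high-quality model
FOLDSEEK_E_MAX = 1e-3   # FoldSeek hit is significant if E < 0.001
DALI_Z_MIN = 5.0        # Dali hit is significant if z > 5


def is_high_quality(plddt_per_residue):
    """Return True if the mean per-residue pLDDT exceeds the quality cutoff."""
    return mean(plddt_per_residue) > PLDDT_MIN


def is_significant(foldseek_evalue=None, dali_z=None):
    """Return True if either search tool reports a hit above its threshold.

    None means that the tool returned no hit for this protein.
    """
    foldseek_ok = foldseek_evalue is not None and foldseek_evalue < FOLDSEEK_E_MAX
    dali_ok = dali_z is not None and dali_z > DALI_Z_MIN
    return foldseek_ok or dali_ok


# Best-hit scores as listed in Table 1 (FoldSeek E value, Dali z score).
examples = {
    "OPG20": (2.4e-4, 13.1),
    "OPG31": (None, 11.8),
    "OPG98": (None, 4.9),
}

for opg, (evalue, z) in examples.items():
    print(opg, is_significant(evalue, z))
```

Under these cutoffs, a hit such as that for OPG98 (Dali z of 4.9, no FoldSeek match) falls just below the automatic threshold, which is the kind of case that was assessed manually.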
Comprehensive structural modeling and analysis for orthopoxvirus proteins (OPG) (Table S1, XLSX file, 0.05 MB).
This is a work of the U.S. Government and is not subject to copyright protection in the United States. Foreign copyrights may apply.
Exaptation of cellular proteins for viral functions.
Our analysis of the structural models of ORPV proteins identified previously undetected, likely origins from cellular ancestors for 14 OPGs (Table 1; Table S1). Although we show specific cellular proteins in comparison and superposition to the analyzed OPGs, these proteins and the corresponding organisms do not necessarily reflect the original source of the exapted protein but instead illustrate the respective fold and original function of the protein family. To emphasize this, we include three different cellular hits per OPG in the structural alignment (see Supplemental Material for full structural alignments and complete structure comparison results).
TABLE 1.
Exaptation of cellular proteins in orthopoxvirusesᶠ
| OPG | VACV gene | FoldSeek (match; E value) | Dali (match; z score, RMSD)ᵇ | Function in ORPV | Ancestral function/activity | Gain nodeᶜ |
|---|---|---|---|---|---|---|
| Recruitment of enzymes for structural roles associated with a loss of catalytic activityᵈ | | | | | | |
| 20 | C10L | 3itq 2.4e-4 | 3dkq 13.1 4.7 | Hypoxic response induction, inhibition of cytosolic DNA sensing | Prolyl 4-hydroxylase, 2OG-Fe(II) oxygenase family | 3 |
| 31 | C4L | None | 6n1f 11.8 2.9 | Hypoxic response induction, NFkB inhibition | Oxidoreductase, 2OG-Fe(II) oxygenase family | 10 |
| 55ᵃ | F11L | 6tex 1.7e-4 | 4j25 11.0 3.2 | RhoA-mDia signaling inhibitor | Prolyl 4-hydroxylase, 2OG-Fe(II) oxygenase family | 3 |
| 56 | F12L | P09804 1.1e-18 | P09804 17.2 16.7 | Wrapped virion component, promotes virus trafficking | Protein-primed DNA polymerase | 2 |
| 61 | F16L | Q04PB3 5.2e-4 | 6dgc 11.6 7.6 | Unknown, nucleolar localization | Serine recombinase | 2 |
| 64 | E2L | None | 7mis 14.5 5.8 | Promotes virus trafficking in complex with F12 | Pseudokinase domain of glutamylase | 2 |
| 74ᵃ | O1L | None | 4ykn 5.0 21.7 | Activator of the ERK1/2 pathway | Pseudokinase domain of glutamylase | 2 |
| 97ᵃ | L3L | None | 6g37 10.6 2.9 | Core component, involved in transcription | Haspin, Ser/Thr protein kinase | A |
| 98ᵃ | L4R | None | 2q5t 4.9 4.6 | Core component, involved in transcription | ADP-ribosyltransferase domain of Cholix cytotoxin | A |
| 115ᵃ | D3R | 3j8x 2.5e-5 | 5lt4 11.3 3.4 | Virion core protein | Kinesin | 2 |
| 120 | D8L | Q8HY33 1.4e-27 | 6b00 33.6 1.6 | MV membrane component, cell surface binding | Carbonic anhydrase | 10 |
| 129ᵃ | A3L | 4w4u 2.4e-5 | 2vhf 15.7 3.1 | Precursor of major virion core protein | Deubiquitinating enzyme, CYLD | A |
| 148ᵃ | A20R | B8D122 4.5e-9 | 5d1o 10.2 4.3 | DNA polymerase processivity factor | DNA (or RNA?) ligase | 2 |
| 165 | A37R | 6ax6 3.6e-4 | 3o1r 12.0 2.6 | Unknown, putatively involved in hypoxic response induction | 2OG-Fe(II) oxygenase family | 4 |
| 181ᵃ | A51R | None | 6n1f 11.5 3.2 | Unknown | Oxidoreductase, 2OG-Fe(II) oxygenase family | 5 |
| 198 | B12R | 6cqh 5.3e-26 | 2chl 27.1 2.1 | Immunomodulation? | Ser/Thr kinase | 11 |
| Recruitment of nonenzymatic proteinsᵉ | | | | | | |
| 77ᵃ | I1L | None | 7vdv 4.3 22.3 | DNA-binding core protein | SWIB domain involved in chromatin remodeling | A |
| 82ᵃ | I6L | None | 2j5a 8.4 2.9 | Telomere-binding protein | Ribosomal protein S6 | A |
| 127ᵃ | A2L | None | 5wh1 5.6 6.1 | Viral late transcription factor | Transcription factor TFIIB | A |
| 134ᵃ | A8R | None | 3h4c 9.3 4.3 | Viral intermediate transcription factor, small subunit | Transcription factor TFIIB | A |
| 150ᵃ | A23R | None | 1mp9 7.8 3.1 | Viral intermediate transcription factor, large subunit | TATA-binding protein | A |
| 185ᵃ | A56R | 2ij0 2.8e-7 | 3u83 15.2 11.3 | EV membrane protein, hemagglutinin | Plasma membrane receptor, paralog of poliovirus receptor | 10 |
ᵃ OPGs for which a structural prediction was made and the origin was inferred in this work.
ᵇ The ID of the best cellular hit against either the PDB or the AlphaFold2 database is indicated and followed by the Dali z score and RMSD.
ᶜ The numbers represent the nodes of the phylogenetic tree of chordopoxviruses (Fig. 2 in [10]) to which the gain of the respective gene by chordopoxviruses was mapped. A, ancestral, that is, acquired by the common ancestor of chordopoxviruses or earlier in evolution.
ᵈ Both previously reported cases of enzyme exaptation and cases that were discovered in this work are listed.
ᵉ Only cases that were discovered in this work are listed.
ᶠ IL1, interleukin 1; NFkB, nuclear factor kappa-light-chain-enhancer of activated B cells; ERK, extracellular signal-regulated kinase; MV, mature virion; TFIIB, transcription factor II B; EV, enveloped virus.
The results convincingly supported previous findings on the exaptation of host enzymes for nonenzymatic functions in poxviruses that were originally made by using sensitive methods for sequence analysis (Table 1). In particular, OPG56 (F12L), which is involved in virus egress from infected cells (13), was shown to be a derived, inactivated DNA polymerase, possibly of bacteriophage origin (14). Another poxvirus protein with yet unknown functions, namely, OPG61 (F16L), is an inactivated serine recombinase (15). Three homologous OPGs were shown to be inactive prolyl hydroxylases: OPG20 (C10L), OPG31 (C4L), and OPG165 (A37R) (10). The structure modeling in this work fully validated the sequence-based inferences for these proteins, with high structure similarity scores (Table S1) and convincing structural superpositions of the respective core domains (Fig. 1; Fig. S1). Additionally, structural alignments of OPG56 (F12L) and OPG61 (F16L) across OPGs from diverse Orthopoxvirus species and cellular homologs highlight the replacement or loss of key amino acid residues that are required for enzymatic activity and therefore underline the inactivation of those OPGs (Fig. 1B and D, respectively).
FIG 1.
Structural modeling validates cases of enzyme exaptation discovered through sequence similarity. (A) OPG56 (F12L) (blue, aa 205 to 634) and the best Dali hit, a DNA polymerase type-B from yeast (AF-P09804-F, green, aa 338 to 916). (B) Structural alignment of prototype OPG56 (Q, query), seven OPG members from diverse chordopoxviruses (blue, from top to bottom: VARV, MPXV Zaire 96-I-16, VACV, SFV, MyxV, SORPV, and MCV subtype 1), and three top hits found by Dali (green, DNA polymerases type-B from Kluyveromyces lactis [af2-db P09804], Claviceps purpurea [af2-db P22373], and Bacillus virus phi29 [2py5-B] [83]). Alignment parts corresponding to the ExoI motif, polymerase motif C, and KxY motif are highlighted by red boxes. The numbers indicate positions in the structural alignment. (C) OPG61 (F16L) (blue, aa 1 to 118 [of 231]) and the best cellular Dali hit, namely, the catalytic domain of a serine recombinase from Sulfolobus sp. L00 11 (archaea) (pdb 6dgc [84], green, aa 65 to 164 [of 211]). The exemplified catalytic subdomain DRLXR (aa 139 to 143) in the serine recombinase (magenta) and the mutated stretch KQISI (aa 73 to 77) in OPG61 (cyan) are highlighted. H, alpha-helix; E, beta sheet; L, loop. (D) Structural alignment of prototype OPG61 (top), seven OPG members from diverse chordopoxviruses (blue, ORPV species as indicated), and three Dali hits (green: an integrase from Lactococcus phage TP901-1 [3bvp-B] [85], an IS607-like serine recombinase from Sulfolobus sp. L00 11 [6dgc-D] [84], and a resolvase family site-specific recombinase from Streptococcus pneumoniae SP19-BS75 [3guv-A] [86]). Red bars highlight the catalytic centers (DRLxR motifs) of the serine recombinases.
Supplement to Fig. 1. OPG20 (C10L), OPG31 (C4L), and OPG165 (A37R) are homologs of hydroxylases. (A–C) Superposition of each OPG (blue) with the respective hydroxylase domain of a cellular protein (green). (A) OPG20 (aa 1 to 160) and the PKHD-type hydroxylase of Shewanella baltica (3dkq [DOI: 10.2210/pdb3dkq/pdb], aa 5 to 200). (B) OPG31 (aa 1 to 147) and an oxidoreductase of the 2OG-Fe(II) oxygenase family from Burkholderia pseudomallei (6n1f [DOI: 10.2210/pdb6n1f/pdb], aa 3 to 216). (C) OPG165 (aa 1 to 139) and a dioxygenase from E. coli (3o1r [96], aa 13 to 214). (D) Structural alignment of OPGs with hydroxylases as the best hit. Blue: OPGs from this figure and Fig. 2A and B, with OPG55 (F11L) as the query for pairwise structural alignment. From top to bottom: OPG55, OPG165, OPG20, OPG181 (A51R), and OPG31. Green: hydroxylases. From top to bottom: dioxygenase from E. coli (3o1r [96]), PKHD-type hydroxylase of Shewanella baltica (3dkq [DOI: 10.2210/pdb3dkq/pdb]), human lysyl hydroxylase LH3 (6tex [DOI: 10.2210/pdb6tex/pdb]), and an oxidoreductase from Burkholderia pseudomallei (6n1f [DOI: 10.2210/pdb6n1f/pdb]). (FIG S1, PDF file, 0.7 MB.)
We identified 8 additional cases of apparent exaptation accompanied by a loss of enzymatic activity that were not detectable at the sequence level, bringing the total number of detected cases of exaptation of enzymes accompanied by inactivation in ORPV to 16 (Table 1; Table S1). The inactivation of the recruited cellular enzymes is clearly indicated by the replacement of the amino acid residues that are essential for substrate binding and/or catalytic activity in the respective OPGs, which is apparent from the structure-guided multiple sequence alignments (Fig. 2, see details on the binding and catalytic sites in the figure legend). Two additional proteins, OPG55 (F11L) and OPG181 (A51R), were shown to be highly derived Fe-dependent dioxygenases, the protein superfamily that includes prolyl hydroxylases, three inactivated homologs of which were previously identified in chordopoxviruses, as discussed above (Table 1; Fig. 2A and B).
FIG 2.
Newly identified cases of enzyme exaptation for structural roles in poxviruses that are accompanied by disruption of the catalytic sites. In each panel (A–F), the left subpanel shows the superposition of the AlphaFold2 model of an OPG (blue) with a structurally similar cellular enzyme (green). Residues that are important for substrate binding and/or the catalytic activity of the cellular enzyme are highlighted in magenta (cellular enzyme), and the corresponding residues within the OPG are shown in gray. The right subpanel shows the structural alignment of the respective query OPG (Q, top), seven OPG members from diverse chordopoxviruses (blue), and three structural homologs found by Dali (green). The proteins are listed from top to bottom. The catalytic and binding amino acid residues are highlighted in red. The numbers on top of the alignment refer to amino acid positions in that alignment. (A) Left panel: OPG55 (F11L) (aa 34 to 220) and human lysyl hydroxylase LH3 (6tex [doi:10.2210/pdb6tex/pdb], aa 545 to 738). Highlights: Residues that are known to bind Fe2+ and to be essential for the catalytic activity within 2-OG dioxygenase enzyme members (H667, D669, and H719) and the corresponding residues in OPG55 (L136, L138, and V184). Right panel: Structural alignment of OPG55 (Q), OPG55 from CMLV, VARV, MPXV Zaire-96-I-16, VACV, SwPV, SORPV ELK, and LSDV NI-2490, and Dali hits (prolyl hydroxylase from Paramecium bursaria Chlorella virus 1 [5c5t-A] [87], PKHD-type hydroxylase from Psychrobacter sp. [af2-db A5WFM3], and human lysyl hydroxylase LH3 [6tex-A] [doi:10.2210/pdb6tex/pdb]). (B) Left panel: OPG181 (A51R) (aa 1 to 166) and Burkholderia pseudomallei oxidoreductase (6n1f [doi:10.2210/pdb6n1f/pdb]). Highlights: Residues that are known to bind Fe2+ and to be essential for the catalytic activity within 2-OG dioxygenase enzyme members (H134, D136, and H188) and the corresponding residues in OPG181 (N100, F102, and F150).
Right panel: Structural alignment of prototype OPG181 (Q), OPG181 from VARV, CMLV, MPXV Zaire 96-I-16, YLDV, SwPV, and LSDV NI-2490, and Dali hits (oxidoreductase from Burkholderia pseudomallei [6n1f-B] [doi:10.2210/pdb6n1f/pdb], Fe2OG dioxygenase domain-containing protein from Dictyostelium discoideum [af2-db Q54K28], and procollagen-proline 4-dioxygenase from Onchocerca volvulus [af2-db A0A2K6VMM0]). (C) Left panel: OPG148 (A20R) (aa 28 to 284) and a DNA ligase B from Klebsiella pneumoniae (af2-db B5XTF0, aa 61 to 406). Highlights: The key amino acids of motifs I (KxDG), IV (DG), and V (K) within the ligase adenylation domain appear in the structure from left to right. Right panel: Structural alignment of prototype OPG148 (Q), OPG148 from VACV, MPXV Zaire-96-I-16, VARV, MyxV, Orf virus, MCV subtype 1, and CRV, and Dali hits (DNA ligases from Klebsiella pneumoniae [af2-db B5XTF0], E. coli [af2-db B7M4D2], and Streptococcus pneumoniae [af2-db B1IBQ3]). (D) Left panel: OPG129 (A3L) and human CYLD USP domain (2vhf [88], aa 583 to 955), a deubiquitinating enzyme. Highlights: Residues of the catalytic triad within the USP domain (C601, H871, D889); OPG129 (L136, L138 and V184). Right panel: Structural alignment of prototype OPG129 (Q), OPG129 from VACV, MyxV, VARV, MPXV Zaire-96-I-16, CRV, Orf virus, and SGPV, and Dali hits (all CYLD USP domains, found in Danio rerio [af2-db E7F1X5], Homo sapiens [2vhf-B] [88], and Sporothrix schenckii [af2-db U7Q4Z6]). (E) Left panel: OPG115 (D3R) and a kinesin motor ATPase from S. cerevisiae (1f9u [89], aa 385 to 722). Highlights: The P-loop (Walker A motif GxxxxGK[S/T]), Switch1 (SSRSH), and Switch2 (DLAGSE) motifs within the ATPase. Right panel: Structural alignment of prototype OPG115 (Q), OPG115 from VACV, VARV, MPXV Zaire-96-I-16, MyxV, SFV, MCV subtype 1, and SORPV ELK, and Dali hits (all kinesins, from S. cerevisiae [1f9u-A] [89], Homo sapiens [5lt4-D] [90], and Drosophila melanogaster [5hnz-K] [91]).
(F) Left panel: OPG98 (L4R) and the best cellular Dali hit: Cholix toxin, an ADP-ribosyltransferase of V. cholerae (2q5t [92], aa 415 to 630). Highlights: The Cholix catalytic cluster (H460, Y493, Y504, E574, E581). Right panel: Structural alignment of prototype OPG98 (Q), OPG98 from VACV, VARV, MPXV Zaire-96-I-16, Orf virus, MyxV, MCV subtype 1, and CRV, and Dali hits (all ADP-ribosyltransferase toxins, from P. aeruginosa [af2-db P11439] and V. cholerae [2q5t-A] [92] and [3ki7-A] [doi:10.2210/pdb3ki7/pdb]). The residues of the catalytic cluster are highlighted in red. P11439 contains an additional site that is highlighted in blue (S474).
Especially notable is the case of OPG148 (A20R), which showed highly significant structural similarity to DNA ligases, particularly bacterial NAD-dependent ones (Fig. 2C). Although NAD-dependent DNA ligases represent the majority of hits within the PDB and AlphaFold2 databases, there are several hits against both ATP-dependent DNA and RNA ligases with comparable Dali z scores of approximately 8 to 10, including an RNA ligase as the best cellular hit within PDB (Table 1). Hence, although NAD-dependent DNA ligases seem to be the most likely candidates for the ancestor of OPG148, given their preponderance among the similar structures, other ligases cannot be ruled out as the source of this exaptation. The VACV DNA polymerase holoenzyme consists of the DNA polymerase itself (OPG71, E9L) and two accessory subunits that function as processivity factors, OPG116 (D4R) and OPG148 (16, 17). Notably, both of these proteins are derived from host enzymes that are involved in DNA replication and/or repair, namely, uracil DNA glycosylase (UDG; OPG116) (18) and, as determined here, DNA ligase (OPG148). However, these proteins represent two contrasting modes of exaptation. OPG116 retains high similarity to eukaryotic UDGs and the corresponding enzymatic activity, which, however, is not required for the function of this protein in VACV DNA replication in cell culture (19), although catalytic site mutations reduce virus reproduction in quiescent cells and attenuate virulence (20). Conversely, in OPG148, the ligase catalytic site is disrupted (Fig. 2C), and the protein sequence diverged beyond recognition such that structure comparison was essential for inferring the origin of this virus protein. While the manuscript was under revision, the structural similarities between OPG148 and ligases were reported, and, moreover, the monkeypox virus DNA polymerase holoenzyme has been experimentally shown to lack ligase activity (21).
The ligase that gave rise to OPG148 might have been captured by an ancestral chordopoxvirus from an external source, such as a bacterium, assuming that the ancestor was a NAD-dependent DNA ligase. However, given that evolutionary reconstructions suggest that the ancestor of chordopoxviruses encoded a NAD-dependent ligase that was subsequently replaced by an ATP-dependent one (22), an alternative scenario of “intramural” exaptation (23) appears plausible. Under this scenario, OPG148 evolved from a duplication of the ancestral NAD-dependent ligase, and this was followed by inactivation, whereas the original, active NAD-dependent ligase was subsequently lost. To assess this hypothesis, we constructed AlphaFold2 models of the NAD-dependent ligases from fish poxviruses and entomopoxviruses to approximate the ancestral poxvirus structure, and we compared these to the model of OPG148 and the experimental structures of different ligases. The z scores produced by Dali were the highest between OPG148 and the poxvirus NAD-dependent ligases (Table S2), which appears compatible with the “intramural” exaptation scenario.
Comparison of the structural model for OPG148 with models of poxvirus NAD-dependent ligases and structures of representative NAD-dependent and ATP-dependent ligases. VACV, vaccinia virus; MPXV, mpox (monkeypox) virus; CMLV, camelpox virus; VARV, variola virus; SwPV, swinepox virus; SORPV, sea otterpox virus; LSDV, lumpy skin disease virus; YLDV, Yaba-like disease virus; MyxV, myxoma virus; MCV, Molluscum contagiosum virus; CRV, Nile crocodilepox virus; CNPV, canarypox virus; SGPV, salmon gill poxvirus; SFV, rabbit (Shope) fibroma virus; af2-db, AlphaFold2 database; ECTV, ectromelia virus; RCNV, raccoonpox virus; YKV, Yoka poxvirus. (Table S2, XLSX file, 0.01 MB.)
Another ORPV protein, namely, OPG129 (A3L), which is a major core protein (24), is an inactivated derivative of a distinct deubiquitinating enzyme (DUB) (Fig. 2D). The recruitment of DUBs as proteases catalyzing the processing of virus proteins is a well-known phenomenon that is exemplified by OPG83 (I7L), which is the protease that is involved in virion protein processing in poxviruses and in many other viruses of the realm Varidnaviria (25). We are unaware of previously reported cases of exaptation of DUBs for structural roles, although, in alphaviruses, a serine protease has been exapted as the major capsid protein (26).
Another newly identified case of enzyme exaptation for a structural role in orthopoxviruses involves the recruitment of a major cytoskeleton component. OPG115 (D3R), a virion core protein (27), appears to be a derivative of the enzymatic domain of kinesin, the motor ATPase that is involved in various intracellular trafficking processes (Fig. 2E). Recapitulating the pattern observed in other enzymes exapted for structural roles, the catalytic residues of the ATPases, particularly the canonical Walker A and B motifs, are replaced in OPG115 (Fig. 2E).
OPG98 (L4R), which is a core protein involved in VACV early transcription, showed structural similarity to the C-terminal ADP-ribosyltransferase domain of the Vibrio cholerae Cholix cytotoxin, which additionally contains an N-terminal receptor recognition domain (28). Because of the presence of the N-terminal domain, the Dali score in this case was relatively low, but OPG98 fully superimposed over the ADP-ribosyltransferase domain, whereas the catalytic residues were replaced (Fig. 2F).
Four OPGs contain inactivated protein kinase or pseudokinase domains. OPG97 (L3L) is a VACV core component that is involved in the transcription of early genes. The AlphaFold2 model of this protein showed significant structural similarity to Ser/Thr protein kinases, particularly the atypical kinase domain of haspin, which is an animal chromatin remodeling regulator (29), but the catalytic site residues are partially replaced in the poxvirus protein (Fig. 3A). The aspartate in the active site is preserved in the Orf virus OPG97, but the lysine at the ATP-binding site is replaced by a methionine. Conversely, in the OPG97 proteins from other chordopoxviruses, the lysine in the binding site is intact, whereas the aspartate is replaced. Interestingly, the glutamic acid residue that is involved in ATP binding is still present across all analyzed OPG97 proteins, whereas the haspin-specific ATP-binding motif DYT is absent in all of them. OPG97 is the second inactivated protein kinase in chordopoxviruses, along with OPG198 (B12R). The latter has a much higher similarity to active Ser/Thr kinases and is readily detectable at the sequence level, again with only partial replacement of the catalytic residues (Fig. 3B), which apparently reflects a later acquisition of a host kinase (see below). The two inactivated kinase derivatives followed different paths of exaptation, namely, recruitment for an essential structural role in the case of OPG97 and apparent involvement in immunomodulation in the case of OPG198 (30, 31).
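The distinction between complete and partial replacement of catalytic residues can be made explicit with a simple comparison of structurally equivalent positions. The Python sketch below is an illustration under stated assumptions, not the procedure used in this work: the helper names are hypothetical, and any change of amino acid identity (including a conservative D-to-E substitution) is counted as a replacement. The residue pairs are those given in the legends of Fig. 2A, Fig. 3A, and Fig. 3B.

```python
def amino_acid(label):
    """Return the one-letter amino acid code from a label such as 'H667'."""
    return label[0]


def compare_sites(enzyme_sites, viral_sites):
    """Pair structurally equivalent residues and flag those that differ.

    Returns a list of (enzyme_residue, viral_residue, replaced) tuples.
    Assumption: any identity change counts as a replacement.
    """
    if len(enzyme_sites) != len(viral_sites):
        raise ValueError("site lists must have equal length")
    return [
        (enz, vir, amino_acid(enz) != amino_acid(vir))
        for enz, vir in zip(enzyme_sites, viral_sites)
    ]


def classify(report):
    """Summarize a site report as intact, partially replaced, or fully replaced."""
    replaced = sum(1 for _, _, changed in report if changed)
    if replaced == 0:
        return "intact"
    if replaced == len(report):
        return "fully replaced"
    return "partially replaced"


# Residue pairs taken from the figure legends (cellular enzyme, OPG).
CASES = {
    "OPG55 vs LH3 (Fig. 2A)": (["H667", "D669", "H719"], ["L136", "L138", "V184"]),
    "OPG97 vs haspin (Fig. 3A)": (["K511", "E535", "D649"], ["K93", "E99", "E177"]),
    "OPG198 vs VRK (Fig. 3B)": (["K71", "D171"], ["K45", "K139"]),
}

for name, (enzyme_sites, viral_sites) in CASES.items():
    print(name, "->", classify(compare_sites(enzyme_sites, viral_sites)))
```

With these inputs, OPG55 is classified as fully replaced, whereas OPG97 and OPG198 are classified as partially replaced because each retains the ATP-binding lysine.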
FIG 3.
Inactivated kinases and pseudokinases in orthopoxviruses. (A) Left panel: OPG97 (L3L) (blue, aa 66 to 350) and haspin, an atypical Ser/Thr kinase (green, 6g37 [93], aa 472 to 798). The (mutated) ATP-binding site, helix αC glutamate, and active site are highlighted (K511, E535, and D649 in haspin, magenta; K93, E99, and E177 in OPG97, gray). Right panel: Structural alignment of prototype OPG97, seven OPG members from diverse chordopoxviruses, and three Dali hits, all kinases. The haspin-specific ATP-binding motif DYT is highlighted in red. PDB structure: 6g37 (93). (B) Left panel: OPG198 (B12R) (blue) and human vaccinia-related kinase (VRK, 6cqh [doi:10.2210/pdb6cqh/pdb], green, aa 22 to 341). The ATP-binding site and the active site are highlighted (K71 and D171 in VRK, magenta; K45 and K139 in OPG198). Right panel: Structural alignment of prototype OPG198 (Q), seven OPG members from diverse chordopoxviruses, and three Dali hits, all vaccinia-related kinases. PDB structure: 6cqh [doi:10.2210/pdb6cqh/pdb]. (C) Left panel: OPG64 (E2L) (blue, aa 444 to 737) and the best cellular hit, namely, SidJ, a glutamylation protein with a pseudokinase fold from Legionella pneumophila (7mis [33], green, aa 336 to 758). Key amino acids of the SidJ nucleotide-binding pocket (H492, R500, Y506, R522, N733 [orange]) and the SidJ kinase-like active site (R352, K367, E373, E381, Y452, Y532, N534, D542 [magenta]) are shown. Right panel: Structural alignment of prototype OPG64 (Q), seven OPG members from diverse chordopoxviruses (blue), and three Dali hits (green). The residues that are important within SidJ for nucleotide binding (R522, orange) and kinase-like activity (Y532, red) are highlighted. PDB structures: 7mis (33), 7pqe (32), and 6oqq (94). (D) Superposition of OPG64 (purple) and OPG74 (O1L) (yellow). (E) Left panel: Pseudokinase domains of OPG74 (blue, aa 380 to 666) and SidJ (7mis [33], green, aa 336 to 758). The sites are highlighted as in panel C.
Right panel: Structural alignment of prototype OPG74 (Q), seven OPG members from diverse chordopoxviruses (blue), and the same Dali hits as for OPG64 (green). The residues are highlighted as in panel C.
Two more inactivated kinase homologs contain domains that are structurally similar to the pseudokinase domain of the bacterial glutamylase SidJ, in which the pseudokinase domain catalyzes ATP-dependent glutamylation that, in the case of Legionella pneumophila, inactivates bacterial ubiquitin ligase (32, 33). OPG64 (E2L) is a large, two-domain protein for which the structure was recently solved, and it has been shown to consist of a globular head domain and an annular (ring) domain that is composed of multiple α-helices (33, 34). The head domain was found to share structural similarity with the pseudokinase domain of SidJ, but this similarity was interpreted as weak and potentially spurious (34). In our comparison, OPG64 produced a highly significant Dali score with SidJ (Table 1), and the overlay of the two domains included the superposition of all core elements of the pseudokinase domain (Fig. 3C), strongly suggesting that the head domain of OPG64 is indeed derived from the pseudokinase. However, the catalytic residues of the pseudokinase domain are replaced in OPG64 (Fig. 3C), following the trend of inactivation of exapted enzymes. The annular domain of OPG64 seems to be unrelated to any structure outside the ORPV. We further observed that another large ORPV protein, namely, OPG74 (O1L), showed significant structural similarity to OPG64, with full superposition of both domains (Fig. 3D), although the similarity of the OPG74 head domain to the pseudokinase was much lower than in the case of OPG64 (Fig. 3E). The conservation of the domain architecture, including the unique annular domain, implies that, notwithstanding the low sequence similarity, OPG64 and OPG74 evolved as a result of ancient gene duplication in an ancestral chordopoxvirus, and this was followed by extensive divergence. 
Despite the structural similarity and apparent common origin, OPG64 and OPG74 perform quite different roles in ORPV: OPG64 is involved in trafficking in infected cells (35), whereas OPG74 activates the ERK1/2 pathway and promotes virulence (36). Notably, OPG64 functions as a complex with OPG56 (F12L) (35), another inactivated enzyme, which is a derived DNAP, discussed above.
Six other cases of apparent exaptation of cellular proteins for viral functions involve nonenzymatic, structural proteins (Table 1; Table S1; Fig. 4). The poxvirus telomere-binding protein OPG82 (I6L) (37) appears to be a derivative of the ribosomal protein S6, although a Dali structure comparison also detected two other protein families with a similar fold (four-strand beta sheet and two alpha helices), namely, a formyltetrahydrofolate deformylase and bacterial glycine cleavage system. However, the root mean square deviation (RMSD) of these candidates was higher than that of ribosomal protein S6 (for OPG82 and all of the OPGs that are discussed below, see Supplemental Material for the raw Dali hits and the full structural alignment files of the respective prototype OPG, 7 OPG members from diverse chordopoxviruses, and the top three hits found by Dali). In this case, the recruitment of a nucleic acid-binding protein would not involve a radical functional change, which makes ribosomal protein S6 the most biologically plausible candidate. Even less surprising seems to be the apparent adoption of the transcription factor TFIIB as poxvirus late (OPG127 [A2L]) and intermediate (OPG134 [A8R]) transcription factors (38, 39). Notably, cyclins (e.g., the N-terminal cyclin box of cyclin A2 for OPG127 [7b5r-y] [40]) are among the best hits, besides the C-terminal domain of TFIIB, for both OPG127 and OPG134. However, the arrangement and succession of alpha helices are different between the OPGs and TFIIB, on the one hand, and the cyclin box, on the other hand (for further detail, see the respective structural alignment files). OPG77 (I1L), a DNA-binding core component, was shown to contain a SWIB domain that is present in various chromatin proteins and is involved in chromatin remodeling (41). 
A similar role of this domain in the poxvirus core can be envisaged. Along the same lines, OPG150 (A23R), an intermediate transcription factor (42), shows structural similarity to a TATA binding protein (e.g., 1mp9 [43] among many others), apparently using the primary function of the original protein, namely, DNA binding, during the poxvirus life cycle. OPG185 (A56R), an ORPV membrane protein and hemagglutinin (44), appears to be a derivative of a cellular receptor containing an immunoglobulin-like domain, such as is present in the structural similarity search hits nectin-1 (6LSA [45]), CD80 (4rwh, doi:10.2210/pdb4rwh/pdb), or immunoglobulin MS6-12 (1mjj [46]) (see the respective Supplemental Material files for details).
FIG 4.
Newly identified cases of exaptation of nonenzymatic proteins for structural roles in poxviruses. Superimposition of OPG models (blue) over the best structural match (green), as identified by Dali. (A) OPG77 (I1L) and SWIB domain of mouse BRG1-associated factor 60a in Mus musculus (1uhr [doi:10.2210/pdb1uhr/pdb]). The putative SWIB domain in OPG77 (amino acid positions 138 to 222) is rendered in gray. The mouse SWIB domain contains 2 small antiparallel beta-sheets. (B) OPG82 (I6L) and ribosomal protein S6 (2j5a [75], Aquifex aeolicus). (C) OPG127 (A2L) and the C-terminal region of transcription factor IIB (5wh1 [76], Homo sapiens). (D) OPG134 (A8R) and the C-terminal domain of transcription factor IIB (3h4c [78]). (E) OPG150 (A23R) and the TATA-binding protein (1mp9 [43], Sulfolobus acidocaldarius). (F) OPG185 (A56R) and nectin-1 (3u83 [95], Homo sapiens).
From the previous reconstruction of chordopoxvirus evolution, information on the stage of evolution at which each of the exapted genes was captured by the viruses can be extracted (10). The majority of these genes were gained at an early stage of chordopoxvirus evolution, along the branch separating fish poxviruses from the rest of the chordopoxviruses (Table 1). The exceptions are OPG31 (C4L) and OPG198 (B12R), which apparently emerged at a later stage of chordopoxvirus evolution, with the first occurring via the duplication of OPG20 (an already inactivated prolyl hydroxylase) and the second occurring via the capture of an ancestral protein kinase from the host. The multiple alignments of all exapted enzymes show the consistent replacement of the catalytic amino acid residues in all poxviruses (Fig. 2), demonstrating that the respective enzymatic activities were lost shortly after the capture of the respective host genes by the ancestral poxviruses. An apparent exception to this pattern of enzyme inactivation is OPG61 (F16L), in which the deepest poxvirus branch carrying this gene, the crocodilepoxviruses, is predicted to encode an active serine recombinase, with inactivation apparently having occurred along the branch separating crocodilepoxviruses from the rest of the chordopoxviruses (15).
Routes of evolution of orthopoxvirus genes.
Recently, virus genes acquired from the hosts have been classified into five categories with respect to the degree of functional change of the encoded proteins (23): (i) virus hallmark proteins that are shared by a broad variety of viruses and were acquired at the earliest stages of virus evolution (such as capsid proteins) or possibly inherited from primordial replicators (such as some replicative enzymes), (ii) “radical” exaptation accompanied by a major change in the protein function, (iii) “conservative” exaptation of host proteins when the original activity is exploited for virus functions, (iv) direct exploitation of host proteins in their original capacity, (v) virus proteins of unknown provenance, some possibly evolving de novo.
Furthermore, a distinction has been made between “extramural” exaptation that involves the direct recruitment of host genes and “intramural” exaptation when proteins encoded by the virus itself are repurposed either via duplication or functional moonlighting (23).
The structural comparisons that are presented allow us to reach greater confidence in inferring the likely origins of the viral proteins than was previously attainable. Here, we classified the OPGs according to the above categories (Table 2; Fig. 5). There are only three bona fide hallmark genes in poxviruses: those encoding the homolog of the major capsid protein involved in virion morphogenesis, the packaging ATPase, and a single replicative enzyme, namely, the primase-helicase.
TABLE 2.
Inferred routes of evolution of orthopoxvirus proteins
| Evolutionary/functional category | No. of genes | OPG (genes)a |
|---|---|---|
| Virus hallmark genes | 3 | 125 (D13L), 160 (A32L), 117 (D5R) |
| Radical exaptation, enzymes recruited for structural roles | 18 | 20 (C10L), 31 (C4L), 55 (F11L)b, 56 (F12L), 61 (F16L), 64 (E2L), 74 (O1L)b, 97 (L3L)b, 98 (L4R)b, 115 (D3R)b, 116 (D4R), 120 (D8L), 129 (A3L)b, 148 (A20R)b, 165 (A37R), 175 (A45R)c, 181 (A51R), 198 (B12R) |
| Conservative exaptation, proteins recruited for new function based on the same activity | 95 | 2-6 (C23L-16L), 8-11, 13-17, 19 (C11R), 21-23, 25 (C9L), 29 (C6L), 30 (C5L), 32-34 (C3L-1L), 35-36 (N1L-2L), 37 (M1L), 39-41 (K1L-3L), 44 (K7R), 45 (F1L), 47 (F3L), 49 (F5L), 54 (F10L), 57 (F13L), 63 (E1L), 65 (E3L), 67 (E5R), 72 (E10R), 83 (I7L), 82 (I6L)b, 84 (I8R), 85 (G1L), 89 (G5R), 91 (G6R), 93 (G8R), 106 (H1L), 108 (H3L), 113 (D1R), 118 (D6R), 121-124 (D9R, D10R, D11L, D12L), 133 (A7L), 145 (A18R), 146 (A19R), 150 (A23R)b, 159 (A31R), 161 (A33R), 162 (A34R), 167-169 (A38L-40L), 176 (A46R), 177 (A47L), 179 (A49R), 182-185 (A52R-56Rb), 187-191 (B1R-6R), 193 (B8R), 194d, 196 (B10R), 199-201 (B13R-16R), 203-205 (B18R-20R), 206, 208 (C12L), 210d, 211 (C15L), 212-214 |
| Direct functional recruitment | 30 | 42 (K4L), 43 (K5L, K6L), 46 (F2L), 48 (F4L), 66 (E4L), 71 (E9L), 75 (O2L), 77 (I1L)b, 79 (I3L), 80 (I4L), 88 (G4L), 90 (G5.5R), 101 (J2R), 102 (J3R), 103 (J4R), 105 (J6R), 109 (H4L), 111 (H6R), 119 (D7R), 127 (A2L)b, 131 (A5R), 134 (A8R)b, 149 (A22R), 151 (A24R), 156 (A29L), 171 (A42R), 174 (A44L), 178 (A48R), 180 (A50R), 186 (A57R) |
| Origin unknown | 68 | 1, 7, 12, 18, 24, 26 (C8L), 27 (C7L), 28, 38 (M2L), 50-53 (F6L-9L), 58 (F14L), 59 (F14.5L), 60 (F15L), 62 (F17R), 68-70 (E6R-8R), 73 (E11L), 76 (O3L), 78 (I2L), 81 (I5L), 86 (G3L), 87 (G2R), 92 (G7L), 94 (G9R), 95-96 (L1R-2R), 99 (L5R), 100 (J1R), 104 (J5L), 107 (H2R), 110 (H5R), 112 (H7R), 114 (D2L), 126 (A1L), 128 (A2.5L), 130 (A4L), 132 (A6L), 135-139 (A9L-13L), 140 (A14L), 141 (A14.5L), 142 (A15L), 143 (A16L), 144 (A17L), 147 (A21L), 152-155 (A25L-28L), 157 (A30L), 158 (A30.5L), 163 (A35R), 164 (A36R), 166, 170 (A41L), 172 (A43R), 173 (A43.5R), 192 (B7R), 195 (B9R), 197 (B11R), 202 (B17L), 207 (C11.5R), 209 (C13L, C14L) |
(a) OPG numbers are indicated, with the VACV-Copenhagen gene names given in parentheses, if available.
(b) The OPGs for which the structure was predicted and the origin was inferred in this work.
(c) OPG116 (uracil DNA glycosylase) and OPG175 (superoxide dismutase) are special cases of exaptation in which the enzymatic activity is retained, but the principal role of the protein in ORPV is structural.
(d) OPGs 194 and 210 are multidomain membrane spanning proteins with one domain resembling the Ig protein superfamily. Most of the protein is modeled with low confidence and gives no convincing match.
FIG 5.
Inferred routes of evolution of orthopoxvirus proteins. The numbers of OPGs assigned to the different classes of virus proteins, with respect to the degree of functional change from the respective cellular ancestors, are shown. Black, virus hallmark proteins; blue, direct functional recruitment; blue-gray, “conservative” exaptation; opal, “radical” exaptation; shades of purple, unknown provenance. OPGs of unknown provenance were classified into disordered and generic (those that were predicted to adopt a globular fold but had no convincing match, with only generic matches [for example, to various β-sandwiches]) and PIE domains (predicted globular proteins with no match [mostly short proteins]) (see Table S1 for details).
The inactivated enzymes adopted for structural roles by the viruses, as discussed in the preceding section, represent clear cases of radical exaptation. This route of exaptation involves major sequence changes and even structural changes to the proteins involved, which make the recognition of the ancestral relationships a nontrivial task. A distinct exaptation scenario is apparent for UDG, OPG116 (D4R), and superoxide dismutase, OPG175 (A45R) (47), which retain the catalytic sites and activity as well as high sequence similarity to their respective cellular homologs, and they perform dual roles in ORPV reproduction, both structural and enzymatic. Conversely, one of the inactivated protein kinases discussed in the previous section, namely, OPG198 (B12R), and OPG120 (D8L), which is an inactivated carbonic anhydrase, are highly similar to the respective cellular homologs, despite the loss of the enzymatic activity, having been acquired relatively late in chordopoxvirus evolution (48).
The largest group of poxvirus proteins appears to have evolved via “conservative” exaptation (Table 2). This group includes most of the proteins that are involved in virus-host interactions, particularly the three major families of paralogous poxvirus proteins: those containing ankyrin and PRANC domains, Kelch and BTB domains, and BclII domains (10, 49). The functional diversification of virus proteins within these families represents “intramural exaptation”, in which virus proteins adopt a new function after gene duplication within evolving virus genomes. The origin of OPG148 from an ancestral poxvirus NAD-dependent ligase explored here might be a less obvious case of intramural exaptation.
The smaller group of poxvirus proteins that represent the direct recruitment of cellular activities primarily consists of enzymes that are involved in genome replication and expression as well as those involved in nucleotide metabolism (Table 2).
Finally, one-third (50) of the poxvirus proteins, particularly the virion structural components and proteins involved in virion morphogenesis, showed no convincing structural similarity to any available structures or AlphaFold2 models. Thus, their origin remains obscure. Several of these are small, apparently nonglobular proteins for which no good models were obtained. For example, OPG24, OPG173 (A43.5R), and several others are tiny membrane proteins (OPG28, 59 [F14.5L], 76 [O3L], 78 [I2L]). However, more than half of the proteins in this group are globular, and the quality of their AlphaFold2 models is, on average, close to that for proteins with recognizable folds (mean plddt values of 83.3 and 86.1, respectively) (Table S1; Fig. S2). Some of these proteins did have a Dali match(es) with a z score of >5. However, the “gray zone” of Dali searches is wide (12), and an inspection of the matches for these proteins showed a lack of consistency among the matching structures and/or large RMSD values (Table S1), indicating a lack of evidence of homologous relationships. Given that all of the models in this work were compared both to the PDB and to the large database of AlphaFold2 models that covers the complete proteomes of humans and other model organisms (51), the lack of similar structures strongly suggests that these proteins adopt unique folds that are missing or are extremely rare in cellular organisms. Of note, 10 proteins in this group are subunits of the entry-fusion complex (EFC) (52, 53): OPGs 53 (F9L), 86 (G3L), 94 (G9R), 95 (L1R), 99 (L5R), 104 (J5L), 107 (H2R), 143 (A16L), 147 (A21L), 155 (A28L) (Fig. S3 and S4), with OPG143 (A16L), OPG94 (G9R), and OPG104 (J5L) as well as OPG53 (F9L) and OPG95 (L1R) comprising previously described groups of paralogs (53). Two additional pairs of paralogous proteins with apparent unique folds were detected: OPG18 (missing in VACV)-OPG27 (C7L) and OPG152 (A25L)-OPG153 (A26L). 
All of these paralogous relationships among OPGs were validated by an all-against-all comparison of the AlphaFold2 models (Fig. S4), but no additional significant structural similarities were identified, emphasizing the diversity of the OPGs of unknown provenance with apparent unique folds. Notably, in addition to the structural proteins, this group includes the fourth family of chordopoxvirus paralogous proteins, namely, those containing the chemokine-binding PIE domain, which is an all-beta domain with a unique topology (54). Fig. 6 illustrates 8 apparent unique folds, each representing a compact, globular structure with a high confidence prediction, at least, for the corresponding core domains.
FIG 6.
Predicted unique folds of orthopoxvirus proteins. Predicted globular structures of ORPV proteins with no homologs detected outside poxviruses are shown. The coloring is according to the AlphaFold2 plddt-score, as shown in panel A. The experimentally resolved structures of the respective OPGs are shown in green. Weakly supported C-terminal domains are not shown for OPG95 (L1R) and OPG153 (A26L). (A) OPG27 (C7L) and VACV C7L (5cyw). (B) OPG95 (L1R) (aa 1 to 176 [of 250]) and VACV L1R (1ypy). (C) OPG153 (A26L) (aa 1 to 359 [of 518]). (D) OPG114 (D2L). (E) OPG112 (H7R) and VACV H7 (4w60). (F) OPG70 (E8R). (G) OPG132 (A6L) and VACV A6L (N-term 6cb6, C-term 6br9). (H) OPG163 (A35R). OPG27, 95, and 153 have homologs among other poxvirus OPGs (Fig. S4).
FIG S2.
Predicted global folds of OPGs of known and unknown provenance. OPGs were classified as having a global disordered or (partially) globular fold based on their plddt score (see Methods for details). Average plddt for OPGs with known provenance is 86.1 (SD 8.9), for OPGs with unknown provenance 76.5 (SD 11.5) and for OPGs of unknown provenance with a globular fold 82.6 (SD 6.4). Excluding small OPGs with a generic fold from this fraction, the average plddt is 83.3 (SD 6.4).
This is a work of the U.S. Government and is not subject to copyright protection in the United States. Foreign copyrights may apply.
FIG S3.
Predicted novel folds for OPGs, including subunits of the viral entry-fusion complex. Predicted globular structures with no homologs detected outside poxviruses are shown. Superposition with the experimentally resolved structure from PDB (green), where applicable. (A) OPG53 (F9L) and VACV F9 (6cj6), (B) OPG86 (G3L), (C) OPG94 (G9R), (D) OPG99 (L5R), (E) OPG104 (J5L), (F) OPG107 (H2R), (G) OPG143 (A16L), (H) OPG147 (A21L), and (I) OPG155 (A28L). The coloring is according to the AlphaFold2 plddt score.
FIG S4.
Supplement to Fig. 6. Recurring unique poxvirus protein structures across OPGs. (A) Dali z score matrix for an all-versus-all run of all OPGs. The OPGs were sorted based on their z scores. Hence, the numbers at the axes are arbitrary. The arrow indicates the cluster of OPGs with PIE domains. For visualization, the z scores were capped at 20. Individual OPG pairs can have a z score beyond 20. (B–D) Representative globular structures with no homologs identified outside poxviruses but recurring among the OPGs. (B) OPG27 (C7L) (purple) and OPG18 (orange), (C) OPG95 (L1R) (purple) and OPG53 (F9L) (orange), and (D) OPG153 (A26L) (purple, aa 1 to 359 [of 518]) and OPG152 (A25L) (orange, aa 1 to 332 [of 1,279]).
DISCUSSION
The recently developed methods for protein structure modeling, particularly AlphaFold2 (4, 55), as well as RosettaFold (5), open up unprecedented opportunities for tracing the origin of proteins through structural similarity. These developments are particularly promising for the study of the origins of virus proteins because viruses typically evolve (much) faster than do cellular organisms (56, 57). We applied AlphaFold2 structural modeling to the proteins of ORPV, which is a well-characterized group of large mammalian viruses of major medical importance (8, 9). High quality structural models were obtained for the great majority of the ORPV proteins, and for 14 proteins without any homologs that were detectable by sequence similarity, structural similarity pointed to the likely cellular ancestors. These findings include both radical exaptation, in which a host enzyme is repurposed for a structural role in virus reproduction, whereas the catalytic activity is lost, and conservative exaptation, in which the repurposing involves the original activity of a protein (23). The exaptation of enzymes accompanied by the disruption of the catalytic sites was detected for 8 additional ORPV proteins, bringing the total number of such cases to 16 (Table 1). These inactivated enzyme derivatives perform various functions in ORPV reproduction, but particularly notable is the exaptation of enzymes for the role of major virus core proteins, of which 4 cases were detected. It should be noted that even the inactivated enzyme derivatives among the ORPV proteins display a broad range of similarity to the inferred cellular ancestors, from high sequence conservation (despite catalytic site disruption) to moderate (even if significant) structural similarity. Apparently, this broad spectrum of conservation reflects different stages on the evolutionary path of exaptation and different degrees of functional change. 
Taken together, the results emphasize the importance of both radical and conservative modes of exaptation in virus evolution as well as the utility of structural comparison in detecting this phenomenon.
However, apart from the identified cases of exaptation of cellular proteins, the surprising outcome of this work is that for nearly one-third of the OPGs, no similar structures that appeared indicative of the likely origin were detected. AlphaFold2 modeling (along with RosettaFold) followed by structural comparison is by far the most powerful current approach for detecting homologous relationships among proteins (58, 59). Further methodological improvement will certainly follow. However, given the high model quality already attained, together with the completeness of the databases used for structure comparison, using the PDB structures and AlphaFold2 predictions for multiple model proteomes combined, future methodological developments appear unlikely to result in a radical improvement in the recognition of similar structures. Hence the question of the origin of a large fraction of virus proteins becomes pressing. Several proteins of unknown provenance are small, apparently intrinsically disordered proteins, and several more are tiny membrane proteins. For these, a de novo origin from a noncoding sequence during virus evolution seems to be a likely option (60, 61). Pervasive transcriptional initiation within poxvirus genomes could facilitate the generation of new proteins (62). However, the majority of the proteins in this category appeared to be globular and yielded high quality AlphaFold2 models. Thus, it appears most likely that the core folds of these proteins have no counterparts among cellular proteins (or, at the least, that these are extremely rare folds). Conceivably, these unique folds evolved via the exaptation of host proteins that was accompanied by the major rewiring of structural elements, thereby resulting in unique topologies. A notable case in point is the family of poxvirus proteins containing the chemokine-binding PIE domain (54). 
In its general shape, the PIE domain resembles other all-beta domains, such as the immunoglobulin fold, but the topology of the beta sheets is unique, such that structural comparisons detect no significant similarity. This apparent extensive protein fold remodeling suggests that the evolution of viruses with large genomes is even more innovative than was previously suspected. It has been observed that sequence diversity in a single family of large DNA viruses can surpass that of entire domains of cellular life (63). The present observations complement these findings by demonstrating the commensurate structural diversity of virus proteomes.
MATERIALS AND METHODS
Structure prediction of selected OPG representatives using AlphaFold2.
For each OPG, a single member was chosen for structural model prediction with AlphaFold2 2.2.0. These OPG representatives were from either cowpox or VACV (10). In order to maintain the consistency of the protein nomenclature, VACV-Cop protein names were used in addition to the OPG numbering, wherever available. Multiple sequence alignments (MSAs) were generated for each, using the AlphaFold2 preprocessing pipeline with the default parameters and databases as of 2022-04-22. Exceptions were OPGs 174 and 189, which required additional hhblits parameters for the query against the BFD + Uniclust30 databases (OPG174: -prepre_smax_thresh 50 -pre_evalue_thresh 100 -maxres 80000; OPG189: -maxres 60000). MSAs were used for structure modeling with the AlphaFold2 monomer model, using a template date cutoff of 2022-01-01. Of the 5 models generated for each OPG by AlphaFold2, the model with the highest mean plddt score was chosen for further analysis and structural similarity searches. No existing ORPV structures were excluded during the structure prediction using AlphaFold2. Hence, all predicted structures discussed in detail in the manuscript were compared to their experimentally resolved counterparts in PDB, if available. These include OPG20 (C10L) and VACV C10L (8AG4, 8AG5 [64]), OPG27 (C7L) and VACV C7L (5cyw [65]), OPG95 (L1R) and VACV L1R (1ypy [66]), OPG112 (H7R) and VACV H7R (4w60 [50]), and OPG132 (A6L) and VACV A6 (N-term 6cb6, C-term 6br9 [67]) in Fig. 6; OPG53 (F9L) and VACV F9 (6cj6 [68]) in Fig. S3; and OPG64 (E2L) and VACV E2 (7phy [34]), OPG116 (D4R) and VACV D4 (5jx8 [18]), OPG120 (D8L) and VACV D8 (4e9o [48]), as well as OPG148 (A20R) and VACV A20 (N-term 4od8 [69], C-term 6zyc [70]) in Fig. S5.
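The model selection step described above can be illustrated with a minimal sketch. It assumes that each AlphaFold2 model is available as PDB-format text with the per-residue plddt value stored in the B-factor column, and that averaging over Cα atoms gives the per-residue mean; the function names `ca_plddt` and `best_model` are hypothetical and are not part of the AlphaFold2 pipeline.

```python
from statistics import mean


def ca_plddt(pdb_text: str) -> list[float]:
    """Read per-residue plddt values from the B-factor column of CA atoms.

    Assumption: fixed-column PDB format, B-factor in columns 61 to 66.
    """
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def best_model(models: dict[str, str]) -> tuple[str, float]:
    """Return the name and mean plddt of the model with the highest mean plddt.

    `models` maps a (hypothetical) model name to its PDB-format text.
    """
    means = {name: mean(ca_plddt(text)) for name, text in models.items()}
    winner = max(means, key=means.get)
    return winner, means[winner]
```

Given the five models of one OPG in a dictionary, `best_model` returns the one that would be carried forward to the structural similarity searches under these assumptions.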
FIG S5.
Comparison between predicted structures and experimentally resolved structures. The modeled ORPV structures discussed in the manuscript were compared to their respective experimentally resolved structures whenever such comparison was not done in any other figure. The modeled structures are colored according to their AlphaFold2 plddt scores, as shown in panel A, with the experimentally resolved structure in green. (A) OPG64 (E2L) and VACV E2 (7phy), (B) OPG116 (D4R) and VACV D4 (5jx8), (C) OPG120 (D8L) and VACV D8 (4e9o), and (D) OPG148 (A20R) and VACV A20 (N-term 4od8, dark green; C-term 6zyc, light green).
All of the AlphaFold2 models were assessed by their average and local plddt scores. Of the 214 OPG models, 186 showed a reliable overall average plddt score of 70 or higher (Table S1). An additional 8 proteins contained one or more predicted globular domains with a local plddt score higher than 70. A single OPG (OPG172) showed a globular fold but with a local plddt score below 70. The remaining 19 OPGs are short proteins for which single alpha helices can have high plddt scores but no globular fold was detectable. All of the models were kept for downstream analysis, although results obtained from low quality models were further examined manually. In addition, proteins were classified into small (≤100 amino acids [aa]), intermediate (100 to 200 aa) and large (>200 aa). Ordered, globular stretches were identified as those with a plddt score of 70 or higher for 6 or more consecutive amino acids. All ordered stretches of a single protein were considered to classify the overall fold as either (partially) structured/globular (either at least 50% of the sequence for small and intermediate proteins or at least 100 amino acids for large proteins being part of ordered stretches) or disordered (failing the above criteria).
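The fold classification rules in this paragraph can be written out as a short sketch. The thresholds (plddt of 70 or higher, 6 or more consecutive residues, 50% of the sequence or 100 residues) are taken from the text; the function names are hypothetical, and treating a protein of exactly 100 aa as small is an assumption, because the stated size ranges overlap at that boundary.

```python
def ordered_stretches(plddt, cutoff=70.0, min_len=6):
    """Return (start, end) pairs, end exclusive, for runs of residues with
    plddt >= cutoff that span at least min_len consecutive positions."""
    runs, start = [], None
    for i, value in enumerate(plddt):
        if value >= cutoff:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(plddt) - start >= min_len:
        runs.append((start, len(plddt)))
    return runs


def size_class(length):
    """Small (<=100 aa), intermediate (up to 200 aa), or large (>200 aa)."""
    if length <= 100:
        return "small"
    if length <= 200:
        return "intermediate"
    return "large"


def classify_fold(plddt):
    """Classify a protein as 'globular' (at least partially) or 'disordered'."""
    n = len(plddt)
    ordered = sum(end - start for start, end in ordered_stretches(plddt))
    if size_class(n) == "large":
        is_globular = ordered >= 100      # at least 100 ordered residues
    else:
        is_globular = ordered >= 0.5 * n  # at least half of the sequence
    return "globular" if is_globular else "disordered"
```

For instance, a 100-aa protein with one 60-residue stretch above the cutoff would be called globular, whereas the same protein with its confident residues scattered in runs of 5 would be called disordered.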
Comparison of the OPG structural models to databases of protein structures.
All of the high quality AlphaFold2 models of OPGs were compared to the PDB and AlphaFold2 (v2) databases, using local installations of FoldSeek (2-8bd520) and Dali (5.1).
For FoldSeek, prebuilt databases of PDB, AlphaFold/Proteome, and AlphaFold/Swiss-Prot were obtained using “foldseek databases” on 2022-06-04. Each structural model was used to query each of the three databases using “foldseek search -s 9.5 --max-seqs 2000 -a”, and this was followed by conversions to HTML and tabular output files.
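As an illustration only, the search step can be sketched as a function that assembles the command lines without running them. The search options (`-s 9.5 --max-seqs 2000 -a`) are those given above; the use of `foldseek createdb` to build the query database, of `foldseek convertalis` (with `--format-mode 3` for HTML) for the conversions, and all paths and file names are assumptions that are not stated in the text.

```python
def foldseek_commands(query_dir, target_db, out_prefix, tmp_dir="tmp"):
    """Build argument lists for one query set against one target database.

    All paths are hypothetical placeholders; nothing is executed here.
    """
    query_db = f"{out_prefix}_queryDB"
    aln_db = f"{out_prefix}_aln"
    return [
        # Build a FoldSeek database from the predicted structure files.
        ["foldseek", "createdb", str(query_dir), query_db],
        # Search with the sensitivity and hit limits quoted in the text.
        ["foldseek", "search", query_db, str(target_db), aln_db, tmp_dir,
         "-s", "9.5", "--max-seqs", "2000", "-a"],
        # Tabular output (default format of convertalis).
        ["foldseek", "convertalis", query_db, str(target_db), aln_db,
         f"{out_prefix}.m8"],
        # HTML output (assumed to correspond to --format-mode 3).
        ["foldseek", "convertalis", query_db, str(target_db), aln_db,
         f"{out_prefix}.html", "--format-mode", "3"],
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where FoldSeek and the prebuilt target database are installed.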
For Dali, a prebuilt database for AlphaFoldDB v2 was obtained from http://ekhidna2.biocenter.helsinki.fi/dali/AF-Digest.tar.gz. Before using this database, a small number of empty data files and structures with more than 200 structural elements had to be removed to accommodate the limitations of Dali (989,438 structures). A PDB database was built by importing a local mirror of PDB (507,304 structures of subunits). For both databases, a 70% clustering was generated using “cd-hit -c 0.7” to be used as a representative set in hierarchical searches with “dali.pl --hierarchical --oneway --repset”.
In order to compare all of the representative OPG structures with each other, a local Dali (12) all-versus-all run was performed (default settings, multimode mode with 50 nodes). The corresponding Dali z scores were visualized in an ordered matrix. A local Dali all-versus-all run was also performed with the same settings to compare the OPG148 inactivated ligase with poxvirus NAD-dependent DNA ligases (from carp edema virus BCT22668.1, salmon gill poxvirus YP_009162495.1, Melanoplus sanguinipes entomopoxvirus NP_048233.1, and Amsacta moorei entomopoxvirus NP_064981.1) as well as a diverse set of NAD-dependent and ATP-dependent DNA ligases and RNA ligases.
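How the z scores were ordered for the matrix is not specified here, so the following is only a sketch of one possible approach: the scores are capped (at 20, as in the Fig. S4 legend) and the structures are then arranged by a simple greedy nearest-neighbor ordering so that similar structures end up adjacent. The greedy ordering is a stand-in chosen for illustration, not the procedure used in this work, and all function names are hypothetical.

```python
def cap_scores(z, cap=20.0):
    """Cap every z score at `cap` for visualization."""
    return [[min(value, cap) for value in row] for row in z]


def greedy_order(z):
    """Order indices so that each next item is the unplaced structure
    most similar to the previously placed one (ties: lowest index)."""
    n = len(z)
    if n == 0:
        return []
    # One-way scores may differ, so use the larger of the two directions.
    sim = [[max(z[i][j], z[j][i]) for j in range(n)] for i in range(n)]
    # Start from a member of the strongest off-diagonal pair.
    start = max(
        range(n),
        key=lambda i: max((sim[i][j] for j in range(n) if j != i), default=0.0),
    )
    order, remaining = [start], set(range(n)) - {start}
    while remaining:
        last = order[-1]
        nxt = max(sorted(remaining), key=lambda j: sim[last][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


def reorder(z, order):
    """Apply the same ordering to rows and columns of the matrix."""
    return [[z[i][j] for j in order] for i in order]
```

With such an ordering, groups of related structures (for example, a family of paralogs) would appear as blocks of high scores along the diagonal, and the axis positions carry no meaning beyond the ordering itself.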
Protein structure visualization.
Protein structures were visualized with ChimeraX (71). Proteins were superimposed with the ChimeraX internal matchmaker (command “match #2 to #1”) if the protein sequences were similar enough. Otherwise, Dali translation-rotation matrices from the structural alignment runs were used (the ChimeraX command “view matrix mod #2,” followed by the 12 matrix values, comma separated, in order to match protein number 2 to protein number 1). See also: https://www.rbvi.ucsf.edu/pipermail/chimerax-users/2022-May/003656.html.
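To make the role of the 12 matrix values concrete, the sketch below applies such a transformation to a set of coordinates and formats the corresponding command string. It assumes that the 12 values form a 3×4 matrix in row-major order, with the rotation in the first three columns and the translation in the fourth; whether a given Dali output lists the values in exactly this order should be checked against the Dali documentation. The function names are hypothetical.

```python
def apply_transform(matrix12, coords):
    """Apply a 3x4 row-major [R|t] transform to a list of (x, y, z) points."""
    if len(matrix12) != 12:
        raise ValueError("expected 12 values (3 rows of r1, r2, r3, t)")
    rows = [matrix12[0:4], matrix12[4:8], matrix12[8:12]]
    return [
        tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in rows)
        for x, y, z in coords
    ]


def chimerax_view_matrix(model_id, matrix12):
    """Format the 12 values as a comma-separated 'view matrix' command."""
    values = ",".join(f"{v:g}" for v in matrix12)
    return f"view matrix models #{model_id},{values}"
```

For example, a 90° rotation about the z axis combined with a translation of (1, 2, 3) moves the point (1, 0, 0) to (1, 3, 3), and the identity matrix yields a command that leaves model 2 in place.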
Structural alignment.
The structural alignments were generated using the Dali web server (http://ekhidna2.biocenter.helsinki.fi/dali/) (12) by running the representative OPG pairwise against seven diverse OPG members and three selected Dali hits, either from the PDB database (72) or from the AlphaFold2 database (4, 73). All of the full structural alignment files are available as additional material (see below). The following structural alignments are included in the Supplemental Material only: OPG77 (I1L) with 6uxv-H (74), 1uhr-A (doi:10.2210/pdb1uhr/pdb), and af2-db I1MUP5; OPG82 (I6L) with 3obi-C (doi:10.2210/pdb3obi/pdb), 2j5a-A (75), and 1u8s-B (doi:10.2210/pdb1u8s/pdb); OPG127 (A2L) with 5wh1-B (76), 7o75-M (77), and 7b5r-Y (40); OPG134 (A8R) with 3h4c-A (78), 4rod-A (79), and 6qtg-B (80); OPG150 (A23R) with 1mp9-B (43), 2z8u-B (81), and 7q5b-Y (82); as well as OPG185 (A56R) with 6lsa-B (45), 4rwh-A (doi:10.2210/pdb4rwh/pdb), and 1mjj-B (46).
Additional material.
Additional files can be found at (https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/JXYYFN) and include: AlphaFold2 models for all OPGs, structural alignments of the OPGs mentioned above and their respective homologs, and the Dali all-versus-all z score matrix of representative OPGs.
ACKNOWLEDGMENTS
P.M. and E.V.K. are supported by the Intramural Research Program of the National Institutes of Health (National Library of Medicine). W.R. is supported by the Center for Information Technology, NIH. T.G.S. and B.M. were supported by the Division of Intramural Research, NIAID, NIH. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov).
We declare no conflict of interest.
E.V.K., T.G.S., and B.M. initiated the project. E.V.K. designed the research. W.R. and G.F. designed and ran the computational pipelines for the protein structure modeling and comparison. P.M., T.G.S., E.V.K., and B.M. analyzed the results. P.M. performed the structure superposition and alignment. E.V.K., P.M., and W.R. wrote the manuscript. The manuscript was edited and approved by all authors.
Footnotes
This article is a direct contribution from Eugene V. Koonin, a Fellow of the American Academy of Microbiology, who arranged for and secured reviews by Stefan Rothenburg, University of California, Davis, and Nels Elde, The University of Utah Department of Human Genetics.
Contributor Information
Eugene V. Koonin, Email: koonin@ncbi.nlm.nih.gov.
Bernard Moss, Email: bmoss@niaid.nih.gov.
Igor B. Zhulin, The Ohio State University
REFERENCES
- 1. Koonin EV, et al. 2020. Global organization and proposed megataxonomy of the virus world. Microbiol Mol Biol Rev 84. doi: 10.1128/MMBR.00061-19.
- 2. Koonin EV, Senkevich TG, Dolja VV. 2006. The ancient virus world and evolution of cells. Biol Direct 1:29. doi: 10.1186/1745-6150-1-29.
- 3. Krupovic M, Dolja VV, Koonin EV. 2019. Origin of viruses: primordial replicators recruiting capsids from hosts. Nat Rev Microbiol 17:449–458. doi: 10.1038/s41579-019-0205-6.
- 4. Jumper J, et al. 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596:583–589. doi: 10.1038/s41586-021-03819-2.
- 5. Baek M, et al. 2021. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373:871–876. doi: 10.1126/science.abj8754.
- 6. Porta-Pardo E, Ruiz-Serra V, Valentini S, Valencia A. 2022. The structural coverage of the human proteome before and after AlphaFold. PLoS Comput Biol 18:e1009818. doi: 10.1371/journal.pcbi.1009818.
- 7. Binder JL, Berendzen J, Stevens AO, He Y, Wang J, Dokholyan NV, Oprea TI. 2022. AlphaFold illuminates half of the dark human proteins. Curr Opin Struct Biol 74:102372. doi: 10.1016/j.sbi.2022.102372.
- 8. Moss B, Smith GL. 2021. Poxviridae: the viruses and their replication. In: Fields Virology, 7th ed, vol 2. LWW.
- 9. Greseth MD, Traktman P. 2022. The life cycle of the vaccinia virus genome. Annu Rev Virol doi: 10.1146/annurev-virology-091919-104752.
- 10. Senkevich TG, Yutin N, Wolf YI, Koonin EV, Moss B. 2021. Ancient gene capture and recent gene loss shape the evolution of orthopoxvirus-host interaction genes. mBio 12:e0149521. doi: 10.1128/mBio.01495-21.
- 11. Van Kempen M, et al. 2022. Foldseek: fast and accurate protein structure search. bioRxiv doi: 10.1101/2022.02.07.479398.
- 12. Holm L. 2020. DALI and the persistence of protein shape. Protein Sci 29:128–140. doi: 10.1002/pro.3749.
- 13. Carpentier DCJ, Van Loggerenberg A, Dieckmann NMG, Smith GL. 2017. Vaccinia virus egress mediated by virus protein A36 is reliant on the F12 protein. J Gen Virol 98:1500–1514. doi: 10.1099/jgv.0.000816.
- 14. Yutin N, Faure G, Koonin EV, Mushegian AR. 2014. Chordopoxvirus protein F12 implicated in enveloped virion morphogenesis is an inactivated DNA polymerase. Biol Direct 9:22. doi: 10.1186/1745-6150-9-22.
- 15. Senkevich TG, Koonin EV, Moss B. 2011. Vaccinia virus F16 protein, a predicted catalytically inactive member of the prokaryotic serine recombinase superfamily, is targeted to nucleoli. Virology 417:334–342. doi: 10.1016/j.virol.2011.06.017.
- 16. Czarnecki MW, Traktman P. 2017. The vaccinia virus DNA polymerase and its processivity factor. Virus Res 234:193–206. doi: 10.1016/j.virusres.2017.01.027.
- 17. Tarbouriech N, et al. 2017. The vaccinia virus DNA polymerase structure provides insights into the mode of processivity factor binding. Nat Commun 8:1455. doi: 10.1038/s41467-017-01542-z.
- 18. Schormann N, et al. 2016. Poxvirus uracil-DNA glycosylase-An unusual member of the family I uracil-DNA glycosylases. Protein Sci 25:2113–2131. doi: 10.1002/pro.3058.
- 19. De Silva FS, Moss B. 2003. Vaccinia virus uracil DNA glycosylase has an essential role in DNA synthesis that is independent of its glycosylase activity: catalytic site mutations reduce virulence but not virus replication in cultured cells. J Virol 77:159–166. doi: 10.1128/jvi.77.1.159-166.2003.
- 20. De Silva FS, Moss B. 2008. Effects of vaccinia virus uracil DNA glycosylase catalytic site and deoxyuridine triphosphatase deletion mutations individually and together on replication in active and quiescent cells and pathogenesis in mice. Virol J 5:145. doi: 10.1186/1743-422X-5-145.
- 21. Peng Q, Xie Y, Kuai L, Wang H, Qi J, Gao GF, Shi Y. 2023. Structure of monkeypox virus DNA polymerase holoenzyme. Science 379:100–105. doi: 10.1126/science.ade6360.
- 22. Yutin N, Koonin EV. 2009. Evolution of DNA ligases of nucleo-cytoplasmic large DNA viruses of eukaryotes: a case of hidden complexity. Biol Direct 4:51. doi: 10.1186/1745-6150-4-51.
- 23. Koonin EV, Dolja VV, Krupovic M. 2022. The logic of virus evolution. Cell Host Microbe 30:917–929. doi: 10.1016/j.chom.2022.06.008.
- 24. Jesus DM, et al. 2015. Vaccinia virus protein A3 is required for the production of normal immature virions and for the encapsidation of the nucleocapsid protein L4. Virology 481:1–12. doi: 10.1016/j.virol.2015.02.020.
- 25. Koonin EV, Yutin N. 2019. Evolution of the large nucleocytoplasmic DNA viruses of eukaryotes and convergent origins of viral gigantism. Adv Virus Res 103:167–202. doi: 10.1016/bs.aivir.2018.09.002.
- 26. Aggarwal M, Dhindwal S, Kumar P, Kuhn RJ, Tomar S. 2014. trans-Protease activity and structural insights into the active form of the alphavirus capsid protease. J Virol 88:12242–12253. doi: 10.1128/JVI.01692-14.
- 27. Dyster LM, Niles EG. 1991. Genetic and biochemical characterization of vaccinia virus genes D2L and D3R which encode virion structural proteins. Virology 182:455–467. doi: 10.1016/0042-6822(91)90586-Z.
- 28. Ogura K, Yahiro K, Moss J. 2020. Cell death signaling pathway induced by Cholix toxin, a cytotoxin and eEF2 ADP-ribosyltransferase produced by Vibrio cholerae. Toxins (Basel) 13. doi: 10.3390/toxins13010012.
- 29. Cartwright TN, et al. 2022. Dissecting the roles of Haspin and VRK1 in histone H3 phosphorylation during mitosis. Sci Rep 12:11210. doi: 10.1038/s41598-022-15339-8.
- 30. Rico AB, Linville AC, Olson AT, Wang Z, Wiebe MS. 2021. The Vaccinia virus B12 pseudokinase represses viral replication via interaction with the cellular kinase VRK1 and activation of the antiviral effector BAF. J Virol 95. doi: 10.1128/JVI.02114-20.
- 31. Linville AC, Rico AB, Teague H, Binsted LE, Smith GL, Albarnaz JD, Wiebe MS. 2022. Dysregulation of cellular VRK1, BAF, and innate immune signaling by the vaccinia virus B12 pseudokinase. J Virol 96:e0039822. doi: 10.1128/jvi.00398-22.
- 32. Adams M, et al. 2021. Structural basis for protein glutamylation by the Legionella pseudokinase SidJ. Nat Commun 12:6174. doi: 10.1038/s41467-021-26429-y.
- 33. Osinski A, et al. 2021. Structural and mechanistic basis for protein glutamylation by the kinase fold. Mol Cell 81:4527–4539. doi: 10.1016/j.molcel.2021.08.007.
- 34. Gao WND, et al. 2022. The crystal structure of vaccinia virus protein E2 and perspectives on the prediction of novel viral protein folds. J Gen Virol 103. doi: 10.1099/jgv.0.001716.
- 35. Carpentier DC, Gao WN, Ewles H, Morgan GW, Smith GL. 2015. Vaccinia virus protein complex F12/E2 interacts with kinesin light chain isoform 2 to engage the kinesin-1 motor complex. PLoS Pathog 11:e1004723. doi: 10.1371/journal.ppat.1004723.
- 36. Schweneker M, Lukassen S, Späth M, Wolferstätter M, Babel E, Brinkmann K, Wielert U, Chaplin P, Suter M, Hausmann J. 2012. The vaccinia virus O1 protein is required for sustained activation of extracellular signal-regulated kinase 1/2 and promotes viral virulence. J Virol 86:2323–2336. doi: 10.1128/JVI.06166-11.
- 37. Grubisha O, Traktman P. 2003. Genetic analysis of the vaccinia virus I6 telomere-binding protein uncovers a key role in genome encapsidation. J Virol 77:10929–10942. doi: 10.1128/JVI.77.20.10929-10942.2003.
- 38. Keck JG, Baldick CJ, Jr, Moss B. 1990. Role of DNA replication in vaccinia virus gene expression: a naked template is required for transcription of three late trans-activator genes. Cell 61:801–809. doi: 10.1016/0092-8674(90)90190-P.
- 39. Warren RD, Cotter CA, Moss B. 2012. Reverse genetics analysis of poxvirus intermediate transcription factors. J Virol 86:9514–9519. doi: 10.1128/JVI.06902-11.
- 40. Horn-Ghetko D, et al. 2021. Ubiquitin ligation to F-box protein targets by SCF-RBR E3-E3 super-assembly. Nature 590:671–676. doi: 10.1038/s41586-021-03197-9.
- 41. Bennett-Lovsey R, Hart SE, Shirai H, Mizuguchi K. 2002. The SWIB and the MDM2 domains are homologous and share a common fold. Bioinformatics 18:626–630. doi: 10.1093/bioinformatics/18.4.626.
- 42. Sanz P, Moss B. 1999. Identification of a transcription factor, encoded by two vaccinia virus early genes, that regulates the intermediate stage of viral gene expression. Proc Natl Acad Sci USA 96:2692–2697. doi: 10.1073/pnas.96.6.2692.
- 43. Koike H, et al. 2004. Origins of protein stability revealed by comparing crystal structures of TATA binding proteins. Structure 12:157–168. doi: 10.1016/j.str.2003.12.003.
- 44. DeHaven BC, Gupta K, Isaacs SN. 2011. The vaccinia virus A56 protein: a multifunctional transmembrane glycoprotein that anchors two secreted viral proteins. J Gen Virol 92:1971–1980. doi: 10.1099/vir.0.030460-0.
- 45. Yue D, et al. 2020. Crystal structure of bovine herpesvirus 1 glycoprotein D bound to nectin-1 reveals the basis for its low-affinity binding to the receptor. Sci Adv 6:eaba5147. doi: 10.1126/sciadv.aba5147.
- 46. Ruzheinikov SN, et al. 2003. High-resolution crystal structure of the Fab-fragments of a family of mouse catalytic antibodies with esterase activity. J Mol Biol 332:423–435. doi: 10.1016/S0022-2836(03)00902-1.
- 47. Almazan F, Tscharke DC, Smith GL. 2001. The vaccinia virus superoxide dismutase-like protein (A45R) is a virion component that is nonessential for virus replication. J Virol 75:7018–7029. doi: 10.1128/JVI.75.15.7018-7029.2001.
- 48. Matho MH, Maybeno M, Benhnia MR-E-I, Becker D, Meng X, Xiang Y, Crotty S, Peters B, Zajonc DM. 2012. Structural and biochemical characterization of the vaccinia virus envelope protein D8 and its recognition by the antibody LA5. J Virol 86:8050–8058. doi: 10.1128/JVI.00836-12.
- 49. Bratke KA, McLysaght A, Rothenburg S. 2013. A survey of host range genes in poxvirus genomes. Infect Genet Evol 14:406–425. doi: 10.1016/j.meegid.2012.12.002.
- 50. Kolli S, et al. 2015. Structure-function analysis of vaccinia virus H7 protein reveals a novel phosphoinositide binding fold essential for poxvirus replication. J Virol 89:2209–2219. doi: 10.1128/JVI.03073-14.
- 51. Tunyasuvunakool K, et al. 2021. Highly accurate protein structure prediction for the human proteome. Nature 596:590–596. doi: 10.1038/s41586-021-03828-1.
- 52. Senkevich TG, Ojeda S, Townsley A, Nelson GE, Moss B. 2005. Poxvirus multiprotein entry-fusion complex. Proc Natl Acad Sci USA 102:18572–18577. doi: 10.1073/pnas.0509239102.
- 53. Moss B. 2012. Poxvirus cell entry: how many proteins does it take? Viruses 4:688–707. doi: 10.3390/v4050688.
- 54. Nelson CA, Epperson ML, Singh S, Elliott JI, Fremont DH. 2015. Structural conservation and functional diversity of the poxvirus immune evasion (PIE) domain superfamily. Viruses 7:4878–4898. doi: 10.3390/v7092848.
- 55. Mirdita M, et al. 2022. ColabFold: making protein folding accessible to all. Nat Methods 19:679–682. doi: 10.1038/s41592-022-01488-1.
- 56. Aiewsakun P, Katzourakis A. 2016. Time-dependent rate phenomenon in viruses. J Virol 90:7184–7195. doi: 10.1128/JVI.00593-16.
- 57. Domingo E, Garcia-Crespo C, Lobo-Vega R, Perales C. 2021. Mutation rates, mutation frequencies, and proofreading-repair activities in RNA virus genetics. Viruses 13. doi: 10.3390/v13091882.
- 58. Schleif R, Espinosa M. 2022. Where to from here? Front Mol Biosci 9:848444. doi: 10.3389/fmolb.2022.848444.
- 59. Jones DT, Thornton JM. 2022. The impact of AlphaFold2 one year on. Nat Methods 19:15–20. doi: 10.1038/s41592-021-01365-3.
- 60. Vakirlis N, et al. 2020. De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences. Nat Commun 11:781. doi: 10.1038/s41467-020-14500-z.
- 61. Fesenko I, et al. 2021. A vast pool of lineage-specific microproteins encoded by long non-coding RNAs in plants. Nucleic Acids Res 49:10328–10346. doi: 10.1093/nar/gkab816.
- 62. Yang Z, Martens CA, Bruno DP, Porcella SF, Moss B. 2012. Pervasive initiation and 3′-end formation of poxvirus postreplicative RNAs. J Biol Chem 287:31050–31060. doi: 10.1074/jbc.M112.390054.
- 63. Mihara T, et al. 2018. Taxon richness of “Megaviridae” exceeds those of bacteria and archaea in the ocean. Microbes Environ 33:162–171. doi: 10.1264/jsme2.ME17203.
- 64. Rivera-Calzada A, et al. 2022. Structural basis for the inactivation of cytosolic DNA sensing by the vaccinia virus. Nat Commun 13:7062. doi: 10.1038/s41467-022-34843-z.
- 65. Meng X, Krumm B, Li Y, Deng J, Xiang Y. 2015. Structural basis for antagonizing a host restriction factor by C7 family of poxvirus host-range proteins. Proc Natl Acad Sci USA 112:14858–14863. doi: 10.1073/pnas.1515354112.
- 66. Su HP, et al. 2005. The 1.51-Angstrom structure of the poxvirus L1 protein, a target of potent neutralizing antibodies. Proc Natl Acad Sci USA 102:4240–4245. doi: 10.1073/pnas.0501103102.
- 67. Pathak PK, et al. 2018. Structure of a lipid-bound viral membrane assembly protein reveals a modality for enclosing the lipid bilayer. Proc Natl Acad Sci USA 115:7028–7032. doi: 10.1073/pnas.1805855115.
- 68. Diesterbeck US, Gittis AG, Garboczi DN, Moss B. 2018. The 2.1 Å structure of protein F9 and its comparison to L1, two components of the conserved poxvirus entry-fusion complex. Sci Rep 8:16807. doi: 10.1038/s41598-018-34244-7.
- 69. Contesto-Richefeu C, et al. 2014. Crystal structure of the vaccinia virus DNA polymerase holoenzyme subunit D4 in complex with the A20 N-terminal domain. PLoS Pathog 10:e1003978. doi: 10.1371/journal.ppat.1003978.
- 70. Bersch B, Tarbouriech N, Burmeister WP, Iseni F. 2021. Solution structure of the C-terminal domain of A20, the missing brick for the characterization of the interface between vaccinia virus DNA polymerase and its processivity factor. J Mol Biol 433:167009. doi: 10.1016/j.jmb.2021.167009.
- 71. Pettersen EF, et al. 2021. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci 30:70–82. doi: 10.1002/pro.3943.
- 72. Burley SK, et al. 2019. RCSB Protein Data Bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res 47:D464–D474. doi: 10.1093/nar/gky1004.
- 73. Varadi M, et al. 2022. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res 50:D439–D444. doi: 10.1093/nar/gkab1061.
- 74. Han Y, Reyes AA, Malik S, He Y. 2020. Cryo-EM structure of SWI/SNF complex bound to a nucleosome. Nature 579:452–455. doi: 10.1038/s41586-020-2087-1.
- 75. Olofsson M, Hansson S, Hedberg L, Logan DT, Oliveberg M. 2007. Folding of S6 structures with divergent amino acid composition: pathway flexibility within partly overlapping foldons. J Mol Biol 365:237–248. doi: 10.1016/j.jmb.2006.09.016.
- 76. Bratkowski M, et al. 2018. Structural dissection of an interaction between transcription initiation and termination factors implicated in promoter-terminator cross-talk. J Biol Chem 293:1651–1665. doi: 10.1074/jbc.M117.811521.
- 77. Schilbach S, Aibara S, Dienemann C, Grabbe F, Cramer P. 2021. Structure of RNA polymerase II pre-initiation complex at 2.9 Å defines initial DNA opening. Cell 184:4064–4072. doi: 10.1016/j.cell.2021.05.012.
- 78. Ibrahim BS, et al. 2009. Structure of the C-terminal domain of transcription factor IIB from Trypanosoma brucei. Proc Natl Acad Sci USA 106:13242–13247. doi: 10.1073/pnas.0904309106.
- 79. Gouge J, et al. 2015. Redox signaling by the RNA polymerase III TFIIB-related factor Brf2. Cell 163:1375–1387. doi: 10.1016/j.cell.2015.11.005.
- 80. Hofmann MH, et al. 2020. Selective and potent CDK8/19 inhibitors enhance NK-cell activity and promote tumor surveillance. Mol Cancer Ther 19:1018–1030. doi: 10.1158/1535-7163.MCT-19-0789.
- 81. Adachi N, Senda M, Natsume R, Senda T, Horikoshi M. 2008. Crystal structure of Methanococcus jannaschii TATA box-binding protein. Genes Cells 13:1127–1140.
- 82. Abascal-Palacios G, Jochem L, Pla-Prats C, Beuron F, Vannini A. 2021. Structural basis of Ty3 retrotransposon integration at RNA Polymerase III-transcribed genes. Nat Commun 12:6992. doi: 10.1038/s41467-021-27338-w.
- 83. Berman AJ, et al. 2007. Structures of phi29 DNA polymerase complexed with substrate: the mechanism of translocation in B-family polymerases. EMBO J 26:3494–3505. doi: 10.1038/sj.emboj.7601780.
- 84. Chen W, et al. 2018. Multiple serine transposase dimers assemble the transposon-end synaptic complex during IS607-family transposition. Elife 7. doi: 10.7554/eLife.39611.
- 85. Yuan P, Gupta K, Van Duyne GD. 2008. Tetrameric structure of a serine integrase catalytic domain. Structure 16:1275–1286. doi: 10.1016/j.str.2008.04.018.
- 86. Bonanno JB, et al. 2009. Crystal structure of a resolvase family site-specific recombinase from Streptococcus pneumoniae. PDB.
- 87. Longbotham JE, et al. 2015. Structure and mechanism of a viral collagen prolyl hydroxylase. Biochemistry 54:6093–6105. doi: 10.1021/acs.biochem.5b00789.
- 88. Komander D, et al. 2008. The structure of the CYLD USP domain explains its specificity for Lys63-linked polyubiquitin and reveals a B box module. Mol Cell 29:451–464. doi: 10.1016/j.molcel.2007.12.018.
- 89. Yun M, Zhang X, Park CG, Park HW, Endow SA. 2001. A structural pathway for activation of the kinesin motor ATPase. EMBO J 20:2611–2618. doi: 10.1093/emboj/20.11.2611.
- 90. Cao L, Cantos-Fernandes S, Gigant B. 2017. The structural switch of nucleotide-free kinesin. Sci Rep 7:42558. doi: 10.1038/srep42558.
- 91. Yamagishi M, et al. 2016. Structural basis of backwards motion in kinesin-1-kinesin-14 chimera: implication for kinesin-14 motility. Structure 24:1322–1334. doi: 10.1016/j.str.2016.05.021.
- 92. Jorgensen R, et al. 2008. Cholix toxin, a novel ADP-ribosylating factor from Vibrio cholerae. J Biol Chem 283:10671–10678. doi: 10.1074/jbc.M710008200.
- 93. Heroven C, et al. 2018. Halogen-Aromatic pi interactions modulate inhibitor residence times. Angew Chem Int Ed Engl 57:7220–7224. doi: 10.1002/anie.201801666.
- 94. Black MH, et al. 2019. Bacterial pseudokinase catalyzes protein polyglutamylation to inhibit the SidE-family ubiquitin ligases. Science 364:787–792. doi: 10.1126/science.aaw7446.
- 95. Zhang N, et al. 2011. Binding of herpes simplex virus glycoprotein D to nectin-1 exploits host cell adhesion. Nat Commun 2:577. doi: 10.1038/ncomms1571.
- 96. Yi C, et al. 2010. Iron-catalysed oxidation intermediates captured in a DNA repair dioxygenase. Nature 468:330–333. doi: 10.1038/nature09497.
Associated Data
Supplementary Materials
Table S1. Comprehensive structural modeling and analysis for orthopoxvirus proteins (OPG). XLSX file, 0.05 MB.
This is a work of the U.S. Government and is not subject to copyright protection in the United States. Foreign copyrights may apply.
FIG S1. Supplement to Fig. 1. OPG20 (C10L), OPG31 (C4L), and OPG165 (A37R) are homologs of hydroxylases. (A–C) Superimposition of the OPG (blue) with the respective hydroxylase domain of a cellular protein (green). (A) OPG20 (aa 1 to 160) and the PKHD-type hydroxylase of Shewanella baltica (3dkq [DOI: 10.2210/pdb3dkq/pdb], aa 5 to 200). (B) OPG31 (aa 1 to 147) and an oxidoreductase of the 2OG-Fe(II) oxygenase family of Burkholderia pseudomallei (6n1f [DOI: 10.2210/pdb6n1f/pdb], aa 3 to 216). (C) OPG165 (aa 1 to 139) and a dioxygenase from E. coli (3o1r [96], aa 13 to 214). (D) Structural alignment of OPGs with hydroxylases as the best hit. Blue: OPGs from this figure and Fig. 2A and C, with OPG55 (F11L) as the query for pairwise structural alignment. From top to bottom: OPG55, OPG165, OPG20, OPG181 (A51R), and OPG31. Green: hydroxylases. From top to bottom: dioxygenase from E. coli (3o1r [96]), PKHD-type hydroxylase of Shewanella baltica (3dkq [DOI: 10.2210/pdb3dkq/pdb]), human lysyl hydroxylase LH3 (6tex [DOI: 10.2210/pdb6tex/pdb]), and an oxidoreductase from Burkholderia pseudomallei (6n1f [DOI: 10.2210/pdb6n1f/pdb]). PDF file, 0.7 MB.
Table S2. Comparison of the structural model for OPG148 with models of poxvirus NAD-dependent ligases and structures of representative NAD-dependent and ATP-dependent ligases. VACV, vaccinia virus; MPXV, mpox (monkeypox) virus; CMLV, camelpox virus; VARV, variola virus; SwPV, swinepox virus; SORPV, sea otterpox virus; LSDV, lumpy skin disease virus; YLDV, Yaba-like disease virus; MyxV, myxoma virus; MCV, Molluscum contagiosum virus; CRV, Nile crocodilepox virus; CNPV, canarypox virus; SGPV, salmon gill poxvirus; SFV, rabbit (Shope) fibroma virus; af2-db, AlphaFold2 database; ECTV, Ectromelia virus; RCNV, raccoonpox virus; YKV, Yoka poxvirus. XLSX file, 0.01 MB.
FIG S2. Predicted global folds of OPGs of known and unknown provenance. OPGs were classified as having a globally disordered or a (partially) globular fold based on their pLDDT score (see Methods for details). The average pLDDT is 86.1 (SD 8.9) for OPGs of known provenance, 76.5 (SD 11.5) for OPGs of unknown provenance, and 82.6 (SD 6.4) for OPGs of unknown provenance with a globular fold. Excluding small OPGs with a generic fold from this fraction, the average pLDDT is 83.3 (SD 6.4). PDF file, 1.2 MB.
FIG S3. Predicted novel folds for OPGs, including subunits of the viral entry-fusion complex. Predicted globular structures with no homologs detected outside poxviruses are shown. Where applicable, the model is superposed on the experimentally resolved structure from the PDB (green). (A) OPG53 (F9L) and VACV F9 (6cj6), (B) OPG86 (G3L), (C) OPG94 (G9R), (D) OPG99 (L5R), (E) OPG104 (J5L), (F) OPG107 (H2R), (G) OPG143 (A16L), (H) OPG147 (A21L), and (I) OPG155 (A28L). The models are colored according to the AlphaFold2 pLDDT score. PDF file, 0.9 MB.
FIG S4. Supplement to Fig. 6. Recurring unique poxvirus protein structures across OPGs. (A) Dali z score matrix for an all-versus-all run of all OPGs. The OPGs were sorted based on their z scores, so the numbers on the axes are arbitrary. The arrow indicates the cluster of OPGs with PIE domains. For visualization, the z scores were capped at 20; individual OPG pairs can have a z score above 20. (B–D) Representative globular structures with no homologs identified outside poxviruses but recurring among the OPGs. (B) OPG27 (C7L) (purple) and OPG18 (orange), (C) OPG95 (L1R) (purple) and OPG53 (F9L) (orange), and (D) OPG153 (A26L) (purple, aa 1 to 359 [of 518]) and OPG152 (A25L) (orange, aa 1 to 332 [of 1,279]). PDF file, 0.7 MB.
FIG S5. Comparison between predicted structures and experimentally resolved structures. The modeled ORPV structures discussed in the manuscript were compared to their respective experimentally resolved structures whenever such a comparison was not made in any other figure. The modeled structures are colored according to their AlphaFold2 pLDDT scores, as shown in panel A, with the experimentally resolved structure in green. (A) OPG64 (E2L) and VACV E2 (7phy), (B) OPG116 (D4R) and VACV D4 (5jx8), (C) OPG120 (D8L) and VACV D8 (4e9o), and (D) OPG148 (A20R) and VACV A20 (N-term 4od8, dark green; C-term 6zyc, light green). PDF file, 1.0 MB.






