LETTER
The genus Rubivirus previously comprised a single member, Rubella virus (RUBV), which is spread by airborne or maternal-fetal transmission and only infects humans (1). This genus recently expanded with the identification of two new members, Ruhugu virus (RUHV), which is reported to use bats as its reservoir, and Rustrela virus (RUSV), which uses mice as a reservoir (2) (Fig. 1A). RUSV appears to be carried by field mice and was found to cause lethal encephalitis in capybaras, donkeys, and wallabies (2). Sequence analysis of these new rubiviruses suggested that the RUSV capsid protein (Cp) is considerably shorter than that of RUHV or RUBV, while having a correspondingly longer intergenic region located upstream of the Cp sequence (2) (Fig. 1A). To analyze the potential truncation of RUSV Cp, we aligned the Cps of the three rubiviruses (Fig. 1B) using the Clustal Omega program of UniProt and the ENDscript server (3). The entire RUBV/RUHV N-terminal Cp region (amino acids [aa] ∼1 to 110) was missing from mouse RUSV Cp (Fig. 1B). In the RUBV Cp, this polybasic region (PB) contains an RNA binding domain (RB) and is reported to be disordered (Fig. 1A) (4, 5). ProtParam (6) calculated the pI (isoelectric point) of the first 110 aa of the RUBV and RUHV Cps as ∼11.8, in keeping with their role in binding genomic RNA.
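As a rough illustration of what a theoretical pI calculation involves, the following Python sketch estimates the pH at which a peptide's net charge is zero, using the Henderson-Hasselbalch relation and bisection. The pK table, the function names `net_charge` and `estimate_pi`, and the two toy peptides are assumptions made for this example; they are not rubivirus sequences, and ProtParam's own pK table may differ, so the values returned here should not be expected to reproduce the figures reported above.

```python
# Illustrative pK values for ionizable groups (an assumption for this sketch;
# ProtParam may use a different table, so absolute pI values can differ).
POSITIVE_PK = {"K": 10.8, "R": 12.5, "H": 6.5, "Nterm": 8.6}
NEGATIVE_PK = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1, "Cterm": 3.6}


def net_charge(seq, ph):
    """Approximate net charge of a peptide at a given pH."""
    seq = seq.upper()
    charge = 0.0
    # Basic groups carry +1 when protonated.
    for group, pk in POSITIVE_PK.items():
        count = 1 if group == "Nterm" else seq.count(group)
        charge += count / (1.0 + 10 ** (ph - pk))
    # Acidic groups carry -1 when deprotonated.
    for group, pk in NEGATIVE_PK.items():
        count = 1 if group == "Cterm" else seq.count(group)
        charge -= count / (1.0 + 10 ** (pk - ph))
    return charge


def estimate_pi(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisect for the pH where the net charge crosses zero.

    Net charge decreases monotonically with pH, so bisection converges.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return round((lo + hi) / 2.0, 2)


# Hypothetical toy peptides, not taken from any rubivirus Cp.
basic_peptide = "RRKRGGRPKR"
acidic_peptide = "DEEDLSDE"
print(estimate_pi(basic_peptide), estimate_pi(acidic_peptide))
```

An arginine- and lysine-rich peptide yields a strongly basic estimate and an aspartate- and glutamate-rich peptide a strongly acidic one, which is the qualitative property used here to describe a region as polybasic.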
FIG 1.
Capsid of Rubivirus. (A, top) Diagram of the nonstructural (p150 and p90) and structural (Cp, E2, and E1) proteins (1), their positions (numbers at the bottom), and the intergenic region (black line) of the RUBV M33 strain (accession numbers Q86500 and P08563). (A, bottom) Expanded view of the Rubivirus Cp. The disordered region (green) with the RNA binding region (RB [gray]), C-terminal domain (CTD [orange]), and E2 signal sequence (blue) and their positions are indicated (4, 5). Reported natural reservoirs are mentioned in parentheses under each rubivirus. Note that the Cp of RUSV (∼200 aa) is considerably shorter than those of RUBV and RUHV (∼300 aa) (2). (B) Sequence alignment of the Rubivirus Cp. The aa sequences were aligned using Clustal Omega (UniProt), ENDscript server (3), and PDB file 4HAR (4). The N-terminal disordered region (sequence between the two cyan-shaded residues) with RB (gray shading), the secondary structure of the CTD region, and E2 signal sequence (E2SS [blue shading]) are indicated. The reported first five residues of RUSV Cp are marked with green shading (2). The Cys residues participating in inter-CTD dimer formation are indicated by the green numbers 1 and 2 (4). Strictly conserved residues and ∼66% conserved or similarly substituted (10) residues are indicated by the white text on red background and red text on white background, respectively. Gaps in the alignment are shown as dots (11). The beta-strands are labeled βA to βE. Helix αA corresponds to helix H in reference 4. Helix ηA corresponds to the small helix in the flexible loop region of Cp CTD (4). Beta turns are indicated by TT. Note that the N-terminal disordered region corresponding to aa ∼1 to 110 of RUBV or RUHV Cp is missing from RUSV Cp.
To examine the N-terminal region of the RUSV Cp more carefully, we first used the blastx feature of UniProt to find possible homologous sequences in the RUSV structural region. We defined the RUSV structural region as the nucleotide sequence starting immediately after the stop codon of the nonstructural open reading frame (NS ORF) and extending to the end of the genome. For all of the RUSV query sequences, blastx identified additional sequence upstream of, but in frame with, MATGR, the proposed start site (2) of RUSV Cp (Fig. 2A). The additional sequences shared >52% identity with RUBV Cp. This newfound sequence, starting from RRGGR, was 100% identical among the three RUSV isolates from different animals (Fig. 2A). We then manually translated the complete structural regions of the different RUSV isolates. By aligning the sequences that remained in frame with both RRGGR and MATGR, we found >90% conservation in the region upstream of the Cp among the RUSV isolates (Fig. 2B). We added these new sequences to the N terminus of RUSV Cp and aligned it with RUBV or RUHV Cp. The alignment with RUHV Cp is shown in Fig. 2C.
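The translation step underlying this in-frame check can be sketched in a few lines of Python. The example below builds the standard genetic code and translates a nucleotide string in a chosen forward frame, so that residues upstream of a proposed start site can be inspected in the same frame. It covers only translation, not the similarity search that blastx performs. The function name `translate` and the toy nucleotide string are assumptions for illustration; the string was invented to contain an opal (TGA) codon followed by codons for RRGGR and is not RUSV sequence.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order
# with the third codon position varying fastest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(nt, frame=0):
    """Translate a nucleotide string in forward frame 0, 1, or 2.

    Stop codons are written as '*'; codons containing ambiguous bases
    are written as 'X'. Translation does not halt at stops, so sequence
    on both sides of a stop codon stays visible in its frame.
    """
    nt = nt.upper().replace("U", "T")
    codons = (nt[i:i + 3] for i in range(frame, len(nt) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Hypothetical toy sequence: GCT TGA CGT CGT GGT GGT CGT ATG GCT
toy = "GCTTGACGTCGTGGTGGTCGTATGGCT"
for frame in range(3):
    print(frame, translate(toy, frame))
```

In frame 0 the toy sequence reads `A*RRGGRMA`: the residues after the stop codon and the downstream Met lie in one frame, which is the kind of relationship the analysis above looks for between the NS ORF stop codon, RRGGR, and MATGR.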
FIG 2.
Putative N-terminal region of RUSV capsid. (A) Sequence alignment of UniProt blastx results from the structural regions of the mouse (M_), capybara (C_), and donkey (D_) RUSV isolates. The blastx results show the upstream regions that have similarity to RUBV Cp and also remain in frame with the proposed start site (green shading) of RUSV Cp. The sequences shown inside the blue box are strictly conserved between RUSV Cps from different animals. The first and last five residues inside the blue box are also highlighted by yellow and green shading, respectively. (B) Alignment of frame 1 (F_1) of the extreme N-terminal portion of the RUSV structural polyprotein. The last three residues of the nonstructural ORF and the stop codon are marked by fuchsia shading and X (on top of the alignment), respectively. Note how the aligned sequences between the fuchsia and green shading remain in frame with the yellow shading as well. A region of D_RUSV (black box) varies between Fig. 2A and B. By aligning the D_RUSV genome to that of C_RUSV, we found that the D_RUSV genome is missing a single G nucleotide between positions 6061 and 6062. After incorporating a G nucleotide in this position, the F_1 of D_RUSV aligned strongly with that of the M_RUSV and C_RUSV genomes throughout the N-terminal portion of the structural region (between the fuchsia and yellow shading). The new sequence resulting from this G insertion is marked by a blue asterisk and was used for subsequent analysis. (C) Alignment of the polybasic region (PB) of RUSV Cp with that of RUHV Cp. Features are highlighted as in panel B. Strictly conserved (white text with red background) and 75% conserved or similarly substituted (10) (red text with white background) residues are shown. The sequence just under the initiating M residue of RUHV Cp is highlighted with gray shading for RUSV. For panels A to C, gaps are shown as dots in the alignment (11).
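Conservation figures for alignments such as these can be approximated by a column-wise comparison of two aligned sequences. The Python sketch below counts identical residues over the columns in which neither sequence has a gap, treating dots as gaps as in the figure. The function name `percent_identity`, the choice of denominator (ungapped columns only), and the toy strings are assumptions for this example; alignment tools differ in how they define the denominator, so this is not necessarily how the percentages reported in the text were computed.

```python
def percent_identity(aligned_a, aligned_b, gap="."):
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must already be aligned, i.e. have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, not taken from the figure.
print(percent_identity("RKG.R", "RRGGR"))
```

For the toy pair, four columns are compared and three match, giving 75.0.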
The pI of the newfound N-terminal sequence (from STAPH to MATGR) (Fig. 2C) of the capybara RUSV Cp is ∼12.2, so we termed this the polybasic region (PB). The highly basic nature of the RUSV PB suggests a possible RNA-binding function. Of note, the rubivirus structural polyprotein is translated from a subgenomic RNA that carries the starting Met (1), but none of our analyses identified a Met in frame with the PB in the RUSV structural region. The PB of the RUSV Cp sequences is in frame with the stop codon of the NS ORF (Fig. 2B and C), and thus a readthrough of the opal stop codon in the genomic RNA could allow the translation of a longer form of Cp. Alternatively, a noncanonical start site (7) could lead to translation of Cp containing all or some of the PB. A Met is present in a different reading frame in the structural region (at nucleotide [nt] positions 5824 to 5826 of C_RUSV). If this Met is used to initiate translation, a programmed ribosomal frameshift (7) could allow synthesis of the remainder of Cp, although with less conservation of Cp sequence and length among the three RUSV isolates. The RUSV intergenic region contains repetitive sequences and an area with exceptionally high GC content (87% GC between nt 5961 and 6149), which also makes sequence analysis challenging.
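Two of the checks described in this paragraph, the GC content of a defined window and the reading frame of a candidate Met codon relative to a reference position, are simple to express in code. The Python sketch below uses 1-based, inclusive nucleotide positions to mirror the numbering used in the text. The function names `gc_percent` and `atg_frames` and the toy sequence are assumptions for illustration; the toy string is not RUSV sequence.

```python
def gc_percent(nt, start, end):
    """Percent G+C in the window from start to end (1-based, inclusive)."""
    window = nt.upper()[start - 1:end]
    if not window:
        raise ValueError("empty window")
    gc = sum(window.count(base) for base in "GC")
    return 100.0 * gc / len(window)


def atg_frames(nt, ref_pos):
    """List every ATG with its frame offset relative to ref_pos.

    Returns (position, offset) pairs, where position is the 1-based
    index of the A and offset is 0 when the ATG is in frame with the
    codon starting at ref_pos, and 1 or 2 otherwise.
    """
    nt = nt.upper().replace("U", "T")
    hits = []
    index = nt.find("ATG")
    while index != -1:
        position = index + 1
        hits.append((position, (position - ref_pos) % 3))
        index = nt.find("ATG", index + 1)
    return hits


# Hypothetical toy sequence for demonstration only.
toy = "ATGGCGCGCCATGA"
print(gc_percent(toy, 4, 10))
print(atg_frames(toy, ref_pos=1))
```

In the toy sequence the second ATG starts at position 11 and has offset 1 relative to position 1, so it is out of frame with a codon beginning there; an offset of this kind is what would require a frameshift to join the two frames.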
The Cps of some other positive-strand RNA viruses also show unexpected features (8, 9). For example, the Cp of the flaviviruses dengue virus and tick-borne encephalitis virus can tolerate large deletions (8), while it is unclear whether some pegiviruses even contain a Cp (9). It remains to be determined whether and how the PB is expressed in RUSV-infected cells and, if it is not, how the RUSV Cp binds and packages viral RNA. Mapping of the subgenomic promoter and sequence analysis of the RUSV subgenomic RNA may help to resolve this paradox.
ACKNOWLEDGMENTS
This work was supported by a grant to M.K. from the NIH/NIAID (R01-AI075647). The content of this paper is solely the responsibility of the authors and does not necessarily represent the official views of the NIH/NIAID. The funders had no role in study design, data collection, and interpretation, or the decision to submit the work for publication.
We declare no conflicts of interest.
REFERENCES
- 1. Hobman TC. 2013. Rubella virus, Chapter 24, p 687–711. In Knipe DM, Howley PM, Cohen JI, Griffin DE, Lamb RA, Martin MA, Racaniello VR, Roizman B (ed), Fields virology, 6th ed. Lippincott Williams & Wilkins, Philadelphia, PA.
- 2. Bennett AJ, Paskey AC, Ebinger A, Pfaff F, Priemer G, Höper D, Breithaupt A, Heuser E, Ulrich RG, Kuhn JH, Bishop-Lilly KA, Beer M, Goldberg TL. 2020. Relatives of rubella virus in diverse mammals. Nature 586:424–428. doi: 10.1038/s41586-020-2812-9.
- 3. Robert X, Gouet P. 2014. Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Res 42:W320–W324. doi: 10.1093/nar/gku316.
- 4. Mangala Prasad V, Willows SD, Fokine A, Battisti AJ, Sun S, Plevka P, Hobman TC, Rossmann MG. 2013. Rubella virus capsid protein structure and its role in virus assembly and infection. Proc Natl Acad Sci U S A 110:20105–20110. doi: 10.1073/pnas.1316681110.
- 5. Liu Z, Yang D, Qiu Z, Lim KT, Chong P, Gillam S. 1996. Identification of domains in rubella virus genomic RNA and capsid protein necessary for specific interaction. J Virol 70:2184–2190. doi: 10.1128/JVI.70.4.2184-2190.1996.
- 6. Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD, Bairoch A. 2005. Protein identification and analysis tools on the ExPASy server, p 571–607. In Walker JM (ed), The proteomics protocols handbook. Humana Press, Totowa, NJ.
- 7. Firth AE, Brierley I. 2012. Non-canonical translation in RNA viruses. J Gen Virol 93:1385–1409. doi: 10.1099/vir.0.042499-0.
- 8. Byk LA, Gamarnik AV. 2016. Properties and functions of the dengue virus capsid protein. Annu Rev Virol 3:263–281. doi: 10.1146/annurev-virology-110615-042334.
- 9. Stapleton JT, Foung S, Muerhoff AS, Bukh J, Simmonds P. 2011. The GB viruses: a review and proposed classification of GBV-A, GBV-C (HGV), and GBV-D in genus Pegivirus within the family Flaviviridae. J Gen Virol 92:233–246. doi: 10.1099/vir.0.027490-0.
- 10. Pommié C, Levadoux S, Sabatier R, Lefranc G, Lefranc M-P. 2004. IMGT standardized criteria for statistical analysis of immunoglobulin V-REGION amino acid properties. J Mol Recognit 17:17–32. doi: 10.1002/jmr.647.
- 11. Fassler J, Cooper P. 2011. BLAST Glossary BLAST help. National Center for Biotechnology Information, Bethesda, MD.