Abstract
The genetic element s2m has been acquired through horizontal transfer by many distantly related viruses, including the SARS-related coronaviruses. Here we show that s2m is evolutionarily conserved in these viruses. Though several lineages of severe acute respiratory syndrome coronavirus 2 (SARS‑CoV‑2) devoid of the element can be found, these variants seem to have been short-lived, indicating that they were less evolutionarily fit than their s2m-containing counterparts. At the species level, however, there do not appear to be any losses, and this pattern strongly suggests that the s2m element is essential to virus replication in SARS-CoV-2 and related viruses. Further experiments are needed to elucidate the function of s2m.
Subject terms: Molecular evolution, Viral genetics
The coding capacity of severe acute respiratory syndrome coronavirus 2 (SARS‑CoV‑2) has been investigated in great detail1, and the secondary structure of genomic RNA elements has also been studied2,3, but the biological significance of all of these components has not yet been fully elucidated. One of the annotated elements in the reference SARS-CoV-2 genome is the stem-loop II (s2m) element (GenBank accession NC_045512.2, positions 29,728–29,768) that was originally described in astroviruses4. s2m is a 41-bp sequence located in the non-coding 3′ part of the SARS-CoV-2 genome. It has been found in members of at least four different virus families, including several lineages of coronaviruses5,6. There also seems to be a xenolog of s2m in some insect species, which likely results from endogenization of s2m-containing viral elements7. The evolutionary relationships between these homologs remain unclear, but it appears that s2m has been horizontally transferred between distantly related organisms several times6. The function is unknown, but the high degree of conservation is consistent with this locus being under selective pressure.
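As a small worked check of the coordinates quoted above, the following sketch converts the 1-based, inclusive GenBank positions into a 0-based Python slice and confirms that the interval spans 41 positions. The helper name `genbank_to_slice` is a hypothetical illustration and is not part of any analysis described here.

```python
def genbank_to_slice(start: int, end: int) -> slice:
    """Convert 1-based inclusive coordinates to a 0-based, end-exclusive slice."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return slice(start - 1, end)


# Positions of the annotated s2m element in NC_045512.2, as given in the text.
S2M_START, S2M_END = 29_728, 29_768

s2m_slice = genbank_to_slice(S2M_START, S2M_END)
s2m_length = s2m_slice.stop - s2m_slice.start  # 41 positions
```

Given a genome string `genome` (not supplied here), `genome[s2m_slice]` would return the annotated region.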
Phylogenetic analyses support several acquisitions of s2m within the coronavirus family, with one gain basal to a cluster of SARS-related betacoronaviruses5. This cluster encompasses both SARS-CoV and SARS-CoV-2, as well as many related virus species, primarily isolated from bat species7,8. We performed a comprehensive phylogenetic analysis to map the distribution of s2m within the family Coronaviridae (CoV). In particular, we assessed whether there have been any losses of s2m within the clusters where this motif can be found, with emphasis on the SARS-related species.
All CoV nucleotide and amino acid sequence data were downloaded from GenBank. Based on an alignment of protein sequences from distantly related CoV species, two regions within the ORF1ab polyprotein were identified that could reliably be aligned across a broad range of accessions. The corresponding amino acid sequences from the reference SARS-CoV-2 genome (NC_045512.2 coding positions 10,334–13,468 and 13,462–21,552) were used as query sequences in tblastn sequence similarity searches against the CoV nucleotide data. When tabulating the results, the best matching sequence for every unique GenBank ‘ORGANISM’ identifier was extracted (Supplementary Table 1). In order to score a viral species as having s2m, the motif had to be found near the 3′ end of the genome with a maximum of one mismatch compared to published s2m sequences4–7,9 in at least one accession from the corresponding ‘ORGANISM’ identifier. To remove redundancy in the 436 CoV amino acid sequences that were retrieved from GenBank while retaining their full phylogenetic diversity, we aligned them using MAFFT10 and removed ambiguously aligned blocks with GBLOCKS11. We then used mothur12 to cluster s2m-containing sequences and sequences devoid of s2m at distance thresholds of 0.1% and 2.5%, respectively. We chose a higher clustering threshold for sequences devoid of s2m because these sequences were not the focus of our study and were thus primarily included to place s2m-containing sequences in their evolutionary context. The resulting alignment of 133 amino acid sequences was subjected to a phylogenetic analysis using PhyML 3.013 with the LG + G + I model, as determined by ProtTest 314.
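The scoring rule described above (motif near the 3′ end, at most one mismatch to a published s2m sequence) can be illustrated with a minimal sketch. This is not the code used in the study: the function names, the 300-nt search window and the toy motif are assumptions chosen for illustration, and the toy motif is not a real s2m sequence.

```python
from typing import Iterable


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_motif_near_3prime(
    genome: str,
    motifs: Iterable[str],
    window: int = 300,        # assumed definition of "near the 3' end"
    max_mismatches: int = 1,  # threshold stated in the text
) -> bool:
    """Return True if any motif occurs in the last `window` nucleotides
    with at most `max_mismatches` substitutions (no indels considered)."""
    tail = genome[-window:].upper().replace("U", "T")
    for motif in motifs:
        m = motif.upper().replace("U", "T")
        for i in range(len(tail) - len(m) + 1):
            if hamming(tail[i:i + len(m)], m) <= max_mismatches:
                return True
    return False


# Toy data only: a made-up motif standing in for published s2m sequences.
TOY_MOTIF = "GATTACAGATTACA"
exact = "A" * 500 + "GATTACAGATTACA" + "C" * 50
one_mismatch = "A" * 500 + "GATTACAGTTTACA" + "C" * 50
two_mismatches = "A" * 500 + "GTTTACAGTTTACA" + "C" * 50
far_from_end = "GATTACAGATTACA" + "A" * 1000

results = [
    has_motif_near_3prime(g, [TOY_MOTIF])
    for g in (exact, one_mismatch, two_mismatches, far_from_end)
]
```

Under this sketch, a viral species would be scored as s2m-positive if the function returns True for at least one accession belonging to its ‘ORGANISM’ identifier.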
The resulting unrooted topology (Fig. 1) revealed three monophyletic clusters of s2m-containing operational taxonomic units (OTUs). The tree was highly supported, and in addition to two s2m-containing clades comprising isolates stemming from birds, a large group of SARS-related s2m-containing OTUs could readily be identified. This cluster included sequences sampled from several different bat species in addition to eight other vertebrates (Fig. 1). The most basal member of this cluster was Bat Hp-betacoronavirus Zhejiang2013, the only member thus far described from the Betacoronavirus subgenus Hibecovirus15. After excluding a small number of accessions without coverage in the 3′-end of the genome, this clade represented 183 unique ORGANISM identifiers (collapsed into 44 mothur-defined groups, see Supplementary Table 1) that all contained s2m.
Though s2m showed no species-level losses within any of the three clusters, the vast amount of sequence data available from SARS-CoV-2 isolates permits a detailed analysis of how this motif might behave at the level of virus lineages. Sequence data and corresponding metadata from 537,360 SARS-CoV-2 isolates were downloaded from the GISAID database16. The 3′ end of high-quality genomes was screened for the presence of s2m single nucleotide polymorphisms (SNPs) and indels. A large number of SNP variants were observed, and, as expected, many of these correlated strongly with virus lineages (as defined by PANGOLIN annotation; Supplementary Table 2)17. Looking at indel variants, there also appeared to be lineage-specific variability, and several isolates with complete deletion of s2m were observed (Fig. 2; Supplementary Table 3). Two lineages (B.1.1.311 and B.1.160.7) were found to be dominated by s2m deletion mutants (representing 63.7% and 76.1% of the submitted sequences, respectively). Lineage B.1.1.311 had complete deletion of the entire s2m region, whereas lineage B.1.160.7 had a smaller lesion (Fig. 2; DelSeq_1183 and DelSeq_325). Both lineages seem to have peaked in the fall of 2020 and to have emerged within the United Kingdom (https://cov-lineages.org/lineages/lineage_B.1.1.311.html and https://cov-lineages.org/lineages/lineage_B.1.160.7.html). These lineages have evidently been viable, but their subsequent decline could imply that they were less fit than other emerging strains. Phylogenetic analyses of lineages containing s2m deletion mutants indicated that the primary genetic lesion is often the deletion of a small section of s2m, followed by complete elimination of the element from the lineage’s genome (data not shown).
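The per-lineage proportions reported above are simple ratios of deletion mutants to submitted sequences. A minimal sketch of such a tabulation follows; the record layout (lineage label, deletion flag), the function name and the toy records are assumptions for illustration and do not reproduce the GISAID data or the pipeline used here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def deletion_fraction_by_lineage(
    records: Iterable[Tuple[str, bool]],
) -> Dict[str, float]:
    """For each lineage, return the fraction of isolates flagged as
    carrying a deletion in the s2m region."""
    totals: Counter = Counter()
    deleted: Counter = Counter()
    for lineage, has_deletion in records:
        totals[lineage] += 1
        if has_deletion:
            deleted[lineage] += 1
    return {lineage: deleted[lineage] / totals[lineage] for lineage in totals}


# Toy records with made-up lineage labels: (lineage, s2m deletion observed?)
toy_records = [
    ("toy.A", True), ("toy.A", True), ("toy.A", True), ("toy.A", False),
    ("toy.B", False), ("toy.B", False),
]

fractions = deletion_fraction_by_lineage(toy_records)

# Lineages in which deletion mutants make up more than half of the isolates.
dominated = sorted(lin for lin, frac in fractions.items() if frac > 0.5)
```

The 0.5 cut-off for calling a lineage "dominated" by deletion mutants is likewise an assumption of this sketch rather than a criterion stated in the text.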
Coronaviruses have been studied extensively in order to identify regions under selective pressure18. A recent study identified s2m as having the highest mutation rate in the SARS-CoV-2 genome and the authors suggest that this could be interpreted as either loss of purifying constraints or the result of diversifying selection19. Some early reports also proposed that s2m could be involved in recombination events20. It is reasonable to assume that the function of s2m is tightly linked with the element’s secondary structure. Assuming that the structure is not dependent on interactions with factors that have yet to be identified, an analysis of the canonical SARS-CoV-2 genome using an in vivo-based approach indicated that the structure of s2m deviates significantly from the structure observed for SARS-CoV3. The two versions of s2m differ in two positions, constituting two transversions that both seem to disrupt the stem-forming ability of s2m3. It is thus unclear if s2m in SARS-CoV and SARS-CoV-2 are functionally equivalent.
In our opinion, the fact that this element never seems to be lost at the species level within the SARS-related coronaviruses suggests that s2m became essential to virus replication after being acquired through horizontal transfer. Both cellular genes and non-coding RNAs acquired by double-stranded DNA viruses through horizontal transfer have been shown to become fixed in viral species, most likely due to their positive effect on viral replication21–24. In contrast, populations of the AcMNPV baculovirus continuously receive transposable elements (TEs) from their moth hosts, but all TE copies integrated into the viral genomes are rapidly lost, probably because they impose a fitness cost on the virus25,26. We argue that if s2m were non-essential for viral replication, its distribution within the SARS-related coronaviruses would be significantly more patchy, due to frequent losses.
Though the exact function of s2m has not yet been revealed, several hypotheses have been entertained, including interference with the translational machinery of infected cells9, involvement in gene regulation through RNA silencing6 and protection of genomic/transcriptomic virus RNA from ribonuclease degradation5. The effect of swapping the 3′ UTR regions between s2m-containing and s2m-deficient virus strains has also been investigated, but the effect on viral fitness seems subtle27. Due to its highly conserved nature, s2m has also been suggested as a potential drug target28,29 and as a polymerase chain reaction (PCR) primer site for exploring virus diversity30. Further studies are needed in order to elucidate the role of s2m in virus replication, not just within the coronaviruses, but in all virus families where this horizontally transferred element has been detected.
Author contributions
The project was conceptualized by T.T. C.G. did all the phylogenetic analyses, and the final version of the manuscript was written by both authors.
Funding
This article was funded by Agence Nationale de la Recherche (Grant Number: ANR-18-CE02-0021-01 TranspHorizon).
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-021-95496-4.
References
1. Finkel Y, et al. The coding capacity of SARS-CoV-2. Nature. 2021;589:125–130. doi: 10.1038/s41586-020-2739-1.
2. Wacker A, et al. Secondary structure determination of conserved SARS-CoV-2 RNA elements by NMR spectroscopy. Nucleic Acids Res. 2020;48:12415–12435. doi: 10.1093/nar/gkaa1013.
3. Huston NC, et al. Comprehensive in vivo secondary structure of the SARS-CoV-2 genome reveals novel regulatory motifs and mechanisms. Mol. Cell. 2021;81:584–598 e585. doi: 10.1016/j.molcel.2020.12.041.
4. Monceyron C, Grinde B, Jonassen TO. Molecular characterisation of the 3′-end of the astrovirus genome. Arch. Virol. 1997;142:699–706. doi: 10.1007/s007050050112.
5. Tengs T, Jonassen CM. Distribution and evolutionary history of the mobile genetic element s2m in coronaviruses. Diseases. 2016. doi: 10.3390/diseases4030027.
6. Tengs T, Kristoffersen AB, Bachvaroff TR, Jonassen CM. A mobile genetic element with unknown function found in distantly related viruses. Virol. J. 2013;10:132. doi: 10.1186/1743-422x-10-132.
7. Tengs T, Delwiche CF, Monceyron Jonassen C. A genetic element in the SARS-CoV-2 genome is shared with multiple insect species. J. Gen. Virol. 2021. doi: 10.1099/jgv.0.001551.
8. Dimonaco NJ, Salavati M, Shih BB. Computational analysis of SARS-CoV-2 and SARS-like coronavirus diversity in human, bat and pangolin populations. Viruses. 2020. doi: 10.3390/v13010049.
9. Robertson MP, et al. The structure of a rigorously conserved RNA element within the SARS virus genome. PLoS Biol. 2005;3:e5. doi: 10.1371/journal.pbio.0030005.
10. Katoh K, Rozewicki J, Yamada KD. MAFFT online service: Multiple sequence alignment, interactive sequence choice and visualization. Brief Bioinform. 2019;20:1160–1166. doi: 10.1093/bib/bbx108.
11. Talavera G, Castresana J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 2007;56:564–577. doi: 10.1080/10635150701472164.
12. Schloss PD, et al. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 2009;75:7537–7541. doi: 10.1128/AEM.01541-09.
13. Guindon S, et al. New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Syst Biol. 2010;59:307–321. doi: 10.1093/sysbio/syq010.
14. Darriba D, Taboada GL, Doallo R, Posada D. ProtTest 3: Fast selection of best-fit models of protein evolution. Bioinformatics. 2011;27:1164–1165. doi: 10.1093/bioinformatics/btr088.
15. Wu Z, et al. ORF8-related genetic evidence for Chinese horseshoe bats as the source of human severe acute respiratory syndrome coronavirus. J. Infect. Dis. 2016;213:579–583. doi: 10.1093/infdis/jiv476.
16. Elbe S, Buckland-Merrett G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Glob. Chall. 2017;1:33–46. doi: 10.1002/gch2.1018.
17. Rambaut A, et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 2020;5:1403–1407. doi: 10.1038/s41564-020-0770-5.
18. Forni D, Cagliani R, Clerici M, Sironi M. Molecular evolution of human coronavirus genomes. Trends Microbiol. 2017;25:35–48. doi: 10.1016/j.tim.2016.09.001.
19. Chiara M, Horner DS, Gissi C, Pesole G. Comparative genomics reveals early emergence and biased spatio-temporal distribution of SARS-CoV-2. Mol. Biol. Evol. 2021. doi: 10.1093/molbev/msab049.
20. Yeh TY, Contreras GP. Emerging viral mutants in Australia suggest RNA recombination event in the SARS-CoV-2 genome. Med. J. Aust. 2020;213:44. doi: 10.5694/mja2.50657.
21. Holzerlandt R, Orengo C, Kellam P, Alba MM. Identification of new herpesvirus gene homologs in the human genome. Genome Res. 2002;12:1739–1748. doi: 10.1101/gr.334302.
22. Guo YE, Riley KJ, Iwasaki A, Steitz JA. Alternative capture of noncoding RNAs or protein-coding genes by herpesviruses to alter host T cell function. Mol. Cell. 2014;54:67–79. doi: 10.1016/j.molcel.2014.03.025.
23. Theze J, Takatsuka J, Nakai M, Arif B, Herniou EA. Gene acquisition convergence between entomopoxviruses and baculoviruses. Viruses-Basel. 2015;7:1960–1974. doi: 10.3390/v7041960.
24. McFadden G, Murphy PM. Host-related immunomodulators encoded by poxviruses and herpesviruses. Curr. Opin. Microbiol. 2000;3:371–378. doi: 10.1016/S1369-5274(00)00107-7.
25. Gilbert C, et al. Continuous influx of genetic material from host to virus populations. PLoS Genet. 2016;12:e1005838. doi: 10.1371/journal.pgen.1005838.
26. Gilbert C, Cordaux R. Viruses as vectors of horizontal transfer of genetic material in eukaryotes. Curr. Opin. Virol. 2017;25:16–22. doi: 10.1016/j.coviro.2017.06.005.
27. Goebel SJ, Taylor J, Masters PS. The 3′ cis-acting genomic replication element of the severe acute respiratory syndrome coronavirus can function in the murine coronavirus genome. J. Virol. 2004;78:7846–7851. doi: 10.1128/Jvi.78.14.7846-7851.2004.
28. Lulla V, et al. Targeting the conserved stem loop 2 motif in the SARS-CoV-2 genome. J. Virol. 2021;95:e0066321. doi: 10.1128/JVI.00663-21.
29. Manfredonia I, et al. Genome-wide mapping of SARS-CoV-2 RNA structures identifies therapeutically-relevant elements. Nucleic Acids Res. 2020;48:12436–12452. doi: 10.1093/nar/gkaa1053.
30. Jonassen CM. Detection and sequence characterization of the 3′-end of coronavirus genomes harboring the highly conserved RNA motif s2m. Methods Mol. Biol. 2008;454:27–34. doi: 10.1007/978-1-59745-181-9_3.