Abstract
Reverse-transcribing animal DNA viruses include the hepadnaviruses, a well-characterized family of small enveloped viruses that infect vertebrates, and a sister group of nonenveloped viruses more recently discovered in fish and termed the nackednaviruses. Here, we describe the complete sequence of a virus found in the feces of an insectivorous bat, which encodes a core protein and a reverse transcriptase but no envelope protein. A database search identified a viral sequence from a permafrost sample as its closest relative. The two viruses form a cluster that occupies a basal phylogenetic position relative to hepadnaviruses and nackednaviruses, with an estimated divergence time of more than 500 My. These findings may lead to the definition of a “proto-nackednavirus” family and support the hypothesis that the ancestors of hepadnaviruses were nonenveloped.
Keywords: pararetroviruses, virus evolution, metagenomics
Two major families of nonintegrated reverse-transcribing DNA viruses have been defined by the International Committee on Taxonomy of Viruses: Caulimoviridae (1) and Hepadnaviridae (2). While caulimoviruses infect plants, hepadnaviruses infect vertebrates, including humans. In particular, human hepatitis B virus (HBV) is a leading cause of cirrhosis and hepatocellular carcinoma worldwide (3). In 2017, a new group of viruses distantly related to hepadnaviruses was identified in teleost fishes (4), and related viral sequences have since been reported (5). These viruses may represent a viral family called the nackednaviruses, or Nudnaviridae (2). Their genome is slightly smaller than that of hepadnaviruses (2.7 to 3.1 kb versus 3.0 to 3.4 kb), and importantly, they do not encode an envelope protein. The characterization of nackednaviruses suggested an ancient origin for hepadnaviruses from nonenveloped ancestors in fish (4).
In a recent study, we described DNA virus sequences from fecal samples of different bat species sampled in Spain (6). Analysis of the raw data from this study led us to identify a viral contig with an ambiguous taxonomic classification within the order Blubervirales. This sequence was obtained from the feces of a single Myotis scalerai individual captured in June 2022 in the Sima de la Higuera, a cave near the village of Pliego in the province of Murcia, Spain. PCR and sequencing of the cytochrome B gene confirmed that this sample was from M. scalerai, an insectivorous species (7). The viral contig corresponded to a complete circular genome of 3,472 nt, as indicated by its terminal redundancy, and was sequenced with an average coverage of 150 reads per base. Its identity was then confirmed by PCR using sequence-specific primers that allowed us to amplify the entire viral genome in overlapping fragments.
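The link between terminal redundancy and circularity can be illustrated with a minimal Python sketch. It assumes that a circular genome assembled as a linear contig repeats its first bases at its 3' end; the function names and the `min_len` threshold are hypothetical and do not describe the assembly pipeline actually used in this study.

```python
def terminal_overlap(contig: str, min_len: int = 10) -> int:
    """Return the length of the longest suffix of `contig` that equals its prefix.

    Only overlaps between `min_len` and half the contig length are considered.
    Returns 0 when no such overlap exists.
    """
    seq = contig.upper()
    # Try the longest candidate first so the first hit is the longest overlap.
    for k in range(len(seq) // 2, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def circularize(contig: str, min_len: int = 10) -> str:
    """Trim the redundant 3' end so that the circular genome is represented once."""
    k = terminal_overlap(contig, min_len)
    return contig[:-k] if k else contig


# Toy example: a 24-nt "genome" whose first 12 nt are repeated at the contig end.
genome = "ATGCGTACGTTAGCCTAGGATCCA"
contig = genome + genome[:12]
print(terminal_overlap(contig))       # 12
print(circularize(contig) == genome)  # True
```

Exact string matching is used here for simplicity; real contigs may carry sequencing errors in the redundant ends, in which case an alignment-based comparison would be more appropriate.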
This virus, which we called Pliego virus, contained two open reading frames (ORFs) of 1,149 and 2,376 nt homologous to, but highly divergent from, the capsid/core (C) and polymerase (P) of hepadnaviruses and nackednaviruses. The predicted C protein of Pliego virus was considerably larger than that of hepadnaviruses and nackednaviruses (382 aa versus 180 to 260 aa). Protein BLAST (BLASTp) analysis indicated that the closest sequences for the C and P proteins corresponded to HBV (GenBank accession ANQ89943.1) and fish-associated HBV (WAQ80622.1), respectively. However, the presence of multiple stop codons in the S-congruent reading frame revealed a typical nackednavirus feature. The Pliego virus genome contained three additional putative ORFs unrelated to those of other nackednaviruses or hepadnaviruses (Fig. 1A), but the structure and function of the corresponding proteins could not be predicted.
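As a worked illustration of the figures above, the following Python sketch converts ORF lengths into protein lengths and counts in-frame stop codons. It assumes that the reported ORF lengths include the stop codon (which is consistent with 1,149 nt encoding 382 aa) and that the standard genetic code applies; the function names are hypothetical, and the toy sequence is not taken from the Pliego virus genome.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def protein_length(orf_nt: int) -> int:
    """Number of residues encoded by an ORF whose length includes the stop codon."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_nt // 3 - 1


def count_stops(seq: str, frame: int = 0) -> int:
    """Count stop codons in `seq` when read from offset `frame` (0, 1, or 2)."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return sum(codon in STOP_CODONS for codon in codons)


print(protein_length(1149))  # C ORF: 382
print(protein_length(2376))  # P ORF: 791 (derived under the same assumption)

# Toy sequence: two stops in frame 0 (TAA, TGA), none in frames 1 and 2.
toy = "ATGTAAGGGTGA"
print([count_stops(toy, f) for f in range(3)])  # [2, 0, 0]
```

Applied to the frame that corresponds to S in hepadnaviruses, a count well above zero would indicate, as described above, that no envelope protein can be encoded there.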
Fig. 1.
Comparative genome annotation. (A) Genome organization of Pliego virus, RNDV, and HBV. In addition to genes with known function, putative ORFs and two short direct repeats of sequence AACTTCTACTGCAC (DR1 and DR2) are indicated. (B) Similarity between the amino acid sequences of Pliego virus and RNDV or HBV for the C (Left) and P (Right) proteins, using a 30-residue sliding window. Core conserved motifs CI-CIII, terminal protein domain (TP), and reverse transcriptase and RNaseH domains (RT/RH) are shown. (C) Alignments of the three core conserved motifs (mapping to residues 36 to 48, 303 to 312, and 326 to 341 of the C protein in the Pliego virus sequence) and selected regions of P domains for seven representatives of different hepadnavirus genera, two nackednaviruses, Pliego virus, and Toolik virus. Note that, in the conserved GLY motif of TP, L is replaced by V in Pliego virus and by I in Toolik virus. In contrast, the tyrosine-methionine-aspartate-aspartate (YMDD) motif of the RT/RH is fully conserved. (D, Left) Alignment of the C protein α-helices α4b and α5 for representatives of hepadnaviruses and nackednaviruses, along with Pliego and Toolik viruses. (D, Right) Conservation of the predicted secondary structure along the C protein of Pliego and Toolik viruses. Black waves, yellow arrows, and red lines correspond to α-helices, β-sheets, and loops, respectively.
A similarity plot of ORF C against a representative nackednavirus (rockfish nackednavirus, RNDV) and HBV showed very high divergence (10.5% and 11.9% overall amino acid identity, respectively; Fig. 1B). Nevertheless, we were able to identify conserved core motifs of vertebrate HBVs (8), which were more similar to those of hepadnaviruses than to those of nackednaviruses (Fig. 1C). Structure prediction with Phyre2 and mapping of predicted secondary structures to the sequence alignment also suggested conservation of the core α-helix 4b found in hepadnaviruses and nackednaviruses (4, 9) and, to a lesser extent, of α-helix 5 (Fig. 1D). The P protein was less divergent than C (23.8% and 18.3% overall amino acid identity with RNDV and HBV, respectively; Fig. 1B). In the Blubervirales, P contains a reverse transcriptase (RT; IPR000477) and RNaseH (IPR001462) domain, as well as a terminal protein (TP; IPR000201) domain involved in the initiation of reverse transcription (10), all of which were detected (Fig. 1C). However, the RNA element epsilon, whose secondary structure is essential for the priming of reverse transcription (11), was not found. As in nackednaviruses (4), the spacer between the TP and RT was shorter than that found in hepadnaviruses. Other genomic motifs conserved in nackednaviruses and hepadnaviruses were identified, such as the direct repeats (DR1 and DR2) containing the TATA box and the polyadenylation signal.
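A sliding-window similarity profile of the kind described above can be sketched in Python as follows. The 30-residue window comes from the legend of Fig. 1B; everything else (the function name, the treatment of gaps, and the use of percent identity as the similarity measure) is an assumption made for illustration and may differ from the procedure actually used to produce the figure.

```python
def window_identity(aln_a: str, aln_b: str, window: int = 30) -> list[float]:
    """Percent identity in each sliding window of two aligned sequences.

    Columns where both sequences have a gap are ignored. A column with a gap
    in only one sequence counts as a mismatch (assumption).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep every column except those that are gaps in both sequences.
    cols = [(a, b) for a, b in zip(aln_a.upper(), aln_b.upper())
            if not (a == "-" and b == "-")]
    matches = [int(a == b and a != "-") for a, b in cols]
    if len(matches) < window:
        return []
    # Running sum: add the column entering the window, drop the one leaving it.
    total = sum(matches[:window])
    profile = [100.0 * total / window]
    for i in range(window, len(matches)):
        total += matches[i] - matches[i - window]
        profile.append(100.0 * total / window)
    return profile


# Toy alignment with a single mismatch and a 3-residue window.
print(window_identity("ACDEFG", "ACDQFG", window=3))
```

With real data, `aln_a` and `aln_b` would be two rows of a pairwise or multiple alignment of the C or P proteins, and the returned profile would be plotted against the window start position.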
We set out to detect similar sequences in databases. To this end, we first searched two large bat metagenomics projects (PRJNA953205, PRJNA929070), but this yielded no hits. We then carried out a BLASTp analysis of the C and P proteins in IMG/VR v4, an expanded database of uncultivated virus genomes (12). This revealed an unpublished 3,316 nt viral sequence (BK068391) assembled in a metagenomics study of permafrost soil carried out at the Toolik Field Station, Alaska. This “Toolik virus” shared amino acid sequence identities of 34.5% and 43.4% with Pliego virus C and P proteins, respectively, and also lacked a gene encoding an S protein. Despite the high divergence between the Pliego and Toolik sequences, the predicted secondary structure of the C protein was remarkably conserved (Fig. 1 C and D).
We then inferred the phylogenetic relationships between Pliego virus, Toolik virus, hepadnaviruses, nackednaviruses, and other reverse-transcribing viruses (Fig. 2A). In this analysis, we also included HEART insect endogenous retroelements related to Blubervirales (13). The RT phylogeny showed that nackednaviruses and hepadnaviruses formed well-supported sister clades, whereas the Pliego and Toolik viruses formed a separate cluster that occupied a basal position relative to the nackedna/hepadnavirus group. Following previous work (4), we also obtained a time-calibrated Bayesian tree of the P protein sequence (excluding HEART elements), using an endogenous avihepadnaviral element (eAHBV-FRY) integrated into the Neoaves genome for calibration (4, 14, 15). We estimated a divergence time of about 450 My between nackednaviruses and hepadnaviruses, similar to the value obtained in previous work (4), while Pliego and Toolik viruses would have diverged more than 500 Mya (Fig. 2B). We propose that the order Blubervirales should include hepadnaviruses, nackednaviruses, and the Pliego/Toolik clade, which we tentatively designate as “proto-nackednaviruses.” Moreover, our results support the hypothesis that ancestral nonintegrated reverse-transcribing animal DNA viruses were nonenveloped and that the S ORF in hepadnaviruses probably arose by genetic overprinting (4).
Fig. 2.
Phylogenetic position of the newly described cluster. (A) Bayesian phylogenetic tree of the RT domain. The Pliego and Toolik sequences were added to the alignment used in a previous work (13) including blubervirus-related insect HEART retroelements. Scale bar, number of amino acid substitutions per site. (B) Time-calibrated Bayesian tree of the P protein. The Pliego virus and Toolik virus sequences were added to the alignment used in a previous work (4). Scale bar, Mya. Nodes are collapsed by genus or family, and the number of sequences in each clade is given in parentheses. Enveloped viruses are shown in yellow and nonenveloped viruses in green. Numbers at branching nodes indicate posterior probability values.
According to the above tree, proto-nackednaviruses might have diverged from other Blubervirales around the Cambrian. Hence, the rapid radiation of early metazoans during this time might have been accompanied by diversification of bluberviruses. However, uncertainties in estimated divergence times, cross-species transmission events, and the fact that many viruses remain undiscovered complicate evolutionary inferences. Cross-species transmission is generally thought to be rare among DNA viruses (16, 17) but has been suggested for bat hepadnaviruses (18) and fish nackednaviruses (19).
In contrast to nackednaviruses, which have been found in teleost fish (4, 5), the identified proto-nackednaviruses are unlikely to be fish viruses since they were obtained from the feces of an insectivorous bat and permafrost soil. We speculate that these might be arthropod viruses, but alternative possibilities cannot be ruled out at present. Further research is warranted to test whether Pliego or related viruses could infect bats, as bats are a known source of primate hepadnaviruses (20).
Methods
Bats were captured and identified at the species level, and fresh fecal samples were collected in accordance with European and Spanish regulations. Extracted nucleic acids were used for cytochrome B Sanger sequencing and de novo sequencing on a NextSeq instrument. Contigs were assembled and used for viral sequence detection. PCR amplification and Sanger sequencing were performed to confirm the Pliego virus sequence. Sequence annotation included ORF search and prediction of protein domains, structure, and function. Sequence diversity and phylogenetic analyses were performed using R, Biostrings, and Bayesian inference.
Supplementary Material
Appendix 01 (PDF)
Acknowledgments
This research was financially supported by grant PID2020-118602RB-I00 from the Spanish Ministerio de Ciencia e Innovación to J.M.C. and R.S., grant CIAICO/2022/110 from the Conselleria de Educación, Universidades y Empleo (Generalitat Valenciana) to R.S., and European Research Council (ERC) Advanced Grant 101019724-EVADER to R.S.
Author contributions
J.M.C. and R.S. designed research; J.B., A.V., R.M.-R., and J.S.M. performed research; J.B. analyzed data; and J.M.C. and R.S. wrote the paper.
Competing interests
The authors declare no competing interest.
Contributor Information
José M. Cuevas, Email: cuevast@uv.es.
Rafael Sanjuán, Email: rafael.sanjuan@uv.es.
Data, Materials, and Software Availability
Previously published data were used for this work (6). The viral contig described in this study was deposited in GenBank under accession number PQ119727 (21). Other data are available at NCBI BioProject (22, 23).
References
- 1. Teycheney P.-Y., et al., ICTV virus taxonomy profile: Caulimoviridae. J. Gen. Virol. 101, 1025–1026 (2020).
- 2. Magnius L., et al., ICTV virus taxonomy profile: Hepadnaviridae. J. Gen. Virol. 101, 571–572 (2020).
- 3. Jeng W.-J., Papatheodoridis G. V., Lok A. S. F., Hepatitis B. Lancet Lond. Engl. 401, 1039–1052 (2023).
- 4. Lauber C., et al., Deciphering the origin and evolution of Hepatitis B viruses by means of a family of non-enveloped fish viruses. Cell Host Microbe 22, 387–399.e6 (2017).
- 5. Ford C. E., Dunn C. D., Leis E. M., Thiel W. A., Goldberg T. L., Five species of wild freshwater sport fish in Wisconsin, USA, reveal highly diverse viromes. Pathog. Basel Switz. 13, 150 (2024).
- 6. Buigues J., et al., Full-genome sequencing of dozens of new DNA viruses found in Spanish bat feces. Microbiol. Spectr. 12, e0067524 (2024), 10.1128/spectrum.00675-24.
- 7. Novella-Fernandez R., et al., Trophic resource partitioning drives fine-scale coexistence in cryptic bat species. Ecol. Evol. 10, 14122–14136 (2020).
- 8. Dill J. A., et al., Distinct viral lineages from fish and amphibians reveal the complex evolutionary history of hepadnaviruses. J. Virol. 90, 7920–7933 (2016).
- 9. Pfister S., et al., Structural conservation of HBV-like capsid proteins over hundreds of millions of years despite the shift from non-enveloped to enveloped life-style. Nat. Commun. 14, 1574 (2023).
- 10. Clark D. N., Flanagan J. M., Hu J., Mapping of functional subdomains in the terminal protein domain of Hepatitis B virus polymerase. J. Virol. 91, e01785-16 (2017).
- 11. Beck J., Seitz S., Lauber C., Nassal M., Conservation of the HBV RNA element epsilon in nackednaviruses reveals ancient origin of protein-primed reverse transcription. Proc. Natl. Acad. Sci. U.S.A. 118, e2022373118 (2021).
- 12. Camargo A. P., et al., IMG/VR v4: An expanded database of uncultivated virus genomes within a framework of extensive functional, taxonomic, and ecological metadata. Nucleic Acids Res. 51, D733–D743 (2023).
- 13. Gong Z., Han G.-Z., Insect retroelements provide novel insights into the origin of Hepatitis B viruses. Mol. Biol. Evol. 35, 2254–2259 (2018).
- 14. Suh A., Brosius J., Schmitz J., Kriegs J. O., The genome of a Mesozoic paleovirus reveals the evolution of hepatitis B viruses. Nat. Commun. 4, 1791 (2013).
- 15. Claramunt S., Cracraft J., A new time tree reveals Earth history’s imprint on the evolution of modern birds. Sci. Adv. 1, e1501005 (2015).
- 16. Geoghegan J. L., Duchêne S., Holmes E. C., Comparative analysis estimates the relative frequencies of co-divergence and cross-species transmission within viral families. PLoS Pathog. 13, e1006215 (2017).
- 17. Costa V. A., et al., Limited cross-species virus transmission in a spatially restricted coral reef fish community. Virus Evol. 9, vead011 (2023).
- 18. Nie F.-Y., et al., Extensive diversity and evolution of hepadnaviruses in bats in China. Virology 514, 88–97 (2018).
- 19. Costa V. A., et al., Host adaptive radiation is associated with rapid virus diversification and cross-species transmission in African cichlid fishes. Curr. Biol. CB 34, 1247–1257.e3 (2024).
- 20. Drexler J. F., et al., Bats carry pathogenic hepadnaviruses antigenically related to hepatitis B virus and capable of infecting human hepatocytes. Proc. Natl. Acad. Sci. U.S.A. 110, 16151–16156 (2013).
- 21. Buigues J., et al., A new clade of pararetroviruses distantly related to hepadnaviruses and nackednaviruses. NCBI. https://www.ncbi.nlm.nih.gov/nuccore/PQ119727. Deposited 25 August 2024.
- 22. Chen Y. M., et al., Host traits govern virome composition and viral cross-species transmission in small mammals. NCBI BioProject. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA953205/. Accessed 18 March 2024.
- 23. Wang J., et al., Individual bat virome analysis reveals co-infection and spillover among bats and virus zoonotic potential. NCBI BioProject. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA929070/. Accessed 18 March 2024.


