Abstract
Metatranscriptome sequencing has dramatically expanded the known diversity of the global RNA virome and, in particular, suggested several new candidate phyla of riboviruses. Using double-stranded RNA (dsRNA) sequencing, we report here five complete, bisegmented RNA genomes of a putative phylum-level group, paraxenoviruses, identified from marine environments. Phylogenetic analysis of the RNA-directed RNA polymerases of paraxenoviruses demonstrated their affinity with the ribovirus order Durnavirales within the class Duplopiviricetes of the phylum Pisuviricota. The order Durnavirales includes the family Cystoviridae, which consists of well-characterized dsRNA bacteriophages, and the less thoroughly studied family Picobirnaviridae, whose members are also suspected to infect bacteria. Consistently, modeling and analysis of the structure of the predicted capsid protein (CP) of several paraxenoviruses revealed similarity to the picobirnavirus CP, although the paraxenovirus CP is much larger and contains unique structural elaborations. Taken together, these affinities suggest that paraxenoviruses represent a distinct family within Durnavirales, which we provisionally name “Paraxenoviridae”. Both genomic segments in Picobirnaviridae and “Paraxenoviridae” encompass multiple open reading frames, each preceded by a typical bacterial ribosome-binding site, strongly suggesting that these families consist of bacterial viruses. A search for homologs of paraxenovirus genes shows widespread distribution of this virus group in the global ocean, suggesting an important contribution to marine microbial ecosystems. Our findings further expand the diversity and ecological role of the bacterial RNA virome, reveal extensive structural variability of RNA viral CPs, and demonstrate the common ancestry of several distinct families of bacterial viruses with dsRNA genomes.
Keywords: marine RNA virus, metatranscriptome, dsRNA sequencing, bacteriophages, virus evolution
Introduction
In the comprehensive taxonomy of viruses that was recently adopted by the International Committee on Taxonomy of Viruses, the vast majority of RNA viruses without a DNA stage in their reproduction cycles comprise the kingdom Orthornavirae within the realm Riboviria [1]. All these viruses share a single conserved gene that encodes the enzyme responsible for virus RNA replication, RNA-directed RNA polymerase (RdRP). Based on the phylogeny of the RdRPs, the kingdom Orthornavirae splits into seven phyla corresponding to five major clades in the tree, Lenarviricota, Pisuviricota, Kitrinoviricota, Duplornaviricota, and Negarnaviricota [2–4], and two smaller phyla, Ambiviricota and Artimaviricota, created more recently [5, 6].
In the last few years, culture-independent metatranscriptome analysis followed by phylogenetic analyses of RdRPs dramatically expanded the diversity and ecological spread of RNA viruses in the kingdom Orthornavirae [7–11]. For example, metatranscriptome analysis of a single large coastal water sample from the Yangtze estuary in China yielded more than 4500 previously unidentified distinct members of Orthornavirae, doubling the number of known RNA virus lineages at the level between species and genus [8]. Subsequent large-scale metatranscriptome studies have expanded the diversity of the known riboviruses by at least an order of magnitude [9–11]. Most of the discovered viruses fall into the original five phyla, but in the latest Global Ocean RNA metatranscriptome study, five new candidate phyla were proposed (Taraviricota, Pomiviricota, Arctiviricota, Paraxenoviricota, and Wamoviricota) [11], and two distinct candidate phyla, tentatively called “p.0001” and “p.0002”, were proposed in another large-scale metatranscriptome study [10]. More recently, an artificial intelligence approach (a deep learning algorithm, termed LucaProt) was employed to discover ~162 000 potential RNA virus species and 180 RNA virus supergroups from more than 10 000 global metatranscriptomes [12].
The recent updates of the diversity of RNA viromes are generally based only on RdRP sequence whereas complete genomes are necessary to infer with confidence virus replication and expression strategies, and to elucidate evolutionary relationships. This is a challenging task, particularly, in the case of viruses with multi-segmented genomes because genes encoding proteins other than the RdRP are not highly conserved among RNA viruses [13]. To overcome this problem, adequate methods are required to obtain complete RNA virus genome sequences. We developed a dsRNA-derived cDNA construction and sequencing method, FLDS (Fragmented and primer Ligated DsRNA Sequencing), that enabled us to determine complete sequences of intracellular long dsRNA molecules [14, 15], such as genomes of dsRNA viruses and replication intermediates of ssRNA viruses [16]. FLDS also enables reconstruction of complete genomes for multi-segmented RNA viruses, given the similarities among the respective terminal sequences [14, 15, 17, 18]. Recently, FLDS has been successfully applied to characterize complete segmented genomes of two novel groups of bacterial riboviruses, one of which represents the new phylum, Artimaviricota, that might be subsequently upgraded to the kingdom rank [6].
“Candidatus (Ca.) Paraxenoviricota” has been proposed as a new phylum within the kingdom Orthornavirae based on the phylogenetic placement of a limited number of RdRP sequences from a global ocean metatranscriptome [11]. In the RdRP tree of Zayed et al. [11], paraxenoviruses form a distinct clade outside the phylum Kitrinoviricota. However, Edgar [19] presented an alternative phylogenetic analysis in which paraxenoviruses are lodged within Kitrinoviricota, leaving uncertain the phylogenetic and taxonomic positions of this group of viruses. Furthermore, no complete genomes of paraxenoviruses have been reported.
We employed FLDS to reconstruct complete bisegmented genomes of paraxenoviruses from RNA viromes of pelagic surface water microbial communities from the North Pacific and East Indian Oceans. The expanded dataset, including previously uncharacterized paraxenovirus lineages, allowed us to reassess the phylogeny of their RdRPs, which showed that paraxenoviruses are related to members of the order Durnavirales within the class Duplopiviricetes of the phylum Pisuviricota. We also identified the capsid protein (CP) of paraxenoviruses, which has an elaborate predicted structure distantly related to that of the CPs of picobirnaviruses, one of the families of Durnavirales. Taken together, these findings further increase the diversity of the bacterial RNA virome that was expanded through the recent effort in metatranscriptome mining, reveal previously unknown structural variability of CPs of RNA viruses with dsRNA genomes, and demonstrate the common origin of several groups of these viruses. We further demonstrate that paraxenoviruses are widely represented across the global ocean, suggesting a potential impact of these dsRNA bacterial viruses on marine microbial ecosystems.
Materials and methods
Sample collection
Four pelagic surface water samples from which “Ca. Paraxenoviricota” sequences were obtained were collected at stations in the North Pacific and the East Indian Ocean during JAMSTEC cruises (Table 1). At each station, ~10 L of surface water was collected with a bucket, and each 3–4 L portion of seawater was filtered through a 0.2-μm-pore-size cellulose acetate filter (Advantec, Tokyo, Japan). The filters were stored at −80°C until nucleic acid extraction.
Table 1.
Properties of the pelagic surface water samples.
| Library name | Cruise ID | Station ID | Geographical coordinates | Area | Sampling date |
|---|---|---|---|---|---|
| UraH2 | KM17-01 | Stn 8 | 29°18′ N, 143°31′ E | Izu-Ogasawara Trench, Northwest Pacific Ocean | 06-Jan-2017 |
| UraH6 | MR15-05 | Stn 22 | 18°32′ S, 111°52′ E | East Indian Ocean | 01-Jan-2016 |
| UraH20 | MR14-04 | Stn 147 | 47°0′ N, 126°0′ W | Northeast Pacific Ocean | 22-Aug-2014 |
| UraH22 | MR17-03C | 005 (SC4/KG5) | 25°30′ N, 126°5′ E | Okinawa Trough, East China Sea | 31-May-2017 |
RNA extraction
The RNA extraction method was described previously [15]. Cells collected on a portion of the 0.2-μm-pore-size filters corresponding to ~2 L of seawater were pulverized in a mortar in liquid nitrogen and suspended in dsRNA extraction buffer [20 mM Tris–HCl, pH 6.8; 200 mM NaCl; 2 mM EDTA; 1% SDS; 0.1% (v/v) β-mercaptoethanol]. Total dsRNA, manually extracted using an SDS-phenol method, was further purified by cellulose resin chromatography [20, 21].
cDNA synthesis and Illumina sequencing library construction and sequencing
Libraries for FLDS were synthesized from the extracted total RNA samples [15, 18]. In brief, the dsRNA was physically fragmented into pieces of ~1.5 kbp, and an adapter oligonucleotide was ligated to the 3′ end of the fragmented dsRNAs. After heat denaturation with an oligonucleotide primer complementary to the adapter oligonucleotide, cDNA was synthesized using the SMARTer RACE 5′/3′ Kit (Takara Bio, Shiga, Japan). After PCR amplification, short DNA fragments including the primers were removed using an 80% volume of AMPure XP (Beckman Coulter, Brea, CA, United States), and the purified cDNA was fragmented using the ultrasonicator Covaris S220 (Covaris, MA, USA).
Illumina sequencing libraries were constructed from the physically sheared cDNAs derived from the dsRNA using the KAPA Hyper Prep Kit for Illumina platforms (Kapa Biosystems, MA, USA). The libraries were sequenced using the Illumina MiSeq v3 Reagent Kit (600 cycles) with 300-bp paired-end reads on the MiSeq platform (Illumina).
Data processing for RNA virome community analysis and viral genome reconstruction
The raw reads from the dsRNA-derived cDNA libraries were processed with a custom Perl (ver. 5.16.3) script (https://github.com/takakiy/FLDS/Cleanup_FLDS.pl) [17]. In brief, Illumina adaptor sequences, cDNA synthesis adaptors, and low-quality or low-complexity reads were trimmed from the obtained raw reads as previously described [15, 17]. Reads of rRNA sequences were removed using SortMeRNA 2.0 [22]. Cleaned-up reads (Table S1) were subjected to de novo assembly using CLC Genomics Workbench ver. 21.0 (Qiagen, Tokyo, Japan) with the following parameters: a minimum contig length of 500, word value set to 33, and bubble size set to 300.
To analyze the composition of the pelagic RNA virome communities, RdRP-encoding contigs were identified in the assemblies of each sample using hmmsearch from the HMMER package ver. 3.3.2 with several RdRP HMM profiles, including Pfam 34.0 [23], the RdRP HMM profiles used by Zayed et al. [11], and the NeoRdRp program ver. 1.0 [24]. An E-value cut-off of 1e-5 was applied. All predicted RdRP-encoding contigs were clustered at 97% similarity using CD-HIT-EST (ver. 4.6.8) [25], which yielded 294 clustered RdRP-encoding virus contigs. Cleaned-up reads from each sample were mapped to these RdRP contigs using BBMap ver. 38.47 [26] with the following parameters: minimum mapped identity and maximum mapped indel set to 0.97 and 3, respectively. The number of reads mapped to each contig was counted using SAMtools ver. 1.18 [27], and the FPKM (fragments per kilobase per million cleaned-up reads) of each contig was calculated. The code used for these analyses is available at the following GitHub repository (https://github.com/takakiy/Paraxeno).
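The FPKM normalization mentioned above can be illustrated with a minimal sketch. This is not the code from the cited repository; the function name `fpkm` and the example numbers are hypothetical, and the sketch assumes that the library size is the total number of cleaned-up fragments in the sample.

```python
def fpkm(mapped_fragments, contig_length_bp, total_cleaned_fragments):
    """Fragments per kilobase of contig per million cleaned-up fragments.

    Assumption: total_cleaned_fragments is the whole cleaned-up library,
    not only the fragments that mapped to RdRP contigs.
    """
    if contig_length_bp <= 0 or total_cleaned_fragments <= 0:
        raise ValueError("contig length and library size must be positive")
    kilobases = contig_length_bp / 1_000
    millions = total_cleaned_fragments / 1_000_000
    return mapped_fragments / (kilobases * millions)


# Hypothetical example: 500 fragments on a 2.5-kb contig, 2 million fragments in total.
example_value = fpkm(500, 2_500, 2_000_000)
```

Dividing by both contig length and library size makes abundances comparable between contigs of different lengths and between samples sequenced to different depths.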
To reconstruct complete genomes of multi-segmented RNA viruses from the contigs assembled in this study, a custom Perl script (https://github.com/takakiy/FLDS/TermCount_FLDS.pl) was used to determine terminal sequences of RNA viral genome segments [17]. The terminal sequences of each RNA viral genome segment were considered complete when (i) reads with the adaptor sequence were aligned at both termini of the genomic segment, and (ii) the frequencies of these reads were generally higher than those in the central region of the segment. Full-length genomic segments derived from a multipartite RNA virus or its population were grouped based on the conserved sequences at both termini of these segments [6, 14, 15, 17, 18]. The open reading frames (ORFs) and their upstream ribosome-binding sites (RBSs) were identified using Prodigal ver. 2.6.3 with default parameters [28] and by manual inspection. The taxonomic assignment of RdRP genes in these virus genome candidates was examined using HMM searches with the RdRP HMM profiles of Zayed et al. [11]. To obtain more genomes associated with “Ca. Paraxenoviricota”, a BLASTX search (ver. 2.10.0+; E-value threshold: 1e-10) with default parameters [29] using contigs obtained in this study was conducted against paraxenovirus RdRPs identified by the HMM profile search. Transmembrane domains were predicted using the TMHMM Server ver. 2.0 [30] with the default settings. The predicted ORFs were used for HHpred search [31] against the PFAM, CDD, and Protein Data Bank (PDB) databases. The genome maps of paraxenoviruses were visualized using DiGAlign ver. 2.0 [32]. The geographical distribution of paraxenoviral RdRP- and CP-encoding contigs was visualized using R v4.3.3 (library sf v1.0-21 [33] and library rnaturalearth v1.1.0).
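The idea of grouping segments by conserved terminal sequences can be sketched as follows. This is a simplified illustration and not the TermCount_FLDS.pl procedure: the function name `group_by_termini`, the terminal length `k = 8`, and the requirement of exact identity at both ends are assumptions made for the example, and all segments are assumed to be given in the same orientation.

```python
from collections import defaultdict


def group_by_termini(segments, k=8):
    """Group full-length segments that share identical 5' and 3' terminal k-mers.

    segments: dict mapping segment name -> nucleotide sequence.
    Returns a list of groups (sorted name lists) with at least two members.
    Assumption: exact matching of k terminal nucleotides; real terminal
    conservation may tolerate mismatches and differ in length.
    """
    groups = defaultdict(list)
    for name, seq in segments.items():
        seq = seq.upper()
        if len(seq) < 2 * k:
            continue  # too short to have two non-overlapping termini
        groups[(seq[:k], seq[-k:])].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical toy segments: "seg_a" and "seg_b" share both termini, "seg_c" does not.
toy_segments = {
    "seg_a": "GGAAAAAT" + "C" * 20 + "TTGCATCC",
    "seg_b": "GGAAAAAT" + "A" * 30 + "TTGCATCC",
    "seg_c": "ACGTACGT" + "G" * 20 + "ACGTACGT",
}
toy_groups = group_by_termini(toy_segments)
```

In this toy input, only the two segments with matching ends are returned as a candidate bipartite genome.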
Analysis of untranslated regions on the RNA virus genomes
To verify that the 5′ termini of RdRP-encoding segments of paraxenovirus genomes do not contain any conserved ORFs, the gene calling was performed with Prodigal using non-standard translational tables (1-25). The RNA secondary structures were predicted and visualized using ViennaRNA package (RNAFold v2.6.3) with default parameters [34]. Trinucleotide frequency analysis was done in R v4.3.3 (trinucleotideFrequency, library Biostrings v2.70.3) [35]. The heatmaps showing trinucleotide frequency and distribution of stop codons were visualized using R v4.3.3 (standard library and pheatmap library: https://github.com/raivokolde/pheatmap).
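The two sequence statistics used here, trinucleotide frequencies and the positions of stop codons in all six frames, can be sketched in a few lines. This is an illustrative Python analogue, not the R code used in the study; the function names are hypothetical, trinucleotides are counted in overlapping windows (step of 1), and only the three standard stop codons are considered.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code only


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (non-ACGT symbols kept)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def trinucleotide_frequency(seq):
    """Relative frequency of each of the 64 trinucleotides, overlapping windows."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    keys = ["".join(p) for p in product("ACGT", repeat=3)]
    total = sum(counts[k] for k in keys)
    return {k: (counts[k] / total if total else 0.0) for k in keys}


def stop_codon_positions(seq):
    """0-based stop codon start positions for frames 1, 2, 3, -1, -2, -3.

    Positions in the negative frames are given on the reverse-complemented sequence.
    """
    seq = seq.upper().replace("U", "T")
    result = {}
    for strand, s in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            result[strand * (offset + 1)] = [
                i for i in range(offset, len(s) - 2, 3) if s[i:i + 3] in STOP_CODONS
            ]
    return result
```

A region that is densely interrupted by stop codons in every frame and whose trinucleotide profile differs from that of the coding regions is unlikely to encode a protein, which is the reasoning applied to the long 5′-terminal regions.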
Phylogenetic analysis
To identify additional RdRP sequences related to paraxenoviruses, a BLASTP search [36] using paraxenovirus sequences obtained from this study and the TARA study [11] was performed on the IMG/VR v4 [37] website (BLASTP ver. 2.13.0+; Blast Database: Virus Protein DB ver. 4 2022-09-23; E-value threshold: 1e-10; https://img.jgi.doe.gov/cgi-bin/vr/main.cgi?section=WorkspaceBlast&page=viralform). This search led to the identification of 41 RdRP sequences with high similarity to those of paraxenoviruses. Including these sequences, 48 core sequences of paraxenovirus full-length RdRPs were aligned using MUSCLE5 (ver. 5.0.1278_linux64; -super5 option) [38]; the alignment was used as a query in an HHsearch ver. 3.0.3 [39] with the consensus sequence defining the match states (-M first); no alignment similarity prefiltering (-id 100 -diff inf); permissive thresholds (-e 0.1 -p 10) [31], which was run against a database of family-level RdRP core alignments from Neri et al. [10]. Because clusters of the order Durnavirales were identified as the closest relatives of paraxenoviruses, consensus sequences of 24 family-level clusters of Durnavirales and one outgroup family (f.0121 of Neri et al. [10], which formed a sister branch to Durnavirales proper) were aligned together with the consensus of the 48 paraxenovirus RdRPs using MUSCLE5 (-super5 option). Then, each consensus amino acid was expanded to the full alignment column, resulting in a complete alignment [10]. Alignment positions containing more than 67% gaps and with homogeneity below 0.05 were removed [40]. An approximate ML phylogenetic tree was reconstructed from the trimmed alignment of 4453 sequences using the FastTree program (ver. 2.1.4 SSE3) with the WAG evolutionary model and gamma-distributed site rates [41], and was visualized using MEGA12 [42]. A set of up to 40 most diverse representatives was selected from each family-level alignment, forming a subset of 507 sequences with all families represented. An alignment of these representatives was extracted from the complete alignment, and an ML tree was reconstructed for this subset using IQ-TREE (multicore ver. 2.1.3) [43] with the Q.pfam+F+R8 evolutionary model, selected by ModelFinder, and aBayes branch support values.
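The gap-based part of the alignment trimming can be illustrated with a short sketch. It covers only the gap criterion: the homogeneity measure is defined in the cited reference [40] and is not reimplemented here, and how the two criteria are combined is left to that reference. The function name `drop_gappy_columns` and the toy alignment are hypothetical.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.67, gap_chars="-."):
    """Remove alignment columns whose gap fraction exceeds max_gap_fraction.

    alignment: list of equal-length aligned sequences (strings).
    Assumption: only the gap criterion is applied; the homogeneity
    criterion used in the study is not implemented in this sketch.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_rows = len(alignment)
    keep = [
        col for col in range(width)
        if sum(row[col] in gap_chars for row in alignment) / n_rows <= max_gap_fraction
    ]
    return ["".join(row[col] for col in keep) for row in alignment]


# Hypothetical toy alignment: the third column is 75% gaps and is removed.
toy_trimmed = drop_gappy_columns(["AC-", "AC-", "A--", "AGT"])
```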
Modeling of RdRP structures with AlphaFold2 and structural comparisons
To model paraxenovirus RdRP structures, for each prediction, we used Clustal Omega (ver. 1.2.2; command executed in Geneious Prime: clustalo.exe -i input.fasta -o clustal.aln -v --outfmt=clustal --output-order=tree-order --iter=0 --cluster-size=100 -t Protein) [44] to create a custom multiple sequence alignment (MSA), consisting of five sequences from this study (GT1–5), 11 sequences from the TARA study, and additional sequences from the IMG/VR database. Each of the GT4, GT3, and GT5 RdRP sequences was used as a query for a BLASTP search [36] against the IMG/VR Viral Protein Database on the website [37] (E-value threshold: 1e-10) as described above, and highly similar homologs were identified (18, 24, and 13 sequences, respectively). The TARA_132 RdRP model prediction used a custom MSA with the same sequence set as GT4 because the BLASTP search against IMG/VR using the TARA_132 RdRP sequence resulted in the same hits. All predictions were performed using locally installed ColabFold 1.5.1 through LocalColabFold (https://github.com/YoshitakaMo/localcolabfold; --model-type alphafold2_ptm, --num-recycle 10) [45, 46], and the final models were AMBER-relaxed [47]. To assess potential variability of the prediction, we also ran five seeds for each of the five alphafold2_ptm models (model 1 to model 5), generating 25 models in total for each RdRP (default seed_000 and additional seed_001 to seed_004; --num-seeds 5). The resulting models were superimposed using the Matchmaker tool in ChimeraX [48] and analyzed. The confidence scores for all of the RdRP models are summarized in Table S2. The structural model analyses and figure preparations were performed using ChimeraX.
Structural modeling of CPs
To model the paraxenovirus CP structure, we retrieved additional CP sequences from the LucaProt dataset [12] with an iterative profile search. One iteration included hmmsearch (-E e^-10), the alignment of hits with MAFFT v7.520 (--auto) [49], and construction of a new HMM profile for the next iteration using HMMER 3.4 [50]. The search was initiated using a profile created from the seed MSA of CPs from this study (GT1–5) and 11 CP sequences from the IMG/VR database. After eight iterations, no new sequences were found. Partial CP sequences (less than 900 aa in length) were removed (seqtk seq -L 900). Redundant sequences were filtered out with MMseqs2 (id 1, coverage 0.8). The final CP alignment used for modelling consists of 145 non-redundant sequences. In the modelling job, the CP sequence of GT2 was used as a representative query for GT1, GT2, and GT3. The prediction of dimers was performed using a local installation of ColabFold 1.5.3 (--model-type alphafold2_multimer_v3, --num-recycle 12). The DALI server with default parameters [51] was used to compare the obtained CP models against structures in the PDB. The structural model analyses and figure preparations were performed using ChimeraX.
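The removal of partial CP sequences by length can be sketched in Python as follows. This only mirrors the effect of the length filter described above (sequences shorter than 900 aa are dropped) and is not the seqtk implementation; the helper names `parse_fasta` and `drop_short_sequences` and the toy records are hypothetical.

```python
def parse_fasta(lines):
    """Parse FASTA-formatted lines into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_short_sequences(records, min_length=900):
    """Keep records whose sequence has at least min_length residues."""
    return [(h, s) for h, s in records if len(s) >= min_length]


# Hypothetical toy input with a reduced threshold so the example stays short.
toy_fasta = [">cp_full", "MKTAYIAKQR", "QISFVKSHFS", ">cp_partial", "MKTAY"]
toy_kept = drop_short_sequences(parse_fasta(toy_fasta), min_length=10)
```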
Results
dsRNA sequencing and identification of RdRPs of “Ca. Paraxenoviricota”
The FLDS method enables identification of full-length viral genomic segments based on the read mapping of contigs, and complete sets of the genomic segments of multipartite RNA viruses can be identified based on the conserved sequences at the termini of contigs [14, 17]. Using this method, we sequenced 22 dsRNA-derived cDNA libraries from North Pacific [15] and Indian Oceans (Yoshida et al., in preparation). The datasets from each library were then searched for the presence of RdRPs with public and environmental RdRP Hidden Markov Model (HMM) profiles [11]. These searches yielded four full-length contigs (CTG1 to CTG4) encoding “Ca. Paraxenoviricota”-type RdRPs from the FLDS-derived UraH2, UraH20, and UraH22 cDNA libraries (Table 1 and Table S3). A subsequent BLASTX search using the RdRP sequences of the newly identified CTG1–4 as the queries allowed the identification of the CTG5 segment from the UraH6 cDNA library (Table 1 and Table S3).
Complete genomes of “Ca. Paraxenoviricota”
Given that segments of the same virus genome typically share terminal sequences (e.g., [14, 15, 52]), we searched the FLDS libraries for the presence of segments with terminal sequences similar to those of the RdRP-encoding CTG1–5 segments. For each of the RdRP-encoding segments, we identified an additional putative genome segment harboring one or more open reading frames (ORFs) (Table S3). Accordingly, we concluded that the “Ca. Paraxenoviricota” viruses present in our datasets have bipartite genomes, which we refer to as GT1–5 (Genome ID). For each genome set, we designated the segment that harbored the RdRP-encoding ORF as RNA1 and the second segment that harbored non-RdRP ORFs as RNA2 (Fig. 1).
Figure 1.

Novel bipartite “Ca. Paraxenoviricota”-associated RNA virus genomes reconstructed from pelagic dsRNA virome samples. Each of the five identified “Ca. Paraxenoviricota”-associated virus genomes consists of RNA1 and RNA2 genomic segments. Open reading frames encoding homologous RdRP proteins are shown as orange arrows. Circles represent predicted SD RBS motifs. Asterisks denote putative genes encoding predicted transmembrane domain (TMD)-containing proteins.
RNA1 and RNA2 each harbored 1 to 4 ORFs encoding predicted proteins that varied in length from 80 to 1376 amino acid residues (Fig. 1 and Table S4). All RNA1 segments shared the RdRP-encoding ORF, and in addition, GT1–3 encompassed two homologous ORFs (Fig. S1). In RNA2 segments, three ORFs were found to be homologous among GT1–3 (Fig. S2). The predicted paraxenovirus proteins showed no significant sequence similarity (E-value < 1e-4) to any proteins in the non-redundant protein sequence database (NCBI) that was searched using BLASTP, or in the PFAM, CDD, and PDB databases that were searched using HHpred with the multiple alignments of the respective predicted paraxenovirus proteins used as queries (Probability >50). However, we found that RNA1 of GT1–3 and RNA2 of GT1 and GT5 harbored at least one ORF encoding a protein with a predicted transmembrane domain (Fig. 1), which could be involved in virus-host interactions [53]. The RdRP sequences of paraxenoviruses identified in this study showed the highest similarity to those encoded by two of the seven paraxenovirus contigs (Fig. S3 and Table S4) derived from the TARA RNA virome dataset of the Pacific Ocean [11]. Notably, most of the ORFs are preceded by typical bacterial-type RBSs (Shine-Dalgarno motifs [SD motifs]) (Fig. 1 and Table S4).
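To make the notion of an ORF "preceded by an SD motif" concrete, the following sketch scans the region upstream of a start codon for a match to the Shine-Dalgarno consensus AGGAGG. It is a deliberately simple illustration and not the RBS scoring used by Prodigal: the function name `find_sd_motif`, the 20-nt upstream window, and the minimum match length of 4 nt are assumptions chosen for the example.

```python
SD_CONSENSUS = "AGGAGG"  # canonical Shine-Dalgarno consensus


def find_sd_motif(genome, orf_start, window=20, min_len=4):
    """Find the longest exact substring of the SD consensus upstream of an ORF.

    genome: nucleotide sequence of the coding strand.
    orf_start: 0-based index of the first nucleotide of the start codon.
    Returns (motif, spacer), where spacer is the number of nucleotides between
    the motif and the start codon, or None if no match of min_len is found.
    Assumption: exact matching only; mismatches are not tolerated.
    """
    genome = genome.upper().replace("U", "T")
    upstream = genome[max(0, orf_start - window):orf_start]
    for length in range(len(SD_CONSENSUS), min_len - 1, -1):
        for i in range(len(SD_CONSENSUS) - length + 1):
            motif = SD_CONSENSUS[i:i + length]
            pos = upstream.rfind(motif)  # occurrence closest to the start codon
            if pos != -1:
                return motif, len(upstream) - (pos + length)
    return None


# Hypothetical toy sequence: full consensus located 7 nt upstream of the ATG.
toy_hit = find_sd_motif("CCCCAGGAGGTTTTTTTATGAAA", orf_start=17)
```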
The sequences of paraxenoviruses GT1–3 RdRP-encoding RNA1 contain extended 5′-terminal regions that lack any long and/or conserved ORFs that could be predicted using the standard or any of the available alternative translation tables. These regions have markedly different trinucleotide compositions compared to the predicted protein-coding genes, in particular, those encoding the predicted RdRPs (Fig. 2C), strongly suggesting that the long 5′ UTRs are indeed non-coding. Long UTRs are not uncommon in RNA viruses, but the ~1.5 kb 5′ UTRs of paraxenoviruses appear to be among the longest. Notably, although most likely non-coding, the nucleotide sequences of the 5′ UTRs are conserved, especially between GT2 and GT3. Folding of these sequences suggests that the conserved regions (blue boxes in Fig. 2A) adopt similar secondary structures in all three RNA1 sequences (see Fig. 2B). The UTRs might be involved in regulation of translation, as shown, for example, for the 5′ UTRs of picornaviruses [54, 55]. We additionally examined the paraxenovirus UTRs for the presence of potential ribozymes using the cmscan program of Infernal, but no ribozymes were predicted. In GT4 and GT5, as well as in the Tara Ocean sequences, no long 5′ UTRs were detected. Such UTRs could be a signature of a distinct group of paraxenoviruses (see below). The alternative possibility, namely, that the 5′-terminal regions of the GT4 and GT5 RNA1 segments are incomplete, appears less likely given the similarity between the terminal sequences of cognate RNA1 and RNA2 segments.
Figure 2.
Long 5′ untranslated regions in paraxenovirus RNA segments. (A) Distribution of stop codons in GT1_RNA1, GT2_RNA1, and GT3_RNA1. The positions of stop codons in all six translation frames are shown as stripes below the genome map. Regions conserved among the three viruses (nucleotide identity >68%) are connected by shaded links. The dashed box shows the position of the predicted RNA secondary structure. (B) Predicted RNA secondary structure found in conserved non-coding regions of paraxenoviruses. The coordinates of secondary structures are indicated. (C) Trinucleotide composition of paraxenovirus genomes. Frequencies of trinucleotides in non-coding regions (top three rows of the heatmap) are compared with those in the capsid protein- and RdRP-encoding regions.
Phylogenetic analysis of paraxenovirus RdRPs
Altogether, we identified 48 unique sequences of paraxenovirus RdRPs, which were filtered to 40 maximally diverse representatives using a large set of full-length RdRP core sequences for the purpose of phylogenetic analysis [10]. In the maximum likelihood phylogenetic tree of the core sequences of orthornavirus RdRPs, paraxenoviruses were nested within the Durnavirales order of the Duplopiviricetes class in the phylum Pisuviricota (Fig. 3). The sister group of paraxenoviruses is a yet unnamed group that was previously designated f.0117_base-cysto [10]. The clade comprising these two groups is lodged deeply within the Durnavirales branch of Duplopiviricetes and is supported by high aBayes support values (Fig. 3). Thus, there is a discrepancy between our findings and the previous placement of paraxenoviruses in the phylogeny of the Riboviria, where these viruses were assigned to a putative new phylum [11]. The difference in the positions of paraxenoviruses in the two phylogenies might be due to the substantially smaller number of sequences from this group and related groups in the analysis of Zayed et al. [11].
Figure 3.

Phylogenetic tree of RdRP sequences from paraxenoviruses and related viruses in the order Durnavirales. A maximum likelihood tree (IQ-TREE, Q.pfam+F+R8 evolutionary model) was reconstructed using an alignment of RdRP core sequences of 507 representatives of Durnavirales, paraxenoviruses, and an outgroup family (f.0121). aBayes support values are shown for the branches around paraxenoviruses.
Within the paraxenovirus group itself, the phylogeny splits into four clades (Fig. S4). The RdRPs of GT1, GT2, and GT3 formed a distinct subgroup in clade III, the GT5 RdRP belonged to clade IV, and the GT4 RdRP clustered with several TARA RdRPs (clade I). For the subsequent RdRP structure modeling, the RdRP sequences of GT3, GT4, and GT5 were used as representatives of clades III, I, and IV, respectively, and for clade II, the TARA132 RdRP was selected.
Structural modeling of paraxenovirus RdRPs
To further characterize paraxenoviruses, we predicted the structures of the newly identified RdRPs in the GT3, GT4, and GT5 genomes along with one identified in the previous study (TARA132). High confidence models were obtained for all these predicted RdRPs, and additional tests with models predicted with different AlphaFold parameters showed little variability (Fig. S5). The modeled structures of the paraxenovirus RdRPs were used as queries to search the BFVD database [56] of predicted viral protein structures and the PDB using Foldseek. The RdRP structures of picobirnaviruses and partitiviruses were consistently identified as the most similar, with the structure of Mongoose picobirna-like virus [57] being the best hit with a probability of 1 (Tables S5 and S6). This result corroborates the phylogenetic placement of paraxenoviruses within Durnavirales. Thus, given the results of phylogenetic analysis (Fig. 3) and the structure-based searches, the predicted structural models of the paraxenoviral RdRPs were compared to those of structurally characterized durnaviral RdRPs, namely the RdRPs of cystovirus phi6 and human picobirnavirus, confirming the close similarity between paraxenovirus and picobirnavirus RdRPs (Fig. 4).
Figure 4.
Predicted models and structural features of paraxenovirus RdRPs. (A) Subdomain comparison of AlphaFold-predicted paraxenovirus RdRP models with experimentally determined cystovirus and picobirnavirus RdRP structures. Residues 1–158 are hidden in the GT5 model due to low confidence. For the AlphaFold models, pLDDT (predicted local distance difference test) and pTM (predicted template modelling) quality scores are indicated. (B) Comparison of motifs A–G in the predicted paraxenovirus RdRP models to those in cystovirus and picobirnavirus RdRPs. Conserved signature residues in each motif are displayed in stick style. Thumb subdomain residues are hidden for clarity. For reference, Mg2+ ions, a Mn2+ ion, two initiating GTPs and a template strand in the phi6 RdRP initiation complex are also shown (PDB:1hi0; [58]). (C) Amino acid sequence alignments of paraxenovirus RdRPs for inferred motifs A–G regions. Conserved signature residues in each motif and the extended HTH loop region are highlighted.
As expected, the paraxenovirus RdRP models showed the canonical “right-hand” RdRP architecture consisting of the fingers-palm-thumb subdomains. All the models of paraxenovirus RdRPs contained a unique, extended, ~60 aa long insertion in a conserved helix-turn-helix (HTH) motif immediately after motif A (Fig. 4A). The insertion extended directly from the first α-helix of the HTH motif and contained multiple β-strands and an α-helix, joining back to the second α-helix of the HTH motif followed by a conserved β-hairpin (“middle” finger). Although insertion of helices and loops within the β-hairpin motif was detected in Negarnaviricota RdRPs, the insertion of a long loop within the HTH motif, extending toward a dsRNA exit channel, appears to be unique to paraxenoviruses and may serve as a defining feature of this virus group.
In addition to the core motifs A–C (Fig. 4A), other canonical RdRP motifs D–G can be inferred based on conservation of key residues in each motif and their spatial positions in the predicted models of paraxenovirus RdRPs, in comparison to publicly available structures of RdRPs from relatively close groups within Durnavirales such as cystoviruses and picobirnaviruses (Fig. 4B and C). Paraxenovirus RdRP motif G appears unique in containing a longer loop, following a canonical template-interacting region that encompasses conserved small residues such as serine, alanine and glycine [59]. The additional loop size varies among paraxenoviruses, as can be seen in sequences of TARA-associated RdRPs (Fig. 4C). Motif F of paraxenovirus RdRP is similar to that of cystovirus and picobirnavirus RdRPs, containing multiple positively charged arginine and lysine residues conserved in positions where they would interact with incoming NTPs. Based on motif F, paraxenovirus RdRPs can be divided into two subgroups, the first one containing three conserved arginines (GT1–3 and GT5) and the second one containing an additional, conserved lysine residue (GT4 and TARA-associated). In the former group, GT5 RdRP stands out further due to (i) the presence of a glutamate as the second negatively charged residue at 1-aa shifted position in motif A, (ii) a histidine replacing the conserved serine in motif B, and (iii) lack of the conserved aromatic amino acid in motif E. The GT5 RdRP also contained an extended N-terminal region that could not be modelled confidently, likely due to insufficient coverage (aa 1–158, predicted local distance difference test [pLDDT] < 50).
Identification and structural modeling of paraxenovirus CPs
In an attempt to gain further insight into the functions of paraxenovirus proteins, we modeled and analyzed structures for all conserved ORF products. This analysis showed that the largest proteins encoded by the corresponding ORF1 of RNA2 segments of paraxenoviruses GT1–4 represent the putative CP (Figs. 1 and S2). The ORF1 proteins of GT1–3, GT4, and GT5 were not recognizably similar at the sequence level, forming three distinct groups, and hence were modeled separately.
For the GT1–3 group, a good quality (pLDDT = 71.8) model was obtained for the corresponding protein of GT2 (1376 aa). The protein consists of an unstructured N-terminal domain (NTD; residues 1–46) and two structured domains, the shell (S; residues 47–319, 369–679, and 1219–1376) and projection (P; residues 320–368 and 680–1218) domains (Fig. 5A). The S domain is predicted to play the key role in the icosahedral capsid (shell) formation, the P domain points away from the capsid and is likely involved in interaction with the host, whereas the N-terminal region contains seven positively charged arginine/lysine residues and is likely to bind to the genomic RNA in the capsid interior. The S domain was modeled with high confidence (pLDDT = 89.5), whereas the pLDDT scores for the NTD and P domain were considerably lower (see Fig. S6A). The top hit in DALI search queried with the S domain alone was the picobirnavirus CP (PDB: 2VF1; [60]). The Z-score was relatively low (Z = 2.4), but manual inspection of the two structural models showed that the two CPs had the same fold, strongly suggesting they are homologous. In particular, the two CPs share a core of three two-stranded β-sheets (Fig. 5B), surrounded by a network of α-helices, all within the S domain. Notably, the predicted CP of paraxenoviruses is far more complex than that of picobirnaviruses, with an especially elaborate P domain consisting of a twisted β-barrel extending ~7 nm above the capsid surface (~3 times higher than in picobirnavirus).
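The per-domain confidence values quoted above can be reproduced in principle by averaging per-residue pLDDT scores over the discontinuous residue ranges of each domain. The following Python sketch illustrates that bookkeeping under stated assumptions: the domain boundaries are those given in the text for the GT2 CP, whereas the function names and the flat list of per-residue scores are hypothetical and are not part of the authors' pipeline.

```python
from statistics import mean

# Domain boundaries (1-based, inclusive) as reported in the text for the GT2 CP.
GT2_DOMAINS = {
    "NTD": [(1, 46)],
    "S": [(47, 319), (369, 679), (1219, 1376)],
    "P": [(320, 368), (680, 1218)],
}


def domain_lengths(domains):
    """Return the number of residues assigned to each domain."""
    return {
        name: sum(end - start + 1 for start, end in segments)
        for name, segments in domains.items()
    }


def domain_mean_plddt(plddt, domains):
    """Average per-residue pLDDT over each (possibly discontinuous) domain.

    plddt is a list of scores in which index 0 corresponds to residue 1.
    How the scores are extracted from a model file is left out (assumption:
    they are already available as a plain list).
    """
    result = {}
    for name, segments in domains.items():
        scores = [
            plddt[pos - 1]
            for start, end in segments
            for pos in range(start, end + 1)
        ]
        result[name] = mean(scores)
    return result
```

As a consistency check, the three domains defined above partition the full 1376-residue protein (46 + 742 + 588 residues), so every residue contributes to exactly one domain average.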
Figure 5.

Predicted structures of CPs of paraxenoviruses. (A) Domain organization of the paraxenovirus CPs encoded by GT2 (top) and GT4 (bottom). The structural models are colored using the rainbow scheme from blue N-terminus to red C-terminus. The shell and projection domains are labeled, whereas the unstructured N-terminal domain is omitted (refer to Fig. S6 for full model). (B) Comparison of the shell domains of the rabbit picobirnavirus (left) and paraxenovirus (right) CPs. In both models, the conserved structural elements are colored using the rainbow scheme as in panel A, whereas the non-conserved elements are shown in white. The six β-strands forming the three two-stranded β-sheets are numbered. (C) Comparison of the CP dimers of rabbit picobirnavirus (left, PDB: 2VF1), paraxenoviruses GT2 (middle, pLDDT = 70.8, ipTM [interface predicted template modelling] = 0.661), and GT4 (right, pLDDT = 75.2, ipTM = 0.388). In all structures the subunits are colored differently. Side and bottom views are shown.
Although the putative CP (1177 aa) encoded by RNA2 of GT4 does not share recognizable sequence similarity with the CPs encoded by GT1–3, comparison of the corresponding structural models showed that the two proteins have the same fold (Fig. 5A). Notably, the similarity was restricted to the S domains, whereas the P domains were unrelated and the NTD was missing from the GT4 CP altogether. The P domain of GT4 CP did not form an extended structure elevated above the S domain. Nevertheless, as with the S domain of GT2 CP, a DALI search with the GT4 CP model as the query returned the picobirnavirus CP (PDB: 2VF1) as the best hit, with a significant Z-score of 5.7.
No high-confidence model for the large protein encoded by RNA2 of GT5 could be obtained. Although the low confidence model (pLDDT = 55.8) bears resemblance to the shell domain of CPs from paraxenoviruses GT1–GT4 (Fig. S7), we refrained from assigning this protein as the GT5 CP at this point.
Capsids of picobirnaviruses, similar to many other dsRNA viruses, including partitiviruses, cystoviruses, and totiviruses, are assembled from CP dimers, with 60 CP dimers forming a T = 1 icosahedral capsid [60–64]. Therefore, we modeled the putative dimers of the paraxenovirus CPs. Although the predicted aligned error for the P domain was relatively high, the confidently predicted dimerization interface within the S domain of GT2 involved regions equivalent to those involved in the picobirnavirus CP dimerization (Figs. 5C and S6D). A dimer prediction for GT4 also showed a similar dimerization pattern, though with lower confidence (Fig. 5C, ipTM [interface predicted template modelling] = 0.388). This observation further supports the homology between the CPs of paraxenoviruses and picobirnaviruses.
Relative abundance and biogeography of paraxenoviruses in marine habitats
The community composition of the four pelagic intracellular RNA viromes from the North Pacific (UraH2, −20, and −22) and East Indian Ocean (UraH6) is shown in Fig. 6A. In addition to paraxenoviruses, members of several ribovirus families with dsRNA genomes previously detected in marine habitats, including Picobirnaviridae, Totiviridae, and Sedoreoviridae [8, 15], were identified in these viromes. The relative abundance of paraxenoviruses varied from 2% to 18% of the RNA viromes.
Figure 6.
Distribution of paraxenoviruses in samples from the world ocean. (A) Composition of the family-level RNA viral communities identified from the North Pacific (UraH2, −20, and −22) and East Indian Ocean (UraH6). Relative abundances of the representative RNA virus genomes in each sample were estimated based on the FPKM values of the RdRP-encoding sequences. The “other dsRNA” and “other ssRNA” categories comprise the associated virus taxa that each account for less than 2% of the total abundance of the phylotypes. The “others” category represents sequences of unclassified viruses. The biogeographic areas of the virome samples are represented as follows: NwPO (Northwest Pacific Ocean), EIO (East Indian Ocean), NePO (Northeast Pacific Ocean), and ECS (East China Sea). (B) Global distribution of paraxenoviral RdRP- and CP-encoding contigs in marine ecosystems. The circles represent the locations from which samples containing paraxenovirus-like sequences were collected. Circle colors indicate the water depth.
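The abundance estimate described for panel A can be illustrated with a short Python sketch. It assumes the standard FPKM definition (fragments per kilobase per million mapped fragments), sums FPKM values per family, and lumps families below a 2% share into a single "other" bin. The function names, the record layout, and the single "other" bin (the figure separates "other dsRNA" from "other ssRNA") are simplifying assumptions for illustration, not the authors' actual workflow.

```python
from collections import defaultdict


def fpkm(mapped_fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of sequence per million mapped fragments."""
    return mapped_fragments * 1e9 / (length_bp * total_mapped_fragments)


def family_relative_abundance(records, total_mapped_fragments, min_share=0.02):
    """Compute family-level relative abundance from RdRP-encoding sequences.

    records: iterable of (family, mapped_fragments, length_bp) tuples
             (hypothetical layout, one tuple per RdRP-encoding sequence).
    Families whose share falls below min_share are pooled into "other".
    """
    per_family = defaultdict(float)
    for family, fragments, length_bp in records:
        per_family[family] += fpkm(fragments, length_bp, total_mapped_fragments)

    total = sum(per_family.values())
    shares = {family: value / total for family, value in per_family.items()}

    result = {f: s for f, s in shares.items() if s >= min_share}
    pooled = sum(s for s in shares.values() if s < min_share)
    if pooled > 0:
        result["other"] = pooled
    return result
```

Because the shares are normalized by the summed FPKM of the sample, the returned values always add up to 1, which makes the per-sample compositions directly comparable.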
To analyze the ecological and geographic distribution of the paraxenovirus members, we assembled a dataset of orthologous RdRP (n = 658) and CP (n = 800) sequences from IMG/VR, LucaProt, and Tara Oceans databases for which information on the sampling sites was available. Analysis of the extracted metadata showed that paraxenoviruses are exclusive to marine ecosystems, are globally distributed, and are found in epipelagic and mesopelagic layers of the water column, although their presence in the deeper layers (e.g., bathypelagic and abyssopelagic) cannot be currently excluded due to the sampling bias.
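The depth-based part of such a metadata analysis amounts to binning each sampling depth into a pelagic zone and counting contigs per zone. The Python sketch below shows one way to do this. The zone boundaries (200, 1000, 4000, and 6000 m) are the conventional approximate limits and are an assumption here, since the text does not state the cut-offs used; the function names are likewise hypothetical.

```python
from collections import Counter

# Conventional approximate zone limits: (upper depth bound in metres, zone name).
# These cut-offs are an assumption; the source text does not specify them.
PELAGIC_ZONES = [
    (200, "epipelagic"),
    (1000, "mesopelagic"),
    (4000, "bathypelagic"),
    (6000, "abyssopelagic"),
]


def depth_zone(depth_m):
    """Map a sampling depth in metres to a pelagic zone name."""
    if depth_m < 0:
        raise ValueError("depth must be non-negative")
    for upper_bound, name in PELAGIC_ZONES:
        if depth_m < upper_bound:
            return name
    return "hadopelagic"


def tally_by_zone(depths_m):
    """Count samples (or contigs) per pelagic zone."""
    return Counter(depth_zone(d) for d in depths_m)
```

A tally of this kind shows directly how unevenly the zones are sampled, which is why the absence of hits from deeper layers cannot be read as absence of the viruses.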
Discussion
In this work, we sequenced five complete bipartite genomes of paraxenoviruses from pelagic surface waters. In contrast to the previous study that failed to reveal affinity between paraxenoviruses and any other ribovirus group [11], our analysis of the paraxenovirus proteins yielded strong indications that this virus group belongs to the class Duplopiviricetes and, probably, to the order Durnavirales within phylum Pisuviricota. The evidence of this affinity is threefold. First, in the RdRP phylogeny, paraxenoviruses, together with an uncharacterized related virus family, form a strongly supported clade within Durnavirales (Fig. 3). Second, comparison of the modeled structures of the paraxenovirus RdRPs confirmed the highest structural similarity with the RdRPs of picobirnaviruses, one of the families of Durnavirales (Tables S5 and S6). We suspect that the previously reported lack of affinity of paraxenoviruses with any other riboviruses or alternative placement of this group within Kitrinoviricota was due to the relatively low representation of this group in the initially analyzed samples. Third, the structural model of the predicted paraxenovirus CP is similar to the solved structure of picobirnavirus CP although the paraxenovirus CP is much larger and has a far more elaborate structure than the picobirnavirus counterpart. Thus, paraxenoviruses appear to be relatives of picobirnaviruses, partitiviruses, and cystoviruses, and their most plausible taxonomic affiliation is a family within the order Durnavirales, for which we provisionally suggest the name “Paraxenoviridae”.
The apparent affinity of paraxenoviruses with Durnavirales has notable implications. First, similarly to other viruses of this order, paraxenoviruses most likely have dsRNA genomes. Indeed, our analyses of pelagic RNA virome communities in this study and the previous study [15] revealed the presence of diverse dsRNA ribovirus families, further reinforcing the case for paraxenoviruses. The ecological underpinning of the diversity of the dsRNA virus genomes in these marine environments remains to be explored. Second, in each of the genomic segments of both picobirnaviruses and paraxenoviruses, the presence of multiple ORFs, each preceded by a typical bacterial RBS, strongly suggests bacterial hosts. Thus, the order Durnavirales seems to be dominated by bacteriophages with dsRNA genomes, although eukaryotic viruses infecting plants, fungi, and protists represent a sizable fraction as well [65]. More generally, these findings extend the trend that became prominent in recent metatranscriptome mining studies, namely, expansion of the diversity of RNA bacteriophages far beyond its previously perceived scope (e.g., [10]). Furthermore, our results reveal unprecedented variability of the CP structures of viruses with dsRNA genomes. These findings emphasize the potential of increased sequence coverage combined with detailed comparative genomic analysis to clarify the phylogenetic positions of virus families that initially appear unrelated to any known groups of viruses.
Until now, information on dsRNA bacteriophages has remained scarce due to the low number of isolates [66], and dsRNA phages of marine origin have not been identified. The results of the present work suggest the addition of the potential dsRNA “Paraxenoviridae” family to the growing diversity of known dsRNA phage groups, and furthermore, we showed that this family is widely distributed in different oceans, from polar to tropical regions. Given the presence of the cosmopolitan paraxenoviruses in marine ecosystems, further research may shed light on the host-virus relationships of these dsRNA phages (e.g., lytic versus non-lytic life cycle such as chronic infection) [67, 68], and consequent contributions to the viral shunt in the case of the lytic life cycle [69, 70]. Recently, marine metatranscriptome studies [10, 11] have also reported contigs of potential dsRNA phages within the phylum Pisuviricota. Thus, additional dsRNA phages likely exist in marine environments, and our FLDS-based approach can be expected to help unveil environmental dsRNA viromes. In this study, our data also revealed that the depth distributions of the paraxenovirus-like sequences are limited to the shallow areas of the sea (Fig. 6B); however, this is likely affected by the sampling bias (more samples were taken from the ocean surface layer), and thus does not appear to reflect genuine ecological distribution of paraxenoviruses. Further FLDS-based RNA sequencing coupled with vertical and horizontal high-resolution sampling of pelagic waters is expected to provide insights into the diversity, ecology, and evolution of unexplored RNA viruses in the global ocean.
Supplementary Material
Acknowledgements
We would like to thank the captain, crew, and onboard scientists and technicians of the R/V Mirai and R/V Kaimei of JAMSTEC during the MR14-04, MR15-05, and MR17-03C and KM17-01 cruises, respectively. We are grateful to Miho Hirai, Fumie Kondo, and Miwako Tsuda for technical assistance for RNA sequencing. Molecular graphics and analyses were performed with UCSF ChimeraX developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from National Institutes of Health R01-GM129325 and the Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases.
Contributor Information
Mitsuhiro Yoshida, Deep-Sea Bioresource Research Group, Research Center for Bioscience and Nanoscience (CeBN), Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Kanagawa 237-0061, Japan.
Sofia Medvedeva, Institut Pasteur, Université Paris Cité, CNRS UMR6047, Cell Biology and Virology of Archaea Unit, 75015 Paris, France.
Akihito Fukudome, Department of Biology and Department of Molecular and Cellular Biochemistry, Howard Hughes Medical Institute, Indiana University, Bloomington, IN 47405, United States.
Yuri I Wolf, Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States.
Syun-ichi Urayama, Laboratory of Fungal Interaction and Molecular Biology, Department of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan; Microbiology Research Center for Sustainability, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan.
Yosuke Nishimura, Deep-Sea Bioresource Research Group, Research Center for Bioscience and Nanoscience (CeBN), Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Kanagawa 237-0061, Japan.
Yoshihiro Takaki, Super-cutting-edge Grand and Advanced Research (SUGAR) Program, Institute for Extra-cutting-edge Science and Technology Avant-garde Research (X-star), Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Kanagawa 237-0061, Japan.
Eugene V Koonin, Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States.
Mart Krupovic, Institut Pasteur, Université Paris Cité, CNRS UMR6047, Cell Biology and Virology of Archaea Unit, 75015 Paris, France.
Takuro Nunoura, Deep-Sea Bioresource Research Group, Research Center for Bioscience and Nanoscience (CeBN), Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Kanagawa 237-0061, Japan.
Author contributions
Mitsuhiro Yoshida (Conceptualization, Data curation, Writing—original draft, Writing—review & editing), Syun-ichi Urayama (Investigation, Writing—original draft), Yoshihiro Takaki (Data curation, Writing—original draft, Writing—review & editing), Yuri I. Wolf (Data curation, Writing—original draft, Writing—review & editing), Akihito Fukudome (Data curation, Writing—original draft, Writing—review & editing), Sofia Medvedeva (Data curation, Writing—original draft, Writing—review & editing), Yosuke Nishimura (Data curation, Writing—review & editing), Takuro Nunoura (Conceptualization, Writing—original draft, Writing—review & editing), Eugene V. Koonin (Conceptualization, Writing—original draft, Writing—review & editing), Mart Krupovic (Conceptualization, Writing—original draft, Writing—review & editing)
Conflicts of interest
JAMSTEC holds patents for the “Double-stranded RNA fragmentation method and use thereof”, with S.U. and T.N. listed as inventors. These patents include European Patent Registration No. 3363898, registered on 30 November 2022; China Registration No. ZL201680060127.X, registered on 8 February 2022; US Registration No. 10894981, registered on 19 January 2021; and Japanese patent No. 6386678, registered on 17 August 2018. The other authors declare no conflicts of interest.
Funding
This work was supported in part by a Grant-in-Aid for Scientific Research (Grant No. 21K06312 to M.Y.) from the Japan Society for the Promotion of Science (JSPS), a Grant-in-Aid for Challenging Research (Grant No. 20K20377 to T.N.) from JSPS, and Grants-in-Aid for Scientific Research on Innovative Areas from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan (Grant Nos. 17H05830 to M.Y.; 22H04879 and 20H05579 to S.U.; 19H05684, 16H06429, 16K21723, and 16H06437 to T.N.), and by Lilly Endowment, Inc., through its support for the Indiana University Pervasive Technology Institute, which provided supercomputing resources for protein structure modelling. S.M. and M.K. were supported by Agence Nationale de la Recherche (grant ANR-23-CE02-0022). Y.I.W. and E.V.K. were supported through the Intramural Research Program of the National Institutes of Health of the USA (National Library of Medicine).
Data availability
The sequences of the paraxenoviruses identified in this study are available under the following accession numbers: GT1, LC876750 and LC876751; GT2, LC876752 and LC876753; GT3, LC876754 and LC876755; GT4, LC876756 and LC876757; GT5, LC876758 and LC876759 (Table S3). The datasets are described in BioProject accession no. PRJDB20418.
References
- 1. International Committee on Taxonomy of Viruses Executive Committee. The new scope of virus taxonomy: partitioning the virosphere into 15 hierarchical ranks. Nat Microbiol 2020;5:668–74. 10.1038/s41564-020-0709-x
- 2. Wolf YI, Kazlauskas D, Iranzo J. et al. Origins and evolution of the global RNA virome. mBio 2018;9:e02329–18. 10.1128/mbio.02329-18
- 3. Koonin EV, Dolja VV, Krupovic M. et al. Global organization and proposed megataxonomy of the virus world. Microbiol Mol Biol Rev 2020;84:e00061–19. 10.1128/mmbr.00061-19
- 4. Koonin EV, Kuhn JH, Dolja VV. et al. Megataxonomy and global ecology of the virosphere. ISME J 2024;18:wrad042. 10.1093/ismejo/wrad042
- 5. Kuhn JH, Botella L, de la Peña M. et al. Ambiviricota, a novel ribovirian phylum for viruses with viroid-like properties. J Virol 2024;98:e00831–24. 10.1128/jvi.00831-24
- 6. Urayama S, Fukudome A, Hirai M. et al. Double-stranded RNA sequencing reveals distinct riboviruses associated with thermoacidophilic bacteria from hot springs in Japan. Nat Microbiol 2024;9:514–23. 10.1038/s41564-023-01579-5
- 7. Starr EP, Nuccio EE, Pett-Ridge J. et al. Metatranscriptomic reconstruction reveals RNA viruses with the potential to shape carbon cycling in soil. Proc Natl Acad Sci USA 2019;116:25900–8. 10.1073/pnas.1908291116
- 8. Wolf YI, Silas S, Wang Y. et al. Doubling of the known set of RNA viruses by metagenomic analysis of an aquatic virome. Nat Microbiol 2020;5:1262–70. 10.1038/s41564-020-0755-4
- 9. Edgar RC, Taylor J, Lin V. et al. Petabase-scale sequence alignment catalyses viral discovery. Nature 2022;602:142–7. 10.1038/s41586-021-04332-2
- 10. Neri U, Wolf YI, Roux S. et al. Expansion of the global RNA virome reveals diverse clades of bacteriophages. Cell 2022;185:4023–37. 10.1016/j.cell.2022.08.023
- 11. Zayed AA, Wainaina JM, Dominguez-Huerta G. et al. Cryptic and abundant marine viruses at the evolutionary origins of Earth’s RNA virome. Science 2022;376:156–62. 10.1126/science.abm5847
- 12. Hou X, He Y, Fang P. et al. Using artificial intelligence to document the hidden RNA virosphere. Cell 2024;187:6929–42. 10.1016/j.cell.2024.09.027
- 13. Mutz P, Camargo AP, Sahakyan H. et al. The protein structurome of Orthornavirae and its dark matter. mBio 2024;16:e03200–24. 10.1128/mbio.03200-24
- 14. Urayama S, Takaki Y, Nunoura T. FLDS: a comprehensive dsRNA sequencing method for intracellular RNA virus surveillance. Microbes Environ 2016;31:33–40. 10.1264/jsme2.ME15171
- 15. Urayama S, Takaki Y, Nishi S. et al. Unveiling the RNA virosphere associated with marine microorganisms. Mol Ecol Resour 2018;18:1444–55. 10.1111/1755-0998.12936
- 16. Kumar M, Carmichael GG. Antisense RNA: function and fate of duplex RNA in cells of higher eukaryotes. Microbiol Mol Biol Rev 1998;62:1415–34. 10.1128/mmbr.62.4.1415-1434.1998
- 17. Hirai M, Takaki Y, Kondo F. et al. RNA viral metagenome analysis of subnanogram dsRNA using fragmented and primer ligated dsRNA sequencing (FLDS). Microbes Environ 2021;36:ME20152. 10.1264/jsme2.ME20152
- 18. Hirai J, Urayama S, Takaki Y. et al. RNA virosphere in a marine zooplankton community in the subtropical western North Pacific. Microbes Environ 2022;37:ME21066. 10.1264/jsme2.ME21066
- 19. Edgar R. Known phyla dominate the Tara oceans RNA virome. Virus Evol 2023;9:vead063. 10.1093/ve/vead063
- 20. Morris TJ, Dodds JA. Isolation and analysis of double-stranded RNA from virus-infected plant and fungal tissue. Phytopathology 1979;69:854–8. 10.1094/Phyto-69-854
- 21. Okada R, Kiyota E, Moriyama H. et al. A simple and rapid method to purify viral dsRNA from plant and fungal tissue. J Gen Plant Pathol 2015;81:103–7. 10.1007/s10327-014-0575-6
- 22. Kopylova E, Noé L, Touzet H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics 2012;28:3211–7. 10.1093/bioinformatics/bts611
- 23. Mistry J, Chuguransky S, Williams L. et al. Pfam: the protein families database in 2021. Nucleic Acids Res 2021;49:D412–9. 10.1093/nar/gkaa913
- 24. Sakaguchi S, Urayama SI, Takaki Y. et al. NeoRdRp: a comprehensive dataset for identifying RNA-dependent RNA polymerases of various RNA viruses from metatranscriptomic data. Microbes Environ 2022;37:ME22001. 10.1264/jsme2.ME22001
- 25. Fu L, Niu B, Zhu WS. et al. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 2012;28:3150–2. 10.1093/bioinformatics/bts565
- 26. Bushnell B. BBMap: A Fast, Accurate, Splice-Aware Aligner. Berkeley, CA: Lawrence Berkeley National Lab, 2014.
- 27. Li H, Handsaker B, Wysoker A. et al. The sequence alignment/map format and SAMtools. Bioinformatics 2009;25:2078–9. 10.1093/bioinformatics/btp352
- 28. Hyatt D, Chen GL, LoCascio PF. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 2010;11:1–11. 10.1186/1471-2105-11-119
- 29. Altschul SF, Gish W, Miller W. et al. Basic local alignment search tool. J Mol Biol 1990;215:403–10. 10.1016/S0022-2836(05)80360-2
- 30. Krogh A, Larsson B, Von Heijne G. et al. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 2001;305:567–80. 10.1006/jmbi.2000.4315
- 31. Zimmermann L, Stephens A, Nam SZ. et al. A completely reimplemented MPI bioinformatics toolkit with a new HHpred server at its core. J Mol Biol 2018;430:2237–43. 10.1016/j.jmb.2017.12.007
- 32. Nishimura Y, Yamada K, Okazaki Y. et al. DiGAlign: versatile and interactive visualization of sequence alignment for comparative genomics. Microbes Environ 2024;39:ME23061. 10.1264/jsme2.ME23061
- 33. Pebesma E. Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal 2018;10:439–46. 10.32614/RJ-2018-009
- 34. Lorenz R, Bernhart SH, Höner zu Siederdissen C. et al. ViennaRNA package 2.0. Algorithms Mol Biol 2011;6:1–14. 10.1186/1748-7188-6-26
- 35. Lifschitz S, Haeusler EH, Catanho M. et al. Bio-strings: a relational database data-type for dealing with large biosequences. BioTech (Basel) 2022;11:31. 10.3390/biotech11030031
- 36. Altschul SF, Madden TL, Schäffer AA. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389–402. 10.1093/nar/25.17.3389
- 37. Camargo AP, Nayfach S, Chen IMA. et al. IMG/VR v4: an expanded database of uncultivated virus genomes within a framework of extensive functional, taxonomic, and ecological metadata. Nucleic Acids Res 2023;51:D733–43. 10.1093/nar/gkac1037
- 38. Edgar RC. Muscle5: high-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny. Nat Commun 2022;13:6968. 10.1038/s41467-022-34630-w
- 39. Steinegger M, Meier M, Mirdita M. et al. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics 2019;20:473. 10.1186/s12859-019-3019-7
- 40. Esterman ES, Wolf YI, Kogay R. et al. Evolution of DNA packaging in gene transfer agents. Virus Evol 2021;7:veab015. 10.1093/ve/veab015
- 41. Price MN, Dehal PS, Arkin AP. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS One 2010;5:e9490. 10.1371/journal.pone.000949
- 42. Kumar S, Stecher G, Suleski M. et al. MEGA12: molecular evolutionary genetic analysis version 12 for adaptive and green computing. Mol Biol Evol 2024;41:msae263. 10.1093/molbev/msae263
- 43. Nguyen LT, Schmidt HA, Von Haeseler A. et al. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 2015;32:268–74. 10.1093/molbev/msu300
- 44. Sievers F, Higgins DG. The clustal omega multiple alignment package. Methods Mol Biol 2021;2231:3–16. 10.1007/978-1-0716-1036-7_1
- 45. Jumper J, Evans R, Pritzel A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596:583–9. 10.1038/s41586-021-03819-2
- 46. Mirdita M, Schütze K, Moriwaki Y. et al. ColabFold: making protein folding accessible to all. Nat Methods 2022;19:679–82. 10.1038/s41592-022-01488-1
- 47. Eastman P, Swails J, Chodera JD. et al. OpenMM 7: rapid development of high performance algorithms for molecular dynamics. PLoS Comput Biol 2017;13:e1005659. 10.1371/journal.pcbi.1005659
- 48. Meng EC, Goddard TD, Pettersen EF. et al. UCSF ChimeraX: tools for structure building and analysis. Protein Sci 2023;32:e4792. 10.1002/pro.4792
- 49. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 2013;30:772–80. 10.1093/molbev/mst010
- 50. Johnson LS, Eddy SR, Portugaly E. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics 2010;11:431. 10.1186/1471-2105-11-431
- 51. Holm L, Laiho A, Törönen P. et al. DALI shines a light on remote homologs: one hundred discoveries. Protein Sci 2023;32:e4519. 10.1002/pro.4519
- 52. Hutchinson EC, von Kirchbach JC, Gog JR. et al. Genome packaging in influenza A virus. J Gen Virol 2010;91:313–28. 10.1186/1471-2105-11-431
- 53. Bamford DH, Romantschuk M, Somerharju PJ. Membrane fusion in prokaryotes: bacteriophage phi 6 membrane fuses with the Pseudomonas syringae outer membrane. EMBO J 1987;6:1467–73. 10.1002/j.1460-2075.1987.tb02388.x
- 54. Balvay L, Rifo RS, Ricci EP. et al. Structural and functional diversity of viral IRESes. Biochim Biophys Acta 2009;1789:542–57. 10.1016/j.bbagrm.2009.07.005
- 55. Peng T, Yang F, Yang F. et al. Structural diversity and biological role of the 5’untranslated regions of picornavirus. RNA Biol 2023;20:548–62. 10.1080/15476286.2023.2240992
- 56. Kim RS, Levy Karin E, Mirdita M. et al. BFVD—a large repository of predicted viral protein structures. Nucleic Acids Res 2025;53:D340–7. 10.1093/nar/gkae1119
- 57. Kleymann A, Becker AA, Malik YS. et al. Detection and molecular characterization of picobirnaviruses (PBVs) in the mongoose: identification of a novel PBV using an alternative genetic code. Viruses 2020;12:99. 10.3390/v12010099
- 58. Butcher SJ, Grimes JM, Makeyev EV. et al. A mechanism for initiating RNA-dependent RNA polymerization. Nature 2001;410:235–40. 10.1038/35065653
- 59. Wang M, Li R, Shu B. et al. Stringent control of the RNA-dependent RNA polymerase translocation revealed by multiple intermediate structures. Nat Commun 2020;11:2605. 10.1038/s41467-020-16234-4
- 60. Duquerroy S, Da Costa B, Henry C. et al. The picobirnavirus crystal structure provides functional insights into virion assembly and cell entry. EMBO J 2009;28:1655–65. 10.1038/emboj.2009.10
- 61. Poranen MM, Bamford DH. Assembly of large icosahedral double-stranded RNA viruses. Adv Exp Med Biol 2012;726:379–402. 10.1007/978-1-4614-0980-9_17
- 62. Luque D, Mata CP, Suzuki N. et al. Capsid structure of dsRNA fungal viruses. Viruses 2018;10:481. 10.3390/v10090481
- 63. Mata CP, Rodríguez JM, Suzuki N. et al. Structure and assembly of double-stranded RNA mycoviruses. Adv Virus Res 2020;108:213–47. 10.1016/bs.aivir.2020.08.001
- 64. Ortega-Esteban Á, Mata CP, Rodríguez-Espinosa MJ. et al. Cryo-electron microscopy structure, assembly, and mechanics show morphogenesis and evolution of human picobirnavirus. J Virol 2020;94:e01542–20. 10.1128/jvi.01542-20
- 65. Krupovic M, Dolja VV, Koonin EV. Plant viruses of the Amalgaviridae family evolved via recombination between viruses with double-stranded and negative-strand RNA genomes. Biol Direct 2015;10:12. 10.1186/s13062-015-0047-8
- 66. Nguyen HM, Watanabe S, Sharmin S. et al. RNA and single-stranded DNA phages: unveiling the promise from the underexplored world of viruses. Int J Mol Sci 2023;24:17029. 10.3390/ijms242317029
- 67. Mäntynen S, Laanto E, Oksanen HM. et al. Black box of phage–bacterium interactions: exploring alternative phage infection strategies. Open Biol 2021;11:210188. 10.1098/rsob.210188
- 68. Urayama S, Takaki Y, Chiba Y. et al. Eukaryotic microbial RNA viruses—acute or persistent? Insights into their function in the aquatic ecosystem. Microbes Environ 2022;37:ME22034. 10.1264/jsme2.ME22034
- 69. Wilhelm SW, Suttle CA. Viruses and nutrient cycles in the sea: viruses play critical roles in the structure and function of aquatic food webs. Bioscience 1999;49:781–8. 10.2307/1313569
- 70. Zimmerman AE, Howard-Varona C, Needham DM. et al. Metabolic and biogeochemical consequences of viral infection in aquatic ecosystems. Nat Rev Microbiol 2020;18:21–34. 10.1038/s41579-019-0270-x