KEYWORDS: Acanthamoeba, comparative genomics, Kamchatka, NCLDV, paleovirology, Arctic
ABSTRACT
Microbes trapped in permanently frozen paleosoils (permafrost) are the focus of increasing research in the context of global warming. Our previous investigations led to the discovery and reactivation of two Acanthamoeba-infecting giant viruses, Mollivirus sibericum and Pithovirus sibericum, from a 30,000-year-old permafrost layer. While several modern pithovirus strains have since been isolated, no contemporary mollivirus relative had been found. We now describe Mollivirus kamchatka, a close relative of M. sibericum, isolated from surface soil sampled on the bank of the Kronotsky River in Kamchatka, Russian Federation. This discovery confirms that molliviruses have not gone extinct and are at least present in a distant subarctic continental location. This modern isolate exhibits a nucleocytoplasmic replication cycle identical to that of M. sibericum. Its spherical particle (0.6 μm in diameter) encloses a 648-kb GC-rich double-stranded DNA genome coding for 480 proteins, of which 61% are unique to these two molliviruses. The 461 homologous proteins are highly conserved (92% identical residues, on average), despite the presumed stasis of M. sibericum for the last 30,000 years. Selection pressure analyses show that most of these proteins contribute to virus fitness. The comparison of these first two molliviruses clarifies their evolutionary relationship with the pandoraviruses, supporting their provisional classification in a distinct family, the Molliviridae, pending the eventual discovery of intermediary missing links better demonstrating their common ancestry.
IMPORTANCE Virology has long been viewed through the prism of human, cattle, or plant diseases, leading to a largely incomplete picture of the viral world. The serendipitous discovery of the first giant virus visible under a light microscope (i.e., >0.3 μm in diameter), mimivirus, opened a new era of environmental virology, now incorporating protozoan-infecting viruses. Planet-wide isolation studies and metagenome analyses have shown the presence of giant viruses in most terrestrial and aquatic environments, including upper Pleistocene frozen soils. Those systematic surveys have led authors to propose several new distinct families, including the Mimiviridae, Marseilleviridae, Faustoviridae, Pandoraviridae, and Pithoviridae. We now propose to introduce one additional family, the Molliviridae, following the description of M. kamchatka, the first modern relative of M. sibericum, previously isolated from 30,000-year-old arctic permafrost.
INTRODUCTION
The serendipitous discovery of the Acanthamoeba-infecting mimivirus (1) and its detailed characterization (2, 3) more than 15 years ago started a new era in virology that has now revealed the existence of several families of so-called giant viruses, whose particles rival those of the cellular world in size and gene content. As of today, the use of Acanthamoeba (or related amoebozoa) as laboratory hosts (and environmental baits) has allowed the discovery and isolation of previously overlooked giant and large DNA viruses exhibiting very diverse virion morphologies and sizes, gene contents, and intracellular modes of replication (4). Few of them have yet received a formal International Committee on Taxonomy of Viruses (ICTV) taxonomy (5), with the exception of the Mimiviridae (i.e., mimivirus relatives) (6) and the Marseilleviridae (7), which were chronologically first described and which appear to be the most abundant in the environment. More recent discoveries include the pandoraviruses in 2013 (i.e., the proposed Pandoraviridae family) (8), the number of which is also increasing rapidly (9); pithovirus (the prototype of the proposed Pithoviridae family) in 2014 (10, 11); faustovirus (12) (related to the asfarviruses) in 2015; and the more elusive mollivirus (13), which had no known relative until this work. The latest addition to the list of large DNA viruses infecting amoebas is medusavirus (14). The main giant virus groups are positioned relative to each other in Fig. 1, using a phylogenetic tree of the DNA polymerase (the most informative of the only 3 core proteins strictly conserved across all large eukaryotic DNA viruses) (15). As expected from such a small number of common markers (especially relative to the hundreds of proteins that the genomes of these viruses encode), the evolutionary relationships and origins of these amoeba-infecting giant viruses remain a controversial subject (4, 15–17).
FIG 1.
Phylogeny of DNA polymerase B of large and giant dsDNA viruses. This neighbor-joining tree was computed (JTT substitution model, 100 resamplings) on 397 amino acid positions from an alignment of 42 sequences computed by the MAFFT program (29). Branches with bootstrap values of <60% were collapsed.
Studies started about 30 years ago have provided multiple lines of evidence that soils frozen since the late Pleistocene and predominantly located in arctic and subarctic Siberia do contain a wide diversity of microbes that can be revived upon thawing (18–20) after tens of thousands of years. These studies culminated in the regeneration of a plant from 30,000-year-old fruit tissue (21). Inspired by those studies, we then isolated from a similar sample two different Acanthamoeba-infecting large DNA viruses, named Pithovirus sibericum (10) and Mollivirus sibericum (13), demonstrating the ability of these viruses, and maybe many others, to remain infectious after long periods of stasis in permafrost. In contrast to P. sibericum, of which several modern relatives have since been characterized (11, 22–24), no other relative of M. sibericum was found, despite the increasing sampling efforts deployed by several laboratories. Without additional isolates, its classification as a prototype of a new family or as a distant relative of the pandoraviruses (with which it shared several morphological features and 16% of its gene content) (13) remained an open question. Here we report the discovery and detailed characterization of the first modern M. sibericum relative, named Mollivirus kamchatka, after the location of the Kronotsky River bank where it was retrieved. The comparative analysis of these first two molliviruses highlights their evolutionary processes and suggests their provisional classification into their own family, the Molliviridae, distinct from the Pandoraviridae, pending the eventual discovery of intermediary missing links clearly establishing their common ancestry.
RESULTS
Virus isolation.
The original sample consisted of about 50 ml of vegetation-free superficial soil scooped (in sterile tubes) from the bank of the Kronotsky River (coordinates, 54°32′59″N, 160°34′55″E) on 6 July 2017. Before being stored at 4°C, the sample was transported in a backpack for a week at ambient temperature (5°C up to 24°C). This area corresponds to a continental subarctic climate: very cold winters and short, cool to mild summers, low humidity, and little precipitation. Back in the laboratory, a few grams of the sample were used in the Acanthamoeba cocultivation procedure previously described (13). After a succession of passages and enrichment on Acanthamoeba castellanii cultures, viral particles were produced in a sufficient quantity to be recovered and purified.
Virion morphology and ultrastructure.
As for M. sibericum, light microscopy of infected cultures showed the multiplication of particles. Using transmission electron microscopy (TEM), these particles, which were indistinguishable from those of M. sibericum, appeared to be approximately spherical, 600 nm in diameter, lined by an internal lipid membrane, and enclosed in a 20-nm-thick electron-dense tegument covered with a mesh of fibers (Fig. 2).
FIG 2.
Ultra-thin-section TEM image of a newly synthesized M. kamchatka particle in the cell cytoplasm at 7 h postinfection. The structure of the mature particles appears to be identical to that of M. sibericum mature particles.
Analysis of the replication cycle.
The replication cycle in A. castellanii cells was monitored using TEM and light microscopy of DAPI (4′,6-diamidino-2-phenylindole)-stained infected cultures as previously described (13). The series of events previously described for host cells infected by M. sibericum was similarly observed upon M. kamchatka replication (13, 25). After entering the amoeba cell through phagocytosis, M. kamchatka virions were found gathered in large vacuoles individually or in groups of 2 to 6 particles. Multiple nuclear events occurred during the infection, starting with the drift of the host cell nucleolus to the periphery of the nucleus at 4 to 5 h postinfection (p.i.). At 7 h p.i., the nucleus appeared to be filled with numerous fibrils that may correspond to viral genomes tightly packed in DNA-protein complexes (Fig. 3A). About 30% of the nuclei observed at that time exhibited a ruptured nuclear membrane (Fig. 3B). Besides those internal nuclear events, we observed a loss of vacuolization within the host cell from 4 h p.i. to the end of the cycle (on average, at 9 h p.i.). Large viral factories were formed in the cytoplasm at the periphery of the disorganized nucleus. These viral factories displayed the same characteristics as those formed during M. sibericum infections, involving an active recycling of membrane fragments (25).
FIG 3.
Ultra-thin-section TEM image of an A. castellanii cell at 7 to 10 h postinfection by M. kamchatka. (A) A viral factory exhibiting fibrils (F), a nascent viral particle (V), and surrounding mitochondria (M). Fragments of the ruptured nuclear membrane are visible as dark bead strings. (B) Details of a nuclear membrane rupture through which fibrils synthesized in the nucleus (N) are shed into the cytoplasm (C).
Comparative genomics.
DNA prepared from purified M. kamchatka particles was sequenced using both the Illumina and the Oxford Nanopore Technologies (ONT) platforms. The M. kamchatka genome, a linear double-stranded DNA (dsDNA) molecule, was readily assembled as a unique sequence of 648,864 bp. The read coverage was uniform throughout the genome, except for a 10-kb terminal segment presumably repeated at both ends and exhibiting twice the average value. The M. kamchatka genome is thus topologically identical to that of M. sibericum, is slightly larger (when including both terminal repeats), and has the same global nucleotide composition (G+C content = 60%). This similarity was confirmed by a detailed comparison of their genome sequences, which showed that they exhibit a global collinearity interrupted by only a few insertions and deletions.
Prior to the comparison of their gene contents, both M. sibericum and M. kamchatka were annotated using the same stringent procedure that we previously developed to correct for gene overpredictions suspected to occur in G+C-rich sequences, such as those of pandoraviruses (8, 26, 27). A total of 495 and 480 genes were predicted for M. sibericum and M. kamchatka, respectively, with the encoded proteins ranging from 51 to 2,171 residues and from 57 to 2,176 residues, respectively.
M. kamchatka predicted protein sequences were used in a similarity search against the nonredundant protein sequence database (28) and the reannotated M. sibericum predicted proteome. Out of the 480 proteins predicted to be encoded by M. kamchatka, 463 had their closest homologs in M. sibericum, with 92% identical residues on average. After clustering the paralogs, these proteins corresponded to 434 distinct gene clusters delineating a first estimate of the mollivirus core gene set. Four hundred eleven of these clusters contained a single-copy gene (singleton) for each strain. Two hundred ninety of the 480 (60.4%) M. kamchatka-encoded proteins did not exhibit a detectable homolog among cellular organisms or previously sequenced viruses (excluding M. sibericum). Those proteins are referred to as “ORFans.” Among the 190 proteins exhibiting significant (E < 10⁻⁵) matches, in addition to their M. sibericum counterparts, 78 (16% of the total gene content) were most similar to pandoravirus predicted proteins, 18 (3.7%) were most similar to proteins of other virus families, 51 (10.6%) were most similar to A. castellanii proteins, 24 (5%) were most similar to proteins of other eukaryotes, 17 (3.5%) were most similar to bacterial proteins, and 2 (0.4%) were most similar to proteins of the Archaea (Fig. 4).
FIG 4.
Distribution of the best-matching nonredundant homologs of M. kamchatka predicted proteins. Best-matching homologous proteins were identified using the BLASTp program (E value, <10⁻⁵) against the nonredundant (NR) database (15) (after excluding M. sibericum). Green shades are used for eukaryotes, and red shades are used for viruses.
The interpretation of these statistics is ambiguous as, on the one hand, the large proportion of ORFans (>60%) is characteristic of what is usually found for the prototypes of novel giant virus families (4). On the other hand, the closest viral homologs are not scattered in diverse previously defined virus families but mostly belong to the Pandoraviridae (78/96 [81%]) (Fig. 4). The two molliviruses thus constitute a new group of viruses with their own specificity but with a phylogenetic affinity with the pandoraviruses, as previously noticed (4). The proportion of M. kamchatka proteins with best-matching counterparts in Acanthamoeba confirms a gene exchange propensity with the host, already noticed for M. sibericum (13).
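The proportions quoted in this section follow directly from the reported counts. The short Python sketch below simply redoes that arithmetic; the category labels and variable names are ours, chosen for illustration, and only the counts come from the text.

```python
# Counts of M. kamchatka proteins by best-matching category, as reported in the text.
TOTAL_PROTEINS = 480
best_match_counts = {
    "pandoraviruses": 78,
    "other viruses": 18,
    "A. castellanii": 51,
    "other eukaryotes": 24,
    "bacteria": 17,
    "archaea": 2,
}


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Proteins with a significant match outside M. sibericum, and the remaining ORFans.
with_match = sum(best_match_counts.values())
orfans = TOTAL_PROTEINS - with_match

# Share of each category in the total gene content.
shares = {name: percent(n, TOTAL_PROTEINS) for name, n in best_match_counts.items()}
orfan_share = percent(orfans, TOTAL_PROTEINS)

# Share of pandoravirus matches among all viral best matches.
viral = best_match_counts["pandoraviruses"] + best_match_counts["other viruses"]
pandora_share_of_viral = percent(best_match_counts["pandoraviruses"], viral)
```

With these inputs, `with_match` is 190, `orfans` is 290 (60.4% of 480), and the pandoravirus share of viral best matches is 78/96, i.e., about 81%.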
Recent evolutionary events since the M. sibericum/M. kamchatka divergence.
We investigated the evolutionary events specific for each of the molliviruses by focusing on proteins lacking reciprocal best matches between the two strains. We found 63 such cases, of which 10 corresponded to unilateral strain-specific duplications of genes and 53 were unique to a given strain. These unique genes (Tables 1 and 2) result from gains or losses in either of the mollivirus strains (20 in M. kamchatka, 33 in M. sibericum). The likely origins of these strain-specific genes (horizontal acquisition, de novo creation [26, 27], or differential loss) are listed in Table 1 and Table 2.
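The reciprocal-best-match criterion used here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline actually used: the input is taken to be a dictionary of pairwise similarity scores (for example, BLASTp bit scores), and the function names and the toy identifiers `a1`…`a4` and `b1`, `b2` are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` maps (query, subject) pairs to a similarity score;
    higher is better. Ties keep the first pair encountered.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def lacking_reciprocal_best_match(proteome_a, a_vs_b, b_vs_a):
    """Return the proteins of strain A without a reciprocal best match in strain B."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return [
        protein
        for protein in proteome_a
        if protein not in a_best or b_best.get(a_best[protein]) != protein
    ]


# Toy data (made up): a3 is a strain-specific duplicate of a2, a4 has no hit at all.
proteome_a = ["a1", "a2", "a3", "a4"]
a_vs_b = {("a1", "b1"): 900, ("a2", "b2"): 500, ("a3", "b2"): 450}
b_vs_a = {("b1", "a1"): 900, ("b2", "a2"): 500, ("b2", "a3"): 450}
unmatched = lacking_reciprocal_best_match(proteome_a, a_vs_b, b_vs_a)
```

In the toy data, `a3` stands for a unilateral duplication and `a4` for a gene unique to one strain, the two situations distinguished in the text.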
TABLE 1.
Status of the protein-coding genes unique to M. kamchatka
ORF identifier | Predicted function | Putative evolutionary scenario |
---|---|---|
mk_25 | None (ORFan) | De novo creation or loss in M. sibericum |
mk_92 | DNA methyltransferase | Loss in M. sibericum (present in some pandoraviruses) |
mk_93 | Ring domain | Loss in M. sibericum (present in some pandoraviruses) |
mk_104 | Ring domain | Loss in M. sibericum (present in some pandoraviruses) |
mk_127 | None (ORFan) | De novo creation or loss in M. sibericum |
mk_159 | None | Loss in M. sibericum (present in some pandoraviruses) |
mk_165 | None | Horizontal gene transfer from pandoravirus |
mk_166 | Peptidase | Loss in M. sibericum (present in some pandoraviruses) |
mk_172 | None (ORFan) | De novo creation or loss in M. sibericum |
mk_182 | None (ORFan) | De novo creation or loss in M. sibericum |
mk_231 | None (ORFan) | De novo creation or loss in M. sibericum |
mk_313 | None (ORFan) | De novo creation or loss in M. sibericum |
mk_369 | None (ORFan) | De novo creation or loss in M. sibericum |
mk_415 | None (ORFan) | De novo creation or loss in M. sibericum |
mk_441 | None (ORFan) | De novo creation or loss in M. sibericum |
mk_466 | None (ORFan) | De novo creation or loss in M. sibericum |
mk_467 | None | Loss in M. sibericum (horizontal gene transfer to Acanthamoeba) |
mk_469 | (BI)-1 like | Loss in M. sibericum (present in Acanthamoeba) |
mk_476 | None (ORFan) | De novo creation or loss in M. sibericum |
mk_478 | None (ORFan) | De novo creation or loss in M. sibericum |
TABLE 2.
Status of the protein-coding genes unique to M. sibericum
ORF identifier | Predicted function | Putative evolutionary scenario |
---|---|---|
ms_1 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_3 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_5 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_7 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_8 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_13 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_14 | None | Horizontal gene transfer to Acanthamoeba |
ms_38 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_42 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_53 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_64 | None | Loss in M. kamchatka (present in Noumeavirus) |
ms_109 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_120 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_136 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_138 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_139 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_144 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_157 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_159 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_166 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_172 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_190 | None | Loss in M. kamchatka (present in some pandoraviruses) |
ms_193 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_246 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_258 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_311 | Zinc-finger domain | Loss in M. kamchatka (present in Gossypium hirsutum) |
ms_312 | Zinc-finger domain | Loss in M. kamchatka (present in Cavenderia fasciculata) |
ms_313 | None | Loss in M. kamchatka (present in some pandoraviruses) |
ms_464 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_465 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_479 | DNA methyltransferase | Loss in M. kamchatka (present in some pandoraviruses) |
ms_494 | None (ORFan) | De novo creation or loss in M. kamchatka |
ms_495 | None (ORFan) | De novo creation or loss in M. kamchatka |
Six M. kamchatka proteins, absent from M. sibericum, have homologs in pandoraviruses, suggesting common gene ancestors (and loss in M. sibericum) or horizontal acquisitions. According to its embedded position within the pandoravirus phylogenetic tree, only one anonymous protein (mk_165, sharing 57% identical residues with its homolog in Pandoravirus salinus) could be interpreted as an ancient horizontal transfer from pandoraviruses (Fig. 5A). Another candidate, mk_92, shared 75% identical residues with a pandoravirus DNA methyltransferase (pqer_cds_559). However, the very long branch associated with the Pandoravirus dulcis homolog (possibly due to a nonorthologous replacement) raises some doubts about the origin of the M. kamchatka gene (Fig. 5B).
FIG 5.
Possible gene transfers from a pandoravirus to M. kamchatka. Both phylogenetic trees were computed from the global alignments of orthologous protein sequences using MAFFT (29). IQtree (48) was used to determine the optimal substitution model (options, -m TEST and -bb 1000). (A) Results for the mk_165 protein (no predicted function). The corresponding long branch suggests its accelerated divergence since an ancient acquisition from a pandoravirus. (B) Results for the predicted methyltransferase mk_92. The long branch leading to the P. dulcis homolog might alternatively be interpreted as a nonorthologous replacement of the ancestral pandoravirus version of the gene.
Two M. kamchatka-specific proteins, encoded by adjacent genes (mk_466, mk_467), have homologs in Acanthamoeba, suggesting potential host-virus exchanges. Phylogenetic reconstruction did not suggest a direction for the transfer of mk_466. However, since the unique homolog of mk_467 is found in Acanthamoeba (and not in other eukaryotes), the corresponding gene could have been recently acquired by the host from a close relative of M. kamchatka.
Three proteins unique to M. sibericum have homologs in pandoraviruses (Table 2), suggesting common gene ancestors (and loss in M. kamchatka) or horizontal acquisitions. One protein (ms_14) has a unique homolog in Acanthamoeba, suggesting a virus-to-host exchange. The homolog of ms_312 in Cavenderia fasciculata (29) might be testimony to past interactions between molliviruses and a deeply rooted ancestor of the Amoebozoa clade.
The analyses of the genes unique to each mollivirus described above indicate that although horizontal transfer may contribute to their presence, it is not the predominant mechanism for their acquisition. We then further investigated the 12 genes unique to M. kamchatka and the 26 genes unique to M. sibericum (i.e., encoding strain ORFans without homologs in the databases) by computing three independent sequence properties (Fig. 6): the codon adaptation index (CAI), the G+C composition, and the open reading frame (ORF) length.
FIG 6.
Genomic features of strain-specific genes encoding ORFans. (A) Codon adaptation index (CAI); (B) G+C content; (C) protein length. The box plots show the median and the 25th and 75th percentiles. P values were calculated using the Wilcoxon test.
With an average CAI value of 0.26, the strain-specific ORFans appear to be significantly different from the rest of the proteins encoded by mollivirus genes (mean = 0.40; Wilcoxon test, P < 2 × 10⁻⁷). These genes also exhibit a significantly lower G+C content (56% for M. kamchatka and 57% for M. sibericum) than the rest of the genes (60.5% for both viruses), a value closer to that computed for intergenic regions (54%, on average, for both viruses). Moreover, the strain-specific genes encoding ORFans are shorter, on average, than the rest of the genes (encoded protein lengths of 115 and 378 residues, respectively, for M. kamchatka and 122 and 369 residues, respectively, for M. sibericum). Altogether, those results suggest that de novo gene creation might occur in the intergenic regions of molliviruses, as already postulated for pandoraviruses (26, 27).
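For readers unfamiliar with these sequence properties, the Python sketch below shows one common way to compute the G+C content and a codon adaptation index. It is a simplified illustration, not the procedure used for Fig. 6: the reference gene set, the function names, and the decision to skip codons absent from the reference set (rather than assigning them a pseudocount) are all assumptions made here.

```python
from collections import Counter
from itertools import product
from math import exp, log

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into complete codons (a trailing partial codon is dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gc_content(seq):
    """Fraction of G and C nucleotides in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def relative_adaptiveness(reference_genes):
    """Weight of each codon: its count divided by that of the most used synonymous codon.

    Met, Trp, and stop codons are excluded; codons never seen in the
    reference set get no weight (a simplification).
    """
    counts = Counter(c for gene in reference_genes for c in codons(gene) if c in CODON_TABLE)
    max_per_aa = {}
    for codon, aa in CODON_TABLE.items():
        max_per_aa[aa] = max(max_per_aa.get(aa, 0), counts[codon])
    return {
        codon: counts[codon] / max_per_aa[aa]
        for codon, aa in CODON_TABLE.items()
        if aa not in "MW*" and counts[codon] > 0
    }


def cai(gene, weights):
    """Geometric mean of the codon weights over the codons of `gene`."""
    logs = [log(weights[c]) for c in codons(gene) if c in weights]
    return exp(sum(logs) / len(logs)) if logs else float("nan")
```

A gene built only from the preferred codons of the reference set has a CAI of 1, whereas a gene using rarer synonymous codons scores lower, which is the pattern reported for the strain-specific ORFans.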
New predicted protein functions in M. kamchatka.
Sixty-four of the M. kamchatka predicted proteins exhibited sequence motifs associated with known functions. Fifty-nine of them were orthologous to previously annotated genes in M. sibericum (13). This common subset confirms the limited complement of DNA processing and repair enzymes found in molliviruses: mainly, a DNA polymerase (mk_287), a primase (mk_236), and 3 helicases (mk_291, mk_293, mk_351). M. kamchatka confirms the absence of key deoxynucleotide synthesis pathways (such as thymidylate synthase, thymidine kinase, and thymidylate kinase) and of a ribonucleoside-diphosphate reductase (present in pandoraviruses). The five M. kamchatka-specific proteins (Table 1) exhibiting functional motifs or domain signatures correspond to two proteins (mk_93 and mk_104) containing a type of zinc finger (ring domain) mediating protein interactions, one protein (mk_469) with similarity to the BAX inhibitor (BI)-1-like family of small transmembrane proteins, one predicted LexA-related signal peptidase (mk_166), and one DNA methyltransferase (mk_92).
Evaluation of the selection pressure exerted on mollivirus genes.
The availability of two distinct strains of mollivirus allows the first estimation of the selection pressure exerted on their shared genes during their evolution. This was done by computing ω = dN/dS, the ratio of the rate of nonsynonymous mutations (dN) to the rate of synonymous mutations (dS), for pairs of orthologous genes. ω values much less than 1 are associated with genes in which mutations have the strongest negative impact on virus fitness. The high sequence similarity of proteins shared by M. sibericum and M. kamchatka allowed the generation of flawless pairwise alignments and the computation of highly reliable ω values for most (i.e., 397/411) of their orthologous singletons.
Fourteen singleton pairs were not taken into account in the selection pressure analysis because of their either identical (11 of the pairs) or quasi-identical (1 pair, >98% identical nucleotides) sequences or unreliable pairwise alignments (2 pairs). For the 397 gene pairs retained in the analysis, the mean ω value was 0.24 ± 0.14 (Fig. 7). This result corresponds to a strong negative selection pressure, indicating that most of the encoded proteins greatly contribute to the molliviruses’ fitness. Together with the high level of pairwise similarity (92%) of their proteins, this also indicates that M. kamchatka evolved very little during the last 30,000 years and that the M. sibericum genome was not prominently damaged during its cryostasis in permafrost.
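To make the ω = dN/dS ratio concrete, the Python sketch below implements a simplified version of the classical Nei-Gojobori counting approach with a Jukes-Cantor correction. It is an illustration only and not necessarily the method used for the values reported here. Assumptions made for brevity: the two sequences are aligned, gap-free, uppercase, and of equal length (a multiple of 3); changes to or through stop codons are counted as nonsynonymous rather than excluded; and all function names are ours.

```python
from itertools import permutations, product
from math import log

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    total += 1 / 3
    return total


def syn_nonsyn_diffs(c1, c2):
    """Synonymous and nonsynonymous differences between two codons,
    averaged over every order in which the differing positions can mutate."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_pos:
        return 0.0, 0.0
    paths = list(permutations(diff_pos))
    syn = nonsyn = 0.0
    for path in paths:
        current = c1
        for pos in path:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
    return syn / len(paths), nonsyn / len(paths)


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits (requires p < 0.75)."""
    return -0.75 * log(1 - 4 * p / 3) if p > 0 else 0.0


def dn_ds(seq1, seq2):
    """Return (dN, dS, omega); omega is None when dS is zero."""
    pairs = [(seq1[i:i + 3], seq2[i:i + 3]) for i in range(0, len(seq1), 3)]
    s_sites = sum((syn_sites(a) + syn_sites(b)) / 2 for a, b in pairs)
    n_sites = 3 * len(pairs) - s_sites
    s_diff = n_diff = 0.0
    for a, b in pairs:
        s, n = syn_nonsyn_diffs(a, b)
        s_diff += s
        n_diff += n
    d_s = jukes_cantor(s_diff / s_sites) if s_sites > 0 else 0.0
    d_n = jukes_cantor(n_diff / n_sites) if n_sites > 0 else 0.0
    return d_n, d_s, (d_n / d_s if d_s > 0 else None)
```

Identical sequences give dS = 0 and therefore no defined ω, which is one reason why identical or near-identical pairs have to be set aside in this kind of analysis.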
FIG 7.
Selection pressure among different classes of genes. Values of ω (i.e., dN/dS) were computed from the alignments of homologous coding regions in M. kamchatka and M. sibericum. (A) Distribution of calculated ω values (n = 397). (B) Box plots of the ω ratio among ORFan genes (n = 243) and non-ORFan genes (n = 154). The box plots show the median and the 25th and 75th percentiles. All P values were calculated using the Wilcoxon test.
The analysis restricted to the 243 pairs of ORFan-encoding genes resulted in a very similar ω value of 0.29 ± 0.15 (Fig. 7). This indicates that although homologs of these ORFan proteins are found only in molliviruses, they have the same impact on virus fitness as more ubiquitous proteins. Importantly, this also confirms that these predicted ORFans are actual proteins, albeit with unknown functions. In contrast, five orthologous pairs (ms_160/mk_141, ms_280/mk_262, ms_171/mk_151, ms_430/mk_411, ms_60/mk_48) exhibited ω values larger than 1. Those ORFans under positive selection might correspond to newly created gene products undergoing refinement or pseudogenization.
We further examined the selection pressure of protein-coding genes with homologs in pandoraviruses. We used their 10 sequenced genomes to generate the corresponding gene clusters (Fig. 8). The 90 clusters shared by both virus groups included 64 singletons (single-copy genes present in all viruses), among which 55 were suitable for dN/dS computations. The mean ω value (0.17 ± 0.1) was very low, indicating that these genes, forming a super core gene set common to the molliviruses and pandoraviruses, are under an even stronger negative selection pressure than those constituting the provisional (most likely overestimated) mollivirus core gene set.
FIG 8.
Comparison of the mollivirus and pandoravirus core gene contents. (A) The distribution of the protein clusters shared by all pandoraviruses (black), by the two molliviruses (pink), and by both virus groups (super core genes) (blue). (B) Box plot of ω values calculated from the alignment of mollivirus core genes (pink) and super core genes (blue). The box plots show the median and the 25th and 75th percentiles.
Genomic inhomogeneity.
The original genome analysis of Lausannevirus (a member of the Marseilleviridae family) (30) revealed an unexpected nonuniform distribution of genes according to their annotation. Hypothetical genes (i.e., mostly encoding ORFans) were segregated from annotated genes (i.e., mostly encoding non-ORFans) in two different halves of the genome. In a more recent work, we noticed a similar bias in the distribution of pandoravirus core genes (26). The availability of a second mollivirus isolate gave us the opportunity to investigate this puzzling feature for yet another group of Acanthamoeba-infecting viruses. In Fig. 9, we plotted the distribution of three types of genes: (i) those with homologs in A. castellanii (n = 55 for M. sibericum and n = 51 for M. kamchatka), (ii) those belonging to the super core set shared by both molliviruses and pandoraviruses (n = 64), and (iii) those unique to either mollivirus strain (n = 26 for M. sibericum, n = 12 for M. kamchatka). These plots reveal a strong bias in the distribution of the super core versus ORFan genes (Fig. 9). The first half of the M. sibericum genome exhibits 90% of its ORFan genes, while the second half contains most of the members of the super core gene set. In contrast, genes possibly exchanged with the host display a more uniform distribution. The lack of an apparent segregation in the distribution of ORFan-encoding genes in the M. kamchatka genome might be due to their unreliable prediction, as no transcriptome information is available for this strain. Figure 10 shows that there is also a strong bias in the distribution of single-copy genes versus those with paralogs in either M. sibericum or M. kamchatka. Altogether, these analyses suggest that the right and left genome halves follow different evolutionary scenarios, with the first half concentrating the genomic plasticity (de novo gene creation, gene duplication) and the other half concentrating the most conserved, possibly essential, gene content.
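The kind of positional bias described here can be quantified very simply. The Python sketch below computes the fraction of a gene class located in the first half of a genome and a binned cumulative distribution comparable in spirit to Fig. 9B. The function names, the use of gene midpoints as positions, and the coordinates in the usage note are hypothetical; no real gene coordinates are reproduced.

```python
def fraction_in_first_half(gene_midpoints, genome_length):
    """Fraction of genes whose midpoint lies in the first half of the genome."""
    in_first_half = sum(1 for pos in gene_midpoints if pos < genome_length / 2)
    return in_first_half / len(gene_midpoints)


def cumulative_fraction(gene_midpoints, genome_length, n_bins=10):
    """Cumulative fraction of genes found up to the end of each of n_bins equal windows."""
    edges = [genome_length * (i + 1) / n_bins for i in range(n_bins)]
    total = len(gene_midpoints)
    return [sum(1 for pos in gene_midpoints if pos <= edge) / total for edge in edges]
```

For a made-up genome of length 100 with genes at positions 10, 20, 30, and 90, `fraction_in_first_half` returns 0.75 and `cumulative_fraction(..., n_bins=4)` returns `[0.5, 0.75, 0.75, 1.0]`. A uniformly distributed gene class would instead give about 0.5 and a near-linear cumulative curve.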
FIG 9.
Distribution of different classes of genes along mollivirus genomes. (A) Variation of the gene density computed by use of the ggplot2 geom_density function (49). The distribution of super core genes (n = 64, in green) is strongly biased toward the right half of the genome, in contrast to the genes with best-matching homologs in A. castellanii (in the nonredundant database, excluding mollivirus), which are more evenly distributed (in pink) (n = 55 and n = 51 for M. sibericum and M. kamchatka, respectively). M. sibericum-specific ORFan-encoding genes (in blue) also exhibit a nonuniform distribution toward the left half of the genome. (B) Cumulative distribution of the above-described classes of genes using the same color code used in panel A.
FIG 10.
Distribution of single-copy versus multiple-copy genes along mollivirus genomes. Single-copy genes (in blue) in both strains are evenly distributed, in contrast to genes with paralogs (pink), which cluster in the left half of the genomes. (A) M. sibericum (n = 48); (B) M. kamchatka (n = 46).
DISCUSSION
Following the discovery of their first representatives, each family of giant (e.g., Mimiviridae, Pithoviridae, Pandoraviridae) and large (e.g., Marseilleviridae) viruses infecting Acanthamoeba has expanded steadily, suggesting that they are relatively abundant and present in a large variety of environments. One noticeable exception has been the molliviruses, the prototype of which remained unique after its isolation from 30,000-year-old permafrost. The absence of M. sibericum relatives from the large number of samples processed by others and us since 2014 raised the possibility that they might have gone extinct or might be restricted to the Siberian Arctic. Our isolation of a second representative of the proposed Molliviridae family, M. kamchatka, at a location more than 1,500 km from where the first isolate was recovered and in a milder climate now refutes these hypotheses. Yet, the planet-wide ubiquity of these viruses remains to be established, in contrast to other Acanthamoeba-infecting giant viruses (4). Even when present, mollivirus-like viruses appear to be present at a very low abundance, as judged from the very small fraction of metagenomic reads that they represent in the total sample DNA for M. kamchatka (about 0.02 part per million) as well as for M. sibericum (about 1 part per million) (13). Another possibility would be that the preferred environmental host is not Acanthamoeba, the model host used in our laboratory, making the reactivation less effective. However, evidence of specific gene exchanges with Acanthamoeba (including a highly conserved major capsid protein homolog) (13, 31, 32) makes this explanation unlikely. We speculate that members of the proposed Molliviridae family are simply less abundant than other Acanthamoeba-infecting viruses, a conclusion further supported by the paucity of mollivirus-related sequences in the publicly available metagenomics data (data not shown).
As is always the case, the characterization of a second representative of a new virus family opened new opportunities for analysis. Unfortunately, the closeness of M. kamchatka to M. sibericum limited the amount of information that could be drawn from their comparison. For instance, the number of genes shared by the two isolates is probably a large overestimate of the core gene set characterizing the whole family. On the other hand, the closeness of the two isolates allowed an accurate determination of the selection pressure (ω = dN/dS) exerted on many genes, showing that most of them, including those encoding mollivirus ORFans, encode actual proteins under strong negative selection contributing to virus fitness. Given the partial phylogenetic affinity (i.e., 90 shared gene clusters) of the molliviruses with the pandoraviruses, we also assessed the selection pressure exerted on 55 of these super core genes and found them to be under even stronger negative selection (Fig. 8). This finding suggests that this super core gene set might have been present in an ancestor common to both proposed families.
If we postulate that M. sibericum underwent a complete stasis when it became frozen in permafrost while M. kamchatka remained in contact with living acanthamoebae, we could consider the two viral genomes to be separated by at least 30,000 years of evolution (possibly more, if they are not in a direct ancestry relationship) (33). The high percentage of identical residues (92%) in their proteins corresponds to a low substitution rate of 1.7 × 10⁻⁶ amino acid changes/position/year. This is an overestimate, since the two viruses probably started to diverge from each other more than 30,000 years ago. This value is nevertheless comparable to the estimates computed for poxviruses (34), given the uncertainty in the number of replicative cycles occurring per year. The high level of sequence similarity of M. kamchatka with M. sibericum also indicates that the latter did not suffer much DNA damage during its frozen stasis, even in the absence of detectable virus-encoded DNA repair functions.
Horizontal gene transfers with the host were suggested by the fact that 51 proteins shared by the two mollivirus strains exhibited a second-best match in Acanthamoeba. Because no homolog is detected in other eukaryotes for most of them, these transfers probably occurred in the mollivirus-to-host direction.
The clearest case is that of a major capsid protein homolog (mk_314, ml_347) sharing 64% identical residues with a predicted acanthamoeba protein (locus GenBank accession number XP_004333827). Two other genes encoding proteins that also have homologs in molliviruses flank the corresponding host gene. However, the corresponding viral genes are not colinear in M. sibericum or M. kamchatka and were probably transferred from a different, as yet unknown mollivirus strain. The presence of a 100% conserved major capsid protein homolog in the genomes of M. kamchatka and M. sibericum is itself puzzling. Such a protein (with a double jelly roll fold) is central to the structure of icosahedral particles (35). Consistent with its detection in M. sibericum virions (13), its conservation in M. kamchatka suggests that it still plays a role in the formation of the spherical mollivirus particles, while it has no homolog in the pandoraviruses. Inspired by previous observations made on the unrelated Lausannevirus genome (30), we unveiled a marked asymmetry in the distribution of different types of protein-coding genes in the mollivirus genomes. As shown in Fig. 9, the left half of the genome concentrates most of the genes coding for strain-specific ORFans, while the right half concentrates most of the super core genes shared with pandoraviruses. This asymmetry is even stronger for the multiple-copy genes, while single-copy genes are uniformly distributed along the genome (Fig. 10). The molliviruses thus appear to confine their genomic creativity (de novo creation and gene duplication) to one half of their genome, leaving the other half more stable. An asymmetry in the distribution of the core genes was previously noticed in the pandoravirus genomes (26). Such features might be linked to the mechanism of replication, which is probably similar for the two virus families. Further studies are needed to investigate this process.
The asymmetrical genomic distribution of pandoravirus core genes and mollivirus super core genes might be a testimony to their past common ancestry.
Despite their differences in morphology, as well as in virion and genome sizes, the comparative analysis of the prototype M. sibericum and of the new isolate, M. kamchatka, confirms their phylogenetic affinity with the pandoraviruses (Fig. 1 and 4). However, it remains unclear whether this is due to a truly ancestral relationship between them or whether it is only the consequence of numerous past gene exchanges favored by the use of the same cellular host. From the perspective of the DNA polymerase sequence alone, the two known molliviruses do cluster with the pandoraviruses, albeit at an evolutionary distance larger than that usually observed between members of the same virus family (Fig. 1). In the absence of an objective threshold and pending the characterization of possible missing links, we thus propose to classify M. sibericum and M. kamchatka as members of the proposed Molliviridae family, distinct from the Pandoraviridae.
MATERIALS AND METHODS
Virus isolation.
We isolated M. kamchatka from muddy grit collected near Kronotski Lake, Kamchatka, Russian Federation (54°32′59″N, 160°34′55″E). The sample was stored for 20 days in pure rice medium (36) at room temperature.
An aliquot of the pelleted sample triggered an infected phenotype on a culture of Acanthamoeba castellanii Neff (ATCC 30010) cells adapted to 2.5 μg/ml of amphotericin B (Fungizone), ampicillin (100 μg/ml), chloramphenicol (30 μg/ml), and kanamycin (25 μg/ml) in protease-peptone-yeast extract-glucose (PPYG) medium after 2 days of incubation at 32°C. A final volume of 6 ml of supernatant from two T25 flasks exhibiting infectious phenotypes was centrifuged for 1 h at 16,000 × g at room temperature. Two T75 flasks were seeded with 60,000 cells/cm² and infected with the resuspended viral pellet. Infected cells were cultured under the same conditions as described below. We confirmed the presence of viral particles by light microscopy.
Validation of the presence of M. kamchatka in the original sample.
To confirm the origin of the M. kamchatka isolate from the soil of the Kronotsky River bank, DNA was extracted from the sample and sequenced on an Illumina platform, leading to 340,320,265 paired-end reads (mean length, 150 bp). These metagenomic reads were then mapped onto the genome sequence of M. kamchatka. Seven matching (100% identity) paired-end reads (hence, 14 distinct reads) were detected, indicating the presence of virus particles in the original sample, although at very low concentration. Despite this small number, the very low probability of such matches occurring by chance (P < 10⁻⁶³), together with the scattered distribution of these matches along the viral genome, further demonstrates the presence of M. kamchatka in the original sample.
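The read counts above can be turned into the relative abundance quoted in the Discussion. The following is a minimal sketch, assuming that the abundance is simply the ratio of matching read pairs to total read pairs; the function name is ours and is not part of the original analysis.

```python
def parts_per_million(matching: int, total: int) -> float:
    """Return the share of matching units among all units, in parts per million."""
    if total <= 0:
        raise ValueError("total must be positive")
    return matching / total * 1_000_000


# Values reported in the paragraph above.
matching_pairs = 7
total_pairs = 340_320_265

ppm = parts_per_million(matching_pairs, total_pairs)
print(f"{ppm:.3f} ppm")
```

Under this assumption the result is about 0.02 parts per million, in line with the value given for M. kamchatka in the Discussion.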
Virus cloning.
Fresh A. castellanii cells were seeded on a 12-well culture plate at a final concentration of 70,000 cells/cm². Cell adherence was controlled under a light microscope after 45 min, and about 50 viral particles per host cell were added (multiplicity of infection [MOI] = 50). After 1 h, the well was washed 15 times with 3 ml of PPYG to remove any viral particles in suspension. The cells were then recovered by gently scraping the well, and a serial dilution was performed in the next three wells by mixing 200 μl of the previous well with 500 μl of fresh medium. Drops of 0.5 μl of the last dilution were recovered and observed by light microscopy to confirm the presence of a unique A. castellanii cell. The 0.5-μl droplets were then distributed in each well of three 24-well culture plates. One thousand uninfected A. castellanii cells in 500 μl of PPYG were added to the wells seeded with a single cell and incubated at 32°C until viral production from the unique clone became evident. The corresponding viral clones were recovered and amplified prior to purification, DNA extraction, and cell cycle characterization by electron microscopy.
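The serial dilution described above can be checked with a short calculation. This is an illustrative sketch only: it assumes three identical steps, each mixing 200 μl of the previous well with 500 μl of fresh medium, and the function name is hypothetical.

```python
from fractions import Fraction


def cumulative_dilution(transfer_ul: int, fresh_ul: int, steps: int) -> Fraction:
    """Fraction of the starting concentration left after `steps` identical dilutions."""
    per_step = Fraction(transfer_ul, transfer_ul + fresh_ul)
    return per_step ** steps


# 200 ul carried over into 500 ul of fresh medium, repeated in three wells.
remaining = cumulative_dilution(200, 500, 3)
fold = 1 / remaining
print(f"remaining fraction = {remaining}, about {float(fold):.1f}-fold dilution")
```

Each step keeps 2/7 of the previous concentration, so three steps correspond to a dilution of roughly 43-fold under these assumptions.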
Virus mass production and purification.
A total of 40 T75 flasks were seeded with fresh A. castellanii cells at a final concentration of 60,000 cells/cm². We controlled cell adherence using light microscopy after 45 min, and flasks were infected with a single clone of M. kamchatka at an MOI of 1. After 48 h of incubation at 32°C, we recovered cells exhibiting infectious phenotypes by gently scraping the flasks. We centrifuged for 10 min at 500 × g to remove any cellular debris, and viruses were pelleted by a 1-h centrifugation at 6,800 × g. The viral pellet was then layered on a discontinuous cesium chloride gradient (1.2 g/cm³, 1.3 g/cm³, 1.4 g/cm³, 1.5 g/cm³) and centrifuged for 20 h at 103,000 × g. The viral fraction produced a white disk, which was recovered, washed twice in phosphate-buffered saline (PBS), and stored at 4°C or −80°C with 7.5% dimethyl sulfoxide.
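The number of viral particles implied by a given seeding density and MOI can be estimated as follows. This is a rough sketch, assuming the nominal growth area of a T75 flask (75 cm²) and that every seeded cell adheres; neither assumption is stated in the protocol, and the function name is ours.

```python
def particles_for_moi(flasks: int, area_cm2: int, cells_per_cm2: int, moi: int) -> int:
    """Number of viral particles needed to reach the requested MOI."""
    total_cells = flasks * area_cm2 * cells_per_cm2
    return total_cells * moi


# 40 T75 flasks at 60,000 cells/cm2, infected at an MOI of 1.
T75_AREA_CM2 = 75  # nominal growth area, an assumption of this sketch
needed = particles_for_moi(flasks=40, area_cm2=T75_AREA_CM2, cells_per_cm2=60_000, moi=1)
print(f"{needed:,} particles")
```

With these figures the production run corresponds to 1.8 × 10⁸ cells, and therefore about the same number of infectious particles at an MOI of 1.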
Infectious cycle observations using TEM.
Twelve T25 flasks were seeded with a final concentration of 80,000 cells/cm² in PPYG medium containing antibiotics. In order to get a synchronous infectious cycle, 11 flasks were infected by freshly produced M. kamchatka at a substantial MOI of 40. The A. castellanii-seeded flasks were fixed by adding an equal volume of PBS buffer with 5% glutaraldehyde at different time points after the infection: 1 h p.i., 2 h p.i., 3 h p.i., 4 h p.i., 5 h p.i., 6 h p.i., 7 h p.i., 8 h p.i., 9 h p.i., 10 h p.i., and 25 h p.i. After 45 min of fixation at room temperature, the cells were scraped and pelleted for 5 min at 500 × g. Then, the cells were resuspended in 1 ml of PBS buffer with 2.5% glutaraldehyde and stored at 4°C. Each sample was coated in 1 mm³ of 2% low-melting-point agarose and embedded in Epon 812 resin. An optimized osmium-thiocarbohydrazide-osmium (OTO) protocol was used for staining the samples: 1 h of fixation in PBS with 2% osmium tetroxide and 1.5% potassium ferrocyanide, 20 min in water with 1% thiocarbohydrazide, 30 min in water with 2% osmium tetroxide, overnight incubation in water with 1% uranyl acetate, and finally, 30 min in lead aspartate. Dehydration was done using increasing concentrations of ethanol (50%, 75%, 85%, 95%, 100%) and cold dry acetone. Samples were progressively impregnated with an increasing mix of acetone and Epon 812 resin mixed with dodecenyl succinic anhydride (DDSA; 0.34, vol/vol) and nadic methyl anhydride (NMA; 0.68, vol/vol) (33%, 50%, 75%, and 100%). Final molding was done using a hard Epon 812 mix with DDSA (0.34, vol/vol), NMA (0.68, vol/vol), and 0.031 (vol/vol) tri(dimethyl amino methyl) phenol (DMP30) accelerator, and the mixture was hardened in an oven at 60°C for 5 days. Ultrathin sections (90 nm thick) were observed using an FEI Tecnai G2 operating at 200 kV.
DNA extraction.
M. kamchatka genomic DNA was extracted from approximately 5 × 10⁹ purified virus particles using a PureLink genomic extraction minikit according to the manufacturer’s recommendations. Lysis was performed in a buffer provided with the kit and extra dithiothreitol (DTT) at a final concentration of 1 mM.
Genome sequencing and assembly.
The genome of M. kamchatka was sequenced using both the Oxford Nanopore Technologies (ONT) and HiSeq 2500 platforms. The DNA sequencing (DNA-seq) paired-end protocol produced 4,515,973 and 4,577,450 reads, respectively, with an average quality score of 34.9 (84% of the reads > Q30). The sequencing of 1,026 ng using the ONT platform allowed us to retrieve 1,104,003 long reads (average size = 6,768 bp, N50 = 9,513). Reads were assembled using the SPAdes program (version SPAdes-3.12.0) (37) with a stringent k-mer parameter using various iteration steps (k = 21, 41, 61, 81, 99, 127) and the options --nanopore and --careful to minimize the number of mismatches in the final contig.
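For readers who want to reproduce a comparable hybrid assembly, the options named above can be assembled into a command line as sketched below. The file names and output directory are placeholders of ours, not those used in the study, and the snippet only builds the command without running it.

```python
import shlex
from typing import List, Sequence


def spades_command(
    reads_1: str,
    reads_2: str,
    nanopore_reads: str,
    out_dir: str,
    kmers: Sequence[int] = (21, 41, 61, 81, 99, 127),
) -> List[str]:
    """Build a SPAdes hybrid-assembly command with the k-mer series from the text."""
    for k in kmers:
        # SPAdes expects odd k-mer sizes below 128.
        if k % 2 == 0 or k > 127:
            raise ValueError(f"invalid k-mer size: {k}")
    return [
        "spades.py",
        "-1", reads_1,                 # forward Illumina reads
        "-2", reads_2,                 # reverse Illumina reads
        "--nanopore", nanopore_reads,  # long ONT reads
        "--careful",                   # mismatch-correction mode
        "-k", ",".join(str(k) for k in kmers),
        "-o", out_dir,
    ]


# Hypothetical input and output names, for illustration only.
cmd = spades_command("illumina_R1.fastq.gz", "illumina_R2.fastq.gz",
                     "ont_reads.fastq.gz", "mk_assembly")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell or passed to `subprocess.run(cmd, check=True)` on a machine where SPAdes is installed.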
Annotation of Mollivirus sibericum and Mollivirus kamchatka.
A stringent gene annotation of M. sibericum was performed as previously described (26) using transcriptome sequencing (RNA-seq) data (13). Stranded RNA-seq reads were used to accurately annotate protein-coding genes. Stringent gene annotation of M. kamchatka was performed without RNA-seq data, taking into account protein similarity with M. sibericum. Gene predictions were manually curated using the web-based genomic annotation editing platform Web Apollo (38). Functional annotations of protein-coding genes of both genomes were performed using a two-sided approach, as previously described (26). Briefly, protein domains were searched with the CD-search tool (39), and protein sequence searching based on the pairwise alignment of hidden Markov models (HMM) was performed against the Uniclust30 database using the HHblits tool (40). Gene clustering was done using OrthoFinder’s default parameters (41) with the additional -M msa -oa options. Strict orthology between pairs of proteins was confirmed using best reciprocal BLASTp matches.
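The best reciprocal match criterion can be illustrated with a small sketch. It assumes BLASTp results in the standard 12-column tabular format (bit score in the last column) and ranks hits by bit score; the protein identifiers and scores below are invented toy values, and the exact ranking criterion used in the study is not specified.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(rows: Iterable[str]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject in BLAST tabular rows."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[str], b_vs_a: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs in which each protein is the other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


def row(query: str, subject: str, bitscore: float) -> str:
    """Build a toy 12-column BLAST row; only the IDs and bit score matter here."""
    return "\t".join([query, subject, "90.0", "100", "10", "0",
                      "1", "100", "1", "100", "1e-50", str(bitscore)])


# Hypothetical identifiers and scores, for illustration only.
mk_vs_ms = [row("mkA", "msA", 500), row("mkA", "msB", 200),
            row("mkB", "msB", 300), row("mkC", "msB", 250)]
ms_vs_mk = [row("msA", "mkA", 500), row("msB", "mkB", 300), row("msB", "mkC", 250)]

pairs = reciprocal_best_hits(mk_vs_ms, ms_vs_mk)
print(pairs)
```

In this toy data set, mkC has msB as its best hit, but msB prefers mkB, so only two pairs satisfy the reciprocal condition.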
Selection pressure analysis.
Ratios of the rate of nonsynonymous mutations (dN) over the rate of synonymous mutations (dS) for pairs of orthologous genes were computed from the MAFFT global alignment (42) using the PAML package and codeml with model = 2 (43). A strict filter was applied to the dN/dS ratio: dN > 0, dS > 0, dS ≤ 2, and dN/dS ≤ 10. The computation of the codon adaptation index (CAI) of both molliviruses was performed using the CAI tool from the EMBOSS package (44).
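The filter on the dN/dS estimates can be written down explicitly. The sketch below applies the four thresholds given above to a handful of invented gene-pair estimates; the record type and gene labels are hypothetical and do not come from the codeml output format.

```python
from typing import List, NamedTuple


class PairEstimate(NamedTuple):
    gene: str
    dn: float  # nonsynonymous substitution rate
    ds: float  # synonymous substitution rate


def passes_filter(est: PairEstimate) -> bool:
    """Apply the thresholds dN > 0, dS > 0, dS <= 2, and dN/dS <= 10."""
    # The dS > 0 test comes before the division, so no division by zero occurs.
    return est.dn > 0 and est.ds > 0 and est.ds <= 2 and est.dn / est.ds <= 10


def filter_estimates(estimates: List[PairEstimate]) -> List[PairEstimate]:
    """Keep only the gene pairs whose estimates satisfy all four conditions."""
    return [est for est in estimates if passes_filter(est)]


# Toy values, for illustration only.
toy = [
    PairEstimate("g1", 0.02, 0.40),  # kept, omega = 0.05
    PairEstimate("g2", 0.00, 0.30),  # rejected, dN = 0
    PairEstimate("g3", 0.10, 2.50),  # rejected, dS > 2 (saturation)
    PairEstimate("g4", 0.50, 0.01),  # rejected, dN/dS > 10
    PairEstimate("g5", 0.01, 0.00),  # rejected, dS = 0
]

kept = filter_estimates(toy)
for est in kept:
    print(est.gene, round(est.dn / est.ds, 3))
```

Only pairs with informative, unsaturated substitution estimates remain, which is the purpose of such a filter.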
Metagenome sequencing, assembly, and annotation.
Total DNA was extracted from 0.25 and 0.26 g of soil sample using a PowerSoil DNA isolation kit (Qiagen) following the manufacturer’s protocol, except for the addition of 80 mM DTT to the lysis buffer (C1) to increase the particle lysis effectiveness. Two hundred fifty nanograms of purified DNA was sequenced on the Illumina HiSeq platform (French National Sequencing Center, Genoscope, Paris, France) using the DNA-seq paired-end protocol, producing a data set of 340,350,938 read pairs (2 × 150 bp in length) with an average quality score of 37.87 (90.45% > Q30). Raw read quality was evaluated with the FastQC program (45). Identified contaminants (primers and chimeric reads) were discarded. Valid reads were trimmed on the right end using 30 as the quality threshold with BBTools (46). All filtered reads were then mapped to the generated contigs using the Bowtie2 program (47) with the --very-sensitive option.
Availability of data.
The M. kamchatka annotated genome sequence is freely available to the public through the GenBank repository (https://www.ncbi.nlm.nih.gov/genbank/) under accession number MN812837.
ACKNOWLEDGMENTS
We are deeply indebted to our volunteer collaborator, Alexander Morawitz, for collecting the Kamchatka soil sample. We thank N. Brouilly, F. Richard, and A. Aouane (imagery platform, Institut de Biologie du Développement de Marseille Luminy) for their expert assistance and the PACA Bioinfo platform for computing support.
E. Christo-Foroux is the recipient of a DGA-MRIS scholarship (scholarship 201760003), and this project was funded by the French National Research Center (grant PRC1484-2018 to C. Abergel).
The funding bodies had no role in the design of the study, analysis and interpretation of data, and writing the manuscript.
We declare that we have no competing interests.
REFERENCES
- 1. La Scola B, Audic S, Robert C, Jungang L, de Lamballerie X, Drancourt M, Birtles R, Claverie JM, Raoult D. 2003. A giant virus in amoebae. Science 299:2033. doi: 10.1126/science.1081867.
- 2. Raoult D, Audic S, Robert C, Abergel C, Renesto P, Ogata H, La Scola B, Suzan M, Claverie JM. 2004. The 1.2-megabase genome sequence of mimivirus. Science 306:1344–1350. doi: 10.1126/science.1101485.
- 3. Claverie JM, Grzela R, Lartigue A, Bernadac A, Nitsche S, Vacelet J, Ogata H, Abergel C. 2009. Mimivirus and Mimiviridae: giant viruses with an increasing number of potential hosts, including corals and sponges. J Invertebr Pathol 101:172–180. doi: 10.1016/j.jip.2009.03.011.
- 4. Abergel C, Legendre M, Claverie JM. 2015. The rapidly expanding universe of giant viruses: Mimivirus, Pandoravirus, Pithovirus and Mollivirus. FEMS Microbiol Rev 39:779–796. doi: 10.1093/femsre/fuv037.
- 5. Lefkowitz EJ, Dempsey DM, Hendrickson RC, Orton RJ, Siddell SG, Smith DB. 2018. Virus taxonomy: the database of the International Committee on Taxonomy of Viruses (ICTV). Nucleic Acids Res 46:D708–D717. doi: 10.1093/nar/gkx932.
- 6. Claverie JM, Abergel C. 2018. Mimiviridae: an expanding family of highly diverse large dsDNA viruses infecting a wide phylogenetic range of aquatic eukaryotes. Viruses 10:506. doi: 10.3390/v10090506.
- 7. Colson P, Pagnier I, Yoosuf N, Fournous G, La Scola B, Raoult D. 2013. “Marseilleviridae,” a new family of giant viruses infecting amoebae. Arch Virol 158:915–920. doi: 10.1007/s00705-012-1537-y.
- 8. Philippe N, Legendre M, Doutre G, Couté Y, Poirot O, Lescot M, Arslan D, Seltzer V, Bertaux L, Bruley C, Garin J, Claverie JM, Abergel C. 2013. Pandoraviruses: amoeba viruses with genomes up to 2.5 Mb reaching that of parasitic eukaryotes. Science 341:281–286. doi: 10.1126/science.1239181.
- 9. Poirot O, Jeudy S, Abergel C, Claverie JM. 2019. A puzzling anomaly in the 4-mer composition of the giant pandoravirus genomes reveals a stringent new evolutionary selection process. J Virol 93:e01206-19. doi: 10.1128/JVI.01206-19.
- 10. Legendre M, Bartoli J, Shmakova L, Jeudy S, Labadie K, Adrait A, Lescot M, Poirot O, Bertaux L, Bruley C, Couté Y, Rivkina E, Abergel C, Claverie JM. 2014. Thirty-thousand-year-old distant relative of giant icosahedral DNA viruses with a pandoravirus morphology. Proc Natl Acad Sci U S A 111:4274–4279. doi: 10.1073/pnas.1320670111.
- 11. Bertelli C, Mueller L, Thomas V, Pillonel T, Jacquier N, Greub G. 2017. Cedratvirus lausannensis—digging into Pithoviridae diversity. Environ Microbiol 19:4022–4034. doi: 10.1111/1462-2920.13813.
- 12. Reteno DG, Benamar S, Khalil JB, Andreani J, Armstrong N, Klose T, Rossmann M, Colson P, Raoult D, La Scola B. 2015. Faustovirus, an asfarvirus-related new lineage of giant viruses infecting amoebae. J Virol 89:6585–6594. doi: 10.1128/JVI.00115-15.
- 13. Legendre M, Lartigue A, Bertaux L, Jeudy S, Bartoli J, Lescot M, Alempic JM, Ramus C, Bruley C, Labadie K, Shmakova L, Rivkina E, Couté Y, Abergel C, Claverie JM. 2015. In-depth study of Mollivirus sibericum, a new 30,000-y-old giant virus infecting Acanthamoeba. Proc Natl Acad Sci U S A 112:E5327–E5335. doi: 10.1073/pnas.1510795112.
- 14. Yoshikawa G, Blanc-Mathieu R, Song C, Kayama Y, Mochizuki T, Murata K, Ogata H, Takemura M. 2019. Medusavirus, a novel large DNA virus discovered from hot spring water. J Virol 93:e02130-18. doi: 10.1128/JVI.02130-18.
- 15. Guglielmini J, Woo AC, Krupovic M, Forterre P, Gaia M. 2019. Diversification of giant and large eukaryotic dsDNA viruses predated the origin of modern eukaryotes. Proc Natl Acad Sci U S A 116:19585–19592. doi: 10.1073/pnas.1912006116.
- 16. Nasir A, Caetano-Anollés G. 2015. A phylogenomic data-driven exploration of viral origins and evolution. Sci Adv 1:e1500527. doi: 10.1126/sciadv.1500527.
- 17. Koonin EV, Yutin N. 2019. Evolution of the large nucleocytoplasmic DNA viruses of eukaryotes and convergent origins of viral gigantism. Adv Virus Res 103:167–202. doi: 10.1016/bs.aivir.2018.09.002.
- 18. Shi T, Reeves RH, Gilichinsky DA, Friedmann EI. 1997. Characterization of viable bacteria from Siberian permafrost by 16S rDNA sequencing. Microb Ecol 33:169–179. doi: 10.1007/s002489900019.
- 19. Vishnivetskaya T, Kathariou S, McGrath J, Gilichinsky D, Tiedje JM. 2000. Low-temperature recovery strategies for the isolation of bacteria from ancient permafrost sediments. Extremophiles 4:165–173. doi: 10.1007/s007920070031.
- 20. Graham DE, Wallenstein MD, Vishnivetskaya TA, Waldrop MP, Phelps TJ, Pfiffner SM, Onstott TC, Whyte LG, Rivkina EM, Gilichinsky DA, Elias DA, Mackelprang R, VerBerkmoes NC, Hettich RL, Wagner D, Wullschleger SD, Jansson JK. 2012. Microbes in thawing permafrost: the unknown variable in the climate change equation. ISME J 6:709–712. doi: 10.1038/ismej.2011.163.
- 21. Yashina S, Gubin S, Maksimovich S, Yashina A, Gakhova E, Gilichinsky D. 2012. Regeneration of whole fertile plants from 30,000-y-old fruit tissue buried in Siberian permafrost. Proc Natl Acad Sci U S A 109:4008–4013. doi: 10.1073/pnas.1118386109.
- 22. Levasseur A, Andreani J, Delerce J, Bou Khalil J, Robert C, La Scola B, Raoult D. 2016. Comparison of a modern and fossil pithovirus reveals its genetic conservation and evolution. Genome Biol Evol 8:2333–2339. doi: 10.1093/gbe/evw153.
- 23. Andreani J, Aherfi S, Bou Khalil JY, Di Pinto F, Bitam I, Raoult D, Colson P, La Scola B. 2016. Cedratvirus, a double-cork structured giant virus, is a distant relative of pithoviruses. Viruses 8:300. doi: 10.3390/v8110300.
- 24. Andreani J, Khalil JYB, Baptiste E, Hasni I, Michelle C, Raoult D, Levasseur A, La Scola B. 2017. Orpheovirus IHUMI-LCC2: a new virus among the giant viruses. Front Microbiol 8:2643. doi: 10.3389/fmicb.2017.02643.
- 25. Quemin ER, Corroyer-Dulmont S, Baskaran A, Penard E, Gazi AD, Christo-Foroux E, Walther P, Abergel C, Krijnse-Locker J. 2019. Complex membrane remodeling during virion assembly of the 30,000-year-old Mollivirus sibericum. J Virol 93:e00388-19. doi: 10.1128/JVI.00388-19.
- 26. Legendre M, Fabre E, Poirot O, Jeudy S, Lartigue A, Alempic JM, Beucher L, Philippe N, Bertaux L, Christo-Foroux E, Labadie K, Couté Y, Abergel C, Claverie JM. 2018. Diversity and evolution of the emerging Pandoraviridae family. Nat Commun 9:2285. doi: 10.1038/s41467-018-04698-4.
- 27. Legendre M, Alempic JM, Philippe N, Lartigue A, Jeudy S, Poirot O, Ta NT, Nin S, Couté Y, Abergel C, Claverie JM. 2019. Pandoravirus celtis illustrates the microevolution processes at work in the giant Pandoraviridae genomes. Front Microbiol 10:430. doi: 10.3389/fmicb.2019.00430.
- 28. NCBI Resource Coordinators. 2018. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 46:D8–D13. doi: 10.1093/nar/gkx1095.
- 29. Heidel AJ, Lawal HM, Felder M, Schilde C, Helps NR, Tunggal B, Rivero F, John U, Schleicher M, Eichinger L, Platzer M, Noegel AA, Schaap P, Glöckner G. 2011. Phylogeny-wide analysis of social amoeba genomes highlights ancient origins for complex intercellular communication. Genome Res 21:1882–1891. doi: 10.1101/gr.121137.111.
- 30. Thomas V, Bertelli C, Collyn F, Casson N, Telenti A, Goesmann A, Croxatto A, Greub G. 2011. Lausannevirus, a giant amoebal virus encoding histone doublets. Environ Microbiol 13:1454–1466. doi: 10.1111/j.1462-2920.2011.02446.x.
- 31. Maumus F, Blanc G. 2016. Study of gene trafficking between acanthamoeba and giant viruses suggests an undiscovered family of amoeba-infecting viruses. Genome Biol Evol 8:3351–3363. doi: 10.1093/gbe/evw260.
- 32. Chelkha N, Levasseur A, Pontarotti P, Raoult D, Scola B, Colson P. 2018. A phylogenomic study of Acanthamoeba polyphaga draft genome sequences suggests genetic exchanges with giant viruses. Front Microbiol 9:2098. doi: 10.3389/fmicb.2018.02098.
- 33. Duchêne S, Holmes EC. 2018. Estimating evolutionary rates in giant viruses using ancient genomes. Virus Evol 4:vey006. doi: 10.1093/ve/vey006.
- 34. Hughes AL, Irausquin S, Friedman R. 2010. The evolutionary biology of poxviruses. Infect Genet Evol 10:50–59. doi: 10.1016/j.meegid.2009.10.001.
- 35. San Martín C, van Raaij MJ. 2018. The so far farthest reaches of the double jelly roll capsid protein fold. Virol J 15:181. doi: 10.1186/s12985-018-1097-1.
- 36. Arslan D, Legendre M, Seltzer V, Abergel C, Claverie JM. 2011. Distant Mimivirus relative with a larger genome highlights the fundamental features of Megaviridae. Proc Natl Acad Sci U S A 108:17486–17491. doi: 10.1073/pnas.1110889108.
- 37. Bankevich A, Nurk S, Antipov D, Gurevich A, Dvorkin M, Kulikov AS, Lesin V, Nikolenko S, Pham S, Prjibelski A, Pyshkin A, Sirotkin A, Vyahhi N, Tesler G, Alekseyev MA, Pevzner P. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. doi: 10.1089/cmb.2012.0021.
- 38. Dunn NA, Unni DR, Diesh C, Munoz-Torres M, Harris NL, Yao E, Rasche H, Holmes I, Elsik C, Lewis S. 2019. Apollo: democratizing genome annotation. PLoS Comput Biol 15:e1006790. doi: 10.1371/journal.pcbi.1006790.
- 39. Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, Geer LY, Geer RC, He J, Gwadz M, Hurwitz DI, Lanczycki CJ, Lu F, Marchler GH, Song JS, Thanki N, Wang Z, Yamashita RA, Zhang D, Zheng C, Bryant SH. 2015. CDD: NCBI’s conserved domain database. Nucleic Acids Res 43:D222–D226. doi: 10.1093/nar/gku1221.
- 40. Remmert M, Biegert A, Hauser A, Söding J. 2011. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 9:173–175. doi: 10.1038/nmeth.1818.
- 41. Emms DM, Kelly S. 2015. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol 16:157. doi: 10.1186/s13059-015-0721-2.
- 42. Katoh K, Misawa K, Kuma K-I, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30:3059–3066. doi: 10.1093/nar/gkf436.
- 43. Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:1586–1591. doi: 10.1093/molbev/msm088.
- 44. Rice P, Longden I, Bleasby A. 2000. EMBOSS: the European molecular biology open software suite. Trends Genet 16:276–277. doi: 10.1016/S0168-9525(00)02024-2.
- 45. Andrews S. 2010. FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
- 46. Bushnell B, Rood J, Singer E. 2017. BBMerge—accurate paired shotgun read merging via overlap. PLoS One 12:e0185056. doi: 10.1371/journal.pone.0185056.
- 47. Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359. doi: 10.1038/nmeth.1923.
- 48. Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum likelihood phylogenies. Mol Biol Evol 32:268–274. doi: 10.1093/molbev/msu300.
- 49. Wickham H. 2016. ggplot2: elegant graphics for data analysis, p 33–74. Springer-Verlag, New York, NY.