Journal of Virology. 2013 Apr;87(8):4225–4236. doi: 10.1128/JVI.03398-12

Diversity of Virophages in Metagenomic Data Sets

Jinglie Zhou a,b,c, Weijia Zhang b, Shuling Yan d, Jinzhou Xiao a,b, Yuanyuan Zhang b,c, Bailin Li a,b, Yingjie Pan a,b, Yongjie Wang a,b
PMCID: PMC3624350  PMID: 23408616

Abstract

Virophages, e.g., Sputnik, Mavirus, and Organic Lake virophage (OLV), are unusual parasites of giant double-stranded DNA (dsDNA) viruses, yet little is known about their diversity. Here, we describe the global distribution, abundance, and genetic diversity of virophages based on analyzing and mapping comprehensive metagenomic databases. The results reveal a distinct abundance and worldwide distribution of virophages, involving almost all geographical zones and a variety of unique environments. These environments ranged from deep ocean to inland, iced to hydrothermal lakes, and human gut- to animal-associated habitats. Four complete virophage genomic sequences (Yellowstone Lake virophages [YSLVs]) were obtained, as was one nearly complete sequence (Ace Lake Mavirus [ALM]). The genomes obtained were 27,849 bp long with 26 predicted open reading frames (ORFs) (YSLV1), 23,184 bp with 21 ORFs (YSLV2), 27,050 bp with 23 ORFs (YSLV3), 28,306 bp with 34 ORFs (YSLV4), and 17,767 bp with 22 ORFs (ALM). The homologous counterparts of five genes, including putative FtsK-HerA family DNA packaging ATPase and genes encoding DNA helicase/primase, cysteine protease, major capsid protein (MCP), and minor capsid protein (mCP), were present in all virophages studied thus far. They also shared a conserved gene cluster comprising the two core genes of MCP and mCP. Comparative genomic and phylogenetic analyses showed that YSLVs, having a closer relationship to each other than to the other virophages, were more closely related to OLV than to Sputnik but distantly related to Mavirus and ALM. These findings indicate that virophages appear to be widespread and genetically diverse, with at least 3 major lineages.

INTRODUCTION

Virophages, a group of circular double-stranded DNA (dsDNA) viruses, are icosahedral in shape and approximately 50 to 100 nm in size (1–4). Virophages have three unique features (2). First, the nuclear phase is absent during the infection cycle of virophages. Second, the replication of virophages takes place in a viral factory of the giant host DNA viruses. Third, they depend on enzymes from host viruses instead of host cells. Accordingly, virophages are considered to be parasites of giant DNA viruses, e.g., mimiviruses and phycodnaviruses (1–3). Giant DNA viruses possess huge genomes (up to ≈1,259 kb), some of which are even larger than those of certain bacteria (5–7). The infection and propagation of virophages lead to a significant decrease in host virus particles and, consequently, an increase in host cell survival (1–3). Additionally, exchanges of genes may occur between virophages and giant DNA viruses (1–3, 8, 9). Therefore, virophages are potential mediators of lateral gene transfer between large DNA viruses (8, 9).

Thus far, four virophages have been identified in distinct locations (Table 1). The first reported virophage, Sputnik, was isolated from an Acanthamoeba species infected with the large mamavirus in a water-cooling tower in Paris, France (2). The second virophage, Mavirus, was observed in a marine phagotrophic flagellate (Cafeteria roenbergensis) in the presence of the host virus, Cafeteria roenbergensis virus, originating from the coastal waters of Texas (1, 10). The third virophage, Organic Lake virophage (OLV), discovered in a hypersaline meromictic lake in Antarctica, is thought to parasitize large DNA viruses infecting microalgae (3, 11). At the time of this report, a fourth virophage, Sputnik 2, together with its host virus, Lentille, was detected in the contact lens solution of a patient with keratitis in France (12). The fact that virophages occur in a wide range of viral and eukaryotic hosts, as well as in a variety of unique habitats, suggests that they are more widely distributed and diverse than previously thought.

Table 1.

Features of virophages

Virophage | Location | Host virus | Host eukaryote | Genome size (bp) | No. of ORFs | G+C content (%)
Sputnik | A cooling tower in Paris, France | Acanthamoeba polyphaga mimivirus | A. polyphaga | 18,343 | 21 | 27.0
Mavirus | Coastal waters of Texas | Cafeteria roenbergensis virus | Marine phagotrophic flagellate (C. roenbergensis) | 19,063 | 20 | 30.3
OLV | Organic Lake, a hypersaline meromictic lake in Antarctica | Large DNA viruses | Prasinophytes (phototrophic algae) | 26,421 | 26 | 39.1
Sputnik 2 | Contact lens fluid of a patient with keratitis, France | Lentille virus | A. polyphaga | 18,338 | 20 | 28.5
YSLV1 | Yellowstone Lake | Phycodna- or mimiviruses? | Microalgae? | 27,849 | 26 | 33.4
YSLV2 | Yellowstone Lake | Phycodna- or mimiviruses? | Microalgae? | 23,184 | 21 | 33.6
YSLV3 | Yellowstone Lake | Phycodna- or mimiviruses? | Microalgae? | 27,050 | 23 | 34.9
YSLV4 | Yellowstone Lake | Phycodna- or mimiviruses? | Microalgae? | 28,306 | 34 | 37.2
ALM | Ace Lake in Antarctica | Mimiviruses? | Phagotrophic protozoan? | 17,767 | 22 | 26.7

To gain greater insight into the global distribution, abundance, and diversity of virophages, we comprehensively analyzed the metagenomic databases on the Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA) 2.0 Portal (https://portal.camera.calit2.net/) (13). Four complete genomic sequences of virophages and one nearly complete sequence were assembled from the metagenomic DNA sequences of Yellowstone Lake, Wyoming, and Ace Lake, Antarctica. Comparative genomic and phylogenetic analyses were performed to better understand the genomic sequence features, phylogeny, and evolution of virophages.

MATERIALS AND METHODS

Analysis of metagenomic databases.

The gene sequences of the three known virophages, Sputnik, Mavirus, and OLV, were downloaded from the NCBI genome database and searched with BLAST against the NCBI nr database. Sputnik 2 was excluded from the analysis because it is a new strain of Sputnik. The genomic sequence of another strain, Sputnik 3, was also available in GenBank; however, because Sputnik 2 and Sputnik 3 have the same sequence, Sputnik 3 was likewise excluded. Genes showing blastp hits to virophages only, or no hits (E-value < 10⁻⁵), were considered virophage-specific marker genes and were used to evaluate the global distribution and abundance of virophages. These genes were searched (tblastx, E-value < 10⁻⁵) against the databases of all metagenomic pyrosequencing reads and all Sanger reads on the CAMERA 2.0 Portal. The screened virophage-related sequences were further confirmed by a blast similarity search against the NCBI nr database. The global distribution pattern of virophages was mapped using MapInfo Professional (version 11.0; Pitney Bowes Software, Inc.). The abundance of virophages is presented as the ratio of the number of virophage-like sequences in a given metagenomic data set to the total number of sequences in that data set, normalized to 1,000,000.
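The marker-gene rule and the abundance normalization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `BlastHit` record, the function names, and the read counts used in the usage note are hypothetical, and only the E-value cutoff and the per-1,000,000 scaling come from the text.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-5  # significance threshold stated in the text


@dataclass
class BlastHit:
    """Minimal, hypothetical record for one blastp hit of a query gene."""
    subject_is_virophage: bool
    e_value: float


def is_marker_gene(hits):
    """Return True if the gene qualifies as a virophage-specific marker.

    A gene qualifies when all of its significant hits are to virophages,
    which also covers the case of no significant hits at all.
    """
    significant = [h for h in hits if h.e_value < E_VALUE_CUTOFF]
    return all(h.subject_is_virophage for h in significant)


def normalized_abundance(virophage_like_reads, total_reads, scale=1_000_000):
    """Virophage-like sequences per `scale` sequences in one data set."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return virophage_like_reads / total_reads * scale
```

For example, a hypothetical data set with 5 virophage-like reads among 2,000,000 reads would have `normalized_abundance(5, 2_000_000)` equal to 2.5 per million.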

Analysis of virophage conserved genes.

All gene sequences of the virophages Sputnik, Mavirus, and OLV were compared to the NCBI nr database using both blastp and PSI-BLAST searches (14, 15). Homologous genes shared among these three virophages were considered to be conserved. Their sequence similarities were also verified by multiple sequence alignment using MUSCLE (16) in Geneious Pro (version 5.5.7; Biomatters Ltd.).

Assembling of genomic sequences of new virophages.

The major capsid protein (MCP) homologs MV18 (Mavirus), V20 (Sputnik), and OLV09 (OLV) were searched (tblastx, E-value < 10⁻⁵) against all metagenomic pyrosequencing read databases and all Sanger read databases on the CAMERA 2.0 Portal. Sequences significantly similar to these three MCPs were screened, downloaded, and treated as virophage MCP-related sequences. Subsequently, they were assembled to obtain MCP-related contigs. Each contig served as a reference sequence to which all reads from the corresponding metagenomic database were assembled. Once a longer extended sequence with higher coverage was obtained after assembly, it was used as the next reference for assembling all reads from the metagenomic database. This procedure was repeated until the assembled sequence stopped extending. If the sequence obtained had a repeat region of approximately 100 bp at both ends, it was self-assembled into a circular DNA sequence. All sequence assemblies were performed using Geneious Pro. The sequence assembly parameters used in this study were a minimum overlap of 25 bp with >90% sequence identity and a maximum of 50% mismatches per read.
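The iterative extension procedure can be illustrated with a toy Python sketch. It is not the Geneious Pro assembler and makes several simplifying assumptions: reads are plain strings, extension proceeds only to the right, overlaps are ungapped, each read is used at most once, and the terminal repeat is detected by exact match. The function names are hypothetical; only the 25-bp minimum overlap and the >90% identity threshold are taken from the text, and the 50% mismatch limit is not modeled.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def extend_right(contig, read, min_overlap=25, min_identity=0.90):
    """Extend contig with read if a contig suffix matches a read prefix.

    Tries the longest overlap first; the read must add at least one base.
    Returns the extended contig, or None if no acceptable overlap exists.
    """
    longest = min(len(contig), len(read) - 1)
    for k in range(longest, min_overlap - 1, -1):
        if identity(contig[-k:], read[:k]) > min_identity:
            return contig + read[k:]
    return None


def iterative_extend(seed, reads, min_overlap=25, min_identity=0.90):
    """Repeat extension passes until the contig stops growing."""
    contig = seed
    unused = list(reads)
    grew = True
    while grew:
        grew = False
        remaining = []
        for read in unused:
            longer = extend_right(contig, read, min_overlap, min_identity)
            if longer is None:
                remaining.append(read)   # may still fit in a later pass
            else:
                contig = longer          # each read is used at most once
                grew = True
        unused = remaining
    return contig


def circularize(contig, min_repeat=25):
    """Trim an exact terminal repeat; return the circular unit or None."""
    for r in range(len(contig) // 2, min_repeat - 1, -1):
        if contig[:r] == contig[-r:]:
            return contig[:-r]
    return None
```

Using each read only once guarantees that the loop terminates even when the target is circular, in which case the contig ends up carrying the same sequence at both ends and `circularize` removes one copy.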

Prediction and annotation of ORFs.

The prediction and annotation of virophage open reading frames (ORFs) followed the procedures described in the literature (17, 18). Each predicted ORF began with an ATG start codon, ended with a stop codon, was at least 135 bp long, and was translated with the standard genetic code. The blastp, tblastx, and PSI-BLAST programs were used for sequence similarity comparisons of the predicted ORFs to the NCBI nr database (14, 15). A local database that contained the translated protein sequences of all predicted ORFs in Sputnik, Mavirus, and OLV, as well as the five new virophages described in this study, was also included in the blast search. ORFs were searched for characteristic sequence signatures using the InterProScan program (19).
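The ORF criteria stated above (ATG start, stop codon from the standard genetic code, minimum size of 135 bp) can be expressed as a short six-frame scan. The following Python sketch is an illustration only and is not the software used in the study. It assumes a linear sequence (ORFs spanning the origin of a circular genome are ignored), counts the stop codon toward the ORF length, takes the first ATG after the previous stop in each frame, and reports reverse-strand ORFs with start greater than end; all function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _scan_strand(seq, min_len):
    """Return 0-based half-open (start, end) ORF spans on one strand."""
    spans = []
    n = len(seq)
    for frame in range(3):
        start = None
        for i in range(frame, n - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                    # first ATG after the last stop
            elif start is not None and codon in STOP_CODONS:
                end = i + 3                  # span includes the stop codon
                if end - start >= min_len:
                    spans.append((start, end))
                start = None
    return spans


def find_orfs(seq, min_len=135):
    """Six-frame ORF scan with 1-based inclusive coordinates.

    Forward-strand ORFs have start < end; reverse-strand ORFs are
    reported on forward-strand coordinates with start > end.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = [(s + 1, e) for s, e in _scan_strand(seq, min_len)]
    orfs += [(n - s, n - e + 1)
             for s, e in _scan_strand(reverse_complement(seq), min_len)]
    return sorted(orfs, key=min)
```

With these conventions, a 135-bp ORF corresponds to 44 amino acids plus the stop codon.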

Phylogenetic analysis.

Amino acid sequences were aligned using MUSCLE (16), and the phylogenetic trees were reconstructed by using PhyML (version 3.0, Méthodes et Algorithmes pour la Bioinformatique, LIRMM, CNRS—Université de Montpellier; http://www.atgc-montpellier.fr/phyml/) (20).

Nucleotide sequence accession numbers.

The genomic sequences of the four Yellowstone Lake virophages (YSLVs) and Ace Lake Mavirus (ALM) have been deposited in GenBank under the accession numbers KC556924 (YSLV1), KC556925 (YSLV2), KC556926 (YSLV3), KC556922 (YSLV4), and KC556923 (ALM).

RESULTS AND DISCUSSION

Diversity of global distribution and abundance of virophages.

The blast similarity search (E-value < 10⁻⁵) identified a total of 44 ORFs as virophage-specific marker genes: 16 ORFs of Sputnik, 13 of Mavirus, and 15 of OLV (Table 2). These genes were used as query sequences and searched against all metagenomic data deposited in the CAMERA database. The CAMERA database is a web-based analysis portal that allows for depositing, locating, analyzing, visualizing, and sharing microbial data obtained from various environments, such as marine, soil, freshwater, wastewater, hot springs, animal hosts, and other habitats (13). Therefore, the general trends in the global distribution and abundance of virophages can be inferred from the virophage-related blast hits returned by the CAMERA 2.0 Portal. The search found 1,766 pyrosequencing reads and 204 Sanger reads related to Sputnik, 203 pyrosequencing reads and 253 Sanger reads related to Mavirus, and more than 50,000 pyrosequencing and Sanger reads similar to OLV (see Table S1 in the supplemental material). Redundant reads were merged and removed, leaving 23,599 virophage-related sequences. Among them, 148 were Mavirus hits, 812 were Sputnik hits, and 22,639 were OLV hits; the OLV hits accounted for 95% of the total sequences associated with virophages (23,599). OLV and its relatives therefore appeared to be more abundant in the environment than Sputnik and Mavirus.

Table 2.

Virophage-specific genes

Virophage Genes
Sputnik V01, V02, V03, V04, V05, V07, V08, V09, V14, V15, V16, V17, V18, V19, V20, V21
Mavirus MV04, MV05, MV07, MV08, MV09, MV10, MV11, MV12, MV14, MV15, MV16, MV17, MV18
OLV OLV01, OLV02, OLV03, OLV04, OLV05, OLV06, OLV07, OLV08, OLV09, OLV10, OLV11, OLV15, OLV21, OLV24, OLV26

As depicted in Fig. 1, virophages were distributed widely throughout the world, across almost all geographical zones. Their habitats also spanned a variety of environments, ranging from the deep ocean to inland sites (Fig. 2). The abundance of virophages tended to increase from ocean to land environments, was highest in freshwater habitats, and was relatively greater in ocean sediment than in deep seawater (Fig. 2A). As for vertical distribution, virophage abundance generally decreased with increasing ocean depth (Fig. 2B). The epipelagic zone seemed to be enriched with virophages, probably because this illuminated surface zone harbors most of the living organisms in the sea. Interestingly, although conditions in the abyssopelagic and mesopelagic zones differ greatly, the numbers of virophage-related sequences observed in these two zones were quite similar (Fig. 2B). Whether virophages are truly enriched in the abyssopelagic zone, or whether this reflects virophage-infected host viruses and/or host cells settling to the deep sea, remains to be studied. In terms of geographical zones, the frigid zones had the greatest abundance of virophages, followed by the tropical zones (Fig. 2C). The data deposited in CAMERA have obvious limitations and biases, so caution should be taken when interpreting the global distribution and abundance of virophages. Nevertheless, these findings open a new window for further exploration and survey of virophage diversity worldwide.

Fig 1.


Geographic distribution and corresponding abundance of virophages. Colored dots indicate distinct abundances of virophages in metagenomic data sets obtained from a specific area of latitude and longitude (see Table S1 in the supplemental material). Abundance was normalized to 1,000,000.

Fig 2.


Abundance of virophages in different environments (A), ocean depths (B), and latitudes (C). Abundance was normalized to 1,000,000.

In addition, a small number of virophage-related sequences were unexpectedly detected in nonaquatic environments, e.g., 65 sequences from the human gut, 11 from animal-associated habitats, 7 from soils, 4 from glacier metagenomes, and 1 from air on the East Coast of Singapore. Thus far, little is known about such unusual diversity (21). Taken together, comparative analyses of metagenomic databases revealed the global distribution and distinct abundance of virophage-related sequences, suggesting that virophages are common entities on Earth. Large-scale sampling and analyses are necessary to obtain a complete picture of the diversity of virophages.

Four complete genomes of Yellowstone Lake virophages and one nearly complete genome of Ace Lake Mavirus.

The major capsid protein is generally considered to be conserved among viruses and is widely used to reconstruct phylogenetic trees. Blast sequence similarity searches and sequence alignment showed that it is also conserved in virophages. In our study, four complete virophage genomes and one nearly complete virophage genome were obtained from two metagenomic data sets downloaded from the CAMERA 2.0 Portal: "Yellowstone Lake: Genetic and Gene Diversity in a Freshwater Lake" and "Antarctica Aquatic Microbial Metagenome." These virophages were tentatively named YSLV1, YSLV2, YSLV3, YSLV4, and ALM. Detailed results of the metagenome assembly, i.e., genome coverage, the number of reads recruited to each genome, and the size of the data sets from which the metagenomes originated, are shown in Table 3; see also Figures S1 and S2 in the supplemental material.

Table 3.

Data on metagenomic assemblies of the five new virophages

Name | No. of reads recruited to each genome | No. of identical sites | Pairwise identity (%) | Genome coverage (mean) | Genome coverage (minimum) | Genome coverage (maximum) | Size of data set (Gb)
YSLV1 | 5,544 | 22,271 | 98.0 | 67.9 | 16 | 127 | 11.1
YSLV2 | 834 | 21,453 | 97.7 | 13.1 | 3 | 27 | 11.1
YSLV3 | 1,098 | 25,529 | 98.2 | 15.1 | 3 | 35 | 11.1
YSLV4 | 1,119 | 25,732 | 97.2 | 14.5 | 4 | 32 | 11.1
ALM | 494 | 13,654 | 95.4 | 14.4 | 4 | 26 | 32.4
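As a rough consistency check on Table 3, mean coverage relates to the number of recruited reads by coverage ≈ (reads × mean read length) / genome length. The Python sketch below rearranges this to estimate the mean aligned read length implied by each row. It is a back-of-the-envelope illustration that assumes every recruited read aligns over its full length; the read counts and mean coverages are from Table 3, the genome sizes are from Table 1, and the names `ASSEMBLIES` and `implied_read_length` are hypothetical.

```python
# Values transcribed from Table 3 (reads, mean coverage) and Table 1 (genome size).
ASSEMBLIES = {
    "YSLV1": {"reads": 5544, "genome_bp": 27849, "mean_coverage": 67.9},
    "YSLV2": {"reads": 834, "genome_bp": 23184, "mean_coverage": 13.1},
    "YSLV3": {"reads": 1098, "genome_bp": 27050, "mean_coverage": 15.1},
    "YSLV4": {"reads": 1119, "genome_bp": 28306, "mean_coverage": 14.5},
    "ALM": {"reads": 494, "genome_bp": 17767, "mean_coverage": 14.4},
}


def implied_read_length(reads, genome_bp, mean_coverage):
    """Mean aligned read length implied by coverage = reads * length / genome."""
    if reads <= 0:
        raise ValueError("reads must be positive")
    return mean_coverage * genome_bp / reads


if __name__ == "__main__":
    for name, row in ASSEMBLIES.items():
        length = implied_read_length(**row)
        print(f"{name}: ~{length:.0f} bp per recruited read")
```

Implied lengths of a few hundred base pairs are in the range expected for pyrosequencing and Sanger reads, which suggests that the read counts and coverage values in the table are mutually consistent.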

They were all dsDNA viruses, with G+C contents of 33.4% (YSLV1), 33.6% (YSLV2), 34.9% (YSLV3), 37.2% (YSLV4), and 26.7% (ALM) (Table 1). Their genomes were 27,849 bp in length with 26 predicted ORFs (YSLV1), 23,184 bp with 21 predicted ORFs (YSLV2), 27,050 bp with 23 predicted ORFs (YSLV3), 28,306 bp with 34 predicted ORFs (YSLV4), and 17,767 bp with 22 predicted ORFs (ALM) (Table 1 and Fig. 3 and 4). The YSLVs and OLV were generally alike in genome size, number of ORFs, and G+C content (Table 1).
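G+C content, as reported above and plotted in the inner circles of the genome maps, is the percentage of G and C bases among the unambiguous bases of a sequence. A minimal Python sketch follows; the function names, the window and step sizes, and the decision to skip ambiguous bases such as N are assumptions made for illustration, not details taken from the study.

```python
def gc_content(seq):
    """Percentage of G and C among the A/C/G/T bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(b in "GC" for b in bases)
    return 100.0 * gc / len(bases)


def gc_windows(seq, window=500, step=250):
    """G+C content in sliding windows, as (0-based start, percent) pairs."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, max(len(seq) - window, 0) + 1, step)
    ]
```

Applied to a complete genome sequence, `gc_content` gives the single genome-wide value, whereas `gc_windows` yields the kind of positional profile shown in a G+C plot.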

Fig 3.


Circular maps of the complete genomes of Yellowstone Lake virophages. Homologous genes are indicated in the same color, the five conserved genes are labeled with red asterisks, and the inner circles represent G+C content plots. The dashed-line boxes represent the conserved gene cluster in all eight virophages, the dotted-line boxes represent the gene cluster shared by YSLVs 2, 3, and 4 and OLV, and the dash-dot-dot–line boxes represent the gene cluster present in YSLVs 3 and 4.

Fig 4.


Linear genomic map of ALM and Mavirus. Homologous genes are shown in the same color, while syntenic regions are presented in green, light blue, and orange. The five virophage conserved genes are labeled with red asterisks, and the conserved gene cluster is marked with dashed-line boxes.

Among the 126 predicted ORFs from these five new virophages, 59 showed significant similarity to 33 of the 67 ORFs of the three known virophages, 11 showed similarity to nucleocytoplasmic large DNA viruses (NCLDVs) of eukaryotes (including phycodnaviruses, marseilleviruses, and mimiviruses), and 3 showed similarity to sequences of unicellular eukaryotic organisms (the marine choanoflagellate Monosiga brevicollis and the ciliated protozoan Tetrahymena thermophila); 67 ORFs had no sequence hits in current NCBI databases (Table 4). Given that the virus and eukaryotic hosts of the virophages obtained in this study may be the NCLDVs and protists mentioned above (or their relatives), it is conceivable that horizontal gene transfer and/or gene recombination occurred between ancestral virophages and their host viruses, as well as their cellular hosts. Traces of such gene replacement have been observed in virophages (Sputnik, Mavirus, and OLV) and their hosts (1–3). In addition, no significant sequence similarity (E-value < 10⁻⁵) was detected between virophages and any viruses infecting multicellular organisms, which suggested that virophages diverged early and subsequently evolved strictly with their host viruses and unicellular eukaryotic hosts.

Table 4.

ORFs and their homologs predicted in YSLVs and ALM

Columns: Virophage, ORF | Position (start, end) | Length^a (nt, aa) | Best blastp hit in GenBank nr database and/or virus data set (ORF, protein encoded, or mass; species; accession no.; E-value; % aa identity; alignment length in aa [position start–end]) | NCBI conserved domain (identifier, E-value, alignment length in aa, alignment position [start–end]) | InterProScan matches (identifier, E-value[s], alignment position[s] [start–end])
YSLV1
    1 1 771 771 256 Hypothetical protein OLV4 Organic Lake virophage ADX05765 2.01E−79 51 246 (1–246) AAA domain (pfam13401, 4.46E−04, 124, 57–158) P loop-containing nucleoside triphosphate hydrolases (SSF52540, 2.0E−6, 49–226)
    2 968 768 201 66
    3 1387 1049 339 112
    4 1475 3775 2,301 766 Putative DNA primase/polymerase Organic Lake virophage ADX05784 7.79E−26 23.9 626 (50–675) P loop-containing nucleoside triphosphate hydrolases (SSF52540, 6.5E−13, 486–663)
    5 3959 3825 135 44 SP (1–17), TM (4–22)
    6 4037 4429 393 130
    7 4563 5129 567 188 VP11 Micromonas pusilla reovirus YP_654554 5.95E−15 24 146 (4–149) TM (39–59 69-91)
    8 5275 8955 3,681 1,226 ATHOOK (PR00929, 6.8E−6, 6.8E−6, 6.8E−6, 345–355, 380–391, 417–427)
    9 9051 9728 678 225 Hypothetical protein MV06 Mavirus YP_004300284 0.004 34 84 (48–131) N-terminal catalytic domain of GIY-YIG intron endonuclease I-TevI, I-BmoI, I-BanI, I-BthII, and similar proteins (cd10437, 3.38E−03, 90, 48–129) Alpha/beta-hydrolases (SSF53474, 1.1E−8, 159–273)
    10 9816 10742 927 308 Hypothetical protein OLV11 Organic Lake virophage ADX05772 2.83E−56 37.3 299 (7–305)
    11 11999 10743 1,257 418
    12 12201 13709 1,509 502
    13 13740 14087 348 115
    14 14199 14354 156 51
    15 14586 14335 252 83
    16 15090 14758 333 110
    17 15260 15084 177 58
    18 15567 15905 339 112
    19 15974 16462 489 162
    20 16755 16588 168 55 SP (1–25)
    21 16765 16935 171 56
    22 20302 16940 3,363 1120 Unnamed protein product Paenibacillus sp. JDR-2 YP_003010191 4.00E−56 25 834 (114–947)
Hypothetical protein Cyanophage NATL2A-133 8.00E−34 31 355 (380–734)
    23 20395 20967 573 190 Hypothetical protein OLV7 Organic Lake virophage ADX05768 8.35E−21 31.6 168 (22–189)
    24 20939 21115 177 58 SP (1–31), TM (5–25)
    25 22991 21120 1,872 623 Major capsid protein Organic Lake virophage ADX05770 1.00E−32 27 579 (1–579)
    26 25733 23133 2,601 866 Putative minor capsid protein Organic Lake virophage ADX05769 4.00E−15 29 205 (643–847) Putative isomerase YbhE (SSF101908, 1.6E−10, 119–433)
    27 27321 25888 1,434 477 Putative minor capsid protein Organic Lake virophage ADX05769 2.00E−15 34 125 (332–456)
    28 27741 27427 315 104 Tlr 6Fp protein Tetrahymena thermophila AF451864_6 5.28E−10 32.5 80 (25–104)
Hypothetical protein MAR_433 Marseillevirus YP_003407157 7.00E−13 35 78 (9–86)
YSLV2
    1 1 765 765 254 Hypothetical protein OLV4 Organic Lake virophage ADX05765 1.30E−48 40.8 242 (3–244)
    2 807 1322 516 171 Hypothetical protein PRSM4_062 Prochlorococcus phage P-RSM4 YP_004323191 1.85E−08 40 70 (12–81)
    3 1570 1319 252 83
    4 1691 2725 1,035 344 Hypothetical protein OLV11 Organic Lake virophage ADX05772 2.32E−40 33.9 297 (19–315)
    5 2839 4758 1,920 639 Putative ATP-dependent RNA helicase Acanthamoeba polyphaga mimivirus YP_003987051 2.63E−32 29.8 444 (179–622) DEAD-like helicases superfamily (smart00487, 4.48E−10, 177, 174–350) Helicase_C (PF00271, 2.4E−6, 551–606)
SNF2 family N-terminal domain (pfam00176, 1.19E−09, 238, 182–419) DEAD-like helicases superfamily (SM00487, 0.0013, 175–378)
Helicase superfamily C-terminal domain (cd00079, 1.52E−04, 145, 461–605) P loop-containing nucleoside triphosphate hydrolases (SSF52540, 8.7E−23, 2.3E−20, 137–378, 385–636)
    6 4992 7772 2,781 926 C terminus: hypothetical protein 162275902 Organic Lake phycodnavirus 2 ADX06405 6.07E−07 50 66 (834–899) Methyltransferase domain (pfam13659, 2.52E−07, 111, 151–261) S-Adenosyl-l-methionine-dependent methyltransferases (SSF53335, 3.3E−10, 156–307)
    7 7862 8086 225 74 TM (34–52)
    8 8538 8164 375 124
    9 9182 8628 555 184
    10 9656 12484 2,829 942 Helicase Acanthamoeba castellanii mamavirus AEQ60154 4.43E−44 30.5 465 (295–759) Origin of replication binding protein (pfam02399, 7.47E−08, 148, 353–500) P loop-containing nucleoside triphosphate hydrolases (SSF52540, 2.5E−8, 276–521)
    11 12724 14148 1,425 474 DEAD-like helicases superfamily (smart00487, 2.98E−06, 179, 341–519)
    12 14260 14847 588 195 Hypothetical protein OLV7 Organic Lake virophage ADX05768 1.07E−15 29.7 161 (32–192) SP (1–30), TM (15–35, 41–56, 77–97)
    13 15207 14911 297 98
    14 15192 16394 1,203 400 Putative minor capsid protein Organic Lake virophage ADX05769 3.00E−19 24.9 393 (1–393)
    15 16489 18243 1,755 584 Putative capsid protein V20 Sputnik virophage YP_002122381 2.15E−37 26.1 554 (10–563)
    16 18726 18424 303 100 SP (1–19), TM (4–22)
    17 18826 19386 561 186
    18 19764 20378 615 204 Hypothetical protein MV08 Mavirus YP_004300286 0.46 32 96 (88–183)
    19 20460 21449 990 329
    20 21536 21841 306 101 Hypothetical protein Paramecium bursaria chlorella virus 1 NP_048469 3.43E−11 33.8 80 (7–86)
    21 21902 23116 1215 404 N terminus: hypothetical protein OLV5 Organic Lake virophage ADX05766 1.24E−08 24.6 197 (25–221)
YSLV3
    1 1 765 765 254 Hypothetical protein OLV4 Organic Lake virophage ADX05765 7.58E−54 40 248 (4–251)
    2 950 762 189 62
    3 976 1308 333 110
    4 1772 1311 462 153 SP (1–18)
    5 2528 2010 519 172 Hypothetical protein OLV7 Organic Lake virophage ADX05768 4.81E−14 30.5 164 (5–168)
    6 3442 2606 837 278 Hypothetical protein OLV12 Organic Lake virophage ADX05773 6.04E−27 55.7 113 (135–247)
    7 4020 3523 498 165 Hypothetical protein HMPREF9628_01282 Eubacteriaceae bacterium CM5 ZP_09316646 2.85E−13 32.2 158 (7–164) Site-specific DNA methylase (COG0338, 1.22E−16, 160, 5–164)
Putative modification methylase DpnIIA Clostridium phage phiSM101 YP_699979 3.00E−11 32 160 (5–164) DNA adenine methylase (TIGR00571, 6.15E−11, 154, 5–158) S-Adenosyl-l-methionine-dependent methyltransferases (SSF53335, 5.6E−21, 5–165)
    8 6254 3954 2,301 766 D12 class N6 adenine-specific DNA methyltransferase (pfam02086, 2.44E−07, 152, 13–164)
    9 7249 6317 933 310 Hypothetical protein OLV11 Organic Lake virophage ADX05772 1.35E−48 39.4 285 (16–300)
    10 7715 7311 405 134
    11 10417 7820 2,598 865 D5-ATPase-helicase, partial Moumouvirus ochan AEY99298 1.27E−33 29.8 426 (287–712) Phage/plasmid primase, P4 family, C-terminal domain (TIGR01613, 1.40E−22, 293, 495–787) D5_N (PF08706, 1.4E−11, 390–542)
D5 N terminus-like (pfam08706, 2.13E−06, 89, 434–522) PriCT_2 (PF08707, 1.2E−4, 279–345)
    12 11103 10600 504 167 Hypothetical protein Mavirus YP_004300284 5.79E−02 47.2 51 (24–74) Phage-associated DNA primase (COG3378, 1.08E−22, 379, 394–772)
    13 11975 11307 669 222
    14 12033 14423 2,391 796
    15 14450 14650 201 66 SP (1–25), TM (5–25)
    16 14726 17311 2,586 861
    17 17402 17722 321 106
    18 18065 19318 1,254 417 Putative minor capsid protein Organic Lake virophage ADX05769 4.26E−12 25.2 386 (26–411)
    19 19412 21148 1,737 578 Putative capsid protein V20 Sputnik virophage YP_002122381 4.79E−39 26.8 544 (9–552)
    20 21761 22132 372 123
    21 22204 23868 1,665 554 C terminus: hypothetical protein OLV10 Organic Lake virophage ADX05771 0.13 27 133 (421–553)
    22 23942 26155 2,214 737
    23 26196 27023 828 275 Hypothetical protein OLV5 Organic Lake virophage ADX05766 1.98E−13 27.6 190 (26–215)
YSLV4
    1 1 768 768 255 Hypothetical protein OLV4 Organic Lake virophage ADX05765 3.10E−67 44 248 (1–248)
    2 831 1922 1092 363
    3 1982 2317 336 111
    4 2341 3300 960 319 Ribonucleoside-diphosphate reductase small subunit Phaeocystis globosa virus 12T AET72957 3.39E−136 60.5 317 (3–319) Ribonucleotide reductase, R2/beta subunit, ferritin-like diiron-binding domain (cd01049, 9.46E−102, 274, 13–286) Ribonuc_red_sm (PF00268, 1.0E−98, 7–283)
Ferritin-like (SSF47240, 3.1E−105, 1–303)
    5 3337 3489 153 50 SP (1–43), TM (25–45)
    6 3496 3687 192 63 SP (1–21), TM (5–25)
    7 3730 4248 519 172 Hypothetical protein PRSM4_062 Prochlorococcus phage P-RSM4 YP_004323191 7.07E−09 43.4 76 (12–87) N6_MTASE (PS00092, −1.0, 76–82)
    8 4638 4339 300 99
    9 4820 4680 141 46
    10 5196 5525 330 109
    11 5700 8342 2,643 880 C terminus: D5-like helicase-primase Marseillevirus YP_003407183 9.04E−20 28.2 218 (474–691) VirE (PF05272, 4.4E−5, 554–695)
PriCT_2 (PF08707, 1.1E−4, 300–375)
    12 8402 8551 150 49 TM (14–32)
    13 8602 8802 201 66
    14 8836 9816 981 326 Hypothetical protein OLV11 Organic Lake virophage ADX05772 7.68E−59 42.1 296 (11–306)
    15 9862 10302 441 146
    16 10874 10299 576 191 Hypothetical protein OLV7 Organic Lake virophage ADX05768 8.06E−27 37.2 147 (37–183)
    17 10991 11410 420 139
    18 11412 11855 444 147
    19 11880 12311 432 143
    20 12462 12662 201 66
    21 12890 14074 1,185 394 Putative minor capsid protein Organic Lake virophage ADX05769 3.20E−40 31.2 359 (27–385)
    22 14173 16026 1,854 617 Major capsid protein Organic Lake virophage ADX05770 6.10E−48 28 569 (2–570)
    23 16143 18170 2,028 675
    24 18232 19557 1,326 441
    25 21005 19740 1,266 421 C terminus: hypothetical protein OLV12 Organic Lake virophage ADX05773 8.69E−38 42.2 201 (209–409) Alpha/beta-hydrolases (SSF53474, 1.3E−9, 238–360)
N terminus: hypothetical protein OLV12 Organic Lake virophage ADX05773 2.19E−19 36 194 (14–207)
    26 21151 21834 684 227 C terminus: hypothetical protein OLV2 Organic Lake virophage ADX05763 9.46E−10 41.1 91 (137–227)
    27 22466 21915 552 183
    28 23048 22686 363 120
    29 24066 23677 390 129
    30 24923 24360 564 187
    31 25230 25613 384 127
    32 25909 26466 558 185 N-Acetyltransferase GCN5 Clostridium phytofermentans ISDg YP_001560895 1.03E−04 27.7 82 (67–148) Acetyltransferase family (pfam00583, 1.11E−09, 85, 62–146) Acetyltransf_1 (PF00583, 2.5E−10, 67–146)
    33 26677 27198 522 173 Acyl-coenzyme A N-acyltransferases (Nat) (SSF55729, 1.2E−12, 48–168)
    34 27327 28304 978 325 Hypothetical protein OLV5 Organic Lake virophage ADX05766 3.04E−25 28.4 305 (14–318)
ALM
    1 706 840 135 44
    2 890 2551 1,662 553 Hypothetical protein Monosiga brevicollis MX1 XP_001743771 5.06E−37 28.9 363 (8–370) Phage/plasmid primase, P4 family, C-terminal domain (TIGR01613, 1.80E−05, 246, 163–408) Winged helix DNA-binding domain (SSF46785, 7.9E−6, 366–441)
Highly derived D5-like helicase-primase Marseillevirus YP_003406787 5.51E−16 26.4 283 (144–426) P loop-containing nucleoside triphosphate hydrolases (SSF52540, 2.2E−6, 126–294)
    3 2868 2593 276 91 Putative RVE superfamily integrase Mavirus YP_004300280 3.63E−08 39.4 91 (1–91) Chromo (chromatin organization modifier) domain (pfam00385, 8.54E−03, 21, 56–76) Chromatin organization modifier domain (SM00298, 8.0E−10, 35–90)
    4 3689 3033 657 218 Putative RVE superfamily integrase Mavirus YP_004300280 2.21E−36 41.1 192 (1–192) Integrase core domain (pfam00665, 4.94E−07, 114, 46–159) Rve (PF00665, 8.4E−16, 48–157) RNase H-like
(SSF53098, 1.6E−18, 38–209)
    5 3773 5533 1,761 586 Putative protein-primed B-family DNA polymerase Mavirus YP_004300281 8.21E−88 35 536 (47–582) DNA polymerase type B, organellar and viral (pfam03175, 1.34E−03, 224, 291–514) DNA/RNA polymerases (SSF56672, 1.2E−19, 290–569)
    6 5578 5850 273 90 Hypothetical protein MV04 Mavirus YP_004300282 5.9 31 45 (2–46)
    7 5884 6315 432 143
    8 8027 6366 1,662 553 Putative major capsid protein MV18 Mavirus YP_004300296 1.18E−76 31.3 522 (3–524)
    9 8947 8057 891 296 Hypothetical protein MV17 Mavirus YP_004300295 1.85E−65 42.1 291 (1–291)
    10 9511 8984 528 175 Putative cysteine protease MV16 Mavirus YP_004300294 1.25E−54 52.2 175 (1–175)
    11 10541 9537 1,005 334 Putative FtsK-HerA family ATPase MV15 Mavirus YP_004300293 7.25E−80 50 266 (69–334) AAA_17 (PF13207, 1.1E−6, 95–228)
    12 11354 10566 789 262 Hypothetical protein MV14 Mavirus YP_004300292 5.20E−55 43.6 259 (1–259)
    13 11406 12266 861 286 N terminus: hypothetical protein MV13 Mavirus YP_004300291 9.03E−06 29.5 129 (1–129)
    14 12160 12930 771 256 Hypothetical protein MV12 Mavirus YP_004300290 1.88E−69 52.6 196 (57–252) TM (34–52)
    15 12956 13714 759 252
    16 13774 14406 633 210 Hypothetical protein MV09 Mavirus YP_004300287 0.21 29 97 (103–199)
    17 14430 15020 591 196 Hypothetical protein MV08 Mavirus YP_004300286 1.02E−12 48.3 84 (112–195)
    18 15794 15060 735 244 Hypothetical protein MV13 Mavirus YP_004300291 7.52E−13 29.1 198 (29–226) Lipase (class3) (cd00519, 3.78E−08, 123, 96–218) Lipase_3 (PF01764, 2.1E−4, 132–162)
    19 15806 16093 288 95 Alpha/beta-hydrolases (SSF53474, 5.1E−10, 95–178)
    20 16261 17145 885 294
    21 17400 17224 177 58
    22 17579 17445 135 44
^a aa, amino acids.

Conserved genes of virophages.

Based on blastp and PSI-BLAST searches against the NCBI nr database and a local database comprising all ORFs of the eight virophages (five from this study and three published), five genes were found to be present in all eight virophages (Table 5). They encode a putative FtsK-HerA family DNA packaging ATPase, a putative DNA helicase/primase (HEL/PRIM), a putative cysteine protease (PRSC), a putative MCP, and a putative minor capsid protein (mCP). With the exception of HEL/PRIM, these genes had blastp hits to virophage genes only (E-value < 10⁻¹) (Table 4). Sequence alignment of the four corresponding proteins also revealed unambiguous amino acid similarity (data not shown). Hence, it is reasonable to define them as virophage conserved core genes. The HEL/PRIM homolog was predicted from either functional domains or sequence similarity, since significant sequence similarity was undetectable among some virophage species (Table 4).

Table 5.

Gene homologues present in virophages

Gene product ORF(s) (size in aa^a) in indicated virophage
YSLV1 YSLV2 YSLV3 YSLV4 OLV Sputnik ALM Mavirus
Putative FtsK-HerA family ATPase 01 (256) 01 (254) 01 (254) 01 (255) 04 (256) 03 (245) 11 (334) 15 (310)
Putative DNA helicase/primase/polymerase 04 (766) 10 (942) 11 (865) 11 (880) 25 (777) 13 (779) 02 (553) 01 (652)
Putative GIY-YIG endonuclease 09 (225) 12 (167) 24 (129) 14 (114) 06 (165)
Hypothetical protein 10 (308) 04 (344) 09 (310) 14 (326) 11 (298)
Putative cysteine protease 23 (190) 12 (195) 05 (172) 16 (191) 07 (190) 09 (175) 10 (175) 16 (189)
Putative major capsid protein 25 (623) 15 (584) 19 (578) 22 (617) 09 (576) 20 (595) 08 (553) 18 (606)
Putative minor capsid protein 27 (477), 26 (866) 14 (400) 18 (417) 21 (394) 08 (389) 18 (167), 19 (218) 09 (296) 17 (303)
Hypothetical protein 28 (104) 20 (101) 03 (110) 26 (227) 02 (123)
Hypothetical protein 02 (171) 07 (172)
Hypothetical protein 09 (184) 07 (143)
Hypothetical protein 18 (204) 17 (196) 08 (122)
Hypothetical protein 21 (404) 23 (275) 34 (325) 05 (290) 21 (438)
Hypothetical protein 06 (278) 25 (421) 12 (347)
Hypothetical protein 10 (134) 17 (139)
Hypothetical protein 21 (554) 10 (236) 12 (262) 14 (271)
Putative rve superfamily integrase 03 (91), 04 (218) 02 (358)
Putative protein-primed B-family DNA polymerase 05 (586) 03 (617)
Hypothetical protein 06 (90) 04 (112)
Hypothetical protein 13 (286), 18 (244) 13 (712)
Hypothetical protein 14 (256) 12 (211)
Hypothetical protein 16 (210) 09 (190)
^a nt, nucleotides; aa, amino acids.

Besides these five conserved genes, the four YSLVs shared two other homologous genes of unknown function, which were also present in OLV but not in Sputnik, Mavirus, or ALM (Table 5 and Fig. 5). Interestingly, in all four YSLVs, the homologs of the conserved ATPase, PRSC, and mCP genes always showed the highest sequence similarity to their counterparts in OLV (Table 4); their second and third best matches were, in that order, Sputnik and Mavirus. In most cases, the blast E-values were >10^−5 for Mavirus hits but <10^−10 for Sputnik hits. Taken together, these results suggested that the YSLVs were more closely related to OLV than to Sputnik and that they were distantly related to Mavirus.
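The two E-value cutoffs mentioned above (10^−10 and 10^−5) can be applied mechanically to tabulated hits. The Python sketch below shows one way to do so, assuming E-values are typeset as in Table 4, with a Unicode minus sign. The function names and the labels "strong", "intermediate", and "weak" are hypothetical and are not used in the paper. For illustration only, the input is the set of ALM-versus-Mavirus E-values listed in Table 4, not the YSLV hits discussed in this paragraph.

```python
# Illustrative sketch: parse typeset E-values and bin them by the two cutoffs
# mentioned in the text (1e-10 and 1e-5). Names and labels are hypothetical.

def parse_evalue(text):
    """Convert strings such as '7.25E−80' (Unicode minus, U+2212) to float."""
    return float(text.replace("\u2212", "-"))


def bin_hit(evalue, strong=1e-10, weak=1e-5):
    """Label a hit by where its E-value falls relative to the two cutoffs."""
    if evalue < strong:
        return "strong"
    if evalue <= weak:
        return "intermediate"
    return "weak"


# ALM ORF number -> E-value of its Mavirus hit, copied from Table 4.
alm_hits = {
    11: "7.25E−80",
    12: "5.20E−55",
    13: "9.03E−06",
    14: "1.88E−69",
    16: "0.21",
    17: "1.02E−12",
    18: "7.52E−13",
}

binned = {orf: bin_hit(parse_evalue(raw)) for orf, raw in alm_hits.items()}
for orf, label in binned.items():
    print(f"ALM ORF {orf}: {alm_hits[orf]} -> {label}")
```

Normalizing the minus sign before calling `float` matters in practice, because values copied from typeset tables otherwise fail to parse.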

Fig 5.

Numbers of homologous genes shared among OLV and YSLVs.

The evolutionary relationship between Mavirus and ALM was evident, as they shared 13 homologous genes (Table 4 and Fig. 4). Among them, five were virophage conserved genes; three encoded a putative GIY-YIG endonuclease, a putative rve (integrase core domain) superfamily integrase, and a putative protein-primed B-family DNA polymerase; and five were functionally unknown. Furthermore, three syntenic regions existed between Mavirus and ALM (Fig. 4); however, two of these regions ran in opposite directions in the two virophages (Fig. 4).

Conserved gene clusters.

In this study, a gene cluster (or order) was considered to be several adjacent genes whose arrangement was conserved in some virophages; if present in all eight virophages, it was defined as a conserved gene cluster. As shown in Figures 3 and 4, a conserved gene cluster comprising the two conserved genes MCP and mCP was present in all eight virophages. YSLVs 2, 3, and 4 and OLV shared a gene cluster consisting of the core gene ATPase and an ORF of unknown function. Furthermore, a gene cluster of the conserved HEL/PRIM gene and an ORF of unknown function was detected in YSLVs 3 and 4. Mavirus and ALM had three gene clusters in common.
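The adjacency criterion behind this definition can be checked directly against the ORF numbers in Table 5. The Python sketch below is a minimal illustration and not the authors' method. It assumes that ORF numbers follow genome order, it ignores strand and genome circularity, and the names `ORFS` and `adjacent_in` are hypothetical.

```python
# Illustrative sketch: test whether two genes are neighbours in each genome,
# using ORF numbers from Table 5. Assumes ORF numbers follow genome order and
# ignores strand and genome circularity. All names are hypothetical.

VIROPHAGES = ["YSLV1", "YSLV2", "YSLV3", "YSLV4", "OLV", "Sputnik", "ALM", "Mavirus"]

# One set of ORF numbers per virophage, in the column order of Table 5.
ORFS = {
    "ATPase": [{1}, {1}, {1}, {1}, {4}, {3}, {11}, {15}],
    "PRSC": [{23}, {12}, {5}, {16}, {7}, {9}, {10}, {16}],
    "MCP": [{25}, {15}, {19}, {22}, {9}, {20}, {8}, {18}],
    "mCP": [{26, 27}, {14}, {18}, {21}, {8}, {18, 19}, {9}, {17}],
}


def adjacent_in(gene_a, gene_b):
    """Return the virophages in which an ORF of gene_a neighbours one of gene_b."""
    hits = []
    for name, orfs_a, orfs_b in zip(VIROPHAGES, ORFS[gene_a], ORFS[gene_b]):
        if any(abs(a - b) == 1 for a in orfs_a for b in orfs_b):
            hits.append(name)
    return hits


print("MCP/mCP adjacent in:", adjacent_in("MCP", "mCP"))
print("ATPase/PRSC adjacent in:", adjacent_in("ATPase", "PRSC"))
```

With the Table 5 values, the MCP/mCP pair is adjacent in all eight genomes, whereas ATPase and PRSC are adjacent only in ALM and Mavirus. A fuller analysis would also compare gene orientation and longer runs of genes.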

Phylogeny and evolution.

Three virophage core genes, encoding ATPase, PRSC, and MCP, were used to reconstruct phylogenetic trees. As shown in Figure 6, three phylogenetic groups were observed. YSLVs and OLV formed a group of closely related virophages, and Mavirus and ALM were apparently derived from a common ancestor, whereas Sputnik formed an orphan group. This phylogenetic clustering of virophages agreed with the physical features of the genomic DNA molecules, the conserved genes, and the gene orders described above. In addition, the phylogenetic trees of MCP and PRSC suggested that YSLVs were much closer to each other than to OLV (Fig. 6). This observation was consistent with the local tblastx results (search against a local database containing all ORFs of the eight virophages), in which the best MCP hits of YSLVs were always other YSLVs. Although the current data could not fully resolve the evolutionary relationships among the four YSLVs, YSLVs 3 and 4 appeared to be the closest relatives. They were sister lineages on the MCP tree supported by a 70% bootstrap value (Fig. 6), shared the largest number of homologous genes (10) (Fig. 5), and had the highest number of gene clusters in common (three) (Fig. 3).

Fig 6.

Unrooted phylogenetic trees of DNA packaging ATPases (A), cysteine proteases (B), and major capsid proteins (C) of virophages. The five new virophages are shown in boldface. The numbers at the branches represent bootstrap values.

Habitat diversity of virophages.

Although these virophages were more closely related to each other than to any other dsDNA viruses known so far, their habitats were extremely diverse. Mavirus was from the coastal waters of Texas (1). Its closest relative, ALM, however, was discovered in a hypersaline meromictic lake, Ace Lake (68°28′49″S, 78°11′19″E), in Antarctica. This lake is covered with ice for 11 months to an entire year, with an average temperature of approximately 0°C (22). OLV was also found in Antarctica, in the neighboring Organic Lake (3). In contrast, YSLVs, the closest relatives of OLV, were found in a freshwater lake (Yellowstone Lake) with a temperature ranging from 12 to 73°C in Yellowstone National Park, Wyoming (23). Hence, these results indicated that virophages have adapted to habitats spanning a wide range of temperatures.

In conclusion, the distinct abundance and global distribution of virophages, including almost all geographical zones as well as a variety of environments (ranging from the deep ocean to inland and from iced to hydrothermal lakes), indicated that virophages are widespread and genetically diverse, with at least three major lineages. Moreover, the overall low sequence similarity between the shared homologous genes in virophages and their distant phylogenetic relationships suggested that the genetic diversity of virophages far exceeds what is currently known.

Supplementary Material

Supplemental material

ACKNOWLEDGMENTS

This work was supported by The Program for Professor of Special Appointment (Eastern Scholar) grant 20101222 from Shanghai Institutions of Higher Learning, Shanghai Talent Development Fund grant 2011010 from Shanghai Municipal Human Resources and Social Security Bureau, and Science and Technology Development Program grant 10540503000 from Shanghai Municipal Science and Technology Commission, China.

We thank two anonymous reviewers for their insightful comments on the manuscript.

Footnotes

Published ahead of print 13 February 2013

Supplemental material for this article may be found at http://dx.doi.org/10.1128/JVI.03398-12.

REFERENCES

  • 1. Fischer MG, Suttle CA. 2011. A virophage at the origin of large DNA transposons. Science 332:231–234
  • 2. La Scola B, Desnues C, Pagnier I, Robert C, Barrassi L, Fournous G, Merchat M, Suzan-Monti M, Forterre P, Koonin E, Raoult D. 2008. The virophage as a unique parasite of the giant mimivirus. Nature 455:100–104
  • 3. Yau S, Lauro FM, DeMaere MZ, Brown MV, Thomas T, Raftery MJ, Andrews-Pfannkoch C, Lewis M, Hoffman JM, Gibson JA, Cavicchioli R. 2011. Virophage control of Antarctic algal host-virus dynamics. Proc. Natl. Acad. Sci. U. S. A. 108:6163–6168
  • 4. Sun S, La Scola B, Bowman VD, Ryan CM, Whitelegge JP, Raoult D, Rossmann MG. 2010. Structural studies of the Sputnik virophage. J. Virol. 84:894–897
  • 5. Arslan D, Legendre M, Seltzer V, Abergel C, Claverie JM. 2011. Distant Mimivirus relative with a larger genome highlights the fundamental features of Megaviridae. Proc. Natl. Acad. Sci. U. S. A. 108:17486–17491
  • 6. Raoult D, Audic S, Robert C, Abergel C, Renesto P, Ogata H, La Scola B, Suzan M, Claverie JM. 2004. The 1.2-megabase genome sequence of Mimivirus. Science 306:1344–1350
  • 7. Van Etten JL, Lane LC, Dunigan DD. 2010. DNA viruses: the really big ones (giruses). Annu. Rev. Microbiol. 64:83–99
  • 8. Claverie JM, Abergel C. 2009. Mimivirus and its virophage. Annu. Rev. Genet. 43:49–66
  • 9. Raoult D, Boyer M. 2010. Amoebae as genitors and reservoirs of giant viruses. Intervirology 53:321–329
  • 10. Fischer MG, Allen MJ, Wilson WH, Suttle CA. 2010. Giant virus with a remarkable complement of genes infects marine zooplankton. Proc. Natl. Acad. Sci. U. S. A. 107:19508–19513
  • 11. Dunigan DD, Fitzgerald LA, Van Etten JL. 2006. Phycodnaviruses: a peek at genetic diversity. Virus Res. 117:119–132
  • 12. Desnues C, La Scola B, Yutin N, Fournous G, Robert C, Azza S, Jardot P, Monteil S, Campocasso A, Koonin EV, Raoult D. 2012. Provirophages and transpovirons as the diverse mobilome of giant viruses. Proc. Natl. Acad. Sci. U. S. A. 109:18078–18083
  • 13. Sun S, Chen J, Li W, Altintas I, Lin A, Peltier S, Stocks K, Allen EE, Ellisman M, Grethe J, Wooley J. 2011. Community cyberinfrastructure for Advanced Microbial Ecology Research and Analysis: the CAMERA resource. Nucleic Acids Res. 39:D546–D551
  • 14. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402
  • 15. Altschul SF, Wootton JC, Gertz EM, Agarwala R, Morgulis A, Schaffer AA, Yu YK. 2005. Protein database searches using compositionally adjusted substitution matrices. FEBS J. 272:5101–5109
  • 16. Edgar RC. 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113 doi:10.1186/1471-2105-5-113
  • 17. Wang Y, Bininda-Emonds OR, van Oers MM, Vlak JM, Jehle JA. 2011. The genome of Oryctes rhinoceros nudivirus provides novel insight into the evolution of nuclear arthropod-specific large circular double-stranded DNA viruses. Virus Genes 42:444–456
  • 18. Wang Y, Kleespies RG, Huger AM, Jehle JA. 2007. The genome of Gryllus bimaculatus nudivirus indicates an ancient diversification of baculovirus-related nonoccluded nudiviruses of insects. J. Virol. 81:5395–5406
  • 19. Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, Lopez R. 2005. InterProScan: protein domains identifier. Nucleic Acids Res. 33:W116–W120
  • 20. Guindon S, Lethiec F, Duroux P, Gascuel O. 2005. PHYML Online--a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res. 33:W557–W559
  • 21. Parola P, Renvoise A, Botelho-Nevers E, La Scola B, Desnues C, Raoult D. 2012. Acanthamoeba polyphaga mimivirus virophage seroconversion in travelers returning from Laos. Emerg. Infect. Dis. 18:1500–1502
  • 22. Coolen MJL, Hopmans EC, Rijpstra WIC, Muyzer G, Schouten S, Volkman JK, Sinninghe Damsté JS. 2004. Evolution of the methane cycle in Ace Lake (Antarctica) during the Holocene: response of methanogens and methanotrophs to environmental change. Org. Geochem. 35:1151–1167
  • 23. Clingenpeel S, Macur RE, Kan J, Inskeep WP, Lovalvo D, Varley J, Mathur E, Nealson K, Gorby Y, Jiang H, LaFracois T, McDermott TR. 2011. Yellowstone Lake: high-energy geochemistry and rich bacterial diversity. Environ. Microbiol. 13:2172–2185

Articles from Journal of Virology are provided here courtesy of American Society for Microbiology (ASM)
