Journal of Virology. 2024 Oct 15;98(11):e00361-24. doi: 10.1128/jvi.00361-24

Genomics and evolutionary analysis of Chlorella variabilis-infecting viruses demarcate criteria for defining species of giant viruses

João Victor R P Carvalho 1, Roger M Carlson 2,3, Jayadri Ghosh 2,3, Victória F Queiroz 1, Ellen G de Oliveira 1, Bruna B Botelho 1, Clécio A C Filho 1, Irina V Agarkova 2,3, O William McClung 4, James L Van Etten 2,3, David D Dunigan 2,3, Rodrigo A L Rodrigues 1
Editor: Kristin N Parent 5
PMCID: PMC11575271  PMID: 39404263

ABSTRACT

Chloroviruses have a close relationship with their hosts, and their taxonomy has been guided primarily by a phenotypic trait: their ability to form lytic plaques on a given host. However, the isolation of viruses that can complete their replication cycle in only one strain of Chlorella variabilis raised systematic challenges. In this study, we describe the genomic features of 53 new chlorovirus isolates and use them to elucidate part of the evolutionary history and taxonomy of this clade. Our analysis revealed new chloroviruses with the largest genomes to date (>400 kbp) and showed that four genomic features differ statistically in the viruses that infect only the Syngen 2–3 strain of C. variabilis (OSy viruses). We found large regions of dissimilarity in the genomes of viruses PBCV-1 and OSy-NE5 when compared with the other genomes. These regions contained genes related to interaction with the host cell machinery and to viral capsid proteins, which provided insights into the evolution of the replicative and structural modules in these giant viruses. Phylogenetic analysis using hallmark genes of Nucleocytoviricota revealed that OSy viruses evolved from NC64A viruses, possibly as a result of their strict relationship with their hosts. Combining phylogenetic and nucleotide identity analyses, we propose strategies to demarcate viral species, resulting in seven new species of chloroviruses. Collectively, our results show how genomic data can be used as lines of evidence to demarcate viral species. Using chloroviruses as a case study, we expect similar initiatives to build on the framework presented here.

IMPORTANCE

Chloroviruses are a group of giant viruses with long dsDNA genomes that infect different species of Chlorella-like green algae. They are host-specific, and some isolates can only replicate within a single strain of Chlorella variabilis. The genomics of these viruses is still poorly explored, and the characterization of new isolates provides important data on their genetic diversity and evolution. In this work, we describe 53 new chlorovirus genomes, including many isolated from alkaline lakes for the first time. Through comparative genomics and molecular phylogeny, we provide evidence of genomic gigantism in chloroviruses and show that a subset of viruses became highly specific for their hosts at a particular point in evolutionary history. We propose criteria to demarcate species of chloroviruses, paving the way for an update in the taxonomy of other groups of viruses. This study is a new and important piece in the complex puzzle of giant algal viruses.

KEYWORDS: giant viruses, chlorella viruses, Phycodnaviridae, Chlorovirus, genomic evolution, host range, alkaline lakes, virus species

INTRODUCTION

Chlorovirus is a genus in the family Phycodnaviridae whose representatives infect green algae from the family Chlorellaceae, including the genera Chlorella and Micractinium, which are commonly found in inland waters (1). These algae, also called zoochlorellae, are known for their symbiotic relationships with protozoans, such as Paramecium bursaria (2) and Acanthocystis turfacea (3), and with metazoans, such as Hydra viridis (4).

The systematics of the genus Chlorovirus is currently based on the host species or strain on which the virus can form lytic plaques, given the strong specificity of these viruses for the algae they infect (5). Four clades have been identified within the taxon to date: NC64A viruses, which infect Chlorella variabilis NC64A (6); Only-Syngen (hereinafter “OSy”) viruses, which infect only C. variabilis Syngen 2–3 (5); SAG viruses, which infect Chlorella heliozoae (3); and Pbi viruses, which infect Micractinium conductrix (7). A phylogenetic analysis based on 47 concatenated genes supports part of this classification (5), but the relationships between OSy viruses and NC64A viruses need further study. The genus currently has 19 “species” described and recognized by the International Committee on Taxonomy of Viruses (ICTV), distributed among the different clades, such as Paramecium bursaria chlorella virus 1 (PBCV-1), an NC64A virus (8), and Acanthocystis turfacea chlorella virus 1 (ATCV-1), a SAG virus (9). Note that these “species” are in fact viral isolates: only the genus-level classification has been performed, and criteria for species demarcation have not been formally established for these algal viruses.

Chloroviruses have a linear, double-stranded DNA (dsDNA) genome ranging from 290 to 370 thousand base pairs (kbp) and containing 330 to 415 coding sequences (CDS) (10). The GC content of these genomes ranges from 40% to 52% and is highly conserved among viruses infecting the same host. These viruses have been isolated mainly from freshwater environments, including rivers and ponds. So far, a total of 42 chlorovirus genomes are available in public databases. A genomic analysis of 36 of these viruses, distributed among three clades, identified a total of 1,345 clusters of orthologous groups of genes (COGs) and an open pan-genome, indicating that isolating new viruses from unexplored regions could yield important genetic novelties (11). In the past few years, many new chloroviruses have been isolated from alkaline lakes of the Nebraska Sandhills (USA) using different host species. In this study, we provide the genomic landscape of chloroviruses associated with four strains of C. variabilis. Using phylogenetics and comparative genomics on 68 viruses, 53 of which are described here for the first time, we provide comprehensive analyses of the genomic data for this group of chloroviruses and propose an evolutionary scenario for the OSy viruses. Ultimately, our data led us to discuss the ecology of chloroviruses in extreme environments, to propose criteria for demarcating virus species, and to propose a reclassification of the genus Chlorovirus.

RESULTS

New isolates reveal giant genomes in genus Chlorovirus

Fifty-three new viruses were sequenced and included in the analyses, in addition to 15 previously described chlorovirus genomes (Table 1). The assemblies generated contigs and scaffolds ranging in size from 290,838 bp (isolate OSyNE-5A-L1) to 409,489 bp (isolate O-NE-23), with an average size of 364,166 bp (see Table S1 at https://doi.org/10.6084/m9.figshare.25822657). In most cases, the N50 equaled the genome size because the assembler generated a single contig, reflecting very high-quality genome assemblies. For some isolates, the genomeSize parameter had to be modified to obtain a high-quality assembly: it was set to 500 k for isolates 41-NE-6, O-NE-14, and O-NE-23, and to 600 k for isolate O-NE-25. For isolates 41-NE-6 and O-NE-25, the online tool MeDuSa (12) was used to generate a single scaffold from the multiple contigs obtained in the assembly.
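N50 summarizes assembly contiguity, and the paragraph above uses it to argue that most assemblies are complete. The Python sketch below implements the standard definition; it is not the assembler's own code, and the multi-contig lengths in the second call are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 (bp) of an assembly given its contig lengths.

    N50 is the length of the shortest contig among the longest contigs
    that together cover at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    covered = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length


# Single-contig assembly (O-NE-23, Table 1): N50 equals the genome size.
print(n50([409_489]))  # 409489
# Hypothetical fragmented assembly: the two longest contigs cover >= 50%.
print(n50([150_000, 120_000, 80_000, 50_000]))  # 120000
```

With a single contig, the only contig trivially covers the whole assembly, which is why N50 and genome size coincide for most isolates here.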

TABLE 1.

General genomic features of chloroviruses associated with C. variabilis

Virus Genome size (bp) CDS count tRNA count GC content (%) Host strain (a) GenBank accession
IL-5–2s1 345,255 377 9 40.19 NC64A JX997170
NYs-1 348,463 378 9 40.71 NC64A NC_043235
MA-1D 339,653 368 12 40.69 NC64A JX997172
WNE-10B-S1 373,376 403 11 40.87 NC64A PP681875
NY-2B 344,863 370 9 40.53 NC64A JX997182
AR-158 344,691 388 7 40.76 NC64A NC_009899
NY-2A 368,683 412 8 40.69 NC64A NC_009898
NC-1A 337,446 397 9 39.77 NC64A PP681879
SH-6A 326,843 376 9 39.92 NC64A PP681881
XZ-6E 333,030 386 9 39.78 NC64A PP681885
XZ-5C 321,664 377 9 39.89 NC64A PP681884
PBCV-1 330,611 377 11 39.97 NC64A NC_000852
XZ-3A 328,655 377 12 39.88 NC64A PP681882
40-NE-4 365,757 409 12 39.99 NIES 2540 PP681863
41-NE-5 355,747 402 12 39.88 NIES 2541 PP681866
40-NE-3 364,166 411 12 39.92 NIES 2540 PP681862
NE-JV-4 328,315 351 11 40.04 NC64A JX997179
IL-3A 323,497 344 13 40.19 NC64A JX997169
WNE-11A-L2 339,491 365 11 40.15 NC64A PP681876
AN69C 332,309 361 11 40.12 NC64A JX997153
CA-4B 302,002 355 12 40.45 NC64A PP681878
MA-1E 339,391 386 14 40.26 NC64A JX997173
CvsA1 311,223 338 14 40.29 NC64A JX997165
CviKI 309,195 334 14 40.36 NC64A JX997162
XZ-4C 305,821 365 13 40.55 NC64A PP681883
KS1B 287,769 319 13 40.39 NC64A JX997171
NY-2C 323,145 373 10 40.23 NC64A PP681880
CA-4A 313,306 367 10 40.44 NC64A PP681877
O-NE-18 386,133 432 9 42.55 Syngen 2–3 PP681894
O-NE-25 408,691 438 9 42.32 Syngen 2–3 PP681900
O-NE-29 385,991 433 10 42.23 Syngen 2–3 PP681904
O-NE-27 374,307 411 10 42.32 Syngen 2–3 PP681902
NE-O-7-s 353,969 399 9 42.22 Syngen 2–3 PP681905
O-NE-19 385,571 440 7 42.33 Syngen 2–3 PP681895
O-NE-23 409,489 463 6 42.26 Syngen 2–3 PP681898
O-NE-22 409,110 456 7 42.19 Syngen 2–3 PP681897
OSyNE-5B-M2 365,554 400 12 42.08 Syngen 2–3 PP681912
OSyNE-4B-M2 371,253 404 12 42.13 Syngen 2–3 PP681909
OSy-NE5 327,147 357 14 42.37 Syngen 2–3 NC_032001
OSyNE-5B-S1 383,466 427 12 42.27 Syngen 2–3 PP681913
O-NE-10 355,565 402 14 42.11 Syngen 2–3 PP681886
NE-O-9-L 338,249 384 12 41.64 Syngen 2–3 PP681907
O-NE-24 351,825 394 13 41.83 Syngen 2–3 PP681899
OSyNE-ZA 314,723 355 10 42.16 Syngen 2–3 PP681914
O-NE-28 358,607 399 12 42.15 Syngen 2–3 PP681903
O-NE-15 351,222 387 15 42.31 Syngen 2–3 PP681891
OSyNE-4B-S1 310,357 340 13 42.75 Syngen 2–3 PP681910
O-NE-16 327,414 365 12 42.70 Syngen 2–3 PP681892
O-NE-17 341,837 376 15 42.11 Syngen 2–3 PP681893
OSyNE-4B-L2 323,328 358 14 41.95 Syngen 2–3 PP681908
O-NE-20 370,366 421 13 41.95 Syngen 2–3 PP681896
NE-O-8-L 317,497 358 9 41.71 Syngen 2–3 PP681906
OSyNE-5A-L1 290,838 327 14 42.23 Syngen 2–3 PP681911
O-NE-14 354,143 403 11 41.07 Syngen 2–3 PP681890
O-NE-11 341,205 393 11 41.26 Syngen 2–3 PP681887
NE-41–1-L 327,412 355 14 42.14 NIES 2541 PP681870
NE-41–2-m 301,364 353 13 41.53 NIES 2541 PP681871
NE-40–1-m 312,612 350 13 41.94 NIES 2540 PP681868
41-NE-6 389,101 428 15 41.64 NIES 2541 PP681867
NE-41–3-s 376,634 428 17 40.85 NIES 2541 PP681872
40-NE-5 351,902 414 14 40.71 NIES 2540 PP681864
41-NE-4 379,575 434 18 40.90 NIES 2541 PP681865
NE-40–2-s 352,746 405 17 40.81 NIES 2540 PP681869
N-NE-5 379,477 444 18 40.81 NC64A PP681874
O-NE-12 332,900 386 16 41.77 Syngen 2–3 PP681888
N-NE-4 345,564 385 11 41.68 NC64A PP681873
O-NE-26 333,679 375 14 41.68 Syngen 2–3 PP681901
O-NE-13 367,782 404 11 41.64 Syngen 2–3 PP681889
(a) Host strain used for virus isolation.

Until now, the largest known C. variabilis-infecting virus genome was that of NY-2A, an NC64A virus, with 368,683 bp. Our findings indicate that the genome size range of this group of viruses is larger than previously described: 15 of the 53 novel genomes are larger than the NY-2A genome. Three isolates stood out, O-NE-22, O-NE-23, and O-NE-25, because their genomes exceed 400,000 bp, a size previously observed among phycodnaviruses only in the genus Phaeovirus. The genome sizes were confirmed by pulsed-field gel electrophoresis (see Fig. S1 at https://doi.org/10.6084/m9.figshare.25822657). None of the new genomes was smaller than that of KS1B (287,769 bp) (13), which therefore remains the smallest C. variabilis-infecting virus genome to date.
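The counts in this paragraph can be checked directly against Table 1. The Python sketch below transcribes the genome sizes of the 53 rows whose GenBank accessions begin with "PP"; treating those rows as the new isolates is our reading of the table, not something the table labels explicitly.

```python
# Genome sizes (bp) of the 53 newly described isolates, transcribed from Table 1
# (assumption: the new isolates are the rows with "PP" GenBank accessions).
NEW_GENOMES = {
    "WNE-10B-S1": 373_376, "NC-1A": 337_446, "SH-6A": 326_843,
    "XZ-6E": 333_030, "XZ-5C": 321_664, "XZ-3A": 328_655,
    "40-NE-4": 365_757, "41-NE-5": 355_747, "40-NE-3": 364_166,
    "WNE-11A-L2": 339_491, "CA-4B": 302_002, "XZ-4C": 305_821,
    "NY-2C": 323_145, "CA-4A": 313_306, "O-NE-18": 386_133,
    "O-NE-25": 408_691, "O-NE-29": 385_991, "O-NE-27": 374_307,
    "NE-O-7-s": 353_969, "O-NE-19": 385_571, "O-NE-23": 409_489,
    "O-NE-22": 409_110, "OSyNE-5B-M2": 365_554, "OSyNE-4B-M2": 371_253,
    "OSyNE-5B-S1": 383_466, "O-NE-10": 355_565, "NE-O-9-L": 338_249,
    "O-NE-24": 351_825, "OSyNE-ZA": 314_723, "O-NE-28": 358_607,
    "O-NE-15": 351_222, "OSyNE-4B-S1": 310_357, "O-NE-16": 327_414,
    "O-NE-17": 341_837, "OSyNE-4B-L2": 323_328, "O-NE-20": 370_366,
    "NE-O-8-L": 317_497, "OSyNE-5A-L1": 290_838, "O-NE-14": 354_143,
    "O-NE-11": 341_205, "NE-41–1-L": 327_412, "NE-41–2-m": 301_364,
    "NE-40–1-m": 312_612, "41-NE-6": 389_101, "NE-41–3-s": 376_634,
    "40-NE-5": 351_902, "41-NE-4": 379_575, "NE-40–2-s": 352_746,
    "N-NE-5": 379_477, "O-NE-12": 332_900, "N-NE-4": 345_564,
    "O-NE-26": 333_679, "O-NE-13": 367_782,
}
NY_2A_SIZE = 368_683  # previous largest C. variabilis-infecting virus genome
KS1B_SIZE = 287_769   # smallest C. variabilis-infecting virus genome

larger_than_ny2a = sorted(n for n, s in NEW_GENOMES.items() if s > NY_2A_SIZE)
over_400_kbp = sorted(n for n, s in NEW_GENOMES.items() if s > 400_000)
smallest = min(NEW_GENOMES, key=NEW_GENOMES.get)

print(len(NEW_GENOMES), len(larger_than_ny2a))  # 53 15
print(over_400_kbp)  # ['O-NE-22', 'O-NE-23', 'O-NE-25']
print(smallest, NEW_GENOMES[smallest])  # OSyNE-5A-L1 290838
print(NEW_GENOMES[smallest] > KS1B_SIZE)  # True: KS1B remains the smallest
```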

Phylogeny of C. variabilis-infecting viruses

The phylogenetic reconstruction using the hallmark genes of all 68 C. variabilis-infecting viruses indicated that the OSy viruses form a clade descended from the NC64A viruses (Fig. 1). The tree shows a single point of origin for all OSy viruses sequenced so far, with high bootstrap support (99) at the node separating this lineage from the rest of the tree. This suggests that, at some point in the evolutionary history of these viruses, an ancestor acquired mutations that restricted its host range to the Syngen 2–3 strain of the alga. Notably, a few isolates cluster with the OSy viruses despite being able to infect and form plaques on other strains of C. variabilis. More broadly, all these viruses can be considered Syngen viruses (i.e., all can infect and replicate in the Syngen 2–3 strain), but a subset of isolates can infect only the Syngen 2–3 strain and are therefore named OSy viruses (the isolate nomenclature uses either “OSy” or “O” to indicate the host). In this scenario, it is possible that after the OSy lineage emerged, a few isolates, such as NE-41–1-L, regained the capacity to infect a broader host range, constituting a case of phenotypic regression among giant viruses.

Fig 1.


Evolution of C. variabilis-infecting viruses. Phylogenetic reconstruction of 68 C. variabilis-infecting viruses based on hallmark genes of Nucleocytoviricota. Colored strips indicate the proposed subgenera of Chlorovirus (squares in shades of black and gray) and the host lineages that each isolate can infect and form plaques on (colored squares). Viruses CVB-1 and ATCV-1 were selected as outgroups, representing the betachloroviruses and gammachloroviruses, respectively. Bootstrap values at the nodes are indicated by circle size, with a cutoff of 70%.

When Quispe et al. (2017) discovered the OSy viruses, their phylogenetic tree grouped these viruses in a topology similar to the one presented here, albeit with fewer isolates and different markers. These independent studies support cohesive phylogenetic relationships within the clades of C. variabilis-infecting viruses, and the same can be inferred for the remaining chloroviruses. Our findings support the hypothesis that the OSy viruses correspond to at least one new, distinct clade. However, given their host specificity, naming clades after the hosts on which the viruses form plaques can be confusing. To resolve this, we propose a restructuring of the genus Chlorovirus: viruses infecting any strain of C. variabilis would belong to the subgenus Alphachlorovirus, those infecting Micractinium conductrix to the subgenus Betachlorovirus, and those infecting C. heliozoae to the subgenus Gammachlorovirus (Fig. 1).

Genomic landscape of C. variabilis-infecting viruses

The genome size of these viruses ranges from 287,769 bp for KS1B to 409,489 bp for O-NE-23 (x̄ = 345,119 bp), and the means of the NC64A and Syngen 2–3 subgroups differ significantly (Fig. 2A and see Fig. S2A at https://doi.org/10.6084/m9.figshare.25822657). A similar pattern is found for the number of CDSs per genome, with KS1B containing the fewest CDSs (n = 319) and O-NE-23 the most (n = 457) (x̄ = 388 CDSs) (Fig. 2B and see Fig. S2B at https://doi.org/10.6084/m9.figshare.25822657). The number of tRNAs per genome, on the other hand, does not follow the same pattern. The genome with the fewest tRNAs, O-NE-23 (n = 6), is the largest, whereas 41-NE-4, one of the genomes with the most CDSs, has the most tRNAs (n = 18, tied with N-NE-5) (x̄ = 12 tRNAs) (Fig. 2C and see Fig. S2C at https://doi.org/10.6084/m9.figshare.25822657). We observed statistical differences between the subgroups NIES 2541 and Syngen 2–3 (P-value = 0.0176) and NIES 2541 and NC64A (P-value = 0.0057) (Fig. 2C). The last feature we analyzed, GC content, showed the most remarkable results. The lowest GC content was found in NC-1A (GC = 39.77%) and the highest in OSyNE-4B-S1 (GC = 42.75%) (x̄ = 41.2%) (Fig. 2D and see Fig. S2D at https://doi.org/10.6084/m9.figshare.25822657). The frequency histogram of this data set tended toward two peaks, with an overall shape distinct from a Gaussian distribution, which was confirmed with the Shapiro-Wilk test (P-value = 8.819e-5). In addition, Levene’s test indicated no significant heterogeneity of variance (P-value = 0.3469). We observed a statistical difference between the subgroups NIES 2541 and NC64A (P-value = 0.259), Syngen 2–3 and NIES 2540 (P-value = 0.0051), Syngen 2–3 and NIES 2541 (P-value = 0.0055), and Syngen 2–3 and NC64A (P-value = 9.7e−10) (Fig. 2D). Notably, principal component analysis (Fig. 2E) indicates a separation between the genomes of viruses infecting NC64A and those infecting Syngen 2–3. The former clustered mainly in the second quadrant, whereas the latter were scattered across the first, third, and fourth quadrants. The viruses infecting NIES 2540 and NIES 2541 overlapped with those infecting Syngen 2–3, although the small number of samples from these two groups may influence the analysis. This result reinforces the findings from the variance tests performed earlier, indicating a distinct separation of the data and, consequently, a significant difference between the genomes of these viruses.
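The group comparisons above rely on one-way ANOVA (α = 0.05, per the Fig. 2 caption) and on Levene's test for equality of variances. As a minimal pure-Python sketch of the test statistics only (not the authors' code), the block below computes the ANOVA F statistic and Levene's W, which is an ANOVA on absolute deviations from each group's center. P-values additionally need the F distribution; in practice one would typically call scipy.stats.f_oneway or scipy.stats.levene (whose default center="median" is the Brown-Forsythe variant). Which Levene variant and which post-hoc test the study used are not stated here, and the GC values in the usage example are a small subset of Table 1, not the full groups in Fig. 2D.

```python
from statistics import fmean, median


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def levene_w(groups, center=fmean):
    """Levene's W: one-way ANOVA on absolute deviations from each group's center.

    center=fmean gives the original Levene test; center=median gives the
    Brown-Forsythe variant.
    """
    deviations = [[abs(x - center(g)) for x in g] for g in groups]
    return one_way_anova_f(deviations)


# GC content (%) for a few viruses from Table 1 (illustrative subset only).
nc64a_gc = [39.97, 39.77, 39.92, 40.69, 40.71]   # PBCV-1, NC-1A, SH-6A, NY-2A, NYs-1
syngen_gc = [42.55, 42.32, 42.23, 42.37, 42.75]  # O-NE-18, O-NE-25, O-NE-29, OSy-NE5, OSyNE-4B-S1
print(one_way_anova_f([nc64a_gc, syngen_gc]))
print(levene_w([nc64a_gc, syngen_gc]))
print(levene_w([nc64a_gc, syngen_gc], center=median))
```

A large F with a small W is the pattern described above: group means differ while within-group spreads are comparable.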

Fig 2.


Comparative genomics of C. variabilis-infecting viruses. Boxplots of four genomic features: (A) genome size; (B) CDS count; (C) tRNA count; and (D) GC content. Colors indicate the host strain with which each viral lineage was identified: blue, NC64A; pink, NIES-2540; orange, NIES-2541; and gray, Syngen 2–3. Analysis of variance was performed for each data set, with α = 0.05. P-values of the post-hoc tests are indicated by asterisks over the comparison lines: * <0.05, ** <0.005, and *** <0.0005. (E) Principal component analysis of the four genomic features. Dimensions 1 and 2 were used because together they explained over 75% of the total variance of the data set. Ellipses were calculated at a confidence level of 0.95. Blue circles indicate viruses infecting C. variabilis NC64A, pink triangles those infecting C. variabilis NIES-2540, orange squares those infecting C. variabilis NIES-2541, and gray pluses those infecting C. variabilis Syngen 2–3.
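Panel E reports that the first two principal components explain over 75% of the variance. The NumPy sketch below shows how such explained-variance ratios arise from the four features. It assumes the features are standardized (z-scores) before the decomposition, which is common when variables have different units; the software and scaling used in the study are not stated here. Eigenvector signs are arbitrary, so the quadrant in which a group falls can flip between implementations.

```python
import numpy as np


def pca_explained_variance(features):
    """Return PC scores and explained-variance ratios for standardized features.

    features: (n_samples, n_features) array, e.g. columns for genome size,
    CDS count, tRNA count and GC content.
    """
    x = np.asarray(features, dtype=float)
    # Standardize each column so features on different scales contribute equally.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    cov = np.cov(z, rowvar=False)              # correlation matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs                       # sample coordinates on each PC
    return scores, eigvals / eigvals.sum()


# A few rows from Table 1: [genome size (bp), CDS, tRNA, GC (%)].
rows = [
    [330_611, 377, 11, 39.97],  # PBCV-1 (NC64A)
    [368_683, 412, 8, 40.69],   # NY-2A (NC64A)
    [287_769, 319, 13, 40.39],  # KS1B (NC64A)
    [409_489, 463, 6, 42.26],   # O-NE-23 (Syngen 2–3)
    [327_147, 357, 14, 42.37],  # OSy-NE5 (Syngen 2–3)
    [364_166, 411, 12, 39.92],  # 40-NE-3 (NIES 2540)
]
scores, ratios = pca_explained_variance(rows)
print(ratios[:2].sum())  # share of variance captured by PC1 + PC2
```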

Across all 68 alphachloroviruses, we predicted a total of 26,349 CDSs. To obtain a general picture of the biological processes in which these genes are involved, we classified them into functional categories based on the NucleoCytoplasmic Virus Orthologous Groups (NCVOG), with a few modifications to better suit chloroviruses. Among the genes with defined functions, most are related to DNA replication, recombination, and repair (e.g., DNA polymerase B family and DNA topoisomerase), followed by genes involved in virion structure and morphogenesis (e.g., major capsid protein and DNA packaging ATPase) and in carbohydrate metabolism (e.g., glycosyltransferases and chitinases) (Fig. 3A). As expected, a large fraction of alphachlorovirus genes had no defined function (median = 45.37%), and future work focusing on structural biology could provide insights into the putative roles of these proteins during the viral life cycle (Fig. 3B).

Fig 3.


General genomic features of alphachloroviruses. (A) Functional categories of genes predicted for alphachloroviruses. Boxplots represent the data of all 68 viruses included in this study. (B) Functional annotation of alphachlorovirus genes based on NCVOG. Numbers indicate the median number of genes in each category across all 68 alphachloroviruses.

Similarity and genomic organization of “alphachloroviruses”

Beyond genome structure, genomic similarity and gene organization are also important features in a comparative analysis. Using BRIG, we performed two analyses, one with PBCV-1 as the reference (Fig. 4A) and another with OSy-NE5 as the reference (Fig. 5A). In the first, we identified three genomes, 40-NE-3, 40-NE-4, and 41-NE-5, that are practically identical to PBCV-1, whereas the others are more distinct, with almost all sequences less than 70% similar to the reference. Interestingly, these four genomes belong to the same phylogenetic branch and may even represent the same species (Fig. 1).

Fig 4.


Local alignment output of the PBCV-1 genome compared against 11 chlorovirus genomes. (A) The innermost ring shows GC content (black). The second innermost ring shows the reference genome, PBCV-1 (light blue). The remaining rings show BLAST comparisons of 11 other chloroviruses against the reference. The major colors indicate the algal strain with which each virus was identified: blue, NC64A; gray, Syngen 2–3; pink, NIES-2540; and orange, NIES-2541. The color scale of each ring reflects the identity of genome segments relative to the reference, with the darkest shade representing 100% identity, the lightest 70% identity, and white 50% identity. The color gradient is shown only once per group but applies to every genome in that group. (B) Genes located in the main regions of dissimilarity. We compared gene predictions and annotations and classified each encoded protein into an NCVOG. MCP, major capsid protein. The full list of genomes used in this analysis is described in Table S1.

Fig 5.


Local alignment output of the OSy-NE5 genome compared against 11 chlorovirus genomes. (A) The innermost ring shows GC content (black). The second innermost ring shows the reference genome, OSy-NE5 (black). The remaining rings show BLAST comparisons of 11 other chloroviruses against the reference. The major colors indicate the algal strain with which each virus was identified: gray, Syngen 2–3; blue, NC64A; pink, NIES 2540; and orange, NIES 2541. The color scale of each ring reflects the identity of genome segments relative to the reference, with the darkest shade representing 100% identity, the lightest 70% identity, and white 50% identity. The color gradient is shown only once per group but applies to every genome in that group. (B) Genes located in the main regions of dissimilarity. We compared gene predictions and annotations and classified each encoded protein into an NCVOG. mCP, minor capsid protein. The full list of genomes is described in Table S1.

Additionally, three large regions of low similarity can be identified on this plot: (i) from the start of the genomes to approximately 20 kbp, (ii) from approximately 25 kbp to 40 kbp, and (iii) from approximately 120 kbp to 135 kbp. To study these regions, we divided the genomes into two groups: those with high similarity to PBCV-1 and those with low similarity to it.
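The regions above were read off BRIG's BLAST identity rings. As a hypothetical illustration of how such regions could be delimited programmatically (this is not part of BRIG, and the per-window identity values below are invented), the Python sketch merges consecutive fixed-size windows whose identity to the reference falls below 70%, the lower end of the color scale in Fig. 4.

```python
def low_identity_regions(identities, window_size, threshold=70.0):
    """Return (start_bp, end_bp) spans of consecutive windows below `threshold`.

    identities: per-window percent identity to the reference, in genome order
    (hypothetical input; BRIG itself renders BLAST hits rather than windows).
    Spans are half-open: start is inclusive, end is exclusive.
    """
    regions = []
    run_start = None
    # A trailing sentinel above any threshold closes a run that reaches the end.
    for i, identity in enumerate(list(identities) + [float("inf")]):
        if identity < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            regions.append((run_start * window_size, i * window_size))
            run_start = None
    return regions


# Hypothetical 5-kbp windows covering the first 50 kbp of a genome.
windows = [55, 60, 65, 68, 90, 40, 50, 45, 95, 99]
print(low_identity_regions(windows, window_size=5_000))
# [(0, 20000), (25000, 40000)]
```

Window size and threshold trade resolution against noise: smaller windows localize boundaries more precisely but fragment regions at isolated low-identity windows.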

In the first region, in addition to many hypothetical proteins, we found two genes predicted to be involved in virion structure and morphogenesis and one classified as miscellaneous. This classification follows the Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOG), a set of probable orthologs and paralogs clustered by their predicted metabolic functions (11, 14). The second and third regions, on the other hand, contained genes related to DNA replication, recombination, and repair, lipid metabolism, host-virus interaction, and carbohydrate metabolism (Fig. 4B). According to transcriptome analyses of PBCV-1 (15, 16), the genes in the first region are late genes, whose expression starts after the onset of viral DNA synthesis, between 60 and 90 minutes after infection. The genes in the second and third regions include early genes, expressed before viral DNA synthesis begins; intermediate genes, whose expression starts before DNA synthesis but persists afterward; and late genes.

In the second BRIG plot, with OSy-NE5 as the reference, we observed two regions of high dissimilarity relative to the other genomes (Fig. 5A): one from position 75 kbp to 100 kbp and the other from 132 kbp to 140 kbp. As with the previous plot, we analyzed each region separately. Region 1 contained six genes predicted to be involved in virion structure and morphogenesis, one in carbohydrate metabolism, and one in DNA replication, recombination, and repair. Region 2 contained four genes related to DNA replication and recombination (Fig. 5B).

Although region 1 encoded six structural proteins, they were not all the same protein (Fig. 5B): there was a single copy of Vp54, the major capsid protein, and five copies of a protein rich in glycoprotein repeats. Because the OSy-NE5 annotation is outdated, we performed a BLAST comparison between this glycoprotein and all the identified capsid proteins of PBCV-1 (17). The results indicated low similarity, about 32%, between these OSy-NE5 proteins and the PBCV-1 P22 minor capsid protein (see Table S3 at https://doi.org/10.6084/m9.figshare.25822657). The P22 protein forms homotrimers in type II fibers, structures on the outer surface of the PBCV-1 capsid that are attached to pseudo-hexameric type IV capsomeres formed by variant 5 of the MCP (MCPv5) (17). Coevolution experiments with the host alga demonstrated that the a122/123r gene (encoding P22) is highly variable, with 14 polymorphic sites that arose after the emergence of algal clones (18). This gene therefore appears to track changes in the host, with mutations being positively selected. When the host lineage susceptible to the virus changes, a protein so closely linked to host range would be expected to differ as well. Indeed, the identity analysis of the P22 proteins from OSy-NE5 and PBCV-1 indicates approximately 30% similarity between them, highlighting this high variation (see Table S3 at https://doi.org/10.6084/m9.figshare.25822657). The second region was predominantly composed of genes predicted to encode DNA methyltransferases (n = 3) and a protein similar to the poxvirus A22 protein. Based on comparison with their homologs (15, 16), os5 126 r, os5 128 r, and os5 129 r are predicted to be early genes, whereas os5 123 L is an intermediate gene. This again indicates that genes of the replicative module differ considerably among these viruses, which can affect the virus-host relationship. These conclusions complement the findings of Esmael et al. (19), who showed that the OSy-NE5 genome enters NC64A cells but cannot be replicated in this algal strain. This suggests that the structural module is not markedly different, whereas the replicative module is substantially altered.
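The ~32% and ~30% figures above come from BLAST comparisons, and a percent-identity value depends on how gap columns are counted. The Python sketch below illustrates two common conventions on a toy alignment; the sequences are hypothetical, not P22 fragments, and the function is an illustration rather than BLAST's implementation.

```python
def percent_identity(aligned_a, aligned_b, include_gaps=True):
    """Percent identity between two sequences from the same pairwise alignment.

    include_gaps=True divides by every aligned column, so gapped columns count
    against identity (roughly how BLAST reports "Identities"); False divides
    only by columns where both sequences have a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Columns that are gaps in both sequences carry no information; drop them.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not include_gaps:
        columns = [(a, b) for a, b in columns if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy protein alignment with one gap (hypothetical sequences).
print(round(percent_identity("MKT-LA", "MRTALA"), 2))                      # 66.67
print(round(percent_identity("MKT-LA", "MRTALA", include_gaps=False), 2))  # 80.0
```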

In contrast, a multiple genome alignment showed good conservation at the level of syntenic blocks (Fig. 6), consistent with previous studies (20). An inversion that reverses the order of some blocks can be observed in the genomes of viruses N-NE-5, NE-40–2-s, and 41-NE-4. In the phylogenetic tree (Fig. 1), these viruses belong to the same branch, which again may indicate that they are the same species. Although they were isolated on three different host strains (NC64A, NIES 2540, and NIES 2541), the latter two groups are not true clades but rather a phenetic way of separating isolates.

Fig 6.


Synteny analysis of chlorovirus genomes. Multiple alignment of 12 chlorovirus genomes. The host strain used to isolate each virus is indicated above its name. PBCV-1, Paramecium bursaria chlorella virus 1; OSy-NE5, Only-Syngen Nebraska virus 5. The full list of genomes is described in Table S1.

Pan-genome of “alphachloroviruses”

The pan-genome of C. variabilis-infecting viruses was constructed using the 26,041 genes predicted from the 68 viruses and consists of 779 clusters of orthologous groups (COGs) (Fig. 7A), along with 106 singletons (see Table S4 at https://doi.org/10.6084/m9.figshare.25822657). Of these 779, 182 are exclusive to the “NC64A” group, 83 are exclusive to OSy viruses, and 514 constitute the core genome. None of the OSy-exclusive clusters was pan-OSy (i.e., present in all representatives of the clade); thus, no proteins with a priori phylogenetic support were found that could explain such host specificity.
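Judging from the counts (182 + 83 + 514 = 779), the "core genome" here appears to denote COGs shared by both major groups rather than COGs present in every genome. The Python sketch below shows that partition plus the pan-OSy check, assuming COG membership is already available as sets of virus names; the clustering tool itself is not shown, and the toy viruses and COGs are hypothetical.

```python
def partition_cogs(cog_members, clade_of):
    """Split COGs into shared (core), clade-exclusive, and singleton sets.

    cog_members: COG id -> set of viruses carrying it.
    clade_of: virus name -> clade label (e.g. "NC64A" or "OSy").
    """
    shared, exclusive, singletons = set(), {}, set()
    for cog, members in cog_members.items():
        if len(members) == 1:
            singletons.add(cog)
            continue
        clades = {clade_of[v] for v in members}
        if len(clades) == 1:
            exclusive.setdefault(next(iter(clades)), set()).add(cog)
        else:
            shared.add(cog)
    return shared, exclusive, singletons


def pan_clade(cogs, cog_members, clade_of, clade):
    """COGs (among `cogs`) present in every virus assigned to `clade`."""
    clade_viruses = {v for v, c in clade_of.items() if c == clade}
    return {cog for cog in cogs if clade_viruses <= cog_members[cog]}


# Hypothetical toy data: two "NC64A" viruses and three "OSy" viruses.
clade_of = {"v1": "NC64A", "v2": "NC64A", "v3": "OSy", "v4": "OSy", "v5": "OSy"}
cog_members = {
    "COG1": {"v1", "v2", "v3", "v4", "v5"},  # shared by both groups
    "COG2": {"v1", "v3"},                    # shared by both groups
    "COG3": {"v1", "v2"},                    # NC64A-exclusive
    "COG4": {"v3", "v4"},                    # OSy-exclusive, not pan-OSy
    "COG5": {"v3"},                          # singleton
    "COG6": {"v3", "v4", "v5"},              # OSy-exclusive and pan-OSy
}
shared, exclusive, singletons = partition_cogs(cog_members, clade_of)
print(sorted(shared), sorted(singletons))
print(pan_clade(exclusive["OSy"], cog_members, clade_of, "OSy"))  # {'COG6'}
```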

Fig 7.


Pan-genome of the C. variabilis-infecting chloroviruses. (A) A bipartite network graph connects the 779 COGs (singletons not represented) to the two major clades of viruses included in the pan-genome analysis. Colored nodes correspond to the viruses: blue, NC64A, NIES-2540, and NIES-2541; and gray, Only-Syngen. White nodes correspond to the COGs. The core genome is composed of the 514 COGs in the middle of the graph, whereas 83 are related only to the OSy-viruses, and 182 are related to the other chloroviruses. (B) Evolution of alphachlorovirus pan-genome created by step-wise addition of new chloroviruses. The indicated clades correspond to those defined in Fig. 1.

We also analyzed the evolution of the alphachlorovirus pan-genome (Fig. 7B). Interestingly, the core genome size is conserved among isolates from the same branch of the tree in Fig. 1, highlighting how similar these viruses are. Adding viruses one at a time also showed that each new genome contributes little to the group’s pan-genome, with major jumps observed mainly when switching clades, although genetic diversity increases steadily.
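Fig. 7B is built by adding genomes one at a time. A minimal Python sketch of that bookkeeping, assuming each genome is represented as a set of COG identifiers (the genomes below are hypothetical), tracks the pan-genome as a running union and the core genome as a running intersection; the order of addition is an input, as in the clade-wise ordering of Fig. 7B.

```python
def accumulation_curve(genomes):
    """Pan- and core-genome sizes after adding genomes one at a time.

    genomes: list of (name, set of COG ids) in the order they are added.
    Returns a list of (name, pan_size, core_size) tuples.
    """
    pan, core, curve = set(), None, []
    for name, cogs in genomes:
        pan |= cogs                                      # union: every COG seen so far
        core = set(cogs) if core is None else core & cogs  # intersection: COGs in all so far
        curve.append((name, len(pan), len(core)))
    return curve


# Hypothetical genomes; C adds new COGs (a "jump") and shrinks the core.
toy = [("A", {1, 2, 3}), ("B", {1, 2, 4}), ("C", {1, 5})]
for step in accumulation_curve(toy):
    print(step)
# ('A', 3, 3)
# ('B', 4, 2)
# ('C', 5, 1)
```

Because the result depends on the order of addition, ordering genomes by clade makes between-clade jumps visible as steps in the pan-genome curve.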

We annotated and characterized all OSy-exclusive clusters based on the NCVOGs (Fig. 8A). Most of these genes encode uncharacterized proteins (21), including 30 ORFans and 22 hypothetical proteins (Fig. 8B). Genes with known functions were also observed, including those encoding chitinase, endonucleases, capsid proteins, ion-transporting enzymes, and other proteins already identified in chloroviruses (Fig. 8B).

Fig 8.


Functional annotation of the 83 COGs exclusive to the Only-Syngen viruses. (A) Pie chart indicating the major functional categories of the COGs, following the NCVOG classification proposed by Yutin et al. (14) and updated by Rodrigues et al. (11). (B) Composition of each NCVOG category found among the COGs. Numbers in parentheses indicate how many of each protein were found.

Considering the topology of our phylogenetic tree (Fig. 1), three clusters of orthologous genes were found only in the branch formed by NE-O-7-s, O-NE-18, O-NE-19, O-NE-22, O-NE-23, O-NE-25, O-NE-27, and O-NE-29; one of these was annotated as an ORFan and the other two as hypothetical proteins. Another branch, formed by OSyNE-5B-M2, OSyNE-4B-M2, OSy-NE5, OSyNE-5B-S1, and O-NE-10, also had its own clusters (n = 2), one annotated as an ORFan and the other as a hypothetical protein. Because these unknown proteins are exclusive to branches that could potentially be considered species, they may represent autapomorphies, although they cannot be formally characterized as such precisely because their functions are unknown. No exclusive clusters were found for the other branches.
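Finding branch-exclusive clusters reduces to a set difference over presence/absence data. The minimal Python sketch below assumes a hypothetical mapping from genome to the COG IDs it carries; it is an illustration, not the authors' code, and the data are invented. It treats a COG as exclusive when it occurs in at least one genome of the branch and in no genome outside it.

```python
from typing import Dict, Set


def branch_exclusive_cogs(presence: Dict[str, Set[str]], branch: Set[str]) -> Set[str]:
    """COGs present in at least one genome of `branch` and absent from every other genome."""
    inside: Set[str] = set()
    outside: Set[str] = set()
    for genome, cogs in presence.items():
        # Route each genome's COGs to the inside or outside pool.
        (inside if genome in branch else outside).update(cogs)
    return inside - outside


# Hypothetical presence/absence data (genome -> COG IDs); not taken from the study.
presence = {
    "g1": {"core1", "x1", "y1"},
    "g2": {"core1", "x1"},
    "g3": {"core1", "y1"},
}
print(branch_exclusive_cogs(presence, branch={"g1", "g2"}))
```

The text does not say whether an exclusive cluster must be present in every member of the branch; a stricter variant would intersect the branch genomes' COG sets instead of taking their union.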

Demarcating chlorovirus species

Previously, we argued that our data indicate that the isolates can be grouped into species. However, there is no formal rationale for demarcating a nucleocytovirus species, let alone a chlorovirus species, which led us to propose such a concept using C. variabilis-infecting viruses as a model. Following ICTV guidance, we used different lines of evidence to demarcate species of chloroviruses (21). Phylogenetic reconstructions based on NCLDV core genes (Fig. 1) showed with confidence that chloroviruses cluster in at least three major clades, which we now refer to as Alphachlorovirus, Betachlorovirus, and Gammachlorovirus, representing different subgenera of Chlorovirus. Moreover, our phylogenetic analysis indicates at least 10 clades of alphachloroviruses. Additionally, the local alignments (Fig. 4) showed that four genomes located on the same branch of this tree are practically identical. This observation makes nucleotide similarity another essential line of evidence for demarcating viral species, as indicated by Simmonds et al. (21).

To explore nucleotide similarity further, we performed an average nucleotide identity (ANI) analysis comprising all 68 virus genomes (Fig. 9). Based on the clustering pattern, and comparing it with the results of the same analysis for betachloroviruses and gammachloroviruses (data not shown), we adopted an ANI threshold of 94% to demarcate species in the genus Chlorovirus. With this threshold, we identified seven groups, most of which include the same members as the clades observed in the phylogenetic analysis (Fig. 1). Given that ANI should be a key line of evidence for demarcating species (21), we used it for our final listing. We therefore propose seven species in the subgenus “Alphachlorovirus,” most of them with strong phylogenetic support (Fig. 9; Table 2). There are a few incongruences with the phylogenetic tree, mainly within the large branch of the OSy-viruses, which includes four of the seven species and shows a certain degree of variability among these viruses. These incongruences likely arise because our phylogeny was built from a handful of selected genes, whereas ANI reflects the whole genome.
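Applying a fixed ANI cutoff can be sketched in Python as single-linkage grouping: any two genomes with ANI above 94% end up in the same group. This is an assumption for illustration; the text derives the groups from the heatmap clustering pattern and does not name a linkage rule. The genome names and ANI values below are hypothetical.

```python
from typing import Dict, List, Tuple


def ani_groups(pairs: Dict[Tuple[str, str], float], genomes: List[str],
               threshold: float = 94.0) -> List[List[str]]:
    """Single-linkage groups: genomes joined whenever pairwise ANI > threshold."""
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        # Follow parent links to the group representative (with path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ani in pairs.items():
        if ani > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted((sorted(m) for m in groups.values()), key=lambda m: m[0])


# Hypothetical pairwise ANI values (%).
pairs = {("v1", "v2"): 98.1, ("v2", "v3"): 94.5, ("v3", "v4"): 80.2, ("v1", "v4"): 76.0}
print(ani_groups(pairs, ["v1", "v2", "v3", "v4"]))
```

Single linkage can chain genomes that are not all above the cutoff with each other (v1 and v3 here are joined through v2). A complete-linkage rule would be stricter, so this choice affects borderline isolates.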

Fig 9.


Average nucleotide identity of 68 alphachloroviruses. The seven major colored squares group the isolates with ANI > 94%. ANI values range from 75% to 100%.

TABLE 2.

Virus species and isolates

Species Isolates
I WNE-10B-S1, MA-1D, NYs1, NY-2B, IL-5-2s1, AR158, NY-2A
II 40-NE-3, 40-NE-4, 41-NE-5, PBCV-1, XZ-3A, WNE-11A-L2, NE-JV-4, IL-3A, AN69C, CA-4B, NY-2C, CA-4A, MA-1E, CviKI, XZ-4C, CvsA1, KS1B, NC-1A, XZ-5C, SH-6A, XZ-6E
III N-NE-5, NE-41-3-s, NE-40-2-s, 41-NE-4, 40-NE-5, NE-41-2-m, NE-40-1-m, O-NE-26, O-NE-12, 41-NE-6, N-NE-4
IV O-NE-11, O-NE-14
V NE-41-1-L, OSyNE-5A-L1, OSyNE-4B-L2, O-NE-17, O-NE-20, O-NE-16, NE-O-9-L, O-NE-24, NE-O-8-L, OSyNE-ZA, O-NE-28, O-NE-15, OSy-NE5, OSyNE-5B-S1, O-NE-10, OSyNE-5B-M2, OSyNE-4B-M2, OSyNE-4B-S1
VI O-NE-13
VII O-NE-25, O-NE-27, O-NE-29, O-NE-19, O-NE-22, O-NE-23, NE-O-7-s, O-NE-18

DISCUSSION

Chloroviruses have predominantly been isolated from freshwater environments over the last four decades. In a previous study, we showed that chloroviruses have an open pan-genome, suggesting that isolating new viruses from diverse locations could provide a source of genetic novelty (11). In this study, we present the genomic features of 53 newly isolated chloroviruses that infect various strains of C. variabilis, which we are naming “alphachloroviruses.” Among these isolates, 15 have genomes larger than the largest previously described chlorovirus genome (Chlorovirus NY-2A, 368,683 bp), and three exceed 400 kbp, suggesting genome gigantism within the chlorovirus group. Increases in genome size have been associated with different events among giant viruses of the phylum Nucleocytoviricota, including gene duplication and horizontal gene transfer (22–24). Algal viruses of the order Imitervirales also have large genomes. Chlorella virus XW01, reported in 2022 from Shanghai, PRC, has a 407,612-bp genome encoding 346 putative CDSs, of which 42.5% are ORFans (25). Despite infecting algae, this virus is related to the protozoa-infecting family Mimiviridae, forming a sister group of the genus Cafeteriavirus. A paralogy analysis indicated that the chloroviruses with the largest genomes (>400 kbp) have more paralogous genes than OSy-NE5, PBCV-1, and XW01, suggesting that this expansion was a singular event in a particular lineage within the genus (data not shown).

Analyzing the whole-genome synteny of various isolates, we observed regions of dissimilarity, indicating genetic changes among the new viruses. This finding opens possibilities for studying the impact of these genetic modifications. Notably, changes in the genome sequence of PBCV-1 have been reported to cause phenotypic variability. In 1995, Landstein et al. showed that PBCV-1 variants carrying spontaneous large deletions, which removed long segments of the left end of the genome, had smaller burst sizes and plaque diameters yet remained viable (26). They also linked this phenotype to the loss of two glycosylation-related enzymes. Our data support one of those earlier predictions: PBCV-1 gene A064R, which is homologous to glycosyltransferases, is located at the same genome position in viruses 40-NE-4 and 41-NE-5, which fall in the same species as PBCV-1 (Table 2). We therefore suggest that genes of the glycosylation machinery are important in viral evolution, since they affect viral fitness (26).

Across all 68 alphachloroviruses, we identified over 26,000 genes, which were clustered into 885 COGs, 106 of which are singletons. Most of these singletons correspond to proteins of unknown function (hypothetical proteins). However, 14% of the singletons encode proteins with described functions and lie immediately upstream or downstream of larger genes with the same predicted function. Gene duplication events often generate smaller tandem copies of genes and can occur randomly (e.g., through replication errors), which could explain why these copies did not cluster with their homologs. We must also consider that this is an artifact of the gene prediction method, given that we used the “Prokaryotic” option of GeneMarkS, which does not account for introns, genetic elements found in chloroviruses (10).
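The observation about annotated singletons flanking larger genes of the same function can be checked mechanically once genes are listed in genome order. The minimal Python sketch below uses a hypothetical `Gene` record (name, annotation, protein length, singleton flag) and invented example genes. It is an illustration of the check, not the procedure the authors used.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    annotation: str
    length: int       # protein length (aa)
    singleton: bool   # True if the gene's COG has no other member


def flank_candidates(genes: List[Gene]) -> List[str]:
    """Annotated singletons whose immediate neighbour shares the annotation and is longer.

    `genes` must be listed in genome order.
    """
    hits: List[str] = []
    for i, gene in enumerate(genes):
        if not gene.singleton or gene.annotation == "hypothetical protein":
            continue  # only singletons with a described function are of interest
        neighbours = genes[max(i - 1, 0):i] + genes[i + 1:i + 2]
        if any(n.annotation == gene.annotation and n.length > gene.length for n in neighbours):
            hits.append(gene.name)
    return hits


# Hypothetical gene order for one genome.
genes = [
    Gene("v1", "DNA ligase", 120, True),
    Gene("v2", "DNA ligase", 410, False),
    Gene("v3", "hypothetical protein", 90, True),
    Gene("v4", "chitinase", 300, False),
    Gene("v5", "chitinase", 500, True),
]
print(flank_candidates(genes))
```

Flagged genes are candidates for either tandem duplicates or genes split by a missed intron. Distinguishing the two would need the nucleotide sequence, not just the annotation.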

In addition, we observed that the number of orthologous groups is conserved within clades, except for the first clade (Fig. 7). Clade 1 contains viruses that were grouped into three different species, and its variability is also evident in Fig. 9. These data increase confidence in the demarcation criteria, showing that different, independent analyses point to similar results. Furthermore, despite constant increases, the pan-genome curve tends toward a plateau. Because a large part of the data comes from viruses collected in similar regions, we may be analyzing a small population sample; global efforts to find new chloroviruses should therefore be encouraged.

With the increased number of isolates, we saw an opportunity to propose robust criteria for demarcating a chlorovirus species. According to the ICTV, a virus species is a monophyletic group of mobile genetic elements whose properties can be distinguished from those of other species by multiple criteria (27, 28). In the current genomic era, combining phylogenetic evidence (i.e., the clustering pattern) with ANI seems to be the best strategy for demarcating a giant virus species while avoiding arbitrariness. Other criteria (e.g., host range) can be included, but this must be done carefully, because novel discoveries may expand the known virus-host relationships. In this context, we showed that, at the level of algal strain, the host range within a single group is broader than previously thought. We therefore propose abandoning the systematic use of the host to delimit groups within the proposed subgenera, since only more in-depth analyses will allow reliable classification. Accordingly, we propose three subgenera of Chlorovirus, namely Alphachlorovirus, Betachlorovirus, and Gammachlorovirus. Within Alphachlorovirus, we identified seven species, five of which comprise exclusively new isolates described in this study. This highlights the importance of ongoing efforts to isolate and characterize new chloroviruses, which help reveal the remarkable diversity of the virosphere.

Conclusions

Collectively, our results indicate that a process of specialization occurred in a lineage of the ancient NC64A clade, giving rise to the OSy-viruses. Viral specialization can occur via receptor modification (29) or by optimizing the use of certain cellular mechanisms, immune escape, and transmission (30). We found some modifications in capsid-related proteins, but none directly related to the spike, a structure hypothesized to be involved in virus entry. Nevertheless, we observed differences in genes encoding proteins involved in the glycosylation of viral proteins and in the host cell cycle, indicating that host specialization likely occurred together with optimization of viral fitness. We also showed that a series of genes is exclusive to OSy-viruses, but the evolutionary divergence did not arise from one or more of these new genes, given that none of them is present in all OSy-viruses. Thus, the specialization of OSy-viruses probably occurred through mutations in core-genome genes shared by all chloroviruses that infect C. variabilis. This interpretation is further supported by the pan-genome evolution, which shows that the number of orthologous groups is conserved within clades. Finally, our results support a reorganization of chlorovirus taxonomy and provide criteria for defining chlorovirus species that may be useful for other groups of giant viruses. Environmental virology is a remarkable and growing field of research, and adding genomic and evolutionary analyses to it has helped reveal and resolve many mysteries of the virosphere. This work resolves one part of the chlorovirus puzzle, taking us a step forward in understanding their diversity and evolutionary history.

MATERIALS AND METHODS

Sample acquisition

The data used in this work were obtained in three ways: (i) recent collections; (ii) older collections; and (iii) genomes and genes downloaded from public databases.

In the first way, we collected water samples from lakes and rivers in Crescent Lake National Wildlife Refuge and surrounding areas in the State of Nebraska, USA, located in a biome called the “Sandhills,” which is characterized by stabilized, grass-covered sand dunes (31). Collections took place between 2017 and 2020, on similar dates each year. From these samples, we isolated 44 viruses and pre-classified them according to their ability to form lytic plaques on the C. variabilis strains NC64A (ATCC 50528), Syngen 2–3 (ATCC 30561), NIES-2540, and NIES-2541, followed by infection assays on other cell types. Viral isolates from this set were named according to the host strain on which they were initially isolated. The viruses were produced and purified on sucrose density gradients as previously described (5). The genomes were isolated and sent to GENEWIZ, a facility of Azenta Life Sciences, and to the University of Delaware sequencing facility for sequencing on the Pacific Biosciences PacBio Sequel II platform, including quality analysis of the long reads.

In the second way, we analyzed nine genomes of viruses collected between 1984 and 1991, providing the first genomic characterization of these isolates. Some of these viruses were collected in Nebraska and surrounding states; the XZ-3A, XZ-4C, XZ-5C, and XZ-6E isolates come from water samples collected in the People’s Republic of China (PRC) (32).

Finally, in the third way, we selected the genomes and genes of 15 previously published chloroviruses that infect C. variabilis, downloaded from the GenBank database.

Genome assembly

The PacBio Sequel II long-read files were delivered by the sequencing facilities, and each genome was assembled de novo using Canu (33). For most runs, the parameters were “genomeSize=400m” and “-pacbio,” the latter indicating the sequencing technology used to generate the input reads. The genomeSize value was adjusted according to the results to maximize the chance of obtaining a single scaffold. When Canu produced more than one contig, the online software MeDuSa (12) was used with a reference genome to order the contigs into the final scaffold.
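The iterative use of genomeSize described above could be scripted roughly as follows. This is a hedged Python sketch, not the authors' workflow. It assumes a local Canu 2.x installation, uses Canu's documented `-p`, `-d`, `genomeSize=`, and `-pacbio` arguments, and adds `useGrid=false`, an assumption not stated in the text, so that each run finishes before its output is inspected. The helper names and directory layout are hypothetical; Canu writes the contigs to `<prefix>.contigs.fasta` in the output directory.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Optional


def canu_command(prefix: str, outdir: str, reads: str, genome_size: str) -> List[str]:
    """Canu command line for raw PacBio reads (useGrid=false is an assumption: run locally)."""
    return ["canu", "-p", prefix, "-d", outdir,
            f"genomeSize={genome_size}", "useGrid=false", "-pacbio", reads]


def count_contigs(fasta: Path) -> int:
    """Count FASTA records, i.e. header lines starting with '>'."""
    with fasta.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def assemble_until_single_contig(prefix: str, reads: str, sizes: Iterable[str]) -> Optional[Path]:
    """Try each genomeSize value in turn; return the first assembly with one contig."""
    for size in sizes:
        outdir = Path(f"{prefix}_gs{size}")
        subprocess.run(canu_command(prefix, str(outdir), reads, size), check=True)
        contigs = outdir / f"{prefix}.contigs.fasta"
        if contigs.exists() and count_contigs(contigs) == 1:
            return contigs
    return None  # several contigs remain: scaffold them, e.g. with MeDuSa
```

For example, `canu_command("v1", "asm", "reads.fastq.gz", "400m")` builds the command for the default setting quoted above. Returning `None` hands the multi-contig case to the reference-guided scaffolding step.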

Pulsed-field gel electrophoresis

PFGE was carried out according to Agarkova et al. (34), with some modifications. Chlorella viruses were standardized to a concentration of 10⁹ plaque-forming units per mL in Tris buffer (TB; 50 mM Tris-HCl, pH 7.8). The virus solution was mixed with an equal volume of 2% low-melting-point agarose (Bio-Rad) in TB at 45°C, poured into plug molds (Bio-Rad, Hercules, CA), and placed at 4°C for 15 min to solidify. Agarose blocks were incubated in approximately 2 mL of 1 mg/mL proteinase K in digestion buffer (DB; 250 mM EDTA, pH 9.5; 1% N-lauroylsarcosine) for 48 h. After digestion, samples were washed twice for 30 min with DB and cut into small pieces that fit into the gel wells (~250 ng DNA/well). Samples were sealed with 1% low-melting-point agarose at 45°C in electrophoresis buffer. Viral DNAs were separated in a CHEF-DR II unit (Bio-Rad) in a 1% agarose gel in running buffer (0.5× Tris-borate-EDTA buffer, TBE). Electrophoresis conditions were 6 V/cm with pulses ramped from 25 to 70 s for 24 h. The Yeast Chromosome PFGE Marker (225–1,900 kb; New England BioLabs) was used as a size marker. Gels were stained with 0.5 µg/mL ethidium bromide for 30 min and destained in water for 2 h, and digital images were acquired with the ChemiDoc EQ System (Bio-Rad).

Phylogenetic analyses

Phylogenetic analyses were conducted using concatenated amino acid sequences of the nucleocytovirus hallmark genes found in the chloroviruses, namely SNF2-like helicase, DNA polymerase family B, transcription initiation factor IIB, DNA topoisomerase II, packaging ATPase A32, and poxvirus late transcription factor VLTF3 (35). In addition to the 68 viruses mentioned above, protein sequences from viruses ATCV-1 (accession: NC_008724) and Paramecium bursaria chlorella virus CVB-1 (accession: JX997160) were included in the data set as outgroups.

The sequences were aligned using the ClustalW algorithm (36) in MEGA11 (37) and trimmed manually, using a cutoff of 90% gaps per alignment column. Phylogenetic reconstructions were performed with IQ-TREE 2 (38–40) using the maximum likelihood method. The substitution model and site rate heterogeneity settings were LG + F + I + G4 (41, 42), and statistical support for the nodes was evaluated using bootstrap values from 1,000 replicates. The resulting trees were visualized and edited using iTOL (43).
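The 90% gap cutoff can be expressed as a simple column filter. The Python sketch below is a minimal illustration, assuming the alignment is held as a dictionary of equal-length sequences with `-` as the gap character. It removes columns whose gap fraction is at least the cutoff. The text trimmed manually and does not say whether exactly 90% counts as "too gappy," so the comparison operator is an assumption.

```python
from typing import Dict


def trim_gappy_columns(alignment: Dict[str, str], max_gap_fraction: float = 0.9) -> Dict[str, str]:
    """Drop alignment columns in which the fraction of gaps ('-') is >= max_gap_fraction."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    keep = [
        col for col in range(length)
        if sum(s[col] == "-" for s in seqs) / len(seqs) < max_gap_fraction
    ]
    # Rebuild every sequence from the retained columns only.
    return {name: "".join(seq[c] for c in keep) for name, seq in alignment.items()}


# Hypothetical 10-sequence alignment: column 0 is gapped in 9 of 10 sequences.
aln = {"s0": "MA"}
aln.update({f"s{i}": "-A" for i in range(1, 10)})
print(trim_gappy_columns(aln)["s0"])
```

Trimming before concatenation keeps each gene's column filter independent. Applying it after concatenation gives the same result because the filter acts column by column.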

Gene prediction and annotation

Genes were predicted using the online software GeneMarkS (44) with the “Prokaryotic” option. tRNAs were predicted with the online tools tRNAscan-SE (45) and ARAGORN (46). The predicted ORFs were annotated following the workflow described by Queiroz et al. (24). Briefly, we used BLASTp (e-value < 10⁻⁵) (47) against the NCBI nr database, followed by a protein domain search using HHpred (48, 49). When the results disagreed, we used InterProScan (50) and compiled the results to reach the final gene annotation.
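The BLASTp step can be illustrated by filtering tabular hits at the stated e-value cutoff and keeping one best hit per ORF. The minimal Python sketch below assumes BLAST+ tabular output with the default `-outfmt 6` columns. The text does not state the output format or how the top hit was chosen, so ranking by bit score is an assumption, and the example rows are invented.

```python
import csv
from typing import Dict, Iterable, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float, float]]:
    """Best hit per query (highest bit score) among hits with e-value < max_evalue."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        rec = dict(zip(OUTFMT6, row))
        evalue, bits = float(rec["evalue"]), float(rec["bitscore"])
        if evalue >= max_evalue:
            continue  # fails the e-value < 1e-5 cutoff
        query = rec["qseqid"]
        if query not in best or bits > best[query][2]:
            best[query] = (rec["sseqid"], evalue, bits)
    return best


# Hypothetical BLASTp rows for two predicted ORFs.
rows = [
    "orf1\tsubjA\t45.0\t200\t100\t3\t1\t200\t5\t204\t1e-30\t150.2",
    "orf1\tsubjB\t40.0\t190\t100\t3\t1\t190\t5\t194\t1e-20\t120.0",
    "orf2\tsubjC\t30.0\t50\t30\t1\t1\t50\t1\t50\t0.01\t35.0",
]
print(best_hits(rows))
```

ORFs with no surviving hit, such as `orf2` here, are the ones that would move on to the HHpred domain search.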

Statistical analysis

We first compiled all metadata generated in this study, including genome size, CDS count, tRNA count, and GC content, into data sets covering every genome. We then grouped the genomes according to the C. variabilis strain on which each virus was originally isolated, so that each data set was divided into four treatments: NC64A, NIES-2540, NIES-2541, and Syngen 2–3.

To assess the behavior of the data sets, we performed Shapiro-Wilk tests for normality and Levene’s tests for homogeneity of variance, with a significance level of α = 0.05 for both. We then tested for differences between the means or medians of the groups. If the P-values of both the Shapiro-Wilk and Levene tests were >0.05, we performed one-way ANOVA followed by Tukey’s post hoc multiple comparisons of means. Otherwise, we performed Kruskal-Wallis rank sum tests followed by pairwise Wilcoxon rank sum tests with Benjamini-Hochberg P-value adjustment. All tests were performed in RStudio using the native statistical tools of R version 4.2.2 (51); graphics were generated with the ggplot2 (52) and ggsignif (53) packages for R. To extend the exploratory analysis, we performed a principal component analysis (PCA) to identify possible clusters of genomes, using the factoextra package (54) for R.
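The branching between parametric and non-parametric tests, and the Benjamini-Hochberg step, can be written out explicitly. The Python sketch below illustrates the logic described above and is not the authors' R code. The BH function follows the same procedure as R's `p.adjust(method = "BH")`. Whether normality was assessed per group or on model residuals is not stated, so `choose_tests` simply takes the two P-values as given.

```python
from typing import List


def choose_tests(shapiro_p: float, levene_p: float, alpha: float = 0.05) -> str:
    """Pick the test family from the normality and equal-variance P-values."""
    if shapiro_p > alpha and levene_p > alpha:
        return "one-way ANOVA + Tukey HSD"
    return "Kruskal-Wallis + pairwise Wilcoxon (BH-adjusted)"


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(choose_tests(0.20, 0.60))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

The running minimum enforces monotonicity, so a smaller raw P-value never receives a larger adjusted value than a bigger one.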

Genome colinearity analysis

To assess the conservation of gene organization between genomes, we performed a synteny analysis of genomic blocks. Block prediction and comparisons between viruses were performed using Mauve (55) and BRIG (56). Three genomes from each host-based group were randomly chosen (see Table S2 at https://doi.org/10.6084/m9.figshare.25822657), except that PBCV-1 and OSy-NE5 were deliberately included as the references for the NC64A and OSy groups, respectively. For Mauve, all 12 genomes were used simultaneously in a multiple alignment. In addition, dot plots were generated using BLAST.
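The selection rule (three genomes per host-based group, with a fixed reference where one is designated) can be sketched in Python as a seeded random draw. The function name, the group labels, and the non-reference genome names below are hypothetical. The seed is an assumption added for reproducibility, and the text does not say how the random choice was made.

```python
import random
from typing import Dict, List, Optional


def pick_genomes(groups: Dict[str, List[str]], references: Dict[str, str],
                 k: int = 3, seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Pick k genomes per host-based group, always keeping that group's reference."""
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    picked: Dict[str, List[str]] = {}
    for group, members in groups.items():
        ref = references.get(group)
        pool = [m for m in members if m != ref]
        fixed = [ref] if ref else []
        picked[group] = fixed + rng.sample(pool, k - len(fixed))
    return picked


# Hypothetical group membership; only PBCV-1 and OSy-NE5 come from the text.
groups = {
    "NC64A": ["PBCV-1", "n1", "n2", "n3"],
    "NIES-2540": ["a1", "a2", "a3"],
    "NIES-2541": ["b1", "b2", "b3", "b4"],
    "OSy": ["OSy-NE5", "o1", "o2"],
}
refs = {"NC64A": "PBCV-1", "OSy": "OSy-NE5"}
print(pick_genomes(groups, refs, seed=1))
```

With four groups and k = 3, the draw yields the 12 genomes used for the Mauve alignment.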

Viral pan-genome construction and visualization

To assess the size of the chlorovirus pan-genome, we used OrthoFinder (57) to group the predicted protein files and infer clusters of orthologous groups (COGs), with an MCL inflation parameter of 4 (58). OrthoFinder was run progressively, treating viruses infecting the same host as single groups, in the order NC64A and then Only-Syngen. The NIES-2540 and NIES-2541 groups were merged with NC64A because their host ranges are the same. To visualize overall COG sharing among the C. variabilis-infecting viruses, we used Gephi (59) with the ForceAtlas2 algorithm to generate the network layout. Pan-genome evolution was plotted using GraphPad Prism 9.
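The bipartite network in Fig. 7A links each non-singleton COG to the clades whose genomes contain it. The minimal Python sketch below assumes a hypothetical mapping from each COG to the genome of every member gene (as could be derived from OrthoFinder's orthogroup table) and a genome-to-clade mapping. It writes a `Source,Target` edge table, the column naming that Gephi's spreadsheet import recognizes, and counts core versus clade-exclusive COGs. It is not the authors' script, and the data are invented.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Set


def cog_clades(cog_members: Dict[str, List[str]], clade_of: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each non-singleton COG to the clades whose genomes contain it.

    `cog_members` lists, for each COG, the genome of every member gene, so a
    COG with a single entry is a singleton and is skipped.
    """
    return {cog: {clade_of[g] for g in genomes}
            for cog, genomes in cog_members.items() if len(genomes) > 1}


def gephi_edge_csv(cog_to_clades: Dict[str, Set[str]]) -> str:
    """Edge table with Source/Target columns for Gephi's spreadsheet import."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for cog in sorted(cog_to_clades):
        for clade in sorted(cog_to_clades[cog]):
            writer.writerow([clade, cog])
    return buf.getvalue()


def partition_counts(cog_to_clades: Dict[str, Set[str]]) -> Counter:
    """Count COGs shared by both clades (core) versus exclusive to one clade."""
    return Counter("+".join(sorted(clades)) for clades in cog_to_clades.values())


# Hypothetical data: g1 and g2 in the NC64A-like clade, g3 in the OSy clade.
members = {"COG1": ["g1", "g3"], "COG2": ["g3", "g3"], "COG3": ["g1"], "COG4": ["g2", "g1"]}
clade_of = {"g1": "NC64A", "g2": "NC64A", "g3": "OSy"}
edges = cog_clades(members, clade_of)
print(gephi_edge_csv(edges))
print(partition_counts(edges))
```

Applied to the full data set, the three counts would correspond to the 514 core, 83 OSy-only, and 182 other-only COGs in the Fig. 7 legend. Note that a COG with two paralogs in one genome (COG2 here) is not a singleton.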

Nucleotide identity

To calculate the average nucleotide identity among the 68 chloroviruses studied here, we used FastANI (60), hosted on the European Galaxy server (usegalaxy.eu). The tabular output generated by the tool was converted into a similarity matrix using an in-house script. This matrix was used as input to construct the heatmap with an in-house script written with the numpy (61), seaborn (62), and matplotlib (63) packages for Python.
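The conversion from FastANI's tabular output to a square matrix might look like the Python sketch below. It is a stand-in for the in-house script, which is not described in detail. It assumes FastANI's five tab-separated columns (query, reference, ANI, bidirectional fragment mappings, total query fragments). It also assumes that reciprocal comparisons are averaged, that the diagonal is set to 100%, and that pairs FastANI did not report (typically very divergent genomes) are left empty. The file names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Optional, Tuple


def fastani_matrix(lines: Iterable[str]) -> Tuple[List[str], List[List[Optional[float]]]]:
    """Build a symmetric ANI matrix from FastANI tab-separated output.

    Expected columns: query, reference, ANI, bidirectional fragment mappings,
    total query fragments. Reciprocal values (A vs B, B vs A) are averaged;
    pairs that FastANI did not report are left as None.
    """
    values: Dict[Tuple[str, str], List[float]] = {}
    names = set()
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        query, ref, ani = row[0], row[1], float(row[2])
        names.update((query, ref))
        if query != ref:
            key = (min(query, ref), max(query, ref))
            values.setdefault(key, []).append(ani)
    order = sorted(names)
    matrix: List[List[Optional[float]]] = []
    for a in order:
        row_values: List[Optional[float]] = []
        for b in order:
            if a == b:
                row_values.append(100.0)  # assumption: self-identity set to 100%
                continue
            hits = values.get((min(a, b), max(a, b)))
            row_values.append(sum(hits) / len(hits) if hits else None)
        matrix.append(row_values)
    return order, matrix


# Hypothetical FastANI rows.
rows = ["A.fna\tB.fna\t96.0\t100\t110", "B.fna\tA.fna\t95.0\t98\t105", "A.fna\tC.fna\t78.5\t20\t110"]
names, matrix = fastani_matrix(rows)
print(names)
print(matrix)
```

Before plotting, the `None` entries would need to become `float("nan")` so that a heatmap function can mask them.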

ACKNOWLEDGMENTS

We thank our colleagues from Laboratório de Vírus - ICB/UFMG for technical support and theoretical discussions about the methods and results. Special thanks go to Professor José Miguel Ortega for helping us use the SAGARANA server, an essential analysis tool for the entire ICB community; to Professor Renan Pedra de Souza, for helping us with the statistical analysis; and to Professors Betânia Paiva Drummond and Francisco Pereira Lobo, for evaluating the project during its proposal to the Graduate Program in Microbiology - ICB/UFMG.

We acknowledge financial support from Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG) under grant APQ-01057-23, Pró-Reitorias de Pesquisa (PRPq) and Pós-Graduação (PRPG) of UFMG, National Science Foundation under grant 1736030 (D.D.D. and J.L.V.E.), the University of Nebraska-Lincoln Agricultural Research Division (J.L.V.E.), the University of Nebraska-Lincoln Office of Research and Economic Development (D.D.D.), University of Nebraska-Collaboration Initiative (D.D.D.), and Algal Virus Research Funds from the University of Nebraska Foundation (J.L.V.E.). We also acknowledge financial support from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and FAPEMIG for providing the graduate scholarships necessary to carry out this work. R.A.L.R. is a CNPq researcher.

Contributor Information

David D. Dunigan, Email: ddunigan2@unl.edu.

Rodrigo A. L. Rodrigues, Email: rodriguesral07@gmail.com.

Kristin N. Parent, Michigan State University, East Lansing, Michigan, USA

DATA AVAILABILITY

All new virus genomes described in this study are available in the NCBI GenBank database under the accession numbers PP681862-PP681914. PacBio raw reads are included in BioProject PRJNA1154233.

REFERENCES

  • 1. Van Etten JL, Dunigan DD. 2012. Chloroviruses: not your everyday plant virus. Trends Plant Sci 17:1–8. doi: 10.1016/j.tplants.2011.10.005
  • 2. Siegel RW. 1960. Hereditary endosymbiosis in Paramecium bursaria. Exp Cell Res 19:239–252. doi: 10.1016/0014-4827(60)90005-7
  • 3. Bubeck JA, Pfitzner AJP. 2005. Isolation and characterization of a new type of chlorovirus that infects an endosymbiotic Chlorella strain of the heliozoon Acanthocystis turfacea. J Gen Virol 86:2871–2877. doi: 10.1099/vir.0.81068-0
  • 4. Meints RH, Van Etten JL, Kuczmarski D, Lee K, Ang B. 1981. Viral infection of the symbiotic chlorella-like alga present in Hydra viridis. Virology 113:698–703. doi: 10.1016/0042-6822(81)90198-7
  • 5. Quispe CF, Esmael A, Sonderman O, McQuinn M, Agarkova I, Battah M, Duncan GA, Dunigan DD, Smith TPL, De Castro C, Speciale I, Ma F, Van Etten JL. 2017. Characterization of a new chlorovirus type with permissive and non-permissive features on phylogenetically related algal strains. Virology 500:103–113. doi: 10.1016/j.virol.2016.10.013
  • 6. Van Etten JL, Burbank DE, Kuczmarski D, Meints RH. 1983. Virus infection of culturable chlorella-like algae and development of a plaque assay. Science 219:994–996. doi: 10.1126/science.219.4587.994
  • 7. Reisser W, Burbank DE, Meints SM, Meints RH, Becker B, Van Etten JL. 1988. A comparison of viruses infecting two different Chlorella-like green algae. Virology 167:143–149. doi: 10.1016/0042-6822(88)90063-3
  • 8. Fitzgerald LA, Graves MV, Li X, Feldblyum T, Nierman WC, Van Etten JL. 2007. Sequence and annotation of the 369-kb NY-2A and the 345-kb AR158 viruses that infect Chlorella NC64A. Virology 358:472–484. doi: 10.1016/j.virol.2006.08.033
  • 9. Fitzgerald LA, Graves MV, Li X, Hartigan J, Pfitzner AJP, Hoffart E, Van Etten JL. 2007. Sequence and annotation of the 288-kb ATCV-1 virus that infects an endosymbiotic chlorella strain of the heliozoon Acanthocystis turfacea. Virology 362:350–361. doi: 10.1016/j.virol.2006.12.028
  • 10. Van Etten JL, Agarkova IV, Dunigan DD. 2019. Chloroviruses. Viruses 12:20. doi: 10.3390/v12010020
  • 11. Rodrigues RAL, Queiroz VF, Ghosh J, Dunigan DD, Van Etten JL. 2022. Functional genomic analyses reveal an open pan-genome for the chloroviruses and a potential for genetic innovation in new isolates. J Virol 96:e0136721. doi: 10.1128/JVI.01367-21
  • 12. Bosi E, Donati B, Galardini M, Brunetti S, Sagot M-F, Lió P, Crescenzi P, Fani R, Fondi M. 2015. MeDuSa: a multi-draft based scaffolder. Bioinformatics 31:2443–2451. doi: 10.1093/bioinformatics/btv171
  • 13. Jeanniard A, Dunigan DD, Gurnon JR, Agarkova IV, Kang M, Vitek J, Duncan G, McClung OW, Larsen M, Claverie J-M, Van Etten JL, Blanc G. 2013. Towards defining the chloroviruses: a genomic journey through a genus of large DNA viruses. BMC Genomics 14:158. doi: 10.1186/1471-2164-14-158
  • 14. Yutin N, Wolf YI, Raoult D, Koonin EV. 2009. Eukaryotic large nucleo-cytoplasmic DNA viruses: clusters of orthologous genes and reconstruction of viral genome evolution. Virol J 6:223. doi: 10.1186/1743-422X-6-223
  • 15. Yanai-Balser GM, Duncan GA, Eudy JD, Wang D, Li X, Agarkova IV, Dunigan DD, Van Etten JL. 2010. Microarray analysis of Paramecium bursaria chlorella virus 1 transcription. J Virol 84:532–542. doi: 10.1128/JVI.01698-09
  • 16. Blanc G, Mozar M, Agarkova IV, Gurnon JR, Yanai-Balser G, Rowe JM, Xia Y, Riethoven J-J, Dunigan DD, Van Etten JL. 2014. Deep RNA sequencing reveals hidden features and dynamics of early gene transcription in Paramecium bursaria chlorella virus 1. PLOS ONE 9:e90989. doi: 10.1371/journal.pone.0090989
  • 17. Shao Q, Agarkova IV, Noel EA, Dunigan DD, Liu Y, Wang A, Guo M, Xie L, Zhao X, Rossmann MG, Van Etten JL, Klose T, Fang Q. 2022. Near-atomic, non-icosahedrally averaged structure of giant virus Paramecium bursaria chlorella virus 1. Nat Commun 13:6476. doi: 10.1038/s41467-022-34218-4
  • 18. Retel C, Kowallik V, Becks L, Feulner PGD. 2022. Strong selection and high mutation supply characterize experimental Chlorovirus evolution. Virus Evol 8:veac003. doi: 10.1093/ve/veac003
  • 19. Esmael A, Agarkova IV, Dunigan DD, Zhou Y, Van Etten JL. 2023. Viral DNA accumulation regulates replication efficiency of Chlorovirus OSy-NE5 in two closely related Chlorella variabilis strains. Viruses 15:1341. doi: 10.3390/v15061341
  • 20. Seitzer P, Jeanniard A, Ma F, Van Etten JL, Facciotti MT, Dunigan DD. 2018. Gene gangs of the chloroviruses: conserved clusters of collinear monocistronic genes. Viruses 10:576. doi: 10.3390/v10100576
  • 21. Simmonds P, Adriaenssens EM, Zerbini FM, Abrescia NGA, Aiewsakun P, Alfenas-Zerbini P, Bao Y, Barylski J, Drosten C, Duffy S, et al. 2023. Four principles to establish a universal virus taxonomy. PLOS Biol 21:e3001922. doi: 10.1371/journal.pbio.3001922
  • 22. Boyer M, Yutin N, Pagnier I, Barrassi L, Fournous G, Espinosa L, Robert C, Azza S, Sun S, Rossmann MG, Suzan-Monti M, La Scola B, Koonin EV, Raoult D. 2009. Giant marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms. Proc Natl Acad Sci U S A 106:21848–21853. doi: 10.1073/pnas.0911354106
  • 23. Legendre M, Fabre E, Poirot O, Jeudy S, Lartigue A, Alempic J-M, Beucher L, Philippe N, Bertaux L, Christo-Foroux E, Labadie K, Couté Y, Abergel C, Claverie J-M. 2018. Diversity and evolution of the emerging pandoraviridae family. Nat Commun 9:2285. doi: 10.1038/s41467-018-04698-4
  • 24. Machado TB, Picorelli ACR, de Azevedo BL, de Aquino ILM, Queiroz VF, Rodrigues RAL, Araújo JP Jr, Ullmann LS, Dos Santos TM, Marques RE, Guimarães SL, Andrade A, Gularte JS, Demoliner M, Filippi M, Pereira V, Spilki FR, Krupovic M, Aylward FO, Del-Bem L-E, Abrahão JS. 2023. Gene duplication as a major force driving the genome expansion in some giant viruses. J Virol 97:e0130923. doi: 10.1128/jvi.01309-23
  • 25. Sheng Y, Wu Z, Xu S, Wang Y. 2022. Isolation and identification of a large green alga virus (Chlorella virus XW01) of Mimiviridae and its virophage (Chlorella virus virophage SW01) by using unicellular green algal cultures. J Virol 96:e0211421. doi: 10.1128/jvi.02114-21
  • 26. Speciale I, Notaro A, Abergel C, Lanzetta R, Lowary TL, Molinaro A, Tonetti M, Van Etten JL, De Castro C. 2022. The astounding world of glycans from giant viruses. Chem Rev 122:15717–15766. doi: 10.1021/acs.chemrev.2c00118
  • 27. International Committee on Taxonomy of Viruses: ICTV. 2023. https://ictv.global/about/code.
  • 28. Sun T-W, Yang C-L, Kao T-T, Wang T-H, Lai M-W, Ku C. 2020. Host range and coding potential of eukaryotic giant viruses. Viruses 12:1337. doi: 10.3390/v12111337
  • 29. Meyer JR, Dobias DT, Medina SJ, Servilio L, Gupta A, Lenski RE. 2016. Ecological speciation of bacteriophage lambda in allopatry and sympatry. Science 354:1301–1304. doi: 10.1126/science.aai8446
  • 30. Longdon B, Brockhurst MA, Russell CA, Welch JJ, Jiggins FM. 2014. The evolution and genetics of virus host shifts. PLOS Pathog 10:e1004395. doi: 10.1371/journal.ppat.1004395
  • 31. Hayford B, Baker D. 2012. Lakes of the Nebraska sandhills, p 26–30. In Lake line
  • 32. Van Etten JL, Lane LC, Meints RH. 1991. Viruses and viruslike particles of eukaryotic algae. Microbiol Rev 55:586–620. doi: 10.1128/mr.55.4.586-620.1991
  • 33. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. 2017. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 27:722–736. doi: 10.1101/gr.215087.116
  • 34. Agarkova IV, Dunigan DD, Van Etten JL. 2006. Virion-associated restriction endonucleases of chloroviruses. J Virol 80:8114–8123. doi: 10.1128/JVI.00486-06
  • 35. Aylward FO, Moniruzzaman M, Ha AD, Koonin EV. 2021. A phylogenomic framework for charting the diversity and evolution of giant viruses. PLOS Biol 19:e3001430. doi: 10.1371/journal.pbio.3001430
  • 36. Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680. doi: 10.1093/nar/22.22.4673
  • 37. Tamura K, Stecher G, Kumar S. 2021. MEGA11: molecular evolutionary genetics analysis version 11. Mol Biol Evol 38:3022–3027. doi: 10.1093/molbev/msab120
  • 38. Chernomor O, von Haeseler A, Minh BQ. 2016. Terrace aware data structure for phylogenomic inference from supermatrices. Syst Biol 65:997–1008. doi: 10.1093/sysbio/syw037
  • 39. Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. 2017. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods 14:587–589. doi: 10.1038/nmeth.4285
  • 40. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R. 2020. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol 37:1530–1534. doi: 10.1093/molbev/msaa015
  • 41. Le SQ, Gascuel O. 2008. An improved general amino acid replacement matrix. Mol Biol Evol 25:1307–1320. doi: 10.1093/molbev/msn067
  • 42. Gu X, Fu YX, Li WH. 1995. Maximum likelihood estimation of the heterogeneity of substitution rate among nucleotide sites. Mol Biol Evol 12:546–557. doi: 10.1093/oxfordjournals.molbev.a040235
  • 35. Aylward FO, Moniruzzaman M, Ha AD, Koonin EV. 2021. A phylogenomic framework for charting the diversity and evolution of giant viruses. PLOS Biol 19:e3001430. doi: 10.1371/journal.pbio.3001430 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680. doi: 10.1093/nar/22.22.4673 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Tamura K, Stecher G, Kumar S. 2021. MEGA11: molecular evolutionary genetics analysis version 11. Mol Biol Evol 38:3022–3027. doi: 10.1093/molbev/msab120 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Chernomor O, von Haeseler A, Minh BQ. 2016. Terrace aware data structure for phylogenomic inference from supermatrices. Syst Biol 65:997–1008. doi: 10.1093/sysbio/syw037 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. 2017. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods 14:587–589. doi: 10.1038/nmeth.4285 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R. 2020. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol 37:1530–1534. doi: 10.1093/molbev/msaa015 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Le SQ, Gascuel O. 2008. An improved general amino acid replacement matrix. Mol Biol Evol 25:1307–1320. doi: 10.1093/molbev/msn067 [DOI] [PubMed] [Google Scholar]
  • 42. Gu X, Fu YX, Li WH. 1995. Maximum likelihood estimation of the heterogeneity of substitution rate among nucleotide sites. Mol Biol Evol 12:546–557. doi: 10.1093/oxfordjournals.molbev.a040235 [DOI] [PubMed] [Google Scholar]
  • 43. Letunic I, Bork P. 2021. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res 49:W293–W296. doi: 10.1093/nar/gkab301 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Besemer J, Lomsadze A, Borodovsky M. 2001. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res 29:2607–2618. doi: 10.1093/nar/29.12.2607 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Chan PP, Lowe TM. 2019. tRNAscan-SE: searching for tRNA genes in genomic sequences, p 1–14. In Kollmar M (ed), Gene prediction: methods and protocols. Springer, New York.
  • 46. Laslett D, Canback B. 2004. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res 32:11–16. doi: 10.1093/nar/gkh152
  • 47. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403–410. doi: 10.1016/S0022-2836(05)80360-2
  • 48. Zimmermann L, Stephens A, Nam S-Z, Rau D, Kübler J, Lozajic M, Gabler F, Söding J, Lupas AN, Alva V. 2018. A completely reimplemented MPI bioinformatics toolkit with a new HHpred server at its core. J Mol Biol 430:2237–2243. doi: 10.1016/j.jmb.2017.12.007
  • 49. Gabler F, Nam S-Z, Till S, Mirdita M, Steinegger M, Söding J, Lupas AN, Alva V. 2020. Protein sequence analysis using the MPI bioinformatics toolkit. Curr Protoc Bioinformatics 72:e108. doi: 10.1002/cpbi.108
  • 50. Paysan-Lafosse T, Blum M, Chuguransky S, Grego T, Pinto BL, Salazar GA, Bileschi ML, Bork P, Bridge A, Colwell L, et al. 2023. InterPro in 2022. Nucleic Acids Res 51:D418–D427. doi: 10.1093/nar/gkac993
  • 51. R Foundation for Statistical Computing. 2022. R: a language and environment for statistical computing. Vienna, Austria.
  • 52. Wickham H. 2016. ggplot2: elegant graphics for data analysis. Springer, New York.
  • 53. Ahlmann-Eltze C, Patil I. 2024. ggsignif: R package for displaying significance brackets for “ggplot2.” doi: 10.31234/osf.io/7awm6
  • 54. Kassambara A, Mundt F. 2020. factoextra: extract and visualize the results of multivariate data analyses. https://rpkgs.datanovia.com/factoextra/
  • 55. Darling ACE, Mau B, Blattner FR, Perna NT. 2004. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 14:1394–1403. doi: 10.1101/gr.2289704
  • 56. Alikhan N-F, Petty NK, Ben Zakour NL, Beatson SA. 2011. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genomics 12:402. doi: 10.1186/1471-2164-12-402
  • 57. Emms DM, Kelly S. 2019. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20:238. doi: 10.1186/s13059-019-1832-y
  • 58. Waite DW, Liefting L, Delmiglio C, Chernyavtseva A, Ha HJ, Thompson JR. 2022. Development and validation of a bioinformatic workflow for the rapid detection of viruses in biosecurity. Viruses 14:2163. doi: 10.3390/v14102163
  • 59. Bastian M, Heymann S, Jacomy M. 2009. Gephi: an open source software for exploring and manipulating networks. ICWSM 3:361–362. doi: 10.1609/icwsm.v3i1.13937
  • 60. Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. 2018. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun 9:5114. doi: 10.1038/s41467-018-07641-9
  • 61. Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, Wieser E, Taylor J, Berg S, Smith NJ, et al. 2020. Array programming with NumPy. Nature 585:357–362. doi: 10.1038/s41586-020-2649-2
  • 62. Waskom ML. 2021. Seaborn: statistical data visualization. JOSS 6:3021. doi: 10.21105/joss.03021
  • 63. Hunter JD. 2007. Matplotlib: a 2D graphics environment. Comput Sci Eng 9:90–95. doi: 10.1109/MCSE.2007.55

Data Availability Statement

All new virus genomes described in this study are available in the NCBI GenBank database under the accession numbers PP681862-PP681914. PacBio raw reads are included in BioProject PRJNA1154233.
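As a minimal sketch, assuming Python 3.9+ and the public NCBI E-utilities `efetch` endpoint, the accession range above can be expanded into individual GenBank records. The helper names `expand_accession_range` and `efetch_fasta_url` are hypothetical and are not part of the study's workflow. The script only builds a request URL and does not contact NCBI. The expanded range contains 53 accessions, one per new chlorovirus isolate described in this study.

```python
import re


def expand_accession_range(first: str, last: str) -> list[str]:
    """Expand an inclusive GenBank accession range (e.g. PP681862-PP681914)
    into the individual accessions it covers."""
    pattern = re.compile(r"^([A-Z]+)(\d+)$")
    m1, m2 = pattern.match(first), pattern.match(last)
    if not (m1 and m2) or m1.group(1) != m2.group(1):
        raise ValueError("both ends must share the same letter prefix")
    prefix = m1.group(1)
    width = len(m1.group(2))  # preserve zero padding of the numeric part
    start, end = int(m1.group(2)), int(m2.group(2))
    if start > end:
        raise ValueError("range start must not exceed range end")
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


def efetch_fasta_url(accessions: list[str]) -> str:
    """Build (but do not send) an NCBI efetch URL returning nucleotide FASTA."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
    return f"{base}?db=nuccore&id={','.join(accessions)}&rettype=fasta&retmode=text"


accessions = expand_accession_range("PP681862", "PP681914")
print(len(accessions))  # 53
print(efetch_fasta_url(accessions[:2]))
```

Splitting on the letter prefix rather than slicing a fixed number of characters keeps the helper usable for other GenBank prefixes of different lengths.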


Articles from Journal of Virology are provided here courtesy of American Society for Microbiology (ASM)
