Proceedings of the National Academy of Sciences of the United States of America. 2013 Jul 15;110(31):12798–12803. doi: 10.1073/pnas.1305956110

Twelve previously unknown phage genera are ubiquitous in global oceans

Karin Holmfeldt a,1,2, Natalie Solonenko a, Manesh Shah b, Kristen Corrier b, Lasse Riemann c, Nathan C VerBerkmoes b,3, Matthew B Sullivan a,1
PMCID: PMC3732932  PMID: 23858439

Abstract

Viruses are fundamental to ecosystems ranging from oceans to humans, yet our ability to study them is bottlenecked by the lack of ecologically relevant isolates, resulting in “unknowns” dominating culture-independent surveys. Here we present genomes from 31 phages infecting multiple strains of the aquatic bacterium Cellulophaga baltica (Bacteroidetes) to provide data for an underrepresented and environmentally abundant bacterial lineage. Comparative genomics delineated 12 phage groups that (i) each represent a new genus, and (ii) represent one novel and four well-known viral families. This diversity contrasts the few well-studied marine phage systems, but parallels the diversity of phages infecting human-associated bacteria. Although all 12 Cellulophaga phages represent new genera, the podoviruses and icosahedral, nontailed ssDNA phages were exceptional, with genomes up to twice as large as those previously observed for each phage type. Structural novelty was also substantial, requiring experimental phage proteomics to identify 83% of the structural proteins. The presence of uncommon nucleotide metabolism genes in four genera likely underscores the importance of scavenging nutrient-rich molecules as previously seen for phages in marine environments. Metagenomic recruitment analyses suggest that these particular Cellulophaga phages are rare and may represent a first glimpse into the phage side of the rare biosphere. However, these analyses also revealed that these phage genera are widespread, occurring in 94% of 137 investigated metagenomes. Together, this diverse and novel collection of phages identifies a small but ubiquitous fraction of unknown marine viral diversity and provides numerous environmentally relevant phage–host systems for experimental hypothesis testing.

Keywords: model systems, phage genomics, phage taxonomy, queuosine biosynthesis, prophage


Microbes are the main drivers of the planet’s biogeochemical cycles (1), and their viruses (phages) play important roles, ranging from mortality and nutrient cycling to gene transfer (reviewed in ref. 2). However, our knowledge of phage biology, ecology, and evolution is biased toward phages that infect only a few hosts. For example, 85% of 1,100 sequenced phage genomes in GenBank are isolated by using bacteria from only three of 45 known bacterial phyla (Actinobacteria, Firmicutes, and Proteobacteria of the class Gammaproteobacteria), predominantly involved in human diseases and food processing. In contrast, with the exception of phages infecting cyanobacteria (cyanophages), phages that infect environmental microbes are largely unstudied and unknown. This lack of genomic representation results in unidentified DNA sequences accounting for ∼90% of the sequences in nearly any viral metagenome, even when using technologies that provide longer read lengths (3), which hinders inference power in viral ecology.

One bacterial phylum whose phages are underexplored is Bacteroidetes, which is currently represented by the genomes of only six phages in GenBank, isolated from both aquatic and human-related environments (e.g., refs. 4, 5). Bacteroidetes bacteria are abundant and active members of bacterial communities in various habitats ranging from Antarctic soil (6) to surface (7) and deep (8) oceans and even the human gut. In humans, Bacteroidetes comprise 30% of the gut microbiota and play important roles for fat storage (9) and the immune system (10). In the oceans, Bacteroidetes is the third most abundant bacterial phylum (7, 8), and there these bacteria are active in degrading biopolymers (11) and involved in recycling of phytoplankton bloom-related organic matter (12).

Here we present 31 genomes and 13 representative structural proteomes of Bacteroidetes phages isolated by using 17 Cellulophaga host strains. This genomic and proteomic information helps define Cellulophaga phage taxonomy, diversity, and functional potential, and metagenomic comparisons elucidate their distribution in natural aquatic systems.

Results and Discussion

Cellulophaga Phages Represent at Least 12 Novel Phage Genera.

Genomes were sequenced from 31 phages isolated from the strait of Öresund, between Sweden and Denmark, using 17 closely related Cellulophaga baltica host strains [99.5–100% 16S rRNA gene identity (13)]. The genomes ranged in size from 6.5 to 145 kb with a G+C content of 29% to 38% (summarized in Table 1 and detailed in Table S1). As is common for environmental phages (e.g., refs. 4, 14), few (average, 39%; range, 23–53%) predicted ORFs matched anything in databases, with only 3% to 39% (average, 22%) functionally annotated beyond “hypothetical” protein (Table 1 and Table S1; full annotation details in Dataset S1). Few structural proteins could be identified based on sequence data alone: 83% of the 192 proteins identified through virion structural MS-based proteomics lacked any sequence-based similarity to known structural proteins (Dataset S1). Per genome, this allowed 8 to 27 ORFs to be annotated as “structural,” compared with zero to nine ORFs based on sequence similarity alone (Dataset S1). This clearly delineated each genome’s structural module (example in Fig. S1), as observed for other environmental phages (e.g., ref. 4). MS also identified that proteins with lytic activity (based on sequence similarity) were present in the structural particle of seven of the investigated phages (Dataset S1), whose function could be to aid penetration of the bacterial cell wall upon entry (e.g., ref. 15).

Table 1.

General features of Cellulophaga phage genera

| Group | Type phage | Family | Putative genus | DNA type | Genome size, kb | G+C, % | ORFs | tRNA | ORFs with hits to databases, % | ORFs with function, % | ORFs with proteomic data |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ϕ40:1 | Podoviridae | Cba401likevirus | dsDNA | 72.5 | 38 | 101 | 16 | 35 | 15 | 14 |
| 2 | ϕ18:3 | Podoviridae | Cba183likevirus | dsDNA | 73.2 | 33 | 125 | 1 | 41 | 14 | 12 |
| 3 | ϕ14:2 | Podoviridae | Cba142likevirus | dsDNA | 100.4 | 30 | 133 | | 26 | 15 | 19 |
| 4 | ϕ4:1 | Podoviridae | Cba41likevirus | dsDNA | 145.7 | 33 | 198 | 24 | 32 | 14 | 27 |
| 5 | ϕSM | Myoviridae | CbaSMlikevirus | dsDNA | 54.2 | 33 | 80 | | 41 | 29 | 22 |
| 6 | ϕ39:1 | Siphoviridae | Cba391likevirus | dsDNA | 28.8 | 31 | 49 | | 43 | 22 | 14 |
| 7 | ϕ46:1 | Siphoviridae | Cba461likevirus | dsDNA | 34.8 | 38 | 54 | | 52 | 28 | 8 |
| 8 | ϕ18:1 | Siphoviridae | Cba181likevirus | dsDNA | 38.9 | 37 | 64 | | 52 | 37 | 10 |
| 9 | ϕ10:1 | Siphoviridae | Cba101likevirus | dsDNA | 55.6 | 31 | 112 | 3 | 33 | 16 | 12 |
| 10 | ϕ13:1 | Siphoviridae | Cba131likevirus | dsDNA | 77.8 | 30 | 107 | | 40 | 24 | 15 |
| 11 | ϕ18:4 | Microviridae | Cba184likevirus | ssDNA | 6.5 | 34 | 13 | | 23 | 15 | 10 |
| 12 | ϕ48:2 | Novel | Cba482likevirus | ssDNA | 11.5 | 29 | 30 | | 27 | 3 | 17 |

Each row represents the average data for each genus. Group numbers refer to numbers used for each genus in Figs. 1 and 5.

Comparative genomics delineated 12 groups (Fig. 1A), which, using current taxonomic metrics whereby phages within a genus share 40% of their genes (16), represent 12 new viral genera with >40% of the genes shared within genera and 0% to 18% of the genes shared between genera. The genera are named following International Committee on Taxonomy of Viruses conventions that use the name of the host bacterium (Cba for C. baltica) and type phage, where we remove the “:” from the type phage name (Table 1 and Table S1). Although most genera shared >65% of their genes, the two phages in genus Cba101 were more divergent, sharing only 41% of their genes and at lower percent identity (average of 82% aa identity, vs. 97% for the other genera), and are tracked specifically here as Cba101a (ϕ10:1) and Cba101b (ϕ19:1; Fig. S1B).
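The genus criterion described above can be illustrated with a short sketch. It assumes that BLASTP results have already been reduced to lists of gene identifiers with a hit in the other genome; the function names, the single-linkage grouping, and the toy percentages are illustrative assumptions and are not the in-house programs used in the study.

```python
from itertools import combinations


def percent_shared(genes_a, genes_with_hit_in_b):
    """Percentage of genes in phage A that have a BLASTP hit in phage B."""
    genes_a = set(genes_a)
    if not genes_a:
        return 0.0
    return 100.0 * len(genes_a & set(genes_with_hit_in_b)) / len(genes_a)


def group_into_genera(shared, phages, threshold=40.0):
    """Single-linkage grouping (an assumption): two phages sharing more than
    `threshold` percent of their genes are placed in the same group.

    `shared` maps frozenset({phage_a, phage_b}) to a percentage.
    """
    parent = {p: p for p in phages}

    def find(p):
        # Follow parent links to the group representative.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phages, 2):
        if shared.get(frozenset((a, b)), 0.0) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for p in phages:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# Toy values only (hypothetical phages X, Y, Z), not data from the study.
toy_shared = {
    frozenset(("X", "Y")): 41.0,
    frozenset(("X", "Z")): 18.0,
    frozenset(("Y", "Z")): 0.0,
}
toy_groups = group_into_genera(toy_shared, ["X", "Y", "Z"])
```

With these toy values, X and Y fall into one group because they share more than 40% of their genes, whereas Z remains separate.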

Fig. 1.

(A) Heat map showing percentage of shared genes between the 31 Cellulophaga phages. Numbers in the boxes indicate the 12 genera delineated by this genome comparison. (B) EM images of representative phages from each genus (affiliation designated by number in the micrograph) delineated from gene comparisons. (Scale bars: 100 nm.)

Morphologically, these 12 genera derive from at least four viral families according to current International Committee on Taxonomy of Viruses taxonomy (17): 10 belonged to dsDNA tailed phage families [order Caudovirales, families Myoviridae (n = 1 genus), Podoviridae (n = 4 genera), and Siphoviridae (n = 5 genera); Fig. 1B], whereas two were previously described as nontailed ssDNA phages (18).

High Diversity and Novelty: Breaking Marine Paradigms.

Cellulophaga phage diversity starkly contrasts the only other extensively sequenced aquatic phage collections, but parallels collections of phages that infect heterotrophic bacteria (Fig. 2). The marine systems, Prochlorococcus and Synechococcus cyanophages, represent fewer (n = 4 and n = 5, respectively) groups per host genus (Fig. 2A), despite more diverse hosts and water samples used to isolate cyanophages compared with Cellulophaga phages. In contrast, the nonmarine Escherichia coli and Pseudomonas aeruginosa phages were as diverse as the Cellulophaga phages (Fig. 2 B and C). Curiously, the nonmarine Mycobacterium phages also show large genomic diversity [15 clusters and seven singletons (19)], but more limited morphological diversity [Siphoviridae and Myoviridae (20)]. Although no genomes were available, similarly high diversity was also reported among 22 marine Pseudoalteromonas phages isolated from the North Sea, where morphology, DNA hybridization, and host range analysis delineated 13 groups (21). One possible explanation for the apparent reduced diversity in cyanophages might be that K-strategist hosts (e.g., cyanobacteria) have relatively invariant host physiology compared with r-strategists [e.g., C. baltica and E. coli (22)]. This could reduce phage diversity via fewer niche opportunities on the former and increase diversity via host-state–dependent phage genera cycling on the latter. Notably, some marine phages that infect heterotrophic bacteria appear less diverse, but this could be an artifact, as these phage collections suffer from a paucity of isolates (23); the use of broad, nonquantitative metrics for delineating groupings [60 phages for Vibrio parahaemolyticus (24)]; or ascertainment bias [e.g., selection in isolation procedures to recover near-identical roseophages (25)].

Fig. 2.

Heat maps showing percentage of shared genes between phages infecting the same bacterial host species. (A) Cyanophages isolated on Prochlorococcus or Synechococcus, (B) E. coli phages, and (C) P. aeruginosa phages. Dashed lines separate Prochlorococcus and Synechococcus phages. Squares enclose phages belonging to the same phage family: I, Inoviridae; L, Leviviridae; M, Myoviridae; Mi, Microviridae; P, Podoviridae; S, Siphoviridae; T, Tectiviridae. Dark areas (large proportion of genes shared) outside of family squares indicate putatively horizontally transferred genes.

Evolutionarily, signatures of horizontal gene transfer were observed in entero- and cyanophages. Among enterophages, phages of different morphotypes (podo- and siphoviruses) shared a large number of genes (Fig. 2B), which complicates phage taxonomy (e.g., refs. 16, 26). Cyanophages, on the contrary, shared genes between phages that were isolated by using different host genera (Fig. 2A). For example, cyanomyoviruses commonly share half of their genes whether isolated within one genus or from two genera. Indeed, exchange of genes, even core genes, between cyanophages that infect across generic boundaries is well known (e.g., ref. 27). However, no such signature, either sharing >50% of their genes between phages of different families or between phages infecting different hosts, was observed for Cellulophaga phages.

Marine cyanophages and roseophages also share genes with nonmarine coliphages (e.g., refs. 28, 29), and again Cellulophaga phages do not. Marine cyanophages share 16 (podoviruses) and 38 (myoviruses) genes with nonmarine enterophages (28), and roseophage DSS3ϕ2 shares 26 genes with enterophage N4 (29). In contrast, most Cellulophaga phages share few genes (average, n = 3; range, n = 1–6) with any non-Cellulophaga phage. The exceptions are Cellulophaga myovirus genus CbaSM (14 shared genes with Vibrio phage VP16T, GenBank accession no. AY328852; Fig. 1A) and siphovirus genera Cba181 and Cba101a (10 and 18 genes shared, respectively, with Flavobacterium phage 11b, GenBank accession no. NC_006356; Fig. 1B). These are both phages of aquatic origin (4, 14) and the host of the latter (Flavobacterium phage 11b) belongs to the same family as C. baltica. In all cases, these shared genes are highly divergent (26–40% aa identity). This highlights that, even though a few Cellulophaga phages share genes with other phages, they are still exceptionally different from known phages, even when comparing the siphoviruses, which represent the bulk (57% per GenBank, accessed December 2012) of sequenced Caudovirales phages.

Phage Giants: Exceptionally Large Cellulophaga Podo- and Nontailed Phages.

Six Cellulophaga phage genera (the myo- and siphoviruses) had genome sizes within the known range, whereas the genome sizes of the other six genera (the podoviruses and nontailed icosahedral phages) were quite different from known representatives (Fig. 3).

Fig. 3.

Genome size comparison of the Cellulophaga phages (colored asterisks) to genome-sequenced phages available in GenBank (accessed December 2012; box plots). Phages within the family Myoviridae have been divided into two groups (larger and smaller than 100 kb) in view of the large range of genome sizes. [Note: a 498-kb myovirus, Bacillus phage G (JN63751), was not included here to minimize white space in the figure.] The box plot of icosahedral ssDNA phages represents phages belonging to Microviridae, the only known nontailed, icosahedral ssDNA phage family. The box represents the lower and upper quartiles with the median marked. The whiskers present 1.5 interquartile range (IQR) from the lower and upper quartiles, respectively; circles are outliers (1.5–3 IQR from the end of the box) and black asterisks are extremes (>3 IQR from the end of the box).
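The box plot conventions in this legend (whiskers at 1.5 IQR, outliers at 1.5–3 IQR, extremes at >3 IQR from the end of the box) can be written as a small classifier. This is a minimal sketch: the function name, the labels, and the reference numbers are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption, since plotting packages differ in how they compute quartiles.

```python
import statistics


def classify_points(values, reference):
    """Label each value relative to the box plot of `reference`.

    Distances are measured from the nearer end of the box (Q1 or Q3).
    """
    q1, _median, q3 = statistics.quantiles(reference, n=4, method="inclusive")
    iqr = q3 - q1
    labels = []
    for v in values:
        # Distance below Q1 or above Q3; zero if the value lies inside the box.
        distance = max(q1 - v, v - q3, 0.0)
        if distance > 3 * iqr:
            labels.append("extreme")
        elif distance > 1.5 * iqr:
            labels.append("outlier")
        else:
            labels.append("within whiskers")
    return labels


# Toy reference sizes in kb (made up for illustration, not GenBank data).
toy_reference = [10, 20, 30, 40, 50]
toy_labels = classify_points([45, 75, 101], toy_reference)
```

For the toy reference, Q1 is 20 and Q3 is 40, so a value of 75 lies 35 units above the box (between 1.5 and 3 IQR) and a value of 101 lies more than 3 IQR above it.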

Podoviruses.

The four Cellulophaga podovirus genera defined through sequence similarity fell into three groups by genome size: 71 to 76 kb (genera Cba401 and Cba183), 100 kb (genus Cba142), and 145 to 146 kb (genus Cba41). Considering that only 7% of previously sequenced podovirus genomes are >70 kb and the largest was only 79 kb (Fig. 3), the Cellulophaga podoviruses are clearly exceptionally large. Not surprisingly, their genomes differed from known podoviruses: only 4% to 10% of genes were similar to podovirus genes; instead, more genes were often similar to those from sipho- and myoviruses, although in all cases with relatively low similarity (averaging 32–33% aa identity).

Numerous other traits highlight the novelty of the Cellulophaga podoviruses. First, although tRNAs occur in ∼20% of previously sequenced podovirus genomes (GenBank, accessed June 2012), their frequency is one to three per genome, which contrasts the 16 to 24 observed for genera Cba401 and Cba41 Cellulophaga podovirus genomes (Table 1 and Table S1). Such phage-encoded tRNA abundances are more characteristic of myoviruses, for which as many as 33 tRNAs are used to expand the phage’s codon use capabilities during infection of diverse hosts (30). Consistent with this, the Cellulophaga podoviruses with several tRNAs (genera Cba401 and Cba41) infect 9 to 15 Cellulophaga strains, whereas those with fewer tRNAs (genera Cba183 and Cba142, 0–1 tRNA) infect only one to four Cellulophaga strains (13). Second, the protein-folding genes chaperonin GroEL or chaperonin Cpn10 occur in all four Cellulophaga podovirus genera (Dataset S1). Such genes (GroEL and Cpn10) have previously been reported in myoviruses (four GroEL and 56 Cpn10 of 284 investigated myoviruses) but are lacking in siphoviruses and podoviruses (GenBank, accessed December 2012). Chaperonins are critical for protein folding, perhaps here involved in folding of phage capsid proteins (31). Why these are shared among all these Cellulophaga podovirus genera is unknown, but may reflect larger-genome podoviruses requiring larger and possibly more complex capsid structures. This would be in agreement with chaperonin-containing myoviruses, all 60 of which have genome sizes >100 kb (GenBank, accessed December 2012). Finally, relatively unique combinations of nucleotide metabolism genes occur in the large podovirus genera Cba142 and Cba41 (details provided later).

Nontailed ssDNA phages.

The nontailed, ssDNA Cellulophaga phages (18) are either slightly larger (genus Cba184 exceeds the largest Microviridae genome by 300 bp) or nearly double the size of known nontailed, icosahedral ssDNA phages (11.7 kb, genus Cba482; Fig. 3). For phages of genus Cba184, none of their 14 predicted genes were similar to Microviridae phage genes or Microviridae-like genes found integrated in Bacteroidetes genomes (32), but functional parallels emerged as follows. First, the largest gene (gp 2) is a DNA replication protein (Dataset S1), as in the ssDNA model phage ϕX174 [the A protein (33)]. Second, the gene gp 8 is in the viral particle (Dataset S1) and annotated as a mannosyl-glycoprotein endo-beta-N-acetylglucosaminidase. This is likely an analogue to the ϕX174 pilot protein (H) (33) and adapted to recognize and penetrate the polysaccharides present in the C. baltica host cell wall (34). Third, the second largest gene (gp 4) garnered the most spectral counts in proteomic analyses (Dataset S1), which parallels the ϕX174 major capsid protein (F) (33). Thus, we posit that, despite being half the size of F, gp 4 is the major capsid protein for the genus Cba184 Cellulophaga phages. Together, our data suggest that these phages represent a divergent subfamily within Microviridae.

In contrast, the genus Cba482 phage represents an entirely new phage family. At 11.7 kb, its genome is the largest among known ssDNA bacteriophages and icosahedral ssDNA viruses but is only half the genome size of the largest known ssDNA virus—a rod-shaped archaeal virus, ACV, with a 25 kb genome (35). Functionally, one gene (gp 19) encoded a peptidase (Dataset S1), whereas another 23% were similar to nonviral database sequences. Among these, four were exclusively similar to Zunongwangia profunda (phylum Bacteroidetes) genes, and, along with a highly divergent but syntenic peptidase, a remnant prophage could be identified in this microbial genome (Fig. 4). Together with turbid plaque morphology (Fig. S2), this suggests that the genus Cba482 phage is temperate (i.e., capable of lysogeny). Although common in filamentous ssDNA phages (e.g., ref. 36), lysogeny is rare in icosahedral ssDNA phages. The only evidence available is that Microviridae-like phage genomes occur in bacterial genomes [Chlamydia (37) and Bacteroidetes (32)], and a nontailed ssDNA phage has been induced from Synechococcus cultures [no phage genomes available (38)]. Interestingly, the latter has a capsid size similar to the genus Cba482 phage.

Fig. 4.

Comparison of Cellulophaga phage ϕ48:2 to the region in Z. profunda (phylum Bacteroidetes) possibly containing a temperate phage (ORF 2,661–2,685). Lines drawn between the genomes represent shared sequence similarity, which is given next to each line as percentage amino acid identity and e-value.

Although nontailed phages are thought uncommon as they are underrepresented in culture collections [e.g., <4% are nontailed among 5,500 isolated phages (39)], recent work shows that they dominate marine viral communities in the global surface oceans (40). Thus, both nontailed phage genera described here offer windows into a dominant marine phage type as they represent the first in culture infecting a Bacteroidetes bacterium, and join only four other marine nontailed phages in culture [two infecting Synechococcus (38, 41), one infecting Pseudoalteromonas (only sequenced; ref. 42), and one infecting a host of unknown taxonomy (43)].

Unusual Phage-Encoded Nucleotide Metabolisms.

Three Cellulophaga phage genera (large podovirus genera Cba142 and Cba41 and large siphovirus genus Cba131) contained genes for de novo nucleotide synthesis including ribonucleoside-diphosphate reductase (RNR) and thymidylate synthase or thymidylate synthase complementing protein (Thy; Tables S2 and S3). These genes are common among myoviruses (>50% of 284 genomes), but less common in siphoviruses (70 [RNR] and 86 [Thy] of 625 genomes; Table S2) and podoviruses (only 12 of 186 genomes have either; GenBank, accessed December 2012; Table S3). Siphoviruses have any of three classes of RNRs, whereas published podoviruses are restricted to class II (11 roseo- and cyanophages) or III (1 Cronobacter phage) RNRs. All Cellulophaga phage RNRs are class I, regardless of morphology. Previous phylogenetic work showed that podovirus-encoded RNR type reflects that of the host, whereas myovirus-encoded RNRs might not (44). Our data, along with additional RNR-containing roseo- and cyanopodovirus and host RNR data obtained from GenBank, support this hypothesis (Table S3). Notably, only marine (roseo- or cyanophages) or large (N4- or phiEco32-like) phages have RNR or Thy genes (Table S4). This suggests that smaller genome phages gain little from the ability to convert available RNA pools into DNA for phage replication (45) except in predominantly phosphorus-limited marine environments.

Besides genes for RNR and Thy, all Cellulophaga phages in genera CbaSM (myovirus) and Cba131 (siphovirus) have queuosine (Que) biosynthesis genes (Dataset S1). Que is a hypermodified guanosine derivative in tRNAs specific for four amino acids (Asp, Asn, His, or Tyr) that increases translation efficiency and is found across all domains of life (46). Que de novo synthesis in prokaryotes requires five genes in the bacterial preQ1 pathway (46), of which five (genus CbaSM) and three (genus Cba131) are present in the Cellulophaga phages (Table 2 and Fig. S3). However, Que is uncommon among phages: (i) only 16 phages have Que genes similar to those in Cellulophaga (Table 2), and (ii) only one phage, Streptococcus phage Dp-1, has a larger number of Que genes [also involved in Que insertion (47)]. Analyses of viral metagenomes, however, suggest that virus-encoded Que genes are broadly distributed in aquatic environments [occurring in 55 of 137 viral metagenomes available as Broad Phage Metagenomes at Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA); Table 2], indicating that Que might be more important for aquatic phages than for phages in other environments.

Table 2.

Summary of Que biosynthesis genes occurring in all phages in genera CbaSM (g5) and Cba131 (g10)

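| Function | Phage homologues (CAMERA), g5 | Phage homologues (CAMERA), g10 | Recruited metagenome reads of phage origin, %, g5 | Recruited metagenome reads of phage origin, %, g10 | Metagenomes containing phage recruits, g5 | Metagenomes containing phage recruits, g10 |
|---|---|---|---|---|---|---|
| GTP cyc I* | 16 | 0 | 0.3 | 84 | 1 | 32 |
| QueD | 1 | 0 | 46 | 90 | 33 | 23 |
| QueC | 16 | NA | 3 | NA | 6 | NA |
| QueE | 16 | 0 | 8 | 88 | 14 | 34 |
| QueF | 3 | NA | 0.9 | NA | 1 | NA |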

The table includes data concerning phage homologues to the Cellulophaga phages Que genes in CAMERA and recruitment of metagenome reads from Broad Phage Metagenome to the Cellulophaga phages Que genes. cyc, cyclohydrolase; NA, not applicable, i.e., gene did not occur in genome.

Global Distribution of the Cellulophaga Phage Types.

Given Cellulophaga phage novelty, their distributions were investigated using recruitment to 137 Broad Phage Metagenomes. Reads were recruited from 17 to 115 (average, 87) metagenomes per phage (details in Fig. 5) with representation from environments ranging from freshwater to marine, coastal to open ocean, and surface water to deep sediments. Curiously, 35% to 52% of genes from the tailed, dsDNA Cellulophaga phages recruited metagenomic reads, whereas only 11% to 14% of the nontailed, ssDNA phage genes did (Fig. 5 and Dataset S1). We posit that nontailed phage recruitment is repressed as a result of artifacts: (i) nontailed phages are commonly lost during CsCl purification as their low buoyant density differs from tailed phages (48), and (ii) DNA extraction methods are not optimized for the Cellulophaga nontailed phages (Methods) (18). Consistent with this hypothesis, a metagenome that targeted ssDNA phages (CAM_SMPL_000841; available on CAMERA) was responsible for most (58%) of the recruited ssDNA, nontailed phage reads.

Fig. 5.

Box plots show the percent amino acid identity for metagenomic reads (all 137 metagenomes available at CAMERA, Broad Phage Metagenome, January 2012) recruiting to predicted genes from C. baltica phages (designated as genera 1–12), as well as three T4-like phages: marine Prochlorococcus phage P-SSM4 (GenBank accession no. NC_006884), marine Vibrio phage KVP40 (GenBank accession no. NC_005083), and nonmarine Enterobacteria phage T4 (GenBank accession no. NC_000866). (A) Cellulophaga phage group (Table 1) or T4-like phage isolate. (B) Gene products to which the metagenomic reads were recruited (as percentages). (C) Number of metagenomes from which the recruited reads originated. (D) Reads with higher bitscore to a Cellulophaga phage than NCBI (as percentages). (E) Reads exclusively recruiting to a Cellulophaga phage (as percentages). (F) Maximum proportion of novel reads identified in a single metagenome (as per mils). The box represents the lower and upper quartiles with the median marked. The whiskers present 1.5 IQR from the lower and upper quartiles, respectively; circles are outliers (1.5–3 IQR from the end of the box) and asterisks are extremes (>3 IQR from the end of the box).

The types of genes that recruited reads and the quality of recruitment were used to better interpret their meaning. First, reads recruited to conserved proteins (e.g., DNA replication or nucleotide metabolism) as well as phage-group–specific, experimentally identified structural proteins and predicted ORFs of unknown function. Although the former may be spurious recruits, indicative of conserved proteins found across many phage groups, the latter are likely bona fide recruits because phage structural proteins rarely share sequence similarity across groups (49). Second, on average, many of the recruited reads were exclusive to the Cellulophaga phages (average, 15%) or had higher bitscore to the Cellulophaga phages (average, 14%) than to anything in GenBank (details in Fig. 5 and Dataset S1). Third, sensitivity analysis with metagenomic recruitment to the T4-like phages suggested that these Cellulophaga phage genera are indeed ubiquitous (Fig. 5). Here, cyanophage P-SSM4, described as abundant in the oceans (50), recruited reads to 84% of its genes, averaging 57% aa identity. In contrast, reads recruited to only 38% of Vibrio phage KVP40 and Enterobacteria phage T4 genes and at a lower average percent identity (42% aa identity), which was similar to that observed for Cellulophaga phages. Many (51% for KVP40 and 68% for T4) of these recruited reads did so redundantly across the T4 phages, likely because of nonspecific recruitment to core T4 genes (Table S4), where the cyanobacterial T4 phage (P-SSM4) was likely the most representative reference genome for these metagenomes. Given such hierarchical recruitment results, we propose that, although the particular Cellulophaga phages investigated here are not abundant in these metagenomes, the phage genera they represent are ubiquitous, as they occur in 94% of 137 investigated aquatic viral metagenomes. However, caution is warranted when making quantitative statements from available metagenomic datasets (reviewed in ref. 51). Future work with the use of emerging viral metagenomic methods (3, 51, 52) to generate quantitative datasets (e.g., ref. 3) will enable better quantification of these and other new viral genera in the global oceans. Given these caveats, however, the novel Cellulophaga phages presented here help identify a small (average maximum of 0.04%; details in Fig. 5) but ubiquitous portion of the vast unknown sequence space that dominates (∼90%) (3) marine viral metagenomes.
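The two novelty categories used above (reads exclusive to a Cellulophaga phage, and reads with a higher bitscore to a Cellulophaga phage than to the reference database) can be sketched as a per-read classification. This is an illustration under stated assumptions: the function names and category labels are hypothetical, ties are counted as better matched elsewhere, and the bitscores are toy values rather than results from the study.

```python
from collections import Counter


def classify_read(phage_bitscore, reference_bitscore):
    """Classify one recruited read.

    `reference_bitscore` is None when the read has no hit in the
    reference database (assumed here to mean "exclusive").
    """
    if reference_bitscore is None:
        return "exclusive"
    if phage_bitscore > reference_bitscore:
        return "higher_to_phage"
    return "better_elsewhere"  # ties are counted here (an assumption)


def novelty_summary(reads):
    """Return the percentage of reads in each category.

    `reads` is a list of (phage_bitscore, reference_bitscore) pairs.
    """
    counts = Counter(classify_read(p, r) for p, r in reads)
    total = len(reads)
    categories = ("exclusive", "higher_to_phage", "better_elsewhere")
    if total == 0:
        return {c: 0.0 for c in categories}
    return {c: 100.0 * counts[c] / total for c in categories}


# Toy bitscores only, not data from the study.
toy_reads = [(50.0, None), (60.0, 40.0), (30.0, 80.0), (45.0, 45.0)]
toy_summary = novelty_summary(toy_reads)
```

In this toy set, one read of four is exclusive, one scores higher to the phage, and two are at least as well matched elsewhere.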

Conclusions

The 31 Cellulophaga phages represent 12 novel genera and together comprise the largest collection of genome-sequenced aquatic phages that infect a single host species. Their novelty, diversity, and ubiquity in the oceans are striking and warrant future structural, ecological, and evolutionary investigations. Although less abundant than the marine pelagi-, roseo-, or cyanophages (23), Cellulophaga phages likely present a first glimpse into the phage side of the “rare biosphere” (reviewed in ref. 53). Although not very abundant, such rare biosphere bacteria are thought to impact nutrient cycling during bloom conditions, with low abundances leading to cryptic escape from virus infections (53). Given the large diversity of Cellulophaga phages, bloom events must be frequent, with viral and microbial population structure likely shaped by microscale heterogeneity (54). These novel phage genera may also be critical in other environments in which Bacteroidetes phylum hosts are abundant, including the human gut (9). Mechanistic and discovery-based exploration of these new viral types will help elucidate yet another aspect of the large virus–host diversity that is increasingly being recognized as important in natural and manmade ecosystems.

Methods

Bacterial strains and phages were isolated as described previously (13). DNA for dsDNA phages was extracted by using the Wizard Genomic DNA Purification Kit (Promega) per the manufacturer’s recommendations. DNA for ssDNA phages was extracted as a dsDNA replicative intermediate during infection. Phages were sequenced by using a combination of 454, Illumina, and Ion Torrent sequencing, and closed as needed by using Sanger sequencing. Because ssDNA phage genomes were nearly completely novel, they were also fully Sanger-sequenced by using CsCl-purified viral particles as PCR template. ORFs were predicted by using GeneMark, then annotated by using BLASTP against the National Center for Biotechnology Information (NCBI) nonredundant (as of November 2012), Conserved Domain, and Pfam databases (e-value cutoff <1 × 10⁻³); tRNAs were identified by using tRNAscan-SE. Percentage of shared genes between the Cellulophaga phages was calculated from BLASTP comparison (e-value cutoff <1 × 10⁻³). Metagenomic reads were recruited to predicted ORFs by using the TBLASTN workflow (cutoffs of e-value <1 × 10⁻³, >20% aa identity, alignment length >45 nt, and 300 alignments per query) in CAMERA against Broad Phage Metagenomes (https://portal.camera.calit2.net/; January 2012). Recruited reads were compared with NCBI nonredundant (e-value cutoff <1 × 10⁻³) to calculate novelty of Cellulophaga phage contributions to each metagenome. Transmission EM was conducted as described previously (18). For proteomics, CsCl-purified viral particles were tryptically digested by using an optimized Filter-Aided Sample Preparation kit protocol (Protein Discovery; now Expedion) (55) and analyzed via 2D nano-LC-MS/MS (56). Resultant MS/MS spectra were searched against a compiled viral predicted protein database with SEQUEST and conservatively filtered with DTASelect (56).
For proteomics, databases, peptide and protein results, MS/MS spectra and Tables S1–S4 are archived and available at https://compbio.ornl.gov/Cellulophaga_phages_proteome. MS .raw files or other extracted formats are available upon request. Phage genome GenBank accession numbers are KC821604 to KC821634. A complete description of materials and methods is provided in SI Methods.
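The read-recruitment cutoffs stated in the Methods (e-value <1 × 10⁻³, >20% aa identity, alignment length >45) can be expressed as a simple filter over alignment records. This is a minimal sketch, not the CAMERA workflow itself: the `Hit` record and its field names are hypothetical, the limit of 300 alignments per query is a search setting and is not modeled, and the example hits are invented.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One read-to-ORF alignment (hypothetical field names)."""
    read_id: str
    orf_id: str
    evalue: float
    percent_identity: float
    alignment_length: int


def passes_cutoffs(hit, max_evalue=1e-3, min_identity=20.0, min_length=45):
    """Apply the three cutoffs; all comparisons are strict, as in the text."""
    return (
        hit.evalue < max_evalue
        and hit.percent_identity > min_identity
        and hit.alignment_length > min_length
    )


def percent_orfs_recruiting(hits, n_orfs):
    """Percentage of a genome's ORFs that recruited at least one passing read."""
    if n_orfs == 0:
        return 0.0
    recruiting = {h.orf_id for h in hits if passes_cutoffs(h)}
    return 100.0 * len(recruiting) / n_orfs


# Invented hits for illustration only.
toy_hits = [
    Hit("read1", "orf1", 1e-10, 42.0, 60),  # passes all cutoffs
    Hit("read2", "orf1", 1e-5, 35.0, 50),   # passes; same ORF as read1
    Hit("read3", "orf2", 1e-2, 55.0, 70),   # fails the e-value cutoff
    Hit("read4", "orf3", 1e-6, 20.0, 80),   # fails: identity is not > 20%
    Hit("read5", "orf4", 1e-8, 30.0, 45),   # fails: length is not > 45
]
toy_percent = percent_orfs_recruiting(toy_hits, n_orfs=4)
```

Only orf1 recruits passing reads in this toy set, so one of four ORFs (25%) counts as recruiting.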

Acknowledgments

We thank J. Cesar Ignacio-Espinoza and Melissa Duhaime for help with phage genome analyses; Jarl Haggerty and Kenth Holmfeldt for development of in-house bioinformatics programs; and Forest Rohwer for launching M.B.S. and K.H. in phage genomics and supporting preliminary genome sequencing of isolates ϕ4:1, ϕ13:1, and ϕ39:1. This study was supported by the Gordon and Betty Moore Foundation (to M.B.S.) and postdoctoral fellowships from the Sweden–America Foundation and the Swedish Research Council (to K.H.).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. KC821604–KC821634).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1305956110/-/DCSupplemental.

References

  • 1. Falkowski PG, Fenchel T, Delong EF. The microbial engines that drive Earth’s biogeochemical cycles. Science. 2008;320(5879):1034–1039. doi: 10.1126/science.1153213.
  • 2. Breitbart M. Marine viruses: Truth or dare. Annu Rev Mar Sci. 2012;4:5.1–5.24. doi: 10.1146/annurev-marine-120709-142805.
  • 3. Hurwitz BL, Sullivan MB. The Pacific Ocean virome (POV): A marine viral metagenomic dataset and associated protein clusters for quantitative viral ecology. PLoS ONE. 2013;8(2):e57355. doi: 10.1371/journal.pone.0057355.
  • 4. Borriss M, et al. Genome and proteome characterization of the psychrophilic Flavobacterium bacteriophage 11b. Extremophiles. 2007;11(1):95–104. doi: 10.1007/s00792-006-0014-5.
  • 5. Hawkins SA, Layton AC, Ripp S, Williams D, Sayler GS. Genome sequence of the Bacteroides fragilis phage ATCC 51477-B1. Virol J. 2008;5:97. doi: 10.1186/1743-422X-5-97.
  • 6. Aislabie JM, et al. Dominant bacteria in soils of Marble Point and Wright Valley, Victoria Land, Antarctica. Soil Biol Biochem. 2006;38:3041–3056.
  • 7. Kirchman DL, Dittel AI, Malmstrom RR, Cottrell MT. Biogeography of major bacterial groups in the Delaware estuary. Limnol Oceanogr. 2005;50(5):1697–1706.
  • 8. Baltar F, Aristegui J, Gasol JM, Hernandez-Leon S, Herndl GJ. Strong coast-ocean and surface-depth gradients in prokaryotic assemblage structure and activity in a coastal transition zone region. Aquat Microb Ecol. 2007;50(1):63–74.
  • 9. Bäckhed F, Ley RE, Sonnenburg JL, Peterson DA, Gordon JI. Host-bacterial mutualism in the human intestine. Science. 2005;307(5717):1915–1920. doi: 10.1126/science.1104816.
  • 10. Yanagibashi T, et al. Bacteroides induce higher IgA production than Lactobacillus by increasing activation-induced cytidine deaminase expression in B cells in murine Peyer’s patches. Biosci Biotechnol Biochem. 2009;73(2):372–377. doi: 10.1271/bbb.80612.
  • 11. Fernández-Gómez B, et al. Ecology of marine Bacteroidetes: A comparative genomics approach. ISME J. 2013;7(5):1026–1037. doi: 10.1038/ismej.2012.1169.
  • 12. Pinhassi J, et al. Changes in bacterioplankton composition under different phytoplankton regimens. Appl Environ Microbiol. 2004;70(11):6753–6766. doi: 10.1128/AEM.70.11.6753-6766.2004.
  • 13. Holmfeldt K, Middelboe M, Nybroe O, Riemann L. Large variabilities in host strain susceptibility and phage host range govern interactions between lytic marine phages and their Flavobacterium hosts. Appl Environ Microbiol. 2007;73(21):6730–6739. doi: 10.1128/AEM.01399-07.
  • 14. Seguritan V, Feng IW, Rohwer F, Swift M, Segall AM. Genome sequences of two closely related Vibrio parahaemolyticus phages, VP16T and VP16C. J Bacteriol. 2003;185(21):6434–6447. doi: 10.1128/JB.185.21.6434-6447.2003.
  • 15. Nakagawa H, Arisaka F, Ishii S. Isolation and characterization of the bacteriophage T4 tail-associated lysozyme. J Virol. 1985;54(2):460–466. doi: 10.1128/jvi.54.2.460-466.1985.
  • 16. Lavigne R, Seto D, Mahadevan P, Ackermann HW, Kropinski AM. Unifying classical and molecular taxonomic classification: Analysis of the Podoviridae using BLASTP-based tools. Res Microbiol. 2008;159(5):406–414. doi: 10.1016/j.resmic.2008.03.005.
  • 17. King AMQ, Adams MJ, Carstens EB, Lefkowitz EJ. Virus Taxonomy; Eighth Report of the International Committee on Taxonomy of Viruses. San Diego: Academic; 2012.
  • 18. Holmfeldt K, Odić D, Sullivan MB, Middelboe M, Riemann L. Cultivated single-stranded DNA phages that infect marine Bacteroidetes prove difficult to detect with DNA-binding stains. Appl Environ Microbiol. 2012;78(3):892–894. doi: 10.1128/AEM.06580-11.
  • 19. Jacobs-Sera D, et al.; Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) Program. On the nature of mycobacteriophage diversity and host preference. Virology. 2012;434(2):187–201. doi: 10.1016/j.virol.2012.09.026.
  • 20. Hatfull GF. Mycobacteriophages: Genes and genomes. Annu Rev Microbiol. 2010;64:331–356. doi: 10.1146/annurev.micro.112408.134233.
  • 21. Wichels A, et al. Bacteriophage diversity in the North Sea. Appl Environ Microbiol. 1998;64(11):4128–4133. doi: 10.1128/aem.64.11.4128-4133.1998.
  • 22. Lauro FM, et al. The genomic basis of trophic strategy in marine bacteria. Proc Natl Acad Sci USA. 2009;106(37):15527–15533. doi: 10.1073/pnas.0903507106.
  • 23. Zhao Y, et al. Abundant SAR11 viruses in the ocean. Nature. 2013;494(7437):357–360. doi: 10.1038/nature11921.
  • 24. Kellogg CA, Rose JB, Jiang SC, Thurmond JM, Paul JH. Genetic diversity of related vibrophages isolated from marine environments around Florida and Hawaii, USA. Mar Ecol Prog Ser. 1995;120:89–98.
  • 25. Angly F, et al. Genomic analysis of multiple Roseophage SIO1 strains. Environ Microbiol. 2009;11(11):2863–2873. doi: 10.1111/j.1462-2920.2009.02021.x.
  • 26. Lawrence JG, Hatfull GF, Hendrix RW. Imbroglios of viral taxonomy: Genetic exchange and failings of phenetic approaches. J Bacteriol. 2002;184(17):4891–4905. doi: 10.1128/JB.184.17.4891-4905.2002.
  • 27. Ignacio-Espinoza JC, Sullivan MB. Phylogenomics of T4 cyanophages: Lateral gene transfer in the ‘core’ and origins of host genes. Environ Microbiol. 2012;14(8):2113–2126. doi: 10.1111/j.1462-2920.2012.02704.x.
  • 28. Sullivan MB, Coleman ML, Weigele P, Rohwer F, Chisholm SW. Three Prochlorococcus cyanophage genomes: Signature features and ecological interpretations. PLoS Biol. 2005;3(5):e144. doi: 10.1371/journal.pbio.0030144.
  • 29. Zhao Y, Wang K, Jiao N, Chen F. Genome sequences of two novel phages infecting marine roseobacters. Environ Microbiol. 2009;11(8):2055–2064. doi: 10.1111/j.1462-2920.2009.01927.x.
  • 30. Enav H, Béjà O, Mandel-Gutfreund Y. Cyanophage tRNAs may have a role in cross-infectivity of oceanic Prochlorococcus and Synechococcus hosts. ISME J. 2012;6(3):619–628. doi: 10.1038/ismej.2011.146.
  • 31. Hildenbrand ZL, Bernal RA. Chaperonin-mediated folding of viral proteins. In: Rossmann MG, Rao VB, editors. Advances in Experimental Medicine and Biology. Vol 726. New York: Springer; 2012. pp. 307–324.
  • 32. Krupovic M, Forterre P. Microviridae goes temperate: Microvirus-related proviruses reside in the genomes of Bacteroidetes. PLoS ONE. 2011;6(5):e19893. doi: 10.1371/journal.pone.0019893.
  • 33. Cherwa JE, Fane BA. Encyclopedia of Life Sciences. Chichester, UK: Wiley; 2011. Microviridae: Microviruses and gokushoviruses.
  • 34. Tomshich SV, et al. Structure of acidic O-specific polysaccharide from the marine bacterium Cellulophaga baltica. Bioorg Khim. 2007;33(1):91–95. doi: 10.1134/s1068162007010104.
  • 35. Mochizuki T, et al. Archaeal virus with exceptional virion architecture and the largest single-stranded DNA genome. Proc Natl Acad Sci USA. 2012;109(33):13386–13391. doi: 10.1073/pnas.1203668109.
  • 36. Waldor MK, Mekalanos JJ. Lysogenic conversion by a filamentous phage encoding cholera toxin. Science. 1996;272(5270):1910–1914. doi: 10.1126/science.272.5270.1910.
  • 37. Read TD, et al. Genome sequence of Chlamydophila caviae (Chlamydia psittaci GPIC): Examining the role of niche-specific genes in the evolution of the Chlamydiaceae. Nucleic Acids Res. 2003;31(8):2134–2147. doi: 10.1093/nar/gkg321.
  • 38. McDaniel LD, delaRosa M, Paul JH. Temperate and lytic cyanophages from the Gulf of Mexico. J Mar Biol Assoc U K. 2006;86:517–527.
  • 39. Ackermann HW. 5500 Phages examined in the electron microscope. Arch Virol. 2007;152(2):227–243. doi: 10.1007/s00705-006-0849-1.
  • 40. Brum JR, Schenck RO, Sullivan MB. Global morphological analysis of marine viruses shows minimal regional variation and dominance of non-tailed viruses. ISME J. 2013. doi: 10.1038/ismej.2013.1067.
  • 41. Kuznetsov YG, Chang SC, Credaroli A, Martiny J, McPherson A. An atomic force microscopy investigation of cyanophage structure. Micron. 2012;43(12):1336–1342. doi: 10.1016/j.micron.2012.02.013.
  • 42. Männistö RH, Kivelä HM, Paulin L, Bamford DH, Bamford JK. The complete genome sequence of PM2, the first lipid-containing bacterial virus to be isolated. Virology. 1999;262(2):355–363. doi: 10.1006/viro.1999.9837.
  • 43. Hidaka T, Ichida K-i. Properties of a marine RNA-containing bacteriophage. Mem Fac Fish Kagoshima Univ. 1976;25:77–89.
  • 44. Dwivedi B, Xue B, Lundin D, Edwards RA, Breitbart M. A bioinformatic analysis of ribonucleotide reductase genes in phage genomes and metagenomes. BMC Evol Biol. 2013;13(33):33. doi: 10.1186/1471-2148-1113-1133.
  • 45. Horton HR, Moran LA, Ochs RS, Rawn JD, Scrimgeour KG. Principles of Biochemistry. 2nd Ed. Upper Saddle River, NJ: Prentice-Hall; 1996.
  • 46. El Yacoubi B, Bailly M, de Crécy-Lagard V. Biosynthesis and function of posttranscriptional modifications of transfer RNAs. Annu Rev Genet. 2012;46:69–95. doi: 10.1146/annurev-genet-110711-155641.
  • 47. Sabri M, et al. Genome annotation and intraviral interactome for the Streptococcus pneumoniae virulent phage Dp-1. J Bacteriol. 2011;193(2):551–562. doi: 10.1128/JB.01117-10.
  • 48. Thurber RV, Haynes M, Breitbart M, Wegley L, Rohwer F. Laboratory procedures to generate viral metagenomes. Nat Protoc. 2009;4(4):470–483. doi: 10.1038/nprot.2009.10.
  • 49. Abrescia NG, Bamford DH, Grimes JM, Stuart DI. Structure unifies the viral universe. Annu Rev Biochem. 2012;81:795–822. doi: 10.1146/annurev-biochem-060910-095130.
  • 50. Williamson SJ, et al. The Sorcerer II Global Ocean Sampling Expedition: Metagenomic characterization of viruses within aquatic microbial samples. PLoS ONE. 2008;3(1):e1456. doi: 10.1371/journal.pone.0001456.
  • 51. Duhaime MB, Sullivan MB. Ocean viruses: Rigorously evaluating the metagenomic sample-to-sequence pipeline. Virology. 2012;434(2):181–186. doi: 10.1016/j.virol.2012.09.036.
  • 52. Solonenko SA, et al. Sequencing platform and library preparation choices impact viral metagenomes. BMC Genomics. 2013;14:320. doi: 10.1186/1471-2164-14-320.
  • 53. Pedrós-Alió C. The rare bacterial biosphere. Annu Rev Mar Sci. 2012;4:449–466. doi: 10.1146/annurev-marine-120710-100948.
  • 54. Stocker R. Marine microbes see a sea of gradients. Science. 2012;338(6107):628–633. doi: 10.1126/science.1208929.
  • 55. Wiśniewski JR, Zougman A, Mann M. Combination of FASP and StageTip-based fractionation allows in-depth analysis of the hippocampal membrane proteome. J Proteome Res. 2009;8(12):5674–5678. doi: 10.1021/pr900748n.
  • 56. Verberkmoes NC, et al. Shotgun metaproteomics of the human distal gut microbiota. ISME J. 2009;3(2):179–189. doi: 10.1038/ismej.2008.108.


