Abstract
Recent advances in environmental genomics have provided unprecedented opportunities for the investigation of viruses in natural settings. Yet, our knowledge of viral biogeographic patterns and the corresponding drivers is still limited. Here, we perform metagenomic deep sequencing on 90 acid mine drainage (AMD) sediments sampled across Southern China and examine the biogeography of viruses in this extreme environment. The results demonstrate that prokaryotic communities dictate viral taxonomic and functional diversity, abundance and structure, whereas other factors, especially latitude and mean annual temperature, also impact viral populations and functions. In silico predictions highlight lineage-specific virus-host abundance ratios and richness-dependent virus-host interaction structure. Further functional analyses reveal important roles of environmental conditions and horizontal gene transfers in shaping viral auxiliary metabolic genes potentially involved in phosphorus assimilation. Our findings underscore the importance of both abiotic and biotic factors in predicting the taxonomic and functional biogeographic dynamics of viruses in the AMD sediments.
Subject terms: Biogeography, Microbial ecology, Metagenomics
The biogeography of viral communities in extreme environments remains understudied. Here, the authors use metagenomic sequencing on 90 acid mine drainage sediments sampled across Southern China, showing the predominant effects of prokaryotic communities and the influence of environmental variables on viral taxonomy and function.
Introduction
Microorganisms are the most phylogenetically diverse and widespread form of life on Earth1. Unraveling the processes that generate and underlie microbial biodiversity across space and time is critical for predicting the dynamics of microbial communities in the environment2,3. Gene surveys, especially those utilizing high throughput sequencing technologies, have advanced our understanding of the biogeographic patterns of microbes in nature, revealing significant roles of contemporary environmental variation or historical contingency in shaping their large-scale ecological ranges4. More recently, advances in metagenomic sequencing technologies and bioinformatics have moved microbial biogeography forward, allowing the examination of functional trait variation in their natural settings and the evolutionary and ecological processes creating and maintaining the biogeographic patterns5,6. Collectively, these efforts have greatly furthered our understanding of the mechanisms shaping microbial biodiversity on the planet.
Viruses are key entities in natural microbial assemblies, impacting prokaryotic population size through lysis7, reprogramming host metabolism with auxiliary metabolic genes (AMGs)8, and shaping microbial evolution via horizontal gene transfers (HGTs)9. However, viral ecology studies have been hampered by an absence of universal marker genes and thus were traditionally dependent on cultivation-based methods10. More recently, meta-omics approaches have been applied to explore viral diversity in the environment11, uncovering high viral diversity with little similarity to previously recognised viruses12. Despite this progress, the biogeographic variation of viruses in ecosystems remains largely unstudied. Marine environments have been the focus of several studies of viral biogeography, revealing patterns whereby viral communities are passively transported on oceanic currents and locally structured by environmental conditions13, and the existence of specific ecological zones throughout the global ocean, with epipelagic waters and the Arctic as hotspots for viral biodiversity14. Our current understanding of viral biogeography stems from these pioneering studies.
The reduced-complexity prokaryotic communities in extreme environments have served as models for the study of microbial community structure and function15,16. The relatively low species richness, broad range and steep gradients of geochemical variables promise more straightforward establishment of ecological patterns and underlying mechanisms. The diversity and community dynamics of viruses in extreme environments such as the Atacama Desert17, cryosphere18,19, acid mine drainage (AMD) environment20,21, and Earth’s subsurface7,22,23 have recently been investigated through meta-omics approaches; yet, extensive sampling and analysis of viral communities across large geographic scales to resolve their ecological distribution patterns and drivers have not been conducted. Here we strive to address this knowledge gap by utilizing a massive metagenomic data set generated from 90 AMD sediments sampled across Southern China (Fig. 1a). Extensive recovery of viral and prokaryotic genomes was performed and the results were analysed with a comprehensive set of metadata on geochemistry, geographic location and climate variables for each sample24, to quantify the effects of both biotic (prokaryotic hosts) and abiotic factors on the viral assemblages in this extreme ecosystem.
Fig. 1. Overview of acid mine drainage (AMD) sediment viruses.
a Geographic distribution of collected AMD sediment samples. The provinces from which AMD sediments were sampled are presented in gray. All sampled AMD sites (n = 18) are marked by orange squares. b Histogram showing the distribution of viral genome size. c Accumulation curve of viral operational taxonomic units (vOTUs, red) and viral protein clusters (PCs, blue) in the AMD sediment metagenomes. Dots represent the average number of vOTUs and PCs for all combinations of a given number of samples, and error bars represent the range. The numbers of viral PCs were divided by ten for better visualization. d Bar graphs showing the relative proportion and taxonomy of vOTUs based on the reticulate classification method (vConTACT2) and the Lowest Common Ancestor (LCA) algorithm. e Relative abundances of viral functions in the AMD sediments as annotated with the eggNOG v5.0.0 and VOG databases. All COG categories are grouped into four types, including information storage and processing (COG categories A, RNA processing and modification; B, chromatin structure and dynamics; J, translation, ribosomal structure and biogenesis; K, transcription; L, replication, recombination and repair), cellular processes and signaling (D, cell cycle control, cell division, chromosome partitioning; M, cell wall/membrane/envelope biogenesis; N, cell motility; O, posttranslational modification, protein turnover, chaperones; T, signal transduction mechanisms; U, intracellular trafficking, secretion and vesicular transport; V, defence mechanisms; W, extracellular structures; Y, nuclear structure; Z, cytoskeleton), metabolism and transportation (C, energy production and conversion; E, amino-acid transport and metabolism; F, nucleotide transport and metabolism; G, carbohydrate transport and metabolism; H, coenzyme transport and metabolism; I, lipid transport and metabolism; P, inorganic ion transport and metabolism; Q, secondary metabolites biosynthesis, transport and catabolism), and unknown functions (S, unknown; Unannotated proteins). Source data are provided in the Source Data file.
Results
Viral diversity in the AMD sediments
Metagenomic sequencing was conducted on the 90 sediment samples taken from geographically separated and geochemically diverse AMD environments24. Assemblies from the metagenomes were screened using a viral protein families-based pipeline25, VirSorter v1.0.626 and CheckV v0.6.027, and then manually curated, yielding 11,112 putative viral genomes that ranged from 10 to 350 kb in size, with ~94% between 10 and 50 kb (Fig. 1b and Supplementary Data 1). We identified a total of 5,678 potential viral populations (viral operational taxonomic units, vOTUs), which are suggested to approximately represent species-level taxonomy12, and 143,610 viral protein clusters (PCs) that help organise the dominant unknown sequence space13 (Fig. 1c). The number of vOTUs and viral PCs in each sample ranged from 537 to 3,199 and from 6,628 to 52,631, respectively (Supplementary Data 2). Despite such a broad range in viral taxonomic and functional richness across samples, the accumulation curves of vOTUs and PCs reached saturation, indicating that viral communities in the AMD sediments were reasonably well sampled (Fig. 1c).
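The accumulation curve described above (mean and range of cumulative richness over combinations of samples) can be illustrated with a minimal Python sketch. The function name `accumulation_curve` and the toy vOTU identifiers are hypothetical and are not taken from the study; exhaustive enumeration of combinations is only feasible for a handful of samples, so a data set of 90 samples would presumably require random subsampling instead.

```python
from itertools import combinations

def accumulation_curve(samples):
    """Cumulative richness for every subset size.

    samples: list of sets, each holding the vOTU (or PC) ids detected in one sample.
    Returns a list of (k, mean, minimum, maximum) tuples, one per subset size k.
    """
    curve = []
    for k in range(1, len(samples) + 1):
        # Richness of the union of each k-sample combination
        counts = [len(set().union(*combo)) for combo in combinations(samples, k)]
        curve.append((k, sum(counts) / len(counts), min(counts), max(counts)))
    return curve

# Toy presence data (hypothetical ids), three samples
toy_samples = [{"v1", "v2"}, {"v2", "v3"}, {"v1", "v3", "v4"}]
curve = accumulation_curve(toy_samples)
for k, mean, low, high in curve:
    print(k, round(mean, 2), low, high)
```

A curve is usually read as saturated when the mean richness gained by adding one more sample becomes small relative to the total.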
Taxonomic analyses of the 5,678 viral population genomes against the NCBI Viral RefSeq v201 database showed that the vast majority (96.0%) of vOTUs could not be assigned taxonomy through reticulate classification (vConTACT2)28, while 66.1% of vOTUs could be annotated at the family level using the LCA algorithm29 (Fig. 1d). Most classified viruses were resolved as one of the three families (Myoviridae, Siphoviridae, and Podoviridae) in the Caudovirales order (Fig. 1d and Supplementary Data 3). Comparisons of the predicted viral proteins against the eggNOG database30 and VOG database revealed that most viral proteins from the AMD sediments were uncharacterised, with the annotated proteins enriched in information storage and processing (COG categories ABJKL) and virus replication (VOG category Xr) or virus function beneficial for the host (VOG category Xh) (Fig. 1e).
Distribution patterns of viral diversity and functions
To explore the variability in viral populations and functions across the AMD sediments, pairwise Pearson’s correlations were used to uncover relationships between viral communities and other biotic and abiotic factors. Prokaryotic community structure in the sediments was resolved by extensive reconstruction and dereplication of bacterial and archaeal genomes from metagenomes, and the results were highly similar to those from the 16S rRNA gene amplicon analysis24 (Supplementary Fig. 1). The prokaryotic richness, estimated as the number of prokaryotic metagenome-assembled genomes (MAGs) in each sample (Supplementary Data 1), was the variable most strongly correlated with the number of viral populations (Pearson’s r = 0.89, P < 0.001) and functions (Pearson’s r = 0.82, P < 0.001) (Fig. 2a). Meanwhile, overall viral taxonomic and functional richness increased toward the equator and were both negatively correlated with electrical conductivity (EC). Significant positive correlations were observed between viral abundance and ferric iron (Pearson’s r = 0.30, P < 0.05), as well as between viral functional abundance and Fe (Pearson’s r = 0.29, P < 0.05). We further evaluated the dependence of viral taxonomic and functional distributions on different factors by correlating dissimilarities of viral taxonomic and functional community composition with those of abiotic variables. Results showed that mean annual temperature (MAT) and Fe were the strongest correlates of both viral taxonomic and functional dissimilarities, which also increased with increasing differences in mean annual precipitation (MAP), pH, ferric iron, sulphate, and distance from the equator of the AMD sediments (Fig. 2a). Furthermore, Mantel test analysis revealed significant correlations between prokaryotic dissimilarity and viral taxonomic (Mantel’s r = 0.96, P < 0.001) and functional dissimilarities (Mantel’s r = 0.95, P < 0.001) across all samples.
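To make the Mantel statistic concrete, the following is a minimal Python sketch of a permutation-based Mantel test between two distance matrices. It assumes a Pearson correlation on the upper-triangle entries and a one-sided permutation P-value; the permutation count, the seed and the toy matrices are illustrative assumptions, not the settings used in the study.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def upper_triangle(matrix, order):
    """Upper-triangle entries after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, p): Mantel r and a one-sided permutation P-value."""
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute sample labels of the second matrix
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)

# Toy example: six samples on a line; the second matrix is a linear function of the first
positions = [0, 1, 2, 3, 4, 5]
d_a = [[abs(p - q) for q in positions] for p in positions]
d_b = [[0 if i == j else 2 * d_a[i][j] + 1 for j in range(6)] for i in range(6)]
r_obs, p_value = mantel(d_a, d_b)
print(round(r_obs, 3), p_value)
```

Permuting whole sample labels, rather than individual matrix entries, preserves the dependence among distances that share a sample, which is why the Mantel test is preferred over an ordinary correlation test on pairwise distances.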
Fig. 2. Dynamics of viral populations and functions.
a Pairwise comparisons of the biotic and abiotic variables. The color gradient in the heatmap denotes Pearson’s correlation coefficients and the asterisk indicates two-tailed test of Pearson’s statistical significance adjusted using the Benjamini and Hochberg false discovery rate controlling procedure. *P < 0.05, **P < 0.01 and ***P < 0.001. Edge width corresponds to the Mantel’s r statistic for the corresponding distance correlations, and edge color denotes the statistical significance. vOTUs, the number of viral operational taxonomic units; PCs, the number of viral protein clusters; MAGs, the number of metagenome-assembled genomes; EC, electrical conductivity; Ferrous, ferrous iron; Ferric, ferric iron; TOC, total organic carbon; TN, total nitrogen; TP, total phosphorus; AP, available phosphorus; MAT, mean annual temperature; MAP, mean annual precipitation; Dist., distance from the equator; vOTUs abun., the abundance of viruses; and PCs abun., the abundance of viral functions. b, c Principal coordinate analysis (PCoA) of viral taxonomic (b) and functional (c) structure colored by sampling sites. d, e Distance-decay relationships (DDRs) based on Bray-Curtis similarity (1 - dissimilarity) of viral taxonomic (d) and functional (e) community compositions. The blue line denotes the least-squares linear regression across all spatial scales. Red and purple lines denote separate regressions within samples whose distance ≤ 1 km and within samples whose distance > 1 km, respectively. Color-coded best-fit lines and adjusted R2 values for each DDR are presented. The statistical test used was two-tailed. Source data are provided in the Source Data file.
To examine whether geographic distance may influence viral distributions, principal coordinate analyses (PCoA) were used to assess the degree of segregation of the viral communities. We observed a separation of viral taxonomic and functional structure for the 90 AMD sediment samples, with a similar distribution within the same site (Fig. 2b, c). In support of this, significant negative distance-decay relationships (DDRs) were observed across all samples based on the Bray-Curtis similarities (1 - dissimilarity) of viral taxonomic (slope = −0.10, P < 0.001) and functional (slope = −0.09, P < 0.001) structure. Furthermore, the slopes of the DDRs depended on spatial scale. Specifically, the overall slope was significantly shallower than the slopes within a local scale (pairwise distance ≤ 1 km) but steeper than the slopes within a regional scale (pairwise distance > 1 km) (Fig. 2d, e).
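A distance-decay slope of the kind reported here can be sketched in a few lines of Python: compute the Bray-Curtis similarity and the geographic distance for every pair of samples, then fit a least-squares line. The sketch assumes great-circle (haversine) distances in km and a regression on untransformed distances; whether the published slopes were fitted on raw or log-transformed values is not stated in this section, and the coordinates and abundances below are invented toy values.

```python
from math import radians, sin, cos, asin, sqrt

def bray_curtis_similarity(a, b):
    """1 - Bray-Curtis dissimilarity for two abundance vectors."""
    return 1 - sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def ols_slope(x, y):
    """Slope of the ordinary least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def distance_decay_slope(coords, abundances):
    """Regress pairwise community similarity on pairwise geographic distance."""
    dist, sim = [], []
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            dist.append(haversine_km(coords[i], coords[j]))
            sim.append(bray_curtis_similarity(abundances[i], abundances[j]))
    return ols_slope(dist, sim)

# Toy data (hypothetical): four samples along a latitudinal transect
coords = [(22.0, 113.0), (23.0, 113.0), (25.0, 113.0), (28.0, 113.0)]
abundances = [[10, 5, 0, 0], [8, 6, 1, 0], [3, 6, 5, 1], [0, 2, 6, 9]]
slope = distance_decay_slope(coords, abundances)
print(slope)
```

Scale dependence can be examined by fitting the same regression separately to the pairs at or below, and above, a distance threshold such as 1 km, provided each subset contains enough pairs.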
Ecological drivers of viral taxonomic and functional community structure
Having illustrated the roles of individual factors in shaping viral taxonomic and functional diversity and distributions, we next sought to discern the causality and quantify the direct and indirect effects of the drivers using structural equation modeling (SEM). The final SEM models provided satisfactory fit to our data compared with the a priori models (Supplementary Fig. 2), as suggested by the P-values (Chi-squared test) and root mean square error of approximation (RMSEA) (Fig. 3). Specifically, the hypothesised direct effects of pH on prokaryotic diversity and community structure in the a priori models were not observed in our final SEM models. For viral communities, we did not find significant impacts of viral taxonomic and functional abundance on their composition, suggesting discrepancies between our a priori predictions and the final models (Fig. 3 and Supplementary Fig. 2). On the other hand, our final SEM models were consistent with the Pearson’s correlation results. Distance from the equator probably had impacts on the number of vOTUs and viral PCs in different samples through its direct negative effect on MAP (r = −0.32, P < 0.01), or prokaryotic richness (r = −0.42, P < 0.001) which was the most influential variable directly related to viral taxonomic (r = 0.86, P < 0.001) and functional richness (r = 0.81, P < 0.001). The SEM models also revealed that pH and MAT had some direct effect on viral taxonomic and functional richness (Fig. 3a, b).
Fig. 3. Drivers of viral taxonomic and functional diversity and compositions.
Path diagram for SEM showing only significant direct and indirect effects of biotic and abiotic variables on viral taxonomic (a) and functional (b) diversity and compositions. Composition is represented by the PC1 from the Bray-Curtis dissimilarity-based principal coordinate analyses. Numbers adjacent to the arrows are standardised path coefficients (r), analogous to relative regression weights and indicative of the effect size of the relationship. Blue and red arrows represent significant (P < 0.05) positive and negative pathways, respectively. Double-headed arrows indicate covariance between variables, whereas single-headed arrows indicate a one-way directed relationship. R2 represents the proportion of variance explained for every dependent variable in the model. The fit of models was evaluated using one-tailed Chi-squared test and root mean square error of approximation (RMSEA). vOTUs, the number of viral operational taxonomic units; PCs, the number of viral protein clusters; Ferric, ferric iron; MAT, mean annual temperature; MAP, mean annual precipitation; and MAGs, the number of metagenome-assembled genomes. Source data are provided in the Source Data file.
The prokaryotic richness had negative impacts on viral taxonomic (r = −0.33, P < 0.001) and functional (r = −0.16, P < 0.001) composition. Meanwhile, prokaryotic composition, which was positively and directly affected by MAT (r = 0.72, P < 0.001), distance from the equator (r = 0.58, P < 0.001) and prokaryotic abundance (r = 0.29, P < 0.001), was found to drive both viral taxonomic (r = 0.94, P < 0.001) and functional (r = 0.96, P < 0.001) composition. Unexpectedly, the abundances of viral populations and functions were negatively related to the abundance of prokaryotes, which was negatively driven by pH (r = −0.32, P < 0.001) and MAP (r = −0.25, P < 0.001), and positively associated with MAT (r = 0.93, P < 0.001) and distance from the equator (r = 0.74, P < 0.001). Additionally, both MAP and prokaryotic richness affected the abundances of viral populations and functions, with increased abundance associated with higher MAP and lower prokaryotic richness. The other direct drivers of viral taxonomic and functional abundance were ferric iron (r = 0.23, P < 0.01) and pH (r = 0.20, P < 0.05).
Virus-host interaction dynamics
To further resolve potential host effects on viral ecology, we screened the 7,991 high-quality (≥ 50% genome completeness and < 10% contamination) prokaryotic MAGs recovered from the sediment metagenomes for genomic features to link viruses to their putative hosts. As a result, 6,003 viral genomes were linked to 3,404 prokaryotic MAGs. Summarizing these results at the population level revealed virus-host pairs for 3,031 out of the 5,678 vOTUs and 1,488 out of the 2,897 prokaryotic populations (Supplementary Data 4). Most (97%) of the predicted host populations were assigned to 20 prokaryotic phyla, including bacteria belonging to Proteobacteria (433 populations), Actinobacteriota (193) and Acidobacteriota (137) and archaea from the Thermoplasmatota (132) (Fig. 4a). The predicted hosts were also affiliated with many poorly characterised phyla, including 14 bacterial populations from the Dormibacterota, 13 from Elusimicrobiota and 13 from Eremiobacterota, and 41 archaeal populations from the Micrarchaeota, 17 from Nanoarchaeota and 8 from Thermoproteota. The abundances of these host phyla were mostly (19 of the 20 phyla) significantly correlated with the total abundance of viruses infecting the same host lineage across the AMD sediments, indicating a high accuracy of our host prediction (Fig. 4a). We also calculated virus-host abundance ratios (VHRs) to assess how virus-host dynamics varied across different hosts. A range of lineage-specific VHRs (typically > 1) were observed, with the highest average values recorded in Chloroflexota (Fig. 4a).
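As an illustration of a lineage-specific VHR, the sketch below divides the summed abundance of viruses linked to a host phylum by the summed abundance of the predicted hosts in that phylum, for a single sample. This definition is an assumption made for the example, since the exact formula is not given in this section, and the function name, identifiers and abundance values are hypothetical.

```python
from collections import defaultdict

def lineage_vhr(virus_abundance, host_abundance, virus_host_phylum, host_phylum):
    """Virus-host abundance ratio per host phylum within one sample.

    virus_abundance / host_abundance: dicts mapping ids to abundances.
    virus_host_phylum: phylum of the predicted host for each virus id.
    host_phylum: phylum of each host id.
    """
    virus_total = defaultdict(float)
    host_total = defaultdict(float)
    for virus, abundance in virus_abundance.items():
        virus_total[virus_host_phylum[virus]] += abundance
    for host, abundance in host_abundance.items():
        host_total[host_phylum[host]] += abundance
    # Only phyla with both detected hosts and linked viruses yield a ratio
    return {
        phylum: virus_total[phylum] / host_total[phylum]
        for phylum in host_total
        if host_total[phylum] > 0 and phylum in virus_total
    }

# Toy sample (hypothetical ids and abundances)
virus_abundance = {"v1": 6.0, "v2": 2.0, "v3": 3.0}
host_abundance = {"h1": 2.0, "h2": 2.0, "h3": 6.0}
virus_host_phylum = {"v1": "Proteobacteria", "v2": "Proteobacteria", "v3": "Thermoplasmatota"}
host_phylum = {"h1": "Proteobacteria", "h2": "Proteobacteria", "h3": "Thermoplasmatota"}
vhr = lineage_vhr(virus_abundance, host_abundance, virus_host_phylum, host_phylum)
print(vhr)
```

Averaging such per-sample ratios across the samples in which both a phylum and its viruses were detected would give one lineage-level value per phylum.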
Fig. 4. Virus-host abundance patterns.
a The blue and orange dots indicate lineage-specific virus-host abundance ratios (VHRs, log10 scale) and virus-host abundance correlation coefficients, respectively. Solid orange dots represent significant (P < 0.05, two-tailed test) Pearson’s correlations, while hollow orange dots represent nonsignificant correlations. The Pearson’s correlations were calculated across samples in which host phyla and associated viruses were both detected. Numbers in the brackets indicate numbers of predicted host populations. b Top panel: total abundance of Proteobacteria and Thermoplasmatota; middle panel: total abundance of Proteobacteria hosts and Thermoplasmatota hosts; bottom panel: total abundance of Proteobacteria-associated viruses and Thermoplasmatota-associated viruses. c Relationships between viral abundance and prokaryotic abundance. The blue line indicates the best-fitting polynomial function. The adjusted R2 value for this plot is presented. The statistical test used was two-tailed. d Relationships between prokaryotic abundance and summed relative abundances of temperate and virulent viruses. Color-coded best-fit lines and adjusted R2 values for each linear regression are presented. The statistical test used was two-tailed. Source data are provided in the Source Data file.
Given the dominance of Proteobacteria and Thermoplasmatota across the 90 AMD sediments (Supplementary Fig. 3), we examined their virus-host abundance dynamics in detail. The VHRs were significantly higher in Proteobacteria than in Thermoplasmatota (Supplementary Fig. 4a). We contrasted the abundance between the two phyla across the 90 sediments, and found that Proteobacteria and Thermoplasmatota showed distinct dynamics in both total abundance and predicted host abundance. The abundance of Proteobacteria first increased and then decreased with increasing prokaryotic abundance, while the abundance of Thermoplasmatota consistently and substantially increased. These abundance patterns were similar to those of their associated viruses (Fig. 4b). However, the Thermoplasmatota-associated viruses showed a weaker increase in abundance compared with their hosts (Fig. 4b). As a result, we found that the total abundance of viruses peaked at intermediate prokaryotic abundance (Fig. 4c).
We next investigated whether prokaryotic hosts might affect viral life strategies and virus-host interaction structure. A deep learning approach was applied to distinguish virulent and temperate viral populations in our data (Supplementary Data 3)31. Results showed that the relative abundance of virulent viruses increased while the relative abundance of temperate viruses decreased significantly as the prokaryotic abundance increased, suggesting that virulent life strategies became more prevalent in sediment communities with higher prokaryotic abundance (Fig. 4d). Concomitantly, significantly higher virulent/temperate abundance ratios were observed in Thermoplasmatota-associated viruses than in Proteobacteria-associated viruses (Wilcoxon test, P < 0.001; Supplementary Fig. 4b). When averaged at the host phylum level, lineage-specific host range (number of host populations for each viral population) and viral range (number of viral populations for each host population) were highest in Thermoplasmatota and Proteobacteria, respectively. In addition, the host range significantly increased with the prokaryotic richness (Pearson’s r = 0.45, P < 0.05), and the viral range significantly increased with the viral richness (Pearson’s r = 0.86, P < 0.001) across the host phyla (Fig. 5a). Further, increased prokaryotic richness and viral richness were associated with a significant decline in modularity (Fig. 5b, d) and a significant increase in nestedness of virus-host bipartite sub-networks across the sediment samples (Fig. 5c, e).
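The quantities used in this paragraph can be illustrated on a binary virus-host matrix. The Python sketch below computes host range, viral range and a nestedness score using the NODF metric; NODF is chosen here only as one common nestedness measure and may differ from the metric used in the study, and the matrices are toy examples with hypothetical content.

```python
from itertools import combinations

def host_range(matrix):
    """Number of hosts per virus (rows are viruses, columns are hosts)."""
    return [sum(row) for row in matrix]

def viral_range(matrix):
    """Number of viruses per host."""
    return [sum(col) for col in zip(*matrix)]

def paired_overlap(vectors):
    """Sum of NODF paired-overlap scores over all pairs of rows (or columns)."""
    total = 0.0
    for a, b in combinations(vectors, 2):
        da, db = sum(a), sum(b)
        # Pairs with equal degree, or with an empty vector, contribute zero
        if da == db or min(da, db) == 0:
            continue
        shared = sum(x and y for x, y in zip(a, b))
        total += 100.0 * shared / min(da, db)
    return total

def nodf(matrix):
    """NODF nestedness (0 to 100) of a binary bipartite interaction matrix."""
    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*matrix)]
    n_pairs = len(rows) * (len(rows) - 1) / 2 + len(cols) * (len(cols) - 1) / 2
    return (paired_overlap(rows) + paired_overlap(cols)) / n_pairs

# Toy networks: a perfectly nested one and a fully modular (one-to-one) one
nested = [[1, 1, 1], [1, 1, 0], [1, 0, 0]]
modular = [[1, 0], [0, 1]]
print(host_range(nested), viral_range(nested), nodf(nested), nodf(modular))
```

In the nested toy network the specialist virus infects a subset of the hosts of the generalist viruses, which is the pattern that raises nestedness, whereas one-to-one pairings form separate modules and score zero.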
Fig. 5. Viral-host interaction structure across host lineages and sediment samples.
The prokaryotic richness and viral richness were estimated as the number of prokaryotic MAGs and vOTUs, respectively, within different phyla (a) or in sediment samples (b–e). a Bar graphs show the prokaryotic richness (blue) and viral richness (orange), while the line charts indicate host range (circle) and viral range (triangle) by host lineage. The modularity and nestedness in each sample were calculated based on the sub-networks derived from the overall virus-host interaction networks by retaining the viral and host populations present in each sample. The blue lines denote linear regression relationships between prokaryotic richness or viral richness, and modularity (b, d) or nestedness (c, e) of sub-networks across the 90 sediment samples. The adjusted R2 values for each linear regression are presented. The statistical test used was two-tailed. Source data are provided in the Source Data file.
Case study of viral AMGs
To further elucidate virus-host interactions, we analysed viral AMGs to assess whether abiotic factors impact viral functions, which in turn affect host metabolism and sediment biogeochemistry. We focused on phosphorus (P) metabolism-related genes because of their putative roles in response to P deficiency in AMD environments32,33. We identified 75 viral genes annotated as phosphate starvation-inducible protein (phoH)34, which were assigned to the orthologous groups 4QCHF and COG0172 (Fig. 6a and Supplementary Data 5). To further explore the origin of these predicted viral phoH genes, 111 homologs from the eggNOG database (v5.0.0) and 114 homologs from the recovered MAGs were recruited and combined to build a phylogenetic tree (Fig. 6a and Supplementary Data 6). The result showed that the phoH genes were widely distributed in both prokaryotes and viruses and clustered phylogenetically. Further examination of the recovered phoH genes showed that genes assigned as 4QCHF were mostly clustered with their counterparts from viruses and Bacteroidota, while genes assigned as COG0172 were mostly affiliated with homologs from Proteobacteria and Patescibacteria. Interestingly, a significant increase in the total abundance of the phoH genes was observed with decreasing concentrations of total P (TP) and available P (AP) in the sediments, suggesting that the viral phoH genes might be induced under P starvation in AMD sediments (Fig. 6b).
Fig. 6. Genomic analyses of viral phosphorus (P) metabolism-related genes.
a Maximum-likelihood phylogenetic tree with phoH genes from the AMD sediments (indicated by stars) compared to homologs found in eggNOG v5.0.0 database and the host proteins colored by different phyla (the outer color ring). b Linear regression relationships between the total abundance of viral phoH genes and the concentrations of total P (TP) and available P (AP). The statistical test used was two-tailed. c Genome map of a latent provirus genome containing phnCDE genes annotated by eggNOG v5.0.0 database. Genes related to information storage and processing are shown in blue; genes related to metabolism and transport are shown in yellow; genes related to cellular processes and signaling are shown in green; virus-specific genes are in red; phnCDE genes are in purple; and unknown genes are in grey. Detailed function descriptions of the nine viral scaffolds are listed in Supplementary Data 5. Source data are provided in the Source Data file.
In addition, we assembled a provirus genome encoding the first three genes of the phn operon (phnCDE), which also belongs to the pho regulon and comprises a binding protein-dependent transporter involved in the uptake of P in the form of phosphonate (Fig. 6c and Supplementary Data 5)35. This provirus genome covered 72% of the whole fragment that was ‘co-binned’ with a host population genome (FK3.bin20) classified as Burkholderiales of Gammaproteobacteria (Supplementary Data 5). Meanwhile, 11 additional Burkholderiales populations were predicted as hosts of the provirus based on BLASTn of genomic content, as evidenced by the significant positive correlation between the abundance of provirus and these Burkholderiales populations (Supplementary Fig. 5). Furthermore, phylogenetic analyses indicated that the phnCDE genes identified in the provirus were affiliated with homologous genes from Burkholderiales spp. in the eggNOG v5.0.0 database, implying a potential origin of these viral functional genes (Supplementary Fig. 6 and Supplementary Data 6).
Discussion
Recent metagenomic and viromic surveys have uncovered an unprecedented diversity of viruses in both aquatic and terrestrial environments12. Fully accessing viral biodiversity is important for the study of biogeographic patterns but represents a major challenge especially for soil and sediments, where viruses are typically diverse and abundant29,36. To bypass this hurdle, we adopted a total metagenome approach to uncover viral taxonomic and functional diversity in AMD sediments and generated a large number of viral genomes and genes. It should be noted, however, that a recent study showed that viromes outperformed metagenomes in recovering viral contigs especially the rare taxa from agricultural soils, indicating the limitation of using metagenomes alone to explore viral communities in complex environmental samples37. Thus, a virome-based approach would likely capture more viral populations in our AMD sediments.
Annotation through the reticulate method revealed that a vast majority of our predicted viral genomes could not be taxonomically classified (Fig. 1d), highlighting the uniqueness of viral populations unearthed in the current study. Such a low annotation rate is largely attributable to the absence of complete genomes of viral isolates from AMD and associated environments. This finding suggests that, despite extensive meta-omics analyses of the prokaryotic communities residing in the AMD model system15, our knowledge of the viral biodiversity therein remains disproportionately limited20,21,38,39. Nearly one third of the predicted viral proteins could be annotated by the eggNOG v5.0.0 database30, and they were mostly assigned to known functions that are pivotal for the survival and proliferation of viruses. These metabolic functions have previously been found over-represented in viral assemblages in other habitats40,41, indicating a universal distribution of viral core genes while there is also evidence of adaptation of certain viral functions to specific environments42.
The viral taxonomic and functional richness in our study follows the latitudinal diversity gradient paradigm that suggests higher biodiversity in the tropics with a decrease toward the poles (Fig. 2a). While in general agreement with the diversity patterns of other domains of life43,44, more samples from a wider range of latitudes should be analysed to verify this result. The overall effect of latitude on viral taxonomic and functional richness in the AMD sediments may be primarily attributable to the variations in prokaryotic richness (Fig. 3). However, the role of other factors, in particular pH and MAT, in directly shaping the number of viral populations and functions should not be overlooked. The mechanism explaining the influence of pH and MAT remains unknown, but decreased pH and increased MAT not only exert impacts on prokaryotes and consequently alter the indigenous viral assemblies, but also may increase the fitness cost of viruses persisting in the environment.
Our analyses identified ferric iron concentration as the most important environmental factor governing viral abundance in the AMD sediments (Fig. 2a and Fig. 3). The Ferrojan horse hypothesis proposes that phages with iron ions incorporated into their tail fibers may effectively infect hosts by competing with siderophore-bound iron for uptake receptors45. Therefore, non-Ferrojan viruses would have a fitness advantage in iron-replete conditions46. The iron-rich AMD sediments may thus favor the survival and enrichment of non-Ferrojan viruses, contributing to the variation of viral abundance observed in the current study. Another possibility is the adsorption of viral particles onto iron-bearing minerals precipitated from the water phase to the sediments, as previous investigations have documented strong relationships between viral abundance and mineral saturation indices47,48. A similar scenario (i.e., the attachment of viruses on particles and then co-precipitation to the seafloor) has been demonstrated in the marine environment49. While mineral attachment may render these viruses inactive, they could subsequently be released with increased pH since minerals with higher isoelectric point tend to be a better adsorbent of viruses48.
The biogeographic pattern that community similarity decreases with increasing geographical distance has been observed in both prokaryotic and microbial eukaryotic communities50,51. Our results extend this pattern to the viral world, revealing a scale-dependent distance-decay distribution of viral taxonomic and functional composition (Fig. 2d, e). Meanwhile, the SEM indicated that MAT, MAP, distance from the equator, and pH were most important in shaping prokaryotic assemblages, which in turn were the major driver of viral taxonomic and functional composition. This contrasts with results from our previous biogeography survey of prokaryotes in AMD solutions, where pH was the strongest predictor of microbial community composition52, but is consistent with the patterns in marine viruses, in which viral communities are influenced by temperature and latitude13,14. Furthermore, our data suggest that the distribution of viral populations and functions is unlikely to be primarily affected by environmental variables and geographic distance, but rather by host composition. The strong influence of prokaryotes on viral communities has also been observed in previous studies19,29 and could be partly attributable to the parasitic lifestyle of viruses. However, it might also reflect a methodological limitation, namely that viral genomes recovered from bulk metagenomes are biased toward intracellular viruses, and thus should be interpreted with caution12.
The tight coupling between viral taxonomic and functional composition and prokaryotes was further corroborated by our host prediction analysis, which described numerous virus-host interactions at the population level. Using the predicted virus-host linkages, we demonstrated that almost all viruses exhibited parallel variations in abundance with their hosts (Fig. 4a), consistent with these being genuine virus-host pairs. Notably, total viral abundance was better described as a nonlinear, polynomial function of prokaryotic abundance. This pattern is probably due to the different VHRs between the two dominant phyla: a decrease in the abundance of Proteobacteria created niche space for Thermoplasmatota to fill, whereas the significantly lower VHRs in Thermoplasmatota might result in the observed shallower increase in the abundance of Thermoplasmatota-associated viruses (Fig. 4b). Meanwhile, the decrease in viral abundance at higher prokaryotic abundance is unlikely to be a result of a switch in viral life strategy from virulent to temperate, since virulent viruses were more abundant in Thermoplasmatota-dominated samples (Fig. 4d). Additionally, the specialisation or generalisation of virus-host interactions depends on the host group, as indicated by the lineage-specific host range and viral range (Fig. 5a). Furthermore, the relationships of modularity and nestedness with prokaryotic and viral richness support experimental models showing that an increase in host or viral diversity can select for generalised over specialised phages53,54.
Thus far, very limited information is available on viral AMGs in extreme AMD environments21. Our study identified a number of pho regulon genes (i.e., phoH and phnCDE) in the predicted viral genomes (Fig. 6). This suggests frequent horizontal gene transfers (HGTs) of these different types of P metabolism-related genes, an inference further supported by the phylogenies of the phoH and phnCDE genes, as well as by previous reports of pho regulon genes in viral genomes55,56. That none of the viral phnCDE genes were affiliated with homologs from the prokaryotic MAGs recovered from the AMD sediments may be a result of mutations that have accumulated in these genes. As AMD and associated environments are often oligotrophic, the identified P metabolism-related genes may provide the viruses with the ability to supplement or sustain P assimilation in their hosts, indicating an important adaptation in AMD environments. The observed negative correlations between the total abundance of the phoH genes and the concentrations of TP and AP support this assumption. It should be noted, however, that the roles and relative importance of phage-encoded phoH genes in the P cycle have not been fully resolved57–59. Divergent functions such as RNA modification and lipid metabolism have also been documented for these genes60. On the other hand, phoH has been developed as a novel biomarker for assessing phage diversity in the environment56. The identification of phoH genes in our AMD sediments provides evidence for the wide distribution of these viral AMGs in different habitats, including extreme environments.
Our study contributes to the understanding of viral biogeography by providing an initial view of the community patterns and ecological constraints of viruses populating an extreme environment. Our data suggest that the dynamics of viral populations and functions are governed primarily by their hosts, and are also directly or indirectly correlated with other environmental and geographical variables. Extensive prokaryotic genome recovery from the metagenomic data set further refines our knowledge of how host abundance and diversity may affect virus-host interplay, from the perspectives of VHRs and interaction structure, respectively. Future efforts are needed to resolve the mechanisms shaping the viral biogeographic patterns observed in the AMD model system, and to examine whether such findings are relevant to other types of extreme environments on the planet.
Methods
Sample collection
AMD sediments were collected from 18 mine sites in six provinces across Southern China (22.96°−31.68°N, 105.73°−118.63°E) from August to October in 201724. These samples (10 for each site) represent a wide range of mineralogy and environmental conditions. Samples were collected using a shovel from the top 10 cm of AMD sediments either at the center or at ~1 m from the edge of AMD ponds depending on the safety and size of the features at each mine site. The samples were sealed in 50 mL sterile tubes, kept in an icebox and transported to the laboratory, where they were stored at 4 °C and processed within 24 h. Each sediment was well mixed and divided into two fractions: one fraction for DNA extraction (subsequently stored at −80 °C) and the other for physicochemical measurements (air-dried)24.
Environmental measurements
Geochemical parameters were determined with standard methods24. Specifically, air-dried subsamples were ground, passed through 20-mesh and 100-mesh sieves, and stored at ambient temperature until use. Total organic carbon (TOC) (TOC-VCPH; Shimadzu, Columbia, MD), total nitrogen (TN) and TP (SmartChem; Westco Scientific Instruments Inc., Brookfield, CT) were analysed with standard methods (0.2 g each). AP was determined colorimetrically by the molybdenum blue method at 700 nm wavelength61 (5.0 g of subsamples). For measuring pH and EC, 4.0 g of sediment was mixed with 10 mL of deionised water (1:2.5 (w/v)) and the supernatant was then measured using a pH meter and an EC meter. The concentrations of HCl-extractable ferrous iron (Fe2+) and ferric iron (Fe3+) were determined by UV colorimetric assay with the 1,10-phenanthroline method at 530 nm wavelength (1.0 g of subsamples)62, and sulphate (SO42-) was measured by a BaSO4-based turbidimetric method (2.0 g of subsamples)63. Total concentrations of heavy metals (including Pb, Zn, Cu, Cd, Fe, and Mn) were determined by inductively coupled plasma optical emission spectrometry (ICP-OES; Optima 2100DV, PerkinElmer, Wellesley, MA) after digestion of 0.2 g of sediment with an HNO3/HCl mixture (1:3 (v/v)). Estimates of the MAT and MAP were obtained from the WorldClim2 database (www.worldclim.org).
DNA extraction and metagenomic sequencing
Total DNA was extracted from 10 g of each sediment, which was pretreated with 30 mL of a solution containing 0.1 mol/L ethylene diamine tetraacetic acid (EDTA), 0.1 mol/L Tris (pH 8.0), 1.5 mol/L NaCl, and 0.1 mol/L NaH2PO4 and Na2HPO4 before extraction with the FastDNA Spin Kit (MP Biomedicals, Irvine, CA)24,64. Extracted DNA was purified using the QIAquick Gel Extraction Kit (Qiagen, Chatsworth, CA). Finally, a total of 90 samples (with the other samples being discarded due to their low DNA yield/quality) were used for library preparation with NEBNext Ultra II DNA Prep Kit (New England Biolabs, MA) and sequenced from both ends with MiSeq Reagent Kit v3 on an Illumina MiSeq platform (150 bp, paired-end reads). In total, this generated ~7 Tb of raw metagenomic read data.
Processing of metagenomic sequence data
Metagenomic reads were quality filtered and trimmed using in-house Perl scripts. A trim quality threshold of 30 was used and reads containing more than five ‘N’s were discarded. All quality-controlled reads from a sediment sample were assembled using SPAdes v3.14.1 and kmers of 21, 33, 55, 77, 99, 127 under the ‘--meta’ mode65. Genes were predicted by Prodigal 2.6.3 with the parameters set as ‘-p meta -g 11 -f gff -q -m’66. For functional annotation, the protein-coding sequences were separately compared against the Pfam v33.167, Kyoto Encyclopedia of Genes and Genomes (KEGG) database68, Non-supervised Orthologous Groups (eggNOG v5.0.0)30, and Virus Orthologous Group (VOG, http://vogdb.org, Accessed 5 Oct. 2021) with a threshold of 50 for bit score and 10−5 for E-value. Annotations with the lowest E-value in each database were then selected as the best hits for the proteins.
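The read-filtering rule described above can be illustrated with a minimal Python sketch. The in-house Perl scripts are not reproduced here: the function names, the choice of trimming from the 3′ end, and the handling of fully trimmed reads are assumptions made only for illustration. Only the quality threshold of 30 and the limit of five ‘N’s come from the text.

```python
def trim_3prime(seq, quals, min_q=30):
    """Trim bases from the 3' end until a base with quality >= min_q is reached.

    Assumption: trimming direction is not stated in the text; 3' trimming is
    used here only as a common convention.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def passes_n_filter(seq, max_n=5):
    """Keep a read only if it contains at most max_n ambiguous 'N' bases."""
    return seq.upper().count("N") <= max_n


def filter_read(seq, quals, min_q=30, max_n=5):
    """Return the trimmed (sequence, qualities) pair, or None if discarded."""
    if not passes_n_filter(seq, max_n):
        return None
    trimmed_seq, trimmed_quals = trim_3prime(seq, quals, min_q)
    if not trimmed_seq:  # nothing left after trimming
        return None
    return trimmed_seq, trimmed_quals
```

A read with exactly five ‘N’s is retained under this reading of "more than five", whereas a read with six is dropped.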
Identification and clustering of viral genomes
Three methods were employed separately to identify viral genomes in the metagenomic assemblies: (1) viral protein families25, (2) VirSorter v1.0.6 software26, and (3) CheckV v0.6.0 software27. Specifically, viral protein families were downloaded from the Integrated Microbial Genomes with Microbiome (IMG/M) system and used as bait to screen the proteins of metagenomic contigs longer than 10 kb (hmmsearch v3.3.2, threshold of 10−5 for E-value)69. Contigs with five or more viral protein families were collected and then filtered based on the number of genes covered with Pfams and KO terms25. Meanwhile, VirSorter (run with default parameters using the ‘virome’ database) was also used to recover viral contigs longer than 10 kb and those identified as categories 1 and 2 were retained and curated, as described previously70. Additionally, prophages identified as VirSorter categories 4 and 5 were processed with CheckV ‘contamination’ program to identify and remove host contaminations27. Finally, viral genomes predicted by the three methods were pooled. All predicted viral genomes originating from eukaryotic viruses based on a BLAST affiliation of the genes to the NCBI RefseqVirus database (ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral, Accessed 20 July. 2020) were removed71. Besides, predicted viral genomes with no genes displaying a best BLAST hit to prokaryotic viruses were also excluded.
The identified viral genomes were clustered into vOTUs using the parameters 95% average nucleotide identity (ANI) and 85% alignment fraction of the smallest scaffolds based on the scripts (https://bitbucket.org/berkeleylab/checkv/src/master/) provided in CheckV27. Representative viral population genomes were then analysed with DeePhage v1.0 to distinguish life strategies (virulent or temperate)31. Genes of all identified viral genomes were predicted by Prodigal 2.6.3 (with the parameters set as ‘-p meta -g 11 -f gff -q -m’)66, and clustered by using cd-hit (-n 4 -d 0 -g 1; 60% identity and 80% coverage)72. Reads from each of the 90 sediment metagenomes were mapped to the viral representative genomes and genes using BamM ‘make’ v1.7.3 (http://ecogenomics.github.io/BamM/) with default parameters, and the coverage of each sequence was calculated with BamM ‘parse’ v1.7.3 using the ‘tpmean’ coverage mode (removing the highest 5% and the lowest 5% coverage regions; minimum nucleotide identity of 95%; minimum aligned length of 75% of each read). The abundance of a given scaffold or gene was computed as the average scaffold or gene coverage divided by the number of reads in a given library and multiplied by the mean number of reads across the 90 libraries. For taxonomic assignment, a gene content-based network analysis was used to taxonomically place the viral representative genomes in the context of known viruses28. Briefly, predicted proteins from viral genomes were clustered with predicted proteins from isolate reference viruses (v201) based on an all-versus-all BLASTp search with an E-value of 10−3, and protein clusters were defined with the Markov clustering algorithm and processed using vConTACT v2.028. Meanwhile, predicted viral proteins were aligned against the NCBI Viral RefSeq v201 database using BLASTp with a threshold of 50 for bit score and 10−5 for E-value. The LCA algorithm was then used for taxonomic analysis of each viral genome based on the taxonomic rank of annotated proteins29.
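The abundance normalisation described above (coverage divided by library size, then rescaled by the mean library size) can be written as a short Python sketch. The function name and the dictionary-based data layout are assumptions for illustration and do not reflect the scripts actually used.

```python
def normalise_abundance(coverage, library_reads):
    """Normalise the coverage of one scaffold or gene across libraries.

    coverage:      dict mapping sample -> mean coverage of the scaffold/gene
    library_reads: dict mapping sample -> number of reads in that library
    Returns a dict mapping sample -> normalised abundance, computed as
    coverage / reads_in_library * mean(reads over all libraries).
    """
    mean_reads = sum(library_reads.values()) / len(library_reads)
    return {
        sample: coverage[sample] / library_reads[sample] * mean_reads
        for sample in coverage
    }
```

Under this scheme, two libraries whose coverage differs only in proportion to sequencing depth yield the same normalised abundance.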
Recovery of prokaryotic population genomes
Prokaryotic population genomes were recovered from the 90 sediment metagenome assemblies (excluding free viral genomes) using MetaBAT v2.12.173, MaxBin v2.2.274, Abawaca v1.0075, and Concoct v0.4.076 with default parameters, considering tetranucleotide frequencies, scaffold coverage and GC content. The resulting bins were then combined using DASTool v1.1.277, and further manually curated to obtain high-quality genomes using RefineM v0.0.2478. These genomes were then classified using the genome taxonomy database (GTDB-Tk v1.6.0)79. The completeness and contamination of genome bins were assessed using CheckM v1.1.3 with default parameters, except for those assigned to Patescibacteria, which were assessed using a smaller set of markers80. Genomes estimated to be ≥ 50% complete and < 10% contaminated were selected to calculate the ANI. Genomes with > 97% ANI over > 70% alignment were grouped as a population, and the genome with the highest quality score, calculated as ‘completeness – 4 × contamination’, in each population was chosen as the representative81. Finally, reads from each of the 90 sediment metagenomes were mapped to the set of dereplicated genomes using BamM v1.7.3 as described above for the viral sequences (Supplementary Data 7).
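As an illustration of the quality criteria above, the following Python sketch applies the completeness and contamination cut-offs and selects a representative by the score ‘completeness – 4 × contamination’. The function names and the tuple layout are hypothetical; the ANI-based grouping into populations is assumed to have been done already.

```python
def quality_score(completeness, contamination):
    """Score used to rank genomes within a population."""
    return completeness - 4 * contamination


def passes_quality(completeness, contamination):
    """Genomes retained for ANI comparison: >= 50% complete, < 10% contaminated."""
    return completeness >= 50 and contamination < 10


def pick_representative(genomes):
    """Pick the best-scoring genome of one population.

    genomes: list of (name, completeness, contamination) tuples that already
    belong to the same population (grouping by ANI is not shown here).
    """
    eligible = [g for g in genomes if passes_quality(g[1], g[2])]
    return max(eligible, key=lambda g: quality_score(g[1], g[2]))[0]
```

For example, a genome that is 90% complete with 1% contamination (score 86) is preferred over one that is 99% complete with 9% contamination (score 63).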
Virus–host linkage analyses
Viral genomes were putatively linked to their hosts in silico82. Briefly, these linkages were based on (1) shared genomic content between viral scaffolds and host genomes, (2) prophages identified in host genomes, and (3) sequence similarity between CRISPR-spacers in host genomes and protospacers in viral scaffolds. All viral genomes were compared to the recovered prokaryotic genomes using BLASTn (E-value ≤ 10−3, bit score ≥ 50, alignment length ≥ 2.5 kb and identity ≥ 70%)71. Viral genomes identified as prophages were matched to their corresponding host genomes. CRISPR spacers were recovered from metagenomic scaffolds using metaCRT with default parameters83. Extracted spacers were compared to viral scaffolds using BLASTn with thresholds of an E-value ≤ 10−10 and no mismatches over the whole spacer length71,84.
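The thresholds above translate into two simple filters, sketched here in Python. The `Hit` record and function names are hypothetical, and parsing of BLASTn output is not shown; the treatment of "no mismatches over the whole spacer length" as a zero-mismatch alignment spanning the full spacer is our reading of the text, and gap handling is not specified.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Minimal subset of fields from one BLASTn alignment (hypothetical record)."""
    evalue: float
    bitscore: float
    aln_len: int      # alignment length in bp
    identity: float   # percent identity


def is_genome_link(hit):
    """Shared-content criterion between a viral scaffold and a host genome."""
    return (
        hit.evalue <= 1e-3
        and hit.bitscore >= 50
        and hit.aln_len >= 2500
        and hit.identity >= 70.0
    )


def is_spacer_link(evalue, mismatches, aln_len, spacer_len):
    """CRISPR spacer criterion: full-length match with zero mismatches."""
    return evalue <= 1e-10 and mismatches == 0 and aln_len == spacer_len
```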
Viral AMGs analyses
The predicted viral proteins were assigned to eggNOG v5.0.0 database using BLASTp (threshold of 50 for bit score and 10−5 for E-value)30. As a result, 75 viral proteins were assigned as phoH genes (4QCHF and COG1702) and three were assigned as phn operon (phnCDE) genes. These viral proteins were compared to the host proteins and eggNOG v5.0.0 database (BLASTp, threshold of 50 for bit score and 10−3 for E-value) to recruit relevant sequences (up to 5 for each viral AMG sequence)71. Each set of viral AMGs was then aligned with Muscle v3.8.31 and filtered by TrimAL v1.4.rev22 to remove columns comprising more than 95% gaps85,86. Finally, phylogenetic trees were constructed using iqtree2 with the parameters set as ‘-mem 100GB -T 20 -m MFP -B 1000 --bnni’, and visualized and formatted in the Interactive Tree of Life online interface using the Newick file with the best tree topology87,88.
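The gap-filtering step can be illustrated with a minimal Python sketch that drops alignment columns in which more than 95% of sequences carry a gap. This is a stand-in for the trimming step, not the TrimAL implementation; the function name and the list-of-strings input are assumptions.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.95):
    """Remove alignment columns whose gap fraction exceeds max_gap_fraction.

    alignment: list of equal-length aligned sequences, with '-' as the gap
    character. Returns the filtered sequences in the same order.
    """
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    keep = [
        i
        for i in range(n_cols)
        if sum(seq[i] == "-" for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]
```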
Statistical analyses
Statistical analyses were implemented with various packages within the statistical program R v4.0.389. Biotic and abiotic matrices were standardised using the ‘decostand’ function in vegan v2.5-5 with the methods ‘Hellinger’ and ‘Standardize’, respectively90. Bray–Curtis dissimilarity was used to show distances for prokaryotic and viral community structure and function profiles, whereas Euclidean distances were calculated using environmental variables (vegan v2.5-5)90. Pearson correlations were performed using the ‘rcorr’ function (999 permutations) in Hmisc v4.2-0 to assess the relationships between the richness and abundances of viral populations and functions, prokaryotes and environmental variables in all samples91. Mantel tests were performed to reveal the correlations between the dissimilarity matrices (vegan v2.5-5)90. In all correlation analyses, P values were adjusted for multiple testing using the Benjamini and Hochberg false discovery rate controlling procedure (stats v4.0.3)92.
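For readers unfamiliar with these transformations, the following pure-Python sketch shows the Hellinger transformation, the Bray–Curtis dissimilarity and the Benjamini–Hochberg adjustment. It is an illustration of the standard definitions, not the R code used in this study; function names are our own, and rows are assumed to have a non-zero total.

```python
import math


def hellinger(row):
    """Hellinger transformation: square root of relative abundances."""
    total = sum(row)  # assumed to be > 0
    return [math.sqrt(x / total) for x in row]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```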
To understand how local spatial organisation of the viral communities varies within and across different AMD sites, PCoA (utilizing the Bray-Curtis dissimilarity metric), which allows dimensionality reduction, was used (vegan v2.5-5)90. The rate of the DDRs was calculated as the slope of a linear least squares regression of viral taxonomic and functional community composition similarity on log10-transformed geographical distance. SEM was used to tease apart the direct and indirect relationships among environmental and geographical variables, prokaryotic community composition, and viral taxonomic and functional composition (lavaan v2.1.2)93. Community composition was represented by PCoA PC1 based on the Bray-Curtis dissimilarity metric. A priori models were first constructed, considering all theoretical or empirical mechanisms whereby abiotic and biotic factors influence viral taxonomic and functional diversity, abundance and structure (Supplementary Fig. 2). The a priori models were then optimized until the final models were attained. A Chi-squared test and the RMSEA were used to evaluate the fit of models. Sub-networks for virus-host interactions in each sediment sample were also generated from meta-networks by preserving the viral or prokaryotic populations present in the sample. The modularity and nestedness values for each sub-network were computed with the ‘Brim’ and ‘NODF’ algorithms in the MATLAB BiMat package with 1000 permutations94. The Shapiro-Wilk test and Bartlett’s test were performed to check for normality and equal variance between groups92. Statistical significance of differences was then determined using the non-parametric Wilcoxon rank-sum test (unpaired)92.
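The distance-decay rate described above is an ordinary least squares slope, which can be sketched in a few lines of Python. The function name, the use of kilometres and the untransformed similarity values are assumptions for illustration; the text specifies only that geographical distance is log10-transformed.

```python
import math


def ddr_slope(distances, similarities):
    """Least squares slope of community similarity on log10(distance).

    distances:    pairwise geographical distances (must be > 0; units assumed km)
    similarities: matching pairwise community similarities
    """
    x = [math.log10(d) for d in distances]
    y = list(similarities)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx
```

A more negative slope indicates faster turnover of community composition with distance.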
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
This work was supported by the National Natural Science Foundation of China to L.N.H. (nos. 31870111 and 31570500) and to W.S.S. (no. 41830318), as well as by the Natural Science Foundation of Guangdong Province to L.N.H. (no. 2021A1515012468).
Author contributions
S.M.G., L.N.H., and W.S.S. designed the experiments. S.M.G., H.X.A., and J.Z. conducted the experiments and collected the data. S.M.G., Z.H.L. and H.C. analysed the data. S.M.G. and L.N.H. wrote the initial draft of the manuscript while D.P.-E., J.L.L. provided substantial feedback.
Peer review
Peer review information
Nature Communications thanks Han Olff, Alexander Probst, Erinne Stirling, Gareth Trubl and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Data availability
Raw reads of metagenomes and all assembled prokaryotic population genomes have been deposited in NCBI BioProject database under accession code PRJNA666025. Short Reads Archive accession numbers for individual reads are listed in Supplementary Data 8. Biosample accession numbers for individual prokaryotic genomes are listed in Supplementary Data 9. Assembled viral genomes are available from the NCBI BioProject database under accession code PRJNA648034. eggNOG database is available at http://eggnog5.embl.de/download/eggnog_5.0. NCBI viral RefSeq database is available at https://ftp.ncbi.nlm.nih.gov/refseq/release. WorldClim database is available at https://www.worldclim.org/data/worldclim21.html. Source data are provided with this paper.
Code availability
The in-house Perl scripts, R scripts, Matlab scripts, and relevant data used to generate figures of this study are provided with this paper and publicly available on GitHub at https://github.com/eco-gaoshaom/viral-biogeography (10.5281/zenodo.6374561).
Competing interests
The authors declare no competing interests.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-022-30049-5.
References
- 1. Torsvik V, Øvreås L, Thingstad TF. Prokaryotic diversity-magnitude, dynamics, and controlling factors. Science. 2002;296:1064–1066. doi: 10.1126/science.1071698.
- 2. Kuang J, et al. Predicting taxonomic and functional structure of microbial communities in acid mine drainage. ISME J. 2016;10:1527–1539. doi: 10.1038/ismej.2015.201.
- 3. Mod HK, et al. Predicting spatial patterns of soil bacteria under current and future environmental conditions. ISME J. 2021.
- 4. Pace NR. A molecular view of microbial diversity and the biosphere. Science. 1997;276:734–740. doi: 10.1126/science.276.5313.734.
- 5. Violle C, Reich PB, Pacala SW, Enquist BJ, Kattge J. The emergence and promise of functional biogeography. Proc. Natl Acad. Sci. USA. 2014;111:13690–13696. doi: 10.1073/pnas.1415442111.
- 6. Green JL, Bohannan BJ, Whitaker RJ. Microbial biogeography: from taxonomy to traits. Science. 2008;320:1039–1043. doi: 10.1126/science.1153475.
- 7. Daly RA, et al. Viruses control dominant bacteria colonizing the terrestrial deep biosphere after hydraulic fracturing. Nat. Microbiol. 2019;4:352–361. doi: 10.1038/s41564-018-0312-6.
- 8. Howard-Varona C, et al. Phage-specific metabolic reprogramming of virocells. ISME J. 2020;14:881–895. doi: 10.1038/s41396-019-0580-z.
- 9. Chevallereau A, Pons BJ, van Houte S, Westra ER. Interactions between bacterial and phage communities in natural environments. Nat. Rev. Microbiol. 2022;20:49–62. doi: 10.1038/s41579-021-00602-y.
- 10. Sullivan MB, Weitz JS, Wilhelm S. Viral ecology comes of age. Environ. Microbiol. Rep. 2017;9:33–35. doi: 10.1111/1758-2229.12504.
- 11. Brum JR, Sullivan MB. Rising to the challenge: accelerated pace of discovery transforms marine virology. Nat. Rev. Microbiol. 2015;13:147–159. doi: 10.1038/nrmicro3404.
- 12. Roux S, et al. Minimum information about an uncultivated virus genome (MIUViG). Nat. Biotechnol. 2019;37:29–37. doi: 10.1038/nbt.4306.
- 13. Brum JR, et al. Patterns and ecological drivers of ocean viral communities. Science. 2015;348:1261498. doi: 10.1126/science.1261498.
- 14. Gregory AC, et al. Marine DNA viral macro- and microdiversity from pole to pole. Cell. 2019;177:1109–1123. doi: 10.1016/j.cell.2019.03.040.
- 15. Shu WS, Huang LN. Microbial diversity in extreme environments. Nat. Rev. Microbiol. 2021.
- 16. Huang LN, Kuang JL, Shu WS. Microbial ecology and evolution in the acid mine drainage model system. Trends Microbiol. 2016;24:581–593. doi: 10.1016/j.tim.2016.03.004.
- 17. Hwang Y, Rahlff J, Schulze-Makuch D, Schloter M, Probst AJ. Diverse viruses carrying genes for microbial extremotolerance in the Atacama desert hyperarid soil. mSystems. 2021;6:e00385–21. doi: 10.1128/mSystems.00385-21.
- 18. Adriaenssens EM, et al. Environmental drivers of viral community composition in Antarctic soils identified by viromics. Microbiome. 2017;5:83. doi: 10.1186/s40168-017-0301-7.
- 19. Emerson JB, et al. Host-linked soil viral ecology along a permafrost thaw gradient. Nat. Microbiol. 2018;3:870–880. doi: 10.1038/s41564-018-0190-y.
- 20. Andersson AF, Banfield JF. Virus population dynamics and acquired virus resistance in natural microbial communities. Science. 2008;320:1047–1050. doi: 10.1126/science.1157358.
- 21. Gao SM, et al. Depth-related variability in viral communities in highly stratified sulfidic mine tailings. Microbiome. 2020;8:89. doi: 10.1186/s40168-020-00848-3.
- 22. Holmfeldt K, et al. The Fennoscandian Shield deep terrestrial virosphere suggests slow motion ‘boom and burst’ cycles. Commun. Biol. 2021;4:307. doi: 10.1038/s42003-021-01810-1.
- 23. Rahlff J, et al. Lytic archaeal viruses infect abundant primary producers in Earth’s crust. Nat. Commun. 2021;12:4642. doi: 10.1038/s41467-021-24803-4.
- 24. Hao YQ, et al. Microbial biogeography of acid mine drainage sediments at a regional scale across Southern China. FEMS Microbiol. Ecol. 2022;98:fiac002. doi: 10.1093/femsec/fiac002.
- 25. Paez-Espino D, Pavlopoulos GA, Ivanova NN, Kyrpides NC. Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data. Nat. Protoc. 2017;12:1673–1682. doi: 10.1038/nprot.2017.063.
- 26. Roux S, Enault F, Hurwitz BL, Sullivan MB. VirSorter: mining viral signal from microbial genomic data. PeerJ. 2015;3:e985. doi: 10.7717/peerj.985.
- 27. Nayfach S, et al. CheckV: assessing the quality of metagenome-assembled viral genomes. Nat. Biotechnol. 2021;39:578–585. doi: 10.1038/s41587-020-00774-7.
- 28. Bin Jang H, et al. Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks. Nat. Biotechnol. 2019;37:632–639. doi: 10.1038/s41587-019-0100-8.
- 29. Li Z, et al. Deep sea sediments associated with cold seeps are a subsurface reservoir of viral diversity. ISME J. 2021;15.
- 30. Huerta-Cepas J, et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 2019;47:D309–D314. doi: 10.1093/nar/gky1085.
- 31. Wu S, et al. DeePhage: distinguishing virulent and temperate phage-derived sequences in metavirome data with a deep learning approach. Gigascience. 2021;10:giab056. doi: 10.1093/gigascience/giab056.
- 32. Chen LX, et al. Comparative metagenomic and metatranscriptomic analyses of microbial communities in acid mine drainage. ISME J. 2015;9:1579–1592. doi: 10.1038/ismej.2014.245.
- 33. Liang JL, et al. Novel phosphate-solubilizing bacteria enhance soil phosphorus cycling following ecological restoration of land degraded by mining. ISME J. 2020;14:1600–1613. doi: 10.1038/s41396-020-0632-4.
- 34. Hsieh YJ, Wanner BL. Global regulation by the seven-component Pi signaling system. Curr. Opin. Microbiol. 2010;13:198–203. doi: 10.1016/j.mib.2010.01.014.
- 35. Stasi R, Neves HI, Spira B. Phosphate uptake by the phosphonate transport system PhnCDE. BMC Microbiol. 2019;19:79. doi: 10.1186/s12866-019-1445-3.
- 36. Narr A, Nawaz A, Wick LY, Harms H, Chatzinotas A. Soil viral communities vary temporally and along a land use transect as revealed by virus-like particle counting and a modified community fingerprinting approach (fRAPD). Front. Microbiol. 2017;8:1975. doi: 10.3389/fmicb.2017.01975.
- 37. Santos-Medellin C, et al. Viromes outperform total metagenomes in revealing the spatiotemporal patterns of agricultural soil viral communities. ISME J. 2021;15:1956–1970. doi: 10.1038/s41396-021-00897-y.
- 38. Tyson GW, Banfield JF. Rapidly evolving CRISPRs implicated in acquired resistance of microorganisms to viruses. Environ. Microbiol. 2008;10:200–207. doi: 10.1111/j.1462-2920.2007.01444.x.
- 39. Sun CL, et al. Phage mutations in response to CRISPR diversification in a bacterial population. Environ. Microbiol. 2013;15:463–470. doi: 10.1111/j.1462-2920.2012.02879.x.
- 40. Hurwitz BL, Westveld AH, Brum JR, Sullivan MB. Modeling ecological drivers in marine viral communities using comparative metagenomics and network analyses. Proc. Natl Acad. Sci. USA. 2014;111:10714–10719. doi: 10.1073/pnas.1319778111.
- 41. Jin M, et al. Diversities and potential biogeochemical impacts of mangrove soil viruses. Microbiome. 2019;7:58. doi: 10.1186/s40168-019-0675-9.
- 42. Dinsdale EA, et al. Functional metagenomic profiling of nine biomes. Nature. 2008;452:629–632. doi: 10.1038/nature06810.
- 43. Tedersoo L, et al. Fungal biogeography. Global diversity and geography of soil fungi. Science. 2014;346:1256688. doi: 10.1126/science.1256688.
- 44. Miraldo A, et al. An Anthropocene map of genetic diversity. Science. 2016;353:1532–1535. doi: 10.1126/science.aaf4381.
- 45. Bonnain C, Breitbart M, Buck KN. The Ferrojan horse hypothesis: iron-virus interactions in the ocean. Front. Mar. Sci. 2016;3:82. doi: 10.3389/fmars.2016.00082.
- 46. Muratore D, Weitz JS. Infect while the iron is scarce: nutrient-explicit phage-bacteria games. Theor. Ecol. 2021;14:467–487. doi: 10.1007/s12080-021-00508-8.
- 47.Kyle JE, Pedersen K, Ferris FG. Virus mineralization at low pH in the Rio Tinto. Spain Geomicrobiol. J. 2008;25:338–345. doi: 10.1080/01490450802402703. [DOI] [Google Scholar]
- 48.Kyle JE, Ferris FG. Geochemistry of virus–prokaryote interactions in freshwater and acid mine drainage environments, Ontario, Canada. Geomicrobiol. J. 2013;30:769–778. doi: 10.1080/01490451.2013.770978. [DOI] [Google Scholar]
- 49.Hewson I, O’Neil JM, Fuhrman JA, Dennison WC. Virus-like particle distribution and abundance in sediments and overlying waters along eutrophication gradients in two subtropical estuaries. Limnol. Oceanogr. 2001;46:1734–1746. doi: 10.4319/lo.2001.46.7.1734. [DOI] [Google Scholar]
- 50.Wu L, et al. Global diversity and biogeography of bacterial communities in wastewater treatment plants. Nat. Microbiol. 2019;4:1183–1195. doi: 10.1038/s41564-019-0426-5. [DOI] [PubMed] [Google Scholar]
- 51.Bates ST, et al. Global biogeography of highly diverse protistan communities in soil. ISME J. 2013;7:652–659. doi: 10.1038/ismej.2012.147. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Kuang JL, et al. Contemporary environmental variation determines microbial diversity patterns in acid mine drainage. ISME J. 2013;7:1038–1050. doi: 10.1038/ismej.2012.139. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Sant DG, Woods LC, Barr JJ, McDonald MJ. Host diversity slows bacteriophage adaptation by selecting generalists over specialists. Nat. Ecol. Evol. 2021;5:350–359. doi: 10.1038/s41559-020-01364-1. [DOI] [PubMed] [Google Scholar]
- 54.Betts A, Gray C, Zelek M, MacLean RC, King KC. High parasite diversity accelerates host adaptation and diversification. Science. 2018;360:907–911. doi: 10.1126/science.aam9974. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Goldsmith DB, Parsons RJ, Beyene D, Salamon P, Breitbart M. Deep sequencing of the viral phoH gene reveals temporal variation, depth-specific composition, and persistent dominance of the same viral phoH genes in the Sargasso Sea. PeerJ. 2015;3:e997. doi: 10.7717/peerj.997. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Goldsmith DB, et al. Development of phoH as a novel signature gene for assessing marine phage diversity. Appl. Environ. Microbiol. 2011;77:7730–7739. doi: 10.1128/AEM.05531-11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Martiny AC, Coleman ML, Chisholm SW. Phosphate acquisition genes in Prochlorococcus ecotypes: evidence for genome-wide adaptation. Proc. Natl Acad. Sci. USA. 2006;103:12552–12557. doi: 10.1073/pnas.0601301103. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Tetu SG, et al. Microarray analysis of phosphate regulation in the marine cyanobacterium Synechococcus sp. WH8102. ISME J. 2009;3:835–849. doi: 10.1038/ismej.2009.31. [DOI] [PubMed] [Google Scholar]
- 59.Zeng Q, Chisholm SW. Marine viruses exploit their host’s two-component regulatory system in response to resource limitation. Curr. Biol. 2012;22:124–128. doi: 10.1016/j.cub.2011.11.055. [DOI] [PubMed] [Google Scholar]
- 60.Kazakov AE, Vassieva O, Gelfand MS, Osterman A, Overbeek R. Bioinformatics classification and functional analysis of PhoH homologs. In Silico Biol. 2003;3:3–15. [PubMed] [Google Scholar]
- 61.Bray RH, Kurtz LT. Determination of total, organic, and available forms of phosphorus in soils. Soil Sci. 1945;59:39–46. doi: 10.1097/00010694-194501000-00006. [DOI] [Google Scholar]
- 62.Hill AG, et al. Standardized general method for the determination of iron with 1,10-phenanthroline. Analyst. 1978;103:391–396. doi: 10.1039/an9780300391. [DOI] [Google Scholar]
- 63.Chesnin L, Yien CH. Turbidimetric determination of available sulphate. Soil Sci. Soc. Am. Proc. 1951;15:149–151. doi: 10.2136/sssaj1951.036159950015000C0032x. [DOI] [Google Scholar]
- 64.Fang Y, et al. Modified pretreatment method for total microbial DNA extraction from contaminated river sediment. Front. Environ. Sci. Eng. 2015;9:444–452. doi: 10.1007/s11783-014-0679-4. [DOI] [Google Scholar]
- 65.Bankevich A, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012;19:455–477. doi: 10.1089/cmb.2012.0021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Hyatt D, et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinforma. 2010;11:119. doi: 10.1186/1471-2105-11-119. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.El-Gebali S, et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019;47:D427–D432. doi: 10.1093/nar/gky995. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44:D457–D462. doi: 10.1093/nar/gkv1070. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Eddy SR. Accelerated profile HMM searches. PLOS Comput. Biol. 2011;7:e1002195. doi: 10.1371/journal.pcbi.1002195. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Roux S, Hallam SJ, Woyke T, Sullivan MB. Viral dark matter and virus-host interactions resolved from publicly available microbial genomes. Elife. 2015;4:e08490. doi: 10.7554/eLife.08490. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Roux S, et al. Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature. 2016;537:689–693. doi: 10.1038/nature19366. [DOI] [PubMed] [Google Scholar]
- 72.Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28:3150–3152. doi: 10.1093/bioinformatics/bts565. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Kang DD, et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ. 2019;7:e7359. doi: 10.7717/peerj.7359. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Wu YW, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32:605–607. doi: 10.1093/bioinformatics/btv638. [DOI] [PubMed] [Google Scholar]
- 75.Brown CT, et al. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature. 2015;523:208–211. doi: 10.1038/nature14486. [DOI] [PubMed] [Google Scholar]
- 76.Alneberg J, et al. Binning metagenomic contigs by coverage and composition. Nat. Methods. 2014;11:1144–1146. doi: 10.1038/nmeth.3103. [DOI] [PubMed] [Google Scholar]
- 77.Sieber CMK, et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat. Microbiol. 2018;3:836–843. doi: 10.1038/s41564-018-0171-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Parks DH, et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat. Microbiol. 2017;2:1533–1542. doi: 10.1038/s41564-017-0012-7. [DOI] [PubMed] [Google Scholar]
- 79.Chaumeil PA, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2019;36:1925–1927. doi: 10.1093/bioinformatics/btz848. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25:1043–1055. doi: 10.1101/gr.186072.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Woodcroft BJ, et al. Genome-centric view of carbon processing in thawing permafrost. Nature. 2018;560:49–54. doi: 10.1038/s41586-018-0338-1. [DOI] [PubMed] [Google Scholar]
- 82.Edwards RA, McNair K, Faust K, Raes J, Dutilh BE. Computational approaches to predict bacteriophage-host relationships. FEMS Microbiol. Rev. 2016;40:258–272. doi: 10.1093/femsre/fuv048. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Rho M, Wu YW, Tang H, Doak TG, Ye Y. Diverse CRISPRs evolving in human microbiomes. PLoS Genet. 2012;8:e1002441. doi: 10.1371/journal.pgen.1002441. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.Paez-Espino D, et al. Uncovering Earth’s virome. Nature. 2016;536:425–430. doi: 10.1038/nature19094. [DOI] [PubMed] [Google Scholar]
- 85.Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinforma. 2004;5:113. doi: 10.1186/1471-2105-5-113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–1973. doi: 10.1093/bioinformatics/btp348. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Minh BQ, et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 2020;37:1530–1534. doi: 10.1093/molbev/msaa015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 88.Letunic I, Bork P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 2019;47:W256–W259. doi: 10.1093/nar/gkz239. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.R Development Core Team. R: A Language and environment for statistical computing. (2013).
- 90.Oksanen, J. et al. vegan: Community ecology package. R package version 2.5-5. (2019).
- 91.Harrell FE Jr, Dupont MC. The Hmisc package. R package version 4.2-0. 2019. [Google Scholar]
- 92.R Development Core Team. The R Stats Package. R package version 4.0.3 (2013).
- 93.Rosseel Y. Lavaan: An R package for structural equation modeling and more. Version 0.5-12 (BETA). J. Stat. Softw. 2012;48:1–36. doi: 10.18637/jss.v048.i02. [DOI] [Google Scholar]
- 94.Flores CO, Meyer JR, Valverde S, Farr L, Weitz JS. Statistical structure of host-phage interactions. Proc. Natl Acad. Sci. USA. 2011;108:E288–E297. doi: 10.1073/pnas.1101595108. [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
Raw metagenomic reads and all assembled prokaryotic population genomes have been deposited in the NCBI BioProject database under accession code PRJNA666025. Sequence Read Archive accession numbers for individual read sets are listed in Supplementary Data 8. BioSample accession numbers for individual prokaryotic genomes are listed in Supplementary Data 9. Assembled viral genomes are available from the NCBI BioProject database under accession code PRJNA648034. The eggNOG database is available at http://eggnog5.embl.de/download/eggnog_5.0. The NCBI viral RefSeq database is available at https://ftp.ncbi.nlm.nih.gov/refseq/release. The WorldClim database is available at https://www.worldclim.org/data/worldclim21.html. Source data are provided with this paper.
The in-house Perl, R, and MATLAB scripts and the relevant data used to generate the figures in this study are provided with this paper and are publicly available on GitHub at https://github.com/eco-gaoshaom/viral-biogeography (doi: 10.5281/zenodo.6374561).