mSystems. 2024 Feb 21;9(3):e00067-24. doi: 10.1128/msystems.00067-24

Increasing transposase abundance with ocean depth correlates with a particle-associated lifestyle

Juntao Zhong 1,2, Troy Osborn 1, Thais Del Rosario Hernández 1,3, Oleksandr Kyrysyuk 1,4, Benjamin J Tully 5, Rika E Anderson 1
Editor: Samuel Chaffron 6
Reviewed by: Tom Delmont 7
PMCID: PMC10949469  PMID: 38380923

ABSTRACT

Transposases are mobile genetic elements that move within and between genomes, promoting genomic plasticity in microorganisms. In marine microbial communities, the abundance of transposases increases with depth, but the reasons behind this trend remain unclear. Our analysis of metagenomes from the Tara Oceans and Malaspina Expeditions suggests that a particle-associated lifestyle is the main covariate for the high occurrence of transposases in the deep ocean, and this trend holds true for individual genomes as well as in a community-wide sense. We observed a strong and depth-independent correlation between transposase abundance and the presence of biofilm-associated genes, as well as the prevalence of secretory enzymes. This suggests that mobile genetic elements readily propagate among microbial communities within crowded biofilms. Furthermore, we show that particle association positively correlates with larger genome size, which is in turn associated with higher transposase abundance. Cassette sequences associated with transposons are enriched with genes related to defense mechanisms, which are more highly expressed in the deep sea. Thus, while transposons spread at the expense of their microbial hosts, they also introduce novel genes and potentially benefit the hosts in helping to compete for limited resources. Overall, our results suggest a new understanding of deep ocean particles as highways for gene sharing among defensively oriented microbial genomes.

IMPORTANCE

Genes can move within and between microbial genomes via mobile genetic elements, which include transposases and transposons. In the oceans, there is a puzzling increase in transposase abundance in microbial genomes as depth increases. To gain insight into this trend, we conducted an extensive analysis of marine microbial metagenomes and metatranscriptomes. We found a significant correlation between transposase abundance and a particle-associated lifestyle among marine microbes at both the metagenome and genome-resolved levels. We also observed a link between transposase abundance and genes related to defense mechanisms. These results suggest that as microbes become densely packed into crowded particles, mobile genes are more likely to spread and carry genetic material that provides a competitive advantage in crowded habitats. This may enable deep sea microbes to effectively compete in such environments.

KEYWORDS: transposase, marine microbiology

INTRODUCTION

Mobile genetic elements (MGEs) are segments of DNA that facilitate the movement of genetic sequences within and between bacterial and archaeal genomes (1). MGEs generally encode enzymes to mediate this process (1): in transposons, the mediating enzymes are transposases, and the mechanism of transfer results in inverted repeats (2). During migration, some MGEs carry cassette sequences that often contain functional genes, including those for antibiotic resistance (3), metal uptake (4, 5), and regulatory genes influencing gene expression in host cells (6, 7).

Transposases are the most abundant and ubiquitous genes in nature (8), although their distribution among taxa is uneven (9, 10). In marine systems, one of the most striking—and as of yet unexplained—trends in microbial genomics is the distinct increase in transposase abundance with depth. Genomic analyses of samples collected at station ALOHA in the North Pacific revealed a substantial increase in transposase abundance from 500 m to their observed maximum at 4,000 m (11, 12). Transposases were one of the most overrepresented cluster of orthologous gene (COG) categories in ALOHA deep waters, accounting for 1.2% of all fosmid sequences from 4,000 m (11, 13). Similarly, a study of hydrothermal chimneys demonstrated a high abundance of transposases in biofilms, comprising 8% of all metagenomic reads, which is 10 times higher than observed in metagenomes from other habitats (14). Based on these observations, our analyses focused on three central questions: (i) why does transposase abundance increase with depth in marine systems? (ii) are transposases selfish genes with neutral or deleterious effects on host genomes? and (iii) do transposases provide useful functions to the microbial hosts that harbor them?

To better understand the high transposase abundance in the deep sea and to gain insights into the role of MGEs in marine microbial communities, we analyzed 138 microbial metagenomes and 152 microbial metatranscriptomes from the Tara Oceans Expedition (15, 16). The Tara Oceans samples spanned depths from 5 m to 1,000 m. In order to represent bathypelagic microbial communities, we also incorporated 58 metagenomes from the 2010 Malaspina Expedition (17), collected between 2,400 m and 4,000 m. To explore the role of transposases at a genome-resolved level, we analyzed a total of 3,290 metagenome-assembled genomes (MAGs) previously generated from the Tara Oceans (18, 19, 20) and Malaspina (17) metagenomes. Previous studies have shown that deep-sea prokaryotes have a predominantly particle-associated lifestyle (21). Here, we show that the increasing abundance of transposases with depth in ocean microbial communities is associated with a shift toward an inferred particle-associated lifestyle. In addition to particle association, we identified taxonomy and genome size as key covariates with transposase abundance in MAGs. Additionally, we observed a high abundance of open reading frames (ORFs) in the functional category “defense mechanisms” among the ORFs in cassette sequences associated with transposases, suggesting that transposons introduce beneficial genes to their microbial hosts inhabiting highly competitive habitats.

RESULTS AND DISCUSSION

Previous studies have revealed that transposase abundance in microbial genomes is associated with increased ocean depth, lower dissolved oxygen (DO) concentrations, and a particle-associated lifestyle (as opposed to a planktonic lifestyle) (9, 12, 22). However, the reason and nature of these correlations have not been further explored. To confirm these findings on a broader scale, we screened for significant covariates for transposase abundance in the Tara Oceans and Malaspina metagenomes. We plotted the transposase abundance against depth and DO. In our metagenome analysis, transposase abundance was defined as the proportion of reads mapped back to all transposase ORFs (the database used to identify putative transposase ORFs is shown in Table S1). We confirmed that transposase abundance increases steadily as depth increases, despite large variations across samples (Fig. 1). In every ocean, the transposase abundances of the deep water samples were higher than those of the shallow water samples (Table S2). Here, “deep water” was defined as mesopelagic (depth 250 m–1,000 m) and bathypelagic (2,500–4,000 m) waters, and “shallow water” was defined as surface (<10 m) and deep chlorophyll maximum (DCM, depth 17 m–120 m) waters.
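As a minimal illustration of the abundance metric defined above, the sketch below computes the share of reads mapped to transposase ORFs and groups the four layers into the "shallow" and "deep" categories used here. The read counts are invented for illustration, and the choice of denominator (all reads in a sample) is an assumption, since the text does not spell it out.

```python
# Layer abbreviations follow the Fig. 1 caption; the grouping follows the text.
DEPTH_GROUP = {"SRF": "shallow", "DCM": "shallow", "MES": "deep", "BAT": "deep"}


def transposase_abundance(transposase_reads: int, total_reads: int) -> float:
    """Proportion of reads mapped to transposase ORFs (denominator is assumed)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return transposase_reads / total_reads


# Hypothetical samples: (layer, reads mapped to transposase ORFs, total reads).
samples = [("SRF", 1_200, 10_000_000), ("BAT", 9_500, 10_000_000)]
for layer, tpase_reads, total in samples:
    share = transposase_abundance(tpase_reads, total)
    print(f"{layer} ({DEPTH_GROUP[layer]}): {100 * share:.4f}% of reads")
```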

Fig 1.

Samples from the deep ocean have higher transposase abundance. The abundance of transposase ORFs in metagenomic samples from the Tara Oceans and Malaspina Expedition, separated by depth and by filter size fraction. SRF, surface; DCM, deep chlorophyll maximum; MES, mesopelagic zone; BAT, bathypelagic zone. For Tara Oceans samples, size fractions were between 0.22 µm and 3.0 µm (here called “particle associated”) and 0.22 µm and 1.6 µm (here called “planktonic”). For Malaspina samples, the “particle-associated” size fraction was between 0.8 µm and 20 µm, and the “planktonic” size fraction was between 0.2 µm and 0.8 µm. The widths of the boxplots reflect the sample count in each category (counts shown on the right).

To investigate how MGE abundances relate to the planktonic and particle-associated lifestyles, we calculated transposase abundance in metagenomes collected from different filter sizes. In the Tara Oceans samples, samples from the 0.22 µm–3.0 µm filter size fraction were considered to be enriched in particle-associated cells, and samples from the 0.22 µm–1.6 µm size fraction were considered to be enriched in planktonic cells. We focused on these two size fractions for full metagenomic analysis as they are considered to be prokaryote-enriched (16); larger size fractions were considered to be eukaryote-enriched and thus were not included for full metagenomic analysis. In the Malaspina samples, samples from the 0.8 µm–20 µm size fraction were considered to be particle-associated and those from the 0.2 µm–0.8 µm size fraction were considered to be planktonic (18). After controlling for depth, samples from particle-associated microbial communities were more enriched in transposases than their planktonic counterparts (Fig. 1; analysis of variance [ANOVA] F-test P = 0.0032).

The depth of a sample (e.g., surface, DCM) was a significant predictor for transposase abundance across all prokaryote-enriched size fractions (Table S3). Adding temperature, however, did not significantly improve the prediction of transposase abundance in a sample (ANOVA F-test P = 0.48, Table S3; Fig. S1A). Similarly, adding dissolved oxygen also did not improve the prediction of transposase abundance (ANOVA F-test P = 0.09, Table S3). This was partially due to the inconsistent relationship between DO and transposase abundance across depths. In the surface and bathypelagic samples, DO positively correlated with transposase abundance, but this correlation was reversed in DCM and mesopelagic samples (Fig. S1B).
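The ANOVA F-tests reported here compare a depth-only model with a model that adds one covariate. The sketch below shows the standard nested-model F statistic computed from residual sums of squares. The function name and all numbers are hypothetical and are not taken from the study's code; converting F to a P value requires the F distribution, which is left to a statistics package.

```python
def nested_f_statistic(rss_reduced: float, rss_full: float,
                       params_reduced: int, params_full: int,
                       n_samples: int) -> float:
    """F statistic for testing whether the extra terms in the full model help."""
    extra_params = params_full - params_reduced
    residual_df = n_samples - params_full
    if extra_params <= 0 or residual_df <= 0:
        raise ValueError("full model must add parameters and leave residual df")
    explained_per_param = (rss_reduced - rss_full) / extra_params
    unexplained_per_df = rss_full / residual_df
    return explained_per_param / unexplained_per_df


# Hypothetical example: depth-only model (4 parameters) vs. depth + temperature.
f_value = nested_f_statistic(rss_reduced=10.0, rss_full=9.9,
                             params_reduced=4, params_full=5, n_samples=105)
print(f"F = {f_value:.2f} on 1 and 100 degrees of freedom")
```

A small F, as in this invented case, corresponds to a covariate that adds little beyond depth.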

Transposase abundance is positively correlated with a particle-associated lifestyle in all depths

To further investigate the relationship between transposase abundance and particle-associated lifestyles, we examined the correlation between transposase abundance and the percentage of predicted secretory carbohydrate-active enzymes (CAZymes) and peptidases among all CAZyme and peptidase ORFs (see Materials and Methods). We conducted these analyses across all prokaryote-enriched metagenomic samples. The percentage of secretory enzymes among all CAZymes and peptidases has previously been applied to quantify the degree to which microbial communities rely on a particle-associated lifestyle (21). Microorganisms rely on extracellular enzymes to degrade large particulate organic carbon into compounds of smaller molecular weight to incorporate into the cell (21, 23), and CAZymes and peptidases are key enzymes for carbohydrate and protein degradation, respectively (24, 25). To account for differences in biological processes at different depths, we normalized the gene and transcript abundance of predicted secretory CAZymes and peptidases by those of all CAZyme and peptidase ORFs in metatranscriptomic and metagenomic samples (the bathypelagic zone was excluded for all transcript analyses, because no metatranscriptomes were available for Malaspina samples). The large number of CAZymes and peptidases in the database makes it challenging to identify their exact binding substrates, which may limit our ability to quantify the particle association of a microbial community. However, microorganisms often use biofilms to attach to the surfaces of particles (23), and thus, we paired the CAZyme/peptidase analysis with a secondary analysis quantifying the abundance of biofilm-associated genes using a manually curated database of biofilm-associated ORFs (see Table S4 for accession numbers for the database of biofilm-associated genes).

From the Tara Oceans metagenomes, we identified 52,890 secretory CAZymes out of 421,080 CAZyme ORFs, and 179,932 secretory peptidases out of 1,294,210 peptidase ORFs; from the Malaspina metagenomes, we identified 7,002 secretory CAZymes out of 40,415 CAZyme ORFs, and 18,014 secretory peptidases out of 67,617 peptidase ORFs (see Materials and Methods). We observed an increasing percentage of secretory CAZymes and peptidases with depth in both metagenomic and metatranscriptomic samples (Fig. S2A through D), which was consistent with previous findings (21). The biofilm-associated ORFs showed a similar increase in gene and transcript abundance with depth (Fig. S2E and F). The increase in the percentage of secretory CAZymes and peptidases with depth indicates increased extracellular enzyme activity, suggesting a shift toward a particle-associated lifestyle with increasing depth.
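For orientation, the pooled counts quoted above can be converted to percentages directly. The short sketch below does only that arithmetic; these data set-wide values are not the per-sample percentages plotted in the supplementary figures, and the variable and function names are ours.

```python
# (secretory ORFs, all ORFs) as reported in the text for each data set.
counts = {
    ("Tara Oceans", "CAZyme"): (52_890, 421_080),
    ("Tara Oceans", "peptidase"): (179_932, 1_294_210),
    ("Malaspina", "CAZyme"): (7_002, 40_415),
    ("Malaspina", "peptidase"): (18_014, 67_617),
}


def percent_secretory(secretory: int, total: int) -> float:
    """Share of ORFs in an enzyme family that are predicted to be secreted."""
    return 100 * secretory / total


pooled = {key: percent_secretory(*pair) for key, pair in counts.items()}
for (dataset, enzyme), pct in pooled.items():
    print(f"{dataset} {enzyme}: {pct:.1f}% secretory")
```

Both enzyme families show a higher pooled secretory share in the bathypelagic Malaspina data than in the shallower Tara Oceans data, in line with the depth trend described in the text.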

Marine particles are thought to be hotspots for horizontal gene transfer and transposase propagation (25), and the increased importance of a particle-associated lifestyle with depth in the ocean offers a promising explanation for the elevated transposase abundance in the deep ocean. In both metagenomic and metatranscriptomic samples, the abundance of transposases and the percentage of secretory CAZymes and peptidases were strongly correlated in general as well as within each depth (Fig. 2). Given a model with only depth as predictor of the transposase abundance in a sample, the addition of secretory CAZymes and peptidases each significantly improved the accuracy of prediction (both ANOVA F-test P = 3 × 10⁻⁷, Table S3). The depth-independent association between a particle-associated lifestyle and transposase abundance was also supported by persistent correlations between the abundance of transposase genes/transcripts and those of biofilm-associated ORFs in each depth (Fig. S3). Correlations in metagenomes and metatranscriptomes suggested that transposases are more abundant and more frequently transcribed in microbial communities with a greater reliance on a particle-associated lifestyle. After observing such trends at the community level, we sought to determine if these trends hold true at a genome-resolved level to support our hypothesis that the particle-associated lifestyle is a main driver for the high transposase abundance in the deep ocean.
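The correlations reported here are Spearman rank correlations. As a rough sketch of what that statistic measures, the code below ranks both variables (averaging tied ranks) and takes the Pearson correlation of the ranks. It is a generic textbook implementation with made-up inputs, not the authors' analysis code, and it raises an error if either variable is constant.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Hypothetical per-sample values: % secretory CAZymes and transposase abundance.
secretory_pct = [10.5, 11.2, 12.8, 14.1, 16.0]
transposase_pct = [0.010, 0.015, 0.013, 0.030, 0.045]
print(f"rho = {spearman_rho(secretory_pct, transposase_pct):.2f}")
```

Because only ranks enter the calculation, the statistic is unaffected by monotonic transformations such as the log scaling used for transcript abundances.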

Fig 2.

The gene potential (DNA) and transcript abundance (RNA) of transposases correlates with those of secretory CAZymes and peptidases. (A) The correlation between the abundance of transposases and secretory CAZyme and peptidases in metagenomes, separated according to marine layer. (B) The same correlation in metatranscriptomes: colors represent different ocean depths. The Spearman’s correlation coefficients, ρ, are shown on the top left. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. Note that the percent transcript abundances are log-transformed.

Genomes from particle-associated samples have elevated transposase abundance

In order to determine whether the patterns identified above were consistent at the genome level, we undertook an analysis of 1,888 MAGs generated from co-assemblies of the Tara Oceans data that had previously been characterized across distinct size fractions (20). One major advantage of this approach was that transposases could be quantified on a genome-by-genome basis, and each genome could be designated as particle-associated or not based on its relative abundance in metagenomes from different size fractions. The 1,888 MAGs we used for this analysis were mostly bacterial (n = 1,778) with the rest being archaeal (n = 110), and all MAGs exhibited >70% completion and an average redundancy of 2.5%. From this analysis, we found that MAGs from the smallest size fraction (0.22 µm–5 µm), which should largely consist of planktonic cells, contained fewer transposases than MAGs from each of the larger size fractions, each of which should largely consist of particle-associated cells (Fig. 3). In particular, the proportion of MAGs with no transposases at all was found to be significantly different in the smallest size fraction compared to the larger size fractions (χ²-test, P < 2.2 × 10⁻¹⁶). On the other hand, the proportion of MAGs with no transposases was not found to be significantly different among the larger size fractions (χ²-test, P = 0.8156), as would be expected if the MAGs from the smallest size fraction were largely planktonic and if MAGs from the larger size fractions were largely particle-associated. Taken together, these insights provide strong support for the hypothesis that transposase abundance is linked to a particle-associated lifestyle.
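The comparison of "no transposase" proportions can be pictured as a 2 × 2 contingency test. The sketch below computes a Pearson χ² statistic without continuity correction and its P value for one degree of freedom. The counts are hypothetical, and whether the original analysis applied a continuity correction is not stated, so results from this sketch would not necessarily match the reported values.

```python
from math import erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / denominator
    # With one degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts of MAGs: rows are planktonic vs. particle-associated,
# columns are "no transposases" vs. "at least one transposase".
statistic, p_value = chi_square_2x2(700, 300, 250, 650)
print(f"chi-square = {statistic:.1f}, P = {p_value:.3g}")
```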

Fig 3.

Transposase abundance is markedly higher for particle-associated MAGs than planktonic ones. The x-axis here is square root-transformed for clarity, as log-transforming was not an option due to the number of zeros in the data. Close to 70% of planktonic MAGs contained no transposases at all, a proportion that was found to be significantly different from the proportion of MAGs containing no transposases in the particle-associated group (see Results and Discussion). For this analysis, MAGs were considered planktonic if they came from the smallest size fraction (0.22 µm–5 µm) and particle-associated otherwise, as denoted by the colors of the boxplots. Note the wide difference in sample sizes among the size fractions.

Genome size and taxonomy are key factors to transposase abundance in populations

Given the enrichment of transposase ORFs in particle-associated MAGs, we asked three questions: (i) whether the correlation between transposase abundance and particle association persists with depth in MAGs; (ii) since larger genomes have an elevated rate of horizontal gene transfer (HGT) (24, 26), whether genome size correlates with transposase abundance in MAGs; and (iii) whether specific taxa encode more transposases than others, and thus whether community composition plays a role in determining transposase abundance. For this set of genome-resolved analyses, we analyzed a different set of 1,147 MAGs recovered from the Tara Oceans metagenomes across different depths and size fractions (18) as well as 255 MAGs from the Malaspina metagenomes (17). We chose this set of MAGs because they spanned a wider depth range than the MAGs described in the analysis above. All MAGs in this analysis had percent completeness >70%, redundancy <10%, and contained <5% of eukaryotic sequences (see Materials and Methods). The transposase abundance in MAGs was quantified by the percentage of transposase ORFs among all ORFs in the MAG (abbreviated as %-transposase).

To address the first question, we found that MAGs from deep waters had higher transposase abundance than MAGs from shallow waters. Specifically, MAGs from the mesopelagic and bathypelagic zones had a mean %-transposase six times (95% CI: 5.38 to 6.57) that of MAGs from the surface and DCM.

Given that this set of MAGs was not classified by size fraction, we quantified the degree to which individual genomes were particle-associated by calculating the percentage of secretory CAZymes and peptidases out of all CAZyme and peptidase ORFs (a ratio of counts) in a MAG. This method has been verified in metagenomes separated through serial filtering techniques: MAGs assembled from samples with larger filter sizes contain a higher percentage of secretory CAZymes and peptidases in the genome compared to those filtered with smaller size fractions (27). We also independently validated this method by comparing the percentage of secretory CAZyme and peptidase ORFs in taxa generally known to be particle-associated or planktonic (Fig. S4). Taxonomy was analyzed at a class level, and the “complete genome size” was estimated using the genome length of a MAG divided by its estimated percent completeness. Although MAG assembly might bias against MGEs (28) and the completeness of MAGs might be underestimated for rare taxonomic groups (29, 30), we can still gain valuable insights by analyzing the distribution of transposase ORFs in MAGs of various genome sizes. Returning to the first question, we observed a positive correlation between the percentage of transposase ORFs and the percentage of secretory CAZymes and peptidases in MAGs (Spearman ρ = 0.208 and ρ = 0.210, respectively; both P < 10⁻¹⁴), thus confirming that the correlation between transposases and particle association persists on a genome-by-genome basis.
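The per-MAG quantities used in this section (quality filter, %-transposase, percentage of secretory enzymes, and estimated complete genome size) are simple ratios. The sketch below restates them in code under the thresholds quoted in the text; the example MAG and all field names are hypothetical.

```python
def passes_quality_filter(completeness_pct: float, redundancy_pct: float,
                          eukaryotic_pct: float) -> bool:
    """Thresholds from the text: >70% complete, <10% redundant, <5% eukaryotic."""
    return completeness_pct > 70 and redundancy_pct < 10 and eukaryotic_pct < 5


def percent_of(part: int, whole: int) -> float:
    """Percentage of `part` among `whole` (used for %-transposase and % secretory)."""
    return 100 * part / whole


def complete_genome_size_bp(mag_length_bp: int, completeness_pct: float) -> float:
    """Estimated complete genome size = MAG length / fractional completeness."""
    return mag_length_bp * 100 / completeness_pct


# Hypothetical MAG record for illustration only.
mag = {"length_bp": 3_200_000, "completeness_pct": 80.0, "redundancy_pct": 2.5,
       "eukaryotic_pct": 0.0, "orfs": 3_000, "transposase_orfs": 12,
       "cazyme_orfs": 60, "secretory_cazyme_orfs": 9}

if passes_quality_filter(mag["completeness_pct"], mag["redundancy_pct"],
                         mag["eukaryotic_pct"]):
    size_mbp = complete_genome_size_bp(mag["length_bp"], mag["completeness_pct"]) / 1e6
    print(f"%-transposase: {percent_of(mag['transposase_orfs'], mag['orfs']):.2f}")
    print(f"% secretory CAZymes: "
          f"{percent_of(mag['secretory_cazyme_orfs'], mag['cazyme_orfs']):.1f}")
    print(f"estimated complete genome size: {size_mbp:.1f} Mbp")
```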

To examine the relationship between genome size and transposase abundance, we compared the estimated genome size of each MAG with the percentage of ORFs characterized as transposases within each MAG. At every depth, we observed a positive correlation between the estimated genome size and the percentage of transposase ORFs within a MAG (Spearman ρ was 0.56, 0.50, 0.46, and 0.60 for surface, DCM, mesopelagic, and bathypelagic samples, respectively; all P < 10⁻²⁰), confirming previous speculations that larger genomes were linked to high transposase abundance (9, 12). While deep water genomes were bigger (Fig. 4A), the particle-associated lifestyle correlated with greater complete genome size at every depth (Fig. 4B). Previous work has shown that large genomes are associated with a high fraction of transposase ORFs in isolated microbial genomes (10, 31, 32), and in a study of metagenomes from the Baltic Sea (9). The correlation between transposase abundance and genome size was previously attributed to a higher frequency of HGT in larger genomes (24, 31). A correlation between the particle-associated lifestyle and genome size has also been previously observed and may be linked to the expanded metabolic versatility of particle-associated microbial populations (27). Thus, the particle-associated lifestyle might facilitate transposase propagation by reducing the distance between microorganisms, and may lead to larger genome size through increased rates of HGT. It is worth noting that the transposase abundance in MAGs peaked when the complete genome size reached 5 Mbp, but it stabilized or declined beyond the peak (Fig. 4C). A previous study has also reported a decreased proportion of transposase ORFs after bacterial genomes surpass 6 Mbp in size (10). However, most (88.9%) MAGs recovered from the Tara Oceans and Malaspina metagenomes were under 5 Mbp, so the association between large genomes and high transposase abundance generally held true.

Fig 4.

Deep-sea and particle-associated microbial populations tend to have larger genomes, which correlate with high transposase abundance in those populations. (A) The estimated complete genome size of MAGs (complete genome size = number of base pairs in a MAG/% completeness), grouped by depth. Letters on boxplots were generated from the Tukey honestly significant difference test. (B) Scatterplots showing the correlation between secretory CAZymes/peptidases and the estimated complete genome size of MAGs. (C) The transposase abundance in MAGs, grouped by depth and complete genome size. See Materials and Methods for the determination of depth of a MAG.

Fig 5.

Selection pressure and transposase abundance are not correlated on a genome-resolved scale. (A) The median pN/pS ratio of MAGs, grouped by depth. Since pN/pS ratios were calculated on a per-sample basis, a MAG might have several median pN/pS ratios from multiple samples. (B) The relationship between the selection pressure (median pN/pS) and the transposase abundance of a MAG, separated by depth. The median pN/pS was computed only for MAGs with ≥100 ORF-level pN/pS values.

To address the third question, we found that taxonomic class was also a key predictor of transposase abundance in MAGs; information on the taxonomy and depth of MAGs together explained 53% of the variance in transposase abundance (Table 1). MAGs of the taxonomic classes Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, and Actinobacteria were enriched in transposases compared to other MAGs (for each class with ≥10 MAGs, the %-transposase of MAGs in that class was compared against those of all other MAGs; Wilcoxon test P cutoff: 0.05). MAGs of Flavobacteria, Acidimicrobidae, novelClass_E, and SAR202-2 had low transposase abundance. Taxa that were enriched/low in transposases mostly matched with previous studies (9, 10), except that Actinobacteria had previously been reported to be low in transposase abundance.

TABLE 1.

Multiple stepwise regression for log-transformed transposase abundance (%) of 1,402 MAGs (a)

| Explanatory variable(s) | P for F-test | Cumulative R² |
| --- | --- | --- |
| Depth | <10⁻¹⁰ | 0.34 |
| Depth + secretory CAZyme (%) | <10⁻¹⁰ | 0.37 |
| Depth + secretory peptidase (%) | <10⁻¹⁰ | 0.37 |
| Depth + genome size (Mbp) | <10⁻¹⁰ | 0.52 |
| Depth + taxon (class) | <10⁻¹⁰ | 0.53 |
| Depth + taxon + genome size + secretory CAZyme + secretory peptidase | NA (b) | 0.62 |

(a) An ANOVA test was performed on each newly added covariate.

(b) NA, not applicable.
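To read Table 1 at a glance, it can help to subtract the depth-only R² from each two-variable model. The sketch below does this arithmetic on the tabulated values only; the gains are not additive across covariates, because the covariates are themselves correlated.

```python
# Cumulative R² values copied from Table 1.
depth_only_r2 = 0.34
depth_plus_one = {
    "secretory CAZyme (%)": 0.37,
    "secretory peptidase (%)": 0.37,
    "genome size (Mbp)": 0.52,
    "taxon (class)": 0.53,
}

# Additional variance explained when one covariate is added to depth.
gain = {name: round(r2 - depth_only_r2, 2) for name, r2 in depth_plus_one.items()}
for name, delta in sorted(gain.items(), key=lambda item: item[1], reverse=True):
    print(f"depth + {name}: +{delta:.2f} R²")
```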

The link between taxonomy and transposase abundance could partially be attributed to depth. MAGs of transposase-enriched classes were more abundant in deep waters, and MAGs from low transposase classes were more abundant in shallow waters (two χ²-tests, both P < 1 × 10⁻⁶; Fig. S5). Thus, although community composition is a key covariate to transposase abundance, the distribution of taxa is nonetheless linked to the depth of a microbial community.

Relaxed selection does not explain transposase abundance in genomes

It is possible that transposases accumulate in deep ocean microbial populations as a result of genetic drift, because those populations tend to be smaller in size (33) and experience slow growth rates (34). To examine this possibility, we tested whether the strength of selection experienced by populations correlated with their transposase abundance. For each ORF in each MAG, we calculated its pN/pS ratio in each sample from its designated depth (pN/pS was only calculated for ORFs with ≥20× coverage). pN/pS is the ratio of the proportion of nonsynonymous mutations to the proportion of synonymous mutations, and characterizes selection at the level of the population, in contrast to dN/dS, which characterizes selection between individual species (35–37).
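A rough sketch of the pN/pS bookkeeping described above follows. It assumes that each proportion is the number of observed variants divided by the number of available sites of that type, which is a common convention but is not spelled out here. The ≥20× coverage cutoff comes from the text and the ≥100-value cutoff from the Fig. 5 caption; the function and field names are hypothetical.

```python
from statistics import median

MIN_COVERAGE = 20      # ORFs below 20x coverage are skipped, as in the text
MIN_ORF_VALUES = 100   # minimum pN/pS values per MAG, from the Fig. 5 caption


def pn_ps(nonsyn_variants, nonsyn_sites, syn_variants, syn_sites):
    """pN/pS for one ORF; returns None when pS is zero or sites are missing."""
    if nonsyn_sites <= 0 or syn_sites <= 0 or syn_variants == 0:
        return None
    return (nonsyn_variants / nonsyn_sites) / (syn_variants / syn_sites)


def median_pn_ps(orfs):
    """Median pN/pS of a MAG in one sample, or None if too few ORFs qualify."""
    values = []
    for orf in orfs:
        if orf["coverage"] < MIN_COVERAGE:
            continue
        ratio = pn_ps(orf["n_var"], orf["n_sites"], orf["s_var"], orf["s_sites"])
        if ratio is not None:
            values.append(ratio)
    return median(values) if len(values) >= MIN_ORF_VALUES else None


# Hypothetical ORF: 2 nonsynonymous variants over 200 sites, 5 synonymous over 100.
example_orf = {"coverage": 25, "n_var": 2, "n_sites": 200, "s_var": 5, "s_sites": 100}
print(median_pn_ps([example_orf] * 120))
```

Values well below 1 indicate purifying selection; higher medians, as reported for the deeper layers, are read as relaxed selection.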

We observed that MAGs from the mesopelagic zone had a higher median pN/pS than MAGs from the surface and DCM, and MAGs from the bathypelagic zone had a higher median pN/pS than MAGs from the mesopelagic zone (both P < 5 × 10⁻⁶, Fig. 5A). The high median pN/pS of populations from the deeper layers suggested relaxed selection pressure relative to populations from shallower layers, which was in agreement with previous results (12). The general explanation for such a trend is a slower replication rate, which leads to smaller effective population sizes and a reduced selection effect in the deep ocean (11, 12). However, further work is needed to substantiate this hypothesis.

Since transposases and deep-sea ORFs were both under relaxed selection, we tested whether high transposase abundance correlated with relaxed selection in a population. However, we did not observe a correlation between the median pN/pS of MAGs and their transposase abundance (Spearman ρ = −0.075, P = 0.08). Furthermore, the relationship between the selective pressure experienced by a population and its transposase abundance was inconsistent across different depths (Fig. 5B). It has been previously suggested that small populations would experience relaxed selection against mobile genetic elements (38, 39). However, our analyses show that although relaxed selection and transposase enrichment co-occur in deep oceans, we did not observe this trend on a genome-by-genome basis, suggesting that there is not a direct connection between transposase abundance and relaxed selection at a genome-resolved scale, and thus within individual microbial populations.

Transposons/integrons carry a high proportion of ORFs related to defense mechanisms

One important question regarding high transposase abundance in the deep sea is whether these transposases are neutral or deleterious, or whether they perform important functions in the genomes encoding them. In some cases, transposases are deleterious and proliferate as a result of genetic drift (40), but transposases can also benefit the host by introducing functional and regulatory genes (6, 25, 41, 42).

Transposases mediate the migration and integration of cassette sequences into host genomes; a transposon or integron is the combination of a transposase and its cassettes. Thus, cassette sequences are functionally similar to auxiliary metabolic genes (AMGs) (43) in that they are carried by a selfish MGE (or viruses in the case of AMGs) to increase the fitness of the host and therefore increase the fitness of the MGE.

To determine whether transposons in marine habitats carry advantageous genes, we began by querying the functional categories of cassette sequences. The software package Integron Finder (10) was used to locate cassettes on contigs; the program searches for the two palindromic flanking sites necessary for transposase-mediated recombination as well as a nearby transposase/integrase. We treated transposons and integrons as equivalent, due to the similarity in sequence and function between integrases and transposases (10). We found 8,519 cassette ORFs in the co-assembled Malaspina (17) and Tara Oceans (16, 18) metagenomes (see Materials and Methods). Only 27% of the cassette sequences were assigned COG annotations, a lower annotation rate than for other ORFs in the Tara Oceans (55%–60%) and Malaspina (68%) metagenomes.

Compared to the rest of the metagenome, cassettes were enriched in ORFs of the COG categories “replication, recombination, and repair,” “defense mechanisms,” and “mobilome: prophages, transposons” (Fig. 6A). Transposons/integrons are expected to be enriched in ORFs related to the mobilome and replication/recombination. The prevalence of defense mechanism genes in cassettes was noteworthy: defense mechanisms accounted for 14.3% of known function calls in cassettes, but only accounted for 2.47% of known function calls in non-cassette ORFs (“known” was defined as all functional groups except “function unknown” and “general function prediction”). Out of the 276 defense mechanism cassettes, 123 encoded toxin/antitoxin genes. Since transposases are more pervasive on particles, the toxin and defense genes they carry would be useful in competition and protection on crowded particles (44–47). Moreover, the percentage of secretory CAZymes and peptidases in metagenomes both correlated with the abundance of defense mechanism ORFs (Fig. 6B). Similarly, we also found a strong correlation between the gene abundance of defense mechanism ORFs and that of biofilm-associated ORFs (Fig. S6). Thus, if defense mechanism genes in the deep ocean were more highly expressed, this would substantiate the hypothesis that deep ocean microorganisms benefit from novel genes introduced by integrons/transposons.
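The cassette statistics quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers reported in the text and in the Fig. 6 caption; the variable names are ours.

```python
# Share of known function calls assigned to "defense mechanisms".
cassette_defense_pct = 14.3
noncassette_defense_pct = 2.47
fold_enrichment = cassette_defense_pct / noncassette_defense_pct

# Toxin/antitoxin genes among defense mechanism cassettes.
defense_cassettes = 276
toxin_antitoxin_cassettes = 123
toxin_share_pct = 100 * toxin_antitoxin_cassettes / defense_cassettes

# COG-annotated cassette ORFs (Tara + Malaspina, Fig. 6 caption) over all cassette ORFs.
annotated_cassette_orfs = 1_951 + 359
annotation_rate_pct = 100 * annotated_cassette_orfs / 8_519

print(f"defense enrichment in cassettes: {fold_enrichment:.1f}-fold")
print(f"toxin/antitoxin share of defense cassettes: {toxin_share_pct:.1f}%")
print(f"cassette ORFs with COG annotation: {annotation_rate_pct:.0f}%")
```

The last figure reproduces the 27% annotation rate stated earlier, which suggests the counts in the caption and the text refer to the same set of cassette ORFs.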

Fig 6.

Cassette sequences include high proportions of defense mechanism genes, which are more abundant in microbial communities that rely heavily on a particle-associated lifestyle. (A) The distribution of COG functional categories in cassette and non-cassette metagenome ORFs. T.M., transport and metabolism. One thousand nine hundred fifty-one cassette ORFs with COG function calls were identified from the 10 co-assembled Tara metagenomes, and 359 from the co-assembled Malaspina metagenome. Five million ORFs with COG function calls were then sampled from the Tara Oceans and Malaspina metagenomes according to the ratio of identified cassettes described above. (B) The correlation between the abundance of secretory CAZyme/peptidase ORFs and defense mechanism ORFs in metagenomes. Samples from different depths are distinguished by different colors. The Spearman’s correlation coefficients, ρ, are shown on top left. **** indicates that P < 0.0001.

ORFs related to defense mechanisms, secretory CAZymes, and biofilm-associated genes have higher expression in the deep sea

To determine whether defense mechanism genes are more highly expressed and how this correlates with the expression of particle-associated genes, we calculated the RNA/DNA ratios (defined as the transcript abundance of a target gene divided by its gene abundance) of secretory CAZyme, secretory peptidase, defense mechanism, and transposase ORFs in each Tara Oceans sample with a paired metagenome and metatranscriptome (n = 94). Malaspina samples were excluded from this analysis because no metatranscriptomes were sequenced.
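The RNA/DNA ratio defined above is a single division per gene category and sample. A minimal sketch follows; the base-10 logarithm is an assumption (the figure caption says only "log-transformed"), and the abundances shown are invented.

```python
from math import log10


def log_rna_dna_ratio(transcript_abundance: float, gene_abundance: float):
    """log10(transcript abundance / gene abundance); None if either is not positive."""
    if transcript_abundance <= 0 or gene_abundance <= 0:
        return None
    return log10(transcript_abundance / gene_abundance)


# Hypothetical paired abundances (metatranscriptome, metagenome) for one sample.
paired = {
    "secretory CAZyme": (0.0040, 0.0020),
    "defense mechanism": (0.0030, 0.0025),
    "transposase": (0.0004, 0.0008),
}
for category, (rna, dna) in paired.items():
    print(f"{category}: log10(RNA/DNA) = {log_rna_dna_ratio(rna, dna):+.2f}")
```

A positive value means a category is transcribed more than its gene abundance alone would suggest; a negative value means the opposite.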

The ORFs related to secretory CAZymes and defense mechanisms had greater RNA/DNA ratios in the mesopelagic zone than in the surface and DCM (Fig. 7), demonstrating a greater need for these genes as deep ocean microbial communities switch toward a particle-associated lifestyle. The RNA/DNA ratio of biofilm-associated ORFs also supported this switch to a particle-associated lifestyle (Fig. S7). In contrast, although transposases were more abundant in the mesopelagic zone (Fig. S8), their RNA/DNA ratios were similar across all depths (Fig. 7). A possible explanation is that although transposases themselves are not upregulated, they sometimes carry beneficial cassette genes, which are increasingly expressed in deep-sea microbial genomes.

Fig 7.

Particle-associated and defense mechanism ORFs are more highly expressed in deeper waters, but transposases are not. Log-transformed RNA/DNA ratios of target genes in each sample, separated by depth. *P < 0.05, **P < 0.01, ***P < 0.001.

Conclusion

Our analysis provides insights into the factors driving the increasing abundance of transposases with ocean depth, highlighting the interplay between the particle-associated lifestyle, expanded genome size, and selection for defense mechanism genes (Fig. 8). We hypothesize that as microbial communities shift from a predominantly planktonic to a predominantly particle-associated lifestyle between the surface and the bathypelagic zone, they become more densely packed on particles, leading to rampant transposase spread and more intense resource competition. Additionally, particle-associated microorganisms tend to have larger genomes, which are associated with high transposase abundance. These high abundances of mobile genetic elements actively shape the ecology of particle-associated microbial communities by introducing novel genes, with cassette sequences being particularly enriched in defense mechanism genes. These genes are highly expressed in the deep ocean, offering advantages in the competitive particle-associated environment (44, 46).

Fig 8.

Potential mechanisms linking a particle-associated lifestyle with the high abundance of transposons in the deep ocean.

When interpreting these results, it is important to consider certain caveats. Our analysis focused on pelagic ocean water samples to minimize potential confounding variables, and thus, these observed trends may not apply to regions with unique characteristics, such as Arctic sea ice (48) and deep-sea hydrothermal vents (14). Moreover, as our results are based on correlative analyses, further experimental evidence is needed to establish causation and to identify mechanistic relationships between these variables.

Understanding these trends in mobile genetic element abundance is important for understanding gene flow, competition, and other ecological characteristics of the dark ocean, one of the largest habitats on Earth. Our results show a strong association between transposase abundance and a particle-associated lifestyle, suggesting that transposons may enable microbial lineages to compete in crowded biofilm-associated habitats. We did not find any correlation between the strength of selection and transposase abundance at the genome level, indicating that relaxed selection pressure is not the primary driver for the high transposase abundance in the deep ocean. Overall, our results suggest an emerging understanding of the ocean as a stratified system in which the deep ocean acts as a gene-sharing highway, fostering networks of gene exchange, particularly on particles. This results in large genomes with many defense-oriented genes in the deep-sea microbial communities, contrasting with the streamlined and specialized genomes that dominate the surface oceans. Future experimental studies should establish mechanistic connections between transposase gene cassette contents and microbial activity, particularly in planktonic and particle-associated communities.

MATERIALS AND METHODS

Analysis of the ocean microbial reference catalog v2 (OM-RGC.v2) from the Tara Oceans project

OM-RGC.v2 contained 47 million non-redundant ORFs (15, 16), and the coverage of each ORF in every metagenomic (n = 138) and metatranscriptomic (n = 152) sample (https://www.ocean-microbiome.org/) (15). Samples from the Arctic Ocean were excluded. We identified transposases and biofilm-associated ORFs in OM-RGC.v2 through TBLASTN (e-value < 10⁻⁵). The transposase database contained “transposase” and “integrase” genes from the Pfam database (49) (Table S1). We identified ORFs of the “defense mechanisms” category through COG annotations in OM-RGC.v2. Finally, the coverage data in OM-RGC.v2 were used to determine the relative abundance of a target gene in a sample.
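The text does not spell out how relative abundance was derived from the coverage table. One plausible reading, sketched below in Python, is the summed coverage of the target ORFs divided by the summed coverage of all ORFs in the sample. This normalization, the function name, and the toy ORF identifiers are assumptions for illustration only.

```python
from typing import Dict, Iterable


def relative_abundance(coverage_by_orf: Dict[str, float],
                       target_orfs: Iterable[str]) -> float:
    """Fraction of a sample's total ORF coverage contributed by target ORFs.

    Assumed normalization: target coverage / total coverage in the sample.
    """
    total = sum(coverage_by_orf.values())
    if total <= 0:
        raise ValueError("sample has no coverage")
    targets = set(target_orfs)
    target_total = sum(cov for orf, cov in coverage_by_orf.items()
                       if orf in targets)
    return target_total / total


# Toy sample: coverage per ORF, with one ORF flagged as a transposase hit.
sample = {"orf_1": 30.0, "orf_2": 10.0, "orf_3": 60.0}
print(relative_abundance(sample, ["orf_2"]))  # 10 / 100
```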

Acquisition of global ocean metagenomes

Metagenomic reads of the Tara Oceans Project and Malaspina Expedition were downloaded from the European Nucleotide Archive under study accessions PRJEB402 and PRJEB44456, respectively. Methods for sample collection and Illumina sequencing were described by Sunagawa et al. (50) for the Tara Oceans Project and Acinas et al. (17) for the Malaspina Expedition. For the assembled contigs used as references for mapping of the Tara Oceans metagenomes, we used 10 existing co-assembled metagenomes from the 10 oceanic provinces (18). For the reference contigs of the Malaspina metagenomes, we downloaded the co-assembled metagenome of all 58 samples (17). Samples from the Tara Oceans were filtered across a variety of size fractions. For full metagenomic analysis, we examined only samples that were filtered through 1.6 µm or 3 µm filters and collected on 0.22 µm filters (50). Samples from the Malaspina Expedition were collected from the 0.2 µm–0.8 µm and 0.8 µm–5 µm size fractions (17). We used EukDetect (51) to confirm that samples collected with a larger size fraction did not have a higher contribution from eukaryotic reads (see Supplementary Text).

Prediction of secretory CAZyme and peptidase ORFs

We adapted the methods employed in Zhao et al. (21) to assess whether microbial communities were largely planktonic or particle-associated by quantifying the relative abundance of secretory CAZymes and peptidases. Predicted CAZymes and peptidases were annotated using DIAMOND (2.0.15) (52) BLASTP (e-value < 10⁻¹⁰) to search against the dbCAN (53) and MEROPS (54) databases, respectively. SignalP (5.0) (55) was used to identify signal peptides. We used the Gram-positive mode for ORFs affiliated with Actinobacteria and Firmicutes, the Gram-negative mode for other bacterial phyla, and the Archaea mode for ORFs affiliated with the Archaea domain. Only ORFs in the Bacteria or Archaea domains were included in the analysis. For MAGs, counts of secretory CAZymes and peptidases were used instead of abundance; validation based on predicted lifestyles of known taxa confirmed that this gave an accurate prediction of lifestyle (see Supplementary Text).
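The taxonomy-based choice of SignalP mode described above amounts to a small lookup. The Python sketch below illustrates that rule; the function name and the returned labels are hypothetical and are not SignalP command-line values.

```python
from typing import Optional

# Phyla run in Gram-positive mode, per the rule stated in the text.
GRAM_POSITIVE_PHYLA = {"Actinobacteria", "Firmicutes"}


def signalp_mode(domain: str, phylum: Optional[str]) -> Optional[str]:
    """Return the organism mode for an ORF, or None if the ORF is excluded.

    The returned strings are descriptive labels chosen for this sketch.
    """
    if domain == "Archaea":
        return "archaea"
    if domain == "Bacteria":
        if phylum in GRAM_POSITIVE_PHYLA:
            return "gram-positive"
        return "gram-negative"
    # ORFs outside Bacteria and Archaea were not included in the analysis.
    return None


for domain, phylum in [("Bacteria", "Firmicutes"),
                       ("Bacteria", "Proteobacteria"),
                       ("Archaea", "Thaumarchaeota"),
                       ("Eukaryota", None)]:
    print(domain, phylum, "->", signalp_mode(domain, phylum))
```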

MAG selection for size fraction and lifestyle analysis

To analyze transposase abundance by size fraction (and thus lifestyle), we used a collection of 1,888 bacterial and archaeal MAGs derived from Tara Oceans co-assemblies recovered by Delmont et al. (20). All of these MAGs exhibited >70% completion, with an average completion of 87.1% and an average redundancy of 2.5%. Previous analysis of these MAGs used mapping to determine the relative abundances of each MAG across different size fractions. Bins were only considered to be present if ≥25% of the length of the MAG was mapped by reads in a given sample (20). We took the size fraction with the highest proportion of mapped reads to be the designated size fraction of that MAG. Four MAGs could not be assigned a size fraction and were thus left out, leaving a final sample size of 1,884. A large proportion of our MAGs were assigned the smallest size fraction (n = 1,586), with fewer in the larger size fractions (n = 175 for 5 µm–20 µm, n = 64 for 20 µm–180 µm, n = 59 for 180 µm–2,000 µm).
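The detection threshold and the size fraction assignment above can be sketched as follows in Python. The function names and fraction labels are hypothetical, and the rule for unassignable MAGs (no mapped reads in any fraction) is an assumption, since the text does not say why four MAGs could not be assigned.

```python
from typing import Dict, Optional

MIN_BREADTH = 0.25  # a MAG counts as present if >= 25% of its length is mapped


def is_detected(covered_bp: int, mag_length_bp: int) -> bool:
    """Apply the >= 25% breadth-of-coverage criterion for one sample."""
    return covered_bp / mag_length_bp >= MIN_BREADTH


def designated_size_fraction(read_share: Dict[str, float]) -> Optional[str]:
    """Pick the size fraction with the highest proportion of mapped reads.

    Returns None when no fraction has mapped reads (assumed meaning of
    "could not be assigned" in this sketch).
    """
    positive = {frac: share for frac, share in read_share.items() if share > 0}
    if not positive:
        return None
    return max(positive, key=positive.get)


shares = {"smallest": 0.10, "5-20 um": 0.70, "20-180 um": 0.15, "180-2000 um": 0.05}
print(is_detected(covered_bp=600_000, mag_length_bp=2_000_000))
print(designated_size_fraction(shares))
```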

To quantify transposase abundance, we used BLASTP (56) to search each MAG against the transposase database previously discussed. The percent transposase abundance was calculated as the number of unique BLASTP hits (e-value < 10⁻⁵) divided by the total number of ORFs in the MAG, as identified by Prokka (v. 1.14.6) (57) using default settings.
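The percent transposase abundance described above can be illustrated with a short Python sketch. It assumes that "unique BLASTP hits" means distinct MAG ORFs with at least one hit below the e-value cutoff; the function name and input layout are hypothetical.

```python
from typing import Iterable, Tuple


def percent_transposase(blast_hits: Iterable[Tuple[str, float]],
                        total_orfs: int,
                        max_evalue: float = 1e-5) -> float:
    """Percent of a MAG's ORFs with at least one transposase hit.

    blast_hits: (orf_id, e-value) pairs, one per alignment; an ORF with
    several qualifying alignments is counted once (assumed interpretation).
    """
    if total_orfs <= 0:
        raise ValueError("total_orfs must be positive")
    hit_orfs = {orf_id for orf_id, evalue in blast_hits if evalue < max_evalue}
    return 100.0 * len(hit_orfs) / total_orfs


# Toy MAG with 200 ORFs: orf_a hits twice, orf_b fails the cutoff, orf_c passes.
hits = [("orf_a", 1e-20), ("orf_a", 1e-8), ("orf_b", 1e-3), ("orf_c", 1e-6)]
print(percent_transposase(hits, total_orfs=200))  # 2 unique ORFs out of 200
```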

All data processing and graphics for this analysis were done in R (v.4.3.2) (58) using the tidyverse ecosystem (v.2.0.0) (59).

MAG selection for depth, genome size, and taxonomy analysis

To conduct genome-resolved analyses focused on depth, genome size, taxonomy, and signatures of selection, we selected 1,147 out of 2,631 MAGs that had previously been recovered from the Tara Oceans metagenomes (18) and 255 out of 317 MAGs recovered from the Malaspina metagenomes (17). All selected MAGs had <5% eukaryotic sequence (in base pairs; Tiara [60] was used to identify putative eukaryotic contigs), completeness >70%, and redundancy <10% (both criteria were calculated with CheckM [61]).
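The three selection criteria above translate into a simple filter. The Python sketch below is illustrative only; the class and field names are hypothetical, and the example MAG values are invented.

```python
from dataclasses import dataclass


@dataclass
class MagQuality:
    name: str
    completeness_pct: float   # CheckM completeness
    redundancy_pct: float     # CheckM redundancy
    eukaryotic_bp_pct: float  # % of base pairs on contigs flagged as eukaryotic


def passes_filters(mag: MagQuality) -> bool:
    """<5% eukaryotic sequence, completeness >70%, and redundancy <10%."""
    return (mag.eukaryotic_bp_pct < 5.0
            and mag.completeness_pct > 70.0
            and mag.redundancy_pct < 10.0)


candidates = [
    MagQuality("mag_001", completeness_pct=92.3, redundancy_pct=1.8, eukaryotic_bp_pct=0.4),
    MagQuality("mag_002", completeness_pct=68.0, redundancy_pct=2.1, eukaryotic_bp_pct=0.0),
    MagQuality("mag_003", completeness_pct=85.5, redundancy_pct=3.0, eukaryotic_bp_pct=7.2),
]
selected = [mag.name for mag in candidates if passes_filters(mag)]
print(selected)
```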

Determination of depth and lifestyles for each MAG

MAGs that originated from co-assemblies in a specific province in the Tara Oceans data set (e.g., Red Sea), as in Tully et al. (18), were used to recruit reads from all Tara Oceans samples (n = 180) using Bowtie2 (default parameters). Each province was processed separately. SAM files were converted to BAM files using samtools (62) and used to determine RPKM (reads per kilobase pair MAG per million pair metagenome) as in Graham et al. (63).
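As an illustration of the RPKM normalization mentioned above, the Python sketch below uses the conventional formula: recruited reads divided by MAG length in kilobases and by metagenome size in millions of reads. The exact counting units used by Graham et al. (63) are not restated in the text, so the formula and the function name should be read as assumptions.

```python
def rpkm(reads_mapped: int, mag_length_bp: int, total_reads: int) -> float:
    """Reads per kilobase of MAG per million reads in the metagenome.

    Conventional RPKM form, assumed here for illustration.
    """
    if mag_length_bp <= 0 or total_reads <= 0:
        raise ValueError("MAG length and total read count must be positive")
    length_kb = mag_length_bp / 1_000
    reads_millions = total_reads / 1_000_000
    return reads_mapped / (length_kb * reads_millions)


# Toy example: 500 reads recruited to a 2 Mbp MAG from a 10-million-read sample.
print(rpkm(reads_mapped=500, mag_length_bp=2_000_000, total_reads=10_000_000))
```

Normalizing by both MAG length and sequencing depth makes values comparable across MAGs of different sizes and samples of different depths.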

To assign the depth origin of a MAG from the Tara Oceans metagenomes, we summed up its RPKM in surface, DCM, and mesopelagic samples (three groups, separately). The layer with the highest RPKM sum was assigned as the depth of that MAG. All Malaspina MAGs belonged to the bathypelagic layer.
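The depth assignment rule above can be sketched in Python as a sum per layer followed by a maximum. The function name is hypothetical, and the handling of ties (first layer in listed order) and of MAGs with no recruited reads (None) are assumptions of this sketch.

```python
from typing import Iterable, Optional, Tuple

LAYERS = ("surface", "DCM", "mesopelagic")


def assign_depth(sample_rpkm: Iterable[Tuple[str, float]]) -> Optional[str]:
    """Sum a MAG's RPKM within each layer and return the layer with the top sum.

    sample_rpkm: (layer, RPKM) pairs, one per Tara Oceans sample.
    """
    totals = {layer: 0.0 for layer in LAYERS}
    for layer, value in sample_rpkm:
        if layer not in totals:
            raise ValueError(f"unknown layer: {layer}")
        totals[layer] += value
    if all(total == 0.0 for total in totals.values()):
        return None  # assumed behavior when no reads were recruited
    return max(LAYERS, key=lambda layer: totals[layer])


observations = [("surface", 1.0), ("DCM", 0.5),
                ("mesopelagic", 2.0), ("mesopelagic", 0.2)]
print(assign_depth(observations))
```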

Mapping and calculating pN/pS ratios of MAGs

To assess the strength of selection of target ORFs within specific MAGs, we calculated the pN/pS ratio (the proportion of nonsynonymous mutations over the proportion of synonymous mutations) for all ORFs within each target MAG. To do this, we mapped the metagenomic reads against the assembled contigs for each metagenome. We mapped the raw reads of each of the Tara Oceans metagenomes to the co-assembly from their corresponding provinces, and all raw reads of each of the Malaspina metagenomes to one co-assembled metagenome using Bowtie2 (v2.2.9; paired-end alignment with default parameters) (64). We calculated pN/pS for each ORF within each individual MAG using anvi’o (65) with the script “anvi-script-calculate-pn-ps-ratio” with the “--min-coverage” flag set to 20. The pN/pS ratios were calculated on a sample-per-sample basis, so an ORF from a co-assembled metagenome might have multiple pN/pS ratios from different samples.

Integron and cassette sequence detection

We used Integron Finder (v2) (10) to identify cassette sequences from the Tara Oceans and Malaspina metagenomes. Integron Finder uses HMMER to locate the integron-integrase intI, which is conserved for most integrons. Then, cassette sequences are identified by their near-palindromic flanking regions. anvi’o (65) was used to identify and annotate ORFs on metagenomic contigs using the script “anvi-run-ncbi-cogs.” If the anvi’o ORF calls from Prodigal (66) were within 100 bp (start and stop) of the Integron Finder’s cassette calls, the anvi’o COG annotations were used for those cassette ORFs.
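The coordinate matching between anvi’o ORF calls and Integron Finder cassette calls can be illustrated with a short Python sketch. It assumes both calls lie on the same contig, that coordinates share one convention, and that the 100 bp tolerance is inclusive and applies to the start and the stop independently; the function name is hypothetical.

```python
def cassette_matches_orf(orf_start: int, orf_stop: int,
                         cassette_start: int, cassette_stop: int,
                         tolerance_bp: int = 100) -> bool:
    """True if both ends of the ORF call lie within tolerance of the cassette call.

    Assumes both calls are on the same contig and use the same coordinate system.
    """
    return (abs(orf_start - cassette_start) <= tolerance_bp
            and abs(orf_stop - cassette_stop) <= tolerance_bp)


# The first pair differs by 50 bp at each end; the second is off by 200 bp at the stop.
print(cassette_matches_orf(1000, 1900, 1050, 1950))
print(cassette_matches_orf(1000, 1900, 1050, 2100))
```

Requiring agreement at both ends avoids transferring a COG annotation from an ORF that only partially overlaps a cassette call.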

Statistical analysis

We used the Wilcoxon signed-rank test to compare the differences between means of two numeric variables, and the Spearman’s rank correlation was used to determine the association between two numeric variables. Multiple hypothesis P-values were adjusted using the Benjamini-Hochberg procedure. All linear regressions had a normal distribution of residuals (Shapiro-Wilk test). An ANOVA F-test was used to perform model comparisons (whether adding additional variables makes the prediction significantly better). Statistical significance was assumed if P < 0.05. All statistical analyses were performed using base R packages (v.4.1.3 and v.4.3.2).
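For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following pure-Python sketch shows the standard step-up procedure. It is an illustration of the technique, not the authors' code (their analyses were run in R), and the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each raw P-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values non-decreasing with rank and capped at 1.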


ACKNOWLEDGMENTS

We would like to thank Dr. Shinichi Sunagawa for kindly providing information about the Tara Oceans data sets and the Ocean Microbial Reference Catalog, and Dr. Silvia G. Acinas for providing information about the Malaspina Deep Ocean data set. Dr. Murat Eren provided help with anvi’o, Mike Tie provided assistance with server administration, and Mark McKone provided statistical advice.

Funding for J.Z., O.K., and T.O. was provided by grants from the Towsley Endowment at Carleton College. Funding for T.O. was also provided by the Rosenow fund at Carleton College, and funding for T.D.R.H. was provided by the Summer Science Fellows program at Carleton College.

Contributor Information

Rika E. Anderson, Email: randerson@carleton.edu.

Samuel Chaffron, CNRS Delegation Bretagne et Pays de Loire, Nantes, France.

Tom Delmont, University of Chicago, Chicago, Illinois, USA.

DATA AVAILABILITY

All Python, R scripts, code explanations, and raw data for analysis are publicly accessible on GitHub at https://github.com/carleton-spacehogs/transposase-deep-ocean.

SUPPLEMENTAL MATERIAL

The following material is available online at https://doi.org/10.1128/msystems.00067-24.

Supplemental Information. msystems.00067-24-s0001.docx.

Supplemental figures and tables.

DOI: 10.1128/msystems.00067-24.SuF1
Table S1. msystems.00067-24-s0002.xlsx.

Accession numbers of 2,307 seed sequences of "transposase" and "integrase" genes from the Pfam database.

DOI: 10.1128/msystems.00067-24.SuF2
OPEN PEER REVIEW. reviewer-comments.pdf.

An accounting of the reviewer comments and feedback.

DOI: 10.1128/msystems.00067-24.SuF3

ASM does not own the copyrights to Supplemental Material that may be linked to, or accessed through, an article. The authors have granted ASM a non-exclusive, world-wide license to publish the Supplemental Material files. Please contact the corresponding author directly for reuse.

REFERENCES

  • 1. Frost LS, Leplae R, Summers AO, Toussaint A. 2005. Mobile genetic elements: the agents of open source evolution. Nat Rev Microbiol 3:722–732. doi: 10.1038/nrmicro1235
  • 2. Muñoz-López M, García-Pérez JL. 2010. DNA transposons: nature and applications in Genomics. Curr Genomics 11:115–128. doi: 10.2174/138920210790886871
  • 3. Collis CM, Hall RM. 1995. Expression of antibiotic resistance genes in the integrated cassettes of integrons. Antimicrob Agents Chemother 39:155–162. doi: 10.1128/AAC.39.1.155
  • 4. Rankin DJ, Rocha EPC, Brown SP. 2011. What traits are carried on mobile genetic elements, and why? Heredity (Edinb) 106:1–10. doi: 10.1038/hdy.2010.24
  • 5. Hacker J, Carniel E. 2001. Ecological fitness, genomic islands and bacterial pathogenicity. A Darwinian view of the evolution of microbes. EMBO Rep 2:376–381. doi: 10.1093/embo-reports/kve097
  • 6. Jones JM, Grinberg I, Eldar A, Grossman AD. 2021. A mobile genetic element increases bacterial host fitness by manipulating development. Elife 10:e65924. doi: 10.7554/eLife.65924
  • 7. Casacuberta E, González J. 2013. The impact of transposable elements in environmental adaptation. Mol Ecol 22:1503–1517. doi: 10.1111/mec.12170
  • 8. Aziz RK, Breitbart M, Edwards RA. 2010. Transposases are the most abundant, most ubiquitous genes in nature. Nucleic Acids Res 38:4207–4217. doi: 10.1093/nar/gkq140
  • 9. Vigil-Stenman T, Ininbergs K, Bergman B, Ekman M. 2017. High abundance and expression of transposases in bacteria from the Baltic sea. ISME J 11:2611–2623. doi: 10.1038/ismej.2017.114
  • 10. Cury J, Jové T, Touchon M, Néron B, Rocha EP. 2016. Identification and analysis of integrons and cassette arrays in bacterial genomes. Nucleic Acids Res 44:4539–4550. doi: 10.1093/nar/gkw319
  • 11. DeLong EF, Preston CM, Mincer T, Rich V, Hallam SJ, Frigaard N-U, Martinez A, Sullivan MB, Edwards R, Brito BR, Chisholm SW, Karl DM. 2006. Community genomics among stratified microbial assemblages in the ocean’s interior. Science 311:496–503. doi: 10.1126/science.1120250
  • 12. Konstantinidis KT, Braff J, Karl DM, DeLong EF. 2009. Comparative metagenomic analysis of a microbial community residing at a depth of 4,000 meters at station ALOHA in the North Pacific subtropical gyre. Appl Environ Microbiol 75:5345–5355. doi: 10.1128/AEM.00473-09
  • 13. Worden AZ, Cuvelier ML, Bartlett DH. 2006. In-depth analyses of marine microbial community genomics. Trends Microbiol 14:331–336. doi: 10.1016/j.tim.2006.06.008
  • 14. Brazelton WJ, Baross JA. 2009. Abundant transposases encoded by the metagenome of a hydrothermal chimney biofilm. ISME J 3:1420–1424. doi: 10.1038/ismej.2009.79
  • 15. Salazar G, Paoli L, Alberti A, Huerta-Cepas J, Ruscheweyh H-J, Cuenca M, Field CM, Coelho LP, Cruaud C, Engelen S, et al. 2019. Gene expression changes and community turnover differentially shape the global ocean metatranscriptome. Cell 179:1068–1083. doi: 10.1016/j.cell.2019.10.014
  • 16. Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, Djahanschiri B, Zeller G, Mende DR, Alberti A, et al. 2015. Structure and function of the global ocean microbiome. Science 348:1261359. doi: 10.1126/science.1261359
  • 17. Acinas SG, Sánchez P, Salazar G, Cornejo-Castillo FM, Sebastián M, Logares R, Royo-Llonch M, Paoli L, Sunagawa S, Hingamp P, et al. 2021. Deep ocean metagenomes provide insight into the metabolic architecture of bathypelagic microbial communities. Commun Biol 4:604. doi: 10.1038/s42003-021-02112-2
  • 18. Tully BJ, Graham ED, Heidelberg JF. 2018. The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Sci Data 5:170203. doi: 10.1038/sdata.2017.203
  • 19. Delmont TO. 2021. Discovery of nondiazotrophic Trichodesmium species abundant and widespread in the open ocean. Proc Natl Acad Sci U S A 118:e2112355118. doi: 10.1073/pnas.2112355118
  • 20. Delmont TO, Pierella Karlusich JJ, Veseli I, Fuessel J, Eren AM, Foster RA, Bowler C, Wincker P, Pelletier E. 2022. Heterotrophic bacterial diazotrophs are more abundant than their cyanobacterial counterparts in metagenomes covering most of the sunlit ocean. ISME J 16:927–936. doi: 10.1038/s41396-021-01135-1
  • 21. Zhao Z, Baltar F, Herndl GJ. 2020. Linking extracellular enzymes to phylogeny indicates a predominantly particle-associated lifestyle of deep-sea prokaryotes. Sci Adv 6:eaaz4354. doi: 10.1126/sciadv.aaz4354
  • 22. Ganesh S, Parris DJ, DeLong EF, Stewart FJ. 2014. Metagenomic analysis of size-fractionated picoplankton in a marine oxygen minimum zone. ISME J 8:187–211. doi: 10.1038/ismej.2013.144
  • 23. Tuson HH, Weibel DB. 2013. Bacteria-surface interactions. Soft Matter 9:4368–4380. doi: 10.1039/C3SM27705D
  • 24. Cordero OX, Hogeweg P. 2009. The impact of long-distance horizontal gene transfer on prokaryotic genome size. Proc Natl Acad Sci U S A 106:21748–21753. doi: 10.1073/pnas.0907584106
  • 25. Stewart FJ. 2013. Where the genes flow. Nature Geosci 6:688–690. doi: 10.1038/ngeo1939
  • 26. Koonin EV, Wolf YI. 2008. Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res 36:6688–6719. doi: 10.1093/nar/gkn668
  • 27. Leu AO, Eppley JM, Burger A, DeLong EF. 2022. Diverse genomic traits differentiate sinking-particle-associated versus free-living microbes throughout the oligotrophic open ocean water column. mBio 13:e0156922. doi: 10.1128/mbio.01569-22
  • 28. Meziti A, Rodriguez-R LM, Hatt JK, Peña-Gonzalez A, Levy K, Konstantinidis KT. 2021. The reliability of metagenome-assembled genomes (MAGs) in representing natural populations: insights from comparing MAGs against isolate genomes derived from the same fecal sample. Appl Environ Microbiol 87:e02593-20. doi: 10.1128/AEM.02593-20
  • 29. Garcia SL, Buck M, McMahon KD, Grossart H-P, Eiler A, Warnecke F. 2015. Auxotrophy and intrapopulation complementary in the “interactome” of a cultivated freshwater model community. Mol Ecol 24:4449–4459. doi: 10.1111/mec.13319
  • 30. Rodríguez-Gijón A, Nuy JK, Mehrshad M, Buck M, Schulz F, Woyke T, Garcia SL. 2021. A genomic perspective across earth’s microbiomes reveals that genome size in archaea and bacteria is linked to ecosystem type and trophic strategy. Front Microbiol 12:761869. doi: 10.3389/fmicb.2021.761869
  • 31. Iranzo J, Gómez MJ, López de Saro FJ, Manrubia S. 2014. Large-scale genomic analysis suggests a neutral punctuated dynamics of transposable elements in bacterial genomes. PLoS Comput Biol 10:e1003680. doi: 10.1371/journal.pcbi.1003680
  • 32. Touchon M, Rocha EPC. 2007. Causes of insertion sequences abundance in prokaryotic genomes. Mol Biol Evol 24:969–981. doi: 10.1093/molbev/msm014
  • 33. Costello MJ, Chaudhary C. 2017. Marine biodiversity, biogeography, deep-sea gradients, and conservation. Curr Biol 27:R511–R527. doi: 10.1016/j.cub.2017.04.060
  • 34. Sogin ML, Morrison HG, Huber JA, Mark Welch D, Huse SM, Neal PR, Arrieta JM, Herndl GJ. 2006. Microbial diversity in the deep sea and the underexplored “rare biosphere.” Proc Natl Acad Sci U S A 103:12115–12120. doi: 10.1073/pnas.0605127103
  • 35. Simmons SL, Dibartolo G, Denef VJ, Goltsman DSA, Thelen MP, Banfield JF. 2008. Population genomic analysis of strain variation in Leptospirillum group II bacteria involved in acid mine drainage formation. PLoS Biol 6:e177. doi: 10.1371/journal.pbio.0060177
  • 36. McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654. doi: 10.1038/351652a0
  • 37. Schloissnig S, Arumugam M, Sunagawa S, Mitreva M, Tap J, Zhu A, Waller A, Mende DR, Kultima JR, Martin J, Kota K, Sunyaev SR, Weinstock GM, Bork P. 2013. Genomic variation landscape of the human gut microbiome. Nature 493:45–50. doi: 10.1038/nature11711
  • 38. Parkhill J, Sebaihia M, Preston A, Murphy LD, Thomson N, Harris DE, Holden MTG, Churcher CM, Bentley SD, Mungall KL, et al. 2003. Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica. Nat Genet 35:32–40. doi: 10.1038/ng1227
  • 39. Moran NA, Plague GR. 2004. Genomic changes following host restriction in bacteria. Curr Opin Genet Dev 14:627–633. doi: 10.1016/j.gde.2004.09.003
  • 40. Escobar-Páramo P, Ghosh S, DiRuggiero J. 2005. Evidence for genetic drift in the diversification of a geographically isolated population of the hyperthermophilic archaeon Pyrococcus. Mol Biol Evol 22:2297–2303. doi: 10.1093/molbev/msi227
  • 41. Flynn KJ, Swanson MS. 2014. Integrative conjugative element ICE-βox confers oxidative stress resistance to Legionella pneumophila in vitro and in macrophages. mBio 5:e01091-14. doi: 10.1128/mBio.01091-14
  • 42. Sullivan JT, Ronson CW. 1998. Evolution of rhizobia by acquisition of a 500-kb symbiosis island that integrates into a phe-tRNA gene. Proc Natl Acad Sci U S A 95:5145–5149. doi: 10.1073/pnas.95.9.5145
  • 43. Breitbart M, Bonnain C, Malki K, Sawaya NA. 2018. Phage puppet masters of the marine microbial realm. Nat Microbiol 3:754–766. doi: 10.1038/s41564-018-0166-y
  • 44. Nadell CD, Drescher K, Foster KR. 2016. Spatial structure, cooperation and competition in biofilms. Nat Rev Microbiol 14:589–600. doi: 10.1038/nrmicro.2016.84
  • 45. Basler M, Ho BT, Mekalanos JJ. 2013. Tit-for-tat: type VI secretion system counterattack during bacterial cell-cell interactions. Cell 152:884–894. doi: 10.1016/j.cell.2013.01.042
  • 46. Hayes CS, Aoki SK, Low DA. 2010. Bacterial contact-dependent delivery systems. Annu Rev Genet 44:71–90. doi: 10.1146/annurev.genet.42.110807.091449
  • 47. Ho BT, Dong TG, Mekalanos JJ. 2014. A view to a kill: the bacterial type VI secretion system. Cell Host Microbe 15:9–21. doi: 10.1016/j.chom.2013.11.008
  • 48. Rapp JZ, Sullivan MB, Deming JW. 2021. Divergent genomic adaptations in the microbiomes of arctic subzero sea-ice and cryopeg brines. Front Microbiol 12:701186. doi: 10.3389/fmicb.2021.701186
  • 49. Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, Tosatto SCE, Paladin L, Raj S, Richardson LJ, Finn RD, Bateman A. 2021. Pfam: the protein families database in 2021. Nucleic Acids Res 49:D412–D419. doi: 10.1093/nar/gkaa913
  • 50. Sunagawa S, Acinas SG, Bork P, Bowler C, Tara Oceans Coordinators, Eveillard D, Gorsky G, Guidi L, Iudicone D, Karsenti E, Lombard F, Ogata H, Pesant S, Sullivan MB, Wincker P, de Vargas C. 2020. Tara oceans: towards global ocean ecosystems biology. Nat Rev Microbiol 18:428–445. doi: 10.1038/s41579-020-0364-5
  • 51. Lind AL, Pollard KS. 2021. Accurate and sensitive detection of microbial eukaryotes from whole metagenome shotgun sequencing. Microbiome 9:58. doi: 10.1186/s40168-021-01015-y
  • 52. Buchfink B, Reuter K, Drost H-G. 2021. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods 18:366–368. doi: 10.1038/s41592-021-01101-x
  • 53. Huang L, Zhang H, Wu P, Entwistle S, Li X, Yohe T, Yi H, Yang Z, Yin Y. 2018. dbCAN-seq: a database of carbohydrate-active enzyme (CAZyme) sequence and annotation. Nucleic Acids Res 46:D516–D521. doi: 10.1093/nar/gkx894
  • 54. Rawlings ND, Barrett AJ, Thomas PD, Huang X, Bateman A, Finn RD. 2018. The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the PANTHER database. Nucleic Acids Res 46:D624–D632. doi: 10.1093/nar/gkx1134
  • 55. Almagro Armenteros JJ, Tsirigos KD, Sønderby CK, Petersen TN, Winther O, Brunak S, von Heijne G, Nielsen H. 2019. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat Biotechnol 37:420–423. doi: 10.1038/s41587-019-0036-z
  • 56. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403–410. doi: 10.1016/S0022-2836(05)80360-2
  • 57. Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30:2068–2069. doi: 10.1093/bioinformatics/btu153
  • 58. R Core Team. 2021. R: a language and environment for statistical computing
  • 59. Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen T, Miller E, Bache S, Müller K, Ooms J, Robinson D, Seidel D, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H. 2019. Welcome to the Tidyverse. JOSS 4:1686. doi: 10.21105/joss.01686
  • 60. Karlicki M, Antonowicz S, Karnkowska A. 2022. Tiara: deep learning-based classification system for eukaryotic sequences. Bioinformatics 38:344–350. doi: 10.1093/bioinformatics/btab672
  • 61. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–1055. doi: 10.1101/gr.186072.114
  • 62. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079. doi: 10.1093/bioinformatics/btp352
  • 63. Graham ED, Heidelberg JF, Tully BJ. 2018. Potential for primary productivity in a globally-distributed bacterial phototroph. ISME J 12:1861–1866. doi: 10.1038/s41396-018-0091-3
  • 64. Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359. doi: 10.1038/nmeth.1923
  • 65. Eren AM, Kiefl E, Shaiber A, Veseli I, Miller SE, Schechter MS, Fink I, Pan JN, Yousef M, Fogarty EC, et al. 2021. Community-led, integrated, reproducible multi-omics with anvi’o. Nat Microbiol 6:3–6. doi: 10.1038/s41564-020-00834-3
  • 66. Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119. doi: 10.1186/1471-2105-11-119


Articles from mSystems are provided here courtesy of American Society for Microbiology (ASM)
