Abstract
Background
Rubisco is among the most abundant enzymes on Earth and is a critical conduit for inorganic carbon into the biosphere. Despite this, the full extent of rubisco diversity and the biology of organisms that employ it for carbon fixation are still emerging, particularly in unlit ecosystems like the deep sea.
Results
We generate fifteen metagenomes along a spatially resolved transect off the California coast and combine them with globally distributed public data to examine the diversity, distribution, and metabolic features of rubisco-encoding organisms from the dark water column. Organisms with the form I and/or form II rubisco are detected in the vast majority of samples and comprise up to around 20% of the binned microbial community. At 150 m and below, the potential for autotrophic carbon fixation via rubisco is dominated by just two orders of gammaproteobacteria and SAR324, encoding either the form I or II rubisco. Many of these organisms also possess genes for the oxidation of reduced sulfur compounds, which may energetically support carbon fixation. Transcriptomic profiling in the epi- and mesopelagic suggests that all major forms of rubisco (I, II, and III) can be highly expressed in the deep water column but are not constitutively expressed, consistent with metabolic flexibility.
Conclusions
Our results demonstrate that the genetic potential to fix carbon via rubisco is significant and spatially widespread in the dark ocean. We identify several rubisco-encoding species that are particularly abundant and cosmopolitan, highlighting the key role they may play in deep-sea chemoautotrophy and the global marine carbon cycle.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13059-025-03625-3.
Keywords: Carbon fixation, Rubisco, Chemoautotrophy, Marine carbon cycle, Metagenomics
Background
Ribulose 1,5-bisphosphate carboxylase/oxygenase (rubisco) is the key enzyme underlying carbon fixation via the Calvin–Benson–Bassham (CBB) cycle and is responsible for the majority of carbon fixation on Earth [1, 2]. Employed by plants and photosynthetic phytoplankton, light-driven rubisco is estimated to fix as much as 258 billion tons of carbon dioxide annually [3], which then enters the biosphere as organic carbon. Due to its undeniable importance in photoautotrophy, studies of rubisco have historically focused on organisms from sunlit biomes. However, rubisco is not itself dependent on light, and carbon fixation via rubisco can instead be coupled to chemical energy sources in complete darkness (chemoautotrophy). Yet far less is known about the organisms mediating this coupling, and about their impact on local and global carbon cycling, than about their photoautotrophic counterparts.
One particularly significant dark ecosystem is the deep ocean water column (≥ 200 m water depth), where sunlight is low or entirely absent. There, diverse, abundant, and active microbial communities [4, 5] shape global biogeochemistry by remineralizing organic matter exported from the sunlit zone and coupling chemical energy to inorganic carbon fixation to form biomass [6]. In some regions of the deep sea, the biomass resulting from new fixation may reach a similar magnitude to that derived from heterotrophic degradation of organic compounds [7, 8], thus representing an important mechanism by which cells’ overall carbon demand might be met [9]. In this way, dark carbon fixation may also represent a significant carbon dioxide sink with potential ramifications for global climate.
For some time, nitrifying archaea fixing carbon with the 3-hydroxypropionate/4-hydroxybutyrate (3HP-4HB) cycle were considered primary drivers of deep-sea chemoautotrophy [10–12]. More recently, genomic surveys have indicated that lineages using other carbon fixation pathways—including the CBB [5, 13, 14]—might also be widespread and contribute more to dark carbon fixation than nitrifying archaea using the 3HP-4HB cycle [15]. Using a combination of single-cell and assembly-based metagenomics approaches, these studies have begun to illuminate the genetic features of organisms encoding rubisco in a subset of lineages and ocean depths/regions. However, despite its potential importance in shaping deep-sea biogeochemistry, a holistic view of rubisco diversity in the global deep sea and the ecological role of organisms that employ it for carbon fixation—including their abundance, distribution, and metabolic strategies—remains lacking [16]. Additionally, how different forms of rubisco (e.g., form I, II, and III) are differentially distributed and expressed over the physical and chemical gradients of the dark water column has not been carefully explored. Of particular interest is the extent to which form I and form II rubiscos, both of which function in the CBB cycle but are generally considered to possess different oxygen niches and biochemical characteristics [17–19], spatially overlap in the dark ocean. Similarly, the distribution of form III-related rubisco enzymes, which are thought to fix carbon dioxide during heterotrophic nucleotide assimilation [20, 21], is not well understood.
Here, we examine the diversity, distribution, and metabolism of rubisco-encoding organisms (hereafter termed REOs) in the global ocean, from the epipelagic to the abyssopelagic. Critically, we take a genome-resolved lens to REO diversity, enabling clearer resolution of the taxonomic affiliation, co-occurring genetic characteristics, and abundance compared to gene-centric approaches. Through this lens, we discover over one thousand species groups of bacteria and archaea with variable forms of rubisco and use their genomes to probe changes in carbon fixation potential with depth, both across a spatially resolved transect in the Northeastern Pacific as well as the global ocean. Supporting our metagenomic analyses, we also leverage public metatranscriptomes that confirm rubisco expression by diverse organisms from the surface through the mesopelagic. Together, our analyses illuminate new aspects of REO biology in the deep sea and provide foundational information that could help refine biogeochemical models of this vast habitat.
Results
Recovery of novel REOs from a deeply sequenced water column transect
We began our analysis by collecting and deeply sequencing 15 metagenomes (average ~ 52 gigabases/sample) from a spatially resolved water column transect off the California coast (OC1703A), complementing 13 previously reported metagenomes from the same expedition [22]. The combined 28-metagenome dataset now spans six sites across nearly 300 km, covering 50 to 4000 m water depth. Paired physicochemical data for these sites are available in Arandia-Gorostidi et al. (2023) and Arandia-Gorostidi et al. (2024) [4, 22]. This dataset adds to existing sequencing efforts from the dark ocean, especially within the bathypelagic (1000 to 4000 m depth), where little sampling has been performed to date [23]. Here, we utilized the combined dataset of 28 metagenomes to resolve 3455 metagenome-assembled genomes (MAGs), hereafter referred to as the OC1703A MAG set, of which 3.2% were high-quality draft genomes (> 90% completeness, < 5% contamination, presence of 5S, 16S, and 23S rRNAs, ≥ 18 tRNA genes), 79.5% were medium-quality drafts (≥ 50% completeness and < 10% contamination), and 17.3% were unclassifiable (> 50% completeness and ≥ 10% contamination) according to the MIMAG schema [24]. These MAGs accounted for 11.2 to 55.4% (mean ~ 41%) of raw metagenomic reads (Additional file 1: Table S1). We searched MAGs for rubisco genes, identifying and subsequently curating medium-to-high quality genome information for 24 REO species groups (Additional file 1: Table S2), 17 (~ 71%) of which were uniquely assembled at our site and not present in existing public databases (Additional file 1: Table S3).
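The quality tiers quoted above can be expressed as a short decision rule. The following is a minimal sketch only, assuming that completeness, contamination, rRNA presence, and tRNA counts have already been estimated for each MAG; the `MagStats` container and the `mimag_tier` function are hypothetical names introduced for illustration and are not part of the study's pipeline.

```python
from dataclasses import dataclass


@dataclass
class MagStats:
    # Hypothetical per-MAG summary; field names are illustrative assumptions.
    completeness: float   # percent
    contamination: float  # percent
    has_5s: bool
    has_16s: bool
    has_23s: bool
    n_trna: int


def mimag_tier(m: MagStats) -> str:
    """Assign a quality tier using the thresholds quoted in the text."""
    has_all_rrna = m.has_5s and m.has_16s and m.has_23s
    # High quality: > 90% complete, < 5% contaminated, all rRNAs, >= 18 tRNAs.
    if m.completeness > 90 and m.contamination < 5 and has_all_rrna and m.n_trna >= 18:
        return "high"
    # Medium quality: >= 50% complete and < 10% contaminated.
    if m.completeness >= 50 and m.contamination < 10:
        return "medium"
    # Everything else is lumped together in this sketch.
    return "other"
```

Note that a highly complete, low-contamination MAG lacking one rRNA gene falls into the medium tier under this rule, which is one reason the high-quality fraction of short-read MAG sets is typically small.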
Taxonomic diversity of rubisco-encoding organisms in the global ocean
We next combined REO MAGs from the OC1703A set with other genomes extracted from existing marine databases [23, 25–28], which were similarly searched at scale for the presence of enzymes in the rubisco superfamily (Methods). The combined set of global REO genomes (encompassing MAGs, single-amplified genomes, and isolates) was subjected to quality filtering (≥ 50% completeness and < 10% contamination, corresponding to high- and medium-quality draft genomes in the MIMAG schema) and de-replication, forming 1070 “species group” clusters at 95% average nucleotide identity, including 24 represented by MAGs from the OC1703A set (Additional file 1: Table S3). Next, we sorted the rubisco sequences present in REO species groups into previously described “forms” [18, 29, 30] and stringently curated the taxonomy of rubisco-encoding contigs to avoid incorrect inferences caused by metagenomic mis-binning. Despite this curation process, we cannot fully rule out the presence of a small number of mis-attributed contigs, which may slightly inflate the inferred degree of REO diversity.
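To illustrate what clustering genomes into species groups at 95% average nucleotide identity (ANI) entails, the sketch below groups genomes by single linkage over precomputed pairwise ANI values. This is an assumption made for illustration: the de-replication tool and linkage method actually used are described in the Methods and may differ, and the function name `species_groups` is hypothetical.

```python
def species_groups(genomes, ani_pairs, threshold=95.0):
    """Single-linkage grouping of genomes from pairwise ANI (percent).

    genomes:   iterable of genome identifiers
    ani_pairs: iterable of (genome_a, genome_b, ani) tuples
    Returns a sorted list of sorted member lists, one per group.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent pointers to the group root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b, ani in ani_pairs:
        if ani >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two groups

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Toy example with made-up genome names and ANI values.
example = species_groups(
    ["A", "B", "C", "D"],
    [("A", "B", 97.2), ("B", "C", 95.0), ("C", "D", 88.0)],
)
```

In this toy case A, B, and C form one group even if A and C were never compared directly, which is a known property of single linkage; one representative genome per group would then be chosen for read mapping.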
Nevertheless, curated results included a vast array of rubiscos spanning nearly all known phylogenetic forms, including the form I, form II (Fig. 1a, b), and numerous form III-related enzymes (Additional file 1: Table S4, S5). Organisms with the form I were particularly taxonomically diverse, encompassing over 40 orders in 10 non-Cyanobacterial lineages (phyla/classes) (Fig. 1a). The most diverse group of marine REOs was the Gammaproteobacteria, accounting for 125 distinct species groups with the form I enzyme (Additional file 1: Table S3). These lineages include some already well known to perform carbon fixation via rubisco, such as the Arenicellales [14] and the PS1 (including members of the SUP05 clade) [31], as well as numerous newly reported orders, including two species groups with no current order-level designation in GTDB (order: novel) (Additional file 1: Table S3, Fig. 1a). Although these genomes lack a genome-level taxonomic assignment, preliminary searches of a conserved ribosomal marker protein suggested that they fall within the described gammaproteobacterial family Competibacteraceae. We also recovered evidence for form I enzymes in SAR324 (11 species groups), various Alphaproteobacteria, and the Chloroflexi (Additional file 1: Table S3).
Fig. 1.
Characteristics of global rubisco-encoding organisms (REOs) with the form I, form II, or both enzyme types (information on form III enzymes can be found in Additional file 2: Fig. S1). Organisms were clustered into species groups (95% ANI) and were aggregated at the order level. a For each order-level lineage, the number of distinct species groups, the completeness of representative genomes for each species group, and the fraction of representative genomes that encoded the rubisco small subunit (SSU), at least one rubisco-like protein (RLP), CBB cycle genes, and genes involved in the oxidation of sulfur or nitrogen compounds for energy gain. Only those order-level lineages with more than one species group are displayed. b Schematic of phylogenetic relationships within the rubisco superfamily adapted from [29]. c Pathway diagram for oxidative reactions involving sulfur, adapted from [32]
In contrast, the diversity of organisms bearing the form II gene, also associated with the CBB cycle, was restricted to a smaller number of groups, primarily in the Alpha- and Gammaproteobacteria (Fig. 1a). Intriguingly, almost all lineages with the form II also contained species groups with the form I, suggesting significant intra-lineage variability in rubisco type in these cases. Furthermore, several proteobacterial lineages, including the Rhodobacterales, the GCA-001735895, and Nitrosococcales (Additional file 1: Table S3), contained a small number of species groups encoding both the form I and form II in the same genome, as has been observed for some autotrophic proteobacteria [17]. Lastly, organisms encoding various form III-related enzymes were recovered both in Archaea, including the Nitrososphaerales (Thaumarchaeota) [33, 34], and in the Candidate Phyla Radiation (CPR) bacteria [29, 35] (Additional file 2: Fig. S1, Additional file 1: Table S3). Protein sequences for all ~ 4400 marine rubiscos identified here (609 clusters at 95% amino acid identity) are provided in Additional file 1: Table S4.
Rubisco-associated genes and metabolic features of REOs
Across the rubisco superfamily, there exists significant diversity both in pathway affiliation and genomic configuration (e.g., [29, 36]). To examine the metabolic context for these marine rubiscos and assign likely function, we searched representative genomes picked for each global species group for associated genes (Additional file 1: Table S6, Methods). As anticipated, only genomes with the form I rubisco encoded the small subunit rubisco (Fig. 1a), which forms a complex with the large subunit and augments its kinetic properties [37]. One exception to this trend was the Chloroflexi, some of whose members encode a divergent form I rubisco that does not require the small subunit to assemble [38]. In other REO lineages, inconsistent detection of the small subunit is likely due to genome incompleteness, though we cannot rule out the possibility of alternate pathway configurations in all cases.
We consistently recovered the remaining genes in the CBB cycle, the canonical pathway affiliated with the form I and form II rubisco, among the REO lineages (Additional file 1: Table S7), supporting the inference that these rubiscos act in functional carbon fixation pathways (Fig. 1a). Importantly, the phosphoribulokinase gene (prk), which regenerates the substrate for rubisco so that additional carbon dioxide can be fixed, was nearly omnipresent. In contrast, fructose-1,6-bisphosphatase (fbpase) and sedoheptulose-1,7-bisphosphatase (sbpase) were more patchily distributed among the lineages surveyed, including some where representative genomes attained high completeness values (Fig. 1a). This finding suggests that some marine REOs may encode streamlined CBB cycles that rely on bifunctional enzymes to cover inner reaction branches, as has been previously described for other organisms [39], or are otherwise reconfigured. Form II/III and III-related enzymes generally co-occurred with genes (deo and ribp_isomerase) known to function in a non-autotrophic pathway incorporating CO2 with a scavenged nucleotide [20] and lacked the key CBB gene prk (Additional file 2: Fig. S1). Exceptions to this trend included several methanogenic lineages and one lineage of CPR bacteria that employ prk in validated or proposed variants of the CBB cycle (Additional file 2: Fig. S1), though it is currently unknown whether these pathways permit autotrophic growth [29, 36].
REOs in the deep sea, where light does not penetrate, must power their CBB cycle via chemical energy instead of light energy. Thus, we investigated potential sources of chemical energy by searching genomes for genes involved in nitrogen, sulfur, and hydrogen oxidation (Additional file 1: Table S6). Genes involved in the oxidation of nitrite and ammonia were rare, and only co-occurred with form I rubisco in six species groups in the proteobacterial orders Burkholderiales, Nitrosococcales, Nitrococcales, and Rhizobiales (Additional file 1: Table S8). In contrast, we observed that genes involved in the oxidation of reduced sulfur compounds frequently co-occurred with form I and form II rubiscos, particularly in the Alpha- and Gammaproteobacteria (Fig. 1a, c). The genetic capacity for thiosulfate oxidation (soxBCY) was widespread, detectable in 69% of proteobacterial species groups with the form I, ~ 74% with the form II, and ~ 86% with both the form I and form II (Additional file 1: Table S8, Fig. 1a). We suggest that these estimates are likely conservative, given that relatively lax thresholds for genomic completeness (≥ 50%) were employed here.
The ubiquitous detection of the capacity to oxidize thiosulfate broadens and highlights recent evidence from a few individual deep-sea lineages [14, 40] and indicates that this metabolic strategy may be a central one among REOs in the deep realm. Similarly, many sox-encoding lineages also possessed the genetic potential for sulfide oxidation via sqr or fcc, as well as the oxidation of elemental sulfur by the sulfur dioxygenase (sdo) and/or the reverse dissimilatory sulfite reductase (dsrAB) (Fig. 1a, c), indicating the potential to catalyze multiple sequential transformations of sulfur compounds. On the other hand, genes enabling the oxidation of sulfite to sulfate were rarer, as were genes encoding the production of sulfite from the organic sulfur compound taurine (Additional file 1: Table S8), which could serve as an alternative source of sulfur in the oxygenated water column [14, 41]. Finally, in lineages where machinery for sulfur oxidation was rare, like the Actinobacteria, we recovered genetic evidence for uptake hydrogenases (forms 1 d, 1 l, 2a) that could support fixation through the aerobic oxidation of molecular hydrogen [42, 43].
Abundance and distribution of REOs across a coastal California transect
Previous work has identified REOs in the dark ocean at specific depths [5, 13]; however, few studies have traced their distribution over physical and chemical gradients. We leveraged the unique degree of spatial resolution in our newly reported OC1703A transect to address this sampling gap and assess the abundance of REOs from the surface into the bathypelagic and from the coast to the open ocean. We combined representative REO genomes with those for non-REO species also recovered from the OC1703A sequencing transect, ensuring that we did not introduce bias in the way each group was sampled. Metagenomic read recruitment to this combined genome set, followed by stringent filtering of resulting alignments, revealed that organisms with forms I, II, or III-A rubisco were detectable at all depths surveyed from coast to open ocean (~ 300 km from shore) (Fig. 2a). Remarkably, we estimate that these organisms routinely comprised upwards of 10% of the binned microbial community (as measured by percentage of total sequencing coverage of recovered bins) below 200 m (mean 14.5%), reaching a maximum of 19.5% at 3000 m near the continental shelf. This abundance exceeds that observed in the photic zone (50 and 150 m) (Additional file 1: Table S9), though poor genome recovery in the 50 m samples likely obscures REO relative abundance and diversity there to some extent.
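The abundance measure described above, the share of total sequencing coverage of recovered bins that is attributable to REOs, reduces to a simple ratio. The sketch below assumes a per-sample mapping of genome identifiers to mean coverage after alignment filtering; the function name and the toy values are hypothetical and are not taken from the study.

```python
def reo_percent_of_binned_community(coverage, reo_ids):
    """Percent of total bin coverage in one sample contributed by REO genomes.

    coverage: dict of genome_id -> mean sequencing coverage in the sample
              (REO and non-REO representative genomes together)
    reo_ids:  set of genome_ids that encode rubisco
    """
    total = sum(coverage.values())
    if total == 0:
        return 0.0  # nothing recruited reads in this sample
    reo_total = sum(cov for gid, cov in coverage.items() if gid in reo_ids)
    return 100.0 * reo_total / total


# Toy sample: two REO genomes and one non-REO genome (made-up coverages).
toy_coverage = {"reo_1": 10.0, "reo_2": 30.0, "other_1": 60.0}
toy_percent = reo_percent_of_binned_community(toy_coverage, {"reo_1", "reo_2"})
```

Because the denominator contains only binned genomes, this quantity describes the recovered community rather than all cells in the sample, which is why read recruitment rates matter for interpreting it.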
Fig. 2.
Distribution and abundance of global REOs with different rubisco forms across the OC1703A transect (coastal California). Dotplots represent 5 sites (OC2-6) spanning ~ 300 km offshore, sampled across depths. Size of dots represents the relative abundance (calculated as the percentage of total sequencing coverage) of MAGs encoding any form of rubisco, the form I, form II, and form III-A enzyme, respectively. Barplots represent either the composition of rubisco forms (all) or the taxonomic composition of rubisco-encoding organisms with depth (all others). In the upper left panel, percent dissolved oxygen (DO) saturation is overlaid. N.B. Seafloor topography was generated by GeoMapApp and is approximate
While rubiscos at 50 m could primarily be attributed to Cyanobacteria with the form I, we also found evidence for non-phototrophic Alpha- and Gammaproteobacteria at these depths (Fig. 1a, b). With the exception of the site nearest to shore (OC2), organisms with the form II rubisco were not recovered at 50 m depth, possibly due to this form’s sensitivity to high oxygen concentrations. On the other hand, at 150 m and below, there was no clear spatial separation of the form I and form II; instead, organisms with each form co-occurred stably. In fact, we observed that organisms with form II tended to increase proportionally in abundance with depth (R² = 0.301), despite increasing oxygen concentrations (Fig. 2a). Form III-A rubiscos, attributable to non-ammonia oxidizing Nitrososphaerales (also referred to as heterotrophic marine Thaumarchaeota, or HMT [34]), were present at trace abundances (average ~ 0.08% of the binned community) across the transect at 500 m and below (Fig. 2d, Additional file 1: Table S9).
Intriguingly, REO populations at the OC1703A transect were dominated by a small number of order-level lineages from the global set (Fig. 2). In our transect, organisms affiliated with the SAR324 comprised the majority of the metabolic potential for the form I rubisco (Fig. 2b, side panel). At 100 m and below, organisms from the Gammaproteobacteria PS1 also attained notable abundances; distinct species groups from this same lineage also accounted for the vast majority of form II potential across depth (Fig. 2c). We note that the SAR324 and PS1 lineages were mostly accounted for by singular species groups that were particularly abundant; additional, rarer species were restricted to narrower depth ranges (e.g., epi- or bathypelagic only) (Additional file 2: Fig. S2–3). Other order-level lineages, like the Marinistomatales and the Arenicellales, comprised only small portions of form II potential in the mesopelagic, although the latter group can be highly transcriptionally active despite low DNA abundance in some locales [14].
Our genome-resolved approach also allowed us to examine how catabolic pathways supporting carbon fixation varied with depth and space. Along our transect, the metabolic potential for oxidation of various reduced sulfur compounds was widespread throughout the deep water column below 150 m without strong depth patterning—frequently reaching 6–10% of the total binned microbial community (Additional file 2: Fig. S4)—mirroring the distribution of metabolically flexible SAR324 and PS1 (Fig. 2b, c). At this site, we also observed consistent metabolic potential for sulfite production from organic sulfur compounds, while genes for thiosulfate disproportionation were very rare (< 0.1% of the binned community), but apparently peaked in the bathypelagic around 2000 m (Additional file 2: Fig. S4). On the other hand, REOs with the ability to oxidize ammonia were restricted to the epi- and upper mesopelagic, and we did not detect nitrite or hydrogen oxidizers at the strict thresholds employed here. These findings suggest that reduced nitrogen compounds and molecular hydrogen only minimally support rubisco-mediated carbon fixation at this site, although they likely sustain other, spatially overlapping types of fixation (e.g., the 3HP/4HB cycle in ammonia-oxidizing Thaumarchaeota [22, 44]).
Abundance and distribution of REOs across the global ocean
To examine the generality of our results concerning REO abundance and distribution, and explore variations at the global scale, we amassed over 1000 water column metagenomes from around the world (Fig. 3a, Additional file 1: Table S10) and aligned their reads to representative REO genomes compiled above. As before, mappings were stringently filtered on read identity and fraction of genome covered by reads (coverage breadth) to minimize false detections (Methods). As observed in the OC1703A transect, organisms with rubisco were detected in nearly all global samples assessed, whether only deep samples (~ 93%), all samples (~ 92%), or only autotrophic forms I and II (~ 89%) were considered. Where detected, we computed the relative abundance of each REO (Additional file 1: Table S11) and summed these values by rubisco form to visualize relationships between abundance and depth (Fig. 3b).
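The detection criterion described above, filtering on read identity and then on the fraction of the genome covered, can be sketched as follows. The thresholds themselves are given in the Methods and are deliberately left as required arguments here; the function names and the simplified alignment representation are assumptions made for illustration.

```python
def coverage_breadth(alignments, genome_length, min_identity):
    """Fraction of genome positions covered by reads passing an identity filter.

    alignments: iterable of (identity, start, end) with half-open [start, end)
                coordinates on a single concatenated genome (a simplification)
    """
    kept = sorted((s, e) for ident, s, e in alignments if ident >= min_identity)
    covered = 0
    cur_start = cur_end = None
    for s, e in kept:
        if cur_end is None or s > cur_end:
            # Close the previous merged interval and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent read: extend the current interval.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


def is_detected(alignments, genome_length, min_identity, min_breadth):
    """Call a genome present only if filtered reads cover enough of it."""
    return coverage_breadth(alignments, genome_length, min_identity) >= min_breadth


# Toy alignments on a 1000 bp genome; identity values are made up.
toy_alignments = [(0.99, 0, 100), (0.98, 50, 150), (0.80, 500, 900), (0.97, 300, 400)]
toy_breadth = coverage_breadth(toy_alignments, 1000, min_identity=0.95)
```

Requiring breadth in addition to depth guards against false detections in which many reads pile up on a short conserved region of an otherwise absent genome.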
Fig. 3.
Distribution and abundance of rubisco-encoding organisms (REOs) across the global ocean. a Water column sites where REOs were detected. An “X” refers to a site where at least one deep (≥ 200 m) sample was taken, whereas open circles indicate sites with only shallow (< 200 m) samples. b Relative abundance of REOs with different rubisco forms expressed as RPKM (reads per kilobase million). Shaded cells indicate the detection of one or more organisms in a given depth/RPKM bin, with hue intensity indicating the density of observations in that bin. The lower rightmost panel describes the number of metagenomes analyzed by depth
As in the OC1703A transect, we observed that organisms with the form I rubisco attained the highest relative abundances globally of all forms, reaching maximum values (≥ 40 reads per kilobase million, RPKM) in some epipelagic samples (Fig. 3b, Additional file 1: Table S11). The relative abundance of these groups remained fairly constant with depth through to the boundary of the abyssopelagic. Similar patterns were observed for organisms with the form II and III-like (typically CPR bacteria and DPANN archaea), although at a smaller scale (Fig. 3b). On the other hand, lineages encoding both the form I and form II in the same genome were essentially restricted to the mesopelagic and were rarely observed below 1000 m. Organisms with the III-A rubisco (generally non-ammonia oxidizing HMT) displayed the reverse pattern, apparently peaking in relative abundance below 3000 m (Fig. 3b), consistent with previous reports [34]. Most major rubisco forms were patchily distributed below 4000 m, but poor sampling hinders a complete picture in this ocean layer. Additionally, we acknowledge that decreased sampling in the bathypelagic may also impact genome recovery for deeper-dwelling REO species. Thus, REO abundances in this region should likely be interpreted as conservative minima.
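For readers less familiar with the unit, RPKM normalizes the number of reads mapped to a genome by both genome length (in kilobases) and sequencing depth (in millions of reads). The sketch below uses the standard definition; whether the study normalized by total or by mapped reads is specified in the Methods, so the `total_reads` argument is an assumption, as are the toy numbers.

```python
def rpkm(mapped_reads, genome_length_bp, total_reads):
    """Reads per kilobase of genome per million reads in the sample.

    RPKM = mapped_reads / ((genome_length_bp / 1e3) * (total_reads / 1e6))
         = mapped_reads * 1e9 / (genome_length_bp * total_reads)
    """
    return mapped_reads * 1e9 / (genome_length_bp * total_reads)


# Toy example: a 2 Mb genome recruiting 4 million of 50 million reads.
toy_rpkm = rpkm(mapped_reads=4_000_000, genome_length_bp=2_000_000, total_reads=50_000_000)
```

Under this definition, a value of 40 RPKM for a 2 Mb genome corresponds to that genome recruiting 8% of all reads in the sample.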
Leveraging the global breadth of our metagenomic database, we next investigated the biogeography of individual species groups based on the read mapping described above. To reduce the impacts of uneven spatial sampling, we clustered all metagenomes drawn from 200 m depth or below into “locales” within 10 km radii (Additional file 1: Table S10, Methods) and counted the number of locales in which each REO species group was detected. Intriguingly, most REO species groups were provincial, that is, detected at only one or a few locales (Fig. 4a). In general, provincial REO species groups trended towards lower mean abundances (overall R² = 0.249) (Fig. 4b).
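One simple way to realize the locale grouping described above is a greedy assignment in which each sample joins the first existing locale whose founding sample lies within 10 km, and otherwise founds a new locale. This is a sketch under that assumption; the actual clustering procedure is described in the Methods, and all function names, sample names, and coordinates below are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def assign_locales(samples, radius_km=10.0):
    """Greedy grouping: samples is a dict of sample_id -> (lat, lon)."""
    seeds = []       # coordinates of the founding sample of each locale
    locale_of = {}   # sample_id -> locale index
    for sid, (lat, lon) in samples.items():
        for i, (slat, slon) in enumerate(seeds):
            if haversine_km(lat, lon, slat, slon) <= radius_km:
                locale_of[sid] = i
                break
        else:
            seeds.append((lat, lon))
            locale_of[sid] = len(seeds) - 1
    return locale_of


def locale_frequency(detections, locale_of):
    """Fraction of locales in which each species group was detected.

    detections: dict of species_group -> set of sample_ids where it was detected
    """
    n_locales = len(set(locale_of.values()))
    return {
        sg: len({locale_of[s] for s in sids}) / n_locales
        for sg, sids in detections.items()
    }


# Toy data: two nearby deep samples and one distant sample (made-up coordinates).
toy_samples = {"s1": (36.70, -122.00), "s2": (36.75, -122.00), "s3": (30.00, -140.00)}
toy_locales = assign_locales(toy_samples)
toy_freq = locale_frequency({"sg_a": {"s1", "s2"}, "sg_b": {"s1", "s3"}}, toy_locales)
```

Counting locales rather than samples prevents densely sampled stations from inflating how widespread a species group appears.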
Fig. 4.
Biogeography of REO species groups from various bacterial and archaeal lineages within the global metagenomic dataset. a Frequency of detection of REO species groups among deep water locales (≥ 200 m depth), ordered by form and taxonomy. A subset of lineages is shown; the remaining lineages are marked here as ‘other’. Boxplot colors indicate different lineages and correspond to colors in b, which depicts frequency of detection (again, only deep locales) plotted against mean relative abundance. The trendline reflects a simple linear model with a 95% confidence interval. In this case, a species group was considered present at a given locale if it was detected in any size fraction and at any depth ≥ 200 m
We did, however, also recover evidence for multiple more cosmopolitan species groups, where representative genomes were detected in over one third of global deep locales. These species groups largely derived from the same lineages that dominated rubisco form I and II metabolic potential in the OC1703A transect: SAR324, PS1, and to a lesser extent, the Pedosphaerales, Pseudomonadales, and Marinistomatales (Fig. 4a). Remarkably, one SAR324 species group that was dominant at OC1703A (SG114_2) was also detected in almost 80% of global deep-water locales, the highest of any in our dataset. Visualization of SG114_2 abundance with depth over the global dataset showed that this species is generally rare at the ocean surface and increases in abundance towards a peak at or around 2000 m (Additional file 2: Fig. S5), implying that it is a deeper-adapted ecotype. A similar pattern was observed for SG210_1 from the Gammaproteobacterial order PS1 (~ 45% of locales), which was the most dominant lineage encoding the form II rubisco in the OC1703A transect (Additional file 2: Fig. S3, Additional file 2: Fig. S5).
Patterns of gene expression in REO genomes
While evidence for expression of rubisco has been reported at a number of water column sites, the extent to which this expression is quantitatively important for endemic taxa remains poorly understood. We next examined organism-specific rubisco expression by collecting approximately 200 metatranscriptomes from the world oceans across depth, as gene expression studies of the dark water column are sufficiently rare as to prevent a focus on deep datasets exclusively. We aligned sequenced transcripts to the global REO genome set, stringently filtered these alignments such that transcripts could be uniquely ascribed to a single organism, and finally required that genes surpass a minimum threshold of sequencing coverage (coverage breadth ≥ 50%) to be considered expressed. While we acknowledge that such an approach is likely biased against rarer or less active organisms that obtain lower sequencing coverage, here we focused on those with the clearest signal of gene expression to minimize the impacts of ambiguous mapping and/or trace DNA contamination.
First, among those organisms that we deemed transcriptionally active (at least 50 genes with detectable expression in a sample), we ranked the expression of rubisco relative to all other expressed genes. Broadly, this analysis revealed that diverse REOs actively express the major forms across depth, sometimes within the top 10% of all genes in their genomes (percentile expression ≥ 90%, Fig. 5a). We observed no major differences between these forms with depth. However, many non-cyanobacterial REOs expressing rubisco genes did so at median or lower levels compared to other genes in the genome (percentile expression ≤ 50%), with overall distribution across depths and forms appearing relatively uniform (Fig. 5a, “non-cyanobacteria” panel). This stood in stark contrast to expression patterns for shallow cyanobacteria (two-sample K-S test, p = 9.99 × 10⁻¹⁶), where we observed a radically left-skewed distribution, indicating very high expression of form I rubisco under most conditions. This discrepancy may reflect major differences in rubisco turnover rates or fundamental metabolic strategies between generally autotrophic Cyanobacteria and the bacteria and archaea profiled here. Importantly, we note that in most cases, rubisco expression among active non-cyanobacterial species groups was zero (55% of observations) or non-zero but below threshold for active expression (~ 32%) (Additional file 1: Table S12). Interestingly, these same REOs did display active rubisco transcription in other samples (including PS1 and SAR324), highlighting dynamic transcriptional profiles for these groups (Additional file 1: Table S12). Thus, our results not only provide further evidence that diverse lineages use rubisco to fix carbon throughout the deep sea, but also suggest that it may constitute one of multiple anabolic strategies for some hosts.
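The ranking step described above can be sketched as a percentile computed within each genome and sample. The sketch assumes that percentile expression is the share of expressed genes whose expression is less than or equal to that of rubisco; the exact definition is given in the Methods, and the function name, gene identifiers, and toy values are hypothetical.

```python
def rubisco_percentile(expression, rubisco_gene, min_active_genes=50):
    """Percentile rank (0-100) of rubisco among expressed genes in one genome.

    expression: dict of gene_id -> expression value for genes that already
                passed the coverage-breadth filter in this sample
    Returns None if the genome is not transcriptionally active (fewer than
    min_active_genes expressed genes) or rubisco is not expressed.
    """
    expressed = {g: v for g, v in expression.items() if v > 0}
    if len(expressed) < min_active_genes or rubisco_gene not in expressed:
        return None
    target = expressed[rubisco_gene]
    n_at_or_below = sum(1 for v in expressed.values() if v <= target)
    return 100.0 * n_at_or_below / len(expressed)


# Toy genome with 100 expressed genes whose expression equals their index.
toy_expression = {f"gene_{i}": float(i) for i in range(1, 101)}
toy_percentile = rubisco_percentile(toy_expression, "gene_95")
```

With this definition, a value of 90 or above places rubisco within the top 10% of expressed genes in that genome, matching the threshold used in Fig. 5a.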
Fig. 5.
Transcriptional profiles of globally abundant REOs. a Extent of rubisco expression in non-cyanobacterial species as a function of depth. Each point represents a transcriptionally active genome in a single sample with above-threshold rubisco gene expression (≥ 50% gene coverage breadth). X-axis values describe the percentile expression of rubisco among all transcribed genes in that genome (Methods). For clarity, active organisms with rubisco expression that was below the minimum threshold are not displayed. Upper histograms depict the density of observations, with equivalent measurements for cyanobacterial genomes shown for comparison. The rightmost panel describes the number of metatranscriptomes analyzed with depth. b Genome-wide transcriptional profiles for an abundant SAR324 species group (SG114_2) across a subset of water column transcriptomes from Tara Oceans, referenced by their BioSample identifier (SAMEA*). N.B. Each point represents a gene within the SG114_2 representative genome and is shaded by its pathway affiliation. However, the vertical position of different gene sets within a sample does not convey quantitative information
The genome-centric approach employed here also enabled us to scrutinize the expression profiles of individual organisms. Given the prevalence of SAR324 species group 114_2 (SAR324) in our set of deep metagenomes (Fig. 5a), we reasoned that this organism might also be frequently detected in metatranscriptomes, even with the lower extent of global sampling in current databases. Indeed, we found that this species group was transcriptionally active in over half of the metatranscriptomes sampled from below 200 m, and, notably, was responsible for the two deepest observations of rubisco expression in our transcriptome dataset (~ 750–800 m, Fig. 5a). We observed no strong relationship between depth and the extent of form I rubisco expression (R² = 0.139), suggesting that other factors likely govern the regulation of carbon fixation machinery in this organism (Fig. 5b). Using expression values computed for all genes in the genome, we next compared rubisco expression to other gene sets of interest. Across samples, we noted concurrent expression of the CBB cycle and sulfur oxidation machinery, consistent with the latter’s potential role in supporting carbon fixation energetically (Fig. 5b). Intriguingly, the expression of genes within these pathways displayed high variance, with transcription levels of triosephosphate isomerase (CBB cycle), adenylylsulfate reductase (sulfite oxidation), and soxB (thiosulfate oxidation) often surpassing those of the ribosomal proteins (Additional file 1: Table S13).
Discussion
Diversity, abundance, and distribution
To date, organisms with rubisco have been recognized in a subset of deep-sea microbial lineages and/or depths (e.g., [5, 13, 14]); however, a unified view of their diversity, abundance, and distribution has remained elusive. Here, using a rubisco-centric screening approach, we significantly expand the number of bacterial and archaeal lineages known to encode this gene and demonstrate that REO species groups are widespread throughout the dark marine water column. Our depth-resolved analyses show that REOs are highly abundant below 200 m, in some cases reaching nearly a fifth of the binned microbial community (Fig. 2). The relatively high rate of read recruitment (Additional file 1: Table S1) by our resolved MAGs suggests that they are generally representative of the total microbial community; thus, we expect that these abundance estimates should approximate those derived through other metagenomic methods (e.g., contig or read-based analysis). Regardless, our results also clearly demonstrate that non-cyanobacterial REOs are commonly detected in shallow regions where sunlight penetrates (Fig. 2). Co-occurrence with phototrophs in some shallow samples suggests that chemoautotrophy may remain advantageous despite ample availability of organic matter [16] and light energy. Supporting this, we detected transcription of carbon fixation machinery across essentially the entire water column where data have been generated (Fig. 5). Future gene expression studies of the bathypelagic will illuminate the extent to which these trends extend to depths where nutrient limitation intensifies, if technical challenges to producing low-biomass RNA libraries can be overcome. However, transcription of rubisco by at least one REO in sediment traps from 4000 m [45] suggests that microbial carbon fixation via rubisco occurs actively deep into the ocean interior.
Drawing on the public data, we demonstrate that REO genomes are detected in the vast majority of metagenomic samples from the major basins, spanning functional and phylogenetic forms in the rubisco superfamily (Fig. 3). We also examine the ubiquity of individual chemoautotrophic species over the vast expanse of the global ocean, which is essentially unknown outside of a handful of characterized lineages. In our analysis, the most ubiquitous REOs were members of the SAR324, largely encoding the form I rubisco, as well as members of the PS1 (including the SUP05), which frequently encoded the form II enzyme type. Remarkably, one species from the former lineage was recovered in over three-quarters of deep locales analyzed, in agreement with a previous study that consistently detected a single SAR324 species within global ~ 4000 m sites [5]. We also report a number of other cosmopolitan taxa from the order-level lineages Rhodospirillales, Marinisomatales, Pseudomonadales, Pedosphaerales, and several others (Fig. 4). Together, these taxa may anchor community potential for carbon fixation by rubisco, even if less abundant species contribute more to instantaneous transcription under some conditions [14]. Intriguingly, REO genomes with both the form I and II enzyme types differed from other major types examined here in that they were barely detected below 1000 m (Fig. 3b), despite the metabolic flexibility theoretically afforded by this gene inventory [17]. Instead, organisms with multiple rubiscos may be selected for in zones where fluctuations in oxygen/carbon dioxide are greater in magnitude or might instead be limited by other environmental/biological factors.
Biochemical studies have suggested significantly different kinetic properties for form I and form II enzymes, which, like all bona fide (form I–III) rubisco, can utilize both carbon dioxide and oxygen as substrates. Largely due to decreased ability to discriminate between substrates (Sc/o), form II enzymes are thought to be specialized for lower oxygen, higher CO2 niches [17], where competition for enzyme active sites by undesirable oxygen molecules is reduced. Given these properties, we were surprised to observe that organisms encoding either the form I or form II enzyme co-occurred throughout the water column in the OC1703A transect and showed no consistent relationship between abundance and oxygen concentration (Fig. 2). To explain this unexpected result, we hypothesize that these REOs might occupy distinct microniches within the dark water column, where they mediate carbon fixation at different rates according to their rubisco gene inventory. Supporting this notion is growing evidence that anaerobic conditions might arise in particle interiors where microbial activity depletes local oxygen levels [46], and that such aggregates may be more widespread in the oxygenated ocean than previously recognized [47]. However, while some REOs have been imaged in association with aggregates [13], their genomes do not always contain canonical genes for a particle-associated lifestyle (e.g., attachment) [48]. Currently, the predominant lifestyles of deep-sea REOs remain an open question, to be addressed more comprehensively in future research.
Metabolism: genes and transcriptomes
Energy supporting carbon fixation in marine chemoautotrophs may be derived from a variety of sources, including oxidation of hydrogen, nitrite or ammonia, and reduced sulfur compounds [16, 49, 50]. Evidence for the particular association of rubisco and genes for oxidation of various reduced sulfur compounds (e.g., sulfide, thiosulfate, sulfite) is accumulating from both bulk gene quantification approaches and genome-resolved metagenomics [14, 48, 51]. Recently, a firmer link between these processes was established by demonstrating that amendment of deep-sea water with thiosulfate stimulated dissolved inorganic carbon fixation up to fourfold compared to an unamended control [40]. Here, we expand upon this growing body of evidence by reporting a wide distribution of sulfur oxidation genes among the largest set of REO genomes assembled to date. If reduced sulfur compounds can indeed serve as a source of energy supporting carbon fixation, our results suggest that this strategy is likely the primary one employed by REOs in the deep water column. Importantly, at least in the OC1703A transect, we predict that these organisms are spatially widespread across depth (Fig. 1, S4), and thus are unlikely to be solely the result of transport from anoxic coastal regions, as has been proposed for a subset of groups [31]. Alternatively, our results would be consistent with the notion that some REOs localize to suboxic niches where sulfate reduction can occur [13, 47, 52], or to interfaces where both oxygen and reduced inorganics would be available [17].
Outside of the Cyanobacteria, very few REOs identified in our study are known to function as obligate autotrophs dependent upon chemical/light energy and carbon for growth. In fact, a growing body of literature suggests that individual representatives of major bacterial lineages with rubisco may live mixotrophically, at least transiently depending on organic matter when it is available. Specifically, studies have reported the presence of various genes for organic compound transport in genomes also encoding rubisco and sulfur oxidation machinery [5, 13, 53, 54]. Recently, these genomic claims were bolstered by experimental evidence demonstrating tandem uptake of inorganic and organic substrates in the same population of cells [14]. However, to date, the potential for mixotrophy among deep sea REOs has not been systematically examined. While not a facet of our genomic analyses here, we did observe a wide range of rubisco expression values (Fig. 5), and many cases where organisms were active but no rubisco expression was detected. From these results, we infer that some form I and II-bearing species were sampled while growing heterotrophically, as obligate chemoautotrophs likely continually express the CBB cycle. This metabolic flexibility may permit REOs to persist through periods of low organic matter availability in their vast and dilute habitat.
Conclusions
Overall, our results yield important insights into the diversity, distribution, abundance, and metabolism of widespread lineages with the genetic potential to perform inorganic carbon fixation via rubisco in the ocean interior. The knowledge gained could inform efforts to model the microbial contribution to organic carbon supply/demand in this habitat, which is currently not well resolved [6, 9, 55]. One important contribution of this study is the notion that abundant global REOs fall into relatively few taxonomic (e.g., SAR324 and PS1) and metabolic types (e.g., sulfur oxidizers), facilitating their future inclusion in trait-based biogeochemical models. Further, the finding that some individual REOs are present in the majority of marine communities identifies them as attractive targets for cultivation efforts, which could ultimately enable measurement of process rates in addition to further examination of the genetic features identified in this study.
Of course, in rationalizing the roles of carbon fixers in deep-sea biogeochemistry, it is important to keep in mind other pathways beyond the CBB cycle. This is particularly true in the vast expanses of the ocean that experience transient or permanent anoxia, and thereby may select for alternative pathways with higher oxygen sensitivity and reduced energetic demands [56, 57]. While preliminary work from a subset of depths suggests that the CBB cycle is dominant in the dark ocean water column [5], additional study is needed to determine the relative importance of the CBB to other pathways on a larger spatial scale, and relate emergent patterns to a broader slate of physical and chemical parameters.
Methods
Metagenomic analysis of microbial communities from the OC1703A transect
Samples for metagenomic analysis were collected in the Pacific Ocean off the California coast, USA (north of Monterey Bay) using the R/V Oceanus in spring 2017 and processed as described in Arandia-Gorostidi et al. (2023) [4]. Briefly, Niskin bottles were used to collect seawater at 50, 150, 500, 1000, 2000, 3000, and 4000 m water depth at 6 sites along a 300 km transect. Seawater (5–25 L depending on depth) was filtered onto 0.2 µm Sterivex units (Millipore, Germany) and flash frozen in liquid N2 before storage at −80 °C. DNA extraction was performed using the AllPrep DNA/RNA kit (Qiagen, Valencia, CA, USA). Paired-end metagenomic sequencing was performed on a NovaSeq 6000 S4 platform (150 bp reads) at the UC Davis DNA Sequencing Facility. In this study, we report metagenome information for 3 sites (OC2, OC4, and OC5; 15 samples total across depths) for the first time, in addition to the 2 sites (OC3 and OC6; 13 samples) reported previously [22].
To generate a set of MAGs representing the microbial community across sites, sequencing reads were quality-filtered using bbduk (sourceforge.net/projects/bbmap), assembled using MEGAHIT (v1.2.9) [58], and binned using the metabat tool suite (minimum contig length 1500 bp) [59] as described in Arandia-Gorostidi et al. (2024) [22]. Cross-mapping was performed for a subset of sample pairs determined by visualizing overall metagenome similarity with sourmash [60]. Briefly, sourmash signatures were generated for each metagenomic assembly using default parameters, and compared using a k-mer size of 31. Generally, sample pairs attaining ~ 10% similarity or greater were cross-mapped using bowtie2 [61], and results were passed to metabat2 [59] for differential coverage binning. Bins that met baseline quality thresholds (≥ 50% completeness and ≤ 25% contamination, as measured with CheckM [62]) and were selected as species group representatives by dRep [63] (see below) were further refined using anvi-refine from Anvi’o [64], in which member contigs were visualized and manually removed if they displayed abnormal GC content or coverage profiles across the set of metagenomes.
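The pair-selection step described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes that pairwise sourmash similarities have already been exported into a nested dictionary, and the function name, sample labels, and similarity values are hypothetical. The 0.10 cutoff mirrors the approximate threshold stated in the text.

```python
from itertools import combinations


def select_crossmap_pairs(similarity, threshold=0.10):
    """Return sample pairs whose pairwise similarity meets the threshold.

    `similarity` is a hypothetical nested dict: similarity[a][b] -> float in [0, 1].
    """
    samples = sorted(similarity)
    pairs = []
    for a, b in combinations(samples, 2):
        if similarity[a][b] >= threshold:
            pairs.append((a, b))
    return pairs


# Illustrative values only (not taken from the study).
example_similarity = {
    "OC2_50m": {"OC2_50m": 1.0, "OC4_50m": 0.42, "OC4_4000m": 0.03},
    "OC4_50m": {"OC2_50m": 0.42, "OC4_50m": 1.0, "OC4_4000m": 0.05},
    "OC4_4000m": {"OC2_50m": 0.03, "OC4_50m": 0.05, "OC4_4000m": 1.0},
}
crossmap_pairs = select_crossmap_pairs(example_similarity)
```

Each selected pair would then be passed to the read mapper in both directions to obtain the differential coverage profiles used for binning.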
Construction of a global dataset of REOs from the marine water column
We combined the newly resolved OC1703A MAGs with genomes drawn from existing collections of metagenome-assembled genomes (MAGs), single-amplified genomes (SAGs), and other genomes from the global ocean: specifically, the Ocean Microbiomics Database and its component datasets [5, 23, 65–69], the Ocean DNA catalog [25], and several studies targeting low-oxygen ocean regions [27, 28, 70]. The combined set was subjected to a preliminary screen for rubisco using graftM [71] and a custom package built from rubisco superfamily sequences reported in [18] clustered at 75% identity using usearch [72]. This custom package is available at the project’s GitHub repository (https://github.com/alexanderjaffe/rubisco-genomics/). To control for false positives among protein hits from graftM, sequences were secondarily annotated using kofamscan [73], and only those sequences with top hits to KEGG Orthology numbers K01601, K08965, or K25035 (describing rubiscos and rubisco-like proteins) at e ≤ 1E−5 were retained. Genomes containing putative rubiscos were then assigned a genome-level taxonomy using GTDB-Tk [74], filtered at ≥ 50% completeness and ≤ 25% contamination as computed by CheckM [62], and clustered at 95% average nucleotide identity using dRep [63]. Inventories of rRNA and tRNA genes were assessed using barrnap (https://github.com/tseemann/barrnap) and tRNAscan-SE [75], respectively. In the latter case, only tRNA genes corresponding to the 20 canonical amino acids were tallied, and multi-copy tRNA genes were disregarded.
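The secondary KO-based filter can be illustrated with a short sketch. It assumes that the kofamscan output has already been parsed into (sequence, KO, e-value) tuples, and it interprets "top hit" as the hit with the lowest e-value for each sequence; both are assumptions made for illustration, and the function name and example hits are hypothetical.

```python
# KO numbers named in the text (rubiscos and rubisco-like proteins).
RUBISCO_KOS = {"K01601", "K08965", "K25035"}
MAX_EVALUE = 1e-5


def retain_rubisco_hits(hits):
    """Keep sequences whose best (lowest e-value) KO hit is a rubisco/RLP KO.

    `hits` is a hypothetical iterable of (sequence_id, ko, evalue) tuples.
    """
    best = {}
    for seq_id, ko, evalue in hits:
        if seq_id not in best or evalue < best[seq_id][1]:
            best[seq_id] = (ko, evalue)
    return {
        seq_id
        for seq_id, (ko, evalue) in best.items()
        if ko in RUBISCO_KOS and evalue <= MAX_EVALUE
    }


# Illustrative hits only (not taken from the study).
example_hits = [
    ("p1", "K01601", 1e-80),  # best hit is rubisco: retained
    ("p1", "K00001", 1e-3),
    ("p2", "K00927", 1e-50),  # best hit is not rubisco: discarded
    ("p2", "K01601", 1e-10),
    ("p3", "K08965", 1e-3),   # rubisco-like KO, but e-value too weak: discarded
]
retained = retain_rubisco_hits(example_hits)
```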
Given that differential coverage information is not readily accessible for MAGs from public repositories, we did not generally curate these MAGs for mis-binned contigs. However, to specifically account for the possibility of mis-binned rubisco genes, we analyzed the taxonomic profile of rubisco-encoding contigs in all genomes (using the method outlined in [22]) and compared them to overall genome taxonomy assigned by GTDB-Tk. Briefly, contigs were assigned a taxonomic affiliation based on a consensus of individual protein taxonomies, and manually curated when short or ambiguous. Contigs were flagged as misbinned if consensus taxonomy deviated from genome taxonomy at the phylum level; most frequently, misbinned contigs encoded eukaryotic rubiscos that could be easily identified by this approach. A refined set of REO genomes was created by removing these contigs and any rubisco-encoding scaffold ≤ 1500 bp in length. Clustering was performed again as above to produce a set of species groups where all members attained ≥ 50% completeness and ≤ 10% contamination.
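The phylum-level consistency check can be sketched as below. This is a simplified illustration, assuming a strict-majority vote over per-protein phylum labels; the actual consensus rule follows the method in [22] and may differ. Function names and the 0.5 majority fraction are hypothetical, and contigs with no clear consensus are left unflagged here on the assumption that they would go to manual curation.

```python
from collections import Counter


def consensus_phylum(protein_phyla, min_fraction=0.5):
    """Majority-vote phylum for a contig; None if no label exceeds min_fraction."""
    if not protein_phyla:
        return None
    phylum, count = Counter(protein_phyla).most_common(1)[0]
    return phylum if count / len(protein_phyla) > min_fraction else None


def is_misbinned(protein_phyla, genome_phylum):
    """Flag a contig whose consensus phylum disagrees with the genome-level phylum."""
    consensus = consensus_phylum(protein_phyla)
    return consensus is not None and consensus != genome_phylum


# Illustrative labels only: a contig dominated by eukaryotic proteins
# binned into a bacterial genome would be flagged.
flag_eukaryotic = is_misbinned(["Chlorophyta"] * 4 + ["Proteobacteria"], "Proteobacteria")
flag_consistent = is_misbinned(["Proteobacteria"] * 3, "Proteobacteria")
```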
Identification and curation of encoded rubisco proteins
Each genome in the quality-filtered set was subjected to gene/protein prediction by Prodigal [76] (single mode). Resulting proteins, with the exception of those from contigs encoding rubisco that were either ≤ 1500 bp in length or likely misbinned, were combined and searched against a series of custom HMMs describing the major forms within the rubisco superfamily using HMMER [77]. These HMMs (available at https://github.com/alexanderjaffe/rubisco-genomics/) were built from multiple sequence alignments constructed from sequences reported in Prywes et al. [18] clustered at 90% identity using usearch [72]. HMM results were parsed using the SearchIO module of Biopython, and results were filtered to those in which query sequences covered at least 50% of the model. If a query hit multiple models, the model with the highest HMM score was chosen.
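The two filtering rules above (model coverage of at least 50%, then best-scoring model per query) can be sketched in plain Python. The sketch assumes hits have already been parsed into tuples and treats each hit as a single aligned region of the model, which is a simplification; the function name, model names, model lengths, and scores are hypothetical.

```python
def best_form_per_query(hits, model_lengths, min_model_cov=0.5):
    """Assign each query to its highest-scoring model among sufficiently covered hits.

    `hits`: tuples of (query, model, bit_score, hmm_start, hmm_end), with
    1-based inclusive model coordinates. `model_lengths`: model -> length.
    """
    best = {}
    for query, model, score, hmm_start, hmm_end in hits:
        covered = (hmm_end - hmm_start + 1) / model_lengths[model]
        if covered < min_model_cov:
            continue  # hit spans less than half of the model
        if query not in best or score > best[query][1]:
            best[query] = (model, score)
    return {query: model for query, (model, _) in best.items()}


# Illustrative values only (not taken from the study).
example_lengths = {"form_I": 470, "form_II": 460}
example_hmm_hits = [
    ("g1", "form_I", 600.0, 10, 460),
    ("g1", "form_II", 350.0, 5, 450),
    ("g2", "form_II", 120.0, 200, 300),  # fragmentary: covers ~22% of the model
]
assignments = best_form_per_query(example_hmm_hits, example_lengths)
```

Fragmentary sequences such as the hypothetical `g2` are dropped at this stage, which is why the consensus rubisco inventory of each species group is determined from all member genomes in the next step.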
To confirm these preliminary form-level annotations, sequences were also classified via a phylogenetic approach using the graftM package described above. Where HMM-based and phylogenetic classifications differed (~ 10% of 609 clusters), classifications were manually curated through a combination of BLAST and phylogenetic tree visualization (Additional file 1: Table S5). Curation was aided by forming sequence clusters at 95% identity using vsearch (--id 0.95 --maxrejects 0 --maxaccepts 0) [78]. Finally, each species group was assigned a set of representative rubisco sequences by determining consensus rubisco inventories of its member genomes. This approach allowed us to account for cases in which individual genomes encoded fragmentary rubiscos that were lost due to filtering of HMM results. Clusters with no rubisco proteins were removed, yielding a final set of 1070 clusters for downstream analysis.
Analysis of genome-level metabolism
We further analyzed the metabolism of the genome clusters from above, for each selecting a representative genome of the highest quality using dRep results. Genome clusters encoding only a Form IV rubisco/RLP were excluded from this analysis. For those included, proteins predicted from representative genomes were re-annotated using kofamscan [73]. HMM results were filtered using the provided score thresholds for each model; in some cases, thresholds were relaxed using a previously described method that aims to include divergent or fragmented sequences by visualizing HMM scores in a phylogenetic context [79]. Results for each representative genome were then queried for a set of metabolic genes known to function alongside rubisco in various CO2 fixation/incorporation pathways or in pathways that oxidize sulfur and nitrogen compounds for energy gain [80]. Gene names, their associated KEGG accession numbers, and operational thresholds employed here are reported in Additional file 1: Table S6. A metabolic pathway was considered present if at least one marker gene was found; however, in the case of sulfite oxidation, we required the presence of at least two (aprA and sat).
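The pathway-calling rule can be expressed compactly, as in the sketch below. Only marker genes named in the text are used (soxB for thiosulfate oxidation; aprA and sat for sulfite oxidation); the full marker list and thresholds are in Additional file 1: Table S6. The dictionary layout and function name are hypothetical.

```python
# Marker genes named in the text; the complete list is in Table S6.
PATHWAY_MARKERS = {
    "thiosulfate_oxidation": {"soxB"},
    "sulfite_oxidation": {"aprA", "sat"},
}
# Pathways for which every listed marker must be present.
REQUIRE_ALL = {"sulfite_oxidation"}


def pathways_present(genome_genes):
    """Return the set of pathways called present for one genome's annotated genes."""
    genes = set(genome_genes)
    present = set()
    for pathway, markers in PATHWAY_MARKERS.items():
        found = markers & genes
        needed = len(markers) if pathway in REQUIRE_ALL else 1
        if len(found) >= needed:
            present.add(pathway)
    return present


# aprA alone is not sufficient to call sulfite oxidation.
partial_call = pathways_present({"soxB", "aprA"})
full_call = pathways_present({"soxB", "aprA", "sat"})
```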
Results were further refined in several cases where bioinformatic identification using HMMs alone was not adequate to discriminate between the gene of interest and closely related homologs. Protein sequences for putative dsr (dissimilatory sulfite reductase) genes were merged and compared with a set of curated references [81] using blastp. Putative dsr genes with at least 50% identity and 70% coverage to known members of the oxidative clade were retained, while any genes most closely matching reductive members were discarded. Similarly, putative sulfur dioxygenase (sdo) protein sequences were aligned with a set of references and examined for the presence of two residues found to distinguish true sdo from related metallo-β-lactamases [82]. Those sequences without either key residue were discarded. Nitrite oxidoreductases were distinguished from other members of the Type II DMSO reductase family by placing them in a phylogenetic tree with a diverse set of references [83]. Putative ammonia monooxygenases were distinguished from other copper membrane monooxygenases [84] in a similar fashion. In both cases, alignment was performed with MAFFT [85], alignment trimming with trimAl (-gt 0.1) [86], tree inference with FastTreeMP [87], and tree visualization with iTOL [88].
Lastly, the genetic capacity for H2 oxidation was examined by comparing all predicted protein sequences from the representative genome set to published NiFe hydrogenases, using the reference set and annotation technique reported by [43]. Sequences were filtered to those likely permitting energy gain via aerobic oxidation of H2 (forms 1d, 1l, and 2a), and these form assignments were confirmed with the above alignment/tree-building approach.
Abundance and distribution of REOs across the OC1703A transect
The presence of REOs was first quantified across a 28-sample transect from the California coast. As described above, these samples were sequenced, quality-controlled, assembled, and binned, yielding a number of medium–high quality MAGs representing the full microbial community at these sites. This pool of MAGs was added to the global REO set, and all genomes were re-clustered at 95% ANI using dRep. Representative genomes for each cluster were chosen using default dRep parameters, unless the cluster included a previously identified REO genome, in which case that genome was selected. If a cluster contained multiple REO genomes, the one with the highest completeness was selected as a representative. To avoid oversampling REOs compared to organisms without rubisco, and thus skewing abundance metrics, only REOs clustering with MAGs assembled from the transect itself were retained.
Next, trimmed metagenomic reads from the 28 samples were mapped to representative genomes using bowtie2 [61] (default parameters). Coverage for each bin was computed using CoverM (https://github.com/wwood/CoverM) with a 95% read identity threshold. A genome was considered present in a given sample if 50% or more of its bases were covered by reads (coverage breadth). Relative coverage of each genome was calculated by dividing its mean coverage by the summed coverages of all genomes detected in that sample. Abundance patterns for genomes encoding any form of rubisco were then visualized as a function of genome taxonomy or gene inventory using Python.
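The detection and normalization steps reduce to a small calculation, sketched below for a single sample. The sketch assumes per-genome mean coverage and breadth values have already been obtained (for example, from CoverM output); the function name, genome labels, and numbers are hypothetical.

```python
def relative_coverage(mean_cov, breadth, min_breadth=0.5):
    """Relative coverage of each detected genome within one sample.

    `mean_cov`: genome -> mean read depth; `breadth`: genome -> fraction of
    bases covered. Genomes below the breadth threshold are treated as absent.
    """
    detected = {
        genome: cov
        for genome, cov in mean_cov.items()
        if breadth.get(genome, 0.0) >= min_breadth
    }
    total = sum(detected.values())
    if total == 0:
        return {}
    return {genome: cov / total for genome, cov in detected.items()}


# Illustrative values only: genome C fails the 50% breadth threshold.
example_cov = {"A": 30.0, "B": 10.0, "C": 5.0}
example_breadth = {"A": 0.9, "B": 0.6, "C": 0.2}
rel = relative_coverage(example_cov, example_breadth)
```

Because the denominator includes only detected genomes, the relative coverages within a sample sum to one, which is why abundances are reported as fractions of the binned community.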
Abundance and distribution of REOs across the global ocean
To examine abundance and distribution patterns of REOs at a larger scale, we amassed a list of nearly 1000 water column metagenomes from previous studies [5, 26, 28, 65, 69, 70, 89–96] as well as corresponding metadata. Raw reads were downloaded from the Sequence Read Archive, and, if multiple sequencing runs were listed for a single sample, their reads were merged. Next, forward and reverse read files were trimmed using bbduk.sh from BBTools (sourceforge.net/projects/bbmap/) (ref=adapters ktrim=r k=23 mink=11 hdist=1 tbo qtrim=r trimq=25 minlen=20). Trimmed reads were then mapped against the non-redundant REO genome set using bowtie2 (default parameters), and the resulting alignment files were stringently filtered using inStrain (--min_read_ani 0.95 --min_mapq 10) [97]. A Snakemake [98] workflow implementing these sequential processing steps is available via GitHub (https://github.com/alexanderjaffe/rubisco-genomics/) and is conceptually visualized in Additional file 2: Fig. S5.
Once processed, inStrain results were read into a Pandas dataframe in Python and further filtered on coverage characteristics to reduce the incidence of false detections. Specifically, a genome was considered present in a sample if it attained ≥ 50% coverage breadth and ≥ 50% of expected coverage breadth as defined by inStrain. Next, RPKM values were calculated for each genome-sample pair using the number of filtered reads mapping specifically to the genome, the genome length, and the total number of trimmed metagenomic reads in the mapped sample. Genomic RPKM values were summed per sample for all organisms with the same rubisco genes, and subsequently displayed as a function of depth (Fig. 3b). In this case, only size fractions corresponding to prokaryotic cells were used (size fractions targeting viral communities were omitted). For ease of visualization, values were sorted into depth bins of 100 m.
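The RPKM calculation, per-inventory summation, and depth binning can be sketched as follows. The standard RPKM definition (reads per kilobase of genome per million reads in the sample) is assumed; function names, record layout, and example values are hypothetical and not taken from the study.

```python
from collections import defaultdict


def rpkm(mapped_reads, genome_length_bp, total_reads):
    """Reads per kilobase of genome per million trimmed reads in the sample."""
    return mapped_reads * 1e9 / (genome_length_bp * total_reads)


def depth_bin(depth_m, width_m=100):
    """Lower edge of the depth bin containing depth_m (e.g. 750 m -> 700)."""
    return int(depth_m // width_m) * width_m


def summed_rpkm(records):
    """Sum genome RPKM values per (sample, rubisco inventory).

    `records`: iterable of (sample, inventory, rpkm_value) for detected genomes.
    """
    totals = defaultdict(float)
    for sample, inventory, value in records:
        totals[(sample, inventory)] += value
    return dict(totals)


# Illustrative values only.
example_rpkm = rpkm(mapped_reads=2000, genome_length_bp=2_000_000, total_reads=10_000_000)
example_totals = summed_rpkm([
    ("S1", "form_I", 0.10),
    ("S1", "form_I", 0.25),
    ("S1", "form_II", 0.05),
])
```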
Geographic coordinates for all metagenomic samples analyzed were plotted with GeoPandas in Python. To control for unequal spatial sampling, site coordinates were clustered into “locales” using density-based spatial clustering of applications with noise (DBSCAN) according to a published protocol [99], with an epsilon value of 10 km. To determine the frequency of detection of individual REO species across sites, we determined the number of unique locales in which that species was detected at the breadth thresholds described above. A species was considered present if detected in any size fraction or at any depth ≥ 200 m at a given site.
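The locale clustering can be illustrated without external dependencies. The sketch below assumes a minimum cluster size of one, which the text does not state; under that assumption DBSCAN reduces to grouping sites into connected components of the graph linking any two sites within epsilon of each other. Great-circle (haversine) distance is also an assumption, as are the function names and example coordinates. An implementation using a library DBSCAN would normally be preferred for large datasets.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def cluster_locales(coords, eps_km=10.0):
    """Label each site with a locale index (connected components within eps_km)."""
    labels = [-1] * len(coords)
    current = 0
    for i in range(len(coords)):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            j = stack.pop()
            for k in range(len(coords)):
                if labels[k] == -1 and haversine_km(coords[j], coords[k]) <= eps_km:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return labels


# Illustrative coordinates only: two nearby sites and one distant site.
example_sites = [(36.80, -122.00), (36.82, -122.03), (36.00, -125.00)]
locale_labels = cluster_locales(example_sites)
```

Detection frequency for a species is then the number of distinct locale labels among the samples in which it passes the breadth thresholds, divided by the total number of locales.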
Genome-resolved transcriptomics of REOs
Gene expression was assessed in ~ 200 publicly available metatranscriptomes from Tara Oceans, as well as several studies of anoxic or oxygen-deficient ocean regions [68, 100–102] (Additional file 1: Table S12). Raw reads were first downloaded, trimmed, and mapped against the non-redundant genome set as described above. For each metatranscriptome successfully mapped, read coverage of any genomes detected was quantified using pysam (https://github.com/pysam-developers/pysam). Specifically, we counted the number of reads mapping stringently (min_mapq 20, max_mismatch 5) to each gene and used this to compute gene-wise mean coverage and breadth of coverage. To reduce computational complexity, only those genomes recruiting 500 reads or more from the transcriptome were subjected to gene-by-gene analysis. A Snakemake workflow detailing this is available via GitHub (https://github.com/alexanderjaffe/rubisco-genomics/) and drawn out in Additional file 2: Fig. S6.
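The gene-wise coverage metrics can be illustrated with a dependency-free sketch. It assumes reads have already been filtered for mapping quality and mismatches and reduced to alignment intervals on the contig; it does not reproduce the pysam-based workflow itself. The function name, coordinate convention (0-based, half-open), and example values are hypothetical.

```python
def gene_coverage(gene_start, gene_end, reads):
    """Mean depth and breadth of coverage for one gene.

    Coordinates are 0-based and half-open. `reads` is an iterable of
    (read_start, read_end) alignment intervals on the same contig.
    """
    length = gene_end - gene_start
    depth = [0] * length
    for read_start, read_end in reads:
        # Count only the portion of the read that overlaps the gene.
        for pos in range(max(read_start, gene_start), min(read_end, gene_end)):
            depth[pos - gene_start] += 1
    mean_cov = sum(depth) / length
    breadth = sum(1 for d in depth if d > 0) / length
    return mean_cov, breadth


# Illustrative values only: a 100 bp gene overlapped by two reads.
example_mean, example_breadth_value = gene_coverage(100, 200, [(90, 140), (120, 170)])
```

In this example the gene would pass a 50% breadth threshold of the kind used to call above-threshold rubisco expression in Fig. 5a.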
After processing, results were further filtered for potential non-mRNAs by removing any genes with anomalously high read counts compared to other genes in the genome. RPKM values were calculated using gene lengths and total read counts from trimmed metatranscriptomic samples, as above. Each gene was then assigned a percentile expression value using the stats package in Python. Percentile gene expression/RPKM values were then visualized on a bulk and per-genome basis according to their functional annotation and/or taxonomic affiliation.
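Percentile expression can be sketched as below. The exact percentile convention is not stated in the text; this sketch assumes the "weak" convention, in which a gene's percentile is the percentage of expressed genes in the genome with RPKM less than or equal to its own. The function names and example values are hypothetical, and the 50-gene activity threshold is taken from the Results.

```python
from bisect import bisect_right


def is_active(rpkm_by_gene, min_expressed_genes=50):
    """A genome is treated as transcriptionally active if enough genes are expressed."""
    return sum(1 for value in rpkm_by_gene.values() if value > 0) >= min_expressed_genes


def percentile_expression(rpkm_by_gene):
    """Percentile rank (0-100) of each expressed gene among all expressed genes."""
    expressed = {gene: value for gene, value in rpkm_by_gene.items() if value > 0}
    values = sorted(expressed.values())
    n = len(values)
    return {gene: 100.0 * bisect_right(values, value) / n for gene, value in expressed.items()}


# Illustrative values only (far fewer genes than a real genome).
example_expression = {"rbcL": 80.0, "geneA": 5.0, "geneB": 20.0, "geneC": 120.0, "geneD": 0.0}
percentiles = percentile_expression(example_expression)
```

Under this convention, a rubisco gene in the top 10% of a genome's expressed genes has a percentile of 90 or above, matching the threshold highlighted in Fig. 5a.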
Supplementary Information
Additional file 1: Supplementary tables. Supporting data and characteristics of genomes/samples used in this study.
Additional file 2: Supplementary figures. Supporting figures for the analyses presented in this study.
Acknowledgements
We thank Zhichao Zhou, Xin Sun, Chris Greening, Rachel Lappan, Alexa Nicolas, Noam Prywes, and Graciela Chavez for helpful discussions and data access. We also thank Alex Crits-Christoph for sharing code that was adapted for our metatranscriptomic analyses, and Alma Parada, Julian Fortney, and Nestor Arandia-Gorostidi for collecting and/or processing samples used for the metagenomic analysis. We thank the captain, crew, and science party of the R/V Oceanus OC1703A expedition. Finally, we acknowledge the Stanford Research Computing Center for informatic support as well as the Stanford Geomicrobiology Shared Laboratories Core Facility (RRID:SCR_025000).
Authors’ contributions
A.L.J. and A.E.D. designed the project. A.E.D. collected samples. A.L.J. and R.S.R.S. processed the metagenomic data and performed downstream bioinformatic analysis. A.L.J. and A.E.D. interpreted data and wrote the manuscript. All authors made comments on the manuscript.
Funding
Oceanographic sampling and metagenomic sequencing were funded by the National Science Foundation (NSF; OCE-1634297 and OCE-2143035 to A.E.D.). A.L.J. was funded by the Stanford Science Fellows program and the NSF Postdoctoral Fellowship in Ocean Sciences. R.S.R.S. was funded by the Stanford Graduate Fellowship. A.E.D. was supported by NSF award OCE-2143035.
Data availability
Newly-resolved representative MAGs used in this study as well as raw metagenomic reads from the OC1703A transect are available through NCBI at PRJNA1054206 [103]. Specific source information for all genomes is listed in Additional file 1: Table S3, including those derived from previous studies [104–114]. Accession numbers for all public sequencing reads used for abundance and gene expression analyses are listed in Additional file 1: Table S10/S12 [112, 115–117]. Custom code describing the above analyses is available via Github/Zenodo under an MIT License (10.5281/zenodo.15376808) [118, 119].
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Change history
6/30/2025
The additional files were erroneously transposed. This has been corrected.
Contributor Information
Alexander L. Jaffe, Email: ajaffe@stanford.edu
Anne E. Dekas, Email: dekas@stanford.edu
References
- 1.Raven JA. Contributions of anoxygenic and oxygenic phototrophy and chemolithotrophy to carbon and oxygen fluxes in aquatic environments. Aquat Microb Ecol. 2009;56:177–92. [Google Scholar]
- 2.Bar-On YM, Milo R. The global mass and average rate of rubisco. Proc Natl Acad Sci U S A. 2019;116:4738–43. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Geider RJ, Delucia EH, Falkowski PG, Finzi AC, Grime JP, Grace J, et al. Primary productivity of planet earth: biological determinants and physical constraints in terrestrial and aquatic habitats. Glob Chang Biol. 2001;7:849–82. [Google Scholar]
- 4.Arandia-Gorostidi N, Parada AE, Dekas AE. Single-cell view of deep-sea microbial activity and intracommunity heterogeneity. ISME J. 2023;17:59–69. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Acinas SG, Sánchez P, Salazar G, Cornejo-Castillo FM, Sebastián M, Logares R, et al. Deep ocean metagenomes provide insight into the metabolic architecture of bathypelagic microbial communities. Commun Biol. 2021;4:604. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Herndl GJ, Reinthaler T. Microbial control of the dark end of the biological pump. Nat Geosci. 2013;6:718–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Baltar F, Arístegui J, Sintes E, Gasol JM, Reinthaler T, Herndl GJ. Significance of non‐sinking particulate organic carbon and dark CO2 fixation to heterotrophic carbon demand in the mesopelagic northeast Atlantic. Geophys Res Lett. 2010;37. Available from: 10.1029/2010GL043105. [DOI]
- 8.Reinthaler T, van Aken HM, Herndl GJ. Major contribution of autotrophy to microbial carbon cycling in the deep North Atlantic’s interior. Deep Sea Res Part 2 Top Stud O ceanogr. 2010;57:1572–80. [Google Scholar]
- 9.Arístegui J, Gasol JM, Duarte CM, Herndl GJ. Microbial oceanography of the dark ocean’s pelagic realm. Limnol Oceanogr. 2009;54:1501–29.
- 10.Francis CA, Roberts KJ, Beman JM, Santoro AE, Oakley BB. Ubiquity and diversity of ammonia-oxidizing archaea in water columns and sediments of the ocean. Proc Natl Acad Sci U S A. 2005;102:14683–8.
- 11.Ingalls AE, Shah SR, Hansman RL, Aluwihare LI, Santos GM, Druffel ERM, et al. Quantifying archaeal community autotrophy in the mesopelagic ocean using natural radiocarbon. Proc Natl Acad Sci U S A. 2006;103:6442–7.
- 12.Könneke M, Schubert DM, Brown PC, Hügler M, Standfest S, Schwander T, et al. Ammonia-oxidizing archaea use the most energy-efficient aerobic pathway for CO2 fixation. Proc Natl Acad Sci U S A. 2014;111:8239–44.
- 13.Swan BK, Martinez-Garcia M, Preston CM, Sczyrba A, Woyke T, Lamy D, et al. Potential for chemolithoautotrophy among ubiquitous bacteria lineages in the dark ocean. Science. 2011;333:1296–300.
- 14.Baltar F, Martínez-Pérez C, Amano C, Vial M, Robaina-Estévez S, Reinthaler T, et al. A ubiquitous gammaproteobacterial clade dominates expression of sulfur oxidation genes across the mesopelagic ocean. Nat Microbiol. 2023;8:1137–48.
- 15.Baltar F, Herndl GJ. Ideas and perspectives: Is dark carbon fixation relevant for oceanic primary production estimates? Biogeosciences. 2019;16:3793–9.
- 16.Ricci F, Greening C. Chemosynthesis: a neglected foundation of marine ecology and biogeochemistry. Trends Microbiol. 2024;0. Available from: http://www.cell.com/article/S0966842X23003323/abstract. Cited 30 Jan 2024.
- 17.Badger MR, Bek EJ. Multiple Rubisco forms in proteobacteria: their functional significance in relation to CO2 acquisition by the CBB cycle. J Exp Bot. 2008;59:1525–41.
- 18.Prywes N, Phillips NR, Tuck OT, Valentin-Alvarado LE, Savage DF. Rubisco Function, Evolution, and Engineering. Annu Rev Biochem. 2023;92:385–410.
- 19.Flamholz AI, Prywes N, Moran U, Davidi D, Bar-On YM, Oltrogge LM, et al. Revisiting Trade-offs between Rubisco Kinetic Parameters. Biochemistry. 2019;58:3365–76.
- 20.Sato T, Atomi H, Imanaka T. Archaeal type III RuBisCOs function in a pathway for AMP metabolism. Science. 2007;315:1003–6.
- 21.Erb TJ, Zarzycki J. A short history of RubisCO: the rise and fall (?) of Nature’s predominant CO2 fixing enzyme. Curr Opin Biotechnol. 2018;49:100–7.
- 22.Arandia-Gorostidi N, Jaffe AL, Parada AE, Kapili BJ, Casciotti KL, Salcedo RSR, et al. Urea assimilation and oxidation support activity of phylogenetically diverse microbial communities of the dark ocean. ISME J. 2024; Available from: 10.1093/ismejo/wrae230.
- 23.Paoli L, Ruscheweyh H-J, Forneris CC, Hubrich F, Kautsar S, Bhushan A, et al. Biosynthetic potential of the global ocean microbiome. Nature. 2022;607:111–8.
- 24.Bowers RM, The Genome Standards Consortium, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol. 2017;35:725–31.
- 25.Nishimura Y, Yoshizawa S. The OceanDNA MAG catalog contains over 50,000 prokaryotic genomes originated from various marine environments. Sci Data. 2022;9:305.
- 26.Sun X, Jayakumar A, Ward BB. Community Composition of Nitrous Oxide Consuming Bacteria in the Oxygen Minimum Zone of the Eastern Tropical South Pacific. Front Microbiol. 2017;8:1183.
- 27.Anstett J, Plominsky AM, DeLong EF, Kiesser A, Jürgens K, Morgan-Lang C, et al. A compendium of bacterial and archaeal single-cell amplified genomes from oxygen deficient marine waters. Sci Data. 2023;10:332.
- 28.Zhang IH, Sun X, Jayakumar A, Fortin SG, Ward BB, Babbin AR. Partitioning of the denitrification pathway and other nitrite metabolisms within global oxygen deficient zones. ISME Commun. 2023;3:76.
- 29.Jaffe AL, Castelle CJ, Dupont CL, Banfield JF. Lateral Gene Transfer Shapes the Distribution of RuBisCO among Candidate Phyla Radiation Bacteria and DPANN Archaea. Mol Biol Evol. 2019;36:435–46.
- 30.Tabita FR, Satagopan S, Hanson TE, Kreel NE, Scott SS. Distinct form I, II, III, and IV Rubisco proteins from the three kingdoms of life provide clues about Rubisco evolution and structure/function relationships. J Exp Bot. 2008;59:1515–24.
- 31.Morris RM, Spietz RL. The Physiology and Biogeochemistry of SUP05. Ann Rev Mar Sci. 2022;14:261–75.
- 32.Zhou Z, Tran PQ, Adams AM, Kieft K, Breier JA, Fortunato CS, et al. Sulfur cycling connects microbiomes and biogeochemistry in deep-sea hydrothermal plumes. ISME J. 2023;17:1194–207.
- 33.Reji L, Francis CA. Metagenome-assembled genomes reveal unique metabolic adaptations of a basal marine Thaumarchaeota lineage. ISME J. 2020;14:2105–15.
- 34.Aylward FO, Santoro AE. Heterotrophic Thaumarchaea with Small Genomes Are Widespread in the Dark Ocean. mSystems. 2020;5. Available from: 10.1128/mSystems.00415-20.
- 35.Wrighton KC, Castelle CJ, Varaljay VA, Satagopan S, Brown CT, Wilkins MJ, et al. RubisCO of a nucleoside pathway known from Archaea is found in diverse uncultivated phyla in bacteria. ISME J. 2016;10:2702–14.
- 36.Kono T, Mehrotra S, Endo C, Kizu N, Matusda M, Kimura H, et al. A RuBisCO-mediated carbon metabolic pathway in methanogenic archaea. Nat Commun. 2017;8:14007.
- 37.Spreitzer RJ. Role of the small subunit in ribulose-1,5-bisphosphate carboxylase/oxygenase. Arch Biochem Biophys. 2003;414:141–9.
- 38.Banda DM, Pereira JH, Liu AK, Orr DJ, Hammel M, He C, et al. Novel bacterial clade reveals origin of form I Rubisco. Nat Plants. 2020;6:1158–66.
- 39.Say RF, Fuchs G. Fructose 1,6-bisphosphate aldolase/phosphatase may be an ancestral gluconeogenic enzyme. Nature. 2010;464:1077–81.
- 40.Srivastava A, De Corte D, Garcia JAL, Swan BK, Stepanauskas R, Herndl GJ, et al. Interplay between autotrophic and heterotrophic prokaryotic metabolism in the bathypelagic realm revealed by metatranscriptomic analyses. Microbiome. 2023;11:239.
- 41.Callbeck CM, Canfield DE, Kuypers MMM, Yilmaz P, Lavik G, Thamdrup B, et al. Sulfur cycling in oceanic oxygen minimum zones. Limnol Oceanogr. 2021;66:2360–92.
- 42.Campbell BJ, Engel AS, Porter ML, Takai K. The versatile epsilon-proteobacteria: key players in sulphidic habitats. Nat Rev Microbiol. 2006;4:458–68.
- 43.Lappan R, Shelley G, Islam ZF, Leung PM, Lockwood S, Nauer PA, et al. Molecular hydrogen in seawater supports growth of diverse marine bacteria. Nat Microbiol. 2023;8:581–95.
- 44.Reji L, Tolar BB, Smith JM, Chavez FP, Francis CA. Differential co-occurrence relationships shaping ecotype diversification within Thaumarchaeota populations in the coastal ocean water column. ISME J. 2019;13:1144–58.
- 45.Boeuf D, Edwards BR, Eppley JM, Hu SK, Poff KE, Romano AE, et al. Biological composition and microbial dynamics of sinking particulate organic matter at abyssal depths in the oligotrophic open ocean. Proc Natl Acad Sci U S A. 2019;116:11824–32.
- 46.Klawonn I, Bonaglia S, Brüchert V, Ploug H. Aerobic and anaerobic nitrogen transformation processes in N2-fixing cyanobacterial aggregates. ISME J. 2015;9:1456–66.
- 47.Bianchi D, Weber TS, Kiko R, Deutsch C. Global niche of marine anaerobic metabolisms expanded by particle microenvironments. Nat Geosci. 2018;11:263–8.
- 48.Boeuf D, Eppley JM, Mende DR, Malmstrom RR, Woyke T, DeLong EF. Metapangenomics reveals depth-dependent shifts in metabolic potential for the ubiquitous marine bacterial SAR324 lineage. Microbiome. 2021;9:172.
- 49.Anantharaman K, Breier JA, Sheik CS, Dick GJ. Evidence for hydrogen oxidation and metabolic plasticity in widespread deep-sea sulfur-oxidizing bacteria. Proc Natl Acad Sci U S A. 2013;110:330–5.
- 50.Pachiadaki MG, Sintes E, Bergauer K, Brown JM, Record NR, Swan BK, et al. Major role of nitrite-oxidizing bacteria in dark ocean carbon fixation. Science. 2017;358:1046–51.
- 51.De Corte D, Muck S, Tiroch J, Mena C, Herndl GJ, Sintes E. Microbes mediating the sulfur cycle in the Atlantic Ocean and their link to chemolithoautotrophy. Environ Microbiol. 2021;23:7152–67.
- 52.Raven MR, Keil RG, Webb SM. Microbial sulfate reduction and organic sulfur formation in sinking marine particles. Science. 2021;371:178–81.
- 53.Murillo AA, Ramírez-Flandes S, DeLong EF, Ulloa O. Enhanced metabolic versatility of planktonic sulfur-oxidizing γ-proteobacteria in an oxygen-deficient coastal ecosystem. Front Mar Sci. 2014;1. Available from: https://www.frontiersin.org/articles/10.3389/fmars.2014.00018.
- 54.Spietz RL, Lundeen RA, Zhao X, Nicastro D, Ingalls AE, Morris RM. Heterotrophic carbon metabolism and energy acquisition in Candidatus Thioglobus singularis strain PS1, a member of the SUP05 clade of marine Gammaproteobacteria. Environ Microbiol. 2019;21:2391–401.
- 55.Bayer B, Kitzinger K, Paul NL, Albers JB, Saito MA, Wagner M, et al. Contribution of ammonia oxidizers to inorganic carbon fixation in the dark ocean [Internet]. Microbiology. bioRxiv; 2024. Available from: https://www.biorxiv.org/content/10.1101/2024.11.16.623942v1.abstract.
- 56.Hügler M, Sievert SM. Beyond the Calvin cycle: autotrophic carbon fixation in the ocean. Ann Rev Mar Sci. 2011;3:261–89.
- 57.Ruiz-Fernández P, Ramírez-Flandes S, Rodríguez-León E, Ulloa O. Autotrophic carbon fixation pathways along the redox gradient in oxygen-depleted oceanic waters. Environ Microbiol Rep. 2020;12:334–41.
- 58.Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31:1674–6.
- 59.Kang DD, Li F, Kirton E, Thomas A, Egan R, An H, et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ. 2019;7:e7359.
- 60.Brown CT, Irber L. sourmash: a library for MinHash sketching of DNA. J Open Source Softw. 2016;1:27.
- 61.Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9.
- 62.Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25:1043–55.
- 63.Olm MR, Brown CT, Brooks B, Banfield JF. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 2017;11:2864–8.
- 64.Eren AM, Esen ÖC, Quince C, Vineis JH, Morrison HG, Sogin ML, et al. Anvi’o: an advanced analysis and visualization platform for ’omics data. PeerJ. 2015;3:e1319.
- 65.Biller SJ, Berube PM, Dooley K, Williams M, Satinsky BM, Hackl T, et al. Marine microbial metagenomes sampled across space and time. Sci Data. 2018;5:180176.
- 66.Pachiadaki MG, Brown JM, Brown J, Bezuidt O, Berube PM, Biller SJ, et al. Charting the complexity of the marine microbiome through single-cell genomics. Cell. 2019;179:1623–35.e11.
- 67.Klemetsen T, Raknes IA, Fu J, Agafonov A, Balasundaram SV, Tartari G, et al. The MAR databases: development and implementation of databases specific for marine metagenomics. Nucleic Acids Res. 2018;46:D692–9.
- 68.Salazar G, Paoli L, Alberti A, Huerta-Cepas J, Ruscheweyh H-J, Cuenca M, et al. Gene Expression Changes and Community Turnover Differentially Shape the Global Ocean Metatranscriptome. Cell. 2019;179:1068–83.e21.
- 69.Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, et al. Ocean plankton. Structure and function of the global ocean microbiome. Science. 2015;348:1261359.
- 70.Sun X, Ward BB. Novel metagenome-assembled genomes involved in the nitrogen cycle from a Pacific oxygen minimum zone. ISME Commun. 2021;1:26.
- 71.Boyd JA, Woodcroft BJ, Tyson GW. GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes. Nucleic Acids Res. 2018;46:e59.
- 72.Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26:2460–1.
- 73.Aramaki T, Blanc-Mathieu R, Endo H, Ohkubo K, Kanehisa M, Goto S, et al. KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics. 2020;36:2251–2.
- 74.Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2019; Available from: 10.1093/bioinformatics/btz848.
- 75.Chan PP, Lin BY, Mak AJ, Lowe TM. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 2021;49:9077–96.
- 76.Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119.
- 77.Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39:W29–37.
- 78.Rognes T, Flouri T, Nichols B, Quince C, Mahé F. VSEARCH: a versatile open source tool for metagenomics. PeerJ. 2016;4:e2584.
- 79.Jaffe AL, Castelle CJ, Matheus Carnevali PB, Gribaldo S, Banfield JF. The rise of diversity in metabolic platforms across the Candidate Phyla Radiation. BMC Biol. 2020;18:69.
- 80.Zhou Z, Tran PQ, Breister AM, Liu Y, Kieft K, Cowley ES, et al. METABOLIC: high-throughput profiling of microbial genomes for functional traits, metabolism, biogeochemistry, and community-scale functional networks. Microbiome. 2022;10:33.
- 81.Müller AL, Kjeldsen KU, Rattei T, Pester M, Loy A. Phylogenetic and environmental diversity of DsrAB-type dissimilatory (bi)sulfite reductases. ISME J. 2015;9:1152–65.
- 82.Liu H, Xin Y, Xun L. Distribution, diversity, and activities of sulfur dioxygenases in heterotrophic bacteria. Appl Environ Microbiol. 2014;80:1799–806.
- 83.Boddicker AM, Mosier AC. Genomic profiling of four cultivated Candidatus Nitrotoga spp. predicts broad metabolic potential and environmental distribution. ISME J. 2018;12:2864–82.
- 84.Diamond S, Lavy A, Crits-Christoph A, Matheus Carnevali PB, Sharrar A, Williams KH, et al. Soils and sediments host Thermoplasmata archaea encoding novel copper membrane monooxygenases (CuMMOs). ISME J. 2022;16:1348–62.
- 85.Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.
- 86.Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–3.
- 87.Price MN, Dehal PS, Arkin AP. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS ONE. 2010;5:e9490.
- 88.Letunic I, Bork P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 2016;44:W242–5.
- 89.Glass JB, Kretz CB, Ganesh S, Ranjan P, Seston SL, Buck KN, et al. Meta-omic signatures of microbial metal and nitrogen cycling in marine oxygen minimum zones. Front Microbiol. 2015;6:998.
- 90.Tsementzi D, Wu J, Deutsch S, Nath S, Rodriguez-R LM, Burns AS, et al. SAR11 bacteria linked to ocean anoxia and nitrogen loss. Nature. 2016;536:179–83.
- 91.Fuchsman CA, Palevsky HI, Widner B, Duffy M, Carlson MCG, Neibauer JA, et al. Cyanobacteria and cyanophage contributions to carbon and nitrogen cycling in an oligotrophic oxygen-deficient zone. ISME J. 2019;13:2714–26.
- 92.Stewart FJ, Ulloa O, DeLong EF. Microbial metatranscriptomics in a permanent marine oxygen minimum zone. Environ Microbiol. 2012;14:23–40.
- 93.Ganesh S, Parris DJ, DeLong EF, Stewart FJ. Metagenomic analysis of size-fractionated picoplankton in a marine oxygen minimum zone. ISME J. 2014;8:187–211.
- 94.Wang X, Zain Ul Arifeen M, Hou S, Zheng Q. Depth-dependent microbial metagenomes sampled in the northeastern Indian Ocean. Sci Data. 2024;11:88.
- 95.Bertagnolli AD, Konstantinidis KT, Stewart FJ. Non-denitrifier nitrous oxide reductases dominate marine biomes. Environ Microbiol Rep. 2020;12:681–92.
- 96.Sánchez P, Coutinho FH, Sebastián M, Pernice MC, Rodríguez-Martínez R, Salazar G, et al. Marine picoplankton metagenomes and MAGs from eleven vertical profiles obtained by the Malaspina Expedition. Sci Data. 2024;11:154.
- 97.Olm MR, Crits-Christoph A, Bouma-Gregson K, Firek BA, Morowitz MJ, Banfield JF. inStrain profiles population microdiversity from metagenomic data and sensitively detects shared microbial strains. Nat Biotechnol. 2021;39:727–36.
- 98.Mölder F, Jablonski KP, Letcher B, Hall MB, Tomkins-Tinch CH, Sochat V, et al. Sustainable data analysis with Snakemake. F1000Res. 2021;10:33.
- 99.Boeing G. Clustering to reduce spatial data set size. SocArXiv. 2018. Available from: 10.31235/osf.io/nzhdc.
- 100.Ganesh S, Bristow LA, Larsen M, Sarode N, Thamdrup B, Stewart FJ. Size-fraction partitioning of community gene transcription and nitrogen metabolism in a marine oxygen minimum zone. ISME J. 2015;9:2682–96.
- 101.Ganesh S, Bertagnolli AD, Bristow LA, Padilla CC, Blackwood N, Aldunate M, et al. Single cell genomic and transcriptomic evidence for the use of alternative nitrogen substrates by anammox bacteria. ISME J. 2018;12:2706–22.
- 102.Garcia-Robledo E, Padilla CC, Aldunate M, Stewart FJ, Ulloa O, Paulmier A, et al. Cryptic oxygen cycling in anoxic marine zones. Proc Natl Acad Sci U S A. 2017;114:8319–24.
- 103.Jaffe AL, Salcedo RSR, Dekas AE. Datasets. Microbial communities from the epipelagic, mesopelagic, and bathypelagic along a 300 km transect into the Northeast Pacific Ocean. National Center for Biotechnology Information. 2025. https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA1054206.
- 104.Acinas SG, Sanchez P. Deep ocean metagenomes provide insight into the metabolic architecture of bathypelagic microbial communities. Supplementary materials. Datasets. BioStudies. 2021. https://www.ebi.ac.uk/biostudies/studies/S-BSST457?query=%20S-BSST457.
- 105.Paoli L, Ruscheweyh H-J, Forneris CC, Hubrich F, Kautsar S, Bhushan A, et al. PRJEB45951. Datasets. European Nucleotide Archive. 2021. https://www.ebi.ac.uk/ena/browser/view/PRJEB45951.
- 106.Nishimura Y, Yoshizawa S. PRJDB11811. Datasets. European Nucleotide Archive. 2022. https://www.ebi.ac.uk/ena/browser/view/PRJDB11811.
- 107.Anstett J, Plominsky AM, DeLong EF, Kiesser A, Jürgens K, Morgan-Lang C, et al. A compendium of bacterial and archaeal single-cell amplified genomes from oxygen deficient marine waters. Datasets. Figshare. 2023. 10.6084/m9.figshare.c.6137379.v5.
- 108.Zhang IH, Sun X, Jayakumar A, Fortin SG, Ward BB, Babbin AR. Marine oxygen deficient zone metagenomic assembly. Datasets. National Center for Biotechnology Information. 2023. https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA955304.
- 109.Biller SJ, Berube PM, Dooley K, Williams M, Satinsky BM, Hackl T, et al. EMG produced TPA metagenomics assembly of PRJNA385854 data set (Marine metagenomes from the bioGEOTRACES project). Datasets. National Center for Biotechnology Information. 2021. https://www.ncbi.nlm.nih.gov/bioproject/735177.
- 110.Pachiadaki MG, Brown JM, Brown J, Bezuidt O, Berube PM, Biller SJ, et al. PRJEB33281. Datasets. European Nucleotide Archive. 2019. https://www.ebi.ac.uk/ena/browser/view/PRJEB33281.
- 111.Klemetsen T, Raknes IA, Fu J, Agafonov A, Balasundaram SV, Tartari G, et al. Marine Metagenomics Portal (MMP). Datasets. 2018. https://mmp.sfb.uit.no/.
- 112.Salazar G. Supplementary information for samples and data used in Salazar et al. (2019). Datasets. Zenodo. 2019. 10.5281/zenodo.3473199.
- 113.Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, et al. Companion website for: "Structure and function of the global ocean microbiome". Datasets. 2015. https://ocean-microbiome.embl.de/companion.html.
- 114.Sun X, Ward BB. MAGs from ETSP OMZ. Datasets. Figshare. https://figshare.com/articles/dataset/MAGs_from_ETSP_OMZ/12291281.
- 115.Ganesh S, Bristow LA, Larsen M, Sarode N, Thamdrup B, Stewart FJ. Size-fraction partitioning of community gene transcription and rates of nitrogen metabolism in a marine oxygen minimum zone. Datasets. National Center for Biotechnology Information. 2014. https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA263621.
- 116.Ganesh S, Bertagnolli AD, Bristow LA, Padilla CC, Blackwood N, Aldunate M, et al. Genomic evidence for the use of alternative nitrogen substrates by anammox bacteria. Datasets. National Center for Biotechnology Information. 2017. https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA407229.
- 117.Garcia-Robledo E, Padilla CC, Aldunate M, Stewart FJ, Ulloa O, Paulmier A, et al. EMG produced TPA metagenomics assembly of PRJNA305951 data set (Cryptic Oxygen Cycle in the Oxygen Minimum Zones: the role of the Secondary Chlorophyll Maximum). Datasets. National Center for Biotechnology Information. 2021. https://www.ncbi.nlm.nih.gov/bioproject/790605.
- 118.Jaffe AL. rubisco-genomics. GitHub. 2021. https://github.com/alexanderjaffe/rubisco-genomics.
- 119.Jaffe AL. rubisco genomics (Jaffe et al. 2025). Zenodo. 2025. 10.5281/zenodo.15376808.
Associated Data
Supplementary Materials
Additional file 1: Supplementary tables. Supporting data and characteristics of genomes/samples used in this study.
Additional file 2: Supplementary figures. Supporting figures for the analyses presented in this study.
Data Availability Statement
Newly resolved representative MAGs used in this study, as well as raw metagenomic reads from the OC1703A transect, are available through NCBI under BioProject PRJNA1054206 [103]. Source information for all genomes, including those derived from previous studies [104–114], is listed in Additional file 1: Table S3. Accession numbers for all public sequencing reads used for abundance and gene expression analyses are listed in Additional file 1: Tables S10 and S12 [112, 115–117]. Custom code for the above analyses is available via GitHub and Zenodo under an MIT License (10.5281/zenodo.15376808) [118, 119].