ABSTRACT
Microorganisms dominate all ecosystems on Earth and play a key role in the turnover of organic matter. By producing enzymes, they degrade complex carbohydrates, facilitating the recycling of nutrients and controlling the carbon cycle. Despite their importance, our knowledge regarding microbial carbohydrate utilization has been limited to genome-sequenced taxa and thus heavily biased to specific groups and environments. Here, we used the Genomes from Earth’s Microbiomes (GEM) catalog to describe the carbohydrate utilization potential in >7000 bacterial and archaeal taxa originating from a range of terrestrial, marine and host-associated habitats. We show that the production of carbohydrate-active enzymes (CAZymes) is phylogenetically conserved and varies significantly among microbial phyla. High numbers of carbohydrate-active enzymes were recorded in phyla known for their versatile use of carbohydrates, such as Firmicutes, Fibrobacterota, and Armatimonadota, but also phyla without cultured representatives whose carbohydrate utilization potential was so far unknown, such as KSB1, Hydrogenedentota, Sumerlaeota, and UBP3. Carbohydrate utilization potential reflected the specificity of various habitats: the richest complements of CAZymes were observed in MAGs of plant microbiomes, indicating the structural complexity of plant biopolymers.
IMPORTANCE This study expands our knowledge of the phylogenetic distribution of carbohydrate-active enzymes across the prokaryotic tree of life, including new phyla in which the composition of carbohydrate-active enzymes has not been described until now, and demonstrates the potential for carbohydrate utilization of numerous as-yet-uncultured phyla. Profiles of carbohydrate-active enzymes are largely habitat-specific and reflect local carbohydrate availability by selecting taxa with appropriate complements of these enzymes. This information should aid in the prediction of functions in microbiomes of known taxonomic composition and help to identify key components of habitat-specific carbohydrate pools. In addition, these findings are highly relevant for understanding carbohydrate utilization and carbon cycling in the environment, a process that is closely linked to the carbon storage potential of Earth’s habitats and the production of greenhouse gases.
KEYWORDS: carbohydrate-active enzymes, earth microbiome, natural ecosystems, phylogenetic conservation, habitat specificity
INTRODUCTION
Microorganisms are important drivers of decomposition processes occurring in natural habitats, mediating organic carbon (C) turnover and nutrient recycling in all ecosystems on Earth. Through effects on C cycling, they contribute to a range of essential processes, including the control of the performance and health of their plant or animal hosts or the C fluxes at the ecosystem level. Understanding microbial roles in global C models is essential for future predictions of the health of our planet (1–3). Microorganisms degrade complex carbohydrates of various origins found in nature by producing enzymes (4, 5). These microbial enzymes involved in the degradation of C compounds are termed carbohydrate-active enzymes (CAZymes), i.e., enzymes that degrade or modify carbohydrates (http://www.cazy.org/) (6). Most enzymes performing carbohydrate decomposition are glycoside hydrolases (GHs). Furthermore, lytic polysaccharide monooxygenases (LPMOs), oxidases and peroxidases classified as enzymes with auxiliary activities (AAs) in the CAZy database have also been found to play an important role in carbohydrate degradation (7, 8). CAZymes are classified into families and subfamilies based on structural similarity, and this classification reflects to a large extent their substrate specificity (6).
Bacteria and archaea, whose cell numbers on Earth are estimated to be approximately 10^30, inhabit all types of habitats from soils to the oceans and the sediments of the seabed, living freely or in association with their animal or plant hosts (9, 10). With the advancement of sequencing technologies and bioinformatic resources, first attempts have been made to describe the phylogenetic conservation of microbial enzymes in microbial phylogenies (11, 12) or the occurrence of microbial GHs across environments (13) using the limited numbers of genomic and metagenomic data sets that were available at the time of analysis. However, the potential for carbohydrate utilization of most microbial taxa could not be reliably explored because only a fraction of them have been isolated and cultured (14, 15). This is why our present knowledge of microbial carbohydrate utilization is limited to genome-sequenced taxa and thus heavily biased toward specific groups and environments (12). Genome-resolved metagenomics has enabled the reconstruction of microbial genomes from microbial populations. This approach has substantially improved our understanding of microbial evolution and enabled the prediction of metabolic capacities of high numbers of relevant microbes across all habitats, including members of novel microbial phyla that lack genome-sequenced isolates (16–20).
Here, we analyzed high-quality metagenome-assembled genomes (MAGs) from the recently compiled Genomes from Earth’s Microbiomes (GEM) catalog, which includes >10,000 metagenomes and >52,000 MAGs from 135 phyla (19). This large-scale genomic inventory provides a critical resource of prokaryotic genomes avoiding taxa isolation biases and linking them with a representative environment on Earth. In the gene models in this catalog, we identified CAZymes involved in the degradation of the main complex carbohydrates in nature, including cellulose, hemicellulose, α-glucans, β-glucans, pectin, chitin, peptidoglycan, and others. We aimed to present a descriptive overview of the abilities of bacterial and archaeal phyla to degrade carbohydrates and to characterize the CAZyme pools of the members of important microbial habitats. While information on CAZyme distribution and phylogenetic conservation should aid in the prediction of functions in microbiomes of known taxonomic composition, the description of habitat-specific enzyme pools helps to identify key components of habitat-specific carbohydrate pools and propose pathways of their transformation (20, 21).
RESULTS AND DISCUSSION
Composition and diversity of CAZymes across microbial taxa.
We selected high-quality MAGs (n = 9,143) that belonged to 90 bacterial and 14 archaeal phyla. While the potential for carbohydrate utilization was omnipresent along the bacterial and archaeal trees of life, the CAZyme content per genome differed among phyla. The composition of CAZyme pools of Proteobacteria and Firmicutes_C clustered separately from other major bacterial phyla, although MAGs of both groups still showed a high level of variation in CAZyme composition (Fig. 1a). The phyla with a median of >100 CAZymes per genome included some of those well known for the utilization of a wide range of carbohydrates, such as Firmicutes_I, Fibrobacterota, and Armatimonadota (12, 22), as well as several phyla that remain uncultured and whose carbohydrate utilization potential was previously unknown, such as KSB1, Hydrogenedentota, Sumerlaeota, and UBP3 (Fig. 1b). Moreover, MAGs of these phyla also encoded the most diverse complements of CAZyme families per genome, indicating the highest functional diversity (Fig. 1c). Among the most populated phyla, those with the highest CAZyme counts per genome were Planctomycetota (122 ± 5), Verrucomicrobiota (106 ± 2), Acidobacteriota (99 ± 3), and Bacteroidota (97 ± 2). In contrast, proteobacterial taxa contained 46 ± 1 and Actinobacteriota 41 ± 1 CAZymes per genome, placing them among average phyla. Another highly populated phylum, Firmicutes_C, contained only 28 ± 1 CAZymes per genome, placing it at the low end of the phylum comparison (Fig. 1b). Regarding the content of CAZy families per genome, MAGs of Planctomycetota contained genes of 47 ± 1 CAZy families per genome, Verrucomicrobiota 44 ± 1, Bacteroidota 42 ± 1, and Acidobacteriota 41 ± 1. Other phyla, such as Proteobacteria, Actinobacteriota, and Firmicutes_C, contained only 26, 21, and 17 CAZy families per genome, respectively (Fig. 1c).
FIG 1.
Composition and diversity of carbohydrate-active enzymes of metagenome-assembled genomes across microbial taxa. (a) PCoA based on CAZyme gene pools identified within MAGs. Taxonomic assignments of each MAG (n = 9,143) are color coded. Abundant CAZy types representing >0.1% of all identified CAZymes were considered, and their counts were normalized to construct Euclidean distances. Ellipses represent group centroids. (b–c) Boxplots showing the total CAZy gene count per MAG (CAZy abundance) within phyla and CAZyme functional diversity (number of CAZy families per MAG). The category Other contains phyla with <10 MAGs per phylum (55 phyla with 166 MAGs in total). Numbers in brackets denote MAG count. All CAZymes belonging to the AA, CBM, PL, CE, GH, and GT classes are considered. Boxplots show median values and lower and upper quartiles.
Glycoside hydrolases and glycosyltransferases (GTs) were found in all 90 bacterial phyla, and carbohydrate esterases (CEs) were found in 88 phyla. Carbohydrate-binding modules (CBMs), polysaccharide lyases (PLs), and AAs were less widespread, occurring in 65, 57, and 17 bacterial phyla, respectively (Fig. 2). The abundance of each class also differed among taxonomic groups. For example, the highest average number of GHs per genome was detected in OLB16 (200 genes), Hydrogenedentota (115 genes), Firmicutes_I (111 genes), KSB1 (86 genes), and Armatimonadota (76 genes), while the genomes of Campylobacterota, UBP7, and Patescibacteria encoded on average only 3, 2, and 1 GHs, respectively. AAs were generally rare and were more frequent only in the genomes of Firmicutes_I, Myxococcota, and Zixibacteria, although even there they averaged <1 gene per genome. CBMs were most abundant in Firmicutes_I, FCPU426, and Fibrobacterota at 35, 19, and 14 per genome, respectively.
FIG 2.
Occurrence and abundance of carbohydrate-active enzymes targeting various carbohydrates in bacterial phyla. Phyla containing at least one MAG are shown in bold. The rings indicate from inside to outside: (i) number of MAGs in the phylum, (ii) content of CAZyme classes, (iii) genomic potential for the degradation of selected carbohydrates, and (iv) habitats from which MAGs originate. Asterisks indicate phyla with previously unrecognized/unexamined potential for carbohydrate degradation. Phylogenetic tree of bacterial phyla was obtained from the Genome Taxonomy Database (GTDB) (https://gtdb.ecogenomic.org/).
Bacteroidota, Fibrobacterota, Acidobacteriota, Proteobacteria, Verrucomicrobiota, Planctomycetota, and several groups of Firmicutes were identified as the phyla whose members were most versatile in attacking diverse biopolymers of various origins. Species belonging to these taxa are known degraders in diverse natural and engineered ecosystems (12, 22). Interestingly, several less well-known bacterial phyla also have the potential to degrade a wide variety of carbohydrates. This is the case for Goldbacteria, Myxococcota, Hydrogenedentota, OLB16, KSB1, Dictyoglomota, and Firestonebacteria. A few of these newly described candidate phyla from the rare biosphere were recently reported as carbohydrate degraders (23, 24). Our results now extend this list substantially, as we describe here the ability to degrade carbohydrates in 44 new phyla (Fig. 2), in addition to the 46 phyla already included in the CAZy database.
As expected, α-glucanases and cello/xylobiases were highly prevalent, present in 85 and 79 bacterial phyla, respectively. These groups also showed the lowest percentages of extracellular enzymes (only 21% of α-glucanases and 37% of cello/xylobiases were extracellular) (Fig. S1 in the supplemental material). In contrast, genes involved in the degradation of more recalcitrant or less common biopolymers were less frequent and encoded mostly extracellular enzymes (Fig. S1). For example, cellulases were found in 45 phyla (highly abundant in the genomes of Goldbacteria and Fibrobacterota), chitinases in 63 phyla (most abundant in Sumerlaeota and Verrucomicrobiota), xylanases/xyloglucanases in 31 phyla (abundant in Firmicutes_I and Fibrobacterota), mannanases in 48 phyla (most abundant in Goldbacteria and KSB1), arabinogalactanases in 29 phyla (abundant in Firmicutes_I and Dictyoglomota), β-glucanases in 58 phyla (most abundant in Krumholzibacteriota), and pectinases in 58 phyla (abundant in Hydrogenedentota and OLB16). LPMOs of the AA10 family were identified in only 7 phyla: Proteobacteria, Bacteroidota, Planctomycetota, Verrucomicrobiota, Actinobacteriota, Firmicutes, and Firmicutes_I. Notably, the content of hydrolytic enzymes appears to reflect the lifestyles of some of the less well-known bacterial phyla, as recently shown for some deep-sea prokaryotes (25).
FIG S1.
Diversity of CAZyme families grouped by activity. Bars indicate the number of genes belonging to each CAZy family. Dark blue bars show the total number of CAZymes; light blue bars show only the extracellular CAZymes (sequences with a signal peptide). Numbers under the panels indicate the percentage of extracellular CAZymes involved in the degradation of each carbohydrate.
Copyright © 2022 López-Mondéjar et al.
This content is distributed under the terms of the Creative Commons Attribution 4.0 International license.
CAZymes were also found in eight archaeal phyla (Crenarchaeota, Halobacterota, Thermoplasmatota, Euryarchaeota, Hydrothermarchaeota, Altiarchaeota, Micrarchaeota, and UAP2) (Fig. S2 in the supplemental material). α-Glucanases were found in six phyla, cello/xylobiases in three phyla, mannanases and pectinases in two phyla, and cellulases, chitinases, and β-glucanases in only one phylum each, showing the relatively limited potential of archaea for carbohydrate degradation (17).
FIG S2.
Occurrence of carbohydrate-active enzymes in archaeal phyla. Phyla containing at least one MAG are shown in bold. The columns and colors indicate from left to right: (i) number of MAGs in the phylum, (ii) content of 7 CAZyme classes, (iii) genomic potential for the degradation of selected carbohydrates, and (iv) habitats from which the MAGs originated. Tree obtained from the Genome Taxonomy Database (GTDB) (https://gtdb.ecogenomic.org/).
A more detailed look at individual members of the 19 bacterial phyla with >20 MAGs revealed notable variation in carbohydrate utilization potential within each phylum (Fig. 3, Fig. S3). Chitinases were present in 95% of MAGs of Marinisomatota and 90% of Armatimonadota, but only in 28% of Proteobacteria and Actinobacteriota. Cellulases were encoded by 62% of Verrucomicrobiota MAGs but by only 15% of Actinobacteriota MAGs. In general, cello/xylobiases and α-glucanases were the most common activities encoded by individual taxa, ranging between 60% and 100% of MAGs in all major phyla, confirming previous reports on the ubiquity of these enzymes (12). Among the major bacterial phyla, Acidobacteriota, Planctomycetota, and Verrucomicrobiota appear to be the most capable degraders of biopolymers, with >60% of MAGs possessing genes for the degradation of chitin and pectin and >50% of MAGs possessing cellulases; in Proteobacteria and Actinobacteriota, the corresponding genes were found in <35% of MAGs (Fig. 3).
FIG 3.
Distribution of CAZymes with various targets across MAGs of six major bacterial phyla. Numbers in brackets represent numbers of MAGs, and percentages indicate the share of MAGs within phyla containing CAZymes with each specific carbohydrate target.
FIG S3.
Occurrence of carbohydrate-active enzymes in minor bacterial phyla. Numbers in brackets represent numbers of MAGs, and percentages indicate the share of MAGs within phyla containing CAZymes with each specific carbohydrate target.
Phylogenetic conservation of carbohydrate utilization.
Microbial traits are a product of evolution, and as such, they show various levels of phylogenetic conservation (26, 27). The existence of a phylogenetic signal is an essential factor in microbial ecology since it allows us to assess the probability of certain traits in microbes for which trait information from phylogenetically related taxa exists (28, 29). In this sense, the ability to utilize certain classes of carbohydrates (enzymatic activities) can be understood as an important microbial trait. We found a significant phylogenetic signal (P < 0.001) for the utilization of various carbohydrates in most analyzed phyla (Table S1), with some variation in the mean genetic depth (τD) for each trait (each enzymatic activity) and phylum (Table S2, Fig. S4). The percentage of divergence in the 16S rRNA gene for significantly conserved traits varied among the different activities within the same phylum. For example, values ranged from 3 to 10% in Verrucomicrobiota and from 2 to 8% in Actinobacteriota, but only from 1 to 3% in Proteobacteria and from 0.1 to 0.5% in Firmicutes_A. In general, the trait-phylogeny association was most often significant for β-glucanases and cellulases (significant for 73% and 63% of the tested phyla, respectively) and least often significant for α-glucanases and cello/xylobiases (25% and 32%, respectively). Previous observations showed phylogenetic clustering of a few selected traits at a relatively fine scale (11, 30). Our results now vastly extend this information to a wide range of microbial phyla and multiple carbohydrate utilization traits, showing the level of phylogenetic predictability.
FIG S4.
Phylogenetic conservation of carbohydrate utilization traits in microorganisms. Heatmap showing the mean genetic depth (τD) of the consensus clades sharing the ability to degrade different carbohydrates in the selected phyla (phyla including more than 20 retrieved MAGs are shown). Only significant values (P < 0.05) of the trait-phylogeny association are shown.
Table S1.
Significance of the phylogenetic distance of the genes involved in the utilization of different carbohydrates in several bacterial phyla. Abouheif's Cmean was utilized to assess the phylogenetic signal of quantitative variables (P values <0.05 are considered statistically significant). Activities statistically significant in each phylum are marked in bold.
Table S2.
Extent of the phylogenetic conservation of carbohydrate utilization in bacterial phyla based on the consenTRAIT algorithm. Shown are the mean genetic depth (τD) of the consensus clades sharing the ability to degrade a certain carbohydrate class, the significance (P value; number of permutations = 1,000) of the trait-phylogeny association, and the percentage of divergence in the 16S rRNA gene for significantly conserved traits. Traits and values in bold indicate significant trait conservation (P < 0.05).
Composition and diversity of CAZymes across habitats.
Habitat specificity in CAZyme pool composition was previously observed when comparing environmental metagenomes (13, 31). Here, we show that a certain level of this habitat specificity of CAZyme pools indeed exists: CAZyme pools of MAGs from human and mammalian microbiomes clustered together and separately from those from other habitats. CAZyme pools of MAGs from terrestrial, aquatic, and engineered habitats and MAGs from plant microbiomes, however, largely overlapped (Fig. 4a), confirming that the CAZy content in microbes is more affected by phylogeny than by habitat. The differences in CAZyme profiles among habitats are thus driven by differences in the phylogenetic composition of microbiomes among habitats rather than by the selection of strains with specific traits. The richest complements of CAZymes were observed in MAGs of plant microbiomes, with 106 CAZymes belonging to 43 families on average, followed by soils (88 CAZymes in 36 families); the MAGs from marine habitats were less rich in CAZymes, with 39 CAZyme genes in 20 families per MAG on average (Fig. 4b and c). These differences most likely reflect the diversity and structural complexity of carbohydrates in each habitat, with the lignocellulose of vascular plants representing the carbohydrate pool of the highest complexity (32).
FIG 4.
Composition and diversity of carbohydrate-active enzymes of metagenome-assembled genomes across habitats. (a) PCoA based on CAZymes identified within MAGs. Habitat occurrence of each MAG (n = 9,143) is color coded. Abundant CAZy types representing >0.1% of all identified CAZymes were considered, and their counts were normalized to construct Euclidean distances. Ellipses represent group centroids. (b–c) Boxplots showing total CAZy gene count per MAG (CAZy abundance) within habitat and CAZyme functional diversity (number of CAZy families per MAG). Numbers in brackets denote MAG count. All CAZymes belonging to the AA, CBM, PL, CE, GH, and GT classes are considered. Boxplots show median values and lower and upper quartiles.
Among those CAZymes whose substrate specificity can be reliably assigned to target carbohydrates, genes targeting carbohydrates of plant origin (cellulose, hemicelluloses, and pectin) were most abundant. In addition, the share of CAZymes targeting microbial biomass and reserve compounds was also high (Fig. 5). Notably, the globally most abundant biopolymers, cellulose and chitin, were targeted by only 2.4% and 3.8% of such CAZymes, respectively. The composition of CAZyme pools differed significantly among habitats (P < 0.001) and reflected the carbohydrate sources in various habitats. The plant-associated microbiomes, rumen microbiomes, and soils were enriched in genes targeting cellulose, xylan, pectin, and other plant-derived compounds. These habitats appear to be potential gold mines for strains and genes of interest in lignocellulosic biomass conversion processes (22, 33, 34). The gut microbiomes of humans showed a lower proportion of plant targets, reflecting different diets (Fig. 5). Not surprisingly, marine microbiomes rich in algal biomass showed a higher share of carrageenases, agarases, and β-glucanases, supporting the assumption that the β-glucan laminarin and other algal compounds are major molecules in the marine C cycle (35). α-Glucanases were abundant in the deep subsurface, soil, and nonmarine saline and alkaline habitats but less abundant in host-associated habitats, which seems to reflect the higher temporal fluctuation of C supply in habitats where energy reserves are important for survival (36). Their increased share in freshwater indicates the importance of this habitat in the mineralization of organic C (37, 38). Habitat specificity was also observed at the level of the gene abundances of dominant CAZyme families: while plant and mammalian microbiomes were enriched in selected cellulases and xylanases, soil and freshwater microbiomes showed enrichment of α-glucanases, indicating the importance of the utilization of reserve compounds (Fig. S5).
FIG 5.
Difference in the relative abundance of CAZymes targeting various carbohydrates across habitats. Mean relative abundances of CAZymes across all habitats are indicated by a red line and a percentage. They represent a share of CAZyme genes targeting certain carbohydrates of all CAZyme genes where targets can be reliably assigned. Differences in the relative abundance across habitats reflect relative enrichment compared with the mean relative abundance. Capital letters close to the activity indicate the main origin of compounds attacked by each activity: P, plant origin; M, microbial origin; A, algal origin; and R, reserve compounds.
FIG S5.
Relative abundance of genes from the top 30 CAZyme families in selected habitats. We considered all CAZymes with assigned functions. Dots display average relative abundance across all habitats.
Conclusions.
Despite the known limitations of genomes reconstructed from shotgun metagenomes (19), the use of high-quality MAGs allows an in-depth analysis of the prokaryotic tree of life, overcoming the taxonomic limitations of isolation-biased genomes available in other databases and also offering a direct link with a specific environment. Our results provide a comprehensive catalog of carbohydrate-active enzymes across microbial phyla, including new phyla in which the CAZyme composition has not been described until now: 49% of the phyla with CAZymes reported in this study are not yet listed in the CAZy database. We also demonstrate the potential for carbohydrate utilization of numerous as-yet-uncultured phyla, which make up 57% of the phyla reported in this study. We show that the composition of the CAZyme pool is phylogenetically conserved across bacterial phyla. Profiles of CAZymes are largely habitat-specific and reflect local carbohydrate availability by selecting taxa with appropriate complements of CAZymes.
MATERIALS AND METHODS
CAZyme annotation.
MAGs from the Genomes from Earth’s Microbiomes (GEM) catalog classified as high-quality (n = 9,143) were filtered from the published set of total MAGs (n = 52,515) (19). That study defined MAGs as high-quality based on the presence of a near-full complement of rRNAs, tRNAs, and single-copy protein-coding genes. The amino acid sequences of the predicted genes of high-quality MAGs served as the input into the run_dbcan.py (v2.0.11) program (39) and were compared with the dbCAN database V8 (40) using HMMER 3.3 (41). CAZymes predicted with an E value ≤ 1E-20 were considered correctly annotated and used further; genes annotated as CAZymes but without known previous occurrence in bacteria or archaea were omitted. The function and substrate specificity of CAZyme families or subfamilies were predicted based on a review of the activities assigned to CAZymes with known structures (characterized enzymes) in the CAZy database (http://www.cazy.org) (6) and were manually curated.
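The filtering rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Hit` record, the gene identifiers, and the placeholder family name `GHX` are assumptions introduced for the example, and only the E value cutoff (≤ 1E-20) and the exclusion of families without known prokaryotic occurrence come from the text.

```python
from typing import NamedTuple

EVALUE_CUTOFF = 1e-20  # threshold stated in the text


class Hit(NamedTuple):
    """One HMM hit; this record layout is hypothetical, not the dbCAN output format."""
    gene_id: str
    family: str
    evalue: float


def filter_hits(hits, cutoff=EVALUE_CUTOFF, excluded_families=frozenset()):
    """Keep hits at or below the E value cutoff whose family is not excluded."""
    return [
        h for h in hits
        if h.evalue <= cutoff and h.family not in excluded_families
    ]


# Toy input; identifiers and E values are invented for illustration.
hits = [
    Hit("mag1_gene0001", "GH5", 3.2e-45),   # strong hit: kept
    Hit("mag1_gene0002", "GH13", 1.0e-20),  # exactly at the cutoff: kept
    Hit("mag1_gene0003", "CE4", 4.0e-12),   # above the cutoff: dropped
    Hit("mag1_gene0004", "GHX", 1.0e-60),   # placeholder for an excluded family
]

# "GHX" stands in for a family with no known occurrence in bacteria or archaea.
kept = filter_hits(hits, excluded_families={"GHX"})
print([h.gene_id for h in kept])
```

Passing the excluded families as a parameter keeps the curation decision (which families count as non-prokaryotic) separate from the numerical threshold.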
The CAZyme families (or subfamilies where present) were grouped based on their main characterized enzymatic activities into 14 target carbohydrate classes or general activities according to their main target: (i) cellulases (acting on cellulose), (ii) xylanases/xyloglucanases (acting on the main chain of xylans and xyloglucans from plant biomass), (iii) β-glucosidases/β-xylosidases (acting on oligomers of plant origin such as cellobiose and xylobiose), (iv) chitinases/chitosanases (acting on chitin, chitobiose, and chitosan), (v) α-glucanases (acting on glucans linked by α-glycosidic bonds), (vi) β-glucanases (acting on glucans linked by different β-linkages), (vii) mannanases (acting on polymers and dimers of mannan), (viii) arabinogalactanases (acting on the links between galactans and arabinans in arabinogalactans of plant origin), (ix) other hemicellulases (acting on side groups such as arabinoses, galactoses, or acetyl groups present in hemicelluloses), (x) pectinases (acting on pectin of plant origin), (xi) carrageenases/agarases (acting on several polymers of algal origin), (xii) peptidoglycanases (degrading peptidoglycan), (xiii) glycoconjugate-degrading enzymes (CAZymes acting on glycoproteins, glycolipids, or proteoglycans), and (xiv) lytic polysaccharide monooxygenases (LPMOs). The CAZyme families/subfamilies belonging to each of these classes or general activities are shown in detail in Table S3.
Table S3.
The CAZyme families/subfamilies were grouped according to the main enzymatic activities characterized in the family/subfamily into 14 different classes or general activities.
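The grouping of families into target classes, and the "share of MAGs" percentages reported in the Results, can be illustrated with a short Python sketch. The family-to-target dictionary below is a small illustrative assumption based on well-known activities of these families; it is not the curated assignment of Table S3, and the MAG identifiers are invented.

```python
from collections import defaultdict

# Illustrative assignments only; the authors' curated grouping is in Table S3
# and is not reproduced here.
FAMILY_TO_TARGET = {
    "GH9": "cellulases",
    "GH10": "xylanases/xyloglucanases",
    "GH13": "alpha-glucanases",
    "GH18": "chitinases/chitosanases",
    "GH28": "pectinases",
    "AA10": "LPMOs",
}


def targets_per_mag(annotations):
    """Map each MAG to the set of target classes it encodes.

    annotations: iterable of (mag_id, cazy_family) pairs.
    Families absent from FAMILY_TO_TARGET are ignored.
    """
    result = defaultdict(set)
    for mag_id, family in annotations:
        target = FAMILY_TO_TARGET.get(family)
        if target is not None:
            result[mag_id].add(target)
    return dict(result)


def share_of_mags(mag_targets, n_mags, target):
    """Percentage of all MAGs that carry at least one gene for `target`."""
    carriers = sum(1 for targets in mag_targets.values() if target in targets)
    return 100.0 * carriers / n_mags


# Toy annotations for four hypothetical MAGs.
annotations = [
    ("mag1", "GH18"), ("mag1", "GH13"),
    ("mag2", "GH13"),
    ("mag3", "GH9"), ("mag3", "GH18"),
    ("mag4", "GT2"),  # no assigned degradation target in this sketch
]
mag_targets = targets_per_mag(annotations)
chitinase_share = share_of_mags(mag_targets, n_mags=4, target="chitinases/chitosanases")
cellulase_share = share_of_mags(mag_targets, n_mags=4, target="cellulases")
print(chitinase_share, cellulase_share)
```

The total number of MAGs is passed explicitly because MAGs without any mapped family (such as `mag4`) must still count in the denominator.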
Phylogenetic tree construction.
Nucleotide sequences of those individual MAGs that contained annotated 16S rRNA gene sequences served as the input to RNAMMER 1.2 (42). In total, 7,162 of the high-quality MAGs contained a single copy of the 16S rRNA gene; the remaining high-quality MAGs, with multiple 16S rRNA variants (n = 207) or without a 16S rRNA gene (n = 1,774), were omitted from further phylogenetic analyses. Sequences of MAGs belonging to the same phylum were aligned with MAFFT (43). Phylogenetic trees were constructed via the maximum likelihood (ML) method using the Kimura 2-parameter model (K2) and a discrete gamma distribution with invariant sites (G+I) (bootstrap confidence levels determined by 500 bootstrap replications) with the software package MEGA7 (44).
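The selection of MAGs for the phylogenetic analyses amounts to keeping those with exactly one 16S rRNA gene copy. A minimal Python sketch follows; the MAG names and copy numbers are invented for illustration, and only the three category counts in the final check are taken from the text.

```python
def select_single_copy(copies_per_mag):
    """Return the sorted IDs of MAGs with exactly one 16S rRNA gene copy."""
    return sorted(mag for mag, n in copies_per_mag.items() if n == 1)


# Hypothetical copy numbers per MAG.
copies_per_mag = {"mag_a": 1, "mag_b": 0, "mag_c": 3, "mag_d": 1}
selected = select_single_copy(copies_per_mag)
print(selected)

# Consistency check of the counts reported in the text:
# single copy + multiple variants + missing = all high-quality MAGs.
total_high_quality = 7162 + 207 + 1774
print(total_high_quality)
```

The three reported categories sum to the 9,143 high-quality MAGs, so every MAG falls into exactly one of them.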
Statistical analyses.
Source metagenomes and taxonomy of MAGs were retrieved from the published catalog (19) and used to calculate the distribution of CAZymes across habitats and taxonomic groups. The influence of MAG taxonomy and habitat occurrence on CAZyme counts was tested using permutational multivariate analysis of variance with the adonis function in vegan (v2.5-6) (45), and CAZy families representing >0.1% of all identified CAZymes were considered for principal coordinate analysis. Their counts in individual MAGs were normalized using the decostand function with the “normalize” option, which makes the sum of squares for each MAG equal to one, to construct a dissimilarity matrix with Euclidean distances that served as the input into the cmdscale function in the package stats (v4.0.2) (46). Ellipses representing group centroids were drawn for each phylum or habitat. The differences between the main activities across the habitats were also tested by permutational multivariate analysis of variance using pairwise.adonis(), a wrapper function for multilevel pairwise comparison using adonis in vegan. Signal peptides in the annotated CAZymes were detected using SignalP 6.0 (47). The phylosignal package (v1.3) (48) was used to calculate phylogenetic conservatism within bacterial phyla using Newick-formatted 16S rRNA phylogenetic trees and lists of related CAZymes. We used the “phyloSignal” function to estimate Abouheif's Cmean (49) statistics.
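The normalization and distance steps that precede the ordination can be sketched in plain Python. This is an illustration of the arithmetic only (unit sum of squares per MAG, then pairwise Euclidean distances), written under the assumption that rows are MAGs and columns are CAZy families; it does not call vegan and does not reproduce the cmdscale ordination itself. The count values are invented.

```python
import math


def normalize_rows(counts):
    """Scale each row so that its sum of squares equals one.

    All-zero rows are left as zeros to avoid division by zero.
    """
    normalized = []
    for row in counts:
        norm = math.sqrt(sum(x * x for x in row))
        normalized.append([x / norm if norm > 0 else 0.0 for x in row])
    return normalized


def euclidean_matrix(rows):
    """Full symmetric matrix of pairwise Euclidean distances between rows."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


# Toy CAZy family counts: rows are MAGs, columns are families.
counts = [
    [3, 4, 0],  # mag_a
    [6, 8, 0],  # mag_b: same profile as mag_a, twice as many genes
    [0, 0, 5],  # mag_c: entirely different profile
]
normalized = normalize_rows(counts)
distances = euclidean_matrix(normalized)
print(distances[0][1], distances[0][2])
```

After normalization, `mag_a` and `mag_b` coincide even though their gene totals differ, so the resulting distances compare the composition of the CAZyme pool rather than its size.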
We applied a consenTRAIT analysis (26) using the package castor (v1.6.4) (50) to calculate the mean phylogenetic depth (τD) at which traits (i.e., the ability to decompose a certain class of carbohydrates) are conserved across clades in the phylogenetic trees of the bacterial phyla. The consenTRAIT algorithm identifies phylogenetic clades that are positive for the trait and calculates the average depth of those clades from a phylogenetic tree. The average phylogenetic depth of the positive clades was compared with the same values calculated after randomizing the responses among the tips 1,000 times to determine whether the trait and phylogeny were significantly nonrandomly associated. The probability of phylogenetic conservation (nonrandomness) of the trait distribution was calculated as the fraction of simulated τD values that were greater than or equal to the observed τD. To obtain a measure of the sequence identity of organisms within a positive clade, τD was multiplied by two and then subtracted from 1. These values are comparable to a cutoff for defining operational taxonomic units (11).
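The two calculations described above, the permutation P value and the conversion of τD into sequence identity, are simple enough to write out. The Python sketch below follows the definitions in the text literally; it does not call castor, the τD values are invented, and the divergence percentage is simply the complement of the identity (an inference from the stated formula, not a separately documented step).

```python
def permutation_p_value(observed_tau, simulated_taus):
    """Fraction of simulated tau_D values >= the observed tau_D."""
    n_extreme = sum(1 for tau in simulated_taus if tau >= observed_tau)
    return n_extreme / len(simulated_taus)


def tau_to_identity(tau_d):
    """Sequence identity within a positive clade: 1 - 2 * tau_D."""
    return 1.0 - 2.0 * tau_d


def tau_to_divergence_percent(tau_d):
    """Complement of the identity, expressed in percent: 100 * 2 * tau_D."""
    return 200.0 * tau_d


# Toy example: four randomizations instead of the 1,000 used in the study.
observed = 0.02
simulated = [0.005, 0.01, 0.02, 0.03]
p_value = permutation_p_value(observed, simulated)

identity = tau_to_identity(0.015)
divergence = tau_to_divergence_percent(0.015)
print(p_value, identity, divergence)
```

Under this conversion, a τD of 0.015 corresponds to 97% 16S rRNA gene identity, i.e., 3% divergence, which is how the values can be compared with an operational taxonomic unit cutoff.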
Figure generation.
Manuscript figures were generated using custom R scripts (46), RStudio (51), tidyverse (52), Inkscape (https://inkscape.org/), and iTOL (53).
Data availability.
The MAG data used in this study were published in a previous paper (19). Bulk download of the 52,515 MAGs is available at https://genome.jgi.doe.gov/GEMs and https://portal.nersc.gov/GEM. The annotation of CAZymes in the high-quality MAGs is available at Figshare, https://doi.org/10.6084/m9.figshare.16435581. The code for reproducing gene calling and the annotation of CAZymes is provided at https://github.com/TlaskalV/Global-CAZymes.
ACKNOWLEDGMENTS
This work was supported by the Czech Science Foundation (22-30769S). U.N.D.R. was supported by the bilateral project between the Czech Academy of Sciences and DAAD (DAAD-20-05).
R.L.-M. and P.B. jointly conceived the study and developed the experimental design. R.L.-M., V.T., and U.N.D.R. performed the annotation of MAGs and analyzed the results. R.L.-M. wrote the draft of the manuscript with contributions from all coauthors. All authors contributed to manuscript revision and approved the final version of the manuscript.
The authors declare that they have no conflicts of interest.
Contributor Information
Rubén López-Mondéjar, Email: rubenlopezmondejar@gmail.com.
Matthias Hess, University of California, Davis.
REFERENCES
- 1. Ogle K. 2018. Microbes weaken soil carbon sink. Nature 560:32–33. doi: 10.1038/d41586-018-05842-2.
- 2. Cavicchioli R, Ripple WJ, Timmis KN, Azam F, Bakken LR, Baylis M, Behrenfeld MJ, Boetius A, Boyd PW, Classen AT, Crowther TW, Danovaro R, Foreman CM, Huisman J, Hutchins DA, Jansson JK, Karl DM, Koskella B, Mark Welch DB, Martiny JBH, Moran MA, Orphan VJ, Reay DS, Remais JV, Rich VI, Singh BK, Stein LY, Stewart FJ, Sullivan MB, van Oppen MJH, Weaver SC, Webb EA, Webster NS. 2019. Scientists' warning to humanity: microorganisms and climate change. Nat Rev Microbiol 17:569–586. doi: 10.1038/s41579-019-0222-5.
- 3. Jansson JK, Hofmockel KS. 2020. Soil microbiomes and climate change. Nat Rev Microbiol 18:35–46. doi: 10.1038/s41579-019-0265-7.
- 4. Arnosti C. 2011. Microbial extracellular enzymes and the marine carbon cycle. Annu Rev Mar Sci 3:401–425. doi: 10.1146/annurev-marine-120709-142731.
- 5. El Kaoutari A, Armougom F, Gordon JI, Raoult D, Henrissat B. 2013. The abundance and variety of carbohydrate-active enzymes in the human gut microbiota. Nat Rev Microbiol 11:497–504. doi: 10.1038/nrmicro3050.
- 6. Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B. 2014. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res 42:D490–D495. doi: 10.1093/nar/gkt1178.
- 7. Levasseur A, Drula E, Lombard V, Coutinho PM, Henrissat B. 2013. Expansion of the enzymatic repertoire of the CAZy database to integrate auxiliary redox enzymes. Biotechnol Biofuels 6:41. doi: 10.1186/1754-6834-6-41.
- 8. Horn SJ, Vaaje-Kolstad G, Westereng B, Eijsink VG. 2012. Novel enzymes for the degradation of cellulose. Biotechnol Biofuels 5:45. doi: 10.1186/1754-6834-5-45.
- 9. Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vazquez-Baeza Y, Gonzalez A, Morton JT, Mirarab S, Zech Xu S, Jiang L, Haroon MF, Kanbar J, Zhu Q, Jin Song S, Kosciolek T, Bokulich NA, Lefler J, Brislawn CJ, Humphrey G, Owens SM, Hampton-Marcell J, Berg-Lyons D, McKenzie V, Fierer N, Fuhrman JA, Clauset A, Stevens RL, Shade A, Pollard KS, Goodwin KD, Jansson JK, Gilbert JA, Knight RC, Earth Microbiome Project Consortium. 2017. Earth Microbiome Project, a communal catalogue reveals Earth's multiscale microbial diversity. Nature 551:457–463. doi: 10.1038/nature24621.
- 10. Flemming HC, Wuertz S. 2019. Bacteria and archaea on Earth and their abundance in biofilms. Nat Rev Microbiol 17:247–260. doi: 10.1038/s41579-019-0158-9.
- 11. Zimmerman AE, Martiny AC, Allison SD. 2013. Microdiversity of extracellular enzyme genes among sequenced prokaryotic genomes. ISME J 7:1187–1199. doi: 10.1038/ismej.2012.176.
- 12. Berlemont R, Martiny AC. 2015. Genomic potential for polysaccharides deconstruction in bacteria. Appl Environ Microbiol 81:1513–1519. doi: 10.1128/AEM.03718-14.
- 13. Berlemont R, Martiny AC. 2016. Glycoside hydrolases across environmental microbial communities. PLoS Comput Biol 12:e1005300. doi: 10.1371/journal.pcbi.1005300.
- 14. Martiny AC. 2019. High proportions of bacteria are culturable across major biomes. ISME J 13:2125–2128. doi: 10.1038/s41396-019-0410-3.
- 15. Steen AD, Crits-Christoph A, Carini P, DeAngelis KM, Fierer N, Lloyd KG, Cameron Thrash J. 2019. High proportions of bacteria and archaea across most biomes remain uncultured. ISME J 13:3126–3130. doi: 10.1038/s41396-019-0484-y.
- 16. Parks DH, Rinke C, Chuvochina M, Chaumeil PA, Woodcroft BJ, Evans PN, Hugenholtz P, Tyson GW. 2017. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat Microbiol 2:1533–1542. doi: 10.1038/s41564-017-0012-7.
- 17. Adam PS, Borrel G, Brochier-Armanet C, Gribaldo S. 2017. The growing tree of Archaea: new perspectives on their diversity, evolution and ecology. ISME J 11:2407–2425. doi: 10.1038/ismej.2017.122.
- 18. Castelle CJ, Brown CT, Anantharaman K, Probst AJ, Huang RH, Banfield JF. 2018. Biosynthetic capacity, metabolic variety and unusual biology in the CPR and DPANN radiations. Nat Rev Microbiol 16:629–645. doi: 10.1038/s41579-018-0076-2.
- 19. Nayfach S, Roux S, Seshadri R, Udwary D, Varghese N, Schulz F, Wu D, Paez-Espino D, Chen IM, Huntemann M, Palaniappan K, Ladau J, Mukherjee S, Reddy TBK, Nielsen T, Kirton E, Faria JP, Edirisinghe JN, Henry CS, Jungbluth SP, Chivian D, Dehal P, Wood-Charlson EM, Arkin AP, Tringe SG, Visel A, Woyke T, Mouncey NJ, Ivanova NN, Kyrpides NC, Eloe-Fadrosh EA, IMG/M Data Consortium. 2021. A genomic catalog of Earth's microbiomes. Nat Biotechnol 39:499–509. doi: 10.1038/s41587-020-0718-6.
- 20. Peng X, Wilken SE, Lankiewicz TS, Gilmore SP, Brown JL, Henske JK, Swift CL, Salamov A, Barry K, Grigoriev IV, Theodorou MK, Valentine DL, O'Malley MA. 2021. Genomic and functional analyses of fungal and bacterial consortia that enable lignocellulose breakdown in goat gut microbiomes. Nat Microbiol 6:499–511. doi: 10.1038/s41564-020-00861-0.
- 21. Woodcroft BJ, Singleton CM, Boyd JA, Evans PN, Emerson JB, Zayed AAF, Hoelzle RD, Lamberton TO, McCalley CK, Hodgkins SB, Wilson RM, Purvine SO, Nicora CD, Li C, Frolking S, Chanton JP, Crill PM, Saleska SR, Rich VI, Tyson GW. 2018. Genome-centric view of carbon processing in thawing permafrost. Nature 560:49–54. doi: 10.1038/s41586-018-0338-1.
- 22. Lopez-Mondejar R, Algora C, Baldrian P. 2019. Lignocellulolytic systems of soil bacteria: a vast and diverse toolbox for biotechnological conversion processes. Biotechnol Adv 37:107374. doi: 10.1016/j.biotechadv.2019.03.013.
- 23. Doud DFR, Bowers RM, Schulz F, De Raad M, Deng K, Tarver A, Glasgow E, Vander Meulen K, Fox B, Deutsch S, Yoshikuni Y, Northen T, Hedlund BP, Singer SW, Ivanova N, Woyke T. 2020. Function-driven single-cell genomics uncovers cellulose-degrading bacteria from the rare biosphere. ISME J 14:659–675. doi: 10.1038/s41396-019-0557-y.
- 24. Graham ED, Tully BJ. 2021. Marine Dadabacteria exhibit genome streamlining and phototrophy-driven niche partitioning. ISME J 15:1248–1256. doi: 10.1038/s41396-020-00834-5.
- 25. Zhao Z, Baltar F, Herndl GJ. 2020. Linking extracellular enzymes to phylogeny indicates a predominantly particle-associated lifestyle of deep-sea prokaryotes. Sci Adv 6:eaaz4354. doi: 10.1126/sciadv.aaz4354.
- 26. Martiny AC, Treseder K, Pusch G. 2013. Phylogenetic conservatism of functional traits in microorganisms. ISME J 7:830–838. doi: 10.1038/ismej.2012.160.
- 27. Royalty TM, Steen AD. 2019. Quantitatively partitioning microbial genomic traits among taxonomic ranks across the microbial tree of life. mSphere 4:e00446-19. doi: 10.1128/mSphere.00637-19.
- 28. Martiny JB, Jones SE, Lennon JT, Martiny AC. 2015. Microbiomes in light of traits: a phylogenetic perspective. Science 350:aac9323. doi: 10.1126/science.aac9323.
- 29. Langille MG, Zaneveld J, Caporaso JG, McDonald D, Knights D, Reyes JA, Clemente JC, Burkepile DE, Vega Thurber RL, Knight R, Beiko RG, Huttenhower C. 2013. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol 31:814–821. doi: 10.1038/nbt.2676.
- 30. Sun ZZ, Ji BW, Zheng N, Wang M, Cao Y, Wan L, Li S, Rong JC, He HL, Chen XL, Zhang YZ, Xie BB. 2021. Phylogenetic distribution of polysaccharide-degrading enzymes in marine bacteria. Front Microbiol 12:658620. doi: 10.3389/fmicb.2021.658620.
- 31. Talamantes D, Biabini N, Dang H, Abdoun K, Berlemont R. 2016. Natural diversity of cellulases, xylanases, and chitinases in bacteria. Biotechnol Biofuels 9:133. doi: 10.1186/s13068-016-0538-6.
- 32. Himmel ME, Ding SY, Johnson DK, Adney WS, Nimlos MR, Brady JW, Foust TD. 2007. Biomass recalcitrance: engineering plants and enzymes for biofuels production. Science 315:804–807. doi: 10.1126/science.1137016.
- 33. Luis AS, Martens EC. 2018. Interrogating gut bacterial genomes for discovery of novel carbohydrate degrading enzymes. Curr Opin Chem Biol 47:126–133. doi: 10.1016/j.cbpa.2018.09.012.
- 34. Hess M, Sczyrba A, Egan R, Kim TW, Chokhawala H, Schroth G, Luo S, Clark DS, Chen F, Zhang T, Mackie RI, Pennacchio LA, Tringe SG, Visel A, Woyke T, Wang Z, Rubin EM. 2011. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331:463–467. doi: 10.1126/science.1200387.
- 35. Becker S, Tebben J, Coffinet S, Wiltshire K, Iversen MH, Harder T, Hinrichs KU, Hehemann JH. 2020. Laminarin is a major molecule in the marine carbon cycle. Proc Natl Acad Sci USA 117:6599–6607. doi: 10.1073/pnas.1917001117.
- 36. Žifčáková L, Větrovský T, Lombard V, Henrissat B, Howe A, Baldrian P. 2017. Feed in summer, rest in winter: microbial carbon utilization in forest topsoil. Microbiome 5:122. doi: 10.1186/s40168-017-0340-0.
- 37. Battin TJ, Luyssaert S, Kaplan LA, Aufdenkampe AK, Richter A, Tranvik LJ. 2009. The boundless carbon cycle. Nat Geosci 2:598–600. doi: 10.1038/ngeo618.
- 38. Wehrli B. 2013. Conduits of the carbon cycle. Nature 503:346–347. doi: 10.1038/503346a.
- 39. Zhang H, Yohe T, Huang L, Entwistle S, Wu P, Yang Z, Busk PK, Xu Y, Yin Y. 2018. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res 46:W95–W101. doi: 10.1093/nar/gky418.
- 40. Huang L, Zhang H, Wu P, Entwistle S, Li X, Yohe T, Yi H, Yang Z, Yin Y. 2018. dbCAN-seq: a database of carbohydrate-active enzyme (CAZyme) sequence and annotation. Nucleic Acids Res 46:D516–D521. doi: 10.1093/nar/gkx894.
- 41. Eddy SR. 2011. Accelerated profile HMM searches. PLoS Comput Biol 7:e1002195. doi: 10.1371/journal.pcbi.1002195.
- 42. Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, Ussery DW. 2007. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 35:3100–3108. doi: 10.1093/nar/gkm160.
- 43. Katoh K, Rozewicki J, Yamada KD. 2019. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Brief Bioinform 20:1160–1166. doi: 10.1093/bib/bbx108.
- 44. Kumar S, Stecher G, Tamura K. 2016. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Mol Biol Evol 33:1870–1874. doi: 10.1093/molbev/msw054.
- 45. Oksanen J, Blanchet F, Friendly M, Kindt R, Legendre P, McGlinn D, Minchin P, O'Hara R, Simpson G, Solymos P, Stevens M, Szoecs E, Wagner H. 2018. Vegan: community ecology package. R package version 2.5–2. https://cran.r-project.org/web/packages/vegan/index.html.
- 46. R Core Team. 2019. R: a language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/.
- 47. Teufel F, Almagro Armenteros JJ, Johansen AR, Gíslason MH, Pihl SI, Tsirigos KD, Winther O, Brunak S, Von Heijne G, Nielsen H. 2022. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat Biotechnol 40:1023–1025. doi: 10.1038/s41587-021-01156-3.
- 48. Keck F, Rimet F, Bouchez A, Franc A. 2016. phylosignal: an R package to measure, test, and explore the phylogenetic signal. Ecol Evol 6:2774–2780. doi: 10.1002/ece3.2051.
- 49. Abouheif E. 1999. A method for testing the assumption of phylogenetic independence in comparative data. Evol Ecol Res 1:895–909.
- 50. Louca S, Doebeli M. 2018. Efficient comparative phylogenetics on large trees. Bioinformatics 34:1053–1055. doi: 10.1093/bioinformatics/btx701.
- 51. Racine JS. 2012. RStudio: a platform-independent IDE for R and Sweave. J Appl Econ 27:167–172. doi: 10.1002/jae.1278.
- 52. Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H. 2019. Welcome to the tidyverse. JOSS 4:1686. doi: 10.21105/joss.01686.
- 53. Letunic I, Bork P. 2019. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res 47:W256–W259. doi: 10.1093/nar/gkz239.
Supplementary Materials
FIG S1. Diversity of CAZyme families grouped by activity. Bars indicate the number of genes belonging to each CAZy family: the total number of CAZymes is shown in dark blue, and only the extracellular CAZymes (sequences with a signal peptide) are shown in light blue. Numbers under the figures indicate the percentage of extracellular CAZymes involved in the degradation of that carbohydrate. PDF file, 0.2 MB.
Copyright © 2022 López-Mondéjar et al.
This content is distributed under the terms of the Creative Commons Attribution 4.0 International license.
FIG S2. Occurrence of carbohydrate-active enzymes in archaeal phyla. Phyla containing at least one MAG are shown in bold. The columns and colors indicate, from left to right, (i) the number of MAGs in the phylum, (ii) the content of 7 CAZyme classes, (iii) the genomic potential for the degradation of selected carbohydrates, and (iv) the habitats from which the MAGs originated. The tree was obtained from the Genome Taxonomy Database (GTDB) (https://gtdb.ecogenomic.org/). PDF file, 0.2 MB.
FIG S3. Occurrence of carbohydrate-active enzymes in minor bacterial phyla. Numbers in brackets represent numbers of MAGs, and percentages indicate the share of MAGs within phyla containing CAZymes with each specific carbohydrate target. PDF file, 2.1 MB.
FIG S4. Phylogenetic conservation of carbohydrate utilization traits in microorganisms. The heatmap shows the mean genetic depth (τD) of the consensus clades sharing the ability to degrade different carbohydrates in the selected phyla (phyla including more than 20 retrieved MAGs are shown). Only significant values (P < 0.05) of the trait-phylogeny association are shown. PDF file, 0.2 MB.
TABLE S1. Significance of the phylogenetic distance of the genes involved in the utilization of different carbohydrates in several bacterial phyla. Abouheif's Cmean was used to assess the phylogenetic signal of quantitative variables (P values of <0.05 are considered statistically significant). Activities that are statistically significant in each phylum are marked in bold. DOCX file, 0.03 MB.
TABLE S2. Extent of the phylogenetic conservation of carbohydrate utilization in bacterial phyla based on the consenTRAIT algorithm. The table lists the mean genetic depth (τD) of the consensus clades sharing the ability to degrade a certain carbohydrate class, the significance (P value, 1,000 permutations) of the trait-phylogeny association, and the percentage of divergence in the 16S rRNA gene for significantly conserved traits. Traits and values in bold indicate significant trait conservation (P < 0.05). DOCX file, 0.04 MB.
FIG S5. Relative abundance of genes from the top 30 CAZyme families in selected habitats. We considered all CAZymes with assigned functions. Dots display the average relative abundance across all habitats. PDF file, 0.2 MB.
TABLE S3. The CAZyme families/subfamilies were grouped into 14 different classes or general activities according to the main enzymatic activities characterized in the family/subfamily. DOCX file, 0.04 MB.





