Applied and Environmental Microbiology. 2022 Nov 17;88(23):e01755-22. doi: 10.1128/aem.01755-22

Human Gut Metagenomes Encode Diverse GH156 Sialidases

Evan Mann a, Shahrokh Shekarriz b, Michael G Surette a,b
Editor: Danilo Ercolini c
PMCID: PMC9746317  PMID: 36394327

ABSTRACT

The intestinal lining is protected by a mucous barrier composed predominantly of complex carbohydrates. Gut microbes employ diverse glycoside hydrolases (GHs) to liberate mucosal sugars as a nutrient source to facilitate host colonization. Intensive catabolism of mucosal glycans, however, may contribute to barrier erosion, pathogen encroachment, and inflammation. Sialic acid is an acidic sugar featured at terminal positions of host glycans. Characterized sialidases from the microbiome belong to the GH33 family, according to CAZy (Carbohydrate-Active enZYmes Database). In 2018 a functional metagenomics screen using thermal spring DNA uncovered the founding member of the GH156 sialidase family, the presence of which has yet to be reported in the context of the human microbiome. A subset of GH156 sequences from the CAZy database containing key sialidase residues was used to build a hidden Markov model. HMMsearch against public databases revealed ~10× more putative GH156 sialidases than currently cataloged by CAZy. Represented phyla include Bacteroidota, Verrucomicrobiota, and Firmicutes_A from human microbiomes, all of which play notable roles in carbohydrate fermentation. Analyses of metagenomic data sets revealed that GH156s are frequently encoded in metagenomes, with a greater variety and abundance of GH156 genes observed in traditional hunter-gatherer or agriculturalist societies than in industrialized societies, particularly relative to individuals with inflammatory bowel disease (IBD). Nineteen GH156s were recombinantly expressed and assayed for sialidase activity. The five GH156 sialidases identified here share limited sequence identity to each other or the founding GH156 family member and are representative of a large subset of the family.

IMPORTANCE Sialic acids occupy terminal positions of human glycans where they act as receptors for microbes, toxins, and immune signaling molecules. Microbial enzymes that remove sialic acids, sialidases, are abundant in the human microbiome, where they may shape the microbiota community structure or contribute to pathology. Furthermore, sialidases have shown therapeutic potential in cancer treatment. Here, we examined the sequence space of a sialidase family of enzymes, GH156, previously unknown in the human gut environment. Our analyses suggest that human populations with disparate dietary practices harbor distinct varieties and abundances of GH156-encoding genes. Furthermore, we demonstrate the sialidase activity of 5 gut-derived GH156s. These results expand the diversity of sialidases that may contribute to host glycan degradation, and these sequences may have biotechnological or clinical utility.

KEYWORDS: sialic acid, sialidase, glycoside hydrolase, microbiome, inflammatory bowel disease, GH156, human microbiome

INTRODUCTION

The human colon houses trillions of bacterial cells from hundreds to thousands of distinct species representing several diverse phyla (1). This consortium, the microbiota, is critical for host immune system development, nutrient acquisition, and colonization resistance but has also been implicated in the onset of noninfectious chronic conditions (2, 3). Inflammatory bowel disease (IBD), including Crohn’s disease (CD) and ulcerative colitis (UC), is one such condition which is increasing in incidence and prevalence globally, particularly in postindustrialized societies (4). IBD is characterized as an uncontrolled and vigorous immune response to the gut microbiota which can occur in predisposed individuals. However, the direct roles that the microbiota play in IBD etiology have yet to be fully resolved (5, 6), and it has become apparent that the host glycome is a critical factor in IBD onset and progression (7–10).

Collectively, N- and O-glycans on host proteins play myriad roles, notably in cell-to-cell signaling, host-microbe interactions, and immune system regulation (11), and the enzymes that contribute to their degradation are recognized virulence factors (12–15). The colonic epithelium is protected from direct assault by a glycocalyx overlaid by a carbohydrate-rich mucous bilayer embedded with antimicrobial proteins (16). The mucous barrier is critical to gut health as evidenced by the observation that Muc2−/− and Muc2-glycosylation-impaired mice develop colitis spontaneously (17–20). While the inner mucous layer is sterile (21), the outer mucous layer is an extensively colonized microbial niche (16, 22, 23). Critically, the mucous layer is often thinner during inflammation, and bacteria encroach toward the epithelial surface where they are better positioned to induce an immune response (24, 25), akin to mice with genetically compromised mucous barriers.

Host glycans, composing the bulk of the mucus, offer a plentiful nutrient source for the microbes capable of depolymerizing them into constituent sugars (26–29), as well as those that scavenge the mucosal sugars released into the environment. Overall, host glycan consumption is essential to the maintenance and persistence of a stable microbiota capable of providing colonization resistance to enteric pathogens, and it has been reported that mucosal glycan structures, at least insofar as blood group antigen type and secretor status are concerned, can help select microbial community composition (30–35). However, the repertoire of glycans produced by the host (and accessible to the gut microbiota) is altered in response to inflammatory signals (36, 37), which act as complementary immunoregulatory signals (7, 8). These altered glycoforms effectively shift the mucosal nutrient availability, which may in turn alter the microbiota community structure. The rate of host-glycan turnover could be important for mucous layer integrity, with overconsumption of mucin glycans contributing to inflammation-associated barrier erosion. The microbiota of mice fed fiber-deficient diets increasingly relies on host glycans (38, 39), which compromises the mucous barrier and sensitizes the host to colitis (25, 40).

Bacterial catabolism of glycans occurs in a coordinated stepwise manner, with individual glycoside hydrolases (GHs) sequentially removing specific monosaccharides or oligosaccharides until completely metabolized within the cell (27, 41–43). GHs are classified within the CAZy database (http://www.cazy.org/) based on primary sequence similarity (44–47). There are presently >165 GH families defined in the CAZy database, with each member of a given family possessing a conserved three-dimensional fold and set of catalytic residues. Family membership does not strictly define substrate specificity (in terms of sugar and linkage types), but in many families reported activities are limited, depending on the size and sequence diversity of the family. Enzymes from several GH families have been demonstrated to contribute to host glycan breakdown (27, 41, 48–52), and new N- and O-glycan-targeting enzymes are being actively reported (53–56). Together, these factors hinder bioinformatic annotation of a given GH’s macromolecular target and a microbe’s carbohydrate preferences. However, grouping closely related family members into subfamilies, clades, or clusters provides more robust substrate predictions for distinct “monofunctional groups” (57–63).

Sialic acids occupy terminal positions of vertebrate N- and O-glycans predominantly via α2-3 and α2-6 linkages, and they are often the first components removed by microbiota GHs before the glycan core can be accessed (48). Sialic acid refers to a group of 9-carbon sugars characterized by a C1 carboxylic acid, an exocyclic 3-carbon side chain, and, often, (acetyl) modified amino and hydroxyl groups. The variety of distinguishing epitopes makes sialic acids ideal for information transfer, notably in immune signaling processes (e.g., by Siglecs), or use as receptors for enteric pathogens and toxins. The predominant sialic acid on human glycans is 5-(acetylamino)-3,5-dideoxy-d-glycero-α-d-galacto-non-2-ulopyranosonic acid (Neu5Ac) (64). Sialidases, the GHs that remove sialic acids, are important for colonization and are often considered virulence factors. Free sialic acid, released into the environment by commensals, is implicated in enteropathogen expansion due to Neu5Ac foraging (65–67). To date, the only gut bacterial sialidases implicated in N- and O-glycan breakdown belong to the CAZy family GH33 (48, 68, 69). GH33 enzymes display a six-blade β-propeller fold, and the active site includes a trio of Arg residues stabilizing the Neu5Ac carboxylate group, a Glu/Tyr/Asp catalytic triad, and a hydrophobic pocket to accommodate the amino-linked (acetyl) side chain at position 5.

The sialidase family GH156 was established in 2018 (70). The founding member of the family, subsequently designated EnvSia156, was identified with a functional metagenomics approach, using DNA isolated from a thermal spring that could not be traced back to a known taxon. This enzyme displayed α2-3 and α2-6 Neu5Ac and Neu5Gc sialidase activity on a variety of substrates, including glycoproteins, glycolipids, and sialolactose. Interestingly, GH156 catalyzes an inverting reaction, while family GH33 strictly contains retaining enzymes. Successive work revealed that EnvSia156 displays a (β/α)8-barrel fold, distinct from the six-blade β-propeller fold of GH33 family members, further illustrating the absence of shared phylogeny (71). Cocrystallization of the enzyme in complex with sialic acids strongly suggested candidates for catalytic residues. Here, we investigated the hypothesis that GH156s from human gut metagenomes contribute to the degradation of host sialosides.

RESULTS

Construction of a GH156 sialidase pHMM.

A profile hidden Markov model (pHMM) for the GH156 family was created to enable search of publicly available protein databases. To facilitate construction of the model, we leveraged the available structural information for EnvSia156 as well as the GH156 sequences cataloged in the CAZy database (44–46). The proposed catalytic site, based on cocrystal structures with products and substrate analogs, includes a catalytic D14-H134 dyad as well as a conserved pair of Arg residues (R129 and R202) and an Asn (N346) positioned to stabilize the carboxylate group (71). The 51 GH156 sequences in the CAZy database were aligned, and those sequences lacking the catalytic Asp-His dyad or carboxylate-coordinating Arg-Arg-Asn triad, or suitable (i.e., polar) substitutions, were pruned from the alignment. We reasoned that this would favor the identification of homologs that hydrolyze sugars with a C1 carboxylate group, such as sialic acids, over potential GH156 family members that may target neutral sugars. The 32 CAZy sequences that met these criteria were truncated to the boundaries of the EnvSia156 (β/α)8-barrel domain to construct the pHMM (72). A sequence logo of the pHMM depicts the high weight of the catalytic and carboxylate-binding positions in the model (Fig. 1A).
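The pruning step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the alignment column indices, and the set of residues treated as "polar" are all hypothetical, and the sketch assumes that the Asp-His dyad must be strictly conserved while the three carboxylate-binding positions may carry any polar residue.

```python
# Illustrative sketch only: names, column indices, and the POLAR set are
# assumptions for demonstration, not values taken from the study.
POLAR = set("STNQHKRDEYC")  # residues treated here as acceptable polar substitutions


def passes_filter(aligned_seq, catalytic_cols, binding_cols):
    """Return True if an aligned sequence keeps the Asp-His dyad and has
    polar residues at every carboxylate-binding column (gaps fail)."""
    asp_col, his_col = catalytic_cols
    if aligned_seq[asp_col] != "D" or aligned_seq[his_col] != "H":
        return False
    return all(aligned_seq[col] in POLAR for col in binding_cols)


def prune_alignment(alignment, catalytic_cols, binding_cols):
    """Keep only the sequences of a {name: aligned_sequence} dict that pass."""
    return {
        name: seq
        for name, seq in alignment.items()
        if passes_filter(seq, catalytic_cols, binding_cols)
    }
```

In practice the column indices would be read off the alignment positions that correspond to D14, H134, R129, R202, and N346 of EnvSia156, and the retained sequences would then be trimmed to the barrel domain before the model is built.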

FIG 1.


Diverse GH156s from distinct phyla were identified in global human gut metagenomes. (A) A sequence logo of the HMM used for database searches denoting the relative importance of proposed catalytic residues (red) and carboxylate-binding residues (blue), numbered based on the EnvSia156 sequence. (B) Taxonomic distribution of nonredundant GH156 proteins identified, highlighting the number of unique sequences from the human gut protein catalog. (C) Regional distribution of GH156-encoding UHGG MAGs, demonstrating that GH156s are globally distributed. (D) Sequence similarity network of nonredundant GH156 protein domain sequences (nodes) connected by edges indicating sequence identity of >45%; nodes in purple indicate the sequence is derived from the UHGP-95 catalog. Nodes clustered in the red circle are derived from commonly identified Bacteroidota genomes (see Table S3 in the supplemental material). The labeled nodes display sialidase activity (Fig. 5): GUT_GENOME011168_01282, GG_01282; GUT_GENOME049867_01444, GG_01444; GUT_GENOME096473_04400, GG_04400; GUT_GENOME246840_00663, GG_00663; and GUT_GENOME258514_02417, GG_02417. (E) Histogram of GH156 protein lengths. (F) Histogram of catalytic domain pairwise identity of GH156 family members. (G) Sequence conservation mapped to the surface of EnvSia156 (PDB 6S00) in complex with Neu5Ac (yellow sticks) showing high conservation (magenta) around the anomeric carbon and carboxylate and low conservation (cyan) around the exocyclic acetamido and glycerol groups.

Identification of GH156s in protein databases.

Our custom HMM was used to search three large public databases: UniProtKB’s over 225 million protein sequences (73), the Genomes from Earth’s Microbiomes protein catalog (GEM) consisting of over 5.7 million protein sequences derived from environmental metagenome sequencing projects (74), and the Unified Human Gastrointestinal Protein catalog (UHGP-95), consisting of 20.2 million nonredundant protein sequences translated from the Unified Human Gastrointestinal Genome database (UHGG) and clustered at 95% identity (75). GH156s were identified across diverse taxa, with 18 phyla represented. The most frequent phylum was Planctomycetes with 212 GH156 sequences exclusively from environmental sources (Fig. 1B). GH156s from the UHGP-95 catalog represent enzymes that may be active against human sialosides in the gastrointestinal tract. Ninety sequences were derived from the UHGP-95 catalog: 45 from Firmicutes_A, 30 from Verrucomicrobiota, and 14 from Bacteroidota species, as well as a single proteobacterium (Methylobacterium methylobacterium). The only Verrucomicrobiota member that could be assigned at the species level was Victivallis vadensis. Likewise, a single Firmicutes member was assigned at the species level: Mitsuokella jalaludinii. This reflects that this family of enzymes in the human microbiome is predominantly found in poorly studied, low-abundance, or rare taxa. Globally, over half of the GH156-containing metagenome-assembled genomes (MAGs) from UHGG were assigned to the same gene in various strains of Parabacteroides merdae, including 75% of those from North America.

Analysis of the geographic distribution of GH156-containing MAGs from UHGG suggests that GH156s are globally distributed (Fig. 1C). While assembled GH156-containing MAGs are primarily from Asia, Europe, and North America, less sampled continents are the source of a roughly proportional number of GH156-containing MAGs, relative to the total number of MAGs that were assembled from a given region. It is likely that additional sampling of these understudied regions will reveal further GH156 diversity.

To visually evaluate GH156 sequence diversity, we generated a series of sequence similarity networks (SSNs) using various pairwise identity cutoffs. The maximum number of clusters with at least 10 sequences was at a 45% identity threshold, where there are 12 clusters with at least 10 sequences (Fig. 1D), and clustering was not strictly along taxonomic lines (see Fig. S1A in the supplemental material). At >60% identity, an approximate benchmark for intracluster substrate uniformity for glycoside hydrolases (57, 58, 76), there are only four clusters with at least 10 sequences, and a preponderance of singletons, doublets, and triplets. Relatively few UHGP-95 sequences are in the same cluster as EnvSia156, providing limited support for a conserved function across the entire family. Collectively, the GH156s identified here share low sequence similarity, with a mean pairwise sequence identity (minimum 100-amino-acid [aa] alignment) of 28.2% (Fig. 1F). This degree of sequence heterogeneity likely reflects unequal global availability of gut microbiome sequence data but may also indicate that GH156s recognize diverse substrates. This is supported by sequence variability in the substrate binding pocket around sialic acid NAc and C7-C9 functional groups (Fig. 1G). However, the majority of Bacteroidota sequences formed a single cluster with high sequence similarity (Fig. 1D, red circle), and these likely share a function across the different species.
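The threshold scan described above amounts to counting connected components of a graph whose edges join sequences above a given pairwise identity. The following Python sketch shows that idea with a small union-find; the function names and the input format (a dict of pairwise identities keyed by sequence pairs) are assumptions for illustration, and the tool actually used to build the networks may differ.

```python
from collections import Counter


def ssn_clusters(nodes, pairwise_identity, threshold):
    """Return cluster sizes (largest first) for a network in which two
    sequences are linked when their percent identity exceeds `threshold`."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), identity in pairwise_identity.items():
        if identity > threshold:
            parent[find(a)] = find(b)

    sizes = Counter(find(n) for n in nodes)
    return sorted(sizes.values(), reverse=True)


def count_large_clusters(nodes, pairwise_identity, threshold, min_size=10):
    """Count clusters with at least `min_size` members at a given threshold."""
    sizes = ssn_clusters(nodes, pairwise_identity, threshold)
    return sum(1 for size in sizes if size >= min_size)
```

Calling `count_large_clusters` over a range of thresholds and taking the threshold with the highest count mirrors the selection of the 45% cutoff, under the assumption that cluster membership is defined by single linkage.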

GH156 enzymes are modular.

EnvSia156 possesses a C-terminal immunoglobulin-like domain, similar to some carbohydrate-binding modules, suggesting it likely plays a binding role. Indeed, most GH156s identified here were between 500 and 600 aa (Fig. 1E), consistent with a multidomain architecture with a carbohydrate-binding module. A subset of sequences were larger than 800 aa, and we therefore predicted that they might contain an additional enzymatic domain(s). The separate domains in modular GHs often act upon the same glycan. GH156s active against human sialosides would be expected to be associated with other modules with activities targeting linkages seen in human glycans. We therefore annotated the identified GH156s using dbCAN2 (77), which revealed several distinct architectures distributed throughout the SSN (Fig. S1B). Associated domains included those targeting sugars seen on human glycans, including potential fucosidases (GH141), β-galactosidases (GH16 [53] and GH165), mannosidases (GH92), sialidases (GH33), and O-acetylesterases (CE4 and CE15) (Table 1). CE4 family members are deacetylases with substrates including amino sugars, though none characterized to date target Neu5Ac. CE15s are sparsely studied but so far are uniformly glucuronyl O-methylesterases.

TABLE 1.

Domain arrangement of multifunctional GH156s

Identifier(s) | Architecture
GUT_GENOME011991_00921, GEM_PF-490474 | GH156-GH141
A0A1G1A1F0 | GH156-GH16
A0A1G3AKI5, A0A4V0XR61, GEM_PF-332583 | GH141-GH156
A0A2D5S979, A0A2E1KRC2, A0A2E7LA74, A0A2E8CZL7, A0A3L7UH35, GEM_PF-207205, GEM_PF-499370 | GH156-GH165
A0A356EQQ9 | GH33-GH156
GEM_PF-1731955 | GH156-CE4
GEM_PF-26113 | GH92-GH156
GEM_PF-354959 | GH156-GH33
GEM_PF-390790 | GH165-GH156
GEM_PF-598500 | GH156-GH33
GEM_PF-668270 | CE15-GH156

Genomes encoding GH156s also contain genes associated with degradation and metabolism of human glycans.

We reasoned that organisms that use GH156s to break down host glycans would be able to release and/or utilize other sugar units composing these glycans. We therefore searched representative GH156-encoding UHGG genomes, predominantly in the form of MAGs, for genes related to the hydrolysis and metabolism of sugars found in human N- and O-glycans (Table S1 and Fig. S2). Several factors limit the interpretability of these analyses. MAG completeness ranged from 52.72% to over 99% with a median of 90.53% (Table S2). Even MAGs with over 95% completeness may lack a quarter of the organism’s conserved core genes, such as those involved in sugar metabolism, and half of its variable accessory genes, such as those encoding GHs (78). Furthermore, homology-based annotation of protein functions by standard classifiers and default parameters is less reliable for distant relatives of commonly studied model organisms (i.e., Bacteroidota, Verrucomicrobiota, and Firmicutes_A lineages), as the sequences have had greater opportunity to diverge (79). To account for the incomplete nature of MAGs, we grouped GH156-containing genomes at the taxonomic family level and evaluated the proportion of genomes from the family to encode a given function (Fig. 2). After removal of families with fewer than 3 genomes, 2 Bacteroidota, 5 Firmicutes_A, and 3 Verrucomicrobiota families were analyzed.
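The family-level aggregation used to offset MAG incompleteness can be sketched in a few lines of Python. The function name and the input format (a list of family and annotated-function pairs) are hypothetical; only the cutoff of at least 3 genomes per family comes from the text.

```python
from collections import defaultdict


def family_function_proportions(genomes, min_genomes=3):
    """Compute, per taxonomic family, the fraction of genomes encoding each function.

    `genomes` is a list of (family, functions) pairs, where `functions` is an
    iterable of annotation labels (for example GH families or EC numbers).
    Families with fewer than `min_genomes` genomes are dropped.
    """
    by_family = defaultdict(list)
    for family, functions in genomes:
        by_family[family].append(set(functions))

    proportions = {}
    for family, members in by_family.items():
        if len(members) < min_genomes:
            continue
        all_functions = set().union(*members)
        proportions[family] = {
            function: sum(function in m for m in members) / len(members)
            for function in sorted(all_functions)
        }
    return proportions
```

Each returned fraction corresponds to the quantity that Fig. 2 encodes as opacity (percent of genomes in a family with a given gene).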

FIG 2.


Genes involved in host glycan monosaccharide hydrolysis and catabolism are frequently identified in GH156-encoding genomes. Genomes are grouped at the taxonomic family level, and the proportion of genomes encoding a function is represented by opacity (percent genomes with that gene = percent opacity). The families are grouped by phyla.

In general, GH156-encoding organisms appear to be capable of participating in N- and O-glycan catabolism by releasing and/or metabolizing the various monosaccharide constituents. Interestingly, all of these taxonomic families contain genes for GH33, which commonly display sialidase activity, potentially implying that these organisms exploit structurally distinct sialosides from diverse dietary and/or host sources. N-Acetylneuraminate lyase (EC 4.1.3.3) catalyzes the first committed step in sialic acid catabolism. Its gene was identified in MAGs from 7 of the families across all three phyla, and each of the families lacking an N-acetylneuraminate lyase gene encoded either sialate O-acetylesterase (EC 3.1.1.53), which liberates acetate from free or glycosidically linked sialic acid, or N-acylglucosamine 2-epimerase (EC 5.1.3.8), which participates in sialic acid and N-acetylmannosamine catabolism.

Genes encoding mannose-6-P isomerase (EC 5.3.1.8) were represented across all genomes, as well as at least one gene encoding a GH family member associated with mannose release from N-glycans (especially GH92 and GH130), excepting f__UBA644. Notably, none of the three f__UBA644 genomes annotated exceeded 90% completeness (Table S2), providing a plausible explanation for the absence of genes linked to mannose metabolism.

N-Acetyl-d-glucosamine/N-acetyl-d-galactosamine-6-phosphate deacetylase (EC 3.5.1.25) and glucosamine/galactosamine-6-phosphate deaminase (EC 3.5.99.6) play critical roles in GlcNAc and GalNAc degradation, and genes encoding these functions were identified in all families, except for f__UBA644. As well, several GH families were frequently observed across the genomes examined. Prominent examples include members of the exo-acting GH20 family, which are critical for HexNAc removal from host glycans, the GH18 family, which includes endo-acting β-N-acetyl glucosaminidases active against the N-glycan chitobiose core, and the α-GalNAc hydrolases GH109 and GH129, members of which have not been extensively studied, though representative substrates include the blood group A antigen and mucin GalNAc-α1-Ser/Thr (48).

The most common genes identified dedicated to galactose catabolism included those encoding aldose 1-epimerase (EC 5.1.3.3) of the Leloir pathway and galactonate dehydratase (EC 4.2.1.6). Commonly observed galactosidase-containing GH families include GH110, members of which have been implicated in group B blood antigen hydrolysis. Both fucosidase GH families with members active on host glycans, GH29 and GH95, were frequently detected, as were l-fucose degradation I pathway constituents. The Tannerellaceae genomes were notably devoid of fucose-metabolizing genes, though they encoded GH29 and GH95, suggesting that these organisms might release fucose into the environment to the benefit of cross-feeders. For example, Bacteroides thetaiotaomicron cannot metabolize Neu5Ac but uses sialidases to access the underlying glycan, the constituents of which it can utilize, and releases Neu5Ac in the process (56).

GH156-encoding genes are colocalized with genes encoding diverse GHs.

In both Gram-positive and Gram-negative taxa, genes encoding GHs are often located on the chromosome alongside other enzymes, transporters, and regulators involved in the saccharification of the same complex carbohydrate (27, 42). This polysaccharide utilization locus (PUL) architecture has been observed for the B. thetaiotaomicron breakdown of N- and O-glycans (80). We reasoned that colocalization of GH156-encoding genes with several genes encoding GHs assigned to families known to contribute to host glycan degradation (such as those in Fig. 2) would provide support for the hypothesis that GH156s contribute to host sialoglycan degradation. To this end, we annotated the GH-encoding genes 20 kb up- and downstream from the GH156-encoding genes (Fig. 3 and Fig. S3). While many GH156-encoding genes are colocalized with at least one other GH family gene potentially active against host glycans, none of the clusters examined display a complete complement of modules required for comprehensive degradation of a host glycan. Some Bacteroidota GH156s are encoded adjacent to and in the same orientation as GH92 genes. GH92s include α1-2, 3, and 6 mannosidases, and B. thetaiotaomicron GH92s have been implicated in host N-glycan breakdown (56). Other Bacteroidota GH families include GH141 (l-fucosidases) and GH106 (l-rhamnosidases). Genes in proximity to Firmicutes_A GH156s encode putative galactosidases (GH2 and GH16), endo-β-N-acetylglucosaminidases (GH163), exo-α-N-acetylgalactosaminidases (GH109), sialidases (GH33), fucosidases (GH95), and GH123, a family that includes a β-N-acetylgalactosaminidase active against host glycolipids. Notable GH families encoded alongside Verrucomicrobiota GH156 genes include exo-α-N-acetylgalactosaminidases (GH109), exo-α-galactosidases (GH110), and mannosidases (GH130).
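The neighborhood scan of 20 kb on either side of each GH156-encoding gene can be sketched as a simple coordinate filter. This Python example is illustrative only: the function name and the gene record layout are hypothetical, and it assumes that a gene counts as a neighbor when any part of it overlaps the flanking window on the same contig.

```python
def neighbors_within(genes, target_id, flank=20_000):
    """Return ids of genes on the target's contig that overlap the window
    extending `flank` bp upstream and downstream of the target gene.

    Each gene is a dict with keys "id", "contig", "start", and "end"
    (coordinates in bp, start <= end).
    """
    target = next(g for g in genes if g["id"] == target_id)
    window_start = target["start"] - flank
    window_end = target["end"] + flank
    return [
        g["id"]
        for g in genes
        if g["id"] != target_id
        and g["contig"] == target["contig"]
        and g["end"] >= window_start
        and g["start"] <= window_end
    ]
```

The returned gene ids could then be cross-referenced with their CAZy family annotations to summarize which GH families sit near each GH156 gene.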

FIG 3.


Representative genomic neighborhoods of GH156-encoding genes from the human gut microbiome. Organization of genes colocalized with UHGP-95-derived GH156-encoding genes from genes frequently occurring in metagenomes (Fig. 4) or with sialidase activity (Fig. 5) (or larger gene clusters with shared synteny [see Fig. S3 in the supplemental material]).

GH156 genes are more diverse and abundant in traditional populations than in Western populations.

To investigate the distribution of the most common GH156 genes from the human microbiome, we used publicly available metagenomic data on distinct human populations. Reads from three studies (81–83) were mapped against GH156 domain nucleotide sequences. Metagenomic reads were obtained from IBD (CD or UC) patients or non-IBD controls from industrialized societies (Italy, Oklahoma, and Massachusetts) as well as from traditional hunter-gatherer (Hadza and Matses) or agriculturalist (Tunapuco) communities. Here, we designate the Hadza, Matses, and Tunapuco populations as “traditional,” as their cultural dietary practices are suspected to more closely resemble those of ancestral societies than those of most Western “industrialized” societies that rely on large-scale and mechanized food processing. We recognize that the Hadza, Matses, and Tunapuco are culturally distinct, and we do not intend to imply that their own societies have been free of innovation, technological or otherwise. Relative to Western industrialized societies, traditional lifestyles are typically correlated with a wider variety and greater abundance of dietary complex carbohydrates (from wild plants) (82–84) and are believed to be associated with lower incidences of IBD (85). The Hadza metagenome encodes a correspondingly greater variety and abundance of carbohydrate-active enzymes (CAZymes) than do metagenomes of industrialized populations (84). Consistent with this, traditional forager and agriculturalist metagenomes contained a greater number of unique GH156-encoding genes (Fig. 4C). In turn, individuals with IBD had fewer such genes than healthy Western controls. The assortment of GH156s found in traditionalist metagenomes was distinct from those of Western individuals (Fig. 4A and B). While GH156 genes from Verrucomicrobiota are absent from CD and UC patients and most healthy controls, they are prevalent in the hunter-gatherer/farmer groups.
The Firmicutes_A GH156 genes are also seen more frequently in the traditional-lifestyle groups than in the IBD groups, consistent with typical IBD dysbiosis of reduction in Firmicutes and Verrucomicrobiota and expansion of facultative anaerobes and Actinobacteria (86), which do not contain GH156 genes. Conversely, a single Bacteroidota GH156 from a Prevotella species is detected in traditional forager/agriculturalist populations, while the postindustrialized groups largely contain GH156s from a mix of Parabacteroides species. The “Bacteroidota” cluster (red circle, Fig. 4B) contains the four most frequently detected GH156s (Table S3). This includes the abovementioned Prevotella GH156 as well as the GH156 from P. merdae, which is one of 57 species in >90% of human gut microbiomes sampled globally (87). Two of the most prevalent GH156-encoding genes in traditional populations cluster along with EnvSia156 (Fig. 4B). The third frequently observed traditionalist GH156-encoding gene is in the Bacteroidota cluster along with all the GH156-encoding genes detected in Westernized populations with high frequency. The abundance of a given GH156 gene in a sample is estimated by the number of reads mapped to that gene. Hadza and Tunapuco metagenomes contained a greater abundance of GH156-encoding genes than the other groups examined (Fig. 4D), consistent with the prevalence observed for other GH families (82–84).
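The two per-sample summaries discussed above, the number of distinct GH156 genes detected and the number of reads mapped to GH156 genes, can be derived from a table of mapped read counts as in the following Python sketch. The function name, the input layout, and the `min_reads` detection cutoff are hypothetical, and no normalization for sequencing depth or gene length is applied here.

```python
def gh156_richness_and_abundance(read_counts, min_reads=1):
    """Summarize GH156 detection per sample.

    `read_counts` maps sample -> {gene: number of mapped reads}. A gene is
    treated as detected when it has at least `min_reads` mapped reads.
    Returns sample -> (number of distinct detected genes, total mapped reads).
    """
    summary = {}
    for sample, counts in read_counts.items():
        detected = {gene: n for gene, n in counts.items() if n >= min_reads}
        summary[sample] = (len(detected), sum(detected.values()))
    return summary
```

The first element of each tuple corresponds to the kind of count plotted in Fig. 4C and the second to that in Fig. 4D, assuming raw mapped reads are compared directly between samples.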

FIG 4.


A greater variety and abundance of GH156s are associated with the metagenomes of individuals practicing traditional lifestyles than with those of individuals from industrialized societies or with IBD. (A) GH156s detected in each sample and their relative abundance: H, Hadza; M, Matses; T, Tunapuco; Bact, Bacteroidota; Firm, Firmicutes_A; Verr, Verrucomicrobiota. The 10 most prevalent GH156s are labeled. (B) GH156 SSN (Fig. 1D) colored to indicate the group in which a given gene is the most prevalent. The size is proportional to the proportion of group members with a given gene. (C and D) Number of distinct GH156s detected in each sample (C) and number of reads mapped to a GH156 gene in each sample (D). The asterisk denotes a P value less than 0.05.

Diverse intestinal GH156s exhibit sialidase activity.

To facilitate molecular cloning of genes encoding GH156s from the human intestinal tract, we mapped metagenomic raw reads from healthy volunteer fecal samples to the GH156 gene sequences as previously described. A total of 19 GH156-encoding genes were cloned into expression plasmids for recombinant protein expression and subsequent activity assays (Table S5). To evaluate sialidase activity, lysates from Escherichia coli expressing polyhistidine-tagged recombinant GH156s were incubated with the fluorogenic substrate 2′-(4-methylumbelliferyl)-α-d-N-acetylneuraminic acid (4MU-Neu5Ac) for 30 min at 37°C. A significant increase in fluorescence (due to hydrolysis of 4MU-Neu5Ac) relative to the control suggests that 5 of these are sialidases (Fig. 5): GG_01282 (GUT_GENOME011168_01282), GG_01444 (GUT_GENOME049867_01444), GG_04400 (GUT_GENOME096473_04400), GG_00663 (GUT_GENOME246840_00663), and GG_02417 (GUT_GENOME258514_02417). These 5 enzymes are distributed across four distinct SSN clusters, including the Bacteroidota cluster and two others previously without a verified sialidase (Fig. S4). Global pairwise alignment of active GH156 sialidase domains showed relatively low sequence identity, in the range of 16 to 37% (Fig. 1D), except for GUT_GENOME096473_04400 and GUT_GENOME246840_00663 from the Bacteroidota SSN cluster, which shared 60% identity. GUT_GENOME246840_00663 was the most commonly detected GH156 across all metagenomes examined, specifically in IBD or control subjects (Table S3). As well, GUT_GENOME011168_01282 and GUT_GENOME049867_01444 were frequently observed in metagenomes from subjects practicing traditional lifestyles.

FIG 5.


Diverse GH156s encoded in the gut microbiomes display sialidase activity. A fluorescence-based endpoint assay using recombinant enzymes and 4MU-Neu5Ac as a substrate revealed 5 recombinantly expressed GH156s are sialidases. Error bars indicate standard error across technical triplicates. AFU, arbitrary fluorescence unit. The lower panels, Western blots using primary anti-polyhistidine-tag antibodies, display expression of soluble recombinant polyhistidine-tagged GH156s. Numbers at left of blots are molecular masses in kilodaltons. The “Empty Vector” is pET28b(+).

DISCUSSION

GHs active on host-produced glycans play important roles in the gut environment and impact microbiota community structure and virulence. This is particularly true for sialidases, since sialic acids occupy terminal positions and can act as receptors for microorganisms and immune cells, are an abundant and accessible nutrient source, and contribute to the physicochemical properties of mucus. To this end, we are interested in characterizing novel sialidases. We used a structure-informed homology search of large protein databases to search for members of the newly described GH156 family. At the time of writing, the CAZy database catalogs 51 members of this family, only one of which has been biochemically characterized. By including protein sequences from metagenome data sets in our search, we were able to identify ~10-fold more putative family members that can plausibly act on sialic acid, given homology to EnvSia156 and the presence of critical Neu5Ac-binding and catalytic residues. This includes 90 unique protein sequences from human fecal samples.

The proposed GH156s are largely derived from poorly studied phyla, including Planctomycetes, Verrucomicrobiota, Firmicutes_A, and Bacteroidota (Fig. 1B), a potential indication as to why it was the ~160th GH family described. These taxa remain difficult to isolate; indeed, P. merdae, Parabacteroides goldsteinii, Parabacteroides johnsonii, Alistipes_A indistinctus, and V. vadensis were the only isolate whole-genome sequences to contain a GH156, and the identification of so many GH156s was possible only due to deep sequencing and MAG assembly. The tremendous abundance of GH genes in Bacteroidota genomes is well established and has been noted for the Verrucomicrobiota gut species Akkermansia muciniphila and V. vadensis (27, 43). Planctomycetes are important degraders of complex carbohydrates in a great variety of soil and aquatic environments, both extreme and otherwise, suggesting many of the identified GH156s likely target plant glycans (88). Comparative genomics studies of Planctomycetes have revealed these genomes often contain dozens of genes encoding close and distant homologs of CAZymes, suggesting this phylum may be an abundant repository of novel GH families with activities that evolved for their varied niches (89). The GH156 sequences we identified were quite heterogeneous (Fig. 1D and E), likely because only a small fraction of extant sequences have been identified and deposited in public databases. Assembly of genomes of gut microbes from undersampled human populations (Fig. 1C) as well as samples of other animals and environments will go a long way to filling in the gaps in order to build a more informative SSN with monofunctional clusters (57, 58).

We demonstrated sialidase activity for 5 GH156 enzymes distributed throughout 4 distinct SSN clusters, 3 GH156s of which were frequently observed in metagenome samples (Fig. 1D and Fig. 5). This suggests that the activity is widely distributed across phylogenetically distant family members. Unfortunately, 14 of the expressed GH156s did not liberate Neu5Ac from 4MU under any of the conditions tested; however, Neu5Ac substrate specificity for these family members cannot be ruled out, and absence of activity might be explained by, for example, improper protein folding. Furthermore, the sialidase GUT_GENOME_246840_00663 shares 94% identity with the inactive protein GUT_GENOME_123491_00908. At such a level of similarity, a shared substrate specificity is highly probable, though not guaranteed. However, this indicates that generation of a higher-similarity-threshold SSN would not segregate GH156s that displayed sialidase activity from those that did not.

A possible implication of the sequence diversity of GH156s identified is a commensurate intrafamily substrate diversity. This is consistent with the taxonomic profile of this family, suggesting that GH156s are active against complex carbohydrates present in various environments, and would at least partially explain why not all GH156s that we tested were active. Though it is generally used to refer to Neu5Ac, “sialic acid” is a generic name given to a large group of nine-carbon acidic sugars (nonulosonic acids), any of which could potentially be targeted by a given GH156 (64, 90). Common examples of sialic acids include N-glycolylneuraminic acid (Neu5Gc), 2-keto-3-deoxy-d-glycero-d-galacto-nononic acid (Kdn), pseudaminic acid, and legionaminic acid. Smaller ulosonic acids, including 3-deoxy-d-manno-oct-2-ulosonic acid (Kdo) and 2-keto-3-deoxy-d-lyxo-heptulosaric acid (Dha), are additional candidate substrates, and GH33 family members target these sugars as well (91). Indeed, the EnvSia156 residues in proximity to the Neu5Ac acyclic functional groups are variable among family members (Fig. 1G). The structure of the founding member shows the C5 position extends away from the enzyme, rather than in a hydrophobic pocket. The sialic acid acyclic glycerol group (C7-C9) makes extensive contact with the binding site. Taken together, these observations make it likely that EnvSia156 homologs have binding-site architectures that accommodate different ulosonic acid substrates. Unfortunately, fluorogenic substrates are not readily available to test these various activities, complicating biochemical assays.

Generally, organisms that contain GH156 genes also contain genes that may be involved in the liberation and catabolism of a variety of sugars that compose human N- and O-glycans (Fig. 2). This offers some indirect evidence that the organisms could be mucus degraders, though direct experimental evidence is required to resolve this. Assigning glycan substrates is especially fraught given the polyfunctionality of well-described CAZy families (e.g., GH16), and the poor characterization of other families with critical mucolytic activities (e.g., GH101 and GH129). Unfortunately, genomic investigation at the level of the GH156 gene cluster was not able to provide more definitive insights into the matter by “guilt-by-association” with a set of families known to act on host glycans (Fig. 3). The GH156-encoding genes were often colocalized with genes encoding hypothetical proteins. These unannotated genes may represent novel CAZyme families. One possibility is that some of these genes encode enzymes with functions related to host glycan degradation that could not be assigned with current resources. Interestingly, GH156s were often associated with GH141s, either as separate proteins encoded in proximity to GH156 genetic loci or as separate domains in the same GH156 protein. One of the two GH141s that has been studied displayed fucosidase activity on a plant cell wall substrate, but it is possible that GH141s are also active against host fucoglycans. Furthermore, gaps in sugar catabolic pathways could be due in part to the phyla under investigation being poorly described and only distantly related to model organisms in which the pathway functions were originally discerned. Interestingly, many organisms that carry a GH156 gene also carry a GH33 gene. These genes may be redundant, recognize different ulosonic acid substrates, or recognize the same sugar in different macromolecular contexts (e.g., linkage type or specific aglycone). A reverse genetics approach monitoring upregulation of GH156s in isolates growing on host glycans can help resolve glycan substrates (56, 91).

Although relative to Firmicutes_A and Verrucomicrobiota, Bacteroidota encode few discrete GH156s (Fig. 1B), these enzymes are the most prevalent in the gut metagenomes examined (Fig. 4C). Notably, only Bacteroidota GH156s could be detected in multiple IBD samples, and Verrucomicrobiota GH156s were rarely detected. Relative to samples from industrialized societies, the Hadza, Matses, and Tunapuco gut metagenomes encoded a larger variety of GH156s from Firmicutes_A and Verrucomicrobiota taxa and a single Bacteroidota GH156 from an unclassified Prevotella species. The most commonly observed GH156s in these groups were clustered with EnvSia156, though these enzymes were completely absent from Westernized microbiomes. Traditional lifestyles are often associated with greater gut microbiota diversity, and this has been credited to the consumption of a greater variety and abundance of plant material. In this light, it seems likely that the GH156s harbored in their microbiomes would contribute to plant fiber degradation or the organisms are generalists that can switch between host and dietary glycans. Here, we were able to demonstrate the Neu5Ac sialidase activity of a GH156 commonly encoded in the gut metagenomes of individuals living traditional lifestyles.

In vivo targeting of Neu5Ac is a promising avenue for cancer therapies (92, 93). Cancer cells often hypersialylate the cell surface to evade the immune system via Siglec binding. Bacterial sialidase-antibody conjugates have shown efficacy in proof-of-concept experiments, making them promising therapeutics (94). Development of this technology involved screening a number of different GH33 sialidases to choose one with optimal activity against host glycans, demonstrating that cataloging the activities of representative sequences across the sialidase sequence space is critical for identifying enzymes of biotechnological importance. The GH156s identified here offer an additional pool of candidates with potential host sialoglycan activity. Of particular interest are the GH156s from the human gut, since there is an increased probability they would be active against host sialic acids relative to candidates from other environments. This rationale had previously led to the discovery of blood group-specific GHs from the human gut using a functional metagenomics screen (95). Functional screens using random fragment libraries and rational selection of putative GH sequences from genomes and metagenomes are both important strategies for uncovering functional novelty (96). The GH156 sequence space remains almost entirely unexplored and promises to offer new biochemical activities.

MATERIALS AND METHODS

Bioinformatics methods.

HmmerBuild (v3.1b2) (72) was used with default settings to construct a profile hidden Markov model (pHMM) from a multiple-sequence alignment of 32 GH156 sequences from the CAZy database (March 2020) that had a full complement of catalytic residues and were truncated to the boundaries of the EnvSia156 (β/α)8-barrel domain. The alignment was made using MAFFT (v7.471) G-INS-I (97). The Skylign webserver was used to construct the sequence logo (98). HmmerSearch was used for database searching. Hits were manually inspected to ensure the presence of at least four of the five highlighted active site residues (accepting the substitution of a single one of these for a different polar amino acid). CD-HIT (v4.8.1) was used with default settings and a sequence identity cutoff threshold of 0.95 to remove redundant sequences (99, 100). The resulting 556 nonredundant putative GH156 sialidase sequences were trimmed to the domain boundaries based on the pHMM alignment for subsequent bioinformatic analyses. Sequence similarity networks were calculated using the SSN tool developed by the Enzyme Function Initiative (101–104), with a supplied multi-fasta file and an initial alignment score of 10. SSNs with edge thresholds from 40 to 60% identity (alignment length, >100 aa) in 5% increments were evaluated. Edges were filtered, and the SSNs were visualized using Cytoscape (v3.8.2) (105). To depict sequence conservation, the EnvSia156 6S00 structure was colored by sequence conservation using Chimera v1.14 (106). The sequence alignment to facilitate this was generated by predicting the 3-dimensional structure of 79 UHGP-95 GH156s using AlphaFold2 with ColabFold MMseqs2 v1.3 (107, 108) and generating a multiple structural alignment along with PDB 6S00 using mTM-align (109).
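The edge-threshold scan described above (40 to 60% identity in 5% increments, alignment length >100 aa) can be illustrated with a minimal sketch. This is not the Enzyme Function Initiative tool's implementation; the `Edge` record, the helper names, the toy values, and the choice of an inclusive identity cutoff are all assumptions made for illustration.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    a: str            # first sequence ID (hypothetical)
    b: str            # second sequence ID (hypothetical)
    identity: float   # percent identity of the pairwise alignment
    aln_len: int      # alignment length in amino acids

def filter_edges(edges, min_identity, min_aln_len=100):
    """Keep edges with identity >= min_identity and alignment length > min_aln_len."""
    return [e for e in edges
            if e.identity >= min_identity and e.aln_len > min_aln_len]

def threshold_scan(edges, start=40, stop=60, step=5):
    """Map each identity threshold (start..stop inclusive) to the edges it retains."""
    return {t: filter_edges(edges, t) for t in range(start, stop + 1, step)}

# Toy edges with made-up values, for illustration only.
toy_edges = [
    Edge("seqA", "seqB", 62.0, 310),
    Edge("seqA", "seqC", 47.5, 280),
    Edge("seqB", "seqC", 41.0, 95),   # alignment too short: never retained
    Edge("seqC", "seqD", 38.0, 300),  # below 40% identity: never retained
]
scan = threshold_scan(toy_edges)
for threshold, kept in scan.items():
    print(threshold, [(e.a, e.b) for e in kept])
```

Comparing the retained edge sets across thresholds is one way to see where clusters begin to split apart.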

Publicly available metagenome data sets PRJNA400072 (88 CD, 76 UC, 56 control) (81), PRJNA268964 (24 Matses, 12 Tunapuco, 22 control) (83), and PRJNA278393 (27 Hadza, 11 control) (82) were downloaded from the Sequence Read Archive (SRA) using Entrez Direct. We then mapped raw reads to GH156 genes truncated to the GH156 domain boundaries using BWA-MEM (110). Mapped reads were sorted using SAMtools view (111), and the number of mapped reads was identified via SAMtools coverage. The proportion of bases covered for a given GH156 sequence with reads mapped from a single sample ranged from 1.5% to 100% (see Fig. S5A in the supplemental material). A majority of sequences with very low coverage were derived from databases other than UHGP-95. Given that the reads were from gut metagenomes, we reasoned that most GH156s present in a sample would be from UHGP-95. Therefore, these low-coverage genes were likely false positives, but some genes derived from nongut databases with almost complete coverage were likely authentic and represent a gap in the UHGP-95 catalog. To help distinguish those genes likely present in a sample from those likely absent, at various coverage detection thresholds we calculated the proportion of detected sequences across all samples that were derived from UHGP-95 (Fig. S5B). We sought to maximize the proportion of UHGP-95-derived genes while minimizing the stringency of the threshold. At a coverage threshold of 275 bp (a local maximum), 96.4% of the GH156 sequences detected were from UHGP-95. We therefore filtered our data to remove hits with less than 275 bp of coverage in each sample. The two remaining non-UHGP-95 genes detected in at least one sample included PWL98936.1 from UniProt and GEM-PF_16266 from GEM, which are both derived from human fecal metagenomes, providing strong support that all hits included in the analyses are truly present in the sample.
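The threshold selection described above can be sketched as a scan over candidate coverage cutoffs, computing at each cutoff the fraction of detected sequences that derive from UHGP-95. The function name, the `(covered_bp, source_db)` tuple layout, and the toy hits below are hypothetical; the sketch assumes that a hit counts as detected when its covered length is at least the cutoff.

```python
def uhgp_fraction_by_threshold(hits, thresholds):
    """For each coverage cutoff, return the fraction of detected hits from UHGP-95.

    hits: iterable of (covered_bp, source_db) tuples.
    Returns None for a cutoff at which nothing is detected.
    """
    hits = list(hits)
    fractions = {}
    for t in thresholds:
        detected = [src for bp, src in hits if bp >= t]
        if detected:
            fractions[t] = sum(src == "UHGP-95" for src in detected) / len(detected)
        else:
            fractions[t] = None
    return fractions

# Toy hits with made-up values, for illustration only.
toy_hits = [
    (50, "GEM"), (120, "UniProt"), (200, "UHGP-95"),
    (300, "UHGP-95"), (900, "UHGP-95"), (1000, "GEM"),
]
fractions = uhgp_fraction_by_threshold(toy_hits, [0, 100, 275])
for cutoff, frac in fractions.items():
    print(cutoff, frac)
```

In practice one would plot these fractions against the cutoff and look for a point, such as a local maximum, beyond which additional stringency gains little.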

Read counts were adjusted to account for variability in gene length and per-sample read depth using the formula: number of reads × (maximum GH156 gene length/mapped GH156 gene length) × (maximum number of reads in all samples/number of reads in sample). Samples with adjusted read counts more than 3 standard deviations above the mean were removed from the analysis.
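The adjustment formula and outlier filter above can be written out as a short sketch. The function and variable names are hypothetical, and the use of the sample standard deviation (rather than the population form) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, stdev

def adjust_count(reads, gene_len, max_gene_len, sample_depth, max_depth):
    """reads x (max gene length / gene length) x (max sample depth / sample depth)."""
    return reads * (max_gene_len / gene_len) * (max_depth / sample_depth)

def drop_high_outliers(values, n_sd=3.0):
    """Remove values more than n_sd standard deviations above the mean."""
    cutoff = mean(values) + n_sd * stdev(values)
    return [v for v in values if v <= cutoff]

# Worked example with made-up numbers: a gene half the maximum length,
# in a sample sequenced to half the maximum depth.
adjusted = adjust_count(reads=10, gene_len=500, max_gene_len=1000,
                        sample_depth=2_000_000, max_depth=4_000_000)
print(adjusted)

# Toy adjusted counts: nineteen typical samples and one extreme sample.
toy_counts = [1.0] * 19 + [100.0]
kept = drop_high_outliers(toy_counts)
print(len(kept))
```

Scaling up to the maximum gene length and maximum depth, rather than down to a unit length, keeps adjusted values on the same order as raw read counts.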

GH156 accessory domains were predicted using the dbCAN2 metaserver (77). UHGG genomes (Table S2) were annotated with DRAM (112) to retrieve EC numbers (Table S1) and with the dbCAN2 metaserver to assign CAZyme families.

Genes on the same contig 20 kb up- or downstream of GH156 were searched for CAZymes using the dbCAN2 metaserver. Clusters were visualized with the R package gggenomes v.0.9.5.9 (https://github.com/thackl/gggenomes). The Minimap2 software was used to detect synteny (113).
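Selecting the gene neighborhood described above amounts to a window query on contig coordinates. The sketch below is illustrative only: the `Gene` record and `neighbors` helper are hypothetical, and it assumes that any gene overlapping the 20-kb flanking window counts as a neighbor.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    gene_id: str
    contig: str
    start: int   # 1-based start coordinate on the contig
    end: int     # end coordinate on the contig

def neighbors(target, genes, window=20_000):
    """Genes on the target's contig that overlap [start - window, end + window]."""
    lo, hi = target.start - window, target.end + window
    return [g for g in genes
            if g.contig == target.contig
            and g.gene_id != target.gene_id
            and g.end >= lo and g.start <= hi]

# Toy coordinates, for illustration only.
gh156 = Gene("gh156", "contig1", 30_000, 32_000)
toy_genes = [
    Gene("g1", "contig1", 5_000, 9_000),     # ends before the window
    Gene("g2", "contig1", 9_500, 11_000),    # overlaps the upstream edge
    gh156,
    Gene("g3", "contig1", 50_000, 51_500),   # inside the downstream window
    Gene("g4", "contig1", 60_000, 61_000),   # beyond the window
    Gene("g5", "contig2", 30_500, 31_000),   # different contig
]
nearby = neighbors(gh156, toy_genes)
print([g.gene_id for g in nearby])
```

The resulting gene list is what would then be passed to a CAZyme annotation step.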

Statistical methods.

Statistical analysis and visualization were performed with R v.4.0.3 (114) with the Tidyverse package (115). Significant differences were determined using the rstatix package v.0.5.0 Tukey honestly significant difference (Tukey HSD) function.

Bacterial strains and growth conditions.

Bacterial cultures were grown at 37°C with aeration in lysogeny broth (LB) supplemented with 50 μg mL−1 kanamycin. E. coli XL1-Blue {recA1 endA1 gyrA96 thi-1 hsdR17 supE44 relA1 lac [F′ proAB lacIqZΔM15 Tn10 (Tetr)]} (Stratagene) was used for general cloning, and E. coli BL21-CodonPlus(DE3) {F− ompT hsdS(rB− mB−) dcm+ Tetr gal λ(DE3) endA [argU proL Camr] [argU ileY leuW Strep/Specr]} (Stratagene) was used for recombinant protein expression.

DNA methods.

Custom oligonucleotide primers were obtained from Integrated DNA Technologies (Table 2). PCR amplification was performed using Phusion polymerase (New England Biolabs). Template material was metagenomic DNA from feces, except for GC_352 and GG_1309, which were DNA purified from an isolate of V. vadensis and a plasmid synthesized by Integrated DNA Technologies, respectively. The Monarch DNA gel extraction and Monarch PCR and DNA cleanup kits (New England Biolabs) were used to clean PCR products according to the manufacturer’s instructions. PCR amplicons were ligated into a kanamycin resistance vector, pET28b(+) or pET29b(+) (Novagen), using T4 ligase (New England Biolabs) following restriction endonuclease digestions (NheI, XhoI, and HindIII; New England Biolabs) and PCR cleanup to generate expression plasmids (Table S4).

TABLE 2.

Oligonucleotide primers used in generation of plasmids

Primer Sequencea
EJM001 5′-gatccatATGAAATCACTTATAATTAACTATTTC-3′
EJM004 5′-gatcaagcttGAAATACCCGTGCGTTTC-3′
EJM007 5′-gatccatATGAAAATAAGAAAACATCTTTGTTC-3′
EJM008 5′-gatcgaattcTTAAAAATACCCATGCGTTTCC-3′
EJM011 5′-gatcgctagcATGCGACAGGGCATCATC-3′
EJM012 5′-gatcaagcttTTACTCGAATACGCGCGCG-3′
EJM005 5′-gatccatATGAAAAGAGACTTCGCAAGCC-3′
EJM006 5′-gatcaagcttAAAGTAGCCGTTCGTTTCG-3′
EJM114 5′-gatcgctagcGAATTTTTCGAGTTGACAG-3′
EJM115 5′-gatcctcgagTTACAGTTCTATCCAAACAAG-3′
EJM122 5′-TTGCCTATTTGAAGAGATTTCGTCG-3′
EJM123 5′-TTGCCTATTTGAAGAGATTTCGTCG-3′
EJM088 5′-gatcgctagcTTTTTTGAAAAAATTTTTTTCTATTG-3′
EJM089 5′-gatcctcgagTCAGCCGGCATCGACAAAAATC-3′
EJM112 5′-gatcgctagcAACCGCCCTTCCGTGC-3′
EJM113 5′-gatcctcgagTCAGCGAACGGCCAAATC-3′
EJM092 5′-gatcgctagcAAGCAGAAAGAACCCG-3′
EJM093 5′-gatcctcgagTTACAGTTCTATCCAAACAAG-3′
EJM126 5′-AGCGCTTATAGTTACAACGTTTGTC-3′
EJM127 5′-TTCATTTCTGCGGTCTACATTTACG-3′
EJM100 5′-gatcgctagcAAAGGATCGAGTTCTATGTC-3′
EJM101 5′-gatcctcgagTTATCTCGGATTCTTAGGC-3′
EJM136 5′-CTATGCCGAATTTTAGGGGTTGATC-3′
EJM137 5′-ACCATGTTTTCCCACCTTTATAACG-3′
EJM096 5′-gatcgctagcAAAAAGTACGTTTTTTTCAATG-3′
EJM097 5′-gatcaagcttTCACTTGTTTTGAGCAATCTG-3′
EJM086 5′-gatcgctagcAACCGCCCTTCCG-3′
EJM087 5′-gatcctcgagTCAGCGAACGGCCAAATC-3′
EJM098 5′-gatcgctagcCGAGTAATCTTTAACGAGG-3′
EJM099 5′-gatcctcgagTCATTCTTTCATCCCGCAG-3′
EJM104 5′-gatcgctagcAGAGTCATTTTCAACGAAGATAAC-3′
EJM105 5′-gatcctcgagCTATTTGGCAGGCCGTTC-3′
EJM132 5′-GCTATACCTGATTCTCAAAAGCGTC-3′
EJM133 5′-CGTAACAGGCGATTCAACAATTTTC-3′
EJM102 5′-gatcgctagcAACCGCAGAGAGATGCTG-3′
EJM103 5′-gatcctcgagCTATTTTTTCAGAAAATCGTTG-3′
EJM142 5′-GAGGAAGATCGACGGATGTTTTATC-3′
EJM143 5′-AAATAAACACTTTCCGCGAGACAAG-3′
EJM140 5′-gatcgctagcAAAAAAGCATTTGTTTTGTATTTAAC-3′
EJM141 5′-gatcgagctcTTAATTTGCCATTTCTAACGGCAC-3′
EJM144 5′-CGACTCAATATGCTATGATTACGGC-3′
EJM145 5′-ATCCAAAGCCTTTTGTTATCTCAGC-3′
EJM106 5′-gatcgctagcGATACTGCACTTCGCGG-3′
EJM107 5′-gatcctcgagTCATCGTCTTTGAATTGCTAG-3′
EJM146 5′-GTGAATAAAAGCGGTAGAGTGTACG-3′
EJM147 5′-ATCATGGAATACAATTCTGTTCCGC-3′
EJM111 5′-gatcctcgagTTAGAAATACCCGTGCGTTTC-3′
EJM112 5′-gatcgctagcAACCGCCCTTCCGTGC-3′
EJM148 5′-GAATTCTCTCAGCATAAGGAGTTGC-3′
EJM149 5′-TAGGATATGTCCCTTTCGACCAATC-3′
EJM154 5′-gatcgctagcAAGACCATGCTGCAGTGG-3′
EJM155 5′-gatcctcgagTCACTTCTCCTTACGACG-3′
EJM015 5′-gatccatatgAGCCTTTATATCAACGACG-3′
EJM016 5′-gatcaagcttTTCTCCGACCCGAATATG-3′
a Uppercase bases are homologous to the region being amplified. Boldface bases are restriction enzyme recognition sites.
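As a rough aid for reading Table 2, the sketch below scans a primer for restriction enzyme recognition sequences. The recognition sequences are the standard ones for these enzymes; the `SITES` table and `find_sites` helper are hypothetical names, and the enzyme list goes beyond the three named in the text (NheI, XhoI, HindIII) on the assumption that the other primer tails carry NdeI, EcoRI, or SacI sites. The search covers the whole primer, not only the lowercase tail.

```python
# Standard recognition sequences (5'->3'); the selection of enzymes is an assumption.
SITES = {
    "NdeI": "CATATG",
    "NheI": "GCTAGC",
    "XhoI": "CTCGAG",
    "HindIII": "AAGCTT",
    "EcoRI": "GAATTC",
    "SacI": "GAGCTC",
}

def find_sites(primer):
    """Return the enzymes whose recognition sequence occurs anywhere in the primer."""
    seq = primer.upper()
    return [name for name, site in SITES.items() if site in seq]

# Three primers copied from Table 2.
examples = {
    "EJM001": "gatccatATGAAATCACTTATAATTAACTATTTC",
    "EJM004": "gatcaagcttGAAATACCCGTGCGTTTC",
    "EJM122": "TTGCCTATTTGAAGAGATTTCGTCG",
}
for name, seq in examples.items():
    print(name, find_sites(seq))
```

Note that the NdeI site in EJM001 spans the lowercase tail and the uppercase start codon, which is why the search is case-insensitive.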

Protein expression and purification.

In 10 mL LB, BL21-CodonPlus(DE3) harboring expression plasmids was grown to an A600 nm of 0.6. Cultures were then incubated with aeration at 15°C for 30 min prior to induction with 0.1 mM isopropyl-β-d-thiogalactopyranoside (IPTG). Cultures were incubated for 20 h at 15°C with aeration, and a culture volume of 5 mL A600−1 was harvested by centrifugation at 4,000 × g. Cells were lysed with 50 μL BugBuster (Millipore) containing rLysozyme (Millipore) and Benzonase nuclease (Millipore) for 1 h at room temperature on a rocking platform. Lysates were subjected to centrifugation for 20 min at 20,000 × g to pellet intact cells and insoluble material. The soluble cell lysate was collected and diluted with 150 μL of 150 mM sodium chloride in 100 mM sodium acetate, pH 5.5. Protein expression was monitored by Western blotting. Lysates were solubilized in Laemmli buffer, heated to 100°C for 10 min, and resolved using SDS-PAGE with 12% gels and Tris-glycine buffer (116). Protein was then transferred to a nitrocellulose membrane (Thermo Scientific) and probed with mouse anti-His tag primary antibody (Cedarlane Labs) and goat anti-mouse IgG secondary antibody conjugated to alkaline phosphatase (Cedarlane Labs). Western blots were developed using 5-bromo-4-chloro-3-indolyl phosphate and nitroblue tetrazolium.

Sialidase assay.

Sialidase activity was fluorometrically monitored using the substrate 2′-(4-methylumbelliferyl)-α-d-N-acetylneuraminic acid (4MU-Neu5Ac; Sigma-Aldrich). Reactions were performed in a final volume of 50 μL. To 45 μL cell lysate warmed to 37°C, 4MU-Neu5Ac was added at a final concentration of 100 μM in 37.5 mM sodium chloride, 25 mM sodium acetate, pH 5.5. The reaction mixture was incubated at 37°C, and fluorescence intensity was measured with a BioTek Synergy H1 microplate reader (excitation, 365 nm; emission, 445 nm) immediately after adding substrate and again after 30 min. Assays were performed in black-walled 96-well plates and in technical triplicates. Reported fluorescence intensity is the mean of the differences between the two measurements across the three technical replicates.
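The endpoint calculation above (change in fluorescence between 0 and 30 min, averaged over technical triplicates, with standard error as plotted in Fig. 5) can be sketched as follows. The function name and the toy readings are hypothetical, and computing the standard error from the sample standard deviation is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

def delta_fluorescence(t0, t30):
    """Mean and standard error of per-well fluorescence change (t30 - t0)."""
    diffs = [after - before for before, after in zip(t0, t30)]
    return mean(diffs), stdev(diffs) / sqrt(len(diffs))

# Toy readings in arbitrary fluorescence units (AFU), for illustration only.
t0_reads = [100, 110, 90]
t30_reads = [600, 640, 560]
mean_delta, se_delta = delta_fluorescence(t0_reads, t30_reads)
print(mean_delta, round(se_delta, 2))
```

Subtracting the time-zero reading per well removes background from the lysate and from unhydrolyzed substrate before replicates are averaged.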

ACKNOWLEDGMENTS

E.M. is supported by a postdoctoral fellowship from the Canadian Institutes of Health Research. M.G.S. is supported by a Canada Research Chair. This research was funded by a Proof of Principle grant from the W. Garfield Weston Foundation to E.M. and M.G.S. and a Canadian Institutes of Health Research grant to M.G.S.

We declare no conflicts of interest.

Footnotes

Supplemental material is available online only.

Contributor Information

Michael G. Surette, Email: surette@mcmaster.ca.

Danilo Ercolini, University of Naples Federico II.

REFERENCES

  • 1.Human Microbiome Project Consortium. 2012. Structure, function and diversity of the healthy human microbiome. Nature 486:207–214. 10.1038/nature11234.
  • 2.Zheng D, Liwinski T, Elinav E. 2020. Interaction between microbiota and immunity in health and disease. Cell Res 30:492–506. 10.1038/s41422-020-0332-7.
  • 3.Ruff WE, Greiling TM, Kriegel MA. 2020. Host–microbiota interactions in immune-mediated diseases. Nat Rev Microbiol 18:521–538. 10.1038/s41579-020-0367-2.
  • 4.Kaplan GG, Ng SC. 2017. Understanding and preventing the global increase of inflammatory bowel disease. Gastroenterology 152:313–321.e2. 10.1053/j.gastro.2016.10.020.
  • 5.De Souza HSP. 2017. Etiopathogenesis of inflammatory bowel disease: today and tomorrow. Curr Opin Gastroenterol 33:222–229. 10.1097/MOG.0000000000000364.
  • 6.Ananthakrishnan AN. 2015. Epidemiology and risk factors for IBD. Nat Rev Gastroenterol Hepatol 12:205–217. 10.1038/nrgastro.2015.34.
  • 7.Dias AM, Pereira MS, Padrão NA, Alves I, Marcos-Pinto R, Lago P, Pinho SS. 2018. Glycans as critical regulators of gut immunity in homeostasis and disease. Cell Immunol 333:9–18. 10.1016/j.cellimm.2018.07.007.
  • 8.Kudelka MR, Stowell SR, Cummings RD, Neish AS. 2020. Intestinal epithelial glycosylation in homeostasis and gut microbiota interactions in IBD. Nat Rev Gastroenterol Hepatol 17:597–617. 10.1038/s41575-020-0331-7.
  • 9.Alves I, Vicente MM, Dias AM, Gaifem J, Rodrigues C, Campar A, Pinho SS. 2021. The role of glycosylation in inflammatory diseases. Adv Exp Med Biol 1325:265–283. 10.1007/978-3-030-70115-4_13.
  • 10.Martens EC, Neumann M, Desai MS. 2018. Interactions of commensal and pathogenic microorganisms with the intestinal mucosal barrier. Nat Rev Microbiol 16:457–470. 10.1038/s41579-018-0036-x.
  • 11.Varki A. 2017. Biological roles of glycans. Glycobiology 27:3–49. 10.1093/glycob/cww086.
  • 12.Robb M, Hobbs JK, Woodiga SA, Shapiro-Ward S, Suits MDL, McGregor N, Brumer H, Yesilkaya H, King SJ, Boraston AB. 2017. Molecular characterization of N-glycan degradation and transport in Streptococcus pneumoniae and its contribution to virulence. PLoS Pathog 13:e1006090. 10.1371/journal.ppat.1006090.
  • 13.Minhas V, Paton JC, Trappetti C. 2021. Sickly sweet – how sugar utilization impacts pneumococcal disease progression. Trends Microbiol 29:768–771. 10.1016/j.tim.2021.01.016.
  • 14.Chandra K, Roy Chowdhury A, Chatterjee R, Chakravortty D. 2022. GH18 family glycoside hydrolase chitinase A of Salmonella enhances virulence by facilitating invasion and modulating host immune responses. PLoS Pathog 18:e1010407. 10.1371/journal.ppat.1010407.
  • 15.Devlin JR, Santus W, Mendez J, Peng W, Yu A, Wang J, Alejandro-Navarreto X, Kiernan K, Singh M, Jiang P, Mechref Y, Behnsen J. 2022. Salmonella enterica serovar Typhimurium chitinases modulate the intestinal glycome and promote small intestinal invasion. PLoS Pathog 18:e1010167. 10.1371/journal.ppat.1010167.
  • 16.Hansson GC. 2020. Mucins and the microbiome. Annu Rev Biochem 89:769–793. 10.1146/annurev-biochem-011520-105053.
  • 17.Van der Sluis M, De Koning BAE, De Bruijn ACJM, Velcich A, Meijerink JPP, Van Goudoever JB, Büller HA, Dekker J, Van Seuningen I, Renes IB, Einerhand AWC. 2006. Muc2-deficient mice spontaneously develop colitis, indicating that MUC2 is critical for colonic protection. Gastroenterology 131:117–129. 10.1053/j.gastro.2006.04.020.
  • 18.Fu J, Wei B, Wen T, Johansson MEV, Liu X, Bradford E, Thomsson KA, McGee S, Mansour L, Tong M, McDaniel JM, Sferra TJ, Turner JR, Chen H, Hansson GC, Braun J, Xia L. 2011. Loss of intestinal core 1-derived O-glycans causes spontaneous colitis in mice. J Clin Invest 121:1657–1666. 10.1172/JCI45538.
  • 19.Wang Y, Ju T, Ding X, Xia B, Wang W, Xia L, He M, Cummings RD. 2010. Cosmc is an essential chaperone for correct protein O-glycosylation. Proc Natl Acad Sci USA 107:9228–9233. 10.1073/pnas.0914004107.
  • 20.Bergstrom K, Fu J, Johansson MEV, Liu X, Gao N, Wu Q, Song J, McDaniel JM, McGee S, Chen W, Braun J, Hansson GC, Xia L. 2017. Core 1- and 3-derived O-glycans collectively maintain the colonic mucus barrier and protect against spontaneous colitis in mice. Mucosal Immunol 10:91–103. 10.1038/mi.2016.45.
  • 21.Johansson MEV, Phillipson M, Petersson J, Velcich A, Holm L, Hansson GC. 2008. The inner of the two Muc2 mucin-dependent mucus layers in colon is devoid of bacteria. Proc Natl Acad Sci USA 105:15064–15069. 10.1073/pnas.0803124105.
  • 22.Tropini C, Earle KA, Huang KC, Sonnenburg JL. 2017. The gut microbiome: connecting spatial organization to function. Cell Host Microbe 21:433–442. 10.1016/j.chom.2017.03.010.
  • 23.Donaldson GP, Lee SM, Mazmanian SK. 2016. Gut biogeography of the bacterial microbiota. Nat Rev Microbiol 14:20–32. 10.1038/nrmicro3552.
  • 24.Libertucci J, Dutta U, Kaur S, Jury J, Rossi L, Fontes ME, Shajib MS, Khan WI, Surette MG, Verdu EF, Armstrong D. 2018. Inflammation-related differences in mucosa-associated microbiota and intestinal barrier function in colonic Crohn’s disease. Am J Physiol Gastrointest Liver Physiol 315:G420–G431. 10.1152/ajpgi.00411.2017.
  • 25.Earle KA, Billings G, Sigal M, Lichtman JS, Hansson GC, Elias JE, Amieva MR, Huang KC, Sonnenburg JL. 2015. Quantitative imaging of gut microbiota spatial organization. Cell Host Microbe 18:478–488. 10.1016/j.chom.2015.09.002.
  • 26.Pruss KM, Marcobal A, Southwick AM, Dahan D, Smits SA, Ferreyra JA, Higginbottom SK, Sonnenburg ED, Kashyap PC, Choudhury B, Bode L, Sonnenburg JL. 2021. Mucin-derived O-glycans supplemented to diet mitigate diverse microbiota perturbations. ISME J 15:577–591. 10.1038/s41396-020-00798-6.
  • 27.Briggs JA, Grondin JM, Brumer H. 2021. Communal living: glycan utilization by the human gut microbiota. Environ Microbiol 23:15–35. 10.1111/1462-2920.15317.
  • 28.Fang J, Wang H, Zhou Y, Zhang H, Zhou H, Zhang X. 2021. Slimy partners: the mucus barrier and gut microbiome in ulcerative colitis. Exp Mol Med 53:772–787. 10.1038/s12276-021-00617-8.
  • 29.Kudelka MR, Hinrichs BH, Darby T, Moreno CS, Nishio H, Cutler CE, Wang J, Wu H, Zeng J, Wang Y, Ju T, Stowell SR, Nusrat A, Jones RM, Neish AS, Cummings RD. 2016. Cosmc is an X-linked inflammatory bowel disease risk gene that spatially regulates gut microbiota and contributes to sex-specific risk. Proc Natl Acad Sci USA 113:14787–14792. 10.1073/pnas.1612158114.
  • 30.Qin Y, Havulinna AS, Liu Y, Jousilahti P, Ritchie SC, Tokolyi A, Sanders JG, Valsta L, Brożyńska M, Zhu Q, Tripathi A, Vázquez-Baeza Y, Loomba R, Cheng S, Jain M, Niiranen T, Lahti L, Knight R, Salomaa V, Inouye M, Méric G. 2022. Combined effects of host genetics and diet on human gut microbiota and incident disease in a single population cohort. Nat Genet 54:134–142. 10.1038/s41588-021-00991-z.
  • 31.Lopera-Maya EA, Kurilshikov A, van der Graaf A, Hu S, Andreu-Sánchez S, Chen L, Vila AV, Gacesa R, Sinha T, Collij V, Klaassen MAY, Bolte LA, Gois MFB, Neerincx PBT, Swertz MA, LifeLines Cohort Study, Aguirre-Gamboa R, Deelen P, Franke L, Kuivenhoven JA, Lopera-Maya EA, Nolte IM, Sanna S, Snieder H, Swertz MA, Vonk JM, Wijmenga C, Harmsen HJM, Wijmenga C, Fu J, Weersma RK, Zhernakova A, Sanna S. 2022. Effect of host genetics on the gut microbiome in 7,738 participants of the Dutch Microbiome Project. Nat Genet 54:143–151. 10.1038/s41588-021-00992-y.
  • 32.Rühlemann MC, Hermes BM, Bang C, Doms S, Moitinho-Silva L, Thingholm LB, Frost F, Degenhardt F, Wittig M, Kässens J, Weiss FU, Peters A, Neuhaus K, Völker U, Völzke H, Homuth G, Weiss S, Grallert H, Laudes M, Lieb W, Haller D, Lerch MM, Baines JF, Franke A. 2021. Genome-wide association study in 8,956 German individuals identifies influence of ABO histo-blood groups on gut microbiome. Nat Genet 53:147–155. 10.1038/s41588-020-00747-1.
  • 33.Yang H, Wu J, Huang X, Zhou Y, Zhang Y, Liu M, Liu Q, Ke S, He M, Fu H, Fang S, Xiong X, Jiang H, Chen Z, Wu Z, Gong H, Tong X, Huang Y, Ma J, Gao J, Charlier C, Coppieters W, Shagam L, Zhang Z, Ai H, Yang B, Georges M, Chen C, Huang L. 2022. ABO genotype alters the gut microbiota by regulating GalNAc levels in pigs. Nature 606:358–367. 10.1038/s41586-022-04769-z.
  • 34.Sommer F, Adam N, Johansson MEV, Xia L, Hansson GC, Bäckhed F. 2014. Altered mucus glycosylation in core 1 O-glycan-deficient mice affects microbiota composition and intestinal architecture. PLoS One 9:e85254. 10.1371/journal.pone.0085254.
  • 35.Morampudi V, Dalwadi U, Bhinder G, Sham HP, Gill SK, Chan J, Bergstrom KSB, Huang T, Ma C, Jacobson K, Gibson DL, Vallance BA. 2016. The goblet cell-derived mediator RELM-β drives spontaneous colitis in Muc2-deficient mice by promoting commensal microbial dysbiosis. Mucosal Immunol 9:1218–1233. 10.1038/mi.2015.140.
  • 36.Li H, Zhang X, Chen R, Cheng K, Ning Z, Li J, Twine S, Stintzi A, Mack D, Figeys D. 2021. Elevated colonic microbiota-associated paucimannosidic and truncated N-glycans in pediatric ulcerative colitis. J Proteomics 249:104369. 10.1016/j.jprot.2021.104369.
  • 37.Larsson JMH, Karlsson H, Crespo JG, Johansson MEV, Eklund L, Sjövall H, Hansson GC. 2011. Altered O-glycosylation profile of MUC2 mucin occurs in active ulcerative colitis and is associated with increased inflammation. Inflamm Bowel Dis 17:2299–2307. 10.1002/ibd.21625.
  • 38.Martens EC, Chiang HC, Gordon JI. 2008. Mucosal glycan foraging enhances fitness and transmission of a saccharolytic human gut bacterial symbiont. Cell Host Microbe 4:447–457. 10.1016/j.chom.2008.09.007.
  • 39.Sonnenburg JL, Xu J, Leip DD, Chen CH, Westover BP, Weatherford J, Buhler JD, Gordon JI. 2005. Glycan foraging in vivo by an intestine-adapted bacterial symbiont. Science 307:1955–1959. 10.1126/science.1109051.
  • 40.Desai MS, Seekatz AM, Koropatkin NM, Kamada N, Hickey CA, Wolter M, Pudlo NA, Kitamoto S, Terrapon N, Muller A, Young VB, Henrissat B, Wilmes P, Stappenbeck TS, Núñez G, Martens EC. 2016. A dietary fiber-deprived gut microbiota degrades the colonic mucus barrier and enhances pathogen susceptibility. Cell 167:1339–1353.e21. 10.1016/j.cell.2016.10.043.
  • 41.Ndeh D, Gilbert HJ. 2018. Biochemistry of complex glycan depolymerisation by the human gut microbiota. FEMS Microbiol Rev 42:146–164. 10.1093/femsre/fuy002.
  • 42.McKee LS, La Rosa SL, Westereng B, Eijsink VG, Pope PB, Larsbrink J. 2021. Polysaccharide degradation by the Bacteroidetes: mechanisms and nomenclature. Environ Microbiol Rep 13:559–581. 10.1111/1758-2229.12980.
  • 43.El Kaoutari A, Armougom F, Gordon JI, Raoult D, Henrissat B. 2013. The abundance and variety of carbohydrate-active enzymes in the human gut microbiota. Nat Rev Microbiol 11:497–504. 10.1038/nrmicro3050.
  • 44.Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B. 2014. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res 42:D490–D495. 10.1093/nar/gkt1178.
  • 45.Henrissat B. 1991. A classification of glycosyl hydrolases based on amino acid sequence similarities. Biochem J 280:309–316. 10.1042/bj2800309.
  • 46.Cantarel BI, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B. 2009. The Carbohydrate-Active EnZymes database (CAZy): an expert resource for glycogenomics. Nucleic Acids Res 37:D233–D238. 10.1093/nar/gkn663. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Drula E, Garron ML, Dogan S, Lombard V, Henrissat B, Terrapon N. 2022. The carbohydrate-active enzyme database: functions and literature. Nucleic Acids Res 50:D571–D577. 10.1093/nar/gkab1045. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Bell A, Juge N. 2021. Mucosal glycan degradation of the host by the gut microbiota. Glycobiology 31:691–696. 10.1093/glycob/cwaa097. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Tamura K, Brumer H. 2021. Glycan utilization systems in the human gut microbiota: a gold mine for structural discoveries. Curr Opin Struct Biol 68:26–40. 10.1016/j.sbi.2020.11.001. [DOI] [PubMed] [Google Scholar]
  • 50.Trastoy B, Du JJ, Klontz EH, Li C, Cifuente JO, Wang LX, Sundberg EJ, Guerin ME. 2020. Structural basis of mammalian high-mannose N-glycan processing by human gut Bacteroides. Nat Commun 11:899. 10.1038/s41467-020-14754-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Garrido D, Dallas DC, Mills DA. 2013. Consumption of human milk glycoconjugates by infant-associated Bifidobacteria: mechanisms and implications. Microbiology (Reading) 159:649–664. 10.1099/mic.0.064113-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Wardman JF, Bains RK, Rahfeld P, Withers SG. 2022. Carbohydrate-active enzymes (CAZymes) in the gut microbiome. Nat Rev Microbiol 20:542–556. 10.1038/s41579-022-00712-1. [DOI] [PubMed] [Google Scholar]
  • 53.Crouch LI, Liberato MV, Urbanowicz PA, Baslé A, Lamb CA, Stewart CJ, Cooke K, Doona M, Needham S, Brady RR, Berrington JE, Madunic K, Wuhrer M, Chater P, Pearson JP, Glowacki R, Martens EC, Zhang F, Linhardt RJ, Spencer DIR, Bolam DN. 2020. Prominent members of the human gut microbiota express endo-acting O-glycanases to initiate mucin breakdown. Nat Commun 11:4017. 10.1038/s41467-020-17847-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Wu H, Crost EH, Owen CD, van Bakel W, Martínez Gascueña A, Latousakis D, Hicks T, Walpole S, Urbanowicz PA, Ndeh D, Monaco S, Sánchez Salom L, Griffiths R, Reynolds RS, Colvile A, Spencer DIR, Walsh M, Angulo J, Juge N. 2021. The human gut symbiont Ruminococcus gnavus shows specificity to blood group A antigen during mucin glycan foraging: implication for niche colonisation in the gastrointestinal tract. PLoS Biol 19:e3001498. 10.1371/journal.pbio.3001498. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Garron ML, Henrissat B. 2019. The continuing expansion of CAZymes and their families. Curr Opin Chem Biol 53:82–87. 10.1016/j.cbpa.2019.08.004. [DOI] [PubMed] [Google Scholar]
  • 56.Briliūtė J, Urbanowicz PA, Luis AS, Baslé A, Paterson N, Rebello O, Hendel J, Ndeh DA, Lowe EC, Martens EC, Spencer DIR, Bolam DN, Crouch LI. 2019. Complex N-glycan breakdown by gut Bacteroides involves an extensive enzymatic apparatus encoded by multiple co-regulated genetic loci. Nat Microbiol 4:1571–1581. 10.1038/s41564-019-0466-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Viborg AH, Terrapon N, Lombard V, Michel G, Czjzek M, Henrissat B, Brumer H. 2019. A subfamily roadmap of the evolutionarily diverse glycoside hydrolase family 16 (GH16). J Biol Chem 294:15973–15986. 10.1074/jbc.RA119.010619. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Li A, Laville E, Tarquis L, Lombard V, Ropartz D, Terrapon N, Henrissat B, Guieysse D, Esque J, Durand J, Morgavi DP, Potocki-Veronese G. 2020. Analysis of the diversity of the glycoside hydrolase family 130 in mammal gut microbiomes reveals a novel mannoside-phosphorylase function. Microb Genom 6:mgen000404. 10.1099/mgen.0.000404. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Grootaert H, van Landuyt L, Hulpiau P, Callewaert N. 2020. Functional exploration of the GH29 fucosidase family. Glycobiology 30:735–745. 10.1093/glycob/cwaa023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Aspeborg H, Coutinho PM, Wang Y, Brumer H, Henrissat B. 2012. Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5). BMC Evol Biol 12:186. 10.1186/1471-2148-12-186. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Mewis K, Lenfant N, Lombard V, Henrissat B. 2016. Dividing the large glycoside hydrolase family 43 into subfamilies: a motivation for detailed enzyme characterization. Appl Environ Microbiol 82:1686–1692. 10.1128/AEM.03453-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.St John FJ, González JM, Pozharski E. 2010. Consolidation of glycosyl hydrolase family 30: a dual domain 4/7 hydrolase family consisting of two structurally distinct groups. FEBS Lett 584:4435–4441. 10.1016/j.febslet.2010.09.051. [DOI] [PubMed] [Google Scholar]
  • 63.Stam MR, Danchin EGJ, Rancurel C, Coutinho PM, Henrissat B. 2006. Dividing the large glycoside hydrolase family 13 into subfamilies: towards improved functional annotations of α-amylase-related proteins. Protein Eng Des Sel 19:555–562. 10.1093/protein/gzl044. [DOI] [PubMed] [Google Scholar]
  • 64.Lewis AL, Chen X, Schnaar RL, Varki A. 2022. Sialic acids and other nonulosonic acids. In Varki A, Cummings RD, Esko JD, Stanley P, Hart GW, Aebi M, Mohnen D, Kinoshita T, Packer NH, Prestegard JH, Schnaar RL, Seeberger PH (ed), Essentials of glycobiology, 4th ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. [Google Scholar]
  • 65.Ng KM, Ferreyra JA, Higginbottom SK, Lynch JB, Kashyap PC, Gopinath S, Naidu N, Choudhury B, Weimer BC, Monack DM, Sonnenburg JL. 2013. Microbiota-liberated host sugars facilitate post-antibiotic expansion of enteric pathogens. Nature 502:96–99. 10.1038/nature12503. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Huang YL, Chassard C, Hausmann M, Von Itzstein M, Hennet T. 2015. Sialic acid catabolism drives intestinal inflammation and microbial dysbiosis in mice. Nat Commun 6:8141. 10.1038/ncomms9141. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Sorbara MT, Pamer EG. 2019. Interbacterial mechanisms of colonization resistance and the strategies pathogens use to overcome them. Mucosal Immunol 12:1–9. 10.1038/s41385-018-0053-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Juge N, Tailford L, Owen CD. 2016. Sialidases from gut bacteria: a mini-review. Biochem Soc Trans 44:166–175. 10.1042/BST20150226. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Belzer C. 2022. Nutritional strategies for mucosal health: the interplay between microbes and mucin glycans. Trends Microbiol 30:13–21. 10.1016/j.tim.2021.06.003. [DOI] [PubMed] [Google Scholar]
  • 70.Chuzel L, Ganatra MB, Rapp E, Henrissat B, Taron CH. 2018. Functional metagenomics identifies an exosialidase with an inverting catalytic mechanism that defines a new glycoside hydrolase family (GH156). J Biol Chem 293:18138–18150. 10.1074/jbc.RA118.003302. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Bule P, Chuzel L, Blagova E, Wu L, Gray MA, Henrissat B, Rapp E, Bertozzi CR, Taron CH, Davies GJ. 2019. Inverting family GH156 sialidases define an unusual catalytic motif for glycosidase action. Nat Commun 10:4816. 10.1038/s41467-019-12684-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Eddy SR. 2011. Accelerated profile HMM searches. PLoS Comput Biol 7:e1002195. 10.1371/journal.pcbi.1002195. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.UniProt Consortium. 2021. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res 49:D480–D489. 10.1093/nar/gkaa1100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Nayfach S, Roux S, Seshadri R, Udwary D, Varghese N, Schulz F, Wu D, Paez-Espino D, Chen IM, Huntemann M, Palaniappan K, Ladau J, Mukherjee S, Reddy TBK, Nielsen T, Kirton E, Faria JP, Edirisinghe JN, Henry CS, Jungbluth SP, Chivian D, Dehal P, Wood-Charlson EM, Arkin AP, Tringe SG, Visel A, IMG/M Data Consortium, Woyke T, Mouncey NJ, Ivanova NN, Kyrpides NC, Eloe-Fadrosh EA. 2021. A genomic catalog of Earth’s microbiomes. Nat Biotechnol 39:499–509. 10.1038/s41587-020-0718-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Almeida A, Nayfach S, Boland M, Strozzi F, Beracochea M, Shi ZJ, Pollard KS, Sakharova E, Parks DH, Hugenholtz P, Segata N, Kyrpides NC, Finn RD. 2021. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat Biotechnol 39:105–114. 10.1038/s41587-020-0603-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Roy A, Srinivasan N, Gowri VS. 2009. Molecular and structural basis of drift in the functions of closely-related homologous enzyme domains: implications for function annotation based on homology searches and structural genomics. In Silico Biol 9:S41–S55. 10.3233/ISB-2009-0379. [DOI] [PubMed] [Google Scholar]
  • 77.Zhang H, Yohe T, Huang L, Entwistle S, Wu P, Yang Z, Busk PK, Xu Y, Yin Y. 2018. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res 46:W95–W101. 10.1093/nar/gky418. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Meziti A, Rodriguez-R LM, Hatt JK, Peña-Gonzalez A, Levy K, Konstantinidis KT. 2021. The reliability of metagenome-assembled genomes (MAGs) in representing natural populations: insights from comparing MAGs against isolate genomes derived from the same fecal sample. Appl Environ Microbiol 87:e02593-20. 10.1128/AEM.02593-20. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Lobb B, Tremblay BJM, Moreno-Hagelsieb G, Doxey AC. 2020. An assessment of genome annotation coverage across the bacterial tree of life. Microb Genom 6:e000341. 10.1099/mgen.0.000341. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Brown HA, Koropatkin NM. 2021. Host glycan utilization within the Bacteroidetes Sus-like paradigm. Glycobiology 31:697–706. 10.1093/glycob/cwaa054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Franzosa EA, Sirota-Madi A, Avila-Pacheco J, Fornelos N, Haiser HJ, Reinker S, Vatanen T, Hall AB, Mallick H, McIver LJ, Sauk JS, Wilson RG, Stevens BW, Scott JM, Pierce K, Deik AA, Bullock K, Imhann F, Porter JA, Zhernakova A, Fu J, Weersma RK, Wijmenga C, Clish CB, Vlamakis H, Huttenhower C, Xavier RJ. 2019. Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nat Microbiol 4:293–305. 10.1038/s41564-018-0306-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Rampelli S, Schnorr SL, Consolandi C, Turroni S, Severgnini M, Peano C, Brigidi P, Crittenden AN, Henry AG, Candela M. 2015. Metagenome sequencing of the Hadza hunter-gatherer gut microbiota. Curr Biol 25:1682–1693. 10.1016/j.cub.2015.04.055. [DOI] [PubMed] [Google Scholar]
  • 83.Obregon-Tito AJ, Tito RY, Metcalf J, Sankaranarayanan K, Clemente JC, Ursell LK, Zech Xu Z, Van Treuren W, Knight R, Gaffney PM, Spicer P, Lawson P, Marin-Reyes L, Trujillo-Villarroel O, Foster M, Guija-Poma E, Troncoso-Corzo L, Warinner C, Ozga AT, Lewis CM. 2015. Subsistence strategies in traditional societies distinguish gut microbiomes. Nat Commun 6:6505. 10.1038/ncomms7505. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Smits SA, Leach J, Sonnenburg ED, Gonzalez CG, Lichtman JS, Reid G, Knight R, Manjurano A, Changalucha J, Elias JE, Dominguez-Bello MG, Sonnenburg JL. 2017. Seasonal cycling in the gut microbiome of the Hadza hunter-gatherers of Tanzania. Science 357:802–806. 10.1126/science.aan4834. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Turroni S, Fiori J, Rampelli S, Schnorr SL, Consolandi C, Barone M, Biagi E, Fanelli F, Mezzullo M, Crittenden AN, Henry AG, Brigidi P, Candela M. 2016. Fecal metabolome of the Hadza hunter-gatherers: a host-microbiome integrative view. Sci Rep 6:32826. 10.1038/srep32826. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Walters WA, Xu Z, Knight R. 2014. Meta-analyses of human gut microbes associated with obesity and IBD. FEBS Lett 588:4223–4233. 10.1016/j.febslet.2014.09.039. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P, Sicheritz-Ponten T, Turner K, Zhu H, Yu C, Li S, Jian M, Zhou Y, Li Y, Zhang X, Li S, Qin N, Yang H, Wang J, Brunak S, Doré J, Guarner F, Kristiansen K, Pedersen O, Parkhill J, Weissenbach J, MetaHIT Consortium, Bork P, Ehrlich SD, Wang J. 2010. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464:59–65. 10.1038/nature08821. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Wiegand S, Jogler M, Jogler C. 2018. On the maverick Planctomycetes. FEMS Microbiol Rev 42:739–760. 10.1093/femsre/fuy029. [DOI] [PubMed] [Google Scholar]
  • 89.Ivanova AA, Naumoff DG, Miroshnikov KK, Liesack W, Dedysh SN. 2017. Comparative genomics of four Isosphaeraceae Planctomycetes: a common pool of plasmids and glycoside hydrolase genes shared by Paludisphaera borealis PX4T, Isosphaera pallida IS1BT, Singulisphaera acidiphila DSM 18658T, and strain SH-P62. Front Microbiol 8:412. 10.3389/fmicb.2017.00412. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.McDonald ND, Boyd EF. 2021. Structural and biosynthetic diversity of nonulosonic acids (NulOs) that decorate surface structures in bacteria. Trends Microbiol 29:142–157. 10.1016/j.tim.2020.08.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Ndeh D, Rogowski A, Cartmell A, Luis AS, Baslé A, Gray J, Venditto I, Briggs J, Zhang X, Labourel A, Terrapon N, Buffetto F, Nepogodiev S, Xiao Y, Field RA, Zhu Y, O’Neil MA, Urbanowicz BR, York WS, Davies GJ, Abbott DW, Ralet M-C, Martens EC, Henrissat B, Gilbert HJ. 2017. Complex pectin metabolism by gut bacteria reveals novel catalytic functions. Nature 544:65–70. 10.1038/nature21725. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Edgar LJ. 2021. Engineering the sialome. ACS Chem Biol 16:1829–1840. 10.1021/acschembio.1c00273. [DOI] [PubMed] [Google Scholar]
  • 93.Smith BAH, Bertozzi CR. 2021. The clinical impact of glycobiology: targeting selectins, Siglecs and mammalian glycans. Nat Rev Drug Discov 20:217–243. 10.1038/s41573-020-00093-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Gray MA, Stanczak MA, Mantuano NR, Xiao H, Pijnenborg JFA, Malaker SA, Miller CL, Weidenbacher PA, Tanzo JT, Ahn G, Woods EC, Läubli H, Bertozzi CR. 2020. Targeted glycan degradation potentiates the anticancer immune response in vivo. Nat Chem Biol 16:1376–1384. 10.1038/s41589-020-0622-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Rahfeld P, Sim L, Moon H, Constantinescu I, Morgan-Lang C, Hallam SJ, Kizhakkedathu JN, Withers SG. 2019. An enzymatic pathway in the human gut microbiome that converts A to universal O type blood. Nat Microbiol 4:1475–1485. 10.1038/s41564-019-0469-7. [DOI] [PubMed] [Google Scholar]
  • 96.Helbert W, Poulet L, Drouillard S, Mathieu S, Loiodice M, Couturier M, Lombard V, Terrapon N, Turchetto J, Vincentelli R, Henrissat B. 2019. Discovery of novel carbohydrate-active enzymes through the rational exploration of the protein sequences space. Proc Natl Acad Sci USA 116:6063–6068. 10.1073/pnas.1815791116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–780. 10.1093/molbev/mst010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Wheeler TJ, Clements J, Finn RD. 2014. Skylign: a tool for creating informative, interactive logos representing sequence alignments and profile hidden Markov models. BMC Bioinformatics 15:7. 10.1186/1471-2105-15-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Li W, Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659. 10.1093/bioinformatics/btl158. [DOI] [PubMed] [Google Scholar]
  • 100.Fu L, Niu B, Zhu Z, Wu S, Li W. 2012. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150–3152. 10.1093/bioinformatics/bts565. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Zallot R, Oberg NO, Gerlt JA. 2018. ‘Democratized’ genomic enzymology web tools for functional assignment. Curr Opin Chem Biol 47:77–85. 10.1016/j.cbpa.2018.09.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Gerlt JA, Bouvier JT, Davidson DB, Imker HJ, Sadkhin B, Slater DR, Whalen KL. 2015. Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST): a web tool for generating protein sequence similarity networks. Biochim Biophys Acta 1854:1019–1037. 10.1016/j.bbapap.2015.04.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Gerlt JA. 2017. Genomic enzymology: web tools for leveraging protein family sequence–function space and genome context to discover novel functions. Biochemistry 56:4293–4308. 10.1021/acs.biochem.7b00614. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Zallot R, Oberg N, Gerlt JA. 2019. The EFI web resource for genomic enzymology tools: leveraging protein, genome, and metagenome databases to discover novel enzymes and metabolic pathways. Biochemistry 58:4169–4182. 10.1021/acs.biochem.9b00735. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13:2498–2504. 10.1101/gr.1239303. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. 2004. UCSF Chimera - a visualization system for exploratory research and analysis. J Comput Chem 25:1605–1612. 10.1002/jcc.20084. [DOI] [PubMed] [Google Scholar]
  • 107.Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. 2022. ColabFold - making protein folding accessible to all. Nat Methods 19:679–682. 10.1038/s41592-022-01488-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D. 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596:583–589. 10.1038/s41586-021-03819-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Dong R, Peng Z, Zhang Y, Yang J. 2018. mTM-align: an algorithm for fast and accurate multiple protein structure alignment. Bioinformatics 34:1719–1725. 10.1093/bioinformatics/btx828. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997. 10.48550/arXiv.1303.3997. [DOI] [Google Scholar]
  • 111.Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. 2021. Twelve years of SAMtools and BCFtools. Gigascience 10:giab008. 10.1093/gigascience/giab008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Shaffer M, Borton MA, McGivern BB, Zayed AA, La Rosa SL, Solden LM, Liu P, Narrowe AB, Rodríguez-Ramos J, Bolduc B, Gazitúa MC, Daly RA, Smith GJ, Vik DR, Pope PB, Sullivan MB, Roux S, Wrighton KC. 2020. DRAM for distilling microbial metabolism to automate the curation of microbiome function. Nucleic Acids Res 48:8883–8900. 10.1093/nar/gkaa621. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34:3094–3100. 10.1093/bioinformatics/bty191. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Ihaka R, Gentleman R. 1996. R: a language for data analysis and graphics. J Comput Graph Stat 5:299. 10.2307/1390807. [DOI] [Google Scholar]
  • 115.Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen T, Miller E, Bache S, Müller K, Ooms J, Robinson D, Seidel D, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H. 2019. Welcome to the Tidyverse. J Open Source Softw 4:1686. 10.21105/joss.01686. [DOI] [Google Scholar]
  • 116.Laemmli UK. 1970. Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227:680–685. 10.1038/227680a0. [DOI] [PubMed] [Google Scholar]

Supplementary Materials

Supplemental file 1. Supplemental material (aem.01755-22-s0001.pdf, PDF file, 2.1 MB).


Articles from Applied and Environmental Microbiology are provided here courtesy of American Society for Microbiology (ASM)
