Abstract
The human gut microbiota produces a diverse and extensive range of metabolites that have the potential to affect host physiology. Despite significant efforts to identify the metabolic pathways that produce these microbial metabolites, a comprehensive metabolic pathway database for the human gut microbiota is still lacking. Here, we present Enteropathway, a metabolic pathway database that integrates 3269 compounds, 3677 reactions, and 876 modules obtained from 1012 manually curated scientific publications. Notably, 698 of these modules are new entries that cannot be found in any other database. The database is accessible through a web application (https://enteropathway.org) that offers a metabolic diagram for graphical visualization of metabolic pathways, a customization interface, and an enrichment analysis feature for highlighting enriched modules on the metabolic diagram. Overall, Enteropathway is a comprehensive reference database that can complement widely used databases, as well as a tool for visual and statistical analysis in human gut microbiota studies. It was designed to help researchers gain new insights into the complex interplay between microbiota and host metabolism.
Keywords: human gut microbiota, microbial metabolic pathway database, visualization tool
Introduction
The human gut microbiota encodes a wide and diverse set of metabolic pathways for producing metabolites that can profoundly influence host physiology, health, and disease [1]. For instance, secondary bile acids such as Deoxycholate (DCA) and Isoallolithocholate (IsoalloLCA) are known to influence host metabolism [2], cancer progression [3], and the immune system [4, 5]. These metabolites and the bacteria that produce them are promising targets for controlling the host phenotype in health and disease.
Together with investigating the bioactivity of metabolites produced by the human gut microbiota, there is an increasing number of experimental studies aimed at identifying the metabolic pathways and the genes involved in their production [6–11]. These studies allow us to extract the potential metabolic functionalities of metabolite production from the (meta)genomic sequence [12, 13] and enable the prediction of metabolite presence, leading to experimental investigations into their effects on host physiology.
Despite efforts to identify the metabolic pathways for producing microbial metabolites, a comprehensive metabolic database for the human gut microbiota is still lacking. The Kyoto Encyclopedia of Genes and Genomes (KEGG) and MetaCyc are the most widely used databases in metabolic studies of many organisms, including bacteria [14, 15]. However, the majority of the metabolic pathways in KEGG and MetaCyc are based on those of humans and model organisms, and the specific metabolic pathways of the human gut microbiota, which are formed by multiple species, have yet to be covered [16]. To address this issue, biome-specific databases have been developed, such as the human gut-specific metabolic modules [17] and the gut-brain modules [18], which are manually curated from the scientific literature and have often been used in recent human gut microbiota studies [19–22]. These databases encompass cellular enzymatic processes, which are defined as modules consisting of KEGG Orthology (KO) sets and the compounds involved in their production and degradation processes. However, they currently cover fewer than 160 modules and do not yet provide comprehensive coverage of important microbial metabolic pathways such as secondary bile acid and trimethylamine (TMA) metabolism. Furthermore, they do not describe the relationship between microbial metabolites and/or metabolic pathways and host information, including dietary patterns and diseases. Such descriptions can aid in generating hypotheses and revealing novel biological insights.
Here, we present Enteropathway, an integrated metabolic pathway database for the human gut microbiota, constructed through manual curation of the scientific literature. Its web application provides users with an interactive and customizable metabolic pathways diagram, as well as an integrated enrichment analysis module for highlighting enriched modules on the metabolic diagram. As a reference database and a visual and statistical analysis tool, Enteropathway is a valuable resource that can complement widely used databases for human gut microbiota researchers on a quest for novel biological insights.
Results
Development of the Enteropathway database
Despite considerable efforts to map microbial metabolite production to the pathways of the human gut microbiota, a comprehensive metabolic database specifically designed for the human gut microbiota is still lacking. Therefore, we developed Enteropathway, a metabolic pathway database that integrates 3269 metabolic compounds derived from human fecal samples and/or in vitro experiments with human gut bacteria, 3677 chemical reactions (cellular enzymatic processes that change a compound), and 876 modules (sets of reaction processes) based on 1012 manually curated scientific publications related to the human gut microbiota (Fig. 1, Table 1). It covers microbial metabolic modules for producing/degrading diverse metabolites, including mono/oligo/polysaccharides, short-chain fatty acids, amino acids, pyrimidines, vitamins, lipids, bile acids, and other important metabolites.
Table 1.
Component | Overlap in KEGG | Overlap in MetaCyc | Overlap in GMM/GBM | Only Enteropathway | Total |
---|---|---|---|---|---|
Compound | 1923 | 1757 | 0 | 1179 | 3269 |
Reaction | 1428 | 1276 | 0 | 2118 | 3677 |
Module | 84 | 143 | 54 | 698 | 876 |
Subsequently, we investigated the relationship between these metabolic compounds and/or pathways and host-related information, such as dietary patterns and diseases. We then integrated this information into the description of 687 out of 876 Enteropathway modules. This enriched information can serve as a valuable resource for generating hypotheses and gaining novel biological insights.
Furthermore, we assigned unique Enteropathway identifiers to compounds, reactions, and modules. These identifiers were then manually linked to widely used databases such as KEGG, MetaCyc, and UniProt by expert curation. In particular, Enteropathway modules were manually linked to modules/pathways in other databases such as KEGG Modules and MetaCyc Pathway by cross-referencing the relevant scientific literature, reactions, and compounds in both modules/pathways. This cross-referencing enables users to easily query Enteropathway with enzyme, reaction, or ortholog identifiers of commonly used databases in the field (Table S1).
Finally, we downloaded 3594 representative human gut microbial species genomes and their gene lists (7 905 062 genes) [23], and then performed taxonomic assignment against the Genome Taxonomy Database (GTDB) [24] and gene annotation against KEGG and UniProt to predict compounds, reactions, and metabolic pathways in each microbial species (see Methods). Among them, 3345 genomes and 3 823 073 genes were annotated and integrated into the database. This taxonomic and gene information allows users to interactively explore the potential, gene-annotation-based metabolic functionalities of different human gut microbial species.
Development of the Enteropathway web application
Interactive metabolic pathway diagrams are powerful tools for understanding cellular metabolism [25, 26]. When accessed through an interactive web application, they can facilitate pathway exploration and visual analysis, enabling users to generate hypotheses and new biological insights. Therefore, we developed a web application (https://enteropathway.org) to provide users with a friendly way to explore the Enteropathway database and its metabolic diagram interactively.
Users can zoom in and out of the metabolic pathway using mouse gestures to display it at different levels of detail, e.g., at the module level (Fig. 2a). Zoom buttons are also available for zooming in, zooming out, and resetting to the initial zoom. These features provide users with access to both the entire pathways and individual modules.
The Enteropathway web application offers interactive features that allow users to conveniently access information related to the compound, reaction, and module. By clicking on the circle, arrow, or box on the metabolic pathway diagram, users can easily reveal the entry ID and access the corresponding entry page (Fig. S1, S2, S3). These entry pages provide a list of manually curated scientific literature used as references, along with cross-references to other major biological databases and host-related information. To facilitate efficient exploration, Enteropathway includes a search engine that enables users to search and highlight matched entities on the metabolic diagram using entry names or IDs as keywords (Fig. 2b). These functionalities help users efficiently identify target metabolites or genes.
For large-scale exploration, a customization interface is provided for mapping analysis outcomes onto Enteropathway’s metabolic diagram (Fig. 2c). Users can utilize Enteropathway IDs or any IDs from the following list of supported annotations: KEGG (KO, Reaction, Module, Compound), MetaCyc (Reaction, Pathway, Compound), Enzyme Commission number (EC number), EggNOG, gut-specific metabolic modules/gut-brain modules, Rhea, Uniprot Accession, UniRef50, UniRef90, Chemical Abstracts Service (CAS) ID, PubChem CID, and GTDB accession ID (Table S1). This service enables users to explore potential metabolic functionalities based on their compounds, orthologous genes, and module/pathway lists.
The interface accepts space-separated customization to target the color, size, and opacity of a metabolic pathway’s visual elements. Alternatively, the enrichment analysis module helps seamlessly highlight enriched reactions on the metabolic diagram (Fig. S4, see Methods). It accepts a list of KO or Enteropathway reaction IDs that will undergo a hypergeometric test to select and highlight statistically significant Enteropathway modules. The statistical test results can be shown and downloaded as a tab-separated format file. All diagram customizations can be exported as a PDF file, and user accounts are also available for saving and easily sharing customization results via a simple URL (Fig. S5).
The taxonomy search/browse function allows users to explore human gut microbial strains interactively by providing taxonomic names, such as a genus or species, based on the representative human gut microbial species genomes in the Enteropathway database (https://enteropathway.org/#/taxonomy). Additionally, users can customize the metabolic pathway diagram to generate a strain-specific pathway using the strain ID as an identifier (Fig. S6). These features allow users to interactively explore the predicted metabolic pathways of each microbial species.
Finally, Enteropathway supports programmatic access through a REST API, allowing seamless integration with other bioinformatics tools. Detailed information can be found on the API page (https://enteropathway.org/#/api). Users can submit customization settings files via the API to obtain corresponding PDF results. Similarly, submitting lists of KO or Enteropathway reactions for enrichment analysis provides customized pathway diagrams in PDF format, along with statistical results in a tab-separated format file.
Case study for analyzing metagenomic and metabolomic data sets by Enteropathway
The gut microbiota acts as a modulator of aging-related health by regulating the immune system and resistance to pathogen infection [27–29]. A recent study showed that secondary bile acids, especially IsoalloLCA produced by Odoribacteraceae strains, were enriched in Centenarian (individuals aged 100 years and older) compared to Older (individuals aged 85–89 years) and Young (individuals aged 21–55 years) [30]. IsoalloLCA may reduce the risk of pathogen infection by killing harmful gut bacteria such as Clostridium difficile. Here, we use Enteropathway to further identify and visualize the Centenarian-specific genes and metabolites on its human gut-specific metabolic pathways diagram. To this end, we analyzed metagenomic and metabolomic data sets derived from the Centenarian cohort study [30] and then explored them in Enteropathway. First, we downloaded these data sets and obtained KO, MetaCyc Reaction, and metabolome profiles (Fig. S7). Then, we compared the abundance of KOs, MetaCyc Reactions, and metabolites between Centenarian and Older and between Centenarian and Young to identify Centenarian-specific reactions and metabolites. As a result, 3884 KOs, 3361 MetaCyc Reactions, and 37 metabolites were significantly different in abundance in Centenarian (Q < 0.1, Table S2). These were converted to 1530 reactions and 37 compounds and used to identify Centenarian-specific modules by enrichment analysis on the web application. This analysis showed that 73 modules were significantly enriched/depleted in Centenarian (Q < 0.1, Table S2, Fig. 3). Among them, Oleate degradation (EPM1200), Linoleate degradation (EPM1201), and TMA biosynthesis (EPM0671) were enriched in Centenarian (Q = 2.98 × 10^−18, 2.98 × 10^−18, and 2.84 × 10^−2, respectively). In particular, Linoleate is a well-known essential fatty acid, which may reduce heart disease mortality [31, 32].
We proceeded with customizing Centenarian-specific reactions, metabolites, and modules and specifically focused on the bile acids metabolism pathway for further analysis. By visually analyzing the customization result, we observed distinct enrichment patterns between the upstream and downstream sections of the pathway, providing valuable biological insights (Fig. 3). In the upstream section of the pathway, Centenarian exhibited depletion of primary bile acids, specifically Glycocholate and Taurocholate, along with their associated reactions. Conversely, in the downstream section, Centenarian showed enrichment of secondary bile acids, including DCA, Lithocholate (LCA), Isolithocholate (IsoLCA), Allolithocholate (AlloLCA), and IsoalloLCA, as well as their corresponding reactions. This observation suggests a reciprocal relationship between the depletion of primary bile acids in the upstream and the enrichment of secondary bile acids in the downstream. Further investigation of the final downstream compound in this pathway could provide valuable insights.
Comparison with other databases and their visualization tools
KEGG [14], MetaCyc [15], and the gut-metabolic/brain modules [17, 18] have been used in many recent human gut microbiota studies [19–22]. To identify the set of novel pathways that are only covered by Enteropathway, we counted the number of Enteropathway modules that do not link to the KEGG database, the MetaCyc database, or the gut-metabolic/brain modules. We found that 698 out of 876 modules are unique to the Enteropathway database (Table 1). Next, we counted the significantly enriched/depleted modules in Centenarian that do not link to these databases. Twenty-one of the 73 modules, including Oleate degradation (EPM1200) and Linoleate degradation (EPM1201), were found to be novel pathways unique to Enteropathway (Table S2).
Finally, to evaluate the Enteropathway metabolic diagram, we customized Centenarian-specific reactions and compounds on the KEGG pathway (map00121, Secondary bile acid biosynthesis) and compared the result to that of Enteropathway. We found that the upstream section of bile acid metabolism was the same in the KEGG pathway (Fig. S8) and Enteropathway (Fig. 3). However, the downstream section differed because some compounds, such as IsoLCA and AlloLCA, are absent from the KEGG pathway, which suggests that Enteropathway can complement the KEGG pathway.
Discussion
The human gut microbiota plays an important role in human health and disease. However, existing tools and metabolic databases for studying the human gut microbiota are limited. To address this gap, we have developed Enteropathway, an integrated metabolic database specifically designed for the human gut microbiota. It is based on an extensive set of manually curated scientific literature and contains several novel entries, especially metabolic modules that were never reported by commonly used databases such as KEGG [14], MetaCyc [15], and the gut-metabolic/brain modules [17, 18]. This uniqueness positions Enteropathway as a valuable reference database that can complement widely used databases for studies related to the human gut microbiota. The database is updated every year and is accessible through a user-friendly web application.
In addition to pathway exploration, the web application provides users with mapping and enrichment analysis tools. Here, we demonstrated these features by analyzing metagenomic and metabolomic data sets from a Centenarian cohort study. For instance, we comprehensively captured the bile acid metabolism pathway and found a difference in enrichment between the upstream and downstream sections of the pathway. This may reflect a bias in metabolic activity and presents the final compounds, such as AlloLCA and IsoalloLCA, as potential targets for better understanding the differences between age groups.
We further revealed the enrichment of potential metabolic functionality for degrading Oleate and Linoleate in the Centenarian. Previous studies have shown that these compounds are well-known essential fatty acids, which may reduce heart disease mortality [31, 32]. Altogether, our result may be linked to the low potential risk of heart disease in Centenarian [33, 34].
The above shows how the Enteropathway diagram is a powerful tool for finding biological insights at the metabolic level, and we hope it will lead other microbiome researchers to novel insights. Future enhancements include a statistical framework that will enable users to perform enrichment analysis directly from functional profiles. For the moment the user accounts system and the REST API provide different ways to analyze and easily share results from Enteropathway. Taken together, these features make our web application a user-friendly tool for pathway exploration, visual analysis, and statistical analysis.
However, it is important to consider certain limitations. First, 47.6% of Enteropathway reactions correspond to orphan enzymes. Computational approaches for identifying the genes of orphan enzymes, such as E-zyme2 [35], and experimental validation of their predictions are needed to characterize gene-based (meta)genomic data in human gut microbiota studies. Second, modifications of microbiota-derived compounds by human enzymes, such as hepatic enzymes, are in principle not covered in Enteropathway. Some compounds jointly derived from microbes and the host through these modifications have been reported as key risk factors for disease [36–38]. Therefore, focusing on these modifications is important to capture the impact of the gut microbiota on human health and disease.
In conclusion, Enteropathway is a comprehensive metabolic database for the human gut microbiota, curated from a wide range of scientific literature. It offers greater coverage of the metabolic pathways of the human gut microbiota than commonly used reference databases, positioning it as a valuable resource that can complement the most widely used databases for studies in this field. Its web application provides an intuitive interface, allowing users to explore and customize reactions, compounds, and modules on a metabolic pathways diagram. Additionally, its enrichment analysis module enables the identification of potential metabolic functionalities and pathways involved in compound production or degradation. By analyzing publicly available metagenomic and metabolomic datasets with Enteropathway, we identified enrichment patterns in aging-related metabolic pathways. This case study highlights how Enteropathway facilitates the discovery of biological insights through visual and statistical analyses.
Methods
Development of an integrated database for the human gut microbiota
We manually curated ~3000 full-text scientific publications related to the human gut microbiota and, from these, used 1012 publications to develop a metabolic pathway database that integrates compounds, reactions, and modules. These compounds, reactions, and modules were collected from studies of human fecal samples and/or in vitro experiments with human gut bacteria and were reviewed by at least two different curators to ensure accuracy and reliability. Reactions are defined as cellular enzymatic processes that change a compound. Each reaction is assigned an Enteropathway reaction entry ID, along with a name and definition. The substrates and products of each reaction are also assigned Enteropathway compound entry IDs. Modules are defined as sets of multiple reaction processes. Additionally, the database includes module descriptions that highlight the relationship between microbial metabolites, metabolic pathways, and host information, such as dietary patterns and diseases.
Each Enteropathway compound, reaction, and module entry ID has been manually linked to other databases by expert curation. Reaction IDs have been annotated with EC numbers [39], Rhea [40], KEGG Reaction, MetaCyc Reaction [15], UniProt [41], UniRef90, UniRef50, eggNOG [42], and KO IDs. Similarly, compound entry IDs are linked to PubChem [43], CAS, MetaCyc Compound, and KEGG Compound. Module IDs are linked to gut-metabolic/brain modules [17, 18], MetaCyc Pathway, and KEGG Module IDs by cross-referencing the relevant scientific literature, reactions, and compounds in both modules/pathways.
To predict compounds, reactions, and metabolic pathways in each microbial species, we downloaded 3594 representative human gut microbial species genomes and their gene lists [23] and then performed taxonomic assignment and gene annotation. The taxonomic information for these genomes was obtained with GTDB-Tk v2 (version 2.4.0) [44] and the GTDB database (version R220) [24]. Gene annotation in each species was performed with the HMM-based KofamKOALA (version 1.2.0) [45] against KEGG (as of 2024) [14] and with the homology search-based DIAMOND (version 2.1.9) [46] against UniProt (as of 2024) [47] (cutoffs: identity >60, bit score >70, coverage >80). These annotation results were converted to Enteropathway reactions and integrated into the database along with the taxonomic information.
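As an illustration of the annotation cutoffs described above, the following minimal Python sketch filters homology search hits and collects the Enteropathway reactions they support. It is not the authors' pipeline: the `Hit` record, the `target_to_reactions` cross-reference, and all example identifiers are hypothetical, and the sketch assumes that identity and coverage are expressed as percentages and that a single coverage value is reported per hit.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One homology search hit (hypothetical record layout)."""
    gene: str         # query gene ID
    target: str       # matched UniProt accession
    identity: float   # assumed to be percent identity
    bit_score: float
    coverage: float   # assumed to be percent coverage


def passes_cutoffs(hit, min_identity=60.0, min_bit_score=70.0, min_coverage=80.0):
    """Apply the strict cutoffs: identity >60, bit score >70, coverage >80."""
    return (
        hit.identity > min_identity
        and hit.bit_score > min_bit_score
        and hit.coverage > min_coverage
    )


def reactions_for_genome(hits, target_to_reactions):
    """Collect the reactions supported by at least one hit that passes the cutoffs."""
    reactions = set()
    for hit in hits:
        if passes_cutoffs(hit):
            reactions.update(target_to_reactions.get(hit.target, ()))
    return reactions


# Hypothetical example data
example_hits = [
    Hit("gene_1", "P12345", 75.0, 120.0, 95.0),  # passes all cutoffs
    Hit("gene_2", "Q99999", 55.0, 200.0, 99.0),  # fails on identity
    Hit("gene_3", "Q99999", 60.0, 150.0, 90.0),  # identity is not strictly above 60
]
example_xref = {"P12345": ["reaction_demo_1"], "Q99999": ["reaction_demo_2"]}
example_reactions = reactions_for_genome(example_hits, example_xref)
```

In this toy example only the first hit passes, so only the reaction linked to its target is retained.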
Development of the web application
The Enteropathway web application comprises a metabolic pathway diagram in SVG format, metabolic pathway browsers, and an enrichment analysis module. Each module in the diagram is color-coded according to its category, with green representing food, purple for drugs, brown for toxins, blue for secretions, and grey for other categories.
In the metabolic pathway browser, users can dynamically scale and customize the diagram using Enteropathway IDs or IDs from other databases which are matched and highlighted on the diagram. In case several IDs map to the same graphical element of the diagram, only the customizations of the last ID will be reflected on the diagram. Additionally, module customizations override reaction customizations within the diagram.
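The two precedence rules above (the last matching ID wins, and module customizations override reaction customizations) can be sketched as follows. This is only an illustrative model of the described behavior, not the web application's code; the function name, the rule format, and the element and entry identifiers are all hypothetical.

```python
def resolve_styles(reaction_rules, module_rules, id_to_elements, module_to_elements):
    """Return the final style of each diagram element.

    reaction_rules / module_rules: lists of (entry_id, style) in input order.
    id_to_elements / module_to_elements: entry ID -> diagram element IDs.
    """
    styles = {}
    # Rules are applied in input order, so a later ID overwrites an earlier
    # one when both map to the same graphical element.
    for entry_id, style in reaction_rules:
        for element in id_to_elements.get(entry_id, ()):
            styles[element] = style
    # Module rules are applied afterwards, so they override reaction rules.
    for module_id, style in module_rules:
        for element in module_to_elements.get(module_id, ()):
            styles[element] = style
    return styles


# Hypothetical example: two KO IDs share one element, and a module covers another
final_styles = resolve_styles(
    reaction_rules=[("KO_demo_1", {"color": "red"}), ("KO_demo_2", {"color": "blue"})],
    module_rules=[("module_demo_1", {"color": "green"})],
    id_to_elements={"KO_demo_1": ["rxn_1"], "KO_demo_2": ["rxn_1", "rxn_2"]},
    module_to_elements={"module_demo_1": ["rxn_2"]},
)
```

Here `rxn_1` ends up blue because the second ID is listed last, and `rxn_2` ends up green because the module rule takes precedence.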
Several highlight options are available in the customization process. The colors of the metabolic pathway can be changed to gray with the clear original colors button. The size of reactions and compounds can be scaled interactively with the slider. For modules, users can choose to highlight a module if at least one of its reactions is matched. They can additionally highlight the module title box, or highlight the module title box only.

The enrichment analysis module identifies enriched or depleted modules based on a predefined library of reactions. Users provide a list of reactions or KOs as input, which are then linked to the corresponding modules. First, the platform counts the number of user-provided reactions in each module (referred to as A). Second, it counts the total number of user-provided reactions (referred to as B). Third, it counts the number of reactions present in each module (referred to as C). Fourth, it counts the number of reactions present in all modules (referred to as D). Finally, the platform compares the proportion of A to B with the proportion of C to D for each module using the hypergeometric test from the SciPy package in Python.
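The counts A, B, C, and D map naturally onto a hypergeometric model with population size D, C successes in the population, B draws, and A observed successes. The sketch below computes both tails with the standard library only, so that it runs without dependencies; with SciPy, the enrichment tail should correspond to `scipy.stats.hypergeom.sf(A - 1, D, C, B)`. The function names and example identifiers are hypothetical, and the sketch assumes that B counts only the user reactions that occur in at least one module, which the description above does not specify.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(X = k) when drawing `draws` items without replacement."""
    lowest = max(0, draws - (population - successes))
    highest = min(draws, successes)
    if k < lowest or k > highest:
        return 0.0
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )


def module_enrichment(user_reactions, modules):
    """Return {module_id: (A, B, C, D, p_enriched, p_depleted)}."""
    universe = set().union(*modules.values())  # reactions present in any module
    hits = set(user_reactions) & universe      # assumption: ignore unknown reactions
    B, D = len(hits), len(universe)
    results = {}
    for module_id, reactions in modules.items():
        reactions = set(reactions)
        A, C = len(hits & reactions), len(reactions)
        # Upper tail P(X >= A) for enrichment, lower tail P(X <= A) for depletion
        p_enriched = sum(hypergeom_pmf(k, D, C, B) for k in range(A, min(B, C) + 1))
        p_depleted = sum(hypergeom_pmf(k, D, C, B) for k in range(0, A + 1))
        results[module_id] = (A, B, C, D, p_enriched, p_depleted)
    return results


# Hypothetical toy library: 10 reactions split over two modules
toy_modules = {
    "module_demo_1": ["R1", "R2", "R3"],
    "module_demo_2": ["R4", "R5", "R6", "R7", "R8", "R9", "R10"],
}
toy_results = module_enrichment(["R1", "R2", "R3", "R_unknown"], toy_modules)
```

In the toy example, all three recognized user reactions fall in the first module, so its enrichment P-value is 1/120, the probability of drawing exactly those three reactions out of ten.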
Publicly available metagenomic and metabolomic data sets
We collected the publicly available metagenomic and metabolomic data sets used in Sato et al. [30] to analyze them with Enteropathway. In that study, 330 fecal samples were collected from 319 participants and classified into three groups according to age: (i) Centenarian (average age: 107 years old; 176 fecal samples derived from 160 participants), (ii) Older (85–89 years old; 110 fecal samples derived from 112 participants), and (iii) Young (21–55 years old; 44 fecal samples derived from 47 participants). Samples derived from participants who underwent antibiotic treatment or had insufficient bacterial DNA yield had already been excluded.
For the metagenomic data sets, we downloaded 330 metagenomic samples from the National Center for Biotechnology Information Sequence Read Archive (NCBI SRA, accession number: PRJNA675598). These samples were generated by Illumina NovaSeq 6000 with a 151 bp paired-end sequence (Number of paired-end reads per sample: 10 million).
The metabolomic data sets were obtained from the supplementary material of Sato et al [30]. In this study, 43 fecal bile acids were quantified from 297 metabolomic samples by liquid chromatography–tandem mass spectrometry (LC–MS/MS).
Functional profiling
The functional profiles were generated by HUMAnN version 3.0.0 [48]. In brief, metagenomic reads were mapped to the detected-species-specific pangenomes per sample, and unmapped reads were annotated by homology search against UniRef90 to derive UniRef90 gene family abundances in copies per million (CPM) units.
Subsequently, we obtained KO and MetaCyc Reaction profiles using the humann_regroup_table module from HUMAnN (--groups uniref90_ko and --groups uniref90_rxn), which maps UniRef90 gene families to KO or MetaCyc Reaction and then sums the abundance of mapped families to obtain the abundance profiles. Finally, we normalized the profiles using the humann_renorm_table module with the parameter --units relab to obtain relative abundances.
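Conceptually, the regrouping and renormalization steps amount to summing gene family abundances per functional group and then dividing by the total. The following Python sketch illustrates that arithmetic on a single sample. It is not HUMAnN code: the function names and identifiers are hypothetical, unmapped families are simply dropped, and a family mapped to several groups is assumed to contribute its full abundance to each of them.

```python
from collections import defaultdict


def regroup(gene_family_abundance, family_to_groups):
    """Sum gene family abundances per functional group (e.g. KO)."""
    grouped = defaultdict(float)
    for family, abundance in gene_family_abundance.items():
        # Families without a mapping are skipped in this simplified sketch
        for group in family_to_groups.get(family, ()):
            grouped[group] += abundance
    return dict(grouped)


def renorm_relab(profile):
    """Convert abundances to relative abundances that sum to 1."""
    total = sum(profile.values())
    if total == 0:
        return {feature: 0.0 for feature in profile}
    return {feature: value / total for feature, value in profile.items()}


# Hypothetical CPM values for three gene families in one sample
cpm = {"family_A": 30.0, "family_B": 10.0, "family_C": 5.0}
mapping = {"family_A": ["KO_demo_1"], "family_B": ["KO_demo_1", "KO_demo_2"]}

ko_profile = regroup(cpm, mapping)
ko_relab = renorm_relab(ko_profile)
```

In this example `family_C` has no mapping and is ignored, so the two groups receive 40 and 10 CPM before normalization.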
Statistical analysis
A Wilcoxon rank-sum test was performed to identify Centenarian-specific metabolites, KOs, and MetaCyc Reactions by comparing Centenarian with Older and Centenarian with Young. The Centenarian-specific modules were identified by a hypergeometric test using the enrichment analysis module of the Enteropathway web application. To account for multiple testing, P-values were corrected using the Benjamini-Hochberg false-discovery rate procedure to obtain Q-values. Statistical significance was defined as Q < 0.1.
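For readers unfamiliar with the correction, the Benjamini-Hochberg procedure can be written in a few lines of Python. This is a generic textbook sketch, not the code used in the study; the function name and the example P-values are illustrative only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted Q-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone Q-values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Illustrative P-values for four features
example_p = [0.01, 0.04, 0.03, 0.5]
example_q = benjamini_hochberg(example_p)
significant = [q < 0.1 for q in example_q]
```

With these four P-values, the first three features remain significant at Q < 0.1 and the last does not.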
Pathway characterization
To characterize Centenarian-specific reactions, compounds, and modules on the metabolic pathway, we applied four steps to integrate the statistical results derived from comparing KO and MetaCyc Reaction relative abundances and metabolite concentrations between Centenarian and Older, and between Centenarian and Young. First, features that were significant in either the Centenarian versus Older or the Centenarian versus Young comparison were characterized as Centenarian-specific KOs, MetaCyc Reactions, or compounds. Second, Centenarian-specific KOs and MetaCyc Reactions were converted to Enteropathway reactions. Centenarian-specific compounds were also converted to Enteropathway compounds. Third, Centenarian-specific modules were identified by enrichment analysis in the Enteropathway web application. Lastly, Enteropathway reactions, compounds, and modules were highlighted on the pathway by the web application with the options --module_box_highlight box-only --Clear_original_colors true.
Centenarian-specific Enteropathway reactions and compounds were converted to KEGG reactions and compounds, respectively, and then mapped onto map00121 in the KEGG pathway using KEGG Mapper [26] to compare customization results between KEGG and Enteropathway.
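The first two steps can be expressed as simple set operations. The sketch below is a hypothetical illustration rather than the analysis code: it assumes Q-values are stored per feature for each comparison, and the `cross_reference` dictionary stands in for the mapping between external IDs and Enteropathway IDs.

```python
def significant_in_either(q_vs_older, q_vs_young, threshold=0.1):
    """Step 1: keep features with Q < threshold in at least one comparison."""
    features = set(q_vs_older) | set(q_vs_young)
    return {
        feature
        for feature in features
        if q_vs_older.get(feature, 1.0) < threshold
        or q_vs_young.get(feature, 1.0) < threshold
    }


def convert_ids(features, cross_reference):
    """Step 2: map external IDs to Enteropathway IDs, skipping unmapped ones."""
    converted = set()
    for feature in features:
        converted.update(cross_reference.get(feature, ()))
    return converted


# Hypothetical Q-values for four KOs in the two comparisons
q_older = {"KO_1": 0.05, "KO_2": 0.50, "KO_3": 0.20}
q_young = {"KO_2": 0.01, "KO_3": 0.30, "KO_4": 0.09}

specific_kos = significant_in_either(q_older, q_young)
specific_reactions = convert_ids(
    specific_kos, {"KO_1": ["rxn_a"], "KO_2": ["rxn_a", "rxn_b"]}
)
```

In this example `KO_3` is not significant in either comparison and is dropped, while `KO_4` is retained in step 1 but contributes no reaction because it has no cross-reference.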
Key Points
We proposed a comprehensive metabolic database for human gut microbiota, integrating compounds, reactions, and metabolic pathways derived from manually curated scientific literature.
Approximately 80% of the metabolic pathways in our database are new entries that cannot be linked to commonly used databases in this field, such as KEGG, MetaCyc, and the gut-specific metabolic and gut-brain modules.
The database is accessible via a web application that provides an interactive and customizable metabolic pathways diagram, along with an integrated enrichment analysis feature for highlighting enriched pathways on the metabolic diagram.
Using our database to analyze publicly available metagenomic and metabolomic datasets, we successfully identified enrichment patterns in aging-related metabolic pathways.
Supplementary Material
Acknowledgements
We are thankful to Dr. H. Mori for inspiring discussions.
Contributor Information
Hirotsugu Shiroma, School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 M6-3 Ookayama, Meguro-ku, Tokyo 152-8550, Japan.
Youssef Darzi, School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 M6-3 Ookayama, Meguro-ku, Tokyo 152-8550, Japan; Omixer solutions, 4-7-15, Zaimokuza, Kamakura-shi, Kanagawa 248-0013, Japan.
Etsuko Terajima, School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 M6-3 Ookayama, Meguro-ku, Tokyo 152-8550, Japan.
Zenichi Nakagawa, School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 M6-3 Ookayama, Meguro-ku, Tokyo 152-8550, Japan.
Hirotaka Tsuchikura, School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 M6-3 Ookayama, Meguro-ku, Tokyo 152-8550, Japan.
Naoki Tsukuda, School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 M6-3 Ookayama, Meguro-ku, Tokyo 152-8550, Japan.
Yuki Moriya, Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, 178-4-4 Wakashiba, Kashiwa-shi, Chiba 277-0871, Japan.
Shujiro Okuda, Graduate School of Medical and Dental Sciences, Niigata University, 2-5274, Gakkocho-dori, Chuo-ku, Niigata City, Niigata 951-8514, Japan.
Susumu Goto, Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, 178-4-4 Wakashiba, Kashiwa-shi, Chiba 277-0871, Japan.
Takuji Yamada, School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 M6-3 Ookayama, Meguro-ku, Tokyo 152-8550, Japan; Metagen, Inc., 246-2 Mizukami, Kakuganji, Tsuruoka, Yamagata 997-0052, Japan; Metagen Therapeutics, Inc., 246-2 Mizukami, Kakuganji, Tsuruoka, Yamagata 997-0052, Japan; Digzyme, Inc., 2-2-1 Toranomon, Minato-ku, Tokyo 105-0001, Japan.
Author contributions
T.Y., S.G., S.O., Y.M., N.T., and H.S. contributed to the study concept and design. E.T., Z.N., and N.T. manually curated scientific literature to develop the database. Y.D., H.S., and H.T. developed the web application. H.S. performed bioinformatics analyses. H.S., T.Y., Y.D., N.T., E.T., S.G., and Z.N. wrote the manuscript. T.Y. supervised the study. All authors read and approved the final manuscript.
Conflict of interest
T.Y. is a founder of Metagen Inc., Metagen Therapeutics Inc., and digzyme Inc. Metagen Inc. focuses on the design and control of the intestinal environment for human health in terms of both disease treatment and disease prevention. Metagen Therapeutics Inc. is working on the development of the intestinal microbiota bank to effectively implement fecal microbiota transplantation. digzyme Inc is focused on discovering enzymes through a bioinformatics approach. Y.D. is the founder of Omixer Solutions which develops and provides bioinformatics services and consulting.
Funding
This work was supported by grants from the JST AIP Acceleration Research (JPMJCR19U3 to T.Y.), the Japan Society for the Promotion of Science (KAKENHI JP16H06279 (PAGS); JP25710016 (Grant-in-Aid for Young Scientists A) to T.Y.), the Japan Agency for Medical Research and Development (JP21ck0106546h0002 to T.Y.; JP21cm0106477 to T.Y.; JP22ama221404 to T.Y.), and the National Cancer Center Research and Development Fund (2020-A-7 to T.Y.).
Data availability
Nucleotide sequences of the centenarian cohort from Sato et al. [30] are available in the NCBI SRA under accession PRJNA675598. The metadata and metabolomic profiles for these samples are available in the Supplementary Table of Sato et al. The database contents (compounds, reactions, and modules) can be downloaded as a tab-separated file after logging in to the web application. Alternatively, all information in the database can be accessed without user registration through the individual entry pages of the web application.
References
- 1. Fan Y, Pedersen O. Gut microbiota in human metabolic health and disease. Nat Rev Microbiol 2021;19:55–71. 10.1038/s41579-020-0433-9.
- 2. Jia W, Xie G, Jia W. Bile acid-microbiota crosstalk in gastrointestinal inflammation and carcinogenesis. Nat Rev Gastroenterol Hepatol 2018;15:111–28. 10.1038/nrgastro.2017.119.
- 3. Yoshimoto S, Loo TM, Atarashi K. et al. Obesity-induced gut microbial metabolite promotes liver cancer through senescence secretome. Nature 2013;499:97–101. 10.1038/nature12347.
- 4. Hang S, Paik D, Yao L. et al. Bile acid metabolites control TH17 and T cell differentiation. Nature 2019;576:143–8. 10.1038/s41586-019-1785-z.
- 5. Paik D, Yao L, Zhang Y. et al. Human gut bacteria produce TH17-modulating bile acid metabolites. Nature 2022;603:907–12. 10.1038/s41586-022-04480-z.
- 6. Ikeyama N, Murakami T, Toyoda A. et al. Microbial interaction between the succinate-utilizing bacterium Phascolarctobacterium faecium and the gut commensal Bacteroides thetaiotaomicron. Microbiology 2020;9:e1111.
- 7. Doden HL, Wolf PG, Gaskins HR. et al. Completion of the gut microbial epi-bile acid pathway. Gut Microbes 2021;13:1–20. 10.1080/19490976.2021.1907271.
- 8. Needham BD, Funabashi M, Adame MD. et al. A gut-derived metabolite alters brain activity and anxiety behaviour in mice. Nature 2022;602:647–53. 10.1038/s41586-022-04396-8.
- 9. Wolf PG, Cowley ES, Breister A. et al. Diversity and distribution of sulfur metabolic genes in the human gut microbiome and their association with colorectal cancer. Microbiome 2022;10:64.
- 10. Javdan B, Lopez JG, Chankhamjon P. et al. Personalized mapping of drug metabolism by the human gut microbiome. Cell 2020;181:1661–1679.e22. 10.1016/j.cell.2020.05.001.
- 11. Henke MT, Kenny DJ, Cassilly CD. et al. A member of the human gut microbiome associated with Crohn’s disease, produces an inflammatory polysaccharide. Proc Natl Acad Sci U S A 2019;116:12672–7. 10.1073/pnas.1904099116.
- 12. Devlin AS, Fischbach MA. A biosynthetic pathway for a prominent class of microbiota-derived bile acids. Nat Chem Biol 2015;11:685–90. 10.1038/nchembio.1864.
- 13. Pascal Andreu V, Augustijn HE, Chen L. et al. gutSMASH predicts specialized primary metabolic pathways from the human gut microbiota. Nat Biotechnol 2023;41:1416–23.
- 14. Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 2000;28:27–30.
- 15. Caspi R, Billington R, Keseler IM. et al. The MetaCyc database of metabolic pathways and enzymes - a 2019 update. Nucleic Acids Res 2020;48:D445–53. 10.1093/nar/gkz862.
- 16. Darzi Y, Falony G, Vieira-Silva S. et al. Towards biome-specific analysis of meta-omics data. ISME J 2016;10:1025–8. 10.1038/ismej.2015.188.
- 17. Vieira-Silva S, Falony G, Darzi Y. et al. Species-function relationships shape ecological properties of the human gut microbiome. Nat Microbiol 2016;1:16088.
- 18. Valles-Colomer M, Falony G, Darzi Y. et al. The neuroactive potential of the human gut microbiota in quality of life and depression. Nat Microbiol 2019;4:623–32. 10.1038/s41564-018-0337-x.
- 19. Liu X, Tong X, Zou Y. et al. Mendelian randomization analyses support causal relationships between blood metabolites and the gut microbiome. Nat Genet 2022;54:52–61. 10.1038/s41588-021-00968-y.
- 20. Fromentin S, Forslund SK, Chechi K. et al. Microbiome and metabolome features of the cardiometabolic disease spectrum. Nat Med 2022;28:303–14. 10.1038/s41591-022-01688-4.
- 21. Schmidt TSB, Li SS, Maistrenko OM. et al. Drivers and determinants of strain dynamics following fecal microbiota transplantation. Nat Med 2022;28:1902–12. 10.1038/s41591-022-01913-0.
- 22. Dekkers KF, Sayols-Baixeras S, Baldanzi G. et al. An online atlas of human plasma metabolite signatures of gut microbiome composition. Nat Commun 2022;13:5370.
- 23. Leviatan S, Shoer S, Rothschild D. et al. An expanded reference map of the human gut microbiome reveals hundreds of previously unknown species. Nat Commun 2022;13:3863.
- 24. Parks DH, Chuvochina M, Rinke C. et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res 2022;50:D785–94. 10.1093/nar/gkab776.
- 25. Darzi Y, Letunic I, Bork P. et al. iPath3.0: interactive pathways explorer v3. Nucleic Acids Res 2018;46:W510–3. 10.1093/nar/gky299.
- 26. Kanehisa M, Sato Y, Kawashima M. KEGG mapping tools for uncovering hidden features in biological data. Protein Sci 2022;31:47–53. 10.1002/pro.4172.
- 27. Bosco N, Noti M. The aging gut microbiome and its impact on host immunity. Genes Immun 2021;22:289–303. 10.1038/s41435-021-00126-8.
- 28. Ragonnaud E, Biragyn A. Gut microbiota as the key controllers of ‘healthy’ aging of elderly people. Immun Ageing 2021;18:2.
- 29. Ghosh TS, Shanahan F, O’Toole PW. The gut microbiome as a modulator of healthy ageing. Nat Rev Gastroenterol Hepatol 2022;19:565–84. 10.1038/s41575-022-00605-x.
- 30. Sato Y, Atarashi K, Plichta DR. et al. Novel bile acid biosynthetic pathways are enriched in the microbiome of centenarians. Nature 2021;599:458–64. 10.1038/s41586-021-03832-5.
- 31. Farvid MS, Ding M, Pan A. et al. Dietary linoleic acid and risk of coronary heart disease: a systematic review and meta-analysis of prospective cohort studies. Circulation 2014;130:1568–78. 10.1161/CIRCULATIONAHA.114.010236.
- 32. Naghshi S, Aune D, Beyene J. et al. Dietary intake and biomarkers of alpha linolenic acid and risk of all cause, cardiovascular, and cancer mortality: systematic review and dose-response meta-analysis of cohort studies. BMJ 2021;375:n2213.
- 33. Evert J, Lawler E, Bogan H. et al. Morbidity profiles of centenarians: survivors, delayers, and escapers. J Gerontol A Biol Sci Med Sci 2003;58:232–7.
- 34. Hirata T, Arai Y, Yuasa S. et al. Associations of cardiovascular biomarkers and plasma albumin with exceptional survival to the highest ages. Nat Commun 2020;11:3820.
- 35. Moriya Y, Yamada T, Okuda S. et al. Identification of enzyme genes using chemical structure alignments of substrate-product pairs. J Chem Inf Model 2016;56:510–6. 10.1021/acs.jcim.5b00216.
- 36. Kikuchi K, Saigusa D, Kanemitsu Y. et al. Gut microbiome-derived phenyl sulfate contributes to albuminuria in diabetic kidney disease. Nat Commun 2019;10:1835.
- 37. Brunt VE, Gioscia-Ryan RA, Casso AG. et al. Trimethylamine-N-oxide promotes age-related vascular oxidative stress and endothelial dysfunction in mice and healthy humans. Hypertension 2020;76:101–12. 10.1161/HYPERTENSIONAHA.120.14759.
- 38. Liu W-C, Tomino Y, Lu K-C. Impacts of indoxyl sulfate and p-cresol sulfate on chronic kidney disease and mitigating effects of AST-120. Toxins 2018;10:367.
- 39. Chang A, Jeske L, Ulbrich S. et al. BRENDA, the ELIXIR core data resource in 2021: new developments and updates. Nucleic Acids Res 2021;49:D498–508. 10.1093/nar/gkaa1025.
- 40. Bansal P, Morgat A, Axelsen KB. et al. Rhea, the reaction knowledgebase in 2022. Nucleic Acids Res 2022;50:D693–700. 10.1093/nar/gkab1016.
- 41. UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res 2021;49:D480–9.
- 42. Huerta-Cepas J, Szklarczyk D, Heller D. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res 2019;47:D309–14. 10.1093/nar/gky1085.
- 43. Kim S, Chen J, Cheng T. et al. PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res 2021;49:D1388–95. 10.1093/nar/gkaa971.
- 44. Chaumeil P-A, Mussig AJ, Hugenholtz P. et al. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics 2022;38:5315–6. 10.1093/bioinformatics/btac672.
- 45. Aramaki T, Blanc-Mathieu R, Endo H. et al. KofamKOALA: KEGG ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 2020;36:2251–2. 10.1093/bioinformatics/btz859.
- 46. Buchfink B, Reuter K, Drost H-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods 2021;18:366–8. 10.1038/s41592-021-01101-x.
- 47. UniProt Consortium. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res 2023;51:D523–31.
- 48. Beghini F, McIver LJ, Blanco-Míguez A. et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. Elife 2021;10:e65088.