Community Curation of Microbial Metabolites Enables Biological Insights of Metabolomics Data

Helena Mannochio-Russo; Wilhan D Gonçalves Nunes; Shipei Xing; Fernanda de Oliveira; Andrés Mauricio Caraballo-Rodríguez; Paulo Wender Portal Gomes; Vincent Charron-Lamoureux; Julius Agongo; Nicole E Avalon; Tammy Bui; Lucia Cancelada; Marc G Chevrette; Andrés Cumsille; Moysés B de Araújo, Júnior; Marilyn De Graeve; Victoria Deleray; Mohamed S Donia; Mutsawashe B Dzveta; Yasin El Abiead; Ronald J Ellis; Donald Franklin, Jr; Neha Garg; Harsha Gouda; Claude Y Hamany Djande; Anastasia Hiskia; Benjamin N Ho; Chambers C Hughes; Sunghoon Hwang; Sofia Iliakopoulou; Jennifer E Iudicello; Alan K Jarmusch; Triantafyllos Kaloudis; Irina Koester; Robert Konkel; Hector H F Koolen; Kine Eide Kvitne; Sabina Leanti La Rosa; Anny Lam; Santosh Lamichhane; Motseoa Lephatsi; Scott Letendre; Sarolt Magyari; Hanna Mazur-Marzec; Daniel McDonald; Ipsita Mohanty; Mónica Monge-Loría; David J Moore; Thiago André Moura Veiga; Musiwalo S Mulaudzi; Lerato Nephali; Griffith Nguyen; Martin Orságh; Abubaker Patan; Tomáš Pluskal; Phillip B Pope; Lívia Soman de Medeiros; Paolo Stincone; Andrej Tekel; Sydney Thomas; Ralph R Torres; Shirley M Tsunoda; Fidele Tugizimana; Martijn van Faassen; Felipe Vasquez-Castro; Giovanni A Vitale; Berenike C Wagner; Crystal X Wang; Sevasti-Kiriaki Zervou; Haoqi Nina Zhao; Simone Zuffa; Daniel Petras; Laura-Isobel McCall; Rob Knight; Mingxun Wang; Pieter C Dorrestein

doi:10.64898/2026.01.24.701521

This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2026 Jan 26:2026.01.24.701521. [Version 1] doi: 10.64898/2026.01.24.701521

Helena Mannochio-Russo ¹,†,*, Wilhan D Gonçalves Nunes ¹,†, Shipei Xing ¹, Fernanda de Oliveira ¹,², Andrés Mauricio Caraballo-Rodríguez ¹, Paulo Wender Portal Gomes ¹,³, Vincent Charron-Lamoureux ¹, Julius Agongo ¹, Nicole E Avalon ⁴,⁵,⁶, Tammy Bui ¹, Lucia Cancelada ⁷,⁸, Marc G Chevrette ⁹,¹⁰, Andrés Cumsille ⁹,¹⁰, Moysés B de Araújo Júnior ¹¹,¹², Marilyn De Graeve ¹,¹³, Victoria Deleray ¹, Mohamed S Donia ¹⁴, Mutsawashe B Dzveta ¹⁵, Yasin El Abiead ¹, Ronald J Ellis ¹⁶,¹⁷, Donald Franklin Jr ¹⁸,¹⁷, Neha Garg ¹⁹,²⁰, Harsha Gouda ¹, Claude Y Hamany Djande ¹⁵, Anastasia Hiskia ²¹, Benjamin N Ho ¹, Chambers C Hughes ²²,²³,²⁴, Sunghoon Hwang ¹⁴, Sofia Iliakopoulou ²⁵, Jennifer E Iudicello ¹⁸,¹⁷, Alan K Jarmusch ¹,²⁶, Triantafyllos Kaloudis ²⁵,²¹, Irina Koester ⁷,²⁷, Robert Konkel ²⁸, Hector H F Koolen ¹¹, Kine Eide Kvitne ¹, Sabina Leanti La Rosa ²⁹, Anny Lam ¹, Santosh Lamichhane ³⁰, Motseoa Lephatsi ¹⁵, Scott Letendre ¹⁷,³¹, Sarolt Magyari ¹,³², Hanna Mazur-Marzec ²⁸, Daniel McDonald ³³, Ipsita Mohanty ¹, Mónica Monge-Loría ¹⁹, David J Moore ¹⁸,¹⁷, Thiago André Moura Veiga ³⁴, Musiwalo S Mulaudzi ¹⁵, Lerato Nephali ³⁵, Griffith Nguyen ¹, Martin Orságh ³⁶,³⁷, Abubaker Patan ¹, Tomáš Pluskal ³⁶, Phillip B Pope ²⁹,³⁸, Lívia Soman de Medeiros ³⁴, Paolo Stincone ³⁹, Andrej Tekel ³⁶,³⁷, Sydney Thomas ¹, Ralph R Torres ⁷, Shirley M Tsunoda ¹, Fidele Tugizimana ¹⁵, Martijn van Faassen ⁴⁰, Felipe Vasquez-Castro ¹, Giovanni A Vitale ³⁹, Berenike C Wagner ³⁹, Crystal X Wang ¹⁸,¹⁷, Sevasti-Kiriaki Zervou ²¹, Haoqi Nina Zhao ¹, Simone Zuffa ¹, Daniel Petras ³⁹,⁴¹, Laura-Isobel McCall ⁴², Rob Knight ³³,⁴³,⁴⁴,⁴⁵,⁴⁶,⁴⁷, Mingxun Wang ⁴⁸, Pieter C Dorrestein ¹,⁴⁹,⁵⁰,⁵¹,*

¹Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA

²Department of Biotechnology, Engineering School of Lorena, University of São Paulo, Lorena, São Paulo State, Brazil

³Amazon Integrated Metabolomics Center (CIMAZON), Institute of Natural and Exact Sciences, Federal University of Pará, Rua Augusto Corrêa, 01 - Guamá 66075-110, Belém, PA, Brazil

⁴Department of Pharmaceutical Sciences, School of Pharmacy & Pharmaceutical Sciences, University of California, Irvine, Irvine, CA, USA

⁵Robert A

⁶Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA

⁷Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA

⁸Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, USA

⁹Department of Plant Pathology, University of Wisconsin-Madison, Madison, WI 53706, USA

¹⁰Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53706, USA

¹¹Grupo de Pesquisa em Metabolômica e Espectrometria de Massas, Universidade do Estado do Amazonas, 69065-001 Manaus-AM, Brazil

¹²Instituto de Ciências Exatas e Tecnologia, Universidade Federal do Amazonas, 69103-128 Itacoatiara-AM, Brazil

¹³Laboratory of Integrative Metabolomics, Department of Translational Physiology, Infectiology and Public Health, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium

¹⁴Department of Molecular Biology, Princeton University, Princeton, NJ, USA

¹⁵Research Centre for Plant Metabolomics, Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa

¹⁶Department of Neurosciences, University of California San Diego, San Diego, CA 92093, USA

¹⁷HIV Neurobehavioral Research Program, University of California San Diego, San Diego, CA 92093, USA

¹⁸Department of Psychiatry, University of California San Diego, San Diego, CA 92093, USA

¹⁹School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States

²⁰Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, Georgia 30332, United States

²¹Institute of Nanoscience & Nanotechnology, NCSR Demokritos, Athens, Greece

²²Department of Microbial Bioactive Compounds, Interfaculty Institute of Microbiology and Infection Medicine (IMIT), University of Tübingen, 72076 Tübingen, Germany

²³Cluster of Excellence EXC 2124: Controlling Microbes to Fight Infection, University of Tübingen, 72076 Tübingen, Germany

²⁴German Center for Infection Research (DZIF), Partner Site Tübingen, 72706 Tübingen, Germany

²⁵AquOmixLab, Department of Water Quality Control, Athens Water Supply & Sewerage Company (EYDAP SA), Athens, Greece

²⁶National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA

²⁷Woods Hole Oceanographic Institution, Woods Hole, MA, USA

²⁸Department of Marine Biology and Biotechnology, Faculty of Oceanography and Geography, University of Gdańsk, Gdynia, Poland

²⁹Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, 1432 Ås, Norway

³⁰Institute of Biomedicine, Faculty of Medicine, & Turku Clinical Microbiome Bank, Clinical Microbiology & Microbe Centre, Turku University Hospital and University of Turku and Wellbeing Services County of Southwest Finland, Turku, Finland

³¹Department of Medicine, University of California San Diego, La Jolla, CA, USA

³²Department of Chemistry, Simon Fraser University, 8888 University Drive, Burnaby, Canada

³³Department of Pediatrics, University of California San Diego, La Jolla, CA, USA

³⁴Institute of Environmental, Chemical and Pharmaceutical Sciences, Department of Chemistry, Federal University of São Paulo, Diadema, 09972-270, Brazil

³⁵Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa

³⁶Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic

³⁷Department of Physical and Macromolecular Chemistry, Faculty of Science, Charles University, Albertov 6, 120 00 Prague 2, Czech Republic

³⁸The Centre for Microbiome Research, Queensland University of Technology, 4102, Woolloongabba, Australia

³⁹University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany

⁴⁰Department of Laboratory Medicine, University of Groningen, University Medical Center Groningen, 9713 GZ Groningen, the Netherlands

⁴¹Department of Biochemistry, University of California Riverside, Riverside, CA, USA

⁴²Department of Chemistry and Biochemistry, San Diego State University, San Diego, California, USA

⁴³Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA

⁴⁴Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA

⁴⁵Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA

⁴⁶Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA

⁴⁷Hong Kong University of Science and Technology Jockey Club Institute for Advanced Study, Hong Kong SAR, China

⁴⁸Department of Computer Science and Engineering, University of California Riverside, Riverside, CA, USA

⁴⁹Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, 92093, USA

⁵⁰Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA

⁵¹Department of Pharmacology, University of California San Diego, La Jolla, CA, 92093, USA

†These authors contributed equally

Author contributions:

H.M.-R., M.W., and P.C.D. conceptualized the project.

W.D.G.N., S.X., and M.W. developed the CMMC enrichment workflow and its integration to the GNPS2 environment.

H.M.-R. and W.D.G.N. developed the CMMC-Dashboard MetaboApp.

H.M.-R., W.D.G.N., V.C.-L., M.D.G. and M.V.F. created documentation.

H.M.-R., W.D.G.N., F.O., A.M.C.-R., P.W.P.G., J.A., N.E.A., T.B., L.C., M.G.C., A.C.M., M.B.A.J., M.D.G., V.D., C.Y.H.D., M.D., M.B.D., K.E.K., Y.E.A., N.G., H.G., B.H., S.H., S.I., T.K., I.K., R.K., H.H.F.K., A.L., S.L., S.L.L.R., P.B.P., M.M.L., S.M., H.M.-M., I.M., M.M.-L., T.A.M.V., M.S.M., L.N., G.N., M.O., A.P., T.P., L.S.M., P.S., A.T., S.T., R.R.T., S.M.T., F.T., M.V.F., F.V.-C., G.A.V., B.W., C.X.W., H.N.Z., S.Z., D.P. contributed to the CMMC knowledgebase.

H.M.-R., W.D.G.N., S.I., R.K., T.K., S.-K.Z., A.H., H.M.-M., L.-I.M., M.M., and N.G. contributed to use cases.

A.K.J., D.M., and R.K. supervised sample handling and acquired data from the American Gut Project cohort.

R.J.E., D.F.Jr., J.E.I., S.L., and D.J.M. developed the clinical cohorts of human immunodeficiency virus (HIV) infection.

W.D.G.N. and M.W. developed the deposition algorithm and the knowledgebase interface.

P.C.D. acquired funding and supervised this project.

H.M.-R., W.D.G.N., and P.C.D. wrote the manuscript. All authors reviewed and edited the manuscript.

Correspondence: Helena Mannochio-Russo (hmannochiorusso@health.ucsd.edu) or Pieter C. Dorrestein (pdorrestein@health.ucsd.edu)

PMCID: PMC12874021 PMID: 41659575

Abstract

Microbial metabolites play a critical role in regulating ecosystems, including the human body and its microbiota. However, understanding the physiologically relevant role of these molecules, especially through liquid chromatography tandem mass spectrometry (LC-MS/MS)-based untargeted metabolomics, poses significant challenges and often requires manual parsing of a large amount of literature, databases, and webpages. To address this gap, we established the Collaborative Microbial Metabolite Center knowledgebase (CMMC-KB), a platform that fosters collaborative efforts within the scientific community to curate knowledge about microbial metabolites. The CMMC-KB aims to collect comprehensive information about microbial molecules originating from microbial biosynthesis, drug metabolism, exposure-related molecules, food, host-derived molecules, and, whenever available, their known activities. Molecules from other sources, including host-produced, dietary, and pharmaceutical compounds, are also included. By enabling direct integration of this knowledgebase with downstream analytical tools, including molecular networking, we can deepen insights into microbiota and their metabolites, ultimately advancing our understanding of microbial ecosystems.

Introduction:

Of the thousands of molecules detectable by liquid chromatography-tandem mass spectrometry (LC-MS/MS) in typical biospecimens, the host-associated microbiome modifies 15–70% of them depending on the specific organ or biofluid analyzed¹. In a typical untargeted metabolomics profile from humans, only about 10% of the acquired spectra can be annotated, and among these, an even smaller portion can be directly traced to microbial origins. Humans have three major sources of microbial metabolites: 1) microbial metabolism of host-derived metabolites²; 2) microbial metabolism of molecules from food and beverages³; and 3) microbial metabolites assembled de novo using proteins encoded by genetic elements often arranged as gene clusters (in bacteria, archaea, fungi, and, as recently discovered, widely in phages)⁴. Additionally, microbial metabolites found in humans originate from microbial processing of xenobiotics other than food, such as plasticizers, pollutants, medications, and environmental molecules absorbed through the skin or inhaled⁵⁻⁷.

Despite the critical importance of microbiome-derived metabolites to human health – including those involved in the microbe-gut-brain⁸ and microbe-diet-host axes⁹ – and other ecosystems, there is no centralized knowledgebase where the scientific community can deposit, curate, access, and reuse that knowledge. Existing resources have assessed how the microbiome influences the consumption and production of about 900 largely primary microbial metabolites¹⁰, and have compiled literature-curated information about 3,269 microbiome-derived metabolites¹¹, but most of these metabolites are not unique to microorganisms or have been curated from metabolic models, which tend to capture mainly primary metabolism rather than specialized metabolites, which can be more biologically relevant for host-microbiome interactions¹²,¹³. In addition, some targeted commercial metabolite platforms claim to capture up to ~140 microbial molecules, but many of those could also come from diet or the host, highlighting how difficult it remains to accurately assign metabolites to the microbiome¹⁴. microbeMASST, our recent tool for searching fragmentation spectra against a reference microbial metabolomics database, directly connects bacteria and fungi to the microbially-derived molecules they produce¹⁵. However, microbial metabolites that have been recently discovered (or are yet to be discovered), the organisms and the genes responsible for their production, and/or their related activities, are not yet systematically cataloged. Therefore, reusing this information is a bottleneck for the community that aims to mechanistically understand the microbiome.

To complement existing microbial metabolite resources, as well as to enable annotation of structurally uncharacterized metabolites (captured as MS/MS spectra), we have created the Collaborative Microbial Metabolite Center knowledgebase (CMMC-KB). Leveraging the Global Natural Product Social Molecular Networking (GNPS)¹⁶ mass spectrometry analysis ecosystem, the CMMC-KB enables collaboration within the scientific community to curate knowledge on microbial metabolites or metabolites that might influence microbial metabolites (drugs, food, etc). The goal of this initiative is to facilitate biological interpretations of microbiome-derived molecules. With downstream molecular networking integration, the CMMC-KB allows users to visualize MS/MS spectra of compounds classified as microbial metabolites within molecular networks (grouped by MS/MS spectral similarity), even if their structures remain unknown. Furthermore, it provides information on microbial producers, the sources of the molecules, associated genes or sequences, and their biological activities, if known. For a broader investigation of the metabolome, molecules from other sources, such as endogenous molecules, compounds ingested through diet, and drugs, among others, are also included as part of this resource. The CMMC-KB is a user-accessible, collaboratively curated, and continuously evolving microbiome resource. Further, to encourage data deposition, we offer web-based analysis tools, including accessible web applications, that benefit both the data contributors and the broader community. In alignment with the FAIR data principles, we are committed to building this central knowledge hub in collaboration with the scientific community.

Results and discussion:

The CMMC-KB (https://cmmc-kb.gnps2.org/) is a knowledgebase derived from contributions by the scientific community and comprises spectral (MS/MS data) and structural (chemical structures) information about microbially-derived compounds, as well as dietary, host-derived, and other exposure-related compounds. Contributions to the CMMC-KB are facilitated through a dedicated workflow in GNPS2 (a second major implementation of the GNPS ecosystem), enabling users to upload information organized into eight main sections: 1) MS/MS data selection, 2) metabolite identification, 3) taxonomy/phylogeny selection, 4) biosynthesis, 5) activity, 6) references, 7) funding information, and 8) additional comments (Figure 1a). The community can contribute to this resource by uploading knowledge for a single molecule at a time or in batches of molecules. A comprehensive documentation page is available to guide users on the recommended inputs (https://cmmc.gnps2.org/deposition_documentation/). While MS/MS spectra are recommended, they are not required, and users may deposit the molecular structure alone. Additionally, the molecules deposited can be classified as confirmed (e.g., observed experimentally in microbial cultures¹⁷, observed in colonized but not in germ-free mice, etc.) or predicted to be microbial (e.g., MS/MS of synthetic compounds with matches against other microbial resources like microbeMASST¹⁵). As of December 2025, the knowledgebase comprises 80,201 MS/MS spectra from 4,998 compounds that were linked to 2,722 microorganisms. These numbers reflect the collective efforts of more than 30 researchers who have contributed to the development of this resource to date¹⁷⁻²¹. Among the compounds deposited, the molecular sources were mainly classified as microbial, drug, or diet-related (Figure 1e). The majority of compounds had a known molecular origin, such as drugs, de novo microbial biosynthesis, or diet, with 25.9% classified as unknown/undefined (Figure 1f). Since diet, drugs, and host-derived molecules can act as confounders, and because they often influence microbial metabolite production, the CMMC-KB includes and annotates these non-microbial compounds within a single, comprehensive resource.

Figure 1. **(a)** Inputs accepted for community depositions and current numbers as of December 2025. **(b)** CMMC enrichment workflow in GNPS2, which annotates molecular networks (generated from Classical or Feature-Based Molecular Networking²⁵,²⁶) by matching experimental spectra to the CMMC-KB and retrieving associated metadata. **(c)** Download options as MGF and TSV files, enabling reuse in third-party software and in-house workflows. **(d)** The CMMC-Dashboard is a web application that enables users to utilize outputs from the FBMN and CMMC enrichment workflows, along with uploaded metadata, to generate visualizations for exploring matches to the CMMC-KB (e.g., boxplots for statistical evaluation, structure cards, UpSet-style overviews, and microbeMASST integration). Distribution of the deposited compounds (December 2025) by **(e)** molecule source and **(f)** molecule origin. Icons were obtained from Bioicons.com.

To facilitate the use of information deposited in the CMMC-KB, there are three ways to access and leverage the knowledgebase. First, data are available for direct download in TSV/CSV and MGF formats from the website, allowing integration into customized in-house or third-party workflows. Second, we developed a workflow within the GNPS2 ecosystem that enables downstream enrichment of molecular networks (CMMC enrichment) with information from the CMMC-KB. Finally, we created an interactive web application²², CMMC-Dashboard (https://cmmc-dashboard.gnps2.org/), which allows users to visually explore and interpret the data in an accessible and user-friendly manner.
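As an illustration of the first access route, the sketch below reads spectra from MGF-formatted text using only the Python standard library. It assumes the generic MGF layout (`BEGIN IONS` / `END IONS` blocks, `KEY=value` parameter lines, and whitespace-separated m/z and intensity pairs); the parameter names in the toy record are placeholders, not a description of the fields actually present in the CMMC-KB download.

```python
from typing import Dict, Iterator, List, Tuple


def parse_mgf(text: str) -> Iterator[Dict[str, object]]:
    """Yield one record per BEGIN IONS / END IONS block of MGF text."""
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                yield current
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as PEPMASS=391.28
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                # Peak line: m/z followed by intensity
                parts = line.split()
                current["peaks"].append((float(parts[0]), float(parts[1])))


# Toy record with made-up values, for illustration only.
example = """BEGIN IONS
PEPMASS=391.28
CHARGE=1+
NAME=hypothetical entry
100.0 10.0
200.5 250.0
END IONS
"""

spectra = list(parse_mgf(example))
print(len(spectra), spectra[0]["params"]["PEPMASS"], spectra[0]["peaks"])
```

Dedicated mass spectrometry libraries offer more robust MGF readers; this minimal parser is only meant to show that the downloaded files are plain text and easy to reuse in custom workflows.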

Many compounds deposited as microbial metabolites may also come from other sources. For example, some amino acids and fatty acids can be synthesized by microorganisms, ingested through diet, and also produced by host cells. To address this issue, we refined source annotations in the CMMC-KB by reanalyzing four publicly available datasets that contained tissues or biofluids of germ-free (GF) and colonized mice (MSV000079949¹, MSV000088040²³, MSV000097485, MSV000090974²⁴), also considering metabolomics data from the mouse diet (chow) when available. We ran feature-based molecular networking (FBMN)²⁵ followed by CMMC enrichment in GNPS2. Entries initially labelled as “microbial” were selected in the CMMC-Dashboard, and boxplots were plotted for GF vs. colonized mice (and also vs. diet, if available). We defined a metabolite as “microbial-only” when it was absent in the GF group but present in colonized mice (Supplementary Figure S2a-c), and added the labels “diet” and/or “host” when the metabolite was detected in GF mice and/or chow. This classification may include both microbially-produced metabolites and microbe-induced host metabolites, which cannot be distinguished without additional experimental validation. This targeted curation expanded the information available in the CMMC-KB by providing additional classifications for 88 metabolites (1.76% of the compounds deposited).
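The labeling rule described above can be written down as a small decision function. This is a minimal sketch, not the curation procedure itself: the function name and return structure are hypothetical, detection is reduced to a boolean (how "absent" or "present" was judged from the boxplots is not encoded here), and the mapping of GF detection to "host" and chow detection to "diet" is an assumption made for illustration.

```python
from typing import Dict, List, Optional


def refine_source_labels(
    in_gf: bool,
    in_colonized: bool,
    in_chow: Optional[bool] = None,  # None when no chow data were acquired
) -> Dict[str, object]:
    """Apply the GF vs. colonized rule to one entry deposited as 'microbial'."""
    # Stated rule: absent in germ-free animals, present in colonized animals.
    microbial_only = in_colonized and not in_gf

    # Assumed mapping (illustrative): GF detection suggests a host contribution,
    # chow detection suggests a dietary contribution.
    extra_labels: List[str] = []
    if in_gf:
        extra_labels.append("host")
    if in_chow:
        extra_labels.append("diet")

    return {"microbial_only": microbial_only, "extra_labels": extra_labels}


print(refine_source_labels(in_gf=False, in_colonized=True))
print(refine_source_labels(in_gf=True, in_colonized=True, in_chow=True))
```

As noted in the text, a "microbial-only" call made this way cannot separate microbially-produced metabolites from microbe-induced host metabolites.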

To illustrate how the CMMC-KB can benefit researchers, we used this resource to investigate microbial metabolites in a subset of the American Gut Project (n = 1,993 files), a citizen-science cohort with participation open to the general population (primarily from the United States, the United Kingdom, and Australia)²⁷. In this example, FBMN was performed, followed by CMMC enrichment to annotate features based on spectral matches to the knowledgebase. The source distribution of matched metabolites revealed a diverse chemical landscape, with compounds classified across multiple categories, including microbial, host-derived, and xenobiotic sources (Figure 2a). By overlaying this information onto the molecular network, one can rapidly visualize regions enriched in specific source categories (Figure 2b). This network-based visualization facilitates hypothesis generation by revealing which networks of structurally related compounds share common sources. Zooming into specific network regions (Figure 2c-e) demonstrates the utility of the tool for detailed exploration of individual molecular families, where users can have an integrated view of the source annotations, structural relationships, and associated metadata for compounds of interest. With such an overview, users can target the investigation of specific classes of compounds with important biological functions. For instance, microbially-derived bile acids play crucial roles in immune regulation²⁸ and have been implicated in conditions ranging from inflammatory bowel disease to metabolic disorders and neurocognitive function²⁹,³⁰. Similarly, N-acyl lipids serve as signaling molecules involved in immune homeostasis, energy metabolism, and gut-brain axis communication³¹,³². The ability to identify and annotate these metabolite families (along with their potential microbial, dietary, or host origins) enables researchers to formulate targeted hypotheses about microbiome-host interactions and prioritize investigations into specific microbial producers, dietary influences, or disease associations. This analysis exemplifies how the CMMC-KB, combined with molecular networking, provides an efficient workflow to survey complex metabolomic datasets and identify features warranting further mechanistic investigation. Importantly, while the biological roles of bile acids and N-acyl lipids in gut-microbiome interactions were previously established, the CMMC-KB workflow enabled their rapid annotation and source classification in the American Gut Project cohort – a process that would otherwise have required extensive manual literature curation. This cross-cohort validation demonstrates that known metabolite-microbiome relationships can be efficiently detected across diverse population studies using this framework.

Figure 2. **(a)** Source distribution of metabolites matched to the CMMC-KB from a subset of the American Gut Project (n = 1,993 samples)²⁷. The UpSet plot was generated using the CMMC-Dashboard. **(b)** Molecular network visualization with nodes colored by metabolite source annotation from the CMMC-KB. Each node represents a unique mass spectral feature, and edges connect features with similar MS/MS spectra (cosine similarity threshold set to 0.5). **(c-e)** Zoomed-in views of selected molecular networks with distinct source annotations. These subnetworks illustrate the tool’s capability to rapidly identify and visualize structurally related compounds sharing common sources within complex metabolomic datasets. The colors of the nodes in **b-e** match the UpSet plot colors in **a**.

Beyond this specific use case, the CMMC-KB has been applied to diverse biological contexts that demonstrate its versatility in addressing complex research questions, ranging from the human microbiome to natural products and environmental fields (Supplementary Material). In clinical settings, this resource enabled mapping drug metabolism across multiple biofluids in people with HIV, revealing that while antiretrovirals like ritonavir undergo extensive microbial transformation in the gut, these derivatives remain largely absent from plasma and cerebrospinal fluid (Supplementary Figure S1). Comparisons between germ-free and colonized mice facilitated the annotation and the refinement of microbial metabolites, including bile acid conjugates and N-acyl lipids, illustrating the dynamic, community-driven nature of the knowledgebase as new data emerge (Supplementary Figure S2). Environmental applications include the detection of bioactive cyanobacterial metabolites in Lake Marathon water samples, providing actionable information for water safety management (Supplementary Figure S3). In disease contexts, the tool identified a microbiome-derived bile acid conjugate altered by Leishmania infection in hamster tissues, linking microbial metabolism to parasite-induced disturbances (Supplementary Figure S4). Finally, in coral holobiont research, the CMMC-KB successfully disentangled bacterial versus zooxanthellae metabolic contributions in synthetic communities, revealing siderophore-mediated interactions that would have been difficult to assign using traditional approaches alone (Supplementary Figure S5).

When using the CMMC-KB, users should be aware of two key limitations. First, spectral matches are based on cosine similarity³³ or modified cosine similarity²⁶, which cannot easily distinguish isomers that share very similar MS/MS patterns. As a result, isomeric compounds, including those originating from different sources, may have spectra with a high cosine similarity (e.g., deoxycholic acid is a microbial metabolite, chenodeoxycholic acid is host-derived, and their MS/MS cosine similarity is >0.9; Supplementary Figure S2d). Consequently, users may obtain spectral matches to metabolites of incorrect biological origin, which highlights the need for follow-up experiments and analyses for validation. Whenever possible, users should acquire orthogonal data (e.g., UV-vis, retention time, ion mobility collision cross section (CCS)) from authentic chemical standards for confirmation. Second, the microbial origin of metabolites also requires additional experimental validation beyond spectral matching. Users can employ complementary approaches such as pure culture studies, co-culture experiments with isotope tracing (e.g., ¹³C-labeled substrates), comparisons between germ-free and colonized animal models, or spatial metabolomics to confirm not only the accuracy of the annotation but also the microbial biosynthesis or transformation of the detected compounds. As entries and curated knowledge continue to grow with future studies and depositions, the CMMC-KB will increasingly empower researchers to gain biological insights into the role of the microbiome in human health and diverse ecosystems.
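To make the first limitation concrete, the sketch below computes a plain cosine score between two peak lists. It is a simplified illustration and not the GNPS implementation: the fragment tolerance of 0.02 m/z, the greedy one-to-one peak pairing, and the toy spectra are all assumptions, and no intensity scaling or precursor-shift handling (as used in modified cosine) is included.

```python
import math
from typing import List, Tuple

Peaks = List[Tuple[float, float]]  # (m/z, intensity) pairs


def cosine_score(spec_a: Peaks, spec_b: Peaks, tol: float = 0.02) -> float:
    """Cosine similarity with greedy one-to-one matching of peaks within tol."""
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0

    # All peak pairs whose m/z values agree within the tolerance.
    candidates = []
    for ia, (mz_a, int_a) in enumerate(spec_a):
        for ib, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tol:
                candidates.append((int_a * int_b, ia, ib))

    # Greedily keep the largest products, using each peak at most once.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    dot = 0.0
    for product, ia, ib in candidates:
        if ia in used_a or ib in used_b:
            continue
        used_a.add(ia)
        used_b.add(ib)
        dot += product
    return dot / (norm_a * norm_b)


# Toy spectra (made-up values): same fragments, different relative intensities.
spectrum_x = [(100.0, 1.0), (200.0, 2.0)]
spectrum_y = [(100.0, 2.0), (200.0, 1.0)]
print(cosine_score(spectrum_x, spectrum_x))
print(cosine_score(spectrum_x, spectrum_y))
```

Because the score depends only on shared fragment m/z values and their intensities, two isomers that fragment alike will score highly regardless of where they come from, which is why orthogonal evidence is recommended.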

Methods:

CMMC-KB development

The CMMC knowledge portal was developed using the FAIR (Findable, Accessible, Interoperable, and Reusable) principles as a guideline³⁴. It incorporates a series of Python workflows designed to process deposition files and generate visualization tables for all deposited information (Findable). In addition, the CMMC-KB server compiles the files required for molecular networking enrichment workflows, including the MGF for the spectral database and structural information for deposited metabolites (Accessible). The KB server provides programmatic access through API endpoints (Interoperable) to download the database files, enabling seamless integration and reuse of information within custom workflows (Reusable). The database is automatically compiled daily to ensure the workflows use the most up-to-date information available in the KB.

Each compound with an associated structure in the CMMC-KB is assigned a unique URL, enabling seamless cross-linking to external resources such as NPAtlas³⁵,³⁶. The structure page provides users with tools to explore the molecular structure of metabolites and access all available information for a given molecule and mass spectra available in the knowledgebase. From this interface, users can also contribute additional data by being redirected through a URL to a prepopulated deposition page containing the USI, molecule name, and SMILES/InChI, where further information can be added.

The CMMC-KB Statistics page is a public, daily refreshed summary of the knowledgebase that reports coverage (total unique mass spectra), composition (distributions by metabolite source/origin), and temporal dynamics (new deposits over time), alongside contributor activity.

CMMC-KB deposition workflow

The CMMC-KB deposition workflow is implemented as a Nextflow-based pipeline³⁷ on GNPS2, which runs a series of Python scripts to validate and process user submissions. The deposition workflow supports both single (one molecule) and batch (multiple entries) deposition modes. In single-deposition mode, parameters are provided through a workflow form or YAML file, while in batch depositions, the input is provided through a TSV file. Each entry is checked against controlled vocabularies (e.g., source and origin) and must include valid spectral and structural identifiers: spectra are verified via the Metabolomics Spectrum Resolver API³⁸ using USIs, and chemical structures (SMILES or InChI) are validated with the GNPS2 ChemicalStructureWebService API. Following validation, all data is submitted to the CMMC-KB server via POST requests. The necessary templates, including the TSV file and deposition instructions, are fully documented and available at https://cmmc.gnps2.org/deposition_documentation/.
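The kind of per-entry checks described above can be sketched as follows. Everything specific here is an assumption made for illustration: the column names (`usi`, `smiles`, `source`), the vocabulary terms, the semicolon separator, and the purely syntactic USI test. The actual workflow resolves USIs and structures through the web services named above, and its template columns are defined in the deposition documentation.

```python
import csv
import io
from typing import Dict, List

# Hypothetical controlled vocabulary; the real CMMC-KB terms may differ.
ALLOWED_SOURCES = {"microbial", "host", "diet", "drug", "unknown"}


def looks_like_usi(usi: str) -> bool:
    """Syntactic check only: 'mzspec' prefix plus four further non-empty fields."""
    parts = usi.split(":")
    return len(parts) >= 5 and parts[0] == "mzspec" and all(parts[1:5])


def validate_row(row: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one deposition entry."""
    errors = []
    usi = (row.get("usi") or "").strip()
    smiles = (row.get("smiles") or "").strip()
    if not usi and not smiles:
        errors.append("provide a USI and/or a structure")
    if usi and not looks_like_usi(usi):
        errors.append("malformed USI: " + usi)
    terms = [t.strip().lower() for t in (row.get("source") or "").split(";") if t.strip()]
    if not terms:
        errors.append("missing source")
    errors.extend("unknown source term: " + t for t in terms if t not in ALLOWED_SOURCES)
    return errors


def validate_batch(tsv_text: str) -> Dict[int, List[str]]:
    """Map 1-based data-row numbers to error lists (rows without errors are omitted)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    report = {}
    for number, row in enumerate(reader, start=1):
        errors = validate_row(row)
        if errors:
            report[number] = errors
    return report


# Toy batch; the accession in the first USI is a placeholder.
batch = (
    "usi\tsmiles\tsource\n"
    "mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000000001\tCCO\tmicrobial;diet\n"
    "not-a-usi\t\tplant\n"
)
report = validate_batch(batch)
print(report)
```

Collecting all problems per row, rather than stopping at the first one, lets a depositor fix a batch file in a single pass.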

Network Enrichment workflow

The enrichment workflow is implemented as a Nextflow pipeline available within the GNPS2 ecosystem, and can be launched as a downstream analysis from the Classical or Feature-Based Molecular Networking results. This design enables users to annotate molecular networks with microbial information through a single-click integration. The workflow retrieves molecular networking outputs from both GNPS1 and GNPS2 jobs, including the network (.graphml) and associated spectral (.MGF) files. The retrieved spectra are matched against those available in the CMMC-KB by cosine similarity, and the matches are further enriched with additional metadata if available, including microbial producers, taxonomy, chemical structure, biosynthetic gene clusters, molecular origin, activities, and compound classifications predicted using NPClassifier³⁹. The outputs include a library match TSV table and a new .graphml file with overlaid compound metadata information from the CMMC-KB matches. Additional visualizations, such as the producer lineage and taxonomic distribution, are generated from the NCBI Taxonomy IDs linked to each deposited spectrum. Documentation for the network enrichment workflow is available at https://cmmc.gnps2.org/network_enrichment/.
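The overlay step can be pictured as copying the metadata of each feature's best library match onto the corresponding network node. The sketch below uses plain dictionaries instead of a .graphml file; the attribute names (`cmmc_name`, `cmmc_source`, `cmmc_score`), the match fields, and the 0.7 score cutoff are hypothetical and do not describe the workflow's actual output schema or parameters.

```python
from typing import Dict, List


def annotate_nodes(
    nodes: Dict[str, Dict[str, object]],
    matches: List[Dict[str, object]],
    min_score: float = 0.7,  # hypothetical cutoff
) -> Dict[str, Dict[str, object]]:
    """Return a copy of the nodes with the best-scoring match overlaid on each."""
    # Keep only the highest-scoring match per feature above the cutoff.
    best: Dict[str, Dict[str, object]] = {}
    for match in matches:
        if match["score"] < min_score:
            continue
        fid = match["feature_id"]
        if fid not in best or match["score"] > best[fid]["score"]:
            best[fid] = match

    annotated = {}
    for fid, attrs in nodes.items():
        merged = dict(attrs)  # leave the input untouched
        if fid in best:
            merged["cmmc_name"] = best[fid]["name"]
            merged["cmmc_source"] = best[fid]["source"]
            merged["cmmc_score"] = best[fid]["score"]
        annotated[fid] = merged
    return annotated


# Toy network and matches with made-up values.
nodes = {"1": {"mz": 391.28}, "2": {"mz": 180.06}, "3": {"mz": 500.30}}
matches = [
    {"feature_id": "1", "name": "compound A", "source": "microbial", "score": 0.81},
    {"feature_id": "1", "name": "compound B", "source": "host", "score": 0.93},
    {"feature_id": "2", "name": "compound C", "source": "diet", "score": 0.55},
]
annotated = annotate_nodes(nodes, matches)
print(annotated)
```

Nodes without a qualifying match are kept unannotated, so structurally unknown features remain visible in the network next to their annotated neighbors.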

CMMC-Dashboard web application

The CMMC Analysis Dashboard (https://cmmc-dashboard.gnps2.org/) was implemented as a web application using the Streamlit Python package to provide interactive access to results from the CMMC-KB enrichment workflow in combination with FBMN data. The dashboard integrates directly with GNPS2 through Task IDs provided by the user. Task IDs from the enrichment and FBMN workflows allow the application to fetch processed files, including enrichment results, FBMN quantification tables, molecular networks, and associated metadata. The dashboard can also be launched directly as a downstream analysis from the enrichment workflow results page, in which case the required inputs are prepopulated in the dashboard interface. Complete documentation for this tool is available at https://wang-bioinformatics-lab.github.io/GNPS2_Documentation/metaboapp_CMMC_dashboard/.

After the inputs are specified, the dashboard merges enrichment outputs with quantification tables and metadata for downstream analyses. Statistical functionality includes the generation of box plots to compare metabolite abundances across groups, with options for stratification and multiple statistical tests. Overlaps of metabolite sources or origins can be visualized using UpSet plots⁴⁰ derived from the enrichment results. Molecular network exploration is supported through interactive Plotly visualizations that highlight selected features within networks, incorporate delta-mass annotations for network edges, and enable export of figures. The dashboard further integrates microbeMASST¹⁵, allowing users to perform spectral searches based on a Universal Spectrum Identifier (USI) or feature ID, returning exact or analog matches with compounds from microbial cultures. This allows for taxonomically informed results with corresponding downloadable taxonomic trees.
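As an illustration of the delta-mass annotation of network edges mentioned above, the sketch below labels each edge with the absolute precursor m/z difference between its two nodes and, where it falls within a tolerance, a matching modification. The function name, the data layout, the 0.01 tolerance, and the short lookup table are assumptions for illustration and do not reflect the dashboard's actual code; the sketch also assumes singly charged ions, so that an m/z difference equals a mass difference.

```python
# Monoisotopic mass differences (Da) for a few common modifications.
KNOWN_DELTAS = {
    "CH2": 14.01565,
    "O": 15.99491,
    "H2O": 18.01056,
    "C2H2O": 42.01056,
}


def annotate_edges(precursor_mz, edges, tol=0.01):
    """Attach a delta m/z and an optional label to each network edge.

    precursor_mz: dict mapping node ID -> precursor m/z
    edges: iterable of (node_a, node_b) pairs
    """
    annotated = []
    for a, b in edges:
        delta = abs(precursor_mz[a] - precursor_mz[b])
        # First table entry within tolerance, or None if nothing matches.
        label = next(
            (
                name
                for name, mass in KNOWN_DELTAS.items()
                if abs(delta - mass) <= tol
            ),
            None,
        )
        annotated.append(
            {"source": a, "target": b, "delta_mz": delta, "label": label}
        )
    return annotated
```

For example, an edge between nodes at m/z 300.1000 and 314.11565 would be labeled `CH2` under these assumptions, while an edge whose difference matches no table entry keeps its numeric delta and a `None` label.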

Supplementary Material

Supplement 1

NIHPP2026.01.24.701521v1-supplement-1.pdf (2.4 MB, pdf)

Acknowledgements:

We acknowledge support from the NIH (NIDDK) for the Collaborative Microbial Metabolite Center (U24DK133658), from NSF CAREER award #2047235 to N.G., and from the Research Foundation Flanders (FWO) [V406123N] to M.D.G. The Gordon and Betty Moore Foundation (GBMF12120, https://doi.org/10.37807/GBMF12120) provided support to P.C.D. and A.M.C.-R. This research was supported in part by the National Center for Complementary and Integrative Health of the NIH under award number F32AT011475 to N.E.A. S.L. was supported by the Research Council of Finland (grant no. 363417). T.P. was supported by the Czech Science Foundation (GA CR) grant 21-11563M and by the European Union’s Horizon Europe program (ERC, TerpenCode, 101170268). F.O. was supported by FAPESP (2021/09175-4 and 2022/14603-8). H.H.F.K. was supported by FAPEAM, CNPq (443823/2024-3), and FINEP. L.-I.M. acknowledges the Burroughs Wellcome Fund Investigators in the Pathogenesis of Infectious Disease. R.J.E., D.F.Jr., J.E.I., S.L., and D.J.M. were supported by NIH P30 MH062512. R.J.E., D.F.Jr., and S.L. were supported by NIH N01 MH22005 and R01 MH125720. This research was supported in part by the Intramural Research Program of the National Institutes of Health (NIH), National Institute of Environmental Health Sciences (ZIC ES103363). The contributions of the NIH author(s) were made as part of their official duties as NIH federal employees, are in compliance with agency policy requirements, and are considered Works of the United States Government. However, the findings and conclusions presented in this paper are those of the author(s) and do not necessarily reflect the views of the NIH or the U.S. Department of Health and Human Services.

Disclosures:

P.C.D. is an advisor to and holds equity in Cybele, BileOmix, and Sirenas, and is a scientific co-founder of, advisor to, and holds equity in and/or has received income from Ometa, Enveda, and Arome, with prior approval by UC San Diego. P.C.D. also consulted for DSM animal health in 2023. M.W. is a co-founder of Ometa Labs LLC. D.M. is a consultant for and has equity in BiomeSense, Inc. The terms of these arrangements have been reviewed and approved by the University of California, San Diego, in accordance with its conflict-of-interest policies. R.K. is a scientific advisory board member of and consultant for BiomeSense, Inc., has equity, and receives income. He is a scientific advisory board member of and has equity in GenCirq. He has equity in and acts as a consultant for Cybele. The terms of these arrangements have been reviewed and approved by the University of California, San Diego, in accordance with its conflict-of-interest policies. S.T. is currently employed by Ometa Labs; this work was completed prior to that employment, and Ometa Labs had no role in the study design, data collection, analysis, or decision to publish. T.P. is a co-founder of mzio GmbH and a consultant for Novogaia, Inc. M.S.D. is a scientific co-founder of Pragma Bio. The work described here is unrelated to the work conducted at Pragma Bio.

Data availability:

All the datasets used in this work as use cases of the CMMC-KB are available in MassIVE (massive.ucsd.edu). Raw data files from the American Gut Project, used in Figure 2, are deposited at MSV000080673. The feature finding step was performed in MZmine3, following the previous parameters used for this dataset¹⁹. Feature-Based Molecular Networking analysis and CMMC enrichment analysis for the American Gut Project use case can be found at https://gnps2.org/status?task=553c08a0e2274572a4edd2ba2d669668 and https://gnps2.org/status?task=2ac40effdb0f404fa6a045a580ff5430, respectively. Additional relevant dataset accessions are provided together with their description in the Supplementary Material. Owing to human volunteer protection constraints, the sample metadata for the HIV cohorts will be provided upon request to HNRC: https://hnrp.hivresearch.ucsd.edu/index.php/hnrc-home.

Code availability:

The code used for creating and implementing the CMMC enrichment workflow within the GNPS2 ecosystem is available at https://github.com/Wang-Bioinformatics-Lab/CMMC_GNPSNetwork_Enrichment_Workflow. The code used to create the CMMC-Dashboard MetaboApp is available at: https://github.com/wilhan-nunes/streamlit_CMMC_analysis-dashboard. The code used for dataset analyses can be found at: https://github.com/helenamrusso/CMMC-KB_manuscript.

References:

1. Quinn R. A. et al. Global chemical effects of the microbiome include new bile-acid conjugations. Nature 579, 123–129 (2020).
2. Tremaroli V. & Bäckhed F. Functional interactions between the gut microbiota and host metabolism. Nature 489, 242–249 (2012).
3. Oliphant K. & Allen-Vercoe E. Macronutrient metabolism by the human gut microbiome: major fermentation by-products and their impact on host health. Microbiome 7, 91 (2019).
4. Hadadi N., Berweiler V., Wang H. & Trajkovski M. Intestinal microbiota as a route for micronutrient bioavailability. Curr. Opin. Endocr. Metab. Res. 20, 100285 (2021).
5. Wilson I. D. & Nicholson J. K. Gut microbiome interactions with drug metabolism, efficacy, and toxicity. Transl. Res. 179, 204–222 (2017).
6. Chiu K., Warner G., Nowak R. A., Flaws J. A. & Mei W. The impact of environmental chemicals on the gut microbiome. Toxicol. Sci. 176, 253–284 (2020).
7. Lindell A. E., Zimmermann-Kogadeeva M. & Patil K. R. Multimodal interactions of drugs, natural compounds and pollutants with the gut microbiota. Nat. Rev. Microbiol. 20, 431–443 (2022).
8. Cryan J. F. et al. The Microbiota-gut-brain axis. Physiol. Rev. 99, 1877–2013 (2019).
9. Corbin K. D. et al. Host-diet-gut microbiome interactions influence human energy balance: a randomized clinical trial. Nat. Commun. 14, 3161 (2023).
10. Han S. et al. A metabolomics pipeline for the mechanistic interrogation of the gut microbiome. Nature 595, 415–420 (2021).
11. Shiroma H. et al. Enteropathway: the metabolic pathway database for the human gut microbiota. Brief. Bioinform. 25, (2024).
12. Wishart D. S. et al. MiMeDB: The Human Microbial Metabolome Database. Nucleic Acids Res. 51, D611–D620 (2023).
13. Kruger R. et al. MiMeDB 2.0: The human Microbial Metabolome Database for 2026. Nucleic Acids Res. (2025) doi: 10.1093/nar/gkaf1272.
14. Wortmann E., Adam G. & Limonciel A. Biocrates. MxP® Quant 1000 in microbiome research. Preprint at https://biocrates.com/wp-content/uploads/2025/05/Application-note-Quant-1000-in-microbiome-research.pdf (2025).
15. Zuffa S. et al. microbeMASST: a taxonomically informed mass spectrometry search tool for microbial metabolomics data. Nat Microbiol 9, 336–345 (2024).
16. Wang M. et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat. Biotechnol. 34, 828–837 (2016).
17. Caraballo-Rodríguez A. M. et al. The undiscovered natural product potential of Actinomycetes. J. Antibiot. (Tokyo) (2025) doi: 10.1038/s41429-025-00876-x.
18. Poynton E. F. et al. The Natural Products Atlas 3.0: extending the database of microbially derived natural products. Nucleic Acids Res. 53, D691–D699 (2025).
19. Zhao H. N. et al. A resource to empirically establish drug exposure records directly from untargeted metabolomics data. Nat. Commun. 16, 10600 (2025).
20. Patan A. et al. Charting the undiscovered metabolome with synthetic multiplexing. bioRxiv (2025) doi: 10.1101/2025.11.18.689170.
21. Mannochio-Russo H. et al. The microbiome diversifies long- to short-chain fatty acid-derived N-acyl lipids. Cell (2025) doi: 10.1016/j.cell.2025.05.015.
22. Mannochio-Russo H. et al. Bridging complexity and accessibility in metabolomics with MetaboApps. ChemRxiv (2025) doi: 10.26434/chemrxiv-2025-3nq29.
23. Wu M. et al. Gut complement induced by the microbiota combats pathogens and spares commensals. Cell 187, 897–913.e18 (2024).
24. Won T. H. et al. Host metabolism balances microbial regulation of bile acid signalling. Nature 638, 216–224 (2025).
25. Nothias L.-F. et al. Feature-based molecular networking in the GNPS analysis environment. Nat. Methods 17, 905–908 (2020).
26. Watrous J. et al. Mass spectral molecular networking of living microbial colonies. Proc. Natl. Acad. Sci. U. S. A. 109, E1743–52 (2012).
27. McDonald D. et al. American gut: An open platform for citizen science microbiome research. mSystems 3, (2018).
28. Mohanty I. et al. The changing metabolic landscape of bile acids - keys to metabolism and immune regulation. Nat. Rev. Gastroenterol. Hepatol. 21, 493–516 (2024).
29. Jia M. et al. Gut microbiota dysbiosis promotes cognitive impairment via bile acid metabolism in major depressive disorder. Transl. Psychiatry 14, 503 (2024).
30. Fogelson K. A., Dorrestein P. C., Zarrinpar A. & Knight R. The gut microbial bile acid modulation and its relevance to digestive health and diseases. Gastroenterology 164, 1069–1085 (2023).
31. Mann A. et al. Palmitoyl Serine: An Endogenous Neuroprotective Endocannabinoid-Like Entity After Traumatic Brain Injury. J. Neuroimmune Pharmacol. 10, 356–363 (2015).
32. Long J. Z. et al. The secreted enzyme PM20D1 regulates lipidated amino acid uncouplers of mitochondria. Cell 166, 424–435 (2016).
33. Wan K. X., Vidavsky I. & Gross M. L. Comparing similar spectra: from similarity index to spectral contrast angle. J. Am. Soc. Mass Spectrom. 13, 85–88 (2002).
34. Wilkinson M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016).
35. van Santen J. A. et al. The Natural Products Atlas 2.0: a database of microbially-derived natural products. Nucleic Acids Res. 50, D1317–D1323 (2022).
36. van Santen J. A. et al. The Natural Products Atlas: An open access knowledge base for microbial natural products discovery. ACS Cent. Sci. 5, 1824–1833 (2019).
37. Di Tommaso P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316–319 (2017).
38. Bittremieux W. et al. Universal MS/MS Visualization and Retrieval with the Metabolomics Spectrum Resolver Web Service. bioRxiv 2020.05.09.086066 (2020) doi: 10.1101/2020.05.09.086066.
39. Kim H. W. et al. NPClassifier: A deep neural network-based structural classification tool for natural products. J. Nat. Prod. 84, 2795–2807 (2021).
40. Lex A., Gehlenborg N., Strobelt H., Vuillemot R. & Pfister H. UpSet: Visualization of intersecting sets. IEEE Trans. Vis. Comput. Graph. 20, 1983–1992 (2014).

Abstract

Introduction: