Abstract
Intestinal bacteria can metabolize polyphenols into highly bioavailable derivatives, which provide potential health-promoting effects to the host. However, the metabolic pathways and related products in this process are still largely unclear. Polyphenols are generally characterized by the presence of many phenolic structural units, which makes it possible to explore correlations among compounds based on similar molecular networks. In this study, we developed a standard-oriented/database-assisted molecular networking (SODA-MN) method for iterative compound annotation analysis to explore the metabolic profiles of polyphenol-rich black raspberry extract metabolized by representative gut bacteria. Starting from a group of polyphenol metabolites, the SODA-MN method predicted the possible polyphenol derivatives by adding or deducting common biotransformation groups and iterative annotating of structure similarity based on fragmentation spectra. Our results showed that 48 polyphenol derivatives in the first round of analysis alone (fragmentation match >= 5, spec score >0.5) can be annotated, which were associated with 13 detected polyphenol standards that served as seed compounds. Meanwhile, this method was applied to a time course study to show the time dependent changes of polyphenols metabolized by a mix of gut bacteria. In addition, the metabolic capabilities of polyphenols among four representative gut bacteria were compared via our newly developed method and differential polyphenol metabolites were detected. In summary, the SODA-MN method provides a new approach for the annotation of unknown compounds by structure similarity and molecular networking analysis. Our analysis results could provide identification of key polyphenol derivatives that may contribute to the mechanistic investigations of their functions and assist our understanding of how polyphenols and gut bacteria interact to promote human health.
Keywords: Metabolomics, gut microbe, molecular network, polyphenols, black raspberry, mass spectrometry
1. Introduction
Gut bacteria play a crucial role in many aspects of host health and strongly influence human metabolic states.[1] Many microbial metabolites that are produced by gut bacteria and closely related to host health, including short-chain fatty acids (SCFAs), methylamines, polyphenols, polyamines, indoles, and vitamins, have been identified and characterized by metabolomics analysis.[2] However, many of these important small molecules that may impact various aspects of human health remain unidentified or unannotated, which hinders their use in biomedical research. In recent years, it has been suggested that dietary polyphenols can contribute to various aspects of human health. For instance, they can prevent metabolic syndrome and reduce the risk of cardiovascular diseases and type 2 diabetes through the regulation of oxidative stress,[3] immune responses,[4] and energy metabolism.[5] Compared with unmodified polyphenols, some of the polyphenol derivatives produced by gut microbial activity have demonstrated better physicochemical properties and bioactivity, [6, 7] which may indicate that the health effects of polyphenols on humans could come, at least in part, from their microbial metabolites rather than from the original compounds found in food.[8] Thus, discovering and identifying diverse derivatives/metabolites of polyphenols is a fundamental step toward understanding how gut bacteria influence the host through small molecules.
To date, characterizing phenolic compounds in food and biospecimens remains a complex process, as phenolic compounds can be found in simple or highly polymerized structures. In addition, extraction and analysis methods (LC columns, solvents, etc.) may also affect how many polyphenols are detected.[9] Even though a detailed summary of polyphenol compounds from multiple publications is provided by resources such as the Phenol Explorer library, [10] a corresponding mass spectral database is still lacking. In addition, current databases for identifying polyphenol metabolites/derivatives based on mass spectrometry (MS) techniques remain incomplete, impeding the global evaluation of high-throughput metabolomics research.
Nevertheless, structurally similar polyphenol compounds are expected to show multiple associations in their mass spectra; thus, a molecular network based on spectral/structural similarity can be established to describe the complex relationships among these compounds. In metabolite identification, MS2 spectra with high similarity are generally considered to come from a group of similar or related metabolites, because metabolites with similar structures and functional groups generally fragment in similar patterns under the same experimental conditions [11]. Therefore, metabolites with similar spectra can be linked to form a molecular network. A molecular network is generally defined as a data structure consisting of nodes and edges and has been applied to biomarker-based patient stratification, drug discovery, microbiome research, and synthetic biology studies. Established molecular network tools such as Global Natural Product Social Molecular Networking (GNPS) [12] and the Metabolic Reaction Network (MRN)-based recursive algorithm (MetDNA) [13] have demonstrated their utility in annotating unknowns in metabolomics studies by building diverse and extensive molecular networks based on isotopes, adducts, and chemical reactions. The algorithms in such tools mostly calculate similarity by comparing the query spectrum with a library spectrum or the spectrum of a known compound and calculating their modified dot product [14]. However, these platforms are generally designed for generic purposes rather than for a specific family of compounds/metabolites, and may therefore lack specific databases (e.g., they do not currently provide an extensive database covering a diverse group of polyphenols).
Therefore, in this study, we developed a standard-oriented/database-assisted molecular networking (SODA-MN) method for iterative compound annotation analysis that aims to explore the microbial metabolites/derivatives of polyphenols after they interact with gut bacteria. Our method was then used to assist the annotation of polyphenols and their metabolites/derivatives after gut bacterial biotransformation, which can provide fundamental information on small molecules in the gut and help explain how gut bacteria could affect human health via their microbial metabolites. While we used polyphenols in this study to showcase the utility of our approach, our newly developed method has the potential to be extended to the annotation of other classes of compounds produced during gut microbial biotransformation in future studies.
2. Materials and Methods
2.1. Chemicals and reagents
Gifu Anaerobic Broth (GAM broth), agar, and other chemicals, including LC/MS-grade methanol, acetonitrile, water, and formic acid, as well as all analytical-grade reagents, were procured from Fisher Scientific (Pittsburgh, PA, USA). Premium Alcohol-Free Black Raspberry Extract was purchased from BerriHealth (Corvallis, OR, USA). The four bacterial strains used in the experiment were procured from ATCC (Manassas, VA, USA).
2.2. Growth conditions of microbial cultures
As we reported previously, the gut bacterial mix used in this study was extracted from feces donated by two healthy volunteers who had not received antibiotic treatment in the previous six months.[15] Before the study was conducted, the informed consent form was approved by the Institutional Review Board (IRB) committee, and informed consent was obtained from the healthy fecal donors. The pooled fecal samples were cultured to extract gut microbiota based on the generic extraction procedure. Briefly, the fecal content was added to pre-reduced PBS with 0.1% cysteine and vortexed, then the diluted suspension was plated on Gifu Anaerobic (GAM) agar plates (HiMedia Laboratories LLC, West Chester, PA, USA) and incubated at 37 °C for 48 h in an anaerobic chamber (COY lab, Grass Lake, Michigan, USA). The extracted gut microbiota and four bacterial strains, including Akkermansia muciniphila (AM, ATCC: BAA-835), Bacteroides thetaiotaomicron (BT, ATCC: 29148), Streptococcus thermophilus (ST, ATCC: 19258), and Escherichia coli (EC, ATCC: 29425), were cultured in GAM broth at 37 °C for 24 h. The mixed gut bacteria, AM, and BT were cultured under strict anaerobic conditions, whereas ST and EC were cultured under aerobic conditions. Four biological replicates at each time point and eight replicates for each single bacterium were analyzed to ensure reproducibility. Then, the optical density (OD) was measured at 575 nm using a spectrophotometer, and the OD values were used for spectral data normalization.
2.3. Treatment of microbial cultures by black raspberry (BRB) extract
To compare metabolite profiles of the gut bacterial mix and the four bacterial strains under BRB treatment, BerriHealth Premium Alcohol-Free Black Raspberry Extract (BerriHealth, Corvallis, OR, USA) was used in this study. Initially, the BRB extract was diluted with autoclaved LC-grade water (1:10, v/v) and then filtered through a 0.2 μm PTFE filter. Then, 100 μL of BRB solution was added to the different groups of bacterial cultures to reach a final concentration of 1:1000 (v/v). Similarly, 100 μL of water was added to the control groups (samples with GAM medium but no bacterial inoculation). For the gut bacterial mix, cultures were collected at 1, 8, and 24 h to investigate the time-dependent BRB biotransformation. The four representative single strains were incubated for 24 h before harvest and analysis.
2.4. Metabolite extraction
To assess the metabolic profiles of bacterial cultures with or without BRB treatment, our previously reported metabolite extraction method was used with modification. [16] Briefly, bacterial culture (1 mL) was placed in a 2 mL tube and 1 mL of ethyl acetate (1% formic acid) was added to the sample. The mixture was mixed thoroughly by vortexing for 1 min, then sonicated for 30 min. After centrifugation at 10,000g for 5 min (company), the organic layers were transferred to clean tubes. The residue was extracted a second time by repeating the same procedure. The pooled organic layer was completely evaporated at 30 °C for 1 h using a SpeedVac vacuum dryer (Eppendorf, Germany). Then 100 μL of methanol was added to reconstitute the dried samples before they were loaded into LC-MS glass vials.
2.5. Data collection on Liquid-chromatography mass spectrometry (LC-MS)
LC-MS analysis of the prepared samples was performed on a UPLC-Q-Exactive Mass Spectrometer system with an XTERRA RP 18 column (3.5 μm, 3.9 mm × 100 mm; Waters Corporation, Milford, MA) using gradient elution. A binary mobile phase containing 0.01% aqueous formic acid (A) and acetonitrile (B) was applied to achieve separation with the following gradient program: 0 min, 10% B; 18 min, 40% B; 20 min, 60% B; 22 min, 10% B; 25 min, 10% B, at a flow rate of 0.5 mL/min. The total run time was 25 min, and the column temperature was kept constant at 40 °C. The detection parameters optimized in our previous study [17] were applied: the data acquisition mode was full scan + ddMS2, and high-quality MS1 and MS2 data were obtained with resolution = 70,000, normalized collision energy (NCE) = 10, intensity threshold = 1×10^5, exclusion time threshold = 30 s, and negative ionization mode only. In addition, the instrumental parameters were set as follows: sheath gas flow rate, 60; auxiliary gas flow rate, 2; spray voltage, 3.5 kV; capillary temperature, 275 °C; and S-lens RF level, 60.0. The MS was controlled directly using Q Exactive Tune software (version 2.9) in full MS mode. The scan range in negative ionization mode was set from 50 to 1700 m/z, with 1 microscan, an AGC target of 1e6, and a maximum injection time of 100 ms. In addition, pooled quality control (QC) samples for each type of metabolites were injected after every ten biological samples to monitor instrument stability.
2.6. Library preparation using biotransformation groups information
To prepare the polyphenol biotransformation group library for molecular similarity annotation, we extracted the compound name, formula, and molecular weight of 37 standards (listed in Table S1) and their derivatives from Phenol Explorer,[10] a public database that compiles known information on polyphenols from the published literature, then calculated the mass difference between each standard and its derivatives. By summarizing the mass differences and comparing the chemical formulas, biotransformation groups were annotated. Biotransformation groups that occurred more than once were retained in this library to serve as the basis for compound annotation.
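The library-building step described above can be sketched in a few lines. This is a minimal Python illustration rather than the R implementation used in this study; the function name `build_group_library`, the rounding to four decimals, and the toy masses are assumptions made only for the example.

```python
from collections import Counter

def build_group_library(pairs, decimals=4, min_count=2):
    """Count recurring mass differences between standards and derivatives.

    pairs: iterable of (standard_mass, derivative_mass) monoisotopic masses.
    Returns {mass_difference: frequency}, keeping only differences that
    occur at least `min_count` times (i.e., more than once by default).
    """
    counts = Counter(
        round(abs(derivative - standard), decimals)
        for standard, derivative in pairs
    )
    return {diff: n for diff, n in counts.items() if n >= min_count}

# Toy input (hypothetical masses): two glucoside-like shifts, one prenyl-like shift.
toy_pairs = [
    (300.0000, 462.0528),
    (286.0000, 448.0528),
    (300.0000, 368.0626),
]
toy_library = build_group_library(toy_pairs)
```

In this toy case only the difference that occurs twice is retained, mirroring the rule that groups seen a single time are excluded from the library.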
2.7. Data processing, structural similarity-based compound annotation, and molecular network analysis
The SODA-MN algorithm was implemented in R (version 4.0.3). Raw data (.raw) were first converted to mzML format with ProteoWizard[18] so that they could be read by the R packages. We then used the xcms package to extract MS1 data and the CluMSID package to extract MS2 data. Features within 10 ppm in m/z and within 60 s in retention time were regarded as duplicates, and only the one with the highest intensity was kept. For each spectrum, fragment ions with relative intensity lower than 1% were removed as noise signals.
After duplicate and noise removal, spectra, m/z, and retention time were first compared between each sample matrix and the standard matrix. Spectra with at least 6 pairs of matched fragment ions were considered candidates for spectral similarity calculation, which was conducted with the optimized dot product used by MetDNA (Metabolite annotation and Dysregulated Network Analysis) [13]:
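The duplicate and noise filters described above can be illustrated as follows. This is a hedged Python sketch, not the authors' R code: the greedy "keep the most intense feature first" strategy, the inclusive tolerances, and the function names `remove_noise` and `remove_duplicates` are assumptions.

```python
def remove_noise(spectrum, min_rel_intensity=0.01):
    """Drop fragment peaks below 1% of the base peak.

    spectrum: list of (mz, intensity) tuples.
    """
    if not spectrum:
        return []
    base = max(intensity for _, intensity in spectrum)
    return [(mz, i) for mz, i in spectrum if i / base >= min_rel_intensity]

def remove_duplicates(features, ppm=10.0, rt_tol=60.0):
    """Keep the most intense feature among those within both tolerances.

    features: list of dicts with 'mz', 'rt' (seconds) and 'intensity'.
    Assumption: features are visited in order of decreasing intensity, and a
    feature is dropped if it falls within `ppm` and `rt_tol` of a kept one.
    """
    kept = []
    for f in sorted(features, key=lambda x: x["intensity"], reverse=True):
        is_duplicate = any(
            abs(f["mz"] - k["mz"]) / k["mz"] * 1e6 <= ppm
            and abs(f["rt"] - k["rt"]) <= rt_tol
            for k in kept
        )
        if not is_duplicate:
            kept.append(f)
    return kept
```

Dropping the retention time condition (for example by passing a very large `rt_tol`) collapses potential isomers into a single precursor ion, which corresponds to the "m/z only" counts reported alongside the "m/z and RT" counts.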
| (Eq. 1) |
Where the weighted intensity vector W = [relative intensity of matched fragment ions]^n [m/z of matched fragment ions]^m, S = standard, and E = experiment. In this study, we used only the top 10 matched fragment ions for the calculation and set n = 1 and m = 1 to balance the intensity and m/z data; if fewer than 10 matched fragment ions were recorded, all matched fragment ions were used. Therefore, the equation and cut-offs for matching were:
| (Eq. 2) |
| (Eq. 3) |
| (Eq. 4) |
| (Eq. 5) |
Where Int = intensity, mz = m/z, RT = retention time, S = standard, and E = experiment. Precursor ions that satisfied the criteria above were annotated as seed compounds for round 1 annotation. In each round of neighbor compound annotation, the seed compounds and the common biotransformation group library were used to predict the m/z of potential biotransformation products. Both forward biotransformations (adding biotransformation group(s) to the original compound) and reverse biotransformations (removing biotransformation group(s) from the original compound) were generated. These predicted m/z values and the spectra of the seed compounds were matched against the remaining precursor ions to find neighbor compounds. Because a neighbor compound has a different precursor ion from its seed compound, we lowered the cut-offs for matched fragment ions and Score_spec in the first round of searching so as to find more neighbor compounds and avoid false negatives. However, we retained the higher cut-offs for searches across all rounds to avoid false positives. The criteria and equation for neighbor compound annotation were:
| (Eq. 6) |
| (Eq. 7) |
| (Eq. 8) |
| (Eq. 9) |
Where SE = seed compounds, E = experiment, P = predicted compounds, and mw = molecular weight. The annotation results of each round served as the seed compounds for the next round. When no new neighbor compounds were found, the algorithm stopped and output the results. A node table containing the annotation information and an edge table containing the connections between nodes were also generated. These two tables were imported into Cytoscape [19] for molecular network visualization. The code of our analysis algorithm and a small set of demo data have been made available on GitHub at the following link: https://github.com/Xrrrr98784/SODA-MN.
To compare our results with an existing leading platform for similar analysis, GNPS was used to generate a molecular network from the converted data in mzXML format. A precursor ion mass tolerance of 2.0 Da and a fragment ion mass tolerance of 0.5 Da were applied to identify features. Features with a pairwise cosine score of at least 0.7, calculated from at least 6 matched fragment ions, were connected to form edges. Network TopK was set to 10 and the maximum connected component size was 100.
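The iterative neighbor search described above can be sketched as a simple loop. The Python below is an illustration under stated assumptions, not the released R code: the function name `annotate_neighbors`, the data layout, and the `is_similar` callback (which stands in for the matched-fragment and spectral score cut-offs) are hypothetical, and the toy m/z values are illustrative only.

```python
def annotate_neighbors(seeds, features, group_masses, is_similar,
                       ppm=10.0, max_rounds=10):
    """Iteratively annotate neighbor compounds starting from seed compounds.

    seeds: feature ids identified with standards (round 0).
    features: {id: {"mz": float, "spectrum": object}}.
    group_masses: {group name: mass of the biotransformation group}.
    is_similar: callable(seed_spectrum, candidate_spectrum) -> bool.
    Returns (nodes, edges): nodes maps id -> round of first annotation;
    edges lists (seed id, neighbor id, group name, signed mass shift).
    """
    nodes = {s: 0 for s in seeds}
    edges = []
    current = list(seeds)
    for rnd in range(1, max_rounds + 1):
        found = []
        for sid in current:
            seed = features[sid]
            for name, mass in group_masses.items():
                for sign in (1, -1):  # forward (add) and reverse (remove)
                    predicted = seed["mz"] + sign * mass
                    if predicted <= 0:
                        continue
                    for fid, feat in features.items():
                        if fid == sid:
                            continue
                        if abs(feat["mz"] - predicted) / predicted * 1e6 > ppm:
                            continue
                        if not is_similar(seed["spectrum"], feat["spectrum"]):
                            continue
                        edges.append((sid, fid, name, sign * mass))
                        if fid not in nodes:
                            nodes[fid] = rnd
                            found.append(fid)
        if not found:  # stop when a round yields no new neighbors
            break
        current = found
    return nodes, edges

# Toy data: ids and m/z values are illustrative, spectra are placeholders.
toy_features = {
    1017: {"mz": 300.9990, "spectrum": None},
    1775: {"mz": 477.0675, "spectrum": None},
    1705: {"mz": 447.0569, "spectrum": None},
    9999: {"mz": 500.0000, "spectrum": None},
}
toy_groups = {"O-methyl": 30.0106, "methyl&O-glucoside": 176.0685}
toy_nodes, toy_edges = annotate_neighbors(
    [1017], toy_features, toy_groups, is_similar=lambda a, b: True
)
```

The returned `nodes` and `edges` correspond to the node table and edge table that would be passed to a network viewer. Because the toy `is_similar` accepts every candidate, the example only exercises the m/z prediction and the stopping rule.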
3. Results
3.1. The development of a general SODA-MN workflow
To annotate gut microbial metabolites/biotransformation products, we first established a SODA-MN workflow based on mass spectrometry-based spectra collection, standards-supported compound identification, and structure-similarity and molecular networking-based annotation of microbial metabolites of polyphenols (Figure 1). In the experimental design, three stages of experiments were performed: the targeted detection of 37 polyphenol standards (detailed in Table S1), the detection of metabolic features from the bacterial mix cultured for 1 h, 8 h, and 24 h, and the detection of metabolic features from four representative single bacterial cultures. The coefficient of variance (CV) distribution of intensity for all detected features is shown in Figure S1. The distribution in each group was concentrated below 0.3, indicating good reproducibility among thousands of detected features. Spectral data were then extracted into a peak intensity matrix for similarity comparison. In the initial step, experimental spectra that satisfied the criteria of matched fragment ions >= 6, score m/z < 10 ppm, score RT < 60 s, and score spec > 0.7 when compared with those from standards were annotated and used as the seed compounds for round 1 identification of unknowns. For each seed compound, a biotransformation group was either added to or deducted from the precursor ion to predict a potential neighbor compound for annotation in the obtained spectra. The thirty-seven biotransformation groups used to annotate neighbor compounds are listed in Table 1, with their formula, molecular weight, and frequency of occurrence based on Phenol Explorer. [10] These biotransformation groups can be divided into two main clusters: common functional groups, such as hydroxy and methyl; and glycosides, such as glucoside and glucuronide. In addition, groups from these two clusters might be added together, which is represented by the “&” sign.
Groups with the same molecular weight are listed together and separated by the “/” sign. Similarly, when predicting potential neighbor compounds in the later rounds of annotation, spectra that satisfied the criteria of matched fragment ions >= 5 or 6, score m/z < 10 ppm, and score spec > 0.5 or 0.7 when compared with seed compounds were kept as seed compounds. After all rounds of annotation, the tentatively annotated compounds and their interactions were output to form a molecular network.
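The seed-compound criteria above can be illustrated with a short Python sketch. It is not the authors' R implementation: the normalized (cosine-style) form of the weighted dot product, the fragment tolerance `frag_tol`, the ranking of matched pairs by standard intensity, and all function names are assumptions, and the peak list is hypothetical.

```python
import math

def match_fragments(std, exp, frag_tol=0.01):
    """Pair each standard peak with the closest unused experimental peak.

    std, exp: lists of (mz, intensity). frag_tol (Da) is an assumed tolerance.
    """
    pairs, used = [], set()
    for mz_s, int_s in std:
        best = None
        for j, (mz_e, _) in enumerate(exp):
            d = abs(mz_s - mz_e)
            if j not in used and d <= frag_tol and (best is None or d < best[0]):
                best = (d, j)
        if best is not None:
            used.add(best[1])
            pairs.append(((mz_s, int_s), exp[best[1]]))
    return pairs

def spec_score(pairs, n=1, m=1, top=10):
    """Normalized dot product of W = intensity**n * mz**m over the top pairs."""
    pairs = sorted(pairs, key=lambda p: p[0][1], reverse=True)[:top]
    w_s = [i ** n * mz ** m for (mz, i), _ in pairs]
    w_e = [i ** n * mz ** m for _, (mz, i) in pairs]
    denom = math.sqrt(sum(w * w for w in w_s) * sum(w * w for w in w_e))
    return sum(a * b for a, b in zip(w_s, w_e)) / denom if denom else 0.0

def is_seed(std, exp, min_match=6, min_score=0.7, ppm=10.0, rt_tol=60.0):
    """Apply the four cut-offs: matched ions, m/z error, RT error, score."""
    pairs = match_fragments(std["spectrum"], exp["spectrum"])
    ppm_err = abs(std["mz"] - exp["mz"]) / std["mz"] * 1e6
    return (len(pairs) >= min_match
            and ppm_err < ppm
            and abs(std["rt"] - exp["rt"]) < rt_tol
            and spec_score(pairs) > min_score)

# Hypothetical peak list used for both the standard and the sample feature.
toy_peaks = [(145.03, 20.0), (185.02, 35.0), (201.02, 40.0), (229.01, 60.0),
             (257.01, 80.0), (283.99, 100.0), (300.99, 90.0)]
toy_std = {"mz": 300.9990, "rt": 600.0, "spectrum": toy_peaks}
toy_exp = {"mz": 300.9995, "rt": 610.0, "spectrum": toy_peaks}
```

For neighbor annotation in later rounds, the same functions could be reused with the retention time condition dropped and `min_match` and `min_score` set to the looser or stricter cut-offs described above.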
Figure 1.

The SODA-MN workflow demonstrates the experiment design, data detection and data analysis in this study.
Table 1.
Common biotransformation groups in polyphenols reported by Phenol Explorer
| Biotransformation groups | Formula | Molecular weight | Frequency |
|---|---|---|---|
| dihydro | H2- | 2.0157 | 6 |
| methyl | CH2- | 14.0157 | 18 |
| hydroxy/gallo | O- | 15.9949 | 16 |
| Hydroxy&dihydro | H2O- | 18.0106 | 2 |
| dimethyl/ethyl ester | C2H4- | 28.0313 | 6 |
| O-methyl | CH2O- | 30.0106 | 4 |
| Prenyl | C5H8- | 68.0626 | 2 |
| O-xyloside/O-arabinoside | C5H8O4- | 132.0423 | 15 |
| O-rhamnoside | C6H10O4- | 146.0579 | 4 |
| dihydro&O-rhamnoside | C6H12O4- | 148.0736 | 2 |
| O-gallate | C7H4O4- | 152.0110 | 6 |
| O-glucoside/O-galactoside | C6H10O5- | 162.0528 | 35 |
| methyl&O-gallate | C8H6O4- | 166.0266 | 3 |
| gallo&O-gallate | C7H4O5- | 168.0059 | 2 |
| acetyl-xyloside/acetyl-arabinoside | C7H10O5- | 174.0528 | 2 |
| O-glucuronide | C6H8O6- | 176.0321 | 29 |
| methyl&O-glucoside | C7H12O5- | 176.0685 | 2 |
| dihydro&O-glucuronide/Hydroxy&O-glucoside | C6H10O6- | 178.0477 | 4 |
| O-gallate&dimethyl/pyro | C9H8O4- | 180.0423 | 2 |
| methyl&O-glucuronide | C7H10O6- | 190.0477 | 6 |
| gallo&O-glucuronide | C6H8O7- | 192.0270 | 2 |
| O-acetyl-glucoside/O-(acetyl-galactoside) | C8H12O6- | 204.0634 | 13 |
| methyl&gallo&O-glucuronide | C7H10O7- | 206.0427 | 2 |
| O-malonyl-glucoside | C9H12O8- | 248.0532 | 6 |
| O-glucosyl-xyloside/O-sambubioside | C11H18O9- | 294.0951 | 7 |
| O-p-coumaroyl-glucoside | C15H16O7- | 308.0896 | 5 |
| O-rutinoside/O-glucoside&O-rhamnoside/O-galactoside&O-rhamnoside | C12H20O9- | 308.1107 | 17 |
| O-caffeoyl-glucoside | C15H16O8- | 324.0845 | 16 |
| O-sophoroside/O-diglucoside | C12H20O10- | 324.1057 | 14 |
| O-acetyl-galactoside&O-rhamnoside | C14H22O10- | 350.1213 | 2 |
| O-diglucuronide | C12H16O12- | 352.0642 | 4 |
| O-malonyl-glucoside&O-glucoside | C15H22O13- | 410.1060 | 2 |
| O-xylosyl-rutinoside | C17H28O13- | 440.1530 | 2 |
| O-rhamnosyl-rhamnosyl-glucoside | C18H30O13- | 454.1686 | 3 |
| O-sambubioside-O-glucoside | C17H28O14- | 456.1479 | 2 |
| O-glucosyl-rutinoside/O-glucosyl-rhamnosyl-glucoside | C18H30O14- | 470.1636 | 5 |
| O-diglucoside-O-glucoside/O-triglucoside/O-sophoroside&O-glucoside | C18H30O15- | 486.1585 | 4 |
3.2. Comparing MS signals of biotransformation products
To obtain an overview of the detected features from gut microbial growth experiments with added BRB, Figure S2 was generated with m/z on the x-axis and retention time on the y-axis for each unique feature. It also shows the distribution of precursor ions with (black dots) or without (red dots) MS2 fragmentation, as well as unique MS2 fragmentation (blue dots). After peak alignment and redundancy removal, Figure S2 shows the features from BRB control samples, from the time-course samples of mixed bacterial cultures at 1 h, 8 h, and 24 h, and from the single bacterial cultures. The quantitative information for Figure S2 is summarized in Table S2; more than 7,000 peaks were detected in every group of samples tested. The numbers of unique precursor ions defined by both 10 ppm of m/z and 60 s of retention time, or by 10 ppm of m/z only, are also reported in Table S2, either of which can be utilized for MN-based annotation. The difference between these two numbers is that the former contains potential isomers whereas the latter does not.
3.3. Algorithm workflow and examples of SODA-MN modeling
To show in detail how the MN approach is applied in our study, a flow chart of the SODA-MN algorithm is displayed in Figure 2A. Briefly, three types of data, namely raw data of the 37 standards, raw data of the biological samples, and a table of the common biotransformation group library (Table 1), were used as initial inputs of the algorithm. After data extraction, alignment, noise removal, and duplicate removal for standards and biological samples separately, two data matrices including all MS2 information were created. The MS2 spectra of the biological samples, together with their precursor ions and retention times, were matched to those of the standards for identification. Identified compounds were used as seed compounds for round 1 annotation of neighbor compounds in the data matrix of biological samples. If new neighbor compounds were found under the criteria stated above and the maximum iteration number had not been reached, these neighbor compounds were kept as seed compounds for the next round of annotation; otherwise, the algorithm stopped and output all results for creating the molecular network. As examples, one standard-to-sample annotation pair and three biotransformation product matches are shown in Figure 2B, C, D, E. The top spectra were from either a standard (Figure 2B) or seed compounds from previous rounds of annotation (Figure 2C, D, E), while the bottom spectra were unknowns from biological samples. During compound identification by standards (that is, round 0), node #1017 from the bacterial mix at 1 h was matched to standard #9, ellagic acid, with 9 matched fragment ions and a cosine score of 0.9998 (Figure 2B). Next, in round 1 annotation, node #1017 was used as the seed compound and matched to node #1775 (match = 7, cosine score = 0.9954, Figure 2C). According to the mass difference of 176.0682 between these two nodes, we tentatively annotated node #1775 as ellagic acid with an added methyl group and glucoside group (refer to Table 1).
Based on a structure search in the SciFinder database, we putatively identified node #1775 as methylellagic acid O-glucoside. Figures 2D and 2E show the third round of searching, starting from methylellagic acid O-glucoside. Node #1705 was annotated as methylellagic acid O-glucoside with a deducted O-methyl group (C20H16O12), while node #1095 was annotated as methylellagic acid O-glucoside with a deducted O-glucoside group, that is, 3-O-methylellagic acid. The full spectra of Figure 2 are provided in Figure S3 for global comparison. The O-glucoside and O-methyl groups were also included in our biotransformation group table (Table 1); however, C20H16O12 and 3-O-methylellagic acid were not found directly from ellagic acid in round 1 searching, indicating that the criteria (match >= 6, spec score > 0.7) used to search unique precursor ions without isomers might be too strict here. Thus, relatively loose criteria (match >= 5, spec score > 0.5) were applied to search unique precursor ions with isomers in all the following discussion of molecular network-based annotation.
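The arithmetic behind the annotation of node #1775 can be made explicit. In the Python sketch below, the function name `lookup_groups` and the small library excerpt are illustrative, and the seed m/z of 300.9990 is an assumed value for deprotonated ellagic acid, since the node m/z values are not listed in the text; only the mass difference of 176.0682 and the Table 1 masses come from the study.

```python
def lookup_groups(mz_seed, mz_neighbor, library, ppm=10.0):
    """Return library groups whose predicted m/z matches the neighbor.

    library: {group name: group mass}. A positive mass difference is treated
    as a forward (added) group, a negative one as a reverse (removed) group.
    """
    forward = mz_neighbor > mz_seed
    hits = []
    for name, mass in library.items():
        predicted = mz_seed + mass if forward else mz_seed - mass
        if predicted > 0 and abs(predicted - mz_neighbor) / predicted * 1e6 <= ppm:
            hits.append((name, "forward" if forward else "reverse"))
    return hits

# Excerpt of Table 1 containing the two groups closest to 176 Da.
table1_excerpt = {
    "O-glucuronide": 176.0321,
    "methyl&O-glucoside": 176.0685,
    "O-glucoside/O-galactoside": 162.0528,
}
seed_mz = 300.9990                 # assumed [M-H]- of ellagic acid
neighbor_mz = seed_mz + 176.0682   # mass difference reported for node #1775
hits = lookup_groups(seed_mz, neighbor_mz, table1_excerpt)
```

Under these assumptions, the 10 ppm window separates methyl&O-glucoside (176.0685, under 1 ppm from the observed shift) from O-glucuronide (176.0321, roughly 76 ppm away), which is why the nominally similar glucuronide is not proposed for this node.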
Figure 2.

(A) Schematic of the algorithm logic used for neighbor compound annotation. (B) Example of the top 10 peak spectra matching process for ellagic acid; the top spectrum is from the ellagic acid standard and the bottom spectrum is from the bacterial sample. (C, D, E) Examples of the top 10 peak spectra matching process; the top spectra are from seed compounds and the bottom spectra are from their neighbor compounds.
3.4. Monitoring time-dependent changes of biotransformation products of gut microbes
After compound annotation using the SODA-MN approach, the identification, biotransformation groups, precursor ions, mass differences, cosine scores, rounds, intensities, etc. of all analyzed mass spectral features were summarized and used to build molecular networks. As examples, the molecular networks built from round 1 annotation for the gut bacterial mix at 1 h, 8 h, and 24 h after BRB intervention are shown in Figure 3 A, B, C. In these figures, the largest nodes are those annotated by comparison with standards, namely, the seed compounds for round 1. The seed compounds are labeled with the full polyphenol identification, while, for simplicity, the neighbor compounds (generally their biotransformation products) are labeled with the potential functional groups that were added to or deducted from the seed compounds. The numbers on the edges indicate the m/z difference between two neighboring compounds; a positive value indicates a forward biotransformation group and a negative value a reverse biotransformation group relative to the seed compound. The fill colors of the nodes in Figure 3A, 3B, and 3C represent the relative intensity of the detected compounds, as indicated in the color bar below. In general, molecular networks from the first round of seed compounds are polycentric, diverging outwards and interrelated. It is worth mentioning that annotated compounds that diverge from different seed compounds and eventually overlap could serve as a double check, indicating a more accurate structure elucidation.
Figure 3.

Molecular networks generated by tentative annotated compounds after round 1 searching for bacterial mix samples at 1 h (A), 8 h (B) and 24 h (C). The biggest size of nodes in each panel represents the seed compounds identified using commercial standards. Colors stand for relative intensity of these detected compounds, as indicated in the color bar below.
When investigating the time-dependent trend of polyphenol biotransformation in mixed gut microbial cultures with BRB intervention, the number of tentatively annotated compounds in each round first decreased slightly at 8 h and then increased at 24 h (Table S2). For example, urolithin A was detected at 1 h and 24 h, but not at 8 h; thus, the slightly decreased number of nodes from 1 h to 8 h might be due to the absence of compounds like urolithin A (or a concentration that dropped below our limit of detection after data normalization by optical density, OD) and its biotransformation products. On the other hand, 3,4-dihydroxyphenylacetic acid was only detected in 24 h samples, which implied these two compounds might be biotransformation products produced at a later growth phase by the gut microbes.
To directly visualize the time-dependent changes of polyphenol compounds in gut microbial cultures, we further aligned the annotated compounds and generated a molecular network of compounds from round 1 annotation for the time course-based comparison of the gut bacterial mix (Figure 4). In the aligned MN, the putative compound name is labeled on each node, and the biotransformation group between compounds is labeled on each edge. The colors in the nodes represent the intensity at each time point. Data from 8 h had 1 unique node (entirely yellow), while data from 24 h had 7 unique nodes (blue nodes). In addition, among the common nodes detected at multiple time points, three (ellagic acid, urolithin A 3-O-glucuronide, and C13H10O4) showed significant and trending changes over time (highlighted with an asterisk). This demonstrates that the gut bacterial mix actively converted compounds into new derivatives/metabolites after 8 h of initial growth, which is in line with its growth and increased total population. Nodes unique to 1 h or 8 h, and nodes that showed a significant decrease over time, are probably compounds that were utilized as growth substrates by the gut microbes, e.g., ellagic acid. On the other hand, those uniquely present or significantly increased at later time points are likely to be biotransformation products, e.g., urolithin A 3-O-glucuronide.
Figure 4.

The aligned molecular networks and the comparisons of common compounds after round 1 annotation for the gut bacterial mix samples at 1 h (red), 8 h (yellow), and 24 h (blue). Different letters on top of the bar plots indicate statistical significance (p < 0.05).
We also performed clustering analysis among the compounds appearing in round 1, the frequency of all biotransformation products, and the bacterial genera identified in the gut bacterial mix at 24 h (Figure S4). Some bacteria, such as Veillonella, Escherichia, and Bifidobacterium, showed very similar patterns when polyphenols and their derivatives were correlated with bacterial abundance, indicating that these bacteria may share similar metabolic pathways that interact with polyphenols. The same patterns also appeared in the correlation between biotransformation groups and bacteria. These patterns implied that some bacteria-specific biotransformation pathways could be discovered by our approach. However, these microbe-metabolite relationships require further validation, and the results obtained from this study provide a basis for testing new hypotheses in follow-up studies.
3.5. Comparing bacteria-specific molecular signatures
Given the complexity of the mixed bacterial cultures used in the earlier part of the study, we also performed additional experiments using four representative gut bacteria to further evaluate whether there are bacteria-specific molecular signatures after they biotransform natural products such as the BRB tested in this study. As shown in Figure S5, molecular networks based on the first round of compound annotation of AM, BT, ST, and EC were generated for comparison. The round 1 annotation results (Figure S5A, S5B, S5C, S5D) showed a distinct molecular network for each of these four representative gut bacteria. As in Figure 3, the labels, colors, and sizes of nodes represent compound names, peak intensities, and the initial standards vs. later derivatives, respectively. Interestingly, more precursor ions, either with or without MS2 spectra, were detected in any single bacterium than in the mixed bacterial cultures (Table S2). At each round of annotation, significantly higher numbers of tentatively annotated compounds were found in these representative gut bacterial samples, with AM having the highest number of annotated compounds and ST the lowest. Eleven seed compounds are displayed in the aligned molecular network (Figure S6), of which six were detected in all four bacteria and only daidzein was unique to EC. In total, 4, 6, 2, and 1 compounds were uniquely detected in the AM, BT, ST, and EC samples in the first round, respectively, which implied that BT might have the most unique biotransformation capability compared with the others. Meanwhile, eleven commonly detected compounds differed significantly among the four bacteria, and these compounds are also highlighted with an asterisk.
3.6. Validation of putative identification from SODA-MN
To validate the polyphenol derivatives annotated by the SODA-MN workflow, we searched the molecular formulas or structures annotated in round 1 in online databases, such as SciFinder[20], to see whether those structures have been reported. Previous studies have proposed that compound identification can generally be divided into five levels [21]. Following this guideline and based on our search results, the annotated compounds were classified into three levels. Level 1: confident 2D structure; there is one and only one identified structure, which is validated by a reference standard. Level 2: probable structure; there are one or more possible isomeric structures, which are found in online databases. Level 3: possible structure or class; a possible structure or formula can be proposed but is not reported in online databases.
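The three-level scheme described above can be written as a small decision rule. The sketch below is only an illustration of that rule; the function name `annotation_level` and its two inputs are hypothetical and do not correspond to any code released with this study.

```python
def annotation_level(validated_by_standard, n_database_structures):
    """Assign the confidence level used in this section (1 = most confident).

    validated_by_standard: True if a single structure was confirmed with a
        reference standard.
    n_database_structures: number of candidate structures (including
        isomers) found in online databases for the annotated formula.
    """
    if validated_by_standard:
        return 1  # confident 2D structure
    if n_database_structures >= 1:
        return 2  # probable structure, possibly one of several isomers
    return 3      # formula or class proposed, not found in databases

levels = [
    annotation_level(True, 1),   # confirmed with a standard
    annotation_level(False, 3),  # three isomers found in a database
    annotation_level(False, 0),  # formula only
]
```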
Overall, forty-eight polyphenols and polyphenol derivatives annotated in both the bacterial mix and the single bacteria are summarized in Table S3, of which 19 compounds (including 13 with available standards) were recognized as level 1 identifications. The structures of these level 1 compounds are shown in Figure S7. There were 15 level 2 annotated compounds, whose structures were not determined because of potential isomers. Further manual validation by comparison with spectra from online databases is needed to distinguish the isomers. Both the level 1 and level 2 annotation results show the ability of SODA-MN to discover compounds that were unknown in our experiments but have previously been reported in public databases. Four additional standards were used to validate level 1 compounds. The mirror spectra (Figure S8) show that the m/z and RT of the four identified compounds were all comparable to those of the standards, and that their spectra all had at least 7 fragment peaks matching the spectra of the standards, with cosine scores > 0.7. In addition, we found that standards of p-coumaric acid and caffeic acid were already available but did not stand out in round 0. The m/z, RT, and spectra of these two compounds in the biological samples were very close to those of the standards (Figure S9). Therefore, the standards of these two compounds can also be used to validate our results. Mutual validation among the existing standards detected in the biological samples can also be observed: structurally similar standards all found each other in the first round, as shown by the two-way arrows (Figure 4 and Figure S6). There were 11 level 3 annotated compounds that could not be matched to online databases, so only their formulas are reported here, which demonstrates that SODA-MN is capable of finding unknown compounds.
Some features with level 3 annotation consistently arose from reversed (deducted) biotransformation groups: the seed compounds did not contain the deducted groups, yet we were still able to obtain formulas for these features via SODA-MN. This may be because some biotransformation groups with larger molecular weights can be fragmented into smaller biotransformation groups by the applied collision energy, so that the smaller groups were misreported as one combined larger group. Meanwhile, our results were compared with those from MS-Finder. A total of 36 polyphenols were identified by MS/MS spectral matching in MS-Finder (Table S4), which is fewer than the number of polyphenols we report. Fifteen of the polyphenols in the MS-Finder results were also identified by SODA-MN, of which 12 were level 1 and 3 were level 2. For level 3 compounds, the formulas of 8 (out of 14) level 3 compounds in our results are consistent with the MS-Finder results (which were determined only by m/z match, Table S5). We also ran our data through the GNPS workflow, which identified 27 compounds including only 3 polyphenols in the mixed bacteria (Table S6), and 61 compounds including only 7 polyphenols in the four representative single gut bacteria (Table S7). Among those 7 polyphenols (after the removal of 3 duplicates) detected by GNPS, isoquercetin, quercetin 3-O-glucuronide, and naringenin are also reported in our results. The other four were not reported by SODA-MN because they are not common polyphenols or have unusual functional groups, and they are also not reported in Phenol-Explorer.
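The mirror-spectrum criteria used in this section (at least 7 matching fragment peaks and a cosine score > 0.7) can be illustrated with a minimal Python sketch. It assumes a plain cosine score over greedily paired peaks within a fixed m/z tolerance; the exact scoring function, tolerance, and intensity weighting used by SODA-MN are not specified here, so the function names, the 0.01 Da tolerance, and the toy spectrum are assumptions.

```python
import math

def match_peaks(spec_a, spec_b, tol=0.01):
    """Pair each peak in spec_a with the closest unused peak in spec_b.

    Spectra are lists of (mz, intensity). tol is an assumed m/z tolerance
    in Da. Returns a list of (intensity_a, intensity_b) for paired peaks.
    """
    used = set()
    pairs = []
    for mz_a, int_a in spec_a:
        best = None
        for j, (mz_b, _) in enumerate(spec_b):
            if j in used or abs(mz_a - mz_b) > tol:
                continue
            if best is None or abs(mz_a - mz_b) < abs(mz_a - spec_b[best][0]):
                best = j
        if best is not None:
            used.add(best)
            pairs.append((int_a, spec_b[best][1]))
    return pairs

def cosine_score(spec_a, spec_b, tol=0.01):
    """Return (cosine score, number of matched peaks) for two spectra."""
    pairs = match_peaks(spec_a, spec_b, tol)
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0, len(pairs)
    dot = sum(a * b for a, b in pairs)
    return dot / (norm_a * norm_b), len(pairs)

def passes_validation(sample, standard, min_peaks=7, min_cosine=0.7, tol=0.01):
    """Apply the two thresholds quoted in the text to a sample/standard pair."""
    score, n_matched = cosine_score(sample, standard, tol)
    return n_matched >= min_peaks and score > min_cosine

# Toy spectrum (made-up peaks), compared with itself and with a truncated copy.
toy = [(65.0, 10.0), (93.0, 25.0), (107.0, 40.0), (121.0, 60.0),
       (151.0, 100.0), (179.0, 80.0), (273.0, 30.0)]
score_full, n_full = cosine_score(toy, toy)
ok_full = passes_validation(toy, toy)
ok_short = passes_validation(toy[:3], toy)
```

A spectrum compared with itself gives a score of 1 with all peaks matched, while the three-peak copy fails the peak-count criterion regardless of its score, which is why both thresholds are checked together.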
4. Discussion
In this study, we developed the SODA-MN method to overcome two challenges: the insufficient annotation of gut microbial metabolites and biotransformation products by regular untargeted spectral matching, and the lack of commercial standards for polyphenol derivatives in most laboratories. In recent years, the MN approach has become a key method for visualizing chemical relationships and identifying compounds in untargeted mass spectrometry data.[22] Established molecular networking tools, such as GNPS, emerged to provide a platform for communities to organize and share raw or processed tandem mass spectrometry data, and to help build molecular networks that increase the number of compound annotations.[23] After user datasets are uploaded to GNPS, the spectra are matched against a reference spectral library to annotate the molecules and find structurally similar compounds. However, it has been reported that only 1.01% of the spectra in the public GNPS dataset match this library, indicating insufficient coverage of chemical space.[12] Compared with our SODA-MN results, the identification rate of polyphenols in GNPS was low and therefore cannot support the desired follow-up steps such as pathway analysis and biological interpretation. Nevertheless, GNPS, in conjunction with third-party databases, has shown improved ability to identify molecular derivatives within certain coverage,[24] and some studies have increased compound annotation rates in molecular networks by merging other existing databases with the GNPS workflow. For example, Soares et al. used the LipidXplorer database-independent method to expand the number of identifications from 6 with the original GNPS database to 24, and added these identification results to the molecular network, thus expanding its ability to trigger further structural identification of unknown compounds.[25] In addition to molecular networking tools, other identification tools, such as MS-Finder,[26] were also compared with our results.
The compound identification and chemical formula prediction results of our algorithm partially coincide with those of MS-Finder, which supports our results. However, most chemical formula predictions in MS-Finder were not supported by MS/MS spectra. In contrast, for our level 3 compounds, we not only provide the chemical formula with MS/MS spectral information, but also provide the possible biotransformation pathways of these polyphenol derivatives and the combinations of functional groups (although the specific positions of these functional groups remain to be determined), which helps to explain the rationale for the proposed chemical formula.
Encouraged by the possibility of using molecular networks to study compound derivatives, we designed an MN workflow around a group of targeted polyphenols and their microbial products. As there is no extensive, specific LC-MS database for microbial transformation products of polyphenols, we incorporated the Phenol-Explorer database into our study to provide a starting point and a comparison database with more than 306 entries. Because polyphenols are built from simple, repeated structural units, we expanded on the idea of iterative annotation in the matching process to increase the number of putatively annotated compounds within the polyphenol class. The combination of the SODA-MN algorithm and a polyphenol knowledge database to annotate phenolic derivatives rests on the fact that structurally similar compounds produce similar spectra. As the first comprehensive database on polyphenol content in foods, Phenol-Explorer reports 6 classes, 31 sub-classes, and 751 polyphenols in total. However, it reports only 2 compounds from BRB, quercetin 3-O-rutinoside and hydroxybenzoic acids, suggesting that additional polyphenols and their metabolites remain to be discovered in (or linked to) BRB. Meanwhile, according to our results, 48 polyphenols and polyphenol derivatives can be annotated in the first round of searching during the SODA-MN process, and most of them can be validated by comparison with compounds reported in other databases such as SciFinder and the RIKEN spectral database for phytochemicals (ReSpect).[27]
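The step of adding or deducting common biotransformation groups from a seed compound can be illustrated with a short calculation. The sketch below is a hedged example rather than the SODA-MN code: the five groups listed are an illustrative selection (not the group list used in this study), the mass shifts are standard monoisotopic values, neutral monoisotopic masses are assumed instead of adduct m/z values, and the function names, the 0.005 Da tolerance, and the feature list are hypothetical.

```python
# Monoisotopic mass shifts (Da) for a few common biotransformation groups.
# Illustrative selection only; not the group list used in SODA-MN.
BIOTRANSFORMATIONS = {
    "methylation (+CH2)": 14.01565,
    "hydroxylation (+O)": 15.99491,
    "sulfation (+SO3)": 79.95682,
    "hexosylation (+C6H10O5)": 162.05282,
    "glucuronidation (+C6H8O6)": 176.03209,
}

def candidate_masses(seed_mass, groups=BIOTRANSFORMATIONS):
    """Return {group: (mass after adding, mass after deducting)} for a seed."""
    return {
        name: (seed_mass + shift, seed_mass - shift)
        for name, shift in groups.items()
    }

def find_candidates(seed_mass, feature_masses, tol=0.005,
                    groups=BIOTRANSFORMATIONS):
    """List features whose mass equals the seed plus or minus one group."""
    hits = []
    for name, (added, deducted) in candidate_masses(seed_mass, groups).items():
        for mass in feature_masses:
            if abs(mass - added) <= tol:
                hits.append((mass, name, "added"))
            elif abs(mass - deducted) <= tol:
                hits.append((mass, name, "deducted"))
    return hits

QUERCETIN = 302.04265  # neutral monoisotopic mass of C15H10O7
# Hypothetical feature list: two plausible derivatives and one unrelated mass.
hits = find_candidates(QUERCETIN, [478.0747, 316.0583, 250.0])
```

In this toy case the features at 316.0583 and 478.0747 are flagged as a methylated and a glucuronidated candidate of quercetin. A mass match alone is only a candidate; it still has to pass the fragmentation-spectrum similarity check before it is annotated.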
Studies so far have shown that mixtures of phenolic substances from nature are eventually converted into a number of key metabolites by gut bacteria, while the biotransformation and bioavailability of polyphenols by human gut bacteria differ strongly between individuals.[28] In our results, although the number of detected metabolic features did not change much over time during BRB treatment of the fecal sample-derived bacterial mix, the number of annotated polyphenol derivatives showed a clear increase at 24 h (Table S2), and the aligned MN (Figure 4) shows some unique compounds that appeared mostly at 24 h. This result indicates that SODA-MN can sensitively identify new or unique derivatives produced by gut bacteria when they interact with polyphenols. Meanwhile, the four representative bacteria also showed somewhat different metabolic profiles (Figure S5), indicating distinct capabilities for metabolizing polyphenols that may be further confirmed with molecular biology approaches. In general, the interaction between gut bacteria and polyphenols improves the bioavailability of polyphenols by converting them into microbial metabolites that are more easily absorbed, thereby allowing them to regulate gut health.[28]
In this study, we used two different parameter settings to search for neighbor compounds. Under the more stringent criteria (match >= 6, spec score > 0.7, considering isomers), our molecular network search eventually converged and found a limited number of polyphenols and polyphenol derivatives, but some real polyphenol compounds might have been filtered out. Under the looser criteria (match >= 5, spec score > 0.5, not considering isomers), the molecular network search did not converge, and the number of search results in each round increased significantly after multiple loops, which indicates that multiple loops cause false positives to accumulate, even though they can reflect the strong correlation of all features in the data to a certain group. In the first round of searches, however, the annotation of polyphenols and polyphenol derivatives was significantly improved. Therefore, the results of SODA-MN are extremely sensitive to the initial parameter settings. The stringency of spectrum matching and the maximum number of loops should be negatively correlated; that is, a balance between the two should be achieved. It is also worth noting that our structural similarity-based molecular network approach has the potential to be extended to the annotation of other types of compounds. For families of metabolites that generally lack commercial databases and have limited commercial standards, such as bile acids, methylamines, and polyamines, this method can use known derivatives as the starting seed compounds for identification and also enable the discovery of unknown compounds with similar structures within the same family.
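The trade-off between matching stringency and the number of loops can be made concrete with a simplified sketch of an iterative neighbor search. This is an assumption-laden illustration, not the SODA-MN implementation: it takes precomputed pairwise results (matched fragment count and spectral score) as a hypothetical `pair_scores` dictionary, ignores the biotransformation mass-difference and isomer checks, and caps the search with a hypothetical `max_rounds` parameter.

```python
def iterative_annotation(seeds, pair_scores, min_match=5, min_score=0.5,
                         max_rounds=3):
    """Expand annotations outward from seed compounds, one round at a time.

    pair_scores: dict mapping (feature_a, feature_b) to
        (matched_fragments, spec_score) for a pair of features.
    Returns a list with the set of newly annotated features per round.
    """
    # Keep only the pairs that pass both thresholds.
    neighbors = {}
    for (a, b), (n_match, score) in pair_scores.items():
        if n_match >= min_match and score > min_score:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)

    annotated = set(seeds)
    frontier = set(seeds)
    rounds = []
    for _ in range(max_rounds):
        found = set()
        for feature in frontier:
            found |= neighbors.get(feature, set())
        found -= annotated
        if not found:
            break  # converged: no new neighbors pass the thresholds
        rounds.append(found)
        annotated |= found
        frontier = found  # newly annotated features seed the next round
    return rounds

# Toy pairwise results (hypothetical feature names and scores).
pairs = {
    ("seed", "f1"): (6, 0.8),
    ("f1", "f2"): (5, 0.6),
    ("f2", "f3"): (4, 0.9),
}
loose = iterative_annotation(["seed"], pairs, min_match=5, min_score=0.5)
strict = iterative_annotation(["seed"], pairs, min_match=6, min_score=0.7)
```

With the looser thresholds the toy search reaches `f2` in a second round, whereas the stricter thresholds stop after `f1`. This mirrors the observation above that loose settings keep expanding over multiple loops while stringent settings converge early.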
5. Conclusion
In conclusion, this study shows that the SODA-MN method can effectively annotate many polyphenol derivatives based on the standard compounds fed into the analysis process, which could help to overcome the limitations in compound annotation caused by the lack of databases and commercial standard compounds in many metabolomics studies. The alignment of the molecular networks provides an intuitive view that clearly shows the metabolic products of different groups of polyphenols undergoing microbial biotransformation, and it offers a new way to select unique metabolites as microbial metabolic signatures. Moving forward, we expect that SODA-MN can be further developed and applied to the annotation of other families of microbial metabolites that generally lack databases and standards.
Supplementary Material
Funding
Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM133510.
Footnotes
Conflict of interest
The authors declare no conflict of interest.
References
- 1. Donia MS and Fischbach MA, Small molecules from the human microbiota. Science, 2015. 349(6246).
- 2. Yan S, et al., Metabolomics in gut microbiota: applications and challenges. Science Bulletin, 2016. 61(15): p. 1151–1153.
- 3. Zhang H and Tsao R, Dietary polyphenols, oxidative stress and antioxidant and anti-inflammatory effects. Current Opinion in Food Science, 2016. 8: p. 33–42.
- 4. Ding S, Jiang H, and Fang J, Regulation of immune function by polyphenols. Journal of immunology research, 2018. 2018.
- 5. Kerimi A and Williamson G, At the interface of antioxidant signalling and cellular function: key polyphenol effects. Molecular nutrition & food research, 2016. 60(8): p. 1770–1788.
- 6. Liu J, et al., Synthesis of chitosan-gallic acid conjugate: Structure characterization and in vitro anti-diabetic potential. International journal of biological macromolecules, 2013. 62: p. 321–329.
- 7. Vittorio O, et al., Dextran-Catechin: An anticancer chemically-modified natural compound targeting copper that attenuates neuroblastoma growth. Oncotarget, 2016. 7(30): p. 47479.
- 8. Cardona F, et al., Benefits of polyphenols on gut microbiota and implications in human health. The Journal of nutritional biochemistry, 2013. 24(8): p. 1415–1422.
- 9. López-Fernández O, et al., Determination of polyphenols using liquid chromatography–tandem mass spectrometry technique (LC–MS/MS): A review. Antioxidants, 2020. 9(6): p. 479.
- 10. Rothwell JA, et al., Phenol-Explorer 3.0: a major update of the Phenol-Explorer database to incorporate data on the effects of food processing on polyphenol content. Database, 2013. 2013.
- 11. Xiao JF, Zhou B, and Ressom HW, Metabolite identification and quantitation in LC-MS/MS-based metabolomics. TrAC Trends in Analytical Chemistry, 2012. 32: p. 1–14.
- 12. Wang M, et al., Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nature biotechnology, 2016. 34(8): p. 828–837.
- 13. Shen X, et al., Metabolic reaction network-based recursive metabolite annotation for untargeted metabolomics. Nature communications, 2019. 10(1): p. 1–14.
- 14. Horai H, et al., MassBank: a public repository for sharing mass spectral data for life sciences. Journal of mass spectrometry, 2010. 45(7): p. 703–714.
- 15. Chen L, et al., Accurate and reliable quantitation of short chain fatty acids from human feces by ultra high-performance liquid chromatography-high resolution mass spectrometry (UPLC-HRMS). Journal of pharmaceutical and biomedical analysis, 2021. 200: p. 114066.
- 16. Farag MA, Hegazi NM, and Donia MS, Molecular networking based LC/MS reveals novel biotransformation products of green coffee by ex vivo cultures of the human gut microbiome. Metabolomics, 2020. 16(8): p. 1–15.
- 17. Xu R, et al., Enhanced detection and annotation of small molecules in metabolomics using molecular-network-oriented parameter optimization. Molecular Omics, 2021. 17(5): p. 665–676.
- 18. Mahieu NG and Patti GJ, Systems-level annotation of a metabolomics data set reduces 25 000 features to fewer than 1000 unique metabolites. Analytical chemistry, 2017. 89(19): p. 10397–10406.
- 19. Saito R, et al., A travel guide to Cytoscape plugins. Nature methods, 2012. 9(11): p. 1069.
- 20. Gabrielson SW, SciFinder. Journal of the Medical Library Association: JMLA, 2018. 106(4): p. 588.
- 21. Blaženović I, et al., Software tools and approaches for compound identification of LC-MS/MS data in metabolomics. Metabolites, 2018. 8(2): p. 31.
- 22. Meier R, et al., Bioinformatics can boost metabolomics research. Journal of biotechnology, 2017. 261: p. 137–141.
- 23. Aron AT, et al., Reproducible molecular networking of untargeted mass spectrometry data using GNPS. Nature protocols, 2020. 15(6): p. 1954–1991.
- 24. Santos AL, et al., Molecular network for accessing polyketide derivatives from Phomopsis sp., an endophytic fungus of Casearia arborea (Salicaceae). Phytochemistry Letters, 2021. 42: p. 1–7.
- 25. Soares V, et al., Extending compound identification for molecular network using the LipidXplorer database independent method: A proof of concept using glycoalkaloids from Solanum pseudoquina A. St.‐Hil. Phytochemical Analysis, 2019. 30(2): p. 132–138.
- 26. Vaniya A, et al., Using MS-FINDER for identifying 19 natural products in the CASMI 2016 contest. Phytochemistry Letters, 2017. 21: p. 306–312.
- 27. Sawada Y, et al., RIKEN tandem mass spectral database (ReSpect) for phytochemicals: a plant-specific MS/MS-based data resource and database. Phytochemistry, 2012. 82: p. 38–45.
- 28. Gross G, et al., In vitro bioconversion of polyphenols from black tea and red wine/grape juice by human intestinal microbiota displays strong interindividual variability. Journal of agricultural and food chemistry, 2010. 58(18): p. 10236–10246.
