Abstract
Small molecule metabolites are the product of many enzymatic reactions. Metabolomics thus opens a window into enzyme activity and function, integrating effects at the post-translational, proteome, transcriptome and genome level. In addition, small molecules can themselves regulate enzyme activity, expression and function both via substrate availability mechanisms and through allosteric regulation. Metabolites are therefore at the nexus of infectious diseases, regulating nutrient availability to the pathogen, immune responses, tropism, and host disease tolerance and resilience. Analysis of metabolomics data is however complex, particularly in terms of metabolite annotation. An emerging valuable approach to extend metabolite annotations beyond existing compound libraries and to identify infection-induced chemical changes is molecular networking. In this chapter, we discuss the applications of molecular networking in the context of infectious diseases specifically, with a focus on considerations relevant to these biological systems.
1. Introduction
Metabolomics allows insight into systems by systematically detecting and quantifying the metabolites present in samples (Idle & Gonzalez, 2007). Metabolites are small molecules under 1500 Da with diverse chemical structures. Traditionally, metabolites include substrates, products, or intermediates of central metabolic pathways such as glycolysis, the TCA cycle, or fatty acid oxidation. However, metabolites can also be involved in many critical systems such as regulation of gene expression, enzymatic inhibition, or signaling pathways (Newsom & McCall, 2018). In the broadest sense, metabolites can also include small molecules that were encountered and absorbed from the environment, such as food or personal care product-derived molecules. The metabolome is thus the profile of metabolites found in a system and represents the intersection between cellular pathways, organ functions, behaviors, and habitat (Lewis, McCall, Sharp, & Spicer, 2020). The metabolome can be investigated through mass spectrometry (MS) or nuclear magnetic resonance spectroscopy (NMR)-based detection and identification methods. Mass spectrometry measures the mass-to-charge (m/z) ratio of analytes, either intact (MS1 level) or following fragmentation (MS2 level). Metabolomic approaches can be targeted, whereby a specific list of molecules of interest is quantified, or untargeted, whereby MS data is acquired across a broad mass range, with no pre-existing target m/z list. In the latter case, the goal is to study the broadest range of metabolites, for a given sample preparation protocol and detection system, without being pre-constrained by a priori notions. Major challenges in untargeted metabolomics include the complexity of the resulting data and the annotation of the resulting metabolite features (Wang et al., 2016).
Global Natural Product Social Molecular Networking (GNPS) provides a high throughput infrastructure to address these issues. Specifically, users can annotate their mass spectrometry data by comparing experimental MS2 spectra to collected reference MS2 spectra from public libraries of natural and synthetic small molecules, including lipids, central metabolism intermediates, plant and microbial natural products, drugs, personal care products, etc. These libraries are extended regularly by community addition of new reference spectra. In addition, GNPS enables users to search for analogs of specific compounds in their spectral data (Wang et al., 2016). For example, phosphatidylcholine 36:1 and phosphatidylcholine 36:3 have similar structures and thus similar MS2 fragmentation patterns. If only phosphatidylcholine 36:3 is present in the reference libraries, phosphatidylcholine 36:1 would remain unannotated by direct matching of spectra and precursor mass. In contrast, analog searches consider their MS2 spectral similarities, enabling phosphatidylcholine 36:1 annotation by analog matching to phosphatidylcholine 36:3 reference spectra.
The other advantage of GNPS is the ability to connect metabolite features with similar fragmentation spectra, indicating structural similarity, and thus generate molecular networks. These networks facilitate extended metabolite annotations (“annotation propagation”): although not all nodes in a molecular network match reference library spectra, connected nodes can be annotated by using the network and considering common chemical modifications such as oxidation or methylation. Molecular networks also enable analysis of the frequency of specific chemical modifications under a given experimental condition. Molecular networks can be visualized directly on the GNPS website or exported to Cytoscape software (see Section 3) (Shannon et al., 2003; Wang et al., 2016).
GNPS offers two types of molecular networking, classical and feature-based molecular networking (FBMN). Classical molecular networking is generated from unprocessed MS2 spectral data (from open-format .mzXML or .mzML files). It automatically clusters similar MS2 spectra into network nodes. Sometimes, however, MS2 fragmentation spectra from one single metabolite are variable, and in those cases classical molecular networking generates multiple nodes for this metabolite instead of only one node. FBMN uses MS1 information, including chromatographic retention time and feature peak area to group and align features in the molecular network. FBMN can thus distinguish between isomers with similar MS2 spectra but different retention times. In addition, FBMN networks can be directly related to metabolite feature quantification, since FBMN node size can be set to reflect MS1 peak area. On the other hand, classical networking is easy to set up from raw data, whereas FBMN needs pre-processing that is affected by user expertise. Classical molecular networking is also better suited to very large sample numbers or to integrate data collected through different instrumental conditions (Jarmusch et al., 2020; Nothias et al., 2020; Wang et al., 2016).
GNPS and molecular networking were initially designed for natural product discovery (Wang et al., 2016; Watrous et al., 2012), and therefore have seen the greatest applications in that field (e.g., Heine et al., 2018; Nothias et al., 2018; Senges et al., 2018). However, recent work has demonstrated their utility in the field of infectious diseases. Many infectious diseases can be difficult to detect and diagnose. In this context, GNPS was used to annotate metabolites linked to Chagas disease severity (Hoffman et al., 2021; McCall et al., 2017) or to discover metabolites predictive of Staphylococcus aureus bacteremia mortality (Wozniak et al., 2020). Metabolomics and molecular networking have also been used to identify molecules associated with pathogen or pathobiont colonization in mammalian or insect systems (Eberhard, Klimpel, Guarneri, & Tobias, 2021; Garg et al., 2017; Melnik et al., 2019). Chronic infectious diseases can also present a challenge in understanding effects on organ systems or unforeseen complications due to infection. Molecular networking applied to acute and chronic Trypanosoma cruzi infection identified changes in specific metabolites based on tissue location and helped discover a potential treatment to prevent damage associated with Trypanosoma cruzi infection (Hossain et al., 2020). To study communication between fungal pathogens and host, molecular networking was applied to the study of Cryptococcus gattii extracellular vesicles (Reis et al., 2021). Notably, these examples are all very recent. Thus, to build on these promising expansions of GNPS and molecular networking implementation, this protocol focuses specifically on molecular networking applications in the context of infectious diseases.
2. Practical considerations
All material should be handled using the appropriate biosafety protocol approved by your institution. For samples collected from animal models of infection, all approved Institutional Animal Care and Use Committee protocols should be followed.
There are multiple strategies for metabolite extraction, chromatography and mass spectrometry data acquisition. We will not discuss them here in detail, and instead refer the reader to the multiple excellent reviews on this topic (e.g., Dunn et al., 2011; Ivanisevic & Want, 2019; Want et al., 2013). In general, the metabolite extraction method should be chosen based on sample type and the properties of the metabolites of interest (for example polar or non-polar metabolites; lipids; microbial signaling molecules).
We focus here on molecular networking applied to liquid chromatography-tandem mass spectrometry data (Nothias et al., 2020; Wang et al., 2016), though it has recently been extended to gas chromatography–mass spectrometry data (Aksenov et al., 2021). Chromatography and mass spectrometry conditions should be optimized for the available instrument and metabolites of interest. Irrespective of these choices, it is however critical to acquire tandem mass spectrometry (MS/MS, also known as MS2) data to be able to perform molecular networking analysis of liquid chromatography-mass spectrometry data. In our experience, the best molecular network quality is obtained when MS/MS data is acquired for all samples under consideration, rather than only for a pooled sample. In all cases, MS2 should be acquired in data-dependent mode. High quality data will determine the success of this protocol, while poor quality data will lead to poor, uninterpretable or incorrect findings.
Additional important considerations include selection of sampling site and biofluid, which should be relevant to the pathogenesis of the disease of interest. Though more challenging to access, tissue samples are in our opinion preferable because they directly reflect local metabolic alterations caused by infection. Multiple adjacent samples will enable reconstruction of the spatial impact of infection on the metabolome (“chemical cartography”) (see for example, Dean et al., 2021; Hossain et al., 2020; McCall et al., 2017; Parab et al., 2021). However, should this not be possible or desired, tissue sampling site should always be consistent across all experimental samples.
Molecular networking is performed through the Global Natural Product Social Molecular Networking (GNPS) web interface. The GNPS website has excellent detailed instructions on how to perform molecular networking (https://ccms-ucsd.github.io/GNPSDocumentation/). Our goal here is not to supersede these instructions, but rather to highlight practical considerations for molecular networking specifically in the context of infectious diseases. Molecular networking can be performed straight from instrumental data files in open .mzXML or .mzML format (“classical molecular networking”) or by integrating feature finding using software with molecular networking (FBMN) (Nothias et al., 2020; Wang et al., 2016). In this protocol, we focus on the downstream data analyses, once a network has been generated.
3. Protocol
3.1. Before you begin
Timing: 4 h to several weeks (depending on the number of samples, data size and prior familiarity with molecular networking).
Collect samples from the infection model. Extract metabolites and collect data-dependent LC-MS2 data.
Install Cytoscape software from https://cytoscape.org/
- Read the instructions on the GNPS website on how to set up a molecular networking job and watch the videos
- Published protocols are also available from Phelan (2020) and Aron et al. (2020)
Set up a classical or feature-based molecular network following these instructions
- The following example networks are available to the reader to practice this protocol and were used to generate the figures herein:
- Classical molecular network example: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=4f706aaba6ed4bc1b56791b12bf98777 (re-run from Hossain et al., 2020)
- Feature-based molecular network example: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=a2ae5d8bb4dd440ba885f7d0b3ebbfac (re-run from Hossain et al., 2020)
Follow the instructions below (Section 3.4.1, Visualizing and interpreting your molecular network) to download the data.
Critical: These approaches are only compatible with data-dependent MS2 data. They cannot be applied to data-independent MS2 data, or to MS1-only data. Refer to the GNPS documentation for suitable file types (https://ccms-ucsd.github.io/GNPSDocumentation/isgnpsright/). Although molecular networking can now be performed using GC–MS data (Aksenov et al., 2021), we focus here on LC–MS/MS data analysis.
Critical: To be able to perform analyses of the impact of infection, ensure that your metadata contains categories pertaining to infection status, and that file names in the metadata table perfectly match names of the data files. To avoid mis-filtering, we recommend using “infected” and “no_infection,” or similar non-overlapping terms, rather than “infected” and “uninfected.”
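Both checks in the note above can be run before submitting a job. The following is a minimal sketch, not part of the GNPS tooling: it assumes a tab-separated metadata table with a `filename` column and an `ATTRIBUTE_condition` column, and the file names and labels are hypothetical. It reports file names that do not match exactly, and condition labels that are substrings of one another (such as “infected” within “uninfected”), which a “contains” filter would not separate.

```python
import csv
import io


def check_metadata(metadata_text, data_file_names,
                   filename_col="filename",
                   condition_col="ATTRIBUTE_condition"):
    """Return (missing_files, unlisted_files, overlapping_label_pairs)."""
    rows = list(csv.DictReader(io.StringIO(metadata_text), delimiter="\t"))
    listed = {row[filename_col].strip() for row in rows}
    on_disk = set(data_file_names)

    # File names must match exactly (case and extension included).
    missing_files = sorted(listed - on_disk)    # in metadata, no data file
    unlisted_files = sorted(on_disk - listed)   # data file, not in metadata

    # A label contained in another label is also caught by a "contains"
    # filter on the shorter label.
    labels = sorted({row[condition_col].strip() for row in rows})
    overlapping = [(a, b) for a in labels for b in labels
                   if a != b and a in b]
    return missing_files, unlisted_files, overlapping


# Hypothetical metadata table (tab-separated) and list of data files.
example = (
    "filename\tATTRIBUTE_condition\n"
    "heart_01.mzXML\tinfected\n"
    "heart_02.mzXML\tuninfected\n"
    "heart_03.mzXML\tinfected\n"
)
files = ["heart_01.mzXML", "heart_02.mzxml", "heart_03.mzXML"]

missing, unlisted, overlapping = check_metadata(example, files)
print("In metadata but no matching data file:", missing)
print("Data files absent from metadata:", unlisted)
print("Overlapping condition labels:", overlapping)
```

In this example, the differing capitalization of one file extension and the “infected”/“uninfected” label pair are both flagged.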
3.2. Key resources table
| Deposited data | |
| “Classical” molecular networking job | Processed in GNPS, from mzXML or mzML file formats |
| Example (re-run from Hossain et al. 2020) | https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=4f706aaba6ed4bc1b56791b12bf98777 |
| Feature-based molecular networking job | Processed in GNPS, from mass spectrometry data processing programs (e.g., MZmine (Pluskal, Castillo, Villar-Briones, & Orešič, 2010), XCMS (Forsberg et al., 2018; Smith, Want, O’Maille, Abagyan, & Siuzdak, 2006; Tautenhahn, Böttcher, & Neumann, 2008), MS-DIAL (Lai et al., 2017; Tsugawa et al., 2015), OpenMS (Röst et al., 2016; Sturm et al., 2008)) |
| Example (re-run from Hossain et al. 2020) | https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=a2ae5d8bb4dd440ba885f7d0b3ebbfac |
| Software and algorithms | |
| GNPS | https://gnps.ucsd.edu/ |
| Cytoscape | https://cytoscape.org/ |
3.3. Materials and equipment
Computer. The analyses described in this protocol can be performed on any standard internet-connected desktop computer or laptop. Software is compatible with Windows, Mac and Linux.
Molecular network (from feature-based or classical networking job).
Cytoscape software.
3.4. Step-by-step method details
3.4.1. Visualizing and interpreting your molecular network
Timing: <40 min
Detailed steps for visualizing and interpreting molecular networks in general may be found in the GNPS documentation, in Aron et al. (2020), and in Phelan (2020). For reader convenience, we provide a brief outline of these steps here, as a necessary prerequisite for subsequent analysis in the context of infectious diseases. These steps apply to both feature-based and classical molecular networks. As an example, we used the feature-based molecular network from: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=a2ae5d8bb4dd440ba885f7d0b3ebbfac.
Access your GNPS results page.
Some visualization is available via the GNPS browser interface (View Spectral Families (In Browser Network Visualizer)). Refer to the GNPS documentation and Aron et al. (2020) for detailed instructions.
- For best visualization and flexibility, and to enable the subsequent analyses described in this protocol, download the data to Cytoscape through either of the following options:
- For large datasets:
- On your GNPS results page, select: Export/Download Network Files (Download Cytoscape Data) or Export/Download Network Files (Download GraphML for Cytoscape) (Fig. 1A (yellow box)).
- Click Download in the upper right. After downloading, unzip the folder.
- Open Cytoscape.
- For small datasets:
- On your GNPS results page, select: Advanced Views-External Visualization (Direct Cytoscape Preview/Download, Fig. 1A (red box)).
- The file opens in Cytoscape directly.
After importing the network into Cytoscape, the network displays in the right upper panel (Fig. 1D). To see more details on the network, zoom in with the tool bar magnifying glass icon (yellow box, Fig. 1D).
Each node represents one metabolite feature (Fig. 2A). Nodes with similar MS2 spectra are connected to each other and form a sub-network of features (chemical family, Fig. 2B). The lines that connect the nodes (features) are called edges (Fig. 2C). Some nodes (self-looped nodes) are not connected to other nodes and are displayed in the lowest part of the network (Fig. 2D). These self-looped nodes represent metabolite features whose MS2 spectra did not have sufficient similarity to any other MS2 spectra in the dataset (based on the cutoffs established when setting up the molecular networking job).
The total number of nodes and edges can be found on the left panel of the network tab (Fig. 3).
- To determine the number of subnetworks:
- Export network table using File→Export→Table. In the dropdown menu, select the name of your network default node table, then press OK.
- In Excel, determine the number of unique values in the “componentindex” column. In this example, we have 810 sub-families in our full molecular network. −1 values represent self-looped nodes.
- Node, edge, and network styles can be changed through the Style tab on the left panel. The Style tab has three sub-tabs for node, edge, and network.
- Nodes can be labeled with the annotation obtained via GNPS spectral matching.
- Select the Node tab.
- Select the Label drop-down menu by clicking the down arrow. Choose Compound_name for Column using the dropdown menu and Passthrough Mapping for Mapping Type (Fig. 4).
- Node color can be changed in a similar fashion using the Fill Color drop-down menu.
- Node shape can be changed in a similar fashion using the Shape drop-down menu.
- Additional detail is available in Aron et al. (2020), Phelan (2020), Shannon et al. (2003) and at https://ccms-ucsd.github.io/GNPSDocumentation/cytoscape/.
- The node table (bottom right, Fig. 1D) contains information on each of the nodes within your network. To simplify the displayed table, use the “Show Columns” button (Fig. 1D, arrow).
- Pick Select None.
- Then manually select columns of interest. We recommend selecting at least: cluster index, Compound_Name, Analog: Compound_Name (Feature-based molecular networking), precursor mass, RTMean (Classical molecular networking) and RTConsensus (Feature-based molecular networking).
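The sub-network count described above can also be obtained with a short script instead of Excel. The sketch below assumes a comma-separated node table export containing a `componentindex` column in which self-looped nodes carry the value -1; the example rows are hypothetical. Here, self-looped nodes are reported separately and are not counted as a sub-network.

```python
import csv
import io
from collections import Counter


def summarize_components(node_table_csv, column="componentindex"):
    """Count sub-networks and self-looped nodes in an exported node table."""
    reader = csv.DictReader(io.StringIO(node_table_csv))
    sizes = Counter(row[column].strip() for row in reader)
    n_singletons = sizes.pop("-1", 0)  # -1 marks self-looped nodes
    return len(sizes), n_singletons, sizes


# Hypothetical excerpt of an exported node table.
example = (
    "cluster index,componentindex,precursor mass\n"
    "1,1,400.34\n"
    "2,1,428.37\n"
    "3,2,760.58\n"
    "4,2,786.60\n"
    "5,2,788.62\n"
    "6,-1,181.07\n"
)
n_subnetworks, n_singletons, sizes = summarize_components(example)
print(n_subnetworks, "sub-networks;", n_singletons, "self-looped nodes")
```

For a real export, read the file contents into a string first and pass them to `summarize_components`. The returned `sizes` counter also gives the number of nodes in each sub-network.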
Annotated features can be verified by inspecting the mirror plot displaying the MS2 spectral concordance between your experimental spectra and the library reference spectra, and assessing the match cosine score, number of matched peaks, reference library quality, and your instrumental parameters. Plausibility in the context of your infection model should also be considered. Refer to the GNPS documentation and Aron et al. (2020) for more details.
- A quick way to visualize the mirror plot for a specific annotation is as follows:
- Find the cluster index value for the feature/node with the annotation you would like to verify.
- From your GNPS results page, select View All Library Hits (Fig. 5B).
- Search by cluster index.
- Visualize the mirror plot by clicking the View Mirror Match button (Fig. 5C, red box).
- A higher cosine score indicates a higher similarity between the experimental and reference MS2 spectra, for a maximum of 1 (identical MS2 spectra) and a minimum of 0 (highly dissimilar MS2 spectra).
- A putative annotation for unannotated features can be obtained through network annotation propagation. Refer to Aron et al. (2020) and the GNPS documentation for more details. Briefly:
- For the node of interest, locate its neighbors using the First Neighbors of Selected Nodes (undirected) icon (Fig. 6A, blue box).
- Determine the mass difference between the unannotated node and any connected annotated nodes. Assess for chemical plausibility while also considering instrumental parameters.
- This information can also be found in the network Edge table, in column “EdgeAnnotation.”
- If first neighbors are not annotated, these steps can be repeated iteratively until no connected neighbors remain.
- See Optimizations and troubleshooting for additional complementary analysis tools that can be implemented if no nodes in the sub-network are annotated.
- To select specific nodes based for example on precursor mass or compound name:
- Click on the Select tab (PC) or Filter tab (Mac) on the left panel.
- Select “+,” then Column filter (Fig. 6A).
- Use the dropdown Choose column menu to select the parameter of interest (e.g., Node: precursor mass or Node: Compound_Name).
- Hit Apply icon at the bottom of the Select tab (Fig. 6B).
- Selected nodes are highlighted in yellow in your right-hand network view.
- To subset networks, for example to create a new network that only includes nodes annotated as acylcarnitines and members of the acylcarnitine family of metabolites (Hossain et al., 2020):
- Select nodes as described above:
- In the dropdown Choose column menu, select Node: Compound_Name.
- In the adjacent dropdown menu, select contains.
- In the search bar, type: carnitine.
- Use the First Neighbors of Selected Nodes (undirected) icon to select all neighbors (Fig. 7A).
- Continue expanding neighbors until all neighbors have been selected.
- Use the top “New Network From Selection (all edges)” icon (Fig. 7B, red box) to generate the new subnetwork.
- The new subnetwork can be visualized via the left panel Network tab.
- Rename the new subnetwork by right-clicking on the current network name and selecting Rename Network (Fig. 7C).
- The edge table and node table related to this subnetwork can be exported as a csv file as described above (step 7a).
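The mass-difference reasoning used for annotation propagation above can be sketched as a simple lookup. This is an illustration only: the list of modifications is a small, non-exhaustive selection of monoisotopic mass shifts, the tolerance is an assumed value to be adapted to your instrument, and both features are assumed to be singly charged with the same adduct. The example m/z values approximate the [M+H]+ ions of phosphatidylcholine 36:1 and 36:3.

```python
# Monoisotopic mass shifts (Da) for a few common modifications.
COMMON_SHIFTS = {
    "H2 (one double bond / saturation)": 2.01565,
    "2 x H2": 4.03130,
    "CH2 (methylation or chain elongation)": 14.01565,
    "O (oxidation)": 15.99491,
    "H2O (hydration/dehydration)": 18.01056,
    "C2H4 (two-carbon chain elongation)": 28.03130,
}


def explain_mass_difference(mz_unknown, mz_annotated, tolerance=0.01):
    """List modifications matching |mz_unknown - mz_annotated|.

    Assumes both features are singly charged with the same adduct.
    """
    delta = mz_unknown - mz_annotated
    direction = "gain" if delta > 0 else "loss"
    return [
        (name, direction)
        for name, shift in COMMON_SHIFTS.items()
        if abs(abs(delta) - shift) <= tolerance
    ]


# Unannotated node at m/z 788.616 connected to an annotated node at 784.585.
print(explain_mass_difference(788.616, 784.585))
```

A match from such a lookup is only a candidate explanation. Chemical plausibility, instrumental parameters and the biology of the infection model still need to be assessed as described above.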
Fig. 1.

Importing and visualizing the molecular network. (A) Downloading GNPS job file with “Direct Cytoscape preview/download” link (red box) for small datasets and “Download Cytoscape data” link (yellow box) for large datasets. (B) “Graphml” format file in the downloaded folder. (C) “Import network from file system” button in Cytoscape to import graphml format file (red box). (D) Example network displayed in Cytoscape (red box). Yellow box, zoom icon. Blue box, node table. Arrow, Show Columns button.
Fig. 2.

Network and network features. (A) A node in a network (red box). (B) A sub-network of related, connected nodes (red box). (C) An edge that connects two nodes (arrow). (D) Self-looped nodes (singletons) with no connection with any other nodes (red box).
Fig. 3.

Total number of nodes and edges.
Fig. 4.

Cosmetic changes (node, edge, and network styles). This figure shows how to customize the node labels with the compound name (red box).
Fig. 5.

Visualizing the mirror plot for a specific annotation. (A) Locating the node cluster index on the bottom right pane in Cytoscape (red box). (B) GNPS results page with “View All Library Hits” link (red box). (C) Example of mirror plot obtained by clicking the View Mirror Match link (red box) for the specified cluster index.
Fig. 6.

Selecting a specific node. (A) “Select” tab and “+” button in Cytoscape (red boxes). First Neighbors of Selected Nodes button (blue box). (B) Filtering the specific node with “Node: Compound_Name” as an example for parameter of interest and “Apply” icon.
Fig. 7.

Steps to subset network. (A) Selecting specific nodes based on parameters of interest and hitting the First Neighbors of Selected Nodes (red boxes). (B) “New Network From Selection (all edges)” icon to generate new subnetwork (red box). (C) Renaming the new network.
Pause point: Saving. When exiting Cytoscape, you will be prompted to save your session. Saving preserves the formatting you established, which will be restored the next time you open the saved session in Cytoscape.
Pause point: Saving network view for figures. A publication-quality figure of the network can be saved through File→Export→Network Views as Graphics, and setting the export file format to pdf.
Note: The molecular network link should be shared in the Data Availability section of publications, or an appropriate equivalent.
3.4.2. Identifying network nodes perturbed by infection
Timing: 10–15 min
This protocol will work for both feature-based and classical molecular networks. However, the pie charts are a better indication of relative metabolite feature abundance in feature-based molecular networks and thus the use of feature-based molecular networks is recommended for these analyses.
- Make the pie charts.
- Select the “Style” tab.
- Select the box in the Def. column by “Image/Chart 1” (Fig. 8A, red box).
- A new window will pop up, named “Graphics.”
- Switch to Pie chart tab (Fig. 8B, arrow).
- In the “Data” tab, choose the columns you are interested in and click the Add Selected arrow (Fig. 8B, red box) to put them into “Selected Columns”. In this example, we choose “GNPSGROUP: infected_small intestine_12” and “GNPSGROUP: no_infection_small intestine_12.”
- In the “Options” tab (Fig. 8B, yellow box), you can customize the pie chart colors. In our example, we choose red for “GNPSGROUP: infected_small intestine_12” and blue for “GNPSGROUP:no_infection_small intestine_12.”
- Click Apply in the bottom right.
- You will have a pie chart for each node, displaying the relative ratio of peak areas between selected groups (Fig. 8C).
- (Optional) For clearer pie chart visualization, change from the default blue node background to a white background. Refer to Section 3.4.1 above for instructions.
Visually explore the network for nodes showing differential abundance between infected and uninfected samples (see Note for statistical analyses of these differences).
If you wish to export information on a select infection-perturbed sub-family, the best way to do so is to create a new subnetwork from this family. Refer to Section 3.4.1 for instructions.
Inspect node annotations. Refer to Section 3.4.1 for instructions.
Expected results: Fig. 8C.
Fig. 8.

Making feature abundance pie charts. (A) Select the “Image/chart” box (red box). (B) Choose pie chart tab (arrow) and select the groups to plot (red box). The options tab (yellow box) can be used to set pie chart colors. (C) Visualize the molecular network. (D) Network table with metabolite feature information.
Note: When the molecular network is large, or to provide statistical support for step 3, begin with statistical tools (such as random forest analysis, fold change comparison between infected and uninfected samples, or statistical significance testing) rather than visual network exploration, to identify metabolite features that differ between infected and uninfected samples. Then, search for these features in the network as described in Section 3.4.1, step 13.
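As one illustration of the fold change comparison mentioned in this note, the sketch below computes the log2 ratio of mean peak areas between infected and uninfected samples for each feature and lists features exceeding a cutoff. The data layout, sample names, pseudocount and two-fold cutoff are all assumptions chosen for the example. A fold change screen on its own does not replace significance testing.

```python
import math
from statistics import mean


def log2_fold_changes(peak_areas, infected_samples, uninfected_samples,
                      pseudocount=1.0):
    """Return {feature_id: log2(mean infected / mean uninfected)}."""
    result = {}
    for feature_id, areas in peak_areas.items():
        infected = mean(areas[s] for s in infected_samples)
        uninfected = mean(areas[s] for s in uninfected_samples)
        # The pseudocount avoids division by zero for absent features.
        result[feature_id] = math.log2(
            (infected + pseudocount) / (uninfected + pseudocount)
        )
    return result


# Hypothetical peak areas: feature (cluster index) -> sample -> peak area.
peak_areas = {
    "101": {"inf_1": 7999.0, "inf_2": 7999.0, "ctl_1": 999.0, "ctl_2": 999.0},
    "102": {"inf_1": 499.0, "inf_2": 499.0, "ctl_1": 1999.0, "ctl_2": 1999.0},
    "103": {"inf_1": 1000.0, "inf_2": 1200.0, "ctl_1": 1100.0, "ctl_2": 1100.0},
}
fold_changes = log2_fold_changes(
    peak_areas, ["inf_1", "inf_2"], ["ctl_1", "ctl_2"]
)

# Keep features changing at least two-fold in either direction.
candidates = sorted(f for f, fc in fold_changes.items() if abs(fc) >= 1.0)
print(candidates)
```

The cluster index values of the retained features can then be used to locate the corresponding nodes with the Cytoscape column filter.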
Note: Pie chart colors should be selected keeping in mind accessibility to color-blind readers. Avoid red (infected) to green (uninfected) visualizations in particular.
Pause point: You may save your Cytoscape session as described in Section 3.4.1.
Critical: Annotations should always be checked for quality of the chemical match (mirror plot, cosine score, number of matched peaks). For analog matches, chemical plausibility of the mass difference should always be considered. A final filter should always be the biological plausibility of the annotation. Refer to the GNPS documentation and Aron et al. (2020) for more details.
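To give an intuition for what the cosine score and the number of matched peaks summarize, the following sketch compares two MS2 spectra given as lists of (m/z, intensity) pairs. It is a simplified stand-in and not the GNPS scoring implementation: peaks are paired greedily within an assumed m/z tolerance, and no intensity scaling or precursor mass shift is applied. The spectra are hypothetical.

```python
import math


def cosine_score(spectrum_a, spectrum_b, mz_tolerance=0.02):
    """Return (cosine, n_matched_peaks) for two lists of (m/z, intensity)."""
    norm_a = math.sqrt(sum(i * i for _, i in spectrum_a))
    norm_b = math.sqrt(sum(i * i for _, i in spectrum_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0

    # Candidate peak pairs within tolerance, largest intensity product first.
    pairs = sorted(
        ((ia * ib, a, b)
         for a, (mza, ia) in enumerate(spectrum_a)
         for b, (mzb, ib) in enumerate(spectrum_b)
         if abs(mza - mzb) <= mz_tolerance),
        reverse=True,
    )
    used_a, used_b, dot = set(), set(), 0.0
    for product, a, b in pairs:  # greedy: each peak is matched at most once
        if a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            dot += product
    return dot / (norm_a * norm_b), len(used_a)


# Hypothetical reference and experimental MS2 spectra.
reference = [(86.10, 20.0), (184.07, 100.0), (504.35, 15.0)]
experimental = [(86.10, 18.0), (184.07, 100.0), (522.36, 12.0)]

score, n_matched = cosine_score(reference, experimental)
print(f"cosine = {score:.3f}, matched peaks = {n_matched}")
```

A high cosine score driven by only one or two intense shared peaks is weaker evidence than the same score supported by many matched peaks, which is why both values should be inspected together with the mirror plot.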
3.4.3. Identifying infection-correlated chemical modifications
Timing: 50 min
The goal of this protocol is to determine whether different chemical modifications occur in infected and uninfected samples. For example, you may want to determine whether there are increased oxidative modifications in infected samples, as a potential indicator of oxidative stress (Spickett, 2020), or whether the degree of saturation differs between infection conditions. A higher degree of lipid saturation has been correlated with induction of pro-inflammatory pathways (Monson, Trenerry, Laws, Mackenzie, & Helbig, 2021). This protocol takes advantage of the molecular network structure, where related chemicals are connected by edges, which are annotated with the mass difference between nodes, representing a chemical group differing between the two nodes (Fig. 9A). This protocol can be used with classical or feature-based molecular networking.
Fig. 9.

Expected output of edge analysis. Feature-based networking results were subset to only include samples from the esophagus at 12 days post-infection and processed as described in the protocol. (A) Sub-network of ceramide-related metabolite features in infected and uninfected esophagus at the 12-day timepoint. Annotated nodes indicated by arrows. Edges connecting metabolites differing by degree of saturation are highlighted in orange. (B) Partial screenshot of edge table export for all metabolite features detected in infected esophagus samples at 12 days post-infection. (C) Partial screenshot of aggregated abundance of each edge annotation, obtained from exported edge table. (D) Relative proportion of H2 and 2 × H2 (single bond differing in saturation between related metabolites or two bonds differing in saturation between related metabolites) between infected and uninfected esophagus samples at the 12-day timepoint. *P < 0.05, Fisher’s exact test.
Open your network in Cytoscape (as described in Section 3.4.1).
- Subset your network to only include infection-associated nodes in one subnetwork and nodes found in uninfected samples in a second subnetwork.
- First, use the Select (Filter) tab to select uninfected samples.
- In the example feature-based molecular network, uninfected samples can be found by selecting Node: ATTRIBUTE_condition and selecting “contains” in the drop-down menus, followed by typing in the search bar: no_infection.
- Create a new sub-network from selected samples using the New Network From Selection (all edges) button.
- Rename this network under the network tab to “uninfected.”
- Last, remove singleton (unconnected) nodes (Fig. 2D). These are unsuitable for edge analysis, since they are not connected to any other nodes. This can be performed by reformatting your subnetwork using the Layout top menu and selecting “Apply preferred layout.” Singleton nodes can then be visualized, highlighted using shift+mouse drag and then deleted using standard keyboard shortcuts.
- If you do not wish to delete nodes, a new subnetwork can instead be created by manually selecting all non-singleton nodes using shift+mouse drag and then creating a new subnetwork from the selection.
- Repeat these steps to select and create a new sub-network for infected samples called “infected,” without singletons.
- In the example feature-based molecular network, these can be found by selecting Node: ATTRIBUTE_condition contains infected.
Use the Export Table to File button to select and export “infected default edge” and “uninfected default edge” tables as .csv files to the location of your choice.
In the edge tables, the column of interest for this protocol is EdgeAnnotation.
Determine the number of occurrences of each annotation in your “infected default edge” and “uninfected default edge” tables. This can be performed manually in Excel, but best practice is to use reproducible code, for example in Python or R, in Jupyter Notebooks or similar.
Combine these results for infected and uninfected samples into a single worksheet.
From this file, you can calculate the frequency of each edge annotation in infected and in uninfected samples and generate plots using Excel or any standard code (see Fig. 9D for example). Statistical analysis can be performed using Fisher’s exact test.
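The counting and testing steps above can be scripted reproducibly. The sketch below is one possible implementation under stated assumptions, not the code used to generate Fig. 9: it assumes comma-separated edge table exports with an `EdgeAnnotation` column, uses hypothetical annotation labels, skips edges with an empty annotation, and compares one annotation against all other annotated edges in a 2 × 2 table. Fisher's exact test is written out with the standard library so that no additional packages are required.

```python
import csv
import io
from collections import Counter
from math import comb


def count_edge_annotations(edge_table_csv, column="EdgeAnnotation"):
    """Count occurrences of each non-empty edge annotation."""
    reader = csv.DictReader(io.StringIO(edge_table_csv))
    return Counter(row[column].strip() for row in reader
                   if row[column].strip())


def contingency(annotation, infected_counts, uninfected_counts):
    """2 x 2 table: chosen annotation vs all other annotated edges."""
    a = infected_counts[annotation]
    b = sum(infected_counts.values()) - a
    c = uninfected_counts[annotation]
    d = sum(uninfected_counts.values()) - c
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as observed.
    return sum(p for p in (prob(x) for x in range(low, high + 1))
               if p <= observed * (1 + 1e-7))


# Hypothetical edge table exports (only the relevant column is shown).
infected_csv = "EdgeAnnotation\nH2\nH2\nH2\nO\n\n"
uninfected_csv = "EdgeAnnotation\nH2\nO\nO\nCH2\n"

infected = count_edge_annotations(infected_csv)
uninfected = count_edge_annotations(uninfected_csv)
table = contingency("H2", infected, uninfected)
p_value = fisher_exact_two_sided(*table)
print(table, round(p_value, 4))
```

When several annotations are tested in this way, a correction for multiple comparisons should be considered.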
Critical: Automated edge annotations should always be considered in the context of chemical and biochemical plausibility in the infectious disease system under study.
Note: This method does not consider relative abundance of the connected nodes and chemical modifications, only presence/absence.
Note: This method could also be applied to a particular chemical family. For example, if interested in oxidative modification of phospholipids, the steps above should be followed, with additional subsetting to the chemical family of interest.
3.4.4. Optional steps: Demonstrating that specific network nodes of interest may be pathogen-derived
Timing: 40min
This step requires data from axenically-cultured pathogen samples, acquired along with data from tissue samples. This data should be included when generating the molecular network. As an example, we use here a classical molecular network. However, these steps are suitable for both classical and feature-based molecular networks. Selection of the type of network to use will depend on the user’s needs in terms of data filtering vs available culture samples and data clean-up processes.
- Make groups in Cytoscape using the filtering tool. In our example we are looking to group the nodes from infected tissue, cultured parasites, and uninfected tissue.
- Use the Select (or Filter) button on the left side of Cytoscape
- Click the plus button and then Column Filter in the dropdown menu
- Select the column you would like to filter by based on the groups you are trying to create
- Infected tissue:
- To group all nodes present in infected tissue, select Node: ATTRIBUTE_condition as column filter.
- In the text box underneath, type: infected.
- The selected nodes will be yellow. If no nodes are highlighted in yellow, check that your spelling matches the metadata.
- While the nodes are selected, move the mouse over a blank space between nodes on the network. Right click to pull up options for the selected nodes (Fig. 11B).
- Select the option: Group and then Group Selected Nodes (Fig. 11B).
- A textbox will appear where you can type the desired name for the group. In this example, we will call these “Infected.”
- Cultured parasites:
- To group all nodes present in cultured parasites, select Node: ATTRIBUTE_study as column filter.
- For the cultured parasites, we had two types in our study, labeled tryps (Trypanosoma cruzi trypomastigotes) and ama (T. cruzi amastigotes) in our metadata. To include both of these types in this group, two column filters are needed.
- To add a second column filter, click the plus button and select column filter again.
- Select Node: ATTRIBUTE_study for this column filter as well.
- In the first box, type: tryps and in the second box type: ama.
- We want this group to include all nodes in trypomastigotes or amastigotes, so in the upper right select: Match any (OR).
- Right click on the network with desired nodes selected.
- Select the option: Group and then Group Selected Nodes.
- A textbox will appear to type the desired name for the group. We will call these “Cultured Parasites.”
- Uninfected samples:
- This analysis included both uninfected host cells and uninfected tissue samples.
- To group all nodes present in the uninfected samples, select Node: ATTRIBUTE_study as the first column filter and Node: ATTRIBUTE_condition for the second column filter.
- Type C2C12 in the Node: ATTRIBUTE_study search box. These are uninfected cells of the cell type that was used to culture the T. cruzi parasites.
- Type no_infection in the Node: ATTRIBUTE_condition search box.
- We want this group to include all nodes in either of the categories, so in the upper right select: Match any (OR).
- Right click on the network with desired nodes selected.
- Select the option: Group and then Group Selected Nodes.
- A textbox will appear where you can type the desired name for the group. We will call these “Uninfected.”
- Make Venn diagrams
- Select Apps and select Venn and Euler diagrams.
- A pop up should appear with the groups delineated in the previous steps (Fig. 11C).
- Select all three groups: Infected, Uninfected, and Cultured Parasites and then select the picture of the three-way Venn diagram (red box, Fig. 11C).
- A pop up will appear with the Venn diagram using the groups selected (Fig. 11D).
- Clicking on the section of interest in the Venn diagram will select the nodes in that group within the network. In our case, we are looking for the nodes present in the infected state that are pathogen-derived. These would be the nodes present in both the cultured parasites and the infected groups.
- To visualize the nodes of interest more easily, you can subset those nodes into their own network.
- Make sure the nodes of interest are selected in yellow by clicking on the intersection in the Venn diagram.
- While the nodes of interest are selected, click the “New Network From Selection (all edges)” icon (red box, Fig. 7B).
- The new network name will appear underneath the main network (Fig. 11E). To rename the new network, right click on the current network name and select: Rename Network.
- To improve visibility of the nodes on this subset network, refer to Visualizing and interpreting your molecular network.
- Annotations can be investigated following standard GNPS instructions, available here: https://ccms-ucsd.github.io/GNPSDocumentation/networkingviews/#view-all-library-hits.
Fig. 10.

Installing the Venn and Euler Diagrams app. (A) App manager menu. (B) Choosing the Venn and Euler Diagrams app.
Fig. 11.

Expected output from identifying candidate pathogen-derived nodes. (A) Screenshot of the filter setting used to make groups. “Match any” is used in the protocol, however different analyses may require “Match all.” (B) Creation of a group from selected nodes. Selected nodes are highlighted in yellow. (C) Pop up screen for the Venn and Euler diagram app. All groups created will be shown in the box labeled “Groups.” Multiple groups must be selected to create a Venn diagram (red box). (D) The expected Venn diagram. Nodes of interest are found at the intersection of Cultured Parasites and Infected. Clicking on that section will select the 41 nodes of interest on the network. (E) Screenshot of the subset network name (Parasite Derived Metabolites) created from the nodes of interest and containing 41 metabolite features.
Note: Possibilities that the nodes are present in all sample types but below the limit of detection in uninfected controls should always be considered, as well as instrumental effects such as peak over-splitting or ion suppression.
Note: Absence of a node in axenic pathogen cultures that was present in infected samples should not automatically be interpreted as confirmation that the molecule is host-derived. The possibility that it is indeed a pathogen-derived molecule but only induced in infection settings and not in axenic culture should always be considered.
Note: Overlap between axenic pathogen cultures and uninfected samples is also possible. These metabolite features may represent metabolites common across host and pathogen, but decreased by infection in tissues (with the pathogen-derived metabolite within the tissues below the limit of detection).
Critical: Biological validation should always be performed as a follow-up, to address all the alternative possibilities discussed in Notes.
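For users who prefer a scripted check of the Cytoscape grouping and Venn diagram steps, the same logic can be approximated with set operations on an exported node table. The sketch below is illustrative only: the node IDs, the attribute values for each node and the function name `select_nodes` are hypothetical, and the substring matching is an assumption intended to mimic a “contains” column filter combined with “Match any (OR).”

```python
def select_nodes(nodes, filters):
    """Match any (OR): keep nodes where at least one (column, term) pair matches.

    Matching is by substring, so a term such as "infected" would also match a
    value such as "uninfected"; check this against your own metadata labels.
    """
    return {
        node_id
        for node_id, attrs in nodes.items()
        if any(term in attrs.get(column, "") for column, term in filters)
    }


# Hypothetical node table: each value lists the metadata groups in which the
# node was detected. Real tables would be read from the exported node .csv.
nodes = {
    "n1": {"ATTRIBUTE_condition": "infected", "ATTRIBUTE_study": "tissue"},
    "n2": {"ATTRIBUTE_condition": "infected,no_infection", "ATTRIBUTE_study": "tissue"},
    "n3": {"ATTRIBUTE_condition": "infected", "ATTRIBUTE_study": "tissue,tryps"},
    "n4": {"ATTRIBUTE_condition": "", "ATTRIBUTE_study": "ama,C2C12"},
    "n5": {"ATTRIBUTE_condition": "infected,no_infection", "ATTRIBUTE_study": "tissue,ama,C2C12"},
}

infected = select_nodes(nodes, [("ATTRIBUTE_condition", "infected")])
parasites = select_nodes(
    nodes, [("ATTRIBUTE_study", "tryps"), ("ATTRIBUTE_study", "ama")]
)
uninfected = select_nodes(
    nodes, [("ATTRIBUTE_study", "C2C12"), ("ATTRIBUTE_condition", "no_infection")]
)

# Nodes detected in both infected samples and cultured parasites.
shared = infected & parasites
# Venn section restricted to nodes that are absent from uninfected samples.
exclusive = shared - uninfected

print("Infected and Cultured Parasites:", sorted(shared))
print("...and not in Uninfected:", sorted(exclusive))
print(f"Share of infected nodes also seen in parasites: {len(shared) / len(infected):.0%}")
```

Nodes in `shared` but not in `exclusive` are those also detected in uninfected samples. As discussed in the Notes above, all of these assignments are candidates only and depend on the limit of detection in each sample type.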
4. Advantages
The greatest strength of molecular networking is the ability to annotate metabolites that are not currently in reference metabolite libraries, but are structurally-related to these reference metabolites (“analog matches” and annotation propagation). This is particularly important when trying to discover novel pathogen-derived metabolites that are likely to be missing from existing databases. Parasitic infections in particular are under-studied even compared to other pathogens and thus novel parasite metabolites would otherwise be under-annotated in the absence of such methods. Furthermore, GNPS houses “living data”: additional reference spectra are continually being added and datasets re-analyzed, so that annotations increase over time (Wang et al., 2016). By grouping metabolite features into families with shared fragmentation patterns (and thus shared chemical substructures), molecular networking also enables analyses at the chemical family-level and rapid visualization of family-level changes in metabolite abundance between infected and uninfected samples (Dean et al., 2021; Eberhard et al., 2021; Hoffman et al., 2021; Hossain et al., 2020; McCall et al., 2017; Parab et al., 2021). We have demonstrated that this can provide additional insight into metabolic changes across infection conditions that would have been missed if analyses had been performed exclusively at the single metabolite level (Dean et al., 2021; Hoffman et al., 2021; Hossain et al., 2020; McCall et al., 2017). These findings can even be used to guide drug development for infectious diseases (Hossain et al., 2020). Molecular network connectivity further facilitates the determination of the presence and frequency of certain chemical modifications such as oxidation, methylation, etc. 
This may have strong utility in understanding the oxidative environment present under many infection conditions, though applications so far have often focused on environmental samples (e.g., Hartmann et al., 2017; Roach et al., 2021). Lastly, molecular networking is embedded within the GNPS infrastructure, enabling straightforward integration with multiple other data analysis tools such as in silico annotation tools (NAP, MS2LDA, DEREPLICATOR (da Silva et al., 2018; Mohimani et al., 2017; van der Hooft, Wandy, Barrett, Burgess, & Rogers, 2016)), meta-analysis tools (ReDU-MS2 (Jarmusch et al., 2020)), 3D visualization (‘ili (Protsyuk et al., 2018)), statistical analyses using QIIME2 (Bolyen et al., 2019; Caporaso et al., 2010), etc. Multiple other tools are being developed to facilitate and enrich analyses (e.g., GNPS Dashboard, MolNetEnhancer (Ernst et al., 2019; Petras et al., 2021)).
5. Limitations
As with all metabolomic analyses, there is still considerable “dark matter” (da Silva, Dorrestein, & Quinn, 2015): metabolite features that are unannotated and that cannot benefit from annotation propagation, because none of the connected nodes are annotated. While some or even many of these features may represent redundancy in the data (multiple adducts of the same metabolite, in-source fragmentation…) or artifacts (e.g., chromatographic effects, peak over-splitting during data processing) (Mahieu, Huang, Chen, & Patti, 2014; Sindelar & Patti, 2020), there is nevertheless still significant scope to discover new chemical matter (Cohen et al., 2017; Motley et al., 2017; Quinn et al., 2020). The continuous expansion of reference libraries, supplemented with new in silico tools integrated into GNPS (e.g., NAP, SIRIUS, COSMIC, CSI:FingerID (Böcker, Letzel, Lipták, & Pervukhin, 2009; da Silva et al., 2018; Dührkop et al., 2019; Duhrkop, Shen, Meusel, Rousu, & Bocker, 2015; Hoffmann et al., 2021)) can assist with annotating novel chemistry, while modified GNPS workflows (ion identity molecular networking) can reduce redundancy from the presence of multiple adducts with different MS2 fragmentation patterns (Schmid et al., 2021). In silico annotations should be confirmed with pure standards, where possible, according to the Metabolomics Standards Initiative (Sumner et al., 2007).
Molecular networking, and metabolomics data processing in general, are highly complex. We hope that this protocol, other published protocols (Aron et al., 2020; Phelan, 2020), and the resources on the GNPS website will assist with de-mystifying this approach. Although considerable functionality is available within the molecular networking infrastructure, additional statistical tools outside of GNPS and Cytoscape may be required to select infection-perturbed metabolite features to annotate (Dean et al., 2021; Hoffman et al., 2021; Hossain et al., 2020; McCall et al., 2017, 2018; Parab et al., 2021; Wozniak et al., 2020). The applicability, strengths and limitations of each of these tools should be evaluated prior to implementation. Lastly, standard limitations of LC-MS should always be kept in mind when analyzing resulting data, such as limit of detection, ion suppression, etc.
6. Optimization and troubleshooting
6.1. General troubleshooting
The first resource to troubleshoot unexpected outcomes is the GNPS molecular networking documentation. A second step is to review currently-discussed issues in the GNPS forum (https://groups.google.com/g/molecular_networking_bug_reports/) and troubleshooting pages (https://ccms-ucsd.github.io/GNPSDocumentation/troubleshooting/). Below, we discuss some common problems and solutions encountered when performing the analyses covered by this protocol.
6.2. Problem: Molecular network is too small or contains few annotations
In the case of feature-based molecular networking, the number of network nodes should be identical to the number of features in the feature table. Expected numbers of annotations will vary depending on the biological system of interest; however, common metabolite annotations such as amino acids, nucleotides and phospholipids will generally be obtained. If an internal standard was included at the time of data acquisition, is present in GNPS libraries and was not filtered out during pre-processing, it should be observed in the resulting annotations.
6.2.1. Potential solution to optimize the procedure
Solutions include verifying that the files were correctly converted to open format (classical molecular networking) and correctly processed by your mass spectrometry data processing program of choice (feature-based molecular networking).
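The checks described in this section (node count vs feature count, number of annotations, presence of the internal standard) can be scripted. The following is a minimal sketch under stated assumptions: the column names “row ID”, “shared name” and “Compound_Name”, the toy tables and the internal standard label are placeholders that should be replaced with those in your own feature table and exported node table.

```python
import csv
import io


def load_rows(handle):
    """Read a .csv table into a list of row dictionaries."""
    return list(csv.DictReader(handle))


def check_network(feature_ids, node_ids, annotations, internal_standard=None):
    """Summarize basic sanity checks for a feature-based molecular network."""
    features, nodes = set(feature_ids), set(node_ids)
    report = {
        "n_features": len(features),
        "n_nodes": len(nodes),
        "features_without_node": sorted(features - nodes),
        "n_annotated": sum(1 for name in annotations if name.strip()),
    }
    if internal_standard is not None:
        # Case-insensitive substring search among the annotation names.
        report["internal_standard_found"] = any(
            internal_standard.lower() in name.lower() for name in annotations
        )
    return report


# Toy stand-ins for the feature table and the exported node table.
# All column names and values here are assumptions for illustration.
feature_rows = load_rows(io.StringIO("row ID,row m/z\n1,180.06\n2,258.11\n3,496.34\n"))
node_rows = load_rows(
    io.StringIO("shared name,Compound_Name\n1,\n2,Hypothetical internal standard\n")
)

report = check_network(
    feature_ids=[row["row ID"] for row in feature_rows],
    node_ids=[row["shared name"] for row in node_rows],
    annotations=[row["Compound_Name"] for row in node_rows],
    internal_standard="internal standard",
)
for key, value in report.items():
    print(f"{key}: {value}")
```

A non-empty `features_without_node` list indicates that the network is smaller than the feature table and that file conversion or pre-processing should be re-examined.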
6.3. Problem: Unable to link network node to metadata in Cytoscape
In some cases, a network may be obtained with annotations as expected. However, each node has not been linked to metadata, so that feature abundance pie charts based on metadata cannot be generated (see Section 3.4.2).
6.3.1. Potential solution to optimize the procedure
The most likely cause is discrepancies between file names in data and metadata. Special characters, diaereses and umlauts should be avoided here (e.g., naïve should be entered as naive).
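File name discrepancies can be located programmatically before re-running the workflow. The sketch below compares the list of data file names with the file names entered in the metadata table and flags names containing non-ASCII characters. The function names and the example file names are hypothetical.

```python
import unicodedata


def find_filename_problems(data_files, metadata_files):
    """Report names present on only one side and names with non-ASCII characters."""
    data, meta = set(data_files), set(metadata_files)
    return {
        "in_data_only": sorted(data - meta),
        "in_metadata_only": sorted(meta - data),
        "non_ascii": sorted(name for name in data | meta if not name.isascii()),
    }


def to_ascii(name):
    """Strip diacritics from a name (e.g., diaereses and umlauts)."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Hypothetical example: the metadata entry uses a diaeresis, the data file does not.
data_files = ["naive_01.mzXML", "infected_01.mzXML"]
metadata_files = ["naïve_01.mzXML", "infected_01.mzXML"]

problems = find_filename_problems(data_files, metadata_files)
for issue, names in problems.items():
    print(f"{issue}: {names}")

# Suggested ASCII-only replacements for the flagged names.
for name in problems["non_ascii"]:
    print(f"{name} -> {to_ascii(name)}")
```

Empty `in_data_only` and `in_metadata_only` lists indicate that every data file has a matching metadata entry. Any renaming should be applied consistently to both the data files and the metadata.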
7. Safety considerations and standards
All data acquisition should follow local institutional regulations, including Institutional Biosafety Committee and Institutional Animal Care and Use Committee-approved protocols. Publications using these methods should provide networking links in manuscript Methods. We also encourage users of these methods to deposit their own data back into the databases they used (for example GNPS/MassIVE if using these protocols).
8. Alternative methods/procedures
Should data not have been acquired from cultured pathogen samples at the time of data acquisition, a possible pathogen source can nevertheless be hypothesized by co-networking with publicly-available datasets using ReDU (https://redu.ucsd.edu/) or by single-spectrum search (MASST) of publicly-available datasets (Jarmusch et al., 2020; Wang et al., 2020). MASST can be accessed at https://masst.ucsd.edu/ or by navigating to the MASST search on the GNPS home page (https://gnps.ucsd.edu/). MASST will search available datasets for the presence of a given MS2 spectrum. The resulting list can then be mined to determine whether this MS2 is indeed pathogen-specific or common to other organisms (or to blank samples and controls). One strength of the MASST approach is the ability to focus on a specific metabolic feature of interest, and to determine specificity of that feature to the pathogen of interest (Wang et al., 2020). In contrast, ReDU enables global analysis, by building molecular networks that combine the user’s data with existing spectral data (Jarmusch et al., 2020). This approach could for example be used to determine the percentage of metabolic features in infected samples that are shared with the pathogen. In that case, once the co-network has been built, similar steps would be pursued as in Section 3.4.4. A limitation of these approaches is that they require publicly-available data on the pathogen of interest, which may be limiting in the case of neglected diseases. However, these two approaches use “living” databases, which are continually expanding.
Alternative networking approaches that do not rely on cosine scoring are being developed (for example, Spec2Vec or MS2LDA) (Huber et al., 2021; van der Hooft et al., 2016). Several of these are being implemented in the GNPS infrastructure. The resulting networks can then be visualized and analyzed using the approaches described in this manuscript.
9. Summary
Overall, this protocol covers procedures to use molecular networking approaches in the context of infectious diseases. Methods described in this protocol enable the visualization and identification of infection-associated changes in metabolite abundance and in chemical modifications. They also enable hypothesis-generating approaches such as the identification of candidate pathogen-derived metabolites. Metabolomics data analysis and the molecular networking infrastructure are rapidly evolving. We encourage interested researchers to build from these standard approaches to address their biological questions of interest.
Acknowledgments
We wish to thank the inventors and developers of molecular networking, as well as all the contributors who have been iteratively improving it. We have endeavored to cite their publications throughout this manuscript. We also wish to thank the members of the McCall lab who tested these protocols: Kate Wheeler, Camil Gosmanov and Michael Jimenez Sandoval. Research using molecular networking in the context of infectious diseases in the McCall laboratory is supported by NIH 1R21AI148886, NIH 1R21AI156669 and PhRMA foundation 45188. Laura-Isobel McCall, Ph.D. holds an Investigators in the Pathogenesis of Infectious Disease Award from the Burroughs Wellcome Fund. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funders.
References
- Aksenov AA, Laponogov I, Zhang Z, Doran SLF, Belluomo I, Veselkov D, et al. (2021). Auto-deconvolution and molecular networking of gas chromatography–mass spectrometry data. Nature Biotechnology, 39(2), 169–173. 10.1038/s41587-020-0700-3.
- Aron AT, Gentry EC, McPhail KL, Nothias L-F, Nothias-Esposito M, Bouslimani A, et al. (2020). Reproducible molecular networking of untargeted mass spectrometry data using GNPS. Nature Protocols, 15(6), 1954–1991. 10.1038/s41596-020-0317-5.
- Böcker S, Letzel MC, Lipták Z, & Pervukhin A (2009). SIRIUS: Decomposing isotope patterns for metabolite identification. Bioinformatics, 25(2), 218–224. 10.1093/bioinformatics/btn603.
- Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, et al. (2019). Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology, 37(8), 852–857. 10.1038/s41587-019-0209-9.
- Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al. (2010). QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7(5), 335–336. 10.1038/nmeth.f.303.
- Cohen LJ, Esterhazy D, Kim S-H, Lemetre C, Aguilar RR, Gordon EA, et al. (2017). Commensal bacteria make GPCR ligands that mimic human signalling molecules. Nature, 549(7670), 48–53. 10.1038/nature23874.
- da Silva RR, Dorrestein PC, & Quinn RA (2015). Illuminating the dark matter in metabolomics. Proceedings of the National Academy of Sciences of the United States of America, 112(41), 12549–12550. 10.1073/pnas.1516878112.
- da Silva RR, Wang M, Nothias L-F, van der Hooft JJJ, Caraballo-Rodríguez AM, Fox E, et al. (2018). Propagating annotations of molecular networks using in silico fragmentation. PLoS Computational Biology, 14(4), e1006089. 10.1371/journal.pcbi.1006089.
- Dean DA, Gautham, Siqueira-Neto JL, McKerrow JH, Dorrestein PC, & McCall L-I (2021). Spatial metabolomics identifies localized chemical changes in heart tissue during chronic cardiac Chagas disease. PLoS Neglected Tropical Diseases, 15(10), e0009819. 10.1371/journal.pntd.0009819.
- Dührkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, et al. (2019). SIRIUS 4: A rapid tool for turning tandem mass spectra into metabolite structure information. Nature Methods, 16(4), 299–302. 10.1038/s41592-019-0344-8.
- Duhrkop K, Shen H, Meusel M, Rousu J, & Bocker S (2015). Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proceedings of the National Academy of Sciences of the United States of America, 112(41), 12580–12585. 10.1073/pnas.1509788112.
- Dunn WB, Broadhurst D, Begley P, Zelena E, Francis-McIntyre S, Anderson N, et al. (2011). Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nature Protocols, 6(7), 1060–1083. http://www.nature.com/nprot/journal/v6/n7/abs/nprot.2011.335.html#supplementary-information.
- Eberhard FE, Klimpel S, Guarneri AA, & Tobias NJ (2021). Metabolites as predictive biomarkers for Trypanosoma cruzi exposure in triatomine bugs. Computational and Structural Biotechnology Journal, 19, 3051–3057. 10.1016/j.csbj.2021.05.027.
- Ernst M, Kang KB, Caraballo-Rodríguez AM, Nothias L-F, Wandy J, Chen C, et al. (2019). MolNetEnhancer: Enhanced molecular networks by integrating metabolome mining and annotation tools. Metabolites, 9(7), 144. Retrieved from https://www.mdpi.com/2218-1989/9/7/144.
- Forsberg EM, Huan T, Rinehart D, Benton HP, Warth B, Hilmers B, et al. (2018). Data processing, multi-omic pathway mapping, and metabolite activity analysis using XCMS online. Nature Protocols, 13(4), 633–651. 10.1038/nprot.2017.151.
- Garg N, Wang M, Hyde E, da Silva RR, Melnik AV, Protsyuk I, et al. (2017). Three-dimensional microbiome and metabolome cartography of a diseased human lung. Cell Host Microbe, 22(5), 705–716.e4. 10.1016/j.chom.2017.10.001.
- Hartmann AC, Petras D, Quinn RA, Protsyuk I, Archer FI, Ransome E, et al. (2017). Meta-mass shift chemical profiling of metabolomes from coral reefs. Proceedings of the National Academy of Sciences, 114(44), 11685–11690. 10.1073/pnas.1710248114.
- Heine D, Holmes NA, Worsley SF, Santos ACA, Innocent TM, Scherlach K, et al. (2018). Chemical warfare between leafcutter ant symbionts and a co-evolved pathogen. Nature Communications, 9(1), 2208. 10.1038/s41467-018-04520-1.
- Hoffman K, Liu Z, Hossain E, Bottazzi ME, Hotez PJ, Jones KM, et al. (2021). Alterations to the cardiac metabolome induced by chronic T. cruzi infection relate to the degree of cardiac pathology. ACS Infectious Diseases, 7(6), 1638–1649. 10.1021/acsinfecdis.0c00816.
- Hoffmann MA, Nothias L-F, Ludwig M, Fleischauer M, Gentry EC, Witting M, et al. (2021). Assigning confidence to structural annotations from mass spectra with COSMIC. bioRxiv, 2021.03.18.435634. 10.1101/2021.03.18.435634.
- Hossain E, Khanam S, Dean DA, Wu C, Lostracco-Johnson S, Thomas D, et al. (2020). Mapping of host-parasite-microbiome interactions reveals metabolic determinants of tropism and tolerance in Chagas disease. Science Advances, 6(30), eaaz2015. 10.1126/sciadv.aaz2015.
- Huber F, Ridder L, Verhoeven S, Spaaks JH, Diblen F, Rogers S, et al. (2021). Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships. PLoS Computational Biology, 17(2), e1008724. 10.1371/journal.pcbi.1008724.
- Idle JR, & Gonzalez FJ (2007). Metabolomics. Cell Metabolism, 6(5), 348–351. 10.1016/j.cmet.2007.10.005.
- Ivanisevic J, & Want EJ (2019). From samples to insights into metabolism: Uncovering biologically relevant information in LC-HRMS metabolomics data. Metabolites, 9(12), 308. Retrieved from https://www.mdpi.com/2218-1989/9/12/308.
- Jarmusch AK, Wang M, Aceves CM, Advani RS, Aguirre S, Aksenov AA, et al. (2020). ReDU: A framework to find and reanalyze public mass spectrometry data. Nature Methods. 10.1038/s41592-020-0916-7.
- Lai Z, Tsugawa H, Wohlgemuth G, Mehta S, Mueller M, Zheng Y, et al. (2017). Identifying metabolites by integrating metabolome databases with mass spectrometry cheminformatics. Nature Methods. 10.1038/nmeth.4512.
- Lewis CM Jr., McCall LI, Sharp RR, & Spicer PG (2020). Ethical priority of the most actionable system of biomolecules: The metabolome. American Journal of Physical Anthropology, 171(2), 177–181. 10.1002/ajpa.23943.
- Mahieu NG, Huang X, Chen YJ, & Patti GJ (2014). Credentialing features: A platform to benchmark and optimize untargeted metabolomic methods. Analytical Chemistry, 86(19), 9583–9589. 10.1021/ac503092d.
- McCall LI, Morton JT, Bernatchez JA, de Siqueira-Neto JL, Knight R, Dorrestein PC, et al. (2017). Mass spectrometry-based chemical cartography of a cardiac parasitic infection. Analytical Chemistry, 89(19), 10414–10421. 10.1021/acs.analchem.7b02423.
- McCall LI, Tripathi A, Vargas F, Knight R, Dorrestein PC, & Siqueira-Neto JL (2018). Experimental Chagas disease-induced perturbations of the fecal microbiome and metabolome. PLoS Neglected Tropical Diseases, 12(3), e0006344. 10.1371/journal.pntd.0006344.
- Melnik AV, Vázquez-Baeza Y, Aksenov AA, Hyde E, McAvoy AC, Wang M, et al. (2019). Molecular and microbial microenvironments in chronically diseased lungs associated with cystic fibrosis. mSystems, 4(5), e00375-19. 10.1128/mSystems.00375-19.
- Mohimani H, Gurevich A, Mikheenko A, Garg N, Nothias LF, Ninomiya A, et al. (2017). Dereplication of peptidic natural products through database search of mass spectra. Nature Chemical Biology, 13(1), 30–37. 10.1038/nchembio.2219.
- Monson EA, Trenerry AM, Laws JL, Mackenzie JM, & Helbig KJ (2021). Lipid droplets and lipid mediators in viral infection and immunity. FEMS Microbiology Reviews, 45(4), fuaa066. 10.1093/femsre/fuaa066.
- Motley JL, Stamps BW, Mitchell CA, Thompson AT, Cross J, You J, et al. (2017). Opportunistic sampling of roadkill as an entry point to accessing natural products assembled by bacteria associated with non-anthropoidal mammalian microbiomes. Journal of Natural Products, 80(3), 598–608. 10.1021/acs.jnatprod.6b00772.
- Newsom SN, & McCall LI (2018). Metabolomics: Eavesdropping on silent conversations between hosts and their unwelcome guests. PLoS Pathogens, 14(4), e1006926. 10.1371/journal.ppat.1006926.
- Nothias L-F, Nothias-Esposito M, da Silva R, Wang M, Protsyuk I, Zhang Z, et al. (2018). Bioactivity-based molecular networking for the discovery of drug leads in natural product bioassay-guided fractionation. Journal of Natural Products, 81(4), 758–767. 10.1021/acs.jnatprod.7b00737.
- Nothias LF, Petras D, Schmid R, Dührkop K, Rainer J, Sarvepalli A, et al. (2020). Feature-based molecular networking in the GNPS analysis environment. Nature Methods, 17(9), 905–908. 10.1038/s41592-020-0933-6.
- Parab AR, Thomas D, Lostracco-Johnson S, Siqueira-Neto JL, McKerrow JH, Dorrestein PC, et al. (2021). Dysregulation of glycerophosphocholines in the cutaneous lesion caused by Leishmania major in experimental murine models. Pathogens, 10(5). 10.3390/pathogens10050593.
- Petras D, Phelan VV, Acharya D, Allen AE, Aron AT, Bandeira N, et al. (2021). GNPS dashboard: Collaborative analysis of mass spectrometry data in the web browser. bioRxiv, 2021.04.05.438475. 10.1101/2021.04.05.438475.
- Phelan VV (2020). Feature-based molecular networking for metabolite annotation. Methods in Molecular Biology, 2104, 227–243. 10.1007/978-1-0716-0239-3_13.
- Pluskal T, Castillo S, Villar-Briones A, & Orešič M (2010). MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics, 11(1), 395.
- Protsyuk I, Melnik AV, Nothias LF, Rappez L, Phapale P, Aksenov AA, et al. (2018). 3D molecular cartography using LC-MS facilitated by Optimus and ‘ili software. Nature Protocols, 13(1), 134–154. 10.1038/nprot.2017.122.
- Quinn RA, Melnik AV, Vrbanac A, Fu T, Patras KA, Christy MP, et al. (2020). Global chemical effects of the microbiome include new bile-acid conjugations. Nature, 579(7797), 123–129. 10.1038/s41586-020-2047-9.
- Reis FCG, Costa JH, Honorato L, Nimrichter L, Fill TP, & Rodrigues ML (2021). Small molecule analysis of extracellular vesicles produced by Cryptococcus gattii: Identification of a tripeptide controlling Cryptococcal infection in an invertebrate host model. Frontiers in Immunology, 12(652). 10.3389/fimmu.2021.654574.
- Roach TNF, Dilworth J, Christian Martin H, Jones AD, Quinn RA, & Drury C (2021). Metabolomic signatures of coral bleaching history. Nature Ecology & Evolution, 5(4), 495–503. 10.1038/s41559-020-01388-7.
- Röst HL, Sachsenberg T, Aiche S, Bielow C, Weisser H, Aicheler F, et al. (2016). OpenMS: A flexible open-source software platform for mass spectrometry data analysis. Nature Methods, 13(9), 741–748. 10.1038/nmeth.3959. [DOI] [PubMed] [Google Scholar]
- Schmid R, Petras D, Nothias L-F, Wang M, Aron AT, Jagels A, et al. (2021). Ion identity molecular networking for mass spectrometry-based metabolomics in the GNPS environment. Nature Communications, 12(1), 3832. 10.1038/s41467-021-23953-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Senges CHR, Al-Dilaimi A, Marchbank DH, Wibberg D, Winkler A, Haltli B, et al. (2018). The secreted metabolome of Streptomyces chartreusis and implications for bacterial chemistry. Proceedings of the National Academy of Sciences, 115(10), 2490–2495. 10.1073/pnas.1715713115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. (2003). Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Research, 13(11), 2498–2504. 10.1101/gr.1239303. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sindelar M, & Patti GJ (2020). Chemical discovery in the era of metabolomics. Journal of the American Chemical Society, 142(20), 9097–9105. 10.1021/jacs.9b13198. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Smith CA, Want EJ, O’Maille G, Abagyan R, & Siuzdak G (2006). XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Analytical Chemistry, 78(3), 779–787. 10.1021/ac051437y. [DOI] [PubMed] [Google Scholar]
- Spickett CM (2020). Formation of oxidatively modified lipids as the basis for a cellular epilipidome. Frontiers in Endocrinology, 11(974). 10.3389/fendo.2020.602771. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sturm M, Bertsch A, Gropl C, Hildebrandt A, Hussong R, Lange E, et al. (2008). OpenMS—An open-source software framework for mass spectrometry. BMC Bioinformatics, 9, 163. 10.1186/1471-2105-9-163.
- Sumner LW, Amberg A, Barrett D, Beale MH, Beger R, Daykin CA, et al. (2007). Proposed minimum reporting standards for chemical analysis: Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI). Metabolomics: Official Journal of the Metabolomic Society, 3(3), 211–221. 10.1007/s11306-007-0082-2.
- Tautenhahn R, Böttcher C, & Neumann S (2008). Highly sensitive feature detection for high resolution LC/MS. BMC Bioinformatics, 9(1), 504. 10.1186/1471-2105-9-504.
- Tsugawa H, Cajka T, Kind T, Ma Y, Higgins B, Ikeda K, et al. (2015). MS-DIAL: Data-independent MS/MS deconvolution for comprehensive metabolome analysis. Nature Methods, 12(6), 523–526. 10.1038/nmeth.3393.
- van der Hooft JJJ, Wandy J, Barrett MP, Burgess KEV, & Rogers S (2016). Topic modeling for untargeted substructure exploration in metabolomics. Proceedings of the National Academy of Sciences of the United States of America. 10.1073/pnas.1608041113.
- Wang M, Carver JJ, Phelan VV, Sanchez LM, Garg N, Peng Y, et al. (2016). Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nature Biotechnology, 34(8), 828–837. 10.1038/nbt.3597.
- Wang M, Jarmusch AK, Vargas F, Aksenov AA, Gauglitz JM, Weldon K, et al. (2020). Mass spectrometry searches using MASST. Nature Biotechnology, 38(1), 23–26. 10.1038/s41587-019-0375-9.
- Want EJ, Masson P, Michopoulos F, Wilson ID, Theodoridis G, Plumb RS, et al. (2013). Global metabolic profiling of animal and human tissues via UPLC-MS. Nature Protocols, 8(1), 17–32. 10.1038/nprot.2012.135.
- Watrous J, Roach P, Alexandrov T, Heath BS, Yang JY, Kersten RD, et al. (2012). Mass spectral molecular networking of living microbial colonies. Proceedings of the National Academy of Sciences of the United States of America, 109(26), E1743–E1752. 10.1073/pnas.1203689109.
- Wozniak JM, Mills RH, Olson J, Caldera JR, Sepich-Poore GD, Carrillo-Terrazas M, et al. (2020). Mortality risk profiling of Staphylococcus aureus bacteremia by multi-omic serum analysis reveals early predictive and pathogenic signatures. Cell, 182(5), 1311–1327.e1314. 10.1016/j.cell.2020.07.040.
