Author manuscript; available in PMC: 2022 Sep 7.
Published in final edited form as: Anal Chem. 2021 Aug 26;93(35):12001–12010. doi: 10.1021/acs.analchem.1c02041

Expanding Urinary Metabolite Annotation through Integrated Mass Spectral Similarity Networking

Fausto Carnevale Neto †,*, Daniel Raftery †,‡,*
PMCID: PMC8530160  NIHMSID: NIHMS1745813  PMID: 34436864

Abstract

The urine metabolome constitutes a rich source of functional information reflecting physiological states that are influenced by distinct conditions and biological stresses, such as responses to drug treatments or disease manifestations. Although global LC-MS profiling provides the most comprehensive measurement of metabolites in complex biological samples, annotation remains a challenge, and computational approaches are necessary to translate the molecular composition into biological knowledge. Here, we investigated the use of tandem MS-based enhanced molecular networks (MolNetEnhancer) to improve the metabolite annotation of urine extracts. The samples (n=10) were analyzed by hydrophilic interaction chromatography (HILIC)-quadrupole time-of-flight mass spectrometry in both electrospray (ESI) ionization modes. Consistent with other common data preprocessing software, the use of Progenesis QI led to the annotation of up to 20 metabolites based on MS2 library searches showing high fragmentation score (cosine similarity ≥ 0.7), that is, ~2% of mass features containing MS2 spectra. Molecular networking (MN) based on library matching resulted in annotation of up to 62 urinary compounds. Using a combination of unsupervised substructure discovery (MS2LDA), the in silico tool network annotation propagation (NAP), and ClassyFire chemical ontology, embedded in a multi-layered MN by MolNetEnhancer, we were able to expand the chemical characterization to ~50% of the dataset. The integrative approach led to the annotation of 275 compounds at the Metabolomics Standards Initiative (MSI) confidence level 2, as well as 459 and 578 urinary metabolites (MSI level 3) in both negative and positive ESI modes, respectively. The exhaustive MS2-based annotation outperformed similar studies applied to larger cohorts while offering the discovery of metabolites not identified by the MS2 library search. 
This is the first work that effectively integrates orthogonal annotation methods and MS2-based fragmentation studies to improve metabolite annotation in urine samples.

Keywords: urine, molecular networking, LC-MS, metabolomics, tandem mass spectrometry, high-resolution mass spectrometry

INTRODUCTION

Global metabolite profiling aims to comprehensively describe the chemical composition of biological samples for the purposes of metabolic exploration, mechanistic understanding, and biomarker discovery.1–3 To achieve this objective, untargeted metabolomics relies on accurate and selective analytical methods that provide the broadest number of detected metabolites and their correct molecular assignment.4 Liquid chromatography (LC) coupled with mass spectrometry (MS) has been recognized as the most powerful general strategy to explore the chemical diversity of biological extracts, as it provides high selectivity and high sensitivity.4,5

Various bioinformatics approaches have been developed to improve large-scale metabolite annotation.6–8 Novel databases and shared MS2 libraries have assisted the detection of known metabolites through spectral matching.9,10 Freely available in silico fragmentation tools have expanded to capture a larger portion of the chemical space.11,12 Additionally, molecular and metabolic networks have facilitated metabolite annotation by grouping and propagating massive amounts of experimental data.13–16 Nevertheless, the application of such methods still demands several steps of data preparation and fastidious manual curation.

MolNetEnhancer is a workflow within the GNPS (Global Natural Products Social Molecular Networking) data-sharing web-based platform13 that integrates library matching, molecular substructure discovery, in silico fragmentation tools and chemical classification ontologies into the same molecular network (MN).17 By embedding experimental and predictive outputs into multi-informative MN layers, it can provide a more comprehensive metabolite assignment at various molecular “hierarchical levels”, that is, from wide chemical classes, to structurally diverse scaffolds and candidate structures, as well as differing minor functionalization.

Here, we investigated the performance of MolNetEnhancer to improve metabolite identification of urine samples through the establishment of a multi-layered mass spectral network built on independent annotation tools, as illustrated in Schemes 1 and 2. The urinary metabolome represents a potential source for clinically diagnostic biomarkers since it provides a fingerprint for each individual, containing information on disease states, as well as age, lifestyle, dietary intake, and disease history.18 MN was employed to investigate antihypertensive drugs excreted in urine samples.19 The study led to the concomitant identification of various endogenous urinary metabolites, but putative annotation was performed exclusively by the MS2 library search. Recently, all the metabolites detected by global LC-MS2 profiling were annotated by a combination of spectral matching, use of authentic standards, in silico prediction, and the NIST hybrid search, that is NIST direct peak matching with the neutral-loss scoring.20 Yet, it required data preparation involving a number of independent steps. Our goal is to examine the capabilities of MolNetEnhancer to enhance metabolite coverage and structural assignment while using urine, a biofluid commonly explored in metabolomics approaches, but sparingly investigated through the GNPS workflow. The methods studied here can speed up the exploration of the urinary metabolome and improve the ability to analyze other clinical samples in translational research.

Scheme 1.


MolNetEnhancer integrates complementary annotation tools into the same molecular network, including mass spectral molecular networking (GNPS), (1) unsupervised substructure discovery (MS2LDA), (2) in silico predictor Network Annotation Propagation (NAP), and the chemical classification ClassyFire (3).

Scheme 2. MolNetEnhancer workflow.


1) MS2LDA identifies recurrent fragmentation patterns across the entire dataset and converts them into motifs. The motifs represent a particular chemical substructure and can be annotated by comparison with the MotifDB. 2) NAP predicts candidate hits for every node that has not been annotated. It then recalculates the predictions using neighbor nodes previously annotated by the library search, if available. 3) All the annotated nodes are submitted to chemical classification via ClassyFire, which employs molecular descriptors, such as SMILES, to output chemical ontology terms. MolNetEnhancer calculates the most predominant chemical ontology for the entire cluster, which can be mapped through the network.

EXPERIMENTAL SECTION

Chemicals.

Acetonitrile (ACN), methanol, ammonium acetate, and acetic acid (all LC–MS grade) were obtained from Fisher Scientific (Pittsburgh, PA). DI water was obtained using an 18 MΩ Milli-Q system (EMD Millipore Corporation, Billerica, MA).

Samples and their preparation.

Urine samples were provided in de-identified form by Million Marker (Berkeley, CA) and were obtained with informed consent and prepared at Northwest Metabolomics Center as previously described.21 For details, see Supporting Information.

HILIC-ESI-QTOF/MS analysis.

Data acquisition was carried out using an Agilent 1200 SL LC system coupled to an Agilent 6520 Q-TOF mass spectrometer (Agilent Technologies, Santa Clara, CA) and equipped with an Agilent Jet Stream Technology electrospray ionization (ESI) source. Chromatographic separation was performed using a WATERS XBridge BEH Amide (15 cm × 2.1 mm, 2.5 μm) column. The solvent system consisted of mobile phase A: H2O:ACN (95:5, v/v), and mobile phase B: H2O:ACN (5:95, v/v), both containing 5 mM ammonium acetate and 0.1 % acetic acid. Gradient elution mode was as follows: 0–6.5 min, 94–78% B; 6.5–12.0 min, 78–39% B; 12.0–18.5 min, 39% B; 18.5–19.0 min, 39–94% B; 19.0–35.0 min, 94% B. The flow rate was 0.3 mL min−1, the column temperature was 35 °C, and the injection volume was 5 μL. QTOF/MS parameters can be found in the Supporting Information.
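The gradient program above can be expressed as a small lookup for the mobile phase composition at any time point. The following is a minimal sketch, assuming linear ramps between the stated endpoints; the names `GRADIENT` and `percent_b` are illustrative and not part of any instrument software.

```python
# Gradient endpoints transcribed from the text: (time in min, %B).
GRADIENT = [
    (0.0, 94.0),
    (6.5, 78.0),
    (12.0, 39.0),
    (18.5, 39.0),
    (19.0, 94.0),
    (35.0, 94.0),
]

def percent_b(t_min):
    """Return %B at t_min, assuming linear change between endpoints."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time outside the 0-35 min program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

# Composition halfway through the first ramp and during the 39% B hold.
print(percent_b(3.25), percent_b(15.0))
```

Under this assumption, the final 16 min at 94% B correspond to column re-equilibration before the next injection.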

Data processing and metabolite annotation.

Raw data were converted into mzXML format using ProteoWizard MSConvert freeware.22 We selected 32-bit encoding precision and zlib compression. Data processing and peak annotation were performed using two non-targeted pipelines. In the first workflow, the mzXML files were imported into Progenesis QI software v. 2.2.5826.42898 (Nonlinear Dynamics, Newcastle, UK). The software-based workflow involved data import, peak alignment, peak picking and peak deconvolution. Peak annotation from the full-scan (MS1) data was performed by searching metabolites from HMDB9,23 and the Urine Metabolome Database.18 MS2 data were annotated by matching to MoNA (MassBank of North America) MS2 libraries. MoNA aggregates publicly available mass spectra from the MetaboBASE, GNPS, HMDB, LipidBlast, ReSpect and MassBank databases in one repository.6 For details, see Supporting Information.

The mzXML files were also uploaded to the UCSD GNPS FTP-server (http://gnps.ucsd.edu) and investigated via the METABOLOMICS-SNETS13 workflow with the following parameters: parent mass tolerance 0.02 Da, ion tolerance 0.02 Da, removal of all MS/MS peaks within ±17 Da of the precursor m/z value, minimal pair cosine 0.7, network topK 10, maximum connected component size 100, minimum matched peaks 2, minimum cluster size 1 or 2. Metabolites were identified by searching the databases using the same setup as that for the input data. The molecular networks (MNs) were also subjected to MS2LDA24,25 (http://ms2lda.org/) and Network Annotation Propagation (NAP)26 workflows. MS2LDA identifies molecular substructures by extracting sets of co-occurring mass fragments and neutral-loss features, named Mass2Motifs, from the MS/MS data. NAP propagates re-ranked candidates into the spectral networks using in silico fragmentation tools and spectral library matching. The outputs from molecular networking, MS2LDA and NAP were merged using MolNetEnhancer.17 All the nodes annotated by library search or in silico prediction were submitted to chemical classification using ClassyFire hierarchical chemical ontology software.27 We also performed feature-based molecular networking (FBMN) using the feature quantification table (.CSV format) and the MS2 spectral summary (.MSP format) directly exported from Progenesis QI.28 The FBMN workflow was set with parameters similar to the classical MN configuration. Data were visualized via Cytoscape (ver. 3.2.1).29 To avoid misinterpretation of HPLC contaminants and/or noise, blank injections (mobile phase) were input to Spectral Networks as a distinct sample group. LC-MS2 data were deposited in the MassIVE Public GNPS data set (http://massive.ucsd.edu, MSV000087011 and MSV000087012). All networks can be accessed online (links provided in the Supporting Information).
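To make the networking parameters concrete, the sketch below applies the ±17 Da precursor-window filter, a 0.02 Da fragment tolerance, and the two edge criteria (cosine ≥ 0.7, at least 2 matched peaks) to toy spectra. It is a simplified greedy cosine and not the GNPS implementation, which applies its own intensity scaling and additional matching rules; the function names and the example peak lists are hypothetical.

```python
import math

def remove_precursor_window(peaks, precursor_mz, window=17.0):
    """Drop fragment peaks within +/- window Da of the precursor m/z."""
    return [(mz, i) for mz, i in peaks if abs(mz - precursor_mz) > window]

def cosine_score(peaks_a, peaks_b, tol=0.02):
    """Simplified cosine: pair peaks within tol Da, largest products first.

    Returns (score, number_of_matched_peaks).
    """
    norm_a = math.sqrt(sum(i * i for _, i in peaks_a))
    norm_b = math.sqrt(sum(i * i for _, i in peaks_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0
    candidates = [
        (ia * ib, a, b)
        for a, (mza, ia) in enumerate(peaks_a)
        for b, (mzb, ib) in enumerate(peaks_b)
        if abs(mza - mzb) <= tol
    ]
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    dot, matched = 0.0, 0
    for product, a, b in candidates:
        if a in used_a or b in used_b:
            continue  # each peak may be paired only once
        used_a.add(a)
        used_b.add(b)
        dot += product
        matched += 1
    return dot / (norm_a * norm_b), matched

def passes(score, matched, min_cosine=0.7, min_peaks=2):
    """Edge/annotation criteria quoted in the text."""
    return score >= min_cosine and matched >= min_peaks

# Hypothetical (m/z, intensity) peak lists for illustration only.
spectrum = [(85.0284, 100.0), (144.1019, 20.0), (232.1543, 50.0)]
filtered = remove_precursor_window(spectrum, precursor_mz=232.1543)
other = [(85.0285, 80.0), (99.0441, 40.0)]

score, matched = cosine_score(filtered, other)
print(round(score, 3), matched, passes(score, matched))
```

In this toy case the score exceeds 0.7 but only one peak is shared, so the pair is rejected, which illustrates why the minimum matched peaks setting matters alongside the cosine threshold.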

RESULTS AND DISCUSSION

To test the effectiveness of different processing methods on compound identification, we analyzed a set of 10 urine samples by HILIC-ESI-QTOF MS2. The data were acquired in both ESI modes, and we utilized the data-dependent acquisition (DDA) strategy, selecting the two most intense ions in each MS1 scan (top2) for collision induced dissociation (CID). During DDA, the instrument was configured to identify a set of priority peaks from the MS1 scan; subsequently, each ion was sequentially subjected to CID to generate MS2 spectra. All other peaks outside the selected precursor isolation window were discarded, meaning that the number of precursors that triggered MS2 fragmentations is sample dependent.30 The complex urine dataset was processed using the Waters Progenesis QI software.31
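The top2 selection rule can be illustrated with a short sketch: for each MS1 scan, only the two most intense ions are passed on for CID. This is a simplification that ignores instrument-specific settings such as intensity thresholds or exclusion lists, which are not described here; `select_precursors` and the example scan are hypothetical.

```python
def select_precursors(ms1_scan, top_n=2):
    """Return the top_n most intense (m/z, intensity) ions of one MS1 scan."""
    return sorted(ms1_scan, key=lambda peak: peak[1], reverse=True)[:top_n]

# Hypothetical MS1 scan as (m/z, intensity) pairs.
scan = [(114.066, 5.0e4), (180.066, 2.0e5), (232.154, 9.0e4), (259.094, 1.0e3)]
chosen = select_precursors(scan)
print(chosen)
```

Ions that never rank among the two most intense in any scan yield no MS2 spectrum, which is consistent with only a fraction of mass features carrying MS2 data.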

Progenesis QI metabolite annotation.

Progenesis QI data processing resulted in the extraction of approximately 3,300 and 3,400 mass features for each sample for (−)-ESI and (+)-ESI mode, respectively. Of these, only 15–20% contained MS2 data, as indicated in Table 1. The mass features that comprise full-scan MS1 data only were tentatively identified by searching similar m/z and isotopic distributions in the HMDB and Urine Metabolome DB. Mass features containing MS2 spectra were also putatively annotated by spectral matching with the MoNA database repository.6 An overlap of 55 mass features in (−)-ESI mode and 80 mass features in (+)-ESI mode were annotated by both full-scan MS1 and MS2, as shown in Figure 1A. After applying a minimum fragmentation score of 0.7, calculated with the GNPS implementation of cosine similarity,32 the MS2 annotation dropped to 10 metabolites in negative and 20 in positive ion mode (Table 1, Figure 1B). The poor metabolite annotation at high MS2 matching score can be partially related to the low intensity of precursor ions and the number of samples.

Table 1. Annotation via Progenesis QI using LC-MS2 data from ten human urine samples.

Progenesis QI                      (−)-ESI   (+)-ESI
Number of mass features            3301      3410
Mass features without MS2 spectra  2802      2767
Mass features with MS2 spectra     499       642
MoNA MS2 spectral matching         106       238
MS2 fragmentation score ≥ 0.7      10        20
Full-scan MS1 annotation           385       503

Figure 1. Annotation through Progenesis QI.


A) Overlapped annotation of mass features using full-scan MS1 (red) and MS2 library matching (black). B) Candidate hits according to the fragmentation score (cosine similarity method32).

Classical Molecular Networking metabolite annotation.

The dataset was also analyzed by the GNPS workflow.13 We generated one MN for each ionization polarity. The MS-Cluster algorithm33 was applied to collapse identical spectra into a single consensus MS2 spectrum (or node). Because each sample contains > 10,000 MS2 spectra, we first examined the minimum number of MS2 spectra required in a node to be included in the network. Using a minimum cluster size = 1, GNPS formed 37,656 and 30,474 nodes, in (−)-ESI and (+)-ESI modes, respectively. MS2 library search using a cosine scoring threshold of 0.7 resulted in the annotation of 77 nodes in both (−) and (+)-ESI after excluding repeating nodes with the same candidate hits. By removing all MS2 spectra that were encountered only once, i.e., the singleton nodes, GNPS generated 565 and 787 nodes, in (−)-ESI and (+)-ESI modes, respectively, and led to the annotation of 48 nodes in both ionization modes after removing repeating nodes, as displayed in Table 2. The use of singleton nodes allows for more comprehensive metabolite annotation; however, it significantly increases MN processing time and makes manual curation impractical. In our small dataset, ~75% of the MS2 spectra were from singletons; however, a larger number referred to repeating nodes with similar candidate hits, possibly formed by MS2 spectra from background (the MNs can be accessed through the links in the Supporting Information). Here, we selected a minimum cluster size n = 2 for a detailed analysis of the different annotation approaches.
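The effect of the minimum cluster size setting can be sketched as a simple count filter: each MS2 spectrum is assigned to a consensus cluster, and clusters with fewer spectra than the threshold are dropped. The clustering itself (MS-Cluster) is not reproduced here; the cluster labels below are hypothetical inputs.

```python
from collections import Counter

def filter_nodes(cluster_ids, min_cluster_size=2):
    """Keep clusters (nodes) supported by at least min_cluster_size spectra.

    cluster_ids: one cluster label per MS2 spectrum.
    Returns {cluster label: number of spectra} for the retained nodes.
    """
    sizes = Counter(cluster_ids)
    return {cid: n for cid, n in sizes.items() if n >= min_cluster_size}

# Hypothetical assignment of six MS2 spectra to three consensus clusters.
assignments = ["a", "a", "b", "c", "c", "c"]
with_singletons = filter_nodes(assignments, min_cluster_size=1)
without_singletons = filter_nodes(assignments, min_cluster_size=2)
print(len(with_singletons), len(without_singletons))
```

Raising the threshold from 1 to 2 removes only the singleton cluster "b" in this toy input, mirroring the reduction in node count described above.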

Table 2. Annotation via GNPS using LC-MS2 data from ten human urine samples.

MolNetEnhancer                     (−)-ESI   (+)-ESI
Number of MS2 spectra              13968     10180
Number of nodes                    565       787
Total IDs (≥ 0.7 match)            62        51
IDs ≥ 0.7 without repeating hits   48        48
MS2LDA motifs                      89        221
NAP in silico IDs                  135       397
ClassyFire annotation              274       424

One limitation of classical MN is that MS-Cluster does not take retention time into account, allowing isobaric species, including positional isomers and in-source fragments, to be grouped into the same consensus MS2 spectrum.34 GNPS has recently deployed an FBMN algorithm, which builds on feature detection and alignment tools, to enhance quantitative analyses and isomer distinction in MN.28 By creating an FBMN for each ionization polarity using the mass features pre-processed by Progenesis QI, we obtained 55 and 61 candidate hits in 3301 and 3410 nodes from (−)-ESI and (+)-ESI modes, respectively. The identification rate of ~1.5% of the nodes was similar to what we observed using Progenesis QI without applying the fragmentation cut-off of 0.7. Nevertheless, only 7 hits, all detected in (−)-ESI mode, were exclusively present in both FBMN and Progenesis QI lists of candidates (Figure S-1AB). The comparison among classical MN, FBMN and Progenesis QI showed that 2 (positive ESI) and 6 (negative ESI) candidate hits were similarly annotated by all the approaches. However, only 1 hit, in (+)-ESI mode, had a Progenesis QI MS2 fragmentation score > 0.7 (Figure S-1). Between classical MN and Progenesis QI workflows, we observed that only 9 candidate hits were similarly annotated in both methods in the (−)-ESI mode (4, if we use a cutoff of 0.7 on the Progenesis QI MS2 score). Likewise, 14 candidates were concomitantly identified in the (+)-ESI mode (9 hits using the 0.7 cutoff).

Regardless of the method applied in our dataset, the identification based exclusively on MS2 library searching led to the annotation of up to 10% of the nodes. We observed 2–3% annotation based on MS2 library search using fragmentation score (≥ 0.7), even after isotope filtering and adduct merging performed by the Progenesis pre-processing pipeline. Despite the annotation rates from MN and Progenesis being not completely comparable due to pre-processing steps, such results are in accordance with previous untargeted metabolomics studies, in which 2–5% of chemical compounds are annotated,35 with newer approaches reaching 20%,6 even with the rich structural information contained in the spectra.

MolNetEnhancer metabolite annotation.

In order to expand the metabolite annotation of the urine dataset, we employed a combination of mass spectral molecular networking (GNPS), unsupervised substructure discovery (MS2LDA) and in silico tools (NAP and ClassyFire) through the MolNetEnhancer workflow.17 We did not use the FBMN workflow since it is not yet fully compatible with GNPS annotation tools. The results of this integrative approach are summarized in Table 2 (for more details, see Supporting Information). The MNs of each ESI mode are colored according to Mass2Motifs or chemical classification (Figures S-2 to S-7). From the 565 nodes formed in (−)-ESI mode, 90 nodes (~16%) were annotated according to 45 Mass2Motifs mapped onto the MN, apart from the detection of 62 hits obtained by library searches. Concurrently, up to 135 nodes (~24%) were identified using NAP. In the same way, out of 787 nodes observed in (+)-ESI mode and 51 hits through MS2 matching, 221 (~28%) were identified based on 73 Mass2Motifs, and up to 397 nodes (~50%) were annotated based on the in silico MN propagation system, as displayed in Figure 2A. MolNetEnhancer increased the recovery of structural chemical information, at least on a broad level, up to 8 times, when compared to GNPS library matching. This result comes from merging substructure molecular features (generated by MS2LDA) and in silico chemical prediction, alongside MS2 spectral matching.

Figure 2. Annotation through MolNetEnhancer.


A) Overlapped nodes putatively annotated by GNPS library search, MS2LDA substructure discovery, and NAP-Fusion in silico prediction. B) ClassyFire chemical ranking of nodes detected in positive and negative ESI modes, at different hierarchical degrees, including the “superclass” level.

MS2LDA annotation arises from latent Dirichlet allocation (LDA) decomposition of MS2 spectra into Mass2Motifs.24 Using GNPS, recurrent interactions, i.e., shared fragmentation patterns and/or neutral losses, are propagated through the network, adding edges to connect the nodes. A total of 1872 and 1361 edges (interactions) were linked to the nodes in (−)-ESI and (+)-ESI mode, respectively. Of those interactions, 55% in (−)-ESI mode and 60% in (+)-ESI mode represent motifs previously described using the MS2LDA platform. The structural features contained in the motifs can be mined in MotifDB, a database of curated and annotated Mass2Motifs that includes amino acid related species, nucleotide related species, and many other molecules of various biological origins, as well as MS/MS spectra from GNPS and MassBank libraries.36 The motifs support the annotation independently achieved by MS2 matching, and guide the manual metabolite identification of “known unknown” nodes.

As a complement to MS2LDA, we performed the in silico annotation using the NAP workflow.26 NAP performs in silico fragmentation prediction, using MetFrag, in parallel to the MS2 matching. The in silico candidates are then re-ranked based on the MN topology: through MetFusion, using results from a library search of node neighbors, or via Consensus, in which there are not enough hits and the ranking remains based on structural similarity of the in silico candidates. We obtained 135 MetFrag, 111 Consensus, and 23 Fusion in silico candidates in the MN using data acquired in (−)-ESI mode. Of these, fifteen hits from MN library searching were accurately predicted after NAP-Fusion re-ranked the calculations. In the MN based on the (+)-ESI dataset, we observed 397 MetFrag, 395 Consensus, and 62 Fusion candidates, with eight hits also annotated by this approach (Figure S-8A).

All the metabolites detected using GNPS were further submitted to chemical taxonomy structure determination using ClassyFire within the MolNetEnhancer environment.27 MolNetEnhancer uses the output from NAP in silico prediction, as well as other annotation tools, to calculate the most abundant chemical classes per cluster and provide a grasp on sub-structural composition across the molecular family.17 We mapped the MNs using “superclass” and “direct parent” classification levels, which are defined by the largest structural feature that describes the compound. ClassyFire annotated ~50% of the nodes that form both MNs (Figure 2B) according to approx. 50 different chemical classes. In (−)-ESI, ~23% of candidate hits obtained by GNPS library search were classified as organic acids and derivatives, 13% as organoheterocyclic compounds, 11% as organic oxygen compounds and 8% as lipids. Similarly, in (+)-ESI mode, 39% of the nodes annotated by MS2 spectral matching were suggested as organic acids, 10% as organic oxygen compounds, and 10% as lipids (Figure S-8B). Narrowing the analysis to the direct parent ion level, ClassyFire showed that the urine metabolome detected in (−)-ESI consisted mainly of fatty acids, histidine and derivatives, sugar alcohols and hippuric acid, while in (+)-ESI mode, it comprised mostly amino acids, acyl carnitines, oligosaccharides and nucleosides. All molecular classes provided by ClassyFire are illustrated in Figures S-9 and S-10.
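The idea of assigning each cluster its most abundant chemical class can be sketched as a majority vote over the classified nodes. The exact scoring used by MolNetEnhancer is not detailed here, so the following is a stand-in under that assumption; the node identifiers, cluster labels and class assignments are hypothetical.

```python
from collections import Counter, defaultdict

def consensus_class(node_cluster, node_class):
    """Majority-vote class per cluster.

    node_cluster: {node id: cluster id}
    node_class:   {node id: class label} for classified nodes only
    Returns {cluster id: (most frequent label, fraction of classified nodes)}.
    """
    per_cluster = defaultdict(Counter)
    for node, cluster in node_cluster.items():
        label = node_class.get(node)
        if label is not None:
            per_cluster[cluster][label] += 1
    result = {}
    for cluster, counts in per_cluster.items():
        label, n = counts.most_common(1)[0]
        result[cluster] = (label, n / sum(counts.values()))
    return result

# Hypothetical network: cluster "A" has three classified nodes, "B" has none.
node_cluster = {1: "A", 2: "A", 3: "A", 4: "B"}
node_class = {
    1: "Nucleosides, nucleotides, and analogues",
    2: "Nucleosides, nucleotides, and analogues",
    3: "Organic acids and derivatives",
}
summary = consensus_class(node_cluster, node_class)
print(summary)
```

Clusters without any classified node, such as "B" above, receive no consensus label in this sketch.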

Unknown metabolite annotation.

The chemical information provided by independent annotation tools assisted both the discovery of metabolites with no structural matches in the GNPS spectral libraries and the verification of putative identifications by additional structural evidence. Here, we give some examples of how the various annotation tools can be used together. In one cluster observed in (+)-ESI, GNPS putatively annotated a unique node by library matching as methyladenosine (Figure 3AB). Manual annotation of neighbor nodes is challenging given the unique hit. Correspondence between GNPS nodes and Progenesis QI mass features indicated that up to five compounds were also annotated by this conventional approach (Figure 3B). Progenesis suggested two compounds based on full-scan MS1 spectra, using high-resolution m/z and isotopic pattern, and three metabolites through MS2 library search. However, only one candidate hit matched GNPS annotation and the MS2 spectral matching score cut-off: O-methyluridine ([M+H]+ m/z 259.0941, 6.5 ppm, frag. matching score 73.1). MS2LDA substructure annotation extracted three Mass2Motifs, with the most prevalent (gnps_motif_44) being related to the elimination of a pentose substituent in glycosylated compounds, as illustrated in Figure 3A. The ClassyFire hierarchical system proposed “nucleosides, nucleotides and analogues” as the most frequent structural class within this cluster. Similarly, the nodes were classified as “purine nucleosides” according to the direct parent level (Figure 3B). Based on the cumulative information integrated by the MolNetEnhancer platform, we annotated 13 urinary nucleosides, metabolites of RNA turnover that represent potential biomarkers for early diagnosis of cancer.37

Figure 3.


Spectral cluster observed in ESI positive mode and formed by nucleoside derivatives. A) MN layout representing the Mass2Motifs interactions that group the nodes. B) Bar plot with the number of annotated nodes according to the different approaches. Additional MN layouts showing GNPS library search (blue node), nodes equivalent to mass features annotated by Progenesis QI using full-scan MS1 (red node) and MS2 (green node), and the chemical classification at the superclass and direct parent levels. ClassyFire also suggested “Purine Nucleosides” at the class level.

In another cluster detected in (+)-ESI, the GNPS library search resulted in two annotations out of 21 nodes, acetylcarnitine and methylbutyrylcarnitine, as shown in Figure 4AB. Progenesis QI misannotated the unique hit proposed by the full-scan MS1 data, while it correctly assigned one compound out of five candidates when using MS2 matching, i.e., butyrylcarnitine ([M+H]+ m/z 232.1539, −1.8 ppm, frag. matching score 53.0). Surprisingly, MS2LDA extracted four previously uncharacterized Mass2Motifs but no motif related to acylcarnitine derivatives, despite the widespread detection of product ion m/z 85, related to the diagnostic fragment C4H5O2+.38 The four fragmentation patterns observed in the form of distinct motifs may indicate acylcarnitine substitutions based on, for example, the diagnostic fragment m/z 99 (C5H7O2+) from methylation.39 ClassyFire supported the putative annotation by automatically classifying the cluster as being composed of “lipid and lipid-like molecules” at the superclass level, and “acylcarnitines” at the direct parent level (Figure 4B). With the complementary annotation tools integrated into MolNetEnhancer, we detected 21 urinary carnitines. The acylcarnitines are important urinary markers for metabolic dysfunctions that include fatty acid and branched-chain amino acid catabolism.39
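The mass error quoted for butyrylcarnitine can be checked with a short calculation of the theoretical [M+H]+ m/z and the ppm deviation. This sketch assumes the molecular formula C11H21NO4 for butyrylcarnitine and standard monoisotopic masses; small differences from the software-reported value can arise from rounding or from how the proton mass is handled.

```python
# Monoisotopic masses (u) and the proton mass used for [M+H]+.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.9949146196,
}
PROTON = 1.00727646688

def mz_protonated(formula_counts):
    """Theoretical m/z of the singly protonated molecule [M+H]+."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in formula_counts.items())
    return neutral + PROTON

def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

# Butyrylcarnitine, assumed formula C11H21NO4; observed m/z from the text.
butyrylcarnitine = {"C": 11, "H": 21, "N": 1, "O": 4}
theoretical_mz = mz_protonated(butyrylcarnitine)
error_ppm = ppm_error(232.1539, theoretical_mz)
print(round(theoretical_mz, 4), round(error_ppm, 1))
```

The result should land close to the reported −1.8 ppm, well within the tolerance expected for a QTOF instrument.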

Figure 4.


Spectral cluster observed in ESI positive mode and formed by carnitines. A) MN layout representing the Mass2Motifs interactions that group the nodes. B) Bar plot with the number of annotated nodes according to the different approaches. Additional MN layouts showing GNPS library search (blue node), nodes equivalent to mass features annotated by Progenesis QI using full-scan MS1 (red node) and MS2 (green node), and the chemical classification at the superclass and direct parent levels. ClassyFire suggested “Fatty Acids” and “Fatty Acid Esters” at class and subclass levels, respectively.

The investigation of another cluster led to the annotation of 30 di- and tri-peptides, while only one candidate hit was provided by GNPS MS2 spectral matching, and three correct assignments through Progenesis QI (Figure S-11). Likewise, one small cluster observed in (−)-ESI mode resulted in the detection of hippuric acid derivatives as well as one contaminant (Figure S-12). Detailed discussion on the annotation procedure can be seen in the Supporting Information.

MolNetEnhancer vs Progenesis QI.

Analysis using Progenesis QI led to the initial annotation of ~14% of the mass features containing full-scan MS1 data in (−)-ESI mode, and ~18% in (+)-ESI mode. It also resulted in the identification of ~21% of the mass features using the MS2 spectrum in (−)-ESI mode, and ~37% in (+)-ESI mode. Using a 0.7 cut-off threshold, however, these numbers dropped to ~2% and ~3%, respectively. GNPS led to the annotation of ~10% of the total number of nodes in (−)-ESI mode, and ~7% in (+)-ESI mode, also using the 0.7 cutoff. The use of MolNetEnhancer extended the chemical structural annotation by integrating substructure discovery, in silico annotation, as well as chemical ontology classification. Substructure annotation provided by MS2LDA corresponded to ~16% and ~28% of the total nodes in (−)-ESI and (+)-ESI modes, respectively, whereas in silico prediction through NAP represented up to ~24% and ~50% of the total nodes, respectively. The in silico classification of the urinary metabolome accurately evidenced the predominance of heterocycles, lipids, organic acids, and amino acids.
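The Progenesis QI rates quoted above follow directly from the counts in Table 1. The sketch below recomputes them; the dictionary layout and the `pct` helper are illustrative, and the values are transcribed from the table.

```python
# Counts transcribed from Table 1, per ESI mode.
TABLE1 = {
    "neg": {"ms1_only": 2802, "with_ms2": 499, "ms1_annot": 385,
            "ms2_annot": 106, "ms2_annot_0p7": 10},
    "pos": {"ms1_only": 2767, "with_ms2": 642, "ms1_annot": 503,
            "ms2_annot": 238, "ms2_annot_0p7": 20},
}

def pct(part, whole):
    """Percentage of whole represented by part."""
    return 100.0 * part / whole

rates = {}
for mode, t in TABLE1.items():
    rates[mode] = {
        "ms1": pct(t["ms1_annot"], t["ms1_only"]),         # full-scan MS1 annotation
        "ms2": pct(t["ms2_annot"], t["with_ms2"]),         # MoNA MS2 matching
        "ms2_0p7": pct(t["ms2_annot_0p7"], t["with_ms2"]), # score >= 0.7
    }

for mode, r in rates.items():
    print(mode, {name: round(value) for name, value in r.items()})
```

Each rate uses as denominator the subset of mass features to which that annotation route applies (features with MS1 data only, or features with MS2 spectra), which is why the MS1 and MS2 percentages are not directly additive.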

The MN integrative approach for exhaustive metabolite annotation provided considerable metabolite coverage relative to the number of samples, when compared with recently published workflows.20,40,41 While analyzing only 10 urine samples using DDA acquisition mode, and limited to the most prominent ions in the sample, we nevertheless putatively annotated 106 and 169 urinary compounds (MSI level 2), including endogenous and exogenous metabolites, in positive and negative ESI modes, respectively, and 459 (−ESI) and 578 (+ESI) compounds (MSI level 3) by sub-structural assignment and chemical classification. In a newly reported untargeted metabolomics platform for epidemiology investigations, ~540 metabolites were identified in > 800 urine samples; however, only 47 (−ESI) and 126 (+ESI) compounds were annotated in a similar way to our strategy, i.e., exclusively through MS2 matching; metabolites were also annotated based on an in-house compound library for retention time and high-resolution m/z.41 Another study annotated 440 urinary metabolites (MSI level 2) and 728 compounds (MSI level 3) from the analysis of 43 samples, by manually combining different metabolomics annotation tools, and with the application of a distinct MS2 matching score.20 Interestingly, a recent study annotated more than 1000 metabolites, but it required the creation of an in-house spectral library and over a thousand injections of urine samples.40

The comparison of various tandem MS-based untargeted metabolomics pipelines emphasizes how MS2 acquisition parameters determine spectral quality, chemical coverage and, consequently, metabolite annotation. DDA remains a popular choice since it affords high-quality tandem mass spectra with little or no data processing; however, the limited number of ions selected for fragmentation has favored DIA strategies, especially with the use of novel mass spectrometry approaches.42,43 We showed that the DDA approach offers comprehensive and informative spectral data for robust metabolite annotation. With the development and application of new DDA strategies and data processing, such as spectral deconvolution from Ratio Analysis of Mass Spectrometry (RAMSY),44,45 it is possible to cover a larger number of analytes as well as deploy high-quality MS2 spectra for confident annotation.

Recently, GNPS has implemented tools to improve mass spectral network connectivity and propagation and expand metabolite annotation. SIRIUS46 and CANOPUS47 modules offer computational solutions for molecular formula prediction, in silico structural assignment and chemical classification, while Ion Identity Molecular Networking (IIMN)48 merges different ion species of the same metabolite, allowing multiple fragmentation routes resulting from distinct gas-phase mechanisms (acid-base and/or redox reactions) to be propagated into the same cluster.49 The addition of such methodologies to the multi-informative structure-based mass spectral network provided by MolNetEnhancer should further improve metabolite profiling of urine samples for the purposes of metabolic exploration and biomarker discovery.

CONCLUSION

Global LC-MS profiling requires computational approaches to maximize metabolite coverage while providing accurate relative quantification. Data preprocessing strategies combine peak alignment, peak picking and peak deconvolution methods to ensure reliable feature extraction. Nonetheless, wide metabolite coverage of complex biological samples makes feature annotation difficult. Here, we investigated the capabilities of the MolNetEnhancer web-based platform to assist the structural assignment of metabolites detected in human urine. The enhanced MNs incorporate library searches, in silico annotation tools, and chemical classification in a multi-informative structure-based mass spectral network. This integrative approach expanded the chemical annotation of complex urine samples when compared to conventional data preprocessing and mass feature MS2 matching carried out using Progenesis QI. The layout-driven MN assisted data interpretation through the progressive propagation of annotations among metabolites within the same chemical class. In addition, independent in silico workflows merged orthogonal results into one MN, unraveling the chemical diversity of structurally related molecules. Despite the continuing challenges of metabolite identification, these combined approaches provide some of the most comprehensive putative annotations of metabolites in urine to date.

Supplementary Material

SI 01
SI 02

ACKNOWLEDGMENT

We acknowledge financial support from the NIH (P30CA015704, P30DK035816, and R01GM131491). We also thank Dr. Jenna Hua at Million Marker for providing the urine samples.

Footnotes

Supporting Information. Additional information is available free of charge via the Internet at http://pubs.acs.org.
  • Links to access the specific parameters of the MNs; supporting methods; supporting results and discussion; overview of the MNs mapped through complementary annotation tools; overlapped annotation according to the in silico tools; categorization of the metabolites annotated by GNPS MS2 library search; and ClassyFire chemical classification according to “direct parent” level
  • List of metabolites annotated via Progenesis QI and GNPS/MolNetEnhancer

The authors declare no competing financial interest.

REFERENCES

  • (1) Nagana Gowda G; Raftery D Biomarker Discovery and Translation in Metabolomics. Curr. Metabolomics 2013, 1 (3), 227–240.
  • (2) Theodoridis GA; Gika HG; Want EJ; Wilson ID Liquid Chromatography–Mass Spectrometry Based Global Metabolite Profiling: A Review. Anal. Chim. Acta 2012, 711, 7–16.
  • (3) Tugizimana F; Steenkamp PA; Piater LA; Dubery IA Mass Spectrometry in Untargeted Liquid Chromatography/Mass Spectrometry Metabolomics: Electrospray Ionisation Parameters and Global Coverage of the Metabolome. Rapid Commun. Mass Spectrom 2018, 32 (2), 121–132. 10.1002/rcm.8010.
  • (4) Aksenov AA; da Silva R; Knight R; Lopes NP; Dorrestein PC Global Chemical Analysis of Biology by Mass Spectrometry. Nat. Rev. Chem 2017, 1 (7), 0054.
  • (5) Ortmayr K; Causon TJ; Hann S; Koellensperger G Increasing Selectivity and Coverage in LC-MS Based Metabolome Analysis. TrAC Trends Anal. Chem 2016, 82, 358–366.
  • (6) Blaženović I; Kind T; Ji J; Fiehn O Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics. Metabolites 2018, 8 (2), 31.
  • (7) Smith CA; Want EJ; O’Maille G; Abagyan R; Siuzdak G XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification. Anal. Chem 2006, 78 (3), 779–787.
  • (8) Tsugawa H; Cajka T; Kind T; Ma Y; Higgins B; Ikeda K; Kanazawa M; VanderGheynst J; Fiehn O; Arita M MS-DIAL: Data-Independent MS/MS Deconvolution for Comprehensive Metabolome Analysis. Nat. Methods 2015, 12, 523.
  • (9) Wishart DS; Knox C; Guo AC; Eisner R; Young N; Gautam B; Hau DD; Psychogios N; Dong E; Bouatra S HMDB: A Knowledgebase for the Human Metabolome. Nucleic Acids Res. 2008, 37 (suppl_1), D603–D610.
  • (10) Stein S Mass Spectral Reference Libraries: An Ever-Expanding Resource for Chemical Identification. Anal. Chem 2012, 84 (17), 7274–7282.
  • (11) Dührkop K; Shen H; Meusel M; Rousu J; Böcker S Searching Molecular Structure Databases with Tandem Mass Spectra Using CSI: FingerID. Proc Natl Acad Sci USA 2015, 112 (41), 12580–12585.
  • (12) Hufsky F; Scheubert K; Böcker S Computational Mass Spectrometry for Small-Molecule Fragmentation. TrAC Trends Anal. Chem 2014, 53, 41–48.
  • (13) Wang M; Carver JJ; Phelan V; Sanchez LM; Garg N; Peng Y; Nguyen DD; Watrous J; Kapono CA; Luzzatto-Knaan T; et al. Sharing and Community Curation of Mass Spectrometry Data with Global Natural Products Social Molecular Networking. Nat. Biotechnol 2016, 34 (8), 828–837.
  • (14) Frank AM; Bandeira N; Shen Z; Tanner S; Briggs SP; Smith RD; Pevzner PA Clustering Millions of Tandem Mass Spectra. J. Proteome Res 2008, 7 (1), 113–122.
  • (15) Costa-Lotufo LV; Carnevale-Neto F; Trindade-Silva AE; Silva RR; Silva GGZ; Wilke DV; Pinto FCL; Sahm BDB; Jimenez PC; Mendonça JN; et al. Chemical Profiling of Two Congeneric Sea Mat Corals along the Brazilian Coast: Adaptive and Functional Patterns. Chem. Commun 2018, 54 (16), 1952–1955.
  • (16) de Oliveira G; Carnevale Neto F; Demarque D; de Sousa Pereira-Junior J; Sampaio Peixoto Filho R; de Melo S; da Silva Almeida J; Lopes J; Lopes N Dereplication of Flavonoid Glycoconjugates from Adenocalymma Imperatoris-Maximilianii by Untargeted Tandem Mass Spectrometry-Based Molecular Networking. Planta Med. 2016, 83 (07), 636–646.
  • (17) Ernst M; Kang K. Bin; Caraballo-Rodríguez AM; Nothias L-F; Wandy J; Chen C; Wang M; Rogers S; Medema MH; Dorrestein PC; et al. MolNetEnhancer: Enhanced Molecular Networks by Integrating Metabolome Mining and Annotation Tools. Metabolites 2019, 9 (7), 144.
  • (18) Bouatra S; Aziat F; Mandal R; Guo AC; Wilson MR; Knox C; Bjorndahl TC; Krishnamurthy R; Saleem F; Liu P; et al. The Human Urine Metabolome. PLoS One 2013, 8 (9), e73076.
  • (19) van der Hooft JJJ; Padmanabhan S; Burgess KEV; Barrett MP Urinary Antihypertensive Drug Metabolite Screening Using Molecular Networking Coupled to High-Resolution Mass Spectrometry Fragmentation. Metabolomics 2016, 12 (7), 125. 10.1007/s11306-016-1064-z.
  • (20) Blaženović I; Kind T; Sa MR; Ji J; Vaniya A; Wancewicz B; Roberts BS; Torbašinović H; Lee T; Mehta SS; et al. Structure Annotation of All Mass Spectra in Untargeted Metabolomics. Anal. Chem 2019, 91 (3), 2155–2162.
  • (21) Bearden DW; Sheen DA; Simón-Manso Y; Benner BA; Rocha WFC; Blonder N; Lippa KA; Beger RD; Schnackenberg LK; Sun J; et al. Metabolomics Test Materials for Quality Control: A Study of a Urine Materials Suite. Metabolites 2019, 9 (11), 270. 10.3390/metabo9110270.
  • (22) Kessner D; Chambers M; Burke R; Agus D; Mallick P ProteoWizard: Open Source Software for Rapid Proteomics Tools Development. Bioinformatics 2008, 24 (21), 2534–2536.
  • (23) Wishart DS; Tzur D; Knox C; Eisner R; Guo AC; Young N; Cheng D; Jewell K; Arndt D; Sawhney S HMDB: The Human Metabolome Database. Nucleic Acids Res. 2007, 35 (suppl_1), D521–D526.
  • (24) Wandy J; Zhu Y; van der Hooft JJJ; Daly R; Barrett MP; Rogers S MS2LDA.Org: Web-Based Topic Modelling for Substructure Discovery in Mass Spectrometry. Bioinformatics 2018, 34 (2), 317–318.
  • (25) Rogers S; Ong CW; Wandy J; Ernst M; Ridder L; van der Hooft JJJ Deciphering Complex Metabolite Mixtures by Unsupervised and Supervised Substructure Discovery and Semi-Automated Annotation from MS/MS Spectra. bioRxiv 2019, 491506. 10.1101/491506.
  • (26) da Silva RR; Wang M; Nothias L-F; van der Hooft JJJ; Caraballo-Rodríguez AM; Fox E; Balunas MJ; Klassen JL; Lopes NP; Dorrestein PC Propagating Annotations of Molecular Networks Using in Silico Fragmentation. PLOS Comput. Biol 2018, 14 (4), e1006089.
  • (27) Djoumbou Feunang Y; Eisner R; Knox C; Chepelev L; Hastings J; Owen G; Fahy E; Steinbeck C; Subramanian S; Bolton E; et al. ClassyFire: Automated Chemical Classification with a Comprehensive, Computable Taxonomy. J. Cheminform 2016, 8 (1), 61.
  • (28) Nothias L-F; Petras D; Schmid R; Dührkop K; Rainer J; Sarvepalli A; Protsyuk I; Ernst M; Tsugawa H; Fleischauer M; et al. Feature-Based Molecular Networking in the GNPS Analysis Environment. Nat. Methods 2020, 17 (9), 905–908.
  • (29) Shannon P; Markiel A; Ozier O; Baliga NS; Wang JT; Ramage D; Amin N; Schwikowski B; Ideker T Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Res 2003, 13 (11), 2498–2504.
  • (30) Kaufmann A; Butcher P; Maden K; Walker S; Widmer M Practical Application of in Silico Fragmentation Based Residue Screening with Ion Mobility High-Resolution Mass Spectrometry. Rapid Commun. Mass Spectrom 2017, 31 (13), 1147–1157.
  • (31) Zhang J; Yang W; Li S; Yao S; Qi P; Yang Z; Feng Z; Hou J; Cai L; Yang M; et al. An Intelligentized Strategy for Endogenous Small Molecules Characterization and Quality Evaluation of Earthworm from Two Geographic Origins by Ultra-High Performance HILIC/QTOF MSE and Progenesis QI. Anal. Bioanal. Chem 2016, 408 (14), 3881–3890.
  • (32) Horai H; Arita M; Nishioka T Comparison of ESI-MS Spectra in MassBank Database. In 2008 International Conference on BioMedical Engineering and Informatics; IEEE, 2008; Vol. 2, pp 853–857. 10.1109/BMEI.2008.339.
  • (33) Frank AM; Monroe ME; Shah AR; Carver JJ; Bandeira N; Moore RJ; Anderson GA; Smith RD; Pevzner PA Spectral Archives: Extending Spectral Libraries to Analyze Both Identified and Unidentified Spectra. Nat. Methods 2011, 8, 587.
  • (34) Pilon AC; Gu H; Raftery D; Bolzani VDS; Lopes NP; Castro-Gamboa I; Carnevale Neto F Mass Spectral Similarity Networking and Gas-Phase Fragmentation Reactions in the Structural Analysis of Flavonoid Glycoconjugates. Anal. Chem 2019, 91 (16), 10413–10423.
  • (35) da Silva RR; Dorrestein PC; Quinn RA Illuminating the Dark Matter in Metabolomics. Proc Natl Acad Sci USA 2015, 112 (41), 12549–12550.
  • (36) van der Hooft JJJ; Wandy J; Barrett MP; Burgess KEV; Rogers S Topic Modeling for Untargeted Substructure Exploration in Metabolomics. Proc Natl Acad Sci USA 2016, 113 (48), 13738–13743.
  • (37) Lu Z; Wang Q; Wang M; Fu S; Zhang Q; Zhang Z; Zhao H; Liu Y; Huang Z; Xie Z; et al. Using UHPLC Q-Trap/MS as a Complementary Technique to in-Depth Mine UPLC Q-TOF/MS Data for Identifying Modified Nucleosides in Urine. J. Chromatogr. B 2017, 1051, 108–117. 10.1016/j.jchromb.2017.03.002.
  • (38) van der Hooft JJJ; Ridder L; Barrett MP; Burgess KEV Enhanced Acylcarnitine Annotation in High-Resolution Mass Spectrometry Data: Fragmentation Analysis for the Classification and Annotation of Acylcarnitines. Front. Bioeng. Biotechnol 2015, 3, 26. 10.3389/fbioe.2015.00026.
  • (39) Millington DS; Stevens RD Acylcarnitines: Analysis in Plasma and Whole Blood Using Tandem Mass Spectrometry; Metz TO, Ed.; Humana Press: Totowa, NJ, 2011; pp 55–72. 10.1007/978-1-61737-985-7_3.
  • (40) Simón-Manso Y; Marupaka R; Yan X; Liang Y; Telu KH; Mirokhin Y; Stein SE Mass Spectrometry Fingerprints of Small-Molecule Metabolites in Biofluids: Building a Spectral Library of Recurrent Spectra for Urine Analysis. Anal. Chem 2019, 91 (18), 12021–12029. 10.1021/acs.analchem.9b02977.
  • (41) Meister I; Zhang P; Sinha A; Sköld CM; Wheelock ÅM; Izumi T; Chaleckis R; Wheelock CE High-Precision Automated Workflow for Urinary Untargeted Metabolomic Epidemiology. Anal. Chem 2021, 93 (12), 5248–5258. 10.1021/acs.analchem.1c00203.
  • (42) Davies V; Wandy J; Weidt S; van der Hooft JJJ; Miller A; Daly R; Rogers S Rapid Development of Improved Data-Dependent Acquisition Strategies. Anal. Chem 2021, 93 (14), 5676–5683. 10.1021/acs.analchem.0c03895.
  • (43) Guo J; Huan T Comparison of Full-Scan, Data-Dependent, and Data-Independent Acquisition Modes in Liquid Chromatography–Mass Spectrometry Based Untargeted Metabolomics. Anal. Chem 2020, 92 (12), 8072–8080. 10.1021/acs.analchem.9b05135.
  • (44) Gu H; Gowda GAN; Carnevale Neto F; Opp MR; Raftery D RAMSY: Ratio Analysis of Mass Spectrometry to Improve Compound Identification. Anal. Chem 2013, 85 (22), 10771–10779. 10.1021/ac4019268.
  • (45) Carnevale Neto F; Pilon AC; Selegato DM; Freire RT; Gu H; Raftery D; Lopes NP; Castro-Gamboa I Dereplication of Natural Products Using GC-TOF Mass Spectrometry: Improved Metabolite Identification by Spectral Deconvolution Ratio Analysis. Front. Mol. Biosci 2016, 3, 59. 10.3389/fmolb.2016.00059.
  • (46) Dührkop K; Fleischauer M; Ludwig M; Aksenov AA; Melnik AV; Meusel M; Dorrestein PC; Rousu J; Böcker S SIRIUS 4: A Rapid Tool for Turning Tandem Mass Spectra into Metabolite Structure Information. Nat. Methods 2019, 16 (4), 299–302. 10.1038/s41592-019-0344-8.
  • (47) Dührkop K; Nothias L-F; Fleischauer M; Reher R; Ludwig M; Hoffmann MA; Petras D; Gerwick WH; Rousu J; Dorrestein PC; et al. Systematic Classification of Unknown Metabolites Using High-Resolution Fragmentation Mass Spectra. Nat. Biotechnol 2021, 39 (4), 462–471. 10.1038/s41587-020-0740-8.
  • (48) Schmid R; Petras D; Nothias L-F; Wang M; Aron AT; Jagels A; Tsugawa H; Rainer J; Garcia-Aloy M; Dührkop K; et al. Ion Identity Molecular Networking for Mass Spectrometry-Based Metabolomics in the GNPS Environment. Nat. Commun 2021, 12 (1), 3832. 10.1038/s41467-021-23953-9.
  • (49) Demarque DP; Crotti AEM; Vessecchi R; Lopes JLC; Lopes NP Fragmentation Reactions Using Electrospray Ionization Mass Spectrometry: An Important Tool for the Structural Elucidation and Characterization of Synthetic and Natural Products. Nat Prod Rep 2016, 33, 432–455.
