Abstract
Poor chemical annotation of high-resolution mass spectrometry data limits the applications of untargeted metabolomics datasets. Our new software, the Integrated Data Science Laboratory for Metabolomics and Exposomics – Composite Spectra Analysis (IDSL.CSA) R package, generates composite mass spectra libraries from MS1-only data, enabling the chemical annotation of LC/HRMS peaks regardless of the availability of MS2 fragmentation spectra. In validation tests, IDSL.CSA libraries and data-dependent acquisition (DDA) MS2 libraries yielded comparable annotation rates for commonly detected endogenous metabolites in human blood samples. IDSL.CSA can create and search composite spectra libraries from any untargeted metabolomics dataset generated using high-resolution mass spectrometry coupled to liquid or gas chromatography. The cross-applicability of these libraries across independent studies can improve overall annotation rates in metabolomics and exposomics projects, providing access to new biological insights that may be missed due to the lack of MS2 fragmentation data. The IDSL.CSA package is available in the R CRAN repository (https://cran.r-project.org/package=IDSL.CSA). Detailed documentation and tutorials are provided at https://github.com/idslme/IDSL.CSA.
Keywords: Metabolomics, Peak Annotation, High resolution mass spectrometry, Spectra Search
Graphical Abstract

Introduction
Metabolomics has great potential to uncover biological mechanisms and biomarkers that can be translated into innovative products to improve human health, food security and the environment. One major approach in metabolomics is the untargeted profiling of the small-molecule chemical space in a biospecimen to create large databases of chemical measurements across different phenotypic and genotypic backgrounds1, 2. High-resolution mass spectrometry coupled with liquid chromatography (LC/HRMS) is the most commonly used technique for untargeted metabolomics assays. However, not all peaks in an LC/HRMS dataset are annotated with chemical information, mainly for two reasons: 1) MS2 fragmentation data are not available for every detected LC/HRMS peak, and 2) available MS2 libraries poorly cover the expected chemical space. As a result, a large volume of untargeted metabolomics data remains underutilized, leading to missed discovery opportunities.
A single chemical compound can generate several ion species through reactions in the electrospray ionization (ESI) source, including fragmentation and adduct formation; collectively, these ions make up the MS1 data. Only a subset of ions (precursors) is selected and fragmented further to generate MS2 spectra, which can be searched against data-dependent acquisition (DDA) mass spectral libraries such as NIST 2020 for chemical annotation3, 4, and several advances have been made in annotating and interpreting DDA MS2 data5–7. Existing approaches8–11 for MS1 data annotation are limited to grouping correlated ions in order to flag LC/HRMS peaks as potential isotopologues, adducts or in-source fragments, or to assigning molecular formulas using isotope patterns12. Promisingly, several of these approaches, including RAMClust8, eISA11, CorrDec13, IIMN10, MS-FLO14 and MetaboAnnotatoR15, also suggest that co-eluting ions at the MS1 level can be exported as a spectrum that may match an entry in a DDA MS/MS library, enabling the chemical annotation of untargeted metabolomics data using only MS1 data. Consistent ion annotation for common metabolites across multiple studies suggests that related ESI signals for a compound are trackable and can be used to annotate LC/HRMS peaks12, 16–19. There is a need to advance computational resources for annotating compounds using these grouped peaks in untargeted LC/HRMS datasets. Converting correlated MS1 ions into reusable mass spectra libraries could help annotate peaks in independent studies that may or may not have MS2 data.
Here, we propose to create composite spectra libraries of correlating LC/HRMS peaks in the MS1 data from untargeted metabolomics assays and to use these libraries for annotating peaks by mass spectral library searches. For that, we have developed a new R package IDSL.CSA (https://cran.r-project.org/package=IDSL.CSA) for creating CSA libraries and a companion R package IDSL.FSA (https://cran.r-project.org/package=IDSL.FSA) for mass spectral similarity searches.
Results:
First, we analyzed the elution profiles of authentic reference compounds in LC/HRMS data. The IDSL.IPA software19 detected 255 chromatographic peaks in a test file (pool_P1_A1-A12.mzML; https://zenodo.org/record/7530009). Of these, 153 (60%) had at least one neighboring peak within a retention time window of 0.01 minutes. Many of these co-occurring ions may originate from a single compound eluting from the LC column. For example, a cluster of m/z 209.092, 192.065 and 146.060 ions related to the kynurenine standard followed an almost identical elution profile at a retention time of 4.18 minutes (Figure S.1). To test automatically whether such neighboring peaks are related to a single compound, we developed a new spectral deconvolution algorithm that groups them by elution profile similarity. Extracted ion chromatograms (EICs) for individual peaks were obtained from the raw data file and smoothed using local polynomial regression fitting (LOESS) to provide a robust estimate of elution profile similarity. At a Pearson correlation coefficient threshold of 0.98 between the EICs, 76.9% of peaks were grouped into 42 EIC groups, or CSA clusters, for the test file (https://zenodo.org/record/7530028). Clusters were retained only if they contained at least two peaks from the IDSL.IPA peak list (https://zenodo.org/record/7530009) and the difference between their minimum and maximum m/z values was greater than eight. These two criteria ensured that the CSA clusters contained m/z values outside the isotopologue range, since isotopologues trivially share highly similar elution profiles. These observations motivated us to explore the utility of such clusters for annotating compounds in untargeted LC/HRMS datasets.
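The grouping idea above (smooth each EIC, correlate, then keep only clusters that pass the two filters) can be illustrated with a short Python sketch. This is not the IDSL.CSA implementation, which is written in R; it is a minimal illustration under stated assumptions: EICs are sampled on a shared retention-time grid, the smoother is a degree-1 LOESS with tricube weights and no robustness iterations, the cubic-spline step mentioned in the Methods is omitted, and the chromatograms are synthetic. The function names and the `frac` span are hypothetical; the 0.98 threshold and the two cluster filters come from the text.

```python
import math

def loess_smooth(x, y, frac=0.3):
    """Degree-1 LOESS: weighted local linear fit with tricube weights (no robustness iterations)."""
    n = len(x)
    k = max(2, math.ceil(frac * n))  # number of points in each local neighbourhood
    smoothed = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth = distance to the k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed

def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either input is flat."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    sa = math.sqrt(sum((ai - ma) ** 2 for ai in a))
    sb = math.sqrt(sum((bi - mb) ** 2 for bi in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0

def passes_cluster_filters(mz_values, min_peaks=2, min_mz_span=8.0):
    """Keep clusters with >= min_peaks peaks whose m/z range exceeds min_mz_span."""
    return len(mz_values) >= min_peaks and max(mz_values) - min(mz_values) > min_mz_span

# Synthetic EICs on a shared retention-time grid (minutes); illustrative, not measured data.
rt = [4.00 + 0.01 * i for i in range(37)]

def gauss(center, sigma=0.03):
    return [math.exp(-((t - center) ** 2) / (2 * sigma ** 2)) for t in rt]

jitter = [(-1) ** i for i in range(len(rt))]  # deterministic scan-to-scan jaggedness
eic_a = [1e5 * g + 500 * j for g, j in zip(gauss(4.18), jitter)]
eic_b = [4e4 * g - 300 * j for g, j in zip(gauss(4.18), jitter)]  # co-eluting ion
eic_c = [6e4 * g for g in gauss(4.25)]  # ion with a shifted apex
sa, sb, sc = (loess_smooth(rt, e) for e in (eic_a, eic_b, eic_c))
print(round(pearson(sa, sb), 3), round(pearson(sa, sc), 3))
```

Smoothing before correlating matters because scan-to-scan jaggedness lowers the correlation between two ions that truly co-elute, which could push a related pair below a strict 0.98 cutoff.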
We next checked whether a CSA cluster can represent a deconvoluted mass spectrum of a compound. For each CSA cluster, we exported all m/z values within the cluster, the retention time at the apex, the peak height at the apex of each ion's EIC (as the intensity values), the correlation statistics and additional metadata to a mass spectra file in the NIST MSP format (https://zenodo.org/record/7530023). The file contained 42 spectra, which we refer to as 'CSA spectra'. Searching these spectra against the NIST 2020 library yielded high-confidence annotations for 5 clusters (https://zenodo.org/record/7530063); spectral search parameters are described in the Methods section. The top hits for these clusters were purine, L-kynurenine, N-acetylhistidine, homoserine and N-acetylneuraminic acid (Figure S.2). These annotations were also confirmed by data-dependent acquisition (DDA) MS2 spectra (Figure S.3, https://zenodo.org/record/7530151). These results suggest that CSA spectra may be used to annotate compounds in LC/HRMS data with limited or no DDA data, which is consistent with previous reports such as RAMClust8 and MetaboAnnotatoR15.
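Exporting a cluster as an MSP record can be sketched as follows. This is a minimal illustration, not the IDSL.CSA exporter: `Name:` and `Num Peaks:` followed by "m/z intensity" lines are the standard core of the NIST MSP text format, but the `RetentionTime:` and `Comment:` field names, the helper names and the apex heights below are assumptions chosen for illustration.

```python
def cluster_to_msp(name, rt_apex, peaks, comment=""):
    """Format one CSA cluster as a text record in NIST MSP style.

    peaks: list of (mz, apex_height) tuples, one per ion EIC in the cluster.
    """
    lines = [f"Name: {name}", f"RetentionTime: {rt_apex:.3f}"]
    if comment:
        lines.append(f"Comment: {comment}")
    lines.append(f"Num Peaks: {len(peaks)}")
    # MSP peak lines are "m/z intensity", written here in ascending m/z order
    lines += [f"{mz:.5f} {height:.0f}" for mz, height in sorted(peaks)]
    return "\n".join(lines) + "\n"

def write_msp(path, records):
    """Write records to one .msp file, separated by a blank line."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(records))

# Hypothetical cluster; apex heights are illustrative, not measured values.
record = cluster_to_msp("CSA_0001", 4.18,
                        [(209.092, 1.0e5), (192.065, 6.0e4), (146.060, 2.0e4)],
                        comment="seed=209.092; min_r=0.98")
print(record)
```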
We then developed a workflow for automatically creating a CSA spectra library for authentic standards. A total of 359 unique chemical standards had confirmed reversed-phase retention times and precursor m/z values across 54 LC/HRMS data files acquired in ESI positive mode (https://zenodo.org/record/7530170). Of these standards, 214 (59.6%) were associated with a CSA spectrum by the workflow. Searching these CSA spectra against mass spectral databases showed that only up to 42 (19.6%) were similar to DDA spectra, indicating that most CSA spectra contained different in-source ESI adducts. The spectra library is provided at https://zenodo.org/record/7530184.
Next, we extended the CSA workflow to known annotations reported for LC/HRMS data of biological specimens. In a publicly available untargeted metabolomics study of human stool samples (ST000923), 177 annotated compounds were targeted in 600 LC/HRMS data files (https://zenodo.org/record/7530227) to create a CSA spectra library. Only MS1 data were available for this study. Up to 105 (59.3%) compounds had an associated CSA spectrum, with a median frequency of 204 (minimum 1, maximum 589), and a total of 25,351 CSA spectra were deconvoluted for the target compounds from the 600 samples. We then identified variants of CSA spectra that were detected across multiple LC/HRMS data files. Using the entropy similarity approach20 (see Methods), we found an average of 2.5 CSA spectra variants per compound. Searching these CSA spectra against mass spectral databases showed that only 46 (26%) were similar to DDA spectra. The CSA spectra library for the annotated compounds in this study is provided at https://zenodo.org/record/7530237.
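One way to collapse many per-file CSA spectra of a compound into a few variants is greedy grouping by entropy similarity, sketched below. This is an illustration under assumptions, not the IDSL.CSA procedure: peaks are paired simply by rounding m/z to two decimals (a simplification of tolerance-based matching), the 0.75 grouping threshold is borrowed from the default entropy cutoff in the Methods, and the spectra and function names are hypothetical.

```python
import math

def _entropy(ps):
    """Spectral entropy of probabilities that already sum to 1."""
    return -sum(p * math.log(p) for p in ps if p > 0)

def entropy_similarity(a, b, decimals=2):
    """Unweighted entropy similarity (Li et al. 2021).
    Simplification: peaks are paired by rounding m/z to `decimals` places."""
    def normalize(spec):
        binned = {}
        for mz, inten in spec:
            key = round(mz, decimals)
            binned[key] = binned.get(key, 0.0) + inten
        total = sum(binned.values())
        return {k: v / total for k, v in binned.items()}
    pa, pb = normalize(a), normalize(b)
    merged = [0.5 * pa.get(k, 0.0) + 0.5 * pb.get(k, 0.0) for k in set(pa) | set(pb)]
    s_ab = _entropy(merged)
    return 1 - (2 * s_ab - _entropy(pa.values()) - _entropy(pb.values())) / math.log(4)

def group_variants(spectra, threshold=0.75):
    """Greedy grouping: a spectrum joins the first group whose representative
    (its first member) scores >= threshold; otherwise it starts a new variant."""
    reps, groups = [], []
    for idx, spec in enumerate(spectra):
        for g, rep in enumerate(reps):
            if entropy_similarity(spec, rep) >= threshold:
                groups[g].append(idx)
                break
        else:
            reps.append(spec)
            groups.append([idx])
    return groups

# Illustrative kynurenine-like spectra (m/z, intensity); intensities are made up.
v1 = [(209.0921, 100), (192.0655, 60), (146.0600, 20)]
v1b = [(209.0925, 95), (192.0660, 62), (146.0610, 18)]
v2 = [(231.0740, 100), (209.0921, 30)]  # sodium-adduct-dominated variant
print(group_variants([v1, v1b, v2]))
```

Greedy grouping is order-dependent; it is shown only because it is the simplest way to make the "variants per compound" count concrete.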
We then scaled up the CSA deconvolution algorithm and applied it to the ST001000 study (n = 200) to detect all possible CSA spectra in its LC/HRMS data files. A total of 139,018 CSA spectra were deconvoluted for this study, with a median of 631 per file, and 16,659 unique CSA spectra had a detection frequency of at least 5% (https://zenodo.org/record/7530245).
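The 5% detection-frequency filter amounts to counting, for each CSA variant, the fraction of files in which it occurs. The sketch below assumes variant IDs have already been assigned upstream (for example by similarity-based grouping); the function name and the toy data are hypothetical.

```python
from collections import Counter

def frequent_variants(per_file_ids, min_fraction=0.05):
    """Return {variant_id: detection fraction} for variants seen in >= min_fraction of files.

    per_file_ids: one iterable of CSA variant IDs per LC/HRMS file.
    """
    n_files = len(per_file_ids)
    counts = Counter(v for ids in per_file_ids for v in set(ids))  # count each file once
    return {v: c / n_files for v, c in counts.items() if c / n_files >= min_fraction}

# 40 hypothetical files: variant "A" in 2 files (5%), variant "B" in 1 file (2.5%)
files = [{"A", "B"}, {"A"}] + [set() for _ in range(38)]
print(frequent_variants(files))
```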
To annotate the deconvoluted CSA spectra, we first searched them against MS/MS databases. For the ST001000 study, 802 (4.8%) CSA spectra had a high-confidence spectral similarity match (https://zenodo.org/record/7530263). Because a compound can have multiple CSA variants across a study, unique base peaks by chemical name and InChIKey were selected to subset the study's aligned peak table. A total of 377 unique peaks were selected and annotated with 321 unique InChIKeys and 343 unique chemical names (https://zenodo.org/record/7530275). This subset of the aligned table is the data matrix of compounds annotated by the CSA workflow.
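Selecting one aligned-table peak per InChIKey and subsetting the table can be sketched as below. The selection rule (keep the highest-scoring hit) is an assumption made for illustration; the text only states that unique base peaks were selected by chemical name and InChIKey. The InChIKeys, scores and helper names are placeholders.

```python
def unique_annotations(hits):
    """Keep one aligned-table peak per InChIKey, choosing the highest-scoring hit.

    hits: dicts with 'peak_id', 'inchikey', 'name' and 'score' (e.g. entropy similarity).
    """
    best = {}
    for hit in hits:
        key = hit["inchikey"]
        if key not in best or hit["score"] > best[key]["score"]:
            best[key] = hit
    return sorted(best.values(), key=lambda h: h["peak_id"])

def subset_aligned_table(aligned_table, selected_hits):
    """aligned_table: {peak_id: row}; returns only the rows for the selected peaks."""
    return {h["peak_id"]: aligned_table[h["peak_id"]] for h in selected_hits}

# Placeholder InChIKeys and scores, for illustration only.
hits = [
    {"peak_id": 12, "inchikey": "KEY-A", "name": "compound A", "score": 0.81},
    {"peak_id": 40, "inchikey": "KEY-A", "name": "compound A", "score": 0.93},
    {"peak_id": 7, "inchikey": "KEY-B", "name": "compound B", "score": 0.77},
]
table = {7: [1.0, 2.0], 12: [3.0, 4.0], 40: [5.0, 6.0]}
selected = unique_annotations(hits)
print([h["peak_id"] for h in selected], subset_aligned_table(table, selected))
```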
Next, we tested whether a CSA spectra library created for one study can enable compound annotation in a different study. We searched the unique CSA spectra variants from the ST001000 study against the CSA library generated earlier for the ST000923 study. The spectral search results (https://zenodo.org/record/7530302) suggested that 34 additional compounds (InChIKeys) not covered by the DDA library search could be annotated in the ST001000 study (https://zenodo.org/record/7530310).
To verify the accuracy of annotations obtained solely from CSA spectral searches, we matched their retention times against the published data dictionaries for the ST001000 and ST000923 studies (see Methods). Data for both studies were collected by the same laboratory under identical analytical conditions, but ST000923 had 174 annotated compounds (InChIKeys), 89 of which had not previously been annotated in the ST001000 study. Of the 488 annotations (InChIKeys) from CSA spectra, 60 were confirmed by matching their retention times in the published data dictionaries for the ST000923 and ST001000 studies in the Metabolomics Workbench repository. Only 26 (26.4%) of the hits were probable false positives or in-source fragments. The IDSL.CSA workflow suggested 301 new annotations (unique InChIKeys) for the ST001000 study (https://zenodo.org/record/7530359).
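Retention-time confirmation against a data dictionary reduces to a tolerance check per InChIKey, as in this minimal sketch. The 0.1-minute tolerance, the data layout and the placeholder keys are assumptions; the text does not state the tolerance that was used.

```python
def confirm_by_retention_time(csa_hits, data_dictionary, rt_tol=0.1):
    """Return InChIKeys whose CSA retention time lies within rt_tol (min) of a dictionary entry.

    csa_hits: iterable of (inchikey, rt_minutes); data_dictionary: {inchikey: [reported RTs]}.
    """
    confirmed = set()
    for key, rt in csa_hits:
        if any(abs(rt - ref) <= rt_tol for ref in data_dictionary.get(key, [])):
            confirmed.add(key)
    return confirmed

# Placeholder keys and retention times, for illustration only.
hits = [("KEY-A", 4.18), ("KEY-B", 6.00), ("KEY-C", 2.00)]
dictionary = {"KEY-A": [4.21], "KEY-B": [6.50]}
print(confirm_by_retention_time(hits, dictionary))
```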
To demonstrate the biological relevance of the compounds annotated for the ST001000 study, we performed a chemical set enrichment analysis using the ChemRICH software21 to identify chemical classes that differed significantly between individuals with Crohn's disease and healthy controls (https://zenodo.org/record/7530364). Several chemical classes, including cholic acids, amino acids, aminosalicylic acids, biogenic amines, hexosamines and vitamins, differed significantly between the two groups (https://zenodo.org/record/7530366). The ChemRICH results suggest that the altered gut microbiota ecology22 in Crohn's disease patients may also be connected to metabolic pathways involving these chemical classes.
Finally, we incorporated the workflows for creating CSA libraries and annotating them using mass spectral similarity searches into two R packages, 'IDSL.CSA' and 'IDSL.FSA', available from the CRAN repository (https://cran.r-project.org/package=IDSL.CSA and https://cran.r-project.org/package=IDSL.FSA). We also added workflows for processing data-dependent (DDA) and data-independent (DIA) acquisitions. The package includes a user-friendly parameter file in Microsoft Excel format that lets users run different workflows and ensures reproducible data processing. Additionally, we extended the CSA deconvolution to nominal-mass GC/MS data using the companion 'IDSL.NPA' R package (https://cran.r-project.org/package=IDSL.NPA). The parameter tables are extensive and cover commonly used settings as well as several new parameters to optimize spectral deconvolution and spectral searches (Tables S1–S4). Documentation, tutorials and code for the IDSL.CSA and IDSL.FSA R packages are available in the GitHub repositories at https://github.com/idslme/IDSL.CSA and https://github.com/idslme/IDSL.FSA, respectively.
Discussion:
We have developed an easy-to-use integrated workflow of the IDSL.CSA and IDSL.FSA R packages that can improve the chemical annotation rates of LC/HRMS peaks in untargeted metabolomics datasets. The integrated approach using the IDSL.IPA19, IDSL.CSA and IDSL.FSA R packages (Figure 1) contains steps for 1) creating CSA, DDA and DIA spectral libraries, 2) performing mass spectral similarity searches using spectral entropy20, cosine similarity and normalized Euclidean mass error12, 3) prefiltering library spectra for faster searches (Section S.1), 4) refining the aligned table of deconvolution results (Section S.2) and 5) ranking annotations using spectral search results from all samples within a study.
Figure 1.
Integrated workflow of the IDSL.IPA, IDSL.CSA and IDSL.FSA packages to deconvolute and annotate composite spectra.
We have developed a new workflow to create composite spectra libraries from LC/HRMS datasets. A CSA spectrum may be similar to the corresponding DDA spectrum of a compound but often includes additional ionization products (Figure S.4). As recommended by earlier reports12, 16–19, IDSL.CSA workflows use both file-level alignment of LC/HRMS peak shapes and the co-detection frequency across all samples within a study to capture different variants of CSA spectra for a compound. For example, two variants of CSA spectra for kynurenine can contain different ESI adducts, such as [M+Na]+ and [M-H2O+H]+ (Figure S.5). The composition of CSA variant spectra for a chemical depends on the instrument type, analytical method, sample matrix, mobile phase additives and other factors. These variants can increase the specificity of library searches. We argue that these CSA spectra should be catalogued in mass spectral libraries and in community-driven MS databases such as GNPS10. A CSA mass spectral library is a collection of unique composite spectra variants of different chemical compounds, which can be created from MS1-only data for annotated peaks and authentic standards. Creating CSA libraries requires raw LC/HRMS data for known annotations or authentic standards, which are readily available for over 2000 public metabolomics datasets in the EBI MetaboLights23, GNPS MassIVE24 and Metabolomics Workbench25 repositories.
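The adduct variants mentioned above can be made concrete by computing expected m/z values from a neutral monoisotopic mass. The sketch below uses standard monoisotopic atomic masses and the electron mass; the small adduct table is illustrative and is not the adduct list used by IDSL.CSA.

```python
# Monoisotopic atomic masses (Da) and the electron mass
ATOM = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956, "Na": 22.9897692809}
ELECTRON = 0.00054857990946

def mono_mass(formula):
    """formula: dict of element -> count, e.g. {"C": 10, "H": 12, "N": 2, "O": 3}."""
    return sum(ATOM[el] * n for el, n in formula.items())

PROTON = ATOM["H"] - ELECTRON
H2O = mono_mass({"H": 2, "O": 1})
NH3 = mono_mass({"N": 1, "H": 3})

def adduct_mz(m):
    """Expected m/z of a few singly charged positive-mode ions for neutral monoisotopic mass m."""
    return {
        "[M+H]+": m + PROTON,
        "[M+Na]+": m + ATOM["Na"] - ELECTRON,
        "[M-H2O+H]+": m - H2O + PROTON,
        "[M-NH3+H]+": m - NH3 + PROTON,
    }

kynurenine = mono_mass({"C": 10, "H": 12, "N": 2, "O": 3})  # C10H12N2O3
for ion, mz in adduct_mz(kynurenine).items():
    print(f"{ion:12s} {mz:.4f}")
```

For kynurenine, the computed [M+H]+ (about 209.0921) and [M-NH3+H]+ (about 192.0655) values are consistent with the m/z 209.092 and 192.065 ions in the kynurenine cluster described in the Results.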
The IDSL.CSA workflows incorporate several essential and novel features for processing untargeted metabolomics datasets. They encapsulate a full workflow, including peak detection19, alignment, DDA/DIA deconvolution, library generation, spectral search and annotation ranking, in a single R command that reads all parameters from a Microsoft Excel input file. The workflow applies a critical step of LOESS smoothing followed by cubic spline smoothing to minimize chromatogram jaggedness before computing correlations among ion intensities. Both CSA variants and consensus CSA spectra are created for each compound to enable cross-instrument searches. The workflow also interprets a CSA spectrum by identifying ion species of commonly observed ESI adducts. Optimized pre-filtering using precursor m/z, spectral markers and spectral entropy enables faster searching of large libraries with millions of spectra. A key strength of our approach is the identification of recurring CSA spectra across multiple samples, which helps generate high-confidence CSA libraries. The packages also standardize publicly available mass spectral data with inconsistent fields into a common storage format in R, making IDSL.CSA and IDSL.FSA fully compatible with existing public MS/MS libraries and the NIST MS/MS database. With these features, the IDSL.CSA workflow can streamline the generation of high-quality data matrices with improved chemical annotation rates from untargeted metabolomics datasets, and it is a useful addition to the growing pool of software for annotating MS1 data in untargeted metabolomics studies.
A major challenge in metabolomics is that two-thirds of the untargeted studies in the Metabolomics Workbench data repository (https://www.metabolomicsworkbench.org) have only unannotated peaks. IDSL.CSA workflows may help overcome this hurdle to some extent by annotating a substantial number of peaks in these studies using available reference DDA libraries and newly created CSA libraries. By exploiting MS1 data exhaustively, our workflow can reduce the underutilization of untargeted metabolomics datasets in studies of basic metabolic processes and in biomonitoring of environmental chemical exposure.
Material and methods:
Publicly available LC/HRMS test datasets:
Raw mass spectra data and known annotations for MSV000088661, ST000923 and ST001000 were obtained from the MassIVE UCSD (https://massive.ucsd.edu) and Metabolomics Workbench (https://www.metabolomicsworkbench.org) data repositories. Raw data were converted to the centroided mzML format using the MSConvert utility.
R packages:
The IDSL.CSA R package is available from the R CRAN repository (https://cran.r-project.org/package=IDSL.CSA). The IDSL.MXP R package19 (https://CRAN.R-project.org/package=IDSL.MXP) was used to read LC/HRMS files in the mzML, mzXML and netCDF data formats. The IDSL.IPA R package (version 2.6)19 (https://CRAN.R-project.org/package=IDSL.IPA) was used to generate chromatographic peak lists and aligned peak tables for each study. The IDSL.FSA R package (https://cran.r-project.org/package=IDSL.FSA) was used for mass spectral similarity searches.
Generation of mass spectral libraries:
The IDSL.CSA workflow can generate three types of mass spectral libraries: 1) composite spectra (CSA), 2) data-dependent acquisition (DDA) and 3) data-independent acquisition (DIA).
CSA: LC/HRMS peaks from the IDSL.IPA peak list for each sample were grouped by estimating the Pearson correlation among chromatographic peaks after LOESS smoothing followed by cubic spline interpolation. First, peaks were sorted by intensity, and the most abundant peak was selected as the seed peak for creating a CSA spectrum. Next, a retention time threshold was applied to narrow down candidate fragment peaks. Then, the Pearson correlation was estimated among the candidate peaks. Each peak was assigned to only a single CSA spectrum to avoid redundancy. For true positive annotations, the peak at the reported m/z value was used as the seed peak.
DDA: DDA spectra were extracted together with their precursor details. When more than one DDA spectrum was observed for an IDSL.IPA peak, three options were provided: 1) integrate all DDA spectra for the peak, 2) select the most abundant spectrum or 3) de-noise the spectra using correlation statistics.
DIA: DIA spectra were created using the same approach as for CSA, except that the fragment ions were obtained from the MSE (MS level = 2) data channels16. Spectra from all three methods were exported to the MSP format. Parameter files are provided at https://zenodo.org/record/7530387.
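The seed-peak procedure described for CSA spectra can be sketched as a greedy loop. This is a minimal Python illustration, not the IDSL.CSA code: it assumes the EICs have already been smoothed (the LOESS and spline steps are not shown), it correlates each candidate with the seed peak (one reading of "among the candidate peaks"), and it uses the 0.01-minute window and 0.98 threshold from the Results as illustrative defaults. The data layout, function names and toy EICs are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either input is flat."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    sa = math.sqrt(sum((ai - ma) ** 2 for ai in a))
    sb = math.sqrt(sum((bi - mb) ** 2 for bi in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0

def group_csa(peaks, eics, rt_window=0.01, min_r=0.98):
    """Greedy seed-based grouping of peaks into CSA spectra.

    peaks: dicts with 'id', 'mz', 'rt' (min) and 'height'.
    eics:  {peak id: smoothed EIC sampled on a grid shared by all peaks}.
    """
    ordered = sorted(peaks, key=lambda p: p["height"], reverse=True)  # most abundant first
    assigned, clusters = set(), []
    for seed in ordered:
        if seed["id"] in assigned:
            continue  # each peak belongs to at most one CSA spectrum
        members = [seed]
        assigned.add(seed["id"])
        for cand in ordered:
            if cand["id"] in assigned or abs(cand["rt"] - seed["rt"]) > rt_window:
                continue
            if pearson(eics[seed["id"]], eics[cand["id"]]) >= min_r:
                members.append(cand)
                assigned.add(cand["id"])
        clusters.append(members)
    return clusters

# Toy smoothed EICs: p1, p2 and p4 share one elution shape; p3 is shifted by one scan.
base, shifted = [0, 1, 5, 10, 5, 1, 0], [0, 0, 1, 5, 10, 5, 1]
peaks = [
    {"id": "p1", "mz": 209.092, "rt": 4.180, "height": 1e5},
    {"id": "p2", "mz": 192.065, "rt": 4.180, "height": 6e4},
    {"id": "p3", "mz": 300.100, "rt": 4.185, "height": 5e4},
    {"id": "p4", "mz": 146.060, "rt": 4.181, "height": 2e4},
]
eics = {"p1": [1e5 * v for v in base], "p2": [6e4 * v for v in base],
        "p3": [5e4 * v for v in shifted], "p4": [2e4 * v for v in base]}
print([[p["id"] for p in c] for c in group_csa(peaks, eics)])
```

Processing seeds in descending intensity means the most reliable elution profiles define clusters first, and the assigned set enforces the one-peak-one-spectrum rule stated above.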
Mass spectral similarity search:
Spectral similarity search parameters were provided to the IDSL.FSA R package in a Microsoft Excel file (https://zenodo.org/record/7530387). When mass spectral libraries for standards came from different sources and in different formats, such as mgf, they were converted to a standard MSP file and separated by ionization mode (+/−). To enable faster searches in R, MSP files for known annotations and authentic standards were converted into a fragmentation spectra database (FSDB) that stores the pre-calculated spectral entropy of each spectrum in the MSP file (https://zenodo.org/record/7530387). FSDBs for public databases, including the Global Natural Products Social Molecular Networking (GNPS) and MassBank of North America (MoNA) libraries, in positive and negative modes are provided at https://zenodo.org/record/7530397. Similar to Li et al.20, adjacent fragments within the instrument resolution were merged, followed by noise removal (default = 1%). A threshold above baseline (%) was used to determine characteristic spectral markers, and a minimum percentage of matched characteristic markers was then required, allowing partial matches in the presence of instrumental noise. Cosine and entropy similarity scores were computed for spectral matching. The cosine similarity was computed using only the library fragments, following equation (1).
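$$\text{Cosine similarity} = \frac{\sum_{i=1}^{N_P} I_i^{\text{lib}}\, I_i^{\text{exptl}}}{\sqrt{\sum_{i=1}^{N_P} \left(I_i^{\text{lib}}\right)^2}\;\sqrt{\sum_{i=1}^{N_P} \left(I_i^{\text{exptl}}\right)^2}} \tag{1}$$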
where I_i is the intensity of the i-th matched fragment and N_P is the number of matched fragment peaks between the two fragmentation spectra. Superscripts lib and exptl denote the library and experimental fragmentation spectra, respectively. Entropy similarity was calculated from spectral entropy (S) values as described by Li et al.20
$$S = -\sum_{p} I_p \ln I_p \tag{2}$$
where I_p is the intensity of peak p normalized to the sum of all intensities (ΣI_p = 1). A merged mass spectrum, lib:exptl, is generated by mixing the normalized lib and exptl spectra 1:1. The entropy similarity was then calculated using equation (3).
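$$\text{Entropy similarity} = 1 - \frac{2S_{\text{lib:exptl}} - S_{\text{lib}} - S_{\text{exptl}}}{\ln 4} \tag{3}$$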
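The cosine and entropy scores can be sketched in Python as follows. This is an illustration under stated assumptions, not the IDSL.FSA implementation: each library fragment is paired with the closest unused experimental peak within a hypothetical 0.01 Da tolerance; the cosine sums run over the N_P matched pairs only (so unmatched experimental peaks are ignored); and the optional `weighted=True` mode applies the intensity weighting published by Li et al. (w = 0.25 + 0.25·S when S < 3). Whether the weight transformation in equation (4) has exactly this form is an assumption.

```python
import math

def match_fragments(lib, exptl, tol=0.01):
    """Pair each library fragment with the closest unused experimental peak within tol (Da)."""
    pairs, used = [], set()
    for lmz, lint in lib:
        best = None
        for j, (emz, _) in enumerate(exptl):
            if j in used or abs(emz - lmz) > tol:
                continue
            if best is None or abs(emz - lmz) < abs(exptl[best][0] - lmz):
                best = j
        if best is not None:
            used.add(best)
            pairs.append((lint, exptl[best][1]))
    return pairs

def cosine_matched(lib, exptl, tol=0.01):
    """Cosine score over the N_P matched fragment pairs only; unmatched peaks are ignored."""
    pairs = match_fragments(lib, exptl, tol)
    if not pairs:
        return 0.0
    num = sum(a * b for a, b in pairs)
    den = math.sqrt(sum(a * a for a, _ in pairs)) * math.sqrt(sum(b * b for _, b in pairs))
    return num / den

def spectral_entropy(intensities):
    """S = -sum(p * ln p) over intensities normalized to sum 1."""
    total = sum(intensities)
    return -sum(i / total * math.log(i / total) for i in intensities if i > 0)

def li_weight(intensities):
    """Li et al. (2021) weighting: if S < 3, raise intensities to w = 0.25 + 0.25 * S."""
    s = spectral_entropy(intensities)
    if s >= 3:
        return list(intensities)
    w = 0.25 + 0.25 * s
    return [i ** w for i in intensities]

def entropy_similarity(lib, exptl, tol=0.01, weighted=False):
    """1 - (2*S_merged - S_lib - S_exptl) / ln 4, using a 1:1 merged spectrum."""
    def prep(spec):
        inten = [i for _, i in spec]
        if weighted:
            inten = li_weight(inten)
        total = sum(inten)  # re-normalize so intensities sum to 1
        return [(mz, i / total) for (mz, _), i in zip(spec, inten)]
    a, b = prep(lib), prep(exptl)
    merged = [[mz, 0.5 * p] for mz, p in a]  # library half of the 1:1 mixture
    combined = set()
    for mz, p in b:  # add the experimental half, pairing peaks within tol
        for k in range(len(a)):
            if k not in combined and abs(merged[k][0] - mz) <= tol:
                merged[k][1] += 0.5 * p
                combined.add(k)
                break
        else:
            merged.append([mz, 0.5 * p])
    s_m = spectral_entropy([p for _, p in merged])
    s_a = spectral_entropy([p for _, p in a])
    s_b = spectral_entropy([p for _, p in b])
    return 1 - (2 * s_m - s_a - s_b) / math.log(4)

lib = [(146.060, 20), (192.065, 60), (209.092, 100)]
exptl = [(146.061, 20), (192.066, 60), (209.093, 100), (300.000, 500)]
print(cosine_matched(lib, exptl), entropy_similarity(lib, exptl))
```

The example shows one practical difference between the two scores: the unmatched m/z 300 peak does not affect this matched-only cosine, whereas it lowers the entropy similarity because every peak contributes to the merged spectrum.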
An additional weight transformation was proposed to adjust spectral entropy using equation (4).
| (4) |
The transformed intensities I_p^new should also be re-normalized (ΣI_p^new = 1). An entropy similarity cutoff (default ≥ 0.75) may be used to filter out candidate hits. The normalized Euclidean mass error (NEME) was calculated using equation (5) to further assess the fragmentation spectra.
| (5) |
where M_i is the mass of the i-th fragmentation peak. A maximum NEME threshold may be used to exclude hits with larger mass errors.
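The preprocessing and prefiltering steps described at the start of this section (merging fragments within the instrument resolution, removing noise below 1% and requiring a minimum fraction of matched spectral markers) can be sketched as below. The 1% noise default comes from the text; the 0.01 Da tolerance, the 10% marker threshold, the 50% minimum fraction, the choice to look up library markers in the query spectrum, and all function names are assumptions for illustration.

```python
def merge_within_resolution(peaks, tol=0.01):
    """Merge adjacent fragments closer than tol (Da): intensity-weighted m/z, summed intensity."""
    merged = []
    for mz, inten in sorted(peaks):
        if merged and mz - merged[-1][0] <= tol:
            prev_mz, prev_int = merged[-1]
            total = prev_int + inten
            merged[-1] = ((prev_mz * prev_int + mz * inten) / total, total)
        else:
            merged.append((mz, inten))
    return merged

def remove_noise(peaks, rel=0.01):
    """Drop fragments below rel * base-peak intensity (default 1%)."""
    base = max(i for _, i in peaks)
    return [(mz, i) for mz, i in peaks if i >= rel * base]

def spectral_markers(peaks, rel=0.10):
    """Hypothetical marker rule: fragments at or above rel * base-peak intensity."""
    base = max(i for _, i in peaks)
    return [mz for mz, i in peaks if i >= rel * base]

def passes_marker_prefilter(library_peaks, query_peaks, tol=0.01, min_fraction=0.5):
    """Keep a library spectrum for full scoring only if enough of its markers occur in the query."""
    markers = spectral_markers(library_peaks)
    if not markers:
        return False
    hits = sum(1 for m in markers if any(abs(m - q) <= tol for q, _ in query_peaks))
    return hits / len(markers) >= min_fraction

query = remove_noise(merge_within_resolution([(100.000, 1000), (100.004, 500), (150.0, 5), (200.0, 300)]))
library = [(100.001, 80), (200.0, 100), (250.0, 5)]
print(query, passes_marker_prefilter(library, query))
```

A cheap marker check like this can discard most library spectra before the more expensive similarity scores are computed, which is the motivation for prefiltering large libraries.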
Once all candidate hits had been matched for an experimental fragmentation spectrum, they were sorted using the following equation.
| (6) |
A spectral search result table was generated for each sample in a study; an example table is provided at https://zenodo.org/record/7530263. When an aligned table generated by the IDSL.IPA R package was available, spectral search results were summarized by annotation frequency and overall median rank.
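Summarizing per-sample search results by annotation frequency and median rank can be sketched as below. The sketch does not reproduce the sorting score of equation (6); it assumes each sample's candidate list for an aligned peak is already sorted best-first. The data layout, function name and toy data are hypothetical.

```python
from collections import defaultdict
from statistics import median

def summarize_annotations(per_sample_hits):
    """Summarize ranked candidates across samples.

    per_sample_hits: one dict per sample, {aligned_peak_id: [candidate names, best first]}.
    Returns {(peak_id, candidate): {"frequency": fraction of samples, "median_rank": ...}}.
    """
    ranks = defaultdict(list)
    for hits in per_sample_hits:
        for peak_id, candidates in hits.items():
            for rank, name in enumerate(candidates, start=1):
                ranks[(peak_id, name)].append(rank)
    n = len(per_sample_hits)
    return {key: {"frequency": len(r) / n, "median_rank": median(r)} for key, r in ranks.items()}

samples = [
    {"P1": ["kynurenine", "candidate X"]},
    {"P1": ["kynurenine"]},
    {"P1": ["candidate X", "kynurenine"]},
    {},  # a sample in which the aligned peak had no hits
]
summary = summarize_annotations(samples)
print(summary[("P1", "kynurenine")], summary[("P1", "candidate X")])
```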
Parameter files:
IDSL.IPA and IDSL.CSA parameter files for the MSV000088661, ST000923 and ST001000 studies are provided at https://zenodo.org/record/7530409. These files cover 1) creating reference CSA spectra for authentic standards, 2) creating CSA spectra for true positive annotations in the ST001000 study, 3) creating CSA spectra in full deconvolution mode for ST001000 and 4) IDSL.IPA data processing settings. After the parameters were filled in, the Excel sheets were used directly as input to the workflow function in the IDSL.CSA R package.
Statistical analysis:
Student's t-tests and ChemRICH analysis were performed in R to identify chemicals and chemical sets that differed significantly between Crohn's disease patients and the healthy control group.
Supplementary Material
Funding:
The research is supported in part by NIH grants U2CES026561, R01ES032831, R01ES033688, U2CES026555, P30ES023515, T32HD049311, K12ES033594, U2CES030859, UL1TR004419 and UL1TR001433.
Footnotes
Conflict of interest: DKB has been a consultant for Brightseed Inc., California, USA.
Supporting information:
Supporting spreadsheet tables, data sources, workflow and computational method details and additional benchmarks.
References:
- (1). Cheng Y.; Schlosser P.; Hertel J.; Sekula P.; Oefner P. J.; Spiekerkoetter U.; Mielke J.; Freitag D. F.; Schmidts M.; Investigators G.; et al. Rare genetic variants affecting urine metabolite levels link population variation to inborn errors of metabolism. Nat Commun 2021, 12 (1), 964. DOI: 10.1038/s41467-020-20877-8
- (2). Hu J.; Yao J.; Deng S.; Balasubramanian R.; Jimenez M. C.; Li J.; Guo X.; Cruz D. E.; Gao Y.; Huang T.; et al. Differences in Metabolomic Profiles Between Black and White Women and Risk of Coronary Heart Disease: an Observational Study of Women From Four US Cohorts. Circ Res 2022, 131 (7), 601–615. DOI: 10.1161/CIRCRESAHA.121.320134
- (3). Yang X.; Neta P.; Stein S. E. Quality control for building libraries from electrospray ionization tandem mass spectra. Anal Chem 2014, 86 (13), 6393–6400. DOI: 10.1021/ac500711m
- (4). Yang X.; Neta P.; Stein S. E. Extending a Tandem Mass Spectral Library to Include MS(2) Spectra of Fragment Ions Produced In-Source and MS(n) Spectra. J Am Soc Mass Spectrom 2017, 28 (11), 2280–2287. DOI: 10.1007/s13361-017-1748-2
- (5). Domingo-Almenara X.; Montenegro-Burke J. R.; Benton H. P.; Siuzdak G. Annotation: A Computational Solution for Streamlining Metabolomics Analysis. Anal Chem 2018, 90 (1), 480–489. DOI: 10.1021/acs.analchem.7b03929
- (6). Duhrkop K.; Fleischauer M.; Ludwig M.; Aksenov A. A.; Melnik A. V.; Meusel M.; Dorrestein P. C.; Rousu J.; Bocker S. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat Methods 2019, 16 (4), 299–302. DOI: 10.1038/s41592-019-0344-8
- (7). Guijas C.; Montenegro-Burke J. R.; Domingo-Almenara X.; Palermo A.; Warth B.; Hermann G.; Koellensperger G.; Huan T.; Uritboonthai W.; Aisporna A. E.; et al. METLIN: A Technology Platform for Identifying Knowns and Unknowns. Anal Chem 2018, 90 (5), 3156–3164. DOI: 10.1021/acs.analchem.7b04424
- (8). Broeckling C. D.; Afsar F. A.; Neumann S.; Ben-Hur A.; Prenni J. E. RAMClust: a novel feature clustering method enables spectral-matching-based annotation for metabolomics data. Anal Chem 2014, 86 (14), 6812–6817. DOI: 10.1021/ac501530d
- (9). Kuhl C.; Tautenhahn R.; Bottcher C.; Larson T. R.; Neumann S. CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets. Anal Chem 2012, 84 (1), 283–289. DOI: 10.1021/ac202450g
- (10). Schmid R.; Petras D.; Nothias L. F.; Wang M.; Aron A. T.; Jagels A.; Tsugawa H.; Rainer J.; Garcia-Aloy M.; Duhrkop K.; et al. Ion identity molecular networking for mass spectrometry-based metabolomics in the GNPS environment. Nat Commun 2021, 12 (1), 3832. DOI: 10.1038/s41467-021-23953-9
- (11). Xue J.; Domingo-Almenara X.; Guijas C.; Palermo A.; Rinschen M. M.; Isbell J.; Benton H. P.; Siuzdak G. Enhanced in-Source Fragmentation Annotation Enables Novel Data Independent Acquisition and Autonomous METLIN Molecular Identification. Anal Chem 2020, 92 (8), 6051–6059. DOI: 10.1021/acs.analchem.0c00409
- (12). Baygi S. F.; Banerjee S. K.; Chakraborty P.; Kumar Y.; Barupal D. K. IDSL.UFA Assigns High-Confidence Molecular Formula Annotations for Untargeted LC/HRMS Data Sets in Metabolomics and Exposomics. Anal Chem 2022, 94 (39), 13315–13322. DOI: 10.1021/acs.analchem.2c00563
- (13). Tada I.; Chaleckis R.; Tsugawa H.; Meister I.; Zhang P.; Lazarinis N.; Dahlen B.; Wheelock C. E.; Arita M. Correlation-Based Deconvolution (CorrDec) To Generate High-Quality MS2 Spectra from Data-Independent Acquisition in Multisample Studies. Anal Chem 2020, 92 (16), 11310–11317. DOI: 10.1021/acs.analchem.0c01980
- (14). DeFelice B. C.; Mehta S. S.; Samra S.; Cajka T.; Wancewicz B.; Fahrmann J. F.; Fiehn O. Mass Spectral Feature List Optimizer (MS-FLO): A Tool To Minimize False Positive Peak Reports in Untargeted Liquid Chromatography-Mass Spectroscopy (LC-MS) Data Processing. Anal Chem 2017, 89 (6), 3250–3255. DOI: 10.1021/acs.analchem.6b04372
- (15). Graca G.; Cai Y.; Lau C. E.; Vorkas P. A.; Lewis M. R.; Want E. J.; Herrington D.; Ebbels T. M. D. Automated Annotation of Untargeted All-Ion Fragmentation LC-MS Metabolomics Data with MetaboAnnotatoR. Anal Chem 2022, 94 (8), 3446–3455. DOI: 10.1021/acs.analchem.1c03032
- (16). Baygi S. F.; Fernando S.; Hopke P. K.; Holsen T. M.; Crimmins B. S. Nontargeted Discovery of Novel Contaminants in the Great Lakes Region: A Comparison of Fish Fillets and Fish Consumers. Environ Sci Technol 2021, 55 (6), 3765–3774. DOI: 10.1021/acs.est.0c08507
- (17). Fakouri Baygi S.; Fernando S.; Hopke P. K.; Holsen T. M.; Crimmins B. S. Automated Isotopic Profile Deconvolution for High Resolution Mass Spectrometric Data (APGC-QToF) from Biological Matrices. Anal Chem 2019, 91 (24), 15509–15517. DOI: 10.1021/acs.analchem.9b03335
- (18). Fakouri Baygi S.; Fernando S.; Hopke P. K.; Holsen T. M.; Crimmins B. S. Decadal Differences in Emerging Halogenated Contaminant Profiles in Great Lakes Top Predator Fish. Environ Sci Technol 2020, 54 (22), 14352–14360. DOI: 10.1021/acs.est.0c03825
- (19). Fakouri Baygi S.; Kumar Y.; Barupal D. K. IDSL.IPA Characterizes the Organic Chemical Space in Untargeted LC/HRMS Data Sets. J Proteome Res 2022, 21 (6), 1485–1494. DOI: 10.1021/acs.jproteome.2c00120
- (20). Li Y.; Kind T.; Folz J.; Vaniya A.; Mehta S. S.; Fiehn O. Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification. Nat Methods 2021, 18 (12), 1524–1531. DOI: 10.1038/s41592-021-01331-z
- (21). Barupal D. K.; Fiehn O. Chemical Similarity Enrichment Analysis (ChemRICH) as alternative to biochemical pathway mapping for metabolomic datasets. Sci Rep 2017, 7 (1), 14567. DOI: 10.1038/s41598-017-15231-w
- (22). Franzosa E. A.; Sirota-Madi A.; Avila-Pacheco J.; Fornelos N.; Haiser H. J.; Reinker S.; Vatanen T.; Hall A. B.; Mallick H.; McIver L. J.; et al. Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nat Microbiol 2019, 4 (2), 293–305. DOI: 10.1038/s41564-018-0306-4
- (23). Haug K.; Cochrane K.; Nainala V. C.; Williams M.; Chang J.; Jayaseelan K. V.; O’Donovan C. MetaboLights: a resource evolving in response to the needs of its scientific community. Nucleic Acids Res 2020, 48 (D1), D440–D444. DOI: 10.1093/nar/gkz1019
- (24). Wang M.; Carver J. J.; Phelan V. V.; Sanchez L. M.; Garg N.; Peng Y.; Nguyen D. D.; Watrous J.; Kapono C. A.; Luzzatto-Knaan T.; et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat Biotechnol 2016, 34 (8), 828–837. DOI: 10.1038/nbt.3597
- (25). Sud M.; Fahy E.; Cotter D.; Azam K.; Vadivelu I.; Burant C.; Edison A.; Fiehn O.; Higashi R.; Nair K. S.; et al. Metabolomics Workbench: An international repository for metabolomics data and metadata, metabolite standards, protocols, tutorials and training, and analysis tools. Nucleic Acids Res 2016, 44 (D1), D463–470. DOI: 10.1093/nar/gkv1042

