Bioinformatics. 2022 Mar 31;38(10):2872–2879. doi: 10.1093/bioinformatics/btac197

Improving confidence in lipidomic annotations by incorporating empirical ion mobility regression analysis and chemical class prediction

Bailey S Rose, Jody C May, Jaqueline A Picache, Simona G Codreanu, Stacy D Sherrod, John A McLean
Editor: Jonathan Wren
PMCID: PMC9306740  PMID: 35561172

Abstract

Motivation

Mass spectrometry-based untargeted lipidomics aims to globally characterize the lipids and lipid-like molecules in biological systems. Ion mobility increases coverage and confidence by offering an additional dimension of separation and a highly reproducible metric for feature annotation, the collision cross-section (CCS).

Results

We present a data processing workflow to increase confidence in molecular class annotations based on CCS values. This approach uses class-specific regression models built from a standardized CCS repository (the Unified CCS Compendium) in a parallel scheme that combines a new annotation filtering approach with a machine learning class prediction strategy. In a proof-of-concept study using murine brain lipid extracts, 883 lipids were assigned higher confidence identifications using the filtering approach, which reduced the tentative candidate lists by over 50% on average. An additional 192 unannotated compounds were assigned a predicted chemical class.

Availability and implementation

All relevant source code is available at https://github.com/McLeanResearchGroup/CCS-filter.

Supplementary information

Supplementary data are available at Bioinformatics online.


1 Introduction

Lipids and lipid-like molecules play critical roles in a diverse array of biological processes, including membrane structure, signaling, and energy storage (Cullis et al., 1996). Although the implications of their dysregulation in a number of diseases make lipids of prime interest for study (Wymann and Schneiter, 2008), global analysis is often hindered by the wide range of chemical and physical properties arising from the structural diversity of this single superclass of molecules (Chatgilialoglu et al., 2014; Wenk, 2005; Xu et al., 2020). Mass spectrometry (MS)-based lipidomics has risen to meet this challenge, enabling high-sensitivity, high-throughput measurements that facilitate not only a broader understanding of lipid metabolism but also the discovery of significant molecular signatures for further study (Emília et al., 2015; Navas-Iglesias et al., 2009; Rustam and Reid, 2018).

Although great strides have been made toward comprehensive annotation of the lipidome, confident compound identification remains a bottleneck in untargeted analyses (Blaženović et al., 2018a; Schrimpe-Rutledge et al., 2016; Sindelar and Patti, 2020). Comprehensive lipid identification by accurate mass alone is unattainable due to the prevalence of isomeric species as well as the limited number of commercially available analytical standards (Rustam and Reid, 2018). Tandem MS (MS/MS) ion fragmentation is often used to aid in the identification of lipid species with support from both experimental and in silico libraries (Kind et al., 2013; Kochen et al., 2016; Koelmel et al., 2017, 2020; Sud et al., 2007). However, MS/MS approaches are challenged by structurally similar lipids and by chimeric fragmentation data resulting from isobaric signals, such as those arising from complex biological matrices. Liquid chromatography (LC) has also been extensively used in lipidomic analyses to improve peak capacity, resolve isomers and help mitigate ion suppression effects at the ionization source (Cajka and Fiehn, 2014; Peterson and Cummings, 2006). The use of retention time as a chemical descriptor for species identification has been supported by the expanding landscape of retention time libraries and prediction tools (Blaženović et al., 2018b; Ross et al., 2020), but these approaches are not easily applicable across different laboratories and platforms because various experimental parameters and matrix effects influence retention time variability (Beyaz et al., 2014; Taylor, 2005). Additionally, direct sample analysis techniques such as those used for MS imaging are incompatible with chromatography and thus cannot take advantage of retention time correlation.
Despite these limitations, the integration of complementary analytical techniques into multidimensional MS strategies is necessary to expand coverage and confidence in lipidomic annotations (Harris et al., 2019; Kyle et al., 2016).

An analytical separation technique that has had increasing success within the lipidomics community is ion mobility (IM; Blaženović et al., 2018b; Kyle et al., 2016; Leaptrot et al., 2019). This gas-phase separation is rapid (milliseconds) and structurally selective: it can resolve isomers/isobars while providing an additional metric for compound annotation, namely the collision cross-section (CCS; Kliman et al., 2011). CCS values provide direct structural information for lipids and have been demonstrated to be highly reproducible across laboratories (Stow et al., 2017). Further, the millisecond timescale of IM measurements falls between the timescales of the LC and time-of-flight MS dimensions (minutes and microseconds, respectively; May and McLean, 2015), so these analytical separations (LC-IM-MS) can be performed concurrently on a single sample injection. Many recent efforts have focused on the generation of CCS databases to aid in compound identification (Nichols et al., 2018; Paglia et al., 2015; Picache et al., 2019; Poland et al., 2020; Zheng et al., 2017; Zhou et al., 2020). Figure 1 compares the lipid coverage of a large repository of standardized experimental CCS values, the Unified CCS Compendium (Picache et al., 2019), with that of the LIPID MAPS Structural Database (LMSD; Sud et al., 2007). Similar to MS/MS libraries, CCS database initiatives require substantial resources and expertise to curate and are limited by available chemical standards. However, the curation of such empirical databases has been widely successful in compiling CCS values for thousands of lipid species across a variety of classes and subclasses. Additionally, these databases have served successfully as training sets for developing large-scale theoretical CCS libraries of known molecular species using various predictive approaches (Picache et al., 2020; Plante et al., 2019; Soper-Hopper et al., 2020; Zhou et al., 2017, 2020).
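To make the nesting of timescales concrete, the back-of-envelope sketch below counts how many IM separations fit within one chromatographic peak and how many TOF transients fit within one IM separation. The peak width, drift window and transient length are round-number assumptions chosen for illustration; they are not instrument parameters from this study.

```python
from typing import Tuple

# Illustrative (assumed) timescales in microseconds; none come from this study.
LC_PEAK_WIDTH_US = 6_000_000   # ~6 s chromatographic peak (assumption)
IM_SEPARATION_US = 60_000      # ~60 ms drift-time window (assumption)
TOF_TRANSIENT_US = 100         # ~100 µs time-of-flight transient (assumption)

def nested_sampling(lc_us: int, im_us: int, tof_us: int) -> Tuple[int, int]:
    """Return (IM separations per LC peak, TOF transients per IM separation)."""
    if not (lc_us > im_us > tof_us > 0):
        raise ValueError("expected LC > IM > TOF timescales")
    return lc_us // im_us, im_us // tof_us

frames_per_peak, transients_per_frame = nested_sampling(
    LC_PEAK_WIDTH_US, IM_SEPARATION_US, TOF_TRANSIENT_US)
print(f"{frames_per_peak} IM separations per LC peak")             # 100 IM separations per LC peak
print(f"{transients_per_frame} TOF transients per IM separation")  # 600 TOF transients per IM separation
```

Because each slower dimension contains many samples of the faster one, all three separations can be recorded from the same injection without one dimension outrunning another.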

Fig. 1.

Lipid coverage (log scale) of the LMSD (gray, right) and the Unified CCS Compendium (pink, left) (A color version of this figure appears in the online version of this article)

Here, we demonstrate an integrated LC-IM-MS untargeted lipidomics workflow supported by a sequential chemical class prediction informatics strategy to increase the confidence in lipidomic annotations of unknown features. In both informatic approaches, confidence is increased in accordance with the Metabolomics Standards Initiative (MSI) confidence level system, in which additional molecular descriptors can be leveraged to narrow the search space of candidate identifications (Schymanski et al., 2014; Schrimpe-Rutledge et al., 2016; Sumner et al., 2007). First, using characteristic class-specific mobility-mass correlations (available in the Unified CCS Compendium), candidate identifications of lipidomic features are automatically filtered on the basis of their measured CCS values, increasing the confidence in the resulting annotations. Second, features that do not yield candidate identifications from conventional database matching can be assigned tentative classes based on their CCS values using a previously developed machine learning framework (i.e. Supervised Inference of Feature Taxonomy from Ensemble Randomization [SIFTER]; Picache et al., 2020). Using this dual approach, increased lipidomic coverage can be achieved with higher (putative-level) confidence in the annotations and the resulting biological interpretations.

2 Experimental methods

2.1 Materials

Optima grade water, acetonitrile, isopropanol (IPA), methanol and ammonium acetate, as well as bicinchoninic acid assay reagents and albumin standards, were purchased from Fisher Scientific (Fair Lawn, NJ). Anhydrous methyl-tert-butyl ether, ammonium bicarbonate, heptadecanoic acid and nonadecanoic acid were purchased from Sigma Aldrich (St. Louis, MO). Lipid standards, including phosphatidylglycerol (PG, chicken egg) extract, phosphatidylinositol (PI, bovine liver) extract, glucosyl(β) sphingosine (d18:1), C17 ceramide (d18:1/17:0) and a mixture of heavy-labeled lipids (SPLASH LIPIDOMIX), were purchased from Avanti Polar Lipids (Alabaster, AL). All data in this article were acquired using a commercial drift tube IM-mass spectrometer (6560A IM-QTOF, Agilent Technologies, Santa Clara, CA) equipped with an electrospray source (Agilent Jet Stream; May et al., 2014) and an LC system (Agilent Infinity 1290). Detailed instrument parameters can be found in the Supplementary Material.

2.2 Preparation and analysis of standard lipid extracts

Purified TLC fractions of PG and PI (Avanti) were prepared to a final concentration of 10 µg/ml in IPA. The standard extracts were analyzed in both positive and negative ion modes via direct infusion IM-MS with a sample flow rate of 10 µl/min. IM arrival times for calibrant ions (ESI Low Concentration Tuning Mixture, Agilent) were used to calibrate single-field CCS values for lipid features. Where possible, these features were identified by exact mass measurements and the LIPID MAPS Structure Database (Sud et al., 2007). These data were then collated and submitted to the Unified CCS Compendium to support downstream identification of PG and PI lipid subclasses.

2.3 Preparation and analysis of murine brain tissue

Lipid extraction of murine brain tissue samples (Pfalzer et al., 2020) was performed as detailed in the Supplementary Material. The lipophilic extracts were then analyzed using the reverse-phase LC method shown in Supplementary Figure S1. Samples were analyzed in both positive and negative ion modes. For each polarity, full-scan MS data were acquired for all samples to validate the IM features, whereas IM-MS data were acquired in triplicate on pooled samples for CCS determination. A previously established single-field relationship derived from the fundamental IM equation was used to determine CCS values from the IM arrival time measurements of all detected features (Stow et al., 2017).
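The single-field approach is commonly written as t_A = β·γ·CCS + t_fix, where γ = (1/z)·sqrt(m_ion/(m_ion + m_gas)) and β and t_fix are instrument constants fitted from calibrant ions of known CCS. The minimal sketch below assumes that form and a nitrogen drift gas; the calibrant m/z, CCS values and "true" constants are hypothetical placeholders used to generate synthetic arrival times, not measured or reference data.

```python
import math

# Assumption: nitrogen drift gas; mass of N2 in Da.
N2_MASS_DA = 28.0061

def gamma(mz: float, z: int, gas_mass: float = N2_MASS_DA) -> float:
    """Mass/charge term of the single-field relationship: (1/z) * sqrt(m / (m + M_gas))."""
    ion_mass = mz * z
    return math.sqrt(ion_mass / (ion_mass + gas_mass)) / z

def fit_single_field(calibrants):
    """Fit t_A = beta * (gamma * CCS) + t_fix by ordinary least squares.

    calibrants: list of (mz, z, reference_ccs, arrival_time_ms) tuples.
    Returns (beta, t_fix).
    """
    xs = [gamma(mz, z) * ccs for mz, z, ccs, _ in calibrants]
    ys = [t_a for _, _, _, t_a in calibrants]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return beta, mean_y - beta * mean_x

def single_field_ccs(t_a: float, mz: float, z: int, beta: float, t_fix: float) -> float:
    """Invert the calibration for an unknown ion: CCS = (t_A - t_fix) / (beta * gamma)."""
    return (t_a - t_fix) / (beta * gamma(mz, z))

# Synthetic, hypothetical calibrants: arrival times are generated from assumed
# beta and t_fix values so the fit can be checked; they are not measured data.
TRUE_BETA, TRUE_TFIX = 0.14, 1.5
calibrants = [
    (mz, 1, ccs, TRUE_BETA * gamma(mz, 1) * ccs + TRUE_TFIX)
    for mz, ccs in [(322.0, 150.0), (622.0, 200.0), (922.0, 240.0), (1222.0, 280.0)]
]
beta, t_fix = fit_single_field(calibrants)
print(round(beta, 6), round(t_fix, 6))  # 0.14 1.5
```

With real data, the reference CCS values of the tuning-mix calibrants and their measured arrival times would replace the synthetic tuples.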

2.4 Data processing

Initial processing of LC-IM-MS data involved saturation correction and smoothing in both the retention time and drift time dimensions using the PNNL PreProcessor (v. 2.0; Bilbao et al., 2021). IM arrival times for calibrant ions were acquired at the beginning of experiments and applied offline to the individual data files to determine single-field CCS values using IM-MS Browser (B.10, Agilent). IM-MS Browser was then used to apply a preset inclusion region of mass-mobility space to each file, generating extracted files containing only data from the region of IM-MS space that lipids are known to occupy (Rose et al., 2021). This IM-MS prefiltering step helps to minimize artifactual and higher-order charge state signals that are uncharacteristic of lipids. Finally, 4D feature finding was performed in Mass Profiler (B.10, Agilent). Processing of LC-MS data, including retention time alignment, charge carrier deconvolution and molecular feature finding, was performed in Progenesis QI (v2.3, Non-linear Dynamics, Durham, NC). The resulting deisotoped/deconvoluted features represent discrete molecules and are thus referred to as molecular features. Tentative identifications were assigned to features by accurate mass matching at a 10 ppm mass tolerance against entries from a combination of data repositories, including the METLIN Metabolite and Chemical Entity Database, the Human Metabolome Database, the LIPID MAPS Structure Database and LipidBlast (Kind et al., 2013; Smith et al., 2005; Sud et al., 2007; Wishart et al., 2018). Here, annotation confidence levels (i.e. tentative identifications) are notated in accordance with the MSI scheme, in which higher confidence assignments can be derived from additional pieces of analytical information (Schrimpe-Rutledge et al., 2016; Schymanski et al., 2014; Sumner et al., 2007). Example data output files from Mass Profiler and Progenesis used in subsequent analyses can be found on the McLean Research Group GitHub.
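Accurate mass matching at a fixed ppm tolerance can be sketched as below. The 10 ppm tolerance comes from the text; the flat `(name, m/z)` database format and its entries are hypothetical stand-ins for the repositories searched in Progenesis QI. The example also shows why accurate mass alone cannot separate mass isomers: entries sharing a formula return identical errors.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million relative to the theoretical m/z."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def accurate_mass_candidates(observed_mz, database, tol_ppm=10.0):
    """Return (name, m/z) entries within +/- tol_ppm of the observed m/z.

    database: list of (name, theoretical_mz) pairs; a hypothetical flat format.
    """
    return [(name, mz) for name, mz in database
            if abs(ppm_error(observed_mz, mz)) <= tol_ppm]

# Hypothetical entries: two mass isomers share one theoretical m/z.
database = [
    ("isomer_A", 760.5851),
    ("isomer_B", 760.5851),
    ("near_isobar", 760.5900),
    ("unrelated", 761.5900),
]
hits = accurate_mass_candidates(760.5860, database)
print([name for name, _ in hits])  # ['isomer_A', 'isomer_B', 'near_isobar']
```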

2.5 Classification

All subsequent data processing was performed in the R statistical programming environment (v. 3.6.0) unless otherwise noted. Following tentative identification, all identified molecular features were assigned a hierarchical classification, including a kingdom, superclass, class and subclass, in accordance with the structure-based comprehensive ontology ChemOnt (Feldman et al., 2005). IUPAC International Chemical Identifier keys (InChIKeys) were assigned to each compound annotation using the Chemical Translation Service via the R package webchem (v0.4.0; Szöcs et al., 2015; Wohlgemuth et al., 2010). These InChIKeys were then used as input to the web application ClassyFire to assign each taxonomic classification (Djoumbou Feunang et al., 2016).

2.6 Collision cross-section filtering pipeline

CCS values obtained from LC-IM-MS processing were appended to the tentatively identified molecular features (output from Progenesis QI) using a mass tolerance of 7 ppm and a retention time tolerance of 0.5 min. Where multiple CCS values were extracted for one feature, those with the lowest relative standard deviation were prioritized to reduce the risk of assigning artifactual CCS values. Non-linear least squares regression models were generated for known classes and subclasses within the Unified CCS Compendium using previously developed R scripts (Picache et al., 2019). These regression models were used to filter the candidate identifications at two levels (Supplementary Fig. S2). The first filter was applied to all features whose candidate classes or subclasses had a representative regression model, assigning higher confidence to those annotations whose CCS fell within the 99% predictive interval associated with the class and subclass of that identification. The second filter was applied to all features that had passed the first filter and whose annotations came from multiple subclasses. This filter calculated the distance of the feature CCS from the mean of the regression model of each of its candidate classes or subclasses; the annotations from the class or subclass whose model fell closest to the feature CCS were assigned a higher confidence (Supplementary Fig. S3). All analysis code and R scripts can be found on the McLean Research Group GitHub.
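The CCS-appending step (7 ppm, 0.5 min, lowest-RSD priority) can be sketched as follows. This is an illustration under stated assumptions, not the group's R code: the `ImFeature` record, the `assign_ccs` helper and all numeric values are hypothetical, and the RSD is computed over the triplicate CCS measurements of each IM feature.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Optional

@dataclass
class ImFeature:
    """Hypothetical IM-MS feature with replicate CCS values (Å^2)."""
    mz: float
    rt_min: float
    ccs_replicates: List[float]

    @property
    def rsd(self) -> float:
        """Relative standard deviation (%) of the replicate CCS values."""
        return stdev(self.ccs_replicates) / mean(self.ccs_replicates) * 100

def assign_ccs(feature_mz: float, feature_rt: float, im_features: List[ImFeature],
               tol_ppm: float = 7.0, tol_rt: float = 0.5) -> Optional[float]:
    """Return the mean CCS of the best-matching IM feature, or None.

    A match needs |mass error| <= tol_ppm and |RT difference| <= tol_rt (min);
    among several matches, the one with the lowest CCS RSD is kept.
    """
    matches = [f for f in im_features
               if abs(f.mz - feature_mz) / feature_mz * 1e6 <= tol_ppm
               and abs(f.rt_min - feature_rt) <= tol_rt]
    if not matches:
        return None
    best = min(matches, key=lambda f: f.rsd)
    return mean(best.ccs_replicates)

im_features = [
    ImFeature(760.5860, 12.4, [285.1, 285.3, 285.2]),   # tight replicates
    ImFeature(760.5855, 12.1, [290.0, 281.0, 286.0]),   # noisier replicates
    ImFeature(760.6500, 12.3, [300.0, 300.5, 299.5]),   # outside 7 ppm
]
print(round(assign_ccs(760.5851, 12.3, im_features), 1))  # 285.2
```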

2.7 SIFTER chemical class prediction

Molecular features that received no candidate identifications from accurate mass database searching, and therefore could not enter the filtering pipeline, were subjected to chemical class prediction using the previously developed SIFTER algorithm. SIFTER uses a random forest machine learning approach to assign chemical class predictions based on m/z, CCS and mass defect (Δm), and has been described elsewhere (Picache et al., 2020).
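The three descriptors named for SIFTER can be assembled per feature as sketched below. The mass-defect definition here (m/z minus its floor) is an assumption that holds for hydrogen-rich, singly charged ions such as most lipids, whose mass defect is small and positive; the exact definition used by SIFTER is given in Picache et al. (2020) and may differ. The random forest itself is not reproduced.

```python
import math
from typing import Tuple

def mass_defect(mz: float) -> float:
    """Fractional mass above the nominal mass, taking nominal mass as floor(m/z).

    Assumption: valid for ions with a small positive mass defect (typical of
    singly charged lipids); not a general definition.
    """
    return mz - math.floor(mz)

def sifter_style_features(mz: float, ccs: float) -> Tuple[float, float, float]:
    """Return the (m/z, CCS, mass defect) descriptor triple for one feature."""
    return (mz, ccs, mass_defect(mz))

print([round(v, 4) for v in sifter_style_features(760.5851, 285.2)])  # [760.5851, 285.2, 0.5851]
```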

3 Results and discussion

3.1 Expansion of compendium lipid coverage

To support the broader applicability of the proposed analysis pipeline, the lipid subclass coverage of the CCS Compendium was expanded to include PG and PI lipids, which are major glycerophospholipid subclasses. Total purified extracts of PG and PI were analyzed via IM-MS, and CCS values of the identified features were calibrated and added to the Compendium. This analysis yielded 35 new PG CCS values and 34 new PI CCS values, expanding the overall glycerophospholipid coverage (Fig. 1). Adding these data to the Compendium enabled the generation of two new subclass regression models for downstream predictive analysis (Supplementary Fig. S4).
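Building a subclass mass-mobility trend from Compendium entries can be illustrated with a simple curve fit. The sketch assumes a power-law form, CCS = a·(m/z)^b, and fits it by linear least squares in log-log space; this is a simpler substitute for the non-linear least squares fitting in the group's R scripts, whose exact model form is defined in Picache et al. (2019). The data are synthetic, generated from hypothetical parameters a = 2.5 and b = 0.7.

```python
import math

def fit_power_law(mz_values, ccs_values):
    """Fit CCS = a * mz**b by least squares on log(CCS) versus log(m/z).

    This log-log linearisation is a simplification, not the non-linear
    least squares fitting used to build the Compendium models.
    """
    xs = [math.log(m) for m in mz_values]
    ys = [math.log(c) for c in ccs_values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic subclass data generated from assumed parameters a=2.5, b=0.7.
mz_values = [700.0, 750.0, 800.0, 850.0, 900.0]
ccs_values = [2.5 * mz ** 0.7 for mz in mz_values]
a, b = fit_power_law(mz_values, ccs_values)
print(round(a, 3), round(b, 3))  # 2.5 0.7
```

Because the log transform reweights residuals, a log-log fit and a direct non-linear fit generally give slightly different parameters on noisy data.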

3.2 Workflow design and assessment

An overview of the parallel workflow incorporating the IM CCS data is shown in Figure 2. After alignment, charge state deconvolution and feature finding, the molecular features are searched against accurate mass databases. Features with accurate mass candidates in a database are assigned tentative identifications and subjected to CCS filtering to produce higher-confidence IDs (in this case, putative annotations, level 2; Schymanski et al., 2014). Unidentified features that did not match any accurate mass database entries are submitted to SIFTER to predict their molecular classifications (Fig. 2b).

Fig. 2.

Data analysis workflow utilizing IM-MS-derived measurements in two complementary approaches: (A) the CCS filtering pipeline described in this work, which increases the confidence of features with tentatively assigned identifications, and (B) the SIFTER algorithm, which predicts the molecular classification of lipidomic features not identified by accurate mass database searching

Initial assessment of the CCS filtering approach was performed using LC-IM-MS data acquired from a standard mix of isotopically labeled lipids (SPLASH LIPIDOMIX, Avanti). The masses of the molecular features corresponding to the known components of the mixture were corrected to those of their unlabeled counterparts to facilitate database matching. These corrected masses were assigned tentative identifications from the LIPID MAPS Structure Database to verify that the filtering pipeline would select the correct option for each feature on the basis of its CCS value. In this preliminary test, the correct class was assigned the highest confidence for 24 of 26 features (92%), and the correct subclass was assigned the highest confidence for 19 of 26 features (73%). Upon more detailed analysis, two incorrect assignments were due to erroneous IM feature selection in the automated peak picking, and one additional incorrect assignment resulted from the absence of the correct candidate subclass from the tentative annotation list. These preliminary tests provided insight into the reliability of the workflow before its application to biological samples. Although these results are promising, performance is expected to improve as additional data in the Compendium improve the accuracy of the regression models.

3.3 CCS filtering of lipidomic data

An untargeted lipidomics experiment was performed on lipophilic extracts of murine brain tissue using the CCS filtering pipeline. Figure 3 shows the number of features retained at each step of the pipeline (Fig. 2a), with features that pass each level of the filter being assigned higher confidence (Fig. 3b).

Fig. 3.

Feature reduction workflow for assigning molecular identifications with increasing levels of confidence using the IM-derived CCS. (A) Of 1657 features, 1083 were tentatively identified and thus subjected to CCS filtering; the remaining 574 with no identifications were submitted to the SIFTER algorithm for classification prediction. (B) Results of the CCS filtering pipeline, in which increased confidence is assigned at each level of the filter. (C) Of the 1083 tentatively identified features, 39 did not have an ID with a representative regression model. (D) 883 features passed the class-specific filter, while 161 features fell outside the predictive interval of their class model(s). (E) Finally, 512 features passed the feature-specific filter; the remaining 371 features had IDs from only one class and did not need the additional filtering level

3.3.1 Tentatively identified features

Of the 1657 molecular features extracted from the raw data, 1083 (65%) were tentatively identified using accurate mass database searching. During mass database matching, the average number of candidate identifications per compound was 61 and, on average, 75% of these candidates were database entries with the same chemical formula, i.e. mass isomers. The candidate identifications were next assigned a hierarchical structural classification using the ClassyFire algorithm (Djoumbou Feunang et al., 2016). Of these classified compounds, 39 (∼4%) did not have a representative regression model in the training data and thus could not be assigned any confidence beyond the initial tentative identification (Fig. 3c).
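Deciding whether a feature can be evaluated at all reduces to checking whether any candidate's class or subclass is covered by a regression model. In the sketch below, the `Taxonomy` record mirrors the ChemOnt levels returned by ClassyFire, while the set of modelled levels is hypothetical; a feature for which this check fails corresponds to the group of 39 features that kept only their tentative identification.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass(frozen=True)
class Taxonomy:
    """ChemOnt-style hierarchy for one candidate, e.g. as returned by ClassyFire."""
    kingdom: str
    superclass: str
    chem_class: str                  # 'class' is a reserved word in Python
    subclass: Optional[str] = None

def levels_with_models(tax: Taxonomy, modelled: Set[str]) -> List[str]:
    """Class and/or subclass names of this candidate that have a regression model."""
    return [lvl for lvl in (tax.chem_class, tax.subclass)
            if lvl is not None and lvl in modelled]

def has_representative_model(candidates: List[Taxonomy], modelled: Set[str]) -> bool:
    """True if any candidate of a feature can be evaluated against a CCS model."""
    return any(levels_with_models(t, modelled) for t in candidates)

# Hypothetical set of levels covered by regression models.
modelled = {"Glycerophospholipids", "Glycerophosphocholines"}
pc = Taxonomy("Organic compounds", "Lipids and lipid-like molecules",
              "Glycerophospholipids", "Glycerophosphocholines")
sterol = Taxonomy("Organic compounds", "Lipids and lipid-like molecules",
                  "Steroids and steroid derivatives")
print(levels_with_models(pc, modelled))              # ['Glycerophospholipids', 'Glycerophosphocholines']
print(has_representative_model([sterol], modelled))  # False
```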

3.3.2 Class-specific filter

Following classification, the features were subjected to two stages of CCS-based filtering using both the CCS values and the regression models generated from entries in the Unified CCS Compendium. The first filter level is a class-specific heuristic filter that assigns a higher confidence to an annotation if the CCS of that feature falls within the 99% predictive interval of the class to which the annotation belongs. In addition to assigning increased confidence, this step also serves as a quality assurance step, ensuring that the annotations that pass to the next stage are plausible candidates for the molecular feature. Here, 883 molecular features had annotations that passed this filter, whereas 161 features had CCS values that fell outside of the predictive intervals of all their candidate identifications, meaning these tentative identifications could not be further validated with their CCS information (Fig. 3d).
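The class-specific check can be sketched as follows. The model form (a power law with a constant residual standard deviation), its parameters and the class names are hypothetical, and the interval is a normal approximation (mean ± 2.576σ) that ignores uncertainty in the fitted parameters; the 99% predictive intervals in this study come from its R regression models.

```python
from dataclasses import dataclass

Z_99 = 2.5758  # two-sided 99% quantile of the standard normal distribution

@dataclass(frozen=True)
class ClassModel:
    """Hypothetical fitted model: mean CCS = a * mz**b, residual SD sigma (Å^2)."""
    a: float
    b: float
    sigma: float

    def predict(self, mz: float) -> float:
        return self.a * mz ** self.b

    def within_interval(self, mz: float, ccs: float, z: float = Z_99) -> bool:
        # Approximate 99% prediction interval: mean +/- z * sigma. This ignores
        # uncertainty in a and b, so it is narrower than a full interval.
        return abs(ccs - self.predict(mz)) <= z * self.sigma

def class_specific_filter(mz, ccs, candidates, models):
    """Keep (name, class) candidates whose class model brackets the feature CCS.

    Candidates whose class has no model cannot be evaluated and are not kept.
    """
    return [(name, cls) for name, cls in candidates
            if cls in models and models[cls].within_interval(mz, ccs)]

models = {"class_A": ClassModel(2.5, 0.7, 3.0), "class_B": ClassModel(2.0, 0.72, 3.0)}
candidates = [("cand_1", "class_A"), ("cand_2", "class_B"), ("cand_3", "class_C")]
print(class_specific_filter(760.0, 258.0, candidates, models))  # [('cand_1', 'class_A')]
```

An empty result corresponds to a feature whose CCS falls outside the intervals of all its candidates, like the 161 features reported above.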

3.3.3 Feature-specific filter

In the second stage of the filtering pipeline, the proximity of the feature CCS to the mean of each candidate regression model is evaluated. Higher confidence is assigned to an annotation if the mean of its class regression model lies closer to the feature CCS than those of the other candidate classes. A total of 512 features had annotations that could be distinguished in this way, whereas the remaining 371 had only one candidate class or subclass and therefore did not require further filtering (Fig. 3e).
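The nearest-mean rule can be sketched as below, assuming each model is available as a function that returns its mean CCS at a given m/z. The class names and model coefficients are hypothetical, and exact ties between classes are not handled specially (whichever class `min` encounters first wins).

```python
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[str, str]  # (candidate name, class or subclass)

def feature_specific_filter(mz: float, ccs: float,
                            passing: List[Candidate],
                            models: Dict[str, Callable[[float], float]]) -> List[Candidate]:
    """Keep only candidates from the class whose model mean is closest to the CCS.

    passing: candidates that already passed the class-specific filter.
    models: class name -> function returning the model's mean CCS at a given m/z.
    """
    classes = {cls for _, cls in passing}
    if len(classes) <= 1:
        return passing  # a single candidate class needs no further filtering
    closest = min(classes, key=lambda cls: abs(ccs - models[cls](mz)))
    return [(name, cls) for name, cls in passing if cls == closest]

# Hypothetical class models with means of about 259.7 and 249.3 Å^2 at m/z 760.
models = {"class_A": lambda mz: 2.5 * mz ** 0.7, "class_B": lambda mz: 2.4 * mz ** 0.7}
passing = [("cand_1", "class_A"), ("cand_2", "class_B"), ("cand_3", "class_A")]
print(feature_specific_filter(760.0, 258.0, passing, models))
# [('cand_1', 'class_A'), ('cand_3', 'class_A')]
```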

Using this CCS filtering approach, 82% of the features tentatively identified via accurate mass (883 of 1083) could be assigned increased confidence on the basis of their CCS values. For these higher-confidence IDs, the average number of candidate identifications decreased from 65 to 31, a reduction of over 50% in the possible compound identities that can be assigned to these features.
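The counts reported for this filtering cascade can be cross-checked with a few lines of arithmetic. The sketch below only re-derives percentages from numbers stated in the text and in Fig. 3, including the average candidate counts of 65 and 31.

```python
# Feature counts taken from Fig. 3 and Section 3.3 of this study.
TOTAL, TENTATIVE, NO_MODEL = 1657, 1083, 39
PASSED_CLASS, FAILED_CLASS = 883, 161
PASSED_FEATURE, SINGLE_CLASS = 512, 371

def pct(part: float, whole: float) -> float:
    """Percentage of `part` relative to `whole`."""
    return 100 * part / whole

# Each stage should account for every feature handed to it.
assert TOTAL - TENTATIVE == 574                          # sent to SIFTER
assert TENTATIVE - NO_MODEL == PASSED_CLASS + FAILED_CLASS
assert PASSED_FEATURE + SINGLE_CLASS == PASSED_CLASS

print(f"{pct(TENTATIVE, TOTAL):.1f}% tentatively identified")               # 65.4% tentatively identified
print(f"{pct(PASSED_CLASS, TENTATIVE):.1f}% gained CCS-based confidence")   # 81.5% gained CCS-based confidence
print(f"{pct(65 - 31, 65):.1f}% fewer candidates on average")               # 52.3% fewer candidates on average
```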

In addition to decreasing the average number of candidates per feature, the CCS filtering method substantially shifted the distribution of candidates (Fig. 4a). Before filtering on the basis of CCS values, only 34% of the features had 10 or fewer candidates; of the 883 features whose candidate identifications had been filtered, 67% had 10 or fewer remaining candidates. This narrowing of candidate lists increases confidence in the remaining options.

Fig. 4.

Comparison of feature annotations before and after applying the filtering pipeline. (A) The list of possible candidate identifications was decreased by filtering on the basis of their CCS values. (B) Lipid class distribution of the unfiltered features with identifications assigned by accurate mass only, as compared with (C) class distribution of features with identifications assigned by the combination of accurate mass and CCS filtering

For these filtered lists, all remaining candidate identifications are compounds within the same chemical class, providing high confidence in the class assignments even where persisting isomeric/isobaric ambiguity keeps the number of candidates high. This, in turn, lends higher confidence to the resulting lipid class profile. Figures 4b and 4c illustrate differences in lipid class distribution between ‘unfiltered’ features (where the ‘top hit’ of the accurate mass database search was used as the identification) and ‘filtered’ features (where the CCS filtering pipeline was used to assign the feature identity). In addition to the shift in the distribution of lipid classes between the two groups, the share of features assigned to non-lipid classes is reduced in the filtered group. Of the 883 initial identifications made using the top candidate, only 441 (50%) remained consistent after the CCS filtering steps were applied, reinforcing the notion that identifications based on mass measurement alone are insufficient for accurate lipid structure assignments.

3.4 SIFTER classification of unknown features

The molecular features that were not assigned any tentative identifications or ‘hits’ from mass database searching were submitted to the SIFTER algorithm, as outlined in Figure 2b (Picache et al., 2020). Of the 192 features whose classes were successfully predicted by SIFTER, 82% had class prediction probabilities of >90% and none had a probability of <70%, indicating high model confidence in the resulting predictions. Figure 5 details the hierarchical distribution of predicted classes for these unknown features. As expected for a lipophilic extract, almost half of the predictions are lipids and lipid-like molecules (48%, 93/192). That these compounds were not identified by the initial mass database searching suggests that these databases are incomplete in terms of lipidomic coverage, which is unsurprising given that the majority of unidentified compounds derived from biological sources are predicted to be lipids (May and McLean, 2016). Notably, a little over half of the SIFTER predictions were not classified as lipids and lipid-like molecules, but as organic acids (e.g. amino acids and peptides), organic oxygen compounds (e.g. carbohydrates) and organoheterocyclic compounds (e.g. nucleic acids). Since only lipid-specific databases were used for the accurate mass database searching, it is not surprising that most of the unidentified features are predicted to be molecules other than lipids. Further, amino acids and carbohydrates, which make up a large majority of the predicted non-lipids, are common lipid headgroup components whose presence may be partially explained by degradation at the sample level or in-source fragmentation during analysis.

Fig. 5.

Molecular classifications were predicted with the SIFTER machine learning algorithm for experimental features where no candidate identifications were assigned from the initial database search

The results from the CCS filtering and the SIFTER classification taken together address two limitations of identification assignment related to database coverage. Searching a large database or multiple databases increases search space and likelihood of a ‘hit’, but also increases the likelihood of false positives and can lower the confidence when the candidate lists are long. Conversely, searching a small, focused database space will result in fewer overall candidate identifications, lowering class coverage. Using additional analytical information, such as CCS provided by IM, mitigates both limitations, and the dual approach demonstrated in this article illustrates a strategy to maximize the chemical information which can be derived from IM measurements.

Despite providing increased confidence in molecular annotations, the parallel informatics workflow described in this manuscript has limitations. In both the CCS filtering pipeline and the SIFTER prediction, performance is limited by the training set and the resulting models. Additionally, confidence in an assignment decreases with depth in the classification hierarchy, i.e. there is more confidence in assigning a molecular class than a subclass because of the limited CCS space covered by the empirically trained regression models. In other words, the more specific the classification, the more its model overlaps with other similar models. Similarly, classes that are poorly populated in the training set will yield less confident annotations. These limitations illustrate the importance of a reliable and representative training set and will diminish as the coverage of empirical databases, such as the Unified CCS Compendium, increases.

4 Conclusions

An integrated parallel workflow using IM-derived CCS for high-confidence untargeted lipidomics was demonstrated. This dual workflow combines a previously developed classification prediction algorithm with a novel CCS filtering pipeline. The utility of the workflow was shown using untargeted LC-IM-MS data from a murine brain lipid extract. Filtering candidate compound identifications on the basis of their measured CCS values increased the confidence of the annotations of 883 compounds by narrowing their candidate lists by over 50% on average. Using the SIFTER machine learning algorithm to predict the classification of unidentified features provided insight into the likely classes of 192 compounds that otherwise were not assigned a tentative identification based on accurate mass searching. Future work toward expansion of the Unified CCS Compendium training set will enable the use of this CCS filtering pipeline as a quantifiable metric for increased annotation confidence. Further integration of these approaches with other analytical techniques, e.g. tandem MS (MS/MS) ion fragmentation or LC retention time searching or prediction, will promote high confidence and increased coverage to improve the interpretation and understanding of the complexity of the lipidome.

Supplementary Material

btac197_Supplementary_Data

Acknowledgements

The authors would like to thank Anna C. Pfalzer, Jordyn M. Wilcox, and Aaron B. Bowman for providing the murine brain tissue samples.

Contributor Information

Bailey S Rose, Department of Chemistry, Center for Innovative Technology, Vanderbilt-Ingram Cancer Center, Vanderbilt Institute of Chemical Biology, Vanderbilt Institute for Integrative Biosystems Research and Education, Vanderbilt University, Nashville, TN 37235, USA.

Jody C May, Department of Chemistry, Center for Innovative Technology, Vanderbilt-Ingram Cancer Center, Vanderbilt Institute of Chemical Biology, Vanderbilt Institute for Integrative Biosystems Research and Education, Vanderbilt University, Nashville, TN 37235, USA.

Jaqueline A Picache, Department of Chemistry, Center for Innovative Technology, Vanderbilt-Ingram Cancer Center, Vanderbilt Institute of Chemical Biology, Vanderbilt Institute for Integrative Biosystems Research and Education, Vanderbilt University, Nashville, TN 37235, USA.

Simona G Codreanu, Department of Chemistry, Center for Innovative Technology, Vanderbilt-Ingram Cancer Center, Vanderbilt Institute of Chemical Biology, Vanderbilt Institute for Integrative Biosystems Research and Education, Vanderbilt University, Nashville, TN 37235, USA.

Stacy D Sherrod, Department of Chemistry, Center for Innovative Technology, Vanderbilt-Ingram Cancer Center, Vanderbilt Institute of Chemical Biology, Vanderbilt Institute for Integrative Biosystems Research and Education, Vanderbilt University, Nashville, TN 37235, USA.

John A McLean, Department of Chemistry, Center for Innovative Technology, Vanderbilt-Ingram Cancer Center, Vanderbilt Institute of Chemical Biology, Vanderbilt Institute for Integrative Biosystems Research and Education, Vanderbilt University, Nashville, TN 37235, USA.

Funding

This work was supported in part using the resources of the Center for Innovative Technology (CIT) at Vanderbilt University. Financial support for aspects of this work was provided by the National Institutes of Health (NIH NIA) under project No. [R03NS125243] and the U.S. Environmental Protection Agency (EPA) under grant No. [R839504]. This work has not been formally reviewed by the EPA and EPA does not endorse any products or commercial services mentioned in this document. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the EPA or the U.S. Government. The data underlying this article are available in the McLean Research Group GitHub at https://github.com/McLeanResearchGroup/CCS-filter and the Unified CCS Compendium at McLeanResearchGroup.ShinyApps.io/CCS-Compendium.

Author contributions

The article was written through contributions of all authors who have given approval to the final version of the article.

Conflict of Interest: The authors are unaware of any potential bias that affects the objectivity of this work, but acknowledge that the Vanderbilt University Center for Innovative Technology is designated as a Thought Leader Laboratory by Agilent Technologies, a manufacturer of the instrumentation and associated software used in this manuscript.

References

  1. Beyaz A. et al. (2014) Instrument parameters controlling retention precision in gradient elution reversed-phase liquid chromatography. J. Chromatogr. A, 1371, 90–105.
  2. Bilbao A. et al. (2021) A preprocessing tool for enhanced ion mobility–mass spectrometry-based omics workflows. J. Proteome Res., 21, 798–807.
  3. Blaženović I. et al. (2018a) Software tools and approaches for compound identification of LC-MS/MS data in metabolomics. Metabolites, 8, 31.
  4. Blaženović I. et al. (2018b) Increasing compound identification rates in untargeted lipidomics research with liquid chromatography drift time–ion mobility mass spectrometry. Anal. Chem., 90, 10758–10764.
  5. Cajka T., Fiehn O. (2014) Comprehensive analysis of lipids in biological systems by liquid chromatography-mass spectrometry. Trends Anal. Chem., 61, 192–206.
  6. Chatgilialoglu C. et al. (2014) Lipid geometrical isomerism: from chemistry to biology and diagnostics. Chem. Rev., 114, 255–284.
  7. Cullis P.R. et al. (1996) Chapter 1: physical properties and functional roles of lipids in membranes. In: Vance D.E., Vance J.E. (eds) New Comprehensive Biochemistry. Vol. 31, Elsevier Science Publishers, Amsterdam, Netherlands, pp. 1–33.
  8. Djoumbou Feunang Y. et al. (2016) ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J. Cheminform., 8, 1–20.
  9. Emília A. et al. (2015) Lipidomics in the study of lipid metabolism: current perspectives in the omic sciences. Gene, 554, 131–139.
  10. Feldman H.J. et al. (2005) CO: a chemical ontology for identification of functional groups and semantic comparison of small molecules. FEBS Lett., 579, 4685–4691.
  11. Harris R.A. et al. (2019) New frontiers in lipidomics analyses using structurally selective ion mobility-mass spectrometry. Trends Anal. Chem., 116, 316–323.
  12. Kind T. et al. (2013) LipidBlast in silico tandem mass spectrometry database for lipid identification. Nat. Methods, 10, 755–758.
  13. Kliman M. et al. (2011) Lipid analysis and lipidomics by structurally selective ion mobility-mass spectrometry. Biochim. Biophys. Acta, 1811, 935–945.
  14. Kochen M.A. et al. (2016) Greazy: open-source software for automated phospholipid tandem mass spectrometry identification. Anal. Chem., 88, 5733–5741.
  15. Koelmel J.P. et al. (2017) LipidMatch: an automated workflow for rule-based lipid identification using untargeted high-resolution tandem mass spectrometry data. BMC Bioinformatics, 18, 331.
  16. Koelmel J.P. et al. (2020) Lipid annotator: towards accurate annotation in non-targeted liquid chromatography high-resolution tandem mass spectrometry (LC-HRMS/MS) lipidomics using a rapid and user-friendly software. Metabolites, 10, 101.
  17. Kyle J.E. et al. (2016) Uncovering biologically significant lipid isomers with liquid chromatography, ion mobility spectrometry and mass spectrometry. Analyst, 141, 1649–1659.
  18. Leaptrot K.L. et al. (2019) Ion mobility conformational lipid atlas for high confidence lipidomics. Nat. Commun., 10, 985.
  19. May J.C., McLean J.A. (2015) Ion mobility-mass spectrometry: time-dispersive instrumentation. Anal. Chem., 87, 1422–1436.
  20. May J.C., McLean J.A. (2016) Advanced multidimensional separations in mass spectrometry: navigating the big data deluge. Annu. Rev. Anal. Chem., 9, 387–409.
  21. May J.C. et al. (2014) Conformational ordering of biomolecules in the gas phase: nitrogen collision cross sections measured on a prototype high resolution drift tube ion mobility-mass spectrometer. Anal. Chem., 86, 2107–2116.
  22. Navas-Iglesias N. et al. (2009) From lipids analysis towards lipidomics, a new challenge for the analytical chemistry of the 21st century. Part II: analytical lipidomics. Trends Anal. Chem., 28, 393–403.
  23. Nichols C.M. et al. (2018) Untargeted molecular discovery in primary metabolism: collision cross section as a molecular descriptor in ion mobility-mass spectrometry. Anal. Chem., 90, 14484–14492.
  24. Paglia G. et al. (2015) Ion mobility-derived collision cross section as an additional measure for lipid fingerprinting and identification. Anal. Chem., 87, 1137–1144.
  25. Peterson B.L., Cummings B.S. (2006) A review of chromatographic methods for the assessment of phospholipids in biological samples. Biomed. Chromatogr., 20, 227–243.
  26. Pfalzer A.C. et al. (2020) Huntington’s disease genotype suppresses global manganese-responsive processes in pre-manifest and manifest YAC128 mice. Metallomics, 12, 1118–1130.
  27. Picache J.A. et al. (2019) Collision cross section compendium to annotate and predict multi-omic compound identities. Chem. Sci., 10, 983–993.
  28. Picache J.A. et al. (2020) Chemical class prediction of unknown biomolecules using ion mobility-mass spectrometry and machine learning: supervised inference of feature taxonomy from ensemble randomization. Anal. Chem., 92, 10759–10767.
  29. Plante P.-L. et al. (2019) Predicting ion mobility collision cross-sections using a deep neural network: DeepCCS. Anal. Chem., 91, 5191–5199.
  30. Poland J.C. et al. (2020) Collision cross section conformational analyses of bile acids via ion mobility–mass spectrometry. J. Am. Soc. Mass. Spectrom., 31, 1625–1631.
  31. Rose B.S. et al. (2021) High confidence shotgun lipidomics using structurally selective ion mobility-mass spectrometry. In: Hsu F.-F. (ed.) Mass Spectrometry-Based Lipidomics: Methods and Protocols. Springer US, New York, NY, pp. 11–37.
  32. Ross D.H. et al. (2020) LiPydomics: a Python package for comprehensive prediction of lipid collision cross sections and retention times and analysis of ion mobility-mass spectrometry-based lipidomics data. Anal. Chem., 92, 14967–14975.
  33. Rustam Y.H., Reid G.E. (2018) Analytical challenges and recent advances in mass spectrometry based lipidomics. Anal. Chem., 90, 374–397.
  34. Schrimpe-Rutledge A.C. et al. (2016) Untargeted metabolomics strategies—challenges and emerging directions. J. Am. Soc. Mass. Spectrom., 27, 1897–1905.
  35. Schymanski E.L. et al. (2014) Identifying small molecules via high resolution mass spectrometry: communicating confidence. Environ. Sci. Technol., 48, 2097–2098.
  36. Sindelar M., Patti G.J. (2020) Chemical discovery in the era of metabolomics. J. Am. Chem. Soc., 142, 9097–9105.
  37. Smith C.A. et al. (2005) METLIN: a metabolite mass spectral database. Ther. Drug Monit., 27, 747–751.
  38. Soper-Hopper M.T. et al. (2020) Metabolite collision cross section prediction without energy-minimized structures. Analyst, 145, 5414–5418.
  39. Stow S.M. et al. (2017) An interlaboratory evaluation of drift tube ion mobility–mass spectrometry collision cross section measurements. Anal. Chem., 89, 9048–9055.
  40. Sud M. et al. (2007) LMSD: LIPID MAPS structure database. Nucleic Acids Res., 35, D527–D532.
  41. Sumner L.W. et al. (2007) Proposed minimum reporting standards for chemical analysis Chemical Analysis Working Group (CAWG) metabolomics standards initiative (MSI). Metabolomics, 3, 211–221.
  42. Szöcs E. et al. (2015) webchem: retrieve chemical information from the web. Zenodo, https://doi.org/10.5281/zenodo.33823.
  43. Taylor P.J. (2005) Matrix effects: the Achilles heel of quantitative high-performance liquid chromatography–electrospray–tandem mass spectrometry. Clin. Biochem., 38, 328–334.
  44. Wenk M.R. (2005) The emerging field of lipidomics. Nat. Rev. Drug Discov., 4, 594–610.
  45. Wishart D.S. et al. (2018) HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res., 46, D608–D617.
  46. Wohlgemuth G. et al. (2010) The chemical translation service-a web-based tool to improve standardization of metabolomic reports. Bioinformatics, 26, 2647–2648.
  47. Wymann M.P., Schneiter R. (2008) Lipid signalling in disease. Nat. Rev. Mol. Cell Biol., 9, 162–176.
  48. Xu T. et al. (2020) Recent advances in analytical strategies for mass spectrometry-based lipidomics. Anal. Chim. Acta, 1137, 156–169.
  49. Zheng X. et al. (2017) A structural examination and collision cross section database for over 500 metabolites and xenobiotics using drift tube ion mobility spectrometry. Chem. Sci., 8, 7724–7736.
  50. Zhou Z. et al. (2017) LipidCCS: prediction of collision cross-section values for lipids with high precision to support ion mobility-mass spectrometry-based lipidomics. Anal. Chem., 89, 9559–9566.
  51. Zhou Z. et al. (2020) Ion mobility collision cross-section atlas for known and unknown metabolite annotation in untargeted metabolomics. Nat. Commun., 11, 4334.

Supplementary Materials

btac197_Supplementary_Data

Articles from Bioinformatics are provided here courtesy of Oxford University Press
