Ion mobility spectrometry (IMS) and the corresponding calculated collision cross section (CCS) values have emerged as an important resource for small molecule characterization in biochemical research. Here we present the METLIN-CCS database, which includes CCS values derived from IMS data for over 27,000 molecular standards, representing 79 chemical classes. METLIN-CCS provides CCS values measured in triplicate for these standards in both positive and negative ionization modes, covering multiple ion types (for example, [M+H]+, [M+Na]+ and [M–H]−). With over 185,000 CCS values, the METLIN-CCS database is a unique resource for small molecule characterization and IMS-based machine learning.
In the last two decades, there has been a substantial increase in metabolomic and exposomic analyses, driven by a growing interest in how small molecules directly affect personal and population health1. These studies have highlighted the potential of small molecules, including primary metabolites, secondary metabolites and xenobiotics. While mass spectrometry (MS) is typically the technology of choice for evaluating small molecules, orthogonal analytical methods are important for more confident identification. The use of IMS has therefore recently increased2 because it is orthogonal to chromatography, MS and tandem mass spectrometry (MS/MS). IMS separates ions based on the interplay between the pulling force of an electric field and the frictional drag force resulting from collisions with buffer gas molecules3. The resulting CCS values provide a size-based characteristic that further facilitates experimental small molecule identification.
METLIN currently hosts experimental MS/MS4,5 and neutral loss data on over 900,000 molecular standards. The new downloadable METLIN-CCS database features CCS values for a wide range of molecules, allowing researchers to compare their experimental CCS values for molecular identity confirmation. The METLIN-CCS database was designed for three purposes: (1) to create a downloadable resource containing thousands of CCS values, (2) to provide a basis set of data to train machine learning models, and (3) to explore aggregation properties of each standard to provide information on how molecular ions behave in the source and throughout the instrument.
The METLIN-CCS database was generated in a five-step process (Fig. 1a), which started by analyzing over 30,000 small molecule standards in triplicate and in both positive and negative ionization modes with an ionization success rate of 90%. The standards represent 79 different molecular classes (Supplementary Table 1). Initially, each standard was deposited into a 384-well plate using LabCyte acoustic deposition and then injected into an Agilent liquid chromatography system and a timsTOF Pro for trapped ion mobility spectrometry (TIMS) in nitrogen buffer gas and quadrupole time-of-flight MS analysis (see Supplementary Information). The raw data were then loaded into Skyline for CCS calculation, resulting in over 250,000 initial CCS values due to the various ions observed6. After quality control assessments, which required each unique ion species to have a coefficient of variation below 2% across the triplicates, 185,589 CCS values matching 61,863 unique molecular species and 27,633 standards were retained for the METLIN-CCS database. The production of this database is an important contribution, as experimentally generated CCS values are publicly available for fewer than 5,000 small molecules at present7.
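The quality control step described above can be illustrated with a minimal sketch. It assumes the coefficient of variation (CV) is computed as the sample standard deviation divided by the mean of the triplicate CCS values, and that the retained value is summarized by the mean; the source does not state either detail. The function names, the dictionary layout and the numbers are hypothetical and serve only as an illustration.

```python
from statistics import mean, stdev

CV_THRESHOLD_PERCENT = 2.0  # threshold stated in the text


def coefficient_of_variation(values):
    """Return the CV in percent, using the sample standard deviation (assumption)."""
    return 100.0 * stdev(values) / mean(values)


def filter_by_cv(replicates, threshold=CV_THRESHOLD_PERCENT):
    """Keep ion species whose replicate CCS values have a CV below the threshold.

    `replicates` maps a (standard, ion type) key to a list of CCS values.
    Returns a dict mapping each retained key to the mean CCS.
    """
    kept = {}
    for ion, ccs_values in replicates.items():
        if len(ccs_values) < 2:
            continue  # CV is undefined for a single measurement
        if coefficient_of_variation(ccs_values) < threshold:
            kept[ion] = mean(ccs_values)
    return kept


# Made-up triplicates for illustration only (not values from METLIN-CCS).
example = {
    ("standard_A", "[M+H]+"): [150.1, 150.4, 149.9],
    ("standard_B", "[M+Na]+"): [201.0, 212.5, 195.3],
}
retained = filter_by_cv(example)
```

In this toy input, the first ion species has tightly clustered replicates and is retained, while the second varies by several percent and is dropped.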
Fig. 1 | METLIN-CCS.

a, General five-step workflow used to generate CCS values for the molecular standards. b, Representative CCS values for the 27,633 molecular standards with respective [M+H]+, [M+Na]+ and [M−H]− ions. Two trend lines were observed for the ions. The upper trend line (ions above the dotted line) corresponds to ions that formed singly charged dimers in the ESI source, stayed as dimers during the IMS separation and then broke into monomers before detection. The lower trend line (ions below the dotted line) corresponds to ions that were singly charged monomers from the ESI source to the detector. The tuning mix monomeric and dimeric ions at m/z 118 and the monomers at m/z 622 helped distinguish the trend lines. m, mass; z, charge; K0, reduced mobility; CV, coefficient of variation.
To validate the TIMS CCS values obtained in the experiments, we initially compared them against previously generated drift tube IMS (DTIMS) CCS values8. This comparison is important because DTIMS is considered the gold standard for CCS values: it can apply multiple different electric fields and determine the reduced mobility, K0, from first principles. Because of the trapping and ramping approach used in TIMS, multiple electric fields cannot be applied, so calibration is needed to determine CCS values. The comparison of the DTIMS and TIMS values showed an average CCS variation of ±1.03% for lipids, indicating good correlation and similar CCS values between the two IMS techniques for these molecule types. Thus, we further evaluated all CCS values obtained with TIMS.
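Once a reduced mobility K0 is available, whether measured from first principles or obtained through calibration, it is commonly converted to a CCS value with the Mason-Schamp relation. The source does not state this equation, and the calculation performed in Skyline may differ in detail, so the following is only a sketch under stated assumptions: nitrogen buffer gas with an average molecular mass of 28.0134 Da, a placeholder gas temperature of 305 K, and a hypothetical function name. The TIMS calibration step itself is not shown.

```python
import math

# Physical constants (SI units).
E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
DALTON = 1.66053906660e-27   # atomic mass unit, kg
LOSCHMIDT = 2.686780111e25   # gas number density at 273.15 K and 101.325 kPa, m^-3

N2_MASS_DA = 28.0134         # average mass of N2 (assumption for the buffer gas)


def ccs_from_k0(k0_cm2_per_vs, ion_mass_da, charge=1,
                temperature_k=305.0, gas_mass_da=N2_MASS_DA):
    """Convert reduced mobility K0 (cm^2 V^-1 s^-1) to CCS (square angstroms).

    Uses the Mason-Schamp relation:
        CCS = (3 z e / (16 N0)) * sqrt(2 pi / (mu kB T)) / K0
    The default temperature is a placeholder assumption, not a value from the text.
    """
    # Reduced mass of the ion and buffer gas pair, in kg.
    mu = (ion_mass_da * gas_mass_da) / (ion_mass_da + gas_mass_da) * DALTON
    k0_si = k0_cm2_per_vs * 1e-4  # cm^2/(V s) -> m^2/(V s)
    ccs_m2 = (
        3.0 * abs(charge) * E_CHARGE / (16.0 * LOSCHMIDT)
        * math.sqrt(2.0 * math.pi / (mu * K_B * temperature_k))
        / k0_si
    )
    return ccs_m2 * 1e20  # m^2 -> square angstroms


# Illustrative call for a singly charged ion of mass 622.03 Da.
example_ccs = ccs_from_k0(1.0, 622.03)
```

Because CCS is inversely proportional to K0 in this relation, any error in the calibrated mobility carries over directly into the CCS value, which is why the comparison against DTIMS matters.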
Overall, 61,863 unique molecular ion types were observed corresponding to 25,919 [M+H]+, 17,450 [M+Na]+ and 18,494 [M–H]− ions for 27,633 standards. The CCS values for all ion types formed two distinct trend lines, with 3,822 of the CCS values in the top trend line and the other 58,041 in the bottom trend line (Fig. 1b). Using the calibrant ions, we observed that the ions in the top trend line moved through the IMS separation region as 1+ or 1− dimers and broke into singly charged monomer ions following IMS separation but before detection, as in previous observations9,10.
In summary, the downloadable METLIN-CCS database contains CCS values (in .tsv and .csv formats) for 27,633 molecular standards with data generated in positive and negative ionization mode (for example, [M+H]+, [M+Na]+ and [M–H]−). Furthermore, because of the replicates, the provided IMS data from over 160,000 individual datasets should aid in designing machine learning algorithms and probing novel molecule types.
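As one illustration of how a downloaded .csv or .tsv file might be used for identity confirmation, the sketch below matches an experimental m/z and CCS pair against library entries. The column names (`name`, `adduct`, `mz`, `ccs`), the tolerances and the three library rows are all hypothetical; the actual file layout should be checked against the download before adapting this.

```python
import csv
import io


def load_ccs_library(handle, delimiter=","):
    """Read library rows from an open text handle (use delimiter="\t" for .tsv)."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [
        {
            "name": row["name"],
            "adduct": row["adduct"],
            "mz": float(row["mz"]),
            "ccs": float(row["ccs"]),
        }
        for row in reader
    ]


def match_candidates(library, mz, ccs, adduct,
                     mz_tol_ppm=10.0, ccs_tol_percent=2.0):
    """Return (name, ppm error, CCS % error) for entries within both tolerances."""
    hits = []
    for entry in library:
        if entry["adduct"] != adduct:
            continue
        ppm = abs(entry["mz"] - mz) / entry["mz"] * 1e6
        ccs_err = abs(entry["ccs"] - ccs) / entry["ccs"] * 100.0
        if ppm <= mz_tol_ppm and ccs_err <= ccs_tol_percent:
            hits.append((entry["name"], ppm, ccs_err))
    # Rank the closest CCS match first.
    return sorted(hits, key=lambda hit: hit[2])


# Made-up library rows for illustration only (not METLIN-CCS data).
toy_csv = io.StringIO(
    "name,adduct,mz,ccs\n"
    "compound_A,[M+H]+,300.1000,170.0\n"
    "compound_B,[M+H]+,300.1010,182.0\n"
    "compound_C,[M+Na]+,322.0820,175.0\n"
)
library = load_ccs_library(toy_csv)
hits = match_candidates(library, mz=300.1005, ccs=171.0, adduct="[M+H]+")
```

In the toy data, two entries fall within the m/z tolerance, but only one also agrees in CCS, which reflects how CCS serves as an orthogonal filter alongside mass.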
Acknowledgements
The authors appreciate the assistance of Bruker in facilitating the timsTOF data acquisition. We also thank Ulrike Schweiger-Hufnagel (Bruker) for her invaluable support during this process. E.S.B. acknowledges support from the US National Institute of Environmental Health Sciences (P42 ES027704), US National Institute of General Medical Sciences (R01 GM141277 and RM1 GM145416) and US Environmental Protection Agency (STAR RD 84003201). This research was also partially funded by US National Institutes of Health grants R24 GM141156 (M.M.), P30 AG013280 (M.M.), R35 GM130385 (G.S.) and U01 CA235493 (G.S.).
Footnotes
Competing interests
The authors declare no competing interests.
Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1038/s41592-023-02078-5.
Data availability
The downloadable METLIN-CCS database of molecular standards reported in this article is freely available at METLIN (https://metlin.scripps.edu/), the XCMS online data repository (https://xcmsonline.scripps.edu/) and PanoramaWeb (https://panoramaweb.org/ccs-library.url). The software for converting ion mobility data to CCS values is available at Skyline (https://skyline.ms/skyline.url).
References
1. Suhre K. et al. Nature 477, 54–60 (2011).
2. Dodds JN & Baker ES. J. Am. Soc. Mass Spectrom. 30, 2185–2195 (2019).
3. Zheng X. et al. Annu. Rev. Anal. Chem. (Palo Alto, Calif.) 10, 71–92 (2017).
4. Guijas C. et al. Anal. Chem. 90, 3156–3164 (2018).
5. Xue J, Guijas C, Benton HP, Warth B & Siuzdak G. Nat. Methods 17, 953–954 (2020).
6. MacLean BX et al. J. Am. Soc. Mass Spectrom. 29, 2182–2188 (2018).
7. Picache JA et al. Chem. Sci. 10, 983 (2019).
8. Kirkwood KI et al. J. Proteome Res. 21, 232–242 (2022).
9. Stow SM et al. Anal. Chem. 89, 9048–9055 (2017).
10. Zheng X. et al. Anal. Bioanal. Chem. 409, 467–476 (2017).
