High-Throughput Non-targeted Chemical Structure Identification Using Gas-Phase Infrared Spectra

Erandika Karunaratne; Dennis W Hill; Philipp Pracht; José A Gascón; Stefan Grimme; David F Grant

doi:10.1021/acs.analchem.1c02244

Author manuscript; available in PMC: 2022 Aug 3.

Published in final edited form as: Anal Chem. 2021 Jul 21;93(30):10688–10696. doi: 10.1021/acs.analchem.1c02244

PMCID: PMC8404482 NIHMSID: NIHMS1727968 PMID: 34288660

Abstract

The high-throughput identification of unknown metabolites in biological samples remains challenging. Most current non-targeted metabolomics studies rely on mass spectrometry, followed by computational methods that rank thousands of candidate structures based on how closely their predicted mass spectra match the experimental mass spectrum of an unknown. We reasoned that infrared (IR) spectra could be used in an analogous manner and could add orthogonal structure discrimination; however, this has never been evaluated on large data sets. Here, we present results of a high-throughput computational method for predicting IR spectra of candidate compounds obtained from the PubChem database. Predicted spectra were ranked based on their similarity to gas-phase experimental IR spectra of test compounds obtained from NIST. Our computational workflow (IRdentify) consists of a fast semiempirical quantum mechanical method for initial IR spectra prediction, ranking, and triaging, followed by a final IR spectra prediction and ranking using density functional theory. This approach resulted in the correct identification of 47% of 258 test compounds. On average, there were 2152 candidate structures evaluated for each test compound, giving a total of approximately 555,200 candidate structures evaluated. We discuss several variables that influenced the identification accuracy and then demonstrate the potential application of this approach in three areas: (1) combining IR and mass spectra rankings into a single composite rank score, (2) identifying the precursor and fragment ions using cryogenic ion vibrational spectroscopy, and (3) the incorporation of a trimethylsilyl derivatization step to extend the method compatibility to less-volatile compounds.
Overall, our results suggest that matching computational with experimental IR spectra is a potentially powerful orthogonal option for adding significant high-throughput chemical structure discrimination when used with other non-targeted chemical structure identification methods.

INTRODUCTION

Although there have been multiple recent advances in computational and analytical methods that allow us to detect thousands of chemicals in complex biological samples, structure determination remains a fundamental problem in metabolomics research.^1–6 In human health applications, the inability to identify the structures of most detected compounds has become a seemingly intractable limitation of translational metabolomics. Despite the availability of a variety of metabolite databases (HMDB, Metlin, KEGG, HumanCyc, PubChem, and ChemSpider) that collectively contain >100 × 10⁶ unique chemical structures,^7–16 most non-targeted metabolomics studies report the structure of a relatively low percentage of detectable compounds.^17–19 Our continued inability to identify most detected compounds is due to two major factors. First, the total number of potential unknown metabolites that might exist in any given biological sample is likely far greater than anticipated. We must now consider potential metabolites coming from the microbiome,²⁰ the exposome,^18,21 the lipidome,^22–25 the epimetabolome,²⁶ and the infectome.²⁷ Second, all analytical techniques used for compound identification have inherent limitations. These include limitations in sensitivity, discrimination/resolution, and throughput. NMR and mass spectrometry are the most consistently used analytical methods. NMR is quantitative, highly reproducible, and nondestructive but lacks throughput and sensitivity.
Thus, high-resolution gas and liquid chromatography coupled to high-accuracy mass spectrometry (GC/LC–MS) methods with their high sensitivity, throughput, and resolution have dominated the field in recent years.¹ Unlike targeted studies where compound identification relies on comparing an experimental retention time and a tandem mass spectrum to a known standard,⁵ non-targeted GC/LC–MS-based metabolomic studies are designed to identify the structure of unknown compounds using a variety of computational methods for data extraction, interpretation, and candidate ranking.^4,28 Computational options include (1) in silico simulation methods,^29–32 (2) combinatorial fragmentation methods,^33–35 and (3) fingerprint-based methods.^36,37

There are advantages and disadvantages (in terms of speed, accuracy, and availability) to all non-targeted approaches. Some require model training, and a strong correlation exists between training data quantity and quality and the ability to identify unknown compounds with these methods. CSI:FingerID³⁶ has reported some of the best results (42% correct identifications) based on the percentage of correct identifications of unknowns that were not included in model training. However, unknown chemical structures that are not similar to the training structures or are not included in training data are less likely to be correctly identified. Another limitation of non-targeted GC/LC–MS methods is that it is often difficult to assess identification reliability. When searching large chemical databases, there will always be a top ranked compound; however, the correct unknown might be in the top 10 or 20 or perhaps not present in the database at all. Although progress has been made in this area,^38–40 it remains difficult to assess method accuracy other than by a final comparison with a known analytical standard using multiple analytical techniques. Thus, we and others have proposed the inclusion of multiple orthogonal, analytical, and computational methods to improve the likelihood that the correct candidate is at, or near, the top of the ranked candidate list. These include augmenting GC/LC–MS data with retention time/index,^41–47 Ecom50,^48,49 ion mobility,^50–52 and biochemical substructure matching.⁵³

Infrared (IR) spectroscopy has been used as a bioanalytical tool for over 100 years (reviewed in ref 54) and is reliably understood at both the molecular and atomic levels. Although an IR spectrum is characteristic of an entire molecule, functional groups absorb IR radiation at well-defined frequencies regardless of the structure of the rest of the molecule. It is the persistence of these characteristic IR absorption bands that permits direct interpretation of the chemical structure. In addition, the region of the IR spectrum from 500 to 1450 cm⁻¹ (the fingerprint region) provides interpretable information regarding atom interconnectivity and 2D arrangement. Analytically, hyphenated GC–Fourier transform IR (FTIR) and GC/FTIR/MS instruments have been available for many years. Current GC/FTIR/MS instruments have on-column sensitivities in the low nanogram to high picogram range⁵⁵ and, since IR and MS provide orthogonal data, have allowed the identification of closely related chemical structures when MS and GC–MS data are ambiguous.^55–57 A group at the FELIX laboratory⁵⁸ provided several examples of the utility of this approach.^59–63 Additionally, Menges et al.⁶⁴ reported the integration of MS and cryogenic ion vibrational spectroscopy in a single instrument to obtain MS and vibrational spectra for phenylalanyl–tyrosine (FY) and three collision-induced FY fragment ions.

Interpretation of an IR spectrum for the identification of chemical functional groups is typically done by manual inspection, although quantitative comparisons are possible.⁵⁴ Importantly, quantum chemistry calculation of IR spectra, as well as machine learning methods, has been shown to correctly identify functional groups and predict the 2D structure of molecules.^65–68 Henschel and van der Spoel⁶⁹ used a small database of 670 compounds to demonstrate promising results for matching theoretical versus experimental IR spectra with various combinations of quantum mechanical approaches. We previously evaluated the performance of several semiempirical quantum mechanical GFN tight-binding, force-field, and B3LYP-3c density functional theory (DFT) methods for predicting IR spectra. Predicted spectra were then directly compared with the experimental GC–FTIR spectra taken from the National Institute of Standards and Technology (NIST) database.⁷⁰ We found that employing a fast GFN2-xTB tight-binding method and a B3LYP-3c DFT composite method with atomic mass scaling provided accurate matching between the predicted and experimental spectra of a given compound. However, there are no reports of using IR spectroscopy as a high-throughput method for chemical structure identification. Thus, for a non-targeted approach, it is currently not known whether theoretically predicted IR spectra are accurate enough to allow discrimination among hundreds or thousands of potential positional isomers (typical of large chemical databases) when compared to the experimental IR spectrum of an unknown compound. In addition, it is not clear whether computational speeds are sufficient to make this approach feasible for large data sets.

Here, we describe the development of a workflow (IRdentify) to evaluate IR spectroscopy as a complementary non-targeted method for high-throughput compound identification. We evaluated a total of 258 test compounds (from the NIST GC–FTIR database) for this study. Candidate structures from PubChem were selected using the monoisotopic molecular weight (MIMW) ± 10 ppm of each test compound. This resulted in an average of 2152 candidate structures for each test compound for a total of approximately 555,200 candidate structures evaluated. Our computational workflow consists of a fast semiempirical quantum mechanical method for initial IR spectra prediction, ranking, and triaging, followed by a final IR spectra prediction and ranking using DFT. Although this work was designed to evaluate the use of IR spectroscopy as a complementary high-throughput method to add orthogonal structure discrimination (rather than be compared with other identification methods), we correctly identified 47% of these 258 test compounds: a result similar to that of current MS-based methods. We then explored the variables influencing identification success. These included structure conformation dependence, candidate bin size, the mass of the unknown, and the number of experimental IR peaks. We also demonstrate the utility of combining IR and tandem mass spectrometry (MS/MS) ranking methods into a single composite score. Finally, we evaluated whether a trimethylsilyl (TMS) derivatization step could allow the extension of this method to less-volatile compounds. Our results suggest that matching computational with experimental IR spectra is a powerful high-throughput orthogonal approach for augmenting non-targeted MS-based methods. Although we focus here on metabolomics, this approach has applications in diverse areas of analytical chemistry, forensic chemistry and biochemistry.

EXPERIMENTAL SECTION

Computational Details.

This work used the new workflow program “IRdentify” for the identification of unknown compounds through the comparison of experimental gas-phase IR spectra with theoretical gas-phase IR spectra. This involved the use of both GFN tight-binding and composite DFT methods. To calculate the similarity between the experimental and theoretical gas-phase IR spectra, we utilized the match score (MSc), as previously described.⁷⁰

Data Set.

Spectra from the NIST/EPA gas-phase FTIR database⁷¹ formed the basis of the experimental data set used in this work. This database comes in two spectral sets, both with one data point every 4 cm⁻¹: (a) NIST file, 2120 spectra and (b) EPA file, 3108 spectra. All 5228 structures corresponding to experimental FTIR spectra were checked for consistency with the associated molecular formula and CAS registry numbers. From this, a sub-database of compounds was generated by removing the compounds that (i) consisted of atoms other than C, H, O, N, S, P, F, Cl, and Br, (ii) were duplicates, (iii) were salts, (iv) contained deuterium, and/or (v) were not neutral. This resulted in a final database of 4872 compounds and corresponding FTIR spectra. These were then sorted by their MIMW, and every 18th compound (with a non-repeated MIMW) having an MIMW > 75 g/mol was assigned to a sub-dataset of 258 compounds, which constituted the test set. For each test compound, a set of the 2D structures of compounds in the PubChem database with MIMWs ±10 ppm of that of the test compound were retrieved into a bin of candidate structures. From this, compounds with disconnected fragments, ions, stereoisomers, and heavy isotopes were removed. The resulting bins consisted of between 3 and 20,255 candidate compounds and contained the correct test compound. All additional experimental details are given in the Supporting Information.
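
The ±10 ppm candidate selection described above amounts to a simple mass-window filter. The following is a minimal Python sketch of that step, not the IRdentify code itself. The function names and the `(identifier, mimw)` tuple layout are hypothetical, and the masses in the usage lines are made-up illustrative numbers rather than PubChem values.

```python
from typing import Iterable, List, Tuple


def ppm_window(mimw: float, ppm: float = 10.0) -> Tuple[float, float]:
    """Return the (low, high) mass bounds of a +/- ppm window around mimw."""
    delta = mimw * ppm * 1e-6
    return mimw - delta, mimw + delta


def select_candidates(
    target_mimw: float,
    candidates: Iterable[Tuple[str, float]],
    ppm: float = 10.0,
) -> List[Tuple[str, float]]:
    """Keep candidates whose monoisotopic mass lies inside the ppm window."""
    low, high = ppm_window(target_mimw, ppm)
    return [(cid, mass) for cid, mass in candidates if low <= mass <= high]


# Illustrative (invented) masses: cand_b lies outside the 10 ppm window.
pool = [("cand_a", 180.0634), ("cand_b", 180.0660), ("cand_c", 180.0620)]
candidate_bin = select_candidates(180.0634, pool)
```

Because the window scales with mass, its absolute width grows for heavier test compounds; at 180 g/mol a 10 ppm tolerance corresponds to roughly ±0.0018 g/mol.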

RESULTS AND DISCUSSION

We recently used a semiempirical quantum mechanical GFN tight-binding method and a composite DFT method to match the calculated IR spectra with over 7000 experimental gas-phase FTIR spectra.⁷⁰ Here, we extend and formalize this approach in “IRdentify” as a method for high-throughput spectra prediction, matching, and ranking of candidate compounds from PubChem for identifying unknown chemical structures.

Figure 1 illustrates the IRdentify workflow using 258 experimental gas-phase FTIR spectra from the NIST database and candidate compounds from PubChem (Figure S1 provides the test and candidate compound characteristics). After an initial filtering of the 258 bins of PubChem candidates (Figure 1 panel 1) and conversion of PubChem candidates to their 3D structures (Figure 1 panel 2), vibrational frequencies at the GFN2-xTB⁷³ level were calculated (Figure 1 panel 3). Spectral similarity to the target experimental spectrum was then determined as an MSc value. In total, 555,254 candidate structures were processed at the GFN2-xTB level. Our filtering criteria (Figure 1 panel 4) were designed to reduce the computational cost of subsequent DFT steps (Figure 1 panels 5 and 6) for large bin sizes. Methods within the GFN family⁷⁴ were designed and parameterized to yield good geometries, frequencies, and noncovalent interactions and provide a suitable starting point for the evaluation of theoretical IR spectra at the subsequent DFT level. Altogether, 173,139 IR spectra were calculated at the DFT level in this study. The time required ranged from 1 min to 18 h per molecule with median and mode processing times of ∼26 and ∼5 min per molecule, respectively, using four parallel processors. By dividing the candidate list among ∼25 instances of IRdentify (each using four parallel processors), the total analysis time required for each bin of candidates was significantly reduced.
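
The triage logic of this workflow (score every candidate with a fast method, pass only the best-scoring subset to a slow method, then re-rank) can be sketched as below. This is a hedged illustration under stated assumptions, not the IRdentify implementation: the actual filtering criteria of Figure 1 panel 4 are not reproduced, `keep_fraction` is a hypothetical stand-in for them, and the scoring callables stand in for GFN2-xTB and DFT match scores with invented values.

```python
import math
from typing import Callable, List, Sequence


def two_stage_rank(
    candidates: Sequence[str],
    fast_score: Callable[[str], float],
    slow_score: Callable[[str], float],
    keep_fraction: float = 0.3,
) -> List[str]:
    """Rank by a cheap score, keep the top fraction, re-rank by a costly score.

    Higher scores are treated as better matches. keep_fraction is an assumed
    simplification of the filtering step, not the published criteria.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    by_fast = sorted(candidates, key=fast_score, reverse=True)
    n_keep = max(1, math.ceil(len(by_fast) * keep_fraction))
    survivors = by_fast[:n_keep]  # only these incur the expensive calculation
    return sorted(survivors, key=slow_score, reverse=True)


# Invented scores: "C" would win at the slow level but is triaged out early.
fast = {"A": 0.90, "B": 0.80, "C": 0.50, "D": 0.40}
slow = {"A": 0.70, "B": 0.95, "C": 0.99, "D": 0.10}
final = two_stage_rank(list(fast), fast.get, slow.get, keep_fraction=0.5)
```

The toy data show the trade-off inherent in any such design: a candidate that the fast method scores poorly never reaches the slow method, so it cannot be recovered no matter how well the slow method would have scored it.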

Using this approach, GFN2-xTB MSc values ranged from 0.375 to 0.916 (Figure 2a), while DFT MSc values ranged from 0.522 to 0.981 (Figure 2b). Ranking of the target compound within the candidate list varied between 1 and 4722 (Figure 2c) with GFN2-xTB and between 1 and 863 (Figure 2d) with DFT (full data list in Table S1). Both MSc values and rankings suggest that the final DFT step was better at predicting vibrational frequencies and intensities. Although the filtering step at the GFN2-xTB level worked well in significantly lowering the number of candidates passing to the DFT step (70% of the candidates were eliminated), 17 of the 258 test compounds (6.6%) were scored as false negatives at this step. These false negatives are shown as red squares in Figure 2a,c. As would be expected, the IR spectra of these 17 compounds were poorly predicted by GFN2-xTB, having significantly lower average MSc values (0.684 ± 0.081 vs 0.760 ± 0.097, p = 0.002) and thus poorer average rankings (1120 ± 1133 vs 94 ± 296, p = 0.002) compared to true positives. In order to determine whether a higher level of theory would have given better MSc values and rankings for these 17 false negatives, DFT calculations were performed on all candidates in these bins (shown as red squares in Figure 2b,d, respectively). The average DFT MSc values for these 17 false negatives were not significantly different from the DFT MSc values for true positives (0.882 ± 0.067 vs 0.860 ± 0.053, respectively, p = 0.133), and although the 17 false negatives tended to have poorer average DFT rankings (120 ± 233 vs 22 ± 78, p = 0.114), this difference also was not significant. Thus, the benefit in speed afforded by the faster GFN2-xTB triaging step was offset by a 6.6% probability of incorrectly removing the correct compound from the final DFT ranking step.

Figure 2e shows the percentage of correctly identified structures found in the top k rank ≤ 20 using IRdentify. The green line shows results after GFN2-xTB ranking of all 258 bins, while the blue line shows the results after GFN2-xTB plus DFT ranking. A total of 14 and 47% of the 258 test compounds were ranked #1 in their respective MIMW bin with the use of GFN2-xTB and GFN2-xTB plus DFT, respectively; while a total of 47 and 79% of test compounds were ranked ≤20 in their respective MIMW bin with the use of GFN2-xTB and GFN2-xTB plus DFT, respectively. Overall, these results compare favorably with the currently used MS-based non-targeted methods including CSI:FingerID,³⁶ FingerID,⁷⁵ CFM-ID,⁷⁶ MAGMa,⁷⁷ MIDAS,⁷⁸ MS-Finder,⁶ and MetFrag.³³ It is important to note that our method cannot be directly compared to these other methods, some of which were based on considerable training data and benchmarked using different test compounds and different candidate lists. For example, CSI:FingerID would likely have given a higher value of correct identifications since many of our NIST test compounds were included in their training data.³⁶ However, for novel compounds not included in model training, or for which only experimental data are used, correct identification rates using MS-based methods are typically at, or below, 42%.³⁶ In addition, our method compares favorably with other MS-based methods in ranking approximately 80% of test compounds in the top 20. Figure S2 provides the corresponding data for all top k ranks.
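
The top-k percentages reported here are cumulative identification rates. A minimal sketch of how such a rate could be computed from a list of final ranks follows. The function name and the convention of using `None` for a compound removed before the final ranking (a false negative) are assumptions for illustration, and the ranks in the example are invented.

```python
from typing import Optional, Sequence


def top_k_rate(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of test compounds whose correct structure is ranked <= k.

    A rank of None marks a compound that was filtered out before the final
    ranking; it counts in the denominator but can never count as a hit.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Invented ranks for five hypothetical test compounds.
example_ranks = [1, 1, 3, 25, None]
rate_top1 = top_k_rate(example_ranks, 1)
rate_top20 = top_k_rate(example_ranks, 20)
```

Keeping false negatives in the denominator matters: dropping them would overstate the identification rate of the full two-stage workflow.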

Variables Affecting IRdentify Structure Identification.

Of our 258-compound test set, 17 were false negatives and 37 (shown in Table S2) were ranked >20. There are multiple factors that could influence the ranking results using IRdentify. Global optimization in predicting the vibrational frequency and intensity using GFN2-xTB and B3LYP-3c DFT has been discussed previously.⁷⁰ For flexible molecules, the prediction of structure conformation was likely important. In addition, the number of candidates in each bin and their structural diversity may have played a role. As shown in Table S1, the GFN2-xTB MSc values for the 17 false negatives ranged from 0.49 to 0.79, indicating that these spectra were poorly predicted. Even though it was difficult to establish a direct relation to particular functional groups, we observe that many of these compounds contain C≡C, C≡N, C═C, and C═N bonds (Figure S3), suggesting that inaccurately predicted vibrational frequencies and/or intensities for some functional groups may have played a role in poor GFN2-xTB MSc values.

In general, IR frequencies are dependent, to a variable degree, on conformation. Predicted vibrational frequencies are likely different in folded versus unfolded conformers and in hydrogen-bonded versus non-hydrogen-bonded structures. Furthermore, intensities can vary if the molecular dipole moment is influenced by conformation. Since the 2D-to-3D conversion (Figure 1 panel 2) involves a multidimensional energy surface search, it could potentially produce different energy conformers for multiple instances of IRdentify. Therefore, predicted vibrational frequencies/intensities, and consequently MSc values and rankings, could vary if the derived conformer was not near the global minimum energy. To assess conformational effects on GFN2-xTB rankings for the 17 false negative compounds, rankings were replicated by running IRdentify 10 times, starting from identical 2D input structures (Figure S4). The coefficient of variation of the percent ranking varied from 1 to 40% for these 17 false negative bins. Four of these (CID: 10887, 6065802, 138454, and 141704) occasionally exhibited a GFN2-xTB ranking of <25%. These compounds showed the highest variability in ranking (>30% coefficient of variation), suggesting conformation dependency. However, other compounds (CID: 6547 and CID: 98747) showed only a minor variation in ranking, indicating little or no conformation effect. A closer look at the structures of these compounds (Figure S3) reveals that conformation likely plays a more important role in ranking compounds with long carbon chains (e.g., CID: 10887) and compounds capable of forming intramolecular hydrogen bonds (e.g., CID: 138454). Overall, however, the 17 false negatives have significantly lower MSc values and therefore significantly lower rankings. Thus, improvements in frequency and intensity predictions would improve the ranking accuracy at the GFN2-xTB step.
The 2D-to-3D conformation prediction step also plays a role in ranking; however, this is compound (and bin composition)-dependent and would be difficult to assess a priori.
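
The replicate analysis above summarizes run-to-run variability as a coefficient of variation of the percent ranking. A minimal sketch of that arithmetic is given below, assuming that percent ranking means rank divided by bin size and that the sample standard deviation is used; neither detail is stated in the text, and the replicate ranks are invented.

```python
import statistics
from typing import List, Sequence


def percent_ranks(ranks: Sequence[int], bin_size: int) -> List[float]:
    """Express each replicate rank as a percentage of the bin size."""
    return [100.0 * r / bin_size for r in ranks]


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean, as a percentage."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; coefficient of variation undefined")
    return 100.0 * statistics.stdev(values) / mean


# Invented replicate ranks of one test compound in a bin of 100 candidates.
replicates = percent_ranks([10, 20, 30], bin_size=100)
cv = coefficient_of_variation(replicates)
```

Since the bin size is constant across replicates of the same bin, the coefficient of variation is the same whether it is computed on raw ranks or on percent ranks.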

We next provide representative examples (Figure 3) of compounds which were ranked by IRdentify with varying degrees of success at the DFT level. Figure 3a shows a qualitative comparison (predicted vs experimental) of ethyl 3-(2,6-dichlorophenyl)-5-methylisoxazole-4-carboxylate, and Figure 3b shows the same comparison for demetonthione. Both compounds were ranked #1 despite their relatively large differences in MSc values. Thus, the correct compound may be ranked #1, even with a mediocre MSc, if, for example, there are fewer candidates and/or if most candidates are structurally dissimilar compared to the correct structure. Figure 3c,d compares the experimental and DFT-calculated IR spectra of two false negative compounds CID: 10887 and CID: 98747, respectively. CID: 98747 ranked poorly at the GFN2-xTB level as well as the DFT level, indicating likely errors in frequency and/or intensity predictions at both levels of theory. CID: 10887 ranked 2/590 at the DFT level, even though it ranked poorly at the GFN2-xTB level (218/590). This was likely due to the higher level of theory used for predicting structure conformation at the DFT step. However, even at the DFT level, we found significant variability in the MSc calculation (Table S3). As would be expected, compounds containing more rotatable bonds (Table S3) showed greater MSc variability. This was probably due to the selection of conformers of slightly different minimum energies with each repeat in the CREST calculation. A potential strategy for improving conformation predictions would be to allocate more computational effort for predicting the lowest energy conformers, specifically for candidates with large numbers of rotatable bonds.

We further evaluated DFT rankings as a function of (a) DFT MSc, (b) the number of peaks in the experimental IR spectrum, (c) MIMW, and (d) the number of candidates used in DFT calculations (Figure S5). Surprisingly, there was no statistical relationship between rankings and the number of experimental IR peaks or the MIMW of the test compounds. Not surprisingly, better rankings were significantly associated with higher MSc values, and worse rankings were significantly associated with greater numbers of candidates in the bin. This suggests that improving the IR frequency and intensity predictions (i.e., the MSc) and decreasing the number of candidates evaluated at the DFT step would improve the ranking accuracy. However, even though these two associations were statistically significant, they explained a relatively modest amount of ranking variability. Thus, correct identification of an unknown compound using IRdentify depends on varying and interdependent interactions between (1) the accuracy in predicting conformation, (2) the accuracy in predicting IR absorption frequency and intensity, (3) the number of candidates, and (4) the structural diversity of the candidates in any given bin.

Our results suggest that improving the quantum mechanical predictions of IR frequency, intensity, and conformation would improve the overall ranking results. However, even with the improvements, there will likely be unknown structures/candidates for which IRdentify does poorly. This is also true for all analytical techniques and why having multiple orthogonal data improves analytical reliability. Furthermore, as the size of chemical structure databases (and therefore bin size) increases, resolution becomes a limiting factor. Thus, whenever possible, integrating multiple orthogonal non-targeted structure identification methods, as has been suggested by us⁷⁹ and others,^59,63 seems justified.

Composite Ranking Using IR and MS/MS Data.

Using a composite rank score which combines IR and MS/MS ranking data could potentially reduce false positive identifications, increase identification reliability, and synergistically enhance structure identification. We provide an exploratory evaluation of this approach using an additional set of 13 compounds from the NIST for which both IR and independently determined MS/MS data were available (Table S4). We used CSI:FingerID³⁶ for MS/MS-based ranking, and to maintain consistency, the same candidates were used for both methods. We also include FY as a 14th test compound. However, MS/MS and IR data for this compound were obtained from a different source and will be discussed separately below. We show the rankings of these compounds using each method independently and using a composite ranking method. CSI:FingerID ranked nine of these compounds as #1 within their candidate lists, and IRdentify ranked eight compounds as #1. Five compounds were ranked #1 by both methods. Ten compounds were ranked #1 with the use of the composite ranking method, and two more were ranked #2. Analysis of the top 20 candidates ranked by IRdentify and CSI:FingerID revealed that only a small number of candidates were common in both lists (average ∼4, unless the candidate list was <40), suggesting that the two methods were independent. Two compounds (spermine and N-methyloctadecylamine) were ranked >20 with IRdentify but <20 using composite ranking. These two compounds have flexible structures with long carbon backbones (rotatable bond counts are 11 and 17, respectively) and had the highest variability in MSc standard deviation (Table S3). In addition, compared to MS, IR spectroscopy provides limited information regarding the relative location of carbon chain branching, especially when there are many structurally similar candidates. 
It is tempting to speculate that when a candidate has a #1 ranking using both methods (five compounds in Table S4), it is far more likely to be the correct compound. These promising preliminary results highlight the potential of this approach for advancing chemical structure identification.
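
One simple way to build a composite rank from two independent rankings is to add the per-method ranks and sort by the sum, so that a lower score is better. The sketch below uses that rank-sum rule purely as an assumption for illustration; the exact composite score used in this work is not specified in the text, and the candidate names and ranks are invented.

```python
from typing import Dict, List, Tuple


def composite_ranking(
    ir_rank: Dict[str, int], ms_rank: Dict[str, int]
) -> List[Tuple[str, int]]:
    """Combine two rankings by rank sum (assumed rule); lowest sum ranks first.

    Only candidates present in both rankings are scored. Ties are broken
    alphabetically so that the output is deterministic.
    """
    shared = ir_rank.keys() & ms_rank.keys()
    scores = {c: ir_rank[c] + ms_rank[c] for c in shared}
    ordered = sorted(scores, key=lambda c: (scores[c], c))
    return [(c, scores[c]) for c in ordered]


# Invented ranks: "x" is not first by IR alone but wins on the combined score.
ir = {"x": 9, "y": 1, "z": 3}
ms = {"x": 1, "y": 15, "z": 12}
combined = composite_ranking(ir, ms)
```

A rank-sum rule rewards candidates that both methods place reasonably high and penalizes candidates favored by only one method, which is the behavior one would want if the two rankings are largely independent.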

As mentioned above, in addition to the 13 compounds in Table S4, we also included FY as an example of using cryogenic ion vibrational spectroscopy as a method for obtaining an MS/MS spectrum and an IR spectrum on the same instrument. Menges et al.⁶⁴ previously described the use of this hyphenated instrument using FY as a model compound. The instrument combines a high-resolution mass spectrometer with a custom-built, cryogenic ion photo fragmentation IR spectrometer. Here, we used FY as a test unknown and retrieved candidates from PubChem based on its neutral MIMW (±10 ppm). Candidates were then protonated and independently ranked by CSI:FingerID and IRdentify. As shown (Table S4), protonated FY ranked #1 with CSI:FingerID and #9 with IRdentify. Analysis of the ranked candidate lists of CSI:FingerID and IRdentify revealed only three common candidates among the top 20 ranks of both methods. Of these three, FY-H⁺ had the best (lowest) composite score and rank.

Cryogenic ion spectroscopy can also be used to identify fragment ion structures. As described,⁶⁴ the three major fragment ions of [FY-H⁺] are a1 (120 m/z), y1 (182 m/z), and z1 (165 m/z). We retrieved candidate structures from PubChem using an MIMW ±10 ppm window for each fragment without assuming protonation as the ionization mechanism. Thus, only cationic and radical candidates were considered. As shown (Table S5), the top ranked candidate predicted for each fragment ion by IRdentify was identical to what Menges et al. proposed by manual inspection.⁶⁴ A notable observation was that the z1 fragment was predicted to converge to the more stable spiro compound by both IRdentify and Menges et al.

IR Spectra of TMS Derivatives.

The IR spectra of our 258 NIST experimental test compounds were acquired by GC–FTIR. They are therefore relatively volatile molecules and less likely to contain polar functional groups with hydrogen bond donors (e.g., −COOH, −OH, −NH, and −SH) since these groups tend to decrease volatility as well as interfere with GC analysis. Derivatizing hydrogen bond donors makes it possible to separate these polar compounds by GC and subsequently acquire their mass and IR spectra. The addition of one or more TMS ([−Si(CH₃)₃]) groups is a commonly used derivatization method, having a long history in analytical chemistry.⁸⁰ It is also widely used in metabolomics.^28,81–84 Unfortunately, we found very few experimental IR spectra of TMS-derivatized compounds in IR spectral databases. Therefore, we computed IR spectra at the DFT (B3LYP-3c) level for a small set of TMS-derivatized and underivatized compounds. These were chosen to represent the nine most common hydrogen-donating groups. We found that the TMS group introduced two intense peaks and one minor peak in the IR spectra at consistent absorption frequencies regardless of the hydrogen-donating group (Figure S6). Using an additional hydrogen-donating group (COOH), we also show that deuterated TMS (TMSD₉) gives the expected reduced-mass shift⁸⁵ of all three of these modes, making them easy to distinguish from other vibrational frequencies (Figure S7). As a further example, we repeated this theoretical analysis using one of our 258 NIST test compounds. This compound (CID: 12841) was selected because it was ranked number 1 out of 6422 candidates and had a DFT MSc of 0.943 (Table S1). Three characteristic TMS absorption bands were observed in the predicted IR spectrum of the TMSH₉ hydroxy derivative; and again, these three TMS frequencies were red-shifted in the predicted spectrum of the TMSD₉ derivative (Figure S8). 
Several other frequencies in this compound are predicted to be shifted in both TMSH₉ and TMSD₉ derivatives, and unique TMS-related frequencies/intensities are apparent. Although there is likely a limit to the number of TMS groups that can be added to any given compound, our theoretical results suggest that TMS derivatization (along with subsequent computational TMS derivatization of candidates) is a promising option for extending the applicability of IRdentify to compounds with hydrogen-donating groups. By adding orthogonal IR ranking data, such an approach could potentially augment current GC–MS-based non-targeted metabolomics studies that typically use TMS derivatization in their workflows.⁸²
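
The reduced-mass shift invoked above follows from the harmonic-oscillator expression for a vibrational wavenumber. As a rough illustration only, the estimate below treats a single C–H stretch as an isolated diatomic oscillator with an unchanged force constant and integer atomic masses. This is a textbook simplification rather than the B3LYP-3c normal-mode calculation used in this work, and real TMS modes involve coupled motions, so the actual shifts will differ.

```latex
\[
\tilde{\nu} = \frac{1}{2\pi c}\sqrt{\frac{k}{\mu}},
\qquad
\mu = \frac{m_1 m_2}{m_1 + m_2}
\]
\[
\frac{\tilde{\nu}_{\mathrm{CD}}}{\tilde{\nu}_{\mathrm{CH}}}
= \sqrt{\frac{\mu_{\mathrm{CH}}}{\mu_{\mathrm{CD}}}}
= \sqrt{\frac{12 \cdot 1 / 13}{12 \cdot 2 / 14}}
= \sqrt{\frac{168}{312}}
\approx 0.73
\]
```

Under these assumptions a mode dominated by hydrogen motion moves to roughly three-quarters of its original wavenumber on deuteration, which is why deuterated TMS bands are expected to separate cleanly from the corresponding TMSH₉ bands.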

Limitations.

There are several limitations of the IRdentify method described here. Current quantum chemical methods are known to suffer from systematic errors, mainly due to basis set incompleteness, deficiencies in the treatment of electron–electron correlation, and neglect of anharmonicity.⁸⁶ These errors can be mitigated but at the expense of additional computational cost.^87–89 Semiempirical quantum mechanical techniques also suffer errors from potential energy surface and harmonic approximations, which are less systematic. In order to balance these deviations, many scaling schemes have been developed. In IRdentify, an atomic mass scaling approach was employed.⁷⁰ Even though mass scaling yields overall better results for GFN2-xTB and B3LYP-3c methods, there were still errors associated with theoretical oscillator frequency assignments when compared to experimental frequencies.

Without access to a significant computational infrastructure for running multiple instances of IRdentify, calculation of vibrational frequencies at the DFT level would be excessively time-consuming. This is rapidly becoming less of a problem as computational hardware becomes more routinely available. We reduced the computation time by using GFN2-xTB to filter out approximately 70% of candidates before the DFT level calculations, but this increase in speed came at the cost of false negatives, that is, correct structures removed before the DFT stage. We did not evaluate other filtering options. For example, MS/MS-based ranking or machine learning approaches might be better options for the initial filtering of large bins as these methods tend to be very fast. Ultimately, as databases grow larger and candidate numbers increase, resolution becomes a limiting factor. Our DFT MSc values ranged between 0.5 and 0.96. Thus, ranking thousands of candidates with very similar MSc values becomes statistically intractable without the use of orthogonal filtering methods to reduce candidate numbers.
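The triage step described above can be sketched as a simple rank-and-cut filter. This is a minimal illustration that assumes a higher match score is better and that a fixed percentage of candidates is retained; the function name, the candidate labels, and the scores are hypothetical and are not taken from the IRdentify code.

```python
from typing import Dict, List


def triage_candidates(scores: Dict[str, float], keep_percent: int = 30) -> List[str]:
    """Return candidate IDs in the top keep_percent of a fast pre-screen.

    Candidates are sorted by descending score. The cut size is rounded up
    with integer arithmetic, and at least one candidate is kept whenever
    the input is not empty.
    """
    if not 0 < keep_percent <= 100:
        raise ValueError("keep_percent must be in the range 1..100")
    ranked = sorted(scores, key=lambda cid: scores[cid], reverse=True)
    n_keep = max(1, (len(ranked) * keep_percent + 99) // 100)
    return ranked[:n_keep]


# Hypothetical fast-method match scores for ten candidates.
fast_scores = {
    "cand_A": 0.91, "cand_B": 0.88, "cand_C": 0.52, "cand_D": 0.79,
    "cand_E": 0.61, "cand_F": 0.83, "cand_G": 0.47, "cand_H": 0.70,
    "cand_I": 0.66, "cand_J": 0.58,
}

kept = triage_candidates(fast_scores, keep_percent=30)
print("passed to the expensive stage:", kept)

# A false negative occurs when the correct structure falls below the cut.
true_structure = "cand_D"  # hypothetical ground truth
print("correct structure retained:", true_structure in kept)
```

In this toy data the hypothetical correct structure ranks fourth in the fast screen and is discarded, which is the kind of false negative that a fixed cut can introduce.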

In conclusion, we report the use of IR spectrometry for non-targeted searching of chemical databases to identify unknown metabolites. Overall, this approach resulted in identification rates comparable to those of current mass spectrum-based methods. We also demonstrated that identification rates could be improved by combining IRdentify ranking and mass spectra-based ranking in a single composite score. Protonated molecules and fragment ions with a positive charge were successfully identified, demonstrating the applicability of this approach for identifying charged compounds. In addition, we provide theoretical evidence that this approach is compatible with TMS derivatization workflows. This work opens new possibilities for advancing non-targeted metabolomics, as well as other analytical techniques that rely on non-targeted structure determination.

Supplementary Material

SI-1_with title page

NIHMS1727968-supplement-SI-1_with_title_page.pdf (1.6 MB, pdf)

SI-2_with title page

NIHMS1727968-supplement-SI-2_with_title_page.xlsx (529.9 KB, xlsx)

ACKNOWLEDGMENTS

We thank the Computational Biology Core, Institute for Systems Genomics at the University of Connecticut for providing computational resources. D.F.G. was funded by the NIH grant GM087714. S.G. was funded by the DFG (Deutsche Forschungsgemeinschaft) in the framework of the “Gottfried Wilhelm Leibniz Prize”. We thank Graham Roberts for help with PubChem database prefiltering. Chemaxon’s Marvin View and JChem programs (https://chemaxon.com) were used to calculate molecular weights and elemental formula, determine IUPAC names of compounds, and convert the SMILES structure format to the sdf format for use in the IRdentify program.

Footnotes

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.analchem.1c02244.

Additional experimental details (and associated references); characterization of the 258 test compounds used; percentage of correctly identified structures found in the top k output of IRdentify; molecular structures of test compounds with GFN2-xTB rankings >25%; conformational effect on GFN2-xTB rankings; regression analysis of DFT (B3LYP-3c) ranking as a function of DFT MSc, the number of peaks in the experimental IR spectra, MIMW, and the number of candidates used in DFT calculations; comparison of theoretical IR spectra of underivatized and TMS-derivatized representative compounds containing hydrogen bond-donating groups; association of theoretical IR absorption bands to functional groups in acetic acid; and theoretical DFT (B3LYP-3c) IR absorption spectra of underivatized, TMSH₉-derivatized, and TMSD₉-derivatized 2-[4-(1,1-dimethylethyl)phenoxy]-ethanol (PDF)

Comparison of test compounds using GFN2-xTB and DFT-B3LYP-3c; test compounds that exhibited poor DFT rankings and their corresponding GFN2-xTB and DFT ranks; variability of the DFT MSc; composite scoring and ranking using IRdentify and CSI:FingerID; and identification of collision-induced fragment ions using IRdentify (XLSX)

The authors declare no competing financial interest.

Data sharing plans: IRdentify is freely available at github.com/grimme-lab.

Contributor Information

Erandika Karunaratne, Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut 06269, United States.

Dennis W. Hill, Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut 06269, United States.

Philipp Pracht, Mulliken Center for Theoretical Chemistry, Institute for Physical and Theoretical Chemistry, University of Bonn, 53115 Bonn, Germany.

José A. Gascón, Department of Chemistry, University of Connecticut, Storrs, Connecticut 06269, United States.

Stefan Grimme, Mulliken Center for Theoretical Chemistry, Institute for Physical and Theoretical Chemistry, University of Bonn, 53115 Bonn, Germany.

David F. Grant, Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut 06269, United States.

REFERENCES

(1) Domenick TM; Gill EL; Vedam-Mai V; Yost RA Anal. Chem 2021, 93, 546–566.
(2) Dias D; Jones O; Beale D; Boughton B; Benheim D; Kouremenos K; Wolfender J-L; Wishart D Metabolites 2016, 6, 46.
(3) Böcker S Curr. Opin. Chem. Biol 2017, 36, 1–6.
(4) Blaženović I; Kind T; Ji J; Fiehn O Metabolites 2018, 8, 31.
(5) Kind T; Tsugawa H; Cajka T; Ma Y; Lai Z; Mehta SS; Wohlgemuth G; Barupal DK; Showalter MR; Arita M; Fiehn O Mass Spectrom. Rev 2018, 37, 513–532.
(6) Lai Z; Tsugawa H; Wohlgemuth G; Mehta S; Mueller M; Zheng Y; Ogiwara A; Meissen J; Showalter M; Takeuchi K; Kind T; Beal P; Arita M; Fiehn O Nat. Methods 2018, 15, 53–56.
(7) Domingo-Almenara X; Montenegro-Burke JR; Guijas C; Majumder EL-W; Benton HP; Siuzdak G Anal. Chem 2019, 91, 3246–3253.
(8) Domingo-Almenara X; Siuzdak G Methods Mol. Biol 2020, 2104, 11–24.
(9) Montenegro-Burke JR; Guijas C; Siuzdak G Methods Mol. Biol 2020, 2104, 149–163.
(10) Xue J; Guijas C; Benton HP; Warth B; Siuzdak G Nat. Methods 2020, 17, 953–954.
(11) Wishart DS; Feunang YD; Marcu A; Guo AC; Liang K; Vázquez-Fresno R; Sajed T; Johnson D; Li C; Karu N; Sayeeda Z; Lo E; Assempour N; Berjanskii M; Singhal S; Arndt D; Liang Y; Badran H; Grant J; Serra-Cayuela A; Liu Y; Mandal R; Neveu V; Pon A; Knox C; Wilson M; Manach C; Scalbert A Nucleic Acids Res 2018, 46, D608–D617.
(12) Guijas C; Montenegro-Burke JR; Domingo-Almenara X; Palermo A; Warth B; Hermann G; Koellensperger G; Huan T; Uritboonthai W; Aisporna AE; Wolan DW; Spilker ME; Benton HP; Siuzdak G Anal. Chem 2018, 90, 3156–3164.
(13) Kanehisa M; Furumichi M; Tanabe M; Sato Y; Morishima K Nucleic Acids Res 2017, 45, D353–D361.
(14) Karp PD; Billington R; Caspi R; Fulcher CA; Latendresse M; Kothari A; Keseler IM; Krummenacker M; Midford PE; Ong Q; Ong WK; Paley SM; Subhraveti P Briefings Bioinf 2017, 20, 1085–1093.
(15) Kim S; Chen J; Cheng T; Gindulyte A; He J; He S; Li Q; Shoemaker BA; Thiessen PA; Yu B; Zaslavsky L; Zhang J; Bolton EE Nucleic Acids Res 2019, 47, D1102–D1109.
(16) Pence HE; Williams A J. Chem. Educ 2010, 87, 1123–1124.
(17) da Silva RR; Dorrestein PC; Quinn RA Proc. Natl. Acad. Sci. U.S.A 2015, 112, 12549–12550.
(18) Uppal K; Walker DI; Liu K; Li S; Go Y-M; Jones DP Chem. Res. Toxicol 2016, 29, 1956–1975.
(19) Peisl BYL; Schymanski EL; Wilmes P Anal. Chim. Acta 2018, 1037, 13–27.
(20) Kowarsky M; Camunas-Soler J; Kertesz M; De Vlaminck I; Koh W; Pan W; Martin L; Neff NF; Okamoto J; Wong RJ; Kharbanda S; El-Sayed Y; Blumenfeld Y; Stevenson DK; Shaw GM; Wolfe ND; Quake SR Proc. Natl. Acad. Sci. U.S.A 2017, 114, 9623–9628.
(21) Johnson CH; Athersuch TJ; Collman GW; Dhungana S; Grant DF; Jones DP; Patel CJ; Vasiliou V Hum. Genomics 2017, 11, 32.
(22) Kind T; Liu K-H; Lee DY; DeFelice B; Meissen JK; Fiehn O Nat. Methods 2013, 10, 755–758.
(23) Bou Khalil M; Hou W; Zhou H; Elisma F; Swayne LA; Blanchard AP; Yao Z; Bennett SAL; Figeys D Mass Spectrom. Rev 2010, 29, 877–929.
(24) Raghu P Proc. Natl. Acad. Sci. U.S.A 2020, 117, 11191–11193.
(25) Tsugawa H; Ikeda K; Takahashi M; Satoh A; Mori Y; Uchino H; Okahashi N; Yamada Y; Tada I; Bonini P; Higashi Y; Okazaki Y; Zhou Z; Zhu Z-J; Koelmel J; Cajka T; Fiehn O; Saito K; Arita M; Arita M Nat. Biotechnol 2020, 38, 1159–1163.
(26) Showalter MR; Cajka T; Fiehn O Curr. Opin. Chem. Biol 2017, 36, 70–76.
(27) Ngô HM; Zhou Y; Lorenzi H; Wang K; Kim T-K; Zhou Y; El Bissati K; Mui E; Fraczek L; Rajagopala SV; Roberts CW; Henriquez FL; Montpetit A; Blackwell JM; Jamieson SE; Wheeler K; Begeman IJ; Naranjo-Galvis C; Alliey-Rodriguez N; Davis RG; Soroceanu L; Cobbs C; Steindler DA; Boyer K; Noble AG; Swisher CN; Heydemann PT; Rabiah P; Withers S; Soteropoulos P; Hood L; McLeod R Sci. Rep 2017, 7, 11496.
(28) Fiehn O Curr. Protoc. Mol. Biol 2016, 114, 30.4.1–30.4.32.
(29) Hill DW; Kertesz TM; Fontaine D; Friedman R; Grant DF Anal. Chem 2008, 80, 5574–5582.
(30) Bauer CA; Grimme S J. Phys. Chem. A 2016, 120, 3755–3766.
(31) Wang S; Kind T; Tantillo DJ; Fiehn O J. Cheminf 2020, 12, 63.
(32) Allen F; Greiner R; Wishart D Metabolomics 2015, 11, 98–110.
(33) Wolf S; Schmidt S; Müller-Hannemann M; Neumann S BMC Bioinf 2010, 11, 148.
(34) Ridder L; van der Hooft JJ; Verhoeven S Mass Spectrom 2014, 3, S0033.
(35) Tsugawa H; Kind T; Nakabayashi R; Yukihira D; Tanaka W; Cajka T; Saito K; Fiehn O; Arita M Anal. Chem 2016, 88, 7946–7958.
(36) Dührkop K; Fleischauer M; Ludwig M; Aksenov AA; Melnik AV; Meusel M; Dorrestein PC; Rousu J; Böcker S Nat. Methods 2019, 16, 299–302.
(37) Brouard C; Shen H; Dührkop K; d’Alché-Buc F; Böcker S; Rousu J Bioinformatics 2016, 32, i28–i36.
(38) Scheubert K; Hufsky F; Petras D; Wang M; Nothias L-F; Dührkop K; Bandeira N; Dorrestein PC; Böcker S Nat. Commun 2017, 8, 1494.
(39) Palmer A; Phapale P; Chernyavsky I; Lavigne R; Fay D; Tarasov A; Kovalev V; Fuchser J; Nikolenko S; Pineau C; Becker M; Alexandrov T Nat. Methods 2017, 14, 57–60.
(40) Wang X; Jones DR; Shaw TI; Cho J-H; Wang Y; Tan H; Xie B; Zhou S; Li Y; Peng J J. Proteome Res 2018, 17, 2328–2334.
(41) Wei X; Koo I; Kim S; Zhang X Analyst 2014, 139, 2507–2514.
(42) Abate-Pella D; Freund DM; Ma Y; Simón-Manso Y; Hollender J; Broeckling CD; Huhman DV; Krokhin OV; Stoll DR; Hegeman AD; Kind T; Fiehn O; Schymanski EL; Prenni JE; Sumner LW; Boswell PG J. Chromatogr. A 2015, 1412, 43–51.
(43) Stanstrup J; Neumann S; Vrhovšek U Anal. Chem 2015, 87, 9421–9428.
(44) Wolfer AM; Lozano S; Umbdenstock T; Croixmarie V; Arrault A; Vayer P Metabolomics 2016, 12, 8.
(45) Hall LM; Hill DW; Bugden K; Cawley S; Hall LH; Chen M-H; Grant DF J. Chem. Inf. Model 2018, 58, 591–604.
(46) Bach E; Szedmak S; Brouard C; Böcker S; Rousu J Bioinformatics 2018, 34, i875–i883.
(47) Bonini P; Kind T; Tsugawa H; Barupal DK; Fiehn O Anal. Chem 2020, 92, 7515–7522.
(48) Dubey R; Hill DW; Lai S; Chen M-H; Grant DF Metabolomics 2015, 11, 753–763.
(49) Hall LM; Hall LH; Kertesz TM; Hill DW; Sharp TR; Oblak EZ; Dong YW; Wishart DS; Chen M-H; Grant DF J. Chem. Inf. Model 2012, 52, 1222–1237.
(50) Crowell KL; Baker ES; Payne SH; Ibrahim YM; Monroe ME; Slysz GW; LaMarche BL; Petyuk VA; Piehowski PD; Danielson WF III; Anderson GA; Smith RD Int. J. Mass Spectrom 2013, 354–355, 312–317.
(51) Paglia G; Williams JP; Menikarachchi L; Thompson JW; Tyldesley-Worster R; Halldórsson S; Rolfsson O; Moseley A; Grant D; Langridge J; Palsson BO; Astarita G Anal. Chem 2014, 86, 3985–3993.
(52) Paglia G; Angel P; Williams JP; Richardson K; Olivos HJ; Thompson JW; Menikarachchi L; Lai S; Walsh C; Moseley A; Plumb RS; Grant DF; Palsson BO; Langridge J; Geromanos S; Astarita G Anal. Chem 2015, 87, 1137–1144.
(53) Hamdalla MA; Mandoiu II; Hill DW; Rajasekaran S; Grant DF J. Chem. Inf. Model 2013, 53, 601–612.
(54) Beć KB; Grabska J; Huck CW Anal. Chim. Acta 2020, 1133, 150–177.
(55) Lanzarotta A; Falconer T; McCauley H; Lorenz L; Albright D; Crowe J; Batson J Appl. Spectrosc 2017, 71, 1050–1059.
(56) Lanzarotta A; Lorenz L; Voelker S; Falconer TM; Batson JS Appl. Spectrosc 2018, 72, 750–756.
(57) Zavahir JS; Smith JSP; Blundell S; Waktola HD; Nolvachai Y; Wood BR; Marriott PJ Separations 2020, 7, 27.
(58) Martens J; Berden G; Gebhardt CR; Oomens J Rev. Sci. Instrum 2016, 87, 103108.
(59) Kranenburg RF; van Geenen FAMG; Berden G; Oomens J; Martens J; van Asten AC Anal. Chem 2020, 92, 7282–7288.
(60) Martens J; Berden G; Bentlage H; Coene KLM; Engelke UF; Wishart D; van Scherpenzeel M; Kluijtmans LAJ; Wevers RA; Oomens J J. Inherited Metab. Dis 2018, 41, 367–377.
(61) Martens J; Berden G; van Outersterp RE; Kluijtmans LAJ; Engelke UF; van Karnebeek CDM; Wevers RA; Oomens J Sci. Rep 2017, 7, 3363.
(62) Martens J; Koppen V; Berden G; Cuyckens F; Oomens J Anal. Chem 2017, 89, 4359–4362.
(63) Martens J; van Outersterp RE; Vreeken RJ; Cuyckens F; Coene KLM; Engelke UF; Kluijtmans LAJ; Wevers RA; Buydens LMC; Redlich B; Berden G; Oomens J Anal. Chim. Acta 2020, 1093, 1–15.
(64) Menges FS; Perez EH; Edington SC; Duong CH; Yang N; Johnson MA J. Am. Soc. Mass Spectrom 2019, 30, 1551–1557.
(65) Pople JA; Schlegel HB; Krishnan R; Defrees DJ; Binkley JS; Frisch MJ; Whiteside RA; Hout RF; Hehre WJ Int. J. Quantum Chem 1981, 20, 269–278.
(66) Gastegger M; Behler J; Marquetand P Chem. Sci 2017, 8, 6924–6935.
(67) Yuanyuan C; Zhibin W Chemom. Intell. Lab. Syst 2018, 181, 1–10.
(68) Katsyuba SA; Zvereva EE; Grimme S J. Phys. Chem. A 2019, 123, 3802–3808.
(69) Henschel H; van der Spoel D J. Phys. Chem. Lett 2020, 11, 5471–5475.
(70) Pracht P; Grant DF; Grimme S J. Chem. Theory Comput 2020, 16, 7044–7060.
(71) Clifton C; Gallagher J; Shamin A; Stein S; Zohdi H NIST/EPA Gas Phase Infrared Library; NIST Standard Reference Database Number 35; National Institute of Standards and Technology: Gaithersburg MD, 20899, 2007.
(72) Pracht P; Bohle F; Grimme S Phys. Chem. Chem. Phys 2020, 22, 7169–7192.
(73) Bannwarth C; Ehlert S; Grimme S J. Chem. Theory Comput 2019, 15, 1652–1671.
(74) Bannwarth C; Caldeweyher E; Ehlert S; Hansen A; Pracht P; Seibert J; Spicher S; Grimme S Wiley Interdiscip. Rev. Comput. Mol. Sci 2021, 11, No. e1493.
(75) Heinonen M; Shen H; Zamboni N; Rousu J Bioinformatics 2012, 28, 2333–2341.
(76) Allen F; Pon A; Wilson M; Greiner R; Wishart D Nucleic Acids Res 2014, 42, W94–W99.
(77) Ridder L; van der Hooft JJJ; Verhoeven S; de Vos RCH; van Schaik R; Vervoort J Rapid Commun. Mass Spectrom 2012, 26, 2461–2471.
(78) Wang Y; Kora G; Bowen BP; Pan C Anal. Chem 2014, 86, 9496–9503.
(79) Menikarachchi LC; Hamdalla MA; Hill DW; Grant DF Comput. Struct. Biotechnol. J 2013, 5, No. e201302005.
(80) Poole CF J. Chromatogr. A 2013, 1296, 2–14.
(81) Kumari S; Stevens D; Kind T; Denkert C; Fiehn O Anal. Chem 2011, 83, 5895–5902.
(82) Lai Z; Fiehn O Mass Spectrom. Rev 2018, 37, 245–257.
(83) Khodadadi M; Pourfarzam M Metabolomics 2020, 16, 66.
(84) Harvey DJ; Vouros P Mass Spectrom. Rev 2020, 39, 105–211.
(85) Harris DC; Bertolucci MD Symmetry and Spectroscopy: An Introduction To Vibrational And Electronic Spectroscopy; Oxford University Press: New York, 2014; p 550.
(86) Scott AP; Radom L J. Phys. Chem 1996, 100, 16502–16513.
(87) Becke AD J. Chem. Phys 2014, 140, 18A301.
(88) Houk KN; Liu F Acc. Chem. Res 2017, 50, 539–543.
(89) Grimme S; Schreiner PR Angew. Chem., Int. Ed. Engl 2018, 57, 4170–4176.
