Author manuscript; available in PMC: 2025 Oct 14.
Published in final edited form as: J Chem Inf Model. 2024 Sep 27;64(19):7470–7487. doi: 10.1021/acs.jcim.4c00760

Predicting Collision-Induced-Dissociation Tandem Mass Spectra (CID-MS/MS) Using Ab Initio Molecular Dynamics

Jesi Lee 1, Dean Joseph Tantillo 2, Lee-Ping Wang 3, Oliver Fiehn 4
PMCID: PMC11492810  NIHMSID: NIHMS2027842  PMID: 39329407

Abstract

Compound identification is at the center of metabolomics and is usually performed by comparing experimental mass spectra against library spectra. However, most compounds are not commercially available to generate library spectra. Hence, for such compounds, MS/MS spectra need to be predicted. Machine learning and heuristic models have largely failed except for lipids. Here, quantum chemistry software can be used to predict mass spectra. However, quantum chemistry predictions for collision-induced dissociation (CID) mass spectra in LC-MS/MS are rare. We present the CIDMD (Collision-Induced Dissociation via Molecular Dynamics) framework to model CID-based MS/MS spectra. It uses first-principles molecular dynamics (MD) to simulate the physical process of molecular collisions in CID tandem mass spectrometry. First, molecular ions are constructed at specific protonation sites. Using density functional theory, these protonated ions are targeted by argon collider gas atoms at user-specified velocities. Subsequent bond breakages are simulated over time for at least 1,000 fs. Each simulation is repeated multiple times from various collisional directions. Fragmentations are accumulated over those repeated collisions to generate CIDMD in silico mass spectra. Twelve small metabolites (<205 Da) were selected to test the accuracy of this framework in comparison to experimental MS/MS spectra. By testing different protomers, collider velocities, numbers of simulations, simulation times, and impact factor b cutoffs, we generated 261 predicted mass spectra. Compared to their corresponding experimental spectra, these in silico spectra gave an average entropy similarity score of 624 ± 189 across all 261 spectra, which improved to 828 ± 77 when using optimal parameters of the most probable protomers for the 12 molecules. With increasing molecular mass, higher velocities achieved better results. Similarly, different protomers showed large differences in fragmentation; hence, with increasing numbers of protomers and tautomers, the average CIDMD prediction accuracy decreased. Mechanistic details showed that specific fragment ions can be produced from different protomers via multiple fragmentation pathways. We propose that CIDMD is a suitable tool to predict mass spectra of small metabolites such as those produced by the gut microbiome.

1. INTRODUCTION

Metabolism governs life. Therefore, identifying small molecules resulting from cellular processes is key to a better understanding of metabolic conversions. Thousands of detectable metabolites remain unidentified and have been dubbed “dark matter”.1 Today, liquid chromatography coupled to electrospray collision-induced dissociation (CID) tandem mass spectrometry (MS/MS), abbreviated LC-MS/MS, is the most widely used platform in metabolomics.2 In CID experiments, molecular ions from the first MS stage collide at high velocity with gas molecules in a collision cell, leading to fragment ions that are identified in the second stage. CID fragmentation spectra differ by the type of mass spectrometer used, the collision energy, and the stability of molecules. For untargeted screening of metabolites and other small molecules, publicly available mass spectral libraries such as Wiley,3 MassBank (Europe and Japan),4 GNPS5 and NIST6 account for <200,000 compounds, compared to >114 million compound entries in PubChem7,8 (Figure S1). This mismatch is amplified by the many (unknown) enzymatic conversions that microorganisms might add to the complement of metabolomes that organisms are exposed to, for example, through food.9

Mechanistic details of collisional activation and subsequent fragmentation are not completely understood, in part because of the wide range of experimental conditions that are employed.10 Apart from the physics of the collision and energy transfer itself, the stability of fragments plays an important role in determining the final mass spectra. Selected precursor ions pass through a collision cell, a vacuum chamber that is filled with a low pressure inert gas to induce binary collisions. The collisions in the cell increase the internal energy of the ions and induce unimolecular fragmentation reactions to create product ions. A collision between a target gas molecule G and the ion M+ activates the ion by converting a portion of the collision energy given in the center of mass frame (Ecom), into the internal energy (Eint). Ecom can be obtained from the experimental collision energy Elab as

$$E_{\mathrm{com}} = \frac{m_{\mathrm{gas}}}{m_{\mathrm{gas}} + m_{\mathrm{ion}}}\, E_{\mathrm{lab}} \tag{1}$$

where mgas and mion are the masses of the inert gas and molecular ion, respectively.11 The probability distribution of internal energy P(Eint) is a key quantity in the outcome of the CID experiment because fragmentations can occur only if Eint is sufficient to traverse the potential energy surface (PES) to reach the dissociation limit.
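As a small worked illustration of eq 1, the sketch below converts a lab-frame collision energy into the center-of-mass energy. The function name and the example values (argon as the gas, an ion of about 149 Da, 20 eV lab-frame energy) are assumptions chosen for illustration only.

```python
def center_of_mass_energy(e_lab: float, m_gas: float, m_ion: float) -> float:
    """Eq 1: portion of the lab-frame collision energy available in the
    center-of-mass frame. Masses may be in any common unit (here Da)."""
    return m_gas / (m_gas + m_ion) * e_lab


# Illustrative values (assumptions): argon collider, ion of ~149 Da, 20 eV lab frame.
M_ARGON = 39.948  # Da
e_com = center_of_mass_energy(e_lab=20.0, m_gas=M_ARGON, m_ion=149.04)
print(f"E_com = {e_com:.2f} eV")
```

For a fixed lab-frame energy, a heavier ion leaves a smaller fraction of the energy available in the center-of-mass frame.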

Except at very low collision energies, the energy transfer step happens rapidly, on the order of 10⁻¹⁵ to 10⁻¹⁴ seconds. A vibro-rotational excitation and reactive processes are used to explain the energy transfer process, in which the target collides with a portion of the molecular ion, and the recoil energy is converted into internal energy, all in the electronic ground state. If the amount of energy transfer is great enough to break chemical bonds, the ion consequently decomposes into fragments, A+ and B. Decomposition is typically described as a separate step that takes place after the collisional activation step, as shown:

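$$[\mathrm{M+H}]^{+} + \mathrm{G} \rightarrow [\mathrm{M+H}]^{+*} + \mathrm{G} \tag{2}$$
$$[\mathrm{M+H}]^{+*} \rightarrow \mathrm{A}^{+} + \mathrm{B} \tag{3}$$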

where [M + H]+* denotes the collisionally activated ion. In the absence of CID, only a very small portion (~1%) of the precursor ions, called metastable ions, will dissociate in the time required to record the fragment ions. CID greatly increases the efficiency of fragmentation beyond this baseline and thereby greatly aids compound identification.

CID can be divided into two energetic regimes: high-energy and low-energy.12–14 In high-energy CID, the kinetic energy of the precursor ion in the laboratory frame prior to collisions is within the keV range. At high energy, a collision has an increasing probability of electronic excitation of the precursor ion, and electronic, vibrational, and rotational degrees of freedom are all able to contribute to dissociation. Only one or two collisions are needed, and only a few eV of the input energy is converted to Eint. Moreover, P(Eint) depends only weakly on the collisional energy, although the distribution width tends to increase with collision energy. High-energy CID-MS/MS spectra often include abundant undecomposed molecular ion peaks. Depending on the molecular structure of small molecules, formation of some product ions requires high energy. On the other hand, low-energy CID involves lab frame kinetic energies of <100 eV. Because the transfer of momentum plays an important role, larger gas molecules such as nitrogen yield more efficient energy transfer compared to smaller species like helium. In contrast to a small number of collisions made in high-energy CID, a precursor ion is activated by multiple collisions in low-energy CID. Here, a precursor ion makes approximately 10 collisions on average while traveling through the collision cell within 20 μs, and as many as 100 collisions in ion-trap CID over the residence time of ~5 ms in the ion trap. Overall, there are many possible pathways for collisional activation and fragmentation across the diverse configurations of CID experiments, which have been developed in order to maximize analytical capabilities.

Accurate prediction of CID-MS/MS spectra from chemical structures remains challenging. Computational approaches such as statistical rate theory calculations and MD simulations have been applied.15–23 While they successfully studied specific cases of mechanisms, they were not tested for generalizability. Other methods predict CID-MS/MS spectra from chemical structures using first-principles methods, notably VENUS/NWCHEM and QCxMS. VENUS/NWCHEM facilitates QM/MM and can be applied to the study of unimolecular and bimolecular reactions, gas-surface collisions, and post-transition state dynamics.15 QCxMS performs both quantum chemistry electron ionization mass spectrometry (QCEIMS)24–26 and quantum chemistry collision-induced dissociation mass spectrometry (QCCIDMS) as stand-alone software.27 QCEIMS methods were found to achieve average mass spectral similarity scores of better than 600, including tests for specific compound classes,24–26 but have not been studied comprehensively on multiple compound classes. QCCIDMS was showcased on ten protonated organic compounds27–29 and on four deprotonated molecules.29 Benchmarking was based on the presence of predicted ions in experimental spectra, usability to develop fragmentation pathways, and the ability to showcase both positively and negatively charged molecules.27–29 Unfortunately, QCCIDMS was not evaluated by calculating mass spectral similarity between predicted and experimental ion abundances, reasoning that experimental designs of MS/MS instruments differed too much to reasonably expect high overall similarities between in silico predicted and experimental spectra.29

However, in metabolomics studies, MS/MS libraries routinely show high similarities across different mass spectrometers (such as QTOF or Orbitrap instruments). We therefore developed the novel CIDMD framework (Collision-Induced Dissociation via Molecular Dynamics) to test the usability of quantum-chemistry based MS/MS predictions in untargeted metabolomics. CIDMD performs collision-induced-dissociation molecular dynamics to model the physical process in CID tandem mass spectrometry. We tested CIDMD on 12 molecules of different sizes and chemical classes, specifically, small acids (<131 Da), small cyclic molecules (<150 Da), noncyclic molecules (<150 Da), and larger cyclic molecules (>150 Da). MS/MS similarities to experimental spectra were benchmarked qualitatively and quantitatively by generating 261 CIDMD in silico mass spectra that comprehensively explored different parameter settings. We demonstrate for the first time that quantum-chemistry-based MS/MS spectra, known for enhancing the understanding of fragmentation mechanisms, can achieve average entropy similarity scores above 600 and exceed 800 with optimized parameters.

2. MATERIALS AND METHODS

The CIDMD framework involves preparing molecular ions, setting up and running the MD simulations of molecular collisions, generating in silico mass spectra, and comparing these spectra to reference experimental data.

2.1. Preparing the Molecular Ion.

In positive electrospray, the cations are analyzed. We restricted the framework to model protonated ions, which are the most frequent molecular ion species observed in mass spectrometry. Depending on the molecular structure, multiple protonation sites can exist, leading to different protomers being calculated for each molecule. While the most likely protonation sites are often heteroatoms, we also analyzed unlikely protonation sites on some carbon atoms as negative controls. Protonated molecular ion structures were constructed manually using Avogadro software.30 The 3D shape of ion structures was then optimized using TeraChem31 with an unrestricted B3LYP density functional32 and the 6–31G* basis set.33

2.2. Setting Up the CIDMD System.

CIDMD simulations are designed to simulate the internal motions of a molecule, immediately following a collision. Therefore, we developed a program to generate the initial conditions of the simulation containing only the molecular ion and one target gas molecule, here called the “collider”. The inertial frame was chosen such that the molecular ion center of mass is at rest (called the MD frame). In the MD frame, trajectories of the collider atom (in the absence of an actual collision) were modeled as a uniform probability distribution of straight lines, some of which intersected with the molecular envelope. To generate a sample from the uniform distribution of straight lines, the program operates in either a random or deterministic mode.

  1. In the random mode, a random point is generated inside of a cubic box with a side-length of 100 Å and with the molecular ion at the center, and a direction vector is generated by choosing a random vector within the unit sphere then normalizing it. By calculating the distance from the line to each atomic center, the minimum perpendicular distance to any atom in the molecular ion is determined, and the trajectory is accepted for simulation if this distance is less than a b factor cutoff parameter (here called the b cutoff) (Figure S2). In this way, we generated any number of initial conditions by repeating the random trials until the desired number of accepted trajectories was reached.

  2. In the deterministic mode, straight line trajectories were constructed using points taken from a uniform Cartesian grid while direction vectors were selected from a Lebedev quadrature grid.34 The number of initial conditions is a function of other parameters such as the grid spacing, the Lebedev grid order and the b cutoff (the distance between the collider and the closest atom of molecular ion, not the center of mass of the target molecular ion).
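The random mode described above can be sketched as a simple rejection sampler. This is a minimal illustration, not the authors' program: the function name, the return format, and the use of the geometric center of the atoms to place the box are assumptions.

```python
import numpy as np


def sample_trajectories(atom_xyz, n_accept, b_cutoff, box=100.0, seed=0):
    """Rejection-sample straight-line collider paths that pass within
    b_cutoff (in Å) of at least one atom of the molecular ion."""
    rng = np.random.default_rng(seed)
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    center = atom_xyz.mean(axis=0)  # assumption: box centered on the geometric center
    accepted = []
    while len(accepted) < n_accept:
        # Random point inside a cube of side `box` with the ion at its center.
        point = center + rng.uniform(-box / 2, box / 2, size=3)
        # Random vector inside the unit sphere, then normalized to a direction.
        vec = rng.uniform(-1.0, 1.0, size=3)
        norm = np.linalg.norm(vec)
        if norm == 0.0 or norm > 1.0:
            continue
        direction = vec / norm
        # Perpendicular distance from each atom to the line point + t * direction.
        rel = atom_xyz - point
        perp = rel - np.outer(rel @ direction, direction)
        b_min = np.linalg.norm(perp, axis=1).min()
        if b_min < b_cutoff:
            accepted.append((point, direction, b_min))
    return accepted


# Hypothetical three-atom geometry (Å), used only to exercise the sampler.
atoms = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 1.1, 0.3]]
trajectories = sample_trajectories(atoms, n_accept=3, b_cutoff=0.5)
```

Drawing the direction from inside the unit sphere before normalizing avoids the bias toward cube corners that normalizing a vector drawn from the cube would introduce.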

We used multiple combinations of parameters per CIDMD test: (i) distance between the collider and the closest atom of the molecular ion (b cutoff, varied from 0.2 to 5 Å), (ii) velocity (vfac, ranged from 1 to 12), (iii) number of colliders (np, varied from 50 to 400), and (iv) reaction time (1 to 5 ps). The collisional geometry follows from the chosen parameters. Following the sampling of straight lines, the initial positions of colliders were set back by 5 Å from the point of closest approach to the molecular ion. The initial velocities from straight lines were set equal to the direction vector multiplied by a scale factor. Here, we used AKMA units for velocity, where 1 AKMA velocity unit = 2045.482 m/s (AKMA units: length in Å, mass in atomic mass units, time in units of 4.888821 × 10⁻¹⁴ s). By setting the initial velocity parameter, one defines the MD frame kinetic energy, which can be converted into the center-of-mass kinetic energy or lab frame kinetic energy by substituting the masses into eq 1. Initial conditions of molecular ions were approximated using the energy minimized geometry with zero initial velocity. Once positions and velocities of the colliders were defined, CIDMD simulations were performed at unrestricted B3LYP/6–31G* in the microcanonical (NVE) ensemble with a time step of 1.0 fs utilizing TeraChem.31 The simulation trajectories were used as an input to the analysis package.
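The relation between the velocity parameter and the three kinetic energies can be sketched as follows. The sketch assumes that vfac is the collider speed expressed directly in AKMA velocity units and that the lab frame corresponds to a moving ion hitting a gas atom at rest with the same relative speed; the function name and the ion mass are illustrative.

```python
AKMA_VELOCITY = 2045.482    # m/s per AKMA velocity unit (value quoted in the text)
AMU = 1.66053906660e-27     # kg per atomic mass unit
EV = 1.602176634e-19        # J per eV


def collision_energies(vfac, m_gas, m_ion):
    """Return (E_MD, E_com, E_lab) in eV for a collider of mass m_gas (Da)
    approaching an ion of mass m_ion (Da) at rest with speed vfac (AKMA units)."""
    v = vfac * AKMA_VELOCITY
    half_v2 = 0.5 * v * v * AMU / EV          # kinetic energy in eV per Da of mass
    e_md = m_gas * half_v2                    # MD frame: only the collider moves
    e_com = m_gas * m_ion / (m_gas + m_ion) * half_v2   # reduced mass times v^2 / 2
    e_lab = m_ion * half_v2                   # lab frame: only the ion moves
    return e_md, e_com, e_lab


# Argon collider and an ion of ~149 Da (assumed mass for protonated citramalic acid).
e_md, e_com, e_lab = collision_energies(vfac=8, m_gas=39.948, m_ion=149.04)
print(f"E_MD = {e_md:.1f} eV, E_com = {e_com:.1f} eV, E_lab = {e_lab:.1f} eV")
```

Under these assumptions, vfac 8 gives a lab-frame energy of roughly 207 eV for a 149 Da ion, which appears consistent with the 206 eV modeled collision energy quoted for vfac 8 in the Figure 3 caption. The returned E_com and E_lab satisfy eq 1 by construction.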

2.3. Analyses.

We developed an analysis package that determined the accurate mass and charge of molecular fragments that were formed in the simulation trajectories and converted the frequencies of the observed ions into a simulated mass spectrum. The exact mass of the most abundant isotopic species was calculated to a precision of 0.01 mDa in order to differentiate molecular formulas. Fragments were identified via automated analysis of the time series of Mayer bond orders35 generated during the CIDMD simulation.36–38 The raw time series were low-pass-filtered with an upper frequency cutoff corresponding to 200 cm⁻¹, then discretized by applying a threshold of 0.2. Fragments were required to exist for at least 50 fs to be recognized as stable. The charges on the fragments at each time step were estimated by taking the sum of the Mulliken populations over the atoms in the fragment. A fragment was considered to be charged if the total charge averaged over the existence of the fragment exceeded a threshold of 0.70.
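The fragment assignment can be illustrated with a simplified sketch that groups atoms by thresholded bond order and tests the time-averaged fragment charge. It is not the authors' analysis package: the low-pass filter and the 50 fs lifetime criterion are omitted, the function names are hypothetical, and it is assumed that a bond order above 0.2 counts as bonded and that per-atom Mulliken partial charges are supplied.

```python
import numpy as np


def find_fragments(bond_order, threshold=0.2):
    """Connected components of the graph whose edges are atom pairs with
    bond order above `threshold` (assumed meaning of the discretization)."""
    bonded = np.asarray(bond_order, dtype=float) > threshold
    n = bonded.shape[0]
    seen = [False] * n
    fragments = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        while stack:                      # depth-first search over bonded atoms
            i = stack.pop()
            members.append(i)
            for j in np.flatnonzero(bonded[i]):
                if j != i and not seen[j]:
                    seen[j] = True
                    stack.append(int(j))
        fragments.append(sorted(members))
    return fragments


def is_charged(fragment, charges, charge_threshold=0.70):
    """charges has shape (n_frames, n_atoms); the fragment charge is summed
    over its atoms in each frame and then averaged over the frames."""
    charges = np.asarray(charges, dtype=float)
    return charges[:, fragment].sum(axis=1).mean() > charge_threshold


# Hypothetical four-atom example: atoms 0-1 and 2-3 remain bonded, bond 1-2 is broken.
bo = np.array([[0.0, 1.0, 0.0, 0.0],
               [1.0, 0.0, 0.05, 0.0],
               [0.0, 0.05, 0.0, 0.9],
               [0.0, 0.0, 0.9, 0.0]])
q = np.array([[0.50, 0.40, 0.05, 0.05],
              [0.45, 0.40, 0.10, 0.05]])
frags = find_fragments(bo)
```

In this toy case the first fragment carries most of the charge and would be counted as an ion, while the second would be treated as a neutral loss.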

All in silico mass spectra were compared with available experimental data both quantitatively and qualitatively. Three different types of similarity scores were calculated for each spectrum in comparison to the experimental spectra from the authoritative NIST20 spectral database. If the NIST repository included multiple MS/MS spectra per molecule, similarity scores were calculated for each comparison. We used cosine (cos), mass-weighted dot (Wdot), and entropy scores as follows:39–41

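$$\cos = \frac{\left(\sum I_U I_L\right)^2}{\sum I_L^2 \, \sum I_U^2} \tag{4}$$

where $I$ = [Peak Intensity], $U$ = unknown data (a.k.a. theoretical data), and $L$ = library data (a.k.a. NIST data);

$$W_{\mathrm{dot}} = \frac{\left(\sum W_U W_L\right)^2}{\sum W_L^2 \, \sum W_U^2}, \qquad \text{entropy similarity} = 1 - \frac{2 S_{AB} - S_A - S_B}{\ln 4} \tag{5}$$

where $W = [\text{Peak Intensity}]^{0.6}\,[\text{Mass}]^{0.3}$, and

$$S = -\sum_{p=1}^{n} I_p^{w} \ln I_p^{w}, \qquad w = \begin{cases} 1 & (S \geq 3) \\ 0.25 + S \times 0.25 & (S < 3) \end{cases} \tag{6}$$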
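The three scores can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions: spectra are dictionaries mapping already-binned m/z values to intensities, peaks match only when their m/z keys are identical, intensities are normalized to unit sum before the entropy is computed, and S_AB is taken as the entropy of the 1:1 mixture of the two weighted spectra. Function names are hypothetical.

```python
import math


def _normalize(spec):
    total = sum(spec.values())
    return {mz: i / total for mz, i in spec.items()}


def _entropy(spec):
    return -sum(p * math.log(p) for p in spec.values() if p > 0)


def _weight(spec):
    """Reweight normalized intensities as I**w (eq 6) and renormalize."""
    spec = _normalize(spec)
    s = _entropy(spec)
    w = 1.0 if s >= 3 else 0.25 + 0.25 * s
    return _normalize({mz: p ** w for mz, p in spec.items()})


def cosine(u, lib, weight=lambda mz, i: i):
    """Eq 4 as written (squared numerator); `weight` transforms each peak."""
    mzs = set(u) | set(lib)
    wu = {mz: weight(mz, u.get(mz, 0.0)) for mz in mzs}
    wl = {mz: weight(mz, lib.get(mz, 0.0)) for mz in mzs}
    num = sum(wu[mz] * wl[mz] for mz in mzs) ** 2
    den = sum(x * x for x in wu.values()) * sum(x * x for x in wl.values())
    return num / den if den else 0.0


def weighted_dot(u, lib):
    """Eq 5, with W = intensity**0.6 * mass**0.3 as stated in the text."""
    return cosine(u, lib, weight=lambda mz, i: i ** 0.6 * mz ** 0.3)


def entropy_similarity(a, b):
    """Eq 5: 1 - (2*S_AB - S_A - S_B) / ln 4."""
    a, b = _weight(a), _weight(b)
    mixed = {mz: 0.5 * (a.get(mz, 0.0) + b.get(mz, 0.0)) for mz in set(a) | set(b)}
    return 1.0 - (2 * _entropy(mixed) - _entropy(a) - _entropy(b)) / math.log(4)


# Hypothetical peak lists (m/z: intensity), for illustration only.
predicted = {43.018: 100.0, 85.029: 40.0, 103.039: 15.0}
library = {43.018: 100.0, 85.029: 55.0, 131.034: 10.0}
scores = {name: 1000 * f(predicted, library)
          for name, f in [("cos", cosine), ("wdot", weighted_dot),
                          ("entropy", entropy_similarity)]}
```

Multiplying by 1000 gives scores on a 0 to 1000 scale; identical spectra score 1000 and spectra without shared peaks score 0 under all three measures.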

While similarity scores can give a quantitative overall assessment of accuracy, they can obscure details in the comparison of theoretical and experimental spectra and thus are not sufficient to get a full understanding of the advantages and drawbacks of the simulation method. To further assist our investigation, we compared theoretical vs experimental spectra visually, including the presence and absence of fragment ions. The visualization of head-to-tail graphs enables careful examination of specific fragmentations to gain insights into possible fragmentation pathways and detailed mechanisms. The analysis package calculates all three similarity scores for each of the given reference mass spectra and creates head-to-tail graphs.

We employed the nudged-elastic band (NEB) method to find the minimum energy reaction path for citramalic acid molecular ion m/z 149 to fragments with m/z 103 and m/z 43. To create the 20 input structures, including the initial and final coordinates on an interpolated path, we used the geomeTRIC software. We performed NEB with optimization at each step at the (u)B3LYP/6–31G level using the geomeTRIC software,42 which engaged the Psi4 program.43 Using 100 iterations with a spring constant of 0.1, NEB was applied to the intermediate structures between each set of initial and final states. Frequency analysis was performed using TeraChem31 with DFT (u)B3LYP/6–31G*(* for the cyclic molecules). Visualization of molecules was done with VMD software.44

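We further analyzed the trajectories by separating the total kinetic energy (KE) of the molecular ion and fragments into classical translational, rotational, and vibrational components (Figure S3). This analysis used bond-order-based analysis to assign atoms to the existing molecular fragments in the trajectory frame. The translational KE of a fragment was calculated as $\mathrm{KE}_{\mathrm{trans}} = \tfrac{1}{2} m v_{\mathrm{COM}}^2$, where $m$ is the fragment mass and $v_{\mathrm{COM}}$ the center-of-mass speed in the MD frame. The rotational KE was calculated as

$$\mathrm{KE}_{\mathrm{rot}} = \tfrac{1}{2}\, \boldsymbol{\omega} \cdot \mathbf{I} \boldsymbol{\omega} \tag{7}$$
$$\mathbf{I} = \sum_{i \in \mathrm{frag.}} m_i \left[ (\mathbf{r}_i \cdot \mathbf{r}_i)\, \mathbf{1} - \mathbf{r}_i \otimes \mathbf{r}_i \right] \tag{8}$$
$$\boldsymbol{\omega} = \mathbf{I}^{-1} \mathbf{L} \tag{9}$$
$$\mathbf{L} = \sum_{i \in \mathrm{frag.}} m_i\, \mathbf{r}_i \times \mathbf{v}_i \tag{10}$$

where $\boldsymbol{\omega}$ and $\mathbf{L}$ are the angular velocity and angular momentum of the fragment, respectively, $\mathbf{I}$ the moment of inertia tensor, $\mathbf{r}_i$ and $\mathbf{v}_i$ the atomic position and velocity vectors in the center-of-mass frame of the molecular fragment, the symbols ·, × and ⊗ denote the inner product, cross product and outer product, respectively, and the sums are taken over atoms belonging to the fragment. The vibrational KE is then defined as $\mathrm{KE}_{\mathrm{vib}} = \mathrm{KE}_{\mathrm{tot}} - \mathrm{KE}_{\mathrm{trans}} - \mathrm{KE}_{\mathrm{rot}}$, where $\mathrm{KE}_{\mathrm{tot}} = \sum_{i \in \mathrm{frag.}} \tfrac{1}{2} m_i v_i^2$ is calculated using the atomic velocities in the MD frame.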
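The decomposition in eqs 7 to 10 can be sketched with NumPy for a single fragment in a single frame. The function name and the test geometry are illustrative assumptions, and a pseudo-inverse is used in place of the plain inverse of eq 9 so that linear or single-atom fragments, whose inertia tensor is singular, do not raise an error.

```python
import numpy as np


def kinetic_energy_components(masses, positions, velocities):
    """Split the kinetic energy of one fragment into translational,
    rotational and vibrational parts (eqs 7-10). Units follow the inputs."""
    m = np.asarray(masses, dtype=float)
    r = np.asarray(positions, dtype=float)
    v = np.asarray(velocities, dtype=float)
    m_tot = m.sum()
    r_com = (m[:, None] * r).sum(axis=0) / m_tot
    v_com = (m[:, None] * v).sum(axis=0) / m_tot

    ke_tot = 0.5 * np.sum(m * np.einsum("ij,ij->i", v, v))   # MD-frame velocities
    ke_trans = 0.5 * m_tot * (v_com @ v_com)

    rc, vc = r - r_com, v - v_com                            # center-of-mass frame
    inertia = (np.einsum("i,ij,ij->", m, rc, rc) * np.eye(3)
               - np.einsum("i,ij,ik->jk", m, rc, rc))        # eq 8
    ang_mom = np.sum(m[:, None] * np.cross(rc, vc), axis=0)  # eq 10
    omega = np.linalg.pinv(inertia) @ ang_mom                # eq 9 (pseudo-inverse)
    ke_rot = 0.5 * omega @ inertia @ omega                   # eq 7
    ke_vib = ke_tot - ke_trans - ke_rot
    return ke_trans, ke_rot, ke_vib


# Hypothetical rigid body: pure rotation about z plus a uniform drift.
masses = np.ones(4)
positions = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
com = positions.mean(axis=0)
velocities = np.cross([0.0, 0.0, 2.0], positions - com) + np.array([1.0, 2.0, 3.0])
ke_trans, ke_rot, ke_vib = kinetic_energy_components(masses, positions, velocities)
```

For a rigid body such as this test case the vibrational term vanishes to numerical precision, which is a convenient sanity check of the decomposition.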

For this study, multiple types of graphics processing units (GPUs) were used for the MD simulations and result analysis: NVIDIA GeForce RTX 2080 Ti, GeForce GTX 1080 Ti, and GeForce GTX 980 Ti.

3. RESULTS

3.1. Accuracy of CIDMD Predictions of MS/MS Spectra of 12 Metabolites.

A simplified workflow of the CIDMD procedure is depicted in Figure 1. To develop the CIDMD framework and assess the quality of simulated CID-MS/MS spectra, 12 example molecules were selected that varied in size, structure, and function. The molecules were placed into four categories: (1) small cyclic molecules (<150 Da), comprising uracil, 4,6-dihydroxypyrimidine, aniline, and ribono-γ-lactone; (2) noncyclic small acids (<131 Da), comprising itaconic acid, citraconic acid, mesaconic acid, and sorbic acid; (3) other noncyclic molecules (<150 Da), comprising taurine and citramalic acid; and (4) larger cyclic molecules (>150 Da), comprising bufotenine and psilocin (Figure 2). Calculating CID spectra for low molecular weight compounds (<205 Da) kept computational costs manageable while enabling comparisons between predicted and experimental MS/MS mass spectra in the NIST20 database. We selected molecules with CID-MS/MS spectra that showed at least three abundant fragment ions but few additional low abundance ions and few clusters of fragment ions that indicate complex fragmentation mechanisms. We restricted our calculations to protonated molecules [M + H]+, which are the most frequent ion adduct species found in MS/MS spectra for positive electrospray ionization experiments.

Figure 1.

CIDMD framework to produce an in silico CIDMD spectrum. A selected protomer of a molecule undergoes geometry optimization, selections of CIDMD parameters, and collisional MD with the various sets of CIDMD parameters.

Figure 2.

Distribution of CIDMD predicted versus experimental MS/MS spectra for 261 CIDMD test systems. (A) Histograms of MS/MS similarity scores for all 261 systems: cosine (Cos), weighted-dot-product (Wdot) and entropy similarity. (B) Histograms of MS/MS entropy similarity scores for all 12 test compounds with varying collision parameters and protonation positions. (C) Structures of all 12 molecules categorized into four groups: small acids (<131 Da, top left), noncyclic and small cyclic molecules (<150 Da, bottom left and top right), and larger cyclic molecules (>150 Da, bottom right).

For each test molecule, we generated all possible protomers by adding protons to each heteroatom. We then applied the CIDMD procedure with multiple combinations of parameters per system (distance between molecular ion and collider (b cutoff), velocity, number of colliders, and reaction time). For each system, defined by one set of parameters, 200 MD trajectories were run, with additional trajectories run for citramalic acid, which was used for parameter set optimization. A total of 261 systems were simulated with CIDMD, resulting in 261 in silico CID-MS/MS spectra.

All in silico MS/MS spectra were compared to the corresponding experimental spectra from the NIST20 database with respect to numerical similarity (cosine (Cos), mass-weighted dot score (Wdot), and entropy similarity) (Figure 2); scores were scaled from 0 (no matching ions) to 1000 (identical MS/MS spectra). Precursor ions were removed from similarity calculations to focus on the prediction of fragmentation reactions and to avoid bias for low-energy collisions that would not induce fragmentations. Each molecule was tested with a different number of parameter settings, driven by the number of possible protomers, likely combinations of collision parameters, and computational time. Overall, 4–23 parameter settings were tested for each of the 12 test molecules. The distribution of similarity scores (Figure 2A) showed that entropy similarity had higher average matching scores and less variance than Cos or Wdot calculations, with entropy similarity at 624 ± 189 compared to 576 ± 349 and 572 ± 332 for Cos and Wdot, respectively (Table S1). Interestingly, entropy similarity also showed a 4-fold reduced number of complete mismatches (scores <100) (Figure S4). Entropy similarity scores were also more informative for two reasons. Cos scores penalize fragment ion mismatches, whether from a missing peak or an extra peak, less than the entropy scoring scheme does. Wdot favors larger m/z values, which is also misleading because such weighting favors trivial fragmentations like water losses. Hence, while entropy similarity had far fewer spectra with near-perfect matches (scores >900) than Cos or Wdot calculations (Figure 2A), entropy similarity scores better reflect a realistic estimate of overall matching.

Next, we investigated if this overall lower variance in entropy similarity scores was also found for CIDMD predictions for each individual molecule, not just for all 12 molecules combined (Figure 2B, Table 1). Small acids have overall high average scores and small standard deviations (Table 1). In particular, mesaconic acid and sorbic acid have the smallest deviations because the CIDMD predicted mass spectra of most of their protomers, under most parameter settings, were highly similar to the corresponding experimental mass spectra. Aniline and 4,6-dihydroxypyrimidine showed the largest variation (Table 1). The large similarity score variance for aniline, bufotenine, and psilocin is clear from Figure 2B, and it is mainly due to protonation effects rather than simulation parameters.

Table 1.

Results of MS/MS Predictions by CIDMD for 12 Selected Molecules Tested for 58 Protomers in 261 CIDMD Systems, with Basic Chemical Information, Number of Protomers and CIDMD Test Systems, Average Entropy Similarity Scores with Standard Deviation, Best Entropy Similarity Scores, and Optimum CIDMD Parameters

| molecular class | name (abbreviation) | formula | MW | # protomers | total # systems | entropy similarity avg ± SD | best score | optimum parameter set |
|---|---|---|---|---|---|---|---|---|
| small acids (<150 Da) | citraconic acid (CC) | C5H6O4 | 130 | 5 | 14 | 624 ± 159 | 882 | vfac 4, cut 05 |
| | itaconic acid (IA) | C5H6O4 | 130 | 5 | 20 | 784 ± 119 | 945 | vfac 4, cut 05 |
| | mesaconic acid (MC) | C5H6O4 | 130 | 5 | 9 | 779 ± 48 | 855 | vfac 4, cut 05 |
| | sorbic acid (SA) | C6H8O2 | 112 | 2 | 7 | 745 ± 76 | 805 | vfac 4, cut 05 |
| noncyclic (<150 Da) | citramalic acid (CA) | C5H8O5 | 148 | 5 | 46 | 582 ± 170 | 827 | vfac 8, cut 05 |
| | taurine (TR) | C2H7NO3S | 125 | 5 | 22 | 497 ± 225 | 775 | vfac 4, cut 05 |
| small cyclic (<150 Da) | 4,6-dihydroxypyrimidine (DP) | C4H4N2O2 | 112 | 6 | 23 | 662 ± 326 | 849 | vfac 4, cut 25 |
| | high energy uracil (UH) | C4H4N2O2 | 112 | 6 | 38 | 716 ± 117 | 913 | vfac 4, cut 03 |
| | low energy uracil (UL) | C4H4N2O2 | 112 | 6 | 33 | 685 ± 168 | 870 | vfac 5, cut 05 |
| | aniline (AN) | C6H7N | 93 | 2 | 4 | 338 ± 326 | 687 | vfac 6, cut 05 |
| | ribono-γ-lactone (RL) | C5H8O5 | 148 | 5 | 37 | 472 ± 119 | 683 | vfac 4, cut 01 |
| large cyclic (>150 Da) | bufotenine (BF) | C12H16N2O | 204 | 3 | 4 | 566 ± 185 | 847 | vfac 8, cut 05 |
| | psilocin (PS) | C12H16N2O | 204 | 3 | 4 | 598 ± 199 | 869 | vfac 8, cut 05 |

3.2. Impact of Parameter Settings on CIDMD Predictions.

Using citramalic acid, we investigated in detail which sets of CIDMD parameters were specifically useful to match the experimental data. Figure 3 shows how strongly collision-induced fragmentation depends on the parameters used, both experimentally and computationally. We visualized the fragmentation breakdown to investigate the change in product ion abundances as a function of collision energy.45 Figure 3A shows the breakdown curve for citramalic acid by using experimental mass spectra from the NIST20 database. This analysis identified characteristic fragment ions for citramalic acid such as m/z 43.018, 85.029, 103.039, and 131.034. For example, m/z 43.018 is notable because its intensity increases rapidly with collision energy and remains at its maximum at all collision energies above 20 eV, where it dominates the experimental spectra as the base peak ion. Even low energy collisions in experimental mass spectra of citramalic acid, ramping from 4 to 20 eV, drastically changed the overall fragmentation in QTOF instruments in favor of a high abundance of m/z 43 (Figure 3B).

Figure 3.

Example spectra for citramalic acid under experimental conditions in a QTOF mass spectrometer (A, B) compared to modeled spectra by CIDMD predictions (C, D). (A) Breakdown graph of citramalic acid shown across 4–45 eV collision energies extracted from multiple NIST20 experimental spectra. (B) QTOF MS/MS spectra from 4 to 20 eV collision energy (CElab). (C) CIDMD spectra for four protomers under 206 eV modeled collision energy (vfac 8). (D) CIDMD spectra for one protomer modeled at different collision energies (CEmod).

In comparison, for theoretically predicted spectra using CIDMD, we observed large differences in fragmentation for different protomers (Figure 3C). Citramalic acid has five oxygen atoms, and clearly, each protomer produced drastically different mass spectra, with the exception that case 3 and case 4 converged to the same protonation state in two slightly different conformers after energy minimization. Such results can be used to explain which protomers may be present in a particular experimental setting because fragment ions must retain the charge to be detected in a mass spectrometer. Consequently, fragmentation substructures are heavily dependent on the protomer precursor ion. Last, we found that CIDMD models produced less fragmentation under high energy simulations than under low energy simulations (Figure 3D). Specifically, fragment ions m/z 43 and m/z 45 were much reduced in abundance, while precursor ion m/z 149 was even increased in relative intensity compared to lower energy models. This finding was counterintuitive and does not match the increase of abundance in low m/z fragment ions in experimental mass spectra, which showed m/z 43 as the base peak ion and the absence of the molecular ion (Figure 3B, CElab = 20). When we investigated individual trajectories in CIDMD models, we found that collider atoms in high energy models frequently knocked off single atoms in citramalic acid, instead of distributing the vibrational energy throughout the molecule that would lead to bond breakage. Hence, both the energy trend in CIDMD models and their absolute value (in eV) do not correspond to experimental MS/MS in QTOF mass spectrometers.

Figure 4 shows a matrix that compares the CIDMD prediction accuracy to experiments for citramalic acid using different simulation parameters and experimental energies. First, patterns between Cos, Wdot and entropy similarity scores were very similar for each set of CIDMD parameters, clearly showing that protomer cases 3 and 4 yielded much lower similarity to experimental spectra than protomers 1, 2, and 5 (Figure 4). This finding indicates that protomers 3 and 4 are unlikely to make up a significant part of the population of all molecules under QTOF MS/MS experimental conditions. For all protomers and across the other tested simulation parameters, lower experimental collision energies always yielded the best similarity scores (Figure 4). Experimental spectra involving >10 eV QTOF MS fragmentations were not well predicted.

Figure 4.

Optimizing the selection of CIDMD parameters by matching to experimental MS/MS spectra for citramalic acid. (A) Cosine similarity calculations. (B) Weighted dot score similarity. (C) Entropy similarity. Color scales from 0 (dark red, very poor similarity) to 1000 (dark blue, very good similarity). Y-axis presents experimental spectra for citramalic acid on a QTOF mass spectrometer from 4 to 45 eV collision energy (NIST20 library). X-axis presents the different CIDMD parameters tested. Cut = b cutoff factor, vfac = scaled velocity, np = number of colliders, longMD = reaction time >1 ps.

For one specific citramalic acid protomer (case 1, protonated at the carboxyl-oxygen), we broadly tested the effects of the CIDMD parameters. First, at very low modeled collision energies (velocity factor or “vfac” 2), generally no fragmentation was observed (data not shown). Second, for some parameter combinations, we performed replicate sets of simulations, each with 200 trajectories. These parameter combinations are indicated by −1 and −2 in the set names. These replicates yielded similar but not identical spectra and similarity matrices, which can be expected due to the statistical noise in the finite number of randomized initial conditions, giving us a lower bound for explaining apparent differences between different sets of simulations. Exploring the overall patterns further, we found that both very low and very high energies (vfac 2 and 4 versus 10 and 12) yielded lower similarities to experimental QTOF MS/MS spectra than did vfac 6 and 8 sets. We also tested extended MD simulations, following trajectories for up to 5 ps instead of 1 ps. Overall, changes for long-MD simulations were similar to those for shorter simulations, with a tendency to produce worse similarity scores. When investigating the b cutoff from 0.2 to 2.0 Å, we found that 0.5 Å worked the best in terms of similarity scores (Figure 4, Figure S5A). Last, we studied the impact of initial conditions of the MD simulations by varying the size of the data set from 50 to 400 argon atom colliders. We found that utilizing 200 collider atoms was the best compromise between statistical precision and computational cost (Figure 4, Figure S5B). Hence, we then tested other citramalic acid protomers with a reduced set of parameters, focusing on vfac 6–9 sets at 200 collider atoms with a 0.5 Å b cutoff (Figure 4).

When applying the optimized simulation parameters to all 12 test molecules (Figure 2, Table 1), we found that CIDMD calculations of aliphatic acids yielded the highest average entropy similarity scores with the least variance. Specifically, itaconic acid (IA) yielded 784 ± 119 similarity, mesaconic acid (MC) yielded 779 ± 48 similarity, and sorbic acid (SA) yielded 745 ± 76 similarity scores (Figure 2B, Table 1). Scores above 700 are often used as landmark thresholds for compound identifications in metabolomics (refs 21, 26), meaning that these CIDMD calculations would suffice for MS/MS library generations for compounds that cannot be obtained from chemical vendors. Citraconic acid (CC) scores were significantly lower at 624 ± 159 similarity, similar to citramalic acid at 582 ± 170. Scores above 600 similarity can still be regarded as potential hits, but are usually not used for automatic scoring in metabolomics. While many molecules showed specific parameter combinations that yielded similarity scores >600, the average scores for both alicyclic and aromatic molecules were found to be lower than for aliphatic acids. One reason for overall lower similarity scores was that cyclic molecules showed higher intensities of the molecular ion peak in CIDMD, meaning that fewer fragment ions were available in the mass spectra for similarity scoring.
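The entropy similarity scores quoted above can be illustrated with a minimal sketch. It assumes the unweighted spectral-entropy formulation, a simple greedy peak match within an m/z tolerance, and a 0 to 1000 scale; published implementations typically also reweight intensities of low-entropy spectra, which is omitted here. The function names, the tolerance, and the toy peak lists are illustrative assumptions, not the scoring code used in this study.

```python
import math


def normalize(peaks):
    """Scale intensities of (m/z, intensity) pairs so that they sum to 1."""
    total = sum(intensity for _, intensity in peaks)
    return [(mz, intensity / total) for mz, intensity in peaks]


def shannon_entropy(intensities):
    """Shannon entropy (natural log) of a list of normalized intensities."""
    return -sum(p * math.log(p) for p in intensities if p > 0)


def entropy_similarity(spec_a, spec_b, tol=0.01, scale=1000.0):
    """Unweighted entropy similarity between two peak lists (assumed form)."""
    a, b = normalize(spec_a), normalize(spec_b)
    used_b = set()
    merged = []  # intensities of the 1:1 averaged (merged) spectrum
    for mz_a, int_a in a:
        # Greedy match: first unused peak of b within the m/z tolerance.
        partner = next((j for j, (mz_b, _) in enumerate(b)
                        if j not in used_b and abs(mz_a - mz_b) <= tol), None)
        if partner is None:
            merged.append(int_a / 2)
        else:
            used_b.add(partner)
            merged.append((int_a + b[partner][1]) / 2)
    # Peaks of b that found no partner in a.
    merged.extend(int_b / 2 for j, (_, int_b) in enumerate(b) if j not in used_b)
    s_ab = shannon_entropy(merged)
    s_a = shannon_entropy([i for _, i in a])
    s_b = shannon_entropy([i for _, i in b])
    return scale * (1.0 - (2.0 * s_ab - s_a - s_b) / math.log(4))


if __name__ == "__main__":
    # Toy peak lists (hypothetical intensities), precursor ion excluded.
    predicted = [(43.018, 100.0), (103.039, 20.0)]
    experimental = [(43.018, 100.0), (85.029, 10.0)]
    print(round(entropy_similarity(predicted, experimental), 1))
```

With this definition, identical spectra score 1000 and spectra without any shared peak score 0, which matches the color scale used in the similarity matrices.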

A structural isomer of citramalic acid, ribono-γ-lactone (RL, Figure 2) had the lowest average entropy similarity score of 473 ± 119 (Table 1). Overall, the best-performing set of parameters for this molecule was vfac 4 and b cutoff 1, which yielded a score of 683. Vfac 4 corresponds to a low-energy collision, and b cutoff 1 Å corresponds to a more off-center collision trajectory than most other molecules. Ribono-γ-lactone has many C–O bonds that can be cleaved heterolytically, meaning that it is more prone to fragmentation under low-energy collisions than aromatic compounds that are more stable. The 3D structure of ribono-γ-lactone also occupies a comparatively larger 3D space than a flat aromatic compound, making it more likely to have successful fragmentations at off-center collision trajectories than aromatic structures.

When optimal parameter sets were compared, vfac 4 was most often found across different molecules, except for aromatic (more stable) compounds (Table 1). Similarly, b cutoff 05 (a 0.5 Å offset between argon and the colliding atom of the target molecule) was found to be optimal. Hence, we can use these 12 example molecules to derive a tentative recommendation for future CIDMD studies, i.e., to use vfac 4 with a 0.5 Å b cutoff for alicyclic and aliphatic molecules at <200 Da but vfac 8 with a 0.5 Å b cutoff for larger and aromatic molecules. We emphasize, though, that our set of test molecules is too small to give definitive optimum parameters and that even in this small set, exceptions were found, such as citramalic acid (vfac 8 with a 0.5 Å b cutoff).

Similarity scores are useful for rapidly capturing differences between two spectra, but they may also obscure mechanistic details that can be explored by visually comparing head-to-tail graphs. Many NIST20 spectra were recorded across multiple collision energies, each of which may vary in the type and intensity of the product ions. Here, the 4 eV NIST20 mass spectrum matched the CIDMD predicted mass spectrum of citramalic acid protomer case 1 best, as also confirmed by visually examining the details of the head-to-tail graph (Figure 5).

Figure 5.


Head-to-tail graph of citramalic acid. CIDMD predicted mass spectrum (top, magenta) compared to the 4 eV NIST20 experimental spectrum (bottom, blue). Light blue: mechanistic interpretations from CIDMD trajectories.

Figure 6 displays head-to-tail graphs for each molecular ion that have high similarity scores (above 800) with the parameters employed for the particular CIDMD system to generate the in silico spectrum. For all spectra, CIDMD correctly reproduced the base peaks in addition to most other abundant experimentally observed fragment ions. CIDMD also correctly discriminated mass spectra between groups of isomers, citramalic acid versus ribono-γ-lactone, uracil versus 2,4-dihydroxypyrimidine, and three isomeric small acids, itaconic acid versus citraconic acid and mesaconic acid. For the small acids, the typical fragmentation reaction of losing water or carbon monoxide was well predicted.

Figure 6.


Best CIDMD predicted mass spectra (by MS/MS similarity) compared to NIST20 experimental mass spectra. Head-to-tail graphs of CIDMD generated spectra (top, red) and corresponding experimental ones (bottom, blue).

3.3. Mechanistic Studies.

3.3.1. Citramalic Acid Identifier: m/z 43.

The most prominent fragment (base peak ion) that is experimentally observed in a QTOF mass spectrometer for citramalic acid is m/z 43.018 at collision energies of >20 eV (Figure 3A). All other experimental ions had <10% of the base peak abundance at energies of >20 eV, leaving the m/z 43.018 ion as the only dominant fragment ion for the identification of citramalic acid. CIDMD correctly predicted this ion, and we investigated the fragmentation reactions closely in CIDMD simulations to gain mechanistic details of the formation of m/z 43.018. NIST MS interpreter software46 interpreted this ion m/z 43.018 as C2H3O+, annotating the fragment structure as the acetyl (or acetylium) cation, H3CC(+)=O.

We now used CIDMD to test whether this annotation is supported by quantum chemical calculations. A possible reaction pathway for an acetylium product ion is by beta-cleavage fragmentation (Figure 7A). Our CIDMD trajectories support this mechanism for a protonation of the C-1 carboxylic group, involving beta-cleavage (C-1-C-2) after the collider argon atom impacts the O-7 position. This reaction results in carbene (dihydroxycarbene) loss and forms an ion at m/z 103 (Figure 7A). This ion undergoes another beta-cleavage fragmentation (C-2-C-3) within one vibrational period (approximately 60 fs for C–C), yielding the product ion at m/z 43.018 (Figure 7A). By Houk’s definition,47 carbene loss and subsequent beta-cleavage are therefore “dynamically concerted”. This reaction provides an example of one fundamental type of behavior observed in CIDMD trajectories: passage through a nonstatistical intermediate, i.e., a minimum on the PES that does not undergo complete intramolecular vibrational energy redistribution (IVR) before proceeding on to another reaction. Such “hot intermediates” have been discussed in various contexts,48 including the consequences of the momentum they possess as they approach the structure of the PES minimum for their fate (so-called “dynamic matching”).49

Figure 7.


Formation of the acetylium cation from citramalic acid ions. (A) Bond order time series indicates which bonds are formed or cleaved. (B) Fragmentation reaction pathway depicted per bond order time series from a CIDMD trajectory. (C) Different citramalic acid protomers undergo a different fragmentation pathway to yield m/z 43.018, the acetylium cation.

To detail the mechanistic aspects of the beta-cleavage fragmentation itself, we used the Nudged Elastic Band (NEB) method to determine a pathway from the initial to final states of ion structures. We calculated energy profiles for the loss of carbene and the subsequent production of the acetylium cation (Figure 7B). The predicted activation energy of the carbene loss was calculated as approximately 41 kcal/mol, while the activation energy of the second reaction yielding the product ion was predicted to be approximately 13 kcal/mol. Although these activation energies are likely overestimates, given the relatively loose convergence criterion of 10⁻⁵ that was used in the NEB procedure, they still indicate that the mechanism is consistent with experimental conditions. In the NIST library, the 10 eV collision energy in the lab frame translates to 2.07 eV (47.74 kcal/mol) of available energy in the center of mass frame for the CIDMD simulations. This available energy exceeds the calculated reaction energy barrier of 41 kcal/mol for the formation of the ion m/z 43.018 (Figure 7B), supporting the proposed pathway. We further decomposed the total kinetic energy of the system into translational, rotational, and vibrational kinetic energies of each fragmentation (Figure 7C). The pink line of the graph shows the translational kinetic energy of the argon collider atom, 795.2 kcal/mol. After 20 fs, argon collides with the citramalic acid target at the C1-hydroxyl group. Around 75 fs, dihydroxycarbene (CH2O2) is lost with 287.3 kcal/mol of translational energy. The rest of the molecule (C4H7O3+, m/z 103.039) still has 186.3 kcal/mol of vibrational energy to undergo a second reaction. This reaction occurs at approximately 230 fs, losing C2H4O2 and resulting in the acetylium cation, H3CC(+)=O. In summary, our mechanistic investigations confirm that the hypothesized reactions leading to an overall beta-cleavage fragmentation are energetically feasible and present a rational explanation for the observed mass spectrum.
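The lab-frame to center-of-mass conversion mentioned above can be sketched as follows. This is a minimal illustration that assumes the textbook relation E_com = E_lab × m_gas / (m_gas + m_ion) for a stationary collision gas, together with the conversion 1 eV ≈ 23.0605 kcal/mol. The masses below (argon, and a protonated citramalic acid ion of about 149.045 Da) are assumptions made for this example; the exact result depends on the masses and conventions used, so the sketch is not expected to reproduce the quoted 2.07 eV to the last digit.

```python
EV_TO_KCAL_PER_MOL = 23.0605  # 1 eV per particle expressed in kcal/mol


def com_energy_ev(e_lab_ev, m_gas, m_ion):
    """Center-of-mass collision energy for an ion hitting a stationary gas atom."""
    return e_lab_ev * m_gas / (m_gas + m_ion)


def ev_to_kcal_per_mol(energy_ev):
    """Convert an energy in eV to kcal/mol."""
    return energy_ev * EV_TO_KCAL_PER_MOL


if __name__ == "__main__":
    M_AR = 39.948    # argon mass in Da
    M_ION = 149.045  # assumed mass of protonated citramalic acid in Da
    e_com = com_energy_ev(10.0, M_AR, M_ION)
    print(f"E_com = {e_com:.2f} eV = {ev_to_kcal_per_mol(e_com):.1f} kcal/mol")
    # The value quoted in the text, converted with the same factor:
    print(f"2.07 eV = {ev_to_kcal_per_mol(2.07):.2f} kcal/mol")
```

Under these assumptions, the available energy at 10 eV in the lab frame lies above the calculated 41 kcal/mol barrier, in line with the argument in the text.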

Next, we investigated whether one product ion might be produced by more than one reaction. Our CIDMD trajectories revealed that the same ion may indeed result from different fragmentation pathways depending on the protonation site of the molecular ion (Figure 8). For instance, citramalic acid yields m/z 43.018 by different MD pathways. Citramalic acid protomer 1 (Figure 8A) undergoes β-cleavage, yielding the acetylium cation. When citramalic acid was protonated on the carbonyl-O of the C-4 carboxylic acid group, we observed a reaction similar to the previous one, resulting in the same product ion and the same neutral losses (Figure 8B). However, when citramalic acid was protonated on the hydroxyl-O of the C-4 carboxylic acid group, we found immediate water loss and a methyl transfer to the carbene, leading again to the acetylium product cation (Figure 8C). When citramalic acid was protonated on the C-2 hydroxyl group, water and carbon monoxide were eliminated, again yielding the same product ion with m/z 43.018 (Figure 8D). Here we confirm that a specific product ion of a mass spectrum may result from multiple fragmentation reactions and demonstrate that our simulations can reveal the details of each of these fragmentation pathways.

Figure 8.


Formation of the acetylium cation from different citramalic acid molecular ion protomers. (A–D) The four citramalic acid protomers undergo different fragmentation pathways to yield the same acetylium cation at m/z 43.018.

3.3.2. m/z 43 of Ribono-γ-lactone.

We tested this hypothesis for another molecule, ribono-γ-lactone. The ribono-γ-lactone m/z 43.018 fragment ion serves as one of the major characteristic identifiers in MS/MS spectral similarity scoring. CIDMD correctly predicted this product ion and its fragmentation reactions. Like citramalic acid, the ribono-γ-lactone ion underwent different fragmentation pathways depending on the protonated position, all leading to m/z 43.018. When the lactone moiety was protonated, we observed an immediate opening of the ring, releasing carbon monoxide and alkenediol (Figure 9A). Subsequently, the product ion m/z 43.018 is produced, but here the structure is a Cα-protonated ketene cation (Figure 9A). Another form of ketene, an O-protonated ketene cation, was produced with a loss of water and carbon dioxide whenever one of the alcohol groups was protonated (Figure 9B). Interestingly, this same protomer (protonated at the hydroxyl group) produced another isomer of the m/z 43.018 ion, the cyclic version of the O-protonated ketene ion, a protonated oxirene cation (Figure 9C). In summary, for ribono-γ-lactone we again found different pathways leading to m/z 43.018, but unlike citramalic acid, CIDMD calculations suggest that this fragment ion represents different isomer structures depending on the fragmentation pathway for each protomer.

Figure 9.


Formation of the acetylium cation from two different ribono-γ-lactone molecular ion protomers. (A) An isomer of [C2H3O]+, m/z 43.018, is formed by ring opening of the protonated lactone-oxygen in ribono-γ-lactone, followed by neutral loss of carbon monoxide, hydrogen rearrangement and water loss. (B,C) An initial water loss from the protonated alpha-hydroxy group leads to different fragmentation pathways to yield two acetylium cation tautomers at m/z 43.018.

3.3.3. Ribono-γ-lactone Fragment Ions m/z 69 and 85.

The ribono-γ-lactone mass spectrum includes another significant ion for MS/MS identification, the ion m/z = 69.033. This ion occurs in both HCD and QTOF mass spectra (Figure 10A, 10B). Despite testing many CIDMD simulations, we only observed this product ion in a single fragmentation reaction within 200 fs. This reaction used protomer 3 (Figure 10C, top left) and a protonation at the γ-hydroxyl group. Here, the molecular ion lost two H2O molecules within the first 100 fs of simulation. These losses were followed by a gamma-cleavage of the lactone portion within approximately 60 fs, which resulted in CO2 loss and the product ion at m/z 69.033. This sequence is another example of dynamic concertedness.

Figure 10.


Collision energy dependent molecule breakdown curves of ribono-γ-lactone constructed from the NIST20 library for (A) HCD orbital ion trap spectra and (B) QTOF MS/MS spectra. (C) Ribono-γ-lactone fragmentation pathway extracted from CIDMD simulations.

Since this ion m/z 69.033 is experimentally observed as a predominant ion, we investigated why CIDMD failed to predict its abundance (Figure 10A, 10B). Instead of the m/z = 69.033 ion, we observed an abundant m/z = 131.034 ion in CIDMD predictions that is missing in the experimental spectra (Figure 10C, top right). Therefore, we decided to investigate if further collisions from the m/z 131.034 ion could yield the m/z 69.033 ion. We selected two of the major ions identified from CIDMD simulations, m/z 103.039 and 87.044 (Figure 10C, middle right), since they are the ions resulting from either CO or CO2 loss from the m/z 131.034 ion. We speculated that the missing ion m/z 69.033 might arise from a H2O2 loss from m/z 103.039 or a H2O loss from m/z 87.044. We ran CIDMD simulations starting with these ions as the target ions. Unfortunately, we did not observe the m/z = 69.033 ion from the CIDMD simulation of the ion m/z = 103.039. The simulations showed that the two oxygens were geometrically too far apart to react and detach as hydrogen peroxide. Interestingly, the ion m/z 103.039 lost a H2O to become m/z 85.029, which is another experimentally observed identifier for the molecule (m/z 85.03 in Figure 10A, 10B). From the m/z 87.044 ion, we suspected that water loss could yield m/z 69.033 more easily, which we did indeed observe. This fragmentation reaction was detected with approximately 10% relative abundance for the vfac 6 setting and approximately 3% for vfac 4. CIDMD also predicted that the m/z 87.044 ion could lose hydrogen gas, resulting in identifier ion m/z 85.029. The m/z 69.033 ion underwent further fragmentation to yield the product acetylium ion, m/z 43.018. This scenario demonstrates the second fundamental type of behavior observed in CIDMD trajectories: full IVR to produce statistical intermediates that require further activation–here through further collisions–to undergo further reactions. 
This scenario also points toward a path for future improvement of prediction accuracy: modeling the effects of multiple collisions in CIDMD, which are likely to occur in experimental mass spectrometers. Unfortunately, implementing automatic multiple collisions in CIDMD is far from trivial.
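The neutral-loss bookkeeping in the paragraph above can be checked with a short calculation. The sketch below assumes standard monoisotopic masses, takes ribono-γ-lactone as C5H8O5 (so the protonated ion is C5H9O5+), and subtracts one electron mass for the singly charged cation. The helper names are hypothetical, and the last decimal may differ slightly from the values in the text depending on rounding conventions.

```python
# Monoisotopic masses in Da (standard values) and the electron mass.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
ELECTRON = 0.00054858


def mass(formula):
    """Monoisotopic mass of a neutral formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def cation_mz(formula):
    """m/z of a singly charged cation with the given elemental composition."""
    return mass(formula) - ELECTRON


def lose(mz, neutral):
    """m/z after loss of a neutral fragment from a singly charged ion."""
    return mz - mass(neutral)


H2O = {"H": 2, "O": 1}
CO = {"C": 1, "O": 1}
CO2 = {"C": 1, "O": 2}
H2O2 = {"H": 2, "O": 2}
H2 = {"H": 2}

if __name__ == "__main__":
    precursor = cation_mz({"C": 5, "H": 9, "O": 5})  # [M+H]+ of C5H8O5
    m131 = lose(precursor, H2O)
    m103 = lose(m131, CO)
    m87 = lose(m131, CO2)
    for label, value in [
        ("[M+H]+", precursor),
        ("[M+H-H2O]+", m131),
        ("[M+H-H2O-CO]+", m103),
        ("[M+H-H2O-CO2]+", m87),
        ("87 - H2O", lose(m87, H2O)),
        ("103 - H2O2", lose(m103, H2O2)),
        ("103 - H2O", lose(m103, H2O)),
        ("87 - H2", lose(m87, H2)),
    ]:
        print(f"{label:>16s}  m/z {value:.3f}")
```

The two hypothesized routes to the m/z 69 ion (H2O2 loss from m/z 103 and H2O loss from m/z 87) give the same elemental composition, so mass alone cannot distinguish them; only the trajectories can.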

4. DISCUSSION

In untargeted metabolomics, compound identification heavily depends on matching experimental MS/MS mass spectra to spectral libraries. Because such libraries are very limited in coverage of the small-molecule space, computational mass spectrometry may assist in improving the identification. However, currently, only a few methods and software exist to predict mass spectra from molecular structures. In addition, MS/MS fragmentations are difficult to predict due to the complexities of collision-induced dissociation (CID) process,50 hampering the confidence in predicted spectra quality. We here developed a computational framework to lead the way for MD-dependent MS/MS predictions.

Overall, we generated and evaluated 261 in silico mass spectra using CIDMD from 12 molecules. Many predicted spectra were highly similar to their corresponding NIST library experimental spectra. An average entropy score of 621 was found over all 261 spectra generated from all parameters and protomers, yet, when considering only the most probable protomer of each of the 12 molecules, the best parameter settings yielded an average similarity score of 828 ± 77. With such high similarity scores, the prediction accuracy of CIDMD appears to be far better than that obtained by a rule-based MS/MS prediction tool, CFM-ID. CFM-ID was published with average match scores of 0.37, even though the authors included the precursor ions in the MS/MS calculation.51 Keeping unfragmented precursor ions in MS/MS similarity calculations artificially increases match scores and should therefore be avoided, as it does not yield orthogonal information in an identity search that preselects candidate molecules by accurate precursor mass information. CFM-ID is a heuristic-rule-based tool but includes self-learning machine learning methods. The performance of such methods depends heavily on the training data as well as other factors such as featurization, the structure of the neural network, and the complexity of the problem. An independent benchmarking study yielded worse matching scores using CFM-ID52 than the average scores we obtained here with CIDMD. However, a significant drawback of CIDMD software compared to CFM-ID is the much slower compute time per tested molecule. For instance, bufotenine, a molecule of 204 Da with 31 atoms in vfac 8, took approximately 123.5 h to predict one CIDMD spectrum from one protomer with one GeForce GTX 980 Ti graphics processing unit (GPU). With 32 GPUs in parallel, the computational cost significantly decreased to less than 4 h; hence, with 3 protomers of bufotenine, the total time was approximately 11.5 h.
For a similar task, rule-based tools such as CFM-ID perform a lot faster. At this point, no tool combines accuracy with speed, and hence, predicting high quality MS/MS spectra for even a fraction of the chemosphere is beyond the reach of current methods.53
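The timing figures above follow from simple arithmetic if the independent trajectories are assumed to parallelize ideally across GPUs, which is an assumption of this sketch rather than a measured scaling law. The function name is hypothetical.

```python
def wall_time_hours(single_gpu_hours, n_gpus, n_protomers=1):
    """Estimated wall time assuming ideal (linear) scaling across GPUs."""
    return single_gpu_hours / n_gpus * n_protomers


if __name__ == "__main__":
    per_protomer = wall_time_hours(123.5, 32)  # one protomer on 32 GPUs
    all_protomers = wall_time_hours(123.5, 32, n_protomers=3)
    print(f"{per_protomer:.2f} h per protomer, {all_protomers:.2f} h for three")
```

Under this idealization, one protomer takes just under 4 h on 32 GPUs and three protomers take roughly 11.5 h, consistent with the values reported for bufotenine.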

Yet, CIDMD offers additional advantages because it can unveil detailed fragmentation mechanisms for different protomers of each molecule. For citramalic acid, these mechanisms explained how protonation states affected the ion formation and ion abundance. While CIDMD can therefore be used to retrospectively rationalize experimental mass spectra by the ratio of different protomers found under experimental conditions, it would be much more useful if protomer formation itself could be accurately predicted so that fewer computationally intensive CIDMD simulations would be needed. Experimentally, it was found that solvent conditions can impact protomer states,54 and ion mobility collision cross section data can discern different protomers.55 However, even for small molecules, accurate pKa prediction appears to work best with quantum mechanical methods that use the van’t Hoff isotherm to calculate free dissociation energies (ΔGaq),56 whereas pKb values appear to be less useful. It appears, however, that pKa predictions by machine-learning models still have root-mean-square errors of about 1 pKa unit, especially when nonaqueous solvents need to be considered as used in LC-MS/MS.57,58 In a parallel report,59 we investigate whether the thermodynamically most likely protomers represent the structures that produce the most accurate spectra in CIDMD.

Similarly, the best-suited parameter sets for CIDMD were still difficult to predict beforehand because they affect the CIDMD-calculated spectra differently; therefore, multiple parameter sets needed to be explored. The collision energy (CE) plays a critical role in the CID process. In CIDMD, the collision energy is represented by the vfac parameter, which resembles the collision energy used in the experiments. Our findings indicate that CIDMD performs better with higher velocity factors for larger molecules and smaller factors for smaller molecules overall. However, this value is not directly comparable to experimental CE values, even when vfac values are converted into the lab frame CElab (noted as CEmod previously). This discrepancy arises because experimental CE values refer to only the energy applied during collisions in the collision cell of the mass spectrometer. They do not account for the initial energy of the molecular ion, which is obtained from the ionization process, or for the energy accumulated by multiple collisions.

Furthermore, it is yet unclear how the characteristics of the collider trajectory (b cutoff and velocity) relative to the location and vibrational state of a target molecule might affect the efficiency of energy transfer. The energy transfer of a single collision is heavily dependent on collision types. Collision types can be described by impact parameter b, defined as the perpendicular distance between the trajectory of a colliding particle and the center of mass of the target molecular ion. Larger molecular ions require larger b values to accurately simulate single collisions, making the evolution of energy transfer with varying b values crucial. In CIDMD, we do not use impact parameter b. Instead, we introduce the b cutoff parameter, defined as the distance between the trajectory of the argon atom and the closest atom of the molecular ion and not the center of mass of the molecular ion. This approach allows CIDMD to simulate multiple types of collisions with improved computational efficiency, including head-on and grazing collisions, to occur within one system using a cutoff value smaller than the impact parameter b. We tested various b cutoff values to observe their effects on the resulting mass spectra and compared these with experimental spectra. Clearly, the efficiency of the energy transfer affects the CIDMD results directly. Of course, head-on collisions with a zero b cutoff yield the most effective energy transfer. However, under experimental conditions in a mass spectrometer, most collisions are not head-on. The b cutoff value of 0.5 Å yielded the best matching spectra overall for the tested molecules, transferring the collision energy successfully and allowing effective energy distribution along the target molecule during CIDMD simulations. A larger b cutoff value of 2 Å exhibited mass spectra with decreased overall intensities of fragment ions, and new fragments were rarely observed compared with the head-on collisions. 
Hence, our data here provide the basis for reasonable limits for parameter sets in CIDMD modeling of the MS/MS spectra of small molecules.
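The difference between the conventional impact parameter b and the b cutoff described above can be made concrete with a small geometric sketch. It treats the collider path as a straight line and computes the perpendicular distance from that line either to the center of mass of the ion or to its closest atom. The function names, the toy two-atom geometry, and the straight-line assumption are illustrative only and do not reproduce the CIDMD implementation.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(x * x for x in a))


def line_point_distance(origin, direction, point):
    """Perpendicular distance from a point to the line origin + t * direction."""
    rel = tuple(point[i] - origin[i] for i in range(3))
    return norm(cross(rel, direction)) / norm(direction)


def impact_parameter(origin, direction, coords, masses):
    """Conventional b: distance from the trajectory to the center of mass."""
    total = sum(masses)
    com = tuple(sum(m * c[i] for m, c in zip(masses, coords)) / total
                for i in range(3))
    return line_point_distance(origin, direction, com)


def closest_atom_offset(origin, direction, coords):
    """Distance from the trajectory to the nearest atom (b cutoff analogue)."""
    return min(line_point_distance(origin, direction, c) for c in coords)


if __name__ == "__main__":
    # Toy diatomic target (coordinates in angstrom, masses in Da).
    coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]
    masses = [12.0, 16.0]
    origin, direction = (1.7, 0.0, -10.0), (0.0, 0.0, 1.0)
    print("b (center of mass):", impact_parameter(origin, direction, coords, masses))
    print("closest-atom offset:", closest_atom_offset(origin, direction, coords))
```

In this toy case the trajectory passes 0.5 Å from the nearest atom while lying about 1 Å from the center of mass, which illustrates why a closest-atom cutoff can be smaller than the impact parameter b for the same collision.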

In CIDMD, the relative intensities of the ions per mass spectrum are determined by counting the fragment ions generated during the MD simulations. The number of initial conditions for these simulations is set by the np parameter, which corresponds to the number of independent collisional MD simulations with one argon atom, varying its placement relative to the input molecule geometry. Therefore, for a large np parameter value, the resulting data should statistically converge. However, for practicality, we tried to find the minimum np parameter value that yielded acceptable accuracy, testing MS/MS similarities up to an np parameter value of 400. We found that the np value of 200 gave fortuitously good agreement with experiments, while more than 200 trajectories did not yield any significant new fragment ions. Therefore, we used an np value of 200 as a compromise to yield a reasonable level of convergence with acceptable computational cost.
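The counting procedure described above can be sketched as follows. The example assumes that each trajectory has already been reduced to a list of m/z values for the charged species present at the end of the simulation; the function name, the rounding to three decimals, and the toy trajectory outcomes are hypothetical and are not taken from the CIDMD code.

```python
from collections import Counter


def spectrum_from_trajectories(fragment_mz_per_trajectory, decimals=3):
    """Accumulate fragment counts over trajectories into relative intensities.

    Returns {m/z: intensity}, scaled so that the base peak equals 100.
    """
    counts = Counter()
    for mz_values in fragment_mz_per_trajectory:
        for mz in mz_values:
            counts[round(mz, decimals)] += 1
    if not counts:
        return {}
    base = max(counts.values())
    return {mz: 100.0 * n / base for mz, n in sorted(counts.items())}


if __name__ == "__main__":
    # Five toy trajectories (np = 5): three give m/z 43.018, one gives
    # m/z 103.039, and one leaves the precursor ion intact.
    trajectories = [[43.018], [43.018], [103.039], [149.045], [43.018]]
    for mz, intensity in spectrum_from_trajectories(trajectories).items():
        print(f"{mz:8.3f}  {intensity:6.1f}")
```

Because every peak height is a count out of np trajectories, low-abundance fragments are subject to substantial sampling noise at small np, which is the trade-off against computational cost discussed in the text.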

Notably, we also modeled only single collider events, in contrast with experimental conditions in which molecular ions undergo multiple collisions. In quadrupole-type collision cells, 2–10 collisions have been reported, and 50–100 collisions in an ion trap.10 To some extent, our simulation of multiple angles of collisions and the summing up of many trajectories may account for some of this mismatch between CIDMD modeling and experimental MS/MS conditions.

One interesting finding emerged when we preliminarily tested a few different functionals in our CIDMD modeling for citramalic acid and bufotenine. The use of the uB3LYP functional resulted in the best prediction among the functionals we evaluated. For example, when predicting CID of citramalic acid (protomer 2 with vfac 8 and cutoff 0.5), the uB3LYP functional yielded the highest entropy similarity score (811.3), excluding the unfragmented molecular ion. This score was followed by the B3LYP functional (801.2) and other long-range-separated hybrid functionals: ωB97X-D3 (703.9), ωB97X (700.4), CAM-B3LYP (695.3), and ωPBEH (610.0). These results can be attributed to a couple of reasons. First is the chemical specificity of the particular system. As our preliminary evaluation of functionals in the CIDMD method focused solely on two molecular systems, citramalic acid and bufotenine, the outperformance by B3LYP might result from its empirical parameters that were particularly effective at capturing these molecules’ partial electron delocalization, and therefore, B3LYP might be more suitable for these specific molecules. Second, the self-interaction error (SIE) might play a role since B3LYP only includes approximately 20% Hartree–Fock (HF) exchange, partially correcting for SIE, while the long-range-separated hybrid functionals contain a higher HF exchange contribution. Therefore, the long-range-separated hybrid functionals in the CIDMD model might lead to overlocalization of electron density at short distances, resulting in fragment patterns that differ from those observed experimentally. In addition, the outperformance of the unrestricted functional compared to the restricted one in CIDMD can be explained by the frequent occurrence of radical fragment ions in experiments.
Currently, no systematic investigation of radical fragment ions in CID tandem mass spectra has been done, but a recent study investigated the mass spectra available in NIST20 and reported that over 10% of collision-induced dissociation reactions in positive mode tandem mass spectrometry involve radicals.60

Previously, Hase and co-workers proposed that some parts of a target molecule might receive energy preferentially before fragmentation occurs,61 instead of an even distribution of energy across all degrees of freedom, like in vibrations and rotations. Therefore, nonstatistical fragmentation dynamics could be more crucial in collisions.61 If nonstatistical fragmentation pathways are significant, simulating multiple collisions becomes crucial for capturing the diversity of experimentally observable fragmentation pathways. Simulating multiple collisions with the precursor ion might become more critical than simulating secondary or tertiary collisions with fragment ions because the primary fragmentation event determines the initial fragmentation pathway that is the key to identifying nonstatistical fragmentations. In contrast, secondary and tertiary fragmentations contribute significantly less, as they are the results of further breakdown of primary fragmentations. Therefore, the necessity of simulating multiple collisions depends on the extent to which nonstatistical effects might influence the fragmentation process. In CIDMD, an automatic multiple collision method has not yet been implemented. Nevertheless, even without testing multiple collisions, CIDMD-calculated spectra exhibited a high similarity to the corresponding experimental spectra. It would be interesting to see if specific molecules that performed worse than average, e.g., ribono-γ-lactone, would improve MS/MS similarity scores if multiple collisions were tested.

5. CONCLUSIONS

We developed a computational framework, CIDMD, to predict collision induced MS/MS spectra directly from protonated chemical structures. Despite making major approximations, CIDMD predicted mass spectra of 12 small molecules with much better MS/MS similarity to NIST20 experimental mass spectra than those typically achieved by heuristic and machine-learning methods. CIDMD also provides deep insights into mechanistic details of fragmentation pathways. The significant computational time could possibly be decreased if protonation sites on the molecule structures were predicted with better accuracy. We propose that CIDMD is ready to be used to predict and compare MS/MS spectra of isomers of unknown compounds that cannot be purchased from chemical vendors, for example, as a result of metabolic studies on gut microbiomes.


ACKNOWLEDGMENTS

This study was funded by NIH U2C ES030158 (to OF).

Funding

Funding was provided by the National Institutes of Health under the award number NIH U2C ES030158 (to OF).

ABBREVIATIONS

CID

collision induced dissociation

LC

liquid chromatography

MS/MS

tandem mass spectrometry

ESI

electrospray ionization

MD

molecular dynamics

CIDMD

collision induced dissociation molecular dynamics

KE

kinetic energy

QTOF

quadrupole time-of-flight

HCD

higher-energy collisional dissociation

NIST

National Institute of Standards and Technology

CE

collision energy

Footnotes

Supporting Information

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.4c00760.

Figure S1: Experimental ESI-CID-MS/MS database libraries (PDF); Figure S2: Definition of b cutoff parameter in CIDMD (PDF); Figure S3: Kinetic energy decomposition equations and workflow (PDF); Figure S4: Results of MS/MS predictions by CID-MD for 12 selected molecules tested for cosine and weighted-dot score similarities (PDF); Figure S5: Cosine, wdot, and entropy similarity scores with various collisional parameters in Citramalic acid (CA) protomer case 1 (PDF). Table S1: Detailed information of total of 267 CIDMD spectra generated including 261 CIDMD spectra varying CIDMD parameters for 12 molecules and 6 secondary CIDMD spectra predicted from fragments (PDF)

Complete contact information is available at: https://pubs.acs.org/10.1021/acs.jcim.4c00760

The authors declare no competing financial interest.

Contributor Information

Jesi Lee, Department of Chemistry, University of California, Davis, California 95616, United States; West Coast Metabolomics Center, University of California, Davis, California 95616, United States.

Dean Joseph Tantillo, Department of Chemistry, University of California, Davis, California 95616, United States.

Lee-Ping Wang, Department of Chemistry, University of California, Davis, California 95616, United States.

Oliver Fiehn, West Coast Metabolomics Center, University of California, Davis, California 95616, United States.

Data Availability Statement

CIDMD mass spectra are freely available at MassBank of North America, https://massbank.us, accessed on 04 September 2023. All in-house scripts that were created for this research project are freely available at https://github.com/jesilee/.

REFERENCES

  • (1).Vinaixa M; Schymanski EL; Neumann S; Navarro M; Salek RM; Yanes O Mass spectral databases for LC/MS- and GC/MS-based metabolomics: State of the field and future prospects. TrAC Trends in Analytical Chemistry 2016, 78, 23–35.
  • (2).Xiao JF; Zhou B; Ressom HW Metabolite identification and quantitation in LC-MS/MS-based metabolomics. TrAC Trends in Analytical Chemistry 2012, 32, 1–14.
  • (3).Science Solutions. LC-MS Technology Overview. Wiley Science Solutions. See the following: https://sciencesolutions.wiley.com/solutions/technique/lc-ms/
  • (4).Horai H; Arita M; Kanaya S; Nihei Y; Ikeda T; Suwa K; Ojima Y; Tanaka K; Tanaka S; Aoshima K; Oda Y; Kakazu Y; Kusano M; Tohge T; Matsuda F; Sawada Y; Hirai MY; Nakanishi H; Ikeda K; Akimoto N; Maoka T; Takahashi H; Ara T; Sakurai N; Suzuki H; Shibata D; Neumann S; Iida T; Tanaka K; Funatsu K; Matsuura F; Soga T; Taguchi R; Saito K; Nishioka T MassBank: a public repository for sharing mass spectral data for life sciences. J. Mass Spectrom 2010, 45 (7), 703–714.
  • (5).Aron AT; Gentry EC; McPhail KL; Nothias LF; Nothias-Esposito M; Bouslimani A; Petras D; Gauglitz JM; Sikora N; Vargas F; van der Hooft JJJ; Ernst M; Kang KB; Aceves CM; Caraballo-Rodríguez AM; Koester I; Weldon KC; Bertrand S; Roullier C; Sun K; Tehan RM; Boya P CA; Christian MH; Gutiérrez M; Ulloa AM; Tejeda Mora JA; Mojica-Flores R; Lakey-Beitia J; Vásquez-Chaves V; Zhang Y; Calderón AI; Tayler N; Keyzers RA; Tugizimana F; Ndlovu N; Aksenov AA; Jarmusch AK; Schmid R; Truman AW; Bandeira N; Wang M; Dorrestein PC Reproducible Molecular Networking Of Untargeted Mass Spectrometry Data Using GNPS. Nature Protocols 2020, 15 (6), 1954–1991.
  • (6).Vinaixa M; Schymanski EL; Neumann S; Navarro M; Salek RM; Yanes O Mass spectral databases for LC/MS- and GC/MS-based metabolomics: State of the field and future prospects. TrAC Trends in Analytical Chemistry 2016, 78, 23–35.
  • (7).Wang Y; Xiao J; Suzek TO; Zhang J; Wang J; Bryant SH PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 2009, 37 (Web Server), W623–W633.
  • (8).Kim S; Chen J; Cheng T; Gindulyte A; He J; He S; Li Q; Shoemaker BA; Thiessen PA; Yu B; Zaslavsky L; Zhang J; Bolton EE PubChem 2023 update. Nucleic Acids Research 2023, 51 (D1), D1373–D1380.
  • (9).Jeffryes JG; Colastani RL; Elbadawi-Sidhu M; Kind T; Niehaus TD; Broadbelt LJ; Hanson AD; Fiehn O; Tyo KEJ; Henry CS MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics. J. Cheminform 2015, 7 (1), 44.
  • (10).Niessen WMA, MS-MS and MSn. In Encyclopedia of Spectroscopy and Spectrometry; Lindon JC, Tranter GE, Koppenaal DW, Eds.; Elsevier: 2017; pp 936–941. [Google Scholar]
  • (11).Borges RM; Colby SM; Das S; Edison AS; Fiehn O; Kind T; Lee J; Merrill AT; Merz KM; Metz TO; Nunez JR; Tantillo DJ; Wang L-P; Wang S; Renslow RS Quantum Chemistry Calculations for Metabolomics: Focus Review. Chem. Rev 2021, 121 (10), 5633–5670. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (12).Amorim Madeira PJ; Helena M, Applications of Tandem Mass Spectrometry: From Structural Analysis to Fundamental Studies. In Tandem Mass Spectrometry - Applications and Principles, Prasain J, Ed.; InTech: 2012. [Google Scholar]
  • (13).Shukla AK; Futrell JH Tandem mass spectrometry: dissociation of ions by collisional activation. J. Mass Spectrom 2000, 35 (9), 1069–1090. [DOI] [PubMed] [Google Scholar]
  • (14).Mayer PM; Poon C The mechanisms of collisional activation of ions in mass spectrometry. Mass Spectrom. Rev 2009, 28 (4), 608–639. [DOI] [PubMed] [Google Scholar]
  • (15).Lourderaj U; Sun R; Kohale SC; Barnes GL; de Jong WA; Windus TL; Hase WL The VENUS/NWChem software package. Tight coupling between chemical dynamics simulations and electronic structure theory. Comput. Phys. Commun 2014, 185 (3), 1074–1080. [Google Scholar]
  • (16).Macaluso V; Homayoon Z; Spezia R; Hase WL Threshold for shattering fragmentation in collision-induced dissociation of the doubly protonated tripeptide TIK(H+)2. Phys. Chem. Chem. Phys 2018, 20 (30), 19744–19749. [DOI] [PubMed] [Google Scholar]
  • (17).Martin-Somer A; Martens J; Grzetic J; Hase WL; Oomens J; Spezia R Unimolecular Fragmentation of Deprotonated Diproline [Pro2-H]- Studied by Chemical Dynamics Simulations and IRMPD Spectroscopy. J. Phys. Chem. A 2018, 122 (10), 2612–2625. [DOI] [PubMed] [Google Scholar]
  • (18).Rossich Molina E; Eizaguirre A; Haldys V; Urban D; Doisneau G; Bourdreux Y; Beau J-M; Salpin J-Y; Spezia R Characterization of Protonated Model Disaccharides from Tandem Mass Spectrometry and Chemical Dynamics Simulations. ChemPhysChem 2017, 18 (19), 2812–2823. [DOI] [PubMed] [Google Scholar]
  • (19).Spezia R; Lee SB; Cho A; Song K Collision-induced dissociation mechanisms of protonated penta- and octa-glycine as revealed by chemical dynamics simulations. Int. J. Mass Spectrom 2015, 392, 125–138. [Google Scholar]
  • (20).Lee G; Park E; Chung H; Jeanvoine Y; Song K; Spezia R Gas phase fragmentation mechanisms of protonated testosterone as revealed by chemical dynamics simulations. Int. J. Mass Spectrom 2016, 407, 40–50. [Google Scholar]
  • (21).Song K; Spezia R Theoretical mass spectrometry: tracing ions with classical trajectories; Walter de Gruyter GmbH & Co KG: 2018. [Google Scholar]
  • (22).Pratihar S; Barnes GL; Laskin J; Hase WL Dynamics of Protonated Peptide Ion Collisions with Organic Surfaces: Consonance of Simulation and Experiment. J. Phys. Chem. Lett 2016, 7 (16), 3142–3150. [DOI] [PubMed] [Google Scholar]
  • (23).Martin Somer A; Macaluso V; Barnes GL; Yang L; Pratihar S; Song K; Hase WL; Spezia R Role of Chemical Dynamics Simulations in Mass Spectrometry Studies of Collision-Induced Dissociation and Collisions of Biological Ions with Organic Surfaces. J. Am. Soc. Mass Spectrom 2020, 31 (1), 2–24. [DOI] [PubMed] [Google Scholar]
  • (24).Grimme S Towards first principles calculation of electron impact mass spectra of molecules. Angew. Chem., Int. Ed 2013, 52 (24), 6306–6312. [DOI] [PubMed] [Google Scholar]
  • (25).Koopman J; Grimme S Calculation of Mass Spectra with the QCxMS Method for Negatively and Multiply Charged Molecules. J. Am. Soc. Mass Spectrom 2022, 33 (12), 2226–2242. [DOI] [PubMed] [Google Scholar]
  • (26).Koopman J; Grimme S Calculation of Electron Ionization Mass Spectra with Semiempirical GFNn-xTB Methods. ACS Omega 2019, 4 (12), 15120–15133. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (27).Koopman J; Grimme S From QCEIMS to QCxMS: A Tool to Routinely Calculate CID Mass Spectra Using Molecular Dynamics. J. Am. Soc. Mass Spectrom 2021, 32 (7), 1735–1751. [DOI] [PubMed] [Google Scholar]
  • (28).Schnegotzki R; Koopman J; Grimme S; Suüssmuth RD Quantum Chemistry-based Molecular Dynamics Simulations as a Tool for the Assignment of ESI-MS/MS Spectra of Drug Molecules. Chem.—Eur. J 2022, 28 (27), No. e202200318. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (29).Koopman J; Grimme S Calculation of Mass Spectra with the QCxMS Method for Negatively and Multiply Charged Molecules. J. Am. Soc. Mass Spectrom 2022, 33 (12), 2226–2242. [DOI] [PubMed] [Google Scholar]
  • (30).Hanwell MD; Curtis DE; Lonie DC; Vandermeersch T; Zurek E; Hutchison GR Avogadro: an advanced semantic chemical editor, visualization, and analysis platform. J. Cheminform 2012, 4 (1), 17. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (31).Seritan S; Bannwarth C; Fales BS; Hohenstein EG; Kokkila-Schumacher SIL; Luehr N; Snyder JW Jr; Song C; Titov AV; Ufimtsev IS; Martínez TJ TeraChem: Accelerating electronic structure and ab initio molecular dynamics with graphical processing units. J. Chem. Phys 2020, 152 (22), 224110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (32).Zhang IY; Wu J; Xu X Extending the reliability and applicability of B3LYP. Chem. Commun 2010, 46 (18), 3057–3070. [DOI] [PubMed] [Google Scholar]
  • (33).Rassolov VA; Ratner MA; Pople JA; Redfern PC; Curtiss LA 6–31G* basis set for third-row atoms. J. Comput. Chem 2001, 22 (9), 976–984. [Google Scholar]
  • (34).Lebedev VI Quadratures on a sphere. USSR Computational Mathematics and Mathematical Physics 1976, 16 (2), 10–24. [Google Scholar]
  • (35).Mayer I Bond order and valence indices: A personal account. J. Comput. Chem 2007, 28 (1), 204–221. [DOI] [PubMed] [Google Scholar]
  • (36).Hutchings M; Liu J; Qiu Y; Song C; Wang L-P Bond-Order Time Series Analysis for Detecting Reaction Events in Ab Initio Molecular Dynamics Simulations. J. Chem. Theory Comput 2020, 16 (3), 1606–1617. [DOI] [PubMed] [Google Scholar]
  • (37).Wang L-P; McGibbon RT; Pande VS; Martinez TJ Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput 2016, 12 (2), 638–649. [DOI] [PubMed] [Google Scholar]
  • (38).Wang L-P; Titov A; McGibbon R; Liu F; Pande VS; Martínez TJ Discovering chemistry with an ab initio nanoreactor. Nat. Chem 2014, 6 (12), 1044–1048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (39).Stein SE; Scott DR Optimization and testing of mass spectral library search algorithms for compound identification. J. Am. Soc. Mass Spectrom 1994, 5 (9), 859–866. [DOI] [PubMed] [Google Scholar]
  • (40).Stein S Mass Spectral Reference Libraries: An Ever-Expanding Resource for Chemical Identification. Anal. Chem 2012, 84 (17), 7274–7282. [DOI] [PubMed] [Google Scholar]
  • (41).Li Y; Kind T; Folz J; Vaniya A; Mehta SS; Fiehn O Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification. Nat. Methods 2021, 18 (12), 1524–1531. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (42).Wang LP; Song C Geometry optimization made simple with translation and rotation coordinates. J. Chem. Phys 2016, 144 (21), 214108. [DOI] [PubMed] [Google Scholar]
  • (43).Parrish RM; Burns LA; Smith DGA; Simmonett AC; DePrince AE III; Hohenstein EG; Bozkaya U; Sokolov AY; Di Remigio R; Richard RM; Gonthier JF; James AM; McAlexander HR; Kumar A; Saitow M; Wang X; Pritchard BP; Verma P; Schaefer HF III; Patkowski K; King RA; Valeev EF; Evangelista FA; Turney JM; Crawford TD; Sherrill CD Psi4 1.1: An Open-Source Electronic Structure Program Emphasizing Automation, Advanced Libraries, and Interoperability. J. Chem. Theory Comput 2017, 13 (7), 3185–3197. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (44).Humphrey W; Dalke A; Schulten K VMD: Visual molecular dynamics. J. Mol. Graphics 1996, 14 (1), 33–38. [DOI] [PubMed] [Google Scholar]
  • (45).Vékey K Internal Energy Effects in Mass Spectrometry. J. Mass Spectrom 1996, 31 (5), 445–463. [Google Scholar]
  • (46).Wallace WE; Moorthy AS NIST Mass Spectrometry Data Center standard reference libraries and software tools: Application to seized drug analysis. Journal of Forensic Sciences 2023, 68 (5), 1484. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (47).Mackey JL; Yang Z; Houk KN Dynamically concerted and stepwise trajectories of the Cope rearrangement of 1,5-hexadiene. Chem. Phys. Lett 2017, 683, 253–257. [Google Scholar]
  • (48).Alvi S; Singleton DA Energy Read-out as a Probe of Kinetically Hidden Transition States. Org. Lett 2021, 23 (6), 2174–2177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (49).Carpenter BK Dynamic Matching: The Cause of Inversion of Configuration in the [1,3] Sigmatropic Migration? J. Am. Chem. Soc 1995, 117 (23), 6336–6344. [Google Scholar]
  • (50).Krettler CA; Thallinger GG A map of mass spectrometrybased in silico fragmentation prediction and compound identification in metabolomics. Briefings in Bioinformatics 2021, 22 (6), bbab073. [DOI] [PubMed] [Google Scholar]
  • (51).Wang F; Liigand J; Tian S; Arndt D; Greiner R; Wishart DS CFM-ID 4.0: More Accurate ESI-MS/MS Spectral Prediction and Compound Identification. Anal. Chem 2021, 93 (34), 11692–11700. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (52).Bremer PL; Vaniya A; Kind T; Wang S; Fiehn O How Well Can We Predict Mass Spectra from Structures? Benchmarking Competitive Fragmentation Modeling for Metabolite Identification on Untrained Tandem Mass Spectra. J. Chem. Inf. Model 2022, 62 (17), 4049–4056. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (53).Böcker S Searching molecular structure databases using tandem MS data: are we there yet? Curr. Opin. Chem. Biol 2017, 36, 1–6. [DOI] [PubMed] [Google Scholar]
  • (54).Demireva M; Armentrout PB Relative Energetics of the Gas Phase Protomers of p-Aminobenzoic Acid and the Effect of Protonation Site on Fragmentation. J. Phys. Chem. A 2021, 125 (14), 2849–2865. [DOI] [PubMed] [Google Scholar]
  • (55).Valadbeigi Y; Causon T Mechanism of formation and ion mobility separation of protomers and deprotomers of diaminobenzoic acids and aminophthalic acids. Phys. Chem. Chem. Phys 2023, 25 (30), 20749–20758. [DOI] [PubMed] [Google Scholar]
  • (56).Navo CD; Jiménez-Osés G Computer Prediction of pKa Values in Small Molecules and Proteins. ACS Med. Chem. Lett 2021, 12 (11), 1624–1628. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (57).Yang Q; Li Y; Yang JD; Liu Y; Zhang L; Luo S; Cheng JP Holistic Prediction of the pK(a) in Diverse Solvents Based on a Machine-Learning Approach. Angew. Chem., Int. Ed. Engl 2020, 59 (43), 19282–19291. [DOI] [PubMed] [Google Scholar]
  • (58).Mansouri K; Cariello NF; Korotcov A; Tkachenko V; Grulke CM; Sprankle CS; Allen D; Casey WM; Kleinstreuer NC; Williams AJ Open-source QSAR models for pKa prediction using multiple machine learning approaches. J. Cheminform 2019, 11 (1), 60. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (59).Lee J; Tantillo DJ; Wang L-P; Fiehn O Impact of Protonation Sites on Collision-Induced Dissociation-MS/MS Using CIDMD Quantum Chemistry Modeling. J. Chem. Inf. Model 2024, DOI: 10.1021/acs.jcim.4c00761. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (60).Xing S; Huan T Radical fragment ions in collision-induced dissociation-based tandem mass spectrometry. Anal. Chim. Acta 2022, 1200, 339613. [DOI] [PubMed] [Google Scholar]
  • (61).Meroueh SO; Wang Y; Hase WL Direct Dynamics Simulations of Collision- and Surface-Induced Dissociation of N-Protonated Glycine. Shattering Fragmentation. J. Phys. Chem. A 2002, 106 (42), 9983–9992. [Google Scholar]
