Fragment Formula Calculator (FFC): Determination of Chemical Formulas for Fragment Ions in Mass Spectrometric Data

André Wegner; Daniel Weindl; Christian Jäger; Sean C Sapcariu; Xiangyi Dong; Gregory Stephanopoulos; Karsten Hiller

doi:10.1021/ac403879d

Author manuscript; available in PMC: 2015 May 21.

Published in final edited form as: Anal Chem. 2014 Feb 5;86(4):2221–2228. doi: 10.1021/ac403879d


PMCID: PMC4440337 NIHMSID: NIHMS591698 PMID: 24498896

Abstract

The accurate determination of mass isotopomer distributions (MID) is of great significance for stable isotope-labeling experiments. Most commonly, MIDs are derived from gas chromatography/electron ionization mass spectrometry (GC/EI-MS) measurements. The analysis of fragment ions formed during EI, which contain only specific parts of the original molecule, can provide valuable information on the positional distribution of the label. The chemical formula of a fragment ion is usually applied to derive the correction matrix for accurate MID calculation. Hence, the correct assignment of chemical formulas to fragment ions is of crucial importance for correct MIDs. Moreover, the positional distribution of stable isotopes within a fragment ion is of high interest for stable isotope-assisted metabolomics techniques. For example, ¹³C-metabolic flux analyses (¹³C-MFA) are dependent on the exact knowledge of the number and position of retained carbon atoms of the unfragmented molecule. Fragment ions containing different carbon atoms are of special interest, since they can carry different flux information. However, the process of mass spectral fragmentation is complex, and identifying the substructures and chemical formulas for these fragment ions is nontrivial. For that reason, we developed an algorithm, based on a systematic bond cleavage, to determine chemical formulas and retained atoms for EI-derived fragment ions. Here, we present the fragment formula calculator (FFC) algorithm that can calculate chemical formulas for fragment ions where the chemical bonding (e.g., Lewis structures) of the intact molecule is known. The proposed algorithm is able to cope with general molecular rearrangement reactions occurring during EI in GC/MS measurements.
The FFC algorithm is able to integrate stable isotope labeling experiments into the analysis and can automatically exclude candidate formulas that do not fit the observed labeling patterns.¹ We applied the FFC algorithm to create a fragment ion repository that contains the chemical formulas and retained carbon atoms of a wide range of trimethylsilyl- and tert-butyldimethylsilyl-derivatized compounds. In total, we report the chemical formulas and backbone carbon compositions for 160 fragment ions of 43 alkylsilyl-derivatives of primary metabolites. Finally, we implemented the FFC algorithm in an easy-to-use graphical user interface and made it publicly available at http://www.ffc.lu.

Stable isotope labeling experiments (SLE) have emerged as an important tool in metabolic engineering and systems biology.² Of key concern for SLE is the accurate assessment of isotopomer distributions of cellular metabolites by gas chromatography/mass spectrometry (GC/MS) and nuclear magnetic resonance (NMR).³ While NMR lacks sensitivity, it provides detailed positional information. In contrast, GC/MS allows a sensitive determination of isotopic enrichment but provides only limited positional information. In recent years, powerful techniques such as metabolic flux analysis (MFA) have been developed to determine metabolic fluxes in biological systems based on the mass isotopomer distributions (MID) of small molecules.^4–6 MFA has been applied to many biomedical and biotechnological problems.^7–11 Usually, MIDs for mass spectral fragment ions can be calculated only if the chemical formula of the specific fragment ion is known, except if a special experimental setup is used.¹² Hence, most often only the information from the molecular ion peaks is used for MID measurements. However, electron ionization (EI)-based mass spectrometry leads to complex mass spectra, caused by the fragmentation of the analyzed compound. The analysis of fragment ions, which contain only specific parts of the original molecule, can provide valuable information on the positional isotopic enrichment within the molecule of interest. This positional distribution of the label is of high interest for ¹³C-MFA. In addition, depending on the applied derivatization method, the molecular ion might not be visible at all and fragment ions have to be analyzed instead. An important consideration is that the process of assigning a chemical structure to a fragment ion from a known molecular ion structure is time-consuming, even for an expert.¹³

In this work, we propose a novel method for the determination of chemical formulas and retained atoms for EI fragment ions based on the two-dimensional (2D) structure of a compound in combination with the measured mass spectrum. In general, there are two ways to deal with EI-based fragmentation: a rule-based in silico prediction or a combinatorial approach. Rule-based algorithms, such as ACD/MS Fragmenter or Mass Frontier,¹⁴ rely on fragmentation mechanisms derived from molecules where the fragmentation is known, assuming that similar structures will fragment the same way. However, small changes in structure can lead to a significantly different fragmentation mechanism.¹³ Furthermore, the rule-based approach fails for molecules where no similar fragmentation mechanism is known. A combinatorial approach is usually based on a systematic bond cleavage. For that, a cleavage cost is assigned to each bond to find the substructure with minimal cost. Finding the correct cost function, however, is challenging. For example, MetFrag¹⁵ uses bond-dissociation energies, whereas FiD¹⁶ uses standard bond energies. One drawback of current rule-based and combinatorial approaches is that they can only capture simple hydrogen rearrangements but fail for more complex rearrangements.

Here, we present a universal method to determine chemical formulas for fragment ions without a priori knowledge about the fragmentation mechanisms, taking advantage of the combinatorial aspect of the problem. A method based on a similar idea has been proposed for high-resolution tandem mass spectrometry.¹⁶ However, our method is designed for MS data with nominal masses, as produced by most GC/MS instruments with a quadrupole mass analyzer, which are routinely used in many laboratories. In contrast to high-resolution MS data, determining chemical formulas for nominal masses is algorithmically more challenging, because there are many possible elemental compositions that cannot easily be ruled out. In addition, our algorithm is able to cope with molecular rearrangements, which occur frequently in EI measurements.

THEORETICAL BACKGROUND

The fragmentation of gas phase ions is a complex and often hard-to-predict process. A detailed description can be found elsewhere.¹³ Although the whole fragmentation process can be very complex, there are only a few basic types of reactions that break or form chemical bonds: (1) σ-ionization, immediately breaks a bond (affecting mostly hydrocarbons); (2) α-cleavage, a new bond is formed from a radical site and an adjacent bond is homolytically cleaved; (3) charge-induced heterolytic cleavage, cleavage of a bond next to a charge-site; (4) rearrangements, migrations of atoms or groups of atoms (see Figure 1); (5) displacement of atoms or groups of atoms; and (6) eliminations.

Figure 1. Proposed fragmentation mechanism of N,O-bis-(trimethylsilyl)-glycine. After expulsion of a methyl radical by α-cleavage next to the nitrogen, carbon monoxide loss occurs by a retro-Diels–Alder-like reaction.

Graph theory has been extensively used in the fields of biology and chemistry. To model the fragmentation of a molecule, we will apply its graph-theoretical representation to determine chemical formulas of mass spectrometric fragment ions. On the basis of the fragmentation rules described above, a fragment ion is always composed of a subset of atoms of the original molecule. By using graph theory, the problem of assigning a chemical formula to a fragment ion can, therefore, be broken down to finding a subgraph H of G, assuming the graph G represents the structure of the molecular ion.

A graph is an ordered pair G = (V,E), where V is a set of vertices (or nodes) and E is a set of edges. Each element of E is a pair (u,v) of elements of V. The term labeled graph refers to a graph G in which labels are assigned to the vertices and edges. Formally, this is expressed by the two functions f_V: V → A for the set of vertices and f_E: V × V → B for the set of edges. If B is an ordered set (e.g., real numbers), then the graph is called weighted and the value f_E (u,v) is called the weight of the edge from u to v. A connected component C of a graph G is a maximal subgraph in which every pair of vertices is joined by a path. A connected graph consists of one connected component. The removal of a set of edges that disconnects the graph is called a cut. A subgraph of G = (V,E) is a graph H = (W,F), where W is a subset of V, F is a subset of E, and all edges in F have their end points in W.

ALGORITHM

We model a molecule as an undirected, connected, and labeled graph G = (V, E, f_VA, f_VB, f_VC, f_ED), where V is the set of vertices corresponding to the atoms and E is the set of undirected edges corresponding to the bonds between the atoms. The function f_VA: V → A assigns each atom an element (e.g., carbon, hydrogen, etc.), f_VB: V → B assigns each atom an index, and f_VC: V → C assigns each atom the atomic mass according to the chemical element. The function f_ED: V × V → D assigns each bond an order (single, double, or triple). The mass of the molecular ion corresponds to the sum of the masses of all vertices:

W(G) = \sum_{v \in V} f_{VC}(v)

(1)
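As a worked instance of eq 1, the sum can be evaluated with nominal (integer) atomic masses for N,O-bis-(trimethylsilyl)-glycine, C₈H₂₁NO₂Si₂, the example molecule used in the figures of this article. The nominal masses (C = 12, H = 1, N = 14, O = 16, Si = 28) are the usual integer values and are stated here as an assumption about f_VC:

```latex
W(G) = 8 \cdot 12 + 21 \cdot 1 + 1 \cdot 14 + 2 \cdot 16 + 2 \cdot 28 = 219
```

This agrees with the molecular mass of 219 Da quoted for this compound.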

The underlying idea of this algorithm is that the fragmentation process usually breaks only a few bonds within the molecule. This can be simulated by removing a defined number of edges within the molecular graph. In terms of graph theory, this means inducing a cut of a certain size in the graph. This can leave the graph G disconnected. The resulting connected components C = {C₁, ..., C_n} of the subgraph H each have a molecular mass:

W(C_{i}) = \sum_{v \in V(C_{i})} f_{VC}(v)

(2)

Since the mass (m) of the fragment ion is determined by mass spectrometry, the chemical formula of this fragment ion corresponds to a combination of connected components of H whose molecular masses W(C_i) sum up to m. Figure 2 illustrates this process. The resulting subgraph (representing the chemical composition), which can be composed of several connected components, does not necessarily represent the chemical structure because the formation of new bonds (e.g., fragmentation rule 4) is not modeled. However, the number and position of atoms of the intact compound retained in this fragment ion are uncovered.

Figure 2. Overview of the algorithm. (A) As input, FFC needs the 2D structure of the compound together with the mass spectrum of the ion of interest. In this example, we present the molecule N,O-bis-(trimethylsilyl)-glycine (219 Da) and the fragment ion at mass 176. (B) The 2D structure is first converted into a molecular graph. The graph contains 34 vertices and 33 edges. Then all combinations of edge sets of a certain size (in this case 3) are consecutively deleted from the graph, resulting in 5456 disconnected graphs, one for each edge set deleted. The number of resulting subgraphs can be calculated with the binomial coefficient, where n corresponds to the number of edges and k corresponds to the cut size (eq 3). For simplification, only the edge set leading to the correct fragmentation is shown here. (C) For each disconnected graph, the connected components are determined. For every combination of connected components where the molecular masses sum up to the mass of the fragment ion, the atoms of these components are combined to build up a candidate formula. In this example, the connected components shown in green and light blue with the masses 87 and 89 sum up to the target mass of 176. The candidate formula is then C₆H₁₈NOSi₂, which is indeed the correct formula for this fragment ion. In addition to the chemical formula, the algorithm also yields positional information about the fate of specific atoms. For example, the carboxyl carbon of the original glycine molecule is lost in this fragment ion. (D) On the basis of the candidate formula, the theoretical mass spectrum is predicted and a spectrum similarity score to the measured spectrum based on the dot product¹⁷ is calculated. This is of special importance if more than one sum formula can be derived for the target mass.
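The enumeration described for N,O-bis-(trimethylsilyl)-glycine can be sketched in a few lines of Python. This is a minimal illustration and not the FFC implementation (which is written in C++ with the LEMON library): the graph is built by hand from the structure (CH₃)₃Si-NH-CH₂-C(=O)-O-Si(CH₃)₃, nominal integer masses are assumed, only a single fixed cut size is tried, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumed nominal (integer) atomic masses.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16, "Si": 28}


def build_glycine_2tms():
    """Return (atoms, bonds) for (CH3)3Si-NH-CH2-C(=O)-O-Si(CH3)3."""
    atoms, bonds = [], []

    def add(element, parent=None):
        atoms.append(element)
        if parent is not None:
            bonds.append((parent, len(atoms) - 1))
        return len(atoms) - 1

    def add_tms(parent):
        si = add("Si", parent)
        for _ in range(3):
            carbon = add("C", si)
            for _ in range(3):
                add("H", carbon)

    n = add("N")
    add("H", n)
    add_tms(n)
    c_alpha = add("C", n)
    add("H", c_alpha)
    add("H", c_alpha)
    c_carboxyl = add("C", c_alpha)
    add("O", c_carboxyl)  # carbonyl oxygen; a double bond is still one edge
    o_ester = add("O", c_carboxyl)
    add_tms(o_ester)
    return atoms, bonds


def connected_components(n_vertices, edges):
    """Depth-first search over the remaining edges."""
    adjacency = [[] for _ in range(n_vertices)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen, comps = set(), []
    for start in range(n_vertices):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in adjacency[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def formula_string(counts):
    """Write C first, then H, then the remaining elements alphabetically."""
    order = [e for e in ("C", "H") if e in counts]
    order += sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def candidate_formulas(atoms, bonds, target_mass, cut_size):
    """Delete every edge set of the given size and collect matching formulas."""
    n_cuts, found = 0, set()
    for cut in combinations(range(len(bonds)), cut_size):
        n_cuts += 1
        kept = [b for i, b in enumerate(bonds) if i not in cut]
        comps = connected_components(len(atoms), kept)
        for r in range(1, len(comps) + 1):
            for subset in combinations(comps, r):
                elements = [atoms[v] for comp in subset for v in comp]
                if sum(NOMINAL_MASS[e] for e in elements) == target_mass:
                    found.add(formula_string(Counter(elements)))
    return n_cuts, found


atoms, bonds = build_glycine_2tms()
n_cuts, formulas = candidate_formulas(atoms, bonds, target_mass=176, cut_size=3)
print(len(atoms), len(bonds), n_cuts)
print(sorted(formulas))
```

The candidate set generally contains more than one formula for the target mass, which is why a subsequent ranking step (number of broken bonds, spectrum similarity) is needed.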

So far, we have relied on the assumption that the correct edges are deleted from the graph. There are two unknowns: the number and the position of edges to be deleted. To define the minimal number of edges to delete from the graph (cut size) that is necessary to model the fragmentation, it is mandatory to take the fragmentation rules (as stated in Theoretical Background) into consideration. Fragmentation types 1–3 cleave one bond without forming new σ-bonds, types 4 and 5 cleave one bond while forming a new one, and type 6 cleaves two bonds while forming a new one. Therefore, to describe an α-cleavage or a σ-ionization, a cut size of one is clearly sufficient. To simulate a simple elimination or a rearrangement, which is equivalent to deleting two edges in the graph, a cut size of two is necessary. For the combination of a more complex rearrangement and an α-cleavage (as depicted in Figure 1), a cut size of three is necessary. To capture both the single and the combined fragmentations, the algorithm is designed to work with a defined maximum cut size. The cut size starts at one and subsequently increases until it reaches the defined maximum cut size.

One way to find the correct edges to delete from the graph is to select those edges that are most likely to break. For example, low-energy bonds can be assumed to break more easily. Although this is correct, additional rules are needed to describe rearrangements. Another, more straightforward way is to delete all possible combinations of edges of a certain cut size. This certainly includes the correct edges but at the same time increases the number of possible results enormously. If the number of edges is given by n and the cut size by k, then the number of distinct sets of k edges chosen from n is given by the binomial coefficient:

\binom{n}{k} = \frac{n!}{k! \cdot (n - k)!}

(3)

For example, the graph of the molecule N,O-bis-(trimethylsilyl)-glycine with the molecular formula C₈H₂₁NO₂Si₂ has 33 edges. The number of possible distinct edge sets to delete for a cut size of 3 is then 5456.
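The growth of the search space with the cut size can be checked directly with Python's standard-library binomial coefficient. This is only an arithmetic illustration of eq 3 for the 33-edge example; the choice of a maximum cut size of 3 for the running total is an assumption made for the example.

```python
from math import comb

N_EDGES = 33  # bonds in N,O-bis-(trimethylsilyl)-glycine, as stated in the text

# Number of distinct edge sets for each cut size k (eq 3).
edge_sets_by_cut_size = {k: comb(N_EDGES, k) for k in range(1, 5)}
for k, count in edge_sets_by_cut_size.items():
    print(k, count)

# Because the cut size is increased stepwise from one up to the maximum,
# the total work for a maximum cut size of 3 is the sum over k = 1..3.
total_up_to_three = sum(edge_sets_by_cut_size[k] for k in (1, 2, 3))
print(total_up_to_three)
```

Each additional unit of cut size multiplies the number of edge sets by roughly an order of magnitude for this molecule, which motivates the reduction strategies discussed later.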

To find the correct edges, the resulting fragment formulas for each of these possibilities have to be ranked according to a score. At best, this score is linked to the measured mass spectrum. One elegant way to do so is to predict the theoretical mass spectrum of the determined fragment formula and calculate a spectrum similarity score to the measured mass spectrum of this fragment ion. A mass spectrum can be theoretically predicted by using the natural stable isotopic distribution of elements and statistical theory.¹⁸ For elements that have only two naturally occurring stable isotopes of significant abundance (a lighter and a heavier one, e.g., carbon), the distribution of isotopes can be predicted by a binomial distribution:

m_{i} = \frac{n!}{i! \cdot (n - i)!} \cdot p_{0}^{n - i} \cdot p_{1}^{i}

(4)

where n is the total number of atoms, i the number of atoms containing the heavier isotope (e.g., ¹³C), p₀ the natural abundance of the lighter isotope [e.g., p(¹²C) = 0.989], and p₁ the natural abundance of the heavier isotope [e.g., p(¹³C) = 0.011]. In case an element has more than two naturally occurring isotopes, the distribution of those isotopes within a molecule can be predicted by a multinomial distribution:

m_{i} = \frac{n!}{a_{0}! \cdot a_{1}! \cdot \dots \cdot a_{k}!} \cdot p_{0}^{a_{0}} \cdot p_{1}^{a_{1}} \cdot \dots \cdot p_{k}^{a_{k}}

(5)

where n is the total number of atoms, a₀ to a_k the number of atoms containing the respective isotope, and p₀ to p_k the natural abundances of those isotopes.
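A theoretical isotope pattern for a candidate formula can be sketched by evaluating the multinomial expression per element and then combining the elements by convolution. The sketch below is an illustration under stated assumptions, not the FFC code: the abundance table holds rounded approximate values chosen for the example, isotopes are indexed by their nominal mass offset (+0, +1, +2), and `dot_product_score` is a plain normalized dot product, which may differ in detail from the score used by FFC.

```python
from math import factorial, prod, sqrt

# Approximate natural abundances (assumed, rounded), ordered by mass offset.
ABUNDANCE = {
    "C": [0.989, 0.011],
    "H": [0.99988, 0.00012],
    "N": [0.9964, 0.0036],
    "O": [0.9976, 0.0004, 0.0020],
    "Si": [0.9222, 0.0469, 0.0309],
}


def compositions(n, k):
    """Yield every tuple of k non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def element_distribution(n, abundances):
    """Multinomial isotope distribution of n atoms, indexed by mass offset."""
    dist = [0.0] * (n * (len(abundances) - 1) + 1)
    for a in compositions(n, len(abundances)):
        coefficient = factorial(n) // prod(factorial(a_j) for a_j in a)
        probability = coefficient * prod(p ** a_j for p, a_j in zip(abundances, a))
        offset = sum(j * a_j for j, a_j in enumerate(a))
        dist[offset] += probability
    return dist


def convolve(x, y):
    """Combine two independent mass-offset distributions."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, x_i in enumerate(x):
        for j, y_j in enumerate(y):
            out[i + j] += x_i * y_j
    return out


def isotope_pattern(formula):
    """Pattern for a formula given as {element: atom count}."""
    pattern = [1.0]
    for element, n in formula.items():
        pattern = convolve(pattern, element_distribution(n, ABUNDANCE[element]))
    return pattern


def dot_product_score(a, b):
    """Normalized dot product of two intensity vectors (compared on the shorter length)."""
    length = min(len(a), len(b))
    a, b = a[:length], b[:length]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / sqrt(sum(x * x for x in a) * sum(y * y for y in b))


carbon_only = element_distribution(6, ABUNDANCE["C"])
pattern = isotope_pattern({"C": 6, "H": 18, "N": 1, "O": 1, "Si": 2})
print([round(p, 4) for p in pattern[:4]])
```

For a two-isotope element, `element_distribution` reduces to the binomial form of eq 4; for oxygen and silicon it evaluates the multinomial form of eq 5.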

Reducing Algorithmic Complexity

For GC/MS, compounds are usually derivatized prior to analysis. For example, active protons in functional groups (hydroxyl-, carboxyl-, thiol-, amino groups, etc.) can be replaced with a trimethylsilyl (TMS) or tert-butyldimethylsilyl (TBDMS) group. This makes compounds more volatile and less reactive but at the same time increases the computational complexity of finding the correct chemical formula of a fragment ion. In the case of stable isotope labeling experiments, the interest lies normally only in labeling patterns for atoms of the original (underivatized) molecule. As a consequence, the information obtained from the loss of atoms originating from the derivatization reagent used is often redundant. For example, when TMS derivatization is used, a [M – 15]⁺ fragment is often present in the mass spectrum, originating from the loss of a methyl group from the derivatized part of the molecule. Depending on the number of TMS groups within the molecule, there are several possibilities for the position of the lost methyl group. With regard to the calculation of chemical formulas, however, the position of this methyl group is not relevant and computational time can thus be saved. For that reason, we divide the molecular graph into atoms belonging to the original molecule (backbone atoms) and atoms originating from the derivatization reagent used. Subsequently, nonbackbone edges (edges that are not connected to at least one backbone atom) are grouped based on the atoms that would be lost if this edge is deleted (Figure 3). For example, all edges are grouped together where their removal would lead to the loss of one hydrogen. This reduces the number of distinct edges significantly, thereby decreasing the combinatorial complexity for the problem of finding the correct chemical formula. Additionally, this allows the user to follow the fate of specific atoms in the molecular ion by selecting them as backbone atoms.

Figure 3. Graph representation of N,O-bis-(trimethylsilyl)-glycine. The graph contains 33 edges. For a cut size of three, the number of distinct edge sets to delete is 5456. To reduce the number of distinct edge sets, nonbackbone edges (edges that are not connected to at least one backbone atom) are grouped based on their loss pattern. For example, edges shown in red are grouped together because their removal leads to the loss of one hydrogen. The group of edges shown in blue leads to the loss of a methyl group when one of these edges is removed. The group of edges shown in green leads to the loss of a TMS group when one of these edges is removed. After reduction to relevant backbone edges, the graph contains only 7 distinct edge groups (as illustrated by the numbers above the edges), which reduces the number of distinct edge sets of size 3 from 5456 to 35.
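The grouping of nonbackbone edges by loss pattern can be sketched as follows. This is an illustration only: the backbone is assumed here to be every atom of the underivatized glycine skeleton that remains after silylation (N, its hydrogen, both carbons with their hydrogens, and both oxygens), which is a hypothetical choice. The grouping in Figure 3 rests on the authors' own backbone selection, so the group counts printed here need not match the figure.

```python
from collections import Counter, defaultdict


def build_glycine_2tms():
    """Return (atoms, bonds, backbone) for (CH3)3Si-NH-CH2-C(=O)-O-Si(CH3)3."""
    atoms, bonds, backbone = [], [], set()

    def add(element, parent=None, is_backbone=False):
        atoms.append(element)
        index = len(atoms) - 1
        if parent is not None:
            bonds.append((parent, index))
        if is_backbone:
            backbone.add(index)
        return index

    def add_tms(parent):
        si = add("Si", parent)
        for _ in range(3):
            carbon = add("C", si)
            for _ in range(3):
                add("H", carbon)

    n = add("N", is_backbone=True)
    add("H", n, is_backbone=True)
    add_tms(n)
    c_alpha = add("C", n, is_backbone=True)
    add("H", c_alpha, is_backbone=True)
    add("H", c_alpha, is_backbone=True)
    c_carboxyl = add("C", c_alpha, is_backbone=True)
    add("O", c_carboxyl, is_backbone=True)
    o_ester = add("O", c_carboxyl, is_backbone=True)
    add_tms(o_ester)
    return atoms, bonds, backbone


def side_without(start, blocked, bonds):
    """Vertices reachable from start when the bond `blocked` is removed."""
    adjacency = defaultdict(list)
    for u, v in bonds:
        if (u, v) != blocked:
            adjacency[u].append(v)
            adjacency[v].append(u)
    seen, stack = {start}, [start]
    while stack:
        for w in adjacency[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def formula_string(counts):
    order = [e for e in ("C", "H") if e in counts]
    order += sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def group_nonbackbone_edges(atoms, bonds, backbone):
    """Group edges that touch no backbone atom by the atoms lost on removal."""
    groups = defaultdict(list)
    for u, v in bonds:
        if u in backbone or v in backbone:
            continue  # edges touching the backbone stay individual
        lost = side_without(v, (u, v), bonds)
        if lost & backbone:  # pick the side that carries no backbone atom
            lost = side_without(u, (u, v), bonds)
        groups[formula_string(Counter(atoms[w] for w in lost))].append((u, v))
    return groups


atoms, bonds, backbone = build_glycine_2tms()
groups = group_nonbackbone_edges(atoms, bonds, backbone)
for loss, edges in sorted(groups.items()):
    print(loss, len(edges))
```

With this backbone choice, the 24 bonds inside the two TMS groups collapse into two loss classes, so only one representative per class has to be considered when edge sets are enumerated.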

Another advantage which makes the proposed algorithm capable of modeling rearrangements is the use of connected components. Fragment ions resulting from a rearrangement reaction are often composed of two or more disjoint substructures of the molecular ion. Identifying these substructures is computationally challenging, as their number grows enormously with the number of atoms. However, in our algorithm, the number of these substructures is limited by the number of connected components within the molecular graph, making the proposed algorithm also applicable for larger molecules.

Constraining/Weighting the Result Set

One problem of finding a chemical formula through a combinatorial- instead of a rule-based approach is the high number of possible results. One way to remove redundant results is to consider only results where either the molecular formula or the composition of backbone atoms changes. In other words, results with the same chemical formula but different nonbackbone atoms are ignored (as stated above). Although this shrinks the result set considerably, it still leaves a fair amount of candidate formulas. For that reason, the FFC program allows for the addition of a spectrum of a stable isotope labeling experiment to the analysis. Labeled fragments are automatically detected, and MIDs for those fragments are calculated in order to determine the number of labeled atoms within this fragment. Candidate formulas that do not fit the labeling pattern are directly excluded from the result set.
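The exclusion step can be illustrated with a simplified filter. The sketch assumes a fully ¹³C-labeled reference spectrum, so that the observed nominal mass shift of a fragment equals the number of backbone carbon atoms it retains; FFC itself derives this number from calculated MIDs, which is not reproduced here. The `Candidate` type and the candidate list are hypothetical; the m/z values 116 and 118 are taken from the alanine 2TMS entry in Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    formula: str
    backbone_carbons: int  # backbone carbon atoms retained in the fragment


def filter_by_labeling(candidates, mz_unlabeled, mz_fully_labeled):
    """Keep candidates whose retained backbone carbons match the observed shift."""
    observed_shift = mz_fully_labeled - mz_unlabeled
    return [c for c in candidates if c.backbone_carbons == observed_shift]


# Hypothetical candidate list for the alanine 2TMS fragment at m/z 116.
candidates = [
    Candidate("C5H14NSi", backbone_carbons=2),
    Candidate("C5H14NSi", backbone_carbons=1),
    Candidate("C4H10NOSi", backbone_carbons=3),
]
kept = filter_by_labeling(candidates, mz_unlabeled=116, mz_fully_labeled=118)
print(kept)
```

Candidates that share a formula but differ in retained backbone carbons are separated by this step, which a spectrum similarity score alone cannot do.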

MATERIAL AND METHODS

Details can be found in the Supporting Information.

IMPLEMENTATION

FFC has been developed in C++ and Qt4 and is based on the publicly available MetaboliteDetector,¹⁹ NTFD,²⁰ and the ICBM algorithm.²¹ All graph-based calculations are done using the LEMON graph library,²² available at http://lemon.cs.elte.hu.

RESULTS AND DISCUSSION

We first validated the predictive capabilities of FFC by identifying the chemical formulas for 35 fragment ions of 13 tert-butyldimethylsilyl-derivatized amino acids. These manually curated formulas have been published previously by Antoniewicz.¹ The mass spectra as well as the 2D structures were obtained from the NIST 08 library. An overview of all fragment ions tested is given in Table 1 of the Supporting Information. We tested whether FFC can predict not only the correct formula but also the correct position of retained backbone carbon atoms, which is very important for MFA. We considered a predicted formula as correct when the candidate with the lowest number of broken bonds matched the formula proposed by Antoniewicz. If there were multiple formulas resulting from the same number of broken bonds, we selected the formula with the highest spectrum similarity score. For the composition of backbone carbon atoms, selecting the correct solution is more challenging because candidates with different backbone carbon atoms but the same formula will have the same spectrum similarity score. For this reason, we only considered the prediction of backbone carbon atoms present to be correct if there was a unique solution. Overall, FFC was able to correctly predict 34 out of 35 chemical formulas and 30 out of 35 backbone carbon compositions. In the case of threonine, the formula and the carbon atoms for the fragment ion at m/z 376 were predicted incorrectly. However, when we used a spectrum measured using our Agilent 5975C MSD, both the formula and the carbon atoms were predicted correctly. Apparently, the spectrum similarity score is dependent on the quality of the spectra used and how closely they reflect the theoretical distribution of naturally occurring isotopes. The number of correctly predicted formulas is slightly higher compared to the number of backbone carbon compositions because of similar structural groups within the 2D structure of certain molecules.
For example, aspartate 3TBDMS and glutamate 3TBDMS both have two carboxyl groups; for the ions at m/z 390 (Asp) and 330 (Glu), it is not clear which of these two groups is cleaved off. The chemical formula, however, is the same. In case of leucine and isoleucine, the side chains have the same chemical formula (C₄H₉) as the tert-butyl group and, therefore, have the same mass (m/z 57) and cannot be distinguished by our algorithm. The top two ranked candidate formulas for ions at m/z 200, 274, and 302 of N,O-bis(dimethyl-tert-butylsilyl)-leucine are depicted in Figure 4. For the ion at m/z 302, there are two equally ranked candidate formulas, resulting from either an α-cleavage of the tert-butyl group or the side chain. Interestingly, Antoniewicz showed with a stable isotope labeling experiment that two fragments with the same chemical formula are overlapping for this ion. He found significant M + 2 and M + 6 mass isotopomer abundances when using U–¹³C-leucine. This suggests that both backbone carbon atom compositions predicted by FFC are legitimate. For the ion at m/z 274, again there are two equally ranked candidate formulas, resulting from either a loss of a tert-butyl and the carbonyl group or the loss of the side chain and the carbonyl group. However, when using U–¹³C-leucine, only the M + 5 peak is abundant, suggesting that five of the six carbon backbone atoms are still present in this fragment.¹ This result can be explained by the rearrangement mechanism depicted in Figure 1. The retro-Diels–Alder-like rearrangement occurs only if the N-terminal tert-butyl is lost in a previous fragmentation step, leading to the loss of the carbonyl group. As these two candidate formulas cannot be distinguished solely from unlabeled spectra (unless an expert in the field is looking at it), a stable isotope labeling experiment should be performed to determine which formula is correct. 
For ion 200, the correct formula is C₁₁H₂₆NSi, resulting from an α-cleavage between the carbon of the carboxyl group and the adjacent carbon atom. The second best hit, with the formula C₁₁H₂₄OSi, has a slightly higher spectrum similarity score of 0.999866 (compared to 0.999819) but requires a higher number of broken bonds, which is very unlikely from a chemical point of view. In our analysis, the correct chemical formula for each fragment ion was always present in the list of results. However, as with most prediction algorithms, a critical look at the results is necessary in order to pull out those that are most chemically relevant.
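The selection rule used in this validation (fewest broken bonds first, then highest spectrum similarity) amounts to a two-key sort. The sketch below uses the two candidates for ion 200 with the similarity scores quoted in the text. The broken-bond count of 1 for the α-cleavage follows from the cut-size discussion, whereas the value of 2 for the second candidate is a hypothetical placeholder, since the text only states that it is higher.

```python
# Candidates for the leucine 2TBDMS fragment at m/z 200.
candidates = [
    {"formula": "C11H24OSi", "broken_bonds": 2, "score": 0.999866},  # bond count assumed
    {"formula": "C11H26NSi", "broken_bonds": 1, "score": 0.999819},
]

# Sort by ascending number of broken bonds, then by descending similarity score.
ranked = sorted(candidates, key=lambda c: (c["broken_bonds"], -c["score"]))
best = ranked[0]["formula"]
print(best)
```

Putting the bond count first keeps a marginally better spectral match from overriding a chemically simpler fragmentation.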

Figure 4. Chemical formulas for ions 200, 274, and 302 of N,O-bis(dimethyl-tert-butylsilyl)-leucine. The two best-ranked hits, according to the number of broken bonds and spectrum similarity for each ion, are shown. Incorrect fragmentations are visualized with a lower opacity, and cleaved atoms are shown in red.

Next, we applied the FFC program to determine the chemical formulas and carbon backbone compositions of a wide range of trimethylsilyl- (Tables 1, 2, and 3) and tert-butyldimethylsilyl- (Table 2 of the Supporting Information) derivatized compounds of central carbon metabolism. In this article, we report a fragment ion repository that includes the chemical formulas and the retained carbon atoms for 160 fragment ions of 43 compounds. The retained carbon backbone compositions of all compounds can be found in the Supporting Information. We manually curated these formulas and verified them with labeled reference spectra. For that, we generated fully ¹³C-labeled yeast extracts as described in the Materials and Methods section of the Supporting Information. These labeled spectra can be imported in the FFC program, and results that do not fit the labeling pattern are directly removed from the result set. We additionally validated the TMS spectra with deuterated N-methyl-N-(trimethyl-d₉-silyl)trifluoroacetamide (MSTFA-d₉) as a derivatization reagent. In conclusion, we present a high quality fragment ion repository that can help researchers to analyze stable isotope-labeling experiments. For example, the fragment formulas can be used to calculate MIDs, which in turn can be used in combination with the retained carbon atoms to perform ¹³C-MFA.

Table 1.

Fragments of TMS-Derivatized Compounds Part 1

compound	m/z	m/z ¹³C	m/z d₉-TMS	formula
adenine 2TMS	279	284	297	C₁₁H₂₁N₅Si₂
	264	269	279	C₁₀H₁₈N₅Si₂
	206	211	215	C₈H₁₂N₅Si
alanine 2TMS	233	–	–	C₉H₂₃NO₂Si₂
	218	220, 221	233, 236	C₈H₂₀NO₂Si₂
	190	192	205	C₇H₂₀NOSi₂
	116	118	125	C₅H₁₄NSi
aspartic acid 2TMS	277	281	295	C₁₀H₂₃NO₄Si₂
	262	266	277	C₉H₂₀NO₄Si₂
	234	237	249	C₈H₂₀NO₃Si₂
	220	222	235	C₇H₁₈NO₃Si₂
	160	163	169	C₆H₁₄NO₂Si
aspartic acid 3TMS	349	353	376	C₁₃H₃₁NO₄Si₃
	334	338	358	C₁₂H₂₈NO₄Si₃
	306	309	330	C₁₁H₂₈NO₃Si₃
	292	294	316	C₁₀H₂₆NO₃Si₃
	232	235	250	C₉H₂₂NO₂Si₂
	218	220	236	C₈H₂₀NO₂Si₂
β-alanine 3TMS	305	–	–	C₁₂H₃₁NO₂Si₃
	290	–	314	C₁₁H₂₈NO₂Si₃
	248	–	272	C₉H₂₆NOSi₃
	232	–	250	C₉H₂₂NO₂Si₂
	174	–	192	C₇H₂₀NSi₂
	86	–	92	C₃H₆OSi
citric acid 4TMS	480	–	–	C₁₈H₄₀O₇Si₄
	465	471	498	C₁₇H₃₇O₇Si₄
	375	381	399	C₁₄H₂₇O₆Si₃
	363	368	390	C₁₄H₃₁O₅Si₃
	347	352	371	C₁₃H₂₇O₅Si₃
	273	278	291	C₁₁H₂₁O₄Si₂
3-phosphoglycerate 4TMS	474			C₁₅H₃₉O₇PSi₄
	459	462	492	C₁₄H₃₆O₇PSi₄
	387	387	423	C₁₂H₃₆O₄PSi₄
	357	359	384	C₁₁H₃₀O₅PSi₃
	315	315	342	C₉H₂₈O₄PSi₃
	299	299	323	C₈H₂₄O₄PSi₃
glycerol-3-phosphate 4TMS	460			C₁₅H₄₁O₆PSi₄
	445	448	478	C₁₄H₃₈O₆PSi₄
	387	387	423	C₁₂H₃₆O₄PSi₄
	357	359	384	C₁₁H₃₀O₅PSi₃
	341	343	365	C₁₀H₂₆O₅PSi₃
	299	299	323	C₈H₂₄O₄PSi₃


Table 2.

Fragments of TMS-Derivatized Compounds Part 2

compound	m/z	m/z ¹³C	m/z d₉-TMS	formula
glutamic acid 3TMS	363	368	390	C₁₄H₃₃NO₄Si₃
	348	353	372	C₁₃H₃₀NO₄Si₃
	320	324	344	C₁₂H₃₀NO₃Si₃
	246	250	264	C₁₀H₂₄NO₂Si₂
	230	234	245	C₉H₂₀NO₂Si₂
glutamine 3TMS	362	367	389	C₁₄H₃₄N₂O₃Si₃
	347	352	371	C₁₃H₃₁N₂O₃Si₃
	273	278	291	C₁₁H₂₅N₂O₂Si₂
	245	249	263	C₁₀H₂₅N₂OSi₂
glycerol 3TMS	308	–	–	C₁₂H₃₂O₃Si₃
	293	296	317	C₁₁H₂₉O₃Si₃
	218	221	236	C₉H₂₂O₂Si₂
	205	207	223	C₈H₂₁O₂Si₂
glycine 3TMS	291	293	–	C₁₁H₂₉NO₂Si₃
	276	278	300	C₁₀H₂₆NO₂Si₃
	248	249	272	C₉H₂₆NOSi₃
	174	175	192	C₇H₂₀NSi₂
isoleucine 2TMS	275	–	–	C₁₂H₂₉NO₂Si₂
	260	265, 266	275, 278	C₁₁H₂₆NO₂Si₂
	232	237	247	C₁₀H₂₆NOSi₂
	218	220	236	C₈H₂₀NO₂Si₂
	158	163	167	C₈H₂₀NSi
leucine 2TMS	275	–	–	C₁₂H₂₉NO₂Si₂
	260	265, 266	275, 278	C₁₁H₂₆NO₂Si₂
	232	237	247	C₁₀H₂₆NOSi₂
	218	220	236	C₈H₂₀NO₂Si₂
	158	163	167	C₈H₂₀NSi
lysine 3TMS	362	368	389	C₁₅H₃₈N₂O₂Si₃
	347	353	371	C₁₄H₃₅N₂O₂Si₃
	200	206	209	C₉H₁₈NO₂Si
	174	175	192	C₇H₂₀NSi₂
	156	161	165	C₈H₁₈NSi
lysine 4TMS	434	440	470	C₁₈H₄₆N₂O₂Si₄
	419	425	452	C₁₇H₄₃N₂O₂Si₄
	391	396	424	C₁₆H₄₃N₂OSi₄
	317	322	344	C₁₄H₃₇N₂Si₃
	174	175	192	C₇H₂₀NSi₂
malic acid 3TMS	350	354	377	C₁₃H₃₀O₅Si₃
	335	339	359	C₁₂H₂₇O₅Si₃
	307	311	331	C₁₁H₂₇O₄Si₃
	245	249	260	C₉H₁₇O₄Si₂
	233	236	251	C₉H₂₁O₃Si₂


Table 3.

Fragments of TMS-Derivatized Compounds Part 3

compound	m/z	m/z ¹³C	m/z d₉-TMS	formula
phenylalanine 2TMS	309	–	–	C₁₅H₂₇NO₂Si₂
	294	303	309	C₁₄H₂₄NO₂Si₂
	266	274	281	C₁₃H₂₄NOSi₂
	218	220	236	C₈H₂₀NO₂Si₂
	192	200	201	C₁₁H₁₈NSi
proline 2TMS	259	–	–	C₁₁H₂₅NO₂Si₂
	244	249	259	C₁₀H₂₂NO₂Si₂
	216	220	231	C₉H₂₂NOSi₂
	142	146	151	C₇H₁₆NSi
serine 3TMS	321	–	–	C₁₂H₃₁NO₃Si₃
	306	309	330	C₁₁H₂₈NO₃Si₃
	278	280	302	C₁₀H₂₈NO₂Si₃
	218	220	236	C₈H₂₀NO₂Si₂
	204	206	222	C₈H₂₂NOSi₂
	188	190	203	C₇H₁₈NOSi₂
succinic acid 2TMS	262		280	C₁₀H₂₂O₄Si₂
	247		262	C₉H₁₉O₄Si₂
	172		181	C₇H₁₂O₃Si
threonine 3TMS	335	–	–	C₁₃H₃₃NO₃Si₃
	320	324	344	C₁₂H₃₀NO₃Si₃
	218	221	236	C₉H₂₄NOSi₂
tyrosine 2TMS	325	–	–	C₁₅H₂₇NO₃Si₂
	310	319	325	C₁₄H₂₄NO₃Si₂
	282	290	297	C₁₃H₂₄NO₂Si₂
	208	216	217	C₁₁H₁₈NOSi
	192	200	198	C₁₀H₁₄NOSi
tyrosine 3TMS	397	–	–	C₁₈H₃₅NO₃Si₃
	382	391	406	C₁₇H₃₂NO₃Si₃
	354	362	378	C₁₆H₃₂NO₂Si₃
	280	288	298	C₁₄H₂₆NOSi₂
	218	220	236	C₈H₂₀NO₂Si₂
uracil 2TMS	256	260	274	C₁₀H₂₀N₂O₂Si₂
	241	245	256	C₉H₁₇N₂O₂Si₂
valine 2TMS	261	–	–	C₁₁H₂₇NO₂Si₂
	246	251	261	C₁₀H₂₄NO₂Si₂
	218	220, 222	233, 236	C₉H₂₄NOSi₂


The calculation time for this algorithm depends on the size of the molecule and the maximum cut size. For small molecules like N,O-bis-(trimethylsilyl)-glycine, the run time is in the range of milliseconds, whereas for bigger molecules like (1Z)-O-methyloxime-2,3,4,5,6-pentakis-O-(trimethylsilyl)-glucose, the run time is in the range of seconds on a standard PC.

CONCLUSION

In this article, we present FFC as an algorithm that calculates not only chemical formulas but also the retained atoms of a compound in its mass spectrometric fragment ions. Knowing the correct number and position of specific atoms present in a fragment ion is of great significance for MFA. Although only carbon atoms were tracked in the validation experiment, in theory any element's fate (e.g., nitrogen, sulfur, and hydrogen) can be followed with this algorithm. We provide easy-to-use software with a user-friendly graphical interface. Due to the combinatorial nature of our approach, it is not necessary to model the fragmentation based on a rule set, such as the preferred site of ionization or the bonds most likely to break. This also allows the calculation of chemical formulas for compounds where no similar fragmentation mechanism is known. However, identical structural groups present in the compound of interest (e.g., in alkanes, sugars, or fatty acids) can complicate interpretation because they lead to ambiguous results. To further filter out incorrect formulas, FFC can integrate results of a stable isotope labeling experiment to exclude results that do not fit the labeling pattern. In this article, we showed that this algorithm can be successfully applied to biochemical compounds by identifying the chemical formulas and carbon backbone compositions for a wide range of compounds.

FFC is freely available at http://www.ffc.lu. Installable packages are currently provided for Linux (Debian and Red Hat), Mac OS, and Windows.

Supplementary Material

NIHMS591698-supplement.pdf (1.5 MB, PDF)

ACKNOWLEDGMENTS

The authors gratefully thank Kazunori Sawada and Patrick May for their constructive comments. The authors acknowledge financial support from the Fonds National de la Recherche (FNR). Specifically, K.H. and D.W. are funded by the ATTRACT program Metabolomics Junior Group, A.W. is supported by the AFR Grant 1328318, and S.C.S. is supported by the HICE virtual institute.

Footnotes

The authors declare no competing financial interest.

ASSOCIATED CONTENT

Supporting Information

Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.

REFERENCES

1. Antoniewicz MR. Comprehensive Analysis of Metabolic Pathways Through the Combined Use of Multiple Isotopic Tracers. Ph.D. thesis. Massachusetts Institute of Technology; 2006.
2. Sauer U. Mol. Syst. Biol. 2006;2:62. doi: 10.1038/msb4100109.
3. Antoniewicz MR, Kelleher JK, Stephanopoulos G. Anal. Chem. 2007;79:7554–7559. doi: 10.1021/ac0708893.
4. Antoniewicz MR, Kelleher JK, Stephanopoulos G. Metab. Eng. 2007;9:68–86. doi: 10.1016/j.ymben.2006.09.001.
5. Villas-Boas SG, Moxley JF, Akesson M, Stephanopoulos G, Nielsen J. Biochem. J. 2005;388:669–677. doi: 10.1042/BJ20041162.
6. Nöh K, Grönke K, Luo B, Takors R, Oldiges M, Wiechert W. J. Biotechnol. 2007;129:249–267. doi: 10.1016/j.jbiotec.2006.11.015.
7. Metallo CM, Gameiro PA, Bell EL, Mattaini KR, Yang J, Hiller K, Jewell CM, Johnson ZR, Irvine DJ, Guarente L, Kelleher JK, Vander Heiden MG, Iliopoulos O, Stephanopoulos G. Nature. 2012;481:380–384. doi: 10.1038/nature10602.
8. Wegner A, Cordes T, Michelucci A, Hiller K. Curr. Biotechnol. 2012;1:88–97.
9. Niklas J, Priesnitz C, Rose T, Sandig V, Heinzle E. Appl. Microbiol. Biotechnol. 2012;93:1637–1650. doi: 10.1007/s00253-011-3526-6.
10. McGuirk S, Gravel S-P, Deblois G, Papadopoli DJ, Faubert B, Wegner A, Hiller K, Avizonis D, Akavia UD, Jones RG, Giguère V, St-Pierre J. Cancer Metab. 2013;1:22. doi: 10.1186/2049-3002-1-22.
11. Michelucci A, Cordes T, Ghelfi J, Pailot A, Reiling N, Goldmann O, Binz T, Wegner A, Tallam A, Rausell A, Buttini M, Linster CL, Medina E, Balling R, Hiller K. Proc. Natl. Acad. Sci. U.S.A. 2013;110:7820–7825. doi: 10.1073/pnas.1218599110.
12. Hiller K, Metallo CM, Kelleher JK, Stephanopoulos G. Anal. Chem. 2010;82:6621–6628. doi: 10.1021/ac1011574.
13. McLafferty FW, Turecek F. J. Chem. Educ. 1994;71:A54.
14. HighChem Mass Frontier 7.0. HighChem; Bratislava, Slovakia: 2011.
15. Wolf S, Schmidt S, Müller-Hannemann M, Neumann S. BMC Bioinf. 2010;11:148. doi: 10.1186/1471-2105-11-148.
16. Heinonen M, Rantanen A, Mielikäinen T, Kokkonen J, Kiuru J, Ketola RA, Rousu J. Rapid Commun. Mass Spectrom. 2008;22:3043–3052. doi: 10.1002/rcm.3701.
17. Stein SE, Scott DR. J. Am. Soc. Mass Spectrom. 1994;5:859–866. doi: 10.1016/1044-0305(94)87009-8.
18. Fernandez CA, Des Rosiers C, Previs SF, David F, Brunengraber H. J. Mass Spectrom. 1996;31:255–262. doi: 10.1002/(SICI)1096-9888(199603)31:3<255::AID-JMS290>3.0.CO;2-3.
19. Hiller K, Hangebrauk J, Jäger C, Spura J, Schreiber K, Schomburg D. Anal. Chem. 2009;81:3429–3439. doi: 10.1021/ac802689c.
20. Hiller K, Wegner A, Weindl D, Cordes T, Metallo CM, Kelleher JK, Stephanopoulos G. Bioinformatics. 2013;29:1226–1228. doi: 10.1093/bioinformatics/btt119.
21. Wegner A, Sapcariu SC, Weindl D, Hiller K. Anal. Chem. 2013;85:4030–4037. doi: 10.1021/ac303774z.
22. Dezső B, Jüttner A, Kovács P. Electron. Notes Theor. Comput. Sci. 2011;264:23–45.
