Calculation of the Isotope Cluster for Polypeptides by Probability Grouping

Matthew T Olson; Alfred L Yergey

doi:10.1016/j.jasms.2008.10.007

Author manuscript; available in PMC: 2014 Sep 8.

Published in final edited form as: J Am Soc Mass Spectrom. 2008 Oct 17;20(2):295–302. doi: 10.1016/j.jasms.2008.10.007

PMCID: PMC4157305 NIHMSID: NIHMS620260 PMID: 19026561

Abstract

This paper presents a novel theoretical basis for accurately calculating the isotope cluster of polypeptides. In contrast to previous approaches to this problem, which consider exhaustive or near exhaustive combinations of isotopic species, the program, Neutron Cluster, groups probabilities to yield highly accurate information without elucidating any fine structure within a nominal mass unit. This is a fundamental difference from any previously described algorithm for calculating the isotope cluster. As a result of this difference, the accurate isotope clusters for high molecular weight polypeptides can be calculated rapidly without any pruning. When applied to isotope enriched polypeptides, the algorithm introduces “grouping error”, which is described, quantified, and avoided by using probability partitioning.

Proper interpretation of mass spectrometric data requires some understanding of the isotope cluster for the analyte. When considering analysis of peptides and proteins, prediction of the expected isotope cluster facilitates the recognition of contamination and potential instrumental malfunction [1–3]. Prediction of the change in the isotope cluster due to enrichment leads to identification of enriched molecules, a method on which mass spectrometric quantification and quantitative proteomic discovery rely [4–13]. Finally, accurate calculation of the isotope cluster may enable better Fourier transform deconvolution schemes for complex electrospray envelopes of macromolecules [14–18].

At the most basic level, the isotope cluster is a distribution of neutrons among the various elements present in a molecule. To date, the intensities of peaks in the isotope cluster have been estimated by the enumeration of the isotopic contributions of each element, with pruning introduced for practicality. The most classical examples of such an approach are the polynomial algorithms [19–23], first described by Yergey [23], which combine polynomial distributions for all the atoms in a molecule. Separately, Hsu described a calculation of the isotope cluster by means of convoluting multiple large Diophantine equations [24], and Kubinyi [25] introduced the concept of “splitting” isotopic distributions to yield a final cluster. These two methods are notable because they provide theoretical advantages in preserving the abundance accuracy of the cluster. However, since the number of combinations for large molecules precludes infinitely resolved exhaustive enumeration by any means, the convolution algorithms, either by polynomial expansion, peak splitting, or large elemental Diophantine equations, must implement probability thresholds to prune the number of species to satisfy computational time and memory demands.

Recently, dynamic programming [26] and Fourier transform [13, 27] methods have emerged as powerful tools for calculating fine isotopic structure. Since the fine isotopic structure obtained from these comprehensive algorithms is beyond the resolution of most contemporary instruments, the details are generally combined into nominal mass bins that resemble real measurements. The implementation of dynamic programming, as Snider [26] has done, offers the best means introduced so far for obtaining highly accurate predictions of the isotope cluster. This algorithm represents a genuine advancement in the field but, in its presently available rendering, it faces a fundamental problem similar to the polynomial algorithms since highly resolved isotopic combinations or “states” must be elaborated before generating a theoretical spectrum. While this is a desirable feature for users who may be interested in fine isotopic structure, it calculates more detail than is necessary for a theoretical spectrum that can be subsequently compared with data from a standard instrument.

To date, accurate determination of peak intensities has entailed more robust strategies for handling species enumeration. In contrast, this paper introduces a novel approach, which calculates the accurate probability for each number of neutrons added to the monoisotopic composition. At the outset, the algorithm—Neutron Cluster—calculates global neutronic probabilities rather than individual atomic distributions or isotopic combinations. Such probability grouping greatly simplifies the calculations and completely eliminates the need for pruning. It is the first algorithm described to date that uses grouped probabilities to obviate time- and memory-consuming species elucidation. As will be shown below, this approach calculates accurate intensities in nominal mass bins and yields helpful information towards elucidating fine isotopic structure after, rather than before, generating a theoretical spectrum.

Theory

Neutronic Distribution

Isotopes and equatransneutronic isotopes

For each element, E, each isotope, A, occurs at a natural abundance, P_{A_E}, which has been measured by other methods. These natural abundances are normalized for all the isotopes of each element:

\sum P_{A_{E}} = 1

(1)

To calculate the isotope cluster, which is the distribution of neutrons in a molecule, it is necessary to define the concept of equatransneutronic (ETN) isotopes, groups of isotopes differing from their element's most abundant isotope by the same number of neutrons. For the following derivation, isotopes in ETN group G_x differ from the most abundant isotope by d_{G_x}. For example, peptides, which contain C, H, N, O, and S, have 3 ETN groups: d_G = 1, isotopes with 1 additional neutron (¹³C, ²H, ¹⁵N, ¹⁷O, ³³S); d_G = 2, those with 2 additional neutrons (¹⁸O, ³⁴S), and one isotope with d_G = 4 additional neutrons (³⁶S).
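
The grouping described above can be sketched in a few lines of Python. This is only an illustration, not the authors' Perl implementation: the function name `etn_groups` and the two dictionaries are hypothetical, and the mass numbers are the ones listed in Table 1.

```python
from collections import defaultdict

# Mass numbers as listed in Table 1 (most abundant isotope per element,
# then the heavier stable isotopes of the same element).
MOST_ABUNDANT = {"H": 1, "C": 12, "N": 14, "O": 16, "S": 32}
HEAVY_ISOTOPES = {"H": [2], "C": [13], "N": [15], "O": [17, 18], "S": [33, 34, 36]}


def etn_groups(most_abundant, heavy_isotopes):
    """Map each neutron offset d_G to the (element, mass number) pairs in that ETN group."""
    groups = defaultdict(list)
    for element, mass_numbers in heavy_isotopes.items():
        for a in mass_numbers:
            # d_G is the number of neutrons added relative to the most abundant isotope.
            groups[a - most_abundant[element]].append((element, a))
    return dict(groups)


groups = etn_groups(MOST_ABUNDANT, HEAVY_ISOTOPES)
for d_g in sorted(groups):
    print(d_g, groups[d_g])
```

For C, H, N, O, and S this yields the three groups named in the text, with five, two, and one member respectively.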

Elemental, isotopic, and equatransneutronic probabilities

The abundances of elements in a molecule are known from the molecular formula, so the probability, P_Emolecule, that a randomly selected atom is element E in a molecule having N atoms of which N_E are element E can be calculated as follows.

P_{E\,\mathrm{molecule}} = \frac{N_{E}}{N}

(2)

Thus, the probability P_{A_E}_molecule that a randomly selected atom in the molecule is the isotope A_E is calculated in a straightforward manner:

P_{A_{E}\,\mathrm{molecule}} = P_{E\,\mathrm{molecule}} P_{A_{E}} = \frac{N_{E} P_{A_{E}}}{N}

(3)

Further, the probability, P_{ETN_Gx}, of selecting an atom from the molecule, which is a member of a particular ETN group G_x is obtained from summing the probabilities of each P_{A_E}_molecule for each isotope in the ETN group. If E_i is the designation of an element of which isotope ^AE_i is a member of ETN group G_x, and j is the number of elements in the ETN group, then this probability is represented as follows.

P_{\mathrm{ETN}_{G_{x}}} = \sum_{\mathrm{ETN}_{G_{x}}} P_{A_{E}\,\mathrm{molecule}} = \sum_{i = 1}^{j} \frac{N_{E_{i}} P_{A_{E_{i}}}}{N}

(4)

Because each ETN group contains atoms that have the same difference, d_{G_x}, in the number of neutrons from the most abundant isotope, P_{ETN_{G_x}} is the probability that such a difference in neutrons will be present in a randomly selected atom of the molecule. The probabilities for the individual ETN groups will be combined into absolute probabilities that a molecule contains a specific number of extra neutrons.
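
As a rough numerical illustration of eqs 2 through 4, the sketch below computes the ETN probabilities for bovine insulin (the formula used later in Table 2). The abundance values are approximate natural abundances supplied here as an assumption; they are not taken from the paper, and the function name `etn_probabilities` is hypothetical.

```python
# Approximate natural abundances of the heavy isotopes.
# ASSUMPTION: illustrative values, not the exact NIST figures used by the authors.
ABUNDANCE = {
    ("C", 13): 0.0107, ("H", 2): 0.000115, ("N", 15): 0.00364,
    ("O", 17): 0.00038, ("O", 18): 0.00205,
    ("S", 33): 0.0075, ("S", 34): 0.0425, ("S", 36): 0.0001,
}
MOST_ABUNDANT = {"H": 1, "C": 12, "N": 14, "O": 16, "S": 32}


def etn_probabilities(formula):
    """Eq 4: P_ETN for each neutron offset d_G, given a formula {element: atom count}."""
    n_total = sum(formula.values())  # N, the total number of atoms
    p_etn = {}
    for (element, a), abundance in ABUNDANCE.items():
        d_g = a - MOST_ABUNDANT[element]
        # Eq 3 for one isotope, accumulated over the ETN group (eq 4).
        p_etn[d_g] = p_etn.get(d_g, 0.0) + formula.get(element, 0) * abundance / n_total
    return p_etn


bovine_insulin = {"C": 254, "H": 377, "N": 65, "O": 75, "S": 6}
p_etn = etn_probabilities(bovine_insulin)
for d_g in sorted(p_etn):
    print(d_g, p_etn[d_g])
```

Individual isotopes enter the calculation only here; every later step works with one probability per ETN group.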

The Diophantine equation^1,2 for obtaining combinations of equatransneutronic groups

To calculate the desired probability, namely the probability that a molecule differs by n neutrons from the monoisotopic composition, it is necessary to consider all the combinations of the ETN groups that can yield n and to sum all of those contributions. Since d_{G_x}, the neutronic difference between an ETN group and the monoisotopic molecule, must be a positive integer, and k_x, the number of ETN group G_x members in the molecule, must be a non-negative integer, the equation for n is the following Diophantine equation, where f is the number of ETN groups considered.

n = k_{1} d_{G_{1}} + k_{2} d_{G_{2}} + \dots + k_{f} d_{G_{f}}

(5)

The solutions for k₁ . . . k_f to this Diophantine equation represent all the combinations of ETN groups that can yield the addition of n neutrons to the monoisotopic composition. Since the Diophantine equation is dependent only on the ETN groups that are being considered and not on the actual elements or isotopes in those groups, the equation can be solved once a priori and stored in a lookup table. The Diophantine can be set up and solved easily for any desired combination of ETN groups and any number of neutrons. For this paper, which focuses on polypeptides, the Diophantine was solved for d_G = {1,2,4}, and for n = {1 . . . 500}. As will be shown below, this simple lookup table is sufficient to calculate the correct isotope cluster for proteins > 500 kDa.
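
A minimal sketch of how such a lookup table might be built is shown below. The recursive enumerator `diophantine_solutions` is a hypothetical helper written for this illustration, not the authors' code; it returns every tuple of non-negative integers (k₁, ..., k_f) that satisfies eq 5.

```python
def diophantine_solutions(n, offsets=(1, 2, 4)):
    """Return all tuples (k_1, ..., k_f) of non-negative integers with sum(k_x * d_x) == n."""
    if not offsets:
        # No groups left: only an exact remainder of zero counts as a solution.
        return [()] if n == 0 else []
    d, rest = offsets[-1], offsets[:-1]
    solutions = []
    for k in range(n // d + 1):
        # Fix k members of the last group, then solve for the remaining groups.
        for head in diophantine_solutions(n - k * d, rest):
            solutions.append(head + (k,))
    return solutions


# The table depends only on the offsets, so it can be computed once and reused
# for any molecule. A small range is used here to keep the example fast.
lookup = {n: diophantine_solutions(n) for n in range(1, 51)}
print(lookup[1])
print(lookup[2])
print(len(diophantine_solutions(337)))
```

For n = 337 this enumeration yields 7225 combinations, the same count quoted for the most abundant dynein peak in the Results.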

Combining the contributions of equatransneutronic groups to the number of neutrons

The solutions to the Diophantine are sets of integers which, when substituted into the equation, yield a desired n. For example, if d_G = {1,2,4}, and n = 1, then there is only one solution, {k₁ = 1; k₂ = 0; k₃ = 0}. However, for the same d_G if n = 2, then there are two solutions: {k₁ = 2; k₂ = 0; k₃ = 0} and {k₁ = 0; k₂ = 1; k₃ = 0}. The probability P_solution that one of these solutions will occur is the product of the probabilities P_{k_x} that atoms from each ETN group G_x will occur k_x times.

P_{s o l u t i o n} = \prod_{x = 1}^{f} P_{k_{x}}

(6)

The probability P_{k_x} can be calculated by the binomial formula, where N is the total number of atoms and P_{ETN_{G_x}} is calculated in eq 4.

P_{k_{x}} = \frac{N!}{k_{x}! (N - k_{x})!} P_{\mathrm{ETN}_{G_{x}}}^{k_{x}} {(1 - P_{\mathrm{ETN}_{G_{x}}})}^{N - k_{x}}

(7)

While the binomial distribution is used here to calculate P_{k_x}, this algorithm differs substantially from other algorithms that use the binomial distribution to calculate the isotope cluster because the values for each variable in the equation are fundamentally different. As a consequence, this algorithm uses the binomial formula fewer times, and with smaller integers, than other algorithms. These differences are readily apparent from the above derivation and will be discussed below.

Calculating the abundance of a peak in the cluster

Equations 6 and 7 yield the probability that a particular combination of ETN groups, with a total addition of n neutrons to the monoisotopic composition, is present in the molecule. The abundance of the n^th peak in the isotope cluster, P_n, is the sum of probabilities for each of the t combinations of ETN groups that add n neutrons to the monoisotopic composition. The abundance of the n^th peak in the cluster is shown in eq 8.

P_{n} = \sum_{s = 1}^{t} \prod_{x = 1}^{f} P_{k_{x_{s}}} = \sum_{s = 1}^{t} \prod_{x = 1}^{f} \frac{N!}{k_{x_{s}}! (N - k_{x_{s}})!} P_{\mathrm{ETN}_{G_{x}}}^{k_{x_{s}}} {(1 - P_{\mathrm{ETN}_{G_{x}}})}^{N - k_{x_{s}}}

(8)

Since there is no permutation threshold, P_n can be calculated correctly and rapidly for any number of neutrons for any molecule. For practical reasons, the algorithm allows the user to specify the fraction of the total isotope cluster to be characterized (e.g., 0.999999), and the algorithm stops once enough peaks have been calculated to account for that fraction of the cluster. This fraction is merely an indication of how much of the cluster is desired; it is not a permutation or abundance threshold and thus does not change the run time significantly.
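
Equations 5 through 8, together with the stopping fraction, can be sketched as follows. This is a hedged illustration rather than the published Perl program: the function names are hypothetical, the binomial term is evaluated in log space with `math.lgamma` instead of arbitrary-precision factorials, and the ETN probabilities for bovine insulin (N = 777 atoms) are derived from approximate natural abundances that are assumed here, not taken from the paper.

```python
import math


def solutions(n, offsets):
    """All tuples of non-negative integers k with sum(k_x * d_x) == n (eq 5)."""
    if not offsets:
        return [()] if n == 0 else []
    d, rest = offsets[-1], offsets[:-1]
    return [head + (k,)
            for k in range(n // d + 1)
            for head in solutions(n - k * d, rest)]


def log_binomial(k, n_atoms, p):
    """Natural log of the binomial term in eq 7 (requires 0 < p < 1)."""
    return (math.lgamma(n_atoms + 1) - math.lgamma(k + 1) - math.lgamma(n_atoms - k + 1)
            + k * math.log(p) + (n_atoms - k) * math.log1p(-p))


def peak_abundance(n, n_atoms, p_etn):
    """Eq 8: sum over Diophantine solutions of the product of binomial terms."""
    offsets = tuple(sorted(p_etn))
    total = 0.0
    for ks in solutions(n, offsets):
        if max(ks) > n_atoms:
            continue  # cannot select more atoms than the molecule contains
        total += math.exp(sum(log_binomial(k, n_atoms, p_etn[d])
                              for k, d in zip(ks, offsets)))
    return total


def cluster(n_atoms, p_etn, fraction=0.999999):
    """Add peaks n = 0, 1, 2, ... until the requested fraction of the cluster is covered."""
    peaks, covered = [], 0.0
    max_n = n_atoms * max(p_etn)  # hard upper bound on the number of added neutrons
    for n in range(max_n + 1):
        peaks.append(peak_abundance(n, n_atoms, p_etn))
        covered += peaks[-1]
        if covered >= fraction:
            break
    return peaks


# ASSUMPTION: P_ETN values for bovine insulin computed from approximate abundances.
N_ATOMS = 777
P_ETN = {1: 3.071255 / N_ATOMS, 2: 0.40875 / N_ATOMS, 4: 0.0006 / N_ATOMS}
peaks = cluster(N_ATOMS, P_ETN)
for n, p_n in enumerate(peaks[:6]):
    print(n, round(p_n, 3))
```

Because the fraction only decides where the loop stops, no ETN combination is ever discarded within a peak.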

Multiplicity

If each of the f ETN groups contain j_x equatransneutronic isotopes, and k_x is the number of atoms from ETN group G_x in solution s of the Diophantine, then the total multiplicity, M, for the n^th peak in the isotope cluster is given in eq 9.

M_{n} = \sum_{s = 1}^{t} \prod_{x = 1}^{f} \binom{j_{x} + k_{x_{s}} - 1}{j_{x} - 1}

(9)

Since the algorithm here does not calculate individual atomic combinations, it does not implement or need permutation or abundance thresholds. Thus, prima facie, multiplicity is irrelevant. As will be discussed below, multiplicity is calculated for two reasons: (1) to determine the contribution of pruning error in the difference between this algorithm and others, and (2) to determine the number of combinations that would exist if fine isotopic combinations were desired.
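
The multiplicity in eq 9 can be checked numerically with a short sketch. The helper names are hypothetical, and the values of j_x (5, 2, and 1 isotopes in the d_G = 1, 2, and 4 groups) are counted from Table 1. As in eq 9, the count does not cap the number of atoms available for any one element.

```python
from math import comb

# j_x: number of isotopes in each ETN group for C, H, N, O, S (counted from Table 1).
ISOTOPES_PER_GROUP = {1: 5, 2: 2, 4: 1}


def solutions(n, offsets):
    """All tuples of non-negative integers k with sum(k_x * d_x) == n (eq 5)."""
    if not offsets:
        return [()] if n == 0 else []
    d, rest = offsets[-1], offsets[:-1]
    return [head + (k,)
            for k in range(n // d + 1)
            for head in solutions(n - k * d, rest)]


def multiplicity(n, isotopes_per_group=ISOTOPES_PER_GROUP):
    """Eq 9: number of isotopic species hidden inside the n-th neutronic peak."""
    offsets = tuple(sorted(isotopes_per_group))
    total = 0
    for ks in solutions(n, offsets):
        term = 1
        for k, d in zip(ks, offsets):
            j = isotopes_per_group[d]
            # Ways to choose k atoms from j isotope types, with repetition allowed.
            term *= comb(j + k - 1, j - 1)
        total += term
    return total


print([multiplicity(n) for n in range(7)])
```

The first values agree with the NC (M) column of Table 2 for n = 0 through 6.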

Neutronic Mass

Each ETN group contains isotopes with different nuclear binding energies. Table 1 lists the average mass per neutron between ETN isotopes and the most abundant isotope for each of the three ETN groups that are encountered in a peptide. Equation 7 yields P_{k_x}, the contribution of each ETN combination to the abundance of the n^th peak in the isotope cluster. Equation 10 incorporates this probability into a weighted average that yields the average mass difference between the n^th peak in the isotope cluster and the mass of the monoisotopic composition.

\Delta\mathrm{mass} = n \bar{m}_{n} = n \left( \frac{\sum_{x = 1}^{f} \left( \bar{m}_{n_{x}} \sum_{s = 1}^{t} P_{k_{x_{s}}} \right)}{P_{n}} \right)

(10)

The total mass of the n^th peak in the cluster is simply the monoisotopic mass plus the mass change as calculated in eq. 10.

Table 1.

The elements (E_i), isotopes (^AE), exact mass, and the average mass of an added neutron for each of the ETN groups encountered in a polypeptide

d_G	E_i	^AE	Mass (Da)	ΔMass/neutron (Da)	Average ΔMass/neutron (Da)
0	H	1	1.007825	–	–
0	C	12	12.00000	–	
0	N	14	14.00307	–	
0	O	16	15.99491	–	
0	S	32	31.97297	–	
1	H	2	2.014102	1.006277	1.00188
1	C	13	13.00335	1.003354	
1	N	15	15.00011	0.997038	
1	O	17	16.99913	1.004220	
1	S	33	32.97146	0.99849	
2	O	18	17.99916	1.002125	0.99979
2	S	34	33.96787	0.99745	
4	S	36	35.96708	0.99853	0.99853

Enrichment

The equations and algorithm up to this point have been developed under the assumption that ETN groups share a similar distribution. While it will be shown below that such an assumption succeeds well with natural abundances for C, H, N, O, and S, enriched isotopes of these elements or elements with a different natural distribution than those commonly present in organic molecules (for example Sn or Hg) cannot be included in the same ETN group without the introduction of “grouping error,” which will be illustrated later. To avoid grouping error with these distributions, the algorithm groups similar distributions and partitions divergent ones. The treatment for a single distribution is shown above, and the different distributions are combined in a straightforward manner. If distribution D₁ has b peaks from the monoisotopic to the b^th peak, and distribution D₂ has c peaks from the monoisotopic to the c^th peak, then the final distribution D’ is calculated by multiplying each intensity in D₁ by each intensity in D₂. The total number of neutrons added to the monoisotopic composition must be less than bc since D₁ and D₂ contain overlapping numbers of neutrons.
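
One way to read the combination step is as a discrete convolution: each pairwise product of intensities is added to the peak whose neutron count is the sum of the two contributing neutron counts. The sketch below follows that reading; the function name `combine` and the toy distributions are hypothetical and serve only as an illustration.

```python
def combine(d1, d2):
    """Combine two distributions indexed by added neutrons (index 0 = monoisotopic)."""
    out = [0.0] * (len(d1) + len(d2) - 1)
    for i, a in enumerate(d1):
        for j, b in enumerate(d2):
            # i neutrons from d1 and j from d2 land on peak i + j.
            out[i + j] += a * b
    return out


# Toy example: a partitioned, highly enriched two-atom distribution combined
# with a grouped distribution for the rest of the molecule (made-up values).
partitioned = [0.0001, 0.0198, 0.9801]
grouped = [0.90, 0.09, 0.01]
combined = combine(partitioned, grouped)
print(combined)
print(sum(combined))
```

In this sketch the combined list has `len(d1) + len(d2) - 1` entries, and it sums to unity whenever both inputs do.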

Graphical Output

While the algorithm only calculates mass and intensity for peaks in the cluster that correspond to additional neutrons, a graphical representation or “theoretical spectrum” is often desired to compare predicted results to measurements. For this paper, the graphical output is rendered by a Gaussian approximation for the combination of instrumental and combinatorial variance around the neutronic mass. Each peak is used to generate a Gaussian bell, where the resolution, R, and the peak mass, m, are converted into the Gaussian standard deviation σ by eq 11.

\sigma = \frac{m}{2 R \sqrt{2 \ln 2}}

(11)

Values obtained from eq 11 are placed into appropriately spaced bins and graphed on any commercially available graphing software.
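
The rendering step might look like the following sketch. The function names, the bin spacing, and the choice to scale each Gaussian so that its apex equals the peak intensity are assumptions made for illustration; the three peaks are the first rows of Table 2.

```python
import math


def sigma_from_resolution(m, resolution):
    """Eq 11: convert peak mass m and resolution R (m / FWHM) into a Gaussian sigma."""
    return m / (2.0 * resolution * math.sqrt(2.0 * math.log(2.0)))


def render(peaks, resolution, mass_min, mass_max, step):
    """Sum one Gaussian per (mass, intensity) peak onto an evenly spaced mass axis."""
    n_bins = int(round((mass_max - mass_min) / step)) + 1
    axis = [mass_min + i * step for i in range(n_bins)]
    signal = [0.0] * n_bins
    for mass, intensity in peaks:
        s = sigma_from_resolution(mass, resolution)
        for i, x in enumerate(axis):
            # ASSUMPTION: the apex height equals the absolute intensity of the peak.
            signal[i] += intensity * math.exp(-0.5 * ((x - mass) / s) ** 2)
    return axis, signal


# First three peaks of bovine insulin from Table 2 (mass in Da, absolute abundance).
PEAKS = [(5729.606, 0.030), (5730.610, 0.092), (5731.613, 0.160)]
axis, signal = render(PEAKS, 1.0e4, 5729.0, 5732.5, 0.01)
print(len(axis), max(signal))
```

The resulting `axis` and `signal` lists can be passed to any plotting tool.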

Computational Features and Requirements

The algorithm above is implemented in Perl and is available in the supplemental material. The CPAN Perl modules BigInt and BigFloat were used for computational accuracy on factorial values above and exponential values below the validated computational range for Perl. The calculations in this paper run the BigFloat module with a precision of six significant figures, although greater precision can be introduced if desired. All run times discussed below are on a MacBook with a 2.16 GHz Intel Core 2 Duo processor and 4 GB RAM.

Results and Discussion

Comparison with Existing Algorithms

To compare existing algorithms to the one described here, results for a published peptide, bovine insulin, with the elemental composition C₂₅₄H₃₇₇N₆₅O₇₅S₆, were analyzed using IsoPro, a widely available embodiment of the standard polynomial algorithm, and IsoDalton [26], a well-implemented open source dynamic programming algorithm. These programs were operated with reasonable parameters: a 10⁻⁶ permutation threshold for IsoPro and a 10⁻⁸ state cut-off for IsoDalton. The abundances were all set to the same values as defined by NIST. Comparisons of intensity and multiplicity are shown in Table 2. The minor (1.1%) negative difference between Neutron Cluster and IsoDalton at the [M + 1] peak is a result of using absolute abundances, which add to unity; because IsoDalton's intensities are lower than those obtained in Neutron Cluster for the addition of more neutrons, the intensity of the [M + 1] peak is slightly higher. IsoPro does not normalize intensities to unity, and the effect of pruning error is seen dramatically in the higher mass peaks.

Table 2.

Comparison of absolute abundance (A) and multiplicity (M) for the isotope cluster obtained involving n neutrons from this algorithm, Neutron Cluster (NC), and from IsoDalton (ID) and IsoPro (IP) for bovine insulin (C₂₅₄H₃₇₇N₆₅O₇₅S₆)

	Neutron Cluster			IsoDalton				IsoPro
n	Mass (Da)	NC (A)	NC (M)	ID (A)	ID (M)^*	NC-ID (%A)	NC-ID (M)	Mass NC-IP (ppm)	IP (A)	IP (M)^*	NC-IP (%A)	NC-IP (M)
0	5729.606	0.030	1	0.030	1	0.0	0	1.0	0.029	1	3.3	0
1	5730.610	0.092	5	0.093	5	–1.1	0	1.2	0.092	5	0	0
2	5731.613	0.160	17	0.160	17	0.0	0	1.3	0.16	16	0	1
3	5732.617	0.190	45	0.190	45	0.0	0	1.7	0.19	33	0	12
4	5733.617	0.180	104	0.180	104	0.0	0	1.4	0.18	57	0	47
5	5734.622	0.140	216	0.140	216	0.0	0	2.0	0.14	81	0	135
6	5735.624	0.097	416	0.096	416	1.0	0	2.0	0.097	107	0	309
7	5736.629	0.059	752	0.059	752	0.0	0	2.7	0.059	121	0	631
8	5737.629	0.033	1294	0.032	1294	3.0	0	2.4	0.032	132	3	1162
9	5738.634	0.017	2134	0.016	2134	5.9	0	3.1	0.016	131	5.9	2003
10	5739.635	0.0080	3398	0.0074	3396	7.5	2	3.0	0.0073	112	8.8	3286
11	5740.640	0.0035	5246	0.0032	5246	8.6	0	3.7	0.0030	92	14.3	5154
12	5741.640	0.0015	7888	0.0013	7884	13.3	4	3.4	0.0011	66	26.7	7822
13	5742.645	0.00057	11584	0.0005	11564	12.3	20	4.2	0.0003	28	47.3	11556
14	5743.646	0.00022	16664	0.0002	16592	9.1	72	4.0	0.0001	11	54.5	16653
15	5744.651	0.00010	23528	0.0001	23358	0.0^**	170	–	–	0	–	23528

^* Because IsoDalton does not produce multiplicity in its standard output, the numbers here are maximal estimates for multiplicity, which were generated from NC by removing the contribution to multiplicity for all ETN combinations that were below ID's threshold of 1 × 10^–8; an ETN combination represents at least 1 isotopic species, so the true multiplicity for ID is at most the number here. Thus, differences in multiplicity between ID and NC are minimal estimates.

^** No difference can be calculated within the single significant figure that ID yields for this n.

As described elsewhere [20, 25], pruning has an enormous effect on multiplicity. The results in Table 2 reiterate previous findings. The approach described differs fundamentally from other isotope cluster algorithms because neutron clustering allows for the calculation of the exact probability of adding neutrons to a molecule without elucidating every possible atomic location of these neutrons. Thus, multiplicity is a calculated rather than counted value, and all the low abundance species are intrinsic to the probability of adding a certain number of neutrons. While the algorithm avoids calculating the fine isotopic structure altogether, it yields the probability of each equatransneutronic combination, and this data could be used to calculate fine isotopic structure if desired. Such computations will be examined in a later paper.

Calculation of High Molecular Weight Isotope Clusters

As an example of the algorithm's function at high molecular weight, 0.9999 of the isotope cluster for human dynein heavy chain 1, with the molecular formula C₂₃₈₃₂H₃₇₈₁₆N₆₅₂₈O₇₀₃₁S₁₇₀, a monoisotopic mass of 533403.57 Da, and an average mass of 533741.15 Da, was calculated without any pruning in 39.95 s. This entailed the addition of 400 neutrons; when plotted on a linear intensity scale, as in Figure 1a, the low end of the visible portion of the cluster is at 286 Da above the monoisotopic mass and spans 98 Da. Figure 1b shows the same mass range on a logarithmic intensity scale. Comparison of these two figures highlights the low abundance of all peaks in the isotope cluster; the highest peak in the cluster, shown in Figure 1c, has an absolute intensity of 0.0206, and the monoisotopic peak, shown in Figure 1d, has an abundance of 4.0 × 10^–136. These figures demonstrate that the range of absolute intensities spanned by the full isotope cluster of a high molecular weight protein requires an algorithm that does not have any cutoffs or thresholds. For example, if the most abundant peak, the 337 neutron peak, were pruned to only include equatransneutronic combinations with probabilities greater than 10⁻⁸, only 105 out of the 7225 possible equatransneutronic combinations would remain. An error of ~3% would result on this peak alone.

Figure 1. The theoretical spectrum for the isotope cluster for the microtubule associated protein dynein heavy chain 1 (accession KIAA0325) with a monoisotopic mass of 533403.57 Da and a resolution of 10⁵. The cluster seen here explains 0.9999 of the absolute cluster. With absolute abundance plotted on a linear scale (a), only values close to the average mass of 533741.15 appear, while the same data plotted on a logarithmic scale (b) demonstrate the very small peaks that lead up to the most abundant peaks. A close view of the most abundant peaks (c) can be achieved on a linear scale. The monoisotopic and immediately subsequent peaks are rendered accurately by the algorithm and can be seen plotted on a low intensity logarithmic scale (d).

Computational time for isotope cluster calculations has remained a significant hurdle in the routine determination of accurate intensities for high molecular weight polypeptides, principally because the probabilities for individual atomic or isotopic combinations had to be elucidated before the intensity of the peak could be calculated. This algorithm presents a means of calculating the peaks while considering individual isotopes only once: in the calculation that obtains the global equatransneutronic probabilities (eq 4). For peptides, the number of equatransneutronic combinations is much less than the number of isotopic, and even atomic, combinations, so the algorithm presents a significant time advantage, as seen in Table 3. As stated before, these run times represent the time required to calculate the cluster without any pruning.

Table 3.

Run times needed for the algorithm to calculate the unpruned isotope cluster of some biologically relevant polypeptides. In the case of Angiotensin II, which has no sulfur, only 2 ETN groups were necessary, so Diophantine solutions in 2 variables (d_G = {1,2}) were used

Accession	Common name	Molecular formula	Monoisotopic mass (Da)	Time (s)
–	Angiotensin II	C₅₀H₇₁N₁₃O₁₂	1045.534	0.004
550085A	Bovine insulin	C₂₅₄H₃₇₇N₆₅O₇₅S₆	5729.606	0.009
AAA59179	Human insulin	C₅₂₀H₈₁₇N₁₃₉O₁₄₇S₈	11616.855	0.013
P02144	Human myoglobin	C₇₇₄H₁₂₂₄N₂₁₀O₂₂₂S₅	17172.957	0.017
P27352	Human intrinsic factor	C₂₀₂₃H₃₂₀₈N₅₂₄O₆₁₉S₂₀	45387.020	0.109
P02769	Bovine serum albumin	C₂₉₃₄H₄₆₁₅N₇₈₁O₈₉₇S₃₉	66389.890	0.151
P05023	Human Na/K ATPase, Renal isoform, subunit	C₅₀₄₇H₈₀₁₄N₁₃₃₈O₁₄₉₅S₄₈	112823.910	0.787
Q8WWZ7	Human ATP binding cassette protein	C₈₅₇₄H₁₃₃₇₈N₂₀₉₂O₂₃₉₂S₇₇	186386.849	3.056
O60494	Human intrinsic factor-hydroxocobalamin receptor	C₁₇₆₀₀H₂₆₄₇₄N₄₇₅₂O₅₄₈₆S₁₉₇	398470.499	20.174
KIAA0325	Human dynein heavy chain	C₂₃₈₃₂H₃₇₈₁₆N₆₅₂₈O₇₀₃₁S₁₇₀	533403.568	39.950

Since each neutronic peak in the isotope cluster represents an enormous multiplicity of isotopic species, and the mass assigned to each peak is the weighted average of the equatransneutronic groups, some mass error is expected in the mass assignments of these peaks. One way to estimate this error is to compare the mass assigned to the most abundant peak in the calculated isotope cluster against the mass calculated from average atomic masses. The results of such calculations are shown in Table 4. At higher masses, the theoretical average mass converges more with the most abundant peak because the number of added neutrons is greater, so the probability distribution of lesser abundant isotopes, which go into the average atomic mass, becomes more significant.

Table 4.

Relationship between the mass of the most abundant peak in the isotope cluster and the calculated average mass of the polypeptides in Table 3

Common name	Molecular formula	Predicted average mass (Da)	Most abundant peak (Da)	Error (ppm)	Absolute intensity of most abundant peak
Angiotensin II	C₅₀H₇₁N₁₃O₁₂	1045.53^*	1045.53	0.0	0.528
Bovine insulin	C₂₅₄H₃₇₇N₆₅O₇₅S₆	5733.54	5732.62	16.0	0.186
Human insulin	C₅₂₀H₈₁₇N₁₃₉O₁₄₇S₈	11624.50	11623.88	5.3	0.135
Human myoglobin	C₇₇₄H₁₂₂₄N₂₁₀O₂₂₂S₅	17183.71	17182.98	4.3	0.116
Human intrinsic factor	C₂₀₂₃H₃₂₀₈N₅₂₄O₆₁₉S₂₀	45415.86	45415.10	1.7	0.696
Bovine Serum Albumin	C₂₉₃₄H₄₆₁₅N₇₈₁O₈₉₇S₃₉	66432.73	66432.01	1.1	0.0567
Human Na/K ATPase, renal isoform, subunit	C₅₀₄₇H₈₀₁₄N₁₃₃₈O₁₄₉₅S₄₈	112895.58	112895.12	0.4	0.0442
Human ATP binding Cassette Protein	C₈₅₇₄H₁₃₃₇₈N₂₀₉₂O₂₃₉₂S₇₇	186506.81	186506.19	0.3	0.0342
Human intrinsic factor-hydroxocobalamin receptor	C₁₇₆₀₀H₂₆₄₇₄N₄₇₅₂O₅₄₈₆S₁₉₇	398724.57	398724.21	0.1	0.0233
Human dynein heavy chain	C₂₃₈₃₂H₃₇₈₁₆N₆₅₂₈O₇₀₃₁S₁₇₀	533737.29	533736.52	0.1	0.0206

^* Because of its small mass, the predicted peak for Angiotensin II is the monoisotopic, rather than the average mass, so there is no mass error.

Perturbation of the Isotopic Composition

Stable isotope enrichment comprises the most reliable strategy for mass spectrometric quantification. Molecules labeled with stable isotopes, usually ¹³C or ²H, are used to quantify an analyte. Given the extreme importance of quantitative mass spectrometric measurements, any isotope clustering algorithm for organic molecules should provide accurate intensity values for enriched atoms. The algorithm in this paper simplifies calculations immensely by grouping intensities. While this allows for high accuracy (Table 2) and speed (Table 3), the trade-off is grouping error, which enters into the cluster intensities if members of the ETN group have widely divergent distributions. Figure 2 shows an extreme example of grouping error for the peptide DARWIM (C₃₅H₅₄N₁₀O₉S₁) when the abundance of ¹³C is 99%. The enriched abundance of carbon in this example differs so significantly from the natural abundances of ²H, ¹⁵N, ¹⁷O, and ³³S that placing carbon in the same ETN group as these low abundance isotopes effectively averages ¹³C into the cluster. This results in an incorrectly broad distribution (the gray trace in Figure 2). Partitioning ¹³C and multiplying its distribution by the cluster generated from clustering all the other elements yields the correct distribution (the blue trace in Figure 2). Because all the other elements retain their clustering, the effect of this partitioning on the calculation is negligible. Since the theoretical basis for both the clustered and partitioned distributions is the same, no pruning is introduced in the process of partitioning an enriched distribution.

Figure 2. An example of grouping error for the peptide DARWIM (C₃₅H₅₄N₁₀O₉S₁). The natural abundance distribution is shown in red, and the correct 99% enriched distribution, calculated by partitioning carbon from the grouped cluster, is shown in blue. At this enrichment, the difference between the correct cluster and the grouped cluster, shown in gray, is obvious. All the intensities are absolute.

Table 5 quantifies the effect of grouping on isotope cluster calculations. It can serve as a general guide for applying this algorithm to enrichment experiments. If the instrumental cluster variance is less than the error on the cluster for a certain enrichment, then calculation of the theoretical cluster should involve partitioning of the enriched isotope. While partitioning reverts to previously described polynomial combination methods, the computational time is much less and does not require pruning because partitioning results in a combination of far fewer clusters, in this case the grouped cluster and the partitioned cluster, than would be encountered in the calculation of individual clusters for C, H, N, O, and S.

Table 5.

Calculated grouping error for ¹³C abundances from natural to 99% enrichment. The error is calculated as a sum of the absolute error for the intensities of the peaks in the grouped cluster against the intensities for the peaks in the cluster calculated by partitioning carbon

Abundance of ¹³C (% carbon)	Total error (%)
1.1	0.070
5.0	1.2
10	3.6
20	7.2
50	25
99	140
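
The error metric in Table 5 can be illustrated with a deliberately simplified toy model. The sketch below considers only one-neutron isotopes, treats all non-carbon atoms as sharing a single made-up heavy-isotope probability, and assumes the partitioned remainder is handled as its own binomial; none of this is the authors' code, and it is not expected to reproduce the values in Table 5.

```python
import math


def binomial(n, p):
    """Full binomial distribution [P(k = 0), ..., P(k = n)]."""
    return [math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def convolve(d1, d2):
    """Combine two distributions indexed by the number of added neutrons."""
    out = [0.0] * (len(d1) + len(d2) - 1)
    for i, a in enumerate(d1):
        for j, b in enumerate(d2):
            out[i + j] += a * b
    return out


def grouping_error(n_carbon, n_other, p_carbon, p_other):
    """Sum of absolute intensity differences between grouped and partitioned clusters."""
    n_total = n_carbon + n_other
    # Grouped: one ETN probability for all atoms, as in eq 4.
    p_grouped = (n_carbon * p_carbon + n_other * p_other) / n_total
    grouped = binomial(n_total, p_grouped)
    # Partitioned: carbon kept separate, then combined with the remaining atoms.
    partitioned = convolve(binomial(n_carbon, p_carbon), binomial(n_other, p_other))
    return sum(abs(g - q) for g, q in zip(grouped, partitioned))


# Toy inputs: 35 carbons and 76 other atoms (as in DARWIM), with a HYPOTHETICAL
# heavy-isotope probability of 0.001 for every non-carbon atom.
for p_carbon in (0.011, 0.05, 0.10, 0.20, 0.50, 0.99):
    print(p_carbon, grouping_error(35, 76, p_carbon, 0.001))
```

Even in this toy setting the error is small near natural abundance and grows large at high enrichment, which is the qualitative behavior the table describes.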

Conclusion and Future Directions

The theory and calculations in this paper present a novel basis for simplifying accurate calculations of the isotope cluster by grouping. Since each neutronic probability is calculated separately, the algorithm is the most conducive to parallel processing that has been described to date, and this benefit will be explored. Additionally, the detailed probabilities for each ETN combination can serve as a starting point for calculation of the fine isotopic structure when desired. Finally, by partitioning enriched isotopes, the theoretical spectra from quantitative experiments for a known enrichment can be obtained accurately and rapidly.

Acknowledgments

The authors acknowledge support for this research by the Intramural Research Program of the National Institute of Child Health and Human Development. The authors gratefully acknowledge the helpful comments and discussion of Jonathan A. Epstein (OSD/NICHD/NIH).

Footnotes

^1 A Diophantine equation is a linear equation with positive integer coefficients and a finite multitude of positive integer solutions. In the case described in the text we are interested in the number of ways 1, 2, and 4 neutrons can combine to yield n neutrons. For a similar but less scientific example, the reader is encouraged to consider all the ways a cashier may dispense 16¢ using various numbers of 1¢, 5¢, and 10¢ coins. The solutions to this problem are the solutions to the Diophantine equation: 1n₁+5n₅+10n₁₀=16.

^2 It should be clear that Diophantine equations are a kind of equation rather than a specific equation. Thus, while Diophantine equations are not new to isotope cluster calculations [24], the element-independent Diophantine equations here are much simpler than those used previously, and their context is much different. Additionally, they have been precomputed to enhance efficiency.

References

1. Zubarev RA, Demirev PA, Hakansson P, Sundqvist BUR. Approaches and Limits for Accurate Mass Characterization of Large Biomolecules. Anal. Chem. 1995;67(20):3793–3798.
2. Blank PS, Sjomeling CM, Backlund PS, Yergey AL. Use of Cumulative Distribution Functions to Characterize Mass Spectra of Intact Proteins. J. Am. Soc. Mass Spectrom. 2002;13(1):40–46. doi: 10.1016/S1044-0305(01)00338-5.
3. Havilio M, Haddad Y, Smilansky Z. Intensity-Based Statistical Scorer for Tandem Mass Spectrometry. Anal. Chem. 2003;75(3):435–444. doi: 10.1021/ac0258913.
4. Spellman DS, Deinhardt K, Darie CC, Chao MV, Neubert TA. Stable Isotopic Labeling by Amino Acids in Cultured Primary Neurons: Application to Brain-Derived Neurotrophic Factor-Dependent Phosphotyrosine-Associated Signaling. Mol. Cell. Proteom. 2008;7(6):1067–1076. doi: 10.1074/mcp.M700387-MCP200.
5. Connor EC, Rott AS, Zeder M, Juttner F, Dorn S. C-13-Labeling Patterns of Green Leaf Volatiles Indicating Different Dynamics of Precursors in Brassica Leaves. Phytochemistry. 2008;69(6):1304–1312. doi: 10.1016/j.phytochem.2008.01.003.
6. Xiao GG, Garg M, Lim S, Wong D, Go VL, Lee WNP. Determination of Protein Synthesis in Vivo Using Labeling from Deuterated Water and Analysis of MALDI-TOF Spectrum. J. App. Physiol. 2008;104(3):828–836. doi: 10.1152/japplphysiol.00976.2007.
7. Suzuki H, Sasaki R, Ogata Y, Nakamura Y, Sakurai N, Kitajima M, Takayama H, Kanaya S, Aoki K, Shibata D, Saito K. Metabolic Profiling of Flavonoids in Lotus japonicus Using Liquid Chromatography Fourier Transform Ion Cyclotron Resonance Mass Spectrometry. Phytochemistry. 2008;69(1):99–111. doi: 10.1016/j.phytochem.2007.06.017.
8. Pascal BD, Chalmers MJ, Busby SA, Mader CC, Southern MR, Tsinoremas NF, Griffin PR. The Deuterator: Software for the Determination of Backbone Amide Deuterium Levels from H/D Exchange MS Data. BMC Bioinformatics. 2007;8:156–167. doi: 10.1186/1471-2105-8-156.
9. Hotchko M, Anand GS, Komives EA, Ten Eyck LF. Automated Extraction of Backbone Deuteration Levels from Amide H/(2) H Mass Spectrometry Experiments. Protein Sci. 2006;15(3):583–601. doi: 10.1110/ps.051774906.
10. Snijders APL, de Koning B, Wright PC. Perturbation and Interpretation of Nitrogen Isotope Distribution Patterns in Proteomics. J. Proteome Res. 2005;4(6):2185–2191. doi: 10.1021/pr050260l.
11. Zhang XM, Hines W, Adamec J, Asara JM, Naylor S, Regnier FE. An Automated Method for the Analysis of Stable Isotope Labeling Data in Proteomics. J. Am. Soc. Mass Spectrom. 2005;16(7):1181–1191. doi: 10.1016/j.jasms.2005.03.016.
12. Johnson KL, Muddiman DC. A Method for Calculating O-16/O-18 Peptide Ion Ratios. J. Am. Soc. Mass Spectrom. 2004;15(4):437–445. doi: 10.1016/j.jasms.2003.11.016.
13. Rockwood AL, Van Orman JR, Dearden DV. Isotopic Compositions and Accurate Masses of Single Isotopic Peaks. J. Am. Soc. Mass Spectrom. 2004;15(1):12–21. doi: 10.1016/j.jasms.2003.08.011.
14. Prebyl BS, Cook KD. Use of Fourier Transform for Deconvolution of the Unresolved Envelope Observed in Electrospray Ionization Mass Spectrometry of Strongly Ionic Synthetic Polymers. Anal. Chem. 2004;76:127–136.
15. Valkenborg D, Jansen I, Burzykowski T. A Model-Based Method for the Prediction of the Isotopic Distribution of Peptides. J. Am. Soc. Mass Spectrom. 2008;19(5):703–712. doi: 10.1016/j.jasms.2008.01.009.
16. Aizikov K, O'Connor PB. Use of the Filter Diagonalization Method in the Study of Space Charge Related Frequency Modulation in Fourier Transform Ion Cyclotron Resonance Mass Spectrometry. J. Am. Soc. Mass Spectrom. 2006;17(6):836–843. doi: 10.1016/j.jasms.2006.02.018.
17. Maleknia SD, Downard KM. Charge Ratio Analysis Method to Interpret High Resolution Electrospray Fourier Transform-Ion Cyclotron Resonance Mass Spectra. Int. J. Mass Spectrom. 2005;246(1/3):1–9.
18. Meija J, Caruso JA. Deconvolution of Isobaric Interferences in Mass Spectra. J. Am. Soc. Mass Spectrom. 2004;15(5):654–658. doi: 10.1016/j.jasms.2003.12.016.
19. Fernandez-de-Cossio J, Gonzalez LJ, Satomi Y, Betancourt L, Ramos Y, Huerta V, Amaro A, Besada V, Padron G, Minamino N, Takao T. Isotopica: A Tool for the Calculation and Viewing of Complex Isotopic Envelopes. Nucleic Acids Res. 2004;32:W674–W678. doi: 10.1093/nar/gkh423.
20. Roussis SG, Proulx R. Reduction of Chemical Formulas from the Isotopic Peak Distributions of High-Resolution Mass Spectra. Anal. Chem. 2003;75(6):1470–1482. doi: 10.1021/ac020516w.
21. Horn DM, Zubarev RA, McLafferty FW. Automated Reduction and Interpretation of High Resolution Electrospray Mass Spectra of Large Molecules. J. Am. Soc. Mass Spectrom. 2000;11(4):320–332. doi: 10.1016/s1044-0305(99)00157-9.
22. Datta BP. Polynomial Method of Molecular Isotopic Abundance Calculations: A Computational Note. Rapid Commun. Mass Spectrom. 1997;11(16):1767–1774.
23. Yergey JA. A General Approach to Calculating Isotopic Distributions for Mass-Spectrometry. Int. J. Mass Spectrom. 1983;52:337–349. doi: 10.1002/jms.4498.
24. Hsu CS. Diophantine Approach to Isotopic Abundance Calculations. Anal. Chem. 1984;56(8):1356–1361.
25. Kubinyi H. Calculation of Isotope Distributions in Mass Spectrometry. A Trivial Solution for a Non-Trivial Problem. Anal. Chim. Acta. 1991;247:107–119.
26. Snider RK. Efficient Calculation of Exact Mass Isotopic Distributions. J. Am. Soc. Mass Spectrom. 2007;18(8):1511–1515. doi: 10.1016/j.jasms.2007.05.016.
27. Rockwood AL, Kushnir MM, Nelson GJ. Dissociation of Individual Isotopic Peaks: Predicting Isotopic Distributions of Product Ions in MSn. J. Am. Soc. Mass Spectrom. 2003;14(4):311–322. doi: 10.1016/S1044-0305(03)00062-X.

