. Author manuscript; available in PMC: 2014 Nov 19.
Published in final edited form as: Anal Chem. 2013 Oct 29;85(22):10.1021/ac4019268. doi: 10.1021/ac4019268

RAMSY: Ratio Analysis of Mass Spectrometry to Improve Compound Identification

Haiwei Gu 1, G A Nagana Gowda 1, Fausto Carnevale Neto 1,2, Mark R Opp 3, Daniel Raftery 1,4,*
PMCID: PMC3867450  NIHMSID: NIHMS536195  PMID: 24168717

Abstract

The complexity of biological samples poses a major challenge for reliable compound identification in mass spectrometry (MS). Interfering compounds add extra peaks to the spectrum and can make interpretation and assignment difficult. To overcome this issue, new approaches are needed to reduce complexity and simplify spectral interpretation. Recently, focusing on unknown metabolite identification, we presented a new approach, RANSY (Ratio Analysis of Nuclear Magnetic Resonance Spectroscopy; Anal. Chem. 2011, 83, 7616–7623), which extracts the ¹H signals related to the same metabolite based on peak intensity ratios. Building on this concept, we present the Ratio Analysis of Mass Spectrometry (RAMSY) method, which facilitates improved compound identification in complex MS spectra. RAMSY works on the principle that, under a given set of experimental conditions, the abundance/intensity ratios between the mass fragments from the same metabolite are relatively constant. Therefore, the quotients of average peak ratios and their standard deviations, generated using a small set of MS spectra from the same ion chromatogram, efficiently allow the statistical recovery of the metabolite peaks and facilitate reliable identification. RAMSY was applied to both gas chromatography (GC)-MS and liquid chromatography tandem MS (LC-MS/MS) data to demonstrate its utility. The performance of RAMSY is typically better than that of correlation methods. RAMSY promises to improve unknown metabolite identification for MS users in metabolomics or other fields.

Keywords: ratio analysis of mass spectrometry, compound identification, GC-MS, LC-MS/MS, metabolomics

INTRODUCTION

Mass spectrometry (MS) is an essential analytical tool in complex mixture analysis, and enables the detection and quantitation of hundreds of metabolites in biological samples from a single measurement. MS therefore plays a prominent role in the growing metabolomics field.1–9 The combination of MS with either gas chromatography (GC) or liquid chromatography (LC) allows for identification of several hundred metabolites through reliable matching of parameters such as m/z, retention time, and/or fragmentation patterns from one or more standard MS libraries. In GC-MS, the commonly used electron ionization (EI) provides fairly reproducible and characteristic mass fragments for each metabolite. By matching these data with standard mass spectra in databases, such as the National Institute of Standards and Technology (NIST) database10,11 or the Agilent Fiehn GC-MS Metabolomics Retention Time Locking (RTL) library,12 the identity of observed spectra can be established. A match factor (MF), which is a probability measure of accuracy, is often calculated for each identified metabolite.13 The use of retention time (RT) and retention index (RI) often provides improved identification.12,14 Meanwhile, LC-MS detected metabolites are typically identified based on exact mass, isotopic ratios, and/or tandem mass spectrometry results.15–18 Commonly used exact mass MS or MS/MS databases include the Human Metabolome Database (HMDB)19,20 and the METLIN Metabolite Database.21–23

However, in many practical applications, the identification of compounds using MS is still challenging. For example, in metabolomics5,7,8,24–27 the identity of nearly two-thirds of the metabolites detected by current MS methods remains unknown. Moreover, the identity of a large fraction of the metabolites with lower match factors continues to be ambiguous and unreliable. While this situation is not surprising given the high complexity of biological samples, improvements that facilitate reliable identification are highly desirable.

A significant challenge arises from the interfering peaks that complicate the MS spectra and make identification difficult. One possible approach is to use correlation methods to help isolate peaks from the same metabolites.28 A number of such methods have been used to analyze nuclear magnetic resonance (NMR) spectroscopy spectra, especially STOCSY (statistical total correlation spectroscopy).29–31 In such methods peaks from the same metabolites are identified based on correlations between the peaks; however, due to the substantial number of correlations often observed with other metabolites, it can be difficult to distinguish the meaningful correlations among the peaks from a metabolite. More recently, we proposed the method of RANSY (Ratio Analysis of Nuclear Magnetic Resonance Spectroscopy)32 to facilitate the isolation of peaks from the same metabolite and thereby improve compound identification using NMR spectroscopy. RANSY works on the principle that the intensity ratios between the NMR peaks from the same metabolite are fixed. Hence, across a set of samples, the standard deviation of the ratios of peaks from the same metabolite will be small (zero, in principle). On the other hand, ratios of peaks from unrelated metabolites will typically have a large standard deviation, except in those rare cases where the metabolites of interest are highly correlated. We applied RANSY to both 1D and 2D NMR data and showed that its performance is generally better than correlation for statistically isolating the peaks associated with a particular metabolite across a set of spectra.

In the current study, we extend the concept of RANSY, and present the Ratio Analysis of Mass Spectrometry (RAMSY) method, which facilitates improved compound identification using MS. We apply RAMSY to GC-MS and LC tandem MS (LC-MS/MS) data and demonstrate that RAMSY reduces spectral interference and facilitates the identification of individual molecules in overlapped MS spectra without the need for additional experiments.

THEORY

The working principle of RAMSY is similar to that for RANSY.32 RAMSY is also designed to work using single datasets that contain multiple MS spectra for the same metabolite. For peaks that originate from the same compound, under the same experimental conditions, their MS peak intensity ratios across the chromatographic peak should be relatively constant. In addition, the standard deviations of those ratios should be small. As shown schematically in Figure 1, during a typical analysis of complex mixtures the chromatographic elution of a compound of interest (A, red, smaller chromatographic peak) is often interfered by another compound (B, blue, larger chromatographic peak). However, by choosing a driving peak (marked with an *) from compound A, the ratios between all the MS peaks from compound A and the driving peak will show much less variation than ratios between the MS peaks from compound B and the compound A driving peak. Thus, the RAMSY calculation will reduce the interference of compound B’s MS peaks from the spectra of compound A.

Figure 1.

Schematic illustration of RAMSY. In this example, the chromatographic elution of compound A (red, smaller chromatographic peak) is interfered by compound B (blue, larger chromatographic peak). For the MS spectra collected at different retention times (1, 2, and 3), the driving peak (*) and other MS peaks from the same compound (compound A) have ratios that are relatively constant, and the ratios’ standard deviations across the spectra are typically small. In contrast, the ratios for MS peaks from compound B and the driving peak (*) from compound A vary and the resulting standard deviations are relatively large. The calculated RAMSY spectrum will de-emphasize peaks from compound B, reducing the interference and aiding compound identification.

The procedure for calculating the RAMSY spectrum is largely as described previously.32 Briefly, a driving peak is selected from the mass spectra and then ratios between the driving peak and all the other points (or peaks) in the spectra are calculated as shown in Equation 1 below.

$$D_{i,j} = \frac{X_{i,j}}{X_{i,k}} \tag{1}$$

where the vector $X_i$ is the ith spectrum of a set of n MS spectra, the jth data point of the m total points in that spectrum is denoted $X_{i,j}$, and $X_{i,k}$ is the driving peak. D is the ratio matrix of dimension n × m.

The RAMSY values, denoted as an m-element vector R, are the quotients of means and standard deviations across columns of D. The standard deviation is zero for the driving peak itself; therefore, its RAMSY value is pre-defined (e.g., the value of the highest RAMSY ratio). The other RAMSY values are calculated as elements of the vector R as follows:

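$$R_j = \frac{\frac{1}{n}\sum_{i=1}^{n} D_{i,j}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(D_{i,j} - \frac{1}{n}\sum_{i=1}^{n} D_{i,j}\right)^{2}}} \tag{2}$$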

Since a ratio’s standard deviation is used as the denominator, a small standard deviation will produce a large reciprocal value, generating a peak (in principle an MS peak from the same compound as that for the driving peak). In general, the MS peaks from interfering compounds will generate large standard deviations and thus small RAMSY numbers, similar to noise values. Notably, RAMSY values are dimensionless.
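As an illustration of Equations 1 and 2, the following is a minimal Python sketch, not the authors' Matlab implementation. The function name `ramsy` and the list-of-lists data layout are hypothetical. It assumes that the driving peak is non-zero in every selected spectrum, that the driving peak's own value is set to the highest finite RAMSY value (the example choice mentioned above), and that a non-driving channel with exactly zero standard deviation is assigned infinity; the last choice is an assumption of this sketch, not something the text specifies.

```python
import math


def ramsy(spectra, k):
    """Sketch of the RAMSY vector R for n spectra of m intensities each.

    spectra: list of n lists, each holding m intensities on a common m/z grid.
    k: index of the driving peak (assumed non-zero in every spectrum).
    """
    n = len(spectra)
    m = len(spectra[0])
    # Equation 1: D[i][j] = X[i][j] / X[i][k]
    ratios = [[row[j] / row[k] for j in range(m)] for row in spectra]

    values = [0.0] * m
    for j in range(m):
        if j == k:
            continue  # driving peak is handled separately below
        column = [ratios[i][j] for i in range(n)]
        if all(v == 0.0 for v in column):
            continue  # channel empty in all spectra: value stays zero
        mean = sum(column) / n
        # Population standard deviation (1/n), as written in Equation 2
        sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
        # Assumption of this sketch: a perfectly constant ratio maps to infinity
        values[j] = mean / sd if sd > 0.0 else math.inf

    # Pre-define the driving peak as the highest finite RAMSY value
    finite = [v for j, v in enumerate(values) if j != k and math.isfinite(v)]
    values[k] = max(finite) if finite else 0.0
    return values


# Toy data: channel 0 is the driving peak, channel 1 tracks it closely,
# channel 2 belongs to an interfering compound, channel 3 is empty.
example = [
    [100.0, 50.0, 10.0, 0.0],
    [200.0, 102.0, 300.0, 0.0],
    [150.0, 74.0, 900.0, 0.0],
]
example_R = ramsy(example, k=0)
```

In this toy data set the channel that co-varies with the driving peak receives a much larger RAMSY value than the interfering channel, which is the behavior described in the paragraph above.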

EXPERIMENTAL METHODS

Chemicals

Arginine, acetonitrile, methanol, pyridine, and methoxyamine hydrochloride were purchased from Sigma-Aldrich (St. Louis, MO). Ammonium acetate was purchased from Fisher Scientific (Hampton, NH). DI water was provided in-house with a Synergy Ultrapure Water System from Millipore (Billerica, MA). FAME (Fatty Acid Methyl Ester, chain lengths from C8–C30) mixture was purchased from Supelco (Bellefonte, PA), and the Fiehn GC/MS Metabolomics Standards Kit was obtained from Agilent Technologies (Santa Clara, CA).

Biological Samples

Sprague-Dawley rat plasma samples for GC-MS were provided by Prof. Mark Opp in the University of Washington (Seattle, WA) and were collected in accordance with UW IACUC approved protocols. Human serum samples for LC-MS/MS experiments were obtained from Innovative Research, Inc. (Novi, MI).

GC-MS

We followed the general procedures for the Agilent Fiehn GC-MS Metabolomics RTL library,12,33 with minor changes incorporated to improve detection sensitivity. Briefly, proteins in rat plasma samples were first precipitated by the addition of methanol in a ratio of 200 μL methanol to 100 μL sample. The mixture was vortexed and then stored at 4 °C for 30 min. While still cold, samples were centrifuged for 10 min at 13,000 rpm. The supernatant was then transferred to a clean 1.5 mL Eppendorf tube. Another 200 μL methanol was added to the protein pellet, mixed well, and centrifuged for 10 min at 13,000 rpm. The resulting supernatant was combined with the first, vortexed for 30 s, and then evaporated to dryness using an Eppendorf Vacufuge (Eppendorf, Hauppauge, NY). A solution of myristic acid-d27 (5 μL) from the Fiehn GC/MS Metabolomics Standards Kit was added as an internal standard for retention time locking. For the derivatization process, the samples were first oximated by adding 10 μL of O-methylhydroxylamine hydrochloride solution (in pyridine) and incubating at 30 °C for 90 min. The samples were then derivatized using 90 μL N-methyl-N-(trimethylsilyl)trifluoroacetamide with 1% chlorotrimethylsilane (MSTFA + 1% TMCS) at 37 °C for 30 min. Subsequently, 2 μL of the FAME (fatty acid methyl ester) mixture was added to each sample; the solution was gently vortexed and transferred to a GC-MS glass vial for analysis.

GC-MS experiments were performed on an Agilent 7890A GC-5975C MSD system (Agilent Technologies, Santa Clara, CA) by injecting 1 μL of the prepared samples with a split ratio of 10:1. Helium was used as the carrier gas with a constant flow rate of 1.2 mL/min. The separation of metabolites was achieved using an Agilent DB5-MS+10m Duraguard Capillary Column (30 m × 250 μm × 0.25 μm). The column temperature was maintained at 60 °C for 1.00 min, then increased at a rate of 10 °C/min to 325 °C, and held at this temperature for 10 min. Mass spectral signals were recorded after a 4.90 min solvent delay.

LC-MS/MS

Frozen human serum samples were thawed, and proteins were precipitated by mixing 100 μL serum with 200 μL cold methanol. The mixture was centrifuged at 13,000 rpm for 10 min. The supernatant was transferred to a clean 1.5 mL Eppendorf tube and then dried under vacuum (Eppendorf Vacufuge). The obtained residue was reconstituted in 250 μL DI water prior to LC-MS/MS analysis.

LC-MS/MS experiments were performed using an Agilent 1200 SL LC system coupled online with an Agilent 6520 Q-TOF mass spectrometer (Agilent Technologies, Santa Clara, CA). Each prepared sample (8 μL) was injected onto an Agilent Poroshell 120 EC-C18 column (2.1 × 50 mm, 2.7-micron), which was heated to 50 °C. The flow rate was 0.5 mL/min. Mobile phase A was 5 mM ammonium acetate in water, and mobile phase B was 0.1% water in ACN. The mobile phase composition was kept isocratic at 3% B for 1 min, and was increased to 90% B in 4 min; after another 4 min at 90% B, the mobile phase composition was returned to 3% B. Electrospray ionization (ESI) was used in positive mode, and the voltage was 3.5 kV. The collision energy for automatic LC-MS/MS experiments was fixed at 10 V, targeting pre-selected compounds (such as arginine at m/z 175.1195). The mass accuracy of our LC-MS system is generally less than 5 ppm; the Q-TOF MS spectrometer was calibrated prior to each batch run, and a mass accuracy of less than 1 ppm was often achieved using the standard tuning mixture (G1969-85000, Agilent Technologies, Santa Clara, CA). Throughout the MS measurements, reference masses of m/z 121.0509 and 922.0098 were used to correct any mass errors. The absolute intensity threshold for MS data collection was set to 100, and the relative threshold was 0.001%. The absolute intensity threshold for MS/MS measurements was 5, and the relative threshold was 0.01%. The acquisition rate was 1.5 spectra/s. This Q-TOF system has good resolution for MS measurements; for example, in a typical tuning the resolution was 4787 and 7315 for the ions at m/z 118.0863 and 322.0481, respectively. After data acquisition, the whole data set obtained for each sample was exported for analysis, without identifying any MS/chromatogram peaks.
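To make the mass accuracy figures above concrete, the short Python sketch below converts an m/z difference into parts per million (ppm) and checks it against a tolerance. The helper names `ppm_error` and `within_tolerance` and the measured values are hypothetical; only the target m/z of 175.1195 and the 5 ppm figure come from the text.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz, theoretical_mz, tol_ppm=5.0):
    """True if the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


ARGININE_MH = 175.1195  # protonated arginine m/z quoted in the text

# Hypothetical measured values, for illustration only
print(within_tolerance(175.1200, ARGININE_MH))  # about 2.9 ppm off
print(within_tolerance(175.1210, ARGININE_MH))  # about 8.6 ppm off
```

At m/z 175.1195, a 5 ppm window corresponds to roughly ±0.0009 m/z units.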

Data Analysis

The data were analyzed using Matlab 7.0 software (MathWorks, Natick, MA) installed on a personal computer. The data were subjected to analysis using RAMSY as well as correlation algorithms (see Supporting Information; free download available at http://depts.washington.edu/nwmrc/RAMSY). The RAMSY and correlation values were set to zero if all the intensities at a specific m/z were zero across all the spectra selected in the calculation.

In addition, we computed spectral Match Factors (MFs) using the same algorithms as those reported for the NIST library.13 To compute a spectral Match Factor, we first obtained the “angle” between the two spectra:

$$F_1 = \frac{\sum M\,(A_S A_U)^{1/2}}{\left[\sum M A_S \,\sum M A_U\right]^{1/2}} \tag{3}$$

M is the m/z value, and $A_S$ and $A_U$ are the base-peak normalized abundances of the peaks in the standard spectrum and unknown spectrum, respectively. Next, $F_2$ is calculated:

$$F_2 = \left(\frac{1}{N_{U\&S}-1}\right)\sum_{i=2}^{N_{U\&S}} \left(\frac{A_{S,i}}{A_{S,i-1}}\right)^{n} \left(\frac{A_{U,i}}{A_{U,i-1}}\right)^{-n} \tag{4}$$

$F_2$ is based on relative intensities of pairs of adjacent peaks present in both spectra. $N_{U\&S}$ is the number of peaks common to the unknown and standard spectra, and n = 1 if the first abundance ratio is less than the second and n = −1 if it is larger. The Match Factor is then calculated as follows:

$$MF = \frac{1000}{N_U + N_{U\&S}}\left(N_U F_1 + N_{U\&S} F_2\right) \tag{5}$$

where “1000” is the scaling parameter. A perfect match results in an MF value of 1000; spectra with no peaks in common result in a value of 0.
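The Match Factor calculation in Equations 3 to 5 can be sketched in Python as follows. This is an illustrative reading of the equations, not the NIST search program or the authors' code. The function name `match_factor` and the dictionary layout (m/z mapped to a positive abundance) are hypothetical. The sketch assumes that N_U is the number of peaks in the unknown spectrum, that the sums in the denominator of F1 run over all peaks of each spectrum, and that F2 is taken as zero when fewer than two peaks are shared.

```python
import math


def match_factor(standard, unknown):
    """Sketch of Equations 3-5 for two spectra given as {m/z: abundance}."""
    common = sorted(set(standard) & set(unknown))  # peaks present in both
    n_u = len(unknown)
    n_us = len(common)
    if n_u == 0 or n_us == 0:
        return 0.0  # no peaks in common gives a Match Factor of 0

    # Equation 3: mass-weighted "angle" between the two spectra
    numerator = sum(mz * math.sqrt(standard[mz] * unknown[mz]) for mz in common)
    denominator = math.sqrt(
        sum(mz * a for mz, a in standard.items())
        * sum(mz * a for mz, a in unknown.items())
    )
    f1 = numerator / denominator

    # Equation 4: agreement of ratios between adjacent shared peaks
    f2 = 0.0  # assumption: undefined case (fewer than 2 shared peaks) set to 0
    if n_us > 1:
        total = 0.0
        for prev, cur in zip(common, common[1:]):
            ratio_s = standard[cur] / standard[prev]
            ratio_u = unknown[cur] / unknown[prev]
            n = 1 if ratio_s < ratio_u else -1
            total += (ratio_s ** n) * (ratio_u ** (-n))
        f2 = total / (n_us - 1)

    # Equation 5: weighted combination scaled to 1000
    return 1000.0 / (n_u + n_us) * (n_u * f1 + n_us * f2)


# Illustrative abundances only (not measured data)
reference = {55: 20.0, 74: 100.0, 87: 50.0}
contaminated = {55: 20.0, 73: 80.0, 74: 100.0, 87: 50.0}
print(match_factor(reference, reference))
print(match_factor(reference, contaminated))
```

With the exponent n chosen as described in the text, each term of the sum in Equation 4 is at most 1, so identical spectra give F1 = F2 = 1 and a Match Factor of 1000, while an extra interfering peak in the unknown spectrum lowers F1 and therefore the Match Factor.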

RESULTS AND DISCUSSION

The RAMSY approach was applied to both GC-MS and LC-MS/MS data. To demonstrate the performance of the method, we focused on ion chromatograms that provided overlapping mass spectra due to co-eluting metabolites. The GC-MS spectra for methyl caprylate and LC-MS/MS spectra for arginine were thus selected as examples for the analysis. Notably, RAMSY is a versatile method and can be applied for compound identification using other analytical platforms.

GC-MS

For GC-MS data, we chose a relatively simple example of a compound of interest that is overlapped with other peaks from the biological sample. In this example, the compound is methyl caprylate, a C8 fatty-acid methyl ester commonly used as one of the 12 retention index (RI) markers, which makes it important to identify correctly.12 As shown in Figure 2, methyl caprylate appears in the total ion chromatogram (TIC) of the FAME mixture at 7.8 min (see Figure 2a and inset), but is heavily overlapped by interfering compound(s) in the rat plasma sample spiked with the FAME mixture (Figure 2b and inset). The extraction of the mass spectrum of methyl caprylate from the TIC of Figure 2a (the local chromatographic peak maximum at 7.80 min) provides a clean mass spectrum (Figure 2c). Figure 2d shows the EI-MS spectrum of the interfering compound(s) at 7.86 min. Comparing Figure 2c and Figure 2d, it is observed that the peak at m/z 74 (the base peak in Figure 2c) is locally more unique to methyl caprylate (chosen as the driving peak in RAMSY), while the interference is mainly caused by the MS peaks at m/z 73 and 147 (Figure 2d).

Figure 2.

a) TIC of the FAME mixture; the inset shows a well-resolved peak for methyl caprylate at 7.80 min. b) TIC of the GC-MS data from a rat plasma sample spiked with the FAME standards; the inset shows the expanded TIC between 7.75 min and 7.90 min, and no chromatographic peak can be resolved for methyl caprylate. c) The EI-MS spectrum at the chromatographic peak maximum (7.80 min, Figure 2a). d) The EI-MS spectrum of the interfering compound(s) at 7.86 min (local maximum, Figure 2b), dominated by the peaks at m/z 73 and 147.

We first used the NIST library to provide MFs for the extracted ion chromatograms. For methyl caprylate in the spectrum of the FAME mixture (Figure 2c), the NIST library provided an MF of 904, which is considered an excellent match (this spectrum was selected as the standard spectrum in the following MF calculations). However, in analyzing the rat plasma sample, the best MF for the same compound obtained after scanning all the mass spectra in the TIC peak in the range 7.75–7.90 min was 774 (7.81 min; Supporting Information Figure S1); such a value (which lies between 700 and 800) is considered to be a fair match according to NIST.11 The MF for the average spectrum (Supporting Information Figure S2) calculated from 25 mass spectra over the same time range was only 195 (vide infra).

To identify methyl caprylate using the RAMSY approach, we again selected 25 mass spectra from the TIC in the range 7.75–7.90 min (Figure 2b). The RAMSY spectrum calculation was performed using the peak at m/z 74 as the driving peak (locally unique to methyl caprylate). Figure 3a shows the RAMSY spectrum in the range m/z 50–400, and Figure 3b shows the 8 MS peaks with top RAMSY values (including the driving peak; the number of selected peaks is explained below). Figure 3c shows the averaged EI-MS spectrum (Supporting Information Figure S2) after filtering with the RAMSY values (only those MS peaks with top RAMSY values are shown). Comparing the spectra in Figure 2c and Figure 3b shows that RAMSY correctly identified many of the methyl caprylate peaks, including those at m/z 53 (small peak), 55, 59, 87, 101, and 115. However, RAMSY missed a fragment peak at m/z 127 and a weak molecular ion peak at m/z 158. RAMSY also incorrectly selected a peak at m/z 58 (not marked in Figure 3b); no peak is observed there in Figure 2c. It is interesting to note that despite the unit resolution of GC-MS, RAMSY was able to eliminate unrelated peaks quite specifically. The strong overlapping peak at m/z 73 (Figure 2d and Supporting Information Figure S2) did not appear in the RAMSY spectrum (Figure 3b) when using the nearby driving peak at m/z 74, although the MS peak at m/z 73 could generate the isotopic peak at m/z 74. Since RAMSY is based on ratio analysis, it is relatively independent of the peaks’ original intensities. As shown in Figure 2d and Supporting Information Figure S2, the peak at m/z 147 is the base peak; however, its RAMSY value was not one of the top 8 (Figure 3b).

Figure 3.

a) The RAMSY spectrum in the range m/z 50–400, based on the 25 mass spectra from the TIC in the range 7.75–7.90 min (Figure 2b); the driving peak at m/z 74 for methyl caprylate identification is indicated by the asterisk. b) The 8 peaks with top RAMSY values. c) The averaged spectrum (Figure S2) after filtering with the selected RAMSY values in Figure 3b. The RAMSY method indicated many correct fragment ions in the averaged EI-MS spectrum. d) The correlation spectrum in the range m/z 50–400, based on the 25 mass spectra from the TIC in the range 7.75–7.90 min (Figure 2b); the driving peak at m/z 74 for methyl caprylate identification is indicated by the asterisk. e) The 8 MS peaks having top correlation values with the peak at m/z 74 (indicated by the asterisk) using the same data as that for Figure 3b. f) The averaged spectrum (Figure S2) after filtering with the selected correlation values in Figure 3e.

We then compared the performance of RAMSY with a correlation calculation. Figure 3d shows the correlation spectrum in the range m/z 50–400, based on the same 25 mass spectra from the TIC in the range 7.75–7.90 min (Figure 2b). For a fair comparison, eight peaks were also identified to have high correlation values with the peak at m/z 74 (Figure 3e) which had been used as the driving peak for RAMSY. The correlation approach correctly identified a number of fragment ions of methyl caprylate such as those at m/z 115, 101, 59, 55, 57, and 53. However, it missed the peak at m/z 87, which is the second highest fragment ion peak for methyl caprylate (Figure 2c). Similar to RAMSY, the correlation approach incorrectly selected the peak at m/z 58.
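For comparison with the ratio-based calculation, a correlation spectrum relative to a driving peak can be sketched as below. The exact correlation algorithm used in this work is given in the Supporting Information; the sketch assumes a plain Pearson correlation coefficient between the driving-peak intensities and each m/z channel across the selected spectra. The names `pearson` and `correlation_spectrum` are hypothetical, and channels with no variation (including all-zero channels) are assigned zero.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # constant channel: correlation set to zero
    return sxy / math.sqrt(sxx * syy)


def correlation_spectrum(spectra, k):
    """Correlation of every m/z channel with the driving channel k."""
    m = len(spectra[0])
    driving = [row[k] for row in spectra]
    return [pearson(driving, [row[j] for row in spectra]) for j in range(m)]


# Toy data: channel 0 is the driving peak, channel 3 is empty
example = [
    [100.0, 50.0, 10.0, 0.0],
    [200.0, 102.0, 300.0, 0.0],
    [150.0, 74.0, 900.0, 0.0],
]
print(correlation_spectrum(example, k=0))
```

Unlike the RAMSY value, the correlation coefficient is bounded between −1 and 1, so an interfering channel that happens to rise and fall with the driving peak over a few spectra can still score highly.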

MFs were also calculated based on the average GC-MS spectra generated using the peaks identified by correlation and peaks identified by RAMSY, separately (Table 1). The MF calculated using the averaged MS spectrum (8 highest peaks selected to match the number of peaks used in the calculation) against the standard spectrum of methyl caprylate was 195, which represents a very poor match according to the NIST criteria.11 The MF determined based on 8 MS peaks (the driving peak at m/z 74 and seven identified peaks) from the correlation method was 688. The MF calculated using the 8 MS peaks (the driving peak at m/z 74 and seven identified peaks) selected by RAMSY was 752, which lies between 700 and 800 and is considered to be a fair match. The MF of RAMSY is comparable to the best MF value (774) that could be obtained in this RT region (7.75–7.90 min) by comparing individual spectra. Both the average-spectrum and correlation provided a poorer MF.

Table 1.

Match Factor value comparisons for examining the performance of RAMSY and correlation for compound identification using the GC-MS and LC-MS/MS data. Detailed parameters for these calculations are given in the text.

Method               GC-MS (methyl caprylate)   LC-MS/MS (arginine)
Averaged Spectrum    195                        644
Correlation          688                        117
RAMSY                752                        780

Figure 4 shows the MF values of RAMSY and correlation with different numbers of selected peaks in the calculation. The MS peaks with top RAMSY/correlation values were selected from the averaged spectrum (Supporting Information Figure S2) and used in the MF calculation. It is clearly seen that MF values of RAMSY/correlation are relatively stable when the number of selected MS peaks is between 5 and 10. In general, RAMSY generates a higher MF than correlation; even in a few cases when correlation generates a higher MF value than RAMSY (e.g., when the number of selected peaks=13), the MF values are close. Therefore, the selection of 8 peaks in the analysis above represents a typical example.
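The peak-selection step described above (keep only the peaks of the averaged spectrum that have the top RAMSY or correlation values, then compute the MF) can be sketched in Python as follows. The function names `average_spectrum` and `filter_top` and the toy numbers are hypothetical; the scores passed in would be RAMSY or correlation values computed on the same m/z grid.

```python
def average_spectrum(spectra):
    """Point-by-point mean of n spectra sharing a common m/z grid."""
    n = len(spectra)
    m = len(spectra[0])
    return [sum(row[j] for row in spectra) / n for j in range(m)]


def filter_top(averaged, scores, n_peaks):
    """Keep averaged intensities at the n_peaks highest-scoring positions.

    All other positions are set to zero. Ties are broken by position
    (lower index first), which is a choice made for this sketch.
    """
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = set(order[:n_peaks])
    return [a if j in keep else 0.0 for j, a in enumerate(averaged)]


# Toy example: keep the two positions with the highest scores
averaged = average_spectrum([[8.0, 20.0, 28.0, 40.0], [12.0, 20.0, 32.0, 40.0]])
scores = [5.0, 1.0, 9.0, 0.0]
print(filter_top(averaged, scores, n_peaks=2))
```

Repeating the filter for a range of `n_peaks` values and scoring each filtered spectrum against the standard gives a curve of the kind discussed in this paragraph.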

Figure 4.

The MF values of RAMSY and correlation with different numbers of selected peaks in the calculation (GC-MS data). The MS peaks with top RAMSY/correlation values were selected from the averaged spectrum (Supporting Information Figure S2) and used in the MF calculation.

LC-MS/MS

RAMSY was also applied to simplify LC-MS/MS spectra for improving compound identification. Using human serum samples, we targeted the TIC region for arginine, a metabolite which had MS/MS spectra containing interfering peaks from other metabolites. A typical MS/MS spectrum at the TIC peak maximum at 0.65 min obtained by targeting arginine at m/z 175.1195 is shown in Figure 5a. Based on the qualitative comparison with the standard spectrum for arginine from the Metlin database,22 a number of interfering peaks, including the strong peaks at m/z 59 and 118, can be seen in the MS/MS spectrum (Figure 5a). Figure 5b shows the MS/MS spectrum for the standard sample of arginine, and it is very similar to the Metlin standard spectrum of arginine and further confirmed the interference peaks in Figure 5a.

Figure 5.

a) The LC-MS/MS spectrum at the peak maximum of the TIC for a human serum sample, targeting arginine at m/z 175.1195. The inset shows the TIC of the LC-MS/MS data. The signals at m/z 59 and 118 are major interfering peaks. b) The standard LC-MS/MS spectrum of arginine.

For the RAMSY and correlation analysis, 8 LC-MS/MS spectra were selected from the TIC region between 0.55 and 0.85 min. Figure 6a shows the average of these spectra. RAMSY and correlation spectra were calculated using the protonated molecular ion peak of arginine at m/z 175.1195 as the driving peak. Figure 6b and Figure 6c show the RAMSY and correlation spectra, respectively, based on the 8 LC-MS/MS spectra of arginine collected from human serum samples. Although not perfect, many peaks in the RAMSY spectrum (Figure 6b) have a better “signal-to-noise ratio,” while the correlation spectrum (Figure 6c) looks more “noisy.” The RAMSY spectrum shows 8 peaks above a threshold value of 1.4 (Figure 6b and Figure S3a). These eight peaks were selected, since the main purpose of this example is to examine the relationship between arginine and the major peaks in the MS/MS spectra. As seen in Figure 5b, Figure 6b, and Figure S3a, RAMSY identified almost all the major MS/MS peaks from arginine and greatly de-emphasized the interfering peaks at m/z 59 and 118 (the second highest peak in Figure 6a). Using the same MS/MS spectra, we made correlation calculations for comparison. As seen in Figure 6c and Figure S3b, the correlation analysis identified 8 peaks with correlation values above 0.65 (the same number of peaks was selected for a fair comparison between RAMSY and correlation). Three of the peaks, at m/z 70, 116, and 158, were in agreement with the peaks from the standard compound (arginine, Figure 5b) and those identified by RAMSY (Figure 6b and Figure S3a). However, a number of peaks, including those at m/z 84, 87, and 118, were not related to the arginine spectrum.

Figure 6.

a) The averaged LC-MS/MS spectrum of the 8 spectra from human serum samples selected from the TIC region 0.55–0.85 min that contains MS/MS spectra for arginine. b) The RAMSY spectrum based on the 8 LC-MS/MS spectra of arginine collected from human serum samples. c) The correlation spectrum based on the 8 LC-MS/MS spectra of arginine collected from human serum samples.

MF calculations were made in a similar manner to those for the GC-MS spectra, and the results are included in Table 1. The MF obtained from the averaged LC-MS/MS spectrum (with a threshold of 90 to match the number of peaks used in the calculation) was 644. The MF calculated for the RAMSY spectrum (with a threshold of 1.4) provided a value of 780, while the MF calculated using the correlation derived spectrum (with a threshold of 0.65) provided an MF value of 117. In this case RAMSY also provides a more accurate representation for the identity of the metabolite, arginine. Finally, Figure 7 shows the MF values for RAMSY and correlation with different numbers of selected peaks from the averaged LC-MS/MS spectrum of arginine (Figure 6a). It is clearly seen that RAMSY generates higher MFs than correlation, for as many as 45 peaks selected according to the top RAMSY/correlation values.

Figure 7.

The MF values of RAMSY and correlation with different numbers of selected peaks in the calculation (LC-MS/MS data). The MS peaks with top RAMSY/correlation values were selected from the averaged spectrum (Figure 6a) and used in the MF calculation. On average, each peak has 22 data points in our high-resolution LC-Q-TOF experiments.

Given these results, we believe that ratio analysis is a potentially powerful approach to simplify crowded mass spectra for the reliable identification of metabolites within complex mixtures. The RAMSY approach is demonstrated using specific examples for both GC-MS and LC-MS/MS data. RAMSY works when the GC/LC elution of compounds of interest and the interfering compounds is not exactly the same; however, even when the chromatographic peaks are the same, RAMSY is still applicable if the MS intensity ratios of the compounds are changing (e.g., the slopes of the calibration curves are different). Based on our experience, generally more than 5 spectra and 5–10 MS peaks with top RAMSY values should be utilized. An advantage of using the RAMSY approach is that it is resistant to low peak intensities since the peak identification is based on the ratios. Further, since the RAMSY calculations are primarily based on the ratios of the peak intensities, this approach is anticipated to be less sensitive to altered chromatography conditions. A major requirement for RAMSY is that at least one isolated peak in the mass spectrum is needed to serve as the driving peak, since RAMSY cannot quantitatively differentiate the contributions to a coincident interference peak (i.e., a peak that contains contributions from both the compound of interest and the interfering compound). However, it should be possible under most circumstances to find at least one relatively unique peak in the spectra for a given compound of interest. We anticipate that incorporating the RAMSY approach as a digital filter into existing metabolite identification algorithms will benefit reliable compound identification in the mass spectra of complex biological mixtures.

CONCLUSIONS

Reliable compound identification from the analysis of mass spectra is a considerable challenge. In this work, efforts focused on alleviating this problem showed improved statistical isolation of peaks from the same metabolite in complex mass spectra using a new approach, RAMSY, which has promise for enhanced metabolite identification. RAMSY identifies mass spectral peaks or fragment ions based on their fixed ratios for a given metabolite under the same experimental conditions. Using specific examples, we have demonstrated RAMSY using both GC-MS and LC-MS/MS data. While RAMSY cannot provide a perfect solution, it fares quite well in terms of peak identification and Match Factors relative to current correlation or averaging methods. An advantage of RAMSY is that it uses multiple mass spectra from the same chromatograms and does not need additional experiments. Specifically, in combination with advanced methods/software tools, RAMSY shows promise for reliable metabolite identification and for reducing the large list of unidentified metabolites in the mass spectra of complex biological samples. We anticipate that the RAMSY algorithm can be incorporated into current analysis software, and that its applications can be made quite broad, including, but not limited to, hetero-spectroscopic applications.

Supplementary Material

1_si_001

Acknowledgments

This work was supported by the NIH/NIGMS (Grant 2R01 GM085291).

Footnotes

Supporting Information

Supporting Information Available: This material is available free of charge via the Internet at http://pubs.acs.org.

References

  • 1. Dunn WB, Ellis DI. TRAC-Trend Anal Chem. 2005;24:285–294.
  • 2. Zhang AH, Sun H, Wang P, Han Y, Wang XJ. Analyst. 2012;137:293–300. doi: 10.1039/c1an15605e.
  • 3. Scalbert A, Brennan L, Fiehn O, Hankemeier T, Kristal BS, van Ommen B, Pujos-Guillot E, Verheij E, Wishart D, Wopereis S. Metabolomics. 2009;5:435–458. doi: 10.1007/s11306-009-0168-0.
  • 4. Pan ZZ, Raftery D. Anal Bioanal Chem. 2007;387:525–527. doi: 10.1007/s00216-006-0687-8.
  • 5. Fiehn O. Plant Mol Biol. 2002;48:155–171.
  • 6. Dettmer K, Aronov PA, Hammock BD. Mass Spectrom Rev. 2007;26:51–78. doi: 10.1002/mas.20108.
  • 7. Gowda GAN, Zhang SC, Gu HW, Asiago V, Shanaiah N, Raftery D. Expert Rev Mol Diagn. 2008;8:617–633. doi: 10.1586/14737159.8.5.617.
  • 8. Reaves ML, Rabinowitz JD. Curr Opin Biotechnol. 2011;22:17–25. doi: 10.1016/j.copbio.2010.10.001.
  • 9. Lindon JC, Holmes E, Nicholson JK. Pharm Res. 2006;23:1075–1088. doi: 10.1007/s11095-006-0025-z.
  • 10. Babushok VI, Linstrom PJ, Reed JJ, Zenkevich IG, Brown RL, Mallard WG, Stein SE. J Chromatogr A. 2007;1157:414–421. doi: 10.1016/j.chroma.2007.05.044.
  • 11. Stein SE. NIST/EPA/NIH Mass Spectral Database (NIST 11) and NIST Mass Spectral Search Program (Version 2.0g). National Institute of Standards and Technology; Gaithersburg: 2011.
  • 12. Kind T, Wohlgemuth G, Lee DY, Lu Y, Palazoglu M, Shahbaz S, Fiehn O. Anal Chem. 2009;81:10038–10048. doi: 10.1021/ac9019522.
  • 13. Stein SE. J Am Soc Mass Spectrom. 1994;5:316–323. doi: 10.1016/1044-0305(94)85022-4.
  • 14. Mallard WG, Reed J. AMDIS-User Guide. National Institute of Standards and Technology (NIST); Gaithersburg.
  • 15. Liang Y, Wang GJ, Xie L, Sheng LS. Curr Drug Metab. 2011;12:329–344. doi: 10.2174/138920011795202910.
  • 16. Rivier L. Anal Chim Acta. 2003;492:69–82.
  • 17. Prasad B, Garg A, Takwani H, Singh S. TRAC-Trend Anal Chem. 2011;30:360–387.
  • 18. Rojas-Cherto M, Peironcely JE, Kasper PT, van der Hooft JJJ, de Vos RCH, Vreeken R, Hankemeier T, Reijmers T. Anal Chem. 2012;84:5524–5534. doi: 10.1021/ac2034216.
  • 19. Wishart DS, Jewison T, Guo AC, Wilson M, Knox C, Liu YF, Djoumbou Y, Mandal R, Aziat F, Dong E, Bouatra S, Sinelnikov I, Arndt D, Xia JG, Liu P, Yallou F, Bjorndahl T, Perez-Pineiro R, Eisner R, Allen F, Neveu V, Greiner R, Scalbert A. Nucleic Acids Res. 2013;41:D801–D807. doi: 10.1093/nar/gks1065.
  • 20. http://www.hmdb.ca/.
  • 21. Tautenhahn R, Cho K, Uritboonthai W, Zhu ZJ, Patti GJ, Siuzdak G. Nat Biotechnol. 2012;30:826–828. doi: 10.1038/nbt.2348.
  • 22. http://metlin.scripps.edu/.
  • 23. Smith CA, Want EJ, O’Maille G, Abagyan R, Siuzdak G. Anal Chem. 2006;78:779–787. doi: 10.1021/ac051437y.
  • 24. Gu H, Gowda GAN, Raftery D. Future Oncol. 2012;8:1207–1210. doi: 10.2217/fon.12.113.
  • 25. Nicholson JK, Holmes E, Kinross JM, Darzi AW, Takats Z, Lindon JC. Nature. 2012;491:384–392. doi: 10.1038/nature11708.
  • 26. Bain JR, Stevens RD, Wenner BR, Ilkayeva O, Muoio DM, Newgard CB. Diabetes. 2009;58:2429–2443. doi: 10.2337/db09-0580.
  • 27. Fan TWM, Lane AN. J Biomol NMR. 2011;49:267–280. doi: 10.1007/s10858-011-9484-6.
  • 28. Borras S, Kaufmann A, Companyo R. Anal Chim Acta. 2013;772:47–58. doi: 10.1016/j.aca.2013.02.012.
  • 29. Cloarec O, Dumas ME, Craig A, Barton RH, Trygg J, Hudson J, Blancher C, Gauguier D, Lindon JC, Holmes E, Nicholson J. Anal Chem. 2005;77:1282–1289. doi: 10.1021/ac048630x.
  • 30. Blaise BJ, Navratil V, Emsley L, Toulhoat P. J Proteome Res. 2011;10:4342–4348. doi: 10.1021/pr200489n.
  • 31. Holmes E, Cloarec O, Nicholson JK. J Proteome Res. 2006;5:1313–1320. doi: 10.1021/pr050399w.
  • 32. Wei SW, Zhang J, Liu LY, Ye T, Gowda GAN, Tayyari F, Raftery D. Anal Chem. 2011;83:7616–7623. doi: 10.1021/ac201625f.
  • 33. Agilent Fiehn GC/MS Metabolomics RTL User Guide. Agilent Technologies; Wilmington: 2008.
