Author manuscript; available in PMC: 2013 Jul 9.
Published in final edited form as: Anal Chem. 2006 May 15;78(10):3289–3295. doi: 10.1021/ac060245f

Non-linear Data Alignment for UPLC-MS and HPLC-MS based Metabolomics: Application to Endogenous and Exogenous Metabolites in Human Serum

Anders Nordström 1, Grace O’Maille 1, Chuan Qin 1, Gary Siuzdak 1,*
PMCID: PMC3705959  NIHMSID: NIHMS61036  PMID: 16689529

Abstract

We have validated a strategy for serum metabolomics using non-linear retention time correction for the alignment of LC-MS data. Two small-molecule mixtures, differing in relative concentration by 20% to 100% for 10 of the compounds, were added to human serum. The metabolomics protocol, using UPLC for separation and XCMS for LC-MS data alignment, could readily identify 8 of the 10 spiked differences among more than 2700 detected features. Normalization of the data against a single factor, obtained by averaging the XCMS integrated response areas of spiked standards, increased the number of identified differences. The original data structure was well preserved by XCMS, but re-integration of the identified differences in the original data decreased the number of false positives. Using UPLC for separation resulted in 20% more detected components compared to HPLC. The length of the chromatographic separation also proved to be a crucial parameter for the number of detected features. Moreover, UPLC displayed better retention time reproducibility and signal-to-noise ratios (S/N) for spiked compounds than HPLC, making this technology more suitable for non-targeted metabolomics applications.

Keywords: Metabolomics, XCMS, retention time correction, UPLC, mass spectrometry, human serum, validation

1. Introduction

Mass spectrometry has established itself as a useful tool for metabolomics analysis because of its capability to measure compounds present at very low levels and at the same time provide structural information (Villas-Boas et al., 2005; Want et al., 2005). With separation systems such as gas or liquid chromatography (GC or LC) coupled with mass analysis, the information content can be dramatically increased due to reduced ionization suppression and temporal separation of isomers. Prior to any statistical or multivariate analysis of acquired data, it is a prerequisite that the data are aligned, i.e. in a GC-MS or LC-MS analysis, m/z (X) at retention time (Y) must be consistent throughout all observations. While drift in the m/z direction is fairly straightforward to correct by m/z calibration, the error in the retention-time domain is more difficult to correct. To achieve datasets suitable for further statistical processing, three main strategies are used:

(1). Alignment of chromatograms

Alignment of the chromatograms without prior component picking can potentially be very useful because it requires little or no operator intervention in setting criteria for peak finding etc. Furthermore, it enables direct differential analysis between entire sets of data matrices. A procedure for aligning raw LC-UV chromatograms, correlation optimized warping (COW), was developed by Nielsen and co-workers (Nielsen et al., 1998). The COW algorithm has been further adapted to make use of MS data for chromatogram alignment (Bylund et al., 2002). An algorithm for aligning one-dimensional GC-FID data was also developed by Johnson and co-workers (Johnson et al., 2003).

(2). Summation or binning of chromatographic data

Summation of chromatographic segments in LC-MS or GC-MS analysis is one way of reducing the problems associated with data alignment. Summation of m/z data across preset time windows will result in no loss of information, and the alignment error will be confined to the edges of the windows or bins. Multivariate analysis reveals the windows that display a significant difference, from which the m/z responsible for the difference can be deduced; a reversion to the original chromatograms can then confirm the differences (Plumb et al., 2002; Jonsson et al., 2004).

(3). Curve resolution or deconvolution

This strategy involves finding components that are subsequently matched over the different observations (data sets). With GC-MS electron ionization (EI) spectra, deconvolution using the free AMDIS software (Davies, 1998; Stein, 1999) or proprietary mass spectrometer manufacturer software can be done prior to matching of the components with software such as MSFACT (RTAlign) (Duran et al., 2003). However, AMDIS and other software primarily developed for EI spectra require the presence of several mass traces that converge at the same retention time for deconvolution. With LC and electrospray ionization (ESI) this is not always the case. For LC-MS data, Windig and co-workers developed the component detection algorithm (CODA) (Windig et al., 1996), which reduced the number of spectra to be investigated by an order of magnitude. Component resolving algorithms developed for LC-MS data include GENTLE (Shen et al., 2001; Idborg-Bjorkman et al., 2003) and MEND (Andreev et al., 2003). More recently, other integrated strategies for comparison of LC-MS data in a metabolomics context have been developed (Radulovic et al., 2004; Jonsson et al., 2005), as well as the MZmine software (Katajamaa and Oresic, 2005). Commercial software is also available, such as MS-resolver (Pattern Recognition Systems) and MarkerLynx (Waters), which have been compared (Idborg et al., 2005), and metAlign (Plant Research International), which has been used for mining LC-MS data in a non-targeted fashion (Vorst et al., 2005). After peak finding, matching of peaks has to be performed. For this, several different matching strategies have been used. One example is the use of time windows to assign components with specific retention time and intensity to a certain component group (Duran et al., 2003). A second is "master chromatogram alignment", i.e. matching of all the peaks in a sample to a master peak list by using retention time and an m/z value for aligning components that are in the proximity of each other in time and that possess a similar m/z value (Katajamaa and Oresic, 2005). Another approach is to calculate spectral similarity for adjacent components in a specified time segment and thereby match components with high similarity within that particular time segment (Idborg-Bjorkman et al., 2003).

We have recently developed XCMS (Smith et al., 2005), a software for non-linear retention time correction of XC-MS data, where X denotes the possibility to use various types of chromatography prior to MS analysis. The software is freely available under an open source license from (http://metlin.scripps.edu/download/). XCMS reads CDF files from the mass spectrometer and finds peaks in the form of "features", each of which is defined as a unique m/z at a unique time point. It matches the features and performs non-linear retention time correction through a local regression fitting method using peak groups that are initially well grouped. The output from XCMS can be visualized as a series of superimposed retention time corrected extracted ion chromatograms and also as a tab delimited table with columns denoting observations and a row for each feature aligned across the observations. The values making up the data matrix are area values representing the response for each feature. The table can be imported into Microsoft Excel, Matlab or any preferred multivariate or statistical software for further analysis without any additional processing, as the data are already aligned.
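The idea of correcting retention times through local regression on well-behaved peak groups can be illustrated with a minimal sketch. This is not the XCMS implementation: it assumes a hypothetical set of peak groups for one sample, each with an observed retention time and a median retention time across all samples, and fits a locally weighted linear regression (tricube weights) to the deviations. The function names, the slightly inflated bandwidth and the toy numbers are illustrative assumptions.

```python
def tricube(u):
    """Tricube kernel: weight falls smoothly from 1 (u = 0) to 0 (|u| >= 1)."""
    u = min(abs(u), 1.0)
    return (1.0 - u ** 3) ** 3


def local_linear_fit(xs, ys, x0, span=0.5):
    """Estimate y at x0 by locally weighted linear regression."""
    k = min(len(xs), max(2, int(round(span * len(xs)))))  # neighbours in the window
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: distance to the k-th neighbour, inflated by 5% (an assumption of
    # this sketch) so that the k-th neighbour keeps a non-zero weight.
    h = max(dists[k - 1], 1e-9) * 1.05
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    denom = sw * sxx - sx * sx
    if abs(denom) <= 1e-12 * max(1.0, sw * sxx):  # degenerate window: weighted mean
        return sy / sw
    slope = (sw * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / sw
    return intercept + slope * x0


# Hypothetical well-behaved peak groups for one sample (seconds).
group_median_rt = [60.0, 180.0, 300.0, 420.0, 540.0, 660.0]
observed_rt = [61.0, 182.5, 303.0, 422.0, 541.0, 660.5]
deviation = [obs - med for obs, med in zip(observed_rt, group_median_rt)]


def correct_rt(rt):
    """Subtract the smoothed deviation from an observed retention time."""
    return rt - local_linear_fit(observed_rt, deviation, rt)


for rt in (100.0, 303.0, 600.0):
    print(f"observed {rt:7.1f} s -> corrected {correct_rt(rt):7.2f} s")
```

Because the deviation is modelled as a smooth function of retention time rather than a single offset, features eluting between the anchor groups receive an interpolated correction.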

Ultra performance liquid chromatography (UPLC) (Shen et al., 2005; Wilson et al., 2005) is a promising separation technique for metabolomics. The reduced particle size (1.4 to 1.7 μm) of the packing material offers increased separation through narrower chromatographic peaks over normal particle size (3.5 to 5 μm) HPLC, resulting in increased peak capacity, lower ion suppression and potentially better signal to noise (S/N) for observed components. Especially when analyzing complex mixtures with LC-MS, as often is the case in metabolomics investigations, UPLC can be of great advantage over regular microbore HPLC in that more components can be detected (Plumb et al., 2004).

In the present study we validate a metabolomics strategy for human serum using UPLC-MS and HPLC-MS together with the non-linear retention time correction software XCMS for data alignment. The ability of the software to identify features with a concentration difference of only 20% between two sample classes, among thousands of features that remained constant, is demonstrated in conjunction with the benefits of using UPLC in terms of S/N, spectral quality and number of components detected.

2. Materials and methods

The workflow is illustrated in Figure 1 and the composition of the two spiking mixtures is shown in Table 1.

Figure 1.


Workflow scheme. The acquired data are converted to CDF format. The CDF files are subsequently processed using XCMS. The output from XCMS is aligned data, which can be viewed as picture files and also as a result matrix where samples (observations) make up the columns and the features (variables) constitute the rows. The features (in this case 2711) are normalized and sorted according to p-value obtained through a t-test. Features that display a significant difference (p<0.05) between sample classes A and B are subject to re-integration in the raw data. The re-integrated areas are again normalized and sorted according to p-value, and compounds with p<0.05 constitute the final table of metabolites that differ between samples A and B.

Table 1.

[M+H]+, composition and injected amount of spiking mixes

Compound | [M+H]+ | Mix A (pmol) | Mix B (pmol)
Tacrine | 199 | 0.2 | 0.16
Prednisone | 359 | 2 | 0
Propafenone | 342 | 0.2 | 0
Bis Tacrine | 493 (A) | 0.2 | 0.3
Haloperidol | 376 | 0.2 | 0.3
Nicotinic acid | 124 | 80 | 40
Mifepristone | 430 | 0.2 | 0.24
Sulpiride | 342 | 0.2 | 0.16
Tenoxicam | 338 | 0.2 | 0.1
Piroxicam | 332 | 2 | 2.4
Phenylalanine 2H5 | 171 | 200 | 200
Atenolol | 267 | 0.2 | 0.2
Caffeine | 195 | 0.2 | 0.2
Tetracycline | 445 | 2 | 2
Chlortetracycline | 479 | 2 | 2
Propranolol | 260 | 0.2 | 0.2
Verapamil | 455 | 0.2 | 0.2
Ketoprofen | 255 | 0.2 | 0.2
Oleamide | 282 | 2 | 2

(A) Bis Tacrine was detected by its fragment at m/z 247.

2.1 Chemicals and sample preparation

All solvents used were of HPLC grade (JT Baker, Phillipsburg NJ). The serum was human serum from clotted human male whole blood, sterile-filtered (Sigma, St Louis MO). The bovine serum albumin (BSA) (Sigma, St Louis MO) was subjected to reduction, alkylation and tryptic digestion according to standard protocol. Phenylalanine 2H5 (98%) was obtained from Cambridge Isotope Laboratories (Andover MA). All other chemicals for spiking mixtures (A) and (B) were obtained in high purity from Sigma (St Louis MO). Protein precipitation of five aliquots (100 μL) of the human serum was performed with cold methanol according to Want et al. (2005). The precipitated aliquots were dried, re-suspended in 100 μL acetonitrile/water 5/95 v/v (0.1% formic acid) and subsequently pooled. From the pooled serum extract, 100 μL was aliquoted to each of four HPLC vials. To vial one, 900 μL acetonitrile/water 5/95 v/v (0.1% formic acid) was added. To vial two, 87.7 μL BSA digest (5.7 μM) and 812.3 μL acetonitrile/water 5/95 v/v (0.1% formic acid) were added. To vials three and four, 87.7 μL BSA digest (5.7 μM) and 808.3 μL acetonitrile/water 5/95 v/v (0.1% formic acid) were added, as well as 4 μL of stock solution of mixture (A) and (B), respectively. By keeping the final volume constant (1000 μL), the serum extract was diluted identically in all vials.
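The volume bookkeeping for the four vials can be checked with a few lines of arithmetic. The sketch below uses only the volumes stated above; the vial labels and variable names are illustrative, and the final BSA digest concentration assumes that the stated 5.7 μM refers to the digest stock before dilution.

```python
# Volumes in microlitres, as stated in the sample preparation.
vials = {
    "1 (serum only)": {"serum extract": 100.0, "solvent": 900.0},
    "2 (serum + BSA)": {"serum extract": 100.0, "BSA digest": 87.7, "solvent": 812.3},
    "3 (serum + BSA + mix A)": {
        "serum extract": 100.0, "BSA digest": 87.7, "solvent": 808.3, "spike mix": 4.0,
    },
    "4 (serum + BSA + mix B)": {
        "serum extract": 100.0, "BSA digest": 87.7, "solvent": 808.3, "spike mix": 4.0,
    },
}

# Total volume per vial, rounded to 0.1 uL to hide floating-point noise.
totals = {name: round(sum(parts.values()), 1) for name, parts in vials.items()}

# Fraction of each vial that is serum extract (identical if the totals agree).
serum_fraction = {name: parts["serum extract"] / totals[name] for name, parts in vials.items()}

# Final BSA digest concentration in vials 2-4, assuming a 5.7 uM stock.
bsa_final_uM = 87.7 * 5.7 / 1000.0

for name in vials:
    print(f"vial {name}: {totals[name]} uL, serum fraction {serum_fraction[name]:.2f}")
print(f"BSA digest after dilution: {bsa_final_uM:.3f} uM")
```

All four vials come to 1000 μL, so the serum extract makes up one tenth of each vial regardless of what else was added.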

2.2 LC-MS

The separation system used was a Waters Acquity UPLC (Waters, Milford MA) coupled to a Micromass Q-TOF Micro (Waters, Milford MA). The same system was used for both UPLC and HPLC experiments. Elution buffers were A: acetonitrile 0.1% formic acid and B: water 0.1% formic acid. Linear gradients from 5% B/95% A to 95% B/5% A over 10 or 30 minutes were used after holding at initial conditions for 1 minute. The gradient was kept at 95% B/5% A for three column volumes and the column was subsequently re-equilibrated with four column volumes. Flow rates used were 0.25 mL/min and 0.5 mL/min for HPLC and UPLC respectively, and the flow was split post column 1:1 in the UPLC experiments. For UPLC experiments we used a BEH chemistry C18 (2.1×100 mm) 1.7 μm particle size column and for HPLC experiments a Symmetry C18 (2.1×100 mm) 3.5 μm particle size column (both obtained from Waters, Milford MA). The column heater was set to 40 °C and the backpressures noted were around 8000 PSI and 1600 PSI for UPLC and HPLC respectively. Injection volume was set to 20 μL full loop with an overfill factor of two. The mass spectrometer was used with the regular ESI interface and calibrated prior to experiments. Data were collected in continuum mode between m/z 100 and m/z 1000 with an acquisition rate of approximately two spectra/second, resulting in at least five spectra per chromatographic peak (scan speed 0.52, inter scan delay 0.1). Each of the four pools of serum was injected in five replicates with each chromatographic setup, making 80 injections in total.

2.3 Data analysis

The files were converted to CDF format using Databridge (Waters, Milford MA). The size of the files was 0.5 to 1 GB, which required a fairly powerful computer for further processing. The XCMS software, previously described (Smith et al., 2005) and downloadable from (http://metlin.scripps.edu/download/), was installed on a computer with a Linux operating system (dual processor 3.2 GHz and 6 GB RAM). Data were processed with the default settings of the software (see documentation for XCMS, http://metlin.scripps.edu/download/) with the following exceptions: xcmsSet (profmethod="binlinbase"), retcor (p="m", f="s", missing=5, extra=5, span=0.2), group (bw=10 for HPLC and bw=5 for UPLC). The resulting table (CSV file) was opened in Excel (Microsoft, Redmond WA).

All t-tests performed in Excel were two-sided with unequal variance. The number of features originating from BSA was determined by comparing the serum pool with serum spiked with BSA digest (five replicates each) and setting the p-value cutoff to <0.01. The re-integrated areas were obtained with QuanLynx (Waters, Milford MA) by setting up an integration parameter file using the average m/z and retention time for the features found to be significant. Apex peak tracking was used for finding peaks.
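For readers who want to reproduce the two-sided, unequal-variance (Welch) t-test outside Excel, a minimal standard-library sketch follows. It is an illustration rather than the procedure used in the study: the p-value is obtained by numerically integrating the t density with Simpson's rule, and the replicate areas are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=20000):
    """Two-sided p-value: 1 minus the t density integrated over [-|t|, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3.0  # integral from 0 to |t|
    return max(0.0, 1.0 - 2.0 * half_area)


# Hypothetical normalized areas for one feature, five replicate injections per class.
class_a = [1.02, 0.97, 1.05, 0.99, 1.01]
class_b = [0.81, 0.78, 0.84, 0.80, 0.79]
t_stat, dof = welch_t(class_a, class_b)
print(f"t = {t_stat:.2f}, df = {dof:.1f}, p = {two_sided_p(t_stat, dof):.2e}")
```

Applying this to every row of the feature table and sorting by p-value gives the kind of ranked list used in the workflow.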

3. Result and Discussion

In a comprehensive metabolomics study, the aim is to compare the metabolomes of different samples in a non-targeted fashion. The result of such a comparison should be a list of features/metabolites with quantitative information that implies biochemical change. The structural identification of unknown metabolites is a time-consuming process, and it is therefore desirable to make this list accurate, reflecting only the significant changes, both small and large, with as few false positives as possible.

3.1 Design of experiment

The validation was designed by splitting a pool of human serum, which had previously been subjected to protein precipitation, into four aliquots. One aliquot was kept as control. The second aliquot was spiked with a bovine serum albumin (BSA) digest to further increase the complexity. The third aliquot was spiked with a BSA digest and with a mixture (A) consisting of 19 non-endogenous compounds. The fourth aliquot was spiked with a BSA digest and with a mixture (B), which was a modified version of mixture (A). The compositions of mixtures (A) and (B) are shown in Table 1. In total, 10 concentration changes were made between (A) and (B), while 9 compounds were kept constant. Two compounds were removed, four were increased in concentration by 20% or 50%, and four were reduced in concentration by 20% or 50%. The strategy was to introduce a known set of differences between the two sample classes (A) and (B) while keeping thousands of components constant, and to follow how well our metabolomics strategy could identify these minor differences. The design involved analyzing each of the four serum pools with both UPLC and HPLC using a long and a short separation gradient. The primary motivation for this design was to evaluate the effect of the length of the chromatography on the number of detected components and also on their retention and response reproducibility.

The outline of our validated metabolomics workflow is illustrated in Figure 1. After acquisition, the data were converted to CDF format and read into XCMS. The data are aligned, and the output from XCMS can be visualized as aligned and superimposed extracted ion chromatograms and/or as a data matrix with observations as the columns and features (named by m/z and retention time) aligned across the observations as rows. The resulting table, in tab delimited format, can be readily exported to Microsoft Excel, Matlab or other multivariate/statistical software. After normalization, sorting of the features according to p-value (cutoff p<0.05), calculated by performing a t-test between class (A) and (B), generated a list of significant features. The ions in the list were re-integrated in the raw data. Integration was manually inspected and the integration report was exported back to Excel, where a new normalization was performed with areas from the same six compounds (integrated from the original data). Again a t-test was calculated on the resultant normalized data matrix, and the features were again sorted according to p-value (cutoff p<0.05). The resulting list is used for reverting back to raw data (or re-analysis with other instrumentation) for identification of the most significantly different features.

3.2 UPLC versus HPLC

After XCMS processing, the resulting tab delimited table was imported to Microsoft Excel. The number of detected features revealed that UPLC was superior to HPLC in that it detected more features. For both the 30 and 10 minute gradients, UPLC generated roughly 20% more features than the HPLC analysis (Table 2). The length of the separation was also important for the number of detected features. This illustrates a potential for gradient length optimization by using XCMS to increase the gradient length until the increase in the number of detected features flattens out. By searching the output from XCMS for the known added compounds, it was observed that the longer gradient enabled XCMS to extract more ions correctly (Table 2). The two compounds that were not found by XCMS with the 30 minute gradient were tenoxicam and ketoprofen. With UPLC these compounds could be manually identified in the chromatogram with an S/N close to the detection limit, whereas they remained undetected in the HPLC chromatogram even after manual inspection. A BSA digest was introduced to three of the serum aliquots to increase the complexity of the sample and to test whether XCMS could be used for finding quantitative differences amongst peptides and thus potentially be used for quantitative proteomics. In a comparison between the serum without BSA and serum spiked with BSA, the 30 minute UPLC gradient identified the highest number of significantly different features (Table 2). The XCMS report table contains the minimum and maximum retention time for each feature as measured prior to correction. By subtracting the minimum from the maximum for each feature, an estimate of the retention time correction is obtained. These differences were averaged over all features for the different separation schemes and are reported in Table 2. Interestingly, the average required retention time corrections were much lower using UPLC. This might reflect an individual column difference but could potentially imply higher retention time reproducibility using an ultra high pressure separation system. Response reproducibility was similar between UPLC and HPLC, as shown in Table 2. The largest fraction of features showed an RSD between 5 and 25%. Taken together, these results confirm that UPLC offers an advantage in terms of more detected ions, as has been suggested in previous work (Plumb et al., 2004). Surprisingly, the retention time did appear more reproducible with UPLC compared to HPLC. This might in part be due to the better peak shapes obtained with UPLC, and to the fact that faster chromatography (10 minute gradient) displayed higher reproducibility (Table 2). However, faster chromatography resulted in fewer detected features overall and is therefore not a good option for untargeted comparisons. Signal to noise values comparing UPLC and HPLC are displayed in Figure 2A. For a majority of the spiked compounds, UPLC gave higher S/N, which is in agreement with previous research (Wilson et al., 2005). A more detailed examination of selected EICs and corresponding spectra from representative spiked compounds is shown in Figure 2B. One reason for the better signal to noise is the narrower elution profiles obtained with UPLC. Furthermore, it is evident by comparison of the spectra in Figure 2B that the spectral purity of the chromatographic components is higher with UPLC. Narrower peaks will result in better separation and therefore less suppression during ionization.
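The retention time correction estimate described above (maximum minus minimum retention time per feature, averaged over features) can be sketched as follows. The dictionary keys `rtmin` and `rtmax` and the feature values are assumptions chosen for illustration, not values from the study.

```python
from statistics import mean, stdev


def rt_spread(features):
    """Mean and standard deviation of (rtmax - rtmin) across features, in seconds."""
    spreads = [f["rtmax"] - f["rtmin"] for f in features]
    return mean(spreads), stdev(spreads)


# Hypothetical rows from a feature report, retention times before correction.
report = [
    {"name": "M199T261", "rtmin": 260.2, "rtmax": 261.9},
    {"name": "M376T550", "rtmin": 549.1, "rtmax": 551.0},
    {"name": "M430T608", "rtmin": 607.5, "rtmax": 608.8},
    {"name": "M282T1419", "rtmin": 1417.0, "rtmax": 1421.5},
]

avg, sd = rt_spread(report)
print(f"retention time correction estimate: {avg:.1f} ± {sd:.1f} s")
```

A large standard deviation relative to the mean, as reported for one of the HPLC runs in Table 2, indicates that a few features drifted far more than the rest.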

Table 2.

Comparison between UPLC and HPLC

The six RSD columns give the fraction (%) of features in each RSD range.

Separation and gradient (minutes) | Total features found | Spiked features found (A) | Features from BSA (B) | Retention time correction (C) (seconds) | RSD <5%, non-norm. (D) | RSD <5%, norm. (E) | 5%<RSD<25%, non-norm. (D) | 5%<RSD<25%, norm. (E) | RSD >25%, non-norm. (D) | RSD >25%, norm. (E)
UPLC-30 | 2709 | 17 | 585 | 1.8 ± 2.0 | 6.9 | 25.2 | 84.1 | 64.4 | 9.0 | 10.4
HPLC-30 | 2125 | 17 | 494 | 10.1 ± 21.6 | 5.8 | 8.3 | 83.1 | 82.1 | 11.1 | 9.6
UPLC-10 | 2034 | 16 | 494 | 1.7 ± 1.7 | 51.0 | 52.1 | 47.0 | 45.8 | 2.0 | 2.1
HPLC-10 | 1619 | 16 | 501 | 2.9 ± 4.2 | 43.0 | 41.6 | 53.2 | 54.8 | 3.8 | 3.6

(A) Serum spiked with mixture A

(B) Significance cutoff at p<0.01

(C) Average of observed [maximum − minimum] retention time for each feature before correction

(D) Feature areas as obtained by XCMS without normalization (5 observations/feature)

(E) Feature areas as obtained by XCMS with normalization (5 observations/feature) (see method section)
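The RSD fractions in Table 2 can be computed from replicate areas as in the following minimal sketch. The feature areas are hypothetical, and the assignment of values falling exactly on a bin edge to the middle bin is an assumption, since the table does not say how such cases were handled.

```python
from statistics import mean, stdev


def rsd_percent(areas):
    """Relative standard deviation of replicate areas, in percent."""
    return 100.0 * stdev(areas) / mean(areas)


def rsd_fractions(matrix):
    """Percentage of features with RSD <5%, 5-25% and >25%."""
    counts = {"<5%": 0, "5-25%": 0, ">25%": 0}
    for areas in matrix.values():
        rsd = rsd_percent(areas)
        if rsd < 5.0:
            counts["<5%"] += 1
        elif rsd <= 25.0:  # edge values go to the middle bin (assumption)
            counts["5-25%"] += 1
        else:
            counts[">25%"] += 1
    return {label: 100.0 * n / len(matrix) for label, n in counts.items()}


# Hypothetical feature areas, five replicate injections per feature.
example = {
    "feature_1": [100.0, 100.0, 100.0, 100.0, 100.0],  # RSD 0%
    "feature_2": [80.0, 90.0, 100.0, 110.0, 120.0],    # RSD about 16%
    "feature_3": [50.0, 100.0, 150.0, 50.0, 150.0],    # RSD 50%
}

fractions = rsd_fractions(example)
for label, pct in fractions.items():
    print(f"RSD {label}: {pct:.1f}% of features")
```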

Figure 2.


A. Signal to noise ratio (S/N) for the spiked compounds. Black and white bars are S/N measured with UPLC and HPLC respectively. Error bars are showing standard deviation (n=5). B. Extracted ion chromatograms (EIC) and spectra for selected spiked compounds.

3.3 Normalization and re-integration

Normalization was performed by dividing each feature area value in the separate observations by a scalar obtained separately for each injection by calculating the average XCMS integrated area for six of the compounds that were kept constant between mix (A) and (B). This resulted in a response factor matrix of the same size as the original matrix, which was subsequently further processed. In the literature, several normalization strategies are found (Fiehn et al., 2000; Wang et al., 2003; Jonsson et al., 2004; Shurubor et al., 2005). We chose to normalize this way because averaging the response of several spiked components appearing over the entire chromatographic range gave a good result even for small differences (Figure 3) and was easily performed. It is worth noting that the need for normalization has been shown to be minimized when using nanospray infusion MS (Boernsen et al., 2005). We found that normalization increased the number of spiked features that were significant with p-values <0.05 (Table 3). Also found in Table 3 are the number of significant features (p<0.05) before and after normalization of XCMS data, as well as the number of significant features before and after normalization observed after re-integration of the raw data. As noted in Table 3, the normalization initially caused an increase in the number of false positives. However, a large reduction of false positives was achieved when the data were re-integrated and re-normalized. The RSD distribution stayed fairly constant after normalization except for UPLC 30 minutes, where a shift towards smaller RSD values was noted (Table 2). This might reflect better chromatographic peak shapes, which potentially could be integrated more coherently and subsequently would normalize better. In Figure 3, a representative example of the effect of initial normalization on the XCMS output is shown for a feature that represents a compound that was changed between (A) and (B). The feature M430T608 (mifepristone) becomes significant only after normalization. An interesting aspect of our data processing protocol is that the known differences first detected as features by XCMS remain throughout the procedure, whereas the number of false positives is reduced (Table 3). The initial increase of significant features is reduced when they are subjected to re-integration in the original data. After re-normalization against the average integrated area for the same six standards previously used, there is again an increase in the number of false positives. But in all cases, more of the spiked differences are found after re-normalization. The largest number of spiked compounds paired with the smallest number of false positives is found with UPLC combined with a long gradient. This observation makes UPLC a good choice for metabolomics.
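The normalization step can be expressed compactly. The sketch below is a minimal illustration under stated assumptions: the study used six constant standards, whereas three hypothetical standards (`std1` to `std3`) and two injections are used here for brevity, and all area values are invented.

```python
def normalize(areas, standards):
    """Divide every feature area by the per-injection mean area of the standards.

    areas: dict mapping feature name -> list of areas, one per injection.
    standards: names of the features kept constant between the sample classes.
    """
    n_injections = len(next(iter(areas.values())))
    # One scalar per injection: average area of the constant standards.
    factors = [
        sum(areas[s][j] for s in standards) / len(standards)
        for j in range(n_injections)
    ]
    return {
        name: [value / factors[j] for j, value in enumerate(values)]
        for name, values in areas.items()
    }


# Hypothetical areas; the second injection has a 20% higher overall response.
areas = {
    "std1": [1000.0, 1200.0],
    "std2": [2000.0, 2400.0],
    "std3": [3000.0, 3600.0],
    "M430T608": [500.0, 600.0],
}

normalized = normalize(areas, ["std1", "std2", "std3"])
print(normalized["M430T608"])  # the response drift between injections is removed
```

The output has the same shape as the input matrix, so the same t-test and sorting can be applied to it directly.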

Figure 3.


Effect of normalization. The top graph displays areas for feature M430T608 (mifepristone) as integrated by XCMS. Letters A and B and the numbers correspond to the injection replicates of the respective spiking mixture. The bottom graph shows the normalized area values for the same feature.

Table 3.

Number of features above significance cutoff (p<0.05) as a function of normalization and re-integration

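Separation and gradient (minutes) | XCMS before normalization: Total | XCMS before normalization: Spiked | XCMS after normalization: Total | XCMS after normalization: Spiked | Integrated before normalization: Total | Integrated before normalization: Spiked | Integrated after normalization: Total | Integrated after normalization: Spiked | False positives
UPLC-30 | 41 | 6 | 70 | 8 | 9 | 6 | 17 | 8 | 9
HPLC-30 | 19 | 4 | 26 | 6 | 4 | 8 | 13 | 5 | 8
UPLC-10 | 77 | 6 | 183 | 8 | 15 | 7 | 31 | 8 | 23
HPLC-10 | 222 | 7 | 138 | 8 | 21 | 7 | 46 | 8 | 38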
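The false-positive column of Table 3 is consistent with the total number of significant features after re-integration and re-normalization minus the number of spiked features among them. The short check below reproduces that arithmetic from the tabulated values; the interpretation of the column is an inference from the numbers, and the variable names are illustrative.

```python
# (total significant, spiked significant) after re-integration and re-normalization,
# copied from Table 3.
integrated_after_norm = {
    "UPLC-30": (17, 8),
    "HPLC-30": (13, 5),
    "UPLC-10": (31, 8),
    "HPLC-10": (46, 8),
}

# False positives taken as the significant features that are not spiked differences.
false_positives = {
    setup: total - spiked for setup, (total, spiked) in integrated_after_norm.items()
}

for setup, (total, spiked) in integrated_after_norm.items():
    share = 100.0 * spiked / total
    print(f"{setup}: {false_positives[setup]} false positives, "
          f"{share:.0f}% of significant features are spiked differences")
```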

3.4 Identified differences

We performed a detailed trace of the fate of the spiked compounds in the UPLC 30 minute experiment (Table 4). From these data, we can conclude that XCMS preserves the ratio aspect of the data just as well as the integration of raw data with the manufacturer's software does. The final area ratios reported represent the actual differences between (A) and (B) well. The final result of the entire processing protocol is displayed in Table 5. The fact that tacrine has the lowest p-value although its concentration was only reduced by 20%, compared to prednisone and propafenone which were completely removed, illustrates that factors such as chromatography and S/N (Figure 2A) will be reflected as smaller standard deviations and hence lower p-values. When ranked according to p-value, nine of the twelve top features identified as differences between (A) and (B) relate to spiked alterations. The feature M324T615 is a source fragment from propafenone, and the feature M429T1665 was found to be an actual difference with all separation setups. The feature M429T1665 is not a spiked component and does not originate from column bleed, nor is it a contamination from any of the standards, as it was not detected when standards were run individually. The spiked but unchanged compounds atenolol and verapamil also appeared as false positives. The remaining hits in Table 5, unknowns 2–6 (listed there as false positives 1–5), were manually inspected in the original data. They all represented real chromatographic peaks. If the significance criterion is set to an even lower p-value, some of these would disappear together with some of the spiked compounds present only at a 20% concentration difference. This reflects a general problem with detecting differences present at a relative level comparable to the precision limit of the analytical protocol. An estimation of the analytical precision is shown in Table 2. A majority of the features show an RSD in the range 5 to 25%, and this will ultimately limit the potential to discover small differences.
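The way analytical precision limits the detection of small differences can be illustrated with a rough calculation. The sketch below is a simplification and not part of the study: it assumes both sample classes have the same standard deviation, expressed as an RSD relative to the mean of class A, and five replicates per class. Under that assumption the expected t statistic for a relative difference d is d / (RSD · sqrt(2/n)). The critical value 2.306 is the two-sided 5% point of the t distribution with 8 degrees of freedom.

```python
import math

T_CRITICAL_DF8 = 2.306  # two-sided p = 0.05, 8 degrees of freedom


def expected_t(rel_diff, rsd, n=5):
    """Approximate t statistic for a relative mean difference.

    Assumes equal standard deviation (rsd, as a fraction of the class A mean)
    in both classes and n replicates per class.
    """
    return rel_diff / (rsd * math.sqrt(2.0 / n))


for rsd in (0.05, 0.15, 0.25):
    t = expected_t(0.20, rsd)
    verdict = "detectable" if t > T_CRITICAL_DF8 else "likely missed"
    print(f"20% difference at RSD {rsd:.0%}: expected t = {t:.2f} ({verdict})")
```

Under these assumptions a 20% difference is comfortably detected at 5% RSD but falls below the significance threshold at 25% RSD, in line with the observation that some 20% differences sit close to the cutoff.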

Table 4.

Concentration and area ratios for spiked features using the UPLC 30 minute gradient.

Compound | Feature name (m/z - seconds) | [B]/[A] | XCMS raw area B/A | Integration raw area B/A | Full procedure area B/A
Tacrine | M199T261 | 0.8 | 0.591 | 0.512 | 0.536
Prednisone | M381T508 | 0 | 0.19 | 0.037 | 0.005
Propafenone | M342T615 | 0 | 0.079 | 0.004 | 0.003
Bis Tacrine | M247T580 | 1.5 | 1.732 | 1.847 | 1.921
Haloperidol | M376T550 | 1.5 | 1.249 | 1.341 | 1.397
Nicotinic acid | M124T39 | 0.5 | 0.622 | 0.589 | 0.615
Mifepristone | M430T608 | 1.2 | 1.108 | 1.129 | 1.183
Sulpiride | M342T131 | 0.8 | 0.848 | 0.843 | 0.878
Tenoxicam | n.d. (A) | 0.5 | n.d. (A) | 0.893 | n.d. (A)
Piroxicam | M332T550 | 1.2 | 1.074 | 1.073 | no diff. (B)
Phenylalanine 2H5 | M171T76 | 1 | 0.987 | 0.966 | no diff. (C)
Atenolol | M267T117 | 1 | 0.841 | 0.815 | no diff. (C)
Caffeine | M195T193 | 1 | 1.025 | 1.002 | no diff. (C)
Tetracycline | M445T274 | 1 | 0.968 | 0.938 | no diff. (C)
Chlortetracycline | M479T378 | 1 | 1.041 | 1.072 | no diff. (C)
Propranolol | M260T471 | 1 | 0.906 | 0.886 | no diff. (C)
Verapamil | M455T633 | 1 | 1.036 | 1.054 | no diff. (C)
Ketoprofen | n.d. (A) | 1 | n.d. (A) | 0.951 | no diff. (C)
Oleamide | M282T1419 | 1 | 1.017 | 1.012 | no diff. (C)

(A) Feature not detected by XCMS

(B) Feature detected by XCMS but no significant difference (p>0.05)

(C) Feature spiked at the same concentration in A and B, not detected as a difference

Table 5.

Resulting table displaying differences significant at p<0.05 for the UPLC 30 minute gradient.

Feature name (m/z - seconds) | p-value | Compound
M199T261 | 1.35 × 10^−7 | Tacrine
M381T508 | 2.11 × 10^−7 | Prednisone
M342T615 | 3.59 × 10^−7 | Propafenone
M247T580 | 5.09 × 10^−7 | Bis tacrine
M429T1665 | 6.94 × 10^−6 | Unknown 1
M324T615 | 8.24 × 10^−6 | Propafenone (A)
M267T117 | 1.9 × 10^−5 | Atenolol
M376T550 | 1.93 × 10^−5 | Haloperidol
M124T39 | 5.88 × 10^−5 | Nicotinic acid
M455T633 | 0.000502 | Verapamil
M430T608 | 0.001159 | Mifepristone
M342T131 | 0.001649 | Sulpiride
M452T608 | 0.003294 | False positive 1
M420T110 | 0.010193 | False positive 2
M344T134 | 0.017685 | False positive 3
M671T1636 | 0.020916 | False positive 4
M217T31 | 0.044037 | False positive 5

(A) In-source fragment of propafenone

4. Concluding remarks

With only two sample classes to compare, we performed all our calculations in Microsoft Excel. It is important to emphasize, however, that with more sample classes the initial identification of significant features is easier to perform with software that supports other types of data mining, such as partial least squares projection to latent structures discriminant analysis (PLS-DA). The output from XCMS is very well suited for this type of computation since all variables are aligned over the observations. It is evident that UPLC is a good tool for LC-MS based metabolomics. More features were detected at a higher S/N, which will provide a better foundation for peak finding, integration and further statistical evaluation. We chose to use XCMS as a tool for the generation of a list of potentially significant differences. The original data were subsequently re-integrated with the manufacturer's integration software. By doing this, the final list will reflect differences that are representative of the raw data and also provide a good starting point for manual inspection of the data. A very high degree of flexibility for data processing is offered by XCMS. We chose to normalize against a set of internal standards, but normalization could also be done with whichever technique is preferred or demanded by the experimental conditions. Finally, the proposed metabolomics analysis protocol using XCMS and UPLC-MS detected 8 out of 10 spiked differences, some differing in concentration by only 20%. Moreover, few false positives were identified, providing us with a list representing the compounds that are worth further identification effort.

References

  1. Andreev VP, Rejtar T, Chen HS, Moskovets EV, Ivanov AR, Karger BL. A universal denoising and peak picking algorithm for LC-MS based on matched filtration in the chromatographic time domain. Anal Chem. 2003;75:6314–6326. doi: 10.1021/ac0301806.
  2. Boernsen KO, Gatzek S, Imbert G. Controlled protein precipitation in combination with chip-based nanospray infusion mass spectrometry. An approach for metabolomics profiling of plasma. Anal Chem. 2005;77:7255–7264. doi: 10.1021/ac0508604.
  3. Bylund D, Danielsson R, Malmquist G, Markides KE. Chromatographic alignment by warping and dynamic programming as a pre-processing tool for PARAFAC modelling of liquid chromatography-mass spectrometry data. J Chromatogr A. 2002;961:237–244. doi: 10.1016/s0021-9673(02)00588-5.
  4. Davies T. The new automated mass spectrometry Deconvolution and Identification System (AMDIS) Spectroscopy Europe. 1998:22–26.
  5. Duran AL, Yang J, Wang LJ, Sumner LW. Metabolomics spectral formatting, alignment and conversion tools (MSFACTs) Bioinformatics. 2003;19:2283–2293. doi: 10.1093/bioinformatics/btg315.
  6. Fiehn O, Kopka J, Dormann P, Altmann T, Trethewey RN, Willmitzer L. Metabolite profiling for plant functional genomics. Nat Biotechnol. 2000;18:1157–1161. doi: 10.1038/81137.
  7. Idborg H, Zamani L, Edlund PO, Schuppe-Koistinen I, Jacobsson SP. Metabolic fingerprinting of rat urine by LC/MS Part 2. Data pretreatment methods for handling of complex data. J Chromatogr B. 2005;828:14–20. doi: 10.1016/j.jchromb.2005.07.049.
  8. Idborg-Bjorkman H, Edlund PO, Kvalheim OM, Schuppe-Koistinen I, Jacobsson SP. Screening of biomarkers in rat urine using LC/electrospray ionization-MS and two-way data analysis. Anal Chem. 2003;75:4784–4792. doi: 10.1021/ac0341618.
  9. Johnson KJ, Wright BW, Jarman KH, Synovec RE. High-speed peak matching algorithm for retention time alignment of gas chromatographic data for chemometric analysis. J Chromatogr A. 2003;996:141–155. doi: 10.1016/s0021-9673(03)00616-2.
  10. Jonsson P, Bruce SJ, Moritz T, et al. Extraction, interpretation and validation of information for comparing samples in metabolic LC/MS data sets. Analyst. 2005;130:701–707. doi: 10.1039/b501890k.
  11. Jonsson P, Gullberg J, Nordstrom A, et al. A strategy for identifying differences in large series of metabolomic samples analyzed by GC/MS. Anal Chem. 2004;76:1738–1745. doi: 10.1021/ac0352427.
  12. Katajamaa M, Oresic M. Processing methods for differential analysis of LC/MS profile data. Bmc Bioinformatics. 2005;6:179–190. doi: 10.1186/1471-2105-6-179.
  13. Nielsen NPV, Carstensen JM, Smedsgaard J. Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping. J Chromatogr A. 1998;805:17–35.
  14. Plumb R, Castro-Perez J, Granger J, Beattie I, Joncour K, Wright A. Ultra-performance liquid chromatography coupled to quadrupole-orthogonal time-of-flight mass spectrometry. Rapid Commun Mass Sp. 2004;18:2331–2337. doi: 10.1002/rcm.1627.
  15. Plumb RS, Stumpf CL, Gorenstein MV, et al. Metabonomics: the use of electrospray mass spectrometry coupled to reversed-phase liquid chromatography shows potential for the screening of rat urine in drug development. Rapid Commun Mass Sp. 2002;16:1991–1996. doi: 10.1002/rcm.813.
  16. Radulovic D, Jelveh S, Ryu S, et al. Informatics platform for global proteomic profiling and biomarker discovery using liquid chromatography-tandem mass spectrometry. Mol Cell Proteomics. 2004;3:984–997. doi: 10.1074/mcp.M400061-MCP200.
  17. Shen HL, Grung B, Kvalheim OM, Eide I. Automated curve resolution applied to data from multi-detection instruments. Anal Chim Acta. 2001;446:313–328.
  18. Shen YF, Zhang R, Moore RJ, et al. Automated 20 kpsi RPLC-MS and MS/MS with chromatographic peak capacities of 1000–1500 and capabilities in proteomics and metabolomics. Anal Chem. 2005;77:3090–3100. doi: 10.1021/ac0483062.
  19. Shurubor YI, Paolucci U, Krasnikov BF, Matson WR, Kristal BS. Analytical precision, biological variation, and mathematical normalization in high data density metabolomics. Metabolomics. 2005;1:75–85.
  20. Smith CA, Want EJ, O’Maille G, Abagyan R, Siuzdak G. XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification. Anal Chem. 2005;78:779–787. doi: 10.1021/ac051437y.
  21. Stein SE. An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data. J Am Soc Mass Spectr. 1999;10:770–781.
  22. Villas-Boas SG, Mas S, Akesson M, Smedsgaard J, Nielsen J. Mass spectrometry in metabolome analysis. Mass Spectrom Rev. 2005;24:613–646. doi: 10.1002/mas.20032.
  23. Vorst O, Vos CHRd, Lommen A, et al. A non-directed approach to the differential analysis of multiple LC–MS-derived metabolic profiles. Metabolomics. 2005;1:169–180.
  24. Wang WX, Zhou HH, Lin H, et al. Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Anal Chem. 2003;75:4818–4826. doi: 10.1021/ac026468x.
  25. Want EJ, Cravatt BF, Siuzdak G. The expanding role of mass spectrometry in metabolite profiling and characterization. Chembiochem. 2005;6:1941–1951. doi: 10.1002/cbic.200500151.
  26. Want EJ, O’Maille G, Smith CA, Brandon TR, Uritboonthai W, Siuzdak G. Solvent Dependent Metabolite Distribution, Clustering and Protein Extraction for Serum Profiling with Mass Spectrometry. Anal Chem. 2005;78:743–752. doi: 10.1021/ac051312t.
  27. Wilson ID, Nicholson JK, Castro-Perez J, et al. High resolution “Ultra performance” liquid chromatography coupled to oa-TOF mass spectrometry as a tool for differential metabolic pathway profiling in functional genomic studies. J Proteome Res. 2005;4:591–598. doi: 10.1021/pr049769r.
  28. Windig W, Phalp JM, Payne AW. A noise and background reduction method for component detection in liquid chromatography mass spectrometry. Anal Chem. 1996;68:3602–3606.
