2D NMR Metabonomic Analysis: A Novel Method for Automated Peak Alignment

Ming Zheng; Peng Lu; Yanzhou Liu; Joseph Pease; Jonathan Usuka; Guochun Liao; Gary Peltz

doi:10.1093/bioinformatics/btm427

Author manuscript; available in PMC: 2008 Jan 4.

Published in final edited form as: Bioinformatics. 2007 Sep 10;23(21):2926–2933. doi: 10.1093/bioinformatics/btm427


Ming Zheng ¹, Peng Lu ¹, Yanzhou Liu ², Joseph Pease ², Jonathan Usuka ¹, Guochun Liao ¹,*, Gary Peltz ¹

PMCID: PMC2174787 NIHMSID: NIHMS31760 PMID: 17846038

Abstract

Motivation

Comparative metabolic profiling by nuclear magnetic resonance (NMR) is showing increasing promise for identifying inter-individual differences in drug response. Two-dimensional (2D) ¹H-¹³C NMR can reduce spectral overlap, a common problem of 1D ¹H NMR. However, the peak alignment tools for 1D NMR spectra are not well suited for 2D NMR. An automated and statistically robust method for aligning 2D NMR peaks is required to enable comparative metabonomic analysis using 2D NMR.

Results

A novel statistical method was developed to align NMR peaks that represent the same chemical groups across multiple 2D NMR spectra. The degree of local pattern match among peaks in different spectra is assessed using a similarity measure, and a heuristic algorithm maximizes the similarity measure for peaks across the whole spectrum. This peak alignment method was used to align peaks in 2D NMR spectra of endogenous metabolites in liver extracts obtained from four inbred mouse strains in the study of acetaminophen-induced liver toxicity. This automated alignment method was validated by manual examination of the top fifty peaks as ranked by signal intensity. Manual inspection of 1872 peaks in 39 different spectra demonstrated that the automated algorithm correctly aligned 1810 (96.7%) peaks.

Availability

Algorithm is available upon request.

Contact

guochun.liao@roche.com

1 INTRODUCTION

Nuclear magnetic resonance (NMR) spectroscopy is a well-established method for the analysis of complex biological samples. Profiling of endogenous metabolites (metabonomics) by NMR is showing increasing promise for predicting an individual's response to drugs. For example, comparative metabolic profiling by NMR was used to predict the response to doxorubicin and interleukin-2 treatment (Ewens et al., 2006). Many metabonomic studies to date use one-dimensional (1D) ¹H NMR analysis because the data are relatively quick and easy to obtain and the analysis is straightforward. However, 1D ¹H NMR spectra of complex biological samples typically have high spectral overlap, which significantly limits the number of metabolites that can be uniquely identified and quantified. Recently, 2D ¹H-¹³C HSQC NMR was used to analyze global metabolic changes in yeast (Lu et al., 2007). Because almost all endogenous metabolites contain carbon, the addition of the second (¹³C) NMR dimension improves the resolution and enables the accurate identification of a large number of metabolites that are not resolvable in a standard 1D ¹H NMR spectrum. An example of the improved resolution is shown in figure 1.

Fig. 1. Relative to 1D NMR spectra, 2D NMR spectra have increased ability to resolve individual metabolites. The horizontal axis denotes the ¹H dimension, and the vertical axis in the 2D NMR spectra denotes the ¹³C dimension. The unit for both axes is parts per million (ppm). (A) and (B) show whole 1D and 2D NMR spectra generated from analysis of the same sample. (C) presents an enlarged view of the boxed area in (A), and (D) is an enlarged view of the boxed area in (B). Both (C) and (D) show the areas of the spectra within the same ¹H range, 2.9 to 3.4 ppm. It is apparent that the single peak at 3.27 ppm on the 1D ¹H NMR spectrum consists of many peaks in the ¹³C dimension when analyzed by 2D NMR. The 1D NMR spectrum was visualized using Topspin software (http://www.bruker-biospin.com/topspin.html), while the 2D NMR spectrum was visualized using the Sparky Assignment and Integration Software package (http://www.cgl.ucsf.edu/home/sparky).

However, metabonomic analyses require comparison of metabolic profiles obtained from multiple replicates of samples exposed to different experimental conditions. This requires aligning NMR cross-peaks that represent the same chemical groups across multiple NMR spectra. The complexity of 2D ¹H-¹³C HSQC NMR spectra presents a number of challenges for comparative spectral analysis. First, the position of a cross-peak representing the same chemical group is not fixed across multiple 2D ¹H-¹³C NMR spectra. There is always a slight shift in the position of a cross-peak because the experimental conditions cannot be replicated exactly. Subtle differences in pH, temperature or ionic strength can all cause minor shifts in cross-peak location. Second, shifts in peak position are not systematic. The direction and extent of the shift are not consistent throughout a spectrum; they differ between different areas of a spectrum. Therefore, a global correction for an individual spectrum is not helpful. Finally, not all metabolites are present in all samples analyzed for a given metabonomics study. Therefore, the ability to distinguish a peak with an insignificant signal from a neighboring peak representing a different metabolite is critical for the alignment of 2D NMR spectra.

A number of methods have been used for ¹H 1D NMR spectrum alignment, including spectral binning (Anthony et al., 1994), kernel smoothing (Smith et al., 2006) and targeted profiling (Weljie et al., 2006). In the spectral binning method, the entire 1D spectrum is divided into small bins of fixed or variable size, and then the total intensity within each bin is further analyzed (Anthony et al., 1994). Although spectral binning can be applied directly to 2D NMR data, determination of meaningful bin boundaries is very difficult due to heterogeneity within different regions of a spectrum. The kernel smoothing method estimates the overall distribution of all peaks, and then overlapping intervals are used for peak matching (Smith et al., 2006). Recently, targeted profiling was proposed to identify and quantify unknown metabolic peaks within uncharacterized spectra by searching a database of pure compound spectra (Weljie et al., 2006). Because of the increased complexity and insufficient prior knowledge of individual NMR resonances within 2D NMR spectra, kernel smoothing and targeted profiling methods have limited applications for analysis of 2D NMR spectra. The problem of aligning 2D NMR spectra is similar to the 'point match problem' encountered in the field of computer vision. A robust point matching method was developed to align features in two visual images, which resemble the peaks within 2D NMR spectra. This method uses a distance similarity measure (Chui et al., 2000; Gold et al., 1996; Rangarajan et al., 1997; Geiger et al., 1991) to align the visual features. However, this method cannot be applied directly to 2D NMR spectrum alignment. It works well when there are fewer than 100 features in each image, but 2D NMR spectra usually contain 300–500 peaks. In addition, this method can only analyze two different images, and modifications are required to enable alignment of multiple images.

Here, we present a simple and effective statistical method for alignment of peaks within 2D NMR spectra. Since peaks representing different chemical groups often exhibit a similar shift pattern within a small region of a spectrum, we designed a similarity measure that assesses the degree of local pattern match. Then, we developed a heuristic algorithm to maximize the similarity measure across the whole spectrum, which enables alignment of peaks representing the same chemical groups in different spectra. This method was applied to ¹H-¹³C 2D NMR spectra that were produced as part of a research effort to identify genetic factors regulating susceptibility to acetaminophen-induced liver toxicity.

2 DATA AND METHODS

Overview of metabonomic analysis of acetaminophen-induced liver toxicity

Acetaminophen is a safe and effective drug when administered appropriately; however, an overdose can cause liver damage by inducing localized centrilobular cell death (James et al., 2003; Bessems et al., 2001). Inbred mouse strains show differential susceptibility to acetaminophen-induced liver damage. Identifying the factors that contribute to resistance to acetaminophen-induced liver toxicity can lead to novel prevention or treatment strategies. The abundance of soluble metabolites in liver extracts from four inbred mouse strains was compared using ¹H-¹³C 2D NMR analysis to identify metabolites with a unique abundance pattern in the strain (SJL) that was resistant relative to the 3 strains (DBA/2, C57BL/6 and SM/J) that were sensitive to the drug-induced liver toxicity.

Animal husbandry and drug treatment

All animal experiments were performed in the Laboratory Animal Facilities at Roche Palo Alto using protocols that were approved by the Roche Institutional Animal Care and Use Committee. Male mice (7−8 weeks old) of the following strains were obtained from Jackson Laboratory (Bar Harbor, ME) and acclimatized for an additional week before use: SJL, DBA/2J, C57BL/6J, and SM/J. The mice were housed in a pathogen-free environment and provided food and water ad libitum with a 12 h:12 h light:dark cycle until experimental use. Before each experiment, food was withheld from the animals overnight (>16 h) to uniformly deplete hepatic glutathione stores (Bartolone et al., 1987). All mice were administered a single 300 mg/kg intraperitoneal dose of freshly prepared acetaminophen (Voigt Global Distribution INC, www.VGDLLC.com) suspended in PBS (pH 7.4), and were allowed free access to food and water after treatment. The mice were euthanized by CO₂ inhalation to collect blood samples and liver tissues at 0, 3 and 6 h after dosing.

Metabolite extraction and NMR data acquisition

Frozen liver tissues (500 mg) were pulverized with liquid nitrogen and immediately plunged into 15 ml of a solution of 67% MeOH/33% H₂O at −20 °C. Tissues were lysed by freezing and thawing three times, thoroughly mixed, and then centrifuged at 12,000 × g for 30 min at 4 °C. The supernatants were dried by speed-vacuum and re-suspended in 500 µl D₂O (Cambridge Isotope Laboratories, Inc.). Samples were then centrifugally filtered through 10-kDa cutoff filters (Microcon YM-10, Millipore) to remove precipitated proteins. The filtrate was lyophilized and dissolved in 200 µl D₂O containing 1 mM sodium-3-(tri-methylsilyl)-2,2,3,3-tetradeuteriopropionate (TSP; Sigma-Aldrich), which served as an internal standard for NMR analysis.

NMR spectra were recorded at 300 K on a Bruker Avance 600 MHz spectrometer operating at a ¹H frequency of 599.99 MHz and a ¹³C frequency of 150.87 MHz using a 3 mm Nalorac microprobe with a z-axis pulsed-field gradient. Resonances were assigned using 1D proton spectra, 2D proton correlation spectroscopy (COSY), and 2D ¹H-¹³C single-bond correlated HSQC spectra. Peaks used in the quantitative metabolite analyses were picked from 2D ¹H-¹³C HSQC spectra acquired using z-axis pulsed-field gradients for coherence selection. Spectra were acquired using 16 to 64 scans per FID, 1024 and 256 points for the ¹H and ¹³C dimensions, respectively, and total spectrum acquisition times of 2.75 to 11 hours. NMR assignments were confirmed by acquiring the spectra of samples spiked with purified compounds. 2D NMR spectra were processed with Bruker Topspin software and interpreted using the Sparky Assignment and Integration Software package (http://www.cgl.ucsf.edu/home/sparky).

Description of Peak Alignment Method

In order to align peaks within 2D NMR spectra, we designed an objective function to assess the similarity between peaks from different spectra, and then used a greedy algorithm to maximize the objective function. First, we define a peak as a local maximum within a predefined range (e.g. 0.03 ppm in the ¹H dimension and 0.35 ppm in the ¹³C dimension) in a spectrum. An alignment proposal is a proposal to assign correspondence to every possible peak pair for a set of spectra: if a pair of peaks is proposed to represent the same chemical group, this pair of peaks is assigned "corresponding". An alignment proposal follows two rules:

(1) "Corresponding" assignment is transitive: if peak 1 and peak 2 are corresponding and peak 1 and peak 3 are corresponding, then peak 2 and peak 3 are corresponding.
(2) If a pair of peaks consists of two different peaks from the same spectrum, then they are assigned non-corresponding, because within a single spectrum different peaks are considered to represent different chemical groups.

Rules (1) and (2) imply that no peak in a spectrum can be assigned "corresponding" to two peaks in another spectrum. A peak is always corresponding to itself, and if peak 1 is corresponding to peak 2, then peak 2 is corresponding to peak 1. Therefore, mathematically this corresponding assignment can be viewed as an equivalence relation.

Let us define a peak group as the set of all peaks assigned to the same chemical group by an alignment proposal. For example, for the four spectra shown in figure 3A, if an alignment proposal assigns peak #1 of each spectrum to the same chemical group, then these four peaks, each from a different spectrum, make one peak group. A peak group satisfies the following:

(a) If two peaks are assigned "corresponding", they are in the same peak group.
(b) Any two peaks in the same peak group are assigned "corresponding".
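The two rules and the peak-group definition can be checked mechanically. The following is a minimal sketch, not the authors' implementation; it assumes (purely for illustration) that a peak is identified by a `(spectrum_id, peak_id)` tuple and that an alignment proposal is stored as a list of peak groups. The function name `is_valid_proposal` is hypothetical.

```python
from collections import Counter


def is_valid_proposal(peak_groups):
    """Return True if the peak groups satisfy the two alignment rules.

    Each group is a set of (spectrum_id, peak_id) tuples; this encoding
    is an assumption made for the example.
    """
    seen = set()
    for group in peak_groups:
        # Rule (2): two different peaks from one spectrum never correspond,
        # so a group may contain at most one peak per spectrum.
        per_spectrum = Counter(spectrum for spectrum, _ in group)
        if any(count > 1 for count in per_spectrum.values()):
            return False
        # Rule (1): correspondence is an equivalence relation, so the
        # groups must be disjoint (each peak belongs to exactly one group).
        for peak in group:
            if peak in seen:
                return False
            seen.add(peak)
    return True


good = [{("SJL", 1), ("DBA", 1)}, {("SJL", 2), ("DBA", 3)}]
bad_same_spectrum = [{("SJL", 1), ("SJL", 2)}]
bad_overlap = [{("SJL", 1), ("DBA", 1)}, {("SJL", 1), ("DBA", 2)}]
```

Storing a proposal as disjoint groups makes transitivity hold by construction, which is why the later steps only ever move peaks between groups.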

Fig. 3. Examples of peaks that were properly aligned (A), had minor errors (B), or were misaligned (C) by the automated algorithm. Each image in a column shows the same region of a 2D NMR spectrum of metabolites in liver obtained from each of the four indicated mouse strains. The crossed lines in the spectra in each column indicate the exact same ¹³C and ¹H coordinates. The numbers inside and outside the parentheses are the chemical group identities assigned to the peaks by the automated and manual alignment methods, respectively. For a given peak, different numbers inside and outside the parentheses indicate a peak misalignment. (A) Example of peaks that are properly aligned by the automated method. The identity of chemical group 1 was determined to be glutathione according to HSQC, COSY and TOCSY spectra. This assignment was subsequently confirmed by acquiring the spectra of samples that were spiked with purified glutathione. (B) Example of automated peak alignments with minor errors. A peak located near the cross had a relatively large shift in the SMJ spectrum. This caused the automated alignment algorithm to select a nearby peak, resulting in a misalignment of this peak. (C) Example of a significant error by the automated peak alignment algorithm. The peaks representing three chemical groups (3, 4 and 5) within the area near the cross are shown. In the SJL and C57B6 spectra, all three peaks are present and correctly aligned. However, in the DBA and SMJ spectra, the peaks representing chemical group 4 are missing. This caused the algorithm to align the peaks representing chemical group 5 in the DBA and SMJ spectra with the peaks representing chemical group 4 in the SJL and C57B6 spectra.

2.1 Objective function and similarity measure

As mentioned in the introduction, shifts in peak position in a 2D NMR spectrum are not systematic. However, we observed that within a small area of a 2D NMR spectrum, shifts in peak position are similar: the peaks within a small area usually shift in similar directions and to a similar extent. As a result, the relative positions of peaks within a small area are not changed by peak shifting, and the local spatial patterns of peaks usually match across different spectra. We propose a similarity measure to quantify how well the local patterns match and a new algorithm to maximize the similarity measure. This allows us to assign "corresponding" to peaks across multiple spectra by local pattern matching.

We designed an objective function for each alignment proposal as

$$F(\text{alignment proposal}) = \sum_{\text{all peak groups in the proposal}} f(\text{peak group}), \tag{1}$$

where

$$f(\text{peak group}) = \sum_{\text{all pairs } \langle i, j \rangle \text{ in the group}} \mathrm{sim}({\vec{p}}_{i}, {\vec{p}}_{j}). \tag{2}$$

This objective function is the sum of the similarity measure over the peak pairs in all peak groups. Here, ${\vec{p}}_{i}$ and ${\vec{p}}_{j}$ are the $i$th peak and the $j$th peak, respectively; sim(${\vec{p}}_{i}$, ${\vec{p}}_{j}$) is the similarity measure between ${\vec{p}}_{i}$ and ${\vec{p}}_{j}$; ${\vec{p}}_{i}$ = (C_i, H_i), where C_i and H_i are the coordinates of the $i$th peak in the carbon and proton dimensions, respectively; and f(peak group) is the total similarity measure for a peak group.
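Equations 1 and 2 translate almost directly into code. The sketch below is illustrative only: peaks are assumed to be `(C, H)` tuples, a proposal is assumed to be a list of groups, and `toy_sim` is a stand-in similarity (negative squared distance) rather than the measure defined in equation 3.

```python
from itertools import combinations


def group_score(group, sim):
    """Equation 2: sum of sim over all unordered peak pairs in one group."""
    return sum(sim(p, q) for p, q in combinations(group, 2))


def objective(proposal, sim):
    """Equation 1: sum of the group scores over all peak groups."""
    return sum(group_score(group, sim) for group in proposal)


def toy_sim(p, q):
    """Stand-in similarity (assumption): negative squared distance."""
    return -((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


proposal = [
    [(30.0, 2.0), (30.0, 2.0), (31.0, 2.0)],  # three peaks, three pairs
    [(50.0, 3.0), (50.0, 4.0)],               # two peaks, one pair
]
total = objective(proposal, toy_sim)
```

In this toy proposal the first group contributes 0 − 1 − 1 and the second contributes −1, so `total` is −3.0.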

There are three components in the proposed similarity measure. The first component is the weighted Euclidean distance between peaks, which is used to assess the local pattern match. Because the scales of the carbon and proton dimensions are different, the two coordinates of the peaks need to be rescaled to balance their contributions to the similarity measure. We found that the peaks corresponding to the same chemical group in different spectra often had similar 2D shapes. Therefore, for each peak we defined a neighborhood region and used the neighborhood correlation as the second component to measure the shape similarity between a pair of peaks. Because replicate spectra are measured in many studies, we also considered the similarity of signal intensity between replicates and used it as the third component. We therefore designed the following similarity measure for alignment:

$$\mathrm{sim}({\vec{p}}_{i}, {\vec{p}}_{j}) = \begin{cases} -\infty, & \text{if } {\vec{p}}_{i} \text{ and } {\vec{p}}_{j} \text{ are far away} \\ s_{1} + s_{2} + s_{3}, & \text{otherwise} \end{cases} \tag{3}$$

where $s_{1} = -\mathrm{dist}({\vec{p}}_{i}, {\vec{p}}_{j})^{2} = -\{(C_{i} - C_{j})^{2} + \lambda \cdot (H_{i} - H_{j})^{2}\}$ is the negative squared weighted Euclidean distance between ${\vec{p}}_{i}$ and ${\vec{p}}_{j}$; s₂ = γ · (correlation – 1) is the weighted Pearson's correlation between the defined neighborhoods of ${\vec{p}}_{i}$ and ${\vec{p}}_{j}$ minus one; and s₃ = –η · (I_i – I_j)² is the negative weighted squared difference between the log intensities I_i and I_j of ${\vec{p}}_{i}$ and ${\vec{p}}_{j}$. γ is the weight for the correlation term and η is the weight for the log-intensity difference term. The form correlation – 1 is chosen because (a) when the shapes are perfectly matched, this term becomes zero, i.e. there is no penalty for the shape difference, which conceptually coincides with the meaning of the first term s₁, and (b) correlation – 1 is always non-positive, as are s₁ and s₃.

Although peaks in a 2D NMR spectrum can shift, the extent of the shift is always limited. Therefore, we defined upper bounds for the coordinate differences between peak pairs in the same peak group. When the distance between a pair of peaks exceeds the upper bound, the pair is considered "far away" and the similarity measure is assigned a score of negative infinity, so that the alignment proposal is not accepted. The upper bounds for the carbon and proton dimensions were estimated by manually aligning two reference peak groups representing two known metabolites that are next to each other in the 2D NMR spectrum. We assumed that the carbon coordinates of the peaks in one peak group were independently drawn from a uniform distribution on [C_true – R_C, C_true + R_C], where C_true is the actual coordinate in the carbon dimension for the corresponding metabolite and R_C is the maximum range that a peak can shift in the carbon dimension, for which an unbiased estimate can be obtained from the reference data. The upper bound B_C for the carbon dimension is then defined as 2R_C. The upper bound B_H for the proton dimension is calculated in the same way. The normalizing constant λ in s₁ (equation 3) is estimated as (B_C/B_H)².
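To make the three terms of equation 3 concrete, here is a hedged Python sketch. The dictionary keys (`C`, `H`, `log_intensity`, `neighborhood`), the flat-list encoding of a peak's neighborhood, and the function names are all assumptions for illustration; the source does not specify how the neighborhood region is stored.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def similarity(p, q, b_c, b_h, gamma, eta):
    """Equation 3 for two peaks given as dicts (hypothetical layout)."""
    d_c = p["C"] - q["C"]
    d_h = p["H"] - q["H"]
    # "Far away": a coordinate difference exceeds its upper bound.
    if abs(d_c) > b_c or abs(d_h) > b_h:
        return -math.inf
    lam = (b_c / b_h) ** 2                      # rescales the proton axis
    s1 = -(d_c ** 2 + lam * d_h ** 2)           # position term
    s2 = gamma * (pearson(p["neighborhood"], q["neighborhood"]) - 1.0)
    s3 = -eta * (p["log_intensity"] - q["log_intensity"]) ** 2
    return s1 + s2 + s3


a = {"C": 34.0, "H": 2.40, "log_intensity": -6.0, "neighborhood": [1.0, 3.0, 2.0, 0.5]}
b = {"C": 34.1, "H": 2.40, "log_intensity": -6.0, "neighborhood": [1.0, 3.0, 2.0, 0.5]}
far = {"C": 36.0, "H": 2.40, "log_intensity": -6.0, "neighborhood": [1.0, 3.0, 2.0, 0.5]}
```

With identical shapes and intensities, only the position term remains, so `similarity(a, b, 0.35, 0.03, 0.5, 1.0)` is about −0.01, while the pair `(a, far)` scores negative infinity.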

γ is determined by the user to balance the position difference and the shape difference. Empirically, we suggest choosing γ so that the correlation term and the squared distance term are on the same scale. From the spectra to be aligned, we randomly pick a large number (e.g. 10,000) of peak pairs such that the two peaks in each pair are from different spectra and their position differences in the carbon and proton dimensions are within the permitted ranges, B_C and B_H, respectively. We then use the ratio of the mean of the weighted squared distances to the mean of one minus the correlations as a reasonable estimate of γ, so that the estimate of γ is always positive. When replicate spectra are aligned, η is defined as the ratio of the standard deviation of the position shifts in the carbon dimension to the standard deviation of the log-intensity of all peaks; otherwise η is zero. The position shifts in the carbon dimension are assumed to follow a uniform distribution on [–R_C, R_C]; therefore the standard deviation is $R_{C} ∕ \sqrt{3} = B_{C} ∕ \sqrt{12}$, where B_C = 2R_C.
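The two weight estimates described above can be sketched as follows. This is an illustration under assumptions: the sampled peak pairs are taken as already drawn, with their weighted squared distances and neighborhood correlations supplied as lists, and the sample standard deviation (`statistics.stdev`) is used because the source does not say which estimator was applied. The function names are hypothetical.

```python
import math
import statistics


def estimate_gamma(sq_distances, correlations):
    """Mean weighted squared distance divided by mean (1 - correlation)."""
    one_minus_r = [1.0 - r for r in correlations]
    return statistics.fmean(sq_distances) / statistics.fmean(one_minus_r)


def estimate_eta(b_c, log_intensities, replicates=True):
    """SD of carbon shifts over SD of log-intensities; zero if not replicates."""
    if not replicates:
        return 0.0
    # Uniform shifts on [-R_C, R_C] have SD R_C / sqrt(3) = B_C / sqrt(12).
    sd_shift = b_c / math.sqrt(12)
    return sd_shift / statistics.stdev(log_intensities)


gamma = estimate_gamma([0.02, 0.04], [0.9, 0.8])
eta = estimate_eta(0.35, [1.0, 3.0])
```

Because correlations never exceed one, the denominator of the γ estimate is non-negative, which is what keeps the estimate positive.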

Peaks with insignificant signal and missing peaks were handled by adding an imaginary "missing peak" to a peak group. The similarity measure between a real peak and a missing peak, or between two missing peaks, is defined as:

$$\mathrm{sim}({\vec{p}}_{i}, \text{missing peak}) = \mathrm{sim}(\text{missing peak}, \text{missing peak}) = \alpha. \tag{4}$$

Here, α is the penalty term for missing peaks, and ${\vec{p}}_{i}$ is any real peak in the same peak group. Therefore, there are always S(S – 1)/2 terms in the summation in equation 2, where S is the number of spectra.

Finally, in order to compare the objective function F (alignment proposal) between proposals with different numbers of peak groups, we added imaginary peak groups composed entirely of missing peaks to the proposal with fewer peak groups, so that the total number of peak groups in each proposal is equal.
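A small sketch of how missing peaks and padding might be handled in code. Representing a missing peak as `None`, the value of `alpha`, and the helper names are assumptions for illustration only.

```python
from itertools import combinations

MISSING = None  # stand-in for the imaginary "missing peak"


def pair_score(p, q, sim, alpha):
    """Equation 4: any pair that involves a missing peak scores alpha."""
    if p is MISSING or q is MISSING:
        return alpha
    return sim(p, q)


def padded_group_score(group, sim, alpha):
    """Equation 2 over a group that holds one entry per spectrum."""
    return sum(pair_score(p, q, sim, alpha) for p, q in combinations(group, 2))


def pad_proposal(proposal, n_groups, n_spectra):
    """Append all-missing groups until the proposal has n_groups groups."""
    extra = n_groups - len(proposal)
    return proposal + [[MISSING] * n_spectra for _ in range(extra)]


def zero_sim(p, q):
    """Toy similarity (assumption): negative squared distance."""
    return -((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


alpha = -0.5  # hypothetical penalty value
all_missing = [MISSING] * 4
mixed = [(30.0, 2.0), MISSING, (30.0, 2.0)]
padded = pad_proposal([mixed + [MISSING]], n_groups=3, n_spectra=4)
```

For S = 4 spectra an all-missing group has 4 · 3 / 2 = 6 pairs, so it contributes 6α to the objective function, which is the cost a proposal pays for each padded group.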

2.2 Maximizing the objective function

A greedy algorithm was used to maximize the objective function. This greedy algorithm has two main steps: generating an initial alignment proposal and refining the alignment to maximize the objective function (figure 2).

Fig. 2. A flow diagram of the alignment algorithm. The left panel shows the steps for the initial alignment, and the right panel shows the steps for refining the alignment.

Generating the initial alignment proposal

The initial alignment proposal is generated by iteratively creating peak groups from unprocessed peaks in a greedy manner. At the starting point, all peaks in all spectra are unprocessed. First a peak, denoted as ${\vec{p}}_{1}$, is randomly selected from a spectrum to start a peak group containing this peak only. The best possible members for this peak group are then identified from the remaining spectra. Suppose that n – 1 spectra have been processed so far, so that the peak group has n – 1 members, either real peaks or "missing" peaks: $P = \{{\vec{p}}_{1}, {\vec{p}}_{2}, \dots, {\vec{p}}_{n - 1}\}$. Then the unprocessed peak in the $n$th spectrum that is most similar to P is identified and denoted as ${\vec{p}}_{n}$, where for any peak $\vec{p}$ in the $n$th spectrum, $\mathrm{sim}(\vec{p}, P) = \sum_{i = 1}^{n - 1} \mathrm{sim}(\vec{p}, {\vec{p}}_{i})$. If ${\vec{p}}_{n}$ and P are compatible, i.e., |C_n – C_i| < B_C and |H_n – H_i| < B_H for i = 1, 2, . . . , n – 1, where C_i and H_i are the coordinates of ${\vec{p}}_{i}$ in the carbon and proton dimensions, respectively, then ${\vec{p}}_{n}$ is added to the peak group and marked as processed (${\vec{p}}_{1}$ is automatically marked upon selection); otherwise, a "missing" peak is added to the peak group for this spectrum and ${\vec{p}}_{n}$ remains unprocessed. If ${\vec{p}}_{n}$ does not exist, i.e. all peaks in the $n$th spectrum have been processed, then a "missing" peak is added to the current peak group. This is repeated until the current peak group contains exactly one peak, real or missing, from each spectrum. Next, another unprocessed peak is selected to start another peak group. This process is repeated until all peaks in all spectra have been processed. The selection of unprocessed peaks can follow a stochastic order or a deterministic order. The application shown in this paper used a deterministic order for peak selection: spectra were randomly ordered; peaks within each spectrum were ordered first by their ¹³C coordinate and then by their ¹H coordinate; and the selection of unprocessed peaks was based on this order until all peaks were processed.
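The greedy construction can be sketched in a few lines of Python. This is a simplified illustration, not the authors' C implementation: peaks are assumed to be `(C, H)` tuples, spectra are processed in the order given rather than in a random order, `None` stands for a "missing" peak, and `sq_dist_sim` is a toy similarity that keeps only the distance term with an arbitrary weight.

```python
def sq_dist_sim(p, q, lam=100.0):
    """Toy similarity (assumption): distance term only, arbitrary lambda."""
    return -((p[0] - q[0]) ** 2 + lam * (p[1] - q[1]) ** 2)


def initial_alignment(spectra, sim, b_c, b_h):
    """Greedy initial proposal.

    spectra: list of spectra, each a list of (C, H) peaks.
    Returns a list of groups; each group has one entry per spectrum,
    either a peak index or None for a "missing" peak.
    """
    n = len(spectra)
    unprocessed = [set(range(len(s))) for s in spectra]
    groups = []
    for k, spectrum in enumerate(spectra):
        # Deterministic order: peaks sorted by 13C, then by 1H coordinate.
        for i in sorted(range(len(spectrum)), key=lambda idx: spectrum[idx]):
            if i not in unprocessed[k]:
                continue
            unprocessed[k].discard(i)
            group = [None] * n
            group[k] = i
            for m in range(n):
                if m == k:
                    continue
                members = [spectra[j][group[j]] for j in range(n)
                           if group[j] is not None]
                # Missing members add the same constant to every candidate,
                # so only real members are summed when ranking candidates.
                best = max(
                    unprocessed[m],
                    key=lambda c: sum(sim(spectra[m][c], p) for p in members),
                    default=None,
                )
                if best is None:
                    continue  # no unprocessed peak left: keep "missing"
                cand = spectra[m][best]
                if all(abs(cand[0] - p[0]) < b_c and abs(cand[1] - p[1]) < b_h
                       for p in members):
                    group[m] = best
                    unprocessed[m].discard(best)
            groups.append(group)
    return groups


spectra = [
    [(30.0, 2.00), (50.0, 3.00)],
    [(50.1, 3.01), (30.1, 2.01), (70.0, 4.00)],
]
groups = initial_alignment(spectra, sq_dist_sim, b_c=0.35, b_h=0.03)
```

In this toy input the two shifted peaks are paired across spectra, and the unmatched peak at (70.0, 4.00) ends up in its own group with a missing entry for the first spectrum.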

Refining alignment proposal

The alignment proposal is refined by re-assigning peaks to peak groups to maximize the total similarity score. First, three peak groups are selected. Then a spectrum is selected; each selected peak group has one peak in the selected spectrum, either a real peak or a "missing" peak. The six possible ways of assigning the three peaks to the three peak groups are examined, and the one with the highest similarity score is kept in the alignment proposal (figure 2). All spectra are evaluated iteratively until no further improvement can be achieved for that triplet. This process is repeated for all possible triplets of peak groups until there is no further increase in the objective function.
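The triplet refinement can be illustrated with the sketch below. It simplifies the procedure described above by sweeping all triplets and spectra repeatedly until a full sweep yields no gain, and it assumes that groups store `(C, H)` tuples or `None` directly. `ALPHA` and the toy `score_group` (distance term only) are hypothetical choices for the example.

```python
from itertools import combinations, permutations

ALPHA = -1.0  # hypothetical missing-peak penalty


def score_group(group):
    """Toy group score: negative squared distances, ALPHA for missing pairs."""
    total = 0.0
    for p, q in combinations(group, 2):
        if p is None or q is None:
            total += ALPHA
        else:
            total -= (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return total


def refine(groups, score):
    """Re-assign peaks within triplets of groups until no sweep improves."""
    n_spectra = len(groups[0])
    improved = True
    while improved:
        improved = False
        for a, b, c in combinations(range(len(groups)), 3):
            trio = (groups[a], groups[b], groups[c])
            for k in range(n_spectra):
                current = [g[k] for g in trio]
                base = sum(score(g) for g in trio)
                best_perm, best_score = current, base
                # Try the six assignments of the three peaks of spectrum k.
                for perm in permutations(current):
                    for g, peak in zip(trio, perm):
                        g[k] = peak
                    total = sum(score(g) for g in trio)
                    if total > best_score + 1e-12:
                        best_perm, best_score = list(perm), total
                for g, peak in zip(trio, best_perm):
                    g[k] = peak
                if best_score > base + 1e-12:
                    improved = True
    return groups


# Spectrum 1 entries of the first two groups start out swapped.
start = [
    [(30.0, 2.0), (31.1, 2.0)],
    [(31.0, 2.0), (30.1, 2.0)],
    [(32.0, 2.0), (32.1, 2.0)],
]
refined = refine(start, score_group)
```

Each accepted change strictly increases the objective, so the loop stops at a local maximum; the order of the groups in the result may differ from the starting order.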

2.3 Computational Complexity

In generating the initial alignment proposal, S peaks need to be added to construct a peak group, where S is the number of spectra. Adding a peak from a spectrum to the current peak group takes O(s · g), where s is the current number of peaks in the group and g is the number of peaks in the spectrum that remain unprocessed. s and g are always less than or equal to S and G, respectively, where G is the number of peak groups in the initial alignment proposal. Therefore, generating a peak group takes O(S² · G). Because G peak groups need to be generated for the initial alignment proposal, the computational complexity of this step is O(S² · G²). In refining the alignment proposal, there are O(G³) possible triplets of peak groups; for each triplet, there are O(S) calculations of the similarity measure for each of the S target spectra. Therefore, the computational complexity of the refinement step is O(S² · G³), and the total computational complexity of this method is bounded by O(S² · G³). Because 2D NMR spectra usually have fewer than 500 peaks, we can further consider G as a constant, so the total computational complexity of this alignment algorithm becomes O(S²). This algorithm was implemented in C on a Red Hat Linux platform with a 3 GHz CPU, and it took less than 1 minute to process 30 spectra.
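As a rough illustration of why the refinement step dominates the cost, the number of triplets of peak groups can be counted explicitly. Taking G = 500 as the upper bound on the number of peaks quoted above (an assumption that every peak forms its own group):

```latex
\binom{G}{3} = \frac{G(G-1)(G-2)}{6},
\qquad
\binom{500}{3} = \frac{500 \cdot 499 \cdot 498}{6} = 20{,}708{,}500 .
```

Each of these triplets is examined in each of the S spectra with 3! = 6 candidate assignments, which is where the G³ factor in the bound comes from.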

3 RESULTS AND DISCUSSION

We wanted to evaluate the abundance of endogenous soluble metabolites in liver extracts from mouse strains that are sensitive (SM/J, DBA/2, C57BL/6) or resistant (SJL) to acetaminophen-induced liver toxicity. To do this, liver extracts were prepared from these four strains at 0, 3 and 6 hours after treatment with a single dose of 300 mg/kg of acetaminophen; the metabolites were extracted and ¹H-¹³C 2D NMR spectra were acquired as described. We found that analysis of the 500 individual peaks in each of the 39 spectra (3 or 4 replicates from each of the 4 strains at 3 time points) was an overwhelming task that would require several weeks to complete, even when performed by an investigator (P.L.) who had several years of experience with analysis of NMR spectra. Therefore, we developed and evaluated an automated method for peak alignment in 2D NMR spectra.

Before applying the alignment algorithm, the spectra were normalized so that the total absolute intensity of each spectrum equaled 1. Two reference metabolites, glutathione and glutamine, with well-known 2D NMR coordinates (33−35 ppm in the ¹³C dimension and 2.3−2.5 ppm in the ¹H dimension), were manually aligned across all spectra. This enabled the upper and lower bounds for the coordinate differences between peak pairs within the same peak group to be determined. The shape similarity weight (γ) was set to 0.5 empirically (the value of γ estimated from the data to balance the contributions of position difference and shape difference was 0.36, but we decided to give slightly more weight to the shape and therefore selected 0.5); all other parameters were calculated from the data as described in the methods section. The raw NMR data were processed using AMIX (http://www.brukerbiospin.com/amix.html) to identify the metabolite peaks, and then the alignment algorithm was applied.
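The normalization step can be written in a couple of lines. The sketch below assumes a spectrum is available as a flat list of intensity values; whether the sum in the original analysis ran over all data points or only over picked peaks is not stated, so that choice is left to the caller. The function name is hypothetical.

```python
def normalize(intensities):
    """Scale intensities so that their absolute values sum to 1."""
    total = sum(abs(v) for v in intensities)
    if total == 0:
        raise ValueError("spectrum has zero total absolute intensity")
    return [v / total for v in intensities]


scaled = normalize([2.0, -1.0, 1.0])
```

Normalizing in this way makes intensities comparable between spectra before the log-intensity term of the similarity measure is evaluated.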

Since three or four independent replicate experiments were performed for each of the 12 conditions analyzed (4 strains by 3 time points after dosing) to produce a total of 39 spectra, the spectral alignment was performed in two steps. First, the replicate spectra from each individual condition were aligned. Then, the average intensities and coordinates of the aligned peak groups were calculated to generate a representative spectrum for each strain at a particular time point. A peak group was included in the representative spectrum only if the corresponding peak appeared in more than three replicates, or if the corresponding peak appeared in two replicates with a normalized signal intensity that was above the user-defined signal threshold of 0.00001. The representative spectra for 12 conditions were then aligned to generate the final alignment with 514 peak groups. Because a gold standard for assessing the 2D NMR spectral alignment does not exist, the automated alignments were manually examined by an experienced investigator to validate the alignments generated. First, all peak groups were ranked according to the maximal intensity of the individual peaks within each group. Then, the top fifty peak groups (Table 1) were manually inspected. The manual inspection indicated that we could not unambiguously align 2 peak groups without performing spiking experiments to determine the actual identity of the metabolites. Therefore, these 2 peak groups were excluded from the validation process. However, manual inspection also indicated that all 39 peaks within each of 33 different peak groups were perfectly aligned by the automated method. There was one peak group that was missing one peak in one spectrum. The peak was missing because it was no longer a local maximum due to the effect of a nearby larger peak. The manual inspection also indicated that seven peak groups had minor errors: incorrect peaks were included in one or two of the 39 spectra analyzed. 
Seven other peak groups had major misalignments: the local patterns were disrupted by the mixture effect caused by peak shifting and/or missing/extra peaks, especially the phenomenon that twin peaks merged to a single peak in certain spectra. Examples of regions with properly aligned or misaligned peaks are shown in figure 3. Thus, manual inspection of 1872 peaks (39 peaks within each of 48 peak groups analyzed) indicated that the algorithm performed quite well; 1810 (96.7%) peaks were correctly aligned by the automated algorithm.

Table 1.

Comparison of automated and manual alignment of peaks on 2D NMR spectra.

# Peak groups	# Peaks Aligned Correctly	# Peaks Analyzed	Comparison Result	Error Type
33	1287	1287	Perfectly aligned	-
1	38	39	Correctly aligned, missing Peaks	Missing peaks due to peak missing/merging
7	266	273	Minor Error	Incorrect peaks in 1−2 of 39 spectra
7	219	273	Major Error	Local patterns disrupted by peak shifting and missing/extra peaks


For this comparison, peaks representing 48 different chemical groups across 39 2D NMR spectra were analyzed. All peak groups were ranked by their maximum peak intensity; then, the top fifty peak groups were manually inspected. Two peak groups were excluded because manual alignment could not be accomplished. The manual alignment results were then compared to the output of the automated alignment algorithm. Overall, manual inspection of 1872 peaks indicated that 96.7% (n=1810) of the peaks were aligned correctly by the automated alignment algorithm.
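The totals reported in Table 1 can be re-derived from its rows. The short check below only restates the published numbers; the variable names are illustrative.

```python
# (peak groups, peaks aligned correctly, peaks analyzed) for each table row
rows = [
    (33, 1287, 1287),  # perfectly aligned
    (1, 38, 39),       # correctly aligned, one missing peak
    (7, 266, 273),     # minor error
    (7, 219, 273),     # major error
]
n_spectra = 39

n_groups = sum(r[0] for r in rows)
n_correct = sum(r[1] for r in rows)
n_analyzed = sum(r[2] for r in rows)
accuracy = 100.0 * n_correct / n_analyzed
```

Each row's number of analyzed peaks equals its number of peak groups times the 39 spectra, and the overall accuracy rounds to the 96.7% quoted in the text.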

We describe an automated method for peak alignment in 2D NMR spectra that enables assessment of the relative abundance of individual metabolites. It uses a greedy algorithm, which iteratively refines peak alignments until a local maximum of the objective function is reached. The heuristic analysis identifies a reasonable local maximum that is not far from the true global maximum. The final result depends on the order in which spectra are added to the initial alignment. By assessing multiple orders of spectral addition and selecting the best result, the probability of identifying a good local maximum, or even the global maximum, is increased. Furthermore, this method only interchanges peaks within a triplet of peak groups. It is possible to swap peaks within larger sets of peak groups, such as quartets or quintets. However, this requires a longer processing time.
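The idea of trying several orders of spectral addition and keeping the best result can be sketched generically. In the minimal Python sketch below, `toy_align` and `toy_objective` are hypothetical stand-ins operating on 1D peak positions; they are not the similarity measure or the triplet-swapping refinement described in this paper, and the tolerance, number of orders, and seed are arbitrary assumptions.

```python
import random


def best_alignment_over_orders(spectra, align_in_order, objective,
                               n_orders=10, seed=0):
    """Run an order-dependent aligner on several random orders; keep the best."""
    rng = random.Random(seed)
    indices = list(range(len(spectra)))
    best = None  # (score, alignment, order)
    for _ in range(n_orders):
        order = indices[:]
        rng.shuffle(order)
        alignment = align_in_order([spectra[i] for i in order])
        score = objective(alignment)
        if best is None or score > best[0]:
            best = (score, alignment, order)
    return best


def toy_align(ordered_spectra, tol=0.07):
    """Hypothetical greedy aligner: attach each peak to the nearest group mean."""
    groups = []  # each group is a list of peak positions
    for peaks in ordered_spectra:
        used = set()  # a spectrum may contribute at most one peak per group
        for p in peaks:
            candidates = [
                (abs(p - sum(g) / len(g)), k)
                for k, g in enumerate(groups)
                if k not in used
            ]
            if candidates and min(candidates)[0] <= tol:
                k = min(candidates)[1]
                groups[k].append(p)
            else:
                groups.append([p])
                k = len(groups) - 1
            used.add(k)
    return groups


def toy_objective(groups):
    """Hypothetical score: fewer and tighter groups are better."""
    spread = sum(max(g) - min(g) for g in groups)
    return -(len(groups) + spread)


spectra = [[1.00, 2.00], [1.04, 2.03], [1.08, 1.97]]
score, alignment, order = best_alignment_over_orders(
    spectra, toy_align, toy_objective
)
print(order, alignment, score)
```

Because the toy aligner compares each new peak with the running group mean, the grouping it produces depends on which spectrum is added first, which is the property that makes sampling several orders worthwhile.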

The similarity measure described here represents our initial attempt to assess the degree of pattern matching among peaks, and there are a number of ways in which this method could be improved. Other measures of similarity can subsequently be assessed, including those that more strictly constrain local shape matching. The local-neighborhood Pearson correlation term in equation 3 is a crude approximation of shape matching. A shape model could be established and fitted to each peak to provide more precise shape information for matching. The overall distribution of neighboring peaks can also provide information that can be used for peak alignment; this would probably resolve the misalignments shown in figures 3B and 3C. Since the alignment algorithm is independent of the method used for peak identification, kernel smoothing (Smith et al., 2006) or wavelet transform (Du et al., 2006) could be applied to detect peaks before aligning the spectra. Additionally, modifications to the point matching method can be introduced into this algorithm to improve its ability to maximize the total similarity measure, which could also improve the final alignment. Irrespective of the modifications that are subsequently utilized, this automated method represents a significant first step that provides a substantial improvement in our ability to analyze 2D NMR metabonomic data.
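To illustrate what a Pearson correlation of local neighborhoods measures, the following sketch correlates the intensity windows around two peaks. It covers only this one term, not the full similarity measure of equation 3; the square window, its half-width, the handling of flat or out-of-range windows, and all function names are assumptions made for illustration.

```python
import math


def neighborhood(spectrum, row, col, half_width):
    """Flatten the square window centred on (row, col); None if it leaves the grid."""
    n_rows, n_cols = len(spectrum), len(spectrum[0])
    if row - half_width < 0 or col - half_width < 0:
        return None
    if row + half_width >= n_rows or col + half_width >= n_cols:
        return None
    return [
        spectrum[r][c]
        for r in range(row - half_width, row + half_width + 1)
        for c in range(col - half_width, col + half_width + 1)
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat window is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def neighborhood_correlation(spec_a, peak_a, spec_b, peak_b, half_width=1):
    """Correlate the intensity windows around two peaks given as (row, col)."""
    win_a = neighborhood(spec_a, peak_a[0], peak_a[1], half_width)
    win_b = neighborhood(spec_b, peak_b[0], peak_b[1], half_width)
    if win_a is None or win_b is None:
        return None  # assumption: peaks too close to the border are skipped
    return pearson(win_a, win_b)


spec_a = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 1, 0],
    [0, 2, 5, 2, 0],
    [0, 1, 2, 1, 0],
    [0, 0, 0, 0, 0],
]
# Same peak shape at three times the intensity.
spec_b = [[3 * v for v in row] for row in spec_a]

r_same_shape = neighborhood_correlation(spec_a, (2, 2), spec_b, (2, 2))
r_border = neighborhood_correlation(spec_a, (0, 0), spec_b, (2, 2))
print(r_same_shape, r_border)
```

Because the correlation is invariant to intensity scaling, two peaks with the same local shape but different heights score close to 1, which is why the term captures shape only approximately and says nothing about absolute intensity.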

ACKNOWLEDGEMENT

We thank Erin Farrell for help with the animal study and sample preparation. P.L. was supported by a grant (1 R01 GM068885-01A1) from the NIGMS awarded to G.P.

REFERENCES

Anthony ML, Sweatman BC, Beddell CR, Lindon JC, Nicholson JK. Pattern recognition classification of the site of nephrotoxicity based on metabolic data derived from proton nuclear magnetic resonance spectra of urine. Mol. Pharmacol. 1994;46:199–211.
Bartolone JB, Sparks K, Cohen SD, Khairallah EA. Immunochemical detection of acetaminophen-bound liver proteins. Biochem. Pharmacol. 1987;36:1193–1196. doi: 10.1016/0006-2952(87)90069-4.
Bessems JG, Vermeulen NP. Paracetamol (acetaminophen)-induced toxicity: molecular and biochemical mechanisms, analogues and protective approaches. Crit Rev. Toxicol. 2001;31:55–138. doi: 10.1080/20014091111677.
Chui H, Rangarajan A. A New Algorithm from Non-rigid Point Matching. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2000;II:44–51.
Du P, Kibbe WA, Lin SM. Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics. 2006;22:2059–2065. doi: 10.1093/bioinformatics/btl355.
Ewens A, Luo L, Berleth E, Alderfer J, Wollman R, Hafeez BB, Kanter P, Mihich E, Ehrke MJ. Doxorubicin plus Interleukin-2 Chemoimmunotherapy against Breast Cancer in Mice. Cancer Res. 2006;66:5419–5426. doi: 10.1158/0008-5472.CAN-05-3963.
Geiger D, Girosi F. Parallel and Deterministic Algorithms from Mrfs - Surface Reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1991;13:401–412.
Gold S, Rangarajan A. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1996;18:377–388.
James LP, Mayeux PR, Hinson JA. Acetaminophen-induced hepatotoxicity. Drug Metab Dispos. 2003;31:1499–1506. doi: 10.1124/dmd.31.12.1499.
Lu P, Rangan A, Chan SY, Appling DR, Hoffman DW, Marcotte EM. Global metabolic changes following loss of a feedback loop reveal dynamic steady states of the yeast metabolome. Metab. Eng. 2003;9:8–20. doi: 10.1016/j.ymben.2006.06.003.
Rangarajan A, Chui H, Mjolsness E, Pappu S, Davachi L, Goldman-Rakic P, Duncan J. A robust point matching algorithm for autoradiograph alignment. Visualization in Biomedical Computing. 1997;1:379–398. doi: 10.1016/s1361-8415(97)85008-6.
Smith CA, Want EJ, O'Maille G, Abagyan R, Siuzdak G. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal. Chem. 2006;78:779–787. doi: 10.1021/ac051437y.
Weljie AM, Newton J, Mercier P, Carlson E, Slupsky CM. Targeted profiling: quantitative analysis of 1H NMR metabolomics data. Anal. Chem. 2006;78:4430–4442. doi: 10.1021/ac060209g.

