Abstract
The reliability and robustness of metabolite assignments in 1H NMR are complicated by numerous factors, including variations in temperature, pH, buffer choice, ionic strength, and mixture composition that lead to peak overlap and spectral crowding. As sample conditions fluctuate, peak drift and line broadening further complicate peak deconvolution and subsequent chemical assignment. We present a collection of 1D 1H NMR spectra of 54 common metabolites at varied pH (6.0 to 8.0 in 0.5-unit increments) and temperature (290K to 308K) to quantify chemical shift variability and facilitate automated metabolite assignments. Our results illustrate the fundamental challenges of accurately assigning NMR peaks under the varied environmental conditions prevalent in complex mixtures. Phosphorylated metabolites showed a larger variation in chemical shifts due to pH, whereas amino acids showed a larger variation due to temperature. Mixtures of phosphorus-containing compounds consistently yielded unreliable assignments. Phosphorylated cholines, amino acids, and glycerols yielded a 40% false negative rate for 7 out of 9 mixture conditions. Amino acids had a false negative rate of 57% at 298K and pH 8. Our results demonstrate that automated assignments of complex biofluid mixtures require expert intervention to confirm their accuracy. Our analysis also indicates the need for reference databases to include spectra acquired under a variety of conditions, including mixtures and a range of pH and temperature values, to improve the accuracy and reproducibility of metabolite assignments.
Keywords: pH, temperature, 1H NMR, automated metabolite assignment, biofluids
Graphical Abstract

INTRODUCTION
Metabolomics is a rapidly expanding field that relies on different analytical techniques and samples of variable origin, which create unique challenges for accurately annotating spectral data [1]. One-dimensional (1D) 1H NMR offers many advantages for metabolomics profiling, namely the relative simplicity of sample preparation [2], a robust method to accurately quantify the most abundant compounds in a metabolome [3], the ready identification of compounds that are difficult to ionize for mass spectrometry [4], the ability to distinguish compounds of identical mass, and the capacity to study living organisms [4, 5]. Thus, 1D 1H NMR is ubiquitous in a variety of qualitative and quantitative metabolomics studies [6]. A major challenge commonly encountered in 1D 1H NMR metabolomics is the spectral crowding and peak overlap that increase with sample complexity [7]. As spectral complexity increases, metabolite assignments become ambiguous, complicated, and potentially impossible to complete [3, 8]. In addition to peak overlap, external influences that include variations in pH, temperature, ionic strength, buffer choice, metabolite concentrations, and mixture composition also present challenges to accurate assignments [9]. These and other factors may induce significant chemical shift deviations from standard reference spectra.
One approach to limiting these issues is to validate the 1D 1H NMR assignments with two-dimensional (2D) experiments such as 2D 1H-1H TOCSY and 2D 1H-13C HSQC [10]. Homonuclear and heteronuclear 2D correlation experiments can help resolve overlapped regions and account for chemical shift variability (i.e., by exploiting the lower sensitivity of 13C chemical shifts to sample conditions), but they also increase the overall experiment time and may lead to a loss of information. For example, peaks may be missing from 2D spectra because of a lower signal-to-noise ratio, because of the lower sensitivity of 15N/13C resonances in natural abundance samples, or because only a subset of the metabolites was 15N- or 13C-labeled from a 15N/13C tracer (e.g., 13C6-glucose).
Sample handling and preparation strategies greatly influence the final spectral quality and the ease of metabolite assignments [11]. Variability in sample conditions, such as differences in pH and temperature, can negatively affect assignment accuracy and should be carefully considered during study design [12, 13]. Unfortunately, the large variation in sample preparation and data collection protocols employed by the NMR metabolomics community greatly complicates metabolite assignment. Metabolite assignment software and protocols range from fully automated to entirely manual approaches [8, 14–16]. Fully automated approaches are quick but can produce high assignment error rates, especially when sample conditions vary. Alternatively, manual assignments using public databases such as the Human Metabolome Database (HMDB) [17] or BMRB [18] can produce highly accurate results but are extremely time consuming and require extensive training [16]. Moreover, manual approaches are also severely challenged by poor spectral matches to reference libraries caused by environmentally induced chemical shift changes. Simply put, the magnitude and direction of the chemical shift perturbation for each individual NMR resonance are highly variable and, in practice, unpredictable, which makes unambiguous matches extremely difficult. Comparable changes in peak linewidths further confound the annotation of NMR metabolomics spectra.
To begin resolving these issues, we collected a series of 1D 1H NMR spectra of 54 common metabolites acquired over a range of pH values and temperatures, and compiled the chemical shift changes on a per-resonance basis. We catalogued chemical shift uncertainties relative to reference spectra at pH 7 and 298K for a diversity of functional groups. Further, we demonstrate the challenges posed by variable pH and temperature conditions in the analysis of 1D 1H NMR spectra of individual metabolites and mixtures. The magnitude of peak shifts and linewidth changes is metabolite dependent, with the NMR spectra of some compounds being more significantly affected than others. As a result, software designed to automate NMR assignments is severely challenged by these chemical shift uncertainties and, unsurprisingly, by increasing mixture complexity [8]. Our results clarify the need for spectral databases to expand their content to include pH and temperature variations for a diversity of chemical functional groups, as well as reference spectra for a variety of complex mixtures, including biofluids.
MATERIAL AND METHODS
Urine and Serum Samples
De-identified human serum samples were obtained from the NIH NeuroBioBank, specifically the University of Miami Brain Endowment Bank. De-identified human urine samples were obtained from the Multiple Sclerosis Clinic within the Saunders Medical Center (Wahoo, NE, USA). All materials, biospecimens, and human subjects’ data collected, stored, and maintained for and during the conduct of this research were reviewed and approved by the University of Nebraska-Lincoln Institutional Review Board (IRB, IRB#: 20200820533EP, IRB#: 20180517991EPCOLLA).
Preparation of Standard NMR Samples
The standard samples for 1D 1H NMR data collection comprised 5 mixtures and 54 individual compounds, consisting of 33 phosphorus-containing compounds and 21 amino acids. The phosphorus-containing compounds were divided into four groups, each prepared as one mixture NMR sample: phosphorylated nucleic acid analogs (group 1, 11 compounds), sugars (group 2, 8 compounds), coenzymes (group 3, 6 compounds), and cholines, amino acids, and glycerols (group 4, 8 compounds). The 21 amino acids were combined to form a single amino acid mixture. The complete list of individual compounds and mixture compositions is available in Table S1. Stock solutions of the amino acids and phosphorus-containing compounds were prepared in D2O (Sigma Aldrich, 99.8%) at concentrations of 90 mM and 100 mM, respectively. The stock solutions were then diluted into a 50 mM phosphate buffer containing 50 μM 3-(trimethylsilyl)propionic-2,2,3,3-d4 acid (TMSP-D4) in 99.8% D2O at five different pH values (uncorrected) of 6.0, 6.5, 7.0, 7.5, and 8.0 to prepare NMR samples of each individual compound at final concentrations of 1.5 mM for amino acids and 15 mM for phosphorus-containing compounds. 1D 1H NMR spectra were collected at 298K. In addition, 1D 1H NMR spectra of the pH 7.0 samples were collected at 290K, 294K, 304K, and 308K. The same protocol was followed to prepare each of the five standard mixtures outlined in Table S1.
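The dilution arithmetic behind these final concentrations follows C1V1 = C2V2: a 90 mM amino acid stock diluted to 1.5 mM is a 60-fold dilution, and a 100 mM stock diluted to 15 mM is roughly a 6.7-fold dilution. The Python sketch below computes the stock and buffer volumes for a given final volume; the 150 μL final volume and the function name `stock_volume_ul` are hypothetical choices for illustration, since the final volume of the standard samples is not stated here.

```python
def stock_volume_ul(stock_mm: float, final_mm: float, final_volume_ul: float) -> float:
    """Stock volume (µL) that gives `final_mm` in `final_volume_ul`, from C1*V1 = C2*V2."""
    if final_mm > stock_mm:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_mm * final_volume_ul / stock_mm

FINAL_VOLUME_UL = 150.0  # hypothetical final sample volume, for illustration only

aa_stock = stock_volume_ul(90.0, 1.5, FINAL_VOLUME_UL)   # 60-fold dilution of the amino acid stock
p_stock = stock_volume_ul(100.0, 15.0, FINAL_VOLUME_UL)  # ~6.7-fold dilution of the phosphorus stock

print(f"amino acids: {aa_stock:.2f} µL stock + {FINAL_VOLUME_UL - aa_stock:.2f} µL buffer")
print(f"phosphorus compounds: {p_stock:.2f} µL stock + {FINAL_VOLUME_UL - p_stock:.2f} µL buffer")
```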
Preparation of Serum NMR Samples
A 150 μL aliquot of serum was added to 300 μL of methanol to precipitate proteins. The 1:2 serum:methanol mixture was vortexed for 10 seconds, incubated at 4°C for 10 minutes, and then centrifuged at 14,000 g for 20 min at 4°C to pellet the proteins. The supernatant was collected and centrifuged again at 14,000 g for 5 min at 4°C to pellet any remaining proteins. Supernatants were then collected, snap frozen in liquid nitrogen, and dried by speed vacuum centrifugation (SpeedVac R Plus, Savant), followed by lyophilization (FreeZone™, Labconco, Kansas City, MO) for 24 hours. Samples were then stored at −80°C until NMR analysis. At the time of data collection, samples were reconstituted in 150 μL of a 50 mM phosphate buffer in “100%” D2O at pH 7.2 (uncorrected) containing 50 μM TMSP-D4 as a chemical shift reference. The samples were centrifuged at 14,000 g for 20 minutes at 4°C to remove any precipitate. The supernatants were placed in a 3 mm NMR tube for data acquisition [19].
Preparation of Urine NMR Samples
A 150 μL aliquot of urine was centrifuged at 14,000 g for 10 min to pellet debris. A 135 μL portion of the supernatant was combined with 15 μL of a 50 mM phosphate buffer in “100%” D2O at pH 7.2 (uncorrected) containing 50 μM TMSP-D4 as a chemical shift reference [20]. The samples were then transferred to a 3 mm NMR tube for data acquisition.
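Because the buffer is added to urine at a 1:9 ratio, the added phosphate and TMSP-D4 are diluted ten-fold, whereas the serum samples are reconstituted directly in the buffer. The short Python sketch below makes that arithmetic explicit; it assumes the buffer is the only source of D2O and ignores endogenous urinary phosphate, and the helper name `final_concentration` is hypothetical.

```python
def final_concentration(stock_conc: float, added_ul: float, total_ul: float) -> float:
    """Concentration after diluting `added_ul` of a stock into a `total_ul` final volume."""
    return stock_conc * added_ul / total_ul

URINE_UL = 135.0
BUFFER_UL = 15.0
total_ul = URINE_UL + BUFFER_UL  # 150 µL

phosphate_mm = final_concentration(50.0, BUFFER_UL, total_ul)  # added phosphate, mM (endogenous phosphate ignored)
tmsp_um = final_concentration(50.0, BUFFER_UL, total_ul)       # TMSP-D4, µM
d2o_fraction = BUFFER_UL / total_ul                            # assumes the buffer is the only D2O source

print(f"added phosphate: {phosphate_mm:.1f} mM; TMSP-D4: {tmsp_um:.1f} µM; D2O: {d2o_fraction:.0%}")
```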
NMR Data Collection and Processing for Standard Samples
1D 1H NMR spectra were collected on a Bruker Avance III-HD 700 MHz spectrometer equipped with a quadruple resonance QCI-P cryoprobe (1H, 13C, 15N, 31P) with z-axis gradients. A SampleJet automated sample changer, an automated tune and match device, and Bruker ICON-NMR software were used to automate NMR data collection. 1D 1H NMR spectra were collected with 32K data points, 64 scans, 4 dummy scans, and a spectral width of 9,090 Hz using an excitation sculpting pulse sequence [21]. 1D 1H NMR spectra were processed using Bruker TopSpin 3.6 and MestreNova 12.0.2 (https://mestrelab.com/) in NMRBox [22]. The spectra were processed with exponential multiplication and zero-filled twice, followed by Fourier transformation. Spectra were manually phased and referenced to TMSP-D4. Peaks were picked and integrated, and the peak centers and linewidths were exported. The NMR data set is available from the authors upon request and will be made available through the National Metabolomics Data Repository at Metabolomics Workbench (https://www.metabolomicsworkbench.org/).
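The processing steps named above (exponential multiplication, zero filling, and Fourier transformation) can be sketched with NumPy on a synthetic free induction decay. This is not the TopSpin or MestreNova implementation: the 0.3 Hz line broadening, the reading of "zero-filled twice" as doubling the number of points, the use of 16,384 complex points to stand in for the 32K real data points, and the synthetic 1,000 Hz resonance are all assumptions, and phasing, referencing, and digital-filter correction are omitted.

```python
import numpy as np

def process_fid(fid: np.ndarray, sw_hz: float, lb_hz: float = 0.3, zf_factor: int = 2):
    """Exponential multiplication, zero filling, and FFT of a complex FID.

    Returns (frequency axis in Hz relative to the carrier, complex spectrum).
    Phasing and chemical shift referencing are not performed.
    """
    n = fid.size
    dwell = 1.0 / sw_hz
    t = np.arange(n) * dwell
    apodized = fid * np.exp(-np.pi * lb_hz * t)      # exponential multiplication (line broadening)
    padded = np.zeros(n * zf_factor, dtype=complex)  # zero filling to zf_factor * n points
    padded[:n] = apodized
    spectrum = np.fft.fftshift(np.fft.fft(padded))
    freqs = np.fft.fftshift(np.fft.fftfreq(padded.size, d=dwell))
    return freqs, spectrum

# Synthetic single-resonance FID: 1,000 Hz offset, T2 = 0.5 s, 9,090 Hz spectral width.
sw = 9090.0
n_points = 16384
t = np.arange(n_points) / sw
fid = np.exp(2j * np.pi * 1000.0 * t) * np.exp(-t / 0.5)

freqs, spec = process_fid(fid, sw)
peak_hz = freqs[np.argmax(np.abs(spec))]
print(f"peak at {peak_hz:.1f} Hz ({peak_hz / 700.0:.3f} ppm from the carrier at 700 MHz)")
```

Zero filling interpolates the spectrum onto a finer grid (here about 0.28 Hz per point) without adding information, which helps when peak centers and linewidths are read off the processed spectrum.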
1D 1H NMR and 2D 1H-13C HSQC Data Collection and Processing for Biofluids
1D 1H NMR spectra were collected with 65K data points, 138 scans, 4 dummy scans, and a spectral width of 9,090 Hz using an excitation sculpting pulse sequence [21]. Natural abundance 2D 1H-13C HSQC and 2D 1H-1H TOCSY spectra were collected at 25% sparsity using our deterministic non-uniform sampling (NUS) schedule [23]. 2D 1H-1H TOCSY spectra were collected with 8 scans, 16 dummy scans, 2,048 data points and a spectral width of 7,003 Hz in the direct dimension, and 256 data points with a spectral width of 7,003 Hz in the indirect dimension. 2D 1H-13C HSQC spectra were collected with 128 scans, 16 dummy scans, 1,024 data points and a spectral width of 11,160 Hz in the direct dimension, and 256 data points with a spectral width of 29,060 Hz in the indirect dimension. Both 1D and 2D NMR spectra were processed using TopSpin 3.6; they were Fourier transformed, phased, and referenced to TMSP. NUS spectra were reconstructed using the multi-dimensional decomposition (MDD) method [24].
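At 25% sparsity, only 64 of the 256 indirect-dimension increments are acquired. The deterministic schedule of [23] is not reproduced here; as a generic stand-in, the Python sketch below places 64 increments at evenly spaced quantiles of an exponentially decaying density, which concentrates sampling at early increments where the signal is strongest. The decay constant of 128 increments and the function name `exponential_nus_schedule` are hypothetical.

```python
import math
from typing import List

def exponential_nus_schedule(n_total: int, sparsity: float, decay: float) -> List[int]:
    """Deterministic NUS schedule: increments at evenly spaced quantiles of an
    exponential density truncated to [0, n_total). Not the schedule of [23]."""
    n_sampled = round(n_total * sparsity)
    norm = 1.0 - math.exp(-n_total / decay)  # normalization of the truncated density
    schedule: List[int] = []
    previous = -1
    for i in range(n_sampled):
        u = i / n_sampled  # quantile in [0, 1); u = 0 keeps the first increment
        k = int(-decay * math.log(1.0 - u * norm))  # inverse CDF, floored to an index
        k = max(k, previous + 1)  # keep indices strictly increasing
        if k >= n_total:
            raise ValueError("decay too long for the requested sparsity")
        schedule.append(k)
        previous = k
    return schedule

schedule = exponential_nus_schedule(n_total=256, sparsity=0.25, decay=128.0)
print(len(schedule), schedule[:8], schedule[-3:])
```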
Metabolite Assignment Strategy
The metabolites in the standard and biofluid mixtures were assigned using Chenomx NMR Suite version 8.51 (https://www.chenomx.com/), the Human Metabolome Database (https://hmdb.ca) [17], and the Chenomx 700 MHz reference library. Spectra labeled ‘automated’ were assigned using the batch fit algorithm with reference libraries containing the set of 33 phosphorus-containing compounds and the 21 amino acids. Automated assignments were then triaged, that is, manually inspected and validated for accuracy. An overview of the protocol is shown in Figure 1A.
Figure 1. Study Design to Assess Chemical Shift Variation Based on Temperature and pH.

(A) Workflow of the NMR metabolite assignment protocol. The dashed arrow shows the iterative steps in the assignment process. (B) Bar graph showing the distribution of 1H chemical shifts for metabolites in HMDB. (C) A typical 2D 1H-1H TOCSY spectrum collected from a human serum sample and (D) a typical 2D 1H-13C HSQC spectrum collected from a human urine sample. The corresponding 1D 1H NMR spectra are in the lower right-hand corner of each panel. Each Venn diagram represents the number of metabolites typically identified from the 1D 1H NMR (yellow) and 2D NMR (blue) experiments for the indicated biofluid. (E) Color scheme used throughout the study to indicate a specific pH or temperature value.
RESULTS AND DISCUSSION
Need for Cataloging Chemical Shift Variations Across Metabolite Classes
1D 1H NMR is a routine choice for untargeted metabolomics because it provides information-rich data that are easy to acquire and amenable to high-throughput analysis [25]. However, annotating NMR spectral data and assigning metabolites to their corresponding spectral peaks is a time-consuming process that is prone to errors (Figure 1A) [26]. Spectral complexity and the resulting high degree of peak overlap further complicate the assignment workflow, as is clearly evident from the histogram of chemical shift distributions for metabolites deposited in the HMDB (Figure 1B) [17]. Limited spectral resolution also obscures peak multiplicities, which are equally important to the assignment process, and thereby leaves more peaks ambiguous.
2D NMR experiments such as 2D 1H-1H TOCSY (Figure 1C) and 2D 1H-13C HSQC (Figure 1D) can be acquired to supplement the 1D 1H NMR data and partly resolve the peak overlap problem. However, the significant increase in experiment time, the decrease in signal-to-noise (especially for natural abundance samples), and the potential loss of low-abundance metabolites may diminish the utility of 2D NMR experiments relative to 1D NMR experiments. Further, using a 15N/13C tracer to enrich the 15N or 13C labeling of the metabolome only labels metabolites enzymatically derived from the 15N/13C tracer. Consequently, metabolites detected by the 1D NMR experiment may be missing from the 2D NMR spectra of a 15N/13C-labeled metabolome. Chemical shift variability resulting from differences in pH, temperature, ionic strength, buffer, metabolite concentrations, and mixture composition further confounds the metabolite assignment process [9]. The uncertainties in chemical shifts are largely unknown, especially on a per-peak basis. Therefore, both manual and automated approaches to metabolite assignment tend to use a single large error range to ensure matches between reference and experimental NMR spectra. Generous, uniform error bars are likely to increase the false positive rate, whereas tighter error bars will increase the false negative rate. Moreover, the choice of chemical shift error ranges currently relies on uninformed guesses. To partly address these issues, a series of 1D 1H NMR spectra were collected at various temperatures and pH values to better define chemical shift error bars for matching spectra (Figure 1E). Our spectral data also illustrate the broad impact of pH and temperature changes on peak linewidths.
Together, these observations demonstrate the need for metabolomics databases to augment 1D 1H NMR data sets with spectra collected over a range of temperatures and pH values to increase assignment efficiency and reliability.
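The trade-off between uniform and per-peak error bars can be illustrated with a toy peak-list matcher in Python. Real tools such as Chenomx fit full reference line shapes rather than matching peak lists, so this is only a sketch; the two metabolites, their shifts, the observed peaks, and the tolerances are all hypothetical values chosen to show a false negative (tight uniform window), a false positive (wide uniform window), and a correct call (per-peak windows).

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical library: name -> (reference peaks in ppm, per-peak tolerances in ppm).
LIBRARY: Dict[str, Tuple[List[float], List[float]]] = {
    "metabolite_A": ([1.000, 3.500], [0.005, 0.030]),  # second peak assumed pH sensitive
    "metabolite_B": ([1.020, 3.560], [0.005, 0.005]),
}
# Only metabolite_A is present; its second peak has drifted by +0.020 ppm.
OBSERVED = [1.001, 3.520]

def is_match(ref_peaks: Sequence[float], tolerances: Sequence[float],
             observed: Sequence[float]) -> bool:
    """True if every reference peak has an observed peak within its tolerance."""
    return all(any(abs(obs - ref) <= tol for obs in observed)
               for ref, tol in zip(ref_peaks, tolerances))

def assign(library: Dict[str, Tuple[List[float], List[float]]],
           observed: Sequence[float], uniform_tol: Optional[float] = None) -> List[str]:
    """Names of library entries that match; a uniform tolerance overrides per-peak ones."""
    hits = []
    for name, (peaks, tols) in library.items():
        if uniform_tol is not None:
            tols = [uniform_tol] * len(peaks)
        if is_match(peaks, tols, observed):
            hits.append(name)
    return sorted(hits)

print("tight uniform (0.005 ppm):", assign(LIBRARY, OBSERVED, 0.005))  # misses A
print("wide uniform (0.05 ppm):  ", assign(LIBRARY, OBSERVED, 0.05))   # adds B
print("per-peak tolerances:      ", assign(LIBRARY, OBSERVED))         # A only
```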
pH Dependent Chemical Shift Variation Across Metabolite Groups
A set of 1D 1H NMR spectra was collected for each of the common metabolites, comprising amino acids and phosphorus-containing compounds, as detailed in Table S1. The 21 common amino acids provided enough structural diversity to characterize trends in chemical shift variation by functional group. Additionally, the shared structural features of the amino acids highlight annotation challenges due to chemical shift overlap. The 33 phosphorus-containing compounds provided a similar framework while exhibiting a higher sensitivity to pH and a likely upper bound on chemical shift variability. The sample pH was varied only from 6 to 8 to bracket the typical target pH of 7–7.4 and to capture the likely maximal pH-induced chemical shift changes encountered in routine metabolomics samples. Figure 2 summarizes the pH-dependent chemical shift variations per functional group for the 21 amino acids and the 33 phosphorus-containing compounds. Full profiles of the pH-induced 1H chemical shift changes for the 21 amino acids are presented in Figures S1A–D and tabulated in Table S2.
Figure 2. Chemical Shift Variation as a Function of pH and Metabolite Group.

(A-I) Dot plots summarizing the chemical shift variation across five different pH values for select amino acids. Amino acids with relatively low to moderate chemical shift changes are plotted in panels A-D. Amino acids with relatively high chemical shift changes are plotted in panels F-I. The 1H chemical shift changes (Δ1H ppm) relative to pH 7 for (A,F) Hα, (B,G) Hβ, (C,H) Hγ, and (D,I) Hδ are plotted per amino acid. Each circle is colored according to pH as outlined in Figure 1E. (E) General amino acid structure depicting the standard nomenclature used throughout the manuscript. (J-M) Dot plots summarizing the chemical shift variation across five different pH values for select phosphorus-containing compounds (Group 1). The 1H chemical shift changes (Δ1H ppm) relative to pH 7 for (J) Hα, (K) Hβ, (L) Hγ, and (M) Hδ are plotted per nucleic acid analog. (N) General nucleic acid structure depicting the standard nomenclature used throughout the manuscript. pH values are represented by colored circles: 6.0 (yellow), 6.5 (green), 7.5 (teal), 8.0 (blue). Table S1 lists the three-letter code for each metabolite.
Figures 2A–D display the 1H chemical shift variations relative to pH 7 for the selected amino acids. The variations were grouped according to the common Hα, Hβ, Hγ, and Hδ notation for amino acids (Figure 2E). Most NMR resonances exhibited a downfield shift as the pH increased from 6 to 8. The 1H chemical shift changes (Δ1H ppm) ranged from −0.008 ppm to +0.1 ppm. The magnitude and spread of these chemical shift changes generally increased from Hα to Hδ, with Hγ and Hδ having the greatest pH-induced chemical shift changes. Hγ and Hδ chemical shifts ranged from −0.002 ppm to +0.005 ppm and from −0.008 ppm to +0.008 ppm, respectively. This trend is readily seen by comparing the Hα of Asp (Figure 2A) with its Hβ (Figure 2B). Four notable exceptions to this trend were the Hβs of leucine and histidine, the Hγ of glutamate, and the Hδ of isoleucine, which shifted upfield with variable magnitudes as the pH increased.
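The per-resonance Δ1H values and their spread can be computed as sketched below in Python. The shifts are hypothetical values for a single resonance, chosen only to illustrate the bookkeeping; the (minimum, maximum) pair is one natural way to turn a measured variation into a per-resonance matching window. The same functions apply to the temperature series by using 298 as the reference condition.

```python
from typing import Dict, Tuple

def delta_shifts(shifts: Dict[float, float], reference: float) -> Dict[float, float]:
    """Chemical shift change (ppm) of one resonance relative to a reference condition."""
    ref = shifts[reference]
    return {cond: round(value - ref, 4) for cond, value in shifts.items()}

def delta_range(deltas: Dict[float, float]) -> Tuple[float, float]:
    """Smallest and largest change; usable as a per-resonance matching window."""
    return min(deltas.values()), max(deltas.values())

# Hypothetical shifts (ppm) of one resonance at each pH; illustrative values only.
shifts_by_ph = {6.0: 3.8890, 6.5: 3.8900, 7.0: 3.8920, 7.5: 3.8950, 8.0: 3.8990}
deltas = delta_shifts(shifts_by_ph, reference=7.0)
print(deltas)
print(delta_range(deltas))
```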
The amino acids with some of the largest pH-induced chemical shift changes are plotted separately for clarity in Figures 2F–I. Specifically, the exchangeable sidechain protons of arginine (pKa 12.48), cysteine (pKa 8.37), histidine (pKa 6.04), and lysine (pKa 10.54) resulted in sizable pH-induced chemical shift changes for Hα (Figure 2F) and Hδ (Figure 2I) due to their sidechain pKa values. Notably, the Hβ and Hγ shifts (Figures 2G, H) were relatively unaffected and similar to the changes observed for the other amino acids (Figures 2B, C). Serine and threonine showed a relatively large variation of 0.004 to 0.008 ppm in the Hβ chemical shift, likely reflecting an effect of hydrogen bonding on their pH-induced chemical shift changes.
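The link between a sidechain pKa and the size of the pH-induced shift can be illustrated with the standard fast-exchange model, in which the observed shift is the population-weighted average of the protonated and deprotonated forms: δobs = fHA·δHA + (1 − fHA)·δA, with fHA = 1/(1 + 10^(pH − pKa)) from the Henderson–Hasselbalch equation. The Python sketch below uses the pKa values quoted above; it ignores deuterium isotope effects on the uncorrected pH readings, and δHA and δA are placeholders rather than measured shifts.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson–Hasselbalch fraction of the protonated (acid) form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

def observed_shift(ph: float, pka: float, delta_acid: float, delta_base: float) -> float:
    """Population-weighted shift (ppm) under fast exchange between acid and base forms."""
    f = fraction_protonated(ph, pka)
    return f * delta_acid + (1.0 - f) * delta_base

PH_VALUES = (6.0, 6.5, 7.0, 7.5, 8.0)
for name, pka in [("histidine", 6.04), ("cysteine", 8.37), ("lysine", 10.54), ("arginine", 12.48)]:
    fractions = [fraction_protonated(ph, pka) for ph in PH_VALUES]
    print(f"{name:9s} fraction protonated:", " ".join(f"{f:.3f}" for f in fractions))
```

In this model the observed shift changes most steeply when the pH is close to the pKa; for histidine, for example, the protonated fraction falls from about 0.52 at pH 6.0 to about 0.01 at pH 8.0.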
The phosphorus-containing compounds were assigned to one of four groups: (1) nucleic acid analogs, (2) phosphorylated sugars, (3) coenzymes, and (4) choline analogs, amino acids, and phosphoglycerols. Figures 2J–M highlight the chemical shifts of the nucleic acid analogs. Figure 2N shows a generic nucleic acid structure and illustrates the Hα, Hβ, Hγ, and Hδ naming convention for the 1H chemical shifts. Full profiles of the pH-induced 1H chemical shift changes for the remaining phosphorus-containing compounds are presented in Figures S2A–J, and the profiles for all 33 phosphorus-containing compounds are tabulated in Table S3. The wider range of pKa values and the considerably greater structural complexity of the phosphorus-containing compounds relative to the amino acids resulted in greater chemical shift variations. The largest Δ1H ppm for the amino acids was 0.013 for the Hδ of asparagine at pH 8.0 (Figure 2G). By contrast, the largest Δ1H ppm for the phosphorus-containing compounds was 0.078 for the Hδ of glucose 1,6-bisphosphate (GBP) at pH 6.0 (Figure S2D). These compounds also deviated considerably from the pH-dependent downfield shift observed for the amino acids. In particular, the Hα (Figure 2J) shifted drastically upfield (ADP) or downfield (TTP, UMP) at pH 7.5 compared to the other pH values. This lack of a consistent pH-dependent trend in Δ1H ppm was seen for all four groups and for each 1H resonance type, and can again be attributed to pKa differences. Notably, the phosphorylated sugars showed the highest chemical shift variation, which, combined with their complex 1D 1H NMR spectra, will likely lead to greater uncertainty in obtaining correct assignments from complex mixtures.
Temperature Dependent Chemical Shift Variation Across Metabolite Groups
The effect of temperature on the chemical shifts of the 21 amino acids was also evaluated by acquiring spectra at 290K, 294K, 298K, 304K, and 308K. This range brackets the typical target temperature of 298K and captures the likely maximal temperature-induced chemical shift changes encountered in routine metabolomics samples. Temperature-dependent chemical shift changes were measured relative to 298K. Full profiles of the temperature-induced 1H chemical shift changes for the 21 amino acids are tabulated in Table S2. For the amino acids, temperature had a greater impact on chemical shifts than pH, and the Δ1H values showed an opposite trend compared to pH (Figure S1E–F). The Δ1H values tended to increase moving along the amino acid sidechain from the Hα to the Hδ position. Specifically, Δ1H ranged from −0.005 to +0.01 ppm for Hα, −0.005 to +0.025 ppm for Hβ, −0.01 to +0.025 ppm for Hγ, and −0.005 to +0.020 ppm for Hδ. These chemical shift ranges were much broader than those observed for the pH changes.
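Over a narrow window such as 290K to 308K, the temperature dependence of a single resonance can be summarized by a linear temperature coefficient in ppb/K, which could then be used to shift or widen a matching window for a given acquisition temperature. The Python sketch below fits such a coefficient with `numpy.polyfit` and reports Δ1H relative to 298K; the shifts are hypothetical, and linearity over this range is an assumption rather than a result of this study.

```python
import numpy as np

temps_k = np.array([290.0, 294.0, 298.0, 304.0, 308.0])  # acquisition temperatures used here
# Hypothetical shifts (ppm) of one resonance; illustrative values only.
shifts_ppm = np.array([3.7700, 3.7720, 3.7740, 3.7770, 3.7790])

reference = shifts_ppm[temps_k == 298.0][0]
deltas = shifts_ppm - reference  # Δ1H relative to 298K
slope, intercept = np.polyfit(temps_k, shifts_ppm, 1)  # linear fit: shift = slope*T + intercept

print(f"Δ1H range: {deltas.min():+.4f} to {deltas.max():+.4f} ppm")
print(f"temperature coefficient: {slope * 1000:.2f} ppb/K")
```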
Temperature also affected the chemical shifts of the phosphorus-containing compounds but, in contrast to the amino acids, had less of an impact on the distribution of chemical shifts than pH (Figure S2K–N). Full profiles of the temperature-induced 1H chemical shift changes for the 33 phosphorus-containing compounds are tabulated in Table S3. Most of the phosphorus-containing compounds had temperature-induced Δ1H values within ±0.05 ppm, whereas the amino acids ranged from −0.01 to +0.025 ppm and were primarily shifted downfield. Peak shifts due to temperature and pH can vary drastically, and the exact range and magnitude of these variations are specific to each structure and NMR resonance. Consequently, chemical shift variability presents unique challenges when interpreting NMR spectra of complex mixtures and creates serious uncertainties in obtaining correct metabolite assignments.
pH Dependent Changes to Chemical Shift Also Affects Peak Linewidths
In addition to chemical shift variability, pH-induced changes in peak linewidths can further increase the challenges of assigning metabolites within complex mixtures. To quantify pH-induced linewidth changes, individual peak regions were fitted using MestreNova to estimate linewidths at each pH. Again, changes in linewidths were measured relative to pH 7. Full profiles of the temperature- and pH-induced changes in peak multiplicity and linewidths for the 21 amino acids and 33 phosphorus-containing compounds are tabulated in Tables S2 and S3, respectively. Stack plots of expanded 1D 1H spectra for select metabolites and NMR resonances over the pH range of 6 to 8 are shown in Figures 3A–I. Each stack plot includes a chemical structure of the metabolite that highlights the selected NMR resonances and a line plot of the linewidth changes at each pH. Selected peaks for the amino acids arginine (Figure 3A), histidine (Figure 3B), and lysine (Figure 3C) show that peaks broaden variably as a function of pH. Arginine and histidine exhibited broader linewidths at lower pH, whereas lysine showed broader linewidths at higher pH. The effect of pH on apparent peak multiplicity was also demonstrated with histidine (Figure 3B), where the Hα peak changed from a triplet at pH 6.0 to a quartet at pH 6.5–8.0.
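Linewidths in this study were obtained by peak fitting in MestreNova. As a simpler, swapped-in alternative, the Python sketch below estimates the full width at half maximum (FWHM) of a single, baseline-corrected peak by linearly interpolating the half-height crossings, and checks it against a synthetic Lorentzian of known 1.5 Hz width (a hypothetical value). Dividing a width in Hz by the spectrometer frequency in MHz (700 here) converts it to ppm, and a pH-induced change is then the difference from the width at pH 7.

```python
import numpy as np

def fwhm(x: np.ndarray, y: np.ndarray) -> float:
    """FWHM of a single, baseline-corrected peak via linear interpolation of the
    half-height crossings. Assumes x is increasing and the peak is fully sampled."""
    i_max = int(np.argmax(y))
    half = y[i_max] / 2.0
    left = i_max
    while left > 0 and y[left] > half:  # walk left to the first point at or below half height
        left -= 1
    right = i_max
    while right < y.size - 1 and y[right] > half:  # walk right likewise
        right += 1
    # np.interp needs increasing sample points, hence the ordering of each pair
    x_left = np.interp(half, [y[left], y[left + 1]], [x[left], x[left + 1]])
    x_right = np.interp(half, [y[right], y[right - 1]], [x[right], x[right - 1]])
    return float(x_right - x_left)

def lorentzian(x: np.ndarray, center: float, width: float) -> np.ndarray:
    """Unit-height Lorentzian whose FWHM equals `width`."""
    hwhm = width / 2.0
    return hwhm**2 / ((x - center) ** 2 + hwhm**2)

freq_hz = np.linspace(-20.0, 20.0, 4001)  # 0.01 Hz spacing
peak = lorentzian(freq_hz, center=0.0, width=1.5)
width_hz = fwhm(freq_hz, peak)
print(f"{width_hz:.3f} Hz = {width_hz / 700.0 * 1000:.2f} ppb at 700 MHz")
```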
Figure 3. NMR Spectral Variations as a Function of pH and Metabolite Group.

1D 1H NMR stack plots and a corresponding line plot showing the linewidth changes relative to pH 7 for a select set of amino acids: (A) arginine (ARG), (B) histidine (HIS), (C) lysine (LYS), and phosphorylated compounds: (D) pyridoxal 5’-phosphate (P5P), (E) fructose 6-phosphate (F6P), (F) glucose 1,6-bisphosphate (GBP), (G) glucose 6-phosphate (G6P), (H) cytidine monophosphate (CMP), and (I) thymidine monophosphate (TMP). 1D 1H spectra traces and associated circles are colored according to pH: 6.0 (yellow), 6.5 (green), 7.0 (black), 7.5 (teal), and 8.0 (blue). The region of the 1D 1H spectra colored grey in panel I corresponds to the peaks that merge with the Hγ peak of TMP.
Similar results were observed for selected peaks of the phosphorus-containing compounds pyridoxal 5’-phosphate (Figure 3D), fructose 6-phosphate (Figure 3E), glucose 1,6-bisphosphate (Figure 3F), glucose 6-phosphate (Figure 3G), cytidine monophosphate (Figure 3H), and thymidine monophosphate (Figure 3I). As with the amino acids, the selected peaks broaden differently as a function of pH. The NMR peaks of fructose 6-phosphate, glucose 1,6-bisphosphate, glucose 6-phosphate, and cytidine monophosphate became sharper at lower pH, whereas those of pyridoxal 5’-phosphate became broader. Apparent peak multiplicities again changed as a function of pH for fructose 6-phosphate and thymidine monophosphate. At pH 7, the fructose 6-phosphate Hβ appears as a single, low-intensity complex multiplet that separates into two distinct multiplets at both higher and lower pH values. Similarly, decreasing the pH causes the thymidine monophosphate Hγ multiplet to merge with a neighboring multiplet. Exchangeable protons further complicate pH-dependent linewidth changes, leading to variable peak broadening. Both thymidine monophosphate and glucose 1,6-bisphosphate demonstrate this effect, with greater peak broadening at pH 6.5 and 7.5 than at pH 6.0 and 8.0.
pH Dependent Linewidth Changes Lead to Spectral Crowding
Variable changes in peak positions, splitting patterns, and linewidths make it difficult to analyze a 1D 1H NMR spectrum and assign metabolites. The assignment problem is further complicated by complex mixtures, which lead to peak overlap and spectral crowding. To illustrate the problem, bubble plots are shown in Figure 4, where each circle’s center and diameter correspond to the peak center and the chemical shift range, respectively, that the peak occupies at a given pH. The distribution of chemical shifts and linewidths results in spectral regions that are densely populated with extensive peak overlap. Similar results were seen for the peak distributions of the amino acids (Figure 4A) and coenzymes (Figure 4D). The expanded spectral regions at 0–5 ppm (Figure 4B) and 6.5–8.5 ppm (Figure 4C) further demonstrate the high degree of peak overlap for the amino acids at a single pH of 6.5. A similar result was obtained with the phosphorus-containing compounds. The expanded spectral regions of 1–3 ppm (Figure 4E), 3–4 ppm (Figure 4F), 4–5 ppm (Figure 4G), and 5–11 ppm (Figure 4H) at pH 6.5 again highlight the spectral crowding caused by the variability in chemical shifts and linewidths. This problem would be further exacerbated if the data from the amino acids and phosphorus-containing compounds were combined into a single bubble plot.
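The crowding shown in the bubble plots can be quantified by treating each peak as an interval [center − width/2, center + width/2] on the ppm axis and listing overlapping pairs. The Python sketch below does this with a sort-and-sweep pass; the four peaks, their centers, and their widths are illustrative values, not measurements read from Figure 4.

```python
from typing import List, Tuple

Peak = Tuple[str, float, float]  # (label, center in ppm, width in ppm)

def overlapping_pairs(peaks: List[Peak]) -> List[Tuple[str, str]]:
    """Label pairs whose [center - width/2, center + width/2] intervals overlap."""
    intervals = sorted((center - width / 2, center + width / 2, label)
                       for label, center, width in peaks)
    pairs: List[Tuple[str, str]] = []
    active: List[Tuple[float, str]] = []  # (end, label) of intervals that may still overlap
    for start, end, label in intervals:
        # drop intervals that end before this one starts; the rest all overlap it
        active = [(e, other) for e, other in active if e >= start]
        pairs.extend((other, label) for _, other in active)
        active.append((end, label))
    return pairs

# Illustrative peaks only (not values read from Figure 4).
PEAKS: List[Peak] = [
    ("Ala Hβ", 1.475, 0.020),
    ("Leu Hδ", 0.960, 0.030),
    ("Val Hγ", 0.985, 0.030),
    ("Ile Hδ", 0.930, 0.020),
]
pairs = overlapping_pairs(PEAKS)
print(f"{len(pairs)} overlapping pair(s): {pairs}")
```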
Figure 4. Spectral Crowding Due to pH-Induced Linewidth Changes in Complex Metabolic Mixtures.

Bubble plots with circle diameters corresponding to peak width and color corresponding to pH 6.0 (yellow), 6.5 (green), 7.5 (teal), 8.0 (blue) for (A) amino acids and (D) coenzymes. Expanded views of the amino acid bubble plots corresponding to the (B) 1.0 to 5.0 ppm region and the (C) 6.5 to 8.5 ppm region. (E-H) Expanded views of the bubble plots corresponding to the four groups of phosphorylated compounds consisting of nucleic acid analogs (red), sugars (blue), coenzymes (gray), and cholines, amino acids and glycerols (green). Expanded views correspond to the (E) 1 to 3 ppm region, (F) the 3 to 4 ppm region, (G) the 4 to 5 ppm region, and the (H) 5 to 11 ppm region.
The spectral region with the highest peak density is 2–4 ppm. Such spectral crowding would likely make it difficult to identify peak splitting patterns, hindering metabolite assignments as individual peaks lose their unique, identifying characteristics. As demonstrated in Figures 3A–I, pH can alter apparent peak splitting patterns, peak positions, and peak resolution. Factoring in these variations will only worsen peak crowding and overlap, making the annotation of 1D 1H NMR spectra more difficult, if not impossible. Notably, the amino acid mixture (Figures 4A–C) contains only 21 compounds, the coenzyme mixture (Figure 4D) only 6 compounds, and the full set of phosphorus-containing compounds (Figures 4E–H) only 33 compounds. These mixtures are significantly smaller than metabolomics samples from biofluids or cell and tissue extracts, which can contain more than a hundred detectable metabolites.
Matrix Effects in Known Mixtures
The Chenomx batch fit algorithm, the 700 MHz Chenomx metabolite library (338 compounds), and two individualized metabolite libraries (i.e., amino acids and phosphorus-containing compounds) were used to demonstrate the challenges of assigning metabolites in complex mixtures and to evaluate the robustness of an automated assignment approach that does not account for chemical shift variability. The two individualized metabolite libraries were derived from the complete 700 MHz Chenomx metabolite library. Six phosphorus-containing compounds (inosine monophosphate, thymidine monophosphate, glucose 1,6-bisphosphate, dihydroxyacetone phosphate, O-phospho-L-tyrosine, and cyclophosphamide) were not available in the 700 MHz Chenomx metabolite library. Consequently, the individualized library for the phosphorus-containing compounds contained only 27 reference spectra instead of 33. Importantly, the 6 compounds missing from the reference libraries were excluded from the calculations of true positive, false positive, and false negative rates. These three metabolite libraries were then used to assign the 1D 1H NMR spectra obtained for our known amino acid (Figure 5A) and phosphorus-containing compound (Figure 5B) mixtures. The complete list of true positive, false positive, and false negative rates is provided in Table S4.
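The exclusion rule above can be made explicit with set operations. In the Python sketch below, compounds absent from the reference library never count as false negatives, and any assignment of a compound that is not in the mixture counts as a false positive. The compound sets are hypothetical, and the function illustrates the rule rather than reproducing the exact bookkeeping behind Table S4.

```python
from typing import Iterable, Set, Tuple

def score_assignments(present: Iterable[str], assigned: Iterable[str],
                      library: Iterable[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (true positives, false positives, false negatives)."""
    present_set = set(present)
    assigned_set = set(assigned)
    scorable = present_set & set(library)  # compounds missing from the library are excluded
    true_pos = assigned_set & scorable
    false_pos = assigned_set - present_set
    false_neg = scorable - assigned_set
    return true_pos, false_pos, false_neg

# Hypothetical example: GBP and DHAP are in the mixture but absent from the library.
mixture = {"G6P", "F6P", "GBP", "DHAP"}
library = {"G6P", "F6P", "glucose", "lactate"}
calls = {"G6P", "glucose"}
tp, fp, fn = score_assignments(mixture, calls, library)
print(sorted(tp), sorted(fp), sorted(fn))
```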
Figure 5. The Accuracy of Automated NMR Assignments Is Affected by Chemical Shift Variability.

1D 1H NMR stack plots of (A) the amino acid mixture, (B) a phosphorylated compound mixture (Group 2), and human (C) serum and (D) urine samples. Experimental 1D 1H NMR spectra are colored black. The spectral line comprising the sum fit of the reference NMR spectra for the metabolites identified by Chenomx is colored red. The manually assigned 1D 1H NMR spectra are colored blue. The solid and dashed boxes overlaid on the 1D 1H NMR spectra identify the expanded spectral regions shown to the right of each spectrum. The solid box identifies an expanded spectral region that matches well to the sum of reference spectra, while the dashed box identifies a spectral region that is poorly fitted by the sum of reference spectra. The Venn diagrams summarize the total number of metabolites (white circle), the correctly assigned metabolites (green circle), and the incorrectly assigned metabolites (red circle) identified by Chenomx.
For the amino acid mixture of 21 compounds at pH 7.0 and 298K, the batch fit algorithm assigned a total of 90 metabolites. Of these 90 metabolites, 20% (18) were true positives and 80% (72) were false positives. Three amino acids (14%), asparagine, cysteine, and histidine, were false negatives. Repeating the batch assignment for the five pH values and four temperatures yielded an average true positive rate of 17.4% ± 0.2%. The batch assignment of the 1D 1H NMR spectra for the amino acid mixture was then repeated with the individualized metabolite library containing only the 21 amino acids present in the mixture. A true positive rate of 81% was achieved for the amino acid mixture at pH 7.0 and 298K, and an average true positive rate of 73% ± 1.1% was achieved across the entire set of pH and temperature conditions. With either the 700 MHz Chenomx metabolite library or the amino acid library, the true positive rate decreased as the pH deviated from 7.0. Notably, temperature deviations from 298K had a much smaller impact on true positive rates. Four amino acids (i.e., glycine, phenylalanine, serine, and tryptophan) were correctly assigned 100% of the time regardless of pH or temperature. Conversely, histidine, glutamine, cysteine, and cystine were correctly assigned less than 50% of the time. Cysteine was never assigned by the batch fit algorithm, and histidine was assigned only once. This outcome is consistent with the chemical shift and linewidth variations, in which cysteine and histidine consistently showed larger chemical shift deviations from pH 7.0 (Figures 2F–I, 3B). The expanded spectral regions in Figure 5A show where Chenomx accurately assigned (solid box) and incorrectly assigned (dashed box) amino acids within the mixture. The red spectral line shows the sum fit of the reference spectra for the metabolites that Chenomx identified as matching the black experimental 1D 1H NMR spectrum.
The higher intensity of the fit line (red) compared to the experimental spectral intensity (black) is indicative of overfitting by the software.
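The visual criterion for overfitting, a red fit line that rises above the black experimental trace, can be approximated numerically by comparing integrated intensities over a ppm region. The sketch below is a simple heuristic of our own, not the Chenomx algorithm; it assumes both spectra are sampled on the same uniform ppm grid (so summing points is proportional to area), and the function name `classify_region` and the 5% relative tolerance are hypothetical choices.

```python
def classify_region(ppm, experimental, fit, start, end, rel_tol=0.05):
    """Label a ppm region as 'overfit', 'underfit', or 'ok'.

    The residual (fit - experimental) is summed over the region and
    normalised by the summed experimental intensity; a positive excess
    means the summed reference spectra carry more intensity than observed.
    """
    exp_area = 0.0
    residual = 0.0
    for x, e, f in zip(ppm, experimental, fit):
        if start <= x <= end:
            exp_area += e
            residual += f - e
    if exp_area <= 0:
        raise ValueError("no experimental intensity in the region")
    ratio = residual / exp_area
    if ratio > rel_tol:
        return "overfit"
    if ratio < -rel_tol:
        return "underfit"
    return "ok"


# Toy spectra on a shared grid (illustrative values only).
ppm = [3.0, 3.1, 3.2, 3.3]
experimental = [1.0, 2.0, 2.0, 1.0]
print(classify_region(ppm, experimental, [1.2, 2.5, 2.4, 1.1], 3.0, 3.3))  # overfit
print(classify_region(ppm, experimental, [0.5, 1.5, 1.6, 0.8], 3.0, 3.3))  # underfit
```

A region-level check like this can flag crowded regions that look well fit by eye but carry excess fitted intensity; it says nothing about which individual reference spectra are responsible.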
The 1D 1H NMR spectra for the mixtures of phosphorus-containing compounds were assigned first with the same 700 MHz Chenomx metabolite library and then with the individualized library of phosphorus-containing compounds. The batch fit algorithm performed markedly worse for the phosphorylated compounds than for the amino acids. The best results were obtained for glucose 1-phosphate, which was correctly assigned under all conditions except pH 6.0, and for glucose 6-phosphate, which was correctly assigned in all spectra except at pH 6.0 and 8.0. The group 2 mixture of phosphorylated sugars at pH 7.0 and 298K produced only a 7.7% (2) true positive rate, a 76.9% (20) false positive rate, and a 15.4% (4) false negative rate. The automated assignments improved when the targeted library of phosphorus-containing compounds was used: Chenomx correctly assigned 50% of the compounds, with 3 true positives, 3 false negatives, and zero false positives. Overall, the four mixtures of phosphorus-containing compounds had true positive rates predominantly below 50% across all pH and temperature conditions. The group 2 and 4 mixtures performed slightly better than the others, with average true positive rates of 44.4% ± 14.2% and 43.3% ± 17.6%, respectively. These poor results may be partially attributed to the greater peak crowding in the sugar region of the NMR spectra and the limited chemical shift dispersion. The expanded spectral region in Figure 5B clearly demonstrates the poor fit between the red spectral line, comprising the sum fit of the reference spectra for the metabolites identified by Chenomx, and the black experimental 1D 1H NMR spectrum. The irregular fluctuations in the true positive rates across pH and temperature were correlated with the chemical shift variability detailed in Figures 2 and S2.
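The reported percentages can be reproduced from the stated counts, although the two mixtures appear to use different denominators: the phosphorylated-sugar rates match each count divided by TP + FP + FN (e.g., 2/26 = 7.7%), whereas the amino acid figures match TP and FP divided by the 90 assigned compounds and FN divided by the 21 compounds in the mixture. This is our reading of the arithmetic, not a definition stated by the authors, and the helper names below are hypothetical.

```python
def rates_over_all(tp, fp, fn):
    """TP, FP, and FN counts as fractions of TP + FP + FN."""
    total = tp + fp + fn
    return tp / total, fp / total, fn / total


def fractions_of_assigned(tp, fp):
    """TP and FP counts as fractions of all compounds the software assigned."""
    assigned = tp + fp
    return tp / assigned, fp / assigned


def as_percent(values):
    """Convert fractions to percentages rounded to one decimal place."""
    return [round(100 * v, 1) for v in values]


# Group 2 phosphorylated sugars, full library, pH 7.0 and 298K
print(as_percent(rates_over_all(2, 20, 4)))       # [7.7, 76.9, 15.4]
# Group 2, targeted library of phosphorus-containing compounds
print(as_percent(rates_over_all(3, 0, 3)))        # [50.0, 0.0, 50.0]
# Amino acid mixture, full library: 90 assignments, 21 compounds present
print(as_percent(fractions_of_assigned(18, 72)))  # [20.0, 80.0]
print(round(100 * 3 / 21, 1))                     # 14.3
```

Making the denominator explicit matters when comparing rates across mixtures, since the same counts yield different percentages under each convention.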
Biofluid samples were used to further demonstrate the challenges that chemical shift variability imposes on an automated approach to annotating complex 1D 1H NMR spectra. The NMR spectra were first fit automatically and then manually triaged using the 700 MHz Chenomx metabolite library. For the 1D 1H NMR spectrum of the serum sample shown in Figure 5C, the automated assignment fit a total of 75 compounds, which manual triage reduced to 52. The dashed box highlights an expanded spectral region that is poorly fit by the automated assignments, while the solid box highlights an expanded region that appears to be well fit by the software. Nevertheless, the solid box includes a crowded region that is still overfit by the software, while the dashed box region appears underfit. For the 1D 1H NMR spectrum of the urine sample shown in Figure 5D, the automated assignment fit a total of 125 compounds, which manual triage reduced to 46. The expanded spectral region highlighted by the solid box appears to be well fit by the software, but closer inspection shows that it is actually overfit, while the dashed region is again underfit. Overall, the automated assignments falsely fit numerous compounds because the software cannot accurately account for variability in both peak position and linewidth, which is consistent with a similar conclusion by Tredwell et al. (2011) [16]. Most assignment software can make only minor adjustments to account for peak fluctuations, so the accurate annotation of 1D 1H NMR spectra still requires manual intervention [8]. It is important to note that the results reported with the Chenomx batch fit algorithm are not necessarily representative of all metabolomics assignment software. 
Instead, our analysis was intended to highlight the high failure rate that is likely to occur if chemical shift and linewidth variability are not properly accounted for by either a manual or automated assignment protocol.
CONCLUSION
While the automated assignment of 1D 1H NMR spectra is the ideal scenario for metabolomics analyses, it remains fraught with problems. Variations in pH, temperature, ionic strength, buffer, mixture composition, sample preparation, and magnetic field strength, among other issues, pose serious challenges when annotating an experimental NMR spectrum with a database of reference spectra. Current databases contain limited data and do not account for the various environmental factors that contribute to uncertainty in peak position. We have shown that the accuracy of an automated assignment decreases when peak variability is not adequately accounted for, which also confounds a manual assignment [16]. Matching the pH, temperature, ionic strength, and magnetic field strength to the conditions under which the reference data used by automated assignment software were collected improves assignment accuracy, but it does not account for chemical shift variability due to other factors, such as differences in mixture composition or metabolite concentration [9]. Expanding reference databases to include matrix and sample-condition effects would improve data analysis. Specifically, knowing the true chemical shift uncertainty associated with each individual NMR resonance or functional group would precisely define the search parameters (i.e., the chemical shift range) needed to reliably match experimental and reference peaks while avoiding overfitting the data [8]. To partly address this need, we experimentally measured the chemical shift variability of 54 common metabolites over modest differences in pH (6 to 8) and temperature (290K to 308K) as a surrogate for the natural variability encountered in complex metabolomics mixtures. It is important to note that other factors, such as differences in ionic strength, buffers, metabolite concentrations, and mixture composition, will also modulate chemical shifts. 
Our analysis of the pH and temperature dependence of chemical shifts is only the first step toward completely characterizing the true range of chemical shift errors. Incorporating known chemical shift variability on a per-resonance basis into automated metabolite assignment software may improve the overall accuracy of one of the most crucial steps in a metabolomics project [8].
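The per-resonance search windows proposed above might look like the following sketch. It is an illustration under stated assumptions, not an implementation from this study or from any assignment package: the `Resonance` class and the function names are hypothetical, and the reference shifts and tolerances are placeholder values chosen only to show that a pH-sensitive resonance (here histidine) may need a wider window than a stable one (here glycine).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resonance:
    metabolite: str
    label: str
    shift_ppm: float      # reference chemical shift
    tolerance_ppm: float  # per-resonance search half-width (placeholder)


# Placeholder reference table; shifts and tolerances are illustrative only.
REFERENCES = [
    Resonance("glycine", "CH2", 3.55, 0.01),
    Resonance("histidine", "H2", 7.90, 0.10),  # pH-sensitive: wide window
    Resonance("histidine", "H4", 7.09, 0.05),
    Resonance("lactate", "CH3", 1.32, 0.01),
]


def match_peaks(peaks_ppm, references):
    """Map each reference resonance to the experimental peaks in its window."""
    matches = {}
    for ref in references:
        lo = ref.shift_ppm - ref.tolerance_ppm
        hi = ref.shift_ppm + ref.tolerance_ppm
        matches[ref] = [p for p in peaks_ppm if lo <= p <= hi]
    return matches


def candidate_metabolites(peaks_ppm, references):
    """Metabolites for which every reference resonance found a peak."""
    found = {}
    for ref, peaks in match_peaks(peaks_ppm, references).items():
        found.setdefault(ref.metabolite, []).append(bool(peaks))
    return sorted(met for met, hits in found.items() if all(hits))


print(candidate_metabolites([3.556, 7.12, 7.93], REFERENCES))
# ['glycine', 'histidine']
```

A single global tolerance would have to be either wide enough to catch the drifting histidine resonances, inviting spurious matches for stable resonances, or narrow enough to avoid them, missing histidine; per-resonance windows avoid that trade-off.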
Supplementary Material
Highlights.
Chemical shift variability is a serious problem in NMR metabolomics
Chemical shift errors lead to high false positive and false negative rates
Chemical shift variance was measured per NMR resonance for common metabolites
ACKNOWLEDGMENTS
We would like to thank the NIH NeuroBioBank, the University of Maryland Brain and Tissue Bank, and the Saunders Medical Center for providing the serum and urine samples used in this study. We would also like to thank Dr. Martha Morton, Director of the Molecular Analysis and Characterization Facility within the Department of Chemistry at the University of Nebraska-Lincoln for her assistance in acquiring NMR data. This material is based upon work supported by the National Science Foundation under Grant Number (1660921). This work was supported in part by funding from the Nebraska Center for Integrated Biomolecular Communication (P20 GM113126, NIGMS). The research was performed in facilities renovated with support from the National Institutes of Health (RR015468-01). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
The following are the Supplementary data to this article:
An Excel file containing: Table S1: List of compounds and mixture compositions; Table S2: Complete List of Chemical Shift Changes for the 21 Amino Acids; Table S3: Complete List of Chemical Shift Changes for the 33 Phosphorus Containing Compounds; and Table S4: List of True Positives, False Positives, and False Negative Rates. A PDF file containing: Figure S1: Chemical Shift Variation in Amino Acids; Figure S2: Chemical Shift Variation in Phosphorous Containing Compounds; and Figure S3: Automated 1D 1H NMR Phosphorous Mixtures.
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
REFERENCES
- [1]. Nagana Gowda GA, Raftery D, Can NMR solve some significant challenges in metabolomics?, Journal of Magnetic Resonance, 260 (2015) 144–160.
- [2]. Emwas A-HM, The Strengths and Weaknesses of NMR Spectroscopy and Mass Spectrometry with Particular Focus on Metabolomics Research, in: Bjerrum JT (Ed.) Metabonomics: Methods and Protocols, Springer New York, New York, NY, 2015, pp. 161–193.
- [3]. Emwas AH, Roy R, McKay RT, Tenori L, Saccenti E, Gowda GAN, Raftery D, Alahmari F, Jaremko L, Jaremko M, Wishart DS, NMR Spectroscopy for Metabolomics Research, Metabolites, 9 (2019).
- [4]. Marshall DD, Powers R, Beyond the paradigm: Combining mass spectrometry and nuclear magnetic resonance for metabolomics, Progress in nuclear magnetic resonance spectroscopy, 100 (2017) 1–16.
- [5]. Markley JL, Brüschweiler R, Edison AS, Eghbalnia HR, Powers R, Raftery D, Wishart DS, The future of NMR-based metabolomics, Current Opinion in Biotechnology, 43 (2017) 34–40.
- [6]. Wishart DS, Quantitative metabolomics using NMR, TrAC Trends in Analytical Chemistry, 27 (2008) 228–237.
- [7]. Nagana Gowda GA, Raftery D, NMR-Based Metabolomics, Adv Exp Med Biol, 1280 (2021) 19–37.
- [8]. Vu T, Xu Y, Qiu Y, Powers R, Shifting-corrected regularized regression for 1H NMR metabolomics identification and quantification, Biostatistics, (2022).
- [9]. Takis PG, Schäfer H, Spraul M, Luchinat C, Deconvoluting interrelationships between concentrations and chemical shifts in urine provides a powerful analysis tool, Nature Communications, 8 (2017) 1662.
- [10]. Dona AC, Kyriakides M, Scott F, Shephard EA, Varshavi D, Veselkov K, Everett JR, A guide to the identification of metabolites in NMR-based metabonomics/metabolomics experiments, Computational and Structural Biotechnology Journal, 14 (2016) 135–153.
- [11]. Snytnikova OA, Khlichkina AA, Sagdeev RZ, Tsentalovich YP, Evaluation of sample preparation protocols for quantitative NMR-based metabolomics, Metabolomics, 15 (2019) 84.
- [12]. Nawrocka EK, Urbańczyk M, Koziński K, Kazimierczuk K, Variable-temperature NMR spectroscopy for metabolite identification in biological materials, RSC Advances, 11 (2021) 35321–35325.
- [13]. Bhinderwala F, Evans P, Jones K, Laws BR, Smith TG, Morton M, Powers R, Phosphorus NMR and Its Application to Metabolomics, Anal Chem, 92 (2020) 9536–9545.
- [14]. Spicer R, Salek RM, Moreno P, Cañueto D, Steinbeck C, Navigating freely-available software tools for metabolomics analysis, Metabolomics, 13 (2017) 106.
- [15]. O’Shea K, Misra BB, Software tools, databases and resources in metabolomics: updates from 2018 to 2019, Metabolomics, 16 (2020) 36.
- [16]. Tredwell GD, Behrends V, Geier FM, Liebeke M, Bundy JG, Between-Person Comparison of Metabolite Fitting for NMR-Based Quantitative Metabolomics, Analytical Chemistry, 83 (2011) 8683–8687.
- [17]. Wishart DS, Guo A, Oler E, Wang F, Anjum A, Peters H, Dizon R, Sayeeda Z, Tian S, Lee BL, Berjanskii M, Mah R, Yamamoto M, Jovel J, Torres-Calzada C, Hiebert-Giesbrecht M, Lui VW, Varshavi D, Varshavi D, Allen D, Arndt D, Khetarpal N, Sivakumaran A, Harford K, Sanford S, Yee K, Cao X, Budinski Z, Liigand J, Zhang L, Zheng J, Mandal R, Karu N, Dambrova M, Schiöth HB, Greiner R, Gautam V, HMDB 5.0: the Human Metabolome Database for 2022, Nucleic Acids Res, 50 (2022) D622–D631.
- [18]. Ulrich EL, Akutsu H, Doreleijers JF, Harano Y, Ioannidis YE, Lin J, Livny M, Mading S, Maziuk D, Miller Z, Nakatani E, Schulte CF, Tolmie DE, Kent Wenger R, Yao H, Markley JL, BioMagResBank, Nucleic Acids Research, 36 (2008).
- [19]. Bhinderwala F, Lei S, Woods J, Rose J, Marshall DD, Riekeberg E, Leite ADL, Morton M, Dodds ED, Franco R, Powers R, Metabolomics Analyses from Tissues in Parkinson’s Disease, Methods in molecular biology (Clifton, N.J.), 1996 (2019) 217–257.
- [20]. Bhinderwala F, Powers R, NMR Metabolomics Protocols for Drug Discovery, Methods Mol Biol, 2037 (2019) 265–311.
- [21]. Nguyen BD, Meng X, Donovan KJ, Shaka AJ, SOGGY: Solvent-optimized double gradient spectroscopy for water suppression. A comparison with some existing techniques, Journal of Magnetic Resonance, 184 (2007) 263–274.
- [22]. Maciejewski MW, Schuyler AD, Gryk MR, Moraru II, Romero PR, Ulrich EL, Eghbalnia HR, Livny M, Delaglio F, Hoch JC, NMRbox: A Resource for Biomolecular NMR Computation, Biophys J, 112 (2017) 1529–1534.
- [23]. Worley B, Powers R, Deterministic multidimensional nonuniform gap sampling, (2015).
- [24]. Orekhov VY, Ibraghimov I, Billeter M, Optimizing resolution in multidimensional NMR by three-way decomposition, Journal of biomolecular NMR, 27 (2003) 165–173.
- [25]. Vignoli A, Ghini V, Meoni G, Licari C, Takis PG, Tenori L, Turano P, Luchinat C, High-Throughput Metabolomics by 1D NMR, Angew Chem Int Ed Engl, 58 (2019) 968–994.
- [26]. Amberg A, Riefke B, Schlotterbeck G, Ross A, Senn H, Dieterle F, Keck M, NMR and MS Methods for Metabolomics, Methods Mol Biol, 1641 (2017) 229–258.
