Abstract
Reference standardization was developed to address quantification and harmonization challenges for high-resolution metabolomics (HRM) data collected across different studies or analytical methods. Reference standardization relies on the concurrent analysis of calibrated pooled reference samples at predefined intervals and enables a single-step batch correction and quantification for high-throughput metabolomics. Here, we provide quantitative measures of approximately 200 metabolites for each of three pooled reference materials (220 metabolites for Qstd3, 211 metabolites for CHEAR, 204 metabolites for NIST1950) and show that applying this approach for quantification supports harmonization of metabolomics data collected from 3677 human samples in 17 separate studies analyzed by two complementary HRM methods over a 17-month period. The results establish reference standardization as a method suitable for harmonizing large-scale metabolomics data and for extending capabilities to quantify large numbers of known and unidentified metabolites detected by high-resolution mass spectrometry methods.
High-resolution metabolomics (HRM), based on the use of liquid chromatography coupled to high-resolution mass spectrometry (LC−HRMS), detects thousands of mass spectral features generated by both known and unidentified metabolites in biological samples. Methods were developed stepwise to improve data extraction,1–4 signal annotation and identification,5 quantification,1 and use in model systems and human studies.6–11 HRM analyses provide extensive coverage of metabolites12 from endogenous metabolic pathways, diet, therapeutics, xenobiotics, or the microbiome and can be used for development of cumulative metabolomics databases suitable for personalized medicine, systems pharmacology, and exposome research.13–15 HRM is not intended to replace analytical methods for specific targeted chemical analysis but rather to provide an alternative to improve metabolic coverage and lower costs for omics scale research.
Untargeted HRM data are typically reported as accurate mass-to-charge ratios (m/z) with retention times (rt) and associated peak intensities. Annotating these features to provide lists of identified metabolites with estimated concentrations would facilitate cross-study and cross-method comparisons and development of harmonized cumulative metabolomics databases. However, harmonization of LC−HRMS-based metabolomics data remains a challenge because preanalytical, analytical, and postanalytical factors can impact metabolite detection, identification, and quantification. While several strategies have been proposed to account for these differences including data normalization,16–20 removing matrix effects and ion suppression,21,22 or using pooled quality control samples and internal standards for identification and quantification, only half of spectral features detected by HRM correspond to previously characterized metabolites for which authentic standards are available.12 The remaining spectral features detected by HRM are not well annotated or characterized; however, many show significant associations with disease.13,23
Reference standardization provides a practical and community-based solution for harmonizing cross-platform and cross-laboratory data collected from untargeted HRM studies.1,24 In principle, this approach corrects for systematic technical errors by normalizing metabolite spectral peak intensities to metabolite concentrations relative to a calibrated reference sample analyzed with study samples. An ideal reference should exist in sufficient quantities for long-term routine use with every batch of samples analyzed and be representative of the biochemical composition of study samples. In practice, this can be achieved for individual laboratories by creation of a pooled reference that is calibrated against a widely available reference,1 such as National Institute of Standards and Technology Standard Reference Material-1950 (NIST SRM1950).25
Because ion abundances detected by HRM are generally proportional to metabolite concentrations, the instrument response obtained for an identified metabolite with a stable, known concentration in the reference can be used to estimate concentrations of the same metabolite detected in study samples. Since most metabolites in reference materials (e.g., NIST1950) are stable long-term when stored at −80 °C,1 this approach enables retrospective quantification of metabolites as additional metabolites are characterized in study samples provided (1) the same metabolite is identified in the reference sample and (2) the ratio of metabolite peak areas between a reference and a study sample is consistently measured across several studies.1 Thus, a thorough examination of reference “metabolomes” and extensive characterization and reporting of metabolite identifications and concentrations in one or more reference samples would provide a practical and scalable community-based strategy to estimate concentrations and harmonize data for large numbers of metabolites.
To evaluate reference standardization as a strategy for harmonizing metabolite measurements across multiple studies, concentrations of approximately 200 metabolites were measured in each plasma pool (Qstd3, CHEAR, NIST1950). We tested the reproducibility of this approach by comparing inter-reference metabolite ratios over multiple analytical batches. We tested the applicability of this approach to quantify metabolites in heparin plasma from EDTA plasma, and also evaluated the accuracy of this approach by comparing calculated values against compiled ranges in HMDB.26 Finally, we tested use of these values to harmonize metabolomics data on metabolites collected from 3677 human plasma samples analyzed over a 17-month period across 17 different studies using two complementary methods.
MATERIALS AND METHODS
Reference Plasma Materials.
EDTA and heparin are commonly used anticoagulants for preparation of blood plasma; two EDTA plasma pools and a lithium heparin plasma pool were used as reference materials for this study. Qstd3: Pooled EDTA plasma obtained from 50 healthy donors purchased from Equitech-Bio (SHP45) without information on drug use or fasting status. CHEAR: Pooled EDTA plasma obtained from 100 adults (50 males and 50 females) purchased from BioreclamationIVT. NIST1950:25,27 Pooled lithium heparin plasma obtained from 100 healthy volunteers intended for use as a healthy reference human plasma metabolome; plasma was collected from fasted individuals with no documented drug use 72 h prior to sample collection.
Standards and Standard Curve Preparation.
Authentic chemical standards used for preparation of standard curves were from commercially available libraries (Sigma-Aldrich MSMLS) or individually purchased with stated purities of >95%. Mixtures of these standards were prepared into stock solutions and used for preparation of standard curves in 0.1% saline (NaCl) solution (3 concentrations and blank) and Qstd3 (6 concentrations and unspiked Qstd3). In total, over 700 chemical standards were analyzed for this study. CHEAR and Qstd3 were analyzed with three technical replicates per column with every batch (54 total batches); because supplies of SRM NIST1950 are limited, NIST1950 was analyzed with three technical replicates every 4 batches.
Sample Preparation for LC−FTMS Analysis.
50 µL of sample (plasma or saline) was mixed with 100 µL of acetonitrile containing a mixture of 9 stable isotope internal standards.28 Sample mixtures were incubated on ice for 30 min, and centrifuged for 10 min at 14 000g at 4 °C to pellet proteins. Supernatants were transferred to autosampler vials and immediately loaded onto a chilled 4 °C autosampler for analysis.
Instrumental Analysis.
Five µL aliquots of sample extracts were analyzed using liquid chromatography and Fourier transform high-resolution mass spectrometry (Dionex Ultimate 3000, HF Q-Exactive, Thermo Scientific). A dual pump configuration on the chromatographic system enabled parallel analyte separation and column flushing.28 Sample extracts were injected and analyzed using hydrophilic interaction liquid chromatography (HILIC) with positive electrospray ionization (ESI+) and also reverse phase (C18) chromatography with negative electrospray ionization (ESI−). Analyte separation for HILIC was performed with a Waters XBridge BEH Amide XP HILIC column (2.1 × 50 mm, 2.6 µm particle size) and gradient elution with mobile phases A: LCMS grade water, B: LCMS grade acetonitrile, C: 2% formic acid. The initial 1.5 min period consisted of 22.5% A, 75% B, and 2.5% C, followed by a linear increase to 75% A, 22.5% B, and 2.5% C at 4 min and a final hold of 1 min. C18 chromatography was performed on an endcapped C18 column (Higgins Targa C18 2.1 × 50 mm, 3 µm particle size) with mobile phases A: water, B: acetonitrile, C: 10 mM ammonium acetate. The initial 1 min period consisted of 60% A, 35% B, and 5% C followed by a linear increase to 0% A, 95% B, and 5% C at 3 min and held for the remaining 2 min. For both methods, the mobile phase flow rate was 0.35 mL/min for the first min, and increased to 0.4 mL/min for the final 4 min. The HRMS was operated at 120k resolution and MS1 spectra were collected from 85 to 1275 m/z. Tune parameters for sheath gas were 45 for ESI+ and 30 for ESI−. Auxiliary gas was set at 25 for ESI+ and 5 for ESI−. Spray voltage was set at 3.5 kV for ESI+ and −3.0 kV for ESI−. Ion dissociation spectra were collected on an additional technical replicate using parallel-reaction monitoring (PRM) mode with targeted inclusion lists (provided in supplemental data) for expected ions in HCD mode with normalized collision energy of 35%.
MS1 only injections using 120k resolution provided 4 scans/s. Injections with PRM methods implemented were analyzed at 60k resolution for MS1 scans and 30k resolution for MS2 scans (4 MS1 scans and 4 MS2 scans/s).
Metabolite Identification in Reference Samples.
Spectral peaks associated with potentially formed adducts (M+H, M+2Na−H, M+Na, M−H2O+H, M+K, M+3H, M−2H2O+H, 2M+H, 2M−H, and 2M+ACN+H in ESI+; M−H, M+Cl, M+CH3CO2, M+HCO2, M−2H, 2M−3H in ESI−) were examined per metabolite using a ±3 ppm mass window in xCalibur Qualbrowser software. When multiple adducts were detected, the most reproducible (technical replicate CV) and quantifiable (exhibiting the most predictable relationship between analyte concentration and peak area with an unweighted Pearson’s linear model) MS1 adduct was selected for quantification. If MS1 was not sufficient to distinguish between isobaric species, then a diagnostic MS2 fragment ion was used for quantification in the reference. Metabolite peak areas were integrated in Qualbrowser. Metabolite retention times (RT), MS1, and MS/MS spectra obtained in unspiked reference samples were matched with RT and spectral information obtained from analysis of authentic chemical standards added to plasma or saline.
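The ±3 ppm matching step described above can be illustrated with a short sketch. This is not the authors' workflow (peaks were examined in Qualbrowser); the function names, the small adduct table, and the EDTA neutral mass (calculated here from its formula, C10H16N2O8) are assumptions introduced for illustration. The EDTA M+H value of m/z 293.0979 reported later in the text is used as a check.

```python
# Mass shifts (Da) for a few singly charged adducts. These are standard
# monoisotopic values supplied for illustration, not taken from the paper.
PROTON = 1.007276
ADDUCT_SHIFT = {
    "M+H": PROTON,      # ESI+
    "M+Na": 22.989221,  # sodium cation (Na atom minus one electron)
    "M-H": -PROTON,     # ESI-
}


def adduct_mz(neutral_mass, adduct):
    """Theoretical m/z of a singly charged adduct of a neutral molecule."""
    return neutral_mass + ADDUCT_SHIFT[adduct]


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_window(observed_mz, theoretical_mz, tol_ppm=3.0):
    """True if the observed m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# EDTA neutral monoisotopic mass, calculated from C10H16N2O8 (assumption).
EDTA_NEUTRAL = 292.090665
edta_mh = adduct_mz(EDTA_NEUTRAL, "M+H")
print(round(edta_mh, 4))
print(within_window(293.0985, edta_mh))  # hypothetical observed peak
```

A fixed ppm tolerance scales the absolute window with m/z, which is why the check is written relative to the theoretical mass rather than as a fixed Da offset.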
Metabolite Quantification in Reference.
Method of Standards Addition to Calibrate Qstd3 Metabolite Concentrations.
An unweighted linear regression line was plotted (X-axis = metabolite concentration, Y-axis = metabolite peak area) and the negative of the X-intercept was the estimated concentration of a metabolite in Qstd3. When the addition of standard did not result in a proportional increase in signal intensity or if no signal was observed in the reference, the metabolite was not quantified. Qstd3 values were then used as the reference for estimation of metabolite concentrations in CHEAR and NIST1950 using the following equation: $\text{concentration}_{\text{sample}} = \text{concentration}_{\text{reference}} \times (\text{intensity}_{\text{sample}} / \text{intensity}_{\text{reference}})$. For 30 of the 220 metabolites quantified in the Qstd3, this approach was not useful because a positive X-intercept was obtained; for these cases, NIST1950 values were used to calibrate Qstd3 and CHEAR. Raw data were not normalized (including to internal standards) or autoscaled prior to reference standardization. Further details on quantification methodology are provided in Supporting Information (SI).
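The method of standards addition described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function names and the spiking series are hypothetical, and the rule of returning `None` for a non-increasing response or a positive X-intercept is one way to encode the exclusion criteria stated in the text.

```python
def fit_line(x, y):
    """Unweighted least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("at least two distinct concentrations are required")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def standard_addition_concentration(added_conc, peak_areas):
    """Estimate the endogenous concentration from a standard-addition series.

    added_conc: concentration of standard spiked into the reference
                (0 for the unspiked reference).
    peak_areas: integrated peak area at each spiking level.
    Returns None when the metabolite would not be quantified.
    """
    slope, intercept = fit_line(added_conc, peak_areas)
    if slope <= 0:
        return None  # signal does not increase with added standard
    x_intercept = -intercept / slope
    if x_intercept >= 0:
        return None  # positive X-intercept: no usable estimate
    return -x_intercept  # negative of the X-intercept


# Hypothetical series: unspiked reference plus six spiking levels (same units).
added = [0, 1, 2, 5, 10, 20, 50]
areas = [1000 * (10 + a) for a in added]  # ideal response, 10 units endogenous
print(standard_addition_concentration(added, areas))
```

With real data the points scatter around the line, so the estimate depends on the spiking range; the sketch deliberately omits any weighting, matching the unweighted regression named in the text.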
Reference Standardization of Representative Metabolites in 3677 Human Plasma Samples.
The data used to evaluate reference standardization were derived from 17 studies, comprising 3677 samples analyzed over a 17-month period. These included fully deidentified human samples from a range of studies and were without demographic or health information. Thus, their comparison provides a blinded analysis in which the same instruments, methods, and personnel analyzed the samples, but the sample collection and characteristics of the samples were independent of the analytical laboratory. HRM data for the 17 studies were aligned using apLCMS.2,3 Qstd3 reference was analyzed with six technical replicates every 20 samples. NIST1950 reference was analyzed at the beginning and end of every study. CHEAR reference was used in 4 studies. Reference standardization was performed batch-wise using Qstd3 values using a customized R-script (https://github.com/kuppal2/xMSquant/tree/master/). Frequency distribution plots were prepared in R using the Kernel Density plot function.
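Batch-wise reference standardization can be sketched in a few lines. The authors' implementation is the R script cited above; the Python below is not a port of it. Function names, the dictionary layout, and the choice to average reference replicates with the arithmetic mean are assumptions made for illustration.

```python
from statistics import mean


def standardize_batch(sample_intensities, reference_intensities, reference_conc):
    """Convert peak intensities of one metabolite in one batch to concentrations.

    reference_intensities: replicate injections of the calibrated reference
                           run within the same batch.
    reference_conc:        calibrated concentration of the metabolite in
                           that reference.
    """
    ref_mean = mean(reference_intensities)  # assumption: mean of replicates
    if ref_mean <= 0:
        raise ValueError("reference signal must be positive")
    return [i * reference_conc / ref_mean for i in sample_intensities]


def standardize_study(batches, reference_conc):
    """Apply standardize_batch to every batch independently."""
    return {
        batch_id: standardize_batch(b["samples"], b["reference"], reference_conc)
        for batch_id, b in batches.items()
    }


# Hypothetical data: batch b2 has twice the instrument sensitivity of b1.
batches = {
    "b1": {"reference": [1000, 1100, 900], "samples": [500, 2000]},
    "b2": {"reference": [2000, 2200, 1800], "samples": [1000, 4000]},
}
print(standardize_study(batches, reference_conc=50.0))
```

Because each batch is scaled by its own reference signal, a sensitivity change that affects samples and reference equally cancels, which is the single-step batch correction and quantification described in the text.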
RESULTS
The detection of 467 individual chemicals on one or both platforms was validated (Table S1). Information and examples for classification as detectable, identifiable, and quantifiable are provided in the SI. For those detected, metabolites were identified in the reference material using accurate mass MS1 signal (±3 ppm from theoretical mass for respective adduct form), coelution with authentic standard within 3s, and ion dissociation spectra (MS2/MSn) matching authentic standard (Figure 1A,B). These criteria fulfill level 1 identification according to the Metabolomics Standards Initiative29 and a level 1 confidence score according to Schymanski.30 For some metabolites, as noted in the table, concentrations in the reference were too low to obtain useful MS/MS spectra, and additional criteria were used for identification as described below. Some metabolites with identical molecular formulas could be distinguished from one another by chromatographic separation or use of a diagnostic MS/MS fragment (e.g., valine/betaine); many were not easily distinguishable, however, and were not considered further for quantification. Coeluting isobaric metabolites that could not be reported as a single metabolite are reported as mixtures (e.g., leucine/isoleucine, glucose-1-phosphate/glucose-6-phosphate) or as a generic isomer that encompasses all possible isomers (hexane hexol for galactitol/mannitol/sorbitol). For some coeluting metabolites (alanine, β-alanine) where the major component is expected to be >85% of the total, the data are reported as the major metabolite.
Figure 1.
General workflow for metabolite identification and quantification. Metabolites were identified in reference samples by (A) coelution of an authentic standard and matching MS1 and MS/MS spectra or at a minimum by (B) coelution of an authentic standard with matching MS1. Glycine M+H was outside of the mass range (85−1275 m/z) used for FTMS analysis. (C) Metabolites were quantified in Qstd3 reference using external calibration with a method of standards addition in Qstd3 reference.
Not all identified metabolites in the reference could be quantified. An identified metabolite was considered quantifiable if addition of authentic standard produced an increasing linear response and an extrapolatable negative X-intercept with response characteristics similar to that observed for the pure standard in saline (Figure 1C).
Adduct Selection for Metabolite Quantification.
A single metabolite can generate multiple ions including adducts, isotopes, and source fragments. Thus, selection of the most reproducible and quantifiable ion produced by a chemical is an important consideration for metabolite quantification using reference standardization. Because buffer selection can influence adduct formation, the linear response of multiple adducts per standard using both HILIC/ESI+ and C18/ESI− were evaluated. Over 80% of detected chemical standards produced an increase in instrument response as M+H or M−H ions (Supplemental 1). In HILIC/ESI+, some organic acids formed quantitative M+2Na−H adducts and carbohydrates tended to form M+Na or M+K adducts. In C18/ESI−, carbohydrates tended to form quantitative M+Cl adducts. CoA species were detected as M−2H ions. These data are helpful for annotating metabolites forming non-(+H/−H) adducts. Overall, approximately one-third of metabolites in the MSMLS library were detected in the reference plasma samples.
To evaluate the reproducibility of detection of specific adducts, we calculated the coefficient of variation (%CV) of inter-reference signal intensity ratios for different ion forms generated by the same metabolite (Figure 2A). The data show that the M+Cl adduct (130% CV NIST1950:Qstd3, 74% CV NIST1950:CHEAR, 86% CV CHEAR:Qstd3) has more variance in quantified values than an M−H adduct (14% CV NIST1950:Qstd3, 19% CV NIST1950:CHEAR, 10% CV CHEAR:Qstd3) or an M+CH3COO adduct (15% CV NIST1950:Qstd3, 17% CV NIST1950:CHEAR, 13% CV CHEAR:Qstd3).
Figure 2.
Selection of consistently detected adducts for quantification. (A) Evaluation of reference sample ion ratios of multiple ion forms shows selection of ion form used is important for reference standardization reproducibility. Ion forms with low inter-reference ratios (<20% CV) are selected when available. Data shown are representative of eight batches from the beginning to the end of the chemical validation period. (B) Reproducibility of inter-reference metabolite ratios for preferred adducts. Measured inter-reference metabolite signal intensity ratios are consistent from batch to batch and adequate for quantification of representative metabolites from different chemical classes using reference standardization. Each data-point represents the ratio of metabolite peak areas between two reference samples over eight representative batches collected over a two month time period. The average intersample metabolite ratio CV was 14% for the data shown here.
Inter-reference Metabolite Signal Intensity Ratios Are Reproducible for Preferred Adducts.
For clinical untargeted metabolomics assays, reportable metabolites are recommended to have QC sample coefficients of variation <30%.31 An analogous metric for reference standardization is the CV% of metabolite peak intensity ratios in the reference samples across multiple analytical batches. Previous studies show the ratio for NIST1950 to Qstd3 peak signal intensities for a panel of amino acids are stable over 13 months of routine analysis.1 We calculated ratios of metabolite signal intensities between Qstd3:NIST1950, Qstd3:CHEAR, and CHEAR:NIST1950 for 8 batches spanning the 2-month period during which chemical standards were analyzed. For nine metabolites with diverse characteristics (Figure 2B), all CV% ratios were less than 30%, and all but one were less than 20%. The NIST:CHEAR ratio CV% for carnitine was 24%, but the NIST:Qstd3 and CHEAR:Qstd3 ratios were less than 15% (Figure 2). These results support use of CV% of inter-reference metabolite peak intensity ratios across analytical batches as a means to evaluate the consistency of quantitative information for reference standardization of metabolomics data.
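The inter-reference ratio metric can be computed as in the sketch below. The function names and example intensities are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def ratio_cv_percent(ref_a_by_batch, ref_b_by_batch):
    """CV% of the per-batch intensity ratio between two reference materials.

    Each argument holds one intensity per batch for the same metabolite and
    ion form, with batches in the same order.
    """
    ratios = [a / b for a, b in zip(ref_a_by_batch, ref_b_by_batch)]
    return 100.0 * stdev(ratios) / mean(ratios)


def is_reportable(cv_percent, threshold=30.0):
    """Apply the <30% CV guideline cited in the text."""
    return cv_percent < threshold


# Hypothetical intensities for one metabolite over three batches.
nist = [100.0, 220.0, 150.0]
qstd3 = [50.0, 100.0, 75.0]
cv = ratio_cv_percent(nist, qstd3)
print(round(cv, 1), is_reportable(cv))
```

The ratio is taken within each batch before computing the CV, so a sensitivity shift shared by both references in a batch does not inflate the metric.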
Estimated Concentrations of Metabolites in Qstd3, CHEAR, NIST1950.
We used the most reproducibly detected adduct for each quantifiable metabolite to generate the list of detectable metabolites with quantified values (# in parentheses) for Qstd3 (220), CHEAR (211), and NIST1950 (204) provided in Table S1 and Supplemental 3,4. 92% of the 220 metabolites quantified in Qstd3 were detected in all three reference samples. The list includes metabolites that were detected, identified, and quantified in saline but not quantified in reference samples; this information could be useful for future metabolite identification in other tissues or reference samples. Overall, the list of 467 metabolites provides coverage of 264 pathways and 186 modules in KEGG. These are organized by metabolite class including amino acids and derivatives, organic acids, lipids, sterols, carnitines, vitamins and cofactors, nucleotide-related metabolites, biogenic amines, and metabolites derived from diet, xenobiotics, or other sources.
Consistency of Metabolite Concentrations with NIST1950 and HMDB Values.
We compared the estimated metabolite concentrations obtained by reference standardization for NIST1950 against previously published values and expected ranges compiled in HMDB. Because previous reports show that quantification of metabolites in plasma varies depending on the choice of anticoagulant,32 we wanted to evaluate whether use of an EDTA (m/z 293.0979 (+H), rt 100 s) plasma (Qstd3) as a reference material would provide estimates of metabolite concentrations in heparin plasma (NIST1950) consistent with previously published values. Our data show that most amino acids quantified in NIST1950 resulted in concentration values in agreement with published NIST1950 values. This analysis showed that 75% of estimated amino acid concentrations were within ±25% of certified reference values for NIST1950 (Supplemental 2).
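The agreement check against certified values can be expressed as a small helper. This is an illustrative sketch: the function name is hypothetical and the numbers are placeholders, not NIST1950 certified values or results from this study.

```python
def fraction_within(estimated, certified, tolerance=0.25):
    """Fraction of shared metabolites whose estimate lies within
    +/- tolerance (as a proportion) of the certified value."""
    shared = [name for name in estimated if name in certified]
    if not shared:
        raise ValueError("no metabolites in common")
    hits = sum(
        1
        for name in shared
        if abs(estimated[name] - certified[name]) <= tolerance * certified[name]
    )
    return hits / len(shared)


# Placeholder concentrations in arbitrary but matching units.
estimated = {"met_a": 110.0, "met_b": 70.0, "met_c": 300.0, "met_d": 40.0}
certified = {"met_a": 100.0, "met_b": 100.0, "met_c": 280.0, "met_d": 41.0}
print(fraction_within(estimated, certified))
```

The tolerance is applied relative to the certified value, so the same call works across metabolites present at nM and µM levels.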
Comparison of representative metabolites in Qstd3, CHEAR, and NIST1950 with expected ranges compiled in HMDB showed that estimated concentrations of metabolites in Qstd3, CHEAR, and NIST1950 were within expected HMDB ranges (Figure 3). While 220 metabolites were quantified in Qstd3, only 137 of these metabolites had previously reported blood ranges in HMDB. Out of the 137 metabolites with previously reported values, over two-thirds of the quantified metabolite concentrations in Qstd3 were within previously reported blood ranges in HMDB. Thus, the estimated concentrations for metabolites detected in Qstd3, CHEAR, and NIST1950 plasma reported here extend capabilities to test utility of reference standardization to quantify and harmonize data from different studies.
Figure 3.
Consistency of metabolite concentrations with HMDB ranges. Using reference standardization with either Qstd3 or NIST1950 as the reference, 80% of representative metabolite concentrations in reference materials detected at nM and µM concentrations in NIST1950 (solid circle), CHEAR (gray-filled circle), or Qstd3 (open circle) produced values within compiled HMDB ranges (gray rectangles). Overall, approximately 2 out of every 3 estimates of a metabolite concentration in Qstd3 fell within previously published ranges in human plasma. (SA-homocysteine, S-adenosyl-homocysteine; DHA, docosahexaenoic acid).
Reference Standardization for Quantification and Harmonization of Large-Scale Metabolomics Data.
For metabolomics data to be “harmonized” or comparable between studies, technical variation due to batch effects needs to be minimized and metabolite peak intensities need to be converted to units that can be reproduced across studies or analytical methods. Analytical workflows often consider these points separately. Data from large-scale metabolomics studies cannot generally be combined without adjusting for batch effects,33 and several strategies have been proposed to correct for batch effects based on the use of scaling factors, quality control samples, internal standards, or use of statistical batch effect correction strategies.17,34,35 We tested use of reference standardization with the Qstd3 calibrated reference plasma to perform a single-step correction for systematic technical errors and subsequent quantification for large-scale metabolomics. This was tested with six technical replicates of the Qstd3 reference at the beginning, middle, and end of every 40-sample batch. In comparison to the 3-s within-day alignment of metabolites, the median drift in metabolite retention time during the 2-month period in which analytical standards were analyzed was up to 8 s, with mass spectral drift of less than 2 ppm using the HFQE. These results indicate that accurate mass MS1 and retention times for metabolites with confirmed identity and quantified values in the reference are sufficiently stable over time for quantification of respective metabolites in study samples.
The total variance associated with metabolite measurements is the sum of biological variance and technical variance, with the biological variance represented by the median relative standard deviation (Med-RSD) of study samples and the technical variance represented by the mean relative standard deviation (RSD) of reference samples. Conversion of sample metabolite peak intensities to concentrations minimizes technical variation (decreased Med-RSD) so that the data better represent biological variation (Figure 4A). For example, comparing the tryptophan, uric acid, or methionine median peak intensity in batch 2 versus reference standardized concentrations shows a reduction of batch-wise median variance. Taken together, these data show that batch-wise reference standardization can be used to correct for batch effects for most metabolites in large multibatch studies.
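The effect of batch-wise standardization on between-batch variation can be illustrated with a toy calculation. The text does not give an explicit formula for Med-RSD; the sketch below assumes one reading, namely the RSD of per-batch sample medians, and all names and numbers are hypothetical.

```python
from statistics import mean, median, stdev


def rsd_percent(values):
    """Relative standard deviation (sample SD / mean) as a percentage."""
    return 100.0 * stdev(values) / mean(values)


def rsd_of_batch_medians(batches):
    """RSD% of per-batch medians (assumed reading of Med-RSD)."""
    return rsd_percent([median(b) for b in batches])


def standardize(batch, ref_intensity, ref_conc):
    """Scale one batch by its concurrently analyzed reference signal."""
    return [i * ref_conc / ref_intensity for i in batch]


# Hypothetical study: identical biology in each batch, different sensitivity.
raw = [[90, 100, 110], [180, 200, 220], [135, 150, 165]]
ref_signal = [1000, 2000, 1500]  # reference intensity observed in each batch
ref_conc = 50.0                  # calibrated reference concentration

corrected = [standardize(b, r, ref_conc) for b, r in zip(raw, ref_signal)]
print(round(rsd_of_batch_medians(raw), 1))
print(round(rsd_of_batch_medians(corrected), 1))
```

In this constructed case all between-batch variation is technical, so it disappears after standardization; in real data the residual value reflects biological differences between batches plus any error in the reference measurement.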
Figure 4.
Application of reference standardization for quantification and harmonization of large-scale metabolomics. (A) Batch-wise reference standardization of 400 study samples analyzed across a 10-batch study over a one-week period. Red dots represent Qstd3 reference samples superimposed over study sample box-and-whisker plots depicting the range, quartiles, and medians of 40 study samples. Use of batch-wise reference standardization can reduce median RSD (Med-RSD) for metabolite measurements across a multibatch study. Guanidinoacetate peak intensity Med-RSD: 50%; reference standardized Med-RSD: 18%. Tryptophan peak intensity Med-RSD: 25%; reference standardized Med-RSD: 8%. Uric acid peak intensity Med-RSD: 42%; reference standardized Med-RSD: 19%. Methionine peak intensity Med-RSD: 21%; reference standardized Med-RSD: 12%. (B) Histograms for selected metabolites measured in 3677 study samples analyzed over a 17-month period in our laboratory show reference standardization can harmonize metabolite measurements across different studies. Each colored line (solid line EDTA plasma, dashed lines non-EDTA plasma) represents frequency distributions from individual studies using peak intensities or batch-wise reference standardized concentrations. The study median relative standard deviations decreased with the use of reference standardization compared to use of raw peak intensities alone.
To test the utility of reference standardization for harmonization of metabolomics data from multiple studies, we calculated metabolite concentrations for 4 different metabolites in 3677 plasma samples across 17 separate studies analyzed with the same analytical methodology over a 17-month time period (Figure 4B). Data for each sample were quantified relative to concurrently analyzed Qstd3 samples using reference standardization. In total, 97 batches of samples were analyzed for this evaluation. 79% of the quantified metabolites in the reference samples had a CV% of less than 35%. We did not observe a trend between the CV% and the 17-month run period. Results showed that the distributions were more uniform for most of the metabolites and studies but that harmonization of some metabolites may be influenced by choice of anticoagulant. For instance, guanidinoacetate, uric acid, and tryptophan achieved relatively normalized distributions. However, reference standardized methionine concentrations appear to exhibit a bimodal distribution based on EDTA.
DISCUSSION
Many metabolites are useful health indicators and routinely quantified in targeted mass spectrometry-based assays. In contrast, untargeted HRM methods were developed to capture as broad a chemical space as possible and rely on computational methods to discover associations of metabolites, metabolic pathways and networks with health and disease phenotypes.6–11 In this study, we expand the list of quantified metabolites to support quantification of 220 metabolites with Qstd3, 204 metabolites with NIST1950, and 211 metabolites with CHEAR pooled reference materials using the reference standardization method. Results show that intersample metabolite peak intensity ratios for validated adducts are consistently measured and demonstrate the ability to provide metabolite concentrations in agreement with previously established NIST1950 values and HMDB ranges. The analyses further show the ability to use this approach to harmonize metabolite measurements collected over a 17-month period for more than 3600 individual study samples. The results suggest a generalizable approach to use reference standardization to develop cumulative metabolomics databases suitable for personalized medicine.
Metabolite identification remains a bottleneck in untargeted metabolomics, and community-driven efforts to describe the biochemical composition of reference materials and detectability of chemical standard library metabolites by LC−HRMS methods are important to facilitate progress. In this study, we analyzed over 700 analytical standards and validated the detection of 467 metabolites covering endogenous, microbiome, dietary, drug, and environmental-exposure metabolites on a dual liquid chromatography HILIC/ESI+ and C18/ESI− HRM platform. We delineated metabolites that are detected and quantifiable in three pooled plasma materials from those that were not confidently detected or quantifiable in reference samples. Furthermore, we have identified and quantified metabolites (e.g., valerobetaine) that were not previously listed in HMDB. The increasing number of identified and quantified metabolites, with adduct forms and retention times, increases the opportunity to improve interoperable metabolomics databases to support efforts for precision medicine.
A limitation of reference standardization is that metabolites can only be quantified if they are present and quantified in the reference. Some metabolites not detected may not be present in these reference materials, and others will probably require complementary methods with improved sensitivity and/or selectivity. Use of reference materials with a wide spectrum of exogenous metabolites (e.g., NIST1957/1958 SRM for organic contaminants in human serum or NIST968f, 971a, 972a, 1951c, 1955, 2378, 2973, 3949, 3950 for various endogenous lipids, nutrients, and hormones) will enable reference standardization for metabolites that are not commonly detected in each individual. The creation and use of diverse sets of matrix-matched calibrated pooled reference materials suited for specific purposes will facilitate more targeted analyses,13,36 and enable more global assessments of the spectrum of exogenous chemicals derived from food, xenobiotics, environmental and occupational exposures, personal care products, supplements, and drugs and their biotransformation products.
NIST1950 plasma was developed as a reference material for use with metabolomics studies to facilitate identification and quantification of metabolites.25 Most of the estimated metabolite concentrations presented in this study are consistent with previously published values for NIST1950 and within expected HMDB ranges. Reference ranges for metabolite concentrations vary between laboratories depending on methodology. For example, estimated concentrations for several lipids in the NIST1950 are outside of expected HMDB ranges. Previous studies show differences in lipid and amino acid quantification depending on the anticoagulant used for plasma samples.32 Whether such differences in metabolite quantification observed in this study are due to anticoagulant or to differences in sample preparation or analysis is not clear.
Reference standardization assumes a linear relationship between analyte concentration and instrument response, which was validated for most of the chemicals reported here; exceptions were those calibrated relative to previously published NIST1950 values, where single-point calibration was used. For automated quantification as used in Figure 4B, the method uses single-point calibration and performs comparably to quantification relative to internal standard, which also involves single point calibration.1 By using a pooled reference material, the quantification is biased toward average human values, and the results can be expected to be less quantitatively reliable for relatively high and low values. Also, if the biochemical composition of the reference does not reflect the study samples (different intensity range observed in the reference versus study samples or different matrix), then the estimated concentrations for study sample metabolites will be associated with higher error. For these cases, other analytical (use of appropriate internal standards, additional QC samples, kit methods, improved analyte separation) or postanalytical considerations (normalization, nonlinear models, or use of other scaling factors) would be needed to correct for variation caused by shifts in instrument performance, autosampler issues, ion suppression, or other matrix effects. Such approaches have been described elsewhere and could further improve quantification and harmonization with reference standardization.31
Targeted methods for clinical analysis typically require <30% CV on QC samples and accuracies within 10% of an accepted central value. A recent lipidomics harmonization effort across 31 laboratories using diverse analytical methodologies shows that less than a quarter of lipid species (339/1,527) identified by a single laboratory could be detected by more than 5 laboratories in the NIST1950 SRM.27 After filtering to retain quantified lipids with less than 40% coefficient of dispersion (COD, a measure analogous to relative standard deviation), 259/339 (75%) of identified lipids remained. Another interlaboratory comparison of metabolomics data using a common targeted analytical strategy showed that approximately 80% of metabolites could be measured within 20% of the consensus values.37
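The two acceptance measures mentioned above, CV on QC samples and accuracy relative to an accepted central value, can be computed as in the following sketch. The function names and the QC values are hypothetical. The COD used in the lipidomics comparison is not implemented here because its exact definition is not given in this text.

```python
import statistics


def percent_cv(values):
    """Coefficient of variation: sample standard deviation / mean, in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def percent_bias(values, accepted_value):
    """Signed deviation of the mean from an accepted central value, in percent."""
    return 100.0 * (statistics.mean(values) - accepted_value) / accepted_value


# Hypothetical repeated QC measurements of one metabolite (arbitrary units)
qc = [98.0, 102.0, 95.0, 105.0, 100.0]
accepted = 104.0  # hypothetical accepted central value

cv = percent_cv(qc)
bias = percent_bias(qc, accepted)

# Thresholds as stated in the text: <30% CV and accuracy within 10%
print(cv < 30.0, abs(bias) <= 10.0)
```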
The present study shows that reference standardization can be used in an automated workflow to facilitate harmonization of metabolomics data collected across multiple studies. By correcting for batch effects on a chemical-by-chemical basis, reference standardization facilitates harmonization of large-scale metabolomics studies without the trade-off of most batch normalization methods in which improved quality of some metabolites is offset by corruption of data for other metabolites. By converting raw peak intensities to concentrations, reference standardization allows comparison of results obtained from multiple analytical platforms based upon estimates of absolute concentration. Intra- and interlaboratory validation and proficiency testing are well developed for targeted clinical assays and will need to be developed and implemented to enable untargeted metabolomics for medical and other uses.
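The chemical-by-chemical batch correction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the data layout, the function name, and the use of the within-batch mean of the reference runs are all hypothetical choices.

```python
def standardize_batch(study_intensities, reference_intensities,
                      reference_concentrations):
    """Convert one batch of raw intensities to estimated concentrations.

    study_intensities: {sample_id: {metabolite: intensity}}
    reference_intensities: {metabolite: [intensities of the pooled
        reference runs analyzed within this batch]}
    reference_concentrations: {metabolite: known reference concentration}
    Each metabolite is scaled independently by its own reference response.
    """
    result = {}
    for sample_id, features in study_intensities.items():
        result[sample_id] = {}
        for metabolite, intensity in features.items():
            ref_runs = reference_intensities[metabolite]
            ref_mean = sum(ref_runs) / len(ref_runs)  # assumed: simple mean
            result[sample_id][metabolite] = (
                reference_concentrations[metabolite] * intensity / ref_mean
            )
    return result


# Hypothetical case: the same sample analyzed in two batches, where the
# second batch has twice the instrument sensitivity of the first.
ref_conc = {"metabolite_A": 10.0}
batch1 = standardize_batch({"s1": {"metabolite_A": 4.0e5}},
                           {"metabolite_A": [2.0e5, 2.0e5]}, ref_conc)
batch2 = standardize_batch({"s1": {"metabolite_A": 8.0e5}},
                           {"metabolite_A": [4.0e5, 4.0e5]}, ref_conc)
print(batch1["s1"]["metabolite_A"], batch2["s1"]["metabolite_A"])
```

In this toy case the twofold sensitivity shift affects the reference and the study sample equally, so both batches yield the same estimated concentration. No other metabolite's data enter the calculation.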
In conclusion, reference standardization using calibrated reference samples analyzed at predefined intervals with study samples provides a practical and simple method for data normalization and estimation of metabolite concentrations in high-throughput applications. In principle, adoption of this technique could allow untargeted metabolomics data to be comparable across studies and laboratories. This approach is scalable as additional metabolites are characterized in other pooled reference materials, thereby expanding capabilities to harmonize metabolomics for clinical research and other practical applications.
ACKNOWLEDGMENTS
This work was supported by T32-GM008602, P30-ES019776, R01-ES023485, U2C-ES030163, RC2-DK118619, S10-OD018006, and UH2-AI132345.
Footnotes
ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.analchem.0c00338.
Supporting figures and data (PDF)
Table S1, List of identified and quantified metabolites (PDF)
MSMLS prm masses (XLS)
LC parameters and MS parameters (PDF)
The authors declare no competing financial interest.
Contributor Information
Ken H. Liu, Clinical Biomarkers Laboratory, Department of Medicine, Emory University, Atlanta, Georgia 30322, United States.
Mary Nellis, Clinical Biomarkers Laboratory, Department of Medicine, Emory University, Atlanta, Georgia 30322, United States.
Karan Uppal, Clinical Biomarkers Laboratory, Department of Medicine, Emory University, Atlanta, Georgia 30322, United States.
Chunyu Ma, Clinical Biomarkers Laboratory, Department of Medicine, Emory University, Atlanta, Georgia 30322, United States.
ViLinh Tran, Clinical Biomarkers Laboratory, Department of Medicine, Emory University, Atlanta, Georgia 30322, United States.
Yongliang Liang, Clinical Biomarkers Laboratory, Department of Medicine, Emory University, Atlanta, Georgia 30322, United States.
Douglas I. Walker, Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States.
Dean P. Jones, Clinical Biomarkers Laboratory, Department of Medicine, Emory University, Atlanta, Georgia 30322, United States.
REFERENCES
- (1) Go YM; Walker DI; Liang Y; Uppal K; Soltow QA; Tran V; Strobel F; Quyyumi AA; Ziegler TR; Pennell KD; Miller GW; Jones DP. Toxicol. Sci. 2015, 148 (2), 531–43.
- (2) Yu T; Jones DP. Bioinformatics 2014, 30 (20), 2941–2948.
- (3) Yu T; Park Y; Johnson JM; Jones DP. Bioinformatics 2009, 25 (15), 1930–1936.
- (4) Uppal K; Soltow QA; Strobel FH; Pittard WS; Gernert KM; Yu T; Jones DP. BMC Bioinf. 2013, 14, 15.
- (5) Uppal K; Walker DI; Jones DP. Anal. Chem. 2017, 89 (2), 1063–1067.
- (6) Li S; Sullivan NL; Rouphael N; Yu T; Banton S; Maddur MS; McCausland M; Chiu C; Canniff J; Dubey S; Liu K; Tran V; Hagan T; Duraisingham S; Wieland A; Mehta AK; Whitaker JA; Subramaniam S; Jones DP; Sette A; Vora K; Weinberg A; Mulligan MJ; Nakaya HI; Levin M; Ahmed R; Pulendran B. Cell 2017, 169 (5), 862–877.
- (7) Saeedi BJ; Liu KH; Owens JA; Hunter-Chang S; Camacho MC; Eboka RU; Chandrasekharan B; Baker NF; Darby TM; Robinson BS; Jones RM; Jones DP; Neish AS. Cell Metab. 2020, 31 (5), 956–968.
- (8) Ravindran R; Khan N; Nakaya HI; Li S; Loebbermann J; Maddur MS; Park Y; Jones DP; Chappert P; Davoust J; Weiss DS; Virgin HW; Ron D; Pulendran B. Science 2014, 343 (6168), 313–317.
- (9) Pillai VB; Samant S; Sundaresan NR; Raghuraman H; Kim G; Bonner MY; Arbiser JL; Walker DI; Jones DP; Gius D; Gupta MP. Nat. Commun. 2015, 6, 6656.
- (10) Xu X; Araki K; Li S; Han JH; Ye L; Tan WG; Konieczny BT; Bruinsma MW; Martinez J; Pearce EL; Green DR; Jones DP; Virgin HW; Ahmed R. Nat. Immunol. 2014, 15 (12), 1152–61.
- (11) Rao A; Kosters A; Mells JE; Zhang W; Setchell KD; Amanso AM; Wynn GM; Xu T; Keller BT; Yin H; Banton S; Jones DP; Wu H; Dawson PA; Karpen SJ. Sci. Transl. Med. 2016, 8 (357), 357ra122.
- (12) Liu KH; Walker DI; Uppal K; Tran V; Rohrbeck P; Mallon TM; Jones DP. J. Occup. Environ. Med. 2016, 58, S53–S61.
- (13) Jones DP. Toxicol. Rep. 2016, 3, 29–45.
- (14) Walker DI; Valvi D; Rothman N; Lan Q; Miller GW; Jones DP. Current Epidemiology Reports 2019, 6 (2), 93–103.
- (15) Walker D; Go Y-M; Liu K; Pennell K; Jones D. Population Screening for Biological and Environmental Properties of the Human Metabolic Phenotype. In Metabolic Phenotyping in Personalized and Public Healthcare; Nicholson J, Darzi A, Holmes E, Lindon JC, Eds.; Elsevier Academic Press: London, 2016; pp 167–211.
- (16) De Livera AM; Dias DA; De Souza D; Rupasinghe T; Pyke J; Tull D; Roessner U; McConville M; Speed TP. Anal. Chem. 2012, 84 (24), 10768–76.
- (17) De Livera AM; Sysi-Aho M; Jacob L; Gagnon-Bartsch JA; Castillo S; Simpson JA; Speed TP. Anal. Chem. 2015, 87 (7), 3606–3615.
- (18) Boysen AK; Heal KR; Carlson LT; Ingalls AE. Anal. Chem. 2018, 90 (2), 1363–1369.
- (19) Kamleh MA; Ebbels K; Spagou P; Masson EJ; Want EJ. Anal. Chem. 2012, 84 (6), 2670–2677, DOI: 10.1021/ac202733q.
- (20) Li B; Tang J; Yang Q; Li S; Cui X; Li Y; Chen Y; Xue W; Li X; Zhu F. Nucleic Acids Res. 2017, 45 (W1), W162–W170.
- (21) You L; Zhang B; Tang YJ. Metabolites 2014, 4 (2), 142–65.
- (22) Weindl D; Wegner A; Jager C; Hiller K. J. Chromatogr. A 2015, 1389, 112–9.
- (23) Uppal K; Walker DI; Liu K; Li S; Go YM; Jones DP. Chem. Res. Toxicol. 2016, 29 (12), 1956–1975.
- (24) van der Greef J; Martin-Juhasz S; Juhasz P; Adourian A; Plasterer T; Verheij ER; McBurney RN. J. Proteome Res. 2007, 6 (4), 1540–1559, DOI: 10.1021/pr0606530.
- (25) Phinney KW; Ballihaut G; Bedner M; Benford BS; Camara JE; Christopher SJ; Davis WC; Dodder NG; Eppe G; Lang BE; Long SE; Lowenthal MS; McGaw EA; Murphy KE; Nelson BC; Prendergast JL; Reiner JL; Rimmer CA; Sander LC; Schantz MM; Sharpless KE; Sniegoski LT; Tai SSC; Thomas JB; Vetter TW; Welch MJ; Wise SA; Wood LJ; Guthrie WF; Hagwood CR; Leigh SD; Yen JH; Zhang N-F; Chaudhary-Webb M; Chen H; Fazili Z; LaVoie DJ; McCoy LF; Momin SS; Paladugula N; Pendergrast EC; Pfeiffer CM; Powers CD; Rabinowitz D; Rybak ME; Schleicher RL; Toombs BMH; Xu M; Zhang M; Castle AL. Anal. Chem. 2013, 85 (24), 11732–11738.
- (26) Wishart DS; Feunang YD; Marcu A; Guo AC; Liang K; Vazquez-Fresno R; Sajed T; Johnson D; Li C; Karu N; Sayeeda Z; Lo E; Assempour N; Berjanskii M; Singhal S; Arndt D; Liang Y; Badran H; Grant J; Serra-Cayuela A; Liu Y; Mandal R; Neveu V; Pon A; Knox C; Wilson M; Manach C; Scalbert A. Nucleic Acids Res. 2018, 46 (D1), D608–D617.
- (27) Bowden JA; Heckert A; Ulmer CZ; Jones CM; Koelmel JP; Abdullah L; Ahonen L; Alnouti Y; Armando AM; Asara JM; et al. J. Lipid Res. 2017, 58 (12), 2275–2288.
- (28) Soltow QA; Strobel FH; Mansfield KG; Wachtman L; Park Y; Jones DP. Metabolomics 2013, 9 (1 Suppl), S132–S143.
- (29) Sumner LW; Amberg A; Barrett D; Beale MH; Beger R; Daykin CA; Fan TWM; Fiehn O; Goodacre R; Griffin JL; Hankemeier T; Hardy N; Harnly J; Higashi R; Kopka J; Lane AN; Lindon JC; Marriott P; Nicholls AW; Reily MD; Thaden JJ; Viant MR. Metabolomics 2007, 3 (3), 211–221.
- (30) Schymanski EL; Jeon J; Gulde R; Fenner K; Ruff M; Singer HP; Hollender J. Environ. Sci. Technol. 2014, 48 (4), 2097–8.
- (31) Broadhurst D; Goodacre R; Reinke SN; Kuligowski J; Wilson ID; Lewis MR; Dunn WB. Metabolomics 2018, 14 (6), 72.
- (32) Khadka M; Todor A; Maner-Smith KM; Colucci JK; Tran V; Gaul DA; Anderson EJ; Natrajan MS; Rouphael N; Mulligan MJ; et al. The Effect of Anticoagulants, Temperature, and Time on the Human Plasma Metabolome and Lipidome from Healthy Donors as Determined by Liquid Chromatography-Mass Spectrometry. Biomolecules 2019, 9 (5), 200.
- (33) Johnson WE; Li C; Rabinovic A. Biostatistics 2007, 8 (1), 118–127.
- (34) Fernández-Albert F; Llorach R; Garcia-Aloy M; Ziyatdinov A; Andres-Lacueva C; Perera A. Bioinformatics 2014, 30 (20), 2899–2905.
- (35) Kuligowski J; Sánchez-Illana Á; Sanjuán-Herráez D; Vento M; Quintás G. Analyst 2015, 140 (22), 7810–7817.
- (36) Jones DP; Park Y; Ziegler TR. Annu. Rev. Nutr. 2012, 32, 183–202.
- (37) Siskos AP; Jain P; Römisch-Margl W; Bennett M; Achaintre D; Asad Y; Marney L; Richardson L; Koulman A; Griffin JL; Raynaud F; Scalbert A; Adamski J; Prehn C; Keun HC. Anal. Chem. 2017, 89 (1), 656–665.