Abstract
Rationale:
The precision of the doubly labeled water (DLW) method is determined by the precision and accuracy of the isotopic measurements. Quality control (QC) procedures to mitigate sample variability require additional measurements if sample duplicates differ by more than a threshold based on instrument precision. We explored the effect of widening QC ranges on total daily energy expenditure (TDEE) determined using the two-point sampling method.
Methods:
We screened DLW data from 121 individuals for instances where samples were analyzed more than twice using our existing QC criteria (±2.0 per mil [δ] for 2H and ±0.5 δ for 18O). We then applied wider QC ranges for accepting duplicate measures and recalculated TDEE.
Results:
Widening the 2H QC range to ±10.0 δ in samples collected on the first day (most enriched) and to ±5.0 δ in samples collected on the final day (less enriched) produced almost identical mean TDEE compared to the originally calculated TDEE (2684 ± 508 vs. 2687 ± 512 kcal/day, p = 0.40). There was a strong correlation with the originally calculated TDEE (r2 = 0.97, p < 0.001).
Conclusions:
Expanding the 2H QC range to ±10.0 δ for samples collected on the first day and ±5.0 δ for samples collected on the final day provides similar mean TDEE results. These findings may help DLW labs optimize QC criteria and reduce analytical costs.
1 |. INTRODUCTION
The doubly labeled water (DLW) method is the gold standard for the measurement of free-living total daily energy expenditure (TDEE). TDEE is determined from the difference in turnover rates of labeled hydrogen (deuterium, 2H) and oxygen (18O) in body water.1,2 The precision of the DLW method depends on the accuracy and precision of isotopic measurements. Currently, DLW laboratories typically base their quality control (QC) criteria on instrument precision: duplicate measurements that differ by more than a threshold tied to instrument precision are measured again, which increases operating costs. To our knowledge, an objective approach to determining acceptable QC criteria for estimating TDEE with the DLW method has never been considered.
Historically, DLW analysis has been performed using isotope ratio mass spectrometry (IRMS).1 However, off-axis integrated cavity output spectroscopy (OA-ICOS), which uses laser absorption spectroscopy, has emerged as an alternative approach for performing isotopic analysis of biological samples.3,4 OA-ICOS offers several advantages over IRMS including lower instrument cost, less cumbersome sample preparation, and concurrent measurement of 2H and 18O on a single sample. In a previous study, we demonstrated that TDEE measured via DLW with isotope enrichments using OA-ICOS was not different from that measured using isotope measurements from IRMS, nor from the gold standard of 24-h energy expenditure (EE) measured using whole-room indirect calorimetry.3 In the past few years, there has been an increase in the number of laboratories using OA-ICOS for DLW measurements of TDEE.5
As a relatively new laboratory performing DLW analysis with OA-ICOS, we invested a substantial amount of time in optimizing our operating procedures. Based on measurements of internal controls that have known enrichments calculated relative to international standards (i.e., Vienna Standard Mean Ocean Water [VSMOW]), we have now achieved a precision of 2.0–2.5 per mil (δ) for 2H across a range of enrichments of −100.0 to +1000.0 δ and <0.5 δ for 18O across a range of enrichments of −60.0 to +120.0 δ. This level of precision is similar to that observed with IRMS.3 Thus, we have set our QC criteria for rerunning samples as ±2.0 δ for 2H and ±0.5 δ for 18O. That is, if duplicate measures differ by more than these values, the sample is run a third time, and the average of the three measurements (unless one of them is an extreme outlier) is used for TDEE analysis. Preliminary analysis of some of our data indicated that duplicate measures that fell outside this tight QC range yielded TDEE results that were nearly identical to those obtained when calculations were performed on samples that fell within this range. Thus, rather than basing QC criteria on a somewhat arbitrary value, we asked whether we could base them on an objective assessment: what is the widest level of agreement between duplicate samples that still yields the correct result? To answer this question, we examined the extent to which widening the QC criteria for duplicates would impact TDEE measures. Minimizing the number of triplicate sample analyses would increase overall throughput and reduce operating costs.
2 |. MATERIALS AND METHODS
2.1 |. Participants and experimental design
Step 1: To estimate reasonable QC cutoffs, we first used a preexisting DLW dataset of samples from 17 individuals collected during a validation study comparing TDEE estimated using DLW, with isotope enrichments measured using OA-ICOS, against the criterion measure of whole-room calorimetry over 7 days.3 Briefly, a baseline urine sample was obtained before the consumption of the DLW dose water for determination of δ2H and δ18O background (i.e., natural) abundances. Study participants were given an oral dose of 0.25 g of 98 atom percent (98% APE) 18O-labeled water and 0.14 g of 99.8% APE 2H-labeled water (Sigma-Aldrich) per kilogram of total body water (TBW; estimated as 73% of fat-free mass [FFM] derived from dual-energy X-ray absorptiometry [DXA]). Two samples were collected 4 and 5 h after consumption of the dose (post-dose; PD4 and PD5), and two additional samples were collected on the final study day (final sample; F4 and F5) at the same time of day as PD4 and PD5. To test different QC ranges, we increased and decreased the measured isotope value to simulate a wider QC range for the PD4, PD5, F4, and F5 samples and recalculated TDEE for each of the 17 participants. The 2H QC ranges applied to the PD and F samples were ±3.0, ±5.0, ±7.0, and ±10.0 δ. The 18O QC ranges applied to the PD and F samples were ±1.0, ±1.5, and ±2.0 δ. We also considered different combinations of QC ranges (e.g., ±10 PD and ±10 F; ±10 PD and ±5 F) to determine the optimal QC ranges for each time point. Additionally, we expanded the background QC range for both 18O (±1.0, ±1.5, and ±2.0 δ) and 2H (±5.0 and ±10.0 δ). The optimal combination was considered to be the widest QC range that produced results that were within ±1% compared to the room calorimeter.
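The dosing rule described above reduces to simple arithmetic. The following Python sketch uses only the figures stated in the text (TBW estimated as 73% of FFM; 0.25 g of 18O-labeled water and 0.14 g of 2H-labeled water per kilogram of TBW). The function name, the dictionary keys, and the 55 kg example are hypothetical and are not taken from the study.

```python
def dlw_dose_grams(ffm_kg: float) -> dict:
    """Estimate TBW and label doses from fat-free mass (FFM, in kg)."""
    tbw_kg = 0.73 * ffm_kg  # TBW estimated as 73% of FFM (from the text)
    return {
        "tbw_kg": tbw_kg,
        "o18_water_g": 0.25 * tbw_kg,  # 18O-labeled water, grams
        "h2_water_g": 0.14 * tbw_kg,   # 2H-labeled water, grams
    }

# Hypothetical participant with 55 kg of fat-free mass.
dose = dlw_dose_grams(55.0)
print(dose)
```

For this invented participant, the estimated TBW is about 40 kg, giving roughly 10 g of 18O-labeled water and 5.6 g of 2H-labeled water.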
Step 2: To externally validate our newly selected QC ranges, we then used another preexisting DLW dataset of samples (obtained from n = 121 participants) collected from an ongoing two-arm randomized clinical trial (ClinicalTrials.gov identifier NCT03411356) comparing the weight loss efficacy of daily caloric restriction versus intermittent fasting. The study protocol has been previously described.6 The sample collection protocol was as described above. We screened this dataset for all instances where given PD and F samples were analyzed more than twice. We then applied wider QC criteria for accepting duplicate measures and recalculated TDEE using the widest ranges of QC criteria that produced acceptable results compared to the criterion measure of whole-room indirect calorimetry in Step 1.
2.2 |. OA-ICOS analysis of urine samples
Isotopic data from the OA-ICOS analyzer were processed using commercially available Post Analysis Software (Version 4.5.0, ABB, Zurich, Switzerland) as previously described.3 Before analysis, frozen urine samples were thawed and centrifuged at 4000 rpm for 30 min. A 2 mL portion of supernatant was pipetted into cryovials and stored at −80°C for subsequent analysis. Each sample (160 μL) was analyzed for 18O and 2H enrichment using OA-ICOS (ABB, Zurich, Switzerland). During each injection, a 1 μL sample of the supernatant was injected into a heated (~85°C) stainless steel block to produce water vapor, which was then introduced into the OA-ICOS optical cavity. A total of 12 injections per sample were performed, and the final four injections were averaged. Simultaneous measurements of δ2H and δ18O were performed on each injection. Standards and internal controls were interleaved in each DLW run. If the difference between duplicate runs exceeded 2.0 δ for 2H:1H or 0.5 δ for 18O:16O for a given sample, then that sample was run again until acceptable results were achieved.
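The rerun rule can be expressed as a simple threshold check. The sketch below is a minimal illustration, assuming the duplicate difference is compared as an absolute value against the stated limits (2.0 δ for 2H, 0.5 δ for 18O); the function name, the dictionary layout, and the example δ values are hypothetical.

```python
# Existing QC limits from the text, in per mil (delta units).
EXISTING_QC = {"2H": 2.0, "18O": 0.5}

def needs_rerun(first: float, second: float, isotope: str,
                limits: dict = EXISTING_QC) -> bool:
    """Return True when duplicate delta values differ by more than the QC limit."""
    return abs(first - second) > limits[isotope]

# Invented duplicate measurements for one sample.
print(needs_rerun(812.4, 815.1, "2H"))    # 2H duplicates differ by 2.7 delta
print(needs_rerun(101.2, 101.5, "18O"))   # 18O duplicates differ by 0.3 delta
```

Passing a different `limits` dictionary allows the same check to be repeated with alternative QC ranges without changing the function.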
2.3 |. rCO2 and TDEE calculation
Total body water (TBW) was calculated as the average of the dilution spaces of 2H and 18O after correction for isotopic exchange with other body pools.7 Deuterium (kd) and oxygen (ko) turnover rates were calculated by linear regression of the natural logarithm of isotope enrichment as a function of time. All four time points (i.e., PD4, PD5, F4, and F5) were used in the calculation of kd and ko. TBW and the rate of carbon dioxide production (rCO2) were calculated using the intercept method and the equation of Speakman et al.8 TDEE was calculated from rCO2 using the equation of Weir, assuming a respiratory quotient (RQ) of 0.85, and averaged over 7 days. All four combinations of samples were used in the intercept method to obtain final TDEE.
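The two calculation steps named above, a log-linear fit for the turnover rate and the Weir conversion, can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: enrichment is taken as the measured δ value minus the background δ value, the Weir equation is used in its common abbreviated form (EE = 3.941·VO2 + 1.106·VCO2, volumes in litres) rewritten with VO2 = VCO2/RQ, and rCO2 is supplied directly in litres per day rather than derived from kd and ko. All function names and numerical inputs are invented.

```python
import math

def turnover_rate(times_h, deltas, background):
    """Elimination rate (h^-1): negative slope of ln(enrichment) versus time."""
    # Assumption: enrichment = measured delta minus background delta.
    y = [math.log(d - background) for d in deltas]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(times_h, y))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    return -sxy / sxx

def weir_tdee_kcal(rco2_l_per_day, rq=0.85):
    """Abbreviated Weir equation expressed in terms of CO2 production and RQ."""
    return rco2_l_per_day * (1.106 + 3.941 / rq)

# Invented 2H washout: k = 0.005 h^-1, 900 delta initial enrichment,
# background of -50 delta; samples at PD4, PD5, F4, F5 (hours after dosing).
times = [4.0, 5.0, 172.0, 173.0]
background = -50.0
clean = [background + 900.0 * math.exp(-0.005 * t) for t in times]

def shifted(pd_shift, f_shift):
    """Add a constant offset to the PD pair and to the F pair."""
    return [clean[0] + pd_shift, clean[1] + pd_shift,
            clean[2] + f_shift, clean[3] + f_shift]

k_base = turnover_rate(times, clean, background)
k_10_10 = turnover_rate(times, shifted(10.0, 10.0), background)
k_10_5 = turnover_rate(times, shifted(10.0, 5.0), background)
print(k_base, k_10_10, k_10_5)
print(weir_tdee_kcal(450.0))  # 450 L CO2/day is an invented input
```

In this toy example, shifting both sample pairs by +10 δ lowers the fitted rate more than shifting the PD pair by +10 δ and the F pair by +5 δ, because the same absolute offset is a larger fraction of the lower final-day enrichment.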
2.4 |. Statistics
Before analysis, data were tested for normality. In Step 1, TDEE from each combination of QC ranges was compared to the criterion whole-room indirect calorimeter using a paired t-test. Turnover rates of 2H (kd) and 18O (ko) obtained with each widened QC range were compared to those obtained with the existing QC criterion (±2.0 δ for 2H and ±0.5 δ for 18O) using a paired t-test. In Step 2, the original and recalculated TDEE were also compared using a paired t-test. Associations between original and recalculated TDEE were determined using Pearson’s correlation and intraclass correlation coefficients. Bland–Altman plots, which provide a measure of bias and limits of agreement and show whether the error is associated with the magnitude of the criterion measure, were also used. The Bland–Altman analyses were performed using the original TDEE as the criterion measure. Significance for all tests was set at p < 0.05. Analyses were performed using GraphPad Prism (v. 5.03, La Jolla, CA) and SPSS (v. 29, Chicago, IL). Data are reported as mean ± SD.
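For readers less familiar with Bland–Altman analysis, the bias and limits of agreement can be computed as in the sketch below. It assumes the conventional definition (mean difference ± 1.96 SD of the paired differences), which the text does not spell out, and uses invented TDEE values; the analyses in this study were performed in GraphPad Prism and SPSS, not with this code.

```python
import statistics

def bland_altman(original, recalculated):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [r - o for o, r in zip(original, recalculated)]
    bias = statistics.mean(diffs)   # mean difference
    sd = statistics.stdev(diffs)    # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Invented TDEE values (kcal/day) for five hypothetical participants.
original = [2400, 2650, 2900, 3100, 2200]
recalculated = [2410, 2630, 2915, 3080, 2215]

bias, lower, upper = bland_altman(original, recalculated)
print(bias, lower, upper)
```

A bias near zero with narrow limits indicates that the two sets of values agree both on average and for individual participants.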
3 |. RESULTS
Step 1: Widening the 2H QC range to ±10.0 δ for both PD and F samples produced unacceptable TDEE and kd/ko results (Table 1). However, widening the 2H QC range for PD samples to ±10.0 δ and the QC range for F samples to ±5.0 δ produced acceptable TDEE and kd/ko results (Table 1). Widening the 18O QC range for the PD and/or F samples beyond ±0.5 δ produced unacceptable TDEE and kd/ko results (Table 1). Widening the 2H QC range for background to ±5.0 δ produced acceptable TDEE and kd/ko results; however, expanding it to ±10.0 δ produced unacceptable TDEE and kd/ko results (Table 2). Widening the background 18O QC range to ±1.0, ±1.5, or ±2.0 δ produced unacceptable results (Table 2).
TABLE 1.
Effect of widening post-dose and final-day 2H and 18O quality control ranges on total daily energy expenditure compared to indirect calorimetry.
| 2H QC change (δ) | 18O QC change (δ) | kd (h−1) | ko (h−1) | Widened QC TDEE (kcal/d) | % difference from IC |
|---|---|---|---|---|---|
| ±2 | ±0.5 | 0.00490 ± 0.00124 | 0.00590 ± 0.00138 | | |
| +10 PD, +5 F | | 0.00489 ± 0.00122 | | 2442 ± 410 | 3.4 |
| −10 PD, −5 F | | 0.00491 ± 0.00126 | | 2413 ± 409 | 2.2 |
| +10 PD, +10 F | | 0.00480 ± 0.00121* | | 2681 ± 439* | 13.5* |
| −10 PD, −10 F | | 0.00500 ± 0.00128* | | 2161 ± 388* | −8.4* |
| | +1 PD, +1 F | | 0.00581 ± 0.00134* | 2168 ± 371* | −8.1* |
| | −1 PD, −1 F | | 0.00600 ± 0.00141* | 2700 ± 433* | 14.3* |
| | +1.5 PD, +1.5 F | | 0.00577 ± 0.00132* | 2056 ± 387* | −12.9* |
| | −1.5 PD, −1.5 F | | 0.00605 ± 0.00143* | 2841 ± 460* | 20.4* |
| | +2 PD, +2 F | | 0.00572 ± 0.00131* | 1919 ± 363* | −18.7* |
| | −2 PD, −2 F | | 0.00610 ± 0.00145* | 2983 ± 488* | 26.4* |
Note: Widened QC TDEE was compared to the criterion measure of whole-room indirect calorimetry TDEE (2360 ± 372 kcal/d). Values are means ± SD; n = 17. Widened QC TDEE was compared with indirect calorimetry by paired t-test. Expanded QC ko/kd were compared with existing QC criterion ko/kd by paired t-test. The first row shows the existing QC criterion (±2.0 δ for 2H and ±0.5 δ for 18O).
Abbreviations: F, final-day; IC, indirect calorimetry; kd, deuterium turnover rate; ko, 18O turnover rate; PD, post-dose; QC, quality control; TDEE, total daily energy expenditure.
*Significant difference at p < 0.05.
TABLE 2.
Effect of widening background 2H and 18O quality control ranges on total daily energy expenditure compared to indirect calorimetry.
| 2H background QC change (δ) | 18O background QC change (δ) | Widened background TDEE (kcal/d) | % difference from IC |
|---|---|---|---|
| +5 | | 2297 ± 383 | −2.7 |
| −5 | | 2556 ± 408 | 8.3 |
| +10 | | 2161 ± 377* | −8.4* |
| −10 | | 2681 ± 426* | 13.6* |
| | +1 | 2700 ± 433* | 14.4* |
| | −1 | 2168 ± 371* | −8.1* |
| | +1.5 | 2840 ± 459* | 20.3* |
| | −1.5 | 2042 ± 365* | −13.5* |
| | +2 | 2983 ± 488* | 26.3* |
| | −2 | 1919 ± 363* | −18.7* |
Note: Widened QC TDEE was compared to the criterion measure of whole-room indirect calorimetry TDEE (2360 ± 372 kcal/d). Values are means ± SD; n = 17. Comparisons made versus indirect calorimetry by paired t-test.
Abbreviations: IC, indirect calorimetry; QC, quality control; TDEE, total daily energy expenditure.
*Significant difference at p < 0.05.
Step 2: Based on these data, we widened the 2H QC limits to ±10.0 δ in the PD samples and to ±5.0 δ in the F samples. Using this combination, the recalculated TDEE results were not significantly different from the originally calculated TDEE (2684 ± 508 kcal/day vs. 2687 ± 512 kcal/day, p = 0.40). There was a strong positive correlation between the recalculated and original TDEE (r2 = 0.97, p < 0.001; intraclass correlation = 0.99, n = 121, p < 0.001, Figure 1). We also analyzed the recalculated TDEE using a Bland–Altman analysis (Figure 2). The Bland–Altman correlation for the original and recalculated TDEE measurements was not significant, indicating no magnitude-dependent bias, and the 95% limits of agreement between measurements were −170 to 165 kcal/day.
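As a rough consistency check on these figures, and assuming the limits of agreement were computed as bias ± 1.96 SD of the paired differences (the conventional Bland–Altman definition; the multiplier is not stated in the text), the reported limits imply the following bias and SD. The variable names are illustrative.

```python
lower, upper = -170.0, 165.0  # reported 95% limits of agreement, kcal/day

implied_bias = (lower + upper) / 2          # midpoint of the limits
implied_sd = (upper - lower) / (2 * 1.96)   # half-width divided by 1.96

print(round(implied_bias, 1), round(implied_sd, 1))  # -2.5 85.5
```

Under that assumption, the implied bias of about −2.5 kcal/day is in line with the 3 kcal/day difference between the reported means (2684 vs. 2687 kcal/day).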
FIGURE 1.

Correlation between mean original total daily energy expenditure (TDEE) and recalculated TDEE values.
FIGURE 2.

Bland–Altman plot of the recalculated total daily energy expenditure (TDEE) versus original TDEE.
4 |. DISCUSSION
In this analysis, we demonstrate that widening the 2H QC range to ±10.0 δ in the post-dose samples and ±5.0 δ in the final-day samples produced mean TDEE results almost identical to the originally calculated TDEE (difference of ~2 kcal/day, 0.1% of the mean), with narrow 95% limits of agreement (−170 to 165 kcal/day). Because the DLW method is best suited for estimating the average TDEE of groups, statistical tests comparing different groups or changes in TDEE over time would be expected to yield the same results and interpretation.
The accuracy of TDEE measured using DLW for individuals is highly variable. For example, in our previous OA-ICOS validation study,3 although the mean difference between OA-ICOS-measured TDEE and room calorimetry was not statistically significant (OA-ICOS: 2427 ± 406 kcal/day; indirect calorimetry: 2360 ± 373 kcal/day), individual differences between OA-ICOS and room calorimetry ranged widely (−270 to +523 kcal/day). This wide interindividual variability is likely due to the underlying assumptions of the DLW method.2 Nonetheless, in the current analysis, the limits of agreement for the recalculated TDEE were within −170 to 165 kcal/day of the original TDEE (Figure 2), which is similar to the interindividual variation in TDEE estimated from DLW (~150 kcal/day).9 These data support our conclusion that statistical tests and interpretation of datasets would not be impacted by employing these wider QC ranges for 2H.
Widening the 18O QC range for the PD and F measurements produced unacceptable TDEE results (Table 1). Additionally, widening the 18O QC range for the background measurements produced unacceptable TDEE results (Table 2). Small deviations in measured 18O can significantly impact the calculation of TDEE. Thus, it is not surprising that widening the QC range for 18O measurements produced unacceptable results. However, most samples that required reruns in the current analysis were due to variations in 2H, while 18O remained stable. Thus, widening the QC for 18O would likely have negligible effects on the number of samples requiring reruns.
Results of this study are only directly applicable to the two-point DLW method. Other methods, such as multiple sampling, are likely to be far less dependent on individual isotope values along the washout curve. Thus, applying a more liberal QC criterion, such as this study employed, is likely more feasible for such studies.
5 |. CONCLUSION
In conclusion, widening the 2H QC range to ±10.0 δ for post-dose samples and ±5.0 δ for final-day samples produced TDEE results similar to those obtained with ±2.0 δ. Widening the 18O QC range beyond ±0.5 δ produced unacceptable results. Furthermore, expanding the background QC range for 2H to ±5.0 δ produced acceptable results, whereas expanding it to ±10.0 δ for 2H and/or beyond ±0.5 δ for 18O produced unacceptable results. On the basis of these results, we conclude that expanding the 2H QC range up to ±10.0 δ for post-dose samples and ±5.0 δ for final-day samples while leaving the 18O QC range at ±0.5 δ provides accurate TDEE measurements with the DLW method.
New and noteworthy:
This study provides insight for doubly labeled water laboratories to optimize their quality control criteria and reduce analytical costs. Expanding the 2H QC range to 10.0 δ for samples collected on the first day and 5.0 δ for samples collected on the final day provides similar TDEE results (within 2 kcal/day) compared to a QC range of 2.0 δ.
Funding information
Research reported in this publication was supported by the National Institutes of Health through the National Institute of Diabetes and Digestive and Kidney Diseases, grant/award numbers: P30DK048520, R01DK111622, R43/R44 DK09336, and UL1TR002535.
Footnotes
CONFLICT OF INTEREST STATEMENT
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
DATA AVAILABILITY STATEMENT
Source data for this study are not publicly available due to privacy or ethical restrictions. The source data are available to verified researchers upon request by contacting the corresponding authors and Dr. Victoria A. Catenacci (vicki.catenacci@cuanschutz.edu). DLW samples from an ongoing clinical trial (ClinicalTrials.gov identifier NCT03411356) were used in this analysis. Participants have not consented to the public sharing of their research data. Data are stored in a controlled access repository and available only upon request with appropriate data-sharing agreements in place.
REFERENCES
- 1. Speakman JR. The history and theory of the doubly labeled water technique. Am J Clin Nutr. 1998;68(4):932S. doi: 10.1093/ajcn/68.4.932S
- 2. Lifson N, McClintock R. Theory of use of the turnover rates of body water for measuring energy and material balance. J Theor Biol. 1966;12(1):46–74. doi: 10.1016/0022-5193(66)90185-8
- 3. Melanson EL, Swibas T, Kohrt WM, et al. Validation of the doubly labeled water method using off-axis integrated cavity output spectroscopy and isotope ratio mass spectrometry. Am J Physiol Endocrinol Metab. 2018;314(2):E124–E130. doi: 10.1152/ajpendo.00241.2017
- 4. Berman ESF, Swibas T, Kohrt WM, et al. Maximizing precision and accuracy of the doubly labeled water method via optimal sampling protocol, calculation choices, and incorporation of 17O measurements. Eur J Clin Nutr. 2020;74(3):454–464. doi: 10.1038/s41430-019-0492-z
- 5. Reynard LM, Wong WW, Tuross N. Accuracy and practical considerations for doubly labeled water analysis in nutrition studies using a laser-based isotope instrument (off-axis integrated cavity output spectroscopy). J Nutr. 2022;152(1):78–85. doi: 10.1093/jn/nxab324
- 6. Ostendorf DM, Caldwell AE, Zaman A, et al. Comparison of weight loss induced by daily caloric restriction versus intermittent fasting (DRIFT) in individuals with obesity: study protocol for a 52-week randomized clinical trial. Trials. 2022;23(1):718. doi: 10.1186/s13063-022-06523-2
- 7. Racette SB, Schoeller DA, Luke AH, Shay K, Hnilicka J, Kushner RF. Relative dilution spaces of 2H- and 18O-labeled water in humans. Am J Physiol. 1994;267(4):E585–E590. doi: 10.1152/ajpendo.1994.267.4.E585
- 8. Speakman JR, Yamada Y, Sagayama H, et al. A standard calculation methodology for human doubly labeled water studies. Cell Rep Med. 2021;2(2):100203. doi: 10.1016/j.xcrm.2021.100203
- 9. Schoeller DA, Hnilicka JM. Reliability of the doubly labeled water method for the measurement of total daily energy expenditure in free-living subjects. J Nutr. 1996;126(1):348S–354S.
