Abstract
Objective:
The doubly labeled water (DLW) intake-balance method estimates energy intake (EI) during weight loss using the time-weighted average of total daily energy expenditure (TDEE) and changes in body energy stores. Because TDEE declines rapidly during the early phase of weight loss, an early additional measurement is recommended. This study aimed to develop regression models that estimate time-weighted TDEE using fewer interim measurements and determine if EI accuracy is maintained during a 12-month weight loss intervention.
Methods:
Data from a behavioral weight loss intervention (Dietary Caloric Restriction versus Intermittent Fasting Trial, “DRIFT”) were used. TDEE, body weight, and body composition were measured at months 0, 1, 6, and 12. Regression models using only 2 or 3 time points were used to estimate time-weighted TDEE at months 6 and 12, respectively. Models were validated using bootstrap sampling, and time-weighted TDEE and percent caloric restriction (%CR) were compared to a reference approach.
Results:
Models demonstrated strong predictive performance (R2 = 0.911–0.982). Limits of agreement with the reference model were 121.1–274.5 kcal/day for TDEE and 4.5–10.3 % for %CR, without significant bias.
Conclusions:
Using a regression modeling approach, we demonstrate the DLW intake-balance method maintains accuracy during weight loss without early-phase TDEE measurements.
Keywords: doubly labeled water, calorie restriction, energy balance, metabolic adaptation, intake-balance method
Introduction
Accurate determination of energy intake (EI) is critical when interpreting the results of weight loss interventions, including assessing dietary adherence, differentiating the relative contribution of diet versus physical activity (PA) to weight loss, and determining the dose-response relationship between percent caloric restriction (%CR) and changes in physiological outcomes. Most methods for estimating EI and %CR during weight loss rely on self-report, which suffers from considerable inaccuracy and bias (1–3). The doubly labeled water (DLW) method is the gold standard for measuring total daily energy expenditure (TDEE) in free-living humans (4, 5). When body weight is stable, average daily EI can be assumed to be equal to TDEE (3, 6). However, during energy imbalance (i.e., weight loss or weight gain), EI does not equal TDEE. In these circumstances, EI can be estimated using the DLW intake-balance method (7–9). This approach estimates EI using the time-weighted average of TDEE (TDEEave) over the measurement period, along with changes in body energy stores (EI = TDEEave + Δ body energy stores).
Previous DLW intake-balance approaches for estimating EI and %CR during weight loss studies share a limitation: they require an additional TDEE measurement within the first 2–4 weeks, because the time-weighted TDEEave cannot be estimated accurately unless the rapid reduction in TDEE during the early phase of calorie restriction is captured (10). This early decline in TDEE, known as adaptive thermogenesis, is a physiological process involving a decrease in metabolic rate beyond what would be expected based on changes in body weight (11). We hypothesized that using regression modeling to calculate the time-weighted TDEEave from fewer TDEE measurements would yield estimates of average EI and %CR similar to those from the reference approach, which uses all TDEE measurements. Thus, the primary objective of this study was to develop linear regression models to estimate the time-weighted TDEEave over 6 and 12 months during a behavioral weight loss intervention using fewer TDEE measurement time points. We then compared the time-weighted TDEEave, EI, and %CR estimated with these regression models against the published reference approach that uses all available TDEE data over both 6 and 12 months.
Methods
Study Participants
Participants were adults with overweight or obesity (age 18–60 years, BMI 27–46 kg/m2) enrolled in a 12-month randomized clinical trial designed to compare weight loss generated by daily caloric restriction (DCR) to 4:3 intermittent fasting (4:3 IMF). The trial (Daily Caloric Restriction versus Intermittent Fasting Trial; “DRIFT”) is registered at ClinicalTrials.gov (NCT03411356) and was approved by the Colorado Multiple Institutional Review Board (COMIRB). The primary outcome of this intervention was change in body weight at 12 months. The study protocol, inclusion and exclusion criteria, interventions, and results are described elsewhere (12, 13). Briefly, participants (n=165) were randomized (1:1) into either 4:3 IMF (n=84) or DCR (n=81) for 12 months. Participants in the 4:3 IMF group were instructed to restrict EI from calculated baseline weight maintenance energy requirements by 80% on three non-consecutive days/week with ad libitum intake the other four days (34.3% target weekly energy deficit). Participants in the DCR group were prescribed a diet that reduced daily EI by 34.3% to match the weekly targeted energy deficit of 4:3 IMF. Both groups received a recommendation to increase moderate-intensity PA to 300 min/week over the initial 6 months and to maintain this level of PA for the duration of the study. The current analysis included only a subset of participants from the parent study who completed a 1-month TDEE measurement (n=50).
Total Daily Energy Expenditure
TDEE was measured using the DLW method over seven days at months 0, 1, 6, and 12 (i.e., M0, M1, M6, and M12), as previously described (12). Briefly, a baseline urine sample was obtained before the consumption of the DLW dose to determine δ2H and δ18O background abundances. Study participants were given an oral dose of ~0.18 g of 10 atom percent (10% APE) 18O-labeled water and 0.12 g 99.8% APE 2H-labeled water (Sigma-Aldrich) per kilogram of total body water. Urine samples were collected on the dosing day and on day 7, and were analyzed in duplicate using off-axis integrated cavity output spectroscopy (OA-ICOS; ABB, Zurich, SWI), as previously described (14). Due to the proximity of the M0 and M1 TDEE measurements, isotopic enrichments remained elevated at the M1 DLW dosing period. Therefore, for M1 TDEE calculations, the baseline (M0) isotope abundances were used to calculate turnover rates, and the M1 isotope abundances were used to calculate the dilution spaces. TDEE was calculated using estimated respiratory quotient values of 0.85 and the population dilution space of 1.036 using the intercept method (15).
Body Weight and Body Composition
Body weight and composition were measured at M0, M1, M6, and M12. Body weight was measured to the nearest 0.1 kg using a calibrated digital scale after an overnight fast. Fat mass (FM) and fat-free mass (FFM) were measured with dual-energy x-ray absorptiometry (DXA; Hologic Discovery W, Apex Software Version 4.5.3, Hologic Inc., Bedford, MA). Scans were performed and analyzed at the Colorado Clinical and Translational Sciences Institute (CCTSI) by the Colorado Nutrition Obesity Research Center (NORC) Energy Balance Assessment Core. The DXA at our site undergoes annual maintenance and calibration; within-subject coefficients of variation (CV) are 0.8 ± 0.6% for FM and 0.5 ± 0.3% for FFM.
Model Development
The models used in our analysis are presented in Table 1. The two reference approaches include M0 and M1 measures plus M6 and/or M12 assessments of TDEE to estimate the time-weighted TDEEave over 6 (6M-Ref) and 12 months (12M-Ref) (10). These reference approaches are time-weighted based on the actual number of days between measures. We then developed linear regression models using fewer measurements of TDEE to approximate the reference approaches to estimate the time-weighted TDEEave over 6 and 12 months. The 6-month regression model (Model 1) incorporates only M0 and M6 measurements of TDEE. The 12-month regression models incorporate TDEE assessments at M0 and M12 (Model 2), M0, M1, and M12 (Model 3), and M0, M6, and M12 (Model 4).
Table 1.
Regression Models and Reference Approaches to Estimate Time-Weighted TDEEave over 6 or 12 Months
| Model | Time-Weighted TDEEave Model | Timepoints Used |
|---|---|---|
| 6 Month Approach | ||
| 6M-Ref | TDEEave= (DaysM0-M1/DaysM0-M6) x [(TDEEM0+ TDEEM1)/2] + (DaysM1-M6/DaysM0-M6) x [(TDEEM1+TDEEM6)/2] | 0 1 6 |
| Model 1 | TDEEave= α + β1(TDEEM0) + β2(TDEEM6) | 0 6 |
| 12 Month Approach | ||
| 12M-Ref | TDEEave = (DaysM0-M1/DaysM0-M12) x [(TDEEM0+ TDEEM1)/2] + (DaysM1-M6/DaysM0-M12) x [(TDEEM1+TDEEM6)/2] + (DaysM6-M12/DaysM0-M12) x [(TDEEM6+TDEEM12)/2] | 0 1 6 12 |
| Model 2 | TDEEave= α + β1(TDEEM0) + β2(TDEEM12) | 0 12 |
| Model 3 | TDEEave= α + β1(TDEEM0) + β2(TDEEM1) + β3(TDEEM12) | 0 1 12 |
| Model 4 | TDEEave = α + β1(TDEEM0) + β2(TDEEM6) + β3(TDEEM12) | 0 6 12 |
Abbreviations are as follows: Time-Weighted Average of Total Daily Energy Expenditure (TDEEave); Total Daily Energy Expenditure (TDEE); Months 0, 1, 6, and 12 (M0, M1, M6, and M12); 6-month Reference Approach (6M-Ref); 12-month Reference Approach (12M-Ref); α (model intercept); β1–3 (coefficients for TDEE at M0, M1, M6, or M12).
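The reference approaches in Table 1 can be read as a trapezoid-style average: each interval contributes the mean of its two endpoint TDEE values, weighted by that interval's share of the total days. The following is a minimal sketch of that calculation, not the authors' code. The function name `time_weighted_tdee`, the visit dates, and the TDEE values are hypothetical and chosen only for illustration.

```python
from datetime import date


def time_weighted_tdee(dates, tdee):
    """Time-weighted mean TDEE (kcal/d) across consecutive measurements.

    Each interval contributes the mean of its two endpoint values,
    weighted by the interval's share of the total number of days.
    """
    if len(dates) != len(tdee) or len(dates) < 2:
        raise ValueError("need matching dates and TDEE values, at least two")
    total_days = (dates[-1] - dates[0]).days
    weighted_sum = 0.0
    for i in range(len(dates) - 1):
        interval_days = (dates[i + 1] - dates[i]).days
        weighted_sum += interval_days * (tdee[i] + tdee[i + 1]) / 2
    return weighted_sum / total_days


# Hypothetical participant: M0, M1, and M6 visits (made-up dates and values).
visits = [date(2020, 1, 1), date(2020, 1, 31), date(2020, 6, 29)]
tdee_kcal = [2600.0, 2500.0, 2550.0]
ref_6m = time_weighted_tdee(visits, tdee_kcal)
print(round(ref_6m, 1))
```

Passing four visits (M0, M1, M6, M12) to the same function gives the 12-month analogue, since the loop simply adds one more weighted interval.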
Estimating Energy Intake and Percent (%) Calorie Restriction
The average daily change in energy stores (ΔES) was computed from the change in body composition from M0 to M6 or M0 to M12 divided by the exact number of days between measurements, assuming 1 g of FM = 9.3 kcal and 1 g of FFM = 1.1 kcal (16, 17). Baseline (M0) EI was assumed to be equivalent to M0 TDEE. Mean EI over months 0 – 6 and 0 – 12 was calculated by adding ΔES (which is negative when body energy stores are lost) to the estimated time-weighted TDEEave over each respective timeframe (EI(mean) = TDEEave + ΔES). Mean %CR over months 0 – 6 and 0 – 12 was calculated as the percentage decrease in EI relative to baseline (%CR(mean) = [1 − EI(mean)/EI(M0)] × 100).
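As a rough illustration of this calculation, the sketch below follows the relation given in the Introduction (EI = TDEEave + Δ body energy stores), with ΔES signed so that it is negative during weight loss, as in Table 4. The energy densities come from the text; the function names and the participant values are hypothetical.

```python
KCAL_PER_G_FM = 9.3   # energy density of fat mass (from the text)
KCAL_PER_G_FFM = 1.1  # energy density of fat-free mass (from the text)


def delta_energy_stores(fm_start_kg, fm_end_kg, ffm_start_kg, ffm_end_kg, days):
    """Average daily change in energy stores (kcal/d); negative when tissue is lost."""
    d_fm_g = (fm_end_kg - fm_start_kg) * 1000.0
    d_ffm_g = (ffm_end_kg - ffm_start_kg) * 1000.0
    return (d_fm_g * KCAL_PER_G_FM + d_ffm_g * KCAL_PER_G_FFM) / days


def energy_intake(tdee_ave, delta_es):
    """Mean EI (kcal/d) from the intake-balance relation EI = TDEEave + delta ES."""
    return tdee_ave + delta_es


def percent_cr(ei_mean, ei_baseline):
    """Percent caloric restriction relative to baseline EI (taken as M0 TDEE)."""
    return (1.0 - ei_mean / ei_baseline) * 100.0


# Hypothetical participant: loses 4.5 kg FM and 1.0 kg FFM over 180 days.
d_es = delta_energy_stores(40.0, 35.5, 55.0, 54.0, 180)
ei = energy_intake(2529.0, d_es)   # 2529.0 kcal/d is a made-up TDEEave
pcr = percent_cr(ei, 2600.0)       # 2600.0 kcal/d is a made-up M0 TDEE
print(round(d_es, 1), round(ei, 1), round(pcr, 1))
```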
Statistical Analysis
All models were built using only participants with complete data for both dependent and independent variables; no missing data were imputed. Because no external datasets with matching time points during a weight loss intervention were available, we evaluated the performance and generalizability of our models using a bootstrapping approach. We generated 2,000 bootstrap samples by re-sampling the original dataset with replacement (i.e., mimicking external validation). Each bootstrap sample served as a simulated external dataset, providing a means to assess the models’ ability to generalize beyond the training data. Specifically, we fit the models to each re-sampled dataset to obtain an apparent R2 for that sample. We then applied the model fitted to the re-sampled dataset to the original dataset to obtain a test R2. The difference between these two R2 values was taken as the ‘optimism’ for each bootstrap sample. Optimism quantifies the extent to which model performance on the training data overestimates the expected performance on new, unseen data, providing a direct measure of generalizability and validity (18, 19). High optimism scores (>0.10) indicate substantial overfitting, where the model performs considerably better on training data than on independent datasets. Conversely, a very low optimism score (<0.01) suggests minimal bias, indicating that the model’s estimated performance closely approximates its true generalizability in real-world settings.
The average ‘optimism’ from the 2,000 samples was computed to calculate the optimism-corrected R2. This approach helped us estimate the models’ predictive accuracy while accounting for potential overfitting. Using 2,000 bootstrap samples, an ample number according to established guidelines (20, 21), we could confidently assess the stability, reliability, and generalizability of our models’ performance across different data variations. Bland-Altman regression analysis was used to evaluate the agreement between mean differences for %CR between models.
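The bootstrap procedure described above can be sketched as follows. This is an illustrative, standard-library-only implementation of optimism correction under stated assumptions, not the authors' analysis code: the helper names (`ols_fit`, `r_squared`, `optimism_corrected_r2`) are hypothetical, the data are simulated, R2 is computed as 1 − SSE/SST, and only 200 resamples are drawn in the example to keep it fast.

```python
import random


def ols_fit(X, y):
    """Least-squares fit; returns [intercept, b1, ...] via centered normal equations."""
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    rows = [[1.0] + [row[j] - means[j] for j in range(k)] for row in X]
    p = k + 1
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    beta[0] -= sum(b * m for b, m in zip(beta[1:], means))  # undo centering
    return beta


def r_squared(beta, X, y):
    """R2 = 1 - SSE/SST for a fitted model evaluated on (X, y)."""
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], x)) for x in X]
    mean_y = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def optimism_corrected_r2(X, y, n_boot=2000, seed=0):
    """Return (apparent R2, mean optimism, optimism-corrected R2)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = r_squared(ols_fit(X, y), X, y)
    total_optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        beta_b = ols_fit(Xb, yb)
        # optimism = R2 on the bootstrap sample minus R2 on the original data
        total_optimism += r_squared(beta_b, Xb, yb) - r_squared(beta_b, X, y)
    optimism = total_optimism / n_boot
    return apparent, optimism, apparent - optimism


# Simulated data (not study data): M0 and M6 TDEE predicting a made-up TDEEave.
sim = random.Random(1)
X, y = [], []
for _ in range(50):
    m0 = sim.gauss(2650, 480)
    m6 = m0 - 50 + sim.gauss(0, 150)
    X.append((m0, m6))
    y.append(0.35 * m0 + 0.65 * m6 + sim.gauss(0, 60))
apparent, optimism, corrected = optimism_corrected_r2(X, y, n_boot=200)
print(round(apparent, 3), round(optimism, 3), round(corrected, 3))
```

Centering the predictors before solving the normal equations is a design choice made here for numerical stability with values on the kcal/d scale; a statistics package would typically handle this internally.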
Results
Study Participants
Of the 165 randomized participants in the parent study, 50 participants completed DLW and DXA measurements at M0, M1, and M6, and 47 participants completed DLW and DXA measurements at M0, M1, M6, and M12 (Figure 1). Therefore, data from 50 participants were used to develop Model 1, and data from 47 participants were used to develop Models 2 – 4. Table 2 presents the baseline characteristics of participants used in the model development. Figure 2 illustrates the relative changes, and Supplemental Table 1 displays the absolute changes in body weight, body composition, and TDEE observed during the 12-month intervention. Average TDEE at M0, M1, M6, and M12 was 2649 ± 484 (n=50), 2546 ± 513 (n=50), 2600 ± 490 (n=50), and 2615 ± 581 kcal/d, respectively (n = 47). TDEE declined ~100 kcal/d from M0 to M1, increased by ~50 kcal/d between M1 and M6, and stabilized through M12.
Figure 1.

Participant flow diagram and study timeline for secondary analysis.
Abbreviations are as follows: 4:3 Intermittent Fasting (4:3 IMF); Daily Caloric Restriction (DCR); Doubly Labeled Water (DLW); Dual-energy X-ray Absorptiometry (DXA); Total Daily Energy Expenditure (TDEE); Months 0, 1, 6, and 12 (M0, M1, M6, and M12).
Table 2.
Baseline Characteristics of Study Population Included in the Time-Weighted TDEEave Models
| Characteristic | 6M – Reference* n = 50 | 12M – Reference** n = 47 |
|---|---|---|
| Age (y) (mean ± SD) | 43.3 ± 8.6 | 43.4 ± 8.7 |
| BMI (kg/m2) | 33.6 ± 4.1 | 33.7 ± 4.0 |
| Sex (n, %) | ||
| Male | 14, 28% | 13, 28% |
| Female | 36, 72% | 34, 72% |
| Race | ||
| White | 45, 90% | 42, 89% |
| Asian | 1, 2% | 1, 2% |
| Other | 4, 8% | 4, 9% |
| Ethnicity (n, %) | ||
| Hispanic or Latino | 9, 18% | 8, 17% |
| Non-Hispanic | 41, 82% | 39, 83% |
Results are mean ± SD; Abbreviations are as follows: Body Mass Index (BMI); Months 0, 1, 6, and 12 (M0, M1, M6, and M12); 6-month Reference Approach (6M-Ref); 12-month Reference Approach (12M-Ref).
Figure 2.

Changes in body weight, body composition, and TDEE after 1, 6, and 12 months.
A. Data (n=50) used to develop the 6M-Ref and Model 1. B. Data (n=47) used to develop the 12M-Ref and Models 2 – 4.
a Results (mean ± SD) are from linear mixed effects model with unstructured covariance.
Model Performance
Table 3 displays the performance and bias of the regression models. All models demonstrated strong predictive ability. Across the four models, optimism ranged from 0.002 to 0.008, indicating negligible bias and minimal risk of overfitting. After subtracting the estimate of optimism from the apparent performance, the optimism-corrected R2 values were 0.923 (Model 1), 0.911 (Model 2), 0.975 (Model 3), and 0.982 (Model 4). These results indicate that each model provides a robust estimate of the time-weighted TDEEave. Regression model equations and coefficients are provided in Supplemental Table 1.
Table 3.
Performance of Regression Models to an External Dataset
| Model | Model Equation to Estimate Time-Weighted TDEEave | Naive R2 | Optimism | Optimism-Corrected R2 | Timepoints Used |
|---|---|---|---|---|---|
| Model 1 | 41.76 + 0.33(TDEEM0) + 0.64(TDEEM6) | 0.926 | 0.002 | 0.923 | 0 6 |
| Model 2 | 224.79 + 0.37(TDEEM0) + 0.53(TDEEM12) | 0.919 | 0.008 | 0.911 | 0 12 |
| Model 3 | 122.45 + 0.08(TDEEM0) + 0.41(TDEEM1) + 0.46(TDEEM12) | 0.978 | 0.004 | 0.975 | 0 1 12 |
| Model 4 | −15.7 + 0.2(TDEEM0) + 0.63(TDEEM6) + 0.17(TDEEM12) | 0.984 | 0.002 | 0.982 | 0 6 12 |
Abbreviations are as follows: Time-Weighted Average of Total Daily Energy Expenditure (TDEEave); Total Daily Energy Expenditure (TDEE); Months 0, 1, 6, and 12 (M0, M1, M6, and M12); Coefficient of Determination (R2).
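To show how the equations in Table 3 would be applied, the sketch below encodes the published intercepts and coefficients and evaluates them for a given set of TDEE measurements. The dictionary and function names are hypothetical. The example input uses the cohort mean TDEE values reported in the Results (M0 = 2649, M6 = 2600 kcal/d), and because the coefficients are rounded to two decimals, the output is only approximate.

```python
# Intercepts and coefficients transcribed from Table 3 (rounded as published).
TABLE3_MODELS = {
    "Model 1": (41.76, {"M0": 0.33, "M6": 0.64}),
    "Model 2": (224.79, {"M0": 0.37, "M12": 0.53}),
    "Model 3": (122.45, {"M0": 0.08, "M1": 0.41, "M12": 0.46}),
    "Model 4": (-15.7, {"M0": 0.20, "M6": 0.63, "M12": 0.17}),
}


def predict_tdee_ave(model, tdee_by_month):
    """Estimate time-weighted TDEEave (kcal/d) from the time points a model needs."""
    intercept, coefs = TABLE3_MODELS[model]
    missing = [month for month in coefs if month not in tdee_by_month]
    if missing:
        raise KeyError(f"{model} needs TDEE at: {', '.join(missing)}")
    return intercept + sum(b * tdee_by_month[month] for month, b in coefs.items())


# Cohort mean TDEE at M0 and M6 from the Results, used as an example input.
estimate = predict_tdee_ave("Model 1", {"M0": 2649.0, "M6": 2600.0})
print(round(estimate, 1))
```

With these inputs the estimate lands close to the Model 1 group mean of about 2580 kcal/d reported in Table 4, which is what one would expect when a linear model is evaluated at the predictor means.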
Regression Model Performance Compared to Reference Approaches
The agreement between estimated TDEEave using the reference approaches and regression models was evaluated using Bland-Altman analyses, as summarized in Figure 3. The two regression models using only pre- and post-intervention TDEE assessments produced limits of agreement of 253.7 kcal/d compared to the 6M-Ref (Fig 3a) and 274.5 kcal/d compared to the 12M-Ref (Fig 3b). Including an interim TDEE assessment at M1 (Model 3) produced limits of agreement of 141.8 kcal/d compared to the 12M-Ref (Fig 3c), and including M6 (Model 4) produced limits of agreement of 121.1 kcal/d compared to the 12M-Ref (Fig 3d). Adjusting for intervention arm, sex, or physical activity did not improve the performance of the regression models (data not shown). Exploratory analyses including sex-by-treatment interaction terms showed no improvement in model fit or predictive accuracy.
Figure 3.


Legend. Bland-Altman comparisons of the time-weighted average of TDEE computed using the regression models and reference approaches. Time points include M0, M1, M6, and M12. A. Model 1 (M0 and M6 TDEE) vs 6M-Ref, B. Model 2 (M0 and M12 TDEE) vs 12M-Ref, C. Model 3 (M0, M1, and M12 TDEE) vs 12M-Ref, and D. Model 4 (M0, M6, and M12 TDEE) vs 12M-Ref.
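For readers unfamiliar with Bland-Altman comparisons, a minimal sketch follows. It assumes the conventional definition of 95% limits of agreement as the mean difference ± 1.96 standard deviations of the paired differences; the text does not state exactly how the single limit values reported here were derived, so this is an assumption. The function name and the paired values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(model_values, reference_values):
    """Return (bias, lower limit, upper limit) for paired estimates (kcal/d)."""
    diffs = [m - r for m, r in zip(model_values, reference_values)]
    bias = mean(diffs)                    # mean difference, model minus reference
    half_width = 1.96 * stdev(diffs)      # conventional 95% limits of agreement
    return bias, bias - half_width, bias + half_width


# Made-up paired TDEEave estimates for four participants.
model_est = [2510.0, 2390.0, 2705.0, 2600.0]
reference_est = [2500.0, 2400.0, 2700.0, 2610.0]
bias, lower, upper = bland_altman(model_est, reference_est)
print(round(bias, 2), round(lower, 1), round(upper, 1))
```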
Estimating Time-Weighted Average TDEE, Energy Intake, and % Calorie Restriction
The estimated time-weighted TDEEave, EI, and %CR from the reference approaches and regression models are shown in Table 4. The regression models (Models 1 – 4) produced similar results to the reference approaches (6M-Ref and 12M-Ref), with no significant differences in average TDEE, EI, and %CR. The time-weighted TDEEave from Model 1 was 2580 ± 457 kcal/d, while the time-weighted TDEEave from the 6M-Ref was 2581 ± 475 kcal/d (within 1 kcal/d). The average EI and %CR from these models were 2329 ± 484 and 11.9% (95% CI: 8.9 to 14.8%CR) vs 2329 ± 501 kcal/d and 11.9% (95% CI: 8.6 to 15.1%CR), respectively. The time-weighted TDEEave from Model 2 was 2589 ± 471 kcal/d, while the time-weighted TDEEave from the 12M-Ref was 2594 ± 492 kcal/d. The average EI and %CR from these models were 2451 ± 466 and 7.1% (95% CI: 4.7 to 9.5%CR) vs 2455 ± 493 and 6.9% (95% CI: 4.1 to 9.8%CR), respectively. Model 3 and Model 4 produced similar time-weighted TDEEave, EI, and %CR estimates compared to both Model 2 and 12M-Ref.
Table 4.
Estimated Time-Weighted TDEEave, EI, and % CR from Reference Approaches and Regression Models
| Model | Approach for Quantifying EI and % CR | TDEE Timepoints | Δ Energy Stores (kcal/d) | Estimated TDEEave (kcal/d) (a) | Estimated EI (kcal/d) (a) | Estimated %CR (95% CI) (b) |
|---|---|---|---|---|---|---|
| 6 Month Approach | ||||||
| 6M-Ref | 6-Month TDEEM0,M1,M6 + 6-month Δ body composition | M0 M1 M6 | −252.1 ± 223.8 | 2580.5 ± 474.6 | 2329.4 ± 501.1 | 11.9 (8.6 to 15.1) |
| Model 1 | 6-Month TDEEM0,M6 + 6-month Δ body composition | M0 M6 | −252.1 ± 223.8 | 2580.3 ± 456.6 | 2329.0 ± 483.6 | 11.9 (8.9 to 14.8) |
| 12 Month Approach | ||||||
| 12M-Ref | 12-Month TDEEM0,M1,M6,M12 + 12-month Δ body composition | M0 M1 M6 M12 | −138.3 ± 146.4 | 2593.6 ± 491.6 | 2455.3 ± 493.0 | 6.9 (4.1 to 9.8) |
| Model 2 | 12-Month TDEEM0,M12 + 12-month Δ body composition | M0 M12 | −138.3 ± 146.4 | 2589.4 ± 470.5 | 2451.1 ± 466.3 | 7.1 (4.7 to 9.5) |
| Model 3 | 12-Month TDEEM0,M1,M12 + 12-month Δ body composition | M0 M1 M12 | −138.3 ± 146.4 | 2586.1 ± 485.0 | 2447.8 ± 482.8 | 7.2 (4.5 to 9.9) |
| Model 4 | 12-Month TDEEM0,M6,M12 + 12-month Δ body composition | M0 M6 M12 | −138.3 ± 146.4 | 2594.1 ± 488.0 | 2455.8 ± 489.9 | 6.9 (4.1 to 9.7) |
(a) Results (mean ± SD) are from linear mixed effects model with unstructured covariance.
(b) Results (mean and 95% CI) are from linear mixed effects model with unstructured covariance.
Abbreviations are as follows: Time-Weighted Average of Total Daily Energy Expenditure (TDEEave); Total Daily Energy Expenditure (TDEE); Energy Intake (EI); Percent Calorie Restriction (% CR); Change in Energy Stores (Δ Energy Stores); Months 0, 1, 6, and 12 (M0, M1, M6, and M12); 6-month Reference Approach (6M-Ref); 12-month Reference Approach (12M-Ref).
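The group means in Table 4 can be checked against the intake-balance relation EI = TDEEave + ΔES. The short sketch below does this arithmetic with values transcribed from the table. It assumes that the Δ Energy Stores value listed for each reference approach also applies to the regression models in the same time frame, since all use the same body composition change. Small gaps are expected because the reported means come from mixed-effects models and are rounded.

```python
# (label, TDEEave, delta ES, reported EI), all in kcal/d, transcribed from Table 4.
# Assumption: delta ES is shared by all rows within the same time frame.
ROWS = [
    ("6M-Ref", 2580.5, -252.1, 2329.4),
    ("Model 1", 2580.3, -252.1, 2329.0),
    ("12M-Ref", 2593.6, -138.3, 2455.3),
    ("Model 2", 2589.4, -138.3, 2451.1),
    ("Model 3", 2586.1, -138.3, 2447.8),
    ("Model 4", 2594.1, -138.3, 2455.8),
]

# Gap between the recomputed EI (TDEEave + delta ES) and the reported EI.
gaps = {label: round(tdee + d_es - ei, 1) for label, tdee, d_es, ei in ROWS}
for label, gap in gaps.items():
    print(f"{label}: recomputed minus reported EI = {gap} kcal/d")
```

In this check the 12-month rows reproduce the reported EI to the rounding shown, and the 6-month rows differ by about 1 kcal/d.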
Discussion
This study presents regression models to estimate the time-weighted TDEEave for use in the DLW intake-balance method calculation of EI and %CR in participants with overweight or obesity undergoing a 12-month behavioral weight loss intervention that incorporated both CR and PA. The four regression models performed well based on the results of the bootstrapping analysis and compared to the previously published reference approach (10). Results of the current study indicate that, on a group level, all four regression models provide an accurate estimate of the time-weighted TDEEave, EI, and %CR over 6 and 12 months. Thus, using these regression models in the DLW intake-balance method calculations will be less costly and will yield relatively accurate estimates of the time-weighted TDEEave, EI, and %CR without the need for interim assessments. However, the accuracy of %CR estimates can be improved by 1) obtaining an M1 TDEE measurement in a 6-month intervention; and 2) obtaining an M6 TDEE measurement in a 12-month intervention. The findings from these analyses have significant relevance for methodology surrounding the DLW intake-balance method to accurately determine EI and %CR in behavioral weight loss trials with both diet and exercise prescriptions.
The four regression models developed in this study (Models 1 – 4) were assessed for their predictive performance using optimism-corrected R2 to account for potential overfitting. The models exhibited exceptionally low optimism values (ranging from 0.002 to 0.008) and high optimism-corrected R2 values (ranging from 0.911 to 0.982), suggesting strong predictive robustness and generalizability to external (i.e., real-world) datasets. While Model 4 (using M0, M6, and M12 TDEE assessment time points) exhibited the highest performance, all models showed minimal optimism, suggesting they are well-calibrated and robust against overfitting. Future studies should validate these models using additional external datasets or refine model specifications to ensure their predictive power holds true across different populations or study designs.
The previously published reference approach was developed using data from 6-month interventions (10); however, the accuracy of these models over longer intervention periods (e.g., 12 months) was unknown. Racette et al. used a subset of data collected from the CALERIE study, a controlled clinical trial designed to examine the effects of 25% CR over two years. In this subset, participants received food provisions and incentives to enhance and facilitate adherence to the 25% CR prescription, and no structured PA intervention was included. The researchers compared three quantification approaches to estimate average EI and %CR using different combinations of TDEE time points (i.e., M0, M1, M3, and M6) in conjunction with different methods assessing changes in energy stores (DXA) to a reference approach using four TDEE time points (i.e., M0, M1, M3, and M6). Specifically, they tested models with TDEE and DXA measured at 1) M0, M3, and M6, 2) M0, M1, and M6, and 3) M0 and M6. In the Racette et al. study, compared to the reference approach, exclusion of both interim TDEE measurements (i.e., M1 and M3) produced unacceptably high EI and low %CR. In contrast, in our study of adults with overweight or obesity, average EI and %CR agreed very well across all regression models, with Model 4 (i.e., M0, M6, and M12) demonstrating the best agreement. These differences may be due to the pragmatic design of the parent DRIFT trial, where food was not provided, dietary adherence was lower, and PA was increased, which may have attenuated the rapid TDEE decline at M1 observed in the Racette et al. subset. Moreover, Racette et al. studied individuals without obesity (BMI eligibility: 23.5–29.9 kg/m2), whereas DRIFT included participants with overweight or obesity (BMI eligibility: 27–46 kg/m2). The greater heterogeneity in our sample may explain the higher variability in EI, TDEE, body composition, and body weight. This distinction is an important strength of our study, as it suggests our models may be more generalizable to populations with overweight or obesity, a demographic more representative of individuals typically enrolled in behavioral weight loss interventions. Furthermore, the free-living nature of our intervention more accurately reflects real-world conditions than studies involving food provisions, thereby enhancing generalizability and translational relevance.
The time-weighted reference approach for estimating average 6-month TDEE, EI, and %CR in the current study included measurements of TDEE at three time points (6M-Ref; M0, M1, and M6 + Δ body composition). Compared to Model 1, using only M0 and M6 time points, the estimated average TDEE, EI, and %CR results were similar on a group level. However, in agreement with Racette et al., there was a mild indication of proportional (i.e., systematic) bias (p = 0.06) when comparing estimated %CR differences between Model 1 and 6M-Ref, indicating that the difference between the two approaches may increase or decrease with the magnitude of %CR on an individual level. The time-weighted reference approach for average 12-month TDEE, EI, and %CR included four time points (12M-Ref; M0, M1, M6, and M12 + Δ body composition). Compared to Models 2 – 4, using fewer time points, the results were similar on a group level. However, similar to Model 1, Model 2 showed significant systematic bias (p = 0.02) compared to 12M-Ref, indicating the accuracy of Model 2 depends on the magnitude of %CR on an individual level. This discrepancy has been observed in DLW studies, suggesting that inter-individual variability likely arises from the underlying assumptions of the DLW method (5). While the DLW method produces a high level of accuracy at the group level, it displays notable variability at the individual level (9). This suggests models relying solely on pre- and post-intervention TDEE measurements may over- or underestimate the degree of %CR and EI on an individual basis. Therefore, when individual-level accuracy is desired, an M1 TDEE measurement is recommended for 6-month interventions, and either an M1 or M6 TDEE measurement is recommended for 12-month interventions. However, given that the primary objective of clinical trials is to assess intervention efficacy based on group averages, the precision and reliability of group-level estimates take precedence. Therefore, on a group level, all four regression models may be used interchangeably to obtain accurate EI and %CR estimates using the DLW intake-balance method over 6 or 12 months.
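One common way to screen for proportional bias in a Bland-Altman comparison is to regress the paired differences on the paired means and examine the slope. The sketch below illustrates that idea under the assumption that this is what "Bland-Altman regression" refers to here; it is not the authors' code. The function name and data are hypothetical, and the p-value (which requires a t distribution with n − 2 degrees of freedom from a statistics package) is deliberately left out.

```python
from math import sqrt


def proportional_bias(model_values, reference_values):
    """Regress (model - reference) on the pair mean; return (slope, intercept, t)."""
    means = [(m + r) / 2 for m, r in zip(model_values, reference_values)]
    diffs = [m - r for m, r in zip(model_values, reference_values)]
    n = len(diffs)
    mean_x, mean_d = sum(means) / n, sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in means)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(means, diffs))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_x
    sse = sum((d - (intercept + slope * x)) ** 2 for x, d in zip(means, diffs))
    se_slope = sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    return slope, intercept, slope / se_slope


# Made-up %CR pairs in which the difference grows with the magnitude of %CR.
pair_means = [5.0, 10.0, 15.0, 20.0, 25.0]
pair_diffs = [-0.45, -0.05, 0.50, 1.05, 1.45]
model_cr = [x + d / 2 for x, d in zip(pair_means, pair_diffs)]
reference_cr = [x - d / 2 for x, d in zip(pair_means, pair_diffs)]
slope, intercept, t_stat = proportional_bias(model_cr, reference_cr)
print(round(slope, 3), round(intercept, 2))
```

A slope near zero would suggest that disagreement does not depend on the magnitude of %CR, while a clearly non-zero slope is the pattern described as proportional bias.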
A potential limitation of this study and previous studies is the absence of a true external dataset for model validation. Instead, we generated 2,000 bootstrap samples as a proxy to simulate external data, which is widely considered sufficient to provide a robust and reliable estimate of model performance (20, 21). We selected bootstrap re-sampling over leave-one-out cross-validation or data-splitting methods (e.g., 1/3 training, 2/3 testing) due to its superior ability to provide stable, low-variance estimates of model performance, particularly in studies with modest sample sizes. Unlike these methods, bootstrap validation leverages the entire dataset for both model development and evaluation while directly quantifying optimism, thereby offering a more reliable assessment of model generalizability and overfitting. While bootstrap results provide strong internal validity and confidence in our models’ predictive accuracy, future studies should externally validate these models in independent, more diverse cohorts for broader generalizability. This is particularly important because our sample was predominantly female (72%) and White (>85%), a composition similar to that of Racette et al. (75% female; race/ethnicity not reported). We formally tested for sex-by-treatment interaction effects and found no evidence of effect modification.
Similar to Racette et al., another limitation is the absence of a comparative measure of objective EI. Although the DLW method is widely regarded as the gold standard for quantifying TDEE in free-living individuals (22), it is subject to inherent analytical and physiological errors that cumulatively result in a margin of error of approximately ±5% (23), which may be exacerbated during weight loss or weight regain (24). To mitigate such limitations, this study adhered to rigorous protocols for DLW isotope administration, sample collection, and analysis. DXA-based body composition assessment introduces measurement variability, particularly when detecting small changes in FM over short intervals that approach the methodological resolution limit, potentially compromising EI estimates (9). However, this error decreases with longer measurement intervals and larger changes in body weight. In this study, the 12-month duration permitted more substantial alterations in body weight, thereby enhancing the fidelity of EI and %CR estimates derived from DLW and DXA (25). Additionally, TDEE was assessed over a 7-day interval, which, although at the lower bound of the optimal 1–3 biological half-lives of isotopes (typically 7–21 days in adults), aligns with methodological precedent and practical considerations. The International Atomic Energy Agency (IAEA) protocol confirms that a 7-day protocol corresponds to ~3 biological half-lives of 2H disappearance, supporting its application (26). TDEE was calculated using a fixed RQ of 0.85, reflecting typical substrate oxidation under mixed-diet conditions. This assumption may introduce minor errors but is unlikely to significantly affect estimates due to the modest caloric restriction in our sample. These models require validation in larger, more representative samples of participants undergoing both energy restriction and increased PA prescriptions. Although the sample was predominantly female (72% female), similar to Racette et al. (75% female), exploratory analyses did not indicate a systematic influence of sex on model performance.
An important methodological consideration in the interpretation of our findings, as well as that of Racette et al., is the known intra-individual variability associated with repeated TDEE measurements using the DLW method. Schoeller and colleagues have reported that the typical variation between repeated DLW assessments in weight-stable individuals is approximately ± 150 kcal/d, reflecting both biological variability and measurement error (27). This level of variability represents an inherent limitation in the precision of estimating TDEE, thereby potentially inflating the variability in estimating TDEEave, EI, and %CR. In the current analysis, the 95% limits of agreement for Model 3 and Model 4 fell within the expected measurement variability of ± 150 kcal/d compared to the reference approach. The 95% limits of agreement for Model 1 and Model 2 fell within ± 250 – 275 kcal/d compared to the reference approaches. Accordingly, a portion of the 95% limits of agreement observed in our Bland-Altman analyses can be attributed to the established measurement error inherent to the DLW method. Recognizing this limitation emphasizes the importance of developing alternative methods that reduce measurement frequency while preserving accuracy, thereby enabling broader application of the DLW intake-balance method in large-scale, resource-limited clinical trials. Given intra-individual variability in DLW-based TDEE estimates, integrating wearable device data (i.e., continuous heart rate and activity) may enhance precision and provide valuable behavioral context that could help explain day-to-day fluctuations in TDEE. Combined with machine learning, these measures enable individualized TDEE modeling while reducing reliance on repeated DLW assessments (28).
The use of the DLW method for estimating the time-weighted TDEEave and DXA-based measurement of ES for calculations of average EI and %CR is often cost-prohibitive in large-scale clinical trials due to the substantial cost, analytic resources, staff effort, and participant burden. To address this, our study aimed to develop a more cost-effective approach by estimating the time-weighted TDEEave, EI, and %CR over 6 and 12 months using fewer DLW data collection points than reference approaches including 1-month measurements. By eliminating interim time points, once thought necessary, these models perform similarly to the reference approaches on a group level and reduce the logistical and financial burden associated with the DLW intake-balance method. These new models may offer a practical solution for more efficient implementation of estimating average TDEE, EI, and %CR in large-scale clinical trials.
Conclusion
Regression models using fewer TDEE measurements to estimate the time-weighted average of TDEE during a 1-year behavioral weight loss intervention provide acceptable results for use in the DLW intake-balance method calculations of EI and %CR on a group level. On an individual basis, the inclusion of a 1-month or 6-month TDEE measure provided the most accurate %CR estimates compared to the reference approaches over 1 year, as these time points better capture dynamic changes in TDEE due to adaptive thermogenesis. These models may inform EI and %CR calculations using the DLW intake-balance method in weight loss trials with fewer TDEE measurements over a 1-year period.
Supplementary Material
Acknowledgements
EM, VC, DO, and SC designed research; VC, DO, SC, JD and MB conducted research; ZP and MB analyzed data and performed statistical analysis; and MB wrote the paper. EM had primary responsibility for final content. All authors read and approved the final manuscript.
Funding:
This work was supported by grants from the National Institutes of Health: R01 DK111622, P30 DK048520, UL1 TR002535.
Abbreviations:
- %CR: percent calorie restriction
- APE: atom percent
- Approx: approximation
- CCTSI: Colorado Clinical and Translational Sciences Institute
- CV: coefficient of variation
- DCR: daily caloric restriction
- DLW: doubly labeled water
- DXA: dual-energy x-ray absorptiometry
- EI: energy intake
- ES: energy stores
- FFM: fat free mass
- FM: fat mass
- IAEA: International Atomic Energy Agency
- IMF: intermittent fasting
- kcal: kilocalorie
- NORC: Nutrition Obesity Research Center
- OA-ICOS: off-axis integrated cavity output spectroscopy
- PA: physical activity
- Ref: reference
- TDEE: total daily energy expenditure
- TDEEave: average of total daily energy expenditure
Footnotes
Author Disclosures: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Declaration of Generative AI and AI-assisted technologies in the writing process: During the preparation of this work the author(s) used no AI-assisted technologies.
Clinical Trial Registry Number and Website Obtained: NCT03411356; https://clinicaltrials.gov/study/NCT03411356?term=NCT03411356&rank=1
Data Availability:
Data described in the manuscript, code book, and analytic code will be made available upon request pending application and approval.
References
- 1. Hill RJ, Davies PS. The validity of self-reported energy intake as determined using the doubly labelled water technique. Br J Nutr. 2001;85(4):415–30. doi: 10.1079/bjn2000281.
- 2. Dhurandhar NV, Schoeller D, Brown AW, Heymsfield SB, Thomas D, Sorensen TI, et al. Energy balance measurement: when something is not better than nothing. Int J Obes (Lond). 2015;39(7):1109–13. Epub 20141113. doi: 10.1038/ijo.2014.199.
- 3. Trabulsi J, Schoeller DA. Evaluation of dietary assessment instruments against doubly labeled water, a biomarker of habitual energy intake. Am J Physiol Endocrinol Metab. 2001;281(5):E891–9. doi: 10.1152/ajpendo.2001.281.5.E891.
- 4. Speakman J. Doubly labelled water: theory and practice. Springer Science & Business Media; 1997.
- 5. Lifson N, McClintock R. Theory of use of the turnover rates of body water for measuring energy and material balance. J Theor Biol. 1966;12(1):46–74.
- 6. Larsson CL, Johansson GK. Dietary intake and nutritional status of young vegans and omnivores in Sweden. Am J Clin Nutr. 2002;76(1):100–6. doi: 10.1093/ajcn/76.1.100.
- 7. Schoeller DA. The energy balance equation: looking back and looking forward are two very different views. Nutr Rev. 2009;67(5):249–54. doi: 10.1111/j.1753-4887.2009.00197.x.
- 8. Thomas DM, Schoeller DA, Redman LA, Martin CK, Levine JA, Heymsfield SB. A computational model to determine energy intake during weight loss. Am J Clin Nutr. 2010;92(6):1326–31. Epub 20101020. doi: 10.3945/ajcn.2010.29687.
- 9. de Jonge L, DeLany JP, Nguyen T, Howard J, Hadley EC, Redman LM, Ravussin E. Validation study of energy expenditure and intake during calorie restriction using doubly labeled water and changes in body composition. Am J Clin Nutr. 2007;85(1):73–9. doi: 10.1093/ajcn/85.1.73.
- 10. Racette SB, Das SK, Bhapkar M, Hadley EC, Roberts SB, Ravussin E, et al. Approaches for quantifying energy intake and %calorie restriction during calorie restriction interventions in humans: the multicenter CALERIE study. Am J Physiol Endocrinol Metab. 2012;302(4):E441–8. Epub 20111129. doi: 10.1152/ajpendo.00290.2011.
- 11. Rosenbaum M, Leibel RL. Adaptive thermogenesis in humans. Int J Obes (Lond). 2010;34 Suppl 1(0 1):S47–55. doi: 10.1038/ijo.2010.184.
- 12. Ostendorf DM, Caldwell AE, Zaman A, Pan Z, Bing K, Wayland LT, et al. Comparison of weight loss induced by daily caloric restriction versus intermittent fasting (DRIFT) in individuals with obesity: study protocol for a 52-week randomized clinical trial. Trials. 2022;23(1):718. Epub 20220829. doi: 10.1186/s13063-022-06523-2.
- 13. Catenacci VA, Ostendorf DM, Pan Z, Kaizer LK, Creasy SA, Zaman A, et al. The Effect of 4:3 Intermittent Fasting on Weight Loss at 12 Months: A Randomized Clinical Trial. Ann Intern Med. 2025. Epub 20250401. doi: 10.7326/ANNALS-24-01631.
- 14. Breit MJ, Duncan NM, Dahle JH, Catenacci VA, Creasy SA, Berman ES, et al. Improving quality control procedures for the measurement of total daily energy expenditure using the two-point doubly labeled water method. Rapid Commun Mass Spectrom. 2024;38(19):e9886. doi: 10.1002/rcm.9886.
- 15. Speakman JR, Yamada Y, Sagayama H, Berman ESF, Ainslie PN, Andersen LF, et al. A standard calculation methodology for human doubly labeled water studies. Cell Rep Med. 2021;2(2):100203. Epub 20210216. doi: 10.1016/j.xcrm.2021.100203.
- 16. Tataranni PA, Harper IT, Snitker S, Del Parigi A, Vozarova B, Bunt J, et al. Body weight gain in free-living Pima Indians: effect of energy intake vs expenditure. Int J Obes Relat Metab Disord. 2003;27(12):1578–83. doi: 10.1038/sj.ijo.0802469.
- 17. Pullar JD, Webster AJ. The energy cost of fat and protein deposition in the rat. Br J Nutr. 1977;37(3):355–63. doi: 10.1079/bjn19770039.
- 18. Steyerberg EW. Overfitting and optimism in prediction models. In: Clinical Prediction Models [Internet]. 1st Edition. New York, NY: Springer Nature; 2009. p. 83–100.
- 19. Strandberg R, Jepsen P, Hagstrom H. Developing and validating clinical prediction models in hepatology - An overview for clinicians. J Hepatol. 2024;81(1):149–62. Epub 20240324. doi: 10.1016/j.jhep.2024.03.030.
- 20. Hesterberg T. Bootstrap. WIREs Comp Stat. 2011;3(6):497–526. doi: 10.1002/wics.182.
- 21. Iba K, Shinozaki T, Maruo K, Noma H. Re-evaluation of the comparative effectiveness of bootstrap-based optimism correction methods in the development of multivariable clinical prediction models. BMC Med Res Methodol. 2021;21(1):9. Epub 20210107. doi: 10.1186/s12874-020-01201-w.
- 22.Standing Committee on the Scientific Evaluation of Dietary Reference Intakes SoI, Uses of Dietary Reference Intakes, Subcommittee on Upper Reference Levels of Nutrients, Panel on the Definition of Dietary Fiber, & Panel on Macronutrients. Dietary reference intakes for energy, carbohydrate, fiber, fat, fatty acids, cholesterol, protein, and amino acids: National Academies Press; 2005.
- 23. Trabulsi J, Troiano RP, Subar AF, Sharbaugh C, Kipnis V, Schatzkin A, Schoeller DA. Precision of the doubly labeled water method in a large-scale application: evaluation of a streamlined-dosing protocol in the Observing Protein and Energy Nutrition (OPEN) study. Eur J Clin Nutr. 2003;57(11):1370–7. doi: 10.1038/sj.ejcn.1601698.
- 24. Del Corral P, Chandler-Laney PC, Casazza K, Gower BA, Hunter GR. Effect of dietary adherence with or without exercise on weight loss: a mechanistic approach to a global problem. J Clin Endocrinol Metab. 2009;94(5):1602–7. Epub 20090303. doi: 10.1210/jc.2008-1057.
- 25. Westerterp KR, Donkers JH, Fredrix EW, Boekhoudt P. Energy intake, physical activity and body weight: a simulation model. Br J Nutr. 1995;73(3):337–47. doi: 10.1079/bjn19950037.
- 26. Prentice AM. The doubly labelled water method for measuring energy expenditure, technical recommendations for use in humans (IAEA-NAHRES-4). Vienna, Austria: International Atomic Energy Agency; 1990.
- 27. Schoeller DA, Hnilicka JM. Reliability of the doubly labeled water method for the measurement of total daily energy expenditure in free-living subjects. J Nutr. 1996;126(1):348S-54S.
- 28. O’Driscoll R, Turicchi J, Hopkins M, Horgan GW, Finlayson G, Stubbs JR. Improving energy expenditure estimates from wearable devices: A machine learning approach. J Sports Sci. 2020;38(13):1496–505. Epub 20200406. doi: 10.1080/02640414.2020.1746088.
