Author manuscript; available in PMC: 2024 Feb 1.
Published in final edited form as: Med Sci Sports Exerc. 2022 Sep 3;55(2):322–332. doi: 10.1249/MSS.0000000000003033

Validation, Recalibration, and Predictive Accuracy of Published V˙O2max Prediction Equations for Adults Ages 50 – 96

Benjamin T Schumacher 1, Chongzhi Di 2, John Bellettiere 1, Michael J LaMonte 3, Eleanor M Simonsick 4, Humberto Parada Jr 5,6, Steven P Hooker 7, Andrea Z LaCroix 1
PMCID: PMC9840647  NIHMSID: NIHMS1832964  PMID: 36069964

Abstract

Purpose:

Maximal oxygen uptake (V˙O2max) is the criterion measure of cardiorespiratory fitness (CRF). Lower CRF is a strong predictor of poor health outcomes, including all-cause mortality. Since V˙O2max testing is resource intensive, several non-exercise based V˙O2max prediction equations have been published. We assess these equations’ ability to predict measured V˙O2max, recalibrate these equations, and quantify the association of measured and predicted V˙O2max with all-cause mortality.

Methods:

Baltimore Longitudinal Study of Aging participants with valid V˙O2max tests were included (n=1,080). Using published V˙O2max prediction equations, we calculated predicted V˙O2max and present performance metrics before and after recalibration (deriving new regression estimates by regressing measured V˙O2max on BLSA covariates). Cox proportional hazards models were fit to quantify associations of measured, predicted, and recalibration-predicted values of V˙O2max with mortality.

Results:

Mean age and V˙O2max were 69.0±10.4 years and 21.6±5.9 mL•kg−1•min−1, respectively. The prediction equations yielded root mean squared error values ranging from 4.2 to 20.4 mL•kg−1•min−1. After recalibration, these values decreased to between 3.9 and 4.2 mL•kg−1•min−1. Adjusting for all covariates, all-cause mortality risk was 66% lower for the highest quartile of measured V˙O2max relative to the lowest. Predicted V˙O2max variables yielded similar estimates in unadjusted models but were not robust to adjustment.

Conclusions:

Measured V˙O2max is an extremely strong predictor of all-cause mortality. Several published V˙O2max prediction equations (1) yielded reasonable performance metrics relative to measured V˙O2max, especially when recalibrated, and (2) yielded all-cause mortality hazard ratios similar to those of measured V˙O2max, especially when recalibrated, yet (3) were not robust to adjustment for basic demographic covariates, likely because these covariates were used in the equations for predicted V˙O2max.

Keywords: CARDIORESPIRATORY FITNESS, AGING, EPIDEMIOLOGY, ASSESSMENT, RECALIBRATION

INTRODUCTION

The capacity of the circulatory and respiratory systems to deliver oxygen to skeletal muscles for use during physical activity and exercise can be quantified by one’s cardiorespiratory fitness (CRF) level (1). CRF is a physiological attribute determined by several factors including age, sex, health status, and genetics; however, the principal modifiable determinant is habitual physical activity (PA) level (1). Through increases in the frequency, duration, and intensity of PA, CRF can incrementally increase, especially among the sedentary, though CRF declines soon after the frequency, duration, and/or intensity of PA declines. Thus, CRF often is used as an objective surrogate of recent PA patterns. Decades of clinical, epidemiologic, and exercise science studies have reported that higher CRF is a strong and independent predictor of a myriad of beneficial health outcomes (2–4). Low CRF is among the strongest predictors of cardiovascular and all-cause mortality, with associations as strong as or stronger than those of smoking (5, 6). Likewise, higher CRF is associated with lower coronary heart disease/cardiovascular disease incidence and mortality (7–9), incidence of cardiometabolic risk factors (10, 11), cancer incidence and cancer mortality (12–14), dementias (15) including Alzheimer’s disease (16) and their progression, depression symptoms (17, 18), rates of loss of independence for older adults (19), and all-cause mortality (5, 6, 8, 20).

The gold standard measure of CRF is maximal oxygen uptake (V˙O2max) (1). In research settings, V˙O2max measurements are conducted using maximal graded exercise tests on a treadmill or stationary cycle ergometer and require specialized testing equipment, highly trained personnel, and direct physician supervision in most instances. Further, in vulnerable populations such as older adults, V˙O2max testing may be contraindicated as it requires maximal, strenuous activity to the point of absolute exhaustion. Thus, conducting direct measures of V˙O2max in large epidemiologic cohort studies is largely infeasible (21). As an alternative approach, several non-exercise based V˙O2max prediction equations have been published to enable the approximation of V˙O2max in a variety of settings, including large epidemiologic cohorts (7, 22–29). However, few equations have been developed specifically for use in older adult populations (25, 27). There is a critical need for accurate V˙O2max prediction models in older adults, given that by the year 2060, almost a quarter of the United States (U.S.) population will be composed of adults 65 years of age or older (i.e., older adults) (30) and V˙O2max has been identified as a hallmark biomarker of successful aging (31). Given the shifting demographics, the challenges older adults face with V˙O2max testing, and the benefits of increased CRF on health, we aimed to quantify the performance of published V˙O2max prediction models in relation to measured V˙O2max in the Baltimore Longitudinal Study of Aging (BLSA), recalibrate the equations to the BLSA cohort, and assess their predictive accuracy in relation to all-cause mortality.

METHODS

Study Participants

The analytic sample for the present study was derived from the BLSA, the longest running scientific study of aging (32, 33). The BLSA was established in 1958 and is conducted by the National Institute on Aging Intramural Research Program (34). BLSA participants have been asked to visit the BLSA testing facility every one to four years to undergo a three-day battery of health, cognitive, and functional evaluations. More than 3,000 participants have participated in the BLSA since its inception, and over 1,300 participants are still active (32). To date, 1,080 BLSA participants have had laboratory-based V˙O2max measurements that meet criteria for a maximal test. Extensive details about the design, recruitment, and measurements collected in the BLSA have been published elsewhere (33). This study was approved by the relevant Institutional Review Boards and all participants provided written informed consent.

Measures

V˙O2max Measurement

V˙O2max (measured in milliliters of O2 uptake / kilogram body weight / minute; mL•kg−1•min−1) was assessed in the BLSA using a modified Balke treadmill testing protocol (35, 36). This protocol consists of a graded exercise test: participants walk on a treadmill at a constant pace of 3.0 miles per hour (mph) for women and 3.5 mph for men, with the incline of the treadmill increasing 3% every 2 minutes until the participant indicates they have reached exhaustion. During this test, expired gas volumes were measured using a Parkinson-Cowan gas meter and concentrations of oxygen and carbon dioxide were measured using a medical mass spectrometer (Perkin-Elmer MGA-1110), which was calibrated daily using standard gases. A computerized interface between the gas meter and mass spectrometer calculated average expired gas concentrations every 30 seconds throughout the test, and the highest 30-second value for O2 uptake defined the participant’s V˙O2max.

Achievement of maximal effort during the treadmill test was defined as reaching a respiratory exchange ratio (RER) >1.0. Fifty-two participants had a V˙O2max test just below this RER cutoff when the treadmill test was stopped. Of these 52 participants, 11 achieved ≥ 85% of their age-predicted maximal heart rate (beats per minute, bpm; calculated as 220 – age) and reported a Borg rating of perceived exertion (RPE) ≥17 on a 20-point scale, so their tests were considered to reflect maximal effort and were included in the present analysis. Of the remaining 41 participants with an RER <1.0 at the time the treadmill was stopped, 31 were excluded because they had no other V˙O2max test that met the aforementioned maximal effort criteria, and 10 participants were included who provided a subsequent V˙O2max test that fit the criteria for a maximal test, resulting in a final analytic sample of 1,080. For participants with multiple V˙O2max measurements, the first measurement satisfying these criteria was used in the present study.
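The inclusion rule described above can be sketched as a small function. This is an illustrative sketch only, not the authors' code: the function and argument names are hypothetical, and because the text does not define how far below the cutoff "just below" extends, the fallback criteria are applied here to any test with RER at or below 1.0.

```python
def is_maximal_test(rer, peak_hr_bpm, age_years, borg_rpe):
    """Return True if a treadmill test would count as a maximal effort.

    Hypothetical helper: names and the handling of 'just below' the RER
    cutoff are assumptions, not taken from the BLSA protocol.
    """
    # Primary criterion: respiratory exchange ratio above 1.0.
    if rer > 1.0:
        return True
    # Fallback criteria: at least 85% of age-predicted maximal heart rate
    # (220 - age) AND a Borg rating of perceived exertion of 17 or more.
    age_predicted_max_hr = 220 - age_years
    reached_heart_rate = peak_hr_bpm >= 0.85 * age_predicted_max_hr
    return reached_heart_rate and borg_rpe >= 17


# Example: a 70-year-old whose test stopped at RER 0.98.
print(is_maximal_test(rer=0.98, peak_hr_bpm=140, age_years=70, borg_rpe=17))
```

For a participant with several tests, the first test for which this check passes would be the one carried into the analysis, per the rule stated above.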

Non-Exercise Test V˙O2max Prediction Models

Google Scholar was used to query previously published studies using the terms “non-exercise based V˙O2max prediction models” and “older adults”, yielding a total of 12 V˙O2max prediction equations from nine published studies that were assessed in the present study. Studies were excluded if their V˙O2max prediction equations were derived solely for use in younger populations, used any form of exercise testing or physical performance as a predictor of V˙O2max, or included variables not available in the BLSA. Each prediction equation included sex, age, and some measure of body mass. Some equations additionally included variables such as self-reported PA scores, smoking history, height, and resting heart rate. In the present analysis, covariates in the published V˙O2max prediction equations were matched with their closest equivalent covariate in the BLSA.

Outcome Ascertainment

All-cause mortality status and date of death were ascertained by linking participants to the National Death Index, a centralized database of death record information compiled from state vital statistics records, and by correspondence from relatives (37). Follow-up for mortality occurred from first V˙O2max test date, the earliest of which was January 1st, 2007, through April 15th, 2021. Mortality ascertainment was high with 96% of participants having a classified vital status. Over a median follow-up time of 9.6 years (range: 0.60 – 14.1 years), 141 participants died from any cause.

Covariates

Covariates for the V˙O2max prediction equations or their closest approximations in the BLSA included participant’s sex, age, body mass index (BMI), resting heart rate, self-reported PA/exercise level, self-rated general health status, and smoking history. In the BLSA, a participant’s sex and age were self-reported during each health history interview. Height and weight were measured using a stadiometer and calibrated scale, respectively, and BMI was calculated as weight in kilograms divided by height in meters squared. Resting heart rate was assessed by a nurse after the participant had been sitting quietly for at least 5 minutes (38). Participants were asked how much time they spent each week engaging in weight/circuit training, moderate-to-high intensity exercise, or brisk walking which was then categorized as: 0–29 (coded as 0), 30–74 (1), 75–149 (2), or ≥150 (3) minutes. Health-related quality-of-life was assessed using the 12-item short form health survey (SF-12) (39). Smoking history (never, current, or former smoker) was self-reported using a standardized questionnaire (40). The following covariates were not used in any V˙O2max prediction models, but were employed in the description of the study sample: self-reported race (White, Black, Asian/Other Pacific Islander, Other/not classifiable), self-reported educational attainment (non-college graduate, college graduate, post-college graduate), beta blocker use (yes or no), systolic and diastolic blood pressures (mmHg; oscillometric brachial blood pressure was measured with the participant in a supine position on both arms three times and the minimum systolic and diastolic blood pressures were used).

Statistical Analysis

We compared covariates by sex-specific quartiles of measured V˙O2max using chi-square tests for categorical variables and analysis of variance (ANOVA) tests for continuous variables.

Predicted V˙O2max was calculated using each V˙O2max prediction equation as originally published. The performance (ability to accurately predict measured V˙O2max) of each equation was evaluated by comparing the predicted V˙O2max to the measured V˙O2max using the root mean square error (RMSE), bias, mean absolute percentage error (MAPE), the Bland-Altman 95% Limits of Agreement (LOA) (41), correlation coefficients, and R2. These analyses were conducted in the overall sample and within sex strata. In brief, RMSE quantifies the concentration of the data around the line of best fit: the difference between each pair of predicted and measured V˙O2max values is squared, the mean of these squared differences is taken, and the square root of that mean is obtained. Bias was computed as the mean of the measured V˙O2max minus predicted V˙O2max differences. MAPE was computed as the mean of the absolute value of the percent deviation of the predicted V˙O2max from the measured V˙O2max. The closer the RMSE, bias, and MAPE are to 0, the better the performance of the prediction model, with 0 indicating perfect prediction of the measured V˙O2max. The calculation for the Bland-Altman 95% LOA has been described elsewhere, but these limits are expected to capture 95% of the differences between measured and predicted V˙O2max; a narrower range of limits indicates a better prediction (41). The Bland-Altman 95% LOA were obtained using the blandr package in R (42).
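The metrics defined above can be sketched in a few lines. This is an illustration rather than the authors' R code: the function name is hypothetical, and the limits of agreement use the conventional bias ± 1.96 × sample SD of the differences, which is assumed (not confirmed) to match the blandr output.

```python
import math
from statistics import mean, stdev


def performance_metrics(measured, predicted):
    """RMSE, bias, MAPE, and Bland-Altman 95% limits of agreement.

    Differences are taken as measured minus predicted, matching the
    definition of bias given in the text.
    """
    diffs = [m - p for m, p in zip(measured, predicted)]
    rmse = math.sqrt(mean([d * d for d in diffs]))
    bias = mean(diffs)
    # Percent deviation of each prediction from its measured value.
    mape = 100 * mean([abs(d) / m for d, m in zip(diffs, measured)])
    # Assumed convention: bias +/- 1.96 * sample SD of the differences.
    sd = stdev(diffs)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)
    return {"rmse": rmse, "bias": bias, "mape": mape, "loa": loa}


# Toy values in mL/kg/min (made up for illustration, not BLSA data).
print(performance_metrics([20.0, 25.0, 30.0], [18.0, 26.0, 27.0]))
```

Because bias is a signed mean, it can be near zero even when individual errors are large, which is why it is read alongside RMSE and MAPE.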

Because the accuracy of each V˙O2max prediction equation is strongly influenced by the distribution of covariates and measured V˙O2max in the source population from which the equation was derived, the application of a prediction equation from one population to another can affect predictive accuracy. Therefore, each V˙O2max prediction equation was recalibrated by regressing measured V˙O2max in the BLSA on the BLSA covariates representing those used in each prediction equation. With recalibration, the regression coefficients for each covariate in relation to measured V˙O2max derive fully from the BLSA, as opposed to applying regression weights calculated in a different population to BLSA covariates. Recalibration has been used in other settings to evaluate accuracy of prediction equations when transported from the source to other populations (43). Residuals vs. Fitted, Normal Q-Q, Scale-Location, and Residuals vs. Leverage plots were used to assess model diagnostics of the recalibrated V˙O2max prediction equations (44). After evaluation of all recalibrated equations, their predicted V˙O2max values were output. Performance metrics for the recalibrated equations included the same metrics as described above for evaluation of the original equations, as well as the 10-fold cross-validation (CV) RMSE and R2 values.
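Recalibration as described above amounts to refitting a linear regression on the new cohort's data and checking it with cross-validation. The sketch below is a dependency-free illustration under stated assumptions: the function names are hypothetical, ordinary least squares is solved through the normal equations, and the cross-validated RMSE pools squared errors across folds (the authors' R workflow may instead average per-fold RMSE values).

```python
import math
import random


def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, X):
    """Apply fitted coefficients to rows of covariates."""
    return [coefs[0] + sum(b * x for b, x in zip(coefs[1:], r)) for r in X]


def cv_rmse(X, y, folds=10, seed=0):
    """k-fold cross-validated RMSE, pooling squared errors over all folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    sq_err = []
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        coefs = fit_ols([X[i] for i in train], [y[i] for i in train])
        preds = predict(coefs, [X[i] for i in test])
        sq_err += [(y[i] - p) ** 2 for i, p in zip(test, preds)]
    return math.sqrt(sum(sq_err) / len(sq_err))


# Synthetic rows of (age, BMI, sex); values are made up, not BLSA data.
X = [[50 + i, 20 + (i * 7) % 11, i % 2] for i in range(40)]
y = [60 - 0.3 * a - 0.5 * b + 4.8 * s for a, b, s in X]
print(fit_ols(X, y), cv_rmse(X, y))
```

The covariate set stays the same as in the published equation; only the weights change, which is why every recalibrated equation has zero bias in the sample it was fit on.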

To further evaluate the validity of predicted V˙O2max values, sequentially adjusted Cox proportional hazards regression models were used to estimate the associations between quartiles of V˙O2max (measured V˙O2max, predicted V˙O2max, and the recalibration-predicted V˙O2max) and all-cause mortality. Model 1 was unadjusted, Model 2 adjusted for age and sex, and Model 3 adjusted for Model 2 covariates plus race and ethnicity and education. Linear trends across quartiles (P-value for Trend) were tested by specifying the quartile indicator in the model as a continuous variable. Associations between a one standard deviation increase in each V˙O2max variable and all-cause mortality were also assessed using the same modeling approach, and the P-value for the centered and scaled V˙O2max variable in the model is presented. The concordance statistic (C-Statistic), the proportion of pairs of participants where the model correctly predicts which participant will experience a mortality event first, is also presented. Tests of the proportional hazards assumption were conducted using the cox.zph function of the survival package (45) in R, which tests the correlation of each covariate’s (and the whole model’s) scaled Schoenfeld residuals with time to ensure independence between the residuals and time; no violations were noted. Variance inflation factors (VIF) were used to assess multicollinearity between independent variables; no values were outside the range of 0.25 – 4 (46).
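The concordance statistic defined above can be illustrated with a direct pairwise count. This is a simplified sketch of Harrell's C with hypothetical names: a pair is usable only when the participant with the shorter follow-up had an observed death, ties in risk score count as half, and pairs with tied times are skipped. The survival package in R handles ties and weighting in more detail.

```python
def concordance(times, events, risk):
    """Share of usable pairs in which the higher risk score belongs to
    the participant who died first (simplified Harrell's C)."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Usable pair: i has an observed death strictly before j's time.
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


# Toy follow-up times in years, death indicators, and model risk scores.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
print(concordance(times, events, risk=[0.9, 0.7, 0.5, 0.1]))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of who dies first.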

All analyses were conducted in R version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria).

RESULTS

Sample Characteristics

The 565 women and 515 men with measured V˙O2max included in this study had a mean age, BMI, and V˙O2max of 69.0 ± 10.4 years, 27.0 ± 4.4 kg/m2, and 21.6 ± 5.9 mL•kg−1•min−1, respectively (Table 1). Two-thirds of study participants were non-Hispanic White, one-fourth were non-Hispanic Black, 4.6% were non-Hispanic Asian, 3.2% were Hispanic, and the remaining 0.7% were from other non-Hispanic race/ethnicity groups or could not be classified. The majority of the sample (61.9%) had a post-college education. Current smoking prevalence was 1.8% and mean systolic and diastolic blood pressures were 114.1 ± 14.1 and 66.7 ± 8.8 mmHg, respectively. Age, BMI, current smoking, and systolic blood pressure were inversely related with incremental quartiles of measured V˙O2max, whereas education, self-reported exercise, self-rated health status, and diastolic blood pressure were positively related with V˙O2max (see Table 1).

Table 1.

Characteristics of BLSA participants overall and according to quartiles of measured V˙O2max (n = 1,080)

Measured V˙O2max
Characteristic Total (n = 1,080) Quartile 1†† (n = 270) Quartile 2†† (n = 277) Quartile 3†† (n = 265) Quartile 4†† (n = 268) P-value*
Age, Mean (SD) 69.0 (10.4) 75.5 (8.8) 72.1 (9.7) 67.3 (8.9) 60.9 (8.2) < 0.01
Race, n (%) < 0.01
  non-Hispanic, White 708 (65.6) 169 (62.6) 177 (63.9) 177 (66.8) 185 (69.0)
  non-Hispanic, Black 279 (25.8) 87 (32.2) 82 (29.6) 60 (22.6) 50 (18.7)
  non-Hispanic, Asian/Other Pacific Islander 50 (4.6) 8 (3.0) 9 (3.2) 14 (5.3) 19 (7.1)
  Hispanic 35 (3.2) 4 (1.5) 6 (2.2) 11 (4.2) 14 (5.2)
  non-Hispanic, Other/not classifiable 8 (0.7) 2 (0.7) 3 (1.1) 3 (1.1) 0 (0.0)
Highest attained education, n (%) < 0.01
  Post college 669 (61.9) 152 (56.3) 168 (60.6) 169 (63.8) 180 (67.2)
  College 225 (20.8) 51 (18.9) 53 (19.1) 57 (21.5) 64 (23.9)
  Non-college graduate 183 (16.9) 67 (24.8) 56 (20.2) 39 (14.7) 21 (7.8)
BMI (kg/m2), Mean (SD) 27.0 (4.4) 28.9 (4.7) 27.4 (4.6) 26.6 (4.1) 24.9 (3.4) < 0.01
Beta Blocker Use, n (%) 152 (14.1) 78 (28.9) 39 (14.1) 22 (8.3) 13 (4.9) < 0.01
Minutes of Exercise, n (%) < 0.01
  0-29 465 (43.1) 171 (63.3) 127 (45.8) 93 (35.1) 74 (27.6)
  30 - 74 169 (15.6) 36 (13.3) 48 (17.3) 33 (12.5) 52 (19.4)
  75 - 149 165 (15.3) 25 (9.3) 42 (15.2) 52 (19.6) 46 (17.2)
  150+ 272 (25.2) 36 (13.3) 59 (21.3) 84 (31.7) 93 (34.7)
Self Rated Health, n (%) < 0.01
  Excellent 339 (31.4) 43 (15.9) 84 (30.3) 90 (34.0) 122 (45.5)
  Very Good/Good 715 (66.2) 219 (81.1) 185 (66.8) 170 (64.2) 141 (52.6)
  Fair/Poor 14 (1.3) 5 (1.9) 6 (2.2) 2 (0.8) 1 (0.4)
Systolic BP (mmHg), Mean (SD) 114.1 (14.1) 117.3 (14.8) 116 (13.3) 113 (13.7) 110.2 (13.3) < 0.01
Diastolic BP (mmHg), Mean (SD) 66.7 (8.8) 65 (8.4) 66.3 (9.3) 66.9 (8.6) 68.5 (8.5) < 0.01
Smoking Status, n (%) < 0.01
  Never 682 (63.1) 149 (55.2) 169 (61.0) 180 (67.9) 184 (68.7)
  Former 372 (34.4) 112 (41.5) 103 (37.2) 83 (31.3) 74 (27.6)
  Current 19 (1.8) 7 (2.6) 4 (1.4) 1 (0.4) 7 (2.6)
Maximal Exercise Test
  V˙O2max (mL/kg/min), Mean (SD) 21.6 (5.9) 15.5 (2.5) 19.8 (2.1) 23.5 (2.2) 28.8 (4.5) < 0.01
  Respiratory Exchange Ratio, Mean (SD) 3.3 (68.1) 1.2 (0.1) 1.2 (0.1) 1.2 (0.1) 9.5 (136.8) 0.18
  Borg Score, Mean (SD) 16.5 (1.7) 16.1 (1.7) 16.2 (1.7) 16.7 (1.7) 17 (1.6) < 0.01
  % of Max. Predicted HR, Mean (SD)** 98.8 (50.2) 89.6 (13.3) 97.5 (9.3) 100.3 (8.5) 107.8 (98.3) < 0.01

Percentages may not sum to 100% due to missing data.

*

P-value for continuous variables from one-way ANOVA and for categorical variables from Chi-square goodness of fit test, across V˙O2max quartiles.

**

Maximum predicted heart rate: 220 - age

Bold indicates significance at the P < 0.05 level.

††

Sex-specific quartile definitions were as follows:

Q1: Men : < 19.9; n = 129 & Women: < 16.5; n = 141

Q2: Men : ≥ 19.9 & ≤ 23.7; n = 131 & Women: ≥ 16.5 & ≤ 19.9; n = 146

Q3: Men : > 23.7 & ≤ 27.4; n = 128 & Women: > 19.9 & ≤ 23.7; n = 137

Q4: Men : > 27.4; n = 127 & Women: > 23.7; n = 141
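The sex-specific cutpoints listed above translate directly into a lookup. The sketch below is illustrative; the function and dictionary names are hypothetical, and the boundary handling follows the inequalities exactly as printed in the footnote.

```python
# Cutpoints in mL/kg/min, copied from the Table 1 footnote.
CUTPOINTS = {"men": (19.9, 23.7, 27.4), "women": (16.5, 19.9, 23.7)}


def vo2max_quartile(vo2max, sex):
    """Return the sex-specific quartile (1-4) of a measured VO2max value."""
    c1, c2, c3 = CUTPOINTS[sex]
    if vo2max < c1:
        return 1
    if vo2max <= c2:  # Q2 includes both of its endpoints
        return 2
    if vo2max <= c3:
        return 3
    return 4


# Example: the same measured value falls in different quartiles by sex.
print(vo2max_quartile(21.6, "men"), vo2max_quartile(21.6, "women"))
```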

V˙O2max Prediction Equations

When each prediction equation (Table 2) was used to estimate V˙O2max in the BLSA sample, the lowest and highest RMSE values (in units of mL•kg−1•min−1) of the V˙O2max prediction equations were 4.2 (Bradshaw et al.’s (22) equation) and 20.4 (Jang et al. (28)), respectively (see Table 3). The absolute value of the bias (also in mL•kg−1•min−1) ranged from 0.1 (Matthews et al. (25)) to 19.3 (Jang et al. (28)). Bradshaw et al. (22) had the lowest MAPE value (15.4%) and Jang et al. (28) had the highest MAPE value (97.7%).

Table 2.

Extant V˙O2max Prediction Equations, Adaptations for the BLSA, and the Recalibrated Formulas

Study Prediction Equation from Study Variable Definitions Variable Adaptations
Jurca et al; NASA Original: 18.07 + 2.77(Sex) − 0.10(Age) − 0.17(BMI) − 0.03(Resting HR) + 0.32(SRPA1) + 1.06(SRPA2) + 1.76(SRPA3) + 3.03(SRPA4)
Recalibrated: 58.64 + 4.62(Sex) − 0.31(Age) − 0.47(BMI) − 0.04(Resting HR) − 2.39(exercise0) − 1.73(exercise1) − 1.15(exercise2)
Sex coded as Men = 1 and Women = 0. SRPA0 = Little activity other than walking for pleasure (Ref.). SRPA1 = Some regular participation in modest physical activities involving sports, recreational activities. SRPA2 = Aerobic exercise such as run/walk for 20 to 60 minutes per week. SRPA3 = Aerobic exercise such as run/walk for 1 to 3 hours per week. SRPA4 = Aerobic exercise such as run/walk for > 3 hours per week. BLSA participants self-reported the weekly minutes of weight/circuit training, moderate-to-high intensity exercise, and/or brisk walking and were categorized into 0 - 29 minutes (coded as 0), 30 – 74 minutes (1), 75 – 149 minutes (2), and 150+ minutes (3). Dummy variables were created and exercise0 = SRPA1, exercise1 = SRPA2, exercise2 = SRPA3, and exercise3 = SRPA4.
Jurca et al; ACLS Original: 18.81 + 2.49(Sex) − 0.08(Age) − 0.17(BMI) − 0.05(Resting HR) + 0.81(SRPA1) + 1.17(SRPA2) + 2.16(SRPA3) + 3.05(SRPA4)
Recalibrated: 58.64 + 4.62(Sex) − 0.31(Age) − 0.47(BMI) − 0.04(Resting HR) − 2.39(exercise0) − 1.73(exercise1) − 1.15(exercise2)
Sex coded as Men = 1 and Women = 0. SRPA0 = No activity (Ref.). SRPA1 = Participated in sporting or leisure-time physical activity other than walking, jogging, or running. SRPA2 = Walk, jog, or run up to 10 miles per week. SRPA3 = Walk, jog, or run from 10 to 20 miles per week. SRPA4 = Walk, jog, or run > 20 miles per week. BLSA participants self-reported the weekly minutes of weight/circuit training, moderate-to-high intensity exercise, and/or brisk walking and were categorized into 0 - 29 minutes (coded as 0), 30 – 74 minutes (1), 75 – 149 minutes (2), and 150+ minutes (3). Dummy variables were created and exercise0 = SRPA1, exercise1 = SRPA2, exercise2 = SRPA3, and exercise3 = SRPA4.
Jurca et al; ADNFS Original: 21.41 + 2.78(Sex) − 0.11(Age) − 0.17(BMI) − 0.05(Resting HR) + 0.35(SRPA1) + 0.29(SRPA2) + 0.64(SRPA3) + 1.21(SRPA4)
Recalibrated: 58.64 + 4.62(Sex) − 0.31(Age) − 0.47(BMI) − 0.04(Resting HR) − 2.39(exercise0) − 1.73(exercise1) − 1.15(exercise2)
Sex coded as Men = 1 and Women = 0. SRPA0 = From 0 to 4 occasions of at least moderate activity in past 4 weeks (Ref.). SRPA1 = From 5 to 11 occasions of at least moderate activity in past 4 weeks. SRPA2 = G.E. 12 occasions of moderate activity in past 4 weeks. SRPA3 = G.E. 12 occasions of a mix of moderate and vigorous activities in past 4 weeks. SRPA4 = G.E. 12 occasions of vigorous activity in past 4 weeks. BLSA participants self-reported the weekly minutes of weight/circuit training, moderate-to-high intensity exercise, and/or brisk walking and were categorized into 0 - 29 minutes (coded as 0), 30 – 74 minutes (1), 75 – 149 minutes (2), and 150+ minutes (3). Dummy variables were created and exercise0 = SRPA1, exercise1 = SRPA2, exercise2 = SRPA3, and exercise3 = SRPA4.
Bradshaw et al Original: 48.073 + 6.178(Sex) − 0.246(Age) − 0.619(BMI) + 0.712(PFA) + 0.671(PA-R)
Recalibrated: 46.61 + 4.82(Sex) − 0.31(Age) − 0.46(BMI) + 1.58(sfhealth) + 0.57(exercise)
Sex coded as Men = 1 and Women = 0. PFA = two questions that ascertain how fast participants feel they can cover a 1- and 3-mile distance at a comfortable pace. The participant’s sum total of both 13-point questions is counted as the PFA score (range: 2–26). PA-R = individuals rate their activity level over the previous 6 months using a 10-point scale. Reverse coded SF-12 self-rated health score was used in lieu of PFA (5 = “Excellent”, 4 = “Very Good”, 3 = “Good”, 2 = “Fair”, 1 = “Poor”). Used the aforementioned exercise variable in lieu of PA-R.
Jackson et al Original: 56.363 + 1.921(PA-R) − 0.381(Age) − 0.754(BMI) + 10.987(Sex)
Recalibrated: 53.89 + 0.74(exercise) − 0.31(Age) − 0.5(BMI) + 4.7(Sex)
Sex coded as Men = 1 and Women = 0. SRPA0 = Little activity other than walking for pleasure (Ref.). PA-R from Jurca et al. NASA equation. Used the aforementioned exercise variable in lieu of PA-R.
Matthews et al Original: 34.142 + 11.403(Sex) + 0.133(Age) − (0.005(Age*Age)) + 1.463(PAS) + 9.170*(ht meters) − 0.254(Body mass in kg)
Recalibrated: 32.58 + 5.15(Sex) − 0.32(Age) + 0.74(exercise) + 12.97(ht meters) − 0.18(wtkg)
Sex coded as Men = 1 and Women = 0. PAS = Physical Activity Status (0 - 7); This instrument has subjects rate their last month of physical activity participation on a 0-7 scale. Responses of 0 and 1 represented no regular physical activity, whereas a response of 2 or 3 represented moderate intensity activities, and responses of 4 to 7 represented regular vigorous physical activity participation of increasing exercise time. Used the aforementioned exercise variable in lieu of PAS.
Sloan et al Original: Male w/ HR: 52.23 − 0.20(Age) − 0.35(BMI) − 0.06(HRrest/min) + 2.05(Physical activity score)
Original: Female w/ HR: 47.79 − 0.21(Age) − 0.35(BMI) − 0.06(HRrest/min) + 2.08(Physical activity score)
Original: Male w/o HR: 49.9 − 0.21(Age) − 0.36(BMI) + 2.12(Physical activity score)
Original: Female w/o HR: 43.27 − 0.22(Age) − 0.37(BMI) + 2.17(Physical activity score)
Recalibrated w/ HR: 56.06 + 4.62(Sex) − 0.31(Age) − 0.47(BMI) − 0.04(Resting HR) + 0.78(exercise)
Recalibrated w/o HR: 53.89 + 4.7(Sex) − 0.31(Age) − 0.5(BMI) + 0.74(exercise)
Physical activity score: In accordance with the procedure outlined by Jurca et al, participants were asked to select one of the five levels of self-reported physical activity that best described their usual activity pattern: (a) level 0 – inactive or little activity other than usual daily activities; (b) level 1 – regular (> 5 days/wk) participation in physical activities requiring low levels of exertion that result in slight increases in breathing and heart rate for at least 10 mins at a time; (c) level 2 – participation in aerobic exercises such as brisk walking, jogging or running, cycling, swimming or vigorous sports at a comfortable pace, or other activities requiring similar levels of exertion for 20–60 mins/wk; (d) level 3 – participation in aerobic exercises such as brisk walking, jogging or running at a comfortable pace, or other activities requiring similar levels of exertion for 1–3 hrs/wk; and (e) level 4 – participation in aerobic exercises such as brisk walking, jogging or running at a comfortable pace, or other activities requiring similar levels of exertion for > 3 hrs/wk. Used the aforementioned exercise variable in lieu of physical activity score.
de Souza e Silva et al Original: 44.74 − 10.9(Sex) − 0.35(Age) − 0.15(Weight pounds) + 0.68(Height inches); treadmill constant incorporated into the intercept.
Recalibrated: 44.3 − 5.31(Sex) − 0.33(Age) − 0.09(Weight pounds) + 0.35(Height inches)
Sex coded as Men = 1 and Women = 2. -
Baynard et al Original: 77.96 − 10.35(Sex) − 0.32(Age) − 0.92(BMI)
Recalibrated: 61.45 − 4.82(Sex) − 0.33(Age) − 0.54(BMI)
Sex coded as Men = 0 and Women = 1. -
Jang et al Original: 50.543 − 0.069(Age) + 13.525(Sex) − 0.403(BMI) − 1.530(Smoking)
Recalibrated: 56.58 − 0.33(Age) + 4.85(Sex) − 0.53(BMI) − 1.49(Smoking)
Smoking: 0 = never or former, 1 = current. -
Myers et al Original: 79.9 − 0.39(Age) − 13.7(Sex) − 0.127(wt lbs)
Recalibrated: 62.71 − 0.35(Age) − 6.85(Sex) − 0.08(wt lbs)
Sex coded as Men = 0 and Women = 1. -
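As a worked example of how the original and recalibrated coefficients in Table 2 differ in practice, the sketch below applies both versions of the Baynard et al. equation to one hypothetical person. The function names and the example person are illustrative assumptions; the coefficients and the sex coding (Men = 0, Women = 1) are copied from the table.

```python
def baynard_original(age, bmi, female):
    """Baynard et al. equation as published (female: 1 = woman, 0 = man)."""
    return 77.96 - 10.35 * female - 0.32 * age - 0.92 * bmi


def baynard_recalibrated(age, bmi, female):
    """Same covariates with the BLSA-recalibrated coefficients from Table 2."""
    return 61.45 - 4.82 * female - 0.33 * age - 0.54 * bmi


# Hypothetical participant: a 69-year-old woman with a BMI of 27.0 kg/m2.
original = baynard_original(age=69, bmi=27.0, female=1)
recalibrated = baynard_recalibrated(age=69, bmi=27.0, female=1)
print(round(original, 2), round(recalibrated, 2))  # about 20.69 and 19.28
```

The two versions use identical inputs; the gap between their outputs reflects only the shift in regression weights from the source population to the BLSA.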

Table 3.

Performance Metrics for Previously Published V˙O2max Prediction Equations Compared to Measured V˙O2max in the BLSA

Formula RMSE Bias MAPE LOA Correlation w/Measured V˙O2max R2
Jurca et al; NASA 15.8 15.1 67.8 (5.7, 24.5) 0.68 0.46
    Male 16.8 15.8 63.5 (4.9, 26.7) 0.60 0.36
    Female 15.0 14.5 71.2 (6.7, 22.3) 0.67 0.45
Jurca et al; ACLS 15.1 14.3 63.5 (4.6, 23.9) 0.66 0.44
    Male 16.1 15.1 60.2 (4.0, 26.2) 0.58 0.34
    Female 14.2 13.6 66.2 (5.5, 21.6) 0.64 0.41
Jurca et al; ADNFS 15.4 14.7 65.4 (5.1, 24.2) 0.69 0.48
    Male 16.4 15.4 61.5 (4.4, 26.4) 0.66 0.44
    Female 14.6 14.1 68.6 (6.0, 22.1) 0.68 0.46
Bradshaw et al 4.2 1.0 15.4 (−7.1, 9.0) 0.72 0.52
    Male 4.6 0.2 15.7 (−8.8, 9.2) 0.67 0.45
    Female 3.8 1.7 15.2 (−5.1, 8.5) 0.74 0.55
Jackson et al 7.2 4.7 29.4 (−6.0, 15.4) 0.69 0.48
    Male 5.1 1.5 17.1 (−8.0, 11.1) 0.64 0.41
    Female 8.7 7.6 40.5 (−0.7, 15.9) 0.73 0.53
Matthews et al 5.3 −0.1 20.7 (−10.4, 10.2) 0.72 0.52
    Male 5.6 −2.3 20.1 (−12.2, 7.6) 0.67 0.45
    Female 5.0 1.9 21.2 (−7.2, 10.9) 0.72 0.52
Sloan et al; HR 5.1 −2.3 21.0 (−11.2, 6.6) 0.67 0.45
    Male 6.0 −2.9 23.2 (−13.1, 7.4) 0.58 0.34
    Female 4.2 −1.8 19.2 (−9.4, 5.7) 0.66 0.44
Sloan et al; No HR 5.3 −2.3 21.3 (−11.6, 7.0) 0.65 0.42
    Male 6.4 −4.0 26.0 (−13.9, 5.8) 0.57 0.32
    Female 3.9 −0.8 17.0 (−8.3, 6.8) 0.67 0.45
de Souza e Silva et al 5.4 −1.6 21.4 (−11.8, 8.5) 0.67 0.45
    Male 6.6 −4.4 26.3 (−14.1, 5.2) 0.61 0.37
    Female 4.1 0.9 16.9 (−6.9, 8.7) 0.71 0.50
Baynard et al 6.3 −3.6 25.7 (−13.8, 6.6) 0.66 0.44
    Male 8.0 −6.4 33.7 (−15.9, 3.2) 0.61 0.37
    Female 4.2 −1.1 18.4 (−9.0, 6.9) 0.70 0.49
Jang et al 20.4 −19.3 97.7 (−32.5, −6.1) 0.44 0.19
    Male 24.8 −24.2 113.3 (−35.1, −13.2) 0.45 0.20
    Female 15.4 −14.8 83.4 (−23.0, −6.7) 0.61 0.37
Myers et al 5.7 −2.3 22.2 (−12.4, 7.8) 0.66 0.44
    Male 7.0 −5.0 28.4 (−14.6, 4.5) 0.61 0.37
    Female 4.0 0.2 16.6 (−7.7, 8.0) 0.68 0.46

Bold indicates significance at the P < 0.01 level.

After recalibration of the equations to the BLSA data, every equation improved on all performance metrics (see Tables 3 and 4). The recalibrated formulas’ cross-validated RMSE values ranged from 3.9 (Bradshaw et al. (22)) to 4.2 (Myers et al. (7)) and, as expected, all bias values were 0. MAPE values were similar across the recalibrated prediction equations, ranging from 14.4% (Bradshaw et al. (22)) to 15.7% (Myers et al. (7)). The R2 for the recalibrated equations ranged from 49% (Myers et al. (7)) to 58% (Bradshaw et al. (22)), which compares favorably to an age and sex adjusted model R2 of 36%. Additional recalibrated performance metrics including sex-stratified performance metrics are reported in Tables 3 and 4.

Table 4.

Performance Metrics for Recalibrated V˙O2max Prediction Equations Compared to Measured V˙O2max in the BLSA

Recalibrated Formula RMSE MAPE LOA Correlation w/Measured V˙O2max R2*
Jurca et al 4.1 15.4 (−8.1, 8.1) 0.73 0.53
    Male 4.8 16.2 (−9.4, 9.4) 0.68 0.46
    Female 3.5 14.8 (−6.8, 6.8) 0.73 0.53
Bradshaw et al 3.9 14.4 (−7.6, 7.6) 0.76 0.58
    Male 4.3 14.6 (−8.4, 8.4) 0.72 0.52
    Female 3.4 14.3 (−6.7, 6.7) 0.74 0.55
Jackson et al 4.0 15.0 (−7.9, 7.9) 0.73 0.53
    Male 4.5 15.3 (−8.9, 8.9) 0.67 0.45
    Female 3.5 14.7 (−6.8, 6.8) 0.73 0.53
Matthews et al 4.0 15.0 (−7.9, 7.9) 0.73 0.53
    Male 4.5 15.3 (−8.9, 8.9) 0.67 0.45
    Female 3.5 14.7 (−6.9, 6.9) 0.73 0.53
Sloan et al; HR 4.1 15.4 (−8.1, 8.1) 0.73 0.53
    Male 4.8 16.1 (−9.4, 9.4) 0.68 0.46
    Female 3.5 14.8 (−6.8, 6.8) 0.72 0.52
Sloan et al; No HR 4.0 15.0 (−7.9, 7.9) 0.73 0.53
    Male 4.5 15.3 (−8.9, 8.9) 0.67 0.45
    Female 3.5 14.7 (−6.8, 6.8) 0.73 0.53
de Souza e Silva et al 4.1 15.4 (−8.1, 8.1) 0.72 0.52
    Male 4.6 15.7 (−9.1, 9.1) 0.66 0.44
    Female 3.6 15.1 (−7.1, 7.1) 0.71 0.50
Baynard et al 4.1 15.4 (−8.1, 8.1) 0.72 0.52
    Male 4.6 15.8 (−9.1, 9.1) 0.66 0.44
    Female 3.6 15.1 (−7.0, 7.0) 0.71 0.50
Jang et al 4.1 15.4 (−8.1, 8.1) 0.72 0.52
    Male 4.6 15.7 (−9.1, 9.1) 0.66 0.44
    Female 3.6 15.1 (−7.0, 7.0) 0.71 0.50
Myers et al 4.2 15.7 (−8.2, 8.2) 0.70 0.49
    Male 4.6 15.9 (−9.1, 9.1) 0.65 0.42
    Female 3.7 15.5 (−7.3, 7.3) 0.68 0.46

Bold indicates significance at the P < 0.01 level.

* In a model with age and sex alone, the R2 was 0.356 for the entire sample, 0.315 for men, and 0.251 for women.

RMSE and R2 were obtained using 10-fold cross-validation.
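To make the cross-validation step concrete, the following Python sketch estimates a 10-fold cross-validated RMSE for a recalibrated equation. It is a simplified illustration under stated assumptions, not the authors' procedure: it refits a single slope and intercept on a published equation's output, whereas the study re-estimated the coefficients of each equation's covariates. The data are synthetic, and the fold assignment and the helper names (`fit_line`, `cv_rmse`) are hypothetical.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def cv_rmse(x, y, k=10, seed=0):
    """k-fold cross-validated RMSE, pooling squared errors over all folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_err = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training folds only, then score the held-out fold
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        sq_err += sum((y[i] - (a + b * x[i])) ** 2 for i in held_out)
    return math.sqrt(sq_err / len(x))

# Synthetic example: a published equation that over-predicts systematically
rng = random.Random(1)
original_pred = [rng.uniform(20.0, 60.0) for _ in range(200)]
measured = [2.0 + 0.5 * p + rng.gauss(0.0, 4.0) for p in original_pred]

raw_rmse = math.sqrt(
    sum((m - p) ** 2 for m, p in zip(measured, original_pred)) / len(measured)
)
recal_rmse = cv_rmse(original_pred, measured)
```

In this synthetic setting the raw RMSE is dominated by systematic over-prediction, and the recalibrated, cross-validated RMSE falls to roughly the size of the simulated noise. Because each fold is scored with coefficients estimated without it, the estimate is less optimistic than an in-sample RMSE.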

V˙O2max Associations with Mortality

When assessing the associations between quartiles of measured V˙O2max and all-cause mortality, a steep inverse gradient in mortality risk across increasing V˙O2max quartiles was evident in both unadjusted and adjusted models. Adjusting for Model 3 covariates, the hazard ratios (HRs) and 95% confidence intervals (CIs) for quartile 2 (Q2) through Q4 relative to Q1 of measured V˙O2max were 0.55 (0.37-0.82), 0.30 (0.17-0.54), and 0.34 (0.15-0.75), respectively (Ptrend < 0.001; Table 5). To further investigate the robustness of measured V˙O2max to adjustments beyond the Model 3 covariates, we additionally adjusted for the following variables: BMI; smoking history; self-rated health; diagnosed diabetes, glucose intolerance, or high blood sugar; history of heart attack or myocardial infarction; history of heart failure or CHF; history of stroke, mini stroke, or slight stroke; and current hypertension. The HRs from this model changed little in magnitude, remained statistically significant, and maintained their trend across quartiles (HRs for Q2-Q4 relative to Q1: 0.56 (0.36-0.88), 0.30 (0.16-0.59), and 0.31 (0.13-0.75); Ptrend < 0.001).
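The mortality models use sex-specific quartiles of V˙O2max, meaning that quartile cut points are computed separately for men and women. The Python sketch below illustrates that grouping on hypothetical values. The quantile method (the default of `statistics.quantiles`) and the function name `sex_specific_quartiles` are assumptions made for illustration; the study's exact cut-point rule is not described in this section.

```python
from statistics import quantiles

def sex_specific_quartiles(values, sexes):
    """Return quartile labels 1-4, with cut points computed within each sex."""
    cuts = {}
    for sex in set(sexes):
        group = [v for v, s in zip(values, sexes) if s == sex]
        # Three cut points separating Q1/Q2, Q2/Q3, and Q3/Q4
        cuts[sex] = quantiles(group, n=4)
    labels = []
    for v, s in zip(values, sexes):
        # Quartile = 1 + number of that sex's cut points the value exceeds
        labels.append(1 + sum(v > c for c in cuts[s]))
    return labels

# Hypothetical V˙O2max values in mL/kg/min (not BLSA data)
vo2 = [30.0, 28.0, 26.0, 24.0, 22.0, 20.0, 18.0, 16.0,   # men
       24.0, 22.0, 20.0, 18.0, 16.0, 14.0, 12.0, 10.0]   # women
sex = ["M"] * 8 + ["F"] * 8
quartile = sex_specific_quartiles(vo2, sex)
```

In this example a value of 22.0 falls in Q2 among the men but in Q4 among the women, which is the intended effect of sex-specific cut points: each participant is ranked against others of the same sex.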

Table 5.

Hazard Ratios (HR) of All-Cause Mortality by Measured and Predicted V˙O2max in the Selected BLSA Sample (n = 1,080)

Sex-Specific Quartiles of V˙O2max
Author Model Q1 Q2 Q3 Q4 P-Trend HR for 1 SD Increase P-value C Statistic
Measured
1 1.00 (ref.) 0.43 (0.29-0.63) 0.16 (0.09-0.29) 0.10 (0.05-0.20) < 0.01 0.46 (0.38-0.57) < 0.01 0.71 (0.02)
2 1.00 (ref.) 0.55 (0.37-0.81) 0.30 (0.17-0.54) 0.34 (0.16-0.75) < 0.01 0.51 (0.39-0.66) < 0.01 0.79 (0.02)
3 1.00 (ref.) 0.55 (0.37-0.82) 0.30 (0.17-0.54) 0.34 (0.15-0.75) < 0.01 0.50 (0.38-0.66) < 0.01 0.79 (0.02)
Baynard et al
1 1.00 (ref.) 0.67 (0.45-0.99) 0.42 (0.27-0.66) 0.15 (0.07-0.29) < 0.01 0.89 (0.75-1.05) 0.16 0.66 (0.02)
2 1.00 (ref.) 0.72 (0.49-1.07) 0.82 (0.51-1.33) 0.58 (0.28-1.20) 0.12 0.91 (0.66-1.24) 0.55 0.78 (0.02)
3 1.00 (ref.) 0.71 (0.48-1.06) 0.85 (0.53-1.37) 0.63 (0.30-1.32) 0.19 0.94 (0.68-1.30) 0.72 0.78 (0.02)
Bradshaw et al
1 1.00 (ref.) 0.66 (0.44-0.98) 0.33 (0.20-0.54) 0.24 (0.14-0.42) < 0.01 0.79 (0.67-0.93) < 0.01 0.64 (0.02)
2 1.00 (ref.) 0.74 (0.50-1.10) 0.76 (0.45-1.28) 1.09 (0.58-2.02) 0.62 0.90 (0.68-1.19) 0.47 0.78 (0.02)
3 1.00 (ref.) 0.73 (0.49-1.09) 0.78 (0.47-1.32) 1.27 (0.67-2.41) 0.83 0.93 (0.70-1.25) 0.64 0.78 (0.02)
de Souza e Silva et al
1 1.00 (ref.) 0.59 (0.40-0.88) 0.43 (0.28-0.66) 0.14 (0.07-0.28) < 0.01 0.86 (0.73-1.01) 0.07 0.66 (0.02)
2 1.00 (ref.) 0.66 (0.44-0.98) 0.87 (0.54-1.39) 0.57 (0.28-1.18) 0.13 0.89 (0.65-1.21) 0.45 0.78 (0.02)
3 1.00 (ref.) 0.65 (0.44-0.97) 0.90 (0.56-1.45) 0.64 (0.31-1.34) 0.21 0.93 (0.67-1.28) 0.64 0.78 (0.02)
Jackson et al
1 1.00 (ref.) 0.47 (0.31-0.72) 0.32 (0.20-0.51) 0.19 (0.11-0.34) < 0.01 0.81 (0.69-0.96) 0.01 0.66 (0.02)
2 1.00 (ref.) 0.69 (0.46-1.05) 0.82 (0.50-1.35) 0.98 (0.51-1.89) 0.52 0.92 (0.68-1.25) 0.59 0.77 (0.02)
3 1.00 (ref.) 0.69 (0.46-1.06) 0.83 (0.51-1.37) 1.12 (0.57-2.21) 0.70 0.94 (0.69-1.29) 0.71 0.78 (0.02)
Jang et al
1 1.00 (ref.) 1.08 (0.70-1.67) 0.88 (0.56-1.38) 0.54 (0.33-0.91) 0.02 1.33 (1.12-1.57) < 0.01 0.57 (0.02)
2 1.00 (ref.) 1.03 (0.67-1.59) 0.74 (0.47-1.17) 1.00 (0.59-1.69) 0.49 0.80 (0.38-1.67) 0.55 0.78 (0.02)
3 1.00 (ref.) 1.05 (0.68-1.63) 0.76 (0.48-1.20) 1.05 (0.62-1.77) 0.60 0.87 (0.41-1.84) 0.71 0.78 (0.02)
Jurca et al; ACLS
1 1.00 (ref.) 0.55 (0.31-0.99) 0.33 (0.17-0.64) 0.28 (0.13-0.58) < 0.01 0.74 (0.59-0.94) 0.01 0.63 (0.04)
2 1.00 (ref.) 0.65 (0.36-1.17) 0.65 (0.33-1.26) 0.89 (0.40-1.96) 0.39 0.87 (0.60-1.26) 0.47 0.80 (0.03)
3 1.00 (ref.) 0.62 (0.34-1.12) 0.60 (0.31-1.18) 0.90 (0.40-2.01) 0.34 0.87 (0.60-1.27) 0.47 0.80 (0.03)
Jurca et al; ADNFS
1 1.00 (ref.) 0.76 (0.45-1.29) 0.24 (0.11-0.50) 0.14 (0.06-0.37) < 0.01 0.67 (0.52-0.85) < 0.01 0.67 (0.03)
2 1.00 (ref.) 1.18 (0.69-2.02) 0.74 (0.33-1.68) 1.31 (0.42-4.09) 0.95 0.89 (0.54-1.46) 0.64 0.80 (0.03)
3 1.00 (ref.) 1.14 (0.66-1.97) 0.72 (0.32-1.64) 1.33 (0.41-4.36) 0.89 0.88 (0.53-1.47) 0.63 0.80 (0.03)
Jurca et al; NASA
1 1.00 (ref.) 0.44 (0.24-0.79) 0.31 (0.16-0.59) 0.22 (0.10-0.48) < 0.01 0.67 (0.52-0.85) < 0.01 0.64 (0.04)
2 1.00 (ref.) 0.69 (0.38-1.26) 0.69 (0.35-1.33) 1.12 (0.47-2.66) 0.58 0.84 (0.57-1.24) 0.39 0.80 (0.03)
3 1.00 (ref.) 0.67 (0.36-1.22) 0.65 (0.33-1.27) 1.15 (0.48-2.78) 0.54 0.84 (0.57-1.25) 0.40 0.81 (0.02)
Matthews et al
1 1.00 (ref.) 0.25 (0.16-0.40) 0.19 (0.12-0.31) 0.09 (0.04-0.18) < 0.01 0.60 (0.50-0.71) < 0.01 0.71 (0.02)
2 1.00 (ref.) 0.47 (0.29-0.75) 0.57 (0.32-1.03) 0.60 (0.25-1.45) 0.04 0.82 (0.57-1.16) 0.26 0.78 (0.02)
3 1.00 (ref.) 0.47 (0.29-0.75) 0.62 (0.34-1.13) 0.69 (0.28-1.68) 0.07 0.85 (0.59-1.22) 0.37 0.79 (0.02)
Myers et al
1 1.00 (ref.) 0.62 (0.42-0.91) 0.36 (0.23-0.56) 0.13 (0.06-0.26) < 0.01 0.83 (0.70-0.98) 0.03 0.66 (0.02)
2 1.00 (ref.) 0.82 (0.55-1.21) 0.80 (0.49-1.30) 0.70 (0.32-1.53) 0.24 0.78 (0.56-1.07) 0.13 0.77 (0.02)
3 1.00 (ref.) 0.86 (0.58-1.27) 0.86 (0.53-1.40) 0.79 (0.35-1.76) 0.43 0.82 (0.59-1.15) 0.26 0.78 (0.02)
Sloan et al; HR
1 1.00 (ref.) 0.41 (0.22-0.75) 0.32 (0.17-0.60) 0.22 (0.10-0.47) < 0.01 0.65 (0.51-0.83) < 0.01 0.64 (0.04)
2 1.00 (ref.) 0.63 (0.34-1.15) 0.61 (0.32-1.17) 1.05 (0.44-2.50) 0.37 0.84 (0.59-1.20) 0.34 0.80 (0.03)
3 1.00 (ref.) 0.62 (0.34-1.16) 0.58 (0.30-1.12) 1.08 (0.44-2.61) 0.34 0.84 (0.59-1.21) 0.36 0.81 (0.02)
Sloan et al; No HR
1 1.00 (ref.) 0.38 (0.24-0.59) 0.39 (0.25-0.60) 0.26 (0.15-0.43) < 0.01 0.86 (0.73-1.02) 0.08 0.64 (0.02)
2 1.00 (ref.) 0.67 (0.42-1.05) 0.74 (0.47-1.15) 0.94 (0.53-1.66) 0.39 0.94 (0.72-1.24) 0.67 0.78 (0.02)
3 1.00 (ref.) 0.67 (0.43-1.06) 0.73 (0.46-1.14) 1.04 (0.58-1.88) 0.50 0.96 (0.72-1.26) 0.75 0.78 (0.02)

Model 1 = V˙O2max quartiles; crude

Model 2 = Model 1 + age + sex

Model 3 = Model 2 + race and ethnicity + education

Results from the Cox proportional hazards regression models estimating the associations between predicted V˙O2max (each equation separately) and all-cause mortality are shown in Table 5. For most equations, predicted V˙O2max was associated with mortality in a pattern and strength similar to that of measured V˙O2max in the crude model (Model 1), but adjustment for basic covariates in Models 2 and 3 attenuated the HRs, widened most confidence intervals to include the null, and largely eliminated the linear trends (see Table 5).

After recalibration, unadjusted HRs for Q2 – Q4 relative to Q1 of predicted V˙O2max exhibited patterns and magnitudes of association that more closely reflected those for measured V˙O2max. For example, among the published equations only Matthews et al. produced a Q4 HR (0.09) as low as the 0.10 observed for Q4 relative to Q1 of measured V˙O2max, whereas recalibrated Q4 HRs were ≤0.10 for 8 of the 10 equations. However, after adjustment for covariates in Models 2 and 3, the HRs were again attenuated, the confidence intervals widened to include the null for nearly all quartile comparisons, and linear trends were not statistically significant (see Table 6).

Table 6.

Hazard Ratios (HR) of All-Cause Mortality by Measured and Recalibrated, Predicted V˙O2max in the Selected BLSA Sample (n = 1,080)

Sex-Specific Quartiles of V˙O2max
Author Model Q1 Q2 Q3 Q4 P-Trend HR for 1 SD Increase P-value C Statistic
Measured
1 1.00 (ref.) 0.43 (0.29-0.63) 0.16 (0.09-0.29) 0.10 (0.05-0.20) < 0.01 0.46 (0.38-0.57) < 0.01 0.71 (0.02)
2 1.00 (ref.) 0.55 (0.37-0.81) 0.30 (0.17-0.54) 0.34 (0.16-0.75) < 0.01 0.51 (0.39-0.66) < 0.01 0.79 (0.02)
3 1.00 (ref.) 0.55 (0.37-0.82) 0.30 (0.17-0.54) 0.34 (0.15-0.75) < 0.01 0.50 (0.38-0.66) < 0.01 0.79 (0.02)
Baynard et al
1 1.00 (ref.) 0.39 (0.26-0.58) 0.23 (0.14-0.37) 0.10 (0.05-0.20) < 0.01 0.60 (0.50-0.71) < 0.01 0.70 (0.02)
2 1.00 (ref.) 0.75 (0.48-1.16) 0.78 (0.43-1.39) 0.98 (0.39-2.46) 0.48 0.90 (0.64-1.27) 0.55 0.77 (0.02)
3 1.00 (ref.) 0.79 (0.51-1.22) 0.85 (0.47-1.55) 1.19 (0.47-3.05) 0.75 0.94 (0.66-1.33) 0.72 0.78 (0.02)
Bradshaw et al
1 1.00 (ref.) 0.50 (0.33-0.73) 0.26 (0.16-0.42) 0.13 (0.07-0.25) < 0.01 0.60 (0.50-0.71) < 0.01 0.68 (0.02)
2 1.00 (ref.) 0.87 (0.57-1.32) 0.83 (0.48-1.46) 1.16 (0.50-2.70) 0.77 0.86 (0.63-1.17) 0.34 0.78 (0.02)
3 1.00 (ref.) 0.90 (0.59-1.36) 0.91 (0.51-1.62) 1.34 (0.56-3.19) 0.97 0.89 (0.65-1.22) 0.48 0.78 (0.02)
de Souza e Silva et al
1 1.00 (ref.) 0.41 (0.27-0.60) 0.23 (0.14-0.37) 0.10 (0.05-0.20) < 0.01 0.59 (0.49-0.70) < 0.01 0.70 (0.02)
2 1.00 (ref.) 0.82 (0.53-1.27) 0.78 (0.44-1.39) 1.03 (0.41-2.59) 0.55 0.86 (0.61-1.21) 0.40 0.77 (0.02)
3 1.00 (ref.) 0.88 (0.57-1.36) 0.87 (0.48-1.58) 1.26 (0.50-3.23) 0.89 0.91 (0.64-1.29) 0.58 0.78 (0.02)
Jackson et al
1 1.00 (ref.) 0.40 (0.27-0.60) 0.23 (0.14-0.38) 0.10 (0.05-0.21) < 0.01 0.61 (0.51-0.72) < 0.01 0.69 (0.02)
2 1.00 (ref.) 0.65 (0.43-1.00) 0.72 (0.41-1.26) 0.82 (0.35-1.96) 0.22 0.91 (0.66-1.25) 0.57 0.77 (0.02)
3 1.00 (ref.) 0.65 (0.42-0.99) 0.78 (0.44-1.39) 0.95 (0.40-2.29) 0.34 0.94 (0.68-1.30) 0.72 0.78 (0.02)
Jang et al
1 1.00 (ref.) 0.40 (0.27-0.60) 0.23 (0.15-0.38) 0.10 (0.05-0.20) < 0.01 0.59 (0.50-0.70) < 0.01 0.70 (0.02)
2 1.00 (ref.) 0.79 (0.51-1.22) 0.84 (0.47-1.50) 1.07 (0.42-2.69) 0.66 0.90 (0.64-1.28) 0.56 0.78 (0.02)
3 1.00 (ref.) 0.83 (0.53-1.29) 0.91 (0.50-1.65) 1.33 (0.52-3.40) 0.96 0.94 (0.66-1.34) 0.72 0.78 (0.02)
Jurca et al
1 1.00 (ref.) 0.39 (0.22-0.68) 0.21 (0.11-0.43) 0.09 (0.03-0.24) < 0.01 0.52 (0.40-0.66) < 0.01 0.71 (0.03)
2 1.00 (ref.) 0.66 (0.37-1.19) 0.74 (0.33-1.66) 0.92 (0.25-3.34) 0.42 0.82 (0.52-1.29) 0.38 0.80 (0.03)
3 1.00 (ref.) 0.64 (0.35-1.16) 0.73 (0.32-1.65) 1.03 (0.27-4.02) 0.43 0.81 (0.50-1.29) 0.37 0.81 (0.02)
Matthews et al
1 1.00 (ref.) 0.41 (0.27-0.61) 0.24 (0.15-0.38) 0.11 (0.05-0.21) < 0.01 0.60 (0.50-0.71) < 0.01 0.69 (0.02)
2 1.00 (ref.) 0.70 (0.46-1.07) 0.74 (0.42-1.29) 0.79 (0.34-1.84) 0.25 0.88 (0.64-1.21) 0.43 0.77 (0.02)
3 1.00 (ref.) 0.69 (0.45-1.06) 0.79 (0.45-1.41) 0.88 (0.37-2.07) 0.37 0.92 (0.66-1.27) 0.60 0.78 (0.02)
Myers et al
1 1.00 (ref.) 0.41 (0.27-0.60) 0.21 (0.13-0.35) 0.09 (0.04-0.19) < 0.01 0.56 (0.47-0.67) < 0.01 0.70 (0.02)
2 1.00 (ref.) 0.76 (0.50-1.15) 0.66 (0.37-1.19) 0.86 (0.33-2.22) 0.22 0.76 (0.54-1.08) 0.13 0.77 (0.02)
3 1.00 (ref.) 0.80 (0.52-1.21) 0.72 (0.40-1.31) 1.11 (0.42-2.95) 0.42 0.81 (0.57-1.16) 0.26 0.78 (0.02)
Sloan et al; HR
1 1.00 (ref.) 0.39 (0.22-0.67) 0.19 (0.09-0.39) 0.08 (0.03-0.23) < 0.01 0.51 (0.40-0.66) < 0.01 0.71 (0.03)
2 1.00 (ref.) 0.61 (0.34-1.10) 0.61 (0.27-1.41) 0.78 (0.22-2.82) 0.21 0.81 (0.51-1.27) 0.36 0.80 (0.03)
3 1.00 (ref.) 0.60 (0.33-1.08) 0.59 (0.25-1.37) 0.86 (0.22-3.32) 0.21 0.80 (0.50-1.28) 0.35 0.81 (0.03)
Sloan et al; No HR
1 1.00 (ref.) 0.40 (0.27-0.60) 0.23 (0.14-0.38) 0.10 (0.05-0.21) < 0.01 0.61 (0.51-0.72) < 0.01 0.69 (0.02)
2 1.00 (ref.) 0.65 (0.43-1.00) 0.72 (0.41-1.26) 0.82 (0.35-1.96) 0.22 0.91 (0.66-1.25) 0.57 0.77 (0.02)
3 1.00 (ref.) 0.65 (0.42-0.99) 0.78 (0.44-1.39) 0.95 (0.40-2.29) 0.34 0.94 (0.68-1.30) 0.72 0.78 (0.02)

Model 1 = V˙O2max quartiles; crude

Model 2 = Model 1 + age + sex

Model 3 = Model 2 + race and ethnicity + education

DISCUSSION

In the present study, we sought to provide validation, recalibration, and predictive accuracy metrics for published V˙O2max prediction equations, with the aim of enabling large-scale epidemiologic cohorts of older, ambulatory, community-dwelling adults to accurately estimate V˙O2max. Several of the extant equations performed reasonably well relative to measured V˙O2max; for example, the Bradshaw (22) equation had an RMSE of 4.2 mL•kg−1•min−1. This means that this equation’s typical prediction error was ~1.2 metabolic equivalents (METs), assuming the standard conversion of 3.5 mL•kg−1•min−1 to 1 MET. The Matthews (25) equation had an absolute bias of 0.1 mL•kg−1•min−1, meaning that, on average, this model’s predictions were within 0.03 METs of measured values. Recalibrating these equations using the BLSA measured V˙O2max and covariate data improved every performance metric, although such recalibration is possible only in epidemiologic cohorts in which V˙O2max and the covariates used in the derivation cohort were directly measured.
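The MET conversions quoted above follow directly from dividing by 3.5 mL•kg−1•min−1 per MET. A minimal Python sketch of that arithmetic, with a hypothetical helper name, is:

```python
# Standard conversion stated in the text: 1 MET = 3.5 mL/kg/min
ML_PER_KG_MIN_PER_MET = 3.5

def to_mets(vo2_ml_kg_min):
    """Convert an oxygen uptake (or an error in it) from mL/kg/min to METs."""
    return vo2_ml_kg_min / ML_PER_KG_MIN_PER_MET

rmse_mets = to_mets(4.2)  # RMSE of the published Bradshaw equation
bias_mets = to_mets(0.1)  # absolute bias of the published Matthews equation
```

Rounded, these give about 1.2 METs for the RMSE and about 0.03 METs for the bias, in line with the values reported in the text.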

Cox proportional hazards modeling showed that measured V˙O2max is an extremely powerful predictor of all-cause mortality in BLSA participants in both unadjusted and adjusted models. Compared to participants in the lowest quartile of measured V˙O2max, those in the highest quartile had an approximately 3-fold lower risk of all-cause mortality (HR 0.34) after adjusting for age, sex, race and ethnicity, and education. These HRs are similar to, though slightly stronger than, those reported in other studies of V˙O2max and all-cause mortality for those with the highest levels of CRF relative to those with the lowest CRF (47–49).
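The "3-fold" and "66% lower" descriptions used in this paper are two readings of the same adjusted hazard ratio (0.34 for Q4 relative to Q1 in Model 3). The short Python sketch below shows the arithmetic; the function names are hypothetical, and the fold figure is simply the reciprocal of the HR.

```python
def percent_lower_risk(hr):
    """Percent reduction in hazard implied by a hazard ratio below 1."""
    return (1.0 - hr) * 100.0

def fold_reduction(hr):
    """Reciprocal of the hazard ratio, read as an 'x-fold' lower hazard."""
    return 1.0 / hr

hr_q4_vs_q1 = 0.34  # Model 3, measured V˙O2max (Table 5)
pct = percent_lower_risk(hr_q4_vs_q1)
fold = fold_reduction(hr_q4_vs_q1)
```

For an HR of 0.34 this gives a 66% lower hazard and a reciprocal of about 2.9, which the text rounds to 3-fold.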

Among the previously published V˙O2max prediction models, there was no discernible pattern of covariate types (i.e., demographics, body mass, self-reported PA) that contributed to model performance more than others. For example, the Bradshaw equation (22), one of the best-performing models, has the same covariates as the Jurca equations (23), which did not perform as well in relation to measured V˙O2max in the BLSA. Several of the published equations yielded HRs similar in pattern and magnitude to those of measured V˙O2max before adjustment, but these associations were not robust to even minimal adjustments. After adjustment for only age and sex, the ability of the equations to predict mortality was substantially weakened, suggesting that much of the association observed in the unadjusted models was due to these two variables alone. In contrast, in a very large study (n = 43,356), sex-stratified estimates of the association between predicted cardiorespiratory fitness and all-cause mortality remained statistically significant after adjustment for age (50). In unadjusted regression models using the recalibrated equations, the patterns of association were closer to the unadjusted HRs for measured V˙O2max (Q1 – Q4: 1.00 (ref.), 0.43 (0.29-0.63), 0.16 (0.09-0.29), and 0.10 (0.05-0.20)).

Despite the pattern of the recalibrated equations’ HRs in unadjusted models, these associations were still not robust to adjustment. These findings suggest that while the equations may be valid and useful, to varying degrees, for individual exercise prescriptions in the field, their ability to predict mortality is severely compromised after adjustment for basic demographic covariates, some of which are components of the prediction equations themselves. V˙O2max, and CRF in general, are complex constructs reflecting an integration of multiple organ systems and metabolic processes (51). Without direct measures of the physiologic variability across individuals inherent in measured CRF, even well-performing prediction equations based on basic demographic and health characteristics do not predict mortality independent of sex and age. A likely explanation is that demographic and behavioral characteristics do not adequately capture the integrated physiological signal reflected in measured V˙O2max.

There are some limitations to the present study. First, not all covariates from the published equations had exact counterparts in the BLSA. While these discrepancies could limit the performance of the published equations when applied in the BLSA, they no longer apply once the equations are recalibrated to the BLSA measured V˙O2max. Second, the majority of the sample (61.9%) had a post-college education, a level of educational attainment higher than that of the general population, which may limit generalizability. One substantial strength of the present study is the prospective follow-up, enabling the evaluation of the accuracy of predicted V˙O2max with respect to measured V˙O2max and their associations with mortality. The BLSA enrolled a large group of racially and ethnically diverse older adults, included laboratory-based measurements of V˙O2max, followed participants for mortality outcomes after V˙O2max assessment, and collected data that enabled adjustment for confounders. The conclusions drawn from these data and analyses are robust across our approaches: the performance metrics and the HRs together support a consistent narrative regarding the importance of accurately assessing V˙O2max in older adults and the relevance of this aging biomarker (31) to clinical outcomes such as all-cause mortality.

CONCLUSIONS

Measured V˙O2max is an extremely strong predictor of all-cause mortality in aging men and women. Those in the highest sex-specific quartile of measured V˙O2max experienced a 66% lower risk of death relative to those in the lowest quartile after adjustment for age, sex, race and ethnicity, and education. Several published V˙O2max prediction models (1) yielded reasonable performance metrics relative to measured V˙O2max, especially when recalibrated, and (2) yielded unadjusted all-cause mortality hazard ratios similar to those of measured V˙O2max, especially when recalibrated, yet (3) were not robust to adjustment for basic demographic covariates. These findings make an important contribution to research on the development of an inexpensive surrogate for direct measurement of CRF that could be broadly used to guide healthy aging in the older population. Future studies should investigate whether modern analytic methods such as machine learning can improve prediction of V˙O2max in community-dwelling older adults so that this critical “vital sign” can be more broadly studied as a modifiable target for promoting functional resiliency and healthy aging.

Acknowledgments

This research was supported in part by the Intramural Research Program of the National Institute on Aging.

H Parada Jr was supported by the National Cancer Institute (K01 CA234317), the SDSU/UCSD Comprehensive Cancer Center Partnership (U54 CA132384 and U54 CA132379), and the Alzheimer’s Disease Resource Center for advancing Minority Aging Research at the University of California San Diego (P30 AG059299). The content is solely the responsibility of the authors and does not necessarily represent the official views of funding agencies.

The authors would like to acknowledge the BLSA participants and staff for their participation in this important scientific endeavor.

We would like to thank Sandy Liles for his thorough review of and contributions to this manuscript.

The authors have no conflicts of interest to declare. The results of the study have been presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation. Publication of these results does not constitute endorsement by the American College of Sports Medicine.

REFERENCES

1. Garber CE, Blissmer B, Deschenes MR, et al. American College of Sports Medicine position stand. Quantity and quality of exercise for developing and maintaining cardiorespiratory, musculoskeletal, and neuromotor fitness in apparently healthy adults: guidance for prescribing exercise. Med Sci Sports Exerc. 2011;43(7):1334–59.
2. Surgeon General’s Report on Physical Activity and Health. JAMA. 1996;276(7):522.
3. 2018 Physical Activity Guidelines Advisory Committee. 2018 Physical Activity Guidelines Advisory Committee Scientific Report. 2018; Available from: https://health.gov/paguidelines/second-edition/report/pdf/PAG_Advisory_Committee_Report.pdf. doi: 10.1111/j.1753-4887.2008.00136.x.
4. Physical Activity Guidelines Advisory Committee Report, 2008 to the Secretary of Health and Human Services: (525442010-001) 2008; [cited 2022 Feb 11] Available from: 10.1037/e525442010-001.
5. Lee D, Artero EG, Sui X, Blair SN. Mortality trends in the general population: the importance of cardiorespiratory fitness. J Psychopharmacol. 2010;24(4 Suppl):27–35.
6. Wei M, Kampert JB, Barlow CE, et al. Relationship between low cardiorespiratory fitness and mortality in normal-weight, overweight, and obese men. JAMA. 1999;282(16):1547–53.
7. Myers J, McAuley P, Lavie CJ, Despres J-P, Arena R, Kokkinos P. Physical activity and cardiorespiratory fitness as major markers of cardiovascular risk: their independent and interwoven importance to health status. Prog Cardiovasc Dis. 2015;57(4):306–14.
8. Kodama S, Saito K, Tanaka S, et al. Cardiorespiratory fitness as a quantitative predictor of all-cause mortality and cardiovascular events in healthy men and women: a meta-analysis. JAMA. 2009;301(19):2024–35.
9. Sui X, LaMonte MJ, Blair SN. Cardiorespiratory fitness as a predictor of nonfatal cardiovascular events in asymptomatic women and men. Am J Epidemiol. 2007;165(12):1413–23.
10. LaMonte MJ, Barlow CE, Jurca R, Kampert JB, Church TS, Blair SN. Cardiorespiratory fitness is inversely associated with the incidence of metabolic syndrome. Circulation. 2005;112(4):505–12.
11. Barlow CE, LaMonte MJ, Fitzgerald SJ, Kampert JB, Perrin JL, Blair SN. Cardiorespiratory fitness is an independent predictor of hypertension incidence among initially normotensive healthy women. Am J Epidemiol. 2006;163(2):142–50.
12. Lakoski SG, Willis BL, Barlow CE, et al. Midlife cardiorespiratory fitness, incident cancer, and survival after cancer in men: the Cooper Center Longitudinal Study. JAMA Oncol. 2015;1(2):231–7.
13. Peel JB, Sui X, Adams SA, Hébert JR, Hardin JW, Blair SN. A prospective study of cardiorespiratory fitness and breast cancer mortality. Med Sci Sports Exerc. 2009;41(4):742–8.
14. Sui X, Lee D-C, Matthews CE, et al. Influence of cardiorespiratory fitness on lung cancer mortality. Med Sci Sports Exerc. 2010;42(5):872–8.
15. Liu R, Sui X, Laditka JN, et al. Cardiorespiratory fitness as a predictor of dementia mortality in men and women. Med Sci Sports Exerc. 2012;44(2):253–9.
16. Vidoni ED, Honea RA, Billinger SA, Swerdlow RH, Burns JM. Cardiorespiratory fitness is associated with atrophy in Alzheimer’s and aging over 2 years. Neurobiol Aging. 2012;33(8):1624–32.
17. Milani RV, Lavie CJ. Impact of cardiac rehabilitation on depression and its associated mortality. Am J Med. 2007;120(9):799–806.
18. Sui X, Laditka JN, Church TS, et al. Prospective study of cardiorespiratory fitness and depressive symptoms in women and men. J Psychiatr Res. 2009;43(5):546–52.
19. Shephard RJ. Maximal oxygen intake and independence in old age. Br J Sports Med. 2009;43(5):342–6.
20. Sui X, LaMonte MJ, Laditka JN, et al. Cardiorespiratory fitness and adiposity as mortality predictors in older adults. JAMA. 2007;298(21):2507–16.
21. Fletcher GF, Ades PA, Kligfield P, et al. Exercise standards for testing and training: a scientific statement from the American Heart Association. Circulation. 2013;128(8):873–934.
22. Bradshaw DI, George JD, Hyde A, et al. An accurate VO2max nonexercise regression model for 18-65-year-old adults. Res Q Exerc Sport. 2005;76(4):426–32.
23. Jurca R, Jackson AS, LaMonte MJ, et al. Assessing cardiorespiratory fitness without performing exercise testing. Am J Prev Med. 2005;29(3):185–93.
24. Jackson AS, Blair SN, Mahar MT, Wier LT, Ross RM, Stuteville JE. Prediction of functional aerobic capacity without exercise testing. Med Sci Sports Exerc. 1990;22(6):863–70.
25. Matthews CE, Heil DP, Freedson PS, Pastides H. Classification of cardiorespiratory fitness without exercise testing. Med Sci Sports Exerc. 1999;31(3):486–93.
26. Sloan RA, Haaland BA, Leung C, Padmanabhan U, Koh HC, Zee A. Cross-validation of a non-exercise measure for cardiorespiratory fitness in Singaporean adults. Singapore Med J. 2013;54(10):576–80.
27. de Souza e Silva CG, Kaminsky LA, Arena R, et al. A reference equation for maximal aerobic power for treadmill and cycle ergometer exercise testing: Analysis from the FRIEND registry. Eur J Prev Cardiol. 2018;25(7):742–50.
28. Jang T-W, Park S-G, Kim H-R, Kim J-M, Hong Y-S, Kim B-G. Estimation of maximal oxygen uptake without exercise testing in Korean healthy adult workers. Tohoku J Exp Med. 2012;227(4):313–9.
29. Baynard T, Arena RA, Myers J, Kaminsky LA. The role of body habitus in predicting cardiorespiratory fitness: the FRIEND registry. Int J Sports Med. 2016;37(11):863–9.
30. Vespa J, Medina L, Armstrong DM. Demographic Turning Points for the United States: Population Projections for 2020 to 2060 Population Estimates and Projections Current Population Reports. [date unknown]. Available from: www.census.gov/programs-surveys/popproj.
31. Kritchevsky SB, Forman DE, Callahan KE, et al. Pathways, contributors, and correlates of functional limitation across specialties: workshop summary. J Gerontol A Biol Sci Med Sci. 2019;74(4):534–43.
32. History BLSA. Natl Inst Aging. [date unknown]; [cited 2021 Aug 17] Available from: http://www.nia.nih.gov/research/labs/blsa/history.
33. Ferrucci L. The Baltimore Longitudinal Study of Aging (BLSA): a 50-year-long journey and plans for the future. J Gerontol A Biol Sci Med Sci. 2008;63(12):1416–9.
34. Stone JL, Norris AH. Activities and attitudes of participants in the Baltimore longitudinal study. J Gerontol. 1966;21(4):575–80.
35. Balke B, Ware RW. An experimental study of physical fitness of Air Force personnel. U S Armed Forces Med J. 1959;10(6):675–88.
36. Simonsick E, Fan E, Fleg J. Estimating cardiorespiratory fitness in well-functioning older adults: treadmill validation of the long distance corridor walk. J Am Geriatr Soc. 2006;54(1):127–32.
37. Data Access - National Death Index - About 2021; [cited 2022 Mar 1] Available from: https://www.cdc.gov/nchs/ndi/about.htm.
38. Schrack JA, Leroux A, Fleg JL, et al. Using heart rate and accelerometry to define quantity and intensity of physical activity in older adults. J Gerontol A Biol Sci Med Sci. 2018;73(5):668–75.
39. Ware J, Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996;34(3):220–33.
40. Wanigatunga AA, Gresham GK, Kuo P-L, et al. Contrasting characteristics of daily physical activity in older adults by cancer history. Cancer. 2018;124(24):4692–9.
41. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;327(8476):307–10.
42. Datta D. deepankardatta/blandr: Version 0.4.2. 2017; [cited 2022 Feb 24] Available from: https://zenodo.org/record/824524.
43. D’Agostino RB, Grundy S, Sullivan LM, Wilson P, CHD Risk Prediction Group. Validation of the Framingham coronary heart disease prediction scores: results of a multiple ethnic groups investigation. JAMA. 2001;286(2):180–7.
44. Altman N, Krzywinski M. Regression diagnostics. Nat Methods. 2016;13(5):385–6.
45. Therneau TM. A Package for Survival Analysis in R. [date unknown]; Available from: https://cran.r-project.org/package=survival.
46. Freund RJ, Littell RC, Creighton L. Regression Using JMP. J. Wiley; 2003. 286 p.
47. Farrell SW, Braun L, Barlow CE, Cheng YJ, Blair SN. The relation of body mass index, cardiorespiratory fitness, and all-cause mortality in women. Obes Res. 2002;10(6):417–23.
48. Park M-S, Chung S-Y, Chang Y, Kim K. Physical activity and physical fitness as predictors of all-cause mortality in Korean men. J Korean Med Sci. 2009;24(1):13–9.
49. Salier Eriksson J, Ekblom B, Andersson G, Wallin P, Ekblom-Bak E. Scaling VO2max to body size differences to evaluate associations to CVD incidence and all-cause mortality risk. BMJ Open Sport Exerc Med. 2021;7(1):e000854.
50. Artero EG, Jackson AS, Sui X, et al. Longitudinal algorithms to estimate cardiorespiratory fitness: associations with nonfatal cardiovascular disease and disease-specific mortality. J Am Coll Cardiol. 2014;63(21):2289–96.
51. Mitchell JH, Blomqvist G. Maximal oxygen uptake. N Engl J Med. 1971;284(18):1018–22.
