Abstract
Air pollution intake represents the amount of pollution inhaled into the body and may be calculated by multiplying an individual’s ventilation rate by the concentration of pollutant present in their breathing zone. Ventilation rate is difficult to measure directly, and methods for estimating ventilation rate (and intake) are lacking. Therefore, the goal of this work was to examine how well linear models using heart rate and other basic physiologic data can predict personal ventilation rate.
We measured personal ventilation and heart rate among a panel of subjects (n = 36) while they conducted a series of specified routine tasks of varying exertion levels. From these data, 136 candidate models were identified using a series of variable transformation and selection algorithms. A second “free-living” validation study (n = 26) served as an independent validation dataset for these candidate models.
The top-performing model, which included heart rate (Hr), resting heart rate (Hrest), age, sex, and hip circumference, together with interactions of sex with Hr, Hrest, age, and hip circumference, predicted ventilation rate (Ve) to within 11% and 33% for moderate (Ve = 45 L/min) and low (Ve = 15 L/min) intensity activities, respectively, based on the validation study. Many of the promising candidate models performed substantially worse under independent validation.
Our results indicate that while measures of air pollution exposure and intake are highly correlated within tasks for a given individual, this correlation decreases substantially across tasks (i.e., as individuals go about a series of typical daily activities). This discordance between exposure and intake may influence exposure-response estimates in epidemiological studies. New air pollution studies should consider the trade-offs between the predictive ability of intake models and the error potentially introduced by not accounting for ventilation rate.
Introduction
Exposure to ambient air pollution is associated with increased risk for many adverse health conditions, including respiratory disease, cardiovascular disease, and cancer (1–6). The source-effect pathway (Figure 1) illustrates the major steps between air pollution emissions and a resulting health effect (7, 8); the pathway also provides a paradigm for research (and intervention) on the health effects of air pollution. Epidemiologic research commonly focuses on estimating exposure-response using ambient concentrations or personal exposures. Exposure concentrations are commonly used because they are feasible (from a study design perspective) and practical (from a regulatory perspective). However, previous research has demonstrated that the same external exposure may result in a different internal dose (9, 10). The use of exposure concentrations in epidemiologic studies ignores the differential pollutant doses that can be produced by heterogeneity in an individual’s intake and uptake of pollution. Thus, measurement error is likely introduced by ignoring person-to-person variability in the exposure-dose relationship, potentially resulting in bias and a loss of precision (11, 12).
Figure 1.

The air pollution source-effect pathway from emissions to health effects (7). Air pollution is modified during transport from source to the point of exposure. A fraction of the inhaled pollution can remain in the body, resulting in potential adverse health effects.
The inclusion of intake, the product of exposure concentration and minute ventilation rate, has been suggested for air pollution epidemiology and risk assessment to account for differences in the amount of air pollution people inhale (e.g. 13, 14) and to reduce measurement error because it is one step closer to dose on the source-effect pathway. Ventilation rate is generally not measured in air pollution exposure studies (and, therefore, neither is intake). Although ventilation rate can be measured directly using a facemask with an airflow sensor, this method is not appropriate when simultaneously measuring endpoints in health studies because the necessary equipment covers the mouth and nose and so modifies the intake of air pollution. Estimates of ventilation rate may be obtained from less invasive measures, such as heart rate (e.g. 15, 16-18), which is correlated with ventilation rate. Low-cost personal heart-rate monitors are becoming more common as the market for wearable sensors continues to grow, devices improve, and their use becomes more ubiquitous (19). Thus, with improved predictive models, ventilation rate can potentially be accounted for in large-scale air pollution studies and used in conjunction with ambient concentration to predict intake and reduce measurement error in air pollution studies.
Models to predict ventilation rate based on heart rate have been described previously; these models are typically calibrated using exercise testing (e.g. 16, 18) and often do not account for sedentary behaviors. Models to estimate minute ventilation from heart rate calibrated on an individual level perform relatively well (e.g. 13). Predictive models (i.e. those not calibrated to an individual but intended to be generalizable across individuals) perform less well. Some studies have included subject-level measurements, including sex, height, weight, and spirometry, to help explain between-person variation in the relationship between heart rate and ventilation, with mixed success (e.g. 18, 20). Measurements that capture body size, such as height and weight, are expected to explain some of the person-to-person variability in minute ventilation because of the higher energy demand associated with larger body sizes (21) and the correlation between body size and lung volume. Studies have also identified differences in how males’ and females’ minute ventilation responds to exercise (21, 22). Previous models have generally been validated using data from the training group (e.g. within-sample cross-validation techniques). Therefore, the performance (and generalizability) of these models is often uncertain when applied outside the original study population.
The objective of this study was to develop and validate models to predict ventilation rate from heart rate and other readily obtainable physiologic measurements (e.g. height and weight). More complex individual level measurements that require specialized equipment and/or clinical expertise such as lung function parameters were not considered to make the models easier to apply to larger studies prospectively and retrospectively. Such models may help reduce measurement error in epidemiologic research associated with ignoring differences in ventilation rate and, therefore, air pollution intake. Predictive ventilatory models of sufficient accuracy would help bridge the uncertainty between exposure and intake along the source-effect pathway.
2. Methods
Data were collected in two parts: a laboratory training study and a field validation study. The training study was used to develop candidate models for predicting ventilation rate from heart rate and other basic physiological variables. The validation study collected a new dataset under less controlled conditions to test the predictive models in a more realistic setting.
2.1. Participant Recruitment
We recruited healthy, adult volunteers to study their ventilation rate and heart rate as they completed activities requiring different levels of exertion. Inclusion criteria were: age between 18 and 65 years, non-smoking, and no major health problems (no self-reported chronic conditions, body mass index below 30 kg/m2, resting blood pressure below 160/100 mm Hg, and stable use of any prescription medication) and not pregnant. Participants fasted for four hours prior to participation. The study protocol was approved by the Colorado State University Institutional Review Board; participants completed written informed consent.
2.2. Laboratory Training
Thirty-six participants were recruited for the laboratory study. Participants were fitted with an Oxycon Mobile indirect calorimetry system (CareFusion Respiratory Care, CA, USA) that measured breath-to-breath ventilation rate and heart rate averaged to five-second resolution (23). The Oxycon Mobile’s flow (to within 1.5% difference) and gas sensors were calibrated at laboratory temperature, pressure, and humidity on each study day after a 30-minute warm-up period. A number of studies have examined the validity of the mask used by the Oxycon Mobile to measure ventilation rate, finding differences of less than 10% for the values reported here (e.g. 23, 24, 25). Physiologically implausible heart rate data (heart rate < 30 or > 200 bpm), presumably due to instrument error, were removed during data analysis. Ventilation rate data fell within Before participants began activities, we measured their blood pressure (mm Hg), chest size (cm), height (cm), hip size (cm), waist size (cm), and weight (kg) according to American College of Sports Medicine (ACSM) guidelines for exercise testing and prescription (26). They also completed a questionnaire that included age and sex. Resting heart rate was calculated for the training tests as the sitting heart rate minus five beats per minute, to bring it closer to supine heart rate, which is likely at least five beats per minute lower (27).
The participants performed 9 or 11 prescribed tasks (the higher speed walking tasks were added partway through the training study because the calibrated treadmill speed of 2 mph was thought to be somewhat lower than typical walking pace), lasting six minutes each, at the Colorado State University Human Performance Laboratory. Tasks included sitting, standing, walking at 3.2 and/or 4.8 kilometers per hour (km hr−1), walking with a 4.8 kg load split between bags held in each hand at 3.2 and/or 4.8 km hr−1, sweeping, stationary cycling at 50 watts, stationary cycling at 100 watts, and shoveling sand. Participants were asked not to speak during each task and given the option to rest between tasks. The last two minutes of data for each activity were averaged and used in the predictive model development.
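The data reduction described above (plausibility screening of heart rate, averaging the final two minutes of each six-minute task, and deriving resting heart rate from the sitting task) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes that a ventilation sample is dropped together with its implausible heart-rate sample and that screening is applied before averaging.

```python
from statistics import mean

HR_MIN, HR_MAX = 30, 200  # plausibility limits (bpm) stated in the text
SAMPLE_S = 5              # five-second resolution of the recorded series


def last_two_minute_means(hr, ve):
    """Average the final two minutes of one task.

    hr, ve: equal-length 5-s series of heart rate (bpm) and ventilation (L/min).
    Assumption: a ventilation sample is discarded with its implausible
    heart-rate sample.
    """
    n_last = 120 // SAMPLE_S  # 24 samples cover two minutes
    pairs = list(zip(hr, ve))[-n_last:]
    kept = [(h, v) for h, v in pairs if HR_MIN <= h <= HR_MAX]
    if not kept:
        raise ValueError("no plausible heart-rate samples in the window")
    return mean(h for h, _ in kept), mean(v for _, v in kept)


def resting_heart_rate(sitting_hr_mean, offset=5):
    """Resting heart rate as sitting heart rate minus five beats per minute."""
    return sitting_hr_mean - offset
```

For example, a six-minute task recorded at five-second resolution contributes its last 24 samples to the task mean.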
2.3. Predictive Model Development
Candidate predictive models were developed from the training data using a two-stage approach in the R statistical language (version 3.3.1, The R Foundation for Statistical Computing, details in supplementary material 2). First, we identified variable transformations that improve the model fit. Second, we employed a variable selection procedure to test the inclusion of the transformed variables identified in the first stage and two-way interactions between those variables that improve the model fit. All models that performed above a pre-specified threshold were then validated against independent field data. Figure 2 illustrates the steps from variable selection to model validation described below.
Figure 2.

Steps from variable selection to model validation. Nine variables are considered: age, blood pressure (bp), chest circumference (chest), height, heart rate (Hr), resting heart rate (Hrest), sex, and weight. A multi-fractional polynomial (MFP) algorithm was used to identify useful variables and their transformations. A two-way interaction search (glmulti) algorithm identified the best models from the MFP identified variables. Models were cross-validated using the training data and independently validated using the validation study dataset.
In the first step, we used multi-fractional-polynomials (MFP) to identify nonlinear relationships between variables and ventilation rate. MFP combines stepwise variable selection with polynomial transformations (28–30). We restricted the fractional polynomials to linear combinations of two terms, with powers −2, −1, log_e (natural logarithm), 0.5, 1, 2, or 3, to limit the chance of overfitting and maintain a more interpretable model (31). We used a bootstrapping procedure (n = 10,000) to account for uncertainty in models selected by MFP (29). Candidate variables and their transformations that were selected by MFP in more than 15% of the bootstrap runs were retained for further consideration; this threshold was chosen to deliver a manageable number of models for analysis.
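The idea of choosing a transformation from a fixed power set and then checking how often that choice survives resampling can be illustrated with a deliberately simplified sketch. It is not the MFP algorithm used in the study (which is multivariable, allows two-term polynomials, and applies a formal selection procedure); it only picks the single best power for one predictor by residual sum of squares. All function names are hypothetical, and resampling individual observations is an assumption.

```python
import math
import random

# Power set listed in the text; "log" stands for the natural logarithm.
POWERS = (-2, -1, "log", 0.5, 1, 2, 3)


def fp_transform(x, power):
    """Apply one fractional-polynomial power to positive values."""
    return [math.log(v) if power == "log" else v ** power for v in x]


def rss_simple_ols(x, y):
    """Residual sum of squares of a one-predictor least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))


def best_power(x, y):
    """Power whose transformed predictor gives the smallest RSS."""
    return min(POWERS, key=lambda p: rss_simple_ols(fp_transform(x, p), y))


def bootstrap_power_frequency(x, y, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each power is selected."""
    rng = random.Random(seed)
    n = len(x)
    counts = {p: 0 for p in POWERS}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = best_power([x[i] for i in idx], [y[i] for i in idx])
        counts[chosen] += 1
    return {p: c / n_boot for p, c in counts.items()}
```

A transformation would then be retained when its selection frequency exceeds a chosen threshold, analogous to the 15% cut-off described above.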
The second stage of the model building approach performs an exhaustive search of the candidate variables and their transformations identified with MFP including searching all two-way interactions between variables using the R package glmulti (32). A participant-specific random effect was included in the models to account for the same individual performing different activities. We selected all models with an Akaike information criterion (AIC) within two of the best fitting model (for each set of candidate variables) as the candidate models retained for further testing (33, 34). We restricted the models to contain at most one size variable (chest, height, hip, waist, or weight) and conducted a separate iteration of the glmulti algorithm for each variable to reduce the computational burden. This variable reduction step is also justified by the correlation between the size variables and small differences in MFP model performance between models with one or multiple size variables.
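The two selection rules in this stage (at most one size variable per model, and retention of every model whose AIC is within two of the best) can be sketched as below. This is an illustration under stated assumptions rather than the glmulti procedure itself: the helper names are hypothetical, the AIC formula shown is the Gaussian least-squares form without the participant random effect, and any AIC values passed in are made up for demonstration.

```python
import math
from itertools import combinations

SIZE_VARS = {"chest", "height", "hip", "waist", "weight"}


def candidate_sets(variables):
    """Yield every non-empty variable subset with at most one size variable."""
    for r in range(1, len(variables) + 1):
        for combo in combinations(variables, r):
            if len(SIZE_VARS.intersection(combo)) <= 1:
                yield combo


def gaussian_aic(rss, n_obs, n_params):
    """AIC (up to an additive constant) for a Gaussian least-squares fit."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def within_delta_aic(aic_by_model, delta=2.0):
    """Names of models whose AIC is within `delta` of the minimum, best first."""
    best = min(aic_by_model.values())
    keep = (m for m, a in aic_by_model.items() if a - best <= delta)
    return sorted(keep, key=aic_by_model.get)
```

Retaining all models within two AIC units, rather than only the minimum, keeps alternatives that are about equally well supported by the training data, so that they can be compared under independent validation.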
The predictive ability of each candidate model was tested using leave-one-out cross-validation (35) on the training dataset for comparison to the independent validation. Each model was fitted with one subject’s training data removed at a time. Root mean square error (RMSE) for predicting the removed observations was calculated for each person on each task and averaged to obtain the mean RMSE for each task and the overall mean RMSE. Cross-validation is commonly used to test the performance of predictive ventilatory models (e.g. 16, 36-38). We chose leave-one-out cross-validation because this technique is widely reported with these types of models and because it is suitable for smaller datasets (39, section: 7.10). We also compared the cross-validation approach with an independent validation (i.e., validation against an independent dataset that was not used for model development) to assess the performance of both the models and the cross-validation approach more generally. Additional details on the predictive model development are provided in supplementary material.
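The leave-one-subject-out procedure can be sketched as follows. For brevity the sketch fits a heart-rate-only least-squares line instead of the candidate mixed models, which is an assumption made purely for illustration; the function names and the row layout are also hypothetical.

```python
from collections import defaultdict
from math import sqrt


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


def leave_one_subject_out(rows):
    """rows: iterable of (subject, task, heart_rate, ventilation).

    Returns (mean RMSE per task, overall mean RMSE), where RMSE is first
    computed for each held-out subject on each task and then averaged.
    """
    rows = list(rows)
    sq_err = defaultdict(list)  # (subject, task) -> squared prediction errors
    for held_out in sorted({r[0] for r in rows}):
        train = [r for r in rows if r[0] != held_out]
        b0, b1 = fit_line([r[2] for r in train], [r[3] for r in train])
        for subj, task, hr, ve in rows:
            if subj == held_out:
                sq_err[(subj, task)].append((ve - (b0 + b1 * hr)) ** 2)
    rmse = {k: sqrt(sum(v) / len(v)) for k, v in sq_err.items()}
    by_task = defaultdict(list)
    for (_, task), value in rmse.items():
        by_task[task].append(value)
    task_mean = {t: sum(v) / len(v) for t, v in by_task.items()}
    overall = sum(rmse.values()) / len(rmse)
    return task_mean, overall
```

Holding out all of a subject's observations at once, rather than single observations, keeps that person's other tasks from informing the prediction.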
2.4. Simplified models
Model over-fitting is an important concern when statistical search methods are used to identify the best model. The use of AIC (which penalizes the addition of model terms), hold-out validation, and independent validation is designed to minimize the chance of over-fitting. Additionally, we tested a set of simplified models with no interaction terms to check for over-fitting in our model selection. All combinations of the MFP-identified variables were tested. See supplementary material for a list of the simplified models tested.
2.5. Basic Model
A basic single-level linear model of the form
| $V_e = \beta_0 + \beta_1 \cdot \mathrm{Hr}$ | [1] |
with Hr as the only independent variable was also run. The basic model provides a reference to gauge improvements gained by the addition of variables. The basic model could also be identified by the variable selection procedure if its performance were good enough.
2.6. Independent model validation in a field study
A second dataset was collected to validate the predictive performance of the candidate models identified from the training dataset. The validation study recruited 26 participants to perform a series of tasks at their own pace in and around the Colorado State University campus. These tasks involved walking (approximately 0.8 km), riding a bus (0.8 km), a seated task (a 6 minute, computer-based card game), an active task (approximately 10 minutes, sorting and weighing colored balls), and cycling between two locations (approximately 1.6 km). A member of the study team accompanied participants to provide instructions and answer questions as needed. The entire series of tasks was designed to take around one hour to complete. Ventilation rate, heart rate and physiological data were collected in the same manner as the training study. Heart and ventilation rates were aggregated to a 30 s running average for the purpose of validation.
Ventilation rate was predicted using each of the candidate models identified from the laboratory training data as described in Section 2.3 as well as the simplified and basic heart rate models (Sections 2.4 and 2.5, respectively). Task-specific and overall RMSE (as described above) were calculated to assess model performance. Resting heart rate was calculated as the last two minutes of the sitting heart rate task minus 5 beats per minute.
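The 30 s running average and the RMSE used to score predictions can be sketched as below. The text does not say whether the window is trailing or centered; a trailing window of six samples (30 s at five-second resolution) is assumed here, and the function names are hypothetical.

```python
import math
from collections import deque


def running_mean(values, window=6):
    """Trailing running mean; emits one value per full window.

    Assumption: six 5-s samples make up each 30-s window.
    """
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # value about to be pushed out of the window
        buf.append(v)
        total += v
        if len(buf) == window:
            out.append(total / window)
    return out


def rmse(predicted, observed):
    """Root mean square error between two equal-length series."""
    sq = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(sq) / len(sq))
```

Applying the same smoothing to both heart rate and measured ventilation before scoring keeps the predicted and observed series on a common time base.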
2.7. Exposure assessment
Exposure to particle number (PN) was measured for a subset of participants (n = 11) from the validation study using a diffusion classifier (Disc Mini, Matter Aerosol AG, Switzerland). PN data were used to calculate the time-weighted average concentration (TWA), the time-weighted inhalation rate (i.e., number of particles inhaled per minute), and the intake (total number of particles inhaled) for each task and participant. The exposure metrics (inhaled PN versus PN concentration) were compared to each other using linear models within and between tasks. The relationship between inhaled PN and PN concentration is then used to assess the potential for exposure misclassification when concentration is used as a proxy for pollution dose. The predicted PN intake was compared to the measured PN intake across all tasks, again using a linear model. The relationship between measured and predicted intake is used to infer the usefulness of the predictive ventilatory models.
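The three exposure metrics can be sketched as follows. This is an illustration under stated assumptions: concentrations are taken to be in particles per cm³ and ventilation in L/min (so a factor of 1000 cm³ per litre is needed), the time-weighted inhalation rate is taken as total intake divided by total time, and the function and key names are hypothetical.

```python
CM3_PER_L = 1000.0  # unit conversion: 1 L = 1000 cm^3


def exposure_metrics(conc_per_cm3, ve_l_min, dt_min):
    """Time-weighted average concentration, intake, and inhalation rate.

    conc_per_cm3: particle number concentration for each interval (#/cm^3)
    ve_l_min:     ventilation rate for each interval (L/min)
    dt_min:       interval durations (min)
    """
    total_t = sum(dt_min)
    twa = sum(c * dt for c, dt in zip(conc_per_cm3, dt_min)) / total_t
    intake = sum(
        c * CM3_PER_L * v * dt
        for c, v, dt in zip(conc_per_cm3, ve_l_min, dt_min)
    )
    return {
        "twa_per_cm3": twa,
        "intake_particles": intake,
        "inhalation_rate_per_min": intake / total_t,
    }
```

With one minute at 10,000 particles/cm³ and 10 L/min followed by one minute at 20,000 particles/cm³ and 30 L/min, the intake is 7 × 10^8 particles, whereas multiplying the time-weighted average concentration by the mean ventilation rate and the duration gives 6 × 10^8. The difference arises because concentration and ventilation vary together, which is the discordance between exposure and intake that the comparison is meant to expose.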
3. Results
3.1. Study population
Thirty-five of the thirty-six participants completed the laboratory training study; one participant could not complete all the activities and was removed from the analysis. Twenty-six participants completed the validation study, and heart rate data were successfully collected for twenty-two of them (Hr was not measured for four of the validation study participants because of malfunctioning heart rate leads). Less than 0.5% of the heart rate data were screened out of the analysis as a result of the heart rate range criteria (30–200 bpm). The participant characteristics are presented in Table 1.
Table 1.
Participant characteristics for the training and validation datasets.
| Variable | Range | Training N (%) | Validation N (%) |
|---|---|---|---|
| Age, years | 18–24 | 3 (9%) | 7 (27%) |
| | 25–34 | 8 (23%) | 10 (38%) |
| | 35–44 | 7 (20%) | 2 (8%) |
| | 45–54 | 6 (17%) | 3 (12%) |
| | 55–65 | 11 (31%) | 4 (15%) |
| Chest, cm | 60–70 | 1 (3%) | 3 (12%) |
| | 70–80 | 13 (37%) | 8 (31%) |
| | 80–90 | 12 (34%) | 9 (35%) |
| | 90–100 | 9 (26%) | 5 (14%) |
| | 100–110 | 0 (0%) | 1 (4%) |
| Weight, kg | 40–50 | 2 (6%) | 2 (8%) |
| | 50–60 | 5 (14%) | 5 (19%) |
| | 60–70 | 15 (43%) | 8 (31%) |
| | 70–80 | 7 (20%) | 5 (19%) |
| | 80–90 | 6 (17%) | 5 (19%) |
| | 90–105 | 0 (0%) | 1 (4%) |
| Sex | Female | 19 (54%) | 15 (58%) |
| | Male | 16 (46%) | 11 (42%) |
| Resting heart rate, bpm | 30–50 | 2 (6%) | 5 (19%) |
| | 50–60 | 14 (40%) | 5 (19%) |
| | 60–70 | 16 (46%) | 2 (8%) |
| | 70–80 | 2 (6%) | 8 (31%) |
| | 80–100 | 1 (3%) | 2 (8%) |
| | Missing | 0 (0%) | 4 (15%) |
3.2. Laboratory training
3.2.1. Laboratory results
The (arithmetic) mean ventilation rate stratified by activity (Table 2) ranged from 9 to 45 L/min. Higher between-participant variability in ventilation rate was observed for the sweeping and shoveling tasks, which were performed at each participant’s own pace. The active tasks (walking and cycling) produced similar mean ventilation rates in the training and validation studies. The sitting tasks in the validation study (sitting and bus ride) produced more variable ventilation rates with a higher mean, as some participants’ ventilation remained elevated after completing a more active task beforehand.
Table 2.
Mean participant ventilation rates (L/min) and standard deviations (s.d.) stratified by activity.
| Activity | Mean (s.d.) Training N = 35 | Mean (s.d.) Validation N = 26 |
|---|---|---|
| Sitting | 8.8 (1.6) | 15.1 (5.1) |
| Bus ride | - | 13.9 (5.6) |
| Standing still | 9.4 (1.8) | - |
| Sorting task | - | 17.6 (4.4) |
| Walking (2 mph) | 19.5 (3.0) | - |
| Loaded walk (2 mph) | 22.0 (2.9) | - |
| Walking (3 mph) | 25.4 (2.7) | - |
| Loaded walk (3 mph) | 28.3 (3.7) | - |
| Walking | - | 27.7 (5.8) |
| Cycling (50W) | 30.3 (4.3) | - |
| Cycling (100W) | 44.9 (7.1) | - |
| Cycling | - | 40.1 (10.7) |
| Sweeping | 31.5 (7.1) | - |
| Shoveling | 37.7 (8.9) | - |
3.2.2. Model building
The bootstrap MFP analysis showed consistent selection of variables over different iterations (equation 2). Hr was selected 100% of the time: 54% of the time with the square root transformation and 24% with a logarithmic transformation. Hrest was selected 97% of the time, and 77% of the time with a power of one (i.e., no transformation). Age was selected 83% of the time; the suggested transformations varied, and the most frequent was a power of one, which was selected 36% of the time. The categorical sex variable was selected in 78% of the runs. At least one size variable (chest, height, hip, waist, or weight) appeared 92% of the time. A single size variable was selected in 36% (chest 16%, height 3%, hip 6%, waist 6%, and weight 5%) of the 10,000 bootstrap samples. Variable selection frequency and common power transformations from the bootstrap analysis are shown in Supplementary Material. Equation 2 shows the model selected by the MFP algorithm prior to the bootstrap analysis. Cross-validation of equation 2 results in a mean RMSE of 5.3 L/min.
| [2] |
Each glmulti search was repeated with three transformations of heart rate: untransformed, square root transformed, and log transformed. The glmulti analysis produced 136 unique candidate models with an AIC within two of the best model (for each set of candidate variables). Cross-validation of the candidate glmulti models resulted in RMSE of 4.9 to 5.4 L/min, slightly higher than with the full data set (4.6 to 5.2 L/min).
The glmulti analysis identified potential interaction terms between sex and each of the other candidate variables. The Hr, Hrest, sex, and sex-Hr variables were retained by all the glmulti candidate models with AIC within two of the best models. Interaction terms for sex-Hrest and sex-age were included in 50% and 43% of the models, respectively. A size variable was included in 88% of the glmulti candidate models. A size-sex interaction term was included in 44% of the glmulti candidate models.
3.3. Validation
All 136 candidate models identified in the laboratory test were evaluated for predictive performance in the validation study. The best performing model had a mean participant weighted RMSE of 4.9 (s.d. = 1.21) L/min over the five activity categories. The best performing model in the validation study was:
| [3] |
where β0 = 0.99, β1 = −27.41, β2 = 50.24, β3 = 15.73, β4 = −43.65, β5 = −7.02, β6 = 23.02, β7 = −10.34, β8 = −26.21, and β9 = 38.78. The full list of candidate models and their performance is provided in supplementary material.
The best performing models (RMSE 4.9–5.2 L/min) contained the untransformed Hr variable, followed closely by models with the square root Hr transform (RMSE 5.2–5.4 L/min). The best log Hr transformed model had an RMSE of 5.7 L/min. The sex variable was contained in all the top 50 models (RMSE 4.9–5.7 L/min). The age variable was contained in all but two of the top 50 performing models. No single size variable consistently outperformed any other size variable. The sex x Hrest interaction appeared in all of the top 35 models (max RMSE 5.5 L/min). The sex x Hr interaction appeared in all of the top 16 models (max RMSE 5.2 L/min). The sex x age interaction resulted in some minor improvements when added (< 0.1 L/min RMSE).
The validated performance of the 136 candidate models (and their simplifications) is shown in Figure 3 and compared to their performance under cross-validation of the original training dataset. The validation RMSE tends to be higher than the training RMSE. The color coding in Figure 3 illustrates how certain variables and variable combinations appear consistently in the top performing models. The basic heart-rate only model (highlighted in Figure 3) produced an RMSE of 7.9 L/min. The top-performing model in the training cross-validation (RMSE = 4.9 L/min) performed worse in the validation (RMSE = 5.9 L/min) and is also highlighted in Figure 3.
Figure 3.

Root mean square error (RMSE) of the 136 candidate (x) and 51 simplified (□; no interactions between variables) models under cross-validation (training study) and the independent validation study. The color scale shows the variables that models have in common (heart rate = Hr, resting heart rate = Hrest, sex interaction terms = sex x, and size is either chest, height, hip, waist, or weight).
The top-performing model (equation 3) was examined as a function of each task conducted during the validation experiment. The RMSE for Equation 3 (averaged by participant) was consistent from task to task (approximately 5 L/min), except for cycling where it increased to 7.5 L/min.
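As a rough check, the relative errors quoted in the abstract (11% and 33%) appear consistent with dividing the overall validation RMSE of about 4.9 L/min by the two reference ventilation rates. This reading is an assumption, since the text does not state how the percentages were derived:

```latex
\[
\frac{4.9\ \text{L/min}}{45\ \text{L/min}} \approx 0.11,
\qquad
\frac{4.9\ \text{L/min}}{15\ \text{L/min}} \approx 0.33
\]
```

Because the absolute error is roughly constant across tasks, the relative error is necessarily larger at low ventilation rates.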
3.4. Basic model results
The basic heart rate only model (Equation 1), used to assess the value of additional variables, resulted in a mean RMSE of 7.4 liters per minute (L/min) across all activities. Cross-validation of the heart-rate only model resulted in a mean participant-weighted RMSE of 7.6 L/min (s.d. = 3.8), only slightly higher than the 7.4 L/min RMSE when all data were included. The candidate models with additional variables improved upon the heart rate only model by 2.2 to 2.7 L/min in terms of the cross-validated RMSE.
3.5. Analysis of Exposure vs. Intake
Breathing zone particle number (PN) concentration was measured for 11 participants (one participant’s data was removed due to an instrument malfunction). Exposure (PN concentration) was compared to intake (PN inhaled) within task, illustrated in Figure 4. Within task there is generally a strong linear relationship between PN concentration and PN inhalation (R2 0.73–0.97). The relationship between time-weighted average exposure and intake was also explored between tasks (Figure 5). The relationship between exposure concentration and intake (Figure 5a) was only moderately linear across multiple tasks (linear model R2 = 0.53). Predicted intake (using the top-performing model) compared well to measured intake (Figure 5b). The predicted versus measured intake relationship has a higher R2 (0.93) than the exposure vs. intake relationship. The measured ventilation rates and exposure levels are provided in supplementary material.
Figure 4.

Number concentration versus number of particles inhaled by task, with linear regression (black line) and 95% confidence interval (grey shading).
Figure 5.

(a) Measured personal exposure concentration versus measured intake. (b) Predicted intake versus measured intake. Black lines show linear model fit with 95% confidence interval (grey shading).
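The R2 values reported for these comparisons are coefficients of determination from simple linear models. A minimal sketch of that calculation for one predictor is given below; the function name is hypothetical, and the sketch does not reproduce any additional modelling choices the authors may have made.

```python
def r_squared(x, y):
    """Coefficient of determination of a one-predictor least-squares line.

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)
```

Here x would be the exposure concentration (or predicted intake) and y the measured intake, with one pair per participant and task.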
4. Discussion
Accurate and generalizable models that predict personal ventilation rates may help reduce measurement error in epidemiological studies by bridging the gap between air pollution exposure and intake. Personal heart rate can be used to predict ventilation rate for the purpose of estimating air pollution intake. In addition to heart rate, the models developed here used variables that may be feasibly collected in many epidemiologic studies.
The top-performing model produced a task-average RMSE of 4.9 L/min in the validation study. The candidate models have similar errors under laboratory training cross-validation (4.9 – 5.4 L/min) but a larger range of errors (4.9 – 7.0 L/min) in the validation study. The difference in model performance between training and validation study datasets suggests that some over-fitting is occurring. Our sample size was similar to previous studies, suggesting that over-fitting could be a problem in previous studies that employ validation using the training dataset.
A novel aspect of our work is the inclusion of a resting heart rate variable, which was selected in every candidate model. The resting heart rate variable may add value to a predictive model because it helps account for the person-to-person variability in the heart rate - ventilation rate relationship, akin to the individually calibrated models. In this study, we defined resting heart rate as the sitting heart rate minus a constant of 5 beats per minute. Further work is needed to investigate how best to determine resting heart rate from continuous heart rate data measured during epidemiologic research; for example, data collected during sleep should provide a reliable estimate.
Our examination of the top models identified combinations of variables associated with improved performance. The age and sex variables were present in all models with RMSE less than approximately 5.5 L/min. Additionally, the best models, with RMSE less than approximately 5.0 L/min, contained a size variable and the sex x Hr and/or sex x Hrest interaction terms. Inclusion of these interaction terms, however, provided only marginal improvements to model performance (on the order of 10% improvement to RMSE).
Here we are not attempting to derive a mechanistic understanding of the heart rate - ventilation rate relationship; however, the physiology is worth noting briefly because we might expect models that are physiologically consistent to be more robust. Differences in ventilation patterns by sex have been observed (21, 22), and the inclusion of a sex term and interactions between sex and other variables is consistent with these findings. The dynamic range of heart rate is known to decrease with age, whereas, all else being equal, the body’s ventilatory requirements do not; it is therefore plausible that the heart rate - ventilation rate relationship is modified by age. Similarly, body size affects energy demand and hence oxygen requirement, so it is plausible that the heart rate - ventilation rate relationship is modified by a size variable.
A number of the simplified models performed as well in the independent validation as the more complex models, another reason to suggest the laboratory training cross-validation is under-estimating the model error. There are groups of models that do outperform the best simple model (RMSE <5.4 L/min), suggesting with careful variable selection and validation more complex models can deliver reductions in error. However, some of the complex models perform less well in the validation study, demonstrating the importance of model validation in a realistic setting.
An important distinction of this work is that the models are designed to predict ventilation rate from heart rate for different activities performed at low and moderate levels of exertion. Therefore, the linear ventilation - heart rate relationship in the top-performing model (equation 3) is quite different from the exponential relationship in models designed around progressive exercise testing. The models developed here are designed to be applied to everyday tasks and may produce larger errors for more extreme exercise activities. The models developed here are geared towards studies that seek to determine the health effects of air pollution across typical daily activities and within common microenvironments where individuals spend the majority of their time. The models tested here are unlikely to be suitable for higher exertion activities (e.g. sports) because the heart rate - ventilation rate relationship is non-linear across low to high heart rates (e.g. 18).
Ease of use in epidemiological studies was central to the model design. Our models require heart rate data (which is becoming easier to collect) and other commonly collected information. This approach will enable ventilation rate to be predicted in larger studies with little additional burden. The usefulness of predictive ventilatory models for epidemiologic studies of air pollution will depend on whether they are able to reduce error in the exposure-response estimates. Investigators should consider if the modelled ventilation rate measurement error is smaller than the misclassification error implicit in not considering intake, as well as the type of error from each source and the implications for the specific epidemiologic study design. Work would also be required to validate or adapt these models developed using healthy populations for studies of unhealthy groups such as people with lung diseases.
Ventilation rate can vary both within individuals and between individuals, typically by up to 4 L/min within, and by up to a factor of four between, everyday tasks (40, 41). When ventilation rate is ignored in exposure assessment, there is an implicit assumption that it is constant across the population, resulting in Berksonian error (42, 43). This variability, often unaccounted for in epidemiological studies, may reduce our ability to detect relationships between air pollution and health outcomes.
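The consequence of assuming a constant ventilation rate can be sketched with a small calculation. Intake is taken here as ventilation rate multiplied by breathing-zone concentration and duration, as defined in the abstract. The subject labels, ventilation rates, concentration, and duration below are hypothetical values chosen only to span a factor of four in ventilation; they are not study data.

```python
def intake_ug(ve_l_per_min, conc_ug_per_m3, minutes):
    """Inhaled mass in micrograms: Ve (L/min) x C (ug/m3) x t (min), with 1000 L per m3."""
    return ve_l_per_min * conc_ug_per_m3 * minutes / 1000.0


# Hypothetical ventilation rates (L/min) spanning a factor of four.
ventilation = {"A": 8.0, "B": 12.0, "C": 20.0, "D": 32.0}
concentration = 25.0  # ug/m3, identical exposure for every subject (assumed)
duration = 60.0       # minutes (assumed)

# Intake when each subject's own ventilation rate is used.
individual = {s: intake_ug(ve, concentration, duration) for s, ve in ventilation.items()}

# Intake when one population-average ventilation rate is assigned to everyone.
mean_ve = sum(ventilation.values()) / len(ventilation)
assigned = intake_ug(mean_ve, concentration, duration)

# Assigned minus individual intake: the individual values scatter around the
# single assigned value, which is the Berkson-type structure described above.
errors = {s: assigned - value for s, value in individual.items()}

for s in ventilation:
    print(f"{s}: individual = {individual[s]:.1f} ug, assigned = {assigned:.1f} ug, "
          f"error = {errors[s]:+.1f} ug")
```

In this sketch every subject has the same exposure concentration, yet the individual intakes differ by the same factor of four as the ventilation rates, and the errors around the assigned value sum to zero.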
Our exploratory exposure analysis suggests that, between tasks with different characteristic ventilation rates, a predictive ventilatory model (RMSE = 4.9 L/min) may give a much better estimate of inhaled pollution (measured versus modelled intake, R² = 0.93) than time-weighted-average exposure (measured exposure concentration versus measured intake, R² = 0.53). For example, in studies that compare air pollution while driving with air pollution while cycling, substantial exposure misclassification would be expected if intake were not considered (44). If task is associated with ventilation rate, then the ability to determine task in study populations could be useful, and task-specific models, if developed, might be more precise. For large studies it would be useful to classify common activities using wearable sensors (e.g. 45), eliminating the need for self-reported time-activity surveys. Within task there is a stronger linear relationship between measured intake and exposure (R² 0.73 to 0.97) than between tasks, suggesting that predictive ventilatory models are less likely to be useful for tasks associated with more heterogeneous ventilation rates.
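The contrast between within-task and between-task agreement can be illustrated with a toy calculation. The sketch below uses hypothetical ventilation rates and concentrations for two tasks (labelled driving and cycling for illustration only, not study data) and computes the squared Pearson correlation between exposure concentration and intake rate, first within each task and then with both tasks pooled.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical observations: (ventilation rate in L/min, concentration in ug/m3).
tasks = {
    "driving": [(9.0, 10.0), (10.0, 20.0), (11.0, 30.0)],
    "cycling": [(38.0, 10.0), (40.0, 20.0), (42.0, 30.0)],
}

within = {}
pooled_conc, pooled_intake = [], []
for task, observations in tasks.items():
    conc = [c for _, c in observations]
    # Intake rate in ug/min: Ve (L/min) x C (ug/m3) / 1000 (L per m3).
    intake = [ve * c / 1000.0 for ve, c in observations]
    within[task] = r_squared(conc, intake)
    pooled_conc.extend(conc)
    pooled_intake.extend(intake)

pooled = r_squared(pooled_conc, pooled_intake)

for task, value in within.items():
    print(f"{task}: within-task R2 = {value:.3f}")
print(f"pooled across tasks: R2 = {pooled:.3f}")
```

With these assumed values the within-task R² is close to 1 for both tasks, because ventilation barely changes within a task, while the pooled R² drops to roughly 0.3, because the same concentration maps to very different intakes depending on the task.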
In conclusion, we developed and validated a set of models designed to predict ventilation rate from heart rate for everyday activities. We showed that cross-validation approaches, which rely on the same data used to train models, may over-estimate predictive ability. We found that the best models contained a resting heart rate variable, and that specific combinations of variables describing subjects’ size, age and sex also improved performance. Finally, we compared exposure data against measured and predicted intake data to demonstrate how their relationships may vary between microenvironments. Future work could focus on exploring the relationships between exposure, inhaled dose and health effects.
Acknowledgements:
This work was funded by the United States Department of Health and Human Services (HHS), National Institutes of Health (NIH), National Institute of Environmental Health Sciences (NIEHS) under grant R01ES020017 and by CDC NIOSH Mountain and Plains Education and Research Center (MAP-ERC) grant number T420H009229–08. The content of this article is solely the authors’ responsibility and does not necessarily represent the official views of the HHS, NIH, NIEHS, CDC NIOSH or MAP-ERC.
Footnotes
Competing financial interests declaration: The authors declare no competing financial interests.
References
- 1. Künzli N, Kaiser R, Medina S, Studnicka M, Chanel O, Filliger P, et al. Public-health impact of outdoor and traffic-related air pollution: a European assessment. The Lancet. 2000;356(9232):795–801.
- 2. Brunekreef B, Holgate ST. Air pollution and health. The Lancet. 2002;360(9341):1233–42.
- 3. Pope CA 3rd, Burnett RT, Thurston GD, Thun MJ, Calle EE, Krewski D, et al. Cardiovascular mortality and long-term exposure to particulate air pollution: epidemiological evidence of general pathophysiological pathways of disease. Circulation. 2004;109(1):71–7.
- 4. U.S. EPA. Integrated Science Assessment for Oxides of Nitrogen – Health Criteria (2016 Final Report). Washington, DC, EPA/600/R-15/068: U.S. Environmental Protection Agency; 2016.
- 5. U.S. EPA. 2013 Final Report: Integrated Science Assessment of Ozone and Related Photochemical Oxidants. Washington, DC, EPA/600/R-10/076F: U.S. Environmental Protection Agency; 2013.
- 6. U.S. EPA. Integrated Science Assessment for Particulate Matter - EPA/600/R-08/139F. 2009.
- 7. Lioy PJ. Assessing Total Human Exposure to Contaminants - a Multidisciplinary Approach. Environmental Science & Technology. 1990;24(7):938–45.
- 8. Monn C. Exposure assessment of air pollutants: a review on spatial heterogeneity and indoor/outdoor/personal exposure to suspended particulate matter, nitrogen dioxide and ozone. Atmospheric Environment. 2001;35(1):1–32.
- 9. Duan N, Dobbs A, Ott W. Comprehensive definitions of exposure and dose to environmental pollution, SIMS technical report no. 159. Stanford, CA: Department of Statistics, Stanford University; 1990.
- 10. Zhang JJ, Lioy PJ. Human exposure assessment in air pollution systems. ScientificWorldJournal. 2002;2:497–513.
- 11. Sheppard L, Burnett RT, Szpiro AA, Kim SY, Jerrett M, Pope CA 3rd, et al. Confounding and exposure measurement error in air pollution epidemiology. Air Qual Atmos Health. 2012;5(2):203–16.
- 12. Zeger SL, Thomas D, Dominici F, Samet JM, Schwartz J, Dockery D, et al. Exposure measurement error in time-series studies of air pollution: concepts and consequences. Environ Health Perspect. 2000;108(5):419–26.
- 13. Samet JM, Lambert WE, James DS, Mermier CM, Chick TW. Assessment of heart rate as a predictor of ventilation. 1993.
- 14. U.S. Environmental Protection Agency. EPA methods for derivation of inhalation reference concentrations and application of inhalation dosimetry. Washington, DC: Office of Research and Development, Office of Health and Environmental Assessment, EPA/600/8–90/066F; 1994.
- 15. Cole-Hunter T, Morawska L, Stewart I, Jayaratne R, Solomon C. Inhaled particle counts on bicycle commute routes of low and high proximity to motorised traffic. Atmospheric Environment. 2012;61:197–203.
- 16. Greenwald R, Hayat MJ, Barton J, Lopukhin A. A novel method for quantifying the inhaled dose of air pollutants based on heart rate, breathing rate and forced vital capacity. PLoS One. 2016;11(1):e0147578.
- 17. Ramos CA, Reis JF, Almeida T, Alves F, Wolterbeek HT, Almeida SM. Estimating the inhaled dose of pollutants during indoor physical activity. Sci Total Environ. 2015;527-528:111–8.
- 18. Zuurbier M, Hoek G, van den Hazel P, Brunekreef B. Minute ventilation of cyclists, car and bus passengers: an experimental study. Environ Health. 2009;8:48.
- 19. Piwek L, Ellis DA, Andrews S, Joinson A. The Rise of Consumer Health Wearables: Promises and Barriers. PLoS Med. 2016;13(2):e1001953.
- 20. Soucie LP, Carey C, Woodend AK, Tang AS. Correlation of the heart rate-minute ventilation relationship with clinical data: relevance to rate-adaptive pacing. Pacing Clin Electrophysiol. 1997;20(8 Pt 1):1913–8.
- 21. Aitken ML, Franklin JL, Pierson DJ, Schoene RB. Influence of body size and gender on control of ventilation. J Appl Physiol (1985). 1986;60(6):1894–9.
- 22. Kilbride E, McLoughlin P, Gallagher CG, Harty HR. Do gender differences exist in the ventilatory response to progressive exercise in males and females of average fitness? Eur J Appl Physiol. 2003;89(6):595–602.
- 23. Rosdahl H, Gullstrand L, Salier-Eriksson J, Johansson P, Schantz P. Evaluation of the oxycon mobile metabolic system against the Douglas bag method. Eur J Appl Physiol. 2010;109(2):159–71.
- 24. Diaz V, Benito PJ, Peinado AB, Alvarez M, Martin C, Salvo VD, et al. Validation of a new portable metabolic system during an incremental running test. Journal of sports science & medicine. 2008;7(4):532–6.
- 25. Carter J, Jeukendrup AE. Validity and reliability of three commercially available breath-by-breath respiratory systems. Eur J Appl Physiol. 2002;86(5):435–41.
- 26. American College of Sports Medicine. ACSM’s guidelines for exercise testing and prescription. 9th ed. Baltimore MD: Williams & Wilkins; 2014.
- 27. MacWilliam JA. Postural effects on heart-rate and blood pressure. Quarterly Journal of Experimental Physiology. 1933;23.
- 28. Royston P, Altman DG. Regression using fractional polynomials of continuous covariates - parsimonious parametric modeling. Applied Statistics-Journal of the Royal Statistical Society Series C. 1994;43(3):429–67.
- 29. Royston P, Sauerbrei W. Multivariable model-building: John Wiley & Sons Ltd; 2008.
- 30. Benner A. Multivariable fractional polynomials. 2015. Available from: https://cran.r-project.org/web/packages/mfp/vignettes/mfp_vignette.pdf.
- 31. Sauerbrei W, Royston P. Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. Journal of the Royal Statistical Society Series a-Statistics in Society. 1999;162:71–94.
- 32. Calcagno V, de Mazancourt C. glmulti: An R package for easy automated model selection with (generalized) linear models. Journal of Statistical Software. 2010;34(12):1–29.
- 33. Raftery AE, Madigan D, Hoeting JA. Bayesian model averaging for linear regression models. Journal of the American Statistical Association. 1997;92(437):179–91.
- 34. Madigan D, Raftery AE. Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occams Window. Journal of the American Statistical Association. 1994;89(428):1535–46.
- 35. Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence. 1995:1–7.
- 36. McCool D, Samet J, editors. Noninvasive Methods for Measuring Ventilation in Mobile Subjects. 1993.
- 37. Liu S, Gao R, He Q, Staudenmayer J, Freedson P. Improved regression models for ventilation estimation based on chest and abdomen movements. Physiol Meas. 2012;33(1):79–93.
- 38. Sayadi O, Weiss EH, Merchant FM, Puppala D, Armoundas AA. An optimized method for estimating the tidal volume from intracardiac or body surface electrocardiographic signals: implications for estimating minute ventilation. Am J Physiol Heart Circ Physiol. 2014;307(3):H426–36.
- 39. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. 2nd ed. New York, NY, USA: 2017.
- 40. Paek D, McCool FD. Breathing patterns during varied activities. Journal of applied physiology. 1992;73(3):887–93.
- 41. Beals JA, Funk LM, Fountain R, Sedman R. Quantifying the distribution of inhalation exposure in human populations: distribution of minute volumes in adults and children. Environ Health Perspect. 1996;104(9):974–9.
- 42. Tosteson TD, Stefanski LA, Schafer DW. A measurement-error model for binary and ordinal regression. Statistics in medicine. 1989;8(9):1139–47; discussion 49.
- 43. Heid IM, Kuchenhoff H, Miles J, Kreienbrock L, Wichmann HE. Two dimensions of measurement error: classical and Berkson error in residential radon exposure assessment. J Expo Anal Environ Epidemiol. 2004;14(5):365–77.
- 44. van Wijnen JH, Verhoeff AP, Jans HW, van Bruggen M. The exposure of cyclists, car drivers and pedestrians to traffic-related air pollutants. International archives of occupational and environmental health. 1995;67:187–93.
- 45. Freedson P, Bowles HR, Troiano R, Haskell W. Assessment of physical activity using wearable monitors: recommendations for monitor calibration and use in the field. Med Sci Sports Exerc. 2012;44(1 Suppl 1):S1–4.
