Abstract
Missing data due to non-wear are common in accelerometer studies measuring physical activity and sedentary behavior. Accelerometer outputs are high-dimensional time-series data that are episodic and often highly skewed, presenting unique challenges for handling missing data. Common methods for missing accelerometry are ad hoc, require restrictive parametric assumptions, or do not appropriately impute bouts. This study developed a flexible hot deck multiple imputation (MI; i.e., “replacing” missing data with observed values) procedure to handle missing accelerometry. For each missing segment of accelerometry, “donor pools” contained observed segments from either the same or different participants, and 10 imputed segments were randomly drawn from the donor pool according to selection weights, where the donor pool and selection weight depended on variables associated with non-wear and/or accelerometer-based measures. A simulation study of 2,550 women compared hot deck MI to two standard methods in the field: available case (AC) analysis (i.e., analyzing all observed accelerometry with no restriction on wear time or number of days) and complete case (CC) analysis (i.e., analyzing only participants who wore the accelerometer for ≥10 hours/day on 4–7 days). This was repeated using accelerometry from the entire 24-hour day and daytime (10am–8pm) only, and data were missing at random. For the entire 24-hour day, MI produced less bias and better 95% confidence interval (CI) coverage than AC and CC. For the daytime only, MI produced less bias and better 95% CI coverage than AC; CC produced similar bias and 95% CI coverage, but longer 95% CIs than MI.
Keywords: accelerometer, multiple imputation, missing data, high-dimensional data, physical activity, hot deck
1. Introduction
Accelerometers are increasingly used in research studies to collect physical activity and sedentary behavior data in free-living settings. Measuring physical activity and sedentary behavior using accelerometers removes issues of recall and cultural bias common with questionnaire assessments of these behaviors. Accelerometers can also provide much more detail during short time segments (e.g., every 15 seconds), referred to as epochs. During accelerometer wear time, movement signals, referred to as activity counts, are recorded in predefined epochs, which, when calibrated to intensity of movement, can then be used to categorize levels of physical activity intensity (e.g., light, moderate, vigorous) or sedentary behavior.
For epidemiologic studies that use accelerometers, the study protocol usually requires participants to wear the accelerometer for multiple days (e.g., 7 days), to characterize day-to-day variability and patterns in physical activity levels and to obtain a more representative sample of daily movement patterns than is possible when based on only a day or two of monitoring (Trost et al. 2000). However, it is common to have missing data due to participants not wearing the accelerometer during an entire day or parts of the day (e.g., due to sleep, water-based activities, non-compliance). In practice, non-wear time is often defined based on a minimum length of time consisting of consecutive zero activity counts (e.g., 20, 60, or 90 minutes) (Evenson and Terry 2009), although these time intervals tend to be defined arbitrarily, which can limit cross-study comparisons. Others use an algorithm developed and further refined by Choi et al., which defines non-wear time as an interval of ≥ 90 consecutive minutes of zero counts/minute, with allowance of up to 2 minutes of nonzero counts if no counts were detected during both the 30 minutes upstream and downstream from that interval (Choi et al. 2011; Choi et al. 2012).
Missing data due to non-wear of accelerometers can cause several problems with effect estimation and hypothesis testing. First, common summary measures from accelerometer data are based on total amount of time per day engaged in physical activity (e.g., time spent in moderate to vigorous physical activity (MVPA)). If the accelerometer was only worn for part of the day for one or more days, then these types of summary measures may be underestimated if all incomplete days are included in calculating the summary measure since MVPA during the non-wear time will not be included in the total. Second, if the patterns and amounts of physical activity during non-wear time differ systematically from the patterns of physical activity during the wear time, then biased estimates could occur when simply excluding accelerometer data during non-wear time from analysis, either only using the accelerometer data from wear-time for all participants, or only including participants with no or minimal non-wear time. Third, statistical analyses that simply exclude accelerometer data from non-wear time may result in loss of information (i.e., discarding useful data), which could reduce precision of estimates.
The choice of statistically valid methods to handle missing data depends on the “missing data mechanism” (Little and Rubin 2002). For example, data are missing completely at random (MCAR) if the probability of missingness is independent of all other observed or unobserved quantities. Data are missing at random (MAR) if the probability of missingness is independent of unobserved quantities (e.g., the missing value itself) conditional on observed quantities. Data are missing not at random (MNAR) if the probability of missingness depends on unobserved quantities (e.g., the missing value itself), even after conditioning on observed quantities. Since accelerometer non-wear is likely to be influenced by other factors, accelerometer data are likely to be at least MAR, and perhaps MNAR in special cases (such as for water-based physical activity).
Multiple imputation is one approach to handling missing data. Multiple imputation essentially replaces each missing value with “plausible” replacement values m > 1 times, resulting in m “complete” datasets that each contain the observed data and one set of replacement values for the missing data, where m is the number of imputations (Rubin 1987). Usually, multiple imputation is implemented to handle data that are MCAR or MAR, but it can also be flexible enough to accommodate data that are MNAR by incorporating additional assumptions about the model for missingness. There are many possible ways to obtain plausible replacement values for the missing data. Often, missing data are imputed based on the predictive distribution of the missing variable conditional on the observed data from an explicit parametric model, such as a multivariate Gaussian model (Schafer 1997) or a fully conditional specification (van Buuren 2007). An alternative to imputing missing values based on an explicit parametric model is to use hot deck imputation, which replaces missing values (recipients) with observed values (donors; Andridge and Little 2010). Regardless of the method used to impute the missing data, standard complete data methods can be repeated for each imputed dataset, and the parameter estimates and standard errors from the m imputed datasets can be combined using “Rubin’s rules” (Rubin 1987; see Section 1.1 of the Online Resource for more detail).
The performance of multiple imputation methods for handling missing accelerometer data has received some attention in the literature. Catellier and colleagues (2005) assumed a multivariate Gaussian distribution for summary physical activity measures (e.g., average MET-minutes of MVPA) from each day or partial day (3 – 5 hour window), allowing the physical activity measures to be correlated within person, and drew imputed values from the simulated predictive distribution of the missing values conditional on the observed data obtained using Markov chain Monte Carlo. Lee (2013) developed the following procedure to impute the average activity counts/day for days with 10 hours or less of wear time (i.e., “invalid days”): (1) impute the activity counts/minute for the part of the day when the accelerometer was not worn, (2) impute the number of hours that the accelerometer would have been worn if the study protocol had been followed, and (3) combine the counts/minute measure from the observed parts of the day, the imputed counts/minute measure for the missing parts of the day (from step 1), and the imputed wear time (from step 2) to calculate the imputed counts/day for the invalid day. Counts/minute (step 1) and wear time (step 2) were both imputed using predictive mean matching based on an additive regression model. Lee and Gill (2016) developed a multivariate zero-inflated Poisson log-normal (ZIPLN) model to impute accelerometer counts at the minute-level, which accounted for both excess zero counts and autocorrelation that are characteristic of accelerometer data. Data were imputed both directly from the parametric ZIPLN model (i.e., randomly drawing imputed values from the modeled ZIPLN distribution) and by predictive mean matching based on the ZIPLN model. Predictive mean matching generally performed better than imputing directly from the parametric model. 
However, the specification of the imputation model had greater influence on imputation accuracy than whether imputed values were drawn directly from the parametric model or by predictive mean matching, implying that the performance of the imputation depended on correctly specifying the imputation model.
The imputation procedures evaluated by Catellier and colleagues (2005), Lee (2013), and Lee and Gill (2016) all involved fitting a parametric imputation model, which required assumptions about the functional form of the model and/or the distribution of the accelerometer measures. However, measures such as daily minutes of MVPA or daily average activity counts/minute may have skewed distributions, making a multivariate Gaussian assumption implausible. In addition, imputing accelerometer data at the epoch-level (e.g., imputing data for 15-second epochs) instead of imputing the summary physical activity measures directly can be useful to prevent inconsistencies in the imputed data (e.g., if the number of minutes/day of different intensities of physical activity are imputed directly, then the total might not sum to the total number of minutes in a day). However, epoch-level accelerometer data are high dimensional (since each person has 40,320 15-second epochs/week), and so fitting a parametric imputation model for epoch-level data often may be impractical due to high computational burden, difficulty in specifying a justifiable joint distribution for all missing accelerometer counts, or inability to estimate a large number of parameters from the available observed data (Little and Rubin 2002; Schafer 1997).
Therefore, hot deck approaches, which are often non-parametric and do not involve specifying a functional form for an imputation model or distributional assumptions for the accelerometer measures (Andridge and Little 2010), may be a useful alternative to handling missing accelerometer data. The purpose of this paper is to identify a more flexible, yet still valid, strategy for handling missing accelerometer data using hot deck multiple imputation, in the context of a study of physical activity and sedentary behavior among community-dwelling older women. Section 2 describes the data that motivated this work, Section 3 describes the hot deck multiple imputation procedure in detail, Section 4 evaluates the performance of this imputation procedure via extensive simulation studies, Section 5 analyzes data from the Objective Physical Activity and Cardiovascular Health in Older Women (OPACH) Study using this method, and Section 6 provides a discussion.
2. OPACH Study
The OPACH Study is an ancillary study of the Women’s Health Initiative (WHI). Details regarding the design and enrollment for the OPACH Study can be found elsewhere (LaCroix et al. 2017). In summary, the WHI enrolled postmenopausal women ages 50 to 79 years old into one of three clinical trials or an observational study at 40 clinical sites throughout the United States from 1993–1998 (Anderson et al. 2003; Hays et al. 2003). Follow-up during the main study was through 2005, and subsequently two extension studies were conducted during 2005–2010 and 2010–2015 to continue follow-up activities among women who were eligible and consented to be in these extension studies. From March 2012 to April 2014, 7,875 WHI women aged 63 and older were asked to participate in the Long Life Study, a home visit protocol aimed at collecting information relevant to cardiovascular health and successful aging, including accelerometer measures of physical activity and sedentary behaviors. The Long Life Study included women from all 40 original clinical centers, with oversampling of minority groups, examining them in their homes. Protocols were approved by Institutional Review Boards at participating institutions and all women gave written informed consent. Overall, 7,048 WHI women who consented to participate in the Long Life Study agreed to wear an accelerometer for one week and were enrolled into the OPACH Study. From these 7,048 women, 6,719 women returned their accelerometers, and 6,489 women remained in the sample after excluding those with no valid accelerometer data.
2.1. Accelerometry Measurement
Participants were fitted with an ActiGraph GT3X+ triaxial accelerometer (Pensacola, Florida) in person during the home visit. If this was not possible, then the device was mailed with detailed wear instructions. Participants were instructed to place the accelerometer on their right hip, above the iliac crest, using a belt worn around the waist and to wear the device for 7 consecutive days during waking and sleeping hours, except during bathing or swimming. In addition, participants were instructed to record on a sleep log the time out of bed in the morning and time into bed at night for each day that the accelerometer was worn. After wearing the accelerometer for up to 7 days, participants mailed the accelerometer and sleep log to the WHI coordinating center, where the data were downloaded and stored.
ActiGraph software (ActiLife) versions 6.0.0–6.10.1 were used only to output the accelerometer data into a file using 15-second epochs (30 Hz) with all three axes using the normal filter. Vector magnitude (VM) activity counts were derived by taking the square root of the vertical axis squared, plus the anterior-posterior axis squared, plus the medial-lateral axis squared. Non-wear was defined by 15-second intervals with consecutive zero VM for at least 90 minutes, with allowance of nonzero VM up to 2 minutes if no counts were detected during both the 30 minutes upstream and downstream from that interval (Choi et al. 2011; Choi et al. 2012). Any nonzero VM counts (except the allowed short intervals) were considered wear time as indicated by Choi et al. (2011; 2012). Counts in the non-wear period were set to missing. Among the 6,489 women with accelerometry data, the accelerometer was not worn for a median of 12.6% of the 7-day period (i.e., the between-woman median amount of non-wear time was 21.2 hours out of the (7 days)*(24 hours/day) = 168 hours of instructed wear time per protocol; interquartile range: 5.4% to 35.6%), and 95.3% of women had some non-wear time. The accelerometer was not worn for a median of 10.1% of each day (i.e., the between-day median amount of non-wear time was 2.4 hours out of the 24 hours/day of instructed wear time per protocol; interquartile range: 0.0% to 36.3%), and 67.0% of days had some non-wear time (2.4% of days were completely missing). Wear time was lower at night (the accelerometer was worn 60.5% of in-bed person-time vs. 89.6% of out-of-bed person-time). Overall, 94.4% of women were classified as “adherent” using the common criterion of wearing the accelerometer for at least 10 hours/day on at least 4 of 7 days (LaCroix et al. 2017).
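The vector magnitude calculation and a simplified version of the non-wear rule can be sketched as follows. This is an illustrative Python sketch, not the software used in the study: the function names are hypothetical, and the allowance for up to 2 minutes of nonzero counts described by Choi et al. is deliberately omitted, so only uninterrupted runs of zero counts lasting at least 90 minutes are flagged.

```python
import math

EPOCHS_PER_MINUTE = 4  # 15-second epochs, as in the study


def vector_magnitude(vertical, anterior_posterior, medial_lateral):
    """VM count for one epoch: square root of the sum of squared axis counts."""
    return math.sqrt(vertical**2 + anterior_posterior**2 + medial_lateral**2)


def flag_nonwear(vm_counts, min_minutes=90):
    """Return a list of booleans, True where an epoch lies in a run of zero
    counts lasting at least `min_minutes`.

    Simplification (assumption of this sketch): the short nonzero-count
    allowance of the Choi et al. algorithm is NOT implemented.
    """
    min_epochs = min_minutes * EPOCHS_PER_MINUTE
    nonwear = [False] * len(vm_counts)
    run_start = None
    # A nonzero sentinel is appended so that a run reaching the end is closed.
    for idx, count in enumerate(list(vm_counts) + [1]):
        if count == 0:
            if run_start is None:
                run_start = idx
        else:
            if run_start is not None and idx - run_start >= min_epochs:
                for j in range(run_start, idx):
                    nonwear[j] = True
            run_start = None
    return nonwear
```

Epochs flagged as non-wear would then have their counts set to missing before any summary measure is computed.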
We characterized average intensity of physical activity using average VM or vertical axis (VA) counts/minute (calculated by taking the mean VM or VA counts/minute within the same day, and then taking the mean across all days within the same participant), premised on a positive relationship between observed accelerometer counts/minute and measured physical activity energy expenditure. Study-specific activity count cut-points were determined from a laboratory-based calibration study among women ≥ 60 years (Evenson et al. 2015) in which normal mode VM cut-points were derived to minimize the absolute value of the difference between false positive and false negative classification of categories of physical activity intensity defined using standard absolute metabolic equivalent (MET) values, based on an alternative definition for resting metabolic rate intended for use with older adult populations. Using these criteria, the cut-points defined: sedentary behavior 0–18 VM/15-seconds (≤1.5 METs), light low 19–225 VM/15-seconds (1.6–2.2 METs), light high 226–518 VM/15-seconds (2.3–2.9 METs), and MVPA ≥ 519 VM/15-seconds (≥3 METs). Average sedentary, light low, light high, and MVPA minutes/day were calculated by summing the number of minutes for each intensity level for each day, and then taking the mean across all days within the same participant.
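As an illustration of how the cut-points translate epoch-level counts into minutes per intensity category, the following Python sketch classifies 15-second VM counts and accumulates minutes for a single day. The function and variable names are hypothetical, and the handling of non-integer VM values between two integer bounds (assigned to the lower category) is an assumption of the sketch.

```python
from bisect import bisect_right

# Lower bounds (VM counts per 15-second epoch) of light low, light high, MVPA
CUT_POINTS = [19, 226, 519]
LABELS = ["sedentary", "light_low", "light_high", "mvpa"]


def classify_epoch(vm_count):
    """Map one 15-second VM count to its intensity category."""
    return LABELS[bisect_right(CUT_POINTS, vm_count)]


def minutes_by_intensity(vm_counts, epoch_seconds=15):
    """Total minutes spent in each intensity category for one day of epochs."""
    minutes = {label: 0.0 for label in LABELS}
    for count in vm_counts:
        minutes[classify_epoch(count)] += epoch_seconds / 60
    return minutes
```

Averaging these daily totals across a participant's days would give the average minutes/day measures described above.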
2.2. Clinical Data Measurement
At the home visit, a brief clinical assessment occurred with measurement of height, weight, waist circumference, resting blood pressure and pulse, physical function, and phlebotomy. Body mass index (BMI; kg/m²) was calculated from measured height and weight. The details on the blood draw protocol and biomarker measures can be found elsewhere (LaCroix et al. 2017). Physical function was assessed using the Short Form (36) Health Survey (SF-36), with scores ranging from 0 to 100 (Ware and Sherbourne 1992; McHorney et al. 1993). The SF-36 was administered up to 3 years prior to collection of accelerometer data.
2.3. Self-Reported Physical Activity Assessment
Self-reported physical activity data were collected from the WHI physical activity questionnaire, whose reliability has been demonstrated (Meyer et al. 2009). It has been shown that these self-reported physical activity data can be combined with other data available in WHI to construct calibrated estimates of activity-related energy expenditure (Neuhouser et al. 2013). The questionnaire asked about the frequency and duration of time per week spent walking outside the home for more than ten minutes without stopping and physical activity engagement in mild, moderate, and strenuous exercise. MVPA was calculated and categorized into the following three categories: 0, > 0 and ≤ 90, and > 90 minutes/week. Sedentary behavior, which was only assessed among women participating in the observational study, was determined based on the answer to the question, “During a usual day and night, about how many hours do you spend sitting?” Eight response categories, ranging from “less than 4 hours” to “16 or more hours”, were provided. For analysis, sitting was re-categorized into the following five categories: < 4, 4 – 5, 6 – 7, 8 – 9, and ≥ 10 hours/day. Both MVPA and sitting were re-categorized for analysis to make the categories as similar in size as possible.
3. Hot Deck Multiple Imputation Procedure for Accelerometer Non-Wear
Next, we outline a general procedure for implementing hot deck multiple imputation to accommodate missing accelerometer data due to non-wear. Exploratory analyses should be conducted before doing any imputation, to determine which auxiliary variables (i.e., “extra” variables not included in the main analysis) should be used to design the hot deck imputation procedure. The hot deck imputation procedure should incorporate as many auxiliary variables as possible that are associated with accelerometer non-wear and/or with sedentary behavior/physical activity. For example, separate generalized estimating equation models could be fit for percentage of missing data per day and the proportion of each day spent in sedentary behavior or MVPA, and then covariates that are associated with at least one of those outcomes could be incorporated into the hot deck imputation procedure. Within-person covariates (e.g., time of day, day of the week) in addition to between-person covariates should be considered.
Next, for each missing epoch of accelerometer data, donor pools of observed accelerometer data (either from the same participant and/or other participants) are created based on categorized variables, and then m donors are randomly drawn from the corresponding donor pool with a specified selection probability that may or may not differ based on auxiliary variables, where m is the number of imputations. This methodology is designed to impute accelerometer data at any length of time window, so it can be directly applied to handle missing accelerometer data for each missing epoch given sufficient computational efficiency. Continuous variables must either be categorized to create donor pools or used to determine selection probabilities. Similarly, categorical variables may need to be re-categorized when creating donor pools if there are not enough donors in a particular cross-classification of variables used to create the donor pools. One useful rule of thumb is to ensure that each donor pool contains at least 10 donors.
Due to sparse cells, it may not be possible to directly incorporate all variables associated with non-wear and/or sedentary behavior/physical activity in creating the donor pools (e.g., if many variables are used to create the donor pools, there may be some donor pools that contain too few donors). In that case, one could opt to incorporate only a few key auxiliary variables with the strongest associations with non-wear and/or sedentary behavior/physical activity. Alternatively, one could define a continuous distance metric (e.g., maximum deviation, Mahalanobis distance, predictive mean) incorporating all desired auxiliary variables, and then either randomly sample donors with selection probability equal to the inverse of the distance metric or use the distance metric to stratify the sample into donor pools.
Also, since sedentary behavior/physical activity at different time-points tend to be correlated within individuals, it is important to enable the selection of donors from the same person (“self donors”) when it exists, and to ensure that the probability of selecting a donor from the same person is not greatly influenced by the number of donors available from other people (“non-self donors”). One way to ensure this is by fixing the selection probability for self donors regardless of the size of the donor pool from other people. For example, the selection probability for a self donor could be made equal to the sum of the selection probabilities for all non-self donors so that each self donor is expected to be selected with probability equal to the probability of selecting a non-self donor. See Andridge and Little (2010) for a review of different hot deck imputation methods.
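One way to implement the self-donor weighting described above is sketched below in Python. The names are hypothetical. Each self donor receives a weight equal to the sum of the non-self weights, so that with a single self donor the chance of drawing it is one half regardless of how many non-self donors are in the pool. The fallback used when there are no usable non-self donors is an assumption of the sketch, not something specified in the text.

```python
import random


def donor_weights(donors, recipient_id):
    """donors: list of (participant_id, weight_if_nonself) tuples.

    Returns one selection weight per donor; self donors get the sum of the
    non-self weights.
    """
    nonself_total = sum(w for pid, w in donors if pid != recipient_id)
    if nonself_total == 0:
        # Assumption: with no usable non-self donors, draw uniformly
        # among the self donors.
        return [1.0 if pid == recipient_id else 0.0 for pid, _ in donors]
    return [nonself_total if pid == recipient_id else w for pid, w in donors]


def draw_donors(donors, recipient_id, m, rng):
    """Draw m donor indices (with replacement) according to the weights."""
    weights = donor_weights(donors, recipient_id)
    return rng.choices(range(len(donors)), weights=weights, k=m)


# Example: two non-self donors (weights 1 and 2) and one self donor.
pool = [("a", 1.0), ("b", 2.0), ("me", 0.0)]
picks = draw_donors(pool, "me", m=10, rng=random.Random(2024))
```

With more than one self donor in the pool, the combined probability of drawing a self donor exceeds one half under this rule; whether that is desirable depends on the application.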
The number of imputed datasets needed, m, depends on the amount of missing data. Increasing the number of imputations would improve the precision of the final results, and more imputations are needed for a similar level of precision when the amount of missing data is higher. However, the benefit for using a larger number of imputations generally levels off (i.e., increasing the number of imputations from 5 to 10 would provide a greater improvement in precision than increasing the number of imputations from 10 to 15), and so a small number of imputations is often sufficient. The relative efficiency (RE; i.e., relative precision) of a multiple imputation procedure with m imputations relative to a maximally efficient procedure with an infinite number of imputations can be calculated using the formula $RE = (1 + \lambda/m)^{-1}$, where $\lambda$ is the fraction of missing information as defined by Rubin (1987; see Section 1.2 of the Online Resource for the technical definition for the fraction of missing information). One could select the number of imputations based on the desired level of relative efficiency, or choose the number of imputations at which the increase in relative efficiency starts to level off (i.e., for which an increase in m results in only modest increases in RE).
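To make the levelling-off concrete, the short Python sketch below evaluates Rubin's (1987) relative efficiency, RE = 1/(1 + λ/m), for several values of m. The function name is hypothetical, and the value λ = 0.103 is used only because it is the largest fraction of missing information reported later for the simulation study.

```python
def relative_efficiency(lam, m):
    """Relative efficiency of m imputations versus infinitely many,
    RE = 1 / (1 + lam / m), where lam is the fraction of missing information."""
    return 1.0 / (1.0 + lam / m)


# Relative efficiency for an illustrative fraction of missing information.
for m in (2, 5, 10, 15, 20):
    print(m, round(relative_efficiency(0.103, m), 4))
```

Each additional block of imputations adds less than the previous one, which is why a small m is often adequate when the fraction of missing information is modest.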
After the hot deck imputation has been conducted, then statistical analysis proceeds the same way as for any other multiple imputation procedure. Imputation results in m sets of data, each including the original data for all non-missing epochs and one set of imputed data for all missing epochs. Standard statistical analyses (e.g., regression models to estimate associations) can then be repeated separately for each imputed dataset, and the parameter estimates and covariance matrices from the m analyses are combined using “Rubin’s rules” (Rubin 1987; see Section 1.1 of the Online Resource for more detail) to produce final parameter estimates and standard errors.
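For a single scalar parameter, the combining step can be sketched as follows. This is a minimal Python illustration of Rubin's rules (pooled estimate equal to the mean of the m estimates; total variance equal to the average within-imputation variance plus (1 + 1/m) times the between-imputation variance). The function name is hypothetical, and the degrees-of-freedom calculation needed for confidence intervals is omitted.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m point estimates and their squared standard errors.

    estimates: one parameter estimate per imputed dataset (m >= 2)
    variances: the corresponding squared standard errors
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)                 # pooled point estimate
    within = mean(variances)                # within-imputation variance
    between = variance(estimates)           # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between  # total variance
    return q_bar, math.sqrt(total)
```

For a vector of parameters, the same formulas apply with covariance matrices in place of the scalar variances.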
4. Simulation Study
4.1. Processing of Accelerometry
Two simulation scenarios were considered: (1) imputing accelerometer data across the entire 24-hour day, and (2) imputing accelerometer data during the daytime (10am – 8pm) only. “Daytime” was specified as 10am – 8pm because the vast majority of this time-frame for the sample (99% of person-time) was spent out of bed. For both of these simulation scenarios, five different types of datasets were created and used in analyses: (1) simulation, (2) true complete, (3) available case, (4) complete case, and (5) imputation. The process of creating these datasets involved four steps, which are described in detail in Section 2 of the Online Resource and summarized briefly below.
4.1.1. Step 1: Created simulation dataset
For the purposes of this simulation study, a sub-sample of the OPACH cohort with minimal missing accelerometer data was identified (simulation sample), and this dataset was used to estimate the “true” parameters that would be estimated in the absence of missing data. Each person-day of accelerometer data collection was categorized based on cross-classifying the time of day (categorized as 10 time periods) with an indicator of whether the participant was in bed (presumed asleep) or out of bed (presumed awake), resulting in the following 14 time windows: midnight – 6am, 6 – 8am (stratified by in-bed and out-of-bed time), 8 – 10am (stratified by in-bed and out-of-bed time), 10am – noon, noon – 2pm, 2 – 4pm, 4 – 6pm, 6 – 8pm, 8 – 10pm (stratified by in-bed and out-of-bed time), and 10pm – midnight (stratified by in-bed and out-of-bed time). Missing accelerometer data were generated (Step 3) and imputed (Step 4b) conditional on these time windows. This simulation sample consisted of accelerometer data from a total of 14,975 days from 2,550 participants.
4.1.2. Step 2: Re-sampled 1000 bootstrap datasets from simulation sample
For each simulation scenario, 1000 bootstrap datasets of 2,550 participants were resampled with replacement from the corresponding simulation sample using a stratified re-sampling method, where strata depended on age (younger than 80 years or 80+ years) and BMI category (underweight/normal (BMI < 25 kg/m²), overweight (25 kg/m² ≤ BMI < 30 kg/m²), or obese (BMI ≥ 30 kg/m²)). These true complete datasets had no missing accelerometer data, and so analyses were repeated with these datasets as a “gold standard” comparison.
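A stratified bootstrap of this kind can be sketched in Python as below. The names are hypothetical, and the sketch assumes that each stratum is resampled with replacement to its original size, which keeps the overall sample size and the stratum sizes fixed; the exact resampling scheme is described in the Online Resource rather than here.

```python
import random
from collections import defaultdict


def age_bmi_stratum(person):
    """Stratum label from age (<80 vs 80+) and BMI category."""
    age_group = "80+" if person["age"] >= 80 else "<80"
    if person["bmi"] < 25:
        bmi_group = "underweight/normal"
    elif person["bmi"] < 30:
        bmi_group = "overweight"
    else:
        bmi_group = "obese"
    return (age_group, bmi_group)


def stratified_bootstrap(participants, stratum_of, rng):
    """Resample participants with replacement within each stratum.

    Assumption of this sketch: each stratum keeps its original size.
    """
    strata = defaultdict(list)
    for person in participants:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.choices(members, k=len(members)))
    return sample
```

Calling `stratified_bootstrap` 1000 times with independent random draws would produce one set of bootstrap datasets for a given scenario.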
4.1.3. Step 3: Generated missing accelerometer data for 1000 bootstrap datasets
Let i indicate the participant, j indicate the day, and k indicate the time window. Accelerometer data were deleted from each of the 1000 true complete bootstrap datasets based on four randomly generated indicators ((1) a participant-level missing data indicator $r_i$, (2) a day-level missing data indicator $r_{ij}$, (3) a window-level missing data indicator $r_{ijk}$, and (4) a “complete” missingness indicator) to generate missing data patterns similar to those in the OPACH cohort (see Figure S2 in the Online Resource for an illustration of missing data generation for both scenarios). If the participant-level missing data indicator $r_i$ equaled 1, then a day-level missing data indicator $r_{ij}$ was drawn for each day of data for that participant; otherwise, all data for that participant were observed. Similarly, if the day-level missing data indicator $r_{ij}$ equaled 1, then a window-level missing data indicator $r_{ijk}$ was drawn for each window for that day; otherwise, all data for that day were observed. If the window-level missing data indicator $r_{ijk}$ equaled 1, then accelerometer data for that window were set to missing; otherwise data for that window were observed. For each time window for each participant, the complete missingness indicator determined whether that time window was set to missing for all days for that participant (see Section 2.3 of the Online Resource for more detail about generation of complete missingness). All missing data indicators were generated based on values of observed variables (time of day, in-bed status, age, and BMI), and therefore the missing data mechanism was MAR (see Section 2.3 of the Online Resource for the models used to generate all missing data indicators). The total percentage of missingness generated for both scenarios was approximately 25%. These datasets with missing accelerometer data were used for the available case analyses, and also were used to create the complete case (Step 4a) and imputation datasets (Step 4b).
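The nested structure of the participant-, day-, and window-level indicators can be sketched in Python as follows. This is a simplified illustration with hypothetical names. The study generated each indicator from models that depended on time of day, in-bed status, age, and BMI (Section 2.3 of the Online Resource); the constant probabilities here are placeholders, and the “complete” missingness indicator is not implemented.

```python
import random


def generate_missing_windows(n_days, n_windows, p_person, p_day, p_window, rng):
    """Return the set of (day, window) pairs set to missing for one participant.

    p_person, p_day, p_window are placeholder probabilities (assumptions of
    this sketch) that the participant-, day-, and window-level indicators
    equal 1.
    """
    missing = set()
    if rng.random() >= p_person:       # participant-level indicator = 0
        return missing                 # all data for this participant observed
    for day in range(n_days):
        if rng.random() >= p_day:      # day-level indicator = 0
            continue                   # all data for this day observed
        for window in range(n_windows):
            if rng.random() < p_window:    # window-level indicator = 1
                missing.add((day, window))
    return missing
```

Because a lower-level indicator is drawn only when the level above it equals 1, missing windows cluster within particular participants and days rather than being scattered uniformly.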
4.1.4. Step 4a: Created complete case datasets
For each of the 1000 available case datasets, a complete case dataset was created by restricting to participants with at least 4 days with at least 10 hours/day of observed accelerometer data, and only including days with at least 10 hours/day of observed accelerometer data. Compared to the sample size of N=2,550 for the true complete and available case datasets, the sample size for the complete case datasets was reduced by less than 1% for scenario 1 (entire 24-hour day), ranging from 2,529 to 2,549 participants, and by 74.5% to 80.4% for scenario 2 (daytime only), ranging from 501 to 650 participants. Since scenario 2 restricted to accelerometer data between 10am – 8pm (i.e., a 10-hour time period each day), requiring at least 4 days with at least 10 hours of wear time was more restrictive for scenario 2 than for scenario 1: it essentially required that included participants had no missing data during the “daytime” period for at least 4 days. Therefore, the sample size reduction for the complete case datasets was much greater for scenario 2 than for scenario 1.
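The complete case restriction amounts to a two-level filter, sketched below in Python with hypothetical names and a simple dictionary of daily observed hours as the assumed input format.

```python
def complete_case(observed_hours, min_hours=10, min_days=4):
    """Apply the complete case rule.

    observed_hours: dict mapping participant id -> list of observed hours/day
    Returns a dict mapping each retained participant id to the indices of
    that participant's valid days (days with at least `min_hours` observed).
    """
    kept = {}
    for pid, hours in observed_hours.items():
        valid_days = [day for day, h in enumerate(hours) if h >= min_hours]
        if len(valid_days) >= min_days:
            kept[pid] = valid_days
    return kept


# In a daytime-only setting each day has at most 10 hours, so a day with
# any missing time fails the 10-hour requirement.
example = {
    "A": [10, 10, 9.75, 10, 10, 0, 10],   # five fully observed days: retained
    "B": [10, 9.5, 9.5, 10, 10, 8, 0],    # three fully observed days: excluded
}
retained = complete_case(example)
```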
4.1.5. Step 4b: Implemented hot deck multiple imputation procedure
For each of the 1000 available case datasets, a hot deck method was used to impute the missing accelerometer data 10 times. Among the 1000 bootstrap datasets (with simulated missingness), the estimated fraction of missing information (Rubin 1987; see Section 1.2 of the Online Resource) was at most 10.3%, so the relative efficiency of a multiple imputation procedure using 10 imputations relative to an infinite number of imputations was very high (at least 99.0%) for all bootstrap datasets. We therefore decided that imputing the accelerometer data 10 times would be sufficient.
Figure 1 provides a flow chart outlining the hot deck multiple imputation procedure described in this section. Section 3 of the Online Resource contains pseudo-code outlining the steps used to impute the data, and Section 4 of the Online Resource contains example SAS code for implementing this procedure, which can be modified to implement variations on this hot deck imputation procedure. For each missing window, the donor pool was restricted to observed windows for the same time period and in-bed status either from the same participant (“self donors”) or from other participants (“non-self donors”) with the same BMI category, the same age category, and a sufficiently small difference in physical function score (scale from 0 to 100), with “sufficiently small” defined in one of the following ways: (1) difference in physical function score less than 10, if there were at least 20 windows in the same stratum (based on time of day, in-bed status, BMI, and age) that met this criterion, or (2) otherwise, the 20 windows from the same stratum with the smallest difference in physical function score. Imputed data were randomly sampled from the corresponding donor pool based on a selection weight, which depended on absolute difference in physical function score between the donor participant and the recipient participant (for non-self donors), or was defined as the sum of the weights among all non-self donors (for self donors). Each resulting imputation dataset contained 10 sets of data, each including the original data (from the simulation dataset) for all non-missing epochs and one set of imputed data for all missing epochs.
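The rule for selecting non-self donors within a stratum can be sketched as follows. This Python fragment is an illustration with hypothetical names, not the SAS code from the Online Resource. It assumes the candidate windows have already been restricted to the recipient's stratum (time period, in-bed status, BMI category, and age category) and to other participants.

```python
def nonself_donor_pool(candidates, recipient_score, max_diff=10, min_pool=20):
    """Select non-self donor windows by closeness in physical function score.

    candidates: list of (window_id, physical_function_score) tuples from
                other participants in the recipient's stratum
    Returns the windows with score difference < max_diff when at least
    `min_pool` such windows exist; otherwise the `min_pool` closest windows
    (or all candidates if fewer are available).
    """
    by_distance = sorted(candidates, key=lambda c: abs(c[1] - recipient_score))
    close = [c for c in by_distance if abs(c[1] - recipient_score) < max_diff]
    if len(close) >= min_pool:
        return close
    return by_distance[:min_pool]
```

Selection weights would then be attached to the returned windows as a function of the absolute score difference; the exact form of that function is not given in this section, so it is left out of the sketch.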
Fig. 1.
Overview of hot deck multiple imputation procedure (for each window with missing data) for simulation study
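The donor pool construction and weighted draw described above can be sketched as follows. This is a minimal Python illustration, not the authors' SAS implementation. The `Window` class, the function names, and in particular the inverse-distance weight are assumptions, since the actual weight function is not given in this section. Giving each self donor the total non-self weight is one reading of the text.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    participant: str
    stratum: tuple   # (time window, in-bed status, BMI category, age category)
    pf_score: float  # physical function score, 0 to 100


def donor_pool(recipient, observed, max_diff=10.0, min_pool=20):
    """Return (self_donors, non_self_donors) for one window with missing data."""
    same_stratum = [w for w in observed if w.stratum == recipient.stratum]
    self_donors = [w for w in same_stratum if w.participant == recipient.participant]
    others = [w for w in same_stratum if w.participant != recipient.participant]

    def gap(w):
        return abs(w.pf_score - recipient.pf_score)

    close = [w for w in others if gap(w) < max_diff]
    if len(close) < min_pool:
        # Rule (2): fall back to the min_pool closest windows in the stratum
        close = sorted(others, key=gap)[:min_pool]
    return self_donors, close


def draw_donors(recipient, observed, n_imputations=10, rng=None):
    """Draw one donor window per imputation, with replacement."""
    rng = rng or random.Random()
    self_donors, non_self = donor_pool(recipient, observed)
    if not self_donors and not non_self:
        raise ValueError("empty donor pool")
    # ASSUMPTION: inverse-distance weight, used here for illustration only
    w_non_self = [1.0 / (1.0 + abs(w.pf_score - recipient.pf_score)) for w in non_self]
    # One reading of the text: each self donor gets the total non-self weight
    self_weight = sum(w_non_self) if non_self else 1.0
    weights = [self_weight] * len(self_donors) + w_non_self
    return rng.choices(self_donors + non_self, weights=weights, k=n_imputations)
```

Under this reading, a single self donor is as likely to be selected as all non-self donors combined, which favors the participant's own observed behavior when it is available.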
4.2. Statistical Analysis
To evaluate the performance of hot deck multiple imputation when estimating associations, the associations between accelerometer-based summary physical activity/sedentary behavior measures and various outcome variables were estimated. The following six summary physical activity/sedentary behavior measures were used as predictor variables: (1) average sedentary minutes/day, (2) average light low minutes/day, (3) average light high minutes/day, (4) average MVPA minutes/day, (5) average VM counts/minute, and (6) average VA counts/minute. The following seven cardiovascular disease (CVD) risk factors were used as outcome variables, since it is often of interest to estimate associations of physical activity/sedentary behavior with cardiovascular risk: (1) high density lipoprotein (HDL) cholesterol (mg/dL), (2) low density lipoprotein (LDL) cholesterol (mg/dL), (3) triglycerides (mg/dL), (4) total cholesterol (mg/dL), (5) C-reactive protein (CRP; mg/L), (6) diastolic blood pressure (DBP; mmHg), and (7) systolic blood pressure (SBP; mmHg). Since HDL, LDL, triglycerides, total cholesterol, and CRP were positively skewed, these variables were log-transformed for the final analysis. In addition, the following two self-reported physical activity/sedentary behavior measures were also used as outcome variables, since it is sometimes of interest to estimate the association between accelerometer-measured and self-reported physical activity/sedentary behavior: MVPA and sitting time.
Separate models were run with each physical activity/sedentary behavior summary measure as a predictor variable and with each outcome variable (i.e., 54 models: 6 predictor variables × 9 outcome variables), controlling for age (continuous) and average daily accelerometer wear time (hours/day). Note that the average wear time/day variable was defined as the average number of hours of accelerometer data per day that were included in the calculation of the summary measures of physical activity for the participant (e.g., when accelerometer data were imputed, the average wear time/day variable equaled 24 hours (scenario 1) or 10 hours (scenario 2) for all participants). The continuous CVD biomarker outcomes were modeled using linear regression, and the coefficient estimates for the physical activity/sedentary behavior summary measure are reported. The categorical self-reported physical activity and sitting time measures were modeled using multinomial logistic regression, and odds ratios are reported; zero minutes/week was used as the reference category for self-reported MVPA, and < 4 hours/day was used as the reference category for self-reported sitting time.
Results from analyses using the following datasets were compared to the “true” parameter estimates from the simulation dataset: true complete, imputation, complete case, and available case. For the imputation analyses, regression results from each imputation were combined using Rubin’s rules (see Section 1.1 from the Online Resource) to obtain a single set of results for each bootstrap sample. For each set of datasets (i.e., each missing data method), the following summary statistics were calculated across the 1000 bootstrap datasets: the mean effect estimate (i.e., coefficient from a linear model or odds ratio from a multinomial logistic regression model), mean lower and upper bounds for the 95% confidence interval, and 95% confidence interval coverage (i.e., the percentage of 95% confidence intervals that contained the true parameter estimate from the simulation dataset). To summarize results across all analyses (i.e., across all summary measures of accelerometer-measured physical activity/sedentary behavior and all outcome measures), the following summary statistics were averaged across each set of analyses: absolute percent bias (i.e., the absolute value of the following: the difference between the mean effect estimate and the true parameter estimate from the simulation dataset, divided by the true parameter estimate), standardized 95% confidence interval length (i.e., the mean 95% confidence interval length for the bootstrap datasets divided by the 95% confidence interval length for the simulation sample), and 95% confidence interval coverage.
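For readers unfamiliar with the combining step, the following is a minimal Python sketch of Rubin's rules for a single scalar parameter. It assumes the standard formulation (Rubin, 1987); the authors' exact implementation is described in the Online Resource and is not reproduced here. The function name and the numbers in the usage lines are hypothetical. For odds ratios, pooling would normally be done on the log scale before exponentiating.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates and their squared standard errors.

    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    if between == 0:
        df = float("inf")          # imputations agree exactly
    else:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    return q_bar, total ** 0.5, df


# Hypothetical coefficients and squared standard errors from 3 imputations
estimate, se, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

A 95% confidence interval is then the pooled estimate plus or minus the pooled standard error times the 0.975 quantile of a t distribution with `df` degrees of freedom.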
4.3. Results
The distributions of age, BMI, physical function score, and accelerometer-based measures of physical activity/sedentary behavior for the simulation sample (n=2,550) and the OPACH cohort (n=6,489) are presented in Table 1. As an illustration, Figure 2 presents the within-window average VM counts/minute for the true complete data compared to the imputed data for a randomly selected participant-day combination for each time window (for both simulation scenarios) from the first bootstrap dataset. Since the focus of these simulations was to appropriately handle missing accelerometer data, participants with missing values for the outcome variable were excluded from the corresponding analysis models (including estimation of the true parameter estimates based on the simulation sample): 590 participants from the simulation sample were excluded from analyses with HDL, triglycerides, total cholesterol, and CRP as outcomes; 593 participants were excluded from analyses with LDL; 591 participants were excluded from analyses with DBP and SBP; 87 participants were excluded from analyses with self-reported sitting time; and 53 participants were excluded from analyses with self-reported time spent in MVPA. Average absolute percent bias, standardized 95% confidence interval length, and 95% confidence interval coverage were calculated across all analysis models (78 total for each simulation scenario; Table 2). Mean effect estimates and 95% confidence intervals for each analysis model were plotted, with dotted lines to indicate the “true” parameter estimates from the simulation dataset (Figure 3). The 95% confidence interval coverage for each analysis model was also plotted (Figures 4 and 5).
Table 1.
Distribution of participant characteristics and physical activity/sedentary behavior measures in simulation sample (n=2,550) and OPACH cohort (n=6,489)
| Characteristic | Minimum | 1st Quartile | Median | 3rd Quartile | Maximum |
|---|---|---|---|---|---|
| Simulation Sample | | | | | |
| Age | 64 | 73 | 78 | 83 | 96 |
| BMI | 13.8 | 24.2 | 27.2 | 30.9 | 64.9 |
| Physical Function Score a | 0 | 55 | 80 | 90 | 100 |
| % Sedentary Behavior b | 42.1% | 67.0% | 71.9% | 76.4% | 91.3% |
| % Light Low Activity b | 5.9% | 13.2% | 15.7% | 18.2% | 40.2% |
| % Light High Activity b | 0.6% | 6.3% | 7.9% | 9.8% | 22.0% |
| % MVPA b | 0.1% | 2.4% | 4.0% | 5.8% | 26.8% |
| Average VM Counts/Min | 38.6 | 225.1 | 291.5 | 375.5 | 1196.0 |
| Average VA Counts/Min | 14.5 | 77.3 | 109.9 | 147.0 | 557.8 |
| OPACH Cohort | | | | | |
| Age | 64 | 73 | 80 | 84 | 97 |
| BMI | 13.6 | 24.2 | 27.4 | 31.3 | 64.9 |
| Physical Function Score a | 0 | 50 | 75 | 90 | 100 |
| % Sedentary Behavior b,c | 28.0% | 64.8% | 71.1% | 76.7% | 97.8% |
| % Light Low Activity b,c | 0.0% | 13.5% | 16.4% | 19.8% | 40.6% |
| % Light High Activity b,c | 0.0% | 6.2% | 8.2% | 10.4% | 24.9% |
| % MVPA b,c | 0.0% | 2.2% | 3.7% | 5.9% | 27.1% |
| Average VM Counts/Min c | 20.3 | 217.8 | 295.3 | 394.9 | 1362.8 |
| Average VA Counts/Min c | 4.3 | 73.7 | 107.4 | 150.2 | 682.8 |
Abbreviations: BMI, body mass index; MVPA, moderate to vigorous physical activity; VM, vector magnitude; VA, vertical axis.
a 341 (13.4%) participants from the simulation sample and 913 (14.1%) participants from the OPACH cohort were excluded due to missing physical function score.
b Percent of total time that the accelerometer was worn that was spent in the corresponding type of physical activity or sedentary behavior (before deletion).
c 4 participants from the OPACH cohort were excluded due to completely missing accelerometer data.
Fig. 2.
Within-window average VM counts/minute for the true complete data compared to the imputed data for a randomly selected participant-day combination for each time window from the first bootstrap dataset
Table 2.
Absolute percent bias a, standardized 95% confidence interval length b, and 95% confidence interval coverage c for each missing data method based on 1000 bootstrap samples, averaged across all analysis models (n=2,550)
| Missing Data Method | Scenario 1 (Entire 24-Hour Day): Absolute % Bias | Scenario 1: Standardized 95% CI Length | Scenario 1: 95% CI Coverage (%) | Scenario 2 (Daytime Only): Absolute % Bias | Scenario 2: Standardized 95% CI Length | Scenario 2: 95% CI Coverage (%) |
|---|---|---|---|---|---|---|
| True Complete Data | 0.8 | 1.00 | 95.2 | 0.5 | 1.00 | 95.4 |
| Imputation | 5.1 | 1.05 | 94.8 | 3.9 | 1.06 | 95.4 |
| Complete Case | 15.5 | 1.03 | 86.7 | 6.8 | 2.10 | 94.9 |
| Available Case | 15.9 | 1.03 | 86.3 | 16.1 | 1.16 | 86.0 |
Abbreviations: CI, confidence interval.
a The absolute value of the following: the difference between the mean effect estimate and the true parameter estimate from the simulation dataset, divided by the true parameter estimate (smaller indicates less bias).
b The mean 95% confidence interval length for the bootstrap datasets divided by the 95% confidence interval length for the simulation sample (smaller indicates better precision).
c The percentage of 95% confidence intervals that contained the true parameter estimate from the simulation dataset (closer to 95% indicates better coverage).
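The three performance measures defined in the footnotes above can be computed for a single analysis model as in the following Python sketch. The function and argument names are illustrative, and the numbers in the usage lines are hypothetical rather than taken from the study.

```python
def summarize(estimates, lowers, uppers, truth, true_ci_length):
    """Performance of one missing data method for one analysis model.

    estimates, lowers, uppers: per-bootstrap effect estimates and 95% CI bounds.
    truth, true_ci_length: estimate and 95% CI length from the simulation sample.
    Returns (absolute % bias, standardized CI length, CI coverage in %).
    """
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    abs_pct_bias = abs((mean_estimate - truth) / truth) * 100
    mean_length = sum(u - l for l, u in zip(lowers, uppers)) / n
    covered = sum(1 for l, u in zip(lowers, uppers) if l <= truth <= u)
    return abs_pct_bias, mean_length / true_ci_length, covered / n * 100


# Hypothetical results from 4 bootstrap datasets
bias, std_length, coverage = summarize(
    estimates=[1.0, 1.2, 0.8, 1.4],
    lowers=[0.5, 0.7, 0.3, 1.1],
    uppers=[1.5, 1.7, 1.3, 1.7],
    truth=1.0,
    true_ci_length=1.0,
)
```

The values in Table 2 correspond to averaging these three quantities over all analysis models within a scenario.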
Fig. 3.
Effect estimates (coefficients or odds ratios) and 95% confidence intervals averaged across 1000 bootstrap samples, for increase of 1 hour/day of physical activity/sedentary behavior or 60 counts/minute (n=2,550)
Note: Estimates closer to the true values indicate better (i.e., smaller) bias. Shorter confidence intervals indicate better precision. Estimates and confidence intervals for imputation method were obtained using Rubin’s rules (see Section 1 of Online Resource) to combine results across 10 imputations. The reference category for SR MVPA was 0 mins/week. The reference category for SR sitting was < 4 hrs/day.
Fig. 4.
Probability coverage a for 95% confidence intervals based on 1000 bootstrap samples (Scenario 1: entire 24-hour day) (n=2,550)
Abbreviations: H, HDL cholesterol; L, LDL cholesterol; Ch, total cholesterol; T, triglycerides; C, CRP; D, DBP; S, SBP; M1, SR MVPA ≤ 90 mins/week; M2, SR MVPA > 90 mins/week; Se1, SR sitting 4 – 5 hrs/day; Se2, SR sitting 6 – 7 hrs/day; Se3, SR sitting 8 – 9 hrs/day; Se4, SR sitting ≥ 10 hrs/day.
Note: The reference category for SR MVPA was 0 mins/week. The reference category for SR sitting was < 4 hrs/day.
a The percentage of 95% confidence intervals that contained the true parameter estimate from the simulation dataset (closer to 95% is better).
Fig. 5.
Probability coverage a for 95% confidence intervals based on 1000 bootstrap samples (Scenario 2: daytime only) (n=2,550)
Abbreviations: H, HDL cholesterol; L, LDL cholesterol; Ch, total cholesterol; T, triglycerides; C, CRP; D, DBP; S, SBP; M1, SR MVPA ≤ 90 mins/week; M2, SR MVPA > 90 mins/week; Se1, SR sitting 4 – 5 hrs/day; Se2, SR sitting 6 – 7 hrs/day; Se3, SR sitting 8 – 9 hrs/day; Se4, SR sitting ≥ 10 hrs/day.
Note: The reference category for SR MVPA was 0 mins/week. The reference category for SR sitting was < 4 hrs/day.
a The percentage of 95% confidence intervals that contained the true parameter estimate from the simulation dataset (closer to 95% is better).
For scenario 1 (entire 24-hour day), multiple imputation performed approximately as well as or better than available case or complete case analysis in terms of bias (Table 2, Figure 3) and 95% confidence interval coverage for all models (Table 2, Figure 4). The absolute percent bias for multiple imputation did not exceed 17% for any analysis model, and was smaller than available case and complete case analysis for 97% of analysis models. Multiple imputation produced 95% confidence intervals with nearly nominal coverage (> 90%) for all models, compared to available case analysis (average 95% confidence interval coverage: 86.3%) and complete case analysis (average 95% confidence interval coverage: 86.7%). Average standardized 95% confidence interval length was similar among multiple imputation (1.05), available case analysis (1.03), and complete case analysis (1.03).
For scenario 2 (daytime only), multiple imputation performed approximately as well as or better than available case analysis and approximately as well as complete case analysis in terms of bias (Table 2; Figure 3) and 95% confidence interval coverage (Table 2; Figure 5) for all models. The absolute percent bias for multiple imputation did not exceed 11% for any analysis model, and was smaller than available case analysis for 88% of analysis models. Multiple imputation and complete case analysis produced 95% confidence intervals with nearly nominal coverage (> 90%) for all models, compared to available case analysis (average 95% confidence interval coverage: 86.0%). Multiple imputation produced much shorter 95% confidence intervals than complete case analysis for all models.
5. Applied Example with OPACH Cohort
A hot deck multiple imputation procedure was applied to estimate the association between accelerometer-measured physical activity/sedentary behavior and CVD biomarkers in the OPACH cohort, based on an analysis conducted by LaMonte and colleagues (2017). Each person-day of accelerometer data was categorized into 10 time windows: midnight – 6am, 6 – 8am, 8 – 10am, 10am – noon, noon – 2pm, 2 – 4pm, 4 – 6pm, 6 – 8pm, 8pm – 10pm, 10pm – midnight. For each window that contained missing accelerometer data, a donor pool was constructed of windows with completely observed accelerometer data, from both self donors and non-self donors, matching based on time period, BMI category (underweight (BMI < 18.5 kg/m2), normal (18.5 kg/m2 ≤ BMI < 25 kg/m2), overweight (25 kg/m2 ≤ BMI < 30 kg/m2), or obese (BMI ≥ 30 kg/m2)), and age category (younger than 80 years or 80+ years). Selection weights based on physical function score were created in a similar manner as was done in the previously described simulation study (see Section 4.1.5). Ten windows were randomly drawn with replacement from the donor pool for each window with missing data, with sampling probability proportional to the selection weight. Finally, for each imputation for each window with missing data, the missing epochs for the recipient were replaced with the corresponding (i.e., from the same time of day) observed epoch from the selected donor, resulting in 10 completed datasets.
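The matching variables used to build donor pools in the applied example can be derived as in the Python sketch below. The window boundaries and BMI/age cut-points come from the text. The function names, the string labels, and the choice to assign an epoch that falls exactly on a boundary to the later window are assumptions made for illustration.

```python
from bisect import bisect_right

# Hour of day at which each of the 10 time windows begins
# (midnight-6am, 6-8am, ..., 8-10pm, 10pm-midnight)
WINDOW_STARTS = [0, 6, 8, 10, 12, 14, 16, 18, 20, 22]


def time_window(hour: float) -> int:
    """Index (0 to 9) of the time window containing a clock hour in [0, 24)."""
    if not 0 <= hour < 24:
        raise ValueError("hour must be in [0, 24)")
    return bisect_right(WINDOW_STARTS, hour) - 1


def bmi_category(bmi: float) -> str:
    """BMI category in kg/m2, using the cut-points given in the text."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def age_category(age: float) -> str:
    return "<80" if age < 80 else "80+"


def matching_key(hour: float, bmi: float, age: float) -> tuple:
    """Stratum used to match a window with missing data to candidate donors."""
    return (time_window(hour), bmi_category(bmi), age_category(age))


key = matching_key(hour=13.25, bmi=27.2, age=78)  # (4, "overweight", "<80")
```

Windows with missing data and observed windows that share the same key would fall into the same donor pool stratum, with physical function score then entering through the selection weights.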
Once the missing accelerometer data were imputed, average sedentary, light low, light high, and MVPA minutes/day were calculated as described in Section 2.1. Linear regression was used to estimate the partial Pearson’s correlation coefficient for the association of each of these physical activity/sedentary behavior measures with the following CVD biomarkers, adjusting for accelerometer wear time, age, BMI, and race/ethnicity: log-transformed triglycerides (mg/dL), log-transformed CRP (mg/L), DBP (mmHg), and SBP (mmHg). Each analysis was conducted separately for each imputed dataset, and the results were combined across imputations using Rubin’s rules (see Section 1.1 in the Online Resource). For comparison, each analysis was run both using the imputed data (n=5,100) and using accelerometer data from wear time only for an adherent sample (n=4,870) that wore the accelerometer for at least 4 days with at least 10 hours/day of out-of-bed wear time (the latter adherent sample approach was used in LaMonte et al. (2017)).
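The text does not spell out how the partial correlation was obtained from the regression output. One standard identity, shown here as a Python sketch rather than as the authors' method, recovers it from the t statistic of the predictor and the residual degrees of freedom of the adjusted linear model. The function name and example values are hypothetical.

```python
import math


def partial_r_from_t(t_stat: float, df_resid: int) -> float:
    """Partial Pearson correlation of the outcome with one predictor, given the
    other covariates, from that predictor's t statistic in the linear model."""
    return t_stat / math.sqrt(t_stat ** 2 + df_resid)


# Hypothetical example: t = 2.0 with 96 residual degrees of freedom
r = partial_r_from_t(2.0, 96)  # 2 / sqrt(4 + 96) = 0.2
```

The sign of the partial correlation follows the sign of the regression coefficient, so negative values in Table 3 correspond to inverse adjusted associations.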
Table 3 presents the partial Pearson’s correlation coefficients for the association between each physical activity/sedentary behavior measure and each CVD biomarker, using both the imputed data and the observed data from the adherent sample. For analyses involving triglycerides, DBP, and SBP, the estimates of the partial Pearson’s correlation coefficients were similar between the imputed data and the observed data from the adherent sample. However, estimates of the partial Pearson’s correlation coefficients for the associations of log-transformed CRP with sedentary minutes/day and light low minutes/day were stronger based on the observed data from the adherent sample (0.101 and −0.058, respectively) compared to the imputed data (0.094 and −0.049, respectively). In addition, a statistical test of whether the partial Pearson’s correlation coefficient is different from zero (with significance level 0.05) provided conflicting results for the association of DBP with light high minutes/day based on the observed data from the adherent sample (p-value = 0.04) compared to the imputed data (p-value = 0.06). Therefore, although imputing the accelerometer data for the non-wear time and simply restricting the dataset to the adherent sample only provided similar results for most analyses, for some statistical analyses (e.g., those focused on CRP and DBP) restricting to the adherent sample only may have over-estimated the association between physical activity/sedentary behavior and the CVD risk factor.
Table 3.
Partial Pearson’s correlation coefficients for the association between accelerometer-measured physical activity/sedentary behavior and cardiovascular disease biomarkers, estimated using imputed data (n=5,100) and an adherent sample (n=4,870) for the OPACH cohort
| Biomarker | Sedentary (mins/day): Imputed | Sedentary (mins/day): Adherent | Light Low (mins/day): Imputed | Light Low (mins/day): Adherent | Light High (mins/day): Imputed | Light High (mins/day): Adherent | MVPA (mins/day): Imputed | MVPA (mins/day): Adherent |
|---|---|---|---|---|---|---|---|---|
| Log(Triglycerides) | 0.164*** | 0.162*** | −0.172*** | −0.173*** | −0.104*** | −0.103*** | −0.070*** | −0.067*** |
| Log(CRP) | 0.094*** | 0.101*** | −0.049*** | −0.058*** | −0.092*** | −0.097*** | −0.083*** | −0.085*** |
| DBP | 0.037** | 0.040** | −0.039** | −0.042** | −0.027 | −0.029* | −0.012 | −0.013 |
| SBP | 0.030* | 0.032* | −0.013 | −0.016 | −0.020 | −0.021 | −0.041*** | −0.041** |
Abbreviations: MVPA, moderate to vigorous physical activity; CRP, C-reactive protein; DBP, diastolic blood pressure; SBP, systolic blood pressure.
* p-value ≤ 0.05
** p-value ≤ 0.01
*** p-value ≤ 0.001
6. Discussion
This study developed a hot deck multiple imputation procedure for handling large amounts of missing accelerometer data in the setting of an epidemiological cohort study due to participants not wearing the accelerometer. The performance of this hot deck multiple imputation procedure for handling MAR accelerometer data was evaluated compared to complete case analysis and available case analysis using data from a cohort of older women who wore an accelerometer for up to 7 consecutive days. Performance of the hot deck procedure was evaluated for two scenarios: (1) imputing data across the entire 24-hour day, and (2) imputing data during the daytime (10am – 8pm) only. Hot deck multiple imputation performed better than available case analysis on average for both scenarios, and better than complete case analysis on average for the entire 24-hour day. For the daytime only, hot deck multiple imputation performed similarly to complete case analysis on average in terms of bias and 95% confidence interval coverage, but hot deck multiple imputation produced more precise estimates than complete case analysis.
For both simulation scenarios, the hot deck multiple imputation procedure always produced 95% confidence intervals with approximately nominal coverage and almost always (97% of analyses for the entire 24-hour day, 88% of analyses for the daytime only) produced less bias compared to available case analysis. When using data from the entire 24-hour day, complete case analysis performed similarly to available case analysis, with lower 95% confidence interval coverage and greater absolute percent bias than multiple imputation on average. When using data from the daytime only, complete case analysis produced approximately nominal 95% confidence interval coverage for all analysis models and less absolute percent bias on average than available case analysis; however, multiple imputation always produced much more precise effect estimates than complete case analysis.
Complete case analysis performed better when restricting analyses to daytime data only compared to using data from the entire 24-hour day. Note that, for both scenarios, summary measures for “complete cases” were calculated only using days with at least 10 hours/day of non-missing data. Therefore, all days included in the complete case analyses had no missing data in the daytime only scenario (where a complete day of data spanned 10 hours), whereas the complete case analyses in the entire day scenario (where a complete day of data spanned 24 hours) included partially missing days. So, the average minutes/day at a specific activity level may have been more likely to be underestimated in the scenario including data from the entire 24-hour day than in the scenario including daytime data only, since the total time observed per day among the complete cases may have been a lower percentage of the total time per day (24 hours vs. 10 hours). Similarly, for the daytime only scenario, the average minutes/day at a specific activity level may have been more underestimated on average when using available case analysis than complete case analysis since a smaller proportion of the day was observed on average.
Our findings that multiple imputation produced little bias are consistent with previous research regarding the performance of multiple imputation for accelerometer non-wear (Catellier et al. 2005; Lee 2013). In addition, Catellier and colleagues (2005) found that when considering accelerometer non-wear during the daytime (6am – midnight), where missing data were generated either for 3 – 5 hour windows or for entire days, an analysis using the observed accelerometer data only was generally less precise than multiple imputation, which is consistent with our results for the daytime only scenario comparing multiple imputation to complete case analysis. Previous simulation work has evaluated the performance of multiple imputation methods based on fitting parametric imputation models (Catellier et al. 2005; Lee 2013; Lee and Gill 2016), which require restrictive assumptions about the functional form of the imputation model and/or the distribution of the accelerometer data that were not necessary for a hot deck imputation approach. In addition, simulations conducted by Lee and Gill (2016) have found that predictive mean matching (a type of hot deck imputation based on selecting donors with similar predicted values from a parametric model) performed better for imputing missing accelerometer data compared to using a corresponding parametric imputation model, suggesting that hot deck approaches may have the potential to perform better than parametric imputation models in some cases.
Although the focus of this paper was applying hot deck multiple imputation methods to accelerometer data, the methods described here could also be adapted to accommodate missing data from other devices that output high-dimensional data, such as heart rate or location from global positioning systems (GPS). The use of hot deck multiple imputation would be particularly useful for data that are high-dimensional with complex temporal- and/or spatial-correlation structures that are difficult to model, and/or that are highly skewed and cannot be easily modeled by known distributions. Many of the issues discussed in this paper related to the imputation of accelerometer data would also be important to consider when applying hot deck imputation methods to high-dimensional data from other devices (e.g., the importance of allowing imputed data to be drawn from self donors). Future research should adapt hot deck imputation methods to high-dimensional data from other devices, and evaluate the performance of these methods in those diverse contexts.
6.1. Limitations
There are a few limitations to this study that should be noted. First, multiple imputation, as it is usually implemented in standard software and as it was implemented in this simulation study, is only valid (i.e., unbiased with valid standard errors) when the data are MCAR or MAR. Although multiple imputation can be modified to accommodate MNAR data, this would require extra assumptions about the model for missingness and/or additional variables that may not be measured (Rubin 1987), and not all standard software packages are currently flexible enough to accommodate multiple imputation for MNAR data. In addition, the simulation study described in this paper did not assess the performance of this hot deck multiple imputation procedure when data were MNAR, nor did it compare whether the multiple imputation procedure still performed better than available case or complete case analysis under this scenario. A simulation study by Catellier and colleagues (2005) found that although multiple imputation using a multivariate Gaussian imputation model produced biased estimates when data were MNAR, the estimates based on multiple imputation were less biased than the estimates based on analyzing observed data only. Therefore, the performance of hot deck multiple imputation when the MAR assumption does not hold may be worthy of future investigation.
Second, although hot deck imputation procedures can avoid fitting explicit parametric imputation models and do not require distributional assumptions, thereby reducing the risk of model mis-specification, these procedures still require implicit assumptions based on how donor pools are created and/or how donors are selected (Andridge and Little 2010). For example, the hot deck procedure used in this study assumed that all variables that were related to the probability of non-wear and the missing data values were included in the construction of the donor pool or the donor sampling procedure (i.e., that the data were MAR conditional on time of day, in-bed status, BMI, age, and physical function). Methods exist to test whether missingness depends on specified observed variable(s) (i.e., whether the addition of particular auxiliary variable(s) would make the MAR assumption more plausible; e.g., Little 1988), but it is generally not possible to verify whether data are MAR versus MNAR based on the observed data. In addition, the performance of this hot deck procedure may depend on the cut-points used to categorize continuous covariates for creating the donor pools (e.g., the cut-points used for age and BMI). Estimates from multiple imputation may be biased if the implicit hot deck imputation model is mis-specified (e.g., if the accelerometer data are not MAR conditional on the covariates used to create the donor pool; Andridge and Little 2010).
Third, the true process generating missingness in the OPACH cohort is unknown, so the simulation study had to rely on assumptions when generating missing data to approximate the missingness in the OPACH cohort. To improve this approximation as much as possible, missingness for the simulation study was modeled based on observed patterns of missingness in the full OPACH cohort (n=6,489).
6.2. Conclusion
Hot deck multiple imputation methods could be particularly useful for handling missing accelerometer-measured physical activity/sedentary behavior data, in that they avoid the need for distributional assumptions (which may be implausible for skewed physical activity or sedentary behavior data), and could facilitate imputing accelerometer data at the original epoch-level scale without the need to fit complex parametric imputation models. This simulation study found that a hot deck multiple imputation procedure for handling MAR accelerometer data performed well among a cohort of older women, and may be preferable compared to available case or complete case analysis. Future research should investigate the relative performance of similar hot deck multiple imputation procedures compared to the use of parametric imputation models (e.g., multivariate Gaussian model) to impute accelerometer data in other populations with greater diversity in age, physical functioning levels, and physical activity patterns. Given the increasing frequency of accelerometer use in large epidemiologic studies relating physical activity with disease risks, additional options and improved methods for handling missing data will be critical for maximizing the utility of accelerometer measurements and obtaining more precise and valid tests of study hypotheses pertaining to physical activity and disease.
Acknowledgments
The authors would like to acknowledge the following investigators in the WHI Program: Program Office: (National Heart, Lung, and Blood Institute, Bethesda, Maryland) Jacques Rossouw, Shari Ludlam, Joan McGowan, Leslie Ford, and Nancy Geller. Clinical Coordinating Center: (Fred Hutchinson Cancer Research Center, Seattle, WA) Garnet Anderson, Ross Prentice, Andrea LaCroix, and Charles Kooperberg. Investigators and Academic Centers: (Brigham and Women’s Hospital, Harvard Medical School, Boston, MA) JoAnn E. Manson; (MedStar Health Research Institute/Howard University, Washington, DC) Barbara V. Howard; (Stanford Prevention Research Center, Stanford, CA) Marcia L. Stefanick; (The Ohio State University, Columbus, OH) Rebecca Jackson; (University of Arizona, Tucson/Phoenix, AZ) Cynthia A. Thomson; (University at Buffalo, Buffalo, NY) Jean Wactawski-Wende; (University of Florida, Gainesville/Jacksonville, FL) Marian Limacher; (University of Iowa, Iowa City/Davenport, IA) Jennifer Robinson; (University of Pittsburgh, Pittsburgh, PA) Lewis Kuller; (Wake Forest University School of Medicine, Winston-Salem, NC) Sally Shumaker; (University of Nevada, Reno, NV) Robert Brunner; (University of Minnesota, Minneapolis, MN) Karen L. Margolis. Women’s Health Initiative Memory Study: (Wake Forest University School of Medicine, Winston-Salem, NC) Mark Espeland.
The authors thank Fang Wen for her assistance with the data. This research was funded by grants from National Heart, Lung, and Blood Institute, National Institutes of Health (NIH), grant R01HL105065 (PI: LaCroix) and contracts #HHSN268201100046C, #HHSN268201100001C, #HHSN268201100002C, #HHSN268201100003C, #HHSN268201100004C, and #HHSN271201100004C. The WHI program is funded by the National Heart, Lung, and Blood Institute, NIH through contracts #HHSN268201600018C, #HHSN268201600001C, #HHSN268201600002C, #HHSN268201600003C, and #HHSN268201600004C. The content is solely the responsibility of the authors and does not necessarily represent the views of the NIH. The results of the present study do not constitute endorsement by the authors of the products described in this paper. The results of this study are presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation.
Footnotes
Ethical Approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.