Biostatistics (Oxford, England). 2014 Oct 30;16(2):352–367. doi: 10.1093/biostatistics/kxu045

Quantifying the lifetime circadian rhythm of physical activity: a covariate-dependent functional approach

Luo Xiao 1,*, Lei Huang 1, Jennifer A Schrack 2, Luigi Ferrucci 3, Vadim Zipunnikov 4, Ciprian M Crainiceanu 4
PMCID: PMC4804116  PMID: 25361695

Abstract

Objective measurement of physical activity using wearable devices such as accelerometers may provide tantalizing new insights into the association between activity and health outcomes. Accelerometers can record quasi-continuous activity information for many days and for hundreds of individuals. For example, in the Baltimore Longitudinal Study on Aging physical activity was recorded every minute for 773 adults for an average of 7 days per adult. An important scientific problem is to separate and quantify the systematic and random circadian patterns of physical activity as functions of time of day, age, and gender. To capture the systematic circadian pattern, we introduce a practical bivariate smoother and two crucial innovations: (i) estimating the smoothing parameter using leave-one-subject-out cross validation to account for within-subject correlation and (ii) introducing fast computational techniques that overcome problems both with the size of the data and with the cross-validation approach to smoothing. The age-dependent random patterns are analyzed by a new functional principal component analysis that incorporates both covariate dependence and multilevel structure. For the analysis, we propose a practical and very fast trivariate spline smoother to estimate covariate-dependent covariances and their spectra. Results reveal several interesting, previously unknown, circadian patterns associated with human aging and gender.

Keywords: Accelerometer, Bivariate smoothing, Covariance, Sandwich smoother, Trivariate smoothing

1. Introduction

Physical activity is an important biomarker of human aging. Individuals who are physically active tend to live longer and healthier lives (U.S. Dept of Health and Human Services, 2010). Higher physical activity is associated with fewer chronic diseases, better physical functioning, and longer active life expectancy (Pate and others, 1995; Ferrucci and Alley, 2007). However, due to a lack of objective and accurate measures of physical activity, little is known about how patterns and intensity of daily free-living activity change with age. A better understanding of these processes may provide insights into the intensity and duration of physical activity necessary to maintain and extend a healthy life in an aging population.

Traditional subjective physical activity measurement tools such as the activity of daily living questionnaires provide biased and coarse estimates of daily activity and capture mostly moderate-to-high intensity activities (Sallis and Saelens, 2000). In contrast, accelerometers present objective and detailed measurements of physical activity, and have been used in many observational studies and clinical trials in recent years (Bai and others, 2014; Bussmann and others, 2001; Culhane and others, 2005; He and others, 2014; Troiano and others, 2008). Accelerometers capture a much wider range of activity intensities, including the light intensities characteristic of domestic and self-care tasks, which are essential for studying physical activity of the elderly. The use of accelerometers makes it possible to study the circadian rhythm of physical activity at an unprecedented temporal level, which enables the use of increasingly refined hypotheses and designs of experiments (Bai and others, 2014).

Our objective is to analyze the systematic and random circadian rhythms of physical activity as functions of time of day and age for both women and men. Data were collected in the Baltimore Longitudinal Study on Aging (BLSA), the longest-running scientific study on human aging in the United States. A description of the sample and enrollment procedures and criteria can be found in Stone and Norris (1966). Here we provide a summary of the activity data collection. Participants in the study wore the Actiheart portable physical activity monitor for several consecutive days in a free-living environment. Activity counts were measured in 1-min epochs. Participants were asked to wear the monitor at all times, except when bathing or swimming. Missing values were imputed as the subject-specific average over all recording days at the same time of day, and daily profiles with >5 of missing data were deleted. For this analysis, we focus on 2576 daily activity profiles from 394 female participants and 2852 daily activity profiles from 379 male participants. Each daily profile has 1440 minute-by-minute measurements of activity counts. The participants are aged between 31 and 96 and have at least two days of data. The median age for female participants is 66 (s.d. 12 years) and for male participants is 69 (s.d. 13 years). For other demographic information, see Schrack and others (2014).

The top and top-middle panels of Figure 1 display daily activity profiles from 2 days for one female and one male. Activity counts were log-transformed because count data are highly skewed, and were then averaged in each 5-min interval. The transformation used was counts → log(counts + 1) at the minute level. Hereafter we refer to the log-transformed counts as log counts. The four sample profiles in Figure 1 show an overall circadian pattern of activity: infrequent and light-intensity activity at night (roughly between 11 PM and 6 AM) and frequent and much higher-intensity activity during the day. The same pattern can be observed in other daily profiles for both women and men. The day-to-day variability is substantial because data were collected in a free-living environment and physical activity is not synchronized across days. As a result, the activity spikes are much less pronounced and even disappear in the mean profiles (panels in the third row of Figure 1). The mean profiles are useful for identifying patterns such as how active the subjects are on average and when the subjects are more active. However, the time-dependent mean profiles are less appropriate for quantifying durations of light, moderate and vigorous activity intensities that are not directly related to the specific time of day. We therefore also consider sorted daily profiles obtained by sorting the counts from the smallest to the largest; this results in a function of the same length as the original activity count function. That is, if t(1), …, t(288) are the time points sorted according to the size of the averaged 5-min log count, then the sorted function is i → f(t(i)), where f is the original count outcome. The bottom panels in Figure 1 display the same profiles as the panels above but in the sorted order. The time coordinate for the sorted profiles is referred to as “sorted time of day”.
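The preprocessing described above can be sketched in a few lines. This is a minimal illustration, assuming one day of data arrives as 1440 minute-level counts; the function names `daily_profile` and `sorted_profile` are hypothetical, and the counts are synthetic rather than BLSA data.

```python
import numpy as np

def daily_profile(counts_per_minute):
    """Turn 1440 minute-level counts into a 288-point log-count profile."""
    counts = np.asarray(counts_per_minute, dtype=float)
    if counts.shape != (1440,):
        raise ValueError("expected one value per minute of the day")
    log_counts = np.log(counts + 1.0)                # transform at the minute level
    return log_counts.reshape(288, 5).mean(axis=1)   # average within 5-min bins

def sorted_profile(profile):
    """Sort a daily profile from its smallest to its largest value."""
    return np.sort(profile)

rng = np.random.default_rng(0)
counts = rng.poisson(lam=20.0, size=1440)  # synthetic counts, not BLSA data
profile = daily_profile(counts)
profile_sorted = sorted_profile(profile)
```

Sorting discards the time-of-day information but keeps the distribution of intensities, which is why the sorted profile has the same length as the original one.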

Fig. 1.

Sample profiles from 2 days, mean profiles and sorted daily profiles for one female and one male. The mean profile is the mean of all activity profiles for the subject. In the bottom panels, the dashed lines are sorted daily profiles and the solid lines are mean sorted profiles.

To further explore the data, we consider four age strata for each gender: younger than 60 years old, 60–67 years old, 68–74 years old, and 75 years or older. Figure 2 displays the smoothed group-mean profiles of log counts (top panels) and sorted log counts (bottom panels). The smoothed group-mean profiles of log counts differ across the age groups, and mean activity counts decrease with age at all times during the day for both women and men. Moreover, the association with age seems nonlinear, with a stronger decrease in average activity after age 70 in the mid afternoon; this suggests that the interaction between age and the daily activity profile may be quite complex. Some gender differences in the change of activity profiles are also noticeable. For example, the decrease in the mean activity from the 60–67 age group to the 68–74 age group seems to be sharper in men than in women.

Fig. 2.

Smoothed mean profiles for four age groups of women and men: younger than 60 years old (solid lines), 60–67 years old (dashed lines), 68–74 years old (dotted lines), 75 years or older (dot-dashed lines). The top panels are for unsorted log counts and the bottom panels are for sorted log counts.

This preliminary analysis illustrates the complex interaction between age, gender, and the circadian rhythm of physical activity. Modeling activity profiles as functions of age is scientifically important and could help explain how time-of-day-related patterns and intensities of daily activity change as people become older. Moreover, it could show how durations of physical activity of different intensities change with age and thus provide a deeper understanding of the changes of daily activity.

In addition to the systematic patterns, we are also interested in quantifying the subject-level random circadian rhythms of activity. For example, the sorted log counts for some women are always above their corresponding group means, indicating that these women perform activity of longer duration and higher intensity than expected for their age-specific group. Thus, modeling the variation of the circadian rhythms of activity could explain how subjects differ in activity and identify those determinants that are strongly associated with subject-specific patterns of activity.

To address the scientific interests, we design a novel functional data model that incorporates covariates into the traditional functional data model with a single covariate and also accommodates the multilevel structure in the BLSA activity data. We also provide fast and accurate estimation methods for analyzing the model.

The rest of the paper is organized as follows. In Section 2, we introduce the model and our estimation methods. In Section 3, we evaluate our methods via simulations. In Section 4, we apply the proposed methods to the BLSA data. Section 5 concludes the paper with a discussion.

2. Model for covariate-dependent multilevel functional data

The model for the activity data is

Y_{ij}(t) = \mu(t, a_i) + X_i(t, a_i) + U_{ij}(t, a_i) + \epsilon_{ij}(t), \qquad (2.1)

where t lies in some interval, i is the subject index, and j corresponds to day j for subject i. Here Y_{ij}(t) can be either log counts or sorted log counts at time t (or sorted time t) and age a_i, \mu(t, a) is a bivariate smooth mean function of t and a, representing the systematic circadian rhythm of activity, X_i(t, a_i) models the subject-specific random circadian rhythm of activity that does not vary over the days, U_{ij}(t, a_i) models the day-to-day random deviation of the circadian rhythm of activity within subject i, and \epsilon_{ij}(t) is an error term.

We make the following assumptions: (1) Inline graphic, Inline graphic and Inline graphic are mutually independent; (2) Inline graphic is a stochastic process with zero mean and covariance operator Inline graphic for Inline graphic, a trivariate function that varies smoothly in Inline graphic, Inline graphic and Inline graphic; (3) Inline graphic is a stochastic process with zero mean and covariance operator Inline graphic for Inline graphic, a trivariate function that varies smoothly in Inline graphic, Inline graphic and Inline graphic; (4) Inline graphic's are realizations of Inline graphic, a white noise process with zero mean and constant variance Inline graphic.

Model (2.1) extends the functional data model with a single covariate (Rice and Silverman, 1991) and can be regarded as a structured functional data model (Di and others, 2009; Zipunnikov and others, 2011; Greven and others, 2010; Zipunnikov and others, 2014; Staicu and others, 2010; Shou and others, 2014), which falls into the framework of functional mixed models (Guo, 2002; Morris and Carroll, 2006; Morris and others, 2006; Zhou and others, 2010). Cardot (2007), Jiang and Wang (2010), and M. Li, A. Staicu, and H.D. Bondell (2014, unpublished manuscript) also incorporate additional covariates into functional principal component analysis. However, to the best of our knowledge, model (2.1) is the first that combines both covariate dependence and multilevel structure.

2.1. Data structure

The observations are of the form Inline graphic, where Inline graphic denotes the number of subjects, Inline graphic is the number of days that subject Inline graphic wore the device, Inline graphic (hour unit), and Inline graphic. Let Inline graphic, Inline graphic and Inline graphic. Then Inline graphic is an Inline graphic data matrix with Inline graphic.

2.2. Estimation of the bivariate mean surface

We use a bivariate smoother for estimating Inline graphic, the deterministic part of model (2.1). There are many bivariate smoothers, including local polynomials (Fan and Gijbels, 1996), low-rank thin plate splines (see, e.g. Wood, 2003), and bivariate P-splines (Eilers and Marx, 2003). However, fitting data of the size and complexity of the BLSA data is computationally challenging for most of these smoothers. One exception is the recently introduced sandwich smoother (Xiao and others, 2013), which is orders of magnitude faster because its bivariate penalty is chosen so that bivariate smoothing separates into a series of univariate smoothing steps. We use the sandwich smoother and adapt it to the complexity of the BLSA data. Moreover, we estimate the smoothing parameters in the sandwich smoother using leave-one-subject-out cross validation and introduce a fast computational technique for the cross validation.

The sandwich smoother models Inline graphic by tensor-product splines Inline graphic, where Inline graphic is a coefficient matrix, Inline graphic (Inline graphic) is the collection of B-spline basis functions for the Inline graphic-axis (Inline graphic-axis), and Inline graphic is the number of interior knots plus the order (degree plus 1) of the B-splines for Inline graphic (Inline graphic). The coefficient matrix is estimated by penalized weighted least squares with a particular penalty chosen so that the fitted surface Inline graphic has a sandwich form Inline graphic, where Inline graphic and Inline graphic are two univariate smoother matrices for Inline graphic and Inline graphic, respectively. We use P-splines (Eilers and Marx, 1996) to construct the smoother matrices. For the Inline graphic-axis, we let Inline graphic, where Inline graphic is the Inline graphic model matrix Inline graphic, Inline graphic is a symmetric penalty matrix of size Inline graphic and is constructed by using a difference penalty (Eilers and Marx, 1996), and Inline graphic is a smoothing parameter. Note that if Inline graphic is time of the day, Inline graphic is periodic in Inline graphic in the sense that Inline graphic for all Inline graphic. In this case, periodic B-splines are used and the penalty matrix Inline graphic is modified accordingly to ensure that the resulting estimate is periodic in Inline graphic. For the Inline graphic-axis, because the subjects have varying numbers of daily activity profiles, we let Inline graphic be the smoother matrix for penalized least squares inversely weighted by the number of profiles. Specifically, Inline graphic, where Inline graphic is an Inline graphic diagonal matrix whose diagonal entries are the reciprocals of the numbers of subject-specific profiles. Similarly, Inline graphic is the Inline graphic model matrix Inline graphic, Inline graphic is a symmetric penalty matrix of size Inline graphic and constructed by using a difference penalty, and Inline graphic is a smoothing parameter. The knots can be either equally spaced or placed at the quantiles of the data points for each axis; we use the latter for both Inline graphic and Inline graphic. Finally, let Inline graphic be the estimated coefficient matrix. Then the penalized estimate is Inline graphic.
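The sandwich form can be illustrated with a small numerical sketch. It makes several simplifying assumptions that are not part of the method described above: the B-spline bases are replaced by identity bases (so each univariate smoother is a plain difference-penalty smoother), the weighting by the number of profiles is omitted, the periodic constraint is ignored, and the smoothing parameters are fixed by hand. The names `difference_smoother` and `sandwich_smooth` are hypothetical.

```python
import numpy as np

def difference_smoother(n, lam, order=2):
    """Symmetric smoother matrix (I + lam * D'D)^{-1} with a difference penalty."""
    D = np.diff(np.eye(n), n=order, axis=0)   # (n - order) x n difference matrix
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def sandwich_smooth(Y, lam_row, lam_col):
    """Smooth a data matrix along both axes: S_row @ Y @ S_col."""
    S_row = difference_smoother(Y.shape[0], lam_row)
    S_col = difference_smoother(Y.shape[1], lam_col)
    return S_row @ Y @ S_col, S_row, S_col

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 24)   # toy "time of day" grid
a = np.linspace(0, 1, 15)   # toy "age" grid
surface = np.sin(2 * np.pi * t)[:, None] * (1 - a)[None, :]
Y = surface + 0.3 * rng.standard_normal(surface.shape)
Y_hat, S_row, S_col = sandwich_smooth(Y, lam_row=5.0, lam_col=5.0)

# Equivalent bivariate form: vec(Y_hat) = (S_col kron S_row) vec(Y),
# where vec stacks the columns of a matrix.
def vec(M):
    return M.reshape(-1, order="F")

Y_hat_kron = np.kron(S_col, S_row) @ vec(Y)
```

The two matrix products avoid forming the Kronecker product explicitly, which is where the computational saving of a sandwich-type smoother comes from.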

The two smoothing parameters Inline graphic can be selected via a fast implementation of generalized cross validation (GCV; Craven and Wahba, 1979). A major problem with using GCV in our context is that it does not take into account the strong functional correlations in the BLSA data and tends to select smoothing parameters that result in overfit; see, for example, the left-middle panel of supplementary material available at Biostatistics online, Figure S1. Thus, we investigated the leave-one-subject-out cross validation (iCV; Rice and Silverman, 1991), which has been widely used in functional data analysis (Lin and Carroll, 2000; Yao and others, 2003; Reiss and others, 2010). Despite its popularity, iCV is time-consuming and not practical in BLSA. Hence, we propose a simple and practical approximation to iCV, the iGCV, analogous to introducing GCV as an approximation to cross validation in univariate smoothing (Craven and Wahba, 1979). We derive a fast algorithm for calculating iGCV and maintain the fast speed of the sandwich smoother for highly correlated functional data.

2.2.1. The sandwich smoother with iGCV

Let Inline graphic be the prediction of Inline graphic using the sandwich smoother without the subject-specific information Inline graphic, then Inline graphic, where Inline graphic is the Euclidean norm. We use the Inline graphic notation, which stacks the columns of a matrix into a vector.

Proposition 1 —

If Inline graphic is the Inline graphic submatrix of Inline graphic corresponding to subject Inline graphic and Inline graphic is the Inline graphic diagonal block of Inline graphic for subject Inline graphic, then

graphic file with name M125.gif

We propose to replace Inline graphic by Inline graphic in the iCV formula and obtain

2.2.1. (2.2)

Further simplification for iGCV can be found in supplementary material available at Biostatistics online, Section S.2 and the final formula for iGCV is computationally much simpler than (2.2). An evaluation of the complexity of all algorithms and memory requirement is provided in supplementary material available at Biostatistics online, Section S.5.
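The kind of identity that makes leave-one-subject-out cross validation tractable can be checked numerically. The sketch below is a generic illustration for a penalized least squares smoother, not a reproduction of Proposition 1 or of formula (2.2): the design matrix `X` and the ridge-type penalty `P` are stand-ins for a spline basis and a difference penalty, and all sizes are arbitrary. It compares refitting without each subject against a shortcut that uses only the subject's diagonal block of the full-data smoother matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subjects, per_subject, p, lam = 6, 5, 4, 1.0
n = n_subjects * per_subject
subject = np.repeat(np.arange(n_subjects), per_subject)
X = rng.standard_normal((n, p))            # stand-in for a spline design matrix
y = X @ rng.standard_normal(p) + rng.standard_normal(n)
P = lam * np.eye(p)                        # stand-in for a difference penalty

S = X @ np.linalg.solve(X.T @ X + P, X.T)  # full-data smoother matrix
resid = y - S @ y

icv_direct, icv_shortcut = 0.0, 0.0
for i in range(n_subjects):
    idx = subject == i
    # Brute force: refit without subject i, then predict subject i.
    beta = np.linalg.solve(X[~idx].T @ X[~idx] + P, X[~idx].T @ y[~idx])
    icv_direct += np.sum((y[idx] - X[idx] @ beta) ** 2)
    # Shortcut: only the diagonal block of S for subject i is needed.
    S_ii = S[np.ix_(idx, idx)]
    loo_resid = np.linalg.solve(np.eye(idx.sum()) - S_ii, resid[idx])
    icv_shortcut += np.sum(loo_resid ** 2)
```

The shortcut needs a single fit to the full data, but it still requires one block inversion per subject, which is the cost that an approximation such as iGCV is designed to avoid.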

2.3. Estimation of covariance operators

In this section, we estimate the between-subject covariance operator Inline graphic and the within-subject covariance operator Inline graphic. Similar to Di and others (2009) and Greven and others (2010), we first construct empirical estimates of the covariance operators. Let Inline graphic, where Inline graphic is obtained from Section 2.2. Define Inline graphic and Inline graphic. Then the empirical estimates of the covariance Inline graphic and Inline graphic for a fixed age Inline graphic are

2.3. (2.3)

Thus, we obtain two Inline graphic empirical covariance operators Inline graphic and Inline graphic, where the Inline graphicth layers are Inline graphic and Inline graphic. Note that layers here are induced by the subject-specific covariates, Inline graphic, and when covariance operators do not depend on covariates one would simply average Inline graphic and Inline graphic over Inline graphic.
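As an illustration of how subject-level empirical covariances can be built, the sketch below uses a standard method-of-moments argument that follows from the model assumptions: products of residuals from two different days of the same subject have expectation equal to the between-subject covariance, while same-day products additionally contain the within-subject covariance and the noise variance on the diagonal. Whether this coincides term by term with (2.3) is an assumption; the function name `subject_covariances` is hypothetical.

```python
import numpy as np

def subject_covariances(R):
    """Method-of-moments covariances for one subject.

    R has shape (J, T): J >= 2 daily residual profiles on a grid of T points.
    Returns (between, within_plus_noise), both T x T.
    """
    J = R.shape[0]
    if J < 2:
        raise ValueError("need at least two days per subject")
    same_day = R.T @ R / J                      # mean of r_j r_j' over days
    total = R.sum(axis=0)
    # Mean of r_j r_k' over ordered pairs of different days j != k.
    cross_day = (np.outer(total, total) - R.T @ R) / (J * (J - 1))
    return cross_day, same_day - cross_day

rng = np.random.default_rng(5)
R = rng.standard_normal((4, 6))                 # synthetic residuals
between, within = subject_covariances(R)
```

Both outputs are symmetric by construction, so each subject contributes one symmetric layer per covariance operator.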

Smoothing these empirical estimates is a non-trivial problem. Indeed, as in the BLSA Inline graphic and Inline graphic, both Inline graphic and Inline graphic contain Inline graphic25 million entries and trivariate smoothing becomes computationally prohibitive. To solve this problem, we propose to extend the sandwich smoother (Xiao and others, 2013, 2014) and investigate smooth covariance estimators Inline graphic of the type Inline graphic, where Inline graphic and Inline graphic denote two univariate symmetric smoother matrices that are constructed by P-splines (Eilers and Marx, 1996) and smooth along the Inline graphic and Inline graphic directions of Inline graphic. Then the smoother has the form

2.3. (2.4)

where Inline graphic is the smooth estimator for Inline graphic and Inline graphic is the Inline graphicth column of Inline graphic. This suggests a two-step estimation procedure: (i) smooth each Inline graphic using the sandwich smoother described in Section 2.2 and (ii) weight the smoothed Inline graphic by Inline graphic to obtain Inline graphic. The first step implies that the same smoother matrix Inline graphic is used for smoothing all Inline graphic, whereas the second step resembles univariate smoothing, though the data points are square matrices instead of scalars.
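The two-step procedure can be sketched on a toy array. The sketch assumes that the layers are ordered by age and treats them as equally spaced, uses plain difference-penalty smoothers in place of P-spline smoother matrices, and fixes the smoothing parameters by hand; the array sizes and names are illustrative only.

```python
import numpy as np

def difference_smoother(n, lam, order=2):
    """Symmetric smoother matrix (I + lam * D'D)^{-1} with a difference penalty."""
    D = np.diff(np.eye(n), n=order, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

rng = np.random.default_rng(3)
T, I = 12, 8                               # grid points per day, subjects (toy sizes)
raw = rng.standard_normal((I, T, T))
G = raw + raw.transpose(0, 2, 1)           # symmetric "empirical covariance" layers

S_t = difference_smoother(T, lam=2.0)      # smooths along the two time directions
S_a = difference_smoother(I, lam=2.0)      # smooths across layers (age direction)

# Step (i): sandwich-smooth every layer with the same matrix S_t.
step1 = np.einsum("st,itu,uv->isv", S_t, G, S_t)
# Step (ii): each output layer is a weighted sum of smoothed layers,
# with weights taken from the corresponding row of S_a.
K_hat = np.einsum("ki,isv->ksv", S_a, step1)
```

Because the same `S_t` is applied on both sides of every layer, each smoothed layer stays symmetric, and the full trivariate array never has to be vectorized.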

By Mercer's theorem, Inline graphic can be decomposed as Inline graphic where for a fixed Inline graphic, Inline graphic is a set of orthonormal basis functions and Inline graphic are the associated eigenvalues. We refer to Inline graphic as eigensurfaces. Then Inline graphic has the Karhunen–Loeve representation Inline graphic, where Inline graphic are scores with zero mean and variance Inline graphic, and are uncorrelated for all Inline graphic and Inline graphic. Estimates of Inline graphic and Inline graphic can be obtained by eigendecompositions of an estimate of Inline graphic; see supplementary material available at Biostatistics online, Section S.3 for further details and prediction of scores.
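For a fixed age, the eigenfunctions and eigenvalues can be approximated by an eigendecomposition of the covariance evaluated on a grid. The sketch below assumes an equally spaced grid and a simple Riemann-sum quadrature; the helper `eigen_at_age` is hypothetical, and the toy covariance is constructed so that the answer is known in advance.

```python
import numpy as np

def eigen_at_age(K, grid_step, n_components=2):
    """Leading eigenvalues, eigenfunctions and variance shares of a covariance
    matrix K evaluated on an equally spaced grid with spacing grid_step."""
    K = 0.5 * (K + K.T)                              # enforce symmetry
    values, vectors = np.linalg.eigh(K * grid_step)  # discretized operator
    order = np.argsort(values)[::-1]
    values = np.clip(values[order], 0.0, None)       # truncate negative eigenvalues
    functions = vectors[:, order] / np.sqrt(grid_step)  # unit L2 norm on the grid
    share = values / values.sum()
    return values[:n_components], functions[:, :n_components], share[:n_components]

t = np.linspace(0, 1, 50, endpoint=False)
h = t[1] - t[0]
phi1 = np.sqrt(2) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2) * np.cos(2 * np.pi * t)
K = 3.0 * np.outer(phi1, phi1) + 1.0 * np.outer(phi2, phi2)  # toy covariance
lam, phi, share = eigen_at_age(K, h)
```

Repeating this decomposition over a grid of ages and stacking the eigenfunctions row by row gives one way to assemble an eigensurface; aligning signs across ages is a practical detail not handled here.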

2.3.1. Selection of smoothing parameters

For the two-step smoothing procedure of the covariance operators we need to estimate the smoothing parameters in Inline graphic and Inline graphic. First note that Inline graphic and Inline graphic in (2.3) can be rewritten as Inline graphic and Inline graphic, where Inline graphic and Inline graphic are two symmetric matrices. Let Inline graphic and Inline graphic be two matrices such that Inline graphic and Inline graphic. Then for selecting the smoothing parameter in Inline graphic (for Inline graphic), we extend the pooled GCV (PGCV) in Xiao and others (2014) to the case of simultaneously smoothing Inline graphic covariance matrices Inline graphic. The simplified formula in Proposition 2 of Xiao and others (2014) can be used to calculate PGCV. By minimizing PGCV, we can easily select the smoothing parameter in Inline graphic.

Next we consider the smoothing parameter in Inline graphic (for Inline graphic). First note that if we let Inline graphic, then equation (2.4) implies that we use the same smoother matrix Inline graphic for all columns of Inline graphic. This suggests an extension of GCV for smoothing matrices, Inline graphic. We also derive a fast formula for calculating GCVM; see supplementary material available at Biostatistics online, Section S.4 for details.

3. Simulation studies

We conduct simulations to compare the sandwich smoother using iGCV with the original sandwich smoother, which was developed for independent data. We also investigate the accuracy of the proposed trivariate spline smoother in Section 2.3 for estimating covariate-dependent covariance operators. At the same time, we assess the speed of the proposed methods.

3.1. Simulation settings

We generate data from the model

3.1.

where Inline graphic, Inline graphic, Inline graphic is the number of data points per subject, and Inline graphic is the number of subjects. Here Inline graphic is a multiplier that controls the ratio of between- and within-subject covariance. We simulate the random scores and noises independently with Inline graphic, Inline graphic, Inline graphic, Inline graphic, and Inline graphic. We fix the eigensurfaces Inline graphic and Inline graphic and design a full 2^5 factorial experiment with five factors, each with two levels: the mean function Inline graphic, the eigenvalue functions Inline graphic, the number of subjects Inline graphic, the multiplier Inline graphic, and the noise level Inline graphic. The supplementary material available at Biostatistics online provides details of the design. Our experiment creates 32 different sets of model parameters and we simulate 100 datasets under each model setting.

3.2. Simulation results

3.2.1. Mean function estimation

We compared three methods: the sandwich smoother with GCV (SS-GCV), the sandwich smoother with iGCV (SS-iGCV), and thin plate regression splines with GCV (TPRS-GCV). We evaluated the performance of the three methods by mean integrated squared errors. The results indicate that SS-iGCV always outperforms SS-GCV and is better than or comparable with TPRS-GCV; see supplementary material available at Biostatistics online, Tables S1 and S2 for the detailed results. The simulations illustrate that when data are correlated within subjects, the sandwich smoother based on iGCV performs very well for estimating the mean function.

As for computation speed, our results (supplementary material available at Biostatistics online, Table S3) indicate that SS-iGCV, while slower than SS-GCV, is still an order of magnitude faster than TPRS-GCV. The computational efficiency of SS-iGCV is essential for bootstrap-based inference. Indeed, using SS-iGCV allowed us to smooth Inline graphic bootstrap samples of dimensions Inline graphic and Inline graphic in 1 h.

3.2.2. Covariance operator estimation

We investigated the performance of the trivariate spline smoother proposed in Section 2.3 for estimating covariate-dependent covariance operators. We applied the method to Inline graphic, where Inline graphic was estimated by SS-iGCV. We found that the method worked well; see supplementary material available at Biostatistics online, Table S4 for the detailed results. We also recorded the computation times of the proposed trivariate smoother; see supplementary material available at Biostatistics online, Table S5. The simulation results show that the proposed trivariate smoother is, indeed, very fast.

4. Results for the BLSA

We apply the functional data model (2.1) in Section 2 to the BLSA data. In Section 4.1, we analyze the systematic rhythms based on the estimated surfaces. In Section 4.2, we analyze the variation of random rhythms based on estimated trivariate covariance operators.

4.1. Systematic circadian rhythms

The left panels of Figure 3 show the heat maps of estimated bivariate activity surfaces using log counts for women, men and the associated gender difference. In the two heat maps for women and men (panels (a) and (b) in Figure 3), blue corresponds to no or little activity, green and yellow correspond to light intensity activity, and red corresponds to moderate and vigorous activity intensities. Blue areas overlap with the known resting periods of people; for example, Inline graphic PM to Inline graphic AM for Inline graphic-year-old people. Red areas overlap with the working and activity hours of the day; for example, Inline graphic AM to Inline graphic PM for Inline graphic-year-old people and Inline graphic AM to Inline graphic PM for Inline graphic-year-old people. Panels (d) and (e) in Figure 3 display the estimated mean activity profiles for different ages and reveal two interesting patterns: (i) mean activity decreases with increasing age at all times during the day and (ii) activity peaks drop with increasing age, with larger decreases in late afternoon and evening. In particular, for people aged Inline graphic or younger, mean activity is roughly constant between Inline graphic AM and Inline graphic PM. For older people, there seems to be a decline in activity between Inline graphic and Inline graphic PM, with activity picking up again just before Inline graphic PM. However, the second peak is much lower than that of younger people. These results provide the first comprehensive and detailed look at the lifetime circadian rhythms of activity in a large cohort.

Fig. 3.

Heat maps of estimated activity surfaces for women, men, and gender difference (left column) and estimated circadian activity curves of five different ages for women, men, and gender difference (right column). Note that the color bar in the bottom left heat map differs from those in the other two heat maps.

Panels (c) and (f) in Figure 3 display the age-specific gender difference in daily physical activity. Each point in (c) provides the estimated activity difference between men and women at a given time of day and age. Plot (c) indicates that women who are 60 or older are more active than men in the same age category; this contradicts the results reported by Troiano and others (2008) in the NHANES study. We believe that these differences are due to the large skewness of the activity count data, which was ignored by Troiano and others (2008). We have also conducted formal tests for gender differences in physical activity. More precisely, we focus on the parameter Inline graphic, where Inline graphic and Inline graphic denote the log activity surface for women and men, respectively. Inline graphic can be interpreted as the difference in the sum of log counts over the entire day between men and women of age Inline graphic. For each Inline graphic, the null hypothesis is Inline graphic against Inline graphic. Estimators of Inline graphic can be used to test at what age the gender difference is significant, while a nonparametric bootstrap of subjects approach (Crainiceanu and others, 2012) can be used to conduct the tests. To obtain the null distribution of the tests, we used a permutation approach where we randomly permuted the gender of subjects and applied the sandwich smoother using iGCV. Because the sandwich smoother is fast we obtained Inline graphic bootstraps in 1 h using a standard laptop. The joint tests over age provide confirmatory evidence that women between Inline graphic and Inline graphic are more active than men of the same age. Supplementary material available at Biostatistics online, Section S6 has more details.
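The permutation idea, shuffling gender labels across subjects to approximate the null distribution, can be sketched as follows. This is a deliberately simplified illustration: the age-dependent surface estimated by the sandwich smoother is replaced by a plain difference in group means of one subject-level summary (a daily total of log counts), so it does not reproduce the age-specific tests described above. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def permutation_pvalue(total_log_counts, is_male, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means
    (men minus women), permuting gender labels across subjects."""
    rng = np.random.default_rng(seed)
    x = np.asarray(total_log_counts, dtype=float)
    g = np.asarray(is_male, dtype=bool)
    observed = x[g].mean() - x[~g].mean()
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(g)              # shuffle gender across subjects
        stat = x[perm].mean() - x[~perm].mean()
        exceed += abs(stat) >= abs(observed)
    return observed, (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(4)
is_male = np.repeat([True, False], 40)
# Synthetic subject-level summaries in which women are more active on average.
totals = rng.normal(loc=np.where(is_male, 0.0, 3.0), scale=1.0)
diff, p = permutation_pvalue(totals, is_male)
```

Permuting at the subject level, rather than at the level of daily profiles, keeps all days of a subject together and so respects the within-subject correlation.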

So far, all results referred to time-of-day patterns of activity. Next, we focus on activity intensity irrespective of when the activity was conducted during the day. Thus, we investigate the sorted log counts function introduced in Section 1, which reduces the heterogeneity due to lack of temporal synchronization of activities between people and days. Panels (a) and (b) in Figure 4 are the heat maps of the estimated activity surfaces of sorted log counts for women and men, respectively. Inspection of these plots indicates that overall durations of active periods (active if sorted log counts Inline graphiclog 11) and moderate/vigorous activity (moderate/vigorous if sorted log counts Inline graphiclog 201, shown in dark red) are decreasing as a function of age. For example, Inline graphic-year-old women have on average Inline graphic7 active hours, while Inline graphic-year-old women have on average Inline graphic6 active hours. The active time is independent of when activity occurs. Similarly, Inline graphic-year-old men have Inline graphic25 min of moderate/vigorous activity, while Inline graphic-year-old men have, on average, only Inline graphic10 min of moderate/vigorous activity. Panels (c) and (f) in Figure 4 display the difference between men and women in average sorted log counts. These results combined with plots (a) and (b) in Figure 4 indicate that light activity is conducted over longer periods by women than by men. In contrast, the intensity of moderate/vigorous activities in men is higher than in women, though the periods over which these activities are performed are very short. Similar to log counts, we conducted tests on Inline graphic, where Inline graphic and Inline graphic now denote the sorted log activity surface for women and men, respectively. The hypotheses are: for each Inline graphic, Inline graphic against Inline graphic. The test provides similar evidence that women between 65 and 75 are more active than men of the same age.
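Durations such as "active hours" can be read directly off a profile of 5-min log counts by counting the bins that exceed a threshold. The sketch below assumes a strict "greater than" comparison, which is an assumption because the comparison symbol is not legible in the text above; the thresholds log 11 and log 201 are the ones quoted there, and the profile is synthetic.

```python
import numpy as np

def hours_above(profile_5min, threshold_count):
    """Hours per day whose 5-min average log count exceeds log(threshold_count).

    Assumes a strict '>' comparison (the original comparison symbol is unknown).
    """
    profile = np.asarray(profile_5min, dtype=float)
    n_bins = np.count_nonzero(profile > np.log(threshold_count))
    return n_bins * 5.0 / 60.0

# Synthetic sorted profile: 200 quiet bins, 80 lightly active, 8 vigorous.
profile = np.sort(np.concatenate([
    np.full(200, np.log(2.0)),
    np.full(80, np.log(50.0)),
    np.full(8, np.log(400.0)),
]))
active_hours = hours_above(profile, 11)
vigorous_minutes = 60.0 * hours_above(profile, 201)
```

The count does not depend on the order of the bins, which matches the statement that the active time is independent of when activity occurs.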

Fig. 4.

Heat maps of estimated sorted log activity surfaces for women, men, and gender difference (left column) and estimated log activity curves of five different ages for women, men, and gender difference (right column). Note that the color bar in the bottom left heat map differs from those in the other two heat maps.

To summarize, the results indicate that: (i) duration of active periods, amount of activity, duration of moderate/vigorous activity, and activity peaks all decrease with age; (ii) more pronounced reductions with age in activity intensities happen in the second part of the day, with a bigger trade-off of high to low intensity in men than in women; (iii) there is evidence of gender differences, though in the opposite direction of what is reported in the literature: women tend to have longer periods of light activity, while younger men tend to have short periods of higher-intensity activity when exercising moderately or vigorously.

4.2. Random circadian rhythms

Here, we analyze the variation in the activity profiles of sorted log counts, which can be modeled by the combination of a between-subject covariance operator and a within-subject covariance operator. Analysis for the log counts can be found in supplementary material available at Biostatistics online, Section S6. The covariance operators are age dependent and were estimated by the trivariate smoother in Section 2.3. In particular, we assumed the eigensurfaces of covariance operators are the same for both genders. Figure 5 displays two eigensurfaces for each covariance operator. Each row of the eigensurfaces is an eigenfunction corresponding to a certain age. For the between-subject covariance, the first and second eigenfunctions account for 79–83% and 14–18% of the total variation; for the within-subject covariance the corresponding ranges of proportions are 84–88% and 8–12%. Note that the proportion of variation for the eigenfunctions is allowed to change with age. Now we interpret the eigenfunctions of the between-subject covariance. The rows of the first eigensurfaces are mostly positive, suggesting that individuals with positive scores on the first eigenfunction are more active than the population average and those with negative scores are less active. Here “more active” means longer duration of active periods and higher intensity activity at each Inline graphic, whereas “less active” means shorter duration of active periods and activity of lower intensities at each Inline graphic. Therefore, each subject's score on the first eigenfunction can be used to identify physically more active and less active subjects. The second eigensurfaces show a contrast between moderate/vigorous activity and light activity: People with positive scores on the second eigenfunction have relatively more moderate/vigorous activity and less light intensity activity. 
A possible use of the scores on the second eigenfunction is to distinguish people who conduct moderate/vigorous activity from those who do not. The eigensurfaces for the within-subject covariance are similar and quantify the day-to-day within-subject variability of activity.
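
The interpretation above rests on two quantities at a fixed age: the proportion of variation carried by each eigenfunction and a subject's score on an eigenfunction. The sketch below shows how both could be computed from a discretized covariance matrix. The toy covariance (a positive "level" component plus a "contrast" component with made-up variances 8 and 2), the function name, and the example deviation are hypothetical; this is not the authors' trivariate smoother or their estimated covariances.

```python
import numpy as np

def variance_proportions(cov):
    """Eigen-decompose a symmetric covariance matrix.

    Returns (proportions, eigenvectors), ordered from largest to smallest eigenvalue.
    """
    evals, evecs = np.linalg.eigh(cov)         # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1]
    evals = np.clip(evals[order], 0.0, None)   # guard against tiny negative round-off
    return evals / evals.sum(), evecs[:, order]

# Toy covariance on a grid of 50 points (hypothetical, for illustration only).
n_grid = 50
level = np.ones(n_grid) / np.sqrt(n_grid)      # all-positive component
contrast = np.linspace(-1.0, 1.0, n_grid)      # component that changes sign
contrast /= np.linalg.norm(contrast)
cov = 8.0 * np.outer(level, level) + 2.0 * np.outer(contrast, contrast)

props, efuncs = variance_proportions(cov)

# Eigenvector signs are arbitrary; orient the first one to be positive.
first = efuncs[:, 0] * np.sign(efuncs[:, 0].sum())

# Score of a hypothetical subject-specific deviation on the first eigenfunction.
deviation = 3.0 * level - 1.0 * contrast
score1 = float(deviation @ first)
print(props[:2], score1)
```

In this toy setup a positive first score shifts the whole curve upward, mirroring the "more active" reading of the first eigensurface, while the second component trades one end of the sorted curve against the other.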

Fig. 5.

Estimated between-subject eigensurfaces (left panels) and within-subject eigensurfaces (right panels) for sorted log counts.

For each covariance operator, we calculated the between- and within-subject variability at each age, that is, the sum of all eigenvalues (see Figure 6). The between-subject variability first decreases as a function of age; this suggests that activity patterns of older people are more similar than those of younger people. Then, starting from around age 70, the between-subject variability increases. The within-subject variability is a decreasing function of age. This indicates that day-to-day variability of activity for older people is smaller than that for younger people. Finally, the within-subject variability is smaller than the between-subject variability and the relative difference becomes bigger with age, showing that day-to-day variability of activity decreases faster than the between-subject variability.
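
Since the sum of all eigenvalues of a covariance matrix equals its trace, the total variation at each age can be read off the diagonal without a full eigen-decomposition. The sketch below illustrates this for a stack of age-specific covariance matrices. The ages, scale values, and function name are made up for illustration and are not the paper's estimates.

```python
import numpy as np

def total_variation_by_age(cov_by_age):
    """Total variation (sum of eigenvalues = trace) for each age.

    cov_by_age has shape (n_ages, n_grid, n_grid).
    """
    cov_by_age = np.asarray(cov_by_age, dtype=float)
    return np.trace(cov_by_age, axis1=1, axis2=2)

# Hypothetical age-specific covariances: scaled identity matrices on a 5-point grid.
ages = np.array([30, 50, 70, 90])
between_scale = np.array([4.0, 3.0, 2.5, 3.0])   # made-up values
within_scale = np.array([2.0, 1.2, 0.8, 0.5])    # made-up values
n_grid = 5
between = between_scale[:, None, None] * np.eye(n_grid)
within = within_scale[:, None, None] * np.eye(n_grid)

tv_between = total_variation_by_age(between)
tv_within = total_variation_by_age(within)
ratio = tv_within / tv_between   # relative size of day-to-day variability at each age
print(tv_between, tv_within, ratio)
```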

Fig. 6.

Estimated total variation of the between-subject covariance (left panel) and the within-subject covariance (right panel) for sorted log counts for each gender. Total variation is the sum of all eigenvalues.

5. Discussion

The scientific goal of the paper was to quantify the systematic and random patterns of activity as functions of age. To achieve this, we introduced an explicit model containing: (i) a bivariate smooth function of time of day and age to capture the systematic component and (ii) two latent processes that depend on time of day and age and describe the within- and between-subject variability. Using this model, we have shown that (i) average activity decreases with age at every time during the day, (ii) activity decreases faster with age in late afternoon and evening, (iii) older women tend to be more active than older men in low-intensity activities, and (iv) the amount of within-subject variability is comparable with the amount of between-subject variability and both are decreasing functions of age.

From a methodological perspective, we have introduced bivariate and trivariate smoothing of data with sizable within- and between-subject correlation using leave-one-subject-out generalized cross validation (iGCV). Coupled with the fast sandwich smoother, we were able to conduct these analyses; the R code will be made publicly available.
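
To make the idea of leave-one-subject-out selection of a smoothing parameter concrete, the following brute-force Python sketch holds out all curves of one subject at a time, refits a penalized mean curve on the remaining subjects, and scores the prediction on the held-out curves. It is only a sketch under stated assumptions: the smoother is a simple second-difference (Whittaker-type) penalty on a common grid rather than the sandwich smoother, the refitting is done naively rather than with the fast formulas of the paper, and all names and the simulated data are hypothetical.

```python
import numpy as np

def second_difference_penalty(n_grid):
    """Penalty matrix D'D, where D takes second differences along the grid."""
    D = np.diff(np.eye(n_grid), n=2, axis=0)
    return D.T @ D

def fit_mean(curves, lam, penalty):
    """Minimize sum_i ||y_i - f||^2 + lam * f' P f over the mean curve f."""
    n_curves, n_grid = curves.shape
    return np.linalg.solve(n_curves * np.eye(n_grid) + lam * penalty, curves.sum(axis=0))

def loso_cv_score(curves, subject_ids, lam):
    """Leave-one-subject-out CV: hold out every curve of a subject together."""
    penalty = second_difference_penalty(curves.shape[1])
    score = 0.0
    for subj in np.unique(subject_ids):
        held_out = subject_ids == subj
        f_hat = fit_mean(curves[~held_out], lam, penalty)
        score += np.sum((curves[held_out] - f_hat) ** 2)
    return score / curves.size

def select_lambda(curves, subject_ids, lambdas):
    """Return the candidate lambda with the smallest CV score, plus all scores."""
    scores = np.array([loso_cv_score(curves, subject_ids, lam) for lam in lambdas])
    return lambdas[int(np.argmin(scores))], scores

# Simulated data (hypothetical): 20 subjects, 3 days each, 60 grid points.
rng = np.random.default_rng(1)
n_subj, n_days, n_grid = 20, 3, 60
t = np.linspace(0.0, 1.0, n_grid)
subject_ids = np.repeat(np.arange(n_subj), n_days)
subject_shift = rng.normal(0.0, 0.5, size=n_subj)[subject_ids]   # shared within a subject
curves = np.sin(2 * np.pi * t) + subject_shift[:, None] + rng.normal(0.0, 1.0, (n_subj * n_days, n_grid))

lambdas = np.array([0.1, 1.0, 10.0, 100.0, 1000.0])
best_lam, scores = select_lambda(curves, subject_ids, lambdas)
print(best_lam, scores)
```

Holding out whole subjects, rather than single curves or single points, keeps correlated days from the same person out of the training set, which is the motivation for subject-level cross validation when within-subject correlation is sizable.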

Supplementary material

Supplementary Material is available at http://biostatistics.oxfordjournals.org.

Funding

This work was supported by Grant Number R01NS060910 from the National Institute of Neurological Disorders and Stroke. This work represents the opinions of the researchers and not necessarily that of the granting organizations.

Acknowledgement

Conflict of Interest: None declared.

References

  1. Bai J., He B., Shou H., Zipunnikov V., Glass T. A., Crainiceanu C. M. (2014). Normalization and extraction of interpretable metrics from raw accelerometry data. Biostatistics 15, 102–116.
  2. Bussmann J. B., Martens W. L., Tulen J. H., Schasfoort F. C., van den Berg-Emons H. J., Stam H. J. (2001). Measuring daily behavior using ambulatory accelerometry: the activity monitor. Behavior Research Methods, Instruments, & Computers 33(3), 349–356.
  3. Cardot H. (2007). Conditional functional principal components analysis. Scandinavian Journal of Statistics 34, 317–335.
  4. Crainiceanu C. M., Staicu A., Ray S., Punjabi N. (2012). Bootstrap-based inference on the difference in the means of two correlated functional processes. Statistics in Medicine 31, 3223–3240.
  5. Craven P., Wahba G. (1979). Smoothing noisy data with spline functions. Numerische Mathematik 31, 377–403.
  6. Culhane K. M., O’Connor M., Lyons D., Lyons G. M. (2005). Accelerometers in rehabilitation medicine for older adults. Age Ageing 34, 556–560.
  7. Di C., Crainiceanu C. M., Caffo B. S., Punjabi N. (2009). Multilevel functional principal component analysis. Annals of Applied Statistics 3, 458–488.
  8. Eilers P. H. C., Marx B. D. (1996). Flexible smoothing with B-splines and penalties (with Discussion). Statistical Science 11, 89–121.
  9. Eilers P. H. C., Marx B. D. (2003). Multivariate calibration with temperature interaction using two-dimensional penalized signal regression. Chemometrics and Intelligent Laboratory Systems 66, 159–174.
  10. Fan J., Gijbels I. (1996). Local Polynomial Modelling and Its Applications. London: Chapman & Hall/CRC.
  11. Ferrucci L., Alley D. (2007). Obesity, disability, and mortality. Archives of Internal Medicine 167, 750–751.
  12. Greven S., Crainiceanu C. M., Caffo B. S., Reich D. (2010). Longitudinal functional principal component analysis. Electronic Journal of Statistics 4, 1022–1054.
  13. Guo W. (2002). Functional mixed effects models. Biometrics 58, 121–128.
  14. He B., Bai J., Zipunnikov V., Koster A., Paolo C., Lange-Maria B., Glynn N. W., Harris T. B., Crainiceanu C. M. (2014). Predicting human movement with multiple accelerometers using movelets. Medicine & Science in Sports & Exercise 46, 1859–1866.
  15. Jiang C., Wang J. (2010). Covariate adjusted functional principal components analysis for longitudinal data. Annals of Statistics 38, 362–388.
  16. Lin X., Carroll R. J. (2000). Nonparametric function estimation for clustered data when the predictor is measured without/with error. Journal of the American Statistical Association 95, 520–534.
  17. Morris J. S., Arroyo C., Coull B. A., Ryan L. M., Herrick R., Gortmaker S. L. (2006). Using wavelet-based functional mixed models to characterize population heterogeneity in accelerometer profiles: a case study. Journal of the American Statistical Association 101, 1352–1364.
  18. Morris J. S., Carroll R. J. (2006). Wavelet-based functional mixed models. Journal of the Royal Statistical Society B 68, 179–199.
  19. Pate R. R., Pratt M., Blair S. N. and others (1995). Physical activity and public health: a recommendation from the Centers for Disease Control and Prevention and the American College of Sports Medicine. Journal of the American Medical Association 273, 402–407.
  20. Reiss P. T., Huang L., Mennes M. (2010). Fast function-on-scalar regression with penalized basis expansions. The International Journal of Biostatistics 6, 28.
  21. Rice J. A., Silverman B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society: Series B 53, 233–243.
  22. Sallis J. F., Saelens B. E. (2000). Assessment of physical activity by self-report: status, limitations, and future directions. Research Quarterly for Exercise & Sport 71, S1–14.
  23. Schrack J. A., Zipunnikov V., Goldsmith J., Bai J., Simonsick E. M., Crainiceanu C. M., Ferrucci L. (2014). Assessing the “physical cliff”: detailed quantification of aging and patterns of physical activity. The Journals of Gerontology Series A: Biological Sciences and Medical Sciences 69, 973–979.
  24. Shou H., Zipunnikov V., Crainiceanu C. M., Greven S. (2014). Structured functional principal component analysis. Biometrics (to appear).
  25. Staicu A. M., Crainiceanu C. M., Carroll R. J. (2010). Fast methods for spatially correlated multilevel functional data. Biostatistics 11, 177–194.
  26. Stone J. L., Norris A. H. (1966). Activities and attitudes of participants in the Baltimore Longitudinal Study. The Journals of Gerontology 21, 575–580.
  27. Troiano R. P., Berrigan D., Dodd K. W., Masse L. C., Tilert T., McDowell M. (2008). Physical activity in the United States measured by accelerometer. Medicine & Science in Sports & Exercise 40, 181–188.
  28. U.S. Department of Health and Human Services. (2010). The surgeon general's vision for a healthy and fit nation. Rockville, MD: U.S. Department of Health and Human Services, Office of the Surgeon General.
  29. Wood S. N. (2003). Thin plate regression splines. Journal of the Royal Statistical Society: Series B 65, 95–114.
  30. Xiao L., Li Y., Ruppert D. (2013). Fast bivariate P-splines: the sandwich smoother. Journal of the Royal Statistical Society: Series B 75, 577–599.
  31. Xiao L., Ruppert D., Zipunnikov V., Crainiceanu C. M. (2014). Fast covariance function estimation for high-dimensional functional data. Statistics and Computing (to appear).
  32. Yao F., Müller H., Clifford A. J., Dueker S. R., Follett J., Lin Y., Buchholz B. A., Vogel J. S. (2003). Shrinkage estimation for functional principal component scores with application to the population kinetics of plasma folate. Biometrics 59, 676–685.
  33. Zhou L., Huang J. Z., Martinez J. G., Maity A., Baladandayuthapani V., Carroll R. J. (2010). Reduced rank mixed effects models for spatially correlated hierarchical functional data. Journal of the American Statistical Association 105, 390–400.
  34. Zipunnikov V., Caffo B. S., Crainiceanu C. M., Yousem D. M., Davatzikos C., Schwartz B. S. (2011). Multilevel functional principal component analysis for high-dimensional data. Journal of Computational and Graphical Statistics 20, 852–873.
  35. Zipunnikov V., Greven S., Shou H., Caffo B. S., Reich D. S., Crainiceanu C. M. (2014). Longitudinal high-dimensional principal components analysis with application to diffusion tensor imaging of multiple sclerosis. Annals of Applied Statistics (to appear).

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press