Abstract
Objective measurement of physical activity using wearable devices such as accelerometers may provide tantalizing new insights into the association between activity and health outcomes. Accelerometers can record quasi-continuous activity information for many days and for hundreds of individuals. For example, in the Baltimore Longitudinal Study on Aging physical activity was recorded every minute for 773 adults for an average of 7 days per adult. An important scientific problem is to separate and quantify the systematic and random circadian patterns of physical activity as functions of time of day, age, and gender. To capture the systematic circadian pattern, we introduce a practical bivariate smoother and two crucial innovations: (i) estimating the smoothing parameter using leave-one-subject-out cross validation to account for within-subject correlation and (ii) introducing fast computational techniques that overcome problems both with the size of the data and with the cross-validation approach to smoothing. The age-dependent random patterns are analyzed by a new functional principal component analysis that incorporates both covariate dependence and multilevel structure. For the analysis, we propose a practical and very fast trivariate spline smoother to estimate covariate-dependent covariances and their spectra. Results reveal several interesting, previously unknown, circadian patterns associated with human aging and gender.
Keywords: Accelerometer, Bivariate smoothing, Covariance, Sandwich smoother, Trivariate smoothing
1. Introduction
Physical activity is an important biomarker of human aging. Individuals who are physically active tend to live longer and healthier lives (U.S. Dept of Health and Human Services, 2010). Higher physical activity is associated with fewer chronic diseases, better physical functioning, and longer active life expectancy (Pate and others, 1995; Ferrucci and Alley, 2007). However, due to a lack of objective and accurate measures of physical activity, little is known about how patterns and intensity of daily free-living activity change with age. A better understanding of these processes may provide insights into the intensity and duration of physical activity necessary to maintain and extend a healthy life in an aging population.
Traditional subjective physical activity measurement tools such as the activity of daily living questionnaires provide biased and coarse estimates of daily activity and capture mostly moderate-to-high intensity activities (Sallis and Saelens, 2000). In contrast, accelerometers present objective and detailed measurements of physical activity, and have been used in many observational studies and clinical trials in recent years (Bai and others, 2014; Bussmann and others, 2001; Culhane and others, 2005; He and others, 2014; Troiano and others, 2008). Accelerometers capture a much wider range of activity intensities, including the light intensities characteristic of domestic and self-care tasks, which are essential for studying physical activity of the elderly. The use of accelerometers makes it possible to study the circadian rhythm of physical activity at an unprecedented temporal level, which enables the use of increasingly refined hypotheses and designs of experiments (Bai and others, 2014).
Our objective is to analyze the systematic and random circadian rhythms of physical activity as functions of time of day and age for both women and men. Data were collected in the Baltimore Longitudinal Study on Aging (BLSA), the longest-running scientific study on human aging in the United States. A description of the sample and enrollment procedures and criteria can be found in Stone and Norris (1966). Here we provide a summary of the activity data collection. Participants in the study wore the Actiheart portable physical activity monitor for several consecutive days in a free-living environment. Activity counts were measured in 1-min epochs. Participants were asked to wear the monitor at all times, except when bathing or swimming. Missing values were imputed as the subject-specific average over all recording days at the same time of day, and daily profiles with >5 of missing data were deleted. For this analysis, we focus on 2576 daily activity profiles from 394 female participants and 2852 daily activity profiles from 379 male participants. Each daily profile has 1440 minute-by-minute measurements of activity counts. The participants are aged between 31 and 96 and each has at least two days of data. The median age for female participants is 66 (s.d. 12 years) and for males is 69 (s.d. 13 years). For other demographic information, see Schrack and others (2014).
The top and top-middle panels of Figure 1 display daily activity profiles from 2 days for one female and one male. Activity counts were log-transformed because count data are highly skewed and then averaged in each 5-min interval. The transformation used was $\text{counts} \rightarrow \log(\text{counts} + 1)$ at the minute level. Hereafter we refer to the log-transformed counts as log counts. The four sample profiles in Figure 1 show an overall circadian pattern of activity: infrequent and light-intensity activity at night (roughly between 11 PM and 6 AM) and frequent and much higher-intensity activity during the day. The same pattern can be observed in other daily profiles for both women and men. The day-to-day variability is substantial because data were collected in a free-living environment and physical activity is not synchronized across days. As a result, the activity spikes are much less pronounced and even disappear in the mean profiles (panels in the third row of Figure 1). The mean profiles are useful for identifying patterns such as how active the subjects are on average and when the subjects are more active. However, the time-dependent mean profiles are less appropriate for quantifying durations of light, moderate and vigorous activity intensities that are not directly related to the specific time of day. Thus, we also consider sorted daily profiles obtained by sorting the counts from the smallest to the largest; this results in a function of the same length as the original activity count function. Specifically, if $t_{(1)},\ldots,t_{(288)}$ are the time points sorted according to the size of the averaged 5-min log count, then the sorted profile is the function $i \rightarrow f(t_{(i)})$, where $f$ is the original count outcome. The bottom panels in Figure 1 display the same profiles as the panels above but in the sorted order. The time coordinate for the sorted profiles is referred to as “sorted time of day”.
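The preprocessing described above (log transform at the minute level, 5-min averaging, then sorting) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `sorted_log_profile` and the simulated Poisson counts are assumptions introduced here.

```python
import numpy as np

def sorted_log_profile(counts_per_minute, bin_minutes=5):
    """Log-transform minute-level counts, average in bins, and sort ascending."""
    counts = np.asarray(counts_per_minute, dtype=float)
    if counts.size % bin_minutes != 0:
        raise ValueError("profile length must be a multiple of bin_minutes")
    log_counts = np.log(counts + 1.0)                          # counts -> log(counts + 1)
    binned = log_counts.reshape(-1, bin_minutes).mean(axis=1)  # 5-min averages
    order = np.argsort(binned, kind="stable")                  # indices of t_(1), ..., t_(288)
    return binned, binned[order], order

# Hypothetical daily profile: 1440 simulated minute-level counts.
rng = np.random.default_rng(0)
day = rng.poisson(lam=20.0, size=1440)
binned, sorted_binned, order = sorted_log_profile(day)
```

The sorted profile discards the time of day at which activity occurred and keeps only how much of the day was spent at each intensity, which is why it has the same length (288) as the unsorted profile.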
Fig. 1.
Sample profiles from 2 days, mean profiles and sorted daily profiles for one female and one male. The mean profile is the mean of all activity profiles for the subject. In the bottom panels, the dashed lines are sorted daily profiles and the solid lines are mean sorted profiles.
To further explore the data, we consider four age strata for each gender: younger than 60 years old, 60–67 years old, 68–74 years old, and 75 years or older. Figure 2 displays the smoothed group-mean profiles of log counts (top panels) and sorted log counts (bottom panels). The smoothed group-mean profiles of log counts differ across the age groups, and mean activity counts decrease with age at all times during the day for both women and men. Moreover, the association with age seems nonlinear, with a stronger decrease in average activity after age 70 in the mid-afternoon; this suggests that the interaction between age and the daily activity profile may be quite complex. Some gender differences in the change of activity profiles are also noticeable. For example, the decrease in the mean activity from the 60–67 age group to the 68–74 age group seems to be sharper in men than in women.
Fig. 2.
Smoothed mean profiles for four age groups of women and men: younger than 60 years old (solid lines), 60–67 years old (dashed lines), 68–74 years old (dotted lines), and 75 years or older (dot-dashed lines). The top panels are for unsorted log counts and the bottom panels are for sorted log counts.
This preliminary analysis illustrates the complex interaction between age, gender, and the circadian rhythm of physical activity. Modeling activity profiles as functions of age is scientifically important and could help explain how time-of-day-related patterns and intensities of daily activity change as people become older. Moreover, it could show how durations of physical activity of different intensities change with age and thus provide a deeper understanding of the changes of daily activity.
In addition to the systematic patterns, we are also interested in quantifying the subject-level random circadian rhythms of activity. For example, the sorted log counts for some women are always above their corresponding group means, indicating that these women perform activity of longer duration and higher intensity than expected for their age-specific group. Thus, modeling the variation of the circadian rhythms of activity could explain how subjects differ in activity and identify those determinants that are strongly associated with subject-specific patterns of activity.
To address the scientific interests, we design a novel functional data model that incorporates covariates into the traditional functional data model with a single covariate and also accommodates the multilevel structure in the BLSA activity data. We also provide fast and accurate estimation methods for analyzing the model.
The rest of the paper is organized as follows. In Section 2, we introduce the model and our estimation methods. In Section 3, we evaluate our methods via simulations. In Section 4, we apply the proposed methods to the BLSA data. Section 5 concludes the paper with a discussion.
2. Model for covariate-dependent multilevel functional data
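The model for the activity data is
$$Y_{ij}(t) = \mu(t, x_i) + Z_i(t, x_i) + W_{ij}(t, x_i) + \epsilon_{ij}(t), \qquad (2.1)$$
where $t \in \mathcal{T}$ for some interval $\mathcal{T}$, $i$ is subject index, and $j$ corresponds to day $j$ for subject $i$. Here $Y_{ij}(t)$ can be either log counts or sorted log counts at time $t$ (or sorted time $t$) and age $x_i$, $\mu(t, x)$ is a bivariate smooth mean function of $t$ and $x$, representing the systematic circadian rhythm of activity, $Z_i(t, x_i)$ models the subject-specific random circadian rhythm of activity that does not vary over the days, $W_{ij}(t, x_i)$ models the day-to-day random deviation of the circadian rhythm of activity within subject $i$, and $\epsilon_{ij}(t)$ is an error term.
We make the following assumptions: (1) $Z_i$, $W_{ij}$ and $\epsilon_{ij}$ are mutually independent; (2) $Z_i(\cdot, x)$ is a stochastic process with zero mean and covariance operator $K_Z(s, t, x) = \mathrm{cov}\{Z_i(s, x), Z_i(t, x)\}$ for $s, t \in \mathcal{T}$, a trivariate function that varies smoothly in $s$, $t$ and $x$; (3) $W_{ij}(\cdot, x)$ is a stochastic process with zero mean and covariance operator $K_W(s, t, x) = \mathrm{cov}\{W_{ij}(s, x), W_{ij}(t, x)\}$ for $s, t \in \mathcal{T}$, a trivariate function that varies smoothly in $s$, $t$ and $x$; (4) $\epsilon_{ij}(t)$'s are realizations of $\epsilon(t)$, a white noise process with zero mean and constant variance $\sigma^2$.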
Model (2.1) extends the functional data model with a single covariate (Rice and Silverman, 1991) and can be regarded as a structured functional data model (Di and others, 2009; Zipunnikov and others, 2011; Greven and others, 2010; Zipunnikov and others, 2014; Staicu and others, 2010; Shou and others, 2014), which falls into the framework of functional mixed models (Guo, 2002; Morris and Carroll, 2006; Morris and others, 2006; Zhou and others, 2010). Cardot (2007), Jiang and Wang (2010), and Li, Staicu, and Bondell (2014, unpublished manuscript) also incorporate additional covariates into functional principal component analysis. However, to the best of our knowledge, model (2.1) is the first that combines both covariate dependence and multilevel structure.
2.1. Data structure
The observations are of the form
, where
denotes the number of subjects,
is the number of days that subject
wore the device,
(hour unit), and
. Let
,
and
. Then
is an
data matrix with
.
2.2. Estimation of the bivariate mean surface
We use a bivariate smoother for estimating
, the deterministic part of model (2.1). There are many bivariate smoothers, including local polynomials (Fan and Gijbels, 1996), low-rank thin plate splines (see, e.g. Wood, 2003), and bivariate P-splines (Eilers and Marx, 2003). However, fitting data of the size and complexity of the BLSA data is computationally challenging for most of these smoothers. One exception is the recently introduced sandwich smoother (Xiao and others, 2013), which is orders of magnitude faster because its bivariate penalty is chosen so that bivariate smoothing separates into a series of univariate smoothing steps. We shall use the sandwich smoother, adapted to account for the complexity of the BLSA data. Moreover, we shall estimate the smoothing parameters in the sandwich smoother using leave-one-subject-out cross validation and introduce a fast computational technique for the cross validation.
The sandwich smoother models
by tensor-product splines
, where
is a coefficient matrix,
(
) is the collection of
B-spline basis functions for the
-axis (
-axis), and
is the number of interior knots plus the order (degree plus 1) of the
B-splines for
(
). The coefficient matrix is estimated by penalized weighted least squares with a particular penalty on it so that the fitted surface
has a sandwich form
, where
and
are two univariate smoother matrices for
and
, respectively. We use
P-splines (Eilers and Marx, 1996) to construct the smoother matrices. For the
-axis, we let
, where
is the
model matrix
,
is a symmetric penalty matrix of size
and is constructed by using a difference penalty (Eilers and Marx, 1996), and
is a smoothing parameter. Note that if
is time of the day,
is periodic in
in the sense that
for all
. In this case, periodic
B-splines are used and the penalty matrix
is modified accordingly to ensure that the resulting estimate is periodic in
. For the
-axis, because the subjects have varying numbers of daily activity profiles, we let
be the smoother matrix for penalized least squares inversely weighted by the number of profiles. Specifically,
, where
is an
diagonal matrix with the diagonals the reciprocals of the numbers of subject-specific profiles. Similarly,
is the
model matrix
,
is a symmetric penalty matrix of size
and constructed by using a difference penalty, and
is a smoothing parameter. The knots can be either equally spaced or placed at the quantiles of the data points for each axis; we use the latter for both
and
. Finally, let
be the estimated coefficient matrix. Then the penalized estimate is
.
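The sandwich form described above can be illustrated with a small numerical sketch: one P-spline smoother matrix per axis, applied on either side of the data matrix. Everything below is an assumption made for illustration rather than the authors' implementation: the function names, the orientation of the data matrix (time points in rows, daily profiles in columns), equally spaced knots, fixed smoothing parameters, and simulated data. For brevity the time axis uses ordinary rather than periodic B-splines.

```python
import numpy as np

def bspline_basis(x, n_basis, degree=3):
    """B-spline design matrix on equally spaced knots spanning the data."""
    x = np.asarray(x, dtype=float)
    n_seg = n_basis - degree                        # number of inner knot intervals
    h = (x.max() - x.min()) / n_seg
    knots = x.min() + h * np.arange(-degree, n_seg + degree + 1)
    # Degree-0 basis: indicator of each half-open knot interval.
    B = ((x[:, None] >= knots[:-1]) & (x[:, None] < knots[1:])).astype(float)
    for d in range(1, degree + 1):                  # Cox-de Boor recursion
        left = (x[:, None] - knots[:-d - 1]) / (knots[d:-1] - knots[:-d - 1])
        right = (knots[d + 1:] - x[:, None]) / (knots[d + 1:] - knots[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

def pspline_smoother(x, n_basis, lam, weights=None, diff_order=2):
    """Smoother matrix B (B'WB + lam * D'D)^{-1} B'W with a difference penalty."""
    B = bspline_basis(x, n_basis)
    D = np.diff(np.eye(n_basis), n=diff_order, axis=0)   # difference operator
    w = np.ones(len(B)) if weights is None else np.asarray(weights, dtype=float)
    BtW = B.T * w                                        # B'W for diagonal W
    return B @ np.linalg.solve(BtW @ B + lam * (D.T @ D), BtW)

# Simulated example: 30 subjects with 2 to 5 daily profiles each.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 24.0, 48)                      # time of day (hours)
age = rng.uniform(31, 96, size=30)
n_days = rng.integers(2, 6, size=30)
x = np.repeat(age, n_days)                          # age attached to each daily profile
w = np.repeat(1.0 / n_days, n_days)                 # reciprocal of profiles per subject
truth = np.outer(np.sin(2 * np.pi * t / 24), 1.0 - (x - 31.0) / 100.0)
Y = truth + rng.normal(scale=0.5, size=truth.shape)

S_t = pspline_smoother(t, n_basis=10, lam=1.0)
S_x = pspline_smoother(x, n_basis=8, lam=1.0, weights=w)
Y_hat = S_t @ Y @ S_x.T                             # sandwich form: smooth rows, then columns
```

Because the two smoother matrices act on separate axes, the fit costs two matrix products rather than one large bivariate solve, which is the source of the speed advantage the text attributes to the sandwich smoother.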
The two smoothing parameters
can be selected via a fast implementation of generalized cross validation (GCV; Craven and Wahba, 1979). A major problem with using GCV in our context is that it does not take into account the strong functional correlations in the BLSA data and tends to select smoothing parameters that result in overfitting; see, for example, the left-middle panel of Figure S1 in the supplementary material available at Biostatistics online. Thus, we investigated leave-one-subject-out cross validation (iCV; Rice and Silverman, 1991), which has been widely used in functional data analysis (Lin and Carroll, 2000; Yao and others, 2003; Reiss and others, 2010). Despite its popularity, iCV is time-consuming and not practical for the BLSA data. Hence, we propose a simple and practical approximation to iCV, the iGCV, analogous to the introduction of GCV as an approximation to cross validation in univariate smoothing (Craven and Wahba, 1979). We derive a fast algorithm for calculating iGCV that maintains the speed of the sandwich smoother for highly correlated functional data.
2.2.1. The sandwich smoother with iGCV
Let
be the prediction of
using the sandwich smoother without the subject-specific information
, then
, where
is the Euclidean norm. We use the
notation, which stacks the columns of a matrix into a vector.
Proposition 1 —
If
is the
submatrix of
corresponding to subject
and
is the
diagonal block of
for subject
, then
We propose to replace
by
in the iCV formula and obtain
![]() |
(2.2) |
Further simplification for iGCV can be found in supplementary material available at Biostatistics online, Section S.2 and the final formula for iGCV is computationally much simpler than (2.2). An evaluation of the complexity of all algorithms and memory requirement is provided in supplementary material available at Biostatistics online, Section S.5.
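The formula in Proposition 1 is not reproduced in this text, so the sketch below does not claim to restate it. It illustrates a standard identity for penalized least squares smoothers that makes leave-one-subject-out cross validation computable from a single fit: the leave-one-subject-out residual for subject $i$ equals $(I - S_{ii})^{-1}$ applied to the ordinary residual, where $S_{ii}$ is the diagonal block of the smoother matrix for that subject. The random design matrix, the univariate setting, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_coef, lam = 12, 6, 2.0
n_obs = rng.integers(2, 5, size=n_subjects)          # observations per subject
subject = np.repeat(np.arange(n_subjects), n_obs)    # subject label of each row
B = rng.normal(size=(subject.size, n_coef))          # stand-in design matrix (hypothetical)
D = np.diff(np.eye(n_coef), n=2, axis=0)
P = D.T @ D                                          # difference penalty
y = rng.normal(size=subject.size)

S = B @ np.linalg.solve(B.T @ B + lam * P, B.T)      # smoother matrix from the full fit
resid = y - S @ y

icv_brute, icv_fast = 0.0, 0.0
for i in range(n_subjects):
    idx = subject == i
    # Brute force: refit without subject i, then predict subject i's rows.
    B_rest, y_rest = B[~idx], y[~idx]
    beta = np.linalg.solve(B_rest.T @ B_rest + lam * P, B_rest.T @ y_rest)
    icv_brute += np.sum((y[idx] - B[idx] @ beta) ** 2)
    # Shortcut: only the full fit and the diagonal block S_ii are needed.
    S_ii = S[np.ix_(idx, idx)]
    loo_resid = np.linalg.solve(np.eye(idx.sum()) - S_ii, resid[idx])
    icv_fast += np.sum(loo_resid ** 2)
```

The two totals agree up to rounding error. Even so, the shortcut still requires one small linear solve per subject, which is the cost that an approximation in the spirit of GCV removes.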
2.3. Estimation of covariance operators
In this section, we estimate the between-subject covariance operator
and the within-subject covariance operator
. Similar to Di and others (2009) and Greven and others (2010), we first construct empirical estimates of the covariance operators. Let
, where
is obtained from Section 2.2. Define
and
. Then the empirical estimates of the covariance
and
for a fixed age
are
![]() |
(2.3) |
Thus, we obtain two
empirical covariance operators
and
, where the
th layers are
and
. Note that layers here are induced by the subject-specific covariates,
, and when covariance operators do not depend on covariates one would simply average
and
over
.
Smoothing these empirical estimates is a non-trivial problem. Indeed, as in the BLSA
and
, both
and
contain
25 million entries and trivariate smoothing becomes computationally prohibitive. To solve this problem, we propose to extend the sandwich smoother (Xiao and others, 2013, 2014) and investigate smooth covariance estimators
of the type
, where
and
denote two univariate symmetric smoother matrices that are constructed by
P-splines (Eilers and Marx, 1996) and smooth along the
and
directions of
. Then the smoother has the form
![]() |
(2.4) |
where
is the smooth estimator for
and
is the
th column of
. This suggests a two-step estimation procedure: (i) smooth each
using the sandwich smoother described in Section 2.2 and (ii) weight the smoothed
by
to obtain
. The first step implies that the same smoother matrix
is used for smoothing all
, whereas the second step resembles univariate smoothing, though the data points are square matrices instead of scalars.
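By Mercer's theorem, the between-subject covariance operator $K_Z(s, t, x)$ can be decomposed as
$$K_Z(s, t, x) = \sum_{k \geq 1} \lambda_k(x)\, \phi_k(s, x)\, \phi_k(t, x),$$
where for a fixed $x$, $\{\phi_k(\cdot, x)\}_{k \geq 1}$ is a set of orthonormal basis functions and $\lambda_1(x) \geq \lambda_2(x) \geq \cdots \geq 0$ are the associated eigenvalues. We refer to $\phi_k(t, x)$ as eigensurfaces. Then the subject-specific process $Z_i(t, x_i)$ has the Karhunen–Loeve representation $Z_i(t, x_i) = \sum_{k \geq 1} \xi_{ik}\, \phi_k(t, x_i)$, where $\xi_{ik}$ are scores with zero mean and variance $\lambda_k(x_i)$, and are uncorrelated for all $i$ and $k$. Estimates of $\lambda_k(x)$ and $\phi_k(t, x)$ can be obtained by eigendecompositions of an estimate of $K_Z$; see supplementary material available at Biostatistics online, Section S.3 for further details and prediction of scores.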
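For a single fixed age, the eigendecomposition step amounts to diagonalizing one covariance matrix evaluated on a time grid. The sketch below shows that step only; the function name `eigen_layer`, the midpoint grid, and the toy two-component covariance are assumptions for illustration, and the procedure in the supplementary material may differ in detail.

```python
import numpy as np

def eigen_layer(K, grid_step, n_components=2):
    """Eigenvalues, eigenfunctions and variance shares of one covariance layer."""
    K = (K + K.T) / 2.0                              # enforce symmetry
    vals, vecs = np.linalg.eigh(K)
    order = np.argsort(vals)[::-1]                   # sort eigenvalues in descending order
    vals = np.clip(vals[order], 0.0, None)           # truncate negative eigenvalues
    vecs = vecs[:, order]
    eigenvalues = vals * grid_step                   # matrix scale -> operator scale
    eigenfunctions = vecs / np.sqrt(grid_step)       # unit L2 norm on the grid
    share = eigenvalues / eigenvalues.sum()          # proportion of total variation
    return (eigenvalues[:n_components],
            eigenfunctions[:, :n_components],
            share[:n_components])

# Toy covariance at one age: two orthonormal components with variances 2 and 0.5.
n_grid = 100
t = (np.arange(n_grid) + 0.5) / n_grid               # midpoint grid on [0, 1]
phi1 = np.sqrt(2.0) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2.0) * np.cos(2 * np.pi * t)
K_age = 2.0 * np.outer(phi1, phi1) + 0.5 * np.outer(phi2, phi2)

eigenvalues, eigenfunctions, share = eigen_layer(K_age, grid_step=1.0 / n_grid)
```

Repeating this for each age on a grid gives age-dependent eigenvalues and, after aligning signs across neighboring ages, eigenfunctions that can be stacked into a surface.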
2.3.1. Selection of smoothing parameters
For the two-step smoothing procedure of the covariance operators we need to estimate the smoothing parameters in
and
. First note that
and
in (2.3) can be rewritten as
and
, where
and
are two symmetric matrices. Let
and
be two matrices such that
and
. Then for selecting the smoothing parameter in
(for
), we extend the pooled GCV (PGCV) in Xiao and others (2014) to the case of simultaneously smoothing
covariance matrices
. The simplified formula in Proposition 2 of Xiao and others (2014) can be used to calculate PGCV. By minimizing PGCV, we can easily select the smoothing parameter in
.
Next we consider the smoothing parameter in
(for
). First note that if we let
, then equation (2.4) implies that we use the same smoother matrix
for all columns of
. This suggests an extension of GCV for smoothing matrices,
. We also derive a fast formula for calculating GCVM; see supplementary material available at Biostatistics online, Section S.4 for details.
3. Simulation studies
We conduct simulations to compare the sandwich smoother using iGCV with the original sandwich smoother, which was developed for independent data. We also investigate the accuracy of the proposed trivariate spline smoother in Section 2.3 for estimating covariate-dependent covariance operators. At the same time, we assess the speed of the proposed methods.
3.1. Simulation settings
We generate data from the model
![]() |
where
,
,
is the number of data points per subject, and
is the number of subjects. Here
is a multiplier that controls the ratio of between- and within-subject covariance. We simulate the random scores and noises independently with
,
,
,
, and
. We fix the eigensurfaces
and
and design a full $2^5$
factorial experiment with five factors, each with two levels: the mean function
, the eigenvalue functions
, the number of subjects
, the multiplier
, and the noise level
. The supplementary material available at Biostatistics online provides details of the design. Our experiment creates 32 different sets of model parameters and we simulate 100 datasets under each model setting.
3.2. Simulation results
3.2.1. Mean function estimation
We compared three methods: sandwich smoother with GCV (SS-GCV), sandwich smoother with iGCV (SS-iGCV), and thin plate regression splines with GCV (TPRS-GCV). We evaluated the performance of the three methods by mean integrated squared errors. The results indicate that SS-iGCV always outperforms SS-GCV and is better than or comparable with TPRS-GCV; see supplementary material available at Biostatistics online, Tables S1 and S2 for the detailed results. The simulations illustrate that when data are correlated within subjects, the sandwich smoother based on iGCV performs very well for estimating the mean function.
As for computation speed, our results (supplementary material available at Biostatistics online, Table S3) indicate that SS-iGCV, while slower than SS-GCV, is still an order of magnitude faster than TPRS-GCV. The computational efficiency of SS-iGCV is essential for bootstrap-based inference. Indeed, using SS-iGCV allowed us to smooth
bootstrap of dimensions
and
in. 1 h.
3.2.2. Covariance operator estimation
We investigated the performance of the trivariate spline smoother proposed in Section 2.3 for estimating covariate-dependent covariance operators. We applied the method to
, where
was estimated by SS-iGCV. We found that the method worked well; see supplementary material available at Biostatistics online, Table S4 for the detailed results. We also recorded the computation times of the proposed trivariate smoother; see supplementary material available at Biostatistics online, Table S5. The simulation results show that the proposed trivariate smoother is, indeed, very fast.
4. Results for the BLSA
We apply the functional data model (2.1) in Section 2 to the BLSA data. In Section 4.1, we analyze the systematic rhythms based on the estimated surfaces. In Section 4.2, we analyze the variation of random rhythms based on estimated trivariate covariance operators.
4.1. Systematic circadian rhythms
The left panels of Figure 3 show the heat maps of estimated bivariate activity surfaces using log counts for women, men and the associated gender difference. In the two heat maps for both genders (panels (a) and (b) in Figure 3), blue corresponds to no or little activity, green and yellow correspond to light intensity activity, and red corresponds to moderate and vigorous activity intensities. Blue areas overlap with the known resting periods of people; for example,
PM to
AM for
-year-old people. Red areas overlap with the working and activity hours of the day; for example,
AM to
PM for
-year-old people and
AM to
PM for
-year-old people. Panels (d) and (e) in Figure 3 display the estimated mean activity profiles for different ages and reveal two interesting patterns: (i) Mean activity is decreasing with increasing age at all times during the day and (ii) activity peaks are dropping with increasing age with larger decreases in late afternoon and evening. In particular, for people aged
or younger, mean activity is fairly constant between
AM and
PM. For older people, there seems to be a decline in activity between
and
PM, with activity picking up again just before
PM. However, the second peak is much lower than that of younger people. These results provide the first comprehensive and detailed look at the lifetime circadian rhythms of activity in a large cohort.
Fig. 3.
Heat maps of estimated activity surfaces for women, men, and gender difference (left column) and estimated circadian activity curves of five different ages for women, men, and gender difference (right column). Note that the color bar in the bottom left heat map differs from those in the other two heat maps.
Panels (c) and (f) in Figure 3 display the age-specific gender difference in daily physical activity. Each point in (c) provides the estimated activity difference between men and women at a given time of day and age. Plot (c) indicates that women who are 60 or older are more active than men in the same age category; this contradicts the results reported by Troiano and others (2008) in the NHANES study. We believe that these differences are due to the large skewness of the activity count data, which was ignored by Troiano and others (2008). We have also conducted formal tests for gender differences in physical activity. More precisely, we focus on the parameter
, where
and
denote the log activity surface for women and men, respectively.
can be interpreted as the difference in the sum of log counts over the entire day between men and women of age
. For each
, the null hypothesis is
against
. Estimators of
can be used to test at what age the gender difference is significant, while a nonparametric bootstrap of subjects approach (Crainiceanu and others, 2012) can be used to conduct the tests. To obtain the null distribution of the tests, we used a permutation approach where we randomly permuted the gender of subjects and applied the sandwich smoother using iGCV. Because the sandwich smoother is fast we obtained
bootstraps in 1 h using a standard laptop. The joint tests over age provide confirmatory evidence that women between
and
are more active than men of the same age. Supplementary material available at Biostatistics online, Section S6 has more details.
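The permutation idea used to obtain the null distribution can be sketched in a much simplified setting. The analysis above permutes gender labels of subjects and re-applies the sandwich smoother at each permutation; the sketch below drops the smoothing and the age dimension and tests a single scalar statistic, the difference in mean subject-level activity between women and men. The function name, the one-sided alternative, and the simulated data are all assumptions for illustration.

```python
import numpy as np

def permutation_pvalue(subject_means, is_female, n_perm=2000, seed=0):
    """One-sided permutation test for mean(women) - mean(men) > 0."""
    rng = np.random.default_rng(seed)
    subject_means = np.asarray(subject_means, dtype=float)
    is_female = np.asarray(is_female, dtype=bool)

    def statistic(labels):
        return subject_means[labels].mean() - subject_means[~labels].mean()

    observed = statistic(is_female)
    # Permute gender labels across subjects, keeping each subject's data intact.
    permuted = np.array([statistic(rng.permutation(is_female)) for _ in range(n_perm)])
    p_value = (1 + np.sum(permuted >= observed)) / (n_perm + 1)
    return observed, p_value

# Simulated subject-level mean log counts (hypothetical values).
rng = np.random.default_rng(3)
women = rng.normal(loc=6.0, scale=0.5, size=50)
men = rng.normal(loc=5.0, scale=0.5, size=50)
values = np.concatenate([women, men])
labels = np.concatenate([np.ones(50, dtype=bool), np.zeros(50, dtype=bool)])
observed, p_value = permutation_pvalue(values, labels)
```

Permuting at the subject level, rather than at the level of individual daily profiles, respects the within-subject correlation of repeated days.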
So far, all results referred to time-of-day patterns of activity. Next, we focus on activity intensity irrespective of when the activity was conducted during the day. Thus, we investigate the sorted log counts function introduced in Section 1, which reduces the heterogeneity due to lack of temporal synchronization of activities between people and days. Panels (a) and (b) in Figure 4 are the heat maps of the estimated activity surfaces of sorted log counts for women and men, respectively. Inspection of these plots indicates that overall durations of active periods (active if sorted log counts
log 11) and moderate/vigorous activity (moderate/vigorous if sorted log counts
log 201, shown in dark red) are decreasing as a function of age. For example,
-year-old women have on average
7 active hours, while
-year-old women have on average
6 active hours. The active time is independent of when activity occurs. Similarly,
-year-old men have
25 min of moderate/vigorous activity, while
-year-old men have, on average, only
10 min of moderate/vigorous activity. Panels (c) and (f) in Figure 4 display the difference between men and women in average sorted log counts. These results combined with plots (a) and (b) in Figure 4 indicate that light activity is conducted over longer periods by women than by men. In contrast, the intensity of moderate/vigorous activities in men is higher than in women, though the periods over which these activities are performed are very short. Similar to log counts, we conducted tests on
, where
and
now denote the sorted log activity surface for women and men, respectively. The hypotheses are: For each
,
against
. The test provides similar evidence that women between 65 and 75 are more active than men of the same age.
Fig. 4.
Heat maps of estimated sorted log activity surfaces for women, men, and gender difference (left column) and estimated log activity curves of five different ages for women, men, and gender difference (right column). Note that the color bar in the bottom left heat map differs from those in the other two heat maps.
To summarize, the results indicate that: (i) Duration of active periods, amount of activity, duration of moderate/vigorous activity, and activity peaks are all decreasing as a result of aging; (ii) more pronounced reductions with age in activity intensities happen in the second part of the day with a bigger trade-off of high to low intensity in men than in women; (iii) there is evidence of gender differences, though it goes in the opposite direction of what is reported in the literature and our results indicate that women tend to have longer periods of light activity, while younger men tend to have short periods of higher intensity activities when exercising moderately or vigorously.
4.2. Random circadian rhythms
Here, we analyze the variation in the activity profiles of sorted log counts, which can be modeled by the combination of a between-subject covariance operator and a within-subject covariance operator. Analysis for the log counts can be found in supplementary material available at Biostatistics online, Section S6. The covariance operators are age dependent and were estimated by the trivariate smoother in Section 2.3. In particular, we assumed the eigensurfaces of covariance operators are the same for both genders. Figure 5 displays two eigensurfaces for each covariance operator. Each row of the eigensurfaces is an eigenfunction corresponding to a certain age. For the between-subject covariance, the first and second eigenfunctions account for 79–83% and 14–18% of the total variation; for the within-subject covariance the corresponding ranges of proportions are 84–88% and 8–12%. Note that the proportion of variation for the eigenfunctions is allowed to change with age. Now we interpret the eigenfunctions of the between-subject covariance. The rows of the first eigensurfaces are mostly positive, suggesting that individuals with positive scores on the first eigenfunction are more active than the population average and those with negative scores are less active. Here “more active” means longer duration of active periods and higher intensity activity at each
, whereas “less active” means shorter duration of active periods and activity of lower intensities at each
. Therefore, each subject's score on the first eigenfunction can be used to identify physically more active and less active subjects. The second eigensurfaces show a contrast between moderate/vigorous activity and light activity: people with positive scores on the second eigenfunction have relatively more moderate/vigorous activity and less light-intensity activity. A possible use of the scores on the second eigenfunction is to distinguish people who conduct moderate/vigorous activity from those who do not. The eigensurfaces for the within-subject covariance are similar and quantify the day-to-day within-subject variability of activity.
Fig. 5.
Estimated between-subject eigensurfaces (left panels) and within-subject eigensurfaces (right panels) for sorted log counts.
For each covariance operator, we calculated the between- and within-subject variability at each age, that is, the sum of all eigenvalues (see Figure 6). The between-subject variability first decreases as a function of age; this suggests that activity patterns of older people are more similar than those of younger people. Then, starting from around age 70, the between-subject variability increases. The within-subject variability is a decreasing function of age. This indicates that day-to-day variability of activity for older people is smaller than that for younger people. Finally, the within-subject variability is smaller than the between-subject variability and the relative difference becomes bigger with age, showing that day-to-day variability of activity decreases faster than the between-subject variability.
Fig. 6.

Estimated total variation of the between-subject covariance (left panel) and the within-subject covariance (right panel) for sorted log counts for each gender. Total variation is the sum of all eigenvalues.
5. Discussion
The scientific goal of the paper was to quantify the systematic and random patterns of activity as functions of age. To achieve this, we introduced an explicit model containing: (i) a bivariate smooth function of time of day and age to capture the systematic component and (ii) two latent processes that depend on time of day and age and describe the within- and between-subject variability. Using this model, we have shown that (i) average activity decreases with age at every time during the day, (ii) activity decreases faster with age in late afternoon and evening, (iii) older women tend to be more active than older men in low-intensity activities, and (iv) the amount of within-subject variability is comparable with the amount of between-subject variability and both are decreasing functions of age.
From a methodological perspective, we have introduced bivariate and trivariate smoothing of data with sizable within- and between-subject correlation using leave-one-subject-out generalized cross validation (iGCV). Coupling iGCV with the fast sandwich smoother made these analyses computationally feasible; the R code will be made publicly available.
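The idea of choosing a smoothing parameter by leaving out one subject at a time can be illustrated with the brute-force Python sketch below. It is not the iGCV criterion or the sandwich smoother used in the paper: as a stand-in, it uses a ridge-penalized Fourier basis in one dimension and refits the model once per held-out subject. All function names (`design`, `fit_ridge`, `loso_cv_score`, `select_lambda`), the simulated data, and the grid of penalties are assumptions made for illustration.

```python
import numpy as np


def design(t, n_pairs=3):
    """Fourier design matrix on [0, 1): intercept plus sine/cosine pairs."""
    cols = [np.ones_like(t)]
    for k in range(1, n_pairs + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    return np.column_stack(cols)


def fit_ridge(X, y, lam):
    """Penalized least squares; the intercept is left unpenalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


def loso_cv_score(t, y, subject, lam):
    """Sum of squared prediction errors, holding out one whole subject at a time."""
    X = design(t)
    score = 0.0
    for s in np.unique(subject):
        held_out = subject == s
        beta = fit_ridge(X[~held_out], y[~held_out], lam)
        resid = y[held_out] - X[held_out] @ beta
        score += float(resid @ resid)
    return score


def select_lambda(t, y, subject, grid):
    """Return the penalty in `grid` with the smallest leave-one-subject-out score."""
    scores = [loso_cv_score(t, y, subject, lam) for lam in grid]
    return grid[int(np.argmin(scores))], scores


# Simulated data: 20 subjects, 30 observations each, with a subject-level
# random shift that induces within-subject correlation.
rng = np.random.default_rng(1)
n_subj, n_obs = 20, 30
subject = np.repeat(np.arange(n_subj), n_obs)
t = rng.uniform(0.0, 1.0, size=n_subj * n_obs)
y = (np.sin(2 * np.pi * t)
     + np.repeat(rng.normal(0.0, 0.5, size=n_subj), n_obs)
     + rng.normal(0.0, 0.3, size=n_subj * n_obs))

grid = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam, scores = select_lambda(t, y, subject, grid)
print("selected penalty:", best_lam)
```

Holding out all observations from a subject at once keeps correlated measurements from the same person out of the training set, which is the motivation for subject-level rather than observation-level cross validation. The explicit refitting loop is only practical for small problems.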
Supplementary material
Supplementary Material is available at http://biostatistics.oxfordjournals.org.
Funding
This work was supported by Grant Number R01NS060910 from the National Institute of Neurological Disorders and Stroke. This work represents the opinions of the researchers and not necessarily those of the granting organizations.
Acknowledgement
Conflict of Interest: None declared.
References
- Bai J., He B., Shou H., Zipunnikov V., Glass T. A., Crainiceanu C. M. (2014). Normalization and extraction of interpretable metrics from raw accelerometry data. Biostatistics 15, 102–116.
- Bussmann J. B., Martens W. L., Tulen J. H., Schasfoort F. C., van den Berg-Emons H. J., Stam H. J. (2001). Measuring daily behavior using ambulatory accelerometry: the activity monitor. Behavior Research Methods, Instruments, & Computers 33(3), 349–356.
- Cardot H. (2007). Conditional functional principal components analysis. Scandinavian Journal of Statistics 34, 317–335.
- Crainiceanu C. M., Staicu A., Ray S., Punjabi N. (2012). Bootstrap-based inference on the difference in the means of two correlated functional processes. Statistics in Medicine 31, 3223–3240.
- Craven P., Wahba G. (1979). Smoothing noisy data with spline functions. Numerische Mathematik 31, 377–403.
- Culhane K. M., O’Connor M., Lyons D., Lyons G. M. (2005). Accelerometers in rehabilitation medicine for older adults. Age Ageing 34, 556–560.
- Di C., Crainiceanu C. M., Caffo B. S., Punjabi N. (2009). Multilevel functional principal component analysis. Annals of Applied Statistics 3, 458–488.
- Eilers P. H. C., Marx B. D. (1996). Flexible smoothing with B-splines and penalties (with Discussion). Statistical Science 11, 89–121.
- Eilers P. H. C., Marx B. D. (2003). Multivariate calibration with temperature interaction using two-dimensional penalized signal regression. Chemometrics and Intelligent Laboratory Systems 66, 159–174.
- Fan J., Gijbels I. (1996). Local Polynomial Modelling and Its Applications. London: Chapman & Hall/CRC.
- Ferrucci L., Alley D. (2007). Obesity, disability, and mortality. Archives of Internal Medicine 167, 750–751.
- Greven S., Crainiceanu C. M., Caffo B. S., Reich D. (2010). Longitudinal functional principal component analysis. Electronic Journal of Statistics 4, 1022–1054.
- Guo W. (2002). Functional mixed effects models. Biometrics 58, 121–128.
- He B., Bai J., Zipunnikov V., Koster A., Paolo C., Lange-Maria B., Glynn N. W., Harris T. B., Crainiceanu C. M. (2014). Predicting human movement with multiple accelerometers using movelets. Medicine & Science in Sports & Exercise 46, 1859–1866.
- Jiang C., Wang J. (2010). Covariate adjusted functional principal components analysis for longitudinal data. Annals of Statistics 38, 362–388.
- Lin X., Carroll R. J. (2000). Nonparametric function estimation for clustered data when the predictor is measured without/with error. Journal of the American Statistical Association 95, 520–534.
- Morris J. S., Arroyo C., Coull B. A., Ryan L. M., Herrick R., Gortmaker S. L. (2006). Using wavelet-based functional mixed models to characterize population heterogeneity in accelerometer profiles: a case study. Journal of the American Statistical Association 101, 1352–1364.
- Morris J. S., Carroll R. J. (2006). Wavelet-based functional mixed models. Journal of the Royal Statistical Society B 68, 179–199.
- Pate R. R., Pratt M., Blair S. N. and others (1995). Physical activity and public health: a recommendation from the Centers for Disease Control and Prevention and the American College of Sports Medicine. Journal of the American Medical Association 273, 402–407.
- Reiss P. T., Huang L., Mennes M. (2010). Fast function-on-scalar regression with penalized basis expansions. The International Journal of Biostatistics 6, 28.
- Rice J. A., Silverman B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society: Series B 53, 233–243.
- Sallis J. F., Saelens B. E. (2000). Assessment of physical activity by self-report: status, limitations, and future directions. Research Quarterly for Exercise & Sport 71, S1–14.
- Schrack J. A., Zipunnikov V., Goldsmith J., Bai J., Simonsick E. M., Crainiceanu C. M., Ferrucci L. (2014). Assessing the “physical cliff”: detailed quantification of aging and patterns of physical activity. The Journals of Gerontology Series A: Biological Sciences and Medical Sciences 69, 973–979.
- Shou H., Zipunnikov V., Crainiceanu C. M., Greven S. (2014). Structured functional principal component analysis. Biometrics (to appear).
- Staicu A. M., Crainiceanu C. M., Carroll R. J. (2010). Fast methods for spatially correlated multilevel functional data. Biostatistics 11, 177–194.
- Stone J. L., Norris A. H. (1966). Activities and attitudes of participants in the Baltimore Longitudinal Study. The Journals of Gerontology 21, 575–580.
- Troiano R. P., Berrigan D., Dodd K. W., Masse L. C., Tilert T., McDowell M. (2008). Physical activity in the United States measured by accelerometer. Medicine & Science in Sports & Exercise 40, 181–188.
- U.S. Department of Health and Human Services. (2010). The surgeon general's vision for a healthy and fit nation. Rockville, MD: U.S. Department of Health and Human Services, Office of the Surgeon General.
- Wood S. N. (2003). Thin plate regression splines. Journal of the Royal Statistical Society: Series B 65, 95–114.
- Xiao L., Li Y., Ruppert D. (2013). Fast bivariate P-splines: the sandwich smoother. Journal of the Royal Statistical Society: Series B 75, 577–599.
- Xiao L., Ruppert D., Zipunnikov V., Crainiceanu C. M. (2014). Fast covariance function estimation for high-dimensional functional data. Statistics and Computing (to appear).
- Yao F., Müller H., Clifford A. J., Dueker S. R., Follett J., Lin Y., Buchholz B. A., Vogel J. S. (2003). Shrinkage estimation for functional principal component scores with application to the population kinetics of plasma folate. Biometrics 59, 676–685.
- Zhou L., Huang J. Z., Martinez J. G., Maity A., Baladandayuthapani V., Carroll R. J. (2010). Reduced rank mixed effects models for spatially correlated hierarchical functional data. Journal of the American Statistical Association 105, 390–400.
- Zipunnikov V., Caffo B. S., Crainiceanu C. M., Yousem D. M., Davatzikos C., Schwartz B. S. (2011). Multilevel functional principal component analysis for high-dimensional data. Journal of Computational and Graphical Statistics 20, 852–873.
- Zipunnikov V., Greven S., Shou H., Caffo B. S., Reich D. S., Crainiceanu C. M. (2014). Longitudinal high-dimensional principal components analysis with application to diffusion tensor imaging of multiple sclerosis. Annals of Applied Statistics (to appear).