Abstract
Motivated by physical activity data obtained from the BodyMedia FIT device (www.bodymedia.com), we take a functional data approach for longitudinal studies with continuous proportional outcomes. The functional structure depends on three factors. In our three-factor model, the regression structures are specified as curves measured at various factor-points with random effects that have a correlation structure. The random curve for the continuous factor is summarized using a few important principal components. The difficulties in handling the continuous proportion variables are solved by using a quasilikelihood type approximation. We develop an efficient algorithm to fit the model, which involves the selection of the number of principal components. The method is evaluated empirically by a simulation study. This approach is applied to the BodyMedia data with 935 males and 84 consecutive days of observation, for a total of 78,540 observations. We show that sleep efficiency increases with increasing physical activity, while its variance decreases at the same time.
Keywords: BodyMedia FIT device, continuous proportions, functional data, mixed-effects model, physical activity, sleep efficiency
1. Introduction
Motivated by physical activity data, we take a functional data approach for longitudinal studies on continuous proportional outcomes. The research is aimed at understanding the influence of physical activity on sleep efficiency. The response variable, sleep efficiency, is measured by the ratio of daily sleep time to lying down time for each participant (Lambiase et al., 2013), and thus it is a continuous proportion. The major explanatory factor, physical activity level, is measured by the daily minutes of moderate to vigorous physical activity (MVPA). The intensity of physical activity is evaluated in a unit called METs, with 1 MET being the energy required to sit quietly, a quantity that depends on one’s body weight and other characteristics. MVPA corresponds to 3–6 METs, that is, roughly when a person is moving fast enough or strenuously enough to burn three to six times as much energy per minute as when sitting quietly. Month and weekday effects may also influence sleep efficiency, and we take them into account as two additional factors. Therefore, our functional structure depends on three factors: physical activity, month, and weekday.
It has been widely reported that greater physical activity leads to greater sleep efficiency, both marginally (Lambiase et al., 2013; Oudegeest-Sander et al., 2013) and, in longitudinal data, within-person (Ekstedt et al., 2013). However, these studies mainly use correlation-based or linear regression methods, which do not take advantage of newly developed instruments, such as the BodyMedia FIT device (bodymedia.com), the ActiGraph device (actigraphcorp.com), and the ActivPal device (paltechnologies.com). These devices can measure physical activity data continuously (e.g., minute-by-minute) for an extended period (e.g., weeks). In this study, we use data from the BodyMedia FIT device. The BodyMedia FIT device is a multi-sensor armband which measures skin temperature, heat flux, galvanic skin response, and motion through a 3-axis accelerometer. With this device, there is the opportunity for new and more powerful statistical methods to understand physical activity and other outcomes, both within and between individuals.
The data we have are summaries of daily minutes of moderate to vigorous physical activity (MVPA) and daily sleep efficiency over 12 week periods with starting dates distributed throughout the calendar year. There are numerous questions that arise, including the following:
What is the effect of daily MVPA minutes on daily sleep efficiency?
If daily MVPA minutes lead to greater sleep efficiency, is the effect constant or is there some point where the effect of increasing MVPA plateaus or even decreases?
Does increasing MVPA minutes have influence on the stability (variability) of sleep efficiency?
The purpose of this paper is to develop initial methods that can be used to answer the questions and conjectures described above. We do not claim that the methods are the last word in the analysis of physical activity and sleep efficiency, but we believe that our work is novel because we take a functional data analysis approach to the question, while simultaneously recognizing explicitly that sleep efficiency is a variable that necessarily is constrained to the unit interval (0, 1), a constraint that, to the best of our knowledge, has not been considered in the functional data literature.
There are many studies for continuous proportional responses, but not for longitudinal functional data. One naive way is to ignore that the proportional outcomes should be in the unit interval (0, 1), and then fit the responses by ordinary linear regression, but this potentially gives predictions outside the unit interval (Kieschnick & McCullough, 2003). The beta distribution (Ferrari & Cribari-Neto, 2004) can also be employed to analyze the data. Simas et al. (2010) and Zhao et al. (2012) used beta regression with functional forms for predictors, but they were limited by assuming that all observations are independent. Verkuilen & Smithson (2012) and Figueroa-Zúñiga et al. (2013) used mixed-effects in the beta regression to model correlated data, but their fixed and random effect structures were not in functional formulations. On the other hand, continuous proportions can be analyzed by first taking the logit transformation of the outcomes, and then using linear regression to fit the data. However, as we will show, this approach can lead to biased results.
There are of course many statistical papers that focus on functional data analysis, but, as far as we are aware, there are only a few studies that focus on proportional responses. Hall et al. (2008) proposed a functional model for non-Gaussian data. However, the model estimation is partly based on a simplified first-order linear approximation, which is known to lead to biased results in some settings (Serban et al., 2013). Goldsmith et al. (2015) proposed a generalized multilevel function-on-scalar regression model for exponential family outcomes. Gertheiss et al. (2015) discussed a marginal functional regression model for binary outcomes, and Scheipl et al. (2016) studied generalized functional additive mixed models for outcomes with exponential family distributions as well as others such as the beta distribution. However, these models cannot handle the effects from the three factors in our data.
In the case of correlated functional data, a large number of random effects are required to model the smooth random curves. Current computational methods for random effects mainly use Monte Carlo or Gauss-Hermite quadrature approximations (Molenberghs & Verbeke, 2005), but in our context these are computationally expensive. For example, Figueroa-Zúñiga et al. (2013) suggested that the beta regression could require more quadrature nodes than logistic regression.
We address this problem with the approach proposed by Cox (1996), where a quasilikelihood method (Wedderburn, 1974) is used to model continuous proportions. The quasilikelihood does not require a full distribution for the outcomes but only the specification of the first and second moments. However, the existing methodology is limited to independent data, and we extend it to our longitudinal functional data scenario. In particular, the modeling of the correlation structure with respect to physical activity, month and weekday effects is necessary. To build a flexible model, we use functional random curves for the MVPA minutes. To avoid the dimension problem in random curves, we only use a few important functional principal components to summarize the random curves. This method was developed by Zhou et al. (2008) for the linear model, and we adapt this general approach to a continuous proportional response.
A new efficient algorithm is proposed. The algorithm includes both features of penalized quasilikelihood (Breslow & Clayton, 1993) and the eigen-decomposition discussed in Yao et al. (2005). Since our problem involves quasilikelihood modeling and random effects, a penalized quasilikelihood approach is convenient. On the other hand, the eigen-decomposition approach is efficient for simultaneous selection and estimation of the functional principal components. As a result, the new algorithm includes both procedures.
The paper is organized as follows. Section 2 describes the model, and Section 3 presents our model fitting algorithm. Section 4 gives results from a simulation study. Section 5 analyzes the BodyMedia data set involving physical activity, and suggests answers to the three questions posed earlier. Concluding remarks are given in Section 6.
2. Model
2.1. The Mixed Effects Model for Continuous Proportional Data
Let Yi(r, s, t) be a continuous proportional observation at MVPA minute r, month s and weekday t for subject i = 1… n. Each subject has mi observations. The possible values for s can be 1, 2,…, 12, which represents January to December, while t can be 1, 2,…, 7 indicating Sunday through Saturday. We use Yi (rij, sij, tij) to denote the jth observation for subject i. Define Yi = {Yi(ri1, si1, ti1),…, Yi (rimi, simi, timi)}⊤.
According to the quasilikelihood method suggested by Wedderburn (1974) and McCullagh & Nelder (1989), we only specify the first and second moments for the outcomes. The mean and variance functions of Yi(r, s, t) given random effects are
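$$E\{Y_i(r,s,t)\mid U_i(r,s,t)\} = H\{\mu(r,s,t) + U_i(r,s,t)\}, \qquad \mathrm{var}\{Y_i(r,s,t)\mid U_i(r,s,t)\} = \sigma^2\, E\{Y_i(r,s,t)\mid U_i(r,s,t)\}\,[1 - E\{Y_i(r,s,t)\mid U_i(r,s,t)\}], \tag{1}$$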
where H(·) denotes the logistic distribution function, μ(r, s, t) is a fixed curve, Ui(r, s, t) is a random effects curve, and σ2 is a dispersion parameter. We further assume that given the random effects Ui(r, s, t), the variables in Yi are independent.
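As a minimal numerical sketch of the moment specification, the snippet below evaluates a conditional mean and variance, assuming the mean is H{μ + U} and the variance is σ2 times the mean times one minus the mean (the form implied by the beta simulation in Section 4). The function name `conditional_moments` and the input values are illustrative assumptions, not part of the paper.

```python
import numpy as np

def H(x):
    """Logistic distribution function."""
    return 1.0 / (1.0 + np.exp(-x))

def conditional_moments(mu, U, sigma2):
    """Conditional mean and variance of Y given the random effect U.

    Assumes mean = H(mu + U) and variance = sigma2 * mean * (1 - mean).
    """
    mean = H(mu + U)
    var = sigma2 * mean * (1.0 - mean)
    return mean, var

# Illustrative values only: zero linear predictor, dispersion 1/30.
mean, var = conditional_moments(mu=0.0, U=0.0, sigma2=1.0 / 30.0)
```

Only these two moments enter the quasilikelihood; no full distribution for the outcome is needed.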
Cox (1996) discussed other candidates for modeling Yi(r, s, t). For example, the variance structure can be specified as
However, unlike Cox’s independent observation scenarios, our method involves the random effect curves Ui(r, s, t) to model the correlation structure. Therefore, we focus on model (1).
We further specify μ(r, s, t) and Ui(r, s, t) terms in (1) by additive models
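$$\mu(r,s,t) = \mu_0(r) + \mu_1(s) + \mu_2(t), \tag{2}$$
$$U_i(r,s,t) = U_{i,0}(r) + U_{i,1}(s) + U_{i,2}(t), \tag{3}$$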
where μ0(r), μ1(s) and μ2(t) are fixed curves at r, s, t, and Ui,0(r), Ui,1(s), Ui,2(t) are random curves at r, s, t, respectively. For model identifiability, we set μ1(1) = μ2(1) = 0. We also assume that Ui,0(r), Ui,1(s) and Ui,2(t) are mutually independent for all r, s, t.
2.2. Basis Functions
To model the fixed and random curves in (2) and (3), let b0(r) = {b0,1(r),…, b0,q0(r)}⊤, b1(s) = {b1,1(s),…, b1,q1(s)}⊤ and b2(t) = {b2,1(t),…, b2,q2(t)}⊤ be the vectors of orthogonal B-spline basis functions evaluated at physical activity minutes r, month s and weekday t, respectively. The orthogonal B-spline basis functions can be computed using an exact approach found in the R package “orthogonalsplinebasis” (Redd, 2011; R Core Team, 2016).
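We model the fixed effect curves to be
$$\mu_0(r) = b_0(r)^\top\beta_0, \qquad \mu_1(s) = b_1(s)^\top\beta_1, \qquad \mu_2(t) = b_2(t)^\top\beta_2,$$
where β0, β1, β2 are q0 × 1, q1 × 1, q2 × 1 regression coefficient vectors, and s = 2,…, 12 and t = 2,…, 7.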
For the random effect curve Ui,0(r), we set
$$U_{i,0}(r) = b_0(r)^\top u_{i,0},$$
where ui,0 are q0 × 1 correlated random effect vectors. In practice, when q0 is large, the estimation of the variance structure for ui,0 could be difficult. Based on the principal component approach (Zhou et al., 2008), we summarize ui,0 by using only a few principal components by setting ui,0 = θ1αi,1 + ⋯ + θLαi,L, where L is the number of principal components, θℓ is the ℓth q0 × 1 orthogonal principal component vector, and αi,ℓ is the ℓth principal component score. For identifiability, the principal components are sorted in decreasing order by the variance of αi,ℓ, and the αi,ℓ are set to be independent across all ℓ = 1,…, L. Denote Θ = (θ1,…, θL) and αi = (αi,1,…, αi,L)⊤. We assume αi ~ Normal(0, Δ) where Δ = diag(Δ1,…, ΔL), and thus ui,0 ~ Normal(0, Ψ0) with Ψ0 = ΘΔΘ⊤. We further denote fℓ(r) = b0(r)⊤θℓ as the ℓth principal component curve.
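To make the role of the principal component curves concrete, the following short derivation sketches the variance and covariance of the random curve. It assumes, as the surrounding definitions suggest, that the random curve is the basis vector times the random coefficient vector and that the ℓth principal component curve is the basis vector times θℓ; it is offered as an illustration rather than as a result quoted from the paper.

```latex
% Assumed: U_{i,0}(r) = b_0(r)^\top u_{i,0}, \quad u_{i,0} = \Theta\alpha_i,
% \quad f_\ell(r) = b_0(r)^\top\theta_\ell, \quad \alpha_i \sim \mathrm{Normal}(0,\Delta).
\begin{aligned}
U_{i,0}(r) &= b_0(r)^\top \Theta \alpha_i = \sum_{\ell=1}^{L} f_\ell(r)\,\alpha_{i,\ell},\\
\mathrm{cov}\{U_{i,0}(r), U_{i,0}(r')\} &= b_0(r)^\top \Psi_0\, b_0(r')
  = \sum_{\ell=1}^{L} \Delta_\ell\, f_\ell(r)\, f_\ell(r'),\\
\mathrm{var}\{U_{i,0}(r)\} &= \sum_{\ell=1}^{L} \Delta_\ell\, f_\ell(r)^2 .
\end{aligned}
```

Under these assumptions, with a single component (L = 1), the between-subject variance of the random curve at r is proportional to the square of the first principal component curve at r, so a curve that shrinks toward zero corresponds to less between-subject variability.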
Remark 1
Instead of using principal components, there are two commonly used models for the random effect curve Ui,0(r): a random-intercept model, in which Ui,0(r) is a subject-specific scalar random effect that does not depend on r, and a random-slope model, in which Ui,0(r) is a scalar random intercept plus a scalar random slope multiplied by r. The first formulation only involves random intercepts, which implies homoscedasticity for Ui,0(r) across r. The second model has an additional random-slope term, so that the variance of Ui,0(r) is a quadratic function of r. However, as we show in the simulation study, both formulations can be limited when the random effect structure is complicated, and they can lead to biased estimates.
For Ui,1(s) and Ui,2(t), we use dummy variables given as
$$U_{i,1}(s) = \sum_{k=1}^{12} I(s = k)\, u_{i,1,k}, \qquad U_{i,2}(t) = \sum_{k=1}^{7} I(t = k)\, u_{i,2,k},$$
where I(·) is an indicator function, and ui,1 = (ui,1,1,…, ui,1,12)⊤ and ui,2 = (ui,2,1,…, ui,2,7)⊤ are 12 × 1 and 7 × 1 random effect vectors. We assume ui,1 ~ Normal(0, Ψ1) and ui,2 ~ Normal(0, Ψ2) with Ψ1 = diag(Ψ1,1,…, Ψ1,12) and Ψ2 = diag(Ψ2,1,…, Ψ2,7).
Therefore, the model (2)–(3) can be rewritten as
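$$\mu(r,s,t) = b_0(r)^\top\beta_0 + b_1(s)^\top\beta_1 + b_2(t)^\top\beta_2, \tag{4}$$
$$U_i(r,s,t) = b_0(r)^\top\Theta\alpha_i + \sum_{k=1}^{12} I(s = k)\, u_{i,1,k} + \sum_{k=1}^{7} I(t = k)\, u_{i,2,k}. \tag{5}$$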
The modeling with B-splines involves six sets of parameters to be estimated: (a) the dispersion parameter: σ2; (b) the B-spline coefficients for the fixed effects: β0, β1 and β2; (c) the number of principal components: L; (d) the B-spline coefficients for the principal component functions: Θ; (e) the covariance matrix of the principal component scores: Δ; and (f) the covariance matrices for ui,1 and ui,2: Ψ1 and Ψ2.
3. Model Fitting Procedure
3.1. Second Order Approximation for Continuous Proportions
Estimation of the parameters is complicated by the continuous proportional outcomes. We approximate the continuous proportions using a penalized quasilikelihood that includes a second order approximation term. This method was introduced in Goldstein & Rasbash (1996) and it outperforms the methods proposed by Breslow & Clayton (1993). The method is as follows. Since H(·) is the logistic distribution function, the first and second derivatives of H(·) are H′(·) = H(·){1 − H(·)} and H″(·) = {1 − 2H(·)}H′(·). Let g(·) = 1/H′(·).
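The derivative identities for the logistic distribution function can be checked numerically. The sketch below compares the closed forms H′ = H(1 − H) and H″ = (1 − 2H)H′ with central finite differences; the function names, grid and step size are illustrative choices, not taken from the paper.

```python
import numpy as np

def H(x):
    """Logistic distribution function."""
    return 1.0 / (1.0 + np.exp(-x))

def H1(x):
    """First derivative: H'(x) = H(x) * (1 - H(x))."""
    h = H(x)
    return h * (1.0 - h)

def H2(x):
    """Second derivative: H''(x) = (1 - 2 H(x)) * H'(x)."""
    return (1.0 - 2.0 * H(x)) * H1(x)

def g(x):
    """g(x) = 1 / H'(x), as used in the working variance."""
    return 1.0 / H1(x)

# Compare the closed forms with central finite differences on a grid.
x = np.linspace(-4.0, 4.0, 81)
eps = 1e-4
num_H1 = (H(x + eps) - H(x - eps)) / (2.0 * eps)
num_H2 = (H(x + eps) - 2.0 * H(x) + H(x - eps)) / eps**2
max_err_H1 = np.max(np.abs(num_H1 - H1(x)))
max_err_H2 = np.max(np.abs(num_H2 - H2(x)))
```

At a linear predictor of zero, H′ equals 1/4 and g equals 4, which is where the working variance σ2g is smallest.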
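Set X(r, s, t) = {b0(r)⊤, b1(s)⊤, b2(t)⊤}. Let I1(s) = {I(s = 1),…, I(s = 12)}⊤ and I2(t) = {I(t = 1),…, I(t = 7)}⊤, and set Z(r, s, t) = {b0(r)⊤, I1(s)⊤, I2(t)⊤}. Denote β = (β0⊤, β1⊤, β2⊤)⊤ and ui = (ui,0⊤, ui,1⊤, ui,2⊤)⊤. Given known values of (β̂, ûi), letting η̂i(r, s, t) = X(r, s, t)β̂ + Z(r, s, t)ûi, we use the approximate model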
| (6) |
where εi(r, s, t) ~ Normal[0, σ2g{η̂i(r, s, t)}]. For the derivation of this approximation, see Molenberghs & Verbeke (2005).
3.2. Estimation Algorithm
According to the second order approximation in (6), the transformed continuous proportional outcomes can be treated as continuous variables with normal distributions. We estimate the parameters using an ECME algorithm (Schafer, 1998). The ECME algorithm updates the fixed structure parameters by the Newton-Raphson approach, and updates the random effects parameters by the EM method. We provide a brief sketch of the model estimation procedure here; the details are in Appendix S1 of the supporting information.
We set the initial number of principal components to be L = q0, so that Ψ0 is a full rank covariance matrix. We also give initial values for the other parameters listed in Section 2.2. The iteration procedure is then
1. update β and σ2 by a Newton-Raphson approach,
2. update Ψ0, Ψ1 and Ψ2 by the EM method, and
3. update L, Θ, and Δ with an eigen-decomposition of Ψ0.
The entire procedure is iterated until convergence. For convergence properties of the ECME algorithm, see Liu & Rubin (1994).
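The eigen-decomposition step can be sketched as follows. This is an illustration only: it takes a symmetric covariance matrix, sorts its eigenpairs in decreasing order, and keeps the leading components. The rule used here to choose L (a cumulative variance-explained threshold, `var_explained`) is a hypothetical stand-in; the paper's actual selection rule is given in its Appendix S1 and is not reproduced here.

```python
import numpy as np

def principal_components(Psi0, var_explained=0.99):
    """Eigen-decompose a symmetric covariance matrix and keep leading components.

    var_explained is a hypothetical threshold used only for this sketch.
    Returns (L, Theta, Delta) such that Psi0 is approximately
    Theta @ Delta @ Theta.T.
    """
    evals, evecs = np.linalg.eigh(Psi0)        # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]            # reorder to decreasing variance
    evals = np.clip(evals[order], 0.0, None)   # guard against tiny negative values
    evecs = evecs[:, order]
    frac = np.cumsum(evals) / np.sum(evals)
    L = int(np.searchsorted(frac, var_explained) + 1)
    return L, evecs[:, :L], np.diag(evals[:L])

# Toy example: a rank-2 covariance with score variances 12 and 6.
rng = np.random.default_rng(0)
q0 = 6
Q, _ = np.linalg.qr(rng.normal(size=(q0, q0)))
Theta_true = Q[:, :2]
Psi0 = Theta_true @ np.diag([12.0, 6.0]) @ Theta_true.T
L, Theta, Delta = principal_components(Psi0)
```

Eigenvectors are determined only up to sign, so the recovered columns of `Theta` may differ from `Theta_true` by a sign flip while reproducing the same covariance matrix.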
3.3. Maximum Penalized Likelihood
The previous discussion focuses on the modeling of the response variables using basis functions. It is helpful however to introduce roughness penalties to regularize the fits of functions (Eilers & Marx, 1996). Denote θℓ to be the ℓth column of Θ.
We penalize the loglikelihood and update the parameters in each iteration to maximize
where ℒ is defined in the Appendix S1 equation (S.2), τβ and τθ are penalty parameters, and the penalty matrices are , and D= diag(D0, D1, D2).
Using maximum penalized likelihood has only a minor effect on the estimation algorithm, although of course it has a major effect on the estimation results. We describe the details in Appendix S1.3 of the supporting information.
In all of our work, we use five-fold crossvalidation to choose penalty parameters. We searched over a two dimensional grid for τβ and τθ. The tuning parameters are obtained by maximizing the crossvalidated loglikelihood
where the estimates Ê(Yi) and are described in supporting information for Appendix S1.4.
4. Simulation Studies
We use a simulation of 500 runs to assess the performance of our longitudinal functional additive model. There are n = 240 subjects, and each subject has 84 visits observed in 12 weeks; this is similar to our BodyMedia data, but with a much smaller number of subjects. Each week has complete observations from Monday to Sunday. We set each month to have four weeks. All subjects are observed in three consecutive months. For example, subject 1 is observed from January to March and subject 2 is observed from February to April. Then Yi(r, s, t) is generated according to the beta distribution with density function conditional on Ui(r, s, t) as
$$f(y) = \frac{\Gamma(\phi)}{\Gamma(\kappa\phi)\,\Gamma\{(1-\kappa)\phi\}}\; y^{\kappa\phi - 1}(1 - y)^{(1-\kappa)\phi - 1}, \qquad 0 < y < 1,$$
where Γ(·) is the gamma function, κ = E{Yi(r, s, t)|Ui(r, s, t)} and ϕ = 1/σ2 − 1. We set E{Yi(r, s, t)|Ui(r, s, t)} = H{μ0(r) + μ1(s) + μ2(t) + f1(r)αi,1 + f2(r)αi,2 + Ui,1(s) + Ui,2(t)}, where μ0(r) = H(r/2 − 5.5), μ1(s) = (s − 7)²/36 − 1 and μ2(t) = −(t − 3)²/5 + 0.8. The principal component curves are and . We generate αi,1, αi,2, Ui,1(s) and Ui,2(t) as normally distributed with zero means, and set Δ1 = 12, Δ2 = 6, Ψ1,s = 2 for all s, and Ψ2,t = 1 for all t. We also generate r as uniformly distributed in [0, 22]. For σ2, we studied σ2 = 0.02, following the suggestion of Figueroa-Zúñiga et al. (2013), and σ2 = 1/30, which is similar to the result of our data application in Section 5. Our method performs well in both scenarios, and we only report the simulation results for σ2 = 1/30 here.
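The beta generation step can be sketched in a few lines. The snippet assumes the usual mean and precision parameterization, with shape parameters κϕ and (1 − κ)ϕ and ϕ = 1/σ2 − 1, under which the variance is σ2κ(1 − κ); the helper name `rbeta_mean_dispersion`, the fixed mean of 0.8 and the sample size are illustrative assumptions rather than the simulation design itself.

```python
import numpy as np

def rbeta_mean_dispersion(kappa, sigma2, rng):
    """Draw beta variates with mean kappa and variance sigma2 * kappa * (1 - kappa).

    Assumes shape parameters kappa * phi and (1 - kappa) * phi,
    with phi = 1 / sigma2 - 1.
    """
    phi = 1.0 / sigma2 - 1.0
    return rng.beta(kappa * phi, (1.0 - kappa) * phi)

rng = np.random.default_rng(1)
kappa, sigma2 = 0.8, 1.0 / 30.0     # illustrative conditional mean and dispersion
y = rbeta_mean_dispersion(np.full(200_000, kappa), sigma2, rng)

sample_mean = y.mean()
sample_var = y.var()
target_var = sigma2 * kappa * (1.0 - kappa)
```

In a full simulation, `kappa` would vary by observation as the logistic function of the fixed and random curves; here it is held constant so that the sample moments can be compared with their targets.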
As a comparison to our method, three naive approaches are explored. The first approach (labeled as NAIVE1) follows our algorithm but uses a random-intercepts model for Ui,0(r) as discussed in Remark 1. The second method (labeled as NAIVE2) is similar to NAIVE1 but uses a random-slopes model. The third method (labeled as NAIVE3) uses an identical random effect structure as our method, but it first takes a logit transformation of the responses and then fits the outcomes by a linear functional data model (Zhou et al., 2008).
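One possible source of bias in a logit-then-linear analysis is that averaging on the logit scale and back-transforming does not recover the mean on the original scale. The sketch below illustrates this with beta draws whose mean and dispersion are illustrative assumptions; it is not a reimplementation of NAIVE3 and does not claim to reproduce the mechanism behind the reported simulation results.

```python
import numpy as np

def H(x):
    """Logistic distribution function."""
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    """Inverse of H."""
    return np.log(p / (1.0 - p))

rng = np.random.default_rng(2)
kappa, sigma2 = 0.8, 1.0 / 30.0          # illustrative mean and dispersion
phi = 1.0 / sigma2 - 1.0
y = rng.beta(kappa * phi, (1.0 - kappa) * phi, size=200_000)

mean_direct = y.mean()                        # close to kappa
mean_backtransformed = H(logit(y).mean())     # mean of logits, mapped back
gap = mean_backtransformed - mean_direct      # positive in this setting
```

In this setting the back-transformed value overshoots the true mean by roughly one percentage point, and the gap grows as the dispersion increases.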
We use cubic B-spline basis functions with 10 equispaced knots to fit μ0(r), and use linear B-spline basis functions with 5 and 4 knots to fit μ1(s) and μ2(t), respectively. Convergence was achieved for all simulated data sets. The correct number of principal components was selected in all simulated data sets. Table 1 presents the mean estimates and the mean squared errors (MSE) of the parameters, which indicate good performance in the estimation of the model parameters for our approach.
Table 1.
Results for the simulation study in Section 4. Displayed are the average estimates and mean squared errors (MSE) of the parameters. The symbol * means that the displayed number is the actual value multiplied by 10000.
| Parameter | σ2 | Δ1 | Δ2 | Ψ1,1 | Ψ1,2 | Ψ1,3 | Ψ1,4 | Ψ1,5 | Ψ1,6 | Ψ1,7 | Ψ1,8 |
| True | 0.03 | 12.00 | 6.00 | 2.00 | 2.00 | 2.00 | 2.00 | 2.00 | 2.00 | 2.00 | 2.00 |
| Mean | 0.03 | 12.05 | 6.01 | 2.00 | 1.97 | 2.03 | 1.97 | 1.98 | 2.03 | 1.99 | 1.98 |
| MSE | 0.01* | 1.22 | 0.31 | 0.14 | 0.17 | 0.17 | 0.17 | 0.15 | 0.16 | 0.16 | 0.17 |
| Parameter | Ψ1,9 | Ψ1,10 | Ψ1,11 | Ψ1,12 | Ψ2,1 | Ψ2,2 | Ψ2,3 | Ψ2,4 | Ψ2,5 | Ψ2,6 | Ψ2,7 |
| True | 2.00 | 2.00 | 2.00 | 2.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Mean | 2.00 | 1.99 | 1.98 | 2.02 | 1.00 | 1.00 | 1.01 | 1.02 | 1.01 | 1.01 | 0.99 |
| MSE | 0.15 | 0.16 | 0.15 | 0.15 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
Figures 1 (a)–(b) show the true fixed curve μ0(r), and the averaged estimates of the four methods. They indicate that NAIVE1, NAIVE2 and NAIVE3 approaches lead to obviously biased outcomes, while there is little bias for our method. Figures 1 (c)–(f) represent the performance of our approach in fixed curves μ1(s), μ2(t) and the principal component curves f1(r), f2(r), respectively. Our approach captures all of the true curve patterns.
Figure 1.
Fitted fixed and random effects curves for 500 simulated data sets: (a) mean of the fixed effects curve estimates of μ0(r) obtained from NAIVE1, NAIVE2 and our method, (b) mean of the fixed effects curve estimates of μ0(r) obtained from NAIVE3 and our method, (c) mean of the fixed effects curve estimates of μ1(s), (d) mean of the fixed effects curve estimates of μ2(t), (e) mean of the principal component curve estimates for f1(r), (f) mean of the principal component curve estimates of f2(r). Dotted lines denote the true curves. Solid thin lines represent the average values of the fitted curves from our method. Dot-dashed thin and thick lines in figure (a) represent the average values of the fitted curves from NAIVE1 and NAIVE2 methods, respectively. Solid thick line in figure (b) represents the average values of the fitted curves from NAIVE3 method. The upper and lower dashed lines in figures (c)–(f) are the 10% and 90% quantiles of the fitted values from our method over the 500 simulated data sets.
5. Application to Physical Activity Data
In this section we apply our methods to the BodyMedia data to help answer the questions raised in the introduction. Our data involve 935 males and each person has 84 observations consisting of daily METs and daily sleep efficiency. Referring back to the notation of Section 2, Yi(r, s, t) is the ratio of daily sleep time to lying down time, r is the average minutes of moderate to vigorous physical activity (MVPA) on the current day and previous day, s is the month and t is the weekday. We use cubic B-spline basis functions with 24 equispaced knots to fit μ0(r), while the other basis functions follow the settings in Section 4. The dispersion parameter σ2 is estimated to be 0.034.
Figure 2 presents the estimation results for the fixed effect curves μ0(r), μ1(s) and μ2(t), and the principal component curve f1(r). Figure 2(a) suggests that, conditional on the random effect Ui(r, s, t), the relation between MVPA minutes and sleep efficiency can be divided into three parts. From 0 to 60 minutes, an increase in MVPA minutes leads to higher sleep efficiency. The effect of increasing MVPA minutes flattens out gradually between 60 and 120 minutes, while more MVPA minutes have a negative influence on sleep efficiency beyond 120 minutes. Figure 2(b) suggests that sleep efficiency has a strong monthly trend, where January and February have higher sleep efficiency while October has lower sleep efficiency. Figure 2(c) indicates that Sunday and Monday have lower sleep efficiency, while sleep efficiency on Wednesday and Thursday is higher. Thus, sleep efficiency appears to be greater in the middle of the week.
Figure 2.
Fitted fixed effect and principal component curves for data application. (a) Fitted fixed effects curve μ0(r) for the factor MVPA minutes, (b) Fitted fixed effects curve μ1(s) for the factor month, (c) Fitted fixed effects curve μ2(t) for the factor weekday, (d) Fitted principal component curve for factor MVPA minutes f1(r). Solid lines represent the values of the fitted curves. The upper and lower dashed lines are the 10% and 90% quantiles of the fitted values across 500 bootstrap estimates. Dot-dashed vertical lines represent MVPA time on minutes 60 and 120, respectively.
The number of principal components for MVPA minutes is selected to be 1. Figure 2(d) shows the principal component curve is decreasing with increasing MVPA time, which means greater physical activity leads to less between-subject variability. The result implies subjects with more physical activity have more consistent sleep efficiency.
We also study the marginal mean and variance structure of Yi(r, s, t). The month is set to be January and the weekday is Monday. Figure 3(a) presents the marginal mean of the outcomes evaluated at different MVPA minutes. It shows that increasing MVPA improves sleep efficiency. However, the increase in sleep efficiency continues only up to about 120 MVPA minutes, and then it tails off. Figure 3(b) shows the marginal variance of the responses. The variability of sleep efficiency decreases with increasing MVPA time. In particular, the variability at 200 MVPA minutes is about half of that at 0 MVPA minutes. This suggests that people with more physical activity time will generally have more stable sleep efficiency.
Figure 3.
Fitted marginal curves for data application. (a) Fitted marginal mean curve for the factor MVPA minutes on Monday in January, (b) Fitted marginal variance curve for the factor MVPA minutes on Monday in January. Solid lines represent the values of the fitted curves. The upper and lower dashed lines are the 10% and 90% quantiles of the fitted values across 500 bootstrap estimates. Dot-dashed vertical lines represent MVPA time on minutes 60 and 120, respectively.
We show the correlation structure of the outcomes with respect to MVPA time on Monday in January, corr{Yi(j, 1, 2), Yi(k, 1, 2)}, as a 3-D plot in Figure 4. The figure reaches its peak around (j = 0, k = 0). This makes sense because lower MVPA time leads to higher variability in sleep efficiency which causes greater correlation. On the other hand, the plot decreases as MVPA minutes increase. This is likely because, intuitively, sleep efficiency is relatively constant for people with longer MVPA minutes.
Figure 4.
The estimates of correlation surfaces for corr{Yi(j, 1, 2), Yi(k, 1, 2)} (j ≠ k) on Monday in January.
6. Discussion
We have proposed a three-factor joint modeling and estimation strategy for functional data with continuous proportions. The simulation results are encouraging, with little bias. The analysis of the BodyMedia data using our method demonstrates its utility in real applications. Our conclusions are that daily sleep efficiency improves with increasing MVPA up to about 120 minutes, and that increasing MVPA results in a decrease in the variance of sleep efficiency throughout the range of MVPA minutes. The former conclusion makes sense in general. However, the plateau of mean daily sleep efficiency at about 120 MVPA minutes has not been reported previously, largely because fully linear modeling of such data is standard. We believe that the substantial decrease in the variability of sleep efficiency as MVPA minutes increase is also a new finding, with standard analyses focusing only on means.
Acknowledgments
Li was supported by discovery grants program from the Natural Sciences and Engineering Research Council of Canada (NSERC, RGPIN-2015-04409). Carroll was supported by a grant from the National Cancer Institute (U01-CA057030). The authors thank BodyMedia, Inc. for making the data available to them.
Footnotes
Supporting Information
Additional information for this article is available at the publisher’s web-site.
Appendix S1: Detailed model estimation algorithm.
Figure 1: Fitted fixed and random effects curves for 500 simulated data sets.
Figure 2: Fitted fixed effect and principal component curves for data application.
Figure 3: Fitted marginal curves for data application.
Figure 4: The estimates of correlation surfaces.
Table 1: Results for simulation results in Section 4.
References
- Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88:9–25.
- Cox C. Nonlinear quasi-likelihood models: applications to continuous proportions. Computational Statistics and Data Analysis. 1996;21:449–461.
- Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties. Statistical Science. 1996;11:89–121.
- Ekstedt M, Nyberg G, Ingre M, Ekblom Ö, Marcus C. Sleep, physical activity and BMI in six to ten-year-old children measured by accelerometry: a cross-sectional study. International Journal of Behavioral Nutrition and Physical Activity. 2013;10:82. doi: 10.1186/1479-5868-10-82.
- Ferrari S, Cribari-Neto F. Beta regression for modelling rates and proportions. Journal of Applied Statistics. 2004;31:799–815.
- Figueroa-Zúñiga JI, Arellano-Valle RB, Ferrari SL. Mixed beta regression: A Bayesian perspective. Computational Statistics and Data Analysis. 2013;61:137–147.
- Gertheiss J, Maier V, Hessel EF, Staicu AM. Marginal functional regression models for analyzing the feeding behavior of pigs. Journal of Agricultural, Biological, and Environmental Statistics. 2015;20:353–370.
- Goldsmith J, Zipunnikov V, Schrack J. Generalized multilevel function-on-scalar regression and principal component analysis. Biometrics. 2015. doi: 10.1111/biom.12278.
- Goldstein H, Rasbash J. Improved approximations for multilevel models with binary responses. Journal of the Royal Statistical Society, Series A. 1996;159:505–513.
- Hall P, Müller HG, Yao F. Modelling sparse generalized longitudinal observations with latent Gaussian processes. Journal of the Royal Statistical Society, Series B. 2008;70:703–723.
- Kieschnick R, McCullough BD. Regression analysis of variates observed on (0, 1): percentages, proportions and fractions. Statistical Modelling. 2003;3:193–213.
- Lambiase MJ, Gabriel KP, Kuller LH, Matthews KA. Temporal relationships between physical activity and sleep in older women. Medicine and Science in Sports and Exercise. 2013;45:2362–2368. doi: 10.1249/MSS.0b013e31829e4cea.
- Liu C, Rubin DB. The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence. Biometrika. 1994;81:633–648.
- McCullagh P, Nelder JA. Generalized Linear Models. London: Chapman and Hall; 1989.
- Molenberghs G, Verbeke G. Models for Discrete Longitudinal Data. Springer; 2005.
- Oudegeest-Sander MH, Eijsvogels TH, Verheggen RJ, Poelkens F, Hopman MT, Jones H, Thijssen DH. Impact of physical fitness and daily energy expenditure on sleep efficiency in young and older humans. Gerontology. 2013;59:8–16. doi: 10.1159/000342213.
- R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2016.
- Redd A. orthogonalsplinebasis: Orthogonal Bspline Basis Functions. R package version 0.1.5. 2011.
- Schafer JL. Some improved procedures for linear mixed models. Tech. rep. The Pennsylvania State University: The Methodological Center; 1998.
- Scheipl F, Gertheiss J, Greven S. Generalized functional additive mixed models. Electronic Journal of Statistics. 2016.
- Serban N, Staicu AM, Carroll RJ. Multilevel cross-dependent binary longitudinal data. Biometrics. 2013;69:903–913. doi: 10.1111/biom.12083.
- Simas AB, Barreto-Souza W, Rocha AV. Improved estimators for a general class of beta regression models. Computational Statistics and Data Analysis. 2010;54:348–366.
- Verkuilen J, Smithson M. Mixed and mixture regression models for continuous bounded responses using the beta distribution. Journal of Educational and Behavioral Statistics. 2012;37:82–113.
- Wedderburn RWM. Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika. 1974;61:439–447.
- Yao F, Müller HG, Wang JL. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association. 2005;100:577–590.
- Zhao W, Zhang R, Huang Z, Feng J. Partially linear single-index beta regression model and score test. Journal of Multivariate Analysis. 2012;103:116–123.
- Zhou L, Huang JZ, Carroll RJ. Joint modelling of paired sparse functional data using principal components. Biometrika. 2008;95:601–619. doi: 10.1093/biomet/asn035.