Entropy. 2022 Jan 28;24(2):203. doi: 10.3390/e24020203

Multivariate Functional Kernel Machine Regression and Sparse Functional Feature Selection

Joseph Naiman 1, Peter Xuekun Song 1,*
Editors: S Ejaz Ahmed1, Farouk Nathoo1
PMCID: PMC8871497  PMID: 35205498

Abstract

Motivated by mobile devices that record data at a high frequency, we propose a new methodological framework for analyzing a semi-parametric regression model that allows us to study a nonlinear relationship between a scalar response and multiple functional predictors in the presence of scalar covariates. Utilizing functional principal component analysis (FPCA) and the least-squares kernel machine method (LSKM), we are able to substantially extend the framework of semi-parametric regression models of scalar responses on scalar predictors by allowing multiple functional predictors to enter the nonlinear model. Regularization is established for feature selection in the setting of reproducing kernel Hilbert spaces. Our method simultaneously performs model fitting and variable selection on functional features. For the implementation, we propose an effective algorithm to solve the related optimization problems, in which iterations alternate between fitting linear mixed-effects models and running a variable selection method (e.g., sparse group lasso). We show algorithmic convergence results and theoretical guarantees for the proposed methodology. We illustrate its performance through simulation experiments and an analysis of accelerometer data.

Keywords: functional principal component analysis, functional predictor, linear mixed-effects model, mobile device, sparse group regularization, wearable device data

1. Introduction

Data captured by mobile devices have lately received much attention in the data science community. Such data are typically recorded at a high frequency, giving rise to an ample volume of information at a very fine scale, and thus present many methodological challenges in statistical modeling and data analysis. In this paper, we utilize the strength of the classical kernel machine method, which enjoys fast computing speed via the linear mixed-effects model, to deal with such high-frequency data using a functional data analysis approach. The motivation for our proposed framework comes from data collected from a tri-axis accelerometer. Accelerometers, worn on the hip or wrist as a way of monitoring physical activity, are becoming increasingly common [1,2,3,4]. Several accelerometers are available, such as the ActiGraph GT3X+ (ActiGraph, Pensacola, FL, USA) and the Actical (Philips Respironics, Bend, OR, USA). Raw accelerometer data are often collected as high-resolution signals with a sampling frequency ranging from 30 to 100 Hz. The commercial software on these devices provides activity counts (ACs) [2,4], which are calculated from the raw accelerometer data using proprietary algorithms. As an example from our motivating dataset, Figure 1 displays a three-dimensional time series of ACs per minute, each on one axis, from one subject wearing the GT3X+ over a period of 7 days (d).

Figure 1. Activity counts over 7 d from a tri-axis (X-, Y- and Z-axis) accelerometer of a subject.

Oftentimes, different types of summaries of the tri-axis ACs are suggested in the literature as opposed to the utility of all three raw functionals [5,6,7,8]. These summary-data-based approaches may be regarded as a quick and dirty dimension reduction strategy that comes up with summarized data with computationally manageable volumes, which would be then analyzed by existing methods and software. One concern with the use of summarized data would be the loss of potential fine features that can only be captured in data of high resolution. Recently, some researchers have attempted to use the entire functional AC curve through functional data analysis techniques [6,9,10]. Further details on current methods being used to retrieve and interpret accelerometer data can be found in [11]. Our contribution in this paper pertains to a new framework in that tri-axis accelerometer data are used as three-dimensional correlated functional predictors in an association analysis with a potential health outcome such as the Body Mass Index (BMI). The relationship between physical activities and childhood obesity has long been a central interest of public health sciences, and our new scalar-on-functional regression model can provide some new insights into this important scientific problem.

We begin with a brief review of existing functional data models, the least-squares kernel machine model, and different variable selection techniques, which together set the stage for the framework of this paper.

1.1. Functional Regression

There has been much attention in recent years given to functional data analysis (FDA) where either covariates, or response, or both are functional as opposed to scalar in nature [12,13,14,15,16,17]. In this paper, we focus on the methodology that allows us to relate multiple functional covariates to a scalar outcome in a nonlinear way in the presence of other scalar covariates. To proceed, let us introduce some notation. Let $L^2(\mathcal{T})$ be the class of square-integrable functions on a compact set $\mathcal{T}$. This is a separable Hilbert space with inner product $\langle f,g\rangle:=\int_{\mathcal{T}}fg$ for $f,g\in L^2(\mathcal{T})$. Consider a probability space $(\Omega,\mathcal{F},P)$, where $Z$ denotes a functional random variable that maps into $L^2(\mathcal{T})$, namely $Z:\Omega\to L^2(\mathcal{T})$. Define $L^2(\Omega):=\{Z:(\int_{\Omega}\|Z\|^2\,dP)^{1/2}<\infty\}$, where $P$ is a certain probability measure, $\|Z\|^2=\langle Z,Z\rangle$, and assume $Z\in L^2(\Omega)$ in the rest of this paper. For convenience, we also assume that $Z$ is mean centered, namely $E(Z)=0$.

The class of functional linear models (FLM) (e.g., [13,14,15]) is proposed to relate a functional covariate $Z$ with a mean-centered scalar outcome $y$, which is also known as scalar-on-functional regression: $y=\langle b,Z\rangle+\epsilon$, where the error term $\epsilon$ is a mean zero random variable uncorrelated with $Z$. An optimal solution of the unknown functional parameter $b\in L^2(\mathcal{T})$ is typically obtained by minimizing the mean-squared error: $\inf_{b\in L^2(\mathcal{T})}E(y-\langle b,Z\rangle)^2$. Moreover, the mean model for the mean-centered scalar $y$ takes the form $E(y\mid Z)=\int_{\mathcal{T}}Z(t)b(t)\,dt$.

As suggested in the literature, we may obtain an optimal estimator of $b$ by expanding the functional predictor $Z$ in terms of certain basis functions. In this paper, we focus on the utility of functional principal component analysis (FPCA) to perform the decomposition of the functional $Z$. By the Karhunen–Loève expansion (e.g., [18,19,20]), we may write $Z(t)=\sum_{k=1}^{\infty}\sqrt{\varsigma_k}\,\xi_k\phi_k(t)$, where $\varsigma_k>0$ are the eigenvalues, $\phi_k$ are the corresponding eigenfunctions, and the loadings are given by $\xi_k:=\frac{1}{\sqrt{\varsigma_k}}\langle Z,\phi_k\rangle$. These coefficients satisfy (i) mean zero, $E(\xi_k)=0$; (ii) variance one, $E(\xi_k^2)=1$; (iii) uncorrelated, $E(\xi_k\xi_j)=0$ for $k\neq j$. Then, the mean model may be rewritten as follows,

$$E(y\mid Z)=\sum_{k=1}^{\infty}\beta_k\xi_k, \tag{1}$$

where the coefficients $\beta_k=\langle b,\sqrt{\varsigma_k}\,\phi_k\rangle$, $k=1,2,\dots$, are unknown because $b$ is unknown. Equation (1) presents a linear projection of the scalar outcome $y$ on the space spanned by the standardized principal components (PCs) $\xi_k$ of the functional predictor $Z$. Along these lines of research, Müller and Yao (2008) [21] proposed a class of functional additive models (FAMs) that extends Equation (1) by allowing a nonparametric form of the projection:

$$E(y\mid Z)=\sum_{k=1}^{\infty}f_k(\xi_k), \tag{2}$$

where $f_k$ is a fully unspecified nonlinear smooth function to be estimated. Clearly, Müller and Yao's extension given in (2) takes an additive model on the individual coefficient (or feature) components $\xi_k$. Regularization is often needed for both (1) and (2) in order to deal with these infinite-dimensional unknowns. One of the challenges concerning regularization for (2) lies in the technical treatment in the functional space. Müller and Yao (2008) [21] proposed truncation (or a hard threshold) of the eigenspace to retain only the leading components that explain the majority of the total variation in $Z$. Zhu, Yao, and Zhang (2014) [15] proposed another regularization for the functions $f_k$ using the powerful COSSO method [22]. One advantage of this kind of regularization method is that higher-order functional principal components can be included in the fitted model if they make stronger contributions to the functional relationship than the leading functional principal components. This regularization method [15] begins with an additive model $E(y\mid Z)=\sum_{k=1}^{s}f_k(\xi_k)$, where $s$ represents some initial degree of truncation that specifies the total number of additive components to be considered. Then, COSSO simultaneously regularizes and selects important functional components among the $s$ functions $f_k$. Although the above discussion has a single functional predictor $Z$ in mind, it is appealing to extend such a framework to multiple functional predictors for a broad range of problems.
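As a concrete illustration of the standardized FPC features $\xi_k$ that enter (1) and (2), the following Python sketch estimates them from densely observed curves on a regular grid. It is a minimal sketch, not the FPCA implementation used in this paper: the function name `fpca_scores`, the Riemann-sum quadrature, and the toy sine-basis data are assumptions made purely for illustration. The raw projections are divided by the square root of the estimated eigenvalue so that the scores have mean zero, unit variance, and zero cross-correlation in the sample.

```python
import numpy as np

def fpca_scores(curves, dt, n_components):
    """Standardized FPC scores from curves sampled on a regular grid.

    curves: (n, m) array, one row per subject; dt: grid spacing.
    Returns (eigenvalues, eigenfunctions on the grid, standardized scores).
    """
    centered = curves - curves.mean(axis=0)           # enforce E(Z) = 0 in the sample
    n = centered.shape[0]
    cov = centered.T @ centered / n                   # sample covariance on the grid
    # Multiplying by dt discretizes the covariance integral operator.
    evals, evecs = np.linalg.eigh(cov * dt)
    order = np.argsort(evals)[::-1][:n_components]    # leading components first
    evals = evals[order]
    phi = evecs[:, order] / np.sqrt(dt)               # so that sum(phi_k^2) * dt = 1
    raw = centered @ phi * dt                         # <Z, phi_k> by a Riemann sum
    xi = raw / np.sqrt(evals)                         # standardized loadings
    return evals, phi, xi

# Toy curves (hypothetical): three sine components with variances 4, 2, 1.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
dt = grid[1] - grid[0]
basis = np.stack([np.sqrt(2) * np.sin(2 * np.pi * (k + 1) * grid) for k in range(3)])
true_scores = rng.normal(size=(200, 3)) * np.sqrt([4.0, 2.0, 1.0])
curves = true_scores @ basis
evals, phi, xi = fpca_scores(curves, dt, n_components=3)
```

In this sketch the columns of `xi` play the role of the features $\xi_1,\xi_2,\xi_3$; truncating at `n_components` corresponds to the hard-threshold regularization discussed above.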

When multiple functional predictors, say $Z_1,\dots,Z_p$, are considered, it is not clear whether the above additive model specification remains suitable to handle the complexity, especially when a non-additive relationship (e.g., interactions) may be of interest in understanding the association between a scalar outcome and multiple functional predictors. In effect, from the perspectives of both theoretical advances and application needs, relaxing the additive relationship is an important task in functional data analysis. Alternatively, there are some methods (e.g., [16,17]) in the literature that do not use the strategy of decomposing $Z$ into its functional components. In this paper, we adopt the framework of kernel machine regression models to extend the methodologies to non-additive relationships between multiple functional predictors and the scalar outcome.

1.2. Least-Squares Kernel Machine

Liu, Lin, and Ghosh (2007) [23] proposed a semi-parametric regression model $y_i=x_i^\top\beta+h(z_i)+\epsilon_i$ for subject $i=1,\dots,n$, where they used the least-squares kernel machine (LSKM) to analyze multidimensional genetic pathways denoted by a vector $z_i$. The key feature of this model is the nonlinear relationship between the outcome $y_i$ and a vector of gene expressions $z_i$, which is characterized by a nonparametric smooth function $h$. Under the theory of smoothing splines, the function $h$ is assumed to lie in a reproducing kernel Hilbert space (RKHS), $\mathcal{H}_K$, generated by a positive-definite kernel function $K(\cdot,\cdot)$. For ease of exposition, we suppress the bandwidth for the kernel $K$ in the following discussion. Then, both the parameter $\beta$ and the function $h$ are estimated by maximizing the scaled penalized likelihood function:

$$J(h,\beta)=-\frac{1}{2}\sum_{i=1}^{n}\{y_i-x_i^\top\beta-h(z_i)\}^2-\frac{1}{2}\lambda_1\|h\|_{\mathcal{H}_K}^2, \tag{3}$$

where $\lambda_1>0$ is the tuning parameter and $\|\cdot\|_{\mathcal{H}_K}$ is the norm of the RKHS. By the representer theorem, the maximizer $h\in\mathcal{H}_K$ takes the form $h(\cdot)=\sum_{i=1}^{n}\alpha_iK(\cdot,z_i)$. Then, $\|h\|_{\mathcal{H}_K}^2=\alpha^\top K\alpha$, where $K$ is an $n\times n$ matrix whose $(i,j)$ entry is $K(z_i,z_j)$ and $\alpha=(\alpha_1,\dots,\alpha_n)^\top$.

It is known in the literature (e.g., [23,24]) that maximizing $J(h,\beta)$ in (3) turns out to be equivalent to solving the normal equations from the following linear mixed-effects model (LMM): $Y=X\beta+h+\epsilon$, where $h$ is an $n\times1$ vector of random effects with distribution $N(0,\tau K)$ and $\epsilon\sim N(0,\sigma^2I)$ is an $n$-dimensional vector of error terms, with $\tau=\lambda_1^{-1}\sigma^2>0$. One remarkable advantage of solving (3) through the existing numerical procedure of the LMM, widely advocated in the literature [25], is that we can determine the smoothing parameter $\lambda_1$ as part of the estimation of the variance components of the LMM. Therefore, instead of using cross-validation or other information-based tuning methods on $\lambda_1$, we can solve simultaneously for all the model parameters in (3), as shown in [23]. Utilizing this numerical strength of the kernel machine regression model, we propose a semi-parametric regression model by incorporating functional principal components of functional predictors (i.e., the $z_i$) to evaluate a nonlinear relationship of a scalar outcome with multiple functional covariates in a non-additive way. Assuming that the function $h$ belongs to an RKHS, we can use existing software packages for solving LMMs to obtain estimates of all model parameters and the smoothing parameter.
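The closed-form solution delivered by the LMM representation for a fixed smoothing parameter can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in this paper: the helper names `gaussian_kernel` and `lskm_fit`, the Gaussian kernel, the fixed value of $\lambda_1$ (which the LMM approach would instead estimate through the variance components), and the toy data are all hypothetical.

```python
import numpy as np

def gaussian_kernel(Z, bandwidth):
    """Gaussian kernel matrix with entries exp(-||z_i - z_j||^2 / bandwidth)."""
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(d2, 0.0) / bandwidth)

def lskm_fit(y, X, K, lam):
    """Maximizer of J(h, beta) in (3) for a fixed lam, with h = K @ alpha."""
    V = K + lam * np.eye(len(y))          # proportional to the LMM marginal covariance
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)   # generalized least squares
    alpha = np.linalg.solve(V, y - X @ beta)             # kernel coefficients
    return beta, alpha

# Toy data (hypothetical): two scalar covariates and three kernel features.
rng = np.random.default_rng(1)
n = 60
X = rng.normal(size=(n, 2))
Z = rng.uniform(size=(n, 3))
y = X @ np.array([1.0, -2.0]) + np.sin(2 * np.pi * Z[:, 0]) + 0.1 * rng.normal(size=n)
K = gaussian_kernel(Z, bandwidth=3.0)
lam = 0.5
beta_hat, alpha_hat = lskm_fit(y, X, K, lam)
h_hat = K @ alpha_hat
```

Here `beta_hat` is the generalized least-squares estimate under the covariance $K+\lambda_1 I$, and `h_hat` is the corresponding predictor of the random effect $h$ at the design points.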

1.3. Feature Selection

To deal with high-dimensional functional principal components from functional covariates, we invoke the sparse regularization approach in the kernel machine regression model. Note that for both mean models (1) and (2), one needs to truncate the series from the Karhunen–Loève expansion. Regularization helps reduce an infinite number of terms to a finite sum. To introduce some notation, here we present a brief review of the group lasso (GL) [26], sparse group lasso (SGL) [27], and non-negative garrote [28]. See also the series of work originated by COSSO [22]. Yuan and Lin (2007) [26] proposed the group lasso, which solves the convex optimization problem: $\min_{\beta\in\mathbb{R}^p}\|Y-\sum_{\ell=1}^{L}X_\ell\beta_\ell\|_2^2+\lambda\sum_{\ell=1}^{L}\|\beta_\ell\|_2$, where $L$ is the total number of groups of covariates and $X_\ell$ refers to the subset of covariates associated with group $\ell$. Friedman, Hastie, and Tibshirani [27] extended the group lasso to allow within-group sparsity, namely SGL, given as $\min_{\beta\in\mathbb{R}^p}\|Y-\sum_{\ell=1}^{L}X_\ell\beta_\ell\|_2^2+\lambda(1-\delta)\sum_{\ell=1}^{L}\|\beta_\ell\|_2+\lambda\delta\|\beta\|_1$, where $\delta\in[0,1]$. The additional $\ell_1$-norm penalty term on $\beta$ encourages individual sparsity, while the first penalty targets sparsity at the group level. It is easy to see that the group lasso is a special case of the SGL when $\delta=0$.
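To make the two levels of sparsity concrete, the following Python sketch evaluates the SGL penalty and its proximal operator, which is the building block of most proximal solvers for this problem. It is an illustrative sketch only: the function names `sgl_penalty` and `sgl_prox` and the toy vector are assumptions, and the proximal map is written in its standard two-stage form (elementwise soft-thresholding followed by groupwise shrinkage) rather than taken from any specific package.

```python
import numpy as np

def sgl_penalty(beta, groups, lam, delta):
    """lam * (1 - delta) * sum_l ||beta_l||_2 + lam * delta * ||beta||_1."""
    group_part = sum(np.linalg.norm(beta[g]) for g in groups)
    return lam * (1 - delta) * group_part + lam * delta * np.sum(np.abs(beta))

def sgl_prox(v, groups, lam, delta):
    """Proximal operator of the SGL penalty evaluated at v."""
    # Stage 1: elementwise soft-thresholding (within-group sparsity).
    soft = np.sign(v) * np.maximum(np.abs(v) - lam * delta, 0.0)
    out = np.zeros_like(soft)
    # Stage 2: groupwise shrinkage; a group with a small norm is zeroed entirely.
    for g in groups:
        norm = np.linalg.norm(soft[g])
        if norm > lam * (1 - delta):
            out[g] = (1 - lam * (1 - delta) / norm) * soft[g]
    return out

# Toy example (hypothetical): three groups of sizes 2, 2, and 1.
v = np.array([3.0, -0.2, 0.1, 0.05, 2.0])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4])]
shrunk = sgl_prox(v, groups, lam=1.0, delta=0.5)
```

In this toy example the second group is removed as a whole, while the first group is kept but loses its weak second entry, which is the behavior that the combined penalty is designed to produce.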

The non-negative garrote proposed by Breiman (1995) [28] is another useful means of variable selection. It invokes a scaled version of least-squares estimation given by: $\arg\min_{d}\frac{1}{2}\|Y-\tilde{X}d\|_2^2+\lambda\sum_{j=1}^{p}d_j$, subject to $d_j\geq0$, $j=1,\dots,p$. Here, $\tilde{X}=(\tilde{x}_1,\dots,\tilde{x}_p)$ is an $n\times p$ matrix with columns $\tilde{x}_j=x_j\hat{\beta}_j^{OLS}$, with $\hat{\beta}_j^{OLS}$ being the least-squares estimates from $\arg\min_{\beta}\frac{1}{2}\|Y-X\beta\|_2^2$ with no constraints. Obviously, an estimate $\hat{d}_j=0$ implies that covariate $x_j$ is excluded from the fitted model. Breiman's formulation, which turns a variable selection problem into a parameter estimation problem, will be applied in this paper to develop feature selection on functional principal components.
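The way the garrote turns selection into estimation is easiest to see in the special case of an orthonormal design, where the scaling factors have a closed form. The Python sketch below covers only that special case and is not part of the proposed method: the function name `garrote_orthonormal`, the assumption $X^\top X=I$, the requirement $\lambda>0$, and the toy data are illustrative choices. Under these assumptions the objective separates across coordinates and gives $\hat{d}_j=(1-\lambda/(\hat{\beta}_j^{OLS})^2)_+$.

```python
import numpy as np

def garrote_orthonormal(X, y, lam):
    """Non-negative garrote when X has orthonormal columns (X.T @ X = I), lam > 0."""
    beta_ols = X.T @ y                       # OLS estimate under orthonormality
    with np.errstate(divide="ignore"):       # an exact zero OLS estimate gives d = 0
        d = np.maximum(1.0 - lam / beta_ols**2, 0.0)
    return d, d * beta_ols                   # scaling factors and shrunken coefficients

# Toy data (hypothetical): two active and two inactive covariates.
rng = np.random.default_rng(4)
X, _ = np.linalg.qr(rng.normal(size=(50, 4)))    # orthonormal columns
beta_true = np.array([3.0, 0.0, -2.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=50)
lam = 0.5
d_hat, beta_garrote = garrote_orthonormal(X, y, lam)
```

Covariates with a small least-squares coefficient receive $\hat{d}_j=0$ and drop out, whereas large coefficients are only mildly shrunk; the scaling vector $\gamma$ introduced in Section 2 plays the analogous role for FPC features.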

This paper is organized as follows. Section 2 introduces our proposed high-dimensional kernel machine regression. Section 3 outlines a simple step-by-step algorithm that is used to implement the sparse estimation method. Section 4 concerns asymptotic properties for our proposed sparse kernel machine regression. Section 5 provides simulation results to examine the performance of our method, with comparisons with existing methods. Section 6 illustrates the proposed method by an association analysis of the relationship between the BMI and functional accelerometer data. Section 7 includes our conclusions. Appendix A contains some key technical details, including the proofs of the theoretical results, while Appendix B presents a discussion on the model identifiability issue.

2. Model and Estimation

Consider a regression analysis of a scalar outcome $y$ on $p$ functional covariates, $Z_\ell$, $\ell=1,\dots,p$. Let $z_i^\ell=(\xi_1^\ell,\dots,\xi_{s_\ell}^\ell)_i^\top$ be the $s_\ell$-element vector of functional principal component (FPC) features from the $i$th observation of the $\ell$th functional covariate $Z_\ell$, and let $z_i=[(z_i^1)^\top,\dots,(z_i^p)^\top]^\top$ be the grand vector of all FPC features from all $p$ functional covariates for subject $i$, $i=1,\dots,n$. Clearly, the set of FPC features from each functional covariate forms a group, and in total, there are $p$ groups with $s=\sum_{\ell=1}^{p}s_\ell$ FPC features and $z_i\in\mathbb{R}^s$. The high dimensionality of FPC features presents the key methodological challenge in the analysis. We consider the following functional kernel machine regression (FKMR) model:

$$y_i=x_i^\top\beta+h(z_i)+\epsilon_i,\quad i=1,\dots,n, \tag{4}$$

where $\beta\in\mathbb{R}^q$ is a set of parameters for the effects of $q$ scalar covariates $x=(x_1,\dots,x_q)^\top$, $h\in\mathcal{H}_K$ is an $s$-variate smooth nonparametric function with $\mathcal{H}_K$ being the functional space generated by a Mercer kernel $K$, and the error terms $\epsilon_i\overset{iid}{\sim}N(0,\sigma^2)$. The FKMR model (4) allows for not only nonlinear, but also non-additive relationships between multiple functional covariates $Z_\ell$, $\ell=1,\dots,p$, via their FPC features, and a scalar outcome $y$. The statistical task is to estimate and select important functional covariates that are related to the outcome of interest through regularizing the FPC features within each functional covariate. To proceed, following Breiman's [28] non-negative garrote method, we introduce a new $s$-dimensional scaling vector $\gamma\in\mathbb{R}^s$, $\gamma=(\gamma_1,\dots,\gamma_{s_1},\dots,\gamma_s)^\top$, by which we can form $\gamma\odot z_i=(\gamma_1\xi_1^1,\dots,\gamma_{s_1}\xi_{s_1}^1,\dots,\gamma_s\xi_{s_p}^p)_i^\top$, a new vector of FPC features weighted by $\gamma$ via the Hadamard product (i.e., elementwise product). Note that $\gamma$ is grouped and denoted by $\gamma=((\gamma^1)^\top,\dots,(\gamma^p)^\top)^\top$, where $\gamma^\ell$ is an $s_\ell$-element vector of scaling factors for the FPC features $z^\ell$ of the $\ell$th functional covariate $Z_\ell$. When an element, say $\gamma_j$, is equal to zero, the corresponding FPC feature $\xi_j$ is not selected in the set of important FPCs; moreover, the functional covariate $Z_\ell$ is excluded from the FKMR model when the entire vector $\gamma^\ell=0$.

We estimate the unknowns in the FKMR model (4), as well as the scaling parameters γ by minimizing the penalized objective function J1(h,β,γ), whose expression is given on the right-hand side of the following Equation (5):

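$$\min_{h,\beta,\gamma}J_1(h,\beta,\gamma)=\min_{h,\beta,\gamma}\frac{1}{2n}\sum_{i=1}^{n}\{y_i-x_i^\top\beta-h(\gamma\odot z_i)\}^2+\frac{1}{2}\lambda_1\|h\|_{\mathcal{H}_K}^2+\lambda_2\rho(\gamma;\delta), \tag{5}$$

where $\lambda_1>0$ and $\lambda_2>0$ are two tuning parameters, and the penalty $\rho(\gamma;\delta)$ may be specified according to a certain regularization method. For the case of the sparse group lasso (SGL), we take $\rho(\gamma;\delta)=(1-\delta)\sum_{\ell=1}^{p}\|\gamma^\ell\|_2+\delta\|\gamma\|_1$, $\delta\in[0,1]$. Typically, $\delta$ is predetermined and set to 0.95 or 0.05 depending on the trade-off between group and within-group sparsity, while the factor $(1-\delta)$ controls the relative weight of group sparsity to individual sparsity of each functional predictor $Z_\ell$. Meanwhile, a large value of the tuning parameter $\lambda_2$ removes a certain group of FPC features from the FKMR model, which occurs when all elements in the vector $\gamma^\ell$ are zero. Given $h\in\mathcal{H}_K$, an optimization equivalent to (5) can be formulated as follows:

$$\min_{\alpha,\beta,\gamma}J_2(\alpha,\beta,\gamma)=\min_{\alpha,\beta,\gamma}\frac{1}{2n}\sum_{i=1}^{n}\Big\{y_i-x_i^\top\beta-\sum_{k=1}^{n}\alpha_kK(\gamma\odot z_i,\gamma\odot z_k)\Big\}^2+\frac{1}{2}\lambda_1\alpha^\top K(\gamma;Z)\alpha+\lambda_2\rho(\gamma;\delta), \tag{6}$$

where $K(\gamma;Z)$ is an $n\times n$ matrix whose $(i,k)$th element is $[K(\gamma;Z)]_{ik}=K(\gamma\odot z_i,\gamma\odot z_k)$. Lemma 1 below establishes the equivalence of the optimization solutions of (5) and (6), which is crucial in our estimation procedure.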
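The objective in (6) can be evaluated directly once a kernel is chosen. The Python sketch below is a minimal illustration assuming a Gaussian kernel with a fixed bandwidth and the SGL penalty; the function names `scaled_gaussian_kernel`, `sgl_rho`, and `objective_J2`, as well as the toy data, are hypothetical. It also shows the selection mechanism: when a whole group of $\gamma$ is zero, the kernel matrix no longer depends on that group's FPC features.

```python
import numpy as np

def scaled_gaussian_kernel(gamma, Z, bandwidth):
    """Matrix K(gamma; Z) with entries K(gamma * z_i, gamma * z_k), Gaussian kernel."""
    W = Z * gamma                                   # Hadamard product applied to each row
    sq = np.sum(W**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * W @ W.T
    return np.exp(-np.maximum(d2, 0.0) / bandwidth)

def sgl_rho(gamma, groups, delta):
    """SGL penalty: (1 - delta) * sum_l ||gamma^l||_2 + delta * ||gamma||_1."""
    group_part = sum(np.linalg.norm(gamma[g]) for g in groups)
    return (1 - delta) * group_part + delta * np.sum(np.abs(gamma))

def objective_J2(alpha, beta, gamma, y, X, Z, groups, lam1, lam2, delta, bandwidth):
    """Value of the objective in (6)."""
    n = len(y)
    K = scaled_gaussian_kernel(gamma, Z, bandwidth)
    resid = y - X @ beta - K @ alpha
    return (resid @ resid / (2 * n) + 0.5 * lam1 * alpha @ K @ alpha
            + lam2 * sgl_rho(gamma, groups, delta))

# Toy setting (hypothetical): two functional covariates with two FPC features each.
rng = np.random.default_rng(5)
n = 20
Z = rng.normal(size=(n, 4))
X = rng.normal(size=(n, 1))
y = rng.normal(size=n)
groups = [np.array([0, 1]), np.array([2, 3])]
gamma = np.array([1.0, 0.5, 0.0, 0.0])              # second group switched off
alpha, beta = rng.normal(size=n), np.array([0.3])
value = objective_J2(alpha, beta, gamma, y, X, Z, groups, 0.1, 0.2, 0.5, 4.0)

# Changing the features of the excluded group leaves the kernel matrix unchanged.
Z_changed = Z.copy()
Z_changed[:, 2:] = rng.normal(size=(n, 2))
```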

Lemma 1.

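A solution $(\hat{h},\hat{\beta},\hat{\gamma})$ is a minimizer of (5) if and only if $(\hat{\alpha},\hat{\beta},\hat{\gamma})$ is a minimizer of (6), where $\hat{h}(\hat{\gamma}\odot z)=\sum_{k=1}^{n}\hat{\alpha}_kK(\hat{\gamma}\odot z,\hat{\gamma}\odot z_k)$.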

The proof of Lemma 1 is given in Appendix A.1.

Theorem 1 (Existence of optimizers).

If the kernel $K(\cdot,\gamma\odot z)$ is continuous with respect to $\gamma\in\mathbb{R}^s$, then there exists a global minimizer $(\hat{h},\hat{\beta},\hat{\gamma})$ for the optimization problem (5).

The proof of Theorem 1 is given in Appendix A.3. Note that there may exist multiple optimal minimizers for (5); Theorem 1 ensures only the existence of optimal solutions, but provides no guarantees for uniqueness due to the fact that (5) or (6) is a nonlinear and non-convex optimization problem. It is worth noting that in both (5) and (6), we set the bandwidth for the kernel at a fixed value due to the identifiability issue with respect to the scaling parameters γ. Refer to Appendix B for more detailed discussions on the issue of parameter identifiability.

3. Implementation and Algorithm

We propose an iterative algorithm to implement our proposed estimation procedure, in which we require the differentiability of the kernel with respect to the scaling factor $\gamma$ and some additional assumptions presented below in order to ensure algorithmic convergence. One part of the algorithm solving (5) is carried out under fixed $\gamma$, where the resulting minimization problem reduces to the equivalent maximization problem in the least-squares kernel machine (3) with the FPC features, $z_i$, being replaced by $\gamma\odot z_i$. As pointed out in Section 1.2, this step of numerical calculation can be easily executed in the same fashion as the solution from the linear mixed model, including the REML estimation of the smoothing parameter $\lambda_1$. The other part of the algorithm is performed under fixed $\alpha$, $\beta$ and $\lambda_1$, where we solve the nonlinear and non-convex optimization problem to update estimates of $\gamma$. Lemma 2 below helps us solve for the scaling parameter $\gamma$.

Lemma 2.

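For fixed $(\alpha,\beta,\lambda_1)$, minimizing (6) over $\gamma$ is equivalent to minimizing over $\gamma$ the following objective function:

$$\frac{1}{2n}\|F(\gamma)-\tilde{Y}\|_2^2+\lambda_2\rho(\gamma;\delta),\quad\text{for }\lambda_2>0, \tag{7}$$

where $F(\gamma)=K(\gamma;Z)\alpha$ and $\tilde{Y}=Y-X\beta-\frac{n}{2}\lambda_1\alpha$.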
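The equivalence stated in Lemma 2 amounts to completing the square in $F(\gamma)=K(\gamma;Z)\alpha$: for fixed $(\alpha,\beta,\lambda_1)$, the smooth parts of (6) and (7) differ by $\frac{1}{2n}(\|Y-X\beta\|_2^2-\|\tilde{Y}\|_2^2)$, which does not depend on $\gamma$. The Python sketch below checks this numerically; it is only an illustration, with a Gaussian kernel, randomly drawn $(\alpha,\beta)$, and hypothetical helper names (`kernel`, `smooth_part_of_6`, `smooth_part_of_7`) assumed for the purpose.

```python
import numpy as np

def kernel(gamma, Z, bandwidth):
    """Gaussian kernel matrix on the rescaled features gamma * z_i."""
    W = Z * gamma
    sq = np.sum(W**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * W @ W.T
    return np.exp(-np.maximum(d2, 0.0) / bandwidth)

# Fixed quantities (hypothetical toy values).
rng = np.random.default_rng(2)
n, s, q = 30, 4, 2
Z = rng.uniform(size=(n, s))
X = rng.normal(size=(n, q))
y = rng.normal(size=n)
alpha = rng.normal(size=n)
beta = rng.normal(size=q)
lam1 = 0.3

def smooth_part_of_6(gamma):
    """Squared-error term plus RKHS penalty of (6), without the sparsity penalty."""
    K = kernel(gamma, Z, bandwidth=s)
    r = y - X @ beta - K @ alpha
    return r @ r / (2 * n) + 0.5 * lam1 * alpha @ K @ alpha

y_tilde = y - X @ beta - 0.5 * n * lam1 * alpha     # working response of Lemma 2

def smooth_part_of_7(gamma):
    """Least-squares term of (7), without the sparsity penalty."""
    F = kernel(gamma, Z, bandwidth=s) @ alpha
    return np.sum((F - y_tilde) ** 2) / (2 * n)

# The gap between the two criteria should be the same for every gamma.
gammas = rng.uniform(0.0, 2.0, size=(5, s))
gaps = np.array([smooth_part_of_6(g) - smooth_part_of_7(g) for g in gammas])
r0 = y - X @ beta
expected_gap = (r0 @ r0 - y_tilde @ y_tilde) / (2 * n)
```

Because the gap is constant in $\gamma$ and the penalty $\lambda_2\rho(\gamma;\delta)$ is shared, both criteria have the same minimizers over $\gamma$.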

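The proof of Lemma 2 is given in Appendix A.2. Linearizing the function $F(\gamma)$ in (7) around some $\tilde{\gamma}$ leads to the form:

$$\min_{\gamma}\frac{1}{2n}\Big\|\tilde{Y}-\sum_{\ell=1}^{p}\nabla_{\gamma}F^{(\ell)}(\tilde{\gamma})\gamma^{\ell}\Big\|_2^2+\lambda_2\rho(\gamma;\delta), \tag{8}$$

where, with a slight abuse of notation, the working response is now $\tilde{Y}=Y-X\beta-\frac{n}{2}\lambda_1\alpha-F(\tilde{\gamma})+\nabla_{\gamma}F(\tilde{\gamma})\tilde{\gamma}$, with $\nabla_{\gamma}F(\tilde{\gamma})$ being the gradient of the function $F$ with respect to $\gamma$ evaluated at $\tilde{\gamma}$, and $\nabla_{\gamma}F^{(\ell)}(\tilde{\gamma})$ being the columns of $\nabla_{\gamma}F(\tilde{\gamma})$ associated with the $\ell$th group of $\gamma$. This is precisely the form of the standard sparse group regularization problem, $\min_{\beta\in\mathbb{R}^p}\frac{1}{2n}\|Y-\sum_{\ell=1}^{p}X_\ell\beta_\ell\|_2^2+\lambda_2\rho(\beta;\delta)$, with $\nabla_{\gamma}F^{(\ell)}(\tilde{\gamma})$ playing the role of the design matrix $X_\ell$ and $\gamma^\ell$ the role of $\beta_\ell$. Hence, (8) can be solved by existing sparse group regularization software for the chosen penalty function $\rho(\gamma;\delta)$.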
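For a Gaussian kernel, the gradient needed in the linearization is available in closed form, since $\partial K(\gamma\odot z_i,\gamma\odot z_k)/\partial\gamma_j=-\frac{2}{c}\gamma_j(z_{ij}-z_{kj})^2K(\gamma\odot z_i,\gamma\odot z_k)$ for bandwidth $c$. The Python sketch below computes $F(\tilde{\gamma})$, its $n\times s$ Jacobian, and the linearized working response, and checks the Jacobian against central finite differences. The Gaussian kernel, the function name `F_and_jacobian`, and the toy inputs are assumptions for illustration; other differentiable kernels would need their own derivative.

```python
import numpy as np

def F_and_jacobian(gamma, Z, alpha, bandwidth):
    """F(gamma) = K(gamma; Z) @ alpha and its n x s Jacobian for a Gaussian kernel."""
    diff2 = (Z[:, None, :] - Z[None, :, :]) ** 2           # (n, n, s) squared differences
    K = np.exp(-(diff2 @ gamma**2) / bandwidth)            # kernel on rescaled features
    F = K @ alpha
    # dF_i/dgamma_j = sum_k alpha_k * K_ik * (-2 * gamma_j * (z_ij - z_kj)^2 / bandwidth)
    weighted = np.einsum("ik,k,ikj->ij", K, alpha, diff2)
    jac = -2.0 / bandwidth * gamma * weighted
    return F, jac

# Toy inputs (hypothetical).
rng = np.random.default_rng(3)
n, s = 25, 3
Z = rng.uniform(size=(n, s))
alpha = rng.normal(size=n)
gamma_tilde = np.array([1.0, 0.7, 1.3])
F0, jac = F_and_jacobian(gamma_tilde, Z, alpha, bandwidth=s)

# Central finite differences as an independent check of the analytic Jacobian.
eps = 1e-6
jac_fd = np.empty_like(jac)
for j in range(s):
    step = np.zeros(s)
    step[j] = eps
    f_plus, _ = F_and_jacobian(gamma_tilde + step, Z, alpha, bandwidth=s)
    f_minus, _ = F_and_jacobian(gamma_tilde - step, Z, alpha, bandwidth=s)
    jac_fd[:, j] = (f_plus - f_minus) / (2 * eps)

# Linearized working response for (8), given a working response from (7).
y_work = rng.normal(size=n)          # stands in for Y - X beta - (n/2) lam1 alpha
y_lin = y_work - F0 + jac @ gamma_tilde
```

The columns of `jac` belonging to one functional covariate form the design block for that group in the sparse group regularization problem.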

The convergence of the above iterative search algorithm for updating γ˜ for fixed (α, β, λ1) can be justified by the proximal Gauss–Newton method [29]. Readers are referred to [30] for details on the proximal Gauss–Newton method. One of the key assumptions of the proximal Gauss–Newton method is the existence of a local minimizer. This condition is satisfied in the above (8). This is because according to Theorem 1, there exists a global minimizer.

Algorithm 1 summarizes these iterative steps, which are shown to satisfy a descent property: $J_2(\alpha^{(r+1)},\beta^{(r+1)},\gamma^{(r+1)})\leq J_2(\alpha^{(r)},\beta^{(r)},\gamma^{(r)})$ under the convergence of the proximal Gauss–Newton algorithm for Step 2.2.

Algorithm 1 An iterative algorithm for optimization in FKMR.
  • 1.1 Perform FPCA (e.g., with the R package fdapace) to extract the functional component features for the $p$ functional predictors, and store them in a grand vector for each individual subject, $z_i=[(z_i^1)^\top,\dots,(z_i^p)^\top]^\top$, $i=1,\dots,n$.
  • 1.2 Initialize $\gamma$ to be a vector of ones, which maps the original component scores to themselves. Set up a grid of possible tuning parameters for $\lambda_1$ and $\lambda_2$, respectively. Set the kernel bandwidth parameter, which may depend on $\lambda_1$. For each pair $(\lambda_1,\lambda_2)$ from the grid, perform Steps 2.1–2.3 below.
  • 2.1 At the $(r+1)$-th step of the algorithm, first solve the LSKM problem with fixed $(\gamma^{(r)},\lambda_1)$ (based on a closed-form solution) to update $\beta^{(r+1)}$ and $\alpha^{(r+1)}$.
  • 2.2 Solve the sparse group regularization problem (8) with fixed $\tilde{\gamma}=\gamma^{(r)}$ and fixed $(\alpha^{(r+1)},\beta^{(r+1)},\lambda_1,\lambda_2)$, using the updates from Step 2.1. At this step, the proximal Gauss–Newton algorithm produces an update $\gamma^{(r+1)}$ at convergence.
  • 2.3 Repeat Steps 2.1–2.2 until convergence.
  • 3.1 Perform cross-validation over all pairs $(\lambda_1,\lambda_2)$ to determine the final $(\alpha,\beta,\gamma)$.
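Steps 2.1–2.3 can be sketched for a single pair $(\lambda_1,\lambda_2)$ as follows. This Python sketch is a simplified stand-in rather than the implementation described above: it assumes a Gaussian kernel and a plain lasso penalty on $\gamma$ (the SGL penalty with $\delta=1$), it solves the linearized problem (8) by ISTA iterations, and it replaces the full proximal Gauss–Newton scheme with a single linearization followed by backtracking, so that the objective $J_2$ never increases. All function names and the toy data are hypothetical.

```python
import numpy as np

def kernel_parts(gamma, Z, bw):
    """Gaussian kernel on gamma * z_i, plus the squared feature differences."""
    diff2 = (Z[:, None, :] - Z[None, :, :]) ** 2
    return np.exp(-(diff2 @ gamma**2) / bw), diff2

def objective(alpha, beta, gamma, data, lam1, lam2, bw):
    """J2 in (6) with a lasso penalty on gamma."""
    y, X, Z = data
    K, _ = kernel_parts(gamma, Z, bw)
    r = y - X @ beta - K @ alpha
    return r @ r / (2 * len(y)) + 0.5 * lam1 * alpha @ K @ alpha + lam2 * np.abs(gamma).sum()

def lskm_step(gamma, data, lam1, bw):
    """Step 2.1: closed-form minimizer of J2 over (alpha, beta) for fixed gamma."""
    y, X, Z = data
    n = len(y)
    K, _ = kernel_parts(gamma, Z, bw)
    V = K + n * lam1 * np.eye(n)
    beta = np.linalg.solve(X.T @ np.linalg.solve(V, X), X.T @ np.linalg.solve(V, y))
    return np.linalg.solve(V, y - X @ beta), beta

def gamma_step(alpha, beta, gamma, data, lam1, lam2, bw, inner=200):
    """Step 2.2 (simplified): lasso on the linearized problem (8), then backtracking."""
    y, X, Z = data
    n = len(y)
    K, diff2 = kernel_parts(gamma, Z, bw)
    jac = -2.0 / bw * gamma * np.einsum("ik,k,ikj->ij", K, alpha, diff2)
    y_lin = y - X @ beta - 0.5 * n * lam1 * alpha - K @ alpha + jac @ gamma
    step = n / max(np.linalg.norm(jac, 2) ** 2, 1e-12)     # 1 / Lipschitz constant
    g = gamma.copy()
    for _ in range(inner):                                 # ISTA on the convex subproblem
        v = g - step * jac.T @ (jac @ g - y_lin) / n
        g = np.sign(v) * np.maximum(np.abs(v) - step * lam2, 0.0)
    base, t = objective(alpha, beta, gamma, data, lam1, lam2, bw), 1.0
    while t > 1e-8:                                        # shrink toward the old gamma
        cand = gamma + t * (g - gamma)
        if objective(alpha, beta, cand, data, lam1, lam2, bw) <= base:
            return cand
        t *= 0.5
    return gamma

def fit_fkmr(data, lam1, lam2, bw, n_iter=10):
    """Alternate Steps 2.1 and 2.2 from gamma = 1; record J2 after each sweep."""
    gamma = np.ones(data[2].shape[1])
    history = []
    for _ in range(n_iter):
        alpha, beta = lskm_step(gamma, data, lam1, bw)
        gamma = gamma_step(alpha, beta, gamma, data, lam1, lam2, bw)
        history.append(objective(alpha, beta, gamma, data, lam1, lam2, bw))
    return alpha, beta, gamma, history

# Toy data (hypothetical): only the first of four features matters.
rng = np.random.default_rng(6)
n, s = 40, 4
Z = rng.uniform(size=(n, s))
X = rng.normal(size=(n, 1))
y = 2.0 * X[:, 0] + 5.0 * np.cos(2 * np.pi * Z[:, 0]) + rng.normal(size=n)
alpha_hat, beta_hat, gamma_hat, history = fit_fkmr((y, X, Z), lam1=0.05, lam2=0.01, bw=float(s))
```

Because Step 2.1 minimizes $J_2$ exactly over $(\alpha,\beta)$ and the backtracking in Step 2.2 only accepts non-increasing moves, the recorded values of $J_2$ form a non-increasing sequence, mirroring the descent property stated before Algorithm 1. A full implementation would repeat this over a grid of $(\lambda_1,\lambda_2)$ and select the pair by cross-validation as in Step 3.1.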

To speed up Algorithm 1, we propose the following operational schemes that avoid setting up the pairs of (λ1,λ2) and performing Step 3.1. Here are a few remarks on the two algorithms. (i) Algorithm 2 depends on good starting values in order to enjoy a fast search. (ii) The main difference between Algorithms 1 and 2 is that λ2 is fixed in Algorithm 1, while it is changing in Algorithm 2. Some similar algorithms with changing tuning parameters have been proposed in the literature, such as the single index model [31]. (iii) There is no guarantee that both algorithms converge to a global minimizer, and the proximal Gauss–Newton method used in the implementation can only find stationary points. Numerical solvers for the optimization problem in (5) or in (6) indeed remain an open problem in the field of nonlinear and nonconvex optimization.

Algorithm 2 A fast operational scheme of Algorithm 1.
  • 1. Step 2.1 of Algorithm 1 is performed by running the linear mixed model with the initial fixed $\gamma$ from Step 1.2 of Algorithm 1 to obtain updated values of $\lambda_1$, $\beta$, and $\alpha$.
  • 2. Step 2.2 is performed by solving the sparse group regularization problem (8) through the Gauss–Newton algorithm using cross-validation-based tuning (e.g., with the R package oem).
  • 3. Rerun Step 2.1 using the updated $\gamma$ from Step 2.2 to obtain the estimates for $\beta$ and $\alpha$.

4. Theoretical Guarantees

Our theoretical analysis focuses on the finite-sample $L^2$ error bounds for the estimators $(\hat{h},\hat{\gamma})$ obtained by (5) or (6). Consequently, we are able to establish the estimation consistency. For simplicity, we set $\beta=0$ and consider a general setting of random vectors $z_1,\dots,z_n$ so that the FPC features $z_1,\dots,z_n$ correspond to a special case. Along similar lines as those of [15,32], the estimation consistency is proven in the case of the SGL penalty function. We define a map $\Gamma$ with an $s$-element vector $\gamma\in\mathbb{R}^s$, which gives rise to a collection of all scaling map functions: $\mathcal{A}=\{\Gamma:\mathbb{R}^s\to\mathbb{R}^s\mid\Gamma(z)=\gamma\odot z,\ z\in\mathbb{R}^s\text{ and }\gamma\in\mathbb{R}^s\}$. Since $\Gamma$ is a linear (and bounded) operator, $\mathcal{A}$ is a real vector space where $(c_1\Gamma_1+c_2\Gamma_2)(z)=c_1\Gamma_1(z)+c_2\Gamma_2(z)$ for any $c_1,c_2\in\mathbb{R}$ and $\Gamma_1,\Gamma_2\in\mathcal{A}$. To perform a group regularization estimation, we define an SGL penalty by a norm on $\mathcal{A}$ for a fixed $\delta\in[0,1]$ as follows:

$$\|\Gamma\|_{SGL}=\delta\sum_{\ell=1}^{p}\|\gamma^{\ell}\|_2+(1-\delta)\|\gamma\|_1. \tag{9}$$

Consequently, the SGL regularization estimation requires the following penalized optimization:

$$\min_{\Gamma\in\mathcal{A},\,h\in\mathcal{H}_K}J_3(\Gamma,h)=\min_{\Gamma\in\mathcal{A},\,h\in\mathcal{H}_K}\|Y-h\circ\Gamma\|_n^2+\lambda_1\|h\|_{\mathcal{H}_K}^2+\lambda_2\|\Gamma\|_{SGL}, \tag{10}$$

where $\|Y-h\circ\Gamma\|_n^2=\frac{1}{n}\sum_{i=1}^{n}\{y_i-(h\circ\Gamma)(z_i)\}^2$. Lemma 3 below provides the essential finite-sample inequalities that lead to the estimation consistency.

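Lemma 3 (Basic inequality).

Let $\hat{h}\circ\hat{\Gamma}$ be the minimizer of (10). Let $h_0\circ\Gamma_0$ be the true function. Then, we have:

$$J_3(\hat{\Gamma},\hat{h})\leq2(\epsilon,\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0)_n+\lambda_1\|h_0\|_{\mathcal{H}_K}^2+\lambda_2\|\Gamma_0\|_{SGL}, \tag{11}$$

where $2(\epsilon,\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0)_n=\frac{2}{n}\sum_{i=1}^{n}\epsilon_i\{(\hat{h}\circ\hat{\Gamma})(z_i)-(h_0\circ\Gamma_0)(z_i)\}$.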

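We need the following notation before presenting our theoretical guarantees. Let $N(\delta,\mathcal{M},P_n)$ denote the minimal $\delta$-covering number of the function set $\mathcal{M}$ under the empirical metric $P_n$ based on the random vectors $z_1,\dots,z_n$. Let $N=N(\delta,\mathcal{M},P_n)$ be a shorthand notation. This means that there exist functions $m_1,\dots,m_N$ (not necessarily in the set $\mathcal{M}$) such that for every function $m\in\mathcal{M}$, there exists a $j\in\{1,\dots,N\}$ such that $\|m-m_j\|_{P_n}\leq\delta$, with $\|m-m_j\|_{P_n}:=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\{m(z_i)-m_j(z_i)\}^2}$. Define the $\delta$-entropy of $\mathcal{M}$ for the empirical metric $P_n$ as $H(\delta,\mathcal{M},P_n):=\log(N(\delta,\mathcal{M},P_n))$. Consider a functional space of the form:

$$\mathcal{B}=\left\{b:=b(h,\Gamma)=\frac{h\circ\Gamma-h_0\circ\Gamma_0}{\|h\|_{\mathcal{H}_K}^2+\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma\|_{SGL}^2+\|\Gamma_0\|_{SGL}^2}\;\middle|\;h\in\mathcal{H}_K,\ \Gamma\in\mathcal{A}\right\}.$$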

We postulate the following assumptions.

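Assumption A1.

The error term $\epsilon=(\epsilon_1,\dots,\epsilon_n)^\top$ is uniformly sub-Gaussian; that is, for constants $C_1$ and $C_2$,

$$\max_{n\geq1}\max_{i=1,\dots,n}C_1^2\left\{E\exp\left(\frac{\epsilon_i^2}{C_1^2}\right)-1\right\}\leq C_2.$$

Clearly, the left-hand side of this moment condition is bounded below by zero.

Assumption A2.

$\|\Gamma_0\|_{SGL}^2+\|h_0\|_{\mathcal{H}_K}^2>0$, and the entropy of the space $\mathcal{B}$ with respect to the empirical metric $P_n$ is bounded as follows:

$$H(\delta,\mathcal{B},P_n)\leq C_3\delta^{-2\psi},$$

where $C_3$ is some constant and $\psi\in(0,1)$.

Assumption A3.

$\sup_{b\in\mathcal{B}}\|b\|_{P_n}\leq C_4$ for some constant $C_4$.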

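Theorem 2 (Consistency).

Under Assumptions A1–A3 above, if the tuning parameters $\lambda_1$ and $\lambda_2$ satisfy

$$\lambda_2^{-1}=n^{\frac{1}{1+\psi}}\left(\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}\right)^{\frac{1-\psi}{1+\psi}},\quad\text{and}\quad\lambda_1=O_p(1)\lambda_2,$$

then we have

$$\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n=O_p\left(n^{-\frac{1}{2+2\psi}}\right)\left(\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}\right)^{\frac{\psi}{1+\psi}},\quad\text{and} \tag{12}$$

$$\|\hat{h}\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}=O_p(1)\left(\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}\right). \tag{13}$$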

Theorem 2 implies estimation consistency under the right rates for the two tuning parameters $\lambda_1$ and $\lambda_2$. Due to the potential identifiability issues explained in detail in Appendix B, the estimator $(\hat{h},\hat{\Gamma})$ may not be unique; nevertheless, by (13), the combined size of $\hat{h}$ and $\hat{\Gamma}$, measured by $\|\hat{h}\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}$, is of the same order as that of the true $h_0$ and $\Gamma_0$.

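Corollary 1.

If the RKHS, $\mathcal{H}_K$, contains differentiable functions $h(z)$ whose norm $\|h(z)\|_{\mathcal{H}_K}$ is uniformly bounded for all functions $h\in\mathcal{H}_K$ and $z\in\mathbb{R}^s$, then Theorem 2 continues to hold when the entropy condition in Assumption A2 is replaced by $H(\delta,\mathcal{H}_K,P_n)\leq C_1\delta^{-2\psi}$ for all $\delta>0$.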

The proofs of Theorem 2 and Corollary 1 are given in Appendix A.4 and Appendix A.5, respectively. Often, when we are only interested in a subset of functions in the RKHS (e.g., functions with norm less than one), we can substitute the full space HK in Corollary 1 with the subspace of interest. Refer to [15] or [32], where both considered an RKHS (i.e., Sobolev space) with functions of norm less than or equal to one.

5. Simulation Experiments

We performed extensive simulation to investigate the performance of our proposed procedure, including the performance of SGL variable selection and its overall accuracy. Due to the limitations of space, we include results from two simulation experiments in this section, and more results may be found in the first author’s Ph.D. dissertation [30].

5.1. Setup

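In the evaluation of the performance accuracy, following [15], we used both the quasi-$R^2$ and the adjusted quasi-$R^2$ defined as follows:

$$R_Q^2:=1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2},\quad\text{and}\quad R_{AQ}^2:=1-(1-R_Q^2)\frac{n-1}{n-(k+1)}.$$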

The latter is known to be appealing for the comparison of the estimation sparsity. There is another performance metric of interest in addition to model accuracy. Performance in variable selection is summarized in terms of the stability measured by sensitivity and specificity for both functional and variable selections under these simulation experiments. Our algorithm uses existing R packages, including emmreml, kspm, and oem.
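The accuracy and selection metrics above are simple to compute; the Python sketch below does so under two labeled assumptions: $k$ in the adjusted quasi-$R^2$ is taken to be the number of selected features, and selection performance is summarized from Boolean indicators of selected and truly important features. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def quasi_r2(y, y_hat):
    """Quasi-R^2: one minus the ratio of residual to total sum of squares."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def adjusted_quasi_r2(y, y_hat, k):
    """Adjusted quasi-R^2; k is assumed to be the number of selected features."""
    n = len(y)
    return 1.0 - (1.0 - quasi_r2(y, y_hat)) * (n - 1) / (n - (k + 1))

def sensitivity_specificity(selected, truth):
    """Share of true features selected, and share of null features left out."""
    selected, truth = np.asarray(selected, bool), np.asarray(truth, bool)
    sensitivity = np.sum(selected & truth) / np.sum(truth)
    specificity = np.sum(~selected & ~truth) / np.sum(~truth)
    return sensitivity, specificity

# Toy numbers (hypothetical).
y_obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_fit = np.array([1.1, 1.9, 3.2, 3.8, 5.0])
r2 = quasi_r2(y_obs, y_fit)
r2_adj = adjusted_quasi_r2(y_obs, y_fit, k=1)

truth = np.zeros(15, dtype=bool)
truth[[1, 8]] = True                  # two truly important features
selected = np.zeros(15, dtype=bool)
selected[[1, 3, 8]] = True            # one false positive
sens, spec = sensitivity_specificity(selected, truth)
```

The adjustment penalizes models that retain many features, which is why the adjusted version is the more informative of the two when comparing fits with different degrees of sparsity.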

Specifically, we designed the following two simulation settings.  

Scenario 1: A single functional predictor with sparsity in the FPC features.

Scenario 2: Multiple functional predictors with sparsity in the functional predictors and with sparsity in the FPC features of important functional predictors.  

Each of these two scenarios would be handled using certain suitable penalty functions to address the designed sparsity; for example, in Scenario 2 we used a two-level variable selection penalty (e.g., SGL) to deal with two types of sparsity in the true model. In all analyses, we used the Gaussian kernel $K(u,v)=\exp(-\frac{1}{p}\|u-v\|^2)$ in our estimation, where $p$ was set as the number of features, which is equivalent to dividing the $\gamma$ vector by $\sqrt{p}$. This scaling parameter may be either estimated or set to the number of features to overcome the identifiability issue according to [33], where theoretical justification was given for the use of the number of features for the bandwidth parameter in the case of the Gaussian kernel.
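The interplay between the bandwidth and the scaling vector can be checked numerically. In the Python sketch below, the function name `gaussian_kernel` and the toy inputs are hypothetical; it verifies that a Gaussian kernel with bandwidth $p$ applied to $\gamma\odot z$ coincides with a unit-bandwidth kernel applied to $(\gamma/\sqrt{p})\odot z$, which is why the bandwidth and $\gamma$ cannot both be identified and the bandwidth is held fixed.

```python
import numpy as np

def gaussian_kernel(U, V, p):
    """Kernel matrix with entries exp(-||u_i - v_j||^2 / p)."""
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / p)

# Toy inputs (hypothetical): 10 subjects, 3 features.
rng = np.random.default_rng(7)
Z = rng.uniform(size=(10, 3))
gamma = np.array([1.0, 0.5, 2.0])
p = Z.shape[1]

K_bandwidth_p = gaussian_kernel(gamma * Z, gamma * Z, p)
rescaled = (gamma / np.sqrt(p)) * Z
K_unit_bandwidth = gaussian_kernel(rescaled, rescaled, 1.0)
```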

Following [23], because of the difficulty of graphically displaying the estimated $s$-dimensional function $h(\cdot)$ of $z$, we summarized the goodness-of-fit by regressing the true $h$ on the estimated $\hat{h}$, with both being evaluated at the design points. From this concordance regression analysis, we may measure the goodness-of-fit of $\hat{h}$ through the average intercepts, slopes, and R-squared values (also known as the coefficient of determination) obtained over the number of replications. Clearly, a high-quality fit is reflected by (i) the intercept being close to zero, (ii) the slope being close to one, and (iii) the R-squared being close to one. Moreover, we graphically display the estimated function $\hat{h}$ by setting all variables equal to 0.5 except the one of interest, which is varied over a grid of 100 equally spaced points on the interval $[0,1]$. Such visualization of the functional estimation at each margin further facilitates the evaluation of the proposed algorithm in addition to the results obtained from the concordance regression analyses.
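The concordance regression just described reduces to a simple linear fit of the true values on the estimated ones. The Python sketch below is a minimal illustration for one replication; the function name `concordance` and the toy values are hypothetical, and in a simulation study the three summaries would be averaged over replications.

```python
import numpy as np

def concordance(h_true, h_hat):
    """Regress the true h on the estimated h; return intercept, slope, R-squared."""
    h_true, h_hat = np.asarray(h_true, float), np.asarray(h_hat, float)
    slope, intercept = np.polyfit(h_hat, h_true, 1)
    fitted = intercept + slope * h_hat
    r2 = 1.0 - np.sum((h_true - fitted) ** 2) / np.sum((h_true - h_true.mean()) ** 2)
    return intercept, slope, r2

# Toy values (hypothetical): an estimate that is biased and shrunk by a factor of two.
h_hat_demo = np.linspace(-1.0, 1.0, 50)
h_true_demo = 0.5 + 2.0 * h_hat_demo
intercept, slope, r2 = concordance(h_true_demo, h_hat_demo)
```

In this toy case the R-squared equals one even though the fit is poor, which is why the intercept and slope are reported alongside it.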

In all scenarios, we generated 1000 IID functional paths, of which 750 paths were assigned to the training set and 250 paths were assigned to the test set for an external performance evaluation. It is the test set that we used to display the performance accuracy. We used a one-dimensional covariate $x_i$ to show the flexibility of our model in a semi-parametric setting, with independent copies of $x_i\sim N(0,1)$. We chose the true coefficients in the kernel machine model similar to those given in [23].

5.2. Simulation in Scenario 1

In this simple scenario with a single functional predictor, we simulated data from a model with sparsity in its FPC features. To do so, we generated a single functional predictor based on the first 15 Fourier basis functions over the interval $[0,1]$: $Z(t)=\sum_{j=1}^{15}\varsigma_j\xi_j\phi_j(t)$. That is, a functional predictor was created as a linear combination of the 15 basis functions, where $\phi_j(\cdot)$ is the $j$th Fourier basis function, $\varsigma_j$ is the $j$th eigenvalue of $Z$, and $\xi_j$ is the $j$th FPC feature that is simulated from a normal distribution detailed as follows.

There were 100 sampled points that were first equally spaced in the interval $[0,1]$ and then varied with certain small deviations drawn from $\nu \sim N(0,0.001)$. Set $\varsigma_j=45\times 0.64^{j}$ and $\xi_j \sim N(0,1)$ independently over $j=1,\dots,15$. As was done in [17], instead of directly using $\xi_j$, we used $\zeta_j=\Phi(\xi_j)$, where $\Phi$ is the CDF of the standard normal. This resulted in $z=(\zeta_1,\dots,\zeta_{15})$. We chose the second, $\zeta_2$, and ninth, $\zeta_9$, features as important features in the following true nonlinear non-additive model:

$$y_i=2x_i+20\cos(2\pi\zeta_{i2})-10\sin(2\pi\zeta_{i9})+\zeta_{i2}\zeta_{i9}+\epsilon_i,$$

with $\epsilon_i \overset{iid}{\sim} N(0,1)$. FPCA was performed by the R package PACE [34], producing the estimated FPC scores, $\hat{\xi}_j$, as well as the estimated eigenvalues, $\hat{\varsigma}_j$, which in turn enabled us to compute $\hat{\zeta}_j$, $j=1,\dots,15$.
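The data-generating steps of Scenario 1 can be sketched as follows. This is an illustrative sketch only, not the authors' simulation code. Several details are assumptions: the ordering and scaling of the Fourier basis in the hypothetical helper `fourier_basis`, the reading of 0.001 as a variance for the grid jitter, and the minus sign in front of the sine term of the outcome model.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n, J, m = 1000, 15, 100  # paths, basis functions, sampled points per path

def fourier_basis(t, J):
    """Assumed basis on [0, 1]: 1, sqrt(2) sin(2 pi k t), sqrt(2) cos(2 pi k t), ..."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < J:
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        if len(cols) < J:
            cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)  # shape (len(t), J)

# Equally spaced grid with small deviations (0.001 treated as a variance: assumption)
t = np.linspace(0.0, 1.0, m) + rng.normal(0.0, math.sqrt(0.001), m)

varsigma = 45 * 0.64 ** np.arange(1, J + 1)    # eigenvalues 45 * 0.64^j
xi = rng.normal(size=(n, J))                   # FPC features xi_j ~ N(0, 1)
Z = (xi * varsigma) @ fourier_basis(t, J).T    # functional paths, shape (n, m)

# zeta_j = Phi(xi_j), with Phi the standard normal CDF
Phi = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
zeta = Phi(xi)

x = rng.normal(size=n)                         # scalar covariate x_i ~ N(0, 1)
eps = rng.normal(size=n)
# Signal features are zeta_2 and zeta_9 (columns 1 and 8 with zero-based indexing)
y = (2 * x + 20 * np.cos(2 * np.pi * zeta[:, 1])
     - 10 * np.sin(2 * np.pi * zeta[:, 8])
     + zeta[:, 1] * zeta[:, 8] + eps)
```

The arrays `Z` and `t` would then be passed to an FPCA routine to obtain estimated scores; that step is omitted here because it depends on the software used.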

We applied both LASSO and MCP penalty functions in our implementation, termed as FKMRLasso and FKMRMCP, respectively. We compared the results of our method with the standard linear approach with both LASSO and MCP under the assumption of linear functional relationships, as well as the COSSO method for functional additive regression [15] using the R package COSSO [15,34]. Since the COSSO package is built for nonparametric regression (and not partial linear models), we adopted the backfitting strategy and regressed the residuals with our estimated effect of xi removed.

In addition, we compared our method with an oracle FKMR estimator, called FKMRoracle, that assumed the full knowledge of the true ζj containing two true nonzero signals, ζ2 and ζ9. We also considered two oracle versions of our proposed algorithm, FKMRLassooracle and FKMRMCPoracle, both of which used the knowledge of the true ζj in order to evaluate the performance of the FPCA procedure. Note that once we used FPCA to obtain the estimated features, our algorithm essentially works in a standard regression setting with sparsity of covariates. Thus, our proposed procedure can in principle be used in simpler cases with scalar covariates that do not involve functional covariates. In Scenario 1, due to the highly nonlinear relationships between the FPC features and the outcome, the naive linear model performed poorly, as expected, in terms of both model selection and model consistency. The detailed simulation results for Scenario 1 can be found in the first author’s Ph.D. dissertation [30]. In brief, our proposed method worked well in all aspects. In this setting, COSSO also worked well in terms of model fit, but it tended to select noisy features more frequently than our proposed method, leading to more false positives.

5.3. Simulation in Scenario 2

Now, we generated four functional predictors of the form: $Z_\ell(t)=\sum_{j=1}^{9}\varsigma_j\xi_{j\ell}\phi_j(t)$, $\ell=1,\dots,4$, where $\phi_j$, $\varsigma_j$, and $\xi_{j\ell}$ were set in the same way as those given in Scenario 1. It follows that $z=(\zeta_{11},\dots,\zeta_{91},\dots,\zeta_{14},\dots,\zeta_{94})$, where $\zeta_{j\ell}$ is the $j$th $\Phi$-transformed feature for the $\ell$th functional covariate. Sparsity was specified as follows: the first and second functional covariates, $Z_1$ and $Z_2$, were chosen as important signals in which these transformed FPC features, $\{\zeta_{11},\zeta_{31},\zeta_{41},\zeta_{22},\zeta_{72}\}$, are five important features (three features from $Z_1$ and two features from $Z_2$) that are related to the outcome:

$$y_i=2x_i+\zeta_{i11}+\zeta_{i31}+\zeta_{i41}+\zeta_{i22}+\zeta_{i72}+10\cos(2\pi\zeta_{i11})-10\zeta_{i22}^2+10\zeta_{i72}^2-10\zeta_{i31}^2+10\exp(\zeta_{i31})\zeta_{i41}-8\sin(2\pi\zeta_{i72})\cos(2\pi\zeta_{i31})+20\zeta_{i11}\zeta_{i72}+\epsilon_i,\quad i=1,\dots,n,$$

where $\epsilon_i \overset{iid}{\sim} N(0,1)$. This model specifies both group sparsity (two of the four functional predictors) and within-group sparsity (three of the nine FPC features in $Z_1$ and two of the nine FPC features in $Z_2$). In addition, we specified non-additive relationships in the true model across multiple functional covariates.

We fit the data using the proposed methods, including FKMRGMCPoracle, FKMRLasso, FKMRGLasso, FKMRSGL, FKMRMCP, and FKMRGMCP, and the results based on 100 replicates are summarized in Table 1. For comparison, we also fit the simulated data by existing methods, including the linear model (denoted by LM + penalty), COSSO functional additive regression, and the oracle method using the knowledge of true important features in the analysis, as done in the above simulation of Scenario 1. From Table 1 regarding the goodness-of-fit, we see that all of our FKMR estimators outperformed the standard linear estimators in terms of RAQ2 for every penalty function, and they outperformed COSSO for penalties that accounted for group sparsity. In the concordance regression analysis, we see that all intercepts were close to zero, all slopes close to one, and all $R^2$ values reasonably close to one, indicating a high goodness-of-fit for functional estimation. COSSO performed on par with the FKMR estimators whose penalties did not account for group sparsity (Lasso and MCP). The methods using a group sparsity penalty function (SGL, GLasso, and GMCP) clearly outperformed the methods that did not regularize the grouping of covariates (Lasso and MCP). In addition, our FKMR estimators with group penalties (FKMRGLasso, FKMRSGL, and FKMRGMCP) performed at least as well as the oracle estimator FKMRGMCPoracle both in terms of RAQ2 and in terms of our estimate of the function h. The results also indicated that there was little difference between using a concave (MCP or GMCP) penalty function and using a convex (GLasso or SGL) penalty function.

Table 1.

Goodness-of-fit and the concordance regression for Scenario 2.

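| Model | RAQ2 | β | Intercept (reg. of $h$ on $\hat{h}$) | Slope (reg. of $h$ on $\hat{h}$) | $R^2$ (reg. of $h$ on $\hat{h}$) |
|---|---|---|---|---|---|
| FKMRLasso | 0.830 | 2.00 | −0.062 | 1.01 | 0.848 |
| FKMRGLasso | 0.937 | 1.99 | −0.055 | 1.01 | 0.972 |
| FKMRSGL | 0.928 | 2.00 | −0.051 | 1.01 | 0.955 |
| FKMRMCP | 0.835 | 2.01 | −0.062 | 1.01 | 0.856 |
| FKMRGMCP | 0.935 | 1.99 | −0.056 | 1.01 | 0.970 |
| FKMRGMCPoracle | 0.911 | 1.99 | −0.049 | 1.01 | 0.937 |
| COSSO | 0.832 | | | | |
| LM + Lasso | 0.453 | | | | |
| LM + GLasso | 0.324 | | | | |
| LM + SGL | 0.450 | | | | |
| LM + MCP | 0.513 | | | | |
| LM + GMCP | 0.307 | | | | |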

As regards the group sparsity, Table 2 indicates that all methods had a high sensitivity for detecting functional signals, while the proposed FKMR methods had better specificity than both sparse linear models and COSSO. Concerning the within-group sparsity, it is interesting to note that a bigger difference was seen in terms of what type of penalty function was being used in feature selection. As shown in Table 3 and Table 4, using a general penalty (e.g., Lasso and MCP) that does not take the grouping structure into account tended to under-select important features within a group. COSSO tended to perform well in terms of within-group sparsity. Moreover, Figure 2 shows that the FKMR method estimated the five signal feature functions (from Z1 and Z2) well.
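The selection frequencies, sensitivity, and specificity discussed above can be computed from a replicate-by-predictor indicator matrix. The sketch below is an assumed bookkeeping scheme with a hypothetical function name and toy input; it is not taken from the original simulation code.

```python
import numpy as np

def selection_summary(selected, is_signal):
    """Summarize variable selection over simulation replicates.

    selected  : boolean array, shape (replicates, predictors)
    is_signal : boolean array, shape (predictors,), True for true signals
    """
    selected = np.asarray(selected, dtype=bool)
    is_signal = np.asarray(is_signal, dtype=bool)
    freq = 100.0 * selected.mean(axis=0)              # selection frequency (%)
    sensitivity = selected[:, is_signal].mean()       # signals that were kept
    specificity = (~selected[:, ~is_signal]).mean()   # noise that was dropped
    return freq, sensitivity, specificity

# Toy example: 4 replicates, 4 functional predictors, the first two are signals
selected = np.array([[1, 1, 0, 0],
                     [1, 1, 1, 0],
                     [1, 1, 0, 0],
                     [1, 0, 0, 0]], dtype=bool)
freq, sens, spec = selection_summary(selected, [True, True, False, False])
```

With 100 replicates, the entries of `freq` correspond directly to the counts reported in the selection tables.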

Table 2.

Sensitivity and specificity of functional selection for Scenario 2.

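| Model | $\hat{Z}_1$ | $\hat{Z}_2$ | $\hat{Z}_3$ | $\hat{Z}_4$ |
|---|---|---|---|---|
| FKMRLasso | 100 | 100 | 0 | 0 |
| FKMRGLasso | 100 | 100 | 4 | 4 |
| FKMRSGL | 100 | 100 | 0 | 0 |
| FKMRMCP | 100 | 100 | 0 | 0 |
| FKMRGMCP | 100 | 100 | 3 | 4 |
| COSSO | 100 | 100 | 5 | 6 |
| LM + Lasso | 100 | 100 | 19 | 21 |
| LM + GLasso | 94 | 99 | 7 | 8 |
| LM + SGL | 100 | 100 | 19 | 18 |
| LM + MCP | 100 | 100 | 20 | 19 |
| LM + GMCP | 93 | 99 | 7 | 8 |

Entries are selection frequencies.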

Table 3.

FPC feature selection for signal functional Z1 in Scenario 2.

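| Model | $\hat{\zeta}_{11}$ | $\hat{\zeta}_{21}$ | $\hat{\zeta}_{31}$ | $\hat{\zeta}_{41}$ | $\hat{\zeta}_{51}$ | $\hat{\zeta}_{61}$ | $\hat{\zeta}_{71}$ | $\hat{\zeta}_{81}$ | $\hat{\zeta}_{91}$ |
|---|---|---|---|---|---|---|---|---|---|
| FKMRLasso | 100 | 1 | 97 | 0 | 0 | 0 | 0 | 0 | 0 |
| FKMRGLasso | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| FKMRSGL | 100 | 21 | 100 | 71 | 26 | 20 | 17 | 16 | 15 |
| FKMRMCP | 100 | 1 | 99 | 1 | 0 | 0 | 0 | 0 | 0 |
| FKMRGMCP | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| COSSO | 100 | 2 | 100 | 93 | 1 | 0 | 0 | 1 | 0 |
| LM + Lasso | 100 | 10 | 100 | 100 | 10 | 8 | 7 | 10 | 5 |
| LM + GLasso | 94 | 94 | 94 | 94 | 94 | 94 | 94 | 94 | 94 |
| LM + SGL | 100 | 12 | 100 | 100 | 10 | 8 | 8 | 11 | 5 |
| LM + MCP | 100 | 10 | 100 | 100 | 9 | 8 | 9 | 7 | 5 |
| LM + GMCP | 93 | 93 | 93 | 93 | 93 | 93 | 93 | 93 | 93 |

Entries are selection frequencies.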

Table 4.

FPC feature selection for signal functional Z2 in Scenario 2.

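| Model | $\hat{\zeta}_{12}$ | $\hat{\zeta}_{22}$ | $\hat{\zeta}_{32}$ | $\hat{\zeta}_{42}$ | $\hat{\zeta}_{52}$ | $\hat{\zeta}_{62}$ | $\hat{\zeta}_{72}$ | $\hat{\zeta}_{82}$ | $\hat{\zeta}_{92}$ |
|---|---|---|---|---|---|---|---|---|---|
| FKMRLasso | 0 | 3 | 0 | 0 | 0 | 0 | 100 | 0 | 0 |
| FKMRGLasso | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| FKMRSGL | 16 | 100 | 14 | 7 | 16 | 23 | 100 | 15 | 7 |
| FKMRMCP | 0 | 11 | 0 | 0 | 0 | 1 | 100 | 0 | 0 |
| FKMRGMCP | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| COSSO | 8 | 97 | 5 | 5 | 5 | 15 | 100 | 3 | 3 |
| LM + Lasso | 17 | 100 | 14 | 7 | 16 | 23 | 100 | 15 | 6 |
| LM + GLasso | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| LM + SGL | 17 | 100 | 14 | 7 | 16 | 23 | 100 | 15 | 7 |
| LM + MCP | 17 | 100 | 13 | 6 | 16 | 23 | 100 | 15 | 8 |
| LM + GMCP | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |

Entries are selection frequencies.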

Figure 2.

Five marginal estimates of important feature functions with 95% shaded confidence bands evaluated at 100 grid points while holding all other components equal to 0.5 in Scenario 2.

6. Data Example

To show the usefulness of our proposed methodology, we analyzed data of 550 children recruited by the ELEMENT study [35], who consented to wear an actigraph (ActiGraph GT3X+; ActiGraph LLC, Pensacola, FL, USA). This wearable was to be placed on their non-dominant wrist for five to seven days with no interruption. The actigraph measured tri-axis accelerometer data sampled at 30 Hz, which captured three different directions of a person’s movement. The BMI was the outcome of interest as it is a biomarker of obesity. Sex and age were confounding factors used in the analysis. Due to some missing data, our analysis only included children who wore the device properly for 85% or more of the study period, which resulted in 395 participants, consisting of 189 males and 206 females. Other studies such as [36] have excluded days of accelerometer data with more than five percent missing. The mean ± SD BMI of the study cohort was 21.5 ± 4.1. The mean ± SD age of the study participants was 14.3 ± 2.1 y. A more detailed description of the dataset used for this paper can be found in [37].

Our primary interest was to see if the BMI is associated with physical activity in the presence of other covariates, specifically sex and age. We preprocessed the activity counts by taking, for each 1 min epoch of the day, the median over the 7 d of wear. For example, since all the participants started wearing the device at 3 p.m., the first data point for each individual was the median of 7 ACs (one for each day) for the 1 min epoch of 3:00–3:01 p.m. This procedure of taking medians across the same minute on different days has been considered in other applications such as [36]. See Figure 3 as an example of the resulting time series of medians derived from the AC data displayed in Figure 1.
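The minute-by-minute median preprocessing can be sketched as below. This is a minimal illustration assuming one axis of activity counts is stored as a single vector of per-minute values starting at the first worn minute; the function name `minute_medians` and the simulated counts are hypothetical.

```python
import numpy as np

def minute_medians(counts, minutes_per_day=1440):
    """Median activity count for each minute of the day across complete days.

    counts : 1-D sequence of per-minute activity counts (NaN marks missing data)
    """
    counts = np.asarray(counts, dtype=float)
    n_days = counts.size // minutes_per_day
    # Drop any incomplete trailing day, then stack days as rows
    by_day = counts[: n_days * minutes_per_day].reshape(n_days, minutes_per_day)
    # Median over days for each 1 min epoch, ignoring missing values
    return np.nanmedian(by_day, axis=0)

# Toy example: 7 days of simulated counts for one axis
rng = np.random.default_rng(5)
week = rng.poisson(lam=30, size=7 * 1440)
profile = minute_medians(week)  # one 24 h curve with 1440 points
```

Applying the same function to each of the three axes would give the three functional predictors per participant.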

Figure 3.

The 24 h minute-by-minute medians of 7 d ACs for one subject.

We applied the following five models, labeled as M0–M4 for convenience, to analyze the data with the 24 h median ACs as functional predictors. Let ξijk be the ith person’s kth FPC score for functional predictor j.

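  • M0:

    Linear model (LM) with only the fixed features: $\mathrm{BMI}_i \sim \beta_0+\beta_1\mathrm{Age}_i+\beta_2\mathrm{Sex}_i$;

  • M1:

    Linear model with SGL penalty (LM+SGL) using the FPCA features: $\mathrm{BMI}_i \sim \beta_0+\beta_1\mathrm{Age}_i+\beta_2\mathrm{Sex}_i+\sum_{j=1}^{3}\sum_{k=1}^{s_k}\beta_{jk}\xi_{ijk}$;

  • M2:

    LSKM using the FPCA features: $\mathrm{BMI}_i \sim \beta_0+\beta_1\mathrm{Age}_i+\beta_2\mathrm{Sex}_i+h(z_i)$;

  • M3:

    FKMR model with SGL penalty (FKMRSGL) using the FPCA features: $\mathrm{BMI}_i \sim \beta_0+\beta_1\mathrm{Age}_i+\beta_2\mathrm{Sex}_i+h(\gamma\circ z_i)$;

  • M4:

    COSSO using the FPCA features: $\mathrm{res}(\mathrm{BMI}_i)\mid z_i \sim \sum_{j=1}^{3}\sum_{k=1}^{s_k}f_{ij}(\xi_{ijk})$. In order to apply the COSSO R package directly, we used the residuals $\mathrm{res}(\mathrm{BMI}_i)=\mathrm{BMI}_i-(\hat{\beta}_0+\hat{\beta}_1\mathrm{Age}_i+\hat{\beta}_2\mathrm{Sex}_i)$ in the COSSO model fit, with $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\beta}_2$ being the estimates of the coefficients from Model M0.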

The BMI and age were mean centered and scaled to have a standard deviation of one, so β0 was absent in the models. Here are some key findings from the data analyses. First, in terms of the goodness-of-fit, Table 5 suggests that M3, i.e., our proposed model FKMR with the SGL penalty, gave the best performance: the adjusted $R^2$ of M3 (0.30) was about 1.7 times that of the next-best model, M2 (0.18), and more than twice that of M0, M1, and M4. Second, it is interesting to note that neither COSSO nor FKMRSGL selected the FPC scores associated with the Z-axis. Third, as shown in Table 6, all of the FPC components chosen by COSSO were also chosen by FKMRSGL. It is worth noting that the linear model together with the SGL penalty selected the highest number of FPC components, yet had the lowest adjusted $R^2$ among the models that used the FPC features.

Table 5.

Goodness-of-fit for the five models used in the data analysis.

| Model | Adjusted $R^2$ |
|---|---|
| M0: LM | 0.07 |
| M1: LM + SGL | 0.13 |
| M2: LSKM | 0.18 |
| M3: FKMRSGL | 0.30 |
| M4: COSSO | 0.14 |

Table 6.

Axis-specific FPC feature selection.

Model X-Axis Y-Axis Z-Axis
ζ11^ ζ21^ ζ31^ ζ41^ ζ51^ ζ61^ ζ12^ ζ22^ ζ32^ ζ42^ ζ52^ ζ13^ ζ23^ ζ33^ ζ43^
FKMRSGL
COSSO
LM + SGL

7. Conclusions

In this paper, we proposed a method to model the nonlinear relationship between multiple functional predictors and a scalar outcome in the presence of other scalar confounders. We used the FPCA to decompose the functional predictors for feature extraction and used the LSKM framework to model the functional relationship between the outcome and principal components. We developed a simultaneous procedure to select important functional predictors and important features within selected functionals. We proposed a computationally efficient algorithm to implement our regularization method, which was easily programmed in R with the utility of multiple existing R packages. It should be noted that although we focused on functional regression in this paper, the method proposed can be applied to non-functional predictors. In effect, by using functional principal components, we essentially bypassed the infinite-dimensional problem and worked effectively in a non-functional framework with the FPC features. Through simulation and using data from the ELEMENT dataset, we demonstrated how the FKMR estimator outperformed existing methods in terms of both variable selection and model fit. It should be noted that the existing COSSO method did perform well in terms of variable selection, as shown in Section 5.

A technical issue pertains to identifiability limitations with regard to the bandwidth parameter and to the RKHS estimator. To overcome this, we suggested fixing the bandwidth parameter; see the detailed discussion in Section 3. We established key theoretical guarantees for our proposed estimator. In the case where there are multiple proposed estimators (and thus the identifiability issues arise), the established theoretical properties in Section 4 apply to any of those estimators.

Variable selection on functional predictors presents many technical challenges, and there are many methodological problems that remain unsolved. This paper demonstrated a possible framework to regularize estimation with a bi-level sparsity of functional group sparsity and within-group sparsity. In the LSKM paper [23], it was briefly mentioned that if the relationship between the scalar outcome and p genetic pathways is additive, we can tweak the model as $y_i=x_i^\top\beta+h_1(z_{i1})+\cdots+h_p(z_{ip})+\epsilon_i$, where each $h_j$ belongs to its own RKHS. It is easy to extend our method and algorithms to handle this case. For future research, an extension to longitudinal outcomes may be considered via a mixed-effects model $y_{ij}=x_i^\top\beta+h(z_{ij})+u_{ij}^\top v_i+\epsilon_{ij}$, where $u_{ij}^\top v_i$ are the random effects. Other useful extensions to the proposed paradigm would be along the lines of generalized linear models and Cox regression models.

Appendix A. Technical Assumptions and Proofs

Appendix A.1. Proof of Lemma 1

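It suffices to show that for any $J_1(h,\gamma,\beta)$ in (5) we can always find $\alpha\in\mathbb{R}^n$ such that $J_1(\tilde{h}=\sum_{i=1}^{n}\alpha_iK(\cdot,\gamma\circ z_i),\gamma,\beta)\le J_1(h,\gamma,\beta)$, where $\tilde{h}$ is the projection of $h$ onto the linearly spanned space given by $\mathrm{span}\{K(\cdot,\gamma\circ z_1),\dots,K(\cdot,\gamma\circ z_n)\}$. For any $h$ we can write $h=h^{\perp}+\tilde{h}$, where $h^{\perp}\perp\mathrm{span}\{K(\cdot,\gamma\circ z_1),\dots,K(\cdot,\gamma\circ z_n)\}$. Since $\mathcal{H}_K$ is a reproducing kernel Hilbert space, we can rewrite (5) as follows:

$$J_1(h,\gamma,\beta)=\frac{1}{2n}\sum_{i=1}^{n}\left\{y_i-x_i^\top\beta-\langle h,K(\cdot,\gamma\circ z_i)\rangle\right\}^2+\frac{1}{2}\lambda_1\|h\|_{\mathcal{H}_K}^2+\lambda_2\rho(\gamma;\delta).$$

Since $\langle h^{\perp},K(\cdot,\gamma\circ z_i)\rangle=0$ for every $i$, we obtain

$$\begin{aligned}J_1(h,\gamma,\beta)&=\frac{1}{2n}\sum_{i=1}^{n}\Big(y_i-x_i^\top\beta-\sum_{k=1}^{n}\alpha_kK(\gamma\circ z_i,\gamma\circ z_k)\Big)^2+\frac{1}{2}\lambda_1\|h^{\perp}+\tilde{h}\|_{\mathcal{H}_K}^2+\lambda_2\rho(\gamma;\delta)\\&\ge\frac{1}{2n}\sum_{i=1}^{n}\Big(y_i-x_i^\top\beta-\sum_{k=1}^{n}\alpha_kK(\gamma\circ z_i,\gamma\circ z_k)\Big)^2+\frac{1}{2}\lambda_1\|\tilde{h}\|_{\mathcal{H}_K}^2+\lambda_2\rho(\gamma;\delta)=J_1(\tilde{h},\gamma,\beta).\end{aligned}$$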

Appendix A.2. Proof of Lemma 2

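The equivalence of forms becomes clear once we rewrite (6) in matrix notation. Equation (6) can be written as follows:

$$\min_{\alpha,\beta,\gamma}J_2(\alpha,\beta,\gamma)=\min_{\alpha,\beta,\gamma}\frac{1}{2n}\|Y-X\beta-K(\gamma;Z)\alpha\|_2^2+\frac{1}{2}\lambda_1\alpha^\top K(\gamma;Z)\alpha+\lambda_2\rho(\gamma;\delta). \tag{A1}$$

For fixed $\alpha$, $\beta$ and $\lambda_1$, minimizing the function in (A1) with respect to $\gamma$ is equivalent to

$$\min_{\gamma}\frac{1}{2n}\Big\|Y-X\beta-\frac{n}{2}\lambda_1\alpha-K(\gamma;Z)\alpha\Big\|_2^2+\lambda_2\rho(\gamma;\delta). \tag{A2}$$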
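The step from (A1) to (A2) is a completing-the-square argument. The short derivation below is a sketch of that step, assuming that $K(\gamma;Z)$ is symmetric and writing $r=Y-X\beta$ and $K=K(\gamma;Z)$ as shorthand (both abbreviations are introduced here for convenience only).

```latex
\begin{aligned}
\Big\| r - \tfrac{n}{2}\lambda_1\alpha - K\alpha \Big\|_2^2
 &= \| r - K\alpha \|_2^2
    - n\lambda_1\,\alpha^\top (r - K\alpha)
    + \tfrac{n^2\lambda_1^2}{4}\,\|\alpha\|_2^2 \\
 &= \| r - K\alpha \|_2^2
    + n\lambda_1\,\alpha^\top K\alpha
    - n\lambda_1\,\alpha^\top r
    + \tfrac{n^2\lambda_1^2}{4}\,\|\alpha\|_2^2 .
\end{aligned}
```

Dividing by $2n$ gives $\frac{1}{2n}\|r-K\alpha\|_2^2+\frac{1}{2}\lambda_1\alpha^\top K\alpha$ plus two terms that do not involve $\gamma$, so for fixed $\alpha$, $\beta$ and $\lambda_1$ the two objectives share the same minimizer in $\gamma$.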

Appendix A.3. Proof of Theorem 1

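Without loss of generality, we use the penalty function for the sparse group lasso, but this proof can easily be modified for other penalty functions. Also, we fix $\lambda_1=\lambda_2=\delta=1$, consider $\beta\in\mathbb{R}$, and take the design matrix $X$ (a vector in this case) scaled to have norm 1. The case of $\beta\in\mathbb{R}^q$ follows along similar lines of argument. Let $\gamma\in D_3$ with $D_3=\{\gamma:\|\gamma\|_1\le\frac{1}{2n}\|Y\|_2^2\}$. Define $f(\gamma)=\|K(\gamma;Z)\|=\eta_{\max}(K(\gamma;Z))\ge 0$, where $\eta_{\max}(K(\gamma;Z))$ denotes the largest eigenvalue of $K(\gamma;Z)$ and the operator norm is defined in the usual way, $\|K(\gamma;Z)\|=\sup\{\|K(\gamma;Z)x\|_2:\|x\|_2=1\}$. Since $D_3$ is compact and $K(\gamma;Z)$ is continuous with respect to $\gamma$, $f$ achieves its maximum over $D_3$. Thus, we define $\eta=\sup_{\gamma\in D_3}f(\gamma)\ge 0$. Define $D_2=\{\beta:|\beta|\le(1+\eta)\|Y\|_2\}$, where the upper bound is denoted by $b=(1+\eta)\|Y\|_2\ge 0$. Moreover, define $D_1=\{\alpha:\|\alpha\|_2\le n(\|Y\|_2+b)\}$.

Since $D_1$, $D_2$ and $D_3$ are compact, there exists $(\alpha^*,\beta^*,\gamma^*)$ such that $J_2(\alpha^*,\beta^*,\gamma^*)\le J_2(\alpha,\beta,\gamma)$ for all $(\alpha,\beta,\gamma)\in D_1\times D_2\times D_3$. Note that $J_2(0,0,0)=\frac{1}{2n}\|Y\|_2^2$ and $(0,0,0)\in D_1\times D_2\times D_3$. We claim that $(\alpha^*,\beta^*,\gamma^*)$ is a global minimizer, which is proved below by contradiction.

Suppose that there exists $(\tilde{\alpha},\tilde{\beta},\tilde{\gamma})\notin D_1\times D_2\times D_3$ where $J_2(\tilde{\alpha},\tilde{\beta},\tilde{\gamma})<J_2(\alpha^*,\beta^*,\gamma^*)$. We must have that $\tilde{\gamma}\in D_3$; if not, we have $J_2(\tilde{\alpha},\tilde{\beta},\tilde{\gamma})\ge\|\tilde{\gamma}\|_1>J_2(0,0,0)\ge J_2(\alpha^*,\beta^*,\gamma^*)$. Let $q_1,\dots,q_n$ be the orthonormal eigenvectors of $K(\tilde{\gamma};Z)$ with associated eigenvalues $\eta_1\ge\cdots\ge\eta_n\ge 0$. We can write out $\tilde{\alpha}$, $X$, $Y$ in terms of this basis, where $\tilde{\alpha}=\sum_{i=1}^{n}\langle\tilde{\alpha},q_i\rangle q_i$, $Y=\sum_{i=1}^{n}\langle Y,q_i\rangle q_i$ and $X=\sum_{i=1}^{n}\langle X,q_i\rangle q_i$. Let $C_i^{\tilde{\alpha}}=\langle\tilde{\alpha},q_i\rangle$, $C_i^{Y}=\langle Y,q_i\rangle$ and $C_i^{X}=\langle X,q_i\rangle$. It follows that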

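$$J_2(\tilde{\alpha},\tilde{\beta},\tilde{\gamma})\ge\frac{1}{2n}\Big\|\sum_{i=1}^{n}C_i^{Y}q_i-\sum_{i=1}^{n}C_i^{X}\tilde{\beta}q_i-\sum_{i=1}^{n}C_i^{\tilde{\alpha}}\eta_iq_i\Big\|_2^2+\frac{1}{2}\sum_{i=1}^{n}(C_i^{\tilde{\alpha}})^2\eta_i,$$

which is equal to $\frac{1}{2n}\sum_{i=1}^{n}(C_i^{Y}-C_i^{X}\tilde{\beta}-C_i^{\tilde{\alpha}}\eta_i)^2+\frac{1}{2}\sum_{i=1}^{n}(C_i^{\tilde{\alpha}})^2\eta_i$. We can minimize the above objective function with respect to $C_i^{\tilde{\alpha}}$ and $\tilde{\beta}$. First, note that for any $\eta_i=0$ we can let $C_i^{\tilde{\alpha}}=0$ as it will not affect the expression above. It is sufficient to consider $\eta_i>0$. Taking the first derivative and setting it equal to zero, we obtain the score equations that the minimizing $\tilde{\beta}$ and $C_i^{\tilde{\alpha}}$ must satisfy:

$$\tilde{\beta}=\sum_{i=1}^{n}C_i^{X}(C_i^{Y}-C_i^{\tilde{\alpha}}\eta_i), \tag{A3}$$

$$C_i^{\tilde{\alpha}}=\frac{1}{n+\eta_i}(C_i^{Y}-C_i^{X}\tilde{\beta}). \tag{A4}$$

In the above derivation we used the fact that $1=\|X\|_2^2=\sum_{i=1}^{n}(C_i^{X})^2$. Plugging (A4) into (A3), we obtain

$$\tilde{\beta}=\frac{\sum_{i=1}^{n}C_i^{X}C_i^{Y}\big(1-\frac{\eta_i}{n+\eta_i}\big)}{1-\sum_{i=1}^{n}(C_i^{X})^2\frac{\eta_i}{n+\eta_i}}. \tag{A5}$$

It follows that

$$|\tilde{\beta}|\le\frac{\big|\sum_{i=1}^{n}C_i^{X}C_i^{Y}\big|}{1-\sum_{i=1}^{n}(C_i^{X})^2\frac{\eta}{n+\eta}}\le\frac{\|X\|_2\|Y\|_2}{\|X\|_2^2\big(1-\frac{\eta}{n+\eta}\big)}\le\frac{\|Y\|_2}{1-\frac{\eta}{1+\eta}}=b.$$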

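Thus, the $\beta$ that minimizes $J_2$ for a given $\gamma\in D_3$ is in $D_2$. Also, (A4) implies that $|C_i^{\tilde{\alpha}}|\le(\|Y\|_2+\|X\|_2\|\beta\|_2)$; consequently, the optimal $\alpha$ for the given $\tilde{\gamma}\in D_3$ and $\beta\in D_2$ that minimizes $J_2$ satisfies $\|\alpha\|_2\le n(\|Y\|_2+b)$. As a result, $\alpha\in D_1$. This shows that for any $(\tilde{\alpha},\tilde{\beta},\tilde{\gamma})\notin D_1\times D_2\times D_3$ we can find an $(\alpha,\beta,\gamma)\in D_1\times D_2\times D_3$ such that $J_2(\tilde{\alpha},\tilde{\beta},\tilde{\gamma})\ge J_2(\alpha,\beta,\gamma)$, which contradicts the supposition.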
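The score equations (A3) and (A4) can be checked numerically for a fixed γ. The sketch below is illustrative only: it assumes a Gaussian kernel of the form exp(−‖a − b‖²/bandwidth) with a hypothetical bandwidth of one, simulated inputs, and the same simplifications used in the proof (λ1 = 1, a scalar β, and X scaled to norm 1).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
z = rng.uniform(size=(n, 3))
gamma = np.array([1.0, 0.5, 0.0])          # a fixed scaling vector (illustrative)
x = rng.normal(size=n)
x /= np.linalg.norm(x)                      # design vector scaled to norm 1
y = 2 * x + np.cos(2 * np.pi * z[:, 0]) + rng.normal(scale=0.1, size=n)

def gaussian_kernel(a, bandwidth=1.0):
    """Gram matrix exp(-||a_i - a_k||^2 / bandwidth); the kernel form is an assumption."""
    sq = np.sum((a[:, None, :] - a[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / bandwidth)

K = gaussian_kernel(z * gamma)              # K(gamma; Z) built from gamma * z_i

# Closed-form minimizer of (1/2n)||y - x*beta - K alpha||^2 + (1/2) alpha' K alpha
V = K + n * np.eye(n)
beta = (x @ np.linalg.solve(V, y)) / (x @ np.linalg.solve(V, x))
alpha = np.linalg.solve(V, y - x * beta)

# Gradients of the objective at (alpha, beta); both should vanish
resid = y - x * beta - K @ alpha
grad_beta = -(x @ resid) / n
grad_alpha = -(K @ resid) / n + K @ alpha

# Eigenbasis form of (A4): C_i^alpha = (C_i^Y - C_i^X beta) / (n + eta_i)
eta, Q = np.linalg.eigh(K)
c_alpha_direct = Q.T @ alpha
c_alpha_a4 = (Q.T @ y - (Q.T @ x) * beta) / (n + eta)
```

The coefficients in the eigenbasis agree with (A4), which is the componentwise version of solving the linear system with `V`.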

Appendix A.4. Proof of Theorem 2

By Lemma 8.4 on page 129 in [32], Assumptions 1, 2, and 3 imply:

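$$P\left(\sup_{b\in B}\frac{\big|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\epsilon_ib(z_i)\big|}{\|b\|_{P_n}^{1-\psi}}\ge T\right)\le c\exp\left(-\frac{T^2}{c^2}\right),\quad T\ge c, \tag{A6}$$

where the constant $c$ is dependent on $C_1,C_2,C_3,C_4$, and $\psi$. It follows that

$$\sup_{b\in B}\frac{\big|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\epsilon_ib(z_i)\big|}{\|b\|_{P_n}^{1-\psi}}=O_p(1). \tag{A7}$$

Therefore, for any $h\in\mathcal{H}_K$ and a scaling map function $\Gamma\in\mathcal{A}$, we obtain

$$\frac{\sqrt{n}\,(\epsilon,h\circ\Gamma-h_0\circ\Gamma_0)_n}{\big(\|h\|_{\mathcal{H}_K}^2+\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma\|_{SGL}^2+\|\Gamma_0\|_{SGL}^2\big)^{\psi}\,\|h\circ\Gamma-h_0\circ\Gamma_0\|_{P_n}^{1-\psi}}=O_p(1). \tag{A8}$$

For our estimators, $\hat{h}$ and $\hat{\Gamma}$, it is easy to see that

$$(\epsilon,\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0)_n=O_p(n^{-\frac{1}{2}})\,\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n^{1-\psi}\big(\|\hat{h}\|_{\mathcal{H}_K}^2+\|h_0\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2+\|\Gamma_0\|_{SGL}^2\big)^{\psi}. \tag{A9}$$

From (A9), we obtain the following inequality:

$$\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n^2+\lambda_1\|\hat{h}\|_{\mathcal{H}_K}^2+\lambda_2\|\hat{\Gamma}\|_{SGL}^2\le O_p(n^{-\frac{1}{2}})\,\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n^{1-\psi}\big(\|\hat{h}\|_{\mathcal{H}_K}^2+\|h_0\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2+\|\Gamma_0\|_{SGL}^2\big)^{\psi}+\lambda_1\|h_0\|_{\mathcal{H}_K}^2+\lambda_2\|\Gamma_0\|_{SGL}^2. \tag{A10}$$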

We require $\lambda_1=O_p(1)\lambda_2$, namely $\lambda_2$ and $\lambda_1$ go to zero at the same rate. We will show at the end of the proof what happens if they are not of the same order. Therefore, without loss of generality, we set $\lambda_1=\lambda_2$, denoted by $\lambda$. In what follows, we divide (A10) into two cases.

Case 1: Suppose that

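$$O_p(n^{-\frac{1}{2}})\,\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n^{1-\psi}\big(\|\hat{h}\|_{\mathcal{H}_K}^2+\|h_0\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2+\|\Gamma_0\|_{SGL}^2\big)^{\psi}\ge\lambda\big(\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}^2\big).$$

In this case, we have

$$\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n^2+\lambda\big(\|\hat{h}\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2\big)\le O_p(n^{-\frac{1}{2}})\,\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n^{1-\psi}\big(\|\hat{h}\|_{\mathcal{H}_K}^2+\|h_0\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2+\|\Gamma_0\|_{SGL}^2\big)^{\psi}. \tag{A11}$$

Inequality (A11) is further discussed separately in two sub-cases.

Case 1a: If $\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}^2\le\|\hat{h}\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2$, then we have

$$\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n^2+\lambda\big(\|\hat{h}\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2\big)\le O_p(n^{-\frac{1}{2}})\,\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n^{1-\psi}\big(\|\hat{h}\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2\big)^{\psi}. \tag{A12}$$

Therefore,

$$\big(\|\hat{h}\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2\big)^{\psi}\le O_p\big(n^{-\frac{\psi}{2(1-\psi)}}\big)\,\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n^{\psi}\,\lambda^{-\frac{\psi}{1-\psi}}. \tag{A13}$$

It follows that

$$\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n=O_p\big(n^{-\frac{1}{2(1-\psi)}}\big)\,O_p\big(\lambda^{-\frac{\psi}{1-\psi}}\big),\qquad\|\hat{h}\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2=O_p\big(n^{-\frac{1}{1-\psi}}\big)\,O_p\big(\lambda^{-\frac{1+\psi}{1-\psi}}\big). \tag{A14}$$

Case 1b: If $\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}^2>\|\hat{h}\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2$, then:

$$\|\hat{h}\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2=O_p\big(\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}^2\big)\,O_p(1).$$

Therefore,

$$\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n=O_p\big(n^{-\frac{1}{2(1+\psi)}}\big)\big[\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}^2\big]^{\frac{\psi}{1+\psi}}.$$

Consequently, we obtain

$$\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n=O_p\big(n^{-\frac{1}{2(1-\psi)}}\big)\,O_p\big(\lambda^{-\frac{\psi}{1-\psi}}\big),\qquad\|\hat{h}\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2=O_p\big(n^{-\frac{1}{1-\psi}}\big)\,O_p\big(\lambda^{-\frac{1+\psi}{1-\psi}}\big). \tag{A15}$$

Both terms in (A15) have the same rates as those in (A14).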

Case 2: Suppose that

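$$O_p(n^{-\frac{1}{2}})\,\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n^{1-\psi}\big(\|\hat{h}\|_{\mathcal{H}_K}^2+\|h_0\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2+\|\Gamma_0\|_{SGL}^2\big)^{\psi}\le\lambda\big(\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}^2\big).$$

Then, we have

$$\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n^2+\lambda\big(\|\hat{h}\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2\big)\le 2\lambda\big(\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}^2\big).$$

This implies that

$$\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n=O_p(\lambda^{\frac{1}{2}})\big(\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}^2\big)^{\frac{1}{2}},\qquad\|\hat{h}\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2=O_p(1)\big(\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}^2\big). \tag{A16}$$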

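In order to make (A14) and (A16) have the same rates, we first equate the two terms $O_p(\lambda^{\frac{1}{2}})\big(\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}^2\big)^{\frac{1}{2}}$ and $O_p\big(n^{-\frac{1}{2(1-\psi)}}\big)O_p\big(\lambda^{-\frac{\psi}{1-\psi}}\big)$, and then solve for a common $\lambda$. The solution is given as follows:

$$\lambda^{-1}=n^{\frac{1}{1+\psi}}\big(\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}^2\big)^{\frac{1-\psi}{1+\psi}}.$$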
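The algebra behind this choice of λ can be spelled out as follows. This is a sketch that ignores the $O_p$ constants and uses the shorthand $S_0=\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}^2$, which is introduced here only to shorten the display.

```latex
\begin{aligned}
\lambda^{\frac{1}{2}} S_0^{\frac{1}{2}}
   &= n^{-\frac{1}{2(1-\psi)}}\,\lambda^{-\frac{\psi}{1-\psi}}
 \;\Longrightarrow\;
 \lambda^{\frac{1}{2}+\frac{\psi}{1-\psi}}
   = \lambda^{\frac{1+\psi}{2(1-\psi)}}
   = n^{-\frac{1}{2(1-\psi)}}\,S_0^{-\frac{1}{2}} \\
 &\;\Longrightarrow\;
 \lambda = n^{-\frac{1}{1+\psi}}\,S_0^{-\frac{1-\psi}{1+\psi}},
 \qquad
 \lambda^{\frac{1}{2}} S_0^{\frac{1}{2}}
   = n^{-\frac{1}{2(1+\psi)}}\,S_0^{\frac{1}{2}-\frac{1-\psi}{2(1+\psi)}}
   = n^{-\frac{1}{2(1+\psi)}}\,S_0^{\frac{\psi}{1+\psi}} .
\end{aligned}
```

The last expression is the rate for the prediction error under the common λ.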

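Under this $\lambda$ value, we obtain that (A14)–(A16) are of the form:

$$\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n=O_p\big(n^{-\frac{1}{2(1+\psi)}}\big)\big(\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}^2\big)^{\frac{\psi}{1+\psi}}, \tag{A17}$$

$$\|\hat{h}\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2=O_p(1)\big(\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_0\|_{SGL}^2\big). \tag{A18}$$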

This completes the proof of Theorem 2.

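Now we discuss the situation where the tuning parameters $\lambda_1$ and $\lambda_2$ are not of the same order. As seen below, the selection consistency may not be guaranteed. Take Case 2 as an example. Suppose that

$$O_p(n^{-\frac{1}{2}})\,\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n^{1-\psi}\big(\|\hat{h}\|_{\mathcal{H}_K}^2+\|h_0\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2+\|\Gamma_0\|_{SGL}^2\big)^{\psi}\le\lambda_1\|h_0\|_{\mathcal{H}_K}^2+\lambda_2\|\Gamma_0\|_{SGL}^2.$$

Let us consider two cases.

Case 2a: If $\lambda_1\|h_0\|_{\mathcal{H}_K}^2\le\lambda_2\|\Gamma_0\|_{SGL}^2$, following the same arguments above, we have

$$\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n=O_p(\lambda_2^{\frac{1}{2}})\|\Gamma_0\|_{SGL},\qquad\|\hat{h}\|_{\mathcal{H}_K}^2=O_p\Big(\frac{\lambda_2}{\lambda_1}\Big)\|\Gamma_0\|_{SGL}^2,\qquad\|\hat{\Gamma}\|_{SGL}^2=O_p(1)\|\Gamma_0\|_{SGL}^2. \tag{A19}$$

Case 2b: If $\lambda_1\|h_0\|_{\mathcal{H}_K}^2\ge\lambda_2\|\Gamma_0\|_{SGL}^2$, then following the same logic as before:

$$\|\hat{h}\circ\hat{\Gamma}-h_0\circ\Gamma_0\|_n=O_p(\lambda_1^{\frac{1}{2}})\|h_0\|_{\mathcal{H}_K},\qquad\|\hat{\Gamma}\|_{SGL}^2=O_p\Big(\frac{\lambda_1}{\lambda_2}\Big)\|h_0\|_{\mathcal{H}_K}^2,\qquad\|\hat{h}\|_{\mathcal{H}_K}^2=O_p(1)\|h_0\|_{\mathcal{H}_K}^2. \tag{A20}$$

The two cases involve $O_p(\lambda_1/\lambda_2)$ and $O_p(\lambda_2/\lambda_1)$, indicating that the two tuning parameters $\lambda_1$ and $\lambda_2$ should go to zero at the same rate. Moreover, we can think of our estimator $\hat{h}\circ\hat{\Gamma}$ as one operational object. See Appendix B for more details on this, which can further explain the need for one rate for the two penalties.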

Appendix A.5. Proof of Corollary 1

For convenience, we present the following lemma proved by [32] (on page 20).

Lemma A1.

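(Geer’s Lemma) A $d$-dimensional ball of radius $R$, $B_d(R)$, in $\mathbb{R}^d$ with the Euclidean metric can be covered by $\big(\frac{4R+\delta}{\delta}\big)^d$ balls of radius $\delta$.

We have shown in the proof of Theorem 1 that the optimal $\gamma$ vector is restricted to be within a ball of a radius that depends on the norm of $Y$. For the sake of simplicity, let us confine our $\gamma$ to be within a norm ball of radius 1, $\gamma\in G=\{\gamma:\|\gamma\|_2^2\le 1\}$. We then confine our set, which we called $\mathcal{A}$, to those $\gamma$, that is, $\mathcal{A}=\{\Gamma:\Gamma(z)=\gamma\circ z,\gamma\in G\}$. Since our $\gamma\in\mathbb{R}^s$, we can use Lemma A1 and cover our set $\mathcal{A}$ with $N_1=\big(\frac{4+\delta}{\delta}\big)^s$ functions in the following sense. The ball of radius 1 in $\mathbb{R}^s$ can be covered (using the Euclidean metric) by balls centered at $\{\gamma_1,\dots,\gamma_{N_1}\}$. Since there is a one-to-one relationship between the functions $\Gamma$ and $\gamma$, take the set $\{\Gamma_1,\dots,\Gamma_{N_1}\}$ and define the metric between some $\Gamma_j$ and $\Gamma_k$ in the set $\mathcal{A}$ as $d(\Gamma_j,\Gamma_k)=\|\gamma_j-\gamma_k\|_2$. Then, the set of functions $\{\Gamma_1,\dots,\Gamma_{N_1}\}$ is a $\delta$-covering for $\mathcal{A}$ under this metric with entropy $s\log\big(\frac{4+\delta}{\delta}\big)$.

For each $\Gamma_j$ we have an induced RKHS, $\mathcal{H}_{K_{\Gamma_j}}=\{h\circ\Gamma_j:h\in\mathcal{H}_K\}$, with entropy no larger than that of $\mathcal{H}_K$, which according to the assumption has entropy $A\delta^{-2\psi}$ for some $\psi\in(0,1)$ and $A\in\mathbb{R}$. Therefore, the covering number $N_2=N(\delta,\mathcal{H}_{K_{\Gamma_j}},P_n)\le\exp\{A\delta^{-2\psi}\}$. This implies that for every $\Gamma_j$ there exists a set $\{h_{j1}\circ\Gamma_j,\dots,h_{jN_2}\circ\Gamma_j\}$ such that for every $h\circ\Gamma_j\in\mathcal{H}_{K_{\Gamma_j}}$ there exists an integer $i\in\{1,\dots,N_2\}$ with $\|h\circ\Gamma_j-h_{ji}\circ\Gamma_j\|_{P_n}\le\delta$. Set $B$ is essentially the union of the different Hilbert spaces of the form $\mathcal{H}_{K_\Gamma}$. Under this setup, a natural estimate of the $\delta$-covering number of this set would be approximately of size $N_1\times N_2$, where the functions take the form $\{h_{11}\circ\Gamma_1,\dots,h_{1N_2}\circ\Gamma_1,\dots,h_{N_11}\circ\Gamma_{N_1},\dots,h_{N_1N_2}\circ\Gamma_{N_1}\}$. In addition, we add $N_2$ functions from the set $\{h_1\circ\Gamma_0,\dots,h_{N_2}\circ\Gamma_0\}$, where $\Gamma_0$ is the true $\Gamma_0$ (or one of the true $\Gamma_0$). Since $\mathcal{H}_{K_{\Gamma_j}}$ is a Hilbert space for every $j$, if $h\circ\Gamma_j\in\mathcal{H}_{K_{\Gamma_j}}$ then so is $\frac{h\circ\Gamma_j}{\|h\|_{\mathcal{H}_K}^2+\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_j\|_{SGL}^2+\|\Gamma_0\|_{SGL}^2}$. We can simply ignore the denominator and substitute $\frac{h\circ\Gamma_j}{\|h\|_{\mathcal{H}_K}^2+\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_j\|_{SGL}^2+\|\Gamma_0\|_{SGL}^2}$ with $\tilde{h}\circ\Gamma_j\in\mathcal{H}_{K_{\Gamma_j}}$, where $\tilde{h}=\frac{h}{\|h\|_{\mathcal{H}_K}^2+\|h_0\|_{\mathcal{H}_K}^2+\|\Gamma_j\|_{SGL}^2+\|\Gamma_0\|_{SGL}^2}$.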

We now prove Corollary 1.

Proof. 

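Set $M=\sup_{h}\langle\nabla h(z),\nabla h(z)\rangle$, where the inner product is the standard Euclidean inner product. This is for a fixed $z$; alternatively, under the assumption that the gradient is uniformly bounded, we can take $\sup_{h\in\mathcal{H}_K,\,z\in\mathbb{R}^s}\langle\nabla h(z),\nabla h(z)\rangle$. Let $N_1=\Big(\frac{4+\frac{\delta}{3M^{1/2}}}{\frac{\delta}{3M^{1/2}}}\Big)^s$, which is the number of balls needed to provide a $\frac{\delta}{3M^{1/2}}$ covering for a norm 1 ball in $\mathbb{R}^s$. Let $N_2=\exp\big\{A\big(\frac{\delta}{3}\big)^{-2\psi}\big\}$, which is the covering number needed to provide a $\frac{\delta}{3}$ cover of our space $\mathcal{H}_K$. Let:

$$\tilde{\hat{h}}\circ\hat{\Gamma}-\tilde{h}_0\circ\Gamma_0=\frac{\hat{h}\circ\hat{\Gamma}}{\|\hat{h}\|_{\mathcal{H}_K}^2+\|h_0\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2+\|\Gamma_0\|_{SGL}^2}-\frac{h_0\circ\Gamma_0}{\|\hat{h}\|_{\mathcal{H}_K}^2+\|h_0\|_{\mathcal{H}_K}^2+\|\hat{\Gamma}\|_{SGL}^2+\|\Gamma_0\|_{SGL}^2}$$

be an arbitrary function in the set $B$. There exists a $\Gamma_j$ with $j\in\{1,\dots,N_1\}$ such that $d(\Gamma_j,\hat{\Gamma})\le\frac{\delta}{3\max_{i=1,\dots,n}\|z_i\|_2\sqrt{M}}$, and there exists an $i\in\{1,\dots,N_2\}$ such that $\|\tilde{\hat{h}}\circ\Gamma_j-h_{ji}\circ\Gamma_j\|_{P_n}\le\frac{\delta}{3}$.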

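Similarly, there exists a $t\in\{1,\dots,N_2\}$ such that $\|\tilde{h}_0\circ\Gamma_0-h_t\circ\Gamma_0\|_{P_n}\le\frac{\delta}{3}$. We construct our approximating function of $\tilde{\hat{h}}\circ\hat{\Gamma}-\tilde{h}_0\circ\Gamma_0$ as $h_{ji}\circ\Gamma_j-h_t\circ\Gamma_0$. We now show that this function is within $\delta$ of our arbitrary function $\tilde{\hat{h}}\circ\hat{\Gamma}-\tilde{h}_0\circ\Gamma_0$. Applying the mean value theorem for multivariate functions, $\tilde{\hat{h}}\circ\hat{\Gamma}(z)=\tilde{\hat{h}}\circ\Gamma_j(z)+\nabla\tilde{\hat{h}}(C(z))\cdot(\hat{\Gamma}(z)-\Gamma_j(z))$, we have:

$$\begin{aligned}\big\|(\tilde{\hat{h}}\circ\hat{\Gamma}-\tilde{h}_0\circ\Gamma_0)-(h_{ji}\circ\Gamma_j-h_t\circ\Gamma_0)\big\|_{P_n}&\le\|\tilde{\hat{h}}\circ\hat{\Gamma}-h_{ji}\circ\Gamma_j\|_{P_n}+\|\tilde{h}_0\circ\Gamma_0-h_t\circ\Gamma_0\|_{P_n}\\&\le\|\tilde{\hat{h}}\circ\hat{\Gamma}-h_{ji}\circ\Gamma_j\|_{P_n}+\frac{\delta}{3}\\&=\big\|\tilde{\hat{h}}\circ\Gamma_j-h_{ji}\circ\Gamma_j+\nabla\tilde{\hat{h}}(C(\cdot))\cdot(\hat{\Gamma}-\Gamma_j)\big\|_{P_n}+\frac{\delta}{3},\end{aligned}$$

where, for each $z\in\mathbb{R}^s$, the vector $C(z)\in\mathbb{R}^s$ lies on the segment between $\gamma_j\circ z$ and $\hat{\gamma}\circ z$, and $C(\cdot)$ is an unknown function that maps from $\mathbb{R}^s$ into $\mathbb{R}^s$ for which the formula holds. Continuing our chain of inequalities, we obtain:

$$\begin{aligned}\big\|\tilde{\hat{h}}\circ\Gamma_j-h_{ji}\circ\Gamma_j+\nabla\tilde{\hat{h}}(C(\cdot))\cdot(\hat{\Gamma}-\Gamma_j)\big\|_{P_n}+\frac{\delta}{3}&\le\big\|\nabla\tilde{\hat{h}}(C(\cdot))\cdot(\hat{\Gamma}-\Gamma_j)\big\|_{P_n}+\frac{\delta}{3}+\frac{\delta}{3}\\&=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\Big(\nabla\tilde{\hat{h}}(C(z_i))\cdot(\hat{\Gamma}(z_i)-\Gamma_j(z_i))\Big)^2}+\frac{\delta}{3}+\frac{\delta}{3}\\&\le\sqrt{\frac{1}{n}\sum_{i=1}^{n}M\|\hat{\gamma}\circ z_i-\gamma_j\circ z_i\|_2^2}+\frac{\delta}{3}+\frac{\delta}{3}\\&\le\sqrt{M\Big(\frac{\delta}{3\max_{i=1,\dots,n}\|z_i\|_2\sqrt{M}}\Big)^2\max_{i=1,\dots,n}\|z_i\|_2^2}+\frac{\delta}{3}+\frac{\delta}{3}\\&=\frac{\delta}{3}+\frac{\delta}{3}+\frac{\delta}{3}=\delta.\end{aligned}$$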

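Therefore, to provide a $\delta$ cover we need $N_1\times N_2+N_2$ functions, or:

$$\exp\Big\{A\Big(\frac{\delta}{3}\Big)^{-2\psi}\Big\}\Big(\frac{4+\frac{\delta}{3M^{1/2}}}{\frac{\delta}{3M^{1/2}}}\Big)^s+\exp\Big\{A\Big(\frac{\delta}{3}\Big)^{-2\psi}\Big\}=\exp\{\tilde{A}\delta^{-2\psi}\}\Big(\frac{C+\delta}{\delta}\Big)^s+\exp\{\tilde{A}\delta^{-2\psi}\},$$

where $\tilde{A}=A3^{2\psi}$ and $C=12M^{1/2}$. Taking the log, we see the entropy is $\tilde{A}\delta^{-2\psi}+\log\big(\big(\frac{C+\delta}{\delta}\big)^s+1\big)$, which is of the same order as $\tilde{A}\delta^{-2\psi}$ (the log term is dominated by the first term). Therefore, a sufficient (but not necessary) condition for our set $B$ to have the same entropy as that of the original RKHS $\mathcal{H}_K$ is for $\sup_{h}\langle\nabla h(z),\nabla h(z)\rangle$ to be bounded. Having bounded derivatives is reasonable for any RKHS since every RKHS satisfies the Lipschitz condition of the form:

$$|h(X)-h(Y)|=|\langle h,K_X\rangle-\langle h,K_Y\rangle|\le\|h\|_{\mathcal{H}_K}\langle K_X-K_Y,K_X-K_Y\rangle^{\frac{1}{2}}=\|h\|_{\mathcal{H}_K}\,d(X,Y),$$

where the distance metric in $\mathbb{R}^s$ is defined as $d(X,Y)^2=K(X,X)-2K(X,Y)+K(Y,Y)$. If we restrict our functions in the RKHS to those with norm at most $C$ for some constant $C$, then we have a universal Lipschitz constant $C$ to ensure bounded derivatives. □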
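The Lipschitz-type bound above can be checked numerically for functions of the form h = Σ c_i K(·, a_i). The sketch below assumes a Gaussian kernel with unit bandwidth and randomly drawn centers and coefficients, all of which are illustrative choices rather than settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def k(a, b):
    """Gaussian kernel with unit bandwidth (an assumed choice)."""
    return float(np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2)))

# A function in the RKHS: h = sum_i coef_i * K(., center_i)
centers = rng.uniform(size=(20, 2))
coef = rng.normal(size=20)
gram = np.array([[k(a, b) for b in centers] for a in centers])
h_norm = float(np.sqrt(coef @ gram @ coef))   # RKHS norm of h

def h(p):
    return float(sum(c * k(a, p) for c, a in zip(coef, centers)))

def d(p, q):
    """Kernel-induced distance: d^2 = K(p,p) - 2 K(p,q) + K(q,q)."""
    return float(np.sqrt(max(k(p, p) - 2.0 * k(p, q) + k(q, q), 0.0)))

# Compare |h(X) - h(Y)| with ||h|| * d(X, Y) on random pairs of points
pairs = rng.uniform(size=(200, 2, 2))
gaps = [abs(h(p) - h(q)) for p, q in pairs]
bounds = [h_norm * d(p, q) for p, q in pairs]
lipschitz_holds = all(g <= b + 1e-9 for g, b in zip(gaps, bounds))
```

The inequality is a direct consequence of the Cauchy–Schwarz inequality in the RKHS, so the check should pass for any kernel and any finite expansion.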

Appendix B. Discussion about the FKMR Estimator

We introduce $\gamma$ as a way of performing variable selection on our vector of FPC features. We want to illustrate this technical trick with some concrete examples and discuss identifiability issues with the resulting estimator. There are two ways of looking at the estimation of the unknown functions $h_0$ and $\Gamma_0$. The first way is to view our feature vector, $z$, as being related to the dependent variable $y$ through the composite function $h\circ\Gamma$, as explained in Section 4. The second and equivalent way is to view our features as unknown. The true features take the form of $\gamma\circ z$, where in this case $\circ$ denotes the Hadamard product. We are given $z$ and need to estimate the “true” features $\gamma\circ z$. In addition, we need to estimate the relationship between $\gamma\circ z$ and $y$, which is done through the function $h\in\mathcal{H}_K$.

The first way is to estimate the function $h_0\circ\Gamma_0$. The function belongs to the RKHS $\mathcal{H}_{K_\Gamma}$. We essentially consider many different function spaces to construct our estimator. The intersection between the function spaces is not necessarily empty, implying that our estimator may not be unique. We now make this discussion more formal. Let $K:\mathbb{R}^s\times\mathbb{R}^s\to\mathbb{R}$ be a positive definite function. Let $\Gamma:\mathbb{R}^s\to\mathbb{R}^s$. We define $K_\Gamma:\mathbb{R}^s\times\mathbb{R}^s\to\mathbb{R}$ as the function given by $K_\Gamma(s,t)=K(\Gamma(s),\Gamma(t))$. This new function, $K_\Gamma$, is positive definite. There is a relationship between the original RKHS, $\mathcal{H}_K$, and the new RKHS, $\mathcal{H}_{K_\Gamma}$: namely, $\mathcal{H}_{K_\Gamma}=\{h\circ\Gamma:h\in\mathcal{H}_K\}$. For any vector $u\in\mathcal{H}_{K_\Gamma}$, we have that $\|u\|_{\mathcal{H}_{K_\Gamma}}=\inf\{\|h\|_{\mathcal{H}_K}:u=h\circ\Gamma\}$. In general, $\mathcal{H}_{K_\Gamma}\neq\mathcal{H}_K$. In (5), we take the norm with respect to the original space $\mathcal{H}_K$. Our iterative procedure essentially presents the second way in which the true features are unknown, whereas our theoretical arguments are justified through the first way. Given the knowledge of the features (which translates to fixing a $\gamma$), we are confined to just one RKHS, $\mathcal{H}_K$.

Take the linear kernel, $K(x_1,x_2)=x_1^\top x_2$, as an example. Suppose the truth is that $y$ is related to a one-dimensional feature $z_1$ through the following formulation: $y=h_0(z_1)+\varepsilon$, where $h_0\in\mathcal{H}_{K_1}$ and $K_1$ is the kernel that maps from $\mathbb{R}\times\mathbb{R}\to\mathbb{R}$. Therefore, if we knew the feature $z_1$, we would proceed to optimize (6) using the standard LSKM. However, suppose that each $y$ is associated with a two-dimensional vector $z=(z_1,z_2)$, where $z_2$ is a “noisy” feature unrelated to $y$, and that a priori we do not know this information. Typically we use a model $y=h(z_1,z_2)+\varepsilon$, where $h\in\mathcal{H}_K$ and $K$ is the kernel that maps from $\mathbb{R}^2\times\mathbb{R}^2\to\mathbb{R}$. In this case, we introduce our $\gamma$ vector $(\gamma_1,\gamma_2)$ and formulate $y=h(\gamma_1z_1,\gamma_2z_2)+\varepsilon$. All functions $h$ in the space $\mathcal{H}_K$ are of the form $h(z)=x^\top z$ for some two-dimensional vector $x=(x_1,x_2)$. There is a one-to-one relationship between $h$ and $x$. The true function, $h_0$, has an associated real number $c$ where $h_0(z_1)=cz_1$. We can recover $h_0\in\mathcal{H}_{K_1}$ from our estimation of $h$ and $\gamma$ if we set $\gamma=(1,0)$ and $x=(c,\star)$, where “$\star$” is any real number. Equivalently, we can recover $h_0$ under $\gamma=(1,1)$ where $x=(c,0)$. There are many functions that may recover the original function in the RKHS corresponding to the linear space kernel. Formulating our problem in the first way, through function composition, we can estimate $\Gamma_0$ with the $\gamma$ being $(1,0)$ or $(1,1)$.
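The non-identifiability in the linear-kernel example can be seen in a few lines. The sketch below uses arbitrary illustrative values for c and for the free second coordinate; the helper `h` is a hypothetical name for the map z ↦ xᵀ(γ ∘ z).

```python
import numpy as np

rng = np.random.default_rng(4)
z = rng.normal(size=(10, 2))   # columns: signal feature z1, noisy feature z2
c = 1.7                        # slope of the true function h0(z1) = c * z1

def h(z, x, gamma):
    """Linear-kernel function applied to Hadamard-scaled features: x' (gamma * z)."""
    return (z * gamma) @ x

# gamma = (1, 0) with an arbitrary second coefficient
f_a = h(z, np.array([c, 123.0]), np.array([1.0, 0.0]))
# gamma = (1, 1) with the second coefficient set to zero
f_b = h(z, np.array([c, 0.0]), np.array([1.0, 1.0]))
target = c * z[:, 0]
```

Both parameterizations reproduce the same fitted values, so the data alone cannot distinguish γ = (1, 0) from γ = (1, 1); only the penalty on γ can express a preference.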

We can now see that in the intersection between $\mathcal{H}_{K_{\Gamma_1}}$ and $\mathcal{H}_{K_{\Gamma_2}}$, where $\Gamma_1$ has associated $\gamma_1=(1,0)$ and $\Gamma_2$ has associated $\gamma_2=(1,1)$, lies our estimate of $h_0$. In truth, for the linear space RKHS, there is no need to apply our method since $h_0\in\mathcal{H}_{K_1}$ can be estimated directly from the larger space $\mathcal{H}_K$, where we set $h(z)=x^\top z$ with $x=(c,0)$. We can never hope to have variable selection consistency, nor can we hope to have identifiability of our estimator, for these types of spaces. However, from a goodness-of-fit standpoint, we are able to do just as good a job with many types of function compositions. Our hope is that we can glean some variable selection by penalizing the $\gamma$ vector with the $\rho(\gamma;\delta)$ term which, going back to the above scenario, should give preference to $\gamma=(1,0)$ over $\gamma=(1,1)$. For the RKHS associated with the Gaussian kernel, the “larger dimensional space”, a Gaussian kernel mapping from higher dimensions, does not necessarily contain the functions from a “lower dimensional space”, a Gaussian kernel mapping from lower dimensions. However, through the introduction of the $\gamma$ transformation of the features, we can recover the equivalent functions of the “lower dimensional space”.
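For the Gaussian kernel, the effect of the γ transformation can be illustrated by comparing Gram matrices. This is a minimal sketch assuming a kernel of the form exp(−‖a − b‖²/bandwidth) with a hypothetical unit bandwidth and simulated features.

```python
import numpy as np

rng = np.random.default_rng(6)
z = rng.uniform(size=(8, 2))   # two features; suppose only the first matters

def gaussian_gram(a, bandwidth=1.0):
    """Gram matrix exp(-||a_i - a_k||^2 / bandwidth); the kernel form is an assumption."""
    sq = np.sum((a[:, None, :] - a[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / bandwidth)

K_low = gaussian_gram(z[:, :1])                      # kernel on the first feature only
K_scaled = gaussian_gram(z * np.array([1.0, 0.0]))   # gamma = (1, 0) on both features
K_full = gaussian_gram(z)                            # gamma = (1, 1)
```

Setting the second entry of γ to zero reproduces the lower-dimensional Gram matrix exactly, whereas the unscaled two-dimensional kernel gives a different matrix.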

Author Contributions

Conceptualization, P.X.S. and J.N.; Formal analysis, J.N.; Methodology, J.N. and P.X.S.; Supervision, P.X.S.; Writing—original draft, J.N.; Writing—review & editing, P.X.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by NSF DMS#2113564.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data on physical activity counts, BMI and demographic variables (sex and age) used in this study are available upon request through a formal data request procedure outlined by the ELEMENT Cohort Study. Contact the corresponding author of this paper for details.

Conflicts of Interest

The authors declare no conflict of interest.

Footnotes

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Chandler J.L., Brazendale K., Beets M.W., Mealing B.A. Classification of Physical Activity Intensities Using a Wrist-worn Accelerometer in 8–12-Year-old Children. Pediatric Obes. 2016;11:120–127. doi: 10.1111/ijpo.12033.
  • 2. Chen K.Y., Bassett D.R. The Technology of Accelerometry-based Activity Monitors: Current and Future. Med. Sci. Sport. Exerc. 2005;37:S490–S500. doi: 10.1249/01.mss.0000185571.49104.82.
  • 3. Bai J., Di C., Xiao L., Evenson K.R., LaCroix A.Z., Crainiceanu C.M., Buchner D.M. An Activity Index for Raw Accelerometry Data and Its Comparison with Other Activity Metrics. PLoS ONE. 2016;11:e0160644. doi: 10.1371/journal.pone.0160644.
  • 4. John D., Freedson P. ActiGraph and Actical Physical Activity Monitors: A Peek under the Hood. Med. Sci. Sport. Exerc. 2012;44:S86–S89. doi: 10.1249/MSS.0b013e3182399f5e.
  • 5. Kim Y., Lee J.M., Peters B.P., Gaesser G.A., Welk G.J. Examination of Different Accelerometer Cut-points for Assessing Sedentary Behaviors in Children. PLoS ONE. 2014;9:e90630. doi: 10.1371/journal.pone.0090630.
  • 6. Bai J., Sun Y., Schrack J.A., Crainiceanu C.M., Wang M.C. A Two-stage Model for Wearable Device Data. Biometrics. 2018;74:744–752. doi: 10.1111/biom.12781.
  • 7. Sasaki J.E., Hickey A.M., Staudenmayer J.W., John D., Kent J.A., Freedson P.S. Performance of Activity Classification Algorithms in Free-Living Older Adults. Med. Sci. Sport. Exerc. 2016;48:941–950. doi: 10.1249/MSS.0000000000000844.
  • 8. Di C.Z., Crainiceanu C.M., Caffo B.S., Punjabi N.M. Multilevel Functional Principal Component Analysis. Ann. Appl. Stat. 2009;3:458–488. doi: 10.1214/08-AOAS206.
  • 9. Goldsmith J., Liu X., Rundle A., Jacobson J. New Insights into Activity Patterns in Children, Found Using Functional Data Analyses. Med. Sci. Sport. Exerc. 2016;48:1723–1729. doi: 10.1249/MSS.0000000000000968.
  • 10. Li H., Keadle S.K., Staudenmayer J., Assaad H., Huang J.Z., Carroll R.J. Methods to Assess An Exercise Intervention Trial Based on 3-Level Functional Data. Biostatistics. 2015;16:754–771. doi: 10.1093/biostatistics/kxv015.
  • 11. Zhang Y., Li H., Keadle S.K., Matthews C.E., Carroll R.J. A Review of Statistical Analyses on Physical Activity Data Collected from Accelerometers. Stat. Biosci. 2019;11:465–476. doi: 10.1007/s12561-019-09250-6.
  • 12. Ramsay J.O., Silverman B.W. Functional Data Analysis. Springer; Berlin/Heidelberg, Germany: 2005. (Springer Series in Statistics).
  • 13. Cardot H., Ferraty F., Sarda P. Spline Estimators for the Functional Linear Model. Stat. Sin. 2003;13:571–591.
  • 14. Cardot H., Ferraty F., Sarda P. Functional Linear Model. Stat. Probab. Lett. 1999;45:11–22. doi: 10.1016/S0167-7152(99)00036-X.
  • 15. Zhu H., Yao F., Zhang H.H. Structured Functional Additive Regression in Reproducing Kernel Hilbert Spaces. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2014;76:581–603. doi: 10.1111/rssb.12036.
  • 16. Ferraty F., Mas A., Vieu P. Nonparametric Regression on Functional Data: Inference and Practical Aspects. Aust. N. Z. J. Stat. 2007;49:267–286. doi: 10.1111/j.1467-842X.2007.00480.x.
  • 17. McLean M.W., Hooker G., Staicu A.M., Scheipl F., Ruppert D. Functional Generalized Additive Models. J. Comput. Graph. Stat. 2014;23:249–269. doi: 10.1080/10618600.2012.729985.
  • 18. Bosq D. Linear Processes in Function Spaces. Springer; New York, NY, USA: 2000. (Lecture Notes in Statistics, Volume 149).
  • 19. Hall P., Müller H.G., Wang J.L. Properties of Principal Component Methods for Functional and Longitudinal Data Analysis. Ann. Stat. 2006;34:1493–1517. doi: 10.1214/009053606000000272.
  • 20. Hall P., Hosseini-Nasab M. On Properties of Functional Principal Components Analysis. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2006;68:109–126. doi: 10.1111/j.1467-9868.2005.00535.x.
  • 21. Müller H.G., Yao F. Functional Additive Models. J. Am. Stat. Assoc. 2008;103:1534–1544. doi: 10.1198/016214508000000751.
  • 22. Lin Y., Zhang H.H. Component Selection and Smoothing in Multivariate Nonparametric Regression. Ann. Stat. 2006;34:2272–2297. doi: 10.1214/009053606000000722.
  • 23. Liu D., Lin X., Ghosh D. Semiparametric Regression of Multidimensional Genetic Pathway Data: Least-Squares Kernel Machines and Linear Mixed Models. Biometrics. 2007;63:1079–1088. doi: 10.1111/j.1541-0420.2007.00799.x.
  • 24. Wood S.N. Generalized Additive Models: An Introduction with R. Chapman and Hall; London, UK: 2006.
  • 25. Lin X., Zhang D. Inference in Generalized Additive Mixed Models by Using Smoothing Splines. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 1999;61:381–400. doi: 10.1111/1467-9868.00183.
  • 26. Yuan M., Lin Y. Model Selection and Estimation in Regression with Grouped Variables. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2006;68:49–67. doi: 10.1111/j.1467-9868.2005.00532.x.
  • 27. Simon N., Friedman J., Hastie T., Tibshirani R. A Sparse-Group Lasso. J. Comput. Graph. Stat. 2013;22:231–245. doi: 10.1080/10618600.2012.681250.
  • 28. Breiman L. Better Subset Regression Using the Nonnegative Garrote. Technometrics. 1995;37:373–384. doi: 10.1080/00401706.1995.10484371.
  • 29. Salzo S., Villa S. Convergence Analysis of a Proximal Gauss–Newton Method. Comput. Optim. Appl. 2012;53:557–589. doi: 10.1007/s10589-012-9476-9.
  • 30. Naiman J. Multivariate Functional Kernel Machine Regression and Feature Selection with Applications to Accelerometer Mobile Health Devices. Ph.D. Dissertation. University of Michigan; Ann Arbor, MI, USA: 2020.
  • 31. Peng H., Huang T. Penalized Least Squares for Single Index Models. J. Stat. Plan. Inference. 2011;141:1362–1379. doi: 10.1016/j.jspi.2010.10.003.
  • 32. van de Geer S.A. Empirical Processes in M-Estimation. Cambridge University Press; Cambridge, UK: 2000. (Cambridge Series in Statistical and Probabilistic Mathematics).
  • 33. Hainmueller J., Hazlett C. Kernel Regularized Least Squares: Reducing Misspecification Bias with a Flexible and Interpretable Machine Learning Approach. Political Anal. 2014;22:143–168. doi: 10.1093/pan/mpt019.
  • 34. Yao F., Müller H.G., Wang J.L. Functional Data Analysis for Sparse Longitudinal Data. J. Am. Stat. Assoc. 2005;100:577–590. doi: 10.1198/016214504000001745.
  • 35. Lewis R.C., Meeker J.D., Peterson K.E., Lee J.M., Pace G.G., Cantoral A., Téllez-Rojo M.M. Predictors of Urinary Bisphenol A and Phthalate Metabolite Concentrations in Mexican Children. Chemosphere. 2013;93:2390–2398. doi: 10.1016/j.chemosphere.2013.08.038.
  • 36. Schrack J.A., Zipunnikov V., Goldsmith J., Bai J., Simonsick E.M., Crainiceanu C., Ferrucci L. Assessing the Physical Cliff: Detailed Quantification of Age-related Differences in Daily Patterns of Physical Activity. J. Gerontol. Ser. A Biol. Sci. Med. Sci. 2014;69:973–979. doi: 10.1093/gerona/glt199.
  • 37. Jansen E.C., Dunietz G.L., Chervin R.D., Baylin A., Baek J., Banker M., Song P.X.K., Cantoral A., Tellez Rojo M.M., Peterson K.E. Adiposity in Adolescents: The Interplay of Sleep Duration and Sleep Variability. J. Pediatr. 2018;203:309–316. doi: 10.1016/j.jpeds.2018.07.087.

Articles from Entropy are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)