Summary
Wearable device technology allows continuous monitoring of biological markers and thereby enables study of time-dependent relationships. For example, in this paper, we are interested in the impact of daily energy expenditure over a period of time on subsequent progression toward obesity among children. Data from these devices appear as either sparsely or densely observed functional data, and methods of functional regression are often used for their statistical analysis. We study the scalar-on-function regression model with imprecisely measured values of the predictor function. In this setting, we have a scalar-valued response and a function-valued covariate that are both collected at a single time period. We propose a generalized method of moments-based approach for estimation, while an instrumental variable belonging to the same time space as the imprecisely measured covariate is used for model identification. Additionally, no distributional assumptions are made regarding the measurement errors, and complex covariance structures are allowed for the measurement errors in the implementation of our proposed methods. We demonstrate that our proposed estimator is L2 consistent and enjoys the optimal rate of convergence for univariate nonparametric functions. In a simulation study, we illustrate that ignoring measurement error leads to biased estimates of the functional coefficient. The simulation studies also confirm our ability to consistently estimate the function-valued coefficient when compared to approaches that ignore potential measurement errors. Our proposed methods are applied to our motivating example to assess the impact of baseline levels of energy expenditure on BMI among elementary school-aged children.
Keywords: Accelerometers, Energy expenditure, Functional data, Generalized method of moments, Measurement error
1 |. MOTIVATING EXAMPLE
It is estimated that about 20% of the U.S. child population suffers from obesity and that the prevalence of childhood obesity has more than tripled in the last 40 years1. The consequences of childhood obesity include impaired physiological, behavioral and psychological development during childhood. Obesity in children and adolescents also leads to adverse health outcomes such as type 2 diabetes and cardiovascular diseases in adulthood. To combat this epidemic, targeted environmental and behavioral school-based interventions designed to increase physical activity among school-aged children have gained widespread interest. Examples of these school-based interventions include activity permissive learning environments and the use of stand-biased desks in classrooms2,3,4,5.
In a recent study, stand-biased desks were introduced to a Texas school district as a means of increasing school day physical activity. A research question of interest was to quantify the association between daily energy expenditure and subsequent progression toward obesity among children. The children were given accelerometer armbands to approximate their daily energy expenditure. Since the true level of daily energy expenditure is not directly observable, it is calculated as a function of the physical activity behavior observed by the devices. In this manuscript, we assume that the objective measures of energy expenditure obtained from physical activity monitors are prone to measurement error, and we develop a method of analysis that corrects for the measurement error and is easily applicable for assessing the effects of daily energy expenditure on 18-month change in BMI.
Technological advances on wearable or implantable devices enable continuous monitoring of biological markers resulting in complex data designed to answer scientific questions such as questions related to energy expenditure levels obtained from activity monitors6,7,8,9,10,11. The resulting data appear as either sparsely or densely observed functional data and techniques for functional data analysis are often used for their statistical analyses12,13. Functional data analysis focuses on the analysis of infinite dimensional data that appear as curves, trajectories, shapes or images12,13. Methods developed for functional data analysis are based on extensions of ideas from multivariate analysis, nonparametric regression, functional analysis, dimension reduction techniques and square integrable processes14,12.
In determining the role of energy expenditure in obesity development among children, we consider the linear scalar-on-function regression model with a scalar-valued outcome Y and an imprecisely observed function-valued covariate, X(t). In this setting, X(t) is a latent function-valued covariate that is not directly observable. Instead, it is unbiasedly measured by W(t) prone to some measurement error. Linear scalar-on-function regression models extend classical regression methods to allow function-valued covariates with scalar-valued outcomes in regression settings and many statistical methods have been proposed to estimate the model15,16,17,18,13,12,19,20 when the covariate is measured with negligible error.
When functional data are contaminated with errors, the measurement errors have often been treated as additional error terms associated with the function-valued responses. For example, 21 considered nonparametric estimation of longitudinal data where the responses were longitudinally observed and contaminated with errors. Under independence error structures for the measurement errors, scatter plot smoothing methods were used to estimate the mean and covariance functions of the response curves21. 22 provided methods for nonparametric estimation of response curves contaminated by random noise. The mean functions were estimated through the use of B-splines and functional principal component analysis. While 22 discussed the presence of measurement errors under independent realizations from a random process, the measurement errors considered were associated with random response curves. 23 assumed uncorrelated error structures and provided Gaussian and generalized shrinkage estimates for the functional principal component scores to improve the variance of the errors associated with the function-valued responses prone to errors. 24 considered measurement error in the functional smooth random-effects model where the responses were curves with vector-valued covariates. The error processes considered were random errors associated with the response curves, and the model was estimated through quasi-score estimating equations24. 25 proposed a nonparametric approach for the analysis of sparsely observed longitudinal data using functional principal component analysis in the presence of measurement errors. However, the measurement errors considered were errors associated with the observed responses25.
Most work addressing measurement error in functional data has treated these errors as additional error terms in the models, as discussed above. To our knowledge, there is limited research on functional regression models when the functional covariate is contaminated with measurement error. A common practice in the literature is to pre-smooth each contaminated functional covariate, then use the smoothed curves to build and estimate regression models. However, our simulation studies show that the pre-smoothing step does not correct the attenuation bias in regression coefficient estimation caused by measurement error, and that it performs similarly to the naive estimator, which uses the contaminated functional covariate directly without any pre-smoothing. Similar findings were also discussed in 26. More recently, some authors have considered treating these error terms as classical measurement errors. These recent developments27,26,28 extend methods for addressing measurement errors in linear regression models to functional regression settings. Using the smoothing spline mixed model to estimate the measurement error variance, 27 developed a two-stage nonparametric regression calibration method for the partial functional linear model. The method proposed in 27 relies on the assumption that the measurement errors are independent and identically distributed normal random variables. However, in practice, the measurement errors from the same curve can be correlated and need not follow the normal distribution. 26 provided a simulation-extrapolation approach for addressing imprecisely observed function-valued covariates with scalar outcomes. The authors allowed correlated measurement error structures, but required the covariance structure to be of a pre-determined parametric form.
We recently developed methods for reducing measurement error biases associated with function-valued covariates prone to measurement error in regression models involving multiple function-valued outcomes28. We estimated the model parameters using the EM algorithm, while functional principal components were used to estimate the variance of the classical measurement error.
In this paper, we propose a different approach to incorporate measurement errors and allow unspecified error structures. A function-valued instrumental variable belonging to the same parameter space as X(t) is used for model identification, and a generalized method of moments-based approach is proposed to consistently estimate the functional coefficient, β(t), in the presence of functional measurement errors. Our proposed method for functional measurement errors does not treat the imprecisely observed function-valued covariate as longitudinal or time series data. Rather, we consider the functional covariate as a single function that is used to estimate a latent variable such as true energy expenditure. Under our newly developed methods, estimation of the measurement error covariance is not required for parameter estimation. To the best of our knowledge, the use of function-valued instrumental variables in the functional linear regression model is novel. We illustrate the impacts of measurement error and covariance structures on the estimated parameters through simulation studies. With the increasing use of wearable or activity monitoring devices to study biological phenomena in biomedical research, it is critical that statistical methods allowing their accurate and unbiased assessment be developed.
The rest of the paper is organized as follows. Our proposed methodology is introduced and described in Section 2. We provide relevant asymptotic results in Section 3, while the simulation results and the application to our motivating example are provided in Sections 4 and 5, respectively. Finally, discussions and concluding remarks are provided in Sections 5.2 and 6, respectively.
2 |. MODELS
Let (Y, X) be a pair consisting of a scalar-valued random variable and a random function assumed to be square integrable and defined on [0, 1] such that X = {X(t), t ∈ [0,1]}. The scalar-on-function regression model with a mis-measured functional covariate for the ith subject is
$$Y_i = \int_0^1 \beta(t) X_i(t)\,dt + \varepsilon_i, \tag{1}$$
$$W_i(t) = X_i(t) + U_i(t), \tag{2}$$
where β(t) is an unknown functional coefficient. The Xi(t) is a function-valued covariate that is not directly observable but measured by Wi(t). The Wi(t)’s serve as unbiased measures of Xi(t) subject to measurement errors Ui(t) that are possibly correlated over time. For notational simplicity, we leave out the intercept α in (1) and assume both the response Yi and the functional covariate Xi are centered with $E(Y_i) = 0$ and $E\{X_i(t)\} = 0$ for t ∈ [0, 1].
We first approximate β(t) in (1) using polynomial splines and write $\beta(t) \approx \sum_{k=1}^{K_n} \gamma_k b_k(t)$, where $\gamma = (\gamma_1, \ldots, \gamma_{K_n})^\top$ are unknown spline coefficients, while $\{b_k(t)\}_{k=1}^{K_n}$ are a set of spline basis functions on [0,1]. In this manuscript, B-spline basis functions are used due to their flexibility and computational efficiency. These basis functions can be efficiently constructed using the Cox-De Boor recursion formula29. In the spline approximation provided above, the number of basis functions, Kn, is allowed to increase with the sample size, and the corresponding spline functions provide better approximations for larger sample sizes. For large n, Kn is often chosen to be large enough to reasonably approximate the patterns in β(t). In subsection 4.2, we propose a data driven method to automatically select Kn for finite samples.
Following the spline approximations, Model (1) becomes
$$Y_i \approx \sum_{k=1}^{K_n} \gamma_k \int_0^1 b_k(t) X_i(t)\,dt + \varepsilon_i. \tag{3}$$
Let $X_{ik} = \int_0^1 b_k(t) X_i(t)\,dt$, $W_{ik} = \int_0^1 b_k(t) W_i(t)\,dt$ and $U_{ik} = \int_0^1 b_k(t) U_i(t)\,dt$. The measurement error model in (2) becomes Wik = Xik + Uik and the full model is re-written as
$$Y_i \approx \sum_{k=1}^{K_n} \gamma_k X_{ik} + \varepsilon_i, \tag{4}$$
$$W_{ik} = X_{ik} + U_{ik}, \quad k = 1, \ldots, K_n, \tag{5}$$
where $U_i = (U_{i1}, \ldots, U_{iK_n})^\top$ are correlated errors. Under this representation, the proposed model reduces to a variation of the multivariable linear regression model with measurement errors. However, the main difference is that the number of linear covariates in (4) and (5) is not fixed; instead, it increases with the sample size.
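The basis construction and the reduction of each curve to a finite vector of scores can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `bspline_basis`, `trapezoid_weights` and `project`, the cubic degree, the equally spaced interior knots and the trapezoid-rule quadrature are all assumptions made for the example.

```python
import numpy as np

def bspline_basis(t, n_basis, degree=3):
    """Clamped B-spline basis on [0, 1] built with the Cox-de Boor recursion.

    Returns an array of shape (len(t), n_basis). Knot placement (equally
    spaced interior knots) is an assumption of this sketch.
    """
    t = np.asarray(t, dtype=float)
    n_interior = n_basis - degree - 1
    if n_interior < 0:
        raise ValueError("need n_basis >= degree + 1")
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])

    # Degree 0: indicators of the half-open knot spans [knots[j], knots[j+1]).
    n_spans = len(knots) - 1
    B = np.zeros((t.size, n_spans))
    for j in range(n_spans):
        B[:, j] = (knots[j] <= t) & (t < knots[j + 1])
    B[t == 1.0, n_basis - 1] = 1.0  # close the last non-empty span at t = 1

    # Raise the degree one step at a time; terms with a zero denominator are dropped.
    for d in range(1, degree + 1):
        B_next = np.zeros((t.size, n_spans - d))
        for j in range(n_spans - d):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:
                B_next[:, j] += (t - knots[j]) / left * B[:, j]
            if right > 0:
                B_next[:, j] += (knots[j + d + 1] - t) / right * B[:, j + 1]
        B = B_next
    return B

def trapezoid_weights(t):
    """Quadrature weights of the trapezoid rule on the grid t."""
    w = np.zeros_like(t, dtype=float)
    dt = np.diff(t)
    w[:-1] += dt / 2.0
    w[1:] += dt / 2.0
    return w

def project(curves, t, B):
    """Approximate the scores int b_k(t) x_i(t) dt for curves of shape (n, len(t))."""
    return (curves * trapezoid_weights(t)) @ B

t = np.linspace(0.0, 1.0, 101)
B = bspline_basis(t, n_basis=6)
W = np.vstack([np.sin(2 * np.pi * t), np.ones_like(t)])  # two toy curves
W_scores = project(W, t, B)                               # shape (2, 6)
```

Because the basis functions sum to one at every point of [0, 1], the scores of the constant curve sum to one, which gives a quick sanity check on both the recursion and the quadrature.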
2.1 |. Instrumental variables
The presence of measurement errors in predictor variables of regression models renders the model unidentifiable without additional information30. Such additional information can come in the form of replicates of W(t), assumption of a known covariance function of the measurement error ΣUU, or the presence of instrumental variables for X(t) in the data. An instrumental variable is a variable that is correlated with X(t) but is uncorrelated with U(t). The presence of an instrumental variable for X(t) in the data allows for consistent estimation of β(t) when X(t) is subject to error. While the use of instrumental variables has been well studied in generalized linear regression models with measurement errors31,30,32,33,34,35,36,37, the use of instrumental variables in functional linear regression settings with measurement errors is limited. 38 considered the use of instrumental variables in scalar-on-function regression when X(t) is endogenous (i.e. corr{X(t), ε} ≠ 0). Using a function-valued instrumental variable, the authors extended the generalized method of moments approach to high dimensional settings to estimate the function-valued model parameter. While our proposed models also consider scalar-on-function regression, the current application focuses on the case where X(t) is imprecisely observed, rather than being an endogenous covariate. 26 estimated the covariance matrix of the measurement error in scalar-on-function models by treating the function-valued covariate as longitudinal data. In our proposed methods, we do not consider X(t) longitudinal. Rather, it is considered a function obtained at one time point to describe a latent variable or a true covariate. In this paper, an instrumental variable approach is proposed for model identifiability, while the generalized method of moments is used to consistently estimate β(t).
For i = 1, …, n, let {Mi(t), t ∈ [0,1]} be a function-valued instrumental variable observed for the ith individual. Assume the instruments are independent across subjects, with {Mi(t)} independent of {Mj(t)} for i ≠ j. Also, cov {Mi(t), Ui(s)} = 0 and cov {Mi(t), εi} = 0 for any t, s ∈ [0,1], while {Mi(t)} is correlated with {Xi(t)}. The assumption that Mi(t) and Ui(s) are uncorrelated is often referred to as instrument exogeneity across time. Although this is a strong assumption, it cannot be directly tested or assessed since Ui(t) is unobserved. Therefore, theoretical considerations regarding the application are often used in the selection of an instrumental variable in practice.
In addition to equations (1) and (2), we add the model equation for the instrumental variable as Mi(t) = δXi(t) + ωi(t), for some constant δ ≠ 0 and a mean zero error {ωi(t)}, which is uncorrelated with {Xi(t)}. While Mi(t) is correlated with Xi(t), it is not necessarily an unbiased measure for Xi(t). We reformulate our final model below with all the assumptions
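$$Y_i = \int_0^1 \beta(t) X_i(t)\,dt + \varepsilon_i, \tag{6}$$
$$W_i(t) = X_i(t) + U_i(t), \tag{7}$$
$$M_i(t) = \delta X_i(t) + \omega_i(t), \tag{8}$$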
where E(εi) = 0, E{Ui(t)} = 0 and E{ωi(t)} = 0. In addition, we assume cov{Xi(t), εi} = 0, cov {Mi(t),εi} = 0, cov {Mi(t), Ui(s)} = 0, for t, s ∈ [0,1] and i = 1, ⋯ , n. Our methodology is described next.
2.2 |. Proposed method for estimating the functional coefficient
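Let $M_{ik} = \int_0^1 b_k(t) M_i(t)\,dt$, for k = 1, …, Kn, and $M_i = (M_{i1}, \ldots, M_{iK_n})^\top$, with $X_i$, $W_i$ and $U_i$ defined analogously. Then one has
$$\mathrm{cov}(M_i, Y_i) \approx \mathrm{cov}(M_i, X_i)\,\gamma + \mathrm{cov}(M_i, \varepsilon_i) = \Sigma_{MX}\,\gamma, \tag{9}$$
$$\mathrm{cov}(M_i, W_i) = \mathrm{cov}(M_i, X_i) + \mathrm{cov}(M_i, U_i) = \Sigma_{MX}, \tag{10}$$
where $\Sigma_{MX} = \mathrm{cov}(M_i, X_i)$. Therefore,
$$\Sigma_{MY} \approx \Sigma_{MW}\,\gamma, \tag{11}$$
$$\Sigma_{WM}\Sigma_{MY} \approx \Sigma_{WM}\Sigma_{MW}\,\gamma, \tag{12}$$
and the unknown coefficients γ can be estimated by
$$\hat{\gamma} = (\Omega_{WM}\Omega_{MW})^{-1}\,\Omega_{WM}\,\Omega_{MY}, \tag{13}$$
where ΩWM and ΩMY are sample estimates of cov(Wi, Mi) and cov(Mi, Yi) respectively, defined as
$$\Omega_{WM} = \frac{1}{n}\sum_{i=1}^{n} \tilde{W}_i \tilde{M}_i^\top, \qquad \Omega_{MW} = \Omega_{WM}^\top, \tag{14}$$
$$\Omega_{MY} = \frac{1}{n}\sum_{i=1}^{n} \tilde{M}_i \tilde{Y}_i, \tag{15}$$
and $\tilde{W}_i$, $\tilde{M}_i$ and $\tilde{Y}_i$ are centered variables, each with a sample mean of zero. When Mi and Wi are of the same dimension and ΩMW is an invertible square matrix, $\hat{\gamma}$ reduces to $\Omega_{MW}^{-1}\Omega_{MY}$. As a result, for any t ∈ [0, 1], the estimator of the regression coefficient function is defined as
$$\hat{\beta}(t) = \sum_{k=1}^{K_n} \hat{\gamma}_k b_k(t). \tag{16}$$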
The proposed $\hat{\beta}(t)$ is a generalized method of moments based estimator. While no distributional assumptions are required for Ui(t), the estimation of β(t) depends on the assumption that an instrument, Mi(t), exists in the data. Additionally, estimation of the covariance matrix for the measurement error is not required for the successful implementation of our proposed methodology. Under current functional data methodology, a naive estimator of β(t) would be based on Wi(t) and Yi, with Wi(t) being treated as the true value of Xi(t). Simulation studies in Section 4 show that failure to account for potential measurement errors can substantially bias the results. The strength of our estimator is that while Xi(t) might not be directly observed, estimation of its effect on the response is based on its unbiased measure as well as additional information provided in the data in the form of Mi(t).
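To make the moment-based idea concrete, the sketch below works directly with finite-dimensional scores rather than curves and contrasts an instrumental variable estimate with a naive least squares fit on the error-prone scores. It assumes the estimator takes the form (Ω_WM Ω_MW)^{-1} Ω_WM Ω_MY built from centered sample cross-moments, and it omits any small-sample modification. The names `gmm_iv` and `naive_ols` and all simulated quantities are hypothetical.

```python
import numpy as np

def gmm_iv(Y, W, M):
    """Moment-based IV estimate of gamma from responses Y (n,), error-prone
    scores W (n, K) and instrument scores M (n, K_M); assumed form only."""
    n = Y.shape[0]
    Yc = Y - Y.mean()
    Wc = W - W.mean(axis=0)
    Mc = M - M.mean(axis=0)
    omega_MW = Mc.T @ Wc / n   # sample cov(M, W)
    omega_MY = Mc.T @ Yc / n   # sample cov(M, Y)
    omega_WM = omega_MW.T
    return np.linalg.solve(omega_WM @ omega_MW, omega_WM @ omega_MY)

def naive_ols(Y, W):
    """Least squares on the error-prone scores, ignoring measurement error."""
    Wc = W - W.mean(axis=0)
    return np.linalg.lstsq(Wc, Y - Y.mean(), rcond=None)[0]

# Toy data: scores simulated directly (an assumption made for brevity).
rng = np.random.default_rng(0)
n, gamma = 20000, np.array([1.0, -0.5, 0.25])
X = rng.normal(scale=2.0, size=(n, 3))       # latent scores
W = X + rng.normal(scale=2.0, size=(n, 3))   # error-prone measurement
M = X + rng.normal(scale=1.0, size=(n, 3))   # instrument
Y = X @ gamma + rng.normal(size=n)

gamma_iv = gmm_iv(Y, W, M)
gamma_naive = naive_ols(Y, W)
```

In this toy setup the measurement error variance equals the variance of the latent scores, so the naive coefficients are shrunk toward zero by roughly one half, while the instrumental variable estimate stays close to the true γ.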
3 |. ASYMPTOTIC PROPERTIES
In this section, we establish the L2 consistency of $\hat{\beta}(t)$. We summarize the needed assumptions as follows:
(A1) We assume (Yi, Xi(t), Wi(t), Mi(t), t ∈ [0, 1]) for i = 1, …, n are independent with the same distribution as (Y, X(t), W(t), M(t), t ∈ [0, 1]).
(A2) The instrumental variable M = {M(t), t ∈ [0, 1]} is uncorrelated with the regression error ϵ and the measurement error U = {U(t), t ∈ [0, 1]}, with cov{M(t), ϵ} = 0 and cov{M(t), U(s)} = 0 for any s, t ∈ [0, 1].
(A3) The latent functional covariate X = {X(t), t ∈ [0, 1]} is independent of the regression error ϵ with cov{X(t), ϵ} = 0 for t ∈ [0, 1], but is correlated with the instrumental variable M. Let ΣXM (t, s) = cov{X (t), M (s)}. We assume that for any positive functions h1, h2, h3, h4, there exist constants λ1, λ2 > 0 such that
(A4) We assume $\sup_t \left[E|M(t)|^l + E|W(t)|^l + E|U(t)|^l\right] < +\infty$ for some sufficiently large l > 0.
(A5) The variance of the error term is bounded.
(A6) We assume ΣXX (t, s) = Cov {X (t), X (s)}, ΣMM (t, s) = Cov {M (t), M (s)}, ΣUU (t, s) = Cov {U (t), U (s)} are all positive definite bivariate functions and there exist positive constants λ1 and λ2 such that for any positive functions a1 (t), a2 (t) ∈ L2[0, 1],
(A7) The coefficient function β(t) is (p + 1)-times continuously differentiable with $\beta(t) \in C^{p+1}[0, 1]$.
(A8) The number of knots and interior knots satisfy that
for some constant c > 0.
Assumptions (A1), (A4)-(A5) and (A7)-(A8) are standard in the polynomial spline regression literature. Similar assumptions were also used in 39,40,41. Assumption (A3) requires that {X(t)} and {M(t)} be correlated, so that {M(t)} contains information about {X(t)}. Assumption (A3) fails if {X (t)} and {M (t)} are independent of each other with ΣXM (t, s) = 0 for all t, s ∈ [0, 1]. This assumption is required to guarantee the invertibility of the matrix in (13) and for the proposed generalized method of moments estimator to be well defined. Assumption (A6) implies that the covariance functions of the random processes {X(t)}, {M(t)} and {W(t)} are all positive definite.
Theorem 1.
Under assumptions (A1)-(A8), the coefficient function estimator in (16) is L2–consistent with
where ||·||2 is the functional L2 norm.
Theorem 1 establishes the L2 rate of consistency for $\hat{\beta}(t)$ in the presence of measurement errors. Our asymptotic result is comparable to the rate of convergence results given in 42 and 43 when the functional covariates are measured without errors. Here we assume the functional covariates are observed continuously. As argued in 43, the rate of convergence obtained in Theorem 1 does not change when the functional covariates are observed discretely at a sequence of grid points, provided that the maximum distance between any neighboring grid points converges to zero sufficiently quickly. The proofs of our asymptotic results are provided in the Appendix.
4 |. SIMULATION
In this section, we discuss our simulation results and describe the tuning parameter selection.
4.1 |. Simulation Results
We now describe our simulation experiments and study the numerical performance of our proposed methodology. All data in our simulations were independently generated from the functional linear regression model
$$Y_i = \int_0^1 \beta(t) X_i(t)\,dt + \varepsilon_i,$$
where we consider two forms for β(t) with β1 (t) = sin (2πt) and $\beta_2(t) = \sin\{\pi \cdot 8(t - 0.5)/2\} \,/\, [1 + 2\{8(t - 0.5)\}^2 \{\mathrm{sign}(t - 0.5) + 1\}]$, where sign(a) = 1 and sign(−a) = −1 for a > 0. We only present the results for the case β1(t) and defer the simulation results for β2(t) to the Supplementary Material. The regression errors, ε, were simulated independently from a $N(0, \sigma^2)$ distribution. The true functional covariate was X (t) = sin (2πt) + εX (t), where εX(t) denotes a mean zero Gaussian process with constant marginal variance $\sigma_X^2$ and cor{εX(t1), εX(t2)} = ρX for any t1 ≠ t2. We generated the observed functional covariate W (t) = X (t)+u (t) and the instrumental variable M (t) = X (t)+ ω (t), where the errors u (t) and ω (t) are also mean zero Gaussian processes with constant marginal variances $\sigma_u^2$ and $\sigma_\omega^2$, and correlations ρu and ρM respectively. All the error terms were generated to be independent of each other. In all our simulations, the number of replications was nr = 1000. For the methods described in this section, the number of knots was selected using a tailored cross-validation approach as discussed in Section 4.2.
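A data-generating step of this kind could be sketched as below, for β1(t) only. The grid size, the trapezoid-rule approximation of the integral and the names `exchangeable_gp` and `simulate` are assumptions of the example. The equal correlation ρ between any two distinct time points is obtained by mixing one shared standard normal draw per curve with independent draws per grid point.

```python
import numpy as np

def exchangeable_gp(n, n_grid, sigma, rho, rng):
    """n mean-zero Gaussian curves on a grid with marginal variance sigma^2
    and correlation rho between any two distinct grid points."""
    shared = rng.normal(size=(n, 1))      # common component per curve
    own = rng.normal(size=(n, n_grid))    # point-specific component
    return sigma * (np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own)

def simulate(n, n_grid=101, sigma=1.0, sigma_x=4.0, sigma_u=4.0, sigma_w=1.0,
             rho_x=0.0, rho_u=0.0, rho_m=0.0, seed=0):
    """Generate (Y, X, W, M) with beta(t) = sin(2 pi t); defaults mirror the
    first simulation setting described in the text."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)
    beta = np.sin(2 * np.pi * t)
    X = np.sin(2 * np.pi * t) + exchangeable_gp(n, n_grid, sigma_x, rho_x, rng)
    W = X + exchangeable_gp(n, n_grid, sigma_u, rho_u, rng)   # error-prone measure
    M = X + exchangeable_gp(n, n_grid, sigma_w, rho_m, rng)   # instrument
    # Trapezoid-rule weights on the uniform grid (an assumed quadrature).
    w = np.full(n_grid, 1.0 / (n_grid - 1))
    w[0] = w[-1] = 0.5 / (n_grid - 1)
    Y = (X * beta) @ w + rng.normal(scale=sigma, size=n)
    return {"t": t, "Y": Y, "X": X, "W": W, "M": M}

data = simulate(n=500, seed=1)
no_error = simulate(n=50, sigma_u=0.0, seed=1)  # W coincides with X when sigma_u = 0
```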
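Since we only report the results for β1(t), we will simply use β(t) and drop the subscript. Let $\hat{\beta}^{(r)}(t)$ be the estimator of β(t) in the rth replication and $\bar{\beta}(t) = n_r^{-1}\sum_{r=1}^{n_r} \hat{\beta}^{(r)}(t)$. Let $\{t_j\}_{j=1}^{J}$ be a sequence of equally spaced grid points on (0, 1) used to evaluate the performance of the proposed estimator. We define the averaged squared bias of $\hat{\beta}$ as
$$\mathrm{ABias}^2(\hat{\beta}) = \frac{1}{J}\sum_{j=1}^{J} \{\bar{\beta}(t_j) - \beta(t_j)\}^2,$$
the averaged sample variance as
$$\mathrm{Avar}(\hat{\beta}) = \frac{1}{J}\sum_{j=1}^{J} \frac{1}{n_r}\sum_{r=1}^{n_r} \{\hat{\beta}^{(r)}(t_j) - \bar{\beta}(t_j)\}^2,$$
and averaged integrated mean square error as
$$\mathrm{AIMSE}(\hat{\beta}) = \mathrm{ABias}^2(\hat{\beta}) + \mathrm{Avar}(\hat{\beta}).$$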
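The three summary measures can be computed from a matrix of replicated estimates as in the sketch below. It assumes the variance uses the divisor nr rather than nr − 1, which makes AIMSE equal to the sum of ABias2 and Avar exactly, a relationship the reported tables satisfy up to rounding. The function name `simulation_metrics` is hypothetical.

```python
import numpy as np

def simulation_metrics(estimates, truth):
    """Return (ABias2, Avar, AIMSE) for estimates of shape (n_rep, n_grid)
    and the true coefficient function evaluated on the same grid."""
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    mean_curve = estimates.mean(axis=0)               # average over replications
    abias2 = np.mean((mean_curve - truth) ** 2)       # averaged squared bias
    avar = np.mean(estimates.var(axis=0))             # divisor n_rep (assumed)
    aimse = np.mean((estimates - truth) ** 2)         # equals abias2 + avar
    return abias2, avar, aimse

# Two replications evaluated at two grid points.
abias2, avar, aimse = simulation_metrics([[1.0, 2.0], [3.0, 4.0]], [2.0, 2.0])
```

For this toy input the mean curve is (2, 3), so the averaged squared bias is 0.5, the averaged variance is 1.0 and the AIMSE is 1.5.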
We first generated data with σ = 1, σX = 4, σu = 4, σω = 1, ρX = ρu = ρM = 0 and four different sample sizes n = 100, 200, 500, 1000. We estimated the regression coefficient function using the proposed methodology. However, the matrix inversion in the definition of the proposed method of moments estimator can be unstable. Therefore, we adopted the small sample modification30 to improve the finite sample performance of our proposed method. In addition to our approach, four additional approaches were also considered for estimating β in the simulation studies. In the first, we assumed X(t) was observed, and $\hat{\beta}_X$ was estimated by regressing {Yi} on {Xik} directly in Equation (4). The second estimator, $\hat{\beta}_W$, ignored the measurement error and estimated the spline coefficients by regressing {Yi} on {Wik} instead. The third estimator, $\hat{\beta}_P$, is a variant of the second approach and was obtained using individually pre-smoothed Wi based on polynomial spline regression. The fourth estimator, $\hat{\beta}_S$, was obtained by pre-smoothing each Wi using a smoothing spline approach instead. Note that $\hat{\beta}_X$ is not available in real data analysis. However, it served as a benchmark to assess the performance of our estimator in the simulation studies. The naive estimators, $\hat{\beta}_W$, $\hat{\beta}_P$ and $\hat{\beta}_S$, ignored the measurement error in the data. The estimator $\hat{\beta}_{IV}$ was obtained using our proposed instrumental variable based method.
Table 1 reports the ABias2, Avar and AIMSE values for the different estimators. For our proposed instrumental variable based estimator, $\hat{\beta}_{IV}$, we clearly see that ABias2, Avar and AIMSE all decrease with increasing sample size, supporting our asymptotic convergence result. Furthermore, the biases of $\hat{\beta}_{IV}$ and the benchmark $\hat{\beta}_X$ are similar and much smaller than the bias of the naive estimator $\hat{\beta}_W$. The bias of $\hat{\beta}_W$ remained non-ignorable even when the sample size was increased to 1000. This suggests that failure to account for measurement error can lead to biased estimation of the functional coefficient. In addition, similar to $\hat{\beta}_W$, both pre-smoothed estimators $\hat{\beta}_P$ and $\hat{\beta}_S$ have non-ignorable bias, which indicates that the pre-smoothing step does not remove the attenuation bias. Comparing Avar, $\hat{\beta}_W$ had the smallest sample variance due to the larger variability in W and the fact that the variance of a regression coefficient is inversely related to the variability in the covariates. Our proposed method of moments estimator had a larger sample variance than $\hat{\beta}_X$, $\hat{\beta}_W$ and $\hat{\beta}_P$ due to variability in both W and M. However, for relatively large sample sizes (n = 500 or 1000), the proposed $\hat{\beta}_{IV}$ had better overall performance than all the approaches based on W, with smaller AIMSE values.
TABLE 1. ABias2, Avar and AIMSE of the estimators of β(t) for sample sizes n = 100, 200, 500 and 1000.
Estimator | n | ABias2 | Avar | AIMSE |
---|---|---|---|---|
$\hat{\beta}_X$ (X observed, benchmark) | 100 | 0.0017 | 0.1764 | 0.1781 |
 | 200 | 0.0011 | 0.0864 | 0.0875 |
 | 500 | 0.0001 | 0.0408 | 0.0408 |
 | 1000 | 0.0000 | 0.0198 | 0.0199 |
$\hat{\beta}_W$ (naive) | 100 | 0.0394 | 0.1121 | 0.1515 |
 | 200 | 0.0400 | 0.0534 | 0.0934 |
 | 500 | 0.0392 | 0.0246 | 0.0638 |
 | 1000 | 0.0394 | 0.0121 | 0.0515 |
$\hat{\beta}_P$ (polynomial spline pre-smoothing) | 100 | 0.0393 | 0.1117 | 0.1510 |
 | 200 | 0.0400 | 0.0538 | 0.0938 |
 | 500 | 0.0392 | 0.0247 | 0.0638 |
 | 1000 | 0.0393 | 0.0122 | 0.0515 |
$\hat{\beta}_S$ (smoothing spline pre-smoothing) | 100 | 0.0145 | 0.3806 | 0.3951 |
 | 200 | 0.0149 | 0.1676 | 0.1825 |
 | 500 | 0.0144 | 0.0867 | 0.1011 |
 | 1000 | 0.0147 | 0.0448 | 0.0595 |
$\hat{\beta}_{IV}$ (proposed) | 100 | 0.0017 | 0.2144 | 0.2161 |
 | 200 | 0.0011 | 0.1044 | 0.1055 |
 | 500 | 0.0001 | 0.0497 | 0.0498 |
 | 1000 | 0.0000 | 0.0244 | 0.0245 |
To investigate the performance of the proposed estimator when the response Yi follows a non-normal distribution, we now allow the regression error ε to have a non-symmetric distribution centered at 0. Namely, the regression errors are independently and identically simulated from a Gamma(1.0, 1.5) distribution and then shifted to have mean 0. We report the simulation results in Table 2. Although the approaches based on W tend to have smaller AIMSEs for smaller sample sizes, our approach tends to do comparably well for sample size 500 and dominates for the large sample size (1000) in terms of AIMSE. Our approach ($\hat{\beta}_{IV}$), along with $\hat{\beta}_X$, also has very low bias. Again, the naive approaches $\hat{\beta}_W$, $\hat{\beta}_P$ and $\hat{\beta}_S$ perform poorly and have non-diminishing biases.
TABLE 2. ABias2, Avar and AIMSE of the estimators of β(t) under shifted Gamma regression errors.
Estimator | n | ABias2 | Avar | AIMSE |
---|---|---|---|---|
$\hat{\beta}_X$ (X observed, benchmark) | 100 | 0.0009 | 0.3963 | 0.3972 |
 | 200 | 0.0014 | 0.1887 | 0.1901 |
 | 500 | 0.0001 | 0.0895 | 0.0895 |
 | 1000 | 0.0000 | 0.0438 | 0.0438 |
$\hat{\beta}_W$ (naive) | 100 | 0.0380 | 0.2301 | 0.2681 |
 | 200 | 0.0395 | 0.1078 | 0.1474 |
 | 500 | 0.0390 | 0.0519 | 0.0909 |
 | 1000 | 0.0391 | 0.0252 | 0.0642 |
$\hat{\beta}_P$ (polynomial spline pre-smoothing) | 100 | 0.0380 | 0.2306 | 0.2686 |
 | 200 | 0.0395 | 0.1079 | 0.1474 |
 | 500 | 0.0389 | 0.0520 | 0.0910 |
 | 1000 | 0.0391 | 0.0254 | 0.0645 |
$\hat{\beta}_S$ (smoothing spline pre-smoothing) | 100 | 0.0140 | 1.0100 | 1.0241 |
 | 200 | 0.0145 | 0.3434 | 0.3579 |
 | 500 | 0.0148 | 0.1974 | 0.2121 |
 | 1000 | 0.0145 | 0.0941 | 0.1086 |
$\hat{\beta}_{IV}$ (proposed) | 100 | 0.0012 | 0.4335 | 0.4346 |
 | 200 | 0.0013 | 0.2122 | 0.2135 |
 | 500 | 0.0001 | 0.1012 | 0.1013 |
 | 1000 | 0.0001 | 0.0496 | 0.0497 |
We now assess how the sizes of the error terms u(t) and ω(t) affect the proposed estimation method. For ρX = ρu = ρM = 0, σ = 1, σX = 4, n = 500, we consider different combinations of (σu, σω), with σu and σω each taking the values 0.5, 1, 4 and 16. Thus, the signal to noise ratios in the measurement error and instrumental variable equations were 8, 4, 1 or 0.25. Table 3 summarizes our simulation results from the various set-ups. We found that increasing the error size associated with either the measurement error or the instrumental variable led to larger AIMSEs. In addition, the error in the instrumental variable had a larger effect on the accuracy of our estimated β(t) than the measurement error. We also note that the AIMSE for (σu = 1, σω = 16) was more than four times larger than that for (σu = 16, σω = 1). Although the naive and IV approaches tended to perform comparably for smaller values of σu, our IV approach dominates the naive approaches for larger measurement error, and $\hat{\beta}_S$ has the worst performance. Changes in the IV error variance, however, have little effect on the AIMSE estimates for the naive approaches, since IVs are completely ignored in the naive estimation. Therefore, it is not surprising that the naive approaches have smaller AIMSEs than our IV approach when the IV error is large. We report the performance of the naive estimators in Section S.2 of the Supplementary Material.
TABLE 3. ABias2, Avar and AIMSE of the proposed estimator for varying σu (left panel, σω = 1) and varying σω (right panel, σu = 1), with n = 500.
σu (σω = 1) | ABias2 | Avar | AIMSE | σω (σu = 1) | ABias2 | Avar | AIMSE |
---|---|---|---|---|---|---|---|
0.5 | 0.0001 | 0.0506 | 0.0507 | 0.50 | 0.0001 | 0.0488 | 0.0489 |
1 | 0.0001 | 0.0504 | 0.0505 | 1.00 | 0.0001 | 0.0504 | 0.0505 |
4 | 0.0001 | 0.0497 | 0.0498 | 4.00 | 0.0001 | 0.0794 | 0.0795 |
16 | 0.0028 | 0.1070 | 0.1097 | 16.00 | 0.0003 | 0.4784 | 0.4787 |
We are also interested in investigating how correlation in the error terms affects our estimated coefficient. To do this, we simulated data with σ = 1, σX = σu = 4, σω = 1 and n = 500, under varying degrees of correlation in εX(t), u(t) and ω(t), with ρX, ρu, ρM = 0, 0.25, 0.5 or 0.75, corresponding to none to strong correlation in the error terms. Table 4 indicates that larger correlation in εX(t) leads to larger AIMSEs and less accurate estimates of the coefficient function, due to increased multicollinearity in the predictor variables. However, correlations in the measurement error u(t) and the instrument error ω(t) have less impact on the coefficient function estimation. This is because these errors are independent of each other and of the covariate X(t). The degree of correlation in X(t) is therefore more relevant for the performance of our proposed estimator.
TABLE 4. ABias2, Avar and AIMSE of the proposed estimator under varying correlations in εX(t), u(t) and ω(t), with n = 500.
 | ρX = 0 | ρX = 0.25 | ρX = 0.5 | ρX = 0.75 | ρu = 0.25 | ρu = 0.5 | ρu = 0.75 | ρM = 0.25 | ρM = 0.5 | ρM = 0.75 |
---|---|---|---|---|---|---|---|---|---|---|
ABias2 | 0.0001 | 0.0001 | 0.0002 | 0.0003 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
Avar | 0.0497 | 0.0540 | 0.0775 | 0.1521 | 0.0485 | 0.0483 | 0.0482 | 0.0552 | 0.0603 | 0.0652 |
AIMSE | 0.0498 | 0.0541 | 0.0776 | 0.1524 | 0.0486 | 0.0484 | 0.0483 | 0.0553 | 0.0604 | 0.0653 |
4.2 |. Tuning parameter selection
Our proposed method requires the practitioner to specify the number of basis functions beforehand. In nonparametric settings, selection of the number of basis functions amounts to a model selection problem. Additionally, it is well known that model selection in measurement error settings is complex44. In this manuscript, we provide an approach based on 5-fold cross-validation for the selection of the number of basis functions. For each candidate number of basis functions, the original data set is divided into 5 non-overlapping subsets. The model parameters are then estimated repeatedly, excluding one of the subsets of the original data in each estimation. The mean prediction error of the fitted model, using W(t) in lieu of X(t), is estimated on each withheld data subset and averaged over the 5 subsets. Subsequently, the number of basis functions associated with the smallest mean prediction error is selected. Plots of the estimated mean prediction error for the function considered in our simulation studies were obtained. As an example, we plotted the prediction errors for one simulation run with sample size n = 500. Based on this plot, the number of basis functions selected was 5; see Figure 1.
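The cross-validation loop described above might look like the following sketch. To keep the block short it swaps in a cosine basis as a stand-in for the B-spline basis used in the paper, uses an assumed moment-based instrumental variable fit without the small-sample modification, and simulates toy curves. The names `cosine_basis`, `scores`, `iv_fit` and `cv_select` are hypothetical.

```python
import numpy as np

def cosine_basis(t, k):
    """Stand-in basis (NOT B-splines): 1, sqrt(2) cos(pi t), sqrt(2) cos(2 pi t), ..."""
    cols = [np.ones_like(t)] + [np.sqrt(2) * np.cos(j * np.pi * t) for j in range(1, k)]
    return np.column_stack(cols)

def scores(curves, t, k):
    """Trapezoid-rule basis scores on a uniform grid, shape (n, k)."""
    w = np.full(t.size, t[1] - t[0])
    w[[0, -1]] *= 0.5
    return (curves * w) @ cosine_basis(t, k)

def iv_fit(Y, Wk, Mk):
    """Assumed moment-based IV estimate of the basis coefficients."""
    Wc, Mc, Yc = Wk - Wk.mean(axis=0), Mk - Mk.mean(axis=0), Y - Y.mean()
    omega_MW, omega_MY = Mc.T @ Wc / len(Y), Mc.T @ Yc / len(Y)
    return np.linalg.solve(omega_MW.T @ omega_MW, omega_MW.T @ omega_MY)

def cv_select(Y, W, M, t, candidates, n_folds=5, seed=0):
    """Pick the number of basis functions with the smallest mean CV prediction error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(Y)), n_folds)
    cv_error = {}
    for k in candidates:
        Wk, Mk = scores(W, t, k), scores(M, t, k)
        errs = []
        for test in folds:
            train = np.setdiff1d(np.arange(len(Y)), test)
            gamma = iv_fit(Y[train], Wk[train], Mk[train])
            # Predict held-out responses using W in lieu of the latent X.
            pred = Y[train].mean() + (Wk[test] - Wk[train].mean(axis=0)) @ gamma
            errs.append(np.mean((Y[test] - pred) ** 2))
        cv_error[k] = float(np.mean(errs))
    return min(cv_error, key=cv_error.get), cv_error

# Toy curves loosely following the simulation design (values are assumptions).
rng = np.random.default_rng(1)
n, t = 300, np.linspace(0.0, 1.0, 51)
X = np.sin(2 * np.pi * t) + rng.normal(scale=4.0, size=(n, t.size))
W = X + rng.normal(scale=4.0, size=X.shape)
M = X + rng.normal(scale=1.0, size=X.shape)
Y = (X * np.sin(2 * np.pi * t)).mean(axis=1) + rng.normal(size=n)  # Riemann-type integral
candidates = [3, 4, 5, 6, 7]
best_k, cv_error = cv_select(Y, W, M, t, candidates)
```

The returned dictionary `cv_error` holds the mean prediction error for each candidate, which is the quantity one would plot against the number of basis functions.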
5 |. APPLICATION
In this section, we describe the application of our methods to the motivating example. Students enrolled in the study were followed over an eighteen-month period. The study design was a cluster randomized trial where teachers within three schools in the College Station Independent School District were randomly assigned to receive either the treatment (stand-biased desks) or control (traditional desks)5. The data contain measurements obtained at baseline and at the beginning of each semester over two academic years. An objective of the study was to investigate the relationship between energy expenditure behavior at baseline and the 18-month change in body mass index (BMI) from baseline among the students. Thus, an outcome of interest was the change in BMI from baseline to 18 months of follow-up. The step count represents the number of steps taken over a given period of time and is an indicator of a subject’s physical activity level. Current guidelines for recommended daily physical activity levels are based on the duration of time spent in either moderate or vigorous intensity activity and the number of steps per day7,45,46,47,48. For example, 47 indicated that activity levels of 12,000 steps/day and 15,000 steps/day for boys and girls, respectively, were recommended for maintenance of healthy body composition for children between the ages of 6–12 years. Daily energy expenditure, in contrast, is defined as the total number of calories or energy used by the body to perform daily bodily functions.
In our application, energy expenditure and step counts were both collected per minute from the SenseWear Armband® (Body-Media, Pittsburgh, PA) among the 374 children enrolled in the study who wore accelerometers while in school for one week at baseline. The children’s body weight, height, age, and sex were all collected at baseline, while their BMI’s were calculated at the beginning of each semester over the study period. True daily energy expenditure behavior, X(t), was considered the latent covariate. The surrogate measure for X(t) was the energy expenditure taken per hour obtained from the device, W(t). Step counts measured by the device was treated as the instrumental variable in this application, M(t). We assume that cov{X(t), M(t)} ≠ 0 and cov{M(t), U(t)} = 0. Justification of the use of instrumental variables is challenging in practice. However, an instrumental variable may be based on a separate independent measure of X(t). In our application, both M(t) and W(t) were obtained from the same device. But their measured or calculated measures were obtained separately. The SenseWear Armband® obtained the step count based on a 3-axis accelerometer and pattern recognition. While the calculation of total energy expenditure was based on heat flux, skin temperature, galvanic skin response, and anthropometrics49. A description of the final analytic sample is provided in Table 5.
TABLE 5.
Variable | Mean(s.d.)/ N(%) |
---|---|
BMI at baseline (kg/m2) | 17.40(2.98) |
BMI in Spring Year 2 (kg/m2) | 17.55(3.18) |
Average Step Counts/hour | 13.16(11.51) |
Average EE (kcal/hour) | 1.2(0.41) |
Age (years) | 8.79(0.76) |
Whites | 174(68.24 %) |
Blacks | 34(13.33 %) |
Hispanics | 25(9.80 %) |
Other | 22(8.63 %) |
Boys | 132(51.76 %) |
Girls | 123(48.24 %) |
Treatment | 148(58.04 %) |
Control | 107(41.96 %) |
To assess impacts of energy expenditure obtained at baseline on the difference in BMI values among the enrolled students, we first assumed that both W and M were discretely observed on a time interval [0, T]. On average, the students wore the devices for six hours on each school day during the week it was worn at baseline. Since the accelerometry data were collected per minute, we combined all the data for the week the device was worn and averaged all the minute-level data collected within the week to hourly-level data to reduce any potential noise associated with the data collection. Figure 2 provides the plot of Wi(t) and Mi(t) against time for all subjects included in the study. The grey lines illustrate the individual trajectories while the blue solid line is the smoothed mean for the observed energy expenditure and step counts among all the subjects.
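The pooling of minute-level records into hourly values described above can be illustrated with a short sketch. The array layout (one row per school day, one column per minute of wear), the treatment of non-wear minutes as NaN, and the function name `minute_to_hourly` are assumptions made for the example; the study's actual preprocessing code is not given in the text.

```python
import numpy as np

def minute_to_hourly(minute_data, minutes_per_hour=60):
    """Average minute-level readings into hourly values, pooling all days of the week.

    minute_data: array of shape (n_days, n_minutes); NaN marks non-wear minutes.
    Returns an array of length n_minutes // minutes_per_hour.
    """
    n_days, n_minutes = minute_data.shape
    n_hours = n_minutes // minutes_per_hour
    trimmed = minute_data[:, : n_hours * minutes_per_hour]
    by_hour = trimmed.reshape(n_days, n_hours, minutes_per_hour)
    # Put the hour axis first, then pool every day and minute within that hour.
    pooled = by_hour.transpose(1, 0, 2).reshape(n_hours, -1)
    return np.nanmean(pooled, axis=1)

# Hypothetical subject: 5 school days, 6 hours of wear per day,
# with a constant reading equal to the hour index within each hour.
week = np.tile(np.repeat(np.arange(6.0), 60), (5, 1))
week[0, :10] = np.nan  # ten non-wear minutes on the first day
hourly = minute_to_hourly(week)
print(hourly)
```

Averaging over all days and minutes within an hour yields one value per hour of the school day for each subject, which is the discretely observed curve used for W(t) or M(t).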
Two sets of analyses were performed to illustrate our developed methods. We first assessed the relationship between energy expenditure and BMI at baseline. The second analysis investigated the impact of energy expenditure at baseline on the change in BMI at the 18-month follow-up. Due to loss to follow-up or missing data, 255 and 156 students contributed to the baseline and the 18-month follow-up analyses, respectively.
The average BMI was 17.4 kg/m2 (SD = 2.98) at baseline and 17.6 kg/m2 (SD = 3.2) during the spring semester of the second academic year. The mean step count per hour at baseline was 13.16 (SD = 11.5), the mean energy expenditure at baseline was 1.21 kcal/hour (SD = 0.41), and the average age of the children at baseline was 7.9 years (SD = 0.80). Of the students, n = 174 (68.24%) were white, n = 34 (13.33%) were black, n = 25 (9.8%) were Hispanic, and n = 22 (8.63%) were of other race or ethnicity. See Table 5 for additional details.
5.1 |. Results
5.1.1 |. Impacts of error-free covariates on outcomes
The error-free covariates collected in the study include the student’s school, teacher, ethnicity, grade, age, gender, and treatment assignment group. To adjust for these error-free covariates as well as the cluster randomized setting of the study design, we first performed random effects analyses of the error-free covariates against the outcomes. A random intercept for teachers nested within schools was included in the models. We also fitted random effect terms for both schools and teachers nested within schools; however, these models failed to converge. The residuals adjusted for the error-free covariates were subsequently obtained from the fits of the mixed effects model with the random intercept term for teacher within school.
Two sets of mixed effects analyses were performed. The first focused on BMI at baseline as the outcome, and the second focused on the 18-month change in BMI from baseline. The results from the error-free analyses of both the baseline and follow-up data are included in Table 6. Overall, we found that age had a significant impact on BMI at both baseline and 18 months post baseline (p < 0.0001 and p = 0.04, respectively). Additionally, there were statistically significant differences in BMI between students from ethnic minority populations (black and Hispanic) and white students at both baseline and follow-up (p < 0.0001). Specifically, after controlling for all other covariates included in the model, the BMI values for the black and Hispanic students were 0.08 and 0.06 higher on average, respectively, than those for the white students at baseline. At follow-up, the BMI values for the black and Hispanic students were 0.06 and 0.03 higher on average, respectively, than those for the white students after controlling for age, school, teacher, baseline levels of BMI, and treatment assignment. No statistically significant difference was observed between the other race category and the white students at baseline or follow-up (p = 0.15 and p = 0.07, respectively). There were also no differences in average BMI between the schools, teachers, grades, and treatments at either baseline or follow-up (p > 0.05).
TABLE 6.
Baseline Model | | | | Follow-up Model | | | |
---|---|---|---|---|---|---|---|
Effect | Estimate | S.E. | P-value | Effect | Estimate | S.E. | P-value |
Intercept | 2.53 | 0.15 | < 0.0001 | Intercept | −0.03 | 0.06 | 0.62 |
Age | 0.03 | 0.004 | < 0.0001 | Age | −0.005 | 0.002 | 0.04 |
School 1 vs. School 3 | −0.08 | 0.11 | 0.46 | School 1 vs. School 3 | −0.07 | 0.06 | 0.29 |
School 2 vs. School 3 | −0.06 | 0.26 | 0.83 | School 2 vs. School 3 | −0.18 | 0.16 | 0.30 |
Teacher | 0.003 | 0.02 | 0.87 | Teacher | 0.01 | 0.01 | 0.28 |
Grade 2 vs. Grade 4 | −0.003 | 0.14 | 0.98 | Grade 2 vs. Grade 3 | 0.04 | 0.05 | 0.44 |
Grade 3 vs. Grade 4 | 0.005 | 0.11 | 0.97 | Log BMI at baseline | 1.02 | 0.007 | < 0.0001 |
Black vs. White | 0.08 | 0.006 | < 0.0001 | Black vs. White | 0.06 | 0.004 | < 0.0001 |
Hispanic vs. White | 0.06 | 0.006 | < 0.0001 | Hispanic vs. White | 0.03 | 0.004 | < 0.0001 |
Other vs. White | 0.01 | 0.006 | 0.15 | Other vs. White | −0.006 | 0.003 | 0.07 |
Girls vs. Boys | −0.02 | 0.003 | < 0.0001 | Girls vs. Boys | −0.001 | 0.002 | 0.73 |
Treatment vs. Control | 0.02 | 0.06 | 0.77 | Treatment vs. Control | −0.04 | 0.03 | 0.28 |
Teacher(School) | 0.01 | 0.003 | 0.003 | Teacher(School) | 0.001 | 0.0003 | 0.03 |
Residual | 0.02 | 0.0003 | < 0.0001 | Residual | 0.004 | 0.0001 | < 0.0001 |
5.1.2 |. Impact of baseline levels of energy expenditure on BMI
Residuals from the mixed effects assessments of the impacts of the error-free covariates on the outcomes were obtained from the baseline and follow-up analyses. Each analysis used a linear mixed model with a random intercept for teacher nested within school and covariates Zijk = (log(BMIFall14), ethnicity, grade, age, gender, treatment, teacher, school)⊤, i = 1, …, 157 students, j = 1, …, 3 schools, k = 1, …, 8 teachers (nested within schools). These residuals were subsequently used as the outcomes in our measurement error models. Thus, the outcomes used to assess the effects of energy expenditure on BMI were the residuals, adjusted for the error-free covariates and the cluster randomized design, for the baseline measures of BMI in the first analysis and for the difference between BMI obtained at baseline and BMI obtained at the end of the study in the second analysis. Six knots were used in the application, and nonparametric bootstraps were used to compute the 95% point-wise confidence intervals for β(t).
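A nonparametric bootstrap for point-wise confidence intervals of a function-valued coefficient can be sketched as below. The sketch assumes a percentile interval (the text does not say which bootstrap interval was used), resamples subjects with replacement, and uses a cosine basis with ordinary least squares as a hypothetical stand-in for the B-spline, instrumental variable estimator; the simulated data and all function names are illustrative only.

```python
import numpy as np

def fit_beta_ls(Y, W, t, n_basis=4):
    """Stand-in estimator: least squares on a cosine basis (an assumption)."""
    Phi = np.cos(np.pi * np.outer(t, np.arange(n_basis)))  # shape (T, n_basis)
    Z = np.column_stack([np.ones(len(Y)), W @ Phi / len(t)])
    coef = np.linalg.lstsq(Z, Y, rcond=None)[0]
    return Phi @ coef[1:]  # estimated coefficient function on the grid t

def bootstrap_pointwise_ci(Y, W, t, fit, n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap: resample subjects, refit, take quantiles at each t."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    draws = np.empty((n_boot, len(t)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample subjects with replacement
        draws[b] = fit(Y[idx], W[idx], t)
    alpha = 1.0 - level
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)
    return lower, upper

# Hypothetical simulated data on a common grid.
rng = np.random.default_rng(2)
n, T = 150, 40
t = np.linspace(0.0, 1.0, T)
W = rng.normal(size=(n, 3)) @ np.vstack([np.ones(T), np.cos(np.pi * t), np.cos(2 * np.pi * t)])
Y = W @ np.cos(np.pi * t) / T + rng.normal(scale=0.2, size=n)

beta_hat = fit_beta_ls(Y, W, t)
lower, upper = bootstrap_pointwise_ci(Y, W, t, fit_beta_ls)
# Time points at which the point-wise interval excludes zero.
excludes_zero = (lower > 0) | (upper < 0)
print(excludes_zero.sum(), "of", T, "grid points exclude zero")
```

Because the intervals are point-wise, they describe uncertainty at each time point separately and do not provide simultaneous coverage over the whole time interval.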
We provide the results from the baseline analyses and the follow up analyses in Figure 3. Plots of the estimated functional coefficient and the estimated 95% point-wise confidence intervals are provided in the figure. For assessments of the impact of energy expenditure on BMI at baseline, the bootstrap confidence intervals did not contain the zero line completely, indicating that the functional coefficient was not zero across the whole time space. Similarly, in determining the impacts of baseline measures of energy expenditure on the 18-month change in BMI over the study period, the estimated bootstrap confidence intervals did not contain the zero line completely. Because the function-valued coefficient was not completely zero across time, there was some statistical evidence of a relationship between baseline measures of energy expenditure and BMI values obtained at a future time, such as 18 months post baseline. Additionally, the relationship observed depended on both the level of energy expenditure and time.
5.1.3 |. Impact of measurement error on the analyses
In addition to our method of moments-based instrumental variable estimator, we also obtained naive estimators of the effects of energy expenditure on BMI (see Figure 3). In both sets of analyses, the estimates obtained without accounting for measurement error appeared notably different from those obtained from the instrumental variable-based approach. Based on Figure 3, the impact of measurement error on both sets of analyses depended on time. While it is well known that in simple linear regression models the effect of measurement error is to attenuate the estimated coefficient towards zero, its impact in this functional linear regression setting is more complex. For both sets of analyses, we found that the measurement error adjusted function-valued coefficients tended to be larger than the naive coefficients. However, the naive estimate of β(t) at baseline was larger than the measurement error adjusted estimate at the beginning and the end of the observational period.
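For reference, the attenuation result mentioned above for simple linear regression can be written out as follows. The scalar notation here (β0, β1, σ_X², σ_U², λ) is introduced only for this illustration and assumes classical additive measurement error that is independent of the true covariate and of the regression error.

```latex
% Classical additive measurement error in simple linear regression:
%   Y = \beta_0 + \beta_1 X + \varepsilon, \qquad W = X + U,
% with U independent of X and of \varepsilon.
\[
  \hat{\beta}_{1,\text{naive}}
  \;\xrightarrow{\;p\;}\;
  \frac{\operatorname{cov}(W, Y)}{\operatorname{var}(W)}
  = \frac{\beta_1 \sigma_X^2}{\sigma_X^2 + \sigma_U^2}
  = \lambda \beta_1,
  \qquad
  \lambda = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_U^2} \in (0, 1].
\]
```

Since λ is at most one, the naive slope is shrunk toward zero in the scalar case. In the functional setting the analogous bias involves covariance operators rather than a single ratio, so it need not be a uniform shrinkage at every time point.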
5.2 |. Discussion
Trinh et al.50 recently studied the relationship between baseline energy expenditure and the three-year change in BMI among 182 five to ten year old children with overweight and obesity in Australia. Using regression analysis and change in BMI Z-scores, the authors concluded that baseline measures of energy expenditure significantly impacted the three-year change in BMI among the children. Our current results indicated that baseline levels of energy expenditure had some statistically significant relationships with future body weight among children, but these impacts depended on activity levels and the time of activity.
In this manuscript, we developed an instrumental variable approach for addressing potential measurement errors associated with function-valued covariates in scalar-on-function regression models. The developed methods can be used to assess the impacts on health outcomes of biological markers measured repeatedly over a dense time space. A limitation of our current approach is that the instrumental variable must be collected over the same time period as the unbiased measure for the true covariate. Thus, the developed methods are applicable to devices that collect data on multiple biological markers over the same time period.
Our current approach does not allow random effects or error-free covariates to be included directly in (1) to account for cluster randomized designs or the impacts of demographics. Future work in this area includes accounting for multi-level designs as well as allowing the inclusion of error-free covariates. Finally, the current methods assess the impacts of energy expenditure on health outcomes using mean regression. It will be interesting to discover how accounting for measurement errors associated with function-valued covariates works in model settings that permit robust modeling of BMI, such as quantile regression or other generalized robust model settings.
6 |. CONCLUSION
We studied the scalar-on-function regression model with measurement error. In this setting, we considered a scalar-valued outcome with a functional covariate that was corrupted by measurement error. Most existing methods either implicitly assume that the measurement errors are independent over time, or assume that the measurement error covariance is known or can be estimated. However, the measurement errors are likely to be correlated over time. In addition, the measurement error variances are never known and estimates are seldom available. In this paper, we took advantage of the additional information provided by an instrumental variable and developed a generalized method of moments-based approach to identify and consistently estimate the functional regression coefficient. To our knowledge, this is the first work in the literature to use an instrumental variable approach to address the measurement error problem in the scalar-on-function regression model. Using B-spline basis expansions, we re-parameterized the functional linear regression model as a multiple linear regression model with measurement error. The model was identified using a function-valued instrumental variable observed on the same time space as the surrogate measure, and the function-valued coefficient was then estimated by the generalized method of moments. The proposed methodology was motivated by a childhood obesity study focused on assessing the relationship between energy expenditure and subsequent progression to obesity among elementary school-aged children. Applying our proposed model, we found that the estimated association between baseline measures of energy expenditure and the 18-month change in BMI was statistically significant over parts of the observed time interval. This association indicated that school programs and policies that increase physical activity among students might have some beneficial impact.
In an effort to combat childhood obesity, physical activity policies within schools are implemented to encourage more physical activity among children. Our developed methods improve on the current statistical approaches used to evaluate the effectiveness of such policies.
Finally, our simulation studies indicated the importance of accounting for measurement errors when a function-valued covariate in a functional linear regression model is suspected to be imprecisely observed. Failure to account for the measurement errors can lead to severely biased estimates.
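The idea of re-parameterizing the functional model through a basis expansion and then using an instrument for identification can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the estimator developed in the paper: it uses a cosine basis in place of B-splines, simulated curves, and the textbook just-identified case, in which the number of instruments equals the number of regressors and the generalized method of moments estimator is obtained by solving the sample moment equations directly. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, K = 2000, 60, 3
t = np.linspace(0.0, 1.0, T)
Phi = np.cos(np.pi * np.outer(t, np.arange(K)))  # (T, K) cosine basis, an assumption
b_true = np.array([1.0, -1.0, 0.5])
beta_true = Phi @ b_true                          # true coefficient function on the grid

def smooth_curves(scale):
    """Random curves in the span of the basis, hence correlated over time."""
    return rng.normal(scale=scale, size=(n, K)) @ Phi.T

X = smooth_curves(1.0)        # latent true covariate
W = X + smooth_curves(1.0)    # surrogate: X plus time-correlated measurement error U
M = X + smooth_curves(1.0)    # instrument: related to X, its error independent of U
Y = X @ beta_true / T + rng.normal(scale=0.5, size=n)

def project(curves):
    """Intercept plus grid-average approximations of the basis-projected curves."""
    return np.column_stack([np.ones(n), curves @ Phi / T])

Zw, Zm = project(W), project(M)

# Naive estimator: least squares of Y on the projected surrogate.
naive = np.linalg.lstsq(Zw, Y, rcond=None)[0]
# Just-identified moment estimator: solve Zm' (Y - Zw g) = 0 for g.
iv = np.linalg.solve(Zm.T @ Zw, Zm.T @ Y)

beta_naive = Phi @ naive[1:]
beta_iv = Phi @ iv[1:]

def rmse(estimate):
    return float(np.sqrt(np.mean((estimate - beta_true) ** 2)))

print("naive RMSE:", rmse(beta_naive))
print("IV RMSE:   ", rmse(beta_iv))
```

In this simulated design the measurement error has the same covariance as the true covariate, so the naive coefficient is shrunk by roughly one half, while the moment equations based on the instrument are unaffected by U because the instrument's error is independent of it.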
ACKNOWLEDGMENTS
Tekwe’s research was supported by National Cancer Institute Supplemental Award Number U01-CA057030–29S2. Zoh’s research was supported by National Cancer Institute Supplemental Award Number U01-CA057030–29S1. Carroll’s research was supported by National Cancer Institute Award Number U01-CA057030. Allison’s research was supported by R25DK099080 and R25HL124208. Xue’s research was supported by Simons Foundation Award Number 272556. The research reported in this publication was also supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health under Award Number R21HD068841.
APPENDIX.
A SKETCH OF TECHNICAL ARGUMENTS
We denote the B-spline basis of degree p on [0, 1] as . For notational convenience, we use a scaled B-spline basis in the proof, which is defined as for k = 1, …,Kn. With some abuse of notation, we still denote for the scaled B-spline basis for simplicity.
By29, there exists a set of coefficients and a spline function such that for some constant c > 0. Let . Then one can write
Therefore,
where and are centered versions of , εi, and respectively. Thus, by Lemma 1 in supplementary materials included in the Web Appendix, there exists a constant c > 0, such that
By Lemma 4 in the Web Appendix, there exist c,C > 0 such that
By Lemmas 2,3,6 in the Web Appendix, one has .
Finally, an error decomposition gives that
Footnotes
Financial disclosure
None reported.
Conflict of interest
Mark Benden notes that he has a financial conflict of interest on file with Texas A&M University as the stand-biased desks used in this study are derived from one of his 20 US Patents. This intellectual property was licensed by Texas A&M University to Stand2Learn, LLC for commercialization. He was not involved in data analysis or collection but instead focused on the experimental design and background for this article. The other authors of this paper do not have conflicts of interest to disclose.
SUPPORTING INFORMATION
Supplementary Materials are available online as part of this article. These materials provide additional theoretical and simulation results relevant to our proposed method and its comparison with the naive approaches.
References
- 1. CDC. Childhood obesity facts 2017. https://www.cdc.gov/healthyschools/obesity/facts.htm.
- 2. Salmon J. Novel strategies to promote children's physical activities and reduce sedentary behavior. Journal of Physical Activity and Health. 2010;7(s3):S299–S306.
- 3. Wechsler H, Devereaux RS, Davis M, Collins J. Using the school environment to promote physical activity and healthy eating. Preventive Medicine. 2000;31(2):S121–S137.
- 4. Lanningham-Foster L, Foster RC, McCrady SK, et al. Changing the school environment to increase physical activity in children. Obesity. 2008;16(8):1849–1853.
- 5. Benden ME, Blake JJ, Wendel ML, Huber JC. The impact of stand-biased desks in classrooms on calorie expenditure in children. American Journal of Public Health. 2011;101(8):1433–1436.
- 6. Kastellorizios M, Burgess DJ. Continuous metabolic monitoring based on multi-analyte biomarkers to predict exhaustion. Scientific Reports. 2015;5.
- 7. Matthews CE, Hagströmer M, Pober DM, Bowles HR. Best practices for using physical activity monitors in population-based research. Medicine and Science in Sports and Exercise. 2012;44(1 Suppl 1):S68.
- 8. Mader JK, Feichtner F, Bock G, et al. Microdialysis - A versatile technology to perform metabolic monitoring in diabetes and critically ill patients. Diabetes Research and Clinical Practice. 2012;97(1):112–118.
- 9. Stuckey M, Fulkerson R, Read E, et al. Remote monitoring technologies for the prevention of metabolic syndrome: the Diabetes and Technology for Increased Activity (DaTA) study. Journal of Diabetes Science and Technology. 2011;5(4):936–944.
- 10. Butte NF, Ekelund U, Westerterp KR. Assessing physical activity using wearable monitors: measures of physical activity. Medicine and Science in Sports and Exercise. 2012;44(1S):S5–S12.
- 11. Muaremi A, Arnrich B, Tröster G. Towards measuring stress with smartphones and wearable devices during workday and sleep. BioNanoScience. 2013;3(2):172–183.
- 12. Silverman BW, Ramsay JO. Functional data analysis. Springer; 2005.
- 13. Ramsay JO, Dalzell CJ. Some tools for functional data analysis. Journal of the Royal Statistical Society. Series B (Methodological). 1991:539–572.
- 14. Müller HG. Functional modelling and classification of longitudinal data. Scandinavian Journal of Statistics. 2005;32(2):223–240.
- 15. Faraway JJ. Regression analysis for a functional response. Technometrics. 1997;39(3):254–261.
- 16. Yao F, Müller HG, Wang JL, et al. Functional linear regression analysis for longitudinal data. The Annals of Statistics. 2005;33(6):2873–2903.
- 17. James GM, Wang J, Zhu J. Functional linear regression that’s interpretable. The Annals of Statistics. 2009:2083–2108.
- 18. Ramsay JO. Functional data analysis. Wiley Online Library; 2006.
- 19. Crambes C, Kneip A, Sarda P. Smoothing splines estimators for functional linear regression. The Annals of Statistics. 2009:35–72.
- 20. Ferraty F, Vieu P. Nonparametric functional data analysis: theory and practice. Springer Science & Business Media; 2006.
- 21. Staniswalis JG, Lee JJ. Nonparametric regression analysis of longitudinal data. Journal of the American Statistical Association. 1998;93(444):1403–1418.
- 22. Cardot H. Nonparametric estimation of smoothed principal components analysis of sampled noisy functions. Journal of Nonparametric Statistics. 2000;12(4):503–538.
- 23. Yao F, Müller HG, Clifford AJ, et al. Shrinkage estimation for functional principal component scores with application to the population kinetics of plasma folate. Biometrics. 2003;59(3):676–685.
- 24. Chiou JM, Müller HG, Wang JL. Functional quasi-likelihood regression models with smooth random effects. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2003;65(2):405–423.
- 25. Yao F, Müller HG, Wang JL. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association. 2005;100(470):577–590.
- 26. Cai X. Methods for handling measurement error and sources of variation in functional data models. 2015.
- 27. Zhang D, Lin X, Sowers MF. Two-stage functional mixed models for evaluating the effect of longitudinal covariate profiles on a scalar outcome. Biometrics. 2007;63(2):351–362.
- 28. Tekwe CD, Zoh RS, Bazer FW, Wu G, Carroll RJ. Functional multiple indicators, multiple causes measurement error models. Biometrics. 2018;74(1):127–134.
- 29. De Boor C. Calculation of the smoothing spline with weighted roughness measure. Mathematical Models and Methods in Applied Sciences. 2001;11(01):33–41.
- 30. Carroll RJ, Ruppert D, Stefanski L, Crainiceanu CM. Measurement error in nonlinear models: a modern perspective, Second Edition. Chapman and Hall; 2006.
- 31. Carroll RJ, Stefanski LA. Measurement error, instrumental variables and corrections for attenuation with applications to meta-analyses. Statistics in Medicine. 1994;13(12):1265–1282.
- 32. Angrist J, Krueger AB. Instrumental variables and the search for identification: from supply and demand to natural experiments. National Bureau of Economic Research; 2001.
- 33. Tekwe CD, Carter RL, Cullings HM, Carroll RJ. Multiple indicators, multiple causes measurement error models. Statistics in Medicine. 2014;33(25):4469–4481.
- 34. Greenland S. An introduction to instrumental variables for epidemiologists. International Journal of Epidemiology. 2000;29(4):722–729.
- 35. Fuller WA. Measurement Error Models. John Wiley & Sons; 2009.
- 36. Hu Y, Schennach SM. Instrumental variable treatment of nonclassical measurement error models. Econometrica. 2008;76(1):195–216.
- 37. Tekwe CD, Carter RL, Cullings HM. Generalized multiple indicators, multiple causes measurement error models. Statistical Modelling. 2016;16(2):140–159.
- 38. Florens JP, Van Bellegem S. Instrumental variable estimation in functional linear models. Journal of Econometrics. 2015;186(2):465–476.
- 39. Huang JZ, et al. Projection estimation in multiple regression with application to functional ANOVA models. The Annals of Statistics. 1998;26(1):242–272.
- 40. Xue L, Yang L. Additive coefficient modeling via polynomial spline. Statistica Sinica. 2006:1423–1446.
- 41. Wang L, Yang L. Spline estimation of single-index models. Statistica Sinica. 2009:765–783.
- 42. Li Y, Hsing T. On rates of convergence in functional linear regression. Journal of Multivariate Analysis. 2007;98(9):1782–1804.
- 43. Cardot H, Ferraty F, Sarda P. Spline estimators for the functional linear model. Statistica Sinica. 2003:571–591.
- 44. Ma Y, Li R. Variable selection in measurement error models. Bernoulli: Official Journal of the Bernoulli Society for Mathematical Statistics and Probability. 2010;16(1):274.
- 45. Tudor-Locke C, Craig CL, Brown WJ, et al. How many steps/day are enough? For adults. International Journal of Behavioral Nutrition and Physical Activity. 2011;8:1–17.
- 46. Welk GJ, Differding JA, Thompson RW, Blair SN, Dziura J, Hart P. The utility of the Digi-walker step counter to assess daily physical activity patterns. Medicine and Science in Sports and Exercise. 2000;32(9 Suppl):S481–8.
- 47. Tudor-Locke C, Pangrazi RP, Corbin CB, et al. BMI-referenced standards for recommended pedometer-determined steps/day in children. Preventive Medicine. 2004;38(6):857–864.
- 48. Adams MA, Johnson WD, Tudor-Locke C. Steps/day translation of the moderate-to-vigorous physical activity guideline for children and adolescents. International Journal of Behavioral Nutrition and Physical Activity. 2013;10(1):49.
- 49. Lee JA, Laurson KR. Validity of the SenseWear armband step count measure during controlled and free-living conditions. Journal of Exercise Science & Fitness. 2015;13(1):16–23.
- 50. Trinh A, Campbell M, Ukoumunne OC, Gerner B, Wake M. Physical activity and 3-year BMI change in overweight and obese children. Pediatrics. 2013;131(2):e470–e477.