Published in final edited form as: J Multivar Anal. 2024 Dec 14;207:105400. doi: 10.1016/j.jmva.2024.105400

Quadratic inference with dense functional responses

Pratim Guha Niyogi a,*, Ping-Shou Zhong b

Abstract

We address the challenge of estimation in the context of constant linear effect models with dense functional responses. In this framework, the conditional expectation of the response curve is represented by a linear combination of functional covariates with constant regression parameters. In this paper, we present an alternative solution by employing the quadratic inference approach, a well-established method for analyzing correlated data, to estimate the regression coefficients. Our approach leverages non-parametrically estimated basis functions, eliminating the need to choose a working correlation structure. Furthermore, we demonstrate that our method achieves a parametric $\sqrt{n}$-convergence rate, contingent on an appropriate choice of bandwidth. This convergence is attained when the number of repeated measurements per trajectory exceeds a certain threshold; specifically, when it surpasses $n^{a_0}$, with $n$ representing the number of trajectories. Additionally, we establish the asymptotic normality of the resulting estimator. The performance of the proposed method is compared with that of existing methods through extensive simulation studies, in which our proposed method outperforms the alternatives. Real data analyses are also conducted to demonstrate the proposed method.

Keywords: Constant Linear-Effect Models, Functional Principal Component Analysis, Quadratic Inference, Semi-parametric Functional Regression

1. Introduction

Longitudinal data analysis (LDA) involves tracking repeated measurements on the same individuals over time, allowing us to study changes over time and identify influencing factors. Unlike cross-sectional studies, which capture only “between-individual” responses at a single time point, LDA can capture “within-individual” changes through repeated measurements. Longitudinal data are often observed in clusters, with each cluster representing measurements from one individual. As data complexity increases with advancing technology, functional data analysis (FDA) has become a vital tool, extending our understanding from finite to infinite dimensions. In LDA, data are generally observed with noise at each time-point [6, 16]. Moreover, LDA typically involves only a few repeated measurements, so the data are observed sparsely with noise. In FDA, on the other hand, data are densely observed as a continuous-time stochastic process, with or without noise [22,37]. Often, the sampling plan can affect the performance of the estimation procedures and inference [14]. In some situations, data are functions by nature and are observed densely over time. Chiou et al. [4] proposed a class of semi-parametric functional regression models to describe the influence of vector-valued covariates on a sample of response curves. When data collection leads to experimental error, smoothing is performed at closely spaced time-points in order to reduce the effect of noise. Recent developments in functional regression techniques have been rigorously studied in Chen et al. [3], Fan and Zhang [8], and Hall and Horowitz [13]; see the more recent and complete review in Li et al. [22].

The generalized estimating equation (GEE) technique proposed by Liang and Zeger [23] has been extensively used in LDA for the estimation of parameters. Although widely used, GEE may lose efficiency when the correlation matrix is not correctly specified. Hence, by not requiring the estimation of correlation parameters, the quadratic inference function (QIF) approach proposed by Qu et al. [29] is useful for parameter estimation in longitudinal studies [6] and cluster randomized trials [31]. By representing the inverse of the working correlation matrix as a linear combination of basis matrices and involving multiple sets of score functions, the QIF approach improves efficiency over GEE when the working correlation matrix is misspecified, while maintaining the same efficiency as when the working correlation matrix is correctly specified. Nevertheless, the QIF method is not independent of the choice of working correlation structure, which determines the basis matrices. A QIF-based approach to varying-coefficient models for longitudinal data was proposed by Qu and Li [28]. The related work of Bai et al. [1] extends QIF to the partial linear model. An alternative method was presented in Yu et al. [34], where each set of score equations is solved separately and the solutions are combined afterward, thereby providing inference results for an optimally weighted estimator and extending those insights to the general setting of over-identified estimating equations.

The fundamental limitations that all the above-mentioned powerful techniques suffer from are: (1) all the above methodologies require prior information on the working correlation structure or a choice of appropriate basis matrices, and (2) the performance of the classical QIF approach is unknown for dense functional data. Our study is motivated by problems from multiple real data applications that involve dense functional data for which information on the working correlation structure is lacking. Let us discuss two motivating examples that we will use to illustrate the proposed method in this paper (see Section 5 for details). In the Beijing2017-data example, particulate matter (PM) with a diameter of less than 2.5 micrometers is collected at different time-points at different locations in China. Scientists are interested in the linear dependence of the pollution factor PM2.5 on other atmospheric chemicals [24]. Figure 1 (left) in Section 5 displays the readings of PM2.5 for the given locations over several hundred hourly time-points; therefore, dense functional data analysis can be applied. In another example from a neuroimaging study, Apnea-data, scientists are interested in modeling the change of white matter structure among voxels in each region of interest (ROI) of the human brain. Xiong et al. [32] investigated white matter structural alterations using diffusion tensor imaging (DTI) in obstructive sleep apnea (OSA) patients. Here, the change of DTI parameters such as fractional anisotropy (FA) is investigated using constant linear effect models (see Subsection 2.1 for a formal definition), with the interaction of the count of lapses obtained from the Psychomotor Vigilance Task and the voxel locations as predictors. We applied this model to each ROI and compared the results obtained across ROIs.

Fig. 1: Beijing2017-data: (Left) Hourly PM2.5 readings for twelve different locations over 608 hourly time-points during January 2017. (Right) Scree plot of the fraction of variance explained (FVE), used to determine the optimal number of components to retain in functional principal component analysis; the red line indicates the cumulative fraction of variance across the number of components.

We propose a data-driven way to select the working covariance matrix, expressing the inverse of the covariance function in terms of the empirical eigen-functions of the covariance operator. The covariance operator can be estimated as in Hsing and Eubank [18] and by other related methods based on functional principal component analysis (FPCA), as found in Dauxois et al. [5], Hall and Horowitz [13], Hall and Hosseini-Nasab [14], Li and Hsing [21], Yao et al. [33], Zhang and Li [35]. Note that the estimation of the eigen-functions is nonparametric and introduces some error into the proposed estimation method. In this article, we answer the following question: when the eigen-functions are estimated nonparametrically from the data, is the estimation of the coefficient vectors in this semi-parametric problem $\sqrt{n}$-consistent for dense functional data, and can we achieve asymptotic normality? The advantages of our proposed method are the following. First, our method preserves the good properties of the QIF method and is easy to implement, since the eigen-functions can be estimated using existing packages in statistical software such as R. Second, under some mild conditions, our proposed estimator attains the optimal convergence rate and is asymptotically normally distributed with smaller variance compared to the classical QIF methods. Third, the asymptotic results quantify the estimation accuracy of the coefficients in the semi-parametric functional model, showing that the influence of the FPCA dimension-reduction step is asymptotically negligible: the error in the estimation of the eigen-functions contributes to the error in the estimation of the parameters, but under some mild bandwidth conditions this contribution is of the same order of magnitude as the parameter-estimation error that would arise if the eigen-functions were known in advance.

The rest of the paper is organized as follows. In Section 2, we introduce the basic concept of QIF along with our proposed method. The asymptotic results for the proposed estimator are presented in Section 3. In Section 4, we demonstrate the performance for finite samples. We also apply the proposed method to real data-sets in Section 5. We conclude with some remarks in Section 6. All technical proofs are presented in the Appendix of the article. The supplementary material contains additional tables from simulation results.

2. Functional response model and estimation procedure

2.1. Basic model

To analyze longitudinal data, a straightforward application of a generalized linear model [27] for single response variables is not appropriate due to the lack of independence between repeated measures. To account for the high correlation in longitudinal data, special techniques are required. A seminal work by Liang and Zeger [23] proposed the use of generalized linear models for the analysis of longitudinal data. The model we consider in this article is commonly observed in spatial modeling, where associations among variables do not change over the functional domain (see Zhang and Banerjee [36] and references therein); it is termed the constant linear effects model. In neuroimaging studies, the constant linear effects model is a popular choice for region-of-interest analysis (see Friston et al. [9, 10], Lindquist [25], Xiong et al. [32]) due to the easy and practical interpretation of its constant coefficients. In this paper, the variable time is used as the functional domain variable.

Let $y(t)$ be the response variable at time-point $t$ and $x(t)$ be the $p$-dimensional covariates observed at time $t \in \mathcal{T}$, where, without loss of generality, we assume $\mathcal{T} = [0,1]$ is the domain of the time-points. The stochastic process $y(t)$ is square-integrable with conditional mean $E\{y(t) \mid x(t)\}$ and finite covariance function; the regression parameter $\beta$ is unknown and is to be estimated efficiently. Thus, the constant linear effects model with longitudinal data has the following expression:

$y(t) = x(t)^\top \beta + e(t)$, (1)

where the stochastic process $y(t)$ is decomposed into two parts: the mean function $\mu(t) = x(t)^\top \beta$, which depends on the time-varying covariates and the coefficient vector $\beta$, and the random error $e(t)$, with $E\{e(t)\} = 0$ and finite second-order covariance $R(s,t)$. Let $y_i(t)$ be i.i.d. copies of the stochastic process; for each individual, measurements are taken at $m_i$ discrete time-points $T_{ij}$, $j \in \{1,\ldots,m_i\}$, $i \in \{1,\ldots,n\}$. Therefore, we observe an $m_i \times 1$ response vector $\{y_i(T_{ij})\}_{j=1}^{m_i}$ and the corresponding covariates $\{x_i(T_{ij})\}_{j=1}^{m_i}$ for the $i$-th subject. We assume that the $m_i$ are all of the same order as $m = n^a$ for some $a \ge 0$, so that $m_i/m$ is bounded below and above by constants. Functional data are considered sparse or dense depending on the choice of $a$ [14]: data with bounded $m$ (or $a = 0$) are called sparse functional data, and data with $a \ge a_0$, where $a_0$ is a transition point, are called dense functional data; the region $(0, a_0)$ is sometimes referred to as moderately dense. Furthermore, we write $y_{ij}$ and $x_{ij}$ for $y_i(T_{ij})$ and $x_i(T_{ij})$, respectively, so that $y_i = (y_{i1},\ldots,y_{im_i})^\top$ and $\mu_i = (\mu_{i1},\ldots,\mu_{im_i})^\top$ are $m_i$-component vectors. In addition, define the residuals $e_i = y_i - \mu_i$, the $m_i \times p$ design matrices $X_i = (x_{ij}^\top)_{j=1}^{m_i}$, and the derivative of $\mu_i$ with respect to $\beta$, denoted $\dot\mu_i$, which is an $m_i \times p$ matrix. For instance, in the linear model (1), $\dot\mu_i = X_i$. We keep the notation $\dot\mu_i$ because it generalizes to other types of responses.

In the classical problem of GEE, we estimate β by solving the quasi-likelihood equations [23]:

$\sum_{i=1}^{n} \dot\mu_i^\top V_i^{-1}(y_i - \mu_i) = 0.$ (2)

We denote $V_i = v A_i^{1/2} R_i(\rho) A_i^{1/2}$, where $R_i(\rho)$ is the working correlation matrix, $v$ is an over-dispersion parameter, and $A_i$ is a diagonal matrix whose entries are the marginal variances $\mathrm{Var}(y_{i1}), \ldots, \mathrm{Var}(y_{im_i})$. In this article we simply set $v = 1$; the extension to general $v$ is straightforward. In practice, prior knowledge of the working correlation matrix is unavailable, and the estimation of the coefficients is influenced by its choice. Therefore, Qu et al. [29] suggested an expansion of the inverse of the working correlation matrix as $R(\rho)^{-1} = \sum_{k=1}^{\kappa_0} a_k(\rho) M_k$, where the $M_k$ are basis matrices. Zhou and Qu [38] modified this linear representation by grouping the basis matrices into an identity matrix and some symmetric basis matrices. For example, if the working correlation matrix is exchangeable/compound symmetric, $R(\rho)^{-1} = c_1 I_m + c_2 J_m$, where $I_m$ is the $m \times m$ identity matrix and $J_m$ is the $m \times m$ matrix with 0 on the diagonal and 1 in the off-diagonal positions. On the other hand, for a first-order autoregressive correlation matrix, $R(\rho)^{-1} = c_1 I_m + c_2 J_m^{(1)} + c_3 J_m^{(2)}$, where $J_m^{(1)}$ is a matrix with 1 on the two main off-diagonals and 0 otherwise, and $J_m^{(2)}$ is a matrix with 1 in the corner positions, viz. $(1,1)$ and $(m,m)$, and 0 elsewhere. Here the $c_k$ are real constants that depend on the nuisance parameter $\rho$. Therefore, (2) reduces to a linear combination of the score vectors:

$\bar g(\beta) \equiv n^{-1}\sum_{i=1}^{n} g_i(\beta) \equiv \begin{pmatrix} \bar g^{(1)}(\beta) \\ \vdots \\ \bar g^{(\kappa_0)}(\beta) \end{pmatrix} = \begin{pmatrix} n^{-1}\sum_{i=1}^{n} \dot\mu_i^\top A_i^{-1/2} M_1 A_i^{-1/2}(y_i - \mu_i) \\ \vdots \\ n^{-1}\sum_{i=1}^{n} \dot\mu_i^\top A_i^{-1/2} M_{\kappa_0} A_i^{-1/2}(y_i - \mu_i) \end{pmatrix},$ (3)

where $\bar g^{(k)}(\beta) \equiv n^{-1}\sum_{i=1}^{n} g_i^{(k)}(\beta) \equiv n^{-1}\sum_{i=1}^{n} \dot\mu_i^\top A_i^{-1/2} M_k A_i^{-1/2}(y_i - \mu_i)$ for $k \in \{1,\ldots,\kappa_0\}$; each $\bar g^{(k)}(\beta)$ is a $p \times 1$ vector and $\bar g(\beta)$ is a $p\kappa_0 \times 1$ vector obtained by stacking all the $\bar g^{(k)}(\beta)$'s. Because of the higher dimension of $\bar g$, Qu et al. [29] used the generalized method of moments (GMM) [17], for which estimation reduces to minimizing the quadratic inference function $Q(\beta) = n\, \bar g(\beta)^\top \hat C(\beta)^{-1} \bar g(\beta)$, where $\hat C(\beta) = n^{-1}\sum_{i=1}^{n} g_i(\beta) g_i(\beta)^\top$ is the sample covariance matrix of the scores in (3). To solve for $\beta$, the Newton-Raphson method is used, which iteratively updates the value of $\beta$.
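To make the construction concrete, the following base-R sketch (the helper names are ours, for illustration, not from the paper) builds the two sets of basis matrices above and evaluates $Q(\beta)$ for a Gaussian linear model with $A_i = I$ and a common grid of $m$ time-points:

```r
# Basis matrices for R(rho)^{-1}: I_m plus J_m (compound symmetry), or
# I_m plus the two AR(1) matrices J^(1), J^(2) (illustrative helpers).
basis_cs <- function(m) {
  list(diag(m), matrix(1, m, m) - diag(m))         # I_m; 0 diag, 1 off-diag
}
basis_ar1 <- function(m) {
  J1 <- matrix(0, m, m)
  J1[cbind(1:(m - 1), 2:m)] <- 1                   # 1 on the super-diagonal
  J1[cbind(2:m, 1:(m - 1))] <- 1                   # 1 on the sub-diagonal
  J2 <- matrix(0, m, m); J2[1, 1] <- 1; J2[m, m] <- 1  # corner positions
  list(diag(m), J1, J2)
}

# Quadratic inference function Q(beta) from (3), assuming mu_i = X_i beta and
# A_i = I; X and y are lists over subjects, M a list of basis matrices.
qif_objective <- function(beta, X, y, M) {
  n <- length(X); p <- length(beta); K <- length(M)
  g <- matrix(0, n, p * K)                         # row i stores g_i(beta)
  for (i in seq_len(n)) {
    r <- y[[i]] - X[[i]] %*% beta                  # residual y_i - mu_i
    g[i, ] <- unlist(lapply(M, function(Mk) t(X[[i]]) %*% Mk %*% r))
  }
  gbar <- colMeans(g)                              # extended score vector
  Chat <- crossprod(g) / n                         # sample covariance of g_i
  n * drop(t(gbar) %*% solve(Chat, gbar))          # Q(beta)
}
```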

2.2. Incorporating eigen-functions in QIF

Now, due to the standard Karhunen-Loève expansion of $e_i(t) = y_i(t) - \mu_i(t)$ [20, 26],

$e_i(t) = \sum_{r=1}^{\infty} \xi_{ir}\phi_r(t),$ (4)

with probability 1, where the uncorrelated random variables $\xi_{ir} = \langle e_i, \phi_r \rangle$ have zero mean and variance $\lambda_r$, for ordered eigen-values $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$, and the $\phi_r$ are orthonormal eigen-functions such that $\int \phi_r(t)\phi_l(t)\,dt = \mathbb{1}(r = l)$. By Mercer's theorem [19], a symmetric continuous non-negative definite kernel function $R$ has the representation $R(s,t) = \sum_{r=1}^{\infty}\lambda_r\phi_r(s)\phi_r(t)$, where the sum converges absolutely and uniformly. We extract the main directions of variation of the response variables using FPCA. In this situation, we take the first $\kappa_0$ terms, which provide a good approximation of the infinite sum in (4), on the grounds that the majority of the variation in the data is contained in the subspace spanned by a few eigen-functions [3]. For finite $\kappa_0 \ge 1$, we therefore consider the rank-$\kappa_0$ FPCA model,

$E\{y(t) \mid x(t)\} \approx \mu(t) + \sum_{r=1}^{\kappa_0} E\{\xi_r \mid x(t)\}\phi_r(t).$ (5)

Unlike existing models in Chiou et al. [4], where the main interest is to study how vector covariates influence the whole response curve, we study the point-wise effect of covariates on the functional response: the covariate vectors influence the functional response indirectly through the regression of the scores on the covariates, viz., $E\{\xi_r \mid x(t)\}$. An analogue of the truncated empirical version of (4) and of Mercer's representation can be provided easily, and we discuss the proposed method based on this truncated version. Moreover, we discuss in detail how to choose $\kappa_0$ in our situation in Sections 3 and 4.

In this paper, we propose a data-driven way to compute the basis matrices to obtain the approximate inverse of V as discussed earlier. In this approach, it is enough to find the eigen-functions to construct a GEE. Let us define

$\bar g(\beta) \equiv \begin{pmatrix} \bar g^{(1)}(\beta) \\ \vdots \\ \bar g^{(\kappa_0)}(\beta) \end{pmatrix} \equiv \begin{pmatrix} n^{-1}\sum_{i=1}^{n} \dot\mu_i^\top \hat\Phi_{i1}(y_i - \mu_i) \\ \vdots \\ n^{-1}\sum_{i=1}^{n} \dot\mu_i^\top \hat\Phi_{i\kappa_0}(y_i - \mu_i) \end{pmatrix},$ (6)

where, for $k = 1,\ldots,\kappa_0$, $\hat\Phi_{ik} = m_i^{-2}\{\hat\phi_k(T_{ij})\hat\phi_k(T_{ij'})\}_{j,j'=1,\ldots,m_i}$ is an $m_i \times m_i$ symmetric matrix and

$\bar g^{(k)}(\beta) = n^{-1}\sum_{i=1}^{n} g_i^{(k)}(\beta) = n^{-1}\sum_{i=1}^{n} \dot\mu_i^\top \hat\Phi_{ik}(y_i - \mu_i).$
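For concreteness, a minimal R sketch of these eigenfunction-based score vectors follows; here phi_hat, a function returning the $k$-th estimated eigen-function evaluated at given time-points (e.g., an interpolation of FPCA output), is an assumed ingredient, and the $m_i^{-2}$ normalization of $\hat\Phi_{ik}$ matches the definition above:

```r
# Score vectors g_i(beta) from equation (6) for a linear model (mu_dot = X).
# phi_hat(k, t) is assumed to return the k-th estimated eigenfunction at t.
g_scores <- function(beta, X, y, Tt, phi_hat, K) {
  n <- length(X); p <- length(beta)
  g <- matrix(0, n, p * K)
  for (i in seq_len(n)) {
    m_i <- length(Tt[[i]])
    r <- y[[i]] - X[[i]] %*% beta                 # y_i - mu_i
    for (k in seq_len(K)) {
      phik <- phi_hat(k, Tt[[i]])                 # hat-phi_k(T_ij), j = 1..m_i
      Phi_ik <- tcrossprod(phik) / m_i^2          # hat-Phi_ik, m_i x m_i
      g[i, ((k - 1) * p + 1):(k * p)] <- t(X[[i]]) %*% Phi_ik %*% r
    }
  }
  g      # colMeans(g) gives gbar(beta); crossprod(g)/n gives Chat(beta)
}
```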

Since the dimension of $\bar g$ in (6) is greater than the number of parameters to estimate, instead of setting $\bar g$ to zero we minimize the quadratic function: $\hat\beta = \arg\min_\beta Q(\beta)$, where $Q(\beta) = n\,\bar g(\beta)^\top \hat C(\beta)^{-1}\bar g(\beta)$ with $\hat C(\beta) = n^{-1}\sum_{i=1}^{n} g_i(\beta)g_i(\beta)^\top$, $g_i(\beta) = (g_i^{(1)}(\beta)^\top, \ldots, g_i^{(\kappa_0)}(\beta)^\top)^\top$, and $g_i^{(k)}(\beta) = \dot\mu_i^\top\hat\Phi_{ik}(y_i - \mu_i)$. For the existence of $\hat C^{-1}$ we need the additional restriction $n \ge \kappa_0$, where $\kappa_0$ is the number of eigen-functions. Under the given set-up, by Equation (8) in Qu et al. [29], the estimating equation for $\beta$ is $\dot Q(\beta) \equiv 2\dot{\bar g}(\beta)^\top \hat C(\beta)^{-1}\bar g(\beta)$. To solve this equation, we use a Newton-like method. In practice, the standard Newton method does not always decrease the objective function; that is, at each step of the iteration there is no guarantee that $Q(\hat\beta_{s+1}) < Q(\hat\beta_s)$. Therefore, we use Algorithm 1 to estimate $\beta$ via a quasi-Newton method with halving [12].

Algorithm 1.

Estimation of β using the Quasi-Newton method with halving.

Set $\hat\beta_1 \leftarrow \tilde\beta$ (initial estimate)
Set $\epsilon_0$ (threshold, a small number)
Set max.count (maximum number of iterations)
Set $l \leftarrow 0$ and Error $\leftarrow \infty$
while Error $> \epsilon_0$ and $l \le$ max.count do
 Calculate $\dot Q(\hat\beta_1)$ and $\ddot Q(\hat\beta_1)$ based on $\hat\beta_1$ using the proposed method
 Initialise $r_0 = 1$
 $\hat\beta_2 \leftarrow \hat\beta_1 - r_0\,\ddot Q(\hat\beta_1)^{-1}\dot Q(\hat\beta_1)$
 Calculate $Q(\hat\beta_1)$ and $Q(\hat\beta_2)$ based on $\hat\beta_1$ and $\hat\beta_2$, respectively, using the proposed method
 while $Q(\hat\beta_2) > Q(\hat\beta_1)$ do
  $r_0 \leftarrow r_0/2$
  $\hat\beta_2 \leftarrow \hat\beta_1 - r_0\,\ddot Q(\hat\beta_1)^{-1}\dot Q(\hat\beta_1)$
  Calculate $Q(\hat\beta_2)$ based on $\hat\beta_2$ using the proposed method
 end while
 Calculate Error $= \|\hat\beta_2 - \hat\beta_1\|_2$
 $\hat\beta_1 \leftarrow \hat\beta_2$
 $l \leftarrow l + 1$
end while
Output $\hat\beta_1$
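A base-R transcription of Algorithm 1 follows (a minimal sketch; Q, Qdot, and Qddot are assumed to be supplied by the user, e.g., assembled from the score functions above with analytic or numerical derivatives):

```r
# Quasi-Newton iteration with step-halving for minimizing Q(beta).
qif_fit <- function(beta_init, Q, Qdot, Qddot, eps = 1e-10, max_count = 500) {
  beta1 <- beta_init
  err <- Inf; l <- 0
  while (err > eps && l <= max_count) {
    step <- solve(Qddot(beta1), Qdot(beta1))   # full Newton direction
    r0 <- 1
    beta2 <- beta1 - r0 * step
    # halve the step until the objective decreases (guard against underflow)
    while (Q(beta2) > Q(beta1) && r0 > .Machine$double.eps) {
      r0 <- r0 / 2
      beta2 <- beta1 - r0 * step
    }
    err <- sqrt(sum((beta2 - beta1)^2))        # ||beta2 - beta1||_2
    beta1 <- beta2
    l <- l + 1
  }
  beta1
}
```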

2.3. Estimation of eigen-functions

Estimation of the eigen-functions is an important step in our proposed quadratic inference technique. In general, FPCA plays an important role as a dimension-reduction technique in functional data analysis, and several important theoretical results on FPCA have been developed in recent years. In particular, Hall and Hosseini-Nasab [14] proved various asymptotic expansions for FPCA with densely observed functional data. Later, Hall and Hosseini-Nasab [15] developed more general theoretical arguments, including the effect of the gaps between eigen-values (i.e., the spacings) on the properties of the eigen-value estimators. In Li and Hsing [21], uniform rates of convergence for the mean and covariance functions are given, which cover all possible scenarios for the $m_i$.

Note that the error process $e(t)$ has mean zero and is defined on the compact set $\mathcal{T} = [0,1]$, satisfying $\int_{\mathcal{T}} E\{e^2(t)\}\,dt < \infty$. The functional principal components can be constructed via the covariance function $R(s,t) = E\{e(s)e(t)\}$ (which induces the kernel operator $\mathcal{R}$), assumed to be square-integrable. An empirical analogue of the spectral decomposition of $R$ can be obtained, $\hat R(s,t) = \sum_{r=1}^{\infty}\hat\lambda_r\hat\phi_r(s)\hat\phi_r(t)$, where the random variables $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge 0$ are the eigen-values of the estimated operator $\hat{\mathcal{R}}$ and $\hat\phi_1, \hat\phi_2, \ldots$ is the corresponding sequence of eigen-functions. Further, assume that $\int_{\mathcal{T}}\phi_r\hat\phi_r \ge 0$ to avoid the sign-change issue [14] in practical comparisons of eigen-functions; otherwise there is no impact on the convergence rate of the eigen-functions, and hence of the proposed estimators.
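In practice, the spectral decomposition is computed from a discretization of the estimated covariance. The following base-R sketch (ours, for illustration) extracts eigen-elements from a covariance estimate on a regular grid, with quadrature rescaling so the eigen-functions are orthonormal in $\mathcal{L}^2$; the arbitrary sign rule stands in for the non-negative inner-product convention above, which involves the unknown true $\phi_r$:

```r
# Eigen-elements of a covariance estimate R_hat (G x G) on a regular grid tt.
fpca_eigen <- function(R_hat, tt) {
  dt <- tt[2] - tt[1]                          # grid spacing (quadrature weight)
  ed <- eigen(R_hat * dt, symmetric = TRUE)    # discretized integral operator
  lambda <- pmax(ed$values, 0)                 # eigen-values, clipped at zero
  phi <- ed$vectors / sqrt(dt)                 # rescale: sum(phi^2) * dt = 1
  s <- ifelse(colSums(phi) >= 0, 1, -1)        # arbitrary sign convention
  phi <- sweep(phi, 2, s, `*`)
  list(lambda = lambda, phi = phi)
}
```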

Suppose that the $T_{ij}$ are observational points with a positive density function $f_T(\cdot)$. Assume $m_i \ge 2$ and define $N = \sum_{i=1}^{n} N_i$, where $N_i = m_i(m_i - 1)$. The approach is based on the local linear smoother, which is popular in functional data analysis; see Fan and Gijbels [7], Li and Hsing [21], and Yao et al. [33], among many others. Let $K(\cdot)$ be a symmetric probability density function on [−1, 1], used as the kernel, and let $h > 0$ be the bandwidth, so that the re-scaled kernel function is $K_h(\cdot) = h^{-1}K(\cdot/h)$. Then, for given $s, t \in \mathcal{T}$, choose $(\hat a_0, \hat b_1, \hat b_2)$ to be the minimizer of

$n^{-1}\sum_{i=1}^{n} N_i^{-1}\sum_{j_1=1}^{m_i}\sum_{j_2=1,\, j_2 \ne j_1}^{m_i}\{e_i(T_{ij_1})e_i(T_{ij_2}) - a_0 - b_1(T_{ij_1} - s) - b_2(T_{ij_2} - t)\}^2 K_h(T_{ij_1} - s)K_h(T_{ij_2} - t).$ (7)

Thus we estimate $R(s,t) = E\{e(s)e(t)\}$ by the quantity $\hat a_0$, viz., $\hat R(s,t) = \hat a_0$. The operator $\hat{\mathcal{R}}$ is in general positive semi-definite and the estimated eigen-values $\hat\lambda_r$ are non-negative; indeed, $\hat R$ is symmetric. In practice, the $e_i(t)$ are not observable and are replaced by $\tilde e_i(t) = y_i(t) - x_i^\top(t)\tilde\beta$, where $\tilde\beta$ is an initial estimate that is consistent for $\beta$ but may not be efficient. For example, one can choose the ordinary least squares estimator of $\beta$ as the initial estimator.
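A direct, unoptimized base-R sketch of the local linear smoother in (7) follows (ours, for illustration); e and Tt are lists of residual vectors and time-points, and the Epanechnikov kernel from Section 4 is used:

```r
Kh <- function(u, h) 0.75 * pmax(1 - (u / h)^2, 0) / h   # rescaled kernel

# Local linear estimate Rhat(s, t) = a0_hat from equation (7).
Rhat_ll <- function(s, t, e, Tt, h) {
  Z <- c(); D1 <- c(); D2 <- c(); W <- c()
  for (i in seq_along(e)) {
    m_i <- length(Tt[[i]])
    idx <- which(outer(1:m_i, 1:m_i, `!=`), arr.ind = TRUE)  # pairs j1 != j2
    Z  <- c(Z,  e[[i]][idx[, 1]] * e[[i]][idx[, 2]])          # raw covariances
    D1 <- c(D1, Tt[[i]][idx[, 1]] - s)
    D2 <- c(D2, Tt[[i]][idx[, 2]] - t)
    W  <- c(W,  Kh(Tt[[i]][idx[, 1]] - s, h) *
                Kh(Tt[[i]][idx[, 2]] - t, h) / (m_i * (m_i - 1)))
  }
  fit <- lm.wfit(cbind(1, D1, D2), Z, W)       # weighted least squares
  fit$coefficients[1]                          # a0_hat
}
```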

3. Asymptotic properties

In this section, we study the asymptotic properties of the proposed estimator. Let us introduce some notation. Assume that the $m_i$ are all of the same order, viz., $m \equiv m(n) = n^a$ for some $a > 0$. Define $d_{n1}(h) = h^2 + h\bar m/m$ and $d_{n2}(h) = h^4 + h^3\bar m/m + h^2\bar{\bar m}/m^2$, where $\bar m = \limsup_{n} n^{-1}\sum_{i=1}^{n} m/m_i$ and $\bar{\bar m} = \limsup_{n} n^{-1}\sum_{i=1}^{n}(m/m_i)^2$. Denote $\delta_{n1}(h) = \{d_{n1}(h)\log n/(nh^2)\}^{1/2}$ and $\delta_{n2}(h) = \{d_{n2}(h)\log n/(nh^4)\}^{1/2}$. Further, let $v_{a,b} = \int t^a K^b(t)\,dt$. Define $W = (\phi(t_1)^\top, \ldots, \phi(t_m)^\top)^\top$, a matrix of order $m \times \kappa_0$ obtained by stacking all the $\phi_k$'s, and the random components $\xi_i = (\xi_{i1},\ldots,\xi_{i\kappa_0})^\top$; $\xi_i$ has mean zero and variance $\Lambda$, a diagonal matrix with components $\lambda_1,\ldots,\lambda_{\kappa_0}$. The sign “$\lesssim$” indicates that the left-hand side of an inequality is bounded by the right-hand side up to a multiplicative positive constant; i.e., for two sequences of positive real numbers $b_{n1}$ and $b_{n2}$, we write $b_{n1} \lesssim b_{n2}$ if, for large $n$, $b_{n1} \le C b_{n2}$, where $C$ is a positive constant not involving $n$. The following conditions are needed for the discussion of the asymptotic properties:

  • (C1) Kernel function K() is a symmetric density function defined on bounded support [−1, 1].

  • (C2) The density function $f_T$ of $T$ is bounded above, and bounded below away from zero. Moreover, $f_T$ is differentiable with a continuous derivative.

  • (C3) $R(s,t)$ is twice differentiable, and all second-order partial derivatives are bounded on $[0,1]^2$.

  • (C4) $E\{\sup_{t\in[0,1]}|e(t)|^\gamma\} < \infty$ and $E\{\sup_{t\in[0,1]}\|x_i(t)\|^\gamma\} < \infty$ for some $\gamma \in (4,\infty)$.

  • (C5) $h \to 0$ as $n \to \infty$ such that $d_{n1}^{-1}(\log n/n)^{1-2/\gamma} \to 0$ and $d_{n2}^{-1}(\log n/n)^{1-4/\gamma} \to 0$ for $\gamma \in (4,\infty)$.

  • (C6) Conditions for the eigen-components:
    a. For each $1 \le k < r < \infty$, $\lambda_k/(\lambda_k - \lambda_r) \le C_0\, r/|k - r|$ for a non-zero finite generic constant $C_0$.
    b. For some $\alpha > 0$, $V_r\lambda_r^{-2}r^{1+\alpha} \to 0$ as $r \to \infty$, where $V_r = E\{\int\dot\mu(t)\phi_r(t)\,dt\}^2$.
      The above two conditions hold if $\lambda_r = r^{-\tau_1}\Lambda(r)$ and $V_r = r^{-\tau_2}\Gamma(r)$ for slowly varying functions $\Lambda$ and $\Gamma$, where $\tau_2 > 1 + 2\tau_1 > 3$; we write $\tau = \alpha + \tau_1$.
    c. $\int\phi_k^4(t)\,dt$ and $\int\dot\mu^2(t)\phi_k^2(t)\,dt$ are finite for all $k \ge 1$.
  • (C7) $\hat C(\beta_0)$ converges almost surely to an invertible matrix $C_0 = E\{g(\beta_0)g(\beta_0)^\top\}$, where $g(\beta_0) = (g^{(1)}(\beta_0)^\top, \ldots, g^{(\kappa_0)}(\beta_0)^\top)^\top$ and $\beta_0$ is the true value of $\beta$.

  • (C8) Conditions for $h$ and $\kappa_0$:
    1. If $a > 1/4$, $\kappa_0 = O(n^{1/(3-\tau)})$ and $n^{-1/4} \gtrsim h \gtrsim n^{-(a+1)/5}$.
    2. If $0 < a \le 1/4$, $\kappa_0 = O(n^{4(1+a)/\{5(3-\tau)\}})$ and $h \asymp n^{-1/4}$.

Remark 1. Condition (C1) is commonly used in non-parametric regression. The boundedness condition on the density of the time-points in Condition (C2) is standard for a random design. Similar results can be obtained for a fixed design, where the grid-points are pre-fixed according to the design density via $\int_0^{T_j} f(t)\,dt = j/m$ for $j \in \{1,\ldots,m\}$, $m \ge 1$. Furthermore, it is important to note that this approach does not require differentiability of the sample paths when we invoke the estimation of eigen-functions from Li and Hsing [21]; therefore the method is applicable to Brownian motion, which has continuous but nowhere-differentiable sample paths. Condition (C3) is required for Taylor series expansions and is also common in non-parametric regression. Condition (C4) provides a uniform bound for certain higher-order expectations needed to show uniform convergence; it is a similar condition to that adopted in Li and Hsing [21]. The smoothness conditions in (C5) and (C8) are common in kernel smoothing and functional data analysis, and control bias and variance. The first condition on the tuning parameters mentioned in (C5) is similar to Li and Hsing [21]. The required spacing assumptions for the eigen-values in Conditions (C6)a and (C6)b are similar to those in Hall and Hosseini-Nasab [15]. Condition (C6)c is a mild assumption that frequently arises in the functional data analysis literature; in most situations this condition holds automatically. By the weak law of large numbers, Condition (C7) holds for large $n$. A similar kind of condition can be invoked, such as the convexity assumption, i.e., $\lambda_r - \lambda_{r+1} \le \lambda_{r-1} - \lambda_r$ for all $r \ge 2$. Condition (C8) controls the rate of the number of repeated measurements.

We present the following theorem to provide the asymptotic expansion and consistency of the proposed estimator for β^.

Theorem 1. Let $\beta_0$ be the true value of $\beta$. Under Conditions (C1)-(C6), for $k \in \{1,\ldots,\kappa_0\}$, the asymptotic mean square error of $\bar g^{(k)}(\beta_0)$ (see Equation (6)) satisfies

$\mathrm{AMSE}\{\bar g^{(k)}(\beta_0)\} = O\{n^{-1} + n^{-1}\kappa_0^{3-\tau} R_n(h)\}$, almost surely, (8)

where $R_n(h) = h^4 + 1/n + 1/(nmh) + 1/(n^2m^2h^2) + 1/(n^2m^4h^4) + 1/(n^2mh) + 1/(n^2m^3h^3)$. Moreover, under Condition (C8), $\mathrm{AMSE}\{\bar g^{(k)}(\beta_0)\} = O(n^{-1})$. Therefore, if in addition Condition (C7) holds, as $n \to \infty$, $\hat\beta - \beta_0 = O(n^{-1/2})$ in probability.

The following theorem states the results of asymptotic normality.

Theorem 2. Define $\mathcal{C}_i = \sum_{k_1=1}^{\kappa_0}\sum_{k_2=1}^{\kappa_0}\Phi_{k_1} X_i C_{i,k_1,k_2}^{-1} X_i^\top\Phi_{k_2}$, where $C_{i,k_1,k_2}^{-1}$ is the $(k_1,k_2)$ block of $C_{i,0}^{-1}$ with $C_{i,0} = E\{g_i(\beta_0)g_i^\top(\beta_0)\}$. Assume that the conditions of Theorem 1 hold. Then $\sqrt{n}(\hat\beta - \beta_0) \to_d N(0, \Sigma)$, where $\Sigma = B^{-1} A B^{-1}$, $A = \lim_{n} n^{-1}\sum_{i=1}^{n} E\{X_i^\top\mathcal{C}_i e_i e_i^\top\mathcal{C}_i X_i\}$, and $B = \lim_{n} n^{-1}\sum_{i=1}^{n} E\{X_i^\top\mathcal{C}_i X_i\}$.

Remark 2. Here, the selection of the bandwidth only affects the second-order term of the MSE of $\hat\beta$ and has no effect on the asymptotic normality result, as long as $h$ satisfies Conditions (C5) and (C8) along with some restrictions on $\kappa_0$. Moreover, it is important to observe that $\Phi_k$ is normalized by the number of repeated measurements.

Remark 3. The proposed method is applicable to both sparse and dense functional data. However, this article focuses on dense functional data as discussed in Section 1. The second part of Theorem 1 and Theorem 2 are derived based on the dense functional response and we assume mi is growing with sample size. This assumption enables us to simplify the asymptotic leading order term of the AMSE expression in Theorem 1.

An outline of the proofs of the theorems, with additional technical details, is given in the Appendix.

4. Simulation studies

We conduct extensive numerical studies to compare the finite sample performance of our proposed method to that of Qu et al. [29] under different correlation structures.

4.1. Simulation set-up

Consider the normal response model

$y_i(T_{ij}) = x_i(T_{ij})^\top\beta + e_i(T_{ij}).$ (9)

For $p = 2$, we set the coefficient vector $\beta = (\beta_1, \beta_2)^\top$, where $\beta_1 = 1$ and $\beta_2 = 0.5$. Consider the covariates $x_{ik}(t) = \chi_{i1}^{(k)} + \chi_{i2}^{(k)}\sqrt{2}\sin(\pi t) + \chi_{i3}^{(k)}\sqrt{2}\cos(\pi t)$, where the coefficients $\chi_{i1}^{(k)} \sim N(0, 2^{-0.5(k-1)^2})$, $\chi_{i2}^{(k)} \sim N(0, 0.85 \times 2^{-0.5(k-1)^2})$, $\chi_{i3}^{(k)} \sim N(0, 0.7 \times 2^{-0.5(k-1)^2})$, and the $\chi_{ij}^{(k)}$ are mutually independent over trajectories $i$ and indices $j$. In fixed design situations, the associated observational times are fixed. Sample trajectories are observed at $m = 100$ equidistant time-points $t_1,\ldots,t_m$ on [0, 1]. We set the number of trajectories $n \in \{100, 300, 500\}$. The residual process $e_i(t)$ is a smooth function with mean zero and unknown covariance function, where each $e_i$ is distributed as $e_i = \sum_{k \ge 1}\xi_{ik}\phi_k$ and the $\xi_{ik}$ are independent normal random variables with mean zero and respective variances $\lambda_k$. For numerical computation, we truncate the series at $k = 3$ in the Karhunen-Loève expansion for Situations (a), (b), and (c) described below; in Situations (d) and (e), the error process is generated from the given covariance functions. (An R sketch of the data-generating process for Situation (a) is given after the list.)

  (a) Brownian motion. The covariance function for Brownian motion is $\min(s,t)$, with $\lambda_k = 4/\{\pi^2(2k-1)^2\}$ and $\phi_k(t) = \sqrt{2}\sin(t/\sqrt{\lambda_k})$.

  (b) Linear process. Consider the eigen-values $\lambda_k = k^{-2l_0}$ and $\phi_k(t) = \sqrt{2}\cos(k\pi t)$. We fix $l_0 \in \{1,2,3\}$.

  (c) Ornstein-Uhlenbeck (OU) process. For positive constants $\mu_0$ and $\rho_0$, the process $e(t)$ satisfies the stochastic differential equation $de(t) = -\mu_0 e(t)\,dt + \rho_0\,dw(t)$ for a Brownian motion $w(t)$. It can be shown that $\mathrm{cov}\{e(t), e(s)\} = c\exp(-\mu_0|t - s|)$, where $c = \rho_0^2/(2\mu_0)$; here we assume $c = 1$. Thus, by solving the integral equation, we have $\phi_k(t) = A_k\cos(\omega_k t) + B_k\sin(\omega_k t)$ and $\lambda_k = 2\mu_0/(\omega_k^2 + \mu_0^2)$, where $\omega$ is a solution of $\cot(\omega) = (\omega^2 - \mu_0^2)/(2\mu_0\omega)$. The constants $A_k$ and $B_k$ are defined as $B_k = \mu_0 A_k/\omega_k$, where $A_k = 2\omega_k^2/(2\mu_0 + \mu_0^2 + \omega_k^2)$. Here $\mu_0$ is chosen to be 1 or 3.

  (d) Power exponential. $R(s,t) = \exp\{-(|s - t|/a_0)^{b_0}\}$, where the scale parameter $a_0 = 1$ and the shape parameter $b_0 \in \{1,2,5\}$.

  (e) Rational quadratic. $R(s,t) = \{1 + (s - t)^2/a_0^2\}^{-b_0}$, where the scale parameter $a_0 = 1$ and the shape parameter $b_0 \in \{1,2,5\}$.
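For concreteness, here is a minimal R sketch of the data-generating process for Situation (a); our reading of the variance decay as $2^{-0.5(k-1)^2}$ is an assumption about the display above:

```r
# Data generation for Situation (a): Brownian-motion residuals via a
# truncated (K = 3) Karhunen-Loeve expansion on a common grid.
set.seed(1)
n <- 100; m <- 100; p <- 2
tt <- seq(0, 1, length.out = m)
beta <- c(1, 0.5)
lambda <- 4 / (pi^2 * (2 * (1:3) - 1)^2)              # BM eigen-values
phi <- sapply(1:3, function(k) sqrt(2) * sin(tt / sqrt(lambda[k])))
X <- vector("list", n); y <- vector("list", n); e_list <- vector("list", n)
t_list <- replicate(n, tt, simplify = FALSE)
for (i in 1:n) {
  Xi <- sapply(1:p, function(k) {                     # covariate curves
    v <- 2^(-0.5 * (k - 1)^2)                         # assumed variance decay
    rnorm(1, 0, sqrt(v)) +
      rnorm(1, 0, sqrt(0.85 * v)) * sqrt(2) * sin(pi * tt) +
      rnorm(1, 0, sqrt(0.7 * v)) * sqrt(2) * cos(pi * tt)
  })
  xi <- rnorm(3, 0, sqrt(lambda))                     # KL scores
  e_list[[i]] <- drop(phi %*% xi)                     # residual trajectory
  X[[i]] <- Xi
  y[[i]] <- drop(Xi %*% beta) + e_list[[i]]
}
```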

4.2. Comparison and evaluation

For each situation, we perform 500 simulation replicates. To execute Qu et al. [29]'s approach, we construct the scores using the basis matrices described in Example 1 (approximation of the compound symmetric correlation structure, denoted ldaCS in the tables) and Example 2 (first-order autoregressive correlation structure, denoted ldaAR in the tables) of their paper. The ordinary least squares estimate (init) is taken as the initial estimate of $\beta$ for both our approach and that of Qu et al. [29]. We denote by fda-$\kappa_0$ our proposed method with $\kappa_0$ basis functions. Additionally, we denote by fda-AIC our proposed method with the number of basis functions determined by AIC, as suggested by Yao et al. [33]. The iterative estimation algorithm is declared to have converged when the squared difference between the estimated values at two consecutive steps is bounded by a small number, 10^{-10}, or when the number of steps exceeds 500, whichever happens earlier. To make the theoretical results and numerical examples consistent, we use the FPCA function in R, available in the fdapace package [11]. Bandwidths are selected using generalized cross-validation, and the Epanechnikov kernel $K(x) = 0.75(1 - x^2)_+$ is used for estimation, where $(a)_+ = \max(a, 0)$.
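For reference, a minimal sketch of the FPCA step with fdapace follows; the option names reflect our reading of the package documentation and should be treated as illustrative. In practice the inputs would be the estimated residual curves $\tilde e_i$ from an initial OLS fit rather than the true residuals:

```r
# FPCA of the residual curves with fdapace (illustrative; e_list and t_list
# come from the data-generation sketch above).
library(fdapace)
res <- FPCA(Ly = e_list, Lt = t_list,
            optns = list(kernel = "epan",        # Epanechnikov kernel
                         methodBwCov = "GCV",    # GCV bandwidth for covariance
                         methodSelectK = "AIC")) # AIC choice of kappa_0
res$lambda   # estimated eigen-values
res$phi      # estimated eigen-functions on res$workGrid
res$cumFVE   # cumulative fraction of variance explained
```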

The means and standard deviations (SD) of the regression coefficients based on 500 simulations are given as summary measures. The standard deviation reported in the tables is computed from the 500 estimates over the 500 replications, and can be viewed as the true standard error. Moreover, we also compute the absolute bias (defined as $\mathrm{AB} = \sum_{b=1}^{500}|\hat\beta_b - \beta|/500$) and the mean square error (defined as $\mathrm{MSE} = \sum_{b=1}^{500}(\hat\beta_b - \beta)^2/500$) to compare estimation performance, where $\hat\beta_b$ is the estimated value of $\beta$ in the $b$-th replication; MSEs are reported on the scale of 10^{-2}. In the last column of each table, we report the average fraction of variance explained (FVE) under different values of $\kappa_0$. Since our objective is to assess the performance of the proposed method, we compare the results for different choices of $\kappa_0$. In practice, however, the number of selected eigen-functions plays a critical role in the proposed method. We can choose $\kappa_0$ based on a scree plot of the number of components against the FVE: the elbow of the graph is located, and the components to its left are considered significant. Another possibility is to use AIC to select the number of eigen-functions, as we demonstrate in the numerical results.
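A scree-plot selection of $\kappa_0$ based on FVE can be carried out directly from the estimated eigen-values (a minimal sketch, reusing res from the FPCA call above):

```r
# Scree plot of cumulative FVE and a simple threshold rule for kappa_0.
fve <- cumsum(res$lambda) / sum(res$lambda)   # fdapace also returns res$cumFVE
plot(seq_along(fve), fve, type = "b",
     xlab = "Number of components", ylab = "Cumulative FVE")
abline(h = 0.99, lty = 2)                     # e.g., keep components to FVE >= 99%
kappa0 <- which(fve >= 0.99)[1]
```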

4.3. Simulation results

Simulation results for Brownian motion are shown in Table 1. In this situation, we observe that our approach produces better results in terms of the dispersion measures. Tables S1, S2, and S3 of the supplementary material show results for the linear processes: our proposed method performs better in situations with an AR working correlation matrix and is comparable for the exchangeable structure, for $l_0 = 1, 2, 3$. Moreover, for our proposed method, as $l_0$ increases, all dispersion measures, such as the MSE, decrease. Results based on the OU process are documented in Tables S4 and S5 of the supplementary material. Our method outperforms the existing methods in both situations, and as $\mu_0$ increases, the MSE decreases. For the three parameter choices of the power exponential and rational quadratic covariance structures, the numerical results are presented in Tables S6, S7, S8 and S9, S10, S11 of the supplementary material, respectively. As before, we observe that our proposed method performs better than the existing ones in all sub-cases; interestingly, as $b_0$ increases, the MSE decreases for the power exponential structure, whereas it increases for the rational quadratic covariance structure, as expected given the covariance structure. Overall, for all the above situations, as the sample size increases, the dispersion measures, for example the SD and MSE, decrease; this confirms that the parameter estimates get closer and closer to the true parameters as the sample size grows. In each situation, the SDs for the proposed method decrease as we increase $\kappa_0$ and stabilize after some value of $\kappa_0$ at which the fraction of variance explained (FVE) is approximately 100%. In most cases, the $\kappa_0$ estimated by the AIC approach is close to 3. However, in the specific scenario described in (d) with $a_0 = b_0 = 1$, the median $\kappa_0$ selected by AIC is 1, 19, and 20 for $n = 100, 300, 500$, respectively. We observe that the proposed method with the AIC-selected number of eigen-functions performs better than the existing methods of Qu et al. [29]. Additionally, the results selected by AIC are generally consistent with those chosen by the scree-plot criterion when the FVE is sufficiently large.

Table 1:

Performance of the estimation procedure where the residuals are generated from Brownian motion. Mean of the estimated coefficients, standard deviation, absolute bias, mean square error (×100), and FVE in percentage are summarized. The results are based on: (1) ordinary least squares (init); (2) Qu et al. [29]’s approach with first-order autoregressive (ldaAR), and compound symmetric (ldaCS) correlation structures; (3) our proposed approach (fda-κ0), where κ0 denotes the number of basis functions, and fda-AIC with AIC-selected κ0.

Method | β1: Mean, SD, AB, MSE | β2: Mean, SD, AB, MSE | FVE (%)
n = 100
init 0.9999 0.0373 0.0297 0.1391 0.4995 0.0486 0.0384 0.2354
ldaAR 0.9991 0.0331 0.0265 0.1095 0.5004 0.0445 0.0353 0.1972
ldaCS 0.9987 0.0316 0.0253 0.1000 0.4997 0.0411 0.0322 0.1685
fda-1 0.9998 0.0564 0.0447 0.3180 0.5006 0.0743 0.0587 0.5516 86.2672
fda-2 1.0001 0.0269 0.0213 0.0723 0.4971 0.0362 0.0290 0.1314 96.3746
fda-3 0.9998 0.0231 0.0181 0.0532 0.4978 0.0317 0.0251 0.1010 99.9220
fda-4 1.0004 0.0052 0.0014 0.0028 0.4994 0.0092 0.0022 0.0085 99.9979
fda-5 0.9999 0.0021 0.0008 0.0004 0.4999 0.0051 0.0012 0.0026 100.0000
fda-AIC 0.9998 0.0231 0.0181 0.0532 0.4978 0.0317 0.0251 0.1010 99.9220
n = 300
init 1.0002 0.0200 0.0162 0.0401 0.5003 0.0288 0.0226 0.0825
ldaAR 1.0003 0.0184 0.0147 0.0336 0.5000 0.0259 0.0203 0.0670
ldaCS 1.0007 0.0170 0.0134 0.0288 0.4995 0.0242 0.0190 0.0583
fda-1 1.0002 0.0309 0.0251 0.0955 0.5008 0.0443 0.0350 0.1962 86.7578
fda-2 1.0002 0.0142 0.0114 0.0202 0.4998 0.0213 0.0169 0.0451 96.4747
fda-3 1.0002 0.0122 0.0098 0.0150 0.4992 0.0179 0.0144 0.0321 99.9745
fda-4 1.0002 0.0021 0.0003 0.0004 0.4998 0.0032 0.0005 0.0010 99.9993
fda-5 1.0000 0.0002 0.0001 0.0000 0.5000 0.0003 0.0002 0.0000 100.0000
fda-AIC 1.0002 0.0122 0.0098 0.0150 0.4992 0.0179 0.0144 0.0321 99.9745
n = 500
init 1.0002 0.0148 0.0117 0.0219 0.5006 0.0223 0.0177 0.0497
ldaAR 1.0005 0.0138 0.0111 0.0189 0.5000 0.0206 0.0162 0.0422
ldaCS 1.0000 0.0128 0.0102 0.0164 0.4992 0.0184 0.0146 0.0340
fda-1 1.0007 0.0234 0.0185 0.0545 0.5012 0.0348 0.0277 0.1213 86.7520
fda-2 0.9996 0.0105 0.0083 0.0110 0.5002 0.0157 0.0126 0.0247 96.5174
fda-3 0.9991 0.0091 0.0074 0.0084 0.4999 0.0133 0.0107 0.0177 99.9851
fda-4 1.0000 0.0002 0.0001 0.0000 0.5000 0.0003 0.0002 0.0000 99.9996
fda-5 1.0000 0.0002 0.0001 0.0000 0.5000 0.0003 0.0002 0.0000 100.0000
fda-AIC 0.9991 0.0091 0.0074 0.0084 0.4999 0.0133 0.0107 0.0177 99.9851

5. Data analysis

In this section, we apply our proposed method to motivating examples in two different data-sets.

5.1. Beijing’s PM2.5 pollution study

In the atmosphere, suspended microscopic particles of solid and liquid matter are commonly known as particulates or particulate matter (PM). Such particulates often have a strong noxious impact on human health, climate, and visibility. One common fine type of atmospheric particle is PM2.5, with a diameter of less than 2.5 micrometers. Many developed and developing cities across the world experience chronic air pollution with PM2.5 as a major pollutant; Beijing and a substantial part of China are among such places. Some studies show that there are many non-ignorable sources of variability in the distribution and transmission pattern of PM2.5, which are confounded with secondary chemical generation. The atmospheric PM2.5 data used in our analysis were collected from the UCI machine learning repository https://archive.ics.uci.edu/ml/datasets/Beijing+Multi-Site+Air-Quality+Data [24]. The data-set includes hourly measurements of PM2.5 and associated covariates at twelve different locations in China, namely Aotizhongxin, Changping, Dingling, Dongsi, Guanyuan, Gucheng, Huairou, Nongzhanguan, Shunyi, Tiantan, Wanliu, and Wanshouxigong, during January 2017. After excluding missing data, there were 608 hourly data-points in Beijing2017-data. We assume that the atmospheric measurements are independent since the locations are far apart. The objective of our analysis is to describe the trend of the functional response PM2.5 (as shown in Fig 1 (left)) and to evaluate the effect of covariates, including the chemical compounds sulfur dioxide (SO2), nitrogen dioxide (NO2), carbon monoxide (CO), and ozone (O3), over time. We smooth the covariates and responses to reduce variability and center them. Subsequently, we consider the following model:

$Y_i(t) = \beta_0 + \mathrm{SO}_2(t)\beta_1 + \mathrm{NO}_2(t)\beta_2 + \mathrm{CO}(t)\beta_3 + \mathrm{O}_3(t)\beta_4 + e_i(t).$ (10)

We use Algorithm 1 to estimate the coefficients of the regression model above. Consistent with the simulation results, we observe that as $\kappa_0$ increases, the standard deviations of the coefficient estimates decrease. For a small FVE such as 50%, the corresponding $\kappa_0 = 1$ and the estimation procedure performs poorly, whereas for large FVE percentages, the estimation procedure improves adequately in terms of standard error. The estimated values of $\beta_0, \beta_1, \beta_2, \beta_3$, and $\beta_4$ are similar across different choices of $\kappa_0$. From a scree plot of the estimated standard errors, we conclude that a suitable choice of $\kappa_0$ is approximately 10. The estimated scaled eigen-values are shown in Figure 1 (right), which clearly shows their decay rate. The estimated coefficients, with standard errors in parentheses, are 0.0009 (1.1644), 0.0829 (0.2584), 0.9503 (0.1586), 0.0196 (0.0037), and 1.1523 (0.1198), respectively.

5.2. DTI study for sleep apnea patients

Diffusion tensor imaging (DTI) data on white matter structural alterations are used to illustrate the proposed method and its estimation procedure [32]. MRI is a powerful technique for investigating the structural and functional changes in the brain during pathological and neuro-psychological processes. Due to advances in DTI, several recent studies relate white matter alterations to clinical variables. For our analysis, we use Apnea-data obtained from one such study on obstructive sleep apnea (OSA) patients [32]. The data consist of 29 male patients between the ages of 30 and 55 years who underwent a study for the diagnosis of continuous positive airway pressure (CPAP) therapy. Patients with sleep disorders other than OSA, night-shift workers, and patients with psychiatric disorders, hypertension, diabetes, or other neurological disorders were excluded. After CPAP treatment, a psychomotor vigilance task (PVT) was performed: a light was switched on at random times on a screen for several seconds within a certain interval, and subjects were asked to press a button as soon as they saw the light appear. The test measures sustained attention and provides a numerical measure of sleepiness by counting the number of “lapses” for each individual.

DTI was performed on a 3T MRI scanner using a commercial 32-channel head coil, followed by analysis using tract-based spatial statistics to investigate differences in fractional anisotropy (FA) and other DTI parameters. Image acquisition was as follows. An axial T1-weighted image of the brain (3D-BRAVO) was collected with repetition time (TR) = 12 ms, echo time (TE) = 5.2 ms, flip angle = 13°, inversion time = 450 ms, matrix = 384 × 256, voxel size = 1.2 × 0.57 × 0.69 mm, and scan time = 2 min 54 s. DTI was obtained in the axial plane using a spin-echo echo-planar imaging sequence with TR = 4500 ms, TE = 89.4 ms, field of view = 20 × 20 cm², matrix size = 160 × 132, slice thickness = 3 mm, slice spacing = 1 mm, and b-values = 0, 1000 s/mm².

Our objective is to investigate the structural alteration of white matter using DTI in patients with OSA over each voxel in various regions of the brain (ROIs). Thus, our response variable is one of the DTI parameters, viz., fractional anisotropy (FA), and we are interested in studying the changes of FA over a continuous domain, such as voxels, with the interaction of the lapses and the voxel locations in each ROI. We consider the following model for each ROI:

$\mathrm{FA}_i(s) = \beta_0 + \beta_1(\mathrm{lapses}_i \times s) + e_i(s),$ (11)

where $s \in \mathcal{S}$, a set of voxels in the considered ROI. Using Algorithm 1, we estimate the coefficients $\beta_0$ and $\beta_1$ in Model (11), and the results are presented in Table 2. We find that the coefficient estimates are close to their initial estimates, and the estimated standard errors are smaller for the coefficients based on the proposed method. Here $\kappa_0$ (i.e., the number of eigen-functions) is determined, for simplicity, using the FVE, which is fixed at 0.99, resulting in $\kappa_0 = 7$.

Table 2:

Estimated values and associated standard errors for the regression coefficients, up to four decimal places, based on the existing and proposed methods. For each ROI, the first line shows results based on the initial estimates and the second line those of the proposed estimates.

Region | # functional points | β0: Estimate, Std. Error (×100) | β1: Estimate (×100), Std. Error (×100)
ROI.6 659 0.4512 0.1343 −0.0606 0.0130
0.4512 0.0983 −0.0605 0.0028
ROI.7 1362 0.5048 0.0628 0.0309 0.0061
0.5050 0.0681 0.0342 0.0007
ROI.8 1370 0.5256 0.0586 −0.0667 0.0057
0.5271 0.0346 −0.0733 0.0006
ROI.9 690 0.4951 0.0910 0.2904 0.0088
0.5443 0.0874 0.1660 0.0014
ROI.10 699 0.4951 0.0892 0.3314 0.0086
0.5262 0.1398 0.4231 0.0014
ROI.11 968 0.4372 0.0979 0.1323 0.0095
0.4380 0.0637 0.1311 0.0009
ROI.12 968 0.4529 0.0948 0.0965 0.0092
0.4664 0.0750 0.0504 0.0013
ROI.13 992 0.5448 0.1060 0.3453 0.0103
0.5449 0.0856 0.3559 0.0011
ROI.14 992 0.5435 0.1068 0.3432 0.0104
0.5436 0.0754 0.3437 0.0003
ROI.37 1236 0.3695 0.0779 −0.1126 0.0076
0.3713 0.0669 −0.1175 0.0017
ROI.38 1155 0.3564 0.0819 −0.1356 0.0079
0.3578 0.0420 −0.1356 0.0009
ROI.39 1124 0.4618 0.0760 0.1972 0.0074
0.4621 0.0615 0.1996 0.0007
ROI.40 1125 0.4786 0.0658 0.0953 0.0064
0.4780 0.0369 0.1016 0.0005
ROI.45 380 0.4189 0.1071 0.1647 0.0104
0.4190 0.0175 0.1648 0.0001
ROI.46 376 0.4074 0.1033 0.1988 0.0100
0.4074 0.0159 0.1994 0.0002
ROI.47 596 0.4596 0.0932 0.1304 0.0090
0.4594 0.0191 0.1349 0.0001
ROI.48 600 0.4045 0.0868 0.1100 0.0084
0.4036 0.0644 0.1067 0.0006

6. Discussion

In this article, we propose an estimation procedure for the constant linear effects model, which is commonly used in statistics [36], especially in spatial modeling. One of the key ingredients of this estimation procedure is the quadratic inference methodology, which has played a major role in the analysis of correlated data since it was introduced by Qu et al. [29]. In contrast with existing methods, our approach allows the number of repeated measurements to grow with the sample size; therefore, the trajectories of individuals can be observed on a dense grid of a continuum domain. Instead of assuming a working correlation structure, we propose a data-driven alternative based on estimating the eigen-functions obtained by functional principal component analysis. Here, we achieve $\sqrt{n}$-consistency of the parametric estimates in the regression model, even though the eigen-functions are estimated non-parametrically.

Additionally, our method is easy to implement in a wide range of applications. The applicability of the proposed method is illustrated by extensive simulation studies. Moreover, two real-data applications in different scientific domains are provided which confirm the efficacy of the proposed method.

Supplementary Material


Acknowledgement

We would like to thank Dr. Xiaohong Joe Zhou of University of Illinois at Chicago for providing the Apnea-data used in Section 5.2. We thank the Editor, Associate Editor, and reviewers for their constructive comments which have improved the quality of this article. Zhong’s research was partially supported by an NSF grant FRG 2152070 and an NIH grant R03NS128450.

Appendix A. Some preliminary definitions and concepts of operators

Consider the standard $\mathcal{L}^2[0,1]$ space, the set of square-integrable functions defined on the closed set [0, 1] taking values on the real line. The space $\mathcal{L}^2[0,1]$ is equipped with the inner product $\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt$ for $f$ and $g$ in that space, and forms a Hilbert space. Moreover, we denote by $\|\cdot\|_2$ the norm in $\mathcal{L}^2$, defined as $\|f\|_2 = \{\int f^2(u)\,du\}^{1/2}$. Let $\mathcal{R}$ be a linear and bounded operator that assigns to an element $f$ in $\mathcal{L}^2[0,1]$ a new element $\mathcal{R}f$ in $\mathcal{L}^2[0,1]$, via the linear mapping $\mathcal{R}f(\cdot) = \int R(\cdot, u)f(u)\,du$ for any function $f \in \mathcal{L}^2[0,1]$ and some integrable function $R(\cdot,\cdot)$ on [0, 1] × [0, 1]. Such a mapping is known as an integral operator, and the bivariate function $R$ is known as a kernel in the statistics and functional analysis literature. Furthermore, assume that $\int\int R^2(u,v)\,du\,dv < \infty$. It is easy to see that $\mathcal{R}f(\cdot)$ is uniformly continuous, and that $\mathcal{R}$ is compact for a non-negative definite symmetric kernel $R$. If, for some $\lambda$, the Fredholm integral equation $\mathcal{R}\phi = \lambda\phi$ has a non-zero solution $\phi$, then we call $\lambda$ an eigen-value of $\mathcal{R}$ and the solution of the eigen-equation an eigen-function; together, the pair $(\lambda, \phi)$ is called the eigen-elements. Due to the non-negative definiteness of $\mathcal{R}$, the eigen-values are ordered as $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$.

For a self-adjoint compact operator on a Hilbert space, consider two operators $\mathcal{F}$ and $\mathcal{G}$, and define the perturbation operator $\Delta = \mathcal{G} - \mathcal{F}$, so that $\mathcal{G} = \mathcal{F} + \Delta$, where $\mathcal{G}$ is an approximation to $\mathcal{F}$ in which an error of size $\Delta$ is incurred. Let $\mathcal{F}$ and $\mathcal{G}$ have kernels $F$ and $G$, with eigen-elements $(\lambda_r, \phi_r)$ and $(\theta_r, \psi_r)$, respectively. For simplicity, we assume that the eigen-values are distinct. Then the following lemma provides the perturbation of the eigen-functions.

Lemma 1 (Theorem 5.1.8 in Hsing and Eubank [18]). Let $(\lambda, \phi)$ be the eigen-components of $\mathcal{F}$ and $(\theta, \psi)$ those of $\mathcal{G}$, with the multiplicity of all eigen-values restricted to 1. Define $\eta_k = \min_{r \ne k}|\lambda_r - \lambda_k|$. Assume $\langle\phi_r, \psi_r\rangle \ge 0$ and $\eta_k > 0$. Then

$\psi_k - \phi_k = \sum_{r=1,\, r \ne k}^{\infty}(\theta_k - \lambda_r)^{-1}\mathcal{P}_r\Delta\psi_k + \mathcal{P}_k\psi_k - \phi_k.$ (A.1)

The above equation yields

$\psi_k - \phi_k = \sum_{r=1,\, r \ne k}^{\infty}(\theta_k - \lambda_r)^{-1}\mathcal{P}_r\Delta\psi_k + O(\|\Delta\|^2).$ (A.2)

Remark 4. Equation (A.2) plays an important role in finding the bound of the proposed estimator introduced in Section 2.2. Note that $\sup_{r \ge 1}|\theta_r - \lambda_r| \le \|\Delta\| \le \inf_{r \ne k}|\lambda_k - \lambda_r|$ (see Theorem 4.2.8 in Hsing and Eubank [18] for a proof). Thus it is easy to see that $|\theta_r - \lambda_r| \lesssim |\lambda_k - \lambda_r|$, which implies, from Equation (A.1),

$\psi_k - \phi_k = \sum_{r=1,\, r \ne k}^{\infty}(\lambda_k - \lambda_r)^{-1}\sum_{s=0}^{\infty}\left\{\frac{\lambda_k - \theta_r}{\lambda_k - \lambda_r}\right\}^{s}\mathcal{P}_r\Delta\{\phi_k + (\psi_k - \phi_k)\} + \mathcal{P}_k\psi_k - \phi_k = \sum_{r=1,\, r \ne k}^{\infty}(\lambda_k - \lambda_r)^{-1}\mathcal{P}_r\Delta\phi_k + \sum_{r=1,\, r \ne k}^{\infty}(\lambda_k - \lambda_r)^{-1}\mathcal{P}_r\Delta(\psi_k - \phi_k) + \sum_{r=1,\, r \ne k}^{\infty}\sum_{s=1}^{\infty}\frac{(\lambda_k - \theta_r)^{s}}{(\lambda_k - \lambda_r)^{s+1}}\mathcal{P}_r\Delta\psi_k + \mathcal{P}_k\psi_k - \phi_k.$ (A.3)

Moreover, using Bessel's inequality, we can bound the last three terms in the above equation by $\|\Delta\|^2$.

Appendix B. Some useful lemmas

In this section, we present some useful lemmas. For convenience, let us recall the notation. Assume that the $m_i$ are all of the same order, viz., $m \equiv m(n)$. Define $d_{n1}(h) = h^2 + h\bar m/m$ and $d_{n2}(h) = h^4 + h^3\bar m/m + h^2\bar{\bar m}/m^2$, where $\bar m = \limsup_{n} n^{-1}\sum_{i=1}^{n} m/m_i$ and $\bar{\bar m} = \limsup_{n} n^{-1}\sum_{i=1}^{n}(m/m_i)^2$. Denote $\delta_{n1}(h) = \{d_{n1}(h)\log n/(nh^2)\}^{1/2}$, $\delta_{n2}(h) = \{d_{n2}(h)\log n/(nh^4)\}^{1/2}$, and $\delta_n(h) = h^2 + \delta_{n1}(h) + \delta_{n2}^2(h)$. Further, $v_{a,b} = \int t^a K^b(t)\,dt$. Define $W = (\phi(t_1)^\top, \ldots, \phi(t_m)^\top)^\top$, a matrix of order $m \times \kappa_0$ obtained by stacking all the $\phi_k$'s, and the random components $\xi_i = (\xi_{i1},\ldots,\xi_{i\kappa_0})^\top$; $\xi_i$ has mean zero and variance $\Lambda$, a diagonal matrix with components $\lambda_1,\ldots,\lambda_{\kappa_0}$. The sign ‘$\lesssim$’ indicates that, for two sequences of positive real numbers $b_{n1}$ and $b_{n2}$ and large $n$, $b_{n1} \lesssim b_{n2}$ means $b_{n1} \le C b_{n2}$, where $C$ is a positive constant not involving $n$.

Lemma 2. Let $Z_1,\ldots,Z_n$ be independent and identically distributed random variables with mean zero and finite variance. Suppose that there exists an $M$ such that $P(|Z_i| \le M) = 1$ for all $i \in \{1,\ldots,n\}$, and let $T_n = n^{-1}\sum_{i=1}^{n} Z_i$. Then $T_n = O\{(\log n/n)^{1/2}\}$ almost surely. If $\mathrm{Var}(T_n) = O(\log n/n)$, then $T_n = O(\log n/n)$ almost surely.

Proof: Bernstein's inequality states that if $Z_1,\ldots,Z_n$ are centered independent random variables bounded by $M$ with probability 1, and $T_n = n^{-1}\sum_{i=1}^{n} Z_i$ with $\mathrm{Var}(T_n) = \sigma_n^2$, then for any positive real number $u$, $P(|T_n| \ge u) \le \exp\{-nu^2/(2\sigma_n^2 + 2Mu/3)\}$. Moreover, if $T_n$ converges to its limit in probability fast enough, then it converges almost surely; i.e., if for any $u > 0$, $\sum_{n=1}^{\infty} P(|T_n| \ge u) < \infty$, then $T_n$ converges to zero almost surely. Now choose $u = \{4\sigma_n^2\log n/n\}^{1/2} + 4M\log n/(3n)$. Then $\sum_{n=1}^{\infty} P(|T_n| \ge u) \le \sum_{n=1}^{\infty} 1/n^2$, which is finite; therefore $T_n = O(u)$ almost surely. Now, if $\sigma_n^2 \le 4M^2\log n/(9n)$, we have $T_n = O(\log n/n)$, and if $\sigma_n = O(1)$ then $T_n = O\{(\log n/n)^{1/2}\}$ almost surely. □

Lemma 3. Suppose the $T_{ij}$ are i.i.d. with density $f_T$. Then, for fixed $i \in \{1,\ldots,n\}$ and any $k, l \ge 1$, under Assumptions (C2) and (C6)c, $m_i^{-1}\sum_{j=1}^{m_i}\phi_k(T_{ij})\phi_l(T_{ij}) = \mathbb{1}(k = l) + O\{(\log m_i/m_i)^{1/2}\}$ almost surely, where $\mathbb{1}$ is the indicator function.

Proof: Observe that $E\{m_i^{-1}\sum_{j=1}^{m_i}\phi_k(T_{ij})\phi_l(T_{ij})\} = \int\phi_k(t)\phi_l(t)\,dt = \mathbb{1}(k = l)$, and

$\mathrm{Var}\{m_i^{-1}\sum_{j=1}^{m_i}\phi_k(T_{ij})\phi_l(T_{ij})\} = E\{m_i^{-1}\sum_{j=1}^{m_i}\phi_k(T_{ij})\phi_l(T_{ij})\}^2 - \mathbb{1}(k = l) = m_i^{-2}\sum_{j=1}^{m_i} E\{\phi_k^2(T_{ij})\phi_l^2(T_{ij})\} + m_i^{-2}\sum_{j_1 \ne j_2} E\{\phi_k(T_{ij_1})\phi_l(T_{ij_1})\}E\{\phi_k(T_{ij_2})\phi_l(T_{ij_2})\} - \mathbb{1}(k = l) = \begin{cases} m_i^{-1}\int\phi_k^4(t)\,dt + \{(m_i-1)/m_i\}\{\int\phi_k^2(t)\,dt\}^2 - 1, & \text{if } k = l,\\ m_i^{-1}\int\phi_k^2(t)\phi_l^2(t)\,dt + \{(m_i-1)/m_i\}\int\int\phi_k(t)\phi_l(t)\phi_k(t')\phi_l(t')\,dt\,dt', & \text{if } k \ne l,\end{cases} = O(1/m_i).$ (B.1)

Therefore, applying Lemma 2, the result is immediate. □

Lemma 4. Suppose the $T_{ij}$ are i.i.d. with density $f_T$. Then, for fixed $i \in \{1,\ldots,n\}$ and any $k \ge 1$, under Assumptions (C2) and (C6)c, $m_i^{-1}\sum_{j=1}^{m_i}\dot\mu_i(T_{ij})\phi_k(T_{ij}) = \int\dot\mu_i(t)\phi_k(t)\,dt + O\{(\log m_i/m_i)^{1/2}\}$ almost surely.

Proof: Observe that $E\{m_i^{-1}\sum_{j=1}^{m_i}\dot\mu_i(T_{ij})\phi_k(T_{ij})\} = \int\dot\mu_i(t)\phi_k(t)\,dt$, and

$\mathrm{Var}\{m_i^{-1}\sum_{j=1}^{m_i}\dot\mu_i(T_{ij})\phi_k(T_{ij})\} \le E\{m_i^{-1}\sum_{j=1}^{m_i}\dot\mu_i(T_{ij})\phi_k(T_{ij})\}^2 = m_i^{-2}\sum_{j=1}^{m_i} E\{\dot\mu_i^2(T_{ij})\phi_k^2(T_{ij})\} + m_i^{-2}\sum_{j_1 \ne j_2} E\{\dot\mu_i(T_{ij_1})\dot\mu_i(T_{ij_2})\phi_k(T_{ij_1})\phi_k(T_{ij_2})\} = O(1/m_i),$ (B.2)

since $\int\dot\mu^2(t)\phi_k^2(t)\,dt < \infty$. Therefore, applying Lemma 2, the result is immediate. □

Lemma 5. Define $\mathcal{I}_{ir} = \int\dot\mu_i(t)\phi_r(t)\,dt$ and $V_r = E\{\int\dot\mu(t)\phi_r(t)\,dt\}^2$ for $r \ge 1$. Then, under Conditions (C6)a and (C6)b, for some $\alpha > 0$ such that $V_r\lambda_r^{-2}r^{1+\alpha} \to 0$ as $r \to \infty$ (due to Condition (C6)b), $\sum_{r=1,\, r \ne k}^{\infty}(\lambda_k - \lambda_r)^{-1} n^{-1}\sum_{i=1}^{n}\mathcal{I}_{ir}\xi_{ik} = O\{(\log n/n)^{1/2}\lambda_k^{1/2}k^{(1-\alpha)/2}\}$ almost surely.

Proof: It is easy to see that $E\{\sum_{r \ne k}(\lambda_k - \lambda_r)^{-1} n^{-1}\sum_{i=1}^{n}\mathcal{I}_{ir}\xi_{ik}\} = 0$. Using the spacing condition among the eigen-values in (C6)a, for each $1 \le k < r < \infty$ and a non-zero finite generic constant $C_0$,

$\lambda_k/|\lambda_k - \lambda_r| \le C_0\, r/|k - r|.$ (B.3)

A similar kind of condition can be invoked, such as the convexity assumption, i.e., $\lambda_r - \lambda_{r+1} \le \lambda_{r-1} - \lambda_r$ for all $r \ge 2$. Thus, using Inequality (B.3), for some $\alpha > 0$ with the condition $V_r\lambda_r^{-2}r^{1+\alpha} \to 0$ as $r \to \infty$, we can write

$\sum_{r \ne k} V_r(\lambda_k - \lambda_r)^{-2} \lesssim \sum_{r \ne k} V_r\{\max(k,r)/|k - r|\}^2\max(\lambda_k, \lambda_r)^{-2} = \sum_{r \le k/2} V_r\lambda_r^{-2}k^2/(k - r)^2 + \sum_{r > 2k} V_r\lambda_k^{-2}r^2/(k - r)^2 + \sum_{k/2 < r < k} V_r\lambda_r^{-2}k^2/(k - r)^2 + \sum_{k < r < 2k} V_r\lambda_k^{-2}r^2/(k - r)^2 \lesssim \sum_{r \le k/2,\, r > 2k} V_r\lambda_r^{-2} + k^2\sum_{k/2 < r < 2k} V_r\lambda_r^{-2}(k - r)^{-2} \lesssim 1 + k^{1-\alpha}\sum_{k/2 < r < 2k}(k - r)^{-2} \lesssim k^{1-\alpha}.$ (B.4)

This follows the line of proofs in Hall and Hosseini-Nasab [15], in a different context. Thus, using Inequality (B.4), it follows that

$\mathrm{Var}\{\sum_{r \ne k}(\lambda_k - \lambda_r)^{-1} n^{-1}\sum_{i=1}^{n}\mathcal{I}_{ir}\xi_{ik}\} = n^{-1}\lambda_k\sum_{r \ne k} V_r(\lambda_k - \lambda_r)^{-2} = O(n^{-1}\lambda_k k^{1-\alpha}).$ (B.5)

Therefore, applying Lemma 2, the proof is immediate. □

Lemma 6. For $\mathcal{I}_{ir} = \int\dot\mu(t)\phi_r(t)\,dt$ and $\eta_k = \min_{r \ne k}|\lambda_k - \lambda_r| > 0$, under Conditions (C6)a and (C6)b, $\sum_{r_1=1,\, r_1 \ne k}^{\kappa_0}\sum_{r_2=1,\, r_2 \ne k}^{\kappa_0}(\lambda_k - \lambda_{r_1})^{-1}(\lambda_k - \lambda_{r_2})^{-1} n^{-1}\sum_{i=1}^{n}\mathcal{I}_{ir_1}\xi_{ir_2} = O[(\log n/n)^{1/2}\kappa_0^{(3-\alpha)/2}\lambda_{\kappa_0}^{-1}\{\sum_{r=1}^{\kappa_0}\lambda_r\}^{1/2}]$ almost surely.

Proof: It is not difficult to see that

$E\{\sum_{r_1=1,\, r_1 \ne k}^{\kappa_0}\sum_{r_2=1,\, r_2 \ne k}^{\kappa_0}(\lambda_k - \lambda_{r_1})^{-1}(\lambda_k - \lambda_{r_2})^{-1} n^{-1}\sum_{i=1}^{n}\mathcal{I}_{ir_1}\xi_{ir_2}\} = 0.$

Moreover, using the spacing condition mentioned in (C6)a, one can derive the upper bound of $\eta_k^{-1}$ as

$\eta_k^{-1} = \{\min_{r \ne k}|\lambda_k - \lambda_r|\}^{-1} = \max_{r \ne k}|\lambda_k - \lambda_r|^{-1} \le \max_{r \ne k}\{\max(k,r)/|k - r|\}\max(\lambda_k, \lambda_r)^{-1} \lesssim \lambda_k^{-1}k.$ (B.6)

Due to the monotone decreasing property of the eigen-values, for fixed $k \in \{1,\ldots,\kappa_0\}$, we have

$\sum_{r=1,\, r \ne k}^{\kappa_0}\lambda_r(\lambda_k - \lambda_r)^{-2} \le \eta_k^{-2}\sum_{r=1}^{\kappa_0}\lambda_r \lesssim \lambda_k^{-2}k^2\sum_{r=1}^{\kappa_0}\lambda_r \le \lambda_{\kappa_0}^{-2}\kappa_0^2\sum_{r=1}^{\kappa_0}\lambda_r.$ (B.7)

Therefore, the following holds under conditions similar to those used to obtain Inequality (B.4):

$E\{\sum_{r_1=1,\, r_1 \ne k}^{\kappa_0}\sum_{r_2=1,\, r_2 \ne k}^{\kappa_0}(\lambda_k - \lambda_{r_1})^{-1}(\lambda_k - \lambda_{r_2})^{-1} n^{-1}\sum_{i=1}^{n}\mathcal{I}_{ir_1}\xi_{ir_2}\}^2 = n^{-1}\sum_{r_1=1,\, r_1 \ne k}^{\kappa_0}\sum_{r_2=1,\, r_2 \ne k}^{\kappa_0}(\lambda_k - \lambda_{r_1})^{-2}(\lambda_k - \lambda_{r_2})^{-2} V_{r_1}\lambda_{r_2} \lesssim n^{-1}k^{1-\alpha}\sum_{r=1,\, r \ne k}^{\kappa_0}\lambda_r(\lambda_k - \lambda_r)^{-2} \lesssim n^{-1}\kappa_0^{3-\alpha}\lambda_{\kappa_0}^{-2}\sum_{r=1}^{\kappa_0}\lambda_r.$ (B.8)

Therefore, applying Lemma 2, the result is immediate. □

Remark 5. Define $d_{ik}(T_{ij_1}, T_{ij_2}) \equiv \hat\phi_k(T_{ij_1})\hat\phi_k(T_{ij_2}) - \phi_k(T_{ij_1})\phi_k(T_{ij_2})$. Now replace $\mathcal{G}$, $\theta$, and $\psi$ by $\hat{\mathcal{R}}$, $\hat\lambda$, and $\hat\phi$, respectively, since $\hat{\mathcal{R}}$ is the approximation of $\mathcal{R}$ and $\Delta$ is the corresponding perturbation operator in Equation (A.1). Therefore, Lemma 1 immediately implies the following expansion, which is the key fact used to represent the objective function in QIF:

$\hat\phi_k - \phi_k = \sum_{r=1,\, r \ne k}^{\infty}(\lambda_k - \lambda_r)^{-1}\langle\phi_r, \Delta\phi_k\rangle\phi_r + O(\|\Delta\|^2)$, almost surely, (B.9)

where $\Delta$ is the integral operator with kernel $\hat R - R$.

Appendix C. Proof of Theorem 1

For the $k$-th element of $\bar g_n(\beta_0)$, $1 \le k \le \kappa_0$,

$\bar g_n^{(k)}(\beta_0) = n^{-1}\sum_{i=1}^{n} m_i^{-2}\dot\mu_i^\top\hat\Phi_k(y_i - \mu_i) = n^{-1}\sum_{i=1}^{n} m_i^{-2}\dot\mu_i^\top\hat\Phi_k W\xi_i = n^{-1}\sum_{i=1}^{n} m_i^{-2}\dot\mu_i^\top\Phi_k W\xi_i + n^{-1}\sum_{i=1}^{n} m_i^{-2}\dot\mu_i^\top(\hat\Phi_k - \Phi_k)W\xi_i \equiv J_{kn1} + J_{kn2}.$ (C.1)

Now, using Lemmas 3 and 4, the first part of the expression for $\bar g_n^{(k)}(\beta_0)$ becomes

$J_{kn1} = n^{-1}\sum_{i=1}^{n} m_i^{-2}\dot\mu_i^\top\Phi_k W\xi_i = n^{-1}\sum_{i=1}^{n} m_i^{-2}\sum_{j_1=1}^{m_i}\sum_{j_2=1}^{m_i}\sum_{l=1}^{\kappa_0}\dot\mu_i(T_{ij_1})\phi_k(T_{ij_1})\phi_k(T_{ij_2})\phi_l(T_{ij_2})\xi_{il} = n^{-1}\sum_{i=1}^{n}[\mathcal{I}_{ik} + O\{(\log m/m)^{1/2}\}][1 + O\{(\log m/m)^{1/2}\}]\xi_{ik}$, where $\mathcal{I}_{ik} = \int\dot\mu_i(t)\phi_k(t)\,dt$, $= n^{-1}\sum_{i=1}^{n}\mathcal{I}_{ik}\xi_{ik}[1 + O\{(\log m/m)^{1/2}\}] = O\{(\log n/n)^{1/2}\}[1 + O\{(\log m/m)^{1/2}\}]$, almost surely. (C.2)

On the other hand, the last part of $\bar g_n^{(k)}(\beta_0)$ can be expressed as

$J_{kn2} = n^{-1}\sum_{i=1}^{n} m_i^{-2}\dot\mu_i^\top(\hat\Phi_k - \Phi_k)W\xi_i = n^{-1}\sum_{i=1}^{n} m_i^{-2}\sum_{j_1=1}^{m_i}\sum_{j_2=1}^{m_i}\sum_{l=1}^{\kappa_0}\dot\mu_i(T_{ij_1}) d_{ik}(T_{ij_1}, T_{ij_2})\phi_l(T_{ij_2})\xi_{il},$ (C.3)

where $d_{ik}(T_{ij_1}, T_{ij_2}) \equiv \hat\phi_k(T_{ij_1})\hat\phi_k(T_{ij_2}) - \phi_k(T_{ij_1})\phi_k(T_{ij_2})$ as defined in Remark 5. Therefore, using the discussion in Remark 5, we obtain the following expression almost surely:

$d_{ik}(T_{ij_1}, T_{ij_2}) = \{\phi_k(T_{ij_1}) + \sum_{r \ne k}(\lambda_k - \lambda_r)^{-1}\langle\phi_r, \Delta\phi_k\rangle\phi_r(T_{ij_1}) + O(\|\Delta\|^2)\}\{\phi_k(T_{ij_2}) + \sum_{r \ne k}(\lambda_k - \lambda_r)^{-1}\langle\phi_r, \Delta\phi_k\rangle\phi_r(T_{ij_2}) + O(\|\Delta\|^2)\} - \phi_k(T_{ij_1})\phi_k(T_{ij_2}) = \sum_{r \ne k}(\lambda_k - \lambda_r)^{-1}\langle\phi_r, \Delta\phi_k\rangle\{\phi_r(T_{ij_1})\phi_k(T_{ij_2}) + \phi_k(T_{ij_1})\phi_r(T_{ij_2})\} + \sum_{r_1 \ne k}\sum_{r_2 \ne k}(\lambda_k - \lambda_{r_1})^{-1}(\lambda_k - \lambda_{r_2})^{-1}\langle\phi_{r_1}, \Delta\phi_k\rangle\langle\phi_{r_2}, \Delta\phi_k\rangle\phi_{r_1}(T_{ij_1})\phi_{r_2}(T_{ij_2}) + O(\|\Delta\|^2) \equiv I_{ikn1}(T_{ij_1}, T_{ij_2}) + I_{ikn2}(T_{ij_1}, T_{ij_2}) + O(\|\Delta\|^2).$ (C.4)

Thus, almost surely, we can write

$n^{-1}\sum_{i=1}^{n} m_i^{-2}\dot\mu_i^\top(\hat\Phi_k - \Phi_k)W\xi_i = n^{-1}\sum_{i=1}^{n} m_i^{-2}\sum_{j_1=1}^{m_i}\sum_{j_2=1}^{m_i}\sum_{l=1}^{\kappa_0}\dot\mu_i(T_{ij_1}) d_{ik}(T_{ij_1}, T_{ij_2})\phi_l(T_{ij_2})\xi_{il} = n^{-1}\sum_{i=1}^{n} m_i^{-2}\sum_{j_1=1}^{m_i}\sum_{j_2=1}^{m_i}\sum_{l=1}^{\kappa_0}\dot\mu_i(T_{ij_1})\{I_{ikn1}(T_{ij_1}, T_{ij_2}) + I_{ikn2}(T_{ij_1}, T_{ij_2})\}\phi_l(T_{ij_2})\xi_{il} + O(\|\Delta\|^2) \equiv J_{k1n2} + J_{k2n2} + O(\|\Delta\|^2).$ (C.5)

Under Assumptions (C1)-(C5), by Theorem 3.3 in Li and Hsing [21], $\|\Delta\|^2 = O\{h^4 + \delta_{n2}^2(h)\}$ almost surely. Now observe that

$J_{k1n2} = n^{-1}\sum_{i=1}^{n} m_i^{-2}\sum_{j_1=1}^{m_i}\sum_{j_2=1}^{m_i}\sum_{l=1}^{\kappa_0}\sum_{r \ne k}(\lambda_k - \lambda_r)^{-1}\dot\mu_i(T_{ij_1})\{\phi_r(T_{ij_1})\phi_k(T_{ij_2}) + \phi_k(T_{ij_1})\phi_r(T_{ij_2})\}\phi_l(T_{ij_2})\langle\phi_r, \Delta\phi_k\rangle\xi_{il} \lesssim n^{-1}\sum_{i=1}^{n} m_i^{-1}\sum_{j_1=1}^{m_i}\sum_{r \ne k}\sum_{l=1}^{\kappa_0}(\lambda_k - \lambda_r)^{-1}\dot\mu_i(T_{ij_1})\phi_r(T_{ij_1})[\mathbb{1}(l = k) + O\{(\log m/m)^{1/2}\}]\langle\phi_r, \Delta\phi_k\rangle\xi_{il} + n^{-1}\sum_{i=1}^{n} m_i^{-1}\sum_{j_1=1}^{m_i}\sum_{r \ne k}\sum_{l=1}^{\kappa_0}(\lambda_k - \lambda_r)^{-1}\dot\mu_i(T_{ij_1})\phi_k(T_{ij_1})[\mathbb{1}(r = l) + O\{(\log m/m)^{1/2}\}]\langle\phi_r, \Delta\phi_k\rangle\xi_{il} \equiv (U_{k1n} + U_{k2n})[1 + O\{(\log m/m)^{1/2}\}]$, almost surely. (C.6)

Then, applying the triangle inequality, we have

$|U_{k1n}| = |n^{-1}\sum_{i=1}^{n} m_i^{-1}\sum_{j=1}^{m_i}\sum_{r \ne k}(\lambda_k - \lambda_r)^{-1}\dot\mu_i(T_{ij})\phi_r(T_{ij})\langle\Delta\phi_k, \phi_r\rangle\xi_{ik}| \le \|\Delta\phi_k\|\, |\sum_{r \ne k}(\lambda_k - \lambda_r)^{-1} n^{-1}\sum_{i=1}^{n}[\mathcal{I}_{ir} + O\{(\log m/m)^{1/2}\}]\xi_{ik}| = \|\Delta\phi_k\|\, |\sum_{r \ne k}(\lambda_k - \lambda_r)^{-1} n^{-1}\sum_{i=1}^{n}\mathcal{I}_{ir}\xi_{ik}|\,[1 + O\{(\log m/m)^{1/2}\}],$ (C.7)

where $\mathcal{I}_{ir} = \int\dot\mu_i(t)\phi_r(t)\,dt$. By Lemma 6 of Li and Hsing [21], under Conditions (C1)-(C5), for any measurable bounded function $e$ on [0, 1], $\|\Delta\phi_k\| = O\{h^2 + \delta_{n1}(h) + \delta_{n2}^2(h)\} \equiv O\{\delta_n(h)\}$ almost surely, where $\delta_n(h) = h^2 + \delta_{n1}(h) + \delta_{n2}^2(h)$. Thus, together with Inequality (B.5) in Lemma 5, we obtain

$U_{k1n} = O[\delta_n(h)(\log n/n)^{1/2}\lambda_k^{1/2}k^{(1-\alpha)/2}\{1 + (\log m/m)^{1/2}\}]$

almost surely. Next, under the spacing condition mentioned earlier, in Assumption (C6)a, and using Inequality (B.6), recall that $\eta_k^{-1} \lesssim \lambda_k^{-1}k$. Thus, observe that

$|U_{k2n}| = |n^{-1}\sum_{i=1}^{n} m_i^{-1}\sum_{j=1}^{m_i}\sum_{r=1,\, r \ne k}^{\kappa_0}(\lambda_k - \lambda_r)^{-1}\dot\mu_i(T_{ij})\phi_k(T_{ij})\langle\Delta\phi_k, \phi_r\rangle\xi_{ir}| \le \|\Delta\phi_k\|\, |\sum_{r=1,\, r \ne k}^{\kappa_0}(\lambda_k - \lambda_r)^{-1} n^{-1}\sum_{i=1}^{n}\mathcal{I}_{ik}\xi_{ir}|\,[1 + O\{(\log m/m)^{1/2}\}] \le \|\Delta\phi_k\|\,\eta_k^{-1}\, |\sum_{r=1,\, r \ne k}^{\kappa_0} n^{-1}\sum_{i=1}^{n}\mathcal{I}_{ik}\xi_{ir}|\,[1 + O\{(\log m/m)^{1/2}\}].$ (C.8)

Using Condition (C6)b, we also have $V_k^{1/2}\eta_k^{-1} \lesssim V_k^{1/2}\lambda_k^{-1}k = O\{k^{(1-\alpha)/2}\}$. Finally, combining the bounds for $U_{k1n}$ and $U_{k2n}$, we have, almost surely,

$J_{k1n2} = O[(\log n/n)^{1/2}\delta_n(h)\{\lambda_k^{1/2}k^{(1-\alpha)/2} + \eta_k^{-1}V_k^{1/2}\sum_{r=1,\, r \ne k}^{\kappa_0}\lambda_r^{1/2}\}\{1 + (\log m/m)^{1/2}\}] \equiv O\{\omega_{k1}(n,h)\},$ (C.9)

where $\omega_{k1}(n,h) = (\log n/n)^{1/2}\delta_n(h)k^{(1-\alpha)/2}\sum_{r=1}^{\kappa_0}\lambda_r^{1/2}\{1 + (\log m/m)^{1/2}\}$.

It is easy to see that r=1κ0λr1/2~κ0-τ1/2+1. Therefore, for τ=α+τ1,ωk1(n,h)~(logn/n)1/2δn(h)κ0(3-τ)/21+(logm/m)1/2. Similarly, to the derivation of the bound for Jk1n2, we can write

$$\begin{aligned} J_{k2n2} &= n^{-1}\sum_{i=1}^{n} m_i^{-2}\sum_{j_1=1}^{m_i}\sum_{j_2=1}^{m_i}\sum_{l=1}^{\kappa_0} \dot\mu_i(T_{ij_1})\, I_{ikn2}(T_{ij_1},T_{ij_2})\, \phi_l(T_{ij_2})\xi_{il} \\ &= n^{-1}\sum_{i=1}^{n} m_i^{-2}\sum_{j_1=1}^{m_i}\sum_{j_2=1}^{m_i}\sum_{l=1}^{\kappa_0}\sum_{r_1\neq k}\sum_{\substack{r_2=1\\ r_2\neq k}}^{\kappa_0} (\lambda_k-\lambda_{r_1})^{-1}(\lambda_k-\lambda_{r_2})^{-1}\langle\phi_{r_1},\Delta\phi_k\rangle\langle\phi_{r_2},\Delta\phi_k\rangle\,\dot\mu_i(T_{ij_1})\phi_{r_1}(T_{ij_1})\phi_{r_2}(T_{ij_2})\phi_l(T_{ij_2})\xi_{il} \\ &\lesssim n^{-1}\sum_{i=1}^{n} m_i^{-1}\sum_{j_1=1}^{m_i}\sum_{l=1}^{\kappa_0}\sum_{r_1\neq k}\sum_{\substack{r_2=1\\ r_2\neq k}}^{\kappa_0} (\lambda_k-\lambda_{r_1})^{-1}(\lambda_k-\lambda_{r_2})^{-1}\langle\phi_{r_1},\Delta\phi_k\rangle\langle\phi_{r_2},\Delta\phi_k\rangle\,\dot\mu_i(T_{ij_1})\phi_{r_1}(T_{ij_1})\{\mathbb{1}(r_2=l) + O((\log m/m)^{1/2})\}\xi_{il} \\ &= n^{-1}\sum_{i=1}^{n} m_i^{-1}\sum_{j_1=1}^{m_i}\sum_{r_1\neq k}\sum_{\substack{r_2=1\\ r_2\neq k}}^{\kappa_0} (\lambda_k-\lambda_{r_1})^{-1}(\lambda_k-\lambda_{r_2})^{-1}\langle\phi_{r_1},\Delta\phi_k\rangle\langle\phi_{r_2},\Delta\phi_k\rangle\,\dot\mu_i(T_{ij_1})\phi_{r_1}(T_{ij_1})\xi_{ir_2}\{1 + O((\log m/m)^{1/2})\} \\ &\le \|\Delta\phi_k\|^2 \sum_{r_1\neq k}\sum_{\substack{r_2=1\\ r_2\neq k}}^{\kappa_0} |\lambda_k-\lambda_{r_1}|^{-1}|\lambda_k-\lambda_{r_2}|^{-1}\Big|n^{-1}\sum_{i=1}^{n}\mathcal{I}_{ir_1}\xi_{ir_2}\Big|\{1 + O((\log m/m)^{1/2})\}. \tag{C.10}\end{aligned}$$

Therefore, applying Lemma 6 to Inequality (C.10), it immediately follows that

$$J_{k2n2} = O\Big\{(\log n/n)^{1/2}\delta_n^2(h)\kappa_0^{(3-\alpha)/2}\lambda_{\kappa_0}^{-1}\sum_{r=1}^{\kappa_0}\lambda_r\Big\}\{1 + O((\log m/m)^{1/2})\} \triangleq O\{\omega_{k2}(n,h)\}, \quad \text{almost surely,}$$

where

$$\omega_{k2}(n,h) = (\log n/n)^{1/2}\delta_n^2(h)\kappa_0^{(3-\alpha)/2}\lambda_{\kappa_0}^{-1}\sum_{r=1}^{\kappa_0}\lambda_r\{1 + (\log m/m)^{1/2}\}.$$

We observe that $\sum_{r=1}^{\kappa_0}\lambda_r \sim \kappa_0^{-\tau_1+1}$ under assumption (C6)b. Thus, $\omega_{k2}(n,h) \sim (\log n/n)^{1/2}\delta_n^2(h)\kappa_0^{(4-\tau)/2}\{1 + (\log m/m)^{1/2}\}$. Since $\omega_{k2} = O(\omega_{k1})$ and $\delta_n^2(h) = O(\omega_{k1})$, in summary, for each $k \in \{1,\ldots,\kappa_0\}$,

$$\bar g_n^{(k)}(\beta_0) = O\big[(\log n/n)^{1/2}\{1 + (\log m/m)^{1/2}\} + \omega_{k1}(n,h)\big]$$

almost surely. Since $1/(nm) = O(1/n)$,

$$\mathrm{AMSE}\{\bar g_n^{(k)}(\beta_0)\} = O\{n^{-1} + n^{-1}\kappa_0^{3-\tau}R_n(h)\},$$

where

$$R_n(h) = h^4 + \frac{1}{n} + \frac{1}{nmh} + \frac{1}{n^2m^2h^2} + \frac{1}{n^2m^4h^4} + \frac{1}{n^2mh} + \frac{1}{n^2m^3h^3}.$$

Combining the above conditions, we find that if $a > 1/4$, $\kappa_0 = O\{n^{1/(3-\tau)}\}$ and $n^{-1/4} \gtrsim h \gtrsim n^{-(a+1)/5}$, then $\mathrm{AMSE}\{\bar g_n^{(k)}(\beta_0)\} = O(1/n)$. On the other hand, if $a \le 1/4$, $\kappa_0 = O\{n^{4(1+a)/(5(3-\tau))}\}$ and $h \asymp n^{-1/4}$, then $\mathrm{AMSE}\{\bar g_n^{(k)}(\beta_0)\} = O(1/n)$.
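As a quick sanity check on the first regime (a worked verification added here for the reader, not part of the original argument), take $m \asymp n^a$ with $a > 1/4$ and $\kappa_0^{3-\tau} \asymp n$, so that $\mathrm{AMSE}\{\bar g_n^{(k)}(\beta_0)\} = O\{n^{-1} + R_n(h)\}$. On the stated bandwidth window, the two binding terms of $R_n(h)$ satisfy

$$h^4 \le n^{-1} \quad \text{for } h \lesssim n^{-1/4}, \qquad \frac{1}{nmh} = n^{-1-a}h^{-1} \lesssim n^{-1-a+(a+1)/5} = n^{-(4a+4)/5} \le n^{-1} \quad \text{for } h \gtrsim n^{-(a+1)/5} \text{ and } a \ge 1/4,$$

and each of the remaining terms requires only a weaker lower bound on $h$, so $R_n(h) = O(n^{-1})$ and the parametric rate follows.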

Note that $\dot C(\beta) \triangleq (\partial C(\beta)/\partial\beta_1, \ldots, \partial C(\beta)/\partial\beta_p)$ is a three-dimensional array, so that the following is a $p \times 1$ vector:

$$\bar g(\beta_0)^T C^{-1}(\beta_0)\dot C(\beta_0)C^{-1}(\beta_0)\bar g(\beta_0). \tag{C.11}$$

Therefore,

$$n^{-1}\dot Q(\beta_0) = 2\dot{\bar g}(\beta_0)^T C^{-1}(\beta_0)\bar g(\beta_0) - \bar g(\beta_0)^T C^{-1}(\beta_0)\dot C(\beta_0)C^{-1}(\beta_0)\bar g(\beta_0) \tag{C.12}$$

and

$$n^{-1}\ddot Q(\beta_0) = 2\dot{\bar g}(\beta_0)^T C^{-1}(\beta_0)\dot{\bar g}(\beta_0) + r_{n1} + r_{n2} + r_{n3} + r_{n4}, \tag{C.13}$$

where

$$\begin{aligned} r_{n1} &= 2\ddot{\bar g}(\beta_0)C^{-1}(\beta_0)\bar g(\beta_0), & r_{n2} &= -4\dot{\bar g}(\beta_0)^T C^{-1}(\beta_0)\dot C(\beta_0)C^{-1}(\beta_0)\bar g(\beta_0), \\ r_{n3} &= 2\bar g(\beta_0)^T C^{-1}(\beta_0)\dot C(\beta_0)C^{-1}(\beta_0)\dot C(\beta_0)C^{-1}(\beta_0)\bar g(\beta_0), & r_{n4} &= -\bar g(\beta_0)^T C^{-1}(\beta_0)\ddot C(\beta_0)C^{-1}(\beta_0)\bar g(\beta_0), \tag{C.14}\end{aligned}$$

which are exactly the four remainder terms produced by differentiating (C.12) once more.

Since $\bar g(\beta_0) = O_P(n^{-1/2})$ and the weight matrix converges almost surely to an invertible matrix,

$$\bar g(\beta_0)^T C^{-1}(\beta_0)\dot C(\beta_0)C^{-1}(\beta_0)\bar g(\beta_0) = o(n^{-1}),$$

almost surely. Furthermore, $r_{n1} = O(n^{-1/2})$, $r_{n2} = o(n^{-1/2})$, $r_{n3} = O(n^{-1/2})$, and $r_{n4} = O(n^{-1})$ almost surely. Combining these bounds, we have $r_n = o(1)$ almost surely. Therefore, $n^{-1}\dot Q(\beta_0) - 2\dot{\bar g}(\beta_0)^T C^{-1}(\beta_0)\bar g(\beta_0) = o_P(n^{-1})$ and $n^{-1}\ddot Q(\beta_0) - 2\dot{\bar g}(\beta_0)^T C^{-1}(\beta_0)\dot{\bar g}(\beta_0) = o_P(1)$.

The following lines are based on common steps in the GEE literature, including Balan and Schiopu-Kratina [2], McCullagh and Nelder [27], and Tian et al. [30], among many others. Let $\beta_n = \beta_0 + \delta d$, where $\delta = n^{-1/2}$. We must show that for any $\epsilon > 0$ there exists a large constant $c$ such that

$$P\Big\{\inf_{\|d\| = c} Q(\beta_n) > Q(\beta_0)\Big\} > 1 - \epsilon. \tag{C.15}$$

Note that the above statement is trivially true if $\epsilon \ge 1$. Thus, we assume $\epsilon \in (0,1)$. By a Taylor series expansion,

$$Q(\beta_n) = Q(\beta_0 + \delta d) = Q(\beta_0) + \delta d^T\dot Q(\beta_0) + 0.5\,\delta^2 d^T\ddot Q(\beta_0)d + \|d\|^2 o_P(1). \tag{C.16}$$

Now observe that, using Equation (C.12),

$$\delta d^T\dot Q(\beta_0) = \|d\|\,O_P(n\delta^2) + \|d\|\,O_P(\delta), \tag{C.17}$$

and

$$0.5\,\delta^2 d^T\ddot Q(\beta^*)d = n\delta^2 d^T\dot{\bar g}(\beta_0)^T C^{-1}(\beta_0)\dot{\bar g}(\beta_0)d + n\delta^2\|d\|^2 o_P(1). \tag{C.18}$$

Therefore, for a given $\epsilon > 0$, there exists a large enough $c$ such that (C.15) holds. This implies that there exists a $\hat\beta$ satisfying $\|\hat\beta - \beta_0\| = O_P(\delta)$. Thus, for large $n$, with probability 1, $Q(\beta)$ attains its minimal value at $\hat\beta$, and therefore $\dot Q(\hat\beta) = 0$.
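To connect the argument above with computation, the following toy sketch minimizes a quadratic inference function $Q(\beta) = n\,\bar g(\beta)^T C^{-1}(\beta)\bar g(\beta)$ for a linear model. It is an illustration under assumed simple working bases (identity and first off-diagonals), not the estimator of this paper, whose basis matrices $\hat\Phi_k$ are built from nonparametrically estimated eigenfunctions; all names and dimensions below are illustrative.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, m, p = 200, 10, 2                      # curves, points per curve, covariates
X = rng.normal(size=(n, m, p))            # toy design
beta_true = np.array([1.0, -0.5])
y = X @ beta_true + rng.normal(size=(n, m))

# Two classical working-correlation basis matrices standing in for Phi_k.
bases = [np.eye(m), np.eye(m, k=1) + np.eye(m, k=-1)]

def scores(beta):
    r = y - X @ beta                      # (n, m) residual curves
    # extended score for subject i: (X_i' M_1 r_i, X_i' M_2 r_i)
    return np.concatenate(
        [np.einsum('ijq,jk,ik->iq', X, M, r) for M in bases], axis=1)

def Q(beta):
    g = scores(beta)                      # (n, 2p) stacked scores
    gbar = g.mean(axis=0)
    C = g.T @ g / n                       # sample second moment of the scores
    return n * gbar @ np.linalg.solve(C, gbar)

fit = minimize(Q, x0=np.zeros(p), method='Nelder-Mead')
print(fit.x)                              # close to beta_true

The quadratic term in (C.18) is what makes this minimization well behaved: for $\|d\| = c$ large, $n\delta^2 d^T\dot{\bar g}^T C^{-1}\dot{\bar g}\, d \asymp c^2$ dominates the $O_P(c)$ linear term in (C.17), which is what forces the minimizer into an $O_P(n^{-1/2})$ ball around $\beta_0$.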

Appendix D. Proof of Theorem 2

Recall $\mathcal{C}_i = \sum_{k_1=1}^{\kappa_0}\sum_{k_2=1}^{\kappa_0}\Phi_{k_1}X_iC^{-1}_{k_1,k_2}X_i^T\Phi_{k_2}$, where $C^{-1}_{k_1,k_2}$ is the $(k_1,k_2)$ block of $C_0^{-1}$. Similarly, we can define $\hat{\mathcal{C}}_i = \sum_{k_1=1}^{\kappa_0}\sum_{k_2=1}^{\kappa_0}\hat\Phi_{k_1}X_iC^{-1}_{k_1,k_2}X_i^T\hat\Phi_{k_2}$. It is easy to observe that

$$\hat{\mathcal{C}}_i = \mathcal{C}_i + \sum_{k_1=1}^{\kappa_0}\sum_{k_2=1}^{\kappa_0}(\hat\Phi_{k_1} - \Phi_{k_1})X_iC^{-1}_{k_1,k_2}X_i^T(\hat\Phi_{k_2} - \Phi_{k_2}) + 2\sum_{k_1=1}^{\kappa_0}\sum_{k_2=1}^{\kappa_0}\Phi_{k_1}X_iC^{-1}_{k_1,k_2}X_i^T(\hat\Phi_{k_2} - \Phi_{k_2}).$$

Therefore, $n^{-1}\sum_{i=1}^{n}\dot\mu_i^T(\hat{\mathcal{C}}_i - \mathcal{C}_i)X_i = n^{-1}\sum_{i=1}^{n}\sum_{k_1=1}^{\kappa_0}\sum_{k_2=1}^{\kappa_0}P_{ik_1}C^{-1}_{k_1,k_2}P_{ik_2}$ and $n^{-1}\sum_{i=1}^{n}\dot\mu_i^T(\hat{\mathcal{C}}_i - \mathcal{C}_i)y_i = n^{-1}\sum_{i=1}^{n}\sum_{k_1=1}^{\kappa_0}\sum_{k_2=1}^{\kappa_0}P_{ik_1}C^{-1}_{k_1,k_2}Q_{ik_2}$, where $P_{ik} = \dot\mu_i^TD_{ik}X_i$ and $Q_{ik} = \dot\mu_i^TD_{ik}y_i$, with $D_{ik}$ the difference matrix whose $(j_1,j_2)$-th element is $d_{ik}(T_{ij_1},T_{ij_2})$. Thus, note that, almost surely, we have the following relation:

$$\begin{aligned} P_{ik} &= m_i^{-2}\sum_{j_1=1}^{m_i}\sum_{j_2=1}^{m_i}\dot\mu_i(T_{ij_1})\, d_{ik}(T_{ij_1},T_{ij_2})\, x_i(T_{ij_2}) \\ &\lesssim m_i^{-2}\sum_{j_1=1}^{m_i}\sum_{j_2=1}^{m_i}\dot\mu_i(T_{ij_1})\Big[\sum_{r\neq k}(\lambda_k-\lambda_r)^{-1}\langle\phi_r,\Delta\phi_k\rangle\{\phi_r(T_{ij_1})\phi_k(T_{ij_2}) + \phi_k(T_{ij_1})\phi_r(T_{ij_2})\} + O(\|\Delta\|^2)\Big] \\ &\lesssim \sum_{r\neq k}(\lambda_k-\lambda_r)^{-1}\|\Delta\phi_k\| + O(\|\Delta\|^2) = O(\varpi_n), \tag{D.1}\end{aligned}$$

since $m_i^{-1}\sum_{j=1}^{m_i}\dot\mu_i(T_{ij})\phi_k(T_{ij})$ and $m_i^{-1}\sum_{j=1}^{m_i}x_i(T_{ij})\phi_k(T_{ij})$ are finite, where $\varpi_n = \sum_{r\neq k}(\lambda_k-\lambda_r)^{-1}\{\delta_n(h) + h^2 + \delta_{n2}^2(h)\}$. A similar result can be obtained for $Q_{ik}$. Combining these results, we have $-2n^{-1}\sum_{i=1}^{n}X_i^T\hat{\mathcal{C}}_i(y_i - X_i^T\hat\beta) = -2n^{-1}\sum_{i=1}^{n}X_i^T\mathcal{C}_i(y_i - X_i^T\hat\beta) + O(\varpi_n)$. Since, for large $n$, $Q(\beta)$ attains its minimal value at $\beta = \hat\beta$, we have $\dot Q(\hat\beta) = 0$. Thus,

$$\dot Q(\hat\beta) = -2n^{-1}\sum_{i=1}^{n}X_i^T\hat{\mathcal{C}}_i(y_i - X_i^T\hat\beta) = -2n^{-1}\sum_{i=1}^{n}X_i^T(\hat{\mathcal{C}}_i - \mathcal{C}_i)(y_i - X_i^T\hat\beta) - 2n^{-1}\sum_{i=1}^{n}X_i^T\mathcal{C}_i(y_i - X_i^T\hat\beta) = 0. \tag{D.2}$$

Therefore, almost surely, we have,

$$\begin{aligned} &-2n^{-1}\sum_{i=1}^{n}X_i^T\mathcal{C}_i(y_i - X_i^T\hat\beta) + O(\varpi_n) = 0, \\ &-2n^{-1}\sum_{i=1}^{n}X_i^T\mathcal{C}_i(X_i^T\beta_0 + e_i - X_i^T\hat\beta) + O(\varpi_n) = 0, \\ &\Big\{n^{-1}\sum_{i=1}^{n}X_i^T\mathcal{C}_iX_i\Big\}\sqrt{n}(\hat\beta - \beta_0) = n^{-1/2}\sum_{i=1}^{n}X_i^T\mathcal{C}_ie_i. \tag{D.3}\end{aligned}$$

Now, applying the central limit theorem, we obtain the following:

$$n^{-1/2}\sum_{i=1}^{n}X_i^T\mathcal{C}_ie_i \xrightarrow{d} N(0, A). \tag{D.4}$$

In addition, by the law of large numbers, $n^{-1}\sum_{i=1}^{n}X_i^T\mathcal{C}_iX_i \to B$ in probability. Therefore, using Slutsky's theorem, we complete the proof of Theorem 2.
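For the record, the two displays combine to give the sandwich-form limit (a standard consequence of (D.3), (D.4), and the law of large numbers, spelled out here for convenience rather than stated in the original appendix):

$$\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N(0, B^{-1}AB^{-1}).$$

In practice, $A$ and $B$ would be estimated by the usual GMM/GEE plug-in quantities, for instance $\hat B = n^{-1}\sum_{i=1}^{n}X_i^T\mathcal{C}_iX_i$ and $\hat A = n^{-1}\sum_{i=1}^{n}X_i^T\mathcal{C}_i\hat e_i\hat e_i^T\mathcal{C}_i^TX_i$ with $\hat e_i = y_i - X_i^T\hat\beta$.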


References

  • [1]. Bai Y, Zhu Z, Fung WK, Partial linear models for longitudinal data based on quadratic inference functions, Scandinavian Journal of Statistics 35 (2008) 104–118.
  • [2]. Balan RM, Schiopu-Kratina I, Asymptotic results with generalized estimating equations for longitudinal data, The Annals of Statistics 33 (2005) 522–541.
  • [3]. Chen X, Li H, Liang H, Lin H, Functional response regression analysis, Journal of Multivariate Analysis 169 (2019) 218–233.
  • [4]. Chiou J-M, Müller H-G, Wang J-L, Functional quasi-likelihood regression models with smooth random effects, Journal of the Royal Statistical Society: Series B 65 (2003) 405–423.
  • [5]. Dauxois J, Pousse A, Romain Y, Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference, Journal of Multivariate Analysis 12 (1982) 136–154.
  • [6]. Diggle P, Heagerty P, Liang K-Y, Zeger S, Analysis of Longitudinal Data, Oxford University Press, 2002.
  • [7]. Fan J, Gijbels I, Local Polynomial Modelling and Its Applications, volume 66 of Monographs on Statistics and Applied Probability, Chapman & Hall/CRC, New York, USA, 1996.
  • [8]. Fan J, Zhang W, Statistical estimation in varying coefficient models, The Annals of Statistics 27 (1999) 1491–1518.
  • [9]. Friston KJ, Ashburner J, Kiebel S, Nichols T, Penny W (Eds.), Statistical Parametric Mapping: The Analysis of Functional Brain Images, Academic Press, San Diego, CA, 2007.
  • [10]. Friston KJ, Holmes AP, Worsley KJ, Poline J-P, Frith CD, Frackowiak RS, Statistical parametric maps in functional imaging: a general linear approach, Human Brain Mapping 2 (1994) 189–210.
  • [11]. Gajardo A, Carroll C, Chen Y, Dai X, Fan J, Hadjipantelis PZ, Han K, Ji H, Mueller H-G, Wang J-L, fdapace: Functional Data Analysis and Empirical Dynamics, 2021. R package version 0.5.7.
  • [12]. Givens GH, Hoeting JA, Computational Statistics, John Wiley & Sons, New York, USA, 2nd edition, 2012.
  • [13]. Hall P, Horowitz JL, Methodology and convergence rates for functional linear regression, The Annals of Statistics 35 (2007) 70–91.
  • [14]. Hall P, Hosseini-Nasab M, On properties of functional principal components analysis, Journal of the Royal Statistical Society: Series B 68 (2006) 109–126.
  • [15]. Hall P, Hosseini-Nasab M, Theory for high-order bounds in functional principal components analysis, Mathematical Proceedings of the Cambridge Philosophical Society 146 (2009) 225–256.
  • [16]. Hand D, Crowder M, Practical Longitudinal Data Analysis, Routledge, 2017.
  • [17]. Hansen LP, Large sample properties of generalized method of moments estimators, Econometrica 50 (1982) 1029–1054.
  • [18]. Hsing T, Eubank RL, Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators, Wiley Series in Probability and Statistics, Wiley, New York, USA, 2015.
  • [19]. Mercer J, Functions of positive and negative type, and their connection with the theory of integral equations, Philosophical Transactions of the Royal Society of London, Series A 209 (1909) 415–446.
  • [20]. Karhunen K, Zur Spektraltheorie stochastischer Prozesse, Annales Academiae Scientiarum Fennicae, Series A. I. Mathematica 34 (1946).
  • [21]. Li Y, Hsing T, Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data, The Annals of Statistics 38 (2010) 3321–3351.
  • [22]. Li Y, Qiu Y, Xu Y, From multivariate to functional data analysis: Fundamentals, recent developments, and emerging areas, Journal of Multivariate Analysis 188 (2022) 104806.
  • [23]. Liang K-Y, Zeger SL, Longitudinal data analysis using generalized linear models, Biometrika 73 (1986) 13–22.
  • [24]. Liang X, Zou T, Guo B, Li S, Zhang H, Zhang S, Huang H, Chen SX, Assessing Beijing's PM2.5 pollution: severity, weather impact, APEC and winter heating, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 471 (2015) 20150257.
  • [25]. Lindquist MA, The statistical analysis of fMRI data, Statistical Science 23 (2008) 439–464.
  • [26]. Loève M, Fonctions aléatoires de second ordre, Revue Scientifique 84 (1946) 195–206.
  • [27]. McCullagh P, Nelder JA, Generalized Linear Models, volume 37 of Monographs on Statistics and Applied Probability, Chapman & Hall, London, UK, 2nd edition, 1989.
  • [28]. Qu A, Li R, Quadratic inference functions for varying-coefficient models with longitudinal data, Biometrics 62 (2006) 379–391.
  • [29]. Qu A, Lindsay BG, Li B, Improving generalised estimating equations using quadratic inference functions, Biometrika 87 (2000) 823–836.
  • [30]. Tian R, Xue L, Liu C, Penalized quadratic inference functions for semiparametric varying coefficient partially linear models with longitudinal data, Journal of Multivariate Analysis 132 (2014) 94–110.
  • [31]. Turner EL, Li F, Gallis JA, Prague M, Murray DM, Review of recent methodological developments in group-randomized trials: part 1—design, American Journal of Public Health 107 (2017) 907–915.
  • [32]. Xiong Y, Zhou XJ, Nisi RA, Martin KR, Karaman MM, Cai K, Weaver TE, Brain white matter changes in CPAP-treated obstructive sleep apnea patients with residual sleepiness, Journal of Magnetic Resonance Imaging 45 (2017) 1371–1378.
  • [33]. Yao F, Müller H-G, Wang J-L, Functional data analysis for sparse longitudinal data, Journal of the American Statistical Association 100 (2005) 577–590.
  • [34]. Yu H, Tong G, Li F, A note on the estimation and inference with quadratic inference functions for correlated outcomes, Communications in Statistics - Simulation and Computation (2020) 1–12.
  • [35]. Zhang H, Li Y, Unified principal component analysis for sparse and dense functional data under spatial dependency, Journal of Business & Economic Statistics 40 (2022) 1523–1537.
  • [36]. Zhang L, Banerjee S, Spatial factor modeling: A Bayesian matrix-normal approach for misaligned data, Biometrics (2021).
  • [37]. Zhang X, Wang J-L, From sparse to dense functional data and beyond, The Annals of Statistics 44 (2016) 2281–2321.
  • [38]. Zhou J, Qu A, Informative estimation and selection of correlation structure for longitudinal data, Journal of the American Statistical Association 107 (2012) 701–710.
