Author manuscript; available in PMC 2014 Jun 23. Published in final edited form as: Biometrika. 2013 Mar;100(1):203–212. doi: 10.1093/biomet/ass050

Unified inference for sparse and dense longitudinal models

Seonjin Kim 1, Zhibiao Zhao 1
PMCID: PMC4066936  NIHMSID: NIHMS588624  PMID: 24966413

Summary

In longitudinal data analysis, statistical inference for sparse data and dense data can differ substantially. For the kernel smoothing estimator of the mean function, the convergence rates and limiting variance functions differ under the two scenarios. This phenomenon poses challenges for statistical inference, as a subjective choice between the sparse and dense cases may lead to wrong conclusions. We develop self-normalization based methods that adapt to the sparse and dense cases in a unified framework. Simulations show that the proposed methods outperform some existing methods.

Keywords: Dense longitudinal data, Kernel smoothing, Mixed-effects model, Nonparametric estimation, Self-normalization, Sparse longitudinal data

1. Introduction

Longitudinal models have extensive applications in biomedical, psychometric and environmental sciences (Fitzmaurice et al., 2004; Wu & Zhang, 2006). In longitudinal studies, repeated measurements are recorded over time from subjects, and therefore measurements from the same subject are correlated. One popular framework is to assume that the observations from each subject are noisy discrete realizations of an underlying process {ξ(·)}:

Y_{ij} = \xi_i(X_{ij}) + \sigma(X_{ij})\,\varepsilon_{ij} \quad (i = 1, \ldots, n;\ j = 1, \ldots, n_i). (1)

Here Yij is the measurement at time Xij from subject i, {ξi(·)} are independent realizations of an underlying process {ξ(·)}, the εij are errors with E(εij) = 0 and E(εij²) = 1, ni is the number of measurements collected on subject i, and n is the total number of subjects.

There are two typical approaches to taking between-subject variation into account: functional principal component analysis (Yao et al., 2005a,b; Yao, 2007; Ma et al., 2012) and the mixed-effects approach (Wu & Zhang, 2002; Zhang & Chen, 2007). The basic idea of the latter is to decompose {ξi(·)} into a fixed population mean μ(·) = E{ξi(·)} and a subject-specific random trajectory υi(·) with E{υi(x)} = 0 and covariance function γ(x, x′) = cov{υi(x), υi(x′)}. Then (1) becomes

Y_{ij} = \mu(X_{ij}) + \upsilon_i(X_{ij}) + \sigma(X_{ij})\,\varepsilon_{ij} \quad (i = 1, \ldots, n;\ j = 1, \ldots, n_i). (2)

The goal is to estimate the population mean μ(·) and construct a confidence interval for it.

Depending on the number of measurements within subjects, model (2) has two scenarios: dense and sparse longitudinal data. Dense longitudinal data allow ni → ∞ and a conventional estimation approach is to smooth each individual curve and then construct an estimator based on the smoothed curves (Ramsay & Silverman, 2005; Hall et al., 2006; Zhang & Chen, 2007). In sparse longitudinal data, the ni are either bounded or independent and identically distributed with E(ni) < ∞, and due to the sparse observations from individual subjects, it is essential to pool data together (Yao et al., 2005a; Hall et al., 2006; Yao, 2007; Ma et al., 2012).

In practice, the boundary between the dense and sparse cases may not always be clear, and such ambiguity poses challenges for statistical inference, since different researchers may well classify the same data set differently. To address this issue, Li & Hsing (2010) proposed a unified weighted local linear estimator of μ(x). However, as shown in Section 2, this estimator has different convergence rates and limiting variances under the two scenarios. Therefore, to construct a confidence interval for μ(x), one must make a subjective decision about whether to treat the data as sparse or dense. In Section 2, we show that confidence intervals constructed under the sparse and dense assumptions can differ substantially, depending on many unknown factors. Another challenging issue is that the limiting variance function contains the unknown functions γ(x, x) and σ²(x). As shown by Wu & Zhang (2002), Yao et al. (2005a,b), Müller (2005) and Li & Hsing (2010), covariance estimation requires extra smoothing procedures.

We develop two unified nonparametric approaches that resolve the aforementioned issues. First, we establish a unified convergence theory so that inference can be conducted without deciding whether the data are dense or sparse. Second, the unknown limiting variance is canceled out through a self-normalization technique, so the proposed methods do not require estimation of the functions γ(x, x) and σ²(x). The first approach introduces a unified self-normalized central limit theorem that adapts to both cases. The second approach constructs a self-normalizer based on recursive estimates of the mean function. Related methods have been explored mainly in parametric settings for time series data (Lobato, 2001; Kiefer & Vogelsang, 2005; Shao, 2010). In the longitudinal setting, our development of the self-normalization method is particularly attractive because of the sparse-dense dichotomy and the more complicated structure, such as the within-subject covariance and the overall noise variance function. Simulations show that the proposed methods outperform some existing methods.

2. Motivation

For model (2), we consider two scenarios: (i) sparse longitudinal data, where n1, …, nn are independent and identically distributed positive-integer-valued random variables with E(ni) < ∞; and (ii) dense longitudinal data, where ni ≥ Mn for some Mn → ∞ as n → ∞.

Throughout we let f(·) denote the density function of Xij and let x be an interior point of the support of f(·). Li & Hsing (2010) proposed a sample-size weighted local linear estimator of μ(x). For technical convenience, we consider the weighted local constant estimator

\hat{\mu}_n(x) = \arg\min_{\theta} \sum_{i=1}^{n} \frac{1}{n_i} \sum_{j=1}^{n_i} (Y_{ij} - \theta)^2 K\!\left(\frac{X_{ij} - x}{b}\right) = \frac{G_n}{H_n}, (3)

where K is a kernel function satisfying ∫ K(u)du = 1 and b > 0 is a bandwidth, with

H_n = \sum_{i=1}^{n} \frac{1}{n_i} \sum_{j=1}^{n_i} K\!\left(\frac{X_{ij} - x}{b}\right), \quad G_n = \sum_{i=1}^{n} \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}\, K\!\left(\frac{X_{ij} - x}{b}\right). (4)
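To fix ideas, a minimal sketch of (3)–(4) in Python follows; the Gaussian kernel and the list-of-arrays data layout (X[i] and Y[i] holding subject i's design points and responses) are illustrative assumptions of ours, not notation from the paper.

    import numpy as np

    def gaussian_kernel(u):
        # Standard normal density; any symmetric density integrating to one works.
        return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

    def mu_hat(x, X, Y, b, kernel=gaussian_kernel):
        # Weighted local constant estimator (3)-(4): each subject carries the
        # weight 1/n_i, so subjects with many measurements do not dominate.
        Hn, Gn = 0.0, 0.0
        for Xi, Yi in zip(X, Y):
            w = kernel((Xi - x) / b) / len(Xi)  # kernel weights scaled by 1/n_i
            Hn += w.sum()
            Gn += (w * Yi).sum()
        return Gn / Hn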

The convergence rates and limiting variances are different for sparse and dense longitudinal data. To gain intuition about this, write

\hat{\mu}_n(x) - \mu(x) - \frac{1}{H_n} \sum_{i=1}^{n} \frac{1}{n_i} \sum_{j=1}^{n_i} \{\mu(X_{ij}) - \mu(x)\} K\!\left(\frac{X_{ij} - x}{b}\right) = \frac{1}{H_n} \sum_{i=1}^{n} \xi_i, (5)

where the right-hand side determines the asymptotic distribution of μ̂n(x), with

\xi_i = \frac{1}{n_i} \sum_{j=1}^{n_i} \xi_{ij}, \quad \xi_{ij} = \{\upsilon_i(X_{ij}) + \sigma(X_{ij})\varepsilon_{ij}\}\, K\!\left(\frac{X_{ij} - x}{b}\right). (6)

Recall γ(x, x′) = cov{υi(x), υi(x′)}. For j ≠ j′, by E(ξijξij′) = E{E(ξijξij′ | Xij, Xij′)},

E(\xi_{ij}\xi_{ij'}) = E\!\left\{\gamma(X_{ij}, X_{ij'})\, K\!\left(\frac{X_{ij} - x}{b}\right) K\!\left(\frac{X_{ij'} - x}{b}\right)\right\} \approx b^2 f^2(x)\, \gamma(x, x). (7)

Throughout, cn ≈ dn means that cn/dn → 1. Similarly,

E(\xi_{ij}^2) = E\{E(\xi_{ij}^2 \mid X_{ij})\} \approx b f(x)\, \psi_K \{\gamma(x, x) + \sigma^2(x)\}, \quad \psi_K = \int K^2(u)\,du. (8)

Applying (7)–(8) to var(ξi | ni) = ni−2{Σ_{1≤j≠j′≤ni} E(ξijξij′) + Σ_{j=1}^{ni} E(ξij²)}, we obtain

\mathrm{var}(\xi_i \mid n_i) \approx (1 - 1/n_i)\, b^2 f^2(x)\, \gamma(x, x) + f(x)\, \psi_K \{\gamma(x, x) + \sigma^2(x)\}\, b/n_i. (9)

For the sparse case with b → 0, var(ξi | ni) ≈ bf(x)ψK{γ(x, x) + σ²(x)}/ni; for the dense case with ni ≥ Mn and Mnb → ∞, var(ξi | ni) ≈ b²f²(x)γ(x, x).

Theorem 1. Assume Assumption 1 in the Appendix. Let f(x) be the density of Xij. Write

\psi_K = \int K^2(u)\,du, \quad \rho(x) = \left\{\frac{\mu''(x)}{2} + \frac{\mu'(x) f'(x)}{f(x)}\right\} \int u^2 K(u)\,du.
  (i) Sparse data: assume nb → ∞ and supn nb⁵ < ∞. Then
    (nb)^{1/2} \{\hat{\mu}_n(x) - \mu(x) - b^2 \rho(x)\} \to N\{0, s^2_{\mathrm{sparse}}(x)\}, (10)
    where s^2_{\mathrm{sparse}}(x) = \tau \psi_K \{\gamma(x, x) + \sigma^2(x)\}/f(x) and τ = E(1/n1).
  (ii) Dense data: assume ni ≥ Mn, Mnb → ∞, nb → ∞ and supn nb⁴ < ∞. Then
    n^{1/2} \{\hat{\mu}_n(x) - \mu(x) - b^2 \rho(x)\} \to N\{0, s^2_{\mathrm{dense}}(x)\}, \quad s^2_{\mathrm{dense}}(x) = \gamma(x, x). (11)

It is worth mentioning some related results. Li & Hsing (2010) established the uniform consistency of μ̂n(x) with different rates under the sparse and dense cases, but they did not obtain the asymptotic distribution. Wu & Zhang (2002) also showed that the local polynomial mixed-effects estimator has different convergence rates and limiting variances under the two scenarios. Under a Karhunen–Loève representation of longitudinal models, Yao (2007) studied the sparse case by allowing ni to be dependent on n; see also Ma et al. (2012).

By Theorem 1, the confidence interval for μ(x) is different under the two cases. Let z1−α/2 be the 1 − α/2 standard normal quantile. Then an asymptotic 1 − α confidence interval for μ(x) is

\hat{\mu}_n(x) - b^2 \hat{\rho}(x) \pm z_{1-\alpha/2}\, (nb)^{-1/2} [\hat{\tau} \psi_K \{\hat{\gamma}(x, x) + \hat{\sigma}^2(x)\}/\hat{f}(x)]^{1/2} (12)

for sparse data, or

\hat{\mu}_n(x) - b^2 \hat{\rho}(x) \pm z_{1-\alpha/2}\, n^{-1/2} \{\hat{\gamma}(x, x)\}^{1/2} (13)

for dense data. Here, τ̂ = n−1 Σ_{i=1}^{n} ni−1, and γ̂(x, x), σ̂²(x), f̂(x) and ρ̂(x) are consistent estimates of γ(x, x), σ²(x), f(x) and ρ(x), respectively. The ratio of the lengths of the two confidence intervals is R = [ψK τ̂{1 + σ̂²(x)/γ̂(x, x)}/{bf̂(x)}]^{1/2}, which depends on the denseness parameter τ, the signal-to-noise ratio γ(x, x)/σ²(x), the bandwidth b and the design density f(·). The further R is from one, the larger the discrepancy between the two constructed confidence intervals.
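For a rough sense of the magnitudes involved, consider a purely hypothetical configuration of our own choosing: a Gaussian kernel, for which ψK = 1/(2√π) ≈ 0·282, together with τ̂ = 0·2, σ̂²(x)/γ̂(x, x) = 1, b = 0·05 and f̂(x) = 1. Then

R = \{0·282 × 0·2 × (1 + 1)/(0·05 × 1)\}^{1/2} = (2·256)^{1/2} ≈ 1·5,

so the interval built under the sparse assumption would be about 50% longer than the one built under the dense assumption.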

Remark 1. In the dense case, suppose ni is proportional to Mn → ∞. Theorem 1 (ii) covers the case Mnb → ∞. If Mnb → 0, then the leading term in (9) is f(x)ψK{γ(x, x) + σ²(x)}b/ni. If Mnb is bounded away from 0 and ∞, then the two terms in (9) are of the same order. If b is proportional to (nMn)−1/5, then a sufficient condition for Mnb → ∞ is Mn⁴/n → ∞. In many practical problems, n is about 30–200, Mn is about 10–30, and Mn⁴/n is sufficiently large.

3. Unified approaches for sparse and dense data

3·1. A unified self-normalized central limit theorem

The discussion in Section 2 suggests the need for a unified approach. For independent and identically distributed random variables Z1, …, Zn, de la Peña et al. (2009) gave an extensive account of the asymptotic properties of the self-normalized statistic Σ_{i=1}^{n} Zi/(Σ_{i=1}^{n} Zi²)^{1/2}. In this section, we present a unified self-normalized central limit theorem for μ̂n(x). For Hn in (4), define

U_n^2(x) = \frac{1}{H_n^2} \sum_{i=1}^{n} \left[ \frac{1}{n_i} \sum_{j=1}^{n_i} \{Y_{ij} - \hat{\mu}(X_{ij})\}\, K\!\left(\frac{X_{ij} - x}{b}\right) \right]^2.

Theorem 2. Assume Assumption 1 in the Appendix. Suppose nb/log n → ∞ and supn nb⁵ < ∞ for sparse data, or ni ≥ Mn, Mnb → ∞, nb²/log n → ∞ and supn nb⁴ < ∞ for dense data. Then {μ̂n(x) − μ(x) − b²ρ(x)}/Un(x) → N(0, 1) in distribution in both the sparse and dense settings.

Many papers treat sparse and dense data separately. For example, Yao et al. (2005a,b), Yao (2007) and Ma et al. (2012) studied sparse longitudinal data. For the local polynomial mixed-effects estimator, Wu & Zhang (2002) obtained different central limit theorems under the two scenarios. By contrast, Theorem 2 establishes a unified central limit theorem, which can be used to construct a unified asymptotic pointwise 1 − α confidence interval for μ(x):

\hat{\mu}_n(x) - b^2 \hat{\rho}(x) \pm z_{1-\alpha/2}\, U_n(x). (14)

While the confidence intervals (12)–(13) require estimation of the within-subject covariance function γ(x, x) and the overall noise variance function σ²(x), (14) avoids such extra smoothing steps and adapts to the sparse or dense setting through the self-normalizer Un(x).
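For illustration, a direct (computationally naive) implementation of (14) could look as follows, reusing the sketch of mu_hat above; the bias term b²ρ̂(x) is dropped, which is justified when a kernel with vanishing second moment is used, as discussed at the end of this section.

    from scipy.stats import norm

    def unified_ci(x, X, Y, b, alpha=0.05, kernel=gaussian_kernel):
        # Self-normalized interval (14): U_n(x) aggregates the 1/n_i-weighted
        # kernel sums of residuals Y_ij - mu_hat(X_ij) across subjects.
        Hn = sum(kernel((Xi - x) / b).sum() / len(Xi) for Xi in X)
        terms = []
        for Xi, Yi in zip(X, Y):
            fitted = np.array([mu_hat(t, X, Y, b, kernel) for t in Xi])
            terms.append((kernel((Xi - x) / b) * (Yi - fitted)).sum() / len(Xi))
        Un = np.sqrt(sum(t ** 2 for t in terms)) / Hn
        mu = mu_hat(x, X, Y, b, kernel)
        z = norm.ppf(1 - alpha / 2)
        return mu - z * Un, mu + z * Un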

To select the bandwidth b, we adopt subject-based cross-validation (Rice & Silverman, 1991). The idea is to leave one subject out in model fitting, validate the fitted model using the left-out subject, and choose the optimal bandwidth by minimizing the prediction error:

b^{*} = \arg\min_{b} \mathrm{SJCV}(b), \quad \mathrm{SJCV}(b) = \sum_{i=1}^{n} \frac{1}{n_i} \sum_{j=1}^{n_i} \{Y_{ij} - \hat{\mu}^{(-i)}(X_{ij})\}^2, (15)

where μ̂(−i)(x) represents the estimator of μ(x) based on data from all but the ith subject.
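A sketch of the criterion (15), again with the illustrative mu_hat above; b* is then obtained by minimizing sjcv over a grid of candidate bandwidths, as the commented line indicates.

    def sjcv(b, X, Y, kernel=gaussian_kernel):
        # Leave-one-subject-out cross-validation criterion (15).
        err = 0.0
        for i in range(len(X)):
            X_rest, Y_rest = X[:i] + X[i + 1:], Y[:i] + Y[i + 1:]
            pred = np.array([mu_hat(t, X_rest, Y_rest, b, kernel) for t in X[i]])
            err += ((Y[i] - pred) ** 2).sum() / len(X[i])
        return err

    # b_star = min(grid_of_bandwidths, key=lambda b: sjcv(b, X, Y))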

In practice, it is difficult to estimate the bias b²ρ(x) because of the unknown derivatives f′, μ′ and μ″. In our simulations, we use K(u) = 2G(u) − G(u/√2)/√2, where G(u) is the standard normal density. Then ∫u²K(u)du = 0 and ρ(x) = 0. However, this does not fully resolve the bias issue: if f and μ are four times differentiable, for example, a higher-order bias term of order O(b⁴) remains. The bias issue is inherently difficult, and no fully satisfactory solution is available so far.
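The moment properties of this kernel are easy to check numerically. A minimal sketch, assuming numpy and scipy are available; the integration limits are an illustrative truncation of the real line.

    from scipy.integrate import quad
    import numpy as np

    G = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)  # standard normal density
    K = lambda u: 2 * G(u) - G(u / np.sqrt(2)) / np.sqrt(2)   # bias-reducing kernel

    print(quad(K, -10, 10)[0])                       # ~1: K integrates to one
    print(quad(lambda u: u ** 2 * K(u), -10, 10)[0]) # ~0: vanishing second moment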

3·2. Self-normalization based on recursive estimates

In this section we introduce another self-normalization method based on recursive estimates. For m = 1, …, n, denote by μ̂m(x) the estimator in (3) based on the observations from the first m subjects. Then μ̂1(x), …, μ̂n(x) are estimates of μ(x) with increasing accuracy. Moreover, μ̂m(x) enjoys asymptotic normality similar to (10)–(11). For example, for each 0 < t ≤ 1, the counterpart of (10) for sparse data is (⌊nt⌋b)^{1/2}{μ̂⌊nt⌋(x) − μ(x) − b²ρ(x)} → N{0, s²sparse(x)}. Throughout, ⌊z⌋ is the integer part of z. Therefore, μ̂n(x) and μ̂⌊nt⌋(x) have proportional convergence rates and the same limiting variance, which motivates us to consider certain ratios between μ̂n(x) and μ̂⌊nt⌋(x) to cancel out the convergence rate and limiting variance.

Since the above analysis holds for all 0 < t ≤ 1, we consider an aggregated version

T_n(x) = \frac{\hat{\mu}_n(x) - \mu(x) - b^2 \rho(x)}{V_n(x)}, \quad V_n(x) = n^{-3/2} \left\{ \sum_{m=\lfloor cn \rfloor}^{n} m^2\, |\hat{\mu}_m(x) - \hat{\mu}_n(x)|^2 \right\}^{1/2}.

Throughout, c > 0 is a small constant included to avoid unstable estimation at the boundary; by our simulations, c = 0·1 works reasonably well. Intuitively, we may interpret μ̂m(x), m = 1, …, n, as observations from a population with mean μ(x) and treat μ̂n(x) as their sample average. Thus, Vn(x) can be viewed as a weighted sample standard deviation, with the weight m² reflecting the accuracy of μ̂m(x), and Vn(x) mimics the usual normalizer in the Student t statistic.
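In code, Vn(x) is a straightforward computation over nested subsamples. A sketch reusing the illustrative mu_hat above; for clarity each μ̂m is recomputed from scratch, although recursive updating of the kernel sums would be cheaper.

    def V_n(x, X, Y, b, c=0.1, kernel=gaussian_kernel):
        # Self-normalizer V_n(x): mu_hat over the first m subjects only,
        # with weight m^2 reflecting the accuracy of each recursive estimate.
        n = len(X)
        mu_m = [mu_hat(x, X[:m], Y[:m], b, kernel) for m in range(1, n + 1)]
        m0 = max(1, int(np.floor(c * n)))
        s = sum(m ** 2 * (mu_m[m - 1] - mu_m[-1]) ** 2 for m in range(m0, n + 1))
        return np.sqrt(s) / n ** 1.5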

Theorem 3. Assume the conditions of Theorem 1. Let {Bt} be a standard Brownian motion. Then Tn(x) → B1/{∫_c^1 (Bt − tB1)² dt}^{1/2} in distribution under either the sparse or the dense setting.

By Theorem 3, an asymptotic pointwise 1 − α confidence interval for μ(x) is μ̂n(x) − b²ρ̂(x) ± q1−α/2 Vn(x), where q1−α/2 is the 1 − α/2 quantile of the limiting distribution. This confidence interval is the same under both scenarios, with the convergence rate and limiting variance built into the self-normalizer Vn(x) implicitly. Our method can be viewed as an extension of the parametric self-normalization methods of Lobato (2001), Kiefer & Vogelsang (2005) and Shao (2010) for time series data to the nonparametric longitudinal model (2).

In practice, however, subjects have no natural ordering, and we can use the average of multiple copies of Vn²(x) obtained by permuting the subjects. For large n, since it is computationally infeasible to enumerate all permutations, we consider only a fixed number, say T, of random permutations. Denote the corresponding copies of Vn(x) by Vn¹(x), …, Vn^T(x). Consider

\tilde{T}_n(x) = \frac{\hat{\mu}_n(x) - \mu(x) - b^2 \rho(x)}{\tilde{V}_n(x)}, \quad \tilde{V}_n^2(x) = \frac{1}{T} \sum_{r=1}^{T} \{V_n^{r}(x)\}^2.

By the above analysis, the asymptotic distribution of T̃n(x) is the same under both the sparse and dense settings. However, it is not clear whether T̃n(x) is asymptotically normally distributed. Nevertheless, in light of the asymptotic normality of μ̂n(x), the proof of Theorem 3 and E{∫_c^1 (Bt − tB1)² dt} = (1 − 3c² + 2c³)/6, we propose the pointwise confidence interval

\hat{\mu}_n(x) - b^2 \hat{\rho}(x) \pm z_{1-\alpha/2}\, \tilde{V}_n(x)\, c_1^{1/2}, \quad c_1 = 6/(1 - 3c^2 + 2c^3). (16)

Here z1−α/2 is defined as in (12). We call (16) the rule-of-thumb self-normalization based confidence interval. Our quantile-quantile studies show that the empirical quantiles of T̃n(x) with 200 permutations match those of N(0, c1) well under the settings of Section 4.
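A sketch of the rule-of-thumb interval (16), reusing mu_hat, V_n and norm from the sketches above; the random seed and the default T = 200 mirror the simulations but are otherwise arbitrary choices of ours.

    def rule_of_thumb_ci(x, X, Y, b, alpha=0.05, c=0.1, T=200, seed=0):
        # Interval (16): average V_n^2(x) over T random subject permutations,
        # then scale by c1^{1/2} with c1 = 6/(1 - 3c^2 + 2c^3).
        rng = np.random.default_rng(seed)
        n = len(X)
        v2 = []
        for _ in range(T):
            perm = rng.permutation(n)
            v2.append(V_n(x, [X[i] for i in perm], [Y[i] for i in perm], b, c) ** 2)
        c1 = 6.0 / (1 - 3 * c ** 2 + 2 * c ** 3)
        half = norm.ppf(1 - alpha / 2) * np.sqrt(np.mean(v2)) * np.sqrt(c1)
        mu = mu_hat(x, X, Y, b)
        return mu - half, mu + half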

4. Numerical results

Following Li & Hsing (2010), we consider the model

Y_{ij} = \mu(X_{ij}) + \sum_{k=1}^{3} \alpha_{ik} \Phi_k(X_{ij}) + \sigma \varepsilon_{ij} \quad (i = 1, \ldots, n;\ j = 1, \ldots, n_i),

where αik ~ N(0, ωk) and εij ~ N(0, 1). Let μ(x) = 5(x − 0·6)², Φ1(x) = 1, Φ2(x) = √2 sin(2πx), Φ3(x) = √2 cos(2πx), (ω1, ω2, ω3) = (0·6, 0·3, 0·1) and n = 200. Then the variance function is γ(x, x) = 0·6 + 0·6 sin²(2πx) + 0·2 cos²(2πx). Two noise levels, σ = 1 and σ = 2, are considered. The design points Xij are uniformly distributed on [0, 1]. For the vector N = (n1, …, nn) of numbers of measurements on individual subjects, we consider four cases:

N_1: n_i \sim U[\{2, 3, \ldots, 8\}]; \quad N_2: n_i \sim U[\{15, 16, \ldots, 35\}]; (17)
N_3: n_i \sim U[\{30, 31, \ldots, 70\}]; \quad N_4: n_i \sim U[\{150, 151, \ldots, 250\}]. (18)

Here U[𝒟] stands for the discrete uniform distribution on a finite set 𝒟.
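For reference, one replication of this design can be generated as follows; this is a sketch, and the integer bounds lo and hi (set to case N1 by default) are our naming, not the paper's.

    def simulate(n=200, lo=2, hi=8, sigma=1.0, seed=0):
        # One data set from the simulation model: mu(x) = 5(x - 0.6)^2 plus
        # three random components with variances (0.6, 0.3, 0.1) and noise.
        rng = np.random.default_rng(seed)
        X, Y = [], []
        for _ in range(n):
            ni = rng.integers(lo, hi + 1)            # n_i ~ U[{lo, ..., hi}]
            Xi = np.sort(rng.uniform(0.0, 1.0, ni))  # uniform design points
            a = rng.normal(0.0, np.sqrt([0.6, 0.3, 0.1]))
            traj = (a[0] + a[1] * np.sqrt(2) * np.sin(2 * np.pi * Xi)
                    + a[2] * np.sqrt(2) * np.cos(2 * np.pi * Xi))
            X.append(Xi)
            Y.append(5 * (Xi - 0.6) ** 2 + traj + sigma * rng.normal(size=ni))
        return X, Y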

We compare six confidence intervals: the two self-normalization based confidence intervals (14) and (16), the latter with 200 permutations; the asymptotic normality based confidence intervals (12) and (13), assuming sparse and dense data respectively; the bootstrap confidence interval with 200 bootstrap replications obtained by sampling subjects with replacement; and the confidence interval

\hat{\mu}_n(x) - b^2 \hat{\rho}(x) \pm z_{1-\alpha/2}\, n^{-1/2} \left\{ (1 - \hat{\tau})\, \gamma(x, x) + \hat{\tau} \psi_K\, \frac{\gamma(x, x) + \sigma^2(x)}{b f(x)} \right\}^{1/2}. (19)

The confidence interval (19) is practically infeasible, as the unknown functions would have to be estimated. Nevertheless, because it uses the true theoretical limiting variance function in (9), (19) serves as a benchmark against which the performance of the other confidence intervals can be measured. When using the local linear method of Li & Hsing (2010) to estimate γ(x, x), we found that negative estimates of γ(x, x) occur frequently, especially when the noise level σ is high. For the purpose of comparison, we therefore use the true functions γ(x, x), σ²(x) and f(x) to implement (12)–(13).

We consider two criteria: empirical coverage probability and length of the confidence interval. Let x1 < ⋯ < x20 be 20 grid points evenly spaced on [0·1, 0·9]. For each xj and a given nominal level, we construct confidence intervals for μ(xj) and compute the empirical coverage probabilities based on 1000 replications. For each of the six confidence intervals, we average the empirical coverage probabilities and lengths over the 20 grid points. To ease the computational burden of bandwidth selection, instead of applying (15) to each replication, we set b to be the average of the 20 optimal bandwidths from (15) based on 20 replications for each set of parameter choices.

The results are presented in Table 1. The performance of the confidence intervals (12)–(13) depends on whether the data are sparse or dense. As we increase the number of measurements per subject from the sparse setting N1 to the dense setting N4, (12), built under the sparse assumption, performs increasingly poorly, whereas (13), built under the dense assumption, performs increasingly well. The simulation study thus confirms the theoretical finding of Theorem 1 that (12)–(13) perform well only under their respective sparse or dense assumptions. By contrast, the self-normalization based confidence intervals (14) and (16) deliver robust and superior performance: (i) they have widths similar to the bootstrap confidence interval but slightly better coverage probabilities; and (ii) they perform similarly to the infeasible confidence interval (19) based on the true functions. Finally, (14) and (16) have comparable performance.

Table 1.

Average empirical coverage percentages and lengths, in brackets, of six confidence intervals.

1 − α   σ   N    SN1           SN2           NS            ND            NSD           BS
90%     1   N1   88·1 (0·381)  88·6 (0·386)  82·8 (0·331)  66·5 (0·236)  89·2 (0·389)  87·4 (0·377)
90%     1   N2   88·9 (0·288)  89·2 (0·290)  68·0 (0·178)  81·0 (0·236)  89·4 (0·292)  88·1 (0·284)
90%     1   N3   89·8 (0·262)  89·8 (0·263)  57·8 (0·126)  86·3 (0·236)  90·3 (0·265)  89·1 (0·258)
90%     1   N4   88·4 (0·246)  88·5 (0·247)  37·3 (0·076)  86·9 (0·236)  88·6 (0·247)  87·5 (0·242)
90%     2   N1   88·8 (0·528)  89·3 (0·534)  86·5 (0·497)  51·7 (0·236)  89·4 (0·537)  87·8 (0·523)
90%     2   N2   88·6 (0·330)  88·7 (0·332)  75·8 (0·243)  74·3 (0·236)  89·3 (0·335)  87·9 (0·326)
90%     2   N3   89·5 (0·293)  89·4 (0·294)  69·5 (0·183)  81·1 (0·236)  90·1 (0·297)  88·8 (0·289)
90%     2   N4   88·4 (0·257)  88·6 (0·257)  48·6 (0·106)  85·1 (0·236)  88·7 (0·258)  87·6 (0·252)
95%     1   N1   93·6 (0·454)  94·0 (0·460)  89·7 (0·394)  75·2 (0·281)  94·6 (0·464)  92·9 (0·446)
95%     1   N2   94·1 (0·343)  94·3 (0·345)  76·4 (0·212)  88·1 (0·281)  94·7 (0·347)  93·4 (0·335)
95%     1   N3   95·0 (0·312)  95·1 (0·314)  66·0 (0·150)  92·1 (0·281)  95·3 (0·316)  94·0 (0·305)
95%     1   N4   94·2 (0·293)  94·3 (0·294)  43·7 (0·090)  92·9 (0·281)  94·3 (0·294)  93·1 (0·286)
95%     2   N1   94·2 (0·629)  94·4 (0·636)  92·6 (0·592)  59·7 (0·281)  94·8 (0·640)  93·2 (0·618)
95%     2   N2   93·9 (0·393)  94·0 (0·395)  83·6 (0·289)  82·3 (0·281)  94·2 (0·399)  93·0 (0·385)
95%     2   N3   94·7 (0·349)  94·8 (0·351)  77·9 (0·219)  88·0 (0·281)  95·1 (0·354)  93·8 (0·341)
95%     2   N4   94·1 (0·306)  94·1 (0·307)  56·4 (0·127)  91·6 (0·281)  94·2 (0·308)  93·0 (0·298)

SN1 and SN2: the self-normalized confidence intervals (14) and (16), the latter with 200 permutations; NS and ND: the asymptotic normality based confidence intervals (12) and (13), assuming sparse and dense data, respectively; NSD: the infeasible confidence interval (19); BS: bootstrap confidence interval; N1–N4: the numbers of measurements on individual subjects in (17)–(18).

Acknowledgements

We are grateful to Editor Anthony Davison and three referees for their constructive comments. This work was supported by a National Institute on Drug Abuse grant. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Drug Abuse or the National Institutes of Health.

Appendix: Regularity conditions and proofs

Assumption 1. (i) K(·) is bounded, symmetric, and has bounded support and bounded derivative. (ii) {υi(·)}i, {Xij}ij and {εij}ij are independent and identically distributed and mutually independent. Furthermore, the density function f(·) of Xij is twice continuously differentiable in a neighborhood of x and f(x) > 0. (iii) In a neighborhood of x, μ(·) is twice continuously differentiable and σ²(·) is continuously differentiable; in a neighborhood of (x, x), γ(x, x′) = cov{υi(x), υi(x′)} is continuously differentiable. Moreover, γ(x, x) > 0 and σ²(x) > 0. (iv) The map x ↦ E{|υi(x) + σ(x)εij|⁴} is continuous in a neighborhood of x and E{|υi(x) + σ(x)εij|⁴} < ∞.

Proof of Theorem 1. Let ξi be defined in (6). Recall the decomposition (5). Write

H_n = \sum_{i=1}^{n} \nu_i, \quad \nu_i = \frac{1}{n_i} \sum_{j=1}^{n_i} \nu_{ij}, \quad \nu_{ij} = K\!\left(\frac{X_{ij} - x}{b}\right), (20)
I_n = \sum_{i=1}^{n} \zeta_i, \quad \zeta_i = \frac{1}{n_i} \sum_{j=1}^{n_i} \zeta_{ij}, \quad \zeta_{ij} = \{\mu(X_{ij}) - \mu(x)\}\, K\!\left(\frac{X_{ij} - x}{b}\right). (21)

By the symmetry of K and Taylor expansion, E(νij) = {1 + O(b²)}bf(x), var(νij) = O(b), E(ζij) = b³f(x)ρ(x) + o(b³) and var(ζij) = O(b³). In either the sparse or the dense case, E(νi | ni) = E(νij) is non-random. Thus, var(νi) = E{var(νi | ni)} = var(νij)E(1/ni) and var(Hn) = Σ_{i=1}^{n} var(νi) = O(b) Σ_{i=1}^{n} E(1/ni). Write τn = n−1 Σ_{i=1}^{n} E(1/ni). Then

H_n = E(H_n) + O_p[\{\mathrm{var}(H_n)\}^{1/2}] = [1 + O_p\{b^2 + (nb/\tau_n)^{-1/2}\}]\, nbf(x). (22)

Similarly, In = nb³f(x)ρ(x) + o(nb³) + Op{(nb³τn)^{1/2}}. Thus,

I_n/H_n = b^2 \rho(x) + \delta_n, \quad \delta_n = o_p(b^2) + O_p\{(b\tau_n/n)^{1/2}\}. (23)

Dense case: under the given conditions, δn = op(n−1/2) and {nb²f²(x)}−1 var(Σ_{i=1}^{n} ξi) → γ(x, x). For distinct j, r, s, k, by the argument leading to (7), E(ξijξirξisξik) = O(b⁴), E(ξij²ξirξis) = O(b³), E(ξij²ξir²) = O(b²), E(ξij³ξir) = O(b²) and E(ξij⁴) = O(b). Thus Σ_{i=1}^{n} E(ξi⁴) = O(nb⁴) = o{(bn^{1/2})⁴}. By the Lyapunov central limit theorem, Σ_{i=1}^{n} ξi/{bn^{1/2}f(x)} → N{0, γ(x, x)}.

Sparse case: in (5), ξ1, …, ξn are independent and identically distributed. The result follows from δn = op{(nb)−1/2} and var(ξi) = E{var(ξi | ni)} ≈ bτψK f(x){γ(x, x) + σ²(x)}.

Proof of Theorem 2. By Theorem 1, it suffices to show nUn²(x) → s²dense(x) in probability for dense data, or nbUn²(x) → s²sparse(x) in probability for sparse data. For convenience, write Kij = K{(Xij − x)/b}. Let

S_n = \sum_{i=1}^{n} \left[ \frac{1}{n_i} \sum_{j=1}^{n_i} \{Y_{ij} - \hat{\mu}_n(X_{ij})\}\, K_{ij} \right]^2 = \sum_{i=1}^{n} (\xi_i^2 + \eta_i^2 + 2\xi_i \eta_i), (24)

where ξi is defined in (6) and ηi = ni−1 Σ_{j=1}^{ni} {μ(Xij) − μ̂(Xij)}Kij. By Theorem 3.1 of Li & Hsing (2010), |μ̂(z) − μ(z)| = Op(ℓn) uniformly for z in a neighborhood of x, where ℓn = b² + (n/log n)−1/2 for dense data or ℓn = b² + (nb/log n)−1/2 for sparse data. Then ηi = Op(ℓn) ni−1 Σ_{j=1}^{ni} |Kij|. Using ξi = ni−1 Σ_{j=1}^{ni} ξij, where ξij is defined in (6), we obtain

\sum_{i=1}^{n} |\eta_i^2 + 2\xi_i \eta_i| = O_p(\ell_n^2) \sum_{i=1}^{n} \left( \frac{1}{n_i} \sum_{j=1}^{n_i} |K_{ij}| \right)^{2} + O_p(\ell_n) \sum_{i=1}^{n} \frac{1}{n_i^2} \sum_{j=1}^{n_i} |\xi_{ij}| \sum_{j'=1}^{n_i} |K_{ij'}| \le O_p(\ell_n) J_n,
J_n = \sum_{i=1}^{n} \frac{1}{n_i} \sum_{j=1}^{n_i} K_{ij}^2 + \sum_{i=1}^{n} \frac{1}{n_i^2} \sum_{j=1}^{n_i} \sum_{j'=1}^{n_i} (K_{ij'}^2 + \xi_{ij}^2).

Here we have used ℓn² = o(ℓn), (Σ_{j=1}^{ni} |Kij|)² ≤ ni Σ_{j=1}^{ni} Kij² and 2|Kij′ξij| ≤ Kij′² + ξij². Since E(Kij²) = O(b) and E(ξij²) = O(b), E(Jn) = O(nb). Thus, Σ_{i=1}^{n} |ηi² + 2ξiηi| = Op(nbℓn). By (24) and the independence of ξ1, …, ξn,

S_n = \sum_{i=1}^{n} E(\xi_i^2) + O_p(\chi_n), \quad \chi_n = \left\{ \sum_{i=1}^{n} \mathrm{var}(\xi_i^2) \right\}^{1/2} + nb\ell_n.

From the proof of Theorem 1, {nb²f²(x)}−1 Σ_{i=1}^{n} E(ξi²) → s²dense(x) for dense data or {nbf²(x)}−1 Σ_{i=1}^{n} E(ξi²) → s²sparse(x) for sparse data. By (22), Hn = {1 + op(1)}nbf(x). Thus, it remains to show χn = o(nb²) for dense data or χn = o(nb) for sparse data. In the dense case, by the proof of Theorem 1, Σ_{i=1}^{n} var(ξi²) ≤ Σ_{i=1}^{n} E(ξi⁴) = O(nb⁴), and consequently χn = O{n^{1/2}b² + nb³ + b(n log n)^{1/2}} = o(nb²). In the sparse case, by the argument for the dense case, E(ξi⁴ | ni) = O(1)ni−4(ni⁴b⁴ + ni³b³ + ni²b² + nib), so E(ξi⁴) = E{E(ξi⁴ | ni)} = O(b), and thus χn = O{(nb)^{1/2} + nb³ + (nb log n)^{1/2}} = o(nb).

Proof of Theorem 3. Recall ssparse(x) and sdense(x) from Theorem 1. Let Γn = nbf(x)/Λn, where Λn = (nb)^{1/2}f(x)ssparse(x) for sparse data or Λn = bn^{1/2}f(x)sdense(x) for dense data. Suppose we can show the weak convergence

[\Gamma_n t\, \{\hat{\mu}_{\lfloor nt \rfloor}(x) - \mu(x) - b^2 \rho(x)\}]_{c \le t \le 1} \to \{B_t\}_{c \le t \le 1}. (25)

For convenience, write ℒ2(g) = {∫_c^1 |g(t)|² dt}^{1/2} and suppress the argument x. By (25) and the continuous mapping theorem, (μ̂n − μ − b²ρ)/ℒ2{t(μ̂⌊nt⌋ − μ̂n)} → B1/ℒ2(Bt − tB1). By |n−1⌊nt⌋ − t| ≤ n−1 for t ∈ [c, 1], ℒ2{t(μ̂⌊nt⌋ − μ̂n)} is asymptotically equivalent to ℒ2{n−1⌊nt⌋(μ̂⌊nt⌋ − μ̂n)} = Vn(x), where Vn(x) is defined in Tn(x). This completes the proof.

It remains to show (25). Recall νi and ζi in (20)–(21). As in (3) and (5),

\hat{\mu}_{\lfloor nt \rfloor}(x) - \mu(x) - \frac{1}{H_{\lfloor nt \rfloor}} \sum_{i=1}^{\lfloor nt \rfloor} \zeta_i = \frac{W_n(t)}{H_{\lfloor nt \rfloor}}, \quad H_{\lfloor nt \rfloor} = \sum_{i=1}^{\lfloor nt \rfloor} \nu_i, \quad W_n(t) = \sum_{i=1}^{\lfloor nt \rfloor} \xi_i.

By Kolmogorov’s maximal inequality for independent random variables,

\sup_{c \le t \le 1} |H_{\lfloor nt \rfloor} - E(H_{\lfloor nt \rfloor})| = \max_{\lfloor cn \rfloor \le m \le n} |H_m - E(H_m)| = O_p\!\left[ \left\{ \sum_{i=1}^{n} \mathrm{var}(\nu_i) \right\}^{1/2} \right].

Thus, similarly to (22), H⌊nt⌋ = [1 + Op{b² + (nb/τn)−1/2}]⌊nt⌋bf(x) uniformly in c ≤ t ≤ 1. Applying the same argument to (23) gives Σ_{i=1}^{⌊nt⌋} ζi/H⌊nt⌋ = b²ρ(x) + δn uniformly, where δn is defined in (23). Thus it suffices to show {Wn(t)/Λn}c≤t≤1 → {Bt}c≤t≤1. The finite-dimensional convergence follows from the argument in the proof of Theorem 1 and the Cramér–Wold device. It remains to prove tightness. Let c ≤ t < t′ ≤ 1. By independence,

\Delta_n(t, t') = E\left\{ \frac{W_n(t')}{\Lambda_n} - \frac{W_n(t)}{\Lambda_n} \right\}^{4} = \frac{1}{\Lambda_n^4} \left\{ \sum_{i=\lfloor nt \rfloor + 1}^{\lfloor nt' \rfloor} E(\xi_i^4) + 6 \sum_{\lfloor nt \rfloor + 1 \le i < k \le \lfloor nt' \rfloor} E(\xi_i^2)\, E(\xi_k^2) \right\}.

By the argument in the proof of Theorem 1, in the dense case E(ξi²) = O(b²) and E(ξi⁴) = O(b⁴), so that Δn(t, t′) = O{|t − t′|/n + |t − t′|²}; in the sparse case E(ξi⁴) = O(b) and E(ξi²) = O(b), so that Δn(t, t′) = O{|t − t′|/(nb) + |t − t′|²}. This proves the tightness.

Contributor Information

Seonjin Kim, Email: szk172@psu.edu.

Zhibiao Zhao, Email: zuz13@stat.psu.edu.

References

  1. De la Peña VH, Lai TL, Shao QM. Self-Normalized Processes. New York: Springer; 2009.
  2. Fitzmaurice GM, Laird NM, Ware JM. Applied Longitudinal Analysis. New Jersey: Wiley; 2004.
  3. Hall P, Müller HG, Wang JL. Properties of principal component methods for functional and longitudinal data analysis. Ann. Statist. 2006;34:1493–1517.
  4. Kiefer NM, Vogelsang TJ. A new asymptotic theory for heteroskedasticity-autocorrelation robust tests. Economet. Theory. 2005;21:1130–1164.
  5. Li Y, Hsing T. Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data. Ann. Statist. 2010;38:3321–3351.
  6. Lobato IN. Testing that a dependent process is uncorrelated. J. Am. Statist. Assoc. 2001;96:1066–1076.
  7. Ma S, Yang L, Carroll RJ. A simultaneous confidence band for sparse longitudinal regression. Statist. Sinica. 2012;22:95–122. doi: 10.5705/ss.2010.034.
  8. Müller HG. Functional modeling and classification of longitudinal data. Scand. J. Statist. 2005;32:223–240.
  9. Ramsay JO, Silverman BW. Functional Data Analysis. New York: Springer; 2005.
  10. Rice JA, Silverman BW. Estimating the mean and covariance structure nonparametrically when the data are curves. J. R. Statist. Soc. B 1991;53:233–243.
  11. Shao X. A self-normalized approach to confidence interval construction in time series. J. R. Statist. Soc. B 2010;72:343–366.
  12. Wu H, Zhang JT. Local polynomial mixed-effects models for longitudinal data. J. Am. Statist. Assoc. 2002;97:883–897.
  13. Wu H, Zhang JT. Nonparametric Regression Methods for Longitudinal Data Analysis: Mixed-Effects Modeling Approaches. New Jersey: Wiley; 2006.
  14. Yao F. Asymptotic distributions of nonparametric regression estimators for longitudinal or functional data. J. Mult. Anal. 2007;98:40–56.
  15. Yao F, Müller HG, Wang JL. Functional linear regression analysis for longitudinal data. Ann. Statist. 2005a;33:2873–2903.
  16. Yao F, Müller HG, Wang JL. Functional data analysis for sparse longitudinal data. J. Am. Statist. Assoc. 2005b;100:577–590.
  17. Zhang JT, Chen J. Statistical inferences for functional data. Ann. Statist. 2007;35:1052–1079.
