Simultaneous Inference For The Mean Function Based on Dense Functional Data

Guanqun Cao; Lijian Yang; David Todem

doi:10.1080/10485252.2011.638071

Author manuscript; available in PMC: 2013 Jun 1.

Published in final edited form as: J Nonparametr Stat. 2012 Apr 30;24(2):359–377. doi: 10.1080/10485252.2011.638071

PMCID: PMC3365609 NIHMSID: NIHMS337849 PMID: 22665964

Abstract

A polynomial spline estimator is proposed for the mean function of dense functional data, together with a simultaneous confidence band that is asymptotically correct. In addition, the spline estimator and its accompanying confidence band enjoy oracle efficiency in the sense that they are asymptotically the same as if all random trajectories were observed entirely and without errors. The confidence band is also extended to the difference of the mean functions of two populations of functional data. Simulation experiments provide strong evidence that corroborates the asymptotic theory and show that the computation is efficient. The confidence band procedure is illustrated by an analysis of near infrared spectroscopy data.

Keywords: B-spline, confidence band, functional data, Karhunen-Loève L² representation, oracle efficiency

AMS Subject Classification: Primary 62M10, Secondary 62G08

1. Introduction

In functional data analysis problems, estimation of mean functions is the fundamental first step; see Cardot (2000); Rice and Wu (2001); Cuevas, Febrero and Fraiman (2006); Ferraty and Vieu (2006); Degras (2011) and Ma, Yang and Carroll (2011) for example. According to Ramsay and Silverman (2005), functional data consist of a collection of iid realizations $\{η_i(x)\}_{i=1}^{n}$ of a smooth random function η(x), with unknown mean function Eη(x) = m(x) and covariance function G(x, x′) = cov{η(x), η(x′)}. Although the domain of η(·) is an entire interval 𝒳, each random curve η_i(x) is recorded only at a finite number N_i of points in 𝒳, and the recordings are contaminated with measurement errors. Without loss of generality, we take 𝒳 = [0, 1].

Denote by Y_ij the j-th observation of the random curve η_i(·) at time point X_ij, 1 ≤ i ≤ n, 1 ≤ j ≤ N_i. Although we refer to X_ij as time, it could also be another numerical measure, such as wavelength in Section 6. In this paper, we examine the equally spaced dense design, that is, X_ij = j/N, 1 ≤ i ≤ n, 1 ≤ j ≤ N, with N going to infinity. For the i-th subject, i = 1, 2, …, n, the sample path {j/N, Y_ij} is a noisy realization of the continuous-time stochastic process η_i(x) in the sense that Y_ij = η_i(j/N) + σ(j/N)ε_ij, with errors ε_ij satisfying $E(ε_{ij}) = 0$, $E(ε_{ij}^2) = 1$, and {η_i(x), x ∈ [0, 1]} are iid copies of the process {η(x), x ∈ [0, 1]}, which is L², i.e., $E\int_{[0,1]} η^2(x)\,dx < +\infty$.

For the standard process {η(x), x ∈ [0, 1]}, let the sequences $\{λ_k\}_{k=1}^{\infty}$ and $\{ψ_k(x)\}_{k=1}^{\infty}$ be the eigenvalues and eigenfunctions of G(x, x′), respectively, in which λ₁ ≥ λ₂ ≥ ⋯ ≥ 0, $\sum_{k=1}^{\infty} λ_k < \infty$, $\{ψ_k\}_{k=1}^{\infty}$ form an orthonormal basis of L²([0, 1]) and $G(x, x') = \sum_{k=1}^{\infty} λ_k ψ_k(x) ψ_k(x')$, which implies that $\int G(x, x') ψ_k(x')\,dx' = λ_k ψ_k(x)$. The process {η_i(x), x ∈ [0, 1]} admits the Karhunen-Loève L² representation $η_i(x) = m(x) + \sum_{k=1}^{\infty} ξ_{ik} ϕ_k(x)$, where the random coefficients ξ_ik are uncorrelated with mean 0 and variance 1, and $ϕ_k = \sqrt{λ_k}\,ψ_k$. In what follows, we assume that λ_k = 0 for k > κ, where κ is a positive integer or ∞; thus $G(x, x') = \sum_{k=1}^{κ} ϕ_k(x) ϕ_k(x')$ and the model that we consider is

Y_{ij} = m (j / N) + \sum_{k = 1}^{κ} ξ_{ik} ϕ_{k} (j / N) + σ (j / N) ε_{ij} .

(1)

Although the sequences $\{λ_k\}_{k=1}^{κ}$, $\{ϕ_k(\cdot)\}_{k=1}^{κ}$ and the random coefficients ξ_ik exist mathematically, the sequences are unknown and the coefficients are unobservable.

The existing literature focuses on two data types. Yao, Müller and Wang (2005) studied sparse longitudinal data, for which N_i, the number of observations on the i-th curve, is bounded and follows a given distribution; in this setting Ma, Yang and Carroll (2011) obtained an asymptotic simultaneous confidence band for the mean function of the functional data using piecewise constant spline estimation. Li and Hsing (2010a) established uniform convergence rates for local linear estimation of the mean and covariance functions of dense functional data, where $\min_{1 \le i \le n} N_i \gg (n/\log n)^{1/4}$ as n → ∞, similar to our Assumption (A3), but did not provide the asymptotic distribution of the maximal deviation or a simultaneous confidence band. Degras (2011) built an asymptotically correct simultaneous confidence band for dense functional data using a local linear estimator. Bunea, Ivanescu and Wegkamp (2011) proposed an asymptotically conservative, rather than correct, confidence set for the mean function of Gaussian functional data.

In this paper, we propose a polynomial spline confidence band for the mean function based on dense functional data. In function estimation problems, a simultaneous confidence band is an important tool for quantifying the variability of the estimated mean curve; see Zhao and Wu (2008); Zhou, Shen and Wolfe (1998) and Zhou and Wu (2010) for related theory and applications. That simultaneous confidence bands have not been widely used in functional data analysis is certainly not due to a lack of interesting applications, but to the greater technical difficulty of formulating such bands for functional data and establishing their theoretical properties. In this work, we establish the asymptotic correctness of the proposed confidence band using various properties of spline smoothing. The spline estimator and the accompanying confidence band are asymptotically the same as if all the n random curves were recorded over the entire interval without measurement errors. They are thus oracle efficient despite the use of spline smoothing; see Remark 1. This provides partial theoretical justification for treating functional data as perfectly recorded random curves over the entire data range, as in Ferraty and Vieu (2006). Theorem 3 of Hall, Müller and Wang (2006) stated mean square (rather than the stronger uniform) oracle efficiency for local linear estimation of eigenfunctions and eigenvalues (rather than the mean function), under assumptions similar to ours, but provided only an outline of the proof. Among the existing works on functional data analysis, Ma, Yang and Carroll (2011) proposed a simultaneous confidence band for sparse functional data. However, their result does not enjoy the oracle efficiency stated in Theorem 2.1, since there are not enough observations for each subject to obtain a good estimate of the individual trajectories. As a result, their estimator has the slow nonparametric convergence rate $n^{-1/3}\log n$, instead of the parametric rate $n^{-1/2}$ obtained in this paper. This essential difference completely separates dense functional data from sparse ones.

The aforementioned confidence band is also extended to the difference of two regression functions. This is motivated by Li and Yu (2008), who applied functional segment discriminant analysis to a Tecator data set; see Figure 3. In this data set, each observation (a meat sample) consists of a 100-channel absorbance spectrum over a range of wavelengths, together with its fat, water and protein percentages. Li and Yu (2008) used the spectra to predict whether the fat percentage is greater than 20%. Here, instead, we are interested in building a 100(1 − α)% confidence band for the difference between the regression functions of the spectra of the group with less than 20% fat and the group with more than 20% fat. If this 100(1 − α)% confidence band covers the zero line, one does not reject the null hypothesis of no difference between the two groups at level α, i.e., the p-value is no less than α. Tests for equality between two groups of curves based on the adaptive Neyman test and wavelet thresholding techniques were proposed in Fan and Lin (1998), who provided neither an estimator of the difference of the two mean functions nor a simultaneous confidence band for such an estimator. As a result, their test does not extend to testing other important hypotheses on the difference of the two mean functions, while our Theorem 2.3 provides a benchmark for all such testing. More recently, Benko, Härdle and Kneip (2009) developed two-sample bootstrap tests for the equality of eigenfunctions, eigenvalues and mean functions by using common functional principal components and bootstrap tests.

Figure 3. Left: plot of the Tecator data. Right: sample curves for the Tecator data; each class has 10 sample curves. Dashed lines represent spectra with fat > 20% and solid lines represent spectra with fat < 20%.

The paper is organized as follows. Section 2 states the main theoretical results on confidence bands constructed from polynomial splines. Section 3 provides further insights into the error structure of spline estimators. The actual steps to implement the confidence bands are provided in Section 4. A simulation study is presented in Section 5, and an empirical illustration of how to use the proposed spline confidence band for inference is reported in Section 6. Technical proofs are collected in the Appendix.

2. Main results

For any Lebesgue measurable function ϕ on [0, 1], denote $\|ϕ\|_\infty = \sup_{x \in [0,1]} |ϕ(x)|$. For any ν ∈ (0, 1] and nonnegative integer q, let $C^{q,ν}[0, 1]$ be the space of functions with ν-Hölder continuous q-th order derivatives on [0, 1], i.e.,

C^{q,ν}[0, 1] = \left\{ ϕ : \|ϕ\|_{q,ν} = \sup_{t \neq s,\ t, s \in [0,1]} \frac{|ϕ^{(q)}(t) - ϕ^{(q)}(s)|}{|t - s|^{ν}} < +\infty \right\} .

To describe the spline functions, we first introduce a sequence of equally spaced points $\{t_J\}_{J=1}^{N_m}$, called interior knots, which divide the interval [0, 1] into (N_m + 1) equal subintervals $I_J = [t_J, t_{J+1})$, J = 0, …, N_m − 1, and $I_{N_m} = [t_{N_m}, 1]$. For any positive integer p, introduce left boundary knots t_{1−p}, …, t₀ and right boundary knots t_{N_m+1}, …, t_{N_m+p},

\begin{aligned} & t_{1-p} = \dots = t_0 = 0 < t_1 < \dots < t_{N_m} < 1 = t_{N_m+1} = \dots = t_{N_m+p}, \\ & t_J = J h_m, \quad 0 \le J \le N_m + 1, \quad h_m = 1/(N_m + 1), \end{aligned}

in which h_m is the distance between neighboring knots. Denote by ℋ^(p−2) the space of p-th order splines, i.e., p − 2 times continuously differentiable functions on [0, 1] that are polynomials of degree p − 1 on [t_J, t_{J+1}], J = 0, …, N_m. Then $ℋ^{(p-2)} = \left\{ \sum_{J=1-p}^{N_m} b_{J,p} B_{J,p}(x) : b_{J,p} \in ℛ,\ x \in [0, 1] \right\}$, where B_{J,p} is the J-th B-spline basis function of order p as defined in de Boor (2001).
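To make the knot sequence and the basis concrete, here is a minimal sketch (an illustration, not code from the paper) that builds the knots t_{1−p}, …, t_{N_m+p} displayed above and evaluates the N_m + p B-splines of order p with the standard Cox–de Boor recursion. The function names `knot_vector` and `bspline_basis` are hypothetical.

```python
import numpy as np


def knot_vector(n_interior, p):
    """Knots t_{1-p}, ..., t_{N_m+p}: p-fold knots at 0 and 1 plus the
    N_m equally spaced interior knots t_J = J * h_m, h_m = 1 / (N_m + 1)."""
    h = 1.0 / (n_interior + 1)
    interior = h * np.arange(1, n_interior + 1)
    return np.concatenate([np.zeros(p), interior, np.ones(p)])


def bspline_basis(x, knots, p):
    """Values of all order-p (degree p - 1) B-splines at the points x,
    computed by the Cox-de Boor recursion; shape (len(x), N_m + p)."""
    x = np.asarray(x, dtype=float)
    t = knots
    # Order 1: indicators of [t_i, t_{i+1}); zero-length intervals stay 0.
    B = np.zeros((x.size, len(t) - 1))
    for i in range(len(t) - 1):
        if t[i] < t[i + 1]:
            B[:, i] = (x >= t[i]) & (x < t[i + 1])
    # Close the last nondegenerate interval on the right so x = 1 is covered.
    last = np.flatnonzero(t[:-1] < t[1:])[-1]
    B[x == t[-1], last] = 1.0
    # Raise the order from 2 up to p; terms with a zero denominator
    # (repeated boundary knots) are dropped, i.e. treated as 0.
    for k in range(2, p + 1):
        B_next = np.zeros((x.size, len(t) - k))
        for i in range(len(t) - k):
            left = t[i + k - 1] - t[i]
            right = t[i + k] - t[i + 1]
            if left > 0:
                B_next[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                B_next[:, i] += (t[i + k] - x) / right * B[:, i + 1]
        B = B_next
    return B


# Cubic splines (p = 4) with N_m = 5 interior knots give N_m + p = 9 functions.
B = bspline_basis(np.linspace(0, 1, 101), knot_vector(5, 4), 4)
print(B.shape, np.allclose(B.sum(axis=1), 1.0))
```

The row sums equal one (the partition-of-unity property of B-splines on clamped knots), which makes a convenient quick check of any implementation.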

We propose to estimate the mean function m(x) by

\hat m_p(x) = \underset{g(\cdot) \in ℋ^{(p-2)}}{\arg\min} \sum_{i=1}^{n} \sum_{j=1}^{N} \{Y_{ij} - g(j/N)\}^2 .

(2)

The technical assumptions we need are as follows:

(A1)
The regression function m ∈ C^{p−1,1}[0, 1], i.e., m^{(p−1)} ∈ C^{0,1}[0, 1].
(A2)
The standard deviation function σ(x) ∈ C^{0, μ} [0, 1] for some μ ∈ (0, 1].
(A3)
As n → ∞, $N^{-1} n^{1/(2p)} \to 0$ and $N = O(n^{θ})$ for some θ > 1/(2p); the number of interior knots N_m satisfies $N N_m^{-1} \to \infty$, $N_m^{-p} n^{1/2} \to 0$, $N^{-1/2} N_m^{1/2} \log n \to 0$, or equivalently $N h_m \to \infty$, $h_m^{p} n^{1/2} \to 0$, $N^{-1/2} h_m^{-1/2} \log n \to 0$.
(A4)
There exists C_G > 0 such that G(x, x) ≥ C_G for x ∈ [0, 1]; for k ∈ {1, …, κ}, ϕ_k(x) ∈ C^{0,μ}[0, 1]; $\sum_{k=1}^{κ} \|ϕ_k\|_\infty < \infty$; and, as n → ∞, $h_m^{μ} \sum_{k=1}^{κ_n} \|ϕ_k\|_{0,μ} = o(1)$ for a sequence $\{κ_n\}_{n=1}^{\infty}$ of increasing integers with $\lim_{n \to \infty} κ_n = κ$, where the constant μ ∈ (0, 1] is as in Assumption (A2). In particular, $\sum_{k=κ_n+1}^{κ} \|ϕ_k\|_\infty = o(1)$.
(A5)
There are constants C₁, C₂ ∈ (0, +∞), γ₁, γ₂ ∈ (1, +∞), β ∈ (0, 1/2) and iid N(0, 1) variables $\{Z_{ik,ξ}\}_{i=1,k=1}^{n,κ}$, $\{Z_{ij,ε}\}_{i=1,j=1}^{n,N}$ such that

\max_{1 \le k \le κ} P\left[ \max_{1 \le t \le n} \left| \sum_{i=1}^{t} ξ_{ik} - \sum_{i=1}^{t} Z_{ik,ξ} \right| > C_1 n^{β} \right] < C_2 n^{-γ_1},

(3)

P\left\{ \max_{1 \le j \le N} \max_{1 \le t \le n} \left| \sum_{i=1}^{t} ε_{ij} - \sum_{i=1}^{t} Z_{ij,ε} \right| > C_1 n^{β} \right\} < C_2 n^{-γ_2} .

(4)

Assumptions (A1)–(A2) are typical for spline smoothing; see Huang and Yang (2004), Xue and Yang (2006) and Wang and Yang (2009a). Assumption (A3) concerns the number of observations for each subject and the number of knots of the B-splines. Assumption (A4) ensures that the principal components have collectively bounded smoothness. Assumption (A5) provides a Gaussian approximation of the estimation error process, and is implied by the following elementary assumption:

(A5’)
There exist η₁ > 4 and η₂ > 4 + 2θ such that $E|ξ_{ik}|^{η_1} + E|ε_{ij}|^{η_2} < +\infty$ for 1 ≤ i < ∞, 1 ≤ k ≤ κ, 1 ≤ j < ∞. Either the number κ of nonzero eigenvalues is finite, or κ is infinite and the variables $\{ξ_{ik}\}_{1 \le i < \infty, 1 \le k < \infty}$ are iid.

Degras (2011) makes a restrictive assumption (A.2) on the Hölder continuity of the stochastic process $η (x) = m (x) + \sum_{k = 1}^{\infty} ξ_{k} ϕ_{k} (x)$ . It is elementary to construct examples where our Assumptions (A4) and (A5) are satisfied while assumption (A.2) of Degras (2011) is not.

The part of Assumption (A4) on the ϕ_k's holds trivially if κ is finite and all ϕ_k(x) ∈ C^{0,μ}[0, 1]. Note also that, by definition, $ϕ_k = \sqrt{λ_k}\,ψ_k$, $\|ϕ_k\|_\infty = \sqrt{λ_k}\,\|ψ_k\|_\infty$ and $\|ϕ_k\|_{0,μ} = \sqrt{λ_k}\,\|ψ_k\|_{0,μ}$, in which $\{ψ_k\}_{k=1}^{\infty}$ form an orthonormal basis of L²([0, 1]); hence Assumption (A4) is fulfilled for κ = ∞ as long as λ_k decreases to zero sufficiently fast. Following a referee's suggestion, we provide the following example. Take $λ_k = ρ^{2[k/2]}$, k = 1, 2, …, for any ρ ∈ (0, 1), with $\{ψ_k\}_{k=1}^{\infty}$ the canonical orthonormal Fourier basis of L²([0, 1]),

\begin{aligned} & ψ_1(x) \equiv 1, \quad ψ_{2k+1}(x) \equiv \sqrt{2}\cos(2kπx), \\ & ψ_{2k}(x) \equiv \sqrt{2}\sin(2kπx), \quad k = 1, 2, \dots, \ x \in [0, 1] . \end{aligned}

In this case, $\sum_{k=1}^{\infty} \|ϕ_k\|_\infty = 1 + \sum_{k=1}^{\infty} ρ^{k}(\sqrt{2} + \sqrt{2}) = 1 + 2\sqrt{2}\,ρ(1 - ρ)^{-1} < \infty$, while for any sequence $\{κ_n\}_{n=1}^{\infty}$ of increasing odd integers with κ_n → ∞, and Lipschitz order μ = 1,

h_m \sum_{k=1}^{κ_n} \|ϕ_k\|_{0,1} = h_m \sum_{k=1}^{(κ_n - 1)/2} ρ^{k} (2\sqrt{2}kπ + 2\sqrt{2}kπ) \le 4\sqrt{2}π h_m ρ \sum_{k=1}^{\infty} k ρ^{k-1} = 4\sqrt{2}π h_m ρ (1 - ρ)^{-2} = O(h_m) = o(1) .
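The two series identities used in this example, $\sum_{k \ge 1} 2\sqrt{2}\rho^k = 2\sqrt{2}\rho(1-\rho)^{-1}$ and $\sum_{k \ge 1} k\rho^{k-1} = (1-\rho)^{-2}$, can be checked numerically with truncated sums. The sketch below is only such a sanity check; the truncation point and the helper names are arbitrary choices.

```python
import math


def sup_norm_series(rho, terms=5000):
    """1 + sum_{k>=1} rho^k (sqrt(2) + sqrt(2)), truncated after `terms` terms."""
    return 1 + sum(rho**k * 2 * math.sqrt(2) for k in range(1, terms + 1))


def k_weighted_series(rho, terms=5000):
    """sum_{k>=1} k rho^(k-1), truncated after `terms` terms."""
    return sum(k * rho ** (k - 1) for k in range(1, terms + 1))


for rho in (0.3, 0.6, 0.9):
    closed_1 = 1 + 2 * math.sqrt(2) * rho / (1 - rho)
    closed_2 = (1 - rho) ** -2
    # Both differences should be at the level of floating-point rounding.
    print(rho, abs(sup_norm_series(rho) - closed_1), abs(k_weighted_series(rho) - closed_2))
```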

Denote by ζ (x), x ∈ [0, 1] a standardized Gaussian process such that Eζ (x) ≡ 0, Eζ² (x) ≡ 1, x ∈ [0, 1] with covariance function

E ζ (x) ζ (x') = G (x, x') {G (x, x) G (x', x')}^{- 1 / 2}, x, x' \in [0, 1]

and define Q_{1−α} as the 100(1 − α)-th percentile of the distribution of the absolute maximum of ζ(x) over [0, 1], i.e., $P[\sup_{x \in [0,1]} |ζ(x)| \le Q_{1-α}] = 1 - α$ for all α ∈ (0, 1). Denote by z_{1−α/2} the 100(1 − α/2)-th percentile of the standard normal distribution. Define also the following “infeasible estimator” of the function m

\bar{m} (x) = \bar{η} (x) = n^{- 1} \sum_{i = 1}^{n} η_{i} (x), x \in [0, 1] .

(5)

The term “infeasible” refers to the fact that m̄(x) is computed from the unobserved quantities η_i(x), x ∈ [0, 1]; it would be the natural estimator of m(x) if all the iid random curves η_i(x), x ∈ [0, 1], were observed, a view taken in Ferraty and Vieu (2006).

We now state our main results in the following theorem.

Theorem 2.1: Under Assumptions (A1)–(A5), for any α ∈ (0, 1), as n → ∞, the “infeasible estimator” m̄(x) converges at the $\sqrt{n}$ rate:

\begin{aligned} & P\left\{ \sup_{x \in [0,1]} n^{1/2} |\bar m(x) - m(x)| G(x, x)^{-1/2} \le Q_{1-α} \right\} \to 1 - α, \\ & P\left\{ n^{1/2} |\bar m(x) - m(x)| G(x, x)^{-1/2} \le z_{1-α/2} \right\} \to 1 - α, \quad \forall x \in [0, 1], \end{aligned}

while the spline estimator m̂_p is asymptotically equivalent to m̄ up to order $n^{1/2}$, i.e.,

{sup}_{x \in [0, 1]} n^{1 / 2} | \bar{m} (x) - {\hat{m}}_{p} (x) | = o_{P} (1) .

Remark 1: The significance of Theorem 2.1 lies in the fact that one does not need to distinguish between the spline estimator m̂_p and the “infeasible estimator” m̄ in (5), which converges at the $\sqrt{n}$ rate like a parametric estimator. We have therefore established the oracle efficiency of the nonparametric estimator m̂_p.

Corollary 2.2: Under Assumptions (A1)–(A5), as n → ∞, an asymptotically correct 100(1 − α)% confidence band for m(x), x ∈ [0, 1], is

{\hat{m}}_{p} (x) \pm G {(x, x)}^{1 / 2} Q_{1 - α} n^{- 1 / 2}, \forall α \in (0, 1)

while an asymptotic 100(1 − α)% pointwise confidence interval for m(x), x ∈ [0, 1], is $\hat m_p(x) \pm G(x, x)^{1/2} z_{1-α/2} n^{-1/2}$.
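A minimal sketch of Corollary 2.2 on a grid of points, assuming the values of m̂_p(x) and G(x, x) and the quantile Q_{1−α} are already available (Section 4 describes how they are estimated). The helper name `confidence_band` is hypothetical, and the numbers in the usage line are made up for illustration.

```python
from statistics import NormalDist


def confidence_band(m_hat, g_diag, n, q, alpha=0.05):
    """Simultaneous band m_hat +/- G(x,x)^{1/2} Q_{1-alpha} n^{-1/2} and the
    pointwise intervals m_hat +/- G(x,x)^{1/2} z_{1-alpha/2} n^{-1/2}."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}
    band, pointwise = [], []
    for m, g in zip(m_hat, g_diag):
        se = (g / n) ** 0.5  # G(x, x)^{1/2} n^{-1/2}
        band.append((m - q * se, m + q * se))
        pointwise.append((m - z * se, m + z * se))
    return band, pointwise


band, pointwise = confidence_band([10.0, 10.5], [4.0, 1.0], n=100, q=2.5)
print(band)
print(pointwise)
```

Because $\sup_x |ζ(x)| \ge |ζ(x_0)|$ and ζ(x₀) is standard normal, Q_{1−α} ≥ z_{1−α/2}, so the simultaneous band is never narrower than the pointwise intervals at any x.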

We next describe a two-sample extension of Theorem 2.1. Consider two samples, indexed by d = 1, 2, that satisfy

Y_{dij} = m_{d} (j / N) + \sum_{k = 1}^{κ_{d}} ξ_{dik} ϕ_{dk} (j / N) + σ_{d} (j / N) ε_{dij}, 1 \leq i \leq n_{d}, 1 \leq j \leq N

with covariance functions $G_d(x, x') = \sum_{k=1}^{κ_d} ϕ_{dk}(x) ϕ_{dk}(x')$, respectively. We denote the ratio of the two sample sizes by r̂ = n₁/n₂ and assume that $\lim_{n_1 \to \infty} \hat r = r > 0$.

For both groups, let m̂_{1p}(x) and m̂_{2p}(x) be the order p spline estimates of the mean functions m₁(x) and m₂(x) given by (2). Also denote by ζ₁₂(x), x ∈ [0, 1], a standardized Gaussian process such that Eζ₁₂(x) ≡ 0, $Eζ_{12}^{2}(x) \equiv 1$, x ∈ [0, 1], with covariance function

E ζ_{12}(x) ζ_{12}(x') = \frac{G_1(x, x') + r G_2(x, x')}{\{G_1(x, x) + r G_2(x, x)\}^{1/2} \{G_1(x', x') + r G_2(x', x')\}^{1/2}}, \quad x, x' \in [0, 1] .

Denote by Q_{12,1−α} the 100(1 − α)-th percentile of the distribution of $\sup_{x \in [0,1]} |ζ_{12}(x)|$, defined as above. Mimicking the two-sample t-test, we state the following theorem, whose proof is analogous to that of Theorem 2.1.

Theorem 2.3 : If Assumptions (A1)–(A5) are modified for each group accordingly, then for any α ∈ (0, 1), as n₁ → ∞, r̂ → r > 0,

P {{sup}_{x \in [0, 1]} \frac{n_{1}^{1 / 2} | ({\hat{m}}_{1 p} - {\hat{m}}_{2 p} - m_{1} + m_{2}) (x) |}{{(G_{1} + {rG}_{2}) (x, x)}^{1 / 2}} \leq Q_{12, 1 - α}} \to 1 - α .

Theorem 2.3 yields a uniform asymptotic confidence band for m₁(x) − m₂(x), x ∈ [0, 1].

Corollary 2.4: If Assumptions (A1)–(A5) are modified for each group accordingly, as n₁ → ∞, r̂ → r > 0, a 100 × (1 − α) % asymptotically correct confidence band for m₁(x)−m₂(x), x ∈ [0, 1] is $({\hat{m}}_{1 p} - {\hat{m}}_{2 p}) (x) \pm n_{1}^{- 1 / 2} Q_{12, 1 - α} {(G_{1} + r G_{2}) (x, x)}^{1 / 2}$ , ∀α ∈ (0, 1).

If the confidence band in Corollary 2.2 is used to test the hypothesis

H_0 : m(x) = m_0(x),\ \forall x \in [0, 1] \quad \leftrightarrow \quad H_a : m(x) \neq m_0(x) \text{ for some } x \in [0, 1],

for some given function m₀(x), then, as a referee pointed out, Theorem 2.1 implies that the asymptotic rejection probability of the test is α under H₀ and 1 under H_a. The same can be said for testing hypotheses about m₁(x) − m₂(x) using the confidence band in Corollary 2.4.
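The band-based test amounts to checking whether the band excludes the hypothesized curve at some point. A minimal sketch for the two-sample case (H₀: m₁ − m₂ = m₀, with m₀ ≡ 0 for equality of means), assuming the spline estimates, G₁(x, x), G₂(x, x) and Q_{12,1−α} are already available on a finite grid, which stands in for the supremum over [0, 1]. The function names and the numbers in the usage lines are hypothetical.

```python
import numpy as np


def two_sample_band(m1_hat, m2_hat, g1_diag, g2_diag, n1, n2, q12):
    """Band of Corollary 2.4: (m1_hat - m2_hat) +/- n1^{-1/2} Q_{12,1-alpha} (G1 + r G2)^{1/2}."""
    r = n1 / n2  # r-hat = n1 / n2
    center = np.asarray(m1_hat, float) - np.asarray(m2_hat, float)
    half = q12 * np.sqrt((np.asarray(g1_diag, float) + r * np.asarray(g2_diag, float)) / n1)
    return center - half, center + half


def rejects(lower, upper, m0=0.0):
    """Reject H0: m1 - m2 = m0 iff the band misses m0 at some grid point."""
    m0 = np.broadcast_to(np.asarray(m0, float), np.shape(lower))
    return bool(np.any((m0 < lower) | (m0 > upper)))


lo, hi = two_sample_band([1.0, 1.2], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0], 100, 100, 2.5)
print(rejects(lo, hi))  # True: the band excludes 0 at the second grid point
```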

3. Error Decomposition For the Spline Estimators

In this section, we break the estimation error m̂_p(x) − m(x) into three terms. We begin by discussing the representation of the spline estimator m̂_p(x) in (2).

The definition of m̂_p(x) in (2) means that

{\hat{m}}_{p} (x) \equiv \sum_{J = 1 - p}^{N_{m}} {\hat{β}}_{J, p} B_{J, p} (x),

with coefficients $\{\hat β_{1-p,p}, \dots, \hat β_{N_m,p}\}^T$ solving the following least squares problem

\{\hat β_{1-p,p}, \dots, \hat β_{N_m,p}\}^T = \underset{\{β_{1-p,p}, \dots, β_{N_m,p}\} \in R^{N_m+p}}{\arg\min} \sum_{i=1}^{n} \sum_{j=1}^{N} \left\{ Y_{ij} - \sum_{J=1-p}^{N_m} β_{J,p} B_{J,p}(j/N) \right\}^2 .

(6)

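Because $\sum_{i=1}^{n} \{Y_{ij} - g(j/N)\}^2 = \sum_{i=1}^{n} (Y_{ij} - \bar Y_{\cdot j})^2 + n\{\bar Y_{\cdot j} - g(j/N)\}^2$, with $\bar Y_{\cdot j}$ the average defined below, problem (6) amounts to least squares regression of the averages $\bar Y_{\cdot j}$ on the B-spline basis, and elementary algebra gives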

\hat m_p(x) = \{B_{1-p,p}(x), \dots, B_{N_m,p}(x)\} (X^T X)^{-1} X^T Y

(7)

where $Y = (\bar Y_{\cdot 1}, \dots, \bar Y_{\cdot N})^T$, $\bar Y_{\cdot j} = n^{-1} \sum_{i=1}^{n} Y_{ij}$, 1 ≤ j ≤ N, and the design matrix X is

X = \begin{pmatrix} B_{1-p,p}(1/N) & \cdots & B_{N_m,p}(1/N) \\ \vdots & \ddots & \vdots \\ B_{1-p,p}(N/N) & \cdots & B_{N_m,p}(N/N) \end{pmatrix}_{N \times (N_m + p)} .

Projecting, via (7), the relationship in model (1) onto the linear subspace of R^N spanned by the N_m + p columns $\{B_{J,p}(j/N)\}_{1 \le j \le N}$, 1 − p ≤ J ≤ N_m, of X, we obtain the following crucial decomposition in the space ℋ^(p−2) of spline functions:

{\hat{m}}_{p} (x) = {\tilde{m}}_{p} (x) + {\tilde{e}}_{p} (x) + {\tilde{ξ}}_{p} (x),

(8)

where

\begin{aligned} \tilde m_p(x) &= \sum_{J=1-p}^{N_m} \tilde β_{J,p} B_{J,p}(x), \quad \tilde e_p(x) = \sum_{J=1-p}^{N_m} \tilde a_{J,p} B_{J,p}(x), \\ \tilde ξ_p(x) &= \sum_{k=1}^{κ} \tilde ξ_{k,p}(x), \quad \tilde ξ_{k,p}(x) = \sum_{J=1-p}^{N_m} \tilde τ_{k,J,p} B_{J,p}(x) . \end{aligned}

(9)

The vectors $\{\tilde β_{1-p,p}, \dots, \tilde β_{N_m,p}\}^T$, $\{\tilde a_{1-p,p}, \dots, \tilde a_{N_m,p}\}^T$ and $\{\tilde τ_{k,1-p,p}, \dots, \tilde τ_{k,N_m,p}\}^T$ in (9) are solutions to (6) with Y_ij replaced by m(j/N), σ(j/N)ε_ij and ξ_ik ϕ_k(j/N), respectively.

Alternatively,

\begin{aligned} \tilde m_p(x) &= \{B_{1-p,p}(x), \dots, B_{N_m,p}(x)\} (X^T X)^{-1} X^T m, \\ \tilde e_p(x) &= \{B_{1-p,p}(x), \dots, B_{N_m,p}(x)\} (X^T X)^{-1} X^T e, \\ \tilde ξ_{k,p}(x) &= \bar ξ_{\cdot k} \{B_{1-p,p}(x), \dots, B_{N_m,p}(x)\} (X^T X)^{-1} X^T ϕ_k, \quad 1 \le k \le κ, \end{aligned}

in which m = (m(1/N), …, m(N/N))^T is the signal vector, $e = (σ(1/N)\bar ε_{\cdot 1}, \dots, σ(N/N)\bar ε_{\cdot N})^T$, with $\bar ε_{\cdot j} = n^{-1} \sum_{i=1}^{n} ε_{ij}$, 1 ≤ j ≤ N, is the noise vector, ϕ_k = (ϕ_k(1/N), …, ϕ_k(N/N))^T are the eigenfunction vectors, and $\bar ξ_{\cdot k} = n^{-1} \sum_{i=1}^{n} ξ_{ik}$, 1 ≤ k ≤ κ.
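The representation (7) and the decomposition (8) can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it re-implements the order-p B-spline basis with the Cox–de Boor recursion (the helper names `bspline_design` and `spline_mean` are hypothetical), computes m̂_p by least squares regression of the averages Ȳ_{·j} on X, and checks that fitting m(j/N), σε_ij and Σ_k ξ_ik ϕ_k(j/N) separately and adding the three fits reproduces m̂_p. That additivity is exactly the linearity behind (8). The data follow the simulation model (16) of Section 5 with illustrative sizes.

```python
import numpy as np


def bspline_design(x, n_interior, p):
    """Matrix of B_{J,p}(x), J = 1-p, ..., N_m, on clamped equally spaced knots."""
    t = np.concatenate([np.zeros(p), np.arange(1, n_interior + 1) / (n_interior + 1), np.ones(p)])
    x = np.asarray(x, dtype=float)
    B = np.column_stack([(t[i] <= x) & (x < t[i + 1]) for i in range(len(t) - 1)]).astype(float)
    B[x >= 1.0, np.flatnonzero(t[:-1] < t[1:])[-1]] = 1.0  # include the end point x = 1
    for k in range(2, p + 1):  # Cox-de Boor recursion, order k - 1 -> k
        cols = []
        for i in range(len(t) - k):
            a = (x - t[i]) / (t[i + k - 1] - t[i]) if t[i + k - 1] > t[i] else 0.0
            b = (t[i + k] - x) / (t[i + k] - t[i + 1]) if t[i + k] > t[i + 1] else 0.0
            cols.append(a * B[:, i] + b * B[:, i + 1])
        B = np.column_stack(cols)
    return B


def spline_mean(Y, p, n_interior, x_eval):
    """m_hat_p of (7): regress the averages Ybar_{.j} on X by least squares."""
    N = Y.shape[1]
    X = bspline_design(np.arange(1, N + 1) / N, n_interior, p)  # N x (N_m + p)
    beta, *_ = np.linalg.lstsq(X, Y.mean(axis=0), rcond=None)   # (X^T X)^{-1} X^T Ybar
    return bspline_design(x_eval, n_interior, p) @ beta


rng = np.random.default_rng(0)
n, N, p, n_m, sigma = 50, 80, 4, 5, 0.5
x = np.arange(1, N + 1) / N
signal = np.tile(10 + np.sin(2 * np.pi * (x - 0.5)), (n, 1))     # m(j/N)
phi = np.vstack([-2 * np.cos(np.pi * (x - 0.5)), np.sin(np.pi * (x - 0.5))])
curves = rng.standard_normal((n, 2)) @ phi                      # sum_k xi_ik phi_k(j/N)
noise = sigma * rng.standard_normal((n, N))                      # sigma * eps_ij
grid = np.linspace(0, 1, 101)

m_hat = spline_mean(signal + curves + noise, p, n_m, grid)
parts = [spline_mean(Z, p, n_m, grid) for Z in (signal, noise, curves)]  # m~, e~, xi~
print(np.max(np.abs(m_hat - sum(parts))))  # zero up to rounding error
```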

We next cite an important result from de Boor (2001), p. 149.

Theorem 3.1: There is an absolute constant C_{p−1,μ} > 0 such that for every ϕ ∈ C^{p−1,μ}[0, 1], for some μ ∈ (0, 1], there exists a function g ∈ ℋ^{(p−1)}[0, 1] for which $\|g - ϕ\|_\infty \le C_{p-1,μ} \|ϕ^{(p-1)}\|_{0,μ} h_m^{μ+p-1}$.

The next three propositions concern m̃_p(x), ẽ_p(x) and ξ̃_p(x) given in (8).

Proposition 3.2: Under Assumptions (A1) and (A3), as n → ∞

{sup}_{x \in [0, 1]} n^{1 / 2} | {\tilde{m}}_{p} (x) - m (x) | = o (1) .

(10)

Proposition 3.3: Under Assumptions (A2)–(A4), as n → ∞

{sup}_{x \in [0, 1]} n^{1 / 2} | {\tilde{e}}_{p} (x) | = o_{P} (1) .

(11)

Proposition 3.4: Under Assumptions (A2)–(A4), as n → ∞

{sup}_{x \in [0, 1]} n^{1 / 2} | {\tilde{ξ}}_{p} (x) - (\bar{m} (x) - m (x)) | = o_{P} (1)

(12)

also for any α ∈ (0, 1)

P {{sup}_{x \in [0, 1]} n^{1 / 2} | {\tilde{ξ}}_{p} (x) | G {(x, x)}^{- 1 / 2} \leq Q_{1 - α}} \to 1 - α .

(13)

Equations (10), (11) and (12) yield the oracle efficiency of the spline estimator m̂_p, i.e., $\sup_{x \in [0,1]} n^{1/2} |\bar m(x) - \hat m_p(x)| = o_P(1)$. The Appendix contains proofs of the above three propositions, which, together with (8), imply Theorem 2.1.

4. Implementation

This section describes procedures to implement the confidence band in Corollary 2.2.

Given any data set $\{(j/N, Y_{ij})\}_{j=1,i=1}^{N,n}$ from model (1), the spline estimator m̂_p(x) is obtained from (7), with the number of interior knots taken to be $N_m = [c n^{1/(2p)} \log(n)]$, in which [a] denotes the integer part of a. Our experience suggests the constants c = 0.2, 0.3, 0.5, 1 and 2 as candidates; Section 5 compares them and recommends c = 0.5. To construct the confidence bands, one needs to estimate the unknown function G(·, ·) and the quantile Q_{1−α} and then plug in these estimators; the same approach is taken in Ma, Yang and Carroll (2011) and Wang and Yang (2009a).
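The knot-number rule is a one-line computation. A sketch, assuming log is the natural logarithm (the text does not state the base); `n_interior_knots` is a hypothetical name.

```python
import math


def n_interior_knots(n, p, c=0.5):
    """N_m = [c * n^{1/(2p)} * log(n)], where [a] is the integer part of a."""
    return int(c * n ** (1 / (2 * p)) * math.log(n))


for p in (2, 4):  # linear and cubic splines
    print(p, [n_interior_knots(n, p) for n in (60, 100, 200, 300, 500)])
```

Since $N_m^p \approx c^p n^{1/2} (\log n)^p$ under this rule, $N_m^{-p} n^{1/2} \approx c^{-p} (\log n)^{-p} \to 0$, which is the bias condition in Assumption (A3); the logarithmic factor is what makes that condition hold.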

The pilot estimator Ĝ_p (x, x′) of covariance function G(x, x′) is

\hat G_p = \underset{g(\cdot, \cdot) \in ℋ^{(p-2),2}}{\arg\min} \sum_{j \neq j'}^{N} \{C_{\cdot jj'} - g(j/N, j'/N)\}^2,

with $C_{\cdot jj'} = n^{-1} \sum_{i=1}^{n} \{Y_{ij} - \hat m_p(j/N)\}\{Y_{ij'} - \hat m_p(j'/N)\}$, 1 ≤ j ≠ j′ ≤ N, and the tensor product spline space $ℋ^{(p-2),2} = \left\{ \sum_{J,J'=1-p}^{N_G} b_{JJ'} B_{J,p}(t) B_{J',p}(s) : b_{JJ'} \in ℛ,\ t, s \in [0, 1] \right\}$, in which $N_G = [n^{1/(2p)} \log(\log(n))]$.
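The pilot estimator excludes the diagonal j = j′. Assuming, as is standard, that the errors ε_ij are independent across j and independent of the ξ_ik, each $C_{\cdot jj}$ estimates G(j/N, j/N) + σ²(j/N) rather than G(j/N, j/N), because the measurement errors contribute only to the diagonal. The sketch below is illustrative only: the tensor product spline fit is omitted, and the true m stands in for m̂_p. It computes the raw covariances for data from model (16) with σ = 0.5 and shows the extra σ² on the diagonal; `raw_covariance` is a hypothetical name.

```python
import numpy as np


def raw_covariance(Y, m_grid):
    """All C_{.jj'} = n^{-1} sum_i {Y_ij - m(j/N)}{Y_ij' - m(j'/N)} as an N x N matrix."""
    R = Y - m_grid[None, :]  # residuals, shape (n, N)
    return R.T @ R / Y.shape[0]


rng = np.random.default_rng(1)
n, N, sigma = 500, 100, 0.5
x = np.arange(1, N + 1) / N
m = 10 + np.sin(2 * np.pi * (x - 0.5))
phi = np.vstack([-2 * np.cos(np.pi * (x - 0.5)), np.sin(np.pi * (x - 0.5))])
Y = m + rng.standard_normal((n, 2)) @ phi + sigma * rng.standard_normal((n, N))

C = raw_covariance(Y, m)
# G is smooth, so the diagonal minus the mean of its two neighbours
# approximately isolates the error variance carried by the j = j' entries.
j = np.arange(1, N - 1)
excess = np.mean(C[j, j] - 0.5 * (C[j, j - 1] + C[j, j + 1]))
print(round(float(excess), 3), sigma**2)
```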

To estimate Q_{1−α}, one first computes the eigendecomposition of Ĝ_p(x, x′), i.e., $N^{-1} \sum_{j=1}^{N} \hat G_p(j/N, j'/N) \hat ψ_k(j/N) = \hat λ_k \hat ψ_k(j'/N)$, to obtain the estimated eigenvalues λ̂_k and eigenfunctions ψ̂_k. Next, one chooses the number κ of eigenfunctions by the following standard and efficient criterion: $κ = \min\{ l \in \{1, \dots, T\} : \sum_{k=1}^{l} \hat λ_k / \sum_{k=1}^{T} \hat λ_k > 0.95 \}$, where $\{\hat λ_k\}_{k=1}^{T}$ are the first T estimated positive eigenvalues. Finally, one simulates $\hat ζ_b(x) = \hat G_p(x, x)^{-1/2} \sum_{k=1}^{κ} Z_{k,b} \hat ϕ_k(x)$, where $\hat ϕ_k = \sqrt{\hat λ_k}\,\hat ψ_k$ and the Z_{k,b} are iid standard normal variables, 1 ≤ k ≤ κ, b = 1, …, b_M, with b_M a preset large integer whose default is 1000. One takes the maximal absolute value of each copy of ζ̂_b(x) and estimates Q_{1−α} by the empirical quantile Q̂_{1−α} of these maximum values. One then uses the following confidence band

{\hat{m}}_{p} (x) \pm n^{- 1 / 2} {\hat{G}}_{p} {(x, x)}^{1 / 2} {\hat{Q}}_{1 - α}, x \in [0, 1],

(14)

for the mean function. One estimates Q_{12,1−α} analogously to Q̂_{1−α} and computes

({\hat{m}}_{1 p} - {\hat{m}}_{2 p}) (x) \pm n_{1}^{- 1 / 2} {\hat{Q}}_{12, 1 - α} {({\hat{G}}_{1 p} + \hat{r} {\hat{G}}_{2 p}) (x, x)}^{1 / 2},

(15)

as the confidence band for m₁(x) − m₂(x). As a referee pointed out, although a proof is beyond the scope of this paper, the confidence band (14) is expected to enjoy the same asymptotic coverage as if the true values of Q_{1−α} and G(x, x) were used instead, owing to the consistency of Ĝ_p(x, x) as an estimator of G(x, x). The same holds for the band (15).
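The quantile step can be sketched as follows, taking the covariance estimate as an N × N matrix of values Ĝ_p(j/N, j′/N) (the tensor product spline fit is not shown). This is a hedged illustration: all clearly positive eigenvalues play the role of the "first T" (T is not fixed in the text), numerically zero eigenvalues are discarded with a relative tolerance chosen here, and `estimate_quantile` is a hypothetical name. For the check, the true covariance of model (16) is plugged in.

```python
import numpy as np


def estimate_quantile(G_grid, alpha=0.05, share=0.95, b_M=1000, seed=None):
    """Monte Carlo estimate of Q_{1-alpha} from G_grid[j, j'] = G_hat(j/N, j'/N)."""
    N = G_grid.shape[0]
    # Discretised eigenproblem N^{-1} sum_j G(j/N, j'/N) psi(j/N) = lambda psi(j'/N).
    lam, U = np.linalg.eigh(G_grid / N)
    lam, U = lam[::-1], U[:, ::-1]                  # descending order
    keep = lam > 1e-10 * lam[0]                     # positive eigenvalues only
    lam, U = lam[keep], U[:, keep]
    share_l = np.cumsum(lam) / lam.sum()
    kappa = int(np.argmax(share_l > share)) + 1     # smallest l with share above 0.95
    psi = np.sqrt(N) * U[:, :kappa]                 # so that N^{-1} sum_j psi^2 = 1
    phi = psi * np.sqrt(lam[:kappa])                # phi_k = sqrt(lambda_k) psi_k
    Z = np.random.default_rng(seed).standard_normal((b_M, kappa))
    zeta = (Z @ phi.T) / np.sqrt(np.diag(G_grid))   # b_M simulated paths zeta_hat_b
    q = np.quantile(np.abs(zeta).max(axis=1), 1 - alpha)
    return q, kappa, lam[:kappa]


N = 100
x = np.arange(1, N + 1) / N
phi_true = np.vstack([-2 * np.cos(np.pi * (x - 0.5)), np.sin(np.pi * (x - 0.5))])
G = phi_true.T @ phi_true                           # G(x, x') of model (16)
q, kappa, lam = estimate_quantile(G, seed=0)
print(kappa, np.round(lam, 3), round(float(q), 3))
```

Here the eigenvalues should come out close to λ₁ = 2 and λ₂ = 0.5, with κ = 2. With two components, the Cauchy–Schwarz inequality gives $\sup_x |ζ(x)| \le (Z_1^2 + Z_2^2)^{1/2}$, so the true Q_{0.95} lies between z_{0.975} ≈ 1.96 and $(χ^2_{2,0.95})^{1/2}$ ≈ 2.45; the Monte Carlo estimate should fall near that range up to simulation error. For (15), the same routine applied to the matrix of Ĝ_{1p} + r̂ Ĝ_{2p} values yields Q̂_{12,1−α}, since ζ₁₂ is the standardized process with covariance G₁ + rG₂.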

5. Simulation

To demonstrate the practical performance of our theoretical results, we perform a set of simulation studies. Data are generated from model

Y_{ij} = m (j / N) + \sum_{k = 1}^{2} ξ_{ik} ϕ_{k} (j / N) + σ ε_{ij}, 1 \leq j \leq N, 1 \leq i \leq n,

(16)

where ξ_ik ~ N(0, 1), k = 1, 2, and ε_ij ~ N(0, 1), for 1 ≤ i ≤ n, 1 ≤ j ≤ N; m(x) = 10 + sin{2π(x − 1/2)}, ϕ₁(x) = −2cos{π(x − 1/2)} and ϕ₂(x) = sin{π(x − 1/2)}. This setting implies λ₁ = 2 and λ₂ = 0.5, since ϕ₁ and ϕ₂ are orthogonal in L²([0, 1]) with $\int_0^1 ϕ_1^2(x)\,dx = 2$ and $\int_0^1 ϕ_2^2(x)\,dx = 1/2$. The noise levels are set to σ = 0.5 and 0.3. The number of subjects n is taken to be 60, 100, 200, 300 and 500, and for each sample size the number of observations per curve is $N = [n^{0.25} \log^2(n)]$. This simulated process has a design similar to one of the simulation models in Yao, Müller and Wang (2005), except that each subject is densely observed. We consider both linear and cubic spline estimators and use confidence levels 1 − α = 0.95 and 0.99 for our simultaneous confidence bands. The constant c in the definition of N_m in Section 4 is taken to be 0.2, 0.3, 0.5, 1 and 2. Each simulation is repeated 500 times.
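One replication of model (16) can be generated as below. This is a sketch under assumptions: log is taken as the natural logarithm in $N = [n^{0.25} \log^2(n)]$, and `simulate_model16` is a hypothetical name. The grid averages of ϕ₁² and ϕ₂² printed at the end reproduce λ₁ = 2 and λ₂ = 0.5.

```python
import math

import numpy as np


def simulate_model16(n, sigma, rng):
    """Return the points j/N, an (n, N) array of Y_ij from model (16),
    and the values phi_k(j/N), k = 1, 2, as a (2, N) array."""
    N = int(n ** 0.25 * math.log(n) ** 2)  # N = [n^{0.25} log^2(n)]
    x = np.arange(1, N + 1) / N
    m = 10 + np.sin(2 * np.pi * (x - 0.5))
    phi = np.vstack([-2 * np.cos(np.pi * (x - 0.5)), np.sin(np.pi * (x - 0.5))])
    xi = rng.standard_normal((n, 2))   # xi_ik ~ N(0, 1), k = 1, 2
    eps = rng.standard_normal((n, N))  # eps_ij ~ N(0, 1)
    return x, m + xi @ phi + sigma * eps, phi


x, Y, phi = simulate_model16(100, 0.3, np.random.default_rng(2024))
print(Y.shape, np.mean(phi[0] ** 2), np.mean(phi[1] ** 2))
```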

Figures 1 and 2 show the estimated mean functions and their 95% confidence bands for the true curve m(·) in model (16) with σ = 0.3 and n = 100, 200, 300, 500, respectively. As expected, as n increases the confidence band becomes narrower and the linear and cubic spline estimators move closer to the true curve.

Figure 1. Plots of the linear spline estimator (2) for simulated data (dashed-dotted line) and 95% confidence bands (14) (upper and lower dashed lines) for m(x) (solid lines). In all panels, σ = 0.3.

Figure 2. Plots of the cubic spline estimator (2) for simulated data (dashed-dotted line) and 95% confidence bands (14) (upper and lower dashed lines) for m(x) (solid lines). In all panels, σ = 0.3.

Tables 1 and 2 show the empirical frequency with which the true curve m(·) is covered by the linear and cubic spline confidence bands (14), respectively, at the 100 points {1/100, …, 99/100, 1}. At all noise levels, the coverage percentages of the confidence band are close to the nominal confidence levels 0.95 and 0.99 for linear splines with c = 0.5, 1 (Table 1) and cubic splines with c = 0.3, 0.5 (Table 2), but decline slightly for c = 2 and markedly for c = 0.2. The coverage percentages thus depend on the choice of N_m, and the dependence becomes stronger as the sample size decreases. For the large sample sizes n = 300, 500, the effect of the choice of N_m on the coverage percentages is much smaller, except for linear splines with c = 0.2. Although our theory indicates no optimal choice of c, we recommend c = 0.5 for data analysis, as its performance in simulation for both linear and cubic splines is either optimal or near optimal.
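The coverage frequencies in Tables 1 and 2 count a replication as covered only if the band contains m(·) at all 100 evaluation points simultaneously. A minimal sketch of that bookkeeping, with hypothetical helper names and toy bands standing in for those computed from (14):

```python
import numpy as np


def covers(lower, upper, m_true):
    """True iff lower <= m <= upper at every evaluation point."""
    return bool(np.all((lower <= m_true) & (m_true <= upper)))


def coverage_frequency(bands, m_true):
    """Share of replications whose band covers the whole curve; bands = [(lower, upper), ...]."""
    return float(np.mean([covers(lo, up, m_true) for lo, up in bands]))


grid = np.arange(1, 101) / 100  # the points 1/100, ..., 99/100, 1
m_true = 10 + np.sin(2 * np.pi * (grid - 0.5))
good = (m_true - 0.1, m_true + 0.1)
bad = (m_true - 0.1, m_true + 0.1 - 0.2 * (grid == 0.5))  # misses m at x = 0.5 only
print(coverage_frequency([good, good, bad, good], m_true))  # 0.75
```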

Table 1.

Coverage frequencies from 500 replications using the linear spline band (14) with p = 2 and $N_m = [c n^{1/(2p)} \log(n)]$.

n	1 − α	Coverage frequency σ = 0.5					Coverage frequency σ = 0.3

		c = 0.2	c = 0.3	c = 0.5	c = 1	c = 2	c = 0.2	c = 0.3	c = 0.5	c = 1	c = 2
60	0.950	0.384	0.790	0.876	0.894	0.852	0.410	0.786	0.930	0.914	0.884
	0.990	0.692	0.938	0.970	0.976	0.942	0.702	0.950	0.972	0.966	0.954

100	0.950	0.184	0.826	0.886	0.884	0.838	0.198	0.822	0.916	0.916	0.896
	0.990	0.476	0.936	0.964	0.966	0.944	0.496	0.940	0.974	0.974	0.968

200	0.950	0.418	0.856	0.914	0.922	0.862	0.414	0.862	0.946	0.942	0.926
	0.990	0.712	0.966	0.976	0.990	0.972	0.720	0.966	0.984	0.984	0.980

300	0.950	0.600	0.888	0.920	0.932	0.874	0.602	0.896	0.940	0.934	0.926
	0.990	0.834	0.978	0.976	0.980	0.972	0.840	0.982	0.984	0.986	0.980

500	0.950	0.772	0.880	0.922	0.886	0.894	0.768	0.888	0.954	0.950	0.942
	0.990	0.902	0.964	0.984	0.976	0.976	0.906	0.968	0.992	0.994	0.988

Table 2.

Coverage frequencies from 500 replications using the cubic spline band (14) with p = 4 and $N_m = [c n^{1/(2p)} \log(n)]$.

n	1 − α	Coverage frequency σ = 0.5					Coverage frequency σ = 0.3

		c = 0.2	c = 0.3	c = 0.5	c = 1	c = 2	c = 0.2	c = 0.3	c = 0.5	c = 1	c = 2
60	0.950	0.644	0.916	0.902	0.890	0.738	0.672	0.922	0.940	0.940	0.916
	0.990	0.866	0.980	0.958	0.964	0.888	0.884	0.986	0.986	0.984	0.982

100	0.950	0.596	0.902	0.904	0.876	0.846	0.610	0.916	0.914	0.914	0.896
	0.990	0.786	0.970	0.968	0.956	0.952	0.798	0.980	0.974	0.970	0.964

200	0.950	0.928	0.942	0.932	0.936	0.904	0.938	0.952	0.950	0.948	0.934
	0.990	0.978	0.992	0.982	0.992	0.978	0.982	0.984	0.992	0.982	0.984

300	0.950	0.920	0.948	0.926	0.948	0.898	0.922	0.956	0.948	0.942	0.938
	0.990	0.976	0.986	0.986	0.988	0.980	0.982	0.984	0.988	0.984	0.982

500	0.950	0.928	0.922	0.954	0.902	0.898	0.928	0.928	0.936	0.932	0.916
	0.990	0.980	0.982	0.990	0.976	0.978	0.980	0.982	0.990	0.990	0.992

Following the suggestion of a referee and the Associate Editor, we compare by simulation the proposed spline confidence band with the least squares Bonferroni and least squares bootstrap bands of Bunea, Ivanescu and Wegkamp (2011) (BIW). Table 3 presents the empirical frequency with which these bands cover the true curve m(·) of model (16) at the same 100 points {1/100, …, 99/100, 1} as in Table 1. The coverage frequency of the BIW Bonferroni band is much higher than the nominal level, making it too conservative. The coverage frequency of the BIW bootstrap band is consistently lower than the nominal level by at least 0.10, so it is not recommended for practical use.

Table 3.

Coverage frequencies from 500 replications using the least squares Bonferroni band and the least squares bootstrap band.

n	1 − α	Coverage frequency least squares Bonferroni		Coverage frequency least squares bootstrap

		σ = 0.5	σ = 0.3	σ = 0.5	σ = 0.3
60	0.950	0.990	0.988	0.742	0.744
	0.990	0.994	0.994	0.856	0.864

100	0.950	0.996	0.998	0.678	0.712
	0.990	0.998	1.000	0.860	0.870

200	0.950	0.988	0.992	0.710	0.734
	0.990	1.000	1.000	0.856	0.888

300	0.950	0.988	0.998	0.704	0.720
	0.990	1.000	1.000	0.868	0.870

500	0.950	0.996	0.998	0.718	0.732
	0.990	1.000	1.000	0.856	0.860

Following the suggestion of one referee and the Associate Editor, we also compare the widths of the three bands. For each replication, we compute the ratio of the width of each BIW band to that of the spline band at each of the points {1/100, …, 99/100, 1} and average these 100 ratios. Table 4 shows the five-number summary of the resulting 500 averaged ratios for σ = 0.3 and p = 4. The BIW Bonferroni band is substantially wider than the cubic spline band (median ratios between 1.188 and 1.432), which makes it undesirable. Although the BIW bootstrap band is narrower, its coverage frequency, as noted above, is too low for it to be useful in practice. Simulations for other cases (e.g. p = 2, σ = 0.5) lead to the same conclusion.
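A minimal sketch of the width comparison, assuming the widths of each band are already available on the 100 grid points for every replication. The function names are hypothetical, and the quartiles use `statistics.quantiles(..., method="inclusive")`, whose interpolation convention may differ slightly from the one used to produce Table 4. The widths in the usage lines are made up for illustration and are not simulation output.

```python
import statistics
from typing import Sequence, Tuple


def averaged_width_ratio(width_other: Sequence[float],
                         width_spline: Sequence[float]) -> float:
    """Average over the grid of the pointwise ratio width_other / width_spline."""
    if len(width_other) != len(width_spline):
        raise ValueError("both bands must be evaluated on the same grid")
    ratios = [w / s for w, s in zip(width_other, width_spline)]
    return sum(ratios) / len(ratios)


def five_number_summary(values: Sequence[float]) -> Tuple[float, ...]:
    """Min, Q1, median, Q3, max; quartiles by linear interpolation."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))


# Illustrative usage: one averaged ratio per replication, made-up widths.
grid = [k / 100 for k in range(1, 101)]           # {1/100, ..., 99/100, 1}
spline_widths = [0.40 + 0.1 * x for x in grid]    # hypothetical spline band widths
other_widths = [1.3 * w for w in spline_widths]   # a band 30% wider everywhere
r = averaged_width_ratio(other_widths, spline_widths)
print(round(r, 6))
```

In a full study, `averaged_width_ratio` would be called once per replication and `five_number_summary` applied to the 500 resulting values.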

Table 4.

Five-number summary of ratios of confidence band widths.

n	1 − α	least squares Bonferroni/cubic spline					least squares bootstrap/cubic spline

		Min.	Q1	Med.	Q3	Max.	Min.	Q1	Med.	Q3	Max.
60	0.950	0.964	1.219	1.299	1.397	1.845	0.522	0.667	0.716	0.770	0.967
	0.990	0.907	1.114	1.188	1.285	1.730	0.527	0.662	0.715	0.770	1.048

100	0.950	0.995	1.263	1.331	1.415	1.684	0.565	0.675	0.714	0.754	0.888
	0.990	0.910	1.148	1.219	1.295	1.603	0.536	0.665	0.708	0.752	0.925

200	0.950	1.169	1.326	1.383	1.433	1.653	0.600	0.683	0.715	0.743	0.855
	0.990	1.045	1.197	1.250	1.300	1.507	0.557	0.668	0.702	0.740	0.888

300	0.950	1.169	1.363	1.412	1.462	1.663	0.574	0.690	0.717	0.742	0.838
	0.990	1.067	1.228	1.277	1.322	1.509	0.587	0.676	0.707	0.739	0.850

500	0.950	1.273	1.395	1.432	1.476	1.601	0.620	0.691	0.714	0.737	0.818
	0.990	1.132	1.243	1.288	1.334	1.465	0.607	0.674	0.707	0.734	0.839


To examine the performance of the two-sample test based on the spline confidence band, Table 5 reports the empirical power and type I error of the proposed two-sample test. The data were generated from (16) with σ = 0.5, with m₁(x) = 10 + sin{2π(x − 1/2)} + δ(x) and n = n₁ for the first group, and m₂(x) = 10 + sin{2π(x − 1/2)} and n = n₂ for the other group. The remaining parameters, ξ_ik, ε_ij, ϕ₁(x) and ϕ₂(x), were set to the same values for each group as in (16). To mimic the real data in Section 6, we set N = 50, 100 and 200 when n₁ = 160, 80 and 40 and n₂ = 320, 160 and 80, respectively. The hypotheses studied are:

H_{0} : m_{1} (x) = m_{2} (x), \ \forall x \in [0, 1] \quad \leftrightarrow \quad H_{a} : m_{1} (x) \neq m_{2} (x) \text{ for some } x \in [0, 1] .

Table 5.

Empirical power and type I error of two-sample test using cubic spline.

δ (x)	n₁ = 160, n₂ = 320		n₁ = 80, n₂ = 160		n₁ = 40, n₂ = 80
	Nominal test level		Nominal test level		Nominal test level
	0.05	0.01	0.05	0.01	0.05	0.01

0.6t	1.000	1.000	0.980	0.918	0.794	0.574
0.7sin(x)	1.000	1.000	0.978	0.910	0.788	0.566

0	0.058	0.010	0.068	0.010	0.096	0.028

Monte Carlo SE	0.001	0.004	0.001	0.004	0.001	0.004


Table 5 shows the empirical frequencies of rejecting H₀ in this simulation study at nominal test levels 0.05 and 0.01. When δ(x) ≠ 0, these frequencies are empirical powers and should be close to 1; when δ(x) ≡ 0, they are empirical type I errors and should be close to the nominal levels. Each set of simulations consists of 500 Monte Carlo runs. Asymptotic standard errors (as the number of Monte Carlo runs tends to infinity) are reported in the last row of the table. Results are listed only for the cubic spline confidence bands, as those for the linear spline are similar. Overall, the two-sample test performs well, with reasonable empirical power even for a rather small difference (δ(x) = 0.7 sin(x)). Moreover, the differences between the nominal levels and the empirical type I errors diminish as the sample size increases.
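The last row of Table 5 can be related to the binomial standard error of a rejection frequency. A minimal sketch, assuming the R = 500 Monte Carlo runs are independent so that, under H₀, the number of rejections is Binomial(R, α), which gives a standard error of sqrt(α(1 − α)/R). For α = 0.01 this is about 0.0044, and for α = 0.05 about 0.0097. The helper names are hypothetical; the second loop expresses the δ(x) ≡ 0 entries at level 0.05 in units of this standard error.

```python
import math


def monte_carlo_se(level: float, n_runs: int) -> float:
    """Binomial standard error sqrt(level (1 - level) / n_runs) of a rejection frequency."""
    return math.sqrt(level * (1.0 - level) / n_runs)


def gap_in_se(empirical: float, level: float, n_runs: int) -> float:
    """Distance between an empirical type I error and its nominal level, in standard errors."""
    return (empirical - level) / monte_carlo_se(level, n_runs)


for alpha in (0.05, 0.01):
    print(alpha, round(monte_carlo_se(alpha, 500), 4))

# Empirical type I errors at nominal level 0.05 (the delta(x) = 0 row of Table 5).
for emp in (0.058, 0.068, 0.096):
    print(emp, round(gap_in_se(emp, 0.05, 500), 2))
```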

6. Empirical Example

In this section, we revisit the Tecator data mentioned in Section 1, which can be downloaded at http://lib.stat.cmu.edu/datasets/tecator. The data set contains measurements on n = 240 meat samples: for each sample, an N = 100 channel near-infrared absorbance spectrum was recorded, and the contents of moisture (water), fat and protein were also obtained. The Feed Analyzer operated in the wavelength range from 850 nm to 1050 nm. Figure 3 shows the scatter plot of this data set. The spectral data can naturally be regarded as functional data, and we perform a two-sample test to see whether the absorbance spectra differ significantly between samples with different fat content.

This data set has been used to compare four classification methods (Li and Yu, 2008) and to build a regression model that predicts the fat content from the spectrum (Li and Hsing, 2010b). Following Li and Yu (2008), we separate the samples according to whether their fat content is less than 20%. The right panel of Figure 3 shows 10 samples from each group. The hypotheses of interest are:

H_{0} : m_{1} (x) = m_{2} (x), \ \forall x \in [850, 1050] \quad \leftrightarrow \quad H_{a} : m_{1} (x) \neq m_{2} (x) \text{ for some } x \in [850, 1050],

where m₁(x) and m₂(x) are the regression functions of absorbance on wavelength for the samples with fat content less than 20% and greater than or equal to 20%, respectively. Of the 240 samples, n₁ = 155 have fat content less than 20% and the remaining n₂ = 85 have fat content of at least 20%. The numbers of interior knots in (2), computed as in Section 3 with c = 0.5, are N_{1m} = 4 and N_{2m} = 3 for the cubic spline fit and N_{1m} = 8 and N_{2m} = 6 for the linear spline fit. Figure 4 depicts the linear and cubic spline confidence bands (15) at confidence levels 0.99 (upper and lower dashed lines) and 0.999995 (upper and lower dotted lines), with the center dashed-dotted line representing the spline estimator m̂₁(x) − m̂₂(x) and a solid line representing zero. Since even the 99.9995% confidence band does not entirely contain the zero line, the difference between the mean absorbance of the low-fat and high-fat populations is extremely significant. In fact, Figure 4 clearly indicates that the lower the fat content, the higher the absorbance.
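The decision rule used here, rejecting H₀ when the band for m₁ − m₂ fails to contain the zero line, can be sketched as below. This assumes the band limits have already been computed from (15) on a grid of wavelengths; the construction of (15) is not reproduced, and the function name is hypothetical. Checking a finite grid only approximates the condition "for all x".

```python
from typing import Sequence


def band_rejects_zero(lower: Sequence[float], upper: Sequence[float]) -> bool:
    """Reject H0: m1 = m2 if the band for m1 - m2 misses zero at some grid point.

    lower/upper are the band limits at the same grid points. If the band has
    asymptotic simultaneous coverage 1 - alpha, the test has asymptotic level alpha.
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length")
    return any(lo > 0.0 or up < 0.0 for lo, up in zip(lower, upper))


# Toy band values (not the Tecator results): zero lies inside at every point.
print(band_rejects_zero([-0.2, -0.1, -0.3], [0.1, 0.2, 0.05]))  # False
# Here the band lies entirely above zero at the second point.
print(band_rejects_zero([-0.2, 0.05, -0.3], [0.1, 0.2, 0.05]))  # True
```

Evaluating the rule at two nested levels, as in Figure 4, amounts to calling it once per band.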

Figure 4.

Plots of the fitted linear and cubic spline regressions of m₁(x) − m₂(x) for the Tecator data (dashed-dotted line), 99% confidence bands (15) (upper and lower dashed lines), 99.9995% confidence bands (15) (upper and lower dotted lines) and the zero line (solid line).

Acknowledgment

This work has been supported in part by NSF awards DMS 0706518, 1007594, NCI/NIH K-award, 1K01 CA131259, a Dissertation Continuation Fellowship from Michigan State University, and funding from the Jiangsu Specially-Appointed Professor Program, Jiangsu Province, China. The helpful comments by two referees and the Associate Editor have led to significant improvement of the paper.

APPENDIX

In this appendix, we use C to denote a generic positive constant unless otherwise stated.

A.1. Preliminaries

For any vector ζ = (ζ₁, …, ζ_s) ∈ R^s, denote the norms $‖ζ‖_r = (|ζ_1|^r + \cdots + |ζ_s|^r)^{1/r}$, 1 ≤ r < +∞, and ‖ζ‖_∞ = max(|ζ₁|, …, |ζ_s|). For any s × s symmetric matrix A, we define λ_min(A) and λ_max(A) as its smallest and largest eigenvalues, and its L_r norm as ${‖ A ‖}_{r} = {max}_{ζ \in R^{s}, ζ \neq 0} {‖ A ζ ‖}_{r} {‖ ζ ‖}_{r}^{- 1}$. In particular, if A is also positive definite, then ‖A‖₂ = λ_max(A) and ${‖ A^{- 1} ‖}_{2} = λ_{min}^{- 1} (A)$.
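The norms defined above can be checked numerically with a short standard-library sketch. It uses the fact that the induced L_∞ norm of a matrix equals its largest absolute row sum, and computes ‖A‖₂ = λ_max(A) for a symmetric positive definite A by power iteration, a standard method chosen here for illustration rather than anything used in the paper. All helper names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def vec_norm(z: Sequence[float], r: float) -> float:
    """||z||_r for 1 <= r < inf; ||z||_inf (largest absolute entry) when r is math.inf."""
    if r == math.inf:
        return max(abs(v) for v in z)
    return sum(abs(v) ** r for v in z) ** (1.0 / r)


def matvec(a: Matrix, z: Sequence[float]) -> List[float]:
    """Matrix-vector product A z."""
    return [sum(aij * zj for aij, zj in zip(row, z)) for row in a]


def mat_inf_norm(a: Matrix) -> float:
    """Induced L_inf norm max ||A z||_inf / ||z||_inf, i.e. the largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)


def spd_two_norm(a: Matrix, iters: int = 200) -> float:
    """||A||_2 = lambda_max(A) for symmetric positive definite A, by power iteration.

    Sketch only: assumes the start vector is not orthogonal to the top eigenvector.
    """
    z = [1.0 + 0.01 * i for i in range(len(a))]
    for _ in range(iters):
        w = matvec(a, z)
        scale = vec_norm(w, 2)
        z = [v / scale for v in w]
    # Rayleigh quotient z^T A z of the unit-norm iterate.
    return sum(zi * wi for zi, wi in zip(z, matvec(a, z)))


A = [[1.0, -2.0], [3.0, 4.0]]
# The maximising z is the sign pattern of the row with the largest absolute sum.
print(mat_inf_norm(A), vec_norm(matvec(A, [1.0, 1.0]), math.inf))
S = [[2.0, 1.0], [1.0, 2.0]]  # symmetric positive definite, eigenvalues 3 and 1
print(spd_two_norm(S))
```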

For functions ϕ, φ ∈ L₂[0, 1], denote the theoretical and empirical inner products by $〈 ϕ, φ 〉 = \int_{0}^{1} ϕ (x) φ (x) dx$ and ${〈 ϕ, φ 〉}_{2, N} = N^{- 1} \sum_{j = 1}^{N} ϕ (j / N) φ (j / N)$. The corresponding norms are ${‖ ϕ ‖}_{2}^{2} = 〈 ϕ, ϕ 〉$ and ${‖ ϕ ‖}_{2, N}^{2} = {〈 ϕ, ϕ 〉}_{2, N}$.
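To see the difference between the two inner products, the sketch below evaluates both for ϕ(x) = φ(x) = x, where 〈ϕ, φ〉 = 1/3 and 〈ϕ, φ〉_{2,N} = (N + 1)(2N + 1)/(6N²), so their gap is (3N + 1)/(6N²) = O(N⁻¹). The theoretical inner product is approximated by composite Simpson's rule, which is exact for this quadratic integrand up to rounding; the function names are illustrative.

```python
from typing import Callable

Func = Callable[[float], float]


def inner_empirical(phi: Func, vphi: Func, n_grid: int) -> float:
    """<phi, vphi>_{2,N} = N^{-1} sum_{j=1}^{N} phi(j/N) vphi(j/N)."""
    return sum(phi(j / n_grid) * vphi(j / n_grid) for j in range(1, n_grid + 1)) / n_grid


def inner_theoretical(phi: Func, vphi: Func, n_cells: int = 1000) -> float:
    """<phi, vphi> = integral over [0, 1] of phi * vphi, by composite Simpson's rule."""
    def product(x: float) -> float:
        return phi(x) * vphi(x)

    h = 1.0 / n_cells
    total = 0.0
    for i in range(n_cells):
        a, b = i * h, (i + 1) * h
        total += h / 6.0 * (product(a) + 4.0 * product(0.5 * (a + b)) + product(b))
    return total


def identity(x: float) -> float:
    return x


for n in (10, 100, 1000):
    gap = inner_empirical(identity, identity, n) - inner_theoretical(identity, identity)
    # For phi = vphi = x the gap is (3N + 1) / (6 N^2), so N * gap tends to 1/2.
    print(n, gap, n * gap)
```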

We state a strong approximation result, which is used in the proof of Lemma A.6.

Lemma A.1: [Theorem 2.6.7 of Csőrgő and Révész (1981)] Suppose that ξ_i, 1 ≤ i < ∞, are iid with E(ξ₁) = 0 and $E (ξ_{1}^{2}) = 1$, and that H(x) > 0 (x ≥ 0) is an increasing continuous function such that $x^{-2-γ} H(x)$ is increasing for some γ > 0, $x^{-1} \log H(x)$ is decreasing, and EH(|ξ₁|) < ∞. Then there exist constants C₁, C₂, a > 0, which depend only on the distribution of ξ₁, and a sequence of Brownian motions $\{W_{n} (l)\}_{n = 1}^{\infty}$, such that for any $\{x_{n}\}_{n = 1}^{\infty}$ satisfying $H^{-1}(n) < x_n < C_1 (n \log n)^{1/2}$, and with $S_{l} = \sum_{i = 1}^{l} ξ_{i}$,

P {max_{1 \leq l \leq n} | S_{l} - W_{n} (l) | > x_{n}} \leq C_{2} n {H ({ax}_{n})}^{- 1} .

The next lemma is a special case of Theorem 13.4.3 on page 404 of DeVore and Lorentz (1993). Let p be a positive integer. A matrix A = (a_ij) is said to have bandwidth p if a_ij = 0 whenever |i − j| ≥ p, and p is the smallest integer with this property.

Lemma A.2: If a matrix A with bandwidth p has an inverse A⁻¹ and d = ‖A‖₂‖A⁻¹‖₂ is the condition number of A, then ‖A⁻¹‖_∞ ≤ 2c₀(1 − η)⁻¹, with $c_0 = η^{-2p} ‖A^{-1}‖_2$ and $η = ((d^2 - 1)/(d^2 + 1))^{1/(4p)}$.
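Lemma A.2 can be illustrated on a concrete banded matrix. The sketch assumes A = tridiag(1, 4, 1) of size 8, which has bandwidth p = 2 in the sense above and eigenvalues 4 + 2cos(kπ/9), k = 1, …, 8, so d and ‖A⁻¹‖₂ are available in closed form. It takes c₀ = η^{−2p}‖A⁻¹‖₂, the form used in the proof of Lemma A.3, and compares the bound with ‖A⁻¹‖_∞ obtained by a small Gauss-Jordan inversion. The matrix and the helper names are illustrative choices.

```python
import math
from typing import List

Matrix = List[List[float]]


def lemma_a2_bound(cond: float, p: int, inv_two_norm: float) -> float:
    """The bound 2 c0 / (1 - eta) of Lemma A.2 on ||A^{-1}||_inf (needs cond > 1).

    eta = ((d^2 - 1) / (d^2 + 1))^{1/(4p)} and c0 = eta^{-2p} ||A^{-1}||_2.
    """
    eta = ((cond ** 2 - 1.0) / (cond ** 2 + 1.0)) ** (1.0 / (4 * p))
    c0 = eta ** (-2 * p) * inv_two_norm
    return 2.0 * c0 / (1.0 - eta)


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# tridiag(1, 4, 1): zero whenever |i - j| >= 2, so bandwidth p = 2.
size = 8
A = [[4.0 if i == j else (1.0 if abs(i - j) == 1 else 0.0) for j in range(size)]
     for i in range(size)]
# Eigenvalues of tridiag(1, 4, 1) are 4 + 2 cos(k pi / (size + 1)), k = 1, ..., size.
lam_max = 4.0 + 2.0 * math.cos(math.pi / (size + 1))
lam_min = 4.0 - 2.0 * math.cos(math.pi / (size + 1))
A_inv = inverse(A)
inv_inf = max(sum(abs(v) for v in row) for row in A_inv)
bound = lemma_a2_bound(lam_max / lam_min, p=2, inv_two_norm=1.0 / lam_min)
print(round(inv_inf, 4), round(bound, 2), inv_inf <= bound)
```

The bound is far from tight for this well-conditioned example; its value in the proof of Lemma A.3 is that it depends on A only through p, d and ‖A⁻¹‖₂.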

One writes $X^{T} X = N {\hat{V}}_{p}, X^{T} Y = {\sum_{j = 1}^{N} B_{J, p} (j / N) {\bar{Y}}_{. j}}_{J = 1 - p}^{N_{m}}$ , where the theoretical and empirical inner product matrices of ${B_{J, p} (x)}_{J = 1 - p}^{N_{m}}$ are denoted as

V_{p} = {(〈 B_{J, p}, B_{J', p} 〉)}_{J, J' = 1 - p}^{N_{m}}, \quad {\hat{V}}_{p} = {({〈 B_{J, p}, B_{J', p} 〉}_{2, N})}_{J, J' = 1 - p}^{N_{m}} .

(A.1)

We establish next that the theoretical inner product matrix V_p defined in (A.1) has an inverse with bounded L_∞ norm.

Lemma A.3: For any positive integer p, there exists a constant M_p > 0 depending only on p, such that ${‖ V_{p}^{- 1} ‖}_{\infty} \leq M_{p} h_{m}^{- 1}$ , where h_m = (N_m + 1)⁻¹.

Proof. According to Lemma A.1 in Wang and Yang (2009b), V_p is invertible, since it is a symmetric matrix with all eigenvalues positive, i.e. $0 < c_{p} N_{m}^{- 1} \leq λ_{min} (V_{p}) \leq λ_{max} (V_{p}) \leq C_{p} N_{m}^{- 1} < \infty$, where c_p and C_p are positive real numbers. The compact support of the B-spline basis gives V_p bandwidth p, hence one can apply Lemma A.2. Since d_p = λ_max(V_p)/λ_min(V_p) ≤ C_p/c_p, one has

η_{p} = {(d_{p}^{2} - 1)}^{1 / 4 p} {(d_{p}^{2} + 1)}^{- 1 / 4 p} \leq {(C_{p}^{2} c_{p}^{- 2} - 1)}^{1 / 4 p} {(C_{p}^{2} c_{p}^{- 2} + 1)}^{- 1 / 4 p} < 1 .

If p = 1, then $V_{p}^{- 1} = h_{m}^{- 1} I_{N_{m} + p}$, so the lemma holds with M_p = 1. If p > 1, let $u_{1 - p} = {(1, 0_{N_{m} + p - 1}^{T})}^{T}$ and $u_{0} = {(0_{p - 1}^{T}, 1, 0_{N_{m}}^{T})}^{T}$; then ‖u_{1−p}‖₂ = ‖u₀‖₂ = 1. Lemma A.1 in Wang and Yang (2009b) also implies that

\begin{matrix} λ_{min} (V_{p}) & = λ_{min} (V_{p}) {‖ u_{1 - p} ‖}_{2}^{2} \leq u_{1 - p}^{T} V_{p} u_{1 - p} = {‖ B_{1 - p, p} ‖}_{2}^{2}, \\ u_{0}^{T} V_{p} u_{0} & = {‖ B_{0, p} ‖}_{2}^{2} \leq λ_{max} (V_{p}) {‖ u_{0} ‖}_{2}^{2} = λ_{max} (V_{p}), \end{matrix}

hence $d_{p} = λ_{max} (V_{p}) / λ_{min} (V_{p}) \geq {‖ B_{0, p} ‖}_{2}^{2} {‖ B_{1 - p, p} ‖}_{2}^{- 2} = r_{p} > 1$ where r_p is an absolute constant depending only on p. Thus $η_{p} = {(d_{p}^{2} - 1)}^{\frac{1}{4 p}} {(d_{p}^{2} + 1)}^{- \frac{1}{4 p}} \geq {(r_{p}^{2} - 1)}^{\frac{1}{4 p}} {(r_{p}^{2} + 1)}^{- \frac{1}{4 p}} > 0$ . Applying Lemma A.2 and putting the above bounds together, one obtains

{‖ V_{p}^{- 1} ‖}_{\infty} h_{m} \leq 2 η_{p}^{- 2 p} {‖ V_{p}^{- 1} ‖}_{2} {(1 - η_{p})}^{- 1} h_{m} \leq 2 {(\frac{r_{p}^{2} + 1}{r_{p}^{2} - 1})}^{1 / 2} λ_{min}^{- 1} (V_{p}) \times {1 - {(\frac{C_{p}^{2} c_{p}^{- 2} - 1}{C_{p}^{2} c_{p}^{- 2} + 1})}^{1 / 4 p}}^{- 1} h_{m} \leq 2 {(\frac{r_{p}^{2} + 1}{r_{p}^{2} - 1})}^{1 / 2} c_{p}^{- 1} {1 - {(\frac{C_{p}^{2} c_{p}^{- 2} - 1}{C_{p}^{2} c_{p}^{- 2} + 1})}^{1 / 4 p}}^{- 1} \equiv M_{p} .

The lemma is proved.

For any function ϕ ∈ C [0, 1], denote the vector ϕ = (ϕ (1/N), …, ϕ (N/N))^T and function

\tilde{ϕ} (x) \equiv {B_{1 - p, p} (x), \dots, B_{N_{m}, p} (x)} {(X^{T} X)}^{- 1} X^{T} ϕ .

Lemma A.4: Under Assumption (A3), for V_p and V̂_p defined in (A.1), ‖V_p − V̂_p‖_∞ = O(N⁻¹) and ${‖ {\hat{V}}_{p}^{- 1} ‖}_{\infty} \leq 2 h_{m}^{- 1}$. There exists $c_{ϕ,p} \in (0, \infty)$ such that, when n is large enough, $‖\tilde{ϕ}‖_\infty \leq c_{ϕ,p} ‖ϕ‖_\infty$ for any ϕ ∈ C[0, 1]. Furthermore, if $ϕ \in C^{p-1,μ}[0, 1]$ for some μ ∈ (0, 1], then, with $\tilde{C}_{p-1,μ} = (c_{ϕ,p} + 1) C_{p-1,μ}$,

{‖ \tilde{ϕ} - ϕ ‖}_{\infty} \leq {\tilde{C}}_{p - 1, μ} {‖ ϕ^{(p - 1)} ‖}_{0, μ} h_{m}^{μ + p - 1} .

(A.2)

Proof. We first show that ‖V_p − V̂_p‖_∞ = O(N⁻¹). In the case p = 1, for any 0 ≤ J ≤ N_m let N_J denote the number of design points j/N in the J-th interval I_J; then

N_{J} = \begin{cases} \# \{j : j \in [NJ / (N_{m} + 1), N (J + 1) / (N_{m} + 1))\}, & 0 \leq J < N_{m}, \\ \# \{j : j \in [NJ / (N_{m} + 1), N (J + 1) / (N_{m} + 1)]\}, & J = N_{m} . \end{cases}

Clearly max_{0≤J≤N_m} |N_J − Nh_m| ≤ 1 and hence

{‖ V_{1} - {\hat{V}}_{1} ‖}_{\infty} = max_{0 \leq J \leq N_{m}} | {‖ B_{J, 1} ‖}_{2, N}^{2} - {‖ B_{J, 1} ‖}_{2}^{2} | = max_{0 \leq J \leq N_{m}} | N^{- 1} \sum_{j = 1}^{N} B_{J, 1}^{2} (j / N) - h_{m} | = max_{0 \leq J \leq N_{m}} | N^{- 1} N_{J} - h_{m} | = N^{- 1} max_{0 \leq J \leq N_{m}} | N_{J} - {Nh}_{m} | \leq N^{- 1} .

For p > 1, the B-spline property on page 96 of de Boor (2001) ensures that there exists a constant C_{1,p} > 0 such that

max_{1 - p \leq J, J' \leq N_{m}} max_{1 \leq j \leq N} sup_{x \in [(j - 1) / N, j / N]} | B_{J, p} (j / N) B_{J', p} (j / N) - B_{J, p} (x) B_{J', p} (x) | \leq C_{1, p} N^{- 1} h_{m}^{- 1},

while there exists a constant C_{2,p} > 0 such that $\max_{1-p \leq J, J' \leq N_m} N_{J,J'} \leq C_{2,p} N h_m$, where $N_{J,J'} = \#\{j : 1 \leq j \leq N, B_{J,p}(j/N) B_{J',p}(j/N) > 0\}$. Hence

{‖ V_{p} - {\hat{V}}_{p} ‖}_{\infty} = max_{1 - p \leq J, J' \leq N_{m}} | N^{- 1} \sum_{j = 1}^{N} B_{J, p} (j / N) B_{J', p} (j / N) - \int_{0}^{1} B_{J, p} (x) B_{J', p} (x) dx | \leq max_{1 - p \leq J, J' \leq N_{m}} \sum_{j = 1}^{N} \int_{(j - 1) / N}^{j / N} | B_{J, p} (j / N) B_{J', p} (j / N) - B_{J, p} (x) B_{J', p} (x) | dx \leq C_{2, p} N h_{m} \times N^{- 1} \times C_{1, p} N^{- 1} h_{m}^{- 1} \leq C N^{- 1} .
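The O(N⁻¹) rate for ‖V_p − V̂_p‖_∞ can be checked numerically. This sketch assumes order-p B-splines on the equally spaced interior knots J/(N_m + 1) with the boundary knots 0 and 1 repeated p times, evaluated by the Cox-de Boor recursion; this is a standard convention that we assume matches the basis of Section 2. V_p is computed with a 4-point Gauss-Legendre rule on each knot interval, which is exact for p ≤ 4, and ‖·‖_∞ is taken as the maximum absolute row sum. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

# 4-point Gauss-Legendre rule on [-1, 1]: exact for polynomials of degree <= 7.
GL_NODES = (-0.8611363115940526, -0.3399810435848563, 0.3399810435848563, 0.8611363115940526)
GL_WEIGHTS = (0.3478548451374538, 0.6521451548625461, 0.6521451548625461, 0.3478548451374538)


def knots(n_interior: int, p: int) -> List[float]:
    """Interior knots J/(N_m + 1), J = 1..N_m, with 0 and 1 each repeated p times."""
    inner = [j / (n_interior + 1) for j in range(1, n_interior + 1)]
    return [0.0] * p + inner + [1.0] * p


def bspline_values(x: float, t: List[float], p: int) -> List[float]:
    """Values at x of the N_m + p B-splines of order p (degree p - 1), by Cox-de Boor."""
    m = len(t) - 1
    # Order 1: indicators of [t_i, t_{i+1}); the last nondegenerate interval also contains 1.
    b = [1.0 if (t[i] <= x < t[i + 1]) or (x == t[-1] and t[i] < t[i + 1] == t[-1]) else 0.0
         for i in range(m)]
    for k in range(2, p + 1):
        nxt = []
        for i in range(m - k + 1):
            left = (x - t[i]) / (t[i + k - 1] - t[i]) * b[i] if t[i + k - 1] > t[i] else 0.0
            right = ((t[i + k] - x) / (t[i + k] - t[i + 1]) * b[i + 1]
                     if t[i + k] > t[i + 1] else 0.0)
            nxt.append(left + right)
        b = nxt
    return b


def gram_empirical(n_interior: int, p: int, n_grid: int) -> Matrix:
    """hat V_p with entries N^{-1} sum_{j=1}^{N} B_J(j/N) B_J'(j/N)."""
    t, size = knots(n_interior, p), n_interior + p
    v = [[0.0] * size for _ in range(size)]
    for j in range(1, n_grid + 1):
        b = bspline_values(j / n_grid, t, p)
        for r in range(size):
            if b[r] != 0.0:
                for s in range(size):
                    v[r][s] += b[r] * b[s] / n_grid
    return v


def gram_theoretical(n_interior: int, p: int) -> Matrix:
    """V_p with entries int_0^1 B_J B_J' dx (degree <= 2p - 2 polynomial per knot interval)."""
    if p > 4:
        raise ValueError("the 4-point Gauss-Legendre rule is exact only for p <= 4")
    t, size = knots(n_interior, p), n_interior + p
    v = [[0.0] * size for _ in range(size)]
    for a, c in zip(t[:-1], t[1:]):
        if c <= a:
            continue  # repeated boundary knots span no interval
        half, mid = (c - a) / 2.0, (c + a) / 2.0
        for node, w in zip(GL_NODES, GL_WEIGHTS):
            b = bspline_values(mid + half * node, t, p)
            for r in range(size):
                for s in range(size):
                    v[r][s] += half * w * b[r] * b[s]
    return v


def inf_norm_diff(a: Matrix, b: Matrix) -> float:
    """Maximum absolute row sum of a - b."""
    return max(sum(abs(x - y) for x, y in zip(ra, rb)) for ra, rb in zip(a, b))


n_m = 5  # h_m = 1/6
for p in (1, 2, 4):
    v_p = gram_theoretical(n_m, p)
    for n_grid in (60, 240, 960):
        diff = inf_norm_diff(v_p, gram_empirical(n_m, p, n_grid))
        print(p, n_grid, round(n_grid * diff, 4))  # N * diff should stay bounded
```

For p = 1 the printed values of N‖V₁ − V̂₁‖_∞ should not exceed 1, in line with the bound derived above for that case.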

According to Lemma A.3, for any (N_m + p)-vector γ, ${‖ V_{p}^{- 1} γ ‖}_{\infty} \leq h_{m}^{- 1} {‖ γ ‖}_{\infty}$. Hence ‖V_pγ‖_∞ ≥ h_m‖γ‖_∞. By Assumption (A3), N⁻¹ = o(h_m), so if n is large enough, for any γ one has

{‖ {\hat{V}}_{p} γ ‖}_{\infty} \geq {‖ V_{p} γ ‖}_{\infty} - {‖ V_{p} γ - {\hat{V}}_{p} γ ‖}_{\infty} \geq h_{m} {‖ γ ‖}_{\infty} - O (N^{- 1}) {‖ γ ‖}_{\infty} \geq \frac{h_{m}}{2} {‖ γ ‖}_{\infty} .

Hence ${‖ {\hat{V}}_{p}^{- 1} ‖}_{\infty} \leq 2 h_{m}^{- 1}$ .

To prove the last statement of the lemma, note that for any x ∈ [0, 1] at most (p + 1) of the numbers B_{1−p,p}(x), …, B_{N_m,p}(x) are nonzero, each lying in [0, 1], while the others are 0, so

| \tilde{ϕ} (x) | \leq (p + 1) | {(X^{T} X)}^{- 1} X^{T} ϕ | = (p + 1) | {\hat{V}}_{p}^{- 1} (X^{T} ϕ N^{- 1}) | \leq (p + 1) {‖ {\hat{V}}_{p}^{- 1} ‖}_{\infty} | X^{T} ϕ N^{- 1} | \leq 2 (p + 1) h_{m}^{- 1} | X^{T} I_{N} N^{- 1} | {‖ ϕ ‖}_{\infty}

in which I_N = (1, …, 1)^T. Clearly $|X^T I_N N^{-1}| \leq C h_m$ for some C > 0, hence $|\tilde{ϕ}(x)| \leq 2(p + 1) C ‖ϕ‖_\infty = c_{ϕ,p} ‖ϕ‖_\infty$. Now if $ϕ \in C^{p-1,μ}[0, 1]$ for some μ ∈ (0, 1], let $g \in ℋ^{(p-1)}[0, 1]$ be such that ${‖ g - ϕ ‖}_{\infty} \leq C_{p - 1, μ} {‖ ϕ^{(p - 1)} ‖}_{0, μ} h_{m}^{μ + p - 1}$ according to Theorem 3.1; then g̃ ≡ g since g ∈ ℋ^{(p−1)}[0, 1], hence

{‖ \tilde{ϕ} - ϕ ‖}_{\infty} = {‖ \tilde{ϕ} - \tilde{g} - (ϕ - g) ‖}_{\infty} \leq {‖ \tilde{ϕ} - \tilde{g} ‖}_{\infty} + {‖ ϕ - g ‖}_{\infty} \leq (c_{ϕ, p} + 1) {‖ ϕ - g ‖}_{\infty} \leq (c_{ϕ, p} + 1) C_{p - 1, μ} {‖ ϕ^{(p - 1)} ‖}_{0, μ} h_{m}^{μ + p - 1}

proving (A.2).
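For p = 1 the operator ϕ ↦ ϕ̃ has an explicit form: with piecewise-constant B-splines, X^T X is diagonal, so ϕ̃ on each interval I_J is the average of ϕ(j/N) over the design points in I_J. The sketch below is restricted to this case and uses the hypothetical test function ϕ(x) = sin(2πx). It checks ‖ϕ̃‖_∞ ≤ ‖ϕ‖_∞ and the elementary bound ‖ϕ̃ − ϕ‖_∞ ≤ sup|ϕ′| h_m, which has the h_m^{μ+p−1} rate of (A.2) with p = μ = 1. Function names are illustrative; general p would require solving the banded system with V̂_p.

```python
import math
from typing import Callable, List


def fit_p1(phi: Callable[[float], float], n_interior: int, n_grid: int) -> List[float]:
    """Coefficients of tilde-phi for p = 1 (piecewise-constant B-splines).

    X^T X is diagonal with entries N_J, so (X^T X)^{-1} X^T phi is the vector of
    averages of phi(j/N) over the design points lying in each interval I_J.
    """
    sums = [0.0] * (n_interior + 1)
    counts = [0] * (n_interior + 1)
    for j in range(1, n_grid + 1):
        # Integer arithmetic: j/N lies in I_J with J = floor(j (N_m + 1) / N);
        # the point x = 1 joins the last interval.
        J = min(j * (n_interior + 1) // n_grid, n_interior)
        sums[J] += phi(j / n_grid)
        counts[J] += 1
    if 0 in counts:
        raise ValueError("each interval needs a design point; take N > N_m + 1")
    return [s / c for s, c in zip(sums, counts)]


def evaluate_p1(coefs: List[float], x: float) -> float:
    """tilde-phi(x): the coefficient of the interval containing x."""
    return coefs[min(int(x * len(coefs)), len(coefs) - 1)]


def phi(x: float) -> float:
    return math.sin(2.0 * math.pi * x)


lip = 2.0 * math.pi  # sup |phi'|; the elementary p = 1 bound is lip * h_m
grid = [i / 2000 for i in range(2001)]
for n_m in (4, 9, 19):
    coefs = fit_p1(phi, n_m, n_grid=200)
    err = max(abs(evaluate_p1(coefs, x) - phi(x)) for x in grid)
    print(n_m, round(err, 4), round(lip / (n_m + 1), 4))
```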

Lemma A.5: Under Assumption (A5), for $C_{0} = C_{1} (1 + β C_{2} \sum_{s = 1}^{\infty} s^{β - 1 - γ_{1}})$ and n ≥ 1

max_{1 \leq k \leq κ} E | {\bar{ξ}}_{., k} - {\bar{Z}}_{. k, ξ} | \leq C_{0} n^{β - 1},

(A.3)

max_{1 \leq j \leq N} | {\bar{ε}}_{., j} - {\bar{Z}}_{. j, ε} | = O_{a . s .} (n^{β - 1})

(A.4)

where ${\bar{Z}}_{. k, ξ} = n^{- 1} \sum_{i = 1}^{n} Z_{ik, ξ}, {\bar{Z}}_{. j, ε} = n^{- 1} \sum_{i = 1}^{n} Z_{ij, ε}, 1 \leq j \leq N, 1 \leq k \leq κ$ . Also

max_{1 \leq k \leq κ} E | {\bar{ξ}}_{., k} | \leq n^{- 1 / 2} {(2 / π)}^{1 / 2} + C_{0} n^{β - 1} .

(A.5)

Proof. The proof of (A.4) is trivial. Assumption (A5) entails that $\bar{F}_{n+t,k} < C_2 (n + t)^{-γ_1}$, k = 1, …, κ, t = 0, 1, …, in which ${\bar{F}}_{n + t, k} = P [| \sum_{i = 1}^{n} ξ_{ik} - \sum_{i = 1}^{n} Z_{ik, ξ} | > C_{1} {(n + t)}^{β}]$. Taking expectations, one has

E | \sum_{i = 1}^{n} ξ_{ik} - \sum_{i = 1}^{n} Z_{ik, ξ} | \leq C_{1} {(n + 0)}^{β} + \sum_{t = 1}^{\infty} C_{1} {(n + t)}^{β} ({\bar{F}}_{n + t - 1, k} - {\bar{F}}_{n + t, k}) \leq C_{1} n^{β} + \sum_{t = 0}^{\infty} C_{1} C_{2} {(n + t)}^{- γ_{1}} β {(n + t)}^{β - 1} \leq C_{1} {n^{β} + β C_{2} \sum_{t = 0}^{\infty} {(n + t)}^{β - 1 - γ_{1}}} \leq n^{β} C_{1} [1 + β C_{2} n^{- 1 - γ_{1}} \sum_{s = 1}^{\infty} \sum_{t = sn - n}^{sn - 1} {(1 + t / n)}^{β - 1 - γ_{1}}] \leq n^{β} C_{1} [1 + β C_{2} n^{- 1 - γ_{1}} \times n \sum_{t = 1}^{\infty} t^{β - 1 - γ_{1}}] \leq C_{0} n^{β},

which proves (A.3) upon dividing the above inequalities by n. The fact that $\bar{Z}_{.k,ξ} \sim N(0, 1/n)$ entails that $E|\bar{Z}_{.k,ξ}| = n^{-1/2} (2/π)^{1/2}$, and thus $\max_{1 \leq k \leq κ} E|\bar{ξ}_{.,k}| \leq n^{-1/2} (2/π)^{1/2} + C_0 n^{β-1}$.
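The identity E|Z̄_{.k,ξ}| = n^{−1/2}(2/π)^{1/2} behind (A.5) is easy to check by simulation. The sketch estimates E|ξ̄| for Gaussian draws and, as an assumed non-Gaussian example, for standardized uniform draws, whose mean absolute value should be close to the same quantity for moderate n. The seed, n = 25 and the number of replications are arbitrary choices, and the helper names are hypothetical.

```python
import math
import random


def mean_abs_xbar(draw, n: int, reps: int, rng: random.Random) -> float:
    """Monte Carlo estimate of E|xi_bar| for the mean of n iid draws from `draw`."""
    total = 0.0
    for _ in range(reps):
        xbar = sum(draw(rng) for _ in range(n)) / n
        total += abs(xbar)
    return total / reps


def gauss(rng: random.Random) -> float:
    return rng.gauss(0.0, 1.0)


def centered_uniform(rng: random.Random) -> float:
    # Uniform on [-sqrt(3), sqrt(3)]: mean 0 and variance 1, but not Gaussian.
    root3 = math.sqrt(3.0)
    return rng.uniform(-root3, root3)


n, reps = 25, 10_000
rng = random.Random(2012)
target = math.sqrt(2.0 / math.pi) / math.sqrt(n)  # n^{-1/2} (2/pi)^{1/2}
est_gauss = mean_abs_xbar(gauss, n, reps, rng)
est_unif = mean_abs_xbar(centered_uniform, n, reps, rng)
print(round(target, 4), round(est_gauss, 4), round(est_unif, 4))
```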

Lemma A.6: Assumption (A5) holds under Assumption (A5’).

Proof. Under Assumption (A5’), $E|ξ_{ik}|^{η_1} < +\infty$ with η₁ > 4 and $E|ε_{ij}|^{η_2} < +\infty$ with η₂ > 4 + 2θ, so there exists some β ∈ (0, 1/2) such that η₁ > 2/β and η₂ > (2 + θ)/β.

Now let $H(x) = x^{η_1}$. Then Lemma A.1 entails that there exist constants C_{1k}, C_{2k}, a_k, which depend on the distribution of ξ_ik, such that for $x_{n} = C_{1 k} n^{β}$, $\frac{n}{H (a_{k} x_{n})} = a_{k}^{- η_{1}} C_{1 k}^{- η_{1}} n^{1 - η_{1} β}$, and iid N(0, 1) variables Z_{ik,ξ} such that

P [max_{1 \leq t \leq n} | \sum_{i = 1}^{t} ξ_{ik} - \sum_{i = 1}^{t} Z_{ik, ξ} | > C_{1 k} n^{β}] < C_{2 k} a_{k}^{- η_{1}} C_{1 k}^{- η_{1}} n^{1 - η_{1} β} .

Since η₁ > 2/β, γ₁ = η₁β − 1 > 1. If the number κ of indices k is finite, there are common constants C₁, C₂ > 0 such that $P [{max}_{1 \leq t \leq n} | \sum_{i = 1}^{t} ξ_{ik} - \sum_{i = 1}^{t} Z_{ik, ξ} | > C_{1} n^{β}] < C_{2} n^{- γ_{1}}$, which entails (3) since κ is finite. If κ is infinite but all the ξ_ik's are iid, then C_{1k}, C_{2k}, a_k are the same for all k, so the above is again true.

Likewise, under Assumption (A5’), if one lets $H(x) = x^{η_2}$, Lemma A.1 entails that there exist constants C₁, C₂, a, which depend on the distribution of ε_ij, such that for $x_{n} = C_{1} n^{β}$, $\frac{n}{H (a x_{n})} = a^{- η_{2}} C_{1}^{- η_{2}} n^{1 - η_{2} β}$, and iid N(0, 1) variables Z_{ij,ε} such that

max_{1 \leq j \leq N} P \{max_{1 \leq t \leq n} | \sum_{i = 1}^{t} ε_{ij} - \sum_{i = 1}^{t} Z_{ij, ε} | > C_{1} n^{β}\} \leq C_{2} a^{- η_{2}} C_{1}^{- η_{2}} n^{1 - η_{2} β},

Now η₂β > 2 + θ implies that there is γ₂ > 1 such that η₂β − 1 > γ₂ + θ, and (4) follows.
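The choice of β in the proof above is elementary arithmetic: the moment conditions η₁ > 4 and η₂ > 4 + 2θ guarantee that max(2/η₁, (2 + θ)/η₂) < 1/2, so any β strictly between these two values works. The sketch below picks midpoints of the admissible intervals, assuming θ ≥ 0; the function name and the example values η₁ = 6, η₂ = 8, θ = 0.5 are hypothetical, and for them it gives β = 5/12 and γ₁ = 1.5.

```python
from typing import Tuple


def choose_rates(eta1: float, eta2: float, theta: float) -> Tuple[float, float, float]:
    """Pick beta in (0, 1/2) with eta1 > 2/beta and eta2 > (2 + theta)/beta, then gamma1, gamma2.

    Midpoint choices are arbitrary: any point of the open intervals works.
    """
    if not (eta1 > 4.0 and eta2 > 4.0 + 2.0 * theta):
        raise ValueError("need eta1 > 4 and eta2 > 4 + 2 theta, as in Assumption (A5')")
    lower = max(2.0 / eta1, (2.0 + theta) / eta2)  # < 1/2 under the moment conditions
    beta = 0.5 * (lower + 0.5)
    gamma1 = eta1 * beta - 1.0                     # > 1 because eta1 * beta > 2
    upper2 = eta2 * beta - 1.0 - theta             # > 1 because eta2 * beta > 2 + theta
    gamma2 = 0.5 * (1.0 + upper2)                  # any value in (1, upper2)
    return beta, gamma1, gamma2


beta, g1, g2 = choose_rates(6.0, 8.0, 0.5)
print(round(beta, 4), round(g1, 4), round(g2, 4))
```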

Proof of Proposition 3.2. Applying (A.2), ${‖ {\tilde{m}}_{p} - m ‖}_{\infty} \leq C_{p - 1, 1} h_{m}^{p}$ . Since Assumption (A3) implies that $O (h_{m}^{p} n^{1 / 2}) = o (1)$ , equation (10) is proved.

Proof of Proposition 3.3.

Denote $\tilde{Z}_{p,ε}(x) = \{B_{1-p,p}(x), \dots, B_{N_m,p}(x)\} (X^T X)^{-1} X^T Z$, where $Z = (σ(1/N) \bar{Z}_{.1,ε}, \dots, σ(N/N) \bar{Z}_{.N,ε})^T$. By (A.4), one has $‖Z - e‖_\infty = O_{a.s.}(n^{β-1})$, while

{‖ N^{- 1} X^{T} (Z - e) ‖}_{\infty} \leq {‖ Z - e ‖}_{\infty} max_{1 - p \leq J \leq N_{m}} {〈 B_{J, p}, 1 〉}_{2, N} \leq C {‖ Z - e ‖}_{\infty} max_{1 - p \leq J \leq N_{m}} # {j : B_{J, p} (j / N) > 0} N^{- 1} \leq C {‖ Z - e ‖}_{\infty} h_{m} .

Also for any fixed x ∈ [0, 1], one has

{‖ {\tilde{Z}}_{p, ε} (x) - {\tilde{e}}_{p} (x) ‖}_{\infty} = {‖ {B_{1 - p, p} (x), \dots, B_{N_{m}, p} (x)} {\hat{V}}_{p}^{- 1} N^{- 1} X^{T} (Z - e) ‖}_{\infty} \leq C {‖ {\hat{V}}_{p}^{- 1} ‖}_{\infty} {‖ Z - e ‖}_{\infty} h_{m} = O_{a . s .} (n^{β - 1}) .

Note next that the random vector ${\hat{V}}_{p}^{- 1} N^{- 1} X^{T} Z$ is $(N_{m} + p)$-dimensional normal with covariance matrix $N^{- 2} {\hat{V}}_{p}^{- 1} X^{T} var (Z) X {\hat{V}}_{p}^{- 1}$, bounded above by

max_{x \in [0, 1]} σ^{2} (x) {‖ n^{- 1} N^{- 1} {\hat{V}}_{p}^{- 1} {\hat{V}}_{p} {\hat{V}}_{p}^{- 1} ‖}_{\infty} \leq {CN}^{- 1} n^{- 1} {‖ {\hat{V}}_{p}^{- 1} ‖}_{\infty} \leq {CN}^{- 1} n^{- 1} h_{m}^{- 1},

bounding the tail probabilities of the entries of ${\hat{V}}_{p}^{- 1} N^{- 1} X^{T} Z$ and applying the Borel-Cantelli lemma leads to

{‖ {\hat{V}}_{p}^{- 1} N^{- 1} X^{T} Z ‖}_{\infty} = O_{a . s .} (N^{- 1 / 2} n^{- 1 / 2} h_{m}^{- 1 / 2} {log}^{1 / 2} (N_{m} + p)) = O_{a . s .} (N^{- 1 / 2} n^{- 1 / 2} h_{m}^{- 1 / 2} {log}^{1 / 2} n) .

Hence, ${sup}_{x \in [0, 1]} | n^{1 / 2} {\tilde{Z}}_{p, ε} (x) | = O_{a . s .} (N^{- 1 / 2} h_{m}^{- 1 / 2} {log}^{1 / 2} n)$ and

sup_{x \in [0, 1]} | n^{1 / 2} {\tilde{e}}_{p} (x) | = O_{a . s .} (n^{β - 1 / 2} + N^{- 1 / 2} h_{m}^{- 1 / 2} {log}^{1 / 2} n) = o_{a . s .} (1) .

Thus (11) holds according to Assumption (A3).

Proof of Proposition 3.4.

We denote $\tilde{ζ}_k(x) = \bar{Z}_{.k,ξ} ϕ_k(x)$, k = 1, …, κ, and define

\tilde{ζ} (x) = n^{1 / 2} {[\sum_{k = 1}^{κ} {ϕ_{k} (x)}^{2}]}^{- 1 / 2} \sum_{k = 1}^{κ} {\tilde{ζ}}_{k} (x) = n^{1 / 2} G {(x, x)}^{- 1 / 2} \sum_{k = 1}^{κ} {\tilde{ζ}}_{k} (x) .

It is clear that ζ̃(x) is a Gaussian process with mean 0, variance 1 and covariance $E\tilde{ζ}(x)\tilde{ζ}(x') = G(x, x)^{-1/2} G(x', x')^{-1/2} G(x, x')$ for any x, x′ ∈ [0, 1]. Thus {ζ̃(x), x ∈ [0, 1]} has the same distribution as {ζ(x), x ∈ [0, 1]}.
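The covariance formula for ζ̃ can be checked by simulation. The sketch assumes κ = 2 and the hypothetical components ϕ₁(x) = cos(πx) and ϕ₂(x) = 0.5 sin(πx), so that G(x, x′) = Σ_k ϕ_k(x)ϕ_k(x′). Since the n^{1/2}Z̄_{.k,ξ} are independent N(0, 1), n cancels and ζ̃ can be drawn from standard normals directly. The empirical variance at one point and the covariance between two points should be close to 1 and G(x, x′){G(x, x)G(x′, x′)}^{−1/2}. The helper names are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence

Func = Callable[[float], float]


def cov_g(phis: Sequence[Func], x: float, y: float) -> float:
    """G(x, y) = sum_k phi_k(x) phi_k(y)."""
    return sum(f(x) * f(y) for f in phis)


def zeta_cov(phis: Sequence[Func], x: float, y: float) -> float:
    """Covariance of zeta-tilde: G(x, y) {G(x, x) G(y, y)}^{-1/2}."""
    return cov_g(phis, x, y) / math.sqrt(cov_g(phis, x, x) * cov_g(phis, y, y))


def draw_zeta(phis: Sequence[Func], xs: Sequence[float], rng: random.Random) -> List[float]:
    """One path of zeta-tilde at the points xs.

    n^{1/2} Zbar_{.k,xi} are iid N(0, 1), so they are drawn directly and n cancels.
    """
    z = [rng.gauss(0.0, 1.0) for _ in phis]
    return [sum(zk * f(x) for zk, f in zip(z, phis)) / math.sqrt(cov_g(phis, x, x))
            for x in xs]


# Hypothetical components (not those of model (16)).
phis = [lambda x: math.cos(math.pi * x), lambda x: 0.5 * math.sin(math.pi * x)]
xs = (0.2, 0.7)
rng = random.Random(7)
paths = [draw_zeta(phis, xs, rng) for _ in range(20_000)]
var_hat = sum(p[0] ** 2 for p in paths) / len(paths)
cov_hat = sum(p[0] * p[1] for p in paths) / len(paths)
print(round(var_hat, 3), round(cov_hat, 3), round(zeta_cov(phis, *xs), 3))
```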

Using Lemma A.4, one obtains that

{‖ {\tilde{ϕ}}_{k} ‖}_{\infty} \leq c_{ϕ, p} {‖ ϕ_{k} ‖}_{\infty}, {‖ {\tilde{ϕ}}_{k} - ϕ_{k} ‖}_{\infty} \leq {\tilde{C}}_{0, μ} {‖ ϕ_{k} ‖}_{0, μ} h_{m}^{μ}, 1 \leq k \leq κ .

(A.6)

Applying the above (A.6), (A.5) and Assumptions (A3), (A4), one has

E\, n^{1 / 2} {sup}_{x \in [0, 1]} G {(x, x)}^{- 1 / 2} | \sum_{k = 1}^{κ} {\bar{ξ}}_{. k} \{ϕ_{k} (x) - {\tilde{ϕ}}_{k} (x)\} | \leq C n^{1 / 2} \{\sum_{k = 1}^{κ_{n}} E | {\bar{ξ}}_{. k} | {‖ ϕ_{k} ‖}_{0, μ} h_{m}^{μ} + \sum_{k = κ_{n} + 1}^{κ} E | {\bar{ξ}}_{. k} | {‖ ϕ_{k} ‖}_{\infty}\} \leq C \{\sum_{k = 1}^{κ_{n}} {‖ ϕ_{k} ‖}_{0, μ} h_{m}^{μ} + \sum_{k = κ_{n} + 1}^{κ} {‖ ϕ_{k} ‖}_{\infty}\} = o (1),

hence

n^{1 / 2} {sup}_{x \in [0, 1]} G {(x, x)}^{- 1 / 2} | \sum_{k = 1}^{κ} {\bar{ξ}}_{. k} {ϕ_{k} (x) - {\tilde{ϕ}}_{k} (x)} | = o_{P} (1) .

(A.7)

In addition, (A.3) and Assumptions (A3), (A4) entail that

E\, n^{1 / 2} {sup}_{x \in [0, 1]} G {(x, x)}^{- 1 / 2} | \sum_{k = 1}^{κ} ({\bar{Z}}_{. k, ξ} - {\bar{ξ}}_{. k}) ϕ_{k} (x) | \leq C n^{β - 1 / 2} \sum_{k = 1}^{κ} {‖ ϕ_{k} ‖}_{\infty} = o (1),

hence

n^{1 / 2} {sup}_{x \in [0, 1]} G {(x, x)}^{- 1 / 2} | \sum_{k = 1}^{κ} ({\bar{Z}}_{. k, ξ} - {\bar{ξ}}_{. k}) ϕ_{k} (x) | = o_{P} (1) .

(A.8)

Note that

\begin{matrix} \bar{m} (x) - m (x) - {\tilde{ξ}}_{p} (x) & = \sum_{k = 1}^{κ} {\bar{ξ}}_{. k} {ϕ_{k} (x) - {\tilde{ϕ}}_{k} (x)}, \\ n^{- 1 / 2} G {(x, x)}^{1 / 2} \tilde{ζ} (x) - {\bar{m} (x) - m (x)} & = \sum_{k = 1}^{κ} ({\bar{Z}}_{. k, ξ} - {\bar{ξ}}_{. k}) ϕ_{k} (x) \end{matrix}

hence

\begin{matrix} n^{1 / 2} sup_{x \in [0, 1]} G {(x, x)}^{- 1 / 2} | \bar{m} (x) - m (x) - {\tilde{ξ}}_{p} (x) | = o_{P} (1), \\ {sup}_{x \in [0, 1]} | \tilde{ζ} (x) - n^{1 / 2} G {(x, x)}^{- 1 / 2} {\bar{m} (x) - m (x)} | = o_{P} (1) . \end{matrix}

according to (A.7) and (A.8), which leads to both (12) and (13).

References

1. Benko M, Härdle W, Kneip A. Common functional principal components. Annals of Statistics. 2009;37:1–34.
2. Bunea F, Ivanescu AE, Wegkamp M. Adaptive inference for the mean of a Gaussian process in functional data. Journal of the Royal Statistical Society, Series B. 2011. Forthcoming.
3. Cardot H. Nonparametric estimation of smoothed principal components analysis of sampled noisy functions. Journal of Nonparametric Statistics. 2000;12:503–538.
4. Csőrgő M, Révész P. Strong Approximations in Probability and Statistics. New York-London: Academic Press; 1981.
5. Cuevas A, Febrero M, Fraiman R. On the use of the bootstrap for estimating functions with functional data. Computational Statistics and Data Analysis. 2006;51:1063–1074.
6. de Boor C. A Practical Guide to Splines. New York: Springer-Verlag; 2001.
7. Degras DA. Simultaneous confidence bands for nonparametric regression with functional data. Statistica Sinica. 2011;21:1735–1765.
8. DeVore R, Lorentz G. Constructive approximation: polynomials and splines approximation. Berlin: Springer-Verlag; 1993.
9. Fan J, Lin S-K. Tests of significance when data are curves. Journal of the American Statistical Association. 1998;93:1007–1021.
10. Ferraty F, Vieu P. Nonparametric Functional Data Analysis: Theory and Practice. Berlin: Springer Series in Statistics, Springer; 2006.
11. Hall P, Müller HG, Wang JL. Properties of principal component methods for functional and longitudinal data analysis. Annals of Statistics. 2006;34:1493–1517.
12. Huang J, Yang L. Identification of nonlinear additive autoregressive models. Journal of the Royal Statistical Society, Series B. 2004;66:463–477.
13. Li Y, Hsing T. Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data. Annals of Statistics. 2010a;38:3321–3351.
14. Li Y, Hsing T. Deciding the dimension of effective dimension reduction space for functional and high-dimensional data. Annals of Statistics. 2010b;38:3028–3062.
15. Li B, Yu Q. Classification of functional data: a segmentation approach. Computational Statistics and Data Analysis. 2008;52:4790–4800.
16. Ma S, Yang L, Carroll RJ. A simultaneous confidence band for sparse longitudinal data. Statistica Sinica. 2011. In press. doi: 10.5705/ss.2010.034.
17. Ramsay JO, Silverman BW. Functional Data Analysis. Second Edition. New York: Springer Series in Statistics, Springer; 2005.
18. Rice JA, Wu CO. Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics. 2001;57:253–259. doi: 10.1111/j.0006-341x.2001.00253.x.
19. Wang J, Yang L. Polynomial spline confidence bands for regression curves. Statistica Sinica. 2009a;19:325–342.
20. Wang L, Yang L. Spline estimation of single index model. Statistica Sinica. 2009b;19:765–783.
21. Xue L, Yang L. Additive coefficient modelling via polynomial spline. Statistica Sinica. 2006;16:1423–1446.
22. Yao F, Müller HG, Wang JL. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association. 2005;100:577–590.
23. Zhao Z, Wu W. Confidence bands in nonparametric time series regression. Annals of Statistics. 2008;36:1854–1878.
24. Zhou S, Shen X, Wolfe DA. Local asymptotics of regression splines and confidence regions. Annals of Statistics. 1998;26:1760–1782.
25. Zhou Z, Wu W. Simultaneous inference of linear models with time varying coefficients. Journal of the Royal Statistical Society, Series B. 2010;72:513–531.
