Summary
We propose a method of effective dimension reduction for functional data, emphasizing the sparse design where one observes only a few noisy and irregular measurements for some or all of the subjects. The proposed method borrows strength across the entire sample and provides a way to characterize the effective dimension reduction space, via functional cumulative slicing. Our theoretical study reveals a bias-variance trade-off associated with the regularizing truncation and decaying structures of the predictor process and the effective dimension reduction space. A simulation study and an application illustrate the superior finite-sample performance of the method.
Keywords: Cumulative slicing, Effective dimension reduction, Inverse regression, Sparse functional data
1. Introduction
In functional data analysis, one is often interested in how a scalar response varies with a smooth trajectory X(t), where t is an index variable defined on a closed interval τ; see Ramsay & Silverman (2005). To be specific, one seeks to model the relationship Y = M(X; ϵ), where M is a smooth functional and the error process ϵ has zero mean and finite variance σ2 and is independent of X. Although modelling M parametrically can be restrictive in many applications, modelling M nonparametrically is infeasible in practice due to slow convergence rates associated with the curse of dimensionality. Therefore a class of semiparametric index models has been proposed to approximate M(X; ϵ) with an unknown link function g; that is,
Y = g(⟨β1, X⟩, … , ⟨βK, X⟩; ϵ),   (1)
where K is the reduced dimension of the model, β1, … , βK are linearly independent index functions, and ⟨u, v⟩ = ∫ u(t)v(t) dt is the usual L2 inner product. The functional linear model Y = β0 + ∫ β1(t)X(t) dt + ϵ is a special case and has been studied extensively (Cardot et al., 1999; Müller & Stadtmüller, 2005; Yao et al., 2005b; Cai & Hall, 2006; Hall & Horowitz, 2007; Yuan & Cai, 2010).
In this article, we tackle the index model (1) from the perspective of effective dimension reduction, in the sense that the K linear projections ⟨β1, X⟩, … , ⟨βK , X⟩ form a sufficient statistic. This is particularly useful when the process X is infinite-dimensional. Our primary goal is to discuss dimension reduction for functional data, especially when the trajectories are corrupted by noise and are sparsely observed with only a few observations for some, or even all, of the subjects. Pioneered by Li (1991) for multivariate data, effective dimension reduction methods are typically link-free, requiring neither specification nor estimation of the link function (Duan & Li, 1991), and aim to characterize the K-dimensional effective dimension reduction space SY∣X = span(β1, … , βK ) onto which X is projected. Such index functions βk are called effective dimension reduction directions, K is the structural dimension, and SY∣X is also known as the central subspace (Cook, 1998). Li (1991) characterized SY∣X via the inverse mean E(X ∣ Y) by sliced inverse regression, which has motivated much work for multivariate data. For instance, Cook & Weisberg (1991) estimated var(X ∣ Y), Li (1992) dealt with the Hessian matrix of the regression curve, Xia et al. (2002) proposed minimum average variance estimation as an adaptive approach based on kernel methods, Chiaromonte et al. (2002) modified sliced inverse regression for categorical predictors, Li & Wang (2007) worked with empirical directions, and Zhu et al. (2010) proposed cumulative slicing estimation to improve on sliced inverse regression.
The literature on effective dimension reduction for functional data is relatively sparse. Ferré & Yao (2003) proposed functional sliced inverse regression for completely observed functional data, and Li & Hsing (2010) developed sequential χ2 testing procedures to decide the structural dimension of functional sliced inverse regression. Apart from effective dimension reduction approaches, James & Silverman (2005) estimated the index and link functions jointly for an additive form of model (1), assuming that the trajectories are densely or completely observed and that the index and link functions are elements of a finite-dimensional spline space. Chen et al. (2011) estimated the index and additive link functions nonparametrically and relaxed the finite-dimensional assumption for theoretical analysis but retained the dense design.
Jiang et al. (2014) proposed an inverse regression method for sparse functional data by estimating the conditional mean E{X(t) ∣ Y = y} with a two-dimensional smoother applied to pooled observed values of X in a local neighbourhood of (t, y). The computation associated with a two-dimensional smoother is considerable and further increased by the need to select two different bandwidths. In contrast, we aim to estimate the effective dimension reduction space by drawing inspiration from cumulative slicing for multivariate data (Zhu et al., 2010). When adapted to the functional setting, cumulative slicing offers a novel and computationally simple way of borrowing strength across subjects to handle sparsely observed trajectories. This advantage has not been exploited elsewhere. As we will demonstrate later, although extending cumulative slicing to completely observed functional data is straightforward, our proposal adopts a different strategy for the sparse design via a one-dimensional smoother, making potentially more effective use of the data.
2. Methodology
2·1. Dimension reduction for functional data
Let τ be a compact interval, and let X be a random variable defined on the real separable Hilbert space H = L2(τ) endowed with inner product ⟨u, v⟩ = ∫τ u(t)v(t) dt and norm ‖f‖ = ⟨f, f⟩1/2. We assume that:
Assumption 1
X is centred and has a finite fourth moment, ∫τ E{X4(t)} dt < ∞.
Under Assumption 1, the covariance surface of X is ∑(s, t) = E{X(s)X(t)}, which generates a Hilbert–Schmidt operator ∑ on H that maps f to (∑f)(t) = ∫τ ∑(s, t) f (s) ds. This operator can be written succinctly as ∑ = E(X ⊗ X), where the tensor product u ⊗ v denotes the rank-one operator on H that maps w to (u ⊗ v)w = ⟨u, w⟩v. By Mercer’s theorem, ∑ admits a spectral decomposition ∑ = ∑j≥1 αj ϕj ⊗ ϕj, where the eigenfunctions {ϕj}j=1,2,… form a complete orthonormal system in H and the eigenvalues {αj}j=1,2,… are strictly decreasing and positive such that ∑j≥1 αj < ∞. Finally, recall that the effective dimension reduction directions β1, … , βK in model (1) are linearly independent functions in H, and the response is assumed to be conditionally independent of X given the K projections ⟨β1, X⟩, … , ⟨βK , X⟩.
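For readers who prefer a computational view, the following is a minimal numerical sketch of the covariance operator and its spectral decomposition on a discretized grid. It is not part of the proposed method: it assumes fully observed curves on a regular grid, and the variable names (grid, apply_Sigma, phi_hat) and the simulated example are illustrative only.

```python
import numpy as np

# Illustrative sketch only: discretize the covariance operator of fully observed
# curves on a regular grid and recover eigenvalue/eigenfunction pairs, mimicking
# Sigma = E(X ⊗ X) and its spectral decomposition.
rng = np.random.default_rng(0)
grid = np.linspace(0, 10, 101)            # grid points on the interval tau
dt = grid[1] - grid[0]

# simulate centred curves X_i(t) = sum_j xi_ij * phi_j(t) (same basis as Section 4)
phi = np.array([(np.sin(np.pi * grid * j / 5) if j % 2 == 0 else
                 np.cos(np.pi * grid * j / 5)) / np.sqrt(5) for j in range(1, 21)])
scores = rng.normal(scale=np.arange(1, 21.0) ** -0.75, size=(200, 20))
X = scores @ phi                           # n x grid matrix of trajectories

Sigma = X.T @ X / X.shape[0]               # covariance surface Sigma(s, t) on the grid

def apply_Sigma(f):
    """Integral operator action: (Sigma f)(t) = integral of Sigma(s, t) f(s) ds."""
    return Sigma @ f * dt                  # Sigma is symmetric, so Sigma @ f suffices

# discretized eigenproblem; eigenfunctions are orthonormal in the L2 inner product
evals, evecs = np.linalg.eigh(Sigma * dt)
order = np.argsort(evals)[::-1]
alpha = evals[order]                       # eigenvalues alpha_1 >= alpha_2 >= ...
phi_hat = evecs[:, order] / np.sqrt(dt)    # columns: eigenfunctions phi_j on the grid
```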
Zhu et al. (2010) observed that for a fixed ỹ, using the two slices {Y ≤ ỹ} and {Y > ỹ} would maximize the use of data and minimize the variability in each slice. The kernel of the sliced inverse regression operator var{E(X ∣ Y)} is estimated by the two-slice version Λ0(s, t; ỹ) = m(s, ỹ)m(t, ỹ), where m(t, y) = E{X(t)1(Y ≤ y)} is an unconditional expectation, in contrast to the conditional expectation E{X(t) ∣ Y} of functional sliced inverse regression. Since Λ0 with a fixed ỹ spans at most one direction of SY∣X, it is necessary to combine all possible estimates of Λ0 by letting ỹ run across the support of Ỹ, an independent copy of Y. Therefore, the kernel of the proposed functional cumulative slicing is
Λ(s, t) = E{m(s, Ỹ)m(t, Ỹ)w(Ỹ)},   (2)
where w is a known nonnegative weight function. Denote the corresponding integral operator of Λ(s, t) also by Λ. The following theorem establishes the validity of our proposal. Analogous to the multivariate case, a linearity assumption is needed.
Assumption 2
For any function b ∈ H, there exist constants c0, c1, … , cK such that E(⟨b, X⟩ ∣ ⟨β1, X⟩, … , ⟨βK, X⟩) = c0 + c1⟨β1, X⟩ + ⋯ + cK⟨βK, X⟩.
This assumption is satisfied when X has an elliptically contoured distribution, which is more general than, but has a close connection to, a Gaussian process (Cambanis et al., 1981; Li & Hsing, 2010).
Theorem 1
If Assumptions 1 and 2 hold for model (1), then the linear space spanned by {m(·, y) : y ∈ ℝ} is contained in the linear space spanned by {∑β1, … , ∑βK}, i.e., span{m(·, y) : y ∈ ℝ} ⊆ span(∑β1, … , ∑βK).
An important observation from Theorem 1 is that for any b ∈ H orthogonal to the space spanned by {∑β1, … , ∑βK} and for any x ∈ H, we have ⟨b, Λx⟩ = 0, implying that range(Λ) ⊆ span(∑β1, … , ∑βK). If Λ has K nonzero eigenvalues, the space spanned by its eigenfunctions is precisely span(∑β1, … , ∑βK). Recall that our goal is to estimate the central subspace SY∣X, even though the effective dimension reduction directions themselves are not identifiable. For specificity, we regard the eigenfunctions of ∑−1Λ associated with the K largest nonzero eigenvalues as the index functions β1, … , βK, unless stated otherwise.
As the covariance operator ∑ is Hilbert–Schmidt, it is not invertible when defined from H to H. Similarly to Ferré & Yao (2005), let R∑ denote the range of ∑, and let N∑⊥ denote the orthogonal complement of its null space. Then ∑ is a one-to-one mapping from N∑⊥ onto R∑, with inverse ∑−1 defined on R∑. This is reminiscent of finding a generalized inverse of a matrix. Let ξj = ⟨X, ϕj⟩ denote the jth principal component, or generalized Fourier coefficient, of X, and assume that:
Assumption 3
.
Proposition 1
Under Assumptions 1–3, the eigenspace associated with the K nonnull eigenvalues of ∑−1 Λ is well-defined in H.
This is a direct analogue of Theorem 4.8 in He et al. (2003) and Theorem 2.1 in Ferré & Yao (2005).
2·2. Functional cumulative slicing for sparse functional data
For the data {(Xi, Yi) : i = 1, … , n}, independent and identically distributed as (X, Y), the predictor trajectories Xi are observed intermittently, contaminated with noise, and collected in the form of repeated measurements {(Tij, Uij) : i = 1, … , n; j = 1, … , Ni}, where Uij = Xi(Tij) + εij with measurement errors εij that are independent and identically distributed as ε with zero mean and constant variance σ2, and independent of all other random variables. When only a few observations are available for some or even all subjects, individual smoothing to recover Xi is infeasible and one must pool data across subjects for consistent estimation.
To estimate the functional cumulative slicing kernel Λ in (2), the key quantity is the unconditional mean m(t, y) = E{X(t)1(Y ≤ y)}. For sparsely and irregularly observed Xi, cross-sectional estimation as used in multivariate cumulative slicing is inapplicable. To maximize the use of available data, we propose to pool the repeated measurements across subjects via a scatterplot smoother, which works in conjunction with the strategy of cumulative slicing. We use a local linear estimator (Fan & Gijbels, 1996), minimizing
∑i=1,…,n ∑j=1,…,Ni K1{(Tij − t)/h1}{Uij1(Yi ≤ y) − a0 − a1(Tij − t)}2   (3)
with respect to (a0, a1), where K1 is a nonnegative and symmetric univariate kernel density and h1 = h1(n) is the bandwidth controlling the amount of smoothing; the minimizing intercept gives the estimate m̂(t, y) = â0(t, y). We ignore the dependence among data from the same individual (Lin & Carroll, 2000) and use leave-one-curve-out crossvalidation to select h1 (Rice & Silverman, 1991). Then an estimator of the kernel function Λ(s, t) is its sample moment
Λ̂(s, t) = n−1 ∑i=1,…,n m̂(s, Yi)m̂(t, Yi)w(Yi).   (4)
The distinction between our method and that of Jiang et al. (2014) lies in the inverse function m(t, y) used to characterize the effective dimension reduction space. It is notable that (3) is a univariate smoother that includes the effective data satisfying {Tij ∈ (t − h1, t + h1), Yi ≤ y}, roughly at an order of (nh1)1/2 for estimating m(t, y) = E{X(t)1(Y ≤ y)} for a sparse design with E(Nn) < ∞, where Nn is the number of repeated observations per subject. By contrast, equation (2·4) in Jiang et al. (2014) uses the data satisfying {Tij ∈ (t − ht, t + ht), Yi ∈ (y − hy, y + hy)} for estimating m(t, y) = E{X(t) ∣ Y = y}, roughly at an order of (nhthy)1/2. This is reflected in the faster convergence of the estimated operator Λ̂ compared with its counterpart in Jiang et al. (2014), indicating potentially more effective usage of the data based on univariate smoothing. The computation associated with a two-dimensional smoother is considerable and further exacerbated by the need to select two different bandwidths ht and hy.
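To make the estimation steps in (3) and (4) concrete, here is a minimal sketch of the pooled univariate smoother and the sample-moment kernel, assuming w ≡ 1. The Gaussian kernel, the flattened array layout, and the names local_linear_m and fcs_kernel are illustrative choices rather than part of the formal proposal, and boundary safeguards are omitted.

```python
import numpy as np

def local_linear_m(t, y, T_pool, U_pool, Y_pool, h1):
    """Pooled local linear estimate of m(t, y) = E{X(t) 1(Y <= y)}, as in (3).

    T_pool, U_pool, Y_pool are the observation times, noisy measurements and
    responses flattened across subjects (Y_i repeated for its N_i measurements).
    """
    w = np.exp(-0.5 * ((T_pool - t) / h1) ** 2)   # kernel K1; Gaussian for illustration
    resp = U_pool * (Y_pool <= y)                 # U_ij 1(Y_i <= y)
    d = T_pool - t
    # weighted least squares for (a0, a1); the intercept a0 is the estimate
    S0, S1, S2 = w.sum(), (w * d).sum(), (w * d ** 2).sum()
    R0, R1 = (w * resp).sum(), (w * d * resp).sum()
    return (S2 * R0 - S1 * R1) / (S0 * S2 - S1 ** 2)

def fcs_kernel(grid, Y, T_pool, U_pool, Y_pool, h1):
    """Estimate Lambda(s, t) on grid x grid by the sample moment (4) with w = 1."""
    m_hat = np.array([[local_linear_m(t, yi, T_pool, U_pool, Y_pool, h1)
                       for t in grid] for yi in Y])   # m_hat[i, k] = m-hat(grid[k], Y_i)
    return m_hat.T @ m_hat / len(Y)                   # Lambda-hat on the grid
```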
For the covariance operator ∑, following Yao et al. (2005a), denote the observed raw covariances by Gi(Tij, Til) = UijUil. Since E{Gi(Tij, Til) ∣ Tij, Til} = cov{X(Tij), X(Til)} + σ2δjl, where δjl is 1 if j = l and 0 otherwise, the diagonal of the raw covariances should be removed. Solving
∑i=1,…,n ∑1≤j≠l≤Ni K2{(Tij − s)/h2, (Til − t)/h2}{Gi(Tij, Til) − a0 − a1(Tij − s) − a2(Til − t)}2   (5)
yields ∑̂(s, t) = â0(s, t), where the minimization is over (a0, a1, a2), K2 is a nonnegative bivariate kernel density and h2 = h2(n) is the bandwidth chosen by leave-one-curve-out crossvalidation; see Yao et al. (2005a) for details on the implementation. Since the inverse operator ∑−1 is unbounded, we regularize by projection onto a truncated subspace. To be precise, let sn be a possibly divergent sequence and let πsn and π̂sn denote the orthogonal projectors onto the eigensubspaces associated with the sn largest eigenvalues of ∑ and ∑̂, respectively. Then ∑sn = πsn∑πsn and ∑̂sn = π̂sn∑̂π̂sn are two sequences of finite-rank operators converging to ∑ and ∑̂ as n → ∞, with bounded inverses ∑sn−1 and ∑̂sn−1, respectively. Finally, we obtain the eigenfunctions associated with the K largest nonzero eigenvalues of ∑̂sn−1Λ̂ as the estimates β̂1, … , β̂K of the effective dimension reduction directions.
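The final step can be sketched as follows, assuming that ∑̂ and Λ̂ have already been evaluated on a common regular grid (for example, by smoothers such as those sketched above). The quadrature by a grid spacing dt, the eigen-sorting, and the normalization are illustrative implementation choices, not a definitive implementation.

```python
import numpy as np

def edr_directions(Sigma_hat, Lambda_hat, dt, s_n, K):
    """Sketch: estimate beta_1, ..., beta_K as eigenfunctions of the truncated
    operator Sigma_hat_{s_n}^{-1} Lambda_hat, with both kernels given on a grid."""
    # spectral decomposition of the discretized covariance operator
    evals, evecs = np.linalg.eigh(Sigma_hat * dt)
    order = np.argsort(evals)[::-1][:s_n]
    alpha = evals[order]                           # leading s_n eigenvalues
    phi = evecs[:, order] / np.sqrt(dt)            # corresponding eigenfunctions

    # truncated inverse: Sigma_{s_n}^{-1} = sum_{j <= s_n} alpha_j^{-1} phi_j (x) phi_j
    Sigma_inv = (phi / alpha) @ phi.T              # kernel on grid x grid

    # discretized operator Sigma_{s_n}^{-1} Lambda_hat acting on grid functions
    M = Sigma_inv @ Lambda_hat * dt * dt
    eigvals, eigvecs = np.linalg.eig(M)            # generally a non-symmetric product
    top = np.argsort(np.abs(eigvals))[::-1][:K]
    betas = np.real(eigvecs[:, top]).T             # K x grid estimated directions
    # directions are identified only up to scale; normalize to unit L2 norm
    return betas / np.sqrt((betas ** 2).sum(axis=1, keepdims=True) * dt)
```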
The situation for completely observed Xi is similar to the multivariate case and considerably simpler. The quantities m(t, y) and ∑(s, t) are easily estimated by their respective sample moments m̂(t, y) = n−1 ∑i=1,…,n Xi(t)1(Yi ≤ y) and ∑̂(s, t) = n−1 ∑i=1,…,n Xi(s)Xi(t), while the estimate of Λ remains the same as (4). For densely observed Xi, individual smoothing can be used as a pre-processing step to recover smooth trajectories, and the estimation error introduced in this step can be shown to be asymptotically negligible under certain design conditions, i.e., it is equivalent to the ideal situation of completely observed Xi (Hall et al., 2006).
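For completeness, these sample moments admit one-line implementations; the sketch below assumes the fully observed curves are stored as an n × grid matrix, with illustrative function names.

```python
import numpy as np

def m_hat_complete(X, Y, y):
    """Sample moment of m(t, y) = E{X(t) 1(Y <= y)}; X is an n x grid matrix of curves."""
    return (X * (Y[:, None] <= y)).mean(axis=0)

def sigma_hat_complete(X):
    """Sample covariance surface Sigma(s, t) for fully observed, centred curves."""
    return X.T @ X / X.shape[0]
```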
For small values of y, the estimate m̂(·, y) obtained from (3) may be unstable due to the smaller number of pooled observations in the slice {Yi ≤ y}. A suitable weight function w may be used to refine the estimator Λ̂. In our numerical studies, the naive choice w ≡ 1 performed fairly well compared to other methods. Analogous to the multivariate case, choosing an optimal w remains an open question.
Ferré & Yao (2005) avoided inverting ∑ with the claim that for a finite-rank operator Λ, range(Λ−1∑) = range(∑−1 Λ); however, Cook et al. (2010) showed that this requires more stringent conditions that are not easily fulfilled.
The selection of K and sn deserves further study. For selecting the structural dimension K, the only relevant work to date is Li & Hsing (2010), where sequential χ2 tests are used to determine K for the method of Ferré & Yao (2003). How to extend such tests to sparse functional data, if feasible at all, is worthy of further exploration. It is also important to tune the truncation parameter sn, which contributes to the bias-variance trade-off of the resulting estimator, although analytical guidance for this is not yet available.
3. Asymptotic properties
In this section we present asymptotic properties of the estimated functional cumulative slicing operator Λ̂ and the estimated effective dimension reduction directions β̂1, … , β̂K for sparse functional data. The numbers of measurements Ni and the observation times Tij are considered to be random, to reflect a sparse and irregular design. Specifically, we make the following assumption.
Assumption 4
The Ni are independent and identically distributed as a positive discrete random variable Nn, where E(Nn) < ∞, pr(Nn ≥ 2) > 0 and pr(Nn ≤ Mn) = 1 for some constant sequence Mn that is allowed to diverge, i.e., Mn → ∞ as n → ∞. Moreover, ({Tij, j ∈ Ji}, {Uij, j ∈ Ji}) are independent of Ni for Ji ⊆ {1, … , Ni}.
Writing Ti = (Ti1, … , TiNi)T and Ui = (Ui1, … , UiNi)T, the data quadruplets Zi = {Ti, Ui, Yi, Ni} are thus independent and identically distributed. Extremely sparse designs are also covered, with only a few measurements for each subject. Other regularity conditions are standard and listed in the Appendix, including assumptions on the smoothness of the mean and covariance functions of X, the distributions of the observation times, and the bandwidths and kernel functions used in the smoothing steps. Write ‖A‖H = {∫τ∫τ A2(s, t) ds dt}1/2 for the Hilbert–Schmidt norm of an operator A with kernel A(s, t).
Theorem 2
Under Assumptions 1, 4 and A1–A4 in the Appendix, we have
The key result here is the L2 convergence of the estimated operator Λ̂, in which we exploit the projections of nonparametric U-statistics, together with a suitable decomposition, to overcome the difficulty caused by the dependence among irregularly spaced measurements. The estimator Λ̂ is obtained by averaging the smoothers over Yi, which is crucial in order to achieve the univariate convergence rate for this bivariate estimator. The convergence of the covariance estimator is presented for completeness, and is given in Theorem 2 of Yao & Müller (2010).
We are now ready to characterize the estimation of the central subspace SY∣X = span(β1, … , βK). Unlike the multivariate or finite-dimensional case, where the convergence of the estimated directions follows immediately from the convergence of Λ̂ and ∑̂ given a bounded ∑−1, we have to approximate ∑−1 with a sequence of truncated estimates ∑̂sn−1, which introduces additional variability and bias inherent in a functional inverse problem. Since we specifically regarded the index functions {β1, … , βK} as the eigenfunctions associated with the K largest eigenvalues of ∑−1Λ, their estimates β̂1, … , β̂K are thus taken to be the corresponding eigenfunctions of ∑̂sn−1Λ̂. For some constant C > 0, we require the eigenvalues of ∑ to satisfy the following condition:
Assumption 5
αj > αj+1 > 0, , and αj − αj+1 ≥ C−1 j−a−1 for j ≥ 1.
This condition on the decaying speed of the eigenvalues αj prevents the spacings between consecutive eigenvalues from being too small, and also implies that αj ≥ Cj−a with a > 1, given the boundedness of ∑. Expressing the index functions as βk = ∑j≥1 bkjϕj (k = 1, … , K), we impose a decaying structure on the generalized Fourier coefficients bkj = ⟨βk, ϕj⟩:
Assumption 6
∣bkj∣ ≤ Cj−b for j ≥ 1 and k = 1, … , K, where b > 1/2.
In order to accurately estimate the eigenfunctions ϕj from ∑̂, one requires the distance from αj to its nearest neighbouring eigenvalue not to fall below the estimation error of ∑̂ (Hall & Hosseini-Nasab, 2006); this implicitly places an upper bound on the truncation parameter sn. Given Assumption 5 and Theorem 2, we provide a sufficient condition on sn. Here we write c1n ≍ c2n when c1n = O(c2n) and c2n = O(c1n).
Assumption 7
As n → ∞, ; moreover, if h2 ≍ n−1/6, sn = o{n1/(3a+3)}.
Theorem 3
Under Assumptions 1–7 and A1–A4 in the Appendix, for all k = 1, … , K,
(6) |
This result associates the convergence of β̂k with the truncation parameter sn and the decay rates of αj and bkj, indicating a bias-variance trade-off with respect to sn. One can view sn as a tuning parameter that is allowed to diverge slowly and which controls the resolution of the covariance estimation. Specifically, the first two terms on the right-hand side of (6) are attributed to the variability of estimating ∑−1Λ with ∑̂sn−1Λ̂, and the last term corresponds to the approximation bias induced by the truncation. The first term of the variance is due to ∑̂ and becomes increasingly unstable with a larger truncation. The second part of the variance is due to Λ̂, and the approximation bias is determined by the smoothness of βk; for instance, a smoother βk with a larger b leads to a smaller bias.
4. Simulations
In this section we illustrate the performance of the proposed functional cumulative slicing method in terms of estimation and prediction. Although our proposal is link-free for estimating the index functions βk, a general index model (1) may lead to model predictions with high variability, especially given the relatively small sample sizes frequently encountered in functional data analysis. Thus we follow Chen et al. (2011) in assuming an additive structure for the link function g in (1), i.e., Y = ∑k=1,…,K gk(⟨βk, X⟩) + ϵ. In each Monte Carlo run, a sample of n = 200 functional trajectories is generated from the process Xi(t) = ∑j=1,…,50 ξijϕj(t), where ϕj(t) = sin(πtj/5)/√5 for j even and ϕj(t) = cos(πtj/5)/√5 for j odd, and the functional principal component scores ξij are independent and identically distributed as N(0, j−1·5). For the setting of sparsely observed functional data, the number of observations per subject, Ni, is chosen uniformly from {5, … , 10}, the observation times Tij are independent and identically distributed as Un[0, 10], and the measurement errors εij are independent and identically distributed as N(0, 0·1). The effective dimension reduction directions are generated by β1(t) = ∑j=1,…,50 bjϕj(t), where bj = 1 for j = 1, 2, 3 and bj = 4(j − 2)−3 for j = 4, … , 50, and β2(t) = 0·31/2(t/5 − 1), which cannot be represented with finitely many Fourier terms. The following single- and multiple-index models are considered:
where the regression error ϵ is independent and identically distributed as N(0, 1) for all models.
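The sparse data-generating mechanism described above can be sketched as follows. The random seed, the storage format, and the omission of the response models I–IV (which would then be applied to the projections ⟨β1, Xi⟩ and ⟨β2, Xi⟩) are illustrative choices under the stated simulation settings.

```python
import numpy as np

rng = np.random.default_rng(1)
n, J = 200, 50

def phi(j, t):
    """Basis in the simulation: sin(pi t j/5)/sqrt(5) for even j, cos(...) for odd j."""
    return (np.sin(np.pi * t * j / 5) if j % 2 == 0 else np.cos(np.pi * t * j / 5)) / np.sqrt(5)

# index functions: beta_1 = sum_j b_j phi_j with the stated coefficients,
# and beta_2(t) = 0.3^{1/2} (t/5 - 1)
b = np.array([1.0 if j <= 3 else 4.0 * (j - 2) ** -3 for j in range(1, J + 1)])
beta1 = lambda t: sum(b[j - 1] * phi(j, t) for j in range(1, J + 1))
beta2 = lambda t: np.sqrt(0.3) * (t / 5 - 1)

# sparse, noisy observations of X_i(t) = sum_j xi_ij phi_j(t)
data = []
for i in range(n):
    xi = rng.normal(scale=np.arange(1, J + 1.0) ** -0.75)    # var(xi_ij) = j^{-1.5}
    Ni = rng.integers(5, 11)                                 # N_i uniform on {5, ..., 10}
    Tij = rng.uniform(0, 10, size=Ni)                        # observation times on [0, 10]
    Xij = np.array([sum(xi[j - 1] * phi(j, t) for j in range(1, J + 1)) for t in Tij])
    Uij = Xij + rng.normal(scale=np.sqrt(0.1), size=Ni)      # measurement error, variance 0.1
    data.append((Tij, Uij))
# responses Y_i are then generated from one of models I-IV (not reproduced here)
```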
We compare our method with that of Jiang et al. (2014) for sparse functional data in terms of estimation and prediction. Denote the true structural dimension by K0. Due to the nonidentifiability of the βk, we examine the projection operators onto the true and estimated effective dimension reduction spaces rather than the individual directions. To assess the estimation of the effective dimension reduction space, we calculate the squared distance between the estimated and true projection operators as the estimation error. To assess model prediction, we estimate the link functions gk nonparametrically by fitting a generalized additive model (Hastie & Tibshirani, 1990) to the projections of the fitted trajectories, with the best linear unbiased predictor of Xi used in place of the unobserved trajectory (Yao et al., 2005a). We generate a validation sample of size 500 in each Monte Carlo run and calculate the average of the relative prediction errors over this sample for different values of (K, sn), where σ2 = 1 and the underlying trajectories in the testing sample are used for evaluation. We report in Table 1 the average estimation and prediction errors, minimized over (K, sn), along with their standard errors over 100 Monte Carlo repetitions. For estimation and prediction, both methods selected (K, sn) = (1, 3) for the single-index models I and II, and (K, sn) = (2, 2) for the multiple-index models III and IV. The two approaches perform comparably in this sparse setting, which could be due to the inverse covariance estimation dominating the overall performance. Our method takes one-third of the computation time of the method of Jiang et al. (2014) for this sparse design.
Table 1. Average estimation and prediction errors, with standard errors in parentheses, for the sparse design
| Model | Metric | FCS | IRLD | Metric | FCS | IRLD |
|---|---|---|---|---|---|---|
| I | Estimation error | 61·1 (1·1) | 61·3 (1·1) | Prediction error | 17·7 (0·6) | 17·9 (0·5) |
| II | | 59·3 (1·0) | 59·5 (1·0) | | 19·6 (0·6) | 19·4 (0·5) |
| III | | 63·7 (0·8) | 63·9 (0·9) | | 18·8 (0·5) | 19·5 (0·4) |
| IV | | 63·8 (0·8) | 63·9 (0·9) | | 45·2 (1·1) | 45·4 (1·1) |
FCS, functional cumulative slicing; IRLD, the method of Jiang et al. (2014), where (K, sn) is selected by minimizing the estimation and prediction errors.
We also present simulation results for dense functional data, where Ni = 50 and the Tij are sampled independently and identically from Un[0, 10]. With (K, sn) selected so as to minimize the estimation and prediction errors, we compare our proposal with the method of Jiang et al. (2014), functional sliced inverse regression (Ferré & Yao, 2003) using five or ten slices, and the functional index model of Chen et al. (2011). Table 2 indicates that our method slightly outperforms the method of Jiang et al. (2014), followed by the method of Chen et al. (2011), while functional sliced inverse regression (Ferré & Yao, 2003) is seen to be suboptimal. Our method takes only one-sixth of the time required by Jiang et al. (2014) for this setting.
Table 2. Average estimation and prediction errors, with standard errors in parentheses, for the dense design
| Metric | Model | FCS | IRLD | FSIR5 | FSIR10 | FIND |
|---|---|---|---|---|---|---|
| Estimation error | I | 39·2 (1·6) | 45·5 (1·5) | 59·4 (2·1) | 61·7 (2·2) | 47·1 (1·6) |
| | II | 35·5 (1·4) | 38·1 (1·3) | 56·1 (1·8) | 57·8 (1·9) | 44·5 (1·5) |
| | III | 59·6 (0·8) | 63·1 (0·8) | 72·6 (1·1) | 74·1 (1·3) | 63·6 (0·9) |
| | IV | 57·2 (0·6) | 59·0 (0·6) | 69·3 (1·0) | 68·9 (0·9) | 61·0 (0·8) |
| Prediction error | I | 11·1 (0·6) | 12·7 (0·5) | 17·1 (0·7) | 16·7 (0·6) | 16·1 (1·1) |
| | II | 9·8 (0·5) | 10·5 (0·4) | 15·5 (0·7) | 16·9 (1·0) | 14·9 (0·8) |
| | III | 13·5 (0·5) | 15·2 (0·5) | 15·8 (0·6) | 16·6 (0·5) | 14·7 (0·6) |
| | IV | 19·9 (0·7) | 21·9 (0·7) | 31·1 (1·4) | 32·2 (1·4) | 24·2 (1·2) |
FCS, functional cumulative slicing; IRLD, inverse regression for longitudinal data (Jiang et al., 2014); FSIR5, functional sliced inverse regression (Ferré & Yao, 2003) with five slices; FSIR10, functional sliced inverse regression (Ferré & Yao, 2003) with ten slices; FIND, functional index model (Chen et al., 2011).
5. Data application
In this application, we study the relationship between the winning bid price of 156 Palm M515 PDA devices auctioned on eBay between March and May of 2003 and the bidding history over the seven-day period of each auction. Each observation from a bidding history represents a live bid, the actual price a winning bidder would pay for the device, known as the willingness-to-pay price. Further details on the bidding mechanism can be found in Liu & Müller (2009). We adopt the view that the bidding histories are independent and identically distributed realizations of a smooth underlying price process. Due to the nature of online auctions, the jth bid of the ith auction usually arrives irregularly at time Tij, and the number of bids Ni can vary widely, from nine to 52 for this dataset. As is usual in modelling prices, we take the log-transform of the bid prices. Figure 1 shows a sample of nine randomly selected bid histories over the seven-day period of the respective auction. Typically, the bid histories are sparse until the final hours of each auction, when bid sniping occurs. At this point, snipers place their bids at the last possible moment to try to deny competing bidders the chance of placing a higher bid.
Since our main interest is in the predictive power of price histories up to time T for the winning bid prices, we consider the regression of the winning price on the history trajectory X(t) (t ∈ [0, T]), and set T = 4·5, 4·6, 4·7, … , 6·8 days. For each analysis on the domain [0, T], we select the optimal structural dimension K and the truncation parameter sn by minimizing the average five-fold crossvalidated prediction error over 20 random partitions. Figure 2(a) shows the minimized average crossvalidated prediction errors, compared with those obtained using the method of Jiang et al. (2014). As the bidding histories encompass more data, the prediction power increases, and the proposed method appears to yield more favourable predictions across the different time domains.
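The (K, sn) selection used here follows a generic repeated five-fold crossvalidation loop, sketched below. The callables fit and predict are hypothetical stand-ins for the estimation and additive-model prediction steps of Section 2, and the candidate grids are illustrative.

```python
import numpy as np
from itertools import product

def select_K_sn(data, fit, predict, K_grid=(1, 2, 3), sn_grid=(2, 3, 4, 5),
                n_folds=5, n_partitions=20, seed=2):
    """Choose (K, s_n) by minimizing the average n_folds-fold crossvalidated
    prediction error over repeated random partitions of the auctions.

    fit(train, K, sn) returns a fitted model; predict(model, test) returns the
    mean squared prediction error on the held-out fold (hypothetical helpers).
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    best, best_err = None, np.inf
    for K, sn in product(K_grid, sn_grid):
        errs = []
        for _ in range(n_partitions):
            fold = rng.permutation(n) % n_folds          # random fold labels
            for f in range(n_folds):
                train = [data[i] for i in range(n) if fold[i] != f]
                test = [data[i] for i in range(n) if fold[i] == f]
                errs.append(predict(fit(train, K, sn), test))
        if np.mean(errs) < best_err:
            best, best_err = (K, sn), float(np.mean(errs))
    return best, best_err
```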
As an illustration, we present the analysis for T = 6. The estimated model components using the proposed method are shown in Fig. 2(b), with the parameters chosen as K = 2 and sn = 2. The first index function assigns contrasting weights to bids made before and after the first day, indicating that some bidders tend to underbid at the beginning only to quickly overbid relative to the mean. The second index represents a cautious type of bidding behaviour, entering at a lower price and slowly increasing towards the average level. These features contribute most towards the prediction of the winning bid prices. Also seen are the nonlinear patterns in the estimated additive link functions. Using these estimated model components, we display in Fig. 3(a) the fitted additive surface ĝ1(⟨β̂1, x⟩) + ĝ2(⟨β̂2, x⟩). We also fit an unstructured index model g(⟨β1, X⟩, ⟨β2, X⟩), where g is nonparametrically estimated using a bivariate local linear smoother; this is shown in Fig. 3(b), and is seen to agree reasonably well with the additive regression surface.
Acknowledgement
We thank two reviewers, an associate editor, and the editor for their helpful comments. This research was partially supported by the U.S. National Institutes of Health and National Science Foundation, and the Natural Sciences and Engineering Research Council of Canada.
Appendix
Regularity conditions and auxiliary lemmas
Without loss of generality, we assume that the known weight function is w(·) = 1. Write and for some δ > 0; denote a single observation time by T and a pair of observation times by (T1, T2)T, with densities f(t) and f2(s, t), respectively. Recall the unconditional mean function m(t, y) = E{X(t)1(Y ≤ y)}. The regularity conditions for the underlying moment functions and design densities are as follows, where ℓ1 and ℓ2 are nonnegative integers. We assume that:
Assumption A1
∂2∑/(∂sℓ1∂tℓ2) is continuous on τ × τ for ℓ1 + ℓ2 = 2, and ∂2m/∂t2 is bounded and continuous in t ∈ τ for all y.
Assumption A2
f(t) is continuously differentiable on τ with f(t) > 0, and ∂f2/(∂sℓ1∂tℓ2) is continuous on τ × τ for ℓ1 + ℓ2 = 1 with f2(s, t) > 0.
Assumption A1 can be guaranteed by a twice-differentiable process, and Assumption A2 is standard and implies the boundedness and Lipschitz continuity of f. Recall the bandwidths h1 and h2 used in the smoothing steps for m̂ in (3) and ∑̂ in (5), respectively; we assume that:
Assumption A3
h1 → 0, h2 → 0, , and .
We say that a bivariate kernel function K2 is of order (ν, ℓ), where ν is a multi-index ν = (ν1, ν2)T, if
where ∣ν∣ = ν1 + ν2 < ℓ. The univariate kernel K is said to be of order (ν, ℓ) for a univariate ν = ν1 if this definition holds with ℓ2 = 0 on the right-hand side, integrating only over the argument u on the left-hand side. The following standard conditions on the kernel densities are required.
Assumption A4
The kernel functions K1 and K2 are nonnegative with compact supports, bounded, and of order (0, 2) and {(0, 0)T, 2}, respectively.
Lemma A1 is a mean-squared version of Theorem 1 in Martins-Filho & Yao (2006), which asserts the asymptotic equivalence of a nonparametric V-statistic to the projection of the corresponding U-statistic. Lemma A2 is a restatement of Lemma 1(b) of Martins-Filho & Yao (2007) adapted to sparse functional data.
Lemma A1
Let {Zi}i≥1 be a sequence of independent and identically distributed random variables, and let un and vn be U- and V-statistics with kernel function ψn(Z1, … , Zk). In addition, let ûn = ϕn + kn−1 ∑i=1,…,n {ψ1n(Zi) − ϕn} denote the projection of un, where ψ1n(Zi) = E{ψn(Zi1, … , Zik) ∣ Zi} for i ∈ {i1, … , ik} and ϕn = E{ψn(Z1, … , Zk)}. If E{ψn2(Z1, … , Zk)} = o(n), then E{(vn − ûn)2} = o(n−1).
Lemma A2
Given Assumptions 1–4 and A1–A4, let
Then for k = 0, 1, 2.
Proofs of the theorems
Proof of Theorem 1
This theorem is an analogue of Theorem 1 in Zhu et al. (2010); thus its proof is omitted.
Proof of Theorem 2
For brevity, we write Mn and Nn as M and N, respectively. Let
where . The local linear estimator of with kernel K1 is
Let . Then
Denote a point between Tij and t by ; by Taylor expansion, . Finally, let . Then
where
This allows us to write , where
which implies, by the Cauchy–Schwarz inequality, that . In the rest of the proofs, we drop the subscript H and the dummy variable in integrals for brevity. Recall that we defined Zi as the underlying data quadruplet (Ti, Ui, Yi, Ni). Further, let ∑(p) hi1,…,ip denote the sum of hi1,…,ip over the permutations of i1, … , ip. Finally, by Assumptions A1, A2 and A4, write for the lower and upper bounds of the density function of T, ∣K1(x)∣ ≤ BK < ∞ for the bound on the kernel function K1, and ∣∂2m/∂t2∣ ≤ B2m < ∞ for the bound on the second partial derivative of with respect to t.
(a) We further decompose I1n(s, t) into I1n(s, t) = I11n(s, t) + I12n(s, t) + I13n(s, t), where
which we analyse individually below.
We first show that . We write I11n(s, t) as
where vn(s, t) is a V-statistic with symmetric kernel ψn(Zi, Zk; s, t) and
Since E{eij(Yk) ∣ Tij, Yk} = 0, it is easy to show that E{hik(s, t)} = E{hik(t, s)} = E{hki(s, t)} = E{hki(t, s)} = 0. Thus θn(s, t) = E{ψn(Zi, Zk; s, t)} = 0. Additionally,
If , Lemma A1 gives , where is the projection of the corresponding U-statistic. Since the projection of a U-statistic is a sum of independent and identically distributed random variables ψ1n(Zi; s, t), E‖I11n‖2 ≤ 2n−1 ∫∫ var[E{hik(s, t) ∣ Zi}] + 2n−1 ∫∫ var[E{hik(t, s) ∣ Zi}] + o(n−1), where
where the first line follows from the Cauchy–Schwarz inequality, the second line is obtained by letting and observing that Tij is independent of Xi, Yi and εi, and the third line follows from a variant of the dominated convergence theorem (Prakasa Rao, 1983, p. 35) that allows us to derive rates of convergence for nonparametric regression estimators. Thus , provided that for all i and k, which we will show below. For i ≠ k,
Observe that
For j = l, applying the dominated convergence theorem to the expectation on the right-hand side gives , and hence by Assumption A3. For j ≠ l, a similar argument gives n−1 . The next two terms, and E{hik(s, t)hik(t, s)}, can be handled similarly, as well as E{hik(s, t)hki(s, t)} = o(n) and the case of i = k. Thus .
Using similar derivations, one can show that .
We next show that . Following Lemma 2 of Martins-Filho & Yao (2007),
Lemma A2 gives . Next, Rn(t, Yk) ≤ ∣Rn1(t, Yk)∣ + ∣Rn2(t, Yk)∣ + ∣Rn3(t, Yk)∣ + ∣Rn4(t, Yk)∣, where
Thus n−1 ∑k m(s, Yk) Rn1(t, Yk) = h1fT(t)I11n(s, t) leads to ‖h1 fT I11n‖2 = Op(n−1h1), and n−1 ∑k m(s, Yk)Rn2(t, Yk) = h1 fT(t)I12n(s, t) leads to . It follows similarly that the third and fourth terms are Op(n−1h1) and , respectively. Hence, . Combining the previous results gives .
(b) These terms are of higher order and are omitted for brevity.
(c) By the law of large numbers, .
Combining the above results leads to .
Proof of Theorem 3
To facilitate the theoretical derivation, for each k = 1, … , K let ηk = ∑1/2 βk and be, respectively, the normalized eigenvectors of the equations ∑−1 Λ ∑−1/2ηk = λkβk and . Then
using the fact that . Applying standard theory for self-adjoint compact operators (Bosq, 2000) gives
where C > 0 is a generic positive constant. Thus , where
It suffices to show that . The calculations for I2n are similar and yield that I2n = op(I1n).
Observe that I1n ≤ 3I11n + 3I12n + 3I13n, where and . Recall that is the orthogonal projector onto the eigenspace associated with the sn largest eigenvalues of ∑. Let I denote the identity operator and the operator perpendicular to πsn, i.e., is the orthogonal projector onto the eigenspace associated with eigenvalues of ∑ that are less than αsn. Thus allows us to write . Since ∑−1Λ ∑−1/2ηk = λkβk,
similarly, .
We decompose I12n as I12n ≤ 3I121n + 3I122n + 3I123n, where and . Note that I121n ≤ 6‖Λ∑−1/2πsn‖2(I1211n + I1212n), where
Under Assumption 7, for all 1 ≤ j ≤ sn, implies that , i.e., for some C > 0. Thus
For I1212n, using the fact that (Bosq, 2000), where δ1 = α1 − α2 and δj = min2≤ℓ≤j (αℓ−1 − αℓ, αℓ − αℓ+1) for j > 1, we have that and
Using Λ∑−1/2ηk = λk∑βk, we obtain . Thus . Using decompositions similar to the one for I121n, both I122n and I123n can be shown to be . This leads to .
Note that , where and, similarly, . From Theorem 2, we have . Combining the above results leads to (6).
REFERENCES
- Bosq D. Linear Processes in Function Spaces: Theory and Applications. Springer; New York: 2000.
- Cai TT, Hall P. Prediction in functional linear regression. Ann. Statist. 2006;34:2159–79.
- Cambanis S, Huang S, Simons G. On the theory of elliptically contoured distributions. J. Mult. Anal. 1981;11:368–85.
- Cardot H, Ferraty F, Sarda P. Functional linear model. Statist. Prob. Lett. 1999;45:11–22.
- Chen D, Hall P, Müller H-G. Single and multiple index functional regression models with nonparametric link. Ann. Statist. 2011;39:1720–47.
- Chiaromonte F, Cook DR, Li B. Sufficient dimension reduction in regressions with categorical predictors. Ann. Statist. 2002;30:475–97.
- Cook DR. Regression Graphics: Ideas for Studying Regressions through Graphics. Wiley; New York: 1998.
- Cook DR, Weisberg S. Comment on “Sliced inverse regression for dimension reduction”. J. Am. Statist. Assoc. 1991;86:328–32.
- Cook DR, Forzani L, Yao A-F. Necessary and sufficient conditions for consistency of a method for smoothed functional inverse regression. Statist. Sinica. 2010;20:235–8.
- Duan N, Li K-C. Slicing regression: A link-free regression method. Ann. Statist. 1991;19:505–30.
- Fan J, Gijbels I. Local Polynomial Modelling and Its Applications. Chapman & Hall; London: 1996.
- Ferré L, Yao A-F. Functional sliced inverse regression analysis. Statistics. 2003;37:475–88.
- Ferré L, Yao A-F. Smoothed functional inverse regression. Statist. Sinica. 2005;15:665–83.
- Hall P, Horowitz JL. Methodology and convergence rates for functional linear regression. Ann. Statist. 2007;35:70–91.
- Hall P, Hosseini-Nasab M. On properties of functional principal components analysis. J. R. Statist. Soc. B. 2006;68:109–26.
- Hall P, Müller H-G, Wang J-L. Properties of principal component methods for functional and longitudinal data analysis. Ann. Statist. 2006;34:1493–517.
- Hastie TJ, Tibshirani RJ. Generalized Additive Models. Chapman & Hall; London: 1990.
- He G, Müller H-G, Wang J-L. Functional canonical analysis for square integrable stochastic processes. J. Mult. Anal. 2003;85:54–77.
- James GM, Silverman BW. Functional adaptive model estimation. J. Am. Statist. Assoc. 2005;100:565–76.
- Jiang C-R, Yu W, Wang J-L. Inverse regression for longitudinal data. Ann. Statist. 2014;42:563–91.
- Li B, Wang S. On directional regression for dimension reduction. J. Am. Statist. Assoc. 2007;102:997–1008.
- Li K-C. Sliced inverse regression for dimension reduction. J. Am. Statist. Assoc. 1991;86:316–27.
- Li K-C. On principal Hessian directions for data visualization and dimension reduction: Another application of Stein’s lemma. J. Am. Statist. Assoc. 1992;87:1025–39.
- Li Y, Hsing T. Deciding the dimension of effective dimension reduction space for functional and high-dimensional data. Ann. Statist. 2010;38:3028–62.
- Lin X, Carroll RJ. Nonparametric function estimation for clustered data when the predictor is measured without/with error. J. Am. Statist. Assoc. 2000;95:520–34.
- Liu B, Müller H-G. Estimating derivatives for samples of sparsely observed functions, with application to on-line auction dynamics. J. Am. Statist. Assoc. 2009;104:704–14.
- Martins-Filho C, Yao F. A note on the use of V and U statistics in nonparametric models of regression. Ann. Inst. Statist. Math. 2006;58:389–406.
- Martins-Filho C, Yao F. Nonparametric frontier estimation via local linear regression. J. Economet. 2007;141:283–319.
- Müller H-G, Stadtmüller U. Generalized functional linear models. Ann. Statist. 2005;33:774–805.
- Prakasa Rao BLS. Nonparametric Functional Estimation. Academic Press; Orlando, Florida: 1983.
- Ramsay JO, Silverman BW. Functional Data Analysis. 2nd ed. Springer; New York: 2005.
- Rice JA, Silverman BW. Estimating the mean and covariance structure nonparametrically when the data are curves. J. R. Statist. Soc. B. 1991;53:233–43.
- Xia Y, Tong H, Li W, Zhu L-X. An adaptive estimation of dimension reduction space. J. R. Statist. Soc. B. 2002;64:363–410.
- Yao F, Müller H-G. Empirical dynamics for longitudinal data. Ann. Statist. 2010;38:3458–86.
- Yao F, Müller H-G, Wang J-L. Functional data analysis for sparse longitudinal data. J. Am. Statist. Assoc. 2005a;100:577–90.
- Yao F, Müller H-G, Wang J-L. Functional linear regression analysis for longitudinal data. Ann. Statist. 2005b;33:2873–903.
- Yuan M, Cai TT. A reproducing kernel Hilbert space approach to functional linear regression. Ann. Statist. 2010;38:3412–44.
- Zhu L-P, Zhu L-X, Feng Z-H. Dimension reduction in regressions through cumulative slicing estimation. J. Am. Statist. Assoc. 2010;105:1455–66.